Fog client weirdness
-
Another weird thing here: even after I cancelled a load of tasks, they still show up under “Active Multicast Tasks”. I can’t cancel them from this page either - cancel doesn’t actually do anything. And I had already deleted many of those tasks from the all tasks page…
[url=“/_imported_xf_attachments/1/1480_er.png?:”]er.png[/url]
-
For the second problem - I know there are times when FOG does not really clean up the tasks in the actual database. For that, I used phpMyAdmin to go into the FOG database and clear (literally delete the entries in) the “multicastSessions” and “multicastSessionsAssoc” tables. Sometimes a session gets stuck in there: if, for example, you had a group of 30 computers queued up but only 29 actually came online, the session is still waiting for that last one. If you then delete the task, it’s left ‘unfinished’ in the database, which causes some confusion with future tasks. So I just clear that manually from time to time.
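For anyone who would rather script that cleanup than click through phpMyAdmin, here is a rough sketch of the same idea. Only the two table names come from the post above. The column names are made up, and the demo runs against an in-memory SQLite stand-in rather than a real FOG MySQL database, so treat it as an illustration and take a database backup before trying anything like it for real.

```python
import sqlite3

# Table names as mentioned in the post; fixed constants, never user input.
MULTICAST_TABLES = ("multicastSessions", "multicastSessionsAssoc")


def clear_multicast_sessions(conn):
    """Delete every row from the multicast session tables.

    Returns a dict of {table name: number of rows that were removed}.
    Works with any DB-API connection that provides cursor() and commit().
    """
    removed = {}
    cur = conn.cursor()
    for table in MULTICAST_TABLES:
        # Count first so the result does not depend on driver rowcount quirks.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        removed[table] = cur.fetchone()[0]
        cur.execute(f"DELETE FROM {table}")
    conn.commit()
    return removed


def make_demo_db():
    """Build an in-memory stand-in with hypothetical columns (not FOG's real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE multicastSessions (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE multicastSessionsAssoc (id INTEGER PRIMARY KEY, session_id INTEGER)"
    )
    conn.execute("INSERT INTO multicastSessions (name) VALUES ('stuck lab session')")
    conn.executemany(
        "INSERT INTO multicastSessionsAssoc (session_id) VALUES (?)", [(1,), (1,)]
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(clear_multicast_sessions(make_demo_db()))
```

Against a real server you would pass in a connection from whichever MySQL driver you use; the function only relies on `cursor()`, `execute()`, `fetchone()` and `commit()`.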
The first problem you notice with “er! host not found” can be caused by:
- Your MAC address not being associated with the host - sometimes it happens when a computer has a wifi MAC you do not know about, is using a VPN / masking for some stupid reason, or has had its wifi card replaced. I’m sure that is not your problem though - you seem like you’ve taken care of it and registered ALL possible MAC addresses for your hosts (all laptops have at least 2 MACs, one for the wifi card, one for the physical LAN port).
- A more interesting problem I ran into is multiple SSIDs. Yes, it’s weird. Basically, your laptop boots up (btw, this ONLY affects laptops) and FOG starts up as a service like normal. At my school, we recently switched SSID passwords, so I ended up adding 2 wifi network profiles: one for the old password, one for the new. The idea was “let the laptop connect to whatever wifi accepts the password”, so it may connect directly to the new network, or it may try the old network, error out, then switch to the new one. Assigning multiple profiles to the wifi card works if you are going to switch the wifi password mid-school year. The problem for FOG is that when the client starts up, it looks for an active MAC address to send to the server. If your computer is unlucky, it will try to connect to the old network first. That will fail, but during that time the FOG client is unable to find a proper MAC address and errors out, in the sense that it looks up a “null” MAC address on the FOG server. The solution I found is to set the FOG service to “Delayed Start”, to give the laptop plenty of time to figure out which fuckin’ wifi network works and connect to it properly before the client looks for a MAC address.
TL;DR: FOG needs a ‘proper’ MAC address from Windows - if your wifi card is too slow to connect to wifi, the FOG client will not have a MAC address to look for on the FOG server.
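To make the timing problem a bit more concrete, here is a small sketch of what “wait until there is a real MAC address” could look like, as opposed to checking once at startup. This is not how the FOG client is written. `wait_for_mac` and the `get_active_macs` callback are hypothetical names, and the adapter lookup itself is left to the caller.

```python
import time


def wait_for_mac(get_active_macs, timeout=60.0, interval=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_active_macs() until it returns at least one MAC or time runs out.

    get_active_macs is a caller-supplied function returning a list of MAC
    strings for adapters that are currently up (hypothetical helper).
    Returns the list of MACs found, or an empty list on timeout.
    """
    deadline = clock() + timeout
    while True:
        # Ignore empty or None entries, e.g. an adapter that is not up yet.
        macs = [mac for mac in get_active_macs() if mac]
        if macs:
            return macs
        if clock() >= deadline:
            return []
        sleep(interval)


if __name__ == "__main__":
    # Simulate a wifi card that only comes up on the third poll.
    attempts = {"count": 0}

    def slow_adapter():
        attempts["count"] += 1
        return [] if attempts["count"] < 3 else ["00:11:22:33:44:55"]

    print(wait_for_mac(slow_adapter, timeout=10, interval=0.1))
```

Setting the service to “Delayed Start” gets a similar effect from the outside: the client still looks only once, it just looks later.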
-
Not using wireless at all. And as you say, the hosts are all in the database (which is apparent from the fog log too). It is really confusing and kind of frustrating too to be honest as I have no idea really what to even look for to resolve this.
I’ll try the wiping of those sessions though, cheers!
-
Mental.
[url=“/_imported_xf_attachments/1/1491_mental.png?:”]mental.png[/url]
-
The weird thing is, in this room of 21 PCs, 2 apparently managed to take a snapin after I wiped the snapins, tasks and multicast sessions. No idea why though, as their fog.log files are spammed with GUI notification/dispatch failed entries (which apparently stop after you log in).
But, really bizarrely, manually restarting the FOG service on one of the PCs seemed to fix this issue. After restarting the PC itself, though, the problem is back. This just makes no sense at all!
-
Two log files here - the first is from when I had restarted the FOG service (left side) and the second is from after restarting the PC (right side). It just makes no sense to me why errors are being returned like this. The only difference between hardware types in our rooms is the presence of an SSD in these labs, which has never caused an issue before!
[url=“/_imported_xf_attachments/1/1492_worksdoesnt1.png?:”]worksdoesnt1.png[/url][url=“/_imported_xf_attachments/1/1493_worksdoesnt2.png?:”]worksdoesnt2.png[/url]
-
After trying some other PCs, I have a feeling that this might be to do with the speed at which the PCs using SSDs boot up. If anyone wants to correct me on that for sure, please do - but what I’ll do is reimage one HDD and one SSD machine to see if they are behaving the same way and then secondly turn the FOG service into a delayed start service first and then try and reimage some machines to see if that changes anything.
-
what version of fog are you running?
do you have vmware/virtualbox etc installed on these lab computers?
-
FOG 1.2
All our PCs have VMWare Workstation installed (hence the two additional MAC addresses).
-
and since it was installed on the image you deployed, the randomized mac address on the virtual network adapter is identical. multiple hosts sharing the same mac address results in the server saying it’s invalid and not delivering tasks. this has been corrected for in the SVN. if you’re willing to update to the latest dev version, this problem will almost certainly go away.
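If you want to check whether cloned virtual adapters really are colliding, a quick sketch like this will list any MAC registered to more than one host. It assumes you have already exported host names and their MACs into a plain dict somehow; `find_shared_macs` and the sample values are made up for illustration and are not part of FOG.

```python
from collections import defaultdict


def find_shared_macs(hosts):
    """Return {mac: sorted host names} for every MAC seen on more than one host.

    hosts maps a host name to the list of MAC strings registered for it.
    MACs are normalized to lowercase, colon-separated form before comparing.
    """
    owners = defaultdict(set)
    for host, macs in hosts.items():
        for mac in macs:
            normalized = mac.strip().lower().replace("-", ":")
            owners[normalized].add(host)
    return {mac: sorted(names) for mac, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Made-up example: two lab PCs whose virtual adapter MAC came from the same image.
    sample = {
        "lab-pc-01": ["00:1A:2B:3C:4D:01", "00-50-56-C0-00-08"],
        "lab-pc-02": ["00:1a:2b:3c:4d:02", "00:50:56:c0:00:08"],
    }
    for mac, names in find_shared_macs(sample).items():
        print(mac, "is shared by", ", ".join(names))
```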
(make a backup and be ready to revert back in case you have problems, the dev version is mostly stable but can have issues, it IS pre-release software)
-
I’ll try and update to it - is it going to eventually be a part of the next FOG update then?
And what do you think would make this issue present on only some machines (which happen to all be machines with SSDs)? I never accepted any of the pending MAC addresses (as this did give an issue in 0.32 when I tried to accept one, once) and unless the machine boots from an SSD, it works fine.
Thanks!
-
yes, this will be part of fog 1.3.0
i have never noticed any difference in behavior between computers with HDDs and SSDs so long as they image properly. (i have heard of some SSDs not being recognized and being unable to be imaged, but that has been a very rare issue)
i also noticed that your hardware network card is coming up as “local area connection 2”
is there more than one physical network adapter in these machines?
-
On most of our PCs there are, yes - DQ77MK motherboards have an Intel AMT management port and we avoid using this. I’ll reply back here after I try out whether the delayed start works as I thought it might - but I don’t think having an SSD in the system should make any difference at all. But… it does seem to. Somehow.
-
it looks to me like the vmware mac isn’t the issue then, since you say you’re aware of the related issues and you’ve never approved a pending mac. the issue seems to be that the wrong network adapter is registered with fog.
-
Yeah, we know about it and have worked around it with no hassle - actually, it has been a way to determine if someone has plugged a machine into the wrong port, and we have started disabling the second network controller in the BIOS on many machines. I am pretty certain that the adapter isn’t the issue (if it was, it wouldn’t work in either of the cases).
But the key point is that with the same image, on essentially the same hardware, the fog service acts differently - seemingly down to whether or not it uses an SSD. On the SSD machine, the fog service gives the errors - but if you restart the service (stop and start), it works fine.
I just tried setting it to “Automatic (Delayed Start)” and - bizarrely - it is now working fine on the machines I have changed so far. Could it be that FOG relies on some service that hasn’t started yet at the point at which it is needed? It’s really all I can think of - and that could ultimately be down to my individual build. But it isn’t something we have ever seen as an issue before.
-
what version of windows is installed on these machines? if it is windows 8, then it could actually be somewhat related to the SSD. windows 8, i am told, says network adapters are disabled unless it can detect a network attached. that means we can’t get the mac address for them. the current fog client only polls the hardware once on startup, and it requests all active network adapters. if the system is starting up and the fog service starts before the network adapter has initialized, that could possibly be causing the issue.
-
Windows 7. But this sounds kind of logical…
-
windows 7 doesn’t have the same issue. what power saving features do you have enabled on these computers? and does wol work? if the onboard network card is getting set to a low power mode, that could be related
-
Un-ethically, we currently have everything set to not power down or use power saving modes. WOL might work, but for the last year or so it’s been put on the backburner because getting it to work with systems we have limited control over was proving a hassle.
-
have you explicitly disabled the various power states? from the factory, we’ve had some computers set to disable onboard network cards to the extent that they don’t even provide link lights in the off state.
we too have our computers set to not power down or use power saving modes. they just aren’t as reliable that way, in my experience. (although, we have a couple university owned wind turbines, that counteracts the un-ethical-ness, right?)