FOG service on 0.10.6 not restarting after reboot
-
@Jbob @tmerrick
I found something in the event viewer
Event 7009 A timeout was reached (30000 milliseconds) while waiting for the FOGService service to connect.
followed by
Event 7000 The FOGService service failed to start due to the following error: The service did not respond to the start or control request in a timely fashion.
In the services console there are options for automatic recovery that we can try. I hadn’t noticed these before, perhaps they’re new? Either way I’m giving a 3 time minute delay a try.
-
@Arrowhead-IT The recovery options didn’t seem to do anything. Setting to delayed start seems to do the trick.
-
@Arrowhead-IT @jbob Scratch that, delayed start didn’t do the trick and I have discovered this issue happens on all windows 10 x64 1511 computers. Why oh why did I not just use LTSB to begin with?
-
@Arrowhead-IT What’s interesting is that on delayed start the Fog service may not start, but the fog user service and the fog tray pop up no problem.
With delayed start there is nothing in the event viewer concerning the fogservice, it simply never started.
-
Does the issue present itself when you re-install the client on the machine (be sure to reset encryption data when doing so) with this update?
-
@Jbob You’d think I would have tried that since I suggested it to this guy, but I have not tried that because I am dumb.
-
I’ve actually been noticing that the fog client isn’t starting for me in Win10. I didn’t think anything of it, because manually starting the service works… I’ll have to look into that.
-
@jbob A re-install of the service appears to work. Should the service not be installed in the image for windows 10 and just be added as part of a firstlogon/sysprep type script?
-
While that is a band aid, it should work until I can narrow down the root cause. Just use the msi’s switches in silent mode in your setup complete file (and then start it or restart the machine).
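For anyone wanting to try that, here is a minimal sketch of what those SetupComplete.cmd lines might look like. The MSI path is a placeholder I made up, and only the standard Windows Installer switches (/i, /quiet, /norestart) are used; check the client installer’s own options for anything FOG-specific.

```batch
@echo off
rem Hypothetical path: adjust to wherever the client MSI is staged in your image.
set "FOG_MSI=C:\Windows\Setup\Files\FOGService.msi"

rem Standard msiexec switches: install, no UI, no automatic reboot.
rem "start /wait" blocks until the installer has finished.
start /wait "" msiexec /i "%FOG_MSI%" /quiet /norestart

rem Start the service now instead of waiting for the next reboot.
sc start FOGService
```

If you would rather restart the machine than start the service, swap the last line for a reboot.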
-
@tmerrick do you still have some computers experiencing this issue? I am attempting to track down exactly what’s failing but can’t replicate the issue.
-
I do - I had a lab of 28 computers imaged on a fresh image of win10 edu/1511. Setupcomplete re-enables FOGService after it was disabled for sysprep. Image has nothing else added except office 2016.
Of the 28 computers, 2 of them failed to start fogservice during the setupcomplete, and 8 more of them failed to start fogservice after a snapin called for a reboot after install. All 28 computers are identical, bought at the same time. i5@2.3g,4gb ram.
As with the other examples here, the fog.log has nothing in it except stating it’s gone to sleep - no exit info for shutting down with the system, and no startup. Zazzles.log doesn’t appear to exist…
I’ve changed my setupcomplete.cmd to have this in it…
sc config FOGService start= delayed-auto
waitfor /t 5 null
sc failure FOGService reset= 86400 actions= restart/60000/restart/60000/restart/60000
waitfor /t 5 null
sc failureflag FOGService 1
waitfor /t 5 null
sc start FOGService
to see if this has any effect. Note that the failure flag of 1 tells the system to restart the service if it exits unexpectedly (non-0 exit code) using the failure sequence.
This workaround won’t be tested until next deploy (next weekend maybe?)
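To confirm the settings actually stuck on a deployed machine, something like the following should do. It is only a sketch using sc’s standard query subcommands, run from an elevated prompt; nothing here is specific to the FOG client beyond the service name.

```batch
rem Show the configured start type; it should report the delayed auto start.
sc qc FOGService

rem Show the recovery settings: reset period and the restart actions.
sc qfailure FOGService

rem Show whether recovery also applies to stops with a non-zero exit code.
sc qfailureflag FOGService

rem Show the current state of the service (RUNNING, STOPPED, ...).
sc query FOGService
```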
Also - I had periodically experienced this in a VM guest (VMware ESXi 6.0 host, 4virt cpu, 8gvem). Server was practically idle at the time, and is ridiculously fast (deploy, oobe, reboot, desktop takes about 6 minutes). This same guest after another reboot started the service fine, and had been used to test images repeatedly. I’d say it was only about a 5% rate of failure to start the service in the VM.
-
v0.11 of the client should prevent this. The FOG Service will have a dependency on dnscache, which is Windows’ DNS Client. The DNS Client is one of the last network services to start and all versions of Windows within reason use it.
-
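For machines still on the older client, the same dependency can probably be added by hand. This is a sketch that assumes a manual sc config has the same effect as what v0.11 is described as doing; I have not checked how the client installer itself sets it.

```batch
rem Assumption: this mimics by hand the dependency described for v0.11.
rem Note the required space after "depend=". This replaces any existing
rem dependency list rather than appending to it.
sc config FOGService depend= Dnscache

rem Confirm that the DEPENDENCIES line now lists Dnscache.
sc qc FOGService
```

Because depend= overwrites the list, run sc qc first if the service might already have dependencies you want to keep.
-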
v0.11.0 is released and fixes this issue. Can you test when you have a chance?
-
@Joe-Schmitt I am having a similar issue though I have already updated to client 11.1 after I read this post a couple days ago. Ubuntu 14.04, FOG trunk 8095. In a lab of 25 Dell 790’s I had 10 or so not join the domain. I found the FOG service not running on each of them. I had no problems with my lab of Dell 7020’s or Dell 7040’s, both of which have better processors and more memory. The fog.log has no errors; immediately after imaging it commands a shutdown to rename/rejoin and then the logging stops. Unfortunately I am off-site until Monday so I can’t reach a machine to test much, but I am trying to get one of my slower VMs tested to see if it happens there.
-
Please update your server again. v0.11.2 was released and patches another power issue.
-
@Joe-Schmitt Sweet. Will do!
-
@Joe-Schmitt - Trunk 8257 - Joe, I had client 11.2 deploy to 25 Dell 790 machines today as a test and only 2 of them joined the domain. They all renamed but the service doesn’t appear to restart after the first reboot. No logs on PC after first shutdown command. I am attaching a fog.log from one of the machines and the apache error log during the time of the task. Hopefully this will tell you something:
APACHE ERROR LOG IMMED AFTER IMAGING COMPLETED
1_1467056255448_apache error 11.2 client non-restart.txt
APACHE ERROR LOG NEXT 60 MINUTES
0_1467072182110_apache err 11.2 next 60 minutes.txt
MACHINE THAT DIDN’T JOIN DOMAIN - 192.168.132.165
0_1467056255444_fog.log
0_1467060379059_zazzles.log
0_1467060828192_.fog_user.log
MACHINE THAT DID JOIN DOMAIN - 192.168.132.154
0_1467071798888_fog.log
0_1467071815418_zazzles.log
0_1467071823241_.fog_user.log
Oddly, I don’t have this issue on all machines I’ve been working on, mainly Dell 7020 and later. Maybe it’s a problem with the older hardware keeping up??
Edits: I found the zazzles log and user log and thought they may also help, so I just uploaded them. Seems like a clue here, but fog.log hasn’t changed any since it stopped at 1:36.
-
@Joe-Schmitt The post below this one, that’s the issue I was referring to here: https://forums.fogproject.org/topic/7912/potential-client-issue
I wasn’t able to find a log though, yet. Good thing @gwhitfield posted one.
-
@Joe-Schmitt Joe, I edited my post (a couple times now) of logs to include a machine that DID join the domain, but then the service didn’t restart after the second reboot. These have both been manually rebooted a couple times since and the service refuses to start.
There IS one machine which seems to have experienced the same failures (non-joining domain and service start failure), but after a single manual reboot, the service did restart and the machine joined the domain and the service has continued to successfully restart during multiple manual reboots since. Unfortunately I will have to wait until tomorrow to get those logs since this machine is sleeping soundly and won’t wake-on-LAN.
EDIT: reading back early in the thread I find it may help to note these machines are WIN 10x64 LTSB (10240)
Thanks for your help and Wayne’s input as well!!
-
@gwhitfield Those logs indicate that the client is performing perfectly. Here is our current theory on what is happening:
You have the issue on Dell machines. Dell machines are notorious for having slight hardware configuration differences even on the same model (wifi, graphics, …). When deployed, Windows notices that driver XX doesn’t quite match the hardware. It applies the correct driver and schedules a restart. This scheduling of the restart locks Windows’ power operations.
The client then starts up, joins the domain, and asks Windows to restart. Windows refuses since there is already a lock on the power operations. The client doesn’t know this, and assumes the power operation is under way. It then gracefully exits.
This theory is consistent with the logs you provided. Here is a quick test for you:
Select one of your problem machines. Make sure all drivers are installed correctly on this one machine. Re-upload the image using this machine and re-deploy to all of your problem machines. Are there fewer occurrences of your issue? If so, then this is your problem.
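A rough way to probe the “Windows refuses the restart” part of this theory by hand is to request a restart right after deployment and look at the exit code. This is only a sketch: it assumes shutdown.exe reports a refused request through a non-zero exit code, and it says nothing about how the client itself issues its power commands.

```batch
@echo off
rem Ask Windows for a restart in 60 seconds (cancel it with "shutdown /a").
shutdown /r /t 60

rem A non-zero exit code here would suggest the request was not accepted.
if errorlevel 1 (
    echo Restart request was refused, exit code %ERRORLEVEL%
) else (
    echo Restart request was accepted
)
```

Run it from an elevated prompt on a freshly deployed problem machine, ideally before the first reboot.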