Rolling Reboot -- FOG Client AD issue
-
Wed Jul 27, 2016 12:55 pm
Running Version 1.3.0-RC-4
SVN Revision: 5941
I am having a very strange issue with my hosts joining the domain. I am now using a sysprepped image and am updating AD. I have created a SetupComplete script that runs the FOG service re-activation prior to installing AV and prior to driver injection, as my current image is a Windows 7 Universal image.
Currently the issue I’ve run into is that some machines get stuck in a reboot cycle and that particular machine’s active task never finishes in FOG. The machine physically finishes but the task never ends. It appears that the FOG client is the culprit for restarting the machine.
The host name on the machine changes (sometimes) but it never adds itself to AD. When you try to add it manually, it gives you the “Welcome to the Domain” message but it never finalizes the add process. In other words, it reboots, then it goes back into reboot hell.
This happens on about 1 out of 5 machines in a multi task group.
A side issue that is somewhat related is that most hosts add successfully, but some will rename correctly but fail to join AD. About 1 in 5 do this as well.
TIA!
Cheers!
Joe Gill
-
The OP had AntiVirus installing inside SetupComplete.cmd before the FOG Client was re-enabled and started. That AntiVirus installation triggered a reboot, which left SetupComplete.cmd unfinished and caused all of the other problems described here.
Consider building a snapin for the AntiVirus, which will auto-queue for any computer that is told to image and has the snapin associated with it, and the snapin will deploy after imaging is completed.
-
I would need to see a fog.log from a problem host to be able to help.
-
Here is a snippet of the top of the file. The file is also attached below.
Also, I killed the process on the server and it joined the domain automatically just fine.
7/27/2016 11:45 AM Main Overriding exception handling
7/27/2016 11:45 AM Main Bootstrapping Zazzles
7/27/2016 11:45 AM Controller Initialize
7/27/2016 11:45 AM Zazzles Creating main thread
7/27/2016 11:45 AM Zazzles Service construction complete
7/27/2016 11:45 AM Controller Start
7/27/2016 11:45 AM Service Starting service
7/27/2016 11:45 AM Bus Became bus server
7/27/2016 11:45 AM Bus { "self": true, "channel": "Status", "data": "{\r\n \"action\": \"load\"\r\n}" }
7/27/2016 11:45 AM Bus Emmiting message on channel: Status
7/27/2016 11:45 AM Service Invoking early JIT compilation on needed binaries
------------------------------------------------------------------------------
--------------------------------Authentication--------------------------------
------------------------------------------------------------------------------
7/27/2016 11:45 AM Client-Info Version: 0.11.4
7/27/2016 11:45 AM Client-Info OS: Windows
7/27/2016 11:45 AM Middleware::Authentication Waiting for authentication timeout to pass
7/27/2016 11:45 AM Middleware::Communication Download: http://172.16.1.17/fog/management/other/ssl/srvpublic.crt
7/27/2016 11:46 AM Data::RSA FOG Server CA cert found
7/27/2016 11:46 AM Middleware::Authentication Cert OK
7/27/2016 11:46 AM Middleware::Authentication ERROR: Could not get security token
7/27/2016 11:46 AM Middleware::Authentication ERROR: Could not find file 'C:\Program Files (x86)\FOG\token.dat'.
7/27/2016 11:46 AM Middleware::Communication POST URL: http://172.16.1.17/fog/management/index.php?sub=requestClientInfo&authorize&newService
7/27/2016 11:46 AM Middleware::Response Success
7/27/2016 11:46 AM Middleware::Authentication Authenticated
7/27/2016 11:46 AM Bus Registering ParseBus in channel Power
7/27/2016 11:46 AM Middleware::Communication URL: http://172.16.1.17/fog/management/index.php?sub=requestClientInfo&mac=B8:AC:6F:36:AE:41||00:00:00:00:00:00:00:E0&newService&json
7/27/2016 11:46 AM Middleware::Response Success
7/27/2016 11:46 AM Middleware::Communication URL: http://172.16.1.17/fog/service/getversion.php?clientver&newService&json
7/27/2016 11:46 AM Middleware::Communication URL: http://172.16.1.17/fog/service/getversion.php?newService&json
7/27/2016 11:46 AM Service Creating user agent cache
7/27/2016 11:46 AM Middleware::Response Success
7/27/2016 11:46 AM Middleware::Response Success
7/27/2016 11:46 AM Middleware::Response Module is disabled globally on the FOG server
7/27/2016 11:46 AM Service Initializing modules
------------------------------------------------------------------------------
---------------------------------ClientUpdater--------------------------------
------------------------------------------------------------------------------
7/27/2016 11:46 AM Client-Info Client Version: 0.11.4
7/27/2016 11:46 AM Client-Info Client OS: Windows
7/27/2016 11:46 AM Client-Info Server Version: 1.3.0-RC-4
7/27/2016 11:46 AM Middleware::Response Success
------------------------------------------------------------------------------
------------------------------------------------------------------------------
----------------------------------TaskReboot----------------------------------
------------------------------------------------------------------------------
7/27/2016 11:46 AM Client-Info Client Version: 0.11.4
7/27/2016 11:46 AM Client-Info Client OS: Windows
7/27/2016 11:46 AM Client-Info Server Version: 1.3.0-RC-4
7/27/2016 11:46 AM Middleware::Response Success
7/27/2016 11:46 AM TaskReboot Restarting computer for task
7/27/2016 11:46 AM Power Creating shutdown request
7/27/2016 11:46 AM Power Parameters: /r /c "TaskReboot" /t 0
7/27/2016 11:46 AM Bus { "self": true, "channel": "Power", "data": "{\r\n \"action\": \"shuttingdown\"\r\n}" }
7/27/2016 11:46 AM Bus Emmiting message on channel: Power
------------------------------------------------------------------------------
7/27/2016 11:47 AM Main Overriding exception handling
7/27/2016 11:47 AM Main Bootstrapping Zazzles
7/27/2016 11:47 AM Controller Initialize
7/27/2016 11:47 AM Zazzles Creating main thread
7/27/2016 11:47 AM Zazzles Service construction complete
7/27/2016 11:47 AM Controller Start
7/27/2016 11:47 AM Service Starting service
7/27/2016 11:47 AM Bus Became bus server
7/27/2016 11:47 AM Bus { "self": true, "channel": "Status", "data": "{\r\n \"action\": \"load\"\r\n}" }
7/27/2016 11:47 AM Bus Emmiting message on channel: Status
7/27/2016 11:47 AM Service Invoking early JIT compilation on needed binaries
------------------------------------------------------------------------------
--------------------------------Authentication--------------------------------
------------------------------------------------------------------------------
7/27/2016 11:47 AM Client-Info Version: 0.11.4
7/27/2016 11:47 AM Client-Info OS: Windows
7/27/2016 11:47 AM Middleware::Authentication Waiting for authentication timeout to pass
7/27/2016 11:47 AM Middleware::Communication Download: http://172.16.1.17/fog/management/other/ssl/srvpublic.crt
7/27/2016 11:47 AM Data::RSA FOG Server CA cert found
7/27/2016 11:47 AM Middleware::Authentication Cert OK
7/27/2016 11:47 AM Middleware::Communication POST URL: http://172.16.1.17/fog/management/index.php?sub=requestClientInfo&authorize&newService
7/27/2016 11:47 AM Middleware::Response Success
7/27/2016 11:47 AM Middleware::Authentication Authenticated
7/27/2016 11:47 AM Bus Registering ParseBus in channel Power
7/27/2016 11:47 AM Middleware::Communication URL: http://172.16.1.17/fog/management/index.php?sub=requestClientInfo&mac=B8:AC:6F:36:AE:41||00:00:00:00:00:00:00:E0&newService&json
7/27/2016 11:47 AM Middleware::Response Success
7/27/2016 11:47 AM Middleware::Communication URL: http://172.16.1.17/fog/service/getversion.php?clientver&newService&json
7/27/2016 11:47 AM Middleware::Communication URL: http://172.16.1.17/fog/service/getversion.php?newService&json
7/27/2016 11:47 AM Service Creating user agent cache
7/27/2016 11:47 AM Middleware::Response Success
7/27/2016 11:47 AM Middleware::Response Success
7/27/2016 11:47 AM Middleware::Response Module is disabled globally on the FOG server
7/27/2016 11:47 AM Service Initializing modules
------------------------------------------------------------------------------
---------------------------------ClientUpdater--------------------------------
------------------------------------------------------------------------------
7/27/2016 11:47 AM Client-Info Client Version: 0.11.4
7/27/2016 11:47 AM Client-Info Client OS: Windows
7/27/2016 11:47 AM Client-Info Server Version: 1.3.0-RC-4
7/27/2016 11:47 AM Middleware::Response Success
------------------------------------------------------------------------------
------------------------------------------------------------------------------
----------------------------------TaskReboot----------------------------------
------------------------------------------------------------------------------
7/27/2016 11:47 AM Client-Info Client Version: 0.11.4
7/27/2016 11:47 AM Client-Info Client OS: Windows
7/27/2016 11:47 AM Client-Info Server Version: 1.3.0-RC-4
7/27/2016 11:47 AM Middleware::Response Success
7/27/2016 11:47 AM TaskReboot Restarting computer for task
7/27/2016 11:47 AM Power Creating shutdown request
7/27/2016 11:47 AM Power Parameters: /r /c "TaskReboot" /t 0
7/27/2016 11:47 AM Bus { "self": true, "channel": "Power", "data": "{\r\n \"action\": \"shuttingdown\"\r\n}" }
7/27/2016 11:47 AM Bus Emmiting message on channel: Power
------------------------------------------------------------------------------
7/27/2016 11:48 AM Main Overriding exception handling
7/27/2016 11:48 AM Main Bootstrapping Zazzles
7/27/2016 11:48 AM Controller Initialize
7/27/2016 11:48 AM Zazzles Creating main thread
7/27/2016 11:48 AM Zazzles Service construction complete
7/27/2016 11:48 AM Controller Start
7/27/2016 11:48 AM Service Starting service
7/27/2016 11:48 AM Bus Became bus server
7/27/2016 11:48 AM Bus { "self": true, "channel": "Status", "data": "{\r\n \"action\": \"load\"\r\n}" }
7/27/2016 11:48 AM Bus Emmiting message on channel: Status
7/27/2016 11:48 AM Service Invoking early JIT compilation on needed binaries
------------------------------------------------------------------------------
--------------------------------Authentication--------------------------------
------------------------------------------------------------------------------
7/27/2016 11:48 AM Client-Info Version: 0.11.4
7/27/2016 11:48 AM Client-Info OS: Windows
7/27/2016 11:48 AM Middleware::Authentication Waiting for authentication timeout to pass
7/27/2016 11:48 AM Middleware::Communication Download: http://172.16.1.17/fog/management/other/ssl/srvpublic.crt
7/27/2016 11:48 AM Data::RSA FOG Server CA cert found
7/27/2016 11:48 AM Middleware::Authentication Cert OK
7/27/2016 11:48 AM Middleware::Communication POST URL: http://172.16.1.17/fog/management/index.php?sub=requestClientInfo&authorize&newService
7/27/2016 11:48 AM Middleware::Response Success
7/27/2016 11:48 AM Middleware::Authentication Authenticated
7/27/2016 11:48 AM Bus Registering ParseBus in channel Power
7/27/2016 11:48 AM Middleware::Communication URL: http://172.16.1.17/fog/management/index.php?sub=requestClientInfo&mac=B8:AC:6F:36:AE:41||00:00:00:00:00:00:00:E0&newService&json
7/27/2016 11:48 AM Middleware::Response Success
7/27/2016 11:48 AM Middleware::Communication URL: http://172.16.1.17/fog/service/getversion.php?clientver&newService&json
7/27/2016 11:48 AM Middleware::Communication URL: http://172.16.1.17/fog/service/getversion.php?newService&json
7/27/2016 11:48 AM Service Creating user agent cache
7/27/2016 11:48 AM Middleware::Response Success
7/27/2016 11:48 AM Middleware::Response Success
7/27/2016 11:48 AM Middleware::Response Module is disabled globally on the FOG server
7/27/2016 11:48 AM Service Initializing modules
------------------------------------------------------------------------------
-
This isn’t an issue with AD joining. Rather, the server isn’t clearing the task properly, so the client thinks it needs to restart to let the task run.
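For anyone wanting to check a host for this pattern, here is a minimal Python sketch that counts how often the TaskReboot module restarted the machine according to a fog.log. It assumes the entry format shown in the log posted in this thread (date, time, module name, message); the function names and the threshold of two restarts are hypothetical choices for illustration, not part of FOG.

```python
import re

# Assumed entry shape, taken from the log excerpt in this thread:
#   7/27/2016 11:46 AM TaskReboot Restarting computer for task
# Searching the whole text (not line by line) also works on logs
# whose line breaks were lost when pasted.
REBOOT = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{4}) (\d{1,2}:\d{2} [AP]M) "
    r"TaskReboot Restarting computer for task"
)


def task_reboots(log_text):
    """Return a list of (date, time) tuples, one per TaskReboot restart."""
    return REBOOT.findall(log_text)


def looks_like_reboot_loop(log_text, threshold=2):
    """Hypothetical heuristic: several task-driven restarts suggest the
    server still lists an active task for this host."""
    return len(task_reboots(log_text)) >= threshold


if __name__ == "__main__":
    sample = (
        "7/27/2016 11:46 AM TaskReboot Restarting computer for task\n"
        "7/27/2016 11:46 AM Power Creating shutdown request\n"
        "7/27/2016 11:47 AM TaskReboot Restarting computer for task\n"
    )
    for date, time in task_reboots(sample):
        print(date, time)
    print("possible reboot loop:", looks_like_reboot_loop(sample))
```

If a host shows repeated restarts minutes apart, checking the active tasks list on the server for that host is the next step.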
-
Any ideas?
-
I don’t understand. If you go to the Task Management page (not multicast tasks), do the tasks still exist?
-
Yes, the hosts that go into reboot hell still exist on the Task Management page. All of the other hosts in that session seem to finish just fine.
-
@Joe-Gill So it’s not ALL hosts in the multicast task?
-
@Tom-Elliott Exactly. Only 1 in 5 it seems. But it’s identical hardware and images all around.
-
@Joe-Gill What I mean is: do 1 in 5 hosts fail to complete their taskings, or do 1 in 5 get stuck in a reboot loop?
If the tasking is still present under the “active tasks” page (not active multicast) the reboot would cause those systems to boot into the Imaging side of things.
-
@Tom-Elliott 1 in 5 get stuck in a reboot loop AND they are also the ones that fail to complete their taskings. I kill the taskings, the reboot loop ceases, and operations complete as normal.
-
@Joe-Gill apache error logs from the time at the end of imaging to when the reboots start happening would be very informative.
-
I’ve confirmed the non-closing multicast task. I still don’t understand what you mean by “I kill the taskings”. Which tasking are you referring to? The host’s individual tasking, or the multicast tasking?
The non-closing multicast will be available for RC-5.
-
@Tom-Elliott and by that I mean will be fixed.
-
@Tom-Elliott
Ok. So the fix I have is to cancel the active task for the problem host in task manager. Then I go into the MySQL database on the server and clear out the task there. I then stop and restart the multicast tasking service. Then things seem to be fine for the next session.
@Wayne-Workman Here is the Apache2 error.log file for today. I did notice there was a fatal error the day before. It appears that this same entry happened many times that day.
[Tue Jul 26 20:49:35.416710 2016] [:error] [pid 4097] [client 172.16.19.149:65194] PHP Fatal error: Uncaught exception 'Exception' with message 'Window size must be g$
[Tue Jul 26 20:49:36.423986 2016] [:error] [pid 4100] [client 172.16.19.149:65197] PHP Fatal error: Uncaught exception 'Exception' with message 'Window size must be g$
[Wed Jul 27 07:26:32.280481 2016] [:error] [pid 4201] [client 172.16.16.150:59389] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 07:26:48.354429 2016] [:error] [pid 4201] [client 172.16.16.150:59404] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 07:27:01.053034 2016] [:error] [pid 4201] [client 172.16.16.150:59417] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 07:36:57.647527 2016] [:error] [pid 7834] [client 172.16.16.150:60058] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 07:37:15.662213 2016] [:error] [pid 4206] [client 172.16.16.150:60082] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 07:37:28.563481 2016] [:error] [pid 4097] [client 172.16.16.150:60101] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 08:30:45.361208 2016] [:error] [pid 9327] [client 172.16.16.150:62604] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 09:10:51.246977 2016] [:error] [pid 4201] [client 172.16.16.150:64132] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 09:11:10.725368 2016] [:error] [pid 6575] [client 172.16.16.150:64148] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 09:11:19.335300 2016] [:error] [pid 7905] [client 172.16.16.150:64159] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 09:12:35.911531 2016] [:error] [pid 4101] [client 172.16.16.150:64236] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 11:16:54.796082 2016] [:error] [pid 4206] [client 172.16.16.150:52767] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 11:17:07.981127 2016] [:error] [pid 4201] [client 172.16.16.150:52780] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 12:58:17.660609 2016] [:error] [pid 11833] [client 172.16.18.89:49845] PHP Strict Standards: Only variables should be passed by reference in /var/www/html$
[Wed Jul 27 15:01:59.879251 2016] [:error] [pid 13392] [client 172.16.16.150:62040] PHP Strict Standards: Only variables should be passed by reference in /var/www/htm$
[Wed Jul 27 15:02:17.611327 2016] [:error] [pid 13023] [client 172.16.16.150:62056] PHP Strict Standards: Only variables should be passed by reference in /var/www/htm$
[Wed Jul 27 15:53:39.483338 2016] [:error] [pid 4201] [client 172.16.16.150:63406] PHP Warning: pack(): Type H: illegal hex digit K in /var/www/html/fog/lib/fog/fogba$
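A log like this is easier to read once the noise is grouped. The Python sketch below tallies entries by PHP severity and client IP so the fatal errors stand out from the Strict Standards notices. It assumes the entry layout seen in the excerpt above (timestamp, `[:error]`, pid, client, then `PHP <severity>:`); the `summarize` function name is hypothetical, and the regular expression may need adjusting for other Apache log formats.

```python
import re
from collections import Counter

# Assumed layout, based on the error.log excerpt in this thread:
#   [Wed Jul 27 07:26:32.280481 2016] [:error] [pid 4201]
#   [client 172.16.16.150:59389] PHP Strict Standards: ...
ENTRY = re.compile(
    r"\[(?P<when>[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} [\d:.]+ \d{4})\] "
    r"\[:error\] \[pid \d+\] "
    r"\[client (?P<client>[\d.]+):\d+\] "
    r"PHP (?P<kind>[A-Za-z ]+?):"
)


def summarize(log_text):
    """Count entries per (PHP severity, client IP) pair."""
    counts = Counter()
    for match in ENTRY.finditer(log_text):
        counts[(match.group("kind"), match.group("client"))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[Tue Jul 26 20:49:35.416710 2016] [:error] [pid 4097] "
        "[client 172.16.19.149:65194] PHP Fatal error: Uncaught exception\n"
        "[Wed Jul 27 07:26:32.280481 2016] [:error] [pid 4201] "
        "[client 172.16.16.150:59389] PHP Strict Standards: Only variables\n"
    )
    for (kind, client), n in summarize(sample).most_common():
        print(f"{n:4d}  {kind:<18} {client}")
```

Matching with `finditer` over the whole text means the sketch also copes with entries that were pasted without line breaks.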
-
Well, I’m back at it again. I noticed that this issue also occurs on unicasts. I deployed several machines yesterday; pushing each machine as I added them to FOG. I had about half of those machines where the task never completed in the “Active Task” screen and went into reboot hell.
I had a snapin addon on these guys. I’m going to try re-pushing them without said plugin and see what happens.
Cheers,
Joe
-
@Joe-Gill please try updating too.
-
Will do and I’ll update. It happens even without snapins running. But I’ll update and see if it continues.
Thanks!
-
Well, after an update, I still had 1 out of 4 clients do the rolling reboot. It’s interesting because the one that was stuck was a different model. I’m wondering if there isn’t a driver issue with the image, as it’s a universal image. The only funny thing is that, again, after you delete the task from the active task list, things proceed as they should. AD add works, drivers seem fine, everything is happy. So I’m still a little stumped. LOL!
If you’d like any logs just say so and I’ll post whatever you need.
I’ll be away from the office tomorrow but I’ll probably still check in some time. I can always remote in and send you something.
Thanks!
Cheers,
Joe
-
Well I am currently running version RC-8. I am still having the rolling reboot issue with one of my images. I am deploying a new image this morning on a few remaining machines. I’m wondering if it’s not my image instead of FOG. In the image I’m deploying this morning I have changed the order in which the SetupComplete finishes. I was having SetupComplete install AV first and then do some other tasks… What I failed to realize is that this causes the machine to reboot before the script finishes and it does not let FOG finish. I had a similar issue with snapins not finishing because of a reboot. If the new image seems to resolve this, I’ll be sure to let you all know.
Thanks for all of the support!
Cheers,
Joe Gill