FOG Clients are Unable to Connect to Server - sort of
-
Hi and thanks for any help anyone can provide.
I’m in the middle of a FOG server 1.5.10 migration to get off of CentOS 7 and onto AlmaLinux 9.4 running FOG 1.5.10.
I tested this process in our lab and have overcome several issues. I followed this:
https://wiki.fogproject.org/wiki/index.php?title=Migrate_FOG
The difference is that we have several locations that also have to be migrated. I uninstalled and reinstalled the Location plugin, which seemed to be required. I was able to get a 2-site deployment working correctly with images and snapins, PXE boot from the right location's FOG storage node, etc.
The worst thing I had to do was delete a host and re-register it with the FOG server to get it to work post-imaging, aside from resetting host encryption.
So, I followed the same process (or so I think) in a production instance and got a different result. Since the lab is typically maintained to test deployments, the one thing I lacked when testing was machines that were in FOG and working pre-migration to test post-migration. However, resetting a host’s encryption seemed to get all but one host to connect to FOG. Testing here was just seeing that the fog.log file showed proper authentication. Most testing was full deployment (image and then snapins), and these tests worked.
In the production instance, PXE booting and imaging work as expected at the 2 locations that have been migrated, but in this case no PC is able to connect to the new FOG server, whether existing or reimaged.
Note: we have 8 locations, but I only migrated the FOG server and one location so far. Each location has only one Storage node - the FOG server DefaultMember at the corp HQ and a storage node at the remote site. Each storage node is in its own storage group and its own location.
We build site / domain independent images. To do this we have a DNS alias “FOGSERVER” that points to the main FOG server A record. We use this alias in a \windows\setup\scripts\setupcomplete.cmd when running the smartinstaller.exe v0.12 at first boot.
For existing hosts, I have reset encryption over and over.
Oddly, in both fresh image deployments and existing machines, all clients upgrade to the new FOG Service v0.13, yet the UI shows no change in any host’s “touch” time.
Also, in both cases (reimages and existing hosts), the fog.log files show the standard No Token file / Authentication Error / Invalid security token errors over and over. After 10 minutes or more, the FOG client is upgraded to v0.13, but resetting the host’s encryption doesn’t result in the client ever getting connected. The host shows in the UI that the OS is Windows, but all tests show “No Data” (implying the host isn’t actually connected, which the fog.log files all confirm). I’m not certain that the FOG client updates happen only after resetting encryption, but perhaps that is so.
There is a pattern in the clients: they do get upgraded, and they all try to authenticate over and over until you reset host encryption. Then, after 5 attempts to connect, the FOGService on every test PC seems to stop, but the service is still “running”. Restarting the FOG Service produces the same result after 5 more authentication attempts.
Here’s a typical end of fog.log:
9/16/2024 1:59:11 PM Client-Info Version: 0.13.0
9/16/2024 1:59:11 PM Client-Info OS: Windows
9/16/2024 1:59:11 PM Middleware::Authentication Waiting for authentication timeout to pass
9/16/2024 1:59:28 PM Controller Stop
9/16/2024 1:59:28 PM Service Stop requested
9/16/2024 1:59:28 PM Bus Emmiting message on channel: Status
9/16/2024 1:59:28 PM Middleware::Authentication ERROR: Could not authenticate
9/16/2024 1:59:28 PM Middleware::Authentication ERROR: Thread was being aborted.
We also see some message about not starting Looper.
--------------------------------Authentication--------------------------------
9/16/2024 3:35:01 PM Client-Info Version: 0.13.0
9/16/2024 3:35:01 PM Client-Info OS: Windows
9/16/2024 3:35:01 PM Middleware::Authentication Waiting for authentication timeout to pass
9/16/2024 3:37:01 PM Middleware::Communication Download: http://roa1fogl02.rq.priv/fog/management/other/ssl/srvpublic.crt
9/16/2024 3:37:01 PM Middleware::Authentication Cert OK
9/16/2024 3:37:01 PM Middleware::Authentication No token found at C:\Program Files (x86)\FOG\token.dat, this is expected if the client has not authenticated before
9/16/2024 3:37:01 PM Middleware::Authentication ERROR: Could not get security token
9/16/2024 3:37:01 PM Middleware::Authentication ERROR: Could not find file ‘C:\Program Files (x86)\FOG\token.dat’.
9/16/2024 3:37:01 PM Middleware::Communication POST URL: http://roa1fogl02.rq.priv/fog/management/index.php?sub=requestClientInfo&authorize&newService
9/16/2024 3:37:01 PM Middleware::Response Invalid security token
9/16/2024 3:37:01 PM Client-Info ERROR: Failed to authenticate, will not run Module Looper.
At this point, all clients stop logging and seem to hang: the timestamp of the fog.log file and the last entry in the log match.
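For anyone comparing logs across several clients, the pattern above can be tallied with a short script. This is only a minimal sketch, assuming every fog.log entry has the same "timestamp, component, message" layout as the excerpt above; the function name summarize_fog_log and the SAMPLE text are made up for illustration and are not part of FOG.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt above:
# 9/16/2024 3:37:01 PM Middleware::Authentication ERROR: Could not get security token
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<component>\S+) (?P<message>.*)$"
)


def summarize_fog_log(text):
    """Return (entries per component, list of ERROR entries, invalid-token count)."""
    components = Counter()
    errors = []
    invalid_tokens = 0
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip section banners and blank lines
        components[match["component"]] += 1
        message = match["message"]
        if message.startswith("ERROR:"):
            errors.append((match["ts"], match["component"], message))
        if "Invalid security token" in message:
            invalid_tokens += 1
    return components, errors, invalid_tokens


# Hypothetical sample built from entries quoted in this post.
SAMPLE = """\
--------------------------------Authentication--------------------------------
9/16/2024 3:37:01 PM Middleware::Authentication Cert OK
9/16/2024 3:37:01 PM Middleware::Authentication ERROR: Could not get security token
9/16/2024 3:37:01 PM Middleware::Response Invalid security token
9/16/2024 3:37:01 PM Client-Info ERROR: Failed to authenticate, will not run Module Looper.
"""

if __name__ == "__main__":
    components, errors, invalid_tokens = summarize_fog_log(SAMPLE)
    print(dict(components))
    for ts, component, message in errors:
        print(ts, component, message)
    print("invalid token responses:", invalid_tokens)
```

Running it against the real fog.log from each test PC (read with open(path).read()) would show whether every client fails at the same step.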
In an attempt to guess what’s going on with server-to-client encryption, since the token errors appear just after the client checks the certificate, we thought maybe the FOG Client connection used the SSL cert in some way. We noticed that the cert only had the full server FQDN and its IP (in Subject and Subject Alternative Name), not the FOGSERVER alias we were specifying on the smartinstaller command line run during the first reboot after imaging, so we altered the image to use the IP address and the latest version at deployment. We saw no difference.
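The name comparison we did by eye can be written down as a small check. This is a sketch only: whether the FOG client actually compares the install-time server name against the certificate is exactly what we don't know. The helper name_in_cert is hypothetical, the list of certificate names is typed in by hand, and the IP address is a documentation placeholder, not our real server IP.

```python
import ipaddress


def name_in_cert(name, cert_names):
    """Return True if `name` equals one of the names listed in the certificate.

    DNS names are compared case-insensitively (ignoring a trailing dot);
    IP addresses are compared by value.
    """
    def normalize(value):
        try:
            return ipaddress.ip_address(value)
        except ValueError:
            return value.rstrip(".").lower()

    wanted = normalize(name)
    return any(normalize(entry) == wanted for entry in cert_names)


# Names as read from the certificate's Subject / Subject Alternative Name.
# The IP below is a placeholder (192.0.2.0/24 is reserved for documentation).
CERT_NAMES = ["roa1fogl02.rq.priv", "192.0.2.10"]

if __name__ == "__main__":
    for candidate in ("FOGSERVER", "roa1fogl02.rq.priv", "192.0.2.10"):
        print(candidate, name_in_cert(candidate, CERT_NAMES))
```

With these inputs the alias is the only name not covered, which matches what we saw in the cert, although switching to the IP did not change the client behavior.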
I also tried manually installing the FOG Client with smartinstaller, using both the FQDN and the IP address, and again got the same result.
I’m thinking that the issue is on the server side and beyond our understanding.
Let me know if any additional info is needed and thanks for any help you can provide.
Jim Graczyk