What Prevents a FOG Client from Connecting to A FOG Server?
-
Please delete the version of this post I started under General.
Hi and thanks for any help anyone can provide.
I’m in the middle of migrating a FOG 1.5.10 server off CentOS 7 and onto AlmaLinux 9.4, also running FOG 1.5.10.
I tested this process in our lab and have overcome several issues. I followed this:
https://wiki.fogproject.org/wiki/index.php?title=Migrate_FOG
The difference is that we have several locations that also have to be migrated. I uninstalled and reinstalled the Location plugin, which seemed to be required. I was able to get a two-site deployment working correctly with images and snapins, PXE boot from the right location's FOG storage node, etc.
The worst thing I had to do, aside from resetting host encryption, was delete a host and re-register it with the FOG server to get it working after imaging.
So, I followed the same process (or so I think) in a production instance and got a different result. Since the lab is typically maintained to test deployments, the one thing I lacked when testing was machines that were in FOG and working pre-migration to test post-migration. However, resetting a host’s encryption seemed to get all but one host to connect to FOG. Testing here was just seeing that the fog.log file showed proper authentication. Most testing was full deployment - image and then snapins - and these tests worked.
In the production instance, PXE booting and imaging work as expected at the 2 locations that have been migrated, but in this case no PC is able to connect to the new FOG server - existing or reimaged.
Note: we have 8 locations, but I only migrated the FOG server and one location so far. Each location has only one Storage node - the FOG server DefaultMember at the corp HQ and a storage node at the remote site. Each storage node is in its own storage group and its own location.
We build site / domain independent images. To do this we have a DNS alias “FOGSERVER” that points to the main FOG server's A record. We use this alias in \windows\setup\scripts\setupcomplete.cmd when running smartinstaller.exe v0.12 at first boot.
For existing hosts, I’ve reset encryption over and over.
Oddly, in both fresh deployments of images and existing machines, all upgrade to the new FOG Service v0.13 yet the UI shows none of the host “touch” time changes.
Also in both cases (reimages and existing hosts), fog.logs show the standard No Token file / Authentication Error / Invalid security token error over and over. In 10 mins or more, the FOG client is upgraded to v0.13, but resetting the Host’s Encryption doesn’t result in the client ever getting connected. The Host shows in the UI that the OS is Windows, but all tests show “No Data” (implying the Host isn’t actually connected - which the fog.log files all show they aren’t). I’m not certain that the FOG client updates happen only after resetting encryption, but perhaps that is so.
There is a pattern in the clients: they do get upgraded, and they all try to authenticate over and over until you Reset Host Encryption. Then, after 5 attempts to connect, the FOGService on every test PC seems to stop, even though the service still shows as “running”. Restarting the FOG Service produces the same result after 5 more authentication attempts.
Here’s a typical end of fog.log:
9/16/2024 1:59:11 PM Client-Info Version: 0.13.0
9/16/2024 1:59:11 PM Client-Info OS: Windows
9/16/2024 1:59:11 PM Middleware::Authentication Waiting for authentication timeout to pass
9/16/2024 1:59:28 PM Controller Stop
9/16/2024 1:59:28 PM Service Stop requested
9/16/2024 1:59:28 PM Bus Emmiting message on channel: Status
9/16/2024 1:59:28 PM Middleware::Authentication ERROR: Could not authenticate
9/16/2024 1:59:28 PM Middleware::Authentication ERROR: Thread was being aborted.
We also see some message about not starting Looper.
--------------------------------Authentication--------------------------------
9/16/2024 3:35:01 PM Client-Info Version: 0.13.0
9/16/2024 3:35:01 PM Client-Info OS: Windows
9/16/2024 3:35:01 PM Middleware::Authentication Waiting for authentication timeout to pass
9/16/2024 3:37:01 PM Middleware::Communication Download: http://roa1fogl02.rq.priv/fog/management/other/ssl/srvpublic.crt
9/16/2024 3:37:01 PM Middleware::Authentication Cert OK
9/16/2024 3:37:01 PM Middleware::Authentication No token found at C:\Program Files (x86)\FOG\token.dat, this is expected if the client has not authenticated before
9/16/2024 3:37:01 PM Middleware::Authentication ERROR: Could not get security token
9/16/2024 3:37:01 PM Middleware::Authentication ERROR: Could not find file ‘C:\Program Files (x86)\FOG\token.dat’.
9/16/2024 3:37:01 PM Middleware::Communication POST URL: http://roa1fogl02.rq.priv/fog/management/index.php?sub=requestClientInfo&authorize&newService
9/16/2024 3:37:01 PM Middleware::Response Invalid security token
9/16/2024 3:37:01 PM Client-Info ERROR: Failed to authenticate, will not run Module Looper.
At this point, all clients stop logging and seem to hang - the timestamp of the fog.log file and the last entry in the log match.
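To check for this pattern on several machines at once, the log can be summarized with a short script. What follows is only a minimal Python sketch, not anything shipped with FOG: the function name summarize_fog_log is made up for illustration, and it assumes every entry has the same shape as the excerpt above (date, time, AM/PM, component, message).

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above, e.g.
#   9/16/2024 3:37:01 PM Middleware::Response Invalid security token
LINE_RE = re.compile(
    r"^(?P<stamp>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<component>\S+) (?P<message>.*)$"
)

# Messages from the excerpt that indicate a failed authentication attempt.
FAILURE_MARKERS = (
    "Invalid security token",
    "Could not authenticate",
    "Failed to authenticate",
)


def summarize_fog_log(lines):
    """Count failure messages and report the timestamp of the last parsed entry."""
    counts = Counter()
    last_stamp = None
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # section banners, blank lines, anything unparseable
        last_stamp = match.group("stamp")
        for marker in FAILURE_MARKERS:
            if marker in match.group("message"):
                counts[marker] += 1
    return {"failures": dict(counts), "last_entry": last_stamp}


if __name__ == "__main__":
    sample = [
        "9/16/2024 3:37:01 PM Middleware::Authentication Cert OK",
        "9/16/2024 3:37:01 PM Middleware::Response Invalid security token",
        "9/16/2024 3:37:01 PM Client-Info ERROR: Failed to authenticate, will not run Module Looper.",
    ]
    print(summarize_fog_log(sample))
```

Comparing last_entry with the file's modification time on each PC would show whether the client really stopped logging after the final failed attempt.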
In an attempt to guess what’s going on with server-to-client encryption, since the token errors appear just after the client checks the certificate, we thought maybe the FOG Client connection used the SSL cert in some way. We noticed that the cert only had the full server FQDN and its IP address (Subject and Subject Alternative Name), not the FOGSERVER alias we were specifying on the smartinstaller command line run during the first reboot after imaging. So we altered the image to use the IP address and the latest version at deployment. We see no difference.
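The name comparison described here can be written down as a small check. This is only a Python sketch under stated assumptions: the SAN entries are typed in by hand (for example, copied from a certificate viewer), the IP address 192.0.2.10 is a documentation placeholder rather than the real server address, and the helper name names_missing_from_san is hypothetical. It does exact, case-insensitive matching only, and it says nothing about whether the FOG client actually enforces these names.

```python
def names_missing_from_san(client_names, san_entries):
    """Return the names clients use that the certificate's SAN list does not contain."""
    covered = {entry.strip().lower() for entry in san_entries}
    return [name for name in client_names if name.strip().lower() not in covered]


if __name__ == "__main__":
    # SAN entries as read off the certificate; the IP here is a placeholder.
    san = ["roa1fogl02.rq.priv", "192.0.2.10"]
    # Names the smartinstaller command line might point clients at.
    used_by_clients = ["FOGSERVER", "roa1fogl02.rq.priv"]
    print(names_missing_from_san(used_by_clients, san))  # ['FOGSERVER']
```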
I also tried installing the FOG Client manually, running smartinstaller with both the FQDN and the IP address, and again got the same result.
I’m thinking that the issue is on the server side and beyond our understanding.
Let me know if any additional info is needed and thanks for any help you can provide.
Jim Graczyk
-
@Jim-Graczyk Firstly, are you running 1.5.10 on the new server or 1.5.10.1615, the latest stable release version? Granted, it would be best to have updated the old server first and then migrated, so you’re migrating between the same versions.
If you maintained the same server name (or at least used a DNS alias to point the old server name to the new server name) and you migrated the /opt/fog/snapins/ssl directory and all other fog stuff (like the database and the /opt/fog/.fogsettings file) before running the installer for the first time (you may be able to throw them in there afterwards in theory), then you would have the same CA cert and private key and subject name for the public cert. So if all that PKI stuff remains the same then all your hosts would already trust the fog server CA (it gets added to trusted root certs when installing the fog client) and the public cert should be updated on the client or may even remain the same. Sounds like you’re using a DNS alias, so as long as that is updated in the migrated locations it can be made to work.
If you ran the install of the new fog, and then migrated stuff, then you might have some conflicting settings and you’ll have generated a new Fog CA with a new private key for the web server certificate that the fog client uses to ensure you’re communicating with the right fog server.
Granted, making a new CA and private key when migrating servers is a good idea and with a new CA it’s easiest to re-install the client to fix the issue so it gets the new CA and cert.
I believe you could do something like this in an admin powershell session to force the fog service to use a new CA.
# stop the service
stop-service fogservice;
# delete the certs from the fog service program files path
remove-item "C:\Program Files (x86)\fog\ca.cert.der","C:\Program Files (x86)\fog\fog.ca.cer","C:\Program Files (x86)\fog\tmp\public.cer";
# remove the old Fog CA cert from the trusted root store
Get-ChildItem Cert:\LocalMachine\Root\ | Where-Object Subject -match 'CN=FOG Server CA' | Remove-item -force -ea 0;
# download the new ca cert, replace "fog-server" with your fog server's name
iwr "https://fog-server/fog/management/other/ca.cert.der" -OutFile "C:\Program Files (x86)\fog\ca.cert.der"
# trust the new CA
import-certificate -FilePath "C:\Program Files (x86)\fog\ca.cert.der" -CertStoreLocation Cert:\LocalMachine\Root\
# reset the host encryption in the gui (or if you use my FogApi powershell module you can use the Reset-HostEncryption command)
# after resetting the host encryption start the service back up
Start-Service FogService;
I just ran all that on a working client and it connected. I even stopped after removing the ca cert from the store and files to confirm that broke the client. Then imported it again and all was well. It’s possible that resetting the host encryption isn’t needed, but I imagine it’s needed as you have a new private key, etc.
Granted, reinstalling the service should do all of the above, so it may be a moot point.
You could also restore the old CA cert and private key from the old server onto the server you’re migrating to, and you may be able to make it work without touching the clients besides restarting the service and resetting host encryption.
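Before choosing between those options, it can help to confirm whether the CA file a client already has is the same certificate the new server hands out. Here is a minimal Python sketch of that comparison, assuming you have copied the client's ca.cert.der and a fresh download of the server's /fog/management/other/ca.cert.der to the same machine. The helper names and the script name compare_ca.py are hypothetical, and the script only compares fingerprints; it does not validate the certificates.

```python
import hashlib
import ssl
import sys


def cert_fingerprint(cert_bytes):
    """Return the SHA-256 fingerprint of a certificate given as PEM or DER bytes."""
    stripped = cert_bytes.strip()
    if stripped.startswith(b"-----BEGIN CERTIFICATE-----"):
        # Normalise PEM to DER so both encodings hash to the same value.
        der = ssl.PEM_cert_to_DER_cert(stripped.decode("ascii"))
    else:
        # Treat anything else as DER and hash it untouched.
        der = cert_bytes
    return hashlib.sha256(der).hexdigest()


def same_certificate(path_a, path_b):
    """Compare two certificate files by fingerprint."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return cert_fingerprint(file_a.read()) == cert_fingerprint(file_b.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_ca.py client_ca.cert.der server_ca.cert.der
    print("same CA" if same_certificate(sys.argv[1], sys.argv[2]) else "different CA")
```

If the fingerprints differ, the client is still pinned to the old CA, which would be consistent with needing either the PowerShell steps above or a client reinstall.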
-
@Jim-Graczyk
Re-reading through your post I realized you mentioned a fresh install of the client isn’t working.
There may be a different issue. When I migrated from CentOS 7 recently I ended up rerunning the fog server installer with --recreate-ca --recreate-keys and reinstalled the fogservice on all my clients to have them use updated certs and an updated fog server name. I didn’t have the time to troubleshoot it and I also don’t have 8 locations so this wasn’t that big of a thing to deploy via other software deployment tools we have.
In theory you should still be able to migrate the private and public keys (note that the private keys are hidden with a prepended . in the filename) when you migrate them from the old fog server.
-
I’m running 1.5.10 on all servers (at least that’s what appears on the fog server and the fog storage nodes in the WebUI). Also, the old server was running 1.5.10 before it was shut down and when I dumped the database for migration to the new server.
I DID run into 1.5.10.xxx when downloading FOG, somehow. I used the “master” branch in git for my installs. I follow what I believe are FOG wiki pages that contain specific instructions.
If there is an issue related to upgrading from 1.5.10 to any current version of 1.5.10.xxxx, please let me know.
As I had once again f’d up and put my post under the General topic, it went unnoticed. So in the last few days, I’ve opted to move the new FOG server to the old FOG server's name and IP address, and copied the /opt/fog/snapins/ssl folders and their contents from the old server. This is all on the servers as they were described in the initial post - not a fresh install.
I also completed the steps in the Change IP Address wiki, which specifically rebuilds the certs.
I’m now testing if I can get new and existing hosts connected.
For the record:
I was also chasing what I thought was a more fully logging version of the 0.13 client - some file named debugger.exe - that appears to be some sort of command-line, real-time interface to the client that can be used to debug problems. As I have not seen any documentation regarding the innards of the FOGService and its structure, I punted on that tool.
Thanks for the help. I’ll post my results.
Jim
-
@Jim-Graczyk said in What Prevents a FOG Client from Connecting to A FOG Server?:
I DID run into 1.5.10.xxx when downloading FOG, somehow. I used the “master” branch in git for my installs. I follow what I believe are FOG wiki pages that contain specific instructions.
Recently, as in back in like June of this year, we started doing more frequent releases and created a ‘stable’ branch. There were some security issues reported in 1.5.10 but the changes made didn’t fully justify a 1.5.11 release in our versioning schema. So we created a flow for more frequent releases, it’s described here https://github.com/FOGProject/fogproject/tree/stable?tab=readme-ov-file#versioning-and-branches
We’ve built out a CI/CD automated release workflow to do monthly releases so that new minor features, bug fixes, and security fixes are released to users more frequently.
The ‘stable’ branch is now the default branch, and it is the one that should be installed from. The wiki is being slowly migrated over to a new docs site and the latest install doc is found here: https://docs.fogproject.org/en/latest/install-fog-server
Also yes, changing the IP address typically does involve a recommendation for updating the certs because the IP address is included in the public cert’s SAN. If you can re-use the same IP and have used that alias in the public cert, then you should be able to migrate without needing to re-install clients.
-
Perhaps I misunderstand how all this is supposed to work, but there seems to be a fundamental issue. The migration instructions seem to indicate that one could create a new FOG server and bring it into operation side by side with the old server. Any path to doing this includes new certs on a new server that will have a new IP address and a new server name. One can migrate the database to retain Hosts, Snapins, Images, etc., but nothing is said about getting existing FOG clients (hosts) to recognize the new FOG server as their FOG server. Various articles address this issue superficially by telling us to Reset Encryption for the Host using the FOG WebUI. I don’t think the Reset Encryption button is applicable when the FOG server has all new certs, CA, name and IP address.
In one of the posts for this topic, @JJ-Fullmer describes PowerShell commands that would make a FOG client able to accept a new FOG server, but one is then faced with how to run those PS commands when each host can no longer connect to the new server to run snapins.
Part of this is just defining expectations and telling users which migration process is supported.
As of now, I’ve had a week of downtime on all that FOG does because I attempted an approach that at first appeared to provide the migration path with the least downtime - install a new FOG server and move to it, so the old FOG server can remain operational until we know the new one works. In this approach, DHCP would be the only thing that would need to be changed to switch to the new server, first for testing and later, permanently.
It seems this approach is a dead end, as there is no way to get the FOG Clients to connect to the new server automatically. Changing the certs to match the old server exactly can only be done using the same name and IP address. So perhaps this is the only approach that can work with existing Hosts in the system.
So, I’ve shut down the old FOG server after pulling all existing certs from it. I got the new server renamed to the old server’s name and IP’d to match the old server (following the Change IP process, which according to the wiki I found includes reissuing certs, which makes clients unable to connect until you move the old certs back manually, which I did). It seems now that existing Hosts are able to connect to the server. And it may be that all but the migrated FOG Storage Nodes are connected and visible in the FOG WebUI.
I feel that this approach is a workaround. The certs from the original CentOS 7 install will one day expire. A migration process that includes locations, existing clients and new certificates should be defined.
Given that many Linuxes come and go, well-documented upgrade paths for key software systems like FOG have become more important lately.
I’m still testing because there are still locations to migrate from CentOS 7 to AlmaLinux. I will post when I’m back up and verified.
-
@Jim-Graczyk I agree with you and I actually ran into similar issues in my migration and I’m still working out what is the best approach for migrating without losing client connections.
I took notes on my internal migration that I hope to utilize in creating a new doc for migrating a server without losing the client connection, but I also hope to make the PKI/Certs of fog more flexible in general, it’s just a matter of finding the time to work on it.
If you want to help in creating such a doc in the new docs site (see also https://github.com/FOGProject/fog-docs/tree/master) that would be very welcome.
I’d love to build a script for migrating, or include it as an option in the installer, but scaling that to support all supported linux distros and standard supported configurations would take some serious work that I don’t realistically have time to do.
By the by, if the Powershell commands worked, that is something that can be scaled in most windows environments and could be deployed as a group policy script or you can use Powershell methods such as Invoke-Command to run it remotely on all machines.
-
I believe I have a viable migration process to get FOG services moved from one set of servers to another. This process takes into account Locations, Storage Groups, existing Hosts, etc. The system I’m migrating has 7 remote sites, so that’s 8 locations and 8 servers - one FOG server and 7 FOG Storage Nodes, each at the end of a static VPN WAN. The initial FOG server is an alias ‘fogserver’ in DNS, and DHCP is served out by MS Domain Controllers at each site. Each Storage Node serves out all that is needed for its remote location (PXE, initrd, snapins, and images). Each Storage Node is the master of its own storage group. We do replication manually, so there may be some snapin and image replication issues in this process that I can’t account for.
The process requires being able to harvest the digital identity of the initial FOG server for use on the replacement FOG server, so the replacement server needs the same name and IP address. This is necessary to allow existing hosts to be easily ‘acquired’ by the new system. I pursued a side-by-side migration that worked fine for PXE booting -> host registration -> imaging, but failed for existing Hosts and, for unknown reasons, even new Hosts failed all software distribution steps. The idea of placing FOG-required scripts in group policies wasn’t appealing for my purposes, since we need the FOGService to work without the machine joining a domain.
I’m currently testing and will have a process “doc” soon. I’m interested in posting this process for FOG Project’s use if the powers that be find it useful.
I’ll post it here when it’s completed.
It seems to me that the need to change Linux distros is an age-old requirement, as various Linux OSs have been dropped or usurped. Migration of FOG from CentOS to any other Linux distribution should be well documented.
More soon.
Jim Graczyk