Web interface slowdown and FOG Client authentication issues
Recently migrated to a CentOS 7 server from an older Ubuntu server with an older version of FOG (1.4.4 revision 6077). The old server had inexplicable slowdowns in the web interface, so I thought it better to start from scratch. Exported the old hosts and images as CSV files and imported them on the new server.
The new server ran seemingly without a hitch for a month or so. We only use imaging and hostnamechanger and domain join functionality, which worked fine, despite some errors always being present in c:\fog.log, probably related to authentication between client and server.
Recently tried using snapins for the first time and found out I had to copy /opt/fog/snapins/ssl from the old to the new server as part of the migration process.
I’m not sure I did that step right, because right now I get errors in fog.log and a massive slowdown (sometimes error 503). There are brief moments where the slowdown is gone. http://fogserver/fog takes ages to load, but the Apache default page at http://fogserver loads instantly.
I think these two issues arrived at the same time and may be related somehow.
I have also tried reinstalling the client on one machine and the “reset encryption data” button on the web interface. It didn’t seem to help.
Here’s the fog.log from the client on that machine. It hangs for a very long time on “Middleware::Authentication Waiting for authentication timeout to pass”, every single time.
Thanks for the help.
@j_d I'd also ask you to verify that the server certificate matches the CA:
openssl verify -CAfile /var/www/html/fog/management/other/ca.cert.pem /var/www/html/fog/management/other/ssl/srvpublic.crt
It should tell you:
/var/www/html/fog/management/other/ssl/srvpublic.crt: OK
From the client log you posted, it clearly states that this is a certificate/CA issue:
Middleware::Authentication ERROR: Certificate is not from FOG CA
@j_d Here is some more information on how to check the certificate hashes/fingerprints. On your FOG server run:
openssl x509 -noout -fingerprint -sha1 -inform pem -in /opt/fog/snapins/ssl/CA/.fogCA.pem
openssl x509 -noout -fingerprint -sha1 -inform pem -in /var/www/html/fog/management/other/ca.cert.pem
Those two must definitely match. Note the fingerprint down (the first and last three blocks should be enough) and go to one of your clients that shows the issue. There, press Win+R and run mmc. From the menu click File -> Add/Remove Snap-in -> Certificates -> Computer account -> Local computer. Once you have the certificate view, navigate to Certificates (Local Computer) -> Trusted Root Certification Authorities -> Certificates and double-click the one called “FOG Server CA” -> Details tab -> Thumbprint field. See if that matches the output of the openssl commands above.
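Comparing these by eye is error-prone: openssl prints the SHA-1 fingerprint as uppercase hex separated by colons, while the Windows Thumbprint field shows the same hex digits without colons (possibly with spaces, and in a different letter case, depending on the Windows version). A minimal bash sketch, assuming bash 4+ and openssl on the FOG server; the function names normalize_fp, ca_fingerprint and same_fp are made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing FOG CA fingerprints (names are made up).

# normalize_fp TEXT
# Accepts openssl output such as "SHA1 Fingerprint=AB:CD:..." or a thumbprint
# copied from the Windows MMC ("ab cd ..." or "abcd...") and prints plain
# lowercase hex, so both can be compared as simple strings.
normalize_fp() {
    local fp="${1#*=}"           # drop everything up to the first "=", if present
    fp="${fp//[![:xdigit:]]/}"   # keep hex digits only (colons, spaces, stray chars go)
    printf '%s\n' "${fp,,}"      # lowercase (bash 4+)
}

# ca_fingerprint FILE: normalized SHA-1 fingerprint of a PEM certificate.
ca_fingerprint() {
    normalize_fp "$(openssl x509 -noout -fingerprint -sha1 -inform pem -in "$1")"
}

# same_fp A B: succeeds if both fingerprints are identical after normalizing.
same_fp() {
    [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# Example on the FOG server:
#   same_fp "$(ca_fingerprint /opt/fog/snapins/ssl/CA/.fogCA.pem)" \
#           "$(ca_fingerprint /var/www/html/fog/management/other/ca.cert.pem)" \
#     && echo "CA files match" || echo "CA files differ"
# To check a client, pass the thumbprint copied from the MMC as one argument.
```

Normalizing both sides to bare lowercase hex means the same function works for the server-side comparison and for a thumbprint pasted from a Windows client.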
@j_d Sorry for my late reply. Sort of lost track of this over a busy week. We should probably try to track this down by comparing the CA certificate hashes to see if those match. Do you know how to get certificate hashes on both Windows and Linux?
I have now also copied over these files from the old server to the new one:
/var/www/html/fog/management/other/ssl/srvpublic.crt
/var/www/html/fog/management/other/ca.cert.pem
/var/www/html/fog/management/other/ca.cert.der
I have rerun the installer using
I have rebooted the server.
… And I don’t think anything changed. FOGService on existing hosts still doesn’t authenticate with the FOGserver.
Here’s another fog.log, fresh from one of the hosts.
I have re-deployed an image that was captured months back using the old server, and that host will then authenticate just fine. Same with an image that was made with the new server.
The problem is with hosts that have been running since before the server migration.
I’d rather not re-image all those machines and I’m also worried some of the changes I’ve made so far are going to mess things up further down the line.
This section in the wiki doesn’t mention it.
I think it does: “IMPORTANT: Then re-run the installer.” is mentioned just below the cp/scp commands. Sure, it might be a bit hidden in all that information.
On a related note, would it be a good idea to have a link to an article on migration on the wiki’s main page?
There are a lot of things that we should add, update or rearrange in the wiki! But there is just not enough time to do so. It’d be awesome if people who know FOG a bit would join in to help with things like documentation; request wiki write access here.
I have previously copied over the files from /opt/fog/snapins/ssl/, but I didn’t know there were other files in /var/www/html/fog/management/other/ that needed copying. This section in the wiki doesn’t mention it.
On a related note, would it be a good idea to have a link to an article on migration on the wiki’s main page? I didn’t even know migrating required any special steps until I ran into these problems.
I will try copying those /var/www/html/fog/management/other/ files on Thursday. Will that require running installfog.sh again?
Thanks for the help so far!
Is resetting encryption data supposed to help at this point?
Hmmm, my fault. At first I thought it would but looking at it again I see why it doesn’t. I only saw the part of the story where you had a huge load but I missed that you had installed a fresh FOG server.
So why does it matter? When a FOG server is installed, a unique CA (certificate authority) certificate/key pair and a web server certificate/key pair are generated. A fresh fog-client install grabs the CA certificate from the FOG server and stores it in the Windows certificate store. So now all your clients have the CA certificate from your old server stored, and they are all trying to connect to the new server, which has a different CA certificate.
So you want to grab the following files from your old server and put them in the same place on your new server (don’t just overwrite them; back up or move the new ones first, just in case):
/opt/fog/snapins/ssl/CA/.fogCA.key
/opt/fog/snapins/ssl/CA/.fogCA.pem
/opt/fog/snapins/ssl/.srvprivate.key
/var/www/html/fog/management/other/ssl/srvpublic.crt
/var/www/html/fog/management/other/ca.cert.pem
/var/www/html/fog/management/other/ca.cert.der
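The copy-with-backup step can be sketched in bash as follows. This is a minimal sketch, not an official FOG tool: it assumes you have already fetched the old server's files (for example with scp) into a staging directory that mirrors the original paths, and the function name restore_fog_certs and the staging layout are hypothetical. Files already present on the new server are renamed with a .bak-<timestamp> suffix instead of being overwritten.

```shell
#!/usr/bin/env bash
# Paths from the list above, relative to "/".
FOG_CERT_FILES=(
    opt/fog/snapins/ssl/CA/.fogCA.key
    opt/fog/snapins/ssl/CA/.fogCA.pem
    opt/fog/snapins/ssl/.srvprivate.key
    var/www/html/fog/management/other/ssl/srvpublic.crt
    var/www/html/fog/management/other/ca.cert.pem
    var/www/html/fog/management/other/ca.cert.der
)

# restore_fog_certs SRC_ROOT DST_ROOT  (hypothetical helper)
#   SRC_ROOT  staging directory holding the old server's files under the same
#             relative paths, e.g. /root/oldfog/opt/fog/snapins/ssl/...
#   DST_ROOT  where to install them; "/" on the new FOG server
restore_fog_certs() {
    local src_root="${1%/}" dst_root="${2%/}" stamp rel
    stamp=$(date +%Y%m%d%H%M%S)

    # Refuse to start unless every file is present, so we never end up with
    # a mix of old and new certificates.
    for rel in "${FOG_CERT_FILES[@]}"; do
        if [ ! -f "$src_root/$rel" ]; then
            echo "missing in $1: $rel" >&2
            return 1
        fi
    done

    for rel in "${FOG_CERT_FILES[@]}"; do
        mkdir -p "$(dirname "$dst_root/$rel")" || return 1
        # Back up the new server's own file instead of overwriting it.
        if [ -e "$dst_root/$rel" ]; then
            mv "$dst_root/$rel" "$dst_root/$rel.bak-$stamp" || return 1
        fi
        cp -p "$src_root/$rel" "$dst_root/$rel" || return 1
    done
}

# Example (as root on the new server), then re-run the installer:
#   restore_fog_certs /root/oldfog /
#   cd /path/to/fogproject/bin && ./installfog.sh -y
```

Checking all six files before touching anything is a deliberate choice: a half-finished copy would leave the server with a CA that does not match its own web server certificate. As the wiki's “IMPORTANT” note says, re-run the installer afterwards.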
Hint: The last two files in that list (ca.cert.pem and ca.cert.der) are copies of the CA certificate (.fogCA.pem), made available to the clients in two different formats. You definitely need those in place if you add new fog-clients later on.
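To confirm that ca.cert.der really holds the same certificate as ca.cert.pem on a given server, note that a PEM file is just the DER bytes base64-encoded between BEGIN/END CERTIFICATE lines. A minimal sketch using only sed, GNU coreutils base64 and cmp, assuming the PEM file contains exactly one certificate with Unix line endings; the function name pem_matches_der is hypothetical:

```shell
#!/usr/bin/env bash
# pem_matches_der PEM DER  (hypothetical helper)
# Succeeds if the single certificate in PEM decodes to exactly the bytes in DER.
pem_matches_der() {
    local pem="$1" der="$2"
    # Keep only the base64 body between the BEGIN/END CERTIFICATE markers,
    # decode it, and compare it byte for byte with the DER file.
    sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/{/-----/!p;}' "$pem" \
        | base64 -d 2>/dev/null \
        | cmp -s - "$der"
}

# Example on the FOG server:
#   d=/var/www/html/fog/management/other
#   pem_matches_der "$d/ca.cert.pem" "$d/ca.cert.der" && echo same || echo different
```

Comparing the output of `openssl x509 -noout -fingerprint -sha1 -inform der -in ca.cert.der` with the PEM fingerprint would answer the same question; the sketch above just avoids depending on openssl.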
Is resetting encryption data supposed to help at this point? It doesn’t seem to fix those clients that are still not talking to the server (they still have the same issue in fog.log as the one in the original post). I can’t easily check how many clients are and aren’t working right.
I’ve also tried uninstalling and reinstalling FOGservice on a client and that didn’t help.
Deploying a completely new image is the only thing I can do but again I don’t know how to check which ones are still broken.
@Sebastian-Roth said in Web interface slowdown and FOG Client authentication issues:
[…] Take a snapshot of your Hyper-V VM, pull the latest changes from GitHub, check out dev-branch and re-run the installer […]
That has actually made things a lot snappier. I can now set client polling time back to 300 seconds.
So worst case, I re-image all the ones with authentication errors still.
As I said, doing a “Reset Encryption Data” for all your hosts should have made them all sync back in. But now that you went the other way round, I don’t think it’s worth it anymore.
However, I noticed during the imaging, the web interface was once again very slow and giving me 503 errors. Completely fine after it finished.
How many clients did you image? Multicast or unicast? This might be interesting to investigate some more, but it’s probably not easy to figure out. That said, I have worked on improving “FOG speed” since 1.5.5 was released, so it would be interesting to see whether those changes help in your situation. Take a snapshot of your Hyper-V VM, pull the latest changes from GitHub, check out dev-branch and re-run the installer:
cd /path/to/fogproject/
git pull
git checkout dev-branch
cd bin/
./installfog.sh -y
This will also install PHP 7.2 (FOG 1.5.5 still used PHP 5.6 on CentOS), which speeds up the whole web UI too. So the code changes plus PHP 7.2 should make a major difference for you.
Just remembered I forgot to check in yesterday. It was a busy day.
Turns out I didn’t rerun the FOG install script after copying over the SSL files. I ran it and it seems to have fixed it on some (probably most?) hosts but not all the ones I checked. It must’ve fixed it on most because the web interface is good again (not always, read below).
Also deployed an image that was made when the previous FOGserver was still in use, and the FOGservice on that machine works perfectly again. So worst case, I re-image all the ones with authentication errors still. However, I noticed during the imaging, the web interface was once again very slow and giving me 503 errors. Completely fine after it finished.
So it looks like the original problem might be solved and it was my own dumb mistake. I’ll have to check next week to be sure though.
FOG version on the new CentOS 7 server?
How many clients are in the host table of the FOG database?
How many clients are powered on simultaneously?
Is it a virtual server?
500-600 clients, with at least 80-90% of those powered on.
Didn’t the FOG installer install PHP? If so, that’s the version.
Yes, it’s running on a Hyper-V instance
Speaking of the number of clients: a while back, before the issue that caused me to post this thread, I had to double the client check-in time from 300 to 600 seconds and completely disable host pinging to help speed up the web interface. Is there anything I can do so I don’t have to cut back on those features? It’s slightly inconvenient.
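To put rough numbers on that load: with N hosts, a fraction f of them powered on and a check-in interval of T seconds, the server sees on average about N·f/T client check-ins per second. A minimal sketch, assuming one check-in cycle per client per interval spread evenly over time; a real fog-client cycle may involve several HTTP requests, and clients booting at the same time bunch up, so treat the result as a rough lower bound. The function name checkin_rate is hypothetical:

```shell
#!/usr/bin/env bash
# checkin_rate HOSTS FRACTION_ON INTERVAL_SECONDS  (hypothetical helper)
# Average number of client check-ins per second, assuming check-ins are
# spread evenly over the interval.
checkin_rate() {
    awk -v n="$1" -v on="$2" -v t="$3" 'BEGIN { printf "%.2f\n", n * on / t }'
}

# 600 hosts, 90% powered on:
#   checkin_rate 600 0.9 300   # 1.80 check-ins per second
#   checkin_rate 600 0.9 600   # 0.90 check-ins per second
```

Doubling the interval halves the average rate, which fits the relief seen when going from 300 to 600 seconds.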
Hi @j_d ,
I have FOG 1.5.5 on RHEL 7; it now runs well and performance is normal, but I had trouble with it in the past.
- FOG version on the new CentOS 7 server?
- How many clients are in the host table of the FOG database?
- How many clients are powered on simultaneously?
- PHP version?
- Is it a virtual server?
It’ll be hard to do this over TeamViewer since I only do this part time. I won’t be back to work until Friday, and that’s during Western European business hours.
Unfortunately this PKI stuff in the wiki goes way over my head.
I just re-read these steps for the 5th time and I think I may have made a mistake and it may be related to the “IMPORTANT” in big red letters. I will report back on Friday.
@j_d I guess there are a couple of things involved. It’s probably best if we do a TeamViewer session to get things fixed. I’m trying to reach you on chat.
Anything in the apache and PHP-FPM logs (see my signature)?
Well there is one thing you can try first: “Reset encryption data” for ALL of your clients. As you probably have not copied the certificate stuff over from the other server all clients are going mad trying to talk to the server. This is causing a huge load.
It hangs for a very long time on “Middleware::Authentication Waiting for authentication timeout to pass”, every single time.
That’s ok, it’s meant to.