UNSOLVED Upgraded an Existing Server to 1.4.4 and Now Interface is Very Slow and Chromium Images are not working

  • I was recently having some database/schema issues with my 1.3 FOG server and decided to just go ahead and upgrade to 1.4.4. After updating, FOG was operational again; however, the web interface and any kind of boot-loading interactions are extremely slow compared to what they were before. I tried restarting the server, etc., and still no difference. Nothing else changed, and I’ve been running a FOG server for many years on this VM with no performance issues. It’s Ubuntu 16.04, and updates are turned off.

    Another big problem is that we predominantly use FOG for Chromium images. We had just used our Chromium image a few days before on some Dell laptops and everything was working great. Since the upgrade, the image will still load onto the laptop through FOG and it appears everything was successful, but the laptops will never successfully boot into the Chromium image. Exact same hardware, exact same image, UEFI settings all the same as before, but they will not boot after imaging with version 1.4.4.

    Has anyone else seen similar issues after an update like this? I’m also attaching a couple of screenshots. There are some error messages showing when we image now that didn’t used to be there; not sure whether they’re relevant to these issues. I’ve tried some Windows images and they work fine, although much slower to load than before, but no go with Chromium.
    (attached screenshots: 20180126_123735.jpg, 20180126_123713.jpg)

  • Senior Developer

    @rstockham23 Good, I think you guys already found and partly fixed the performance issue. To keep things a bit sorted I’d ask you to open a new thread for the UUID issue as well. We’ll move the related posts to that new thread then. I’ll likely have a bit of time to look into this problem on the weekend.

  • @tom-elliott Tried another Chromium image from scratch and still not working. However I also tried a new test Windows 10 image just to see what it would do and it worked fine. So the general process of imaging from creating to deployment is working, but not with any of the Chromium images.

  • @george1421 I do apologize for chasing multiple issues. I guess my original issue was just one issue that things broke after upgrading to a newer version, but it was definitely multiple things.

    The FOG server is a VM running Ubuntu 16.04. I have 2 CPUs dedicated to it, and storage is a RAID configuration on the VM host, which is a Dell PowerEdge 2950 if I’m not mistaken.

    I also have 8GB of RAM dedicated to it.

  • Moderator

    @rstockham23 I really dislike chasing two issues in the same thread, but back on the performance side.

    Can you describe your fog server a bit?

    1. What OS is it running?
    2. Is it virtual or physical?
    3. Number of vCPUs/cores?
    4. What is your disk subsystem (RAID, single HDD, SSD)?

  • @tom-elliott Yes, I looked on the node in the /images/2018CloudReady folder and it was filled with files that seemed similar to other images on the node. I just blew it away and am trying it all from scratch again now that our performance issues are better, and will report back.

  • @rstockham23 To me it sounds like the image doesn’t exist on the nodes at /images/2018CloudReady.

    Can you verify that the storage nodes you require to have this image actually have it?
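
    A quick way to check from the console (a sketch, assuming the default /images root and the image folder name from this thread; adjust the path if your storage root differs):

    ```shell
    # Run on both the master server and the storage node, then compare.
    IMG_DIR=/images/2018CloudReady   # image path from this thread
    if [ -d "$IMG_DIR" ]; then
        ls -lh "$IMG_DIR"    # file names and sizes should match across nodes
        du -sh "$IMG_DIR"    # total size is a quick replication sanity check
    else
        echo "Missing: $IMG_DIR"
    fi
    ```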

  • @george1421 @Wayne-Workman Setting the Client Check In to 900 has definitely helped the performance. The interfaces are acting more normal now and CPU usage has dropped significantly. Strange that it didn’t have that problem before but does now, but I’m glad to see it working better.

    Still no luck with the Chromium image though. I thought since performance was fixed, I’d go ahead and upload a new image again and then try to download it to another laptop. Again, upload shows that it’s successful, but when I go to deploy it on another computer, I’m getting this error: https://photos.app.goo.gl/I75ORDRoDZqf24OU2

  • Moderator

    @rstockham23 OK, so every 30 seconds all 500 systems “ping” the fog server looking for new instructions. Wait 5 minutes and see what your load is like.

    Now understand we’ve set the check-in interval to 15 minutes. That means if you schedule a snapin deployment to these computers, it will take up to 15 minutes for the target computer to get the job.
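
    Back-of-the-envelope, using the ~500 clients from this thread:

    ```shell
    # Rough check-in load on the fog server at the old and new intervals.
    CLIENTS=500
    OLD_INTERVAL=30     # seconds (what this server was set to)
    NEW_INTERVAL=900    # seconds (15 minutes)

    OLD_RATE=$(( CLIENTS * 60 / OLD_INTERVAL ))   # check-ins per minute
    NEW_RATE=$(( CLIENTS * 60 / NEW_INTERVAL ))

    echo "30s interval:  ${OLD_RATE} check-ins/min"   # 1000/min
    echo "900s interval: ${NEW_RATE} check-ins/min"   # 33/min
    ```

    Each check-in is a web request plus database queries, which is where the mysqld CPU time goes.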

  • @george1421 It was set to 30. I changed it to 900.

  • Moderator

    @rstockham23 Let’s try this: go to FOG Configuration->FOG Settings->FOG Client->FOG_CLIENT_CHECKIN_TIME, note the value, and then set it to 900. Wait whatever your check-in interval was and see if response is better.

  • @george1421 I’ll get the screenshot soon, but I did run the top command sorted by CPU usage a bit ago, and it’s definitely mysqld that’s consuming it, running at 60-80% constantly with basically nothing going on in FOG.

    I should throw in that there was never a performance problem before the upgrade to the newer version with the same amount of Clients, etc.
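
    One way to see what mysqld is actually doing (a sketch; assumes you can log in as the MySQL root user, so add -p if a password is set):

    ```shell
    # Show the queries MySQL is executing right now; long-running or
    # constantly repeating statements point at the hot spot.
    mysql -u root -e "SHOW FULL PROCESSLIST;"

    # Total statements executed since startup; run it twice a minute
    # apart to gauge query volume.
    mysql -u root -e "SHOW GLOBAL STATUS LIKE 'Questions';"
    ```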

  • Moderator

    @rstockham23 OK let me rephrase but I think I have my answer already.

    How many computers do you have the fog client installed on? The reason I ask is that the fog clients check in to the fog server based on the check-in interval. So if you have 500 clients checking in every 5 minutes, you could have 100 check-ins per minute (in an ideal world). All of these check-ins take CPU time from the fog server.

    From the Linux console, key in top, then press Shift+P; that should sort the processes by CPU usage. Take a screenshot of that screen and post it here. It may be easier to do this if you connect to the fog server using PuTTY on a Windows computer, so you can use the screenshot tools in Windows.
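
    For a one-shot capture that’s easy to paste, batch mode works too (a sketch; the -o sort flag needs a reasonably recent procps top, which Ubuntu 16.04 ships):

    ```shell
    # One screenful of top, sorted by CPU, without the interactive UI.
    top -b -n 1 -o %CPU | head -n 15
    ```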

  • @george1421 When you say “how many clients hitting the fog server”, do you mean for actual imaging or just for any kind of fog service? Any of the times I have referred to slow performance, etc. there haven’t been any images running and only one person accessing the interface. As far as how many active Windows clients there are out in our network that have the Fog service installed…probably in the 400-500 range. Only services I use though are the Domain join and name changing which really only happen after imaging.

    I’m not sure on the Check-In interval. Is that set on the client or in the main FOG settings? The interface is so slow right now, I’m not sure I could even get to find that setting.

  • @wayne-workman Running the top command now. That’s a handy command! mysqld is definitely what’s consuming the CPU. It’s staying steady at 60-80%, and that’s with nothing happening right now: no images running, nobody accessing the web interface, etc.

  • @rstockham23 In addition to answering George’s question, please run the top command and give us a screenshot of what it displays.

  • Moderator

    @rstockham23 How many client computers are hitting this FOG server? What is your check in interval?

  • @wayne-workman Well…I tried running the database cleanup stuff again, and this time it seemed to be no help; maybe last time was just a coincidence. I even tried restarting the server after the database cleanup. Crazy slow!!! We’re talking minutes between clicks to move between screens on the interface, and the actual fog client on the imaged computers is slow for everything as well. It wasn’t like this until after the upgrade. The Linux system itself seems slower than normal too, so I’m assuming some new fog process is consuming system resources. I’m not well versed in a Task Manager equivalent in Linux, but maybe I could see something there. Also, to address the other things you mentioned: there were no images running and nobody on the user interface except me at the time of testing.

  • @george1421 You are correct… .45 is the main fog server and .50 is the storage node. I looked in the /images/2018CloudReady (new image just now created) folder on the storage node and it appears to have the files in it like it should, so I believe that it did replicate it properly.