Upgraded an Existing Server to 1.4.4 and Now Interface is Very Slow and Chromium Images are not working
-
@rstockham23 Just for clarity can you explain who these actors are?
10.23.8.50
10.23.8.45
It would appear that .45 is the fog server and .50 is a storage node? If that is the case, can you confirm that the images have been replicated to the storage node?
-
@george1421 You are correct… .45 is the main fog server and .50 is the storage node. I looked in the /images/2018CloudReady (new image just now created) folder on the storage node and it appears to have the files in it like it should, so I believe that it did replicate it properly.
-
@wayne-workman Well…I tried running the database cleanup stuff again and this time there seemed to be no help. Maybe just coincidence last time. Even tried restarting the server after the database cleanup. Crazy slow!!! We’re talking minutes between clicks to go to different screens on the interface and the actual fog client on the imaged computers themselves is slow for everything as well. Wasn’t like this until after the upgrade. The Linux system itself when working with it seems slower than normal as well, so I’m assuming there is some new fog process consuming system resources. I’m not well versed in a Task Manager equivalent in Linux, but maybe could see something there. Also, to address the other things you mentioned, there were no images running and nobody on the user interface except for me at the time of testing the speed.
-
@rstockham23 How many client computers are hitting this FOG server? What is your check in interval?
-
@rstockham23 In addition to answering George’s question, please run the top command and give us a screenshot of what it displays.
-
@wayne-workman Running the top command now. That’s a handy command! mysql is definitely what’s consuming the cpu. It’s staying steady at 60-80% CPU. That’s with nothing happening right now. No images running, nobody accessing the online interface, etc.
-
@george1421 When you say “how many clients hitting the fog server”, do you mean for actual imaging or just for any kind of fog service? Any of the times I have referred to slow performance, etc. there haven’t been any images running and only one person accessing the interface. As far as how many active Windows clients there are out in our network that have the Fog service installed…probably in the 400-500 range. Only services I use though are the Domain join and name changing which really only happen after imaging.
I’m not sure on the Check-In interval. Is that set on the client or in the main FOG settings? The interface is so slow right now, I’m not sure I could even get to find that setting.
-
@rstockham23 OK let me rephrase but I think I have my answer already.
How many computers do you have the fog client installed on? The reason I ask is that the fog clients check in to the fog server based on the check-in interval. So if you have 500 clients checking in every 5 minutes, then you could have 100 check-ins per minute (in an ideal world). All of these check-ins take CPU time from the fog server.
From the Linux console, key in top, then hit the P key (Shift+p), and it should sort the processes by CPU usage. Take a screenshot of that screen and post it here. It may be easier to do this if you connect to the fog server using PuTTY on a Windows computer; then you can use the screenshot tools in Windows.
-
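If a screenshot is awkward to capture, the same "which process is eating the CPU" question can be answered from plain text. The sketch below is only an illustration: it assumes the text comes from running `ps -eo pcpu,comm` on the server, the function name `top_cpu_processes` is hypothetical, and the sample rows are made up rather than taken from this thread. Note that `ps` reports CPU averaged over each process's lifetime, so the numbers will not match the live view in top exactly.

```python
def top_cpu_processes(ps_output, limit=5):
    """Return the `limit` busiest (cpu_percent, command) pairs from `ps` text."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        cpu, command = line.split(None, 1)
        rows.append((float(cpu), command.strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Made-up sample in the format printed by `ps -eo pcpu,comm`.
# Replace it with the real output from the server being checked.
SAMPLE = """\
%CPU COMMAND
 0.3 apache2
71.5 mysqld
 0.1 sshd
"""

if __name__ == "__main__":
    for cpu, command in top_cpu_processes(SAMPLE):
        print(f"{cpu:5.1f}%  {command}")
```

The printed list is easy to paste into a forum post in place of a screenshot.
-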
@george1421 I’ll get the screenshot soon, but I did run the top command sorted by cpu usage a bit ago and it’s definitely mysqld that’s consuming it. Running 60-80% constantly with basically nothing going on with FOG.
I should throw in that there was never a performance problem before the upgrade to the newer version with the same amount of Clients, etc.
-
@rstockham23 Let’s try this. Go to FOG Configuration->FOG Settings->FOG Client->FOG_CLIENT_CHECKIN_TIME, note the value, and then set it to 900. Wait for however long your old check-in interval was and see if the response is better.
-
@george1421 It was set to 30. I changed it to 900.
-
@rstockham23 OK so every 30 seconds all 500 systems “ping” the fog server looking for any new instructions. So wait 5 minutes and see what your load is like.
Now understand we’ve set the check-in interval to 15 minutes. That means if you schedule a snapin deployment to these computers, it will take up to 15 minutes for the target computer to get the job.
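To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. It assumes 500 clients (the upper end of the estimate given earlier) whose check-ins are spread evenly over the interval; the function name `checkins_per_minute` is just for illustration. It does not model how many requests or database queries a single check-in generates, so treat the figures as relative, not as a measure of actual server load.

```python
def checkins_per_minute(clients, interval_seconds):
    """Average check-ins per minute if clients are spread evenly over the interval."""
    return clients * 60 / interval_seconds


if __name__ == "__main__":
    # 30 s was the old setting, 300 s is the 5-minute example, 900 s is the new setting.
    for interval in (30, 300, 900):
        rate = checkins_per_minute(500, interval)
        print(
            f"{interval:>4}s interval: ~{rate:.0f} check-ins/min, "
            f"up to {interval / 60:g} min before a client sees a new task"
        )
```

Going from 30 to 900 seconds cuts the check-in rate by a factor of 30, at the cost of tasks taking up to 15 minutes to be picked up.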
-
@george1421 @Wayne-Workman Setting the Client Check In to 900 has definitely helped the performance. Interfaces are acting more normal now and cpu usage has dropped significantly. Strange that it didn’t have that problem before, but does now, but glad to see it working better.
Still no luck with the Chromium image though. I thought since performance was fixed, I’d go ahead and upload a new image again and then try to download it to another laptop. Again, upload shows that it’s successful, but when I go to deploy it on another computer, I’m getting this error: https://photos.app.goo.gl/I75ORDRoDZqf24OU2
-
@rstockham23 To me it sounds like the image doesn’t exist on the nodes at /images/2018CloudReady.
Can you verify that the storage nodes you require to have this image actually have it?
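One way to check this beyond eyeballing the folder is to compare file names and sizes between the main server's copy and the node's copy. The sketch below is an assumption-laden illustration, not a FOG tool: it assumes both directories are reachable from one machine (for example, the node's /images mounted locally), and the function names and command-line usage are hypothetical. Matching sizes do not prove the contents are identical; a checksum comparison would be stricter.

```python
import os
import sys


def build_manifest(root):
    """Map each file's path (relative to root) to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = os.path.getsize(full_path)
    return manifest


def compare_manifests(master, node):
    """Return (files missing on the node, files whose sizes differ)."""
    missing = sorted(set(master) - set(node))
    mismatched = sorted(
        path for path in master if path in node and master[path] != node[path]
    )
    return missing, mismatched


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 compare_images.py <master_image_dir> <node_image_dir>
    missing, mismatched = compare_manifests(
        build_manifest(sys.argv[1]), build_manifest(sys.argv[2])
    )
    print("Missing on node:", missing or "none")
    print("Size mismatch:", mismatched or "none")
```

An empty result for both lists suggests replication finished; anything listed points at a file to re-sync.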
-
@tom-elliott Yes, I looked on the node in the /images/2018CloudReady folder and it was filled with files that seemed similar to other images on the node. I just blew it away and am trying it all from scratch again now that our performance issues are better. I will report back.
-
@rstockham23 I really dislike chasing two issues in the same thread, but back on the performance side.
Can you describe your fog server a bit?
- What OS is it running
- Is it virtual or physical
- Number of vCPUs/Cores
- What is your disk subsystem (raid, single hdd disk, ssd?)
-
@george1421 I do apologize for chasing multiple issues. I guess my original issue was just one issue that things broke after upgrading to a newer version, but it was definitely multiple things.
Fog Server is a VM running Ubuntu 16.04. I have 2 CPUs dedicated to it, and it’s on a RAID configuration on the master VM server, which is a Dell PowerEdge 2950 if I’m not mistaken.
I also have 8GB of Ram dedicated to it.
-
@tom-elliott Tried another Chromium image from scratch and still not working. However I also tried a new test Windows 10 image just to see what it would do and it worked fine. So the general process of imaging from creating to deployment is working, but not with any of the Chromium images.
-
@rstockham23 Good, I think you guys already found and partly fixed the performance issue. To keep things a bit sorted I’d ask you to open a new thread for the UUID issue as well. We’ll move the related posts to that new thread then. I’ll likely have a bit of time to look into this problem on the weekend.
-