UNSOLVED Upgraded an Existing Server to 1.4.4 and Now Interface is Very Slow and Chromium Images are not working

  • I was recently having some database/schema issues with my 1.3 FOG server and decided to just go ahead and upgrade to 1.4.4. After updating, FOG was operational again; however, the web interface and any kind of boot-loading interactions are extremely slow compared to what they were before. I tried restarting the server, etc., and still no difference. Nothing else changed, and I’ve been running a FOG server for many years on this VM with no performance issues. It’s Ubuntu 16.04, and updates are turned off.

    Another big problem is that we predominantly use FOG for Chromium images. We had just used our Chromium image a few days before on some Dell laptops and everything was working great. Since the upgrade, the image will still load onto the laptop through FOG and it appears everything was successful, but the laptops will never successfully boot into the Chromium image. Exact same hardware, exact same image, UEFI settings all the same as before, but they will not boot after imaging with version 1.4.4.

    Has anyone else seen similar issues after an update like this? I’m also attaching a couple of screenshots. There are some error messages showing when we image now that didn’t used to be there; not sure whether they’re relevant to these issues. I’ve tried some Windows images and they work fine, although much slower to load than before, but no go with Chromium.
    (attached screenshots: 20180126_123735.jpg, 20180126_123713.jpg)

  • Senior Developer

    @rstockham23 Good, I think you guys already found and partly fixed the performance issue. To keep things a bit sorted I’d ask you to open a new thread for the UUID issue as well. We’ll move the related posts to that new thread then. I’ll likely have a bit of time to look into this problem on the weekend.

  • @tom-elliott Tried another Chromium image from scratch and still not working. However I also tried a new test Windows 10 image just to see what it would do and it worked fine. So the general process of imaging from creating to deployment is working, but not with any of the Chromium images.

  • @george1421 I do apologize for chasing multiple issues. I guess my original issue was just one issue that things broke after upgrading to a newer version, but it was definitely multiple things.

    The FOG server is a VM running Ubuntu 16.04. I have 2 CPUs dedicated to it, and storage is a RAID configuration on the VM host, which is a Dell PowerEdge 2950 if I’m not mistaken.

    I also have 8GB of RAM dedicated to it.

  • Moderator

    @rstockham23 I really dislike chasing two issues in the same thread, but back on the performance side.

    Can you describe your fog server a bit?

    1. What OS is it running?
    2. Is it virtual or physical?
    3. Number of vCPUs/cores?
    4. What is your disk subsystem (RAID, single HDD, SSD)?

  • @tom-elliott Yes, I looked on the node in the /images/2018CloudReady folder and it was filled with files that seemed similar to other images on the node. I just blew it away and am trying it all from scratch again now that our performance issues are better, and will report back.

  • @rstockham23 To me it sounds like the image doesn’t exist on the nodes at /images/2018CloudReady.

    Can you verify that the storage nodes you require to have this image actually have it?
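
    A quick way to check from the console (a sketch, assuming the default /images root and the image folder name from this thread; adjust the path if your storage root differs):

    ```shell
    # Run on both the master server and the storage node, then compare.
    IMG_DIR=/images/2018CloudReady   # image path from this thread
    if [ -d "$IMG_DIR" ]; then
        ls -lh "$IMG_DIR"    # file names and sizes should match across nodes
        du -sh "$IMG_DIR"    # total size is a quick replication sanity check
    else
        echo "Missing: $IMG_DIR"
    fi
    ```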

  • @george1421 @Wayne-Workman Setting the Client Check In to 900 has definitely helped the performance. The interfaces are acting more normal now and CPU usage has dropped significantly. Strange that it didn’t have that problem before but does now, but I’m glad to see it working better.

    Still no luck with the Chromium image though. I thought since performance was fixed, I’d go ahead and upload a new image again and then try to download it to another laptop. Again, upload shows that it’s successful, but when I go to deploy it on another computer, I’m getting this error: https://photos.app.goo.gl/I75ORDRoDZqf24OU2

  • Moderator

    @rstockham23 OK, so every 30 seconds all 500 systems “ping” the fog server looking for new instructions. Wait 5 minutes and see what your load is like.

    Now understand we’ve set the check-in interval to 15 minutes. That means if you schedule a snapin deployment to these computers, it will take up to 15 minutes for the target computer to get the job.
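
    Back-of-the-envelope, using the ~500 clients from this thread:

    ```shell
    # Rough check-in load on the fog server at the old and new intervals.
    CLIENTS=500
    OLD_INTERVAL=30     # seconds (what this server was set to)
    NEW_INTERVAL=900    # seconds (15 minutes)

    OLD_RATE=$(( CLIENTS * 60 / OLD_INTERVAL ))   # check-ins per minute
    NEW_RATE=$(( CLIENTS * 60 / NEW_INTERVAL ))

    echo "30s interval:  ${OLD_RATE} check-ins/min"   # 1000/min
    echo "900s interval: ${NEW_RATE} check-ins/min"   # 33/min
    ```

    Each check-in is a web request plus database queries, which is where the mysqld CPU time goes.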

  • @george1421 It was set to 30. I changed it to 900.

  • Moderator

    @rstockham23 Let’s try this: go to FOG Configuration->FOG Settings->FOG Client->FOG_CLIENT_CHECKIN_TIME, note the value, and then set it to 900. Wait whatever your check-in interval was and see if response is better.

  • @george1421 I’ll get the screenshot soon, but I did run the top command sorted by CPU usage a bit ago, and it’s definitely mysqld that’s consuming it, running at 60-80% constantly with basically nothing going on in FOG.

    I should throw in that there was never a performance problem before the upgrade to the newer version with the same amount of Clients, etc.
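
    One way to see what mysqld is actually doing (a sketch; assumes you can log in as the MySQL root user, so add -p if a password is set):

    ```shell
    # Show the queries MySQL is executing right now; long-running or
    # constantly repeating statements point at the hot spot.
    mysql -u root -e "SHOW FULL PROCESSLIST;"

    # Total statements executed since startup; run it twice a minute
    # apart to gauge query volume.
    mysql -u root -e "SHOW GLOBAL STATUS LIKE 'Questions';"
    ```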

  • Moderator

    @rstockham23 OK let me rephrase but I think I have my answer already.

    How many computers do you have the fog client installed on? The reason I ask is that the fog clients check in to the fog server based on the check-in interval. So if you have 500 clients checking in every 5 minutes, you could have 100 check-ins per minute (in an ideal world). All of these check-ins take CPU time from the fog server.

    From the Linux console, key in top, then press Shift+P; that should sort the processes by CPU usage. Take a screenshot of that screen and post it here. It may be easier to do this if you connect to the fog server using PuTTY on a Windows computer, so you can use the screenshot tools in Windows.
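
    For a one-shot capture that’s easy to paste, batch mode works too (a sketch; the -o sort flag needs a reasonably recent procps top, which Ubuntu 16.04 ships):

    ```shell
    # One screenful of top, sorted by CPU, without the interactive UI.
    top -b -n 1 -o %CPU | head -n 15
    ```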

  • @george1421 When you say “how many clients hitting the fog server”, do you mean for actual imaging or just for any kind of fog service? Any of the times I have referred to slow performance, etc. there haven’t been any images running and only one person accessing the interface. As far as how many active Windows clients there are out in our network that have the Fog service installed…probably in the 400-500 range. Only services I use though are the Domain join and name changing which really only happen after imaging.

    I’m not sure on the Check-In interval. Is that set on the client or in the main FOG settings? The interface is so slow right now, I’m not sure I could even get to find that setting.

  • @wayne-workman Running the top command now. That’s a handy command! mysqld is definitely what’s consuming the CPU. It’s staying steady at 60-80%, and that’s with nothing happening right now: no images running, nobody accessing the web interface, etc.

  • @rstockham23 In addition to answering George’s question, please run the top command and give us a screenshot of what it displays.

  • Moderator

    @rstockham23 How many client computers are hitting this FOG server? What is your check in interval?

  • @wayne-workman Well…I tried running the database cleanup stuff again, and this time it seemed to be no help; maybe last time was just a coincidence. I even tried restarting the server after the database cleanup. Crazy slow!!! We’re talking minutes between clicks to move between screens on the interface, and the actual fog client on the imaged computers is slow for everything as well. It wasn’t like this until after the upgrade. The Linux system itself seems slower than normal too, so I’m assuming some new fog process is consuming system resources. I’m not well versed in a Task Manager equivalent in Linux, but maybe I could see something there. Also, to address the other things you mentioned: there were no images running and nobody on the user interface except me at the time of testing.

  • @george1421 You are correct… .45 is the main fog server and .50 is the storage node. I looked in the /images/2018CloudReady (new image just now created) folder on the storage node and it appears to have the files in it like it should, so I believe that it did replicate it properly.