FOG Web GUI speed and default storage activity

JGallo

The client check-in interval was set to 60 seconds, but I doubled it just in case. The FOG server runs on Hyper-V with 2 vCPUs and 4GB-16GB of dynamically allocated memory. top shows the following:

MySQL - 5-13% CPU usage, 6-8% mem usage
FOGMulticast - 3-5% CPU usage, 1% mem usage
Apache - 1-3% CPU usage, 1% mem usage

So nothing out of the ordinary, I think, according to top. Should I raise the vCPU count on Hyper-V to 4? Or change some other setting in FOG?

george1421

@jgallo Can you post the header section of top here?

JGallo

@george1421

Last login: Thu Oct  5 13:44:58 2017 from 10.215.57.57
administrator@VUSD-FOG:~$ top
top - 15:03:19 up  5:18,  2 users,  load average: 0.21, 0.19, 0.11
Tasks: 174 total,   1 running, 165 sleeping,   0 stopped,   8 zombie
%Cpu(s):  5.5 us,  3.2 sy,  0.0 ni, 90.3 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  4035292 total,  3074368 free,   514096 used,   446828 buff/cache
KiB Swap:  4190204 total,  4190204 free,        0 used.  3223812 avail Mem

george1421

@jgallo I’m not seeing any value in adding vCPUs to this VM. And this VM is about 10% utilized.

It looks like the system was rebooted in the last 5 minutes?? That will skew the load values, but so far the system looks normal. It would be interesting to see what the stats are running during the day time under normal load.
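For reference, the roughly 10% figure can be read off the %Cpu(s) line of the posted header as 100 minus the idle ("id") value. A minimal Python sketch, assuming the header layout shown in this thread; the function name is made up for illustration:

```python
import re


def cpu_busy_percent(cpu_line: str) -> float:
    """Return 100 minus the 'id' (idle) field of a top '%Cpu(s)' line."""
    # Collect "<number> <two-letter tag>" pairs such as "90.3 id".
    pairs = re.findall(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b", cpu_line)
    fields = {tag: float(value) for value, tag in pairs}
    if "id" not in fields:
        raise ValueError("no idle field found in line")
    return round(100.0 - fields["id"], 1)


# The %Cpu(s) line from the header posted above.
line = "%Cpu(s):  5.5 us,  3.2 sy,  0.0 ni, 90.3 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st"
print(cpu_busy_percent(line))  # 9.7
```

Note that this is a single snapshot; the load average values on the first header line cover the last 1, 5, and 15 minutes.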

JGallo

@george1421

I will check tomorrow and place the results here.

Sebastian Roth

@george1421 You probably meant system is up for 5 hours 18 minutes…
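To make the reading of that field explicit: in the top header, "up  5:18" is hours and minutes of uptime. A small Python sketch that parses it, assuming the usual procps formats ("H:MM", "N min", and an optional "N day(s)," prefix); the function name is hypothetical:

```python
import re
from datetime import timedelta


def parse_top_uptime(header: str) -> timedelta:
    """Extract the uptime from the first line of top's header."""
    match = re.search(
        r"up\s+(?:(\d+)\s+days?,\s+)?(?:(\d+):(\d+)|(\d+)\s+min)", header
    )
    if match is None:
        raise ValueError("no uptime field found")
    days, hours, minutes, only_minutes = match.groups()
    if only_minutes is not None:
        # Form used when the system has been up for less than an hour.
        hours, minutes = 0, only_minutes
    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes))


header = "top - 15:03:19 up  5:18,  2 users,  load average: 0.21, 0.19, 0.11"
print(parse_top_uptime(header))  # 5:18:00
```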

I have about 18 storage nodes with 6 storage groups on network of roughly 2500 PC’s

I am wondering if our web UI code is just not ready yet to handle such a huge environment. Probably it’s best if one of us remotes in at some point to try and figure out where exactly the limits are.

JGallo

@sebastian-roth

Here is the top header from this am:

top - 07:47:12 up 16:00,  2 users,  load average: 0.01, 0.05, 0.07
Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.5 us,  3.9 sy,  0.0 ni, 90.1 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st
KiB Mem :  4035292 total,  2814336 free,   500232 used,   720724 buff/cache
KiB Swap:  4190204 total,  4190204 free,        0 used.  3219104 avail Mem

The imaging aspect of it is fine. The web UI becomes sluggish when I begin to create storage nodes and storage groups. I have updated all the nodes to working branch 57 to be consistent across the board.

george1421

@jgallo What host OS is your fog server running?

JGallo

@george1421

I have Ubuntu Server 16.04.03 LTS

Here is a screenshot of my dashboard so you can better understand what’s going on. When you click on a storage node, the information gets refreshed and it works as with the Storage Groups. They go blank and when you refresh they come back.

Sebastian Roth

@Tom-Elliott Would you have an idea what’s going on here by any chance?

Wayne Workman

I think it’s timing out. Tom does some advanced stuff in the JavaScript with time & timeouts and such in order to prevent those things from blocking other things in the event of an actual problem (like a dead node).

JGallo

Could it be something on my end with the network, where packets are being lost and the FOG server isn’t receiving responses to those check-ins? Or if it’s a timeout in a script, could we increase it manually? What things should I be looking for if it’s network related?

Wayne Workman

@jgallo All those things, possibly. Let me explain how this works a bit.

The web server issues a web call to all enabled nodes when you call the homepage (the main dashboard). So, when you click the home icon in FOG or log in, the home page calls this on each enabled node:
http://x.x.x.x/fog/service/getversion.php
Where x.x.x.x is the IP/name of each node. So say you had three nodes: 10.0.0.5, 10.0.0.6, 10.0.0.7
The homepage would call each one like:
http://10.0.0.5/fog/service/getversion.php
http://10.0.0.6/fog/service/getversion.php
http://10.0.0.7/fog/service/getversion.php

If they don’t respond in time because they are too busy, or the web server is just too busy to hear the responses, then it causes the error “A valid database connection could not be made”.
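To check from the main server whether a particular node is slow to answer, something like the following Python sketch could time the same call by hand. This is not FOG's own code: the 5-second timeout, the function names, and the injectable fetch parameter are assumptions for illustration; only the getversion.php path comes from the description above.

```python
import time
import urllib.request

VERSION_PATH = "/fog/service/getversion.php"


def version_url(host: str) -> str:
    """Build the version-check URL for one storage node."""
    return f"http://{host}{VERSION_PATH}"


def http_fetch(url: str, timeout: float) -> bytes:
    """Default fetcher: a plain HTTP GET with a timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def probe_node(host: str, timeout: float = 5.0, fetch=http_fetch) -> dict:
    """Time one version check; report failure instead of raising."""
    started = time.monotonic()
    try:
        body = fetch(version_url(host), timeout)
        ok, detail = True, body.decode("utf-8", "replace").strip()
    except OSError as exc:  # covers URLError, timeouts, refused connections
        ok, detail = False, str(exc)
    elapsed = time.monotonic() - started
    return {"host": host, "ok": ok, "seconds": elapsed, "detail": detail}
```

Calling probe_node("10.0.0.5") for each node address in turn and comparing the "seconds" values should show which nodes, if any, answer slowly or not at all.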

Further, sitting on the homepage increases load somewhat on the nodes and web server and database - because the homepage graphs continually poll all of them for transmission stats, space stats, continually polls the DB for job status, etc.

At one of my last jobs, the “a valid database blah blah” message was common when the FOG system was under load. If imaging is still working, then you don’t have any problems. If it bothers you a lot, try to dial back the max clients value in your node’s storage management area to reduce the load. Or decrease your client checkin time. Or build faster servers to accommodate your load.

JGallo

@wayne-workman

Cool. Makes perfect sense. I must say I don’t think it’s the storage node servers, as they mostly consist of Dell R410s and R210s with 8GB to 16GB of memory. I will try to decrease the check-in time. Is the value in seconds or minutes for the check-in time in FOG?

Wayne Workman

@jgallo said in FOG Web GUI speed and default storage activity:

Is the value in seconds or minutes for the check in time in FOG?

Seconds.

JGallo

@wayne-workman

OK, so I have tried to decrease the client check-in time and it seems to help, but eventually that message comes back. I even went to the extent of not sitting on the homepage just to make sure, and when I went back to check after an hour I still had those connection messages.

I will try one more thing, and if this doesn’t work I will just deal with it, but could I raise the number of vCPUs on Hyper-V from 2 to 4? Will this help improve performance? That’s basically the only thing left to do to improve performance on the FOG server. I did make the change from dynamic memory to static. Thanks.

Jim Graczyk

@Sebastian-Roth
@george1421

Guys,

Is there any way to alter the choice of pages that the Web UI goes to upon sign in?

I’m seeing very long sign-in times, but the web UI is otherwise very fast in my installations. I’ve got 9 remote sites, each with 1 storage node, and some of the sites are connected over 4G LTE.

Signing into FOG takes a varying amount of time, depending on how the 9 connections are doing.

I could do without the dashboard page and the load it generates. I’d be happy if the default page were anything but the Dashboard.

Jim

george1421

@jim-graczyk There is not currently an option for that, but (in my limited experience) it might be trivial to include that feature. The developers might include a new setting in FOG Settings->Login Settings to allow the FOG admin to change their login landing page. If you look at the URL, the only difference between the dashboard and any other page is the node reference.

But that then raises the question: if not the dashboard, what should be the default landing page? And might that landing page be different on a per-user basis? (Just thinking of reasons why we might not want this feature.) Either way, I would surely make a feature request out of this idea. I feel it does have a moderate amount of merit.

Wayne Workman

@jgallo said in FOG Web GUI speed and default storage activity:

could I raise the number of vCPU on Hyper-V from 2 to 4? Will this help improve performance?

That would only help if your host system isn’t overburdened. If you have too many VMs on it already with too many cores assigned, and not enough cores available, it’ll just make things worse. But if you have plenty of resources, then it would help.

Also, set your client checkin time to something like 300 seconds (five minutes) and see if that makes a difference. Keep in mind the change here isn’t immediate - the clients have to checkin once more to actually get the new setting.
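As a rough back-of-the-envelope illustration of why the interval matters, here is the average check-in rate for the roughly 2500 PCs mentioned earlier in the thread. It assumes every client checks in exactly once per interval and that check-ins are spread evenly, which real clients will only approximate; the function name is made up.

```python
def checkins_per_second(clients: int, interval_seconds: float) -> float:
    """Average check-in requests per second hitting the server."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return clients / interval_seconds


for interval in (60, 120, 300):
    rate = checkins_per_second(2500, interval)
    print(f"{interval:>3} s interval -> {rate:.1f} check-ins/s")
# prints:
#  60 s interval -> 41.7 check-ins/s
# 120 s interval -> 20.8 check-ins/s
# 300 s interval -> 8.3 check-ins/s
```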

JGallo

@wayne-workman

I don’t have too many VM’s. I do have 4 but each of those has either 1 or 2 vCPU’s allocated to them. I have now set 4 vCPU’s for the fog server VM and I still have that issue. I also set the client checkin time to 300 with same issue.

Here is what I noticed. I see that the FOG server disk I/O is about 15% give or take on a constant basis. I also noticed that all the disk activity is from apache2 with user www-data and mysql is using up to 5GB of memory at times just on idle. Could this be some programming bug or my database needs to be cleared?
