FOG Web GUI speed and default storage activity

JGallo

Hello,

I’m noticing that after the FOG server has been running for some time, it begins to have issues with the nodes. On the dashboard, it shows a message that “a valid connection cannot be established” for all nodes. When you click on a storage node, it refreshes and works, but then other nodes have the same issue. I came across a post about increasing the PHP memory limit in the FOG settings. That seems to help, but I’m not sure it will fix it. I have also noticed that the FOG server itself shows a lot of activity, at least according to the graph, even though it currently doesn’t need to replicate.

My question: are there areas in the FOG settings to improve web UI speeds? I have read @george1421’s post about php-fpm making the web UI faster, but I’m not sure if this is ready or experimental. Is there anything I can do to improve the speeds currently on the working branch? I have about 18 storage nodes with 6 storage groups on a network of roughly 2500 PCs. Currently the FOG database only has about 60 computers, as I have had to start from scratch. I have disabled the host lookup option and increased the imagerepsleeptime so it’s not so taxing on the web UI or on replication.

Thank you.

JGallo

@wayne-workman

Sorry about the late reply here. We had a rogue router on a campus which we were hunting for and eventually found. The error message about the database connection after host registration and prior to actually imaging disappeared. Going to restart again and continue to monitor issue.

george1421

For Ubuntu, I’m still working on the php-fpm config. For CentOS it works. BUT if you only have 60 hosts hitting your FOG server and you have a performance issue, then something else is going on.

What is your client check in interval (in the fog settings page)? The default is every 5 minutes. With only 60 hosts that shouldn’t hit too bad, with 200 hosts I might think differently.

What are the stats on your FOG server? How much memory, how many vCPUs, and what process time from top?

JGallo

Client check interval was set to 60 but I doubled that just in case. The FOG server is on a Hyper-V with 2 vCPU and 4GB-16GB of allocated memory. top shows the following:

MySQL - 5-13% CPU usage, 6-8% mem usage
FOGMulticast - 3-5% CPU usage, 1% mem usage
Apache - 1-3% CPU usage, 1% mem usage

So nothing crazy out of the ordinary I think according to top. Should I raise the vCPU on the hyper-v? To 4? Or change some other setting in FOG?

george1421

@jgallo Can you post the header section of top here?

JGallo

@george1421

Last login: Thu Oct  5 13:44:58 2017 from 10.215.57.57
administrator@VUSD-FOG:~$ top
top - 15:03:19 up  5:18,  2 users,  load average: 0.21, 0.19, 0.11
Tasks: 174 total,   1 running, 165 sleeping,   0 stopped,   8 zombie
%Cpu(s):  5.5 us,  3.2 sy,  0.0 ni, 90.3 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  4035292 total,  3074368 free,   514096 used,   446828 buff/cache
KiB Swap:  4190204 total,  4190204 free,        0 used.  3223812 avail Mem

george1421

@jgallo I’m not seeing any value in adding vCPUs to this VM. And this VM is about 10% utilized.
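As a rough check of that figure, utilization can be read off the %Cpu(s) line as 100 minus the idle (id) value. A minimal sketch in Python, using the line from the top output posted above; the function name cpu_busy_percent is only for illustration:

```python
import re

def cpu_busy_percent(cpu_line: str) -> float:
    """Return 100 minus the idle percentage from a top '%Cpu(s)' summary line."""
    # The idle share is the number directly in front of the 'id' label.
    match = re.search(r"([\d.]+)\s+id", cpu_line)
    if match is None:
        raise ValueError("no idle field found in line")
    return round(100.0 - float(match.group(1)), 1)

line = "%Cpu(s):  5.5 us,  3.2 sy,  0.0 ni, 90.3 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st"
print(cpu_busy_percent(line))  # 100 - 90.3
```

Note that this is an average across all vCPUs, so it says nothing about whether a single core is saturated.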

It looks like the system was rebooted in the last 5 minutes?? That will skew the load values, but so far the system looks normal. It would be interesting to see what the stats are running during the day time under normal load.

JGallo

@george1421

I will check tomorrow and place the results here.

Sebastian Roth

@george1421 You probably meant system is up for 5 hours 18 minutes…
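For anyone reading the header the same way: top prints uptime as hours:minutes ("up  5:18"), as "N min" when the system has been up for under an hour, and with a leading "N days," once it passes a day. A small sketch that converts the field to minutes; the function name uptime_minutes is made up for this example:

```python
import re

def uptime_minutes(top_line: str) -> int:
    """Convert the 'up ...' field of a top/uptime header line to minutes."""
    pattern = r"up\s+(?:(\d+) days?,\s+)?(?:(\d+):(\d+)|(\d+) min)"
    match = re.search(pattern, top_line)
    if match is None:
        raise ValueError("no uptime field found")
    days, hours, mins, only_mins = match.groups()
    if only_mins is not None:
        total = int(only_mins)                 # "up 5 min"
    else:
        total = int(hours) * 60 + int(mins)    # "up  5:18"
    return total + int(days or 0) * 24 * 60    # optional "N days," prefix

header = "top - 15:03:19 up  5:18,  2 users,  load average: 0.21, 0.19, 0.11"
print(uptime_minutes(header))  # 5 hours 18 minutes, expressed in minutes
```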

@jgallo said in FOG Web GUI speed and default storage activity:

I have about 18 storage nodes with 6 storage groups on a network of roughly 2500 PCs

I am wondering if our web UI code is just not ready yet to handle such a huge environment. Probably it’s best if one of us remotes in at some point to try and figure out where exactly the limits are.

JGallo

@sebastian-roth

Here is the top header from this am:

top - 07:47:12 up 16:00,  2 users,  load average: 0.01, 0.05, 0.07
Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.5 us,  3.9 sy,  0.0 ni, 90.1 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st
KiB Mem :  4035292 total,  2814336 free,   500232 used,   720724 buff/cache
KiB Swap:  4190204 total,  4190204 free,        0 used.  3219104 avail Mem

The imaging aspect of it is fine. The web UI becomes sluggish when I begin to create storage nodes and storage groups. I have updated all the nodes to working branch 57 to be consistent across the board.

george1421

@jgallo What host OS is your fog server running?

JGallo

@george1421

I have Ubuntu Server 16.04.03 LTS

Here is a screenshot of my dashboard so you can better understand what’s going on. When you click on a storage node, the information gets refreshed and it works. The same goes for the Storage Groups: they go blank, and when you refresh they come back.

Sebastian Roth

@Tom-Elliott Would you have an idea what’s going on here by any chance?

Wayne Workman

I think it’s timing out. Tom does some advanced stuff in the javascript with time & timeouts and such in order to prevent those things from blocking other things in the event of an actual problem (like a dead node).

JGallo

Could it be something on my end with the network, where packets are being lost and the FOG server isn’t receiving responses to those check-ins? Or if it’s a timeout in a script, could we increase it manually? What things should I be looking for if it’s network related?

Wayne Workman

@jgallo All those things, possibly. Let me explain how this works a bit.

The web server issues a web call to all enabled nodes when you call the homepage (the main dashboard). So when you click the home icon in FOG or log in, the home page calls this on each enabled node:
http://x.x.x.x/fog/service/getversion.php
Where x.x.x.x is the IP/name of each node. So say you had three nodes: 10.0.0.5, 10.0.0.6, 10.0.0.7
The homepage would call each one like:
http://10.0.0.5/fog/service/getversion.php
http://10.0.0.6/fog/service/getversion.php
http://10.0.0.7/fog/service/getversion.php

If they don’t respond in time because they are too busy, or the web server is just too busy to hear the responses, then it causes the error “A valid database connection could not be made”.
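The behavior described above can be sketched roughly as follows. This is not FOG's actual code, only an illustration in Python of "call getversion.php on each node and treat a late or failed response as an error". The function names, the 5-second timeout, and the use of None to stand for the error state are all assumptions made for the example:

```python
import urllib.request

def fetch_version(ip: str, timeout: float) -> str:
    """GET the version endpoint of one node; raises on timeout or connection error."""
    url = f"http://{ip}/fog/service/getversion.php"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode().strip()

def check_nodes(ips, timeout=5.0, fetch=fetch_version):
    """Return {ip: version or None}; None means no valid response arrived in time."""
    results = {}
    for ip in ips:
        try:
            results[ip] = fetch(ip, timeout)
        except OSError:
            # URLError and socket timeouts are both OSError subclasses, so a
            # busy, unreachable, or dead node all end up in this branch.
            results[ip] = None
    return results
```

Calling check_nodes(["10.0.0.5", "10.0.0.6", "10.0.0.7"]) would poll the three example nodes in turn. Polling one node at a time like this means a single slow node delays the whole pass, by up to the timeout per unresponsive node.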

Further, sitting on the homepage increases load somewhat on the nodes and web server and database - because the homepage graphs continually poll all of them for transmission stats, space stats, continually polls the DB for job status, etc.

At one of my last jobs, the “a valid database blah blah” message was common when the FOG system was under load. If imaging is still working, then you don’t have any problems. If it bothers you a lot, try to dial back the max clients value in your node’s storage management area to reduce the load. Or decrease your client checkin time. Or build faster servers to accommodate your load.

JGallo

@wayne-workman

Cool. Makes perfect sense. I must say I don’t think it’s the storage node servers, as they mostly consist of Dell R410s and R210s with 8GB to 16GB of memory. I will try to decrease the check-in time. Is the value in seconds or minutes for the check-in time in FOG?

Wayne Workman

@jgallo said in FOG Web GUI speed and default storage activity:

Is the value in seconds or minutes for the check in time in FOG?

Seconds.
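To put numbers on what the interval means for server load, here is a back-of-the-envelope sketch. It assumes every host makes exactly one request per check-in interval, which is a simplification (the real client may make more than one call per cycle); the host counts and intervals are the ones mentioned earlier in this thread:

```python
def checkins_per_second(hosts: int, interval_seconds: float) -> float:
    """Average request rate if each host checks in once per interval."""
    return hosts / interval_seconds

# 60 hosts currently registered, roughly 2500 PCs on the network;
# interval originally 60 seconds, then doubled to 120.
for hosts in (60, 2500):
    for interval in (60, 120):
        rate = checkins_per_second(hosts, interval)
        print(f"{hosts} hosts every {interval}s: {rate:.2f} requests/s")
```

Under that assumption, 60 hosts at a 120-second interval average one request every two seconds, while 2500 hosts at the same interval would average about 21 requests per second.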

JGallo

@wayne-workman

OK, so I have tried to decrease the client check-in time and it seems like it helps, but it eventually ends up causing that message. I even went to the extent of not sitting on the homepage just to make sure, and after an hour I went back to check and still had those connection messages.

I will try one more thing, and if this doesn’t work I will just deal with it, but could I raise the number of vCPUs on Hyper-V from 2 to 4? Will this help improve performance? That’s basically the only thing left to do to improve performance on the FOG server. I did make the change from dynamic memory to static. Thanks.

Jim Graczyk

@Sebastian-Roth
@george1421

Guys,

Is there any way to alter the choice of pages that the Web UI goes to upon sign in?

I’m seeing very long sign-in times, but the web UI is mostly very fast in my installations. I’ve got 9 remote sites, each with 1 storage node, and some of the sites are connected over 4G LTE.

Signing into FOG takes a varying amount of time, depending on how the 9 connections are doing.

I could do without the dashboard page and the load it generates. I’d be happy if the default page were anything but the Dashboard.

Jim

george1421

@jim-graczyk There is not currently an option for that, but (in my limited experience) it might be trivial to include that feature. The developers might include a new setting in FOG Settings->Login Settings to allow the FOG admin to change their login landing page. If you look at the URL, the only difference between the dashboard and any other page is just the node reference.

But that might raise the question: if not the dashboard, then what should be the default landing page? And might that landing page be different on a per-user basis? (Just thinking of reasons why we might not want this feature.) Either way, I would surely make a feature request out of this idea. I feel it does have a moderate amount of merit.
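To illustrate the point about the node reference: if the management pages differ only in a node query parameter, a landing page is just the same URL with a different value. The sketch below assumes a URL of the form management/index.php?node=home for the dashboard and uses host as an example of another node value; both are assumptions for illustration and are not taken from this thread, and with_node is a made-up helper name:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def with_node(url: str, node: str) -> str:
    """Return url with its 'node' query parameter set to the given value."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["node"] = [node]  # replace (or add) the node reference
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

# Assumed dashboard URL shape; x.x.x.x stands for the FOG server address.
dashboard = "http://x.x.x.x/fog/management/index.php?node=home"
print(with_node(dashboard, "host"))
```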
