FOG Web GUI speed and default storage activity

JGallo

If I recall correctly, I went from 1.4.4 to 1.5.0 RC1, which was a huge leap: new interface and all. There were replication issues at the time, so I eventually updated to 1.5.0 RC7, which still had replication issues. I then upgraded to 1.5.0 RC9, where replication had a major issue that was resolved in a working branch. I had been on working branch 57 until about an hour ago, when I moved the FOG server and all nodes to 64. I have been observing this issue ever since I moved to 1.5.0 and the new interface. I never had this issue in 1.4.4.

Wayne Workman

@jgallo said in FOG Web GUI speed and default storage activity:

The only pattern I have observed is that upon rebooting the FOG server, the valid connection messages do not appear for about an hour or so.

I’m thinking about this. I’d like you to try restarting only Apache and see whether it has the same effect. On CentOS/Fedora/RHEL it’s systemctl restart httpd; on Ubuntu 16/Debian 8/Debian 9 it’s systemctl restart apache2; and on Ubuntu 14 and earlier or Debian 7 and earlier it’s service apache2 restart
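The per-distro commands above could be wrapped in a small helper. This is only a sketch: the function name apache_restart_cmd and its arguments are made up for illustration, and the distro ID is assumed to be the ID= value from /etc/os-release. It prints the command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Hypothetical helper: print the Apache restart command for a distro.
#   $1 = distro ID (e.g. the ID= value in /etc/os-release)
#   $2 = init system, "systemd" (default) or "sysv" for older releases
apache_restart_cmd() {
    distro_id=$1
    init=${2:-systemd}
    case "$distro_id" in
        centos|fedora|rhel) svc=httpd ;;    # Red Hat family service name
        ubuntu|debian)      svc=apache2 ;;  # Debian family service name
        *) echo "unknown distro: $distro_id" >&2; return 1 ;;
    esac
    if [ "$init" = systemd ]; then
        echo "systemctl restart $svc"
    else
        echo "service $svc restart"
    fi
}

# Example use (not run here):
#   . /etc/os-release && sudo $(apache_restart_cmd "$ID")
```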

JGallo

@wayne-workman

Tried that and still get the database connection message.

Wayne Workman

@jgallo While the problem is happening, what does this command return? free -h;uptime

JGallo

@wayne-workman
FYI, I have rebooted since then because of the upgrade to the working branch.

```
              total        used        free      shared  buff/cache   available
Mem:            15G        510M        181M         47M         14G         14G
Swap:          4.0G        268K        4.0G
 15:59:54 up  1:56,  2 users,  load average: 0.24, 0.19, 0.21
```

Wayne Workman

@jgallo You have 14GB cached in RAM, which I’ll say is substantial. You can clear that with the below command:
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
Does the issue resolve after this? Also, check free -h afterwards.
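To compare buff/cache before and after dropping caches without eyeballing the table, something like the sketch below could work. The function name buff_cache_mib is made up for illustration, and it assumes the six-column free layout shown in this thread (buff/cache as the fifth value on the Mem: line).

```shell
#!/bin/sh
# Hypothetical helper: read `free -m` output on stdin and print the
# buff/cache value in MiB. Assumes the column layout
#   total used free shared buff/cache available
buff_cache_mib() {
    awk '$1 == "Mem:" { print $6 }'
}

# Example use (needs root, not run here):
#   before=$(free -m | buff_cache_mib)
#   sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
#   after=$(free -m | buff_cache_mib)
#   echo "buff/cache: ${before} MiB -> ${after} MiB"
```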

JGallo

@wayne-workman

It went down to 139MB. LOL, damn, that was a lot of cache. Wish I had it in my pocket.

JGallo

@wayne-workman

This was just a minute of running free -h

```
              total        used        free      shared  buff/cache   available
Mem:            15G        427M         14G         47M        289M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        434M         14G         47M        289M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        484M         14G         47M        292M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        485M         14G         47M        292M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        486M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        486M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        487M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        487M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        488M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        488M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        490M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        489M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        491M         14G         47M        302M         14G
```

Wayne Workman

@jgallo Most importantly, are the nodes reporting properly?

JGallo

@wayne-workman

I rebooted the server just in case and went home for the day. I will keep an eye on the nodes and see how they are in the morning, and also see how fast the cache grows. It has gone up to 305MB right now, but the web UI is substantially snappier.

Wayne Workman

@jgallo If this solves the problem, we can set up a cron job that does this every hour.
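A minimal sketch of what such an hourly job might look like, assuming a system that reads /etc/cron.d. The file name fog-drop-caches and the function write_drop_caches_cron are made up for illustration; the cron entry reuses the drop_caches command from earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: write a cron.d entry that drops the page cache at
# the top of every hour. Pass a different path to inspect the file first.
write_drop_caches_cron() {
    target=${1:-/etc/cron.d/fog-drop-caches}
    cat > "$target" <<'EOF'
# Assumed schedule: minute 0 of every hour, run as root.
0 * * * * root sync; echo 3 > /proc/sys/vm/drop_caches
EOF
}

# Example use (needs root, not run here):
#   sudo sh -c '. ./this-script.sh; write_drop_caches_cron'
```

Writing to a scratch path first (for example write_drop_caches_cron /tmp/preview) lets you read the entry before it goes live.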

JGallo

@wayne-workman

Came in and checked the FOG web UI, and the messages still persist. I checked the server and the cache is at 571MB. I will say this: clearing the cache definitely fixed the web UI speed.

Wayne Workman

@jgallo Try clearing the cache the same way on one of the nodes, see if it fixes that node.

JGallo

@wayne-workman

I have done that on all the nodes that I have the graph enabled for. I rebooted the server and am observing whether it helps. How fast should buff/cache be growing? On a few nodes it spikes really fast, while on others it just levels off around 1GB even though they have 16GB. For example, one of my nodes has 30GB of memory available; after I cleared the cache, it grew to over 5GB within 1 minute and to 10GB after another minute.

Wayne Workman

@jgallo Having cache in RAM isn’t a bad thing. Linux does a really good job of managing memory; it’s just that something about FOG makes the cache ineffective (what’s cached is not what’s actually used often). The cache grows as the Linux kernel does work, and the less work it does, the slower the cache grows. Eventually, though, and by design, RAM fills up with cache in order to make future requests faster. Again, in the case of FOG, what’s cached is not what gets used over and over.
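To put numbers on how fast the cache grows on a given node, a rough sampler like the one below might help. The function names are made up for illustration. It assumes that buff/cache in free corresponds to Buffers + Cached + SReclaimable in /proc/meminfo, which is how recent procps versions document it.

```shell
#!/bin/sh
# Hypothetical helper: sum Buffers, Cached and SReclaimable (in kB) from
# meminfo-formatted text on stdin.
buff_cache_kib() {
    awk '$1 == "Buffers:" || $1 == "Cached:" || $1 == "SReclaimable:" { sum += $2 }
         END { print sum + 0 }'
}

# Hypothetical helper: print "HH:MM:SS <kB>" every $1 seconds, $2 times.
watch_buff_cache() {
    interval=${1:-10}
    count=${2:-6}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%T)" "$(buff_cache_kib < /proc/meminfo)"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example use (not run here): one sample every 10 seconds for a minute
#   watch_buff_cache 10 6
```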

JGallo

@wayne-workman

So far it has not prompted me about database connections. Earlier I cleared the cache on all nodes and rebooted. It seems to be working: no database connection errors have appeared, and I have been sitting on the dashboard the whole time.

Wayne Workman

@jgallo So, rebooting after you clear the cache nullifies the effects that clearing the cache does, because rebooting does the same thing. I’m trying to isolate if clearing the cache fixes the problems for you.

JGallo

@wayne-workman

ahh ok. I figured rebooting would reboot all services. Maybe the storage nodes needed a reboot this whole time LOL. There were a few that have been on non-stop for about 90 days. If the messages return, I will do a clear cache and not reboot.

JGallo

@wayne-workman

So after some time the message came back again, even after clearing the cache and not rebooting. We had to image a computer, and I noticed that when it tried to check in after the iPXE files loaded, prior to imaging, the check-in error showed the same database connection message that the storage nodes show on the dashboard. I will image another computer and screenshot that error.

JGallo

@wayne-workman

Sorry about the late reply. We had a rogue router on a campus, which we were hunting for and eventually found. The database connection error that appeared after host registration and before imaging has disappeared. I’m going to restart again and continue to monitor the issue.
