FOG Web GUI speed and default storage activity
-
If I recall correctly, I went from 1.4.4 to 1.5.0 RC1. I know it was a huge leap when I upgraded, new interface and all. There were replication issues at that time, so I eventually updated to 1.5.0 RC7, which still had replication issues. I then upgraded to 1.5.0 RC9, which had a major replication issue that was resolved in a working branch. So I had been on working branch 57 until about an hour ago, when I moved the FOG server and all nodes to 64. I have been observing this issue ever since I moved to 1.5.0 with the new interface. I never had it on 1.4.4.
-
@jgallo said in FOG Web GUI speed and default storage activity:
The only pattern I have observed is that upon rebooting the fog server, the valid connection messages do not appear for about an hour or so.
I’m thinking about this - I’d like you to try to restart only Apache and see if it has the same effect or not. On CentOS/Fedora/RHEL it’s
systemctl restart httpd
and on Ubuntu 16/Debian 8/Debian 9 it’s
systemctl restart apache2
and on Ubuntu 14 and earlier or Debian 7 and earlier it’s
service apache2 restart
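If you are not sure which of those applies to your box, here is a minimal sketch that picks one of the commands quoted above. The function name `pick_apache_restart_cmd` is made up for illustration, and the check for `/etc/httpd` is only an assumption about how to tell a RHEL-family host from a Debian-family one. It prints the command instead of running it, so you can look at it first.

```shell
#!/bin/sh
# Hypothetical helper: print the Apache restart command that fits this host.
# Assumption: the service is "httpd" on CentOS/Fedora/RHEL and "apache2" on
# Debian/Ubuntu, matching the commands quoted in this post.
pick_apache_restart_cmd() {
    if command -v systemctl >/dev/null 2>&1; then
        # systemd hosts: assume RHEL-family if /etc/httpd exists, else Debian-family
        if [ -d /etc/httpd ]; then
            echo "systemctl restart httpd"
        else
            echo "systemctl restart apache2"
        fi
    else
        # No systemctl: older releases such as Ubuntu 14 or Debian 7
        echo "service apache2 restart"
    fi
}

# Print the command rather than running it, so it can be reviewed first.
pick_apache_restart_cmd
```

Run the printed command with sudo once it looks right for your distribution.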
-
Tried that and still get the database connection message.
-
@jgallo While the problem is happening, what does this command return?
free -h;uptime
-
@wayne-workman
I have rebooted due to the upgrade to the working branch, FYI.
              total        used        free      shared  buff/cache   available
Mem:            15G        510M        181M         47M         14G         14G
Swap:          4.0G        268K        4.0G
 15:59:54 up  1:56,  2 users,  load average: 0.24, 0.19, 0.21
-
@jgallo You have 14GB cached in RAM, which I’ll say is substantial. You can clear that with the below command:
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
Does the issue resolve after this? Also, check free -h afterwards.
-
It went down to 139MB. LOL, damn, that was a lot of cache. Wish I had it in my pocket.
-
This was just a minute of running free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        427M         14G         47M        289M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        434M         14G         47M        289M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        484M         14G         47M        292M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        485M         14G         47M        292M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        486M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        486M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        487M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        487M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        488M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        488M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        490M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        489M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        491M         14G         47M        302M         14G
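For anyone who wants to watch the cache grow without retyping the command, here is a rough sketch that logs a timestamped sample at a fixed interval. The function names `cached_mib` and `sample_cache` are made up for illustration. It reads the `Cached:` field from `/proc/meminfo`, which is only part of what `free` reports as buff/cache, so the numbers will be close to the ones above but not identical.

```shell
#!/bin/sh
# Hypothetical helper: print the page-cache size in MiB from a meminfo-style file.
# Defaults to /proc/meminfo, where the "Cached:" field is reported in kB.
cached_mib() {
    awk '/^Cached:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: log COUNT timestamped samples, INTERVAL seconds apart.
sample_cache() {
    count="${1:-12}"
    interval="${2:-5}"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date '+%H:%M:%S') cached=$(cached_mib)MiB"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling `sample_cache 12 5` takes twelve samples five seconds apart, which covers about a minute like the run above.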
-
@jgallo Most importantly, are the nodes reporting properly?
-
I rebooted the server just in case and went home for the day. I will keep an eye on the nodes and see how they are in the morning, and also see how fast the cache grows. It has gone up to 305MB right now, but the web UI is substantially snappier.
-
@jgallo If this solves the problem, we can set up a cron job that does this every hour.
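A minimal sketch of what that hourly job could look like, assuming it runs as root and that the host has an `/etc/cron.hourly` directory. The function name `drop_caches` and its optional path argument are made up for illustration; the argument exists only so the function can be tried against a scratch file instead of the real kernel interface.

```shell
#!/bin/sh
# Hypothetical helper: flush dirty pages to disk, then ask the kernel to drop
# the page cache plus dentries and inodes (the value 3).
# Writing to the default target requires root.
drop_caches() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync
    echo 3 > "$target"
}
```

To use it, add a bare `drop_caches` call at the end and save the script as an executable file such as `/etc/cron.hourly/drop-caches` (a hypothetical name). On Debian and Ubuntu the file name should not contain a dot, or run-parts will skip it.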
-
I came in and checked the FOG Web UI, and the messages still persist. I checked the server and the cache is at 571MB. I will say this: clearing the cache definitely fixed the web UI speed.
-
@jgallo Try clearing the cache the same way on one of the nodes, see if it fixes that node.
-
I have done that on all the nodes that have the graph enabled. I rebooted the server and am watching to see if it helps. How fast should the buff/cache be growing? I’m noticing that on a few nodes it spikes really fast, while on other nodes it settles around 1GB even though they have 16GB. For example, one of my nodes has 30GB of memory available and I cleared its cache. Within 1 minute the cache grew to over 5GB, and after another minute it had grown to 10GB.
-
@jgallo Having cache in RAM isn’t a bad thing. Linux does a really amazing job at managing memory; it’s just that something about fog causes the cache to not be effective (what’s cached is not what’s actually used often). The cache grows as the Linux kernel does stuff. The less stuff it does, the slower the cache grows. Eventually though, and by design, the RAM gets filled with cache in order to make future requests faster. Again, in the case of fog, what’s cached is not what is getting used over and over.
-
So far it has not prompted me with database connection errors. Earlier I cleared the cache on all nodes and rebooted. It seems to be working, because nothing about database connections has come up and I have been sitting on the dashboard since.
-
@jgallo So, rebooting after you clear the cache masks the effect of clearing the cache, because rebooting does the same thing. I’m trying to isolate whether clearing the cache fixes the problems for you.
-
Ahh, OK. I figured rebooting would restart all services. Maybe the storage nodes needed a reboot this whole time, LOL. There were a few that had been on non-stop for about 90 days. If the messages return, I will clear the cache and not reboot.
-
So after some time the message came back again, even after clearing the cache without rebooting. We had to image a computer, and I noticed that when it tried to check in after the iPXE files loaded, prior to imaging, the check-in error showed the same message as the storage nodes on the dashboard: valid database connection could be made. I will image another computer and screenshot that error.
-
Sorry about the late reply here. We had a rogue router on a campus, which we were hunting for and eventually found. The error message about the database connection after host registration and prior to actually imaging has disappeared. I am going to restart again and continue to monitor the issue.