FOG Web GUI speed and default storage activity
-
@wayne-workman
I have rebooted due to upgrade to working branch. FYI.
```
              total        used        free      shared  buff/cache   available
Mem:            15G        510M        181M         47M         14G         14G
Swap:          4.0G        268K        4.0G
 15:59:54 up  1:56,  2 users,  load average: 0.24, 0.19, 0.21
```
-
@jgallo You have 14GB cached in RAM, which I’ll say is substantial. You can clear that with the below command:
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
Does the issue resolve after this? Also, check `free -h` afterwards.
-
It went down to 139MB. LOL, damn, that was a lot of cache. Wish I had it in my pocket.
-
This was just a minute of running `free -h`:
```
              total        used        free      shared  buff/cache   available
Mem:            15G        427M         14G         47M        289M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        434M         14G         47M        289M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        484M         14G         47M        292M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        485M         14G         47M        292M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        486M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        486M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        487M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        487M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        488M         14G         47M        301M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        488M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        490M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        489M         14G         47M        302M         14G
Swap:          4.0G          0B        4.0G
administrator@VUSD-FOG:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        491M         14G         47M        302M         14G
```
-
@jgallo Most importantly, are the nodes reporting properly?
-
I rebooted the server just in case and went home for the day. I will keep an eye on the nodes and see how they are in the morning, and also see how fast the cache grows. It has gone up to 305MB right now, but the web UI is substantially snappier.
-
@jgallo If this solves the problem, we can set up a cron job that does this every hour.
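For reference, a minimal sketch of what such an hourly job could look like, assuming it goes into root's crontab (opened with `sudo crontab -e`) and reuses the same drop_caches command given earlier in the thread. It is only a workaround to test with, not a fix for whatever fills the cache:

```
# m h dom mon dow  command
# At minute 0 of every hour: flush dirty pages, then drop page cache, dentries and inodes.
0 * * * * sync; echo 3 > /proc/sys/vm/drop_caches
```

Because the entry lives in root's crontab, no `sudo` is needed inside the command itself.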
-
Came in and checked the FOG Web UI and the messages still persist. I checked the server and the cache is at 571MB. I will say this: clearing the cache definitely fixed the web UI speeds.
-
@jgallo Try clearing the cache the same way on one of the nodes, see if it fixes that node.
-
I have done that on all the nodes I have the graph enabled for. I rebooted the server and am observing whether it helps. How fast should the buff/cache be growing? I’m noticing that on a few nodes it spikes really fast, while on other nodes it just levels off around 1GB even though they have 16GB. For example, one of my nodes has 30GB of memory available and I cleared the cache. Within 1 minute it grew to over 5GB, and after another minute it had grown to 10GB.
-
@jgallo Having cache in RAM isn’t a bad thing. Linux does a really amazing job at managing memory; it’s just that something about FOG causes the cache to not be effective (what’s cached is not what’s actually used often). Cache size grows as the Linux kernel does stuff. The less stuff it does, the slower the cache grows. Eventually though - and by design - the RAM gets filled with cache in order to make future requests faster. Again, in the case of FOG, what’s cached is not what is getting used over and over.
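To watch how fast the cache grows without eyeballing repeated `free -h` runs, a small sketch along these lines may help. It assumes a Linux `/proc/meminfo` and adds up `Buffers`, `Cached` and `SReclaimable`, which is what recent procps versions of `free` report as buff/cache (worth checking against `man free` on your system). The function name `buff_cache_mib` is made up for this example.

```shell
# Print buff/cache in MiB from /proc/meminfo-formatted text on stdin.
# /proc/meminfo reports sizes in kB, so the summed value is divided by 1024.
buff_cache_mib() {
    awk '$1 == "Buffers:" || $1 == "Cached:" || $1 == "SReclaimable:" { kb += $2 }
         END { printf "%d\n", kb / 1024 }'
}
```

`buff_cache_mib < /proc/meminfo` prints a single number; wrapping it in something like `while sleep 5; do buff_cache_mib < /proc/meminfo; done` gives a rough growth curve after a cache clear.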
-
So far it has not prompted me with database connection errors. Earlier I cleared the cache on all nodes and rebooted. It seems to be working, because I have been sitting on the dashboard since and nothing has come up.
-
@jgallo So, rebooting after you clear the cache nullifies the test, because rebooting clears the cache as well. I’m trying to isolate whether clearing the cache alone fixes the problems for you.
-
Ahh, OK. I figured rebooting would restart all services. Maybe the storage nodes needed a reboot this whole time, LOL. There were a few that had been on non-stop for about 90 days. If the messages return, I will clear the cache and not reboot.
-
So after some time the message came back again, even after clearing the cache and not rebooting. We had to image a computer, and I noticed that when the computer tried to check in after the iPXE files loaded, prior to imaging, the error it showed was the same message as the storage nodes on the dashboard: valid database connection could not be made. I will image another computer and screenshot that error.
-
Sorry about the late reply here. We had a rogue router on a campus which we were hunting for and eventually found. The error message about the database connection after host registration and prior to actually imaging disappeared. Going to restart again and continue to monitor the issue.
-
@jgallo That makes a lot of sense now actually - because nothing about this problem was making sense before. I guess there were IP conflicts on your network causing all the issues - or possibly a routing loop.
Glad it seems to be fixed - let us know if it’s not.
-
Yeah, these high schoolers get curious, and it wouldn’t be the first time we have dealt with rogue routers.
-
So after several different attempts to resolve this issue, it continues to persist. The pattern I have observed is that it now takes about an hour for the issue to begin. I have gone into the `/etc/mysql/mysql.conf.d/mysqld.cnf` file and edited the bind address on each node, because in a separate post it was recommended to use the IP of the node instead of the loopback. I don’t think this worked, because I restarted the server and node after these changes and the problems persist. I also went into `/var/www/fog/lib/fog/fogurlrequests.class.php` and changed `aconntimeout = 2000` to 10000, as well as `counttimeout = 15;` to 30. Would it be possible for @Tom-Elliott or someone to remote in and check this out? I know all the passwords for the fogstorage accounts are accurate because imaging works fine. Registering computers at sites works fine, and image replication to storage groups works fine as well. I could start from scratch and add one node at a time, but I would not be able to put a test FOG server in place, since this is a production FOG system. I have backups of the images, so starting from scratch is not a problem. I have a strange feeling this could be related to Ubuntu and updates with PHP7. Thank you.
-
Another observation is that after running `sudo service mysql restart` on the FOG server, the error goes away for DefaultMember in the dashboard. I tried the same thing on the other storage nodes, but the message is still in the dashboard. Maybe this has something to do with remote connections to the MySQL database.
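
One way to test the remote-connection idea, as a rough sketch only. It assumes the main server's MySQL config is the `mysqld.cnf` file mentioned earlier, that the nodes connect as the `fogstorage` user, and that the `mysql` client is installed on the node. The function names and the IP argument are placeholders made up for this example.

```shell
# Print the active bind-address value(s) from a MySQL config read on stdin.
# Commented-out lines (starting with #) are ignored.
bind_address() {
    awk -F= '/^[ \t]*bind-address/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# From a storage node, run a trivial query against the main server.
# $1 = main server IP (placeholder); prompts for the fogstorage password.
check_remote_mysql() {
    mysql -h "$1" -u fogstorage -p -e 'SELECT 1;'
}
```

On the main server, `bind_address < /etc/mysql/mysql.conf.d/mysqld.cnf` shows which address mysqld is told to listen on; if it prints `127.0.0.1`, only local connections are accepted. On a storage node, `check_remote_mysql` followed by the main server's IP should return a one-row result if the connection and credentials are good, and a connection error otherwise.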