FOG Server CPU usage 100%
-
When I try to send a group task to this group of 2 computers, the browser becomes very slow, and this is what mysql shows:
MariaDB [fog]> show full processlist;
+----------+------+-----------+------+---------+------+----------------------+----------------------------+----------+
| Id       | User | Host      | db   | Command | Time | State                | Info                       | Progress |
+----------+------+-----------+------+---------+------+----------------------+----------------------------+----------+
| 11592370 | root | localhost | fog  | Sleep   |    1 |                      | NULL                       |    0.000 |
| 11602122 | root | localhost | fog  | Sleep   | 2803 |                      | NULL                       |    0.000 |
| 11611211 | root | localhost | fog  | Query   |    0 | NULL                 | show full processlist      |    0.000 |
| 11611851 | root | localhost | fog  | Sleep   |  992 |                      | NULL                       |    0.000 |
| 11615468 | root | localhost | fog  | Sleep   |  408 |                      | NULL                       |    0.000 |
| 11615586 | root | localhost | fog  | Sleep   |  387 |                      | NULL                       |    0.000 |
| 11617654 | root | localhost | fog  | Sleep   |   49 |                      | NULL                       |    0.000 |
| 11617704 | root | localhost | fog  | Sleep   |    0 |                      | NULL                       |    0.000 |
| 11617918 | root | localhost | fog  | Sleep   |    2 |                      | NULL                       |    0.000 |
| 11617931 | root | localhost | fog  | Query   |    0 | Copying to tmp table | (long SELECT, shown below) |    0.000 |
+----------+------+-----------+------+---------+------+----------------------+----------------------------+----------+
10 rows in set (0.00 sec)

Full text of the SELECT in the last row:

SELECT `hostID`
FROM `hosts`
LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`
LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage`
LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`
LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`
LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`
LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`
LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`
LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`
WHERE `hostMAC`.`hmPrimary` = '1'
ORDER BY LOWER(`hosts`.`hostName`) ASC
But something is wrong, because the query is not immediate: it takes 15 seconds to appear.
-
@fernando-gietz OK, I’ll tell you why I asked. I’m looking at this as a VM issue, not a FOG issue (yet).
6 vCPUs is a HUGE number for a VM. I checked the processor from your lscpu post, and that processor has 6 cores (12 hyperthreads). The problem with assigning too many vCPUs is that it can actually slow your server down, because the hypervisor has to find a time slot in which 6 processors are available at once. You will see this as higher wait times in this VM’s stats than in your other VMs. Your VM host probably has two CPUs (if it has four, wonderful), and if it is busy with a lot of virtual machines, your FOG VM will suffer because the windows in which 6 vCPUs are free will be quite small. So what should you do? It may sound backwards, but drop your vCPU count to 4 (I would say 2 vCPUs, but start with 4) and see if performance improves. Watch your top command. Right now your load average is pretty good; with 4 CPUs, keep the load average below 2.00.
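To make the load-average target above easy to check, here is a minimal sketch in shell. The function name load_below is made up for illustration, and the only assumption about the system is that /proc/loadavg exists, as it does on Linux.

```shell
# load_below is a hypothetical helper name, not part of FOG or any tool.
# It succeeds (exit status 0) when the given load average is below the limit.
load_below() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load + 0 < limit + 0) }'
}

# Usage on the server: the first field of /proc/loadavg is the 1-minute load.
#   if load_below "$(cut -d' ' -f1 /proc/loadavg)" 2.00; then
#       echo "load is within the target"
#   fi
```

The comparison is done in awk because plain sh arithmetic cannot compare decimal values such as 1.85 and 2.00.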
Aside from your potential vCPU issue since you are using the database when you see the slowness, we should probably run a test and see how fast your disk subsystem is on your virtual server. I was working on a post about making FOG go faster here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast There are some statistics in this post https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/6 that you can compare your system against. Just use the dd command to create a 1GB test file and check your disk speeds.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
While you can’t really do anything about the results, you will know how your VM host server compares to my benchmark testing.
Finally, I have been working on another thread about moving the PHP processing from apache to php-fpm to improve web GUI performance, because it off-loads PHP processing from apache to a dedicated PHP engine. That post is here: https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast When you get the rest of your issues fixed, it would be interesting to see (because you have a large campus) whether this solution helps with web GUI performance.
And finally, you may have reached a limit where you might consider spinning up a new virtual machine dedicated to mysql (2 vCPU) that will be optimized for only mysql activities.
-
@george1421 said in FOG Server CPU usage 100%:
Aside from your potential vCPU issue since you are using the database when you see the slowness, we should probably run a test and see how fast your disk subsystem is on your virtual server. I was working on a post about making FOG go faster here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast There are some statistics in this post https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/6 that you can compare your system against. Just use the dd command to create a 1GB test file and check your disk speeds.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
While you can’t really do anything about the results, you will know how your VM Host server compares to my benchmark testing.
I have done the tests (I can only test the NFS):
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 3.13593 s, 342 MB/s
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 3.01033 s, 357 MB/s
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 3.01894 s, 356 MB/s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.61914 s, 410 MB/s
real 0m2.626s
user 0m0.021s
sys 0m0.741s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.63524 s, 407 MB/s
real 0m2.642s
user 0m0.020s
sys 0m0.676s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.61491 s, 411 MB/s
real 0m2.617s
user 0m0.019s
sys 0m0.685s
-
@fernando-gietz In this case I was only interested in the local file creation speed, not NFS, since that brings in more things (network, NFS subsystem, etc.) that are not part of the issue here.
Based on your numbers, the disk subsystem shouldn’t be your issue here. Your disks should be more than fast enough for them to keep up with mysql. I think I would start with dropping the number of vCPUs on this VM and see how well your performance is then.
-
@george1421 said in FOG Server CPU usage 100%:
@fernando-gietz In this case I was only interested in the local file creation speed, not NFS, since that brings in more things (network, NFS subsystem, etc.) that are not part of the issue here.
Based on your numbers, the disk subsystem shouldn’t be your issue here. Your disks should be more than fast enough for them to keep up with mysql. I think I would start with dropping the number of vCPUs on this VM and see how well your performance is then.
I tried viewing the group membership in our test environment:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 37
Model name:            Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Stepping:              1
CPU MHz:               2665.526
BogoMIPS:              5333.52
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-3
And the behaviour is the same: about 2 minutes to see the two computers of the group.
-
@fernando-gietz And what is top saying? Please post the header, and tell us what the load on mysql looks like. I want to ensure we have not made things worse.
The default check-in time for the fog clients connected to this fog server is 5 minutes. So I could understand it if, with 400 clients, you were overloading mysql or apache.
My initial reaction is to have you try the php-fpm route (because I had a good improvement in performance on my system). But I want to make sure that there isn’t something else going on first.
How long has this virtual machine been in operation? (I’m thinking that if it’s been used for a long time, we might want to look into some mysql maintenance.)
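As a rough feel for the check-in load mentioned above, the average rate is simply the number of clients divided by the check-in interval. This is a back-of-the-envelope sketch only: the helper name checkins_per_second is invented here, it assumes check-ins are spread evenly over the interval, and it says nothing about how many web requests or queries a single check-in causes.

```shell
# checkins_per_second is a hypothetical helper name used only for this sketch.
# Average client check-ins per second for HOSTS clients that each check in
# once every INTERVAL seconds (assumes check-ins are evenly spread).
checkins_per_second() {
    awk -v hosts="$1" -v interval="$2" 'BEGIN { printf "%.2f\n", hosts / interval }'
}

# 400 clients with the default 5-minute (300 s) check-in time:
checkins_per_second 400 300
```

For 400 clients this prints 1.33, i.e. a little over one check-in per second on average.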
-
I have captured the browser behaviour with the Firefox developer tools. Maybe the development team can spot something.
You can download the JSON file from:
-
I have made a new capture of the browser’s performance with the Firefox developer tools, and I can see the following:
Most of the time, the browser is calling the setTimeout function.
-
Probably it’s best to wait for @Tom-Elliott to look into this. He knows the web UI better than anyone else!
-
Hello!!
Any news about it?
-
Mind clearing the memory cache on the server? It seems to me that the setTimeout is being blocked somewhere; it shouldn’t have any trouble running, since it is only called when the timeout is met.
-
@tom-elliott said in FOG Server CPU usage 100%:
Mind clearing the memory cache on the server?
I tried to clear the memory cache on the server with the following command:
sync; echo 3 > /proc/sys/vm/drop_caches
But the problem persists!!
Any more ideas?
-
One more thing.
My hosts table has 7000 entries. I think the problem is in the code; maybe it is not optimized.
I dropped 6000 entries from the table, and now the response time is more normal: 10-15 seconds.
@Tom-Elliott how does FOG handle the membership?
-
@fernando-gietz I was thinking about this - last time someone had this issue, it was the mysql max connections being hit.
Maybe one of the FOG updates or an OS update (especially likely with Ubuntu systems) wiped out our max connection setting.
Anyways, here’s what might fix it:
https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_MySQL#Increase_maximum_simultaneous_MySQL_connections
Make sure to do the query in mysql AND to change the conf setting too, so the change persists across reboots!
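Before raising the limit, it can help to see how close the server has actually come to it. The sketch below is only an illustration: conn_usage_pct is a made-up helper name, and the commented usage assumes the mysql client can log in without prompting. Max_used_connections and max_connections are the standard MySQL/MariaDB status and system variable names.

```shell
# conn_usage_pct is a hypothetical helper name, not a MySQL or FOG command.
# Prints the peak connection count as a whole-number percentage of the limit.
conn_usage_pct() {
    awk -v used="$1" -v max="$2" 'BEGIN { printf "%.0f\n", 100 * used / max }'
}

# Usage against a live server (assumes passwordless login for the mysql client):
#   used=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'" | awk '{print $2}')
#   max=$(mysql -N -B -e "SHOW GLOBAL VARIABLES LIKE 'max_connections'" | awk '{print $2}')
#   conn_usage_pct "$used" "$max"
```

A result at or near 100 would suggest the limit has been hit at some point since the server started; a low number points away from max connections as the cause.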
-
@fernando-gietz said in FOG Server CPU usage 100%:
My hosts table has 7000 entries
Admittedly, this is a whole whole lot of hosts reporting to one server.
At a past job, we had in one setup maybe 2k hosts going to one FOG server; that FOG server had 8 cores and 8GB of RAM. It’s a pretty beefy server, but it did a ton of work: it managed a ton of technicians, a ton of images, and a ton of imaging, snapins and storage nodes. That box was really busy, and that’s with 2k hosts. You say you have 7,000 hosts. I’d say you need a 16-core server with 16GB of RAM, MySQL configured to use more than one thread, and solid state disks hosting the database. Never image from that server; make storage nodes handle the imaging loads, and keep those loads off of your main server. If you wanted to go even further, you could host the FOG database on its own (very fast) server, and the rest of the main FOG server on another box.
-
After the UEFI issue, I have time for this XD
@wayne-workman said in FOG Server CPU usage 100%:
@fernando-gietz I was thinking about this - last time someone had this issue, it was the mysql max connections being hit.
Maybe one of the FOG updates or an OS update (especially likely with Ubuntu systems) wiped out our max connection setting.
Anyways, here’s what might fix it:
https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_MySQL#Increase_maximum_simultaneous_MySQL_connections
Make sure to do the query in mysql AND to change the conf setting too, so the change persists across reboots!
I have increased the connections, and the trouble continues.
I have set up mysql to log slow queries, and nothing appears in the log.
The problem is apache and its activity: it reaches 100% CPU. If you look at the capture of the browser activity, the problem begins when the browser tries to “Dibujar” (in English, “Draw”). I think that the browser has to draw all elements, both visible and hidden.
On the membership page you can see the list of members, but the browser doesn’t show the hosts that are not in the group; to see them you need to tick the checkbox, and then all of them appear. This list is not created on the fly; the browser “drew” it beforehand. In my case, the browser needs to draw 7000 elements.
-
@fernando-gietz The problem, as I’m understanding it, however, is on the Server, not the client machine. The “Drawing” should only be happening on the Browser, so the setTimeout is doing something a little funny, though I’m not 100% sure what.
-
Mind checking if the Location Plugin has a bunch of erroneous entries in it? I’ve seen/heard of people having a real hard time with CPU because the mysql database is just crammed full of data. The last time I saw it, there were a bunch (tens of thousands) of entries in one of the tables. @Wayne-Workman may remember better than me. He sort of replicated this problem by accidentally uploading a binary data file (like a TeamViewer installer) into the CSV importer. This spammed the database with TONS of data, and each request to the database required a rescan on the HDD, causing extremely high CPU load.
-
Possibly it’s just that apache hits a wall with too many client requests and too few worker threads?
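One way to get a quick look at this is to count the running apache processes and add up their CPU share. The sketch below is illustrative only: summarize_workers is an invented name, and the process name in the usage line is an assumption (httpd on Red Hat style systems, apache2 on Debian and Ubuntu).

```shell
# summarize_workers is a hypothetical helper name used only for this sketch.
# Reads one CPU percentage per line on standard input and prints how many
# lines (processes) there were and their combined CPU percentage.
summarize_workers() {
    awk '{ n++; total += $1 } END { printf "workers=%d cpu=%.1f\n", n, total }'
}

# Usage (process name is an assumption; use apache2 on Debian/Ubuntu):
#   ps -C httpd -o %cpu= | summarize_workers
```

If the worker count sits at the configured maximum while requests queue up, that would support the too-few-workers theory; a handful of workers each burning CPU would point elsewhere.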
-
@sebastian-roth It’s possible, but less likely. While a ton of machines hitting the system at the same time can cause issues, I think the communication is so minuscule that I’m going to chalk this up to constant reloading of the Database, which is likely in a HUGE state.
@Fernando-Gietz You could try testing this theory by getting a backup of the current database, then clearing out the larger table sets, taskLog, historyLog, snapinJobs, imagingLog, etc…
Things that are “logging” or have no real value in being extremely large.
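To see which tables are actually large before clearing anything, the standard information_schema.tables view can be queried. This is a sketch under assumptions: largest_tables_sql is a made-up helper that only prints SQL, the schema name fog is taken from the processlist output earlier in the thread, and the commented usage assumes the mysql client can log in without prompting. Note that table_rows is only an estimate for InnoDB tables.

```shell
# largest_tables_sql is a hypothetical helper name; it only prints SQL text
# that lists the ten largest tables of the schema named in its first argument.
largest_tables_sql() {
    cat <<EOF
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = '$1'
ORDER BY (data_length + index_length) DESC
LIMIT 10;
EOF
}

# Usage (assumes the mysql client can log in without prompting):
#   mysqldump fog > fog_backup.sql      # take the backup first
#   largest_tables_sql fog | mysql -t   # then list the biggest tables
```

Taking the mysqldump backup first means any table that gets cleared for the test can be restored afterwards.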