FOG Server CPU usage 100%
-
lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             6
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 37
Model name:            Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Stepping:              1
CPU MHz:               2665.658
BogoMIPS:              5333.52
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-5
-
@fernando-gietz The FOG server is a virtual machine, right? How many cores does your VM host server have? (Your SOC team may have to answer that.)
-
@george1421 And a follow up to that question, how many other VMs are running on this VM Host server?
-
@fernando-gietz start up a group task, then issue this command to MySQL:
SHOW FULL PROCESSLIST
Give us a copy/paste of a large chunk of the output - this will let us know what the problem is.
-
@george1421 said in FOG Server CPU usage 100%:
@george1421 And a follow up to that question, how many other VMs are running on this VM Host server?
I don’t know the exact number, but a lot. The FOG server runs in the university VM infrastructure, where all of the university's servers are hosted.
We have 6 cores and 12 GB of RAM dedicated to it.
@wayne-workman said in FOG Server CPU usage 100%:
@fernando-gietz start up a group task, then issue this command to MySQL:
SHOW FULL PROCESSLIST
Give us a copy/paste of a large chunk of the output - this will let us know what the problem is.
show full processlist;
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 11592370 | root | localhost | fog | Sleep | 2 | | NULL | 0.000 |
| 11602122 | root | localhost | fog | Sleep | 1802 | | NULL | 0.000 |
| 11602143 | root | localhost | fog | Sleep | 1791 | | NULL | 0.000 |
| 11608096 | root | localhost | fog | Sleep | 585 | | NULL | 0.000 |
| 11611211 | root | localhost | fog | Query | 0 | NULL | show full processlist | 0.000 |
| 11611721 | root | localhost | fog | Sleep | 8 | | NULL | 0.000 |
| 11611731 | root | localhost | fog | Sleep | 7 | | NULL | 0.000 |
| 11611744 | root | localhost | fog | Sleep | 3 | | NULL | 0.000 |
| 11611766 | root | localhost | fog | Sleep | 1 | | NULL | 0.000 |
| 11611772 | root | localhost | fog | Query | 0 | statistics | SELECT `msID` FROM `moduleStatusByHost` WHERE `moduleStatusByHost`.`msHostID`='608' AND `moduleStatusByHost`.`msModuleID` IN 
('5745','5746','5747','5748','6945','47','48','59','55','3567','3568','49','1851','1852','1853','1854','1855','1856','66','4130','4131','4132','4133','4134','4135','1660','4136','4137','4138','4139','4140','4141','4142','4143','4144','4145','4146','4147','4148','4149','4150','3633','4151','4152','4153','4154','4155','4156','4157','4158','3569','3570','3571','3572','3573','3574','3575','3576','3577','3578','3579','3580','3581','3582','3583','3584','3585','3586','3587','3588','3589','3590','3591','3592','3593','3594','3634','3595','3596','3597','3598','1985','1986','1987','1988','1989','1990','1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021','2022','2023','2024','2025','2026','2027','2028','2029','2030','2031','2032','2033','50','51','6946','6947','6948','6949','6950','6951','6952','6953','6954','6955','6956','6957','6958','6959','6960','6961','6962','6963','6964','6965','6966','6967','6968','6969','6970','6971','6972','6973','6974','6975','6976','6977','6978','6979','6980','6981','6982','6983','6984','6985','6986','6987','6988','6989','6990','6991','6992','6993','6994','6995','6996','6997','6998','52','1','3599','3600','3601','1144','3602','3603','3604','3605','3606','3607','3608','3609','3610','3611','3612','3613','3614','3615','4364','4911','4912','4913','4451','4452','4453','6505','6506','6507','6508','6509','6510','6511','6512','6513','412','4585','4586','4587','421','427','429','401','419','417','402','498','420','6514','6515','6516','4454','4455','4456','4457','4458','4459','4460','4461','4462','4463','4464','4465','4588','4589','4590','4591','4914','5239','5240','5241','5242','110','6784','6785','6786','6787','6788','6789','6790','6791','6792','6793','6794','4365','4366','4367','4368','4369','4370','4371','4372','422','403','418','416','404','405','406','407','408','409','410','414','415','413','
4915','4416','4417','4418','4419','4420','4421','4422','4423','4424','4425','4426','4427','4428','4336','4337','4352','4353','4354','4355','4356','4357','4358','4359','4360','4361','4362','4363','5178','131','132','133','283','201','134','160','161','135','159','158','157','284','285','162','423','163','164','165','166','167','168','169','170','171','172','173','174','175','176','177','178','179','180','181','182','183','184','185','186','286','287','288','289','290','291','292','293','294','4250','296','297','298','299','300','444','443','440','438','441','434','433','437','442','439','436','435','431','455','202','203','204','205','206','207','208','209','210','497','7073','301','187','4373','4374','4375','4376','4377','4378','4379','4380','4338','4339','4340','4341','4342','4343','411','130','4441','4442','4443','4444','4445','4446','4447','4448','4449','4450','7013','53','54','56','57','58','4159','4160','4161','4162','4163','4164','1661','60','61','4916','4917','4918','4919','4920','4921','4922','4923','4924','4925','4926','4927','4928','4929','4930','4931','4932','4933','4934','4935','4936','4937','4938','4939','4940','4941','4942','4943','4944','4945','4946','4947','4948','4949','4950','4951','4952','4953','4954','4381','4382','4383','4384','4385','627','5067','5068','5069','5070','5071','5072','5073','5074','5075','5076','5077','5078','5079','5080','5081','5082','5083','5084','5085','5086','5087','5088','5089','5090','5091','5092','5093','5094','5095','5096','5097','5098','5099','5100','5101','7054','7055','7056','4728','4729','4730','4731','4732','4733','4734','473','469','474','468','470','471','472','467','465','495','494','493','4386','4387','4388','4389','1515','1516','1517','1518','1519','617','111','112','113','114','115','116','117','118','119','120','5179','5180','5181','5182','5183','5184','5185','5186','6858','6859','6860','6861','6862','6863','6864','6865','6866','6867','6868','6871','6872','6873','6874','6875','6876','6877','6878','6879','6880',
'6881','6882','6883','6884','6885','6886','6869','6887','6766','6767','6768','6769','6770','6771','6677','6678','6679','6680','6681','6682','5187','5188','5189','5190','5191','5192','5193','5194','5195','5196','5197','5198','74','73','72','69','71','70','263','260','211','212','262','261','198','259','197','200','258','4785','4786','213','214','215','216','217','218','219','220','221','222','223','224','225','226','227','228','229','230','231','232','500','1145','1857','1858','1881','1859','1882','2955','2956','2957','2958','2959','2960','2961','2962','2963','2964','2965','2966','2967','2968','2969','2970','2971','2972','2973','2974','2975','2976','2977','2978','2979','2980','2981','2982','2983','2984','2985','2986','2987','2988','2989','2990','2991','2034','2035','2036','2037','2167','2038','62','1883','1860','1907','2168','4306','4307','4308','4309','4310', [************************************************************************************]'3224','3225','3226','3227','3228','3387','3388','3389','3390','3107','3108','3109','3165','3166','3167','3168','3169','3170','3171','3172','3173','3174','3175','3176','3177','3178','3179','3180','3181','3182','3183','3184','3185','3186','3187','3188','3290') ORDER BY `moduleStatusByHost`.`msID` ASC | 0.000 |
| 11611774 | root | localhost | fog | Query | 0 | init | SELECT `hostMAC`.* FROM `hostMAC` WHERE `hmMAC`='ec:b1:d7:48:49:8b' | 0.000 |
11 rows in set (0.00 sec)
-
I shortened the MySQL response because it is very large.
I was only trying to show the membership of the group, a group with 2 computers.
Maybe the problem is that the browser tries to render the list of all computers that are not in the group, so it has to list 7000 computers … a very large list.
-
If I try to send a group task to this group with 2 computers, the browser becomes slow, and this is the MySQL query:
MariaDB [fog]> show full processlist;
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 11592370 | root | localhost | fog | Sleep | 1 | | NULL | 0.000 |
| 11602122 | root | localhost | fog | Sleep | 2803 | | NULL | 0.000 |
| 11611211 | root | localhost | fog | Query | 0 | NULL | show full processlist | 0.000 |
| 11611851 | root | localhost | fog | Sleep | 992 | | NULL | 0.000 |
| 11615468 | root | localhost | fog | Sleep | 408 | | NULL | 0.000 |
| 11615586 | root | localhost | fog | Sleep | 387 | | NULL | 0.000 |
| 11617654 | root | localhost | fog | Sleep | 49 | | NULL | 0.000 |
| 11617704 | root | localhost | fog | Sleep | 0 | | NULL | 0.000 |
| 11617918 | root | localhost | fog | Sleep | 2 | | NULL | 0.000 |
| 11617931 | root | localhost | fog | Query | 0 | Copying to tmp table | SELECT `hostID` FROM `hosts` LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID` LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage` LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID` LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID` LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID` LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID` LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID` LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID` WHERE `hostMAC`.`hmPrimary` = '1' ORDER BY LOWER(`hosts`.`hostName`) ASC | 0.000 |
10 rows in set (0.00 sec)
But something is wrong, because the query is not immediate: it takes 15 seconds to appear.
-
@fernando-gietz OK, I'll tell you why I asked. I’m looking at this as a VM issue, not a FOG issue (yet).
6 vCPUs is a HUGE number for a VM. I checked the processor from your lscpu post, and that processor has 6 cores (12 hyperthreads). The issue with going too large on vCPUs is that it actually slows down your server, because your hypervisor has to find a time slot where it has 6 processors available to run. You will see this as wait times in your VM stats that are higher than those of your other VMs. Your VM host server is probably a dual-CPU machine (if it is a quad-processor host, wonderful). If it is busy with a lot of virtual machines, your FOG VM will suffer, because the windows in which 6 vCPUs are available at once will be quite small. So what should you do? While this will sound backwards, drop your vCPU count to 4 (I would say 2 vCPUs, but start with 4) and see if you get better performance. Watch your top command. Right now your load average is pretty good; with 4 vCPUs, keep the load average below 2.00.
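If you want something quicker than watching top, here is a minimal sketch for checking the 1-minute load average against that 2.00 figure. The function name `load_status` is just something made up for this example, and it assumes a Linux system where /proc/loadavg is readable.

```shell
#!/bin/sh
# load_status LOAD THRESHOLD
# Prints "ok" when LOAD is below THRESHOLD, otherwise "high".
# awk is used because plain sh cannot compare decimal numbers.
load_status() {
    awk -v load="$1" -v max="$2" 'BEGIN { if (load + 0 < max + 0) print "ok"; else print "high" }'
}

# /proc/loadavg begins with the 1-, 5- and 15-minute load averages.
if [ -r /proc/loadavg ]; then
    read -r one_min _rest < /proc/loadavg
    echo "1-minute load: $one_min ($(load_status "$one_min" 2.00))"
fi
```

The 2.00 threshold is only the number suggested above for a 4 vCPU VM; adjust it if you settle on a different vCPU count.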
Aside from your potential vCPU issue since you are using the database when you see the slowness, we should probably run a test and see how fast your disk subsystem is on your virtual server. I was working on a post about making FOG go faster here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast There are some statistics in this post https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/6 that you can compare your system against. Just use the dd command to create a 1GB test file and check your disk speeds.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
While you can’t really do anything about the results, you will know how your VM Host server compares to my benchmark testing.
Finally, I have been working on another thread about moving the PHP processing from Apache to php-fpm to get better web GUI performance, because it off-loads PHP processing from Apache to a dedicated PHP engine. That post is here: https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast When you get the rest of your issues fixed, it would be interesting to see (because you have a large campus) whether this solution helps with web GUI performance.
And finally, you may have reached a limit where you might consider spinning up a new virtual machine dedicated to mysql (2 vCPU) that will be optimized for only mysql activities.
-
@george1421 said in FOG Server CPU usage 100%:
Aside from your potential vCPU issue since you are using the database when you see the slowness, we should probably run a test and see how fast your disk subsystem is on your virtual server. I was working on a post about making FOG go faster here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast There are some statistics in this post https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/6 that you can compare your system against. Just use the dd command to create a 1GB test file and check your disk speeds.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
While you can’t really do anything about the results, you will know how your VM Host server compares to my benchmark testing.
I have done the tests (I can only test the NFS share):
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 3.13593 s, 342 MB/s
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 3.01033 s, 357 MB/s
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 3.01894 s, 356 MB/s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.61914 s, 410 MB/s
real 0m2.626s
user 0m0.021s
sys 0m0.741s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.63524 s, 407 MB/s
real 0m2.642s
user 0m0.020s
sys 0m0.676s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.61491 s, 411 MB/s
real 0m2.617s
user 0m0.019s
sys 0m0.685s
-
@fernando-gietz In this case I was only interested in the local file creation speed not NFS since that brings in more things (network, nfs subsystem, etc.) that is not part of the issue here.
Based on your numbers, the disk subsystem shouldn’t be your issue here. Your disks should be more than fast enough for them to keep up with mysql. I think I would start with dropping the number of vCPUs on this VM and see how well your performance is then.
-
@george1421 said in FOG Server CPU usage 100%:
@fernando-gietz In this case I was only interested in the local file creation speed not NFS since that brings in more things (network, nfs subsystem, etc.) that is not part of the issue here.
Based on your numbers, the disk subsystem shouldn’t be your issue here. Your disks should be more than fast enough for them to keep up with mysql. I think I would start with dropping the number of vCPUs on this VM and see how well your performance is then.
I tried viewing the group membership in our test environment:
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 37
Model name:            Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Stepping:              1
CPU MHz:               2665.526
BogoMIPS:              5333.52
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-3
And the behaviour is the same: about 2 minutes to show the two computers in the group.
-
@fernando-gietz And what is top saying? Please post the header, and tell us what the load on mysql looks like. I want to ensure we have not made things worse.
The default check-in time for the FOG clients connected to this FOG server is 5 minutes. So I could understand, if you have 400 clients, that you may be overloading mysql or apache.
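To put a rough number on that, here is a back-of-the-envelope sketch of the average check-in rate. It assumes check-ins are spread evenly over the interval and that each check-in counts as a single request, which is a simplification; `checkins_per_second` is a made-up helper name for this example.

```shell
#!/bin/sh
# checkins_per_second CLIENTS INTERVAL_SECONDS
# Average number of client check-ins per second, assuming they are
# spread evenly across the interval (a simplifying assumption).
checkins_per_second() {
    LC_ALL=C awk -v clients="$1" -v interval="$2" 'BEGIN { printf "%.2f\n", clients / interval }'
}

# 400 clients checking in every 5 minutes (300 seconds)
checkins_per_second 400 300     # prints 1.33

# The same interval with the 7000 hosts mentioned in this thread
checkins_per_second 7000 300    # prints 23.33
```

The second figure only applies if all 7000 hosts are actually running the client and checking in, which hasn't been confirmed here.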
My initial reaction is to have you try the php-fpm route (because I had a good improvement in performance on my system). But I want to make sure that there isn’t something else going on first.
How long has this virtual machine been in operation? (I'm thinking that if it has been in use for a long time, we might want to look into some mysql maintenance.)
-
I have captured the browser behaviour with the Firefox developer tools. Maybe the development team can spot something.
You can download the JSON file from:
-
I have made a new capture of the browser's performance with the Firefox developer tools, and this is what I see:
Most of the time, the browser is calling the setTimeout function.
-
Probably it’s best to wait for @Tom-Elliott to look into this. He knows the web UI better than anyone else!
-
Hello!!
Any news on this?
-
Would you mind clearing the memory cache on the server? It seems to me that setTimeout is being blocked somewhere; it shouldn’t have any trouble running, since it is only called when the timeout is reached.
-
@tom-elliott said in FOG Server CPU usage 100%:
Mind running the clear memory cache on the server?
I tried clearing the memory cache on the server with the following command:
sync; echo 3 > /proc/sys/vm/drop_caches
But the problem persists!!
Any other ideas?
-
One more thing.
My hosts table has 7000 entries. I think the problem is that the code may not be optimized.
I dropped 6000 entries from the table and now the response time is more normal … 10-15 seconds.
@Tom-Elliott how does FOG handle the membership?
-
@fernando-gietz I was thinking about this - last time someone had this issue, it was the mysql max connections being hit.
Maybe one of the FOG updates or an OS update (especially likely with Ubuntu systems) wiped out our max connection setting.
Anyways, here's what might fix it:
https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_MySQL#Increase_maximum_simultaneous_MySQL_connections
Make sure to do the query in mysql AND to change the conf setting too so the change persists across reboots!
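For reference, a rough sketch of what the runtime half of that change generally looks like in MySQL/MariaDB. This is not copied from the wiki page, so follow the wiki for the exact steps. The value 500 is an arbitrary example, not a recommendation, and `max_conn_sql` is a made-up helper that only prints the statements.

```shell
#!/bin/sh
# Example value only (an assumption); choose one that suits your server.
NEW_MAX=500

# max_conn_sql LIMIT
# Prints SQL that shows the current limit, shows the highest number of
# connections used since the server started, and raises the limit.
max_conn_sql() {
    printf "SHOW VARIABLES LIKE 'max_connections';\n"
    printf "SHOW STATUS LIKE 'Max_used_connections';\n"
    printf "SET GLOBAL max_connections = %d;\n" "$1"
}

# Review the statements first ...
max_conn_sql "$NEW_MAX"

# ... then, if they look right, pipe them into the client:
#   max_conn_sql "$NEW_MAX" | mysql -u root -p
```

SET GLOBAL only lasts until the database restarts. To make it stick, the same `max_connections` setting goes under the `[mysqld]` section of the server's configuration file, whose location depends on the distribution.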