FOG Server CPU usage 100%
-
I have problems with the performance of the server (I don’t know if it would be better to open a new thread about this).
When I try to run a group task, the mysql process goes to 100% CPU and the browser takes a long time to refresh the screen. Going from the advanced tasks view to the launch view takes 1 minute or more, and if I try to view the membership of a group I get the same problem.
-
@fernando-gietz What are the specs of the FOG Server? How big is the group? How much free RAM on the machine when this happens?
It’s possible the MySQL config needs tweaking as scaling increases, the defaults assume a pretty lightweight scenario afaik.
-
Server version RHEL 7.3
Virtual Machine with 6 cores and 12 GB RAM
The group is very small: 3 computers.
Now I am seeing that the MySQL process is “normal” but httpd is at 100%.
The browser takes 2 minutes and 34 seconds to show the membership of the group. Maybe the code reads or calculates over all hosts in the DB; I have 7007 computers.
-
One thing more:
Php.ini memory limit = 3000M
-
What version of fog?
-
@fernando-gietz You have told us about your physical host, but on the virtual machine itself how many vCPUs are dedicated to it as well as how much ram?
How many clients are hitting this FOG server too? What is their update frequency?
-
@george1421 said in FOG Server CPU usage 100%:
@fernando-gietz You have told us about your physical host, but on the virtual machine itself how many vCPUs are dedicated to it as well as how much ram?
How many clients are hitting this FOG server too? What is their update frequency?
I don’t know how many vCPUs are dedicated; I will ask the SOC team.
This environment is pre-production. I think that, more or less, 400 computers are contacting the server. The clients have the default settings (I don’t know how to set the frequency).
@tom-elliott said in FOG Server CPU usage 100%:
What version of fog?
FOG version: 1.5.0 - RC9 (SVN Version: 6079 Running version: 24)
-
@fernando-gietz How about listing the output of lscpu (from the Linux console of the FOG server)? Also grab the memory information from the top section of top.
-
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 6
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 37
Model name: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Stepping: 1
CPU MHz: 2665.658
BogoMIPS: 5333.52
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-5
-
@fernando-gietz The fog server is a virtual machine right?? How many cores does your VM Host server have? (Your SOC team may have to answer that).
-
@george1421 And a follow up to that question, how many other VMs are running on this VM Host server?
-
@fernando-gietz start up a group task, then issue this command to MySQL:
SHOW FULL PROCESSLIST
Give us a copy/paste of a large chunk of the output - this will let us know what the problem is.
-
@george1421 said in FOG Server CPU usage 100%:
@george1421 And a follow up to that question, how many other VMs are running on this VM Host server?
I don’t know the number, but a lot. The FOG server is in the university VM infrastructure, where all of the university’s servers are hosted.
We have 6 cores and 12 GB RAM dedicated.
@wayne-workman said in FOG Server CPU usage 100%:
@fernando-gietz start up a group task, then issue this command to MySQL:
SHOW FULL PROCESSLIST
Give us a copy/paste of a large chunk of the output - this will let us know what the problem is.
show full processlist
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----------+------+-----------+------+---------+------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| 11592370 | root | localhost | fog | Sleep | 2 | | NULL | 0.000 |
| 11602122 | root | localhost | fog | Sleep | 1802 | | NULL | 0.000 |
| 11602143 | root | localhost | fog | Sleep | 1791 | | NULL | 0.000 |
| 11608096 | root | localhost | fog | Sleep | 585 | | NULL | 0.000 |
| 11611211 | root | localhost | fog | Query | 0 | NULL | show full processlist | 0.000 |
| 11611721 | root | localhost | fog | Sleep | 8 | | NULL | 0.000 |
| 11611731 | root | localhost | fog | Sleep | 7 | | NULL | 0.000 |
| 11611744 | root | localhost | fog | Sleep | 3 | | NULL | 0.000 |
| 11611766 | root | localhost | fog | Sleep | 1 | | NULL | 0.000 |
| 11611772 | root | localhost | fog | Query | 0 | statistics | SELECT `msID` FROM `moduleStatusByHost` WHERE `moduleStatusByHost`.`msHostID`='608' AND `moduleStatusByHost`.`msModuleID` IN 
('5745','5746','5747','5748','6945','47','48','59','55','3567','3568','49','1851','1852','1853','1854','1855','1856','66','4130','4131','4132','4133','4134','4135','1660','4136','4137','4138','4139','4140','4141','4142','4143','4144','4145','4146','4147','4148','4149','4150','3633','4151','4152','4153','4154','4155','4156','4157','4158','3569','3570','3571','3572','3573','3574','3575','3576','3577','3578','3579','3580','3581','3582','3583','3584','3585','3586','3587','3588','3589','3590','3591','3592','3593','3594','3634','3595','3596','3597','3598','1985','1986','1987','1988','1989','1990','1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021','2022','2023','2024','2025','2026','2027','2028','2029','2030','2031','2032','2033','50','51','6946','6947','6948','6949','6950','6951','6952','6953','6954','6955','6956','6957','6958','6959','6960','6961','6962','6963','6964','6965','6966','6967','6968','6969','6970','6971','6972','6973','6974','6975','6976','6977','6978','6979','6980','6981','6982','6983','6984','6985','6986','6987','6988','6989','6990','6991','6992','6993','6994','6995','6996','6997','6998','52','1','3599','3600','3601','1144','3602','3603','3604','3605','3606','3607','3608','3609','3610','3611','3612','3613','3614','3615','4364','4911','4912','4913','4451','4452','4453','6505','6506','6507','6508','6509','6510','6511','6512','6513','412','4585','4586','4587','421','427','429','401','419','417','402','498','420','6514','6515','6516','4454','4455','4456','4457','4458','4459','4460','4461','4462','4463','4464','4465','4588','4589','4590','4591','4914','5239','5240','5241','5242','110','6784','6785','6786','6787','6788','6789','6790','6791','6792','6793','6794','4365','4366','4367','4368','4369','4370','4371','4372','422','403','418','416','404','405','406','407','408','409','410','414','415','413','
4915','4416','4417','4418','4419','4420','4421','4422','4423','4424','4425','4426','4427','4428','4336','4337','4352','4353','4354','4355','4356','4357','4358','4359','4360','4361','4362','4363','5178','131','132','133','283','201','134','160','161','135','159','158','157','284','285','162','423','163','164','165','166','167','168','169','170','171','172','173','174','175','176','177','178','179','180','181','182','183','184','185','186','286','287','288','289','290','291','292','293','294','4250','296','297','298','299','300','444','443','440','438','441','434','433','437','442','439','436','435','431','455','202','203','204','205','206','207','208','209','210','497','7073','301','187','4373','4374','4375','4376','4377','4378','4379','4380','4338','4339','4340','4341','4342','4343','411','130','4441','4442','4443','4444','4445','4446','4447','4448','4449','4450','7013','53','54','56','57','58','4159','4160','4161','4162','4163','4164','1661','60','61','4916','4917','4918','4919','4920','4921','4922','4923','4924','4925','4926','4927','4928','4929','4930','4931','4932','4933','4934','4935','4936','4937','4938','4939','4940','4941','4942','4943','4944','4945','4946','4947','4948','4949','4950','4951','4952','4953','4954','4381','4382','4383','4384','4385','627','5067','5068','5069','5070','5071','5072','5073','5074','5075','5076','5077','5078','5079','5080','5081','5082','5083','5084','5085','5086','5087','5088','5089','5090','5091','5092','5093','5094','5095','5096','5097','5098','5099','5100','5101','7054','7055','7056','4728','4729','4730','4731','4732','4733','4734','473','469','474','468','470','471','472','467','465','495','494','493','4386','4387','4388','4389','1515','1516','1517','1518','1519','617','111','112','113','114','115','116','117','118','119','120','5179','5180','5181','5182','5183','5184','5185','5186','6858','6859','6860','6861','6862','6863','6864','6865','6866','6867','6868','6871','6872','6873','6874','6875','6876','6877','6878','6879','6880',
'6881','6882','6883','6884','6885','6886','6869','6887','6766','6767','6768','6769','6770','6771','6677','6678','6679','6680','6681','6682','5187','5188','5189','5190','5191','5192','5193','5194','5195','5196','5197','5198','74','73','72','69','71','70','263','260','211','212','262','261','198','259','197','200','258','4785','4786','213','214','215','216','217','218','219','220','221','222','223','224','225','226','227','228','229','230','231','232','500','1145','1857','1858','1881','1859','1882','2955','2956','2957','2958','2959','2960','2961','2962','2963','2964','2965','2966','2967','2968','2969','2970','2971','2972','2973','2974','2975','2976','2977','2978','2979','2980','2981','2982','2983','2984','2985','2986','2987','2988','2989','2990','2991','2034','2035','2036','2037','2167','2038','62','1883','1860','1907','2168','4306','4307','4308','4309','4310', [************************************************************************************]'3224','3225','3226','3227','3228','3387','3388','3389','3390','3107','3108','3109','3165','3166','3167','3168','3169','3170','3171','3172','3173','3174','3175','3176','3177','3178','3179','3180','3181','3182','3183','3184','3185','3186','3187','3188','3290') ORDER BY `moduleStatusByHost`.`msID` ASC | 0.000 |
| 11611774 | root | localhost | fog | Query | 0 | init | SELECT `hostMAC`.* FROM `hostMAC` WHERE `hmMAC`='ec:b1:d7:48:49:8b' | 0.000 |
+----------+------+-----------+------+---------+------+------------+----------------------------------------------------
11 rows in set (0.00 sec)
-
I shortened the MySQL response because it is very large.
I am only trying to show the membership of the group, a group with 2 computers.
Maybe the problem is that the browser tries to render the list of all computers that are not in the group, so it needs to list 7000 computers … a very large list.
-
If I try to send a group task to this group with 2 computers, the browser becomes slow, and this is the MySQL process list:
MariaDB [fog]> show full processlist
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 11592370 | root | localhost | fog | Sleep | 1 | | NULL | 0.000 |
| 11602122 | root | localhost | fog | Sleep | 2803 | | NULL | 0.000 |
| 11611211 | root | localhost | fog | Query | 0 | NULL | show full processlist | 0.000 |
| 11611851 | root | localhost | fog | Sleep | 992 | | NULL | 0.000 |
| 11615468 | root | localhost | fog | Sleep | 408 | | NULL | 0.000 |
| 11615586 | root | localhost | fog | Sleep | 387 | | NULL | 0.000 |
| 11617654 | root | localhost | fog | Sleep | 49 | | NULL | 0.000 |
| 11617704 | root | localhost | fog | Sleep | 0 | | NULL | 0.000 |
| 11617918 | root | localhost | fog | Sleep | 2 | | NULL | 0.000 |
| 11617931 | root | localhost | fog | Query | 0 | Copying to tmp table | SELECT `hostID` FROM `hosts` LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID` LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage` LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID` LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID` LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID` LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID` LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID` LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID` WHERE `hostMAC`.`hmPrimary` = '1' ORDER BY LOWER(`hosts`.`hostName`) ASC | 0.000 |
rows in set (0.00 sec)
But something is wrong, because the query is not immediate: it takes 15 seconds to appear in the process list.
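Since the full process list is long, a filter that drops idle connections may make it easier to paste. This is only a minimal sketch, not something from FOG itself: `active_queries` is a hypothetical helper name, and it assumes the tab-separated layout that the mysql client prints in batch mode, where the fifth column is Command.

```shell
# Keep the header row plus every connection that is not idle.
# Assumption: input is tab-separated SHOW FULL PROCESSLIST output,
# with Command in column 5 ("Sleep" marks an idle connection).
active_queries() {
    awk -F'\t' 'NR == 1 || $5 != "Sleep"'
}

# Usage on the FOG server (assumes the mysql client can log in,
# for example through credentials in ~/.my.cnf):
#   mysql --batch -e 'SHOW FULL PROCESSLIST' | active_queries
```

Running it a few times while the slow page is loading should show which statements are actually executing at that moment.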
-
@fernando-gietz OK, I’ll tell you why I asked. I’m looking at this as a VM issue, not a FOG issue (yet).
6 vCPUs is a HUGE number for a VM. I checked the processor from your lscpu post, and that processor has 6 cores (12 hyperthreads). The issue with going too large with vCPUs is that it can actually slow down your server, because your hypervisor has to find a time slot where it has 6 processors available to run. You will see this in your VM stats as wait times that are higher than on your other VMs. Since you probably have a dual-CPU VM host server (if you have a quad-processor VM host, wonderful), and if that host is busy with a lot of virtual machines, your FOG VM will suffer because the windows in which 6 vCPUs are available will be quite small. So what should you do? While this will sound backwards, drop your vCPU count to 4 (I would say 2 vCPUs, but start with 4) and see if you get better performance. Watch your top command. Right now your load average is pretty good; with 4 CPUs, keep the load average below 2,00.
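The load-average rule of thumb can be checked with a small script. This is a minimal sketch: `load_ok` is a hypothetical helper name, and the threshold of half the vCPU count is an assumption generalized from the "below 2,00 with 4 CPUs" figure, not a fixed rule.

```shell
# Succeed when the given load average is below half the given CPU count.
# Arguments: $1 = load average (e.g. 1.50), $2 = number of vCPUs.
load_ok() {
    awk -v l="$1" -v c="$2" 'BEGIN { exit !(l < c / 2) }'
}

# Usage on the FOG server: the first field of /proc/loadavg is the
# 1-minute load average, and nproc prints the number of online CPUs.
#   if load_ok "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"; then
#       echo "load OK"
#   else
#       echo "load high"
#   fi
```

Comparing the result before and after lowering the vCPU count, under the same client activity, gives a rough before/after picture.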
Aside from your potential vCPU issue since you are using the database when you see the slowness, we should probably run a test and see how fast your disk subsystem is on your virtual server. I was working on a post about making FOG go faster here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast There are some statistics in this post https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/6 that you can compare your system against. Just use the dd command to create a 1GB test file and check your disk speeds.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
While you can’t really do anything about the results, you will know how your VM host server compares to my benchmark testing.
Finally, I have been working on another thread about moving the PHP processing from Apache to php-fpm to improve web GUI performance, because it off-loads PHP processing from Apache to a dedicated PHP engine. That post is here: https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast When you get the rest of your issues fixed, it would be interesting to see (because you have a large campus) whether this solution helps with web GUI performance.
And finally, you may have reached a limit where you might consider spinning up a new virtual machine dedicated to mysql (2 vCPU) that will be optimized for only mysql activities.
-
@george1421 said in FOG Server CPU usage 100%:
Aside from your potential vCPU issue since you are using the database when you see the slowness, we should probably run a test and see how fast your disk subsystem is on your virtual server. I was working on a post about making FOG go faster here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast There are some statistics in this post https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/6 that you can compare your system against. Just use the dd command to create a 1GB test file and check your disk speeds.
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
While you can’t really do anything about the results, you will know how your VM Host server compares to my benchmark testing.
I have done the tests (I can only test the NFS):
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1,1 GB) copied, 3,13593 s, 342 MB/s
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1,1 GB) copied, 3,01033 s, 357 MB/s
[root@fog7 images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1,1 GB) copied, 3,01894 s, 356 MB/s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1,1 GB) copied, 2,61914 s, 410 MB/s
real 0m2.626s
user 0m0.021s
sys 0m0.741s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1,1 GB) copied, 2,63524 s, 407 MB/s
real 0m2.642s
user 0m0.020s
sys 0m0.676s
[root@fog7 images]# echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/images/test1.img of=/dev/null bs=8k
3
131072+0 records in
131072+0 records out
1073741824 bytes (1,1 GB) copied, 2,61491 s, 411 MB/s
real 0m2.617s
user 0m0.019s
sys 0m0.685s
-
@fernando-gietz In this case I was only interested in the local file creation speed not NFS since that brings in more things (network, nfs subsystem, etc.) that is not part of the issue here.
Based on your numbers, the disk subsystem shouldn’t be your issue here. Your disks should be more than fast enough for them to keep up with mysql. I think I would start with dropping the number of vCPUs on this VM and see how well your performance is then.
-
@george1421 said in FOG Server CPU usage 100%:
@fernando-gietz In this case I was only interested in the local file creation speed not NFS since that brings in more things (network, nfs subsystem, etc.) that is not part of the issue here.
Based on your numbers, the disk subsystem shouldn’t be your issue here. Your disks should be more than fast enough for them to keep up with mysql. I think I would start with dropping the number of vCPUs on this VM and see how well your performance is then.
I tried to see the membership in our test environment
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 37
Model name: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Stepping: 1
CPU MHz: 2665.526
BogoMIPS: 5333.52
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-3
And the behaviour is the same: circa 2 minutes to see the two computers of the group.
-
@fernando-gietz And what is top saying? Header as well as what is the load like on mysql? I want to ensure we have not made things worse.
The default check-in time for the FOG clients connected to this FOG server is 5 minutes. So I could understand that, with 400 clients, you may be overloading mysql or apache.
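To put a rough number on that, here is a back-of-the-envelope calculation. It is only a sketch: `checkins_per_second` is a hypothetical helper name, and it assumes the check-ins are spread evenly over the interval. Each check-in may trigger more than one web request and database query, so the real request rate could be higher.

```shell
# Average check-ins per second for a number of clients that each
# check in once per interval.
# Arguments: $1 = number of clients, $2 = check-in interval in seconds.
checkins_per_second() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

# 400 clients with the default 5-minute (300 s) interval
checkins_per_second 400 300
```

With these inputs the average works out to a little over one check-in per second.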
My initial reaction is to have you try the php-fpm route (because I had a good improvement in performance on my system). But I want to make sure that there isn’t something else going on first.
How long has this virtual machine been in operation? (I’m thinking that if it’s been used for a long time, we might want to look into some MySQL maintenance.)