FOG Server High CPU
-
@Fernando-Gietz this is what I hope to move to in the future, but it is not in use currently. The tablename and install method are not used for the core elements; those were just a placeholder until I can mimic the proper table layout for the item.
It is not in use for core elements.
-
Hello,
Some news about this problem. We made some changes to our server and its configuration, and the server is currently not as overloaded as before. The conclusion is: the default configuration of apache, php-fpm and mysql is not optimal for large scenarios. If you have a large number of clients, you need to tune the server.
I will describe our previous situation and the current one to share our experience.
Initial Scenario:
- FOG version 1.5.2
- Virtual server with 8 vCores and 16 GB RAM
- OS: RHEL 7
- Active clients: 7000
- One FOG server and only the default node.
In July we migrated from our old FOG version (0.32) on RHEL 5 to the new one (1.5.2) on RHEL 7, without any additional configuration.
In August we observed that the server was consuming a lot of CPU and RAM and we began to have performance problems (and the academic year had not even started). Panic Mode ON!!
The first thing you think is … more resources are necessary (more wood, this is war). ERROR. The System Operation Center (SOC) guys said NO: we cannot give you more resources.
First thing: Update
We updated the server OS and some packages, for example php and MariaDB. We had php 5.6 and updated to php 7, and php performance improved a lot. We also updated FOG from 1.5.2 to 1.5.4.
Second thing: Optimize the virtual machine resources
Our virtual server is hosted on a VMware host with two sockets, each with 6 cores (it is an old server). Problem: our virtual server had 8 vCores, 6 on one socket and the other 2 on the other one. The server had access time problems.
We removed two vCores from the server so that all vCores were on the same socket and access times were shorter. PROBLEM: fewer resources, more server load. In September the clients began to wake up and the php and mysql queries increased, so more resources were needed. To mitigate this we increased the client check-in time to 900 seconds. This reduced the php and mysql queries, but consumption was still high (mysqld process at 300%). The problem was the access time to the cores: we had 6 vCores on a socket with 6 physical cores, shared with other virtual servers on the same socket, so the vCores spent more time waiting for access to the socket's cores. The vCores were always at 100% CPU usage.
To solve this we enabled NUMA on the server:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt
With this we distributed the vCores between the two sockets: vCPUs 0, 1, 2 + 8 GB RAM in NUMA node 0 and vCPUs 3, 4, 5 + 8 GB in NUMA node 1. This is configured on the VMware host. In addition, we installed the numad package in our virtual server; this daemon distributes the processes between the two NUMA nodes. Access to RAM and CPU became faster.
For example:
# ./numa-maps-summary.pl < /proc/1787/numa_maps
N0               :      100 ( 0.00 GB)
N1               :   648226 ( 2.47 GB)
active           :   435228 ( 1.66 GB)
anon             :   645582 ( 2.46 GB)
dirty            :   647118 ( 2.47 GB)
kernelpagesize_kB:     1012 ( 0.00 GB)
mapmax           :      332 ( 0.00 GB)
mapped           :     1248 ( 0.00 GB)
With this Perl script we can see that mysql is using the resources of NUMA node 1.
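For anyone without the Perl script at hand, the same kind of per-node summary can be sketched in Python. This is only a minimal sketch and not the script used above: the function names are made up for illustration, and it assumes 4 KiB pages and the usual `key=value` token layout of `/proc/<pid>/numa_maps`.

```python
from collections import Counter

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, the usual x86_64 default


def summarize_numa_maps(text):
    """Sum every numeric key=value counter found in numa_maps-style text."""
    totals = Counter()
    for line in text.splitlines():
        # Each line looks like: <address> <policy> key=value key=value ...
        for token in line.split()[2:]:
            key, sep, value = token.partition("=")
            # Skip non-numeric entries such as file=/path/to/lib.so
            if sep and value.isdigit():
                totals[key] += int(value)
    return totals


def pages_to_gib(pages):
    """Convert a page count to GiB under the page-size assumption above."""
    return pages * PAGE_SIZE_BYTES / 2**30


if __name__ == "__main__":
    # Made-up sample lines; on a real server read /proc/<pid>/numa_maps instead.
    sample = (
        "7f0000000000 default anon=300 dirty=300 N0=100 N1=200 kernelpagesize_kB=4\n"
        "7f0000100000 default file=/usr/lib64/libc.so mapped=50 N1=50 kernelpagesize_kB=4\n"
    )
    for key, pages in sorted(summarize_numa_maps(sample).items()):
        print(f"{key:<18}: {pages:>8} ({pages_to_gib(pages):5.2f} GiB)")
```

On a live system the input would come from something like `open("/proc/1787/numa_maps").read()`, run as a user allowed to read that file. The `N0` and `N1` totals show on which node the process memory lives.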
Now we have 8 vCores again, distributed between the two NUMA nodes.
Now the vCores are at 80%-90%.
Third thing: tuning php, php-fpm and mysql
We did not know much about php, php-fpm and mysql, so we had to read a lot of articles on the web about them.
Tuning MySQL: for this we used the MySQLTuner script, http://mysqltuner.com . This script gives you an idea of the performance of the database and how to tune it to improve performance.
SET GLOBAL query_cache_size = 4000000;     -- about 4 MB
SET GLOBAL tmp_table_size = 20000000;      -- about 20 MB
SET GLOBAL query_cache_limit = 2000000;    -- about 2 MB
SET GLOBAL max_heap_table_size = 20000000; -- about 20 MB
SET GLOBAL thread_cache_size = 4;
SET GLOBAL table_open_cache = 450;
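Note that values set with SET GLOBAL are lost when the database server restarts. To keep them, the same variables can go under a `[mysqld]` section of the server option file. The sketch below only renders such a fragment from the values above; the helper name is hypothetical, and where the fragment should be saved (for example a file under `/etc/my.cnf.d/`) depends on your installation and is an assumption here.

```python
# Values copied from the SET GLOBAL statements above.
TUNED_VARIABLES = {
    "query_cache_size": 4000000,
    "tmp_table_size": 20000000,
    "query_cache_limit": 2000000,
    "max_heap_table_size": 20000000,
    "thread_cache_size": 4,
    "table_open_cache": 450,
}


def render_mysqld_section(variables):
    """Return a [mysqld] option-file fragment, one 'name = value' per line."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in variables.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment; review it before copying it into an option file.
    print(render_mysqld_section(TUNED_VARIABLES), end="")
```

After adding the fragment and restarting the database, `SHOW GLOBAL VARIABLES LIKE 'query_cache_size';` should confirm the value.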
The MariaDB web page recommends decreasing the swappiness value (https://mariadb.com/kb/en/library/configuring-swappiness/):
# sysctl -w vm.swappiness=10
Tuning php-fpm and php: there are some articles about it in this forum.
PHP-FPM:
pm = ondemand
; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI.
; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
; Note: This value is mandatory.
pm.max_children = 50
; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 5
; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 5
; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 50
; The number of seconds after which an idle process will be killed.
; Note: Used only when pm is set to 'ondemand'
; Default Value: 10s
pm.process_idle_timeout = 10s;
Normally people use pm = dynamic, but we use pm = ondemand because we saw better performance with it.
These parameters may still change, but for now the server runs well. Keep in mind that it is October and the download activity has decreased a lot.
To watch the php activity you can enable the Apache server status page, and there is also a tool, goaccess, that shows the php calls and their counts in the terminal:
# yum install goaccess
# tail -f /var/log/httpd/access_log | goaccess - -
More info about this
Now we have configured the client check-in time to 275 seconds. In September we had to increase it to 900, but after the changes we have set it to 275 seconds, as I said.
This is a capture of the goaccess tool:
The capture was taken after 6 minutes of activity. We can see that:
Total requests: 8436 -> about 23.4 req/sec
1378 visitors requested the /fog/service/getversion.php?NewServise&json page. This means that 1378 clients are connected simultaneously.
1 visitor requested /fog/service/progress.php 114 times. This client is running a download task.
It is necessary to take into account that when a client is running a download, capture or multicast task, it reports its progress to the server (yes, it is very pretty and cool to see the progress bar), but this information has a price. The client reports its progress more or less every three or four seconds. With one task this is a trifle, but with 100 or more clients running download or capture tasks it is a problem, because the php server cannot process all the requests simultaneously, and it is not only the php server, it is the mysql server too.
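To put rough numbers on this, here is a small back-of-the-envelope sketch. It uses only figures quoted in this post (8436 requests in 6 minutes, 7000 clients, check-in times of 275 and 900 seconds, progress reports every 3 to 4 seconds). It assumes, as a simplification, that each client sends exactly one request per interval; real clients may send more. The function names are illustrative.

```python
def requests_per_second(total_requests, window_seconds):
    """Average request rate over an observation window."""
    return total_requests / window_seconds


def polling_rate(clients, interval_seconds):
    """Requests per second if every client sends one request per interval."""
    return clients / interval_seconds


if __name__ == "__main__":
    # Observed in the goaccess capture: 8436 requests in 6 minutes.
    print(f"observed: {requests_per_second(8436, 6 * 60):.1f} req/s")

    # Effect of the check-in time for 7000 active clients.
    for interval in (275, 900):
        rate = polling_rate(7000, interval)
        print(f"7000 clients, check-in every {interval} s: {rate:.1f} req/s")

    # Progress reports from 100 clients running imaging tasks.
    for interval in (3, 4):
        rate = polling_rate(100, interval)
        print(f"100 tasks, progress every {interval} s: {rate:.1f} req/s")
```

Under these assumptions, 100 clients reporting progress every 3 to 4 seconds generate 25 to 33 requests per second on their own, which is more than the whole background rate seen in the capture.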
For example, in this capture of the htop command (similar to atop or top):
We can see that the vCores are busy but not at 100%, the load is high, and mysqld is using 221% of CPU. At this moment the server is processing only the FOG client requests from the computers; there are no tasks running. (When the technicians launch download, multicast or capture tasks, the server is burning … literally. I saw the load at 60 or more; the server could not handle all the requests and refused them.) The capture clearly shows the activity of the two NUMA nodes.
Node 0: 1, 2, 3 and 4
Node 1: 5, 6, 7 and 8
Where is the mysqld process working?
# ./numa-maps-summary.pl < /proc/1787/numa_maps
N0               :   636053 ( 2.43 GB)
N1               :    14336 ( 0.05 GB)
active           :   352378 ( 1.34 GB)
anon             :   649153 ( 2.48 GB)
dirty            :   649153 ( 2.48 GB)
kernelpagesize_kB:     1016 ( 0.00 GB)
mapmax           :      480 ( 0.00 GB)
mapped           :     1276 ( 0.00 GB)
In Node 0.
I downloaded a little script, I forget from where, that shows the RAM usage of each process:
# ./ps_mem.py
 Private  +   Shared  =  RAM used  Program
  4.0 KiB +  12.5 KiB =  16.5 KiB  agetty
  4.0 KiB +  15.0 KiB =  19.0 KiB  mysqld_safe
  4.0 KiB +  47.5 KiB =  51.5 KiB  rpc.statd
  4.0 KiB +  49.5 KiB =  53.5 KiB  rpc.idmapd
  4.0 KiB +  57.0 KiB =  61.0 KiB  lvmetad
 36.0 KiB +  31.0 KiB =  67.0 KiB  atd
  4.0 KiB +  73.5 KiB =  77.5 KiB  VGAuthService
 88.0 KiB +  32.0 KiB = 120.0 KiB  rhsmcertd
 92.0 KiB +  41.5 KiB = 133.5 KiB  systemd-udevd
112.0 KiB +  22.5 KiB = 134.5 KiB  sleep
 88.0 KiB +  55.0 KiB = 143.0 KiB  vsftpd
 88.0 KiB +  65.0 KiB = 153.0 KiB  gssproxy
148.0 KiB +  23.0 KiB = 171.0 KiB  udp-sender
156.0 KiB +  30.0 KiB = 186.0 KiB  crond
164.0 KiB +  30.0 KiB = 194.0 KiB  in.tftpd
180.0 KiB +  20.0 KiB = 200.0 KiB  numad
192.0 KiB +  16.0 KiB = 208.0 KiB  rhnsd
128.0 KiB +  83.5 KiB = 211.5 KiB  master
176.0 KiB +  35.5 KiB = 211.5 KiB  xinetd
188.0 KiB +  54.5 KiB = 242.5 KiB  auditd
232.0 KiB +  29.0 KiB = 261.0 KiB  irqbalance
192.0 KiB +  88.5 KiB = 280.5 KiB  qmgr
208.0 KiB +  87.0 KiB = 295.0 KiB  sh
240.0 KiB + 109.5 KiB = 349.5 KiB  rpcbind
588.0 KiB +  36.5 KiB = 624.5 KiB  systemd-logind
668.0 KiB +  75.5 KiB = 743.5 KiB  dbus-daemon
596.0 KiB + 311.5 KiB = 907.5 KiB  polkitd
800.0 KiB + 175.5 KiB = 975.5 KiB  vmtoolsd
916.0 KiB +  64.5 KiB = 980.5 KiB  FOGpxe.sh
936.0 KiB + 135.0 KiB =   1.0 MiB  dnsmasq
  1.1 MiB + 386.0 KiB =   1.5 MiB  NetworkManager
  1.4 MiB + 413.5 KiB =   1.8 MiB  pickup
  2.0 MiB + 101.0 KiB =   2.1 MiB  rpc.mountd
  2.2 MiB +  67.5 KiB =   2.3 MiB  systemd
  2.6 MiB + 308.0 KiB =   2.9 MiB  tuned
  2.9 MiB + 355.5 KiB =   3.2 MiB  mysql
  2.7 MiB + 846.0 KiB =   3.5 MiB  bash (6)
  3.0 MiB + 822.5 KiB =   3.8 MiB  FOGSnapinReplic (2)
  3.1 MiB + 682.5 KiB =   3.8 MiB  FOGImageReplica (2)
  4.2 MiB +  80.0 KiB =   4.2 MiB  nsrexecd
  4.1 MiB + 718.5 KiB =   4.8 MiB  FOGSnapinHash (2)
  3.6 MiB +   1.3 MiB =   4.9 MiB  sudo (3)
  5.4 MiB +   1.0 MiB =   6.4 MiB  FOGTaskSchedule (2)
  7.5 MiB + 118.5 KiB =   7.6 MiB  glusterfsd
  1.9 MiB +   6.1 MiB =   8.0 MiB  sshd (7)
  2.5 MiB +   7.1 MiB =   9.5 MiB  rsyslogd
 10.3 MiB + 713.5 KiB =  11.0 MiB  FOGImageSize (2)
  7.0 MiB +   9.9 MiB =  16.9 MiB  systemd-journald
 21.5 MiB + 791.0 KiB =  22.2 MiB  FOGPingHosts (2)
 25.6 MiB +   1.0 MiB =  26.6 MiB  FOGMulticastMan (2)
315.4 MiB +  14.6 MiB = 330.0 MiB  php-fpm (51)
  2.5 GiB + 289.0 KiB =   2.5 GiB  mysqld
  5.1 GiB +  14.9 MiB =   5.1 GiB  httpd (12)
---------------------------------
                          8.0 GiB
=================================
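The listing gives totals per program, with the number of processes in parentheses. When sizing values such as pm.max_children it can help to look at the average per process. The sketch below does that arithmetic for the three largest entries; the numbers are copied from the listing above and the helper names are only illustrative.

```python
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def to_mib(value, unit):
    """Convert a ps_mem-style value such as (5.1, 'GiB') to MiB."""
    return value * UNIT_TO_MIB[unit]


def average_per_process_mib(total, unit, process_count):
    """Average RAM per process, in MiB, for one ps_mem row."""
    return to_mib(total, unit) / process_count


if __name__ == "__main__":
    # (program, RAM used, unit, number of processes) from the listing above
    rows = [
        ("php-fpm", 330.0, "MiB", 51),
        ("httpd", 5.1, "GiB", 12),
        ("mysqld", 2.5, "GiB", 1),
    ]
    for name, total, unit, count in rows:
        avg = average_per_process_mib(total, unit, count)
        print(f"{name:<8} {avg:8.1f} MiB per process")
```

The 51 php-fpm processes are presumably the 50 children allowed by pm.max_children plus the master process, at roughly 6.5 MiB each, while each httpd process averages about 435 MiB, so most of the 8.0 GiB is held by Apache rather than by php-fpm.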