Hello,
Some news about this problem. We made some changes to our server and its configuration, and the server is now much less overloaded than before. Our conclusion: the default configuration of Apache, PHP-FPM and MySQL is not optimal for large deployments. If you have a large number of clients, you need to tune the server.
I will describe our previous situation and our current one to share our experience.
Initial Scenario:
- FOG version 1.5.2
- Virtual server with 8 vCores and 16 GB RAM
- OS: RHEL 7
- Active clients: 7000
- One FOG server with only the default node.
In July we migrated from our old FOG version (0.32) on RHEL 5 to the new one (1.5.2) on RHEL 7, without any additional configuration.
In August we observed that the server was consuming a lot of CPU and RAM, and we began to have performance problems (and the school year had not even started yet). Panic mode ON!!
The first thing you think is … we need more resources ("more wood, this is war!"). WRONG. The System Operation Center (SOC) guys said NO: they could not give us more resources.
First thing: Update
We updated the server OS and some packages, for example PHP and MariaDB. We had PHP 5.6 and updated to PHP 7, and PHP performance increased a lot.
We updated the FOG version from 1.5.2 to 1.5.4
Second thing: Optimize the virtual machine resources
Our virtual server is hosted on a VMware server with two sockets of 6 cores each (it is an old server). Problem: our virtual server had 8 vCores, 6 in one socket and the other 2 in the other socket, so the server suffered from slow access times.
We removed two vCores so that all vCores were in the same socket, and access became quicker. PROBLEM: fewer resources, more server load. In September the clients began to wake up and the PHP and MySQL queries increased, so more resources were needed. To mitigate this we increased the client check-in time to 900 seconds, which reduced the PHP and MySQL queries, but consumption was still high (mysqld process at 300% CPU). The real problem was access time to the host's physical cores: our 6 vCores sat on a 6-core socket that was also shared with other virtual servers, so the vCores spent more and more time waiting for a physical core. The vCores were always at 100% CPU usage.
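To put the 900-second check-in interval in numbers, here is a back-of-the-envelope sketch. It assumes each active client makes one check-in per interval (the real client may issue several HTTP requests per cycle, so treat the results as relative) and uses 60 seconds as an assumed baseline for comparison; only the 7000 clients and the 900 seconds come from our setup.

```python
# Back-of-the-envelope check-in load for the FOG clients.
# Assumption: one check-in request per client per interval.

def checkins_per_second(clients: int, interval_s: float) -> float:
    """Average check-in requests per second across all clients."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return clients / interval_s


if __name__ == "__main__":
    clients = 7000  # active clients in our scenario
    # 60 s is an assumed baseline; 900 s is the interval we configured.
    for interval in (60, 900):
        rate = checkins_per_second(clients, interval)
        print(f"interval {interval:>4} s -> {rate:6.1f} check-ins/s")
```

Under these assumptions the load drops from roughly 117 to roughly 8 check-ins per second, a 15x reduction, which is consistent with the drop in PHP and MySQL queries we saw.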
To solve this we enabled NUMA on the server:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt
With this we distributed the vCores between the two sockets: vCPUs 0,1,2 + 8 GB RAM in NUMA node 0, and vCPUs 3,4,5 + 8 GB RAM in NUMA node 1. This is configured on the VMware server. In addition, we installed the numad package in our virtual server; this daemon distributes processes between the two NUMA nodes. Access to RAM and CPU became faster.
For example:
# ./numa-maps-summary.pl < /proc/1787/numa_maps
N0 : 100 ( 0.00 GB)
N1 : 648226 ( 2.47 GB)
active : 435228 ( 1.66 GB)
anon : 645582 ( 2.46 GB)
dirty : 647118 ( 2.47 GB)
kernelpagesize_kB: 1012 ( 0.00 GB)
mapmax : 332 ( 0.00 GB)
mapped : 1248 ( 0.00 GB)
With this Perl script we can see that the MySQL process (PID 1787) is using memory almost entirely from NUMA node 1.
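For readers who do not have numa-maps-summary.pl, here is a minimal Python sketch of the same idea (not the original script). It reads /proc/<pid>/numa_maps, sums the page-count fields (N0, N1, anon, dirty, ...) and weights each mapping by its own kernelpagesize_kB, so huge pages are counted correctly. mapmax and kernelpagesize_kB are not page counts, so this sketch leaves them out. The script name numa_summary.py is hypothetical.

```python
import sys
from collections import defaultdict

# numa_maps fields that are not page counts, so they are not summed as memory.
NON_PAGE_FIELDS = {"kernelpagesize_kB", "mapmax"}


def summarize_numa_maps(lines):
    """Sum page-count fields over all mappings; return {field: bytes}."""
    totals = defaultdict(int)
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue
        # tokens[0] is the start address and tokens[1] the memory policy;
        # the rest are flags (heap, stack, ...) or key=value fields.
        fields = {}
        for token in tokens[2:]:
            key, sep, value = token.partition("=")
            if sep and value.isdigit():
                fields[key] = int(value)
        page_kib = fields.get("kernelpagesize_kB", 4)  # assume 4 KiB if absent
        for key, pages in fields.items():
            if key not in NON_PAGE_FIELDS:
                totals[key] += pages * page_kib * 1024
    return dict(totals)


def print_summary(totals):
    for key in sorted(totals):
        print(f"{key:<8}: {totals[key] / 2**30:6.2f} GiB")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as maps:
            print_summary(summarize_numa_maps(maps))
    else:
        print("usage: python3 numa_summary.py /proc/<pid>/numa_maps")
```

As with the Perl script, reading another user's numa_maps (for example mysqld's) requires running it as root.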
Now we again have 8 vCores, distributed between the two NUMA nodes, and the vCores run at 80%-90% instead of 100%.
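To check from inside the guest that the vCores really are split across two NUMA nodes, `numactl --hardware` shows the layout. A minimal Python alternative, assuming a Linux guest that exposes the standard sysfs files /sys/devices/system/node/node*/cpulist:

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_cpulist(text):
    """Expand a kernel cpulist such as '0-2,5' into [0, 1, 2, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root=NODE_ROOT):
    """Map NUMA node id -> list of CPUs, read from sysfs."""
    nodes = sorted(root.glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    return {int(n.name[4:]): parse_cpulist((n / "cpulist").read_text()) for n in nodes}


if __name__ == "__main__":
    if NODE_ROOT.is_dir():
        for node, cpus in numa_topology().items():
            print(f"node {node}: CPUs {cpus}")
```

With the configuration described above we would expect two nodes; the exact CPU lists depend on how VMware presents the vCPUs to the guest.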
Third thing: tuning PHP, PHP-FPM and MySQL
We didn't know much about PHP, PHP-FPM and MySQL, so we had to read a lot of articles on the web about them.
Tuning MySQL: for this we used the MySQLTuner script, http://mysqltuner.com . This script gives you an idea of how the database is performing and how to tune it for better performance. These are the values we set:
SET GLOBAL query_cache_size = 4000000;     -- ~4 MB
SET GLOBAL tmp_table_size = 20000000;      -- ~20 MB
SET GLOBAL query_cache_limit = 2000000;    -- ~2 MB
SET GLOBAL max_heap_table_size = 20000000; -- ~20 MB
SET GLOBAL thread_cache_size = 4;
SET GLOBAL table_open_cache = 450;
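Values set with SET GLOBAL only last until MariaDB restarts. A minimal sketch that renders the same values as a [mysqld] option-file section, which could go in a file under /etc/my.cnf.d/ (the include directory MariaDB typically uses on RHEL 7; the file name is up to you). It also computes the effective limit for in-memory temporary tables, which is the smaller of tmp_table_size and max_heap_table_size, which is why both are set to the same value.

```python
# Values applied with SET GLOBAL above (bytes or counts).
SETTINGS = {
    "query_cache_size": 4_000_000,
    "tmp_table_size": 20_000_000,
    "query_cache_limit": 2_000_000,
    "max_heap_table_size": 20_000_000,
    "thread_cache_size": 4,
    "table_open_cache": 450,
}


def effective_tmp_table_limit(settings):
    """In-memory internal temporary tables are capped by the smaller of
    tmp_table_size and max_heap_table_size."""
    return min(settings["tmp_table_size"], settings["max_heap_table_size"])


def render_mycnf(settings):
    """Render a [mysqld] option-file section with the same settings."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mycnf(SETTINGS), end="")
    limit_mb = effective_tmp_table_limit(SETTINGS) / 1e6
    print(f"# effective in-memory temp table limit: {limit_mb:.0f} MB")
```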
The MariaDB website recommends decreasing the swappiness value (https://mariadb.com/kb/en/library/configuring-swappiness/):
# sysctl -w vm.swappiness=10
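`sysctl -w` only changes the running kernel, so the value goes back to the default after a reboot. A small sketch that checks the current value and prints the line to persist it in a file under /etc/sysctl.d/; the file name 99-swappiness.conf is hypothetical.

```python
from pathlib import Path

SWAPPINESS = Path("/proc/sys/vm/swappiness")
TARGET = 10  # the value we set with sysctl -w


def parse_swappiness(text):
    """The /proc file holds a single integer followed by a newline."""
    return int(text.strip())


def sysctl_d_line(value):
    """Line for a file under /etc/sysctl.d/ so the value survives reboots."""
    return f"vm.swappiness = {value}"


if __name__ == "__main__":
    if SWAPPINESS.exists():
        current = parse_swappiness(SWAPPINESS.read_text())
        status = "ok" if current == TARGET else "differs from target"
        print(f"current vm.swappiness = {current} ({status})")
    print("to persist, add to /etc/sysctl.d/99-swappiness.conf:")
    print(sysctl_d_line(TARGET))
```

After creating the file, `sysctl -p /etc/sysctl.d/99-swappiness.conf` loads it without a reboot.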
Tuning PHP-FPM and PHP: there are some articles about it in this forum.
PHP-FPM:
pm = ondemand
; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI.
; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
; Note: This value is mandatory.
pm.max_children = 50
; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 5
; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 5
; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 50
; The number of seconds after which an idle process will be killed.
; Note: Used only when pm is set to 'ondemand'
; Default Value: 10s
pm.process_idle_timeout = 10s
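A common rule of thumb for choosing pm.max_children is to divide the memory you can spare for PHP by the average resident size of one php-fpm worker. A minimal sketch of that heuristic; the RSS samples and the 4 GiB budget are made-up numbers (on a 16 GB server MySQL and the OS also need memory), and the samples could be collected with something like `ps -o rss= -C php-fpm` (the process name may differ with your PHP packages).

```python
import math


def suggest_max_children(rss_kib, budget_mib):
    """Rule of thumb: how many workers of average size fit in a memory budget.

    rss_kib: resident set sizes of running php-fpm workers, in KiB.
    budget_mib: memory you are willing to give to PHP workers, in MiB.
    """
    if not rss_kib:
        raise ValueError("need at least one RSS sample")
    avg_mib = sum(rss_kib) / len(rss_kib) / 1024
    return max(1, math.floor(budget_mib / avg_mib))


if __name__ == "__main__":
    samples_kib = [40960, 51200, 61440]  # hypothetical worker sizes (~50 MiB avg)
    budget_mib = 4096  # hypothetical share of RAM left for PHP
    print("suggested pm.max_children:", suggest_max_children(samples_kib, budget_mib))
```

The result is an upper bound for the worst case where every child is busy at once; with pm = ondemand, idle children are reaped after pm.process_idle_timeout, so typical usage stays well below it.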
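Normally people use pm = dynamic, but we use pm = ondemand because we saw that the performance is better. With ondemand only pm.max_children and pm.process_idle_timeout apply; as the comments above say, pm.start_servers, pm.min_spare_servers and pm.max_spare_servers are used only with dynamic.
These parameters may still change. The server runs well now, but it is October and download activity has decreased a lot.
To see PHP activity you can enable Apache's server-status page (mod_status) or PHP-FPM's own status page (pm.status_path in the pool configuration). There is also a tool, goaccess, that shows the requested PHP pages and their hit counts in the terminal: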
# yum install goaccess
# tail -f /var/log/httpd/access_log | goaccess --log-format=COMBINED -
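If goaccess is not available, a minimal Python sketch of the same idea counts requests per URL path in an Apache access log (common or combined format), ignoring query strings. The script name top_paths.py is hypothetical, and it only produces a one-shot count, not goaccess's live dashboard.

```python
import re
import sys
from collections import Counter

# Matches the quoted request line, e.g. "GET /path?query HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+) [^"]*"')


def top_paths(lines, n=10):
    """Return the n most requested paths as (path, hits), query string removed."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1).split("?", 1)[0]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for path, hits in top_paths(log):
                print(f"{hits:8d}  {path}")
    else:
        print("usage: python3 top_paths.py /var/log/httpd/access_log")
```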