FOG Server High CPU


  • Testers

    Hello All,
    I came in this morning to a few alarms from my VM environment about guest CPU usage. I tracked it down to my FOG server, which was chugging along at 100% CPU utilization. Here is a screenshot of the “top” command run as root:

    0_1535381169369_chooch.PNG

    I am also getting many “Max children reached” messages in the php-fpm error log on the Web UI.
    0_1535381297464_php.PNG

    I am running FOG 1.5.4 on CentOS 7.
    Thanks!
    Paul


  • Developer

    More info about this :)

    We have now configured the client check-in time to 275 seconds. In September we had to increase it to 900, but after the changes we made we set it back to 275 seconds.

    This is a capture from the goaccess tool:

    0_1540379551079_goaccess_RHEL.png

    The capture was taken after 6 minutes of activity. We can see that:
    Total requests: 8436, about 23.4 req/s (8436 / 360 s).
    1378 visitors requested the /fog/service/getversion.php?NewServise&json page, i.e. roughly 1378 distinct clients checked in during that window.
    1 visitor requested /fog/service/progress.php 114 times (about once every 3 seconds). This client is running a download task.
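    If goaccess is not installed, similar per-endpoint numbers can be pulled straight from the Apache log with awk. This is a minimal sketch, assuming Apache's default combined log format (client IP in field 1, request path in field 7) and the RHEL/CentOS log path /var/log/httpd/access_log; the function name endpoint_summary is hypothetical. It prints total requests, distinct client IPs and the path, busiest first.

```shell
# endpoint_summary LOGFILE  (hypothetical helper)
# For each request path (query string stripped), print:
#   <total requests> <distinct client IPs> <path>
# Assumes Apache combined log format: $1 = client IP, $7 = request path.
endpoint_summary() {
    awk '{
        path = $7
        sub(/[?].*/, "", path)               # drop the query string
        hits[path]++
        if (!seen[path SUBSEP $1]++)         # first request from this IP for this path
            ips[path]++
    }
    END {
        for (p in hits) printf "%d %d %s\n", hits[p], ips[p], p
    }' "$1" | sort -rn
}

# Usage (on the FOG server):
#   endpoint_summary /var/log/httpd/access_log | head
```

    Because it reads the whole file, run it against a log that covers a known time window if you want to turn the counts into a per-second rate.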

    Keep in mind that when a client is running a download, capture or multicast task, it asks for or reports its progress to the server (yes, the progress bar is very pretty and cool to see), but this information has a price. The client reports its progress roughly every three or four seconds. With one task this is negligible (“peccata minuta”), but with 100 or more clients running download or capture tasks it becomes a problem, because the PHP server cannot process all the requests at once, and it is not only the PHP server: the MySQL server is hit too.
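    The load described above can be estimated with simple arithmetic: check-in traffic scales with the number of clients divided by the check-in interval, and each running task adds about one progress report per reporting interval (3 to 4 s, per the post). A rough sketch under those assumptions; the function name and the requests-per-check-in parameter are hypothetical, since the FOG client may make several requests per check-in. For comparison, 1378 clients at 275 s and one request per check-in would be only about 5 req/s, while goaccess measured about 23 req/s, which suggests several requests per check-in or other traffic mixed in.

```shell
# fog_request_rate CLIENTS CHECKIN_S REQS_PER_CHECKIN TASKS REPORT_S  (hypothetical helper)
# Rough steady-state request rate (req/s) reaching Apache/php-fpm:
#   CLIENTS * REQS_PER_CHECKIN / CHECKIN_S   -> FOG client check-ins
# + TASKS / REPORT_S                         -> progress reports from running tasks
fog_request_rate() {
    awk -v c="$1" -v ci="$2" -v r="$3" -v t="$4" -v ri="$5" \
        'BEGIN { printf "%.2f\n", c * r / ci + t / ri }'
}

# Examples:
#   fog_request_rate 7000 275 1 0 3     # check-ins only
#   fog_request_rate 7000 275 1 100 3   # plus 100 tasks reporting every 3 s
```

    With one request per check-in, 7000 clients at 275 s plus 100 tasks reporting every 3 s comes to about 58.8 req/s, more than half of it from progress reports. Doubling the check-in interval halves only the first term.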

    For example, in this capture of the htop command (similar to atop or top):

    0_1540380690148_Captura de pantalla de 2018-10-24 13-08-54.png

    We can see that the vCores are busy but not at 100%, the load is high, and mysqld is using 221% CPU. At this moment the server is processing only the FOG client requests from the computers; there are no tasks running. (When the technicians start download, multicast or capture tasks, the server is literally burning: I have seen the load at 60 or more, and the server could not serve all the requests and refused some of them.) The capture also clearly shows the activity of the two NUMA nodes:

    Node 0: 1, 2, 3 and 4
    Node 1: 5, 6, 7 and 8

    Which node is the mysqld process running on?

    # ./numa-maps-summary.pl < /proc/1787/numa_maps
    N0        :       636053 (  2.43 GB)
    N1        :        14336 (  0.05 GB)
    active    :       352378 (  1.34 GB)
    anon      :       649153 (  2.48 GB)
    dirty     :       649153 (  2.48 GB)
    kernelpagesize_kB:         1016 (  0.00 GB)
    mapmax    :          480 (  0.00 GB)
    mapped    :         1276 (  0.00 GB)
    

    Mostly on node 0 (N0: 2.43 GB versus N1: 0.05 GB).
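    The same per-node totals can be read from /proc/<pid>/numa_maps without the Perl script. A minimal awk sketch, assuming the usual numa_maps format in which each mapping lists its per-node page counts as N<node>=<pages> and its page size as kernelpagesize_kB=<size>; the function name is hypothetical. If the numactl package is installed, numastat -p mysqld gives a similar per-node breakdown.

```shell
# numa_node_summary NUMA_MAPS_FILE  (hypothetical helper)
# Sums the per-node page counts (N<node>=<pages>) in a /proc/<pid>/numa_maps
# file and converts them to GiB using each mapping's kernelpagesize_kB.
numa_node_summary() {
    awk '{
        kb = 4                                   # fallback page size in KiB
        for (i = 1; i <= NF; i++)
            if ($i ~ /^kernelpagesize_kB=/) { split($i, kv, "="); kb = kv[2] }
        for (i = 1; i <= NF; i++)
            if ($i ~ /^N[0-9]+=/) {
                split($i, kv, "=")
                pages[kv[1]] += kv[2]
                kib[kv[1]]   += kv[2] * kb
            }
    }
    END {
        for (n in pages) printf "%s %d pages %.2f GiB\n", n, pages[n], kib[n] / 1048576
    }' "$1" | sort
}

# Usage (as root):
#   numa_node_summary /proc/$(pidof mysqld)/numa_maps
```

    Weighting each mapping by its own page size matters when huge pages are in use; a plain page count would understate them.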

    I downloaded a small script (I forget where from) that shows the RAM usage of each process:

    # ./ps_mem.py 
     Private  +   Shared  =  RAM used	Program
    
      4.0 KiB +  12.5 KiB =  16.5 KiB	agetty
      4.0 KiB +  15.0 KiB =  19.0 KiB	mysqld_safe
      4.0 KiB +  47.5 KiB =  51.5 KiB	rpc.statd
      4.0 KiB +  49.5 KiB =  53.5 KiB	rpc.idmapd
      4.0 KiB +  57.0 KiB =  61.0 KiB	lvmetad
     36.0 KiB +  31.0 KiB =  67.0 KiB	atd
      4.0 KiB +  73.5 KiB =  77.5 KiB	VGAuthService
     88.0 KiB +  32.0 KiB = 120.0 KiB	rhsmcertd
     92.0 KiB +  41.5 KiB = 133.5 KiB	systemd-udevd
    112.0 KiB +  22.5 KiB = 134.5 KiB	sleep
     88.0 KiB +  55.0 KiB = 143.0 KiB	vsftpd
     88.0 KiB +  65.0 KiB = 153.0 KiB	gssproxy
    148.0 KiB +  23.0 KiB = 171.0 KiB	udp-sender
    156.0 KiB +  30.0 KiB = 186.0 KiB	crond
    164.0 KiB +  30.0 KiB = 194.0 KiB	in.tftpd
    180.0 KiB +  20.0 KiB = 200.0 KiB	numad
    192.0 KiB +  16.0 KiB = 208.0 KiB	rhnsd
    128.0 KiB +  83.5 KiB = 211.5 KiB	master
    176.0 KiB +  35.5 KiB = 211.5 KiB	xinetd
    188.0 KiB +  54.5 KiB = 242.5 KiB	auditd
    232.0 KiB +  29.0 KiB = 261.0 KiB	irqbalance
    192.0 KiB +  88.5 KiB = 280.5 KiB	qmgr
    208.0 KiB +  87.0 KiB = 295.0 KiB	sh
    240.0 KiB + 109.5 KiB = 349.5 KiB	rpcbind
    588.0 KiB +  36.5 KiB = 624.5 KiB	systemd-logind
    668.0 KiB +  75.5 KiB = 743.5 KiB	dbus-daemon
    596.0 KiB + 311.5 KiB = 907.5 KiB	polkitd
    800.0 KiB + 175.5 KiB = 975.5 KiB	vmtoolsd
    916.0 KiB +  64.5 KiB = 980.5 KiB	FOGpxe.sh
    936.0 KiB + 135.0 KiB =   1.0 MiB	dnsmasq
      1.1 MiB + 386.0 KiB =   1.5 MiB	NetworkManager
      1.4 MiB + 413.5 KiB =   1.8 MiB	pickup
      2.0 MiB + 101.0 KiB =   2.1 MiB	rpc.mountd
      2.2 MiB +  67.5 KiB =   2.3 MiB	systemd
      2.6 MiB + 308.0 KiB =   2.9 MiB	tuned
      2.9 MiB + 355.5 KiB =   3.2 MiB	mysql
      2.7 MiB + 846.0 KiB =   3.5 MiB	bash (6)
      3.0 MiB + 822.5 KiB =   3.8 MiB	FOGSnapinReplic (2)
      3.1 MiB + 682.5 KiB =   3.8 MiB	FOGImageReplica (2)
      4.2 MiB +  80.0 KiB =   4.2 MiB	nsrexecd
      4.1 MiB + 718.5 KiB =   4.8 MiB	FOGSnapinHash (2)
      3.6 MiB +   1.3 MiB =   4.9 MiB	sudo (3)
      5.4 MiB +   1.0 MiB =   6.4 MiB	FOGTaskSchedule (2)
      7.5 MiB + 118.5 KiB =   7.6 MiB	glusterfsd
      1.9 MiB +   6.1 MiB =   8.0 MiB	sshd (7)
      2.5 MiB +   7.1 MiB =   9.5 MiB	rsyslogd
     10.3 MiB + 713.5 KiB =  11.0 MiB	FOGImageSize (2)
      7.0 MiB +   9.9 MiB =  16.9 MiB	systemd-journald
     21.5 MiB + 791.0 KiB =  22.2 MiB	FOGPingHosts (2)
     25.6 MiB +   1.0 MiB =  26.6 MiB	FOGMulticastMan (2)
    315.4 MiB +  14.6 MiB = 330.0 MiB	php-fpm (51)
      2.5 GiB + 289.0 KiB =   2.5 GiB	mysqld
      5.1 GiB +  14.9 MiB =   5.1 GiB	httpd (12)
    ---------------------------------
                              8.0 GiB
    =================================
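    If ps_mem.py is not at hand, a rougher per-program view can be built from ps. A minimal sketch (both function names are hypothetical): it sums each program's resident set size (RSS), so, unlike ps_mem.py, shared pages are counted once per process and the totals for multi-process programs such as httpd and php-fpm will be overestimated.

```shell
# summarize_rss  (hypothetical helper)
# Reads "<rss_kib> <command>" lines on stdin and prints, per command:
#   <total MiB> MiB <process count> <command>, largest first.
summarize_rss() {
    awk '{
        name = $0
        sub(/^[ \t]*[0-9]+[ \t]+/, "", name)    # drop the leading RSS column
        kib[name] += $1
        n[name]++
    }
    END { for (p in kib) printf "%10.1f MiB %4d %s\n", kib[p] / 1024, n[p], p }' | sort -rn
}

# rss_by_program  (hypothetical helper): live view of the running system
rss_by_program() {
    ps -eo rss=,comm= | summarize_rss
}

# Usage:
#   rss_by_program | head -n 15
```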
    

  • Developer

    Hello,

    Some news about this problem. We made some changes to our server and its configuration, and the server is now much less overloaded than before. The conclusion: the default configuration of Apache, php-fpm and MySQL is not optimal for large deployments. If you have a large number of clients, you need to tune the server.

    To share our experience, I will describe our previous situation and our current one.

    Initial Scenario:

    • FOG version 1.5.2
    • Virtual server with 8 vCores and 16 GB RAM
    • OS: RHEL 7
    • Active clients: 7000
    • One fog server and only the default node.

    In July we migrated from our old FOG version (0.32) on RHEL 5 to the new one (1.5.2) on RHEL 7, without any additional configuration.

    In August we observed that the server was consuming a lot of CPU and RAM, and we began to have performance problems (and the academic year had not even started). Panic mode ON!!

    The first thing you think is … more resources are needed (“more wood, this is war!”). WRONG. The System Operation Center (SOC) guys said NO: they could not give us more resources.

    First thing: update
    We updated the server OS and some packages, for example PHP and MariaDB. We had PHP 5.6 and updated to PHP 7, and PHP performance improved a lot.

    We updated the FOG version from 1.5.2 to 1.5.4

    Second thing: Optimize the virtual machine resources

    Our virtual server is hosted on a VMware server with two sockets, each with 6 cores (it is an old server). Problem: our virtual server had 8 vCores, 6 on one socket and the other 2 on the other socket, and it suffered from slow access times.
    We removed two vCores from the server so that all vCores were on the same socket, and access became quicker. PROBLEM: fewer resources, more server load. In September the clients began to wake up and the PHP and MySQL queries increased, so more resources were needed. To mitigate this we increased the client check-in time to 900 seconds, which reduced the PHP and MySQL queries, but consumption was still high (the mysqld process at 300%). The problem was access time to the host's cores: we had 6 vCores on a socket with 6 physical cores, shared with other virtual servers on the same socket, so the vCores spent more and more time waiting for access to the socket's cores. The vCores were always at 100% CPU usage.

    To solve this we enabled NUMA on the server:
    https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt

    With this we distributed the vCores between the two sockets: vCPUs 0, 1, 2 + 8 GB RAM on NUMA node 0 and vCPUs 3, 4, 5 + 8 GB RAM on NUMA node 1. This is configured on the VMware host. In addition, we installed the numad package on our virtual server; this daemon distributes processes between the two NUMA nodes. Access to RAM and CPU became faster.
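    To confirm inside the guest that the vNUMA layout configured on the VMware host is actually visible (for example vCPUs 0-2 on node 0 and 3-5 on node 1), lscpu lists the CPUs of each NUMA node on lines of the form "NUMA node0 CPU(s): 0-2". A minimal sketch that extracts just those lines; the function name is hypothetical, and numactl --hardware shows the same layout plus per-node memory.

```shell
# numa_cpu_map  (hypothetical helper)
# Reads `lscpu` output on stdin and prints "<node> <cpu list>" for every
# "NUMA nodeN CPU(s):" line, e.g. "node0 0-2".
numa_cpu_map() {
    awk -F: '/^NUMA node[0-9]+ CPU[(]s[)]/ {
        split($1, w, " ")                    # w[2] is e.g. "node0"
        cpus = $2
        gsub(/^[ \t]+|[ \t]+$/, "", cpus)    # trim padding around the list
        print w[2], cpus
    }'
}

# Usage:
#   lscpu | numa_cpu_map
```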

    For example:

    # ./numa-maps-summary.pl < /proc/1787/numa_maps
    N0        :          100 (  0.00 GB)
    N1        :       648226 (  2.47 GB)
    active    :       435228 (  1.66 GB)
    anon      :       645582 (  2.46 GB)
    dirty     :       647118 (  2.47 GB)
    kernelpagesize_kB:         1012 (  0.00 GB)
    mapmax    :          332 (  0.00 GB)
    mapped    :         1248 (  0.00 GB)
    

    With this Perl script we can see that mysqld is now using the memory of NUMA node 1.
    We now have 8 vCores again, distributed between the two NUMA nodes.
    The vCores are now at 80-90%.

    Third thing: tuning PHP, php-fpm and MySQL

    We didn’t know much about PHP, php-fpm and MySQL, so we had to read a lot of articles on the web about them.

    Tuning MySQL: for this we used the MySQLTuner script (http://mysqltuner.com). It gives you an idea of the database's performance and how to tune it to improve performance.

    SET GLOBAL query_cache_size = 4000000;      -- ~4 MB
    SET GLOBAL tmp_table_size = 20000000;       -- ~20 MB
    SET GLOBAL query_cache_limit = 2000000;     -- ~2 MB
    SET GLOBAL max_heap_table_size = 20000000;  -- ~20 MB
    SET GLOBAL thread_cache_size = 4;
    SET GLOBAL table_open_cache = 450;
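    SET GLOBAL only changes the running server; the values are lost when mysqld restarts. A minimal sketch that writes the same values into an option file, assuming the stock RHEL/CentOS 7 /etc/my.cnf, which includes every *.cnf file in /etc/my.cnf.d/ (the file name fog-tuning.cnf and the function name are hypothetical). tmp_table_size and max_heap_table_size are kept equal, as MySQLTuner recommends. Note that the MySQLTuner output elsewhere in this thread reports query_cache_type = 0; with that setting the query cache is not used, whatever query_cache_size is.

```shell
# write_mysql_tuning [DIR]  (hypothetical helper)
# Persists the SET GLOBAL values above in an option file under DIR
# (default /etc/my.cnf.d). The file name fog-tuning.cnf is an assumption.
write_mysql_tuning() {
    dir=${1:-/etc/my.cnf.d}
    printf '%s\n' \
        '[mysqld]' \
        'query_cache_size    = 4000000' \
        'query_cache_limit   = 2000000' \
        'tmp_table_size      = 20000000' \
        'max_heap_table_size = 20000000' \
        'thread_cache_size   = 4' \
        'table_open_cache    = 450' \
        > "$dir/fog-tuning.cnf"
}

# Usage (as root):
#   write_mysql_tuning && systemctl restart mariadb
```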
    

    The MariaDB documentation recommends decreasing the swappiness value (https://mariadb.com/kb/en/library/configuring-swappiness/):

    #sysctl -w vm.swappiness=10
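    Likewise, sysctl -w only affects the running kernel. A minimal sketch that makes the setting persistent through /etc/sysctl.d/, whose *.conf files are applied at boot; the file name 90-swappiness.conf and the function name are hypothetical.

```shell
# persist_swappiness VALUE [DIR]  (hypothetical helper)
# Writes vm.swappiness to a sysctl.d drop-in so it survives a reboot.
# The file name 90-swappiness.conf is an assumption; any *.conf name works.
persist_swappiness() {
    dir=${2:-/etc/sysctl.d}
    printf 'vm.swappiness = %s\n' "$1" > "$dir/90-swappiness.conf"
}

# Usage (as root):
#   persist_swappiness 10 && sysctl -p /etc/sysctl.d/90-swappiness.conf
```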
    

    Tuning php-fpm and PHP: there are some articles about it in this forum.
    PHP-FPM (www.conf):

    pm = ondemand
    
    ; The number of child processes to be created when pm is set to 'static' and the
    ; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
    ; This value sets the limit on the number of simultaneous requests that will be
    ; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
    ; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
    ; CGI.
    ; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
    ; Note: This value is mandatory.
    pm.max_children = 50
    
    ; The number of child processes created on startup.
    ; Note: Used only when pm is set to 'dynamic'
    ; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
    pm.start_servers = 5
    
    ; The desired minimum number of idle server processes.
    ; Note: Used only when pm is set to 'dynamic'
    ; Note: Mandatory when pm is set to 'dynamic'
    pm.min_spare_servers = 5
    
    ; The desired maximum number of idle server processes.
    ; Note: Used only when pm is set to 'dynamic'
    ; Note: Mandatory when pm is set to 'dynamic'
    pm.max_spare_servers = 50
    
    ; The number of seconds after which an idle process will be killed.
    ; Note: Used only when pm is set to 'ondemand'
    ; Default Value: 10s
    pm.process_idle_timeout = 10s
    

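    Most people use pm = dynamic, but we use pm = ondemand because we saw better performance with it. Note that, as the comments above say, pm.start_servers, pm.min_spare_servers and pm.max_spare_servers are only used when pm is 'dynamic', so with ondemand they have no effect.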
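    A common rule of thumb (not from this thread) for choosing pm.max_children is the memory you can give php-fpm divided by the average memory of one child. The ps_mem output above shows 51 php-fpm processes using 330 MiB in total, about 6.5 MiB each, but children can grow under load, so measure on a busy server. A minimal sketch; both function names and the 2 GiB budget in the usage line are hypothetical.

```shell
# fpm_max_children BUDGET_MIB AVG_CHILD_MIB  (hypothetical helper)
# Rule-of-thumb ceiling for pm.max_children: memory budget / size of one child,
# rounded down. Fails if the average is missing or zero.
fpm_max_children() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        if (a + 0 <= 0) { print "average child size must be > 0" > "/dev/stderr"; exit 1 }
        printf "%d\n", b / a
    }'
}

# avg_fpm_child_mib  (hypothetical helper)
# Average resident size (MiB) of running php-fpm processes, master included.
# RSS counts shared pages in every process, so this errs on the high side.
avg_fpm_child_mib() {
    ps -C php-fpm -o rss= | awk '{ s += $1; n++ } END { if (n) printf "%.1f\n", s / n / 1024 }'
}

# Usage: allow php-fpm about 2 GiB
#   fpm_max_children 2048 "$(avg_fpm_child_mib)"
```

    The result is an upper bound for memory only; if MySQL is the bottleneck, more PHP children can simply mean more concurrent queries waiting on it.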

    These parameters may still change: the server runs well now, but it is October and download activity has dropped a lot.

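    To watch PHP activity you can enable Apache's server-status page (mod_status, which is set up in the Apache configuration, not in php.ini) or php-fpm's own status page (pm.status_path in www.conf). There is also the “goaccess” tool, which shows the PHP calls and their counts in the terminal: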

    #yum install goaccess
    #tail -f /var/log/httpd/access_log | goaccess -
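    To check whether tuning like this actually helps, you can count how often php-fpm hits its limit. php-fpm logs a warning saying the server reached its max_children setting; the exact wording differs slightly between pm modes, so the pattern below accepts both "pm.max_children" and "max_children". A minimal sketch that counts those warnings per hour, assuming php-fpm's timestamp format ([24-Oct-2018 13:08:54]) and an error log at /var/log/php-fpm/error.log (the path can differ for other PHP 7 packages; the function name is hypothetical).

```shell
# max_children_per_hour LOGFILE  (hypothetical helper)
# Counts php-fpm "reached ... max_children setting" warnings per hour.
# Assumes lines start with a timestamp like "[24-Oct-2018 13:08:54]".
# Output is sorted as text, which is chronological within a single month.
max_children_per_hour() {
    awk '/reached (pm[.])?max_children setting/ {
        day  = substr($1, 2)                 # "[24-Oct-2018" -> "24-Oct-2018"
        hour = substr($2, 1, 2)              # "13:08:54]"    -> "13"
        n[day " " hour ":00"]++
    }
    END { for (k in n) print k, n[k] }' "$1" | sort
}

# Usage:
#   max_children_per_hour /var/log/php-fpm/error.log
```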


  • Senior Developer

    @Fernando-Gietz this is what I hope to move to in the future, but it is not in use currently. The tablename and install method are not used for the core elements; those were just placeholders until I can mimic the proper table layout for each item.

    It is not in use for core elements.


  • Developer

    I was searching for where this query is called from, and I found something that looks like an error or a bug:

     <?php
     class UserTrackingManager extends FOGManagerController
     {
         public $tablename = 'dirCleaner';
         public function install()
         {
             $this->uninstall();
             $sql = Schema::createTable(
                 $this->tablename,
                 true,
                 array(
                     'dcID',
                     'dcPath'
                 ),
                 array(
                     'INTEGER',
                     'LONGTEXT'
                 ),
                 array(
                     false,
                     false
                 ),
                 array(
                     false,
                     false
                 ),
                 array(
                     'dcID',
                     'dcPath'
                 ),
                 'MyISAM',
                 'utf8',
                 'dcID',
                 'dcID'
             );
             return self::$DB->query($sql);
         }
     }
    

    tablename = dirCleaner??

    And of course I still haven’t found what runs this query XD


  • Developer

    @Tom-Elliott So in this case, is it the FOG client that runs this query? The query returns a lot of information; is all of it necessary? Maybe it would be possible to put a flag on the service that runs the query and make it configurable from the web UI?

    Running a “SELECT * FROM” query may be like using a sledgehammer to crack a nut.


  • Senior Developer

    @Fernando-Gietz every host requests all the information from linking tables. That’s the only reason I ask about clearing. Of course this can be changed but would require a lot of work to the code base, as the intent was to have all information as readily available as possible.


  • Developer

    @Tom-Elliott I would prefer not to clean or delete the userTracking table; my boss once asked for it to investigate misuse of the computers :0 The slow query log shows userTracking queries, but they are SELECT queries, not INSERT queries. Who, or which process, is asking for this data?

    # Time: 181003 16:37:01
    # User@Host: root[root] @ localhost []
    # Thread_id: 12427592  Schema: fog  QC_hit: No
    # Query_time: 83.091067  Lock_time: 0.000094  Rows_sent: 155086  Rows_examined: 536104
    SET timestamp=1538577421;
    SELECT * FROM `userTracking`  LEFT OUTER JOIN `hosts` ON `hosts`.`hostID`=`userTracking`.`utHostID`  LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage`  LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`  LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`  LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`  LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`    WHERE `hostMAC`.`hmPrimary` = '1'  ORDER BY `userTracking`.`utID` ASC;
    # Time: 181003 16:46:07
    # User@Host: root[root] @ localhost []
    # Thread_id: 12448399  Schema: fog  QC_hit: No
    # Query_time: 48.331137  Lock_time: 0.000105  Rows_sent: 155135  Rows_examined: 536251
    SET timestamp=1538577967;
    SELECT * FROM `userTracking`  LEFT OUTER JOIN `hosts` ON `hosts`.`hostID`=`userTracking`.`utHostID`  LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage`  LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`  LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`  LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`  LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`    WHERE `hostMAC`.`hmPrimary` = '1'  ORDER BY `userTracking`.`utID` ASC;
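    To see which tables dominate a whole slow-query log rather than reading it entry by entry, the entries can be grouped by table. MariaDB ships mysqldumpslow for this kind of summary; below is a minimal awk alternative, assuming the slow-log layout shown above (a "# Query_time:" header followed later by the statement). It only looks at SELECTs, groups them by the first table after FROM, and prints total seconds, count, slowest single query and table. The function name is hypothetical.

```shell
# slow_by_table SLOW_LOG  (hypothetical helper)
# Groups SELECTs in a MariaDB slow-query log by the first table after FROM and
# prints: <total seconds> <count> <slowest seconds> <table>, slowest total first.
slow_by_table() {
    awk '
        /^# Query_time:/ { qt = $3 + 0; pending = 1; next }
        pending && /^(SELECT|select)/ {
            t = "?"
            if (match($0, /FROM `[^`]+`/))
                t = substr($0, RSTART + 6, RLENGTH - 7)   # strip "FROM `" and "`"
            cnt[t]++
            sum[t] += qt
            if (qt > mx[t]) mx[t] = qt
            pending = 0
        }
        END { for (t in cnt) printf "%.1f %d %.1f %s\n", sum[t], cnt[t], mx[t], t }
    ' "$1" | sort -rn
}

# Find the log path with:  mysql -N -e 'SELECT @@slow_query_log_file'
# Usage:                   slow_by_table /path/to/slow.log | head
```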
    

  • Senior Developer

    @Fernando-Gietz mind clearing out some of the userTracking and imagingLog tables? I’ve seen issues you’re describing simply because these logs are always added to while rarely cleared. This essentially makes each request per host have to pull in a ton of data.
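    Since deleting history was a concern here, one option is to archive old rows before trimming them. A minimal sketch, with several assumptions to check first: the database is named fog, root can connect over the local socket without extra options, and userTracking has a datetime column named utDateTime (verify with DESCRIBE userTracking; the same approach would apply to imagingLog with its own date column). The function name and the DRY_RUN switch are hypothetical. Take a full backup before running any DELETE.

```shell
# archive_then_trim TABLE DATE_COLUMN CUTOFF  (hypothetical helper)
# Dumps rows with DATE_COLUMN < CUTOFF into TABLE-before-CUTOFF.sql.gz, then
# deletes them only if the dump and compression succeeded.
# Assumptions: database "fog", root access via the local socket, and a date
# column name you have verified with DESCRIBE. DRY_RUN=1 only prints commands.
archive_then_trim() {
    table=$1 col=$2 cutoff=$3
    where="$col < '$cutoff'"
    sql="$table-before-$cutoff.sql"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "mysqldump fog $table --no-create-info --where=\"$where\" > $sql && gzip $sql"
        echo "mysql fog -e \"DELETE FROM $table WHERE $where\""
        return 0
    fi
    mysqldump fog "$table" --no-create-info --where="$where" > "$sql" &&
        gzip "$sql" &&
        mysql fog -e "DELETE FROM $table WHERE $where"
}

# Usage (hypothetical column name, check it first):
#   DRY_RUN=1 archive_then_trim userTracking utDateTime 2018-01-01
```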


  • Developer

    @george1421 Hi George,

    Sorry for my late answer, I have been away from the forums for a few days.

    Scenario:
    FOG Version: 1.5.4
    Server OS: RHEL 7
    7000 hosts on the server (we use only one server to manage them). No nodes.
    Client check-in time: 275 seconds. We began with 900 seconds and, after the update to PHP 7.0, we have been decreasing the check-in time down to 275. With this configuration the mysql process uses 140-160% of the CPU.

    We are testing the server configuration (cores and RAM) to optimize it. We see that 6 vCores (we had 8 before) are too few because the server load is high, so we will increase them to 8 vCores. This requires shutting down the server, and we will take advantage of that to upgrade the virtual hardware version from 10 to 13 (I think that is the latest one available in our vCenter). This may improve the server's performance a little.

    After this we will try to optimize the mysql config.

     >>  MySQLTuner 1.7.10 - Major Hayden <major@mhtx.net>
     >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
     >>  Run with '--help' for additional options and output filtering
    
    [--] Skipped version check for MySQLTuner script
    [OK] Currently running supported MySQL version 5.5.60-MariaDB
    [OK] Operating on 64-bit architecture
     
    -------- Log file Recommendations ------------------------------------------------------------------
    [--] Log file: /var/log/mariadb/mariadb.log(218K)
    [OK] Log file /var/log/mariadb/mariadb.log exists
    [OK] Log file /var/log/mariadb/mariadb.log is readable.
    [OK] Log file /var/log/mariadb/mariadb.log is not empty
    [OK] Log file /var/log/mariadb/mariadb.log is smaller than 32 Mb
    [!!] /var/log/mariadb/mariadb.log contains 429 warning(s).
    [!!] /var/log/mariadb/mariadb.log contains 477 error(s).
    [--] 89 start(s) detected in /var/log/mariadb/mariadb.log
    [--] 1) 180925 16:48:01 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 2) 180920 18:36:46 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 3) 180920 18:32:57 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 4) 180920 18:32:29 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 5) 180919 18:14:02 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 6) 180919 18:03:47 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 7) 180919 14:11:52 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 8) 180918 18:57:39 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 9) 180918 18:54:57 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 10) 180918  9:18:59 [Note] /usr/libexec/mysqld: ready for connections.
    [--] 65 shutdown(s) detected in /var/log/mariadb/mariadb.log
    [--] 1) 180925 16:45:39 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 2) 180920 18:35:51 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 3) 180920 18:32:53 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 4) 180920 18:32:25 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 5) 180919 18:12:16 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 6) 180919 18:03:44 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 7) 180918 18:57:17 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 8) 180918 18:54:55 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 9) 180917 19:03:31 [Note] /usr/libexec/mysqld: Shutdown complete
    [--] 10) 180907 18:15:30 [Note] /usr/libexec/mysqld: Shutdown complete
     
    -------- Storage Engine Statistics -----------------------------------------------------------------
    [--] Status: +ARCHIVE +Aria +BLACKHOLE +CSV +FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA 
    [--] Data in MyISAM tables: 56.9M (Tables: 67)
    [--] Data in InnoDB tables: 26.8M (Tables: 7)
    [OK] Total fragmented tables: 0
     
     
    -------- CVE Security Recommendations --------------------------------------------------------------
    [OK] NO SECURITY CVE FOUND FOR YOUR VERSION
     
    -------- Performance Metrics -----------------------------------------------------------------------
    [--] Up for: 7d 23h 44m 8s (373M q [541.025 qps], 12M conn, TX: 626G, RX: 83G)
    [--] Reads / Writes: 99% / 1%
    [--] Binary logging is disabled
    [--] Physical Memory     : 15.5G
    [--] Max MySQL memory    : 972.2M
    [--] Other process memory: 8.2G
    [--] Total buffers: 416.0M global + 2.8M per thread (200 max threads)
    [--] P_S Max memory usage: 0B
    [--] Galera GCache Max memory usage: 0B
    [OK] Maximum reached memory usage: 644.1M (4.05% of installed RAM)
    [OK] Maximum possible memory usage: 972.2M (6.12% of installed RAM)
    [OK] Overall possible memory usage with other process is compatible with memory available
    [OK] Slow queries: 0% (398/373M)
    [OK] Highest usage of available connections: 41% (82/200)
    [OK] Aborted connections: 0.00%  (1/12420260)
    [!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
    [!!] Query cache may be disabled by default due to mutex contention.
    [!!] Query cache efficiency: 0.0% (0 cached / 334M selects)
    [OK] Query cache prunes per day: 0
    [OK] Sorts requiring temporary tables: 0% (1K temp sorts / 97M sorts)
    [OK] No joins without indexes
    [!!] Temporary tables created on disk: 67% (7M on disk / 10M total)
    [!!] Thread cache is disabled
    [!!] Table cache hit rate: 1% (400 open / 20K opened)
    [OK] Open file limit used: 44% (455/1K)
    [OK] Table locks acquired immediately: 99% (614M immediate / 617M locks)
     
    -------- Performance schema ------------------------------------------------------------------------
    [--] Performance schema is disabled.
    [--] Memory used by P_S: 0B
    [--] Sys schema isn't installed.
     
    -------- ThreadPool Metrics ------------------------------------------------------------------------
    [--] ThreadPool stat is enabled.
    [--] Thread Pool Size: 6 thread(s).
    [--] Using default value is good enough for your version (5.5.60-MariaDB)
     
    -------- MyISAM Metrics ----------------------------------------------------------------------------
    [!!] Key buffer used: 24.6% (33M used / 134M cache)
    [OK] Key buffer size / total MyISAM indexes: 128.0M/26.9M
    [OK] Read Key buffer hit rate: 100.0% (5B cached / 45K reads)
    [!!] Write Key buffer hit rate: 4.5% (13M cached / 615K writes)
     
    -------- InnoDB Metrics ----------------------------------------------------------------------------
    [--] InnoDB is enabled.
    [--] InnoDB Thread Concurrency: 0
    [!!] InnoDB File per table is not activated
    [OK] InnoDB buffer pool / data size: 128.0M/26.8M
    [!!] Ratio InnoDB log file size / InnoDB Buffer pool size (7.8125 %): 5.0M * 2/128.0M should be equal 25%
    [OK] InnoDB buffer pool instances: 1
    [--] InnoDB Buffer Pool Chunk Size not used or defined in your version
    [OK] InnoDB Read buffer efficiency: 100.00% (1698149490 hits/ 1698149871 total)
    [OK] InnoDB Write log efficiency: 99.83% (41391992 hits/ 41460907 total)
    [OK] InnoDB log waits: 0.00% (0 waits / 68915 writes)
     
    -------- AriaDB Metrics ----------------------------------------------------------------------------
    [--] AriaDB is enabled.
    [OK] Aria pagecache size / total Aria indexes: 128.0M/1B
    [!!] Aria pagecache hit rate: 84.7% (167M cached / 25M reads)
     
    -------- TokuDB Metrics ----------------------------------------------------------------------------
    [--] TokuDB is disabled.
     
    -------- XtraDB Metrics ----------------------------------------------------------------------------
    [--] XtraDB is disabled.
     
    -------- RocksDB Metrics ---------------------------------------------------------------------------
    [--] RocksDB is disabled.
     
    -------- Spider Metrics ----------------------------------------------------------------------------
    [--] Spider is disabled.
     
    -------- Connect Metrics ---------------------------------------------------------------------------
    [--] Connect is disabled.
     
    -------- Galera Metrics ----------------------------------------------------------------------------
    [--] Galera is disabled.
     
    -------- Replication Metrics -----------------------------------------------------------------------
    [--] Galera Synchronous replication: NO
    [--] No replication slave(s) for this server.
    [--] Binlog format: STATEMENT
    [--] XA support enabled: ON
    [--] Semi synchronous replication Master: Not Activated
    [--] Semi synchronous replication Slave: Not Activated
    [--] This is a standalone server
     
    -------- Recommendations ---------------------------------------------------------------------------
    General recommendations:
        Control warning line(s) into /var/log/mariadb/mariadb.log file
        Control error line(s) into /var/log/mariadb/mariadb.log file
        Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
        When making adjustments, make tmp_table_size/max_heap_table_size equal
        Reduce your SELECT DISTINCT queries which have no LIMIT clause
        Set thread_cache_size to 4 as a starting value
        Increase table_open_cache gradually to avoid file descriptor limits
        Read this before increasing table_open_cache over 64: http://bit.ly/1mi7c4C
        This is MyISAM only table_cache scalability problem, InnoDB not affected.
        See more details here: https://bugs.mysql.com/bug.php?id=49177
        This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
        Beware that open_files_limit (1024) variable 
        should be greater than table_open_cache (400)
        Consider installing Sys schema from https://github.com/mysql/mysql-sys
        Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: http://bit.ly/2wgkDvS
    Variables to adjust:
        query_cache_size (=0)
        query_cache_type (=0)
        query_cache_limit (> 1M, or use smaller result sets)
        tmp_table_size (> 16M)
        max_heap_table_size (> 16M)
        thread_cache_size (start at 4)
        table_open_cache (> 400)
        innodb_file_per_table=ON
        innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
    
    

  • Moderator

    @Fernando-Gietz How is your FOG server running now that you have done these updates? I still have it in the back of my head that we need to look into MySQL performance numbers.

    You have a moderately large FOG environment there. I’m wondering if you can collect some stats for me.

    wget http://mysqltuner.com/mysqltuner.pl
    chmod +x mysqltuner.pl 
    ./mysqltuner.pl
    

    For the login credentials enter root and no password.

    I’m interested in seeing what the recommendations are. You will get better results the longer mysql runs. If mysql (your fog server) hasn’t been rebooted in the last week then you should have very solid recommendations in the results.


  • Developer

    Hi,

    My FOG 1.5.2 server had PHP 5.6 installed. After upgrading PHP to 7.0 and MariaDB to 5.5.60, performance is better, much better XD

    MySQL consumption has decreased a lot, and RAM usage too.

    I use Red Hat 7; more info about this in:
    Is FOG installer ready to install php70 under RedHat 7
    Installation - Centos 7 & PHP 7?
    PHP 7.0.0 finally Released


  • Developer

    Hi,

    Some info about this: currently my “poor” server is drowning under the high MySQL usage. The new academic year has started and the computers are waking up :(

    MySQL now consumes 300% of the CPU; the server has 8 vCores.

    # mysqladmin status;
    Uptime: 229755  Threads: 62  Questions: 181260860  Slow queries: 9385  Opens: 155223  Flush tables: 2  Open tables: 400  Queries per second avg: 788.931
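    mysqladmin status reports averages since startup. A small sketch that turns one status line into queries per second and the percentage of queries logged as slow, which makes before/after comparisons easier; the function name is hypothetical. For the line above it gives about 788.9 qps and roughly 0.005% slow queries.

```shell
# status_summary  (hypothetical helper)
# Reads one line of `mysqladmin status` on stdin and prints the average queries
# per second since startup and the share of queries logged as slow.
status_summary() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "Uptime:")    up = $(i + 1)
            if ($i == "Questions:") q  = $(i + 1)
            if ($i == "Slow" && $(i + 1) == "queries:") slow = $(i + 2)
        }
        if (up > 0 && q > 0) printf "qps=%.1f slow=%.4f%%\n", q / up, 100 * slow / q
    }'
}

# Usage:
#   mysqladmin status | status_summary
```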
    

    Is it possible to move the MySQL server off the FOG server and use an external instance?


  • Moderator

    @fry_p I found the post I was thinking about, but it’s not specifically about slow CPU.

    https://forums.fogproject.org/topic/12336/allowed-memory-size-of-536870912-bytes-exhausted/7

    I don’t know if truncating the history will help anything in your case.


  • Testers

    @george1421 If you can locate the thread where Tom gave these instructions and/or a wiki article on how to do it, I am more than willing to try. I appreciate your time on this.


  • Moderator

    @george1421 said in FOG Server High CPU:

    php_admin_value[memory_limit] = 256M
    pm.max_requests = 2000
    pm.max_children = 35
    pm.min_spare_servers = 5
    pm.start_servers = 5

    I have no basis of fact on this, but I wonder if this is a mysql issue and not specifically php-fpm.

    If we look at the settings above (Fernando's are similar): when php-fpm starts, it launches 5 php-fpm child servers right away. If those 5 php-fpm processes are heavily used, it spins up new child instances up to the max_children setting. After the processing is done, if a child process is idle for 10 seconds, it is killed off, and this culling continues until the min_spare_servers value is reached. Since @fry_p's system is currently sitting at 35 php-fpm processes, that tells me php-fpm can't respond fast enough with the children it has. In addition, mysql is the process with the highest usage, so it might be the bottleneck (again, there is no basis of fact here).

    I seemed to recall a recent thread where the FOG activity log was huge and that was causing an issue. Tom gave the OP of that thread the instructions to purge the activity log, where the OP stated his system was running much better. I wonder if that is the case here. AND/OR we need to look at optimizing the FOG database for the larger installs. The default settings might not be good enough for 1000s of workstations.


  • Developer

    @george1421 Without any tasks running, the CPU is now at 80% to 95%:

    mysql      mysql       --     -     27    S       1   127%    mysqld

  • Testers

    @george1421 I did indeed reboot and didn’t see any difference. On a gut feeling, I doubled the checkin time from 180s to 360s and that has marginally helped the CPU usage. I am just puzzled why this wasn’t an issue at the end of the last school year? Our number of clients has not changed and I have been on 1.5.4 for a while now. Here is the graph of my current CPU Usage on that VM. You can see where it was at, when I made the setting change in www.conf, changing the checkin time, etc:
    0_1535460890658_Capturecpu.PNG

    I did notice I still had many php-fpm processes, but overall utilization seems to be down:

    0_1535461118715_Capturetop.PNG

    EDIT: I did forget to mention I have this running on 2vcpus and 8GB of Memory. Do I not have it specced high enough?


  • Moderator

    @Fernando-Gietz With 7000 clients, are you also seeing high mysql cpu usage?


  • Developer

    Hi,

    I had the same problem, but I changed some things in the www.conf file and performance seems better:

    pm = ondemand
    pm.max_children = 50
    pm.start_servers = 16
    pm.min_spare_servers = 8
    pm.max_spare_servers = 16
    pm.process_idle_timeout = 10s
    

    Currently I have 7000 clients on this server.


 
