    FOG Server High CPU

    • Tom Elliott @Fernando Gietz

      @Fernando-Gietz This is what I hope to move to in the future, but it is not in use currently. The table name and install method are not used for the core elements; those were just placeholders until I can mimic the proper table layout for the item.

      It is not in use for core elements.

      • Fernando Gietz (Developer)

        Hello,

        Some news about this problem. We made some changes to our server and its configuration, and the server is now much less overloaded than before. The conclusion: the default configuration of Apache, php-fpm and MySQL is not optimal for large deployments. If you have a large number of clients, you need to tune the server.

        I will describe our previous situation and the current one to share our experience.

        Initial Scenario:

        • FOG version 1.5.2
        • Virtual server with 8 vCores and 16 GB RAM
        • OS: RHEL 7
        • Active clients: 7000
        • One fog server and only the default node.

        In July we migrated from our old FOG version (0.32) on RHEL 5 to the new one (1.5.2) on RHEL 7, without any additional configuration.

        In August we observed that the server was consuming a lot of CPU and RAM, and we began to have performance problems (and the academic year had not even started). Panic mode ON!!

        The first thing you think is ... we need more resources ("more wood, this is war!"). WRONG. The System Operation Center (SOC) guys said NO: we cannot give you more resources.

        First thing: Update
        We updated the server OS and some packages, for example PHP and MariaDB. We went from PHP 5.6 to PHP 7, and PHP performance improved a lot.

        We also updated FOG from 1.5.2 to 1.5.4.

        Second thing: Optimize the virtual machine resources

        Our virtual server is hosted on a VMware host with two sockets of 6 cores each (it is an old server). Problem: our virtual server had 8 vCores, 6 on one socket and the other 2 on the other socket, so the server suffered from slow cross-socket access.
        We removed two vCores from the server so that all vCores were on the same socket and access times were faster. PROBLEM: fewer resources, more server load. In September the clients began to wake up and the PHP and MySQL queries increased, so more resources were needed. To reduce the load we increased the client check-in time to 900 seconds. This decreased the PHP and MySQL queries, but consumption was still high (the mysqld process at 300% CPU). The problem was the access time to the physical cores: we had 6 vCores on a socket with 6 cores, and that socket was shared with other virtual servers, so the vCores spent more and more time waiting for the socket's cores. The vCores were always at 100% CPU usage.
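
To see why the check-in time matters so much, here is a rough back-of-the-envelope sketch in Python. It assumes, as a simplification that is not stated in the post, that every client sends exactly one request per check-in interval; real clients send several, so treat the numbers as a lower bound. The 7000 clients and the 900 second interval come from the post, and 275 seconds is the value mentioned later in this thread.

```python
def checkin_rate(clients: int, interval_s: float) -> float:
    """Average requests per second, assuming one request per client per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return clients / interval_s


if __name__ == "__main__":
    active_clients = 7000  # "Active clients" figure from the post
    for interval in (900, 275):
        rate = checkin_rate(active_clients, interval)
        print(f"check-in every {interval:>3} s -> about {rate:.1f} req/s")
```

Under that assumption, going from 275 to 900 seconds cuts the steady check-in traffic by roughly a factor of three.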

        To solve this we enabled NUMA on the server:
        https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-numa_and_libvirt

        With this we distributed the vCores between the two sockets: vCPUs 0, 1, 2 + 8 GB RAM in NUMA node 0 and vCPUs 3, 4, 5 + 8 GB RAM in NUMA node 1. This is configured on the VMware host. In addition, we installed the numad package on our virtual server; this daemon distributes the processes between the two NUMA nodes. Access to RAM and CPU became faster.

        For example:

        # ./numa-maps-summary.pl < /proc/1787/numa_maps
        N0        :          100 (  0.00 GB)
        N1        :       648226 (  2.47 GB)
        active    :       435228 (  1.66 GB)
        anon      :       645582 (  2.46 GB)
        dirty     :       647118 (  2.47 GB)
        kernelpagesize_kB:         1012 (  0.00 GB)
        mapmax    :          332 (  0.00 GB)
        mapped    :         1248 (  0.00 GB)
        

        With this script we can see that mysqld (PID 1787) is using the resources of NUMA node 1.
        Now we have 8 vCores again, distributed between the two NUMA nodes.
        The vCores now run at 80%-90%.
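
If you do not have numa-maps-summary.pl at hand, the per-node totals can be approximated with a few lines of Python. This is only a sketch of the idea, not a port of that script: it sums the `N<node>=<pages>` tokens found in `/proc/<pid>/numa_maps` and assumes every page is 4 KiB (huge-page mappings would be undercounted). The sample text below is made up for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumption: all mappings use 4 KiB pages


def pages_per_node(numa_maps_text: str) -> dict:
    """Sum the N<node>=<pages> tokens of a numa_maps dump, keyed by node id."""
    totals = {}
    for line in numa_maps_text.splitlines():
        for token in line.split():
            if not token.startswith("N") or "=" not in token:
                continue
            node, _, pages = token[1:].partition("=")
            if node.isdigit() and pages.isdigit():
                totals[int(node)] = totals.get(int(node), 0) + int(pages)
    return totals


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to GiB."""
    return pages * page_size / 2**30


# Made-up sample in the numa_maps line format (not taken from the server).
SAMPLE = """\
7f2c00000000 default anon=3 dirty=3 N0=1 N1=2 kernelpagesize_kB=4
7f2c00200000 default file=/usr/lib64/libc-2.17.so mapped=10 mapmax=30 N1=10 kernelpagesize_kB=4
"""

if __name__ == "__main__":
    for node, pages in sorted(pages_per_node(SAMPLE).items()):
        print(f"N{node}: {pages} pages ({pages_to_gib(pages):.2f} GiB)")
```

On the real server you would pass the contents of `/proc/1787/numa_maps` instead of `SAMPLE`. As a sanity check, the 648226 pages reported for N1 above work out to about 2.47 GiB with 4 KiB pages, which matches the script's output.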

        Third thing: tuning PHP, php-fpm and MySQL

        We did not know much about PHP, php-fpm and MySQL, so we had to read a lot of articles on the web about them.

        Tuning MySQL: for this we used the MySQLTuner script, http://mysqltuner.com . The script gives you an idea of the database's performance and how to tune it to improve that performance.

        SET GLOBAL query_cache_size = 4000000;      -- about 4 MB
        SET GLOBAL tmp_table_size = 20000000;       -- about 20 MB
        SET GLOBAL query_cache_limit = 2000000;     -- about 2 MB
        SET GLOBAL max_heap_table_size = 20000000;  -- about 20 MB
        SET GLOBAL thread_cache_size = 4;
        SET GLOBAL table_open_cache = 450;
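
One thing to keep in mind: values changed with SET GLOBAL are lost when the database server restarts, so they also need to go into the `[mysqld]` section of the server's option file. The small Python sketch below only renders the values from this post in that format and prints the byte sizes in MiB as comments; it does not touch any file. Note that the `query_cache_*` variables exist in MariaDB and in MySQL up to 5.7.

```python
# Values copied from the SET GLOBAL statements in this post.
SETTINGS = {
    "query_cache_size": 4_000_000,
    "query_cache_limit": 2_000_000,
    "tmp_table_size": 20_000_000,
    "max_heap_table_size": 20_000_000,
    "thread_cache_size": 4,
    "table_open_cache": 450,
}

# Settings whose value is a size in bytes (the others are plain counts).
BYTE_SIZED = {
    "query_cache_size",
    "query_cache_limit",
    "tmp_table_size",
    "max_heap_table_size",
}


def render_mycnf(settings: dict) -> str:
    """Render settings as a [mysqld] option-file section."""
    lines = ["[mysqld]"]
    for name, value in settings.items():
        comment = f"  # {value / 2**20:.2f} MiB" if name in BYTE_SIZED else ""
        lines.append(f"{name} = {value}{comment}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mycnf(SETTINGS))
```

The MiB comments also show that 4000000 bytes is a bit less than 4 MiB (about 3.81), which is harmless but worth knowing when comparing against MySQLTuner's suggestions.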
        

        The MariaDB knowledge base recommends decreasing the swappiness value (https://mariadb.com/kb/en/library/configuring-swappiness/):

        # sysctl -w vm.swappiness=10
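
A value set with `sysctl -w` does not survive a reboot, so it should also be written to /etc/sysctl.conf or a file under /etc/sysctl.d/. To check what the kernel is currently using, a minimal Python sketch like the following can help. The function names are just for illustration; the path is the standard Linux location.

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")  # standard Linux location


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file (a single integer)."""
    return int(text.strip())


def current_swappiness(path: Path = SWAPPINESS_PATH):
    """Return the current vm.swappiness, or None if the file does not exist."""
    if not path.exists():
        return None
    return parse_swappiness(path.read_text())


if __name__ == "__main__":
    print("vm.swappiness =", current_swappiness())
```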
        

        Tuning php-fpm and PHP: there are some articles about it in this forum.
        PHP-FPM:

        pm = ondemand
        
        ; The number of child processes to be created when pm is set to 'static' and the
        ; maximum number of child processes when pm is set to 'dynamic' or 'ondemand'.
        ; This value sets the limit on the number of simultaneous requests that will be
        ; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
        ; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
        ; CGI.
        ; Note: Used when pm is set to 'static', 'dynamic' or 'ondemand'
        ; Note: This value is mandatory.
        pm.max_children = 50
        
        ; The number of child processes created on startup.
        ; Note: Used only when pm is set to 'dynamic'
        ; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
        pm.start_servers = 5
        
        ; The desired minimum number of idle server processes.
        ; Note: Used only when pm is set to 'dynamic'
        ; Note: Mandatory when pm is set to 'dynamic'
        pm.min_spare_servers = 5
        
        ; The desired maximum number of idle server processes.
        ; Note: Used only when pm is set to 'dynamic'
        ; Note: Mandatory when pm is set to 'dynamic'
        pm.max_spare_servers = 50
        
        ; The number of seconds after which an idle process will be killed.
        ; Note: Used only when pm is set to 'ondemand'
        ; Default Value: 10s
        pm.process_idle_timeout = 10s;
        

        Normally people use pm = dynamic, but we use pm = ondemand because we saw better performance with it.
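
For choosing pm.max_children, a common rule of thumb (not something from our own measurements) is to divide the RAM you are willing to give to PHP by the average size of one php-fpm process. A minimal sketch in Python, where the 1024 MiB budget is a purely hypothetical number and the process size is derived from the ps_mem output shown later in this thread (330 MiB for 51 php-fpm processes):

```python
def estimate_max_children(ram_for_php_mib: float, avg_worker_mib: float) -> int:
    """Rule-of-thumb upper bound: memory budget divided by average worker size."""
    if avg_worker_mib <= 0:
        raise ValueError("avg_worker_mib must be positive")
    return int(ram_for_php_mib // avg_worker_mib)


if __name__ == "__main__":
    avg_worker = 330 / 51  # MiB per php-fpm process, from the ps_mem output
    budget = 1024          # MiB reserved for PHP: hypothetical, adjust to your server
    print(f"average worker size: {avg_worker:.1f} MiB")
    print(f"estimated pm.max_children: {estimate_max_children(budget, avg_worker)}")
```

This only bounds memory usage. Whether the CPU and MySQL can keep up with that many simultaneous requests is a separate question, so treat the result as a ceiling, not a target.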

        These parameters may still change. For now the server runs well, but it is October and the download activity has decreased a lot.

        To see the PHP activity you can enable the Apache server status page (mod_status, which is set up in the Apache configuration rather than in php.ini). There is also a tool, “goaccess”, that shows the PHP calls and their counts in the terminal:

        # yum install goaccess
        # tail -f /var/log/httpd/access_log | goaccess -
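
If goaccess is not available, a quick count of requests per script can also be obtained with a short Python sketch like this one. It assumes the access log uses the usual common/combined log format where the request appears as `"GET /path HTTP/1.1"`; the sample lines and IP addresses are invented for illustration.

```python
import re
from collections import Counter

# Matches the request part of a common/combined log line, e.g. "GET /x.php HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+)')


def count_paths(lines) -> Counter:
    """Count requests per URL path, ignoring the query string."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    return counts


# Invented sample lines, only to show the expected input format.
SAMPLE_LOG = [
    '10.0.0.5 - - [24/Oct/2018:13:00:01 +0200] "GET /fog/service/progress.php HTTP/1.1" 200 12',
    '10.0.0.6 - - [24/Oct/2018:13:00:02 +0200] "GET /fog/service/getversion.php?json HTTP/1.1" 200 30',
    '10.0.0.5 - - [24/Oct/2018:13:00:04 +0200] "GET /fog/service/progress.php HTTP/1.1" 200 12',
]

if __name__ == "__main__":
    for path, hits in count_paths(SAMPLE_LOG).most_common(10):
        print(f"{hits:>6}  {path}")
```

To run it against the real log, pass an open file object for /var/log/httpd/access_log instead of `SAMPLE_LOG`.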

        • Fernando Gietz (Developer)

          More info about this 🙂

          We have now configured the client check-in time to 275 seconds. In September we had to increase it to 900, but after the changes we were able to set it back down to 275 seconds.

          This is a screenshot of the goaccess tool:

          0_1540379551079_goaccess_RHEL.png

          The screenshot was taken after 6 minutes of activity. We can see that:
          Total requests: 8436 -> about 23.4 req/sec
          1378 visitors requested the /fog/service/getversion.php?NewServise&json page. This means that 1378 clients contacted the server during that window.
          1 visitor requested /fog/service/progress.php 114 times. This client is doing a download task.

          Keep in mind that when a client is doing a download, capture or multicast task, it reports its progress to the server (yes, it is very pretty and cool to see the progress bar), but this information has a price. The client reports its progress roughly every three or four seconds. With one task this is “peccata minuta” (a trifle), but with 100 or more clients doing download or capture tasks it becomes a problem, because the PHP server cannot process all the requests simultaneously, and it is not only the PHP server but the MySQL server too.
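
The numbers from the goaccess screenshot make this cost easy to estimate. The following Python sketch just does the arithmetic: the request counts and the 6 minute window are from this post, and the 100 simultaneous tasks are the example figure used above.

```python
WINDOW_S = 6 * 60          # the screenshot covers 6 minutes
TOTAL_REQUESTS = 8436      # all requests in that window
PROGRESS_REQUESTS = 114    # progress.php hits from the single imaging client

total_rate = TOTAL_REQUESTS / WINDOW_S            # overall requests per second
progress_interval = WINDOW_S / PROGRESS_REQUESTS  # seconds between progress reports


def progress_load(clients: int, interval_s: float) -> float:
    """Extra requests per second caused by clients reporting progress."""
    return clients / interval_s


if __name__ == "__main__":
    print(f"overall rate: {total_rate:.1f} req/s")
    print(f"one progress report every {progress_interval:.2f} s")
    print(f"100 imaging clients add about {progress_load(100, progress_interval):.1f} req/s")
```

So one imaging client reports about every 3.2 seconds, and 100 of them would add roughly 32 req/s, which is more than the entire check-in traffic measured in the screenshot.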

          For example, in this screenshot of the htop command (similar to atop or top):

          0_1540380690148_Captura de pantalla de 2018-10-24 13-08-54.png

          We can see that the vCores are busy, but not at 100%; the load is high, and mysqld is using 221% of CPU. At this moment the server is processing only the FOG client requests from the computers; there are no tasks running. (When the technicians send download, multicast or capture tasks, the server is burning ... literally. I have seen the load at 60 or more; the server could not handle all the requests and refused them.) The screenshot clearly shows the activity of the two NUMA nodes:

          Node 0: 1, 2, 3 and 4
          Node 1: 5, 6, 7 and 8
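
The same mapping can be read from inside the virtual server, because Linux exposes the CPUs of each NUMA node in `/sys/devices/system/node/node<N>/cpulist`. A minimal Python sketch, assuming that sysfs layout is present; note that the kernel counts CPUs from 0, while the listing above counts from 1.

```python
from pathlib import Path
from typing import Dict, List


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel cpulist string such as '0-3' or '4-5,7' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpus_per_node(base: str = "/sys/devices/system/node") -> Dict[int, List[int]]:
    """Map NUMA node id -> CPU numbers; empty dict if the sysfs tree is missing."""
    root = Path(base)
    if not root.is_dir():
        return {}
    result = {}
    for cpulist in sorted(root.glob("node[0-9]*/cpulist")):
        node_id = int(cpulist.parent.name[len("node"):])
        result[node_id] = parse_cpulist(cpulist.read_text())
    return result


if __name__ == "__main__":
    for node, cpus in cpus_per_node().items():
        print(f"Node {node}: {cpus}")
```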

          Where is the mysqld process running?

          # ./numa-maps-summary.pl < /proc/1787/numa_maps
          N0        :       636053 (  2.43 GB)
          N1        :        14336 (  0.05 GB)
          active    :       352378 (  1.34 GB)
          anon      :       649153 (  2.48 GB)
          dirty     :       649153 (  2.48 GB)
          kernelpagesize_kB:         1016 (  0.00 GB)
          mapmax    :          480 (  0.00 GB)
          mapped    :         1276 (  0.00 GB)
          

          On Node 0.

          I downloaded a little script (I forgot from where) that shows the RAM usage of each process:

          # ./ps_mem.py 
           Private  +   Shared  =  RAM used	Program
          
            4.0 KiB +  12.5 KiB =  16.5 KiB	agetty
            4.0 KiB +  15.0 KiB =  19.0 KiB	mysqld_safe
            4.0 KiB +  47.5 KiB =  51.5 KiB	rpc.statd
            4.0 KiB +  49.5 KiB =  53.5 KiB	rpc.idmapd
            4.0 KiB +  57.0 KiB =  61.0 KiB	lvmetad
           36.0 KiB +  31.0 KiB =  67.0 KiB	atd
            4.0 KiB +  73.5 KiB =  77.5 KiB	VGAuthService
           88.0 KiB +  32.0 KiB = 120.0 KiB	rhsmcertd
           92.0 KiB +  41.5 KiB = 133.5 KiB	systemd-udevd
          112.0 KiB +  22.5 KiB = 134.5 KiB	sleep
           88.0 KiB +  55.0 KiB = 143.0 KiB	vsftpd
           88.0 KiB +  65.0 KiB = 153.0 KiB	gssproxy
          148.0 KiB +  23.0 KiB = 171.0 KiB	udp-sender
          156.0 KiB +  30.0 KiB = 186.0 KiB	crond
          164.0 KiB +  30.0 KiB = 194.0 KiB	in.tftpd
          180.0 KiB +  20.0 KiB = 200.0 KiB	numad
          192.0 KiB +  16.0 KiB = 208.0 KiB	rhnsd
          128.0 KiB +  83.5 KiB = 211.5 KiB	master
          176.0 KiB +  35.5 KiB = 211.5 KiB	xinetd
          188.0 KiB +  54.5 KiB = 242.5 KiB	auditd
          232.0 KiB +  29.0 KiB = 261.0 KiB	irqbalance
          192.0 KiB +  88.5 KiB = 280.5 KiB	qmgr
          208.0 KiB +  87.0 KiB = 295.0 KiB	sh
          240.0 KiB + 109.5 KiB = 349.5 KiB	rpcbind
          588.0 KiB +  36.5 KiB = 624.5 KiB	systemd-logind
          668.0 KiB +  75.5 KiB = 743.5 KiB	dbus-daemon
          596.0 KiB + 311.5 KiB = 907.5 KiB	polkitd
          800.0 KiB + 175.5 KiB = 975.5 KiB	vmtoolsd
          916.0 KiB +  64.5 KiB = 980.5 KiB	FOGpxe.sh
          936.0 KiB + 135.0 KiB =   1.0 MiB	dnsmasq
            1.1 MiB + 386.0 KiB =   1.5 MiB	NetworkManager
            1.4 MiB + 413.5 KiB =   1.8 MiB	pickup
            2.0 MiB + 101.0 KiB =   2.1 MiB	rpc.mountd
            2.2 MiB +  67.5 KiB =   2.3 MiB	systemd
            2.6 MiB + 308.0 KiB =   2.9 MiB	tuned
            2.9 MiB + 355.5 KiB =   3.2 MiB	mysql
            2.7 MiB + 846.0 KiB =   3.5 MiB	bash (6)
            3.0 MiB + 822.5 KiB =   3.8 MiB	FOGSnapinReplic (2)
            3.1 MiB + 682.5 KiB =   3.8 MiB	FOGImageReplica (2)
            4.2 MiB +  80.0 KiB =   4.2 MiB	nsrexecd
            4.1 MiB + 718.5 KiB =   4.8 MiB	FOGSnapinHash (2)
            3.6 MiB +   1.3 MiB =   4.9 MiB	sudo (3)
            5.4 MiB +   1.0 MiB =   6.4 MiB	FOGTaskSchedule (2)
            7.5 MiB + 118.5 KiB =   7.6 MiB	glusterfsd
            1.9 MiB +   6.1 MiB =   8.0 MiB	sshd (7)
            2.5 MiB +   7.1 MiB =   9.5 MiB	rsyslogd
           10.3 MiB + 713.5 KiB =  11.0 MiB	FOGImageSize (2)
            7.0 MiB +   9.9 MiB =  16.9 MiB	systemd-journald
           21.5 MiB + 791.0 KiB =  22.2 MiB	FOGPingHosts (2)
           25.6 MiB +   1.0 MiB =  26.6 MiB	FOGMulticastMan (2)
          315.4 MiB +  14.6 MiB = 330.0 MiB	php-fpm (51)
            2.5 GiB + 289.0 KiB =   2.5 GiB	mysqld
            5.1 GiB +  14.9 MiB =   5.1 GiB	httpd (12)
          ---------------------------------
                                    8.0 GiB
          =================================
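
To compare programs that run as several processes, it helps to divide the "RAM used" column by the process count in parentheses. The Python sketch below parses lines in the format shown above; the regular expression is written against this output only and is an assumption about the format, not part of ps_mem.py itself.

```python
import re

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Captures: total value, unit, program name, optional "(count)" after the name.
LINE_RE = re.compile(r"=\s*([\d.]+)\s+(KiB|MiB|GiB)\s+(\S+)(?:\s+\((\d+)\))?\s*$")


def parse_ps_mem_line(line: str):
    """Return (program, total_mib, process_count), or None for non-data lines."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    value, unit, name, count = match.groups()
    return name, float(value) * UNIT_TO_MIB[unit], int(count) if count else 1


def average_mib(line: str) -> float:
    """Average RAM per process in MiB for one ps_mem output line."""
    parsed = parse_ps_mem_line(line)
    if parsed is None:
        raise ValueError("not a ps_mem data line")
    _, total_mib, count = parsed
    return total_mib / count


if __name__ == "__main__":
    # Two lines copied from the output above (tab before the program name).
    for line in (
        "315.4 MiB +  14.6 MiB = 330.0 MiB\tphp-fpm (51)",
        "  5.1 GiB +  14.9 MiB =   5.1 GiB\thttpd (12)",
    ):
        name, _, count = parse_ps_mem_line(line)
        print(f"{name}: {average_mib(line):.1f} MiB per process ({count} processes)")
```

By that arithmetic a php-fpm process averages about 6.5 MiB, while an httpd process averages about 435 MiB, so the Apache side may deserve a closer look as well.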
          
          Copyright © 2012-2024 FOG Project