FOG Server High CPU

fry_p

Hello All,
I came in this morning to a few alarms from my VM environment about Guest CPU Usage. I tracked it down to my FOG server, which I then found was chugging along at 100% CPU utilization. Here is a screenshot of the top command run as root:

I am also getting many “Max children reached” messages in the php-fpm error log on the Web UI.

I am running FOG 1.5.4 on CentOS 7.
Thanks!
Paul

george1421

@fry_p Well, the first thing I noticed is that you have way too many php-fpm processes running. Typically we see 5-7 processes running at one time.

Below is my standard response for FOG versions between 1.5.2 and 1.5.4. Make sure your settings match these:

Let's assume this is the issue we've found since FOG 1.5.4 was released.

Change to the /etc directory from the FOG server's Linux command prompt.
Search for the www.conf file. It can be in a number of locations depending on which version of PHP is installed. Use this command (hopefully you will only find one):
find /etc -name www.conf
Edit that file and make sure these settings are correct. Don't just add them: all of them should already be there except php_admin_value[memory_limit] = 256M, which you will need to add.

php_admin_value[memory_limit] = 256M
pm.max_requests = 2000
pm.max_children = 35
pm.min_spare_servers = 5
pm.start_servers = 5

Save and exit your text editor.
Reboot the fog server.
See if that fixes what is wrong. You really should only see this strangeness under heavy load, but I guess it might show up sooner under certain conditions.
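The settings check in the steps above can also be scripted. Below is a minimal Python sketch, assuming the pool file uses plain `key = value` lines with `;` comments and `[section]` headers, as the stock www.conf does. The names `RECOMMENDED`, `parse_pool_conf` and `check_pool_conf` are hypothetical, and the expected values are simply the ones listed above. It only reports missing or differing settings; it does not edit the file.

```python
import sys

# Values recommended in the post above (RECOMMENDED is a hypothetical name).
RECOMMENDED = {
    "php_admin_value[memory_limit]": "256M",
    "pm.max_requests": "2000",
    "pm.max_children": "35",
    "pm.min_spare_servers": "5",
    "pm.start_servers": "5",
}


def parse_pool_conf(text: str) -> dict[str, str]:
    """Collect `key = value` settings, skipping blank lines, [section] headers and ; comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";") or line.startswith("["):
            continue
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop a trailing inline comment such as "10s;" -> "10s".
        settings[key.strip()] = value.split(";", 1)[0].strip()
    return settings


def check_pool_conf(text: str) -> list[str]:
    """Return one message per recommended setting that is missing or has a different value."""
    found = parse_pool_conf(text)
    problems = []
    for key, wanted in RECOMMENDED.items():
        if key not in found:
            problems.append(f"missing: {key} = {wanted}")
        elif found[key] != wanted:
            problems.append(f"differs: {key} = {found[key]} (expected {wanted})")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pool.py /path/to/www.conf  (path from `find /etc -name www.conf`)
    with open(sys.argv[1], encoding="utf-8") as fh:
        for problem in check_pool_conf(fh.read()):
            print(problem)
```

A commented-out line such as `;pm.max_requests = 500` counts as missing, which matches the advice above that the memory_limit entry usually has to be added by hand.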

Also, we found that something strange is going on in the Linux kernels after 4.15.2, so I'm going to recommend that you downgrade your FOG/FOS kernel to 4.15.2. With later kernels it takes 3-5 minutes to create the disk structure under certain circumstances, whereas with 4.15.2 and older it takes only seconds.

The kernel will not affect your issue, but the incomplete processing might be related to the missing php-fpm configuration setting.

///

For completeness, how many computers with the FOG client installed connect to this FOG server? (Round numbers are fine.)

fry_p

Hi George,
Most of these settings were set properly. Below are the ones I set:

pm.max_children = 35
php_admin_value[memory_limit] = 256M

A reboot after that still yields 100% usage and way too many php-fpm processes.
I have about 1300 devices using the FOG client. I realize this is a lot, so I set the check-in time to 180s. Do you think increasing it further would help?

Thanks!

george1421

@fry_p Did you restart the fog server after making those adjustments?

What I noticed is that mysql CPU usage is higher than php-fpm's. My thinking is that mysql is slow for some reason, which causes php-fpm to detect slow-responding children, so it spins up more children until 35 are running.

What was your client check-in time? With 1300 clients, I would consider changing the check-in time to 300 (5 min) or 600 (10 min). At 300 seconds, if you average the check-ins out, that's about 4 devices checking in every second. Of course, check-ins are random across the population, so you may have 15-20 check in within one second and nothing for the next 3 seconds.
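The averages above are easy to check with a few lines of Python. The sketch below (function names hypothetical) computes the mean check-in rate for 1300 clients at the intervals discussed and, as a rough illustration of burstiness, gives each client a uniformly random offset within the interval and counts the busiest second. The uniform-offset model is an assumption: real check-ins may cluster (for example when a whole lab powers on), so treat the simulated peak as an optimistic baseline, not a prediction.

```python
import random
from collections import Counter


def average_checkins_per_second(clients: int, interval_s: int) -> float:
    """Mean check-in rate if every client checks in exactly once per interval."""
    return clients / interval_s


def simulate_busiest_second(clients: int, interval_s: int, seed: int = 0) -> int:
    """Assume uniformly random offsets within the interval; return the busiest second's count."""
    rng = random.Random(seed)
    per_second = Counter(rng.randrange(interval_s) for _ in range(clients))
    return max(per_second.values())


for interval in (180, 300, 600):
    mean = average_checkins_per_second(1300, interval)
    peak = simulate_busiest_second(1300, interval)
    print(f"{interval:>3}s interval: {mean:.1f}/s on average, busiest second {peak}")
```

Lengthening the interval lowers both the average and, in this model, the peak, which is the effect George is suggesting.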

Fernando Gietz

Hi,

I had the same problem, but I changed some things in the www.conf file and performance seems better.

pm = ondemand
pm.max_children = 50
pm.start_servers = 16
pm.min_spare_servers = 8
pm.max_spare_servers = 16
pm.process_idle_timeout = 10s;

Currently I have 7000 clients on this server.

george1421

@Fernando-Gietz With 7000 clients, are you also seeing high mysql cpu usage?

fry_p

@george1421 I did indeed reboot and didn't see any difference. On a gut feeling, I doubled the check-in time from 180s to 360s, and that has marginally helped the CPU usage. I'm just puzzled why this wasn't an issue at the end of the last school year: our number of clients has not changed, and I have been on 1.5.4 for a while now. Here is the graph of my current CPU usage on that VM. You can see where it was, and when I made the setting change in www.conf, changed the check-in time, etc.:

I did notice I still had many php-fpm processes, but overall utilization seems to be down:

EDIT: I forgot to mention that this is running on 2 vCPUs and 8GB of memory. Is that not specced high enough?

Fernando Gietz

@george1421 With no tasks running, CPU is now at 80-95%.

mysql      mysql       --     -     27    S       1   127%    mysqld

george1421

@george1421 said in FOG Server High CPU:

php_admin_value[memory_limit] = 256M
pm.max_requests = 2000
pm.max_children = 35
pm.min_spare_servers = 5
pm.start_servers = 5

I have no basis of fact on this, but I wonder if this is a mysql issue and not specifically php-fpm.

Look at the settings above (Fernando's are similar). When php-fpm starts, it launches 5 php-fpm child servers right away. If those 5 php-fpm processes are heavily used, it spins up new child instances up to the max_children setting. After the processing is done, if a child process is idle for 10 seconds, that child is killed off, and this culling continues until the min_spare_servers value is reached. Since @fry_p's system is currently sitting at 35 php-fpm processes, that tells me php-fpm can't respond fast enough with the children it has. In addition, mysql being the highest-use process suggests it might be the bottleneck (again, there is no basis in fact here).
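The reasoning above can be made more concrete with Little's law: the average number of requests in service equals the arrival rate times the average time each request takes. The Python sketch below is an illustration under assumed numbers, not a measurement. The per-request times are made up to contrast a fast and a slow database, and the function names are hypothetical. The point is that if mysql slows each check-in from a fraction of a second to several seconds, the number of php-fpm children busy at once grows proportionally until it reaches pm.max_children, after which requests have to wait.

```python
import math


def busy_workers(requests_per_s: float, seconds_per_request: float) -> float:
    """Little's law: mean requests in service = arrival rate x mean service time."""
    return requests_per_s * seconds_per_request


def children_needed(requests_per_s: float, seconds_per_request: float,
                    max_children: int) -> tuple[int, bool]:
    """Round up to whole php-fpm children and flag whether pm.max_children is exceeded."""
    needed = math.ceil(busy_workers(requests_per_s, seconds_per_request))
    return needed, needed > max_children


rate = 1300 / 300  # about 4.3 check-ins per second, the average estimated earlier in the thread
for latency in (0.2, 2.0, 10.0):  # hypothetical seconds per request
    needed, saturated = children_needed(rate, latency, max_children=35)
    print(f"{latency:>4}s per request -> ~{needed} busy children, exceeds 35: {saturated}")
```

Under these assumptions, a pool pinned at 35 children is what you would expect once average request time climbs past roughly 8 seconds, which fits the idea that the slowdown is upstream of php-fpm.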

I seem to recall a recent thread where the FOG activity log was huge and that was causing an issue. Tom gave the OP of that thread instructions to purge the activity log, after which the OP said his system was running much better. I wonder if that is the case here. AND/OR we need to look at optimizing the FOG database for larger installs; the default settings might not be good enough for thousands of workstations.

fry_p

@george1421 If you can locate the thread that Tom gave these instructions and/or a wiki article on how to do it, I am more than willing to try. I appreciate your time on this.

george1421

@fry_p I found the post I was thinking about, but it's not specifically about high CPU usage.

https://forums.fogproject.org/topic/12336/allowed-memory-size-of-536870912-bytes-exhausted/7

I don’t know if truncating the history will help anything in your case.

Fernando Gietz

Hi,

Some info about this: currently my “poor” server is drowning under high mysql usage. The new academic year has started and the computers are waking up.

mysql now consumes 300% of the CPU; we have 8 vCores in the server.

# mysqladmin status;
Uptime: 229755  Threads: 62  Questions: 181260860  Slow queries: 9385  Opens: 155223  Flush tables: 2  Open tables: 400  Queries per second avg: 788.931
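The mysqladmin status line is easier to reason about once it is broken into numbers. The Python sketch below (function name hypothetical) parses the exact line posted above and derives a few ratios. The first is the average queries per second since startup, which should match the value mysqladmin prints. The others are the share of queries logged as slow and the uptime in days.

```python
import re

STATUS = (
    "Uptime: 229755  Threads: 62  Questions: 181260860  Slow queries: 9385  "
    "Opens: 155223  Flush tables: 2  Open tables: 400  Queries per second avg: 788.931"
)


def parse_status(line: str) -> dict[str, float]:
    """Turn 'Name: number  Name: number ...' into {name: value}."""
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*?):\s+([\d.]+)", line)
    return {name.strip(): float(value) for name, value in pairs}


stats = parse_status(STATUS)
qps = stats["Questions"] / stats["Uptime"]  # queries per second since startup
slow_pct = 100 * stats["Slow queries"] / stats["Questions"]  # % of queries logged as slow
uptime_days = stats["Uptime"] / 86400
print(f"uptime {uptime_days:.1f} days, {qps:.1f} queries/s, {slow_pct:.4f}% slow")
```

The slow-query share is tiny, so the load looks driven by query volume (nearly 800 per second) more than by a handful of pathological queries. That reading is an interpretation of these numbers, not a diagnosis.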

Is it possible to move the mysql server off the FOG server and use an external instance?

Fernando Gietz

Hi,

My FOG 1.5.2 server had PHP 5.6 installed. After upgrading PHP to 7.0 and MariaDB to 5.5.60, performance is better, much better XD.

mysql consumption has decreased a lot, and RAM usage too.

I use Red Hat 7; more info about this in:
Is FOG installer ready to install php70 under RedHat 7
Installation - Centos 7 & PHP 7?
PHP 7.0.0 finally Released

george1421

@Fernando-Gietz How is your FOG server running now that you've made these updates? I still have it in the back of my head that we need to look into mysql performance numbers.

You have a moderately large FOG environment there. I’m wondering if you can collect some stats for me.

wget http://mysqltuner.com/mysqltuner.pl
chmod +x mysqltuner.pl 
./mysqltuner.pl

For the login credentials enter root and no password.

I’m interested in seeing what the recommendations are. You will get better results the longer mysql runs. If mysql (your fog server) hasn’t been rebooted in the last week then you should have very solid recommendations in the results.

Fernando Gietz

@george1421 Hi George,

Sorry for my late answer; I was away from the forums for a few days.

Scenario:
FOG Version: 1.5.4
Server OS: RHEL 7
7000 hosts on the server (we use only one server to manage them). No nodes.
Client check-in time: 275 seconds. We began with 900 seconds, and after the update to PHP 7.0 we have been decreasing the check-in time down to 275. With this configuration the mysql process uses 140-160% of the CPU.

We are testing the server configuration (cores and RAM) to optimize it. We see that 6 vCores (we previously had 8) are too few because the server load is high, so we will increase them to 8 vCores. To do this we need to shut down the server, and we will take advantage of that to upgrade the virtual hardware version from 10 to 13 (I think that is the latest one in vCenter). This may improve the server's performance a little.

After this we will try to optimize the mysql config.

 >>  MySQLTuner 1.7.10 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering

[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.60-MariaDB
[OK] Operating on 64-bit architecture
 
-------- Log file Recommendations ------------------------------------------------------------------
[--] Log file: /var/log/mariadb/mariadb.log(218K)
[OK] Log file /var/log/mariadb/mariadb.log exists
[OK] Log file /var/log/mariadb/mariadb.log is readable.
[OK] Log file /var/log/mariadb/mariadb.log is not empty
[OK] Log file /var/log/mariadb/mariadb.log is smaller than 32 Mb
[!!] /var/log/mariadb/mariadb.log contains 429 warning(s).
[!!] /var/log/mariadb/mariadb.log contains 477 error(s).
[--] 89 start(s) detected in /var/log/mariadb/mariadb.log
[--] 1) 180925 16:48:01 [Note] /usr/libexec/mysqld: ready for connections.
[--] 2) 180920 18:36:46 [Note] /usr/libexec/mysqld: ready for connections.
[--] 3) 180920 18:32:57 [Note] /usr/libexec/mysqld: ready for connections.
[--] 4) 180920 18:32:29 [Note] /usr/libexec/mysqld: ready for connections.
[--] 5) 180919 18:14:02 [Note] /usr/libexec/mysqld: ready for connections.
[--] 6) 180919 18:03:47 [Note] /usr/libexec/mysqld: ready for connections.
[--] 7) 180919 14:11:52 [Note] /usr/libexec/mysqld: ready for connections.
[--] 8) 180918 18:57:39 [Note] /usr/libexec/mysqld: ready for connections.
[--] 9) 180918 18:54:57 [Note] /usr/libexec/mysqld: ready for connections.
[--] 10) 180918  9:18:59 [Note] /usr/libexec/mysqld: ready for connections.
[--] 65 shutdown(s) detected in /var/log/mariadb/mariadb.log
[--] 1) 180925 16:45:39 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 2) 180920 18:35:51 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 3) 180920 18:32:53 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 4) 180920 18:32:25 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 5) 180919 18:12:16 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 6) 180919 18:03:44 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 7) 180918 18:57:17 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 8) 180918 18:54:55 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 9) 180917 19:03:31 [Note] /usr/libexec/mysqld: Shutdown complete
[--] 10) 180907 18:15:30 [Note] /usr/libexec/mysqld: Shutdown complete
 
-------- Storage Engine Statistics -----------------------------------------------------------------
[--] Status: +ARCHIVE +Aria +BLACKHOLE +CSV +FEDERATED +InnoDB +MEMORY +MRG_MYISAM +MyISAM +PERFORMANCE_SCHEMA 
[--] Data in MyISAM tables: 56.9M (Tables: 67)
[--] Data in InnoDB tables: 26.8M (Tables: 7)
[OK] Total fragmented tables: 0
 
 
-------- CVE Security Recommendations --------------------------------------------------------------
[OK] NO SECURITY CVE FOUND FOR YOUR VERSION
 
-------- Performance Metrics -----------------------------------------------------------------------
[--] Up for: 7d 23h 44m 8s (373M q [541.025 qps], 12M conn, TX: 626G, RX: 83G)
[--] Reads / Writes: 99% / 1%
[--] Binary logging is disabled
[--] Physical Memory     : 15.5G
[--] Max MySQL memory    : 972.2M
[--] Other process memory: 8.2G
[--] Total buffers: 416.0M global + 2.8M per thread (200 max threads)
[--] P_S Max memory usage: 0B
[--] Galera GCache Max memory usage: 0B
[OK] Maximum reached memory usage: 644.1M (4.05% of installed RAM)
[OK] Maximum possible memory usage: 972.2M (6.12% of installed RAM)
[OK] Overall possible memory usage with other process is compatible with memory available
[OK] Slow queries: 0% (398/373M)
[OK] Highest usage of available connections: 41% (82/200)
[OK] Aborted connections: 0.00%  (1/12420260)
[!!] name resolution is active : a reverse name resolution is made for each new connection and can reduce performance
[!!] Query cache may be disabled by default due to mutex contention.
[!!] Query cache efficiency: 0.0% (0 cached / 334M selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (1K temp sorts / 97M sorts)
[OK] No joins without indexes
[!!] Temporary tables created on disk: 67% (7M on disk / 10M total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 1% (400 open / 20K opened)
[OK] Open file limit used: 44% (455/1K)
[OK] Table locks acquired immediately: 99% (614M immediate / 617M locks)
 
-------- Performance schema ------------------------------------------------------------------------
[--] Performance schema is disabled.
[--] Memory used by P_S: 0B
[--] Sys schema isn't installed.
 
-------- ThreadPool Metrics ------------------------------------------------------------------------
[--] ThreadPool stat is enabled.
[--] Thread Pool Size: 6 thread(s).
[--] Using default value is good enough for your version (5.5.60-MariaDB)
 
-------- MyISAM Metrics ----------------------------------------------------------------------------
[!!] Key buffer used: 24.6% (33M used / 134M cache)
[OK] Key buffer size / total MyISAM indexes: 128.0M/26.9M
[OK] Read Key buffer hit rate: 100.0% (5B cached / 45K reads)
[!!] Write Key buffer hit rate: 4.5% (13M cached / 615K writes)
 
-------- InnoDB Metrics ----------------------------------------------------------------------------
[--] InnoDB is enabled.
[--] InnoDB Thread Concurrency: 0
[!!] InnoDB File per table is not activated
[OK] InnoDB buffer pool / data size: 128.0M/26.8M
[!!] Ratio InnoDB log file size / InnoDB Buffer pool size (7.8125 %): 5.0M * 2/128.0M should be equal 25%
[OK] InnoDB buffer pool instances: 1
[--] InnoDB Buffer Pool Chunk Size not used or defined in your version
[OK] InnoDB Read buffer efficiency: 100.00% (1698149490 hits/ 1698149871 total)
[OK] InnoDB Write log efficiency: 99.83% (41391992 hits/ 41460907 total)
[OK] InnoDB log waits: 0.00% (0 waits / 68915 writes)
 
-------- AriaDB Metrics ----------------------------------------------------------------------------
[--] AriaDB is enabled.
[OK] Aria pagecache size / total Aria indexes: 128.0M/1B
[!!] Aria pagecache hit rate: 84.7% (167M cached / 25M reads)
 
-------- TokuDB Metrics ----------------------------------------------------------------------------
[--] TokuDB is disabled.
 
-------- XtraDB Metrics ----------------------------------------------------------------------------
[--] XtraDB is disabled.
 
-------- RocksDB Metrics ---------------------------------------------------------------------------
[--] RocksDB is disabled.
 
-------- Spider Metrics ----------------------------------------------------------------------------
[--] Spider is disabled.
 
-------- Connect Metrics ---------------------------------------------------------------------------
[--] Connect is disabled.
 
-------- Galera Metrics ----------------------------------------------------------------------------
[--] Galera is disabled.
 
-------- Replication Metrics -----------------------------------------------------------------------
[--] Galera Synchronous replication: NO
[--] No replication slave(s) for this server.
[--] Binlog format: STATEMENT
[--] XA support enabled: ON
[--] Semi synchronous replication Master: Not Activated
[--] Semi synchronous replication Slave: Not Activated
[--] This is a standalone server
 
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Control warning line(s) into /var/log/mariadb/mariadb.log file
    Control error line(s) into /var/log/mariadb/mariadb.log file
    Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries which have no LIMIT clause
    Set thread_cache_size to 4 as a starting value
    Increase table_open_cache gradually to avoid file descriptor limits
    Read this before increasing table_open_cache over 64: http://bit.ly/1mi7c4C
    This is MyISAM only table_cache scalability problem, InnoDB not affected.
    See more details here: https://bugs.mysql.com/bug.php?id=49177
    This bug already fixed in MySQL 5.7.9 and newer MySQL versions.
    Beware that open_files_limit (1024) variable 
    should be greater than table_open_cache (400)
    Consider installing Sys schema from https://github.com/mysql/mysql-sys
    Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: http://bit.ly/2wgkDvS
Variables to adjust:
    query_cache_size (=0)
    query_cache_type (=0)
    query_cache_limit (> 1M, or use smaller result sets)
    tmp_table_size (> 16M)
    max_heap_table_size (> 16M)
    thread_cache_size (start at 4)
    table_open_cache (> 400)
    innodb_file_per_table=ON
    innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
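The InnoDB log-file line in the report is plain arithmetic, and working through it shows where the suggested 16M comes from. The Python sketch below (function names hypothetical) reproduces it from the values mysqltuner printed: two 5M log files against a 128M buffer pool, with mysqltuner's 25% target. It only computes the numbers. As the report itself says, read the linked guidance before changing innodb_log_file_size.

```python
def log_to_pool_pct(log_file_mb: float, files_in_group: int, buffer_pool_mb: float) -> float:
    """Total redo log size as a percentage of the InnoDB buffer pool."""
    return 100 * log_file_mb * files_in_group / buffer_pool_mb


def log_file_mb_for(buffer_pool_mb: float, files_in_group: int, target_pct: float = 25.0) -> float:
    """Per-file innodb_log_file_size that makes the total equal target_pct of the buffer pool."""
    return buffer_pool_mb * target_pct / 100 / files_in_group


current = log_to_pool_pct(5, 2, 128)  # values from the report above
suggested = log_file_mb_for(128, 2)   # mysqltuner's 25% target
print(f"current ratio {current}%, suggested innodb_log_file_size {suggested:g}M")
```

If the buffer pool is enlarged later, the same function gives the matching log size. For example, a 512M pool with two log files works out to 64M per file under the same 25% rule.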

Tom Elliott

@Fernando-Gietz Would you mind clearing out some of the userTracking and imagingLog tables? I've seen the issues you're describing caused simply by these logs always being added to while rarely being cleared. This essentially means each per-host request has to pull in a ton of data.

Fernando Gietz

@Tom-Elliott I'd prefer not to clean or delete the userTracking table. My boss once asked about it to investigate misuse of the computers :0 The slow query log shows userTracking queries, but they are SELECT queries, not INSERT queries. Who, or which process, is asking for this?

# Time: 181003 16:37:01
# User@Host: root[root] @ localhost []
# Thread_id: 12427592  Schema: fog  QC_hit: No
# Query_time: 83.091067  Lock_time: 0.000094  Rows_sent: 155086  Rows_examined: 536104
SET timestamp=1538577421;
SELECT * FROM `userTracking`  LEFT OUTER JOIN `hosts` ON `hosts`.`hostID`=`userTracking`.`utHostID`  LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage`  LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`  LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`  LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`  LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`    WHERE `hostMAC`.`hmPrimary` = '1'  ORDER BY `userTracking`.`utID` ASC;
# Time: 181003 16:46:07
# User@Host: root[root] @ localhost []
# Thread_id: 12448399  Schema: fog  QC_hit: No
# Query_time: 48.331137  Lock_time: 0.000105  Rows_sent: 155135  Rows_examined: 536251
SET timestamp=1538577967;
SELECT * FROM `userTracking`  LEFT OUTER JOIN `hosts` ON `hosts`.`hostID`=`userTracking`.`utHostID`  LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage`  LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`  LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`  LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`  LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`    WHERE `hostMAC`.`hmPrimary` = '1'  ORDER BY `userTracking`.`utID` ASC;
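When the slow log grows beyond a couple of entries, totalling its header lines helps show which queries dominate. The Python sketch below (names hypothetical) parses the `# Query_time:` header format shown above and uses the two headers posted here as sample input. On a real server you would read the file named by MariaDB's slow_query_log_file setting instead.

```python
import re

# Header lines copied from the two slow-log entries above.
SLOW_LOG = """\
# Time: 181003 16:37:01
# Query_time: 83.091067  Lock_time: 0.000094  Rows_sent: 155086  Rows_examined: 536104
# Time: 181003 16:46:07
# Query_time: 48.331137  Lock_time: 0.000105  Rows_sent: 155135  Rows_examined: 536251
"""

HEADER = re.compile(
    r"# Query_time: ([\d.]+)\s+Lock_time: ([\d.]+)\s+"
    r"Rows_sent: (\d+)\s+Rows_examined: (\d+)"
)


def summarize(log_text: str) -> dict:
    """Sum query count, query time and row counts over all slow-log header lines."""
    total = {"queries": 0, "query_time": 0.0, "rows_sent": 0, "rows_examined": 0}
    for m in HEADER.finditer(log_text):
        total["queries"] += 1
        total["query_time"] += float(m.group(1))
        total["rows_sent"] += int(m.group(3))
        total["rows_examined"] += int(m.group(4))
    return total


print(summarize(SLOW_LOG))
```

The two queries shown spent over two minutes in total, and each returned about 155,000 rows. That is consistent with the query pulling the whole userTracking history joined to hosts rather than a per-host slice.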

Tom Elliott

@Fernando-Gietz every host requests all the information from linking tables. That’s the only reason I ask about clearing. Of course this can be changed but would require a lot of work to the code base, as the intent was to have all information as readily available as possible.
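If the history has to be kept, as Fernando needs, one option is to move old rows out of the table the per-host queries read instead of deleting them. The Python sketch below shows that pattern using the standard-library sqlite3 module as a stand-in for MySQL/MariaDB. Several names are assumptions for illustration, not FOG's schema or an official procedure: the archive table userTrackingArchive, the date column utDateTime, the sample rows, and the cutoff date. Only utID and utHostID appear in the query above.

```python
import sqlite3


def archive_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move userTracking rows dated before `cutoff` into userTrackingArchive; return rows moved."""
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute(
            "INSERT INTO userTrackingArchive SELECT * FROM userTracking WHERE utDateTime < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM userTracking WHERE utDateTime < ?", (cutoff,))
    return cur.rowcount


# In-memory demo with a hypothetical, simplified schema (utDateTime is an assumed column name).
conn = sqlite3.connect(":memory:")
for table in ("userTracking", "userTrackingArchive"):
    conn.execute(f"CREATE TABLE {table} (utID INTEGER PRIMARY KEY, utHostID INTEGER, utDateTime TEXT)")
conn.executemany(
    "INSERT INTO userTracking VALUES (?, ?, ?)",
    [(1, 10, "2017-09-01 08:00:00"), (2, 11, "2018-01-15 09:30:00"), (3, 10, "2018-10-01 07:45:00")],
)
moved = archive_older_than(conn, "2018-06-01 00:00:00")
print(f"moved {moved} rows to the archive")
```

Doing the copy and the delete in one transaction means a failure partway through leaves the data where it was, so nothing is lost and the live table still shrinks for the per-host queries.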

Fernando Gietz

@Tom-Elliott So in this case, is it the FOG client that runs this query? The query returns a lot of information; is all of it necessary? Maybe it's possible to put a flag on the service that runs the query and make it configurable from the web UI?

Doing a “SELECT * FROM” query is probably using a sledgehammer to crack a nut.

Fernando Gietz

I was searching for where this query is called from, and I found something that looks like an error or a bug:

 <?php
 class UserTrackingManager extends FOGManagerController
 {
     public $tablename = 'dirCleaner';
     public function install()
     {
         $this->uninstall();
         $sql = Schema::createTable(
             $this->tablename,
             true,
             array(
                 'dcID',
                 'dcPath'
             ),
             array(
                 'INTEGER',
                 'LONGTEXT'
             ),
             array(
                 false,
                 false
             ),
             array(
                 false,
                 false
             ),
             array(
                 'dcID',
                 'dcPath'
             ),
             'MyISAM',
             'utf8',
             'dcID',
             'dcID'
         );
         return self::$DB->query($sql);
     }
 }

tablename = dirCleaner??

Of course, I still haven't found what runs this query XD
