How increase the FOG server performance?
-
Hi FOGers!
I need help tuning the settings of my FOG server to improve its performance.
Environment:
7000 hosts in the IT rooms
300 IT rooms
9 TB of images (increasing)
60 technicians
1 FOG server and 1 storage node
We currently use an old FOG version (0.30) and it works fine … very fine. But we need to migrate to the latest version.
To do this I installed two FOG servers with version 1.5 RC x (dev and preproduction environments), but I have performance problems:
1) The web UI works fine until you send a multicast task or want to see the membership of a group [more info here]
2) I don’t know if it is normal, but the mysqld process uses 1.3 GB of RAM
 PID USER  PR NI    VIRT    RES  SHR S %CPU %MEM   TIME+ COMMAND
2073 mysql 20  0 3770600 1,372g 3920 S  0,3 11,8 3448:19 mysqld
I use the mytop tool to watch the MySQL performance:
MySQL on localhost (5.5.56-MariaDB) up 48+03:43:38 [17:36:15]
Queries: 397.4M qps: 100 Slow: 953.0 Se/In/Up/De(%): 87/01/01/00
qps now: 84 Slow qps: 0.0 Threads: 8 ( 1/ 0) 86/00/00/00
Key Efficiency: 100.0% Bps in/out: 31.1k/109.1k Now in/out: 16.5k/144.6k
84 queries per second: isn't that a lot?
3) FOGImageReplicator and FOGSnapinReplicator: if I have only one node, are these two daemons necessary?
4) Can I enable php-fpm to increase the performance [https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast]?
-
@fernando-gietz Let's get a few more details here.
- How many vCPUs does your FOG server have?
- Do you use the fog client? If so what is your check in interval?
- How many network adapters do you have in this fog server?
- Is this fog server virtual or physical?
- What kind of disk subsystem do you have? (raid, single disk, ssd,??)
-
How many vCPUs does your FOG server have?
6 vCPU and 12 GB RAM
Do you use the fog client? If so what is your check in interval?
Yes, but it is not installed on all of them. Currently the client is installed on 600 computers. CLIENT CHECKIN TIME = 90
How many network adapters do you have in this fog server?
Two adapters. One for clients and one for the storage.
Is this fog server virtual or physical?
It is virtual.
What kind of disk subsystem do you have? (raid, single disk, ssd,??)
I don't know, but it is not bad; we use the production environment of the university. I can run download tasks at 13 GB/min, so I assume the disks are not the problem.
OS: RHEL 7 64 bits
-
@fernando-gietz It would be interesting to see what top had to say. With 6 vCPUs, it would be interesting to know how many cores your VM host server has. If it has way more than 6, then 6 vCPUs is OK. Otherwise adding more vCPUs than necessary will slow down your VM.
My initial reaction is to take your client check in time to 15 minutes, instead of 90 seconds. At 90 seconds you have 600 hosts hitting your FOG server at an average (linearized) rate of 6 to 7 hosts per second. We all know hosts check in at random, so you might have 15 check-ins in one second and 2 check-ins the next. So raise your check in period to 10-15 minutes.
Second I would surely enable php-fpm and memcache to see how well it improves your performance. I have only done this on a small scale and that really helped me with web server responsiveness.
Hopefully your VM host server uses more than one network interface to the building switches. For a university I might expect that they use 10 - 40GbE networking. Also look at what interface your VM is using to connect to your VM host server. If your hypervisor is ESX (vSphere) then ensure you are using the VMXNET3 network adapter. That should give you 10G to your vSwitch.
Lastly, you may be at a scale (number of users) where you might consider removing the SQL server from FOG and running an independent SQL server specifically configured to run MySQL.
I think I might do the first 2 in the list and check on the 3rd one. Leave extracting the MySQL server out of the FOG server until last.
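The check-in arithmetic above can be sketched in a few lines of Python. This is only an illustration that assumes check-ins are spread evenly over the interval (real clients check in at random, so bursts will be higher); the function name `checkins_per_second` is made up for this example and is not part of FOG.

```python
def checkins_per_second(hosts: int, interval_s: float) -> float:
    """Average check-in rate if hosts are spread evenly over the interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return hosts / interval_s


if __name__ == "__main__":
    # 600 hosts run the FOG client in this thread; compare a few intervals.
    for interval in (90, 600, 900):
        rate = checkins_per_second(600, interval)
        print(f"{interval:>4} s interval: {rate:.2f} check-ins/s on average")
```

With 600 clients this gives about 6.67 check-ins per second at 90 seconds and about 0.67 per second at 900 seconds, a tenfold reduction in average load.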
-
top command:
top - 18:41:55 up 48 days, 4:49, 2 users, load average: 0,19, 0,23, 0,29
Tasks: 282 total, 1 running, 278 sleeping, 0 stopped, 3 zombie
%Cpu(s): 8,2 us, 2,2 sy, 0,0 ni, 89,6 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 12138956 total, 177100 free, 2809672 used, 9152184 buff/cache
KiB Swap: 1023996 total, 199544 free, 824452 used. 8521144 avail Mem
  PID USER   PR NI    VIRT    RES  SHR S %CPU %MEM    TIME+ COMMAND
26061 apache 20  0  543340  45800 6768 S 11,3  0,4  6:29.34 httpd
13607 apache 20  0  700016  47256 8016 S  9,0  0,4 14:19.99 httpd
16160 apache 20  0  678892  27200 9160 S  7,3  0,2  1:32.28 httpd
 2073 mysql  20  0 3770600 1,372g 3920 S  6,0 11,8  3452:06 mysqld
atop command:
PRC | sys 0.13s | user 0.20s | #proc 285 | #trun 3 | #tslpi 328 | #tslpu 0 | #zombie 3 | #exit 7 |
CPU | sys 3% | user 4% | irq 0% | idle 593% | wait 0% | guest 0% | curf 2.67GHz | curscal ?% |
cpu | sys 1% | user 0% | irq 0% | idle 99% | cpu003 w 0% | guest 0% | curf 2.67GHz | curscal ?% |
cpu | sys 1% | user 2% | irq 0% | idle 98% | cpu005 w 0% | guest 0% | curf 2.67GHz | curscal ?% |
cpu | sys 1% | user 1% | irq 0% | idle 99% | cpu004 w 0% | guest 0% | curf 2.67GHz | curscal ?% |
cpu | sys 1% | user 0% | irq 0% | idle 99% | cpu000 w 0% | guest 0% | curf 2.67GHz | curscal ?% |
cpu | sys 0% | user 1% | irq 0% | idle 99% | cpu001 w 0% | guest 0% | curf 2.67GHz | curscal ?% |
cpu | sys 0% | user 0% | irq 0% | idle 100% | cpu002 w 0% | guest 0% | curf 2.67GHz | curscal ?% |
CPL | avg1 0.08 | avg5 0.19 | avg15 0.27 | | csw 5925 | intr 5744 | | numcpu 6 |
MEM | tot 11.6G | free 147.2M | cache 8.5G | buff 0.1M | slab 221.8M | shmem 428.8M | vmbal 0.0M | hptot 0.0M |
SWP | tot 1.0G | free 194.9M | | | | | vmcom 2.9G | vmlim 6.8G |
LVM | Datos-root | busy 1% | read 5 | write 4 | KiB/w 8 | MBr/s 0.19 | MBw/s 0.01 | avio 4.56 ms |
LVM | Datos-tmp | busy 0% | read 0 | write 1 | KiB/w 4 | MBr/s 0.00 | MBw/s 0.00 | avio 1.00 ms |
DSK | sda | busy 1% | read 5 | write 5 | KiB/w 7 | MBr/s 0.19 | MBw/s 0.01 | avio 4.20 ms |
NET | transport | tcpi 10 | tcpo 12 | udpi 1924 | udpo 1920 | tcpao 2 | tcppo 2 | tcprs 3 |
NET | network | ipi 2102 | ipo 2088 | ipfrw 0 | deliv 2102 | | icmpi 0 | icmpo 0 |
NET | ens192 ---- | pcki 2108 | pcko 2088 | si 220 Kbps | so 1754 Kbps | erri 0 | erro 0 | drpo 0 |
NET | ens224 ---- | pcki 1 | pcko 1 | si 0 Kbps | so 0 Kbps | erri 0 | erro 0 | drpo 0 |
The check-in time: what does it check? The computer state? 15 minutes is a lot for us. Note that if you send a multicast task, the computers will shut down at very different moments and some of them will be left out of the task (if you have a multicast timeout of 5 minutes).
-
@fernando-gietz I think maybe we are not talking about the same check in time.
Also your CPU usage doesn’t look bad (according to top).
-
@george1421 We are talking about the same check time. What does this check time mean?
I am worried about the MySQL performance and its huge RAM usage, 1.3 GB.
2073 mysql 20 0 3770600 1,372g 3920 S 6,0 11,8 3452:06 mysqld
And when I want to see the membership of a group, Apache uses 100% of a vCPU and it takes two minutes to show the list.
Is the swap usage normal? It is around 80% used.
-
@fernando-gietz said in How increase the FOG server performance?:
We are talking about the same check time. What does this check time mean?
What this means: it tells the client “Check back with the server every XX seconds to see if there is something for you to do”. So the clients will query the FOG server every XX seconds to see if there are snapins to deploy, system rename events, or whatever else you can schedule with the FOG server. I feel the FOG server and MySQL are too busy servicing these client check-ins to do much of anything else. As I suggested, change the check in time to 900 (15 min) and see if this resolves your problem, or makes it easier on the FOG server. If not, you can change it back.
Normally with that much RAM, swap is never used. 800 MB does seem like a lot. 1.3 GB of RAM for the mysql process does seem like a lot too. Again, raise your check in time and wait 30 minutes to see if the resources free up on your FOG server.
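To put the memory figures in proportion, here is a small Python sketch using the numbers from the top output posted earlier in the thread. The helper `percent` and the constant names are made up for this example; the conversion of `1,372g` to KiB assumes top's "g" means GiB.

```python
def percent(part: float, total: float) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Values copied from the top output earlier in the thread (KiB).
SWAP_TOTAL_KIB = 1023996
SWAP_USED_KIB = 824452
MEM_TOTAL_KIB = 12138956
# top shows RES as 1,372g for mysqld; assumed here to mean GiB.
MYSQLD_RES_KIB = 1.372 * 1024 * 1024

swap_used_pct = percent(SWAP_USED_KIB, SWAP_TOTAL_KIB)
mysqld_mem_pct = percent(MYSQLD_RES_KIB, MEM_TOTAL_KIB)

if __name__ == "__main__":
    print(f"swap in use:      {swap_used_pct:.1f}%")
    print(f"mysqld share RAM: {mysqld_mem_pct:.1f}%")
```

That works out to roughly 80% of swap in use and just under 12% of RAM held by mysqld, which matches the %MEM column. Since the same top output reports about 8.5 million KiB as available memory, the swap usage may simply be old pages that were never read back in rather than current memory pressure, but that is a guess from one snapshot.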
-
I have restarted the MySQL server and its memory usage has gone down:
8895 mysql 20 0 1300380 93492 9236 S 7,0 0,8 0:05.37 mysqld
I have set the check-in time to 900 seconds.
-
The activity of the MySQL server is huge. I restarted the server, and after seven minutes:
MySQL on localhost (5.5.56-MariaDB) up 0+00:07:00 [16:13:04]
Queries: 38.1k qps: 93 Slow: 0.0 Se/In/Up/De(%): 94/00/00/00
qps now: 102 Slow qps: 0.0 Threads: 5 ( 1/ 0) 85/01/00/00
Key Efficiency: 100.0% Bps in/out: 13.5k/43.9k Now in/out: 41.3k/190.2k
 Id User Host/IP   DB   Time Cmd   Query or State
 -- ---- -------   --   ---- ---   ----------
664 root localhost test    0 Query show full processlist
782 root localhost fog     4 Sleep
768 root localhost fog    10 Sleep
746 root localhost fog    19 Sleep
 10 root localhost fog   414 Sleep
38k queries??
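As a sanity check on the two mytop headers posted in this thread, the average query rate can be recomputed from the total query count and the uptime. This is a rough sketch; the function names are made up for the example, and the totals are the rounded values mytop displayed, so the results are approximate.

```python
def uptime_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert mytop's 'up D+HH:MM:SS' uptime into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def average_qps(total_queries: float, uptime_s: int) -> float:
    """Average queries per second since the server started."""
    return total_queries / uptime_s


# First snapshot: Queries: 397.4M, up 48+03:43:38
before_restart = average_qps(397.4e6, uptime_seconds(48, 3, 43, 38))
# Second snapshot: Queries: 38.1k, up 0+00:07:00
after_restart = average_qps(38.1e3, uptime_seconds(0, 0, 7, 0))

if __name__ == "__main__":
    print(f"before restart: {before_restart:.1f} qps")
    print(f"after restart:  {after_restart:.1f} qps")
    print(f"per day at the later rate: {after_restart * 86400 / 1e6:.1f}M queries")
```

Both snapshots come out at a little over 90 queries per second, close to the qps values mytop printed (100 and 93). So 38k queries in seven minutes is the same steady background rate as before the restart, not a new spike.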
-
I have configured MySQL to log the queries, and some of them seem pointless.
180228 16:38:32 364 Connect root@localhost as anonymous on fog
364 Query USE `fog`
364 Query SET SESSION sql_mode=''
365 Connect root@localhost as anonymous on fog
365 Query USE `fog`
364 Quit
365 Query SET SESSION sql_mode=''
366 Connect root@localhost as anonymous on fog
366 Query USE `fog`
365 Quit
366 Query SET SESSION sql_mode=''
366 Query SELECT `vValue` FROM `fog`.`schemaVersion`
366 Query SELECT `pName` FROM `plugins` WHERE `plugins`.`pInstalled`='1' AND `plugins`.`pState`='1' ORDER BY LOWER(`plugins`.`pName`) ASC
366 Query SELECT `settingValue` FROM `globalSettings` WHERE `globalSettings`.`settingKey` IN ('FOG_DEFAULT_LOCALE','FOG_HOST_LOOKUP','FOG_MEMORY_LIMIT','FOG_REAUTH_ON_DELETE','FOG_REAUTH_ON_EXPORT','FOG_TZ_INFO','FOG_VIEW_DEFAULT_SCREEN') ORDER BY LOWER(`globalSettings`.`settingKey`) ASC
366 Query SELECT COUNT(`hosts`.`hostID`) AS `total` FROM `hosts` WHERE `hostPending` = '1' LIMIT 1
366 Query SELECT COUNT(`COLUMN_NAME`)AS`total`FROM`information_schema`.`COLUMNS`WHERE`TABLE_SCHEMA`='fog'AND`TABLE_NAME`='hostMAC'AND`COLUMN_NAME`='hmMAC'
366 Query SELECT COUNT(`hostMAC`.`hmID`) AS `total` FROM `hostMAC` WHERE `hmPending` = '1' LIMIT 1
366 Query SELECT `settingValue` FROM `globalSettings` WHERE `globalSettings`.`settingKey` IN ('FOG_URL_AVAILABLE_TIMEOUT','FOG_URL_BASE_CONNECT_TIMEOUT','FOG_URL_BASE_TIMEOUT') ORDER BY LOWER(`globalSettings`.`settingKey`) ASC
366 Query SELECT `globalSettings`.* FROM `globalSettings` WHERE `settingKey`='FOG_QUICKREG_PENDING_MAC_FILTER'
366 Query SELECT COUNT(`hostMAC`.`hmID`) AS `total` FROM `hostMAC` WHERE `hmMAC` IN ('40:b0:34:39:57:ac') AND `hmPending` IN ('0','') LIMIT 1
366 Query SELECT `hmMAC` FROM `hostMAC` WHERE `hostMAC`.`hmMAC` IN ('40:b0:34:39:57:ac') AND `hostMAC`.`hmPending` IN ('0','') ORDER BY `hostMAC`.`hmID` ASC
366 Query SELECT `hmMAC` FROM `hostMAC` WHERE `hostMAC`.`hmMAC` IN ('40:b0:34:39:57:ac') AND `hostMAC`.`hmIgnoreImaging`='1' ORDER BY `hostMAC`.`hmID` ASC
366 Query SELECT `hostMAC`.* FROM `hostMAC` WHERE `hmMAC`='40:b0:34:39:57:ac'
366 Query SELECT `hmHostID` FROM `hostMAC` WHERE `hostMAC`.`hmPending` IN ('0','') AND `hostMAC`.`hmMAC` IN ('40:b0:34:39:57:ac') ORDER BY `hostMAC`.`hmID` ASC
366 Query SELECT `hosts`.*,`hostMAC`.*,`images`.*,`os`.*,`imagePartitionTypes`.*,`imageTypes`.*,`hostScreenSettings`.*,`hostAutoLogOut`.*,`inventory`.* FROM `hosts` LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID` LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage` LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID` LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID` LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID` LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID` LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID` LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID` WHERE `hostID`='7502' AND `hostMAC`.`hmPrimary` = '1'
366 Query SELECT COUNT(`hookEvents`.`heName`) AS `total` FROM `hookEvents` WHERE `hookEvents`.`heName`='QUEUED_STATES' AND `hookEvents`.`heName` <> '0'
366 Query SELECT COUNT(`hookEvents`.`heName`) AS `total` FROM `hookEvents` WHERE `hookEvents`.`heName`='PROGRESS_STATE' AND `hookEvents`.`heName` <> '0'
366 Query SELECT `taskID` FROM `tasks` LEFT OUTER JOIN `images` ON `images`.`imageID`=`tasks`.`taskImageID` LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID` LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID` LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID` LEFT OUTER JOIN `hosts` ON `hosts`.`hostID`=`tasks`.`taskHostID` LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID` LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID` LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID` LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID` LEFT OUTER JOIN `taskTypes` ON `taskTypes`.`ttID`=`tasks`.`taskTypeID` LEFT OUTER JOIN `taskStates` ON `taskStates`.`tsID`=`tasks`.`taskStateID` LEFT OUTER JOIN `nfsGroupMembers` ON `nfsGroupMembers`.`ngmID`=`tasks`.`taskNFSMemberID` LEFT OUTER JOIN `nfsGroups` ON `nfsGroups`.`ngID`=`nfsGroupMembers`.`ngmGroupID` WHERE `tasks`.`taskHostID`='7502' AND `tasks`.`taskStateID` IN ('0','1','2','3') AND `hostMAC`.`hmPrimary` = '1' ORDER BY LOWER(`tasks`.`taskName`) ASC
366 Query SELECT `hostMAC`.* FROM `hostMAC` WHERE `hmMAC`='40:b0:34:39:57:ac'
366 Quit
All of these queries ran within one second.
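One way to dig into a log like this is to count statements per connection and flag statements that repeat within the same connection. The Python sketch below does that on an abbreviated excerpt of the log above (most queries are left out to keep it short). The regular expression assumes the general query log layout seen in this post (optional timestamp, connection id, command, argument), and the function names are made up for the example.

```python
import re
from collections import Counter

# Abbreviated excerpt of the general query log from this post.
SAMPLE_LOG = """\
180228 16:38:32 364 Connect root@localhost as anonymous on fog
364 Query USE `fog`
364 Query SET SESSION sql_mode=''
365 Connect root@localhost as anonymous on fog
365 Query USE `fog`
364 Quit
365 Query SET SESSION sql_mode=''
366 Connect root@localhost as anonymous on fog
366 Query USE `fog`
365 Quit
366 Query SET SESSION sql_mode=''
366 Query SELECT `vValue` FROM `fog`.`schemaVersion`
366 Query SELECT `hostMAC`.* FROM `hostMAC` WHERE `hmMAC`='40:b0:34:39:57:ac'
366 Query SELECT `hostMAC`.* FROM `hostMAC` WHERE `hmMAC`='40:b0:34:39:57:ac'
366 Quit
"""

# Optional "YYMMDD HH:MM:SS" timestamp, then connection id, command, argument.
LINE_RE = re.compile(
    r"^\s*(?:\d{6}\s+\d{1,2}:\d{2}:\d{2}\s+)?"
    r"(?P<conn>\d+)\s+(?P<cmd>Connect|Query|Quit)\b\s*(?P<arg>.*)$"
)


def parse(log_text):
    """Yield (connection_id, command, argument) for each recognised line."""
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            yield int(match.group("conn")), match.group("cmd"), match.group("arg").strip()


def queries_per_connection(log_text):
    """Count Query entries for each connection id."""
    return Counter(conn for conn, cmd, _ in parse(log_text) if cmd == "Query")


def repeated_statements(log_text):
    """Return statements issued more than once within the same connection."""
    seen = Counter((conn, arg) for conn, cmd, arg in parse(log_text) if cmd == "Query")
    return {key: count for key, count in seen.items() if count > 1}


if __name__ == "__main__":
    print(dict(queries_per_connection(SAMPLE_LOG)))
    for (conn, statement), count in repeated_statements(SAMPLE_LOG).items():
        print(f"connection {conn} ran {count}x: {statement}")
```

On the excerpt this reports the `hostMAC` lookup running twice in connection 366. Run against the full log file, the same counts would show how many statements a single request costs and which ones are duplicated.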