How to increase FOG server performance?


  • Developer

    Hi FOGers!

    I need help tuning the settings of my FOG server to improve performance.

    Environment:

    7000 hosts in the IT rooms
    300 IT rooms
    9TB of images (increasing)
    60 technicians
    1 FOG server and 1 storage node

    Currently we use an old FOG version (0.30) and it works fine … very fine. But we need to migrate to the latest version.
    To do this, I installed two FOG servers with version 1.5 RC x (dev and preproduction environments), but I have performance problems.

    1. The web UI works fine until you send a multicast task or try to view the membership of a group [more info here]
    2. I don’t know if it is normal, but the mysqld process uses 1.3 GB of RAM
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    2073 mysql     20   0 3770600 1,372g   3920 S   0,3 11,8   3448:19 mysqld
    
    

    I use the mytop tool to monitor MySQL performance

    MySQL on localhost (5.5.56-MariaDB)     up 48+03:43:38 [17:36:15]
     Queries: 397.4M  qps:  100 Slow:   953.0         Se/In/Up/De(%):    87/01/01/00 
                 qps now:   84 Slow qps: 0.0  Threads:    8 (   1/   0) 86/00/00/00 
     Key Efficiency: 100.0%  Bps in/out: 31.1k/109.1k   Now in/out: 16.5k/144.6k
    

    84 queries per second: isn’t that a lot?

    1. FOGImageReplicator and FOGSnapinReplicator: if I have only one node, are these two daemons necessary?
    2. Can I enable php-fpm to increase performance [https://forums.fogproject.org/topic/10717/can-php-fpm-make-fog-web-gui-fast]?

  • Developer

    I have configured MySQL to log the queries, and some of the queries seem pointless.

    180228 16:38:32	  364 Connect	root@localhost as anonymous on fog
    		  364 Query	USE `fog`
    		  364 Query	SET SESSION sql_mode=''
    		  365 Connect	root@localhost as anonymous on fog
    		  365 Query	USE `fog`
    		  364 Quit	
    		  365 Query	SET SESSION sql_mode=''
    		  366 Connect	root@localhost as anonymous on fog
    		  366 Query	USE `fog`
    		  365 Quit	
    		  366 Query	SET SESSION sql_mode=''
    		  366 Query	SELECT `vValue` FROM `fog`.`schemaVersion`
    		  366 Query	SELECT `pName` FROM `plugins`   WHERE `plugins`.`pInstalled`='1' AND `plugins`.`pState`='1'   ORDER BY LOWER(`plugins`.`pName`) ASC
    		  366 Query	SELECT `settingValue` FROM `globalSettings`   WHERE `globalSettings`.`settingKey` IN ('FOG_DEFAULT_LOCALE','FOG_HOST_LOOKUP','FOG_MEMORY_LIMIT','FOG_REAUTH_ON_DELETE','FOG_REAUTH_ON_EXPORT','FOG_TZ_INFO','FOG_VIEW_DEFAULT_SCREEN')   ORDER BY LOWER(`globalSettings`.`settingKey`) ASC
    		  366 Query	SELECT COUNT(`hosts`.`hostID`) AS `total` FROM `hosts` WHERE `hostPending` = '1' LIMIT 1
    		  366 Query	SELECT COUNT(`COLUMN_NAME`)AS`total`FROM`information_schema`.`COLUMNS`WHERE`TABLE_SCHEMA`='fog'AND`TABLE_NAME`='hostMAC'AND`COLUMN_NAME`='hmMAC'
    		  366 Query	SELECT COUNT(`hostMAC`.`hmID`) AS `total` FROM `hostMAC` WHERE `hmPending` = '1' LIMIT 1
    		  366 Query	SELECT `settingValue` FROM `globalSettings`   WHERE `globalSettings`.`settingKey` IN ('FOG_URL_AVAILABLE_TIMEOUT','FOG_URL_BASE_CONNECT_TIMEOUT','FOG_URL_BASE_TIMEOUT')   ORDER BY LOWER(`globalSettings`.`settingKey`) ASC
    		  366 Query	SELECT `globalSettings`.* FROM `globalSettings`  WHERE `settingKey`='FOG_QUICKREG_PENDING_MAC_FILTER'
    		  366 Query	SELECT COUNT(`hostMAC`.`hmID`) AS `total` FROM `hostMAC` WHERE `hmMAC` IN ('40:b0:34:39:57:ac') AND `hmPending` IN ('0','') LIMIT 1
    		  366 Query	SELECT `hmMAC` FROM `hostMAC`   WHERE `hostMAC`.`hmMAC` IN ('40:b0:34:39:57:ac') AND `hostMAC`.`hmPending` IN ('0','')   ORDER BY `hostMAC`.`hmID` ASC
    		  366 Query	SELECT `hmMAC` FROM `hostMAC`   WHERE `hostMAC`.`hmMAC` IN ('40:b0:34:39:57:ac') AND `hostMAC`.`hmIgnoreImaging`='1'   ORDER BY `hostMAC`.`hmID` ASC
    		  366 Query	SELECT `hostMAC`.* FROM `hostMAC`  WHERE `hmMAC`='40:b0:34:39:57:ac'
    		  366 Query	SELECT `hmHostID` FROM `hostMAC`   WHERE `hostMAC`.`hmPending` IN ('0','') AND `hostMAC`.`hmMAC` IN ('40:b0:34:39:57:ac')   ORDER BY `hostMAC`.`hmID` ASC
    		  366 Query	SELECT `hosts`.*,`hostMAC`.*,`images`.*,`os`.*,`imagePartitionTypes`.*,`imageTypes`.*,`hostScreenSettings`.*,`hostAutoLogOut`.*,`inventory`.* FROM `hosts`  LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `images` ON `images`.`imageID`=`hosts`.`hostImage`  LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`  LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`  LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`  LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`  WHERE `hostID`='7502'  AND `hostMAC`.`hmPrimary` = '1'
    		  366 Query	SELECT COUNT(`hookEvents`.`heName`) AS `total` FROM `hookEvents` WHERE `hookEvents`.`heName`='QUEUED_STATES' AND `hookEvents`.`heName` <> '0'
    		  366 Query	SELECT COUNT(`hookEvents`.`heName`) AS `total` FROM `hookEvents` WHERE `hookEvents`.`heName`='PROGRESS_STATE' AND `hookEvents`.`heName` <> '0'
    		  366 Query	SELECT `taskID` FROM `tasks`  LEFT OUTER JOIN `images` ON `images`.`imageID`=`tasks`.`taskImageID`  LEFT OUTER JOIN `os` ON `os`.`osID`=`images`.`imageOSID`  LEFT OUTER JOIN `imagePartitionTypes` ON `imagePartitionTypes`.`imagePartitionTypeID`=`images`.`imagePartitionTypeID`  LEFT OUTER JOIN `imageTypes` ON `imageTypes`.`imageTypeID`=`images`.`imageTypeID`  LEFT OUTER JOIN `hosts` ON `hosts`.`hostID`=`tasks`.`taskHostID`  LEFT OUTER JOIN `hostMAC` ON `hostMAC`.`hmHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostScreenSettings` ON `hostScreenSettings`.`hssHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `hostAutoLogOut` ON `hostAutoLogOut`.`haloHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `inventory` ON `inventory`.`iHostID`=`hosts`.`hostID`  LEFT OUTER JOIN `taskTypes` ON `taskTypes`.`ttID`=`tasks`.`taskTypeID`  LEFT OUTER JOIN `taskStates` ON `taskStates`.`tsID`=`tasks`.`taskStateID`  LEFT OUTER JOIN `nfsGroupMembers` ON `nfsGroupMembers`.`ngmID`=`tasks`.`taskNFSMemberID`  LEFT OUTER JOIN `nfsGroups` ON `nfsGroups`.`ngID`=`nfsGroupMembers`.`ngmGroupID`   WHERE `tasks`.`taskHostID`='7502' AND `tasks`.`taskStateID` IN ('0','1','2','3') AND `hostMAC`.`hmPrimary` = '1'  ORDER BY LOWER(`tasks`.`taskName`) ASC
    		  366 Query	SELECT `hostMAC`.* FROM `hostMAC`  WHERE `hmMAC`='40:b0:34:39:57:ac'
    		  366 Quit	
    

    All of these queries ran within one second.


  • Developer

    The MySQL server activity is huge. I restarted the server, and after seven minutes:

    MySQL on localhost (5.5.56-MariaDB)     up 0+00:07:00 [16:13:04]
     Queries: 38.1k  qps:   93 Slow:     0.0         Se/In/Up/De(%):    94/00/00/00 
                 qps now:  102 Slow qps: 0.0  Threads:    5 (   1/   0) 85/01/00/00 
     Key Efficiency: 100.0%  Bps in/out: 13.5k/43.9k   Now in/out: 41.3k/190.2k
    
          Id      User         Host/IP         DB      Time    Cmd Query or State                                                       
           --      ----         -------         --      ----    --- ----------                                                           
          664      root       localhost       test         0  Query show full processlist                                                
          782      root       localhost        fog         4  Sleep                                                                      
          768      root       localhost        fog        10  Sleep                                                                      
          746      root       localhost        fog        19  Sleep                                                                      
           10      root       localhost        fog       414  Sleep
    

    38k queries??
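
    As a rough sanity check on the counter above: 38.1k queries over 7 minutes of uptime is about 91 queries per second, which is in line with the qps value mytop reports. A minimal Python sketch of that arithmetic (the function names are illustrative and not part of mytop or FOG):

```python
def uptime_to_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert an uptime shown as days+HH:MM:SS into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def average_qps(total_queries: float, uptime_seconds: float) -> float:
    """Average queries per second over the whole uptime."""
    if uptime_seconds <= 0:
        raise ValueError("uptime_seconds must be positive")
    return total_queries / uptime_seconds


if __name__ == "__main__":
    # Figures copied from the two mytop outputs in this thread.
    after_restart = average_qps(38_100, uptime_to_seconds(0, 0, 7, 0))
    long_running = average_qps(397_400_000, uptime_to_seconds(48, 3, 43, 38))
    print(f"after restart: {after_restart:.1f} qps")
    print(f"long running:  {long_running:.1f} qps")
```

    The same calculation on the earlier mytop output (397.4M queries over an uptime of 48+03:43:38) gives roughly 95 queries per second, which suggests the load is sustained and not just a burst after the restart.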


  • Developer

    I have restarted the MySQL server and the memory usage has dropped:

    8895 mysql     20   0 1300380  93492   9236 S   7,0  0,8   0:05.37 mysqld
    

    I have set the check-in time to 900 seconds.


  • Moderator

    @fernando-gietz said in How to increase FOG server performance?:

    We are talking about the same check time :) What does this check time mean?

    It tells the client “Check back with the server every XX seconds to see if there is something for you to do”. So the clients will query the FOG server every XX seconds to see if there are snapins to deploy, system rename events, or whatever else you can schedule with the FOG server. My feeling is that the FOG server and MySQL are too busy servicing these client check-ins to do much of anything else. As I suggested, change the check-in time to 900 seconds (15 min) and see if this resolves your problem or makes it easier on the FOG server. If not, you can change it back.

    Normally, with that much RAM, swap is never used. 800 MB does seem like a lot. 1.3 GB of RAM for the mysqld process does seem like a lot too. Again, raise your check-in interval and wait 30 minutes to see if the resources free up on your FOG server.


  • Developer

    @george1421 We are talking about the same check time :) What does this check time mean?

    I am worried about the MySQL performance and its huge RAM use, 1.3 GB.

     2073 mysql     20   0 3770600 1,372g   3920 S   6,0 11,8   3452:06 mysqld
    

    And when I open the membership of a group, Apache uses 100% of a vCPU and it takes two minutes for the member list to load.

    Is the swap usage normal? It is close to 100%.
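
    For reference, the swap figures in the top output posted in this thread (1023996 KiB total, 824452 KiB used) work out to about 80% used. A small Python sketch of the calculation; the helper name is hypothetical:

```python
def percent_used(used_kib: int, total_kib: int) -> float:
    """Return used/total as a percentage."""
    if total_kib <= 0:
        raise ValueError("total_kib must be positive")
    return 100.0 * used_kib / total_kib


# Values copied from the "KiB Swap" line of the top output in this thread.
SWAP_TOTAL_KIB = 1023996
SWAP_USED_KIB = 824452

if __name__ == "__main__":
    print(f"swap used: {percent_used(SWAP_USED_KIB, SWAP_TOTAL_KIB):.1f}%")
```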


  • Moderator

    @fernando-gietz I think maybe we are not talking about the same check in time.
    (Attached screenshot: 0_1519755680130_client_checkin.png)

    Also your CPU usage doesn’t look bad (according to top).


  • Developer

    top command:

    top - 18:41:55 up 48 days,  4:49,  2 users,  load average: 0,19, 0,23, 0,29
    Tasks: 282 total,   1 running, 278 sleeping,   0 stopped,   3 zombie
    %Cpu(s):  8,2 us,  2,2 sy,  0,0 ni, 89,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
    KiB Mem : 12138956 total,   177100 free,  2809672 used,  9152184 buff/cache
    KiB Swap:  1023996 total,   199544 free,   824452 used.  8521144 avail Mem 
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                          
    26061 apache    20   0  543340  45800   6768 S  11,3  0,4   6:29.34 httpd                                                            
    13607 apache    20   0  700016  47256   8016 S   9,0  0,4  14:19.99 httpd                                                            
    16160 apache    20   0  678892  27200   9160 S   7,3  0,2   1:32.28 httpd                                                            
     2073 mysql     20   0 3770600 1,372g   3920 S   6,0 11,8   3452:06 mysqld
    

    atop command:

    PRC | sys    0.13s  | user   0.20s  | #proc    285  | #trun	 3  | #tslpi   328  | #tslpu     0  | #zombie    3  | #exit      7  |
    CPU | sys       3%  | user      4%  | irq	0%  | idle    593%  | wait	0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    cpu | sys	1%  | user      0%  | irq	0%  | idle     99%  | cpu003 w  0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    cpu | sys	1%  | user      2%  | irq	0%  | idle     98%  | cpu005 w  0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    cpu | sys	1%  | user	1%  | irq	0%  | idle     99%  | cpu004 w  0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    cpu | sys	1%  | user	0%  | irq	0%  | idle     99%  | cpu000 w  0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    cpu | sys	0%  | user	1%  | irq	0%  | idle     99%  | cpu001 w  0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    cpu | sys	0%  | user	0%  | irq	0%  | idle    100%  | cpu002 w  0%  | guest     0%  | curf 2.67GHz  | curscal   ?%  |
    CPL | avg1    0.08  | avg5    0.19  | avg15   0.27  |               | csw     5925  | intr    5744  |               | numcpu     6  |
    MEM | tot    11.6G  | free  147.2M  | cache   8.5G  | buff    0.1M  | slab  221.8M  | shmem 428.8M  | vmbal   0.0M  | hptot   0.0M  |
    SWP | tot     1.0G  | free  194.9M  |               |               |               |               | vmcom   2.9G  | vmlim   6.8G  |
    LVM |   Datos-root  | busy	1%  | read	 5  | write	 4  | KiB/w	 8  | MBr/s   0.19  | MBw/s   0.01  | avio 4.56 ms  |
    LVM |    Datos-tmp  | busy	0%  | read	 0  | write	 1  | KiB/w	 4  | MBr/s   0.00  | MBw/s   0.00  | avio 1.00 ms  |
    DSK |          sda  | busy      1%  | read       5  | write      5  | KiB/w      7  | MBr/s   0.19  | MBw/s   0.01  | avio 4.20 ms  |
    NET | transport     | tcpi	10  | tcpo	12  | udpi    1924  | udpo    1920  | tcpao      2  | tcppo      2  | tcprs      3  |
    NET | network       | ipi     2102  | ipo     2088  | ipfrw      0  | deliv   2102  |               | icmpi      0  | icmpo      0  |
    NET | ens192  ----  | pcki    2108  | pcko    2088  | si  220 Kbps  | so 1754 Kbps  | erri       0  | erro       0  | drpo       0  |
    NET | ens224  ----  | pcki       1  | pcko       1  | si    0 Kbps  | so    0 Kbps  | erri	 0  | erro	 0  | drpo	 0  |
    

    What does the check-in time check? The computer state? 15 minutes is a lot for us. Note that if you send a multicast task, the computers will shut down at very different moments and some of them will miss the task (if you have a multicast timeout of 5 minutes).


  • Moderator

    @fernando-gietz It would be interesting to see what top has to say. With 6 vCPUs, it would also be interesting to know how many physical cores your VM host has. If it has way more than 6, then 6 vCPUs is OK. Otherwise, assigning more vCPUs than necessary will slow down your VM.

    My initial reaction is to take your client check-in time to 15 minutes instead of 90 seconds. At 90 seconds you have 600 hosts hitting your FOG server at an average (linearized) rate of between 6 and 7 hosts per second. We all know hosts check in at random, so you might have 15 check-ins in one second and 2 check-ins the next. So raise your check-in period to 10-15 minutes.
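
    The average rate above is simply the number of hosts divided by the check-in interval. A minimal Python sketch, assuming check-ins are spread evenly over the interval (real clients check in at random, so short bursts will be higher); the function name is illustrative:

```python
def average_checkins_per_second(hosts: int, interval_seconds: float) -> float:
    """Mean check-in rate if hosts are spread evenly across the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return hosts / interval_seconds


if __name__ == "__main__":
    # 600 hosts run the FOG client, per the figures in this thread.
    for interval in (90, 600, 900):
        rate = average_checkins_per_second(600, interval)
        print(f"{interval:>4} s interval: {rate:.2f} check-ins per second")
```

    With 600 hosts this gives about 6.7 check-ins per second at 90 seconds and about 0.67 at 900 seconds, a tenfold reduction.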

    Second, I would definitely enable php-fpm and memcache to see how much it improves your performance. I have only done this on a small scale, but it really helped me with web server responsiveness.

    Hopefully your VM host server uses more than one network interface to the building switches. For a university I would expect 10-40 GbE networking. Also look at which interface your VM uses to connect to your VM host server. If your hypervisor is ESX (vSphere), then ensure you are using the VMXNET3 network interface. That should give you 10G to your vSwitch.

    Lastly, you may be at a scale (number of users) where you might consider removing the SQL server from the FOG server and running an independent database server specifically configured to run MySQL.

    I think I would do the first 2 in the list and check on the 3rd one. Leave extracting the MySQL server out of the FOG server until last.


  • Developer

    @george1421

    How many vCPUs does your FOG server have?
    6 vCPU and 12 GB RAM
    Do you use the fog client? If so what is your check in interval?
    Yes, but it is not installed on all of them. Currently the client is installed on 600 computers. CLIENT CHECKIN TIME = 90
    How many network adapters do you have in this fog server?
    Two adapters. One for clients and one for the storage.
    Is this fog server virtual or physical?
    It is virtual
    What kind of disk subsystem do you have? (raid, single disk, ssd,??)
    I don’t know :) But it is not bad; we use the production environment of the university. I can run download tasks at 13 GB/min, so I suppose the disks are not the problem

    OS: RHEL 7 64 bits


  • Moderator

    @fernando-gietz Let’s get a few more details here.

    1. How many vCPUs does your FOG server have?
    2. Do you use the fog client? If so what is your check in interval?
    3. How many network adapters do you have in this fog server?
    4. Is this fog server virtual or physical?
    5. What kind of disk subsystem do you have? (raid, single disk, ssd,??)

 
