SVN 4972 to SVN 5046 server load


  • Testers

    I updated to the latest SVN last night and my server load went from an average of around 4 to 27, and the web pages are slow to load again. This is with 0 active tasks. All my Apache log shows is:

    [Fri Oct 23 06:37:08.122924 2015] [mpm_prefork:notice] [pid 1615] AH00163: Apache/2.4.16 (Ubuntu) OpenSSL/1.0.1f configured -- resuming normal operations
    [Fri Oct 23 06:37:08.122977 2015] [core:notice] [pid 1615] AH00094: Command line: '/usr/sbin/apache2'
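
    A couple of generic commands (nothing FOG-specific, just the usual first checks; port 80 is an assumption) to see where the load is going:

    ```sh
    uptime                                    # current load averages
    ps -C apache2 --no-headers | wc -l        # number of Apache prefork children
    ss -tan state established '( sport = :80 )' | tail -n +2 | wc -l   # open HTTP connections
    ```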


  • @Tom-Elliott said:

    You are running the legacy client, so you would have to update all of the clients to receive these new configs?

    Set the interval for check-in.
    Reinstall the legacy client on a computer.
    Copy C:\Program Files (x86)\FOG\etc\config.ini.
    Push that file out as an update…

    There is an option here:
    Service Configuration -> Client Updater

    This area will allow you to push out the new config.


  • Senior Developer

    @Joseph-Hales 600 is 10 minutes, so 60000 is 1000 minutes?

    Where did you change this? You are running the legacy client, so you would have to update all of the clients to receive these new configs?



  • @Joseph-Hales How is it possible for the load averages to go up when you increase the check-in time span?


  • Testers

    The current check-in time was set to 600; I have now moved it to 60000 and restarted the server, and this is the result. Also, what unit is the check-in value in, seconds or minutes? Can we update the tooltips to indicate the units for the config fields in FOG?

    Username	jhales
    Web Server	10.200.10.150
    TFTP Server	10.200.10.150
    Load Average	34.96, 26.32, 11.34
    System Uptime	5 min, 0 users
    
    No change, or even worse performance?


  • @Joseph-Hales said:

    Possibly, but we are about to update the server hardware and add a storage node. Up till now most of the performance issues appear to be database related, not client check-in, but I will test it next week.

    It’s not hard to do, and is actually quite simple, but it’s not entirely straightforward.

    Install each additional server as a full installation. Make a note of the FTP user and password in /opt/fog/.fogsettings on each server.

    Point all servers to the main DB via their /opt/fog/.fogsettings file; the mysqlhost, mysqluser, and mysqlpass should match what the main server uses.

    Enable remote access to the main server’s MySQL.

    Copy /opt/fog/snapins/ssl from the main server to all the other servers: delete what the other servers have and put the new files in place with the right permissions for fog:apache. (A rough sketch of these steps follows.)
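
    A minimal sketch of the above, assuming the main server is 10.200.10.150 (the web server from this thread), a placeholder node name of node1, and example credentials; the exact .fogsettings keys and paths depend on your FOG version and distro:

    ```sh
    ## On each additional (future storage node) server, point /opt/fog/.fogsettings
    ## at the main server's database -- values below are examples only:
    mysqlhost='10.200.10.150'
    mysqluser='fogmaster'          # whatever DB user the main server uses
    mysqlpass='yourDBpassword'

    ## On the main server, allow remote MySQL access (Ubuntu 14.04-era paths assumed):
    sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/my.cnf
    mysql -u root -p <<'SQL'
    GRANT ALL PRIVILEGES ON fog.* TO 'fogmaster'@'%' IDENTIFIED BY 'yourDBpassword';
    FLUSH PRIVILEGES;
    SQL
    service mysql restart

    ## Still on the main server, copy the snapin/SSL keys to each node and fix ownership:
    rsync -a /opt/fog/snapins/ssl/ root@node1:/opt/fog/snapins/ssl/
    ssh root@node1 'chown -R fog:apache /opt/fog/snapins/ssl'
    ```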

    Then you should be able to access the FOG web interface from any of the servers; just set up each server as a storage node and make the main server the master node.

    There are a great number of ways to organize storage nodes/storage groups, but since you currently have one massive server that does everything, it’d operate the same if you made that one server the master node and put all the storage nodes in the same group as it.

    With this setup, you could then point each site’s DHCP scopes to hand out the local FOG server for options 066 and 067 as well ;-)
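
    Purely as an illustration (the subnet, addresses, and the undionly.kpxe boot file are assumptions for a typical BIOS PXE setup, not taken from this thread), an ISC dhcpd scope for one site might look like:

    ```sh
    # Append an illustrative per-site scope to dhcpd.conf (adjust paths/addresses):
    cat >> /etc/dhcp/dhcpd.conf <<'EOF'
    subnet 10.200.20.0 netmask 255.255.255.0 {
      range 10.200.20.100 10.200.20.200;
      next-server 10.200.20.150;   # option 066 - the site's local FOG/TFTP server
      filename "undionly.kpxe";    # option 067 - common FOG BIOS boot file
    }
    EOF
    ```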


  • Testers

    Possibly, but we are about to update the server hardware and add a storage node. Up till now most of the performance issues appear to be database related, not client check-in, but I will test it next week.



  • @Joseph-Hales said:

    We image 20 to 50 PCs a day on average.

    Well that’s a lot…

    You would benefit a lot from having multiple “Full installations” of FOG, configuring them all to point to the main server for MySQL (for one master DB), and then just setting each one up as a storage node. Then you could slowly migrate all the clients to point to their local server. As long as you have a copy of your SSL key on all the servers (from the main server), you should be good to go.

    Dispersing this massive load is ultimately going to be the best way to solve the performance issues.


  • Testers

    We image 20 to 50 PCs a day on average.



  • @Joseph-Hales Not sure…

    but is it worth 10,000 hosts hitting one server every few minutes just so the occasional imaged computer gets snapins fast? To me it’s not worth it.


  • Testers

    SVN 5058 fixes the main issue, or at least makes it manageable.

    Username	jhales
    Web Server	10.200.10.150
    TFTP Server	10.200.10.150
    Load Average	6.79, 13.52, 13.55
    System Uptime	1:28, 1 user
    

  • Testers

    Do they still check in on boot? The reason I ask is that I would like a newly imaged machine to pull snapins as soon as possible.



  • How about making the check-in time even greater?

    I mean, there is no reason for 10,000 hosts in the middle of the school year to be checking for tasks every 2 minutes… that’s kind of absurd.

    Why not set it to 2 hours?
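
    For a rough sense of scale (my own back-of-the-envelope numbers, assuming roughly 10,000 hosts spread evenly over the interval):

    ```sh
    echo "scale=1; 10000/120"  | bc   # 2-minute interval -> ~83 check-ins per second
    echo "scale=1; 10000/7200" | bc   # 2-hour interval   -> ~1.3 check-ins per second
    ```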


  • Developer

    @Joseph-Hales said:

    I am only running the legacy client on my hosts.

    Thanks for clarifying! So this idea was really a dead end. That’s good news, as I think we can rule out the possibility of it being an issue introduced by the client. Must be the PHP code then, I reckon.


  • Testers

    FYI, I can load the pages, but they are extremely slow and sometimes fail to load at all.


  • Testers

    After updating SVN to 5054 and restarting the server completely, it still seems high to me.

    System Overview
    Username	jhales
    Web Server	10.200.10.150
    TFTP Server	10.200.10.150
    Load Average	29.00, 31.61, 20.87
    System Uptime	13 min, 0 users

  • Testers

    I am only running the legacy client on my hosts.


  • Developer

    @Joseph-Hales Thank you so much for the lsof output text file. I feel like this is kind of proof that it’s the clients causing this massive storm of Apache children, and probably MySQL load as well. This is why Tom and I are a bit helpless debugging this issue, as we don’t have such a big test environment. I just grepped through the output and found roughly 120 established connections to distinct hosts/clients, and another 50 or so in FIN_WAIT state.

    I think Tom is pretty convinced that it is something he’s done in the code. Maybe it is, but I still wonder why clients keep connections open and don’t even close them properly. Maybe this was introduced with the new client? Sorry if I am heading the wrong way here. It’s just me guessing…
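
    Roughly how those numbers can be reproduced on a live server (port 80 and the apache2 service are assumptions; adjust for HTTPS or a different distro):

    ```sh
    lsof -nP -iTCP:80 | grep -c ESTABLISHED    # established client connections
    lsof -nP -iTCP:80 | grep -c FIN_WAIT       # connections stuck closing
    # or, grouped by TCP state:
    ss -tan 'sport = :80' | awk 'NR > 1 {print $1}' | sort | uniq -c
    ```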


  • Senior Developer

    @Joseph-Hales I have not added back any invalid entry checks.


  • Testers

    I have one of the largest deployments running SVN, but like I said, this was fine under a previous SVN. One of the earlier changes was to remove some SQL validation checks that were previously used to clean incomplete records from the DB; it seems InnoDB didn’t like the function calls. I wonder if these have been re-added since?
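
    Just to illustrate the kind of cleanup I mean (the database name, table, and column names here are placeholders, not FOG’s actual schema or queries):

    ```sh
    mysql -u root -p fog <<'SQL'
    -- drop host records that never got a name or MAC filled in
    DELETE FROM hosts WHERE hostName = '' OR hostMAC = '';
    SQL
    ```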


 
