FOG Unresponsive under "heavy" load


  • Moderator

    @librarymark What I want you to test is outlined in this post: https://forums.fogproject.org/topic/11713/503-service-unavailable-error/40

    I want you to update this section

        <Proxy "fcgi://127.0.0.1:9000">
            ProxySet timeout=500
        </Proxy>
    

    Set the timeout in seconds to be just a bit longer than your push time.
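
    As a rough illustration of "a bit longer than your push time", the arithmetic can be sketched in shell. The helper name `proxy_timeout_seconds` and the default 60-second margin are assumptions made for this example, not anything FOG or Apache provides.

    ```shell
    # Hypothetical helper: convert a measured push time in minutes into a
    # ProxySet timeout in seconds, adding a safety margin (default 60 s).
    proxy_timeout_seconds() {
        push_minutes=$1
        margin_seconds=${2:-60}
        echo $(( push_minutes * 60 + margin_seconds ))
    }
    ```

    For a push that takes about 15 minutes, `proxy_timeout_seconds 15` prints 960, which would be the value to put after `timeout=`.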



  • @george1421
    I just upped the memory in /etc/php/7.1/fpm/pool.d/www.conf:

    php_admin_value[memory_limit] = 256M 
    

  • Moderator

    @librarymark OK, for the gateway timeout, let's work with that. I think if you look in the Apache error log, you will see a timeout waiting for php-fpm to respond. What we need to do is tell Apache to wait a bit longer before timing out.

    About how long does it take to push out your image to 20 computers?


  • Moderator

    @librarymark said in FOG Unresponsive under "heavy" load:

    Edit: Well, I take that back (a little bit). After a 20-pc multicast session, none of the PC’s were able to ‘update the database’. I had to cancel the session, reboot the fog server, and reboot the PC’s . At least the image was successfully blasted out otherwise I would be having a bad day right about now.

    It would be interesting to know the memory usage when this broke.

    Also just for clarity what updates did you do to the www.conf file, up the memory to 256MB?


  • Moderator

    @librarymark Do you get php memory exhaustion in the logs?

    Would also be interested in seeing your free -m and top (shift+m) stats when this happens.
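
    One possible way to gather those numbers is sketched below. The function names are made up for this example. PHP reports memory exhaustion with a message starting "Allowed memory size", so counting that string in a log is a reasonable first check.

    ```shell
    # Count PHP memory-exhaustion errors in the given log file.
    # PHP logs these as "Allowed memory size of N bytes exhausted ...".
    count_php_oom() {
        grep -c 'Allowed memory size' "$1"
    }

    # Print overall memory use, then the header line plus the ten
    # processes using the most memory (sorted by %MEM, descending).
    memory_snapshot() {
        free -m
        ps aux --sort=-%mem | head -n 11
    }
    ```

    On a default Ubuntu install the Apache log would typically be checked with `count_php_oom /var/log/apache2/error.log`; that path is an assumption and may differ on your system. Run `memory_snapshot` while the server is misbehaving, not after a reboot.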



  • @librarymark
    And after I reboot the server and the multicast actually runs, the PC’s are stuck at this:
    0_1528808253831_295033fa-25d0-497e-8c95-59aef6f22f3a-image.png

    and FOG’s webpage says this:
    0_1528808342767_b826c179-b290-4e0e-842f-42e53a8d96b9-image.png



  • @librarymark
    And while trying to multicast 8 PCs, now I get this again: [image]



  • I am running Ubuntu 16.04 server in VMware. I was never able to make 1.5.4 multicast until I made the changes outlined here to the www.conf file. I was suffering from the same things that fry_p had problems with. Downloading boot.php would just be “…” for days. Now it works (like it used to).

    Thank you, george1421!

    Edit: Well, I take that back (a little bit). After a 20-pc multicast session, none of the PC’s were able to ‘update the database’. I had to cancel the session, reboot the fog server, and reboot the PC’s . At least the image was successfully blasted out otherwise I would be having a bad day right about now.


  • Moderator

    I’m wondering if the Ondemand FPM handler is the better choice for FOG in general and such cases specifically.

    In my experience Ondemand is only marginally slower than Dynamic or Static, but uses far less RAM on average. It’s also far easier to set up correctly, since you don’t need to tune minimum children or anything like that.

    The problem with the current set up is that FPM processes that have claimed a lot of RAM will only respawn after they’ve met the request limit which could take ages in certain scenarios.

    In Ondemand you can specify the idle timeout so that if a process is doing nothing it will be killed off and the memory freed to the system.

    I would also recommend the Event MPM for Apache alongside this. There is little point in staying with Prefork when we are using FPM anyway.
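
    For reference, an Ondemand pool in www.conf might look roughly like the fragment below. The directive names are standard php-fpm pool settings, but every value here is an illustrative assumption and has not been tuned or tested for FOG.

    ```ini
    ; Spawn workers only when requests arrive (no minimum/spare children to tune).
    pm = ondemand
    ; Upper bound on simultaneous workers (assumed value).
    pm.max_children = 20
    ; Kill a worker after it has been idle this long, freeing its memory.
    pm.process_idle_timeout = 10s
    ; Still recycle workers after a fixed number of requests (assumed value).
    pm.max_requests = 500
    ```

    With `pm = ondemand`, the `pm.start_servers` and spare-server settings are not used, so only the cap and the idle timeout need choosing.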


  • Moderator

    @george1421 I made the changes, but it got me thinking. I’ve been meaning to rebuild our FOG server on a more proper OS (CentOS 7), so on Friday night, I did just that. I’ll let you know if I have any issues with mass unicasting now, but the variables have changed. I feel a lot better about the stability with the new install for now.


  • Moderator

    @george1421 We will be doing an exorbitant amount of lab re-imaging this summer, so testing will not be an issue. I will do this and certainly report back, probably in the next few days.


  • Moderator

    @fry_p Well, I have no proof of this, but my intuition is telling me that php-fpm is probably running out of memory when you are unicasting to that many systems. So as an experiment I want you to do this.

    1. We need to locate a file called www.conf in the etc directory. It should be in a directory that has php-fpm in the path. Use this command.
      find /etc -name www.conf
    2. Edit that file down towards the bottom. You should see a section that has a few entries that start out with php_admin_value. I want you to add a new line with this:
      php_admin_value[memory_limit] = 256M
      The exact placement of the line doesn’t really matter, but keep it in the admin value section.
    3. Save and exit your text editor.
    4. Restart php-fpm and apache (make sure you don’t have imaging running when you do this)
    sudo systemctl restart php-fpm
    sudo systemctl restart apache2
    

    Now, when you have time or on your next big image push, see if you run into the issue again.
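
    Before the next push, a quick way to confirm the edit from step 2 is actually in place might be the check below. The function name is hypothetical; it only looks for an uncommented `php_admin_value[memory_limit]` line in the file it is given.

    ```shell
    # Hypothetical check: succeed if the pool file contains an active
    # (not ';'-commented) php_admin_value[memory_limit] setting.
    has_memory_limit() {
        grep -Eq '^[[:space:]]*php_admin_value\[memory_limit\][[:space:]]*=' "$1"
    }
    ```

    For example: `has_memory_limit /etc/php/7.1/fpm/pool.d/www.conf && echo "memory_limit is set"`. Also note that the service unit name varies by distribution; on Ubuntu with versioned PHP packages it is usually `php7.1-fpm` (matching the installed version) rather than `php-fpm`.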


  • Moderator

    @george1421 20 unicast deployments. I also seem to have triggered it when truncating the imaging log in MySQL.


  • Moderator

    @fry_p That tells me that php-fpm is doing its job and serving the PHP pages. We just found an issue with Debian where it wasn’t.

    So it looks like you are running into the issue only during multicasting? Or was that 20 unicast images?


  • Moderator

    Hi @george1421 ,
    At present, things seem quiet, but here are screens of system monitor on ubuntu:

    Top CPU usage:
    0_1528478572725_fogscreen1.PNG

    Top Memory Usage:
    0_1528478592606_fogscreen2.PNG

    So yeah, when not in crisis mode, things seem normal to me.


  • Moderator

    @fry_p OK, I have a feeling I know what it is, but let’s collect some information.

    When you look at top and sort by processor what has the top cpu spots consistently?

    What about top memory?

