
FOG Unresponsive under "heavy" load

FOG Problems
  • F
    fry_p Moderator
    last edited by Jun 8, 2018, 4:55 PM

    Ubuntu 14.04
    FOG 1.5.3-1.5.4

    I have run into a strange issue ever since updating to 1.5.3, and it continues on 1.5.4. It all started when I went to image a lab of 27 PCs the other day. We PXE booted and quick deployed each client (the way we do it all of the time). Everything seemed to be going fine: 20 hosts started imaging and the others sat in the queue (our max clients on the storage node is 20). I walked away, and when I came back the lab was in various states of error. Some hosts had quit imaging partway through and others never started. I didn’t catch the actual error as we went into panic mode, but it said something about being unable to open the web server at our FOG server’s IP address. I went to a non-imaged PC and tried to get to the web interface. It timed out. I suspected the apache2 service had crashed, so I checked its status. It appeared to be running, but I restarted it anyway. No change. I rebooted the FOG server and shotgun upgraded to 1.5.4. I also noticed on the vSphere console that memory usage had spiked to the top (8GB). The reboot and upgrade together brought it back up. This was 2 days ago.

    Today, I tried to wipe the imaging log, as it had built up about 3 years of entries. I also tried to update the kernel to Tom Elliott 4.17.0. The UI became unresponsive and, if refreshed, timed out. The same thing happened as before: I was unable to PXE boot to FOG. I restarted the apache2 service and rebooted, which made it responsive once more.

    Prior to the 1.5.3 update, everything seemed stable. I am now worried about the reliability of my FOG setup. Please let me know any logs I can provide to help troubleshoot.

    Thanks!

    Like open source community computing? Why not do it for a good cause?
    Use your computer/server for humanitarian projects when it is idle!
    https://join.worldcommunitygrid.org?recruiterId=1026912

    • G
      george1421 Moderator @fry_p
      last edited by Jun 8, 2018, 5:18 PM

      @fry_p OK, I have a feeling I know what it is, but let’s collect some information.

      When you look at top and sort by CPU, what consistently holds the top CPU spots?

      What about top memory?

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

      • F
        fry_p Moderator
        last edited by Jun 8, 2018, 5:23 PM

        Hi @george1421 ,
        At present, things seem quiet, but here are screenshots of System Monitor on Ubuntu:

        Top CPU usage:
        0_1528478572725_fogscreen1.PNG

        Top Memory Usage:
        0_1528478592606_fogscreen2.PNG

        So yeah, when not in crisis mode, things seem normal to me.

        • G
          george1421 Moderator @fry_p
          last edited by Jun 8, 2018, 5:46 PM

          @fry_p That tells me that php-fpm is doing its job and serving the PHP pages. We just found an issue with Debian where it wasn’t.

          So it looks like you are running into the issue only during multicasting? Or was that 20 unicast images?

          • F
            fry_p Moderator @george1421
            last edited by Jun 8, 2018, 5:47 PM

            @george1421 20 unicast deployments. I also seem to have triggered it when truncating the imaging log in MySQL.

            • G
              george1421 Moderator @fry_p
              last edited by george1421 Jun 8, 2018, 11:55 AM Jun 8, 2018, 5:54 PM

              @fry_p Well, I have no proof of this, but my intuition tells me that php-fpm is probably running out of memory when you are unicasting to that many systems. So as an experiment I want you to do this.

              1. We need to locate a file called www.conf in the etc directory. It should be in a directory that has php-fpm in the path. Use this command:
                find /etc -name www.conf
              2. Edit that file down towards the bottom. You should see a section that has a few entries that start with php_admin_value. I want you to add a new line with this:
                php_admin_value[memory_limit] = 256M
                The exact placement of the line doesn’t really matter, but keep it in the admin value section.
              3. Save and exit your text editor.
              4. Restart php-fpm and apache (make sure you don’t have imaging running when you do this):
              sudo systemctl restart php-fpm
              sudo systemctl restart apache2
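
To check that the override from step 2 is actually in the pool file, a small helper like the one below may be useful. This is a minimal sketch, not part of FOG: the function name `fpm_memory_limit` is made up for illustration, and the path in the usage comment is only an example, so substitute whatever `find /etc -name www.conf` returned on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the active php_admin_value[memory_limit]
# override from a php-fpm pool file, or "unset" if none is present.
# Lines commented out with a leading ';' are ignored.
# Usage (example path, adjust to your system):
#   fpm_memory_limit /etc/php/7.1/fpm/pool.d/www.conf
fpm_memory_limit() {
    # Keep only the value after '=' on uncommented matching lines;
    # if the setting appears more than once, report the last one.
    value=$(sed -n 's/^[[:space:]]*php_admin_value\[memory_limit\][[:space:]]*=[[:space:]]*\([^[:space:];]*\).*$/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}
```

If this prints `unset`, the line is probably still commented out or was added to a different pool file. Note also that on Ubuntu the php-fpm service name is often versioned (for example `php7.1-fpm`), so the restart command in step 4 may need that name instead of plain `php-fpm`.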
              

              Now, when you have time or on your next big image push, see if you run into the issue again.

              • F
                fry_p Moderator @george1421
                last edited by Jun 8, 2018, 6:07 PM

                @george1421 We will be doing an exorbitant amount of lab re-imaging this summer, so testing will not be an issue. I will do this and certainly report back, probably in the next few days.

                • F
                  fry_p Moderator @george1421
                  last edited by fry_p Jun 11, 2018, 6:56 AM Jun 11, 2018, 12:56 PM

                  @george1421 I made the changes, but it got me thinking. I’ve been meaning to rebuild our FOG server on a more proper OS (CentOS 7), so on Friday night, I did just that. I’ll let you know if I have any issues with mass unicasting now, but the variables have changed. I feel a lot better about the stability with the new install for now.

                  • Q
                    Quazz Moderator
                    last edited by Quazz Jun 11, 2018, 8:02 AM Jun 11, 2018, 1:59 PM

                    I’m wondering if the Ondemand FPM handler is the better choice for FOG in general and such cases specifically.

                    In my experience, Ondemand is only marginally slower than Dynamic or Static, but uses far less RAM on average. It’s also far easier to set up correctly, since you don’t need to specify minimum children or anything like that.

                    The problem with the current setup is that FPM processes that have claimed a lot of RAM will only respawn after they’ve met the request limit, which could take ages in certain scenarios.

                    In Ondemand you can specify the idle timeout, so that if a process is doing nothing it will be killed off and the memory freed to the system.

                    I would also recommend the Event MPM for Apache alongside this. There is little point in remaining with Prefork when we are using FPM anyway.
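
As a rough illustration of the Ondemand setup described above, the sketch below prints a pool stanza for review. The directive names (`pm`, `pm.max_children`, `pm.process_idle_timeout`, `pm.max_requests`) are standard php-fpm pool settings; the function name `print_ondemand_stanza` and every value are placeholders chosen for illustration, not tested FOG recommendations.

```shell
#!/bin/sh
# Hypothetical helper: print an "ondemand" php-fpm pool stanza to stdout.
# All numbers are illustrative assumptions; size them for your own server.
# Usage: print_ondemand_stanza [max_children] [idle_timeout]
print_ondemand_stanza() {
    max_children=${1:-20}    # upper bound on concurrent worker processes
    idle_timeout=${2:-10s}   # idle workers are killed after this long
    cat <<EOF
pm = ondemand
pm.max_children = ${max_children}
pm.process_idle_timeout = ${idle_timeout}
pm.max_requests = 500
EOF
}
```

The output is meant to be compared against the existing `pm` lines in www.conf and merged by hand, rather than appended blindly, since duplicate `pm` settings in the same pool would be confusing.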

                    • L
                      LibraryMark @george1421
                      last edited by LibraryMark Jun 12, 2018, 6:26 AM Jun 12, 2018, 12:12 PM

                      I am running Ubuntu 16.04 server in VMware. I was never able to make 1.5.4 multicast until I made the changes outlined here to the www.conf file. I was suffering the same things that fry_p had problems with. Downloading boot.php would just be “…” for days. Now it works (like it used to).

                      Thank you, george1421!

                      Edit: Well, I take that back (a little bit). After a 20-PC multicast session, none of the PCs were able to ‘update the database’. I had to cancel the session, reboot the FOG server, and reboot the PCs. At least the image was successfully blasted out, otherwise I would be having a bad day right about now.

                      • L
                        LibraryMark @LibraryMark
                        last edited by Jun 12, 2018, 12:47 PM

                        @librarymark
                        And while trying to multicast 8 PCs, now I get this again: [screenshot]

                        • L
                          LibraryMark @LibraryMark
                          last edited by Jun 12, 2018, 12:59 PM

                          @librarymark
                          And after I reboot the server and the multicast actually runs, the PCs are stuck at this:
                          0_1528808253831_295033fa-25d0-497e-8c95-59aef6f22f3a-image.png

                          and FOG’s webpage says this:
                          0_1528808342767_b826c179-b290-4e0e-842f-42e53a8d96b9-image.png

                          • Q
                            Quazz Moderator @LibraryMark
                            last edited by Jun 12, 2018, 12:59 PM

                            @librarymark Do you get php memory exhaustion in the logs?

                            Would also be interested in seeing your free -m and top (shift+m) stats when this happens.
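
One way to answer the log question is to count PHP's standard "Allowed memory size of N bytes exhausted" fatal errors in the web server or php-fpm log. The sketch below is only an illustration: the function name `count_memory_exhaustion` is made up, and the log path in the usage comment is an assumption that may differ on your install.

```shell
#!/bin/sh
# Hypothetical helper: count PHP memory-exhaustion errors in a log file.
# Usage (assumed path, adjust to your system):
#   count_memory_exhaustion /var/log/apache2/error.log
count_memory_exhaustion() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the function's status clean.
    grep -c 'Allowed memory size of [0-9]* bytes exhausted' "$1" || true
}
```

A non-zero count while the server is hung would point at the per-request memory limit, while a count of zero alongside a full `free -m` would suggest the total memory of all workers is the problem instead.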

                            • G
                              george1421 Moderator @LibraryMark
                              last edited by Jun 12, 2018, 1:03 PM

                              @librarymark said in FOG Unresponsive under "heavy" load:

                              Edit: Well, I take that back (a little bit). After a 20-PC multicast session, none of the PCs were able to ‘update the database’. I had to cancel the session, reboot the FOG server, and reboot the PCs. At least the image was successfully blasted out, otherwise I would be having a bad day right about now.

                              It would be interesting to know the memory usage when this broke.

                              Also, just for clarity, what changes did you make to the www.conf file? Did you raise the memory limit to 256MB?

                              • G
                                george1421 Moderator @LibraryMark
                                last edited by Jun 12, 2018, 1:05 PM

                                @librarymark OK, for the gateway timeout, let’s work with that. I think if you look in the Apache error log, you will see a PHP timeout waiting for php-fpm to respond. What we need to do is tell Apache to wait a bit longer before timing out.

                                About how long does it take to push out your image to 20 computers?

                                • L
                                  LibraryMark @george1421
                                  last edited by LibraryMark Jun 12, 2018, 7:08 AM Jun 12, 2018, 1:08 PM

                                  @george1421
                                  I just upped the memory in /etc/php/7.1/fpm/pool.d/www.conf:

                                  php_admin_value[memory_limit] = 256M 
                                  
                                  • G
                                    george1421 Moderator @LibraryMark
                                    last edited by george1421 Jun 12, 2018, 12:42 PM Jun 12, 2018, 1:08 PM

                                    @librarymark What I want you to test is outlined in this post: https://forums.fogproject.org/topic/11713/503-service-unavailable-error/40

                                    I want you to update this section

                                        <Proxy "fcgi://127.0.0.1:9000">
                                            ProxySet timeout=500
                                        </Proxy>
                                    

                                    Set the timeout in seconds to be just a bit longer than your push time.

                                    • L
                                      LibraryMark @george1421
                                      last edited by LibraryMark Jun 12, 2018, 7:53 AM Jun 12, 2018, 1:12 PM

                                      @george1421

                                      Where do I find the “push time”?

                                      I edited the file /etc/apache2/sites-enabled/001-fog.conf, and it now looks like this:

                                      <VirtualHost *:80>
                                          <Proxy "fcgi://127.0.0.1:9000">
                                              ProxySet timeout=300
                                          </Proxy>
                                      
                                          <FilesMatch "\.php$">
                                              SetHandler "proxy:fcgi://127.0.0.1:9000/"
                                          </FilesMatch>
                                          KeepAlive Off
                                          ServerName 10.5.0.61
                                          DocumentRoot /var/www/html/
                                          <Directory /var/www/html/fog/>
                                              DirectoryIndex index.php index.html index.htm
                                          </Directory>
                                          RewriteEngine On
                                          RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
                                          RewriteRule .* - [F]
                                          RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
                                          RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-d
                                          RewriteRule ^/fog/(.*)$ /fog/api/index.php [QSA,L]
                                      </VirtualHost>
                                      
                                      

                                      Is that correct? In any case I will not be able to test now because we just opened (public library). It might be a few days.

                                      • G
                                        george1421 Moderator @LibraryMark
                                        last edited by george1421 Jun 12, 2018, 7:15 AM Jun 12, 2018, 1:14 PM

                                        @librarymark Right, that looks good. Make sure you set the timeout to the right number of seconds. Right now, as configured, Apache will wait 5 minutes for php-fpm to respond before giving up. If your image push time is more than 5 minutes, you need to adjust this number.

                                        [edit] Sorry, I was not clear: “push time” is the time it takes to send the image to all 20 computers when using a multicast image.
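
The arithmetic behind "a bit longer than your push time" can be sketched as below. The function name `proxy_timeout_seconds` and the 60-second default margin are assumptions for illustration only; the 7-minute push in the example is likewise a stand-in value.

```shell
#!/bin/sh
# Hypothetical helper: convert a multicast push time in minutes into a
# ProxySet timeout in seconds, adding a safety margin on top.
# Usage: proxy_timeout_seconds <push_minutes> [margin_seconds]
proxy_timeout_seconds() {
    push_minutes=$1
    margin_seconds=${2:-60}   # assumed default margin of one minute
    echo $(( push_minutes * 60 + margin_seconds ))
}

# Example: a 7-minute push with the default margin.
# proxy_timeout_seconds 7   -> 480
```

With that result, the `ProxySet timeout=300` line would become `ProxySet timeout=480`, since 300 seconds only covers a push of 5 minutes or less.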

                                        • L
                                          LibraryMark @george1421
                                          last edited by Jun 12, 2018, 1:29 PM

                                          @george1421
                                          My multicast sessions usually take about 5-7 minutes to complete. Is that what you mean?

                                          Copyright © 2012-2024 FOG Project