
    Multicast randomly hangs around 70-90% on last partition

    • george1421 Moderator

      Let’s assume this is the issue we found after FOG 1.5.4 was released. Similar posts have addressed other multicasting issues with FOG 1.5.4. What the developers have seen is that under certain conditions php-fpm runs out of usable memory during a multicast. Probably the most useful change for you is raising the memory limit from the default of 32MB to 256MB.

      1. Change to the /etc directory from the FOG server’s Linux command prompt.
      2. Search for the www.conf file. It can be in a number of locations depending on which version of PHP is installed. Use this command (hopefully you will find only one copy):
        find /etc -name www.conf
      3. Edit that file and ensure these settings are accurate. Don’t just add them, since all of them should already be there except php_admin_value[memory_limit] = 256M; you will need to add that entry.
      php_admin_value[memory_limit] = 256M
      pm.max_requests = 2000
      pm.max_children = 35
      pm.min_spare_servers = 5
      pm.start_servers = 5
      
      4. Save and exit your text editor.
      5. Reboot the FOG server.
      6. See if that fixes what is wrong. You really should only see this strangeness under heavy load, but I guess it might show up sooner under certain conditions.
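
If you would rather script the check than eyeball the file, here is a minimal Python sketch of steps 2 and 3. The recommended values come from the list above; the function names and the simple "key = value" parsing are my own assumptions, and the script only reports differences, it does not edit anything.

```python
from pathlib import Path

# Values recommended in the steps above.
RECOMMENDED = {
    "php_admin_value[memory_limit]": "256M",
    "pm.max_requests": "2000",
    "pm.max_children": "35",
    "pm.min_spare_servers": "5",
    "pm.start_servers": "5",
}


def parse_pool_settings(text):
    """Return the active key = value pairs from the text of a php-fpm pool file."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, ';' comments, and section headers such as [www].
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_pool_file(path):
    """Map each differing setting to (found, wanted); found is None if absent."""
    found = parse_pool_settings(Path(path).read_text())
    return {
        key: (found.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if found.get(key) != wanted
    }


def find_pool_files(root="/etc"):
    """Rough Python equivalent of `find /etc -name www.conf`."""
    return sorted(Path(root).rglob("www.conf"))


# Demonstration on an in-memory sample (made up for illustration).
sample = "[www]\n; pm.max_requests = 500\npm.max_children = 5\npm.start_servers = 5\n"
print(parse_pool_settings(sample))
```

On the server you would loop over `find_pool_files()` and print `check_pool_file(conf)` for each hit; an empty result means every recommended value is already set.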

      We also found that something strange is going on in Linux kernels after 4.15.2, so I’m going to recommend that you downgrade your FOG/FOS kernel to 4.15.2. The issue with later kernels is that it takes 3-5 minutes to create the disk structure under certain circumstances, whereas with 4.15.2 and older it takes only seconds.

      The kernel will not affect your issue, but the incomplete processing might be related to the missing php-fpm configuration setting.

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

      • benc @george1421

        @george1421 Something I forgot to mention is that I’m running FOG 1.4.4 at all of my locations except one. I upgraded one of them to 1.5.4 to see what would happen. Same results. I am going to try editing www.conf as you suggested. Will report back shortly.

        • benc @george1421

          @george1421 I edited www.conf and set the values you specified. There was only one copy, located in /etc/php/7.1/fpm/pool.d/. The server at this location is running Ubuntu Server 18.04.1 and FOG 1.4.4. Tried a multicast with 5 machines and it failed the first time at 86%, then I tried it a second time and it failed at 93%. My third attempt was with 2 clients, and it failed at 90%. I attached a pic of a client after each attempt just after it hung. After it hangs, elapsed time on the clients is still counting up, and GB/min is slowly decreasing since there is no activity. I looked at each client to confirm they all showed the same thing. They all stop at the same block and the Partclone screens are identical. I checked the hard drive LEDs on each client to make sure none of them are hung or showing signs of drive failure. All clients have a 120GB SSD. During the first 3 or 4 minutes, the speed is around 11-12 GB/min, but I can usually tell when it’s about to fail because the speed will start dropping to around 8-9 GB/min. HD LEDs on each client flash as expected, and none of them are staying lit constantly. I have also tried using different images in case it is something related to the image. Also, when I use unicast to deploy, it works fine on all machines and runs near 12 GB/min the whole way through.

          0_1532626882830_IMG_20180726_120232035.jpg
          0_1532626901819_IMG_20180726_122536500_HDR.jpg

          • george1421 Moderator @benc

            @benc The fix I posted only addresses a specific issue with FOG 1.5.4. It will not help FOG 1.4.4, since that version doesn’t use php-fpm.

            So what does the multicast log in /opt/fog/logs say? Does it give you a clue as to why it’s failing?

            • benc @george1421

              @george1421 I haven’t had a chance to dig much deeper into this issue since my last post but I did look through the logs of the last server I was using. I see a lot of timeouts, but I’m not familiar enough with the logs to be able to identify the issue. I can post a log here if that would help. Which log(s) would be relevant?

              • Fernando Gietz Developer

                Hi,

                We are also having performance problems with multicast tasks on FOG 1.5.4. At some sites the performance is good, but at others it is very bad.

                We have two FOG servers: the old one running version 0.30 and the new one running 1.5.4. We can deploy without problems with the old one, but not with the new one (both servers are in the same VLAN and deploy to the same VLANs).

                We are testing the network to find the problem, so far without success. We have noticed that --mcast-data-address always has the same value on the new server, whereas on the old FOG server it is always different. Could this parameter be the problem?
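
To make that observation easy to repeat, here is a small Python sketch that pulls the `--mcast-data-address` value out of a set of udp-sender command lines and reports which tasks share an address. The command lines, task names, addresses and port numbers below are made up for illustration, and whether a shared address actually causes the slowdown is not confirmed in this thread.

```python
import shlex
from collections import defaultdict


def mcast_data_address(command):
    """Return the value passed to --mcast-data-address, or None if absent."""
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg == "--mcast-data-address" and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--mcast-data-address="):
            return arg.split("=", 1)[1]
    return None


def shared_addresses(commands_by_task):
    """Group tasks by data address and keep addresses used by more than one task."""
    by_address = defaultdict(list)
    for task, command in commands_by_task.items():
        address = mcast_data_address(command)
        if address is not None:
            by_address[address].append(task)
    return {addr: tasks for addr, tasks in by_address.items() if len(tasks) > 1}


# Hypothetical command lines, for illustration only.
commands = {
    "task-1": "udp-sender --interface eth0 --mcast-data-address 239.0.10.1 --portbase 50000",
    "task-2": "udp-sender --interface eth0 --mcast-data-address 239.0.10.1 --portbase 50002",
    "task-3": "udp-sender --interface eth0 --mcast-data-address 239.0.10.7 --portbase 50004",
}
print(shared_addresses(commands))
```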

                • benc

                  0_1532980312008_multicast.log.udpcast.1.log

                  I pulled one of the multicast logs from the last server I used. Had to add .log to the end of the file to upload it here. Hope this helps.

                  • Sebastian Roth Moderator

                    @benc Thanks for the multicast log. I find this behavior really strange. Why would the last partition behave differently on multicast than all the other partitions do (just thinking out loud)?
                    From the log it seems kind of random. Sometimes it’s just one client not answering and next it’s all of them at the same time:

                    Timeout notAnswered=[2] notReady=[2] nrAns=5 nrRead=5 nrPart=6 avg=106
                    Timeout notAnswered=[0,1,2,3,4,5] notReady=[0,1,2,3,4,5] nrAns=0 nrRead=0 nrPart=6 avg=105
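
To see the pattern across a whole log rather than two lines, a rough Python sketch like the following can tally the Timeout entries. It assumes every entry has exactly the layout of the two lines above and that the numbers in notAnswered=[...] identify individual clients; both are my reading of this excerpt, not documented udpcast behavior.

```python
import re
from collections import Counter

# Pattern modelled on the two log lines quoted above.
TIMEOUT_RE = re.compile(
    r"Timeout notAnswered=\[(?P<na>[\d,]*)\] notReady=\[(?P<nr>[\d,]*)\] "
    r"nrAns=(?P<ans>\d+) nrRead=(?P<read>\d+) nrPart=(?P<part>\d+) avg=(?P<avg>\d+)"
)


def parse_timeouts(lines):
    """Extract the silent clients and participant count from each Timeout line."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if not match:
            continue
        events.append({
            "not_answered": [int(x) for x in match["na"].split(",") if x],
            "participants": int(match["part"]),
        })
    return events


def summarize(events):
    """Count timeouts per client and how often every client was silent at once."""
    per_client = Counter()
    all_silent = 0
    for event in events:
        per_client.update(event["not_answered"])
        if len(event["not_answered"]) == event["participants"]:
            all_silent += 1
    return per_client, all_silent


log_lines = [
    "Timeout notAnswered=[2] notReady=[2] nrAns=5 nrRead=5 nrPart=6 avg=106",
    "Timeout notAnswered=[0,1,2,3,4,5] notReady=[0,1,2,3,4,5] nrAns=0 nrRead=0 nrPart=6 avg=105",
]
per_client, all_silent = summarize(parse_timeouts(log_lines))
print(per_client, all_silent)
```

If one client dominates the per-client count, that points at a single machine or port; if most timeouts are "everyone silent at once", the server or network path is the more likely suspect.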
                    

                    What kind of filesystems do you have on those four partitions? Is it all FAT32 or NTFS?

                    Can you post the contents of /images/Val-Public/d1.partitions (as well as … fixed_size_partitions and …minimum.partitions if this is a resizable image type)? I am just trying to get a bigger picture here.

                    Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                    Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                    • benc @Sebastian Roth

                      @sebastian-roth My guess is that the problem, whatever it is, only shows up on a partition that is over a certain size. Or perhaps it has to do with the time elapsed. I wouldn’t think it has anything to do with the type of partition or the data it contains. This image is a pretty straightforward UEFI Windows 10 install. The first 3 partitions are whatever Windows puts there during install. The last partition is NTFS. I am thinking about finding another smaller image to test with and see if maybe the smaller image multicasts successfully.

                      0_1533058838262_d1.fixed_size_partitions.log
                      0_1533058851863_d1.minimum.partitions.log
                      0_1533058865275_d1.partitions.log

                      • Sebastian Roth Moderator

                        @benc said in Multicast randomly hangs around 70-90% on last partition:

                        I am thinking about finding another smaller image to test with and see if maybe the smaller image multicasts successfully.

                        Definitely give that a try. See if you can pinpoint what exactly is causing this. So far I have no clue, I am afraid.
                        The partition files you posted seem perfectly fine from my point of view.

                        Would you be able to put a different hard drive in two or three of these PCs, just to test multicast on those and see if it makes any difference?

                        • benc @Sebastian Roth

                          @sebastian-roth I will try putting different hard drives in the clients, and if that shows the same results I’ll probably just reinstall Win10 on one of the machines, capture that, and use that as my smaller test image.

                          • benc

                            I am working in a different location today, and both of the multicasts that I tried have worked all the way through. I copied the same image I have been using all along from the last location’s server to this location’s server, deployed it to 1 PC, changed a few settings, captured, and used multicast to deploy it to 5 PCs and then 3 PCs. I have attached one of the successful multicast logs from the server at this location.

                            0_1533155793310_multicast.log.udpcast.12.log

                            • benc

                              The last 3 of my FOG servers I’ve been working with have successfully completed all multicasts. It looks like I’ve got the issue with about half of my servers. Haven’t yet found the issue or the difference between the working servers and the non-working servers. I may just try to reinstall Ubuntu Server 16 and start there. One thing I did try on the last server with the issue was to reinstall FOG. It failed on every package that had curl in it. Don’t know anything about curl but maybe that’s a clue.

                              • benc

                                I’ve tried putting new drives in 2 PCs and trying a multicast again. Same results. I tried the same thing at another location and actually got up to 98% before it got stuck. I tried a couple more times and it hung randomly around 90%.

                                I’m starting to think that I made a mistake by trying to keep our FOG servers up to date. I’m relatively new to the Linux world and I just assumed that running apt-get update / apt-get upgrade / apt-get dist-upgrade / do-release-upgrade every now and then was probably a good idea to keep security tight. I have not had time to rebuild any of my FOG servers yet to see if that fixes my issues. When I do rebuild, I’ll most likely just throw a new drive in and start over. For long-term stability and reliability, what distro/version should I go with? Most of my experience in Linux has been with Ubuntu so I’d like to stay with that, but I’m open to suggestions.

                                • Sebastian Roth Moderator

                                  @benc Running system upgrades as you do is not a bad thing. It’s wise to keep your system up to date! In the Linux world such an upgrade usually either breaks things badly (which is rare!) or not at all. Sure, there are situations where an upgrade might introduce subtle issues like this, but that’s not what I see very often. So keep up this good habit of keeping your systems updated!

                                  From what I see, we can be fairly sure this is neither a general issue with the clients nor a general problem with FOG, as you see it happening at some locations while it works fine at others. I wouldn’t say it’s impossible, but I highly doubt this problem arises from upgrading your server OS packages. To me this sounds like some kind of network traffic shaping or rate limiter kicking in once a certain amount of traffic has passed through in one session.

                                  Do you have different switches (or switch configurations) at those locations?

                                  PS: Debian and CentOS are pretty solid systems. Debian is closer to what you are used to from Ubuntu. CentOS is more enterprise-like, being based on RHEL.

                                  • benc @Sebastian Roth

                                    @sebastian-roth The switches at each location are identical, and the configuration is fundamentally the same except that some locations have two switches stacked together to provide enough ports. One VLAN, same addressing scheme, same types of devices connected. Right now I’m really combing through the details of the configs, comparing the working locations to the ones that don’t. There could also be something with the fact that some locations have two switches and others have just one. That shouldn’t matter, but who knows. I’ll check back in with my findings.
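
For the config comparison, a plain text diff may be quicker than reading both files side by side. Here is a minimal Python sketch using the standard difflib module. It assumes each switch config has been exported to text; the config lines in the example are placeholders, not real switch commands. Note that it strips indentation, so nested config sections are flattened.

```python
import difflib


def normalized(text):
    """Strip surrounding whitespace and drop blank lines so cosmetic differences vanish."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def config_diff(working, failing, working_name="working", failing_name="failing"):
    """Return unified-diff lines between two switch configs given as text."""
    return list(
        difflib.unified_diff(
            normalized(working),
            normalized(failing),
            fromfile=working_name,
            tofile=failing_name,
            lineterm="",
        )
    )


# Placeholder config text, for illustration only.
working_cfg = "vlan 10\nexample-setting on\n"
failing_cfg = "vlan 10\nexample-setting on\nexample-rate-limit multicast\n"
for line in config_diff(working_cfg, failing_cfg):
    print(line)
```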

                                    • Fernando Gietz Developer

                                      I think it is worth looking at this post:

                                      Multicast data address not change from one task to another one
