• Recent
    • Unsolved
    • Tags
    • Popular
    • Users
    • Groups
    • Search
    • Register
    • Login

    Multicast randomly hangs around 70-90% on last partition

    Scheduled Pinned Locked Moved Solved
    FOG Problems
    4
    20
    1.4k
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • S
      Sebastian Roth Moderator
      last edited by Sebastian Roth

      @benc Thanks for the multicast log. Really strange behavior I find. Why would the last partition play differently on multicast that do all the other partitions do (just thinking out loud)?
      From the log it seems kind of random. Sometimes it’s just one client not answering and next it’s all of them at the same time:

      Timeout notAnswered=[2] notReady=[2] nrAns=5 nrRead=5 nrPart=6 avg=106
      Timeout notAnswered=[0,1,2,3,4,5] notReady=[0,1,2,3,4,5] nrAns=0 nrRead=0 nrPart=6 avg=105
      

      What kind of filesystems do you have on those four partitions? Is it all FAT32 or NTFS?

      Can you post the contents of /images/Val-Public/d1.partitions (as well … fixed_size_partitions and …minimum.partitions if this is a resizable image type) - just trying to get a bigger picture here.

      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

      B 1 Reply Last reply Reply Quote 0
      • B
        benc @Sebastian Roth
        last edited by benc

        @sebastian-roth My guess is that the problem, whatever it is, only shows up on a partition that is over a certain size. Or perhaps it has to do with the time elapsed. I wouldn’t think it has anything to do with the type of partition or the data it contains. This image is a pretty straightforward UEFI Windows 10 install. The first 3 partitions are whatever Windows puts there during install. The last partition is NTFS. I am thinking about finding another smaller image to test with and see if maybe the smaller image multicasts successfully.

        0_1533058838262_d1.fixed_size_partitions.log
        0_1533058851863_d1.minimum.partitions.log
        0_1533058865275_d1.partitions.log

        1 Reply Last reply Reply Quote 0
        • S
          Sebastian Roth Moderator
          last edited by Sebastian Roth

          @benc said in Multicast randomly hangs around 70-90% on last partition:

          I am thinking about finding another smaller image to test with and see if maybe the smaller image multicasts successfully.

          Definitely give that a try. See if you can pin point what exactly is causing this. So far I have no clue I am afraid.
          The partition files you posted seem perfectly fine from my point of view.

          Would you be able to put in a different hard drive in two or three of these PCs just for testing multicast on those and see if it makes any difference?

          Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

          Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

          B 1 Reply Last reply Reply Quote 0
          • B
            benc @Sebastian Roth
            last edited by

            @sebastian-roth I will try putting different hard drives in the clients, and if that shows the same results I’ll probably just reinstall Win10 on one of the machines, capture that, and use that as my smaller test image.

            1 Reply Last reply Reply Quote 1
            • B
              benc
              last edited by benc

              I am working in a different location today, and both of the multicasts that I tried have worked all the way through. I copied the same image I have been using all along from the last location’s server to this location’s server, deployed it to 1 PC, changed a few settings, captured, and used multicast to deploy it to 5 PCs and then 3 PCs. I have attached one of the successful multicast logs from the server at this location.

              0_1533155793310_multicast.log.udpcast.12.log

              1 Reply Last reply Reply Quote 0
              • B
                benc
                last edited by

                The last 3 of my FOG servers I’ve been working with have successfully completed all multicasts. It looks like I’ve got the issue with about half of my servers. Haven’t yet found the issue or the difference between the working servers and the non-working servers. I may just try to reinstall Ubuntu Server 16 and start there. One thing I did try on the last server with the issue was to reinstall FOG. It failed on every package that had curl in it. Don’t know anything about curl but maybe that’s a clue.

                1 Reply Last reply Reply Quote 0
                • B
                  benc
                  last edited by benc

                  I’ve tried putting new drives in 2 PCs and trying a multicast again. Same results. I tried the same thing at another location and actually got up to 98% before it got stuck. I tried a couple more times and it hung randomly around 90%.

                  I’m starting to think that I made a mistake by trying to keep our FOG servers up to date. I’m relatively new to the Linux world and I just assumed that running apt-get update / apt-get upgrade / apt-get dist-upgrade / do-release-upgrade every now and then was probably a good idea to keep security tight. I have not had time to rebuild any of my FOG servers yet to see if that fixes my issues. When I do rebuild, I’ll most likely just throw a new drive in and start over. For long-term stability and reliability, what distro/version should I go with? Most of my experience in Linux has been with Ubuntu so I’d like to stay with that, but I’m open to suggestions.

                  1 Reply Last reply Reply Quote 0
                  • S
                    Sebastian Roth Moderator
                    last edited by Sebastian Roth

                    @benc Running system upgrades as you do is not a bad thing. It’s wise to keep your system up to date! Usually in the Linux world such an upgrade would break things badly (seldomly!) or not at all. Sure, there are situations where an upgrade might introduce such subtle issues but that’s not what I see very often. So keep this good habit of keeping your systems updated!

                    From what I see we are fairly sure this is not a general issue with the clients and not a general problem of FOG as you see it happening at some locations but working fine at others. I wouldn’t say it’s impossible but I highly doubt this problem arises from upgrading your server OS packages. To me this sounds like some kind of network traffic shaping / limiter kicking in at some amount of traffic having passed through in one session.

                    Do you have different switches (configurations) at those locations?

                    PS: Debian and CentOS are pretty solid systems. Debian is closer to what you are used from using Ubuntu. CentOS is more enterprise like, being based on RHEL.

                    Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                    Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                    B 1 Reply Last reply Reply Quote 0
                    • B
                      benc @Sebastian Roth
                      last edited by

                      @sebastian-roth The switches at each location are identical, and the configuration is fundamentally the same except that some locations have two switches stacked together to provide enough ports. One VLAN, same addressing scheme, same types of devices connected. Right now I’m really combing through the details of the configs, comparing the working locations to the ones that don’t. There could also be something with the fact that some locations have two switches and others have just one. That shouldn’t matter, but who knows. I’ll check back in with my findings.

                      1 Reply Last reply Reply Quote 1
                      • F
                        Fernando Gietz Developer
                        last edited by

                        I think that is interesant see this post:

                        Multicast data address not change from one task to another one

                        1 Reply Last reply Reply Quote 0
                        • 1 / 1
                        • First post
                          Last post

                        157

                        Online

                        12.0k

                        Users

                        17.3k

                        Topics

                        155.2k

                        Posts
                        Copyright © 2012-2024 FOG Project