    Fog Multicast Sessions: What happens when a host in session is powered off and what happens when it is powered back on?

      zacadams

      I am testing the multicast capabilities of Fog and the handling of a host that is disconnected during a multicast session. In our organization users will sometimes walk into a room and power off a machine that is currently being imaged (no matter what we tell them or what dialogue is present on the machine during an imaging session) in a Ghost Multicast session. When this happens we are able to see in the Ghost console that the specific host has disconnected and can assign a unicast or another multicast session to it.

      I recently tested this event in Fog and have some questions about how it was handled. During the power off of a host in a multicast session, the rest of the hosts still connected to the session experienced a severe drop in speed. I understand the drop in speed is how the system was designed, but the disconnected host stayed in the task list and never notified the fog server it had disconnected, and none of the other hosts returned to their previous imaging speed.

      Was this a bug, feature, or am I missing a setting here?

      The other question I have is about what happens when the host is powered back on while the multicast session is still in the task list: the host boots into Partclone and waits perpetually for the multicast session to start, although the multicast session is already in progress.

      Should the host eventually time out and leave the session and report back to the WebUI that the session has failed?

        george1421 Moderator

        The first thing to remember is that FOG uses other open source products to make up the FOG system.

        There is no tight integration between FOG and the udp-sender service that FOG uses, so the web GUI is only aware of a limited set of things.

        @zacadams said:

        During the power off of a host in a multicast session, the rest of the hosts still connected to the session experienced a severe drop in speed. I understand the drop in speed is how the system was designed, but the disconnected host stayed in the task list and never notified the fog server it had disconnected, and none of the other hosts returned to their previous imaging speed.

        This is understandable to a point. The udp-sender service will know when a target system disappears. While the image IS sent out as a multicast, the client does respond via unicast with a byte or checksum count (not sure which). That “checksum” tells the udp-sender service whether the client needs that data block sent again. When that host disappears, everyone should slow down so that the rest don’t get too far ahead in case the lost client was only momentarily interrupted. So what you are seeing is expected. I would also expect the udp-sender service to eventually give up on a lost target system and return the group to normal speed, but that is not what you are seeing. It appears the timeout never happens, so everyone stays at a reduced transfer rate.

        @zacadams said:

        The other question I have is about what happens when the host is powered back on while the multicast session is still in the task list: the host boots into Partclone and waits perpetually for the multicast session to start, although the multicast session is already in progress.

        This is true. The developers would have to look into udp-sender to see if they can get a status message showing who is still online. A lot depends on how udp-sender can communicate with the outside world.

        @zacadams said:

        Should the host eventually time out and leave the session and report back to the WebUI that the session has failed?

        That is an excellent question. I would hope that if the target system hasn’t joined the multicast stream within 10 minutes, it would request that its task be canceled and then reboot.

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

          Sebastian Roth Moderator

          @zacadams Though I clearly understand your question, I am not sure I can give you a satisfying answer. For multicast we rely on udpcast, which I have to say is not under active development. So this is not something within the FOG code that we can simply change. I’m not saying that we wouldn’t be able to patch udpcast; I just mean it’s not part of the active FOG code base.

          That said, you can have a look at the official udpcast manual. One option sounds promising:

          --retries-until-drop retries
              How many time to send a REQACK until dropping a receiver. Lower
              retrycounts make udp-sender faster to react to crashed receivers,
              but they also increase the probability of false alerts (dropping
              receivers that are not actually crashed, but merely slow to respond
              for whatever reason)
          

          So far I have not found out what the default value for this is. You might want to add the option to /var/www/html/fog/lib/service/multicasttask.class.php (line 420ff) to see if that helps in your case. Don’t forget to restart the FOGMulticastManager service after modifying the code.

          Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

          Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

            george1421 Moderator @Sebastian Roth

            @sebastian-roth Just for reference this is what FOG has running when a multicast is in session.

            /usr/local/sbin/udp-sender --interface ens224 --min-receivers 1 --max-wait 600 --portbase 59290 
            --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Win10E1709F/d1p1.img
            
            /usr/local/sbin/udp-sender --interface ens224 --min-receivers 1 --max-wait 10 --portbase 59290 
            --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Win10E1709F/d1p2.img;
            

              zacadams @Sebastian Roth

              @sebastian-roth @george1421 Thank you for the input. I’m going to look into it tomorrow and then follow up here.

                george1421 Moderator @Sebastian Roth

                @sebastian-roth said in Fog Multicast Sessions: What happens when a host in session is powered off and what happens when it is powered back on?:

                --retries-until-drop retries

                Looking at the udp-sender code I see the default value set to 200.

                udp-sender.c

                net_config.retriesUntilDrop = 200;
                

                This one is also interesting from a MTU standpoint.

                -b blocksize
                    Choses the packet size. Default (and also maximum) is 1456.
                

                There was a post the other day where the FOG server was on the other side of a provisioned fiber line that I suspect had a smaller MTU than standard. The OP of that thread said that the iPXE kernel would load on the local LAN but not on the other side of the fiber link until he set this option in the in.tftpd options (-B 1024):

                --blocksize max-block-size, -B max-block-size
                    Specifies the maximum permitted block size. The permitted range for this parameter is from 512 to 65464. Some embedded clients request large block sizes and yet do not handle fragmented packets correctly; for these clients, it is recommended to set this value to the smallest MTU on your network minus 32 bytes (20 bytes for IP, 8 for UDP, and 4 for TFTP; less if you use IP options on your network.) For example, on a standard Ethernet (MTU 1500) a value of 1468 is reasonable. 
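
The rule of thumb in the man page excerpt above is plain subtraction, sketched below. `tftp_blocksize` is a hypothetical helper name, and the MTU of 1400 in the second call is only an assumed example of a reduced-MTU link; the thread never states the real MTU of the fiber line.

```shell
#!/bin/sh
# Hypothetical helper: largest TFTP block size that avoids IP fragmentation,
# following the in.tftpd man page rule of MTU minus 32 bytes
# (20 bytes IP header + 8 bytes UDP header + 4 bytes TFTP header).
tftp_blocksize() {
    mtu=$1
    echo $((mtu - 20 - 8 - 4))
}

tftp_blocksize 1500   # standard Ethernet; prints 1468
tftp_blocksize 1400   # assumed example of a link with a reduced MTU; prints 1368
```

If IP options are in use on the network, the headroom has to be larger than 32 bytes, as the man page notes.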
                

                I’m not saying this is in any way connected here; it is just something that we need to be aware of from a support side. On the tftp issue, we spent quite a bit of time looking at pcaps and everything looked perfect, but the file wasn’t being transferred to the remote client.

                  Sebastian Roth Moderator

                  @george1421 Great, thanks for looking at the code! Seeing the figure of 200 now, I notice that it’s hard to estimate how long it takes for a client to be dropped, as the timeout for each single packet seems to be some kind of calculated waitAverage time in the code (senddata.c).
                  @zacadams Nevertheless, you can just start playing with the option to see if you can improve the slowdown situation when a client is shut down in the middle of multicasting. For testing this I’d suggest the following strategy. Do three tests for each value:

                  • do not turn off any of the clients, to see if it is still reliable
                  • pull the network cable of one of the clients for 5 seconds while multicast is running, and see if that is already enough for it to be dropped
                  • turn off one of the clients while multicasting and see how long it takes for it to be dropped

                  I’d suggest first trying a value of 100. If that works well (the client is not dropped even in the 5-second disconnect test), test 50, then 25, then 12. The closer you get to 1, the more quickly a dropout will be noticed and, hopefully, the sooner the speed will come back up again. If you see the client being dropped in the 5-second disconnect test, I’d go the other way: if it drops at 50, do your next test with 75…
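
The halving strategy described above (100, then 50, 25, 12, and back up to 75 if 50 turns out to be too aggressive) amounts to a bisection between the highest value known to drop clients too early and the lowest value known to be safe. A minimal sketch, where `next_retries` is a hypothetical helper and the starting bounds of 0 and 100 are assumptions based on the suggested first test value:

```shell
#!/bin/sh
# Hypothetical helper: pick the next --retries-until-drop value to test.
#   $1 = highest value seen to drop a client during the 5-second disconnect
#        test (use 0 if no value has failed yet)
#   $2 = lowest value seen to survive the 5-second disconnect test
next_retries() {
    too_low=$1
    safe=$2
    echo $(( (too_low + safe) / 2 ))
}

next_retries 0 100    # 100 was fine, nothing has failed yet: prints 50
next_retries 0 50     # 50 was fine as well: prints 25
next_retries 0 25     # prints 12 (integer division)
next_retries 50 100   # 50 dropped the client too early: prints 75
```

Each result still needs all three tests from the list before it counts as safe or too low.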

                    Tom Elliott @Sebastian Roth

                    @sebastian-roth I’m wondering if it’d be worth our time to build our own method of multicasting? It’d be quite an undertaking, but if udp-sender is no longer under active development, it might be worth our time, and we’d have full controls - in theory.

                      Sebastian Roth Moderator

                      @Tom-Elliott Well that would definitely be an interesting project and I’ve thought about this as well when looking into this. But it’s a huge undertaking I am afraid. I’d schedule this on the wish list for FOG 2.0 at best.

                        zacadams

                        @Sebastian-Roth Where do I need to apply these options? I went to line 420 of multicasttask.class.php, but it was the end of a block.

                          Sebastian Roth Moderator

                          @zacadams Well, which version of FOG are you running?!

                            zacadams @Sebastian Roth

                            @sebastian-roth Sorry for the delayed response. We are currently running FOG 1.5.0.

                              Sebastian Roth Moderator

                              @zacadams Sorry, I meant below line 421… see here. To add the --retries-until-drop option, insert an element into the $buildcmd array. Let us know if you need assistance with coding.

                                Szeraax @Tom Elliott

                                @tom-elliott

                                have full controls

                                This would be cool. I don’t currently use multicast because I’ve used it on a prod network midday before… But it would be nice to be able to do things like “rolling multicast”, where the sender starts over for clients that joined later and gets them all to completion ASAP (maybe that’s something udp-sender already does and I’ve missed it), etc.

                                  Tom Elliott @Szeraax

                                  @szeraax By full controls I simply meant clients joining/leaving sessions while still being able to image. Rolling multicast doesn’t make sense to me, as it would cause the whole session to wait until the others that are “behind” catch up. This would seem to be more appropriate, then, as a unicast. At least in my eyes. Of course, as @Sebastian-Roth stated, this would be a massive undertaking just to get what I’m hoping for.

                                    Szeraax @Tom Elliott

                                    @tom-elliott I mean rolling as in:

                                    If the multicast is 50% done and a new client connects, it will begin writing at the same block as the others. Once all others complete, the storage node sees that there are clients that aren’t done, asks them what block ranges they still need, and then sends out those blocks. Thus, I could start a multicast and join any number of clients while it is going. Assuming that the last one joins before the first one finishes, the total data sent would be less than 2x the image. Additionally, the first client is not slowed down in any way by the new clients joining.

                                    a la, a feature that I think Ghost had once upon a time.
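
The “less than 2x the image” estimate can be checked with a little arithmetic. The sketch below assumes an idealised rolling multicast with no packet loss, in which the second pass only resends the blocks that late joiners missed; `total_sent_percent` is a hypothetical helper, not something udpcast or FOG provides.

```shell
#!/bin/sh
# Hypothetical helper: total data sent, as a percentage of the image size.
# Each argument is the point (in percent of the image already sent) at which
# a late client joined. The second pass has to resend everything up to the
# latest join point, so the total is 100 plus the largest argument.
total_sent_percent() {
    max=0
    for joined_at in "$@"; do
        if [ "$joined_at" -gt "$max" ]; then
            max=$joined_at
        fi
    done
    echo $((100 + max))
}

total_sent_percent 50         # one client joined halfway through: prints 150
total_sent_percent 20 50 35   # three late joiners, the latest at 50%: prints 150
```

Under these assumptions the total only reaches 200% if a client joins just as the first pass finishes.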

                                      zacadams @Sebastian Roth

                                      @sebastian-roth I apologize again; they have had me bouncing around all last week, so I haven’t had a chance to test the --retries-until-drop option yet. I’ll report back this morning about what I find.

                                        zacadams @Sebastian Roth

                                        @sebastian-roth that did not appear to do anything.

                                          Sebastian Roth Moderator

                                          @zacadams What exactly did you try? Which values? Did the udp-sender calls on the FOG server look different according to the changes you made in the code? Check with ps ax | grep udp on the console while a multicast task is scheduled.

                                            zacadams @Sebastian Roth

                                            @sebastian-roth At line 421 I have changed the code to:

                                            $buildcmd = array(
                                                        UDPSENDERPATH,
                                                        (
                                                            $this->getBitrate() ?
                                                            sprintf(' --max-bitrate %s', $this->getBitrate()) :
                                                            null
                                                        ),
                                                        (
                                                            $this->getInterface() ?
                                                            sprintf(' --interface %s', $this->getInterface()) :
                                                            null
                                                        ),
                                                        sprintf(
                                                            ' --min-receivers %d',
                                                            (
                                                                $this->getClientCount() ?
                                                                $this->getClientCount():
                                                                self::getClass('HostManager')->count()
                                                            )
                                                        ),
                                                        sprintf(' --max-wait %s', '%d'),
                                                        (
                                                            $address ?
                                                            sprintf(' --mcast-data-address %s', $address) :
                                                            null
                                                        ),
                                                        (
                                                            $multicastrdv ?
                                                            sprintf(' --mcast-rdv-address %s', $multicastrdv) :
                                                            null
                                                        ),
                                                        sprintf(' --portbase %s', $this->getPortBase()),
                                                        sprintf(' %s', $duplex),
                                                        ' --ttl 32',
                                                        ' --nokbd',
                                                        ' --nopointopoint',
                                                        ' --retries-until-drop retries 100',
                                                    );
                                            

                                            This has been done on the main fog server and on the fog storage node that will be performing the multicast session. The udp-sender calls did look different in the logs as well.
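
One thing worth checking in the array above: in the man page excerpt, the second word in `--retries-until-drop retries` looks like a placeholder for the number, so the option would normally be written as `--retries-until-drop 100`, without the literal word `retries`. That may be why the change appeared to have no effect, although this page of the thread does not confirm it. A minimal sketch of a check that could be applied to a command line as shown by `ps ax | grep udp`; `has_numeric_retries` is a hypothetical helper, and the command lines are shortened examples:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given command line passes a number
# directly after --retries-until-drop.
has_numeric_retries() {
    printf '%s\n' "$1" | grep -Eq -e '--retries-until-drop +[0-9]+( |$)'
}

# The option as written in the modified array: the number is not the argument.
if has_numeric_retries "udp-sender --nokbd --nopointopoint --retries-until-drop retries 100"; then
    echo "numeric value found"
else
    echo "no numeric value directly after the option"
fi

# The same option with the placeholder word removed.
if has_numeric_retries "udp-sender --nokbd --nopointopoint --retries-until-drop 100"; then
    echo "numeric value found"
fi
```

As noted earlier in the thread, the FOGMulticastManager service has to be restarted after the PHP file is changed.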

                                            Copyright © 2012-2024 FOG Project