Fog Multicast Sessions: What happens when a host in session is powered off and what happens when it is powered back on?

george1421

@sebastian-roth Just for reference this is what FOG has running when a multicast is in session.

/usr/local/sbin/udp-sender --interface ens224 --min-receivers 1 --max-wait 600 --portbase 59290 
--full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Win10E1709F/d1p1.img

/usr/local/sbin/udp-sender --interface ens224 --min-receivers 1 --max-wait 10 --portbase 59290 
--full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Win10E1709F/d1p2.img;

zacadams

@sebastian-roth @george1421 Thank you for the input I’m gonna look into it tomorrow and then follow up here

george1421

@sebastian-roth said in Fog Multicast Sessions: What happens when a host in session is powered off and what happens when it is powered back on?:

–retries-until-drop retries

Looking at the udp-sender code I see the default value set to 200.

udp-sender.c

net_config.retriesUntilDrop = 200;

This one is also interesting from a MTU standpoint.

-b blocksize
    Choses the packet size. Default (and also maximum) is 1456.

There was a post the other day where the fog server was on the other side of a provisioned fiber line that I’m suspecting had a smaller MTU than standard. The OP of that thread said that the iPXE kernel would load on the local LAN but would not on the other side of the fiber link until he set this setting in the in.tftp options (-B 1024),

--blocksize max-block-size, -B max-block-size
    Specifies the maximum permitted block size. The permitted range for this parameter is from 512 to 65464. Some embedded clients request large block sizes and yet do not handle fragmented packets correctly; for these clients, it is recommended to set this value to the smallest MTU on your network minus 32 bytes (20 bytes for IP, 8 for UDP, and 4 for TFTP; less if you use IP options on your network.) For example, on a standard Ethernet (MTU 1500) a value of 1468 is reasonable.

I’m not saying this is any way connected here, it is just something that we need to be aware of from a support side. On the tftp issue, we spent quite a bit of time looking at pcaps and everything looked perfect but it wasn’t transferring the file to the remote client.

Sebastian Roth

@george1421 Great, thanks for looking at the code! Seeing the figure now, 200, I notice that it’s hard to estimate how long it takes for a client to drop as the timeout for each single package seems to be some kind of calculated waitAverage time in the code (senddata.c).
@zacadams Nevertheless you can just start playing with the option to see if you can pull enhance the slowdown situation when a client is being shut down in the middle of multicasting. For testing this I’d suggest the following strategy: Do three tests each:

do not turn off any of the clients to see if it still is reliable
pull the network cable of one of the clients while multicast is on for 5 seconds - see if that is enough for it to be dropped already
turn off one of the clients while multicasting and see how long it takes for it to drop

I’d suggest first try a value of 100, if that is working great (client not dropping even for the 5 second disconnect test) test 50, then 25, then 12. The closer you get to 1 the more “responsive” your dropout will be noticed and speed coming backup again hopefully. If you see the client being dropped on the 5-second-disconnect-test I’d go the other way, so if it drops at 50, do your next test with 75…

Tom Elliott

@sebastian-roth I’m wondering if it’d be worth our time to build our own method of multi casting? It’d be quite an undertaking and all but if udpsender is no longer under active development, it might make it worth our time and we’d have full controls - in theory.

Sebastian Roth

@Tom-Elliott Well that would definitely be an interesting project and I’ve thought about this as well when looking into this. But it’s a huge undertaking I am afraid. I’d schedule this on the wish list for FOG 2.0 at best.

zacadams

@Sebastian-Roth where will I need to apply these options? I went to 420 on multicasttask.class.php but it was the end of a block?

Sebastian Roth

@zacadams Well, which version of FOG are you running?!

zacadams

@sebastian-roth sorry for the delayed response. We are currently running FOG 1.5.0

Sebastian Roth

@zacadams Sorry, I meant below 421… see here. To add the --retries-until-drop option insert a field into the $buildcmd array. Let us know if you need assistance with coding.

Szeraax

@tom-elliott

have full controls

This would be cool. I don’t use multicast currently cause I’ve used it on a prod network midday before… But, being able to do stuff like “rolling multicast” where the sender will start over for clients that joined later and get them all to completion ASAP (maybe that’s a thing that udpsender already does and I’ve missed it). etc.

Tom Elliott

@szeraax by full controls I simple meant joining /leaving sessions/clients while still being able to image. Rolling multicast doesn’t make sense to me as it would cause the whole session to wait until the other “behind” are catching up. This would seem to be more appropriate, then, as a unicast. At least in my eyes. Of course as @Sebastian-Roth stated, this would be a massive undertaking just getting what I hope.

Szeraax

@tom-elliott I mean rolling as in:

If the multicast is 50% done and a new client connects, it will begin writing at the same block as the others. Once all others complete, the storage node sees that there are clients who aren’t done. Asks them what block ranges they still need, and then gives out those blocks. Thus, I could start a multicast and join any number of clients while it is going. Assuming that the last one joins before the first one finishes, the total data sent would be less than 2x the image. Additionally, the first client was not slowed down in any way by the new clients joining.

a la, a feature that I think Ghost had once upon a time.

zacadams

@sebastian-roth I apologize again they have had me bouncing around all last week so I haven’t had a change to test the
–retries-until-drop option yet. I’ll report back this morning about what I find.

zacadams

@sebastian-roth that did not appear to do anything.

Sebastian Roth

@zacadams What exactly did you try? Which values? Did the udp-sender calls on the FOG server look different according to the changes you made in the code? Check with ps ax | grep udp on the console while a multicast task is scheduled.

zacadams

@sebastian-roth to line 421 I have changed to:

$buildcmd = array(
            UDPSENDERPATH,
            (
                $this->getBitrate() ?
                sprintf(' --max-bitrate %s', $this->getBitrate()) :
                null
            ),
            (
                $this->getInterface() ?
                sprintf(' --interface %s', $this->getInterface()) :
                null
            ),
            sprintf(
                ' --min-receivers %d',
                (
                    $this->getClientCount() ?
                    $this->getClientCount():
                    self::getClass('HostManager')->count()
                )
            ),
            sprintf(' --max-wait %s', '%d'),
            (
                $address ?
                sprintf(' --mcast-data-address %s', $address) :
                null
            ),
            (
                $multicastrdv ?
                sprintf(' --mcast-rdv-address %s', $multicastrdv) :
                null
            ),
            sprintf(' --portbase %s', $this->getPortBase()),
            sprintf(' %s', $duplex),
            ' --ttl 32',
            ' --nokbd',
            ' --nopointopoint',
            ' --retries-until-drop retries 100',
        );

This has been done to the main fog server and the fog storage node that will be performing the multicast session. The udp-sendercalls did look different in the logs as well.

zacadams

Oh lord I see what I may have done wrong, I need to remove the retries. Let me try this again this morning.

zacadams

@sebastian-roth ok after fixing my error to the multicasttask.class.php I ran the 3 tests you suggested

do not turn off any of the clients to see if it still is reliable
pull the network cable of one of the clients while multicast is on for 5 seconds - see if that is enough for it to be dropped already
turn off one of the clients while multicasting and see how long it takes for it to drop

During the first test, multicast worked normally and the new argument added to multicasttask.class.php was reflecting in the logs. During the second test pulling the network cable did not appear to drop the disconnected host, but upon plugging the network cable back in the session returned to normal speeds. During the last lest of powering a client off during a session, the multicast session came to a near halt and is only transferring at around 100 MBpm. I’ve left the session going all night and the session is still active and moving at the throttled pace.

Fog Multicast Sessions: What happens when a host in session is powered off and what happens when it is powered back on?

141

12.4k

17.5k

156.0k