Fog Multicast Sessions: What happens when a host in session is powered off and what happens when it is powered back on?
@zacadams Though I clearly understand your question, I am not sure I can give you a satisfying answer. For multicast we rely on udpcast, which I have to say is not under active development. So this is not something within the FOG code that we can simply change. I am not saying we couldn’t patch udpcast, just that it is not part of the active FOG code base.
That said, you can have a look at the official udpcast manual. One option sounds promising:
--retries-until-drop retries How many time to send a REQACK until dropping a receiver. Lower retrycounts make udp-sender faster to react to crashed receivers, but they also increase the probability of false alerts (dropping receivers that are not actually crashed, but merely slow to respond for whatever reason)
So far I have not found out what the default value for this is. You might want to add the option in /var/www/html/fog/lib/service/multicasttask.class.php (line 420ff) to see if that helps in your case. Don’t forget to restart the FOGMulticastManager service after modifying the code.
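To make the suggestion concrete, here is a rough sketch of what a udp-sender command line with the extra option could look like. It is written in Python purely for illustration and is not FOG's code (FOG assembles its command in the PHP file mentioned above, with its own set of options). The helper name, the image path, the interface, the port base and the retry count of 30 are all made-up example values; only the udp-sender flags themselves come from the udpcast manual.

```python
import shlex


def build_udp_sender_args(image_path, interface, portbase, min_receivers,
                          retries_until_drop=None):
    """Build an argv list for udp-sender (hypothetical helper, not part of FOG)."""
    args = [
        "udp-sender",
        "--file", image_path,
        "--interface", interface,
        "--portbase", str(portbase),
        "--min-receivers", str(min_receivers),
        "--nokbd",  # no keyboard interaction, the sender runs unattended
    ]
    if retries_until_drop is not None:
        if retries_until_drop < 1:
            raise ValueError("retries_until_drop must be at least 1")
        # Lower values drop crashed receivers sooner but risk dropping slow ones.
        args += ["--retries-until-drop", str(retries_until_drop)]
    return args


# Example values only; adjust to your own setup.
cmd = build_udp_sender_args("/images/example/d1p1.img", "eth0", 9000, 4,
                            retries_until_drop=30)
print(shlex.join(cmd))
```

Since the default is unknown, a sensible value has to be found by experiment: too low and slow but healthy clients may get dropped from the session.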
The first thing you need to remember is that FOG is built on top of other open source products.
There is no super tight integration between FOG and the udp-sender service that FOG uses, so the web GUI is only aware of a limited set of things.
During the power off of a host in a multicast session the rest of the hosts still connected to the session experienced a severe drop in speed. I understand the drop in speed is how the system was designed but the disconnected host stayed in the task list and never notified the fog server it had disconnected and none of the other hosts returned to their previous imaging speed.
This is understandable to a point. The udp-sender service will know when a target system disappears. While the image IS sent out as a multicast, the client does respond via unicast with a byte or checksum count (not sure which). That “checksum” tells the udp-sender service whether the client needs that data block sent again. When a host disappears, everyone should slow down so that the rest don’t get too far ahead in case the lost client was only momentarily interrupted. So the slowdown you are seeing is expected. I would also expect the udp-sender service to give up on a lost target system and return the group to normal speed, but that is not what you are seeing. It appears the timeout never happens, so everyone stays at a reduced transfer rate.
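To illustrate why a retry limit matters here, below is a toy model in Python. It is NOT udpcast's actual algorithm, only a simplified sketch under my own assumptions: the sender asks for an acknowledgement (REQACK) after each slice, repeats the request while any receiver stays silent, and drops the silent receivers once the retry limit is reached. All names are invented for the example.

```python
def simulate_sender(alive_until, total_slices, retries_until_drop):
    """Toy model of a sender waiting for acknowledgements.

    alive_until maps a receiver name to the slice index at which it stops
    answering (None means it never stops). Returns the number of REQACKs
    sent and a list of (receiver, slice) pairs for dropped receivers.
    """
    active = set(alive_until)
    reqacks_sent = 0
    dropped = []
    for slice_no in range(total_slices):
        pending = set(active)
        retries = 0
        while pending:
            reqacks_sent += 1
            # Receivers that are still alive answer the request.
            answered = {r for r in pending
                        if alive_until[r] is None or slice_no < alive_until[r]}
            pending -= answered
            if not pending:
                break
            retries += 1
            if retries >= retries_until_drop:
                # Give up on silent receivers so the rest can move on.
                for r in sorted(pending):
                    active.discard(r)
                    dropped.append((r, slice_no))
                pending.clear()
    return reqacks_sent, dropped


session = {"host-a": None, "host-b": 2}  # host-b is powered off at slice 2
print(simulate_sender(session, total_slices=5, retries_until_drop=3))
print(simulate_sender(session, total_slices=5, retries_until_drop=100))
```

In this model the low limit costs two extra requests before host-b is dropped, while the high limit stalls the whole group for 100 requests on the same slice. If the real limit is very high, or the drop never triggers, that would match the behaviour described in the question.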
The other question I have is about what happens when the host is powered back on while the multicast session is still in the task list: the host boots into Partclone and waits perpetually for the multicast session to start, although the session is already in progress.
This is true. The developers would have to look into udp-sender to see if they can get a status message about who is still online. A lot depends on how udp-sender can communicate with the outside world.
Should the host eventually time out and leave the session and report back to the WebUI that the session has failed?
That is an excellent question. I would hope that if the target system hasn’t joined the multicast stream within 10 minutes, it would request that its task be canceled and then reboot.
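The kind of client-side timeout I have in mind could be sketched like this. To be clear, this is a hypothetical illustration in Python and not something FOG does today; the function name, the polling interval and the stream_started callback are all assumptions made up for the example. The clock and sleep functions are passed in so the logic can be tried without actually waiting 10 minutes.

```python
import time


def wait_for_stream(stream_started, timeout_s=600.0, poll_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until the stream starts or the timeout expires (hypothetical)."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if stream_started():
            return "joined"
        sleep(poll_s)
    # At this point a real client would ask the server to cancel its task
    # and then reboot; that part is left out of this sketch.
    return "timed_out"


class FakeClock:
    """Stand-in clock so the example finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
result = wait_for_stream(lambda: False, clock=fake, sleep=fake.sleep)
print(result, fake.now)
```

Whether the client could actually cancel its own task this way depends on what the FOG server exposes for that, which I have not checked.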