Multicast without registration starts OK, but hangs and disconnects clients due to timeout.
-
@Sebastian-Roth I may have an opportunity coming up soon to test this in our production environment, but without multicast working in 1.5.9-RC1.4 I will lose the chance. Barring some kind of fix in the very near future for this issue, can you advise on how to roll back to 1.5.8 so I can at least test functionality of multicast cross-subnet in terms of bandwidth usage, etc. when the opportunity presents itself?
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
I can at least test functionality of multicast cross-subnet in terms of bandwidth usage,
This can only happen if your subnet router supports IGMP proxying or you have an mrouter in place to manage the multicast traffic. Multicast traffic will not normally traverse a normal router.
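As a rough first check before involving the router: a group address in 224.0.0.0/24 is link-local and is never forwarded by a router, so it cannot work cross-subnet regardless of IGMP proxy or mrouter support. A minimal sketch for classifying an address; the function name `mcast_scope` and the labels it prints are made up for illustration and are not part of FOG or udpcast, and the input validation is deliberately loose:

```shell
#!/bin/sh
# Classify an IPv4 address by whether multicast sent to it may cross a router.
# Prints one of: not-multicast | link-local | routable
# The body runs in a subshell "( ... )" so the IFS and globbing changes stay local.
mcast_scope() (
    IFS=.
    set -f            # no filename expansion while splitting on dots
    set -- $1         # $1..$4 now hold the octets
    ok=1
    [ "$#" -eq 4 ] || ok=0
    for o in "$@"; do
        case $o in
            ''|*[!0-9]*|????*) ok=0 ;;          # empty, non-digit, or more than 3 digits
            *) [ "$o" -le 255 ] || ok=0 ;;      # octet out of range
        esac
    done
    if [ "$ok" -eq 0 ] || [ "$1" -lt 224 ] || [ "$1" -gt 239 ]; then
        echo "not-multicast"
    elif [ "$1" -eq 224 ] && [ "$2" -eq 0 ] && [ "$3" -eq 0 ]; then
        echo "link-local"     # 224.0.0.0/24: routers never forward these
    else
        echo "routable"       # may be forwarded if the router is configured for it
    fi
)
```

For example, `mcast_scope 224.0.0.1` prints `link-local` and `mcast_scope 239.0.0.57` prints `routable`. Here "routable" only means a router is allowed to forward the group; whether yours actually does still depends on the IGMP proxy or mrouter setup.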
-
@george1421 I’m pretty sure we’re covered on this front, but I will run this by our network guy.
-
@jmvela2x I am sorry but today has been a very busy day and I could not find the time to look at this yet. I will do so first thing in the morning!! I don’t recommend rolling back to 1.5.8 just now.
-
@jmvela2x Just a quick note on this. Now that I think more about it I am fairly sure I did test multicast once before pushing out the RC1 release. So I wonder if this could be a hiccup on your FOG server? Did you try to clear all tasks, restart the FOG server and then schedule a fresh multicast task yet?
Is this multicast scheduled through a group in the FOG Web UI?
-
@Sebastian-Roth I did try several reboots and cleared all the tasks each time to make doubly sure. The multicast session is being scheduled from the Images tab with a defined client count, as we are trying to avoid host registration because the hosts in our environment are not static per se. The amount of work required just makes FOG a nonviable solution for us if we have to do host registration for all our clients.
-
@Sebastian-Roth I will hold off on changing anything until I hear from you tomorrow. Thanks for the follow up.
-
@jmvela2x Good news and bad news. I found the issue but I still need a bit more time to debug and fix it. It’s getting late and I need to rush to work now. Please stay tuned and I will update as soon as I can.
-
@jmvela2x I just pushed a fix to dev-branch which should fix the issue. But I found that multicast from the PXE menu seems to still have an issue. Though it works, it spawns several udpcast sessions. But this issue has been in 1.5.8 already as I see from my testing. I will look into this and fix that soon as well.
For now you can pull the latest fix to get multicast as in 1.5.8 back again:
    sudo -i
    cd fogproject
    git checkout dev-branch
    git pull
    cd bin
    ./installfog.sh
-
@Sebastian-Roth That works for the time being. I can just manually cancel the tasks on completion since this is still in validation on our side. Thanks a ton!
-
@george1421 Finally heard back from IT and we have Catalyst 4510r+e in the router department. From some of the documentation I’ve been looking at it seems they have mroute and IGMP Proxy capability. Might need some guidance in that department if anyone has the knowledge or experience.
-
@jmvela2x I just pushed some more commits to dev-branch which should fix the multicast session issue altogether.
-
Hi,
I don’t know if this is the right thread, but I have stumbled upon a somewhat similar problem.
My hosts are all registered and in a group, and multicast for the group starts just fine. But after a while up to 50% of the machines suddenly restart and go back to the “waiting” screen where the image location is shown. The rest multicast up to 95-97 percent and then either do not finish or finish VERY slowly.
Any ideas?
dev-branch from Mon, 11.05.2020 - 12:00
FOG installation/update and server restart were done. No problems on unicast.
-
@spychodelics From what you describe your issue is not related to this topic. May I ask you to open a new one so we can focus on each of them and not mix things up? It helps a lot to not have separate issues discussed in one topic!
Post your FOG version (right lower corner of the web UI after login) - probably 1.5.9-RC1.8 but I wanna be sure. Also tell us more about your setup. Do you have FOG server and clients all in the same subnet, all connected on the exact same switch, or is it distributed across network equipment and possibly even subnets? Are clients all the same hardware? Please give us more details like make and model when opening the new topic.
-
@Sebastian-Roth I still see the same issue where the multicast task is stuck at “in progress” and continues to run according to the logs. The job finishes successfully, but the task does not auto delete. Updated to 1.5.9-RC1.8 and rebooted server. Checked twice, same issue as before.
-
@jmvela2x Thanks for testing and for the update. Interesting, because in my tests the task was “auto deleted” / cleaned up fine every time. We are still talking about a multicast session created via web UI -> images view and joined through the PXE menu?
Everything else works fine now? I mean do you see only one multicast task being started (we had it start as many as multicast clients would PXE boot)?
-
@Sebastian-Roth Yes, still talking ‘multicast session created via web UI -> images view and joined through the PXE menu.’ I had no other issues from 1.5.8 release that I could tell except for the failure to auto-delete multicast tasks.
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
I had no other issues from 1.5.8 release that I could tell except for the failure to auto-delete multicast tasks.
In my tests it created several multicast tasks (new udp-sender processes which you’d see in the log as well) whenever a new host joined via the PXE menu. While the clients still did deploy, it’s actually not proper multicast because each host had its own session. I am fairly sure this would have been the case in 1.5.8 as well, because after I found what was causing this I figured that it was a change I made long before 1.5.8 was released.
When you did multicast with 1.5.8, did the PXE booted clients all wait on the first blue partclone screen until the number of clients reached the number defined when creating the multicast session in the web UI?
-
@Sebastian-Roth I seem to recall seeing systems start without waiting in 1.5.8. I guess I didn’t realize that was not by design. Currently I see the hosts waiting and the Multicast log shows a single PID. However, the task does not auto delete and as I just tested, I am able to start another group of hosts to join the same session post completion (after the first multicast deployment completed).
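To cross-check the single PID from outside the Multicast log, here is a small sketch that counts running udp-sender processes. `count_procs` is a hypothetical helper name, and it assumes a Linux server where the process name is exactly `udp-sender`:

```shell
#!/bin/sh
# Count running processes whose command name is exactly $1 (default: udp-sender).
# Reads /proc/<pid>/comm directly, so it assumes a Linux host.
count_procs() {
    name=${1:-udp-sender}
    n=0
    for f in /proc/[0-9]*/comm; do
        # A process can exit between the glob and the read; skip it quietly.
        comm=$(cat "$f" 2>/dev/null) || continue
        if [ "$comm" = "$name" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running `count_procs` while clients are joining should print 1 for a single shared session; a number that grows with every host that PXE boots would match the several-sessions behaviour described earlier in this thread.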
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
However, the task does not auto delete and as I just tested, I am able to start another group of hosts to join the same session post completion (after the first multicast deployment completed).
I will look into this in the next days.