Multicast without registration starts OK, but hangs and disconnects clients due to timeout.
-
@jmvela2x Good news and bad news. I found the issue but I still need a bit more time to debug and fix it. It’s getting late and I need to rush to work now. Please stay tuned and I will update as soon as I can.
-
@jmvela2x I just pushed a fix to dev-branch which should fix the issue. But I found that multicast from the PXE menu still seems to have an issue: it works, but it spawns several udpcast sessions. This issue was already present in 1.5.8, as I see from my testing. I will look into this and fix it soon as well. For now you can pull the latest fix to get multicast working as in 1.5.8 again:
sudo -i
cd fogproject
git checkout dev-branch
git pull
cd bin
./installfog.sh
-
@Sebastian-Roth That works for the time being. I can just manually cancel the tasks on completion since this is still in validation on our side. Thanks a ton!
-
@george1421 Finally heard back from IT and we have Catalyst 4510r+e in the router department. From some of the documentation I’ve been looking at it seems they have mroute and IGMP Proxy capability. Might need some guidance in that department if anyone has the knowledge or experience.
-
@jmvela2x I just pushed some more commits to dev-branch which should fix the multicast session issue altogether.
-
Hi,
I don't know if this is the right thread, but I have stumbled upon a somewhat similar problem.
My hosts are all registered and in a group, and multicast for the group starts just fine. But after a while up to 50% of the machines suddenly restart and go back to the “waiting” screen where the image location is shown. The rest multicast up to 95-97 percent and then either do not finish or finish VERY slowly.
Any ideas?
dev-branch from Mon, 11.05.2020, 12:00
FOG installation/update and server restart happened. No problems on unicast.
-
@spychodelics From what you describe your issue is not related to this topic. May I ask you to open a new one so we can focus on each of them and not mix things up. It helps a lot to not have separate issues discussed in one topic!
Post your FOG version (right lower corner of the web UI after login) - probably 1.5.9-RC1.8 but I wanna be sure. Also tell us more about your setup. Do you have FOG server and clients all in the same subnet, all connected to the exact same switch, or is it distributed across network equipment and possibly even subnets? Are the clients all the same hardware? Please give us more details like make and model when opening the new topic.
-
@Sebastian-Roth I still see the same issue where the multicast task is stuck at “in progress” and continues to run according to the logs. The job finishes successfully, but the task does not auto delete. Updated to 1.5.9-RC1.8 and rebooted server. Checked twice, same issue as before.
-
@jmvela2x Thanks for testing and the update. Interesting, because in my tests it did “auto delete” / clean up the task fine every time. We are still talking about a multicast session created via web UI -> images view and joined through the PXE menu?
Everything else works fine now? I mean do you see only one multicast task being started (before, it started as many as there were multicast clients PXE booting)?
-
@Sebastian-Roth Yes, still talking ‘multicast session created via web UI -> images view and joined through the PXE menu.’ I had no other issues from 1.5.8 release that I could tell except for the failure to auto-delete multicast tasks.
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
I had no other issues from 1.5.8 release that I could tell except for the failure to auto-delete multicast tasks.
In my tests it created several multicast tasks (new udp-sender processes which you’d see in the log as well) whenever a new host joined via the PXE menu. While the clients still did deploy, it’s actually not proper multicast because each host had its own session. I am fairly sure this was also the case in 1.5.8, because after I found the cause I realized it was a change I made long before 1.5.8 was released.
When you did multicast with 1.5.8, did the PXE booted clients all wait on the first blue partclone screen until the number of clients reached the number defined when creating the multicast session in the web UI?
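One way to check for this on the server side is to count the running udp-sender processes while clients are joining. A proper multicast session should show a single sender, while the behaviour described above would show one per host. This is only a rough sketch: the function name count_udp_senders is made up for illustration and is not part of FOG, and it assumes a process listing in the format printed by `ps -eo pid,args`.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): count udp-sender processes in a
# process listing read from stdin, in the format of `ps -eo pid,args`.
count_udp_senders() {
    # Only count lines where the command itself is udp-sender (bare or with
    # a path), so a wrapping `sh -c ...` line is not counted twice.
    awk '$2 ~ /(^|\/)udp-sender$/ { n++ } END { print n + 0 }'
}

# Live check on the FOG server while clients are joining:
#   ps -eo pid,args | count_udp_senders
```

A result of 1 would match the single PID expected for one shared session. A result that grows with every client joining via the PXE menu would point to the per-host sessions described above.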
-
@Sebastian-Roth I seem to recall seeing systems start without waiting in 1.5.8. I guess I didn’t realize that was not by design. Currently I see the hosts waiting and the Multicast log shows a single PID. However, the task does not auto delete and as I just tested, I am able to start another group of hosts to join the same session post completion (after the first multicast deployment completed).
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
However, the task does not auto delete and as I just tested, I am able to start another group of hosts to join the same session post completion (after the first multicast deployment completed).
I will look into this in the next days.
-
@Sebastian-Roth Sounds good. Thanks!
-
@jmvela2x Updated to 1.5.9-RC1.11 today and am still seeing the stale task issue. Hosts are waiting for all to join session before multicast begins. Attached log sample: multicast_stale.log
-
@jmvela2x I still have this on my list but have not had the time to dig into it… Will do and let you know here.
-
@Sebastian-Roth I will give it a test today and let you know. Thanks!
-
@jmvela2x I’m still seeing the same issue after an update to 1.5.9-RC2.3 this morning. Attached multicast log. multicast05262020.log
-
@jmvela2x Thanks for testing and reporting back. Too bad I still haven’t worked it out. It did work in my test setup but I think I had the code changes for the other issue I found in place too when doing the tests. I should have merged them all in but I feared those would cause other problems.
Anyhow, I merged it all into dev-branch (RC2.6 as of now) and it would be great if you could give it another try.