Multicast without registration starts OK, but hangs and disconnects clients due to timeout.
-
I am able to successfully test multicast as detailed here: https://wiki.fogproject.org/wiki/index.php/Multicasting#Troubleshooting. It works with one and two clients. When I submit a multicast task via the images tab in the WebGUI, the session starts fine according to the log and I am able to boot two clients to it and begin multicasting. The first partition of my Win10 image deploys without any issues, but once the second partition reaches around 88% - 90%, the transfer rate shown in Partclone starts dropping rapidly and no more progress is seen. The FOGMulticastManager log cites client disconnections due to timeout. Any ideas?
-
What version of FOG are you using?
Are the fog (target) clients on the same subnet as the fog server?
Do you have IGMP snooping enabled on all of your network switches? It just needs to be enabled for the VLAN(s) where you are going to multicast.
I can say 95% of the multicasting issues are related to infrastructure and not the fog server. But that 5% we will still need to look into.
-
I am on FOG 1.5.8. The target clients are on the same subnet as the FOG server. I have been working with our network admin about the switch configuration and this was his last comment on my ticket (“ok Auto RP (PIM) has been configured as well as IGMP snooping, and MSDP has been enabled. Go ahead and run a test.”). For the time being the server and the target clients are all on the same subnet/VLAN/switch while I do my testing.
-
@jmvela2x The default timeout when FOG proceeds from one partition to the next in multicast mode is 10 seconds (code ref). We have seen one topic in the forums where this caused a problem some weeks ago. You might want to play with that to see if it can fix your issue.
Cancel all multicast tasks. Then edit /var/www/html/fog/lib/service/multicasttask.class.php, go to line 662 and change the code from this:
( $i == 0 ? $maxwait * 60 : 10 ) ),
to
( $i == 0 ? $maxwait * 60 : 30 ) ),
Only change the value from “10” to “30”. Save the file and restart your whole FOG server. Create a new multicast task and see if it’s any better.
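If you would rather not edit by hand, here is a minimal shell sketch of the same change. The helper name bump_timeout is made up for illustration, and it assumes the line in your copy of the file matches the text quoted above exactly (it may differ in other FOG versions). It keeps a .bak copy next to the original and prints the changed line so you can check it by eye.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): raise the between-partition
# timeout from 10 to 30 seconds in the file given as the first argument.
bump_timeout() {
    file="$1"
    # -i.bak edits in place and leaves the original as "$file.bak".
    # The pattern only matches the exact text quoted in the post above.
    sed -i.bak 's/\$maxwait \* 60 : 10 )/$maxwait * 60 : 30 )/' "$file"
    # Print the affected line (with its line number) for a visual check.
    grep -n 'maxwait \* 60 :' "$file"
}

# Usage on the FOG server, as root (path taken from the post above):
# bump_timeout /var/www/html/fog/lib/service/multicasttask.class.php
```

If grep prints nothing, the pattern did not match and the file is unchanged; in that case edit it by hand as described. Either way the server still needs the restart mentioned above.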
-
@Sebastian-Roth said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
/var/www/html/fog/lib/service/multicasttask.class.php
Wouldn’t the multicast manager service need to be restarted to pick this change up? (Asking because I’m unsure)
-
@george1421 Yes, it needs a restart! I did suggest restarting the whole FOG server just to make sure no old udpcast sessions are still running… Good that you asked, because my hint about restarting might not have been obvious enough.
-
@Sebastian-Roth said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
restart the whole FOG server
Ah yes, I see it now. Reading too fast, all them words and stuff. Sorry…
-
@Sebastian-Roth I changed the code you referenced, removed all the old multicast tasks, rebooted the server, and am running a new 2 client multicast session now. I hope it works, but the issue is not hanging between partitions as much as it hangs most of the way through the second partition after it picks up the task and starts the image deployment. Will advise.
-
@jmvela2x We’ll see if it does make a change or not. If not then I may ask you to use two other machines for your next test just to make sure this is not a hardware issue.
-
@jmvela2x I still see the same issue. It makes it to the second partition without issue and starts to deploy. It gets to somewhere in the high 80% range and then the ‘Rate:’ starts dropping and no more progress is observed. The output from <sudo service FOGMulticastManager status> shows two connections, start transfer, two disconnections, two connections, start transfer, then two disconnections and a “transfer complete” message. After that it just dies in the middle of the transfer for some reason I cannot fathom.
-
@Sebastian-Roth I’ll start on getting another couple of machines set up to test on. For what it’s worth, these two machines do unicast just fine. Thanks for the hands-on help and for proving me right when I told management the devs are really responsive in the forums.
-
@jmvela2x Same issue as before @Sebastian-Roth. The network admin should be on site later today to do some more looking into the infrastructure side of things. In the meantime is there anything else I can try from my end?
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
ok Auto RP (PIM) has been configured
So are these Cisco Catalyst switches?
-
@george1421 The switch is a Nexus 9372TX, but my network guy is onsite now and he told me the switch (the Nexus) in the test lab is not our POR in the main rack space where this solution will ultimately be deployed so he’s going to swap it out (with a Catalyst we use in our prod lab) and we will retest and update this thread with our results.
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
Same issue as before… In the meantime is there anything else I can try from my end?
Ok, looks like it’s not the hosts/clients. You might want to try manual multicast deployment testing along these lines: https://wiki.fogproject.org/wiki/index.php/Multicasting#Something_else_to_try
I have to say that I have not done this myself in a long time and I am not exactly sure if the commands are still used exactly like this in FOG. Give it a try with the second partition (that seems to be more problematic than the first one) and let us know if you need assistance with the commands.
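To give a rough idea of what such a manual test looks like, here is a dry-run sketch that only prints the udpcast commands rather than running them. The helper names, the image path, port base 9000 and the interface name are all assumptions for illustration, and the exact flags FOG itself uses may differ, so compare against the wiki page before running anything. The receiver discards the data, so this would only exercise the network path, not Partclone or the disk.

```shell
#!/bin/sh
# Dry-run helpers: they only PRINT commands, they do not start udpcast.
# Image path, port base and interface name are assumptions; adjust them.

# Print the server-side command for one partition image file.
sender_cmd() {
    image="$1"      # partition image file, e.g. the second partition
    receivers="$2"  # how many clients to wait for before starting
    iface="$3"      # server network interface, e.g. eth0
    printf 'udp-sender --file %s --min-receivers %s --portbase 9000 --interface %s --ttl 32 --nokbd\n' \
        "$image" "$receivers" "$iface"
}

# Print the client-side command; received data goes to /dev/null,
# so nothing is written to the client disk.
receiver_cmd() {
    printf 'udp-receiver --portbase 9000 --ttl 32 --nokbd --file /dev/null\n'
}

# Example: show both commands for a placeholder image name.
sender_cmd '/images/IMAGENAME/d1p2.img' 2 eth0
receiver_cmd
```

Run the printed sender command on the server and the receiver command on each client booted into a debug task. If the transfer stalls here as well, that points at the network rather than at FOG's task handling.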
-
@Sebastian-Roth I will add this to the list of testing after the switch is swapped out for the prod Catalyst if I am still having issues.
-
@jmvela2x Multicast appears to be working after swapping out the switch. I was successfully able to image two clients concurrently sans registration (this is important to the solution as host registration would cause us a lot of extra work) in our environment.
I did notice the task is still showing in active tasks and has not cleared out. Is that an issue you are aware of?
-
@jmvela2x said:
I did notice the task is still showing in active tasks and has not cleared out. Is that an issue you are aware of?
This is a known issue in FOG 1.5.8 and should be fixed in the latest development version. You are more than welcome to update to 1.5.9-RC1 or even the current developer version as we are reliant on people actually testing the release candidate version to find bugs we can fix before the next release.
Best way is to use git:
sudo -i
git clone https://github.com/FOGProject/fogproject/
cd fogproject
git checkout dev-branch
cd bin
./installfog.sh
-
@Sebastian-Roth I’ll give it a shot tomorrow and let you know. What is the typical timeline for a new version making it from RC to an actual official release?
-
@jmvela2x said:
What is the typical timeline for a new version making it from RC to an actual official release?
I have to say that we don’t have a strict release cycle established yet. The number of people testing and reporting bugs is unpredictable, so it’s hard to predict how many bug reports we get and when. It also depends on how serious the bugs are (hard or easy to fix), since we are a small team of developers, and on how responsive bug reporters/testers are. I would hope we get the next release out by the end of May.