Multicast without registration starts OK, but hangs and disconnects clients due to timeout.
-
@Sebastian-Roth said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
restart the whole FOG server
Ah yes, I see it now. I was reading too fast through all them words and stuff. Sorry…
-
@Sebastian-Roth I changed the code you referenced, removed all the old multicast tasks, rebooted the server, and am running a new 2 client multicast session now. I hope it works, but the issue is not hanging between partitions as much as it hangs most of the way through the second partition after it picks up the task and starts the image deployment. Will advise.
-
@jmvela2x We’ll see if it does make a change or not. If not then I may ask you to use two other machines for your next test just to make sure this is not a hardware issue.
-
@jmvela2x I still see the same issue. It makes it to the second partition without issue and starts to deploy. It gets to somewhere in the high 80% range and then the ‘Rate:’ starts dropping and no more progress is observed. The output from <sudo service FogMulticastManager status> shows two connections, start transfer, two disconnections, two connections, start transfer, then two disconnections and a “transfer complete” message. After that it just dies in the midst of that for some reason I cannot fathom.
-
@Sebastian-Roth I’ll start on getting another couple of machines set up to test on. For what it’s worth, these two machines do unicast just fine. Thanks for the hands-on help and for proving me right when I told management the devs are really responsive in the forums.
-
@jmvela2x Same issue as before @Sebastian-Roth. The network admin should be on site later today to do some more looking into the infrastructure side of things. In the meantime is there anything else I can try from my end?
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
ok Auto RP (PIM) has been configured
So are these Cisco Catalyst switches?
-
@george1421 The switch is a Nexus 9372TX, but my network guy is onsite now and he told me the switch (the Nexus) in the test lab is not our POR in the main rack space where this solution will ultimately be deployed so he’s going to swap it out (with a Catalyst we use in our prod lab) and we will retest and update this thread with our results.
-
@jmvela2x said in Multicast without registration starts OK, but hangs and disconnects clients due to timeout.:
Same issue as before… In the meantime is there anything else I can try from my end?
Ok, looks like it’s not the hosts/clients. You might want to try manual multicast deployment testing along these lines: https://wiki.fogproject.org/wiki/index.php/Multicasting#Something_else_to_try
I have to say that I have not done this myself in a long time and I am not sure whether the commands are still used exactly like this in FOG. Give it a try with the second partition (which seems to be more problematic than the first one) and let us know if you need assistance with the commands.
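For a manual test of the second partition, the sender command can be put together from the flags that FOGMulticastManager itself logs further down in this thread. The following is only a minimal sketch: the helper name `build_udp_sender_command` is hypothetical, the flag set is copied from that logged command rather than from the wiki page, and the matching receiver command on the clients is not covered here. It only builds and prints the command string; it does not run anything.

```python
import shlex


def build_udp_sender_command(interface, receivers, max_wait, rdv_address,
                             portbase, image_file,
                             sender="/usr/local/sbin/udp-sender"):
    """Build a udp-sender command line using the flags seen in the FOG log.

    Assumption: the flags mirror the command that FOGMulticastManager logged
    in this thread; other FOG versions may use a different set.
    """
    argv = [
        sender,
        "--interface", interface,
        "--min-receivers", str(receivers),
        "--max-wait", str(max_wait),
        "--mcast-rdv-address", rdv_address,
        "--portbase", str(portbase),
        "--full-duplex",
        "--ttl", "32",
        "--nokbd",
        "--nopointopoint",
        "--file", image_file,
    ]
    # shlex.join quotes arguments only when the shell would need it.
    return shlex.join(argv)


if __name__ == "__main__":
    # Values taken from the log posted in this thread (second partition).
    print(build_udp_sender_command(
        interface="eno1",
        receivers=2,
        max_wait=30,
        rdv_address="10.132.81.150",
        portbase=53110,
        image_file="/images/Ubuntu-16.04-Legacy/d1p2.img",
    ))
```

Swap in your own interface, server address, port base and image path before pasting the printed command into a root shell on the FOG server.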
-
@Sebastian-Roth I will add this to the list of testing after the switch is swapped out for the prod Catalyst if I am still having issues.
-
@jmvela2x Multicast appears to be working after swapping out the switch. I was successfully able to image two clients concurrently sans registration (this is important to the solution as host registration would cause us a lot of extra work) in our environment.
I did notice the task is still showing in active tasks and has not cleared out. Is that an issue you are aware of?
-
@jmvela2x said:
I did notice the task is still showing in active tasks and has not cleared out. Is that an issue you are aware of?
This is a known issue in FOG 1.5.8 and should be fixed in the latest development version. You are more than welcome to update to 1.5.9-RC1 or even the current developer version as we are reliant on people actually testing the release candidate version to find bugs we can fix before the next release.
Best way is to use git:
sudo -i
git clone https://github.com/FOGProject/fogproject/
cd fogproject
git checkout dev-branch
cd bin
./installfog.sh
-
@Sebastian-Roth I’ll give it a shot tomorrow and let you know. What is the typical timeline for a new version making it from RC to an actual official release?
-
@jmvela2x said:
What is the typical timeline for a new version making it from RC to an actual official release?
I have to say that we don’t have a strict release cycle established yet. The number of people testing and reporting bugs is unpredictable, so it’s hard to predict how many bug reports we get and when. It also depends on how serious the bugs are (hard or easy to fix), since we are a small team of developers, and on how responsive bug reporters/testers are. I would hope we get the next release out by end of May.
-
@jmvela2x I updated to 1.5.9-RC1.4, changed line 662 per your earlier instructions and tried to kick off a multicast task. The log shows the task start, then completed, then killed, then completed again, and then it removes itself from active tasks. This seems like a step backwards. Am I missing something?
-
@jmvela2x Can you please post the relevant part of the log here?
-
[05-05-20 12:08:27 pm]
==================================
===== Free Opensource Ghost ======
==================================
============ Credits =============
= https://fogproject.org/Credits =
==================================
== Released under GPL Version 3 ==
==================================
[05-05-20 12:08:27 pm] Interface Ready with IP Address: 10.132.81.150
[05-05-20 12:08:27 pm] Interface Ready with IP Address: 127.0.0.1
[05-05-20 12:08:27 pm] Interface Ready with IP Address: 127.0.1.1
[05-05-20 12:08:27 pm] Interface Ready with IP Address: 192.168.122.1
[05-05-20 12:08:27 pm] Interface Ready with IP Address: f223pxefog.fm.intel.com
[05-05-20 12:08:27 pm] * Starting MulticastManager Service
[05-05-20 12:08:27 pm] * Checking for new items every 10 seconds
[05-05-20 12:08:27 pm] * Starting service loop
[05-05-20 12:08:27 pm] * No new tasks found
[05-05-20 12:08:38 pm] * No new tasks found
[05-05-20 12:08:48 pm] * No new tasks found
[05-05-20 12:08:58 pm] * No new tasks found
[05-05-20 12:09:08 pm] * No new tasks found
[05-05-20 12:09:18 pm] * No new tasks found
[05-05-20 12:09:28 pm] * No new tasks found
[05-05-20 12:09:38 pm] * No new tasks found
[05-05-20 12:09:48 pm] | Task ID: 22 Name: Test is new
[05-05-20 12:09:48 pm] | Task ID: 22 Name: Test image file found, file: /images/Ubuntu-16.04-Legacy
[05-05-20 12:09:48 pm] | Task ID: 22 Name: Test 2 clients found
[05-05-20 12:09:48 pm] | Task ID: 22 Name: Test sending on base port 53110
[05-05-20 12:09:48 pm] | Command: /usr/local/sbin/udp-sender --interface eno1 --min-receivers 2 --max-wait 36000 --mcast-rdv-address 10.132.81.150 --portbase 53110 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Ubuntu-16.04-Legacy/d1p1.img;/usr/local/sbin/udp-sender --interface eno1 --min-receivers 2 --max-wait 30 --mcast-rdv-address 10.132.81.150 --portbase 53110 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Ubuntu-16.04-Legacy/d1p2.img;
[05-05-20 12:09:48 pm] | Task ID: 22 Name: Test has started
[05-05-20 12:09:58 pm] | Task ID: 22 Name: Test has been completed
[05-05-20 12:09:58 pm] | Task ID: 22 Name: Test has been killed
[05-05-20 12:09:58 pm] | Task ID: 22 Name: Test is now completed
[05-05-20 12:10:08 pm] * No new tasks found
[05-05-20 12:10:18 pm] * No new tasks found
[05-05-20 12:10:28 pm] * No new tasks found
[05-05-20 12:10:38 pm] * No new tasks found
[05-05-20 12:10:48 pm] * No new tasks found
[05-05-20 12:10:58 pm] * No new tasks found
[05-05-20 12:11:08 pm] * No new tasks found
[05-05-20 12:11:18 pm] * No new tasks found
[05-05-20 12:11:28 pm] | Task ID: 23 Name: Test is new
[05-05-20 12:11:28 pm] | Task ID: 23 Name: Test image file found, file: /images/Ubuntu-16.04-Legacy
[05-05-20 12:11:28 pm] | Task ID: 23 Name: Test 2 clients found
[05-05-20 12:11:28 pm] | Task ID: 23 Name: Test sending on base port 60722
[05-05-20 12:11:28 pm] | Command: /usr/local/sbin/udp-sender --interface eno1 --min-receivers 2 --max-wait 36000 --mcast-rdv-address 10.132.81.150 --portbase 60722 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Ubuntu-16.04-Legacy/d1p1.img;/usr/local/sbin/udp-sender --interface eno1 --min-receivers 2 --max-wait 30 --mcast-rdv-address 10.132.81.150 --portbase 60722 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/Ubuntu-16.04-Legacy/d1p2.img;
[05-05-20 12:11:28 pm] | Task ID: 23 Name: Test has started
[05-05-20 12:11:38 pm] | Task ID: 23 Name: Test has been completed
[05-05-20 12:11:38 pm] | Task ID: 23 Name: Test has been killed
[05-05-20 12:11:38 pm] | Task ID: 23 Name: Test is now completed
[05-05-20 12:11:48 pm] * No new tasks found
[05-05-20 12:11:58 pm] * No new tasks found
[05-05-20 12:12:08 pm] * No new tasks found
[05-05-20 12:12:18 pm] * No new tasks found
[05-05-20 12:12:29 pm] * No new tasks found
[05-05-20 12:12:39 pm] * No new tasks found
[05-05-20 12:12:49 pm] * No new tasks found
[05-05-20 12:12:59 pm] * No new tasks found
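To put a number on how quickly the tasks in this log go from "has started" to "has been completed", a small parser can pair those two events per task ID. This is a rough sketch, not part of FOG: the function names are made up for illustration, and it assumes the timestamp is month-day-year with a 12-hour clock, which this log cannot confirm since month and day are both 05.

```python
import re
from datetime import datetime

# Matches lines such as:
# [05-05-20 12:09:48 pm] | Task ID: 22 Name: Test has started
LINE_RE = re.compile(
    r"\[(?P<ts>\d{2}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} [ap]m)\]\s+\|\s+"
    r"Task ID: (?P<task>\d+) Name: .*?has (?P<event>started|been completed)\s*$"
)


def parse_timestamp(text):
    """Parse 'MM-DD-YY hh:mm:ss am|pm' (assumed month-first, 12-hour clock)."""
    date_part, time_part, ampm = text.split()
    month, day, year = (int(x) for x in date_part.split("-"))
    hour, minute, second = (int(x) for x in time_part.split(":"))
    hour = hour % 12 + (12 if ampm == "pm" else 0)
    return datetime(2000 + year, month, day, hour, minute, second)


def task_durations(lines):
    """Return {task_id: seconds from 'has started' to 'has been completed'}."""
    started = {}
    durations = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # banner, "No new tasks found", killed/now-completed lines
        task = int(match.group("task"))
        stamp = parse_timestamp(match.group("ts"))
        if match.group("event") == "started":
            started[task] = stamp
        elif task in started:
            durations[task] = (stamp - started[task]).total_seconds()
    return durations


SAMPLE = [
    "[05-05-20 12:09:48 pm] | Task ID: 22 Name: Test has started",
    "[05-05-20 12:09:58 pm] | Task ID: 22 Name: Test has been completed",
    "[05-05-20 12:09:58 pm] | Task ID: 22 Name: Test has been killed",
    "[05-05-20 12:11:28 pm] | Task ID: 23 Name: Test has started",
    "[05-05-20 12:11:38 pm] | Task ID: 23 Name: Test has been completed",
]

if __name__ == "__main__":
    for task, seconds in task_durations(SAMPLE).items():
        print(f"Task {task}: {seconds:.0f} s between start and completion")
```

For the two tasks above the gap is 10 seconds each, which matches the service's 10-second check interval shown at the top of the log.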
-
@jmvela2x I didn’t even have the chance to join the clients to the session.
-
@jmvela2x I see something potentially relevant in the status output of the FOGMulticastManager service.
“PHP Warning: proc_get_status(): supplied resource is not a”(vailable) I presume is the ending, but it’s cut off in the terminal.
-
@Sebastian-Roth I may have an opportunity coming up soon to test this in our production environment, but without multicast working in 1.5.9-RC1.4 I will lose the chance. Barring some kind of fix for this issue in the very near future, can you advise on how to roll back to 1.5.8 so I can at least test cross-subnet multicast functionality (bandwidth usage, etc.) when the opportunity presents itself?