Multicast session is not starting
-
Hi,
We are experiencing an issue at a new location where the Multicast session is not starting. See the attached picture. It goes all the way to the partclone screen and just hangs there. We have verified that a Unicast deploy works just fine, using the same image, on the same machines. We can even image multiple systems with that image via Unicast. We've been testing with 2 systems, and have also tried 2 different systems.
We set up FOG with default settings on Ubuntu 20.04 LTS, and then tried again with 18.04 LTS, changing only the IP and passwords, really. Any help would be appreciated!
-
@dmcadams Has multicast imaging ever worked on your campus? Multicasting is a different beast than unicast imaging. Multicasting relies a lot on how your network infrastructure is set up.
Are the target computers on the same IP subnet as the FOG server?
-
@george1421
At this site, we have not had Multicast working. We do have 6 other sites using the same type of setup and networking gear. Those sites have Multicast working, and we don't recall needing to do anything special to get it going, other than tweaking some performance settings like IGMP.
Yes, they are on the same IP subnet as the FOG server. Here is a snip of the multicast.log from an attempt I just made:
[09-30-21 1:44:21 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is new
[09-30-21 1:44:21 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast image file found, file: /images/TournamentImage
[09-30-21 1:44:21 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast 2 clients found
[09-30-21 1:44:21 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast sending on base port 59270
[09-30-21 1:44:21 pm] | Command: /usr/local/sbin/udp-sender --interface ens160 --min-receivers 2 --max-wait 600 --mcast-rdv-address 192.168.48.4 --portbase 59270 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/TournamentImage/d1p1.img;/usr/local/sbin/udp-sender --interface ens160 --min-receivers 2 --max-wait 10 --mcast-rdv-address 192.168.48.4 --portbase 59270 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/TournamentImage/d1p2.img;/usr/local/sbin/udp-sender --interface ens160 --min-receivers 2 --max-wait 10 --mcast-rdv-address 192.168.48.4 --portbase 59270 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/TournamentImage/d1p3.img;/usr/local/sbin/udp-sender --interface ens160 --min-receivers 2 --max-wait 10 --mcast-rdv-address 192.168.48.4 --portbase 59270 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/TournamentImage/d1p4.img;
[09-30-21 1:44:21 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast has started
[09-30-21 1:44:31 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:44:41 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:44:51 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:45:01 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:45:11 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:45:21 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:45:31 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:45:41 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:45:51 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
[09-30-21 1:46:01 pm] | Task ID: 5 Name: Multi-Cast Task - Multicast is already running with pid: 3317
-
@dmcadams said in Multicast session is not starting:
We do have 6 other sites using the same type of setup and networking gear.
Exact same switches and configuration?
-
@dmcadams Are the FOG server and target systems on the same IP subnet?
Yes, IGMP snooping is needed. But this issue is more fundamental.
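As a side note on IGMP: the switch's snooping table only matters if the hosts actually join the multicast group. On the Linux side you can list which groups each interface has joined using iproute2 (a generic diagnostic, not FOG-specific; run it on a receiver while udp-receiver is waiting):

```shell
# List multicast group memberships per interface. Every multicast-capable
# interface joins at least the all-hosts group 224.0.0.1; a receiver sitting
# in a udpcast session should additionally show that session's group here.
ip maddr show
```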
-
@sebastian-roth
Yes, same hardware and same configuration, at least the same as at 2 other sites.
A couple of other sites have some differences. Is there a better log I can look at to see why it's just hanging there? I was hoping for something like one of those debug modes but for Multicast, or that the multicast log would show something.
-
@george1421
Yes, they are all on the 192.168.48.0/24 network.
-
@dmcadams First make sure no older udpcast processes are still running on the server. To do that, cancel all your multicast tasks in the FOG web UI, then use the following commands to check for and kill old processes:
ps ax | grep udp-sender
killall -9 udp-sender
From the udp-sender commands we see that you have the FOG setting FOG_MULTICAST_RENDEZVOUS set to 192.168.48.4. Are you sure this is correct?
For manually testing multicast in your network you can do the following:
- Schedule a normal deploy task (not multicast) for two test hosts and let them do the imaging. This step is just to ensure they have the partition layout set correctly for the next steps. If you have deployed this image to the test machines before then you can skip this first step.
- Now schedule a debug UNICAST deploy task for the two test machines in the FOG web UI.
- Start a udpcast sender on the FOG server console:
/usr/local/sbin/udp-sender --interface ens160 --min-receivers 2 --max-wait 600 --mcast-rdv-address 192.168.48.4 --portbase 59270 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/TournamentImage/d1p1.img
- Boot up the two machines to the point where you get to the terminal console and enter the following command manually:
udp-receiver --nokbd --portbase 59270 --ttl 32 --mcast-rdv-address 192.168.48.4 | gunzip > /dev/sda1
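The pipe through gunzip in that receiver command is there because the image partition files are stored gzip-compressed; udp-receiver just streams bytes and gunzip decompresses them on the way to the disk. As a minimal local illustration of the same decompress-through-a-pipe idea (hypothetical file name, no multicast or disk writing involved):

```shell
# Create a small gzip-compressed file standing in for a d1p1.img-style
# image file, then decompress it through a pipe the same way the
# udp-receiver | gunzip > /dev/... pipeline does.
printf 'partition-data' | gzip > /tmp/demo.img
gunzip < /tmp/demo.img
```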
-
@sebastian-roth
Thanks for the great information. We will give that a try. Also, FOG_MULTICAST_RENDEZVOUS was actually one of our last attempts; I will revert it back to the default (blank) first and remove that portion from the commands you sent. These client machines also have NVMe drives, so on the receiver side, instead of “/dev/sda1” should we use “/dev/nvme0”?
-
@dmcadams said in Multicast session is not starting:
These client machines also have NVMe drives, so on the receiver side, instead of “/dev/sda1” should we use “/dev/nvme0”?
Yes, for the first partition use
/dev/nvme0n1p1