Wiki errors - "Troubleshooting a multicast"
-
@3mu Looking at the code (I’m not a programmer), it looks like the multicast task service is what creates the udp-sender task and its command syntax. I still haven’t found which database record is used to source the interface name.
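If you want to see exactly what command line FOG generated, the multicast log is the quickest place to look. Something like this should show it (assuming the default log location, which may differ on your install):

```bash
# Show the most recent udp-sender / interface lines the multicast service logged.
# /opt/fog/log/multicast.log is the usual default location; adjust if yours differs.
grep -iE 'udp-sender|interface' /opt/fog/log/multicast.log | tail -n 5

# Or watch it live while creating a new multicast task:
tail -f /opt/fog/log/multicast.log
```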
-
@3mu @george1421 Those settings were used in the past, but nowadays FOG tries to determine the interface from the system no matter what setting you have. I know we should have removed those settings at some point, but I guess someone forgot. Code where the interface is being determined for multicast: 1, 2, 3
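Roughly speaking, the idea is to look up which interface actually carries the node’s IP rather than trusting the stored setting. In shell terms it boils down to something like this (just an illustration of the idea, not the actual PHP code linked above):

```bash
#!/bin/bash
# Illustration only: find the interface that carries a given local IP.
NODE_IP="10.0.0.203"   # example: the storage node IP from this thread

# "ip -o -4 addr show" prints one line per IPv4 address; field 2 is the
# device name and field 4 is the address in CIDR form (e.g. 10.0.0.203/24).
IFACE=$(ip -o -4 addr show | awk -v ip="$NODE_IP" '
    { split($4, a, "/"); if (a[1] == ip) { print $2; exit } }')

echo "udp-sender --interface ${IFACE:-<nothing found>}"
```

If a lookup like that picks up the wrong token (or nothing at all), you end up with a mangled --interface argument like the one in the log.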
@3mu Please run the following command as root and post output here:
/sbin/ip route
The IP address configured for the storage node might play a role here too. Please post the IP set for the storage node as well.
-
@george1421 - Current uptime was 1 day 14 hours. I have just now:
- Cleared all tasks, rebooted and created a new multicast task - the log file still says “--interface dev”.
- Cleared the tasks, changed the multicast interface to “eth0” (which doesn’t exist) and created a new multicast task - same result.
- Cleared the tasks, rebooted and created a new multicast task - same result.
The only things running on this server are FOG and dnsmasq. It is a VM under KVM and has 4GB RAM and a single processor. I have done a single in-place upgrade from 1.5.7 since my original install. I did have to recover the MySQL password (I was distracted and lost the new password before I could record it), but I recovered it without issue.
I have also updated the fogservice.class.php and FOGSnapinReplicator.service files as per the “Interface not ready, waiting for it to come up” post.
-
@3mu said in Wiki errors - "Troubleshooting a multicast":
I have also updated the fogservice.class.php and FOGSnapinReplicator.service files as per the “Interface not ready, waiting for it to come up” post.
Good point. I just remembered that too.
Please run the following command as root and post output here:
/sbin/ip route
The IP address configured for the storage node might play a role here too. Please post the IP set for the storage node as well.
-
@Sebastian-Roth said in Wiki errors - "Troubleshooting a multicast":
/sbin/ip route
default via 10.0.0.138 dev ens3 proto dhcp src 10.0.0.203 metric 100
10.0.0.0/24 dev ens3 proto kernel scope link src 10.0.0.203
10.0.0.138 dev ens3 proto dhcp scope link src 10.0.0.203 metric 100
(run under sudo)
Storage Node config:
-
@3mu said in Wiki errors - "Troubleshooting a multicast":
default via 10.0.0.138 dev ens3 proto dhcp src 10.0.0.203 metric 100
Here we’ve got it. You seem to have configured your network interface ens3 via DHCP - at least that’s what the output suggests. While our code should not fail to find the right interface, I still wonder why you would configure a server’s IP via DHCP!?
Just configure the same IP statically in the Linux network settings, restart your FOG server, and it will most probably work!
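If the VM uses NetworkManager, something along these lines would do it (connection name, prefix and DNS below are guesses based on your ip route output, so adjust to your setup; on other network stacks use the equivalent static config):

```bash
# Hypothetical example: pin the current DHCP lease as a static config
# via NetworkManager. "ens3" is assumed to also be the connection name.
nmcli connection modify ens3 \
    ipv4.method manual \
    ipv4.addresses 10.0.0.203/24 \
    ipv4.gateway 10.0.0.138 \
    ipv4.dns 10.0.0.138
nmcli connection up ens3
```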
-
@Sebastian-Roth - I look after a network with over 600 servers at work. The ones that cause me problems are the ones with static IP addresses - generally because somebody makes a typo when they are configuring an interface. We also do a lot of migrations, mergers and other changes, so reserved addresses in DHCP make that much easier, as well as enforcing consistency and a basic level of self-documentation. DHCP also makes it easier to deploy in environments where the user does not have control of the network.
I’m sorry for taking up your time, @george1421, and thank you both.
-
@3mu I will be looking into fixing this issue! Though I can’t promise you when the next release will be out. Would you want me to post the fix here so you could manually add it?
-
That would be great, thanks @Sebastian-Roth. I’m happy to test.
-
@3mu Sorry for the delay. I just had a play with this and I think it’s best to go for a simple fix: https://github.com/FOGProject/fogproject/commit/21460c1d8ba7dae1b2988b9287c188595ce01e9d
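If you want to try it before the next release, one way to pull that single commit in is to grab it as a patch from GitHub and apply it to your local source tree (this assumes you still have the fogproject checkout you installed from; the path below is hypothetical):

```bash
# Download the linked commit as a patch, apply it to the local
# fogproject tree, then re-run the installer to roll it out.
cd ~/fogproject   # hypothetical path to the source tree you installed from
curl -sL https://github.com/FOGProject/fogproject/commit/21460c1d8ba7dae1b2988b9287c188595ce01e9d.patch \
    | git apply
cd bin && sudo ./installfog.sh
```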
-
Thanks, @Sebastian-Roth. I’ll test as soon as I finish sorting out a few disasters - hopefully within the next day.
-
@Sebastian-Roth - Sorry for the delay - some other unrelated issues and then a pandemic to deal with! The fix works perfectly.