Wiki errors - "Troubleshooting a multicast"
-
@3mu Thanks, updated the wiki.
-
@george1421 said in Wiki errors - "Troubleshooting a multicast":
/images/T500Linux19-10desktopclean
Thanks, @george1421. I’ve changed the image name, but the UDPCAST INTERFACE is already set to ens3 (my FOG server is running under KVM). I’ve tried changing it to a dummy name, saving and then changing it back, but it is still reporting “--interface dev”. That explains why the troubleshooting tests work - they use the correct interface.
I’ve also noticed that the FOG_UDPCAST_STARTINGPORT changes. For my first test this morning it was 56494. I rebooted and now it is 62822.
I’m expecting to image no more than 20 machines at once. It won’t be often so it’s not critical that I get it working, but it could help others if I do.
-
@3mu said in Wiki errors - "Troubleshooting a multicast":
still reporting “--interface dev”
When you start a multicast stream, run the ps command on the Linux server console to see the exact command line the running udp-sender process was started with:
ps aux|grep udp-send
See if the running command also has the incorrect interface. There is also a multicast service running in the background that may be holding onto the value it had when it was started. But you said it was set correctly in the FOG settings to begin with???
The storage node configuration for the master node also specifies an interface. But multicasting should be using the one from the FOG configuration; I believe the storage node setting is only used for bandwidth calculations.
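If you’d rather not read through the ps output by eye, a small helper can pull the --interface value out of a udp-sender command line, such as the one FOG writes to the multicast log. This is a minimal sketch and not part of FOG. The function name udp_interface is hypothetical, and it assumes the flag and its value are separated by a space (“--interface NAME”).

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): print the value that follows
# --interface in a udp-sender command line passed as a single string.
# Exits non-zero if no --interface flag is present.
udp_interface() {
  printf '%s\n' "$1" | awk '
    {
      # Scan the whitespace-separated words; the name is the next word.
      for (i = 1; i < NF; i++)
        if ($i == "--interface") { print $(i + 1); found = 1; exit }
    }
    END { exit !found }
  '
}

# Usage: check every running udp-sender process
# (pgrep -a prints the PID followed by the full command line).
# pgrep -a udp-sender | while read -r pid cmd; do
#   echo "$pid: $(udp_interface "$cmd")"
# done
```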
-
@george1421 - I can’t remember if it picked up the correct interface on install or if I changed it later when preparing to try multicasting, but it has been set to ens3 for at least a few weeks.
ps only gives:
[username] 18913 0.0 0.0 13136 1092 pts/0 S+ 01:39 0:00 grep --color=auto udp-send
The multicast task appears as “queued” for each host in the Active Tasks pane, and the multicast log continues as if all is well:
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 image file found, file: /images/T500Linux19-10desktopclean
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 2 clients found
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 sending on base port 61792
[04-07-20 11:59:25 am] | Command: /usr/local/sbin/udp-sender --interface dev --min-receivers 2 --max-wait 600 --portbase 61792 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/T500Linux19-10desktopclean/d1p1.img;
[04-07-20 11:59:25 am] | Task ID: 10 Name: Multi-Cast Task - Multicast t500 has started
If I start a multicast from the command line with
sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
I get
root 20233 0.0 0.1 66696 4268 pts/1 S+ 01:48 0:00 sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
root 20234 0.0 0.0 8884 896 pts/1 S+ 01:48 0:00 udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
[username] 20351 0.0 0.0 13136 1000 pts/0 S+ 01:49 0:00 grep --color=auto udp-send
-
@3mu said in Wiki errors - "Troubleshooting a multicast":
ps only gives:
[username] 18913 0.0 0.0 13136 1092 pts/0 S+ 01:39 0:00 grep --color=auto udp-send
At first sight I would guess that it tries to start udp-sender but fails because it can’t find the interface “dev”, and then quits.
-
@3mu Will you show us a screenshot of the FOG Configuration->Multicast settings page? I’m confused about where it’s getting the “dev” name from. You don’t have the network interface set up as /dev/ens3 for some reason?
-
There don’t appear to be any spaces before or after “ens3” - I deleted the contents and typed it in again.
-
@3mu Well, two things.
- So far no one else has reported this strange behavior. That doesn’t mean your install isn’t doing this; it just means there is something unique about your setup, because if this were a systematic (programming) issue, everyone would have the same problem.
- I guess we need to reverse engineer where that interface name comes from.
Just to confirm: you have rebooted your FOG server since setting that field?
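While reverse engineering this, one quick check is whether the name passed to udp-sender exists on the server at all. Linux lists every network interface as an entry under /sys/class/net, so “dev” should be missing there while ens3 is present. This is a minimal sketch that assumes a Linux host with sysfs mounted. The function name iface_exists and its optional second argument (an alternative directory, mainly useful for testing) are hypothetical and not part of FOG.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a network interface with the given name
# exists. On Linux, each interface appears as an entry under /sys/class/net.
# $1 = interface name, $2 = optional directory to look in instead.
iface_exists() {
  [ -n "$1" ] && [ -e "${2:-/sys/class/net}/$1" ]
}

# Usage:
# for n in dev ens3; do
#   if iface_exists "$n"; then echo "$n: present"; else echo "$n: missing"; fi
# done
```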
-
@3mu I’m not a programmer, but looking at the code it appears that the multicast task service is what creates the udp-sender process and its command syntax. I still haven’t found which database record the interface name is sourced from.
-
@3mu @george1421 Those settings were used in the past, but nowadays FOG tries to determine the interface from the system, no matter what the setting says. I know we should have removed those settings at some point, but I guess someone forgot. Code where the interface is determined for multicast: 1, 2, 3
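Here is one illustration of determining an interface from the system rather than from a setting. The sketch finds which interface carries a given IPv4 address (for example, the one configured for the storage node) by reading the output of "ip -o -4 addr show". It is not FOG’s implementation. The function name iface_for_ip is hypothetical, and it assumes the one-line-per-address format of "ip -o", where the second field is the interface name and the fourth is ADDRESS/PREFIX.

```shell
#!/bin/sh
# Hypothetical helper (not FOG's code): given an IPv4 address, print the
# interface it is assigned to. Reads "ip -o -4 addr show" output on stdin.
# Exits non-zero if no interface has that address.
iface_for_ip() {
  awk -v ip="$1" '
    $3 == "inet" {
      split($4, a, "/")   # drop the /prefix length, e.g. 10.0.0.203/24
      if (a[1] == ip) { print $2; found = 1; exit }
    }
    END { exit !found }
  '
}

# Usage on the FOG server (10.0.0.203 is the address from this thread):
# ip -o -4 addr show | iface_for_ip 10.0.0.203
```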
@3mu Please run the following command as root and post output here:
/sbin/ip route
The IP address configured for the storage node might play a role here too. Please post the IP set for the storage node as well.
-
@george1421 - Current uptime was 1 day 14 hours. I have just now:
- Cleared all tasks, rebooted and created a new multicast task - the log file still says “–interface dev”.
- Cleared the tasks, changed the multicast interface to “eth0” (which doesn’t exist) and created a new multicast task - same result.
- Cleared the tasks, rebooted and created a new multicast task - same result.
The only things running on this server are FOG and dnsmasq. It is a VM under KVM and has 4GB RAM and a single processor. I have done a single in-place upgrade from 1.5.7 since my original install. I did have to recover the MySQL password (I was distracted and lost the new password before I could record it), but I recovered it without issue.
I have also updated the fogservice.class.php and FOGSnapinReplicator.service files as per the “Interface not ready, waiting for it to come up” post.
-
@3mu said in Wiki errors - "Troubleshooting a multicast":
I have also updated the fogservice.class.php and FOGSnapinReplicator.service files as per the “Interface not ready, waiting for it to come up” post.
Good point. I just remembered that too.
Please run the following command as root and post output here:
/sbin/ip route
The IP address configured for the storage node might play a role here too. Please post the IP set for the storage node as well.
-
@Sebastian-Roth said in Wiki errors - "Troubleshooting a multicast":
/sbin/ip route
default via 10.0.0.138 dev ens3 proto dhcp src 10.0.0.203 metric 100
10.0.0.0/24 dev ens3 proto kernel scope link src 10.0.0.203
10.0.0.138 dev ens3 proto dhcp scope link src 10.0.0.203 metric 100
(run under sudo)
Storage Node config:
-
@3mu said in Wiki errors - "Troubleshooting a multicast":
default via 10.0.0.138 dev ens3 proto dhcp src 10.0.0.203 metric 100
There we have it. From the output, it looks like your network interface ens3 is configured via DHCP. While our code should not fail to find the right interface, I still wonder why you would configure a server’s IP via DHCP!?
Just configure the same IP statically in the Linux network settings, restart your FOG server, and it will most probably work!
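For reference, in ip route output the interface name follows the “dev” keyword. The other fields (proto, scope, src, metric) vary with how the address was configured, as the DHCP routes above show. The sketch below reads the default route’s interface by locating that keyword rather than relying on a fixed column. It is only an illustration, not the code from the actual FOG fix, and the function name default_iface is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not FOG's code): print the interface of the default
# route from "ip route" output on stdin, finding the name by the "dev"
# keyword instead of a fixed field position. Exits non-zero if none found.
default_iface() {
  awk '
    $1 == "default" {
      for (i = 2; i < NF; i++)
        if ($i == "dev") { print $(i + 1); found = 1; exit }
    }
    END { exit !found }
  '
}

# Usage on the server:
# /sbin/ip route | default_iface
```

The same function works whether the route line ends in “proto dhcp src ... metric 100” or “proto static”, because only the word after “dev” is used.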
-
@Sebastian-Roth - I look after a network with over 600 servers at work. The ones that cause me problems are the ones with static IP addresses - generally because somebody makes a typo when they are configuring an interface. We also do a lot of migrations, mergers and other changes, so reserved addresses in DHCP make that much easier, as well as enforcing consistency and a basic level of self-documentation. DHCP also makes it easier to deploy in environments where the user does not have control of the network.
I’m sorry for taking up your time, @george1421, and thank you both.
-
@3mu I will be looking into fixing this issue, though I can’t promise when the next release will be out. Would you like me to post the fix here so you can add it manually?
-
That would be great, thanks @Sebastian-Roth. I’m happy to test.
-
@3mu Sorry for the delay. I just had a play with this and I think it’s best to go for a simple fix: https://github.com/FOGProject/fogproject/commit/21460c1d8ba7dae1b2988b9287c188595ce01e9d
-
Thanks, @Sebastian-Roth. I’ll test as soon as I finish sorting out a few disasters - hopefully in the next day.
-
@Sebastian-Roth - Sorry for the delay - some other unrelated issues and then a pandemic to deal with! The fix works perfectly.