Multicast interface binding issue
I have a slightly non-standard setup on Ubuntu 12.04 and I’m having multicast issues. Unicasting seems to work fine.
The setup is in our workshop. The server has multiple NICs that are assigned as follows:
- eth0 - Connected to the workshop “dirty” (public Internet) network in 192.168.3.0/24 with DHCP.
- eth1 - The 192.168.4.1/24 - Private network for the machines being imaged.
- eth2 - Not connected
- eth3 - Not connected
I’ve enabled IPv4 forwarding (net.ipv4.ip_forward) and set up a MASQUERADE rule in iptables so the clients can get punch-through onto the Internet to activate software after imaging. The reason for the separate network was to make sure I didn’t contaminate the Workshop network for other users or overwork the Internet router (a prosumer device). DNSMasq is installed with its default config to provide DNS relay to the 192.168.4.0/24 clients.
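For readers reproducing this layout, the forwarding and NAT setup described above could look roughly like the sketch below. The poster's exact rules are not shown in the thread, so the rule itself, the variable names, and the `run`/`nat_setup` helpers are assumptions for illustration. By default it only prints the commands; setting APPLY=1 (as root) would execute them.

```shell
# Assumed values taken from the setup described above.
LAN_NET="192.168.4.0/24"   # private imaging network on eth1
WAN_IF="eth0"              # interface facing the workshop network

# Print each command; execute it only when APPLY=1 is set (needs root).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

nat_setup() {
    # Let the kernel route packets between interfaces (not persistent across reboots).
    run sysctl -w net.ipv4.ip_forward=1
    # Rewrite the source address of imaging-network traffic leaving via the WAN side.
    run iptables -t nat -A POSTROUTING -s "$LAN_NET" -o "$WAN_IF" -j MASQUERADE
}

nat_setup
```

The dry-run default is a deliberate choice so the rules can be reviewed before they touch a live firewall.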
FOG installs fine and reports udpcast built OK. I can form a group and submit a multicast job against it. When the clients boot, they come up and the partclone screen appears but no data is received and they sit there forever. The Task Management screen indicates the multicast task as started against all the hosts.
I’ve tried installing the official udpcast packages and building the one in the FOG distribution manually. Neither method fixes the problem, but the symptoms differ: running ps on the server shows either no sender task at all, or a sender task running with “eth0” as its interface argument instead of “eth1”. I did set eth1 as the multicast NIC on the FOG configuration web page, so I’m wondering if this is a bug.
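One way to confirm which interface a sender was launched with is to pull the interface argument out of the process list. This is a minimal sketch, assuming the sender is started with udp-sender's `--interface` option; the helper name `sender_interface` and the sample command line in the comment are made up for illustration.

```shell
# sender_interface CMDLINE: print the value that follows --interface,
# or nothing if the option is absent.
sender_interface() {
    printf '%s\n' "$1" | sed -n 's/.*--interface[ =]\{1,\}\([^ ]*\).*/\1/p'
}

# Possible use against the live process list (the [u] keeps grep from
# matching its own command line):
#   ps -eo args | grep '[u]dp-sender' | while read -r line; do
#       sender_interface "$line"
#   done
```

If this prints eth0 while the web page says eth1, the setting is not reaching the command line that launches the sender.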
I’ve tried creating symbolic links in /sbin for the udpcast executables in /usr/local/sbin to make sure it wasn’t a path issue but this didn’t help.
This setup did work in 0.32 on my other machine after it had been reinstalled with the newer build of udpcast.
I would really appreciate any ideas or help.
I tried that and unfortunately, it doesn’t work for me. I checked the permissions for /etc/rc.local and tried using the full path name of service (/usr/sbin/service) to no avail. I would welcome your suggestions of what to try next.
Would it be possible for future versions of the services to syslog something meaningful if they die and try to respawn themselves a couple of times with a delay for each please?
The only guess I have as to why it died like that is it couldn’t find the mysql db. This is similar to the tftpd-hpa issue where it tries starting the service before there’s even been a network interface opened to run on.
The fix, from my testing, has been to remove all of the needed services of fog from initial boot up with:
update-rc.d mysql disable
update-rc.d FOGMulticastManager disable
update-rc.d FOGImageReplicator disable
update-rc.d FOGScheduler disable
update-rc.d tftpd-hpa disable
Then edit the /etc/rc.local file to contain:
sleep 30 && service tftpd-hpa start
service mysql start
service FOGMulticastManager start
service FOGImageReplicator start
service FOGScheduler start
exit 0
That way all the services are still started at boot time, but after enough time has passed to establish network connectivity. The disabling of the services in the first code block just ensures nothing from the rc.local is trying to start something that is already started and may die.
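A fixed 30-second sleep works only if the network happens to be up by then. As an alternative sketch (not something the thread tested), rc.local could poll until the interface has an address and give up after a bounded number of tries. The `wait_until` helper is hypothetical, and the eth1 check in the comment assumes the iproute2 `ip` tool is installed.

```shell
# wait_until TRIES DELAY COMMAND...: run COMMAND until it succeeds,
# at most TRIES times, sleeping DELAY seconds after each failed attempt.
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible use in /etc/rc.local, waiting up to about 30 seconds for an
# IPv4 address on eth1 before starting the services:
#   wait_until 30 1 sh -c 'ip -4 addr show dev eth1 | grep -q "inet "'
#   service tftpd-hpa start
```

Polling starts the services as soon as the link is ready, and the non-zero return gives the script a point at which to log that the network never came up.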
This isn’t an upgrade, it’s a fresh install so I could write a machine build guide.
FOG was applied to a freshly installed Ubuntu 12.04LTS machine so none of the rc scripts have any custom mods.
My guess is that it died unexpectedly at some point during an upgrade. Or is it more likely related to the rc.local script you’re using?
It appears not. Running ps aux | grep -i fog returned only the grep process itself, so the scheduler and replicator aren’t running either. (The machine hadn’t got any scheduled jobs and isn’t part of a replication group, so I don’t know if that is normal.)
When I ran /etc/init.d/FOGMulticastManager restart it complained it couldn’t kill the current instance because the process for the PID it listed couldn’t be found. At this point, the multicast imaging started working.
Since the service restart went looking for a PID, my first assumption would be that the service was running but stopped unexpectedly. I guess the next job is to find out why it died…
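The symptom described here, a recorded PID with no matching process, can be checked directly. Below is a minimal sketch, assuming the service writes its PID to a file; the thread does not say where that file lives, so the path in the comment is a placeholder and `pid_file_status` is a hypothetical helper.

```shell
# pid_file_status FILE: print "running", "stale", or "missing".
# Note: kill -0 also fails when the process exists but belongs to another
# user and the caller lacks permission, so run this as root.
pid_file_status() {
    [ -r "$1" ] || { echo missing; return; }
    pid=$(cat "$1")
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}

# Possible use (placeholder path, substitute the service's real PID file):
#   pid_file_status /path/to/FOGMulticastManager.pid
```

A result of "stale" would match the restart message above: the service had started at some point and then exited without cleaning up.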
Is the FOGMulticastManager service actually running?
I’ve changed every occurrence of eth0 to eth1 in /var/www/fog/commons/config.php. I also checked that the executable path for udp-sender is correct. There’s no sign of any error, but “ps aux” on the server doesn’t reveal an instance of the sender running after the task starts, even though the task list shows the tasks as started. Partclone starts on the clients but just sits there doing nothing.
What is the interface you’re trying to run from?
You’ll likely need to adjust it in /var/www/fog/commons/config.php for the multicast interface.
The only nasties I can find in the logs were:
- Apache log: Favicon.ico missing.
Nothing that looks bad in kern.log or dmesg.
What version of FOG 1.+ are you running?