I’m having a lot of trouble getting multicast working in my environment. I work for a University where our networks are all segmented into multiple VLANs, and I don’t have the ability to put FOG on the same VLAN as the clients - we have about 10 computer labs and our FOG server is a VMWare virtual machine in our data center. Each computer lab is on its own VLAN, and the data center is as well.
Our network team has a lot of controls on the network that I cannot see, but they assigned me a multicast range of 239.132.20.0 /24. Since this doesn’t match the default that FOG uses, I upgraded to the Trunk version of FOG to allow me to customize that. The network team setup a test multicast node within that multicast range and ran several small tests from their utilities across all of my VLANs and have had no hiccups at all, leading them to believe that the problem lies on the FOG server somewhere.
Here’s where I’m at on the server:
OS: Ubuntu Server 14.04.3 LTS
FOG Trunk revision: 4623
On the FOG server, I have a small 8GB image for testing. Under FOG Configuration -> FOG Settings -> Multicast Settings, I have configured the following:
FOG_UDPCAST_INTERFACE: eth0
FOG_UDPCAST_STARTINGPORT: 63072
FOG_MULTICAST_MAX_SESSIONS: 64
FOG_UDPCAST_MAXWAIT : 10
FOG_MULTICAST_ADDRESS: 239.132.20.10
FOG_MULTICAST_PORT_OVERRIDE: 0
FOG_MULTICAST_DUPLEX: FULL_DUPLEX
I have registered two test hosts, and am attempting to get one working before trying two. I’ve double-checked under Active Tasks and Active Multicast tasks to ensure nothing is currently running. I’ve rebooted the server and restarted FOGMulticastManager, FOGImageReplicator, and FOGScheduler services (in case they didn’t start properly).
When trying to multicast to a single client, I get inconsistent results. Most of the time, the client gets stuck at “Starting to restore image ( - ) to device (/dev/sda1).” Other times, it’ll get past that point and show a status bar, but the bar will be stuck at 0% with a transfer rate of 0.00bytes/min. The latter of which is incredibly difficult for me to replicate, I’ve found. Most of the time, I’m stuck at the former.
Here’s what the multicast.log looks like:
[01-09-16 6:31:41 pm] Interface Ready with IP Address: 10.133.132.127
[01-09-16 6:31:41 pm] * Starting MulticastManager Service
[01-09-16 6:31:41 pm] * Checking for new items every 10 seconds
[01-09-16 6:31:41 pm] * Starting service loop
[01-09-16 6:31:41 pm] * No tasks found!
[01-09-16 6:31:51 pm] * No tasks found!
[01-09-16 6:32:01 pm] * No tasks found!
[01-09-16 6:32:11 pm] | Sleeping for 10 seconds to ensure tasks are properly submitted
[01-09-16 6:32:21 pm] | 0 tasks to be cleaned
[01-09-16 6:32:21 pm] | 1 task found
[01-09-16 6:32:21 pm] | Task (11) Multi-Cast Task is new!
[01-09-16 6:32:21 pm] | Task (11) Multi-Cast Task has been cleaned.
[01-09-16 6:32:21 pm] | Task (11) /images/Test162016 image file found.
[01-09-16 6:32:21 pm] | Task (11) 1 client(s) found.
Udp-sender 20120424
[01-09-16 6:32:21 pm] | Task (11) Multi-Cast Task sending on base port: 64512
UDP sender for (stdin) at 10.133.132.127 on eth0
Broadcasting control to 224.0.0.1
[01-09-16 6:32:21 pm] | CMD: cat /images/Test162016/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --mcast-data-address 239.132.20.10 --portbase 64512 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/Test162016/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --mcast-data-address 239.132.20.10 --portbase 64512 --full-duplex --ttl 32 --nokbd --nopointopoint;
[01-09-16 6:32:21 pm] | Task (11) Multi-Cast Task has started.
[01-09-16 6:32:31 pm] | 0 tasks to be cleaned
[01-09-16 6:32:31 pm] | 1 task found
[01-09-16 6:32:31 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:32:41 pm] | 0 tasks to be cleaned
[01-09-16 6:32:41 pm] | 1 task found
[01-09-16 6:32:41 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:32:51 pm] | 0 tasks to be cleaned
[01-09-16 6:32:51 pm] | 1 task found
[01-09-16 6:32:51 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:33:01 pm] | 0 tasks to be cleaned
[01-09-16 6:33:01 pm] | 1 task found
[01-09-16 6:33:01 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:33:11 pm] | 0 tasks to be cleaned
[01-09-16 6:33:11 pm] | 1 task found
[01-09-16 6:33:11 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:33:21 pm] | 0 tasks to be cleaned
[01-09-16 6:33:21 pm] | 1 task found
[01-09-16 6:33:21 pm] | Task (11) Multi-Cast Task is already running PID 3793
New connection from 10.20.129.47 (#0) 00000009
Starting transfer: 00000009
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:33:31 pm] | 0 tasks to be cleaned
[01-09-16 6:33:31 pm] | 1 task found
[01-09-16 6:33:31 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:33:41 pm] | 0 tasks to be cleaned
[01-09-16 6:33:41 pm] | 1 task found
[01-09-16 6:33:41 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:33:51 pm] | 0 tasks to be cleaned
[01-09-16 6:33:51 pm] | 1 task found
[01-09-16 6:33:51 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:34:02 pm] | 0 tasks to be cleaned
[01-09-16 6:34:02 pm] | 1 task found
[01-09-16 6:34:02 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:34:12 pm] | 0 tasks to be cleaned
[01-09-16 6:34:12 pm] | 1 task found
[01-09-16 6:34:12 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:34:22 pm] | 0 tasks to be cleaned
[01-09-16 6:34:22 pm] | 1 task found
[01-09-16 6:34:22 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:34:32 pm] | 0 tasks to be cleaned
[01-09-16 6:34:32 pm] | 1 task found
[01-09-16 6:34:32 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:34:42 pm] | 0 tasks to be cleaned
[01-09-16 6:34:42 pm] | 1 task found
[01-09-16 6:34:42 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:34:52 pm] | 0 tasks to be cleaned
[01-09-16 6:34:52 pm] | 1 task found
[01-09-16 6:34:52 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:35:02 pm] | 0 tasks to be cleaned
[01-09-16 6:35:02 pm] | 1 task found
[01-09-16 6:35:02 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:35:12 pm] | 0 tasks to be cleaned
[01-09-16 6:35:12 pm] | 1 task found
[01-09-16 6:35:12 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:35:22 pm] | 0 tasks to be cleaned
[01-09-16 6:35:22 pm] | 1 task found
[01-09-16 6:35:22 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:35:32 pm] | 0 tasks to be cleaned
[01-09-16 6:35:32 pm] | 1 task found
[01-09-16 6:35:32 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:35:42 pm] | 0 tasks to be cleaned
[01-09-16 6:35:42 pm] | 1 task found
[01-09-16 6:35:42 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:35:52 pm] | 0 tasks to be cleaned
[01-09-16 6:35:52 pm] | 1 task found
[01-09-16 6:35:52 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:36:02 pm] | 0 tasks to be cleaned
[01-09-16 6:36:02 pm] | 1 task found
[01-09-16 6:36:02 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:36:12 pm] | 0 tasks to be cleaned
[01-09-16 6:36:12 pm] | 1 task found
[01-09-16 6:36:12 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:36:22 pm] | 0 tasks to be cleaned
[01-09-16 6:36:22 pm] | 1 task found
[01-09-16 6:36:22 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:36:32 pm] | 0 tasks to be cleaned
[01-09-16 6:36:32 pm] | 1 task found
[01-09-16 6:36:32 pm] | Task (11) Multi-Cast Task is already running PID 3793
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000
[01-09-16 6:36:42 pm] | 0 tasks to be cleaned
[01-09-16 6:36:42 pm] | 1 task found
[01-09-16 6:36:42 pm] | Task (11) Multi-Cast Task is already running PID 3793
Dropping client #0 because of timeout
Disconnecting #0 (10.20.129.47)
bytes= re-xmits=0000000 ( 0.0%) slice=0112 - 0
Transfer complete.
cat: write error: Broken pipe
Udp-sender 20120424
UDP sender for (stdin) at 10.133.132.127 on eth0
Broadcasting control to 224.0.0.1
[01-09-16 6:36:52 pm] | 0 tasks to be cleaned
[01-09-16 6:36:52 pm] | 1 task found
[01-09-16 6:36:52 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:37:02 pm] | 0 tasks to be cleaned
[01-09-16 6:37:02 pm] | 1 task found
[01-09-16 6:37:02 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:37:12 pm] | 0 tasks to be cleaned
[01-09-16 6:37:12 pm] | 1 task found
[01-09-16 6:37:12 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:37:22 pm] | 0 tasks to be cleaned
[01-09-16 6:37:22 pm] | 1 task found
[01-09-16 6:37:22 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:37:32 pm] | 0 tasks to be cleaned
[01-09-16 6:37:32 pm] | 1 task found
[01-09-16 6:37:32 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:37:42 pm] | 0 tasks to be cleaned
[01-09-16 6:37:42 pm] | 1 task found
[01-09-16 6:37:42 pm] | Task (11) Multi-Cast Task is already running PID 3793
[01-09-16 6:37:52 pm] | 0 tasks to be cleaned
[01-09-16 6:37:52 pm] | 1 task found
[01-09-16 6:37:52 pm] | Task (11) Multi-Cast Task is already running PID 3793
I’m not sure where to go next in my troubleshooting. Any help?