FOG Multicast Not working
-
I’ve been trying to get Multicast working with FOG but I’m stuck. Unicast works fine, but multicast just stops at preparing. I went through the troubleshooting steps at https://wiki.fogproject.org/wiki/index.php?title=Troubleshooting_a_multicast#Troubleshooting, but when I try testing with 1 client, I don’t get the .fogsettings file listed (or anything at all). I’ve also tried specifying the interface on the server and the address on the receiver, but still nothing. The multicast log file only shows this:
10:04:18.973480 Using mcast address 232.168.0.100
10:04:18.973604 UDP sender for /opt/fog/.fogsettings at 192.168.0.100 on em2
10:04:18.973621 Broadcasting control to 192.168.0.255
Server is on 1.5.0, CentOS7. Right now, there is only a simple switch between the server and the client, and the client does boot fine. Anybody know what I’m doing wrong here?
-
@awellis said in FOG Multicast Not working:
Right now, there is only a simple switch between the server and the client
What kind of switch is this? Just wondering…
-
I’ve tried it with both a Netgear Prosafe JGS516 and an Asante GX5-2400W (factory settings)
-
Getting a bit farther…
I realized the problem with my previous test was that the udp-sender command runs on 9000/udp, whereas the initial firewall setup did not allow that port, so I changed the port and that worked fine. Going back to the multicast log file, I get this when I try running the multicast restore:
[03-27-18 6:25:11 pm] | Task (7) Multi-Cast Task is new!
[03-27-18 6:25:11 pm] | Task (7) /images/3450_lab_img image file found.
[03-27-18 6:25:11 pm] | Task (7) Multi-Cast Task 3 clients found.
[03-27-18 6:25:11 pm] | Task (7) Multi-Cast Task sending on base port: 59106.
[03-27-18 6:25:11 pm] | Command: /usr/local/sbin/udp-sender --interface em2 --min-receivers 3 --max-wait 600 --portbase 59106 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/3450_lab_img/d1p1.img;/usr/local/sbin/udp-sender --interface em2 --min-receivers 3 --max-wait 10 --portbase 59106 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/3450_lab_img/d1p2.img;
[03-27-18 6:25:11 pm] | Task (7) Multi-Cast Task has started!
[03-27-18 6:25:21 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
[03-27-18 6:25:31 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
[03-27-18 6:25:41 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
[03-27-18 6:25:51 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
[03-27-18 6:26:01 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
[03-27-18 6:26:11 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
[03-27-18 6:26:21 pm] | Task (7) Multi-Cast Task is already running with pid: 12616.
I’m still going to keep working through the tests on the multicast page, but just wanted to update my situation.
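The udp-sender command line in the log carries the start conditions for the session, so it can help to pull those values out when comparing runs. A minimal sketch, assuming the command string is copied from the log as-is; flag_value is a made-up helper name, not part of FOG or udpcast:

```shell
# Hypothetical helper: print the value that follows a given flag in a command line.
flag_value() {
    # $1 = flag name (e.g. --max-wait), $2 = full command string
    printf '%s\n' "$2" | sed -n "s/.*$1 \([^ ]*\).*/\1/p"
}

# First udp-sender command, copied from the multicast log
cmd='/usr/local/sbin/udp-sender --interface em2 --min-receivers 3 --max-wait 600 --portbase 59106 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/3450_lab_img/d1p1.img'

echo "min receivers: $(flag_value --min-receivers "$cmd")"
echo "max wait (s):  $(flag_value --max-wait "$cmd")"
```

For the command above this reports 3 receivers and a max wait of 600.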
-
Some More testing.
I ran the 2 host file test and the 2 host partclone test. The file test seemed to work fine, but the partclone manual multicast test did not. Got this on the server:
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Dropping client #0 because of timeout
Disconnecting #0 (192.168.0.187)
Dropping client #1 because of timeout
Disconnecting #1 (192.168.0.169)
bytes= re-xmits=0000000 ( 0.0%) slice=0130 - 0
Transfer complete.
-
@awellis said in FOG Multicast Not working:
whereas the initial firewall setup
Will you explain this statement? What firewall?
-
@george1421 Sure. On the CentOS 7 wiki installation page (https://wiki.fogproject.org/wiki/index.php?title=CentOS_7#Continue_pre-config) the instructions say “Open UDP port 49152 through 65532” and don’t mention port 9000 at all.
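For anyone hitting the same thing, here is a rough sketch of opening both with firewalld. The 49152-65532 range is the one quoted from the wiki page, and 9000/udp is only needed for the manual udp-sender test on the port mentioned earlier. The function name is made up, and it only prints the commands so they can be reviewed before being run as root:

```shell
# Hypothetical helper: prints the firewall-cmd calls instead of running them.
fog_udp_firewall_cmds() {
    # UDP range quoted from the CentOS 7 wiki page
    echo "firewall-cmd --permanent --add-port=49152-65532/udp"
    # Port the manual udp-sender test used in this thread
    echo "firewall-cmd --permanent --add-port=9000/udp"
    # Reload so the permanent rules become active
    echo "firewall-cmd --reload"
}

fog_udp_firewall_cmds
```

Once the output looks right, it can be piped to a root shell, for example fog_udp_firewall_cmds | sudo sh.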
-
@awellis Can you drop the firewall altogether until you get multicasting working? If you need the firewall per company policy, we can reverse engineer which ports are being used.
systemctl stop firewalld
-
@george1421 I don’t think the firewall should be an issue for FOG (just udp-sender/udp-receiver), as I don’t see any traffic hitting any port but 59106 and the ones after that, but I did disable the firewall and am still seeing the same issue.
-
@awellis Are the fog server and target computers on the same subnet?
-
@george1421 Yes they are all on the same subnet
-
@awellis So just for clarity, if we discount all of the debugging you’ve done up to this point: if you shut down the CentOS firewall and then set up a multicast job with 1 client, the client just sits there and never joins the multicast?
I have it on my task list to spin up a new fog server to retest multicasting across subnets again. My OS of choice is CentOS, so the system I’ll spin up will be CentOS 7.4. I’ll do everything in a virtual environment to make it easy on myself.
-
It actually does join the session; the session just never starts. I left it running overnight yesterday and all the computers got to the blue partclone screen and just sat there. I don’t recall if I did the same thing with just 1 multicast host, but I’ll give that a shot tomorrow morning when I get back to the office.
-
@awellis said in FOG Multicast Not working:
… Command: /usr/local/sbin/udp-sender --interface em2 --min-receivers 3 --max-wait 600 …
When you see this in the log file, was this scheduled for three clients? As you can see, the command wants at least/exactly those three clients to connect and would not start otherwise! Besides that, have you checked whether the process is really running (ps ax | grep udp)? Before your next try, make really sure there is no old multicast task still scheduled, and kill anything still running (killall -9 udp-sender) on your FOG server to make sure nothing is interfering.
-
@sebastian-roth said in FOG Multicast Not working:
It was scheduled for 3 clients and all 3 opened up partclone. That said, shouldn’t the timeout kick in after 10 minutes and go off anyway?
I did check for the udp process and it is running after starting it. I’ve also tried running it immediately after a server restart, with no udp processes running and no tasks scheduled.
edit: FOG does start up 2 separate udp-senders with the task… is that normal?
edit 2: looks like there’s 1 task for each partition
-
@awellis I have some sad news for you. I set up a clean 1.5.0 fog environment and multicasting works “as described on the tin”.
Is your fog server in a virtual environment?
-
@george1421 No, it’s bare metal ATM. I’m going to just do a reinstall and see if that fixes it, since at this point I don’t know what else could be going wrong. Did you do anything special with the install, or did you basically just run the fog installer on a fresh OS install?
-
@awellis Nothing special. In this setup I created a dedicated imaging network, and the fog server had a second network connection for internet access while installing. I didn’t even configure anything in fog. I copied over an image from my production server, created the image definition, and started the multicast right from the image definition. (Note: I did register a target VM first, then rebooted into FOG iPXE, and then set up the multicast in the fog UI.) It multicast imaged right away.
-
@george1421 Did another full reinstall with a different NIC just in case and still nothing. Any chance the type of image taken matters? Partclone vs partimage? compression?
-
@awellis Just for clarity, you don’t have ANY firewalls or screening routers between the fog server and the target computer?
During my testing of multicasting across multiple subnets, I saw that FOG uses 239. (plus the first 3 octets of the fog server IP address) as the multicast address. For example, if the fog server IP is 192.168.1.20, the multicast address will be 239.192.168.1 and the port will be a random high port.
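The rule described here can be sketched as a small shell function. This only encodes what the post states (239 followed by the first three octets) and has not been checked against the FOG source; fog_mcast_addr is a made-up name:

```shell
# Hypothetical helper: build the multicast address as described in this post,
# i.e. 239 followed by the first three octets of the server IP.
fog_mcast_addr() {
    # Split the dotted quad on "." into four variables (the last octet is unused)
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo "239.$o1.$o2.$o3"
}

fog_mcast_addr 192.168.1.20   # example IP from the post; prints 239.192.168.1
```

Note that the manual udp-sender test earlier in the thread logged 232.168.0.100 for a sender at 192.168.0.100, so a bare udp-sender run evidently picks its address differently from this rule.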
The fog server will send out messages as multicast, and the target will respond with short unicast messages with checksum data back to the fog server.
For your testing, are the target system and fog server on the same subnet, all on ethernet switches? You don’t have any fiber or other links where the MTU will be less than 1400?
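One way to check the MTU question is a non-fragmenting ping sized to fill a 1400-byte packet. A minimal sketch, assuming a Linux machine with iputils ping on the same segment as the targets; the helper name is made up, and 192.168.0.100 is the server address from the log earlier in the thread:

```shell
# Hypothetical helper: ICMP payload size that exactly fills a given MTU.
# 20 bytes IPv4 header + 8 bytes ICMP header = 28 bytes of overhead.
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print the command to run; -M do forbids fragmentation (Linux iputils ping).
echo "ping -c 3 -M do -s $(ping_payload_for_mtu 1400) 192.168.0.100"
```

If that ping gets replies, the path carries 1400-byte packets without fragmenting; if it reports that the message is too long or simply times out, something on the path has a smaller MTU.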