FOG Multicast issue - stuck on starting to restore image - version 7547
-
@Sebastian-Roth We did run the manual test, when we leave the destination address blank, it doesn’t work and says it’s sending to 2xx.xxx.xxx.xxx something (can’t remember).
-
As you can see here multicast has its very own “address space”. It’s designed to use those addresses to work properly I suppose. I guess you can use different addresses if you really know what you are doing. But using a subnet broadcast address does not sound right to me.
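To make the distinction concrete, here is a minimal sketch using Python's standard `ipaddress` module. The `/24` subnet is an assumption based on the 192.168.1.x addresses that come up later in this thread, and `classify` is a hypothetical helper name.

```python
import ipaddress

# Assumption: the FOG server sits on a /24 network (not stated in the thread).
SUBNET = "192.168.1.0/24"


def classify(addr: str, subnet: str = SUBNET) -> str:
    """Label an IPv4 address as multicast, subnet broadcast, or unicast."""
    ip = ipaddress.ip_address(addr)
    net = ipaddress.ip_network(subnet)
    if ip.is_multicast:  # true for anything in 224.0.0.0/4
        return "multicast"
    if ip == net.broadcast_address:
        return "subnet broadcast"
    return "unicast"


for addr in ("224.0.0.1", "232.168.1.3", "192.168.1.255", "192.168.1.3"):
    print(addr, "->", classify(addr))
```

Under that assumption, 192.168.1.255 comes out as a subnet broadcast address rather than a multicast one, which is the concern raised above.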
-
All the mcast-data-address does, to my knowledge, is tell the client what address to try to grab the UDP packets from. It’s still going to use multicast addressing to grab it. So I think @Sebastian-Roth is correct. The mcast-data-address should NOT be a broadcast/network address, but the direct location to the host trying to send the data to begin with. The mcast-rdv-address will allow you to define the actual multicast address to use for the session, and should remain in the 224 pool (though there could be any number of reasons you need to change it to something else.)
Then again, I’m a total idiot when it comes to fully understanding UDP and its actions. What I do know is that mcast-data-address will try to derive that IP address’s multicast address from where it’s pointing. This is NOT where you want to be trying to define the multicast network it’s going across.
To quote the man page in regards to these two separate arguments:
--mcast-data-address address
Uses the given address for multicasting the data. If not specified, the program will automatically derive a multicast address from its own IP (by keeping the last 27 bits of the IP and then prepending 232).
--mcast-rdv-address address
Uses a non-standard multicast address for the control (rendez-vous) connection. This address is used by the sender and receivers to “find” each other. This is not the address that is used to transfer the actual data.
By default “mcast-rdv-address” is the Ethernet broadcast address if “ttl” is 1, and 224.0.0.1 otherwise. This setting should not be used except in very special situations, such as when 224.0.0.1 cannot be used for policy reasons.
-
Ok then. All that makes sense, I learn something new every day here.
However - manually doing a udp-cast test without specifying an address failed, while specifying the broadcast address for this fog server’s broadcast domain worked.
Ideas?
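For reference, the default that the man page quote describes (keep the last 27 bits of the sender's IP and prepend 232) can be sketched as below. This is only an illustration of that one sentence, not udpcast's actual source; `default_mcast_data_address` is a hypothetical name, and 192.168.1.3 is the server address that appears in the test output later in the thread.

```python
import ipaddress


def default_mcast_data_address(sender_ip: str) -> str:
    """Derive a data address as the man page describes it:
    keep the low 27 bits of the sender IP and put 232 in front."""
    ip = int(ipaddress.IPv4Address(sender_ip))
    low27 = ip & ((1 << 27) - 1)      # keep the last 27 bits
    derived = (232 << 24) | low27     # 232 supplies the leading bits
    return str(ipaddress.IPv4Address(derived))


print(default_mcast_data_address("192.168.1.3"))
```

For a sender at 192.168.1.3 this gives 232.168.1.3, which matches the "Listening to multicast on 232.168.1.3" line reported by udp-receiver further down.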
-
I need details on the exact steps.
For example, if you set up the client to receive data BEFORE the udp-sender was established, the client side will NEVER connect to anything.
-
I’m sure that @arainero would be happy to re-do the test, and provide exact steps.
-
Sorry for the delay, here are the exact steps:
For testing 1 client:
- Boot one computer into debug
- Enter
udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
into server and press enter. I now see
[root@fogserv ~]# udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint
Udp-sender 20120424
Using full duplex mode
- Go to the client computer and type udp-receiver. On the client this immediately shows
Udp-receiver 20120424
UDP receiver for (stdout) at 192.168.1.118 on eth0
received message, cap=00000009
Connected as #0 to 192.168.1.3
Listening to multicast on 232.168.1.3
Press any key to start receiving data!
On the server this immediately shows
New connection from 192.168.1.118 (#0) 00000009
Ready. Press any key to start sending data.
- I press enter on the server. Nothing happens.
- I now press enter on the client. I get
Sending go signal
Nothing happens on the server. I now press enter on the server and still nothing happens.
I CTRL-C out on the client and see
[root@fogserv ~]# re-xmits=0000000 ( 0.0%) slice=0112 - 0
appear on the server. I CTRL-C on server and I am back to the prompt.
If I repeat the steps to step 4 and hit enter on the client first instead of the server I get the same results.
If I do the above steps again but use
udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint --mcast-data-address 192.168.1.255
instead of the wiki command and hit enter on the client first I see the fog settings file displayed on the client. If I repeat that and hit enter on the server first I see the settings displayed. This also works with a 2nd client computer and does not matter if I hit enter on the client or server first.
The --mcast-data-address 192.168.1.255 flag seems to be the magic argument that makes it “work” for the test scenario.
Was this enough to determine anything, or would you want me to do anything else to help narrow it down? If anyone wants to take a look through TeamViewer or what-have-you I am more than open.
Thanks again everyone and enjoy the 4th!
-
Should I update to trunk and do the same tests or hold off on that for now?
-
@arainero What about using the options you see in the logs?
cat /path/to/file | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --portbase 65102 --full-duplex --ttl 32 --nokbd --nopointopoint
-
If I use this modified version of that line from the log it works.
cat /opt/fog/.fogsettings | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --full-duplex --nokbd --nopointopoint --mcast-data-address 192.168.1.255
I had to remove portbase and the ttl flags and add the --mcast-data-address 192.168.1.255 flag. After doing that I was able to receive the fog settings files.
I tried the command with just the portbase and ttl flags removed and without adding --mcast-data-address 192.168.1.255, but that didn’t work.
-
@arainero Why did you have to remove --ttl and --portbase? I still think something is wrong with the network setup. From the udp-cast man page I get that it would default to use 232.168.1.x (where x would be the last byte of the server IP address). Possibly your switch is a layer 3 switch and does not allow those “out of range” IPs?
-
@Sebastian-Roth I removed those to test to see if different scenarios would work. I tested with --mcast-data-address 232.168.1.255, 232.168.1.1, 232.168.1.0, etc, and those worked too.
I actually added --ttl 32 and that worked. It’s just when I add --portbase 65102 it stops working. I tried a few different port variations with no luck.
The switches in the network are layer 2. Multicast used to work fine and there have been no changes to the network. I will double check the switch configurations again though.
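Since the results so far depend on which flags are combined, it may help to generate the variants systematically and note the outcome of each. The sketch below only assembles command lines from flags already quoted in this thread; it does not run anything, and `sender_command` is a hypothetical helper.

```python
import shlex

# Flags common to every variant, copied from the command quoted in this thread.
BASE = [
    "/usr/local/sbin/udp-sender", "--interface", "eth0",
    "--min-receivers", "1", "--max-wait", "600",
    "--full-duplex", "--nokbd", "--nopointopoint",
]


def sender_command(ttl=None, portbase=None, mcast_data_address=None) -> str:
    """Build one udp-sender command line; optional flags are added only if given."""
    args = list(BASE)
    if ttl is not None:
        args += ["--ttl", str(ttl)]
    if portbase is not None:
        args += ["--portbase", str(portbase)]
    if mcast_data_address is not None:
        args += ["--mcast-data-address", mcast_data_address]
    return shlex.join(args)


# The combinations discussed so far, one per line, ready to paste after `cat FILE |`.
for variant in (
    {"ttl": 32, "portbase": 65102},
    {"ttl": 32, "mcast_data_address": "192.168.1.255"},
    {"ttl": 32, "mcast_data_address": "232.168.1.0"},
    {"ttl": 32, "portbase": 65102, "mcast_data_address": "232.168.1.0"},
):
    print(sender_command(**variant))
```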
-
@arainero Are you sure the firewall on the FOG server is off? Some places configure special ports for use with multicast (like mine).
-
@arainero said in FOG Multicast issue - stuck on starting to restore image - version 7547:
It’s just when I add --portbase 65102 it stops working. I tried a few different port variations with no luck.
Working off of this finding - I may have a fix for you.
Open up this file on your master node:
/var/www/html/fog/lib/service/multicasttask.class.php
Comment out line 91 by putting two forward slashes (//) at the beginning of the line.
Then restart the FOGMulticastManager like this:
service FOGMulticastManager restart
And then try multicast again. With any luck, it’ll work.
Even if it DOES work - we need to figure out what it is about your network that’s preventing you from defining a port, I hope you understand.
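One thing worth ruling out before blaming the network: if I recall the udp-sender man page correctly, udpcast uses two ports (portbase and portbase+1), requires portbase to be even, and expects the same --portbase on udp-receiver as on udp-sender. Treat all three points as assumptions to check against the installed man page. A small sanity check for a chosen value, with `portbase_problems` as a hypothetical helper:

```python
from typing import List


def portbase_problems(portbase: int) -> List[str]:
    """Return a list of reasons why a portbase value looks unusable.

    Assumptions (from memory of the udpcast man page, please verify):
    two ports are used, portbase and portbase + 1, and portbase must be even.
    """
    problems = []
    if portbase % 2 != 0:
        problems.append("portbase is odd")
    if portbase < 1 or portbase + 1 > 65535:
        problems.append("portbase or portbase + 1 is outside 1-65535")
    return problems


for candidate in (65102, 53324, 9001):
    print(candidate, portbase_problems(candidate) or "looks fine")
```

Both 65102 and 53324 pass this check, so if those assumptions hold, the value itself is not the problem; whether the receiver in the manual test was started with a matching --portbase is the next thing to confirm.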
-
@arainero said:
Multicast used to work fine and there have been no changes to the network.
Can’t remember how many times I’ve heard this. But these kinds of things usually don’t just stop working from one day to the next. Maybe it wasn’t you who changed anything. I really hope we can find out what’s wrong here and find a solution to it. Keeping my fingers crossed. Are you absolutely sure there are only layer 2 switches between the FOG server and the clients?
-
iptables and selinux are off. Firewalld is not installed either. Is there a firewall somewhere else that I have missed?
-
@Wayne-Workman I tried this with no luck, but I expected it to work as you did. I had some interesting observations.
The new command I found in the multicast log was
cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;
As expected there were no ports involved, but the multicast still didn’t work. However, if I took the first part of it and replaced it with the fogsettings file like in the wiki test it works. (This test was the test from the wiki and not a GUI multicast)
cat /opt/fog/.fogsettings | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint
The only difference between the two commands is the file and the fact that I am running the second one manually through the terminal and receiving it manually with udp-receiver.
Is there a different process when doing this test manually compared to when it is done through the GUI?
Here is the multicast log
[07-09-16 1:08:48 pm] | Task (1) Multi-Cast Task has started.
[07-09-16 1:08:48 pm] | CMD: cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;
[07-09-16 1:08:48 pm] | Task (1) Multi-Cast Task sending on base port: 53324
[07-09-16 1:08:48 pm] | Task (1) 1 client(s) found.
Broadcasting control to 224.0.0.1
UDP sender for (stdin) at 192.168.1.3 on eth0
[07-09-16 1:08:48 pm] | Task (1) /images/6_16_16 image file found.
Udp-sender 20120424
[07-09-16 1:08:48 pm] | Task (1) Multi-Cast Task has been cleaned.
[07-09-16 1:08:48 pm] | Task (1) Multi-Cast Task is new!
[07-09-16 1:08:48 pm] | 1 task found
[07-09-16 1:08:48 pm] | 0 tasks to be cleaned
[07-09-16 1:08:38 pm] | Sleeping for 10 seconds to ensure tasks are properly submitted
[07-09-16 1:08:28 pm] * No tasks found!
[07-09-16 1:08:18 pm] * No tasks found!
[07-09-16 1:08:18 pm] * Starting service loop
[07-09-16 1:08:18 pm] * Checking for new items every 10 seconds
[07-09-16 1:08:18 pm] * Starting MulticastManager Service
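When the log gets pasted as one long run of text, a short script can split it back into entries and pull out the pieces that matter for this thread. This is a sketch that assumes the timestamp format shown in the log above; `split_entries` and `base_port` are hypothetical helper names, and the sample string is an excerpt of that log.

```python
import re

# Matches timestamps such as "[07-09-16 1:08:48 pm]" as seen in the log above.
TIMESTAMP = re.compile(r"\[\d{2}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2} [ap]m\]")


def split_entries(raw: str) -> list:
    """Split a run-together log into one string per timestamped entry."""
    starts = [m.start() for m in TIMESTAMP.finditer(raw)]
    ends = starts[1:] + [len(raw)]
    return [raw[a:b].strip() for a, b in zip(starts, ends)]


def base_port(entries: list):
    """Return the base port the service reports, or None if no entry mentions it."""
    for entry in entries:
        match = re.search(r"sending on base port: (\d+)", entry)
        if match:
            return int(match.group(1))
    return None


sample = (
    "[07-09-16 1:08:48 pm] | Task (1) Multi-Cast Task sending on base port: 53324 "
    "[07-09-16 1:08:48 pm] | Task (1) 1 client(s) found. "
    "[07-09-16 1:08:28 pm] * No tasks found!"
)
entries = split_entries(sample)
print(len(entries), base_port(entries))
```

Note that the log still reports a base port (53324) even though the CMD line carries no --portbase flag, so the clients may still be told to use that port; that is a guess worth checking, not something the log confirms.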
-
@Sebastian-Roth I know, I turned into one of those people. But as far as I can tell nothing on the network has changed. It is a very basic setup. Internet comes in, hits the pfSense router, which goes to a layer 2 Meraki switch (which is trunked to another identical one) and out to the computers.
The only other device on the network is a wireless AP and a few gaming consoles. Other than that every computer has a direct ethernet cord to it. All of these devices were here before when multicast was working.
It truly sounds network related somehow, but I have no idea what could be causing it at this point.
On Monday I plan to hook the router (for the PXE boot routing), 2 PCs for multicasting, and an admin computer (for the web GUI access) to a dumb switch and see what happens when I cut everything else out. That should rule out any network issues, right?
-
@arainero Wait a second there, the multicast is going through the router? That could be the problem; it’s probably blocking the needed port and/or UDP traffic.
-
@Wayne-Workman The multicast doesn’t go through the router, it is just the edge device and I was just giving a brief rundown of the network and how basic it was.
As far as I know the router can’t block intra-LAN traffic. Also, no changes have been made to the router either (I know how much you guys love hearing that).