FOG Multicast issue - stuck on starting to restore image - version 7547
-
Could the time stamps be like that since I had the log sorted from top down?
Trunking might have been the wrong term. They are simply connected to each other through SFP ports with twinax cable. Like plugging a switch into another switch with ethernet.
The udp sender appears to be in working order. This is what I get after starting a multicast session:
[root@fogserv ~]# ps ax | grep udp 28869 ? S 0:00 sh -c cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint; 28871 ? S 0:00 /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --mcast-data-address 232.168.1.0 --full-duplex --ttl 32 --nokbd --nopointopoint 28881 pts/1 S+ 0:00 grep udp
Doing the debug test seems to have worked.
One thing I noticed that might be odd is that when I went to the task management page and to active multicast tasks it said 4 hosts. There are only 2 members in the group. Is this indicative of anything wrong?
-
@Wayne-Workman The fog server is on the LAN along with the client computers. The router is the edge device and handles the DHCP for the network.
-
@arainero Do I get this right? You did as I suggested: started a multicast using the web gui and started the multicast receive on two clients by hand. That worked!? So why wouldn’t it work when the other two clients run into the multicast session???
Could you please do this same test again but remove the
--mcast-data-address 232.168.1.0
setting. This should not be necessary I reckon. After that give it another try with a plain multicast session. But first make sure that you don’t have any old udp-sender processes running on the FOG server (ps ax | grep udp
andkillall udp-sender
in case there are orphaned processes around) and clear the (multicast) task lists in the web gui! -
@arainero I would think the reasoning for the maxClients to show up as 4 is because you’ve rebooted/reran the same multicast tasking for the host at least 1 per each host and possibly more times on each (maybe only one host) by hand?
-
@Sebastian-Roth Once I removed the --mcast-data-address 232.168.1.0 from the multicast settings page and attempted the manual debug multicast it didn’t work. I made sure no udp process was running in the background either. For some reason a mcast address is needed now for these tests. (This was never necessary when multicasting used to work)
-
@arainero Why were those different IPs (.109 and .110) than last time (.127 and .128)? Did you use different clients for the test or do your clients change IP address on every reboot because they get it from a dynamic pool?
Have you tried this on just the simple setup with a dump switch yet?
Could you please run the same test again but use
--mcast-data-address 232.168.1.4
for example. Just want to see what exactly this is about. Not making sense to me that it does receive the data on 232.168.1.0 but not on 232.168.1.3. -
The IP’s were different because they were different computers. The computers are all identical and are all connected to the same switch. Nothing is unique about them.
We just tried the test on a dumb switch, the multicast failed at the same spot as usual. This is what the multcast log shows.
[07-18-16 11:20:15 am] | CMD: cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 180 --full-duplex --ttl 32 --nokbd --nopointopoint; [07-18-16 11:20:15 am] | Task (1) Multi-Cast Task sending on base port: 50362 [07-18-16 11:20:15 am] | Task (1) 2 client(s) found. [07-18-16 11:20:15 am] | Task (1) /images/6_16_16 image file found. [07-18-16 11:20:15 am] | Task (1) Multi-Cast Task has been cleaned.
Using
--mcast-data-address 232.168.1.4
worked when testing manually, like the others. -
@arainero Then can you add your 232.168.1.4 address to the FOG Settings?
FOG Configuration Page->FOG Settings->Multicast Settings->FOG_MULTICAST_ADDRESS.
If you have to specify the port, also under the same area I think (FOG_MULTICAST_PORT_OVERRIDE) just update as well.
Then cancel the tasking.
Run these commands in MySQL (just to ensure all is well):
truncate table `fog`.`multicastSessions`; truncate table `fog`.`multicastSessionsAssoc`; DELETE FROM `fog`.`tasks` where `taskTypeID`='8';
Then recreate your task and test again.
-
After doing the steps tom listed, it wouldn’t hurt to give the FOGMulticastManager a restart as well.
-
@Tom-Elliott Tried this with no success unfortunately. I just updated FOG to trunk to see if that would help, but that didn’t yield any positive results either.
Would getting a pcap help at all with this? I wouldn’t know what to look for, but I could supply one I’m sure.
-
Well, I got some good news. I was able to get a successful multicast going tonight. I am not 100% sure of what fixed it. I did several things.
Not in particular order:
- Updated FOG to trunk
- Disabled IGMP snooping from the switches
- Made a change to CentOS as seen here that has to do with routing (honestly don’t think this helped)
- Added 232.168.1.0 for FOG_MULTICAST_ADDRESS (it wouldn’t work without that)
I plan to do more tests to really make sure it is working tomorrow and the following days. I will keep this updated if anything substantial changes.
Thank you to everyone.