Multicasting
-
I’m back on the hunt for a solution to multicasting. The last time I really looked into this was last summer and I gave up, but now our imaging projects are coming up again.
First things first: I’m currently testing on a new install of Ubuntu 12.04 and using the latest fog SVN 3264. The problem I’m seeing is clients get to the please wait screen and then never go past that.
I’m working through this wiki: [url]http://fogproject.org/wiki/index.php?title=Troubleshooting_a_multicast[/url]
Testing 1 client is successful
Testing 2 clients is successful
Under “something else to try”, that fails and I see this on the server:
root@fogtesting:~/svn/trunk/bin# gunzip -c "/images/U160SGEJuly2014" | /usr/local/sbin/udp-sender --min-receivers 2 --portbase 9000 --interface eth0 --half-duplex --ttl 32
Udp-sender 20120424
Using mcast address 234.162.70.30
UDP sender for (stdin) at 10.162.70.30 on eth0
Broadcasting control to 224.0.0.1
New connection from 10.162.3.33 (#0) 00000009
Ready. Press any key to start sending data.
New connection from 10.162.3.38 (#1) 00000009
Ready. Press any key to start sending data.
Starting transfer: 00000009
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=8362
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=8362
It looks like the clients are communicating with the server but the data just doesn’t send correctly. I’d like to think multicasting is configured correctly on our network because we have other multicast applications working properly.
Thoughts?
-
Here’s where things get tricky. Your clients, I’m assuming here, are already waiting for an image? Even when you’re trying to troubleshoot? If this is the case two things are wrong in your command. First the gunzip happens at the client level and the second is the portbase is likely incorrect. If you’re manually entering the receiver command on the clients, what exactly are you typing?
-
I typed that command on the server, and then typed this into the clients:
udp-receiver --portbase 9000 --mcast-rdv-address 10.162.70.30 | partimage -f3 -b restore /dev/sda stdin
I did find that odd though - I know unzipping happens on the client, so it’s a bit confusing that the wiki has the gunzip command run on the server.
If the portbase is likely incorrect, what do you suggest changing it to?
-
Well, let’s back up a little. What version of fog are you running? Are these images in partclone or partimage format?
-
Alrighty. I’m on SVN 3264. I just set up a new server today to test with as I don’t want to mess around with our production server. The images are in partclone format. They were uploaded to our main server running fog 1.2.0, then I just copied the files over to the new one.
-
Well, the receiver command is then incorrect, as you are trying to use partimage, which is not the format the images are in. I forget the exact command to use, but I do know it’s something like partclone.restore.
-
I found a bug in the inits, and the command you need to use to “troubleshoot” would be:
[code]udp-receiver --nokbd --portbase $port --ttl 32 --mcast-rdv-address $storageip | partclone.restore --ignore_crc -O $1 -N -f 1[/code]
Change $port to 9000, $storageip to 10.162.70.30, and $1 to /dev/sda{partition you’re trying to image}.
Also update to the latest and greatest, as apparently the part that was missing in the inits was the --mcast-rdv-address $storageip. This is why it wasn’t finding your multicast. It was looking on 255.255.255.255 (234.0.0.1), which was obviously incorrect for what was needed.
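With those substitutions filled in, the client-side command might look like the rough sketch below. It only prints the pipeline rather than running it, so you can check it before pasting it on a client. The port and storage IP are the ones from this thread; the partition /dev/sda1 is a hypothetical placeholder, and the helper name build_receiver_cmd is made up for illustration.

```shell
#!/bin/sh
# Dry-run helper: prints the receiver pipeline instead of executing it.
# Port and storage IP are taken from this thread; the partition number
# is an assumption and must be replaced with the one being imaged.
port=9000
storageip=10.162.70.30
target=/dev/sda1   # hypothetical partition

build_receiver_cmd() {
    # $1 = portbase, $2 = storage node IP, $3 = target partition
    printf 'udp-receiver --nokbd --portbase %s --ttl 32 --mcast-rdv-address %s | partclone.restore --ignore_crc -O %s -N -f 1\n' "$1" "$2" "$3"
}

build_receiver_cmd "$port" "$storageip" "$target"
```

Run it once per partition, changing target each time, and paste the printed line on the client after starting udp-sender on the server.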
-
That fixed it! Thank you very much. I ended up on SVN 3267.
-
I’ve got something else weird going on now. I’m on SVN 3275 and when I try to multicast to a group, I see these errors in apache:
[Mon Apr 20 10:48:27 2015] [error] [client 10.162.3.26] PHP Warning: in_array() expects parameter 2 to be array, null given in /var/www/fog/lib/pages/TaskManagementPage.class.php on line 344, referer: [url]http://10.162.1.212/fog/management/index.php?node=tasks&sub=listgroups[/url]
[Mon Apr 20 10:48:27 2015] [error] [client 10.162.3.26] PHP Warning: Invalid argument supplied for foreach() in /var/www/fog/lib/fog/Host.class.php on line 978, referer: [url]http://10.162.1.212/fog/management/index.php?node=tasks&sub=listgroups[/url]
This is an issue that seems to be related just to my installation of fog and our database, as it works correctly on a fresh install of the SVN. I’m not sure what I should be asking now: Is there a way to check for invalid data, or what would be a good troubleshooting step? At this point I don’t think multicasting itself is the issue (again, because it works on a fresh install)
-
Maybe try to re-create that group? Remove the computers from it, delete the group, then re-create and re-add computers?
-
Hmm, we’re getting closer. I only get the second error now:
[Mon Apr 20 11:11:44 2015] [error] [client 10.162.3.26] PHP Warning: Invalid argument supplied for foreach() in /var/www/fog/lib/fog/Host.class.php on line 978, referer: [url]http://10.162.1.212/fog/management/index.php?node=group&sub=deploy&id=84&type=8[/url]
Edit: Sorry, still getting both errors. I had my log scrolled down too far to see the first ones.
-
I guess I’m not understanding what the issue is. Is it not working, or you’re just worried about the errors?
-
Ahh sorry, I wasn’t quite clear.
It is not working. When I click multicast for a group, I see the tasks for the individual machines show up under active tasks. But nothing appears under Active Multicast Tasks.
-
Then I’d recommend trying to truncate the multicastSessions and multicastSessionsAssoc tables. Also clean up any left over taskings:
[code]truncate table multicastSessions;
truncate table multicastSessionsAssoc;
delete from tasks where taskTypeID='8';[/code]
-
Ahh it was the leftover tasking that was holding me back. It’s working now and I’m currently watching 4 machines being multicasted to. Thanks again everyone.
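For anyone who hits the same thing, here is a minimal sketch of wrapping that cleanup in a script that shows the statements first and only runs them when asked. It assumes the database is named fog and that the mysql client can log in without extra options; both are assumptions, so adjust them for your install. The APPLY and DB_NAME variables and the cleanup_sql function are names made up for this example.

```shell
#!/bin/sh
# Sketch: print the cleanup SQL from this thread, and only run it on request.
# Assumptions: database name "fog", mysql client credentials already
# configured (for example in ~/.my.cnf).
DB_NAME="${DB_NAME:-fog}"

cleanup_sql() {
    # Same three statements as posted above.
    cat <<'EOF'
TRUNCATE TABLE multicastSessions;
TRUNCATE TABLE multicastSessionsAssoc;
DELETE FROM tasks WHERE taskTypeID='8';
EOF
}

if [ "${APPLY:-0}" = "1" ]; then
    # Feed the statements to the mysql client on stdin.
    cleanup_sql | mysql "$DB_NAME"
else
    # Dry run: show what would be executed.
    cleanup_sql
fi
```

Running it with no variables set only prints the SQL; setting APPLY=1 executes it. Note that the delete removes every task of type 8, so make sure no multicast you care about is queued first.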
-
[quote=“Tom Elliott, post: 45825, member: 7271”]Then I’d recommend trying to truncate the multicastSessions and multicastSessionsAssoc tables. Also clean up any left over taskings:
[code]truncate table multicastSessions;
truncate table multicastSessionsAssoc;
delete from tasks where taskTypeID='8';[/code][/quote]
I remember another thread about making tasks have a TTL sort of feature… Has that been implemented? If the default TTL is maybe 24 hours, then this guy wouldn’t have had the multicast issues the next day… it’d just resolve itself because the old tasks would get deleted.
-
[QUOTE=Wayne Workman]I remember another thread about making tasks have a TTL sort of feature… Has that been implemented? If the default TTL is maybe 24 hours, then this guy wouldn’t have had the multicast issues the next day… it’d just resolve itself because the old tasks would get deleted.[/QUOTE]
I like this idea. Usually when I’m imaging, I start the task and go PXE boot the machine(s) within a few minutes - maybe an hour or 2 at most. Maybe there could be a custom TTL for users who would want a longer timeframe?
-
[quote=“Ben Warfield, post: 45835, member: 17746”]I like this idea. Usually when I’m imaging, I start the task and go PXE boot the machine(s) within a few minutes - maybe an hour or 2 at most. Maybe there could be a custom TTL for users who would want a longer timeframe?[/quote]
Why not let the fog server wake them up with Wake-on-LAN?
-
Ha. I’d love to. Long explanation there though, totally unrelated to fog. We have teachers who use Netop Vision to wake up their computer labs in the morning. They complained about the extra time the PXE boot added to startup. I know, all of 15 seconds at most, but that’s a battle I wasn’t willing to fight. So 90% of our labs are set to boot right to the hard disk when they get the magic packet.