FOG Multicast issue - stuck on starting to restore image - version 7547
-
Hey everyone and @Developers ,
Just me again with another fun multicast problem! Ha ha. It seems that multicast problems always creep up after several months of no issues. I was running trunk version 5624 for the longest time (with a special init from @Sebastian-Roth) and I just upgraded to the latest trunk to see if that would fix the problem, but it didn’t.
As the the title says, when I attempt to do a multicast I get to this screen:
The multicast log shows this when I attempted to multicast to one computer:
[05-09-16 2:55:37 pm] | Sleeping for 10 seconds to ensure tasks are properly submitted [05-09-16 2:55:47 pm] | 0 tasks to be cleaned [05-09-16 2:55:47 pm] | 1 task found [05-09-16 2:55:47 pm] | Task (1) Multi-Cast Task is new! [05-09-16 2:55:47 pm] | Task (1) Multi-Cast Task has been cleaned. Udp-sender 20120424 [05-09-16 2:55:47 pm] | Task (1) /images/5516 image file found. Using mcast address 232.168.1.3 UDP sender for (stdin) at 192.168.1.3 on eth0 Broadcasting control to 224.0.0.1 [05-09-16 2:55:47 pm] | Task (1) 1 client(s) found. [05-09-16 2:55:47 pm] | Task (1) Multi-Cast Task sending on base port: 65102 [05-09-16 2:55:47 pm] | CMD: cat /images/5516/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --portbase 65102 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/5516/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --portbase 65102 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/5516/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 1 --max-wait 600 --portbase 65102 --full-duplex --ttl 32 --nokbd --nopointopoint; [05-09-16 2:55:47 pm] | Task (1) Multi-Cast Task has started. [05-09-16 2:55:57 pm] | 0 tasks to be cleaned [05-09-16 2:55:57 pm] | 1 task found [05-09-16 2:55:57 pm] | Task (1) Multi-Cast Task is already running PID 9405 [05-09-16 2:56:07 pm] | 0 tasks to be cleaned [05-09-16 2:56:07 pm] | 1 task found [05-09-16 2:56:07 pm] | Task (1) Multi-Cast Task is already running PID 9405 [05-09-16 2:56:17 pm] | 0 tasks to be cleaned [05-09-16 2:56:17 pm] | 1 task found [05-09-16 2:56:17 pm] | Task (1) Multi-Cast Task is already running PID 9405 [05-09-16 2:56:27 pm] | 0 tasks to be cleaned [05-09-16 2:56:27 pm] | 1 task found [05-09-16 2:56:27 pm] | Task (1) Multi-Cast Task is already running PID 9405 New connection from 192.168.1.106 (#0) 00000009 Starting transfer: 00000009 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:56:37 pm] | 0 tasks to be cleaned [05-09-16 2:56:37 pm] | 1 task found [05-09-16 2:56:37 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:56:47 pm] | 0 tasks to be cleaned [05-09-16 2:56:47 pm] | 1 task found [05-09-16 2:56:47 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:56:57 pm] | 0 tasks to be cleaned [05-09-16 2:56:57 pm] | 1 task found [05-09-16 2:56:57 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:57:08 pm] | 0 tasks to be cleaned [05-09-16 2:57:08 pm] | 1 task found [05-09-16 2:57:08 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:57:18 pm] | 0 tasks to be cleaned [05-09-16 2:57:18 pm] | 1 task found [05-09-16 2:57:18 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:57:28 pm] | 0 tasks to be cleaned [05-09-16 2:57:28 pm] | 1 task found [05-09-16 2:57:28 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:57:38 pm] | 0 tasks to be cleaned [05-09-16 2:57:38 pm] | 1 task found [05-09-16 2:57:38 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:57:48 pm] | 0 tasks to be cleaned [05-09-16 2:57:48 pm] | 1 task found [05-09-16 2:57:48 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:57:58 pm] | 0 tasks to be cleaned [05-09-16 2:57:58 pm] | 1 task found [05-09-16 2:57:58 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:58:08 pm] | 0 tasks to be cleaned [05-09-16 2:58:08 pm] | 1 task found [05-09-16 2:58:08 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:58:18 pm] | 0 tasks to be cleaned [05-09-16 2:58:18 pm] | 1 task found [05-09-16 2:58:18 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:58:28 pm] | 0 tasks to be cleaned [05-09-16 2:58:28 pm] | 1 task found [05-09-16 2:58:28 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:58:38 pm] | 0 tasks to be cleaned [05-09-16 2:58:38 pm] | 1 task found [05-09-16 2:58:38 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:58:49 pm] | 0 tasks to be cleaned [05-09-16 2:58:49 pm] | 1 task found [05-09-16 2:58:49 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:58:59 pm] | 0 tasks to be cleaned [05-09-16 2:58:59 pm] | 1 task found [05-09-16 2:58:59 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:59:09 pm] | 0 tasks to be cleaned [05-09-16 2:59:09 pm] | 1 task found [05-09-16 2:59:09 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:59:19 pm] | 0 tasks to be cleaned [05-09-16 2:59:19 pm] | 1 task found [05-09-16 2:59:19 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:59:29 pm] | 0 tasks to be cleaned [05-09-16 2:59:29 pm] | 1 task found [05-09-16 2:59:29 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [05-09-16 2:59:39 pm] | 0 tasks to be cleaned [05-09-16 2:59:39 pm] | 1 task found [05-09-16 2:59:39 pm] | Task (1) Multi-Cast Task is already running PID 9405 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 Dropping client #0 because of timeout Disconnecting #0 (192.168.1.106) bytes= re-xmits=0000000 ( 0.0%) slice=0112 - 0 Transfer complete. cat: write error: Broken pipe Udp-sender 20120424 Using mcast address 232.168.1.3 UDP sender for (stdin) at 192.168.1.3 on eth0 Broadcasting control to 224.0.0.1 [05-09-16 2:59:49 pm] | 0 tasks to be cleaned [05-09-16 2:59:49 pm] | 1 task found [05-09-16 2:59:49 pm] | Task (1) Multi-Cast Task is already running PID 9405 [05-09-16 2:59:59 pm] | 0 tasks to be cleaned
@Sebastian-Roth do you believe this is related to this issue that you helped me with before? https://forums.fogproject.org/topic/6295/capture-task-not-completing-after-finishing-and-then-loops/35
All of the UDP tests I have done have worked fine. Is there anything else I can supply that would help?
Thank you.
-
What OS and version is fog running on?
Have you looked through these?
https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_Downloading_-_Multicast
https://wiki.fogproject.org/wiki/index.php?title=Multicast -
@arainero Got me! I do remember this but I have to admit that I have no idea when exactly that patch wandered into the main source. Let me see… Sorry, cannot find it anymore. But I guess it should be in there. As far as I can tell the multicast hang you are seeing is not related to the disk/partition enumeration issue you found five months ago. Please follow the troubleshooting section in our wiki article on multicast. The log is saying that it does start udp-sender. Please make sure those processes are being started (
ps ax | grep udp
). As well you should clean all old multicast sessions from the web gui tasks listing and kill all old udp-sender processes (killall -9 udp-sender; service FOGMulticastManager restart; killall -9 udp-sender
). Then try again. -
I am running CentOS 6.6. I have tried to do the those steps you linked with no luck.
-
Another problem we are noticing that is that several computers are not booting up properly now. We are getting this on about 75% of all computers:
-
@arainero For the later error please check the BIOS/UEFI exit style settings for those hosts! Try different ones till you find one that works for that particular machine. Some work with normal iPXE exit, others are fine with SAN booting and so on. Just try those and see which one is working for you.
Have you gone through running the multicast tools by hand session yet? Please give us more information on what errors you see when you try those commands. We won’t be able to help you with information like “tried the steps … with no luck”. Try harder I’d say then!
-
I will be out of town for the week and plan to submit the requested details when I return. Thank you to everyone who commented so far.
-
Currently running latest trunk of 8020.
I ran the first test from Multicasting Troubleshooting and it appears that I can’t “send” anything from the server. The client computer is able to hit any key to wait to receive a file, however the server doesn’t appear to be able to do the same. Whenever I hit any key on the server to send the file nothing happens. It feels like the server doesn’t receive the command to send the file.
Just to verify, the process I did was:
-
run “sudo udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint” on the server.
-
run udp-receiver on client machine.
-
Hit any button on client machine.
-
After receiving the “go signal” on the client I hit any button on server.
Here is a picture of the server
The first test unsuccessful. Here is a picture of the test computer and the multicast log.
[06-07-16 3:25:08 pm] | 1 task found [06-07-16 3:25:08 pm] | 0 tasks to be cleaned [06-07-16 3:24:58 pm] | 1 task found [06-07-16 3:24:58 pm] | 0 tasks to be cleaned [06-07-16 3:24:48 pm] | 1 task found [06-07-16 3:24:48 pm] | 0 tasks to be cleaned [06-07-16 3:24:38 pm] | 1 task found [06-07-16 3:24:38 pm] | 0 tasks to be cleaned [06-07-16 3:24:28 pm] | 1 task found [06-07-16 3:24:28 pm] | 0 tasks to be cleaned [06-07-16 3:24:18 pm] | 1 task found [06-07-16 3:24:18 pm] | 0 tasks to be cleaned [06-07-16 3:24:08 pm] | 1 task found [06-07-16 3:24:08 pm] | 0 tasks to be cleaned [06-07-16 3:23:58 pm] | 1 task found [06-07-16 3:23:58 pm] | 0 tasks to be cleaned [06-07-16 3:23:48 pm] | 1 task found [06-07-16 3:23:48 pm] | 0 tasks to be cleaned [06-07-16 3:23:38 pm] | 1 task found [06-07-16 3:23:38 pm] | 0 tasks to be cleaned [06-07-16 3:23:28 pm] | 1 task found [06-07-16 3:23:28 pm] | 0 tasks to be cleaned [06-07-16 3:23:18 pm] | 1 task found [06-07-16 3:23:18 pm] | 0 tasks to be cleaned 15:23:12.640334 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=44713 [06-07-16 3:23:08 pm] | 1 task found [06-07-16 3:23:08 pm] | 0 tasks to be cleaned 15:23:06.344629 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=44713 15:23:00.048915 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=44713 [06-07-16 3:22:58 pm] | 1 task found [06-07-16 3:22:58 pm] | 0 tasks to be cleaned 15:22:53.753191 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=44713 [06-07-16 3:22:48 pm] | 1 task found [06-07-16 3:22:48 pm] | 0 tasks to be cleaned 15:22:47.457451 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=44713 15:22:41.161709 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=44713 [06-07-16 3:22:38 pm] | 1 task found [06-07-16 3:22:38 pm] | 0 tasks to be cleaned 15:22:33.816736 Bad command 0200 15:22:29.425225 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:22:28 pm] | 1 task found [06-07-16 3:22:28 pm] | 0 tasks to be cleaned 15:22:27.358714 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 15:22:27.280931 Starting transfer: 00000009 [06-07-16 3:22:18 pm] | 1 task found [06-07-16 3:22:18 pm] | 0 tasks to be cleaned [06-07-16 3:22:08 pm] | 1 task found [06-07-16 3:22:08 pm] | 0 tasks to be cleaned 15:22:00.474605 Broadcasting control to 192.168.1.255 15:22:00.474593 UDP sender for /opt/fog/.fogsettings at 192.168.1.3 on eth0 15:22:00.474516 Using mcast address 232.168.1.3 [06-07-16 3:21:58 pm] | 1 task found [06-07-16 3:21:58 pm] | 0 tasks to be cleaned [06-07-16 3:21:48 pm] | 1 task found [06-07-16 3:21:48 pm] | 0 tasks to be cleaned [06-07-16 3:21:38 pm] | 1 task found [06-07-16 3:21:38 pm] | 0 tasks to be cleaned [06-07-16 3:21:28 pm] | 1 task found [06-07-16 3:21:28 pm] | 0 tasks to be cleaned [06-07-16 3:21:18 pm] | 1 task found [06-07-16 3:21:18 pm] | 0 tasks to be cleaned [06-07-16 3:21:08 pm] | 1 task found [06-07-16 3:21:08 pm] | 0 tasks to be cleaned 15:20:58.775075 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:20:58 pm] | 1 task found [06-07-16 3:20:58 pm] | 0 tasks to be cleaned 15:20:52.708473 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:20:48 pm] | 1 task found [06-07-16 3:20:48 pm] | 0 tasks to be cleaned 15:20:46.641886 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 15:20:40.575285 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:20:38 pm] | 1 task found [06-07-16 3:20:38 pm] | 0 tasks to be cleaned 15:20:34.508688 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 15:20:28.442095 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:20:28 pm] | 1 task found [06-07-16 3:20:28 pm] | 0 tasks to be cleaned 15:20:22.375508 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:20:18 pm] | 1 task found [06-07-16 3:20:18 pm] | 0 tasks to be cleaned 15:20:16.308919 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 15:20:10.242303 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:20:08 pm] | 1 task found [06-07-16 3:20:08 pm] | 0 tasks to be cleaned 15:20:04.175729 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 15:19:58.109125 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-07-16 3:19:58 pm] | 1 task found [06-07-16 3:19:58 pm] | 0 tasks to be cleaned 15:19:56.042597 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 15:19:55.964819 Starting transfer: 00000009 [06-07-16 3:19:48 pm] | 1 task found [06-07-16 3:19:48 pm] | 0 tasks to be cleaned [06-07-16 3:19:38 pm] | 1 task found [06-07-16 3:19:38 pm] | 0 tasks to be cleaned [06-07-16 3:19:28 pm] | 1 task found [06-07-16 3:19:28 pm] | 0 tasks to be cleaned [06-07-16 3:19:18 pm] | 1 task found [06-07-16 3:19:18 pm] | 0 tasks to be cleaned [06-07-16 3:19:08 pm] | 1 task found [06-07-16 3:19:08 pm] | 0 tasks to be cleaned [06-07-16 3:18:58 pm] | 1 task found [06-07-16 3:18:58 pm] | 0 tasks to be cleaned [06-07-16 3:18:48 pm] | 1 task found [06-07-16 3:18:48 pm] | 0 tasks to be cleaned [06-07-16 3:18:38 pm] | 1 task found [06-07-16 3:18:38 pm] | 0 tasks to be cleaned [06-07-16 3:18:28 pm] | 1 task found [06-07-16 3:18:28 pm] | 0 tasks to be cleaned [06-07-16 3:18:18 pm] | 1 task found [06-07-16 3:18:18 pm] | 0 tasks to be cleaned [06-07-16 3:18:08 pm] | 1 task found [06-07-16 3:18:08 pm] | 0 tasks to be cleaned [06-07-16 3:17:58 pm] | 1 task found [06-07-16 3:17:58 pm] | 0 tasks to be cleaned [06-07-16 3:17:48 pm] | 1 task found [06-07-16 3:17:48 pm] | 0 tasks to be cleaned [06-07-16 3:17:38 pm] | 1 task found [06-07-16 3:17:38 pm] | 0 tasks to be cleaned [06-07-16 3:17:28 pm] | 1 task found [06-07-16 3:17:28 pm] | 0 tasks to be cleaned [06-07-16 3:17:18 pm] | 1 task found [06-07-16 3:17:18 pm] | 0 tasks to be cleaned [06-07-16 3:17:08 pm] | 1 task found [06-07-16 3:17:08 pm] | 0 tasks to be cleaned [06-07-16 3:16:58 pm] | 1 task found [06-07-16 3:16:58 pm] | 0 tasks to be cleaned [06-07-16 3:16:48 pm] | 1 task found [06-07-16 3:16:48 pm] | 0 tasks to be cleaned [06-07-16 3:16:37 pm] | 1 task found [06-07-16 3:16:37 pm] | 0 tasks to be cleaned 15:16:37.930643 Broadcasting control to 192.168.1.255 15:16:37.930521 UDP sender for /opt/fog/.fogsettings at 192.168.1.3 on eth0 15:16:37.930421 Using mcast address 232.168.1.3 [06-07-16 3:16:27 pm] | 1 task found [06-07-16 3:16:27 pm] | 0 tasks to be cleaned [06-07-16 3:16:17 pm] | 1 task found [06-07-16 3:16:17 pm] | 0 tasks to be cleaned [06-07-16 3:16:07 pm] | 1 task found [06-07-16 3:16:07 pm] | 0 tasks to be cleaned [06-07-16 3:15:57 pm] | 1 task found [06-07-16 3:15:57 pm] | 0 tasks to be cleaned [06-07-16 3:15:47 pm] | 1 task found [06-07-16 3:15:47 pm] | 0 tasks to be cleaned [06-07-16 3:15:37 pm] | 1 task found [06-07-16 3:15:37 pm] | 0 tasks to be cleaned [06-07-16 3:15:27 pm] | 1 task found [06-07-16 3:15:27 pm] | 0 tasks to be cleaned [06-07-16 3:15:17 pm] | 1 task found [06-07-16 3:15:17 pm] | 0 tasks to be cleaned [06-07-16 3:15:07 pm] | 1 task found [06-07-16 3:15:07 pm] | 0 tasks to be cleaned [06-07-16 3:14:57 pm] | 1 task found [06-07-16 3:14:57 pm] | 0 tasks to be cleaned 15:14:51.929596 Broadcasting control to 192.168.1.255 15:14:51.929580 UDP sender for /opt/fog/.fogsettings at 192.168.1.3 on eth0 15:14:51.929493 Using mcast address 232.168.1.3 [06-07-16 3:14:47 pm] | 1 task found [06-07-16 3:14:47 pm] | 0 tasks to be cleaned [06-07-16 3:14:37 pm] | 1 task found [06-07-16 3:14:37 pm] | 0 tasks to be cleaned [06-07-16 3:14:27 pm] | 1 task found [06-07-16 3:14:27 pm] | 0 tasks to be cleaned [06-07-16 3:14:17 pm] | 1 task found [06-07-16 3:14:17 pm] | 0 tasks to be cleaned [06-07-16 3:14:07 pm] | 1 task found [06-07-16 3:14:07 pm] | 0 tasks to be cleaned [06-07-16 3:13:57 pm] | 1 task found [06-07-16 3:13:57 pm] | 0 tasks to be cleaned [06-07-16 3:13:47 pm] | 1 task found [06-07-16 3:13:47 pm] | 0 tasks to be cleaned [06-07-16 3:13:37 pm] | 1 task found [06-07-16 3:13:37 pm] | 0 tasks to be cleaned [06-07-16 3:13:27 pm] | 1 task found [06-07-16 3:13:27 pm] | 0 tasks to be cleaned [06-07-16 3:13:17 pm] | 1 task found [06-07-16 3:13:17 pm] | 0 tasks to be cleaned [06-07-16 3:13:07 pm] | 1 task found [06-07-16 3:13:07 pm] | 0 tasks to be cleaned [06-07-16 3:12:57 pm] | 1 task found [06-07-16 3:12:57 pm] | 0 tasks to be cleaned [06-07-16 3:12:47 pm] | 1 task found [06-07-16 3:12:47 pm] | 0 tasks to be cleaned [06-07-16 3:12:37 pm] | 1 task found [06-07-16 3:12:37 pm] | 0 tasks to be cleaned [06-07-16 3:12:27 pm] | 1 task found [06-07-16 3:12:27 pm] | 0 tasks to be cleaned [06-07-16 3:12:17 pm] | 1 task found [06-07-16 3:12:17 pm] | 0 tasks to be cleaned [06-07-16 3:12:07 pm] | 1 task found [06-07-16 3:12:07 pm] | 0 tasks to be cleaned [06-07-16 3:11:57 pm] | 1 task found [06-07-16 3:11:57 pm] | 0 tasks to be cleaned [06-07-16 3:11:47 pm] | 1 task found [06-07-16 3:11:47 pm] | 0 tasks to be cleaned [06-07-16 3:11:37 pm] | 1 task found [06-07-16 3:11:37 pm] | 0 tasks to be cleaned [06-07-16 3:11:27 pm] | 1 task found [06-07-16 3:11:27 pm] | 0 tasks to be cleaned [06-07-16 3:11:17 pm] | 1 task found [06-07-16 3:11:17 pm] | 0 tasks to be cleaned
Here is a youtube video of the process of making the group to send out the multicast - Group and Multicast
Here is a youtube video of the process of the computer booting up from the multicast - Multicast fail
Any thoughts on why the server can’t send anything? Nothing has changed on the network, server, etc.
-
-
@arainero I just ran some tests and learned a lot.
I tweaked the commands to specify a port, and after some failure and wireshark-ing, I’ve created new firewalld commands to allow specifically that port and ICMP. I’ve verified they are working here at home, and have updated the wiki.
On the server to configure firewalld:
echo "Open UDP port 9000, this is the default port of udp-sender." firewall-cmd --permanent --add-port=9000-9001/udp echo "Allow IGMP traffic for multicast" firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p igmp -j ACCEPT systemctl restart firewalld.service echo "Done."
Then on the server to start the test using the default port of 9000:
udp-sender --file /opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint --portbase 9000
On the client, it’s the same still:
udp-receiver
Ubuntu doesn’t have firewalld yet, you’ll need to figure out comparable iptables commands. It’s quite simply just allowing UDP port 9000 and 9001, and allowing IGMP traffic.
Alternatively, simplify it for yourself and allow everything and see if you succeed then.
-
I’ve also updated https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_Downloading_-_Multicast to have a test for multicast.
-
I tried running
/opt/fog/.fogsettings --log /opt/fog/log/multicast.log --ttl 1 --nopointopoint --portbase 9000
but the same thing happened. I can see the client connecting, but when the client presses any key nothing happens. The only thing that changed is that when the client presses ctrl-c I seere-xmits=0000000 ( 0.0%) slice=0112 - 0
.
display on the server. That didn’t happen with the previous command.Also, I do not have iptables, selinux, or firewalld running so it isn’t a rule issue I believe.
Here is what was in the multicast log.
11:33:51.058257 Disconnecting #0 (192.168.1.127) 11:33:51.058848 Transfer complete. 11:33:51.058925 [06-08-16 11:33:54 am] | 0 tasks to be cleaned [06-08-16 11:33:54 am] | 1 task found [06-08-16 11:34:04 am] | 0 tasks to be cleaned [06-08-16 11:34:04 am] | 1 task found [06-08-16 11:34:14 am] | 0 tasks to be cleaned [06-08-16 11:34:14 am] | 1 task found [06-08-16 11:34:24 am] | 0 tasks to be cleaned [06-08-16 11:34:24 am] | 1 task found [06-08-16 11:34:34 am] | 0 tasks to be cleaned [06-08-16 11:34:34 am] | 1 task found [06-08-16 11:34:44 am] | 0 tasks to be cleaned [06-08-16 11:34:44 am] | 1 task found [06-08-16 11:34:54 am] | 0 tasks to be cleaned [06-08-16 11:34:54 am] | 1 task found [06-08-16 11:35:04 am] | 0 tasks to be cleaned [06-08-16 11:35:04 am] | 1 task found [06-08-16 11:35:14 am] | 0 tasks to be cleaned [06-08-16 11:35:14 am] | 1 task found [06-08-16 11:35:24 am] | 0 tasks to be cleaned [06-08-16 11:35:24 am] | 1 task found 11:35:27.212477 Using mcast address 232.168.1.3 11:35:27.212572 UDP sender for /opt/fog/.fogsettings at 192.168.1.3 on eth0 11:35:27.212588 Broadcasting control to 192.168.1.255 [06-08-16 11:35:34 am] | 0 tasks to be cleaned [06-08-16 11:35:34 am] | 1 task found [06-08-16 11:35:44 am] | 0 tasks to be cleaned [06-08-16 11:35:44 am] | 1 task found [06-08-16 11:35:54 am] | 0 tasks to be cleaned [06-08-16 11:35:54 am] | 1 task found [06-08-16 11:36:04 am] | 0 tasks to be cleaned [06-08-16 11:36:04 am] | 1 task found 11:36:11.950410 Starting transfer: 00000009 11:36:12.028143 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 11:36:14.094665 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-08-16 11:36:14 am] | 0 tasks to be cleaned [06-08-16 11:36:14 am] | 1 task found 11:36:20.161200 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 [06-08-16 11:36:24 am] | 0 tasks to be cleaned [06-08-16 11:36:24 am] | 1 task found 11:36:26.227781 Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=10000 11:36:26.894416 Disconnecting #0 (192.168.1.127) 11:36:26.894806 Transfer complete. 11:36:26.894865 [06-08-16 11:36:34 am] | 0 tasks to be cleaned [06-08-16 11:36:34 am] | 1 task found [06-08-16 11:36:44 am] | 0 tasks to be cleaned```
-
@arainero the logs say there was a timeout. I suspect a network problem.
Is it possible for you to connect the fog server and a Linux client via a mini switch (live boot) and try? If fog isn’t doing dhcp then you will need to configure a static ip on the live boot before testing. This test will eliminate your network as a variable to test just the server. If it succeeds then the issue is a network one. If it fails then it Is a server one.
-
@Wayne-Workman
When you say live boot what do you mean exactly? I currently have my pfsense router handling DHCP and network-boot requests.What is the best way to have FOG work when removing the router from this situation? I know I can just assign each server their own IP and do a direct connection, but I don’t think the network boot would work then, right?
-
@arainero Maybe it’s not the best idea given your setup.
I’m willing to take a look via TeamViewer around 4:30 Central Standard if you are free/around/being paid.
-
I would love to but I won’t be available until around 6:30 pm central. Would tomorrow work better for you?
-
@arainero 6:30 works for me. Shoot me a PM then.
-
After running some tests with @Wayne-Workman some progress was made!
The udp-sender test from the wiki wasn’t working until we forced the command to utilize the broadcast address on the FOG server NIC. After doing that the udp test worked.
After doing that we changed the broadcast address under the multicast settings in FOG Settings. We no longer got the “blue screen” failure, but a different type of failure and more activity in the logs.
Here is what the multicast log shows:
[06-22-16 8:37:55 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:37:55 pm] | 1 task found [06-22-16 8:37:55 pm] | 0 tasks to be cleaned [06-22-16 8:37:45 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:37:45 pm] | 1 task found [06-22-16 8:37:45 pm] | 0 tasks to be cleaned [06-22-16 8:37:35 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:37:35 pm] | 1 task found [06-22-16 8:37:35 pm] | 0 tasks to be cleaned Broadcasting control to 224.0.0.1 UDP sender for (stdin) at 192.168.1.3 on eth0 Udp-sender 20120424 cat: write error: Broken pipe Could not broadcast data packet to 192.168.1.255:64478 (Permission denied) Starting transfer: 00000009 New connection from 192.168.1.122 (#5) 00000009 New connection from 192.168.1.123 (#4) 00000009 New connection from 192.168.1.124 (#3) 00000009 [06-22-16 8:37:25 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:37:25 pm] | 1 task found [06-22-16 8:37:25 pm] | 0 tasks to be cleaned New connection from 192.168.1.120 (#2) 00000009 New connection from 192.168.1.121 (#1) 00000009 New connection from 192.168.1.119 (#0) 00000009 [06-22-16 8:37:15 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:37:15 pm] | 1 task found [06-22-16 8:37:15 pm] | 0 tasks to be cleaned [06-22-16 8:37:05 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:37:05 pm] | 1 task found [06-22-16 8:37:05 pm] | 0 tasks to be cleaned [06-22-16 8:36:55 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:36:55 pm] | 1 task found [06-22-16 8:36:55 pm] | 0 tasks to be cleaned [06-22-16 8:36:45 pm] | Task (3) Multi-Cast Task is already running PID 8514 [06-22-16 8:36:45 pm] | 1 task found [06-22-16 8:36:45 pm] | 0 tasks to be cleaned [06-22-16 8:36:35 pm] | Task (3) Multi-Cast Task has started. [06-22-16 8:36:35 pm] | CMD: cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 6 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 64478 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 6 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 64478 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 6 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 64478 --full-duplex --ttl 32 --nokbd --nopointopoint; [06-22-16 8:36:35 pm] | Task (3) Multi-Cast Task sending on base port: 64478 [06-22-16 8:36:35 pm] | Task (3) 6 client(s) found. [06-22-16 8:36:35 pm] | Task (3) /images/6_16_16 image file found. [06-22-16 8:36:35 pm] | Task (3) Multi-Cast Task has been cleaned. Broadcasting control to 224.0.0.1 UDP sender for (stdin) at 192.168.1.3 on eth0 Udp-sender 20120424 [06-22-16 8:36:35 pm] | Task (3) Multi-Cast Task is new! [06-22-16 8:36:35 pm] | 1 task found [06-22-16 8:36:35 pm] | 0 tasks to be cleaned [06-22-16 8:36:25 pm] | Sleeping for 10 seconds to ensure tasks are properly submitted [06-22-16 8:36:15 pm] * No tasks found! [06-22-16 8:36:05 pm] * No tasks found! [06-22-16 8:35:55 pm] * No tasks found! [06-22-16 8:35:45 pm] * No tasks found! [06-22-16 8:35:35 pm] * No tasks found! [06-22-16 8:35:25 pm] * No tasks found! [06-22-16 8:35:15 pm] * No tasks found! [06-22-16 8:35:05 pm] * No tasks found! [06-22-16 8:34:55 pm] * No tasks found! [06-22-16 8:34:45 pm] | Task (2) Multi-Cast Task has been cleaned as cancelled. [06-22-16 8:34:45 pm] | Task (2) Multi-Cast Task is no longer running. [06-22-16 8:34:45 pm] | 1 task found [06-22-16 8:34:45 pm] | 0 tasks to be cleaned [06-22-16 8:34:35 pm] | Task (2) Multi-Cast Task has started. [06-22-16 8:34:35 pm] | CMD: cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 6 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 9000 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 6 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 9000 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 6 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 9000 --full-duplex --ttl 32 --nokbd --nopointopoint; cat: write error: Broken pipe bind socket to 192.168.1.3:9001 (Address already in use) Udp-sender 20120424 cat: write error: Broken pipe bind socket to 192.168.1.3:9001 (Address already in use) Udp-sender 20120424 [06-22-16 8:34:35 pm] | Task (2) Multi-Cast Task sending on base port: 9000 : Broken pipe cat: write error[06-22-16 8:34:35 pm] | Task (2) 6 client(s) found. bind socket to 192.168.1.3:9001 (Address already in use) Udp-sender 20120424 [06-22-16 8:34:35 pm] | Task (2) /images/6_16_16 image file found. [06-22-16 8:34:35 pm] | Task (2) Multi-Cast Task has been cleaned. [06-22-16 8:34:35 pm] | Task (2) Multi-Cast Task is new! [06-22-16 8:34:35 pm] | 1 task found [06-22-16 8:34:35 pm] | 0 tasks to be cleaned [06-22-16 8:34:25 pm] | Sleeping for 10 seconds to ensure tasks are properly submitted [06-22-16 8:34:15 pm] * No tasks found! [06-22-16 8:34:05 pm] * No tasks found! [06-22-16 8:33:55 pm] * No tasks found! [06-22-16 8:33:45 pm] * No tasks found! [06-22-16 8:33:35 pm] * No tasks found! [06-22-16 8:33:25 pm] * No tasks found! [06-22-16 8:33:15 pm] * No tasks found! [06-22-16 8:33:05 pm] * No tasks found! [06-22-16 8:32:55 pm] * No tasks found! [06-22-16 8:32:45 pm] * No tasks found! [06-22-16 8:32:35 pm] * No tasks found! [06-22-16 8:32:25 pm] * No tasks found! [06-22-16 8:32:15 pm] * No tasks found! [06-22-16 8:32:05 pm] | Task (1) Multi-Cast Task has been cleaned as cancelled. [06-22-16 8:32:05 pm] | Task (1) Multi-Cast Task is no longer running. [06-22-16 8:32:05 pm] | 1 task found [06-22-16 8:32:05 pm] | 0 tasks to be cleaned [06-22-16 8:31:55 pm] | Task (1) Multi-Cast Task has started. [06-22-16 8:31:55 pm] | CMD: cat /images/6_16_16/d1p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 9000 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d1p2.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 9000 --full-duplex --ttl 32 --nokbd --nopointopoint;cat /images/6_16_16/d2p1.img | /usr/local/sbin/udp-sender --interface eth0 --min-receivers 2 --max-wait 60 --mcast-data-address 192.168.1.255 --portbase 9000 --full-duplex --ttl 32 --nokbd --nopointopoint;
-
Particularly, these lines from the log are most interesting:
cat: write error: Broken pipe Could not broadcast data packet to 192.168.1.255:64478 (Permission denied) Starting transfer: 00000009
Also, the invalid session is concerning as well.
-
Looking at the source code from here, this is where the error comes from:
https://www.udpcast.linux.lu/downloads.htmlsenddata.c
static int sendRawData(int sock, struct net_config *config, char *header, int headerSize, unsigned char *data, int dataSize) { struct iovec iov[2]; struct msghdr hdr; int packetSize; int ret; iov[0].iov_base = header; iov[0].iov_len = headerSize; iov[1].iov_base = data; iov[1].iov_len = dataSize; hdr.msg_name = &config->dataMcastAddr; hdr.msg_namelen = sizeof(struct sockaddr_in); hdr.msg_iov = iov; hdr.msg_iovlen = 2; initMsgHdr(&hdr); packetSize = dataSize + headerSize; rgWaitAll(config, sock, config->dataMcastAddr.sin_addr.s_addr, packetSize); ret = sendmsg(sock, &hdr, 0); if (ret < 0) { char ipBuffer[16]; udpc_fatal(1, "Could not broadcast data packet to %s:%d (%s)\n", getIpString(&config->dataMcastAddr, ipBuffer), getPort(&config->dataMcastAddr), strerror(errno)); } return 0; }
-
I tried a few different variations of changing the multicast address with no luck. I also tried a different kernel to see if that would help either, but it didn’t appear to.
@Developers anything else you suggest I try to do or provide on this post that will help?