Fog 1.1.0 multicast sits at "Starting to restore image (-) to device (/dev/sda1)"
-
ianabc or anybody,
Are you guys running systems with multiple nodes?
-
Sorry, missed your question. No, it's a single-node setup. I think my problem is either with udpcast or our network configuration.
-
For my setup, this looks very similar to [URL='http://fogproject.org/forum/threads/multicast-does-not-work-to-multiple-clients-only-to-single-client.863/']this post[/URL], but their hack with the 239.X.X.X mcast address doesn't seem to be working for me.
-
I'm also having this issue, and it is really hard to isolate. In my old setup every site had its own 0.32 FOG server (Normal installation), and multicast worked great. The catch with that setup was that the images were "per site", since each site had to point to its own local FOG server, and having one SQL database per site was unmanageable and inefficient.
I'm attempting to downgrade/revert for testing purposes. In my new setup each site has a storage node set to "Master" (because only Master nodes can multicast), AND each site's storage node is its own storage group. I'm not sure this is the most efficient setup, but I need to be sure which node(s) the machines are pulling their images from (at least for now), and if only Master nodes can send multicast traffic, it follows that each site needs a Master node on its LAN.
I’m in the process of reverting and testing, but where would I find the best multicast debug page?
If there is anything else that I can add, please let me know - I plan on working on this all week…
Ubuntu 14.04 - FOG 1.1.2
Multicast log on main FOG server is empty
Multicast log on storage node does not exist (file not there)
Storage nodes ARE pushing images to clients in unicast mode correctly (tested and verified).
Thanks for any help,
D.L. -
To add,
If the storage node is the device actually pushing out the image via multicast, then shouldn't the logging, the UDPcast service, and the testing all happen there? I ask because the storage node(s) don't seem to have anything related to multicasting at all; FOGMulticastManager isn't even present as a service. I am definitely puzzled,
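For reference, these are the spots I have been checking on the storage node; the paths and service name below assume a default 1.x install, so adjust them if your install went somewhere else:
[CODE]
# is the multicast service even installed on this node? (default 1.x layout)
ls -l /opt/fog/service/FOGMulticastManager/
# is it running?
ps ax | grep [F]OGMulticastManager
# and the log it should write on whichever node does the sending
tail -f /opt/fog/log/multicast.log
[/CODE]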
Thanks for reading,
D.L. -
I have a clean install of FOG on Ubuntu 13.10.
I’m having the same issue. Hanging on the Partclone screen.
Unicast is working. The network is very basic:
FOG Server -> unmanaged switch -> PCs. There is nothing special about the setup.
I start the Multicast and the session seems to end right away. I have rebooted the server and restarted the services indicated earlier in this post.
Here is the dump from my log.
[CODE]
[07-11-14 3:49:46 pm] * No tasks found!
[07-11-14 3:49:56 pm] | Task (8) Multitest is new!
[07-11-14 3:49:56 pm] | Task (8) /images/163DS image file found.
[07-11-14 3:49:56 pm] | Task (8) 3 client(s) found.
[07-11-14 3:49:56 pm] | Task (8) Multitest sending on base port: 53574
[07-11-14 3:49:56 pm] CMD: cat "/images/163DS"|/usr/local/sbin/udp-sender --min$
[07-11-14 3:49:56 pm] | Task (8) Multitest has started.
[07-11-14 3:50:06 pm] | Task (8) Multitest is no longer running.
[/CODE]
-
All, I think the issues you're all experiencing stem from prior problems, and that's probably my bad. I'm sorry. As I stated in my beginner's notice (developer thread and feature request), I am human.
I'm starting to think the issue is caused by previous "issues" with multicast tasks.
While not normally recommended, I'm going to make a request for those having issues: clean out your old tasks.
You will have to do this with MySQL. You can do it through phpMyAdmin or the terminal.
Log in to your database; the CLI method is below:
[code]mysql -u root [-p'PASSWORDHERE'] fog   # include the -p part only if you set a password[/code]
Once at the MySQL prompt (or in phpMyAdmin), run these commands:
[code]delete from tasks where taskTypeID='8';
truncate table multicastSessions;
truncate table multicastSessionsAssoc;
exit;[/code]
This will remove every association of multicast tasks in the system: all rows from multicastSessions, all of their associations, and any lingering tasks.
Then restart the FOGMulticastManager service.
[code]service FOGMulticastManager restart[/code]
Then try your tasking again. Hopefully all will work much better. I'm certainly hoping so.
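If you want a quick way to confirm the cleanup actually took, a query along these lines should come back with a zero in each count (just one way to check it; phpMyAdmin works too):
[code]# add -p'PASSWORDHERE' again if you set a mysql password
mysql -u root fog -e "select count(*) from tasks where taskTypeID='8'; select count(*) from multicastSessions; select count(*) from multicastSessionsAssoc;"[/code]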
-
I’ve just tried 2039 and 2046 and I still get stuck at “Starting to restore image”
[ATTACH]1159[/ATTACH]
Doing a "multicast" to a group with a single member works as expected, but with two members in the group I get the message above on both machines. I see the same behaviour on a physical network as well, but I can't rule out the network as the source of the problem in either case; I've never had multicast working, and I didn't use it in 0.32.
I'm starting to think the actual issue is with udpcast. I've done some tests and I can't get it to work on either the virtual or the physical setup. The problem is the same: multicast to a single machine works, multicast to two fails.
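For anyone who wants to try the same thing outside of fog, the bare udpcast test I have been running looks roughly like this (the flags are the same ones fog passes to udp-sender; the file names are just placeholders):
[CODE]
# on the fog server: make a throwaway file, then multicast it once 2 receivers connect
dd if=/dev/urandom of=/tmp/test.bin bs=1M count=50
udp-sender --file /tmp/test.bin --min-receivers 2 --portbase 9000 --interface eth0 --ttl 32 --nokbd

# on each client
udp-receiver --file /tmp/test.bin --portbase 9000 --nokbd
[/CODE]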
-
Hi all,
Have you guys tried a FOG installation without a SQL password? Having a blank root password on the SQL database solved all of my multicast problems. And yes, I verified it "6 ways from Sunday" - as in, it was not an incorrect/mismatched password causing the issue. Simply having a SQL password at all causes this behavior (hanging on "starting to restore") in my environment. It did not matter whether it was correct in the configuration files, etc. - just the existence of a root password on the SQL database prevented multicast from working. Strange, I know, but it's the truth. Sometimes I make mistakes, but in this case I verified the password and re-installed over and over again, and could not get multicast to work WITH a SQL password, so I'm very sure of this, in my environment at least.
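If anyone wants to double-check the same thing on their own box, this is roughly the sanity test I would run; the config path below is where my 1.x install keeps the service-side credentials, so it may well differ on yours, and THATPASSWORD is obviously just a placeholder:
[CODE]
# see what password the FOG services think they should be using
grep -i 'password' /opt/fog/service/etc/config.php

# then confirm that password actually gets you into the fog database
mysql -u root -p'THATPASSWORD' fog -e "show tables;"
[/CODE]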
Thanks,
D.L. -
[quote=“rhythmtone, post: 32953, member: 57”]Hi all,
Have you guys tried a FOG installation without a SQL password?
[/quote]
I have one system with and one without a mysql password set. I get the same results either way. Can I just ask you to confirm that you are multicasting to more than one machine in a group? I can multicast with a single group member but not with 2 or more. In my case it looks like the problem is with udpcast itself, not fog.
-
I have the same multicast problem with version 1.1.2 of FOG.
OS: Debian 7.5 x64 on ESXI 5.5
Before that, everything was working perfectly with version 0.32 of FOG and 32-bit Linux.
Is it possible that it has something to do with the 32-to-64-bit change of the OS? -
OK, in my case the problem is fixed, and it was my network (sigh…). The short story is, I should have started testing at the most basic level by trying to transfer anything at all over multicast; I found omping and udpcast useful for doing this. If someone has a good understanding of multicast, I think some debugging examples using udpcast or omping would be a nice addition to the wiki.
Now for the long answer…
My test setup uses a KVM/QEMU network with 3 KVM guests: one FOG server and two client machines. All the guests are networked together on 192.168.222.0/24, which is NATed to give access to the outside world. The NATing is done by iptables on the KVM host via libvirt, and the nat table has a POSTROUTING chain which looks like
[CODE]
$ iptables -t nat -nL
…
RETURN     all  --  192.168.222.0/24     224.0.0.0/24
RETURN     all  --  192.168.222.0/24     255.255.255.255
MASQUERADE tcp  --  192.168.222.0/24    !192.168.222.0/24    masq ports: 1024-65535
MASQUERADE udp  --  192.168.222.0/24    !192.168.222.0/24    masq ports: 1024-65535
MASQUERADE all  --  192.168.222.0/24    !192.168.222.0/24
[/CODE]
The problem is the first of those lines, which applies only to 224.0.0.0/24, when I want it to cover all multicast addresses, 224.0.0.0/4. Making this change allows omping, udpcast and [I]of course[/I] fog to use multicast without the hanging problem!
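In case it saves someone else the digging, the change boiled down to swapping that /24 for /4. A rough sketch of how I applied it is below; the chain name and rule position depend on how libvirt laid out your nat table, so check iptables -t nat -nL --line-numbers first rather than copying this blindly:
[CODE]
# drop the too-narrow rule and re-add it covering the whole multicast range (224.0.0.0/4)
iptables -t nat -D POSTROUTING -s 192.168.222.0/24 -d 224.0.0.0/24 -j RETURN
iptables -t nat -I POSTROUTING -s 192.168.222.0/24 -d 224.0.0.0/4 -j RETURN
[/CODE]
-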
I just did another clean install of Ubuntu Desktop 13.10, with FOG 1.1.2. I made sure that I did not set a MYSQL password.
I still cannot multicast.
[CODE]
[07-16-14 4:59:22 pm] * No tasks found!
[07-16-14 4:59:32 pm] | Task (1) Multi is new!
[07-16-14 4:59:32 pm] | Task (1) /images/163QAIfinalRev5 image file found.
[07-16-14 4:59:32 pm] | Task (1) 3 client(s) found.
[07-16-14 4:59:32 pm] | Task (1) Multi sending on base port: 63100
[07-16-14 4:59:32 pm] CMD: cat "/images/163QAIfinalRev5"|/usr/local/sbin/udp-sender --min-receivers 3 --portbase 63100 --interface eth0 --full-duplex --ttl 32 --nokbd;
[07-16-14 4:59:32 pm] | Task (1) Multi has started.
[07-16-14 4:59:42 pm] | Task (1) Multi is no longer running.
[07-16-14 4:59:52 pm] | Task (1) Multi is no longer running.
[/CODE]
I have deleted the items from the tables and restarted the services, and nada.
I can do a single (unicast) download and upload without issue. -
I found it useful to start debugging with omping and udpcast directly before trying to debug fog; it means you get instant answers and you don't have to wait for a client to reboot.
If you can install omping somewhere (like a spare linux client), then you can try
[CODE]
fog-server$ omping FOG.CLIENT.IP FOG.SERVER.IP
...
fog-client$ omping FOG.SERVER.IP FOG.CLIENT.IP
[/CODE]
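(If omping isn't already on your machines it is usually a quick install, though where it comes from depends on your distro and release; roughly:)
[CODE]
# Debian / Ubuntu (run as root or with sudo)
apt-get install omping
# Fedora, or CentOS/RHEL with EPEL enabled
yum install omping
[/CODE]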
On my system I see output on the fog server similar to
[CODE]
192.168.222.102 : multicast, seq=37, size=69 bytes, dist=0, time=0.437ms
192.168.222.102 : unicast, seq=38, size=69 bytes, dist=0, time=0.482ms
192.168.222.102 : multicast, seq=38, size=69 bytes, dist=0, time=0.495ms
192.168.222.102 : unicast, seq=39, size=69 bytes, dist=0, time=0.485ms
192.168.222.102 : multicast, seq=39, size=69 bytes, dist=0, time=0.499ms
192.168.222.102 : unicast, seq=40, size=69 bytes, dist=0, time=0.274ms
192.168.222.102 : multicast, seq=40, size=69 bytes, dist=0, time=0.291ms
192.168.222.102 : unicast, seq=41, size=69 bytes, dist=0, time=0.446ms
192.168.222.102 : multicast, seq=41, size=69 bytes, dist=0, time=0.455ms
[/CODE]
when multicast is working. When it isn't working, I only see the unicast responses. -
Also, I just noticed that I can test multicast with the plain vanilla ping command as long as icmp_echo_ignore_broadcasts is set to 0, e.g.
[CODE]
fog-server$ echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
fog-client$ echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcastsfog-server$ ping -c 5 224.0.0.1
PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
64 bytes from 192.168.222.100: icmp_seq=1 ttl=64 time=0.044 ms
64 bytes from 192.168.222.102: icmp_seq=1 ttl=64 time=0.393 ms (DUP!)
64 bytes from 192.168.222.100: icmp_seq=2 ttl=64 time=0.043 ms
64 bytes from 192.168.222.102: icmp_seq=2 ttl=64 time=0.414 ms (DUP!)
64 bytes from 192.168.222.100: icmp_seq=3 ttl=64 time=0.041 ms
64 bytes from 192.168.222.102: icmp_seq=3 ttl=64 time=0.402 ms (DUP!)
64 bytes from 192.168.222.100: icmp_seq=4 ttl=64 time=0.042 ms
64 bytes from 192.168.222.102: icmp_seq=4 ttl=64 time=0.424 ms (DUP!)
64 bytes from 192.168.222.100: icmp_seq=5 ttl=64 time=0.036 ms

--- 224.0.0.1 ping statistics ---
5 packets transmitted, 5 received, +4 duplicates, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.036/0.204/0.424/0.182 ms
[/CODE]
On a network that doesn’t permit multicast, the same thing gives me
[CODE]
fog-server$ ping -c 5 224.0.0.1
PING 224.0.0.1 (224.0.0.1) 56(84) bytes of data.
64 bytes from 192.168.222.100: icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from 192.168.222.100: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 192.168.222.100: icmp_seq=3 ttl=64 time=0.044 ms
64 bytes from 192.168.222.100: icmp_seq=4 ttl=64 time=0.040 ms
64 bytes from 192.168.222.100: icmp_seq=5 ttl=64 time=0.042 ms

--- 224.0.0.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.040/0.042/0.046/0.007 ms
[/CODE]
i.e. only the machine sending the pings responds; the client never sees them.
[B]N.B. I should also point out that I have basically no idea what I'm doing when it comes to multicast, so YMMV :-)[/B]
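One small caveat: that echo only changes the setting until the next reboot. If you want it to stick (or want a tidy way to flip it back afterwards), the usual sysctl route works:
[CODE]
# persist across reboots: add this line to /etc/sysctl.conf (or a file under /etc/sysctl.d/)
net.ipv4.icmp_echo_ignore_broadcasts = 0

# then reload the settings
sysctl -p
[/CODE]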
-
I tried with fog 1.1.2 and debian 7.6 as well as with ubuntu 12.04.
Uploading works.
Downloading works with a single Win 8 client (it hangs after the clone process, but that's another story, I guess).
Multicast hangs at "Starting to restore image (-) to device (/dev/sda1)".
On the same hardware with fog 0.32 and ubuntu 12.04, multicast works like a charm.
Any ideas?
Greetings! -
t.mayer: Could you try the omping commands above, just to confirm that multicast routing is working for you?
-
[quote="ianabc, post: 33453, member: 24548"]t.mayer: Could you try the omping commands above, just to confirm that multicast routing is working for you?[/quote]
Thanks for the answer, ianabc! I have to be honest: I don't understand the omping thing…
Isn't it evidence enough that it works with fog 0.32?
Can you explain to me what state the client and server have to be in to do the omping? -
I guess so, as long as you can be absolutely sure there were no network config changes between the 0.32 and 1.1.2 tests.
I don't really understand multicast myself, but if you can find (or compile) omping somewhere, I found it really handy for isolating the problem to multicast. Fog has so many parts that it can take a while to debug; omping lets you confirm that IP multicast routing is working on your network, so you can be confident that the problem is actually with fog.
If this is a development fog server you might want to try truncating the SQL tables that Tom mentioned above, and also upgrading to SVN to see if the problem is still there.
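For anyone who hasn't done the SVN route before, it is roughly the following; double-check the repository URL and installer path against the FOG wiki first, since I am going from memory here:
[CODE]
# grab the development trunk and re-run the installer over your existing install
svn checkout https://svn.code.sf.net/p/freeghost/code/trunk fog-trunk
cd fog-trunk/bin
sudo ./installfog.sh
[/CODE]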
-
[B][COLOR=#ff0000]EDIT: Sorted! The new SVN helped - it works now like a charm! Thank you![/COLOR][/B]
Good afternoon fellow Foggers!
I have the same problem as stated before: multicast just refuses to work and hangs on that screen.
Unicast works just fine.
Server: Debian 7.6
Fog: 1.1.2, latest version.
Here is what I get out of my udp log file.
[CODE]
Udp-sender 20120424
Using mcast address 234.10.48.7
UDP sender for (stdin) at 10.10.48.7 on eth0
Broadcasting control to 224.0.0.1
New connection from 10.10.50.70 (#0) 00000009
New connection from 10.10.50.71 (#1) 00000009
New connection from 10.10.50.72 (#2) 00000009
New connection from 10.10.50.73 (#3) 00000009
New connection from 10.10.50.74 (#4) 00000009
New connection from 10.10.50.75 (#5) 00000009
New connection from 10.10.50.77 (#6) 00000009
New connection from 10.10.50.121 (#7) 00000009
New connection from 10.10.50.76 (#8) 00000009
Starting transfer: 00000009
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Timeout notAnswered=[0,1,2,3,4,5,6,7,8] notReady=[0,1,2,3,4,5,6,7,8] nrAns=0 nrRead=0 nrPart=9 avg=10000
Dropping client #0 because of timeout
Disconnecting #0 (10.10.50.70)
Dropping client #1 because of timeout
Disconnecting #1 (10.10.50.71)
Dropping client #2 because of timeout
Disconnecting #2 (10.10.50.72)
Dropping client #3 because of timeout
Disconnecting #3 (10.10.50.73)
Dropping client #4 because of timeout
Disconnecting #4 (10.10.50.74)
Dropping client #5 because of timeout
Disconnecting #5 (10.10.50.75)
Dropping client #6 because of timeout
Disconnecting #6 (10.10.50.77)
Dropping client #7 because of timeout
Disconnecting #7 (10.10.50.121)
Dropping client #8 because of timeout
Disconnecting #8 (10.10.50.76)
bytes= re-xmits=0000000 ( 0.0%) slice=0112 - 0
Transfer complete.
[/CODE]
Here are two images from my router. When I launch the multicast, the FOG management page says the tasks are in [B]queue[/B]; once I start the clients, the management shows that they are "in progress". Here's what the routers say about it:
It seems like both the server and the clients are sending signals to each other.
The server is on IP address 10.10.48.7 and the clients are on the 10.10.50.x subnet.
[IMG]http://www.pixentral.com/pics/1y1KRscWeCz5zGoDid0W0VX8YihKLk.png[/IMG]
[IMG]http://www.pixentral.com/pics/19BdkPufSWuXOIvkk7IHRFuAUJHHPg0.png[/IMG]
Is there any possible solution for this?