Fog 1.1.0 multicast sits at "Starting to restore image (-) to device (/dev/sda1)"

Michael Mullins

No, I started fresh. I wiped and reinstalled Ubuntu 13.10 and then installed FOG v1.1.1. Then I uploaded an image, test-pushed it to a single machine, and tried a multicast to two machines. It’s still hanging at the “Starting to restore image” part.

Tom Elliott

Can you please try the steps I just gave?

Michael Mullins

OK, I tried what you asked. Everything restarted without problems. I tried multicast again and still have the problem.

Michael Mullins

Thanks for your help, Tom. I’m headed home for the weekend. Hopefully we can figure out an answer when I get back to work on Monday.

Have a great weekend.

Tom Elliott

Based on the information, does your network allow UDP traffic to pass?
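One way to answer this independently of FOG and udpcast is to check whether plain multicast datagrams reach the clients at all. Below is a minimal Python 3 sketch, not a FOG or udpcast tool: the group 234.1.1.242 is borrowed from the udp-sender logs later in this thread purely as an example, and port 9000 is an arbitrary assumption. Run it with `recv` on two or more clients, then with `send` on the server; if some clients never print the message, that points toward the network (switches, firewall) rather than FOG.

```python
import socket
import struct
import sys

GROUP = "234.1.1.242"  # example group taken from the udp-sender logs in this thread
PORT = 9000            # arbitrary test port (assumption), not a FOG/udpcast default


def build_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def receive(group: str = GROUP, port: int = PORT, timeout: float = 30.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group makes the kernel send an IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(1024)
        print(f"received {data!r} from {sender[0]}")
    except socket.timeout:
        print("no multicast traffic received")
    finally:
        sock.close()


def send(group: str = GROUP, port: int = PORT, ttl: int = 1) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps the datagram on the local subnet.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    sock.sendto(b"fog-multicast-test", (group, port))
    sock.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # "python3 mcast_test.py recv" on each client, then "python3 mcast_test.py send" on the server.
    if sys.argv[1] == "send":
        send()
    elif sys.argv[1] == "recv":
        receive()
```

On a server with more than one NIC, the outgoing interface is left to the routing table here; you may need to set `IP_MULTICAST_IF` to force the interface udp-sender uses.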

snoopsean

I’m in the same boat as Michael. I reinstalled Ubuntu 12.04 LTS Server and installed FOG SVN r1820. Normal unicast works fine, but group multicast with more than one member doesn’t work.
Here is the multicast.log.udpcast.6 log (in this case it’s a 6):

Udp-sender 20120424
Using mcast address 234.1.1.242
UDP sender for (stdin) at 10.1.1.242 on eth0
Broadcasting control to 224.0.0.1
New connection from 10.103.50.144 (#0) 00000009
New connection from 10.103.50.151 (#1) 00000009
Starting transfer: 00000009
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000

It just keeps going.
When I multicast to a group with one member, it works. Here is the multicast.log.udpcast.8 log:
Udp-sender 20120424
Using mcast address 234.1.1.242
UDP sender for (stdin) at 10.1.1.242 on eth0
Broadcasting control to 224.0.0.1
New connection from 10.103.50.144 (#0) 00000009
Starting transfer: 00000009
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1220
bytes= 4 729 088 re-xmits=0000029 ( 0.8%) slice=0112 - 0
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1207
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1311
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1161
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1223
bytes= 21 851 648 re-xmits=0002455 ( 16.3%) slice=0112 - 0
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1238
bytes= 29 516 032 re-xmits=0003904 ( 19.2%) slice=0112 - 0
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=921
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1151
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1173
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1161
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1155
So I notice that for both groups, everything is the same until the “Timeout notAnswered=” line.
For the group with 1 member, it’s:
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1220
For the group with 2 members, it’s:
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
For the group with 2 members, that message repeats forever and the clients don’t show any change.
For the group with 1 member, every so often the log updates with a progress line like:
bytes= 29 516 032 re-xmits=0003904 ( 19.2%) slice=0112 - 0
If everyone else is getting this to work, it makes me think the switches on my side are at fault, in which case shame on me for posting this, and shame on my colleagues for messing around with the switches.
Please let me know if anyone else is having this issue. I’ll work on my end and post what I find. I’m hoping it’s just a switch where multicast isn’t enabled.
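The comparison above can be scripted. Here is a rough Python sketch (not part of FOG) that assumes only the udp-sender line formats quoted in this thread: it counts Timeout lines, keeps the latest bytes=/re-xmits progress figure, and flags the "timeouts but no progress at all" pattern seen with two receivers.

```python
import re
from typing import Iterable

# Line shapes copied from the udp-sender logs above.
TIMEOUT_RE = re.compile(r"Timeout notAnswered=\[([\d,]*)\].*?nrPart=(\d+)")
PROGRESS_RE = re.compile(r"bytes=\s*([\d ]+?)\s+re-xmits=(\d+)\s*\(\s*([\d.]+)%\)")


def summarize(lines: Iterable[str]) -> dict:
    """Count Timeout lines and keep the latest bytes=/re-xmits progress line."""
    summary = {"timeouts": 0, "participants": 0, "bytes_sent": 0, "retransmit_pct": 0.0}
    for line in lines:
        t = TIMEOUT_RE.search(line)
        if t:
            summary["timeouts"] += 1
            summary["participants"] = int(t.group(2))
            continue
        p = PROGRESS_RE.search(line)
        if p:
            # udp-sender groups digits with spaces, e.g. "21 851 648".
            summary["bytes_sent"] = int(p.group(1).replace(" ", ""))
            summary["retransmit_pct"] = float(p.group(3))
    # Timeouts with no progress line at all is the "hangs forever" pattern.
    summary["stalled"] = summary["timeouts"] > 0 and summary["bytes_sent"] == 0
    return summary
```

Fed the two-member log above, it reports `stalled` as True; the one-member log reports about 29.5 MB sent with a 19.2% retransmit rate, which may by itself hint at packet loss even in the working case.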

Flavalf

Hi, I reinstalled fog_1.1.1 and tried again: multicast works on my virtual network.
Next week I will try multicasting to 15 real machines running Windows 7.

Thanks again for your work.

Tom Elliott

[quote=“Michael Mullins, post: 30264, member: 17924”]Ok, Tried what you asked. Everything restarted without problems. Tried Multicast again. Still have the problem.[/quote]
I think I know what’s causing the issue with the screen “hanging” at the (-) stuff.

Do you have multiple storage nodes?

I found an issue that presents exactly as you describe when there are multiple storage nodes. Chances are, the node being used at the time had less client load and was therefore set as the “optimalStorageNode”. This “optimal node” is not a master node, so the udp-sender command is not being issued when the client requests it.

Hopefully this will help: I’ve changed multicast jobs so they only set the storage node to the enabled master node within that storage group.
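To make the described selection rule concrete, here is a hypothetical Python sketch: unicast tasks may use whichever enabled node has the lightest client load, but multicast tasks only use the enabled master node of the storage group. The `StorageNode` fields and `pick_node` function are invented for illustration; FOG’s actual code differs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageNode:
    name: str
    is_master: bool
    enabled: bool
    client_load: int  # hypothetical metric: clients currently being served


def pick_node(group: List[StorageNode], multicast: bool) -> Optional[StorageNode]:
    """Pick a node from one storage group.

    Unicast: the least-loaded enabled node (the "optimal" node).
    Multicast: only the enabled master node, since that is where udp-sender is started.
    """
    enabled = [n for n in group if n.enabled]
    if multicast:
        return next((n for n in enabled if n.is_master), None)
    return min(enabled, key=lambda n: n.client_load, default=None)
```

With a busy master and an idle non-master node, unicast picks the idle node while multicast still picks the master, which is exactly the case that was failing before.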

RLane

As far as multicasting goes, we’re running Ubuntu 12.04 and FOG 1.1.1. I tried to multicast and got the same Partclone hang. I’m upgrading to r1852 to see if that resolves it.

Michael Mullins

I reinstalled Ubuntu and FOG today. Still the same issue. Has anyone else made it past this?

ianabc

I’m running into the same problem with FOG 1.1.1 on RHEL 6.5. I’m pretty much clueless about UDPcast, so I’m not sure whether the problem is with the network or FOG. I can say that everything seems to work with a single machine in the group, and FOG tells me that it will be using UDPcast for that host.

EDIT: I should also say that I tried running udp-sender and udp-receiver manually (in debug mode on the clients). The results were the same: a single host worked fine and more than one failed, this time with errors like:

[CODE]
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=84698
[/CODE]

Tom Elliott

ianabc or anybody,

Are you guys running systems with multiple storage nodes?

ianabc

Sorry, missed your question. No, single node setup, I think my problem is either with udpcast or our network configuration.

ianabc

For my setup, this looks very similar to [URL='http://fogproject.org/forum/threads/multicast-does-not-work-to-multiple-clients-only-to-single-client.863/']this post[/URL], but their hack of using a 239.X.X.X mcast address doesn’t seem to work for me.
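For anyone comparing addresses: 239.0.0.0/8 is the administratively scoped IPv4 multicast block (RFC 2365). Many switches forward multicast by destination MAC address, and an IPv4 group maps to 01:00:5e followed by its low 23 bits, so changing only the first octet (234.1.1.242 vs 239.1.1.242) does not change the MAC the switch sees. A small illustrative Python helper, standard library only, that reports both:

```python
import ipaddress

ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")  # RFC 2365 administratively scoped range


def describe_group(addr: str) -> dict:
    """Report whether addr is in 239/8 and which Ethernet MAC it maps to."""
    ip = ipaddress.IPv4Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv4 multicast address")
    low23 = int(ip) & 0x7FFFFF  # only the low 23 bits of the group end up in the MAC
    mac = "01:00:5e:" + ":".join(f"{(low23 >> s) & 0xFF:02x}" for s in (16, 8, 0))
    return {"admin_scoped": ip in ADMIN_SCOPED, "mac": mac}
```

This is one possible reason an address change alone makes no difference on a given switch; it does not rule out other causes.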

rhythmtone

I’m also having this issue, and it is really hard to isolate. In my old setup, every site had its own FOG 0.32 server using a Normal installation, and multicast worked great. The catch was that images were per site, since each site had to point to its own local FOG server, and having one SQL database per site was unmanageable and inefficient.

For testing purposes I’m attempting to downgrade/revert. In my new setup, each site has a storage node set to “Master” (because only master nodes can multicast), and each site’s storage node is its own storage group. I’m not sure this is the most efficient setup, but for now I need to be sure which node(s) the machines are pulling their images from. And if only master nodes can send multicast traffic, it follows that each site needs a master node on its LAN.

I’m in the process of reverting and testing, but where would I find the best multicast debug page?

If there is anything else that I can add, please let me know - I plan on working on this all week…

Ubuntu 14.04 - FOG 1.1.2
Multicast log on main FOG server is empty
Multicast log on storage node does not exist (file not there)
Storage nodes ARE pushing images to clients in unicast mode correctly (tested and verified)

Thanks for any help,
D.L.

rhythmtone

To add,
If the storage node is the device actually pushing out the image via multicast, shouldn’t the logs, the UDPcast service, and the testing all be on that node? I’m asking because the storage node(s) don’t seem to have anything related to multicasting; even the FOGMulticastManager service is absent…

I am definitely puzzled,
Thanks for reading,
D.L.

BullDozer

I have a clean install of FOG on Ubuntu 13.10.
I’m having the same issue: it hangs on the Partclone screen.
Unicast is working.

The network is very basic:
FOG Server -> unmanaged switch -> PCs

There is nothing special about the setup.

I start the multicast and the session seems to end right away. I have rebooted the server and restarted the services mentioned earlier in this thread.

Here is the dump from my log.

[07-11-14 3:49:46 pm] * No tasks found!
[07-11-14 3:49:56 pm] | Task (8) Multitest is new!
[07-11-14 3:49:56 pm] | Task (8) /images/163DS image file found.
[07-11-14 3:49:56 pm] | Task (8) 3 client(s) found.
[07-11-14 3:49:56 pm] | Task (8) Multitest sending on base port: 53574
[07-11-14 3:49:56 pm] CMD: cat "/images/163DS"|/usr/local/sbin/udp-sender --min$
[07-11-14 3:49:56 pm] | Task (8) Multitest has started.
[07-11-14 3:50:06 pm] | Task (8) Multitest is no longer running.
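The timestamps above show the task starting at 3:49:56 pm and being reported as no longer running at 3:50:06 pm, which suggests udp-sender exited within about ten seconds instead of waiting for clients. A small Python sketch that extracts that gap from a log in this format; it assumes only the line layout shown in this dump and a month-day-year date (both assumptions):

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches lines like "[07-11-14 3:49:56 pm] | Task (8) Multitest has started."
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s*\|\s*Task \((?P<id>\d+)\) (?P<msg>.*)$")
TS_FORMAT = "%m-%d-%y %I:%M:%S %p"  # assumed month-day-year with 12-hour clock


def session_seconds(lines: Iterable[str], task_id: int) -> Optional[float]:
    """Seconds between 'has started' and 'is no longer running' for one task, or None."""
    started = stopped = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or int(m.group("id")) != task_id:
            continue  # skip "* No tasks found!", "CMD:" lines and other tasks
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
        if m.group("msg").endswith("has started."):
            started = ts
        elif m.group("msg").endswith("is no longer running."):
            stopped = ts
    if started is None or stopped is None:
        return None
    return (stopped - started).total_seconds()
```

For the dump above, `session_seconds(lines, 8)` gives 10.0.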

Tom Elliott

All, I think the issues you’re all experiencing are due to earlier problems. This is probably my fault, and I’m sorry. As I stated in my beginner’s notice (developer thread and feature request), I am human.

I’m starting to think the issue is left over from previous “issues” with multicast tasks.

While it’s not normally recommended, I’m going to make a request of those having issues: clean out your old tasks.

You will have to do this in MySQL, either through phpMyAdmin or in a terminal.

Log in with whichever method you prefer. CLI method:
[code]# If you never set a MySQL root password:
mysql -u root fog
# If you did set one (mysql will prompt for it):
mysql -u root -p fog[/code]

Once at the prompt (or in phpMyAdmin), run these commands:
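[code]-- remove any lingering multicast tasks
delete from tasks where taskTypeID='8';
-- clear all multicast sessions and their associations
truncate table multicastSessions;
truncate table multicastSessionsAssoc;
-- CLI only; not needed in phpMyAdmin
exit;[/code]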

This removes everything related to multicast tasks in the system: all rows in multicastSessions, all of their associations in multicastSessionsAssoc, and any lingering multicast tasks.

Then restart the FOGMulticastManager service.
[code]service FOGMulticastManager restart[/code]

Then try your tasking again. Hopefully all will work much better. I’m certainly hoping so.

ianabc

I’ve just tried revisions 2039 and 2046 and I still get stuck at “Starting to restore image”.

[Screenshot: multicast.png]

A “multicast” to a group with a single member works as expected, but with two members in the group I get the message above on both machines. I get the same behaviour on a physical network as well, but I can’t rule out the network as the source of the problem in either case: I’ve never had multicast working, and I didn’t use it in 0.32.

I’m starting to think the actual issue is with udpcast. I’ve done some tests and I can’t get it to work on either the virtual or the physical setup. The problem is the same: multicast to a single machine works, multicast to two fails.


rhythmtone

Hi all,
Have you guys tried a FOG installation without a SQL password? Having a blank root password on the SQL database solved all of my multicast problems. And yes, I verified it “6 ways from Sunday”: it was not an incorrect or mismatched password causing the issue. Simply having a SQL root password at all causes this behavior (hanging on “Starting to restore”) in my environment. It did not matter whether it was correct in the configuration files; just the existence of a root password on the SQL database prevented multicast from working…

Strange, I know, but it’s the truth. Sometimes I make mistakes, but in this case I verified the password and reinstalled over and over again, and could not get multicast to work WITH a SQL password. So I’m very sure of this, in my environment at least.

Thanks,
D.L.
