Fog 1.1.0 multicast sits at "Starting to restore image (-) to device (/dev/sda1)"

Michael Mullins

I updated to 1818 and it’s still sitting at the “Starting to restore image (-) to device (/dev/sda1)” screen.

I’m using Ubuntu 13.10

Any guess on things to try?

Michael Mullins

[quote=“Flavalf, post: 30153, member: 24625”]I uploaded svn 1817 and then tried to update, but I’m stuck in the database update/install process.
When I click on update, I get a blank screen (no error) with the title

[CENTER][B]Database Schema Installer / Updater[/B][/CENTER]

Did I miss something?[/quote]

Restart MySql. I got this a bunch until I removed the root MySql password.

Tom Elliott

[quote=“Michael Mullins, post: 30162, member: 17924”]Restart MySql. I got this a bunch until I removed the root MySql password.[/quote]

I’d say cancel the tasks and clear your tables.

[code]truncate table multicastSessions;
truncate table multicastSessionsAssoc;[/code]

Recreate your multicast task and restart those clients; all should work.

Michael Mullins

Excuse my ignorance, but it gives an error saying that I need to give a size when I try to truncate.

Tom Elliott

Log in to the MySQL instance and use the fog database:
[code]mysql -u root [-p IF YOU SET PASSWORD] fog
truncate table multicastSessions;
truncate table multicastSessionsAssoc;
exit[/code]
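The "need to give a size" error suggests the statements were typed at the shell prompt, where `truncate` is a coreutils file utility that requires a size argument. The statements are SQL and have to go through the `mysql` client. A minimal sketch, assuming the database is named fog and root has no password (add `-p` otherwise); the function names are made up for illustration and the table names are the ones given in this thread:

```shell
# Print the two SQL statements that clear the multicast bookkeeping tables.
# Table names are taken from this thread, not verified against the schema.
multicast_cleanup_sql() {
    printf '%s\n' \
        'TRUNCATE TABLE multicastSessions;' \
        'TRUNCATE TABLE multicastSessionsAssoc;'
}

# Feed the statements to the mysql client on stdin.
# Assumes database "fog" and a root account without a password.
clear_multicast_tables() {
    multicast_cleanup_sql | mysql -u root fog
}
```

Running `multicast_cleanup_sql` on its own only prints the statements, so they can be reviewed before `clear_multicast_tables` is called.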

Michael Mullins

Hmm… That didn’t work… but it killed MySQL…

Michael Mullins

OK, I reinstalled Ubuntu 13.10 and installed FOG v1.1.1, and it still hangs at the same place on multicast.

Tom Elliott

If you had to “restart” mysql because it “killed mysql” you’ll most likely need to restart the services.

Here’s the process I imagine:
[code]sudo service mysql restart && \
sudo service apache2 restart && \
sudo service FOGMulticastManager restart && \
sudo service FOGImageReplicator restart && \
sudo service FOGScheduler restart[/code]

The \ just allows you to keep adding to the current command. The && only starts the next command if the previous command completes successfully. If it doesn’t complete successfully, it will not run the next command or any of the others in the sequence. This can help you track down where an issue is.
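The stop-at-first-failure behavior can also be written as a loop that names the service that failed. This is only a sketch: the function name and the `RESTART_CMD` variable are assumptions added here so the loop can be dry-run, and the service names are the ones from the commands above.

```shell
# Command used to restart a service; override it (e.g. RESTART_CMD=echo) for a dry run.
RESTART_CMD="${RESTART_CMD:-sudo service}"

# Restart each named service in order and stop at the first failure,
# mirroring what a chain of && does, but reporting which step failed.
restart_in_order() {
    for svc in "$@"; do
        # RESTART_CMD is left unquoted on purpose so "sudo service" splits into two words.
        if ! $RESTART_CMD "$svc" restart; then
            echo "FAILED: $svc" >&2
            return 1
        fi
        echo "OK: $svc"
    done
}

# Example call (commented out so running this file changes nothing):
# restart_in_order mysql apache2 FOGMulticastManager FOGImageReplicator FOGScheduler
```

With `RESTART_CMD=echo` the loop only prints what it would run, which is a cheap way to check the order before touching the real services.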

Tom Elliott

If everything restarts and returns properly, you’ll most likely have to restart the clients: the udp-sender commands will start, but there was no communication between the clients and the server before. Just restart the clients and all should start properly.

Michael Mullins

No, I started fresh. I wiped and reinstalled Ubuntu 13.10 and then installed FOG v1.1.1. Then I uploaded an image, test-pushed the image to a single machine, and then tried a multicast to 2 machines. It’s still hanging at the “Starting to restore image” part.

Tom Elliott

Can you please try the steps I just gave?

Michael Mullins

Ok, Tried what you asked. Everything restarted without problems. Tried Multicast again. Still have the problem.

Michael Mullins

Thanks for your Help Tom. I’m headed home for the weekend. Hopefully we can figure out an answer when I get back to work on Monday.

Have a Great Weekend.

Tom Elliott

Based on the information, does your network allow UDP traffic to pass?

snoopsean

I’m in the same boat as Michael. I reinstalled Ubuntu 12.04 LTS Server and installed FOG svn 1820. Normal unicast works fine, but group multicast with more than 1 member doesn’t work.
Here is the multicast.log.udpcast.6 log (in this case it’s a 6):

Udp-sender 20120424
Using mcast address 234.1.1.242
UDP sender for (stdin) at 10.1.1.242 on eth0
Broadcasting control to 224.0.0.1
New connection from 10.103.50.144 (#0) 00000009
New connection from 10.103.50.151 (#1) 00000009
Starting transfer: 00000009
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000

It just keeps going.
When I multicast to a group with one member, it works. Here is the multicast.log.udpcast.8:
Udp-sender 20120424
Using mcast address 234.1.1.242
UDP sender for (stdin) at 10.1.1.242 on eth0
Broadcasting control to 224.0.0.1
New connection from 10.103.50.144 (#0) 00000009
Starting transfer: 00000009
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1220
bytes= 4 729 088 re-xmits=0000029 ( 0.8%) slice=0112 - 0
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1207
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1311
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1161
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1223
bytes= 21 851 648 re-xmits=0002455 ( 16.3%) slice=0112 - 0
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1238
bytes= 29 516 032 re-xmits=0003904 ( 19.2%) slice=0112 - 0
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=921
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1151
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1173
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1161
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1155
So I notice that for both groups, everything is the same until the Timeout notAnswered= lines.
For the group with 1 member, it’s:
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=1220
For the group with 2 members, it’s:
Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=10000
For the group with 2 members, that message keeps repeating forever, and the client doesn’t show any change.
For the group with 1 member, the log updates every so often with a line like:
bytes= 29 516 032 re-xmits=0003904 ( 19.2%) slice=0112 - 0
If everyone else is getting it to work, then it makes me think the switches on my side are at fault. In which case shame on me for posting this. And shame on my colleagues for messing around with the switches.
Please let me know if anyone else is having this issue. I’ll work on my end and post what I find. I’m hoping it’s just a switch where multicast isn’t enabled.
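The difference between the two logs (progress lines starting with bytes= versus nothing but Timeout lines) can be checked mechanically. A rough sketch, assuming the log looks like the excerpts above; the function name is made up, and the labels "transferring" and "stalled" are just names chosen here:

```shell
# Classify a udp-sender log by the kinds of lines it contains:
#   transferring : at least one "bytes=" progress line was written
#   stalled      : only "Timeout notAnswered=" lines, no progress
#   unknown      : neither pattern appears
udpcast_log_status() {
    log="$1"
    if grep -q 'bytes=' "$log"; then
        echo "transferring"
    elif grep -q 'Timeout notAnswered=' "$log"; then
        echo "stalled"
    else
        echo "unknown"
    fi
}

# Example call (log file name as used in this thread; the path depends on the install):
# udpcast_log_status multicast.log.udpcast.6
```

A log that reports "stalled" only shows that the sender never got answers from the receivers; it does not say whether the cause is the switches, the clients, or the server.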

Flavalf

Hi, I reinstalled fog_1.1.1 and tried again: multicast is OK on my virtual network.
Next week I will try multicasting to 15 real machines running W7.

Thanks again for your work.

Tom Elliott

[quote=“Michael Mullins, post: 30264, member: 17924”]Ok, Tried what you asked. Everything restarted without problems. Tried Multicast again. Still have the problem.[/quote]
I think I know what’s causing the issue with the screen “hanging” at the (-) stuff.

Do you have multiple storage nodes?

I found an issue presenting exactly as you’re describing here if you have multiple storage nodes. Chances are, the node it was working off of at the time had less client load and was therefore set as the “optimalStorageNode” parameter. This “optimal node” is not a master node, so the udp-sender command is not being pushed when the client is requesting it.

Hopefully this will help. I’ve told multicast jobs to only set the storage node based on the master node that’s enabled within that storage grouping.
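To illustrate the difference between the two selection rules, here is a toy sketch. It is not FOG’s code: the input format (one node per line as name, group, isMaster, isEnabled, clientLoad) and both function names are invented for the example.

```shell
# Input on stdin, one node per line: name group isMaster isEnabled clientLoad
# (hypothetical format, only for illustrating the two rules).

# Rule described for multicast: the enabled master node of the group.
pick_master_node() {
    awk -v g="$1" '$2 == g && $3 == 1 && $4 == 1 { print $1; exit }'
}

# Load-based rule: the enabled node of the group with the lowest client load.
pick_least_loaded_node() {
    awk -v g="$1" '
        $2 == g && $4 == 1 && (!found || $5 + 0 < min) { min = $5 + 0; best = $1; found = 1 }
        END { if (found) print best }'
}

# Example data: nodeA is the master with 5 clients, nodeB is idle but not a master.
nodes='nodeA 1 1 1 5
nodeB 1 0 1 0'
# printf '%s\n' "$nodes" | pick_master_node 1
# printf '%s\n' "$nodes" | pick_least_loaded_node 1
```

With this sample data the two rules pick different nodes (nodeA for the master rule, nodeB for the load-based rule), which is the mismatch described above: the client is pointed at a node that never runs udp-sender.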

RLane

As far as multicasting goes, we’re running 12.04 and 1.1.1. I tried to multicast and got the same Partclone hang. Upgrading to r1852 to see if it resolves it.

Michael Mullins

I reinstalled Ubuntu and FOG today. Still the same issue. Has anyone else made it past this?

ianabc

I’m running into the same problem with fog-1.1.1 on RHEL 6.5. I’m pretty much clueless about UDPcast so I’m not sure if the problem is with the network or fog. I can say that everything seems to work with a single machine in the group and fog tells me that it will be using UDPcast for that host.

EDIT: I should also say that I had a go at running udp-sender and udp-receiver manually (in debug mode on the clients). The results were the same: a single host worked fine, more than one failed, this time with errors like

[CODE]
Timeout notAnswered=[0] notReady=[0] nrAns=0 nrRead=0 nrPart=1 avg=84698
[/CODE]
