Multicast randomly hangs

Leonux

Hi

I have been testing 1.0.0 and now 1.0.1, and I congratulate the development team on the hard work. There are nice improvements at the interface level, and new features are supported as well.

I’m using Ubuntu Server 14.04 with kernel 3.13.0-24-generic, but I’m having problems with multicast. When I send a multicast task to a group of 3 clients, sometimes they all hang when starting the first partition (sda1), and other times they hang when starting a later partition (sda3 or sda4). I have been watching multicast.log.udpcast.16 but can’t find the source of the problem.

My HD partitions:

[IMG]http://s27.postimg.org/txiu5tv7j/partitions.jpg[/IMG]

Screen where multicast hangs; I think it could be at any partition (sda1, sda2 and so on):

[IMG]http://s27.postimg.org/67teh4wu7/hangs.jpg[/IMG]

multicast.log.udpcast.16 when it hung:

[IMG]http://s27.postimg.org/dp2lwcmdb/log.jpg[/IMG]

I have many partitions, I know, but I already used the same HD configuration with a previous version of FOG (0.32) and never had this issue before.

I selected the Windows 7 OS option and also Multiple Partition - Single Disk. The image works fine for a single-machine download; I only get the problem when downloading using multicast.

Thank you.

Jaymes Driver

I can’t see your images, they are so tiny.

Please upload your multicast log.

I don’t like multicast… but that is because I think that unicast does imaging so well! I don’t like setting a group of 30 machines to image to find that only 2 completed and the rest I have to image again in the morning because a client fell out in the middle of imaging.

I prefer Unicast. I can raise the number of machines I send to and send the same image to all of the hosts at the same time, in much the same fashion as Multicast does, except each machine will politely wait its turn and finish in its own time, instead of all the machines being held at the SAME download checkpoint until every one of them reaches it.

I just feel that Unicast is a better system

Now that I got to vent I would be happy to help you solve your multicast woes, but you can always use unicast to circumvent the problem until we resolve the issue permanently.

Leonux

Hi and thank you for the quick response.

I uploaded the 3 images and the multicast logs.

Thank you

[url=“/_imported_xf_attachments/0/777_hangs.jpg?:”]hangs.jpg[/url][url=“/_imported_xf_attachments/0/778_log.jpg?:”]log.jpg[/url][url=“/_imported_xf_attachments/0/779_partitions.jpg?:”]partitions.jpg[/url][url=“/_imported_xf_attachments/0/780_multicast.log.zip?:”]multicast.log.zip[/url]

Jaymes Driver

Thank you, is there any possibility of your switchgear stopping the multicast?

I say this because it seems that the stopping point isn’t consistent, so I wonder if the load you are putting on the switch isn’t too much for it to handle. Can you test by putting the machines and the fog machine on a hub for testing purposes?

Leonux

I’m sorry, but I don’t have any hubs.

I already used this Nortel switch in other tests I made with previous versions of FOG (0.32 or 0.33) and never had any problems using multicast.

I have tried with a different image with only 2 or partitions but the problem remains …

I don’t know if this information is important, but I was on version 1.0.0 and then upgraded to version 1.0.1.

Jaymes Driver

Hmm, okay. Well, if you used your switches to image in the past, the multicast manager is still the same (I don’t believe we made any changes to it), so multicasting should work. Let me get a test going and I will see if I can replicate the issue!

Leonux

OK, thanks, but was partclone also used in the previous versions of FOG?

It seems to me that some kind of syncing fails. I suppose there is a trigger by which the server “knows” that all the machines belonging to a certain multicast session are ready to receive, or else it detects a machine as disconnected when it isn’t.

The multicast never stops in the middle of a partition, or at least I haven’t seen that. It only hangs when the process is starting to replicate a new partition.

kingofl337

I am having the same problem with multicast. It gets to the Partclone screen and hangs.
I also don’t see any instance of the multicast request in the log. Both computers booted
into multicast, but I can’t find a record of it in the multicast log, /opt/fog/log/multicast.log.

version Ubuntu Server 14.01
fog 1.0.1

Previously working multicast

Tom Elliott

Is the FOGMulticastManager service actually operating? On the FOG server, is the udp-sender task actually running?

Try restarting the FOGMulticastManager service:
[code]service FOGMulticastManager restart[/code]

To check if udp-sender is actually running, run the command:
[code]ps -ef|grep udp-sender[/code]
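One caveat with the check above: the grep process itself can appear in the output, which may look like a running sender when there is none. A minimal sketch of a filter that drops that false match follows. The helper name `filter_udp_sender` is hypothetical and not part of FOG.

```shell
# Hypothetical helper (not part of FOG): reads a `ps -ef`-style listing on
# stdin and prints only the lines that mention udp-sender, dropping the
# grep process itself.
filter_udp_sender() {
    grep -- 'udp-sender' | grep -v -- 'grep'
}

# Usage: ps -ef | filter_udp_sender
# The exit status is 0 if at least one matching line remains, 1 otherwise.
```

An empty result would suggest that no sender is running, so restarting the service is the next thing to try.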

domii666

same problem…

Leonux

Hi, and a good week to everyone,

I just checked whether udp-sender was running, and it wasn’t. I restarted FOGMulticastManager, restarted the clients, and multicast is working. Let’s see if all goes well until the end, since I have many partitions.

Last week I also managed to get multicast to start, but it also hung between partitions.

I will keep you updated in 1 hour or so.

Leonux

Both client machines finished all partitions.

I have one client, let’s call it client1, with first boot = Network, and another, client2, with first boot = HD. What happened was that, after restoring all partitions, client2 booted into Windows and changed its hostname accordingly, but client1 entered the multicast task again and hung at the last partition. I don’t understand how this happened…

Now i’m running the same task in 3 clients, just started, let’s see how it goes.

domii666

I get back “unrecognized service” after typing “sudo service FOGMulticastManager restart”

greez

Leonux

That’s because the service isn’t running, try [CODE]sudo service FOGMulticastManager start[/CODE]

I normally use “sudo service <servicename> start/stop” in Ubuntu, but for some reason “sudo /etc/init.d/<servicename> start/stop” seems to work somewhat differently, and better with some services!

domii666

I get “fail”.

Tom Elliott

Try rerunning the installer.

Leonux

I have been doing some more tests on this issue.

I started my normal multicast task for a group with 3 clients. They all reached the “Starting to restore image…” screen, but cloning doesn’t start.

udp-sender is up and running, and the problem seems to be that, while I only have 3 clients in the group, --min-receivers was 5. I suppose this has to be 3.

I canceled the task and executed it again, and now the task started, with --min-receivers 3. I’m going to wait until the cloning process finishes, just to confirm that it doesn’t hang between partitions.
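For anyone wanting to repeat this check, here is a rough sketch that pulls the --min-receivers value out of a udp-sender command line and compares it with the group size. The helper names are hypothetical, and the sample command line in the usage comment is made up for illustration rather than copied from a real FOG log.

```shell
# Hypothetical helper: print the number following --min-receivers in a
# command line read from stdin (prints nothing if the flag is absent).
min_receivers() {
    sed -n 's/.*--min-receivers[ =]\{1,\}\([0-9]\{1,\}\).*/\1/p' | head -n 1
}

# Hypothetical helper: compare that value with the expected group size ($1).
check_min_receivers() {
    expected=$1
    actual=$(min_receivers)
    if [ "$actual" = "$expected" ]; then
        echo "ok: --min-receivers is $actual"
    else
        echo "mismatch: --min-receivers is '$actual', expected $expected"
        return 1
    fi
}

# Usage (illustrative input only):
#   echo 'udp-sender --file /images/x --min-receivers 5' | check_min_receivers 3
```

A mismatch would mean the sender is waiting for more receivers than the group actually contains, which matches the symptom of clients sitting at the restore screen.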

Leonux

The task finished fine yesterday.
But since --min-receivers somehow does not always equal the number of members in the multicast group, today I:
[LIST=1]
[*]Uninstalled FOG, removed its folders and services, and erased the FOG database
[*]Checked out a new revision from [url]https://svn.code.sf.net/p/freeghost/code/trunk[/url]
[*]Clean-installed FOG
[*]Registered 3 clients
[*]Created 2 groups, one with 2 clients and the other with only one client
[*]Started a multicast task for the 2-client group
[/LIST]
The task started but hung at the beginning of the second partition (sda2).
I uploaded the multicast.log, and as you can see, --min-receivers somehow changed to 4, when it was initially 2.
Thanks

[url=“/_imported_xf_attachments/0/799_multicast.log.zip?:”]multicast.log.zip[/url]

Felipe Solari

I don’t know if you have got it working by now, but here is how I solved it.
I wrote another thread in BUGS about this.
It has to do with the way the fog.download script counts partitions.
If you really need multicast with multiple partitions:

decompress init.xz (the same goes for init32.xz) with xz -d init.xz
mount the init file on a loop device with: mount -o loop init sometempdir/
go to sometempdir/bin with cd
edit fog.download, and search for the part that does the multicast write for your “method” (mps or mpa)
look for the part that loops over each partition, and fix it so that it checks for the existence of the file
(something like if [ ! -f "$imgpart" ]; then echo "Partition file missing ...jumping"; sleep 1; else writeMulticastImage; fi )
(for reference, look at the “not multicast” lines, or at the previous multicast method for the plain Linux type)
save your file, go back to the init file dir, and unmount the image with umount sometempdir/
compress it again with xz -C crc32 init
Queue the task again to try it.

You should briefly see the message “Partition file …”, and the correct partition files should be streamed to the partclone/partimage program.
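The existence check described in the steps above can be sketched as a standalone loop. This is only an illustration, not the actual fog.download code: the d1pN.img file naming, the `restore_partitions` helper, and the stub body of writeMulticastImage are all assumptions made for the example.

```shell
# Stand-in for the real multicast write step (assumption: the real function
# in fog.download does the actual restore; this stub only reports the file).
writeMulticastImage() {
    echo "restoring $1"
}

# Hypothetical helper: walk partition numbers $2..$3 in image directory $1
# and skip any partition whose image file does not exist.
restore_partitions() {
    imgdir=$1
    part=$2
    last=$3
    while [ "$part" -le "$last" ]; do
        imgpart="$imgdir/d1p$part.img"   # file naming is an assumption
        if [ ! -f "$imgpart" ]; then
            echo "Partition file missing: $imgpart ...skipping"
        else
            writeMulticastImage "$imgpart"
        fi
        part=$((part + 1))
    done
}
```

The idea is that the client only joins a multicast stream for partitions that have an image file, so presumably it no longer waits on a partition the server never sends.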

Another related bug is in MulticastTask.class.php. If you have 10 or more partitions, you need natsort() instead of sort().
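The ordering problem behind the sort() versus natsort() remark can be shown without PHP. The following is only a shell analogy, assuming GNU sort with the -V option is available; the function names and file names are illustrative and not taken from MulticastTask.class.php.

```shell
# Plain lexical sort: compares character by character, so a name ending in
# "10" lands before a name ending in "2".
lexical_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Version ("natural") sort from GNU coreutils: embedded numbers are compared
# numerically, which is the behaviour natsort() gives in PHP.
natural_order() {
    printf '%s\n' "$@" | LC_ALL=C sort -V
}

# Usage: lexical_order d1p2.img d1p10.img d1p1.img
#        natural_order d1p2.img d1p10.img d1p1.img
```

With fewer than 10 partitions both orderings agree, which would explain why the bug only shows up on disks with many partitions.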

You may need a Linux shell programmer (not necessarily a very experienced one) to help.

P.S. Be sure to have also installed the php-process extensions to PHP, as the killing of multicast tasks uses posix_ functions in it. (That is for a CentOS / Redhat server; on Ubuntu server I have not tested or searched for them)
