FOG 1.2.0 Multicast creates multiple single udpcast sessions
-
Hi there,
My group has encountered an interesting issue which we can’t seem to figure out. We set up a group multicast for an image, but when we look at the system, instead of a single multicast session for a set of 10 computers, it creates 10 separate multicast sessions (UDP senders), one per machine, which defeats the purpose of multicast imaging. We don’t know why the system keeps treating each node as a separate job instead of one job waiting for 10 clients.
-
Multicast uses UDP.
Inside “Tasks”, it will show you the status of each host in that multicast session.
This is important because, if one fails (and you’re 10 miles away), this is basically your only way of getting early warning.
If you want to see the difference between Unicast & Multicast, tell a group of computers to Unicast… See how long that takes!
-
I understand this. The problem I am seeing is the following:
We define a group of 10 machines to do a Multicast image.
On the FOG server command line, instead of 2 UDP-based processes I see 20 of them: for each machine, one to send and one to receive.
Multicast.log shows it is creating 10 separate UDP multicast jobs with only 1 member each, instead of 1 UDP multicast job waiting for 10 members.
What I am trying to find out is why, for a group where we specify a multicast job, it creates 10 separate UDP casts with 1 member each instead of 1 UDP cast waiting for 10 members.
-
Simple question, do all the members of the group use the same image?
-
Yes. We even tried a simple 2-computer test, and the result was both machines starting their own separate UDP cast session. The image source is the same. Example from the multicast log:
[03-20-15 4:50:45 pm] | Task (131) xxx56-57 is new!
[03-20-15 4:50:45 pm] | Task (131) /images/03132015 image file found.
[03-20-15 4:50:45 pm] | Task (131) 1 client(s) found.
[03-20-15 4:50:45 pm] | Task (131) xxx56-57 sending on base port: 51130
[03-20-15 4:50:45 pm] CMD: cat /images/03132015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 51130 --interface bond0 --mcast-data-address 239.xxx.xxx.241 --full-duplex --ttl 32 --nokbd;
[03-20-15 4:50:45 pm] | Task (131) xxx56-57 has started.
[03-20-15 4:50:56 pm] | Task (131) xxx56-57 is already running PID 32173
[03-20-15 4:50:56 pm] | Task (132) xxx56-57 is new!
[03-20-15 4:50:56 pm] | Task (132) /images/03132015 image file found.
[03-20-15 4:50:56 pm] | Task (132) 1 client(s) found.
[03-20-15 4:50:56 pm] | Task (132) xxx56-57 sending on base port: 60678
[03-20-15 4:50:56 pm] CMD: cat /images/03132015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 60678 --interface bond0 --mcast-data-address 239.xxx.xxx.241 --full-duplex --ttl 32 --nokbd;
[03-20-15 4:50:56 pm] | Task (132) xxx56-57 has started.
As you can see here, this is one job, yet it generates two UDP sessions for 2 machines. This is not me spamming the task in succession. You will also notice it says '--min-receivers 1'; it should say '2'.
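For anyone wanting to check their own multicast log for the same symptom, the following is a minimal Python sketch that groups task IDs by session name and records the client count each task reported. It assumes the log lines look exactly like the excerpt above; the function name `summarize_sessions` and the regular expressions are illustrative, not part of FOG.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [03-20-15 4:50:45 pm] | Task (131) xxx56-57 is new!
NEW_TASK = re.compile(r"\| Task \((\d+)\) (\S+) is new!")
# Matches lines such as:
#   [03-20-15 4:50:45 pm] | Task (131) 1 client(s) found.
CLIENTS = re.compile(r"\| Task \((\d+)\) (\d+) client\(s\) found\.")


def summarize_sessions(log_text):
    """Group task IDs by session name with the client count each reported."""
    names = {}    # task id -> session name
    clients = {}  # task id -> client count from the log
    for line in log_text.splitlines():
        m = NEW_TASK.search(line)
        if m:
            names[m.group(1)] = m.group(2)
            continue
        m = CLIENTS.search(line)
        if m:
            clients[m.group(1)] = int(m.group(2))
    by_name = defaultdict(list)
    for task_id, name in names.items():
        by_name[name].append((task_id, clients.get(task_id)))
    return dict(by_name)


# Sample input copied from the log excerpt in this post.
sample = """\
[03-20-15 4:50:45 pm] | Task (131) xxx56-57 is new!
[03-20-15 4:50:45 pm] | Task (131) 1 client(s) found.
[03-20-15 4:50:56 pm] | Task (132) xxx56-57 is new!
[03-20-15 4:50:56 pm] | Task (132) 1 client(s) found.
"""

for name, tasks in summarize_sessions(sample).items():
    if len(tasks) > 1:
        # More than one task for the same session name is the symptom here.
        print(f"{name}: {len(tasks)} separate tasks -> {tasks}")
```

A session name that shows up with several task IDs, each reporting 1 client, matches the behaviour described in this thread.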
-
What version of fog is this?
-
This would be for FOG 1.2.0, running on CentOS 6.6
-
How do you schedule the task? Group -> Basic Tasks -> Multicast?
-
Group -> Search for Group -> Select Multicast -> Create Multicast Job.
-
What distribution of Linux and what version?
When was the last time this machine was re-booted?
-
Has multicast ever worked before on this system? Or has this been an issue ever since you installed it?
-
As mentioned above, running CentOS 6.6, using FOG version 1.2.0.
Multicast did work earlier, however, I believe, at one point we had to tweak something with regards to ipxe boot.php as we have one lab where it needs to ‘flip-flop’ between booting from one drive to another (sda - Linux, sdb - Windows), which the default fog ipxe did not allow at the time…
-
Are you sure you only edited the boot.php file? It seems somebody may have made edits to Host.class.php as well.
-
Well, looking at /opt/fog/service/common/lib…
We had to hack the MulticastTask.class.php for some reason… This is what we have:
[url=“/_imported_xf_attachments/1/1833_MulticastTask.class.php?:”]MulticastTask.class.php[/url]
-
If you’re running 1.2.0, that file should not be doing anything. What’s in the file /opt/fog/service/etc/config.php?
-
[quote=“SKasai, post: 44747, member: 29107”]Well, looking at /opt/fog/service/common/lib…
We had to hack the MulticastTask.class.php for some reason… This is what we have:[/quote]
Not sure which version this file comes from (maybe Tom knows better), but it seems kind of old, as the class MulticastTask does not extend FOGBase. Not sure if this plays a role, but I definitely wonder if your installation is a bit mixed up with code from different versions and patched files on top of that. I wouldn’t be surprised if things go wrong with this. Do you have a spare machine (e.g. just a desktop machine) to set up a new FOG server (version 1.2.0 or current SVN if you like) to see if your multicast issues go away?
It’s very hard to guess what’s going wrong when you use a different set of code than we’d guess you have when talking about version 1.2.0.
-
<?php
define( "WEBROOT", "/var/www/html/fog" );
?>
-
Sorry for taking so long to get back to everyone here… A lot of busy work and had to put this on the back burner…
Found the issue that was causing the problem. We are currently using cfengine, and it pushed the 0.3.2 version of MulticastTask.class.php, which is clearly the wrong file for 1.2.0. It still managed to generate a working command, just for the wrong reasons.
Restoring the original MulticastTask.class.php that comes with 1.2.0 seems to have fixed it.
The reason we had cfengine pushing a modified version of the 0.3.2 file was that we needed to ‘hack’ MulticastTask.class.php to make it work a little more easily with the Cisco switches. This was done prior to my time, so I don’t know exactly what it was.
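Since a configuration-management tool silently replacing the file was the suspect here, one way to notice it happening again is to compare the deployed file against a known-good copy from the 1.2.0 release. A minimal Python sketch follows; the function names are hypothetical, and where the known-good copy lives is an assumption you would fill in yourself.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def files_match(deployed, reference):
    """True when the deployed file is byte-identical to the reference copy."""
    return sha256_of(deployed) == sha256_of(reference)
```

For example, `files_match("/var/www/html/fog/lib/fog/MulticastTask.class.php", reference_copy)` would return False as soon as anything pushes a different version, where `reference_copy` is wherever you keep the untouched 1.2.0 file.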
-
Well, looks like I may have jumped the gun on that conclusion… A day later, the problem is back and we are still trying to figure out the issue…
But to put this in a little better context for all of you… Here is the hack on the file /var/www/html/fog/lib/fog/MulticastTask.class.php that we were using, as a diff. In each hunk, the first block is our hack and the second block is the original.
92,93c92,93
< $cmd = 'cat '.$strRec.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
< $cmd .= 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd = 'cat '.$strRec.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
> $cmd .= 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
96c96
< $cmd = 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd = 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
121c121
< $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
125c125
< $cmd = 'cat '.rtrim($this->getImagePath(),'/').'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd = 'cat '.rtrim($this->getImagePath(),'/').'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
134c134
< if ($handle = opendir($this->getImagePath()))
---
> if($handle = opendir($this->getImagePath()))
153c153
< $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
178c178
< $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
So basically, we just added --mcast-data-address 239.x.x.x to force it to send to the 239.x.x.x address so that the Cisco switches allow it to work properly… What we are seeing, though, is that when it builds the command, $this->getClientCount() seems to return ‘1’ instead of the number of clients, yet it spawns as many processes as there are clients.
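To make the expected behaviour concrete: for one session with N clients, we would expect a single udp-sender command carrying --min-receivers N, not N commands each carrying --min-receivers 1. The sketch below is Python rather than FOG's PHP and only mirrors the string concatenation shown in the diff; it uses only flags that appear in the log earlier in this thread, and the function name `build_sender_cmd` and its parameters are hypothetical.

```python
def build_sender_cmd(image_part, client_count, port_base, interface, mcast_addr=None):
    """Assemble a udp-sender command line like the ones in the multicast log.

    This is an illustration only; it does not reproduce FOG's actual logic.
    """
    parts = [
        f"cat {image_part}|/usr/local/sbin/udp-sender",
        f"--min-receivers {client_count}",
        f"--portbase {port_base}",
        f"--interface {interface}",
    ]
    if mcast_addr:
        # The only thing our hack adds compared with the original file.
        parts.append(f"--mcast-data-address {mcast_addr}")
    parts.append("--full-duplex --ttl 32 --nokbd;")
    return " ".join(parts)


# What we would expect for the 2-computer test: one command, 2 receivers.
expected = build_sender_cmd("/images/03132015/d1p1.img", 2, 51130, "bond0")
print(expected)
```

Comparing this with the logged commands shows the mismatch: the log has two commands on different port bases, each with --min-receivers 1.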
-
@SKasai So are you saying that multicasting was actually working for a while after you thought you had found the problem, but has now stopped again? That could be a clue as to the cause.