FOG 1.2.0 Multicast creates multiple single udpcast sessions
-
What distribution of Linux and what version?
When was the last time this machine was re-booted?
-
Has multicast ever worked before on this system? Or has this been an issue ever since you installed it?
-
As mentioned above, running CentOS 6.6, using FOG version 1.2.0.
Multicast did work earlier. However, I believe that at one point we had to tweak something in the iPXE boot.php, as we have one lab that needs to ‘flip-flop’ between booting from one drive and another (sda - Linux, sdb - Windows), which the default FOG iPXE setup did not allow at the time…
-
Are you sure you only edited the boot.php file? It seems somebody may also have made edits to Host.class.php.
-
Well, looking at /opt/fog/service/common/lib…
We had to hack the MulticastTask.class.php for some reason… This is what we have:
[url=“/_imported_xf_attachments/1/1833_MulticastTask.class.php?:”]MulticastTask.class.php[/url]
-
If you’re running 1.2.0, that file should not be doing anything. What’s in the file /opt/fog/service/etc/config.php?
-
[quote=“SKasai, post: 44747, member: 29107”]Well, looking at /opt/fog/service/common/lib…
We had to hack the MulticastTask.class.php for some reason… This is what we have:[/quote]
Not sure which version this file comes from (maybe Tom knows better), but it seems rather old, as the class MulticastTask does not extend FOGBase. Not sure if this plays a role, but I do wonder whether your installation is mixed up with code from different versions, with patched files on top of that. I wouldn’t be surprised if things go wrong with this. Do you have a spare machine (e.g. just a desktop machine) to set up a new FOG server (version 1.2.0 or current SVN if you like) to see if your multicast issues go away?
It’s very hard to guess what’s going wrong when you are running a different set of code than we’d expect when talking about version 1.2.0.
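One quick way to test the "mixed versions" suspicion is to check whether a deployed class file declares `extends FOGBase`. This is a minimal shell sketch, assuming (as the reply above implies, not verified against the 1.2.0 sources) that the stock 1.2.0 MulticastTask class does extend FOGBase. The function name `check_multicast_class` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify a MulticastTask.class.php by its class
# declaration. Assumption: the stock 1.2.0 file declares
# "class MulticastTask extends FOGBase", while older files do not.
check_multicast_class() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "unreadable"
        return 2
    fi
    if grep -Eq 'class[[:space:]]+MulticastTask[[:space:]]+extends[[:space:]]+FOGBase' "$file"; then
        echo "extends-FOGBase"
    else
        echo "no-FOGBase"
        return 1
    fi
}
```

Call it with the path of whichever copy is in doubt, for example `check_multicast_class /var/www/html/fog/lib/fog/MulticastTask.class.php`. A `no-FOGBase` result would suggest the file came from an older release.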
-
<?php
define( "WEBROOT", "/var/www/html/fog" );
?>
-
Sorry for taking so long to get back to everyone here… A lot of busy work and had to put this on the back burner…
Found the issue that was causing the problem. We are currently using cfengine, and it pushed the 0.3.2 version of MulticastTask.class.php, which is obviously not the right one for 1.2.0, yet it was able to generate the right command for the wrong reasons.
Restoring the original MulticastTask.class.php that comes with 1.2.0 seems to have fixed it.
The reason we had cfengine pushing a modified 0.3.2 version of the file was that someone needed to ‘hack’ MulticastTask.class.php to make it work more easily with the Cisco switches. This was done before my time, so I don’t know exactly what the change was.
-
Well, looks like I may have jumped the gun on that conclusion… A day later, the problem is back and I am still trying to figure out the issue…
But to put this in a little better context for all of you… Here is the diff for /var/www/html/fog/lib/fog/MulticastTask.class.php showing the hack we were using… In each hunk, the first block is our hack and the second block is the original.
92,93c92,93
< $cmd = 'cat '.$strRec.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
< $cmd .= 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd = 'cat '.$strRec.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
> $cmd .= 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
96c96
< $cmd = 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd = 'cat '.$strSys.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
121c121
< $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
125c125
< $cmd = 'cat '.rtrim($this->getImagePath(),'/').'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd = 'cat '.rtrim($this->getImagePath(),'/').'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
134c134
< if ($handle = opendir($this->getImagePath()))
---
> if($handle = opendir($this->getImagePath()))
153c153
< $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
178c178
< $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' --mcast-data-address 239.x.x.x '.$wait.' --full-duplex --ttl 32 --nokbd;';
---
> $cmd .= 'cat '.$path.'|'.UDPSENDERPATH.' --min-receivers '.$this->getClientCount().' --portbase '.$this->getPortBase().' '.$interface.' '.$wait.' --full-duplex --ttl 32 --nokbd;';
So basically, we just added --mcast-data-address 239.x.x.x to force the sender onto the 239.x.x.x multicast address so that the Cisco switches handle it properly… What we are seeing, though, is that when it creates the command, $this->getClientCount() comes out as ‘1’ instead of the number of clients, yet it spawns as many processes as there are clients.
-
@SKasai So are you saying that multicasting was actually working recently after you thought you found the problem but now it stopped again? That could be a clue as to what the cause is.
-
You may be willing to try SVN/Trunk/GIT or whatever you want to call it of FOG. Development is what I try to call it, but I do refer often to trunk or svn as well.
It shouldn’t have the problems you’re seeing, and should work fairly well. I’m aware of a quirk or two but it seems to work fine.
Please give it a shot.
There’s also a lot of added functionality.
-
@isaiah658 said:
@SKasai So are you saying that multicasting was actually working recently after you thought you found the problem but now it stopped again? That could be a clue as to what the cause is.
Well, to give you an example of what I was seeing…
[08-03-15 8:55:54 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 55764 --interface bond0 --mcast-data-address 239.x.x.x --full-duplex --ttl 32 --nokbd;
[08-03-15 8:55:54 am] | Task (135) machine21-22 has started.
[08-03-15 8:56:04 am] | Task (135) machine21-22 is already running PID 1672
[08-03-15 8:56:04 am] | Task (136) machine21-22 is new!
[08-03-15 8:56:04 am] | Task (136) /images/image06162015 image file found.
[08-03-15 8:56:04 am] | Task (136) 1 client(s) found.
[08-03-15 8:56:04 am] | Task (136) machine21-22 sending on base port: 53890
[08-03-15 8:56:04 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 53890 --interface bond0 --mcast-data-address 239.111.34.241 --full-duplex --ttl 32 --nokbd;
[08-03-15 8:56:04 am] | Task (136) machine21-22 has started.
[08-03-15 8:56:14 am] | Task (135) machine21-22 is already running PID 1672
[08-03-15 8:56:14 am] | Task (136) machine21-22 is already running PID 1683
[08-03-15 8:56:24 am] | Task (135) machine21-22 is already running PID 1672
[08-03-15 8:56:24 am] | Task (136) machine21-22 is already running PID 1683
[08-03-15 8:56:34 am] | Task (135) machine21-22 is already running PID 1672
[08-03-15 8:56:34 am] | Task (136) machine21-22 is already running PID 1683
So as you see here… In the FOG GUI, I tell it to multicast to a group called machine21-22, which has 2 machines in it. Both machines are configured the same way node wise, so it isn’t separate images. This was what happened before I reverted the fog/services MulticastTask.class.php that our cfengine had replaced with the one from 0.3.2, which I thought was related to this issue. When I restarted the service and ran the task again, I started seeing this in the logs:
[08-03-15 10:22:36 am] * Starting FOG Multicast Manager Service
[08-03-15 10:22:41 am] * Checking for new tasks every 10 seconds.
[08-03-15 10:22:41 am] * Starting service loop.
[08-03-15 10:22:41 am] | Task (138) machine21-22 is new!
[08-03-15 10:22:41 am] | Task (138) /images/image06162015 image file found.
[08-03-15 10:22:41 am] | Task (138) 2 client(s) found.
[08-03-15 10:22:41 am] | Task (138) machine21-22 sending on base port: 53480
[08-03-15 10:22:41 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 2 --portbase 53480 --interface bond0 --mcast-data-address 239.x.x.x --full-duplex --ttl 32 --nokbd;
[08-03-15 10:22:41 am] | Task (138) machine21-22 has started.
[08-03-15 10:22:51 am] | Task (138) machine21-22 is already running PID 2146
[08-03-15 10:23:01 am] | Task (138) machine21-22 is already running PID 2146
[08-03-15 10:23:11 am] | Task (138) machine21-22 is already running PID 2146
[08-03-15 10:23:21 am] | Task (138) machine21-22 is already running PID 2146
This is the expected behavior. Now, apart from the known quirk I read about, where killing this task does not clean up properly (I killed it manually and am not worried about that at the moment), the next day I tried the same test…
[08-04-15 10:53:25 am] * No tasks found!
[08-04-15 10:53:35 am] * No tasks found!
[08-04-15 10:53:45 am] * No tasks found!
[08-04-15 10:53:55 am] | Task (140) machine21-22 is new!
[08-04-15 10:53:55 am] | Task (140) /images/image06162015 image file found.
[08-04-15 10:53:55 am] | Task (140) 1 client(s) found.
[08-04-15 10:53:55 am] | Task (140) machine21-22 sending on base port: 58858
[08-04-15 10:53:55 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 58858 --interface bond0 --mcast-data-address 239.x.x.x --full-duplex --ttl 32 --nokbd;
[08-04-15 10:53:55 am] | Task (140) machine21-22 has started.
[08-04-15 10:54:05 am] | Task (140) machine21-22 is already running PID 3858
[08-04-15 10:54:05 am] | Task (141) machine21-22 is new!
[08-04-15 10:54:05 am] | Task (141) /images/image06162015 image file found.
[08-04-15 10:54:05 am] | Task (141) 1 client(s) found.
[08-04-15 10:54:05 am] | Task (141) machine21-22 sending on base port: 52754
[08-04-15 10:54:05 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 52754 --interface bond0 --mcast-data-address 239.x.x.x --full-duplex --ttl 32 --nokbd;
[08-04-15 10:54:05 am] | Task (141) machine21-22 has started.
[08-04-15 10:54:15 am] | Task (140) machine21-22 is already running PID 3858
[08-04-15 10:54:15 am] | Task (141) machine21-22 is already running PID 3886
[08-04-15 10:54:25 am] | Task (140) machine21-22 is already running PID 3858
[08-04-15 10:54:25 am] | Task (141) machine21-22 is already running PID 3886
So what I am seeing is that, for some odd reason, $this->getClientCount() is not returning the right number, yet a separate multicast task is generated for each client. Which I don’t understand.
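To spot this pattern quickly, the task lines can be boiled down to one line per task. The following is a rough sketch that relies only on the two log line shapes shown above (`Task (N) NAME is new!` and `Task (N) K client(s) found.`). The function name is hypothetical, and the log path in the usage note is an assumption about a default install.

```shell
#!/bin/sh
# Hypothetical helper: print one summary line per multicast task found in
# a FOG multicast log (read from the given files, or stdin if none given).
summarize_multicast_log() {
    awk '
        # "Task (140) machine21-22 is new!" : remember the task name by id
        /Task [(][0-9]+[)] .* is new!/ {
            id = $0; sub(/^.*Task [(]/, "", id); sub(/[)].*$/, "", id)
            name = $0; sub(/^.*Task [(][0-9]+[)] /, "", name); sub(/ is new!.*$/, "", name)
            names[id] = name
        }
        # "Task (140) 1 client(s) found." : report the client count
        /Task [(][0-9]+[)] [0-9]+ client[(]s[)] found/ {
            id = $0; sub(/^.*Task [(]/, "", id); sub(/[)].*$/, "", id)
            n = $0; sub(/^.*Task [(][0-9]+[)] /, "", n); sub(/ client.*$/, "", n)
            printf "task=%s name=%s clients=%s\n", id, names[id], n
        }
    ' "$@"
}
```

Run it as `summarize_multicast_log /opt/fog/log/multicast.log`, adjusting the path to wherever your multicast log lives. Two tasks with the same name and `clients=1` each, as in the logs above, mean the group was split into single-client sessions.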
-
@Tom-Elliott said:
You may be willing to try SVN/Trunk/GIT or whatever you want to call it of FOG. Development is what I try to call it, but I do refer often to trunk or svn as well.
It shouldn’t have the problems you’re seeing, and should work fairly well. I’m aware of a quirk or two but it seems to work fine.
Please give it a shot.
There’s also a lot of added functionality.
At the moment, this is a production system and I am somewhat hesitant to go with development code, as we are also using it for the ‘flip-flop’ method with the modified BootMenu… Here is the listing of the files that were modified, not counting the ones I am trying to tweak…
-rw-r--r-- 1 apache apache 3523 Dec 30 2014 Config.class.php
-rwxr-xr-x 1 root root 27801 Dec 30 2014 BootMenu.class.php.linux
-rwxr-xr-x 1 root root 27801 Dec 30 2014 BootMenu.class.php.windows
-rw-r--r-- 1 root root 8238 Mar 20 15:10 MulticastTask.class.php.work
-rw-r--r-- 1 apache apache 8238 Aug 4 13:25 MulticastTask.class.php
-rwxr-xr-x 1 root root 27801 Aug 5 08:45 BootMenu.class.php
BootMenu.class.php is copied from BootMenu.class.php.linux or BootMenu.class.php.windows, depending on which way we are doing our ‘flip-flop’… The only other thing I can see is that Config.class.php has the settings defined for our server, except that it shows eth0 instead of bond0. We did change the server to use a bonded network connection for multicast, and we did specify this on the GUI side.
-
While I understand the hesitation, I don’t know how much help I can provide.
This is especially important to know because of the files you have edited.
I don’t know what the state of the system is. It’s really hard to fix something when others are playing with other things.
As you described, you guys have changed a number of files.
While there are likely some bugs in what we did originally, this is exacerbated greatly by any changes.
One of the quickest fixes, to at least hopefully help you along: try clearing out the multicastSessions and multicastSessionsAssoc tables:
truncate table multicastSessions; truncate table multicastSessionsAssoc;
Then restart the FOGMulticastManager service and create your tasks.
My guess is there are other jobs that the clients are trying to attach themselves to. Truncating should at least fix it for the first time around.
-
@Tom-Elliott Thanks for the reply, Tom.
I did try the truncate command as you suggested, but got the following:
truncate table multicastSessions;truncate table multicastSessionsAssoc
truncate: you must specify one of ‘--size’ or ‘--reference’
Try ‘truncate --help’ for more information.
truncate: you must specify one of ‘--size’ or ‘--reference’
Try ‘truncate --help’ for more information.
FYI, this is on a CentOS 6.6 machine.
I can also send you what we did with those particular files and specify why given our situation in a private chat.
-
My apologies… Apparently, I missed the fact that I need to do this in the MySQL database… The results were:
mysql> truncate table multicastSessions; truncate table multicastSessionsAssoc;
Query OK, 0 rows affected (0.04 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> exit
However, testing it, it shows the following:
[08-05-15 11:21:18 am] * Starting FOG Multicast Manager Service
[08-05-15 11:21:23 am] * Checking for new tasks every 10 seconds.
[08-05-15 11:21:23 am] * Starting service loop.
[08-05-15 11:21:23 am] * No tasks found!
[08-05-15 11:21:33 am] * No tasks found!
[08-05-15 11:21:43 am] * No tasks found!
[08-05-15 11:21:53 am] * No tasks found!
[08-05-15 11:22:03 am] * No tasks found!
[08-05-15 11:22:13 am] * No tasks found!
[08-05-15 11:22:23 am] | Task (1) machine21-22 is new!
[08-05-15 11:22:23 am] | Task (1) /images/image06162015 image file found.
[08-05-15 11:22:23 am] | Task (1) 1 client(s) found.
[08-05-15 11:22:23 am] | Task (1) machine21-22 sending on base port: 64180
[08-05-15 11:22:23 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 64180 --interface bond0 --mcast-data-address 239.x.x.x --full-duplex --ttl 32 --nokbd;
[08-05-15 11:22:23 am] | Task (1) machine21-22 has started.
[08-05-15 11:22:34 am] | Task (1) machine21-22 is already running PID 18543
[08-05-15 11:22:34 am] | Task (2) machine21-22 is new!
[08-05-15 11:22:34 am] | Task (2) /images/image06162015 image file found.
[08-05-15 11:22:34 am] | Task (2) 1 client(s) found.
[08-05-15 11:22:34 am] | Task (2) machine21-22 sending on base port: 54332
[08-05-15 11:22:34 am] CMD: cat /images/image06162015/d1p1.img|/usr/local/sbin/udp-sender --min-receivers 1 --portbase 54332 --interface bond0 --mcast-data-address 239.x.x.x --full-duplex --ttl 32 --nokbd;
[08-05-15 11:22:34 am] | Task (2) machine21-22 has started.
[08-05-15 11:22:44 am] | Task (1) machine21-22 is already running PID 18543
[08-05-15 11:22:44 am] | Task (2) machine21-22 is already running PID 18563
[08-05-15 11:22:54 am] | Task (1) machine21-22 is already running PID 18543
[08-05-15 11:22:54 am] | Task (2) machine21-22 is already running PID 18563
-
@SKasai Those are SQL commands. Not regular bash commands.
You need to login to the mysql server and run those truncate commands.
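For reference, the same cleanup can also be run from a shell through the mysql client instead of an interactive session. This is only a sketch: the database name `fog`, the `root` login, and the wrapper name `reset_multicast_sessions` are assumptions, while the table and service names are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper around the cleanup suggested in this thread.
# Assumptions: the FOG database is named "fog" (override with $1) and
# the MySQL root account is allowed to log in.
reset_multicast_sessions() {
    db="${1:-fog}"
    sql='TRUNCATE TABLE multicastSessions; TRUNCATE TABLE multicastSessionsAssoc;'
    # -p with no value makes the mysql client prompt for the password.
    mysql -u root -p -e "$sql" "$db" || return 1
    # Restart the multicast manager, as advised, before creating new tasks.
    service FOGMulticastManager restart
}
```

Running `reset_multicast_sessions` uses the assumed default database, and `reset_multicast_sessions otherdb` targets a differently named one. The point is simply that the TRUNCATE statements go to MySQL, not to bash.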
-
@Tom-Elliott Yup, realized the mistake and posted a follow up… Unfortunately, yielded the same results. See follow up message.
I am stumped as to what is being passed to FOGMulticastManager to make it behave this way.
-
@SKasai said:
@Tom-Elliott Yup, realized the mistake and posted a follow up… Unfortunately, yielded the same results. See follow up message.
I am stumped as to what is being passed to FOGMulticastManager to make it behave this way.
Just going to remind you that there have been improvements made to multicast in FOG Trunk.
You’ve had this problem for 5 months now??? Would it really be too out-of-the-way to spend a day setting up a test environment and trying FOG Trunk?
It’s simple… I’ll explain.
1. Don’t touch your production server. Set up a new FOG trunk server and install it with DHCP enabled (you may use an old desktop for this).
2. Pick a computer lab, take the image that lab needs, and copy it to your new FOG server.
3. Export the hosts from the production server and import them into the new server.
4. Move the new FOG server (physically) to that computer lab’s switch. Unplug the switch’s uplink. Connect the FOG server to that switch.
5. Try to multicast, observe, report back.
-