Multicast data address does not change from one task to another
-
You are right, they are vCPUs. I will ask about the physical CPU cores in our VM host server and come back with the information.
2 sockets of 6 cores (12 physical cores) with hyperthreading enabled (24 logical processors). Our FOG server has 8 vCPUs.
-
@jose-cacho said in Multicast data address does not change from one task to another:
2 sockets of 6 cores
The thing is, with 8 vCPUs allocated to the VM, the hypervisor needs 8 of the 12 cores free at the same time for the VM to be scheduled to execute. The other factor is how many VMs are on this VM host server. We are getting off the point of your initial post, but my intuition tells me that 8 vCPUs is too many and you might see better performance with 4 or 6 vCPUs. For the moment, though, only change one thing at a time.
When you say 2500 hosts, do all of them have the FOG client installed? If so, what is the check-in time for the FOG client? If it is still set to 60 seconds, change it to 900 (15 minutes). That will dramatically drop the load on the FOG server.
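As a rough back-of-the-envelope figure (assuming one check-in request per client per interval): 2500 active clients at a 60-second interval works out to about 2500 / 60 ≈ 42 check-ins per second hitting the web server and database, while a 900-second interval drops that to roughly 3 per second.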
-
When you say 2500 hosts, do all of them have the FOG client installed?
Yes, all of them (7000) have the FOG client installed, but about 2500 could be polling the server on an ordinary class day. Right now that is not a problem because all the schools are on summer break. In order to use the task reboot feature for unattended image deploys, we set the check-in time to 180 seconds. (So if a multicast deploy task is sent, the computers have time to reboot and subscribe before the 300-second limit.)
my intuition tells me that 8 vCPUs is too many and you might see better performance with 4 or 6 vCPUs. For the moment, though, only change one thing at a time.
OK, 6 vCPUs was the setting until a month ago. Tasks were running very slowly and I asked for more power on our server (6-8 vCPUs, 12-16 GB RAM). We were aware it might not be the best option, but we were “forced” to test it. We didn't notice much better performance, and the plan is to go back to 6 vCPUs once the classrooms are ready for the new academic year.
We are getting off the point of your initial post.
OK, coming back to the point ;P. I have been talking with one of our network team and he has given me some general information about our network. Our FOG server is attached to a “core” router (10 GbE). From this central point there are connections to the four campuses mentioned. I have made a sketch map.
Looking at the map, and going back to the benchmark tests, the hosts from the first unicast tests are in Campus 1, and the hosts from the last unicast tests (at “the opposite end”) are in Campus 4.
So (if my memory serves me correctly), my network workmate told me that IGMP does not use the port number, only the IP address. And, as of today, we are not sure whether the router is able to discard, or route efficiently, the multicast data so that it only reaches the subscribers of a given IP + port multicast session. (Our most experienced guy at tuning multicast on our network is on holiday.)
I am looking for the cause that doubles the time needed for a multicast task on the v1.5.2 server (even when it is not heavily used) compared with the v0.30 server.
Take a multicast deploy to a group in “Campus 3”: in April, with v0.30, it took less than 4 h (about 58 GB), but yesterday, with v1.5.2, it took more than 8 h (about 67 GB).
(Don't get me wrong, I know it is very difficult to tune all the settings, and I think our FOG implementation is not an easy one :). So, step by step.)
On another note, to add some small test results to the multicast performance problem, we tried the bitrate option (yes, setting it in the “Storage” options does get it added to the udp-sender command):
- In a “Campus 1” deploy: 2.43 GB/min vs 4.24 GB/min (the second test run 5 minutes later, without --max-bitrate 200m, to the same two hosts)
root 23705 2180 0 19:44 ? 00:00:00 /usr/local/sbin/udp-sender --max-bitrate 200m --interface ens192 --min-receivers 2 --max-wait 300 --mcast-data-address 239.0.107.1 --portbase 51604 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/aula-upv-ehu-enajenacion/d1p1.img
root 31218 2180 8 19:51 ? 00:00:12 /usr/local/sbin/udp-sender --interface ens192 --min-receivers 2 --max-wait 300 --mcast-data-address 239.0.107.1 --portbase 52262 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/aula-upv-ehu-enajenacion/d1p1.img
- I have not had a chance to test this on the other campuses.
-
Here are some images to give an overview of our FOG server load today.
The active unicast tasks are properly queued when there are more than 10. This setting keeps our unicast tasking performing well.
But the multicast tasks get quite slow when they are not “alone” (run one by one). And, as you can see in the attached images, we can easily reach five (or more) multicast groups at the same time.
– FOG Overloaded –
– FOG Managing overload –
@george1421 Thinking aloud: if the mcast-data-address is not part of the performance problem, the way forward could be to get the multicast tasks queued.
-
@jose-cacho said in Multicast data address does not change from one task to another:
I've been trying to think about how we can best debug this issue. At this moment I'm just thinking out loud here: there have been many changes since 0.30. Partclone is now used instead of Partimage, ZSTD is used as the standard image compressor (even if gzip can still be picked for image capture), the FOS kernel (the customized Linux that runs on the target computer) has been updated a hundred times or so, plus all of the ancillary applications to FOG have been updated, and the Linux OS of the FOG host server has been updated.
On the other hand, the VM is running on the same infrastructure as the 0.30 instance, and the image takes the same data path between the VM host server and the target computers.
Well, we know we can manually launch the udp-sender application on the FOG server with this command:
/usr/local/sbin/udp-sender --interface ens192 --min-receivers 2 --max-wait 300 --mcast-data-address 239.0.107.1 --portbase 52262 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/aula-upv-ehu-enajenacion/d1p1.img
On the target computer there will be a udp-receiver command that connects to the multicast stream initiated by the FOG server. I don't know the exact command that FOG is using, but it should be close to this:
udp-receiver --file /tmp/pig.tmp --nokbd --portbase 52262 --ttl 32 --mcast-rdv-address 239.0.107.1
The one thing I did notice is that the TTL is set to 32, so you can't have more than 32 hops between the sender and the receiver. Unless you have a really big campus, this shouldn't come into play.
Now, if you schedule a debug capture or debug deploy and then PXE boot the target computer, you will be dropped to a Linux command prompt on the target computer, where you can key in commands like udp-receiver.
-
@george1421 Thank you very much for your thoughts and support. I agree with you, there have been many changes since our last version, and we will have to test the udpcast commands to get the best out of them (thanks). But I have some more tests and data.
We have run simultaneous “controlled” multicast tasks on different campuses, and the network team has captured the traffic (port mirroring) on one of the multicasted computers. (Please let me know if you don't understand something in this post. I am not used to writing about network terms, and there may be a better way to explain it.)
The summary is:
- All the traffic for the same multicast IP address reaches the computer's NIC. It is not filtered by port.
- As the multicast IP address is the same, the different mcast sessions to a campus are all sent over the same data channel rather than spread across the others (which would let them take advantage of the other channels when the first one is at its maximum throughput). Note that each campus is connected by 2, 3 or 4 aggregated data channels, and the data is balanced across them to get the best overall throughput and performance, with the IP address being a key input for that balancing. So, when one (or more) multicast session is running on a campus, all the multicast data is routed over the same data channel.
- The FOG server's CPUs go to 100% with only 2 simultaneous multicast tasks on Campus 1: one task of 9 computers and another of 41.
So, I’m thinking about:
A) Could you help us tweak FOG so that each multicast task uses a different IP?
B) (…And thinking aloud) If FOG needs to resend more packets and has to wait on an overloaded data channel, could this be the main cause of the CPU consumption?
And now, some additional data courtesy of our network team:
-
Port mirroring and capturing the traffic on a multicasted computer, we can see that it receives the data of all the running multicast tasks.
-
From https://community.cisco.com/t5/switching/multicast-ports/td-p/854295
But you would do well to use different multicast IP addresses for different applications because switches will distribute multicast packets according to the IP address (regardless of port).
So if you have two applications that use the same IP address but different ports, a machine that is interested in either application will have to listen to both sets of traffic and filter out the port it is not interested in. If they are using different IP addresses, the switch will do that for them.
(Actually, it's a bit more complicated because the switch distributes according to groups of 32 addresses, so there may be some overlap even if the addresses are different … if the addresses fall in the same MAC group.)
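As a worked illustration of that “MAC group” overlap (this is the standard IPv4 multicast-to-MAC mapping, not something specific to our setup): a multicast MAC address is 01:00:5e followed by the low-order 23 bits of the group IP, so 5 bits of the address are dropped and 32 different group IPs share the same MAC. For example, 239.0.107.1 and 224.0.107.1 both map to the MAC 01:00:5e:00:6b:01, and a switch that filters on the MAC alone cannot tell them apart.
-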
Hi,
I changed the multicasttask.class.php file to give a different IP to each multicast session, and the performance is better now.
One line in /var/www/html/fog/service/multicasttask.class.php:
# diff multicasttask.class.php multicasttask.class.php.ori
421,423d420
< /* This line is added so that each multicast task is assigned a different IP address */
< $address = long2ip(ip2long($address)+(( $this->getPortBase() / 2 + 1) % self::getSetting('FOG_MULTICAST_MAX_SESSIONS')));
< /* END OF CHANGE */
This line assigns a dynamic multicast IP to each session. To do it, the code uses some parameters from the server: the portbase (a port that FOG picks randomly) and FOG_MULTICAST_MAX_SESSIONS.
You can see the udp-sender commands:
Command: /usr/local/sbin/udp-sender --max-bitrate 200m --interface ens192 --min-receivers 2 --max-wait 300 --mcast-data-address 239.0.106.12 --portbase 63764 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/aula-ehu-upv-enajenacion/d1p1.img;
Command: /usr/local/sbin/udp-sender --max-bitrate 200m --interface ens192 --min-receivers 3 --max-wait 300 --mcast-data-address 239.0.106.31 --portbase 55994 --full-duplex --ttl 32 --nokbd --nopointopoint --file /images/aula-upv-ehu-W10-UEFI/d1p1.img;
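For illustration, here is a minimal standalone sketch of that calculation. The base address 239.0.106.1 and FOG_MULTICAST_MAX_SESSIONS = 64 are assumptions inferred from the two commands above, not confirmed settings:
<?php
// Minimal sketch of the per-session address calculation, run outside of FOG.
// Assumptions: base multicast address 239.0.106.1 and FOG_MULTICAST_MAX_SESSIONS = 64
// (inferred from the commands above, not read from the actual server settings).
function sessionAddress(string $baseAddress, int $portBase, int $maxSessions): string
{
    // Offset the base address by (portbase / 2 + 1) modulo the max session count,
    // mirroring the one-line change in multicasttask.class.php.
    $offset = (intdiv($portBase, 2) + 1) % $maxSessions;
    return long2ip(ip2long($baseAddress) + $offset);
}

echo sessionAddress('239.0.106.1', 63764, 64), "\n"; // 239.0.106.12
echo sessionAddress('239.0.106.1', 55994, 64), "\n"; // 239.0.106.31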
-
Hi @george1421,
With the change made by my workmate @Fernando-Gietz regarding the multicast data address, we have improved the throughput when there are several multicast tasks running at the same time. So you can mark this as solved (I can't find how to do it).
We are now focusing our attention on MySQL tuning, because (as you pointed out) now that the school year has started, the polling from the FOG clients on the hosts brings our CPUs “into the red zone”.
Just to keep the information in the post: I “remember” (I am back from holidays today) what our colleague from the network team told me about IGMP: v3 can avoid delivering multicast packets from specific sources to networks where there are no interested receivers, but v2 can't. And, our router is running IGMP v2.
COMPARISON OF IGMPV1, IGMPV2 AND IGMPV3
Understanding difference between IGMPv2 and v3
Many thanks for your excellent support.
-
@Jose-Cacho said in Multicast data address does not change from one task to another:
v3 can avoid delivering multicast packets from specific sources to networks where there are no interested receivers, but v2 can’t. And, our router is running IGMP v2.
I’m not a network engineer, but I think that “IGMP Snooping” configured on the switches will supplement IGMP v2, to make it a bit more like v3 by only delivering the multicast stream to the stream subscribers.
-
@Jose-Cacho said in Multicast data address does not change from one task to another:
With the change made by my workmate @Fernando-Gietz regarding the multicast data address, we have improved the throughput when there are several multicast tasks running at the same time. So you can mark this as solved (I can't find how to do it).
@Developers we might want to consider @Fernando-Gietz's patches for the next release of FOG.
-
@Fernando-Gietz said in Multicast data address does not change from one task to another:
This line is added so that each multicast task is assigned a different IP address
I've added the patch, but with a little more checking involved. This has been added to both the working and working-1.6 branches. It tests the set value of the $address variable; if the variable is set, it calculates the address. Here's the snippet of lines:

if ($address) {
    $address = long2ip(
        ip2long($address)
        + (
            ($this->getPortBase() / 2 + 1)
            % self::getSetting('FOG_MULTICAST_MAX_SESSIONS')
        )
    );
}
Hopefully this will address the problem people have been seeing and allow the use of multiple sessions.