Multicast very slow



  • Hello,
    I’m trying multicast with FOG 1.5.5 and Alcatel switches (OS6250 and OS6450).
    When I launch a multicast session it’s extremely slow.
    I configured the switches like this:

    On all switches, enable IGMP globally and on VLAN 2 (FOG server):

    ip multicast status enable
    ip multicast vlan 2 status enable
    ip multicast zapping enable
    ip multicast vlan 2 zapping enable
    ip multicast version 3
    ip multicast vlan 2 version 3
    ip multicast querier-forwarding enable
    ip multicast vlan 2 querier-forwarding enable
    

    On the querier switch (the closest to the fog server)

    ip multicast querying enable
    ip multicast vlan 2 querying enable
    ip interface Querier address 192.168.39.244 mask 255.255.255.0 vlan 2
    

    IGMP is disabled on all other VLANs on all the switches.

    All computers are on 100M ports and the FOG server is on a 1G port.

    You can see the layout in this diagram: http://plegrand1.free.fr/IGMP.png (an old diagram).
    The server shown is not a Ghost server but a FOG server.
    I have since added the global statements.
    The server is on a 1G port.

    The transfer is very fast at the beginning, then drops to a very slow rate (25M / min).
    Here are the server characteristics:

    PowerEdge T310
    Intel(R) Xeon(R) CPU X3480 @ 3.07GHz
    cores	      =	4
    enabledcores  =	4
    threads       =	8
    Ram           = 32GiB
    
    Ethernet interface
    NetXtreme II BCM5716 Gigabit Ethernet
    eth0
    1Gbit/s
    

    Could you help me solve this problem?


  • Moderator

    @plegrand said in Multicast very slow:

    There were then 2 sessions running (test and test2), and all the target computers had a rate of about 1.15GB/min.

    Just so I’m clear on this. When you were able to get 4 computers imaging your transfer rate was 1.15GB/min? That’s still 19MB/sec. You are still above the 100Mb/s theoretical limit.
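
    As a rough check of that arithmetic, here is a minimal Python sketch. It assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead; the function names are illustrative only and are not part of FOG.

```python
def gb_per_min_to_mb_per_s(gb_per_min: float) -> float:
    """Convert a GB/min figure to MB/s (decimal units assumed)."""
    return gb_per_min * 1000 / 60


def link_limit_mb_per_s(link_mbit_per_s: float) -> float:
    """Theoretical ceiling of a link in MB/s, ignoring protocol overhead."""
    return link_mbit_per_s / 8


observed = gb_per_min_to_mb_per_s(1.15)   # rate reported for the 4-computer run
limit = link_limit_mb_per_s(100)          # 100 Mb/s desktop port
print(f"observed {observed:.1f} MB/s, 100 Mb/s link limit {limit:.1f} MB/s")
```

    The reported rate works out to roughly 19 MB/s, above the 12.5 MB/s a 100 Mb/s port can carry, which fits the idea that the displayed figure is not the raw on-wire rate.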

    All the computers have 2 partitions and strangely the first partition is slower than the second : about 700MB/min rate

    I can explain this. The number is based on an incorrect calculation. The issue is that the first partition is pretty small, around 500MB. It transfers so fast that the speed numbers get skewed. The second partition is typically the contents of the C: drive. You can see this if you look at the disk manager in Windows: check the size of the first partition.

    I might need to explain how image multicasting works. There is one computer (the FOG server) that sends the image out. As each multicast client boots up, it checks in with the multicast sender through a discovery process. The FOG server configures the multicast sender service to wait for X clients to check in before starting or, once the first client has checked in, to wait XX seconds and then start even if not all clients have checked in. Once the multicast stream starts, no late clients can check in (they are blocked).

    In the image stream, the FOG server sends out the first block of data, then stops. It waits for every multicast receiver (target computer) to respond with “OK!”. The FOG server will not send the next block until it hears “OK!” from every client. If something goes wrong and one client didn’t get the block correctly, it sends “Retrans” back to the FOG server, and the FOG server resends that block to that client (while the others sit and wait until everyone replies with “OK!”).

    This is why we say multicasting can only go as fast as the slowest computer in the multicast stream. Consider four 8-core desktops, all with SSD drives, and one Pentium 4 with a slow HDD. If you imaged them all in one stream, the 8-core systems would image at the rate the Pentium 4 system can write data to its slow HDD. Likewise, if a system has a failing hard drive and the checksum of the block written to disk doesn’t match the checksum of the block transferred, it will send a “Retrans” command back to the FOG server while the other clients wait. The point is: when everything works, it works well; when you have one bad actor, everyone suffers.
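
    The “slowest client sets the pace” behaviour can be illustrated with a toy model. This is only a sketch under simplifying assumptions (fixed block size, every block waits for the slowest writer, a resend costs one more full round); it is not the actual multicast protocol, and the rates and sizes below are made-up numbers.

```python
def session_seconds(image_mb, block_mb, write_rates_mb_s, retrans_per_block=0):
    """Estimate session time when each block waits for the slowest client."""
    blocks = image_mb / block_mb
    slowest = min(write_rates_mb_s)
    # Each block (and each resend of it) takes as long as the slowest client needs.
    per_block = block_mb / slowest * (1 + retrans_per_block)
    return blocks * per_block


# Four fast SSD machines on their own (hypothetical 400 MB/s write rate)...
fast_only = session_seconds(20_000, 100, [400, 400, 400, 400])
# ...versus the same four plus one slow machine (hypothetical 40 MB/s).
with_slow = session_seconds(20_000, 100, [400, 400, 400, 400, 40])
print(f"fast clients only: {fast_only:.0f} s, with one slow client: {with_slow:.0f} s")
```

    In this model a single slow client makes the whole session ten times longer, even though four of the five machines could finish much sooner.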


  • Moderator

    @plegrand said in Multicast very slow:

    So what’s the benefit of multicast if it’s not faster, apart from not flooding the network?

    The answer is simple. Let’s say you want to send the same image to 5 computers and your base image is 20GB in size.

    With a multicast we send out one 20GB image for 20 or 100 systems. In the unicast situation for those 5 systems you would have to transmit 100GB worth of data over your network. So from a network load standpoint you will get less network impact with multicast.
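
    The same comparison as a small Python sketch. It assumes the image crosses the server link once per multicast session and once per client for unicast, and it ignores retransmissions; the function names are illustrative only.

```python
def unicast_total_gb(image_gb: float, clients: int) -> float:
    """Data sent by the server when every client gets its own copy."""
    return image_gb * clients


def multicast_total_gb(image_gb: float, clients: int) -> float:
    """Data sent by the server when one stream serves every client."""
    return image_gb if clients > 0 else 0


for n in (5, 20, 100):
    print(f"{n:>3} clients: unicast {unicast_total_gb(20, n)} GB, "
          f"multicast {multicast_total_gb(20, n)} GB")
```

    The multicast total stays at 20GB however many clients join, while the unicast total grows with each added client.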


  • Developer

    @plegrand I’d say keep on doing more tests. If you see some machines not joining the session then you might want to cancel it and restart the FOG server just to have a clean test setup on every run.

    All the computers have 2 partitions and strangely the first partition is slower than the second : about 700MB/min rate

    From my point of view this is another hint that it’s not so much a network/switch issue but more to do with the clients actually writing the data to disk. Some clients might have a dying disk that is causing the slowdown for all clients in a session. With multicast, the slowest link in the chain dictates the overall speed! While it causes far less network traffic than unicast for a group of computers, it has the caveat of being limited by the slowest participant.



  • @george1421
    Then I ran some tests:

    Multicast session (test) with 2 target computers : about 1.7GB/min rate

    Multicast session (test) with 3 target computers : about 1.7GB/min rate

    Multicast session (test) with 4 target computers
    The first 2 target computers didn’t start; I can’t understand why.
    I then removed these two computers from the session (test)
    and created a new multicast session (test2) for them.

    There were then 2 sessions running (test and test2), and all the target computers had a rate of about 1.15GB/min.

    All the computers have 2 partitions and strangely the first partition is slower than the second : about 700MB/min rate



  • @george1421 said in Multicast very slow:

    Under a multicast you should get just slightly less than unicast speeds.

    So what’s the benefit of multicast if it’s not faster, apart from not flooding the network?

    I’m going to run some tests this morning with 2, 3, and 4 target computers.
    I will let you know the results.



  • @george1421
    On the 6450, all ports are 10/100/1000.
    That is not the case on the 6250: only its uplink ports are 1000.


  • Moderator

    @plegrand said in Multicast very slow:

    I can’t do anything else; there are not enough GB ports.

    When I looked up the switch configuration it said that all ports are 10/100/1000 rated. Maybe I did not understand.

    If the OS6250 switch is only 100Mb/s then I understand why you have 100Mb/s to the desktop.

    OK on delaying the test. There is a bottleneck someplace; we just need to find it.



  • @george1421 said in Multicast very slow:

    Do you have the capabilities to run a second network wire from the switch where the fog server is to the switch where the target computers are? (I’ll explain more if you can).

    That is pretty much already the case.
    The switch (OS6250) that all my target computers are connected to is just below the switch where the FOG server is (OS6450),
    and the two are directly connected by a 1Gb link.
    So only the target computers are on 100Mb links.



  • @george1421 I don’t have enough time for these tests at the moment, but I’ll do them in the near future.



  • @george1421
    I’m not sure I understand correctly.
    Actually, my diagram is not “really” accurate.
    The FOG server is on a GB port.
    All desktop clients are on 100M ports.
    I can’t do anything else; there are not enough GB ports.



  • @george1421
    With a target computer far from the FOG server, it also seems to work.

    Screenshot: multicast-205.jpg


  • Moderator

    @plegrand Also, I just looked into these switches, thinking they were old. They are not (at least performance-wise); they have better capacity than the Cisco SG300 small business switch.

    So I have to ask: why are you running 100Mb/s to the desktops?

    Do you have the capabilities to run a second network wire from the switch where the fog server is to the switch where the target computers are? (I’ll explain more if you can).


  • Moderator

    @plegrand Well that makes me think a little differently if they are that near to your fog server.

    OK, let’s do a new test for multicast. Let’s test 2 computers and then 3 computers. These computers must be the same model as the one used in your first test. We need to remove the variable of different models from this test.

    What I expect to see for 2 computers is about the same speed as 1 computer in the multicast. For 3 computers slightly less than 2 computers. In my mind I question at what point do we go from an acceptable level of speed to bad.

    I can tell you with unicasting on a pure 1GbE network you can fill up a 1GbE link with 3 simultaneous unicast deployments of FOG with modern target computers. That is a concern for the link to the fog server and switch to switch links mainly. Under a multicast you should get just slightly less than unicast speeds.



  • @george1421
    To answer your last question: it was a session with 15 target computers, and these target computers were very near the FOG server. On my drawing, they are on the switch just below the FOG server.


  • Moderator

    @plegrand OK, very good. This tells us that your network between the FOG server and the target computer is good. You are getting a ~28MB/s transfer rate. That’s a bit of a lie, because your 100Mb/s network can only transfer about 12MB/s. That ~28MB/s figure is the rate at which partclone can expand the image onto the disk of the target computer. But if you can’t feed partclone fast enough, that rate will drop off quickly. If the image is being expanded at a rate faster than the network speed, your network rate is good.
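
    One way to read those two numbers together, as a hedged sketch: if the on-disk expansion rate exceeds what the wire can carry, the image must be compressed by at least the ratio between the two. The calculation below assumes decimal units and no protocol overhead; the function name is illustrative only.

```python
def implied_compression_ratio(disk_rate_mb_s: float, link_mbit_s: float) -> float:
    """Minimum compression ratio for the disk rate to exceed the wire rate."""
    wire_mb_s = link_mbit_s / 8  # Mb/s to MB/s, overhead ignored
    return disk_rate_mb_s / wire_mb_s


ratio = implied_compression_ratio(28, 100)
print(f"image must be compressed at least {ratio:.2f}:1")
```

    A ratio of a little over 2:1 is an inference from the figures in this thread, not a measured property of the image.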

    Now, can you run the same test with a computer located on a switch far away from the FOG server? In your drawing, the switch I want to test is the one on the left. This test will cover the full network path between the FOG server and the target PC.

    One question I didn’t ask, how many target computers are in your multicast session?



  • @george1421

    With only one computer it seems to work fine.

    Unicast
    Screenshot: unicast.jpg

    Multicast
    Screenshot: multicast.jpg



  • @george1421
    Image menu
    Multicast Image
    Start Multicast Session



  • @george1421 Sorry, but where is this “Start Multicast session”?


  • Moderator

    @plegrand

    mcast_client_count.png

    Then pxe boot the client and select the session name.

