
    Multicast deploy terribly slow and huge re-xmits percentage

      bmacadre

      Hi,

      I’m trying to deploy a Linux image to about 100 workstations. For testing purposes, I first tried with a single classroom (16 workstations).

      With these 16 workstations, multicast deployment was terribly slow (between 20 and 50 MB/min), and the udpcast log shows many timeouts with a very high re-xmits percentage (about 230%), like this:

      Timeout notAnswered=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] notReady=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] nrAns=0 nrRead=0 nrPart=16 avg=270
      bytes= 79 416 064  re-xmits=0129589 (237.5%) slice=0112 -   8
      Timeout notAnswered=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] notReady=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] nrAns=0 nrRead=0 nrPart=16 avg=154
      bytes= 81 536 000  re-xmits=0132880 (237.2%) slice=0112 -   4
      Timeout notAnswered=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] notReady=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] nrAns=0 nrRead=0 nrPart=16 avg=249
      bytes= 90 015 744  re-xmits=0146857 (237.5%) slice=0112 -   1
      Timeout notAnswered=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] notReady=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] nrAns=0 nrRead=0 nrPart=16 avg=226
      bytes= 96 375 552  re-xmits=0157193 (237.4%) slice=0112 -  11
      Timeout notAnswered=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] notReady=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] nrAns=0 nrRead=0 nrPart=16 avg=265
      bytes=114 476 544  re-xmits=0185615 (236.0%) slice=0112 -   4
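
As a cross-check on these figures, the percentage appears to be the retransmitted block count divided by the number of payload blocks. The sketch below assumes a 1456-byte block, a value inferred only because it reproduces the numbers in this log; it is not stated anywhere in the thread. Results are in tenths of a percent, truncated.

```shell
#!/bin/sh
# Recompute the re-xmits ratio from the "bytes" and "re-xmits" counters.
# BLOCK_SIZE is an assumption inferred from the log figures, not from the source.
BLOCK_SIZE=1456

rexmit_permille() {
    bytes=$1      # "bytes=" counter with the spaces removed
    rexmits=$2    # "re-xmits=" counter
    blocks=$(( bytes / BLOCK_SIZE ))
    # Tenths of a percent, truncated (2375 means 237.5%).
    echo $(( rexmits * 1000 / blocks ))
}

rexmit_permille 79416064 129589   # first sample in the log: prints 2375
```

A value of 2375 (237.5%) would mean that, on average, each block went over the wire roughly 3.4 times in total.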
      

      If I deploy this image to only 2 workstations, the speed rises to about 1 GB/min, but I still get timeouts and a re-xmits percentage of around 60%, like this:

      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=936
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=906
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=1076
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=1030
      bytes=  9 438 906K re-xmits=4274638 ( 64.3%) slice=0112 -   0
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=1033
      bytes=  9 573 791K re-xmits=4335096 ( 64.3%) slice=0112 -   1
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=713
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=1077
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=821
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=921
      bytes=  9 578 887K re-xmits=4338016 ( 64.3%) slice=0112 -   0
      Timeout notAnswered=[0,1] notReady=[0,1] nrAns=0 nrRead=0 nrPart=2 avg=1098
      

      I’ve tested some tweaks, such as:

      $ sysctl -w net.core.rmem_max=16777216
      $ sysctl -w net.core.rmem_default=16777216
      

      But none of this improved the stability of multicasting… Does anybody have an idea? I’m really stuck, and I can’t deploy to only 1 or 2 machines at a time.
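
One way to judge whether receive buffers are the limiting factor is to look at the kernel's UDP `RcvbufErrors` counter on a receiving machine before and after a transfer. A minimal sketch, assuming a Linux `/proc/net/snmp` with the usual header/value line pairs; the function name is made up for illustration:

```shell
#!/bin/sh
# Print the UDP RcvbufErrors counter (datagrams dropped because the socket
# receive buffer was full). Reads /proc/net/snmp unless a file is given.
udp_rcvbuf_errors() {
    awk '
        $1 == "Udp:" && !have_header {
            # The first "Udp:" line names the columns; remember their positions.
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "Udp:" {
            # The second "Udp:" line carries the values.
            print $(col["RcvbufErrors"])
            exit
        }
    ' "${1:-/proc/net/snmp}"
}

# Example: run before and after a multicast session and compare the values.
# udp_rcvbuf_errors
```

If the counter does not move during a slow transfer, larger `rmem` values are unlikely to help, and the loss is probably happening somewhere else.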

      Thanks,
      Regards,
      Bruno

        bmacadre

        Just for information: the workstations are DELL Optiplex 7010 connected on a extreme network switch (100mb ports), and the FOG server is hosted on a DELL PowerEdge R420 server connected to the same stack (1 Gb port).

          Wayne Workman @bmacadre

          @bmacadre said in Multicast deploy terribly slow and huge re-xmits percentage:

          connected on a extreme network switch (100mb ports)

          That’s your problem.

          The switch is important with multicasting.

          There is a certain amount of processing power involved with replicating a packet to all ports - and cheap switches just don’t cut it.

          There’s also maximum total throughput to consider. For example, at home I have a consumer-grade Cisco Small Business switch. It has 5 ports at 1Gbps each, but its total internal throughput is 3Gbps. That means I would never be able to multicast at home through that switch at 1Gbps to more than 2 computers at a time. However, I have a new 8-port 1Gbps z-link switch from China (28 bucks new) with an internal throughput of 5Gbps, meaning that device would be able to multicast to 4 computers at once at 1Gbps each.
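
The arithmetic behind those two figures can be sketched as follows, assuming the sender's stream counts once against the internal throughput and each receiver's copy counts once more. That accounting is an interpretation of the numbers above, not a vendor specification, and the function name is hypothetical.

```shell
#!/bin/sh
# Receivers a switch could feed at full port speed under a simple model:
# one port's worth of traffic in, plus one port's worth out per receiver.
max_multicast_receivers() {
    fabric_gbps=$1   # total internal throughput, in whole Gbps
    port_gbps=$2     # per-port speed, in whole Gbps
    echo $(( fabric_gbps / port_gbps - 1 ))
}

max_multicast_receivers 3 1   # 5-port switch with 3Gbps internal: prints 2
max_multicast_receivers 5 1   # 8-port switch with 5Gbps internal: prints 4
```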

          Again, cheap equipment just doesn’t cut it when it comes to multicast and really needing every port to operate at its maximum speed. The higher-end Cisco equipment usually doesn’t have a problem with this, though; it has the horsepower and typically has very high total internal throughput.

          Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
          Daily Clean Installation Results:
          https://fogtesting.fogproject.us/
          FOG Reporting:
          https://fog-external-reporting-results.fogproject.us/

            bmacadre @Wayne Workman

            Thanks for your reply.

            @wayne-workman said in Multicast deploy terribly slow and huge re-xmits percentage:

            There is a certain amount of processing power involved with replicating a packet to all ports - and cheap switches just don’t cut it.

            That’s the real problem: these switches aren’t cheap (they were really expensive when we bought them many years ago), and they have an internal bandwidth of 48.8 Gb/s (x250e) and 128 Gb/s (x450e). So there is no problem on this side.

            After some research and many, many tests, I found a problem with workstations running at 100Mb. I think it’s probably a bug (or a need for some tweaking) in the workstations’ network driver (kernel 4.17). Let me explain:

            First of all: to avoid congestion on the switch, I set a max bitrate of 80mb in the storage configuration.

            • First test: workstations and server on an x250e (workstations on 100Mb ports, server on a 1Gb port). Result: many packets are dropped (about 1 million for a 10 GB image) and about 50% re-xmits.

            • Second test: workstations and server on an x450e, all on 1Gb ports (auto-neg). Result: no drops at all and 0% re-xmits.

            • Third test: workstations and server on an x450e, all on 1Gb ports, but with all workstation ports fixed at 100Mb/full duplex. Result: same as the first test.
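
To put the 80mb cap in perspective, a back-of-the-envelope estimate of wire time for the 10 GB image is sketched below. It assumes the cap means 80 megabits per second, that 10 GB is 10 × 10^9 bytes, and that re-xmits add proportionally to the data sent; protocol overhead and timeouts are ignored, and the function name is hypothetical.

```shell
#!/bin/sh
# Estimated seconds on the wire for an image, given a bitrate cap and a
# re-xmits ratio in tenths of a percent (500 means 50.0%).
transfer_seconds() {
    image_bytes=$1
    bitrate_bps=$2
    rexmit_permille=$3
    echo $(( image_bytes * 8 * (1000 + rexmit_permille) / 1000 / bitrate_bps ))
}

transfer_seconds 10000000000 80000000 0     # 0% re-xmits: prints 1000
transfer_seconds 10000000000 80000000 500   # about 50% re-xmits: prints 1500
```

Under these assumptions that is roughly 17 and 25 minutes respectively.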

            Conclusion: the problem is not the switches; they can easily manage this load. So I think there’s a problem on the client side… But I have no idea what it is…
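
One starting point for narrowing this down on the client side is to confirm what each workstation actually negotiated and whether its NIC is counting receive drops. A minimal sketch, assuming a Linux client exposing the standard sysfs network attributes; `link_report`, the `SYSFS_NET` override, and the interface name `eth0` are illustrative, not anything FOG provides:

```shell
#!/bin/sh
# Print the negotiated link speed (Mbit/s) and the receive-drop counter
# for one interface. SYSFS_NET can be overridden, e.g. for testing.
link_report() {
    iface=$1
    base="${SYSFS_NET:-/sys/class/net}/$iface"
    speed=$(cat "$base/speed")
    dropped=$(cat "$base/statistics/rx_dropped")
    echo "$iface speed=${speed}Mb/s rx_dropped=$dropped"
}

# Example (the interface name is an assumption):
# link_report eth0
```

Comparing the drop counter before and after a multicast session at 100Mb and at 1Gb would show whether the loss is visible at the NIC at all.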

            Regards,
            Bruno

              Wayne Workman @bmacadre

              @bmacadre I don’t understand where the 100Mb is coming from? You said in your second post:

              extreme network switch (100mb ports)

              Which makes me think the switch is a 100Mbps switch.

                bmacadre

                The Extreme Networks x250e-48 switch has 48 10/100Mb ports and 2 1Gb ports (combo copper/fiber), and the x450e-48 has 50 10/100/1000Mb ports (two of them combo copper/fiber).

                So in my first post (and my first try), the workstations and the server were connected to an x250e (workstations on 100Mb ports, server on a 1Gb port).

                Sorry for my bad explanation (and my bad English). I hope it’s clearer now.

                Regards,
                Bruno

                  Wayne Workman @bmacadre

                  @bmacadre Try putting the FOG Server on one of the 100Mbps ports. This would obviously severely hinder unicast imaging, but if you’re mostly doing multicast, it might work better.

                  Copyright © 2012-2024 FOG Project