    Network throughput crippled.

    • Wayne Workman

      I strongly believe this is not a FOG problem but a Linux and/or VMware problem.

      At work we have a FOG system with about 15 servers in total. The storage nodes and the main server are all members of the same storage group, but I think this is irrelevant.

      We recently had a power outage that lasted about 3 hours at our Administration Center, and the VM platform went down. It’s configured to use a SAN. I believe there are two mirrored SANs, two mirrored VM platforms, and a switch configured in whatever way is standard for this particular setup. I didn’t build the platform, so I’m unsure of the specifics.

      All of the storage nodes are operating just fine at about 1 gigabit per second. The main FOG server, which is hosted in VMware, is averaging about 30 megabits per second to anywhere besides itself.
      I used iPerf to test throughput from the FOG server to several other places, and they all averaged 30 megabits per second.

      When I tested throughput using iPerf to 127.0.0.1, I get 34Gbps.

      Other VMs on the platform are operating normally; I tested those as well.

      CPU usage remains under 0.3, disk utilization remains at about 3%, and memory usage is negligible.

      I used ethtool to verify that the adapter is configured at 1 gigabit per second.
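As a rough cross-check of the ethtool reading, Linux also exposes the link speed in sysfs. The sketch below assumes the usual `/sys/class/net/<iface>/speed` file (value in Mbit/s); the helper name `read_link_speed_mbps` is made up for illustration. Inside a VM this only describes the virtual adapter, not the physical uplink.

```python
from pathlib import Path
from typing import Optional


def read_link_speed_mbps(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the link speed the kernel reports in Mbit/s, or None if unavailable."""
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        speed = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that does not report a speed.
        return None
    # Some drivers report -1 when the speed is unknown.
    return speed if speed > 0 else None


if __name__ == "__main__":
    # "ens32" is the interface name mentioned elsewhere in this thread.
    print(read_link_speed_mbps("ens32"))
```

A value of 1000 here would match the 1 gigabit per second setting, but it says nothing about what the hypervisor's physical ports negotiated.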

      Other than throughput, the main FOG server is operating perfectly normally.

      I have several ideas about how to recover from this if I can’t fix the throughput problem, but I’m reaching out to pick the brains of all you gurus here about what I could do to test or fix the throughput issue.

      The FOG server runs CentOS 7.

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
      Daily Clean Installation Results:
      https://fogtesting.fogproject.us/
      FOG Reporting:
      https://fog-external-reporting-results.fogproject.us/

      • george1421 Moderator

        After thinking about this for a bit, I have a few questions and some comments.

        You have to remember there is a virtualization layer between the FOG host system and the physical world. Checking with ethtool on the VM client will only give you a false sense of what is going on, because it tells you what is happening between the FOG host system and the virtualization layer vSwitch. What you really need to find out is what is going on between the physical VM host server and the physical core switch. On your virtualization host, do you have a network LAG set up between the VM host server and your core switch? If you do, ensure that all LAG elements (ports) are running at GbE speeds (from the core switch side).

        Are all 15 FOG storage nodes connected to the same core switch? If not, what kind of throughput do you get to a Linux server connected to the same core switch? What about a Linux host on the same virtualization host? You need to start ruling out where the problem isn’t. Is it VM host -> same VM host, VM host -> other VM host on the same core switch, or VM host -> some VM server on the other side of your network?

        I assume your VM client disks (vmdk files) reside on the SAN. If so, you also have to take that into account for overall system speed. 30MB/s is something I might expect from an old SATA disk. Have you tested with hdparm to see what your disk transfer rates are? You may have an issue on the SAN or SAN LAN side that is causing slow disk access unrelated to client network throughput. If your SAN LAN uses MPIO for load balancing and redundancy, you may have one of the MPIO branches offline. But since you are using iPerf to measure and it is reporting slow, it’s probably not the disk subsystem at fault here. It would still be good to check, since a slow disk would create an overall low throughput to the target computer.

        • george1421 Moderator

          @Wayne-Workman said in Network throughput crippled.:

          When I tested throughput using iPerf to 127.0.0.1 I get 34Gbps.

          This statement is interesting since you are getting 34Gb/s, but that traffic stays on the host and never leaves the VM client (FOG server). If you have the resources, spin up another CentOS 7 (test) box on this same VM host server and see what your throughput is to that clean CentOS 7 server (you can destroy it after testing is done).

          • Wayne Workman @george1421

            @george1421 Thanks. I’ll ask about all those things. I was going to run an iPerf test from the FOG server to a Windows server on the same platform. Would this accomplish the same thing as spinning up another Linux VM and testing between it and the FOG server? Or are you looking to see how a fresh Linux VM performs?

            • george1421 Moderator @Wayne Workman

              @Wayne-Workman I would use an existing VM client if I had one. We have a CentOS 7 VM template, so spinning up a new CentOS 7 VM takes about 3 minutes, not including boot time. The goal here is to see whether you are taking the hit on the VM host or between the VM host and the network. Test 2 would be the FOG server to another server on the same core switch, to start testing devices from near to far from the FOG server and see if you can establish a pattern. Right now it’s not clear in my mind where the problem isn’t.
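The near-to-far idea can be sketched in a few lines of Python. This is only an illustration: the function name `first_slow_hop`, the 50% tolerance, and the sample figures are all assumptions, not measurements from this thread.

```python
from typing import List, Optional, Tuple


def first_slow_hop(results: List[Tuple[str, float]], expected_mbps: float,
                   tolerance: float = 0.5) -> Optional[str]:
    """Return the nearest target whose throughput is below tolerance * expected_mbps."""
    for target, measured_mbps in results:  # results are ordered near to far
        if measured_mbps < expected_mbps * tolerance:
            return target
    return None  # every target met the threshold


# Hypothetical iPerf results in Mbit/s, ordered from nearest to farthest.
measurements = [
    ("VM on same host", 6000.0),
    ("server on same core switch", 30.0),
    ("server across the network", 30.0),
]
print(first_slow_hop(measurements, expected_mbps=1000.0))
# prints: server on same core switch
```

If the nearest target is already slow, the guest or its virtual adapter is suspect; if only targets beyond the host are slow, the uplink or switch side is the more likely place to look.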

              • Wayne Workman @george1421

                @george1421 I confirmed the unit of measurement is Mbits/sec

                Averages are now around 15Mbits/sec.
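Since bits and bytes are easy to mix up here, a quick conversion helps put the numbers side by side. This is plain arithmetic (8 bits per byte); the function name is just for illustration.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


for rate in (15, 30, 1000):
    print(f"{rate} Mbit/s = {mbit_to_mbyte_per_s(rate):g} MB/s")
# 15 Mbit/s = 1.875 MB/s
# 30 Mbit/s = 3.75 MB/s
# 1000 Mbit/s = 125 MB/s
```

So 30 Mbit/s is under 4 MB/s, far below both a healthy gigabit link and the 30MB/s disk figure mentioned earlier.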

                • george1421 Moderator @Wayne Workman

                  @Wayne-Workman OK, wow that sucks. Now start testing your way from near systems to far systems to see if you can pinpoint where things go bad.

                  • Mentaloid @Wayne Workman

                    @Wayne-Workman
                    Just as a reference, here is my FOG server on ESXi 6 to another VM (an OLD Debian jessie, including old unoptimised net devices) on the same host:

                    Server listening on TCP port 5001
                    TCP window size: 85.3 KByte (default)
                    ------------------------------------------------------------
                    ------------------------------------------------------------
                    Client connecting to 10.1.110.100, TCP port 5001
                    TCP window size: 1.83 MByte (default)
                    ------------------------------------------------------------
                    [  5] local 10.1.100.50 port 50312 connected with 10.1.110.100 port 5001
                    [ ID] Interval       Transfer     Bandwidth
                    [  5]  0.0-10.0 sec  7.35 GBytes  6.31 Gbits/sec
                    [  4] local 10.1.100.50 port 5001 connected with 10.1.110.100 port 40811
                    [  4]  0.0-10.0 sec  12.4 GBytes  10.6 Gbits/sec
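To compare runs like the one above without reading them by eye, the summary lines can be pulled out with a small parser. This is a sketch that assumes the iperf 2 text layout shown in this post; the names `SUMMARY`, `SCALE`, and `bandwidths_mbps` are made up for the example.

```python
import re
from typing import List

# Matches iperf 2 summary lines such as:
# [  5]  0.0-10.0 sec  7.35 GBytes  6.31 Gbits/sec
SUMMARY = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-\s*[\d.]+\s+sec\s+[\d.]+\s+\w+\s+([\d.]+)\s+([KMG])bits/sec"
)

# Multipliers that convert each reported unit to Mbit/s.
SCALE = {"K": 1e-3, "M": 1.0, "G": 1e3}


def bandwidths_mbps(iperf_output: str) -> List[float]:
    """Return every reported bandwidth in Mbit/s, in order of appearance."""
    return [float(value) * SCALE[unit] for value, unit in SUMMARY.findall(iperf_output)]


sample = """
[  5] local 10.1.100.50 port 50312 connected with 10.1.110.100 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  7.35 GBytes  6.31 Gbits/sec
[  4]  0.0-10.0 sec  12.4 GBytes  10.6 Gbits/sec
"""
print(bandwidths_mbps(sample))
```

Connection and header lines are skipped because they lack the interval, transfer, and bandwidth fields the pattern requires.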
                    

                    The only thing I can suggest is removing the virtual network device from the guest, booting, re-adding the most current device type, and configuring it.

                    I honestly can’t see what a power outage would do other than possibly mess something up in the guest OS VMX file. Do you have a backup to compare?

                    These are the relevant lines from my VMX:

                    ethernet0.virtualDev = "vmxnet3"
                    ethernet0.networkName = "VM Network VLAN100"
                    ethernet0.addressType = "generated"
                    ethernet0.uptCompatibility = "TRUE"
                    ethernet0.present = "TRUE"
                    ethernet0.pciSlotNumber = "192"
                    ethernet0.generatedAddress = "00:0c:29:XX:YY:ZZ"
                    ethernet0.generatedAddressOffset = "0"
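If a backup copy of the VMX exists, the comparison can be scripted. The sketch below assumes the file keeps the simple `key = "value"` layout shown above; `parse_vmx` and `changed_keys` are hypothetical helper names, and the sample strings are invented for the demonstration.

```python
from typing import Dict, Optional, Tuple


def parse_vmx(text: str) -> Dict[str, str]:
    """Parse key = "value" lines from VMX text into a dict."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def changed_keys(old: Dict[str, str],
                 new: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (old_value, new_value)} for every key that differs."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Invented example: a backup using vmxnet3 versus a current file using e1000.
backup = parse_vmx('ethernet0.virtualDev = "vmxnet3"\nethernet0.present = "TRUE"\n')
current = parse_vmx('ethernet0.virtualDev = "e1000"\nethernet0.present = "TRUE"\n')
print(changed_keys(backup, current))
# {'ethernet0.virtualDev': ('vmxnet3', 'e1000')}
```

An empty result would suggest the guest's network configuration did not change across the outage.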
                    
                    • Wayne Workman @Mentaloid

                      @Mentaloid On Monday, after taking a new snapshot (just to have it), we’re going to apply an older snapshot and see if the problem is resolved or not.

                      The VMX stuff you posted, what file has that in Debian, or is that in ESXi?

                      Earlier today we removed the E1000 adapter, added an E1000E adapter, and rebooted. The OS didn’t detect the adapter correctly, and there was no network connectivity. I could have just missed a step. I did generate a new UUID for the old interface name (it’s a Red Hat thing), but I am not entirely sure the new interface’s name would be the same as the old one. The old interface name was ens32. I’ll have to do more testing on this on Monday.

                      Of course - any and all advice or questions are welcome. I need all the help I can get.

                      • Mentaloid @Wayne Workman

                        @Wayne-Workman
                        The .VMX is the ESXi config file for the guest OS. It would normally be stored with your virtual HD on your SAN. If you can’t edit/view the file (plaintext) directly on your SAN, you can view it via the vSphere client/web interface. Edit the powered-off VM, go to the Options tab, and then under Advanced/General hit the button for configuration parameters. Be careful in there!

                        CentOS, I would imagine, supports vmxnet adapters… if your guest OS supports them, they are more efficient/faster than e1000/e1000e emulation. I think this guide should help you get that running…

                        vmxnet3incentos7

                        Back to an earlier point though: have you confirmed that GuestOS/VM0 to another GuestOS/VM on the same VMHost is not working at the correct speed?

                        If VM to VM is good (and therefore the VMHost virtual switch and virtual adapters are working internally), I’d look at the LAG/bond for your uplink. I’ve had ESXi puke and start dropping packets on a LAG before, and I’ve also had switches with good “links” and no frame errors that were not passing data in one direction after a power loss. This can be frustrating to troubleshoot on LAG/bonded links. Pull out all but one of the LAG wires (admin down from the switch isn’t good enough, as you physically have to break the link for ESXi to figure out that it shouldn’t use the port for data), and verify. If it’s good, pull it, and try another - keep going through and verify each port is functioning on its own. Of course, you need to ensure you’re using a known good port on this switch for your test machine!

                        • george1421 Moderator

                          Something just struck me: did you have any pending VM host updates waiting on a reboot? Because ESXi executes out of memory, it’s possible to update the ESXi system files and for those updates to be applied only upon reboot. I’m not saying this is the case, but it could explain why things are acting a bit strange after an entire system restart.

                          • Wayne Workman

                            So, we found out what it was.

                            A security camera contractor had damaged the fiber line that the VMware platform used.

                            He fixed it Friday morning. I’m guessing he told our network team what happened sometime Friday. After that, everything was fine.

                            Copyright © 2012-2024 FOG Project