Network throughput crippled.
-
said in Network throughput crippled.:
When I tested throughput using iPerf to 127.0.0.1 I get 34Gbps.
This statement is interesting since you are getting 34Gb/s, but a test to 127.0.0.1 stays entirely on the host and never leaves the VM client (the fog server). If you have the resources, spin up another CentOS 7 (test) box on this same VM host and see what your throughput is to that clean CentOS 7 server (you can destroy it after testing is done).
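Just so we're on the same page, the VM-to-VM run with plain iperf (version 2, which defaults to TCP 5001) would look something like this; the 10.0.0.20 address is just a placeholder for the test box:

# on the throwaway CentOS 7 test box (server side)
iperf -s

# on the fog server (client side), aimed at the test box's address
iperf -c 10.0.0.20 -t 10 -i 2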
-
@george1421 Thanks. I’ll ask about all those things. I was going to do an iPerf test from the fog server to a windows server on the same platform, would this accomplish the same thing as spinning up another linux vm and testing between it and the fog server? Or are you looking to see how a fresh linux vm performs?
-
@Wayne-Workman I would use an existing VM client if I had one. We have a CentOS 7 VM template, so spinning up a new CentOS 7 VM takes about 3 minutes, not including boot time. The goal here is to see whether it is the VM host itself, or the VM host's uplink to the network, where you are taking the hit. Test 2 would be the fog server to another server on the same core switch, then keep testing devices near to far from the fog server to see if you can establish a pattern. Right now it's not clear in my mind where the problem isn't.
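For the near-to-far pass, you could just loop the iperf client over a list of targets ordered by distance from the fog server; these hostnames are only placeholders:

# targets ordered nearest to farthest from the fog server (placeholder names)
for host in test-vm-same-host server-on-core-switch server-on-edge-switch; do
    echo "=== $host ==="
    iperf -c "$host" -t 10
done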
-
@george1421 I confirmed the unit of measurement is Mbits/sec.
Averages are now around 15 Mbits/sec.
-
@Wayne-Workman OK, wow that sucks. Now start testing your way from near systems to far systems to see if you can pinpoint where things go bad.
-
@Wayne-Workman
Just as a reference, my fog server on ESXi 6 to another (OLD Debian Jessie, including old unoptimised net devices) VM on the same host:

Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 10.1.110.100, TCP port 5001
TCP window size: 1.83 MByte (default)
------------------------------------------------------------
[  5] local 10.1.100.50 port 50312 connected with 10.1.110.100 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  7.35 GBytes  6.31 Gbits/sec
[  4] local 10.1.100.50 port 5001 connected with 10.1.110.100 port 40811
[  4]  0.0-10.0 sec  12.4 GBytes  10.6 Gbits/sec
The only thing I can suggest is removing the virtual network device from the guest, booting, re-adding the most current type, and configuring it.
I honestly can't see what a power outage would do other than possibly mess something up in the guest OS's VMX file. Do you have a backup to compare against?
These are the relevant lines from my vmx
ethernet0.virtualDev = "vmxnet3"
ethernet0.networkName = "VM Network VLAN100"
ethernet0.addressType = "generated"
ethernet0.uptCompatibility = "TRUE"
ethernet0.present = "TRUE"
ethernet0.pciSlotNumber = "192"
ethernet0.generatedAddress = "00:0c:29:XX:YY:ZZ"
ethernet0.generatedAddressOffset = "0"
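If you want to compare, a quick way to see what adapter type the fog server's .vmx declares (assuming SSH access to the ESXi host; the datastore path below is just an example, yours will differ):

# check the declared adapter type for the fog server's first NIC
grep -i "ethernet0.virtualDev" /vmfs/volumes/datastore1/fog-server/fog-server.vmx
# "e1000" is the emulated Intel NIC; with the VM powered off (and VMware Tools or
# open-vm-tools in the guest) you can change the value to "vmxnet3" for the paravirtual one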
-
@Mentaloid On Monday, after taking a new snapshot (just to have it), we’re going to apply an older snapshot and see if the problem is resolved or not.
The VMX stuff you posted: what file is that in on Debian, or is it in ESXi?
Earlier today, we removed the E1000 adapter, added an E1000E adapter, and rebooted. The OS didn't detect the adapter correctly and there was no network connectivity. I could have just missed a step; I did generate a new UUID for the old interface name (it's a Red Hat thing), but I am not entirely sure the new interface's name would be the same as the old one. The old interface name was ens32. I'll have to do more testing on this Monday.
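When we retry the adapter swap Monday, the plan is roughly this, just to see what name the new NIC comes up with and point the old config at it (ens32 is the old name; NEWNAME below is a placeholder for whatever the kernel picks):

# see what name the new adapter actually got (it may not be ens32 anymore)
ip link show

# if the name changed, clone the old config to the new name, then fix NAME= and DEVICE=
# (and HWADDR=, if set) inside it to match, and restart networking
cp /etc/sysconfig/network-scripts/ifcfg-ens32 /etc/sysconfig/network-scripts/ifcfg-NEWNAME
systemctl restart network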
Of course - any and all advice or questions are welcome. I need all the help I can get.
-
@Wayne-Workman
.VMX is the ESXi config file for the guest OS; it is normally stored alongside your virtual HD on your SAN. If you can't edit/view the file (plain text) directly on your SAN, you can view it via the vSphere client/web interface: edit the powered-off VM, go to the Options tab, then Advanced/General, and hit the button for configuration parameters. Be careful in there!
CentOS, I would imagine, supports vmxnet adapters… if your guest OS supports them, they are more efficient/faster than e1000/e1000e emulation. I think this guide should help you get that running…
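Once you're on vmxnet3, a quick sanity check in the guest that the paravirtual driver actually took (ens32 is just an example interface name):

# confirm the vmxnet3 kernel module is loaded
lsmod | grep vmxnet3
# confirm the interface is bound to the vmxnet3 driver (swap in your interface name)
ethtool -i ens32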
Back to an earlier point though: have you confirmed that GuestOS/VM to another GuestOS/VM on the same VMHost is not performing at the correct speed?
If VM to VM is good (meaning the VMHost's virtual switch and virtual adapters are working internally), I'd look at the LAG/bond for your uplink. I've had ESXi puke and start dropping packets on a LAG before, and I've also had switches with good "links" and no frame errors that weren't passing data in one direction after a power loss. This can be frustrating to troubleshoot on LAG/bonded links.
Pull all but one of the LAG wires (admin down from the switch isn't good enough; you physically have to break the link for ESXi to figure out that it shouldn't use the port for data), and verify. If that one is good, pull it and try another; keep going through and verify each port is functioning on its own. Of course, you need to make sure you're using a known-good port on the switch for your test machine!
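Before you start pulling cables, it might also be worth checking what ESXi itself thinks of the uplinks (assuming shell/SSH access to the host; vmnic0 is just an example uplink name):

# physical NICs with the link state, speed, and duplex ESXi sees
esxcli network nic list
# per-NIC counters, including errors, for one uplink
esxcli network nic stats get -n vmnic0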
-
Something just struck me: did you have any pending VM host updates waiting on a reboot? ESXi executes out of memory, so it's possible to update the system files for ESXi and have those updates only applied upon reboot. Not saying this is the case, but it could explain why things are acting a bit strange after an entire system restart.
-
So, we found out what it was.
A security camera contractor had damaged the fiber line that the VMware platform used.
He fixed it Friday morning. I’m guessing he told our network team what happened sometime Friday. After that, everything was fine.