Odd performance issue

entr0py

Devs,

This summer I transitioned our FOG server from being a guest on a much older and slower Hyper-V platform to more modern hardware. Along the way we went from Ubuntu 16.04 to 22.04, among other changes.

Now we’re in the thick of things, getting ready to deploy hundreds of devices per day, and I’m seeing very strange behavior when it comes to image throughput. The first device will image at a full gig, the second will take an additional gig, and so on. From there it gets worse, and eventually everything basically freezes. None of the underlying network infrastructure has changed apart from the migration from 10 Gig to 40 Gig on the server itself: still into the same switch, still across the same 10 gig to my lab, etc.

The server itself has plenty of horsepower and I can get 40 gig between it and other devices on the 40G backbone. Even running multiple copies or transfers at the same time I don’t see the performance issues. The images are coming off SSD so bandwidth there shouldn’t be an issue either. Load averages are low, and top shows nfsd taking 6-8% of the CPU per thread (16 cores are allocated to the FOG guest from a 128 core host). I’m perplexed as to why I’m seeing this odd behavior. Even if I just transfer a file from /images via NFS to my workstation I can pull 10G (all I’ve got for a NIC).

Please see the screenshot below and let me know what more you need for diagnostic info. Thanks!

https://imgur.com/a/zZk5ppA

george1421

@entr0py You didn’t happen to mention if the FOG server is now virtualized or physical.

In doing some benchmark testing back in the day, I was able to saturate a 1GbE link with 3 simultaneous unicast imaging sessions. Since you’re talking about 10 and 40 GbE this point may not be relevant, but 3 is when things start falling down in your environment. It’s not the solution, just one data point.

During imaging the FOG server doesn’t require much CPU. Its only function is to monitor the imaging process and move files from the storage subsystem to the network adapter. All of the heavy lifting during imaging is happening at the target computer. Heck I can run FOG on a raspberry pi 3 and image at almost a normal speed (one unicast image only). So I’m just saying, having a FOG server with gobs of RAM and 128 processors won’t really speed up the imaging process. It will help with multiple concurrent unicast imaging but it won’t make the process faster.

So there are two areas I would look into:

Disk subsystem
Network performance

I have this article from a long time ago that will give you commands to test your FOG server. https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast?_=1691688342623
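
For the disk side, a rough sequential read test is one way to see what the storage can feed to the network. This is only a minimal sketch and not necessarily what the linked article uses; the helper name disk_read_test and the example path are made up for illustration.

```shell
#!/bin/sh
# Sketch: sequentially read one file and discard the data, so that dd
# reports how fast the storage delivered it. dd prints its summary
# (bytes copied, elapsed time, rate) on stderr when it finishes.
# disk_read_test is a hypothetical helper name, not a FOG command.
disk_read_test() {
    file="$1"
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    dd if="$file" of=/dev/null bs=1M
}

# Example (placeholder path; pick a large file from one of your images):
#   disk_read_test /images/IMAGE_NAME/FILE
```

A second run against the same file is usually served from the page cache, so use a file larger than the server's RAM if you want to measure the disk rather than memory.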

If you put one of the target computers into debug mode, you can start up iperf3 (built into FOS Linux) as a server with iperf3 -s, and then run the performance tests from the FOG server. Let’s see how fast your FOG server can go.
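
The iperf3 test described above could look roughly like this. It is a sketch under assumptions: the helper name fog_iperf_test, the 30 second duration, the stream count, and the placeholder address are all illustrative choices, not something FOG prescribes.

```shell
#!/bin/sh
# On the target computer, booted into FOG debug mode, start the listener:
#     iperf3 -s
# Then run this function on the FOG server. fog_iperf_test is a
# hypothetical helper name; IPERF3 may be set to another iperf3 binary.
fog_iperf_test() {
    target="$1"                          # IP address of the target computer
    iperf="${IPERF3:-iperf3}"
    "$iperf" -c "$target" -t 30          # one stream, FOG server -> target
    "$iperf" -c "$target" -t 30 -P 4     # four parallel streams
    "$iperf" -c "$target" -t 30 -R       # reverse direction, target -> server
}

# Example (192.0.2.10 is a placeholder address):
#   fog_iperf_test 192.0.2.10
```

The first run matches the direction of an image deploy; comparing it with the parallel run gives a hint whether a single stream or the link itself is the limit.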

entr0py

@george1421

It is still virtualized; it was moved from a Server 2012 R2 Hyper-V host to a Server 2022 Hyper-V host. The prior storage subsystem was iSCSI across 10G to a shared storage server. The images themselves ran from a single SATA SSD.

I realize that it doesn’t take a workhorse to run FOG. The server before that was an old HP Microserver with a couple SSDs and a 10G nic. Even with 4 cores and 4 GB RAM that machine would saturate the 10G uplinks and push to clients at a full gig each.

I think the problem may have been found but I’ll keep testing and update. The server itself runs an Intel XL710 40GbE NIC and there is a driver function called Virtual Machine Queues and apparently some people have experienced issues with that. I’ll know in a few hours if things are better.

george1421

@entr0py OK, let’s see what you find. iperf3 will help you with bandwidth checking.

Yep on the server thing: for imaging, if you have a fast path from disk to network, that is all you need. The only thing that will put a heavy load on it is computers running the FOG client software. Depending on your check-in interval, that can put a heavy load on the FOG server and make it a bit slow to respond in the web UI. But that’s different from imaging.

DBCountMan

I tested 10Gbe FOG imaging on my Hyper-V server (from FOG Server VM to Client PC VM, both connected at 10Gbe via internal switch) and imaging speed was exactly the same as 1Gbe. The Hyper-V host has all SSD storage too. After some research, it seems that partclone (clonezilla) does not scale well with faster network connections. What I haven’t tested yet is imaging several 1Gbe workstations on a 10Gbe backbone to a 10Gbe FOG server. I just don’t have the infrastructure for it.

Tom Elliott

@brakcounty I don’t think it’s a “scaling” issue per se.

One of the things that plays into the speed is generally NOT the networking itself.

The components that go into the speed are the Data -> CPU -> Memory -> HD Speed.

So while it’s possible your network is transferring up to 10Gbps, the delay is down to how quickly your overall system can get, decompress, process, and write back to the drive.

DBCountMan

@Tom-Elliott said in Odd performance issue:

get, decompress, process, and write back to the drive.

Right, this is what I was getting at about partclone. I don’t think it is capable of performing those tasks at a rate that would saturate a 10Gbe channel. Assuming system components aren’t the bottleneck, the software and how it is coded becomes the determining factor.

Tom Elliott

@brakcounty

I think you’re mistaking things.

Networking is only a “component” of things. While we read as FIFO (First In, First Out), data is transferred in “chunks” that the rest of the system can process.

It’s not that it’s not possible, that’s not the issue. I don’t know how to explain it.

If we had a spot to “download the whole dataset” you would see the network completely saturated.

When we’re deploying to a system, though, where are you supposed to place that image file?

DBCountMan

@Tom-Elliott Based on my tests, partclone hasn’t worked faster on a 10Gbe link vs a 1Gbe link. This is on a 64-core hyper-v host with 768MB RAM and 12TB SSD storage.

On that note, does FOG use pigz or just regular gzip compression?

entr0py

Update - found a couple issues lurking.

First, I think if you’re going to run 40G, and maybe even 10G to an extent, you need to play with the queue length (txqueuelen) parameter on your ethernet interface. Raising that seemed to help things a bunch. I’d like to go to Jumbo Frames too but I’m not brave enough to make that leap yet.

Second, playing with the settings in the NIC on the host device made little difference, but playing with the ones on the actual Hyper-V switch did. Mainly some of the offloading and VLAN filtering settings.

Third, we weren’t ever going to hit more than 10G anyway. There was a VLAN misconfiguration, so some of the devices were actually routing back to the FOG server instead of being switched to it, and the router only has 10G, so yeah…
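
One quick way to spot a routed path like the one in the third point is to ask the kernel how it would reach a client. This is a sketch only: is_routed is a hypothetical helper name, the address is a placeholder, and it shows the FOG server's side of the path, so the return path from the client still has to be checked on the client or the switch.

```shell
#!/bin/sh
# `ip route get <addr>` prints "... via <gateway> ..." when the destination
# is reached through a router, and has no "via" when the destination is on
# a directly attached subnet. is_routed is a hypothetical helper name.
# Returns 0 = routed, 1 = on-link, 2 = lookup failed.
is_routed() {
    route=$("${IP:-ip}" route get "$1") || return 2
    case "$route" in
        *" via "*) return 0 ;;   # goes through a gateway
        *)         return 1 ;;   # on-link, switched only
    esac
}

# Example (placeholder address):
#   if is_routed 192.0.2.10; then echo "routed"; else echo "not routed"; fi
```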

Anyway, long story short, we were hitting 7-8 Gbit last night with 8-9 devices imaging. This morning I’m running a steady 3+ Gbit with 4 devices at a time. If my weekend goes right I’ll end up deploying some labs that have 10 devices per row, and each row has a 10G uplink back to the core switch, so we’ll see if it can scale beyond 10G. I should be able to hit it with a full 40G worth of requests at once.

https://imgur.com/LWoeOQC

DBCountMan

@entr0py said in Odd performance issue:

Second, playing with the settings in the NIC on the host device made little difference, but playing with the ones on the actual Hyper-V switch did. Mainly some of the offloading and VLAN filtering settings

Curious, which Hyper-V settings did you play with and find success with?

entr0py

@brakcounty

root@fogserver:~# ethtool eth0 | grep Speed
Speed: 40000Mb/s

Raising txqueuelen on the interface to 40K (seems like 1K was the default when 1 gig cards became kind of standard) got rid of the falling bandwidth issue that was present on the throughput graph yesterday. I’m wondering if it got a buffer overrun or some other kind of nonsense at the kernel level, and that was why it would run great for a while, then once it threw enough errors or whatever it started falling on its face? IDK.
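
For anyone wanting to try the txqueuelen change, here is a minimal sketch. The helper names are hypothetical, 40000 is just one reading of "40K", and the SYSFS_NET and IP variables exist only so the path and binary can be overridden.

```shell
#!/bin/sh
# Sketch: inspect and raise the transmit queue length of an interface.
# current_txqueuelen and raise_txqueuelen are hypothetical helper names.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the current queue length as exposed by the kernel in sysfs.
current_txqueuelen() {
    cat "$SYSFS_NET/$1/tx_queue_len"
}

# Raise the queue length only if it is currently lower (needs root).
raise_txqueuelen() {
    iface="$1"; want="$2"
    have=$(current_txqueuelen "$iface") || return 1
    if [ "$have" -lt "$want" ]; then
        "${IP:-ip}" link set dev "$iface" txqueuelen "$want"
    fi
}

# Example:
#   ip -s link show dev eth0     # shows qlen plus error and drop counters
#   raise_txqueuelen eth0 40000
```

A value set this way does not survive a reboot, so it has to be reapplied by whatever configures the interface at boot. The drop counters from ip -s link are one place to look for the kind of kernel-level overrun guessed at above.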

As for the Hyper-V settings, I was noticing odd CPU behavior on the Hyper-V host when I’d saturate the network from the FOG guest. Setting RSS to NUMA scaling instead of closest processor static got rid of that behavior.

txqueuelen made the biggest difference in getting it to a stable state, then the NIC settings increased the total throughput. A ton of other things were played with too, so I’m not exactly sure which “tweaks” helped, but those two were the largest factors I noticed.

george1421

@entr0py said in Odd performance issue:

We were hitting 7-8 Gbit last night with 8-9 devices imaging.

I would say this is fairly into the acceptable range on a well managed 10GbE network. Good job!!

entr0py

TL;DR,

If you’re seeing performance issues scaling a high-bandwidth FOG server that should have the hardware horsepower to make it work, check your NIC settings, especially when virtualized. The txqueuelen parameter in Linux can make a huge difference. If it’s virtualized, look into the settings on both the host and the guest.

You should be able to get >10Gbit from your FOG server if you have hardware that will support it. We hit ours with 20 devices this afternoon all pulling 47 GB images and the whole shebang was over in 7 minutes.
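
As a back-of-the-envelope check of those numbers, the average rate can be worked out like this. It assumes each device pulled the full 47 GB over the wire and that GB means 10^9 bytes; avg_gbit is a hypothetical helper name.

```shell
#!/bin/sh
# Average aggregate throughput in Gbit/s.
# Assumptions: every device transfers the full image size over the
# network, and 1 GB = 10^9 bytes. avg_gbit is a hypothetical helper name.
avg_gbit() {
    # $1 = number of devices, $2 = GB per device, $3 = minutes elapsed
    awk -v n="$1" -v gb="$2" -v min="$3" \
        'BEGIN { printf "%.1f\n", n * gb * 8 / (min * 60) }'
}

avg_gbit 20 47 7    # prints 17.9
```

That is roughly 18 Gbit/s averaged over the run, comfortably past 10G. If 47 GB is the deployed size rather than the compressed size stored on the server, the rate on the wire would be lower.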

Solved!
