100mb/s speed limit, please help
-
So we’ve been experiencing an issue where the maximum speed (both download and upload) while imaging is capped at 100mb/s, when we should be able to get near 1000mb/s like we do with any other file transfer. All our infrastructure is sound and gigabit capable. After a little research we discovered that the Netgear smart switches we were using beforehand gave several people speed issues and fluctuations, so we swapped those out for smaller regular (unmanaged) gigabit Cisco switches, but that didn’t solve anything. Is there some obvious bandwidth limit config that we’re missing, or is it deeper? This only happens when imaging with FOG; any FTP test transfers are very near gigabit speeds. Additionally, when imaging more than one computer at a time it splits that 100mb/s semi-evenly among those computers.
-
Let’s start with the basics:
- What version of FOG are you using? (1.2.0 stable, or a trunk version (pre-1.3.0)?)
- What OS is the FOG server running?
- Let’s just make sure we know the unit of measure. You are telling me Mb/s [megabits] and not MB/s [megabytes], right? (The reason I ask is that 1000Mb/s is 125MB/s, so if you are seeing 100MB/s then that is reasonable.)
- Is your FOG server a physical box or is it virtual?
- Have you confirmed that the FOG server is connected via a GbE link? (I would assume so, because each host is seeing 100mb/s according to your post. The ethtool check below is a quick way to confirm.)
- If you are able to get multiple hosts to image at 100mb/s at the same time, then you don’t have a 100mb/s switch somewhere in the path.
To my knowledge there is no throttling for image creation, only for image replication to other storage nodes.
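For the GbE link question, a quick way to see what the server’s NIC actually negotiated is ethtool (the interface name here is just a guess; swap in yours):

```bash
# on the FOG server, check what the NIC negotiated with the switch
# (replace eth0 with your server's interface name)
ethtool eth0 | grep -E 'Speed|Duplex'
# you want to see:  Speed: 1000Mb/s   Duplex: Full
# a reading of 100Mb/s here would point at a cable/port problem rather than FOG
```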
-
I’d say do an iPerf test between the FOG server and a desktop connected to the same switch. See what happens.
Here are instructions on that: https://iperf.fr/
Let us know the results of the TCP speed test from client to server.
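If it helps, a minimal run looks something like this (iperf3 shown; the classic iperf syntax is nearly identical, and the server IP is just a placeholder):

```bash
# on the FOG server: start an iperf3 server
iperf3 -s

# on a desktop connected to the same switch: run a 10-second TCP test against it
# (replace 192.168.1.10 with your FOG server's IP)
iperf3 -c 192.168.1.10 -t 10

# a healthy GbE link should report somewhere in the ~900+ Mbits/sec range
```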
-
Hi,
Sorry for the lack of information. I will lay things out a bit better here for you.
- We are using FOG 1.2.0
- The OS is Debian 7.9 Wheezy
- We are stuck at 100 megabits for imaging. Imaging is capped at around 680 megabytes per minute both ways, which works out to about 90 Mb/s (quick conversion below). Adding more clients to image splits this number between them.
- The FOG server is a physical desktop machine.
- The FOG server is connected with a CAT 5e cable to a gigabit switch.
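For reference, the unit math behind that number (plain shell arithmetic, rounded):

```bash
# 680 MB/min -> MB/s -> Mb/s
echo "680 / 60 * 8" | bc -l   # ≈ 90.7 Mb/s, right at the classic Fast Ethernet ceiling
```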
I tried the iPerf test before, and confirmed that I am capable of getting gigabit (1000Mbps) speeds across the network that I am using. I have also connected to the share where the images are hosted and copied them to another computer. I was able to do this at 75-100MB/s, which is most definitely in the gigabit speed range.
The next test I tried was FTP. I was able to FTP an image at around 50-80MB/s. It is a little slower, but still completely reasonable, and much faster than imaging speeds.
All of the clients I am trying to image are connected to a gigabit switch with a Cat5e cable. The server is on the same gigabit switch.
-
There is a new setting called bitrate for storage nodes in the trunk version of FOG that may be missing a zero (I do not mean replication bandwidth).
-
-
@Quazz said:
There is a new setting called bitrate for storage nodes in the trunk version of FOG that may be missing a zero (I do not mean replication bandwidth).
First, the simple answer: the bitrate is only for moving files/images between the master node and a storage node, not between the FOG server and the target computers.
-
@stowtwe Thank you for the details. I see that you are using the 1.2.0 stable version and not a trunk (pre-1.3.0) version. Is that correct?
We may need to get one of the devs to verify, but image deployment (partclone) should use NFS to move files to the target computers. Your testing with iperf and FTP shows that the computer is capable of transferring at wire speed (I would expect 50-80MB/s to be your cap because of internal computer restrictions; 125MB/s is the theoretical max for a GbE network).
So I wonder if there are inefficiencies with NFS on your version of Debian (sorry, RHEL person here). Isn’t Wheezy a bit old?
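If you want to take FOG itself out of the picture, you could test the raw NFS read path with something like this (rough sketch only; /images is the usual FOG export, and the image file name is just a placeholder):

```bash
# on a test Linux machine on the same switch, mount the FOG image store read-only
sudo mkdir -p /mnt/fogtest
sudo mount -o ro,nolock 192.168.1.10:/images /mnt/fogtest   # replace with your FOG server's IP

# time a raw sequential read of one of the image files; dd prints the throughput at the end
dd if=/mnt/fogtest/SomeImage/d1p2.img of=/dev/null bs=1M count=2048

sudo umount /mnt/fogtest
```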
-
@george1421 said:
@Quazz said:
There is a new setting called bitrate for storage nodes in the trunk version of FOG that may be missing a zero (I do not mean replication bandwidth).
First, the simple answer: the bitrate is only for moving files/images between the master node and a storage node, not between the FOG server and the target computers.
Wait, then what exactly is the difference between the replication bandwidth setting (which controls at what speeds images are allowed to replicate to other storage nodes) and the bitrate setting?
-
@george1421 Yes, I am on 1.2.0 stable. Would it be worthwhile to upgrade to trunk?
I will look into NFS. Currently the images are on a drive that is in the FOG server tower, and I have that set up as the master node. There are no other nodes. These are the settings.
The switch I am using is an unmanaged Cisco switch. It has no configuration at all. I do not have access to any of the managed switches on the network, so I am trying to work around them the best I can by not running stuff directly through them. I have proxy DHCP configured on the FOG server, among other things.
-
It’s probably your compression settings then.
-
@stowtwe I’m not sure the trunk version will help you with this one.
Looking at your max clients, I’m surprised that it is two. On the default install of 1.2.0 from yesterday this value is 10.
@Wayne-Workman from what I can see there isn’t a compression setting in 1.2.0 stable, so whatever the default value is, is what it is.
-
@george1421 Yes there is, in FOG Config -> FOG Settings -> FOG Boot Settings -> FOG_PIGZ_COMP
-
@Wayne-Workman mea culpa, I stand corrected
-
OK, on my clean build of 1.2.0 the compression setting is 9.
On my production instance that is on 1.2.0 trunk 5070 the compression is 6
On my dev instance that is on 1.2.0 trunk 5676 the compression is 6 (must be default) because I know I haven’t tweaked this system.
Changing this now shouldn’t have an impact, since the image that was already captured will be compressed at the higher value. Something I don’t know is where the decompression happens; I suspect the client computer. If that’s the case, the slower the computer (CPU-wise), the slower the transfer rate will be.
-
Compression and decompression happen on the host. The default compression value was changed from 9 to 6 because 6 is faster (in probably 99% of cases).
If the OP can change his compression to 6 and then re-upload the image, he might find that his image deploys much faster afterwards.
@ch3i posted a script that will change the compression of an image without deploying/capturing again. It does it all on the server. It’s in the forums somewhere around here. It needs to go in the wiki; somebody hashtag it if they find it.
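I don’t have the link handy, but the rough idea of that script is just a decompress/recompress pass with pigz on the server, something like this (a sketch only, not ch3i’s actual script; the file name is a placeholder and you’d want a backup first):

```bash
# run on the FOG server, inside the image's directory under /images
# re-pack one compressed image part at level 6 instead of 9
pigz -dc d1p2.img | pigz -6 > d1p2.img.new && mv d1p2.img.new d1p2.img
```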
-
I have an image with compression at 0 or 1 (I can’t recall what it was for sure), and an image with compression at 3, but both have about the same speeds. I have also tried compression 9 just to try the other side of things, and unsurprisingly, things were much slower.
Recapturing the image is not a huge deal for me, and I am open to try anything right now.
-
@stowtwe Try with it set to 6.
-
@stowtwe said:
I have an image with compression at 0 or 1 (I can’t recall what it was for sure), and an image with compression at 3, but both have about the same speeds. I have also tried compression 9 just to try the other side of things, and unsurprisingly, things were much slower.
Just to be clear here, an image with compression of 0 or 1 gave the same transfer speeds as a 3 or 9? I just want to make sure we are not chasing the wrong assumption.
-
Is the speed “limiting” on all systems? Have you attempted imaging in other locations?
From the sounds of things, things are working exactly as intended. Meaning: if there’s a 10/100 switch between any of the gigabit parts and you’re imaging to a system on the other side of that “middleman” switch, it would give you exactly this kind of behavior.
Speed isn’t always related to decompression/compression, though it does play into it. Uploads would be the most affected by compression, as the client machine is not only transferring the data but compressing it before sending it up the pipe.
I think we need to trace where the 10/100 link is coming from. It could even be as simple as an improper punch-down.