Can't PXE boot properly once MTU is set to anything over 1500
-
Hello! I recently considered increasing the MTU size in my network to see if it would reduce imaging times.
As of right now my tests are being done in a hypervisor to see if it would be worth it.
My NIC port is set to an MTU of 8972.
The bridge for the VM that hosts fog and the test VM is also set to an MTU of 8972.
For the OS running fog, I set the MTU to the same value using this command:
sudo ifconfig ens18 mtu 8972
But when I try to PXE boot with all these settings in place, I am met with this: [screenshot]
When I set the bridge MTU back to 1500, I am able to PXE boot properly and get to the fog menu with no issues.
All of my PXE/fog-related files are hosted on 192.168.2.43.
My router’s MTU is also set to 8972.
So far this issue is only affecting my VMs. When I PXE boot my Supermicro system it gets to the fog menu with no issues, but chances are its MTU is set to 1500 by default.
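One way to check that every hop between two hosts really passes the larger frames is a don't-fragment ping sized to the MTU. Below is a minimal sketch, assuming IPv4 and the Linux iputils `ping` (its `-M do`, `-c` and `-s` options). The helper names are made up for illustration, and the address is the server from the post. The script only builds the command; it does not run it.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(host: str, mtu: int, count: int = 3) -> str:
    """Build (not run) a don't-fragment ping command for Linux iputils ping."""
    return f"ping -M do -c {count} -s {max_icmp_payload(mtu)} {host}"


if __name__ == "__main__":
    # Compare the jumbo setting from the post with the standard 1500.
    print(df_ping_command("192.168.2.43", 8972))
    print(df_ping_command("192.168.2.43", 1500))
```

If the ping built for 1500 succeeds while the one built for 8972 fails, some device on the path (NIC, bridge, switch or router) is still using a smaller MTU.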
-
@45lightrain Firstly, let me say I have no experience with increasing the MTU with FOG, or with any benefits it might bring. Where it’s failing is in iPXE. For some reason iPXE can’t take the larger MTU. The second challenge is the FOS Engine (bzImage); you will need to set the MTU in there too. The FOS Linux engine is where the rubber really meets the road with MTU changes.
You could create a USB boot drive and boot directly into FOS. That would reduce the number of other things you need to address, just to see if changing the MTU would help.
I think you will be opening up a lot of subtle issues in your network by upping the MTU, regardless of whether it helps with imaging or not. I would think it should help, but I’m unsure to what degree.
You say you are doing this to help with imaging performance. What numbers are you seeing during the partclone part of imaging? On a well-managed 1GbE network you should be seeing 5.7 to 6.1GB/min. If you have 10GbE in your network core and the fog server running on a healthy VM host server, with 1GbE to the desktop, you should see somewhere in the 12-15GB/min range.
What numbers are you seeing and under what conditions?
-
@george1421
Ah, that’s unfortunate.
Thank you for informing me of this limitation with iPXE.
When writing /dev/sda I am seeing a range of 5.7GB/min to 6.1GB/min.
But early in the process, when it’s doing the other partitions, it’s at ~16GB/min. I was hoping to have this rate be consistent across all partitions.
This is with all 1Gb Ethernet connections.
Have there been tests done with RAID arrays?
Would hosting this on RAID 0 paired with 10Gb connections on the nodes decrease imaging times by a sizeable amount?
-
@45lightrain OK, so let’s start with some basics.
A 1GbE link (under theoretical conditions) carries 1 gigabit per second, or 125MB/sec, or 7.5GB/min of raw data. Understand that there is Ethernet overhead, so you will never achieve 7.5GB/min.
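The arithmetic can be sketched as follows, taking 1GbE as 10^9 bits per second. The overhead figures are assumptions for illustration: standard Ethernet framing without VLAN tags, and TCP over IPv4 with no header options. The function names are hypothetical, and the model ignores everything else (retransmits, NFS protocol overhead, and so on).

```python
LINK_BITS_PER_SEC = 1_000_000_000  # 1GbE line rate

# Bytes each frame occupies on the wire outside the IP packet:
# preamble + start delimiter (8), Ethernet header (14), FCS (4), inter-frame gap (12).
WIRE_OVERHEAD = 8 + 14 + 4 + 12
# Bytes inside the MTU that are not payload: IPv4 header (20) + TCP header (20).
IP_TCP_HEADERS = 20 + 20


def raw_gb_per_min(bits_per_sec: int = LINK_BITS_PER_SEC) -> float:
    """Raw link capacity in decimal GB per minute."""
    return bits_per_sec / 8 * 60 / 1e9


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries TCP payload at a given MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + WIRE_OVERHEAD)


def usable_gb_per_min(mtu: int) -> float:
    """Best-case TCP payload rate in GB per minute at a given MTU."""
    return raw_gb_per_min() * payload_efficiency(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, round(payload_efficiency(mtu), 4), round(usable_gb_per_min(mtu), 2))
```

Under these assumptions the payload efficiency is about 95% at an MTU of 1500 and about 99% at 9000, so jumbo frames can recover at most roughly 4% of throughput on the wire.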
So how is it possible to see speeds above 7.5GB/min on a 1GbE link? Simply data compression. What you are seeing on the partclone screen is a composite speed that includes the fog server sending the data to the network, the network transfer time, and the client receiving the image, expanding it in memory, and then writing it to disk.
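As a rough illustration of the compression effect, the sketch below estimates the smallest compression ratio that would explain a displayed rate above the wire limit. It assumes the displayed rate counts uncompressed data and that the network is the only limit; the function name is hypothetical.

```python
RAW_1GBE_GB_PER_MIN = 7.5  # theoretical 1GbE ceiling quoted in the post


def min_compression_ratio(displayed_gb_per_min: float,
                          wire_gb_per_min: float = RAW_1GBE_GB_PER_MIN) -> float:
    """Smallest uncompressed/compressed ratio that explains the displayed rate."""
    return displayed_gb_per_min / wire_gb_per_min


if __name__ == "__main__":
    # The two rates reported earlier in the thread.
    for rate in (16.0, 6.1):
        print(rate, round(min_compression_ratio(rate), 2))
```

A result above 1 means the data on that partition must have been compressed by at least that factor. A result below 1, as with 6.1GB/min, says nothing about compression: the link alone could have carried that rate.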
If you are getting 5.5-6.1GB/min in partclone on a pure 1GbE network, your fog environment is well designed and your network is well managed.
I wrote an article a few years ago that has some benchmark tools you can use to see where you can get additional speed out of your setup. https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast
So the executive summary, if you want to go fast:
- Install at least (1) 10GbE network link. (If you have many computers running the fog clients, run (2) 10GbE links in a LAG configuration.)
- If you have many clients hitting your fog server while you are trying to image, use SSD or NVMe drives on your fog server. (I would spend money here last; typically it’s not the disk that is slow, unless you have just a single-spindle HDD driving your fog server.)
- Try to get the 10GbE network as close to the target computers as you can.
- If you are trying to image multiple target computers at the same time look into fog casting your image to the target computers.
- When capturing your image use the zstd compression tool over gzip. Set zstd at compression 11 to start. If your target computers have a lot of horsepower, 16GB of ram, and fast nvme disks you can get more data through your network by compressing the data more. This will put a heavier load on the target computer expanding the image and writing it to disk.
Think of your imaging as a three-factor triangle. You have the server’s speed in getting the image to the network adapter, the speed at which your network can move the image to the target computer, and finally the time it takes the target computer to take in the image, expand it in memory, and write it to disk. Of the three, the fog server typically has the least impact on imaging.
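The triangle can be pictured as a simple bottleneck model: the slowest of the three stages sets the imaging rate. This is only a sketch under simplifying assumptions (the stages overlap perfectly, the server and network move compressed data, the client writes uncompressed data). The class, field names and numbers are hypothetical, not FOG settings or measurements.

```python
from dataclasses import dataclass


@dataclass
class ImagingPipeline:
    server_gb_per_min: float   # compressed data the server can read and send
    network_gb_per_min: float  # compressed data the network can carry
    client_gb_per_min: float   # uncompressed data the client can expand and write
    compression_ratio: float   # uncompressed size / compressed size

    def stage_limits(self) -> dict:
        """Each stage's limit, expressed in uncompressed GB/min."""
        return {
            "server": self.server_gb_per_min * self.compression_ratio,
            "network": self.network_gb_per_min * self.compression_ratio,
            "client": self.client_gb_per_min,
        }

    def bottleneck(self) -> str:
        """Name of the slowest stage."""
        limits = self.stage_limits()
        return min(limits, key=limits.get)

    def imaging_rate(self) -> float:
        """Overall rate in uncompressed GB/min, set by the slowest stage."""
        return min(self.stage_limits().values())


if __name__ == "__main__":
    # Made-up numbers: the same server and client on a 1GbE and a 10GbE link.
    for network in (7.1, 71.0):
        p = ImagingPipeline(45.0, network, 20.0, 1.5)
        print(network, p.bottleneck(), round(p.imaging_rate(), 2))
```

With these made-up numbers the network is the limit on 1GbE, and once it is upgraded the limit moves to the client. That is why faster links help only up to what the target computer can decompress and write.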