Bonding multiple network cards for better throughput
-
I asked the question here in case FOG works better or worse with a particular setup. I’ve run FOG on Ubuntu for years, but I’ve read somewhere that it runs better on CentOS, so I was hoping to find someone who could say with confidence that a given combination will work, rather than me trying different things. I tried bonding in Ubuntu, but all it did was slow things down further. I tried all the different bonding types, and some failed to work at all.
-
@Zourous Depending on the bonding mode you select, you will see some benefit with FOG. Bonding is helpful if you typically have multiple unicast streams running at the same time, or on a large campus where you have a couple hundred FOG clients checking in.
The best mode to use is LACP (802.3ad). This requires both the host computer and the network switch to be configured for LACP (static or dynamic). The failover-style bonding modes don’t really help you aggregate bandwidth across multiple links.
I did some FOG benchmarking a while ago and found that I could saturate a 1GbE link with 3 unicast imaging streams. Beyond 3 unicast streams I was getting a lot of retransmits on the FOG server’s network link.
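To put the 3-stream figure in perspective, here is a rough sketch of how a 1GbE link’s raw capacity divides among simultaneous unicast streams. It assumes 1 Gb/s is roughly 125 MB/s and ignores protocol overhead and retransmits, so real numbers will be lower. The rate partclone displays can be higher than these figures, likely because FOG images are stored compressed, so treat this as wire-rate arithmetic only. The function name is illustrative.

```python
def per_stream_gb_per_min(link_gbps: float, streams: int) -> float:
    """Raw per-stream share of a link in GB/min (no protocol overhead)."""
    link_mb_per_s = link_gbps * 1000 / 8      # 1 Gb/s is roughly 125 MB/s
    per_stream_mb_per_s = link_mb_per_s / streams
    return per_stream_mb_per_s * 60 / 1000    # MB/s -> GB/min

# Compare 1 to 4 simultaneous streams on a single 1GbE link.
for n in range(1, 5):
    print(f"{n} stream(s): {per_stream_gb_per_min(1.0, n):.2f} GB/min each")
```

One stream gets the full 7.5GB/min of raw wire capacity, while three streams get about 2.5GB/min each.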
Before you go down the route of setting up bonding, are you having a specific issue with FOG imaging?
-
I’m just looking at maximum throughput, that is all, to speed up imaging overall when imaging many clients. Thanks
-
@Zourous You have not mentioned multicasting yet at all. Do you use this?
-
@Zourous said in Bonding multiple network cards for better throughput:
I’m just looking at maximum throughput, that is all, to speed up imaging overall when imaging many clients.
Don’t take this next statement as negative, because it’s not intended that way. What you are basically saying is equivalent to “I want to drive my car fast”. There is no measurable value in there. Do you want to drive 100kph or 200kph? There is a big difference in how you get there.
Let’s start with some basics.
- How many FOG clients do you expect when FOG is fully installed on your campus?
- How many simultaneous unicast installs do you need to make?
- How many systems do you expect to image a week?
- When you image a system today, what transfer rate does partclone list? This measurement needs to be taken about 1-2 minutes after the image push starts. For a well-designed network imaging a modern computer with an SSD, you should see a transfer rate of about 6GB/min after 2 minutes.
- What is the expected size of your images (as saved on the FOG server)? My golden image is about 25GB. Before we installed 10GbE in our network core, I was seeing push times of about 5 minutes. Now that our core network is on 10GbE, I’m seeing push times of just over 2 minutes to systems with NVMe disks.
- While we are talking about networking here, the disk subsystem on your FOG server also plays a role in simultaneous imaging. If your FOG server is a physical machine and your FOG storage is a single HDD, that disk will become a bottleneck once you start imaging more than one computer at a time. Ideally you would store your FOG images on a RAID array of HDDs, or on a single SSD or an SSD RAID array.
These are the kinds of data we need to understand before we can talk about going fast.
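As a quick sanity check on the image-size and push-time numbers in the list above, here is a small sketch that turns a completed push into an average rate, and a rate back into an expected push time. The function names are illustrative only, and the math assumes the rate holds steady for the whole push.

```python
def effective_rate(image_gb: float, push_minutes: float) -> float:
    """Average transfer rate in GB/min for a completed push."""
    return image_gb / push_minutes

def expected_push_minutes(image_gb: float, rate_gb_per_min: float) -> float:
    """Rough push time, assuming the rate holds for the whole image."""
    return image_gb / rate_gb_per_min

print(effective_rate(25, 5))   # 25GB golden image on the 1GbE core
print(effective_rate(25, 2))   # 10GbE core; an upper bound, the push took "just over" 2 min
print(round(expected_push_minutes(25, 6), 1))  # 25GB at the 6GB/min reference rate
```

The 25GB image works out to 5GB/min on the old core and up to 12.5GB/min on the 10GbE core.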
-
- About 1000
- (2 & 3) I’m at a school. Day to day, only one at a time when needed. In the summer, when we reimage all our student PCs, as many as it can handle in one go. I have had between 30 and 120 on the go at the same time, using slots of 10. During this time we just wait, so speeding this up is my goal.
- On one of our newest laptops it can get to around 9GB/min.
- About 35-50 GB
- It’s an old PC with a single HDD
My theory was that if I could double the network bandwidth with bonding I could get some faster deploy times which would result in less waiting around. I tried many times with Ubuntu but haven’t had any success yet.
-
@Zourous ok here are a few recommendations.
- For 1000 hosts and trying to image many at the same time, network bonding will help here. Also adjust your FOG client check-in time from 5 minutes (300 seconds) to 15 minutes (900 seconds). This will help with FOG server performance overall. Just realize that when you deploy a snapin, it may take up to 15 minutes before the client sees the request with the slower check-in time.
- 9GB/min is great performance for a 1GbE network going to an NVMe drive. Understand that network bonding will use only one link between any two hosts. Bonding does not split a single conversation over all the links to aggregate bandwidth; rather, it picks one link for all communication between two hosts based on the hashing algorithm used. This hashing algorithm doesn’t consider the current utilization of each link when deciding which link to pick. In its simplest form, the hash is based on the source and destination MAC addresses of the devices in the conversation. The hope is that the hash will be random enough to spread different host conversations across different links in the bond.
- Image size is reasonable. If you were talking 100GB+, then we would need to dig a bit deeper into the image requirements.
- I’m a bit surprised that you are getting 9GB/min with a FOG server using a single HDD. The sustained transfer rate of a typical SATA HDD is about 70MB/s, or about 4.2GB/min. Two simultaneous unicast image deployments will flat-out kill that hard drive’s performance. The first thing I would consider here (if using the same server is a requirement) is moving your /images directory to an SSD. With an SSD you don’t have the seek times you would with a traditional (spinning) HDD. This alone will help with multiple simultaneous unicast images. The FOG server doesn’t need to be a very powerful computer, since its roles are 1. moving the image from disk to the network, 2. managing the imaging process, and 3. responding to client check-ins. I’m not suggesting doing this in a production environment, but I have FOG running on a Raspberry Pi as a proof of concept. For single unicast imaging it does better than expected.
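To illustrate the hashing point about bonding, here is a simplified model of a layer2-style transmit hash: XOR the last byte of the source and destination MAC addresses and take the result modulo the number of links. This only approximates the idea behind the Linux bonding driver’s `layer2` xmit_hash_policy; it is not the exact kernel code, and all MAC addresses and function names here are hypothetical. It shows that a given server/client pair always lands on the same link and that link load is never consulted.

```python
def last_mac_byte(mac: str) -> int:
    """Return the final octet of a colon-separated MAC address as an int."""
    return int(mac.split(":")[-1], 16)

def pick_link(src_mac: str, dst_mac: str, link_count: int) -> int:
    """Simplified layer2-style hash: depends only on the two MACs, never on load."""
    return (last_mac_byte(src_mac) ^ last_mac_byte(dst_mac)) % link_count

server = "00:11:22:33:44:10"  # hypothetical FOG server MAC
clients = [f"00:aa:bb:cc:dd:{i:02x}" for i in range(1, 7)]  # hypothetical client MACs

# Group clients by the bond member their traffic would use.
links = {0: [], 1: []}
for client in clients:
    links[pick_link(server, client, 2)].append(client)

for link, members in links.items():
    print(f"link {link}: {len(members)} client(s)")
```

Run as-is, the six hypothetical clients split 3/3 across two links, but each individual client is still capped at one link’s speed. That is why bonding helps many simultaneous clients rather than a single deployment.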
-
Why not use multicast?!
-
@george1421 another option is to add FOG storage nodes to open up more transmit slots at a time.
-
Thanks for the info so far.
We don’t use the multi-casting as it hasn’t worked too well for us. In previous Fog versions (haven’t tried it recently) the clients wouldn’t start imaging, even though I have had it working before on specific versions. Secondly when it was working it was flooding the network and slowing general users down, so for this reason I stuck to unicast.
In anyone’s opinion, is there a best OS where bonding will most likely work without issues? If you search the net for Ubuntu and bonding, there are various people who can’t get it working, so I’m not sure if it’s a bug or just a misconfiguration. I’ve spent hours trying to get it working with no success.
-
@Zourous I’ve used network bonding on both Ubuntu-based and RHEL-based systems without issue.
Do you have an Ubuntu-based FOG server now?
Is it a physical server?
Can you make changes to your network switch configuration?
What switch manufacturer do you use?
-
@Zourous said in Bonding multiple network cards for better throughput:
We don’t use the multi-casting as it hasn’t worked too well for us. In previous Fog versions (haven’t tried it recently) the clients wouldn’t start imaging, even though I have had it working before on specific versions.
Multicast has not changed much over the years. Most probably it’s a switch/router/network configuration thing that prevents it from working properly.
Secondly when it was working it was flooding the network and slowing general users down, so for this reason I stuck to unicast.
I can see why you are headed this way, because unicast just seems simpler and works in most cases, whatever your network looks like. If multicast is set up correctly there shouldn’t be a flood of packets, as it sends far less data over your network than unicast does! For unicasting 20 clients you have 20 x (image size + NFS/TCP/IP/Ethernet header data), while multicasting the same 20 clients you have 1 x (image size + UDP/IP/Ethernet header data) x a 1.01 to 2.0 factor for re-transmitted packets that were lost by one of the 20 clients => heaps less data the more clients you cast at the same time.
At my old workplace we sometimes multicast 5 labs with 120 PCs altogether in one big batch. It took 10% more time than doing a single lab, but we were done with all 5 labs in one go.
I know multicast seems too complex, but I don’t think it really is. Start out with 3-5 clients that are all connected to the same switch and see if you can get it to work. Your FOG server is probably connected to a different switch, or maybe even a different subnet, and you might need to play with IGMP settings and so on. But as soon as you master multicast, you’ll never need to worry about deploying your 1000 clients again. It should be possible to do within one or two days.
Just my 50 cents. To me, unicast just seems to be the wrong “screwdriver” for your situation. Maybe I am wrong (I don’t know your exact network setup).
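The unicast vs multicast comparison in that paragraph can be sketched in a few lines. This version ignores header overhead entirely and uses a 40GB image as an example (from the 35-50GB range mentioned earlier in the thread); the retransmit factors are the 1.01 and 2.0 bounds given above, and the function names are illustrative.

```python
def unicast_gb(image_gb: float, clients: int) -> float:
    """Every client pulls its own copy of the image."""
    return clients * image_gb

def multicast_gb(image_gb: float, retransmit_factor: float) -> float:
    """One shared copy, inflated by packets re-sent for clients that missed them."""
    return image_gb * retransmit_factor

image_gb, clients = 40, 20  # example values, not measured data
print("unicast:", unicast_gb(image_gb, clients), "GB")
for factor in (1.01, 2.0):
    print(f"multicast x{factor}: {multicast_gb(image_gb, factor):.1f} GB")
```

For 20 clients that is 800GB of image data over unicast versus roughly 40-80GB over multicast, and the gap widens as more clients join the session.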
-
Yes, Yes, Maybe & Dell
-
@Zourous Sorry, I should have asked this in the last round: does your FOG server currently have multiple NICs?
Dell switch, what model?