[Seeking Volunteers] Bench Testing! Our trip to the best results!
-
@Tom-Elliott said in [Seeking Volunteers] Bench Testing! Our trip to the best results!:
I’ll bet if you Unicast to 5 machines and Multicast to the same 5 machines, you’ll see a semi-evening out of the speed rates.
Only citing for reference: During my testing 3 unicast images would saturate a 1GbE link. So 5 would surely show the difference between multicasting and unicasting.
@Mokerhamer You also need to remember two things about multicasting.
- The speed of the multicast is controlled by the speed of the slowest computer in the multicast group. If you have a computer that is slow to check in for the “next block”, that will impact the whole group.
- Multicasting is a different technology than unicasting. Multicasting relies on how efficiently your network switches handle multicast packets, whether you have IGMP snooping enabled, and whether your switches run PIM in sparse or dense mode. (See the rough udpcast sketch just below.)
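For reference, FOG drives its multicast sessions with udp-sender/udp-receiver from the udpcast package. A rough sketch of what a session looks like under the hood (the interface, port, paths and receiver count below are made up for illustration; the real commands FOG generates also add its compression pipeline):
# on the server: wait for 5 receivers (or up to 60 seconds), then start streaming
udp-sender --interface eth0 --min-receivers 5 --max-wait 60 --portbase 9000 --file /images/win10/d1p2.img
# on each client: write the incoming stream to disk
udp-receiver --interface eth0 --portbase 9000 --file /dev/sda2
# the sender paces itself on the acknowledgements coming back from every
# receiver, so one client that is slow to ask for the next block holds
# back the entire group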
Please also understand that no one here is discouraging your testing. It is and will be helpful to future FOG admins. It’s an interesting topic, which is why your thread is getting so much focus. Well done!!
-
You knocked it right on the head with the multicast details; it took a few tries to get all the details configured. We’re now thinking about setting up a 10GbE network and running the exact same tests. Just curious… what speed would we reach, especially with all the variables in play?
This is pure trial and error, finding the limits: fail countless times and keep seeking answers. We’re using something new with a very high compression ratio and I find there is a limited pool of information about it, so I am extra curious about pushing limits with this.
In my eyes these trials can make or break a future plan for our classroom hardware architecture.
-
@Mokerhamer said in [Seeking Volunteers] Bench Testing! Our trip to the best results!:
We’re now thinking about setting up a 10GbE network and running the exact same tests.
What we have here is a three-legged stool. On one leg we have CPU+memory, on the next leg we have the disk subsystem, and on the final leg we have networking. It’s always a challenge to see which way the stool will tip.
If you look a bit back in time at the target computers, the disk subsystems were the bottleneck; they were in the range of 40-90 MB/s. The CPU+memory has been fine speed-wise for many years, and the networking had plenty of bandwidth.
Now look at today: we still have primarily a 1GbE networking infrastructure to the desk, NVMe disks that can write upwards of 700 MB/s, and fast and WIDE CPUs (multiple cores). Now the bottleneck is the network. It just can’t pump the bits down the wire fast enough to keep both the CPU and disk busy.
Moving to 10GbE will make it interesting to see which leg fails next. With 10GbE you have a maximum throughput of about 1,250 MB/s. On a clear network you “should” be able to saturate that disk subsystem again, assuming the CPU+memory can keep up with the network card and expand the data stream fast enough.
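If you want to measure two of the three legs independently once the 10GbE link is up, something like this would do from a FOS debug session (assuming iperf3 and a full GNU dd are available in the test environment; the device name, server address and sizes are only examples, and the dd line overwrites the start of the target disk):
# network leg: raw TCP throughput between the FOG server and the target
iperf3 -s                               # run on the FOG server
iperf3 -c <fog-server-ip> -P 4 -t 30    # run on the target, 4 parallel streams
# disk leg: sequential write speed with the page cache bypassed (destructive!)
dd if=/dev/zero of=/dev/nvme0n1 bs=1M count=4096 oflag=direct status=progress
# whichever number comes in well under ~1,100-1,200 MB/s (10GbE after protocol
# overhead) is the leg most likely to tip the stool next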
Make sure that when you get it sorted out you share what you changed and what you found, so others can see the improvements you’ve made.
-
We’re having difficulties with the 10GbE network card on the client.
We’ve fully disabled the onboard NIC on the system (BIOS).
The system boots PXE (TFTP/HTTP)… but when it wants to mount FOG it suddenly says there is no DHCP on the enp12s0 NIC, like it’s expecting to receive DHCP on the onboard NIC. Didn’t expect that…
-
@Mokerhamer said in [Seeking Volunteers] Bench Testing! Our trip to the best results!:
but when it wants to mount FOG
Let’s just be sure I understand correctly.
You can PXE boot into the FOG iPXE menu. When you select something like full registration or pick imaging, both bzImage and init.xz are transferred to the target computer. The target computer then starts FOS Linux, but during the boot of FOS you get to a point where it can’t get an IP address or contact the FOG server; it tries 3 times then gives up? Is that where it’s failing?
-
Yes!
-
@Mokerhamer Something happened with the picture upload. You need to wait until the image appears in the right edit panel before submitting your post.
OK it sounds like FOS Linux doesn’t have the driver for your network adapter.
Let’s start out by having you schedule a debug deploy/capture for this target computer. When you schedule the task, tick the debug checkbox before you press the schedule task button.
PXE boot the target computer; after several screens of text, which you clear by pressing the enter key, you should be dropped to the FOS Linux command prompt.
At the FOS Linux command prompt key in the following and post the screenshots here.
ip link show
lspci -nn|grep -i net
Also what model of 10G adapter are you using?
-
@george1421 said in [Seeking Volunteers] Bench Testing! Our trip to the best results!:
When you schedule the task tick the debug checkbox before you press the schedule task button.
*Doing it now (debug).
*NIC: X550T1BLK
https://www.kommago.nl/intel-x550-t1-10-gigabit-netwerk-adapter/pid=51799 -
@Mokerhamer If you have this NIC in a running Windows box, will you get the hardware ID of it? Or, from FOS Linux, run the lspci command I outlined above. I’ll look it up to see if Linux supports that card.
The 10G stuff is new and may not be enabled in FOS Linux. Having the hardware ID will help (e.g. 8086:1AF2; a made-up number, but that is the format I’m looking for).
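In case it helps, a quick way to pull just that ID from the FOS Linux debug prompt (the bracketed vendor:device pair at the end of the line is what I’m after; the ID in the comment below is made up, like the one above):
lspci -nn | grep -iE 'ethernet|network'
# example output line (illustrative only):
#   0b:00.0 Ethernet controller [0200]: Intel Corporation ... [8086:abcd]
# the [8086:abcd] part at the end is the hardware ID to post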
-
-
@Mokerhamer That card’s driver should be included with FOS Linux; it has been in the Linux kernel since 4.7. I checked, and it’s enabled in the FOS Linux build config: https://github.com/FOGProject/fos/blob/master/configs/kernelx64.config#L1447
From the FOS Linux command prompt key in
ip addr show
uname -a
and post the results
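If you want to double-check the driver from that same prompt, assuming the X550 is handled by the in-kernel ixgbe driver that the config line above points at (the sysfs path works regardless of how the NIC was named):
dmesg | grep -i ixgbe                    # driver probe and link messages, if any
ls -l /sys/class/net/*/device/driver     # shows which driver claimed each detected NIC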
-
Okay.
-
@Mokerhamer Well this is a good one. It should be working.
At the FOS Linux command prompt key in
/sbin/udhcpc -i enp11s0 --now
then do an
ip addr show
-
@george1421
Done -
@Mokerhamer OK, so at this point, time fixes your problem.
So if this were a 1GbE network, I would say the switch you are connected to has standard spanning tree enabled. Again, if this were a 1GbE network I would recommend that you enable one of the fast spanning tree protocols like RSTP, MSTP, or fast-STP. I don’t know if that translates to a 10GbE switch or not. I know on our hybrid 100/1000/10000 switch I have MSTP enabled because we have multiple STP zones.
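One crude way to see that delay from the FOS side, reusing the interface name from your earlier output (the -t/-T values are just generous retry settings so udhcpc doesn’t give up before the port opens):
ip link set enp11s0 down
ip link set enp11s0 up
date
/sbin/udhcpc -i enp11s0 -t 20 -T 3 --now
date
# the gap between the two date stamps is roughly how long the switch port
# blocked traffic before it started forwarding; with standard STP expect tens
# of seconds, with a fast/edge-port setup it should be nearly immediate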
-
Reboot, or is there a command to start the cast? Checking out our switches meanwhile.
-
@Mokerhamer You are probably better off cancelling the task on the FOG server and then rebooting. If you were unicasting and wanted to single-step through deployment you would enter
fog
at the FOS Linux command prompt. You may be able to do that with a multicast, but I’ve never tried. -
Resolved. Spanning-tree portfast was not enabled on the switch port that was directly attached to the system.
Enabling spanning-tree portfast on the port solved it. What 10GbE switches are you using? I might purchase the same ones (we’re looking for a 10GbE switch to test deployment with).
-
@Mokerhamer I wanted to clear up my first reply. I’m sorry, I didn’t want to seem so negative and I didn’t fully understand what this encompassed. I apologize if I came off as a meanie. I now see you have really good ideas/plans (and the equipment to back it up). Thank you for supporting the community through awareness and testing. I wish you the best for your trials and await the results!
-
@Mokerhamer said in [Seeking Volunteers] Bench Testing! Our trip to the best results!:
What 10GbE switches are you using?
It’s older kit: a ProCurve 5412zl as our core switch and inside the data center.
That’s great that it was something simple like spanning tree. The issue with standard spanning tree is that it doesn’t start forwarding data for 27 seconds once the link is established. During the PXE booting process the link “winks” two times: the first as iPXE takes over from the PXE ROM, and the second when FOS Linux takes over from iPXE. FOS Linux boots so fast that by the time the port starts forwarding data, FOS Linux has already given up.
Standard STP listens for a BPDU and then forwards. Fast-STP forwards first and then listens for the BPDU.