UNSOLVED FOG is very slow to server kernel and init image

  • Hello all,
    I am seeing incredibly slow network speeds when a host tries to fetch the kernel and init.xz from the FOG server (i.e., when it needs to be imaged or captured, when I select “Host Information”, etc.).
    The two files, roughly 25 MB, take a minute or two to transfer.
    I believe the cause of the low throughput is high latency: pings from the FOG server to the host take 5-15 seconds. TCP is very slow over high-latency networks.
    So here’s what is making this confusing for me:

    • I have used this FOG server with multiple other hosts, virtual and baremetal without this issue
    • This very high latency only occurs when trying to grab those two files. For example, if I select “Host Information”, grabbing the kernel and init image takes ages, but once I am booted into the Host Information environment, I can ping the FOG server with <0.5 s latency. The same is true if I am booted from disk. The latency of the host goes up 10x only when it is at the FOG splash screen and trying to grab files from FOG.
    • The file transfer speeds up significantly if I unplug the other network interfaces. This is an HPE server, with add-in cards that give us a total of 14 network interfaces, 6 of which are utilized. If I unplug everything but the single interface used to PXE boot, the transfer speed goes way up (still not as fast as our 10Gb network would allow, but still fast).

    So it's not an issue with the FOG server or the network. And somehow, having other network interfaces with link makes everything go much slower.

    Thank you all for any help 🙂
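
    For anyone following along, here is a rough back-of-the-envelope sketch of the figures in this post. It assumes 1 MB = 10^6 bytes and a hypothetical 64 KiB TCP window (the window size was not measured here), and the function names are made up for illustration.

```python
FILE_SIZE_MB = 25  # kernel + init.xz, approximate size quoted in the post


def effective_mbit_s(size_mb: float, seconds: float) -> float:
    """Average throughput in Mbit/s for a transfer of size_mb megabytes."""
    return size_mb * 8 / seconds


def window_limited_mbit_s(window_bytes: int, rtt_seconds: float) -> float:
    """Rough ceiling for one TCP connection: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


# "A minute or two" for the whole transfer.
for seconds in (60, 120):
    rate = effective_mbit_s(FILE_SIZE_MB, seconds)
    print(f"{seconds:>3} s transfer -> {rate:.2f} Mbit/s")

# ASSUMPTION: 64 KiB window, purely illustrative.
ASSUMED_WINDOW = 64 * 1024
for rtt in (0.5, 5.0, 15.0):
    ceiling = window_limited_mbit_s(ASSUMED_WINDOW, rtt)
    print(f"RTT {rtt:>4} s -> at most {ceiling:.3f} Mbit/s")
```

    The point is only that throughput scales with 1/RTT for a fixed window, so multi-second round trips would cripple any TCP transfer regardless of link speed.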

  • Senior Developer

    @pberberian said in FOG is very slow to server kernel and init image:

    I am using an Intel add-in card that provides 4x10G SFP interfaces.

    We need more details than that!

  • @sebastian-roth The host fails to boot at all with those files. It must chainload to iPXE first before it can boot (because of HPE’s crap PXE implementation).
    I am using an Intel add-in card that provides 4x10G SFP interfaces.

  • Senior Developer

    @pberberian Give undionly.kpxe and undionly.kkpxe a try. Those definitely use the more general UNDI stack instead of the possibly buggy native NIC driver.

    Also, as a starting point, you might tell us exactly which NIC is used in that machine.

  • @tom-elliott
    I did some more digging.
    I can access the iPXE command line by pressing escape at the FOG boot menu. Here’s what I did:

    • ping <fog_server_ip> --> Works fine, perfectly fast
    • imgfetch http://<fog_server>/fog/service/ipxe/init.xz --> Very very slow, manually cancelled
    • ping <fog_server_ip> --> times out. If I try to ping from the fog server, we have the ~5s latency again

    So this definitely seems to be an issue with the networking stack of the iPXE image. It just runs incredibly slow once you try to start a file download, even if the download is cancelled. Any idea how to fix this?
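
    To put a number on the comparison, a small script like the one below could time the same HTTP download from a host that is booted into Linux. This is only a sketch: the URL is passed on the command line (for example the init.xz URL used above), and `timed_download` and `mbit_per_s` are hypothetical helper names.

```python
import sys
import time
import urllib.request


def timed_download(url: str, chunk_size: int = 1 << 20, timeout: float = 30.0):
    """Stream url, discard the data, and return (bytes_read, seconds)."""
    start = time.monotonic()
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


def mbit_per_s(num_bytes: int, seconds: float) -> float:
    """Average throughput in Mbit/s (guards against a zero duration)."""
    return num_bytes * 8 / max(seconds, 1e-9) / 1e6


# Only runs when a URL is given, e.g.: python fetch_test.py <url>
if __name__ == "__main__" and len(sys.argv) > 1:
    size, elapsed = timed_download(sys.argv[1])
    print(f"{size} bytes in {elapsed:.2f} s = {mbit_per_s(size, elapsed):.1f} Mbit/s")
```

    If this is fast from Linux on the same port and cabling, that would support the idea that the slowdown is inside the iPXE environment rather than on the server or switches.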

  • @tom-elliott I actually go through 3 switches, with STP configured properly and everything.
    I know the network topology is good because everything works perfectly when the host is booted in Linux.

  • @pberberian Are you connected to a switch, or a hub? This difference is significant.

  • @tom-elliott I am running FOG 1.4.4.

    The HPE server grabs ipxe.pxe from the FOG server and boots from there.

  • @pberberian
    After more investigation, the network latency seems to be directly proportional to the number of network interfaces that are connected. With all six, there are ~5 seconds of latency. With half of them plugged in, it is closer to three seconds. With only one plugged in, it is still at 800 ms, which is faster for sure but still much slower than the 0.1 ms I get when booted in Linux.

    Why does the number of connected interfaces affect the latency?
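
    As a quick sanity check on "directly proportional", here is a least-squares line through the three rough data points above. The values are the approximate ones from this post (1 interface: 0.8 s, 3 interfaces: about 3 s, 6 interfaces: about 5 s), so treat the result as indicative only; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Approximate observations: connected interfaces -> ping latency in seconds.
interfaces = [1, 3, 6]
latency_s = [0.8, 3.0, 5.0]

slope, intercept = fit_line(interfaces, latency_s)
print(f"about {slope:.2f} s of extra latency per connected interface")
print(f"intercept about {intercept:.2f} s")
```

    With only three rough points this proves nothing, but a slope of just under a second per linked interface fits the pattern described.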

  • What version of FOG are you running?

    What boot file are you using? This sounds like you’re still using pxelinux.0, as that limited network transfers to 10 Mbps (which would also cause high latency).
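
    For scale, the sketch below works out how long roughly 25 MB (the size reported in this thread) would take at a 10 Mbit/s cap versus a 10 Gbit/s link. It assumes 1 MB = 10^6 bytes and ignores protocol overhead; `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_mb: float, link_mbit_s: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_mb * 8 / link_mbit_s


SIZE_MB = 25  # kernel + init.xz, approximate size reported in the thread

for label, rate in (("10 Mbit/s cap", 10), ("10 Gbit/s link", 10_000)):
    print(f"{label}: {transfer_seconds(SIZE_MB, rate):.2f} s")
```

    A 10 Mbit/s cap works out to about 20 seconds for this payload, so the reported one to two minutes is slower than that cap alone would explain.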