ipxe boot slow after changing to HTTPS
-
@Sebastian-Roth I haven’t tried different binaries yet. Wouldn’t I have to recompile them to use HTTPS? Did the -s switch during setup automatically compile all those efi binaries and place them into /tftproot?
-
Now that you’ve mentioned ipxe driver issue, it seems more likely. The delay is longer on my xencenter VMs vs VirtualBox VMs and physical PCs.
-
@brakcounty said in ipxe boot slow after changing to HTTPS:
Did the -s switch during setup automatically compile all those efi binaries and place them into /tftproot?
Yes.
-
Booting from snponly.efi doesn’t recognize the network adapter. I tried using Intel and ParaVirt in VirtualBox.
-
@brakcounty Try out different ones, like intel.efi for example.
-
@brakcounty and @Sebastian-Roth
I recently did a fresh install of a FOG dev server with HTTPS enabled and experienced similar slowness when loading the kernel.
I’ll give some of this testing a try and report back to see if this is maybe more common than we think.
-
@Sebastian-Roth I tried intel.efi, still slow.
-
@brakcounty said in ipxe boot slow after changing to HTTPS:
The delay is longer on my xencenter VMs vs VirtualBox VMs and physical PCs.
Let’s go back to this information. Are physical PCs as fast as they used to be with plain HTTP?
I do use VirtualBox in my test setups and never saw it going slow on HTTPS.
-
@Sebastian-Roth physical PCs are still slower on HTTPS than HTTP. I was saying that the delay is exacerbated on VMs, especially slow (the slowest in fact) on XCP-NG guests. VirtualBox is better, physical is fastest. All three environments are still slower using HTTPS vs HTTP. I remember how instant HTTP was on any platform.
-
I just want to reiterate that when I say slow/fast, I’m referring to the time it takes to initiate a download (get) of a file via HTTPS. Once the download starts, then the speed is fine.
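To put numbers on "time to initiate a download" versus transfer speed, curl's `--write-out` timers can split a request into phases. This is only a sketch: the helper name `https_phase_times` is made up for illustration, and it measures a booted Linux client (for example the FOS debug shell, assuming its curl supports `-w`), not iPXE itself, so it serves as a baseline for comparison.

```shell
# Format string for curl --write-out (the variable names are curl's own).
# time_connect      = TCP connection established
# time_appconnect   = TLS handshake finished
# time_starttransfer = first byte of the response received
CURL_TIMING_FORMAT='tcp_connect=%{time_connect}s tls_done=%{time_appconnect}s first_byte=%{time_starttransfer}s total=%{time_total}s\n'

# Hypothetical helper: print the phase timings for one URL.
# -k skips certificate verification, like the test commands in this thread.
https_phase_times() {
    curl -k -s -o /dev/null -w "$CURL_TIMING_FORMAT" "$1"
}
```

Usage would be `https_phase_times https://fogserverip/fog/service/ipxe/bzImage`. A large gap between `tcp_connect` and `tls_done` would point at the TLS handshake rather than at the download itself.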
-
@brakcounty said:
I was saying that the delay is exacerbated on VMs, especially slow (the slowest in fact) on XCP-NG guests. VirtualBox is better, physical is fastest.
Although I am not sure this is important, we should keep that information in the back of our minds.
Ran from a console, instant. Still working on getting an accurate pcap.
Ok, we need to get back to that point then.
- Please schedule a debug (capture or deploy) task for any machine you see this issue on. Start it up and hit ENTER twice to get to the shell. Then run
wget --no-check-certificate https://fogserverip/fog/service/ipxe/bzImage
and let us know if this is starting instantly or delayed.
- In the FOG web UI go to FOG Configuration -> iPXE New Menu Entry and enter the following information:
Menu Item: fog.ipxeshell
Description: iPXE shell
Parameters: shell || goto MENU
Boot Options: leave empty
Default Item: unchecked
Hot Key Enabled: unchecked
Hot Key to use: leave empty
Menu Show with: Registered Hosts
Now boot up a machine/VM having the issue, select the iPXE shell and run the command
kernel bzImage
and once again let us know if this is starting instantly or delayed.
Outcomes:
- If both those show the delay symptom we are surely talking about a very crude network issue that is only seen in FOS/iPXE but not when the OS is booted - very unlikely. But if that’s the case you need to look into packet capturing as suggested before!!
- If the first test is instant but the second one is delayed we seem to have an iPXE issue - on the one hand I have never seen this on my HTTPS setups but also this is the most likely outcome from my perspective.
- If the first one is delayed but the second one gets an instant response - kind of impossible - then I have no idea and we need to re-think the whole case.
- And finally, if both tests yield an instant response I would be puzzled as well. Then we’d need to dig into the differences between the manual test and the normal PXE booting sequence.
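The four outcomes above amount to a small lookup on the two test results. A minimal sketch, with a made-up function name `classify_delay`; the first argument is the result of the wget test in the FOS debug shell and the second is the result of `kernel bzImage` in the iPXE shell, each either `instant` or `delayed`:

```shell
# Hypothetical helper: map the two test results to the next step.
# $1 = FOS debug shell download, $2 = iPXE shell "kernel bzImage"
classify_delay() {
    case "$1/$2" in
        delayed/delayed) echo "network issue: look into packet capturing" ;;
        instant/delayed) echo "iPXE issue (most likely outcome)" ;;
        delayed/instant) echo "unexplained: re-think the whole case" ;;
        instant/instant) echo "compare manual test with normal PXE boot sequence" ;;
        *) echo "usage: classify_delay instant|delayed instant|delayed" >&2
           return 1 ;;
    esac
}
```

For example, `classify_delay instant delayed` prints `iPXE issue (most likely outcome)`.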
-
@Sebastian-Roth I pm’d you a pcap
Ran these tests on my hyper-v and xcp vms:
- In the FOG debug console (Both Hyper-V and XCP showed this result)
wget --no-check-certificate https://fogserverip/fog/service/ipxe/bzImage
wget: not an http or ftp url: https://fogserverip/fog/service/ipxe/bzImage
- kernel bzImage took about 3-4 seconds on hyper-v, 10 seconds on xcp, then returned with
bzImage...ok
-
@brakcounty said in ipxe boot slow after changing to HTTPS:
wget: not an http or ftp url: https://fogserverip/fog/service/ipxe/bzImage
I have to admit that I have not tried it myself yet but I’d be pretty amazed if the wget binary we ship is not able to handle the HTTPS protocol. Anyhow, can you try
curl -v -k https://fogserverip/fog/service/ipxe/bzImage
instead?
kernel bzImage took about 3-4 seconds on hyper-v, 10 seconds on xcp, then returned with
Is this slower or faster than you see when PXE booting into a task?
I pm’d you a pcap
The first TCP SYN sent by the client to open the connection should be answered with a SYN,ACK by the server, but in the PCAP we see a plain ACK, which Wireshark flags as “ACKed unseen segment” - like a packet from a different connection (but on the same ports!). This is very unusual! The client then re-sends the initial SYN packet, gets a proper SYN,ACK back, and returns an ACK to properly finish the TCP three-way handshake.
Besides this strange behavior I wonder where the delay would happen. The first 9-10 seconds are taken up by the DHCP DORA exchange. The TCP handshake starts at 9.88 and goes straight into the SSL session setup. Between “Server Key Exchange, Server Hello Done” and “Client Key Exchange” there is a 2.5 second delay (caused by the client waiting), which I don’t find normal, though I can imagine this is due to crypto algorithm calculations. The rest of the TCP communication looks to be fast.
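The gap between handshake messages can also be listed without clicking through the capture by hand. A rough sketch: the function name `handshake_gaps` is made up, and it expects lines of "relative time, handshake type" as produced by tshark's field output. The numeric types follow the TLS specification (12 = Server Key Exchange, 14 = Server Hello Done, 16 = Client Key Exchange).

```shell
# Hypothetical helper: read "<relative_time> <handshake_type>" lines on stdin
# and print each one with the time elapsed since the previous line.
handshake_gaps() {
    awk '{
        gap = (NR > 1) ? $1 - prev : 0
        printf "%.3f\t+%.3f\ttype=%s\n", $1, gap, $2
        prev = $1
    }'
}
```

Assuming a capture file named `capture.pcap` (a placeholder) and a Wireshark version that uses the `tls.` field prefix (older releases use `ssl.`), it could be fed like this: `tshark -r capture.pcap -Y 'tls.handshake' -T fields -e frame.time_relative -e tls.handshake.type | handshake_gaps`. A line with `type=16` and a gap of several seconds would match the client-side wait described above.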
-
Ran the curl command, instant.
-
@Sebastian-Roth said in ipxe boot slow after changing to HTTPS:
- like a packet from a different connection (but on the same ports!)
This could be the NAT’d VM IP. I ran wireshark on the Default Hyper-V Switch adapter.
-
@Sebastian-Roth said in ipxe boot slow after changing to HTTPS:
If the first test is instant but the second one is delayed we seem to have an iPXE issue - on the one hand I have never seen this on my HTTPS setups but also this is the most likely outcome from my perspective.
So this is where we are right now, right? And you tested this on different machines, VMs as well as hardware.
I will try to replicate the issue. If I can’t, we should schedule a debug session together some time next week.
-
@Sebastian-Roth Definitely looks like it is isolated to ipxe.
@Sebastian-Roth said in ipxe boot slow after changing to HTTPS:
I have never seen this on my HTTPS setups
Out of curiosity, what NICs do you typically run ipxe on?
-
@brakcounty Didn’t find the time to test on my side yet. Will do in the next days and let you know.
-
@Sebastian-Roth Cool thanks, much appreciated. By the way, this isn’t an operation-breaking critical issue, so take your time.
-
@brakcounty I still have not found enough time to do a test setup…
Out of curiosity, what NICs do you typically run ipxe on?
The default on Linux virtualbox: Intel PRO/1000 MT Desktop (82540EM)