Can't get UEFI to PXE boot
-
Hello,
I’m trying to switch over to PXE booting using UEFI since newer computers are shipping without legacy mode. I’ve changed the boot file to ipxe.efi and am able to PXE boot when the PC is on the same VLAN as the FOG server. However, when I am on a different VLAN, the computer just sits at “Start PXE over IPv4”. When I turn legacy mode back on and switch the boot file back to undionly.kpxe, the PC is able to PXE boot from that other VLAN. Is there some additional network configuration I need to do to get UEFI working?
Any help would be appreciated!
-
@jclumbo33 said in Can't get UEFI to PXE boot:
the computer just sits at PXE initializing.
Just for clarity, is it PXE initializing or iPXE initializing? There is a difference here.
Also, if you move the computer to the same subnet as the FOG server, does it work with no changes?
-
@george1421 Sorry, I was going off memory and mixed it up. It sits at “Start PXE over IPv4”. And yes, I have a port with the server VLAN on it and just plugging into there lets it work with no changes.
-
@jclumbo33 said in Can't get UEFI to PXE boot:
Start PXE over IPv4
OK, then this is a networking issue and not specifically a FOG one. But let’s start building a truth table to see where the problem isn’t:
- So what device serves dhcp addresses to this vlan?
- Have you checked and confirmed the scope for this subnet contains both dhcp options 66 and 67?
- Can you pxe boot in bios mode with no problem on this vlan?
- Do you have a computer that you can load wireshark on that is on the same subnet as the computer that won’t pxe boot?
-
- It is a different scope on the same Windows Server 2019 DHCP server.
- Yes I just double checked that 66 and 67 were both set for the scope.
- Yes, if I enable legacy mode and change the bootfile back to undionly.kpxe, it works on this VLAN.
- Yes I can set up wireshark on another PC, though I’m not too experienced with it.
-
- OK good, we know the mechanics of PXE booting are set up, so we don’t need to focus there.
- Then the dhcp server should send the right values (we’ll double check in a minute).
- Again, the mechanics are in place, so we don’t need to check for screening routers or something else that might block pxe booting.
- Wireshark is where I think we will start to see what the target computer is being told.
Load wireshark on a witness computer (second computer on the same subnet as the pxe booting computer). PXE booting uses broadcast messages so we will be able to see what the target computer is being told. Use the capture filter of exactly this:
port 67 or port 68
Start the capture and then pxe boot the target computer until you get to the error and then stop recording.
Now you can either upload the pcap file to a fileshare site and post the link here or IM me (in the FOG Forum) the link and I will take a look at the file.
-OR-
You can (should) look at the pcap. We are looking for the DORA sequence (Discover, Offer, Request, Ack/Nack). The pxe booting computer will first send out a discover packet. That is basically its “hello world, I’m here, configure me” message.
Now one or more DHCP servers that hear this discover packet will reply with an OFFER packet. This is the packet(s) we are interested in. This is an OFFER from the dhcp server to the target computer. It will contain the target computer’s IP address, pxe boot info and dhcp options.
If you look into this packet a bit more you will see the bootp header, which should have {next-server} and {boot-file}. These should point to your FOG server and ipxe.efi. If they don’t, then that is problem #1. Now look a bit more into the dhcp options supplied by the dhcp server; there should be dhcp options 66 and 67 in there. They should match the settings in {next-server} and {boot-file}.
Now I’ve seen two or more OFFER responses where you have primary and secondary dhcp servers and one is misconfigured. You will get random pxe boot successes and failures depending on which dhcp server responds first.
Decoding pxe booting is a bit of a black art, but we can usually tell from the pcap what is going wrong.
Don’t worry, the capture filter I provided will ONLY record pxe booting steps and no other network communication.
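For anyone who prefers to check the OFFER programmatically, here is a minimal Python sketch of the comparison described above. It assumes you already have the raw UDP payload of the OFFER as bytes (for example, copied out of Wireshark). The function names `parse_dhcp_payload` and `check_offer` are made up for this illustration, and the sketch assumes option 66 holds the same string you expect in {next-server}, which may not hold if your scope uses a hostname there.

```python
import socket

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
BOOTP_HEADER_LEN = 236  # fixed BOOTP header size, before the magic cookie


def _text(raw: bytes) -> str:
    """Decode a NUL-padded or NUL-terminated ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def parse_dhcp_payload(payload: bytes) -> dict:
    """Extract next-server, boot-file and the DHCP options from a raw payload."""
    options_start = BOOTP_HEADER_LEN + len(DHCP_MAGIC_COOKIE)
    if (len(payload) < options_start
            or payload[BOOTP_HEADER_LEN:options_start] != DHCP_MAGIC_COOKIE):
        raise ValueError("payload is too short or lacks the DHCP magic cookie")

    options = {}
    pos = options_start
    while pos < len(payload):
        code = payload[pos]
        if code == 255:          # end option
            break
        if code == 0:            # pad option, no length byte
            pos += 1
            continue
        if pos + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[pos + 1]
        options[code] = payload[pos + 2:pos + 2 + length]
        pos += 2 + length

    return {
        "next_server": socket.inet_ntoa(payload[20:24]),  # siaddr field
        "boot_file": _text(payload[108:236]),             # file field
        "options": options,
    }


def check_offer(info: dict, fog_server: str, boot_file: str) -> list:
    """Return a list of human-readable mismatches; empty means the OFFER looks right."""
    problems = []
    if info["next_server"] != fog_server:
        problems.append(f"next-server is {info['next_server']!r}, expected {fog_server!r}")
    if info["boot_file"] != boot_file:
        problems.append(f"boot-file is {info['boot_file']!r}, expected {boot_file!r}")
    for code, expected in ((66, fog_server), (67, boot_file)):
        raw = info["options"].get(code)
        if raw is None:
            problems.append(f"option {code} is missing")
        elif _text(raw) != expected:
            problems.append(f"option {code} is {_text(raw)!r}, expected {expected!r}")
    return problems
```

Calling `check_offer(parse_dhcp_payload(payload), "<your FOG server IP>", "ipxe.efi")` on each OFFER in the capture would also make it easy to spot the case where a second dhcp server answers with different values.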
-
Thanks for the info, really appreciate you walking through it all for me.
I ran the capture and only see the Discover packet, nothing else comes through. I captured the working BIOS pxe booting from the same PC/VLAN just in case it helps. I got Request packets on that.
Here’s the pcap files:
UEFI PXE: https://drive.google.com/file/d/1h8cCwnOEtkl8VN9nxg9k4rSZ_6rOZF5S/view?usp=sharing
Bios PXE: https://drive.google.com/file/d/1rf5d5lEISYxleMYuynOer26q6iSSTw0q/view?usp=sharing
-
@jclumbo33 Unfortunately that pcap doesn’t help us because we are not seeing a response.
Looking at the bios one though, we are seeing the discover and the request. Both are coming from the client. So we are seeing half of the conversation there.
That makes me think your router’s dhcp-helper service is using unicast messaging for the pxe boot responses. That’s not normal, but it’s OK. So for uefi, since we are not seeing the request from the client computer (like we see in the bios pcap), my guess is that either:
- The client didn’t like what was in the OFFER
- Or the OFFER was never received.
Also, looking at the bios capture, the sequence should be Discover, Offer, Request, Ack, while we can only see the Discover and Request. What we are seeing in the bios capture is Discover, Request, Discover, Discover, Request. It should not have issued that second and third discover on a healthy network. I’m not saying this is a problem, just that it’s not typical.
We are kind of stuck debugging this right now. We really need wireshark on a mirrored port to the pxe booting computer if the dhcp-helper service is switching to unicast messages. A mirrored port is typically a function of an enterprise/business switch with a management interface. The mirror function duplicates the data being sent to the pxe booting computer and sends the copy to the wireshark computer.
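To make the “half of the conversation” reasoning concrete, here is a small Python sketch that takes the list of DHCP message types seen in a capture, in order, and reports which DORA steps are missing. The function name `summarize_capture` and the result keys are invented for this illustration; the sample sequence is the one read off the bios capture above.

```python
EXPECTED_DORA = ["Discover", "Offer", "Request", "Ack"]
CLIENT_STEPS = {"Discover", "Request"}  # messages sent by the pxe booting client


def summarize_capture(seen: list) -> dict:
    """Summarize which DORA steps a capture contains.

    `seen` is the ordered list of DHCP message type names from the capture.
    """
    seen_set = set(seen)
    return {
        # Steps of the normal handshake that never appear in the capture.
        "missing": [step for step in EXPECTED_DORA if step not in seen_set],
        # True when every captured message came from the client, which hints
        # that the server replies are not reaching the capturing computer.
        "client_side_only": bool(seen) and all(s in CLIENT_STEPS for s in seen),
        # Discovers beyond the first one suggest the client had to retry.
        "extra_discovers": max(seen.count("Discover") - 1, 0),
    }


bios_capture = ["Discover", "Request", "Discover", "Discover", "Request"]
print(summarize_capture(bios_capture))
```

A capture with `client_side_only` set is the situation where a mirrored port is needed, because the witness computer is only hearing the client’s broadcasts.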
-
The guy who built the network definitely isn’t typical, so that doesn’t surprise me haha. I got the port mirrored and ran Wireshark again. I still don’t see Offer on either UEFI or BIOS. I ran these captures without the filter because I was curious what else was in there.
UEFI: https://drive.google.com/file/d/1iXFyqHJf4P4UcUEhCw5z8qaFBGDIyaY5/view?usp=sharing
BIOS: https://drive.google.com/file/d/1X29MFV2SMhYA2AIU5cg8_x0PRXhN0tmt/view?usp=sharing
I’m off for the day. I’ll be working on it again tomorrow. Thank you again for your help.
-
@jclumbo33 said in Can't get UEFI to PXE boot:
I got the port mirrored and ran Wireshark again.
Strange we don’t see the DHCP answers then. Which port is mirrored? The client port?
-
Yeah I thought that as well. The client port is mirrored. I can see all web traffic from the client PC when I browse around on it, so I am assuming the mirror is working correctly.
-
Just kidding, I missed half of the steps while mirroring. It wasn’t showing all the traffic. I got it working now. Thanks for your patience, I’m learning a bunch!
BIOS: https://drive.google.com/file/d/1Dx1GpviCMtQfj0M7f_HbNZ323lSuGeWR/view?usp=sharing
UEFI: https://drive.google.com/file/d/1Ej8eAfOORMMvhcXWe6uZXtcwvewqcFG6/view?usp=sharing
-
@jclumbo33 Well done, the new PCAPs have all the information we need to look into this!
Comparing the DHCP discovery packets between the two I see a few minor differences:
- Option 60: Vendor Class Identifier (see section 9.13 in RFC 2132)
- Option 93: Client System Architecture (see section 2.1 in RFC 4578)
- Option 94: Client Network Device Interface (see section 2.2 in RFC 4578)
- Option 57: Maximum DHCP Message Size (see section 9.10 in RFC 2132)
All those seem obvious to me and I would not expect any to cause this trouble.
Now let’s take a look at the actual DHCP sequence. Client sends DHCP discover and we see the DHCP offer in response in both cases. While the BIOS PCAP proceeds with the expected DHCP request packet the UEFI one does not!
That’s really interesting because I don’t see much of a difference in the DHCP offer except the bootfile name (as we want it) and the DHCP transaction ID (expected and not a problem). I have compared the two DHCP offers and can’t see why this shouldn’t work.
Well, there is one thing jumping out at me. You seem to use non-default subnet masks. This is totally fine as long as your network admins know what they are doing, and I expect this to be the case. BUT I can imagine the UEFI firmware going mad at this (for no good reason really!) and stopping the DHCP handshake at this point.
You said that PXE boot works in a different VLAN - which probably uses a different IP subnet and maybe also a different subnet mask?!
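If it helps to check several scopes quickly, here is a rough Python sketch that flags address/mask pairs that differ from the old classful defaults (/8, /16, /24). Reading “non-default” as “not the classful default” is an assumption of this sketch, the function names are invented, and the addresses in the usage note are placeholders rather than values from the PCAPs. It only flags candidates for a test; it does not show that the mask is what stops the firmware.

```python
import ipaddress


def classful_default_prefix(address: str):
    """Return the classful default prefix length for an IPv4 address, or None."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return 8    # class A
    if first_octet < 192:
        return 16   # class B
    if first_octet < 224:
        return 24   # class C
    return None     # multicast/reserved ranges have no default mask


def uses_default_mask(address: str, netmask: str) -> bool:
    """True when the offered netmask equals the classful default for the address."""
    # Parsing "0.0.0.0/<mask>" is a simple way to convert a dotted mask
    # into a prefix length with the standard library.
    prefix = ipaddress.IPv4Network(f"0.0.0.0/{netmask}").prefixlen
    return prefix == classful_default_prefix(address)
```

For example, `uses_default_mask("172.16.5.20", "255.255.0.0")` is true, while the same address with `255.255.252.0` is not, so the second scope would be one to try in a test VLAN.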
-
You are correct, the VLAN that it works on uses a /16 subnet. That’s interesting, so we could potentially be screwed without redoing our network then?
Here’s an idea of maybe a workaround? Our fog server is running in Hyper-V. If I add a NIC to the VM and put it on each VLAN we need, then point the DHCP options to the new IP address on the same VLAN, would that potentially act as a work around? And is that even possible to configure fog with multiple NICs like that? Just a thought.
-
@jclumbo33 said in Can't get UEFI to PXE boot:
You are correct, the VLAN that it works on uses a /16 subnet. That’s interesting, so we could potentially be screwed without redoing our network then?
I am not saying your network is at fault. It might be partly causing what you see, but in fact it would be a firmware issue. Before you even think about redoing your network, I suggest you set up a test VLAN where you can check whether non-default subnet masks are actually causing this problem.
Here’s an idea of maybe a workaround? Our fog server is running in Hyper-V. If I add a NIC to the VM and put it on each VLAN we need, then point the DHCP options to the new IP address on the same VLAN, would that potentially act as a work around? And is that even possible to configure fog with multiple NICs like that? Just a thought.
Not a good idea. FOG needs a specific network interface and IP to be able to properly work.