Issue with EFI through PFSense Firewall
-
@jonhwood360 Searching the web I found this topic: https://communities.vmware.com/t5/vSphere-Host-Client-Discussions/No-PXE-Boot-from-VM-with-EFI-Using-Microsoft-WDS/m-p/972385/highlight/true#M251
I just had some luck with this… On my vm guest, the default network adapter type was e1000e. I was regularly getting stuck at “Start PXE over IPv4”. I just removed the 1000e and setup a vmxnet network card. Great Success. I am uploading an image of my guest over PXE boot right now.
Be aware that there are many non-sense answers in that topic but this particular one does sound promising to me.
With the things you have tested so far (PXE boot in BIOS mode OK, and UEFI PXE boot OK when hooked to the same network section) I would be fairly sure your setup is OK. It sounds like a PXE routing issue within VMware to me, though I am not sure.
-
@sebastian-roth I’ll give the vmxnet3 card a try today and report back. Thanks for the suggestion!
One alternative for the setup is to have the fogserver vm have a local adapter on the networks that it will deploy to, however I was hoping to have the firewall between endpoints and the fog server for security (once I tighten up the ruleset).
-
@sebastian-roth said in Issue with EFI through PFSense Firewall:
@jonhwood360 Searching the web I found this topic: https://communities.vmware.com/t5/vSphere-Host-Client-Discussions/No-PXE-Boot-from-VM-with-EFI-Using-Microsoft-WDS/m-p/972385/highlight/true#M251
I just had some luck with this… On my vm guest, the default network adapter type was e1000e. I was regularly getting stuck at “Start PXE over IPv4”. I just removed the 1000e and setup a vmxnet network card. Great Success. I am uploading an image of my guest over PXE boot right now.
Be aware that there are many non-sense answers in that topic but this particular one does sound promising to me.
With the things you have tested so far (PXE boot in BIOS mode OK, and UEFI PXE boot OK when hooked to the same network section) I would be fairly sure your setup is OK. It sounds like a PXE routing issue within VMware to me, though I am not sure.
Okay so I tested the following:
replaced all virtual network card hardware types with vmxnet3 in environment - no change in status. Still can BIOS PXE, but not UEFI PXE through pfsense firewall.
tried hardware laptop (realtek network adapter) - could not EFI PXE boot. When allowed to boot to OS, gets DHCP exchange fine. PXE booting the laptop through the OP rom setting on laptop also works.
I’ll dig into the potential for routing issues in the meantime. I just find it weird that BIOS PXE boot works, but EFI doesn’t work through the pfSense router. It’s like it can’t access the TFTP server for some reason, or something is getting mangled.
Since this is happening on both hardware and VM, I don’t think it’s a VMware issue per se. I’ll dig into the VMware network settings and see if changing some of the security toggles changes something.
-
@jonhwood360 All of the broadcast stuff from your pcap looks good. What I don’t see is the tftp request from the client computer because that is using unicast messaging.
If you follow this guide and run tcpdump from the fog server we should be able to see the tftp pull request (hopefully). https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
So in your setup it appears you are using the FOG server for the dhcp server on the external network? This is not specifically necessary since pfsense can function as the dhcp server and supports dynamic pxe booting (uefi / bios). There is no need for the fog server to do this. Using the pfsense router also eliminates some complexity in the setup. You won’t need the dhcp-relay service either.
I do have to say I have seen some wonkiness with UEFI booting in VMware, where sometimes it will not PXE boot on a warm restart but it will when freshly powered up. I would repeat your testing with a physical machine from the external network just once to rule out VMware acting strange. But I have seen that when it toggles between the EFI Network… and the UEFI firmware, either the boot file is not received or a bad boot loader is sent to the target computer.
-
@george1421 said in Issue with EFI through PFSense Firewall:
@jonhwood360 All of the broadcast stuff from your pcap looks good. What I don’t see is the tftp request from the client computer because that is using unicast messaging.
If you follow this guide and run tcpdump from the fog server we should be able to see the tftp pull request (hopefully). https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
So in your setup it appears you are using the FOG server for the dhcp server on the external network? This is not specifically necessary since pfsense can function as the dhcp server and supports dynamic pxe booting (uefi / bios). There is no need for the fog server to do this. Using the pfsense router also eliminates some complexity in the setup. You won’t need the dhcp-relay service either.
I do have to say I have seen some wonkiness with UEFI booting in VMware, where sometimes it will not PXE boot on a warm restart but it will when freshly powered up. I would repeat your testing with a physical machine from the external network just once to rule out VMware acting strange. But I have seen that when it toggles between the EFI Network… and the UEFI firmware, either the boot file is not received or a bad boot loader is sent to the target computer.
Hi George,
I’ll grab that tcpdump shortly from the fog server.
The reason I am using the fog server for dhcp is: A.) keep all the deployment configuration in one place regardless of intermediary device, B.) Let the fog dhcp filters take effect during requests, and C.) integrate DNS with the DHCP so that automatic zone entries will work.
I have actually tried with a hardware device (laptop) as mentioned previously, with the same results as the VM.
So, let me get that tcpdump real quick… I’ll edit this post with it…
Here is the PCAP fogserver-BIOS-EFI-exchange.pcap
For the capture I first powered it on with BIOS set, which booted successfully. Then I powered off the machine, set it to EFI, and then powered it back on, and no boot.
-
@jonhwood360 Well this pcap was unremarkable, other than we can see that there is no tftp download of the ipxe.efi file.
I did find something interesting looking at the previous pcap. For a BIOS computer the lease time is 6 hours, for a UEFI computer the lease time is 4 hours. One might think that if you are using the same dhcp server the lease time should be the same.
Now looking at the two offer packets side by side the values are identical, but I did notice something that could cause the issue. If you look at the picture below, only the bootp protocol fields have been populated in the offer packet. What is missing is the dhcp boot protocol using dhcp options 66 and 67. The problem is that it’s up to the target computer firmware which fields it wants to look at, bootp or dhcp. Most dhcp servers that support pxe booting will populate both parts, because the client could choose either bootp or dhcp.
So is this dhcp server configured by FOG or did you hand configure this dhcp server? The FOG configuration “should” provide both boot protocols.
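The distinction between the bootp header fields and dhcp options 66 and 67 can be checked outside of Wireshark too. The following is only a rough sketch, assuming you have the raw UDP payload of the offer packet as bytes; the function name `boot_fields` is made up for illustration, and it ignores option overload (option 52). The field offsets are the standard BOOTP header layout.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options area


def _text(raw: bytes) -> str:
    """Decode a NUL-padded or NUL-terminated field as ASCII text."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def boot_fields(bootp_payload: bytes) -> dict:
    """Report where boot server / boot file information appears in an offer.

    `bootp_payload` is the UDP payload only (no Ethernet/IP/UDP headers).
    """
    # Fixed BOOTP header: siaddr at offset 20 (4 bytes), sname at 44
    # (64 bytes), file at 108 (128 bytes), options start at 236.
    result = {
        "siaddr": ".".join(str(b) for b in bootp_payload[20:24]),
        "sname": _text(bootp_payload[44:108]),
        "file": _text(bootp_payload[108:236]),
        "option_66": None,  # TFTP server name
        "option_67": None,  # bootfile name
    }
    if bootp_payload[236:240] != MAGIC_COOKIE:
        return result  # plain BOOTP reply without DHCP options

    i = 240
    while i < len(bootp_payload):
        code = bootp_payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option is a single byte with no length
            i += 1
            continue
        if i + 1 >= len(bootp_payload):
            break  # truncated packet
        length = bootp_payload[i + 1]
        value = bootp_payload[i + 2 : i + 2 + length]
        if code == 66:
            result["option_66"] = _text(value)
        elif code == 67:
            result["option_67"] = _text(value)
        i += 2 + length
    return result
```

An offer like the one described above would come back with `siaddr` and `file` filled in but `option_66` and `option_67` both `None`; an offer that populates both parts would show values in all of them.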
-
@george1421 said in Issue with EFI through PFSense Firewall:
@jonhwood360 Well this pcap was unremarkable, other than we can see that there is no tftp download of the ipxe.efi file.
I did find something interesting looking at the previous pcap. For a BIOS computer the lease time is 6 hours, for a UEFI computer the lease time is 4 hours. One might think that if you are using the same dhcp server the lease time should be the same.
Now looking at the two offer packets side by side the values are identical, but I did notice something that could cause the issue. If you look at the picture below, only the bootp protocol fields have been populated in the offer packet. What is missing is the dhcp boot protocol using dhcp options 66 and 67. The problem is that it’s up to the target computer firmware which fields it wants to look at, bootp or dhcp. Most dhcp servers that support pxe booting will populate both parts, because the client could choose either bootp or dhcp.
So is this dhcp server configured by FOG or did you hand configure this dhcp server? The FOG configuration “should” provide both boot protocols.
The DHCP server was configured by fog, and I duplicated the entry for the internal subnet, and changed the relevant fields for the external one (attaching the conf).
Also I took another pcap, just with the EFI boot, but from both FOG and pfsense(External). (attached).
fogserver-EFI-timed-exchange.pcap
pfsense-EFI-timed-exchange-EXTERNAL-interface.pcap
dhcpd.conf.txt
Also, in the original pcap from the fogserver, only transaction ID 0xe25c58da is the EFI PXE exchange. The options you pointed out are set for that transaction as:
The other two transactions are when the initial tftp happens from the BIOS boot rom, and then the loading of iPXE it looks like.
-
@jonhwood360 While I find this a bit confusing, let’s see if this solves the issue.
In the isc-dhcp server config file, modify this section:
option domain-name-servers 10.255.252.5;
option domain-name "jukebox.local";
ddns-domainname "jukebox.local.";
ddns-rev-domainname "in-addr.arpa.";
next-server 10.255.252.5;
class "Legacy" {
Between next-server and class insert this line:
option tftp-server-name 10.255.252.5;
Then for every instance of “filename…”; just below that line enter:
option bootfile-name "<boot_file>";
Where <boot_file> matches the boot file name issued by the filename command.
Save that config and restart the dhcp server. If that fixes the pxe booting issue then go in and comment out the filename lines, restart the dhcp server and see if that still works for bios and uefi. You might want to grab a pcap to ensure the header (bootp) and dhcp options are still being set. I think the bootfile-name command should set both.
-
@george1421 said in Issue with EFI through PFSense Firewall:
Between next-server and class insert this line
option tftp-server-name 10.255.252.5;
Then for every instance of “filename…”; just below that line enter
option bootfile-name "<boot_file>";
Where <boot_file> matches the boot file name issued by the filename command.
George,
Just wanted to take a moment to thank you for helping to troubleshoot this with me.
I did as you requested.
If the “option tftp…” line is enabled, the dhcp server will fail to start. Isn’t it the case for isc-dhcp-server that the “next-server …” line is the equivalent?
If I disable the next-server line and just use “option tftp”, it also fails to start:
If I leave the “option boot-file…” lines in and leave “next-server…” on and “option tftp…” off, the server starts.
However there is no difference in boot either in a VM, or hardware. I did however snag this screenshot (sorry for the glare) from the laptop which might shed some light.
I then reverted my dhcpd.conf to without the changes you recommended and confirmed the same result, sans the square at the end of the filename. Still same PXE error.
So it appears that the dhcp exchange for EFI mode on the computers is happening and they are getting the right configuration data, but the data is getting mangled at some point.
Any thoughts?
-
@jonhwood360 OK I know this now. That unprintable character is what is causing the issue. I see that in some “university” dhcp servers. Let me look at your pcap again.
-
@george1421 said in Issue with EFI through PFSense Firewall:
@jonhwood360 OK I know this now. That unprintable character is what is causing the issue. I see that in some “university” dhcp servers. Let me look at your pcap again.
Just a note, once I reverted my config to my original without alterations, that character went away.
-
@jonhwood360 But I assume it still doesn’t boot.
So without the config file alterations you don’t get the screen about filesize 0 bytes, but with the alterations you get that error. Just thinking about it a bit more, I think we are headed in the right direction, maybe just not on the right path.
Did you get a workstation pcap of the process where you received the 0 bytes filesize? I’m interested to know if dhcp options 66 and 67 were being set, where it was actually kind of working. If they were there with those settings and the settings were the key for uefi dhcp booting then we just need to work on why the settings were wrong. I didn’t get a chance last night to test this in my home lab, where I have an Ubuntu server, to understand why the settings stopped isc-dhcp from starting. But I’ll look at it tonight if we can’t get it sorted out. The commands I gave you were from the isc-dhcp config file.
-
@george1421 said in Issue with EFI through PFSense Firewall:
@jonhwood360 But I assume it still doesn’t boot.
So without the config file alterations you don’t get the screen about filesize 0 bytes, but with the alterations you get that error. Just thinking about it a bit more, I think we are headed in the right direction, maybe just not on the right path.
Did you get a workstation pcap of the process where you received the 0 bytes filesize? I’m interested to know if dhcp options 66 and 67 were being set, where it was actually kind of working. If they were there with those settings and the settings were the key for uefi dhcp booting then we just need to work on why the settings were wrong. I didn’t get a chance last night to test this in my home lab, where I have an Ubuntu server, to understand why the settings stopped isc-dhcp from starting. But I’ll look at it tonight if we can’t get it sorted out. The commands I gave you were from the isc-dhcp config file.
No no, I get the same error, just without the unprintable character.
So, yes option 66 (next-server) and 67 (filename) are being set. If I set the Endpoint VM to BIOS, it works fine in the external network.
If I put the endpoint in the same network as the DHCP server, both EFI and BIOS works fine.
I’ll check to make sure the DHCP settings between internal subnet and external subnet are the same (sans the obvious IP changes)
Thanks again for helping!
-
So one thing I keep seeing in the packet capture, which is weird, is that the router option (3) is set to 10.255.252.1, which is the gateway that the fog server uses, but is not the gateway for the external network.
And I figured out why the tftp-server-name line you gave me didn’t work. It was expecting a DNS name, not an IP. There is a tftp-server-address option, which I set, and that did work (config loaded). This did not change anything though.
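A quick way to sanity-check a router option like this is to test whether the gateway in the offer actually sits inside the subnet of the address being offered. This is just a minimal sketch using Python’s standard `ipaddress` module; the function name `router_is_on_link` is invented here, and the 192.168.50.x address in the usage note is a hypothetical stand-in for the external network, which isn’t spelled out above.

```python
import ipaddress


def router_is_on_link(offered_ip: str, subnet_mask: str, router: str) -> bool:
    """Return True if the router (option 3) lies in the offered address's subnet."""
    # strict=False lets us pass a host address rather than the network address.
    network = ipaddress.ip_network(f"{offered_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(router) in network
```

For example, `router_is_on_link("10.255.252.50", "255.255.255.0", "10.255.252.1")` is true, while `router_is_on_link("192.168.50.20", "255.255.255.0", "10.255.252.1")` is false, which is the kind of mismatch described above: a client handed a gateway it cannot reach directly.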
-
@jonhwood360 Do you have the option to use a mirrored network port on the external network? This would give us the best fidelity to the network exchange. On a witness port we can only see the broadcast (dhcp) side of the conversation. We miss all of the unicast traffic. There has to be something different in the exchange between BIOS and UEFI modes on the same computer. The same exact protocols are used, so the firewall shouldn’t be blocking. While this most likely is not the case here, we have seen a smaller MTU than 1468 cause tftp issues. Smaller MTUs than 1468 cause the tftp packet to fragment and the receiving computer will reject the file transfer. But one would think if that was the case bios and uefi would be the same.
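For reference, the 1468 figure matches the largest TFTP data block that fits in a standard 1500-byte Ethernet MTU once the IPv4, UDP and TFTP headers are subtracted. The arithmetic can be sketched as below; the function names are made up for illustration, and it assumes IPv4 without IP options.

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number


def max_tftp_blksize(mtu: int) -> int:
    """Largest TFTP block size that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def will_fragment(blksize: int, mtu: int) -> bool:
    """True if a TFTP data packet with this block size exceeds the path MTU."""
    return blksize > max_tftp_blksize(mtu)
```

With these definitions, `max_tftp_blksize(1500)` is 1468, so a 1468-byte block fits exactly on a 1500-byte MTU, while `will_fragment(1468, 1400)` is true: any hop with a smaller MTU would force the data packets to fragment.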
-
@jonhwood360 said in Issue with EFI through PFSense Firewall:
So one thing I keep seeing in the packet capture, which is weird, is that the router option (3) is set to 10.255.252.1, which is the gateway that the fog server uses, but is not the gateway for the external network.
Well, let’s think about this for a minute. When I looked at the packet capture I did notice the lease times were different too.
Looking at the config file you provided, both the lease times and router address are set correctly. Why are we seeing a difference between the packet captures and the configuration? It’s almost like we have a different dhcp server for bios than uefi, or two instances running with different config files on the FOG server.
-
@george1421 said in Issue with EFI through PFSense Firewall:
@jonhwood360 said in Issue with EFI through PFSense Firewall:
So one thing I keep seeing in the packet capture, which is weird, is that the router option (3) is set to 10.255.252.1, which is the gateway that the fog server uses, but is not the gateway for the external network.
Well, let’s think about this for a minute. When I looked at the packet capture I did notice the lease times were different too.
Looking at the config file you provided, both the lease times and router address are set correctly. Why are we seeing a difference between the packet captures and the configuration? It’s almost like we have a different dhcp server for bios than uefi, or two instances running with different config files on the FOG server.
Yep. The only thing I can figure is that somehow either the DHCP Relay or Arp Proxy in Pfsense is doing something weird, or the DHCP server is not processing my second range definition in its entirety (which is weird because booting into an OS proper, the exchange happens normally and gets the right settings).
In the meantime, though, I have had to move along in my research, so I moved the DHCP / BIND DNS to pfSense and have configured it to have the correct options. EFI boot from the fog server is working in this new configuration. I am leaving the existing configuration in place on the fog server, though, so I can return to this issue (which I think has value to try and solve).