Issues with UEFI boot 1.5.9
-
I’m having issues with UEFI boot on 1.5.9. I am using FOG as a dhcp server on the vlan I created for imaging. I do have dhcp-relay set up on my Palo Alto firewall. Devices can legacy boot to the server, but not UEFI. I captured an image using UEFI only a few hours ago, so it was working. Now that I am trying to deploy that same image, machines can’t boot to the server. Please help, and thanks in advance!
-
@pamadmax What device is your dhcp server? You say the dhcp-relay service is set up on your PA firewall, but what is the dhcp server?
To successfully pxe boot a uefi system you need to send the ipxe.efi or snponly.efi boot file name via dhcp option 67 and/or in the dhcp (bootp) header as the field {boot-file}.
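To make the two locations concrete, here is a minimal Python sketch that pulls the boot file name out of a raw dhcp message (the UDP payload of an OFFER or ACK). The function name `boot_file_names` is made up for illustration, and the sketch assumes the standard RFC 2131 layout: a 236-byte fixed header, a 4-byte magic cookie, then the options. It ignores option overload (option 52), so treat it as a reading aid rather than a full parser.

```python
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the dhcp options


def boot_file_names(payload: bytes):
    """Return (header_file, option_67) from a raw dhcp message (hypothetical helper)."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a dhcp message")

    # Fixed bootp header: the 128-byte 'file' field starts at offset 108.
    header_file = payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace")

    # Options follow the cookie as code/length/value triples.
    option_67 = None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option is a single byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break  # truncated option, stop parsing
        length = payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if code == 67:  # bootfile name
            option_67 = value.rstrip(b"\x00").decode("ascii", "replace")
        i += 2 + length
    return header_file, option_67
```

If a uefi client gets a reply where both values are empty, or where either one names a bios file such as undionly.kkpxe, that reply is the one to look at.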
-
@george1421 Thanks for the reply. I am running the isc-dhcp server that you can install via the installfog.sh script. It is running on the Fog Server. Configuration is:
# DHCP Server Configuration file
#see /usr/share/doc/dhcp*/dhcpd.conf.sample
# This file was created by FOG
#Definition of PXE-specific options
# Code 1: Multicast IP Address of bootfile
# Code 2: UDP Port that client should monitor for MTFTP Responses
# Code 3: UDP Port that MTFTP servers are using to listen for MTFTP requests
# Code 4: Number of seconds a client must listen for activity before trying
#         to start a new MTFTP transfer
# Code 5: Number of seconds a client must listen before trying to restart
#         a MTFTP transfer
option space PXE;
option PXE.mtftp-ip code 1 = ip-address;
option PXE.mtftp-cport code 2 = unsigned integer 16;
option PXE.mtftp-sport code 3 = unsigned integer 16;
option PXE.mtftp-tmout code 4 = unsigned integer 8;
option PXE.mtftp-delay code 5 = unsigned integer 8;
option arch code 93 = unsigned integer 16;
use-host-decl-names on;
ddns-update-style interim;
ignore client-updates;
# Specify subnet of ether device you do NOT want service.
# For systems with two or more ethernet devices.
# subnet 136.165.0.0 netmask 255.255.0.0 {}
subnet 10.250.0.0 netmask 255.255.255.0{
    option subnet-mask 255.255.255.0;
    range dynamic-bootp 10.250.0.10 10.250.0.254;
    default-lease-time 21600;
    max-lease-time 43200;
    option routers 10.250.0.1;
    option domain-name-servers 10.207.0.15;
    next-server 10.250.0.15;
    class "Legacy" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00000";
        filename "undionly.kkpxe";
    }
    class "UEFI-32-2" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00002";
        filename "i386-efi/ipxe.efi";
    }
    class "UEFI-32-1" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00006";
        filename "i386-efi/ipxe.efi";
    }
    class "UEFI-64-1" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00007";
        filename "ipxe.efi";
    }
    class "UEFI-64-2" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00008";
        filename "ipxe.efi";
    }
    class "UEFI-64-3" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00009";
        filename "ipxe.efi";
    }
    class "SURFACE-PRO-4" {
        match if substring(option vendor-class-identifier, 0, 32) = "PXEClient:Arch:00007:UNDI:003016";
        filename "ipxe.efi";
    }
    class "Apple-Intel-Netboot" {
        match if substring(option vendor-class-identifier, 0, 14) = "AAPLBSDPC/i386";
        option dhcp-parameter-request-list 1,3,17,43,60;
        if (option dhcp-message-type = 8) {
            option vendor-class-identifier "AAPLBSDPC";
            if (substring(option vendor-encapsulated-options, 0, 3) = 01:01:01) {
                # BSDP List
                option vendor-encapsulated-options 01:01:01:04:02:80:00:07:04:81:00:05:2a:09:0D:81:00:05:2a:08:69:50:58:45:2d:46:4f:47;
                filename "ipxe.efi";
            }
        }
    }
}
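For anyone reading along, the class blocks in that config boil down to a lookup on the first 20 characters of the vendor class identifier (dhcp option 60). Below is a rough Python sketch of that selection logic. The names `BOOT_FILES` and `boot_file_for` are made up for illustration, the Apple netboot class is left out, and this only mirrors the config above; it is not how dhcpd itself is implemented.

```python
# Arch code (characters 15-19 of "PXEClient:Arch:NNNNN...") -> boot file,
# copied from the class blocks in the config above.
BOOT_FILES = {
    "00000": "undionly.kkpxe",     # Legacy bios
    "00002": "i386-efi/ipxe.efi",  # UEFI-32-2
    "00006": "i386-efi/ipxe.efi",  # UEFI-32-1
    "00007": "ipxe.efi",           # UEFI-64-1 (also covers the SURFACE-PRO-4 class)
    "00008": "ipxe.efi",           # UEFI-64-2
    "00009": "ipxe.efi",           # UEFI-64-3
}

PREFIX = "PXEClient:Arch:"


def boot_file_for(vendor_class: str):
    """Return the boot file this config would hand out, or None if no class matches."""
    if not vendor_class.startswith(PREFIX):
        return None
    arch = vendor_class[len(PREFIX):len(PREFIX) + 5]
    return BOOT_FILES.get(arch)
```

So a uefi x64 client announcing "PXEClient:Arch:00007:UNDI:003016" should be offered ipxe.efi by this server, and a legacy client announcing arch 00000 should be offered undionly.kkpxe.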
-
@pamadmax said in Issues with UEFI boot 1.5.9:
I am using FOG as a dhcp server on the vlan I created for imaging. I do have dhcp-relay setup on my Palo Alto firewall.
Ok thank you for the clarity. I’m trying to build a logic truth table here and I see a discrepancy so I’m trying to understand the actors involved.
If your fog server and target computers are on the same dedicated “imaging” subnet then the dhcp-relay service is probably not wanted. When you have random pxe booting experiences, that often indicates 2 dhcp servers are in the mix somehow. The fog server running the isc dhcp server is configured to automatically hand out the right boot file name based on the pxe booting computer, so we can rule out a bios boot file being handed to a uefi computer.
To find out if we have 2 dhcp servers (actors) in the configuration the easiest way is to use a witness computer (third computer) on the same imaging subnet. Load wireshark on that witness computer and use the capture filter of exactly
port 67 or port 68
Start capturing with wireshark, then pxe boot the computer to the error, then stop capturing with wireshark. In the payload area you will see:
1. DISCOVER: this is the pxe booting computer saying hello world, configure me.
2. OFFER: this is the dhcp server replying to the DISCOVER packet. In your case you should only get an OFFER packet from your FOG server; if you get a second one then that is your problem.
3. REQUEST: this is the target computer asking for specific dhcp values.
4. ACK: this is the responsible dhcp server saying OK, you have your IP address locked in.
This is called the DORA process.
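If you would rather script the check than eyeball the capture, the test is simply "how many different servers sent an OFFER". Here is a minimal Python sketch, assuming you have already pulled each packet's dhcp message type and server identifier (option 54) out of the capture into a list of tuples. The function name `offering_servers` and the input shape are made up for illustration, and the second server address in the usage lines is a placeholder, not something from your network.

```python
def offering_servers(messages):
    """Return the sorted set of server ids that sent an OFFER.

    messages: iterable of (dhcp_message_type, server_id) tuples,
    a hypothetical pre-extracted view of the capture.
    """
    return sorted({server for kind, server in messages if kind.upper() == "OFFER"})


# Example: one DORA exchange where two servers answered the DISCOVER.
capture = [
    ("DISCOVER", None),
    ("OFFER", "10.250.0.15"),  # the FOG server from the config above
    ("OFFER", "192.0.2.10"),   # placeholder for an unexpected second server
    ("REQUEST", None),
    ("ACK", "10.250.0.15"),
]
servers = offering_servers(capture)
if len(servers) > 1:
    print("more than one dhcp server is answering:", servers)
```

One entry in the result is the healthy case; two or more means a second dhcp server (or a relay pointing at one) is answering on the imaging subnet.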
-
@george1421 I did already check the traffic using the monitor function of my Palo Alto. Everything looks good. Plus this just worked a few hours earlier. Thanks for the help. I know I probably have a very weird situation.
-
@pamadmax We really need to get a packet capture when things don’t work. If you have 2 dhcp servers, what the client does depends on which one responds first. This is what we’ve seen when there are random pxe boot failures. If you don’t know how to read a pcap, upload it to a file share site (e.g. google drive) as public read, and either share the link with me in FOG IM or post the link here and I will take a look at it. Once we review the pcap you should take it down from the file share site.
-
@george1421 I do not have 2 DHCP servers. I have a DHCP server, and a Relay that points at it. I did find the solution to my issue. The LAG configuration between my 2 Juniper EX4600 switches got messed up. Only some traffic would pass, not everything. After correcting the issue, I am all good now. Thank you all for the help.
Cheers!
-
@pamadmax It’s great that you have it solved. Intermittent communications would also cause this pattern.
Just to clarify the point a bit about dual dhcp servers. If your fog server is on the same imaging network and its “the dhcp server” for your imaging network, then you don’t need a dhcp-relay service listening on this subnet. Actually you should clearly not have it listening on this subnet. You can have the dhcp-relay service on your router, that’s fine, but not listening on the imaging network’s vlan.
The issue is this: if the dhcp-relay service IS listening on the imaging subnet, the relay service will relay the dhcp request to your main server. With the FOG dhcp server also listening, there is a chance that your pxe booting computer will get two OFFERS: one from your FOG server and one from the dhcp-relay service as a proxy for your primary dhcp server. Hence the comment about 2 dhcp servers.
You have it worked out, so that is all that really matters here. Good going on finding the problem. I’m sure that misconfiguration on the LAG trunk was causing other strange issues on your network too.