180 (or 190) seconds wait for iPXE DHCP configuration
-
Hey.
First of all, let me preface this by saying that I’m not even sure whether this is a FOG, Proxmox, Windows DHCP or iPXE problem.
As the title says, I have to wait ~300 seconds at iPXE’s ‘configuring (net0 …)’ before the Proxmox VM sends a DHCP REQUEST packet.
Before that, DISCOVER and OFFER packets are sent/received every second. (I checked on the vmbr0 interface as well as the VM’s tap.)
This only seems to happen when I select a VirtIO (or VMware vmxnet3) network device for the VM. It works fine with Intel E1000 or Realtek RTL8139.
I added netboot.xyz to the FOG menu to see how long it takes to boot. It takes 310 seconds.
There’s more. This is all if I boot in UEFI mode (OVMF). If I select SeaBIOS, then I have to wait ~60 seconds for the PXE ROM boot DHCP (which is pretty much instant in UEFI), but iPXE DHCP takes only a few seconds…
As far as boot files go I tried pretty much everything (ipxe.efi, snp.efi, snponly.efi for UEFI, undionly.kpxe, ipxe.pxe for BIOS) without seeing any difference… or maybe I just don’t remember all the combinations anymore.
STP is off on Proxmox bridge, but there is RSTP on Juniper switches where Windows DHCP is. (On DHCP I have configured policies for UEFI/BIOS boot file)
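For readers unfamiliar with the UEFI/BIOS boot file policies mentioned above, here is a minimal sketch of the idea in Python. The actual Windows DHCP policies are not shown in this thread, so the mapping table and the function name `boot_file_for` are assumptions for illustration only. The arch codes are the PXE client architecture values carried in the vendor class string (0 for BIOS, 7 for the UEFI client seen in the capture later in this thread).

```python
import re

# Assumed mapping (not taken from the poster's DHCP server):
# arch 0 = legacy BIOS client, arch 7 = UEFI client as seen in the capture.
BOOT_FILES = {
    0: "undionly.kpxe",
    7: "ipxe.efi",
}

def boot_file_for(vendor_class):
    """Pick a boot file from a DHCP option 60 string, or None if it is not a PXE client."""
    match = re.match(r"PXEClient:Arch:(\d{5})", vendor_class)
    if match is None:
        return None
    return BOOT_FILES.get(int(match.group(1)))

# Vendor class string copied from the UEFI capture in this thread.
print(boot_file_for("PXEClient:Arch:00007:UNDI:003001"))
```

A real policy would be configured on the DHCP server itself; the sketch only shows which field the decision is keyed on.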
I observe similar behavior on some PCs, but haven’t really looked much into it yet (they are connected to small home switches, which are eventually connected to Juniper switch)
Any pointer in the right direction will be appreciated.
-
@zbe I assume you have wireshark setup on a witness computer because you make reference to DISCOVER and OFFER. Is that witness computer on your lan or your proxmox?
Is your windows dhcp server on proxmox or is it external/physical?
Typically, when you see DISCOVER and OFFER with no REQUEST, the client computer is not getting what it asked for in the DISCOVER, so it asks again, hoping someone will send an OFFER with the parameters it needs.
When you pxe boot a device the very first DORA process is the uefi firmware asking for dhcp configuration. This is even before ipxe is in the picture. So the boot loader ipxe.efi or snp.efi for uefi and undionly.kpxe for bios computers does not matter.
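To see at a glance which DORA exchange stalls, one option is to tally DHCP message types per transaction ID (xid) from the text output of `tcpdump -vv`. The following is a rough sketch, not a full parser: the function names and the sample text are made up for illustration, and it assumes each packet prints its `xid` before its `DHCP-Message (53)` option, as tcpdump normally does.

```python
import re
from collections import Counter, defaultdict

# One match per packet: the xid, then the first DHCP message type after it.
PACKET = re.compile(
    r"xid (0x[0-9a-f]+).*?DHCP-Message \(53\), length 1: (\w+)", re.DOTALL
)

def tally_by_xid(capture_text):
    """Count DHCP message types for every transaction ID in the capture text."""
    counts = defaultdict(Counter)
    for xid, message in PACKET.findall(capture_text):
        counts[xid][message] += 1
    return counts

def completed(counter):
    """True if the transaction saw Discover, Offer, Request and ACK at least once."""
    return all(counter[m] > 0 for m in ("Discover", "Offer", "Request", "ACK"))

# Hypothetical, heavily abbreviated capture: the first xid completes DORA,
# the second keeps cycling through Discover and Offer.
sample = """
BOOTP/DHCP, Request, xid 0xaaaa0001, DHCP-Message (53), length 1: Discover
BOOTP/DHCP, Reply, xid 0xaaaa0001, DHCP-Message (53), length 1: Offer
BOOTP/DHCP, Request, xid 0xaaaa0001, DHCP-Message (53), length 1: Request
BOOTP/DHCP, Reply, xid 0xaaaa0001, DHCP-Message (53), length 1: ACK
BOOTP/DHCP, Request, xid 0xbbbb0002, DHCP-Message (53), length 1: Discover
BOOTP/DHCP, Reply, xid 0xbbbb0002, DHCP-Message (53), length 1: Offer
BOOTP/DHCP, Request, xid 0xbbbb0002, DHCP-Message (53), length 1: Discover
BOOTP/DHCP, Reply, xid 0xbbbb0002, DHCP-Message (53), length 1: Offer
"""

for xid, counter in tally_by_xid(sample).items():
    print(xid, dict(counter), "complete" if completed(counter) else "stalled")
```

Fed with a real capture, the firmware's exchange and the boot loader's exchange show up as separate xids, which makes it easier to tell which of the two is the slow one.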
So if you try to remove the proxmox pxe boot client from the picture for a moment, can you pxe boot a physical computer? Do you see this same discover, offer sequence with a physical computer?
I can’t tell from your post if this is for a home lab or business. My only question around this is whether by chance you have 2 DHCP servers on your network. I’ve seen setups with 2 Windows servers (master/slave) where the slave DHCP is not configured correctly, so PXE booting randomly works or not depending on which DHCP server responds to the request.
Right now there are a lot of unknowns, so let’s first figure out where the problem is not.
-
@zbe I assume you have wireshark setup on a witness computer because you make reference to DISCOVER and OFFER. Is that witness computer on your lan or your proxmox?
I only used tcpdump on proxmox host so far and an “empty” VM for testing this.
Is your windows dhcp server on proxmox or is it external/physical?
It’s external - it’s a VM (DC) on a hyper-v server.
Typically, when you see DISCOVER and OFFER with no REQUEST, the client computer is not getting what it asked for in the DISCOVER, so it asks again, hoping someone will send an OFFER with the parameters it needs.
That was my reasoning as well, but looking at tcpdump packets I don’t see any difference between first ~300 OFFERs and the last, successful one.
I admit I only looked at terminal output of:
tcpdump -i vmbr0 port 67 or port 68 -e -n -vv
so I haven’t done a byte-for-byte comparison.
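Short of a byte-level diff, the option lists of two OFFERs can be compared directly from the tcpdump text. Below is a minimal sketch under the assumption that every option is printed as `Name (code), length N: value`; the function names are invented for this example, and the two sample strings are the option lists of the first and the last OFFER to iPXE from the capture posted further down in this thread.

```python
import re

# Matches an option header such as 'Lease-Time (51), length 4: '.
OPTION_HEADER = re.compile(r"(\S+) \((\d+)\), length (\d+): ")

def parse_options(packet_text):
    """Map option code -> (name, value) for one packet of tcpdump -vv text."""
    headers = list(OPTION_HEADER.finditer(packet_text))
    options = {}
    for i, header in enumerate(headers):
        # The value runs until the next option header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(packet_text)
        options[int(header.group(2))] = (
            header.group(1),
            packet_text[header.end():end].strip(),
        )
    return options

def diff_options(packet_a, packet_b):
    """Return {code: (entry_in_a, entry_in_b)} for every option that differs."""
    a, b = parse_options(packet_a), parse_options(packet_b)
    return {
        code: (a.get(code), b.get(code))
        for code in sorted(set(a) | set(b))
        if a.get(code) != b.get(code)
    }

early_offer = (
    'DHCP-Message (53), length 1: Offer Subnet-Mask (1), length 4: 255.255.255.128 '
    'RN (58), length 4: 21600 RB (59), length 4: 37800 Lease-Time (51), length 4: 43200 '
    'Server-ID (54), length 4: x.x.x.5 Default-Gateway (3), length 4: x.x.x.1 '
    'Domain-Name-Server (6), length 4: x.x.x.5 Domain-Name (15), length 13: "edus.lokalno^@" '
    'TFTP (66), length 12: "x.x.x.9^@" BF (67), length 9: "ipxe.efi^@"'
)
late_offer = early_offer  # the last OFFER in the capture prints the same options

print(diff_options(early_offer, late_offer))
```

An empty result only says that the decoded options match. Differences in BOOTP header fields or padding would still need a real pcap and a byte-level comparison.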
When you pxe boot a device the very first DORA process is the uefi firmware asking for dhcp configuration. This is even before ipxe is in the picture. So the boot loader ipxe.efi or snp.efi for uefi and undionly.kpxe for bios computers does not matter.
Well, the long 300s wait happens on second DORA, after boot file is downloaded.
I’ll try putting a screenshot at the end with copy/paste of some tcpdump output.
So if you try to remove the proxmox pxe boot client from the picture for a moment, can you pxe boot a physical computer? Do you see this same discover, offer sequence with a physical computer?
I see same behavior on some physical PCs, but I haven’t actually done any packet capture there yet.
Capture/deployment on some of these PCs, and this waiting time, is what actually annoyed me enough to look into this more.
But I attributed it to perhaps some of these PCs being connected to some home-tplink-style 5-port switches (lack of cabling :P) and STP and whatnot.
I can’t tell from your post if this is for a home lab or business.
It’s a school. (So can’t do any capturing of physical PCs right now.)
My only question around this is whether by chance you have 2 DHCP servers on your network. I’ve seen setups with 2 Windows servers (master/slave) where the slave DHCP is not configured correctly, so PXE booting randomly works or not depending on which DHCP server responds to the request.
No, it’s a single DHCP as far as this network is concerned. (There is another one on different VLAN, but proxmox, hyper-v and any other end-devices aren’t VLAN aware. They are all connected to access ports.)
Right now there are a lot of unknowns, so let’s first figure out where the problem is not.
Yes, indeed. That’s why I said I’m not even sure it’s a FOG issue.
Screenshot from proxmox vm boot:
A)
root@pve:~# tcpdump -i vmbr0 port 67 or port 68 -e -n -vv
tcpdump: listening on vmbr0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:19:17.181436 82:84:ed:04:a2:59 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 389: (tos 0x0, ttl 64, id 44425, offset 0, flags [none], proto UDP (17), length 375)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 82:84:ed:04:a2:59, length 347, xid 0x1aa0fbf6, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 82:84:ed:04:a2:59
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Discover
        MSZ (57), length 2: 1472
        Parameter-Request (55), length 35:
          Subnet-Mask (1), Time-Zone (2), Default-Gateway (3), Time-Server (4)
          IEN-Name-Server (5), Domain-Name-Server (6), Hostname (12), BS (13)
          Domain-Name (15), RP (17), EP (18), RSZ (22)
          TTL (23), BR (28), YD (40), YS (41)
          NTP (42), Vendor-Option (43), Requested-IP (50), Lease-Time (51)
          Server-ID (54), RN (58), RB (59), Vendor-Class (60)
          TFTP (66), BF (67), GUID (97), Unknown (128)
          Unknown (129), Unknown (130), Unknown (131), Unknown (132)
          Unknown (133), Unknown (134), Unknown (135)
        GUID (97), length 17: 0.53.246.81.35.77.101.93.69.156.163.70.76.167.27.65.180
        NDI (94), length 3: 1.3.1
        ARCH (93), length 2: 7
        Vendor-Class (60), length 32: "PXEClient:Arch:00007:UNDI:003001"
16:19:17.182898 00:15:5d:01:b8:03 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 368: (tos 0x0, ttl 128, id 15368, offset 0, flags [none], proto UDP (17), length 354)
    x.x.x.5.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 326, xid 0x1aa0fbf6, Flags [none] (0x0000)
      Your-IP x.x.x.11
      Server-IP x.x.x.9
      Client-Ethernet-Address 82:84:ed:04:a2:59
      file "ipxe.efi"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Offer
        Subnet-Mask (1), length 4: 255.255.255.128
        RN (58), length 4: 21600
        RB (59), length 4: 37800
        Lease-Time (51), length 4: 43200
        Server-ID (54), length 4: x.x.x.5
        Default-Gateway (3), length 4: x.x.x.1
        Domain-Name-Server (6), length 4: x.x.x.5
        Domain-Name (15), length 13: "edus.lokalno^@"
        TFTP (66), length 12: "x.x.x.9^@"
        BF (67), length 9: "ipxe.efi^@"
16:19:21.135659 82:84:ed:04:a2:59 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 401: (tos 0x0, ttl 64, id 44426, offset 0, flags [none], proto UDP (17), length 387)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 82:84:ed:04:a2:59, length 359, xid 0x1aa0fbf6, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 82:84:ed:04:a2:59
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Request
        Server-ID (54), length 4: x.x.x.5
        Requested-IP (50), length 4: x.x.x.11
        MSZ (57), length 2: 65280
        Parameter-Request (55), length 35:
          Subnet-Mask (1), Time-Zone (2), Default-Gateway (3), Time-Server (4)
          IEN-Name-Server (5), Domain-Name-Server (6), Hostname (12), BS (13)
          Domain-Name (15), RP (17), EP (18), RSZ (22)
          TTL (23), BR (28), YD (40), YS (41)
          NTP (42), Vendor-Option (43), Requested-IP (50), Lease-Time (51)
          Server-ID (54), RN (58), RB (59), Vendor-Class (60)
          TFTP (66), BF (67), GUID (97), Unknown (128)
          Unknown (129), Unknown (130), Unknown (131), Unknown (132)
          Unknown (133), Unknown (134), Unknown (135)
        GUID (97), length 17: 0.53.246.81.35.77.101.93.69.156.163.70.76.167.27.65.180
        NDI (94), length 3: 1.3.1
        ARCH (93), length 2: 7
        Vendor-Class (60), length 32: "PXEClient:Arch:00007:UNDI:003001"
16:19:21.137080 00:15:5d:01:b8:03 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 368: (tos 0x0, ttl 128, id 15369, offset 0, flags [none], proto UDP (17), length 354)
    x.x.x.5.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 326, xid 0x1aa0fbf6, Flags [none] (0x0000)
      Your-IP x.x.x.11
      Server-IP x.x.x.9
      Client-Ethernet-Address 82:84:ed:04:a2:59
      file "ipxe.efi"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: ACK
        RN (58), length 4: 21600
        RB (59), length 4: 37800
        Lease-Time (51), length 4: 43200
        Server-ID (54), length 4: x.x.x.5
        Subnet-Mask (1), length 4: 255.255.255.128
        Default-Gateway (3), length 4: x.x.x.1
        Domain-Name-Server (6), length 4: 95.87.170.5
        Domain-Name (15), length 13: "edus.lokalno^@"
        TFTP (66), length 12: "x.x.x.9^@"
        BF (67), length 9: "ipxe.efi^@"
B)
16:19:25.403481 82:84:ed:04:a2:59 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 442: (tos 0x0, ttl 64, id 310, offset 0, flags [none], proto UDP (17), length 428)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 82:84:ed:04:a2:59, length 400, xid 0x239c1627, secs 4, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 82:84:ed:04:a2:59
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Discover
        MSZ (57), length 2: 1472
        ARCH (93), length 2: 7
        NDI (94), length 3: 1.3.10
        Vendor-Class (60), length 32: "PXEClient:Arch:00007:UNDI:003010"
        User-Class (77), length 4: instance#1: [ERROR: invalid option]
        Parameter-Request (55), length 23:
          Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), LOG (7)
          Hostname (12), Domain-Name (15), RP (17), MTU (26)
          Vendor-Option (43), Vendor-Class (60), TFTP (66), BF (67)
          Unknown (119), Unknown (128), Unknown (129), Unknown (130)
          Unknown (131), Unknown (132), Unknown (133), Unknown (134)
          Unknown (135), Unknown (175), Unknown (203)
        Unknown (175), length 48: 2969895194,4094689515,50402561,385941796,16848385,18022657,335610129,16852737,19464449,352387366,16849665,17957121
        Client-ID (61), length 7: ether 82:84:ed:04:a2:59
        GUID (97), length 17: 0.53.246.81.35.77.101.93.69.156.163.70.76.167.27.65.180
16:19:25.404582 00:15:5d:01:b8:03 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 368: (tos 0x0, ttl 128, id 15370, offset 0, flags [none], proto UDP (17), length 354)
    x.x.x.5.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 326, xid 0x239c1627, Flags [none] (0x0000)
      Your-IP x.x.x.11
      Server-IP x.x.x.9
      Client-Ethernet-Address 82:84:ed:04:a2:59
      file "ipxe.efi"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Offer
        Subnet-Mask (1), length 4: 255.255.255.128
        RN (58), length 4: 21600
        RB (59), length 4: 37800
        Lease-Time (51), length 4: 43200
        Server-ID (54), length 4: x.x.x.5
        Default-Gateway (3), length 4: x.x.x.1
        Domain-Name-Server (6), length 4: x.x.x.5
        Domain-Name (15), length 13: "edus.lokalno^@"
        TFTP (66), length 12: "x.x.x.9^@"
        BF (67), length 9: "ipxe.efi^@"
These repeat x300 and at the end:
C)
16:22:25.375712 82:84:ed:04:a2:59 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 442: (tos 0x0, ttl 64, id 46568, offset 0, flags [none], proto UDP (17), length 428)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 82:84:ed:04:a2:59, length 400, xid 0x239c1627, secs 726, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 82:84:ed:04:a2:59
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Discover
        MSZ (57), length 2: 1472
        ARCH (93), length 2: 7
        NDI (94), length 3: 1.3.10
        Vendor-Class (60), length 32: "PXEClient:Arch:00007:UNDI:003010"
        User-Class (77), length 4: instance#1: [ERROR: invalid option]
        Parameter-Request (55), length 23:
          Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), LOG (7)
          Hostname (12), Domain-Name (15), RP (17), MTU (26)
          Vendor-Option (43), Vendor-Class (60), TFTP (66), BF (67)
          Unknown (119), Unknown (128), Unknown (129), Unknown (130)
          Unknown (131), Unknown (132), Unknown (133), Unknown (134)
          Unknown (135), Unknown (175), Unknown (203)
        Unknown (175), length 48: 2969895194,4094689515,50402561,385941796,16848385,18022657,335610129,16852737,19464449,352387366,16849665,17957121
        Client-ID (61), length 7: ether 82:84:ed:04:a2:59
        GUID (97), length 17: 0.53.246.81.35.77.101.93.69.156.163.70.76.167.27.65.180
16:22:25.376696 00:15:5d:01:b8:03 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 368: (tos 0x0, ttl 128, id 15550, offset 0, flags [none], proto UDP (17), length 354)
    x.x.x.5.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 326, xid 0x239c1627, Flags [none] (0x0000)
      Your-IP x.x.x.11
      Server-IP x.x.x.9
      Client-Ethernet-Address 82:84:ed:04:a2:59
      file "ipxe.efi"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Offer
        Subnet-Mask (1), length 4: 255.255.255.128
        RN (58), length 4: 21600
        RB (59), length 4: 37800
        Lease-Time (51), length 4: 43200
        Server-ID (54), length 4: x.x.x.5
        Default-Gateway (3), length 4: x.x.x.1
        Domain-Name-Server (6), length 4: x.x.x.5
        Domain-Name (15), length 13: "edus.lokalno^@"
        TFTP (66), length 12: "x.x.x.9^@"
        BF (67), length 9: "ipxe.efi^@"
16:22:27.375677 82:84:ed:04:a2:59 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 454: (tos 0x0, ttl 64, id 46843, offset 0, flags [none], proto UDP (17), length 440)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 82:84:ed:04:a2:59, length 412, xid 0x239c1627, secs 734, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 82:84:ed:04:a2:59
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: Request
        MSZ (57), length 2: 1472
        ARCH (93), length 2: 7
        NDI (94), length 3: 1.3.10
        Vendor-Class (60), length 32: "PXEClient:Arch:00007:UNDI:003010"
        User-Class (77), length 4: instance#1: [ERROR: invalid option]
        Parameter-Request (55), length 23:
          Subnet-Mask (1), Default-Gateway (3), Domain-Name-Server (6), LOG (7)
          Hostname (12), Domain-Name (15), RP (17), MTU (26)
          Vendor-Option (43), Vendor-Class (60), TFTP (66), BF (67)
          Unknown (119), Unknown (128), Unknown (129), Unknown (130)
          Unknown (131), Unknown (132), Unknown (133), Unknown (134)
          Unknown (135), Unknown (175), Unknown (203)
        Unknown (175), length 48: 2969895194,4094689515,50402561,385941796,16848385,18022657,335610129,16852737,19464449,352387366,16849665,17957121
        Client-ID (61), length 7: ether 82:84:ed:04:a2:59
        GUID (97), length 17: 0.53.246.81.35.77.101.93.69.156.163.70.76.167.27.65.180
        Server-ID (54), length 4: x.x.x.5
        Requested-IP (50), length 4: x.x.x.11
16:22:27.376786 00:15:5d:01:b8:03 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 368: (tos 0x0, ttl 128, id 15551, offset 0, flags [none], proto UDP (17), length 354)
    x.x.x.5.67 > 255.255.255.255.68: [udp sum ok] BOOTP/DHCP, Reply, length 326, xid 0x239c1627, Flags [none] (0x0000)
      Your-IP x.x.x.11
      Server-IP x.x.x.9
      Client-Ethernet-Address 82:84:ed:04:a2:59
      file "ipxe.efi"
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message (53), length 1: ACK
        RN (58), length 4: 21600
        RB (59), length 4: 37800
        Lease-Time (51), length 4: 43200
        Server-ID (54), length 4: x.x.x.5
        Subnet-Mask (1), length 4: 255.255.255.128
        Default-Gateway (3), length 4: x.x.x.1
        Domain-Name-Server (6), length 4: x.x.x.5
        Domain-Name (15), length 13: "edus.lokalno^@"
        TFTP (66), length 12: "x.x.x.9^@"
        BF (67), length 9: "ipxe.efi^@"
x.x.x.5 = Windows DC with DHCP
x.x.x.9 = FOG
x.x.x.11 = test proxmox vm
(Network really is /25)
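The actual length of the wait can be read off the capture timestamps. A small sketch, assuming both timestamps fall on the same day (tcpdump prints no date here); the helper name `elapsed_seconds` is made up for this example, and the timestamps are the first iPXE Discover in B) and the final Discover and Request in C).

```python
from datetime import datetime

def elapsed_seconds(start, end):
    """Seconds between two tcpdump HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

first_discover = "16:19:25.403481"  # first iPXE Discover, capture B)
last_discover = "16:22:25.375712"   # last Discover before the Request, capture C)
final_request = "16:22:27.375677"   # the Request that finally goes out, capture C)

print(f"Discover loop: {elapsed_seconds(first_discover, last_discover):.1f} s")
print(f"Until Request: {elapsed_seconds(first_discover, final_request):.1f} s")
```

So iPXE keeps sending Discovers for almost exactly 180 seconds, and the Request follows about 2 seconds later.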
Thanks for your time.
-
I’m sorry, I just realized I put the wrong number of seconds in the title.
It’s actually 3 minutes (+10s), so it should be
“180 (or 190) seconds […]”. If anyone can edit the title, please do.
-
@zbe So from your picture it appears the wait time is coming from iPXE getting an IP address?
What version of FOG are you using?
I have seen/know that standard spanning tree will add about a 30 second delay to pxe booting because it listens for the BPDU packets for 27 seconds before forwarding data, where portfast/fast-stp/rstp will start forwarding right away while listening for the BPDU packets, but not a 3 minute delay.
I do have a tutorial on pxe booting with a slightly different tcpdump command. Could you grab a new pcap of the failing pxe boot sequence using the command in this tutorial then upload the pcap to a file share site? DM me using FOG chat the url and I’ll take a look at the packet capture. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue?_=1681044745660
There has to be something going on here we don’t expect.
-
@zbe So from your picture it appears the wait time is coming from iPXE getting an IP address?
Yes.
What version of FOG are you using?
1.5.10
Could you grab a new pcap of the failing pxe boot sequence
Well, technically it’s not failing, it just takes 180 seconds for an as yet unknown reason.
–
I PM-ed you 6 captures. 3 were done booting UEFI and the other 3 booting BIOS.
I used 3 different network devices:
- VirtIO paravirtualized
- Intel E1000
- Realtek RTL8139
As you’ll see, almost every configuration behaves somewhat differently. It makes no sense to me.
I can do captures of physical PCs sometime next week, hopefully.