UEFI pxe boot problem from a network
-
Several VLANs:
- 148.60.0.0 255.255.248.0 (148.60.0.0 > 148.60.7.255)
  (FOG server VLAN: FOG server 148.60.4.1, dhcp 148.60.4.3, router 148.60.7.254)
- 148.60.8.0 255.255.255.0 (148.60.8.0 > 148.60.8.255)
  router 148.60.8.254, no dhcp
- 148.60.10.0 255.255.255.0 (148.60.10.0 > 148.60.10.255)
  dhcp 148.60.10.252, router 148.60.10.254 (VLAN with the deployment problem)
- 148.60.11.0 255.255.255.0 (148.60.11.0 > 148.60.11.255)
  dhcp 148.60.11.248, router 148.60.11.254
- 148.60.12.0 255.255.255.0 (148.60.12.0 > 148.60.12.255)
  dhcp 148.60.11.252, router 148.60.12.254
- 148.60.13.0 255.255.255.0 (148.60.13.0 > 148.60.13.255)
  dhcp 148.60.13.248, router 148.60.13.254
- 148.60.14.0 255.255.255.0 (148.60.14.0 > 148.60.14.255)
  dhcp 148.60.14.252, router 148.60.14.254
- 148.60.15.0 255.255.255.0 (148.60.15.0 > 148.60.15.255)
  dhcp 148.60.15.109 (its native VLAN), router 148.60.15.254
-
-
@george1421
Here is the capture from the FOG server, with the client in UEFI mode:
uefi.pcap
-
@lebrun78 Well I’m not sure how to explain this situation but @Sebastian-Roth is spot on.
First the easy part, it appears there are 2 dhcp servers (or configurations) involved here. The reason why I say that is that they are giving different responses to the pxe boot request. If you look at the pcap on the working subnet it responds with dhcp option 12, the not working pcap does not include dhcp option 12. This is only important to show there are different settings for these two pcaps.
Now to the hard part to explain.
On the working subnet:
Client IP: 148.60.3.152
Subnet Mask: 255.255.248.0
Gateway: 148.60.7.254
Subnet Range: 148.60.0.1-148.60.7.254
On the not working subnet:
Client IP: 148.60.10.193
Subnet Mask: 255.255.248.0
Gateway: 148.60.7.254
Subnet Range: 148.60.8.1-148.60.15.254
So now to identify the problem. If you look at the not working subnet you will see the gateway IP address is outside the usable range of the client’s subnet. The gateway address is 148.60.7.254 but the subnet base address is 148.60.8.0. So it’s not possible for the client to reach the router to get outside of the subnet and connect to the FOG server at 148.60.4.1. At this time the problem is infrastructure related and not FOG.
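The subnet arithmetic above can be checked mechanically. Here is a minimal Python sketch using the standard ipaddress module. The helper name gateway_on_link is made up for illustration, and the addresses are the ones read from the two pcaps.

```python
import ipaddress

def gateway_on_link(client_ip, netmask, gateway):
    """Return True if the gateway lies inside the client's own subnet."""
    # ip_interface accepts "address/netmask" and derives the network from it
    iface = ipaddress.ip_interface(f"{client_ip}/{netmask}")
    return ipaddress.ip_address(gateway) in iface.network

# Values taken from the working and not working pcaps
working = gateway_on_link("148.60.3.152", "255.255.248.0", "148.60.7.254")
broken = gateway_on_link("148.60.10.193", "255.255.248.0", "148.60.7.254")

print("working subnet, gateway on link:", working)
print("not working subnet, gateway on link:", broken)
```

With a /21 mask the client at 148.60.10.193 lands in 148.60.8.0/21, so a router at 148.60.7.254 is never directly reachable from it.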
-
@george1421
“At this time the problem is infrastructure related and not FOG.” I agree.
I don’t understand the boot dhcp response on vlan 10.
The ip configuration is good when loaded !
-
@lebrun78 said in UEFI pxe boot problem from a network:
The ip configuration is good when loaded !
You mean when Windows boots it’s correct?
-
@lebrun78 said in UEFI pxe boot problem from a network:
I don’t understand the boot dhcp response on vlan 10.
Looking at the dhcp packet from your main dhcp server, it’s giving out the wrong default router address for this subnet. So any computer that uses dhcp should not be able to connect to any device beyond its local subnet. It’s impossible, since the router it is being told to use to leave the local subnet is on a different subnet to start with.
You should contact your infrastructure staff and ask them to confirm the dhcp settings are correct for this subnet. If I had to guess, I would think they just copied the settings from the subnet where your FOG server is, pasted them into the vlan 10 subnet configuration, and missed the router value. But that is only a guess made from 6600km away.
-
@Sebastian-Roth
Yes, ipconfig is good.
-
@george1421
I’m the infrastructure manager, the dhcpd.conf is the one in the first post …
-
@lebrun78 I’m going to have to look into this, but I have to ask: why do the dhcp servers have two different IP addresses? Each one is listed in the pcaps.
-
Yes, we don’t have a dhcp relay. The dhcp server has several virtual network interfaces, one on each vlan.
-
@lebrun78 I can’t see from the config how/why it’s sending out the wrong router address unless something in include "/etc/dhcp/vip.conf"; is doing it.
Wait, there is something strange going on here. Look at the base address and the subnet mask as defined.
subnet 148.60.10.0 netmask 255.255.255.0 {
    ##########################################
    option domain-name-servers 148.60.15.109,148.60.15.106 ;
    option domain-name "istic.univ-rennes1.fr" ;
    option routers 148.60.10.254 ;
    option subnet-mask 255.255.255.0 ;
    default-lease-time 600 ;
    max-lease-time 1200 ;
    group {
        # We comment out the next two lines to avoid the FOG menu
        next-server 148.60.4.1;
But look at what the pcap shows the client is being told.
As you see in the picture, the client is being told that its subnet mask is 255.255.248.0, but your config file says 255.255.255.0. The client is being told the router is 148.60.7.254, but your config file says 148.60.10.254.
So I’ll ask you the same question again in a different way. Is dhcp server 148.60.10.252 and 148.60.4.3 the same computer? If it is do you have 2 different instances of isc-dhcp server running, where each instance is bound to a different network interface? Something is strange with the 148.60.10.252 dhcp server.
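One way to narrow down where the wrong values come from is to ask which declared subnet the offered mask and router actually belong to. The Python sketch below does that for two of the subnets. It assumes the dhcpd.conf declaration for 148.60.0.0 carries the mask and router given in the VLAN list at the top of the thread; the DECLARED table and the function name are illustrative only.

```python
import ipaddress

# Subnet -> router, as listed in the thread (assumed to mirror dhcpd.conf)
DECLARED = {
    "148.60.0.0/255.255.248.0": "148.60.7.254",
    "148.60.10.0/255.255.255.0": "148.60.10.254",
}

def matching_declarations(mask, router):
    """List the declared subnets whose netmask and router equal the offered ones."""
    found = []
    for subnet, declared_router in DECLARED.items():
        net = ipaddress.ip_network(subnet)
        if str(net.netmask) == mask and declared_router == router:
            found.append(str(net))
    return found

# Mask and router seen in the pcap from the client on vlan 10
matches = matching_declarations("255.255.248.0", "148.60.7.254")
print(matches)
```

Under those assumptions the offered options match the 148.60.0.0/21 declaration, not the vlan 10 one, which suggests the server is picking options from the wrong subnet block and not inventing them.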
-
So I’ll ask you the same question again in a different way. Is dhcp server 148.60.10.252 and 148.60.4.3 the same computer?
YES
I have only one dhcpd.conf file, so only one instance of dhcp.
Here is what I get on the same machine on vlan 148.60.10.0/24 when Windows is loaded:
It’s crazy, no?
-
I booted the Windows machine twice: once with UEFI PXE boot and once from the hard drive.
I get these logs on my dhcp server:
Apr 6 09:46:02 sybille2 dhcpd: PXEClient:Arch:00007:UNDI:003016
Apr 6 09:46:02 sybille2 dhcpd: DHCPDISCOVER from 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:03 sybille2 dhcpd: DHCPOFFER on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:05 sybille2 dhcpd: PXEClient:Arch:00007:UNDI:003016
Apr 6 09:46:05 sybille2 dhcpd: DHCPREQUEST for 148.60.10.198 (148.60.10.252) from 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:05 sybille2 dhcpd: DHCPACK on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:41 sybille2 dhcpd: MSFT 5.0
Apr 6 09:46:41 sybille2 dhcpd: DHCPDISCOVER from 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:42 sybille2 dhcpd: DHCPOFFER on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: MSFT 5.0
Apr 6 09:46:42 sybille2 dhcpd: DHCPREQUEST for 148.60.10.190 (148.60.10.252) from 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: DHCPACK on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: Unable to add forward map from MININT-S9D1BSU.istic.univ-rennes1.fr to 148.60.10.190: not found
The same machine gets two different IPs: 148.60.10.198 at 09:46:03 (PXE boot) and 148.60.10.190 at 09:46:42 (Windows boot).
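To pull the same conclusion out of a longer log, something like the following Python sketch could work. It assumes the syslog line layout shown in the log above; the function name acks_by_mac and the regex are illustrative, not part of dhcpd or FOG.

```python
import re
from collections import defaultdict

# Matches e.g. "09:46:05 sybille2 dhcpd: DHCPACK on 148.60.10.198 to 10:65:30:83:5c:4b"
ACK = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \S+ dhcpd: DHCPACK on (\S+) to "
    r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)

def acks_by_mac(log_text):
    """Group (time, ip) pairs of every DHCPACK by client MAC address."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        m = ACK.search(line)
        if m:
            time, ip, mac = m.groups()
            result[mac].append((time, ip))
    return dict(result)

# A few lines copied from the log in this thread
LOG = """\
Apr 6 09:46:03 sybille2 dhcpd: DHCPOFFER on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:05 sybille2 dhcpd: DHCPACK on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:41 sybille2 dhcpd: MSFT 5.0
Apr 6 09:46:42 sybille2 dhcpd: DHCPACK on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
"""

acks = acks_by_mac(LOG)
for mac, entries in acks.items():
    print(mac, entries)
```

A MAC that shows more than one acknowledged address within a minute, as here, is the pattern to look for.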
-
@george1421 said in UEFI pxe boot problem from a network:
I really don’t understand how this is possible. I can understand the dhcp server giving it a new IP address as it’s booting; I’ve seen that before. What I don’t understand is how it would give out information that is not from its pool. That is totally confusing. If it was giving the complete information from the wrong pool I might understand, but the original pcap has the right IP address range and the wrong router and subnet information.
Can you grab a pcap from a witness computer on this vlan 10 using this new capture filter? PXE boot it to the error and then let it boot into Windows. I want to see the response to both dhcp requests.
port 67 or port 68 and ether host 10:65:30:83:5c:4b
The only thing I can think of that we might do is create a second instance of the dhcp server, adjust the dhcp config files accordingly, and then bind each instance, with its config file, to the proper interface. Your setup is not a traditional one using dhcp helper services and a single dhcp interface on the server. It’s possible that the dhcp server is getting confused about where the bootp request is coming from. Right now I’m just grabbing at ideas, because what you are reporting should not happen.
-
@george1421
Hello George,
This morning I ran a test by swapping the order of the subnet declarations.
In fact, the client receives the mask and router of whichever subnet is declared first…
-
@lebrun78 Hey, I was just comparing your dhcp config file with an Ubuntu dual-interface example. I loaded your configuration into notepad++ and it pointed out you have an extra curly brace at the end of your config file. I don’t know if this was a typo when you pasted it in or if you really do have an extra curly brace in the config.
-
@george1421 Never mind, I just got excited for finding nothing. Still looking into the setup.
-
@george1421
I have just searched for the extra curly brace with notepad++, and I didn’t see the problem.
The file in the first post is an extract; you can view the production file here:
https://filesender.renater.fr/?s=download&token=11cc357f-4663-41c8-830b-71938d2d2aa7
-
@lebrun78 I just had an idea. Maybe this is caused by a problematic entry in the DHCP leases cache file? Take a look at /var/lib/dhcpd/dhcpd.leases. I’m not really sure what we are looking for, but you might search that file for the pattern 148.60.10. to see what leases are in the store. If you find something concerning, then I would stop the dhcp service for a second, make a backup copy of that file, edit it to remove the problematic entry, and start the dhcp service up again.
As you seem to have a lot of fixed addresses defined, you might not even care much about the leases. In that case you could even clear the whole leases file (stop dhcp first) and see if it makes a difference.
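If grepping by hand gets tedious, a rough Python sketch like this one could list the leases for that prefix. It assumes the usual "lease <ip> { ... }" block layout of an ISC dhcpd.leases file; the SAMPLE text is made up for illustration (it only reuses the MAC and addresses from this thread), and on the real server you would read /var/lib/dhcpd/dhcpd.leases instead.

```python
import re

# One "lease <ip> { ... }" block; the body is everything up to the closing brace line
LEASE_BLOCK = re.compile(r"lease\s+(\S+)\s*\{(.*?)\n\}", re.DOTALL)
MAC = re.compile(r"hardware ethernet\s+([0-9a-f:]+);")

def leases_with_prefix(text, prefix):
    """Return (ip, mac) for every lease whose address starts with prefix."""
    found = []
    for ip, body in LEASE_BLOCK.findall(text):
        if ip.startswith(prefix):
            mac = MAC.search(body)
            found.append((ip, mac.group(1) if mac else None))
    return found

# Hypothetical sample content, NOT taken from the real leases file
SAMPLE = """\
lease 148.60.10.198 {
  binding state active;
  hardware ethernet 10:65:30:83:5c:4b;
}
lease 148.60.11.20 {
  binding state free;
  hardware ethernet 00:11:22:33:44:55;
}
lease 148.60.10.190 {
  binding state active;
  hardware ethernet 10:65:30:83:5c:4b;
}
"""

vlan10 = leases_with_prefix(SAMPLE, "148.60.10.")
for ip, mac in vlan10:
    print(ip, mac)
```

Keeping the trailing dot in the prefix avoids matching addresses such as 148.60.100.x by accident.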