UEFI pxe boot problem from a network
-
@george1421
“At this time the problem is infrastructure related and not FOG.” I agree.
I don’t understand the boot dhcp response on vlan 10.
The ip configuration is good when loaded !
-
@lebrun78 said in UEFI pxe boot problem from a network:
The ip configuration is good when loaded !
You mean when Windows boots it’s correct?
-
@lebrun78 said in UEFI pxe boot problem from a network:
I don’t understand the boot dhcp response on vlan 10.
Looking at the DHCP packet from your main DHCP server, it’s giving out the wrong default router address for this subnet. So any computer that uses DHCP should not be able to reach any device beyond its local subnet. That is impossible, since the router it is being told to use to leave the local subnet is on a different subnet to start with.
You should contact your infrastructure staff and ask them to confirm the DHCP settings are correct for this subnet. If I had to guess, I would think they just copied the settings from the subnet where your FOG server is, pasted them into the vlan 10 subnet configuration, and missed the router value. But that is only a guess made from 6600 km away.
-
@Sebastian-Roth
Yes, ipconfig is good.
-
@george1421
I’m the infrastructure manager; the dhcpd.conf is the one in the first post …
-
@lebrun78 I’m going to have to look into this, but I have to ask: why does the DHCP server have two different IP addresses? Both are listed in the pcaps.
-
Yes, we don’t have a DHCP relay. The DHCP server has several virtual network interfaces, one on each vlan.
-
@lebrun78 I can’t see from the config how or why it’s sending out the wrong router address, unless something in
include "/etc/dhcp/vip.conf";
is doing it.
Wait, there is something strange going on here. Look at the base address and the subnet mask as defined.
subnet 148.60.10.0 netmask 255.255.255.0 {
    ##########################################
    option domain-name-servers 148.60.15.109,148.60.15.106 ;
    option domain-name "istic.univ-rennes1.fr" ;
    option routers 148.60.10.254 ;
    option subnet-mask 255.255.255.0 ;
    default-lease-time 600 ;
    max-lease-time 1200 ;
    group {
        # Comment out the following two lines to avoid the FOG menu
        next-server 148.60.4.1;
But look at what the pcap shows the client is being told.
As you see in the picture, the client is being told that its subnet mask is 255.255.248.0, but your config file says 255.255.255.0. The client is being told the router is 148.60.7.254, but your config file says 148.60.10.254.
So I’ll ask you the same question again in a different way. Is dhcp server 148.60.10.252 and 148.60.4.3 the same computer? If it is, do you have 2 different instances of the isc-dhcp server running, where each instance is bound to a different network interface? Something is strange with the 148.60.10.252 DHCP server.
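To make the mismatch concrete, here is a small Python sketch that checks whether a router address falls inside the client's own subnet. It uses the mask and router values quoted above; the client address 148.60.10.100 is just an illustrative address from the vlan 10 range, and the helper name `router_is_reachable` is made up for this example.

```python
import ipaddress


def router_is_reachable(client_ip: str, netmask: str, router: str) -> bool:
    """Return True if the router lies inside the client's own subnet."""
    # strict=False accepts a host address and returns its enclosing network.
    network = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(router) in network


# 148.60.10.100 is an assumed example client address on vlan 10.
# Values from dhcpd.conf: mask 255.255.255.0, router 148.60.10.254.
print(router_is_reachable("148.60.10.100", "255.255.255.0", "148.60.10.254"))  # True
# Values seen in the pcap: mask 255.255.248.0, router 148.60.7.254.
print(router_is_reachable("148.60.10.100", "255.255.248.0", "148.60.7.254"))  # False
```

With the pcap values the client ends up in 148.60.8.0/21, which does not contain 148.60.7.254, so the client has no usable gateway.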
-
So I’ll ask you the same question again in a different way. Is dhcp server 148.60.10.252 and 148.60.4.3 the same computer?
YES
I have only one dhcpd.conf file, so only one instance of dhcp.
Here is what I get on the same machine on vlan 148.60.10.0/24 when Windows is loaded:
It’s crazy, no?
-
I booted the Windows machine twice: once with UEFI PXE boot and once from the hard drive.
I get these logs on my DHCP server:
Apr 6 09:46:02 sybille2 dhcpd: PXEClient:Arch:00007:UNDI:003016
Apr 6 09:46:02 sybille2 dhcpd: DHCPDISCOVER from 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:03 sybille2 dhcpd: DHCPOFFER on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:05 sybille2 dhcpd: PXEClient:Arch:00007:UNDI:003016
Apr 6 09:46:05 sybille2 dhcpd: DHCPREQUEST for 148.60.10.198 (148.60.10.252) from 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:05 sybille2 dhcpd: DHCPACK on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:41 sybille2 dhcpd: MSFT 5.0
Apr 6 09:46:41 sybille2 dhcpd: DHCPDISCOVER from 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:42 sybille2 dhcpd: DHCPOFFER on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: MSFT 5.0
Apr 6 09:46:42 sybille2 dhcpd: DHCPREQUEST for 148.60.10.190 (148.60.10.252) from 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: DHCPACK on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: Unable to add forward map from MININT-S9D1BSU.istic.univ-rennes1.fr to 148.60.10.190: not found
The same machine gets two different IPs: 148.60.10.198 at 09:46:03 (PXE boot) and 148.60.10.190 at 09:46:42 (hard drive boot).
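For anyone who wants to spot this pattern in a longer log, here is a rough Python sketch that groups the acknowledged addresses by MAC. The sample lines are copied from the log above; the function name `acked_addresses` and the regular expression are my own and only cover the DHCPACK line format shown here.

```python
import re
from collections import defaultdict

# Sample lines copied from the dhcpd log in this post.
LOG = """\
Apr 6 09:46:03 sybille2 dhcpd: DHCPOFFER on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:05 sybille2 dhcpd: DHCPACK on 148.60.10.198 to 10:65:30:83:5c:4b via em2.10
Apr 6 09:46:42 sybille2 dhcpd: DHCPOFFER on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
Apr 6 09:46:42 sybille2 dhcpd: DHCPACK on 148.60.10.190 to 10:65:30:83:5c:4b (MININT-S9D1BSU) via em2.10
"""

# Matches "DHCPACK on <ip> to <mac>" as it appears in the lines above.
ACK = re.compile(r"DHCPACK on (\d+\.\d+\.\d+\.\d+) to ([0-9a-f:]{17})")


def acked_addresses(log_text):
    """Map each MAC address to the distinct IPs acknowledged for it, in order."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        match = ACK.search(line)
        if match:
            ip, mac = match.groups()
            if ip not in result[mac]:
                result[mac].append(ip)
    return dict(result)


print(acked_addresses(LOG))
# {'10:65:30:83:5c:4b': ['148.60.10.198', '148.60.10.190']}
```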
-
@george1421 said in UEFI pxe boot problem from a network:
I really don’t understand how this is possible. I can understand the DHCP server giving it a new IP address as it’s booting. I’ve seen it before. What I don’t understand is how it would give it information that is not from its pool. That is totally confusing. If it were giving the complete information from the wrong pool I might understand, but the original pcap has the right IP address range and the wrong router and subnet information.
Can you grab a pcap from a witness computer on vlan 10 using this new capture filter? PXE boot it to the error and then let it boot into Windows. I want to see the response to both DHCP requests.
port 67 or port 68 and ether host 10:65:30:83:5c:4b
The only thing I can think of that we might do is create a second instance of the DHCP server, adjust the DHCP config files accordingly, and then bind each instance, with its own config file, to the proper interface. Your setup is not a traditional one using DHCP helper services and a single DHCP interface on the server. It’s possible that the DHCP server is getting confused about where the bootp request is coming from. Right now I’m just grabbing at ideas, because what you are reporting should not be happening.
-
@george1421
Hello George
This morning I ran a test, swapping the order of the subnet declarations.
It turns out the client gets the mask and router of whichever subnet is declared first…
-
@lebrun78 Hey, I was just comparing your DHCP config file with an Ubuntu dual-interface example. I loaded your configuration into notepad++ and it pointed out that you have an extra curly brace at the end of your config file. I don’t know if this was a typo when you pasted it in or if you really do have an extra curly brace in the config.
-
@george1421 Never mind, I just got excited for finding nothing. Still looking into the setup.
-
@george1421
I just searched for an extra curly brace with notepad++ and didn’t see the problem.
The file in the first post is an extract; you can view the production file here:
https://filesender.renater.fr/?s=download&token=11cc357f-4663-41c8-830b-71938d2d2aa7
-
@lebrun78 I just had an idea. Maybe this is caused by a problematic entry in the DHCP leases cache file? Take a look at /var/lib/dhcpd/dhcpd.leases. I’m not really sure what we are looking for, but you might search that file for the pattern 148.60.10. to see what leases are in the store. If you find something concerning, then I would stop the DHCP service for a second, make a backup copy of that file, edit it to remove the problematic entry, and start the DHCP service up again.
As you seem to have a lot of fixed addresses defined, you might not even care much about the leases. In that case you could even clear the whole leases file (stop DHCP first) and see if it makes a difference.
-
@Sebastian-Roth
Hello
I have blanked the lease file. After rebooting the client, same problem.
Here is the current lease file:
cat dhcpd.leases
# The format of this file is documented in the dhcpd.leases(5) manual page.
# This lease file was written by isc-dhcp-4.2.5
server-duid "\000\001\000\001&\036\337\215P\232L\202P~";
lease 148.60.10.180 {
  starts 2 2020/04/07 06:53:04;
  ends 3 2020/04/08 06:53:04;
  cltt 2 2020/04/07 06:53:04;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 10:65:30:83:5c:4b;
  set vendor-string = "PXEClient:Arch:00007:UNDI:003016";
-
@george1421
So here is a capture with 2 requests: a PXE boot at time 0 and an Ubuntu USB boot at time 196.
Once Ubuntu has loaded, ip a shows the correct IP address, router, and netmask.
capturedhcp.pcap
-
@lebrun78 I have some good news and some bad. The good news is I found some time to set up a VM to try to replicate your setup and play with it. I found this is happening in my test setup as well!
Bad news is that I have not found why this is happening yet. But I am fairly sure I will! Stay tuned.