Network Boot forgets Ethernet card exists after booting
-
Server
- FOG Version: SVN 6063
- OS: Debian Jessie
Client
- Service Version: N/A
- OS: N/A
Description
PXE booting is going to be the death of me! I needed to image some computers, and then learned that FOG is not working. So I started troubleshooting, and here’s what I found:
- The default menu appears for SOME clients (this is new to me; I’ve never seen it before, and the only option is “fog”). After the zero-second timeout (where is it getting this info from?), it moves on to try to network boot the Wifi card, which fails.
- On the one I care about imaging right now, it starts the network boot, gets its IP address and info from the DHCP server, then starts to load the iPXE environment, which tries to boot the Wifi card and does not seem to detect the Ethernet card that booted it.
I have tried re-installing the latest version from git, and the issue is still there. It seems like the iPXE kernel does not recognize the basic NIC in two different test machines: a VirtualBox machine and an Asus EeePC.
DHCP options:
next-server 192.168.0.4;
filename "undionly.kpxe";
FOG server is located at 192.168.0.4, and the /tftpboot directory looks normal:
/tftpboot# ls -lah
total 7.1M
drwxr-xr-x 6 fog root 4.0K Jan 27 12:09 .
drwxr-xr-x 25 root root 4.0K Jan 27 12:09 ..
drwxr-xr-x 2 fog root 4.0K Jan 27 12:09 10secdelay
-rw-r-xr-x 1 fog root 840 Jan 27 12:09 boot.txt
-rw-r-xr-x 1 fog root 426 Jan 27 12:09 default.ipxe
drwxr-xr-x 2 fog root 4.0K Aug 25 09:00 i386-7156-efi
drwxr-xr-x 2 fog root 4.0K May 31 2016 i386-efi
-rw-r-xr-x 1 fog root 195K Jan 27 12:09 intel7156.efi
-rw-r-xr-x 1 fog root 216K Jan 27 12:09 intel.efi
-rw-r-xr-x 1 fog root 92K Jan 27 12:09 intel.kkpxe
-rw-r-xr-x 1 fog root 92K Jan 27 12:09 intel.kpxe
-rw-r-xr-x 1 fog root 92K Jan 27 12:09 intel.pxe
-rw-r-xr-x 1 fog root 921K Jan 27 12:09 ipxe7156.efi
-rw-r-xr-x 1 fog root 959K Jan 27 12:09 ipxe.efi
-rw-r-xr-x 1 fog root 846K Jan 27 12:09 ipxe.iso
-rw-r-xr-x 1 fog root 337K Jan 27 12:09 ipxe.kkpxe
-rw-r-xr-x 1 fog root 337K Jan 27 12:09 ipxe.kpxe
-rw-r-xr-x 1 fog root 337K Jan 27 12:09 ipxe.krn
-rw-r-xr-x 1 fog root 337K Jan 27 12:09 ipxe.pxe
-rw-r-xr-x 1 fog root 121K Jan 27 12:09 ldlinux.c32
-rw-r-xr-x 1 fog root 184K Jan 27 12:09 libcom32.c32
-rw-r-xr-x 1 fog root 26K Jan 27 12:09 libutil.c32
-rw-r-xr-x 1 fog root 26K Jan 27 12:09 memdisk
-rw-r-xr-x 1 fog root 29K Jan 27 12:09 menu.c32
-rw-r-xr-x 1 fog root 43K Jan 27 12:09 pxelinux.0
-rw-r-xr-x 1 fog root 43K Jan 27 12:09 pxelinux.0.old
drwxr-xr-x 2 fog root 4.0K May 31 2016 pxelinux.cfg
-rw-r-xr-x 1 fog root 195K Jan 27 12:09 realtek7156.efi
-rw-r-xr-x 1 fog root 216K Jan 27 12:09 realtek.efi
-rw-r-xr-x 1 fog root 93K Jan 27 12:09 realtek.kkpxe
-rw-r-xr-x 1 fog root 93K Jan 27 12:09 realtek.kpxe
-rw-r-xr-x 1 fog root 93K Jan 27 12:09 realtek.pxe
-rw-r-xr-x 1 fog root 194K Jan 27 12:09 snp7156.efi
-rw-r-xr-x 1 fog root 215K Jan 27 12:09 snp.efi
-rw-r-xr-x 1 fog root 194K Jan 27 12:09 snponly7156.efi
-rw-r-xr-x 1 fog root 215K Jan 27 12:09 snponly.efi
-rw-r-xr-x 1 fog root 92K Jan 27 12:09 undionly.kkpxe
-rw-r-xr-x 1 fog root 92K Jan 27 12:09 undionly.kpxe
-rw-r-xr-x 1 fog root 374K May 31 2016 undionly.kpxe.INTEL
-rw-r-xr-x 1 fog root 92K Jan 27 12:09 undionly.pxe
-rw-r-xr-x 1 fog root 30K Jan 27 12:09 vesamenu.c32
Suggestions of where to look? The DHCP server is also Debian Jessie, but on a different server.
I ran a tcpdump on the interface and looked at it in Wireshark, as suggested. The only errors I see are an unknown error at the top, and missing files, such as pxelinux.cfg/80833fa6-f091-4681-2969-485b39123be5, then other weird ID numbers, until it finds pxelinux.cfg/default and continues on. I can post the capture file if it would help further.
-
Based on your tcpdump your pxe server is being handed information by the pxelinux.0 file, not ipxe. This would be, from my understanding, passing to an ipxe.lkrn file.
The pxelinux.cfg is the indicator to me for this.
-
@Tom-Elliott OK, I can find the ipxe.krn file (not ipxe.lkrn) in my /tftpboot folder. Assuming this is correct, why will the FOG menu not appear? Is there more work I can trace back to help?
-
I think the problem is we need to see what the DHCP is actually handing out.
From what I can see, it’s pointing at pxelinux.0.
pxelinux.0 is passing to ipxe.krn. The ipxe.krn has an embedded script that is told to look at the default.ipxe file.
pxelinux.0, when initially loaded, looks for the “uuid” first, and on down until there’s nothing found and then it tries default. Default is what’s handing out the data back to the client machine (telling it to load ipxe.krn).
If your dhcp is handing out undionly.kpxe as your original post suggests, then you’re not looking at the right place because your clients are definitely NOT looking at what you think they’re looking at.
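The lookup order described above (“uuid” first, then on down to default) can be sketched roughly as follows. This is only an illustration based on the search order in the PXELINUX documentation, not code from FOG or Syslinux. The function name is made up, and the MAC and IP address in the usage line are hypothetical; only the UUID comes from the capture in the original post.

```python
import ipaddress


def pxelinux_config_candidates(client_uuid, mac, ip):
    """List the config files PXELINUX asks for over TFTP, in order."""
    # 1. Client UUID in lowercase, if the PXE stack provides one.
    candidates = ["pxelinux.cfg/" + client_uuid.lower()]
    # 2. ARP hardware type (01 = Ethernet) plus the MAC, lowercase, dash-separated.
    candidates.append("pxelinux.cfg/01-" + mac.lower().replace(":", "-"))
    # 3. Client IPv4 address as 8 uppercase hex digits, dropping one digit per try.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    for length in range(8, 0, -1):
        candidates.append("pxelinux.cfg/" + hex_ip[:length])
    # 4. Finally the catch-all file.
    candidates.append("pxelinux.cfg/default")
    return candidates


# Hypothetical MAC and IP; the UUID is the one seen in the capture.
for path in pxelinux_config_candidates(
    "80833fa6-f091-4681-2969-485b39123be5", "08:00:27:12:34:56", "192.168.1.10"
):
    print(path)
```

Seeing this sequence of requests in a capture (a UUID, then other ID-like names, then default) is a sign that the client loaded pxelinux.0 rather than an iPXE binary such as undionly.kpxe.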
-
@Tom-Elliott Well, this is awkward…
Captured a tcpdump from the DHCP server, and you’re right! It’s dishing out pxelinux.0, not what I specified in the /etc/dhcp/dhcpd.conf file! More investigating on my part now… Thanks for the direction to look in!
-
OK, I got it to use the correct filename. Now the machines start booting from the network, and when iPXE loads (1.0.0+ 26050), it goes to:
Configuring (net0 <MAC Address>)...... No configuration methods succeeded
Configuring (net0 <MAC Address>)...... OK
iPXE>
The advantage now is that the correct MAC address is attempting to boot; the unfortunate part is that the FOG menu is still not coming up.
I also tried with a “dumb switch” just before the computers in question - no change. I know that STP is disabled on all my switches leading back to the FOG Server (when Googling other forums, this came up quite a bit to check for).
-
@lukebarone It’s dropping you to an ipxe shell.
So, you said previously that you had the correct file configured in dhcpd.conf, but later found that this wasn’t the case. What did you find? What did you change? Do you have more than one DHCP server? What was next-server configured as (option 066 in Windows)?
Can you post your dhcpd.conf file here so I can look through it for issues?
-
@Wayne-Workman Here it is:
authoritative;
option domain-name "sd57.lan";
option domain-name-servers 192.168.0.1,199.175.16.2;
option netbios-name-servers 192.168.0.3;
option local-pac-server code 252 = text;
option domain-search "sd57.bc.ca", "sd57.lan";
option routers 192.168.31.254;
ddns-updates on;
ddns-update-style interim;
ignore client-updates;
update-static-leases on;
default-lease-time 3600; #1 Hr
max-lease-time 28800;
log-facility local7;
include "/etc/dhcp/ddns.key";
zone sd57.lan. {
    primary 127.0.0.1;
    key DDNS_UPDATE;
}
zone 0.168.192.in-addr.arpa. {
    primary 127.0.0.1;
    key DDNS_UPDATE;
}
subnet 192.168.0.0 netmask 255.255.224.0 {
    authoritative;
    ddns-domainname "cla.sd57.bc.ca";
    next-server 192.168.0.4; # FOG Server
    # filename "ipxe.pxe";
    # filename "intel.pxe";
    # filename "pxelinux.0";
    filename "undionly.kpxe"; # The closest thing I have to something working
    # filename "undionly.pxe";
    range 192.168.1.1 192.168.8.254;
    option subnet-mask 255.255.224.0;
    option broadcast-address 192.168.31.255;
    option routers 192.168.31.254;
    option netbios-name-servers 192.168.0.3;
    option netbios-node-type 8;
}
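As a quick sanity check on the addressing in this config, here is a small sketch using Python’s standard ipaddress module. The helper name is made up for illustration; the addresses are copied from the config itself. It only checks that the values are consistent with each other, not that dhcpd is actually serving this file.

```python
import ipaddress

# Subnet declaration from the config: 192.168.0.0 netmask 255.255.224.0
subnet = ipaddress.ip_network("192.168.0.0/255.255.224.0")

# Addresses that should all fall inside that subnet.
addresses = {
    "next-server": "192.168.0.4",
    "routers": "192.168.31.254",
    "range start": "192.168.1.1",
    "range end": "192.168.8.254",
}


def outside_subnet(network, named_addresses):
    """Return the names whose address is not inside the given network."""
    return [
        name
        for name, addr in named_addresses.items()
        if ipaddress.ip_address(addr) not in network
    ]


print(subnet.with_prefixlen)               # 192.168.0.0/19
print(subnet.broadcast_address)            # 192.168.31.255
print(outside_subnet(subnet, addresses))   # []
```

The broadcast address and router line up with the /19 mask, so the addressing itself does not look like the cause of the boot problem.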
-
@lukebarone You’re missing all the pxe options in the configuration. Below is what the fog installer puts at the top of the dhcpd.conf file. Put this at the top of the file and then give dhcpd a restart.
option space PXE;
option PXE.mtftp-ip code 1 = ip-address;
option PXE.mtftp-cport code 2 = unsigned integer 16;
option PXE.mtftp-sport code 3 = unsigned integer 16;
option PXE.mtftp-tmout code 4 = unsigned integer 8;
option PXE.mtftp-delay code 5 = unsigned integer 8;
option arch code 93 = unsigned integer 16; # RFC4578
Also towards the bottom of your configuration, the below lines are redundant, as you have it set globally already.
option routers 192.168.31.254;
option netbios-name-servers 192.168.0.3;
-
@lukebarone What kind of DHCP server/software are we talking about here? The config snippet you posted is partly DHCP and partly DNS (zone ...). I am not aware of any software being able to handle this kind of config. dnsmasq can do DHCP and DNS but has a different config syntax as far as I remember.
-
@Sebastian-Roth Using isc-dhcp-server on a separate Debian Jessie server (192.168.0.1).
@Wayne-Workman I’ll give that a try and report back. Because I never installed FOG’s DHCP server, I never knew about those lines.
Edit: I got this working. Issue with the switch, and I’m guessing, Auto-Negotiation. Turned it down to 100M Full Duplex, and it worked on the next boot! No change with the extra lines added to DHCP though.
-
@Sebastian-Roth According to this discussion (http://serverfault.com/questions/806875/how-to-tell-isc-dhcp-correct-zone-for-reverse-zone-ddns-update), it has to do with reverse DNS lookups.