FOG will not boot - "Failed to get an IP via DHCP!
-
@george1421 said in FOG will not boot - "Failed to get an IP via DHCP!:
You can test this by putting a dumb (unmanaged) switch between the booting computer and the building network switch.
He did that, in his first post:
I’ve simplified the infrastructure to the client and the FOG server connected through a dumb switch.
-
@Wirefall said in FOG will not boot - "Failed to get an IP via DHCP!:
@Wirefall Update: Looking at the daemon.log, it appears that the client makes a DHCP request and is provided an IP from the FOG server of X.X.X.11. There is then an error message of “init.tftpd[XXXX]: tftp: client does not accept options.” A new DHCP Discover is then completed and the client receives X.X.X.10 (the first available entry in the DHCP pool).
The “tftp: client does not accept options.” message is a misnomer. It doesn’t mean anything in regards to the problem(s) you’re seeing. The DHCP request appears to be the issue but only within the init itself. This leads me to think the switch is potentially a “power saving” switch?
-
@Tom-Elliott said in FOG will not boot - "Failed to get an IP via DHCP!:
The “tftp: client does not accept options.” message is a misnomer. It doesn’t mean anything in regards to the problem(s) you’re seeing.
I agree, that tftp message is just a warning message.
We’ve seen a similar issue before with realtek nics and green ethernet/power saving/802.3az being enabled on the switch. But this is the first time seeing an intel nic do this.
-
@george1421 Thanks for the pointer, I’ll check to see if there’s anything in the BIOS I can manipulate regarding power saving features.
-
@Tom-Elliott I’ve tried two separate switches: one managed, the other as cheap as cheap can get… I doubt it has any additional functionality. I’ll dig out an old hub to make sure this isn’t the problem. Thanks for the input!
-
@Wirefall I think our next step is to get the target computer to boot into a debug capture console to see what is going on. If a dumb switch with no fancy support doesn’t work I think a hub would be a waste of time (unless you had one under your desk already).
-
@george1421 I’ll attempt a manual assignment and look at the debug info if successful. Cheers
-
@george1421 Roger that. I’ll attempt the manual insertion and debug capture this evening. Thanks for the prompt responses!
And, yes, I have all sorts of junk under my desk…
-
@george1421 @TOM ELLIOTT @WAYNE WORKMAN
I manually added the host and assigned a debug task. The contents of /var/log/ are attached. I also ran ifconfig; it only shows the local adapter, and it wasn’t possible to bring up eth0.
Also, previously daemon.log on the FOG server showed a DHCP offer of X.X.X.11 that was then requested by the client and ack’d by the server. This was almost immediately followed by a new discover and an offer made for X.X.X.10, which was also requested and ack’d. This time the same thing occurred, other than the initial offer being for .13, which was still followed up with .10.
Thanks for looking at this. Let me know if there’s any other information that would be helpful to troubleshoot.
Cheers!
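A minimal sketch for lining up the offer/ack sequence described above, grouped per client MAC. It assumes the FOG server runs ISC dhcpd and logs in its usual "DHCPACK on <ip> to <mac> via <iface>" style; the sample lines, addresses (192.0.2.x), MAC, and the function name acked_addresses are made up for illustration and are not taken from the attached logs.

```python
import re
from collections import defaultdict

# Address that follows "on" (OFFER/ACK) or "for" (REQUEST) in dhcpd-style lines.
IP = re.compile(r"\b(?:on|for) (\d{1,3}(?:\.\d{1,3}){3})\b")
# Client hardware address, six colon-separated hex pairs.
MAC = re.compile(r"\b((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\b")

def acked_addresses(log_text):
    """Map each client MAC to the list of IPs the server ACKed, in log order."""
    acked = defaultdict(list)
    for line in log_text.splitlines():
        if "DHCPACK" not in line:
            continue
        ip = IP.search(line)
        mac = MAC.search(line)
        if ip and mac:
            acked[mac.group(1).lower()].append(ip.group(1))
    return dict(acked)

# Hypothetical sample data in ISC dhcpd style (not real log output).
sample = """\
DHCPDISCOVER from 02:00:00:00:00:01 via eth0
DHCPOFFER on 192.0.2.11 to 02:00:00:00:00:01 via eth0
DHCPREQUEST for 192.0.2.11 (192.0.2.1) from 02:00:00:00:00:01 via eth0
DHCPACK on 192.0.2.11 to 02:00:00:00:00:01 via eth0
DHCPDISCOVER from 02:00:00:00:00:01 via eth0
DHCPOFFER on 192.0.2.10 to 02:00:00:00:00:01 via eth0
DHCPREQUEST for 192.0.2.10 (192.0.2.1) from 02:00:00:00:00:01 via eth0
DHCPACK on 192.0.2.10 to 02:00:00:00:00:01 via eth0
"""

print(acked_addresses(sample))
```

A MAC with more than one acknowledged address in a single boot attempt shows that the client went through DHCP more than once, which matches the pattern seen in daemon.log.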
-
@Wirefall What is the model of the problematic host? Is this a server?
-
We’d also need to see the output of
ifconfig -a
rather than just ifconfig
(the latter only shows active devices).
-
@Tom-Elliott Identical output: lo only. When not PXE booting, the device runs a Debian variant (Kali) with full network capabilities.
-
@Wayne-Workman The full specs for the MintBox can be found here - http://www.fit-pc.com/web/products/mintbox/mintbox-mini/
Max Screen Resolution 1920 x 1200 @ 60Hz
Processor AMD A Series
RAM 4 GB SO-DIMM DDR3
Hard Drive 64 GB
Graphics Coprocessor Radeon
Card Description dedicated
Wireless Type 802.11bgn
Number of USB 2.0 Ports 3
Other Technical Details
Brand Name MintBox
Series MintBox Mini
Item model number FITLET-GB-C64-D4-M64S-W-XL-BMint
Hardware Platform PC
Operating System linux
Item Weight 8.8 ounces
Product Dimensions 4.2 x 3.3 x 0.9 inches
Item Dimensions L x W x H 4.25 x 3.27 x 0.94 inches
Color Green
Processor Brand AMD
Computer Memory Type DDR3 SDRAM
Hard Drive Interface Solid State
Audio-out Ports (#) 1
Power Source DC
Voltage 12 volts
-
@Wirefall From within the FOG debug console (the same place where you entered the ifconfig -a command), key in
lspci -nn | grep etwork
We will be interested in a line that looks like this
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
More precisely, we need the hex numbers in the square brackets (i.e. [8086:1502]). Those numbers identify the NIC controller.
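If the lspci output has been saved to a file or pasted somewhere, a small sketch like the following pulls out just the vendor:device pair. The function name pci_ids is hypothetical, and the sample is the example line quoted above, not output from the problem machine.

```python
import re

# Matches a bracketed vendor:device pair such as [8086:1502]. The class code
# (e.g. [0200]) contains no colon, so it is not picked up.
PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")

def pci_ids(lspci_output):
    """Return a (vendor, device) tuple for every line that carries a PCI ID."""
    ids = []
    for line in lspci_output.splitlines():
        match = PCI_ID.search(line)
        if match:
            ids.append((match.group(1).lower(), match.group(2).lower()))
    return ids

sample = (
    "00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM "
    "Gigabit Network Connection [8086:1502] (rev 04)"
)
print(pci_ids(sample))  # [('8086', '1502')]
```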
-
Just a simple question: seeing as you state the device ID is 8086:1539 (which has been supported in Linux kernels for a very long time), do your registration issues happen on multiple systems of the same model or just this one system? I only ask because this doesn’t seem to make sense. The only things I can think of:
- Device is having issues (unlikely, seeing as TFTP and iPXE appear to work properly).
- Device patch cable is bad? (Likely, because TFTP doesn’t require the full cabling layout that normal network operation typically does. Confusing, though, because while TFTP might work, iPXE shouldn’t.)
-
@george1421 [8086:1539]
-
@Tom-Elliott This does occur on both of the two test boxes I have.
To rule out flaky hardware/cables, I’ve taken the following steps.
- Swapped smart switch for unmanaged switch.
- Replaced all cables.
- Replaced NIC on FOG server - both are Intel Gigabit, [8086:10ce] and [8086:10d3]
Thanks and Happy New Year!
-
I’ve eliminated the possibility of the problem being the infrastructure or of it being PXE server specific.
-
I PXE booted my netbook within the test environment and everything worked as would be expected.
-
I booted the server off of a Clonezilla live CD and attempted to PXE boot one of the MintBoxes from that. I received the Clonezilla menu, but selecting an imaging option failed while bringing up the Ethernet device, with the message “If this fails, maybe the ethernet card is not supported by the kernel 4.7.0-1-amd64”. So, basically the same thing I’m seeing from FOG.
Definitely odd as these machines come pre-configured with Linux Mint and I’ve installed Kali (debian-based distro) on four of them without a single networking issue.
If anyone has any other ideas I’m definitely willing to try them. I’d rather not have to go with another hardware platform if I don’t have to. Luckily still in dev, though.
Thanks for all your help! Cheers
-
@Wirefall Great that you posted the full kernel messages listing. At first I didn’t notice anything wrong, but looking closer I found the issue:
igb: probe of 0000:01:00.0 failed with error -2
The PCI ID perfectly matches the one you mentioned in your first post. It didn’t take long to find several recent reports of this issue:
https://lkml.org/lkml/2016/11/24/172
https://bugzilla.suse.com/show_bug.cgi?id=1009911
https://patchwork.ozlabs.org/patch/700615/
Some say that it might possibly work if you disable PXE boot for this NIC in BIOS. I don’t think this is a great solution as FOG heavily relies on PXE booting the clients. Let’s hope that this will be fixed in the latest kernel fairly soon!
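For anyone checking their own kernel messages for the same symptom, here is a rough sketch that scans a saved log for driver probe failures of the form shown above. The function name failed_probes is hypothetical, and the sample is the single line quoted in this post.

```python
import re

# Kernel lines of the form "<driver>: probe of <pci address> failed with error <n>".
PROBE_FAIL = re.compile(
    r"(?P<driver>[\w-]+): probe of (?P<addr>[0-9a-fA-F:.]+) "
    r"failed with error (?P<err>-?\d+)"
)

def failed_probes(kernel_log):
    """Return (driver, pci_address, error_code) for each failed probe line."""
    failures = []
    for line in kernel_log.splitlines():
        match = PROBE_FAIL.search(line)
        if match:
            failures.append(
                (match.group("driver"), match.group("addr"), int(match.group("err")))
            )
    return failures

sample_log = "igb: probe of 0000:01:00.0 failed with error -2"
print(failed_probes(sample_log))  # [('igb', '0000:01:00.0', -2)]
```

The search is not anchored to the start of the line, so a leading timestamp in the log should not get in the way.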
-
@Sebastian-Roth Is there any chance that an older kernel will work here? I’m going to assume that both Mint and Kali are using older kernels that do work (which may not be a solid test, since @Wirefall is not PXE booting either OS platform).