FOG will not boot - "Failed to get an IP via DHCP!"



  • I just installed my first FOG server (1.3.0 SVN Rev 6050) and am receiving this same error message, “Failed to get an IP via DHCP! Tried on interfaces(s):” when attempting to register a client.

    I’ve simplified the infrastructure to the client and the FOG server connected through a dumb switch.

    The client appears to receive an IP from the FOG server during PXE boot. The error message is received only after full or quick registration is selected.

    This is a mintbox mini (http://www.fit-pc.com/web/products/mintbox/mintbox-mini/).

    A developer previously asked another user for the output of lspci -nn | grep Ethernet. Mine is as follows…

    01:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)

    Any hints, help, or pointers would be appreciated.

    Cheers!



  • @Tom-Elliott Woo-hoo, capturing an image now!

    A big thank you to everyone who helped track down and fix this. Your support is amazing! Looks like I have another FOSS project to add to my contribution list…


  • Senior Developer

    Alright,

    I’ve rebuilt the 4.9.0 kernels to have the patch in the first link.

    Please go to FOG Configuration Page->Kernel Update

    Download the 4.9.0 kernels as bzImage/bzImage32.

    The one with Arch Type (x86_64) would be named bzImage
    The one with Arch Type (x86) would be named bzImage32


  • Senior Developer

    @Sebastian-Roth It’s funny you mention the idea that the kernel is the problem (I didn’t notice the error message until you pointed it out). My next question was going to be about using older kernels.


  • Moderator

    @Sebastian-Roth Is there any chance that an older kernel will work here? I’m going to assume that both Mint and Kali are using older kernels that do work (which may not be a solid test, since @Wirefall is not PXE booting either OS platform).


  • Developer

    @Wirefall Great you posted the full kernel messages listing. At first I didn’t notice any issue but looking closer I found the issue:

    igb: probe of 0000:01:00.0 failed with error -2
    

    The PCI ID perfectly matches the one you mentioned in your first post. It didn’t take long to find several reports on this issue that came in just recently:
    https://lkml.org/lkml/2016/11/24/172
    https://bugzilla.suse.com/show_bug.cgi?id=1009911
    https://patchwork.ozlabs.org/patch/700615/

    Some say that it might possibly work if you disable PXE boot for this NIC in BIOS. I don’t think this is a great solution as FOG heavily relies on PXE booting the clients. Let’s hope that this will be fixed in the latest kernel fairly soon!
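
A failed driver probe like the one quoted above is easy to miss in a long kernel log. The following is a minimal sketch, not part of FOG, that scans kernel message text for lines of that shape and translates the error number into its symbolic name. The function name `find_probe_failures` and the regular expression are assumptions made for illustration; the only sample line used is the one from this thread. On Linux, error -2 corresponds to ENOENT.

```python
import errno
import re

# Matches kernel lines such as:
#   igb: probe of 0000:01:00.0 failed with error -2
PROBE_FAILURE = re.compile(
    r"(?P<driver>[\w-]+): probe of (?P<device>\S+) failed with error (?P<code>-?\d+)"
)


def find_probe_failures(log_text):
    """Return (driver, device, errno_name) for each failed driver probe."""
    failures = []
    for line in log_text.splitlines():
        match = PROBE_FAILURE.search(line)
        if match:
            # The kernel reports negative errno values; look up the positive one.
            code = abs(int(match.group("code")))
            name = errno.errorcode.get(code, "unknown")
            failures.append((match.group("driver"), match.group("device"), name))
    return failures


sample = "igb: probe of 0000:01:00.0 failed with error -2"
print(find_probe_failures(sample))
```

Feeding it the saved messages file from a debug task would list every driver that failed to bind, which is a quick way to tell a missing NIC driver from a DHCP problem.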



  • @Tom-Elliott @george1421

    I’ve eliminated the possibility of the problem being the infrastructure or of it being PXE server specific.

    1. I PXE booted my netbook within the test environment and everything worked as would be expected.

    2. I booted the server off of a clonezilla live CD and attempted to PXE boot one of the mintboxes from that. I received the clonezilla menu, but selecting an imaging option failed on bringing up the Ethernet device “If this fails, maybe the ethernet card is not supported by the kernel 4.7.0-1-amd64”. So, basically the same thing I’m seeing from FOG.

    Definitely odd as these machines come pre-configured with Linux Mint and I’ve installed Kali (debian-based distro) on four of them without a single networking issue.

    If anyone has any other ideas I’m definitely willing to try them. I’d rather not have to go with another hardware platform if I don’t have to. Luckily still in dev, though.

    Thanks for all your help! Cheers



  • @Tom-Elliott This does occur on both of the two test boxes I have.

    To rule out flaky hardware/cables, I’ve taken the following steps.

    1. Swapped smart switch for unmanaged switch.
    2. Replaced all cables.
    3. Replaced NIC on FOG server - both are Intel Gigabit, [8086:10ce] and [8086:10d3]

    Thanks and Happy New Year!



  • @george1421 [8086:1539]


  • Senior Developer

    Just a simple question, seeing as you state the device id information is 8086:1539 (which has been in the linux kernels for a very long time), does your registration issues happen on multiple systems of the same model or just this one system? I only ask because this isn’t seeming to make sense. The only things I can think of:

    1. Device is having issues (Unlikely seeing as tftp and ipxe appears to work properly).
    2. Device patch cable is screwed up? (Likely because TFTP doesn’t require the cabling layout as full network support typically does, – confusing because while TFTP might work, iPXE shouldn’t).

  • Moderator

    @Wirefall From within the FOG debug console (the same place where you entered the ifconfig -a command), key in lspci -nn|grep etwork

    We will be interested in a line that looks like this

    00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
    

    More precisely, the hex numbers in the square brackets (i.e. [8086:1502]). Those numbers identify the NIC controller.
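
For anyone collecting these IDs from several machines, here is a small sketch that pulls the vendor:device pair out of `lspci -nn` output. It is only an illustration: the function name `pci_ids` and the regular expression are assumptions, and the sample lines are the two quoted in this thread.

```python
import re

# The [vvvv:dddd] pair is the PCI vendor:device ID. The class code such as
# [0200] contains no colon, so this pattern does not match it.
PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_ids(lspci_output):
    """Return (vendor, device) hex pairs for network controllers in lspci -nn output."""
    ids = []
    for line in lspci_output.splitlines():
        # "etwork" matches both "Network" and "network", as in the grep above.
        if "etwork" in line or "Ethernet" in line:
            match = PCI_ID.search(line)
            if match:
                ids.append((match.group(1).lower(), match.group(2).lower()))
    return ids


sample = """\
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 04)
01:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
"""
print(pci_ids(sample))
```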



  • @Wayne-Workman The full specs for the MintBox can be found here - http://www.fit-pc.com/web/products/mintbox/mintbox-mini/

    Max Screen Resolution: 1920 x 1200 @ 60Hz
    Processor: AMD A Series
    RAM: 4 GB SO-DIMM DDR3
    Hard Drive: 64 GB
    Graphics Coprocessor: Radeon
    Card Description: dedicated
    Wireless Type: 802.11bgn
    Number of USB 2.0 Ports: 3

    Other Technical Details
    Brand Name: MintBox
    Series: MintBox Mini
    Item model number: FITLET-GB-C64-D4-M64S-W-XL-BMint
    Hardware Platform: PC
    Operating System: linux
    Item Weight: 8.8 ounces
    Product Dimensions: 4.2 x 3.3 x 0.9 inches
    Item Dimensions L x W x H: 4.25 x 3.27 x 0.94 inches
    Color: Green
    Processor Brand: AMD
    Computer Memory Type: DDR3 SDRAM
    Hard Drive Interface: Solid State
    Audio-out Ports (#): 1
    Power Source: DC
    Voltage: 12 volts



  • @Tom-Elliott Identical output - lo only. When not PXE booting, the device runs a Debian variant (Kali) with full network capabilities.

    0_1483246994180_boot_01.png
    0_1483247003687_boot_02.png


  • Senior Developer

    We’d also need to see ifconfig -a rather than just ifconfig (The latter only shows active devices).
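
The difference between the two commands matters here: if an interface is missing even from the full list, the kernel never created a device for the NIC. As a rough illustration of the same idea on an ordinary Linux system (the FOG debug console is not assumed to have Python), this sketch reads `/sys/class/net` and reports every interface with its operational state, whether it is up or not. The function name `list_interfaces` and its `sys_net` parameter are hypothetical.

```python
import os


def list_interfaces(sys_net="/sys/class/net"):
    """Map every interface name to its operstate, including interfaces that are down."""
    states = {}
    if not os.path.isdir(sys_net):
        return states  # not a Linux system, or sysfs is not mounted
    for name in sorted(os.listdir(sys_net)):
        state_file = os.path.join(sys_net, name, "operstate")
        try:
            with open(state_file) as handle:
                states[name] = handle.read().strip()
        except OSError:
            # Entries without an operstate file are still listed.
            states[name] = "unknown"
    return states


for name, state in list_interfaces().items():
    print(name, state)
```

If the result contains only `lo`, the Ethernet driver did not bind to the card, and no amount of DHCP troubleshooting will help.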


  • Moderator

    @Wirefall What is the model of the problematic host? Is this a server?



  • @george1421 @TOM ELLIOTT @WAYNE WORKMAN

    I manually added the host and assigned a debug task. The contents of /var/log/ are attached. I also ran ifconfig; it only shows the loopback adapter, and it wasn’t possible to bring up eth0.

    Also, previously daemon.log on the FOG server showed a DHCP offer of X.X.X.11 that was then requested by the client and ack’d by the server. This was almost immediately followed by a new discover and an offer made for X.X.X.10, which was also requested and ack’d. This time the same thing occurred, other than the initial offer being for .13, which was still followed up with .10.

    Thanks for looking at this. Let me know if there’s any other information that would be helpful to troubleshoot.

    Cheers!

    0_1483156867766_messages



  • @george1421 Roger that. I’ll attempt the manual insertion and debug capture this evening. Thanks for the prompt responses!

    And, yes, I have all sorts of junk under my desk…



  • @george1421 I’ll attempt a manual assignment and look at the debug info if successful. Cheers


  • Moderator

    @Wirefall I think our next step is to get the target computer to boot into a debug capture console to see what is going on. If a dumb switch with no fancy support doesn’t work I think a hub would be a waste of time (unless you had one under your desk already).



  • @Tom-Elliott I’ve tried through two separate switches. One is managed, the other is as cheap as cheap can get; I doubt it has any additional functionality. I’ll dig out an old hub to make sure this isn’t the problem. Thanks for the input!

