FOG will not boot - "Failed to get an IP via DHCP! Tried on interface(s):"
-
Server
- Version: 1.3.0-RC-9 (SVN Revision: 5952)
- OS: Debian Jessie x64
Description
When trying to PXE boot some more Dell Latitude 5520s, I can open the FOG menu, but when I choose “Full host registration”, I get a bunch of lines like
/testcase-data/phandle-tests/consumer-a: could not <extra info>
followed by:
Failed to get an IP via DHCP! Tried on interfaces(s):
Please check your network setup and try again!
Press enter to continue
(Note that “interfaces(s)” in that message is itself a typo.)
I imaged 2 laptops this morning (same model, same switch, same ethernet cords) successfully with FOG. I have just tried another two, but both are showing this error message. If I reboot, I can select Client System Information (Compatibility), and the menu comes up for Reboot, Network Information, Partition Information, Check FOG Compatibility, etc. Choosing the FOG Compatibility option shows that the computer is compatible with FOG.
On the server, I do not see any log files relating to this issue. The DHCP server is hosted on another Debian Jessie x64 server, but has worked for many many years, and still offers a DHCP lease so that the computer can see the FOG menu. I have tried adding a dumb switch in between the network and the computers I am trying, but it is not making a difference. I have tried adding Kernel 4.7.1 for both 32- and 64-bit through FOG, but no change.
These machines I am trying to work on have NOT yet been registered in FOG. The first two I did this morning worked great. What can I look at next?
-
This issue has been fixed!
Somewhere in the upgrades from RC-8 to RC-9 to RC-10, it started putting files in
/var/www/html/fog
instead of /var/www/fog
like the rest of the installation. After switching the TFTPBOOT directory to /var/www/html/fog/service/ipxe and downloading the 4.1.0 kernel, these Dell e3350’s would boot to the Deploy job. Next up: figuring out why the keyboard won’t work and why I need to plug a USB one in.
Thanks for the help though, it was quite helpful.
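For anyone hitting the same path mismatch, here is a minimal sketch of how to check which web root actually contains the iPXE service files. The function name find_fog_webroots is made up for illustration, and the two paths in the usage note are simply the ones mentioned in this post; other installs may use different locations.

```shell
# Print each candidate web root that contains a service/ipxe directory.
# Candidates are passed as arguments; nothing is printed for roots
# that do not exist or lack that subdirectory.
find_fog_webroots() {
    for dir in "$@"; do
        if [ -d "$dir/service/ipxe" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

It could be called as find_fog_webroots /var/www/fog /var/www/html/fog. If both paths are printed, the install is probably split across two web roots as described above.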
-
@lukebarone While this isn’t an answer: if you pick quick registration or quick image (or whatever it was changed to just recently), does it work correctly? I’m wondering whether there is a bug in the full registration code (which I suspect), or whether these systems are having an issue getting an IP address that is caused by the first error.
And just to confirm, these two 5520s have the same BIOS version as the ones you did earlier (trying to explain the differences other than time of day)?
-
@george1421 Yes, the BIOS version is still A01. Trying to do the Quick Host Registration works… Weird!
At least we know where the issue lies now…
-
@Developers any idea about what might trigger
/testcase-data/phandle-tests/consumer-a: could not <extra info>
to be thrown on the FOS client?
-
@lukebarone Is there any chance you can take a picture of the error screen on a mobile phone and post it in the forum here? That would give the developers an exact placement of the error in the startup code.
-
Failed to find cpu0 device node
tg3 0000:0a:00.0: Cannot map device to registers, aborting
i801_smbus 0000:00:1f.3: SMBus base address uninitialized, upgrade BIOS
/testcase-data/phandle-tests/consumer-a: could not get #phandle-cells-missing for /testcase-data/phandle-tests/provider1
/testcase-data/phandle-tests/consumer-a: could not get #phandle-cells-missing for /testcase-data/phandle-tests/provider1
/testcase-data/phandle-tests/consumer-a: could not find phandle
/testcase-data/phandle-tests/consumer-a: could not find phandle
/testcase-data/phandle-tests/consumer-a: arguments longer than property
/testcase-data/phandle-tests/consumer-a: arguments longer than property
overlay_is_topmost: #5 clashes #6 @/testcase-data/overlay-node/test-bus/test-unittest8
overlay_removal_is_ok: overlay #5 is not topmost
of_overlay_destroy: removal check failed for overlay #5
Starting logging: OK
Initializing random number generator... done.
Failed to get an IP via DHCP! Tried on interfaces(s):
Please check your network setup and try again!
-
The tg3 line looks interesting, since tg3 is the name of the Broadcom NIC driver. The rest of the messages are not as exciting.
-
@george1421 I added a bunch more utilities and drivers to the kernels. This is likely what’s causing the extra output. It shouldn’t be impacting anything. If you turn the loglevel lower - like to 0, none of that will display. (I just tested to verify and this is indeed the case.)
-
@Tom-Elliott any idea about the tg3 unable to map registers?
@lukebarone it would be interesting if you manually register one of these 2 computers and then schedule a debug deploy. That should drop you to a command prompt on the target computer. From there we should run a few commands to inspect the hardware.
I’m working under the assumption this is not a spanning tree issue since you imaged the same model on the same network port.
-
@george1421 said in FOG will not boot - "Failed to get an IP via DHCP! Tried on interface(s):":
@lukebarone it would be interesting if you manually register one of these 2 computers and then schedule a debug deploy. That should drop you to a command prompt on the target computer. From there we should run a few commands to inspect the hardware.
I’m working under the assumption this is not a spanning tree issue since you imaged the same model on the same network port.
I’m assuming it’s not an STP issue either, as that is disabled on my switches. I’ll try a debug task, but I assume I need to register the host first, right?
-
@lukebarone Yes, manually register the failing system and then schedule a debug deploy.
Once you get to the command line, it would be interesting to check the logs in /var/log (either messages or dmesg) to see if there are any helpful errors about the NIC adapter.
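As a rough sketch of that log check, something like the following would narrow the kernel messages down to lines about the Broadcom driver or firmware loading. It assumes the debug console has dmesg and grep available, which is not confirmed in this thread, and the helper name nic_log_filter is hypothetical.

```shell
# Keep only log lines that mention the tg3 driver or firmware.
# Reads from stdin, so it works with dmesg output or a saved log file.
nic_log_filter() {
    grep -i -E 'tg3|firmware'
}

# Assumed usage on the debug console:
#   dmesg | nic_log_filter
#   nic_log_filter < /var/log/messages
```

If the "Cannot map device to registers" line shows up again here, the messages logged just before it are probably the most useful ones to post.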
-
I’m now thinking the primary issue with DHCP is because of firmware. I’m guessing the tg3 fw is not available on first boot until after the error message displays. So the system doesn’t have a mapping capability to look for the nic. On a warm boot however, it’s already been uploaded to the nic. So the system can now map it as expected, which might explain why quick reg worked.
Can you remove the host from FOG, let it boot into Windows, and shut it down, to see if a cold boot->quick reg still works or gives the same error?
-
@lukebarone Could you please boot up a live linux CD/DVD and run the following commands to get the exact model of your ethernet chip:
lspci -nn | grep Ethernet
Or boot Windows, go to Device Manager and find the PCI IDs there. Otherwise it’s just a lot of useless research and guesswork to find out what NIC is built into your devices.
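To pull just the vendor:device pair out of that lspci output, a small filter along these lines may help. It is only a sketch: the function name pci_ids is invented here, and it assumes the usual lspci -nn layout where the ID pair appears in square brackets as [vvvv:dddd].

```shell
# Extract vendor:device IDs for Ethernet devices from `lspci -nn` output.
# Reads lspci output on stdin and prints one "vvvv:dddd" pair per match.
pci_ids() {
    grep -i 'Ethernet' |
        grep -o -E '\[[0-9a-fA-F]{4}:[0-9a-fA-F]{4}\]' |
        sed 's/^.//; s/.$//'   # strip the surrounding brackets
}

# Assumed usage from a live Linux session:
#   lspci -nn | pci_ids
```

The four-digit class code that lspci also prints in brackets contains no colon, so the pattern skips it.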
-
@Sebastian-Roth said in FOG will not boot - "Failed to get an IP via DHCP! Tried on interface(s):":
Or boot Windows, go to Device Manager and find the PCI IDs there. Otherwise it’s just a lot of useless research and guesswork to find out what NIC is built into your devices.
Vendor 14E4, Device ID 1681.
@Tom-Elliott said in FOG will not boot - "Failed to get an IP via DHCP! Tried on interface(s):":
I’m now thinking the primary issue with DHCP is because of firmware. I’m guessing the tg3 fw is not available on first boot until after the error message displays. So the system doesn’t have a mapping capability to look for the nic. On a warm boot however, it’s already been uploaded to the nic. So the system can now map it as expected, which might explain why quick reg worked.
Can you remove the host from FOG, let it boot into Windows, and shut it down, to see if a cold boot->quick reg still works or gives the same error?
The host was never registered, and the hard drives have been switched out for SSDs. I’ve even tried removing the battery and AC adapter for 5 minutes and trying again. Quick Reg works, Full Reg does not.
-
@lukebarone So both times you cold started, for quick and for full?
-
And the quick reg working is actually creating the entries?
-
@Tom-Elliott No, I could only get it to work by Quick Registering first. If I choose Full, then I get the issue shown above in the question.
After choosing Quick, I log in to the Web UI, rename the system, assign an image and tell it to join AD. Then, I can deploy the image on the next boot.
-
Just for the record: this is a Broadcom NetXtreme BCM5761 Gigabit Ethernet PCIe chip. Other people have had issues with those as well.
Are you able to boot into a Linux live CD and have Ethernet set up properly?
-
@Sebastian-Roth said in FOG will not boot - "Failed to get an IP via DHCP! Tried on interface(s):":
Are you able to boot into a Linux live CD and have Ethernet set up properly?
Yes, I can use the NetInst ISO for Debian Jessie (8.1), and ping my machines and download web pages. Hope that helps!
-
@Sebastian-Roth So I also just tried the new version from Git (1.3.0 RC-10), and the Dell 3350’s will register OK, but I get the same error message during the image deployment. It boots to the network, downloads the bzImage and init.xz, then the same data as posted earlier. These laptops are using a Realtek driver, but I could NetBoot them and register.