Failed to get an IP via DHCP!
-
@george1421 said in Failed to get an IP via DHCP!:
@george1421 sorry to keep sending you back to the well, but is FOG_WEB_ROOT set to /fog/? The other fields I mentioned are FOG_TFTP_HOST and FOG_WEB_HOST.
Don’t worry about that. Yes, both are set to the IP address and FOG_WEB_ROOT is set to /fog/.
-
@george1421 - Ok… The Dell server does the exact same thing. I did find something interesting, however. For this Dell, there are two onboard NICs and a PCIe NIC. I initially plugged a cable into eth0 and got nothing. I added a cable to eth1 and still got nothing. The system would attempt to get an IP and then fail. However, when I added a cable to the PCIe card, the system inventoried. I then added a card to one of the HP servers and was able to get it to inventory. So… the embedded NIC is good for PXE but bad for getting a DHCP address, and the PCIe NIC is good for getting a DHCP address but bad for PXE?
While this is a viable workaround, I won’t have the time to pop an add-in card into every system just to get it to image. I did notice that both of the servers are running NetXtreme II NICs onboard. Is there something with that manufacturer that doesn’t work with isc-dhcp?
-
@kellyg If I remember correctly (at least on Dell servers), when it goes and enumerates the network interfaces it will start with the add-on adapters first and assign the LOM adapters last.
If FOS is seeing the MAC addresses of the LOM network adapters then it should be able to use them to PXE boot. If the MAC addresses are not showing up then FOS probably doesn’t have the NIC drivers.
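A quick way to check whether the MAC addresses show up is to list every interface next to its MAC from the FOS debug prompt. The sketch below is only an illustration, not part of FOG: the helper name `list_macs` is made up, the sample input is invented, and it assumes the `ip` tool on the system accepts the `-o` (one line per interface) option.

```shell
#!/bin/sh
# list_macs (hypothetical helper): reads `ip -o link show` output on stdin
# and prints "<interface> <MAC>" for every interface with a link/ether entry.
list_macs() {
    awk '{
        name = $2
        sub(/:$/, "", name)              # "eth0:" becomes "eth0"
        for (i = 1; i < NF; i++)
            if ($i == "link/ether")
                print name, $(i + 1)     # the MAC follows the link/ether keyword
    }'
}

# On the machine itself you would run:
#   ip -o link show | list_macs
# Invented sample input so the sketch can be tried anywhere:
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT\    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT\    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff'
printf '%s\n' "$sample" | list_macs
```

If a LOM port is missing from that list, the kernel has no usable driver bound to it; if it is listed but still gets no address, the problem is more likely on the link or DHCP side.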
-
@george1421 - I’m not sure about the drivers. So here’s what I did. With the add-on card connected to the switch and the LOM connected to the switch, I was able to register and inventory the server with no problem. However, FOS records the add-on card as the primary MAC address. The server won’t PXE boot from the add-on card, even though it’s listed in the boot order; it only wants to use the embedded NIC. Don’t know why only the onboard, must be some connection problem between the server and the chair.
I connected the cable back to LOM port 1 and the system booted to PXE fine. However, unless I change the primary MAC in FOS to match the embedded NIC, it will report that the system has not been registered or inventoried. So if I create a task to push an image to the server, it doesn’t recognize it and never starts.
Changing the MAC in FOS makes everything work correctly. But then I’m back to the original problem: I’ve got to install an add-in card in the server just to get it to image. Might as well pop in the winblows DVD and install it that way.
-
@kellyg Here’s what I want to try (sorry don’t have a clean understanding just yet).
1. Place the PCI NIC in the server.
2. In the FOG management GUI, schedule a debug task (capture or deploy, it doesn’t matter); we need access to the FOS command prompt.
3. PXE boot the target computer.
4. After a few presses of the Enter key it should drop you to a command prompt.
5. Give root a password, anything is fine. Root just needs a password. Set it with passwd. This password will only be temporary since FOS runs out of RAM.
6. Get the IP address of the target computer with ip addr
7. Now from a Windows computer use PuTTY to connect to FOS at the IP address collected in #6 and log in as root with the password defined in #5. (We are doing this to allow easy copy and paste of commands between Windows and FOS.)
8. Now let’s understand what FOS sees: ip link show
   Please post the output here. If possible, note the MAC address of each physical network interface. Hopefully you will have one ethX interface for each physical interface.
9. Next let’s see what FOS sees as PCI devices: lspci |grep net
Leave this setup configured. It would be interesting to know whether one of the non-functional network ports picks up an IP address with FOS running if you connect it with a second network cable (not the one on the PCI NIC). But this is something we should test after collecting what we need to know. With the second network cable installed, ip addr show should show if these other LOM NICs pick up an IP address.
-
@KellyG Could spanning tree or auto-speed negotiation issues play a role here as well? Just to rule that out please connect a dumb unmanaged switch in between the client and the actual switch.
To figure out which NICs play nicely with the Linux kernel (as this is where you experience the issue) you might want to boot any recent live Linux CD (I’d recommend kernel version 4.8 or newer). Plug the cable into the first NIC, boot from CD and run
lspci -vv | grep -e "^[0-9]" -e "Kernel driver" | grep -A 1 "net"
On my PC for example this looks like this:
# lspci -vv | grep -e "^[0-9]" -e "Kernel driver" | grep -A 1 "net"
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
        Kernel driver in use: e1000e
Then see if you can set up the network properly and ping other PCs. Note down the driver, NIC and MAC address. Then shut down, re-plug the cable into a different NIC and redo the same thing. Please post all the information here so we can see if we have all the kernel drivers included in our build. If not we can add those for you.
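To make the "note down driver and NIC" step less error-prone, the same lspci output can be boiled down to one line per Ethernet controller. This is just a sketch under assumptions: the helper name `nic_drivers` and the marker `NONE` are made up for illustration, and the second sample device is invented to show what a NIC without a bound driver would look like.

```shell
#!/bin/sh
# nic_drivers (hypothetical helper): reads `lspci -vv` or `lspci -k` output on
# stdin and prints "<slot> <driver>" per Ethernet controller, or "<slot> NONE"
# when no "Kernel driver in use" line follows the device header.
nic_drivers() {
    awk '
        function emit() {
            if (slot != "") print slot, (driver == "" ? "NONE" : driver)
        }
        /^[0-9a-f]/ {                      # unindented line: a new device starts
            emit()
            slot = ""; driver = ""
            if ($0 ~ /Ethernet controller/) slot = $1
        }
        /Kernel driver in use:/ && slot != "" { driver = $NF }
        END { emit() }
    '
}

# On the machine itself you would run:
#   lspci -vv | nic_drivers
# Sample input: the first device is the example above, the second is invented.
sample='00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
        Kernel driver in use: e1000e
01:00.0 Ethernet controller: Example Vendor Example NIC (rev 01)'
printf '%s\n' "$sample" | nic_drivers
```

A `NONE` entry would point at a missing driver or firmware rather than at the switch or the DHCP server.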
-
@sebastian-roth Sorry for the delay in getting back, I only work Monday through Thursday. Booting to a live CD and pulling the NIC list gives me the following
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
        Kernel Driver in use: bnx2
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
        Kernel Driver in use: bnx2
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
        Kernel Driver in use: bnx2
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
        Kernel Driver in use: bnx2
-
@george1421 Just scrolling through all of the responses. I’ll run through your steps here shortly and get back to you.
-
@KellyG Hmm, from the output you posted it seems like it only recognized the Broadcom NetXtreme II 5709 Gigabit NIC - Quad Port. What about those two onboard NICs? Not recognized at all? I wonder why they don’t show up in the lspci listing. Can you get the PCI IDs (for the onboard NICs and the PCIe NetXtreme card) from the Windows Device Manager and post the full “Hardware ID” string you find in the Details tab? Possibly we need to add a firmware driver to the kernel.
-
@sebastian-roth Those are the onboard NICs. I had to pull the PCIe card for another server. If I have the card in the system, I can boot to the LOM, but they don’t recognize that there is a DHCP responding. The PCIe NIC will respond to the DHCP, but won’t PXE boot.
I’m not finding anything specific about the NetXtreme. Are they supported for imaging?
-
@KellyG In general the BNX2 driver is in our kernel, see here but some of those NICs need special (closed source) firmware blobs added to the kernel. We do add those as requested but we need to know exactly what card you have to figure out if firmware is needed and which one. That is why I asked for the exact PCI IDs.
I can boot to the LOM, but they don’t recognize that there is a DHCP responding
What do you mean by LOM?
-
@KellyG Did you read what I was saying about spanning tree and trying out a dumb (unmanaged) mini switch??
-
@sebastian-roth with regard to the switch, I’m still trying to locate a non-managed switch. According to the network team, the switch I’m using has been wiped (although I suspect something is still enabled on it). I don’t have the password for it so I can’t log in and verify spanning tree. I have a small 4 port switch at home and will try it tomorrow.
For the IDs, I hate KVM switches when they aren’t labeled properly. I was looking at the FOG server instead of the server I’ve been trying to image. This is what I show:
$ lspci -vv |grep -e "^[0-9]" -e "Kernel driver" |grep -A 1 "net"
03:00.0 Ethernet controller: Broadcom Corporation Device 1657 (rev 01)
03:00.1 Ethernet controller: Broadcom Corporation Device 1657 (rev 01)
03:00.2 Ethernet controller: Broadcom Corporation Device 1657 (rev 01)
03:00.3 Ethernet controller: Broadcom Corporation Device 1657 (rev 01)
0a:00.0 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Kernel driver in use: netXen_nic
0a:00.1 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Kernel driver in use: netXen_nic
0a:00.2 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Kernel driver in use: netXen_nic
0a:00.3 Ethernet controller: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter (rev 42)
        Kernel driver in use: netXen_nic
-
@kellyg said in Failed to get an IP via DHCP!:
the switch I’m using has been wiped (although I suspect something is still enabled on it).
The issue is that a wiped switch will typically have spanning tree enabled by default. It depends on the switch manufacturer, but most of the time it’s a good thing to have spanning tree enabled. So your suspicion is probably spot on.
-
@KellyG Unfortunately those adapters seem to be a little special. Linux does not translate the PCI IDs into device names as well as I expected it to. So I am unable to look those up and need to ask you to run this as well:
lspci -nn | grep net
(sorry, should have asked for this earlier as well)
Seems like we do have the NETXEN_NIC driver in the most current kernel configs. But don’t know about the Broadcom devices - need the PCI IDs first.
Edit: From that number 1657 it looks as if this could be a “NetXtreme BCM5719 Gigabit Ethernet PCIe card” - PCI ID 14e4:1657. It’s supposed to work with the TIGON3 driver. Don’t think we need a firmware blob for this. But on the other hand I am wondering why “Kernel driver in use” is missing in the output for those Broadcom NIC(s).
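For anyone wanting to cross-check a PCI ID against a driver themselves, here is a rough sketch. Both helper names (`pci_ids`, `pci_modalias`) are made up, the sample line is only an illustration of the `lspci -nn` format, and the `modinfo` check in the comment assumes a regular Linux system where tg3 (the module built from TIGON3) is a loadable module; with drivers compiled into the kernel that check will not apply.

```shell
#!/bin/sh
# pci_ids (hypothetical helper): reads `lspci -nn` output on stdin and prints
# the unique vendor:device IDs of all Ethernet controllers.
pci_ids() {
    sed -n 's/.*Ethernet controller.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' | sort -u
}

# pci_modalias (hypothetical helper): turns "14e4:1657" into the prefix of the
# kernel modalias string, "pci:v000014E4d00001657".
pci_modalias() {
    vendor=$(printf '%s' "${1%%:*}" | tr 'a-f' 'A-F')
    device=$(printf '%s' "${1##*:}" | tr 'a-f' 'A-F')
    printf 'pci:v0000%sd0000%s\n' "$vendor" "$device"
}

# On a live Linux system you could then check whether a module claims the ID:
#   modinfo -F alias tg3 | grep "$(pci_modalias 14e4:1657)"
# Illustrative sample line in lspci -nn format:
sample='03:00.0 Ethernet controller [0200]: Broadcom Corporation Device [14e4:1657] (rev 01)'
printf '%s\n' "$sample" | pci_ids
pci_modalias 14e4:1657
```

A match from the `modinfo` line would only say that the module declares support for the device; it would not explain why no driver ends up bound to it.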
-
By the way. Thanks to the both of you for your help.
@george1421 I’ll know something tomorrow with the dumb switch. It’s an old netgear sitting in a drawer.
@Sebastian-Roth Here’s the latest data for you.
03:00.0 Ethernet controller [0200]: Broadcom Corporation Device [14e4:1657] (rev 01)
03:00.1 Ethernet controller [0200]: Broadcom Corporation Device [14e4:1657] (rev 01)
03:00.2 Ethernet controller [0200]: Broadcom Corporation Device [14e4:1657] (rev 01)
03:00.3 Ethernet controller [0200]: Broadcom Corporation Device [14e4:1657] (rev 01)
0a:00.0 Ethernet controller [0200]: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter [4040:0100] (rev 42)
0a:00.1 Ethernet controller [0200]: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter [4040:0100] (rev 42)
0a:00.2 Ethernet controller [0200]: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter [4040:0100] (rev 42)
0a:00.3 Ethernet controller [0200]: NetXen Incorporated NX3031 Multifunction 1/10-Gigabit Server Adapter [4040:0100] (rev 42)
Let me know what else you need.
-
@KellyG Ok, from what it looks to me we have the right kernel drivers included (definitely for FOG 1.4.4 and I am sure much earlier but haven’t checked).
Try the dumb mini-switch today and see what you get.
Just as a hint: The ethernet ports aren’t always “sorted” the same way. So what is eth0 on your installed OS does not have to be eth0 (most probably is not) when FOG is booting up. Please keep that in mind when trying the different ports. Our network start script should handle this though, as it enumerates the NICs found and tries them all.
-
@sebastian-roth & @george1421
Ok… The netgear switch didn’t change anything. I’m starting to suspect the problem is in the server FOG is running on. I’ve got a meeting with HPE next week. I’ve been pinging them about this and the tech keeps telling me that there is nothing wrong with the server. However, I tried to boot the server to the corporate PXE and I couldn’t even get the PXE menu, whereas with the FOG server I could get the PXE menu with no problem. I will let everyone know the outcome of my meeting with HPE and we can pick it up from there.
I noticed that when I ran ifconfig on the server after booting to a Knoppix Live CD, the only adapters that were detected were the add-in cards. It registered them as a Broadcom NIC. So my assumption (I’m sure I don’t have to explain what assume means) is that the netXen NIC is the LOM and the Broadcom is the PCIe. So Knoppix is registering eth0 as the Broadcom. I booted to the OS (Winblows) and it shows the LOM ports as the primary NICs.
-
Moderator’s note: May not be related, but cross linking threads: https://forums.fogproject.org/topic/10599/hp-proliant-dl580-g4
-
@KellyG Any news on this?