Registration of Hosts With Multiple NICs
-
I’m in charge of deploying upwards of 50 servers with several NICs that we want to integrate into our existing FOG-managed deployment.
These servers are from HP. Each has a 4-port RJ-45 NIC, a 2-port integrated 10Gbit SFP+ NIC, and two Intel 4-port SFP+ 10Gbit NICs. We want to do almost all management over the integrated 10Gbit SFP+ NIC, and have no way of interfacing with the NIC sporting the RJ-45 jacks at this time.
When we PXE boot from FOG, we are able to register hosts on the first port of the integrated 10Gbit NIC, but when we reboot and PXE boot again they show up as unregistered. On test servers, disabling any interface before the one used seems to fix this issue, but that isn’t a particularly viable solution, especially if we get more servers (likely) with the same limitation, since disabling the NIC isn’t easily scriptable.
What could be the reason why extra interfaces (with seeming precedence for PXE booting) confuse FOG and cause it to show hosts as unregistered?
Thanks in advance for any help.
-
So just for clarity here, when you PXE boot your test server, you are PXE booting from the same interface each time?
When you enter the FOG iPXE menu, is the “unregistered” MAC address the same as what was registered?
And I probably need to ask, what version of FOG are you using?
-
We’re running FOG 1.4.4.
We PXE booted from the same interface each time, and the unregistered mac address is the same as the registered one.
-
@sbergeron Well this one is an interesting one.
When you go into host management and look at the host definition for this system, what mac is listed as primary?
If there is more than one mac address listed, you may need to tell FOG to ignore the macs you don’t care about.
-
@george1421 It shows the same mac address it was registered with and succeeded in PXE booting from.
There is only one mac address listed.
-
@sbergeron Did you manually register this host or registered it via the quick or full registration?
-
@george1421 Tried each one, got the same result. Also, when doing full or quick registration, if I register it twice (delete the host and re-register), it gives the exact same mac address each time.
At this point I’m super stumped, because it would kind of require a host defining itself based on two different mac addresses in two different places.
-
@sbergeron Hmmm, something is not matching up here. It could quite possibly be that there are so many MAC addresses in that system that it’s confusing FOG. I can’t see how at the moment. Can you confirm that the MAC address that is registered in FOG is the MAC address for the network adapter in question?
-
@george1421 Yes, it is the mac address of the first port on the integrated 10Gbit SFP+ NIC
-
@sbergeron OK, give me a few minutes to come up with a SQL query. We need to ensure that the MAC address is actually being recorded correctly in the database.
-
@george1421 What’s odd is if we register it with the 1gig interface enabled, then disable it afterwards, it PXE boots just fine and shows as registered.
-
@sbergeron See, that is what I was referring to with the multiple interfaces. What is going on is that iPXE (the tool that creates the boot menu) only looks at the first two MAC addresses in the device.
(Correction: it looks at the first three interfaces.) ref: https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript
I’m still not sure how it’s getting to the iPXE menu at this point, because if none of the first 3 interfaces gets an IP address then it should error out.
As for the SQL statement, I don’t think we need it at this point, but I’ll document it here just in case.
Select h.hostName, m.hmMAC, length(m.hmMAC) from hosts h left join hostMAC m on h.hostID=m.hmHostID where h.hostName='<name of host>';
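For anyone who wants to see what that query returns before running it against the real database, here is a minimal Python sketch. It is an illustration only: FOG's database is MySQL/MariaDB, while this uses an in-memory sqlite3 database so it runs standalone. The two tables are cut down to the columns named in the query, and the host name and MAC address are made-up sample values.

```python
# Sketch only: sqlite3 stands in for FOG's MySQL/MariaDB database.
# Table and column names come from the query above; the rows are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts (hostID INTEGER PRIMARY KEY, hostName TEXT);
    CREATE TABLE hostMAC (hmHostID INTEGER, hmMAC TEXT);
    INSERT INTO hosts VALUES (1, 'server01');
    INSERT INTO hostMAC VALUES (1, '00:00:5e:00:53:10');
""")

QUERY = """
    SELECT h.hostName, m.hmMAC, length(m.hmMAC)
    FROM hosts h LEFT JOIN hostMAC m ON h.hostID = m.hmHostID
    WHERE h.hostName = ?
"""

rows = conn.execute(QUERY, ("server01",)).fetchall()
for name, mac, mac_len in rows:
    # A well-formed colon-separated MAC is 17 characters long, so any
    # other length hints at stray whitespace or a truncated value.
    print(name, mac, mac_len, "ok" if mac_len == 17 else "unexpected length")
# prints: server01 00:00:5e:00:53:10 17 ok
```

The length column is the useful part: a value other than 17 would mean the stored MAC is not what it looks like on screen.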
-
@sbergeron Well, I think I know why it’s messing up, but we may need to get a developer in the mix here to fix it. It can be fixed; it’s just going to take some noodling.
[for developers] @Developers
In the iPXE script that is in the iPXE boot kernel, it tries net0-net2 to get a DHCP address; failing that, it tries dhcp all, which is where it’s probably getting an IP address on one of the net3-net7 interfaces. Then it chains to default.ipxe.
In default.ipxe it executes this iPXE script:
#!ipxe
cpuid --ext 29 && set arch x86_64 || set arch i386
params
param mac0 ${net0/mac}
param arch ${arch}
param platform ${platform}
param product ${product}
param manufacturer ${product}
param ipxever ${version}
param filename ${filename}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://<fog_server_ip>/fog/service/ipxe/boot.php##params
Where again it only looks at net0-net2. I think to fix this we need to make mac0 be the interface that is actually getting the IP address and not just the first network interface detected. I realize this is a rare case where we have a device that has more than 3 mac addresses being returned. But if it happened once, it will happen again (IMO).
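To make the failure mode concrete, here is a small Python model of the behaviour described above. It is only a sketch: the MAC addresses and the detection order are made up, and the real host lookup happens on the FOG server, which this code does not reproduce. The only thing it takes from the thread is that default.ipxe passes along the MACs of net0 to net2 and nothing else.

```python
# Hypothetical model of the lookup problem; all MACs are made-up examples.

def macs_sent_by_default_ipxe(interfaces):
    """Return the MACs default.ipxe passes along: net0, net1 and net2 only."""
    return interfaces[:3]


def shows_as_registered(interfaces, registered_macs):
    """True if any MAC that reaches the server matches a registered one."""
    sent = macs_sent_by_default_ipxe(interfaces)
    return any(mac in registered_macs for mac in sent)


# Assumed detection order: the four RJ-45 ports first, then the integrated ports.
rj45 = [
    "00:00:5e:00:53:00",
    "00:00:5e:00:53:01",
    "00:00:5e:00:53:02",
    "00:00:5e:00:53:03",
]
integrated = ["00:00:5e:00:53:10", "00:00:5e:00:53:11"]

# The host was registered with the first integrated port.
registered = {integrated[0]}

# All NICs enabled: the registered MAC sits at net4 and is never sent.
print(shows_as_registered(rj45 + integrated, registered))  # False

# RJ-45 NIC disabled: the integrated ports move up to net0 and net1.
print(shows_as_registered(integrated, registered))  # True
```

Under these assumptions the host only looks unregistered because the MAC it was registered with never makes it into the request, which would also explain why disabling the earlier NIC makes the problem go away.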
-
Well, looks like we have our answer.
We’re currently just having a couple people go through the servers and disable that first NIC but if this gets resolved before we get more servers that’s a fine solution for me.
-
@Sbergeron This sounds very interesting. Can you explain in more detail? I still don’t really get what is going wrong here.
We PXE booted from the same interface each time, and the unregistered mac address is the same as the registered one. […] It shows the same mac address it was registered with and succeeded in PXE booting from.
This just doesn’t add up for me. Looking forward to hearing what’s going on.
-
@sebastian-roth While I can’t speak to what the OP is seeing, I think I understand what is happening.
Going by the original post, the OP has a server with 14 network interfaces: (4) 1GbE RJ-45 ports, (2) integrated 10G SFP+ ports, and (8) 10G SFP+ ports across two add-in cards. So that is 14 MAC addresses. I don’t know the order in which iPXE and FOS find the actual MAC addresses, but let’s say the 5th network adapter is the one actually plugged into their business network. The iPXE environment only looks at the first 3 MAC addresses. It never attempts to query the 5th network card to see if it is valid.
I did a little thinking on this over lunch and I think this script (replacing the default.ipxe for this OP only) will get us started. I can tell you that it will not work in its current state (probably) and I haven’t had time to even debug it, but here is the idea.
#!ipxe
set fogip 192.168.1.50
set idx:int32 0
set bmac ${net0/mac}
:nettest
isset ${net${idx}.dhcp/ip:ipv4} || goto nexttest
ping --count 1 ${fogip} || goto nexttest
set bmac ${net${idx}/mac}
goto nettestdone
:nexttest
inc idx
iseq ${idx} 10 || goto nettest
:nettestdone
cpuid --ext 29 && set arch x86_64 || set arch i386
params
param mac0 ${bmac}
param arch ${arch}
param platform ${platform}
param product ${product}
param manufacturer ${product}
param ipxever ${version}
param filename ${filename}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://${fogip}/fog/service/ipxe/boot.php##params
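Since the draft above is untested, the selection logic it is aiming for can be modelled in plain Python to check the idea separately from iPXE syntax. This is a sketch, not iPXE code: the `Nic` class, the `pick_boot_mac` function and the `can_reach_fog` callback are hypothetical names, the addresses are made up, and the reachability check is simplified to a per-interface callback, whereas the ping in the draft is not tied to a specific interface.

```python
# Hypothetical model of the draft script's loop; not iPXE code.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Nic:
    mac: str
    ip: Optional[str] = None  # None means no DHCP lease on this interface


def pick_boot_mac(
    nics: List[Nic],
    can_reach_fog: Callable[[Nic], bool],
    limit: int = 10,
) -> str:
    """Return the MAC of the first interface that has an IP and reaches the server.

    Falls back to the first interface's MAC, as the draft does with bmac.
    Assumes at least one interface is present.
    """
    boot_mac = nics[0].mac
    for nic in nics[:limit]:
        if nic.ip is None:
            continue  # mirrors the isset test on the interface's DHCP address
        if not can_reach_fog(nic):
            continue  # mirrors the ping test
        boot_mac = nic.mac
        break
    return boot_mac


# Made-up example: only the fifth interface got a DHCP lease.
nics = [
    Nic("00:00:5e:00:53:00"),
    Nic("00:00:5e:00:53:01"),
    Nic("00:00:5e:00:53:02"),
    Nic("00:00:5e:00:53:03"),
    Nic("00:00:5e:00:53:10", ip="192.168.1.77"),
]

print(pick_boot_mac(nics, lambda nic: True))  # 00:00:5e:00:53:10
```

If the reachability check fails for every interface, the function returns the first interface's MAC, so in that case the result is the same as what default.ipxe sends today as mac0.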
-
@george1421 On the one hand, I really like your idea. But then I am wondering how often this extra step will cause problems for other users, maybe because ICMP is blocked or whatnot. Don’t get me wrong, I am not saying we shouldn’t implement this.
Mind opening an issue on GitHub to discuss this?
-
@Sbergeron Talked to George about this in chat and I think his point on adjusting the /tftpboot/default.ipxe script could help in this particular case. So you might want to give this a try. Please let us know if that works instead of disabling the other network cards.
-
I’ll mark this solved as it seems like the issue was fixed by disabling the other NICs.