Registration of Hosts With Multiple NICs
-
@george1421 Tried each one, got the same result. Also, when doing full registration/quick registration if I register it twice (delete host and re-register) it gives the exact same mac address each time.
At this point I’m super stumped, because it would kind of require a host defining itself based on two different mac addresses in two different places.
-
@sbergeron Hmmm, something is not matching up here. It could quite possibly be that there are so many mac addresses in that system that it’s confusing FOG. I can’t see how at the moment. Can you confirm that the mac address that is registered in FOG is the mac address for the network adapter in question?
-
@george1421 Yes, it is the mac address of the first port on the integrated 10Gbit SFP+ NIC
-
@sbergeron ok give me a few minutes to come up with a sql query. We need to ensure that the mac address is actually being recorded correctly in the database.
-
@george1421 What’s odd is if we register it with the 1gig interface enabled, then disable it afterwards, it PXE boots just fine and shows as registered.
-
@sbergeron See, that is what I was referring to with the multiple interfaces. What is going on is that iPXE (the tool that creates the boot menu) only looks at the first two mac addresses in the device.
(correction, it looks at the first three interfaces) ref: https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript
I’m still not sure how it’s getting to the iPXE menu at this point, because if none of the first 3 interfaces gets an IP address then it should error out.
As for the sql statement, I don’t think we need it at this point, but I’ll document it here just in case.
Select h.hostName, m.hmMAC, length(m.hmMAC) from hosts h left join hostMAC m on h.hostID=m.hmHostID where h.hostName='<name of host>';
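For anyone who wants to see what that query is checking before running it on a real server, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in (FOG itself uses MySQL), models only the tables and columns named in the query, and the host name and MAC are made-up values. The point of `length(m.hmMAC)` is that a colon-separated MAC should come back as 17 characters; anything else suggests stray characters were stored.

```python
import sqlite3

# Hypothetical stand-in for the FOG database: only the columns used by the
# query above are modeled, and the single row is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts (hostID INTEGER PRIMARY KEY, hostName TEXT);
    CREATE TABLE hostMAC (hmHostID INTEGER, hmMAC TEXT);
    INSERT INTO hosts VALUES (1, 'server01');
    INSERT INTO hostMAC VALUES (1, '00:11:22:33:44:55');
""")

QUERY = """
    SELECT h.hostName, m.hmMAC, length(m.hmMAC)
    FROM hosts h LEFT JOIN hostMAC m ON h.hostID = m.hmHostID
    WHERE h.hostName = ?
"""

rows = conn.execute(QUERY, ("server01",)).fetchall()
for name, mac, mac_len in rows:
    # A MAC written as six colon-separated hex pairs is 17 characters long.
    status = "ok" if mac_len == 17 else "unexpected length"
    print(name, mac, mac_len, status)
```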
-
@sbergeron Well, I think I know why it’s messing up, but we may need to get a developer in the mix here to fix it. It can be fixed, it’s just going to take some noodling.
[for developers] @Developers
In the ipxe script that is in the ipxe boot kernel, it tries net0-net2 to get a dhcp address; failing that, it tries dhcp all, which is where it’s probably getting an IP address on the net3-net7 interfaces. Then it chains to default.ipxe. In default.ipxe it executes this ipxe script:
#!ipxe
cpuid --ext 29 && set arch x86_64 || set arch i386
params
param mac0 ${net0/mac}
param arch ${arch}
param platform ${platform}
param product ${product}
param manufacturer ${product}
param ipxever ${version}
param filename ${filename}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://<fog_server_ip>/fog/service/ipxe/boot.php##params
Where again it only looks at net0-net2. I think to fix this we need to make mac0 be the interface that is actually getting the IP address and not just the first network interface detected. I realize this is a rare case where we have a device that has more than 3 mac addresses being returned. But if it happened once, it will happen again (IMO).
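To make the suspected mismatch concrete, here is a toy model in Python. It is not FOG or iPXE code: the function names, the eight made-up MAC addresses, and the choice of net4 as the registered interface are all assumptions for illustration. It only shows why a host registered under a later interface's MAC would look unregistered if the boot script passes along net0-net2 only.

```python
def macs_sent_to_server(interfaces, limit=3):
    """Return the MACs a script limited to net0..net2 would pass along."""
    return [nic["mac"] for nic in interfaces[:limit]]


def is_registered(sent_macs, registered_macs):
    """True if any MAC that was sent matches a registered one."""
    return any(mac in registered_macs for mac in sent_macs)


# Eight hypothetical interfaces, net0..net7, with invented MAC addresses.
interfaces = [
    {"name": f"net{i}", "mac": f"00:00:00:00:00:{i:02x}"} for i in range(8)
]

# Assumption: the host was registered under the MAC of net4.
registered = {interfaces[4]["mac"]}

sent = macs_sent_to_server(interfaces)
print("sent:", sent)
print("registered according to the first three MACs:", is_registered(sent, registered))
```

In this model the lookup fails even though the host is registered, simply because the registered MAC is never among the ones sent.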
-
Well, looks like we have our answer.
We’re currently just having a couple people go through the servers and disable that first NIC but if this gets resolved before we get more servers that’s a fine solution for me.
-
@Sbergeron This sounds very interesting. Can you explain in more detail? I still don’t really get what is going wrong here.
We PXE booted from the same interface each time, and the unregistered mac address is the same as the registered one. […] It shows the same mac address it was registered with and succeeded in PXE booting from.
This just doesn’t add up for me. Looking forward to hearing what’s going on.
-
@sebastian-roth While I can’t speak to what the OP is seeing, I think I understand what is happening.
The OP has a server with 8 network interfaces: (2) 1GbE on the mobo, (2) 10G on a riser card and (4) 1GbE on another card (the counts are right, the locations are guesses). So that is 8 mac addresses. I don’t know the order in which iPXE and FOS find the mac addresses, but let’s say the 5th network adapter is the one actually plugged into their business network. The iPXE environment only looks at the first 3 mac addresses. It never attempts to query the 5th network card to see if it is valid.
I did a little thinking on this over lunch and I think this script (replacing the default.ipxe for this OP only) will get us started. I can tell you that it will not work in its current state (probably) and I haven’t had time to even debug it, but here is the idea.
#!ipxe
set fogip 192.168.1.50
set idx:int32 0
set bmac ${net0/mac}
:nettest
isset ${net${idx}.dhcp/ip:ipv4} || goto nexttest
ping --count 1 ${fogip} || goto nexttest
set bmac ${net${idx}/mac}
goto nettestdone
:nexttest
inc idx
iseq ${idx} 10 || goto nettest
:nettestdone
cpuid --ext 29 && set arch x86_64 || set arch i386
params
param mac0 ${bmac}
param arch ${arch}
param platform ${platform}
param product ${product}
param manufacturer ${product}
param ipxever ${version}
param filename ${filename}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://${fogip}/fog/service/ipxe/boot.php##params
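The selection loop in that script can be read as the following Python sketch. This is only a model of the idea, not something iPXE runs: `pick_boot_mac`, the `can_reach_server` callback, and the sample interfaces are hypothetical, and the per-interface reachability check is a simplification of the script's `ping` step.

```python
def pick_boot_mac(interfaces, can_reach_server, max_index=10):
    """Pick the MAC of the first interface that has an IP and reaches the server.

    Falls back to the first interface's MAC, as the script does with net0.
    """
    fallback = interfaces[0]["mac"]
    for nic in interfaces[:max_index]:
        if nic.get("ip") is None:
            continue  # no DHCP lease on this interface, try the next one
        if not can_reach_server(nic):
            continue  # has an IP but cannot reach the server
        return nic["mac"]
    return fallback


# Hypothetical machine: eight interfaces, only net4 has a DHCP lease.
interfaces = [
    {"name": f"net{i}", "mac": f"00:00:00:00:00:{i:02x}", "ip": None}
    for i in range(8)
]
interfaces[4]["ip"] = "192.168.1.77"

chosen = pick_boot_mac(interfaces, can_reach_server=lambda nic: True)
print("mac0 would be:", chosen)
```

With this selection, mac0 becomes the MAC of the interface that actually got an address, rather than whatever was detected first.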
-
@george1421 On the one hand, I really like your idea. But then I am wondering how often this extra step will cause problems for other users, maybe because ICMP is blocked or whatnot. Don’t get me wrong, I am not saying we shouldn’t implement this.
Mind opening an issue on GitHub so we can discuss this?
-
@Sbergeron Talked to George about this in chat and I think his point on adjusting the /tftpboot/default.ipxe script could help in this particular case. So you might want to give this a try. Please let us know if that works instead of disabling the other network cards.
-
I’ll mark this solved as it seems like the issue was fixed by disabling the other NICs.