Intel I219-LM NIC, ASUS Q170M-C Motherboard
-
@dustindizzle11 S40network is the correct init.d script to call.
That said, can you try putting a “dumb” (unmanaged) switch between the main switch and the system you’re having issues with?
Your info tells me the link-up is not being detected, which is usually a spanning tree (STP) issue. That, or the patch cable isn’t returning “link up”.
-
@dustindizzle11 I agree with Tom. What we’ve seen is that if the building switch has Spanning Tree enabled and is not configured for one of the fast spanning tree modes, the FOS engine will boot, wait, and give up before spanning tree starts forwarding data. On your building switch, check whether STP is enabled; if it is, enable one of the fast spanning tree protocols (PortFast, RSTP, fast STP, or whatever your switch manufacturer calls it).
One quick check to see if it is a spanning tree issue is to put a dumb (unmanaged) switch between the target computer and the building switch and then PXE boot the target computer. That unmanaged switch will keep the building switch from seeing the target NIC wink (momentarily drop the port link) as the target transitions from the iPXE kernel to the FOS Engine kernel.
-
There is already a dumb switch in between. I can image any other model from the same spot; only this model behaves this way.
-
We have PortFast enabled on all ports.
-
@dustindizzle11 Do you have other systems of the same model MB? Does this happen in the same way for all systems, or just this one system?
-
@Tom-Elliott Yes, we have other systems with the same model MB. This happens in the same way on all systems that have this MB. Thanks again for trying to help with this; I appreciate the time you guys take to help troubleshoot. It is strange that it is not auto-configuring eth0. By the way, in debug mode, just adding eth0 to the interfaces file and then restarting the network is all that needs to be done to get an address and image the computer with FOG. I mention this because an earlier comment of mine described more steps than were actually needed to get an address.
-
@dustindizzle11 The S40 file you referenced is supposed to do this for you, but it only adds the interface if the NIC has a cable attached (or rather, if it recognizes that the link is up). I suspect this is where it’s failing: it can’t detect the link for whatever reason.
-
@Tom-Elliott I thought that either Sebastian or you added a loop to the network startup code that would check, wait, and then check again for a link or a DHCP packet before giving up on the interface. This was done to mask the spanning tree issue (I know that is not the case here), but it would seem the network link is slow to come up for some reason.
-
@george1421 Just want to add: we just tested a different FOG server that has its own network (it hands out DHCP), and only a dumb switch sits between this FOG server and one of these machines. Still the exact same result.
-
@dustindizzle11 As George already said, the network script does, to some extent, wait for a link to come up. It checks the link state of all the network devices for about 35 seconds and bails out if the link does not come up in that time. Take a look at the script here.
I am not sure what else we could do. Bring up the interface even if the link state is not connected?
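For readers following along, a 35-second link check like the one Sebastian describes can be sketched roughly as below. This is a hypothetical illustration, not FOG’s actual S40network code: it reads the kernel’s /sys/class/net/<iface>/carrier flag once per second, skips loopback, and prints the interfaces whose carrier is up. The names wait_for_links and SYSFS_NET are made up for the example.

```shell
# Hypothetical sketch of a carrier-polling loop (bash), NOT FOG's S40network.
# SYSFS_NET is an illustrative override so the logic can be tried on a fake tree.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Poll all non-loopback interfaces once per second for up to $1 seconds
# (default 35). Print the interfaces with carrier up and return 0, or
# return 1 if no link appears before the timeout.
wait_for_links() {
    local timeout=${1:-35} i path name
    local -a up
    for ((i = 0; i < timeout; i++)); do
        up=()
        for path in "$SYSFS_NET"/*; do
            name=${path##*/}
            [[ $name == lo ]] && continue
            # A failed read (e.g. interface administratively down) counts as no link.
            if [[ $(cat "$path/carrier" 2>/dev/null) == 1 ]]; then
                up+=("$name")
            fi
        done
        if ((${#up[@]} > 0)); then
            printf '%s\n' "${up[@]}"
            return 0
        fi
        sleep 1
    done
    return 1
}
```

Usage would be something like `wait_for_links 35 || echo "no link detected"`. A NIC that never reports carrier 1 during that window is silently dropped, which matches the symptom described in this thread.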
-
@Sebastian-Roth I totally understand if not much, if anything, can be done; I just want to give people a heads-up that this NIC or NIC/MB combo does not play nicely with FOG at the moment. We were able to image the whole lab, but it involved entering debug mode on every machine, manually bringing up the interface, and then typing “fog” to image it. Thanks, everyone, for your time and help trying to figure this out. If I find out anything else in the future, I will add it here.
Dustin
-
@dustindizzle11 It’s not clear to me: is it time that fixes the issue, or manually bringing up the interface?
If it’s manually bringing up the interface (or time, for that matter), you “could” create a custom init.xz (e.g. init_asus.xz) that executes the ifup command during the initialization stage of the FOS Engine startup. Then assign this new init_asus.xz to the systems that have this mobo installed. It’s not the cleanest solution, but at least you could then automate imaging for this kind of mobo. If you know what needs to be done, the actual update will take you about 15 minutes to copy and modify a new init.xz and push it out to these clients’ settings in the FOG GUI.
[edit] changed "15 minutes to create a new init.xz " to "15 minutes to copy and modify a new init.xz " to give a better understanding of what must be done.
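The custom-init approach boils down to unpack, edit, repack. The helpers below are a hypothetical sketch, assuming a Debian/Ubuntu-style FOG install where the boot files live in /var/www/html/fog/service/ipxe, that the decompressed init is a loop-mountable filesystem image, and that you run as root. unpack_init, repack_init, IPXE_DIR and WORK_DIR are made-up names; adjust the paths for your server.

```shell
# Hypothetical helpers for building a per-model init such as init_asus.xz (bash).
# ASSUMPTIONS: boot files in $IPXE_DIR (Debian/Ubuntu-style FOG path), the
# decompressed init is loop-mountable, and these functions run as root.
IPXE_DIR=${IPXE_DIR:-/var/www/html/fog/service/ipxe}
WORK_DIR=${WORK_DIR:-/tmp/custom-init}

# Copy the stock init.xz, decompress it (xz -d leaves $WORK_DIR/init),
# and loop-mount the image at $WORK_DIR/mnt for editing.
unpack_init() {
    mkdir -p "$WORK_DIR/mnt" &&
        cp "$IPXE_DIR/init.xz" "$WORK_DIR/init.xz" &&
        xz -d -f "$WORK_DIR/init.xz" &&
        mount -o loop "$WORK_DIR/init" "$WORK_DIR/mnt"
}

# Unmount, recompress with a CRC32 integrity check (the Linux kernel's xz
# documentation recommends --check=crc32 for images the kernel decodes),
# and publish the result under a new name, init_asus.xz by default.
repack_init() {
    local name=${1:-init_asus.xz}
    umount "$WORK_DIR/mnt" &&
        xz -C crc32 -9 -f "$WORK_DIR/init" &&
        cp "$WORK_DIR/init.xz" "$IPXE_DIR/$name"
}
```

A session might look like: run `unpack_init`, edit files under `$WORK_DIR/mnt` (for example the interfaces file or the network startup script), run `repack_init init_asus.xz`, then assign init_asus.xz to the affected hosts in the FOG web UI. Keeping the stock init.xz untouched means every other model keeps booting exactly as before.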
-
@dustindizzle11 Yeah, it would be great to actually find out what’s causing this issue. Are you keen to modify your init.xz as George suggested? I think this is a great idea! Let us know so we can guide you.
-
I think the fogproject repo actually has ways to help with this, but if not, I’ve also got my own scripts to help with “changing” things.
https://forums.fogproject.org/topic/7525/useful-scripts/3 It’s just a script, but I think the principle of how it works can help you do what you need to do.
-
@george1421 The problem seems to lie with “eth0” not being present in the interfaces file itself. Once it is added and the network is restarted, we get an address. I am definitely open to creating a custom init; it doesn’t look too complicated. I just need to figure out which specific steps are needed to bring the interface up.
@Sebastian-Roth Yes, I am willing to modify the init.xz; I just need to make some time to do it.
@Tom-Elliott Thanks for the script! I’ll have to take a look at that.
-
@dustindizzle11 said:
The problem seems to lie with “eth0” not being present in the interfaces file itself.
Have you had a look at the network startup script yet? We generate the config file and start udhcpc as needed.
The issue is probably that the NIC is not showing up fast enough, as we enumerate the list of NICs only once (line 18f):
# Interface(s) whose MAC address matches $mac
read p_ifaces <<< $(/sbin/ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && /'$mac'/ {print $2}')
# All other Ethernet interfaces
read o_ifaces <<< $(/sbin/ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && !/'$mac'/ {print $2}')
Then we loop until the interfaces are up. In your case, I suppose the NIC is not showing up at first but somehow appears later.
@Tom-Elliott Should we change the script to enumerate the NICs again after some waiting period? It’s interesting that we didn’t run into this for such a long time. This NIC/mainboard combo definitely seems to be kind of special.
-
@Sebastian-Roth Maybe: if you don’t enumerate any network interfaces on the first pass, wait 30 seconds (or whatever time) and then try one more time. This would keep booting fast for well-behaved network interfaces and give the “slow” ones a second chance to come up.
Tom: What about the usb kernel flag? Would that give the network enough time to come up on these boards? I don’t know where else this flag is used or what its impact would be.
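George’s “enumerate, and if nothing turns up, wait and enumerate once more” idea could look roughly like this. It is a hypothetical sketch: it lists /sys/class/net instead of parsing `ip -0 -o addr show` as the FOG script does, and list_ifaces, enumerate_with_retry and SYSFS_NET are illustrative names.

```shell
# Hypothetical sketch (bash) of a single delayed re-enumeration pass.
# Lists /sys/class/net rather than parsing `ip` output like FOG's script.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Print every interface except loopback, one per line.
list_ifaces() {
    local path
    for path in "$SYSFS_NET"/*; do
        [[ -e $path ]] || continue   # empty directory: the glob did not match
        [[ ${path##*/} == lo ]] && continue
        printf '%s\n' "${path##*/}"
    done
}

# Fast path for well-behaved NICs; one second pass after $1 seconds
# (default 30) for slow ones. Returns 1 if both passes find nothing.
enumerate_with_retry() {
    local delay=${1:-30} ifaces
    ifaces=$(list_ifaces)
    if [[ -z $ifaces ]]; then
        sleep "$delay"
        ifaces=$(list_ifaces)
    fi
    [[ -n $ifaces ]] && printf '%s\n' "$ifaces"
}
```

The delay only costs time when the first pass finds nothing, which is the trade-off George describes: normal hardware boots as fast as before.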
-
@Sebastian-Roth I think the problem is that the network interface is not showing up at all, rather than a timing issue. For whatever reason, the device never considers its NIC to be in an “Up” state. Because it never detects it as up, it never adds it to the list.
Maybe we should drop the “link up” check and just try to bring the NIC up ourselves?
-
@Tom-Elliott We don’t actually check the “up” state but the carrier (link) state. Possibly the NIC/driver just does not report the link as connected until the interface is brought UP. That would be kind of stupid, but well…
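Sebastian’s hypothesis is consistent with how Linux reports carrier: /sys/class/net/<iface>/carrier cannot be read (the read fails with “Invalid argument”) while the interface is administratively down, so a check that only reads carrier never sees such a NIC. A hypothetical sketch of “bring it up first, then wait for carrier” follows; set_link_up, carrier_after_up and SYSFS_NET are illustrative names, and set_link_up needs root on a real system.

```shell
# Hypothetical sketch (bash): set the interface UP, then poll its carrier.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# Administratively bring the link up (iproute2). Kept as a separate
# function so it can be swapped out, e.g. for testing.
set_link_up() {
    ip link set dev "$1" up
}

# Bring $1 up, then check its carrier once per second for up to $2
# seconds (default 10). Prints 1 and returns 0 on link, prints 0 and
# returns 1 on timeout; returns 1 without output if set_link_up fails.
carrier_after_up() {
    local iface=$1 timeout=${2:-10} i state
    set_link_up "$iface" || return 1
    for ((i = 0; i < timeout; i++)); do
        state=$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)
        if [[ $state == 1 ]]; then
            echo 1
            return 0
        fi
        sleep 1
    done
    echo 0
    return 1
}
```

For example, `carrier_after_up eth0 35` would give a NIC like this one the same 35-second window, but only after it has been brought UP.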
-
@Sebastian-Roth @Tom-Elliott @george1421
I took your advice and modified the init.xz to make a custom init. All I did was add “auto eth0” and “iface eth0 inet dhcp” to the interfaces file. It brings up eth0, and the machine images after adding this. So at this point it works. It’s probably not the best fix in the world, but hey, it works!
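For anyone reproducing this fix, the edit can be scripted so it is safe to run more than once. A minimal sketch, assuming the standard ifupdown interfaces-file syntax; add_dhcp_stanza is a hypothetical helper name.

```shell
# Minimal sketch (bash): ensure an interfaces file declares $2 (default
# eth0) as "auto" with DHCP. Appends only when no "iface <name>" stanza
# exists yet, so repeated runs leave the file unchanged.
add_dhcp_stanza() {
    local file=$1 iface=${2:-eth0}
    if grep -qE "^[[:space:]]*iface[[:space:]]+$iface[[:space:]]" "$file" 2>/dev/null; then
        return 0   # already configured
    fi
    printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface" >> "$file"
}
```

In debug mode this would be `add_dhcp_stanza /etc/network/interfaces eth0` followed by `ifup eth0`; for a custom init, point it at the copy of the interfaces file inside the mounted image instead.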