i219LM NIC, ASUS Q170M-C Motherboard
-
There is already a dumb switch in between. I can image any other model from the same spot; only this model behaves this way.
-
We have PortFast enabled on all ports.
-
@dustindizzle11 Do you have other systems of the same model MB? Does this happen in the same way for all systems, or just this one system?
-
@Tom-Elliott Yes, we have other systems with the same model MB, and it happens the same way on all of them. Thanks again for trying to help with this, I appreciate the time you guys take to help troubleshoot. It is strange that eth0 is not being auto-configured. Btw, in debug mode, just adding eth0 to the interfaces file and then restarting the network is all that is needed to get an address and fog the computer. I mention it because what I described in an earlier comment was more than I actually needed to do.
-
@dustindizzle11 The s40 file you referenced is supposed to do this for you, but it only adds the interface if the NIC has a cable attached (or, shall I say, recognizes that the link is up). I suspect this is where it’s failing: it can’t detect the link for whatever reason.
-
@Tom-Elliott I thought that either Sebastian or you added a loop to the network startup code where it would check, wait, and then check again for a link or DHCP packet to be received before giving up on the interface. This was done to mask the spanning tree issue (I know this is not the case here), but it would seem the network link is slow to come up for some reason.
-
@george1421 Just want to add, we just tested a different fog server that has its own network (hands out DHCP), with only a dumb switch sitting between this fog server and one of these models. Still the exact same result.
-
@dustindizzle11 As George already said the network script does kind of wait for a link to come up. It checks link state of all the network devices for about 35 seconds and bails out if the link does not come up in that time. Take a look at the script here.
I am not sure what else we could do? Bring up the interface even if the link state is not connected?
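For anyone following along, the wait-for-link behaviour described here can be sketched roughly as below. This is not the actual FOG network script, just an illustration under stated assumptions: the function name `wait_for_link` and the `SYSFS_NET` override are made up for the example, and the 35-second default simply mirrors the figure mentioned above.

```shell
# Sketch only: poll an interface's carrier flag once per second until it
# reads 1 or the timeout expires. SYSFS_NET is a hypothetical override so the
# function can be pointed at a fake directory for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_link() {
    local iface="$1" timeout="${2:-35}" elapsed=0 state
    while [ "$elapsed" -lt "$timeout" ]; do
        # Reading carrier fails while the interface is administratively down,
        # so a failed read is treated the same as "no link".
        state="$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)" || state=""
        [ "$state" = "1" ] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "No link detected on $iface for $timeout seconds, skipping it." >&2
    return 1
}

# Usage on a real system:
#   wait_for_link eth0 35 && echo "eth0 has link"
```

If the carrier never flips to 1, as seems to happen on this board, a loop like this gives up no matter how long the timeout is.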
-
@Sebastian-Roth I totally understand if not much can be done. If anything, I just want to give people a heads-up that this NIC or NIC/MB combo does not play nicely with Fog at the moment. We were able to image the whole lab, but it involved entering debug mode on all the machines, manually bringing up the interface, and then typing “fog” to image them. Thanks everyone for your time and help trying to figure this out. If I find anything else out in the future I will add it here.
Dustin
-
@dustindizzle11 It’s not clear in my mind: is it time that fixes the issue, or manually bringing up the interface?
If it’s manually bringing up the interface (or time, for that matter), you “could” create a custom init.xz (i.e. init_asus.xz) that executes the ifup command during the initialization stage of the FOS Engine startup. Then assign this new init_asus.xz to the systems that have this mobo installed. It’s not the cleanest solution, but at least you could automate system imaging with this kind of mobo. If you know what needs to be done, the actual update will take you about 15 minutes to copy and modify a new init.xz and push it out to these clients’ settings in the FOG GUI.
[edit] changed "15 minutes to create a new init.xz " to "15 minutes to copy and modify a new init.xz " to give a better understanding of what must be done.
-
@dustindizzle11 Yeah it would be great to actually find out what’s causing this issue. Are you keen to modify your init.xz as George suggested? I think this is a great idea! Just wondering if you are keen to do that. Let us know so we can guide you.
-
I think the fogproject repo actually has ways to help this, but if not I’ve also had my own scripts to help with “changing” things.
https://forums.fogproject.org/topic/7525/useful-scripts/3 It’s just a script, but the principle of how it works can be used to help do what you need to do I think.
-
@george1421 The problem seems to lie with “eth0” not being present in the interfaces file itself. Once it is added and the network is restarted, we get an address. I am definitely open to creating a custom init, it doesn’t look too complicated. I just need to figure out what specific things I would need to do to bring it up.
@Sebastian-Roth Yes, I am willing to modify the init.xz, I just need to make some time to do it.
@Tom-Elliott Thanks for the script! I’ll have to take a look at that.
-
@dustindizzle11 said:
The problem seems to lie with “eth0” not being present in the interfaces file itself.
Have you had a look at the network startup script yet? We generate the config file and start udhcpc as needed.
The issue is probably that the NIC does not show up fast enough, since we enumerate the list of NICs only once (line 18f):
read p_ifaces <<< $(/sbin/ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && /'$mac'/ {print $2}')
read o_ifaces <<< $(/sbin/ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && !/'$mac'/ {print $2}')
Then we loop till we get the interfaces up. In your case the NIC is not showing up at first but somehow later I suppose.
@Tom-Elliott Should we change the script to enumerate NICs after some waiting period again? It’s interesting that we didn’t run into this for such a long time. Definitely seems like this NIC/mainboard combo is kind of special.
-
@Sebastian-Roth Maybe something like this: if no network interfaces are enumerated on the first pass, wait 30 seconds (or whatever time) and then try one more time. This would keep booting fast for well-behaved network interfaces and give the “slow” ones a second chance to come up.
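The “second chance” idea could look something like the sketch below. It is a simplified illustration, not the FOG script: it lists names under sysfs instead of parsing `ip` output, and the function names and the `SYSFS_NET` override are assumptions made for the example.

```shell
# Sketch only: enumerate interfaces once; if none are found, wait and try
# exactly one more time. SYSFS_NET is a hypothetical override for testing.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the names of all entries except loopback, one per line.
list_ifaces() {
    local path name
    for path in "$SYSFS_NET"/*; do
        [ -e "$path" ] || continue        # glob matched nothing
        name="${path##*/}"
        [ "$name" = "lo" ] || echo "$name"
    done
}

# First pass is immediate, so well-behaved NICs are not slowed down.
list_ifaces_with_retry() {
    local delay="${1:-30}" found
    found="$(list_ifaces)"
    if [ -z "$found" ]; then
        sleep "$delay"
        found="$(list_ifaces)"
    fi
    if [ -n "$found" ]; then
        echo "$found"
    else
        return 1
    fi
}

# Usage on a real system:
#   ifaces="$(list_ifaces_with_retry 30)" || echo "still no interfaces"
```

The delay is only paid when the first pass comes back empty, which is the trade-off described above.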
Tom: What about the usb kernel flag, would that give enough pause for the network to come up on these boards? I don’t know where else this flag is used or the impact.
-
@Sebastian-Roth I think the problem is that the network interface is not showing up at all, rather than a timing thing. For whatever reason, the device never thinks its NIC is in an “Up” state. Because it is never detected as up, it is never added to the list.
Maybe we should remove the “link up” check and just try to bring the NIC up ourselves?
-
@Tom-Elliott We don’t actually check the “up” state but the carrier link state. Possibly the NIC / driver just does not show up as link connected until the interface is brought UP. This is kind of stupid, but well…
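If the driver really only reports carrier once the interface is administratively up, a check could bring the interface up first and read the carrier flag afterwards. A minimal sketch of that order of operations, with several assumptions: the function name `link_after_up` is made up, `IP_CMD` and `SYSFS_NET` are hypothetical overrides for dry runs, and the 2-second settle time is a guess rather than a measured value.

```shell
# Sketch only: bring the interface up BEFORE reading its carrier flag.
IP_CMD="${IP_CMD:-ip}"                    # hypothetical override, e.g. for dry runs
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"  # hypothetical override for testing

# Returns 0 = link detected, 1 = no link, 2 = could not bring the interface up.
link_after_up() {
    local iface="$1" settle="${2:-2}" state
    "$IP_CMD" link set dev "$iface" up || return 2
    sleep "$settle"                       # give the driver a moment to negotiate
    state="$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)" || state=""
    [ "$state" = "1" ]
}

# Usage on a real system (needs root):
#   link_after_up eth0 && echo "carrier detected after bringing eth0 up"
```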
-
@Sebastian-Roth @Tom-Elliott @george1421
I took your advice and modified the init.xz to make a custom init, and all I did was just add “auto eth0” and “auto iface eth0 inet dhcp” to the interfaces file. It brings up eth0 and it images after adding this. So at this point it works. The fix is probably not the best fix in the world, but hey, it works!
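For reference, in ifupdown syntax a DHCP stanza is normally the two lines `auto eth0` and `iface eth0 inet dhcp`; I am assuming that is what was meant by the second quoted line. A rough sketch of adding that stanza from a script follows. The helper name `ensure_dhcp_iface` is hypothetical, and this is not necessarily how the custom init described above does it.

```shell
# Sketch only: append a DHCP stanza for an interface unless the file already
# contains an "iface <name> inet ..." line for it.
ensure_dhcp_iface() {
    local file="$1" iface="$2"
    if grep -q "^iface $iface inet" "$file" 2>/dev/null; then
        return 0                          # already configured, nothing to do
    fi
    printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface" >> "$file"
}

# On an affected machine this would be run against the real file and the
# interface brought up afterwards (needs root, so left commented out here):
#   ensure_dhcp_iface /etc/network/interfaces eth0
#   ifup eth0
```

Checking for an existing stanza first keeps the helper safe to run more than once.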
-
This is the file that handles network loading and generating the /etc/network/interfaces file.
If you look at line 24 we’re defining the “up” state of the interface. Then we check the carrier which should tell us if a link is detected or not.
I think, in this particular system at least, the link isn’t able to be detected. Do you see any messages while it’s booting? In particular, the message would be: “No link detected on <interface> for # seconds, skipping it.” It tries to do this for 1 second every time. Maybe we need to up the “sleep” to use the “timeout” value instead of only waiting for 1 second?
I would think the carrier state would be able to update within 36 seconds, but maybe our thoughts are wrong? We aren’t bringing the interface down and back up through each check (not that that should be necessary).
One thing, I should add, is that we do add interfaces to the interfaces file. So it’s more interesting, based on what I can see, that it’s not finding an interface to insert to begin with. If you look at lines 18 and 19 of the file, then at line 23 within the for loop, you can clearly see it’s trying to append the interface to the /etc/network/interfaces file. The fact that it’s not inserting the interface tells me we’re not finding the interface to begin with.
If you can boot the system into “debug” again, can you try the commands:
ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && /<MACOFHOST>/ {print $2}'
ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && !/<MACOFHOST>/ {print $2}'
Replace “<MACOFHOST>” with that host’s MAC address, including the colons. You may need to try each command twice: once in ALL CAPS and once in all lowercase. Maybe upper/lower case is the issue?
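To take upper/lower case out of the picture entirely, the match could be made case-insensitive. A small sketch, assuming the same `ip -0 -o addr show` output format as the commands above; the function name `ifaces_for_mac` is made up for illustration and is not part of the FOG script.

```shell
# Sketch only: print the names of interfaces whose link/ether line contains
# the given MAC, compared case-insensitively. Expects `ip -0 -o addr show`
# style text on stdin.
ifaces_for_mac() {
    local mac
    mac="$(printf '%s' "$1" | tr 'A-F' 'a-f')"   # normalize to lowercase
    awk -F'[: ]+' -v mac="$mac" \
        '/link\/?ether/ && index(tolower($0), mac) { print $2 }'
}

# Usage on a real system (replace the MAC with the host's own):
#   ip -0 -o addr show | ifaces_for_mac 00:11:22:AA:BB:CC
```

Using `index()` instead of a regex match also avoids any surprises from the MAC being interpreted as a pattern.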
@Sebastian-Roth I don’t think timing is the issue here. At the point where S40network is run, the kernel has loaded all the drivers. I suspect it may be the awk -F statement (maybe ether isn’t shown until the link is up?). That wouldn’t explain why it works otherwise, though.
If it’s not the awk (which I doubt), it is probably the ip addr show. Maybe we should use this instead?
ip -0 -o link show
-
@dustindizzle11 Would you please show us the script you used?
With the unmodified init.xz, what do you see on screen? I guess you see this, right?
No network interfaces found, your kernel is most probably missing the correct driver!