i219LM NIC, ASUS Q170M-C Motherboard
-
@dustindizzle11 Yeah, it would be great to actually find out what’s causing this issue. Are you keen to modify your init.xz as George suggested? I think this is a great idea! Let us know so we can guide you.
-
I think the fogproject repo actually has ways to help this, but if not I’ve also had my own scripts to help with “changing” things.
https://forums.fogproject.org/topic/7525/useful-scripts/3 It’s just a script, but the principle of how it works can be used to help do what you need to do I think.
-
@george1421 The problem seems to lie with “eth0” not being present in the interfaces file itself. Once it is added and the network is restarted, we get an address. I am definitely open to creating a custom init, it doesn’t look too complicated. I just need to figure out what specific things I would need to do to bring it up.
@Sebastian-Roth Yes, I am willing to modify the init.xz, I just need to make some time to do it.
@Tom-Elliott Thanks for the script! I’ll have to take a look at that.
-
@dustindizzle11 said:
The problem seems to lie with “eth0” not being present in the interfaces file itself.
Have you had a look at the network startup script yet? We generate the config file and start udhcpc as needed.
The issue is probably that the NIC is not showing up fast enough, as we enumerate the list of NICs only once (line 18f):
read p_ifaces <<< $(/sbin/ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && /'$mac'/ {print $2}')
read o_ifaces <<< $(/sbin/ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && !/'$mac'/ {print $2}')
Then we loop till we get the interfaces up. In your case the NIC is not showing up at first but somehow later I suppose.
@Tom-Elliott Should we change the script to enumerate NICs after some waiting period again? It’s interesting that we didn’t run into this for such a long time. Definitely seems like this NIC/mainboard combo is kind of special.
-
@Sebastian-Roth Maybe even something like this: if no network interfaces are enumerated on the first pass, wait for 30 seconds (or whatever time) and then try one more time. This would keep booting fast for well-behaved network interfaces and give the “slow” ones a second chance to come up.
Tom: What about the usb kernel flag, would that give enough pause for the network to come up on these boards? I don’t know where else this flag is used or the impact.
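For illustration, the second-chance idea above could look roughly like the sketch below. This is not the actual FOG network startup script: the function names list_ifaces and enumerate_with_retry, the overridable lister command and the 30-second default are all assumptions made for this example. The awk pattern is written as a string regex ("link/?ether") purely so it behaves the same across awk implementations.

```shell
# Hypothetical helper: print Ethernet interface names from
# `ip -0 -o addr show`-style lines read on stdin.
list_ifaces() {
    awk -F'[: ]+' '$0 ~ "link/?ether" { print $2 }'
}

# Enumerate once; if nothing is found, wait and enumerate a second time.
# $1 = command that prints the link listing (assumed default: /sbin/ip -0 -o addr show)
# $2 = seconds to wait before the second pass (assumed default: 30)
enumerate_with_retry() {
    local lister="${1:-/sbin/ip -0 -o addr show}"
    local wait_secs="${2:-30}"
    local ifaces
    ifaces=$($lister 2>/dev/null | list_ifaces)
    if [ -z "$ifaces" ]; then
        # Slow NICs get one more chance; fast ones never reach this branch.
        sleep "$wait_secs"
        ifaces=$($lister 2>/dev/null | list_ifaces)
    fi
    printf '%s\n' "$ifaces"
}
```

Well-behaved NICs return on the first pass without any delay; only an empty first pass pays for the wait.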
-
@Sebastian-Roth I think the problem is the network interface is not showing up at all, not due to a timing thing. For whatever reason, the device never thinks its NIC is in an “Up” state. Because it never detects it as up, it never adds it to the list.
Maybe we should move the “link up” state and just try to bring the nic up ourselves?
-
@Tom-Elliott We don’t actually check the “up” state but the carrier link state. Possibly the NIC / driver just does not show up as link connected until the interface is brought UP. This is kind of stupid, but well…
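To make that hypothesis testable from a debug shell, here is a rough sketch that brings the interface UP first and only then polls the carrier file. As far as I know, reading the sysfs carrier attribute of an interface that is administratively down usually fails with “Invalid argument”, which would fit the theory. The function name wait_for_carrier, the 10-second default and the overridable sysfs root are assumptions for this example, not part of the FOG init.

```shell
# Bring an interface UP, then wait for its carrier flag to read 1.
# $1 = interface name
# $2 = timeout in seconds (assumed default: 10)
# $3 = sysfs root (assumed default: /sys; overridable for dry runs)
wait_for_carrier() {
    local iface="$1"
    local timeout="${2:-10}"
    local sysfs="${3:-/sys}"
    local carrier_file="$sysfs/class/net/$iface/carrier"
    local waited=0
    # Set the link UP before looking at carrier. Errors are ignored so the
    # sketch also runs without root or without a real interface.
    ip link set dev "$iface" up 2>/dev/null || true
    while :; do
        if [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            return 0    # link detected
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # gave up, no link within the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Something like `wait_for_carrier eth0 30 && echo "link detected"` would then show whether the carrier only appears once the interface has been brought up.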
-
@Sebastian-Roth @Tom-Elliott @george1421
I took your advice and modified the init.xz to make a custom init, and all I did was just add “auto eth0” and “auto iface eth0 inet dhcp” to the interfaces file. It brings up eth0 and it images after adding this. So at this point it works. The fix is probably not the best fix in the world, but hey, it works!
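For anyone wanting to reproduce this workaround, a minimal sketch follows. It assumes the usual ifupdown layout of an “auto eth0” line followed by “iface eth0 inet dhcp”; the exact lines added in the post above may have differed. The helper name add_dhcp_iface and its file argument are made up for this example.

```shell
# Append a DHCP stanza for an interface unless one is already present.
# $1 = interface name
# $2 = interfaces file (assumed default: /etc/network/interfaces)
add_dhcp_iface() {
    local iface="$1"
    local file="${2:-/etc/network/interfaces}"
    # Skip if the interface already has an iface stanza.
    if grep -q "^iface $iface inet" "$file" 2>/dev/null; then
        return 0
    fi
    printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface" >> "$file"
}
```

Calling `add_dhcp_iface eth0` inside the unpacked init would add the two lines once; repeated calls leave the file unchanged.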
-
This is the file that handles network loading and generating the /etc/network/interfaces file.
If you look at line 24 we’re defining the “up” state of the interface. Then we check the carrier which should tell us if a link is detected or not.
I think, in this particular system at least, the link isn’t able to be detected. Do you see any messages while it’s booting? Particularly the message would be: “No link detected on <interface> for # seconds, skipping it.” It tries to do this for 1 second every time. Maybe we need to up the “sleep” to use the “timeout” value instead of only waiting for 1 second?
I would think the carrier state would be able to update within 36 seconds, but maybe our thoughts are wrong? We aren’t bringing the interface down and back up through each check (not that that should be necessary).
One thing, I should add, is that we do add interfaces to the interfaces file. So it’s more interesting, based on what I can see, that it’s not finding an interface to insert to begin with. If you look at lines 18 and 19 of the file, then at line 23 within the for loop, you can clearly see it’s trying to append the interface to the /etc/network/interfaces file. The fact that it’s not inserting the interface tells me we’re not finding the interface to begin with.
If you can boot the system into “debug” again, can you try the commands:
ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && /<MACOFHOST>/ {print $2}'
ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && !/<MACOFHOST>/ {print $2}'
Replace “<MACOFHOST>” with the MAC address of that host, including the colons. You may need to try each command twice: once with the MAC in ALL CAPS, once in all lowercase. Maybe it’s the upper/lower case that’s the issue?
@Sebastian-Roth I don’t think timing is the issue here. At the point where S40network is run, the kernel has loaded all the drivers. I’m suspecting, possibly, it may be the awk -F statement? (Maybe ether isn’t shown until the link is up?) That doesn’t make sense why it works otherwise though.
If it’s not the awk (I doubt it), it is probably the ip addr show. Maybe we should use “ip -0 -o link show”?
-
@dustindizzle11 Would you please show us the script you used?
With the unmodified init.xz what do you see on screen? I guess you see “No network interfaces found, your kernel is most probably missing the correct driver!”, right?
-
@Tom-Elliott Thanks for the response! So I got some mixed results with the commands you gave me, specifically the “awk” commands. It seems that capitalization does matter. To make sense of the picture below…
First I ran “ip -0 -o link show”, which lists eth0 as a network device.
Next I ran “ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && !/<MACOFHOST>/ {print $2}'” with the MACOFHOST in all capitals (notice the MAC only has one letter to capitalize in the picture). This gave me “eth0” in return.
After that I ran the same exact command as above, but with the MACOFHOST being lowercase (the “f” is lowercase). In return I got nothing. So lowercase did not output anything for me in this case.
Then I ran the version without the exclamation point (which I’m assuming means NOT, command-wise), with the MAC being CAPITALIZED: “ip -0 -o addr show | awk -F'[: ]+' '/link[/]?ether/ && /<MACOFHOST>/ {print $2}'”. In return I got nothing.
Finally, I ran the above command again, but lowercase instead. In return I got “eth0” again.
So it seems that capitalizing does the opposite for each command. With the “!” before the MACOFHOST, having the MAC be Capitalized provides “eth0” in return. Without the “!” before MACOFHOST, having the MAC be Lowercase provides “eth0” in return. Hopefully this helps troubleshoot the issue in some way.
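These results are consistent with ip printing the MAC in lowercase while the awk match is case-sensitive. One possible way to take case out of the picture is to lowercase both sides before comparing, as in the sketch below. The function name iface_for_mac is made up for this example, and the string regex "link/?ether" stands in for the pattern used in the commands above.

```shell
# Print the interface whose link-layer address matches $1, ignoring case.
# Reads `ip -0 -o addr show`-style lines on stdin.
iface_for_mac() {
    awk -F'[: ]+' -v mac="$1" '
        BEGIN { mac = tolower(mac) }
        # Compare against the whole lowercased line, since the field
        # separator splits the MAC itself into several fields.
        $0 ~ "link/?ether" && index(tolower($0), mac) { print $2 }
    '
}
```

Usage would be along the lines of `ip -0 -o addr show | iface_for_mac <MACOFHOST>`, with the MAC written in either case.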
-
@Sebastian-Roth When imaging it just zips to a black screen. I am not able to see any error messages. All I did to the init.xz was just modifying the /etc/network/interfaces and adding eth0 to the list to make it work.
-
@dustindizzle11 Thanks for doing the tests and posting a picture. As we see eth0 is showing up right when you enter the first command. Is this with your modified init.xz or with the original one?
-
@Sebastian-Roth Original
-
@Tom-Elliott This definitely looks like a timing issue to me. As he said, the last picture is from starting the original init.xz in debug and then running the ip command without any other actions, I suppose. As we see in the picture, the interface eth0 is shown properly. So I am pretty sure that it is timing. The interface is not “available” straight away at kernel driver load time but is somehow delayed…
@dustindizzle11 Thanks again! Can you please try the following: use the original init.xz and add has_usb_nic=1 to the kernel parameters in the host configuration for one of your test clients having this issue. Then when you boot up the client it asks you to replug the (USB) NIC. Just ignore the message, wait for 10 seconds and hit ENTER to proceed. Does it solve the issue?
-
@Sebastian-Roth I say it isn’t timing not because it couldn’t be, but rather I think the device is there but for whatever reason the carrier link is unrecognized. So I think to further test this theory we would need to see the carrier state not the simple ip command. If it is timing that’s alright as we just increase wait time and all should work fine. If it’s not then I think we need to figure out why the carrier link is unseen.
-
@Sebastian-Roth I added “has_usb_nic=1” to the Host Kernel Arguments section and it did the same thing it did before, which is just zip to a black screen. This is the behavior I get when I don’t get an address. I am using the 4.6.2 kernel. I didn’t get any messages about re-plugging the USB NIC in though. Let me know if you need me to try anything else.
-
@dustindizzle11 said:
I didn’t get any messages about re-plugging the USB NIC in though. Let me know if you need me to try anything else.
Well then there is no delay and you will run into the same issue again and again I am sure. Can you please post a screenshot of the host kernel setting in the web GUI…
-
@dustindizzle11 Is your kernel file actually named “4.6.2”? What do you see when running “ls -al /var/www/{,html}/fog/service/ipxe/”?
As well, would you mind posting a screenshot of the host settings where we see the MAC address? I just want to make sure that we are not running in circles, not seeing the obvious reason…