full registration hangs at bzimage
-
-
@christian99x Looking at the pcap we see it transfer undionly.kpxe without issue to the last block. At this point you should see the iPXE boot menu.
I think where you are getting stuck at XX% is when FOS loads. From your pcap I’m not seeing the request for bzImage.
Do you have the ability to take a second computer and a small hub, or a small switch with a mirror port (like the SLM2008), and capture the traffic actually going in and out of that target computer? We really need to see the entire PXE booting conversation here. The tcpdumps from the FOG perspective only tell us what the FOG server is doing. We need to see, from the target computer’s perspective, what is getting to the target from the FOG server, DHCP server, tftpboot, etc.
I know we are asking a lot here. You have an abnormal situation that is causing this to fail. What you have is abnormal at least from what we’ve seen historically.
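For the capture itself, here is a rough sketch of what could be run on the second computer once it is plugged into the hub or the mirror port. The function names, the interface name, the MAC address and the output file name are all placeholders I made up for illustration, and setting up the mirror port on the switch is device-specific and not covered here.

```shell
# Build a pcap filter matching frames to/from one MAC address plus all
# Ethernet broadcasts (DHCP discovery is broadcast and would otherwise be missed).
capture_filter() {
    printf 'ether host %s or ether broadcast' "$1"
}

# Capture on the interface that is plugged into the hub / mirror port.
# Needs root; stop it with ctrl-c once the target hangs.
capture_target() {
    iface="$1"; mac="$2"; out="$3"
    tcpdump -i "$iface" -s 0 -w "$out" "$(capture_filter "$mac")"
}

# Hypothetical usage (all three values are placeholders, adjust to your setup):
# capture_target eth0 aa:bb:cc:dd:ee:ff target.pcap
```

Start the capture first, then PXE boot the target, so the DHCP and TFTP parts of the conversation end up in the file as well.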
-
@christian99x Is this packet dump somehow being filtered after capturing? The only thing I see is TFTP and ARP traffic. Missing is DHCP (should at least see broadcasts) and HTTP packets.
So either those were filtered out or your network is way more complex. Possibly DHCP server, client and FOG server are in three different network segments. That way we wouldn’t see the DHCP messages when capturing packets on the FOG server. But then… where are the HTTP packets? Maybe you filtered to only show UDP packets??
-
Is this packet dump somehow being filtered after capturing?
No - I used the command you mentioned and uploaded the original file
Possibly DHCP server, client and FOG server are in three different network segments.
Not that I know, though I did not set it up…
Do you have the ability to take a second computer and a small hub, or a small switch with a mirror port (like the SLM2008), and capture the traffic actually going in and out of that target computer?
I’ve never done something like this and I only have a rough idea of how to do it. I would really appreciate it if you could point me in the right direction.
-
If I cancel with ctrl-c and run ifstat on the iPXE shell I receive the following message(s):
net0: 00:1a:92:9e:10:e1 using undionly on 0000:03:00.0 (open)
[Link:up, TX:150 TXE:1 RX:276 RXE:9]
[TXE: 1 x “Network unreachable (http://ipxe.org/28086011)”]
[RXE: 4 x “Operation not supported (http://ipxe.org/3c3f6303)”]
[RXE: 4 x “Error 0x42306001 (http://ipxe.org/42306001)”]
[RXE: 1 x “Invalid argument (http://ipxe.org/1c056002)”]
-
@christian99x said in full registration hangs at bzimage:
[Link:up, TX:150 TXE:1 RX:276 RXE:9]
Looks good! TX and RX have reasonable numbers. Don’t worry about the TXE / RXE, those are not problematic errors.
I keep wondering why we don’t see the HTTP traffic in the packet dump?!
-
I did another attempt with tcpdump and now I can see at least some HTTP traffic - maybe this helps:
https://www.dropbox.com/s/4sl1rog18suypmu/bootissue.pcap?dl=0
After ctrl-c tcpdump told me:
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
97 packets captured
97 packets received by filter
0 packets dropped by kernel
67 packets dropped by interface
…still working on the mirror port capture…
Thanks!
-
@christian99x Now we see a lot more in the packet dump! Yeah. So it does request boot.php, which is transferred just fine. Next is bg.png; here we already see some first TCP retransmission packets, though it seems to finish properly. Then the bzImage transfer begins and seems OK for the first couple of data and acknowledge packets going back and forth. But within the first microseconds the transfer seems to stall completely. From my point of view this is because the client machine does not acknowledge the packets anymore. The interesting thing is that we see ACKs from the client 15, 30 and 45 seconds after the stall. So it kind of seems that the client is not “dead”.
Unfortunately there is not much we can do for you I think. I’d need access to such a machine and a lot of time to debug what is causing the network stall. It’s a driver issue within iPXE I reckon.
But to be sure we’d need to rule out other things. Can you try connecting the FOG server and this single client using a dumb mini switch or even a crossover cable? Does it do the same thing?
-
It might be worthwhile to try a different boot file (eg ipxe.pxe instead of undionly.kpxe) as well.
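How the boot file gets changed depends on which DHCP server hands out the PXE options, and that was not mentioned in this thread. As a minimal sketch, assuming an ISC dhcpd server whose config contains a line like filename "undionly.kpxe"; the helper below (set_boot_file is a made-up name) swaps the file name and keeps a backup. The config path in the usage line is an assumption too.

```shell
# Replace the boot file name in an ISC dhcpd style config file.
# Assumes a line of the form:  filename "undionly.kpxe";
# The new name must not contain "/" or "&" (they are special to sed here).
set_boot_file() {
    conf="$1"; new="$2"
    # Keep a backup copy next to the original.
    cp "$conf" "$conf.bak"
    # Rewrite only the quoted value on the filename line.
    sed "s/^\([[:space:]]*filename[[:space:]]*\)\"[^\"]*\";/\1\"$new\";/" \
        "$conf.bak" > "$conf"
}

# Hypothetical usage (path is an assumption, adjust to your setup):
# set_boot_file /etc/dhcp/dhcpd.conf ipxe.pxe
```

The DHCP service would need a restart afterwards for the change to take effect. If DHCP is handled by something else (Windows DHCP, dnsmasq, a router), the same change is made to option 67 / the boot file name setting there instead.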
-
@quazz said in full registration hangs at bzimage:
It might be worthwhile to try a different boot file (eg ipxe.pxe instead of undionly.kpxe) as well.
Changing the boot file to ipxe.pxe did the trick! The host performed the full registration successfully without any errors!
Should we continue? It will take me some time to do a mirror port capture (but I’m definitely okay doing this)
-
@christian99x said:
Should we continue? It will take me some time to do a mirror port capture (but I’m definitely okay doing this)
Don’t worry about the mirror port. It’s definitely fine to use ipxe.pxe if it works for you. Some work better (or at all) for different hardware. Just see if you can boot all your hardware using ipxe.pxe. If so, just stick to that. We default to undionly.kpxe because that causes the least issues. But as we see there are pieces of hardware around not liking the UNDI driver stuff.
@Quazz Thanks heaps for mentioning the other binaries. I had thought about this as well but forgot to mention it in my last post.