HP Stream 11 G3 - locks up after init.xz?
-
@theterminator93 Ok, here is the latest binary with even more debug output. It’s called 08_undionly.kpxe (link). Let’s hope we reach the end soon!
-
I tried running lsusb from a FOS debug kernel. Nothing appeared with “ethernet” in the name, so I snapped this in case it proves useful.
And here is what we lock up at with 08…
-
@theterminator93 Oh yes, the lsusb output only has numbers. I’ll figure those out… We are getting closer; it seems to hang when calling some internal UNDI wrapper. I commented out this call and compiled 09_undionly.kpxe for you. See if this runs all the way through!?
-
Ok, so the first entry of ‘lsusb’ is telling us you have a Realtek RTL8153 Gigabit Ethernet Adapter/Chip in that USB NIC. The WorkingDevices list in our wiki has this kind of USB NIC listed as confirmed working using undionly.kkpxe (double ‘k’) in one case and ipxe.efi in the other. Neither worked for you, right?! So I guess the firmware of this particular model is crappy. Let’s see what you get booting 09_undionly.kpxe…
-
Actually both undionly.kpxe and undionly.kkpxe managed to successfully give me the FOG menu. I didn’t try any of the EFI binaries since I built the Win10 image in legacy mode.
The good news is… 09_undionly.kpxe didn’t lock up and we appear to be in business!
It did spit out a bunch of debug output immediately before the screen went black (like usual) which I attempted to capture:
The only oddity now is that after I PXE boot once (maybe twice), subsequent boot attempts throw an error and reboot until I restart the FOG server.
tftp://10.15.1.20/default.ipxe… ok
http://10.15.1.20/fog/service/ipxe/boot.php… Connection reset (http://ipxe.org/0f0a6039)
Could not boot: Connection reset (http://ipxe.org/0f0a6039)
Chainloading failed, hit ‘s’ for the iPXE shell; reboot in 10 seconds
-
@theterminator93 Great to see we got through! At first sight this “connection reset” thing has nothing to do with the patched iPXE binary. I can’t think of how this would be related. But then, you never know.
Does this only happen when cold booting the device? Do you see anything in the Apache error logs when this is happening (see my signature on where to find those)?
-
Odd… I did nothing and it started working. I cold booted two times in a row and it worked both times.
But then I tried a different (same model) host just to be sure, and it threw the error. After that I tried the original host and it was erroring out too. Then I tried undionly.kpxe to see if it would at least give me a menu, same result. Then I tried a different subnet… and it worked. Apparently it’s something screwy with the network or switch.
As for the error log, nothing of significance. Only events indicating that the OS was shutting down and starting again.
-
@theterminator93 Would be interesting to see if the connection is actually being reset by TCP packets. Please install tcpdump on your FOG server, then run tcpdump -w /tmp/reset.pcap port 80 and leave the command running. Start up your client and see if you run into the issue. Either way, stop tcpdump after that with Ctrl+C. In case you don’t see the error, just fire up the same command and boot up the client again till you see the “connection reset”. Then upload that PCAP file and post a link here.
-
@theterminator93 Any news on this?
-
@Quazz The debug 09_undionly.kpxe binary ended up working properly. Whatever Sebastian did as a workaround allowed it to go straight from PXE to the kernel.
@Sebastian-Roth Here’s a link to a pcap when a different machine did the connection reset jig today.
-
@theterminator93 Thanks for capturing a packet dump. Unfortunately I can’t find any TCP reset (Wireshark display filter: tcp.flags.reset==1), HTTP request to /fog/service/ipxe/boot.php (display filter: http.request.uri contains ipxe) or any other obvious issue in there.
I will look into what’s causing the initial hang issue in the next few days! Will let you know.
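For anyone who wants to double-check a capture like this without Wireshark, a rough Python sketch along these lines could count the TCP segments that have the RST flag set. It assumes a classic pcap file (which is what tcpdump -w writes by default), an Ethernet link layer, and plain IPv4 without VLAN tags. The helper name count_tcp_resets is made up for illustration and is not part of FOG, tcpdump or Wireshark.

```python
import struct


def count_tcp_resets(pcap_bytes):
    """Return (tcp_packets, rst_packets) for a classic pcap capture held in memory.

    Assumptions (not verified against the capture from this thread):
    Ethernet link type, IPv4 only, no VLAN tags.
    """
    magic = pcap_bytes[:4]
    # The magic number tells us the byte order of the file headers.
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    # Link type is the last 4 bytes of the 24-byte global header; 1 = Ethernet.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    tcp_packets = 0
    rst_packets = 0
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_frac, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        # 14-byte Ethernet header, EtherType 0x0800 means IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip_header_len = (frame[14] & 0x0F) * 4
        if frame[23] != 6:  # IPv4 protocol field, 6 = TCP
            continue
        tcp = frame[14 + ip_header_len:]
        if len(tcp) < 14:
            continue
        tcp_packets += 1
        if tcp[13] & 0x04:  # RST bit in the TCP flags byte
            rst_packets += 1
    return tcp_packets, rst_packets
```

Reading the file written by the tcpdump command and passing its bytes in, e.g. count_tcp_resets(open("/tmp/reset.pcap", "rb").read()), would return the two counts. A result with zero resets would match what the tcp.flags.reset==1 display filter shows.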
-
@theterminator93 I dug through the iPXE code and read Intel’s PXE spec but I am still not sure why it would hang/loop/lock up (?) right at this point. This is where iPXE calls basic PXE functions (provided by the NIC’s PXE code) to clean up the base memory before handing over (e.g. to the Linux kernel) - explained here. As far as I understand, the PXE specification was never very clear and therefore different vendors implement those PXE functions in different ways.
Years ago in the early days of iPXE (was called Etherboot then) the developers were not sure about the order to call those functions.
There are other PXE boot loaders out there doing things differently. For example see the pxelinux code. It even mentions that iPXE is doing it differently in a comment. Would be interesting to see if pxelinux is doing fine on your USB NIC. This is easy to test as we still have this stuff from the old days. Please change your DHCP option 67 from undionly.kpxe to pxelinux.0.old. You will also want to change the timeout in /tftpboot/pxelinux.cfg/default to TIMEOUT 05 so it’s not just flicking through. Boot up your client and see what happens. My guess is that you see the pxelinux menu and then it properly chainloads ipxe.krn, which then probably hangs as it used to with the other iPXE binaries. In case it hangs on a kernel panic then we are hitting a different spot.
I also added more debugging statements to the code and compiled a new 10_undionly.kpxe (download, compiled with DEBUG=undinet) for you to test. Please post a picture of the messages on screen again. By the way I don’t see the pictures hosted on Photobucket anymore - says “please update your account to enable 3rd party hosting”.