HP Stream 11 G3 - locks up after init.xz?
-
@theterminator93 Thanks for testing and posting a picture here. I am not sure yet, but booting this 01_undionly.kpxe in a virtual machine, I see this…
Up till now I wasn’t sure whether iPXE or the kernel was causing the hang. I am still not completely sure we are on the right track, but comparing the debug outputs I think in your case iPXE is playing up. So I compiled a new binary (02_undionly.kpxe, download) with added debug statements. Please use this new binary and take another picture. Unfortunately this will be a step-by-step process until we find out what is causing the hang… -
All righty - here’s what we get with 02_undionly.kpxe:
-
This is how it looks when booting it in VirtualBox:
I don’t know the iPXE code too well. Hopefully the iPXE devs can help us here; I will get in contact and hopefully get some information. Meanwhile, you can try 03_undionly.kpxe, which I just uploaded. It will end in a panic after loading the kernel, but it would still be interesting to see if it does the same on your HP device. -
That binary isn’t loading. For some reason, after it gets an IP, it tries to pull bzImage from a class C private address rather than from the address of the on-site server.
-
@theterminator93 Wow, that is strange. Where the client fetches the bzImage file from is controlled by your FOG server.
Can you open this URL in a web browser, replacing <fog_server_ip> with the FOG server IP that is correct for your network?
http://<fog_server_ip>/fog/service/ipxe/boot.php?mac=00:00:00:00:00:00
Look through that text and see whether the unknown class C address shows up.
If you go into the FOG web GUI, schedule a deployment/capture task for some test system, and then replace the MAC address above with the actual MAC address of that test system, does it show you that class C address?
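For anyone who prefers checking this from a script rather than a browser, here is a minimal Python sketch that fetches the same boot.php URL and lists any 192.168.x.x (private class C) addresses in the returned iPXE script. The function names are made up for this example, and the server IP and MAC you pass in are placeholders to substitute.

```python
import ipaddress
import re
import urllib.request

# Anything that looks like a dotted-quad IPv4 address.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_addresses_in_network(text, network="192.168.0.0/16"):
    """Return each distinct IPv4 address in `text` that falls inside `network`."""
    net = ipaddress.ip_network(network)
    found = []
    for candidate in IPV4_RE.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. "999.1.1.1" matches the regex but is not a valid address
        if addr in net and candidate not in found:
            found.append(candidate)
    return found


def fetch_boot_script(server_ip, mac):
    """Download the iPXE script FOG generates for the given MAC address."""
    url = f"http://{server_ip}/fog/service/ipxe/boot.php?mac={mac}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling find_addresses_in_network(fetch_boot_script("<fog_server_ip>", "<client_mac>")) should give an empty list if boot.php does not reference any 192.168.x.x address.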
-
@theterminator93 Sorry about that, my fault. I compiled and uploaded this binary in a rush, and it still had a different iPXE script embedded that I use for testing in my VirtualBox environment. Please try 04_undionly.kpxe … (same download link as before) -
@theterminator93 Could you please boot George’s USB stick again, schedule a debug upload task and run the following command when you get to the command shell:
lspci -nn | grep -i ethernet
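If you want to pull the numeric IDs out of that output programmatically, this small Python sketch applies the same case-insensitive "ethernet" filter as the grep and extracts the [vendor:device] pair that lspci -nn prints. The function name is hypothetical. Note that it only sees PCI devices: a USB network adapter sits on the USB bus, so it would not appear in lspci output at all.

```python
import re

# lspci -nn appends the numeric IDs in brackets, e.g. "[10ec:8168]".
# The class code (e.g. "[0200]") has no colon, so it is not matched.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def ethernet_pci_ids(lspci_output):
    """Return (vendor, device) ID pairs from Ethernet lines, like `grep -i ethernet`."""
    ids = []
    for line in lspci_output.splitlines():
        if "ethernet" not in line.lower():
            continue
        match = PCI_ID_RE.search(line)
        if match:
            ids.append((match.group(1).lower(), match.group(2).lower()))
    return ids
```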
I also built a few more binaries to test with after talking to one of the iPXE developers. Please skip number four and go straight to 05_undionly.kpxe (DEBUG=init,device,undi,pci and added statements).
Then there is 06_undionly.kpxe, which is a build from the iPXE source without our FOG-modified header files (suggested by the iPXE developer). The only change I made was adding CMD_PARAM, because we need this! Don’t worry about the FOG menu being very basic with that one.
And finally, it would be interesting to know whether iPXE also hangs when a normal exit is done. 07_undionly.kpxe does just that. What happens when you use this?
Please take pictures of all four and post here. Thanks!
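For reference, iPXE debug builds like these are produced with iPXE’s own make variables: DEBUG= takes a comma-separated list of source objects to enable debug output for, and EMBED= selects the script compiled into the binary (the embedded script is what went wrong with 03). The Python sketch below only assembles and runs that make command; the script name fog.ipxe and the source directory are hypothetical, and this is not necessarily how the FOG binaries are actually built.

```python
import subprocess


def ipxe_build_command(target="bin/undionly.kpxe", embed=None, debug=()):
    """Assemble the make argument list for an iPXE binary."""
    cmd = ["make", target]
    if embed:
        cmd.append(f"EMBED={embed}")  # script compiled into the binary
    if debug:
        cmd.append("DEBUG=" + ",".join(debug))  # objects to build with debug output
    return cmd


def build(src_dir, **kwargs):
    """Run the build from iPXE's src/ directory; raises CalledProcessError on failure."""
    subprocess.run(ipxe_build_command(**kwargs), cwd=src_dir, check=True)
```

For example, ipxe_build_command(embed="fog.ipxe", debug=["init", "device", "undi", "pci"]) yields the equivalent of make bin/undionly.kpxe EMBED=fog.ipxe DEBUG=init,device,undi,pci.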
-
@Sebastian-Roth said in HP Stream 11 G3 - locks up after init.xz?:
, schedule a debug upload task
Psssst… in the GRUB menu, menu item #6 (I think) lets you jump right to debug mode; no capture or deploy step required. You can’t continue to capture or deploy from there, but you can run the commands you outlined.
-
Strange. I rebooted the server and the 192.168.x.x address was no longer interfering with PXE booting. No signs of it in boot.php based on the suggestions from George at this point either.
In any case, if I run the first command from a FOS USB, it comes back with no results. What’s interesting is it tries three times to get an IP but seems to give up, despite its actually getting an address…?
Here’s what it locked up at with 05_undionly and the added host debug parameters DEBUG=init,device,undi,pci.
06_undionly with the extra debug params was just a black screen when it locked up, with bzImage… ok and init.xz… ok and a blinking cursor.
And 07, with debug params, just skips the FOG menu and boots to the OS. This is what’s on the screen for a split-second before it starts booting Windows.
-
@theterminator93 Thanks again for testing and posting pictures!
Strange. I rebooted the server and the 192.168.x.x address was no longer interfering with PXE booting.
That was my fault. I had the wrong script embedded in one of the older binaries. See my post here further down. Don’t worry about it.
In any case, if I run the first command from a FOS USB, it comes back with no results. What’s interesting is it tries three times to get an IP but seems to give up, despite its actually getting an address…?
FOS tries to contact the FOG webserver after getting an IP via DHCP to make sure it’s fully connected. When building George’s USB FOS, you need to change myfogip=x.x.x.x in that script to match your FOG server IP; otherwise you’ll run into this issue.
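The connectivity check described here boils down to retrying an HTTP request to the FOG server a few times and then giving up. The Python sketch below illustrates that idea only; it is not the actual FOS code, and the three attempts, the delay, and the /fog/index.php path are assumptions. With myfogip left at x.x.x.x, every attempt would fail, which fits the three tries reported above.

```python
import time
import urllib.error
import urllib.request


def fog_reachable(fog_ip, attempts=3, delay=2.0, opener=urllib.request.urlopen):
    """Return True as soon as the FOG web server answers, False after `attempts` failures.

    Illustration only: the URL path, attempt count and delay are assumptions,
    not values taken from the FOS scripts.
    """
    url = f"http://{fog_ip}/fog/index.php"
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server answered
        except OSError as exc:  # URLError, timeouts, refused connections, ...
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)
    return False
```

The opener parameter exists so the retry logic can be exercised without a network.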
About the empty output, again my fault, sorry! I should have asked you to run lsusb instead! Please post a picture of the full output. We already have one piece of information thanks to the picture: it seems like the driver r8152 is handling eth0… It will be interesting to see the USB IDs.
Here’s what it locked up at with 05_undionly
Alright, it hangs when trying to unload the UNDI root bus. I can’t tell you exactly what that means, but it sounds like the UNDI implementation of that USB NIC is faulty. We can possibly work around this, but I need a little more time. Let me see.
06_undionly with the extra debug params was just a black screen when it locked up, with bzImage… ok and init.xz… ok and a blinking cursor.
Fine, so none of our iPXE header configs is causing this issue. 06 was the clean build.
And 07, with debug params, just skips the FOG menu and boots to the OS…
Nice! Notice that when iPXE simply exits (shutting itself down to boot from the hard disk), there is no hang on “Removed UNDI root bus”!
Hope I can give you some more information or binaries to test soon!
-
@theterminator93 Ok, here is the latest binary with even more debug output. It’s called 08_undionly.kpxe (link). Let’s hope we reach the end soon! -
I tried running lsusb from a debug kernel off FOS. Nothing appeared with “ethernet” in the name, so I snapped this in case it proves useful.
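When lsusb cannot resolve device names it prints only the numeric vendor:product IDs, so grepping for “ethernet” finds nothing. This Python sketch parses lsusb lines and falls back to a small lookup table for unnamed devices. The table holds a single illustrative entry (0bda:8153, Realtek’s RTL8153 Gigabit Ethernet adapter); a real lookup would use the usb.ids database, and the function name is hypothetical.

```python
import re

# Matches lines such as "Bus 002 Device 003: ID 0bda:8153 Realtek ..." (name is optional).
LSUSB_RE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}):([0-9a-f]{4})\s*(.*)$", re.IGNORECASE
)

# Illustrative fallback table; the complete list lives in the usb.ids database.
KNOWN_IDS = {("0bda", "8153"): "Realtek RTL8153 Gigabit Ethernet Adapter"}


def parse_lsusb(output):
    """Return one dict per device line, filling in missing names from KNOWN_IDS."""
    devices = []
    for line in output.splitlines():
        match = LSUSB_RE.match(line.strip())
        if not match:
            continue
        bus, dev, vendor, product, name = match.groups()
        vendor, product = vendor.lower(), product.lower()
        devices.append({
            "bus": bus,
            "device": dev,
            "id": f"{vendor}:{product}",
            "name": name or KNOWN_IDS.get((vendor, product), "unknown"),
        })
    return devices
```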
And here is what we lock up at with 08…
-
@theterminator93 Oh yes, the lsusb output only has numbers. I’ll figure those out… We are getting closer; it seems like it hangs when calling some internal UNDI wrapper. I commented out this call and compiled 09_undionly.kpxe for you. See if this runs all the way through!? -
Ok, so the first entry of ‘lsusb’ is telling us you have a Realtek RTL8153 Gigabit Ethernet Adapter/Chip in that USB NIC. The WorkingDevices list in our wiki has this kind of USB NIC listed as confirmed working using undionly.kkpxe (double ‘k’) in one case and ipxe.efi in the other. Neither worked for you, right?! So I guess the firmware of this particular model is crappy. Let’s see what you get booting 09_undionly.kpxe… -
Actually both undionly.kpxe and undionly.kkpxe managed to successfully give me the FOG menu. I didn’t try any of the EFI binaries since I built the Win10 image in legacy mode.
The good news is… 09_undionly.kpxe didn’t lock up and we appear to be in business!
It did spit out a bunch of debug output immediately before the screen went black (like usual) which I attempted to capture:
The only oddity now is that after I PXE boot once (maybe twice), subsequent boot attempts throw an error and reboot until I restart the FOG server.
tftp://10.15.1.20/default.ipxe… ok
http://10.15.1.20/fog/service/ipxe/boot.php… Connection reset (http://ipxe.org/0f0a6039)
Could not boot: Connection reset (http://ipxe.org/0f0a6039)
Chainloading failed, hit ‘s’ for the iPXE shell; reboot in 10 seconds -
@theterminator93 Great to see we got through! At first sight this “connection reset” thing has nothing to do with the patched iPXE binary. I can’t think of how this would be related. But then, you never know.
Does this only happen when cold booting the device? Do you see anything in the Apache error logs when this is happening (see my signature on where to find those)?
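Assuming a default Apache setup, the error log usually lives at /var/log/apache2/error.log (Debian/Ubuntu) or /var/log/httpd/error_log (RHEL/CentOS); adjust for your system. Here is a small Python sketch for pulling out the lines that mention boot.php, optionally restricted to one client IP. The file locations and the “[client IP:port]” format it matches are assumptions based on Apache defaults, and the function names are hypothetical.

```python
from pathlib import Path

# Common default locations (assumption; your distribution may differ).
CANDIDATE_LOGS = (
    Path("/var/log/apache2/error.log"),  # Debian / Ubuntu
    Path("/var/log/httpd/error_log"),    # RHEL / CentOS
)


def find_error_log(candidates=CANDIDATE_LOGS):
    """Return the first existing log file from `candidates`, or None."""
    for path in candidates:
        if path.is_file():
            return path
    return None


def matching_lines(lines, needles=("boot.php",), client_ip=None):
    """Yield log lines mentioning any needle, optionally only for one client IP."""
    for line in lines:
        if client_ip:
            # Apache 2.4 writes "[client IP:port]", 2.2 writes "[client IP]".
            tags = (f"[client {client_ip}:", f"[client {client_ip}]")
            if not any(tag in line for tag in tags):
                continue
        if any(needle in line for needle in needles):
            yield line.rstrip("\n")
```

Usage would be along the lines of opening the path returned by find_error_log() and passing the file object to matching_lines(), with the affected client’s IP as client_ip.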
-
Odd… I did nothing and it started working. I cold booted two times in a row and it worked both times.
But then I tried a different host (same model) just to be sure, and it threw the error. After that I tried the original host and it was erroring out too. Then I tried undionly.kpxe to see if it would at least give me a menu; same result. Then I tried a different subnet… and it worked. Apparently it’s something screwy with the network or switch.
As for the error log, nothing of significance: only entries indicating that the OS was shutting down and starting again.
-
@theterminator93 It would be interesting to see whether the connection is actually being reset by TCP packets. Please install tcpdump on your FOG server and then run tcpdump -w /tmp/reset.pcap port 80 and leave the command running. Start up your client and see if you run into the issue. Either way, stop tcpdump afterwards with Ctrl+C. If you don’t see the error, just fire up the same command and boot the client again until you see the “connection reset”. Then upload that PCAP file and post a link here. -
@theterminator93 Any news on this?
-
@Quazz The debug 09_undionly.kpxe binary ended up working properly. Whatever Sebastian did as a workaround allowed it to go straight from PXE to the kernel.
@Sebastian-Roth Here’s a link to a pcap when a different machine did the connection reset jig today.
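A quick way to check whether such a capture actually contains TCP resets is to look for packets with the RST flag set. Wireshark’s display filter tcp.flags.reset == 1 does this interactively; the Python sketch below does the same with only the standard library. It assumes a classic libpcap file (what tcpdump -w writes) captured on an Ethernet interface with IPv4 traffic; pcapng, VLAN tags and IPv6 are not handled, and the function name is hypothetical.

```python
import struct

TCP_FLAG_RST = 0x04
ETHERTYPE_IPV4 = b"\x08\x00"


def find_tcp_resets(pcap_bytes):
    """Return (timestamp, source, destination) for every IPv4/TCP packet with RST set.

    Handles classic libpcap files with Ethernet framing only (no pcapng, VLAN or IPv6).
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file, microsecond timestamps
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file, microsecond timestamps
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:
        raise ValueError(f"unsupported link type {linktype}; expected Ethernet (1)")

    resets = []
    offset = 24  # size of the global header
    while offset + 16 <= len(pcap_bytes):
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != ETHERTYPE_IPV4:
            continue  # too short for Ethernet + IPv4, or not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 6 or len(ip) < ihl + 14:
            continue  # not TCP, or TCP header truncated
        tcp = ip[ihl:]
        if tcp[13] & TCP_FLAG_RST:
            src_port, dst_port = struct.unpack("!HH", tcp[:4])
            src = ".".join(str(b) for b in ip[12:16])
            dst = ".".join(str(b) for b in ip[16:20])
            resets.append((ts_sec, f"{src}:{src_port}", f"{dst}:{dst_port}"))
    return resets
```

Reading the capture with open("/tmp/reset.pcap", "rb").read() and passing the bytes in returns one tuple per reset. Which side the resets appear to come from may help narrow down whether the server or something on the network path is responsible, keeping in mind that a middlebox can send resets on behalf of either end.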