• Hi all,

    I’m trying to get FOG to work with some HP Stream 11 G3s for a client. I recently imaged all of them with CloneZilla (no Ethernet port), but wanted to try out FOG since it would reduce deployment time in our environment under our particular circumstances.

    I wanted to ensure maximum likelihood that the USB-Ethernet adapter I purchased would properly PXE boot, so I picked up HP’s branded USB 3.0 to Ethernet adapter. It PXE boots perfectly, and I get the FOG menu. When I try to do a full registration and inventory, it shows that it loads the kernel and init.xz, then it locks up. No caps lock LED response, stays that way after 10-15 minutes, and I have to hold the power button for 10 seconds to get it to power off.

    I’m running 1.4.0 RC6 (and will update, try again, and report after I get onsite this morning) and I made sure I had the most up-to-date kernel - but no change. Any suggestions or config changes which might resolve this?

  • Moderator

    @theterminator93 I dug through the iPXE code and read Intel’s PXE spec, but I am still not sure why it would hang/loop/lock up (?) right at this point. This is where iPXE calls basic PXE functions (provided by the NIC’s PXE code) to clean up the base memory before handing over (e.g. to the Linux kernel) - explained here. As far as I understand, the PXE specification was never very clear, and therefore different vendors implement those PXE functions in different ways.

    Years ago, in the early days of iPXE (it was called Etherboot then), the developers were not sure about the order in which to call those functions.

    There are other PXE boot loaders out there doing things differently. For example see the pxelinux code. It even mentions in a comment that iPXE is doing it differently. Would be interesting to see if pxelinux is doing fine on your USB NIC. This is easy to test as we still have this stuff from the old days. Please change your DHCP option 67 from undionly.kpxe to pxelinux.0.old. You also want to change the timeout in /tftpboot/pxelinux.cfg/default to TIMEOUT 05 so it’s not just flicking through. Boot up your client and see what happens. My guess is that you will see the pxelinux menu and that it then properly chainloads ipxe.krn, which probably hangs just like the other iPXE binaries did. If it hangs on a kernel panic instead, we are hitting a different spot.

    I also added more debugging statements to the code and compiled a new 10_undionly.kpxe (download, compiled with DEBUG=undinet) for you to test. Please post a picture of the messages on screen again. By the way, I don’t see the pictures hosted on photobucket anymore - it says “please update your account to enable 3rd party hosting”.

  • Moderator

    @theterminator93 Thanks for capturing a packet dump. Unfortunately I can’t find any TCP reset (Wireshark display filter: tcp.flags.reset==1), any HTTP request to /fog/service/ipxe/boot.php (display filter: http.request.uri contains ipxe), or any other obvious issue in there.

    I will look into what’s causing the initial hang issue in the next few days! Will let you know.

  • @Quazz The debug 09_undionly.kpxe binary ended up working properly. Whatever Sebastian did as a workaround allowed it to go straight from PXE to the kernel.

    @Sebastian-Roth Here’s a link to a pcap when a different machine did the connection reset jig today.

  • Moderator

    @theterminator93 Any news on this?

  • Moderator

    @theterminator93 Would be interesting to see if the connection is actually being reset by TCP packets. Please install tcpdump on your FOG server, then run tcpdump -w /tmp/reset.pcap port 80 and leave the command running. Start up your client and see if you run into the issue. Either way, stop tcpdump after that with Ctrl+C. If you don’t see the error, just fire up the same command and boot up the client again until you see the “connection reset”. Then upload that PCAP file and post a link here.
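
    Once such a capture exists, a quick first check is to count the TCP segments that carry the RST flag. The following is only a minimal sketch using the Python standard library. It assumes a classic pcap file (the format tcpdump -w normally writes, not pcapng), plain Ethernet framing without VLAN tags, and IPv4. The function name count_tcp_resets is made up for this example.

```python
import struct


def count_tcp_resets(pcap_bytes):
    """Count IPv4/TCP segments with the RST flag set in a classic pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic (microsecond) pcap file")

    # The link type is the last field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    resets = 0
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # too short or not IPv4
        if frame[23] != 6:
            continue  # IP protocol field is not TCP
        ip_header_len = (frame[14] & 0x0F) * 4
        flags_at = 14 + ip_header_len + 13  # flags byte inside the TCP header
        if flags_at < len(frame) and frame[flags_at] & 0x04:
            resets += 1
    return resets
```

    To use it on the capture from the command above, read the file in binary mode and pass the bytes in, e.g. count_tcp_resets(open("/tmp/reset.pcap", "rb").read()). A result of 0 would suggest the “connection reset” is not coming from RST packets on port 80.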

  • Odd… I did nothing and it started working. I cold booted two times in a row and it worked both times.

    But then I tried a different (same model) host just to be sure, and it threw the error. After that I tried the original host and it was erroring out too. Then I tried undionly.kpxe to see if it would at least give me a menu, same result. Then I tried a different subnet… and it worked. Apparently it’s something screwy with the network or switch.

    As for the error log, nothing of significance. Only events indicating that the OS is shutting down and starting again.

  • Moderator

    @theterminator93 Great to see we got through! At first sight this “connection reset” thing has nothing to do with the patched iPXE binary. I can’t think of how this would be related. But then, you never know.

    Does this only happen when cold booting the device? Do you see anything in the apache error logs when this is happening (see my signature on where to find those)?

  • Actually both undionly.kpxe and undionly.kkpxe managed to successfully give me the FOG menu. I didn’t try any of the EFI binaries since I built the Win10 image in legacy mode.

    The good news is… 09_undionly.kpxe didn’t lock up and we appear to be in business! 😀

    It did spit out a bunch of debug output immediately before the screen went black (like usual) which I attempted to capture:

    [image]

    The only oddity now is that after I PXE boot once (maybe twice), subsequent boot attempts throw an error and reboot until I restart the FOG server.

    tftp://… ok… Connection reset (http://ipxe.org/0f0a6039)
    Could not boot: Connection reset (http://ipxe.org/0f0a6039)
    Chainloading failed, hit ‘s’ for the iPXE shell; reboot in 10 seconds

  • Moderator

    Ok, so the first entry of ‘lsusb’ is telling us you have a Realtek RTL8153 Gigabit Ethernet Adapter/Chip in that USB NIC. The WorkingDevices list in our wiki has this kind of USB NIC listed as confirmed working using undionly.kkpxe (double ‘k’) in one case and ipxe.efi in the other. Both didn’t work for you, right?! So I guess the firmware of this particular model is crappy. Let’s see what you get booting 09_undionly.kpxe
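
    For anyone who wants to do the same lookup without reading the IDs by eye, here is a rough sketch that pulls the vendor:product pairs out of lsusb output and compares them against a small hand-written table. The ID pair 0bda:8153 for the RTL8153 is an assumption added for this example (it does not appear in the thread), and the names KNOWN_USB_NICS and identify_usb_nics are hypothetical.

```python
import re

# Hand-written lookup table. The ID below is an assumption for illustration,
# not something taken from the pictures posted in this thread.
KNOWN_USB_NICS = {
    ("0bda", "8153"): "Realtek RTL8153 Gigabit Ethernet Adapter",
}

# lsusb prints the IDs as "ID vvvv:pppp" (four hex digits each).
LSUSB_ID = re.compile(r"ID\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})")


def identify_usb_nics(lsusb_output):
    """Return (id, name) tuples for every lsusb line that matches the table."""
    found = []
    for line in lsusb_output.splitlines():
        match = LSUSB_ID.search(line)
        if not match:
            continue
        key = (match.group(1).lower(), match.group(2).lower())
        if key in KNOWN_USB_NICS:
            found.append((":".join(key), KNOWN_USB_NICS[key]))
    return found
```

    This works even when lsusb shows only numbers, as it did in the debug environment here, because the match is done on the IDs and not on the description text.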

  • Moderator

    @theterminator93 Oh yes, the lsusb output only has numbers. I’ll figure those out…

    We are getting closer; it seems to hang when calling some internal UNDI wrapper. I commented out this call and compiled 09_undionly.kpxe for you. See if this runs all the way through!?

  • I tried running lsusb from a debug kernel off FOS, nothing appeared with ethernet in the name so I snapped this in case it proves useful.

    [image]

    And here is what we lock up at with 08…

    [image]

  • Moderator

    @theterminator93 Ok, here is the latest binary with even more debug output. It’s called 08_undionly.kpxe (link). Let’s hope we reach the end soon!

  • Moderator

    @theterminator93 Thanks again for testing and posting pictures!

    “Strange. I rebooted the server and the 192.168.x.x address was no longer interfering with PXE booting.”

    That was my fault. I had the wrong script embedded in one of the older binaries. See my post here further down. Don’t worry about it.

    “In any case, if I run the first command from a FOS USB, it comes back with no results. What’s interesting is it tries three times to get an IP but seems to give up, despite its actually getting an address…?”

    FOS tries to contact the FOG webserver after getting an IP via DHCP to make sure it’s fully connected. When building George’s USB FOS you need to change myfogip=x.x.x.x in that script to match your FOG server IP. Otherwise you’ll run into this issue.
    About the empty output, again my fault, sorry! I should have asked you to run lsusb instead! Please post a picture of the full output. We already have one piece of information thanks to the picture: it seems the r8152 driver is handling eth0… It will be interesting to see the USB IDs.
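
    To illustrate the kind of check described above, here is a small sketch of a “can I reach the web server” test. It is not the FOS script itself, only a Python analogue under the assumption that a plain TCP connect to the web port is a good enough signal. The function name web_server_reachable and its defaults are made up for this example.

```python
import socket


def web_server_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

    If the configured server address is wrong, a check like this fails even though DHCP handed out a perfectly good address, which matches the symptom of “gets an IP but gives up”.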

    “Here’s what it locked up at with 05_undionly”

    Alright, it hangs when trying to unload the UNDI root bus. Can’t tell you exactly what that is but it sounds like the UNDI implementation of that USB NIC is faulty. Possibly we can work around this but I need a little more time. Let me see.

    “06_undionly with the extra debug params was just a black screen when it locked up, with bzImage… ok and init.xz… ok and a blinking cursor.”

    Fine, so none of our iPXE header configs are causing this issue. 06 was the clean build.

    “And 07, with debug params, just skips the FOG menu and boots to the OS…”

    Nice! See this, when iPXE simply exits (shutting itself down to boot from hard disk) there is no hang on “Removed UNDI root bus”!

    Hope I can give you some more information or binaries to test soon!

  • Strange. I rebooted the server and the 192.168.x.x address was no longer interfering with PXE booting. No signs of it in boot.php based on the suggestions from George at this point either.

    In any case, if I run the first command from a FOS USB, it comes back with no results. What’s interesting is it tries three times to get an IP but seems to give up, despite its actually getting an address…?

    [image]

    Here’s what it locked up at with 05_undionly and the added host debug parameters DEBUG=init,device,undi,pci.

    [image]

    06_undionly with the extra debug params was just a black screen when it locked up, with bzImage… ok and init.xz… ok and a blinking cursor.

    And 07, with debug params, just skips the FOG menu and boots to the OS. This is what’s on the screen for a split-second before it starts booting Windows.

    [image]

  • Moderator

    @Sebastian-Roth said in HP Stream 11 G3 - locks up after init.xz?:

    , schedule a debug upload task

    Psssst… in the grub menu, menu item #6 (I think), you can jump right to debug mode, no capture or deploy step required. You can’t continue to capture or deploy from there, but you can run the commands you outlined.

  • Moderator

    @theterminator93 Could you please boot George’s USB stick again, schedule a debug upload task and run the following command when you get to the command shell: lspci -nn | grep -i ethernet

    I also built a few more binaries to test with after talking to one of the iPXE developers. Please skip number four and go straight to 05_undionly.kpxe (DEBUG=init,device,undi,pci and added statements).

    Then there is 06_undionly.kpxe, which is a build from the iPXE source without our FOG-modified header files (suggested by the iPXE developer). The only change I made was adding CMD_PARAM because we need this! Don’t worry about the FOG menu being very basic with that one.

    And finally it would be interesting to know if iPXE also hangs when a normal exit is done. 07_undionly.kpxe does just that. What happens when you use this?

    Please take pictures of all four and post here. Thanks!

  • Moderator

    @theterminator93 Sorry for that. My fault. I compiled and uploaded this binary in a rush and still had a different iPXE script embedded that I use for testing with my virtualbox environment. Please try 04_undionly.kpxe… (same download link as before)

  • Moderator

    @theterminator93 Wow, that is strange. Where the client gets the bzImage file from is controlled by your FOG server.

    Can you call this URL with a web browser, replacing the FOG server IP with the one that is correct for your network?


    Look through that text file and see if you see that unknown class C address.

    If you go into the FOG web GUI, schedule a deployment/capture task for some test system, and then replace the MAC address above with the actual MAC address of the test system, does it show you that class C address?
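
    If the returned text is long, scanning it by eye is error-prone. Below is a rough sketch that lists every IPv4 address in a block of text that differs from the address you expect. It assumes you have already saved the page text yourself (from the browser, for instance). The function name unexpected_addresses is hypothetical, and the addresses in the usage note are invented sample values, not output from a real FOG server.

```python
import ipaddress
import re

# Loose pattern for dotted quads; real validation is left to ipaddress.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def unexpected_addresses(script_text, expected_ip):
    """Return the distinct IPv4 addresses in script_text that are not expected_ip."""
    expected = ipaddress.ip_address(expected_ip)
    seen = []
    for token in IPV4_CANDIDATE.findall(script_text):
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # looked like an address but is not valid (e.g. 999.1.1.1)
        if addr != expected and addr not in seen:
            seen.append(addr)
    return [str(addr) for addr in seen]
```

    For example, with a server at 10.0.0.5, text containing a line such as kernel http://192.168.1.50/fog/service/ipxe/bzImage would make the function return ['192.168.1.50'], which is the kind of stray class C address to look for.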

  • That binary isn’t loading, for some reason it is trying to pull bzImage from a class C private address rather than the address of the server onsite after it gets an IP.