Hyper V and Pxe boot to Fog problems



  • @sudburr Yes, as George said, we are building images on Windows 10 (now 1709). We don’t currently have a server running Server 2016, so I am unsure whether it would behave any differently. I assume that if Server 2016 is not already affected by the same issue, it will be soon, but it could be helpful for some to know for sure.


  • Moderator

    @sudburr The issue (as I understand it) is that he’s running Hyper-V on top of Windows 10 1709. Under Windows 10 1703 iPXE booted correctly, and now it doesn’t. As I’ve said before, while Win10 1709 flies the Win10 banner, it is a very different operating system under the hood than 1703.



  • What version of Hyper-V are you running?
    What is the precise build of your virtual machine (prior to installing your OS)?

    E.g.: My setup is on Server 2016 Standard with the Hyper-V role

    Virtual Machine Generation 1

    • 4 Processors
    • Memory Startup RAM 4096 MB (NO Dynamic Memory)
    • Network Adapter (Not Connected)
    • Delete SCSI controller
    • Boot Order = CD, IDE, Legacy Network Adapter
    • VHDX, (1024 GiB), Dynamic
    • Secure Boot Disabled
    • Standard Checkpoints
    • Automatic Start Action (nothing)

    Virtual Machine Generation 2

    • 4 Processors
    • Memory Startup RAM 4096 MB ( NO Dynamic Memory)
    • Network Adapter (Not Connected)
    • Boot Order = DVD Drive, File, Hard Drive, Network Adapter
    • VHDX, (1024 GiB), Dynamic
    • Secure Boot Disabled
    • Standard Checkpoints
    • Automatic Start Action (nothing)

    Then I install the OS… I don’t connect the network adapter until after entering audit mode.

    When it comes time to capture the machine, after it’s shut down, I do this:

    Gen1.
    ADD Legacy Network Adapter with Virtual Switch to CONNECTED
    SET Network Adapter Virtual Switch to CONNECTED
    SET BIOS to Boot from Legacy Network Adapter

    Gen2.
    SET Network Adapter Virtual Switch to CONNECTED
    SET BIOS to Boot from Network Adapter

    Then capture.


  • Developer

    @Paulman9 I think I am at a loss here although I have played with iPXE and dug through the code a fair bit over the years. You might want to post this in the iPXE forums (see their website).



  • @george1421 We are BIOS booting these. I didn’t think secure boot should have a hand in it, but it sure doesn’t work with these switches, and strangely I believe the UEFI image works fine with a gen 2 1709 VM (at least with secure boot off). We don’t use FOG for our UEFI images yet, so I’ve only tested it once. The only security-related option I see in a gen 1 VM is for key storage drives, and we aren’t using that. Anyway, if there is anything you would like me to test for this, just let me know. Otherwise, it seems we are good to go.

    Edit: Rebooted to verify; secure boot is off on the host I was testing on.


  • Moderator

    @paulman9 said in Hyper V and Pxe boot to Fog problems:

    IMAGE_TRUST_CMD

    Interesting, these all deal with certificates. As long as you are not doing anything with HTTPS on your FOG server, what you built should work OK.

    This makes me wonder if the certificates may be related to secure boot being enabled on this win10 host system? I’m only guessing (TBH) but just trying to correlate why an upgrade to 1709 and certificates/image verify would be related. BUT this is excellent info to take back to the iPXE guys. I’m sure they will see this more often than the FOG project.



  • After going through the differences one by one, I have narrowed it down to three switches that will cause the hang in Hyper-V.
    #define DOWNLOAD_PROTO_HTTPS
    #define IMAGE_TRUST_CMD
    #define CERT_CMD
    Comment out those lines in general.h (I used the files from dev-branch on GitHub; if they differ from master, that might cause additional issues, I didn’t check) and the current build (47849) of iPXE works in Hyper-V. This would probably fix older versions too, but I don’t see any reason to downgrade as everything is working so far. I’m unsure how these are even affecting Hyper-V at such an early stage, but all I know is mine works with no other changes than that. Hope this helps someone make sense of this issue, or at least get around it if they are affected. Again, thanks for everyone’s help with this.
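
The workaround described in this post (commenting out three switches in general.h) can be sketched as a small shell helper. This is only an illustration, not the poster's actual procedure: the function name `disable_switches` is hypothetical, and it assumes each switch appears as a plain `#define NAME` line, as quoted in the post.

```shell
#!/bin/sh
# Hypothetical helper: comment out the three switches named in the post
# in the general.h passed as the first argument.
# Assumption: each switch is a "#define NAME ..." line at the start of a line.
disable_switches() {
    file="$1"
    # Prefix each matching "#define NAME" line with "//".
    # The original file is kept as "$file.bak"; all other lines are untouched,
    # and lines that are already commented out do not match again.
    sed -i.bak -E \
        's,^([[:space:]]*#define[[:space:]]+(DOWNLOAD_PROTO_HTTPS|IMAGE_TRUST_CMD|CERT_CMD)([[:space:]].*)?)$,//\1,' \
        "$file"
}
```

For example, from the iPXE `src` directory one might run `disable_switches config/general.h` (that path is an assumption about the source layout) and then rebuild with the `make bin/undionly.kpxe EMBED=ipxescript` command quoted elsewhere in this thread.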



  • @sebastian-roth I appreciate your help, and that does answer my question. For the record, though, the problem remains. Building from source using the FOG-provided (GitHub dev-branch pulled) console.h, general.h, and settings.h gives the same failure in Hyper-V. Building from source and only including ipxescript (not replacing the 3 files above) results in an undionly.kpxe that does get past initializing devices in Hyper-V. I’m going through each option now to try to narrow it down, as it now seems something in the config is what is tripping up Hyper-V.

    Edit: Modified the default ipxe general.h file to include param_cmd, and changed nothing else. Completed downloading an image to my vm on hyper-v. Unsure what I broke in the process, as there are a lot of switches I’m missing, but I can confirm this works on my setup.


  • Developer

    Ok, got an update here. The issue has been reported already: https://github.com/xbgmsharp/ipxe-buildweb/issues/49 - but no response yet.


  • Developer

    @paulman9 Ok, from what I can gather, it seems that when you select advanced mode you always get a full-blown ipxe.kpxe binary with all the native iPXE drivers included, even when “Choose a NIC type” is set to “undionly”. I will ask the iPXE people about this. For now you are probably best off compiling your own binaries using this make command: make bin/undionly.kpxe EMBED=ipxescript


  • Developer

    @Paulman9 This sounds interesting. I don’t have access to my dev env right now but I can have a look later today.



  • @george1421 Strangely enough, it seems that making an image in standard mode on rom-o-matic results in a much smaller image than creating one in advanced mode (even if you change nothing). Size disparity aside, an image created in standard mode as you did boots on Hyper-V; however, you cannot configure the options FOG needs. Creating in advanced mode (with every combination of options I tried, even the defaults) results in failure on Hyper-V. Any idea what is different between these two modes?


  • Moderator

    @paulman9 said in Hyper V and Pxe boot to Fog problems:

    iPXE 1.0.0+ (47849)

    This is the number I was looking for. OK, going to the rom-o-matic site, if you scroll down to the bottom (after picking .kpxe) there is a “which version” section, and the default is master. If you open the drop-down, the top entry is 47849, so that is the latest version.

    So the question is: why did the iPXE file I created work? All I did was pick the .kpxe file type, paste in the iPXE script, and hit the Proceed button. What is in 1.5.0 stable appears to be the same release.

    I just looked at the iPXE kernel in my Google Drive and it’s 546dd (a previous version).



  • @george1421 I’m not positive, but I think I’m using the one pulled from GitHub right now. I assumed this to be the most reliable one as, well, I didn’t make it. I don’t think Hyper-V gets far enough to show that.
    Though on a working machine, it shows iPXE 1.0.0+ (47849).


  • Moderator

    @paulman9 Well, I seem to have forgotten about this too.

    Can you do one thing for me? When the version of iPXE you have boots, you will see an iPXE banner line. There is a hex code between two parentheses. Can you tell me the hex code on the version you are testing?

    Also, the developers just released 1.5.0 (stable) this week. That should have the latest version too. But again, the hex code will tell us what version you have.
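
As a rough illustration of which part of the banner is meant, the following shell sketch pulls the code in parentheses out of a banner line such as the `iPXE 1.0.0+ (47849)` reported in this thread. The function name `banner_code` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the hex code found between parentheses
# in an iPXE banner line; prints nothing if no such code is present.
banner_code() {
    printf '%s\n' "$1" | sed -n -E 's/.*\(([0-9A-Fa-f]+)\).*/\1/p'
}

banner_code 'iPXE 1.0.0+ (47849)'   # prints 47849
```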



  • @george1421 Sorry to dig this back up, however, I am confused as to what you did differently on the one you built to make it get past this point. I have rebuilt undionly.kpxe several times, from source (instructions used), from rom-o-matic site (using instructions on same page), even grabbed the one off GitHub on dev-branch supposedly updated by Tom less than a week ago. Most of the ones I’ve made work, but fail at the exact same point as before. Is there a step I am missing to get the latest ipxe version? I apologize if this is my ignorance, I just am at a loss to understand what is different about this one. Thanks for your help.


  • Moderator

    @paulman9 OK great, that means that the currently shipping version of iPXE does resolve your issue. I suspected that it would not work 100%, but what did work was getting past the initializing devices. I’ll take a look at creating the full undionly.kpxe boot kernel in a while for you to test. But for now it looks like we have a path forward.

    @Developers be aware that the current version of iPXE addresses hyper-v booting. I’ve seen this in a few threads lately.



  • @george1421 Yours got past the iPXE initializing devices step, yet then stalled on “params: command not found” and looped. I tried to rebuild again including params using the rom-o-matic site you linked, but the one I made freezes again at the same spot (iPXE init) as before, then gives some warning about using a legacy NIC wrapper. I’ve never done this before, though, and could be missing something simple.


  • Moderator

    @paulman9 Ok, so now we’ve determined that a BIOS-mode VM running under Win10 1709 stops at initializing devices in the iPXE kernel. I’m going to create an updated (test) iPXE kernel for you to try with this new Hyper-V host. Understand this is not a supported route, but I’m interested to see if the issue with 1709 has already been addressed by the iPXE group.

    The basic steps are to use the rom-o-matic site https://rom-o-matic.eu/ to create a new undionly.kpxe and to include the iPXE script from GitHub https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript

    I’ll leave the file here for a few days to give you a chance to see if it works. https://drive.google.com/open?id=1fmy2CKSxGlMAv4gW6CFk_ncA694KaOpm

    Will it work? Hopefully yes, but maybe not. Let’s see…



  • @george1421 Thank you for clarifying about the kernel. I wasn’t sure at what point it was relevant, but tried it anyway. I created a gen 1 VM (legacy). I actually used a machine I was testing with that I know for sure was working before the 1709 update (same VM, same config), then created another from scratch to make sure nothing was wrong with it, and got the same result. I had another user try on another machine and he gets the same result. We tried a third machine still on 1703 and it works perfectly. There are no error messages; the machine simply gets to “iPXE initializing devices…” and sits there. It loads up 100% usage on one CPU core and sits there. I’ve left the machine for at least 3 days without it progressing.

