Hyper V and Pxe boot to Fog problems
-
@Paulman9 This sounds interesting. I don’t have access to my dev env right now but I can have a look later today.
-
@paulman9 Ok, from what I can gather, selecting the advanced mode always gives you a full-blown ipxe.kpxe binary with all the native iPXE drivers included, even when “Choose a NIC type” is set to “undionly”. I will ask the iPXE people about this. For now you are probably best off compiling your own binaries using this make command:
make bin/undionly.kpxe EMBED=ipxescript
-
Ok, got an update here. The issue has been reported already: https://github.com/xbgmsharp/ipxe-buildweb/issues/49 - but no response yet.
-
@sebastian-roth I appreciate your help, and that does answer my question. For the record, though, the problem remains. Building from source using the FOG-provided (GitHub dev-branch pulled) console.h, general.h, and settings.h gives the same failure in hyper-v. Building from source and only including ipxescript (not replacing the 3 files above) results in an undionly.kpxe that does work past initializing devices in hyper-v. I’m going through each option now to try to narrow it down, as it now seems that something in the config is what is tripping up hyper-v.
Edit: Modified the default ipxe general.h file to include param_cmd, and changed nothing else. Completed downloading an image to my vm on hyper-v. Unsure what I broke in the process, as there are a lot of switches I’m missing, but I can confirm this works on my setup.
-
After going through the differences one by one, I have narrowed it down to three switches that will cause the hang in hyper-v.
#define DOWNLOAD_PROTO_HTTPS
#define IMAGE_TRUST_CMD
#define CERT_CMD
Comment out those lines in general.h (I used the files from dev-branch on GitHub; if they differ from master, that might cause additional issues, I didn’t check) and the current build (47849) of ipxe works in hyper-v. It would probably fix older versions too, but I don’t see any reason to downgrade as everything is working so far. I’m unsure how these are even affecting hyper-v at such an early stage, but all I know is mine works with no other changes than that. Hope this helps someone make sense of this issue, or at least get around it if they are affected. Again, thanks for everyone’s help with this.
-
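For anyone who would rather script the change described above than edit general.h by hand, here is a minimal Python sketch. It is not part of FOG or iPXE: the function name comment_out_defines and the sample header text are made up for illustration, and it assumes the switches appear as plain "#define NAME" lines.

```python
import re

# Switches reported in this thread as triggering the Hyper-V Gen1 hang.
SWITCHES = ("DOWNLOAD_PROTO_HTTPS", "IMAGE_TRUST_CMD", "CERT_CMD")


def comment_out_defines(header_text, names=SWITCHES):
    """Return header_text with each active '#define NAME' line commented out."""
    out = []
    for line in header_text.splitlines(keepends=True):
        for name in names:
            # \b stops a name from matching a longer identifier that starts with it.
            if re.match(r"\s*#define\s+" + re.escape(name) + r"\b", line):
                line = "//" + line
                break
        out.append(line)
    return "".join(out)


# Hypothetical excerpt standing in for the real general.h contents.
sample = (
    "#define DOWNLOAD_PROTO_HTTP\n"
    "#define DOWNLOAD_PROTO_HTTPS\n"
    "#define IMAGE_TRUST_CMD\n"
    "#define CERT_CMD\n"
    "#define PARAM_CMD\n"
)
patched = comment_out_defines(sample)
print(patched)
```

To use it on a real tree you would read your own copy of general.h into a string, pass it through the function, and write the result back before running make. Lines that are already commented out are left alone, so running it twice is harmless.
-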
@paulman9 said in Hyper V and Pxe boot to Fog problems:
IMAGE_TRUST_CMD
Interesting, these all deal with certificates. As long as you are not doing anything with https on your FOG server, what you built should work OK.
This makes me wonder if the certificates may be related to secure boot being enabled on this win10 host system? I’m only guessing (TBH) but just trying to correlate why an upgrade to 1709 and certificates/image verify would be related. BUT this is excellent info to take back to the iPXE guys. I’m sure they will see this more often than the FOG project.
-
@george1421 We are BIOS booting these, I didn’t think secure boot should have a hand in it, but it sure doesn’t work with these switches, and strangely I believe the UEFI image works fine with a gen 2 1709 vm (at least with secure boot off.) We don’t use fog for our UEFI images yet so I’ve only tested it once. Only security related option I see in a gen 1 vm is for key storage drives, and we aren’t using that. Anyway, if there is anything you would like me to test for this, just let me know. Otherwise, it seems we are good to go.
Edit: Rebooted to verify, Secure boot is off on the host I was testing on
-
@Paulman9 I think I am at a loss here although I have played with iPXE and dug through the code a fair bit over the years. You might want to post this in the iPXE forums (see their website).
-
What version of Hyper-V are you running?
What is the precise build of your virtual machine (prior to installing your OS)?
E.g., my setup is on Server 2016 Standard with the Hyper-V role.
Virtual Machine Generation 1
- 4 Processors
- Memory Startup RAM 4096 MB (NO Dynamic Memory)
- Network Adapter (Not Connected)
- Delete SCSI controller
- Boot Order = CD, IDE, Legacy Network Adapter
- VHDX, (1024 GiB), Dynamic
- Secure Boot Disabled
- Standard Checkpoints
- Automatic Start Action (nothing)
Virtual Machine Generation 2
- 4 Processors
- Memory Startup RAM 4096 MB ( NO Dynamic Memory)
- Network Adapter (Not Connected)
- Boot Order = DVD Drive, File, Hard Drive, Network Adapter
- VHDX, (1024 GiB), Dynamic
- Secure Boot Disabled
- Standard Checkpoints
- Automatic Start Action (nothing)
Then I install the OS… I don’t connect the network adapter until after entering audit mode.
When it comes time to capture the machine, after it’s shut down I do this.
Gen1
- ADD Legacy Network Adapter with Virtual Switch to CONNECTED
- SET Network Adapter Virtual Switch to CONNECTED
- SET BIOS to Boot from Legacy Network Adapter
Gen2
- SET Network Adapter Virtual Switch to CONNECTED
- SET BIOS to Boot from Network Adapter
Then capture.
-
@sudburr The issue (as I understand it) is that he’s running hyper-v on top of Windows 10 1709. Under Windows 10 1703 iPXE booted correctly, and now it doesn’t. As I’ve said before, while Win10 1709 flies the Win10 banner, it is a very different operating system under the hood than 1703.
-
@sudburr Yes, as George said, we are building images on windows 10 (now 1709). We don’t currently have a server running server 2016 so I am unsure if it would behave any differently. I assume if server 2016 is not already affected by the same issue, it will be soon, but this could be helpful to some to know for sure.
-
There is also the option of running the free Microsoft Windows Server 2016 Hyper-V Core (10.0.14393.0) as your hypervisor.
Okay, since I misinterpreted (ie: skimmed ) the OP, I will see what I can reproduce with Windows 10v1709 as the hypervisor.
-
Alrighty then. Hyper-V running on Windows 10v1709.
Gen2 (UEFI) can network boot ipxe.efi just dandily and image.
Gen1 (Legacy) can network boot with undionly.kpxe but sits indefinitely at iPXE initialising devices…hmm …
-
hangs after “GATEWAY IP:”
- default.ipxe
hangs after “iPXE initialising devices…”
- intel.kkpxe
- intel.kpxe
- intel.pxe
- realtek.kkpxe
- realtek.kpxe
- realtek.pxe
- undionly.kkpxe
- undionly.kpxe
- undionly.pxe
hangs after “WARNING: Using legacy NIC wrapper on”
- ipxe.kkpxe
- ipxe.kpxe
- ipxe.pxe
So all I have accomplished is to confirm the problem as a third party.
-
Bad news everyone!
The same problem exists also in Windows Server Insider Preview build 17093.
-
@sudburr Thanks heaps for testing this and letting us know!!
@paulman9 said in Hyper V and Pxe boot to Fog problems:
#define DOWNLOAD_PROTO_HTTPS
#define IMAGE_TRUST_CMD
#define CERT_CMD
Would you be able to break this down further? Are you able to compile a binary without IMAGE_TRUST_CMD and CERT_CMD? Should work, I think, as those are only commands added to the iPXE command line interface.
You might also want to compile a debug binary:
make bin/undionly.kpxe EMBED=ipxescript DEBUG=https
to see if that gives us more information on where exactly it hangs.
-
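A rough way to organise the "break this down further" testing suggested above is to list every combination of the three switches to disable and pair each one with the debug build command. This is only a sketch: the helper names make_command and bisect_plan are invented here, and the only make arguments used are the ones already quoted in this thread.

```python
from itertools import combinations

# The three switches narrowed down earlier in the thread.
SWITCHES = ("DOWNLOAD_PROTO_HTTPS", "IMAGE_TRUST_CMD", "CERT_CMD")


def make_command(target="bin/undionly.kpxe", embed="ipxescript", debug=None):
    """Build the make argument list quoted in the thread; debug is optional."""
    cmd = ["make", target, "EMBED=" + embed]
    if debug:
        cmd.append("DEBUG=" + debug)
    return cmd


def bisect_plan(switches=SWITCHES):
    """List every non-empty subset of switches to disable, smallest first."""
    plan = []
    for size in range(1, len(switches) + 1):
        for subset in combinations(switches, size):
            plan.append(subset)
    return plan


# Print one test build per subset; nothing is executed here.
for disabled in bisect_plan():
    build = " ".join(make_command(debug="https"))
    print("disable:", ", ".join(disabled), "| build:", build)
```

Each printed line is one build to try on the Gen1 VM: comment out the listed switches in general.h, run the build, and note whether it still hangs at initialising devices. The smallest subset that boots points at the switch that matters.
-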
So glad I found this thread. I have been trying to figure out what was going on with our setup. We just upgraded to 1.5 stable and decided to begin building our images from VMs instead of physical machines; however, we ran into issues with a Windows 10 (1709) hyper-v host getting stuck at initializing devices. Should we move to a different hypervisor?
-
@robertd At the moment if you want to use hyper-v on win10, then use 1703. Or use hyper-v on 2012 server.
-
Microsoft Windows Server 2016 Hyper-V Core (10.0.14393.0) and Microsoft Windows Server 2016 Standard (10.0.14393.0) are both fine.
-
@george1421 rom-o-matic build image url for ipxe with
#undefine DOWNLOAD_PROTO_HTTPS #undefine IMAGE_TRUST_CMD #undefine CERT_CMD
https://rom-o-matic.eu/build.fcgi?BINARY=ipxe.efi&BINDIR=bin-x86_64-efi&REVISION=master&DEBUG=&EMBED.00script.ipxe=%23%21ipxe%0Aisset%20%24%7Bnet0/mac%7D%20%26%26%20ifopen%20net0%20%26%26%20dhcp%20net0%20%7C%7C%20goto%20dhcpnet1%0Aecho%20Received%20DHCP%20answer%20on%20interface%20net0%20%26%26%20goto%20proxycheck%0A%0A%3Adhcpnet1%0Aisset%20%24%7Bnet1/mac%7D%20%26%26%20ifopen%20net1%20%26%26%20dhcp%20net1%20%7C%7C%20goto%20dhcpnet2%0Aecho%20Received%20DHCP%20answer%20on%20interface%20net1%20%26%26%20goto%20proxycheck%0A%0A%3Adhcpnet2%0Aisset%20%24%7Bnet2/mac%7D%20%26%26%20ifopen%20net2%20%26%26%20dhcp%20net2%20%7C%7C%20goto%20dhcpall%0Aecho%20Received%20DHCP%20anser%20on%20infterface%20net2%20%26%26%20goto%20proxycheck%0A%0A%3Adhcpall%0Adhcp%20%26%26%20goto%20proxycheck%20%7C%7C%20goto%20dhcperror%0A%0A%3Adhcperror%0Aprompt%20--key%20s%20--timeout%2010000%20DHCP%20failed%2C%20hit%20%27s%27%20for%20the%20iPXE%20shell%3B%20reboot%20in%2010%20seconds%20%26%26%20shell%20%7C%7C%20reboot%0A%0A%3Aproxycheck%0Aisset%20%24%7Bproxydhcp/next-server%7D%20%26%26%20set%20next-server%20%24%7Bproxydhcp/next-server%7D%20%7C%7C%20goto%20nextservercheck%0A%0A%3Anextservercheck%0Aisset%20%24%7Bnext-server%7D%20%26%26%20goto%20netboot%20%7C%7C%20goto%20setserv%0A%0A%3Asetserv%0Aecho%20-n%20Please%20enter%20tftp%20server%3A%20%26%26%20read%20next-server%20%26%26%20goto%20netboot%20%7C%7C%20goto%20setserv%0A%0A%3Anetboot%0Achain%20tftp%3A//%24%7Bnext-server%7D/default.ipxe%20%7C%7C%0Aprompt%20--key%20s%20--timeout%2010000%20Chainloading%20failed%2C%20hit%20%27s%27%20for%20the%20iPXE%20shell%3B%20reboot%20in%2010%20seconds%20%26%26%20shell%20%7C%7C%20reboot%0A&general.h/IMAGE_SCRIPT:=1&general.h/IMAGE_EFI:=1&general.h/IWMGMT_CMD:=0&general.h/NSLOOKUP_CMD:=1&general.h/TIME_CMD:=1&general.h/DIGEST_CMD:=1&general.h/LOTEST_CMD:=1&general.h/VLAN_CMD:=1&general.h/REBOOT_CMD:=1&general.h/POWEROFF_CMD:=1&general.h/PCI_CMD:=1&general.h/PARAM_CMD:=1&general.h/NEIGHBOUR_CMD:=1&general.h/PING_CMD:=1&general.h/CONSOLE_CMD:=1&general.h/NTP_CMD:=1&console.h/CONSOLE_FRAMEBUFFER:=1&general.h/ROM_BANNER_TIMEOUT=40%20&branding.h/PRODUCT_NAME=FOG%20Project&branding.h/PRODUCT_SHORT_NAME=FOG%20iPXE&
-
Is there any update on this? Or are we waiting on the iPXE crew to fix it?