Hyper-V 2016 Gen2 VM (UEFI) fails to complete network boot
-
Unfortunately, this (the UEFI booting in Hyper-V) isn’t something FOG can do anything about, so I don’t know how I can help fix it.
The “warnings” aren’t problems per se. Either the /var/www/fog/service/ipxe folder does not have its permissions set appropriately (which can happen from time to time), or it is attempting actions that it cannot perform: mkdir fails if the directory already exists, and rename fails if the file doesn’t already exist during backup.
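To rule out the permissions explanation, a quick check along these lines may help. This is only a sketch: `check_dir_writable` is a made-up helper name, and it tests writability for whichever user runs it, so it only tells you something useful when run as the web server account.

```shell
# Hypothetical helper: report whether a directory exists and whether the
# current user can write to it.
check_dir_writable() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Path taken from the post above; prints one status line either way.
check_dir_writable /var/www/fog/service/ipxe || true
```

The account the web server runs under differs between distributions, so which user to run this as is an assumption you will need to confirm on your own server.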
-
For what it’s worth, this is a known issue (supposedly) with Gen 2 Hyper-V.
https://community.spiceworks.com/topic/1957582-pxe-problem-hyper-v-2016-with-gen2-vm-uefi-boot
Latest post is from today even.
-
@Tom-Elliott It’s possible to skip PXE booting altogether by booting FOS directly (akin to USB booting FOS, but this is only speculation since “I don’t do” Hyper-V virtualization). This “should” bypass the faulty “firmware” in the Hyper-V client.
-
This was working fine as late as May 5th with 1.4.0-RC4.
It is UEFI booting, downloading the NBP but the process is failing/restarting after the initialization of the second part at:
iPXE 1.0.0+ (a19ac) -- Open Source Network Boot Firmware -- http://ipxe.org Features: DNS FTP HTTP HTTPS ISCSI NFS TFTP SRP VLAN AoE EFI Menu
-
@sudburr Still not a bug FOG can fix.
If you can get the version of iPXE at which it worked, we could file a bug report to try to help figure out where it broke from then to now.
-
@sudburr The version is the part in the parentheses:
iPXE 1.0.0+ (a19ac)
In the shot you provided, the version is a19ac.
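If you are comparing several builds, pulling that hash out of the banner by hand gets tedious. A minimal sketch, assuming you have the banner line as text (for example, typed in from the screenshot); `ipxe_banner_hash` is a hypothetical helper name, not part of FOG or iPXE:

```shell
# Hypothetical helper: print whatever sits between the first pair of
# parentheses in an iPXE banner line.
ipxe_banner_hash() {
    printf '%s\n' "$1" | sed -n 's/^[^(]*(\([^)]*\)).*$/\1/p'
}

ipxe_banner_hash 'iPXE 1.0.0+ (a19ac) -- Open Source Network Boot Firmware -- http://ipxe.org'
```

For the banner quoted in this thread it prints a19ac. A line with no parentheses prints nothing.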
-
@Tom-Elliott What about using the magic ipxe7156.efi kernel that works for the Surface Pros? Wasn’t that one frozen in time?
-
@george1421 It is indeed. But I doubt it will work. I thought there was some effort in current iPXE to work with Hyper-V, though.
-
Where can I find a repository of the ipxe.efi builds that have been used in FOG?
-
@sudburr What do you mean?
I build at seemingly random intervals. So if you know RC-4 of 1.4.0 was the last known good, just install RC-4 from git:
git checkout tags/1.4.0-RC-4
cd bin
./installfog.sh -y
Figure out what version is in the string after confirming it works.
Then re-install the dev-branch (or whatever version you’re installing) with:
git checkout dev-branch
git pull
cd bin
./installfog.sh -y
-
That’ll do the job. I was hoping there would be a way to download only the ipxe.efi files without needing to re-install the entire beasty. I’m sure there’s a way, but I don’t use Git professionally.
RC-4 with fd6d1 works
RC-5 with 84d4 works
RC-6 to RC-9.2 with 2d79 works
… and there we go, the problem starts with RC-9.3 and ipxe.efi 17887.
-
Dropping ipxe.efi 2d79 onto a 1.4.0 install works happily.
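For anyone else wanting a single ipxe.efi from an older release without re-running the installer, `git show REF:PATH` can write one file from a tag without switching branches. A rough sketch: `extract_from_tag` is a hypothetical helper, the output path is only an example, and it assumes the file sits at packages/tftp/ipxe.efi in the tag you pick (that is where it lives in the current tree).

```shell
# Hypothetical helper: write one file from a tagged commit to a destination,
# leaving the working tree and the current branch untouched.
extract_from_tag() {
    ref=$1
    path=$2
    dest=$3
    git show "$ref:$path" > "$dest"
}

# Example, run from the root of the FOG git checkout:
#   extract_from_tag tags/1.4.0-RC-4 packages/tftp/ipxe.efi /tmp/ipxe.efi-rc4
# Copying the result over the live ipxe.efi in your TFTP directory is a
# separate manual step; back up the existing file first.
```

Because nothing is checked out, this also avoids the "local changes would be overwritten" situation when the working copy of ipxe.efi has been modified.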
-
@sudburr I’ve found what I think is the problem. I’m not sure why this would be a problem, but I need confirmation so I can provide the background for a proper bug report.
I’ve built a set of ipxe binaries that I hope will work. Please try:
git fetch --all
git checkout hyperv-ipxetest1
git pull
cd bin
./installfog.sh -y
Does this work? Essentially it’s the most current pull with the minor reversion from:
https://github.com/ipxe/ipxe/commit/a0f6e75532c68f49b3e1c73ca88151d9663f5269
All it does is revert this, as I’m fairly sure the message is not the part breaking it:
- return -EBUSY;
-
Of note: this shows all the changes. Notice the three elements dealing with Hyper-V?
I haven’t tried the second in the change set, as I’m really hoping not to have to mess with it. This, anyway, is not a means to say I believe it is or isn’t the problem, but just trying to figure out more specifically where the error is. In particular:
mcb30 [hyperv] Do not fail if guest OS ID MSR is already set (a0f6e75)
mcb30 [hyperv] Remove redundant return status code from mapping functions (276d618)
mcb30 [hyperv] Cope with Windows Server 2016 enlightenments (b91cc98)
The items above appear to be the only things related to Hyper-V.
I’m thinking the problem comes mainly from the first two commits. I’m almost certain it’s got to be 276d618, as the call in a0f6e75 appears to be looking for an int to test against. If the ints are removed, would it still fail in the same fashion? (I don’t really know, just going on what I’m seeing.)
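The same list can be pulled from a local iPXE clone, since iPXE commit subjects start with a bracketed subsystem tag. A small sketch, with `list_subsystem_commits` as a hypothetical helper name; the revision range in the usage comment is a placeholder to fill in with full hashes for the last good and first bad builds.

```shell
# Hypothetical helper: list commits whose message contains "[<tag>]".
# --fixed-strings makes git treat the brackets literally instead of as a
# regex character class.
list_subsystem_commits() {
    tag=$1
    shift
    git log --oneline --fixed-strings --grep="[$tag]" "$@"
}

# Example, run inside a clone of https://github.com/ipxe/ipxe :
#   list_subsystem_commits hyperv GOOD..BAD
# where GOOD and BAD are placeholders for the known-good and known-bad commits.
```

Any extra arguments are passed straight to `git log`, so a range or `-n 10` both work.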
-
So take 2: I’ve applied a change for what I “suspect” is the problem. Apparently there were other changes, and the notable portion was how it’s handling the vmbus_probe call. This caused a problem when I originally posted (I wasn’t noticing the error messages, sorry).
In the past, a failed vmbus_probe check only called hv_unmap_synic ( hv ).
If hv_map_hypercall or hv_map_synic failed, it would call hv_unmap_hypercall ( hv ) and hv_free_message ( hv ) (respectively).
In the new code, the err_vmbus_probe failure path calls all three. The reason for the change, as far as I can gather, is that hv_map_hypercall and hv_map_synic never returned a failure, while vmbus_probe potentially could.
My changes just comment out the hv_unmap_hypercall and hv_free_message calls, as they wouldn’t have been called in the past. These are still a guess.
-
@Tom-Elliott Yes, from quickly looking over the code changes I would guess that b91cc98 is probably the main cause. It seems like a big change. Let’s see if you get a positive test from @sudburr…
-
Strangely:
git checkout hyperv-ipxetest1
error: Your local changes to the following files would be overwritten by checkout:
packages/tftp/ipxe.efi
Please, commit your changes or stash them before you can switch branches.
Aborting
So I ran:
git checkout -- .
Which returned nothing, but then I ran:
git checkout hyperv-ipxetest1
Branch hyperv-ipxetest1 set up to track remote branch hyperv-ipxetest1 from origin.
Switched to a new branch 'hyperv-ipxetest1'
and
git pull
Already up-to-date.
So I proceeded with installation and it reported:
Version: 9220986 Installer/Updater
iPXE.efi is now 356f … and it fails into a restart loop the same way.
-
@sudburr Mind retrying installer? I’m installing before the “big” change occurred.
Also, to fix the issue you saw with the original checkout, please try:
git reset --hard
git checkout hyperv-ipxetest1
git pull
-
ipxe.efi 276d6 also fails the same way.
-
While working this out through chat, I’ve pushed patched, working iPXE binaries that address this particular boot problem. I should note, however, that this is still not a FOG-specific problem; rather, something went wonky in the iPXE binaries. I’ve posted on their forums and hope to hear back soon.