EFI_exit failed error when leaving FOG Menu on Dell Optiplex 7010 and Lenovo M83
-
@sebastian-roth That seems to have worked. Was able to boot to a task on an Optiplex 7010. Will test on a few others tomorrow to be sure.
-
@hancocza That sounds like we hit an error that was introduced in iPXE some time within the last year. Would you be keen to find out exactly which change it was, so we can report it to the iPXE devs and hopefully get it fixed?
That would mean you’d compile a fair number of binaries and test all of them. Would you be willing? Half a day of work should be enough I suppose.
It’s not too complicated and you will learn to use an awesome feature of git called “bisect”…
-
@sebastian-roth Yeah I can do that. Give me till next week, gotta get a few imaged while it’s up and running.
-
@hancocza Just wondering if you are still keen to dig into this?
-
@sebastian-roth Can do. Feel free to post the instructions and I’ll get working.
-
@hancocza Essentially what we want to do is find the commit that introduced the issue. For that we can either compile binaries from each and every commit added since the last known good one (lots of unneeded work) or we can use the well-known strategy called divide and conquer. For example, say we have 13 commits, with number 1 being the latest, which we know has the issue, and number 13 being an older one we tested and found not to show the issue. Now divide the list in half and test number 7. Let’s assume it shows the same issue as 1. Then we can treat 2-6 as having the bug as well without needing to test those. Next we compile and test number 10, and so forth. This divide-and-conquer method is pretty much all handled by git (the bisect subcommand).
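To make the 13-commit example concrete, here is a toy model of the halving in plain shell. It is only a sketch: `FIRST_BAD` and `is_bad` are made up for the illustration, and `is_bad` stands in for the real "compile and test" step.

```shell
#!/bin/sh
# Toy model of the 13-commit example: 1 is the newest commit (known bad),
# 13 is the oldest (known good). FIRST_BAD is a made-up value for the demo.
FIRST_BAD=${FIRST_BAD:-9}

# Stand-in for "compile and test": commits 1..FIRST_BAD show the issue.
is_bad() { [ "$1" -le "$FIRST_BAD" ]; }

bad=1      # newest commit, known to be bad
good=13    # oldest commit, known to be good
steps=0

# Keep halving until the known-bad and known-good commits are neighbours.
while [ $((good - bad)) -gt 1 ]; do
    mid=$(( (bad + good) / 2 ))
    steps=$((steps + 1))
    if is_bad "$mid"; then
        echo "commit $mid: bad"
        bad=$mid
    else
        echo "commit $mid: good"
        good=$mid
    fi
done

echo "first bad commit: $bad (found in $steps tests)"
```

With the default value it tests commits 7, 10, 8 and 9 and then reports commit 9 after four tests, instead of testing all eleven commits in between.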
Now to add a bit of complexity: git does not number commits sequentially but uses seemingly random hash values as ids. So it might be helpful for you to keep an eye on the list of commits of the iPXE code repo. On the other hand, you should not need to keep track of the commit ids yourself; I just thought I’d let you know about this list so you can take a look in case you find yourself lost somehow.
Now let’s get into the details of this:
cd /root/ipxe-efi/ipxe/
git clean -fd
git reset --hard
git checkout master
git bisect start
git bisect bad
git bisect good e3ca2110712f6472465c70f2e83b745ff8a25fcc
Essentially you go to the iPXE code used earlier, clean it up, check out the latest commit (in the master branch), start bisecting, and mark this latest commit as “bad” and the one from Feb 16th, 2020 as “good”. Now git will tell you:
Bisecting: 77 revisions left to test after this (roughly 6 steps)
[02201417104c751545dda261eb33f0012703d1ff] [efi] Fix reporting of USB supported languages array
Now the repository is checked out with the code of the commit halfway between the known bad and the known good commit. We just need to compile iPXE with that code base and test it:
cd src/
cp ../../fogproject/src/ipxe/src-efi/ipxescript .
cp ../../fogproject/src/ipxe/src-efi/config/general.h config/
cp ../../fogproject/src/ipxe/src-efi/config/settings.h config/
cp ../../fogproject/src/ipxe/src-efi/config/console.h config/
make EMBED=ipxescript bin-x86_64-efi/ipxe.efi CERT=/opt/fog/snapins/ssl/CA/.fogCA.pem TRUST=/opt/fog/snapins/ssl/CA/.fogCA.pem
cp bin-x86_64-efi/ipxe.efi /tftpboot
cd ..
git clean -fd
git reset --hard
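Since these steps repeat on every round, they could be wrapped in a small shell function. This is only a sketch: the function name `build_and_install_ipxe` is made up, the paths are the ones from the commands above, and it assumes you call it from /root/ipxe-efi/ipxe/. The one change is that the binary is only copied to /tftpboot if the build succeeded, so a failed build cannot leave you testing a stale ipxe.efi.

```shell
# Hypothetical helper: one build round of the bisect loop.
# Run it from the top of the iPXE checkout (/root/ipxe-efi/ipxe/).
build_and_install_ipxe() {
    # Path is relative to src/, same as in the manual commands.
    fog_src=../../fogproject/src/ipxe/src-efi
    (
        cd src/ || exit 1
        # Copy the FOG embed script and config headers, build, then install.
        cp "$fog_src/ipxescript" . &&
        cp "$fog_src/config/general.h" "$fog_src/config/settings.h" \
           "$fog_src/config/console.h" config/ &&
        make EMBED=ipxescript bin-x86_64-efi/ipxe.efi \
             CERT=/opt/fog/snapins/ssl/CA/.fogCA.pem \
             TRUST=/opt/fog/snapins/ssl/CA/.fogCA.pem &&
        cp bin-x86_64-efi/ipxe.efi /tftpboot
    )
    status=$?
    # Clean the tree so git can check out the next commit without conflicts.
    git clean -fd && git reset --hard
    return $status
}
```

After defining it, each round is just `build_and_install_ipxe`, followed by the boot test on the machine.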
Now you will either see the issue happen or not. If the issue is seen you just enter
git bisect bad
and otherwise (no issue) run
git bisect good
Now you can do the next test loop using the exact same commands as seen above, starting with cd src/. Keep going until you get a message
...is the first bad commit
after telling bisect good or bad. Should take about 6-7 rounds of testing in this case.
-
@sebastian-roth Finished this. Ended up with “c70b3e04e86cefca335e36f883829d89583a6921 is the first bad commit”.
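For anyone who wants to rehearse this loop before touching the iPXE tree, here is a minimal sketch on a throwaway repository. Everything in it is made up for the demo (the eight commits, the file name, the BUG marker); the grep check stands in for booting a machine and watching for the error.

```shell
#!/bin/sh
set -e

# Throwaway repository with 8 commits; commit 6 plants a marker
# that stands in for the bug.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Bisect Demo"

for i in 1 2 3 4 5 6 7 8; do
    echo "change $i" >> file.txt
    if [ "$i" -eq 6 ]; then echo "BUG" >> file.txt; fi
    git add file.txt
    git commit -q -m "commit $i"
done

# The oldest commit is the known good one, HEAD is the known bad one.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start >/dev/null
git bisect bad >/dev/null
out=$(git bisect good "$first")

# Answer good/bad for each commit git checks out until it names the culprit.
while ! printf '%s\n' "$out" | grep -q "is the first bad commit"; do
    if grep -q BUG file.txt; then
        out=$(git bisect bad)
    else
        out=$(git bisect good)
    fi
done

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

The `git bisect reset` at the end returns the checkout to where it was before bisecting started, which is also worth doing in the real iPXE tree once the culprit is known.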
-
@hancocza Nice work! Looking at this commit I see that several other people reported an issue with this particular commit: https://github.com/ipxe/ipxe/commit/c70b3e04e86cefca335e36f883829d89583a6921 (scroll down to the comments)
A huge discussion sparked off in issue #164 about specific EFI drivers that need to be blocked in the code on specific hardware with faulty drivers. This is on a Dell OptiPlex 9020, Dell OptiPlex 3020M and HP t620.
EDIT: Ok, reading all the way to the bottom of this issue report I see that iPXE developer Michael Brown got himself a machine to reproduce the issue and just pushed out a new commit a few hours ago!! Wow. So please go ahead, pull the very latest source from the iPXE GitHub repo, compile a binary and test.
Let us know if this works on your Dell Optiplex 7010 and Lenovo M83 devices as well.
-
@sebastian-roth said in EFI_exit failed error when leaving FOG Menu on Dell Optiplex 7010 and Lenovo M83:
Let us know if this works on your Dell Optiplex 7010
I realize you are talking about EFI exit mode, but the 7010s have an issue when iPXE initializes: it will hang at “initializing…” in UEFI mode and never proceed. Unless this fix addresses that part too, it might not be possible to test the EFI exit function.
-
@george1421 As far as I understand it, this is not about EFI exit mode but about iPXE not being able to chainload to the Linux kernel for a task. It’s interesting you mention the hang on “Initializing devices…” because that is mentioned for the 7010 in this particular issue report as well: https://github.com/ipxe/ipxe/issues/164#issuecomment-726391430 (though never addressed)
I am not sure whether @hancocza is able to boot the 7010s using snponly.efi or plain ipxe.efi?! Or maybe this issue has been fixed some time ago already.
-
@sebastian-roth As long as I can remember, I haven’t had an issue with iPXE on an EFI usb drive on an Optiplex 7010. I have had the issue where it gets stuck on “Initializing iPXE…” on 790s when using the EFI usb drive. I can’t speak to the built in iPXE booting on those machines since our network structure doesn’t allow for direct pxe booting.
-
@sebastian-roth Just reran the test with the latest ipxe version. Still has the same issue on a 7010.
-
@hancocza said in EFI_exit failed error when leaving FOG Menu on Dell Optiplex 7010 and Lenovo M83:
I can’t speak to the built in iPXE booting on those machines since our network structure doesn’t allow for direct pxe booting.
Are you saying you don’t PXE boot at all but use iPXE on USB all the time?
Just reran the test with the latest ipxe version. Still has the same issue on a 7010.
Make sure you see the latest commit id in the header when iPXE loads (just the first 4-5 characters).
-
Yep, use the iPXE on USB to boot to the iPXE on the fogserver, then use that to chainload.
-
I think it’s the latest: g47098 matches the 47098 start of the commit id
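A quick way to make this comparison less error-prone, as a sketch: the helper below checks whether the id from the banner is a prefix of a full commit id. The function name is made up, and stripping the leading "g" assumes it is the usual `git describe` prefix rather than part of the hash.

```shell
# Hypothetical helper: succeeds when the id shown in the iPXE banner
# (e.g. "g47098") is a prefix of the given full commit id.
banner_matches_commit() {
    banner_id=${1#g}   # drop the leading "g" (assumed git-describe prefix)
    full_id=$2
    case "$full_id" in
        "$banner_id"*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

From the build tree it could be used as `banner_matches_commit g47098 "$(git rev-parse HEAD)" && echo "binary matches checkout"`.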
-
@hancocza Well then the next step would probably be to boot into the UEFI shell and get some specific output from your machine (read 1, 2 and 3) and update that topic on GitHub. Maybe you can even provide hardware to Michael for debugging.
If there is a chance you can get this fixed then now is the time to engage with the iPXE dev. Make sure you tell them you are booting off a USB key, not via PXE. Not sure if that makes a difference in this context.
Did you try on the Lenovo M83? Is that fixed or still showing the same issue?
-
@sebastian-roth Back tracking on this a bit…
I was able to get it to work by following George’s instructions at the start of this thread for making a USB drive the easy way (using the latest iPXE version directly on the USB drive). My other test this morning used an older version of iPXE on the USB drive that then boots to the iPXE version on the server. That does mean, though, that their latest version fixed the issue, since I tried the easy-method USB drive at the start of this thread and got the same error. My thought is that it exits fine on the server, but when it then tries to exit the older iPXE version on the flash drive, it encounters the error.
-
@hancocza Ahhhh yeah, right. I somehow forgot that you chainload into the iPXE binary on the FOG server using another iPXE on the USB key to boot up. Now, needing to update the iPXE binary on the USB key as well makes sense to me!! Great!
Have you had a chance to try it out on the Lenovo M83 as well?
-
@sebastian-roth I was able to test on both an M83 and a 7010. I’ll be doing more widespread testing once we get back after the holidays, but everything seems to be fine!