EFI_exit failed error when leaving FOG Menu on Dell Optiplex 7010 and Lenovo M83
-
@george1421 I created the uefi usb drive using the steps for the harder way in: https://wiki.fogproject.org/wiki/index.php?title=USB_Bootable_Media#USB_Boot_UEFI_client_into_FOG_menu_.28harder_way.29
From what I gather, this method always uses the ipxe.efi file created during installation to actually boot the FOG menu. So I assume it’s using the recompiled iPXE binaries that are built with the certificate created at installation.
I can try redoing this method and see if it changes things, but I’m not sure if it will since it’s leveraging that ipxe file.
-
@hancocza So are you booting uefi clients only?
I wrote a different one for uefi only clients that is much easier to pull off: https://forums.fogproject.org/topic/6350/usb-boot-uefi-client-into-fog-menu-easy-way
In this case, since you used the -S install time switch, you need to grab the ipxe.efi from the FOG server because that has the compiled certificate in it. All you need is that file, a properly formatted USB stick, and the file renamed and saved in the right location.
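As a rough sketch of what "renamed and saved in the right location" could look like: this assumes the stick is FAT32 formatted and already mounted, and that the firmware looks for the standard UEFI fallback loader path EFI/BOOT/BOOTX64.EFI on 64 bit x86 machines. The function name and the mount point are made up for illustration; the linked tutorial is the authoritative description.

```shell
# Sketch only: copy an iPXE binary onto a mounted USB stick under the
# standard UEFI fallback loader name for 64 bit x86 (EFI/BOOT/BOOTX64.EFI).
# stage_ipxe_usb is a hypothetical helper, not part of FOG.
stage_ipxe_usb() {
    src=$1          # ipxe.efi taken from the FOG server
    mountpoint=$2   # where the FAT32 stick is mounted, e.g. /mnt/usb (assumed)
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    mkdir -p "$mountpoint/EFI/BOOT" &&
    cp "$src" "$mountpoint/EFI/BOOT/BOOTX64.EFI"
}
```

You would call it with something like stage_ipxe_usb ./ipxe.efi /mnt/usb after copying ipxe.efi from the server, then unmount the stick before pulling it.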
-
@hancocza The method you use (USB iPXE -> TFTP pull iPXE from server -> HTTPS pull FOG menu) should not break due to a new HTTPS certificate on your FOG server unless something went wrong with the compile process.
It makes it to the FOG menu, but any option I choose from there ends with an efi_exit…failed! message.
Can you please be more specific? Take a picture of the error and post here. Which options did you test? Always getting the exact same error message? What if you schedule a task for one of the hosts and boot that up? Same error?
I really think we need a picture of the error as I don’t think I have ever seen that one before.
-
@sebastian-roth Here is an output of the process when scheduling a hardware inventory update task.
The exit_boot() and efi_main() both show up when choosing the Full Inventory option when a host isn’t registered, when trying to deploy an image from the menu, and when I schedule either of those tasks. It seems to boot fine to the hard drive, update product key, etc from the menu.
This only occurs with the uefi boot process. Legacy works fine, but it won’t be an option for our newer computers that are UEFI only.
-
@sebastian-roth A bit more interesting information…
I’ve tried it on a few different models that we have here. I get this error on every Dell Optiplex 7010 and Lenovo M83 that I try. I also tried a Dell Latitude E6320 and a Dell Optiplex 5040. Both of those were able to get past that part and image.
-
@hancocza Secure boot and security chip disabled in the BIOS?
-
@sebastian-roth secure boot is disabled, only thing security-wise that is enabled is the TPM, but that was enabled previously as well and was able to boot then.
-
@hancocza Did you use the standard kernel shipped with FOG 1.5.7 or did you use a newer one? We need to figure out which part is causing this: iPXE or the Linux kernel.
Going back to an earlier iPXE version is not that easy, because the compile script used to build iPXE binaries capable of booting from an HTTPS enabled server builds them from the latest iPXE code “at that time”. It does a git pull/clone from the official iPXE repo and compiles the binaries.
Please see if you still have the old stuff around:
ls -al / | grep tftp
ls -al /home/fogproject
ls -al /home/fog
Post output of these commands here.
-
@sebastian-roth I never change kernels, so I’m assuming that it was the standard one packaged in to 1.5.7, and then if updated it was the latest one from 1.5.9.
Output from
ls -al / | grep tftp
drwxr-xr-x 5 fogproject root 4096 Jun 22 10:09 tftpboot
drwxr-xr-x 5 root root 4096 Nov 20 12:12 tftpboot.prev
Output from
ls -al /home/fogproject
total 44
drwxr-xr-x 3 fogproject fogproject 4096 Feb 26 2020 .
drwxr-xr-x 9 root root 4096 Nov 20 12:12 ..
-rw-r--r-- 1 fogproject fogproject 220 Apr 4 2018 .bash_logout
-rw-r--r-- 1 fogproject fogproject 4144 Feb 26 2020 .bashrc
drwxr-xr-x 3 fogproject fogproject 4096 Feb 26 2020 .config
-rw-r--r-- 1 fogproject fogproject 8980 Apr 16 2018 examples.desktop
-rw-r--r-- 1 fogproject fogproject 807 Apr 4 2018 .profile
-rwxr-xr-x 1 fogproject fogproject 681 Nov 20 12:11 warnfogaccount.sh
There was no output from the last command.
My current kernel versions:
bzImage Version: 4.19.145
bzImage32 Version: 4.19.145
-
@hancocza Ok, we still have backups of the iPXE binaries. Please make sure you don’t re-run the FOG installer (for whatever reason) before moving the single backup copy we have out of the way:
mv /tftpboot.prev /home/fog_tftpboot_1.5.7
That’s just to make sure we won’t lose those when re-running the FOG installer at some point.
The other directory listing is not of much help as I made a mistake when typing this off the top of my head. Please run
ls -al /home | grep fog_web
and post the results here. I suppose you will see a directory named /home/fog_web_1.5.9.BACKUP. In that case you might run
file /home/fog_web_1.5.9.BACKUP/service/ipxe/bzImage
to find out the kernel used earlier.
Now let’s start with testing if it’s the kernel or iPXE causing the error.
cp /home/fog_web_1.5.9.BACKUP/service/ipxe/bzImage /var/www/html/fog/service/ipxe/bzImage_1.5.7
cp /home/fog_web_1.5.9.BACKUP/service/ipxe/bzImage32 /var/www/html/fog/service/ipxe/bzImage32_1.5.7
chown www-data:www-data /var/www/html/fog/service/ipxe/bzImage*1.5.7
Now choose one of your Dell Optiplex 7010 or Lenovo M83 devices you have seen the error on before and edit its host settings in the FOG web UI. Set Host Kernel to bzImage_1.5.7 (assuming this is a 64 bit CPU, or bzImage32_1.5.7 if it’s 32 bit) and then boot it up to do whatever you have done before that ended up with that error.
Just found a few interesting posts on this on the web:
https://support.lenovo.com/us/en/solutions/ht510865-system-fails-to-boot-into-rhel81-with-error-message-exit_boot-failed-efi_main-failed-lenovo-thinksystem
https://bugzilla.redhat.com/show_bug.cgi?id=1824005 (anyone with access to the redhat bugzilla portal?)
https://github.com/clearlinux/distribution/issues/1885 (though this one is about kernel 5.6 and a kernel setting which we do not use - not sure if this is related or not?!)
-
@sebastian-roth May be an issue with the tftpboot.prev folder… When I first updated to 1.5.9, I thought that the issues I was seeing were from some file not installing correctly… so I ran the installer again after I ran into the issue. So I might not have the actual previous tftpboot folder.
Here’s the output from ls -al /home | grep fog_web:
drwxr-xr-x 10 root root 4096 Feb 26 2020 fog_web_1.5.8.BACKUP
drwxr-xr-x 10 root root 4096 Nov 20 12:12 fog_web_1.5.9.BACKUP
drwxr-xr-x 11 root root 4096 Jun 22 10:22 fog_web_1.5.9-RC2.9.BACKUP
I’ll have to test the older bzImage kernel next Wednesday, I’m on vacation until then. I will get back to you with the results that morning.
-
@sebastian-roth I was able to test out the kernel this morning. I swapped in the ones from the 1.5.9.BACKUP folder as well as the 1.5.8.BACKUP folder, both of them had the same error. Thinking this might then be an iPXE error?
-
@hancocza said in EFI_exit failed error when leaving FOG Menu:
I swapped in the ones from the 1.5.9.BACKUP folder as well as the 1.5.8.BACKUP folder, both of them had the same error. Thinking this might then be an iPXE error?
Sure seems to be on the iPXE side. So we will need to step back to an earlier version. As a first reference we have the dates of the backup directories. Looks like you upgraded from 1.5.8 to 1.5.9-RC2 on 26th of Feb 2020. We will try using iPXE from around that time.
When you have the FOG installer in /root/fogproject the following commands should work for you. Just change according to your situation:
cd /root/ipxe-efi/ipxe/
git clean -fd
git reset --hard
git checkout e3ca2110712f6472465c70f2e83b745ff8a25fcc
cd src/
cp ../../fogproject/src/ipxe/src-efi/ipxescript .
cp ../../fogproject/src/ipxe/src-efi/ipxescript10sec .
cp ../../fogproject/src/ipxe/src-efi/config/general.h config/
cp ../../fogproject/src/ipxe/src-efi/config/settings.h config/
cp ../../fogproject/src/ipxe/src-efi/config/console.h config/
make EMBED=ipxescript bin-x86_64-efi/ipxe.efi CERT=/opt/fog/snapins/ssl/CA/.fogCA.pem TRUST=/opt/fog/snapins/ssl/CA/.fogCA.pem
You should end up with a 64bit UEFI binary in /root/ipxe-efi/ipxe/src/bin-x86_64-efi/ipxe.efi to use. Copy that over to /tftpboot (rename the current one) and give it a try.
-
@sebastian-roth That seems to have worked. Was able to boot to a task on an Optiplex 7010. Will test on a few others tomorrow to be sure.
-
@hancocza That sounds like we hit an error that was introduced in iPXE some time within the last year. Would you be keen to find out which change exactly it was, so we can hopefully report it to the iPXE devs and get it fixed?
That would mean you’d compile a fair amount of binaries and test all of them. Would you be willing? Half a day of work should be enough I suppose.
It’s not too complicated and you will learn using an awesome feature of git called “bisect”…
-
@sebastian-roth Yeah I can do that. Give me till next week, gotta get a few imaged while it’s up and running.
-
@hancocza Just wondering if you are still keen to dig into this?
-
@sebastian-roth Can do. Feel free to post the instructions and I’ll get working.
-
@hancocza Essentially what we want to do is find the commit that introduced the issue. For that we can either compile binaries from each and every commit that was added since then (lots of unneeded work) or we can use the well known strategy called divide and conquer. For example, say we have 13 commits, with number 1 being the latest, which we know has the issue, and number 13 being an older one we tested and found not to show the issue. Now divide the list in half and test number 7. Let’s assume it shows the same issue as 1. Then we know 2-6 have the bug as well without needing to test those. Next we compile and test number 10, and so forth. This divide and conquer method is pretty much all handled by git (the bisect subcommand).
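To make the 13 commit example concrete, here is a toy simulation in plain shell. It does not touch git at all. FIRST_BAD is a made-up value standing in for the commit that introduced the bug, and is_bad stands in for the manual "compile and boot a client" test, so treat it as an illustration of the idea rather than what git does internally.

```shell
# Toy model of the example: commit 1 is the newest (known bad),
# commit 13 is the oldest (known good).
# FIRST_BAD is hypothetical: commits 1..FIRST_BAD show the bug, the rest do not.
FIRST_BAD=${FIRST_BAD:-8}

is_bad() {   # stand-in for building and testing one commit by hand
    [ "$1" -le "$FIRST_BAD" ]
}

bisect() {
    local bad=$1 good=$2 mid   # bad = newest known-bad, good = oldest known-good
    while [ $((good - bad)) -gt 1 ]; do
        mid=$(( (bad + good) / 2 ))   # halfway between the two known commits
        if is_bad "$mid"; then
            echo "tested $mid: bad"
            bad=$mid
        else
            echo "tested $mid: good"
            good=$mid
        fi
    done
    echo "first bad commit: $bad"
}

bisect 1 13
```

With the default FIRST_BAD=8 this tests commits 7, 10, 8 and 9 and then prints "first bad commit: 8", so four tests instead of eleven.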
Now to add a bit of complexity: git does not number commit ids in sequence but uses kind of random hash values. So it might be helpful for you to keep an eye on the list of commits of the iPXE code repo. On the other hand, you should not need to keep track of the commit ids yourself; I just thought I’d let you know about this list so you can take a look in case you find yourself lost somehow.
Now let’s get into the details of this:
cd /root/ipxe-efi/ipxe/
git clean -fd
git reset --hard
git checkout master
git bisect start
git bisect bad
git bisect good e3ca2110712f6472465c70f2e83b745ff8a25fcc
Essentially you go to the ipxe code used earlier, prepare it (cleanup), checkout the latest commit (in master branch), start bisecting and mark this latest commit as “bad” and the one back from Feb 16th, 2020 as “good”. Now git will tell you:
Bisecting: 77 revisions left to test after this (roughly 6 steps)
[02201417104c751545dda261eb33f0012703d1ff] [efi] Fix reporting of USB supported languages array
Now the repository is checked out with the code of the commit halfway between the known bad and the known good commit. We just need to compile iPXE with that code base and test it:
cd src/
cp ../../fogproject/src/ipxe/src-efi/ipxescript .
cp ../../fogproject/src/ipxe/src-efi/config/general.h config/
cp ../../fogproject/src/ipxe/src-efi/config/settings.h config/
cp ../../fogproject/src/ipxe/src-efi/config/console.h config/
make EMBED=ipxescript bin-x86_64-efi/ipxe.efi CERT=/opt/fog/snapins/ssl/CA/.fogCA.pem TRUST=/opt/fog/snapins/ssl/CA/.fogCA.pem
cp bin-x86_64-efi/ipxe.efi /tftpboot
cd ..
git clean -fd
git reset --hard
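Since the same commands get repeated every round, they could be wrapped in a small function. This is only a sketch: bisect_round, run and the variable names are made up here, the default paths are simply the ones from the commands above, and DRY_RUN=1 just prints what would be executed. The one deliberate difference is that the steps are chained with &&, so a failed build stops the round instead of leaving the previous round's ipxe.efi in /tftpboot to be tested by mistake.

```shell
# Hypothetical wrapper around one build-and-install round.
# Defaults mirror the paths used in this thread; override via environment.
IPXE_DIR=${IPXE_DIR:-/root/ipxe-efi/ipxe}
FOG_EFI=${FOG_EFI:-../../fogproject/src/ipxe/src-efi}   # relative to $IPXE_DIR/src
FOG_CA=${FOG_CA:-/opt/fog/snapins/ssl/CA/.fogCA.pem}
TFTP_DIR=${TFTP_DIR:-/tftpboot}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"     # only show the command
    else
        "$@"          # actually run it
    fi
}

bisect_round() {
    run cd "$IPXE_DIR/src" &&
    run cp "$FOG_EFI/ipxescript" . &&
    run cp "$FOG_EFI/config/general.h" "$FOG_EFI/config/settings.h" \
           "$FOG_EFI/config/console.h" config/ &&
    run make EMBED=ipxescript bin-x86_64-efi/ipxe.efi \
             CERT="$FOG_CA" TRUST="$FOG_CA" &&
    run cp bin-x86_64-efi/ipxe.efi "$TFTP_DIR" &&
    run cd .. &&
    run git clean -fd &&
    run git reset --hard
}
```

Try DRY_RUN=1 bisect_round first to check the paths, then run bisect_round for real and boot the client. If a commit does not build at all, git bisect skip lets you move on without marking it good or bad.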
Now you will either see the issue happen or not. If the issue is seen you just enter
git bisect bad
and otherwise (no issue) run
git bisect good
Now you can do the next test loop using the exact same commands as seen above, starting with cd src/.
Keep going until you get a message
...is the first bad commit
after telling bisect good or bad. Should take about 6-7 rounds of testing in this case.
-
@sebastian-roth Finished this. Ended up with “c70b3e04e86cefca335e36f883829d89583a6921 is the first bad commit”.