rEFInd reboot loop when booting from the network
I’m using FOG 1.4.4 and I’ve got the same problem creeping up on several of my FOG installations. I’m in the process of deploying FOG at 15 public libraries. We’re using it to image brand new machines and to back up staff machines.
After installing FOG and getting it configured, the first thing I did was make a master image by installing Windows 10 Pro on one of the new machines, then captured an image with FOG. Deployed it to 17 other new computers and it worked flawlessly. Renamed the computers automatically and everything. At this point, I had all the new machines set with the network as the first boot device. They would boot to the FOG menu, then after the timeout it would boot straight into Windows. No FOG configuration necessary, it just worked. Oh, and these are all UEFI machines.
I think the problem started to show up when I began imaging the non-UEFI staff machines to back them up. Capturing the images seemed to work fine, but at some point after changing the DHCP boot file name (DHCP coming from Windows Server 2008 R2), I started to see this rEFInd thing after the FOG menu would time out. I had never heard of this and never seen it before with the other machines. Now when any machine boots from the network and there is no task scheduled for them, they all boot to rEFInd, where there is a 10 second countdown and it reboots and continues to go through this loop until I prevent it from booting from the network.
It feels like this is an EFI vs BIOS type of issue, but I don’t know much about EFI. I could provide a lot more information but this post is already getting lengthy. Any help is appreciated.
@benc Good to hear that you could make it work. I still suggest you should read up more about UEFI and think about moving all your systems that direction.
@sebastian-roth Success! I read through /var/www/fog/service/ipxe/refind.conf and changed the default to scanfor internal,hdbios and now everything is booting whether it has a UEFI or BIOS based image. Thanks for pointing me in that direction. I have not tried converting a BIOS image to a UEFI image, so I don’t know if that would have worked. I would prefer to just change an option in a config file though. Thanks again for the help.
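For anyone who wants to script the same change, here is a minimal Python sketch of the edit. It assumes the file contains a rEFInd-style "scanfor <list>" line. The helper name set_scanfor and the sample text are made up for illustration, and the code only works on a string, so it can be tried without touching the real file.

```python
import re


def set_scanfor(conf_text: str, value: str) -> str:
    """Return conf_text with the first active (uncommented) scanfor line set to value.

    Hypothetical helper, not part of FOG or rEFInd. If no active scanfor
    line is found, one is appended at the end of the text.
    """
    # Match a line that starts with "scanfor"; commented lines (#scanfor ...) are skipped.
    pattern = re.compile(r"^[ \t]*scanfor[ \t]+.*$", re.MULTILINE)
    new_line = f"scanfor {value}"
    if pattern.search(conf_text):
        # A function replacement avoids any backslash handling in the new value.
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return f"{conf_text}{sep}{new_line}\n"


# Made-up sample, NOT the real contents of FOG's refind.conf.
sample = "timeout 0\n#scanfor internal,external\nscanfor internal\n"
updated = set_scanfor(sample, "internal,hdbios")
print(updated)
```

Make a copy of the original file before writing anything back to it.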
@benc Someone else was asking about something similar in another thread, and I just found out that you could try converting a Windows installation from legacy BIOS to UEFI: https://social.technet.microsoft.com/wiki/contents/articles/14286.converting-windows-bios-installation-to-uefi.aspx
Give it a go if you are keen and let us know how it went. Make a backup copy of your machine first!!
@sebastian-roth That makes sense. These new machines do also support BIOS, but they default to UEFI. Thank you for giving the path to that config file, I will look there next.
I did make a discovery just now. I thought exit to hard drive type (EFI) in the FOG configuration was set to rEFInd, but it was set to Exit. I changed it to rEFInd_EFI and my new public computers are working properly. They first boot to the network, the FOG menu, then after the timeout it boots straight to Windows. So the new computers using a fresh UEFI image are booting properly.
Now I am looking into the new staff computers, which are using an image that came from a BIOS install. I’m really trying to avoid having to reload Windows and copy data and reinstall programs. If it comes down to it, that’s what I’ll do though. They boot just fine if you bypass the network boot, but now they land on a rEFInd “about” screen where you can press enter and shutdown or restart the machine. Sounds like these computers do have some sort of a legacy BIOS fallback mechanism like you said which is allowing them to boot.
Like I said in my last post, I don’t know much about UEFI and I am assuming the new machines are UEFI based because I had to change the DHCP boot file to make them boot from the network.
From my point of view it’s only part of the truth that your new systems are UEFI. UEFI and legacy BIOS are two very different things, and a legacy BIOS based image deployed to a UEFI-only system will fail to boot. What you are seeing is that your UEFI systems have what I call a backward compatibility mode that lets them boot legacy BIOS installations even though the firmware is set to UEFI. This can be very confusing for people who don’t know much about UEFI yet, and I am fairly sure this is what causes the rEFInd boot loop. You can try playing with the configuration file (/var/www/fog/service/ipxe/refind.conf on your FOG server) and see if you can make it work that way. But I’d highly recommend re-creating your images in a proper UEFI layout, which essentially means re-installing Windows from scratch.
Or you could try setting your new machines back to good old legacy BIOS (disable UEFI and compatibility mode) if that’s possible. Either way, the way forward is re-creating your images in UEFI.
No 2012 DHCP servers. There is a single 2008 DHCP server at the hub site and all DHCP from the 15 branches is forwarded to it. I learned the hard way after much googling that each firmware type requires a different boot file. I’ve been using ipxe.efi for EFI and ipxe.kpxe for BIOS, changing the DHCP option every time I need to switch back and forth. I was not aware that 2012 server could dynamically hand out different boot file names. I will definitely look into that for the future.
I had wondered if there could be issues mixing BIOS images on UEFI machines. Somehow the imaging part has worked fine for me. We are deploying new computers at these libraries, some for public use, and some are replacing old staff machines which are BIOS based. What I’ve been doing is capturing the image from the old BIOS machine, then deploying it to the new UEFI machine. Windows hasn’t complained and everything seems fine except for the rEFInd boot loop. Like I said in my last post, I don’t know much about UEFI and I am assuming the new machines are UEFI based because I had to change the DHCP boot file to make them boot from the network.
I’ve noticed boot options for individual hosts as well as global options in the web interface of FOG. I haven’t messed with any of them. Perhaps I should try changing those options. I’ve seen other posts about editing config files for rEFInd, but I don’t want to get into that level of troubleshooting yet unless I have to. What confuses me is that it worked fine on the first 17 computers (UEFI) I set up, then after capturing images from the old staff computers (BIOS) the network boot loop started happening.
Thank you for the prompt reply. I appreciate your time.
Well you have a few things here to address.
First, do you have any Windows 2012 DHCP servers in your network, since you have both BIOS and UEFI clients? Each client firmware type needs its own iPXE boot file. For UEFI you might use ipxe.efi and for BIOS (legacy mode) you might use undionly.kpxe. If you have a Windows 2012 DHCP server (I know you posted you have 2008), then 2012 can dynamically switch between the two boot files for the two firmware types. If you don’t have a 2012 DHCP server, how many other DHCP servers do you have in your environment?
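The idea of one boot file per firmware type can be sketched in a few lines of Python. This is not a DHCP server configuration, only an illustration of the decision a server that supports it has to make. The architecture codes are the client system architecture types from RFC 4578 (DHCP option 93); which codes your clients actually send is an assumption you should check with a packet capture, and the function name boot_file_for is made up.

```python
# Client architecture codes as sent in DHCP option 93 (RFC 4578).
# The file names are the ones mentioned in this thread. Which codes your
# machines really send is an assumption to verify on your own network.
BOOT_FILES = {
    0: "undionly.kpxe",  # Intel x86PC, i.e. legacy BIOS
    7: "ipxe.efi",       # EFI BC
    9: "ipxe.efi",       # EFI x86-64
}


def boot_file_for(arch_code: int) -> str:
    """Return the boot file name for a client architecture code.

    Hypothetical helper for illustration; raises ValueError for codes
    that have no boot file assigned in BOOT_FILES.
    """
    try:
        return BOOT_FILES[arch_code]
    except KeyError:
        raise ValueError(f"no boot file configured for arch code {arch_code}") from None


print(boot_file_for(0))
print(boot_file_for(9))
```

With a single boot file option on the DHCP server, as on 2008, there is no such lookup, which is why the option has to be changed by hand every time the firmware type changes.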
The rEFInd exit mode is only useful for UEFI systems. For BIOS based systems you should have the best luck with SANBOOT, though some systems may need a different exit mode from the iPXE menu. There is a global exit mode (in the FOG Settings) that should be SANBOOT for BIOS and rEFInd for UEFI. You can also set the exit mode per host if you come across hardware that doesn’t like the global defaults.
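The global default plus per-host override logic looks roughly like this in Python. It is only a sketch of the rule described above, not FOG code. The names exit_mode and GLOBAL_EXIT_MODES are made up, and the strings are labels for illustration rather than the exact values shown in the FOG web interface.

```python
from typing import Optional

# Global defaults as suggested above: SANBOOT for BIOS, rEFInd for UEFI.
# The strings are illustrative labels, not exact FOG setting values.
GLOBAL_EXIT_MODES = {"bios": "SANBOOT", "uefi": "rEFInd"}


def exit_mode(firmware: str, host_override: Optional[str] = None) -> str:
    """Pick the exit mode for a host: a per-host setting wins over the global default."""
    if host_override:
        return host_override
    return GLOBAL_EXIT_MODES[firmware.lower()]


print(exit_mode("uefi"))          # falls back to the global UEFI default
print(exit_mode("bios", "Exit"))  # a per-host override takes precedence
```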
You can only deploy a UEFI captured image to a UEFI based system. The same is true for BIOS (legacy): a BIOS captured image can only be deployed to another BIOS based system. You cannot mix and match the image format with the hardware arch.