refind not working properly
-
@george1421 I had already copied the 0.11.0 refind files to the FOG server, but I did it again just for sanity’s sake. When that again made no difference, I decided to try resetting the BIOS once more. As before, it worked the first time, but when I rebooted I noticed it booted straight into Windows, so I went into the BIOS and moved the NIC back to first priority because it had reversed the priorities by itself. Then it counted down in the FOG menu, began to load Windows, and then the screen went black and the monitor went to sleep. It’s still sitting there like that.
How do I set the global EFI exit mode? Is that under FOG Configuration -> iPXE General Configuration -> Boot Exit Settings -> Exit To Hard Drive Type (EFI)? If so, it is already set to REFIND_EFI. If that’s what it’s supposed to be, then no, I am not seeing the REFIND menu.
-
@Huecuva Yes, that is the right location for it. Is the refind.conf file the same one that was set up by FOG, or did you copy over the one from the zip file? The right answer should be the one delivered by FOG.
@Huecuva said in refind not working properly:
first priority because it had reversed the priorities by itself.
The Windows installer will do this for you, even if you don’t want it to.
-
@george1421 I did not copy the one from the zip file, as that one says it’s a sample. I had, however, previously copied the refind.conf file from the FOG server at the primary location down to the secondary location, so that’s the refind.conf file it is using. I still have a .old version of the secondary FOG server’s original refind.conf file, if you think I should put it back.
EDIT: Ugh. Why does Windows have to suck so bad?
-
@Huecuva said in refind not working properly:
I should put it back
I would put it back only for the sake of us understanding what the configuration is. I don’t think it’s going to help in this case, but we know the one that is shipped with 1.5.9 works.
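If it helps to see exactly how the two refind.conf copies differ before swapping them, a comparison that ignores comments and blank lines shows only the active settings that changed. This is a minimal Python sketch; the function name `conf_diff` and the `refind.conf.old` label are hypothetical, not anything FOG ships.

```python
import difflib


def conf_diff(old_text: str, new_text: str) -> list[str]:
    """Unified diff of two refind.conf versions, ignoring comments and blank lines."""

    def active_lines(text: str) -> list[str]:
        # Keep only active settings: drop blank lines and '#' comment lines.
        return [
            line.strip()
            for line in text.splitlines()
            if line.strip() and not line.strip().startswith("#")
        ]

    return list(
        difflib.unified_diff(
            active_lines(old_text),
            active_lines(new_text),
            fromfile="refind.conf.old",
            tofile="refind.conf",
            lineterm="",
        )
    )
```

Read both files (the `.old` copy and the current one) into strings and print `"\n".join(conf_diff(old, new))`; an empty result means the active settings are identical and only comments differ.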
So you have this same target hardware at the main office? Same BIOS version and such, or is this hardware only at the remote site? I’m trying to understand why USB booting into refind 0.11.0 works but transferring it via iPXE is failing for us.
-
@Huecuva Ok, the other variable here between FOG 1.5.6 and 1.5.9 is the version of iPXE being used (again, I’m grasping at straws to explain why the main site acts one way and the remote site acts differently, assuming the target hardware is exactly the same). If you have access, copy ipxe.efi from the 1.5.6 site to the remote site; it’s in the /tftpboot directory. Make sure you save the 1.5.9 version of ipxe.efi just in case. With that file in place, the two servers should be operationally equivalent, at least in regards to PXE booting and exiting to disk.
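The swap can be sketched in Python: back up the current ipxe.efi once, copy the other build over it, and confirm the file now in the TFTP root is byte-identical to the one copied from. This assumes the /tftpboot location mentioned above; the `.1.5.9` backup suffix and the function names are hypothetical choices.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def swap_boot_file(tftp_dir: Path, replacement: Path,
                   name: str = "ipxe.efi", backup_suffix: str = ".1.5.9") -> Path:
    """Back up the current boot file, copy the replacement in, and verify it.

    Returns the path of the backup copy.
    """
    target = tftp_dir / name
    backup = tftp_dir / (name + backup_suffix)
    if not backup.exists():
        # Only create the backup once so re-running does not overwrite the saved original.
        shutil.copy2(target, backup)
    shutil.copy2(replacement, target)
    if sha256_of(target) != sha256_of(replacement):
        raise RuntimeError(f"{target} does not match {replacement} after copying")
    return backup
```

Run as root, something like `swap_boot_file(Path("/tftpboot"), Path("/root/ipxe_156.efi"))` (the source path is hypothetical) leaves the 1.5.9 binary at /tftpboot/ipxe.efi.1.5.9 so it can be restored the same way. Comparing `sha256_of` against a hash taken on the primary server would also catch a bad transfer via flash drive.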
-
@george1421 Okay, I just replaced the refind.conf file with the original.
The hardware at the main office is the same: MSI Z170A Gaming M7 motherboards with the same BIOS version. The only difference between some of these rigs with the same motherboards (or even the ones with Biostar boards, for that matter) is that some are running Pentium G4400s and some are running Pentium G4650s. They’re all running a bunch of GTX 1070 or GTX 1070 Ti cards, or a combination of the two.
-
@george1421 Unfortunately, I cannot access the primary FOG server from the secondary location and that particular step will have to wait until I’m back at the primary location next week. Then I will likely have to put it on a flash drive and manually drive it down here.
I just made a backup of the 1.5.9 ipxe.efi file though, so that’s already done.
If that’s all that can be done for now, I guess I might as well head home. There is nothing else I can do here.
-
@Huecuva Well, I’m fresh out of ideas. The only other test would be to take a failing system from the remote site back to the main site to see if the problem moves with the hardware. I understand that may not be practical for your situation, but it would tell us whether it’s FOG-related or hardware-related.
Safe travels back to your home site with the weather and everything.
-
@george1421 I can almost guarantee that the problem would not move with the hardware. I have moved rigs from the secondary location to the primary before, as we are slowly downsizing this secondary site and moving stuff to the primary as we sell off video cards. I’ve used motherboards from here to replace boards at the primary location without any problems. I do have an empty rack at the primary location, though, where I can move a couple of the rigs from the secondary site and make sure they work. I will have to do that at some point next week; I can’t have a rig dismantled over the weekend.
-
@george1421 A new discovery today: Apparently FOG at the primary location is running in a Proxmox VM. I was not aware of that. I don’t think it should make much of a difference, but there it is. The FOG server at the secondary location is running on bare metal.
-
@Huecuva The FOG server itself cannot distinguish between bare metal and a VM, so I’m almost 100% positive it’s not a factor here.
-
@george1421 Yeah, I didn’t think it would make a difference at all. I just figured it might be worth noting. I’ve managed to get the ipxe.efi file transferred to the secondary location without leaving the primary. I’m about to reboot one of the problematic rigs now to see if it will boot properly. Cross your fingers.
EDIT: Well, upon a simple reboot it managed to make it into Windows. However, it was one that I had not previously imaged with FOG. Once I deployed the image to the rig, it would no longer boot into Windows. This is bizarre.
-
@george1421 Well, I don’t know. I’m also out of ideas. I’m pretty certain that any hardware moved from the secondary location to the primary location will not suffer this issue because, again, I’ve moved motherboards from the secondary to the primary before and they’ve worked. That’s not to say it’s 100% guaranteed to work, but I’d say it’s 99.99% certain. The ipxe.efi file and all the refind files have not made any difference at all. As much as I would like to get this FOG server working properly at the secondary location, it’s not a priority compared to other stuff I need to do, so I’m inclined to just call this a defeat. I generally shouldn’t need to image the secondary rigs that often anyway, so, as inconvenient as it is, I think I will just have to physically go to the secondary site to do any imaging.
Thanks for all the help, mate, but this site just keeps having strange issues that take too much time to solve. If you do come up with any other ideas, I’m happy to give them a try, but I really don’t think moving any hardware is going to solve anything at the site. It’s just a lot of work that I don’t think will be worth doing, especially since the site is slowly being decommissioned anyway.
-
@JJ-Fullmer So I have confirmed that I am able to shut down the machines and have them wake on LAN when I schedule a task in FOG, but I can’t make them boot from the network when they are woken by a WOL magic packet. They just boot into Windows and the task is not executed. I’m not sure how to change that. I know it would be in the BIOS, but I can’t seem to find anything related to it there.
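For what it’s worth, a standard Wake-on-LAN magic packet carries essentially nothing but the target MAC address: six 0xFF bytes followed by the MAC repeated 16 times. There is no field for a boot device, so whether a rig PXE boots after waking is up to its firmware settings. A minimal Python sketch of building and broadcasting one; the broadcast address and UDP port 9 are common defaults assumed here, not FOG-specific settings.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a standard WOL magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    # 6 bytes of 0xFF followed by the MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

If `send_wol("aa:bb:cc:dd:ee:ff")` (with a rig’s real MAC) wakes the machine but it still boots Windows, the packet did its job and the boot choice is being made by the firmware.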
-
@Huecuva On HP computers I believe it’s called "Wake Up Boot Source", or at least it was once. I’m not next to one I can test at the moment. It might not exist on all boards; I’ve seen it on most business-oriented machines.
-
@JJ-Fullmer Yeah, there is nothing like that in the BIOS of these MSI gaming boards. I assume it might have something to do with the LAN Option ROM, but other than enabling or disabling it, I can’t do anything with that either. I have no idea how to configure it, and the mobo manual is no help at all.
I’ve asked the guys on the MSI_Gaming subreddit about it but to be honest I’m not expecting much.
I think this whole endeavor is pretty much a wash.
-
@Huecuva Well, poo.
I think a solution exists; we just gotta find it.
So I just want to do a quick review of where we’re at.
Is the problem that after imaging, it does boot into Windows from the FOG PXE menu and then never works again?
Or is the problem that you image and then it doesn’t boot into Windows at all? It sounds like you said it seems to work once, and then Windows setup changes the boot order (which is something it does with bcdedit, and you can actually do this manually as well; try bcdedit /enum all to see all the boot options Windows can set from the command line. You can make it put the network boot first, it’s just a bit of a complicated task).
Anyway, since refind did work from USB, that means it should work. I have seen it not work from PXE when it didn’t work from USB.
Some things worth trying are, as @george1421 mentioned, different ipxe.efi files. If the one that is working at the other site didn’t do the trick, one of the other included ones (realtek.efi, intel.efi, etc.) might make a difference.
Another thing would be to take the refind.conf from the working USB and put that on the FOG server. I’d adjust it to not use the GUI (I can’t remember the exact setting in the file, but I remember it being pretty clear) and see if it works via PXE.
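On the GUI point: rEFInd’s sample refind.conf includes a `textonly` token that forces text mode, which is likely the setting being thought of here. A minimal Python sketch that turns it on in the config text, uncommenting `#textonly` if present and appending it otherwise; the function name is hypothetical.

```python
import re


def enable_text_mode(conf_text: str) -> str:
    """Return refind.conf text with the `textonly` token active."""
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped == "textonly":
            return conf_text  # already enabled, leave the file untouched
        if re.fullmatch(r"#\s*textonly", stripped):
            lines[i] = "textonly"  # uncomment the existing line in place
            return "\n".join(lines) + "\n"
    # No existing line found: append the token at the end.
    return "\n".join(lines + ["textonly"]) + "\n"
```

Working on a copy of the USB stick’s refind.conf and diffing the result against the original keeps the change easy to undo.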
You could also consider using a bootloader such as grub2win and putting a local copy of the ipxe.efi file on each computer’s EFI partition. Then, when you need it to boot to FOG, you have a script that changes the boot order in grub.conf. But that’s a super complicated approach. I actually do this because I used to have random issues with the Windows bootloader, and I also like having a menu at each startup to go to FOG, UEFI firmware settings, or Windows. I can walk you through this, but I make it part of my image, so you have to get it to boot to network PXE at least once.
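The boot-order script in a setup like that could be as small as rewriting the `set default=` line that GRUB2 reads to pick its menu entry. A minimal Python sketch, assuming a grub.cfg-style file; grub2win’s actual file location and the entry titles are not given in this thread, so the values below are hypothetical, and this is not the script described above.

```python
import re


def set_grub_default(cfg_text: str, entry: str) -> str:
    """Point GRUB's `set default=` at `entry` (a menu index or a quoted title)."""
    new_line = f"set default={entry}"
    # [ \t]* rather than \s* so the match never spans blank lines.
    pattern = re.compile(r"^([ \t]*)set[ \t]+default=.*$", re.MULTILINE)
    if pattern.search(cfg_text):
        # Replace only the first occurrence, keeping its indentation.
        return pattern.sub(lambda m: m.group(1) + new_line, cfg_text, count=1)
    # No existing default: prepend one.
    return new_line + "\n" + cfg_text
```

For example, `set_grub_default(text, "'FOG'")` would make an entry titled FOG the default, and a second call with the Windows entry would switch it back once the task is done.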
-
@JJ-Fullmer The site is definitely not worth the work of setting up a local copy of ipxe.efi, but I guess I can try other efi files from the /tftpboot directory.
What’s happening is not always consistent. There are certain things it does most of the time, but occasionally it seems to decide it wants to switch things up a bit.
When I first registered (not imaged) the hosts on the FOG server, they booted into Windows just fine, and if that’s all I do, they seem to continue booting into Windows just fine. However, as soon as I either image them or capture an image from one of them, they will no longer boot to Windows. What @george1421 was calling the iPXE menu (which I could have sworn was more closely FOG-related) would time out, and then the circling Windows loading indicator would appear briefly before the screen went black; shortly after, the whole thing would reboot and do it all over again. Sometimes the screen would just go black and the monitor would either go to sleep or not (that seemed completely random), and it would hang that way until I rebooted it.
Resetting BIOS to defaults and then reconfiguring allowed it to boot past the ipxe menu and into Windows, but only once. After rebooting the machine, it would begin to boot loop again.
-
@Huecuva Sounds like a BIOS/UEFI firmware behaving really badly. PXE booting is probably not used much on gaming motherboards, so people hardly ever report these kinds of things to the manufacturer.
On the other hand it’s very interesting you never seem to get this in your primary location. Some things are not adding up and I can understand that you are saying it’s not worth the time for the secondary location.
-
@Sebastian-Roth It certainly does seem like a buggy BIOS/UEFI, doesn’t it? Some random rigs at the primary location have now started having this problem too, the difference being that while resetting and reconfiguring the BIOS does not solve the problem at the secondary location, it does seem to help at the primary.
This is so totally effed up I can feel my hair turning gray. I’m going to see if there is a newer BIOS for these boards; if there is, maybe that will fix this. I don’t know. At this point I’m almost ready to just burn both buildings down.