refind not working properly
-
@george1421 A new discovery today: Apparently FOG at the primary location is running in a Proxmox VM. I was not aware of that. I don’t think it should make much of a difference, but there it is. The FOG server at the secondary location is running on bare metal.
-
@Huecuva The FOG server itself cannot distinguish between bare metal and a VM, so I’m almost 100% positive it’s not a factor here.
-
@george1421 Yeah, I didn’t think it would make a difference at all. I just figured it might be worth noting. I’ve managed to get the ipxe.efi file transferred to the secondary location without leaving the primary. I’m about to reboot one of the problematic rigs now to see if it will boot properly. Cross your fingers.
EDIT: Well, upon a simple reboot it managed to make it into Windows. However, it was one that I had not previously imaged with FOG. Once I deployed the image to the rig, it would no longer boot into Windows. This is bizarre.
-
@george1421 Well, I don’t know. I’m also out of ideas. I’m pretty certain that any hardware moved from the secondary location to the primary location will not suffer this issue because, again, I’ve moved motherboards from the secondary to the primary before and they’ve worked. That’s not to say it’s 100% guaranteed to work but I’d say it’s 99.99% certain to work. The ipxe.efi file and all the refind files have not made any difference at all. As much as I would like to get this FOG server working properly at the secondary location, it’s not a priority compared to other stuff I need to do so I’m inclined to just call this a defeat. I generally shouldn’t need to image the secondary rigs that often anyway so, as inconvenient as it is, I think I will just have to physically go to the secondary site to do any imaging.
Thanks for all the help mate, but this site just keeps having strange issues that take too much time to solve. If you do come up with any other ideas, I’m happy to give them a try but I really don’t think moving any hardware is going to solve anything at the site. It’s just a lot of work that I don’t think will be worth doing, especially since the site is slowly being decommissioned anyway.
-
@JJ-Fullmer So I have confirmed that I am able to shut down the machines and have them wake on lan when I schedule a task in FOG, but I can’t make them boot from network if they have received a magic packet WOL request. They just boot into Windows and the task is not executed. I’m not sure how to change that. I know it would be in BIOS but I can’t seem to find anything related to that in the BIOS.
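For reference, here is a minimal sketch of what a Wake-on-LAN magic packet contains: six 0xFF bytes followed by the target MAC address repeated 16 times. Nothing in it tells the firmware which device to boot from, which fits what I’m seeing (the machine wakes, then follows its normal boot order). The function names, the broadcast address, the port and the example MAC are all placeholders of mine, and this is not necessarily how FOG sends its own WOL requests.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the packet over UDP (address and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC address; replace with the target NIC's address.
    send_magic_packet("00:11:22:33:44:55")
```

The packet carries only the MAC address, so whether the board network-boots after waking is decided entirely by its firmware settings.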
-
@Huecuva On HP computers I believe it’s called Wake Up Boot Source, or at least it was once. I’m not next to one I can test at the moment. It might not exist on all boards. I’ve seen it on most business-oriented machines.
-
@JJ-Fullmer Yeah there is nothing like that in the BIOS of these MSI gaming boards. I assume it might have something to do with the LAN Option ROM, but other than enabling or disabling that I can’t do anything with that either. I have no idea how to configure it and the mobo manual is no help at all.
I’ve asked the guys on the MSI_Gaming subreddit about it but to be honest I’m not expecting much.
I think this whole endeavor is pretty much a wash.
-
@Huecuva Well poo
I think a solution exists we just gotta find it.
So just want to do a quick review of where we’re at.
The problem is that after imaging it does boot into windows from the fog pxe menu and then it never works again?
Or is the problem that you image and then it doesn’t boot to Windows at all? Sounds like you said it seems to work once, then the Windows setup changes the boot order (which is something it does with bcdedit, and you can actually do this manually as well; try bcdedit /enum all to see all the boot options Windows can set from the command line). You can make it put the network boot first, it’s just a bit of a complicated task.
Anyway, since refind did work from usb that means it should work. I have seen it not work from pxe when it didn’t work from usb.
One thing worth trying, as @george1421 mentioned, is different ipxe.efi files. If the one that is working at the other site didn’t do the trick, one of the other included ones (realtek.efi, intel.efi, etc) might make a difference.
Another thing would be to take the refind.conf from the working usb and put that on the fog server. I’d adjust it to not use the gui (can’t remember the exact setting in the file, but I remember it being pretty clear) and see if it works via pxe.
You could also consider using a bootloader such as grub2win and putting a local copy of the ipxe.efi file on each computer’s efi partition. Then when you need it to boot to fog you have a script that changes the grub.conf boot order. But that’s a super complicated approach. I actually do this because I used to have random issues with the windows bootloader and I also like having the menu at each startup to go to fog, uefi firmware settings, or windows. I can walk you through this but I make it part of my image, so you have to get it to boot to network pxe at least once.
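For the bcdedit route mentioned earlier in this post, here is a rough sketch of the two commands involved, wrapped in Python so they can be scripted. It assumes the firmware boot entries are exposed under {fwbootmgr} and that you copy the network entry’s identifier out of the bcdedit /enum all output yourself; the helper names are my own and the GUID in the example is a made-up placeholder. The helpers only build the command lines; actually running them needs an elevated prompt on the Windows machine.

```python
import subprocess
from typing import List


def enum_all_command() -> List[str]:
    """Command that lists every boot entry Windows knows about."""
    return ["bcdedit", "/enum", "all"]


def network_first_command(entry_id: str) -> List[str]:
    """Command that moves the given firmware entry to the front of the boot order.

    entry_id is the identifier (including braces) of the network boot entry,
    copied from the output of the enum command.
    """
    if not (entry_id.startswith("{") and entry_id.endswith("}")):
        raise ValueError("entry_id should look like {xxxxxxxx-xxxx-...}")
    return ["bcdedit", "/set", "{fwbootmgr}", "displayorder", entry_id, "/addfirst"]


def run(command: List[str]) -> None:
    """Run a command on the Windows host (requires administrator rights)."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder identifier; use the real one from the /enum all output.
    print(" ".join(network_first_command("{00000000-0000-0000-0000-000000000000}")))
```

Whether the firmware honors that order after a reboot is a separate question, and on these boards it may well not.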
-
@JJ-Fullmer The site is definitely not worth the work of setting up a local copy of ipxe.efi, but I guess I can try other efi files from the /tftpboot directory.
What’s happening is not always consistent. There are certain things it does most of the time, but occasionally it seems to decide it wants to switch things up a bit.
When I first registered (not imaged) the hosts on the FOG server, they booted into Windows just fine. And if that’s all I do, they continue to boot into Windows just fine. However, as soon as I either image them or capture an image from one of them, they will no longer boot to Windows. What @george1421 was calling the ipxe menu (which I could have sworn was more closely FOG related) would time out and then the circling Windows loading indicator would appear briefly before the screen would go black and then shortly the whole thing would reboot and do that all over again. Sometimes, the screen would just go black and the monitor would either go to sleep or not (that just seemed to be completely random) and it would hang that way until I rebooted it.
Resetting BIOS to defaults and then reconfiguring allowed it to boot past the ipxe menu and into Windows, but only once. After rebooting the machine, it would begin to boot loop again.
-
@Huecuva Sounds like a BIOS/UEFI firmware behaving really badly. PXE booting is probably not used much on gaming motherboards, so people hardly ever report these kinds of things to the manufacturer.
On the other hand it’s very interesting you never seem to get this in your primary location. Some things are not adding up and I can understand that you are saying it’s not worth the time for the secondary location.
-
@Sebastian-Roth It does certainly seem like buggy BIOS/UEFI, doesn’t it? It seems that some random rigs at the primary location have also started having this problem now, with the exception that where resetting and reconfiguring BIOS at the secondary location does not solve the problem, it does seem to help at the primary.
This is so totally effed up I can feel my hair turning gray. I’m going to see if there is a newer BIOS for these boards and if there is maybe that will fix this? I don’t know. At this point I’m almost ready to just burn both buildings down.
-
@Sebastian-Roth Updating BIOS did not solve the problem. I don’t know what’s going on but I do know that I, personally, will never buy an MSI product for this and several other reasons.
-
@Sebastian-Roth Here’s a strange new development that I happened upon today. I’m not 100% sure if this actually has anything to do with the issue, but at my primary location there is a particular rig with the troublesome MSI motherboard that was not having any issues booting past the iPXE menu…until I tried to update the nVidia graphics card driver. Once I updated the driver, it started having the same boot-looping issue the rigs at the secondary location were having. The rigs at the secondary location all have newer drivers than some of the ones at the primary, since I made sure the image I took down there was up to date. I wouldn’t think the graphics card driver would have anything to do with it, but there it is. It’s really…weird.
EDIT: Confirmed. This particular rig, at least, will not boot past the iPXE menu when the newest driver I have downloaded (456.71) is installed. Despite a BSoD and hard crash during installation, the driver seemed to have installed correctly: the machine was mining and the manager was reporting the correct driver. Even so, it would not boot past the iPXE menu and into Windows. When I uninstalled that driver and reinstalled an older one (452.06), it had no problem booting past iPXE and into Windows. To confirm further, I reinstalled both drivers a couple of times just to make sure. The result was the same. I dunno.
-
@Huecuva Thanks for the interesting update! I have read it a couple of times and given it some thought, but I still have no idea how the nVidia driver could possibly cause such an issue. There is a slight chance the driver changes how the card is being initialized and thus causes a problem, but it’s really strange.
Great to hear you figured this out. Though it’s sad the issue comes with the newer driver. There is 457.09 now. Maybe give that a try.
-
@huecuva That is an interesting update for sure. I believe graphics cards have a bios of some sort and sometimes there’s some option rom boot option related to the gpu in the computer’s bios setting. It would make some sense for the graphics drivers to also update the bios or boot option roms on gpus.
What GPUs do you have? GPUs of the gtx 10xx series and newer have a ‘studio’ driver option that’s supposed to be the sort of ‘stable’ branch option. If your cards can use that, maybe using that as a standard could help if it’s not something you’re already doing?