refind not working properly
-
@Huecuva said in refind not working properly:
there was no /var/www/fog/html/service/ipxe
I guess there is a typo in this. It should really be /var/www/html/fog/service/ipxe … Please run the following commands on your FOG server and post the output here:
ls -al /var/www/
ls -al /var/www/fog/
ls -al /var/www/html/
-
@Sebastian-Roth Oh my goodness, how embarrassing. Yes, it should have been /var/www/html/fog/service/ipxe.
@Huecuva The show platform thing should definitely give you a better idea of what the problem is.
We do still want to know what the DHCP options are set to, since that tells us how you’re getting your computers to boot to the FOG server. One option (66) points the client to your FOG server as its TFTP server, and the other (67) tells us which PXE boot file you are using. Sometimes a different PXE boot file can change the boot behavior, which is why a few options are provided. Most of the time the default ipxe.efi does the trick for UEFI.
Another thing you could try is creating a bootable USB drive with rEFInd. I suggest using Rufus (https://rufus.ie/) to get the file onto the USB drive, but there are many ways to do it. Here’s a link on how to get the different rEFInd versions: http://www.rodsbooks.com/refind/getting.html.
Usually, if you can boot a version of rEFInd from USB, it will work the same when booting from the network. I say usually because I have seen it work on a USB boot and then fail over the network, but even then it helps narrow down where the problem is. I would suggest trying the latest version first (which I assume is what is included with FOG 1.5.9) and see if it boots. If it doesn’t, go back to 0.11.0 and see if that helps. If none of them work, contacting the rEFInd developer with your hardware info would be wise, to let him know it’s not working.
As another workaround (hopefully we find a full solution, though), you could check whether your UEFI firmware supports a Wake-on-LAN boot option, i.e. you set the machines to boot to the network when they receive a Wake-on-LAN packet, while the boot order for normal startups keeps the hard drive first. Then, when you image a computer, you shut it down, tick the Wake-on-LAN checkbox when deploying the image from FOG, and let WOL do the trick. On some computers this works; others give you a popup asking whether you want IPv4 or IPv6 PXE. If you get that popup, the firmware needs an option to disable IPv6 PXE so it goes straight from WOL to an IPv4 PXE boot. It’s certainly easier to just have network boot as the first option, but this is a workaround I used before finding my rEFInd solution.
-
@Huecuva Let’s keep it simple for the moment. Let’s make sure we fully understand how this second FOG server is set up, since it is acting differently than the main site. Knowing they are two independent servers eliminates many of the potential issues, because now we know the “problem” is localized to this new FOG server and its environment. What iPXE thinks about the target computer is also important; I don’t want to chase something for several hours and have it turn out to be the CSM issue again. So knowing exactly what is configured for DHCP options 66 and 67 is important, as well as which device is the DHCP server. I may ask you to capture some network packets so we can see exactly what the target computer is telling the DHCP server. If you know how to use Wireshark we can get this answer in about 5 minutes, but I don’t want to go that route until we fully understand the environment.
These are very contemporary mobos so they may be doing something we don’t expect in firmware simply because we don’t see them in a typical enterprise environment.
-
@Sebastian-Roth
$ ls -al /var/www/
total 20
drwxr-xr-x 4 root root 4096 Oct 14 19:58 .
drwxr-xr-x 14 root root 4096 Oct 14 19:53 ..
drwxr-xr-x 11 www-data www-data 4096 Oct 22 19:49 fog
drwxr-xr-x 4 root root 4096 Oct 19 18:17 html
-rw-r--r-- 1 www-data www-data 52 Oct 14 19:58 index.php
$ ls -al /var/www/fog/
total 412
drwxr-xr-x 11 www-data www-data 4096 Oct 22 19:49 .
drwxr-xr-x 4 root root 4096 Oct 14 19:58 ..
drwxr-xr-x 2 www-data www-data 4096 Oct 14 19:58 api
drwxr-xr-x 2 www-data www-data 4096 Oct 14 19:58 client
drwxr-xr-x 2 www-data www-data 4096 Oct 14 19:58 commons
-rw-r--r-- 1 www-data www-data 370070 Oct 14 19:58 favicon.ico
lrwxrwxrwx 1 www-data www-data 13 Oct 14 19:58 fog -> /var/www/fog/
drwxr-xr-x 2 www-data www-data 4096 Oct 14 19:58 fogdoc
drwxr-xr-x 3 root root 4096 Oct 22 19:50 html
-rw-r--r-- 1 www-data www-data 572 Oct 14 19:58 index.php
drwxr-xr-x 13 www-data www-data 4096 Oct 14 19:58 lib
drwxr-xr-x 10 www-data www-data 4096 Oct 14 19:58 management
drwxr-xr-x 3 www-data www-data 4096 Oct 14 19:58 service
drwxr-xr-x 2 www-data www-data 4096 Oct 14 19:58 status
$ ls -al /var/www/html/
total 28
drwxr-xr-x 4 root root 4096 Oct 19 18:17 .
drwxr-xr-x 4 root root 4096 Oct 14 19:58 ..
drwxr-xr-x 7 root root 4096 Oct 19 18:17 admin
lrwxrwxrwx 1 root root 13 Oct 14 19:58 fog -> /var/www/fog/
-rw-r--r-- 1 root root 10918 Oct 14 19:53 index.html
drwxr-xr-x 2 root root 4096 Oct 19 18:17 pihole
It appears that I have both /var/www/fog/service/ipxe and /var/www/html/fog/service/ipxe directories, and their contents appear to be identical. Is there a symlink or something? I didn’t even know there was a /var/www/html/fog/service/ipxe directory, but the dates of the files I changed match those in /var/www/fog/service/ipxe.
-
@Huecuva
Yes, /var/www/fog is a symlink to /var/www/html/fog. This is created by the FOG installer for backwards compatibility. The default path for httpd/apache sites in all Linux distros used to be /var/www, but it changed to /var/www/html a few years ago. The symlink is maintained so that any code (internal to FOG or customized by users) that still points to a path starting with /var/www/fog doesn’t break.
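To check which of the two paths is the real directory and which is the alias, a small shell helper can report each one. This is a minimal sketch: the function name `describe_web_root` is made up for illustration, and it assumes `readlink` from GNU coreutils (standard on Debian/Ubuntu). Pass the paths without a trailing slash, because with a trailing slash the shell follows the link and the symlink test no longer sees it.

```shell
#!/bin/sh
# Hypothetical helper: for each path, say whether it is a symlink (and its
# stored target), a real directory, or missing.
# Note: no trailing slash on the arguments, or [ -L ] follows the link.
describe_web_root() {
  for p in "$@"; do
    if [ -L "$p" ]; then
      printf '%s is a symlink -> %s\n' "$p" "$(readlink "$p")"
    elif [ -d "$p" ]; then
      printf '%s is a directory\n' "$p"
    else
      printf '%s does not exist\n' "$p"
    fi
  done
}
```

For example, run `describe_web_root /var/www/fog /var/www/html/fog`. Whichever path reports "is a symlink" is the compatibility alias; the other is where the files actually live.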
-
@george1421 I thought the FOG server at the primary location was running at least FOG 1.5.7, but it appears to be running 1.5.6.
I don’t know how to find what DHCP options 66 and 67 are. Honestly, I don’t even know what that means.
I’m at the primary location right now. Unfortunately, the weather took a serious turn for the worse (as in we went from zero snow to like 4 inches overnight and it’s still snowing), so I’m not sure I will make it down to the secondary location today. If I don’t, I won’t be able to try your hack or tell you what the DHCP server is. If I do make it down there, I will try to post what I can; otherwise that will have to wait until next week.
I’m not particularly familiar with wireshark, I’m afraid.
-
@Huecuva OK, no worries. Do you have remote access to the other location? If you do, there are still some things you can test. Do you have a tech at the remote location, or at least someone who knows how to PXE boot one of these computers? That is all we need to collect the rest of the data.
-
@george1421 I can SSH into the secondary FOG server through an RDP session on the mining manager there, but unfortunately there is no one on-site to do any PXE booting of the rigs. I am the only one administering this mine at either location.
A strange new development, however: out of the blue, for no discernible reason whatsoever, a couple of the MSI rigs at the primary location started having this issue. I guess they rebooted for some reason, and when they wouldn’t come back online I plugged a cart into one of them and found it boot looping like the ones at the secondary location. On a whim, I reset the BIOS to defaults and reconfigured it, and it worked. The same went for the other one. I guess that’s another thing I can try at the secondary location. If that fixes the problem…
EDIT: I think I’m going to head down to the secondary location shortly here. There’s nothing else I can do from here.
-
@george1421 Alright. I am at the secondary location now. I’m going to try resetting and reconfiguring BIOS on one of these rigs first and see how that goes.
-
@Huecuva OK, if that doesn’t fix it we can do a deep dive into the settings. On your end you will probably just need to install tcpdump on the FOG server, run a command, then PXE boot the target computer. You can upload the pcap to a file-sharing site and post the link here. Let’s first see the outcome of the BIOS reset.
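The capture step can be sketched as two shell functions: one records PXE-related traffic into a pcap file, the other prints the DHCP fields that carry the next-server address and options 66 and 67. This is a sketch under assumptions: tcpdump is installed (for example with `sudo apt install tcpdump` on a Debian or Ubuntu based FOG server), it runs as root, the interface name is a placeholder, and the function names are hypothetical. Because the DHCP server here is a separate router, the FOG server only sees the replies that actually reach it (broadcast ones, for example), and TFTP transfers after the initial request on port 69 use other ports, so this filter does not capture them.

```shell
#!/bin/sh
# Sketch only. Assumptions: tcpdump is installed, this runs as root on the
# FOG server, and the interface name passed in is a placeholder (e.g. eth0).

# Record PXE-related traffic to a pcap file until stopped with Ctrl+C.
# Ports: 67/68 = DHCP server/client, 69 = TFTP read requests, 4011 = proxyDHCP.
capture_pxe() {
  pcap_iface="$1"
  pcap_file="$2"
  tcpdump -i "$pcap_iface" -nn -w "$pcap_file" port 67 or port 68 or port 69 or port 4011
}

# Decode the DHCP packets from a saved capture and keep only the lines that
# carry the next-server address, the boot file name, and options 66 (TFTP)
# and 67 (BF). Label text can differ between tcpdump versions, so the
# pattern is deliberately loose.
show_boot_options() {
  tcpdump -nn -vvv -r "$1" port 67 or port 68 | grep -E 'Server-IP|file|TFTP|BF'
}
```

For example, run `capture_pxe eth0 /tmp/pxeboot.pcap`, PXE boot the target computer, stop the capture with Ctrl+C, then run `show_boot_options /tmp/pxeboot.pcap`. The raw pcap file is what you would upload for others to inspect.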
-
@george1421 Unfortunately, the BIOS reset did not fix it. At first it appeared as if it was going to work: the machine booted into Windows after the FOG menu, but when it was rebooted again, it once more started goofing off. Also, it seems these motherboards have an annoying habit of randomly changing their first boot priority back to the local Windows Boot Manager.
I tried your hack just now. I added that line to the beginning of the default.ipxe file and nothing changed, so I made a backup of that file and left only those two lines in it. The result was:
tftp://192.168.9.1/default.ipxe... ok
builtin/platform:string = efi
Chainloading failed, hit 's' for the iPXE shell; reboot in 10 seconds
192.168.9.1 is the IP of the FOG server. I will now replace the default.ipxe file with the backup.
EDIT: To answer another of your questions, it appears that my DHCP server is just a Cisco 1900 series router (a CISCO1941/K9, I think).
-
@JJ-Fullmer Creating a bootable USB with rEFInd 0.12.0 (the latest listed here) and trying to boot from it resulted in an immediate rejection, after which it tried to load Windows and I got another black screen.
Creating a bootable USB with rEFInd 0.11.0 and booting from that immediately brought up the rEFInd menu without any issues.
-
@Huecuva said in refind not working properly:
builtin/platform:string = efi
Great, now we know a bunch of things that are right with your setup:
- PXE booting is working as it should
- Your computer is in UEFI mode
- Your DHCP server is set up correctly
- ipxe.efi is being sent to the client
- The exit mode we should be working with is EFI Exit
So now, on the remote FOG server, did you swap out refind.efi and the other three files with the downloaded version 0.11.0? Don’t forget to reset the default.ipxe file back to the original version.
I see that when booting from USB you get the rEFInd menu. If your FOG server is configured with the global EFI exit mode of REFIND, then when the iPXE menu times out you should at least get the rEFInd menu. Is this not happening?
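The file swap can be done with a small helper that copies a new file over an existing one and keeps the first original as `<name>.orig`, so it is easy to roll back. This is a sketch: the function name is hypothetical, and the source path inside the extracted 0.11.0 download and the target file name in the example comment are assumptions; match them to the files you actually swapped.

```shell
#!/bin/sh
# Hypothetical helper: copy NEW over TARGET, keeping the first original as
# TARGET.orig so the change is easy to roll back.
install_with_backup() {
  new="$1"
  target="$2"
  if [ ! -f "$new" ]; then
    echo "missing $new, $target left unchanged" >&2
    return 1
  fi
  # Back up only once, so re-running never overwrites the real original.
  if [ -e "$target" ] && [ ! -e "$target.orig" ]; then
    cp -p "$target" "$target.orig" || return 1
  fi
  cp "$new" "$target" && echo "installed $new -> $target"
}

# Example (paths are assumptions; match them to the files you swapped):
#   install_with_backup ~/refind-bin-0.11.0/refind/refind_x64.efi /tftpboot/refind.efi
```

Backing up only on the first run means repeated attempts with different rEFInd versions always leave the file FOG originally shipped in `.orig`.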
-
@george1421 I had already copied the 0.11.0 rEFInd files to the FOG server, but I did it again for sanity’s sake. When that still made no difference, I tried resetting the BIOS once more. As before, the first boot worked, but when I rebooted I noticed it went straight into Windows, so I went into the BIOS and moved the NIC back to first priority because it had reversed the priorities by itself. Then it counted down in the FOG menu, began to load Windows, and then the screen went black and the monitor went to sleep. It’s still sitting there like that.
How do I set the global EFI exit mode? Is that under FOG Configuration -> iPXE General Configuration -> Boot Exit Settings -> Exit To Hard Drive Type (EFI)? If so, it is already set to REFIND_EFI. If that’s what it’s supposed to be, then no, I am not seeing the rEFInd menu.
-
@Huecuva Yes, that is the right location. Is the refind.conf file the same one that was set up by FOG, or did you copy over the one from the zip file? The right one to use is the one delivered by FOG.
first priority because it had reversed the priorities by itself.
The Windows installer will do this for you, even if you don’t want it to.
-
@george1421 I did not copy the one from the zip file, as that one says it’s a sample. I had, however, previously copied the refind.conf file from the FOG server at the primary location and brought it down to the secondary location, so that’s the refind.conf file it is using. I still have a .old version of the secondary FOG server’s original refind.conf file, if you think I should put it back.
EDIT: Ugh. Why does Windows have to suck so bad?
-
@Huecuva said in refind not working properly:
I should put it back
I would put it back only so we understand what the configuration is. I don’t think it’s going to help in this case, but we know the one that is shipped with 1.5.9 works.
So do you have this same target hardware at the main office, with the same BIOS version and such, or is this hardware only at the remote site? I’m trying to understand why USB booting into rEFInd 0.11.0 works while loading it via iPXE fails for us.
-
@Huecuva OK, the other variable here between FOG 1.5.6 and 1.5.9 is the version of iPXE being used. (Again, I’m grasping at straws to explain why the main site acts one way and the remote site acts differently, assuming the target hardware is exactly the same.) If you have access, copy ipxe.efi from the 1.5.6 site to the remote site; it’s in the /tftpboot directory. Make sure you save the 1.5.9 version of ipxe.efi just in case. With that file in place, the two servers should be operationally equivalent, at least with regard to PXE booting and exiting to disk.
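Once both copies are on the remote server, a checksum comparison is a quick way to confirm which ipxe.efi is actually being served. This is a minimal sketch, assuming GNU coreutils `sha256sum` and hypothetical file names for the saved copies (for example ipxe.efi.1.5.9 and ipxe.efi.1.5.6); the function name is also made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given copies has the same
# SHA-256 hash as the active file. Returns 1 if none match.
identify_ipxe() {
  active="$1"; shift
  if [ ! -f "$active" ]; then
    echo "missing $active" >&2
    return 1
  fi
  active_sum=$(sha256sum "$active" | cut -d ' ' -f 1)
  for candidate in "$@"; do
    [ -f "$candidate" ] || continue   # skip copies that are not present
    if [ "$(sha256sum "$candidate" | cut -d ' ' -f 1)" = "$active_sum" ]; then
      echo "$active matches $candidate"
      return 0
    fi
  done
  echo "$active matches none of the given copies"
  return 1
}
```

For example, `identify_ipxe /tftpboot/ipxe.efi /tftpboot/ipxe.efi.1.5.9 /tftpboot/ipxe.efi.1.5.6` tells you at a glance whether the swap actually took effect before you PXE boot a rig.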
-
@george1421 Okay, I just replaced the refind.conf file with the original.
The hardware at the main office is the same: MSI Z170A Gaming M7 motherboards with the same BIOS version. The only difference between some of these rigs with the same motherboards (or even the ones with Biostar boards, for that matter) is that some are running Pentium G4400s and some are running Pentium G4650s. They’re all running a bunch of GTX 1070 or GTX 1070 Ti cards, or a combination of the two.
-
@george1421 Unfortunately, I cannot access the primary FOG server from the secondary location and that particular step will have to wait until I’m back at the primary location next week. Then I will likely have to put it on a flash drive and manually drive it down here.
I just made a backup of the 1.5.9 ipxe.efi file though, so that’s already done.
If that’s all that can be done for now, I guess I might as well head home. There is nothing else I can do here.