System Hangs - Boot from SAN device
-
So I took a look at what service/ipxe/boot.php did, and had a question/concern looking at lib/fog/bootmenu.php. In the constructor, _exitTypes is defined as…
self::$_exitTypes = array(
    'sanboot' => $sanboot,
    'grub' => $grub['basic'],
    'grub_first_hdd' => $grub['basic'],
    'grub_first_cdrom' => $grub['1cd'],
    'grub_first_found_windows' => $grub['1fw'],
    'refind_efi' => $refind,
);
Shouldn’t this also include the ‘exit’ type? I was trying to determine why it ignored my request for the ‘exit’ type in favor of ‘sanboot’, and wasn’t sure yet whether this was the reason. I am still reading through the code to better understand the process though.
-Dustin
-
Yes it should.
-
Did it get removed at some point? The Git trunk reflects this as well: https://github.com/FOGProject/fogproject/blob/2718a13d2bd11d4d9ccd4be7f2f005a67000da3e/packages/web/lib/fog/bootmenu.class.php. What should it be updated to? Because ‘exit’ is not present in the array, I believe it hits the check at l.339 …
if (!$exit || !in_array($exit, array_keys(self::$_exitTypes))) {
    $exit = 'sanboot';
}
… and defaults to sanboot instead of the selected exit type.
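A minimal sketch of the fallback described above, written in Python rather than PHP so it can be run standalone. The dictionary values are placeholders and `resolve_exit_type` is a hypothetical name, not a FOG function; it only illustrates why a missing ‘exit’ key ends up as ‘sanboot’.

```python
# Placeholder table mirroring the keys of self::$_exitTypes quoted above.
# The values are stand-ins; the real ones are built in the PHP class.
EXIT_TYPES = {
    "sanboot": "<sanboot script>",
    "grub": "<grub basic>",
    "grub_first_hdd": "<grub basic>",
    "grub_first_cdrom": "<grub 1cd>",
    "grub_first_found_windows": "<grub 1fw>",
    "refind_efi": "<refind script>",
}


def resolve_exit_type(requested, exit_types=EXIT_TYPES):
    """Return the requested exit type, or 'sanboot' if it is empty or unknown."""
    if not requested or requested not in exit_types:
        return "sanboot"
    return requested


# 'exit' is not a key, so the request is silently replaced.
print(resolve_exit_type("exit"))  # sanboot
print(resolve_exit_type("grub"))  # grub

# Adding an 'exit' entry (value assumed here) lets the request through.
patched = dict(EXIT_TYPES, exit="exit")
print(resolve_exit_type("exit", patched))  # exit
```

If this reading is right, any exit type missing from the table behaves the same way, which would match what I am seeing.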
-Dustin
-
I am going to perform a test, defaulting it to ‘exit’ in the event that the type is unknown. I will be right back.
edit: Well, that certainly eliminated the hangup, but the system fails during chain-loading now and enters a reboot loop.
-Dustin
-
@dholtz-docbox I corrected this for the current “working-RC-12” branch in git.
You should have this all fixed if you run:
wget -O /var/www/fog/lib/fog/bootmenu.class.php https://raw.githubusercontent.com/FOGProject/fogproject/0c8cf54f35f694504af9af5e9fcd525d2521ae60/packages/web/lib/fog/bootmenu.class.php
-
Oh, awesome! Let me give it a whirl!
-Dustin
-
First, we aren’t quite there yet. Second, things have progressed I believe.
boot.php now returns…
#!ipxe
set fog-ip 10.1.10.42
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
exit
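For anyone wanting to check the same thing on their own host, here is a rough sketch that splits a boot.php response into commands and shows the last one. `ipxe_commands` is a hypothetical helper, and the script text is simply the output quoted above.

```python
def ipxe_commands(script):
    """Split an iPXE script into commands, skipping the shebang, blanks and comments."""
    commands = []
    for line in script.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        commands.append(line)
    return commands


# The response quoted above, one command per line.
SCRIPT = """#!ipxe
set fog-ip 10.1.10.42
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
exit
"""

commands = ipxe_commands(SCRIPT)
print(commands[-1])  # exit
```

The last command is the exit type the host actually receives, so it is the line to watch when switching between types.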
However, there is still a chain-loading issue. I am unsure if this is in the boot class or not - I assume it is, given its proximity. That said, I am not sure at what step it has a chain-loading issue yet.
edit: I am still using RC11, should I try pulling down the dev-RC12 branch in its entirety and applying it?
-Dustin
-
I see where it chains into boot.php in the /tftpboot directory, but I am still fumbling around trying to determine what it chains into next. I feel like this is the end of its chain, but it doesn’t realize it when using exit.
edit: I stumbled across the following document and am trying it: https://wiki.fogproject.org/wiki/index.php?title=Boot_looping_and_Chainloading
-Dustin
-
@dholtz-docbox Based on what I can see, you’re running in “no-menu” mode?
-
@Tom-Elliott : Correct.
-Dustin
-
@dholtz-docbox So when is it getting the “chainload” error?
-
@Tom-Elliott : After executing boot.php. Let me go take a picture of it with my phone.
-Dustin
-
You might need to try:
http://fogserverIP/fog/service/ipxe/boot.php?mac=<macofhosttryingtoboot> (Of course replacing the <macofhosttryingtoboot> with the mac of the host trying to boot)
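The same request can be scripted. This is only a sketch: `boot_php_url` and `fetch_boot_script` are hypothetical helper names, the web root `fog` is assumed from the URL above, and the MAC shown is a made-up example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def boot_php_url(server_ip, mac, webroot="fog"):
    """Build the boot.php URL for one host, following the pattern above."""
    query = urlencode({"mac": mac}, safe=":")  # keep the colons readable
    return f"http://{server_ip}/{webroot}/service/ipxe/boot.php?{query}"


def fetch_boot_script(server_ip, mac, timeout=5):
    """Return the iPXE script text that boot.php serves for this MAC."""
    with urlopen(boot_php_url(server_ip, mac), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example MAC only; substitute the MAC of the host trying to boot.
print(boot_php_url("10.1.10.42", "00:11:22:33:44:55"))
```

Calling `fetch_boot_script` with the server IP and the host's MAC should return the same text a browser shows for that URL.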
-
Yeah, that was what the output below was from, actually.
#!ipxe
set fog-ip 10.1.10.42
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
exit
edit: I hit ‘s’ to enter PXE before booting, otherwise it stated that chainloading failed…
-Dustin
-
@dholtz-docbox Can you check the apache error logs then? If the error is still “chainloading…s to continue” or whatever it is, likely there’s some error being displayed that’s “breaking” things right now.
-
@Tom-Elliott : Okay. Let me purge this log and get a clean log for you.
-Dustin
-
@Tom-Elliott : I unfortunately have no errors in my /var/log/apache2/error.log file after cleaning it and running through the process again. Would they be generated somewhere else?
-Dustin
-
I tried playing around with other types, and none appear to work. The machine uses the GRUB boot loader, so I thought I would try to boot into GRUB, but it leaves me in the GRUB interface where I appear to be able to do nothing. I am not quite sure how to configure this so that it boots into the drive directly when no tasks are present.
-Dustin
-
I have new errors after reformatting the machine in question and starting over. To reiterate, the machine has two drives, is running Ubuntu Server 14.04, and is an LVM installation. The first drive is ignored, so I am using Single Disk - Resizable, setting its Primary Disk to /dev/sdb. Last, the host is set to exit its BIOS w/ type SANBOOT.
If there is a task to capture an image, there is no issue from what I can tell. The machine boots into Partimage and begins cloning /dev/sdb without issues. However, when I go to boot into the system w/o tasks, the system NEVER boots properly.
So… the new error. Upon reformatting, I switched the BIOS exit type back to SANBOOT - after exhausting the other selections. This time when I boot into the system, I am prompted with the following messages…
Booting from SAN device 0x80
Boot from SAN device 0x80 failed: Exec format error (http://ipxe.org/2e852001)
Could not boot: Exec format error (http://ipxe.org/2e852001)
Could not boot: Exec format error (http://ipxe.org/2e852001)
Looking into what this error means, so far I am reading that it relates to the kernel. This doesn’t make sense to me. I am not sure why the system can’t just resume its boot process, or what the impediment is when using BIOS exit type = ‘EXIT’, if that is the standard course of action.
-Dustin
-
Okay… I may have solved this at the end of the day, finally.
I believe it was a sequence of a few items not matching up once I tested certain phases. In the end, I had to switch the boot priority so that the drive with the OS had priority over the data drive. I tried this earlier, but it was after already moving past SANBOOT and when looking into EXIT as an option instead. Now that I have corrected the drive and changed the boot priority, SANBOOT is functioning as intended. I will do a few more tests before I officially resolve this, so I will post back by tomorrow morning on my results - time is limited today.
-Dustin