Fog fails to boot from hard disk after image upload
-
Server
- FOG Version: 1.4.3
- OS: Ubuntu 16.04 Server LTS
Client
- Service Version:
- OS: Windows 10
Description
After completing an upload task on a Windows 10 host, the machine becomes unbootable via FOG. The FOG system attempts to boot from the hard disk but ends up stalling on a blank screen with a single blinking cursor.
I verified that the image / hard drive is fine by skipping the network boot and booting from the hard drive directly, so it seems like something is wrong on the FOG end.
I was wondering if you guys had any ideas?
-
@thebrennan47 We have several different people reporting the same issue, but we are still trying to work out where it comes from. I see you are using FOG 1.4.3, so this does not seem to be an issue with one of the RC releases, I reckon (@george1421).
Please tell us a bit more about your Windows 10 installation:
- Which Windows version/build exactly is this?
- Is this a clean install or upgraded from another version?
- Is it on MBR or GPT?
- Did you install in UEFI mode or legacy BIOS mode?
- Did you change from UEFI to BIOS or vice versa at some point?
- Which exit style settings (for UEFI and legacy BIOS) did you try so far? GRUB or SANBOOT might work for legacy BIOS installs, and rEFInd should work for UEFI-installed systems (a quick check is sketched below).
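One way to see which of those exit payloads your server actually ships is to list them in the web root. A minimal sketch, assuming the default Ubuntu web root, which may differ on your install:

```
# List the boot-exit payloads FOG serves; the path is an assumption
# (default Ubuntu install) -- adjust to your setup.
ls -l /var/www/html/fog/service/ipxe/ | grep -Ei 'grub|refind'
# You should see grub.exe (legacy BIOS GRUB exit) plus refind.efi and
# refind.conf (UEFI rEFInd exit). SANBOOT has no file here: it is an
# iPXE built-in command.
```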
-
@Sebastian-Roth I tried a deploy as well, and the machine I deployed to seems to be experiencing the same issue.
- Which Windows version/build exactly is this?
Windows 10 Enterprise, version 1607, build 14393.1066.
- Is this a clean install or upgraded from another version?
Clean.
- Is it on MBR or GPT?
MBR.
- Did you install in UEFI mode or legacy BIOS mode?
Legacy.
- Did you change from UEFI to BIOS or vice versa at some point?
I haven't changed that setting in the BIOS; I have left it at the default (Legacy).
- Which exit style settings (for UEFI and legacy BIOS) did you try so far?
Actually not sure on this one. It is a Windows host, so GRUB wouldn't apply? Or are you referring to the storage node here?
EDIT: I found what you were referring to. It is the FOG_EFI_BOOT_EXIT_TYPE setting, correct? It is currently set to REFIND_EFI. Also, FOG_BOOT_EXIT_TYPE is currently set to SANBOOT.
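For reference, those two values can also be read straight from the database. A minimal sketch, assuming FOG's default schema (database "fog", table "globalSettings"), both of which should be verified against your own install:

```
# Hedged sketch: read the exit-type settings directly from MySQL.
# Database and table names are assumptions based on FOG's default schema.
mysql -u root -p fog -e "SELECT settingKey, settingValue FROM globalSettings \
  WHERE settingKey IN ('FOG_BOOT_EXIT_TYPE','FOG_EFI_BOOT_EXIT_TYPE');"
```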
-
@Sebastian-Roth Alrighty, so switching the exit type from SANBOOT to GRUB allowed the original host from the capture operation to boot correctly.
However, now when I try to deploy and boot into one of the other hosts, the process gets stuck with only the following three lines visible:
Launching GRUB…
begin PXE scan…
Starting cmain()…
I double-checked the BIOS on these hosts as well and everything seems to match up. Any ideas?
-
@sebastian-roth I set up a new dev server last night and installed FOG 1.5.0RC8 on it. I had a test image of Win10-1703 that I deployed to a test laptop (Dell e6430) running in BIOS (legacy) mode. I deployed the image from my production server running FOG 1.4.4, and the image deployed properly and PXE booted properly on the e6430. I then changed the PXE boot server to the FOG 1.5.0RC8 server. Again it PXE booted properly and exited using the SANBOOT option, all with the defaults for FOG 1.5.0RC8 (TBH I don't think I configured anything on 1.5.0RC8 post install).
So I can't prove there is a PXE booting issue, as it were. What does that mean? Really nothing; my experience is only one data point in this problem.
Since I don't have a Win10 UEFI image (at the moment), I set up a task sequence in MDT to build a Win10-1703-EFI image. That image should be done and ready to capture in about an hour. I'll set up the experiment again, deploy to that e6430 in EFI mode this time, and see what we can find.
-
@thebrennan47 said in Fog fails to boot from hard disk after image upload:
I double-checked the BIOS on these hosts as well and everything seems to match up. Any ideas?
Please check the BIOS settings again, especially the IDE/SATA/AHCI setting.
However, now when I try to deploy and boot into one of the other hosts, the process gets stuck with only the following three lines visible
So to prove me wrong, would you be so kind as to do a fresh, clean install of Windows 10 on this very machine that hangs on "Starting cmain()…"? Don't change anything in the BIOS (if possible, don't even enter the BIOS to change the boot order for the CD install; simply choose CD as a temporary boot device, which most BIOSes can do) and see if the PXE handover to boot from disk works then. I kind of doubt it will.
As well, you might want to try all the different exit styles available to see if one works for you. Possibly there is one that works for some devices and another for others (because of different BIOS settings?!), so you can also adjust this setting per host.
@george1421 Thanks a lot for testing this. Looks as if it is working fine, as expected. Let's see how you go with a UEFI system. As I said, I already tried once and it worked great with rEFInd!
-
@sebastian-roth There was a thread today where the OP had a new Dell 7720 laptop. We did have to switch that from RAID mode to AHCI mode because the workstation was in UEFI mode. We may want to dig into why that switch is needed at some point too. I'm not saying the conditions are similar, but if the target computer is in UEFI mode and RAID is enabled, it's possible that iPXE is having the same issue as FOS.
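For anyone trying to reproduce this, the controller mode is easy to confirm from a FOG debug boot (or any Linux shell on the target). A rough sketch, nothing FOG-specific:

```
# Intel controllers expose a different PCI class depending on the BIOS
# "SATA Operation" setting, so lspci gives the mode away:
lspci -nn | grep -Ei 'sata|raid'
# "SATA controller ... AHCI"  => AHCI mode
# "RAID bus controller"       => RAID/RST mode (the case in question here)
```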
-
I am going to try a manual install like you suggested and see what happens, but while I do that I have a few updates.
All of the hosts are Dell OptiPlex 7040s, and I was able to verify that the BIOS settings are the same.
Some details of what I have configured in the BIOS:
- Secure Boot: Disabled
- Boot List Option: Legacy
- Boot Sequence Slot 1: Onboard NIC
- Boot Sequence Slot 2: HDD (I only have 1)
- Advanced Boot Options - Enable Legacy Option ROMS: Enabled
- Advanced Boot Options - Enable Attempt Legacy Boot After UEFI: Disabled
- SATA Operation: RAID On (I am not using RAID so I could theoretically set this to disabled if it makes a difference)
Also, I tried testing out the different exit options, and I got different behavior when selecting GRUB - 'First Found Windows'.
With this option, I get the following message: BOOTMGR image is corrupt. The system cannot boot.
The original host from the capture boots just fine, so it seems that maybe the image is getting corrupted somewhere in the capture and deploy process?
I will follow up again once I test a fresh install of Windows 10 and see if the FOG PXE boot works afterwards.
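In case it helps to rule out a capture-side problem, the captured image can be eyeballed on the server itself. A sketch, where the image name "Win10" and the /images path are assumptions to substitute with your own:

```
# Check that the capture produced the usual single-disk metadata and
# partition files (names reflect FOG's d1.* layout):
ls -lh /images/Win10/
# d1.mbr, d1.partitions and d1p*.img should all be present and non-empty.
cat /images/Win10/d1.partitions   # sfdisk-style dump of the captured layout
```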
-
@thebrennan47 I have a few 7040s in our staging area. I will grab one in the AM and configure it the same as you have outlined above. One question: do your 7040s have NVMe disks or SATA? Ours have NVMe (I think).
-
@george1421 Awesome, thanks again for all of the help with this.
Our 7040’s are set up with SATA drives
-
@thebrennan47 said in Fog fails to boot from hard disk after image upload:
With this option, I get the following message: BOOTMGR image is corrupt. The system cannot boot.
To me this sounds as if the system on disk is corrupted anyway. Please try booting this machine straight from the disk. Does it load Windows properly?
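If booting straight from disk also fails, the boot files can be checked directly from a FOG debug boot. A sketch, where the device name is an assumption for a typical single-disk MBR install:

```
# Mount the System Reserved partition read-only and confirm the
# boot files survived the deploy (device name is an assumption):
mkdir -p /mnt/win
mount -o ro /dev/sda1 /mnt/win
ls -l /mnt/win/bootmgr /mnt/win/Boot/BCD
umount /mnt/win
```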
-
Well I can’t prove that its broken here. I deployed a Win10 1703 reference image to a 7040 in bios mode and it boots correctly through an pxe boot. I built a UEFI version of Win10 1703 and deployed that to the 7040 in uefi mode. First to the nvme disk and second to a sata ssd. Both booted through pxe to the OS. I tested these configurations with both our production server running FOG 1.4.4 and a dev box I spun up running FOG 1.5.0RC8.
In uefi mode right after iPXE exits and before Windows starts to boot there is some text and a blue banner displayed. I’m a quick reader but its only on the screen for a split second before Windows boots, so I missed even the context of what it said.
On our production server I’m sure I mucked with refind.conf file at some point in time. The 1.5.0RC8 was left as it was when I unboxed it. I did nothing to 1.5.0RC8 other than install it and login to register the test system.
The 7040 has firmware 1.4.5 installed I know there is a later bios release, but that is what was installed on the box. When I registered the system I went into the host configuration and set the exit mode for bios to SANBOOT and uefi to rEFInd.
Now this test may be flawed since I captured and deployed both images (win10 bios and win10 uefi) using our production server (1.4.4) and only used 1.5.0RC8 to pxe boot through. When I setup the test system I didn’t give it a big enough HD to take the 25GB Win10 image. I’ll have to rebuild the test server again to be able to capture the image from our reference VMs. I’ll do that after the holiday.
-
@thebrennan47 So is this still an issue?
-
Sorry for the late response on this, guys.
The issue turned out to be with the OEM recovery partition that came with our Dell machines. For whatever reason FOG didn't like it. Removing this partition fixed the issue, and we now have the new FOG server and all the new hardware synchronizing and imaging correctly.
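For anyone hitting the same wall, the extra partition is easy to spot from a FOG debug boot before capturing. The disk name and partition number below are illustrative only:

```
# The Dell OEM/recovery partition shows up as an extra small partition:
fdisk -l /dev/sda
# After backing anything up, it could be deleted on the reference machine
# before re-capturing, e.g. (partition number 4 is purely an example):
# parted /dev/sda rm 4
```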
Thank you guys again for all your assistance in this thread (and the others). This forum has been extremely helpful.