Windows 10 x64 does not boot after restore (sporadically)
We have an Intel NUC that runs Windows 10 x64 version 1809.
We use FOG version 1.5.8 to manage it.
We use UEFI boot.
The machine is restored several times per day.
Occasionally, booting the machine fails. We have investigated this issue and found out the following:
- The deploy job finishes normally, the machine then reboots
- The FOG boot screen shows up
- Afterwards, the “Intel NUC” screen appears, i.e. a black screen with “Intel NUC” in white letters. Normally, the Windows 10 “spinning wheel” is displayed below the text, and after a few seconds the Windows login screen appears. In the failure cases, however, nothing happens and the Intel NUC screen stays there indefinitely.
(I cannot upload the screenshots I have because of a bug in the Forum page)
We are wondering whether there is some hardware defect, because we have no clue why this is happening.
Has anybody seen this issue before?
Replacing the hardware solved the issue.
So it was either a hardware defect, or a bug in the firmware.
@abulhol Any luck with the new NUC?
@george1421 I also have a theory that it could be sysprep related, but that’s harder to troubleshoot so I figured why not try the other things first.
@JJ-Fullmer Just a short update on this issue: The NUC’s BIOS was outdated, and updating it failed for unknown reasons. We now replaced the NUC with a brand new one and hope to see the issue gone.
@abulhol While this is a bit off point, I just rebuilt an Intel NUC NUC5i5 with MDT. It built fine with MDT, rebooting many times. I then ran sysprep, powered it off and back on, and it sat at the NUC logo. I bounced through the BIOS setup, exited, and then it booted into the spinning marbles of Windows 10 setup.
@JJ-Fullmer and @george1421 thank you so much for the immediate responses. I will follow your advice and see if I can fix it.
Regarding the use case, this is for doing malware analysis on bare metal machines (resetting the machine after each analysis).
I realize I gave a lot of possibilities there, but do try to only attempt one thing at a time.
- Start with BIOS updates/rollbacks
- Adjust BIOS fastboot and other settings, one at a time
- Different boot option after fog imaging
a. legacy pxe boot
b. use wake on lan/manual network boot
c. use a uefi shell to manually boot to windows after imaging
d. create a local bootmanager
Once one of these things works we’ll see if we can help you come up with a full solution
When you say restored several times per day, do you mean you are re-imaging it several times a day? I’m curious about the use case on that.
I have seen this happen on some integrated systems like NUCs, compute sticks, and SoC-based tablets that we’ve tested at my work. Sadly, I think it may just be a hardware issue: they freeze up sometimes and a reboot fixes it. That’s more of an acceptable answer if it’s a $100-200 computer with a SoC CPU (I use acceptable loosely there).
I think that @george1421 is probably right that it’s going to be related to the rEFInd boot through FOG. Recently I’ve been finding it less reliable. I’m not sure whether something changed in the rEFInd code, the iPXE code, or the Windows boot stuff. It would take quite a bit of digging to figure out which of those has had changes that could have caused this. There are some ways to test if this is the problem.
- Try using legacy pxe boot if it’s supported (you can still image and boot to uefi as I recall)
- Try not having it boot to fog after the image is complete (i.e. set the boot options to harddrive/windows boot manager first, and use wake on lan to remote boot source or a manual boot option hotkey to get to the network boot)
- You can set up a local boot manager such as grub2forwin or rEFInd and put a local copy of the ipxe.efi file on the machine, so that you can boot to FOG without downloading ipxe.efi over the network, and then boot to Windows.
- If supported, use the built-in UEFI shell (if you boot to a rEFInd USB or a locally installed rEFInd, it also has a UEFI shell). Then use ls on each filesystem mapping (fs0:, fs1:, and so on), incrementing the number till you find the Microsoft EFI folder. Then you can launch the Microsoft EFI boot file from that shell at .\EFI\Microsoft\Boot\bootmgfw.efi (slashes might be the other way, I’m doing this from memory)
Point is, try a different boot method and see if that makes any difference.
Also check for any BIOS updates or, if supported, try a BIOS rollback in case an earlier version worked better.
Also check the BIOS options and try enabling/disabling things like fastboot (enabling it may disable some drivers that rEFInd searches for, but that’s just my theory) and any other setting that might relate.
Hope one of these suggestions helps.
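If you go the wake on lan route from the list above, the wake-up packet can be sent from any machine on the same network segment. Below is a minimal sketch of the standard magic packet format (6 bytes of 0xFF followed by the MAC address repeated 16 times), sent as a UDP broadcast. The MAC address, broadcast address, and port 9 are assumptions for illustration; none of them come from this thread, and Wake on LAN also has to be enabled in the NUC’s BIOS for this to have any effect.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits in MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the 6-byte MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example call with a hypothetical MAC address:
# send_wake_on_lan("00:11:22:33:44:55")
```

With the hard drive first in the boot order, a packet like this followed by a one-time network boot would let the machine reach FOG only when a deploy is actually scheduled.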
With the NUCs, I would recommend making sure the BIOS (firmware) is up to date.
So when these systems do this, is reimaging them the only way to fix them?
When these systems do this, does the disk show up as a bootable device in the firmware? If the firmware doesn’t see the proper boot file on the first disk partition, it won’t boot.
Lastly, if you have the NUC boot through the FOG iPXE menu, and you have the NUC registered with FOG with the exit mode of rEFInd, does rEFInd find the disk properly to boot? This test shows whether the disk structure is intact and only the EFI boot partition has a problem.
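To answer the boot file question on a machine that is stuck, one option is to mount its EFI partition from a live system (or from a debug task) and check that the boot files are actually there. The sketch below is only an illustration under assumptions: the mount point is hypothetical, and the second path (the x64 fallback loader) is the standard UEFI default rather than something mentioned in this thread. The lookup ignores case because the EFI partition is FAT formatted.

```python
from pathlib import Path

# The first path is the Windows boot manager quoted earlier in this thread;
# the second is the standard x64 fallback loader (an assumption, not from the thread).
EXPECTED_BOOT_FILES = (
    "EFI/Microsoft/Boot/bootmgfw.efi",
    "EFI/Boot/bootx64.efi",
)


def _find_case_insensitive(root: Path, relative: str):
    """Resolve `relative` under `root`, matching each path component without regard to case."""
    current = root
    for part in relative.split("/"):
        if not current.is_dir():
            return None
        matches = [c for c in current.iterdir() if c.name.lower() == part.lower()]
        if not matches:
            return None
        current = matches[0]
    return current if current.is_file() else None


def missing_boot_files(esp_root):
    """Return the expected boot files that are absent or empty under the mounted EFI partition."""
    root = Path(esp_root)
    missing = []
    for relative in EXPECTED_BOOT_FILES:
        found = _find_case_insensitive(root, relative)
        if found is None or found.stat().st_size == 0:
            missing.append(relative)
    return missing


# Example call with a hypothetical mount point:
# print(missing_boot_files("/mnt/esp"))
```

An empty result on a machine that still hangs at the NUC logo would point away from the deployed image and toward the firmware or the boot path through FOG.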