Workstation does not reboot after imaging
-
All,
I have been happily using FOG for about a year now with the legacy boot options. We just got some new Dell notebooks in that are UEFI only (Latitude 5500). All I had to do was change the boot file section on my SonicWALL to read “ipxe.efi” and my Dell laptops now boot to UEFI PXE without issue. I am having 2 problems. The first problem may be due to my unattend.xml or a Windows update issue in the image as I am getting the never-ending spinning dots at the “Just a Moment” blue screen. The second problem is all 3 notebook models I have tested (Latitude 5500, Latitude E5450, and Latitude 7212 Rugged) halt after the Partclone imaging process with the following message displayed on the screen:
In Progress * Rebooting system as task is complete
reboot: Restarting system
At first I thought the “Just a Moment” hang problem I was having on the new 5500 was being caused by this refusal to reboot, as if some part of the imaging process was not quite done; however, when I imaged the 7212 I had no issues other than it hanging with the same message after the Partclone imaging process. Maybe I’m just being impatient; I let the workstation sit at the Restarting system message for about 15 minutes before forcing a reboot. (FYI, after forcing the reboot, the system does boot; however, it looks like SetupComplete.cmd didn’t run, i.e. Device Manager is a mess as if the drivers did not get installed, some applications I install with SetupComplete.cmd are not installed, etc.)
I am currently treating the “Just a Moment” issue as a problem with my unattend file or a Windows update, as the 7212 system I was able to image successfully did not have this problem (it uses a more up-to-date golden image and unattend file). I have only mentioned it to be thorough. Here are the server versions and some settings that may be pertinent. If you need anything else, please let me know.
Fog 1.5.7
Kernel 4.19.48 (also tried 4.19.64 and had the same issue)
ACPI=off
Boot File is ipxe.efi
-
First the easy bit. If your SonicWALL doesn’t support dynamic boot files, you might consider installing dnsmasq on your FOG server to supply dhcp options 66 and 67 via the ProxyDHCP protocol. I have a tutorial for it here: https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server The tutorial is centos/rhel centric, but just replace the yum command with apt-get and the rest should be easy to follow. It takes about 10 minutes start to finish to set it up. It’s well worth the effort, so you can dynamically boot both uefi and bios systems without having to update your dhcp server.
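To give a feel for what the ProxyDHCP approach involves, here is a minimal sketch of the kind of dnsmasq configuration it relies on, wrapped in a small shell helper so it can be written to a scratch file and reviewed first. This is not a copy of the linked tutorial and has not been tested against a SonicWALL: the address 192.0.2.10, the helper name write_proxydhcp_conf, and the output path are placeholders, and the boot file names are simply the two mentioned in this thread. Follow the tutorial for a known-good file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal dnsmasq ProxyDHCP config to a file.
# $1 = FOG server IP (placeholder), $2 = output path (placeholder).
write_proxydhcp_conf() {
    local fog_ip="$1" out="$2"
    cat > "$out" <<EOF
# Disable the DNS function; dnsmasq only answers PXE requests here.
port=0
# Proxy mode: the existing DHCP server keeps handing out addresses.
dhcp-range=${fog_ip},proxy
# Tag clients by the architecture in their PXE vendor class string.
dhcp-vendorclass=set:BIOS,PXEClient:Arch:00000
dhcp-vendorclass=set:UEFI,PXEClient:Arch:00007
# Default boot file for legacy BIOS clients, with a UEFI override.
dhcp-boot=undionly.kpxe,,${fog_ip}
dhcp-boot=tag:UEFI,ipxe.efi,,${fog_ip}
# PXE service entries offered to each client type.
pxe-service=X86PC,"Boot to FOG",undionly.kpxe
pxe-service=X86-64_EFI,"Boot to FOG UEFI",ipxe.efi
EOF
}

# Write the sketch to a temporary file and print it for review.
conf="$(mktemp)"
write_proxydhcp_conf 192.0.2.10 "$conf"
cat "$conf"
```

The point of the per-architecture entries is that legacy clients get undionly.kpxe and UEFI clients get ipxe.efi, without touching the boot file field on the main dhcp server each time.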
Since you are just starting out with UEFI, you need to be aware that you cannot install (and expect it to boot) a bios captured image onto a uefi based computer. The same holds true for a uefi captured image: it won’t boot on a bios based computer. On my campus I have two base images that are built exactly alike, one for uefi and one for bios.
For the partclone error, the developers would really like to see a clear snapshot of the error message taken with a mobile phone. The context of the error is almost as important as the error itself.
One other thing since you are first starting out with uefi. I see you have dell computers. There is a known and documented issue with linux, uefi, and Dell computers where the hard disk mode is Raid-On. The issue is that linux will not be able to see the disk behind the disk controller. If you switch the disk mode to ahci mode (assuming that you are not using disks in a raid configuration) it will image properly. You can leave the disk mode ahci or switch it back after imaging, but before windows OOBE first boots. That decision is yours.
The error on the screen (in the picture) appears to be during some kind of post install script. Is that correct?
-
@george1421 thank you so much for quick reply. My apologies for the blurry picture. I have added a new one below. The error in the picture happens just after the drivers are detected and pulled. This is not a post install script, it happens just after FOG’s built-in driver detection that runs just after Partclone finishes.
If I have a problem getting my other laptops that support legacy boot to work, I may setup dnsmasq as my SonicWALL does not support dynamic boot files, but I would like to switch everything over to UEFI since it saves me from having to go into every BIOS and changing from UEFI to Legacy and back after the deployment is finished.
All of my images are BIOS captured (from a VMware VM) and BIOS deployed. After deployment, I access BIOS, disable legacy, enable Secure Boot, and boot UEFI with no issues. We have 17 different models of laptops/desktops we do this with (the 5500 would make 18 if it worked - model list below image). This time, I BIOS captured the image, then deployed with UEFI on and Secure Boot off to 3 different models and had mixed results. All laptops hung at “Restarting system” (screenshot below). The 7212 deployed without issue; the E5450 deployed and booted to the OS, but SetupComplete.cmd didn’t run; and the 5500 hung at Just a Moment, with a hard reboot allowing it to boot to the OS, but SetupComplete.cmd didn’t run.
Several of my laptops get imaged using Legacy deploy with RAID-On enabled and get switched to UEFI when imaging is complete. Any laptop that gives us the no HDD error gets switched to AHCI to get around this, imaged using Legacy, and switched to UEFI when imaging is complete.
With all this said, can you please clarify … “you can not install (and expect it to boot) a bios captured image onto a uefi based computer.” Are you saying I can’t legacy capture and legacy deploy, then switch to UEFI? If so, this has been working great for me for a year on all 17 models. OR, are you saying that I can’t legacy capture and UEFI deploy? If so, this is likely my problem, as I am getting mixed results as described in my 3rd paragraph.
Latitude
3500
5480
5491
5500 (not working)
5580
5590
7212 Rugged
7490
E5450
E5470
E5540
E5550
E6440
OptiPlex
3050
3060
5050
7060
Precision
3510
-
@greichelt said in Workstation does not reboot after imaging:
The error in the picture happens just after the drivers are detected and pulled. This is not a post install script, it happens just after FOG’s built-in driver detection that runs just after Partclone finishes.
The text “Identifying hardware … XXXX identified” makes me think this is a post install script because FOG doesn’t give a rip about windows drivers. ref: https://forums.fogproject.org/topic/11126/using-fog-postinstall-scripts-for-windows-driver-injection-2017-ed Look in the fog.copy.copydrivers section there is a line
dots "Verifying we've found the OS disk"
that aligns with the picture. The point of this is that the error is coming from somewhere in the post install script. That post install script is very similar to my unpublished post install script. But I would look in the /images/postinstall directory at your scripts, somewhere after the preparing drivers text. After the post install script is executed, control is returned back to FOG to close the task on the server, then fog exits. It appears whatever is going wrong in the post install script is causing the reboot problem.
I may setup dnsmasq as my SonicWALL does not support dynamic boot files, but I would like to switch everything over to UEFI since it saves me from having to go into every BIOS and changing from UEFI to Legacy and back after the deployment is finished.
Setting up dnsmasq on the fog server will make your life easier while you transition to uefi.
All of my images are BIOS captured (from a VMware VM) and BIOS deployed. After deployment, I access BIOS, disable legacy, enable Secure Boot, and boot UEFI with no issues.
I find this impossible to believe because the disk formats are different between bios and uefi, but if you say you have it working, wonderful. My experience is a bit different.
With all this said, can you please clarify …" you can not install (and expect it to boot)
My experience is that if you capture a bios image it will not operate on a uefi system. UEFI doesn’t use the mbr boot blocks to boot but requires a specific path to the uefi boot loader to be set. It’s possible that your dells are seeing a bios image and switching modes dynamically. UEFI systems boot by looking for the boot loader on the first partition in /EFI/BOOT for a file called BOOTX64.EFI (or BOOTIA32.EFI for a 32-bit uefi system). As I said, if it’s working for you with your process, wonderful. For me we don’t have that joy. So we use MDT to create 2 identical images, one built on a bios vm and one built on a uefi VM. Capture and deploy with FOG.
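One way to take some of the guesswork out of which mode a machine actually started in is to check from a running Linux environment on that machine, for example a FOG debug task. A Linux kernel started through UEFI exposes the directory /sys/firmware/efi, while one started through legacy BIOS does not. The sketch below assumes that check is enough for this purpose; the function name boot_mode and its optional path argument are hypothetical and exist only to make the check easy to try out. Note that it reports how the running Linux was booted over PXE, not how the deployed Windows will boot from disk.

```shell
#!/usr/bin/env bash
# Report the firmware mode the running Linux kernel was started in.
# The optional argument (hypothetical, for testing) overrides the sysfs path.
boot_mode() {
    local efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "uefi"
    else
        echo "bios"
    fi
}

# Print the mode for the machine this runs on.
boot_mode
```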
-
@george1421 Regarding dnsmasq, you’ve talked me into it. My only question is on step 1 where you have us “remove pxe boot information from your router”. Under the Advanced tab, my SonicWALL has 3 fields (screenshot below). Which field should I remove?
Regarding the post install script, you kickstarted my memory. I built this setup a year ago, so I forgot. I thought the driver install portion was built into FOG, but I remember following one of your posts to get driver injection working (this one - https://forums.fogproject.org/topic/11126/using-fog-postinstall-scripts-for-windows-driver-injection-2017-ed). This has been working flawlessly for all 17 of our laptop/desktop models using legacy boot and undionly.kpxe. The issue with it hanging at “Restarting system” only started today when I updated the Boot File line on my SonicWALL DHCP server from undionly.kpxe to ipxe.efi. I bet if I switch it back I will not have a problem with it hanging. With that said, I will dig into the post install script and see what I can find.
Regarding your lack of belief, before you said anything, I had no idea it was a known issue switching from legacy to UEFI after my imaging process. This is the process we came up with a year ago when we set up FOG. It always worked, so I never questioned it. I’m absolutely dumbfounded that we have never had an issue with it until now. I’m certainly not dismissing your recommendation; in fact, after setting up dnsmasq, I am going to switch my VMware VM to UEFI, recapture, and test.
-
@greichelt Not sure if this has been tried yet, but there are a couple of different options for the kernel parameter reboot= that you can try: https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt#L4194
-
@Sebastian-Roth This one is a bit over my head. Where do I go to find and edit this file?
-
@greichelt It’s not a file. You can add kernel parameters to every host object in the web UI individually. Edit the host’s settings and set reboot=... in Host Kernel Arguments. Try things like reboot=warm or reboot=cold or reboot=pci or …
-
@Sebastian-Roth Got it. What’s the formatting for adding more than one arg?
acpi=off,reboot=cold
acpi=off, reboot=cold
acpi=off;reboot=cold
etc…
-
@greichelt said in Workstation does not reboot after imaging:
Regarding dnsmasq
First let me say it’s not technically necessary to remove the boot information, because ProxyDHCP will override anything set in dhcp options 66 and 67. But for the sanity of someone coming after you, it will drive them nuts if you have things set and they can’t figure out why it’s working the way it is. To your question: remove all three fields and save it.
I bet if I switch it back I will not have a problem with it hanging. With that said, I will dig into the post install script and see what I can find.
One thing to point out is that uefi and bios systems reboot differently in FOG. At least the exit-to-disk function works differently. For BIOS the default for fog should be SANBOOT; the default for uefi systems should be rEFInd. Make sure the global exit modes in FOG Configuration->FOG Settings are correct. Just expand all and search for exit or SANBOOT. Since this is your first run at uefi it may not be set correctly. But it would be good to see if you can narrow this down to a uefi issue or something else.
Regarding your lack of belief, before you said anything,
I’ve thought about this for a bit to see if I can understand how both of us can be right (because I know I’m right). I’m betting when you switched the device to uefi mode the firmware detected a bios boot disk and internally switched itself back to bios mode (I need to test this to prove it). This is similar to when you are in uefi mode and have the dell legacy roms enabled: when you hit F12 during power up you can actually dynamically switch between bios and uefi in the menu. So now, why is it not working with this 5500… My bet is this is the first hardware you have that is UEFI only. I know in our case the Dell 7400s were the first uefi-only systems we had (outside of the boat anchor Surface Books). So for the 5500s there is no bios fallback mode, so they won’t boot. Understand I only know what I experience; that doesn’t really make me right or wrong, it falls under the “It worked for me” category.
-
What’s the formatting for adding more than one arg?
Try the fourth option: a space between the variables, i.e. acpi=off reboot=cold.
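The reason a space is the right separator is that the Linux kernel command line is a whitespace-separated list of arguments, so a comma or semicolon just becomes part of one argument's value. The sketch below illustrates this with a hypothetical helper, karg_value, that looks up a key=value argument in a string; it defaults to /proc/cmdline, which could be used on a booted Linux system to confirm what the kernel actually received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a key=value kernel argument
# from a whitespace-separated argument string (default: /proc/cmdline).
# Returns non-zero if the key is not present as its own argument.
karg_value() {
    local key="$1"
    local cmdline="${2:-$(cat /proc/cmdline)}"
    local arg
    for arg in $cmdline; do
        case "$arg" in
            "$key"=*) printf '%s\n' "${arg#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Space-separated: both arguments are seen individually.
karg_value reboot "acpi=off reboot=cold"
# Comma-separated: everything after "acpi=" is treated as one value.
karg_value acpi "acpi=off,reboot=cold"
```

With the space-separated string the lookup for reboot finds cold; with the comma-separated string there is no separate reboot argument at all, and acpi ends up with the value off,reboot=cold.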
-
@george1421 I’d put money on you being right. If your theory is correct and it “internally switched itself back to bios mode”, there is no evidence of this in the BIOS; BIOS shows UEFI with legacy boot options disabled. I have an E5450 and a 5590 in front of me and both show UEFI, yet they were imaged with the method I described. Furthermore, what makes me think these are booting UEFI is that we don’t see the Windows logo on boot like you do with a BIOS boot; we get the Dell logo with the hula-hoop of circles, then the logon screen.
-
@george1421 @Sebastian-Roth thank you both so much. I am heading home for the night and will pick this up again tomorrow and report back. I have started toggling through the reboot= options, though I put them in the universal Kernel Args settings under FOG Configuration > FOG Settings > General Settings > KERNEL ARGS since the hang is happening to all of our active models. reboot=efi made the screen vomit and ended in “Fixing recursive fault but reboot is needed!” I’ll try some more options tomorrow to see what I get. I have provided a screenshot to better flesh out what I mean by vomit
-
@george1421 Let me start off by offering my apologies for sending you down a rabbit hole regarding BIOS/UEFI image capture. I looked through the procedure and found the images are being created in VMware as UEFI; then we sysprep, shut down, boot to BIOS, change to Legacy boot, then capture. From there, we deploy to the physical workstation with legacy enabled, then switch to UEFI on first boot. So you are absolutely correct; our images are being created as UEFI. I was going by memory from a year ago and should not have wasted your time.
@george1421 @Sebastian-Roth I have begun toggling through the reboot= options because I won’t sleep tonight unless I have tried each one. So far reboot=warm and reboot=hard leave the system hung at Restarting system, and reboot=efi gives an actual error as shown in the previous post.
-
@george1421 @Sebastian-Roth Well, I used all of the reboot= options and they all had the same results; HOWEVER, it led me to remove acpi=off from KERNEL ARGS and it fixed the problem with it hanging after the post install script! So that one is fixed.
I am still having a problem with the 5500 stopping at the blue “Just a moment…” screen with the hula-hoop of circles spinning around. So far none of my other machines are doing this. After about 15 minutes, the screen goes black, so I thought it rebooted, but it turns out the laptop display just went to sleep. Gesturing on the touch pad wakes it up to “Just a moment…”. Alt+Tab produces no results. If I hard power off from here, the workstation does boot into the OS, but SetupComplete.cmd never runs. I don’t have anything other than a network cable and a power adapter (non-USB-C) plugged in. Any thoughts?
-
@greichelt said in Workstation does not reboot after imaging:
I am still having a problem with the 5500 stopping at the blue “Just a moment…”
What drivers are you installing in your base (golden) image? I’m wondering if there is a Windows driver that is almost compatible but not really. If you are using a post install script and the pnputil program to inject the drivers into the image, that would come at almost the end of the OOBE process. Where it’s hanging, I believe, is at the very beginning of OOBE. But I don’t have any basis to say. With sata drives, when it would hang like this, I would pop the sata drive out and add it as a second hard drive in another computer. In that second computer I would look at the log files in the c:\windows\panther directory to see where it was hanging. Not sure how to do that with nvme drives.
-
@greichelt Great to hear you were able to figure out what was causing the reboot hang on the machine. Maybe you can put in the kernel parameter acpi=off only for the machines that really need it. Probably someone added it for a good reason. But doing this as a general option caused some/most to hang…
Would you mind opening a new topic for the hang on first boot issue? We try to keep things a bit organized so other people find answers more quickly and might be able to help themselves. I will mark this one as solved now. If you open a new one, I can move your last two posts over to the new one.
-
To elaborate, I recall acpi=off being useful for rather old machines (which often had buggy ACPI implementations), but having the exact opposite effect on newer machines that rely upon ACPI.
-
@Sebastian-Roth Doing it now. Thanks again.
-
@george1421 George, at @Sebastian-Roth 's request, I am opening a new topic for this. To answer your question, we install no drivers in our golden image, not even VMware tools. Whatever drivers are installed are included in the Windows 10 1803 base install.