Workstation does not reboot after imaging
I have been happily using FOG for about a year now with the legacy boot options. We just got some new Dell notebooks in that are UEFI only (Latitude 5500). All I had to do was change the boot file section on my SonicWALL to read “ipxe.efi” and my Dell laptops now boot to UEFI PXE without issue. I am having 2 problems. The first problem may be due to my unattend.xml or a Windows update issue in the image as I am getting the never-ending spinning dots at the “Just a Moment” blue screen. The second problem is all 3 notebook models I have tested (Latitude 5500, Latitude E5450, and Latitude 7212 Rugged) halt after the Partclone imaging process with the following message displayed on the screen:
In Progress * Rebooting system as task is complete
reboot: Restarting system
At first I thought the “Just a Moment” hang I was having on the new 5500 was being caused by this refusal to reboot, as if some of the imaging process was not quite done; however, when I imaged the 7212 I had no issues other than it hanging with the same message after the Partclone imaging process. Maybe I’m just being impatient; I let the workstation sit at the Restarting system message for about 15 minutes before forcing a reboot. (FYI, after forcing the reboot, the system does boot; however, it looks like SetupComplete.cmd didn’t run, i.e. Device Manager is a mess as if the drivers did not get installed, some applications I install with SetupComplete.cmd are not installed, etc.)
I am currently treating the “Just a Moment” issue as a problem with my unattend file or a Windows update, as the 7212 system I was able to successfully image did not have this problem (it uses a more up-to-date golden image and unattend file). I have only mentioned it to be thorough. Here are the server versions and some settings that may be pertinent. If you need anything else, please let me know.
Kernel 4.19.48 (also tried 4.19.64 and had the same issue)
Boot File is ipxe.efi
@greichelt On your fog server itself, there is a config file, /etc/resolv.conf. That file should list the name servers (DNS servers) the fog server uses for name lookups. A quick google-fu will show you the parameters needed for that file. Typically on ubuntu that file is managed by the network manager application. Just be aware of that because it may overwrite any settings you add. The network manager is typically an application on the tool tray that deals with network configuration (sorry I’m a rhel guy, so I can’t give exact instructions for ubuntu).
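For anyone checking this later, here is a minimal sketch of pulling the `nameserver` entries out of resolv.conf-style text, to see what the server would actually use for lookups. The function name `parse_nameservers` and the sample addresses (from the 192.0.2.x documentation range) are made up for illustration; they are not from this thread.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses from 'nameserver' lines in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (which start with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical file contents, for illustration only.
sample = """\
# example only
search example.lan
nameserver 192.0.2.53
nameserver 192.0.2.54
"""

print(parse_nameservers(sample))
```

On the real server you would pass in the contents of /etc/resolv.conf instead of `sample`. An empty result would be consistent with the symptom of being able to ping by IP address while name lookups fail.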
@george1421 Thank you so much for the quick reply. I figured this part out. I ran sudo apt-get install dnsmasq and didn’t really pay attention to what was going on. It told me the latest version for LTS 16.04 was already installed, so I was still on 2.75. After a bunch of google-fu, I checked the version with dnsmasq -v and saw my error. I followed this - https://wiki.fogproject.org/wiki/index.php?title=ProxyDHCP_with_dnsmasq and installed 2.76. Now, it’s working! Unfortunately, this broke DNS lookups on my FOG server and I’m not sure how to fix it. I can ping 220.127.116.11 and get a response, but can’t nslookup.
@greichelt This one is an interesting one because it looks like all of the bits are in the right spot. But something is off in the config file. No worries we can get this sorted pretty easily. I know the config file if used exactly from the tutorial works perfectly. There are some boot roms that need an additional tweak but I haven’t come across those systems in a few years.
There are two things I need.
- Post the entire ltsp.conf file here
- If the ltsp.conf file doesn’t give us the clues, I have another tutorial we can use to capture the pxe boot process: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue . I can either tell you how to decode this file with wireshark or you can upload it to a file share site and post the link here. Once I review the file you can take it down from the file share site.
@george1421 I set up dnsmasq as you suggested because I was not able to get my VMware VMs to EFI PXE boot; however, I am now getting the error shown in the screenshot below when I legacy boot. EFI no longer functions on my working Dells. I am using Ubuntu 16.04. My FOG server is 172.31.0.2 (I updated all 5 entries in the script in the link you provided). I’m not sure where to go from here.
@george1421 George, at @Sebastian-Roth 's request, I am opening a new topic for this. To answer your question, we install no drivers in our golden image, not even VMware tools. Whatever drivers are installed are included in the Windows 10 1803 base install.
@Sebastian-Roth Doing it now. Thanks again.
To elaborate, I recall acpi=off being useful for rather old machines (who’d often have buggy implementations), but the exact opposite for newer machines who rely upon ACPI.
@greichelt Great to hear you were able to figure out what was causing the reboot hang on the machine. Maybe you can put in the kernel parameter acpi=off only for the machines that really need it. Probably someone added it for a good reason. But doing this as a general option caused some/most to hang…
Would you mind opening a new topic for the hang on first boot issue? We try to keep things a bit organized so other people find answers more quickly and might be able to help themselves. I will mark this one as solved now. If you open a new one, I can move your last two posts over to the new one.
What drivers are you installing in your base (golden) image? I’m wondering if there is a Windows driver that is almost compatible but not really. If you are using a post install script and the pnputil program to inject the drivers into the image, that would come at almost the end of the OOBE process. Where it’s hanging, I believe, is at the very beginning of OOBE, but I don’t have any basis to say. With sata drives, when it would hang like this, I would pop the sata drive out and add it as a second hard drive in another computer. On that second computer I would look at the log files in the c:\windows\panther directory to see where it was hanging. Not sure how to do that with nvme drives.
@george1421 @Sebastian-Roth Well, I tried all of the reboot= options and they all had the same results; HOWEVER, it led me to remove acpi=off from KERNEL ARGS and that fixed the problem with it hanging after the post install script! So that one is fixed.
I am still having a problem with the 5500 stopping at the blue “Just a moment…” screen with the hula-hoop of circles spinning around. So far none of my other machines are doing this. After about 15 minutes, the screen goes black, so I thought it rebooted, but it turns out the laptop display just went to sleep. Gesturing on the touch pad wakes it up to “Just a moment…”. Alt+Tab produces no results. If I hard power off from here, the workstation does boot into the OS, but SetupComplete.cmd never runs. I don’t have anything other than a network cable and a power adapter plugged in (non-USB-C) Any thoughts?
@george1421 Let me start off by offering my apologies for sending you down a rabbit hole regarding BIOS/UEFI image capture. I looked through the procedure and found the images are being created in VMWare as UEFI, then the sysprep, shutdown, boot to BIOS, change to Legacy boot then capture. From there, we deploy to the physical workstation with legacy enabled, then switch to UEFI on first boot. So you are absolutely correct; our images are being created as UEFI. I was going by memory from a year ago and should not have wasted your time.
@george1421 @Sebastian-Roth I have begun toggling through the reboot= options because I won’t sleep tonight unless I have tried each one. So far reboot=warm and reboot=hard leave the system hung at restarting system, and reboot=efi gives an actual error as shown in the previous post.
@george1421 @Sebastian-Roth thank you both so much. I am heading home for the night and will pick this up again tomorrow and report back. I have started toggling through the reboot= options, though I put them in the universal Kernel Args settings under FOG Configuration > FOG Settings > General Settings > KERNEL ARGS since the hang is happening to all of our active models. reboot=efi made the screen vomit and ended in “Fixing recursive fault but reboot is needed!” I’ll try some more options tomorrow to see what I get. I have provided a screenshot to better flesh out what I mean by vomit
@george1421 I’d put money on you being right. If your theory is correct and it “internally switched it self back to bios mode”, there is no evidence of this in the BIOS; BIOS shows UEFI with legacy boot options disabled. I have an E5450 and a 5590 in front of me and both show UEFI yet they were imaged with the method I described. Furthermore, what lends credence to me thinking these are booting UEFI, we don’t see the Windows logo on boot like you do with a BIOS boot, we get the Dell logo with the hula-hoop of circles then the logon screen.
What’s the formatting for adding more than one arg?
Try the fourth option of a space between the variables.
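As a minimal sketch of that space-separated format: the helper below joins individual arguments into the single string you would paste into a kernel arguments field. The name `join_kernel_args` is hypothetical, and the two values are only borrowed from this thread for illustration, not as a recommended combination.

```python
def join_kernel_args(args):
    """Join individual kernel arguments into one space-separated string."""
    # Strip stray whitespace and drop empty entries so that exactly one
    # space ends up between consecutive arguments.
    cleaned = [a.strip() for a in args if a.strip()]
    return " ".join(cleaned)


# Example values taken from this thread, for illustration only.
print(join_kernel_args(["acpi=off", "reboot=hard"]))
```

No commas or semicolons are used between arguments, which matches how the Linux kernel command line itself separates parameters.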
First let me say it’s not technically necessary to remove the boot information, because ProxyDHCP will override anything set in dhcp options 66 and 67. But for the sanity of whoever comes after you: it will drive them nuts if you have things set there and they can’t figure out why it’s working the way it is. To answer your question, remove all three fields and save it.
I bet if I switch it back I will not have a problem with it hanging. With that said, I will dig into the post install script and see what I can find.
One thing to point out is that uefi and bios systems reboot differently in FOG. At least the exit to disk function does. For BIOS the default for fog should be SANBOOT; for uefi systems the default should be rEFInd. Make sure the global exit modes in FOG Configuration->FOG Settings are correct. Just expand all and search for exit or SANBOOT. Since this is your first run at uefi it may not be set correctly. But it would be good to see if you can narrow this down to a uefi issue or something else.
Regarding your lack of belief, before you said anything,
I’ve thought about this for a bit to see if I can understand how both of us can be right (because I know I’m right). I’m betting when you switched the device to uefi mode the firmware detected a bios boot disk and internally switched itself back to bios mode (I need to test this to prove it). This is similar to when you are in uefi mode and have the dell legacy roms enabled: when you hit F12 during power up you can actually dynamically switch between bios and uefi in the menu. So now why is it not working with this 5500… My bet is this is the first hardware you have that is UEFI only. I know in our case the Dell 7400s were the first uefi only systems we had (outside of the boat anchor Surface Books). So for the 5500s there is no bios fallback mode, so they won’t boot. Understand I only know what I experience; that doesn’t really make me right or wrong, it falls under the “It worked for me” category.
@Sebastian-Roth Got it. What’s the formatting for adding more than one arg?
@greichelt It’s not a file. You can add kernel parameters to every host object in the web UI individually. Edit the host’s settings and set reboot=... in Host Kernel Arguments. Try things like
@Sebastian-Roth This one is a bit over my head. Where do I go to find and edit this file?
@greichelt Not sure if this has been tried yet but there are a couple of different options for the kernel parameter reboot= that you can try: https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt#L4194
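To keep track of which reboot= values are still left to test, here is a rough sketch. The `TRIED` list holds the values reported elsewhere in this thread. The `OTHER_X86_VALUES` list is recalled from the x86 section of the kernel documentation linked above and is an assumption you should verify against that file. The function name `candidate_args` is hypothetical.

```python
# reboot= values already reported as tested in this thread.
TRIED = ["warm", "hard", "efi"]

# Other x86 values as recalled from the kernel documentation;
# treat this list as an assumption and check it against the linked file.
OTHER_X86_VALUES = ["cold", "bios", "acpi", "kbd", "triple", "pci"]


def candidate_args(tried, others):
    """Return 'reboot=<value>' strings for values that have not been tried yet."""
    untested = [value for value in others if value not in tried]
    return ["reboot=" + value for value in untested]


# Print one candidate per line; test them one at a time on a single host.
for arg in candidate_args(TRIED, OTHER_X86_VALUES):
    print(arg)
```

Setting one value at a time in a single host's Host Kernel Arguments, rather than globally, keeps a bad choice from affecting every machine.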