Workstation does not reboot after imaging
-
@george1421 Regarding dnsmasq, you’ve talked me into it. My only question is on step 1 where you have us “remove pxe boot information from your router”. Under the Advanced tab, my SonicWALL has 3 fields (screenshot below). Which field should I remove?
Regarding the post install script, you kickstarted my memory. I built this setup a year ago, so I forgot. I thought the driver install portion was built into FOG, but I remember following one of your posts to get driver injection working (this one - https://forums.fogproject.org/topic/11126/using-fog-postinstall-scripts-for-windows-driver-injection-2017-ed). This has been working flawlessly for all of our 17 laptop and desktop models using legacy boot and undionly.kpxe. The issue with it hanging at “Restarting system” only started today when I updated the Boot File line on my SonicWALL DHCP server from undionly.kpxe to ipxe.efi. I bet if I switch it back I will not have a problem with it hanging. With that said, I will dig into the post install script and see what I can find.
Regarding your lack of belief, before you said anything, I had no idea it was a known issue switching from legacy to UEFI after my imaging process. This is the process we came up with a year ago when we set up FOG. It always worked, so I never questioned it. I’m absolutely dumbfounded that we have never had an issue with it until now. I’m certainly not dismissing your recommendation; in fact, after setting up dnsmasq, I am going to switch my VMware VM to UEFI, recapture, and test.
-
@greichelt Not sure if this has been tried yet, but there are a couple of different options for the kernel parameter reboot= that you can try: https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.txt#L4194
-
@Sebastian-Roth This one is a bit over my head. Where do I go to find and edit this file?
-
@greichelt It’s not a file. You can add kernel parameters to every host object in the web UI individually. Edit the host’s settings and set reboot=... in Host Kernel Arguments. Try things like reboot=warm or reboot=cold or reboot=pci or …
-
@Sebastian-Roth Got it. What’s the formatting for adding more than one arg?
acpi=off,reboot=cold
acpi=off, reboot=cold
acpi=off;reboot=cold
etc…
-
@greichelt said in Workstation does not reboot after imaging:
Regarding dnsmasq
First, let me say it’s not technically necessary to remove the boot information, because ProxyDHCP will override anything set in DHCP options 66 and 67. But for the sanity of whoever comes after you, it will drive them nuts if you have things set and they can’t figure out why it’s working the way it is. To your question: remove all three fields and save.
I bet if I switch it back I will not have a problem with it hanging. With that said, I will dig into the post install script and see what I can find.
One thing to point out is that UEFI and BIOS systems reboot differently in FOG. At least the exit to disk function does. For BIOS the default for FOG should be SANBOOT; the default for UEFI systems should be rEFInd. Make sure the global exit modes in FOG Configuration->FOG Settings are correct. Just expand all and search for exit or SANBOOT. Since this is your first run at UEFI it may not be set correctly. But it would be good to see if you can narrow this down to a UEFI issue or something else.
Regarding your lack of belief, before you said anything,
I’ve thought about this for a bit to see if I can understand how both of us can be right (because I know I’m right). I’m betting when you switched the device to uefi mode the firmware detected a bios boot disk and internally switched it self back to bios mode (I need to test this to prove it). This is similar to when you are in uefi mode and have the dell legacy roms enabled: when you hit F12 during power up you can actually dynamically switch between bios and uefi in the menu. So now why is it not working with this 5500? My bet is this is the first hardware you have that is UEFI only. I know in our case the Dell 7400s were the first uefi only systems we had (outside of the boat anchor Surface Books). So for the 5500s there is no bios fall back mode, so they won’t boot. Understand I only know what I experience; that doesn’t really make me right or wrong, it falls under the “It worked for me” category.
-
What’s the formatting for adding more than one arg?
Try the fourth option: a space between the variables.
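To illustrate the space-separated form: the Linux kernel command line is a list of arguments separated by whitespace, so the two arguments from this thread would be entered as `acpi=off reboot=cold`. Below is a minimal shell sketch of that idea. The helper name `has_kernel_arg` is made up for illustration and is not part of FOG; it only checks whether an argument appears as its own space-separated word.

```shell
#!/bin/sh
# has_kernel_arg is a hypothetical helper, not part of FOG.
# Usage: has_kernel_arg "<command line>" "<argument>"
# Succeeds only when <argument> appears as a whole space-separated word.
has_kernel_arg() {
    cmdline=$1
    wanted=$2
    # Pad both ends with spaces so the first and last words also match.
    case " $cmdline " in
        *" $wanted "*) return 0 ;;
        *) return 1 ;;
    esac
}

# The two arguments from this thread, separated by a single space.
args="acpi=off reboot=cold"
if has_kernel_arg "$args" "reboot=cold"; then
    echo "reboot=cold is present"
fi
```

On a running Linux system, `cat /proc/cmdline` shows the command line the kernel actually received, which should be a way to confirm that the arguments arrived as separate words.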
-
@george1421 I’d put money on you being right. If your theory is correct and it “internally switched it self back to bios mode”, there is no evidence of this in the BIOS; BIOS shows UEFI with legacy boot options disabled. I have an E5450 and a 5590 in front of me and both show UEFI, yet they were imaged with the method I described. Furthermore, what makes me think these are booting UEFI is that we don’t see the Windows logo on boot like you do with a BIOS boot; we get the Dell logo with the hula-hoop of circles, then the logon screen.
-
@george1421 @Sebastian-Roth Thank you both so much. I am heading home for the night and will pick this up again tomorrow and report back. I have started toggling through the reboot= options, though I put them in the universal Kernel Args settings under FOG Configuration > FOG Settings > General Settings > KERNEL ARGS since the hang is happening to all of our active models. reboot=efi made the screen vomit and ended in “Fixing recursive fault but reboot is needed!” I’ll try some more options tomorrow to see what I get. I have provided a screenshot to better flesh out what I mean by vomit.
-
@george1421 Let me start off by offering my apologies for sending you down a rabbit hole regarding BIOS/UEFI image capture. I looked through the procedure and found the images are being created in VMware as UEFI, then we sysprep, shut down, boot to BIOS, change to Legacy boot, then capture. From there, we deploy to the physical workstation with legacy enabled, then switch to UEFI on first boot. So you are absolutely correct; our images are being created as UEFI. I was going by memory from a year ago and should not have wasted your time.
@george1421 @Sebastian-Roth I have begun toggling through the reboot= options because I won’t sleep tonight unless I have tried each one. So far reboot=warm and reboot=hard leave the system hung at “Restarting system”, and reboot=efi gives an actual error, as shown in the previous post.
-
@george1421 @Sebastian-Roth Well, I used all of the reboot= options and they all had the same results; HOWEVER, it led me to remove acpi=off from KERNEL ARGS and it fixed the problem with it hanging after the post install script! So that one is fixed.
I am still having a problem with the 5500 stopping at the blue “Just a moment…” screen with the hula-hoop of circles spinning around. So far none of my other machines are doing this. After about 15 minutes, the screen goes black, so I thought it rebooted, but it turns out the laptop display just went to sleep. Gesturing on the touch pad wakes it up to “Just a moment…”. Alt+Tab produces no results. If I hard power off from here, the workstation does boot into the OS, but SetupComplete.cmd never runs. I don’t have anything other than a network cable and a power adapter plugged in (non-USB-C). Any thoughts?
-
@greichelt said in Workstation does not reboot after imaging:
I am still having a problem with the 5500 stopping at the blue “Just a moment…”
What drivers are you installing in your base (golden) image? I’m wondering if there is a Windows driver that is almost compatible but not really. If you are using a post install script and the pnputil program to inject the drivers into the image, that would come at almost the end of the OOBE process. Where it’s hanging, I believe, is at the very beginning of OOBE, but I don’t have any basis to say. With sata drives, when it would hang like this, I would pop the sata drive out and add it in as a second hard drive in another computer. In that second computer I would look at the log files in the c:\windows\panther directory to see where it was hanging. Not sure how to do that with nvme drives.
-
@greichelt Great to hear you were able to figure out what was causing the reboot hang on the machine. Maybe you can put in the kernel parameter acpi=off only for the machines that really need it. Probably someone added it for a good reason. But doing this as a general option caused some/most to hang…
Would you mind opening a new topic for the hang on first boot issue? We try to keep things a bit organized so other people find answers more quickly and might be able to help themselves. I will mark this one as solved now. If you open a new one, I can move your last two posts over to the new one.
-
To elaborate, I recall acpi=off being useful for rather old machines (which often had buggy ACPI implementations), but the exact opposite for newer machines that rely upon ACPI.
-
@Sebastian-Roth Doing it now. Thanks again.
-
@george1421 George, at @Sebastian-Roth's request, I am opening a new topic for this. To answer your question, we install no drivers in our golden image, not even VMware tools. Whatever drivers are installed are included in the Windows 10 1803 base install.
-
@george1421 I set up dnsmasq as you suggested, since I was not able to get my VMware VMs to EFI PXE boot; however, I am getting the error shown in the screenshot below when I legacy boot. EFI no longer functions on my working Dells. I am using Ubuntu 16.04. My FOG server is 172.31.0.2 (I updated all 5 entries in the script in the link you provided). I’m not sure where to go from here.
-
@greichelt This one is an interesting one because it looks like all of the bits are in the right spot. But something is off in the config file. No worries we can get this sorted pretty easily. I know the config file if used exactly from the tutorial works perfectly. There are some boot roms that need an additional tweak but I haven’t come across those systems in a few years.
There are two things I need.
- Post the entire ltsp.conf file here
- If the ltsp.conf file doesn’t give us the clues, I have another tutorial that shows how to capture the pxe boot process: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue I can either tell you how to decode this file with wireshark or you can upload it to a file share site and post the link here. Once I review the file you can take it down from the file share site.
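As a rough sketch of what such a capture involves: PXE booting uses DHCP (UDP ports 67 and 68), TFTP (port 69), and ProxyDHCP (port 4011), so a tcpdump filter on those ports should collect the relevant packets. The helper name `pxe_capture_filter` and the file name `output.pcap` below are assumptions for illustration; the linked tutorial is the authority on the exact procedure.

```shell
#!/bin/sh
# pxe_capture_filter is a hypothetical helper that prints a tcpdump filter
# for the traffic involved in PXE booting:
#   67/68 = DHCP, 69 = TFTP, 4011 = ProxyDHCP
pxe_capture_filter() {
    echo "port 67 or port 68 or port 69 or port 4011"
}

# Print the command instead of running it, because a capture needs root
# and keeps running until interrupted. output.pcap is an assumed file name.
echo "sudo tcpdump -w output.pcap $(pxe_capture_filter)"
```

The resulting file can then be opened in wireshark, as mentioned above.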
-
@george1421 Thank you so much for the quick reply. I figured this part out. I ran sudo apt-get install dnsmasq and didn’t really pay attention to what was going on. It told me the latest version for LTS 16.04 was already installed, so I was still on 2.75. After a bunch of googlefoo, I checked the version with dnsmasq -v and saw my error. I followed this - https://wiki.fogproject.org/wiki/index.php?title=ProxyDHCP_with_dnsmasq and installed 2.76. Now, it’s working! Unfortunately, this broke DNS lookups on my FOG server and I’m not sure how to fix it. I can ping 8.8.8.8 and get a response, but can’t nslookup.
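For anyone wanting to script the version check described here, a minimal sketch follows. It assumes that 2.76 is the minimum needed (as this post suggests), that the first line of `dnsmasq -v` looks like `Dnsmasq version 2.76 ...`, and that `sort -V` is available (it is in GNU coreutils). The helper name `version_at_least` is made up for illustration.

```shell
#!/bin/sh
# version_at_least is a hypothetical helper: succeeds when $1 >= $2.
version_at_least() {
    # sort -V orders version strings; if the required version sorts first
    # (or both are equal), the installed version is new enough.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="2.76"   # assumed minimum, taken from this post
if command -v dnsmasq >/dev/null 2>&1; then
    # Take the third word of the first line of `dnsmasq -v`.
    installed=$(dnsmasq -v | awk 'NR == 1 { print $3 }')
    if version_at_least "$installed" "$required"; then
        echo "dnsmasq $installed is new enough"
    else
        echo "dnsmasq $installed is older than $required"
    fi
else
    echo "dnsmasq is not installed"
fi
```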
-
@greichelt On your fog server itself, there is a config file, /etc/resolv.conf. That file should list the name servers (DNS servers) the fog server uses to do name lookups. A quick google-fu will show you the parameters needed for that file. Typically on ubuntu that file is managed by the network manager application. Just be aware of that, because it may overwrite any settings you add. The network manager application is typically an application on the tool tray that deals with network configuration (sorry, I’m a rhel guy, so I can’t give exact instructions for ubuntu).
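As a minimal sketch of what to look for: resolvers appear in that file as lines of the form `nameserver <address>`. The snippet below reads a sample file rather than the real /etc/resolv.conf; the helper name `list_nameservers` is made up, and 8.8.8.8 is used only because it is the address pinged above, not as a recommendation.

```shell
#!/bin/sh
# list_nameservers is a hypothetical helper that prints the address from
# each "nameserver" line of a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Build a sample file so the real /etc/resolv.conf is left untouched.
sample=$(mktemp)
printf 'nameserver 8.8.8.8\n' > "$sample"
list_nameservers "$sample"
rm -f "$sample"
```

Running `list_nameservers /etc/resolv.conf` on the fog server should show which servers it is currently configured to ask. If the list is empty, that would be consistent with ping by IP address working while nslookup fails.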