WOL Firmware Linux Kernel Breaks - Power pulled or boot to windows fixes
-
@Gael said in Incomprehensible network adapter issue that goes out since the update to Fog 1.5.7:
(same image that is already running on the computer, also with faststartup disabled)
Can you please do me a favor?
- Start up your 1.5.2 server and save a copy of that image to an external drive
- Shut down the 1.5.2 server
- Startup the 1.5.7 server rename the image and move that from the external drive in place
- Deploy using that image
- Test WOL
-
@Sebastian-Roth I think I can say that I already did it because, at first, I thought that my updated master image was the problem. I recovered a copy of my Fog server before it was updated (and before my image was updated) from backup, then i deployed the image to a VM, then I captured it (without powering-on the image) with my Fog 1.5.7 server.
-
@Gael Is it possible to power on both fog servers at the same time? Maybe move the older fog server’s IP address? With both on at the same time, copy the image directly from the old fog server to the new fog server over nfs?
Like you I’m confused why the fog server would have any impact on windows unless the wol code on the new fog server is faulty for some reason. I still want to blame windows for some reason. When fog clones an image it doesn’t know anything (really) about the target OS. It only moves disk blocks from the fog server to the target computer. It doesn’t touch the target computer’s registry or drivers. After a power outage (really a reboot) there should be no trace that FOG was ever near the target computer. Also with both fog servers on you would have the chance to issue a wol from each server to compare the differences.
-
@george1421 It’s not a WOL packet problem because the network card is shutdown (leds powered off)
Can the fog client have any impact?
-
@Gael said in Incomprehensible network adapter issue that goes out since the update to Fog 1.5.7:
because the network card is shutdown
Two things come to mind.
- If the 3050 has vpro it should not power off the network card. vPro is for out of band management.
- Windows has the driver check box ticked to power off the network adapter to save energy (or battery life, I can’t remember the exact term at the moment).
I still have to think there is a difference in the image with the one working vs the one not. Are you sure no windows updates have been applied on the one not working?
There is a (very) slight chance the fog client could have an impact if the version(s) are different. But I’m only saying it’s a very slight chance. I don’t think the fog client code has been updated since 1.5.2, but there are almost 1.5 years between the releases.
-
@Gael said in Incomprehensible network adapter issue that goes out since the update to Fog 1.5.7:
then i deployed the image to a VM, then I captured it (without powering-on the image) with my Fog 1.5.7 server
Not exactly what I asked you to do!
-
I will scp image from old fog to new and report here the result.
-
@Sebastian-Roth Done. Network card was shutdown too
-
@Gael This is very troubling, since by process of elimination the issue appears to be caused by the FOG server, which I’m still not convinced is possible. The only possibility I can see is that the updated FOS Linux kernel in 1.5.7 is leaving the hardware in a strange state, but a power removal should reset anything that FOS Linux would leave behind. I guess you could test by grabbing bzImage and init.xz from the 1.5.2 fog server and temporarily installing them on the 1.5.7 fog server. Those are the only things from the fog site that actually touch the target hardware.
I have a 3050 around here somewhere, I can try deploying a copy of 1903 or 1809 to that system to see if we have similar results. But if vPro is on and fast windows startup is disabled it should leave the network adapter powered on. I’m pretty sure that the 3050s we have here do have vPro installed.
-
@george1421 Sorry typo, Optiplex 3060 not 3050
-
@Gael Hey one of the senior FOG developers just chatted with me on this issue. So we have a few more things to try here.
First of all on the 1.5.2 version of the fog server do the following.
cd /var/www/html/fog/service/ipxe
file bzImage
This will tell us the linux kernel version that was shipped with FOG 1.5.2
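If only the version number is wanted, the `file` output can be trimmed with a small filter. This is just a sketch: the function name `kernel_version` is made up for illustration, and it assumes the output contains a ", version X.Y.Z" field, as `file` normally prints for a bzImage.

```shell
# Sketch: print only the kernel version from `file` output read on stdin.
# kernel_version is a hypothetical helper name, not part of FOG.
kernel_version() {
    sed -n 's/.*Linux kernel.*, version \([0-9][0-9.]*\).*/\1/p'
}

# Default FOG kernel directory, as used in the command above.
KERNEL_DIR="${KERNEL_DIR:-/var/www/html/fog/service/ipxe}"

# Report the 64-bit and 32-bit kernels if they are present.
for k in bzImage bzImage32; do
    if [ -f "$KERNEL_DIR/$k" ]; then
        printf '%s: %s\n' "$k" "$(file "$KERNEL_DIR/$k" | kernel_version)"
    fi
done
```

Running it on both servers gives two short lines per server that are easy to compare side by side.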
Now scp that file to the new fog server into the correct path. Power off (unplug) the target computer then pxe boot and image with the image you copied from the 1.5.2 version of the fog server.
See if the system responds differently?
The logic here is depending on the nic used, the linux kernel may load (patch) firmware onto the nic controller that would be persistent between system reboots, but would be erased after a power outage. With the different versions of the linux kernel its possible different firmware could be loaded causing the discrepancy.
Once all of the other things have been tested out (so we are not changing a bunch of things all at once), make sure the firmware (bios) is updated on the 3060 just in case that is causing the root of the issue.
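The kernel swap described above could be scripted roughly like this, so the current kernel is kept and the test can be undone afterwards. It is only a sketch: `swap_fos_kernel`, the `bzImage.orig` backup name, the host name `old-fog` and the `/tmp` path are all placeholders made up for illustration.

```shell
# Sketch: back up the current FOS kernel and put another one in its place.
# Arguments: <ipxe directory> <path to replacement kernel>
# swap_fos_kernel and bzImage.orig are hypothetical names.
swap_fos_kernel() {
    ipxe_dir=$1
    new_kernel=$2
    # Refuse to continue if the replacement kernel is not there.
    [ -f "$new_kernel" ] || { echo "missing: $new_kernel" >&2; return 1; }
    # Keep the first original only, so a second run does not overwrite it.
    if [ -f "$ipxe_dir/bzImage" ] && [ ! -e "$ipxe_dir/bzImage.orig" ]; then
        cp -p "$ipxe_dir/bzImage" "$ipxe_dir/bzImage.orig" || return 1
    fi
    cp "$new_kernel" "$ipxe_dir/bzImage"
}

# Hypothetical usage on the 1.5.7 server ("old-fog" is a placeholder host):
#   scp root@old-fog:/var/www/html/fog/service/ipxe/bzImage /tmp/bzImage-1.5.2
#   swap_fos_kernel /var/www/html/fog/service/ipxe /tmp/bzImage-1.5.2
```

To revert after the test, copy `bzImage.orig` back over `bzImage`.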
-
Ok, I will do that.
But for information, here is another test I have just done:
I only asked fog 1.5.7 to capture a working Dell Optiplex 3060 (without sysprep or anything else). After the capture was done I shut down the computer, and the network card was shut down!
-
1.5.2:
file bzImage
bzImage: Linux kernel x86 boot executable bzImage, version 4.19.1 (jenkins-agent@Tollana) #1 SMP Mon Nov 12 18:23:08 CST 2018, RO-rootFS, swap_dev 0x8, Normal VGA
file bzImage32
bzImage32: Linux kernel x86 boot executable bzImage, version 4.15.2 (builder@c38bc0acaeb4) #5 SMP Tue Feb 13 18:32:54 UTC 2018, RO-rootFS, swap_dev 0x7, Normal VGA
1.5.7:
file bzImage
bzImage: Linux kernel x86 boot executable bzImage, version 5.1.16 (sebastian@Tollana) #2 SMP Wed Aug 28 17:12:41 CDT 2019, RO-rootFS, swap_dev 0x8, Normal VGA
file bzImage32
bzImage32: Linux kernel x86 boot executable bzImage, version 4.19.64 (jenkins-agent@Tollana) #1 SMP Mon Aug 5 09:59:10 CDT 2019, RO-rootFS, swap_dev 0x7, Normal VGA
-
@george1421 That works!
-
@Gael Ok, now we are sure it’s a FOS kernel issue causing this, probably some rare issue with firmware on your NIC model. Can you tell us exactly which NIC you have (Windows Device Manager hardware IDs)?
-
@Gael See, we also learned something that we didn’t know before: not only did you install 1.5.7, but you also updated the FOS Linux kernel to version 5.x.x, whereas 1.5.7 should ship with linux-4.19.64 or thereabouts. The 5.x.x kernels really haven’t been tested well enough and imo should not be used until we’ve had more field experience with them.
So what happens if you roll the fos linux kernel back to 4.19.64? Does the behavior correct itself? Rolling back to the fos linux 4.19.1 seems to have done the trick.
From windows can you get us the vendor ID and hardware ID from the device manager for this NIC as Sebastian suggested?
-
@george1421 Realtek PCIe GBE Family Controller
PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15\4&1285CEFC&0&00E0
PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15
PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028
PCI\VEN_10EC&DEV_8168&CC_020000
PCI\VEN_10EC&DEV_8168&CC_0200
PCI\VEN_10EC&DEV_8168&REV_15
PCI\VEN_10EC&DEV_8168
PCI\VEN_10EC&CC_020000
PCI\VEN_10EC&CC_0200
PCI\VEN_10EC
PCI\CC_020000
PCI\CC_0200
-
@george1421 Very sorry, I forgot to mention the kernel update.
We updated to 5.1.16 because we will shortly receive new Dell Optiplex 3070s that have NVMe SSDs.
-
@Gael So at this point you can make a decision.
You have kernels 5.x and 4.x. Pick one to keep as bzImage (assuming 5.x). Rename the other as bzImage4.19.64. Then for all of the 3060s you have on your campus, add bzImage4.19.64 into the kernel field on the host definition for each system. That way when you deploy to the 3060s it will use the older kernel. If you have a lot of machines you can use the fog group setting to change all of the hosts in the group at once.
For reference the device [10EC:8168] is a Realtek 8169/8168/8101/8125 nic.
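To find which machines carry this NIC from the Linux side, the Windows hardware ID can be reduced to the vendor:device pair shown in the brackets above. A minimal sketch follows; the function name `pci_id_from_hwid` is made up for illustration.

```shell
# Sketch: turn a Windows hardware ID into the lowercase vendor:device pair.
# pci_id_from_hwid is a hypothetical helper name.
pci_id_from_hwid() {
    printf '%s\n' "$1" |
        sed -n 's/.*VEN_\([0-9A-Fa-f]\{4\}\)&DEV_\([0-9A-Fa-f]\{4\}\).*/\1:\2/p' |
        tr 'A-F' 'a-f'
}

# One of the hardware IDs posted earlier in this thread.
pci_id_from_hwid 'PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15'

# On a Linux host with pciutils installed, the result can be passed to lspci:
#   lspci -nn -d 10ec:8168
```

IDs that carry only a class code (such as `PCI\CC_0200`) produce no output, since they contain no vendor or device field.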