WOL Firmware Linux Kernel Breaks - Power pulled or boot to windows fixes
-
@Sebastian-Roth Done. The network card was shut down too.
-
@Gael This is very troubling, since the issue (by process of elimination) appears to be a problem created by the FOG server, which I’m still not convinced is possible. The only possibility I can see is that the updated FOS Linux kernel in 1.5.7 is leaving the hardware in a strange state, but removing power should reset anything that FOS Linux would leave behind. I guess you could test by grabbing bzImage and init.xz from the 1.5.2 FOG server and temporarily installing them on the 1.5.7 FOG server. Those are the only things from the FOG server that actually touch the target hardware.
I have a 3050 around here somewhere, so I can try deploying a copy of 1903 or 1809 to that system to see if we get similar results. But if vPro is on and Windows fast startup is disabled, it should leave the network adapter powered on. I’m pretty sure that the 3050s we have here do have vPro installed.
-
@george1421 Sorry typo, Optiplex 3060 not 3050
-
@Gael Hey one of the senior FOG developers just chatted with me on this issue. So we have a few more things to try here.
First of all on the 1.5.2 version of the fog server do the following.
cd /var/www/html/fog/service/ipxe
file bzImage
This will tell us the linux kernel version that was shipped with FOG 1.5.2
Now scp that file to the new fog server into the correct path. Power off (unplug) the target computer, then PXE boot and image it using the bzImage you copied from the 1.5.2 version of the fog server.
See whether the system responds differently.
The logic here is that, depending on the NIC used, the Linux kernel may load (patch) firmware onto the NIC controller that persists between system reboots but is erased after a power outage. With the different versions of the Linux kernel, it’s possible that different firmware is being loaded, causing the discrepancy.
Once all of the other things have been tested out (so we are not changing a bunch of things all at once), make sure the firmware (bios) is updated on the 3060 just in case that is causing the root of the issue.
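A minimal sketch of the check-and-swap steps above, assuming a POSIX shell on the FOG server. The function names, the bzImage.bak backup name, and the /tmp/bzImage-1.5.2 path are made up for illustration, and the backup step is an addition so the swap is easy to undo. Only the /var/www/html/fog/service/ipxe path and the file command come from this thread.

```shell
#!/bin/sh
# Sketch only: function names and the .bak backup name are assumptions.

# Print the kernel version found in one line of `file bzImage` output.
kernel_version() {
    printf '%s\n' "$1" | sed -n 's/.*version \([^ ]*\).*/\1/p'
}

# Copy a test kernel into the iPXE directory, keeping a backup of the
# kernel that is already there.
# $1 = kernel to install, $2 = directory holding the live bzImage
install_test_kernel() {
    [ -f "$1" ] || { echo "no such kernel: $1" >&2; return 1; }
    if [ -f "$2/bzImage" ]; then
        cp -p "$2/bzImage" "$2/bzImage.bak" || return 1
    fi
    cp "$1" "$2/bzImage"
}

# Example, on the 1.5.7 server after copying the old kernel over
# (/tmp/bzImage-1.5.2 is a hypothetical location):
#   kernel_version "$(file /var/www/html/fog/service/ipxe/bzImage)"
#   install_test_kernel /tmp/bzImage-1.5.2 /var/www/html/fog/service/ipxe
```

Restoring the original kernel afterwards is then just a matter of copying bzImage.bak back over bzImage.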
-
Ok, I will do that.
But for information, here is another test I have just done:
I only asked FOG 1.5.7 to capture a working Dell Optiplex 3060 (without sysprep or anything else). After the capture was done, I shut down the computer and the network card was shut down!
-
1.5.2 :
file bzImage
bzImage: Linux kernel x86 boot executable bzImage, version 4.19.1 (jenkins-agent@Tollana) #1 SMP Mon Nov 12 18:23:08 CST 2018, RO-rootFS, swap_dev 0x8, Normal VGA
file bzImage32
bzImage32: Linux kernel x86 boot executable bzImage, version 4.15.2 (builder@c38bc0acaeb4) #5 SMP Tue Feb 13 18:32:54 UTC 2018, RO-rootFS, swap_dev 0x7, Normal VGA
1.5.7 :
file bzImage
bzImage: Linux kernel x86 boot executable bzImage, version 5.1.16 (sebastian@Tollana) #2 SMP Wed Aug 28 17:12:41 CDT 2019, RO-rootFS, swap_dev 0x8, Normal VGA
file bzImage32
bzImage32: Linux kernel x86 boot executable bzImage, version 4.19.64 (jenkins-agent@Tollana) #1 SMP Mon Aug 5 09:59:10 CDT 2019, RO-rootFS, swap_dev 0x7, Normal VGA
-
@george1421 That works!
-
@Gael Ok, now we are sure it’s a FOS kernel issue causing this, probably some rare issue with the firmware on your NIC model. Can you tell us exactly which NIC you have (hardware IDs from the Windows device manager)?
-
@Gael See, we also learned something that we didn’t know before: not only did you install 1.5.7, you also updated the FOS Linux kernel to version 5.x.x, whereas 1.5.7 should ship with linux-4.19.64 or thereabouts. The 5.x.x kernels really haven’t been tested well enough and imo should not be used until we’ve had more field experience with them.
So what happens if you roll the fos linux kernel back to 4.19.64? Does the behavior correct itself? Rolling back to the fos linux 4.19.1 seems to have done the trick.
From windows can you get us the vendor ID and hardware ID from the device manager for this NIC as Sebastian suggested?
-
@george1421 Realtek PCIe GBE Family Controller
PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15\4&1285CEFC&0&00E0
PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15
PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028
PCI\VEN_10EC&DEV_8168&CC_020000
PCI\VEN_10EC&DEV_8168&CC_0200
PCI\VEN_10EC&DEV_8168&REV_15
PCI\VEN_10EC&DEV_8168
PCI\VEN_10EC&CC_020000
PCI\VEN_10EC&CC_0200
PCI\VEN_10EC
PCI\CC_020000
PCI\CC_0200
-
@george1421 Very sorry, I forgot to mention the kernel update.
We updated to 5.1.16 because we will shortly receive new Dell Optiplex 3070s that have NVMe SSDs.
-
@Gael So at this point you can make a decision.
You have kernels 5.x and 4.x. Pick one to keep as bzImage (assuming 5.x). Rename the other as bzImage4.19.64. Then, for all of the 3060s you have on your campus, add bzImage4.19.64 into the kernel field on the host definition for each system. That way, when you deploy to the 3060s, it will use the older kernel. If you have a lot of machines, you can use the fog group setting to change every host in the group at once.
For reference, the device [10EC:8168] is a Realtek 8169/8168/8101/8125 NIC.
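If it helps anyone matching hosts to this NIC, here is a rough sketch of how the [10EC:8168] pair falls out of the Windows hardware IDs posted earlier. The function name is made up for illustration and none of this is part of FOG; it only reformats the string.

```shell
#!/bin/sh
# Sketch only: pci_id_from_hwid is a hypothetical helper name.

# Print "vendor:device" in lowercase from a Windows PCI hardware ID.
# Prints nothing if the ID has no VEN_/DEV_ pair.
pci_id_from_hwid() {
    printf '%s\n' "$1" |
        sed -n 's/^PCI\\VEN_\([0-9A-Fa-f]\{4\}\)&DEV_\([0-9A-Fa-f]\{4\}\).*/\1:\2/p' |
        tr 'A-F' 'a-f'
}

# Example:
#   pci_id_from_hwid 'PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15'
# On a Linux system with pciutils, the result can be passed to lspci:
#   lspci -nn -d 10ec:8168
```

The lowercase vendor:device form is what lspci -d expects, so the same value can be used to spot which machines carry this controller.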
-
@Gael et al.
I’m writing a more descriptive post to summarize what the issue is. I don’t know how to fix it (so I’ll start there).
TL;DR:
The Linux kernel is putting volatile firmware on the NIC. This happens when FOS loads and the kernel begins associating drivers with the hardware. On restart, the firmware from the kernel is still present on the NIC. When Windows boots, it re-flashes the volatile firmware, so WOL works again afterwards. A full power pull (a completely cold boot) will do it too.
Basis:
This particular issue is due to the Linux kernel having a firmware defined for the NIC. That firmware is volatile, which means that when power is pulled it is no longer present and the NIC behaves normally again.
While the machine is in FOS, the Linux kernel hands the NIC a temporary firmware file, and this is what’s causing the strangeness with the NIC.
Pulling the power cord wipes the firmware. Similarly, if you boot to Windows immediately afterwards and then power off the machine, WOL should work. This is because Windows applies its own firmware when it loads, overwriting whatever the Linux kernel pushed.
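For anyone who wants to look at the WOL setting from the Linux side, here is a small sketch that reads the output of ethtool. It assumes ethtool is available and that the interface is called eth0, neither of which comes from this thread, and the function names are made up. It only reports the driver’s Wake-on setting; whether that reflects the firmware state described above is not something established here.

```shell
#!/bin/sh
# Sketch only: wol_flags and wol_magic_enabled are hypothetical names.

# Read `ethtool <iface>` output on stdin and print the active Wake-on
# flags (the "Wake-on:" line, not the "Supports Wake-on:" line).
wol_flags() {
    sed -n 's/^[[:space:]]*Wake-on:[[:space:]]*//p'
}

# Succeed if magic-packet wake ("g") is among the active flags.
wol_magic_enabled() {
    case "$(wol_flags)" in
        *g*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example (eth0 is an assumed interface name):
#   ethtool eth0 | wol_magic_enabled && echo "magic packet wake is on"
```

A value of d on the Wake-on line means wake is disabled, so the check fails in that case.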