WOL Firmware Linux Kernel Breaks - Power pulled or boot to windows fixes



  • I have a problem that I cannot explain.

    I have Dell Optiplex 3060 computers.
    I have a Windows 10 master image.

    Since I updated FOG from 1.5.2 to 1.5.7, on every PC I re-imaged, Windows turns off the network card when it shuts down, which prevents any further WOL.
    Windows must be doing this, because if I remove the power and plug it back in, the network card is turned on (so the BIOS settings are ruled out).
    You will tell me that it is my image, which is what I thought as well, except that here are the facts (checked several times to be certain):

    I restored my FOG server from a backup taken while it was still on 1.5.2.
    I captured my master image, updated last week, again.

    I therefore have:
    FOG 1.5.2 with:
    -the master image of January (currently deployed everywhere)
    -the master image updated last week

    FOG 1.5.7 with:
    -the master image of January (currently deployed everywhere)
    -the master image updated last week

    If I deploy either of my 2 master images from FOG 1.5.2, the network card remains on after Windows shuts down.
    If I deploy either of my 2 master images from FOG 1.5.7, the network card turns off after Windows shuts down.

    The problem does not come from the image, the BIOS, or an AD GPO.
    I do not understand how the method used to deploy the same image can affect how Windows turns off the machine …

    Does anyone have an idea?

    Thank you


  • Senior Developer

    @Gael et al.

    I’m posting a more descriptive summary of what the issue is. I don’t know how to fix it (so I’ll start there).

    TL;DR:

    The Linux kernel is putting volatile firmware on the NIC. This happens when FOS loads and the kernel begins associating the drivers. On restart, the firmware from the kernel is still present on the NIC. When Windows boots, it re-flashes the volatile firmware, so everything afterwards works. A full power pull (a completely cold boot) will do the same.

    Basis:

    This particular issue is due to the Linux kernel having a firmware defined for the NIC. This firmware is volatile: when power is pulled, it is no longer present and normal behavior returns.

    While the machine is in FOS, the Linux kernel hands the NIC a temporary firmware file, and this is what’s causing the strangeness with the NIC.

    Pulling the power cord wipes the firmware. Similarly, if you boot to Windows immediately afterwards and then power off the machine, WOL should work. This is because Windows applies its own firmware when it loads, overwriting whatever the Linux kernel pushed.



  • @george1421 @Sebastian-Roth

    Ok,

    All is ok with kernel 4.19.64

    Thank you very much for your help :-)



  • Moderator

    @Gael So at this point you can make a decision.

    You have kernels 5.x and 4.x. Pick one to keep as bzImage (assuming 5.x). Rename the other to bzImage4.19.64. Then, for all of the 3060s you have on your campus, enter bzImage4.19.64 in the kernel field of the host definition for each system. That way, when you deploy to the 3060s, FOG will use the older kernel. If you have a lot of machines, you can use the FOG group settings to change every host in the group at once.

    For reference, the device [10EC:8168] is a Realtek 8169/8168/8101/8125 NIC.
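
The rename step described above could be scripted roughly as follows. This is only a sketch: the function name `install_versioned_kernel` is made up for illustration, and the default directory is the `/var/www/html/fog/service/ipxe` path mentioned elsewhere in this thread, so adjust it if your install differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): store an extra FOS kernel next to
# the default bzImage under a versioned name such as bzImage4.19.64.
# Arguments: <source kernel file> <version> [kernel directory]
# Assumption: the default directory is the path quoted in this thread.
install_versioned_kernel() {
    src=$1
    version=$2
    dir=${3:-/var/www/html/fog/service/ipxe}
    target="$dir/bzImage$version"

    if [ ! -f "$src" ]; then
        echo "source kernel not found: $src" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        # Never clobber a kernel that is already in place.
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    # Copy rather than move, so the source file stays available.
    cp -p "$src" "$target" && echo "$target"
}

# Example (commented out so that sourcing this file changes nothing):
# install_versioned_kernel /tmp/bzImage-old 4.19.64
```

The name printed by the function (without the directory) is what would then go into the kernel field of the host or group definition.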



  • @george1421 Very sorry, I forgot to mention the kernel update.
    We updated to 5.1.16 because we will shortly receive new Dell Optiplex 3070 machines that have NVMe SSDs.



  • @george1421 Realtek PCIe GBE Family Controller
    PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15\4&1285CEFC&0&00E0
    PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15
    PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028
    PCI\VEN_10EC&DEV_8168&CC_020000
    PCI\VEN_10EC&DEV_8168&CC_0200
    PCI\VEN_10EC&DEV_8168&REV_15
    PCI\VEN_10EC&DEV_8168
    PCI\VEN_10EC&CC_020000
    PCI\VEN_10EC&CC_0200
    PCI\VEN_10EC
    PCI\CC_020000
    PCI\CC_0200
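
For anyone comparing these against Linux-side listings, the long Windows hardware IDs above reduce to the short vendor:device form (10EC:8168) quoted elsewhere in this thread. A minimal sketch of that reduction, where the function name `pci_id` is purely illustrative:

```shell
#!/bin/sh
# Hypothetical helper: reduce a Windows hardware ID string to the short
# vendor:device form. Prints nothing and returns non-zero when the string
# carries no DEV_ part (for example PCI\VEN_10EC&CC_0200).
pci_id() {
    id=$(printf '%s\n' "$1" |
        sed -n 's/.*VEN_\([0-9A-Fa-f]\{4\}\)&DEV_\([0-9A-Fa-f]\{4\}\).*/\1:\2/p')
    [ -n "$id" ] && printf '%s\n' "$id"
}

# Example (single quotes keep the backslash and ampersands literal):
# pci_id 'PCI\VEN_10EC&DEV_8168&SUBSYS_085C1028&REV_15'
```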


  • Moderator

    @Gael See, we also learned something that we didn’t know before: not only did you install 1.5.7, you also updated the FOS Linux kernel to version 5.x.x, whereas 1.5.7 should ship with linux-4.19.64 or thereabouts. The 5.x.x kernels really haven’t been tested well enough and, imo, should not be used until we’ve had more field experience with them.

    So what happens if you roll the FOS Linux kernel back to 4.19.64? Does the behavior correct itself? Rolling back to FOS Linux 4.19.1 seems to have done the trick.

    From Windows, can you get us the vendor ID and hardware ID of this NIC from Device Manager, as Sebastian suggested?


  • Developer

    @Gael Ok, now we are sure it’s a FOS kernel issue causing this, probably some rare issue with firmware on your NIC model. Can you tell us exactly which NIC you have (hardware IDs from Windows Device Manager)?



  • @george1421 That works! :-)



  • 1.5.2 :
    file bzImage
    bzImage: Linux kernel x86 boot executable bzImage, version 4.19.1 (jenkins-agent@Tollana) #1 SMP Mon Nov 12 18:23:08 CST 2018, RO-rootFS, swap_dev 0x8, Normal VGA
    file bzImage32
    bzImage32: Linux kernel x86 boot executable bzImage, version 4.15.2 (builder@c38bc0acaeb4) #5 SMP Tue Feb 13 18:32:54 UTC 2018, RO-rootFS, swap_dev 0x7, Normal VGA

    1.5.7 :
    file bzImage
    bzImage: Linux kernel x86 boot executable bzImage, version 5.1.16 (sebastian@Tollana) #2 SMP Wed Aug 28 17:12:41 CDT 2019, RO-rootFS, swap_dev 0x8, Normal VGA
    file bzImage32
    bzImage32: Linux kernel x86 boot executable bzImage, version 4.19.64 (jenkins-agent@Tollana) #1 SMP Mon Aug 5 09:59:10 CDT 2019, RO-rootFS, swap_dev 0x7, Normal VGA
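
When comparing several servers, the version number can be pulled out of `file` output like the lines above with a short filter. This is a sketch only: `kernel_version` is an illustrative name, and it assumes the output keeps the ", version X.Y.Z" wording shown here.

```shell
#!/bin/sh
# Hypothetical filter: read `file` output on stdin and print only the
# kernel version number that follows ", version ".
kernel_version() {
    sed -n 's/.*, version \([0-9][0-9.]*\).*/\1/p'
}

# Example, run from the directory that holds the kernels:
# file bzImage | kernel_version
```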



  • Ok, I will do that.

    But for information, here is another test I have just done:

    I only asked FOG 1.5.7 to capture a working Dell Optiplex 3060 (without sysprep or anything else). After the capture was done I shut down the computer, and the network card was off!


  • Moderator

    @Gael Hey one of the senior FOG developers just chatted with me on this issue. So we have a few more things to try here.

    First of all on the 1.5.2 version of the fog server do the following.

    cd /var/www/html/fog/service/ipxe
    file bzImage

    This will tell us the linux kernel version that was shipped with FOG 1.5.2

    Now scp that file to the new FOG server, into the correct path. Power off (unplug) the target computer, then PXE boot and image it using the bzImage you copied from the 1.5.2 version of the FOG server.

    See whether the system responds differently.

    The logic here is that, depending on the NIC used, the Linux kernel may load (patch) firmware onto the NIC controller that persists between system reboots but is erased after a power outage. With different versions of the Linux kernel, it’s possible that different firmware is loaded, causing the discrepancy.

    Once all of the other things have been tested out (so we are not changing a bunch of things all at once), make sure the firmware (BIOS) is updated on the 3060, just in case that is the root of the issue.



  • @george1421 Sorry typo, Optiplex 3060 not 3050


  • Moderator

    @Gael This is very troubling, since the issue (via process of elimination) appears to be a problem created by the FOG server, which I’m still not convinced is possible. The only possibility I can see is that the updated FOS Linux kernel in 1.5.7 is leaving the hardware in a strange state, but removing power should reset anything that FOS Linux would leave behind. I guess you could test by grabbing bzImage and init.xz from the 1.5.2 FOG server and temporarily installing them on the 1.5.7 FOG server. Those are the only things from the FOG side that actually touch the target hardware.

    I have a 3050 around here somewhere; I can try deploying a copy of 1903 or 1809 to that system to see if we get similar results. But if vPro is on and Windows fast startup is disabled, it should leave the network adapter powered on. I’m pretty sure that the 3050s we have here do have vPro installed.



  • @Sebastian-Roth Done. The network card was shut down too :-(




  • I will scp the image from the old FOG server to the new one and report the result here.


  • Developer

    @Gael said in Incomprehensible network adapter issue that goes out since the update to Fog 1.5.7:

    then I deployed the image to a VM, then I captured it (without powering on the image) with my FOG 1.5.7 server

    Not exactly what I asked you to do!


  • Moderator

    @Gael said in Incomprehensible network adapter issue that goes out since the update to Fog 1.5.7:

    because the network card is shutdown

    Two things come to mind.

    1. If the 3050 has vPro, it should not power off the network card. vPro is for out-of-band management.
    2. Windows has the driver check box ticked to power off the network adapter to save energy (or battery life, I can’t remember the exact term at the moment).

    I still have to think there is a difference between the image that works and the one that doesn’t. Are you sure no Windows updates have been applied to the one that isn’t working?

    There is a (very) slight chance the FOG client could have an impact if the versions are different. But I’m only saying it’s a very slight chance. I don’t think the FOG client code has been updated since 1.5.2, but there are almost 1.5 years between the releases.


