• Moderator

    @sebastian-roth We can surely try it. I would think the NIC wouldn’t work at all if it needed the firmware, but it may be that the NIC firmware is missing or needs to be patched. I guess what I’m saying is we should try it to see if it fixes the problem with the current kernel. We now know that rolling back to 4.15 also fixes it, but that is not a good long-term strategy.


  • @george1421 Hi,

    This Lenovo is brand new.
    M70a Desktop (ThinkCentre) - Type 11CK
    Machine Type Model: 11CKS03900

    For now, I will try testing my way up the different kernel versions and see what is the most recent kernel version that works with this model and hard code that one into the host definition as you suggest.
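A minimal sketch of the "test my way up" bookkeeping, assuming each kernel is tested by hand and the pass/fail result is recorded. The `newest_working` helper, the candidate list, and the recorded results are hypothetical placeholders, not actual test results beyond the 4.15.2 success reported in this thread.

```python
def version_key(version):
    """Turn '5.10.19' into (5, 10, 19) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_working(versions, works):
    """Walk the versions from oldest to newest and return the newest one
    for which works(version) is true, or None if none of them work."""
    best = None
    for version in sorted(versions, key=version_key):
        if works(version):
            best = version
    return best


# Placeholder data: only the 4.15.2 result comes from the thread;
# the other entries stand in for kernels that have not passed a test.
candidates = ["5.10.19", "4.15.2", "5.6.18"]
recorded = {"4.15.2": True}

print(newest_working(candidates, lambda v: recorded.get(v, False)))  # prints 4.15.2
```

The numeric sort key matters here: a plain string sort would place 5.10.19 before 5.6.18.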

    Thank you again so much for your help.

    David.

  • Senior Developer

    @dmaret @george1421 Interesting findings! Just wondering if you think the mentioned firmware blob could make a difference or not? Shall I add it?

  • Moderator

    @dmaret How old is this lenovo?

    Now that you have a working, but slower, kernel, you can hard code this special 4.15.2 kernel right into the host definition for this Lenovo, so every time that computer needs imaging it will use the 4.15.2 kernel. I don’t know if the FOG Project devs will be able to fix this, since it’s a Linux kernel change / hardware conflict causing the issue.

    You could try the kernel parameter acpi=off instead, but I don’t have high hopes that it will work.

    The error seems to relate to a component of the network interface going to sleep or losing communication inside the NIC. I have found some references to updating the NIC firmware, but if this NIC is built into the motherboard of the computer, the BIOS / firmware update should take care of that for you.


  • @george1421 Hi George,

    I have added pcie_aspm=off to the kernel args field but it did not change anything.

    However I could not find 4.15.3 but I found 4.15.2 and tried with this one, and it worked! It was going at a slower rate but it went all the way and the imaging completed successfully!

    David.

  • Moderator

    @dmaret In the FOG configuration, near where you found the bzImage field, there is one called kernel args. You would place pcie_aspm=off in that field. Understand that this is the global settings space, so the kernel arg will apply to every deployment. But for now we are just testing. When we are done testing you will need to reset these values.

    The more research I do, the more it’s leaning towards a Linux driver conflict with the hardware. One recommendation I found was to switch back to a 4.15.x kernel. You can get older FOG kernels from here: https://fogproject.org/kernels/ I would download the 64-bit 4.15.3 kernel, rename it bzImage4153, and move it to the /var/www/html/fog/service/ipxe directory, then update the global kernel parameter to bzImage4153 and test your deployment. The Linux kernel developers changed how the driver works for the Realtek NIC after the 4.15.x series of Linux kernels, as well as the 4.9.x series. But that is getting back to a pretty old version of the Linux kernel. Just be aware that newer(ish) hardware will not run on these old kernels. Right now we are trying to solve the problem with this Lenovo system. We will need to reset everything afterwards to put FOG back to normal.
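The rename-and-move step can be sketched as below, assuming the kernel has already been downloaded to the FOG server. The `install_kernel` helper and the `downloaded_kernel` file name are hypothetical; only the target directory and the bzImage4153 name come from this post.

```python
import shutil
from pathlib import Path


def install_kernel(downloaded, ipxe_dir, new_name):
    """Copy a downloaded kernel into the iPXE directory under a new name.

    Refuses to overwrite an existing file so the stock kernel stays intact.
    """
    source = Path(downloaded)
    target_dir = Path(ipxe_dir)
    if not source.is_file():
        raise FileNotFoundError(f"kernel not found: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {target_dir}")
    target = target_dir / new_name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite: {target}")
    shutil.copy2(source, target)
    return target


# Example call (run on the FOG server; "downloaded_kernel" is a placeholder):
# install_kernel("downloaded_kernel", "/var/www/html/fog/service/ipxe", "bzImage4153")
```

Copying instead of moving keeps the original download around, which makes it easier to put FOG back to normal after testing.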


  • @george1421 Hello George,

    I am not sure where to find and update these switches. Could you please elaborate?

    Also I did update the BIOS to latest available version on this computer:
    M2SKT1FA 10 Mar 2021

    Is there anything else I should try to update?

    I will try to run again in debug mode as you suggest.

    Thank you again.

    David.

  • Moderator

    @dmaret I’m also finding posts about this NIC just turning off/powering down during file transfer, where the NIC fails to connect to its PHY device.

    I have found some kernel switches that might work, but others say no.

    pcie_aspm=off

    iommu=soft

    You could run your deployment in debug mode. When partclone throws the error, instead of rebooting, you could check whether the network is gone and search the /var/log/messages log file for any clue. But this error appears to be common for this Realtek NIC.
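A rough sketch of the log search, assuming the contents of /var/log/messages have been copied to a machine that has Python available. The keyword list is an assumption (r8169 is the in-kernel driver commonly bound to RTL8168 NICs), and the sample lines are placeholders, not real kernel output.

```python
import re

# Assumed keywords; adjust them to whatever the actual error message shows.
NIC_PATTERN = re.compile(r"r8169|rtl8168|link down|\bphy\b", re.IGNORECASE)


def nic_related_lines(lines):
    """Return the log lines that mention the NIC driver, link state, or PHY."""
    return [line.rstrip("\n") for line in lines if NIC_PATTERN.search(line)]


# Placeholder lines standing in for real log content.
sample = [
    "placeholder: r8169 something happened\n",
    "placeholder: unrelated message\n",
    "placeholder: PHY not responding\n",
]

for hit in nic_related_lines(sample):
    print(hit)

# With a real copy of the log:
# with open("messages") as log:
#     print("\n".join(nic_related_lines(log)))
```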

    Lastly have you updated the firmware on this computer? Is it the latest?

  • Senior Developer

    @dmaret Possibly our kernel is missing just one particular firmware blob for the NIC you have. Not sure yet, but from a quick look at our kernel config I find “rtl8168” 12 times, while I find it 13 times in the kernel firmware repo. We’ll compile a fresh 5.10.19 kernel soon to see if that helps.

    Edit: We are missing rtl8168fp-3.fw
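The comparison above amounts to a set difference. Here is a minimal sketch, assuming the kernel config lists built-in firmware in CONFIG_EXTRA_FIRMWARE (how the FOG kernel config is actually laid out is an assumption). The sample config and file list are illustrative, not the real FOG config or the full firmware repo.

```python
import re


def extra_firmware(config_text):
    """Return the firmware paths listed in CONFIG_EXTRA_FIRMWARE, if any."""
    match = re.search(r'^CONFIG_EXTRA_FIRMWARE="([^"]*)"', config_text, re.MULTILINE)
    return set(match.group(1).split()) if match else set()


def missing_blobs(config_text, repo_files, needle="rtl8168"):
    """List repo firmware files matching needle that the config does not include."""
    built_in = {path for path in extra_firmware(config_text) if needle in path}
    available = {path for path in repo_files if needle in path}
    return sorted(available - built_in)


# Illustrative data only.
sample_config = 'CONFIG_EXTRA_FIRMWARE="rtl_nic/rtl8168g-2.fw rtl_nic/rtl8168h-2.fw"\n'
sample_repo = [
    "rtl_nic/rtl8168g-2.fw",
    "rtl_nic/rtl8168h-2.fw",
    "rtl_nic/rtl8168fp-3.fw",
]

print(missing_blobs(sample_config, sample_repo))  # ['rtl_nic/rtl8168fp-3.fw']
```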


  • Hi George and Sebastian,

    And thank you to both of you for your answers.

    Yes I do see the FOG menu. And I am able to choose “Deploy” and select my image and launch the deployment.

    To save time during my testing I have registered the host and created a deploy task, so now it automatically picks up the deploy task when booting to the FOG server.

    I have downloaded George’s file, placed it in the right folder and updated FOG settings accordingly, and tried again. Unfortunately it did not change anything.

    I have registered the whole boot process in a short video here, you can see that it is picking up the new bzImage593RT3 file:
    https://drive.google.com/file/d/1eDFfBaxTpwI5k-HbL3cbFR_fnhK0AnvG/view?usp=sharing

    As you can see at the end of the video, the % suddenly stops increasing at 1.73%, and eventually, after a few minutes, I get the same error message as the one I already sent. Here is a picture of the one I got this morning:
    https://drive.google.com/file/d/1Cc7Oq7r9ukrrg1NtJB7hiS3O_eZh3gDm/view?usp=sharing

    It does not always stop at the same stage, sometimes I get the error message even before getting to the partclone screen, sometimes it works long enough to let the imaging go up to 15 or 20% before stopping. But eventually I always get the same error message.

    Other models I have get imaged without issue (I have Lenovo and Dell computers). They are on the same network/vlan, I even tried the exact same network port just to be sure.

    I have spent hours searching the web in the last few days, but have not been able to make any progress.

    Thank you again very much.

    David.

  • Senior Developer

    @dmaret The messages in the picture posted are new to me. Sounds like a Linux kernel network driver issue. Have you tried searching the web yet?

    Trying George’s kernel with the updated Realtek driver is definitely worth a try.

  • Moderator

    @dmaret Well the question in my mind is where is it failing so we know where to focus.

    So the simple question is: are you seeing the FOG iPXE menu? If yes, then you can rule out any iPXE-related kernels.

    Now if it’s failing in FOS Linux (things that run after you pick something from the iPXE menu), then we need to look into FOS Linux.

    I have a one-off kernel that has the updated Realtek network drivers in it, in case the mainstream kernels don’t work for your specific hardware. https://drive.google.com/file/d/1O-4tvx4DywWef75qfSxLRK9M6CoDS9pd/view?usp=sharing

    Download that file as bzImage593RT3 (case is important) and save it into /var/www/html/fog/service/ipxe on the FOG server. Then, for the test, go into the web UI -> FOG Configuration -> FOG Settings, click the expand all button, and search the page for bzImage. In the field containing bzImage, change the content to bzImage593RT3.

    Save the setting then pxe boot the target computer. See if it images with that kernel.
