Lenovo ThinkCentre M70a
-
@dmaret Well, the question in my mind is where it is failing, so we know where to focus.
So the simple question is: are you seeing the FOG iPXE menu? If yes, then you can rule out any iPXE-related kernels.
Now, if it's failing in FOS Linux (the things that run after you pick something from the iPXE menu), then we need to look into FOS Linux.
I have a one-off kernel that has the updated Realtek network drivers in it, in case the mainstream kernels don’t work for your specific hardware: https://drive.google.com/file/d/1O-4tvx4DywWef75qfSxLRK9M6CoDS9pd/view?usp=sharing
Download that file as bzImage593RT3 (case is important) and save it into
/var/www/html/fog/service/ipxe
on the FOG server. Then, for the test, go into the web UI -> FOG Configuration -> FOG Settings, click on the Expand All button and search the page for bzImage. In the field with bzImage, change the content to bzImage593RT3. Save the setting, then PXE boot the target computer. See if it images with that kernel.
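For anyone who wants to script the copy-and-rename step, here is a minimal Python sketch. It assumes the default directory given above and a hypothetical downloaded file name; `install_kernel` is an illustrative helper, not part of FOG. Writing into /var/www/html/... will normally need root.

```python
import shutil
from pathlib import Path

# iPXE kernel directory on the FOG server, as given in the post above.
IPXE_DIR = "/var/www/html/fog/service/ipxe"


def install_kernel(downloaded: str, kernel_name: str, ipxe_dir: str = IPXE_DIR) -> Path:
    """Copy a downloaded kernel into the iPXE directory under an exact name.

    The name is used verbatim: the FOG setting must match it exactly,
    including case (bzImage593RT3, not bzimage593rt3).
    """
    src = Path(downloaded)
    if not src.is_file():
        raise FileNotFoundError(f"downloaded kernel not found: {src}")
    dest = Path(ipxe_dir) / kernel_name
    shutil.copyfile(src, dest)
    # Sanity check: the copy should be the same size as the download.
    if dest.stat().st_size != src.stat().st_size:
        raise OSError(f"size mismatch after copying to {dest}")
    return dest
```

For example, `install_kernel("/tmp/download.bin", "bzImage593RT3")` (the source path is hypothetical); the bzImage field in FOG Settings then has to be set to exactly the same name.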
-
@dmaret The messages in the picture posted are new to me. Sounds like a Linux kernel network driver issue. Have you tried searching the web yet?
Trying George’s kernel with the updated Realtek driver is definitely worth a try.
-
Hi George and Sebastian,
Thank you both for your answers.
Yes I do see the FOG menu. And I am able to choose “Deploy” and select my image and launch the deployment.
To save time during my testing I have registered the host and created a deploy task, so now it automatically picks up the deploy task when booting to the FOG server.
I have downloaded George’s file, placed it in the right folder and updated FOG settings accordingly, and tried again. Unfortunately it did not change anything.
I have recorded the whole boot process in a short video here; you can see that it is picking up the new bzImage593RT3 file:
https://drive.google.com/file/d/1eDFfBaxTpwI5k-HbL3cbFR_fnhK0AnvG/view?usp=sharing
As you can see at the end of the video, the percentage suddenly stops increasing at 1.73%, and eventually, after a few minutes, I get the same error message as the one I already sent. Here is a picture of the one I got this morning:
https://drive.google.com/file/d/1Cc7Oq7r9ukrrg1NtJB7hiS3O_eZh3gDm/view?usp=sharing
It does not always stop at the same stage: sometimes I get the error message even before getting to the partclone screen, and sometimes it works long enough to let the imaging go up to 15 or 20% before stopping. But eventually I always get the same error message.
Other models I have get imaged without issue (I have Lenovo and Dell computers). They are on the same network/vlan, I even tried the exact same network port just to be sure.
I have spent hours searching the web in the last few days, but have not been able to make any progress.
Thank you again very much.
David.
-
@dmaret Possibly we are missing just one particular firmware blob for your NIC in our kernel. Not sure yet, but a quick look shows “rtl8168” 12 times in our kernel config, while I find it 13 times in the kernel firmware repo. We’ll compile a fresh 5.10.19 kernel soon to see if that helps.
Edit: We are missing
rtl8168fp-3.fw
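Comparing file names rather than counting matches tells you directly which blob is missing. A minimal Python sketch, assuming the kernel embeds firmware through the standard `CONFIG_EXTRA_FIRMWARE` Kconfig option (whether the FOS build does so is an assumption) and that the firmware repo keeps the Realtek NIC blobs under `rtl_nic/`; the helper names are hypothetical.

```python
import re


def extra_firmware(config_text: str) -> set:
    """Return the firmware files listed in CONFIG_EXTRA_FIRMWARE of a kernel .config."""
    match = re.search(r'^CONFIG_EXTRA_FIRMWARE="([^"]*)"', config_text, re.MULTILINE)
    return set(match.group(1).split()) if match else set()


def missing_blobs(config_text: str, repo_files, prefix: str = "rtl_nic/rtl8168") -> list:
    """List firmware files under `prefix` that exist in the repo but are not built in."""
    built_in = extra_firmware(config_text)
    wanted = {name for name in repo_files if name.startswith(prefix)}
    return sorted(wanted - built_in)
```

With a config listing only the older rtl8168 blobs and a repo listing that also contains `rtl_nic/rtl8168fp-3.fw`, the function returns just that file. The file lists here are illustrative, not taken from the actual FOS config.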
-
@dmaret I’m also finding posts about this NIC just turning off/powering down during file transfer: the NIC fails to connect to its PHY device.
I have found some kernel switches that might work, but others say they don’t:
pcie_aspm=off
iommu=soft
You could run your deployment in debug mode; when partclone throws the error, instead of rebooting you could check whether the network is gone as well, and search the /var/log/messages log file for any clue. But this error appears to be common for this Realtek NIC.
Lastly, have you updated the firmware on this computer? Is it the latest?
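To make the /var/log/messages search quicker, a minimal Python sketch that pulls out lines mentioning the NIC. It assumes you copy the log off the machine first (the FOS environment may not ship Python), and the keywords are assumptions: `r8169` is the Linux driver that handles RTL8168 NICs, the others are generic link/PHY terms. `nic_clues` is a hypothetical helper.

```python
# Assumed keywords: the r8169 driver name plus generic link/PHY terms.
DEFAULT_KEYWORDS = ("r8169", "link is down", "phy")


def nic_clues(lines, keywords=DEFAULT_KEYWORDS):
    """Return (line_number, line) pairs that mention any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(k in text for k in lowered):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Usage: `with open("messages") as f: print(nic_clues(f))`, run against the copied log. The short `phy` keyword will also match words like "physical", so expect some noise.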
-
@george1421 Hello George,
I am not sure where to find and update these switches. Could you please elaborate?
Also, I did update the BIOS to the latest available version on this computer:
M2SKT1FA 10 Mar 2021
Is there anything else I should try to update?
I will try to run again in debug mode as you suggest.
Thank you again.
David.
-
@dmaret In the FOG configuration, near where you found the bzImage field, there is one called kernel args. You would place
pcie_aspm=off
in that field. Understand that this is the global settings space, so it will apply that kernel arg to every deployment. But for now we are just testing; when we are done testing you will need to reset these values.
The more research I do, the more it leans towards a Linux driver conflict with the hardware. One recommendation I found was to switch back to a 4.15.x kernel. You can get older FOG kernels from here: https://fogproject.org/kernels/ I would download the 64-bit 4.15.3 kernel, rename it to bzImage4153, and then move it to the
/var/www/html/fog/service/ipxe
directory, then update the global kernel parameter to bzImage4153 and test your deployment. The Linux kernel developers changed how the driver works for the Realtek NIC after the 4.15.x series of Linux kernels, as well as the 4.9.x series. But that is going back to a pretty old version of the Linux kernel. Just be aware that newer(ish) hardware will not run on these old kernels. Right now we are trying to solve the problem with this Lenovo system; we will need to reset everything afterwards to put FOG back to normal.
-
@george1421 Hi George,
I have added pcie_aspm=off to the kernel args field but it did not change anything.
I could not find 4.15.3, but I found 4.15.2 and tried that one, and it worked! It ran at a slower rate, but it went all the way and the imaging completed successfully!
David.
-
@dmaret How old is this lenovo?
Now that you have a working, but slower, kernel, you can hard code this special 4.15.2 kernel right into the host definition for this Lenovo, so every time that computer needs imaging it will use that kernel. I don’t know if the FOG Project devs will be able to fix this, since it’s a Linux kernel change / hardware conflict causing the issue.
You could try this kernel parameter instead; I don’t have high hopes that it will work, but you can try:
acpi=off
The error seems to relate to a component of the network interface going to sleep or losing communication inside the NIC. I have found some references to updating the NIC firmware, but if this NIC is built into the mobo of the computer, the BIOS / firmware update should take care of that for you.
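Because the kernel args field is global, it helps to add one switch for a test and take it out again cleanly afterwards. A minimal Python sketch, assuming the field is passed through as an ordinary space-separated kernel command line; `add_arg` and `remove_arg` are hypothetical helpers, and you would paste the resulting string into the field by hand.

```python
def add_arg(args: str, new: str) -> str:
    """Append a kernel argument, replacing any existing argument with the same key."""
    key = new.split("=", 1)[0]
    kept = [a for a in args.split() if a.split("=", 1)[0] != key]
    return " ".join(kept + [new])


def remove_arg(args: str, key: str) -> str:
    """Drop every argument whose key (the text before '=') matches `key`."""
    return " ".join(a for a in args.split() if a.split("=", 1)[0] != key)
```

For example, `add_arg("", "pcie_aspm=off")` gives `pcie_aspm=off`, and `remove_arg(..., "pcie_aspm")` restores the field once testing is done. Matching on the key rather than the full string avoids ending up with both `pcie_aspm=on` and `pcie_aspm=off` in the field.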
-
@dmaret @george1421 Interesting findings! Just wondering if you think the mentioned firmware blob could make a difference or not? Shall I add it?
-
@george1421 Hi,
This Lenovo is brand new.
M70a Desktop (ThinkCentre) - Type 11CK
Machine Type Model: 11CKS03900
For now, I will try testing my way up through the different kernel versions to see which is the most recent one that works with this model, and hard code that one into the host definition as you suggest.
Thank you again so much for your help.
David.
-
@sebastian-roth We can surely try it. But I would think the NIC wouldn’t work at all if it needed the firmware. BUT it may be the NIC firmware that is missing / needs to be patched. I guess what I’m saying is we should try it to see if it fixes the problem with the current kernel. We now know that rolling back to 4.15 also fixes it, but that is not a good long-term strategy.
-
@dmaret Please give this kernel a try: https://fogproject.org/kernels/bzImage-5.10.19-rtl8168fp-3 (not saying it does help but will be interesting to see if adding the firmware blob makes a difference)
-
@sebastian-roth Hi,
I have tried it, but it does not resolve the issue.
So since I successfully imaged with 4.15.2, I have also tested the following successfully: 4.16.6, 4.17.0, 4.18.3.
And I will continue to try to identify the most recent version which works with this Lenovo model.
Thanks again.
David.
-
@dmaret said in Lenovo ThinkCentre M70a:
So since I successfully imaged with 4.15.2, I have also tested the following successfully: 4.16.6, 4.17.0, 4.18.3.
Wow, didn’t expect that! Good to know. Are they all going full speed or kind of slow as you said with 4.15.2?
-
@sebastian-roth I think 4.16.6 was also slow, but the later ones were back to the usual rate. 4.19.1 and 4.19.6 do not seem to work.
David.
-
@dmaret Do I get this right: 4.18.3 (and maybe 4.17.0) is the best candidate we have so far, with the error not happening and normal speed?!
Did you test 4.18.11 yet? The closer we get, the better the chance we find which change in the kernel is causing this, and we might be able to provide a patched, up-to-date kernel.
-
@sebastian-roth Hi Sebastian,
Let me recap my findings:
- 4.15.2: working, slow
- 4.16.6: working, slow
- 4.17.0: working, normal speed
- 4.18.3: working, normal speed
- 4.18.11: working, normal speed
- 4.19.1: not working
- 4.19.6: not working
- 4.19.36: not working
NB: I am using realtek.pxe
Thanks.
David.
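The recap above already pins the regression between two versions. A minimal Python sketch that finds that boundary from a table of test results, assuming a single regression (everything below some version works, everything from it upwards fails); `parse_version` and `find_boundary` are hypothetical helpers. Versions are compared numerically, because a plain string sort would put 4.18.11 before 4.18.3.

```python
from typing import Dict, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.18.11' into (4, 18, 11) so versions sort numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def find_boundary(results: Dict[str, bool]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last working version, first failing version) in ascending order.

    Assumes a single regression; returns (last_good, None) if nothing failed.
    """
    last_good: Optional[str] = None
    for version in sorted(results, key=parse_version):
        if results[version]:
            last_good = version
        else:
            return last_good, version
    return last_good, None


# Results from the recap above (True = imaged successfully).
results = {
    "4.15.2": True, "4.16.6": True, "4.17.0": True, "4.18.3": True,
    "4.18.11": True, "4.19.1": False, "4.19.6": False, "4.19.36": False,
}
print(find_boundary(results))  # ('4.18.11', '4.19.1')
```

With the data posted so far this narrows the change to somewhere between 4.18.11 and 4.19.1. If someone builds kernels from source, `git bisect` across that range is one common way to find the exact commit.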
-
@dmaret said in Lenovo ThinkCentre M70a:
NB: I am using realtek.pxe
So this computer is in BIOS mode? Does undionly.kpxe work for the boot loader or will only the realtek.pxe get this system to the iPXE menu?
-
@george1421 Hi George,
This computer does not offer Legacy BIOS, only UEFI. It does not work with undionly.kpxe.
There was a typo in my previous message, I actually use realtek.efi not realtek.pxe
Thank you.
David.