Partclone freezes or crawls after kernel update
-
I updated the FOG kernel this morning to version 5.10.50 (64-bit) to support the new network drivers for the HP 650 laptop. Since then it boots up and runs all the way to launching Partclone for deployment, and then it gets stuck!
Please can someone assist.
Many thanks
Duane
-
@dashwell said in Partclone freezes or crawls after kernel update:
I updated the fog kernel this morning to version 5.10.50 64
Which kernel version did you use before? What happens when you switch back to the older kernel and try again?
Is this a unicast deploy or multicast?
-
@sebastian-roth Hi Sebastian,
Sorry, the original bzImage is 4.19, and when I return to it I get the original report that the drivers for the card aren’t loaded. I’ve just tried bzImage 5.6, and it freezes in the same place. However, if I run dmesg I get a ton of nfsd errors where it is shutting down the socket.
Jul 28 14:00:06 fogdep in.tftpd[3689]: Client 192.168.101.86 finished snp.efi
Jul 28 14:00:26 fogdep in.tftpd[3714]: Client 192.168.101.86 finished default.ipxe
Jul 28 14:01:12 fogdep rpc.mountd[1432]: authenticated mount request from 192.168.101.86:795 for /images (/images)
Jul 28 14:01:14 fogdep rpc.mountd[1432]: authenticated mount request from 192.168.101.86:947 for /images (/images)
Jul 28 14:01:55 fogdep kernel: rpc-srv/tcp: nfsd: sent 5556 when sending 32896 bytes - shutting down socket
-
@sebastian-roth Hi Sebastian,
I’ve created another VM and loaded it with the same details. Converted the kernel to 5.10 and I can see there is an issue with Partclone and NFSD.
Jul 29 15:46:53 fog2 kernel: [17957.455297] rpc-srv/tcp: nfsd: sent only 31216 when sending 32900 bytes - shutting down socket
Jul 29 15:47:52 fog2 kernel: [18016.846003] rpc-srv/tcp: nfsd: sent only 31216 when sending 32900 bytes - shutting down socket
further debug
root@fog2:/var/log#
Jul 29 15:44:55 fog2 kernel: [17839.621117] nfsd: initializing export module (net: f000032d).
Jul 29 15:44:55 fog2 kernel: [17839.654019] nfsd: shutting down export module (net: f000032d).
Jul 29 15:44:55 fog2 kernel: [17839.654023] nfsd: export shutdown complete (net: f000032d).
Jul 29 15:44:55 fog2 kernel: [17839.655715] nfsd: initializing export module (net: f000032c).
Jul 29 15:44:55 fog2 kernel: [17839.697949] nfsd: shutting down export module (net: f000032c).
Jul 29 15:44:55 fog2 kernel: [17839.697954] nfsd: export shutdown complete (net: f000032c).
-
@dashwell Well that’s interesting! The logs you posted come from the FOG server, right?! That would mean the FOG server NFS daemon is playing up. Could be caused by the FOS client kernel confusing the FOG server NFS or it could be just the NFS daemon having an issue by itself.
What Linux OS and version do you use as FOG server? Don’t think I’ve ever seen this before but we can try to replicate the issue if you let us know what exactly your setup looks like.
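To get a feel for how often the server drops the socket, the short-send lines can be pulled out of the log and summarised. This is only a rough sketch based on the two message variants posted in this thread; the function name `short_sends` is made up for illustration, and the sample lines are copied from the posts above.

```python
import re

# Matches both message variants seen in this thread:
#   "nfsd: sent 5556 when sending 32896 bytes"
#   "nfsd: sent only 31216 when sending 32900 bytes"
SHORT_SEND = re.compile(r"nfsd: sent (?:only )?(\d+) when sending (\d+) bytes")


def short_sends(lines):
    """Return (sent, expected) byte counts for every nfsd short-send line."""
    results = []
    for line in lines:
        match = SHORT_SEND.search(line)
        if match:
            results.append((int(match.group(1)), int(match.group(2))))
    return results


# Sample lines taken from the logs posted above.
sample = [
    "Jul 28 14:01:55 fogdep kernel: rpc-srv/tcp: nfsd: sent 5556 when sending 32896 bytes - shutting down socket",
    "Jul 29 15:46:53 fog2 kernel: [17957.455297] rpc-srv/tcp: nfsd: sent only 31216 when sending 32900 bytes - shutting down socket",
    "Jul 29 15:44:55 fog2 kernel: [17839.621117] nfsd: initializing export module (net: f000032d).",
]

for sent, expected in short_sends(sample):
    print(f"sent {sent} of {expected} bytes ({expected - sent} missing)")
```

On a real server the lines would come from the system log instead of the `sample` list; where that log lives depends on the distribution (typically /var/log/syslog on Ubuntu and /var/log/messages on CentOS).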
-
@sebastian-roth Hi Sebastian,
This occurred on both CentOS 8 and Ubuntu 18.04 LTS.
The laptops that we are attempting to clone are new HP 650s. We upgraded the kernel to versions 5.05 and 5.10 to try to get them to clone. We had been imaging HP 450s before without an issue; when the deploy to the 650 reported that a network driver was not present, I did the upgrade.
What other details do you need? I will gladly supply them.
-
@dashwell It’s very strange that no one else in the forums has reported this obvious NFS error with newer kernels (5.10.x has been around for months now). Searching the web, I can’t find a clear indication of what might be causing this either.
From what we have now I would think it’s a bug in the Linux kernel network driver. But first let’s make sure we are not racing down the wrong road! Please answer the following questions:
- What is your FOG server running on? Directly on hardware or as VM (which hypervisor and what version?) or even as a container (docker, Proxmox, …)?
- Can you deploy images to other machines (other than HP 650) without any issue?
- Are FOG server and the HP 650 laptop on the same subnet?
- What PCI IDs does the HP 650’s NIC have? Boot up one of the HP 650 laptops to Windows, open the device manager control and check the IDs in the NIC properties dialog (see this picture).
-
@sebastian-roth Hi Sebastian,
Sorry for only responding now. We’ve been dealing with month-end items as well.
- We are running the FOG server as a VM on ESXi 6.5.
- We’ve been deploying older images just fine except for one HP 450 laptop, but it is bottoming out on the grab of the bzImage. I’m tempted to try it on my laptop as well, but I will need to back up my laptop completely first.
- Both units are on a /24 subnet but separated into different networks. The same setup was in place when we imaged 400+ units previously. The FOG server is running on 192.168.250.142 and the client is on 192.168.101.87.
- I’m attaching an image of the PCI IDs. These laptops have both wired and wireless. The wired is connected to the network and the wireless is at present not linked to any AP.
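To illustrate the subnet answer, here is a minimal sketch using Python's standard `ipaddress` module. It assumes both addresses use a /24 mask as described above; the helper name `same_subnet` is just for illustration.

```python
import ipaddress


def same_subnet(ip_a, ip_b, prefix_len=24):
    """Return True if both addresses fall into the same network for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


server = "192.168.250.142"  # FOG server address from this post
client = "192.168.101.87"   # client address from this post

print(ipaddress.ip_interface(f"{server}/24").network)  # 192.168.250.0/24
print(ipaddress.ip_interface(f"{client}/24").network)  # 192.168.101.0/24
print(same_subnet(server, client))                     # False
```

So with /24 masks the server and client sit in different networks, which means the NFS traffic between them is routed rather than staying on one segment.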
-
@dashwell said:
I’m attaching an image of the PCI ID’s.
Ok, NIC is 8086:15FC - haha, searching the forums I just found that we already had someone with an HP ProBook 640 G8 complaining about slow speed, and there are two fixes available for this issue:
- Plug in a random USB storage device (not kidding!)
- Enable “Storage Controller for VMD” under “System Settings” menu in the UEFI settings
Find all the details here: https://forums.fogproject.org/topic/15132/hp-probook-640-g8-imaging-extremely-slowly
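For anyone who needs to look up their own NIC: the vendor:device pair can be read out of the hardware ID string that Windows Device Manager shows. A small sketch, assuming the usual `PCI\VEN_xxxx&DEV_xxxx` layout; the `SUBSYS` and `REV` values in the sample string are placeholders made up for illustration, and `pci_id` is a hypothetical helper name.

```python
import re

# Vendor and device IDs are four hex digits each in a PCI hardware ID string.
HWID = re.compile(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")


def pci_id(hardware_id):
    """Return 'VENDOR:DEVICE' in upper case, or None if the string does not match."""
    match = HWID.search(hardware_id)
    if match is None:
        return None
    vendor, device = match.groups()
    return f"{vendor}:{device}".upper()


# SUBSYS and REV below are placeholder values, not taken from the real laptop.
sample = r"PCI\VEN_8086&DEV_15FC&SUBSYS_00000000&REV_00"
print(pci_id(sample))  # 8086:15FC (vendor 8086 is Intel)
```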
-
Hi Sebastian,
I want to say thank you for all your assistance. Your changes hit the nail right on the head. Thank you very much.
Kind regards
Duane
-