Unsolved Kernel Panic - Unable to mount root fs on unknown-block
Recently we’ve started running into issues when trying to deploy images to some machines. We get a kernel panic along with a message that there was an issue mounting the root fs on unknown-block. Details are below…
• 1 master and 10 storage nodes, all running Ubuntu 20.04.6 LTS
• Windows DHCP is in use, option 66 is set to our FOG master and bootfile name points to ipxe.efi (machines boot to the FOG options menu with no issue)
• Existing install was updated to 1.5.10 about a month ago (all nodes are on the same version and there were no errors during the upgrade process)
• Kernel version is 5.15.93 for both bzImage and bzImage32
• The drives in the physical machines have never had anything on them (brand new WD SSDs installed by us) and the virtual machines are recreated using a new VHD each time
• The OS we’re attempting to deploy is a W10 LTSC image created using MDT
Machines we’ve seen the issue with so far:
• Intel NUC 10th gen (NUC10i3FNK)
• Citrix XenServer 8.2.1 virtual machines (used for testing of images prior to deployment)
Dell hardware seems to be unaffected by this. We have a variety of Dell machines in service ranging from 8-10 years old to brand new and we haven’t seen this on any of them.
Initially we were seeing an error pop up after selecting an option in the FOG menu, stating “Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)”. After setting the kernel log level to 7, we received the following output. We have tried kernel versions 6.1.22, 5.15.98, 5.15.93, and 5.15.68. We attempted to use 4.19.145 from FOG 1.5.9 but got an error that the kernel was too old.
Interestingly enough, on the exact same hardware and with nothing changed between attempts, if we just keep trying to deploy the image it will eventually work (it can take anywhere from 2 to 5+ attempts, but it eventually does go). We get this error using the full registration, quick registration, deploy image, and compatibility check options in the FOG menu.
If you have any suggestions it would be greatly appreciated. Just let me know if you need any more information. I’m in the process of getting a new FOG instance spun up just to rule out any possible issues with our existing setup.
Sorry, realized the image I attached initially didn’t line up with what was said about the drives being clean. Here’s another example where there are no existing partitions (the issue occurs in both cases regardless). In the example below, after the error was encountered, simply rebooting the machine a couple of times without changing anything eventually got it to work.
@Can-eh-dian11 Where I’ve seen this kernel panic is just after the kernel boots, when it tries to connect to the initrd file (the virtual hard drive, which is the “VFS:” in the error message). This is the init.xz file that gets transferred to the target computer after bzImage. Almost all selections on the iPXE boot menu call bzImage and init.xz.
Let’s see what happens if you manually download the inits from here: https://github.com/FOGProject/fos/releases
It’s init.xz and it goes into the /var/www/html/fog/service/ipxe directory. Rename the original one first, then download this new one. Again, it’s called init.xz.
From the Linux command prompt you could then compare the two files to see if they are exactly the same.
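One way such a comparison could look is sketched below. It is not necessarily what was originally posted: the helper name `compare_files` and the backup name `init.xz.orig` are assumptions made for illustration. It prints the MD5 and SHA-256 checksums of both files (so they can also be checked against the values published with the release) and then does a byte-for-byte comparison.

```shell
#!/bin/sh
# compare_files OLD NEW  (hypothetical helper, not part of FOG)
# Prints the MD5 and SHA-256 checksums of both files, then reports
# whether their contents are byte-for-byte identical.
compare_files() {
    old=$1
    new=$2

    # Checksums, one line per file, for comparison by eye or against
    # the values published alongside the release.
    md5sum "$old" "$new"
    sha256sum "$old" "$new"

    # cmp -s is silent and exits 0 only if the files are identical.
    if cmp -s "$old" "$new"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Example call, assuming the original file was renamed to init.xz.orig:
# compare_files /var/www/html/fog/service/ipxe/init.xz.orig \
#               /var/www/html/fog/service/ipxe/init.xz
```

If the last line printed is "identical", the init.xz on the server matches the freshly downloaded one and a corrupted init file on disk is unlikely to be the cause.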
@george1421 Thanks for the quick response. Just downloaded init.xz from the FOG 1.5.10 release (Kernel 5.15.93) using the link provided and the MD5 hash matches for both the newly downloaded and existing file. The SHA256 values for each also match the value published in the release.
@Can-eh-dian11 Are your clients PXE booting across subnets/routers or maybe even a VPN tunnel? If they are, I’d ask you to bring a machine to the local network, PXE boot and deploy there, and see if you get the same error again. If it’s not across subnets, I still suggest you move a client as close to the FOG server as possible (same switch) and try to reproduce the issue again.
@Sebastian-Roth Yes they are (multiple different subnets and IPsec tunnels are in use). As suggested I gradually brought the FOG server and the clients closer together until eventually they were on the same switch/subnet and I’m still encountering the error. We’ve also done some more deployments and have found that the issue isn’t as limited to a specific hardware vendor as we initially thought. There have been some recent Dell laptops where we are starting to see this pop up.
As suggested I gradually brought the FOG server and the clients closer together until eventually they were on the same switch/subnet and I’m still encountering the error.
Good work! Although we have not found the root cause of this we did rule out a very important factor.
We’ve also done some more deployments and have found that the issue isn’t as limited to a specific hardware vendor as we initially thought. There have been some recent Dell laptops where we are starting to see this pop up.
So from what we know so far I would guess this is a driver issue in iPXE. Do all the devices showing this issue have the same or a similar network chip (same vendor)?
Try using a different iPXE binary. Which one do you currently use? Are the devices set to UEFI or legacy BIOS mode?