Kernel temporarily losing partition information when running partprobe
-
While trying to push an image to the clients I ran into a problem that took me several hours to dig into. I hope this helps someone else, and maybe FOG could be updated to work around the issue as well. The client comes up with this error message:
[CODE] * Looking for Hard Disks...Done
 * Using Hard Disk: /dev/sda
 * Erasing current MBR/GPT Tables...Done
 * Restoring MBR...Done
 * Extended partitions...
Error: Error informing the kernel about modifications to partition
/dev/sda5 -- Device or resource busy. This means Linux won't know
about any changes you made to /dev/sda5 until you reboot -- so you
shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 5 (Device or resource busy)[/CODE]
sda5 (or sda as a whole) CANNOT be busy, as only FOG is running and nothing else! The error is not a show stopper straight away: FOG starts to write the partition images to disk. sda1, sda2 and sda3 all went fine, [B]but sda5 and sda6 failed[/B]!
Trying to find what’s causing this I stumbled upon this: [url]https://github.com/Excito/parted/blob/master/tests/t2310-dos-extended-2-sector-min-offset.sh[/url]
[CODE]…
Ensure that parted leaves at least 2 sectors between the beginning
of an extended partition and the first logical partition.
Before parted-2.3, it could be made to leave just one, and that
would cause trouble with the Linux kernel.
…[/CODE]
Looking at my partition table, I knew what was causing the trouble:
[CODE]# partition table of /dev/sda
unit: sectors

/dev/sda1 : start= 2048, size= 206797, Id= 7, bootable
/dev/sda2 : start= 210944, size=257535916, Id= 7
/dev/sda3 : start=257748991, size=224474114, Id= f
/dev/sda4 : start= 0, size= 0, Id= 0
/dev/sda5 : start=257748992, size=150334138, Id= 7
/dev/sda6 : start=408084480, size= 74138625, Id= 7[/CODE]
Note that the starts of sda3 and sda5 are only one sector apart!
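A layout like this can be spotted before deploying. Here is a rough sketch that reads an "sfdisk -d" style dump on stdin and prints the gap in sectors between the extended partition and the first logical partition. The function name extended_gap is made up for illustration; it assumes the old dump format shown in this post ("start=", "Id=") and that logical partitions are numbered 5 and up.

```shell
# Sketch only: extended_gap is a hypothetical helper, not part of FOG.
# Reads an "sfdisk -d" style dump on stdin and prints the distance in
# sectors between the extended partition (Id 5 or f) and the first
# logical partition (number 5 or higher).
extended_gap() {
    awk '
        /start=/ {
            line = $0
            gsub(/[ \t]/, "", line)                 # "/dev/sda3:start=257748991,size=...,Id=f"
            dev = line;   sub(/:.*/, "", dev)        # device name
            num = dev;    sub(/^.*[^0-9]/, "", num)  # trailing partition number
            start = line; sub(/^.*start=/, "", start); sub(/,.*/, "", start)
            id = line;    sub(/^.*Id=/, "", id);       sub(/,.*/, "", id)
            if (id == "5" || id == "f") ext = start
            if (num + 0 >= 5 && (first == "" || start + 0 < first + 0)) first = start
        }
        END { if (ext != "" && first != "") print first - ext }
    '
}

# Possible usage (needs root and a real disk):
#   sfdisk -d /dev/sda | extended_gap
```

A result below 2 would match the situation described in the parted test quoted above.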
So what is really happening? FOG zaps the old partition table on disk (sgdisk -Z), restores the MBR (dd), creates the partitions (sfdisk) and then forces the kernel to re-read the partition table (partprobe). partprobe doesn’t like the partition scheme and leaves the kernel with an only half-populated list of partitions. It looks like this:
[CODE]# ls -al /dev/sda*
brw------- 1 root root 8, 0 Jan 21 03:33 /dev/sda
brw------- 1 root root 8, 1 Jan 21 03:33 /dev/sda1
brw------- 1 root root 8, 2 Jan 21 03:33 /dev/sda2
brw------- 1 root root 8, 3 Jan 21 03:33 /dev/sda3
brw------- 1 root root 8, 5 Jan 21 03:33 /dev/sda5
brw------- 1 root root 8, 6 Jan 21 03:33 /dev/sda6
# cat /proc/partitions
major minor #blocks name
8 0 241107738 sda
8 1 103398 sda1
8 2 128767958 sda2
8 3 1 sda3
8 5 75167069 sda5
8 6 37069312 sda6
# partprobe /dev/sda
Error: Error informing the kernel about modifications to partition
/dev/sda5 -- Device or resource busy. This means Linux won't know
about any changes you made to /dev/sda5 until you reboot -- so you
shouldn't mount it or use it in any way before rebooting.
Error: Failed to add partition 5 (Device or resource busy)
# ls -al /dev/sda*
brw------- 1 root root 8, 0 Jan 21 03:33 /dev/sda
brw------- 1 root root 8, 1 Jan 21 03:33 /dev/sda1
brw------- 1 root root 8, 2 Jan 21 03:33 /dev/sda2
brw------- 1 root root 8, 3 Jan 21 03:33 /dev/sda3
# cat /proc/partitions
major minor #blocks name
8 0 241107738 sda
8 1 103398 sda1
8 2 128767958 sda2
8 3 1 sda3
#[/CODE]
The partition table on disk is still all right. It’s just the kernel that doesn’t know about it! There are several commands (sfdisk -R, hdparm -z, blockdev --rereadpt) that make the kernel re-read this information. For example, if I run ‘hdparm -z /dev/sda’ I get sda5 and sda6 back and can run partclone (by hand) without any further trouble!
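To make the idea concrete, here is a minimal sketch of a re-read with fallbacks, plus a helper that lists what the kernel currently knows. Both function names (reread_partitions, kernel_partitions) are invented for this post and are not FOG functions. The sketch assumes each tool exits non-zero when it fails, which I have not checked for every version.

```shell
# Sketch only: hypothetical helpers, not part of FOG.

# Ask the kernel to re-read the partition table of $1, trying the tools
# mentioned in this thread in turn. Returns non-zero if all of them fail.
reread_partitions() {
    partprobe "$1" >/dev/null 2>&1 && return 0
    blockdev --rereadpt "$1" >/dev/null 2>&1 && return 0
    hdparm -z "$1" >/dev/null 2>&1
}

# Print the partitions of disk $1 (e.g. "sda") that the kernel knows about,
# read from a /proc/partitions style file ($2, default /proc/partitions).
kernel_partitions() {
    awk -v disk="$1" '$4 ~ ("^" disk "p?[0-9]+$") { print $4 }' \
        "${2:-/proc/partitions}"
}

# Possible usage (needs root):
#   reread_partitions /dev/sda && kernel_partitions sda
```

Comparing the output of kernel_partitions with the partitions in the saved sfdisk dump would show right away whether sda5 and sda6 have gone missing.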
I know this is a very special (and stupid) problem and I really hope that no one else runs into it. [B]But could FOG protect people from this by using a tool other than partprobe to make the kernel re-read the partition information?[/B]
-
If you update to 2919, you should receive the updated init files, which now use blockdev --rereadpt <device> to have the kernel re-read the partition information.
-
Thanks!! I get to try it on Friday and will report back then…
-
Deploying the image works now (no error messages), but somehow the second logical partition is broken. I haven’t had time to look into this so far. Just a couple of things I noted down:
- Deploy task with partclone went through (including sda5 and sda6) without error
- Running a debug download, I had a quick look at the partition table (fdisk -l), which seemed all right at that point
- Windows 7 boots up and is happy but is completely missing sda6 (2nd logical partition)!!
- So I quickly booted a debian live CD to check on the partition table (fdisk -l) and it looked a bit crooked:
[CODE]…
/dev/sda6 671145921 671147201 640+ 1 FAT12[/CODE]
All the other partitions are good! But sda6 should look like this:
[CODE]…
/dev/sda6 408084480 445153792 37069312 7 HPFS/NTFS/exFAT[/CODE]
As you can see, the numbers in the broken entry are well beyond the disk capacity. Trying to dump the partition table with sfdisk, I got the following error:
[CODE]ERROR: sector 408084417 does not have an msdos signature[/CODE]
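The "beyond the disk capacity" claim can be checked with a bit of arithmetic. This is a small sketch, assuming the disk is the one from the /proc/partitions listing earlier in this thread (241107738 blocks of 1 KiB for sda) and 512-byte sectors. beyond_disk is a name made up for this example.

```shell
# Sketch only: beyond_disk is a hypothetical helper.
# Succeeds if start sector $1 lies beyond a disk of $2 KiB.
# Assumes 512-byte sectors, i.e. two sectors per /proc/partitions block.
beyond_disk() {
    [ "$1" -ge $(( $2 * 2 )) ]
}

# The broken sda6 entry starts at sector 671145921, while the disk has
# 241107738 * 2 = 482215476 sectors.
if beyond_disk 671145921 241107738; then
    echo "start sector is past the end of the disk"
fi
```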
Anyone got an idea? [B]ntfsfix[/B] is being called after restoring the partitions. Could that cause trouble? I am still not sure if all this is a very rare problem (caused by a stupid partition layout) or if others might also run into this. I’ll do some more tests next week and hopefully will be able to get it all sorted.
Any comments on this are more than welcome!
PS: We have been happily using FOG for some time now, and this problem is not a real issue for us as we have already fixed our partition layout. I just want to make sure that no one else has trouble with this.
-
I do want to figure out what is happening and why. I just don’t know where to begin. I really appreciate the help with tracking the bug and narrowing the problem down. Hopefully we can get it “just working” across the board for you and any others who may run into the same type of problem.
-
As I don’t have access to those computers on the weekend, I simulate things with QEMU at home. It’s great for testing these kinds of partitioning issues, as running through an upload/download cycle only takes a couple of minutes with a small test container image!
But I wasn’t able to reproduce the problem at home as I don’t have a full Windows installation running in QEMU here. So my guess is that Windows is messing up things after the deploy. But I really wonder why it does when it is actually happy with things before I try to image it to another computer!?
I’ll do some more research next week and will let you know as soon as possible.
-
[QUOTE]So my guess is that Windows is messing up things after the deploy.[/QUOTE]
I was wrong! The partition table is ruined even before FOG reboots the machine!
After adding some more debugging steps to the download script, I found out that deploying sda3 (the extended partition) with partclone is what messes things up! It turns out that in some cases Linux has a different understanding of extended partitions than Windows has. In my case the start sector of sda6 is slightly smaller when captured with ‘sfdisk -d’ while uploading an image. This wouldn’t be a problem if we didn’t write sda3 (and with it the old start sector of sda6) back to disk!
One solution would be to go back and only allow Linux to have extended partitions, but I doubt this is what users would like to see. Another way would be to check the partition type and skip partclone.restore if it is an extended partition. Could that cause trouble somewhere else?
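For anyone who wants to look at what the first sector of an extended partition actually carries, here is a rough sketch that decodes one partition table entry from a 512-byte sector. In an extended boot record, entry 0 describes the logical partition and its start is counted from the sector holding the record itself. ebr_entry is a name invented for this post; the offsets are the usual MBR/EBR layout (table at byte 446, 16 bytes per entry).

```shell
# Sketch only: ebr_entry is a hypothetical helper, not part of FOG.
# Prints "type start size" for entry $2 (0 or 1) of the partition table
# stored in the 512-byte sector at the beginning of file or device $1.
ebr_entry() {
    local img=$1
    local off=$((446 + 16 * ${2:-0}))
    # 16 bytes per entry: byte 4 is the type, bytes 8-11 the start LBA and
    # bytes 12-15 the sector count, both little-endian.
    set -- $(od -An -tu1 -j "$off" -N 16 "$img")
    printf '0x%x %d %d\n' "$5" \
        $(( $9 + (${10} << 8) + (${11} << 16) + (${12} << 24) )) \
        $(( ${13} + (${14} << 8) + (${15} << 16) + (${16} << 24) ))
}

# Possible usage (needs root, and assumes the kernel exposes the extended
# partition as a small block device, as /proc/partitions suggests):
#   ebr_entry /dev/sda3 0
```

Running this against the stored image of the extended partition and against the freshly partitioned disk should show whether the two disagree about where the logical partition starts.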
-
I don’t imagine it would.
The extended partition contains no data; only the logical partitions do. However, it doesn’t make sense that restoring a 1 or 2 sector sized file (about 4k of data) would “break” anything.
Not copying it wouldn’t be a bad idea though either as it doesn’t contain anything of importance, but again I can’t see how the copying of this would cause problems.
Do you know what filetype it comes up as? I forget off the top of my head which command reports this to us.
-
[QUOTE]Not copying it wouldn’t be a bad idea though either as it doesn’t contain anything of importance, but again I can’t see how the copying of this would cause problems.[/QUOTE]
Because the OLD start sector of the logical partitions is stored in that small chunk of data and will overwrite the numbers that sfdisk created when writing the whole NEW partition table.
[QUOTE]Do you know what filetype it comes up as?[/QUOTE]
In my case partclone.imager (raw) is used when storing this partition. As far as I remember, ‘blkid’ is used to discover the filesystem on partitions. We could probably use it to skip taking an image of extended partitions as well.
[CODE]blkid -po udev /dev/sda3
ID_PART_TABLE_TYPE=dos
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_TYPE=0x5
…
[/CODE]
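Pulling the type out of that output only takes a line of awk. A minimal sketch, with part_entry_type being a name made up for this post:

```shell
# Sketch only: part_entry_type is a hypothetical helper.
# Prints the ID_PART_ENTRY_TYPE value from "blkid -po udev" output read
# on stdin; prints nothing if the field is absent.
part_entry_type() {
    awk -F= '$1 == "ID_PART_ENTRY_TYPE" { print $2; exit }'
}

# Possible usage (needs root):
#   blkid -po udev /dev/sda3 | part_entry_type
```

Matching on the exact field name avoids picking up other ID_PART_ENTRY_* lines by accident.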
Type 5 (and also f) are used for extended partitions.
-
Here is a patch. Only tested in QEMU so far!!! Use with care.
[url=“/_imported_xf_attachments/1/1638_partype.patch.txt?:”]partype.patch.txt[/url]
-
Here’s the same, albeit slightly modified, patch:
[code]Index: src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh
===================================================================
--- src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh (revision 2932)
+++ src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh (working copy)
@@ -106,6 +106,12 @@
     fi
 }
 # $1 is the partition
+getPartType()
+{
+    parttype=`blkid -po udev $1 | grep PART_ENTRY_TYPE | awk -F'=' '{print $2}'`;
+    echo $parttype;
+}
+# $1 is the partition
 # Returns the size in bytes.
 getPartSize()
 {
@@ -774,9 +780,9 @@
 # $1 is the drive
 runPartprobe()
 {
-    hdparm -z $1 &>/dev/null;
-    if [ ! -f "${1}1" ]; then
-        partprobe $1 &>/dev/null;
+    partprobe $1 &>/dev/null || hdparm -z $1 &>/dev/null;
+    if [ "$?" != "0" ]; then
+        handleError "Failed to read back partitions";
     fi
 }
@@ -1023,6 +1029,7 @@
     local imgPartitionType="$6";
     local partNum="";
     local fstype="";
+    local parttype="";
     local imgpart="";
     partNum=${part:$diskLength};
@@ -1030,7 +1037,8 @@
     mkfifo /tmp/pigz1;
     echo " * Processing Partition: $part ($partNum)";
     fstype=`fsTypeSetting $part`;
-    if [ "$fstype" != "swap" ]; then
+    parttype=`getPartType $part`;
+    if [ "$fstype" != "swap" ] && [ "$parttype" != "0x5" -a "$parttype" != "0xf" ]; then
         echo -n " * Using partclone.";
         echo $fstype;
         sleep 5;
@@ -1042,8 +1050,12 @@
         clear;
         echo " * Image uploaded";
     else
-        echo " * Not uploading swap partition";
-        saveSwapUUID "${imagePath}/d${intDisk}.original.swapuuids" "$part";
+        if [ "$parttype" == "0x5" -o "$parttype" == "0xf" ]; then
+            echo " * Not uploading content of extended partition";
+        else
+            echo " * Not uploading swap partition";
+            saveSwapUUID "${imagePath}/d${intDisk}.original.swapuuids" "$part";
+        fi
     fi
     rm /tmp/pigz1;
 else[/code]
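As I read it, the intent of the upload check is that a partition gets imaged only if it is neither swap nor an extended partition, so both tests have to hold at the same time. A stand-alone sketch of that decision follows (should_image is a made-up name, not a function in funcs.sh):

```shell
# Sketch only: should_image is a hypothetical helper, not part of FOG.
# Returns 0 if a partition with filesystem type $1 and MBR partition
# type $2 should be imaged, 1 if it should be skipped (swap, or an
# extended partition of type 0x5 / 0xf).
should_image() {
    [ "$1" != "swap" ] && [ "$2" != "0x5" ] && [ "$2" != "0xf" ]
}

# Example: an extended partition is skipped whatever its fstype says.
if ! should_image raw 0xf; then
    echo " * Not uploading content of extended partition"
fi
```

Keeping the decision in one small function makes it easy to try out with a handful of type combinations before touching a real disk.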
-