Here is a patch. Only tested in QEMU so far!!! Use with care.
[url=“/_imported_xf_attachments/1/1638_partype.patch.txt?:”]partype.patch.txt[/url]
[QUOTE]Not copying it wouldn’t be a bad idea though either as it doesn’t contain anything of importance, but again I can’t see how the copying of this would cause problems.[/QUOTE]
Because the OLD start sector of the logical partitions is stored in that small chunk of data, and restoring it overwrites the values that sfdisk wrote when creating the whole NEW partition table.
[QUOTE]Do you know what filetype it comes up as?[/QUOTE]
In my case partclone.imager (raw) is being used when storing this partition. As far as I remember ‘blkid’ is used to discover the filesystem on partitions. We probably could use this to skip taking an image on extended partitions as well.
[CODE]blkid -po udev /dev/sda3
ID_PART_TABLE_TYPE=dos
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_TYPE=0x5
…
[/CODE]
Type 5 (and also f) are used for extended partitions.
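A rough sketch of how such a check could look, so the deploy script could skip partclone for extended partitions. The function names are made up for illustration and this is untested inside FOG itself. Types 0x5 and 0xf are the ones mentioned above; 0x85 (Linux extended) is an extra assumption on my side.

```shell
#!/bin/sh
# Hypothetical helpers, not part of FOG.

# Read "blkid -po udev" style output on stdin and print the partition type id.
part_entry_type() {
    sed -n 's/^ID_PART_ENTRY_TYPE=//p' | head -n 1 | tr 'A-F' 'a-f'
}

# Return 0 if the given MBR type id denotes an extended partition.
# 0x5 and 0xf are taken from the post; 0x85 is an added assumption.
is_extended_type() {
    case "$1" in
        0x5|0x05|0xf|0x0f|0x85) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would be something like `type=$(blkid -po udev /dev/sda3 | part_entry_type)` followed by `is_extended_type "$type" && echo "skipping extended partition"`.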
[QUOTE]So my guess is that Windows is messing up things after the deploy.[/QUOTE]
I was wrong!! The partition table is ruined even before FOG reboots the machine!
Adding some more debugging steps to the download script, I found that deploying sda3 (the extended partition) with partclone is what messes things up!! It turns out that in some cases Linux has a different understanding of extended partitions than Windows has. In my case the start sector of sda6 is slightly smaller when captured with ‘sfdisk -d’ while uploading an image. This wouldn’t be a problem if we didn’t write sda3 (and with it the old start sector of sda6) back to disk!
One solution would be to go back and only allow Linux to have extended partitions but I doubt this is what users would like to see. Another way would be to check the partition type and skip partclone.restore if it is an extended partition. Could that cause trouble somewhere else??
As I don’t have access to those computers on the weekend I simulate things with QEMU at home. It’s great for testing these kinds of partitioning issues, as running through an upload/download cycle only takes a couple of minutes with a small test container image!
But I wasn’t able to reproduce the problem at home as I don’t have a full Windows installation running in QEMU here. So my guess is that Windows is messing up things after the deploy. But I really wonder why it does when it is actually happy with things before I try to image it to another computer!?
I’ll do some more research next week and will let you know as soon as possible.
Did FOG wake up all 15 clients? Did the others get the normal FOG iPXE Boot menu (list with ‘Boot from harddrive’ as the first item)? What if you start the job again?
[CODE]# ps ax | grep "udp-sender"[/CODE]
You should see several processes. Keep your eyes open for the command option ‘--min-receivers’. As far as I know (FOG 1.2.0) multicast is not going to start if hosts are missing. Not sure if this is still true in current SVN. But check your process list anyway!
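If you don’t want to read through the whole process list by eye, something along these lines could pull out just the number. It is only a sketch: the function name is made up, and it assumes the option appears as ‘--min-receivers N’ (or ‘--min-receivers=N’) on the udp-sender command line.

```shell
#!/bin/sh
# Hypothetical helper: read a process list on stdin and print the
# --min-receivers value of every udp-sender command line found in it.
min_receivers() {
    grep 'udp-sender' | sed -n 's/.*--min-receivers[ =]\([0-9][0-9]*\).*/\1/p'
}
```

Run it as `ps ax | min_receivers` and compare the number with how many clients actually booted into the task.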
Deploying the image works now (no error messages) but somehow the second logical partition is screwed. I didn’t have the time to look into this so far. Just a couple of things I remember (noted down):
Anyone got an idea? [B]ntfsfix[/B] is being called after restoring the partitions. Could that cause trouble? I am still not sure if all this is a very rare problem (caused by a stupid partition layout) or if others might also run into this. I’ll do some more tests next week and hopefully will be able to get it all sorted.
Any comments on this are more than welcome!
PS: We have been happily using FOG for some time now and this problem is not a real issue for us, as we have already fixed our partition layout. I just want to make sure that no one else is having trouble with this.
Thanks!! I get to try it on Friday and will report back then…
Trying to push an image to the clients I ran into a problem which took me several hours to dig into. I hope this can be of help to someone else too, and maybe FOG could be updated to work around this issue as well. The client comes up with this error message:
[CODE]* Looking for Hard Disks…Done[/CODE]
sda5 (or sda as a whole) CANNOT be busy as there is only FOG running and nothing else! The error is not a show stopper straight away. FOG starts to write the partition images to disk. sda1, sda2, sda3 all went fine [B]but sda5 and sda6 failed[/B]!!
Trying to find what’s causing this I stumbled upon this: [url]https://github.com/Excito/parted/blob/master/tests/t2310-dos-extended-2-sector-min-offset.sh[/url]
[CODE]…
…[/CODE]
Looking at my partition table I knew what’s causing the trouble:
[CODE]# partition table of /dev/sda
unit: sectors
/dev/sda1 : start= 2048, size= 206797, Id= 7, bootable
/dev/sda2 : start= 210944, size=257535916, Id= 7
/dev/sda3 : start=257748991, size=224474114, Id= f
/dev/sda4 : start= 0, size= 0, Id= 0
/dev/sda5 : start=257748992, size=150334138, Id= 7
/dev/sda6 : start=408084480, size= 74138625, Id= 7[/CODE]
Note that the start sectors of sda3 and sda5 are only one sector apart!
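To spot this kind of layout before deploying, the gap could be calculated straight from the ‘sfdisk -d’ dump. The following is only a sketch with a made-up function name. It assumes logical partitions are the ones numbered 5 and up, and it accepts both ‘Id=’ and ‘type=’ because I believe newer sfdisk versions print the latter.

```shell
#!/bin/sh
# Hypothetical helper: read an "sfdisk -d" dump on stdin and print the number
# of sectors between the start of the extended partition and the start of the
# first logical partition (assumed to be the partitions numbered >= 5).
extended_gap() {
    sed 's/[=,]/ /g' | awk '
        $1 ~ /^\/dev\// {
            start = ""; id = ""
            for (i = 2; i < NF; i++) {
                if ($i == "start") start = $(i + 1) + 0
                if ($i == "Id" || $i == "type") id = tolower($(i + 1))
            }
            # Partition number = trailing digits of the device name.
            num = $1; sub(/^.*[^0-9]/, "", num)
            if (id == "5" || id == "f" || id == "85") ext = start
            else if (num + 0 >= 5 && (first == "" || start < first)) first = start
        }
        END { if (ext != "" && first != "") print first - ext }'
}
```

With the table above, `sfdisk -d /dev/sda | extended_gap` should print 1. Judging by the name of the parted test linked above, anything below 2 is probably worth a warning, but that threshold is my guess.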
So what is really happening?? FOG zaps the old partition table on disk (sgdisk -Z), restores the MBR (dd), creates the partitions (sfdisk) and then forces the kernel to re-read the partition table (partprobe). Partprobe doesn’t like the partition scheme and leaves the kernel with an only half-populated list of partitions. It looks like this:
[CODE]# ls -al /dev/sda*
brw------- 1 root root 8, 0 Jan 21 03:33 /dev/sda
brw------- 1 root root 8, 1 Jan 21 03:33 /dev/sda1
brw------- 1 root root 8, 2 Jan 21 03:33 /dev/sda2
brw------- 1 root root 8, 3 Jan 21 03:33 /dev/sda3
brw------- 1 root root 8, 5 Jan 21 03:33 /dev/sda5
brw------- 1 root root 8, 6 Jan 21 03:33 /dev/sda6
major minor #blocks name
8 0 241107738 sda
8 1 103398 sda1
8 2 128767958 sda2
8 3 1 sda3
8 5 75167069 sda5
8 6 37069312 sda6
Error: Error informing the kernel about modifications to partition
/dev/sda5 – Device or resource busy. This means Linux won’t know
about any changes you made to /dev/sda5 until you reboot – so you
shouldn’t mount it or use it in any way before rebooting.
Error: Failed to add partition 5 (Device or resource busy)
brw------- 1 root root 8, 0 Jan 21 03:33 /dev/sda
brw------- 1 root root 8, 1 Jan 21 03:33 /dev/sda1
brw------- 1 root root 8, 2 Jan 21 03:33 /dev/sda2
brw------- 1 root root 8, 3 Jan 21 03:33 /dev/sda3
major minor #blocks name
8 0 241107738 sda
8 1 103398 sda1
8 2 128767958 sda2
8 3 1 sda3
#[/CODE]
The partition table on disk is still alright. It’s just the kernel not knowing about it! There are several commands (sfdisk -R, hdparm -z, blockdev --rereadpt) which make the kernel re-read this information. For example if I run ‘hdparm -z /dev/sda’ I get sda5 and sda6 back and can run partclone (by hand) without any further trouble!
I know this is a very special (and stupid) problem and I really hope that no one else runs into it. [B]But could FOG protect people from this by using a tool other than partprobe to make the kernel re-read the partition information???[/B]
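What I have in mind is roughly the following fallback chain. It is a sketch only: the function name is made up, and it assumes partprobe returns a non-zero exit status when it prints the error shown above, which I have not verified.

```shell
#!/bin/sh
# Hypothetical helper, not part of FOG: try several tools in turn until one
# of them reports success in making the kernel re-read the partition table.
reread_partition_table() {
    dev="$1"
    partprobe "$dev" 2>/dev/null && return 0
    blockdev --rereadpt "$dev" 2>/dev/null && return 0
    hdparm -z "$dev" 2>/dev/null && return 0
    return 1
}
```

Called as `reread_partition_table /dev/sda` right after sfdisk has written the table. Keeping partprobe first means nothing changes for layouts that already work.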
[QUOTE]Booted up the client machine and it hangs on the FOG screen.[/QUOTE]
Hard to say without having more details. Could you send a screenshot or describe a little more? Do you get the Boot Menu if there is no task scheduled? Are you able to perform the registration from this menu? What exactly do you see right before it ‘hangs’? e.g. ‘Checking Operating System’ or ‘bzImage’ or ‘iPXE’ or …?
Same issue on my machine (Debian 7.6). As Tom said, it’s got to do with services depending on others but being started too early. I am pretty sure the FOG init scripts (which call php scripts) fail because mysql is not coming up fast enough!
My fix is to remove all FOG init scripts from normal startup and call them in rc.local:
[CODE]# update-rc.d FOGMulticastManager remove
sleep 5
service FOGMulticastManager start
service FOGScheduler start
service FOGImageReplicator start
[/CODE]
Not perfect but definitely working.
Maybe sort of a retry (x times) and wait loop could be implemented either in PHP or the init scripts…
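For the init script variant, a generic retry helper might be enough. This is just a sketch with a made-up function name; using ‘mysqladmin ping’ as the readiness check is my assumption, not something FOG does today.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <tries> times, sleeping <delay>
# seconds after each failed attempt. Returns 0 as soon as the command succeeds.
# Usage: retry <tries> <delay> <command> [args...]
retry() {
    tries="$1"; delay="$2"; shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$tries" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}
```

In rc.local this could replace the fixed sleep, e.g. `retry 10 3 mysqladmin ping --silent && service FOGMulticastManager start`.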
I can confirm fractal13’s findings and I am pretty sure it’s still an issue in current SVN (r2908).
Our setup is kind of similar, having Windows 7 AND logical partitions.
sda1: primary NTFS Windows 7 (100 MB boot partition)
sda2: primary NTFS Windows 7 (C: partition)
sda3: extended partition
sda5: logical NTFS (D: data)
sda6: logical NTFS (E: ubuntu in an image file using WUBI)
Kind of a weird partition layout I know but it’s what we got right now… As we use WUBI we don’t have GRUB installed right on disk! So I don’t need to worry about that and therefore choose OSID 5 (we run Linux yes but from FOG’s point of view it looks like Windows only).
I fixed the problem for us with this very simple change in fog.upload:
[CODE]--- src/buildroot/package/fog/scripts/bin/fog.upload (Revision 2908)
+++ src/buildroot/package/fog/scripts/bin/fog.upload (Arbeitskopie)
@@ -239,7 +239,7 @@
 fi
 elif [ "$imgType" == "mps" ]; then
 hasgpt=`hasGPT $hd`;
-if [ "$hasgpt" == "0" -a "$osid" == "50" ]; then
+if [ "$hasgpt" == "0" ]; then
 have_extended_partition=`sfdisk -l $hd 2>/dev/null | egrep "^${hd}.* (Extended|W95 Ext'd \(LBA\))$" | wc -l`;
 else
 have_extended_partition="0";
[/CODE]
Although I have to admit that I CANNOT test if this is working for everyone else (other Windows or Mac OS X) I wonder why extended/logical partitions should only work with Linux (OSID 50)??
Would be great if others could test this “fix” and provide information if this works in their environment.
Thank you!