FOG 1.5.6: Auto resize is unpredictable
-
@Quazz said in FOG 1.5.6: Auto resize is unpredictable:
@Cheetah2003 So did your recovery partition have no label at all or what was it exactly? (should clarify that label is basically the user editable “name” of the partition such as “System Reserved”)
If it didn’t have a label, then I’m kind of confused about how it wouldn’t get resized since it most certainly should have.
It has a label. I put the label on when I create the partition. It’s just called “RECOVERY”. However, perhaps you’re confused. “System reserved” would be a partition type. That is too long for a label anyway. It doesn’t seem like FOG is even looking at the volume labels. They’re not in any of the files. For example, the operating system partition’s label is “OS”, but you don’t see that anywhere either. The ‘names’ you’re seeing in those two files are the partition types. If you fire up ‘fdisk’ in Linux and change the type of a partition, you’ll note the UUID types line up precisely with the names FOG has in those two files. So I dunno what it’s doing there; the ‘name’ field seems redundant, since it’s just a text representation of the UUID partition type.
I will also clarify why I said ‘incorrect’ in my previous post. I was referring to the prior version of FOG doing anything based on a volume label. It did not. As I explained, in my original setup I put my Recovery partition LAST on the disk. It would get resized, and the operating system partition directly before it would not. So I moved my recovery partition to be BEFORE the operating system partition. This fixed my issue in the previous version of FOG: it would only attempt to resize the last partition, regardless of its label. Now it’s resizing both of them.
@Quazz said in FOG 1.5.6: Auto resize is unpredictable:
There’s actually something very strange in this case, which is that your partitions and min.partitions file is the same. (maybe you accidentally pasted the wrong one?)
Yes, I thought that looked strange and I did recheck I was pasting the right file. Alas, that is 100% accurate. That’s what was in my fog server’s folder.
EDIT: I notice there is a difference between the files now. But I swear, they were the same before. This is several captures later, so I dunno. I’ve been trying to debug another issue and have recaptured that image several times.
On another note: I’m not sure why I can’t get someone to answer this for me: What is the expected behavior when you have more than one resizeable partition?
-
@Quazz said in FOG 1.5.6: Auto resize is unpredictable:
There was some discussion about which direction to go forward in and ideally we’d do things in a more clever way, but for the time being we landed on using partition flags (such as boot, hidden, reserved) to determine whether or not they should be attempted to resized.
Me, I’m rather unconcerned with auto-detection of partition types. It’s going to be dubious at best, under any circumstance. This is why I started this thread with the suggestion: We need to be able to specify how to handle partition resizing in the image specification, for better control over the behavior.
So putting aside the detection or lack thereof… a manual scheme would be wonderful. This is what I’ve been advocating for all along. I’ve already figured out how to ‘trick’ FOG into leaving my recovery partition alone. I mentioned that in the very first post, as well.
And I just have to ask again: What should it be doing when it encounters more than one resizeable partition? How does it decide how to expand these on a target disk? No one seems interested in answering this?
If I capture 5 partitions, and 2 are auto-resized… how should I expect them to be put onto a target disk? This is where the unpredictability comes in. In my experience, in that scenario, the target disk layout is pretty much random. Sometimes I get totally minimal partitions on both of them, sometimes it expands one to some odd size, and leaves the other one at minimum. Sometimes it expands the last partition to fill the entire disk. It’s completely unpredictable! So what should it actually be doing?
I’m asking all this stuff, cuz at the end of the day… I actually would like my recovery partition resized to minimum size and left that way on the target disk. It’s not supposed to be written to ever again anyway. But until I understand how the resize mechanisms actually work, how to control their behavior, I’m pretty much stuck with an unpredictable result.
-
@Cheetah2003 I am totally with you that the “auto-detection” and the resizing algorithms are not perfect at all! It’s quite difficult, if not impossible, to suit every situation/partition layout. So on the one hand I do understand you are asking for more control over this via settings. On the other hand we have had this in place for many years now and were mostly able to help people make it work for them, despite its drawbacks!
I’d be very happy to have a couple of weeks to sit down and re-write the whole resize and partition layout handling code and make it all more robust and customizable. But right now the FOG dev team is very small and I myself hardly have enough time to answer all questions and fix things for people in the forums and on GitHub. Can’t see me re-doing the partition handling or even just adding the requested settings to the image definition.
Don’t get me wrong. I am not saying we shouldn’t do it. I’d love to but we just won’t find the time for such a major change (I reckon it is). Would you be willing to dive in and work on this? I’ll surely assist as much as I can.
What should it be doing when it encounters more than one resizeable partition? How does it decide how to expand these on a target disk? No one seems interested in answering this?
Please go ahead, dive into the code and find out. I have played with this part of FOG a fair bit but it’s not code I wrote and I have never got to the point where I would say that I fully understand it. But from what I have figured out from playing with and reading the code, I’d really wonder if this has random behavior. Sure, it does not always do what I expect, but I have never seen a single partition layout being deployed randomly. But hey, what do I know.
Start out here: https://github.com/FOGProject/fos/blob/master/Buildroot/board/FOG/FOS/rootfs_overlay/usr/share/fog/lib/procsfdisk.awk
-
@Sebastian-Roth said in FOG 1.5.6: Auto resize is unpredictable:
Please go ahead, dive in the code and find out.
Yeah, I understand you’re just an open source project, and most likely short handed. If time permits, I would love to study under the hood and figure out how it all works.
But like everyone else, I have to survive first. So we’ll see. I have a non-production FOG server on my desktop at home already (I set that up to get all those screenshots for you guys!), so maybe as time and interest permit, I will do just that!
And I apologize for the bold. Just been asking that question ALL WEEK and never got a response until I asked again in bold. So… hard to buy ‘shouting not needed’ when that’s what finally got a reply.
Anyway, thanks for all your help, I do appreciate it!
-
@Cheetah2003 said:
Just been asking that question ALL WEEK and never got a response until I asked again in bold.
I get your point here. But you need to understand as well that this is not about how often and how loudly you ask but simply about how much time and willingness we/I have to answer all questions. I tend to delay questions that I can’t answer off the top of my head without letting people know. Maybe not the nicest way, so sorry for that.
I got a little bit more time today than I had during this very busy week. So I will try to give you some tools to look into this.
- Using a host/client in debug mode is usually the best way, because there might be minor differences in tool versions and functionality compared to a day-to-day Linux system that show up when you make extensive use of those tools, as we do in the scripts. -> Schedule a debug task (deploy or capture, whichever you are after) and boot the client until you get to the shell.
- Working on the client/host console really sucks, so using an SSH connection to be able to copy & paste stuff is really helpful! -> When the client/host is up and on the shell, run the passwd command to set a root password and ip a s to get the IP address if you don’t already know it. With that you should be able to connect to that client via SSH.
- Preparing the environment, like mounting the NFS share from your FOG server, might still be inconvenient. -> Start out the task using the command fog, step to the point where the NFS share is mounted and the directory prepared (e.g. an upload task creates the directory to upload to) and then cancel the whole thing (ctrl+c) to get back to the shell and work on it.
- The most important piece of the resize logic is understanding the partition layouts. There is no way around reading and understanding as much as you can of d1.partitions (simply the sfdisk -d /dev/... output of the original layout), d1.minimum.partitions (same as before but after it was shrunk down) and d1.fixed_size_partitions (enumeration of partitions that won’t be resized or moved).
- Now if you want to play with the partition resize logic, I suggest you run that magic script manually. Follow the above steps and when you have things ready you can simply run the following command to let it calculate the partition table it would use to deploy to a target disk:
/usr/share/fog/lib/procsfdisk.awk -v SECTOR_SIZE=512 -v CHUNK_SIZE=512 -v MIN_START=2048 -v action=filldisk -v target=/dev/sda -v sizePos=187136208 -v diskSize=187136208 -v fixedList=1:2 /images/d1.minimum.partitions /images/d1.partitions
Hints:
- SECTOR_SIZE and CHUNK_SIZE are 512 in pretty much all cases. New disks with a real 4096 sector size are around but we have not seen many of those yet.
- MIN_START is the start sector of the first partition, quite often 2048 but it can be different!
- action=filldisk is just what you usually want.
- diskSize is the full sector count of the target disk you want to deploy to. You can find out the sector count by running sfdisk -d /dev/... on the target disk and looking at the last-lba line.
- fixedList is the list of partitions that should not be touched.
- And finally you tell it to read the rest of the information from d1.minimum.partitions and d1.partitions.
Running the command will print out a new partition layout that FOS would use to deploy to the target disk.
I’d say, play with this for a bit and let us know what you find. I am fairly sure there are things in this that don’t add up. Possibly you’ll even find a bug in there that we have not come across over all the years.
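For the diskSize hint, here is a small sketch of pulling the last-lba value out of an sfdisk dump instead of reading it off by eye. The helper name last_lba_from_dump is made up for illustration and is not part of FOG, and the sample header reuses values posted in this thread. It assumes a GPT dump, which carries a last-lba line.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): print the last-lba value
# from an sfdisk dump read on stdin.
last_lba_from_dump() {
    # Header lines look like "last-lba: 257228766"; split on ": " and
    # print the value of the first line whose key is exactly "last-lba".
    awk -F': *' '$1 == "last-lba" { print $2; exit }'
}

# Inline sample header so the sketch runs anywhere; on a real client
# you would pipe in the output of: sfdisk -d /dev/sda
sample_dump='label: gpt
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 257228766'

disk_size=$(printf '%s\n' "$sample_dump" | last_lba_from_dump)
echo "diskSize=$disk_size"
```

The resulting value is what you would pass as -v diskSize=... (and sizePos, which the scripts set to the same value) when running procsfdisk.awk by hand.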
-
@Sebastian-Roth Out of curiosity I tried that out and got some strange results. (resizable gets early “start positions”) I then ran
gawk /usr/share/fog/lib/procsfdisk.awk --lint -v SECTOR_SIZE=512 -v CHUNK_SIZE=512 -v MIN_START=2048 -v action=filldisk -v target="/dev/sda" -v diskSize=187136208 -v fixedList="1:2" d1.minimum.partitions d1.partitions
(gawk has a --lint option, apparently)
This threw a slew of warnings. I haven’t had the opportunity to go through them (I suspect most of them are irrelevant), but figured I’d mention this option, as it could help in figuring this out.
EDIT: Aside from two (minor) issues (such as line 545 (unquoted gpt label check)), I couldn’t find anything through the lint option personally. (Although I do wonder why non-fixed partitions seem to not be passed their original size. EDIT2: figured out this bit, corrected it, but still strange output, will paste below.)
# Partition table is consistent.
label: gpt
label-id: 6D7D4E9F-F276-4554-945E-D42EF1DB667D
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 1871362046
/dev/sda1 : start= 1870042624, size= 1083392, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC, uuid=0E09A256-6313-43EA-9C45-1BDB234A17A3, name="Basic data partition", attrs="RequiredPartition GUID:63"
/dev/sda2 : start= 1871126016, size= 202752, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=E004F3EB-3497-45AC-8BC2-40BF62ECF868, name="EFI system partition", attrs="GUID:63"
/dev/sda3 : start= 1871328768, size= 32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE, uuid=B4553686-67E2-4177-BC7D-AC092860D2CF, name="Microsoft reserved partition", attrs="GUID:63"
/dev/sda4 : start= 2048, size= 188101120, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=0005FBFB-A630-456B-9938-D501F6F70B00, name="Basic data partition"
/dev/sda5 : start= 188103168, size= 1681939456, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=5F76FA4B-76D2-43B0-8ECB-F3EB8596E490, name="Basic data partition"
Problem appears to stem from the for loop at line 562
-
@Quazz The output looks really strange. Three things I noticed:
- For whatever reason I have missed one parameter: sizePos=... As far as I can see it shouldn’t be relevant in the case where we use action=filldisk, but in the scripts it’s set to the same value as the disk size. So you might try adding it and see if it makes a difference (I updated my post).
- Again something I might have messed up: I used quotes for the two parameters target and fixedList although they are not quoted in the original scripts. It shouldn’t make a difference but we’ll see.
- You are using gawk. Is this for a good reason? Do you run the script on a FOS machine or some other Linux OS?
-
@Sebastian-Roth Corrected the cli as per the info given, same results however.
I am using gawk because I’m running the tests on my CentOS machine, and if I don’t explicitly call gawk it will run awk (which for some reason isn’t symlinked to gawk on this system), which lacks a variety of the features required by the script (and lint).
Forcing a skip in the for loop at line 563 makes the output look more normal, but then the partition table doesn’t make sense since the starts aren’t properly recalculated.
My gawk version seems to be an older version than the one Buildroot has been running for a while, so I’ll see if a newer version delivers better output.
Gawk version was the issue indeed. (was 4.0.2, now 4.2.1)
New output:
# Partition table is consistent.
label: gpt
label-id: 6D7D4E9F-F276-4554-945E-D42EF1DB667D
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 1871362046
/dev/sda1 : start= 2048, size= 1083392, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC, uuid=0E09A256-6313-43EA-9C45-1BDB234A17A3, name="Basic data partition", attrs="RequiredPartition GUID:63"
/dev/sda2 : start= 1085440, size= 202752, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=E004F3EB-3497-45AC-8BC2-40BF62ECF868, name="EFI system partition", attrs="GUID:63"
/dev/sda3 : start= 1288192, size= 32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE, uuid=B4553686-67E2-4177-BC7D-AC092860D2CF, name="Microsoft reserved partition", attrs="GUID:63"
/dev/sda4 : start= 1320960, size= 188101120, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=0005FBFB-A630-456B-9938-D501F6F70B00, name="Basic data partition"
/dev/sda5 : start= 189422080, size= 1681939456, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=5F76FA4B-76D2-43B0-8ECB-F3EB8596E490, name="Basic data partition"
By the way, @Cheetah2003, looking over the code, I think what it tries to do is assign the new size as the same percentage as the partition took up on the previous image, if the partition is resizable. At least that’s what the intention is supposed to be.
The output here looks valid and more or less what we expect the script in its current iteration to do; but I could be missing something of course.
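To make the "same percentage" idea concrete, here is a deliberately simplified sketch. It is one plausible reading of the description above, not the logic of procsfdisk.awk: fixed partitions keep their size, and the space left on the target is split among the resizable partitions in proportion to their original sizes. The function name scale_size and all numbers (in sectors) are made up for illustration.

```shell
#!/bin/sh
# Simplified illustration only, not the procsfdisk.awk algorithm.
# scale_size ORIG_SIZE ORIG_RESIZABLE_TOTAL TARGET_FREE
# prints the share of TARGET_FREE that matches ORIG_SIZE's share of the total.
scale_size() {
    echo $(( $1 * $3 / $2 ))
}

fixed_sizes="1000 500"        # hypothetical partitions that must not change
resizable_sizes="6000 2000"   # hypothetical partitions that may grow
target_disk=20000             # hypothetical usable sectors on the target

fixed_total=0
for s in $fixed_sizes; do fixed_total=$(( fixed_total + s )); done

orig_total=0
for s in $resizable_sizes; do orig_total=$(( orig_total + s )); done

# Space left for the resizable partitions once the fixed ones are placed.
target_free=$(( target_disk - fixed_total ))

new_sizes=""
for s in $resizable_sizes; do
    new_sizes="$new_sizes $(scale_size "$s" "$orig_total" "$target_free")"
done
new_sizes=${new_sizes# }
echo "$new_sizes"
```

With these numbers the two resizable partitions keep their 3:1 ratio and together fill the 18500 free sectors. A real implementation would also have to align start sectors and respect each partition's minimum size, which this sketch ignores.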
-
@Quazz said in FOG 1.5.6: Auto resize is unpredictable:
Gawk version was the issue indeed. (was 4.0.2, now 4.2.1)
Not that I really expected this but had a feeling somehow that using some other environment could give different results. Thanks for testing and verifying.
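For anyone else testing outside FOS, a quick sketch of checking the local gawk version first. The 4.2.1 threshold is only the version reported to work in this thread, not a documented minimum, and version_at_least is a made-up helper name. It relies on sort -V (version sort) from GNU coreutils.

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeeds when HAVE is the same as or newer than WANT.
version_at_least() {
    # After a version sort, the lowest of the two must be WANT.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Assumption: taken from this thread (4.0.2 misbehaved, 4.2.1 did not).
known_good="4.2.1"

if command -v gawk >/dev/null 2>&1; then
    # The first line looks like "GNU Awk 4.2.1, API: 2.0 ...";
    # take the third field and drop a trailing comma if present.
    have=$(gawk --version | awk 'NR == 1 { sub(/,$/, "", $3); print $3 }')
    if version_at_least "$have" "$known_good"; then
        echo "gawk $have: at least the version that gave sane output here"
    else
        echo "gawk $have: older than $known_good, results may differ from FOS"
    fi
else
    echo "gawk not found"
fi
```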
-
@Cheetah2003 Are you still keen to look into this?
-
@Sebastian-Roth said in FOG 1.5.6: Auto resize is unpredictable:
@Cheetah2003 Are you still keen to look into this?
I’d be happy to. What do you want me to do?
Also, for what it’s worth, I’m not sure multi-partition resizing is really necessary. I can’t really think of any use cases for this ‘feature.’
The percentage thing described earlier sounds pretty dubious, especially if you’re capturing 5 partitions from a 50GB disk… and the recovery partition is 20% of that space (10GB)… you don’t need that taking 20% of a target drive. That would be kinda crazy.
So really, IMHO, a percentage of the original drive captured from seems kinda not-useful. I still think this should be controllable entirely from the image specification. But I think that would require the image specification to actually pull info out of the captured image to offer the user options for how to handle the partitions contained within that image. Probably a pretty big rewrite of that entire part of the system. I’d love to see this, but yeah, it’s going to be a big task from my perspective.
So I’ll be happy to peek/test whatever you need help with, as time permits, but I’m a little unsure of the goal.
-
@Cheetah2003 A couple of posts further up (four days ago) I offered instructions on how to manually run the re-size calculation script. This is a good start to play with and get to see how this is all working. I am fairly sure this is not without flaws and it would be great if you are keen to look into it and report what you find.
-
I’m just joining in here, but we are seeing something like the same problem. FOG 1.5.6. We have Dell 3430s that we are getting ready for deployment this fall.
A 3430 is a new model for us, and the first we have that doesn’t let you have an MBR boot disk, just GPT. Got everything working with a GPT clonemaster, which for various ugly reasons has partitions like this:
[root@fog clonemaster10-lab-gpt]# cat d1.minimum.partitions
label: gpt
label-id: 701D9ABD-7D9A-11E9-B9AE-5254009E1079
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 257228766
/dev/sda1 : start= 2048, size= 1124352, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=701D9AB9-7D9A-11E9-B9AE-5254009E1079
/dev/sda2 : start= 1126400, size= 234728416, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=701D9ABA-7D9A-11E9-B9AE-5254009E1079
/dev/sda3 : start= 255332352, size= 204800, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=701D9ABB-7D9A-11E9-B9AE-5254009E1079, name="attrs=\x22GUID:63"
sda2 is the real Windows 10 partition…
cat d1.partitions
label: gpt
label-id: 701D9ABD-7D9A-11E9-B9AE-5254009E1079
device: /dev/sda
unit: sectors
first-lba: 34
last-lba: 257228766
/dev/sda1 : start= 2048, size= 1124352, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=701D9AB9-7D9A-11E9-B9AE-5254009E1079
/dev/sda2 : start= 1126400, size= 254204148, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=701D9ABA-7D9A-11E9-B9AE-5254009E1079
/dev/sda3 : start= 255332352, size= 204800, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=701D9ABB-7D9A-11E9-B9AE-5254009E1079, attrs="GUID:63"
cat d1.fixed_size_partitions
:3:3
All seemed great till we tried it on some new machines to deploy and after fog/oobe/namechange/domainjoin SOME of them wouldn’t let anyone log in. Turns out the middle partition didn’t get extended correctly in some cases so Windows was out of disk.
I can multicast to 4 identical machines and on 3 of them /dev/sda2 gets resized correctly, but one it doesn’t. And the one it fails on is not always the same… Funky eh?
When I do a debug deploy with ismajordebug=9 it always works…
Was going to go digging into my memory to rebuild a init.xz that has ismajordebug=9 next. See if that makes 4 host multicast work. Or points to the problem.
Oh, and a manual run of /usr/share/fog/lib/procsfdisk.awk in debug mode seems to be producing the correct output.
Vaguely wondering if $tmp_file2 is getting hosed somehow before fillSfdiskWithPartitions calls applySfdiskPartitions… But like I said, I cannot get the problem to replicate in majordebug mode yet.
Would be glad to instrument out fog.download in any way you suggest.
More tomorrow if I find anything useful.
E
-
@Cheetah2003 OS-built recovery partitions have partition flags that keep them at a fixed size.
The reason for multi partition resize is in case you have your normal partition layout (fixed size 1-3 + Windows partition) and an additional data partition.
e.g.
/dev/sda1 200MB
/dev/sda2 800MB
/dev/sda3 200MB
/dev/sda4 30GB
/dev/sda5 200GB
You can’t automagically know which of the last 2 partitions to resize and which to ignore. Windows needs room to breathe, but if you deploy this to a 2TB drive then having a 1.8TB Windows partition and a 200GB data partition feels silly.
I agree that the current method isn’t good enough, of course, but it’s not without its logic.
Back to the topic of trying to figure this (this being why sometimes partitions don’t resize) out, as far as I can tell, these resize issues only occur on GPT based layouts.
I’ll be looking over partition-funcs.sh in that sort of a direction.
-
@Quazz Do we know enough about the problem yet to say “the problem started with Windows 10 version XXXX”? I’m a bit surprised that, if this is a GPT disk layout issue, we haven’t had this problem before now. Or is it related to changes in FOS that caused this issue to come up (like building FOS from a newer release of Buildroot causing packages to be updated)?
-
@george1421 There were some changes to GPT related stuff, not a lot, but some.
I also think I remember a case where an existing image only started showing odd issues after updating FOG, so I’m currently leaning towards FOS, especially since I have experienced no problems on the latest Windows 10 versions at all.
So I’m guessing there’s something funky going on under certain conditions, but not sure what. Given the ambiguity it might not even have anything to do with GPT, but since those were the only relevant changes to the files currently being examined it seems the most likely path all the same.
-
I think I have found HOW the problem (or at least my problem) is happening, but still not clear on WHY…
/usr/share/fog/lib/partition-funcs.sh line 76 in restoreSfdiskPartitions is where the resize occurs:
sfdisk $disk < $file >/dev/null 2>&1
[[ ! $? -eq 0 ]] && majorDebugEcho "sfdisk failed in (${FUNCNAME[0]})"
$file is an sfdisk input built in processSfdisk via /usr/share/fog/lib/procsfdisk.awk and stored in $tmp_file2 = /tmp/sfdisk2.$$
But if $tmp_file2 is empty, $? from that sfdisk is still 0 (i.e. a silent error). I found this via testing in a debug deploy.
Not sure why /tmp/sfdisk2.$$ is ending up empty semi-randomly. Still tracking that down. /tmp is a tmpfs filesystem and the target machine has 16G of RAM. Doubt it is filling up…
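Going by the behaviour reported above (sfdisk exiting 0 on an empty input file), one way to surface the failure would be to check the layout file before handing it to sfdisk. This is only a sketch of such a guard, not FOG code: layout_file_ok and apply_layout are hypothetical names, and the sfdisk call simply mirrors the line quoted above.

```shell
#!/bin/sh
# Hypothetical guard (not part of FOG).
layout_file_ok() {
    # -s is true only if the file exists and has a size greater than zero.
    [ -s "$1" ]
}

# apply_layout DISK FILE: refuse to run sfdisk on a missing or empty layout file.
apply_layout() {
    disk=$1
    file=$2
    if ! layout_file_ok "$file"; then
        echo "layout file '$file' is missing or empty, not calling sfdisk" >&2
        return 1
    fi
    # Same invocation as quoted from partition-funcs.sh; its exit status
    # becomes the return value of this function.
    sfdisk "$disk" < "$file" >/dev/null 2>&1
}

# Usage (on a debug client): apply_layout /dev/sda /tmp/sfdisk2.$$ || echo "layout not applied"
```

A non-zero return here would at least turn the silent case into a visible one, which might make it easier to catch the moment the temp file ends up empty.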
-
@Eric-Johnson Just to collect a bit more data: in your FOG UI, under FOG Configuration->FOG Settings->TFTP Server->KERNEL RAMDISK SIZE, what is the value? Is it 127000? If so, does the reliability change if you set it to 255000? This ups the amount of virtual disk FOS Linux has available during imaging.
-
@Quazz said in FOG 1.5.6: Auto resize is unpredictable:
@Cheetah2003 OS-built recovery partitions have the partition flags that keeps them fixed size.
@Quazz Argh. As I said several times, this isn’t an OS-built recovery partition. I built it myself. Are you even reading my posts???
@Eric-Johnson Welcome. And yeah, what you’re describing sounds very similar to the issue I had with the previous version of FOG that required I move my recovery partition to be before the OS partition, making the OS partition last on the disk for resize to work properly.
@Sebastian-Roth Sure sure. I’ll do some experiments and report back any findings. Might be a few days, so I hope you’re not in a hurry.
-
The issue in this other thread seems to be related (maybe as a cousin) to this issue. In that thread the drive is not being expanded again after being captured by FOG.
ref: https://forums.fogproject.org/topic/13479/install-windows-error-after-capturing-image