FOG 1.5.6: Auto resize is unpredictable


  • Developer

    @Cheetah2003 Are you still keen to look into this?


  • Developer

    @Quazz said in FOG 1.5.6: Auto resize is unpredictable:

    Gawk version was the issue indeed. (was 4.0.2, now 4.2.1)

    Not that I really expected this but had a feeling somehow that using some other environment could give different results. Thanks for testing and verifying.


  • Moderator

    @Sebastian-Roth Corrected the cli as per the info given, same results however.

    I am using gawk because I’m running the tests on my CentOS machine, and if I don’t explicitly call gawk it will run awk (which for some reason isn’t symlinked to gawk on this system), which lacks a variety of the features used in the script (and lint).

    Forcing a skip in the for loop at line 563 makes the output look more normal, but then the partition table doesn’t make sense since the starts aren’t properly recalculated.

    My gawk version seems to be an older version than the one Buildroot has been running for a while, so I’ll see if a newer version delivers better output.

    Gawk version was the issue indeed. (was 4.0.2, now 4.2.1)

    New output:

    # Partition table is consistent.
    label: gpt
    label-id: 6D7D4E9F-F276-4554-945E-D42EF1DB667D
    device: /dev/sda
    unit: sectors
    first-lba: 34
    last-lba: 1871362046
    
    /dev/sda1 : start=        2048, size=     1083392, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC, uuid=0E09A256-6313-43EA-9C45-1BDB234A17A3, name="Basic data partition", attrs="RequiredPartition GUID:63"
    /dev/sda2 : start=     1085440, size=      202752, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=E004F3EB-3497-45AC-8BC2-40BF62ECF868, name="EFI system partition", attrs="GUID:63"
    /dev/sda3 : start=     1288192, size=       32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE, uuid=B4553686-67E2-4177-BC7D-AC092860D2CF, name="Microsoft reserved partition", attrs="GUID:63"
    /dev/sda4 : start=     1320960, size=   188101120, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=0005FBFB-A630-456B-9938-D501F6F70B00, name="Basic data partition"
    /dev/sda5 : start=   189422080, size=  1681939456, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=5F76FA4B-76D2-43B0-8ECB-F3EB8596E490, name="Basic data partition"
    

    By the way, @Cheetah2003, looking over the code, I think what it tries to do is give each resizable partition the same percentage of the disk that it took up on the previous image. At least that’s what the intention is supposed to be.

    The output here looks valid and more or less what we expect the script in its current iteration to do; but I could be missing something of course.
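
    To make the proportional idea above concrete, here is a minimal sketch. It assumes plain integer scaling with no alignment, so it will not reproduce the exact rounding of procsfdisk.awk, and scale_size is a made-up helper name, not something from the FOS scripts. For what it is worth, both resizable partitions in the output above grew by roughly the same factor (about 8.97) compared to the sizes in the posted d1.partitions, which fits this reading.

```shell
# scale_size OLD_SIZE OLD_TOTAL NEW_TOTAL  (hypothetical helper)
# Prints OLD_SIZE scaled so it keeps the same share of NEW_TOTAL that it
# had of OLD_TOTAL. All values are sector counts; the result is rounded down.
scale_size() {
    echo $(( $1 * $3 / $2 ))
}

# Made-up example: two resizable partitions of 100 and 300 sectors
# (400 in total) deployed into 800 sectors of resizable space.
scale_size 100 400 800   # prints 200 (25% before, 25% after)
scale_size 300 400 800   # prints 600 (75% before, 75% after)
```

    The fixed partitions would be subtracted from both totals first, so only the space that is actually up for grabs gets divided.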


  • Developer

    @Quazz The output looks really strange. Three things I noticed:

    • For whatever reason I have missed one parameter: sizePos=... - as far as I see it shouldn’t be relevant in the case where we use action=filldisk but in the scripts it’s set to the same value as disk size. So you might try and see if it makes a difference adding that (I updated my post).
    • Again something I might have messed up. I used quotes for the two parameters target and fixedList although it’s not in the original scripts. Shouldn’t make a difference but we’ll see.
    • You are using gawk - is this for a good reason? Do you run the script on a FOS machine or some other Linux OS?

  • Moderator

    @Sebastian-Roth Out of curiosity I tried that out and got some strange results. (resizable gets early “start positions”) I then ran

    gawk /usr/share/fog/lib/procsfdisk.awk --lint -v SECTOR_SIZE=512 -v CHUNK_SIZE=512 -v MIN_START=2048 -v action=filldisk -v target="/dev/sda" -v diskSize=187136208 -v fixedList="1:2" d1.minimum.partitions d1.partitions
    

    (gawk has a --lint option, apparently)

    This threw a slew of warnings. I haven’t had the opportunity to go through them (I suspect most of them are irrelevant), but figured I’d mention this option, as it could help in figuring this out.

    EDIT: Aside from two (minor) issues (such as line 545 (unquoted gpt label check)), I couldn’t find anything through the lint option personally. (Although I do wonder why non-fixed partitions seem to not be passed their original size. EDIT2: figured out this bit, corrected it, but still strange output, will paste below.)

    # Partition table is consistent.
    label: gpt
    label-id: 6D7D4E9F-F276-4554-945E-D42EF1DB667D
    device: /dev/sda
    unit: sectors
    first-lba: 34
    last-lba: 1871362046
    
    /dev/sda1 : start=  1870042624, size=     1083392, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC, uuid=0E09A256-6313-43EA-9C45-1BDB234A17A3, name="Basic data partition", attrs="RequiredPartition GUID:63"
    /dev/sda2 : start=  1871126016, size=      202752, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=E004F3EB-3497-45AC-8BC2-40BF62ECF868, name="EFI system partition", attrs="GUID:63"
    /dev/sda3 : start=  1871328768, size=       32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE, uuid=B4553686-67E2-4177-BC7D-AC092860D2CF, name="Microsoft reserved partition", attrs="GUID:63"
    /dev/sda4 : start=        2048, size=   188101120, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=0005FBFB-A630-456B-9938-D501F6F70B00, name="Basic data partition"
    /dev/sda5 : start=   188103168, size=  1681939456, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=5F76FA4B-76D2-43B0-8ECB-F3EB8596E490, name="Basic data partition"
    

    The problem appears to stem from the for loop at line 562.
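
    To spot this kind of layout quickly, a rough check like the one below can help. It is only a sketch: check_order is a hypothetical helper, and it assumes the partition lines look like the sfdisk dumps in this thread. It walks the partitions in the order they are listed and reports any that start before the previously listed one ends, which is what happens at /dev/sda4 in the output above.

```shell
# check_order: read an sfdisk dump on stdin and report every partition that
# starts before the previously listed partition ends.
# (Hypothetical helper, not part of the FOS scripts.)
check_order() {
    # Reduce each partition line to "device start size", then compare
    # each start against the end (start + size) of the line before it.
    sed -n 's/^\([^ ]*\) : start= *\([0-9]*\), size= *\([0-9]*\),.*/\1 \2 \3/p' |
    awk '
        NR > 1 && $2 < prev_end { print $1 " starts before the previous partition ends" }
        { prev_end = $2 + $3 }
    '
}

# Two lines cut down from the output above.
printf '%s\n' \
    '/dev/sda3 : start=  1871328768, size=       32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE' \
    '/dev/sda4 : start=        2048, size=   188101120, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7' |
    check_order
# prints: /dev/sda4 starts before the previous partition ends
```

    A layout where every partition follows the previous one produces no output at all.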


  • Developer

    @Cheetah2003 said:

    Just been asking that question ALL WEEK and never got a response until I asked again in bold.

    I get your point here. But you need to understand as well that this is not about how often and how loud you ask but simply about how much time and willingness we/I have to answer all questions. I tend to delay questions that I can’t answer off the top of my head without letting people know. Maybe not the nicest way, so sorry for that.

    I got a little bit more time today than I had during this very busy week. So I will try to give you some tools to look into this.

    1. Using a host/client in debug mode is usually the best way because there might be minor differences in tool versions and functionality compared to a day-to-day Linux system, and those show up when you make extensive use of the tools as we do in the scripts. -> Schedule a debug task (deploy or capture, whichever you are after) and boot the client till you get to the shell.
    2. Working on the client/host console really sucks. So using SSH connection to be able to copy&paste stuff is really helpful! -> When the client/host is up and on the shell run passwd command to set a root password and ip a s to get the IP address if you don’t already know. With that you should be able to connect to that client via SSH.
    3. Preparing the environment, like mounting the NFS share from your FOG server, might still be inconvenient. -> Start out the task using the command fog, step to the point where the NFS share is mounted and the directory prepared (e.g. an upload task creates the directory to upload to) and then cancel the whole thing (ctrl+c) to get back to the shell and work on it.
    4. The most important piece of the resize logic is understanding the partition layouts. There is no way around it other than starting to read and understand as much of d1.partitions (simply sfdisk -d /dev/... output of the original layout), d1.minimum.partitions (same as before but after it was shrunk down) and d1.fixed_size_partitions (enumeration of partitions that won’t be resized or moved).
    5. Now if you want to play with the partition resize logic I suggest you run that magic script manually. Follow the above steps and when you have things ready you can simply run the following command to let it calculate the partition table it would use to deploy to a target disk:
    /usr/share/fog/lib/procsfdisk.awk -v SECTOR_SIZE=512 -v CHUNK_SIZE=512 -v MIN_START=2048 -v action=filldisk -v target=/dev/sda -v sizePos=187136208 -v diskSize=187136208 -v fixedList=1:2 /images/d1.minimum.partitions /images/d1.partitions
    

    Hints: SECTOR_SIZE and CHUNK_SIZE are 512 in pretty much all cases. New disks with a real 4096 sector size are around but we have not seen many of those yet. MIN_START is the start sector of the first partition, quite often 2048 but it can be different! action=filldisk is just what you usually want. diskSize is the full sector count of the target disk you want to deploy to - you can find out the sector count by running sfdisk -d /dev/... on the target disk and looking at the last-lba line. fixedList is the list of partitions that should not be touched. And finally you tell it to read the rest of the information from d1.minimum.partitions and d1.partitions. Running the command will print out a new partition layout that FOS would use to deploy to the target disk.
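
    As a small aid for the diskSize hint, the following sketch pulls the last-lba value out of an sfdisk dump. last_lba is a hypothetical helper name (not part of the FOS scripts), and the sample lines are cut down from the d1.partitions dump posted in this thread.

```shell
# last_lba: print the last-lba value of an sfdisk dump read from stdin.
# (Hypothetical helper, not part of the FOS scripts.)
last_lba() {
    awk -F': *' '$1 == "last-lba" { print $2; exit }'
}

# Header lines taken from the d1.partitions dump in this thread.
printf 'label: gpt\nunit: sectors\nfirst-lba: 34\nlast-lba: 209715166\n' | last_lba
# prints 209715166
```

    On the target machine you would feed it the live table instead, e.g. sfdisk -d /dev/sda | last_lba, and pass the result as diskSize.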

    I’d say, play with this for a bit and let us know what you find. I am fairly sure there are things in this that don’t add up. Possibly you’ll even find a bug in there that we have not come across over all the years.



  • @Sebastian-Roth said in FOG 1.5.6: Auto resize is unpredictable:

    Please go ahead, dive in the code and find out.

    Yeah, I understand you’re just an open source project, and most likely short handed. If time permits, I would love to study under the hood and figure out how it all works.

    But like everyone else, I have to survive first. So we’ll see. I already have a non-production FOG server on my desktop at home (I set that up to get all those screen shots for you guys!), so maybe as time and interest permits, I will do just that!

    And I apologize for the bold. Just been asking that question ALL WEEK and never got a response until I asked again in bold. So… hard to buy ‘shouting not needed’ when that’s what finally got a reply.

    Anyway, thanks for all your help, I do appreciate it!


  • Developer

    @Cheetah2003 I am totally with you that the “auto-detection” and the resizing algorithms are not perfect at all! It’s quite difficult, if not impossible, to suit every situation/partition layout. So on the one hand I do understand you are asking for more control over this via settings. On the other hand we have had this in place for many years now and were mostly able to help people make it work for them despite its drawbacks!

    I’d be very happy to have a couple of weeks to sit down and re-write the whole resize and partition layout handling code and make it all more robust and customizable. But right now the FOG dev team is very small and I myself have hardly enough time to answer all questions and fix things for people in the forums and on github. Can’t see me re-doing the partition handling or even just adding the asked settings in the image definition.

    Don’t get me wrong. I am not saying we shouldn’t do it. I’d love to but we just won’t find the time for such a major change (I reckon it is). Would you be willing to dive in and work on this? I’ll surely assist as much as I can.

    What should it be doing when it encounters more than one resizeable partition? How does it decide how to expand these on a target disk? No one seems interested in answering this?

    Please go ahead, dive in the code and find out. I have played with this part of FOG a fair bit but it’s not code I wrote and I have never got to the point where I would say that I fully understand it. But from what I have figured out from playing with and reading the code I’d really wonder if this has random behavior. Sure it does not always do what I expect but I never got to the point where I have seen a single partition layout to be deployed randomly. But hey, what do I know.

    Start out here: https://github.com/FOGProject/fos/blob/master/Buildroot/board/FOG/FOS/rootfs_overlay/usr/share/fog/lib/procsfdisk.awk



  • @Quazz said in FOG 1.5.6: Auto resize is unpredictable:

    There was some discussion about which direction to go forward in and ideally we’d do things in a more clever way, but for the time being we landed on using partition flags (such as boot, hidden, reserved) to determine whether or not a resize should be attempted.

    Me I’m rather unconcerned with auto-detection of partition types. It’s going to be dubious at best, under any circumstance. This is why I started this thread with the suggestion: We need to be able to specify how to handle partition resizing in the image specification, for better control over the behavior.

    So putting aside the detection or lack thereof… a manual scheme would be wonderful. This is what I’ve been advocating for all along. I’ve already figured out how to ‘trick’ FOG into leaving my recovery partition alone. I mentioned that in the very first post, as well.

    And I just have to ask again: What should it be doing when it encounters more than one resizeable partition? How does it decide how to expand these on a target disk? No one seems interested in answering this?

    If I capture 5 partitions, and 2 are auto-resized… how should I expect them to be put onto a target disk? This is where the unpredictability comes in. In my experience, in that scenario, the target disk layout is pretty much random. Sometimes I get totally minimal partitions on both of them, sometimes it expands one to some odd size, and leaves the other one at minimum. Sometimes it expands the last partition to fill the entire disk. It’s completely unpredictable! So what should it actually be doing?

    I’m asking all this stuff, cuz at the end of the day… I actually would like my recovery partition resized to minimum size and left that way on the target disk. It’s not supposed to be written to ever again anyway. But until I understand how the resize mechanisms actually work, how to control their behavior, I’m pretty much stuck with an unpredictable result.



  • @Quazz said in FOG 1.5.6: Auto resize is unpredictable:

    @Cheetah2003 So did your recovery partition have no label at all or what was it exactly? (should clarify that label is basically the user editable “name” of the partition such as “System Reserved”)

    If it didn’t have a label, then I’m kind of confused about how it wouldn’t get resized since it most certainly should have.

    It has a label. I put the label when I create the partition. It’s just called “RECOVERY”. However, perhaps you’re confused. “System reserved” would be a partition type. That is too long for a label anyway. Doesn’t seem like FOG is even looking at the volume labels. They’re not in any of the files? Like the operating system partition’s label is “OS” but you don’t see that anywhere either. The ‘names’ you’re seeing in those two files are the partition types. If you fire up ‘fdisk’ in Linux and change the type of a partition, you’ll note the UUID types line up precisely with the names FOG has in those two files. So I dunno what it’s doing there, the ‘name’ field seems redundant, since it’s just a text representation of the UUID partition type.

    I will also clarify why I said ‘incorrect’ in my previous post. I was referring to the prior version of FOG doing anything based on a volume label. It did not. As I explained, in my original setup I put my Recovery partition LAST on the disk. It would get resized, and the operating system partition directly before it would not. So I moved my recovery partition to be BEFORE the operating system partition. This fixed my issue in the previous version of FOG, which would only attempt to resize the last partition, regardless of its label. Now it’s resizing both of them.

    @Quazz said in FOG 1.5.6: Auto resize is unpredictable:

    There’s actually something very strange in this case, which is that your partitions and min.partitions files are the same. (maybe you accidentally pasted the wrong one?)

    Yes, I thought that looked strange and I did recheck I was pasting the right file. Alas, that is 100% accurate. That’s what was in my fog server’s folder.

    [screenshot]
    EDIT: I notice there is a difference between the files now. But I swear, they were the same before. This is several captures later, so I dunno. I’ve been trying to debug another issue and have recaptured that image several times.

    On another note: I’m not sure why I can’t get someone to answer this for me: What is the expected behavior when you have more than one resizeable partition?


  • Moderator

    @Cheetah2003 So did your recovery partition have no label at all or what was it exactly? (should clarify that label is basically the user editable “name” of the partition such as “System Reserved”)

    If it didn’t have a label, then I’m kind of confused about how it wouldn’t get resized since it most certainly should have.

    There’s actually something very strange in this case, which is that your partitions and min.partitions files are the same. (maybe you accidentally pasted the wrong one?)
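
    A quick way to check whether the two dumps really match is to compare the size of each partition between them. The sketch below is just that, a sketch: changed_sizes is a hypothetical helper name, and it assumes both files use the sfdisk dump format shown in this thread.

```shell
# changed_sizes ORIGINAL_DUMP MINIMUM_DUMP  (hypothetical helper)
# Prints "device original_size -> minimum_size" for every partition whose
# size differs between the two sfdisk dumps. Prints nothing if all match.
changed_sizes() {
    awk '
        / : start=/ {
            size = $0
            sub(/.*size= */, "", size)   # drop everything up to the size value
            sub(/,.*/, "", size)         # drop everything after it
            if (FNR == NR) orig[$1] = size
            else if (orig[$1] + 0 != size + 0) print $1, orig[$1], "->", size
        }
    ' "$1" "$2"
}
```

    Run against the image folder on the FOG server it would look like changed_sizes d1.partitions d1.minimum.partitions; no output means no partition was shrunk at capture time.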



  • @Quazz said in FOG 1.5.6: Auto resize is unpredictable:

    I’m guessing the partition label is something like “Recovery” which is why the old inits picked it up.

    Incorrect. At least from my perspective. In the prior version of FOG (again, very sorry, don’t have the version), it would just resize the LAST partition on the disk.

    This is precisely why my procedure has me opening a gap between the operating system partition and any partitions before it, to insert a new partition with my recovery stuff on it. Because FOG always just resized only the last partition. I initially had my recovery partition as the last partition, but I ran into that problem. So I moved it, problem solved.

    This version of FOG resizes BOTH of them, unless I tweak the recovery partition’s type to look like a reserved partition.

    That said, I’m still curious what the expected behavior is supposed to be.

    What is it supposed to be doing when it encounters two or more resizable partitions?

    This is where my claim of ‘it’s unpredictable’ comes from. As I stated initially, the current behavior, regardless of how it decides which partitions should be resized or not, is an unknown. It’s not producing the same target disk layout on every deployment when deploying an image that has more than one resizable partition.


  • Moderator

    I’m guessing the partition label is something like “Recovery” which is why the old inits picked it up.

    The problem with the label system was that it was language-dependent, which in turn caused loads of issues on Windows images in languages other than the few that happened to match.

    There was some discussion about which direction to go forward in and ideally we’d do things in a more clever way, but for the time being we landed on using partition flags (such as boot, hidden, reserved) to determine whether or not a resize should be attempted.

    I’m not well versed in the deploy-time resize logic, however. I’m guessing there were scenarios where exceeding the starting point of the following resizable partition was desirable. (E.g. a Windows partition + a data partition. You wouldn’t want your Windows partition to be tiny after deploy per se.)

    It being inconsistent (or at least seemingly so) seems of greater concern. Is this occurring on different hardware? (Particularly disk size.)



  • @Sebastian-Roth Hope this helps.

    [screenshot]

    [screenshot]

    I wasn’t sure how to get out of that debug mode, so I just cancelled the task, rebooted the VM, and made a new task to capture normally.

    Here, I also took a snap of it resizing both /dev/sda4 and /dev/sda5, along with complaining about protective MBR?? I’ve always gotten that message, dunno what it means, it’s never been an issue AFAIK. ie: it usually works despite that message, at least it used to.

    [screenshot: resize of /dev/sda4 and /dev/sda5]

    d1.partitions:

    label: gpt
    label-id: 6D7D4E9F-F276-4554-945E-D42EF1DB667D
    device: /dev/sda
    unit: sectors
    first-lba: 34
    last-lba: 209715166
    
    /dev/sda1 : start=        2048, size=     1083392, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC, uuid=0E09A256-6313-43EA-9C45-1BDB234A17A3, name="Basic data partition", attrs="RequiredPartition GUID:63"
    /dev/sda2 : start=     1085440, size=      202752, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=E004F3EB-3497-45AC-8BC2-40BF62ECF868, name="EFI system partition", attrs="GUID:63"
    /dev/sda3 : start=     1288192, size=       32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE, uuid=B4553686-67E2-4177-BC7D-AC092860D2CF, name="Microsoft reserved partition", attrs="GUID:63"
    /dev/sda4 : start=     1320960, size=    20961280, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=0005FBFB-A630-456B-9938-D501F6F70B00, name="Basic data partition"
    /dev/sda5 : start=    22284288, size=   187428864, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=5F76FA4B-76D2-43B0-8ECB-F3EB8596E490, name="Basic data partition"
    

    d1.minimum.partitions:

    label: gpt
    label-id: 6D7D4E9F-F276-4554-945E-D42EF1DB667D
    device: /dev/sda
    unit: sectors
    first-lba: 34
    last-lba: 209715166
    
    /dev/sda1 : start=        2048, size=     1083392, type=DE94BBA4-06D1-4D40-A16A-BFD50179D6AC, uuid=0E09A256-6313-43EA-9C45-1BDB234A17A3, name="Basic data partition", attrs="RequiredPartition GUID:63"
    /dev/sda2 : start=     1085440, size=      202752, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=E004F3EB-3497-45AC-8BC2-40BF62ECF868, name="EFI system partition", attrs="GUID:63"
    /dev/sda3 : start=     1288192, size=       32768, type=E3C9E316-0B5C-4DB8-817D-F92DF00215AE, uuid=B4553686-67E2-4177-BC7D-AC092860D2CF, name="Microsoft reserved partition", attrs="GUID:63"
    /dev/sda4 : start=     1320960, size=    20961280, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=0005FBFB-A630-456B-9938-D501F6F70B00, name="Basic data partition"
    /dev/sda5 : start=    22284288, size=   187428864, type=EBD0A0A2-B9E5-4433-87C0-68B6B72699C7, uuid=5F76FA4B-76D2-43B0-8ECB-F3EB8596E490, name="Basic data partition"
    

    d1.fixed_size_partitions:

    :2:3:1:1:2:3
    
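
    The list above contains repeated entries and a leading colon. For reading it, or for a manual procsfdisk.awk run where the commands elsewhere in this thread pass fixedList in the form 1:2, a tidied version can be handy. The sketch below uses a hypothetical helper, normalize_fixed_list. Whether FOS itself de-duplicates or simply tolerates the raw form is not something this thread establishes.

```shell
# normalize_fixed_list LIST  (hypothetical helper)
# Turns a colon-separated partition list into a sorted list without
# duplicates or empty entries.
normalize_fixed_list() {
    printf '%s\n' "$1" |
        tr ':' '\n' |      # one entry per line
        grep -v '^$' |     # drop the empty entry left by the leading colon
        sort -n -u |       # numeric sort, duplicates removed
        paste -s -d: -     # join back together with colons
}

normalize_fixed_list ':2:3:1:1:2:3'   # prints 1:2:3
```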

  • Developer

    @Cheetah2003 said:

    I can collect the data requested, but I’m not sure it’s really going to tell you anything I haven’t already told you.

    Please give us the details. I am fairly sure I can give way better advice and maybe even change/fix the resize algorithm when I see the partition layout and flags (parted output) as is. You definitely seem to have a bit of a special setup here. Not saying that we cannot handle this but it’s probably something that we just didn’t consider when we changed that stuff some weeks ago.

    It makes working on this stuff way easier if I have a partition layout that we know is causing trouble. Otherwise I’d need to sit down and try to convert what you describe into a test partition layout that I can use to play with. Not very efficient.



  • I’m not sure how much collecting that information is going to help. These partitions are NORMAL in every respect. I am creating them myself as part of my configuring the system before capturing it.

    The partition is basically created after I sysprep my image and seal it. I use a rescue media to slide my operating system partition 10GB to the right (creating a gap between the operating system partition and all the partitions before it.) Then I create my partition, a normal Windows NTFS partition, copy my CD image to it, do some other magic so Windows can boot it as a recovery partition and run Windows setup directly.

    So. It’s not really a defect in your partition detection scheme, in fact, I’d say that is working properly, since going back in and changing the partition types from normal NTFS to hidden NTFS tells FOG to leave it alone.

    What we need is a way to specify which partitions to resize in the image spec. I’m not exactly sure how auto-resize would even work if more than one partition is being resized. How does it decide what size to make the partitions if there’s more than one?

    As for what it should do: again, the previous version (sorry, don’t have the version # handy) would only attempt to resize the last partition. This is precisely why I have my weird process of sliding my operating system partition to the right to insert a partition BEFORE it: that worked before.

    I can collect the data requested, but I’m not sure it’s really going to tell you anything I haven’t already told you.


  • Developer

    @Cheetah2003 Yes, we changed the way FOG/FOS detects the partitions to be marked as fixed vs. resizable in FOG 1.5.6. And it seems like the detection is not perfectly working yet. Please help us find and fix this.

    Let’s start with one image that you see this issue with. Don’t feel tempted to post as much information about different images in one topic. Every partition layout is a bit different and if we start to mix things up here we won’t get to the goal I fear.

    So please get the text content of the three files d1.partitions, d1.minimum.partitions and d1.fixed_size_partitions from one of your images (/images/<IMAGENAME>/... on your FOG server) and post here. As well schedule a debug capture task (through web UI as normal but just before you click create there is a checkbox for debug). Boot up the client and hit ENTER till you get to the shell. Run the command parted -l /dev/sda (might be /dev/nvme... or /dev/mmc... if you have another kind of drive in the machine), take a picture and post here. Make sure we have all the information you get on the screen in the picture as well. Most important the flags and such.

