• Recent
    • Unsolved
    • Tags
    • Popular
    • Users
    • Groups
    • Search
    • Register
    • Login

    FOG 1.5.6: Auto resize is unpredictable

    Scheduled Pinned Locked Moved Solved
    Bug Reports
    7
    57
    7.5k
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Cheetah2003C
      Cheetah2003 @Quazz
      last edited by

      @Quazz said in FOG 1.5.6: Auto resize is unpredictable:

      @Cheetah2003 OS-built recovery partitions have the partition flags that keeps them fixed size.

      @Quazz Argh. As I said several times, this isn’t a OS built recovery partition. I built it myself. Are you even reading my posts???

      @Eric-Johnson Welcome. And yeah, what you’re describing sounds very similar to the issue I had with the previous version of FOG that required I move my recovery partition to be before the OS partition, making the OS partition last on the disk for resize to work properly.

      @Sebastian-Roth Sure sure. I’ll do some experiments and report back any findings. Might be a few days, so I hope you’re not in a hurry.

      1 Reply Last reply Reply Quote 0
      • george1421G
        george1421 Moderator
        last edited by

        This other thread issue seems to be related to (maybe as a cousin) to this issue. In that thread the drive is not being expanded again after its being captured by FOG.

        ref: https://forums.fogproject.org/topic/13479/install-windows-error-after-capturing-image

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

        1 Reply Last reply Reply Quote 0
        • E
          Eric Johnson @george1421
          last edited by

          @george1421 it was indeed 127000… And of course since bigger is better I set to 511000. Will report back on the effect! Thanks!

          1 Reply Last reply Reply Quote 0
          • E
            Eric Johnson
            last edited by Eric Johnson

            TLDR: still not 100% convinced it is fixed…

            Todays fun start with metering init.xz like this:

            diff -u /mnt/init-orig/usr/share/fog/lib/funcs.sh /mnt/init/usr/share/fog/lib/funcs.sh
            --- /mnt/init-orig/usr/share/fog/lib/funcs.sh   2019-05-04 17:58:07.000000000 -0400
            +++ /mnt/init/usr/share/fog/lib/funcs.sh        2019-07-10 12:29:53.000000000 -0400
            @@ -1690,6 +1690,10 @@
                 runPartprobe "$disk"
             }
             # Waits for enter if system is debug type
            +Pause() {
            +    echo " * Press [Enter] key to continue"
            +    read  -p "$*"
            +}
             debugPause() {
                 case $isdebug in
                     [Yy][Ee][Ss]|[Yy])
            
            diff -u /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh /mnt/init/usr/share/fog/lib/partition-funcs.sh
            --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh 2019-05-04 17:58:07.000000000 -0400
            +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh      2019-07-10 15:39:58.000000000 -0400
            @@ -401,8 +401,15 @@
                 #    majorDebugPause
                 #fi
                 #[[ $status -eq 0 ]] && applySfdiskPartitions "$disk" "$tmp_file1" "$tmp_file2"
            +    processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" 
            +        Pause
                 processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" > "$tmp_file2"
                 status=$?
            +       echo $tmp_file2
            +       ls -l $tmp_file2
            +        cat $tmp_file2
            +        Pause
            +
                 if [[ $ismajordebug -gt 0 ]]; then
                     echo "Debug"
                     majorDebugEcho "Trying to fill with the disk with these partititions:"
            

            Printing out the output of processSfdisk, pausing, then doing it for real and ls’ing $tmp_file2 and printing it out, then pausing again.

            In one sense this worked. Multicast to 4 machines and all came out right. Previous multicasts to the same four machines would have 1 or 2 displaying the failure… But with the metering… No failures.

            Ok, so I decide, lets just quit if $tmp_file2 is zero… Next version of init.xz had this diff

            diff -u init-orig/usr/share/fog/lib/partition-funcs.sh init/usr/share/fog/lib/partition-funcs.sh
            --- init-orig/usr/share/fog/lib/partition-funcs.sh      2019-05-04 17:58:07.000000000 -0400
            +++ init/usr/share/fog/lib/partition-funcs.sh   2019-07-10 14:29:47.000000000 -0400
            @@ -73,6 +73,7 @@
                 local file="$2"
                 [[ -z $disk ]] && handleError "No disk passed (${FUNCNAME[0]})\n   Args Passed: $*"
                 [[ -z $file ]] && handleError "No file to receive from passed (${FUNCNAME[0]})\n   Args Passed: $*"
            +    [[ ! -s $file ]] && handleError "in /usr/share/fog/lib/partition-funcs.sh fillSfdiskWithPartitions $tmp_file2 is zero length" #ESJ
                 sfdisk $disk < $file >/dev/null 2>&1
                 [[ ! $? -eq 0 ]] && majorDebugEcho "sfdisk failed in (${FUNCNAME[0]})"
             }
            @@ -403,6 +404,9 @@
                 #[[ $status -eq 0 ]] && applySfdiskPartitions "$disk" "$tmp_file1" "$tmp_file2"
                 processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" > "$tmp_file2"
                 status=$?
            +
            +    [[ ! -s $tmp_file2 ]] && handleError "in /usr/share/fog/lib/partition-funcs.sh fillSfdiskWithPartitions $tmp_file2 is zero size" #ESJ
            +
                 if [[ $ismajordebug -gt 0 ]]; then
                     echo "Debug"
                     majorDebugEcho "Trying to fill with the disk with these partititions:"
            

            Checked $tmp_file2 in two places, once when created and once right before being used. It would exit if $tmp_file2 was zero size right?

            Did a clone to the four machines… all worked well…
            Did another clone… Crapola… One of the four didn’t resize. So it wasn’t zero length… But the other metering was not there so I don’t know what was there…

            Have done a bunch of metered multicast since then. All with no errors. A Heisenberg bug. If you look to close it always works… Sigh…

            Am going trying a few with no metering and KERNEL RAMDISK SIZE set ot 511000 per my supersize of @george1421 's suggestion.

            But I am feeling like I am missing something…

            george1421G 1 Reply Last reply Reply Quote 0
            • george1421G
              george1421 Moderator @Eric Johnson
              last edited by george1421

              @Eric-Johnson While you are way over my head with this coding, I can make a suggestion that may make debugging faster.

              1. If you create a /images/dev/postinit script, you can patch (copy over from the FOG server to FOS) /usr/share/fog/lib/partition-funcs.sh on every boot of FOS. You don’t need to unpack and repack the inits. There is an example in the forum on how to patch (replace) fog.man.reg for a custom registration. ref: https://forums.fogproject.org/topic/9754/custom-full-host-registration-for-1-3-4/45

              2. At the FOS Linux command prompt, if you give root a password with passwd (just some simple password like hello) and then get the IP address of FOS with ip addr show you can connect to FOS via putty/ssh from a second computer. Of course you need to be in debug mode to do this, but with putty you can copy/paste/debug from your normal workstation.

              Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

              1 Reply Last reply Reply Quote 1
              • S
                Sebastian Roth Moderator
                last edited by

                @Eric-Johnson I think you are doing a really great job here trying to nail this down. I’d love to help you but as I can’t replicate the issue same as you can I can only assist with commenting here in the forum. The checks you added seem appropriate! Let’s try to get even more output and see what we find. Remove the output redirection in line 76 of /usr/share/fog/lib/partition-funcs.sh:

                applySfdiskPartitions() {
                    local disk="$1"
                    local file="$2"
                    [[ -z $disk ]] && handleError "No disk passed (${FUNCNAME[0]})\n   Args Passed: $*"
                    [[ -z $file ]] && handleError "No file to receive from passed (${FUNCNAME[0]})\n   Args Passed: $*"
                    sfdisk $disk < $file
                    [[ ! $? -eq 0 ]] && majorDebugEcho "sfdisk failed in (${FUNCNAME[0]})"
                }
                

                Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                E 1 Reply Last reply Reply Quote 0
                • E
                  Eric Johnson
                  last edited by

                  So… current status.

                  Since it worked all the time with my diff with a Pause I just removed my Pause in non-debug mode… Seriously.

                  I have done 6 multicast deploys to 4 machines with this init and had no resize problems. Running with the original init showed 1-2 out of 4 resize failures.

                  This of course is NOT what I would call a fix. Kinda a workaround. And I don’t know if this is fixing the problem or just making it less likely to happen, as this should not be actually fixing anything. I through it out here for folks who might run into the same problem as a workaround till we figure out what the hell is going on… 🙂

                  diff -u /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh /mnt/init/usr/share/fog/lib/partition-funcs.sh
                  --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh 2019-05-04 17:58:07.000000000 -0400
                  +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh      2019-07-11 09:22:49.000000000 -0400
                  @@ -401,8 +401,15 @@
                       #    majorDebugPause
                       #fi
                       #[[ $status -eq 0 ]] && applySfdiskPartitions "$disk" "$tmp_file1" "$tmp_file2"
                  +    processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" 
                  +#        Pause ESJ
                       processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" > "$tmp_file2"
                       status=$?
                  +       echo $tmp_file2
                  +       ls -l $tmp_file2
                  +        cat $tmp_file2
                  +#        Pause ESJ
                  +
                       if [[ $ismajordebug -gt 0 ]]; then
                           echo "Debug"
                           majorDebugEcho "Trying to fill with the disk with these partititions:"
                  
                  1 Reply Last reply Reply Quote 0
                  • E
                    Eric Johnson @Sebastian Roth
                    last edited by

                    @Sebastian-Roth Doh. Good idea, should have thought of that. But since even just doing some other output (see my recent post) seems to make the problem vanish…

                    Dis one is very odd.

                    Q 1 Reply Last reply Reply Quote 0
                    • Q
                      Quazz Moderator
                      last edited by

                      I created a PR to fix some stuff and to hopefully handle errors better/more reliably.

                      I don’t think it will fix this problem necessarily, but it should should hopefully throw errors when we expect it to.

                      https://github.com/FOGProject/fos/pull/26

                      You can download the inits of it here: https://dev.fogproject.org/blue/organizations/jenkins/fos/detail/PR-26/2/artifacts

                      if you’re interested in testing it out.

                      1 Reply Last reply Reply Quote 1
                      • Q
                        Quazz Moderator @Eric Johnson
                        last edited by Quazz

                        @Eric-Johnson Interesting. Could you add

                        cat /proc/sys/kernel/random/entropy_avail
                        

                        before and after your output commands?

                        Just a wild hunch

                        If my hunch is correct, then I suspect you could replace your output commands with any other output commands and it would work. Alternatively, you could smash the keyboard a couple of times.

                        E 2 Replies Last reply Reply Quote 0
                        • E
                          Eric Johnson @Quazz
                          last edited by

                          @Quazz That was a crazy brilliant idea! 🙂

                          I did the below, first cat before anything, then the processSfdisk to the screen then a second cat, then a pause so I can read it and it probably wasn’t important at that point (I think that would have shown the entropy increase, please let me know if I am miss-understanding what you wanted to see) but on two unicast tests so far… 2260 before / 2261 after (identical)

                          Going to do a multicast test to all 4 in a sec. See if there is any variation.

                          diff -u /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh /mnt/init/usr/share/fog/lib/partition-funcs.sh
                          --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh 2019-05-04 17:58:07.000000000 -0400
                          +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh      2019-07-11 15:31:10.000000000 -0400
                          @@ -401,8 +401,19 @@
                               #    majorDebugPause
                               #fi
                               #[[ $status -eq 0 ]] && applySfdiskPartitions "$disk" "$tmp_file1" "$tmp_file2"
                          +cat /proc/sys/kernel/random/entropy_avail
                          +    processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" 
                          +echo "after"
                          +cat /proc/sys/kernel/random/entropy_avail
                          +
                          +        Pause #ESJ
                               processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" > "$tmp_file2"
                               status=$?
                          +       echo $tmp_file2
                          +       ls -l $tmp_file2
                          +        cat $tmp_file2
                          +#        Pause ESJ
                          +
                               if [[ $ismajordebug -gt 0 ]]; then
                                   echo "Debug"
                                   majorDebugEcho "Trying to fill with the disk with these partititions:"
                          
                          Q 1 Reply Last reply Reply Quote 0
                          • E
                            Eric Johnson @Quazz
                            last edited by Eric Johnson

                            @Quazz grrr. no luck. entropy started pretty high and was only slightly higher after the call to processSfdisk in 8 out of 8 cases… 2258-2261 before and after in all cases.

                            Was a great idea, worth trying though!

                            1 Reply Last reply Reply Quote 0
                            • E
                              Eric Johnson
                              last edited by

                              Current status:

                              Tried clever suggestion from @Quazz … it didn’t pan out.

                              But inspired me to think outside the box. Tried using /root/sfblahblah.$$ instead of /tmp (since tmpfs is different than ramdisk…) That was just as unsuccessful.

                              Just adding a “sync;sync;sync” (cause anything I tell you three times is true) seems successful. On two tests so far.

                              Maybe?

                              Current working diff…

                              [root@fog ipxe]# diff -u /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh /mnt/init/usr/share/fog/lib/partition-funcs.sh
                              --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh	2019-05-04 17:58:07.000000000 -0400
                              +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh	2019-07-11 18:10:20.000000000 -0400
                              @@ -401,8 +401,17 @@
                                   #    majorDebugPause
                                   #fi
                                   #[[ $status -eq 0 ]] && applySfdiskPartitions "$disk" "$tmp_file1" "$tmp_file2"
                              +#    processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" 
                                   processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" > "$tmp_file2"
                                   status=$?
                              +sync;sync;sync
                              +#	echo $tmp_file2
                              +#	ls -l $tmp_file2
                              +#        cat $tmp_file2
                              +#echo "after"
                              +#cat /proc/sys/kernel/random/entropy_avail
                              +#Pause #ESJ
                              +	
                                   if [[ $ismajordebug -gt 0 ]]; then
                                       echo "Debug"
                                       majorDebugEcho "Trying to fill with the disk with these partititions:"
                              [root@fog ipxe]#
                              
                              Tom ElliottT 1 Reply Last reply Reply Quote 1
                              • Tom ElliottT
                                Tom Elliott @Eric Johnson
                                last edited by

                                @Eric-Johnson what about replacing sync with sleep 5? Sync is just ensuring cached items in wait get pushed to disk, maybe the scripts are working just a slight bit too fast?

                                Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

                                Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                                Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                                1 Reply Last reply Reply Quote 0
                                • S
                                  Sebastian Roth Moderator
                                  last edited by

                                  @Tom-Elliott While you might be pointing to the right spot here I don’t thing adding a time based sleep is a great idea.

                                  Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                                  Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                                  1 Reply Last reply Reply Quote 0
                                  • Q
                                    Quazz Moderator @Eric Johnson
                                    last edited by Quazz

                                    @Eric-Johnson I mostly wanted to see it before and after your cat and ls commands and such, but I don’t think it matters, since 2000+ should be more than enough for just about anything anyway.

                                    I was thinking that perhaps certain hardware wouldn’t generate enough entropy in time and that that would cause issues.

                                    @Sebastian-Roth @Tom-Elliott Do you reckon it would be good idea to try and store the output in a variable as opposed to a file and use it as such?

                                    1 Reply Last reply Reply Quote 0
                                    • S
                                      Sebastian Roth Moderator
                                      last edited by Sebastian Roth

                                      @Quazz While I am not against your idea of trying to move this from file to variable I really wonder why we see this issue right here lately. Did we always have this issue and people just didn’t report? I doubt that! Or is it a change we pushed out in the last weeks? I have gone through the commit and nothing really makes sense from my point of view.

                                      I just try to take one step back before we head into trying to fix this. Is this hardware specific? @Eric-Johnson
                                      Should we try a 1.5.5 or even 1.5.4 FOG server and see if it’s the same?!

                                      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                                      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                                      Q 1 Reply Last reply Reply Quote 0
                                      • Q
                                        Quazz Moderator @Sebastian Roth
                                        last edited by Quazz

                                        @Sebastian-Roth From what I gather, it’s an issue that isn’t happening under every circumstance (personally haven’t ran into it, having deployed to hundreds of devices over the course of each new init release), but does happen reliably under certain conditions.

                                        It seems odd that a momentary delay in the script or otherwise a sync of the “disk” (as in the ramdisk) would prevent this issue from appearing again.

                                        From what I can tell, people did start reporting such issues after a certain version, perhaps a buildroot update (Just looked and Buildroot version 2019.02 has bugfixes up to 2019.02.4 now! changes list: https://git.busybox.net/buildroot/plain/CHANGES?id=2019.02.4) ? People tend to wait a while before updating, so we may have to look a bit further back.

                                        1 Reply Last reply Reply Quote 0
                                        • E
                                          Eric Johnson
                                          last edited by

                                          Bad news. On the 3rd deploy with sync;sync;sync one failed.

                                          Good news:

                                          I moved all the metering to AFTER the sfdisk (and didn’t send the sfdisk output to /dev/null as was suggested by @Sebastian-Roth ). Found something interesting. Twice on one of the 4 machines the sfdisk is failing with a disk in use error. The input is good, no issue with the tmp file. But sfdisk seems to think the disk is in use. I think we have the smoking gun.

                                          Adding some df and swapon -s now to see if this I can see the disk in use by some other tools. Next step if to dig up sfdisk source and see what makes it think it is use.

                                          Maybe --force or --no-reread as suggested by the error mesage as a BFMI workaround? 🙂

                                          --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh 2019-05-04 17:58:07.000000000 -0400
                                          +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh      2019-07-12 08:37:10.000000000 -0400
                                          @@ -73,7 +73,14 @@
                                               local file="$2"
                                               [[ -z $disk ]] && handleError "No disk passed (${FUNCNAME[0]})\n   Args Passed: $*"
                                               [[ -z $file ]] && handleError "No file to receive from passed (${FUNCNAME[0]})\n   Args Passed: $*"
                                          -    sfdisk $disk < $file >/dev/null 2>&1
                                          +#    sfdisk $disk < $file >/dev/null 2>&1
                                          +    sfdisk $disk < $file 
                                          +echo " "
                                          +echo $file
                                          +ls -l $file
                                          +cat $file
                                          +Pause #ESJ
                                          +
                                               [[ ! $? -eq 0 ]] && majorDebugEcho "sfdisk failed in (${FUNCNAME[0]})"
                                           }
                                           # $1 is the name of the disk drive
                                          @@ -403,6 +410,8 @@
                                               #[[ $status -eq 0 ]] && applySfdiskPartitions "$disk" "$tmp_file1" "$tmp_file2"
                                               processSfdisk "$minf" filldisk "$disk" "$disk_size" "$fixed" "$orig" > "$tmp_file2"
                                               status=$?
                                          +#Pause #ESJ
                                          +
                                               if [[ $ismajordebug -gt 0 ]]; then
                                                   echo "Debug"
                                                   majorDebugEcho "Trying to fill with the disk with these partititions:"
                                          

                                          alt text

                                          Q 1 Reply Last reply Reply Quote 1
                                          • Q
                                            Quazz Moderator @Eric Johnson
                                            last edited by Quazz

                                            @Eric-Johnson Very interesting, here’s what sfdisk man page has to say about it.

                                            sfdisk uses BLKRRPART (reread partition table) ioctl to make sure
                                                   that the device is not used by system or another tools (see also
                                                   --no-reread).  It's possible that this feature or another sfdisk
                                                   activity races with udevd.  The recommended way how to avoid possible
                                                   collisions is to use exclusive flock for the whole-disk device to
                                                   serialize device access.  The exclusive lock will cause udevd to skip
                                                   the event handling on the device.  For example:
                                            
                                                          flock /dev/sdc sfdisk /dev/sdc
                                            
                                                   Note, this semantic is not currently supported by udevd for MD and DM
                                                   devices.
                                            

                                            So perhaps adding flock $disk before sfdisk $disk < $file will resolve it

                                            Making the line read as flock $disk sfdisk $disk < $file

                                            E 1 Reply Last reply Reply Quote 1
                                            • 1
                                            • 2
                                            • 3
                                            • 2 / 3
                                            • First post
                                              Last post

                                            260

                                            Online

                                            12.0k

                                            Users

                                            17.3k

                                            Topics

                                            155.2k

                                            Posts
                                            Copyright © 2012-2024 FOG Project