    FOG 1.5.6: Auto resize is unpredictable

    Bug Reports

      Tom Elliott

      This is really good information. It could be especially useful now that NVMe is so much faster than other kinds of disk access. Maybe udev has been the cause of the “sporadic access” and usability problems the whole time?

      Thank you all for the debugging and communication.

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        Eric Johnson

        Ran two 4-station multicasts with no resize problems. The diff below is the only change. I think this one is done! 🙂

        diff -u /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh /mnt/init/usr/share/fog/lib/partition-funcs.sh 
        --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh 2019-05-04 17:58:07.000000000 -0400
        +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh      2019-07-12 10:26:47.000000000 -0400
        @@ -73,7 +73,7 @@
             local file="$2"
             [[ -z $disk ]] && handleError "No disk passed (${FUNCNAME[0]})\n   Args Passed: $*"
             [[ -z $file ]] && handleError "No file to receive from passed (${FUNCNAME[0]})\n   Args Passed: $*"
        -    sfdisk $disk < $file >/dev/null 2>&1
        +    flock $disk sfdisk $disk < $file >/dev/null 2>&1
             [[ ! $? -eq 0 ]] && majorDebugEcho "sfdisk failed in (${FUNCNAME[0]})"
         }
         # $1 is the name of the disk drive
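
For readers unfamiliar with `flock`: in the form `flock FILE COMMAND`, it takes an exclusive lock on FILE, runs COMMAND, and releases the lock when the command exits. A second `flock` on the same file waits its turn. The sketch below shows that serialization with a temporary file standing in for the disk device (an assumption for illustration, so it runs without root or a real disk). It does not reproduce what udev does on a FOS client.

```shell
#!/bin/bash
# Minimal sketch of flock serialization. The temp files are stand-ins:
# "$lockfile" plays the role of $disk, "$log" records the order of events.
lockfile=$(mktemp)
log=$(mktemp)

# First process: takes the lock and holds it for one second.
flock "$lockfile" sh -c 'echo holder-start >> "$1"; sleep 1; echo holder-end >> "$1"' sh "$log" &

sleep 0.3   # give the background process time to acquire the lock

# Second process: blocks until the first one releases the lock.
flock "$lockfile" sh -c 'echo second >> "$1"' sh "$log"

wait
cat "$log"
```

In the patch above, the `< $file` redirection is set up by the shell on `flock`'s standard input, and `sfdisk` inherits it, so the partition table is still read from the file.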
        
          Eric Johnson

          Found a couple more sfdisk calls that might need protecting…

          [root@fog ipxe]# diff -ur /mnt/init-orig/usr/share/fog/lib /mnt/init/usr/share/fog/lib
          diff -ur /mnt/init-orig/usr/share/fog/lib/funcs.sh /mnt/init/usr/share/fog/lib/funcs.sh
          --- /mnt/init-orig/usr/share/fog/lib/funcs.sh   2019-05-04 17:58:07.000000000 -0400
          +++ /mnt/init/usr/share/fog/lib/funcs.sh        2019-07-12 12:57:23.000000000 -0400
          @@ -1977,7 +1977,7 @@
                   sfdiskLegacyOriginalPartitionFileName "$imagePath" "$disk_number"
                   if [[ -r $sfdiskoriginalpartitionfilename ]]; then
                       dots "Inserting Extended partitions (Original)"
          -            sfdisk $disk < $sfdiskoriginalpartitionfilename >/dev/null 2>&1
          +            flock $disk sfdisk $disk < $sfdiskoriginalpartitionfilename >/dev/null 2>&1
                       case $? in
                           0)
                               echo "Done"
          @@ -1988,7 +1988,7 @@
                       esac
                   elif [[ -e $sfdisklegacyoriginalpartitionfilename ]]; then
                       dots "Inserting Extended partitions (Legacy)"
          -            sfdisk $disk < $sfdisklegacyoriginalpartitionfilename >/dev/null 2>&1
          +            flock $disk sfdisk $disk < $sfdisklegacyoriginalpartitionfilename >/dev/null 2>&1
                       case $? in
                           0)
                               echo "Done"
          diff -ur /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh /mnt/init/usr/share/fog/lib/partition-funcs.sh
          --- /mnt/init-orig/usr/share/fog/lib/partition-funcs.sh 2019-05-04 17:58:07.000000000 -0400
          +++ /mnt/init/usr/share/fog/lib/partition-funcs.sh      2019-07-12 10:26:47.000000000 -0400
          @@ -73,7 +73,7 @@
               local file="$2"
               [[ -z $disk ]] && handleError "No disk passed (${FUNCNAME[0]})\n   Args Passed: $*"
               [[ -z $file ]] && handleError "No file to receive from passed (${FUNCNAME[0]})\n   Args Passed: $*"
          -    sfdisk $disk < $file >/dev/null 2>&1
          +    flock $disk sfdisk $disk < $file >/dev/null 2>&1
               [[ ! $? -eq 0 ]] && majorDebugEcho "sfdisk failed in (${FUNCNAME[0]})"
           }
           # $1 is the name of the disk drive
          
            Sebastian Roth Moderator

            @Eric-Johnson @Quazz You guys rock!! Thank you very much for digging that deep and finding this!

            From what you describe, this pretty much sounds like a timing issue, which perfectly matches the finding that the device is busy/locked at that stage. I still don’t really understand why this is happening now but has rarely occurred (or rarely been reported) in the past, but I am pretty confident that you have hit the nail on the head with this.

            Looking through the scripts I see even more places where sfdisk might bite us and I have a feeling that some other issue reports we currently have might be caused by this same busy/lock problem with sfdisk somewhere else in the scripts. Here is a full (?) list of the places:

            roth@x220:~/workspace/fog/fos/Buildroot/board/FOG/FOS/rootfs_overlay$ find . -type f -exec grep "sfdisk " {} /dev/null \; | grep -v -e ":[[:space:]]*#" -e majorDebugEcho
            ./usr/share/fog/lib/partition-funcs.sh:    sfdisk -d $disk 2>/dev/null > $file
            ./usr/share/fog/lib/partition-funcs.sh:    sfdisk $disk < $file >/dev/null 2>&1
            ./usr/share/fog/lib/partition-funcs.sh:    sfdisk -d $disk 2>/dev/null | egrep '(Id|type)=\ *[5f]' | wc -l
            ./usr/share/fog/lib/partition-funcs.sh:            parttype=$(sfdisk -d $disk 2>/dev/null | awk -F[,=] "/^$escape_part/{print \$6}")
            ./usr/share/fog/lib/partition-funcs.sh:            sfdisk -d $disk
            ./usr/share/fog/lib/funcs.sh:    local count=$(sfdisk -d $disk 2>/dev/null | awk /start=\ *[1-9]/'{print $4+0}' | sort -n | head -n1)
            ./usr/share/fog/lib/funcs.sh:    [[ $hasgpt -eq 0 ]] && have_extended_partition=$(sfdisk -l $disk 2>/dev/null | egrep "^${disk}.* (Extended|W95 Ext'd \(LBA\))$" | wc -l)
            ./usr/share/fog/lib/funcs.sh:            sfdisk -d $disk 2>/dev/null > $sfdiskfilename
            ./usr/share/fog/lib/funcs.sh:            sfdisk -d $disk 2>/dev/null > $sfdiskfilename
            ./usr/share/fog/lib/funcs.sh:            sfdisk $disk < $sfdiskoriginalpartitionfilename >/dev/null 2>&1
            ./usr/share/fog/lib/funcs.sh:            sfdisk $disk < $sfdisklegacyoriginalpartitionfilename >/dev/null 2>&1
            ./usr/share/fog/lib/funcs.sh:    fsid="$(sfdisk -d "$disk" |  grep "$part" | sed -n 's/.*Id=\([0-9]\+\).*\(,\|\).*/\1/p')"
            

            Question: Is there a good reason that we shouldn’t add flock ... to all those calls as well? @Eric-Johnson @Quazz @Tom-Elliott

            I don’t know if the same can even happen when sfdisk is just reading (-d for dumping the table) instead of writing to disk, but I reckon adding flock ... wouldn’t hurt in any case?! On the other hand, it sounds a bit like “we don’t know what’s happening here so let’s just hit everything that looks like it might be dangerous with a rock”, which is not my kind of style.
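
If the lock were to be applied to every call, one way to avoid repeating it by hand is a small wrapper. The sketch below is hypothetical: `lockedRun` is not part of the FOG scripts, and it assumes the util-linux `flock`, whose `-w` option gives up after a number of seconds (exit status 1 by default) instead of blocking forever. A temporary file stands in for the disk device.

```shell
#!/bin/bash
# Hypothetical helper (not part of the FOG scripts): run a command while
# holding an exclusive lock on a path, giving up after a timeout.
# Assumes util-linux flock, which provides -w (seconds to wait).
lockedRun() {
    local timeout="$1" lockpath="$2"
    shift 2
    flock -w "$timeout" "$lockpath" "$@"
}

# Demo with a temp file standing in for the disk device.
lockpath=$(mktemp)

# Uncontended: the command runs right away.
out=$(lockedRun 5 "$lockpath" echo ok)
echo "uncontended: $out"

# Contended: another process holds the lock longer than we are willing to wait.
flock "$lockpath" sleep 3 &
sleep 0.3
rc=0
lockedRun 1 "$lockpath" echo "should not run" || rc=$?
echo "contended exit status: $rc"
wait
```

In the scripts this would read something like `lockedRun 10 "$disk" sfdisk -d "$disk"`. Whether the read-only calls need it at all is exactly the open question raised above.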

              fry_p Moderator

              Just want to chime in to say good job everyone! I would have thrown my hat in for testing fixes if I wasn’t sweating my buns off in hot school buildings reassembling classroom technology. I have seen a pattern where, right after uploading an image, once the upload is complete and the machine attempts to come out of sysprep, I get an error. I am not sure it is related to this, but I heard of someone else having the same issue. After re-deploying the image that was just captured to the very same machine, everything works just fine. Regardless, it looks like you guys already have a good idea of what is going on, so I’ll shut up now 😄

              Like open source community computing? Why not do it for a good cause?
              Use your computer/server for humanitarian projects when it is idle!
              https://join.worldcommunitygrid.org?recruiterId=1026912

                Sebastian Roth Moderator

                @fry_p On first read this doesn’t sound related to me but we’ll see. I know you are very busy right now, but would you mind opening a new topic for this one as well and posting more information there? A picture of the error would be great!

                  fry_p Moderator @Sebastian Roth

                  @Sebastian-Roth said in FOG 1.5.6: Auto resize is unpredictable:

                  @fry_p On first read this doesn’t sound related to me but we’ll see. I know you are very busy right now, but would you mind opening a new topic for this one as well and posting more information there? A picture of the error would be great!

                  I figured a new topic may be in order. I will open one at the beginning of next week if I have time, when I should be able to replicate the problem. If I get some time I will gather as much info as possible, including error messages and pictures.

                    Sebastian Roth Moderator

                    @Eric-Johnson I don’t want to take over this topic as it’s been mainly you and @Quazz who figured this one out! But I did add the changes and would be thankful if you’d give it a try in your setup. @Cheetah2003 and all the others are welcome too!

                    Download the 64 bit and 32 bit init files and put them in the /var/www/html/fog/service/ipxe/ directory on your FOG server. It is probably a good idea to rename the original files instead of overwriting them.
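
The rename-then-replace step can be sketched as below. This is only an illustration: `installWithBackup` is a hypothetical helper, `init.xz` is an assumed file name, and temporary directories stand in for the download location and for /var/www/html/fog/service/ipxe/.

```shell
#!/bin/bash
# Hypothetical helper: copy a new file into a directory, renaming the
# existing file to <name>.orig first so it can be restored later.
installWithBackup() {
    local src="$1" destdir="$2"
    local dest
    dest="$destdir/$(basename "$src")"
    # Keep the very first original; do not clobber an existing backup.
    if [[ -e $dest && ! -e $dest.orig ]]; then
        mv "$dest" "$dest.orig"
    fi
    cp "$src" "$dest"
}

# Demo in temp directories; init.xz is an assumed file name.
downloads=$(mktemp -d)
ipxedir=$(mktemp -d)     # stands in for /var/www/html/fog/service/ipxe/
echo "old init" > "$ipxedir/init.xz"
echo "new init" > "$downloads/init.xz"

installWithBackup "$downloads/init.xz" "$ipxedir"
ls "$ipxedir"
```

To go back to the original, move the `.orig` file over the test file again.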

                    And I am still wondering if we need to use flock for fdisk (not sfdisk) calls and maybe others as well. I have a feeling that some change in Buildroot caused this random failure in the first place. I am fairly sure this is not something we have had in the inits for months or years. But I guess it’s not worth spending the time to track it down to the line of code that is causing it. Maybe it is a combination of udev and sfdisk; very likely one or both are involved.

                      Quazz Moderator @Sebastian Roth

                      @Sebastian-Roth As far as I understand, any tool that uses BLKRRPART can be denied access because of udev event handling.

                      It’s usually very fast and therefore unlikely to cause issues, but since it runs in parallel with everything else it can potentially get in the way from time to time.

                      As far as I know, other programs that interact with the storage device (such as fdisk and hdparm) act in the same way and also use BLKRRPART.

                      For now, I’d say leave it as is and monitor for reported problems. This problem is rare enough on its own already (personally haven’t come across it at all) that it doesn’t really warrant changing everything, with the possibility of that affecting something else. (It doesn’t look like it should, but you never know.)

                      edit: It’s important to note that the kernel version could affect how BLKRRPART behaves, and the udev events originate in the kernel as well, afaik, so perhaps it’s as simple as a difference in kernel versions?
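
The failure described here is transient: the device is busy only for a short window. Besides locking, a different and more generic technique for transient failures is to retry a few times. The sketch below illustrates that alternative under stated assumptions; `retryBusy` and `flaky` are hypothetical names, and this is not what the patch in this thread does. The demo uses a command that fails twice and then succeeds, standing in for a tool that hits a busy device.

```shell
#!/bin/bash
# Hypothetical helper: retry a command a few times with a pause in between.
# Usage: retryBusy <attempts> <delay-seconds> <command> [args...]
retryBusy() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0
        # Pause before the next attempt, but not after the last one.
        (( i < attempts )) && sleep "$delay"
    done
    return 1
}

# Stand-in for a command that fails while the device is busy:
# it fails on the first two calls and succeeds on the third.
counter=$(mktemp)
echo 0 > "$counter"
flaky() {
    local n
    n=$(( $(cat "$counter") + 1 ))
    echo "$n" > "$counter"
    (( n >= 3 ))
}

rc=0
retryBusy 5 0.1 flaky || rc=$?
echo "exit status $rc after $(cat "$counter") calls"

# On a real system (not run here) this could wrap, for example:
#   retryBusy 5 1 blockdev --rereadpt /dev/sdX
```

Retrying hides the race rather than removing it, which is one reason the lock-based change may be the cleaner choice.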

                        Sebastian Roth Moderator

                        @Quazz Thanks heaps for digging into this even more. I guess it will be very hard to actually track down when and how this was introduced.

                        This problem is rare enough on its own already (personally haven’t come across it at all)

                        I would have said so myself a day or two ago. But then I ran into the same issue myself while debugging a different partition layout problem for another user in my VM test setup (VirtualBox).
