    Dell 7730 precision laptop deploy GPT error message

      Sebastian Roth Moderator
      last edited by

      @jmason I can’t give you a reference on this, but it’s actually a likely cause (one that I had not thought of before, grrrhhh): disk enumeration can put your two disks in reverse order. This is a known issue in Linux and is usually circumvented through persistent block device naming.

      Try deploying a couple of times in a row always using the debug mode and run lsblk before starting the task. See if it’s exactly how we imagine it to be (changing disk order).
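For anyone unfamiliar with persistent block device naming, here is a minimal sketch of the idea. It assumes the usual udev layout, where `/dev/disk/by-id` holds symlinks that point at the kernel nodes. The function name `map_persistent_names` is made up for illustration and is not part of FOG.

```python
import os
import re


def map_persistent_names(by_id_dir="/dev/disk/by-id"):
    """Return {persistent_name: kernel_device_path} for whole NVMe disks.

    Assumes every entry in by_id_dir is a symlink to a kernel node such
    as ../../nvme0n1 (the standard udev layout).
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        # Keep NVMe entries only and skip partition links ("-partN" suffix).
        if not name.startswith("nvme-") or re.search(r"-part\d+$", name):
            continue
        mapping[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return mapping


if __name__ == "__main__":
    for stable, kernel in map_persistent_names().items():
        print(f"{stable} -> {kernel}")
```

The persistent name stays tied to one physical disk across reboots, while the kernel name it resolves to (`nvme0n1` or `nvme1n1`) may change.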

      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        Sebastian Roth Moderator
        last edited by

        @george1421 On the other hand, I am wondering why we have not had other people reporting this in the past. What if you have a PC with two drives, one for OS and one for data? You only ever want to image the OS disk, but it could happen that you deploy to the data disk?! Just thinking out loud here.

          jmason
          last edited by jmason

          Yes, I’ll do that; when I just attempted to redeploy, the original error returned. These are also pretty new systems, so that could be a reason for not seeing it much before.

            Sebastian Roth Moderator
            last edited by

            @jmason Yes, possibly (hopefully) this is more or less an issue specific to NVMe drives. Haha. Well, I’ll keep my head spinning on how we could possibly solve this, as we have no influence on the order in which the Linux kernel enumerates your disks. We’d need to save the disk identifiers and store those with the image… I suppose.

              jmason @Sebastian Roth
              last edited by jmason

              @Sebastian-Roth

              Looks like that is what it is doing, after the failed redeploy (didn’t run in debug that time of course 😞 ) I ran it in debug and the lsblk gives:

              nvme0n1     259:0    0   477G  0 disk    
              nvme1n1     259:1    0 953.9G  0 disk           
              |-nvme1n1p1 259:2    0   128M  0 part
              |-nvme1n1p2 259:3    0   200M  0 part
              |-nvme1n1p3 259:4    0     1G  0 part
              `-nvme1n1p4 259:5    0 475.6G  0 part
              

              However, it did not hit the error this time and appears to be deploying again now, but I can’t see that working for both partitions with the mismatch… weird.
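To compare several runs without eyeballing them, a listing like the one above can be parsed mechanically. This is only a sketch: it assumes the default `lsblk` column order (NAME, MAJ:MIN, RM, SIZE, RO, TYPE), and `parse_lsblk` is a hypothetical helper name.

```python
def parse_lsblk(text):
    """Parse plain `lsblk` output into {disk: {"size": str, "parts": [names]}}."""
    disks = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINT]
        if len(fields) < 6:
            continue
        # Strip the tree-drawing prefix lsblk puts in front of partitions.
        name = fields[0].lstrip("|`-├└─")
        size, dev_type = fields[3], fields[5]
        if dev_type == "disk":
            current = name
            disks[name] = {"size": size, "parts": []}
        elif dev_type == "part" and current is not None:
            disks[current]["parts"].append(name)
    return disks


# The listing from the post above, copied as-is.
LISTING = """\
nvme0n1     259:0    0   477G  0 disk
nvme1n1     259:1    0 953.9G  0 disk
|-nvme1n1p1 259:2    0   128M  0 part
|-nvme1n1p2 259:3    0   200M  0 part
|-nvme1n1p3 259:4    0     1G  0 part
`-nvme1n1p4 259:5    0 475.6G  0 part
"""

for disk, info in parse_lsblk(LISTING).items():
    print(disk, info["size"], len(info["parts"]), "partitions")
```

On this listing the four partitions sit on the 953.9G disk, and the 477G disk has none.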

                jmason @Sebastian Roth
                last edited by

                @Sebastian-Roth

                Well if you need any testing of anything just let me know, I’ll be more than happy to run things on these systems

                  Sebastian Roth Moderator
                  last edited by

                  @jmason Thanks for testing. I’ll see what I can do for you. Guess I will take a bit of time to figure something out.

                    Sebastian Roth Moderator
                    last edited by Sebastian Roth

                    @jmason Hmmmm, the more I read, the less I think we can do something about it. This is not something FOG or the Linux kernel is doing wrong. It’s more or less a matter of how the Dell UEFI firmware hands the NVMe drive information back to the Linux kernel. One boot it’s this way round and the next boot it might be the other way. When installing an OS on disk this is not much of an issue, because you have partitions with UUIDs and labels on the disks, and those can be used to identify which partition to mount for booting the OS. But in the case of cloning we have laptops with different physical disks (and identifiers), so there is no way we can use that information.

                    Possibly we could save the sector (or disk) size information in case of “Multiple Partition Image - All Disks (Not Resizable)” but then what happens if someone comes along with two identical size disks in their machines?

                    Hmmm, need more time to think about this. @george1421 @Wayne-Workman any ideas from your side?

                    Edit: By the way… I can imagine this being an issue when capturing the image as well. One time d1p* are from disk A and d2p* from disk B and next time it’s in reverse.
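To make the size idea and its weak spot concrete, here is a rough sketch. It is not how FOG works today. The function name, the dictionaries and the byte counts are all illustrative assumptions; the only point is that matching by size has to refuse to guess when two disks are the same size.

```python
def match_disks_by_size(image_sizes, target_sizes):
    """Map image disk numbers to target device names by exact size.

    image_sizes:  {disk_number: size_in_bytes} recorded at capture time
    target_sizes: {device_name: size_in_bytes} seen at deploy time
    Raises ValueError when sizes cannot identify the disks unambiguously.
    """
    mapping = {}
    for number, size in sorted(image_sizes.items()):
        candidates = [dev for dev, s in target_sizes.items() if s == size]
        if len(candidates) != 1:
            raise ValueError(
                f"disk {number}: {len(candidates)} target disks match size {size}"
            )
        mapping[number] = candidates[0]
    # Two image disks of equal size would both claim the same target.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("several image disks map to the same target disk")
    return mapping


# Illustrative byte counts for a 512 GB and a 1 TB drive (assumed values).
image = {1: 512110190592, 2: 1024209543168}
swapped_target = {"nvme0n1": 1024209543168, "nvme1n1": 512110190592}
print(match_disks_by_size(image, swapped_target))
```

With two identical-size disks the function raises instead of picking one, which is exactly the open question above.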

                      george1421 Moderator @Sebastian Roth
                      last edited by george1421

                      @Sebastian-Roth (this is more of a brain dump than an answer)
                      Do we have empirical evidence that these disks are being swapped as being reported by the uefi bios? It would be a bit more telling if that second disk (for debugging purposes) could be exchanged for a different size disk, then run the test again. I might see the order being swapped between models of computers, but not the same computer depending on the boot. I might think this is an oddity in the uefi firmware. The Precision 7730 generation is pretty new, so the first thing I would check/watch for is firmware update availability.

                      Do we know if this issue is model or machine specific? It could also be a Linux kernel issue where one of the drives may init faster/slower than the other, so it’s detected by the Linux OS at different times. It would be interesting to compare the FOS boot logs between the two states to see if there are any telling events. But again, trying to get it to break and knowing when it’s broken is the hardest part.

                      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

                        Sebastian Roth Moderator
                        last edited by

                        @george1421 said in Dell 7730 precision laptop deploy GPT error message:

                        Do we have empirical evidence that these disks are being swapped as being reported by the uefi bios?

                        See @jmason’s lsblk listings. From my point of view this is evidence enough. The disks are different size and do swap. As far as I got the postings it seems like the output was always taken on the same deploy system. One time 477 GB drive is last and the other time it’s first.

                        The Precision 7730 generation is pretty new, so the first thing I would check/watch for is firmware update availability.

                        Definitely a good point!!

                        It would be interesting to compare the FOS boot logs between the two states to see if there are any telling events.

                        Good one as well! @jmason Can you please schedule a debug deploy job? Boot that machine and run dmesg | grep -i nvm. Take a picture and reboot the machine. When you are back at the shell, run dmesg | grep -i nvm again and take another picture. Do this maybe ten times to see if there is a difference.
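If the dmesg output ends up as text rather than photos, the order in which the namespaces first show up can be pulled out automatically. This is a sketch only: the helper name is made up, and the sample lines in the usage part are illustrative kernel-style messages, not taken from the screenshots in this thread.

```python
import re


def nvme_first_seen_order(dmesg_text):
    """Return NVMe namespace names (e.g. nvme0n1) in order of first appearance."""
    order = []
    # Matches whole-disk names only; nvme0n1p1 fails the trailing \b.
    for match in re.finditer(r"\bnvme\d+n\d+\b", dmesg_text):
        name = match.group(0)
        if name not in order:
            order.append(name)
    return order


# Illustrative lines in the style of kernel partition-scan messages.
sample = """\
[    2.10]  nvme1n1: p1 p2 p3 p4
[    2.12]  nvme0n1: p1
"""
print(nvme_first_seen_order(sample))
```

Running this once per boot and comparing the lists gives the same information as comparing the pictures side by side.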

                          george1421 Moderator
                          last edited by

                          It would also be interesting to see if the 4.15.2 kernels gave us the same random results (Actually I’d like to push it earlier than 4.13.x but the inits would get in the way, because we had issue with kernels after that and the Dell Precision swappable nvme drives that have been since fixed). To see if this randomness is linux kernel related or not. I’m not really sure what this will tell us other than if the problem was introduced in later kernels.

                            jmason @Sebastian Roth
                            last edited by

                            @Sebastian-Roth

                            I ran it 10 times and noticed a slight difference on the 6th and 9th time.

                            Test10times.png

                              jmason @Sebastian Roth
                              last edited by

                              @Sebastian-Roth said in Dell 7730 precision laptop deploy GPT error message:

                              @george1421 said in Dell 7730 precision laptop deploy GPT error message:

                              Do we have empirical evidence that these disks are being swapped as being reported by the uefi bios?

                              See @jmason’s lsblk listings. From my point of view this is evidence enough. The disks are different size and do swap. As far as I got the postings it seems like the output was always taken on the same deploy system. One time 477 GB drive is last and the other time it’s first.

                              I have indeed been working with only 1 laptop with the deploy attempts and only 1 laptop from which the capture image was created, and both are identical machines. I have 20 of them in total.

                                jmason @george1421
                                last edited by

                                @george1421 said in Dell 7730 precision laptop deploy GPT error message:

                                @Sebastian-Roth (this is more of a brain dump than an answer)
                                The Precision 7730 generation is pretty new, so the first thing I would check/watch for is firmware update availability.

                                There is a BIOS update with the following fixes, but don’t see anything related to our issue. I can go ahead and update the system with all the latest fixes available if requested.

                                Dell Support for Precision 7730

                                • Fixes the issue where the mouse lags when the Dell TB16 dock is unplugged or plugged in.
                                • Fixes the issue where the system cannot set hard drive password with Dell Client Configuration Toolkit.
                                • Fixes the issue where the system always boots to Rufus formatted USB drives instead of internal hard drive.
                                • Improves system performance under heavy load when connected to Dell TB18DC Dock.
                                  george1421 Moderator @jmason
                                  last edited by

                                  @jmason said in Dell 7730 precision laptop deploy GPT error message:

                                  There is a BIOS update with the following fixes, but don’t see anything related to our issue. I can go ahead and update the system with all the latest fixes available if requested.

                                  I would do this no matter what even though the change log shows the fix primarily dealing with the usb-c dock.

                                    george1421 Moderator @jmason
                                    last edited by

                                    @jmason First let me say thank you for being so detailed and helping debug this issue. I know it takes quite a bit of time to do this iterative testing, so Thank You.

                                    So I see from the pictures that in 2 of 10 boots (20%) it looks like nvme disk 1 inits before disk 0. Can we correlate the 6th and 9th runs with the swap as shown by lsblk?

                                    I also see from the picture that the PCI addresses for the nvme disks are not changing, at least the location vs name. My intuition is telling me that this problem is probably rooted in the Linux kernel and/or a hardware/Linux kernel race condition.

                                      jmason @george1421
                                      last edited by jmason

                                      @george1421

                                      Here are two images showing the correlation. I only had to attempt deploy twice this time to get the difference to appear.

                                      nvme1n1-nvme0n1.PNG

                                      nvme0n1-nvme1n1.PNG

                                      Well, I just ran it a third time and got another difference, but it appears it might be just like the first image displayed, only with the output in a different order.
                                      nvme0n1-nvme1n1-1TB-500GB.PNG

                                        Sebastian Roth Moderator
                                        last edited by

                                        @jmason Great stuff. Thanks for that as well. From those pictures it looks like the order of initialization (dmesg output nvme0n1 before nvme1n1 or vice versa) does not correlate with the disks being in a different order.

                                        1. picture: init nvme1n1 before nvme0n1 - nvme0n1 954 GB disk / nvme1n1 477 GB disk
                                        2. picture: init nvme0n1 before nvme1n1 - nvme0n1 477 GB disk / nvme1n1 954 GB
                                        3. picture: init nvme0n1 before nvme1n1 - nvme0n1 954 GB disk / nvme1n1 477 GB

                                        I suppose if you’d have three disks it could be any combination… 😞
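The three pictures above can be written down as data to check the claim. This is just a small sketch of the reasoning; the variable and function names are made up, and each tuple records which device initialized first and which device name held the 954 GB disk.

```python
from collections import defaultdict

# (device initialized first in dmesg, device name holding the 954 GB disk)
observations = [
    ("nvme1n1", "nvme0n1"),  # picture 1
    ("nvme0n1", "nvme1n1"),  # picture 2
    ("nvme0n1", "nvme0n1"),  # picture 3
]


def init_order_predicts_mapping(obs):
    """True if each init order always comes with the same size-to-name mapping."""
    seen = defaultdict(set)
    for first_init, big_disk_name in obs:
        seen[first_init].add(big_disk_name)
    return all(len(names) == 1 for names in seen.values())


print(init_order_predicts_mapping(observations))  # False
```

Pictures 2 and 3 share the same init order but have the 954 GB disk under different names, so the init order alone cannot explain the swap.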

                                        Let’s see what we can do about this. Can you please get a couple of different Linux Live ISOs and do exactly the same testing on those.

                                        • Debian: https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-9.7.0-amd64-xfce.iso
                                        • Ubuntu: http://releases.ubuntu.com/18.04.1/ubuntu-18.04.1-desktop-amd64.iso
                                        • Arch: https://mirror.orbit-os.com/archlinux/iso/2019.01.01/archlinux-2019.01.01-x86_64.iso
                                        • SystemRescueCD: https://osdn.net/projects/systemrescuecd/storage/releases/6.0.1/systemrescuecd-6.0.1.iso

                                        Please see if all of those behave exactly the same (random change on every reboot) or if the disk order seems stable. Boot each OS at least ten times.

                                          jmason @Sebastian Roth
                                          last edited by

                                          @Sebastian-Roth Grabbing the ISOs now, debian and ubuntu just updated their release.

                                          https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-9.8.0-amd64-xfce.iso
                                          http://releases.ubuntu.com/18.04.2/ubuntu-18.04.2-desktop-amd64.iso

                                            jmason @Sebastian Roth
                                            last edited by jmason

                                            @Sebastian-Roth

                                            I am unable to get Debian live to start up after the initial run without install menu.

                                            Ubuntu showed the behavior on the 3rd reboot with lsblk and on the 5th reboot with dmesg, while reboot 7 was different from all previous ones. I’ll move on to the other 2 ISOs next.

                                            -1-ubuntulive-1.PNG

                                            -2-ubuntulive-2.PNG

                                            -3-ubuntulive-3.PNG

                                            -4-ubuntulive-4.PNG

                                            -5-ubuntulive-5.PNG

                                            -6- was like -3-

                                            -7-ubuntulive-7.PNG
