    Wrong target device

    Unsolved | FOG Problems
    7 Posts 3 Posters 101 Views
    • Floppyrub
      last edited by

      I’ve currently run into a problem in FOG that I can’t resolve. I’m using version 1.5.10.1698. I have different systems with varying numbers of hard drives. All of the systems have an NVMe drive, which is where the installation is supposed to go. Sometimes the systems also have one or two additional data HDDs.

      As soon as even one HDD is connected, FOG always selects it as the target device for deployment. This has already led to a painful data loss. In debug mode, the NVMe drive is displayed correctly, and if I disconnect the HDD, the installation works fine on the NVMe.

      Could this issue be related to how the Windows image was created? I currently can’t rule out that when the Sysprep was performed, the HDD was also connected alongside the NVMe (the drive on which Windows is installed).

      I seem to remember that in earlier FOG versions, it always selected the NVMe drives automatically. I’m now trying to find out where exactly the error occurs. Could you tell me which file in FOG determines the target disk for deployment?

    • Tom Elliott @Floppyrub
        last edited by Tom Elliott

        @Floppyrub The getHardDisk function in /usr/share/fog/lib/funcs.sh is where the code is located.

        You’ll want to look at github.com/fogproject/fos for this, specifically the file at https://github.com/FOGProject/fos/tree/master/Buildroot/board/FOG/FOS/rootfs_overlay/usr/share/fog/lib/funcs.sh.

        Recent updates were made attempting to account for the “Host Primary Disk” field, allowing serial/WWN/disk-size lookups to help pinpoint which drive to use for imaging when that field is set.

        As a point of consistency, it now de-duplicates and sorts the drives, so it’s possible that:

        /dev/hdX is chosen as the primary drive before /dev/nvmeX because of the sorting feature.

        There’s no real way to consistently ensure NVMe loads before HDDs, though, so there was always that potential; it’s just that NVMe runs on the PCI bus directly rather than the ATA buses (which are generally much slower to power on).

        Now /dev/sdX (in the new layout) would most likely be safe, because lexicographically it would fall after the NVMe devices in name sorting, I’d imagine.
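        As a quick illustration of that ordering (hypothetical device names, not output from a real machine), the same sort -u used in getHardDisk places hdX before the NVMe devices and sdX after them, regardless of discovery order:

```shell
# Devices in the order a kernel might discover them, piped through
# the same `sort -u` used in getHardDisk.
printf '%s\n' /dev/sda /dev/nvme0n1 /dev/hda /dev/nvme0n1 | sort -u
# -> /dev/hda
#    /dev/nvme0n1
#    /dev/sda
```

        So a legacy hdX drive would be picked as the first disk even when the NVMe was discovered first, while sdX names stay behind the NVMe names.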

        Currently, I’m aware that the released version of the inits likely also pre-sorts by disk size (assuming the largest drive is the primary disk you’d want to send the image to when you’re not using the Host Primary Disk feature).

        From my viewpoint (limited as it may be), you may need to start using UUID/WWN/serial identifiers for these multi-disk setups where you don’t want to accidentally overwrite a disk.
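        To sketch why a serial (or WWN/UUID) pins down the disk where a device name cannot, here is a minimal, self-contained example; the helper function, device names, and serials are all hypothetical, not the actual FOS matching code:

```shell
# Pick a device by serial from a "device=serial" inventory, similar in
# spirit to what lsblk -pdno KNAME,SERIAL reports. The same physical
# disk is found no matter which /dev/nvmeXn1 name it got on this boot.
pick_by_serial() {
    want=$1; shift
    for entry in "$@"; do
        dev=${entry%%=*}
        serial=${entry#*=}
        if [ "$serial" = "$want" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}

# Boot A: the wanted disk enumerated second; boot B: it enumerated first.
pick_by_serial S1ABC123 /dev/nvme0n1=S9XYZ999 /dev/nvme1n1=S1ABC123   # -> /dev/nvme1n1
pick_by_serial S1ABC123 /dev/nvme0n1=S1ABC123 /dev/nvme1n1=S9XYZ999   # -> /dev/nvme0n1
```

        Either way, the serial resolves to the same physical disk, which is exactly what the name-based selection cannot promise.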

        Easier said than done, but my point is that getHardDisk is a best-guess algorithm at its core. It “seemed” better on older systems, but as new technologies and methods of reading data come about, there’s no reliable “this is definitely the drive this user wants the OS to sit on” method available to anyone.

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

        Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

        Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        • Floppyrub
          last edited by

          @Tom-Elliott Thanks for your detailed response. I can understand the problem well. On desktops, you can simply disconnect the HDDs if necessary. However, the day before yesterday, I had a laptop with two NVMe drives, where that would be very inconvenient. There, the installation also went to the second NVMe instead of the first.

          The hint about funcs.sh is extremely valuable. I had already come across the line /usr/share/fog/lib/funcs.sh in the file fog.custominstall. When I wanted to look at the file, I noticed that this directory does not exist in my installation. I had assumed it was old custom code I had written. Now I know that’s not the case. I will download the file from GitHub and place it on the system, and I’m quite confident that this will solve the problem, at least initially.

          The system was freshly installed at the end of August with version 1.5.10.1600, so the error must have occurred there.

          • Tom Elliott @Floppyrub
            last edited by

            @Floppyrub The code exists in the FOS system (the environment a machine boots into for a task), not on your server.


            • Floppyrub
              last edited by

              I wanted to just accept the situation as it is. However, I’ve already suffered a second painful data loss because I overlooked a SATA HDD and didn’t disconnect it. I don’t think it’s working the way it should. All my systems are Dell machines that were previously deployed using FOG, and the correct NVMe drive was always selected as the target before. Is there anything I can do to solve the problem?

              • Tom Elliott @Floppyrub
                last edited by Tom Elliott

                @Floppyrub Have you updated to the dev branch? You are free to run any task as a debug task, initially to validate things are working as expected before they get too far and cause any actual “data loss activities”.

                The latest FOS code is what the dev branch pulls in by default.

                If it helps you to see how it functions, the getHardDisk function starts at line 1501 of the code link I provided.

                If it helps to see the function as a whole:

                getHardDisk() {
                    hd=""
                    disks=""
                
                    # Get valid devices (filter out 0B disks) once, sort lexicographically for stable name order
                    local devs
                    devs=$(lsblk -dpno KNAME,SIZE -I 3,8,9,179,202,253,259 | awk '$2 != "0B" { print $1 }' | sort -u)
                
                    if [[ -n $fdrive ]]; then
                        local found_match=0
                        for spec in ${fdrive//,/ }; do
                            local spec_resolved spec_norm spec_normalized matched
                            spec_resolved=$(resolve_path "$spec")
                            spec_norm=$(normalize "$spec_resolved")
                            spec_normalized=$(normalize "$spec")
                            matched=0
                
                            for dev in $devs; do
                                local size uuid serial wwn
                                size=$(blockdev --getsize64 "$dev" | normalize)
                                uuid=$(blkid -s UUID -o value "$dev" 2>/dev/null | normalize)
                                read -r serial wwn <<< "$(lsblk -pdno SERIAL,WWN "$dev" 2>/dev/null | normalize)"
                
                                [[ -n $isdebug ]] && {
                                    echo "Comparing spec='$spec' (resolved: '$spec_resolved') with dev=$dev"
                                    echo "  size=$size serial=$serial wwn=$wwn uuid=$uuid"
                                }
                                if [[ "x$spec_resolved" == "x$dev" || \
                                      "x$spec_normalized" == "x$size" || \
                                      "x$spec_normalized" == "x$wwn" || \
                                      "x$spec_normalized" == "x$serial" || \
                                      "x$spec_normalized" == "x$uuid" ]]; then
                                    [[ -n $isdebug ]] && echo "Matched spec '$spec' to device '$dev' (size=$size, serial=$serial, wwn=$wwn, uuid=$uuid)"
                                    matched=1
                                    found_match=1
                                    disks="$disks $dev"
                                    # remove matched dev from the pool
                                    devs=${devs// $dev/}
                                    break
                                fi
                            done
                
                            [[ $matched -eq 0 ]] && echo "WARNING: Drive spec '$spec' does not match any available device." >&2
                        done
                
                        [[ $found_match -eq 0 ]] && handleError "Fatal: No valid drives found for 'Host Primary Disk'='$fdrive'."
                
                        disks=$(echo "$disks $devs" | xargs)   # add unmatched devices for completeness
                
                    elif [[ -r ${imagePath}/d1.size && -r ${imagePath}/d2.size ]]; then
                        # Multi-disk image: keep stable name order
                        disks="$devs"
                    else
                        if [[ -n $largesize ]]; then
                            # Auto-select largest available drive
                            hd=$(
                                for d in $devs; do
                                    echo "$(blockdev --getsize64 "$d") $d"
                                done | sort -k1,1nr -k2,2 | head -1 | cut -d' ' -f2
                            )
                        else
                            for d in $devs; do
                                hd="$d"
                                break
                            done
                        fi
                        [[ -z $hd ]] && handleError "Could not determine a suitable disk automatically."
                        disks="$hd"
                    fi
                
                    # Set primary hard disk
                    hd=$(awk '{print $1}' <<< "$disks")
                }
                

                Ultimately, the part I’m worried about is the sort -u, as that will lexicographically sort the drives regardless of the order lsblk returns them in (which is the part I was stating earlier: there’s no true OS load order, as PCI tends to come up faster than serial and parallel buses).

                I have adjusted the code slightly and am rebuilding with that adjustment at the beginning of the function, where we get all available drives:

                devs=$(lsblk -dpno KNAME,SIZE -I 3,8,9,179,202,253,259 | awk '$2 != "0B" { print $1 }' | sort -u)
                

                Instead of sort -u I’m going to try:

                devs=$(lsblk -dpno KNAME,SIZE -I 3,8,9,179,202,253,259 | awk '$2 != "0B" && !seen[$1]++ { print $1 }')
                

                Basically, that will get only unique drive entries but keep them in the order in which lsblk sees the drives.
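                The difference between the two pipelines shows up as soon as name order and discovery order disagree; with hypothetical device names (NVMe discovered first, an IDE HDD listed twice):

```shell
# sort -u: unique entries, but lexicographic order wins,
# so the HDD jumps ahead of the NVMe.
printf '%s\n' /dev/nvme0n1 /dev/hda /dev/hda | sort -u
# -> /dev/hda
#    /dev/nvme0n1

# awk '!seen[$1]++': unique entries, discovery order preserved.
printf '%s\n' /dev/nvme0n1 /dev/hda /dev/hda | awk '!seen[$1]++'
# -> /dev/nvme0n1
#    /dev/hda
```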

                I doubt this will “fix” the issue you’re seeing, but it’s worth noting.

                I still need to clarify, however, that this isn’t a fault in the code. There’s no guaranteed method to ensure we always get the right drive, because on newer systems the device that gets one label this cycle can easily get another label the next cycle.

                IDE drives will always load as hda, hdb, hdc, hdd; that is about the only “guarantee” we can give.

                Serial buses (USB, SATA, etc.): SATA generally loads in channel order, but USB might or might not load first, so something on USB might take /dev/sda on this boot, while on the next the channel 0 controller might take /dev/sda.

                NVMe: what’s nvme0n1 on this cycle might become nvme1n1 on the next.

                This is why the function you see is “best guess” at best.

                I do want to make this more stable on your side of things, for sure, but I want to be clear about what you’re seeing: there is no way to guarantee we always pick the “right” drive.


                • Fog_Newb @Floppyrub
                  last edited by Fog_Newb

                  @Floppyrub said in Wrong target device:

                  @Tom-Elliott Thanks for your detailed response. I can understand the problem well. On desktops, you can simply disconnect the HDDs if necessary. However, the day before yesterday, I had a laptop with two NVMe drives,

                  I’ve run into this problem on a PC that had two NVMe drives. My understanding of the reason is that sometimes one NVMe initializes first, so /dev/nvme0n1 is sometimes /dev/nvme1n1. If the drives are different sizes, you can specify the size of the target drive as the Host Primary Disk (or at least you could). Currently I have two NVMe drives of the same size, so I use the serial number of the drive as the Host Primary Disk. It works.
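                  For anyone wanting to do the same, the identifiers a disk can be matched on are easy to read on the target machine, for example from a FOG debug task. This assumes a reasonably current util-linux lsblk; exact column support can vary by version:

```shell
# Whole disks only (-d), full paths (-p), no header (-n):
# kernel name, size, serial number, and WWN. Any of these values
# can be tried in the host's "Host Primary Disk" field.
lsblk -dpno KNAME,SIZE,SERIAL,WWN
```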

                  1 Reply Last reply Reply Quote 0
                  Copyright © 2012-2025 FOG Project