Failure in Capturing an image

  • @Tom-Elliott @george1421

    Started working on the Linux side of the Fog-Client and quickly got that working. Ran a capture task on the Linux side to then create a new golden image of updates + Fog-Client functioning.

    It would reboot, boot into Fog and then quit without capturing the image, but it would still clear the task from the queue. I am doing this remotely, so I started a VM from the exact original image I captured before making direct changes to the Fog-Project server with customized scripts.

    Deploying an image is fine on a Mac and on Linux (currently a VM; I was out of the office when running these tests, but I am assuming the same issue is happening).

    When sending a capture task to a Linux based machine I am getting the following:

    I am not sure whether a Mac has the same issue. My Linux image is ‘Single Disk’ and resizable, though.

    This worked before changes were made to the post-init script. Any idea?

    See the below:

    Got into the office this morning, when capturing on MacOSX:

    [image]

    It looks like the wrong flags are being passed to partclone.

    Not sure what is going on but deploying an image works perfectly fine.

  • Moderator

    @Tom-Elliott said in Failure in Capturing an image:

    @ismith-hpu But the machine, indeed, does have an OS on the disk? I can only assume the -a0 is a part of the problem. I don’t know what the argument is doing for the 0.3.12 version of partclone.

    It’s to ensure compatibility with 0.2.89 images, allowing them to be deployed with 0.3.12.

    0.2.89 had a broken checksum system, so we force checksums off on 0.3.12 in order to allow those images to deploy normally. It is also faster and produces slightly smaller images, so I am certainly not complaining about that part.

    It is curious that 0.3.12 doesn’t seem to work in this instance, though I wonder if a non-resizable (not raw) capture has been tried?

  • @Tom-Elliott said in Failure in Capturing an image:

    wget -O /var/www/fog/service/ipxe/init_32.xz

    It now works completely.

    Thank you very much.

  • @Tom-Elliott
    Trying this now, will attempt to see if it deploys properly.

  • Senior Developer

    The fixes we put in place were twofold.

    First we put in the fix to address the issue of RAW imaging not passing a partition for partprobe. This was handled by passing a variable to test if the image is raw or not. If it’s raw, the flag gets set and the function returns the disk as it was sent to the function.

    Second we are using a more implicit means to return the disk when the partition information is passed.
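    As a rough illustration of the two fixes described above, here is a minimal sketch. `resolve_disk` and its arguments are hypothetical names made up for this example, not the actual FOG function; the suffix-stripping rules are also an assumption about common Linux device naming.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT the real FOG function.
# $1: device path as passed in (whole disk or partition)
# $2: 1 if the image type is raw, anything else otherwise
resolve_disk() {
    local dev=$1 is_raw=${2:-0}
    # Patterns kept in variables so bash treats them as regexes.
    local re_nvme='^(/dev/(nvme[0-9]+n[0-9]+|mmcblk[0-9]+))p[0-9]+$'
    local re_sd='^(/dev/[a-z]+)[0-9]+$'

    if [[ $is_raw == 1 ]]; then
        # Raw image: return the disk exactly as it was sent in.
        echo "$dev"
        return 0
    fi

    if [[ $dev =~ $re_nvme ]]; then
        # /dev/nvme0n1p1 -> /dev/nvme0n1
        echo "${BASH_REMATCH[1]}"
    elif [[ $dev =~ $re_sd ]]; then
        # /dev/sda1 -> /dev/sda
        echo "${BASH_REMATCH[1]}"
    else
        # Not recognisably a partition: leave it untouched.
        echo "$dev"
    fi
}
```

    With the raw flag set, `resolve_disk /dev/nvme0n1 1` simply echoes the disk back, which is the behaviour described for RAW imaging.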

    The only real difference that I’m seeing here is that the 0.3.12 partclone doesn’t like doing the translation.

    This is okay, leave your post init script in place. Edit the file and remove the -a0 (or -a1 if this is still in place) from the file.
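    If you would rather script that edit than do it by hand, something along these lines might work. This is only a sketch: `strip_checksum_flag` is a made-up name, it filters text on stdin rather than editing a file in place, and the partclone command line in the usage note is illustrative, not the line from the actual script.

```bash
#!/usr/bin/env bash
# Hypothetical filter: reads text on stdin and drops standalone
# -a0 / -a1 tokens (the partclone checksum-mode flag).
# It only removes the flag when it is preceded by whitespace and
# followed by whitespace or end of line, so "-a01" is left alone.
strip_checksum_flag() {
    sed -E 's/[[:space:]]+-a[01]([[:space:]]|$)/\1/g'
}
```

    For example, `echo 'partclone.ext4 -c -a0 -s /dev/sda1' | strip_checksum_flag` prints the same line without the `-a0`. Review the result before copying it over the original file.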

    Download the proper inits again using:

    wget -O /var/www/fog/service/ipxe/init.xz
    wget -O /var/www/fog/service/ipxe/init_32.xz

    Then you should be all set. The inits will have the proper scripts for partclone 0.2.89 as well as the changes that were in the inits you downloaded from our dev server.

  • @Tom-Elliott

    But it still needs to be able to write the partitions and data in the correct order.

    If that wasn’t the case then why did we make a post init to properly identify the nvme0n1p1 when we did the:

    lsblk -dpno KNAME -I 3,8,9,179,202,253,259 | uniq | sort -
    readlink /sys/class/block/nvme0n1p1
    disk=$(readlink /sys/class/block/nvme0n1p1)
    echo $disk
    lsblk -no pkname /dev/nvme0n1p1

    When my capture/hardware hasn’t changed.

    Look back at the conversation in PM from 11 days ago and see if that maybe adds more context? I am a little confused myself x.x
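    For context on the `readlink` line above: on Linux the sysfs link for a partition normally resolves to a path whose parent directory is the whole disk, so the disk name can be read off the link target. A minimal sketch, where `parent_from_syslink` is a hypothetical helper and the sample path in the comment is illustrative rather than taken from this machine:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given the target of /sys/class/block/<partition>,
# print the name of the parent disk.
# Only meaningful for partition links; a whole-disk link has a
# different parent directory.
parent_from_syslink() {
    local link=$1
    # e.g. ../../devices/pci0000:00/.../nvme/nvme0/nvme0n1/nvme0n1p1
    #      dirname  -> .../nvme/nvme0/nvme0n1
    #      basename -> nvme0n1
    basename "$(dirname "$link")"
}
```

    On a live system this would be called as `parent_from_syslink "$(readlink /sys/class/block/nvme0n1p1)"`, which should agree with what `lsblk -no pkname /dev/nvme0n1p1` reports.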

  • Senior Developer

    @Sebastian-Roth the Macs are being captured as raw, which captures the entire disk, not the partitions.

  • @Sebastian-Roth

    That is what I told @Tom-Elliott in the priv-chat.

    This is similar to what happened before, which he fixed in a post-init, but it has stopped working again.

    I am out of the office but will do that tomorrow.

    Additionally, this WAS working before I updated using his post-init: I can still deploy the image I captured before, and I used to be able to capture fine. Now it’s broken.

  • Developer

    @ismith-hpu In the partclone pictures we see /dev/nvme0n1, which is the whole disk. Shouldn’t it try to capture the partitions (e.g. /dev/nvme0n1p1 …)? Not sure what exactly is going wrong here. It is just the first thing that jumped out at me.

    Years ago at my old workplace we captured and deployed Mac OS X perfectly fine, so I know this has worked at some point. But so many things have changed and I don’t have a Mac at hand to test.

    Can you please schedule a debug capture task. Boot up the machine and hit ENTER twice to get to the shell. Now run fdisk -l, take a picture of the output and post here.

  • @Tom-Elliott
    -aX --checksum-mode=X  Checksum formula to use to add error detection
                           where X:
                           0: No checksum (no slowdown, smallest image)
                           1: CRC32 (Fast to compute, basic detection)

  • @Tom-Elliott

    Yes, it’s a Mac.

    The Mac will deploy an image fine (dd, raw, everything).

    But capturing an image (dd, raw, everything) does not work.

  • Senior Developer

    @ismith-hpu But the machine, indeed, does have an OS on the disk? I can only assume the -a0 is a part of the problem. I don’t know what the argument is doing for the 0.3.12 version of partclone.

  • @Tom-Elliott That is a capture task from the webUI, not a deployment.

    Just did it via the ‘task selection’ from the host page, and same interaction happened.

  • Senior Developer

    @ismith-hpu this looks like a deployment, but what does a capture look like?

  • @Tom-Elliott

    Macs can deploy but cannot capture.

    It instantly finishes writing the content but does not make an actual image; the size shows as 0.00KB on the file server:
    Here is the boot process in order.
    [image]
    [image]
    [image]

  • @Tom-Elliott

    Removed the postinit
    Replaced the inits with artifacts.

    Linux is working with capturing, still need to test deployment with Linux.

    Still needing to test deployment & capture with Mac after the above changes.

    -Edit 1-
    Linux deployment works.
    Mac deployment works, still needing to test capture. Want to write the master image before rewriting the golden image on Mac.

  • Senior Developer

    @Sebastian-Roth the post init is just placing the downloaded file in the init before the imaging task begins.

    It’s literally:
    cp ${postinitpath} /usr/share/fog/lib/

    @ismith-hpu please note if you download the artifacts init, you no longer should need to run the postinit script.
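    For anyone who does keep a post init like that around, a slightly more defensive version of the copy could look like the sketch below. `install_override` is a hypothetical name, and the source and destination are left as arguments rather than assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy an override file into a destination
# directory, but fail loudly instead of silently if either is missing.
install_override() {
    local src=$1 dest_dir=$2
    if [[ ! -f $src ]]; then
        echo "override file not found: $src" >&2
        return 1
    fi
    if [[ ! -d $dest_dir ]]; then
        echo "destination directory not found: $dest_dir" >&2
        return 1
    fi
    cp -- "$src" "$dest_dir/"
}
```

    The point is only that a missing download then shows up as an error during boot rather than as an imaging task running with the stock scripts.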


  • Developer

    @ismith-hpu said in Failure in Capturing an image:

    This worked before changes were made to the post-init script. Any idea?

    Can you please post the full post init script here so we can see what modifications you made?

    The error we see in the video is definitely caused by a partclone parameter, as mentioned by Tom. The issue is that you are using an init file from 14.07.2019 (see the video), which came with partclone 0.2.89, while the scripts were modified to match parameters that only work with partclone 0.3.12 (which we updated to in September 2019).

    So what you want to do is download the latest proper build of the inits and use those:

    Not exactly sure about the error you posted on Mac OS X. Please post the full post-init script here so we know your mods.

    Edit: Ahhh wait! The scripts only count NTFS and EXT2/3/4 as possible resizable filesystems. So on your Mac it doesn’t find any and bails out. Which version of Mac OS X do you have? Resizable is a huge problem in Mac OS - see here:

  • Senior Developer

    @Sebastian-Roth the issue is that the init in use has partclone 0.2.89, which cannot accept the -a0 argument. The file is being pulled from github fos master, which is coded for partclone 0.3.12 I believe.
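    One way to picture the mismatch is a version gate around the flag. This is only a sketch: `version_ge` and `checksum_flag` are hypothetical helpers, and the 0.3.12 threshold is an assumption taken from this thread (the only versions confirmed here are 0.2.89, which rejects -a0, and 0.3.12, which accepts it).

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if dotted version $1 >= $2.
version_ge() {
    local -a a b
    local i x y
    IFS=. read -r -a a <<< "$1"
    IFS=. read -r -a b <<< "$2"
    for ((i = 0; i < 3; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        # 10# forces base 10 so a component like "08" is not read as octal.
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# Hypothetical helper: print the checksum flag only for versions
# assumed to understand it (threshold is an assumption, see above).
checksum_flag() {
    if version_ge "$1" "0.3.12"; then
        echo "-a0"
    fi
}
```

    `checksum_flag 0.2.89` prints nothing and `checksum_flag 0.3.12` prints `-a0`, which is the combination that has to line up between the init and the scripts.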

  • This is where it is failing in the code:

                echo "Done"
                dots "Getting Windows/Linux Partition Count"
                countPartTypes "$hd" "ntfs" "ntfscnt"
                countPartTypes "$hd" "extfs" "extfscnt"
                countPartTypes "$hd" "*" "partscnt"
                if [[ $ntfscnt -eq 0 && $extfscnt -eq 0 ]]; then
                    echo "Failed"
                    handleError "No resizable partitions found ($0)\n   Args Passed: $*"
                fi
                echo "Done"

    I am not sure what is causing this issue.
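    To see why that check bails out on a Mac disk, here is a simplified stand-in for the counting logic. `is_resizable_fs` and `count_resizable` are hypothetical and are not the real `countPartTypes`; the filesystem names used for the Mac case (`hfsplus`, `apfs`) are an assumption about how the partitions would be reported.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in: only NTFS and ext2/3/4 count as resizable,
# matching what is described in this thread.
is_resizable_fs() {
    case $1 in
        ntfs|ext2|ext3|ext4) return 0 ;;
        *) return 1 ;;
    esac
}

# Print how many of the given filesystem types count as resizable.
count_resizable() {
    local count=0 fs
    for fs in "$@"; do
        if is_resizable_fs "$fs"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

    A layout such as `count_resizable vfat hfsplus` gives 0, which corresponds to both counters being zero in the quoted check and so to the "No resizable partitions found" error, while `count_resizable vfat ext4` gives 1 and would pass.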
