SOLVED Unable to capture image after performing iPXE boot loader update


  • I’ve gone through 5 of our FOG servers and updated the iPXE boot loaders on all of them, then re-ran the FOG installer and restarted. On our primary FOG server I am now unable to capture images. (NOTE: I was able to capture an image immediately prior to performing the iPXE boot loader update)

    During the capture sequence I am getting the following error immediately after mounting the partition:

    blkid: error: /dev/nvme0n1p4: No such file or directory

    The process continues with the partition resize test, then begins the resize process, where it hangs indefinitely. With limited Linux knowledge, I’m at a loss as to what to look for. Googling the error, I found someone who said that the libblkid-dev package may be missing, so I installed that package, but the problem persists.

    Screenshot:

    Capture.JPG


  • @george1421

    I just tried another capture after changing the FOG client on the image to point to our FOG server in Germany. This time there was a normal 1 minute pause during the resizing process then Partclone G-zip started and imaged without problems. All seems normal again…gremlins have been eradicated. You can mark this as solved.

  • Moderator

    @jyost well we can always put back the ipxe if you really think that is your problem.


  • @jyost

    I did notice something odd about the latest image capture though…
    Normally using Partclone G-Zip we end up with 10 files in the image…this time we ended up with 11 files:

    11Files.JPG

    And here is the 21H2 image I captured immediately prior to the iPXE boot loader update - 10 files:

    9Files.JPG


  • @george1421
    The VM partition is 60GB - as far as fragmentation goes the partition resides on an SSD which shouldn’t be defragmented. I’m not of the opinion that fragmentation had anything to do with the slow resizing as the image I captured immediately before the iPXE boot loader update resized within a minute or so. The same image was captured after the boot loader update (the only difference being the FOG client installation using a different IP.)

    Even with the long resizing the second time around the image uploaded and downloaded fine without any errors.

    Sidenote: You stated that FOG 1.5.9 had issues with 20H2 and possibly even 20H1. Ironically we have been using FOG 1.5.9 with 20H1, 20H2, and 21H1 without any issues whatsoever. 21H2 is where we ran into problems.

    So, moving forward…for the rest of our fog servers I take it that the order of operations to fully update our servers should be as follows:

    1.) Update FOG to v.1.5.9.111 by updating the FOS init (is a server restart or re-running installfog.sh required here?)
    2.) Update Kernel to latest version, then restart FOG server
    3.) Update iPXE to latest version, then re-run installfog.sh

    Again, thank you George, Sebastian, et al. for the excellent tech support. You guys are geniuses.

  • Moderator

    @jyost Unless the disk was badly fragmented, it shouldn’t have taken that much time to squeeze down a disk. What the FOG program does is pack the file system so that the files sit sequentially on disk, leaving as few gaps between them as possible. It’s kind of akin to disk defragmentation, but it moves all of the blocks towards the start of the disk. Then it shrinks the partition to the size of the data. That is how it can make a partition resizable: it squeezes all of the air out of the partition, then clones that squeezed partition.

    While I don’t recommend this, if you were to abort the capture right in the middle and then look at your disk structure, you would see what you see in Partclone: a 35.5 GB partition with 800 MB of free space. It’s possible that this partition was 100 GB (a guess) in size to start with, holding about 34.6 GB of data on that 100 GB disk.


  • @george1421
    No, I never do in place upgrades. This is a fresh copy of 21H2 straight from Micro$oft.
    The strange thing is I created a 21H2 image with no problems whatsoever immediately before doing the iPXE boot loader update. After the boot loader update I went back into the VM, opened the pre-sysprep snapshot, uninstalled the FOG client and reinstalled it with a different IP address (for a different site in India), as I’ve done countless times before. I re-ran sysprep and attempted the capture. That’s when the problems started…

    EDIT: AS I AM TYPING THIS Partclone finally started the capture (~25+ minutes later). Strange.

    PartClone.JPG

    I will need to test this image to see if it’s working and report back…

  • Moderator

    @jyost It almost appears there is something going on with this disk. Did the source computer have Windows 10 (any version) on it, and did someone do an in-place upgrade to a different version of Windows 10?


  • @george1421

    Closer…after updating the FOS init and restarting the server, the Init Version is now 20210807. The blkid error is also gone. “New fixed partition for (/dev/nvme0n1p4 added.” message is now displayed (which I assume is good 🙂 ). The only issue now is that the PXE boot process is now hung up on Resizing filesystem…
    Normally this passes within a minute or so and PartClone does its thing. It’s been 15 minutes now and we’re still stuck.

    Capture.JPG

  • Moderator

    @jyost If you used the git method to install FOG initially, then it’s pretty easy.

    Change into that directory (depending on whose instructions you followed, the fogproject folder will be in /root or /opt), then key in

    git pull
    git checkout dev-branch
    git pull
    

    Then simply rerun the installer with the latest options. Understand the installer will reset all of the iPXE files you have as well as bzImage which you just updated. So you will need to recompile ipxe and copy ipxe.efi and snponly.efi back over to the /tftpboot directory as well as redownload 5.10.x kernels.
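
    For anyone who wants the steps above in one place, here is a rough sketch. The function names are made up for illustration, and it assumes the checkout lives in /root or /opt as mentioned above and that the installer sits under bin/ in the checkout, which is where it normally is. Treat it as a starting point, not an official procedure.

```shell
#!/bin/sh
# Sketch only: find_fogproject and update_fog_dev are hypothetical helper names.

# Print the first candidate directory that contains a fogproject git checkout.
find_fogproject() {
    for base in "$@"; do
        if [ -d "$base/fogproject/.git" ]; then
            printf '%s\n' "$base/fogproject"
            return 0
        fi
    done
    return 1
}

# Switch the checkout to dev-branch and rerun the installer (run as root).
# Assumption: installfog.sh is located in the bin/ folder of the checkout.
update_fog_dev() {
    repo="$(find_fogproject /root /opt)" || {
        echo "fogproject folder not found in /root or /opt" >&2
        return 1
    }
    cd "$repo" || return 1
    git pull && git checkout dev-branch && git pull || return 1
    cd bin && ./installfog.sh
}
```

    Nothing runs until you call update_fog_dev yourself. Remember that the installer resets the iPXE files and bzImage, so the recompiled iPXE binaries and the newer kernels still have to be put back afterwards.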

    Edit: yeah it would probably help to see if someone already answered the question before replying.

  • Moderator

    @JYost Updating to 1.5.9.111 is one way but you can also just update the FOS init.

    fos-version.jpg

    In the picture we see you’re still using the FOS init that was released with FOG 1.5.9. This init definitely has issues with newer Windows 10 releases such as 20H2 (and also earlier versions, I think).

    To update the init just follow these commands (run as root on your FOG server):

    cd /var/www/html/fog/service/ipxe/
    mv init.xz init.xz.orig
    mv init_32.xz init_32.xz.orig
    wget https://github.com/FOGProject/fos/releases/latest/download/init.xz
    wget https://github.com/FOGProject/fos/releases/latest/download/init_32.xz
    

    Try a new capture and pay attention to the “Init version” that is shown on bootup. Should be newer than what we see in the picture now.
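
    If you have several FOG servers to update, the same steps can be wrapped in a small script. This is only a sketch: FOG_IPXE_DIR, FOS_RELEASE_URL and the two function names are invented for illustration, while the path and download URLs are the ones given above. Unlike the plain mv commands, it keeps an existing .orig backup instead of overwriting it.

```shell
#!/bin/sh
# Sketch only: variable and function names here are hypothetical.
FOG_IPXE_DIR="${FOG_IPXE_DIR:-/var/www/html/fog/service/ipxe}"
FOS_RELEASE_URL="https://github.com/FOGProject/fos/releases/latest/download"

# Rename each existing init to <name>.orig, but never overwrite an older backup.
backup_inits() {
    dir="$1"
    for f in init.xz init_32.xz; do
        if [ -e "$dir/$f" ] && [ ! -e "$dir/$f.orig" ]; then
            mv "$dir/$f" "$dir/$f.orig" || return 1
        fi
    done
}

# Download the current inits into the directory (needs wget and network access).
fetch_inits() {
    dir="$1"
    for f in init.xz init_32.xz; do
        wget -O "$dir/$f" "$FOS_RELEASE_URL/$f" || return 1
    done
}
```

    Run as root on the FOG server, for example: backup_inits "$FOG_IPXE_DIR" && fetch_inits "$FOG_IPXE_DIR". If a download fails partway, copy the .orig file back before trying another capture.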


  • @george1421
    How does one get to the dev-branch to download 1.5.9.111??? The Kernel has already been updated to the latest.

  • Moderator

    @jyost Ok two things here.

    If that is a Windows 10 20H2 or later target computer, you are going to need to do a few things. M$ changed the disk structure in 20H1 or 20H2 (sorry, I can’t remember which at the moment), and that 4th partition is now marked as unmovable. The developers have addressed this issue on the dev branch, 1.5.9.111. The other thing is that you probably need to update the FOS Linux kernel (FOG Web UI -> Fog Configuration -> Kernel update). Update to 5.10.55 or later for both the 64-bit and 32-bit versions.
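
    A quick way to check whether the kernel that actually booted meets that 5.10.55 minimum is to compare version strings. The helper below is a hypothetical sketch, not part of FOG. It assumes sort supports the -V (version sort) option, which GNU coreutils does and many BusyBox builds do, so it may or may not work at the FOS debug prompt.

```shell
#!/bin/sh
# Sketch only: version_at_least is a hypothetical helper name.
# Succeeds when version $2 is the same as or newer than the minimum $1.
version_at_least() {
    min="$1"
    have="${2%%-*}"   # drop any suffix after the first '-', e.g. 5.10.55-foo
    # The smaller of the two versions sorts first; it must be the minimum.
    [ "$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)" = "$min" ]
}
```

    Usage would be something like: version_at_least 5.10.55 "$(uname -r)" && echo "kernel is new enough".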


  • Screenshot below:

    Screen.JPG

  • Moderator

    @jyost OK first let me say there is no connection between iPXE and your current issue.

    iPXE takes the process from the PXE ROM to the iPXE menu. Once you make a selection on the iPXE menu, iPXE downloads bzImage (a.k.a. the kernel) and init.xz (the virtual hard drive), then boots bzImage. Once bzImage starts, it takes over control of the computer from the boot loader. So with bzImage running, it’s in full control of the computer.

    Based on the error I might think there is an issue with your storage device.

    What I want you to do is cancel this capture task if it’s still running. Then schedule a new capture task, but before you hit the schedule task button, tick the debug checkbox. Now schedule the task and PXE boot the target computer. After a few screens of text that you need to clear with the Enter key, you will be dropped to the FOS Linux command prompt.

    At the FOS Linux command prompt please key in the following:

    uname -a
    lsblk
    fdisk -l /dev/nvme0n1
    

    Let’s see what FOS Linux is seeing here. Stay in this mode because we might have a few more commands for you.

    On a side note, you probably need to be on FOG version 1.5.9.111 if you are trying to capture a Windows 10 20H2 or later release.
