Debian 9.0 capture fails AND destroys client image



  • Server
    • FOG Version: Latest 1.4.4
    • OS: CentOS 7.3
    Client
    • Service Version: N/A
    • OS: Debian 9.01
    Description

    Installed Debian 9.01 as a VM thin client image. Using FOG 1.4.4 to capture the image results in the error shown below. Debian 8.8, installed the exact same way, captures just fine, as do many other Linux variants. The real shocker here is that after the capture fails, the original hard disk image is unbootable, corrupt and mangled. I believe the cardinal rule of image cloning is to never mess with the actual image on disk. Any ideas???
    0_1498691905435_fog-debian-9-error.png

    BTW, thank you for a handy piece of software.



  • Thanks @Tom-Elliott, I have successfully captured and deployed Debian 9 clients both in virtual space and to real hardware. So, it looks like your fix works. I did notice some slower booting, as you mentioned, but that seemed to be minimized after I went through a full capture-deploy-capture sequence. Perhaps there were residual file system ambiguities that were affecting the boot times. In any event, the boot times smoothed out over time. The only slightly odd thing I did notice is that sometimes the client would boot twice before the capture would actually start. I cannot reproduce this all the time. Just thought I would mention it in case it makes you think of anything.

    Thanks again; will this fix make it into 1.4.5?


  • Senior Developer

    And success.

    If you’d like to test for yourself, please download the newly built inits (these will be in 1.4.5 of course).

    It does appear to take longer to boot than one would expect under normal conditions, but it DOES boot. I’m going to try recapturing now that the inits have been updated.

    To test for yourself, please try:

    wget -O /var/www/fog/service/ipxe/init.xz https://fogproject.org/inits/init.xz
    wget -O /var/www/fog/service/ipxe/init_32.xz https://fogproject.org/inits/init_32.xz
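
Since the wget commands above overwrite the existing init images in place, it may be worth copying the current files aside first. The following is only a sketch: the function name `backup_inits` and the `.bak-<timestamp>` suffix are made up for illustration, and the default directory is simply the path used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): copy the current init images
# aside before they are overwritten by a new download.
backup_inits() {
    # Default directory is the path used in the wget commands in this post.
    dir="${1:-/var/www/fog/service/ipxe}"
    stamp=$(date +%Y%m%d%H%M%S)
    for f in "$dir"/init.xz "$dir"/init_32.xz; do
        # Skip files that are not present; -p preserves mode and timestamps.
        if [ -f "$f" ]; then
            cp -p "$f" "$f.bak-$stamp"
        fi
    done
    return 0
}
```

Calling `backup_inits` with no argument before running the two wget commands would leave timestamped copies that can be copied back if the new inits misbehave.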
    

  • Senior Developer

    Good news Everyone (Futurama reference anybody?)…

    The “fix” I’m testing seems to have worked on the capture side. I will test deploy shortly, but it should work as well.


  • Senior Developer

    I was able to, more or less, replicate the problem, albeit with a different sector failing. I don’t know why this is failing, unless the ext4 utility used to build the filesystem differs from what is normally used on the likes of Redhat, Debian 8, etc…

    I don’t like the way I’ve worked around this either. It leaves too much potential for unknown issues, but I think it might work for our needs either way.

    I’m approaching the problem by:

    1. Check the filesystem as normal. Copy the error so we can present it to the user later if needed.
    2. If the filesystem check fails, as it seems to do, attempt to forcibly fix the problem.
    3. If the forcible fix doesn’t seem to help, then we know there must be a real problem.

    I’m still waiting for the upload to complete to find out if it works. Then I’ll test if deploying the image works as well.
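
The check-then-force-repair approach described in this post could look roughly like the sketch below. This is not the actual FOS init code, which is not shown in this thread. The function name `check_and_repair` and the `FSCK` override variable are hypothetical, and it assumes `e2fsck` is the checker, using its `-f` (force check), `-n` (read-only) and `-y` (answer yes to all prompts) options.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real FOS code: check an ext filesystem,
# keep the first error for later display, then try one forced repair.
FSCK="${FSCK:-e2fsck}"   # overridable so the logic can be tried with a stub

check_and_repair() {
    part="$1"
    # First pass: forced, read-only check. Capture the output so the
    # original error can still be shown to the user later.
    if first_err=$("$FSCK" -fn "$part" 2>&1); then
        echo "clean"
        return 0
    fi
    # Second pass: forced check answering "yes" to every repair prompt.
    rc=0
    "$FSCK" -fy "$part" >/dev/null 2>&1 || rc=$?
    # e2fsck exit status below 4 means errors were corrected;
    # 4 and above means errors remain or the run itself failed.
    if [ "$rc" -lt 4 ]; then
        echo "repaired"
        return 0
    fi
    echo "failed: $first_err"
    return 1
}
```

A call such as `check_and_repair /dev/sda1` prints `clean`, `repaired`, or `failed:` followed by the saved first error. Note that the second pass writes to the disk, which is exactly the risk this post is uneasy about.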


  • Senior Developer

    @foguser438 I have a Debian 9.0 VM; I just haven’t tested uploading it yet. I prefer the VM method because it lets me “break” and “fix” things more simply than having to reinstall every time. I’ll work on it this weekend.



  • @Wayne-Workman @Tom-Elliott : In the hopes that it may help you, here is what I have done to eliminate the possibility that this problem is related to a corrupt ext4 filesystem:

    1. Rebuilt the same image from scratch three times and tried the capture each time; same result.
    2. Booted the client in rescue mode and ran e2fsck on /dev/sda1 of the image; it is clean. Ran a capture; same result.
    3. Used Clonezilla to capture/deploy the image several times. No problems. In fact, I use the CZ image to restore.
    4. Successfully captured the image using the non-resizable disk option. Deployed this image to a real thin client. Rebooted the thin client and verified filesystem integrity. Tried to capture the newly deployed thin client using the resizable option; same result.

    Therefore, I don’t think the ext4 filesystem is corrupt before the capture process. But it sure has problems after the capture process fails. I hope this helps. Please let me know what your results are when you capture a Debian 9 client.

    Thanks


  • Moderator

    @foguser438 I’ve been following this thread very closely since the first post. I want to ask if you’ve tried to rebuild the Debian 9 image from scratch a 2nd time and tried capturing that? The original error does relate to inconsistency and recommends a file system check. Instead of doing that, I default to trying a fresh-built image.



  • @Tom-Elliott I always manually partition my images for thin clients as ext4. Never use any other fstype, especially not LVM. Been burnt too many times with Clonezilla with that one… So the fstype is ext4 and has always been. I also only use one partition / and no swap.


  • Senior Developer

    @foguser438 It is, I’m just describing what I know of things at the moment.

    When you reprep your Debian 9 image system, please just make sure you set the format of the partitions to ext rather than LVM, XFS, HFS, or any other thing it may default to. /dev/sda1 is typically the “boot” partition and is normally set as ext, but the rest of the file systems would be set as LVM, which isn’t really allowable in resizable mode.
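
One way to confirm which filesystem types are actually on the disk is to list them with `lsblk` and flag anything that is not in the ext family. This is only a sketch: the function name `non_ext_partitions` is made up for illustration, `/dev/sda` is assumed from the device name mentioned in this post, and the filter flags every non-ext type (including swap), which may be stricter than what resizable mode really requires.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME FSTYPE" lines on stdin and print the
# ones whose filesystem type is not ext2, ext3 or ext4.
non_ext_partitions() {
    while read -r name fstype; do
        case "$fstype" in
            # ext family, or no filesystem signature at all
            # (for example the row for the whole disk): nothing to report.
            ext2|ext3|ext4|"") ;;
            *) echo "$name $fstype" ;;
        esac
    done
}

# Example use on the client being prepared (device name is an assumption):
#   lsblk -rno NAME,FSTYPE /dev/sda | non_ext_partitions
```

If the pipeline in the final comment prints nothing, every partition with a recognised filesystem on that disk is ext2, ext3 or ext4.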



  • Thanks @Tom-Elliott, yes non-resizable works but is obviously not desirable for the long-term. I appreciate the newness of Debian 9 but I assumed this is how this process works. New client distros are released/installed, problems are found, feedback is provided. I look forward to your input after you have had time to spin up a Debian 9 image.

    TIA


  • Senior Developer

    @foguser438 Debian 9 has been out all of a week, so capturing an image has not yet been tested (from my side). I was under the impression you meant Debian 9 was the SERVER OS, not the OS of the system being captured (sorry about that).

    Then might I recommend setting the “image type” from Resizable to Non-resizable? I’m getting the impression that, while FOS is reading one filesystem as ext4 (and allowing the partition to resize), the disk is actually partitioned using some other format, which the resizing is writing over and failing. This is all just a guess of course; I really don’t know the exact issue or a good means to validate it, particularly seeing as the error is happening immediately on the first partition.



  • Sorry I was not clear about the version information. I have FOG 1.4.4 running on a CentOS 7.3 Linux server. The client I am trying to capture the image of has Debian 9.01 installed on it. I am using the FOG 1.4.4 web interface to set up a capture task. I copied the binaries package as you suggested and rebooted. I then used Clonezilla to get the Debian 9.01 image back on the client and rebooted the client to make sure it was working. I re-ran the capture task and received the same error message as before. I only have the 1.4.3 and 1.4.4 binary packages in my FOG installation tree. Should I leave 1.4.3 installed or should I put 1.4.4 back on? I really want this FOG server installation to be standard because it is a production system. Have you tried to capture a Debian 9.0 image before? With success?

    Thanks!


  • Senior Developer

    For my own understanding/tracking, what FOG Version was installed on the Debian 8.0 Box? I ask because the FOS Linux system does not depend on the OS that’s installed on the FOG Server, so the information as you’ve put forth is very confusing to me. Between 1.4.3 and 1.4.4, a change was added to the inits to try to prevent start positions from moving in the case of Resizable images.

    It’s the resizable image imaging that broke the original “client image” only because it has to shrink the partitions down to begin with.

    I suspect the change is a part of what’s causing the problem for you.

    Would you mind, on the Debian 9 server, running:

    wget https://fogproject.org/binaries1.4.3.zip
    unzip binaries1.4.3
    cp packages/inits/init*.xz /var/www/fog/service/ipxe/
    

    This should revert the change (it was relatively minor). Could you then re-attempt capturing?

