Deployment stuck in a loop, never finishes imaging?

salted_cashews

So I’ve set deployment tasks to run and they’ve been running for approximately 7hrs. The usual time is around 1hr.

I’ve observed that it keeps trying to deploy the same partition over and over again. The only things that changed were the following settings under the Web GUI’s “Fog Settings”:
FOG_IMAGE_COMPRESSION_FORMAT_DEFAULT = PartClone Zstd
FOG_PIGZ_COMP = 19

The only thing that doesn’t make sense to me is that all of our images already use that compression format at level 19, so I don’t see why this would be a problem. Strangely, I was able to deploy 2 other images perfectly fine. Any ideas? FOG v1.4.4.

[screenshot]

I noticed it says to look at /var/log/partclone.log, but there doesn’t seem to be one.

Junkhacker

@salted_cashews it looks like the image file for partition 3 is missing

salted_cashews

@Junkhacker [screenshot]

Am I reading this wrong? This is weirding me out. Is there a reason I don’t have a partclone log?

Junkhacker

@salted_cashews the log that gets generated during the imaging process is on the client booted to FOS. it’s gone as soon as the computer reboots.

salted_cashews

@Junkhacker Oh I see, is FOS the preboot environment?

Junkhacker

@salted_cashews yes. it’s the minimal Linux OS that loads over the network to do the imaging tasks

salted_cashews

@Junkhacker Thanks for the info!

george1421

@salted_cashews FOS is the Fog Operating System that runs on the target computer. It is Linux-based and is built from bzImage (kernel) and init.xz (virtual HD).

If you run a debug capture/deployment you can access this log file. It only exists on the virtual ram drive that FOS uses.

salted_cashews

@george1421 This is really interesting, is this why I’m able to almost-ssh into the box during an image/network boot?

george1421

@salted_cashews Yes. It IS a running Linux OS. If you boot into debug mode and then give root a password, you can ssh into the box as root and run the debug deployment/capture remotely. I use this method when debugging/developing post install scripts.

Sebastian Roth

@salted_cashews Are you sure the image was captured with Zstd as well? If you change that option in the image setting you need to re-capture it!

Running a debug deploy task and SSHing into it to look at the partclone.log is definitely a good idea (you need to set a root password within the booted FOS environment on your client machine using the passwd command).
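A minimal sketch of pulling the log off the client before it reboots, assuming a root password has already been set with passwd at the FOS debug prompt and that root SSH login works as described above. The function name fetch_partclone_log, the SSH override variable, and the example IP address are hypothetical and only there for illustration; the log path is the one mentioned earlier in this thread.

```shell
# Copy /var/log/partclone.log from a client booted into a FOS debug task.
# Prerequisite (run on the client's console first): passwd
# Usage: fetch_partclone_log CLIENT_IP [OUTPUT_FILE]
fetch_partclone_log() {
  client_ip="$1"
  out="${2:-partclone.log}"
  # SSH can be overridden for a dry run; it defaults to the real ssh client.
  ${SSH:-ssh} "root@${client_ip}" 'cat /var/log/partclone.log' > "$out"
}

# Example call (192.0.2.10 is a placeholder address, replace with the client's IP):
# fetch_partclone_log 192.0.2.10 ./partclone.log
```

Saving the log to your own machine matters here because the file lives only in the client's RAM disk and disappears on reboot.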

salted_cashews

@Sebastian-Roth I’m 100% positive it was captured using Zstd, the only thing I can think of is something we did on the image before capture or a network issue during.

Sebastian Roth

@salted_cashews See if you can grab the partclone.log file and hope we get some more information from that.

Sebastian Roth

@salted_cashews I find it really strange that the filesystem does not seem to be clean. Are you sure the filesystem was clean when you initially captured the image? Sure the machine was not in some kind of hibernation when it was PXE booted to be captured?

salted_cashews

@Sebastian-Roth Indeed, the image was a CentOS 7 image that had been rebooted (hibernation on the OS is disabled via the GUI). This had happened with one other image as well, and I remember us running some basic “clean up” tasks beforehand. It’s possible these mucked up the file system or something. Let me see if I can trace back exactly what we did.

Sebastian Roth

@salted_cashews said in Deployment stuck in a loop, never finishes imaging?:

Let me see if I can trace back exactly what we did.

Maybe .bash_history…?

salted_cashews

@Sebastian-Roth To my dismay the host was just “nuked” this morning. On the bright side I’m testing another deploy debug and I’m SSHd into the guy. Is it possible to have the root password set via passwd by default on a deploy/capture? I’d love to just be able to jump in like this at-will.

Sebastian Roth

@salted_cashews Take a look at Tom’s post here: https://forums.fogproject.org/post/88286

Though I have not tested this myself lately it should still work I reckon.

salted_cashews

@Sebastian-Roth Thank you sir, as far as the logs are concerned this is what they report:

Partclone v0.2.89 http://partclone.org
Starting to restore image (-) to device (/dev/sda3)
note: Storage Location 10.10.100.252:/images/, Image name PPS_v9.0R2-dev_CentOS
we need memory: 208468 bytes
image head 4160, bitmap 200208, crc 4100 bytes
Calculating bitmap... Please wait... get device size 53687091200 by ioctl BLKGETSIZE64,
done!
File system:  EXTFS
Device size:    6.6 GB = 1601624 Blocks
Space in use:   4.5 GB = 1097323 Blocks
Free Space:     2.1 GB = 504301 Blocks
Block size:   4096 Byte
read ERROR:No such file or directory

Following this I get a bunch of errors about “inode” something or other, and then the eventual reboot.
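As a quick sanity check on the numbers in that log, the block counts can be converted to bytes with plain shell arithmetic. This is only a sketch using the figures quoted above; the variable names are made up for illustration, and it does not by itself explain the read error.

```shell
# Figures copied from the partclone log above.
block_size=4096
total_blocks=1601624
used_blocks=1097323
free_blocks=504301
device_bytes=53687091200   # reported by ioctl BLKGETSIZE64 for /dev/sda3

fs_bytes=$((total_blocks * block_size))
used_bytes=$((used_blocks * block_size))
free_bytes=$((free_blocks * block_size))
sum_blocks=$((used_blocks + free_blocks))
device_gib=$((device_bytes / 1024 / 1024 / 1024))

echo "filesystem in image: $fs_bytes bytes"   # 6560251904, about 6.6 GB
echo "space in use:        $used_bytes bytes" # 4494635008, about 4.5 GB
echo "free space:          $free_bytes bytes" # 2065616896, about 2.1 GB
echo "used + free blocks:  $sum_blocks"       # 1601624, matches the total
echo "target partition:    $device_gib GiB"   # 50
```

So the header values are internally consistent, and the 50 GiB target partition is much larger than the 6.6 GB filesystem stored in the image. The failure comes after the header is parsed, when partclone starts reading the actual data.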

Sebastian Roth

@salted_cashews Please run the following command on your FOG server: file /images/PPS_v9.0R2-dev_CentOS/d1p3.img

Post output here.
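To check every partition file of the image in one go rather than only d1p3.img, something like the following could be run on the FOG server. It is a rough sketch: the function name check_image_parts is hypothetical, the d1pN.img naming is taken from the command above, and the list of partition numbers is an assumption you would adjust to your image.

```shell
# Report whether each expected partition image exists and is non-empty.
# Usage: check_image_parts IMAGE_DIR PART_NUMBER...
check_image_parts() {
  image_dir="$1"
  shift
  status=0
  for n in "$@"; do
    f="$image_dir/d1p$n.img"
    if [ ! -e "$f" ]; then
      echo "MISSING $f"
      status=1
    elif [ ! -s "$f" ]; then
      echo "EMPTY $f"
      status=1
    else
      echo "OK $f"
    fi
  done
  # Non-zero exit status if any file was missing or empty.
  return "$status"
}

# Example call (partition numbers 1 2 3 are an assumption):
# check_image_parts /images/PPS_v9.0R2-dev_CentOS 1 2 3
```

Any file reported as OK can then be inspected further with the file command as requested above.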
