Deployment stuck in a loop, never finishes imaging?
-
@salted_cashews Yes. It IS a functioning Linux OS. If you boot into debug mode and then give root a password, you can SSH into the box as root and run the debug deployment/capture remotely. I use this method when debugging/developing post-install scripts.
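The debug workflow described here could be sketched roughly as follows. This is only an illustration: fetch_partclone_log is a hypothetical helper, 192.0.2.10 is a placeholder client address, and /var/log/partclone.log is an assumed log location inside FOS that may differ on your version. It also assumes you have already run passwd on the client console in debug mode.

```shell
#!/bin/sh
# Hypothetical helper: copy partclone.log off a client that is sitting in a
# FOS debug session (root password already set on the client with passwd).
# DRY_RUN=1 prints the command instead of running it.
fetch_partclone_log() {
    host=$1
    dest=${2:-partclone.log}
    # Assumed log location inside FOS; override with FOS_LOG_PATH if it differs.
    log_path=${FOS_LOG_PATH:-/var/log/partclone.log}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'ssh root@%s cat %s > %s\n' "$host" "$log_path" "$dest"
    else
        ssh "root@$host" cat "$log_path" > "$dest"
    fi
}

# Show what would be run for a placeholder client address.
DRY_RUN=1 fetch_partclone_log 192.0.2.10
```

Piping the file through ssh and cat only needs a working login shell on the client, so it does not depend on scp or sftp being available in the boot environment.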
-
@salted_cashews Are you sure the image was captured with Zstd as well? If you change that option in the image setting you need to re-capture it!
Running a debug deploy task and ssh into it (you need to set a root password within the booted FOS environment on your client machine using the passwd command) to look at the partclone.log is definitely a good idea.
-
@Sebastian-Roth I’m 100% positive it was captured using Zstd. The only things I can think of are something we did on the image before capture or a network issue during it.
-
@salted_cashews See if you can grab the partclone.log file and hope we get some more information from that.
-
@salted_cashews I find it really strange that the filesystem does not seem to be clean. Are you sure the filesystem was clean when you initially captured the image? Sure the machine was not in some kind of hibernation when it was PXE booted to be captured?
-
@Sebastian-Roth Indeed, the image was a CentOS 7 image that had been rebooted (hibernation on the OS is disabled via the GUI). This had happened with one other image as well, and I remember us running some basic “clean up” tasks beforehand. It’s possible these mucked up the file system or something. Let me see if I can trace back exactly what we did.
-
@salted_cashews said in Deployment stuck in a loop, never finishes imaging?:
Let me see if I can trace back exactly what we did.
Maybe .bash_history …?
-
@Sebastian-Roth To my dismay the host was just “nuked” this morning. On the bright side I’m testing another deploy debug and I’m SSHd into the guy. Is it possible to have the root password set via passwd by default on a deploy/capture? I’d love to just be able to jump in like this at-will.
-
@salted_cashews Take a look at Tom’s post here: https://forums.fogproject.org/post/88286
Though I have not tested this myself lately it should still work I reckon.
-
@Sebastian-Roth Thank you sir, as far as the logs are concerned this is what they report:
Partclone v0.2.89 http://partclone.org
Starting to restore image (-) to device (/dev/sda3)
note: Storage Location 10.10.100.252:/images/, Image name PPS_v9.0R2-dev_CentOS
we need memory: 208468 bytes
image head 4160, bitmap 200208, crc 4100 bytes
Calculating bitmap... Please wait...
get device size 53687091200 by ioctl BLKGETSIZE64, done!
File system: EXTFS
Device size: 6.6 GB = 1601624 Blocks
Space in use: 4.5 GB = 1097323 Blocks
Free Space: 2.1 GB = 504301 Blocks
Block size: 4096 Byte
read ERROR:No such file or directory
Following this I get a bunch of errors about “inode” something or other, and then the eventual reboot.
-
@salted_cashews Please run the following command on your FOG server:
file /images/PPS_v9.0R2-dev_CentOS/d1p3.img
Post output here.
-
/images/PPS_v9.0R2-dev_CentOS/d1p3.img: data
-
@salted_cashews Please run the same for all image files in that directory:
file /images/PPS_v9.0R2-dev_CentOS/d1p*
-
@Sebastian-Roth said in Deployment stuck in a loop, never finishes imaging?:
file /images/PPS_v9.0R2-dev_CentOS/d1p*
/images/PPS_v9.0R2-dev_CentOS/d1p1.img: data
/images/PPS_v9.0R2-dev_CentOS/d1p2.img: data
/images/PPS_v9.0R2-dev_CentOS/d1p3.img: data
/images/PPS_v9.0R2-dev_CentOS/d1p4.ebr: DOS/MBR boot sector; partition 1 : ID=0x82, start-CHS (0x1bf,247,57), end-CHS (0x2d9,99,10), startsector 8192, 20971520 sectors, extended partition table (last)
/images/PPS_v9.0R2-dev_CentOS/d1p5.ebr: data
-
@salted_cashews Possibly this version of file does not detect Zstd compressed files. Please try to manually extract the image to see if that works properly:
zstdmt -d /images/PPS_v9.0R2-dev_CentOS/d1p3.img -o /images/PPS_v9.0R2-dev_CentOS/d1p3_extracted.dat
See if that triggers an error or not.
Hint: You might need to install package zstd on your FOG server.
-
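If the local file utility only reports "data", one rough alternative is to look at the first four bytes of the image directly. The sketch below assumes the image starts with a regular Zstandard frame, whose magic number 0xFD2FB528 is stored little-endian as the bytes 28 b5 2f fd. The helper name is_zstd is made up for illustration, and a file that fails this check is not necessarily uncompressed or damaged.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the file begins with the Zstandard frame
# magic number 0xFD2FB528 (little-endian on disk: 28 b5 2f fd).
is_zstd() {
    # Read the first 4 bytes and render them as a plain hex string.
    magic=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "28b52ffd" ]
}

# Example use on the FOG server (path taken from the thread):
#   is_zstd /images/PPS_v9.0R2-dev_CentOS/d1p3.img && echo "looks like zstd"
```

This only inspects the header, so it says nothing about whether the rest of the file is complete.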
-dev_CentOS/d1p3.img : 4201 MB...
-dev_CentOS/d1p3.img : Read error (39) : premature end
-
@salted_cashews Make sure you have enough space on your disk:
df -h
Now as a test, please do the same with another image file:
zstdmt -d /images/PPS_v9.0R2-dev_CentOS/d1p1.img -o /images/PPS_v9.0R2-dev_CentOS/d1p1_extracted.dat
From my point of view the manual extraction test should not give you an error if the image file is fine.
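As a possible shortcut for the same test, zstd has a -t option that checks a compressed file without writing the extracted data, so no extra disk space is needed. The sketch below assumes the zstd installed on the FOG server supports -t, -q and --. The function name check_images is made up for illustration. It only verifies the compression layer, not the partclone data inside.

```shell
#!/bin/sh
# Hypothetical helper: run the zstd integrity test on every d1p*.img file in
# an image directory and report which ones fail.
check_images() {
    dir=$1
    status=0
    for img in "$dir"/d1p*.img; do
        [ -e "$img" ] || continue        # skip the literal pattern if nothing matches
        if zstd -t -q -- "$img"; then    # -t: test only, nothing is written
            printf 'OK       %s\n' "$img"
        else
            printf 'DAMAGED  %s\n' "$img"
            status=1
        fi
    done
    return "$status"
}

# Example use on the FOG server (path taken from the thread):
#   check_images /images/PPS_v9.0R2-dev_CentOS
```

A truncated file should show up here with the same kind of "premature end" message that the manual extraction produces.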
-
zstdmt -d /images/Ciara_CentOS-BASEmk3/d1p1.img -o /images/Ciara_CentOS-BASEmk3/d1p1_extracted.dat
/images/Ciara_CentOS-BASEmk3/d1p1.img: 241217197 bytes
No error this time. The error I received running it manually on the other image is the same error that displays during the task as well (on the partclone progress screen).
df -h
Filesystem                      Size  Used Avail Use% Mounted on
udev                            2.9G     0  2.9G   0% /dev
tmpfs                           597M   61M  537M  11% /run
/dev/mapper/FOG--DHCP--vg-root   24G  7.7G   15G  36% /
tmpfs                           3.0G     0  3.0G   0% /dev/shm
tmpfs                           5.0M     0  5.0M   0% /run/lock
tmpfs                           3.0G     0  3.0G   0% /sys/fs/cgroup
/dev/sdb1                       1.8T  1.4T  334G  81% /images
/dev/sda1                       472M  108M  341M  24% /boot
tmpfs                           597M     0  597M   0% /run/user/1000
-
@salted_cashews and @Sebastian-Roth
I believe the problem is coming from the second -d.
Particularly in the naming of the image, it appears to be doing:
zstdmt -d /images/PPS_v9.0R2
Then it gets a second -d from the -dev_CentOS/d1p1.img part.
So it’s literally, I think, doing:
zstdmt -d ev_CentOS/d1p1.img
Does this make sense?
I think the - in the image name is causing issues parsing into the zstdmt command. The reason it doesn’t impact Ciara_CentOS-BASEmk3 is because, likely, there is no argument for -B, so it just uses it like a normal string.
Maybe we need to add some quoting to the scripting?
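One way to look into the quoting question is to print how the shell actually splits a command line before anything like zstdmt sees it. The sketch below is only an illustration: show_args is a made-up helper, and the image paths are the ones from this thread. The zstdmt line is left as a comment and assumes that this zstd build accepts -- as an end-of-options marker.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line in brackets, so it
# is visible where the shell split the command line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

img_dir='/images/PPS_v9.0R2-dev_CentOS'

# Quoting the expansions keeps each path together as one argument, and "--"
# marks the end of options for tools that honor it.
show_args -d -o "$img_dir/d1p1_extracted.dat" -- "$img_dir/d1p1.img"

# The equivalent extraction call would then look like:
#   zstdmt -d -o "$img_dir/d1p1_extracted.dat" -- "$img_dir/d1p1.img"
```

If each path shows up as a single bracketed argument, the shell is not splitting it at the hyphen, and any remaining option parsing would have to happen inside zstdmt itself.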
-
@Tom-Elliott Interesting catch, would just renaming and re-associating via the Web GUI work to fix this?