Deployment stuck in a loop, never finishes imaging?

salted_cashews

/images/PPS_v9.0R2-dev_CentOS/d1p3.img: data

Sebastian Roth

@salted_cashews Please run the same for all image files in that directory: file /images/PPS_v9.0R2-dev_CentOS/d1p*

salted_cashews

@Sebastian-Roth said in Deployment stuck in a loop, never finishes imaging?:

file /images/PPS_v9.0R2-dev_CentOS/d1p*

/images/PPS_v9.0R2-dev_CentOS/d1p1.img: data
/images/PPS_v9.0R2-dev_CentOS/d1p2.img: data
/images/PPS_v9.0R2-dev_CentOS/d1p3.img: data
/images/PPS_v9.0R2-dev_CentOS/d1p4.ebr: DOS/MBR boot sector; partition 1 : ID=0x82, start-CHS (0x1bf,247,57), end-CHS (0x2d9,99,10), startsector 8192, 20971520 sectors, extended partition table (last)
/images/PPS_v9.0R2-dev_CentOS/d1p5.ebr: data

Sebastian Roth

@salted_cashews Possibly this version of file does not detect Zstd compressed files. Please try to manually extract the image to see if that works properly:

zstdmt -d /images/PPS_v9.0R2-dev_CentOS/d1p3.img -o /images/PPS_v9.0R2-dev_CentOS/d1p3_extracted.dat

See if that triggers an error or not.

Hint: You might need to install the zstd package on your FOG server.
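If file only reports "data", one way to double-check is to look at the first four bytes yourself. This is a minimal sketch, assuming the image is a plain Zstandard frame, which starts with the magic bytes 28 b5 2f fd. The helper name is_zstd is made up for illustration and is not part of FOG or zstd.

```shell
# Hypothetical helper: succeed if the file starts with the Zstandard
# frame magic number (bytes 28 b5 2f fd).
is_zstd() {
    # od prints the first 4 bytes as hex; tr strips spaces and the newline.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "28b52ffd" ]
}

# Example usage (not run here):
#   is_zstd /images/PPS_v9.0R2-dev_CentOS/d1p3.img && echo "looks like zstd"
#   zstd -t /images/PPS_v9.0R2-dev_CentOS/d1p3.img   # integrity test, writes no output file
```

The zstd -t call checks the whole stream without needing free disk space for the extracted copy.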

salted_cashews

@Sebastian-Roth

-dev_CentOS/d1p3.img : 4201 MB...     -dev_CentOS/d1p3.img : Read error (39) : premature end

Sebastian Roth

@salted_cashews Make sure you have enough space on your disk: df -h

Now as a test, please do the same with another image file:

zstdmt -d /images/PPS_v9.0R2-dev_CentOS/d1p1.img -o /images/PPS_v9.0R2-dev_CentOS/d1p1_extracted.dat

From my point of view the manual extraction test should not give you an error if the image file is fine.

salted_cashews

@Sebastian-Roth

zstdmt -d /images/Ciara_CentOS-BASEmk3/d1p1.img -o /images/Ciara_CentOS-BASEmk3/d1p1_extracted.dat
/images/Ciara_CentOS-BASEmk3/d1p1.img: 241217197 bytes

No error this time. The error I received running it manually on d1p3.img is the same error that displays during the task as well (on the partclone progress screen).

df -h

Filesystem                      Size  Used Avail Use% Mounted on
udev                            2.9G     0  2.9G   0% /dev
tmpfs                           597M   61M  537M  11% /run
/dev/mapper/FOG--DHCP--vg-root   24G  7.7G   15G  36% /
tmpfs                           3.0G     0  3.0G   0% /dev/shm
tmpfs                           5.0M     0  5.0M   0% /run/lock
tmpfs                           3.0G     0  3.0G   0% /sys/fs/cgroup
/dev/sdb1                       1.8T  1.4T  334G  81% /images
/dev/sda1                       472M  108M  341M  24% /boot
tmpfs                           597M     0  597M   0% /run/user/1000

Tom Elliott

@salted_cashews and @Sebastian-Roth

I believe the problem is coming from the second -d

Particularly in the naming of the image, it appears to be doing:

zstdmt -d /images/PPS_v9.0R2 and then gets a second -d from the -dev_CentOS/d1p1.img

So it’s literally, I think, doing:

zstdmt -d ev_CentOS/d1p1.img

Does this make sense?

I think the - in the image name is causing issues when it is parsed into the zstdmt command. The reason it doesn’t impact Ciara_CentOS-BASEmk3 is, most likely, that -B is not a valid argument here, so it is just treated as a normal string.

Maybe we need to add some quoting to the scripting?
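A quick way to test this theory, as a rough sketch: print the arguments exactly as a command would receive them. show_args is a hypothetical helper and the path with a space is invented for illustration. In a POSIX shell, words are split on whitespace, so a hyphen in the middle of a path should stay part of the same argument, while an unquoted space does split it.

```shell
# Hypothetical helper: print each argument on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

img='/images/PPS_v9.0R2-dev_CentOS/d1p3.img'
show_args -d "$img"
# [-d]
# [/images/PPS_v9.0R2-dev_CentOS/d1p3.img]

# Invented name containing a space, to show where quoting does matter.
spaced='/images/My Image/d1p3.img'
show_args -d $spaced     # unquoted: the path arrives as two words
show_args -d "$spaced"   # quoted: the path stays one word
```

If the first call prints two lines, the hyphen in the name is not being turned into a second option by the shell.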

salted_cashews

@Tom-Elliott Interesting catch, would just renaming and re-associating via the Web GUI work to fix this?

Tom Elliott

@salted_cashews I think so, yes. Rename the image to something without a - maybe?

salted_cashews

@Tom-Elliott Renamed, giving it another shot.

salted_cashews

@Tom-Elliott
Here’s the result after renaming using a “_”. We have other images using “-” and “.”, would this be best practice to avoid even if they still work fine (haven’t had this issue until now)?

Image failed to restore and exited with exit code 1 (writeImage)
   Info: Partclone v0.2.89 http://partclone.org
Starting to restore image (-) to device (/dev/sda3)
note: Storage Location 10.10.100.252:/images/, Image name PPS_v9.0R2_dev_CentOS
we need memory: 208468 bytes
image head 4160, bitmap 200208, crc 4100 bytes
Calculating bitmap... Please wait... get device size 53687091200 by ioctl BLKGETSIZE64,
done!
File system:  EXTFS
Device size:    6.6 GB = 1601624 Blocks
Space in use:   4.5 GB = 1097323 Blocks
Free Space:     2.1 GB = 504301 Blocks
Block size:   4096 Byte
read ERROR:No such file or directory
   Args Passed: /images/PPS_v9.0R2_dev_CentOS/d1p3.img* /dev/sda3

#                          Will continue in 1 minute                         #


 * Press [Enter] key to continue

 * Resizing extfs volume (/dev/sda3).................Failed
 * Press [Enter] key to continue

#                         An error has been detected!                        #
Could not check before resize (expandPartition)
   Info: /dev/sda3: Inode 393217 has an invalid extent
        (logical block 0, invalid physical block 1609504, len 26)


/dev/sda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)
   Args Passed: /dev/sda3 :4:5

Kernel variables and settings:
loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=10.10.100.252/fog/ consoleblank=0 rootfstype=ext4 mac=60:45:cb:a2:86:eb ftp=10.10.100.252 storage=10.10.100.252:/images/ storageip=10.10.100.252 osid=50 irqpoll hostname=Kaibab-20 chkdsk=0 img=PPS_v9.0R2_dev_CentOS imgType=n imgPartitionType=all imgid=333 imgFormat=5 PIGZ_COMP=-19 hostearly=1 isdebug=yes type=down

Tom Elliott

@salted_cashews The - ultimately shouldn’t matter in the context of the imaging process.

We pipe the image into a FIFO (First In First Out) file called /tmp/pigz, then we perform the decompression and write to disk from that filename. So no, it shouldn’t matter how the image is named. What stands out, however, is that the error keeps saying there is no such file or directory called d1p3.img*. The * is there to handle the “split” file format (d1p3.img.001, d1p3.img.002, etc…), but it should work regardless as long as the file exists.

Looking through the thread, it appears that the file names vary each time we ask for information. I’m sure you have the files though; exact comparisons would be great.

In the case of the image named PPS_v9.0R2_dev_CentOS, you have d1p1.img and d1p2.img. So why is it failing on d1p3.img?

The scripting uses the same formatting for each image file. So when it comes to partition 1 it’s running the arg passed as /images/PPS_v9.0R2_dev_CentOS/d1p1.img* as well as d1p2.img* for partition 2. So it’s really confusing to me that it keeps failing on that same file.

Can you show us the files inside of /images/PPS_v9.0R2_dev_CentOS/?
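The FIFO approach can be sketched roughly like this. It is not the actual FOG script: restore_via_fifo is a hypothetical name, gzip stands in for the real decompressor, and a plain output file stands in for partclone writing to the device. It only illustrates how a trailing * picks up both a single file and split parts.

```shell
# Hypothetical sketch: stream all pieces of an image through a FIFO and
# decompress on the reading side. gzip is a stand-in for zstd/pigz.
restore_via_fifo() {
    prefix=$1    # e.g. /images/NAME/d1p3.img (matches d1p3.img, d1p3.img.001, ...)
    target=$2    # stand-in for the device partclone would write to
    dir=$(mktemp -d) || return 1
    fifo=$dir/fifo
    mkfifo "$fifo" || return 1
    # Writer: concatenate every matching piece into the FIFO in glob order.
    # If nothing matches, the pattern stays literal and cat reports
    # "No such file or directory".
    cat "$prefix"* > "$fifo" &
    # Reader: decompress from the FIFO and write the result out.
    gzip -dc < "$fifo" > "$target"
    status=$?
    wait
    rm -rf "$dir"
    return $status
}
```

Because the reader only ever sees the FIFO, the image name itself never reaches the decompressor in this kind of setup.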

ls -lhart /images/PPS_v9.0R2_dev_CentOS/

I’m sure this is redundant and all, but just trying to get the full scope.

salted_cashews

@Tom-Elliott
No problem, here’s the output:

administrator@FOG-DHCP:/images$ ls -lhart /images/PPS_v9.0R2_dev_CentOS/
total 30G
-rwxrwxrwx  1 root          root             5 May  2 07:27 d1.fixed_size_partitions
-rwxrwxrwx  1 root          root           368 May  2 07:27 d1.partitions
-rwxrwxrwx  1 root          root            48 May  2 07:29 d1.original.fstypes
-rwxrwxrwx  1 root          root             0 May  2 07:29 d1.has_grub
-rwxrwxrwx  1 root          root          1.0M May  2 07:29 d1.mbr
-rwxrwxrwx  1 root          root           368 May  2 07:29 d1.minimum.partitions
-rwxrwxrwx  1 root          root           512 May  2 07:29 d1p5.ebr
-rwxrwxrwx  1 root          root          173M May  2 07:29 d1p1.img
-rwxrwxrwx  1 root          root           29G May  2 08:12 d1p2.img
-rwxrwxrwx  1 root          root           512 May  2 08:16 d1p4.ebr
-rwxrwxrwx  1 root          root            47 May  2 08:16 d1.original.swapuuids
-rwxrwxrwx  1 root          root          1.1G May  2 08:16 d1p3.img
drwxrwxrwx  2 root          root          4.0K May 13 12:29 .
-rw-rw-r--  1 administrator administrator  90M May 13 12:29 d1p1_extracted.dat
drwxrwxrwx 40 fog           root          4.0K May 13 13:32 ..

This certainly is a weird occurrence. I’ve captured and deployed a few images since then, so I know nothing has been permanently botched. I know getting that fsck error usually means a corrupt file system, correct?

Tom Elliott

@salted_cashews In the d1.fixed_size_partitions file, can you add :3? And see if that helps at all?

I’m grasping at straws and just guessing at what may be the issue. I don’t know what is the issue right now, of course, so I don’t expect any miracles here. But just testing a theory.

salted_cashews

@Tom-Elliott

:3:4:5

Like so?

Tom Elliott

@salted_cashews that works yes

Sebastian Roth

@salted_cashews We have not looked at the partition layout yet at all. Probably just because the error did not seem to point that way. Thanks Tom for mentioning that in a chat session!

Can you please post the contents of the text files: d1.fixed_size_partitions (before and after changing it), d1.partitions, d1.minimum.partitions and d1.original.fstypes.

salted_cashews

@Sebastian-Roth

Before:

administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.fixed_size_partitions
:4:5

After:

administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.fixed_size_partitions
:3:4:5

administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.partitions
label: dos
label-id: 0x000b34c0
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=     2097152, type=83, bootable
/dev/sda2 : start=     2099200, size=   262144000, type=83
/dev/sda3 : start=   264243200, size=   104857600, type=83
/dev/sda4 : start=   369100800, size=   568602112, type=5
/dev/sda5 : start=   369108992, size=    20971520, type=82

administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.minimum.partitions
label: dos
label-id: 0x000b34c0
device: /dev/sda
unit: sectors

/dev/sda1 : start=        2048, size=      531963, type=83, bootable
/dev/sda2 : start=     2099200, size=    95734010, type=83
/dev/sda3 : start=   264243200, size=    13453641, type=83
/dev/sda4 : start=   369100800, size=   568602112, type=5
/dev/sda5 : start=   369108992, size=    20971520, type=82

administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.original.fstypes
/dev/sda1 extfs
/dev/sda2 extfs
/dev/sda3 extfs
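As a sanity check on the layout, the sector counts can be converted to bytes and compared with the partclone log. A small sketch, assuming 512-byte sectors (the dump only says "unit: sectors"); sectors_to_bytes is a made-up helper name.

```shell
# Hypothetical helper: convert a sector count to bytes, assuming 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# sda3 as listed in d1.partitions
sectors_to_bytes 104857600
# 53687091200
```

That value matches the "get device size 53687091200" line in the partclone output above, so the target partition appears to have been created at its original size.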

Sebastian Roth

@salted_cashews As far as I can see the partition layout looks fine. Don’t think there is anything wrong with it - at least not something obvious.

So we are back to the point where you tried to manually extract d1p3.img and that failed with “Read error (39) : premature end”. Asking the search engine of choice, we find an interesting bug report here pointing to a fix that was pushed only a month ago. I’m not exactly sure, but to me it sounds like this could be it. @Tom-Elliott @Junkhacker what do you think?
