Deployment stuck in a loop, never finishes imaging?
-
@salted_cashews and @Sebastian-Roth
I believe the problem is coming from the second -d, particularly in the naming of the image. It appears to be doing:
zstdmt -d /images/PPS_v9.0R2
Then it gets a second -d from the -dev_CentOS/d1p1.img part of the name.
So it’s literally, I think, doing:
zstdmt -d ev_CentOS/d1p1.img
Does this make sense?
I think the - in the image name is causing issues when the name is parsed into the zstdmt command. The reason it doesn’t impact Ciara_CentOS-BASEmk3 is, likely, that there is no argument for -B, so it just uses it like a normal string. Maybe we need to add some quoting to the scripting?
-
@Tom-Elliott Interesting catch, would just renaming and re-associating via the Web GUI work to fix this?
-
@salted_cashews I think so, yes. Rename the image to something without a - maybe?
-
@Tom-Elliott Renamed, giving it another shot.
-
@Tom-Elliott
Here’s the result after renaming using a “_”. We have other images using “-” and “.”, would this be best practice to avoid even if they still work fine (haven’t had this issue until now)?
Image failed to restore and exited with exit code 1 (writeImage)
Info: Partclone v0.2.89 http://partclone.org
Starting to restore image (-) to device (/dev/sda3)
note: Storage Location 10.10.100.252:/images/, Image name PPS_v9.0R2_dev_CentOS
we need memory: 208468 bytes
image head 4160, bitmap 200208, crc 4100 bytes
Calculating bitmap... Please wait...
get device size 53687091200 by ioctl BLKGETSIZE64, done!
File system: EXTFS
Device size: 6.6 GB = 1601624 Blocks
Space in use: 4.5 GB = 1097323 Blocks
Free Space: 2.1 GB = 504301 Blocks
Block size: 4096 Byte
read ERROR:No such file or directory
Args Passed: /images/PPS_v9.0R2_dev_CentOS/d1p3.img* /dev/sda3
# Will continue in 1 minute #
* Press [Enter] key to continue
* Resizing extfs volume (/dev/sda3).................Failed
* Press [Enter] key to continue
# An error has been detected! #
Could not check before resize (expandPartition)
Info: /dev/sda3: Inode 393217 has an invalid extent (logical block 0, invalid physical block 1609504, len 26)
/dev/sda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. (i.e., without -a or -p options)
Args Passed: /dev/sda3 :4:5
Kernel variables and settings:
loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=10.10.100.252/fog/ consoleblank=0 rootfstype=ext4 mac=60:45:cb:a2:86:eb ftp=10.10.100.252 storage=10.10.100.252:/images/ storageip=10.10.100.252 osid=50 irqpoll hostname=Kaibab-20 chkdsk=0 img=PPS_v9.0R2_dev_CentOS imgType=n imgPartitionType=all imgid=333 imgFormat=5 PIGZ_COMP=-19 hostearly=1 isdebug=yes type=down
-
@salted_cashews The - ultimately shouldn’t matter in the context of the imaging process.
We pipe the image into a FIFO (First In First Out) file called /tmp/pigz, then we perform the decompression and write to disk from that filename. So no, it shouldn’t matter how the image is named. Of particular note, however, is the fact that the restore keeps reporting “No such file or directory” for d1p3.img*. The * is there to handle “split” file formatting (d1p3.img.001, d1p3.img.002, etc.), but it should work regardless of the format as long as the file exists.
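As a rough illustration of the FIFO approach described above (not the actual FOG script), here is a minimal sketch. The function name restore_via_fifo, the scratch paths and the payload are made up for the demo, and gzip stands in for zstdmt/pigz so it runs on any box. The image directory name deliberately contains a -.

```shell
# restore_via_fifo PREFIX OUTPUT
# Hypothetical helper: streams every file matching PREFIX* through a FIFO
# and decompresses from the FIFO into OUTPUT.
# Assumption: gzip is used here in place of the real decompressor.
restore_via_fifo() {
    prefix=$1
    out=$2
    fifo_dir=$(mktemp -d)
    fifo="$fifo_dir/pigz"
    mkfifo "$fifo"
    # The * sits outside the quotes, so it matches d1p3.img as well as
    # d1p3.img.001, d1p3.img.002, ... (expanded in sorted order).
    cat "$prefix"* > "$fifo" &
    gzip -dc < "$fifo" > "$out"
    status=$?
    wait
    rm -rf "$fifo_dir"
    return "$status"
}

# Demo in a scratch directory with a hyphenated image name.
work=$(mktemp -d)
img_dir="$work/PPS_v9.0R2-dev_CentOS"
mkdir -p "$img_dir"
printf 'partition payload\n' > "$work/original"
gzip -c "$work/original" > "$work/whole.gz"

# Emulate the "split" layout by cutting the compressed stream in two.
head -c 10 "$work/whole.gz" > "$img_dir/d1p3.img.001"
tail -c +11 "$work/whole.gz" > "$img_dir/d1p3.img.002"

restore_via_fifo "$img_dir/d1p3.img" "$work/restored"
cmp -s "$work/original" "$work/restored" && echo "restored OK"
```

Because the path is quoted and the decompressor only ever reads from the FIFO, the - in the directory name never reaches it as an argument.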
Looking through the thread, it appears that the file names vary each time we ask for information. I’m sure you have the files, though; exact comparisons would be great.
In the case of the image named PPS_v9.0R2_dev_CentOS, you have d1p1.img and d1p2.img. So why is it failing on d1p3.img?
The scripting uses the same formatting for each image file. So when it comes to partition 1 it’s running with the arg passed as /images/PPS_v9.0R2_dev_CentOS/d1p1.img*, and likewise d1p2.img* for partition 2. So it’s really confusing to me that it keeps failing on that same file.
Can you get us a listing of the files inside of /images/PPS_v9.0R2_dev_CentOS/?
ls -lhart /images/PPS_v9.0R2_dev_CentOS/
I’m sure this is redundant and all, but just trying to get the full scope.
-
@Tom-Elliott
No problem, here’s the output:
administrator@FOG-DHCP:/images$ ls -lhart /images/PPS_v9.0R2_dev_CentOS/
total 30G
-rwxrwxrwx 1 root root 5 May 2 07:27 d1.fixed_size_partitions
-rwxrwxrwx 1 root root 368 May 2 07:27 d1.partitions
-rwxrwxrwx 1 root root 48 May 2 07:29 d1.original.fstypes
-rwxrwxrwx 1 root root 0 May 2 07:29 d1.has_grub
-rwxrwxrwx 1 root root 1.0M May 2 07:29 d1.mbr
-rwxrwxrwx 1 root root 368 May 2 07:29 d1.minimum.partitions
-rwxrwxrwx 1 root root 512 May 2 07:29 d1p5.ebr
-rwxrwxrwx 1 root root 173M May 2 07:29 d1p1.img
-rwxrwxrwx 1 root root 29G May 2 08:12 d1p2.img
-rwxrwxrwx 1 root root 512 May 2 08:16 d1p4.ebr
-rwxrwxrwx 1 root root 47 May 2 08:16 d1.original.swapuuids
-rwxrwxrwx 1 root root 1.1G May 2 08:16 d1p3.img
drwxrwxrwx 2 root root 4.0K May 13 12:29 .
-rw-rw-r-- 1 administrator administrator 90M May 13 12:29 d1p1_extracted.dat
drwxrwxrwx 40 fog root 4.0K May 13 13:32 ..
This certainly is a weird occurrence. I’ve captured and deployed a few images since then, so I know nothing has been permanently botched. I know getting that fsck error usually means a corrupt file system, correct?
-
@salted_cashews In the d1.fixed_size_partitions file, can you add :3 and see if that helps at all?
I’m grasping at straws and just guessing at what may be the issue. I don’t know what the issue is right now, of course, so I don’t expect any miracles here. But just testing a theory.
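If it helps, here is a small sketch of one way to make that edit while keeping a backup. The helper name add_fixed_partition is made up, and the sorted output is only an assumption about how the list should look (whether the order matters has not been checked). The demo runs on a scratch copy rather than the real image directory.

```shell
# add_fixed_partition FILE PARTNUM
# Hypothetical helper: adds PARTNUM to the colon-separated list in FILE
# and keeps the previous contents in FILE.bak.
add_fixed_partition() {
    file=$1
    part=$2
    cp -p "$file" "$file.bak"
    # Split on ":", append the new number, drop empty fields,
    # sort numerically without duplicates, then rejoin with leading colons.
    { tr ':' '\n' < "$file.bak"; echo; echo "$part"; } |
        grep -v '^$' | sort -n -u |
        awk '{ printf ":%s", $0 } END { print "" }' > "$file"
}

# Demo on a scratch copy with the same starting contents as in this thread.
demo_dir=$(mktemp -d)
printf ':4:5\n' > "$demo_dir/d1.fixed_size_partitions"
add_fixed_partition "$demo_dir/d1.fixed_size_partitions" 3
cat "$demo_dir/d1.fixed_size_partitions"   # prints :3:4:5
```

To undo the change, copy the .bak file back over the original.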
-
-
@salted_cashews that works yes
-
@salted_cashews We have not looked at the partition layout yet at all. Probably just because the error did not seem to point that way. Thanks Tom for mentioning that in a chat session!
Can you please post the contents of the text files d1.fixed_size_partitions (before and after changing it), d1.partitions, d1.minimum.partitions and d1.original.fstypes.
-
Before:
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.fixed_size_partitions
:4:5
After:
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.fixed_size_partitions
:3:4:5
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.partitions
label: dos
label-id: 0x000b34c0
device: /dev/sda
unit: sectors
/dev/sda1 : start= 2048, size= 2097152, type=83, bootable
/dev/sda2 : start= 2099200, size= 262144000, type=83
/dev/sda3 : start= 264243200, size= 104857600, type=83
/dev/sda4 : start= 369100800, size= 568602112, type=5
/dev/sda5 : start= 369108992, size= 20971520, type=82
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.minimum.partitions
label: dos
label-id: 0x000b34c0
device: /dev/sda
unit: sectors
/dev/sda1 : start= 2048, size= 531963, type=83, bootable
/dev/sda2 : start= 2099200, size= 95734010, type=83
/dev/sda3 : start= 264243200, size= 13453641, type=83
/dev/sda4 : start= 369100800, size= 568602112, type=5
/dev/sda5 : start= 369108992, size= 20971520, type=82
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.original.fstypes
/dev/sda1 extfs
/dev/sda2 extfs
/dev/sda3 extfs
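For reference, the sda3 numbers above can be cross-checked against the partclone output with a bit of shell arithmetic. This is only a sketch: the 512-byte sector size is an assumption (the partition files just say "unit: sectors"), and the two helper names are made up.

```shell
SECTOR_BYTES=512   # assumption: 512-byte sectors
BLOCK_BYTES=4096   # "Block size: 4096 Byte" from the partclone output

# Hypothetical helpers for converting counts to bytes.
sectors_to_bytes() { echo $(( $1 * SECTOR_BYTES )); }
blocks_to_bytes()  { echo $(( $1 * BLOCK_BYTES )); }

sectors_to_bytes 104857600   # sda3 in d1.partitions          -> 53687091200
sectors_to_bytes 13453641    # sda3 in d1.minimum.partitions  -> 6888264192
blocks_to_bytes 1601624      # file system size per partclone -> 6560251904
```

The first value matches the "get device size 53687091200" line in the restore log, which supports the 512-byte assumption. The file system stored in the image (about 6.56 GB) is also smaller than the minimum sda3 partition (about 6.89 GB), so on size alone it should fit.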
-
@salted_cashews As far as I can see the partition layout looks fine. Don’t think there is anything wrong with it - at least not something obvious.
So we are back to the point where you tried to manually extract d1p3.img and that failed with “Read error (39) : premature end”. Asking the search engine of choice we get some interesting bug report here pointing to a fix that was pushed only a month ago. Not exactly sure but to me it sounds like this could be it. @Tom-Elliott @Junkhacker what do you think?
-
@Sebastian-Roth Interesting, so a simple update to the latest build seems to be a solution? I remember you saying (about a month ago no less) that you guys were releasing a new version. Interesting, I really need to keep hounding my superiors lol.
-
@salted_cashews No, we have not included this patch to FOG yet. This is new to me as well. Just stumbled upon this when looking for a solution.
A new version (1.5.6) was released a couple of days ago but as you see we are always on the edge of trying to solve all the issues that arise. So new releases will come… You need to know that we don’t have a strict release schedule. We are a very small team of developers and sometimes there is very little time while we have a bit more at other times. So sometimes there might be a new release in just two weeks and sometimes it takes months.
-
@salted_cashews Would you be keen to test new init files (manual download of inits and possibly kernel binaries) if I provide those for you? Should find enough time by the end of this week.
-
@Sebastian-Roth Certainly so! I won’t have to worry about these new kernels breaking any working images would I? Would making a backup beforehand be a wise choice?
Also, I appreciate the work you guys do here very much. I understand you’re all doing this in your own time and that changes might come sooner or later. I’ve learned quite a lot from using FOG as a whole and speaking with you and everyone on the forums. Anything I can do to help I’m willing to jump in.
-
@salted_cashews OK, I just finished building new inits with Zstd 1.4.0 that you can try out. I’d suggest you put the kernel and init binary alongside the original ones instead of swapping those. This way you can test without causing any harm.
On your FOG server run (assuming you have 64-bit machines here):
sudo su -
cd /var/www/html/fog/service/ipxe
wget https://fogproject.org/kernels/Kernel.TomElliott.4.19.36.64
wget https://fogproject.org/inits/init_zstd-1.4.0.xz
chmod 666 Kernel.TomElliott.4.19.36.64 init_zstd-1.4.0.xz
It would be better to do a chown ... on the files, but as I don’t know your OS webserver username I thought I’d do it this way.
Now go to the FOG web UI, edit the host settings of the machine you capture the image from and set Host Kernel to Kernel.TomElliott.4.19.36.64 and Host Init to init_zstd-1.4.0.xz. Now schedule a capture task, let it grab the whole image and then try to deploy that new image again.
Not sure if you need to set Host Kernel and Host Init on the deploy host as well. From my point of view it’s the capture that breaks the image, and deployment using the old-fashioned kernel/init might still work.
-
@Sebastian-Roth Unfortunately the machine in question has been imaged over, so the original image that is now broken no longer exists aside from the copy on the FOG server. Should I still try using those Kernels on deployment? I know you said the capture might’ve been the problem though.
-
@salted_cashews I am fairly sure the image was corrupted when capturing it already. So I don’t think that deploying with the new init/kernel will help. But you might still give it a try. You never know.