Deployment stuck in a loop, never finishes imaging?
So I’ve set deployment tasks to run and they’ve been running for approximately 7hrs. The usual time is around 1hr.
I’ve observed that it keeps trying to deploy the same partition over and over again. The only things that changed were the following settings under the Web GUI’s “Fog Settings”:
FOG_IMAGE_COMPRESSION_FORMAT_DEFAULT = PartClone Zstd
FOG_PIGZ_COMP = 19
The only thing that doesn’t make sense to me is all of our images are using that compression format and with a level of 19, so I don’t see why this would be a problem. Strangely, I was able to deploy 2 other images perfectly fine. Any ideas? Fog v 1.4.4.
I noticed it says look at /var/log/partclone.log but there doesn’t seem to be one.
@Sebastian-Roth Yeah, it was honestly my fault. Normally I’d capture and deploy immediately after to test, but these past few weeks have been ridiculously insane for me and my machine-to-image allocations haven’t been as up to date as they normally would’ve been.
I noticed this happened once before with a Windows image, but it was related to the dirty bit and another issue I found here. All in all I wanted to at least report and give this a go, I apologize for taking your time but as always appreciate the help. If this does happen again, I’ll be sure to give those kernels a go and report back here the results.
@salted_cashews So the master installation is lost? Too bad. Think there is nothing we can do as the last image capture is incomplete and we can’t get that extracted. Hmmmmmm
@Junkhacker Ah, that makes sense now that I think about it.
@salted_cashews I think the issue you’re describing is because the FOS hits the problem and bails before the step where it checks in with the web interface to say it’s done.
@Sebastian-Roth Haha, no worries. Unfortunately it didn’t work but this certainly was an interesting learning experience. This has made me curious: Does the task via the GUI and task on the client both have to be restarted (i.e. the computer rebooted and the old task killed, new task created) in order for an image to be deployed? I only ask because I’ve been using debug mode quite a bit but noticed I couldn’t start a new task without rebooting the client first. I would try to schedule a new task via the GUI but it would only take the old one when I ran “fog” at the CLI. From my perspective this makes sense as the FOS is deployed after PXE boot and that FOS contains only the information it was given from that, correct?
@salted_cashews Strange, no idea why that is?!
@Sebastian-Roth True, I’ve elected to at least give it a go. I did notice the image progress screen was a bit off, is this the init/kernel?
@salted_cashews I am fairly sure the image was corrupted when capturing it already. So I don’t think that deploying with the new init/kernel will help. But you might still give it a try. You never know.
@Sebastian-Roth Unfortunately the machine in question has been imaged over, so the original image that is now broken no longer exists aside from the copy on the FOG server. Should I still try using those Kernels on deployment? I know you said the capture might’ve been the problem though.
@salted_cashews Ok, just finished building new inits with Zstd 1.4.0 that you can try out. I’d suggest you put the kernel and init binary alongside with the original ones instead of swapping those. This way you can test without causing any harm.
On your FOG server run (suppose you have 64 bit machines here):
sudo su -
cd /var/www/html/fog/service/ipxe
wget https://fogproject.org/kernels/Kernel.TomElliott.126.96.36.199
wget https://fogproject.org/inits/init_zstd-1.4.0.xz
chmod 666 Kernel.TomElliott.188.8.131.52 init_zstd-1.4.0.xz
Better if you do a chown ... on the files but as I don’t know your OS webserver username I thought I’d do it this way.
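If you want to go the chown route, here is a minimal sketch for finding the webserver account first. It assumes the usual candidates are www-data (Debian/Ubuntu) and apache (RHEL/CentOS/Fedora); the helper name first_existing_user is made up for this example and is not part of FOG.

```shell
#!/bin/sh
# Print the first account from the argument list that exists on this system.
# Returns non-zero if none of the candidates exist.
first_existing_user() {
    for u in "$@"; do
        if id -u "$u" >/dev/null 2>&1; then
            printf '%s\n' "$u"
            return 0
        fi
    done
    return 1
}

# Example use (assumption: one of these two accounts runs your webserver).
# Replace the file names with the kernel and init you actually downloaded:
# web_user=$(first_existing_user www-data apache) &&
#     chown "$web_user" <kernel-file> init_zstd-1.4.0.xz
```

If neither account exists the function returns non-zero and the chown is skipped, so the chmod 666 approach still applies as a fallback.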
Now go to the FOG web UI, edit the host settings of the machine you capture the image from and set Host Kernel to Kernel.TomElliott.184.108.40.206 and Host Init to init_zstd-1.4.0.xz. Now schedule a capture task, let it grab the whole image and then try to deploy that new image again.
Not sure if you need to set Host Kernel and Host Init on the deploy host as well. From my point of view it’s the capture that breaks the image and deployment using the old-fashioned kernel/init might still work.
@Sebastian-Roth Certainly so! I won’t have to worry about these new kernels breaking any working images would I? Would making a backup beforehand be a wise choice?
Also, I appreciate the work you guys do here very much. I understand you’re all doing this in your own time and that changes might come sooner or later. I’ve learned quite a lot from using FOG as a whole and speaking with you and everyone on the forums. Anything I can do to help I’m willing to jump in.
@salted_cashews Would you be keen to test new init files (manual download of inits and possibly kernel binaries) if I provide those for you? Should find enough time by the end of this week.
@salted_cashews No, we have not included this patch to FOG yet. This is new to me as well. Just stumbled upon this when looking for a solution.
A new version (1.5.6) was released a couple of days ago but as you see we are always on the edge of trying to solve all the issues that arise. So new releases will come… You need to know that we don’t have a strict release schedule. We are a very small team of developers and sometimes there is very little time while we have a bit more at other times. So sometimes there might be a new release in just two weeks and sometimes it takes months.
@Sebastian-Roth Interesting, so a simple update to the latest build seems to be a solution? I remember you saying (about a month ago no less) that you guys were releasing a new version. Interesting, I really need to keep hounding my superiors lol.
@salted_cashews As far as I can see the partition layout looks fine. Don’t think there is anything wrong with it - at least not something obvious.
So we are back to the point where you tried to manually extract d1p3.img and that failed with “Read error (39) : premature end”. Asking the search engine of choice we get some interesting bug report here pointing to a fix that was pushed only a month ago. Not exactly sure but to me it sounds like this could be it. @Tom-Elliott @Junkhacker what do you think?
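For checking a freshly captured image before relying on it, something along these lines might help. It is only a sketch: it assumes the partition files are zstd-compressed (as per the compression setting mentioned in this thread) and that they follow the d1p*.img naming seen with d1p3.img. The function name check_zstd_images is hypothetical. It decompresses each file to /dev/null, so nothing is written to disk.

```shell
#!/bin/sh
# Decompress every d1p*.img in an image directory to /dev/null and report
# which files zstd can read through to the end.
# Returns non-zero if at least one file fails to decompress.
check_zstd_images() {
    dir=$1
    status=0
    for f in "$dir"/d1p*.img; do
        [ -e "$f" ] || continue          # no matching files in this directory
        if zstd -dc -- "$f" >/dev/null 2>&1; then
            echo "OK   $f"
        else
            echo "FAIL $f"
            status=1
        fi
    done
    return "$status"
}

# Example (path taken from the shell prompts posted in this thread):
# check_zstd_images /images/PPS_v9.0R2_dev_CentOS
```

A truncated file like the one that produced “premature end” should show up as FAIL here. A clean pass only says the zstd stream is complete, not that the filesystem inside is healthy.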
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.fixed_size_partitions
:4:5
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.fixed_size_partitions
:3:4:5
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.partitions
label: dos
label-id: 0x000b34c0
device: /dev/sda
unit: sectors
/dev/sda1 : start= 2048, size= 2097152, type=83, bootable
/dev/sda2 : start= 2099200, size= 262144000, type=83
/dev/sda3 : start= 264243200, size= 104857600, type=83
/dev/sda4 : start= 369100800, size= 568602112, type=5
/dev/sda5 : start= 369108992, size= 20971520, type=82
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.minimum.partitions
label: dos
label-id: 0x000b34c0
device: /dev/sda
unit: sectors
/dev/sda1 : start= 2048, size= 531963, type=83, bootable
/dev/sda2 : start= 2099200, size= 95734010, type=83
/dev/sda3 : start= 264243200, size= 13453641, type=83
/dev/sda4 : start= 369100800, size= 568602112, type=5
/dev/sda5 : start= 369108992, size= 20971520, type=82
administrator@FOG-DHCP:/images/PPS_v9.0R2_dev_CentOS$ cat d1.original.fstypes
/dev/sda1 extfs
/dev/sda2 extfs
/dev/sda3 extfs
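To make the sector numbers above easier to read, here is a small sketch that turns a dump like d1.partitions into start sector, last sector and size in GiB per partition. It assumes 512-byte sectors, which the dump does not state (it only says "unit: sectors"), and the function name part_summary is made up for this example.

```shell
#!/bin/sh
# Summarise an sfdisk-style dump such as d1.partitions: one line per
# partition with start sector, last sector and size in GiB.
# Assumption: 512-byte sectors (the dump itself only says "unit: sectors").
part_summary() {
    awk -v sector_bytes=512 '
        /^\/dev\// {
            line = $0
            gsub(/= +/, "=", line)            # "start= 2048," -> "start=2048,"
            n = split(line, field, /[ ,]+/)
            start = 0; size = 0
            for (i = 1; i <= n; i++) {
                if (field[i] ~ /^start=/) start = substr(field[i], 7) + 0
                if (field[i] ~ /^size=/)  size  = substr(field[i], 6) + 0
            }
            # %.0f rather than %d so large sector counts are not clamped by some awks
            printf "%s start=%.0f end=%.0f size=%.2fGiB\n", field[1], start, start + size - 1, size * sector_bytes / 1073741824
        }
    ' "$1"
}

# Example:
# part_summary /images/PPS_v9.0R2_dev_CentOS/d1.partitions
```

Under that 512-byte assumption, the d1.partitions file posted here comes out as 1 GiB, 125 GiB and 50 GiB for sda1 to sda3. Each of those ends on the sector right before the next one starts, so there are no gaps or overlaps among the primary partitions.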
@salted_cashews We have not looked at the partition layout yet at all. Probably just because the error did not seem to point that way. Thanks Tom for mentioning that in a chat session!
Can you please post the contents of the text files:
d1.fixed_size_partitions (before and after changing it),
@salted_cashews that works yes