Development FOG not capturing image - PartClone update
-
@ty900000 Sorry we’ve lost track of this. Tom just pushed a change to the repo some days ago that might address your issue. Please download the latest init.xz/init_32.xz files from our build server and see if that works. Otherwise we need to do a debug capture task to get more of the error message.
-
No worries! The holidays took up so much time, I haven’t had much time to work on this to expand it to non-RedHat distros.
I updated both init.xz and init_32.xz and get a similar error. I had been seeing this before, too. I definitely have a large enough drive for the image: /images is 140-ish GB, and the disk that needs to be captured is only 50GB. I first noticed this when I updated to 1.5.7.86, and it has continued with subsequent updates. I noticed PartClone got updated and that’s when things started to break. If I do a new install of 1.5.7 with the old version of PartClone, everything works fine. How do I enable debug capture for iPXE? Thanks!!
-
@ty900000 I just pushed another update, though it may be a little while before the artifacts are ready for testing.
I’m fairly sure the issue here has to be the FIFO. I’ve also gotten rid of the “Maybe check the fog server to ensure disk space is good to go” by providing the available disk space. It also adds the exact command that partclone is trying to use so we can see what’s going on.
2060 is just the case statement, so I don’t think it’s failing because of the case. I think it’s failing because the FIFO was still open. To combat this, I’ve added a 5-second wait to let the disk settle and release the information for the FIFO so we can remove it and recreate it later on.
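To illustrate the idea (wait, remove the old FIFO, create a fresh one), here is a minimal Python sketch. It is not the code in the FOG init scripts; the function name and the path used in the usage note are made up for the example, and only the 5-second default comes from the description above.

```python
import os
import stat
import time


def recreate_fifo(path, settle_seconds=5.0):
    """Wait, remove a stale FIFO at `path` if one exists, then create a new one.

    Hypothetical helper for illustration only; not taken from the FOG scripts.
    """
    # Give any process still holding the FIFO time to finish and let go.
    time.sleep(settle_seconds)
    if os.path.lexists(path):
        # Refuse to delete anything that is not actually a FIFO.
        if not stat.S_ISFIFO(os.lstat(path).st_mode):
            raise ValueError(f"{path} exists and is not a FIFO")
        os.remove(path)
    # Create a fresh FIFO that only the owner can read and write.
    os.mkfifo(path, 0o600)
    return path
```

Calling `recreate_fifo("/tmp/example.fifo")` (a placeholder path) would pause for five seconds and then leave a new, empty FIFO at that location. Whether a fixed delay is enough depends on what is still holding the old FIFO open.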
-
I pulled the latest init and got a different error this time
-
@ty900000 Okay, do you mind running the capture using Debug? Cancel the task, and go to create it like you normally would, but before submitting it, there’s a checkbox that says Schedule as Debug.
It does mean a little extra work for you in that you will need to press enter twice to get to the shell.
At the shell type:
fog
Then you will need to press enter until the image completes. This method should at least allow you to capture the image. This is why I was adding the sleeps between. I see, now, that it’s not anything to do with that. I can’t imagine it’s the -a0 though. (I suppose maybe but I’m not quite sure right now).
-
@Tom-Elliott I think it’s more likely to be caused by partclone.imager being broken in current 0.3.12. Note how the detected size of the partition is 0 by partclone.
-
@Quazz Yeah, but it’s broken due to the -a0 and quite possibly the -c option, I think.
It’s strange as the -c seems almost redundant here.
Though, when I ran into the issue (which prompted me to try running in debug so I could more directly narrow down the issue), from debug everything worked without an issue.
-
@Tom-Elliott I am fairly confident the -a0 is a bug, since it is listed in its options, but isn’t picked up for use.
-c was removed for dd (it’s implied I guess??)
Interesting you should mention it not occurring in debug. I have seen this problem before, but that was on… unreliable devices so didn’t think much of it when I couldn’t replicate it on other devices.
-
@ty900000 said in FOG/Apache PKI/Certificate Authentication:
I pulled the latest init and got a different error this time
Wait a second. Where did you pull it from? Did you use these ones? https://dev.fogproject.org/blue/organizations/jenkins/fos/detail/master/113/artifacts
-
@Sebastian-Roth He did, I can see the changes I created in the output.
-
I stepped through everything until it halted. Pressing [Enter] here doesn’t do anything.
-
@Tom-Elliott I really wonder why we don’t see other people report this error. Were you actually able to replicate this? Maybe this is just some RAM issue that causes binaries to fail on this particular machine!?
By the way, @ty900000 would you mind opening a new topic for this? Better to keep things sorted. I can move all the related messages over…
-
Would you mind trying the latest inits from: https://dev.fogproject.org/job/fos/job/master/lastSuccessfulBuild/
The init.xz and init_32.xz should be good.
Essentially I’m having a check on the partclone to be used and removing a couple of arguments as they are not built during the configuration and build of partclone.
-
Yes! It worked perfectly. I’ve tested it a bunch of times and it works great. I do get this output after one of the partitions. It doesn’t affect anything it seems, but I’ve just noticed it.
-
@Tom-Elliott Are you able to replicate the issue as seen in the pictures?
@ty900000 Does this happen on several machines? All the same model or different ones?
-
To start, I am using Hyper-V for everything. Yes, I do get that above image when I try to capture other images - either Windows or Linux. When I attempt to deploy the Windows image (the original image I’ve been trying to take), I get this error. But it does seem to complete. It does something similar for the Linux image.
-
@ty900000 @Sebastian-Roth
I haven’t replicated, but to be fair I also haven’t watched that closely. We did image one machine yesterday and all seemed fine. Looking at my images folder, however, I do notice that I’m missing the “imager” partition from my image. Luckily I had another image of the machine that did have the missing partition.
I pushed another fix and believe the issue, as @Quazz noted, was that the -c argument was missing. Strange as that is, as the -c argument doesn’t appear to be a part of the spec list (unless somebody already added that to the patch for partclone and I didn’t know it?)
This will take a while to build of course as I only just pushed it.
-
@Tom-Elliott said in Development FOG not capturing image - PartClone update:
Strange as that is, as the -c argument doesn’t appear to be a part of the spec list
I think the -c is important to make partclone.imager actually use the partclone image format.
I haven’t replicated.
My guess is that this is something specific to Hyper-V or maybe even just @ty900000’s setup. Not saying we shouldn’t try to figure this out and eventually fix if it’s in the inits. My feeling is that this is not about partclone command line parameters or anything.
@ty900000 Please do me a favor and play with the image’s Image Manager setting. Try Partclone Zstd if you have used Gzip so far, and even more so try out Partclone Uncompressed! Capture the image with these changed settings once more and let us know if it makes any difference.
-
@Tom-Elliott said in Development FOG not capturing image - PartClone update:
I pushed another fix and believe the issue, as @Quazz noted, was that the -c argument was missing. Strange as that is, as the -c argument doesn’t appear to be a part of the spec list (unless somebody already added that to the patch for partclone and I didn’t know it?)
My understanding of it is this:
- images captured with partclone.imager -c need to be restored with partclone.restore, raw images captured with partclone.dd or with partclone.imager without -c need to be restored with partclone.dd
- the options for partclone.imager are basically the same as for any partclone.$ftype (including -c and -a), it’s partclone.dd that is a special case. The symbol DD only pertains to partclone.dd, the symbol IMG is specifically defined in Makefile for partclone.imager, but never gets used in upstream Partclone code.
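The capture/restore pairing described in this post can be written down as a small lookup. This is only a sketch of the rule as stated here, not something taken from the Partclone or FOG sources; the function name and its parameters are invented for the example.

```python
def restore_tool(capture_tool, used_c_flag):
    """Return the tool needed to restore an image, per the rule stated above.

    capture_tool: "partclone.imager" or "partclone.dd"
    used_c_flag:  True if the capture was run with -c
    """
    if capture_tool == "partclone.dd":
        # Raw images from partclone.dd go back through partclone.dd.
        return "partclone.dd"
    if capture_tool == "partclone.imager":
        # With -c the result is a partclone-format image; without it, a raw one.
        return "partclone.restore" if used_c_flag else "partclone.dd"
    # Other capture tools are not covered by the rule in this post.
    raise ValueError(f"no rule stated for {capture_tool!r}")


print(restore_tool("partclone.imager", True))   # partclone.restore
print(restore_tool("partclone.imager", False))  # partclone.dd
```

If this understanding is right, dropping -c from a partclone.imager capture changes which tool must be used on deploy, so both sides need to change together.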
-
I did test with several of the other compression types. As far as I can tell, Partclone Uncompressed is the only one that throws the above error with the 1 minute time out. And I also noticed it is not capturing sda3 - which is the raw partition on my particular Windows image. I think when it gets done with sda2 and attempts to move to sda3, it throws the Usage: unset_name [OPTIONS] error from further above.
The only reason I am using Uncompressed is because the Gzip and Zstd image captures with level 6 compression were slower than I remember them being in older (much older) versions of FOG - about 3 GB/min capture and deploy whereas the uncompressed was 6+GB/min, which I know is about line speed. We had a server at my last job and we were getting 6+GB/min when we captured and deployed a machine with Zstd level 6 compression. I need to do some testing tomorrow with a physical machine capture and deploy to see what’s going on with my network…
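For reference, the GB/min figures convert to network rates as in the rough sketch below. It assumes decimal gigabytes (1 GB = 1000 MB) and that the reported rate matches what actually crosses the wire, which may not hold for compressed images; the function name is made up for the example.

```python
def gb_per_min_to_mbit_per_s(gb_per_min):
    """Convert GB/min to Mbit/s, assuming decimal units (1 GB = 8000 Mbit)."""
    return gb_per_min * 8000 / 60


print(gb_per_min_to_mbit_per_s(6))  # 800.0
print(gb_per_min_to_mbit_per_s(3))  # 400.0
```

Under those assumptions, 6 GB/min is about 800 Mbit/s, which would be close to the limit of a 1 Gbit/s link if that is what is in use, while 3 GB/min is roughly half of that.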