PC unbootable after capture fails
-
It looks like you’re using FOG trunk, because 1.2.0 would reboot on failure in 2 minutes, not 1.
To debug, on the page where you would confirm creating the capture (or deploy) there is a checkbox for debug. Tick that and confirm.
Then, the host will boot FOS and you’ll be at a shell. Type
fog
and hit enter. Thus begins the task. You’ll be asked to press enter at every step. When the error occurs, you can cancel the reboot, should it announce one, and then go digging for the log. To cancel a shutdown in Linux, it’s generally
shutdown -c
Debug will also allow you to take a picture of the actual error.
-
Thanks. Will do!
I’m running Fog 8515 on Ubuntu 14.04 -
What if you redo the check disk but add the
/R
switch? -
I scanned for bad sectors the day before yesterday, and there were none, so it should be fine.
Resizing the file system in debug mode now. There is no indication of progress, so I’m guessing that it uses ntfsresize, but the output is piped elsewhere? Is the output of ntfsresize logged somewhere?
-
There’s the problem!
The partclone log shows the same thing which I snapped here:
Is the “ntfs flag” the same thing as the “dirty flag”? I could clear it using
ntfsfix --clear-dirty /dev/sda2
, but I would like to see why it failed in the first place… But I tried it in any case. First a dry run:
Then the real deal:
With no luck.
-
@dolf It’s not a dirty flag that I’ve seen before - here’s an article on the Windows Dirty bit:
https://wiki.fogproject.org/wiki/index.php?title=Windows_Dirty_Bit -
@dolf During the capture debug task, can you run these commands?
fdisk -l
lsblk
Please give us the output of both.
-
-
@dolf is that from the reference machine or after deployment?
-
@Wayne-Workman What do you mean? It’s right after the previous images, same computer, same everything. The output of lsblk and fdisk -l was taken right after the “Volume is corrupt. You should run chkdsk.” error. -
Just to verify that this is not a hardware issue, I restored the image to another PC using CloneZilla, and tried to capture using FOG. Same results! The resize step totally corrupts the MFT, leaving the PC unbootable.
-
@dolf Just throwing this out there, is it possible to successfully capture a non-resizable multiple partition image and deploy it? This would just be a test to see if the resizing is the issue or not.
-
Is the partition table manipulated in any way when capturing a non-resizable multiple partition image? If not, it probably works just like CloneZilla, which I’m using now. And that works.
I’ll test your idea as soon as I have time. For now, I’m trying to work around the issue to save time. Working through the night to get the image ready. 200 PCs to deploy soon…
Where can I find the exact ntfsresize command used by FOS? I looked at the fog.upload script while in FOS, and there was a call to shrinkPartition or something like that, but I couldn’t find where the call to ntfsresize happens. I would like to type that exact command on a terminal and see what happens. -
@dolf said in PC unbootable after capture fails:
Where can I find the exact ntfsresize command used by FOS?
It’s in the init, but you can view the source code of the init in your trunk source. It’s here, line 196 to be exact:
<Trunk Directory>/src/buildroot/package/fog/scripts/usr/share/fog/lib/funcs.sh
Also some stuff around lines 460, 490, and 508.
-
Back at it! I tried resizing with GParted, which is known to very carefully check everything before touching the drive. I simply booted GParted Live and resized the big partition, sda2, to a minimum. Here is the log: gparted_details.htm
Maybe FOG could learn from (or even directly use) GParted in this regard.
-
@dolf I am not sure why resizing isn’t working for you. I’ve created hundreds of images with FOG, most of them resizable, for Windows 7, 8, 8.1, 10, Ubuntu, CentOS, Fedora, and I’ve not had the problems that you’ve had. All my co-workers use resizable. We have probably 30 different hardware models from various manufacturers at work, and they all work fine with FOG. Many community members here use resizable images, and issues with resizing seldom come up.
We need to troubleshoot what’s going on with your particular setup - and see what can be done.
I particularly think something is wrong with the MBR. After deploying a resizable image (captured by fog), you can boot to a linux live disk and likely be able to mount the HDD and read all the files just fine, copy to and fro, and run other diagnostics. I really doubt that the resizing is breaking it, I really think it’s something with the MBR.
As a sort of test, after capturing a resizable image with fog, you can trade out the mbr fog captured with the mbr that CloneZilla captured, set permissions, and try to deploy. See what happens.
-
@Wayne-Workman Good to hear that it works for you. The fact that it usually works, but didn’t work for me is the definition of an edge case. And things should not break when edge cases happen.
I just realized that I unknowingly tested exactly what you suggested, and that’s probably why it worked. When I try to resize the problematic image, however, I get this: gparted_details_bad.htm
Still, GParted wins, because it safely terminates before destroying the disk. FOG should, too.
This discussion shows that most people aren’t really sure why this happens. We could use the following algorithm to work around the problem (expanding on what GParted does):
    increment := "1GB or a certain percentage of the disk size"
    partition := /dev/sda2
    calibrate partition
    target_size := check file system on partition for errors and fix them and get estimate of smallest supported shrunken size
    if there are errors
        stop
    do
        simulate resizing to target_size
        if simulation fails
            target_size += increment
    while simulation fails and target_size < disk_size
    if target_size < disk_size
        // this means the simulation must have succeeded for the current value of target_size
        actually resize the file system
        actually resize the partition
        // note that file systems and partitions are not the same thing, and are not necessarily the same size... TODO: this is yet another edge case to consider
    // if all simulations failed, we just don't resize the disk, and the capture process can still continue uninterrupted
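The search loop above could be sketched in shell roughly as follows. This is only an illustration, not what FOS does: find_safe_size, simulate_resize and the SIMULATE hook are made-up names, sizes are in decimal MB, and the only real tool assumed is ntfsresize with its --no-action and --size options. The actual resize is deliberately left out.

```shell
#!/bin/bash
# Sketch only: search for a target size whose simulated shrink succeeds.
# All function names here are hypothetical, not part of FOS.

# Default simulation: ask ntfsresize for a dry run and discard its output.
# $1 = partition, $2 = target size in MB (M means 10^6 bytes to ntfsresize).
simulate_resize() {
    ntfsresize --no-action --size "${2}M" "$1" </dev/null >/dev/null 2>&1
}

# find_safe_size PARTITION START_MB LIMIT_MB STEP_MB
# Prints the first size whose simulation succeeds; returns 1 if no size
# below LIMIT_MB works, in which case the caller should not resize at all.
find_safe_size() {
    local part=$1 target=$2 limit=$3 step=$4
    while [ "$target" -lt "$limit" ]; do
        # SIMULATE may name a replacement command, e.g. a stub for testing.
        if "${SIMULATE:-simulate_resize}" "$part" "$target"; then
            echo "$target"
            return 0
        fi
        target=$((target + step))
    done
    return 1
}
```

Called as find_safe_size /dev/sda2 66000 250000 1000, it would try 66000 MB, then 67000 MB, and so on, and print the first size that simulates cleanly. Keeping the simulation behind a hook means the loop itself can be tried without touching a disk.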
-
Sorry, actually no, the image where the resize succeeded has the same mbr, but fewer files in sda2 (about 10GB less than the one that fails to resize).
The suggestion for making the capture process safer still holds, though.
I even tested it: If I resize to 70GB instead of the minimum (about 66GB), it works just fine. I suspect that it isn’t possible to know exactly what the minimum size of an NTFS partition will be without simulating. That’s probably why the authors of ntfsresize include messages like this (emphasis mine):
- Estimating smallest shrunken size supported …
- You might resize at 71189536768 bytes or 71190 MB (freeing 178764 MB).
- Please make a test run using both the -n and -s options before real resizing!
Luckily, simulation takes about 10 seconds for a 250GB drive, so it won’t be a large performance hit.
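To use that estimate as a starting point in a script, the suggested size can be pulled out of the ntfsresize output. A minimal sketch, assuming the message keeps the wording quoted above; suggested_bytes is a made-up helper name.

```shell
# Hypothetical helper: read ntfsresize output on stdin and print the
# suggested minimum size in bytes (prints nothing if the line is absent).
suggested_bytes() {
    sed -n 's/.*You might resize at \([0-9][0-9]*\) bytes.*/\1/p'
}

# Possible use on a real system (a read-only query):
#   ntfsresize --info /dev/sda2 | suggested_bytes
```

As the 70GB experiment suggests, the printed value is only an estimate, so it is a place to start simulating from rather than a size to resize to directly.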
-
@dolf I agree with all of that. How good are you with shell script?
-
@dolf While I understand what you’re saying, I don’t think it should continue going. I agree it should, at the very least, not actually resize the partition unless we know absolutely all will continue fine down the road (which is not very practical, as I don’t know of a way to “dry_run” the fog system before actually performing tasks to test for all these edge cases). The reason there are different image types (resize, non-resize, raw) is to allow people to use what will suit them best. If resize is going to cause issues, I think it wise to fail to upload, but not attempt altering the disk.
Can you post the contents of your image’s (broken please) d1.fixed_size_partitions file? I suspect what’s occurring is an unexpected partition is resizing, thus moving the start sector of the next partition. That I can fix, though I don’t know where to begin.
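One rough way to check for a moved start sector is to dump the partition table before and after the resize step and compare. This sketch assumes sfdisk is available in the debug shell and that the dumps use the usual sfdisk -d line format; part_start is a hypothetical helper name.

```shell
# Hypothetical helper: read an sfdisk -d style dump on stdin and print
# the start sector of the partition named in $1 (e.g. /dev/sda2).
part_start() {
    sed -n "s|^$1 *: *start= *\([0-9][0-9]*\),.*|\1|p"
}

# Possible use in the debug shell (assumption: sfdisk is present there):
#   sfdisk -d /dev/sda > /tmp/before.txt    # before the resize step
#   sfdisk -d /dev/sda > /tmp/after.txt     # after it
#   [ "$(part_start /dev/sda2 < /tmp/before.txt)" = \
#     "$(part_start /dev/sda2 < /tmp/after.txt)" ] || echo "sda2 start moved"
```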