Images (multiple) failing deployment at 32%, 15%, or others (always the same point for each image)

Rama

Hi, I have multiple images that used to deploy fine but are now failing. I recently upgraded the bzImage kernel to get around some graphics issues with newer laptops, but apart from that nothing else has changed. (Yes, I have tried deploying with old kernels, either by specifying them in the ‘host kernel’ slot on the host management page or by replacing bzImage with an older file version; neither works.)

The exact error (I was only able to capture it by filming the screen with my phone, as it went by too quickly to read when it failed):

performing cleanup stage 1… done
changing hostname… ntfs_mst_post_read_fixup: magic 0x71d40e27 size: 1024 usa_ofs: 16406 usa_count: 41949: Invalid argument
Record 0 has no FILE magic (0x71d40e27 )
Failed to load $MFT: Input/output error
Failed to mount /dev/sda2: Input/output error
NTFS is either inconsistent, or there is a hardware fault, or its a SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on windows then reboot.
…
…
unmount: can’t unmount /ntfs: Invalid argument
Done.

EDIT: Sorry, I should mention I’m running FOG 0.32 on a Linux (CentOS) VM in VirtualBox.

P.S. It seems like possible file corruption, which may have happened when we added some space to the images drive (a resizable LVM partition).
Is there any way to mount the sys.img.000 partition and run an fdisk on it or similar? Or is there any other way to check for file system errors?
Thanks.

Tom Elliott

Is it always the same system giving the same problem? Or is it any system failing at the same point?

Rama

Any system, same image. One very strange workaround I found for a laptop was to re-run host registration, after which the image decided to deploy?! But that didn’t work on any of the desktops.

Rama

So far there are 3 images that I know of that are corrupted/not deploying. All of them are recent ones… before the disk resize happened.

Tom Elliott

Unfortunately, it sounds like the data is corrupted. However, this is dependent on the way the images folder is mounted on the system.

I’ve tried playing with glusterfs on my systems and, while the data itself was perfectly fine, it would act on the actual system like the data didn’t exist. I ended up closing my gluster systems off and just copying the data to the local drive.

No, there is no way to mount the files and perform an fsck/chkdsk of the system. What I’d recommend is trying to copy the image to the local filesystem (not a share) and try deploying using that method. It may take some configuration to get working again. If you can upload it that way, things should work.

I’m only telling you this because of my own experience. It sounds likely that it’s because you’re “sharing” the images filesystem with your FOG server. I could very well be wrong. In either case, one way you could verify is to try the above or upload the image again.
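For anyone trying the copy test described above, here is a rough sketch of how the copy could be checked, assuming Python is available on the server. It hashes every file in an image folder and in its copy, then lists the files that differ or are missing. The function names and the example paths are made up for illustration and are not part of FOG. Matching checksums only show that the files read back the same both times; they do not prove the image was good when it was uploaded. A mismatch, or an I/O error while reading, would point at the storage rather than at the client.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large image files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_image_dirs(original_dir, copy_dir):
    """Return the names of files that are missing from copy_dir or hash differently."""
    mismatched = []
    for original in sorted(Path(original_dir).iterdir()):
        if not original.is_file():
            continue
        copy = Path(copy_dir) / original.name
        if not copy.is_file() or sha256_of(original) != sha256_of(copy):
            mismatched.append(original.name)
    return mismatched


# Example call (both paths are hypothetical):
# print(compare_image_dirs("/images/myimage", "/root/myimage_copy"))
```

The same function can be pointed at a known-good copy of the same image kept elsewhere, if one exists, which is a stronger check than comparing against a fresh copy of a file that may already be damaged.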

Rama

Hmm… the ‘images’ folder on the FOG server is a local folder; it’s not a symlink or mapped to another network location, if that’s what you meant? The VM is running a .VDI with all the images on it, so as far as VirtualBox/FOG is concerned it is on a local disk?

Rama

What is the usual cause of corruption in the images? Over the last 6-8 months we have had a few go bad. I would just recreate them if it was a one-off, but this seemed to be a bunch, so I wanted to make sure it was corruption before reinstalling.
We are running identical hardware (both with the same redundancy/mirroring) at another school and have had zero issues.
