Unable to capture image

Ared

Hello,

I am consistently getting an error when capturing an image (single disk, multi-part) that says to “maybe check available disk space.”

My setup is currently composed of three machines:

DHCP server (Debian 9)
FOG server (CentOS 7)
img host (Ubuntu 18.04 & Windows 10)

The FOG server's root partition (images are stored in /images) has several hundred GB available, and the img host disk is only 80 GB. I found a previous post (https://forums.fogproject.org/topic/10967/error-capture-image/2) that prompted me to replace the HDDs in the img host and the FOG server.
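The free space on the image store can be confirmed directly on the FOG server. The following is a minimal sketch, assuming bash and a `df` that supports `-P` and `-k`, and assuming the images live on whatever filesystem backs `/images` (the path mentioned above). The functions `avail_gib` and `has_room_for` are hypothetical helpers, not part of FOG:

```bash
# Hypothetical helpers for checking free space on the FOG image store.

# Print the available space, in whole GiB (rounded down), on the filesystem
# that holds the given path (default: /images).
avail_gib() {
    local path="${1:-/images}"
    # -P: POSIX output, one line per filesystem even with long device names.
    # -k: sizes in 1024-byte blocks. Column 4 of the data line is "Available".
    df -Pk "$path" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed (exit 0) if at least NEEDED_GIB GiB are free under PATH.
has_room_for() {
    local needed_gib="$1" path="${2:-/images}" avail
    avail="$(avail_gib "$path")"
    # An empty result means df failed (e.g. the path does not exist).
    [ -n "$avail" ] && [ "$avail" -ge "$needed_gib" ]
}
```

For example, `has_room_for 80 && echo ok` checks for at least 80 GiB, a conservative bound for an 80 GB source disk.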

What I have tried:

SMART and on-board diagnostic tests for the HDDs (FOG server and img host) -> both passed.
Memtest86+ on the FOG server and img host -> both passed without errors.
Replacing the HDD in the img host and also in the FOG server.
Replacing the switch (swapped a 5-port switch for an old Cisco switch).
Replacing the NIC on the img host (the old one was not detected by FOG, although the transfer process still started).
Using dd instead of partclone.
Replacing the entire machine for the img host.
Re-installing FOG (after replacing the HDD).
Running the DHCP server on the FOG machine rather than the Debian one.
Ensuring that the DHCP server has a very long lease (2 days).

I still get the same error every time, with one exception: one capture with dd succeeded, and deploying that image back to the same machine also worked. When I created a new image and tried again as a test, it failed with the same error as before.

I have not yet tried:

A single partition image.

Other strange behavior:
Systemd reports that tftp.service goes down after a while, sometimes after 10 minutes, sometimes after several hours. Whether it goes down before, during, or after the capture, the capture still fails with the same error.

Sebastian Roth

@Ared Don’t take the error message for granted. Sometimes things happen and we can’t properly detect what caused them. Instead of just printing an unknown error, we suggest checking disk space, as a full disk is known to cause weird errors.

Please give it another try and take a picture of the client’s screen when you get to the error.

Please also tell us which FOG version you are on.

Ared

The FOG version is 1.5.4. The problem with tftpd seems to have been caused by systemd and xinetd fighting over it, and has been resolved. I still get the same error during the capture though.

Also, /var/log/partclone.log does not exist.

george1421

@Ared The partclone log file is on the target computer. You will have to do a debug capture to get the log.

This is not a TFTP issue either. If it’s anything, it’s probably related to NFS.

Can you run these commands from the FOG server’s command prompt and post the outputs?
sudo showmount -e 127.0.0.1

then
sudo ls -la /images
and
sudo ls -la /images/dev
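The output of the first command can also be checked mechanically. This is a minimal sketch in bash; `check_fog_exports` is a hypothetical helper, and it assumes the two exports that matter are `/images` and `/images/dev`, the same paths the commands above inspect:

```bash
# Hypothetical helper: read `showmount -e` output on stdin and report whether
# the two FOG image exports, /images and /images/dev, are both present.
check_fog_exports() {
    local exports dir missing=0
    # Skip the "Export list for ...:" header; keep only the exported path column.
    exports="$(awk 'NR > 1 { print $1 }')"
    for dir in /images /images/dev; do
        # -x: whole-line match, so /images does not match /images/dev.
        # -F: treat the path as a fixed string, not a regex.
        if printf '%s\n' "$exports" | grep -qxF -e "$dir"; then
            echo "ok: $dir is exported"
        else
            echo "MISSING: $dir is not exported"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `sudo showmount -e 127.0.0.1 | check_fog_exports`. It exits non-zero if either export is missing, so it can also gate a script.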

Ared

@george1421 Here is the output:

[root@fogtest tmp]# showmount -e 127.0.0.1
Export list for 127.0.0.1:
/images/dev *
/images     *

[root@fogtest tmp]# ls -la /images/
total 16
drwxrwxrwx.  4 fog  root 4096 Nov  1 14:19 .
dr-xr-xr-x. 22 root root 4096 Nov  6 13:29 ..
drwxrwxrwx.  4 fog  root 4096 Nov  1 14:28 dev
-rwxrwxrwx.  1 fog  root    0 Nov  1 14:19 .mntcheck
drwxrwxrwx.  2 fog  root 4096 Nov  1 14:19 postdownloadscripts

[root@fogtest tmp]# ls -la /images/dev
total 16
drwxrwxrwx. 4 fog root 4096 Nov  1 14:28 .
drwxrwxrwx. 4 fog root 4096 Nov  1 14:19 ..
drwxrwxrwx. 2 fog root 4096 Nov  6 14:05 782bcbb0e075
-rwxrwxrwx. 1 fog root    0 Nov  1 14:19 .mntcheck
drwxrwxrwx. 2 fog root 4096 Nov  1 14:19 postinitscripts


george1421

@Ared OK, the directories are shared via NFS and the permissions on the directories and files are correct. I see you have one failed upload, 782bcbb0e075. Under normal circumstances there should be no directories in /images/dev with a MAC-address-style name.
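To see which leftover upload directories exist, something like the following lists the MAC-address-style entries under /images/dev. This is a minimal sketch in bash; `list_stale_uploads` is a hypothetical helper, and it assumes the names are 12 hex digits without separators, like 782bcbb0e075 above. It only lists them: whether a directory is safe to delete depends on no capture task still using it.

```bash
# Hypothetical helper: print the names of MAC-address-style directories
# (12 hex digits, no separators, e.g. 782bcbb0e075) under a FOG dev directory.
# It only lists them; it does not delete anything.
list_stale_uploads() {
    local devdir="${1:-/images/dev}" entry name
    # The trailing slash makes the glob match directories only.
    for entry in "$devdir"/*/; do
        # If nothing matched, the glob stays literal and -d fails.
        [ -d "$entry" ] || continue
        name="$(basename "$entry")"
        if [[ "$name" =~ ^[0-9a-fA-F]{12}$ ]]; then
            printf '%s\n' "$name"
        fi
    done
}
```

On the listing posted above, this would print only 782bcbb0e075; `postinitscripts` and the `.mntcheck` file are skipped.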

Ared

@george1421 Should I remove it and try again?

george1421

@Ared I’m trying to think of the best way to debug this. I have seen this error before; I’m just trying to remember where and why.

Probably a debug capture, so we can single-step through it.

I didn’t see the directory for the system in the screenshot, which means the image base directory isn’t being created, so it’s erroring out fairly early in the capture process.

george1421

@Ared Looking at your screenshot again: are you capturing a Linux system, a single partition on /dev/sda4?

edit: Ah, MPS == multiple partitions, single disk (non-resizable).

george1421

OK, the more I think about it, the more I think that error message isn’t accurate. Partclone is failing without returning a meaningful error code, so the FOG code is “guessing” it’s a disk space issue.

Schedule another capture session, but before you hit the schedule task button, check the debug checkbox. Then PXE boot the target computer. After a few screens of text and Enter key presses, you will be dropped at a Linux command prompt on the target computer. Type fog and single-step through the capture until you get the error, then inspect the partclone log on the target computer. If you watch the partclone screen, you might also see the error displayed briefly.

Ared

@george1421 After doing so, the error was the same except that partclone failed on sda2:

Args Passed: /dev/sda2 1 /images/782bcbb0e075 all

partclone.log reports that “NTFS Volume ‘/dev/sda2’ is scheduled for a check or it was shutdown uncleanly. Please boot Windows or fix it by fsck.”

I will do this and attempt to recapture.
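When single-stepping a debug capture, the relevant line can be pulled out of the log on the target computer instead of read by eye. This is a minimal sketch, assuming the debug shell provides `grep` with `-o` support and that the log is at /var/log/partclone.log on the target, as discussed above; `dirty_ntfs_devices` is a hypothetical helper. It prints each device that triggered the message quoted above, and nothing if there is none:

```bash
# Hypothetical helper: print each device that partclone refused because the
# NTFS volume was flagged (scheduled for a check or not shut down cleanly).
dirty_ntfs_devices() {
    local log="${1:-/var/log/partclone.log}"
    # Match the fixed message text, then extract the /dev/... name from each
    # matching line; this avoids depending on the quote style around it.
    grep -F -e 'is scheduled for a check or it was shutdown uncleanly' "$log" \
        | grep -o '/dev/[A-Za-z0-9]*' \
        | sort -u
}
```

A non-empty result means the Windows volume needs a disk check and a clean shutdown before partclone will capture it.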

george1421

@Ared Bingo that’s the issue.

You will need to start up that reference image and check the disks for errors. Then, when you are ready to capture, shut down the computer with this command (don’t use shutdown from the Start menu):
shutdown -s -f -t 0
This command will ensure that Windows is shut down properly for image capture.

Ared

@george1421 The capture was successful.

I do not remember shutting down Windows unsafely (on either install), but it seems likely that I did. Since then, I had been shutting down from the Start menu.

I did see this error:

[Screenshot of the error; image not included.]

I am not sure if it will cause any problems.

I plan on running more tests to ensure that everything is working properly, but it looks like the partclone capture problem has been solved. I am curious, however, as to why dd also failed. Shouldn’t the state of the filesystem(s) be irrelevant for dd?

Thank you for your help.
