UNSOLVED Capture failure since 1.3.5
- FOG Version: 1.4.0 RC-9.3
- OS: Ubuntu Mate 16.04 x64
- Service Version: N/A
- OS: N/A
During image capture, partclone suddenly exits with error code 141 despite the server having more than enough free space left on the images/dev partition. It always seems to exit in the same place as well.
Attempting to do a debug capture does the same thing, but doesn’t return control to the console, so I can’t grab the debugging log.
Server summary screen after capture failure:
Task management screen after capture failure:
Could this be an I/O error beyond 2 GB?
That used to happen on older 32-bit systems, and is still something to watch for with MinGW.
I would not expect that from Ubuntu x64, but that conclusion may depend on the compiler used.
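For reference, a quick arithmetic sketch of why ~2 GB and ~4 GB are the sizes that make people suspect a 32-bit limit. It only shows where the boundaries of 32-bit offsets fall; it does not show that FOG or partclone actually hits them. The helper name `max_offset_gib` is made up for this illustration.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def max_offset_gib(bits: int, signed: bool) -> float:
    """Largest byte offset an integer of the given width can hold, in GiB."""
    max_value = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return max_value / GIB


# Compare the classic 32-bit limits with a 64-bit signed offset.
for bits, signed in [(32, True), (32, False), (64, True)]:
    kind = "signed" if signed else "unsigned"
    print(f"{bits}-bit {kind}: ~{max_offset_gib(bits, signed):.2f} GiB")
```

A signed 32-bit offset tops out just under 2 GiB and an unsigned one just under 4 GiB, while a 64-bit offset is far beyond any disk in this thread.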
@code7 No, nothing changed in regards to requirements of RAM.
OK, some info:
Apparently the “uncompressed” options don’t work: even with the compression level set to zero, I get the 141 error. Also, the “uncompressed” options cause it to fail at ~2GB, whereas the normal options cause a failure at ~4GB. Shouldn’t “uncompressed” mean writing directly to the storage node, not to a memory buffer? (It seems odd not to flush it, though. Not every system has enough RAM to match the size of the partition.)
Setting the image to use a “split” type allows it to complete the imaging process, which means there is an issue with the RAM usage. Prior to 1.3.5, though, this worked without issue on these machines. Did something change to require more RAM?
If I had to guess, the partition which hosts the images is too small, given that it exits at the same spot (i.e. at the same size).
You say the disk is large enough, but one quick look at your storage summary and the scheduled task would indicate that that is not necessarily the case.
Of course if this is going to a different storage node (doesn’t seem to be the case judging by task overview) then that’s a different matter. Can we get an overview of that then?
EDIT: Derp, confused free with used
Anyway, exit code 141 means SIGPIPE (128 + signal 13): a program writing into a pipe was killed because the program reading from the other end went away. In this case, that is likely partclone piping to gzip?
I’d have to agree with Tom, then: given that they fail at different spots, but consistently so, it seems that the RAM on the clients gets full.
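To make the 141 less mysterious, here is a minimal sketch of how a shell-style exit status above 128 can be decoded into a signal name. It assumes the usual shell convention (status = 128 + signal number) and Linux signal numbering; `describe_exit_status` is a hypothetical helper, not part of FOG or partclone. A program can also call exit with such a value itself, so treat the result as a strong hint rather than proof.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + N usually means 'killed by signal N')."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"possibly killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with code {status}"


print(describe_exit_status(141))
```

On Linux this reports SIGPIPE for 141, which matches a pipeline where the reading side exited before the writer was done.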
Pic of error:
@Tom Elliott The image is configured to Compression level 6, Multipartition image Single Disk (Not Resizeable), Everything, Using partclone gzip.
@george1421 The desktop machine passed a SMART extended check without error. I’ll try to take a picture of the logging output from the machine. (In a debug capture it won’t reboot or return the shell, so capturing it should be easy.)
How is the image configured? Is it failing at ~2GB because you have only 2 GB of RAM on that machine? Same for the ~4GB machine?
While zstd supports compression levels up to 22, the highest levels require a lot of memory, so if you have compression set above 19 you may run into issues.
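One way to test the RAM theory is to compare each client's total memory with the point where its capture dies. A rough sketch, assuming a Linux-style `/proc/meminfo`; the sample text below is made up, and on a real machine you would pass in the contents of `/proc/meminfo` instead (a plain `grep MemTotal /proc/meminfo` gives the same number if Python is not available in the debug shell).

```python
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo-style text, converted from kB (KiB) to GiB."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) * 1024 / 1024 ** 3


# Hypothetical sample; on a real client, read the text from /proc/meminfo.
sample = "MemTotal:        2097152 kB\nMemFree:          123456 kB\n"
print(f"{mem_total_gib(sample):.2f} GiB")
```

If the laptop reports roughly 2 GiB and the desktop roughly 4 GiB, that lines up with the two failure points.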
@code7 If you watch partclone, it will throw the error and print text on the screen telling you exactly what is wrong. It should be more than just the error code. If you are a quick reader you can catch what it is saying. If not, you can set up your mobile phone to record the capture, then stop the video once the error occurs and review the “tape” to see the exact error. I searched for error 141 and not much came back.
Unlikely, considering I tried it with two different machines. (A desktop and a laptop.) Both failed around the same spot. (~2GB for the laptop, and ~4GB on the desktop.) But I’ll run a smartctl check and report back.
@Tom-Elliott So far, every time we’ve seen this, the physical media on the target computer has been at fault. I’m not saying that this time it’s not, just that the odds are in our favor that this isn’t a FOG issue but a hardware issue on the target computer.
If it’s failing in the same spot every time, is it possible there is a problem with the disk?