image file integrity?



  • Is there a way to check the integrity of an image file? I have been getting CRC32 errors when imaging any computer and with any image I try. I checked the SMART data of the drive holding my images and it says everything passed. The computer I use as my FOG server is older and needs to be replaced (my department was supposed to have ordered me another one by now, but I swear a snail is faster at doing things…) and I’m wondering if maybe the on-the-fly decompression of the image file is failing on the FOG server. The error that happens on the computer being imaged says

    pigz: skipping: <stdin>: corrupt -- invalid deflate data (invalid code lengths set)  pigz: abort: internal threads error
    

    Since it happens on every computer and with any image, I don’t think there’s a problem with the image files themselves, and the 1TB drive that holds my images isn’t very old either.

    I’m on FOG SVN 5680 running Fedora 22 Server. I just updated it to see if it was the software. The last two SVNs I was on, which were separated by quite a few revisions, had the error too. Every once in a while an imaging task will complete successfully. I’ve used these images many times before, with success each time.
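
    To test whether the compressed image files themselves decompress cleanly, they can be run through pigz directly on the FOG server. A minimal sketch, assuming the images are pigz-compressed (the FOG default) and use FOG's d1pX.img naming:

    # Test-decompress each partition image without writing anything to disk.
    # Corruption shows up here the same way it does during a deploy.
    for f in /images/*/d1p*.img; do
        echo "$f"
        pigz -dc "$f" > /dev/null || echo "CORRUPT: $f"
    done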


  • Moderator

    Update on this project: @Tom-Elliott has taken the PHP backend and partially integrated it into FOG Trunk. It’s available as a plugin, but it’s not fully functioning just yet; we still need to create scheduling for it. Tom has already written a way to display everything in the checksum table, and a way to export those entries for anyone who wants to analyze the results with a 3rd-party app.

    I’ll be writing some intelligent code that will analyze the table’s contents to display concerning entries. The analysis will follow some basic principles.

    • Makes decisions based on data in the DB.

    • A storage group’s files should always match across all nodes in the group, both images and snapins.

    • Images shared between storage groups should always match between those groups’ masters.

    • If no image upload occurred between the last and current check, images are expected to match across that time period.

    • If an image upload does occur, the files are expected to change.

    Results of the intelligent analysis should display concerns, following the rules above, and the user should be able to “dismiss” individual file concerns so they don’t show anymore.

    The integrity table will have a column that will operate similarly to the pending hosts column in the hosts table: blank or zero means unchecked or false (bad or unprocessed), and 1 means good or dismissed.

    This column will be maintained by the intelligent checking and by the user’s “dismiss” clicks. Once an entry for a file is marked as “good”, either because no problems were detected or because the user dismissed it, that entry is forever good. If it’s blank, the analysis will mark it 0 or 1 accordingly.
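
    To give a concrete idea of the kind of check involved, here is a rough sketch (not the plugin code; it only uses the fileChecksums table defined later in this thread) that lists every file path with more than one distinct checksum on record. Per the rules above, hits caused by an image upload are expected, so the timestamps still have to be taken into account:

    # List file paths whose recorded checksums disagree, across nodes or over time.
    # Assumes the fileChecksums table from FOGFileChecksum.sql and local MySQL access.
    mysql -D fog -e "SELECT fileLocation, COUNT(DISTINCT fileSum) AS sums
                     FROM fileChecksums GROUP BY fileLocation HAVING sums > 1"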

    I’ll be working on this as I have time.


  • Moderator

    Got the PHP version functioning. Just need to polish it up and work out how scheduling is going to work.


  • Moderator

    Bumping this, converting the integrity checking stuff to PHP and going to try to work out a plugin for it.


  • Moderator

    I’ll be working to convert this project from shell script to pure PHP. I’ll be developing in PHP 5.5, but I’ll also ensure compatibility with PHP 7.0. I’ll also make a much nicer front end. I get a little better at web GUIs each day, I think.


  • Moderator

    I’ve made this into a GPLv3 project on github. You can follow along here:
    https://github.com/wayneworkman/FOGFileChecksum


  • Moderator

    Looking at the results, I’ve concluded that replication did not happen for the /images/Win7/d1.mbr file.

    And either I have a corrupt HDD, or replication transferred the /images/Win7/d1p1.img file incorrectly.

    The /images/Win7/d1.minimum.partitions file is also of concern.

    I’m going to delete everything in the /images directory of the slave node and let everything re-replicate and see what happens…
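
    For anyone wanting to do the same comparison by hand, something along these lines (a sketch; “storage-node” is a placeholder hostname) shows quickly whether the master and a node hold the same bytes for a given file:

    # Checksum the same image file on the master and on a storage node, then compare.
    md5sum /images/Win7/d1p1.img
    ssh root@storage-node md5sum /images/Win7/d1p1.img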


  • Moderator

    I’ve updated one image at home since I started tracking the checksums, I have interesting results…

    0_1456544357301_Wild results..png

    For those that are interested, here is the complete table output:
    0_1456542756352_fileChecksums.txt


  • Moderator

    Just posting what I’ve worked on this weekend. This is not finished; the backend script is not cron-ready yet, and there is no installer yet either.

    0_1456118459005_Integrity Check.png


    FOGFileChecksum.php

    <?php
    
    
    $servername="localhost";
    $username="wayne";
    $password="";
    $database="fog";
    
    // Create connection
    $link = new mysqli($servername, $username, $password, $database);
    // Check connection
    if ($link->connect_error) {
            // Couldn't establish a connection with the database.
            die("Could not establish a connection with the database.");
    }
    
    $fileLocation = array("");
    $fileSum = array("");
    
    $sql = "select DISTINCT(fileSum),fileLocation from fileChecksums order by fileLocation";
    $result1 = $link->query($sql);
    while($row1 = $result1->fetch_assoc()) {
    	
    	$aSum = $row1['fileSum'];
    	$aLocation = $row1['fileLocation'];
    	array_push($fileSum, $aSum);
    	array_push($fileLocation, $aLocation);
    
    }
    
    
    $arrlength = count($fileLocation);
    
    echo "The below table only shows detected changes in a file.<br><p>";
    echo "New files and files that have not been changed since creation are not listed. There is no consideration for storage groups or when images were uploaded.<br>";
    echo "If a change in a file is detected, a set of relevant records from all storage nodes concerning the file are displayed.<br><p>";
    echo "You should see changes in files when an image is updated, when snapin files are updated, or when replication does not occur for a updated image or snapin, or when the storage node's hardware (hdd mostly) is failing.<br><p>";
    echo "<table border=\"1\" style=\"width:100%\">";
    echo "<tr>";
    echo "<td>Hash Checksum</td>";
    echo "<td>When this record was recorded</td>";
    echo "<td>Host</td>";
    echo "<td>File</td>";
    echo "</tr>";
    for($x = 1; $x < $arrlength; $x++) {
    	if ($fileLocation[$x-1] == $fileLocation[$x]) {
    		$sql = "select distinct(fileSum),fileLocation,fileHost from fileChecksums where fileLocation = '$fileLocation[$x]'";
    		$result1 = $link->query($sql);
                    while($row1 = $result1->fetch_assoc()) {
    			$aSum = $row1['fileSum'];
    			$aLocation = $row1['fileLocation'];
    			$aHost = $row1['fileHost'];
    			$sql = "select * from fileChecksums where fileLocation = '$aLocation' and fileHost = '$aHost' and fileSum = '$aSum' order by fileTime ASC LIMIT 1";
    			$result2 = $link->query($sql);
    			while($row2 = $result2->fetch_assoc()) {
    				$aSum = $row2['fileSum'];
    				$aLocation = $row2['fileLocation'];
    				$aHost = $row2['fileHost'];
    				$aTime = $row2['fileTime'];
    				$aTime = gmdate("l jS \of F Y h:i:s A", $aTime);
    				echo "<tr>";
    				echo "<td>$aSum</td>";
    				echo "<td>$aTime</td>";
    				echo "<td>$aHost</td>";
    				echo "<td>$aLocation</td>";
    				echo "</tr>";
    			}
    		}
    	}
    }
    echo "</table>";
    $link->close();
    ?>
    

    FOGFileChecksum.sh

    #-----Variables-----#
    
    files=/root/files.txt
    fogsettings=/opt/fog/.fogsettings
    ipaddress="$(grep 'ipaddress=' $fogsettings | cut -d \' -f2 )"
    snmysqluser="$(grep 'snmysqluser=' $fogsettings | cut -d \' -f2 )"
    snmysqlpass="$(grep 'snmysqlpass=' $fogsettings | cut -d \' -f2 )"
    snmysqlhost="$(grep 'snmysqlhost=' $fogsettings | cut -d \' -f2 )"
    
    
    #-----Connect to mysql and query all nodes that have this IP-----#
    
    if [[ $snmysqlhost != "" ]]; then
    
    imagePaths=$(mysql -s -h$snmysqlhost -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmRootPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    
    snapinPaths=$(mysql -s -h$snmysqlhost -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmSnapinPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    
    elif [[ $snmysqlpass != "" ]]; then
    
    imagePaths=$(mysql -s -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmRootPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    
    snapinPaths=$(mysql -s -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmSnapinPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    
    else
    
    imagePaths=$(mysql -s -D fog -e "SELECT ngmRootPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    
    snapinPaths=$(mysql -s -D fog -e "SELECT ngmSnapinPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    
    fi
    
    
    
    #-----Find all files on all local storage nodes-----#
    
    if [[ -e $files ]]; then
    	rm -f $files
    fi
    
    for i in ${imagePaths[@]}; do
    	find ${i} -type f >> $files
    done
    
    for i in ${snapinPaths[@]}; do
            find ${i} -type f >> $files
    done
    
    IFS=$'\n' read -d '' -r -a allFiles < $files
    
    
    
    
    #-----Checksum all files, insert into database-----#
    
    for i in ${allFiles[@]}; do
    	
        # sha1sum output is "<sum>  <file>"; append the epoch time so all three fields are in one string.
        # (SHA-1 matches the 40-character fileSum column in the fileChecksums table.)
        sum_space_file_space_time="$(sha1sum ${i}) $(date +%s)"

        fileSum=$(echo $sum_space_file_space_time | awk '{print $1}')
        fileLocation=$(echo $sum_space_file_space_time | awk '{print $2}')
        fileTime=$(echo $sum_space_file_space_time | awk '{print $3}')
    
    	if [[ $snmysqlhost != "" ]]; then
    
    		mysql -s -h$snmysqlhost -u$snmysqluser -p$snmysqlpass -D fog -e "INSERT INTO fileChecksums (fileHost,fileTime,fileSum,fileLocation) VALUES ('$ipaddress','$fileTime','$fileSum','$fileLocation')"
    	else
    	mysql -s -D fog -e "INSERT INTO fileChecksums (fileHost,fileTime,fileSum,fileLocation) VALUES ('$ipaddress','$fileTime','$fileSum','$fileLocation')"
    	fi
    	
    done
    

    FOGFileChecksum.sql

    USE fog;
    
    CREATE TABLE fileChecksums(
    fileChecksumsID int NOT NULL AUTO_INCREMENT,
    fileHost VARCHAR(255) NOT NULL,
    fileTime int NOT NULL,
    fileSum VARCHAR(40) NOT NULL,
    fileLocation VARCHAR(255) NOT NULL,
    PRIMARY KEY (fileChecksumsID)
    );
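
    To load the schema, something like this should work (assuming root access to the MySQL instance and that the file is in the current directory):

    # Create the fileChecksums table inside the existing fog database.
    mysql -u root -p < FOGFileChecksum.sql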
    

  • Moderator

    Bumping this thread, I feel it has real utility for comparing files across storage group members. When I write the wiki article on it, I will gear it towards that.



  • @Tom-Elliott I’m currently copying the images off the 1TB to a brand new 3TB drive as a backup. I’m going to look at the MD5 checksum info Wayne posted below after the copy goes through.


  • Developer

    Another thing to test would be the memory on the FOG server.
    From experience, undetected memory errors can lead to the symptoms you describe, i.e. random failures of CPU/memory-intensive operations.
    If this were the case, I would also not be surprised if you were getting strange errors reported by the OS, e.g. SIGSEGV.
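
    A quick way to exercise the RAM from the running system (a sketch, assuming the memtester package is available in the distribution’s repos; a full memtest86+ boot is more thorough):

    # Allocate 1 GB of RAM and run 3 passes of test patterns over it;
    # any reported failure points to bad memory.
    dnf install -y memtester
    memtester 1G 3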


  • Senior Developer

    @johnomaz If an old image worked fine at a different site, or an image copied to another site worked, wouldn’t that raise more suspicion that the disk storing the images has some problem with it? Failing randomly would also lean more towards an escalating problem (one that isn’t fully prevalent right now) and indicate more of an issue with the device they’re currently stored on. If it failed every time in the same spot, I would say the image is corrupt, but because multiple systems pulling images from this particular device fail at different points, it still looks to me like there is a problem with the disk the images are currently stored on.


  • Moderator

    @Developers I couldn’t resist. I’ve been working A LOT with crontab and bash scripting lately! It’s super fun!

    It took my virtualized dual core VM running on SATA1 drives on an old computer a grand total of…

    8 minutes

    to produce an MD5 checksum for all files in my /images directory, which is a total of…

    18GB.

    More powerful systems can expect much greater performance.

    Here’s a command that will generate a file, named with the date, containing an MD5 checksum for every file in /images. It puts the file in whatever directory your pwd is.

    now=$(date +\%m\%d\%Y);filename=checklist_$now.chk;find /images -type f -exec md5sum "{}" + > $filename

    Results should look something like this.
    cat checklist_12082015.chk

    e8ca919a5cf891c1444bef848ba0826a  /images/postdownloadscripts/fog.postdownload
    d41d8cd98f00b204e9800998ecf8427e  /images/dev/.mntcheck
    b026324c6904b2a9cb4b88d6d61c81d1  /images/CentOS7Optiplex745UpdateBASE/d1.fixed_size_partitions
    989163439cdf3d881c5c47ec26a3549b  /images/CentOS7Optiplex745UpdateBASE/d1.partitions
    e908d53a4e858480aade0428022c2b79  /images/CentOS7Optiplex745UpdateBASE/d1.original.fstypes
    d41d8cd98f00b204e9800998ecf8427e  /images/CentOS7Optiplex745UpdateBASE/d1.original.swapuuids
    d41d8cd98f00b204e9800998ecf8427e  /images/CentOS7Optiplex745UpdateBASE/d1.has_grub
    5b3cc4be3250658e7832435bf51cfd19  /images/CentOS7Optiplex745UpdateBASE/d1.mbr
    0f52ff40ada8c1a3f585c045c046f90e  /images/CentOS7Optiplex745UpdateBASE/d1.minimum.partitions
    9b7ab9244c3fbb3ed591c2b139e08289  /images/CentOS7Optiplex745UpdateBASE/d1p1.img
    653b56dddbbe995df1a02fa4ce3101a6  /images/CentOS7Optiplex745UpdateBASE/d1p2.img
    26ab0db90d72e28ad0ba1e22ee510510  /images/Fedora22LivingRoom/d1.fixed_size_partitions
    e547d023f0de8960668c29a92c97ea64  /images/Fedora22LivingRoom/d1.partitions
    b45db809f4e6ab60ae53acb4bf4814db  /images/Fedora22LivingRoom/d1.original.fstypes
    ece7afe263274485bdcd95a563a8339f  /images/Fedora22LivingRoom/d1.original.swapuuids
    d41d8cd98f00b204e9800998ecf8427e  /images/Fedora22LivingRoom/d1.has_grub
    df83ab31a77bc51f3dceb345375c81ed  /images/Fedora22LivingRoom/d1.mbr
    db56d2974bf892b86f351fc4f92b0537  /images/Fedora22LivingRoom/d1.minimum.partitions
    a296046f409bc5d4dfabd051dafa507d  /images/Fedora22LivingRoom/d1p1.img
    b026324c6904b2a9cb4b88d6d61c81d1  /images/Win7/d1.fixed_size_partitions
    2cfd36a0e381b432a55b929f796ed3d4  /images/Win7/d1.original.fstypes
    d41d8cd98f00b204e9800998ecf8427e  /images/Win7/d1.original.swapuuids
    2523e3aca9e856933d1fb1e2b0d2cd6e  /images/Win7/d1.original.partitions
    6240be7065ce983bdcb63f8e9c1a5096  /images/Win7/d1.minimum.partitions
    3a673e2c1b7d279c1381e4a6b4303175  /images/Win7/d1.mbr
    5dddb3b8c46be183b0a209e32ee912ef  /images/Win7/d1p1.img
    71c96624ae151ebeaa7fef342b1436a6  /images/Win7/d1p2.img
    d41d8cd98f00b204e9800998ecf8427e  /images/.mntcheck
    606b8f0aed33cd36c9fb077a904c5865  /images/checklist.chk
    

    You can easily just change the file path to point into /var/www/html/md5 when you start the command, and then later view the results in a web browser.
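
    For example (a sketch; it assumes a /var/www/html/md5 directory already exists and is readable by the web server):

    now=$(date +\%m\%d\%Y); find /images -type f -exec md5sum "{}" + > /var/www/html/md5/checklist_$now.chk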

    The command can easily be made into a crontab event for root by simply finding out where md5sum is located on the machine and pathing to it directly. For Fedora 23 Server Minimal it’s here: /usr/bin/md5sum

    The script called by the crontab event should therefore look like…

    if [ ! -d /var/www/html/checksum ]; then
         #Make a web directory called checksum if it's not already there...
         mkdir /var/www/html/checksum
         chown apache:apache /var/www/html/checksum
         chmod 744 /var/www/html/checksum
    fi
    
    now=$(date +\%m\%d\%Y)
    filename=/var/www/html/checksum/checklist_$now.chk
    find /images -type f -exec /usr/bin/md5sum "{}" + > $filename
    chown apache:apache $filename
    chmod 744 $filename
    

    That’s a script I stuck in my root user’s home folder here: /root/checksum.sh. You have to make it executable, obviously (after it’s created), with chmod +x /root/checksum.sh

    The crontab entry for root to do this every day at 8pm (overkill) would be 0 20 * * * /root/checksum.sh and for the 1st of every month (recommended) at 8pm it would be 0 20 1 * * /root/checksum.sh

    (In Fedora / CentOS / RHEL, to create a crontab event for root, first switch to root with su root, then execute crontab -e and add the entry.)

    Sample output in a browser after the task ran for the first time:

    0_1449639435691_Screenshot from 2015-12-08 23-36-28.png

    0_1449639518778_Screenshot from 2015-12-08 23-38-24.png

    Tagging this for the #wiki
    Thread solved. :-)


  • Moderator

    @johnomaz If you’re interested, it’s most likely very easy to do a basic MD5 checksum type thing in the evenings and pile the results up in a file in the web root for viewing in a web browser… it would be a crontab event.


  • Moderator

    @johnomaz There’s a thing called data-rot that happens to old data, or data on old drives… https://en.wikipedia.org/wiki/Data_degradation

    Just an idea…

    It could also be your network dropping packets… (far more likely). It could be a loose patch cable, a kinked patch cable, or interference on a copper line that was run a little too close to a high-voltage inverter for fluorescent lighting. If your copper lines run close to HVAC or large-ish electric motors, you’ll have a lot of interference too… You don’t have a microwave by the server, do you?
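
    A quick sanity check for the packet-loss theory (a sketch; eth0 is a placeholder interface name) is to look at the error and drop counters on the FOG server’s NIC:

    # Per-interface RX/TX error and drop counters; numbers that climb steadily
    # while imaging suggest a cabling or NIC problem.
    ip -s link show eth0
    ethtool -S eth0 | grep -iE 'err|drop'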



  • @Tom-Elliott The images have all been used a lot. I’ve used both old and new images. I took one of the machines I’m trying to image to a different site and imaged it there with no issue. The imaging process also fails randomly; it’s not always at the same spot with the same image. Tomorrow I plan on taking one of the images to a different site and seeing if that image works fine elsewhere.

    I also tried to image a known working computer and it failed there too.


  • Senior Developer

    Well, the fact that it happens on all machines would more so indicate a potential problem with the drive, or that the image was corrupted during the original upload. Of course, I don’t know all the information.

