image file integrity?
-
Another thing to test would be the memory on the FOG server.
From experience, undetected memory errors can lead to the symptoms you describe, i.e. random failures of CPU- or memory-intensive operations.
If this were the case, I would also not be surprised if the OS were reporting strange errors, e.g. SIGSEGV.
-
@Tom-Elliott I’m currently copying the images off the 1TB to a brand new 3TB drive as a backup. I’m going to look at the MD5 checksum info Wayne posted below after the copy goes through.
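A minimal sketch of how a copy like this could be verified with MD5 checksums once it finishes. The function names `checksum_tree` and `compare_trees` are made up for illustration, and the mount points in the usage note are assumptions, not paths from this thread. It relies on GNU `find`, `sort -z`, and `xargs -0 -r`.

```shell
# Hypothetical helper: print "md5  ./relative/path" for every file under a
# directory, ordered by path so two trees can be compared line by line.
checksum_tree() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum )
}

# Hypothetical helper: compare the checksum listings of two trees.
# Prints the differing lines (diff output) and returns non-zero on mismatch.
compare_trees() {
    local src_list dst_list status
    src_list=$(mktemp)
    dst_list=$(mktemp)
    checksum_tree "$1" > "$src_list" &&
        checksum_tree "$2" > "$dst_list" &&
        diff "$src_list" "$dst_list"
    status=$?
    rm -f "$src_list" "$dst_list"
    return $status
}
```

Usage would look something like `compare_trees /mnt/old/images /mnt/new/images && echo "copy verified"`, with both paths replaced by wherever the 1TB and 3TB drives are actually mounted.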
-
Bumping this thread, I feel it has real utility for comparing files across storage group members. When I write the wiki article on it, I will gear it towards that.
-
Just posting what I’ve worked on this weekend. This is not finished: the back-end script is not cron-ready yet, and there is no installer yet either.
FOGFileChecksum.php
<?php
$servername = "localhost";
$username = "wayne";
$password = "";
$database = "fog";

// Create connection
$link = new mysqli($servername, $username, $password, $database);

// Check connection
if ($link->connect_error) {
    // Couldn't establish a connection with the database.
    die("Database connection failed: " . $link->connect_error);
}

$fileLocation = array("");
$fileSum = array("");

// Collect every distinct checksum/location pair, grouped by location.
$sql = "select DISTINCT(fileSum),fileLocation from fileChecksums order by fileLocation";
$result1 = $link->query($sql);
while ($row1 = $result1->fetch_assoc()) {
    $aSum = $row1['fileSum'];
    $aLocation = $row1['fileLocation'];
    array_push($fileSum, $aSum);
    array_push($fileLocation, $aLocation);
}
$arrlength = count($fileLocation);

echo "The below table only shows detected changes in a file.<br><p>";
echo "New files and files that have not been changed since creation are not listed. There is no consideration for storage groups or when images were uploaded.<br>";
echo "If a change in a file is detected, a set of relevant records from all storage nodes concerning the file are displayed.<br><p>";
echo "You should see changes in files when an image is updated, when snapin files are updated, or when replication does not occur for an updated image or snapin, or when the storage node's hardware (hdd mostly) is failing.<br><p>";
echo "<table border=\"1\" style=\"width:100%\">";
echo "<tr>";
echo "<td>Hash Checksum</td>";
echo "<td>When this record was recorded</td>";
echo "<td>Host</td>";
echo "<td>File</td>";
echo "</tr>";

for ($x = 1; $x < $arrlength; $x++) {
    // Two consecutive rows with the same location mean the file has more than one distinct checksum.
    if ($fileLocation[$x-1] == $fileLocation[$x]) {
        // Escape values before placing them in SQL so quotes in a file name cannot break the query.
        $location = $link->real_escape_string($fileLocation[$x]);
        $sql = "select distinct(fileSum),fileLocation,fileHost from fileChecksums where fileLocation = '$location'";
        $result1 = $link->query($sql);
        while ($row1 = $result1->fetch_assoc()) {
            $aSum = $link->real_escape_string($row1['fileSum']);
            $aLocation = $link->real_escape_string($row1['fileLocation']);
            $aHost = $link->real_escape_string($row1['fileHost']);
            // Fetch the earliest record for this checksum/host/file combination.
            $sql = "select * from fileChecksums where fileLocation = '$aLocation' and fileHost = '$aHost' and fileSum = '$aSum' order by fileTime ASC LIMIT 1";
            $result2 = $link->query($sql);
            while ($row2 = $result2->fetch_assoc()) {
                $aSum = $row2['fileSum'];
                $aLocation = $row2['fileLocation'];
                $aHost = $row2['fileHost'];
                $aTime = $row2['fileTime'];
                $aTime = gmdate("l jS \of F Y h:i:s A", $aTime);
                echo "<tr>";
                echo "<td>$aSum</td>";
                echo "<td>$aTime</td>";
                echo "<td>$aHost</td>";
                echo "<td>$aLocation</td>";
                echo "</tr>";
            }
        }
    }
}
echo "</table>";
$link->close();
?>
FOGFileChecksum.sh
#!/bin/bash

#-----Variables-----#
files=/root/files.txt
fogsettings=/opt/fog/.fogsettings
ipaddress="$(grep 'ipaddress=' $fogsettings | cut -d \' -f2 )"
snmysqluser="$(grep 'snmysqluser=' $fogsettings | cut -d \' -f2 )"
snmysqlpass="$(grep 'snmysqlpass=' $fogsettings | cut -d \' -f2 )"
snmysqlhost="$(grep 'snmysqlhost=' $fogsettings | cut -d \' -f2 )"

#-----Connect to mysql and query all nodes that have the IP-----#
if [[ $snmysqlhost != "" ]]; then
    imagePaths=$(mysql -s -h$snmysqlhost -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmRootPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    snapinPaths=$(mysql -s -h$snmysqlhost -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmSnapinPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
elif [[ $snmysqlpass != "" ]]; then
    imagePaths=$(mysql -s -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmRootPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    snapinPaths=$(mysql -s -u$snmysqluser -p$snmysqlpass -D fog -e "SELECT ngmSnapinPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
else
    imagePaths=$(mysql -s -D fog -e "SELECT ngmRootPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
    snapinPaths=$(mysql -s -D fog -e "SELECT ngmSnapinPath FROM nfsGroupMembers WHERE ngmHostname = '$ipaddress' ORDER BY ngmID")
fi

#-----Find all files on all local storage nodes-----#
if [[ -e $files ]]; then
    rm -f $files
fi
for i in ${imagePaths[@]}; do
    find ${i} -type f >> $files
done
for i in ${snapinPaths[@]}; do
    find ${i} -type f >> $files
done
IFS=$'\n' read -d '' -r -a allFiles < $files

#-----Checksum all files, insert into database-----#
for i in ${allFiles[@]}; do
    # Despite the variable name, the checksum is SHA-1 (sha1sum), which fits the VARCHAR(40) fileSum column.
    md5sum_space_file_space_time="$(sha1sum ${i}) $(date +%s)"
    N=1
    fileSum=$(echo $md5sum_space_file_space_time | awk -v N=$N '{print $N}')
    N=2
    fileLocation=$(echo $md5sum_space_file_space_time | awk -v N=$N '{print $N}')
    N=3
    fileTime=$(echo $md5sum_space_file_space_time | awk -v N=$N '{print $N}')
    # Use the same credential cases as the SELECT queries above.
    if [[ $snmysqlhost != "" ]]; then
        mysql -s -h$snmysqlhost -u$snmysqluser -p$snmysqlpass -D fog -e "INSERT INTO fileChecksums (fileHost,fileTime,fileSum,fileLocation) VALUES ('$ipaddress','$fileTime','$fileSum','$fileLocation')"
    elif [[ $snmysqlpass != "" ]]; then
        mysql -s -u$snmysqluser -p$snmysqlpass -D fog -e "INSERT INTO fileChecksums (fileHost,fileTime,fileSum,fileLocation) VALUES ('$ipaddress','$fileTime','$fileSum','$fileLocation')"
    else
        mysql -s -D fog -e "INSERT INTO fileChecksums (fileHost,fileTime,fileSum,fileLocation) VALUES ('$ipaddress','$fileTime','$fileSum','$fileLocation')"
    fi
done
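One limitation of the script as posted is that it splits on whitespace, so a path containing a space would be cut short before it reaches the database. Below is a rough sketch of one way around that, not part of the project itself. The function names `checksum_record` and `checksum_records_under` are hypothetical, and the tab-separated output format is an assumption chosen for illustration.

```shell
# Hypothetical helper: print "sha1<TAB>epoch<TAB>path" for one file.
# The path is taken from the argument, not parsed from sha1sum output,
# so spaces in file names survive.
checksum_record() {
    local file="$1" sum
    sum=$(sha1sum -- "$file" | cut -d ' ' -f 1)
    # GNU sha1sum prefixes the line with a backslash when it has to
    # escape characters in the file name; strip it from the hash.
    sum=${sum#\\}
    [ -n "$sum" ] || return 1
    printf '%s\t%s\t%s\n' "$sum" "$(date +%s)" "$file"
}

# Hypothetical helper: emit one record per regular file under a directory.
# Reading line by line handles spaces, but not newlines, in file names.
checksum_records_under() {
    find "$1" -type f | while IFS= read -r f; do
        checksum_record "$f"
    done
}
```

Each output line could then be split on tabs to fill the fileSum, fileTime, and fileLocation columns.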
FOGFileChecksum.sql
USE fog;
CREATE TABLE fileChecksums(
    fileChecksumsID int NOT NULL AUTO_INCREMENT,
    fileHost VARCHAR(255) NOT NULL,
    fileTime int NOT NULL,
    fileSum VARCHAR(40) NOT NULL,
    fileLocation VARCHAR(255) NOT NULL,
    PRIMARY KEY (fileChecksumsID)
);
-
I’ve updated one image at home since I started tracking the checksums, and I have interesting results…
For those that are interested, here is the complete table output:
0_1456542756352_fileChecksums.txt
-
Looking at the results, I’ve concluded that replication did not happen for the /images/Win7/d1.mbr file. Also, either I have a corrupt HDD or replication transferred the /images/Win7/d1p1.img file incorrectly. The /images/Win7/d1.minimum.partitions file is also of concern. I’m going to delete everything in the /images directory of the slave node, let everything re-replicate, and see what happens…
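For anyone who wants to spot this kind of mismatch without reading the whole table by eye, here is a small sketch. It assumes the table has been exported as tab-separated lines of host, checksum, and path; that layout is my assumption for illustration, not the format of the attached file. The function name `mismatched_files` is made up.

```shell
# Hypothetical helper: read "host<TAB>sum<TAB>path" lines on stdin and
# print every path that appears with more than one distinct checksum.
mismatched_files() {
    awk -F'\t' '
        # Count each (path, checksum) pair only the first time it is seen.
        !seen[$3 FS $2]++ { distinct[$3]++ }
        END {
            for (p in distinct)
                if (distinct[p] > 1)
                    print p
        }
    ' | sort
}
```

Note that this only says a file has had more than one checksum recorded; it does not distinguish a legitimate image update from failed replication or a failing disk.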
-
I’ve made this into a GPLv3 project on github. You can follow along here:
https://github.com/wayneworkman/FOGFileChecksum
-
I’ll be working to convert this project from shell script to pure PHP. I’ll be developing in PHP 5.5, but I’ll also ensure compatibility with PHP 7.0. I’ll also make a much nicer front end. I get a little better at web GUIs each day, I think.
-
Bumping this, converting the integrity checking stuff to PHP and going to try to work out a plugin for it.
-
Got the PHP version functioning. Just need to polish it up and work out how scheduling is going to work.
-
Update on this project: @Tom-Elliott has taken the PHP backend and partially integrated it into FOG Trunk. It’s available as a plugin, but it is not fully functional just yet. We still need to create scheduling for it. Tom has already written a way to display everything in the checksum table, and a way to export it for people who want to analyze the results with a third-party app.
I’ll be writing some intelligent code that will analyze the table’s contents and display concerning entries. The analysis will follow some basic principles:
- Makes decisions based on data in the DB.
- A storage group’s files should always match across all nodes in the group, both images and snapins.
- Images shared between storage groups should always match between those groups’ masters.
- If no image upload occurred between the last and current check, images are expected to match across that time period.
- If an image upload does occur, the files are expected to change.
Results of the intelligent analysis should display concerns, following the rules above, and the user should be able to “dismiss” individual file concerns so they don’t show anymore.
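To make the upload rules concrete, here is a rough sketch of how a single file’s two most recent checks might be classified. This is not the plugin’s code. The function name `classify_change`, its argument order, and the result strings are all hypothetical, and times are assumed to be Unix epoch seconds like the fileTime column. Reporting an unchanged file after an upload is my own reading of “expected to change”; some files in an image may legitimately stay identical.

```shell
# Hypothetical helper: classify the difference between two checks of a file.
# Arguments: previous sum, current sum, previous check time, current check
# time, and the time of the last image upload (0 or empty if none).
classify_change() {
    local prev_sum="$1" cur_sum="$2" prev_check="$3" cur_check="$4"
    local last_upload="${5:-0}" uploaded=no

    # An upload counts only if it happened between the two checks.
    if [ "$last_upload" -gt "$prev_check" ] && [ "$last_upload" -le "$cur_check" ]; then
        uploaded=yes
    fi

    if [ "$prev_sum" = "$cur_sum" ]; then
        # Same checksum: fine unless an upload should have changed it.
        if [ "$uploaded" = yes ]; then echo unchanged-after-upload; else echo ok; fi
    elif [ "$uploaded" = yes ]; then
        echo ok                  # changed, and an upload explains it
    else
        echo unexpected-change   # changed with no upload in between
    fi
}
```

For example, `classify_change aaa bbb 100 200 0` reports `unexpected-change`, while the same call with an upload time of 150 reports `ok`.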
The integrity table will have a column that operates similarly to the pending hosts column in the hosts table: blank or zero means unchecked or false (bad or unprocessed), and 1 means good or dismissed.
This column will be administered by the intelligent checking and by the user’s “dismiss” clicks. Once an entry for a file is marked as “good”, either because no problems were detected or because the user dismissed it, that entry is good forever. If it’s blank, it will be marked 0 or 1 accordingly when analyzed.
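The flag behaviour described here could be sketched roughly as follows. The function names `next_flag` and `dismiss_flag` are hypothetical, and the “yes”/“no” problem indicator is an assumption made for the example; the real column and analysis code may look quite different.

```shell
# Hypothetical helper: compute the new flag for an entry after analysis.
# Arguments: current flag (blank, 0, or 1) and whether the analysis
# found a problem ("yes" or "no").
next_flag() {
    local current="${1:-0}" problem="$2"
    if [ "$current" = 1 ]; then
        echo 1        # once good or dismissed, always good
    elif [ "$problem" = yes ]; then
        echo 0        # bad: keep showing it as a concern
    else
        echo 1        # analyzed with no problem detected
    fi
}

# Hypothetical helper: a user "dismiss" click always marks the entry good.
dismiss_flag() {
    echo 1
}
```

So `next_flag "" yes` gives 0, `next_flag "" no` gives 1, and `next_flag 1 yes` stays at 1 because the entry was already marked good.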
I’ll be working on this as I have time.
-