UNSOLVED Image replication: MD5 checksums do not match
We have multiple FOG servers across multiple locations (all running version 1.5.5).
Yesterday we captured an image to the master node of the storage group at location A.
This image is replicated to location B (which contains 2 storage nodes). The image consists of the files d1p1.img, d1p2.img, d1p3.img, d1p4.img and d1p5.img. All files have exactly the same file size on all storage nodes, but on location B - storage node 2 the files d1p4.img and d1p5.img do not match the checksum of the originals.
As a result, imaging from location B - StorageNode 2 is failing.
So long story short:
Image on Location A:
MasterNode -> original image
StorageNode -> md5sum of all files matches the original
Same image on Location B:
MasterNode -> md5sum of all files matches the original
StorageNode -> md5sum of d1p4.img and d1p5.img does not match the original image files
The log files are not showing any errors, so maybe it would be a nice feature for future releases to verify the replicated files against the master.
I have faced this problem before, and ended up manually copying the corrupted files from the image MasterNode. That works fine, but does anybody have a clue what could be causing this problem?
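The manual comparison described above can be sketched in Python. This is only a minimal illustration, not something FOG ships: the function names (md5_of_file, checksum_manifest, find_mismatches) are hypothetical, and it assumes each image is a flat directory of files such as d1p4.img. Files are hashed in chunks so that a partition image of tens of gigabytes does not have to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_manifest(image_dir):
    """Map each regular file name in image_dir to its MD5 digest."""
    return {
        entry.name: md5_of_file(entry)
        for entry in sorted(Path(image_dir).iterdir())
        if entry.is_file()
    }


def find_mismatches(reference, candidate):
    """List file names whose digest differs from, or is missing in, candidate."""
    return sorted(
        name
        for name, expected in reference.items()
        if candidate.get(name) != expected
    )
```

Because the master and the storage nodes are separate machines, checksum_manifest would be run once on each of them (for example on /images/LY-HP-LAP) and the resulting dictionaries compared with find_mismatches, using the master's manifest as the reference.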
@Franklyn I am not sure I get the point of verifying the images right after the transfer, as image replication is a constant loop that checks all the image files and re-transfers them if size and/or checksum don’t match.
Which versions of FOG do you run on the master and the storage nodes? Are they definitely all 1.5.5?
Thanks for your response. I’ve only found the following in the transfer logs on the master StorageNode at location B:
2019-04-09 12:10:59 /images/LY-HP-LAP/d1p1.img -> ftp://fog@fogSN2-locB/%2Fimages/LY-HP-LAP/d1p1.img 0-165286 807.5 KiB/s
2019-04-09 12:10:59 /images/LY-HP-LAP/d1p3.img -> ftp://fog@fogSN2-locB/%2Fimages/LY-HP-LAP/d1p3.img 0-254129 1.17 MiB/s
2019-04-09 12:10:59 /images/LY-HP-LAP/d1.mbr -> ftp://fog@fogSN2-locB/%2Fimages/LY-HP-LAP/d1.mbr 0-1048576 3.66 MiB/s
2019-04-09 12:11:03 /images/LY-HP-LAP/d1p2.img -> ftp://fog@fogSN2-locB/%2Fimages/LY-HP-LAP/d1p2.img 0-13429217 3.20 MiB/s
2019-04-09 12:11:51 /images/LY-HP-LAP/d1p5.img -> ftp://fog@fogSN2-locB/%2Fimages/LY-HP-LAP/d1p5.img 0-634614839 11.49 MiB/s
2019-04-09 12:26:52 /images/LY-HP-LAP/d1p4.img -> ftp://fog@fogSN2-locB/%2Fimages/LY-HP-LAP/d1p4.img 0-44472049358 44.47 MiB/s
So no errors whatsoever. Any clues? The StorageNodes at location B are connected over a 1 Gbps LAN, on the same subnet.
I understand FOG just uses the lftp command, so this is more likely an FTP protocol problem than a FOG problem, but to avoid strange problems like this it would be a nice feature to verify the replicated images.
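For anyone who wants to cross-check the transfer log lines quoted above against the files on disk, here is a rough Python sketch that parses one such line. The regular expression is derived only from the six lines shown in this thread, so other log variants may not match, and the function name parse_transfer_line is hypothetical. It also assumes that the second number in a range like 0-165286 is the end offset of the transfer, which I have not verified against the lftp documentation.

```python
import re

# Pattern based solely on the log lines quoted in this thread:
# "<date> <time> <source> -> <target> <start>-<end> <rate> <unit>/s"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<source>\S+) -> (?P<target>\S+) "
    r"(?P<start>\d+)-(?P<end>\d+) "
    r"(?P<rate>[\d.]+ \S+)$"
)


def parse_transfer_line(line):
    """Return the fields of one transfer log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the byte range to integers so it can be compared with file sizes.
    fields["start"] = int(fields["start"])
    fields["end"] = int(fields["end"])
    return fields
```

The end value could then be compared with os.path.getsize() of the source file to spot truncated transfers. Note that this would not have caught the problem in this thread, since the file sizes already match and only the checksums differ.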
@Franklyn It’s hard to answer as we don’t know anything about your network structure and the link used between the nodes. FOG uses FTP (the lftp command) to sync the actual image files from the master to the storage nodes. You should find file transfer logs in the log directory (/opt/fog/log). See if you can find any valuable information in those.