Replication: Only working on 1/5 nodes
-
This has worked for me until I can figure out why the replication service won’t update existing images on nodes. I am also toying around with rsync commands to manually replicate my images out to each node, since I do that anyways for my drivers folder used to driver injection based off this post:
https://forums.fogproject.org/topic/11126/using-fog-postinstall-scripts-for-windows-driver-injection-2017-ed
You can get the replicator to sync your drivers directory if you create a faux image definition in FOG called Drivers and then make it reference the directory /images/drivers; there is no need to run a separate rsync process.
In regards to the FOG image replicator, it has been changed over the last few releases to only support FOG storage nodes. If you have a NAS as a storage node, image replication will keep cycling over and over again: replication will happen, but it will be continuous. If you have this type of environment, setting up a cron job to run the rsync process is a pretty solid approach. Many NAS devices have an rsync service that you can enable to make the replication process a bit quicker and more efficient. I would eventually like to see FOG switch to an rsync process and move away from the custom code to offer a bit more flexibility.
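If you go the cron route, a minimal sketch could look like the following. The host, schedule, and paths are placeholders, and it assumes key-based SSH for the fog account between the master and the node; adjust for your environment.

# /etc/cron.d/fog-image-rsync -- hypothetical nightly sync at 02:00
# -a preserves permissions/ownership/times, --delete mirrors removals,
# --partial lets interrupted multi-GB image transfers resume
0 2 * * * fog rsync -a --delete --partial /images/ fog@storage-node-1.example.com:/images/

Running it under the fog account keeps ownership on the node consistent with what the FOG services expect.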
-
@george1421 Hey George, I thought the faux drivers folder would only work if there were no subdirectories in the drivers folder? Has this been modified to allow your previous setup of drivers/ModelName/x64/.ini files?
Also, I would like to note that I am not using NAS devices but three Dell rack mounted servers that we bought a few months ago. I do have a hardware-level RAID 1, but today I have repeated the test and set up some old OptiPlex 380 machines as test fog servers (as their own master and two nodes, not as part of my actual server and its nodes). With these test fog servers, it’s even more screwy. It says that files don’t match even though the md5sum is the same. I didn’t even get to updating the image. The image replication service is now hanging after the first transfer (on the 380s, at least).
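For reference, the comparison I’m doing looks roughly like this; the host and file name are placeholders for my setup:

# Compare checksums of the same image file on the master and on a node;
# the hostname and image file name below are examples only
md5sum /images/MyImage/d1p1.img
ssh fog@storage-node-1.example.com md5sum /images/MyImage/d1p1.img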
I was trying to find the code to get a better understanding of what’s happening at the actual service level. After some digging I found the imagereplicator.class.php file, which calls a replicateItems function for each image, but I cannot find where replicateItems is defined. Basically, I am trying to find the class that actually runs the lftp command and/or deletes the files, how it verifies whether they are the same, and so on.
You wouldn’t happen to know where these files are, would you?
-
@jflippen The last I knew, the entire content of the /images/drivers directory is replicated. I do have to say we don’t change drivers much, and something may have changed, since all of the drivers are on all storage nodes at the moment. I’ll have to confirm it still works as I thought.
Both the image and the snapin replicator should use the lftp command defined in this file:
fog/lib/service/fogservice.class.php
Double-checking, I don’t see any other instance of lftp.
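If you want to double-check that yourself, something like this works, assuming a default web-root install path (adjust if your fog directory lives elsewhere):

# Search the FOG web code for every reference to lftp;
# /var/www/html/fog is a common install location but may differ
grep -rn "lftp" /var/www/html/fog/lib/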
-
@george1421 Thanks, George. I’ll have to give that a go once I get this replication issue resolved. Right now I’m trying to keep myself from going cross-eyed going over @Tom-Elliott’s PHP. It’s driving me nuts that I can’t figure out:
- Why it says the file is different when sometimes the md5sum matches for the two files
- Why it’s not deleting the file when they don’t match
- When replacing a file with a different md5sum, if I run the lftp command manually (see the sketch below) and insert the correct credentials, I don’t get any errors, but it doesn’t change the existing file (I’m guessing since the deletion is handled by this line: self::$FOGFTP->delete($remotefilescheck[$index]);)
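The manual test I mean is roughly the following; it is not necessarily FOG’s exact invocation, and the host, password, and image name are placeholders:

# Rough equivalent of what I'm testing by hand -- not FOG's exact
# command line; host, password, and image name are placeholders
lftp -u fog,'NODE_FTP_PASSWORD' -e 'mirror -R --verbose /images/MyImage /images/MyImage; quit' storage-node-1.example.com

Worth noting: by default lftp’s mirror skips files whose size and timestamp already match on the far side (there is an --ignore-time option to change that), so a changed file of the same size can be left untouched unless something, like the delete call above, removes it first.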
To be fair, I don’t know PHP, just some other languages. I am just trying to get a better understanding of what’s happening under the hood to see if it will help me resolve my issue.
I feel like there has to be something wrong with my environment; otherwise we should see way more people complaining about this issue, no? However, the fact that I can reproduce the issue on the OptiPlex 380 test fog group means it’s likely not the servers themselves… or I’m doing it wrong after following the directions here:
https://wiki.fogproject.org/wiki/index.php?title=CentOS_7
Idk… I better start sweeping up all the hair I’ve pulled out.
-
@george1421 @Tom-Elliott Is this a logic error in this method?
private static function _filesAreEqual($size_a, $size_b, $file_a, $file_b, $avail)
{
    // Different sizes can never be the same file.
    if ($size_a != $size_b) {
        return false;
    }
    // No remote hash available: md5 both files if under ~1 GB,
    // otherwise fall back to checking that both files exist.
    if (false === $avail) {
        if ($size_a < 1047685760) {
            $remhash = md5_file($file_b);
            $lochash = md5_file($file_a);
            return ($remhash == $lochash);
        }
        return file_exists($file_b) && file_exists($file_a);
    }
    // Remote hash available: compare the local hash against it.
    $hashLoc = self::getHash($file_a);
    $hashRem = $file_b;
    $hashCom = ($hashLoc == $hashRem);
    return $hashCom;
}
It looks like it skips the md5sum for files larger than ~1 GB and uses the result of getHash instead. The method calls getHash for $file_a but not for $file_b, and I don’t see anywhere in the PHP file that computes a hash for $file_b at all. Wouldn’t that mean $hashCom is comparing a hash against something that was never hashed?
-
@jflippen File b is handled separately; its hash is computed on the remote side. This method receives that hash (in $file_b) and uses it.
-
@tom-elliott Okay, thanks. Back to the drawing board I go. Any ideas on why this line:
self::$FOGFTP->delete($remotefilescheck[$index]);
is not deleting the file on the node before the lftp call?
-
@jflippen Each fog server has a local user account called fog with its own password. Have you tried to log into the remote fog server over ftp as the linux user fog, using the password defined in the storage node configuration for that node? I know that was a mouthful, but the basic premise is to ftp to the remote storage node using the appropriate fog account, change to the /images directory, and make sure that you can put and delete files over ftp.
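For example, something along these lines, where the host and password are placeholders for whatever that storage node is configured with:

# Create a throwaway file, then confirm put and delete both work
# against the node's /images share; host and password are placeholders
echo test > /tmp/ftp-test.txt
lftp -u fog,'NODE_FTP_PASSWORD' storage-node-1.example.com <<'EOF'
cd /images
put /tmp/ftp-test.txt
rm ftp-test.txt
bye
EOF

If either the put or the delete fails here, the replicator’s FTP calls will fail the same way.
-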
@george1421 Sorry for the long delay; I’ve been gone on vacation for the last week. I just tried using the fog credentials stored for one of the nodes and was able to FTP in from my Windows workstation using FileZilla. I was able to delete the file manually. So now the question is:
Why isn’t the PHP command that does the FTP deletion for the replicator service working?
-
@dburk Sorry for digging up this old topic. While trying to track down a replication hiccup I found and fixed a couple of issues in the code. All of that will be in the next release, hopefully coming soon. Let me know if you are keen to test those changes beforehand.
Edit: Issues fixed in 1.5.5.