Replication to storage nodes not working - Trunk version 4487
-
I only recently started displaying the command for the replicator services, as a way for me to make sure the command was starting properly and to give people the exact command line in case they want to debug a replication issue. That said, for security reasons I will probably remove that element shortly. The password is always going to be plain text, though: FTP has some security, but the username and password are normally sent "in the clear", because the protocol was developed at a time when security was not a consideration.
All that said, the fog (FTP) password is not the same as the storage node MySQL password under FOG Configuration > FOG Settings. That one is the password the other storage nodes use to connect to the master's database. The MySQL password you referenced is found in one file: Config.class.php, unless you opted to use a different username/password to connect to the MySQL server. The username/password pair used for FTP/lftp belongs to the node receiving the file or files and is stored with that storage node's definition. No config file is used for it.
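To illustrate the point that the FTP credentials come from the receiving storage node rather than from Config.class.php, here is a minimal Python sketch. The `StorageNode` record, its field names, and `build_mirror_argv`/`masked` are hypothetical names for illustration, not FOG's actual code; the lftp settings are copied from the replicator log posted further down this thread. It also shows one way to log the command with the password masked.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class StorageNode:
    """Hypothetical record for the *receiving* storage node (field names assumed)."""
    ip: str
    ftp_user: str
    ftp_pass: str


def build_mirror_argv(src: str, dest: str, node: StorageNode) -> List[str]:
    """Build the lftp argv that pushes src to dest on the receiving node.

    The lftp settings match the CMD lines in the replicator log; the
    credentials come from the receiving node's record, not a config file.
    """
    script = (
        "set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; "
        "mirror -c -R --ignore-time -vvv "
        "--exclude dev/ --exclude ssl/ --exclude CA/ "
        f"--delete-first {src} {dest}; exit"
    )
    return ["lftp", "-e", script, "-u", f"{node.ftp_user},{node.ftp_pass}", node.ip]


def masked(argv: List[str]) -> str:
    """Render argv as a shell-quoted log line with the password after -u hidden."""
    shown = []
    for i, arg in enumerate(argv):
        if i > 0 and argv[i - 1] == "-u":
            user = arg.split(",", 1)[0]
            arg = f"{user},********"
        shown.append(arg)
    return shlex.join(shown)


if __name__ == "__main__":
    # Placeholder credentials, not real ones
    node = StorageNode(ip="10.65.2.20", ftp_user="fog", ftp_pass="example-pass")
    print(masked(build_mirror_argv("/images/9020admin", "/images/9020admin", node)))
```

Passing the argument list directly (for example to `subprocess.run`) also sidesteps the nested single quotes in the logged command, where `'dev/'` actually closes and reopens the outer quoted string.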
-
I’m on SVN Revision 4502 cloud 5662 running CentOS 7
I'm seeing the same thing at my site. Firewall and SELinux are off on both the master and non-master nodes.
The Master Node and non-master node are in the same storage group. Passwords are set correctly, and I’ve reset them manually too.
I can FTP into the remote node fine using the password that shows in the logs. Permissions on /images are fine.
When I manually execute the commands in the logs, nothing happens. No errors, no spike in bandwidth, nothing.
Here are the logs:
[12-07-15 11:12:55 am] [FOG ASCII-art logo]
###########################################
#     Free Computer Imaging Solution      #
#     Credits:                            #
#     http://fogproject.org/credits       #
#     GNU GPL Version 3                   #
###########################################
[12-07-15 11:12:55 am] Interface Ready with IP Address: 10.51.1.53
[12-07-15 11:12:55 am] Interface Ready with IP Address: acfog.OMITTED.k12.mo.us
[12-07-15 11:12:55 am] * Starting ImageReplicator Service
[12-07-15 11:12:55 am] * Checking for new items every 600 seconds
[12-07-15 11:12:55 am] * Starting service loop
[12-07-15 11:12:55 am] * Starting Image Replication.
[12-07-15 11:12:55 am] * We are group ID: #1
[12-07-15 11:12:55 am] | We are group name: AC-Storage-Group
[12-07-15 11:12:55 am] * We have node ID: #1
[12-07-15 11:12:55 am] | We are node name: AC-Master
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: 6073admin
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: 7010admin
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: 7303admin
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: 8808admin
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: 9020admin
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: dell9020instuctional
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Not syncing Image between group(s)
[12-07-15 11:12:55 am] | Image Name: tecraa10s3501
[12-07-15 11:12:55 am] | I am the only member
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: 6073admin
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/6073admin /images/6073admin; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image 6073admin
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: 7010admin
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/dell7010admin /images/dell7010admin; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image 7010admin
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: 7303admin
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/7303admin /images/7303admin; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image 7303admin
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: 8808admin
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/8808admin /images/8808admin; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image 8808admin
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: 9020admin
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/9020admin /images/9020admin; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image 9020admin
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: dell9020instuctional
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/Dell9020BaseImageOct2015 /images/Dell9020BaseImageOct2015; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image dell9020instuctional
[12-07-15 11:12:55 am] * Found Image to transfer to 2 node(s)
[12-07-15 11:12:55 am] | Image name: tecraa10s3501
[12-07-15 11:12:55 am] * Starting Sync Actions
[12-07-15 11:12:55 am] | CMD: lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c -R --ignore-time -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first /images/tecraa10s3501 /images/tecraa10s3501; exit' -u fog,OMITTED 10.65.2.20
[12-07-15 11:12:55 am] * Started sync for Image tecraa10s3501
-
-
@Wayne-Workman Are the AC-Master and Annex nodes on the same server? If so, are they pointing at the same image location?
-
@Tom-Elliott No, they are two geographically separated nodes. They previously had the Location plugin set up for them, but when we started having issues with replication, I uninstalled the Location plugin to eliminate variables.
Both are FULL server installations, but the Annex node has its /opt/fog/.fogsettings set to:
snmysqluser="fog"
snmysqlpass='OMITTED'
snmysqlhost="10.51.1.53"
Replication worked fine on our previous version - we updated to get some bug fixes and now we have this issue.
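One way to confirm which database account and host a storage node will actually use is to read the sn* keys out of /opt/fog/.fogsettings. Below is a minimal Python sketch, assuming the file consists of simple shell-style key=value lines like the excerpt above; `parse_fogsettings` is a hypothetical helper, not part of FOG.

```python
import shlex
from typing import Dict


def parse_fogsettings(text: str) -> Dict[str, str]:
    """Parse shell-style key=value lines, e.g. snmysqlhost="10.51.1.53".

    Hypothetical helper: assumes one assignment per line, no variable
    expansion and no multi-line values.
    """
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes surrounding single or double quotes the way a shell would
        tokens = shlex.split(value)
        settings[key.strip()] = tokens[0] if tokens else ""
    return settings
```

For example, `parse_fogsettings(open("/opt/fog/.fogsettings").read()).get("snmysqluser")` on the Annex node would show whether it is still using `fog` or has been switched to `fogstorage`.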
-
@Wayne-Workman And was the MySQL user fog actually set up for your database environment? Can you try using the fogstorage user as defined on the master node?
-
@Wayne-Workman Also, is it possible the Annex node already has the files in question?
-
@Tom-Elliott I set up the fog user manually a while back. I'll switch it to the fogstorage user so it's more standard.
I should have started with the backstory; I always seem to bring it up later. Anyway.
This morning, our image builder in the building where the AC-Master node is located uploaded a new image. We had previously had issues restoring the image in the building where the Annex node is, though I wasn't physically there to see any of it.
Once I found out she had uploaded a new image, I grabbed a copy of it via the CLI on my building's FOG server: I mounted the remote /images directory on a temporary directory over NFS, did a recursive copy, and then unmounted it. I then created the image definition to match the one in their FOG database.
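The copy step described above can be sketched in Python as follows. This assumes the remote /images is already mounted over NFS (mounting and unmounting need root and are done separately); `copy_image` and its parameters are hypothetical names, and the paths are placeholders.

```python
import shutil
from pathlib import Path


def copy_image(mount_point: str, images_root: str, image_dir: str) -> Path:
    """Recursively copy one image directory from an already-mounted remote
    /images (e.g. an NFS mount) into the local images root.

    Hypothetical helper; the NFS mount is assumed to exist before the call.
    """
    src = Path(mount_point) / image_dir
    dst = Path(images_root) / image_dir
    if not src.is_dir():
        raise FileNotFoundError(f"no image directory at {src}")
    # copytree raises FileExistsError if dst already exists, so an old copy
    # has to be removed explicitly rather than being silently merged
    shutil.copytree(src, dst)
    return dst
```

Refusing to overwrite an existing destination is a deliberate choice here: it forces the old copy to be deleted first, which is what was done on the Annex node.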
I was able to restore the image to the right hardware model (an OptiPlex 9020) with no problems.
The people at the Annex could not. I compared the file sizes for the 9020admin image on both the master and non-master nodes, and they were identical. That's strange, but maybe it will help you figure out what's going on.
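Identical file sizes don't prove identical contents: a partition image can differ, or be damaged, without changing length. Below is a minimal Python sketch that compares both size and SHA-256 digest for every file under two image directories. `file_digests` and `differing_files` are hypothetical helpers; to compare two separate servers you would point one side at an NFS mount of the other node's /images, as was done above.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def file_digests(root: str) -> Dict[str, Tuple[int, str]]:
    """Map each file path (relative to root) to (size, sha256 hex digest)."""
    base = Path(root)
    result: Dict[str, Tuple[int, str]] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large partition images fit in memory
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            result[str(path.relative_to(base))] = (path.stat().st_size, h.hexdigest())
    return result


def differing_files(a: str, b: str) -> List[str]:
    """Files missing on one side, or present on both with different size/content."""
    da, db = file_digests(a), file_digests(b)
    return [name for name in sorted(set(da) | set(db)) if da.get(name) != db.get(name)]
```

If `differing_files` returns a name even though the sizes match, the copies really are different on disk.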
I then walked them through deleting the 9020admin directory on the Annex node via the CLI and then copying it manually over NFS, as I had done.
After that we were able to deploy the image from the Annex, but the Location plugin was uninstalled at that point, so it might have been pulling from AC-Master. I don't know.
-
I just redid the Location plugin setup, and this time I enabled the TFTP checkbox on both locations.
-
If the file sizes are the same, that would explain why the processes were in defunct status: the commands run but have no work to perform. I believe the defunct processes you're seeing are simply because of this. If you're daring, you could delete one of the images from the Annex node and restart the replicator on the master, then check your bandwidth to see whether anything is transferring.
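The replicator runs `mirror -c -R --ignore-time`. According to the lftp manual, `--ignore-time` tells mirror to ignore modification times when deciding whether to transfer a file, so a file that already exists on the target with the same size is skipped. The function below is a deliberately simplified model of that decision, not lftp's code: it ignores excludes, `--delete-first`, and how `-c` continues partial files. `files_to_upload` is a hypothetical name.

```python
from typing import Dict, List


def files_to_upload(local: Dict[str, int], remote: Dict[str, int]) -> List[str]:
    """Simplified model (assumption) of a reverse mirror with --ignore-time:
    a file is sent only if it is missing remotely or its size differs."""
    return sorted(name for name, size in local.items() if remote.get(name) != size)
```

Under this model, matching sizes on both nodes leave nothing to send, so the command exits without any bandwidth spike. It also means a same-size but different file on the Annex would not be replaced, which is why deleting an image from the Annex before restarting the replicator is a useful test.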