Images suddenly not replicating to storage nodes from Master Node
-
FOS: 1.5.8
OS: Debian 10
I captured three new images on Friday evening and none of them replicated to any of my 10 storage nodes. I have had issues in the past where I’ve had to compare checksums and delete the
d1p1.img
files to re-propagate them, but in this case, the entire image directory did not transfer to any of the nodes. The images were all set to “replicate” and there is only one storage group. I have rebooted all the machines, and the Master can ping all of the nodes and vice versa.
Is there any other info I can post to this topic to help everyone understand this problem?
Has anyone run into this issue before?
Thanks very much, y’all are great!
-
@danieln Definitely take a look at the log file in
/opt/fog/log/fogreplicator.log
on your master node and post the full log here if you need further help with it. Also post the names of the images not being replicated. You have made sure there is enough free space on the storage nodes?!
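Both checks can be scripted. A minimal sketch, assuming a shell on the master node for the log and on each storage node for the free-space check: `log_excerpt` and `check_free_space` are hypothetical helper names, `/images` is assumed to be where the images live, and the 10% default threshold is arbitrary.

```shell
# Hypothetical helpers for this thread, not part of FOG itself.

# Print each log line naming the given image, plus the line after it
# (in the log excerpts posted in this thread, that is the outcome for a node).
log_excerpt() {
    local log_file="$1" image_name="$2"
    # -F: match the name literally (image names contain spaces and dashes)
    # -A 1: also print one line of trailing context after each match
    grep -F -A 1 -e "Image Name: $image_name" "$log_file"
}

# Print the free space on the filesystem holding DIR (default /images) and
# return 1 if it is below MIN_PERCENT (default 10), 2 if df gives no answer.
check_free_space() {
    local dir="${1:-/images}" min="${2:-10}" used free
    # -P: POSIX output format; line 2, column 5 is "Capacity", the percent used.
    used=$(df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ -z "$used" ]; then
        echo "could not read disk usage for $dir" >&2
        return 2
    fi
    free=$((100 - used))
    printf '%s: %s%% free\n' "$dir" "$free"
    [ "$free" -ge "$min" ]
}
```

For example, `log_excerpt /opt/fog/log/fogreplicator.log "<image name>"` on the master, and `check_free_space /images 10` on each storage node.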
-
@sebastian-roth Thank you for your reply!
Please excuse my ignorance on this one. Would it be
tail -f /opt/fog/log/fogreplicator.log
to get the full log? It seems like I can only get the head or the tail. Is there a command that would help me do this? And yes, each node has about 70%-80% free space!
-
@danieln Using
tail -f ...
you only get the last 10 lines, plus new entries as soon as they are logged. You can use
less ...
to open the full log file for reading. If you want to share it here in the forums, you had better use WinSCP to copy the whole log file to your working PC and then upload it here.
-
The images are Q1 2021 - Connect Academy TX - HP and Q1 2021 - Connect Academy TX - Len.
[03-15-21 1:10:18 pm] * All files synced for this item.
[03-15-21 1:10:19 pm] * Found Image to transfer to 12 nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] | Replication already running with PID: 22051
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - HP
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Found Image to transfer to 12 nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - Len
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - Len
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - Len
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] | Replication already running with PID: 28711
[03-15-21 1:10:19 pm] | Replication already running with PID: 22078
[03-15-21 1:10:19 pm] | Replication already running with PID: 28714
[03-15-21 1:10:19 pm] | Replication already running with PID: 28718
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - Len
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] | Replication already running with PID: 28748
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - Len
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] * Not syncing Image between nodes
[03-15-21 1:10:19 pm] | Image Name: Q1 2021 - Connect Academy TX - Len
[03-15-21 1:10:19 pm] | File or path cannot be reached.
[03-15-21 1:10:19 pm] | Replication already running with PID: 28758
[03-15-21 1:10:19 pm] | Replication already running with PID: 28769
[03-15-21 1:10:19 pm] * Found Image to transfer to 12 nodes
It seems to be saying the file or path cannot be reached, but I can ping the nodes and the Master. I don’t understand what could possibly be going on. It only happened with these images, and it happened for the first time this morning.
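With two images and twelve nodes, the repeated entries are hard to read at a glance. A minimal sketch, assuming the log has one entry per line, each prefixed with a timestamp as in the excerpt above: the hypothetical `summarize_unreachable` function counts, per image name, the "File or path cannot be reached." and "Replication already running" entries. It attributes each entry to the most recent "| Image Name:" line before it, which is an assumption about the log's ordering, not documented FOG behaviour.

```shell
# Hypothetical helper for this thread, not part of FOG itself.
# For each image name in a replicator log, count "File or path cannot be
# reached." entries and "Replication already running" entries, attributing
# each one to the most recent "| Image Name:" line above it.
summarize_unreachable() {
    awk '
        index($0, "| Image Name: ") {
            name = substr($0, index($0, "| Image Name: ") + length("| Image Name: "))
            sub(/[ \t\r]+$/, "", name)   # drop trailing blanks or a CR
            seen[name] = 1
        }
        index($0, "| File or path cannot be reached.") && name != "" {
            unreachable[name]++
        }
        index($0, "| Replication already running with PID:") && name != "" {
            running[name]++
        }
        END {
            for (n in seen)
                printf "%s: %d unreachable, %d already running\n", n, unreachable[n] + 0, running[n] + 0
        }
    ' "$1" | sort
}
```

Run as `summarize_unreachable /opt/fog/log/fogreplicator.log`. Note that it counts every run recorded in the file, so older entries are included; filter the log by date first if needed. On the excerpt above it should report 11 unreachable and 1 already running for the HP image, and 5 unreachable and 7 already running for the Len image, which adds up to the 12 nodes in each "Found Image to transfer" line.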
-
@danieln Please run the following commands and post output here:
ls -al "/images/Q1 2021 - Connect Academy TX - HP"
ls -al "/images/Q1 2021 - Connect Academy TX - Len"
ls -al /images | grep "2021"
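The first two commands can also be wrapped in a loop that flags missing or empty directories. A minimal sketch: `check_image_dirs` is a hypothetical helper, each directory is assumed to be named after its image (as in the commands above; if an image is stored under a different directory name, pass that instead), and `IMAGES_DIR` is an illustrative override for testing, not a FOG setting.

```shell
# Hypothetical helper for this thread, not part of FOG itself.
# For each name given, report whether $IMAGES_DIR/<name> exists and has files.
# IMAGES_DIR is an illustrative override; it defaults to /images.
check_image_dirs() {
    local base="${IMAGES_DIR:-/images}" status=0 name dir
    for name in "$@"; do
        dir="$base/$name"
        if [ ! -d "$dir" ]; then
            printf 'MISSING %s\n' "$dir"
            status=1
        elif [ -z "$(ls -A "$dir")" ]; then
            printf 'EMPTY   %s\n' "$dir"
            status=1
        else
            printf 'OK      %s\n' "$dir"
            # List the image files with their sizes, as ls -al does above
            ls -al "$dir"
        fi
    done
    return "$status"
}
```

For example: `check_image_dirs "Q1 2021 - Connect Academy TX - HP" "Q1 2021 - Connect Academy TX - Len"`. The non-zero return status makes it easy to use in a script or cron job.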
-
@sebastian-roth Thank you very much for the response. Sorry for the delay; I was on a time crunch to deliver this image and did not have enough time to continue troubleshooting. I ended up just deleting the image and recapturing it, and it replicated afterwards. Hopefully it was just a fluke. But I know how to check the replication log file now!
Thanks again for taking the time to respond.