Images suddenly not replicating to storage nodes from Master Node


  • FOS: 1.5.8
    OS: Debian 10

    I captured three new images on Friday evening and none of them replicated to any of my 10 storage nodes. I have had issues in the past where I’ve had to compare checksums and delete the d1p1.img files to re-propagate them, but in this case the entire image directory did not transfer to any of the nodes.

    The images were all set to “replicate” and there is only one storage group. I have rebooted all the machines, and the Master can ping all of the nodes and vice versa.

    Is there any other info I can post to this topic to help everyone understand this problem?
    Has anyone run into this issue before?

    Thanks very much, y’all are great!


  • @sebastian-roth Thank you very much for the response. Sorry for the delay, I was on a time crunch for delivering this image and did not have enough time to continue troubleshooting. I ended up just deleting the image and recapturing it, and it replicated afterwards. Hopefully it was just a fluke. But I know how to check the replication log file now!

    Thanks again for taking the time to respond.

  • Senior Developer

    @danieln Please run the following commands and post output here:

    ls -al "/images/Q1 2021 - Connect Academy TX - HP"
    ls -al "/images/Q1 2021 - Connect Academy TX - Len"
    ls -al /images | grep "2021"
    

  • @sebastian-roth

    The images are Q1 2021 - Connect Academy TX - HP and Q1 2021 - Connect Academy TX - Len

    [03-15-21 1:10:18 pm]  * All files synced for this item.
    [03-15-21 1:10:19 pm]  * Found Image to transfer to 12 nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm] | Replication already running with PID: 22051
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - HP
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Found Image to transfer to 12 nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - Len
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - Len
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - Len
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm] | Replication already running with PID: 28711
    [03-15-21 1:10:19 pm] | Replication already running with PID: 22078
    [03-15-21 1:10:19 pm] | Replication already running with PID: 28714
    [03-15-21 1:10:19 pm] | Replication already running with PID: 28718
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - Len
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm] | Replication already running with PID: 28748
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - Len
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm]  * Not syncing Image between nodes
    [03-15-21 1:10:19 pm]  | Image Name: Q1 2021 - Connect Academy TX - Len
    [03-15-21 1:10:19 pm]  | File or path cannot be reached.
    [03-15-21 1:10:19 pm] | Replication already running with PID: 28758
    [03-15-21 1:10:19 pm] | Replication already running with PID: 28769
    [03-15-21 1:10:19 pm]  * Found Image to transfer to 12 nodes
    

    The log seems to be saying the file or path cannot be reached. I can ping the nodes and the Master. I don’t understand what could possibly be going on. It only affects these images, and it happened for the first time this morning.
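
    To see at a glance how often each image hit this error, the log can be summarized per image name. This is only a rough sketch: it assumes the log keeps the line format shown above (an "Image Name:" line followed by the error line), and the function name `count_unreachable` is made up for illustration.

    ```shell
    # Count "File or path cannot be reached" entries per image name.
    # Assumes each error line follows the "Image Name:" line it belongs to,
    # as in the excerpt above. count_unreachable is a hypothetical helper name.
    # Usage: count_unreachable /opt/fog/log/fogreplicator.log
    count_unreachable() {
        awk '
            # Remember the most recent image name seen in the log.
            /Image Name: / { sub(/^.*Image Name: /, ""); name = $0; next }
            # Attribute each error line to that image.
            /File or path cannot be reached/ { count[name]++ }
            END { for (n in count) printf "%d\t%s\n", count[n], n }
        ' "$1" | sort -k2
    }
    ```

    Each output line is a count, a tab, and the image name. It only tells you which images are affected, not why the path is unreachable.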

  • Senior Developer

    @danieln Using tail -f ... you only get the last 10 lines, plus new information as soon as it is logged. You can use less ... to open the full log file for reading. If you want to share it here in the forums, it is better to use WinSCP to download the whole log file to your working PC and then upload it here.
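
    As a rough illustration of the difference, assuming the log path mentioned in this thread (the helper name `last_lines` and the host name in the comment are made up):

    ```shell
    # Print the last N lines of a file (default 200) instead of tail's default 10.
    # last_lines is a hypothetical helper name used only for this sketch.
    # Usage: last_lines /opt/fog/log/fogreplicator.log 500
    last_lines() {
        tail -n "${2:-200}" "$1"
    }

    # To page through the whole file interactively (press q to quit):
    #   less /opt/fog/log/fogreplicator.log
    #
    # To copy the file from a Linux or macOS PC instead of WinSCP
    # ("fog-master" is a placeholder host name, not from this thread):
    #   scp root@fog-master:/opt/fog/log/fogreplicator.log .
    ```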


  • @sebastian-roth Thank you for your reply!

    Please excuse my ignorance on this one. Would it be tail -f /opt/fog/log/fogreplicator.log to get the full log? It seems like I can only get the head or the tail. Is there a command that would help me do this?

    And yes, each node has about 70%-80% free space!

  • Senior Developer

    @danieln Definitely take a look at the log file in /opt/fog/log/fogreplicator.log on your master node and post the full log here if you need further help with it. Also post the names of the images not being replicated.

    Have you made sure there is enough free space on the storage nodes?
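
    One quick way to check free space is to run something like the following on each storage node. This is a sketch that assumes the images are stored under /images, as on the master; the helper name `used_percent` is made up.

    ```shell
    # Print the percentage of space used on the filesystem that holds a path.
    # Assumption: images live under /images on the node (the default here).
    # used_percent is a hypothetical helper name.
    # Usage: used_percent /images
    used_percent() {
        # df -P prints one header line and one data line; field 5 is "Capacity".
        df -P "${1:-/images}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
    }
    ```

    A result close to 100 means that node has little or no room left for new images.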
