images are not syncing from master to node


  • Testers

    Hi,
    I have 4 images in total on the master server. However, two of them are not syncing to the other nodes.
    The image was captured via another node and transferred manually to the master server using scp since it wasn’t syncing at all.


  • Developer

    @msi Just to let you know, we have worked on the replication code and fixed a couple of issues. The new release is on the doorstep and should work fairly well.

    We’ll put that into the release notes as well but just to let you know beforehand:
    Nodes being on different versions (1.5.4 vs. 1.5.5) will replicate images over and over again as some of the hashing code needed to be changed. Therefore we advise you to update all nodes in one go! Please make sure you stop replication on the master first (systemctl stop FOGImageReplicator), then update the storage node(s), and update the master node as the last step.
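
    The order described above can be sketched as a dry run that only prints the steps. This is an illustration, not an official procedure: print_update_plan is a hypothetical helper, the host names are placeholders, and the update step is left as plain text because the installer command is not given in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the update order instead of executing anything.
# print_update_plan and the host names below are hypothetical placeholders.

print_update_plan() {
    master="$1"
    shift
    # Step 1: stop replication on the master before touching any node.
    echo "[$master] systemctl stop FOGImageReplicator"
    # Step 2: update every storage node.
    for node in "$@"; do
        echo "[$node] update FOG (storage node)"
    done
    # Step 3: update the master last.
    echo "[$master] update FOG (master node)"
}

print_update_plan fog-master fog-node1 fog-node2
```

    The point of printing rather than running is to check the sequence (master replication stopped first, master updated last) before doing anything on real servers.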


  • Developer

    @msi While trying to figure out the replication hiccup I found and fixed a couple of issues within the code. All that will be in the next release. Hopefully coming soon. Let me know if you are keen to test those changes beforehand.


  • Developer

    @msi George was able to verify that he also got those defunct processes on a CentOS 7 installation. I just did a quick test on an Ubuntu setup and did not see any of those (only tested with small replication test files so far). We need to do more testing @george1421


  • Developer

    @msi Do you still see the same "Replication already running with PID" messages in the log? Please post current logs again.

    Any chance we can do a live debugging session together? Send me a PM if you are available right now.


  • Testers

    @Sebastian-Roth I killed all the defunct processes and restarted the server. Still not syncing…


  • Developer

    @msi Not exactly sure what those <defunct> processes mean. Maybe try rebooting the server once to get rid of those. Or simply try killing those.
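
    For context: on Linux, a <defunct> entry is a zombie, a process that has already exited but whose parent has not yet collected its exit status. Signals sent to the zombie itself have no effect; it goes away once the parent reaps it or the parent exits. A minimal sketch for finding the parent of such a process, assuming /proc is available (parent_of is a hypothetical helper name, not part of FOG):

```shell
#!/bin/sh
# parent_of PID: print the parent PID of a process by reading /proc (Linux only).
# Prints nothing if the process does not exist.
parent_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Usage: pass the PID of a <defunct> process, then inspect the parent, e.g.
#   ps -p "$(parent_of 12955)" -o pid,stat,cmd
```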


  • Testers

    @Sebastian-Roth We have a variety of speeds depending on branch size. Our baseline is 10 MB upload and download speed, which most branches have. Below is the output for lftp:

    [root@fogserver images]# ps aux |grep lftp
    root     12955  0.0  0.0      0     0 ?        Z    08:13   0:00 [lftp] <defunct>
    root     22481  0.0  0.0      0     0 ?        Z    08:21   0:00 [lftp] <defunct>
    root     41111  0.0  0.0      0     0 ?        Z    08:38   0:00 [lftp] <defunct>
    root     42773  0.0  0.0 112708   972 pts/0    S+   08:40   0:00 grep --color=auto lftp
    

  • Developer

    @msi In the log I repeatedly see "Replication already running with PID". Do you have slow links to some of those nodes? Maybe those replications stalled at some point when the connection was lost and just sit there since then. See if you have lftp commands running: ps aux | grep lftp

    The 9th column of the output should show the start time of those processes. Let’s see what we get…
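
    A small filter can pull out just the PID and start time of stalled entries. The sketch below assumes the standard ps aux column layout (PID in column 2, STAT in column 8, START in column 9); list_defunct is a hypothetical helper name.

```shell
#!/bin/sh
# list_defunct: reads `ps aux`-style lines on stdin and prints PID and START
# (columns 2 and 9) for every process whose STAT (column 8) begins with "Z".
list_defunct() {
    awk '$8 ~ /^Z/ { print $2, $9 }'
}

# Usage on a live system; the [l] in the pattern keeps grep from matching itself:
#   ps aux | grep '[l]ftp' | list_defunct
```

    Comparing the printed start times with the timestamps in the replication log should show whether those lftp processes belong to old replication runs.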


  • Testers

    @Sebastian-Roth Replication is checked for all images.
    0_1538657547512_19c2131b-b096-419c-9be7-ddab1fe1b13f-image.png
    0_1538657572096_fogreplicator.log
    I have attached the fogreplicator log.


  • Developer

    @msi Is replication for this specific image enabled (see image definition in the web UI)? If so, we need more information! Can you please post contents of the replication log?


 
