images are not syncing from master to node
I have 4 images in total on the master server. However, it's not syncing the other two images to the other nodes.
The images were captured via another node and transferred manually to the master server using scp, since they weren't syncing at all.
@msi Just to let you know, we have worked on the replication code and fixed a couple of issues. The new release is on the doorstep and should work fairly well.
We’ll put that into the release notes as well but just to let you know beforehand:
Nodes on different versions (1.5.4 vs. 1.5.5) will replicate images over and over again because some of the hashing code needed to be changed. Therefore we advise you to update all nodes in one go! Please make sure you stop replication on the master first (systemctl stop FOGImageReplicator), then update the storage node(s), and update the master node as the last step.
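The order described above can be written down as a small dry-run helper. This is only a sketch: print_update_plan is a hypothetical name, the host names are placeholders, and "update FOG" stands for whatever update procedure you normally use on each machine. It prints the steps and runs nothing.

```shell
# print_update_plan MASTER NODE...
# Hypothetical dry-run helper: prints the update order from the post above.
# It does not connect to any host or run any command.
print_update_plan() {
  master=$1
  shift
  # Step 1: stop replication on the master (command taken from the post).
  echo "$master: systemctl stop FOGImageReplicator"
  # Step 2: update every storage node.
  for node in "$@"; do
    echo "$node: update FOG"
  done
  # Step 3: update the master last.
  echo "$master: update FOG"
}

# Example (placeholder host names):
#   print_update_plan fogserver node1 node2
```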
@msi While trying to figure out the replication hiccup I found and fixed a couple of issues within the code. All of that will be in the next release, hopefully coming soon. Let me know if you are keen to test those changes beforehand.
@msi George was able to verify that he also got those defunct processes on a CentOS 7 installation. I just did a quick test on an Ubuntu setup and did not see any of those (only tested with small replication test files so far). We need to do more testing @george1421
@msi Do you still see the same "Replication already running with PID" messages in the log? Please post current logs again.
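If the log names a PID, one way to see what that process is doing is to read its state from /proc. A minimal sketch, assuming a Linux host and that you copy the PID from the log by hand (the exact log line format is not assumed here); proc_state is a hypothetical helper name.

```shell
# proc_state PID
# Hypothetical helper: prints the one-letter kernel state of PID
# (for example S = sleeping, R = running, Z = zombie, shown by ps as <defunct>).
# Prints nothing if the PID does not exist. Linux /proc only.
proc_state() {
  awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Example, with the PID taken from the replication log:
#   proc_state 12955
```

An empty result means the process is gone; a Z means it has exited but has not been cleaned up by its parent yet.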
Any chance we can do a live debugging session together? Send me a PM if you are available right now.
@Sebastian-Roth I killed all the defunct processes and restarted the server. Still not syncing…
@msi Not exactly sure what those <defunct> processes mean. Maybe try rebooting the server once to get rid of those. Or simply try killing those.
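For background: a <defunct> (zombie) entry is a process that has already exited and is waiting for its parent to collect its exit status, so sending it a signal has no effect. It disappears once the parent reaps it or the parent itself exits. A small sketch for finding the parents, assuming the procps ps on Linux; zombie_parents is a hypothetical helper name.

```shell
# zombie_parents
# Hypothetical helper: reads `ps -eo pid,ppid,stat,comm` output on stdin and
# prints "PID PPID" for every process whose STAT starts with Z (zombie).
zombie_parents() {
  awk '$3 ~ /^Z/ { print $1, $2 }'
}

# Example:
#   ps -eo pid,ppid,stat,comm | zombie_parents
```

The second number on each line is the parent to look at, which is likely the process that started the lftp transfer.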
@Sebastian-Roth We have a variety of speeds depending on branch size. Our baseline is 10 MB upload and download speed, which most branches meet. Below is the output for lftp:
[root@fogserver images]# ps aux |grep lftp
root 12955 0.0 0.0 0 0 ? Z 08:13 0:00 [lftp] <defunct>
root 22481 0.0 0.0 0 0 ? Z 08:21 0:00 [lftp] <defunct>
root 41111 0.0 0.0 0 0 ? Z 08:38 0:00 [lftp] <defunct>
root 42773 0.0 0.0 112708 972 pts/0 S+ 08:40 0:00 grep --color=auto lftp
@msi In the log I repeatedly see "Replication already running with PID". Do you have slow links to some of those nodes? Maybe those replications stalled at some point when the connection was lost and have just been sitting there since then. See if you have lftp commands running…
ps aux | grep lftp
The 9th column of the output should show the start time of those processes. Let’s see what we get…
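To pull just the PID and start time of stalled entries out of that output, a filter along these lines may help. It is a sketch, assuming the standard ps aux column layout (STAT in column 8, START in column 9); list_defunct is a hypothetical helper name.

```shell
# list_defunct
# Hypothetical helper: reads `ps aux` output on stdin and prints "PID START"
# for every line whose STAT column (8th) begins with Z, i.e. <defunct> entries.
list_defunct() {
  awk '$8 ~ /^Z/ { print $2, $9 }'
}

# Example:
#   ps aux | grep lftp | list_defunct
```

The grep process itself is skipped because its state is not Z.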
@msi Is replication for this specific image enabled (see image definition in the web UI)? If so, we need more information! Can you please post contents of the replication log?