master node sending data to storage node.
-
We have a 100 Mbit/s dedicated link between the two sites. I’ll send you the links tomorrow when I’m back in the office.
-
@Sebastian-Roth I was looking at that log and there are some images that do not need to be replicated to the other node. They are not needed at that site, as the systems are not located there. Can we stop replicating those images and delete them from the storage node, or is it all or nothing?
-
@mwolfe60 said in master node sending data to storage node.:
Can we stop replicating those images and delete them from the storage node, or is it all or nothing?
Open the image settings in the web UI and un-check the option Replicate…
I had a look at the new logs you sent me and I still can’t see there being an issue. Possibly you just restarted the FOG replication service too early, before it finished transferring the huge file? From my point of view the replication algorithm seems to work. Here is an example:
...
[04-30-19 11:48:01 am]  * Found Image to transfer to 1 node
[04-30-19 11:48:01 am]  | Image Name: Automation_TB_CF54
...
[04-30-19 11:48:10 am]  # Automation_TB_CF54: File size mismatch - d1p2.img: 48334490621 != 1202149362
[04-30-19 11:48:10 am]  # Automation_TB_CF54: Deleting remote file d1p2.img
[04-30-19 11:48:10 am]  * Starting Sync Actions
...
[04-30-19 11:48:10 am]  | Started sync for Image Automation_TB_CF54 - Resource id #1105
...
[04-30-19 12:06:13 pm]  | Image Name: Automation_TB_CF54
[04-30-19 12:06:13 pm]  | Replication already running with PID: 2651
...
So it finds that d1p2.img is not the same size on the storage node, deletes it and starts replicating it again. This is at 11:48. Then it checks all the other images, and on the next loop (at 12:06), when it comes back to that image, it tells us the sync of that file is still going on and moves on.

Doing some quick maths here: transferring a 45 GB file over a 100 Mbit/s link will take at least 1 hour, 07 minutes and 35 seconds. And that calculation is most certainly still a fair way off. I’d expect it to take one and a half to three hours depending on the other traffic on that link. So you just need to be patient!
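For reference, here is the same back-of-envelope calculation done with the exact byte size from the log. This is a best-case floor that assumes the full 100 Mbit/s is available with zero protocol overhead; real transfers will be slower, as noted above:

```python
# Best-case transfer time for the 48,334,490,621-byte d1p2.img from the
# log, over a dedicated 100 Mbit/s link. Ignores protocol overhead and
# competing traffic, so the real transfer will take longer.
file_bytes = 48_334_490_621
link_bits_per_sec = 100_000_000  # 100 Mbit/s

seconds = file_bytes * 8 / link_bits_per_sec
hours, rem = divmod(int(seconds), 3600)
minutes, secs = divmod(rem, 60)
print(f"{hours}h {minutes:02d}m {secs:02d}s")  # best case: 1h 04m 26s
```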
If the link drops somewhere in between, it needs to start over again. I know this sounds bad, but implementing a resumable sync that can pick up a partially transferred file is way more advanced. Not something we can implement on short notice.
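The per-image decision described above can be sketched as a small function. This is a simplified illustration only; the function and parameter names are hypothetical and this is not FOG’s actual code:

```python
def check_image(name, local_size, remote_size, running_pids):
    """One pass of a size-based replication check (simplified sketch;
    names are hypothetical, not FOG's actual implementation)."""
    # A transfer started on a previous pass is still running: skip it,
    # like the "Replication already running with PID" line at 12:06.
    if name in running_pids:
        return "skipped"
    # A size mismatch means the remote copy is partial or stale. With no
    # resumable sync, the only option is to delete it and start over.
    if remote_size != local_size:
        return "restart"
    return "in-sync"

# Mirrors the log: mismatched at 11:48, still transferring at 12:06.
print(check_image("Automation_TB_CF54", 48334490621, 1202149362, {}))
# -> restart
print(check_image("Automation_TB_CF54", 48334490621, 1202149362,
                  {"Automation_TB_CF54": 2651}))
# -> skipped
```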
-
@Sebastian-Roth Thanks for taking a look. I lowered the maximum bandwidth that replication can use so it would not crush our link, and let it run all night. The images have replicated and the traffic has dropped off. Again, thanks for the help. I think everything is working now.