Replication Bandwidth Limiter - not totally working
-
@george1421 I’m all ears.
-
@Jbob Very nice, but this still does not explain why the reported bandwidth was 800 Mbps when the limit was set to 500000 Kbps.
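As a sanity check on those numbers (assuming the limit field is in Kbps, as the post suggests), 500000 Kbps works out to 500 Mbps, or 62.5 MB/s, so a single correctly limited transfer should not be able to reach 800 Mbps on its own:

```shell
# Convert the configured limit (assumed to be in Kbps) to bytes/sec and Mbps.
limit_kbps=500000
bytes_per_sec=$(( limit_kbps * 1000 / 8 ))   # 62500000 B/s
limit_mbps=$(( limit_kbps / 1000 ))          # 500 Mbps
echo "limit: ${limit_mbps} Mbps (${bytes_per_sec} B/s)"
# An observed 800 Mbps exceeds this, so either the limit is not being
# applied to this transfer, or more than one transfer is running.
```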
-
If you really want to go there, use the current svn trunk (if things go sideways, just rerun the FOG installer to restore the files; no harm done).
Edit this file:
/var/www/html/fog/lib/service/FOGService.class.php
Search for the first occurrence of the word fragment: limit.
You should see something like this:
$limitsend = $this->byteconvert($StorageNodeToSend->get('bandwidth'));
if ($limitmain > 0) $limitset = "set net:limit-total-rate 0:$limitmain;";
if ($limitsend > 0) $limitset .= "set net:limit-rate 0:$limitsend;";
$limit[] = $limitset;
        }
    }
    unset($StorageNodeToSend);
    $this->outall(_(' * Starting Sync Actions'));
    foreach ((array)$nodename AS $i => &$name) {
        $process[$name] = popen("lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; ".$limit[$i]." mirror -c -R --ignore-time ".$includeFile[$i]." -vvv --exclude 'dev/' --exclude 'ssl/' --exclude 'CA/' --delete-first ".$myItem[$i].' '.$remItem[$i]."; exit' -u ".$username[$i].','.$password[$i].' '.$ip[$i]." 2>&1","r");
Insert the following line after: $this->outall(_(' * Starting Sync Actions'));
$this->outall(_(' * Speed limiter settings: ') . $limitset);
(Note: PHP does not interpolate variables inside single-quoted strings, so the value is concatenated on rather than embedded in the string.)
That should make it like this:
unset($StorageNodeToSend);
$this->outall(_(' * Starting Sync Actions'));
$this->outall(_(' * Speed limiter settings: ') . $limitset);
foreach ((array)$nodename AS $i => &$name) {
Stop and restart the FOGImageReplicator service. You may need to delete a file on the storage node for it to see a change. When the replication runs, it should write the speed-limit settings to /opt/fog/log/FOGImageReplicator
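To see what those limiter settings turn into on the wire, here is a hedged shell sketch (with made-up byte/sec values standing in for $limitmain and $limitsend) that builds the same lftp invocation, so you can eyeball where net:limit-total-rate and net:limit-rate end up:

```shell
# Hypothetical byte/sec values standing in for the PHP variables above.
limitmain=12500000   # total-rate cap (bytes/sec)
limitsend=6250000    # per-transfer cap (bytes/sec)

# Mirror the PHP logic: only emit a setting when its cap is positive.
limitset=""
[ "$limitmain" -gt 0 ] && limitset="set net:limit-total-rate 0:$limitmain;"
[ "$limitsend" -gt 0 ] && limitset="$limitset set net:limit-rate 0:$limitsend;"

# The mirror arguments are abbreviated here; the real command is built by popen().
cmd="lftp -e 'set ftp:list-options -a; set net:max-retries 10; set net:timeout 30; $limitset mirror -c -R ...; exit'"
echo "$cmd"
```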
I’d do this in my test environment, but it’s only partially rebuilt.
-
I’m more interested in knowing if this is really not limiting, or if it’s because of multiple instances of the replication being started at the same time.
-
@Tom-Elliott said:
I’m more interested in knowing if this is really not limiting, or if it’s because of multiple instances of the replication being started at the same time.
We are using two full server installations, with the MySQL stuff pointed at the master. Could it be related to multiple instances of the FOGImageReplicator running on multiple servers at once?
-
What I mean by the info is that I need to see the commands as they’re being sent.
As far as I knew, the replicator actually did replicate with the proper limiters in place; however, if you’re attempting to replicate multiple images, it does not replicate them sequentially.
There are reasons behind this.
First, if you do it sequentially, the replicator only starts its timer after the last image completes replication. This, by itself, is not entirely bad, but imagine a scenario where you have 15 images that need to replicate to 12 separate nodes (all in the same group).
I’m not going to do the math, but our replicator defaults to a 10-minute cycle, and all of the replication must complete before the cycle’s wait period can even begin. Limiting is fine and understandable, but depending on the image sizes it could take hours, days, or even weeks to finish a single replication cycle.
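To make that concrete with invented numbers (15 images, 12 nodes, an assumed 25 GB per image, all pushed strictly one at a time through a 500 Mbps-limited pipe):

```shell
# Back-of-the-envelope for one fully sequential replication cycle.
images=15
nodes=12
gb_per_image=25                          # assumed image size
rate_bytes=62500000                      # 500 Mbps expressed in bytes/sec
total_bytes=$(( images * nodes * gb_per_image * 1000000000 ))
seconds=$(( total_bytes / rate_bytes ))
hours=$(( seconds / 3600 ))
echo "${hours} hours for one full sequential cycle"   # 20 hours
```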
I now replicate them asynchronously and read the completion state synchronously, so we can get all of the nodes syncing.
Because of this, the bandwidth limiter is kind of a misnomer, I guess, because the limiting is per instance, not overall. I have not figured out HOW to limit the total across all instances, and if I could, I most definitely would.
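Per-instance limiting means the aggregate throughput scales with the number of concurrent lftp processes; a hedged back-of-the-envelope under that assumption:

```shell
# Each lftp instance honors its own limit, so aggregate = limit * instances.
per_instance_mbps=500     # one instance's cap (example value)
instances=2               # e.g. two simultaneous image syncs
aggregate=$(( per_instance_mbps * instances ))
echo "${aggregate} Mbps aggregate ceiling"   # 1000 Mbps
# Two instances at 500 Mbps each could easily account for an observed
# 800 Mbps even though every individual transfer is obeying its limit.
```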
-
FWIW: I have seen that if you stop the FOG image replicator while a transfer is underway, the lftp process will continue. If you stop and restart the FOG image replicator multiple times, you might end up with three or more lftp processes running at the same time, each with its own bandwidth limit and ignorant of the other running processes.
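A minimal demonstration of why that happens, using sleep as a stand-in for lftp and a hypothetical pid-file path: a child process is not killed when its parent exits, so it keeps running (and transferring) on its own:

```shell
# Parent shell starts a long-running child (stand-in for lftp),
# then exits without killing it.
sh -c 'sleep 5 & echo $! > /tmp/orphan_demo.pid'
sleep 1                               # the parent has long since exited
child=$(cat /tmp/orphan_demo.pid)
orphan_alive=0
kill -0 "$child" 2>/dev/null && orphan_alive=1
echo "orphan still running: $orphan_alive"
# The same applies to lftp started via popen(): stopping the service
# without killing its children leaves the transfers running, each with
# its own limit. Check for strays with: pgrep -a lftp
kill "$child" 2>/dev/null
rm -f /tmp/orphan_demo.pid
```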
-
@Wayne-Workman @george1421 You’re correct: it starts its own instances of the items, and it does NOT kill the originally started instances.
I’ve now corrected this and commonized the command starter functions so MulticastManager, Snapin and Image replication will use these methods as well.
This means, under the latest svn, the ImageReplicator, SnapinReplicator, and MulticastManager will now all close their opened commands when the services are stopped/restarted.
Hopefully that should limit the clutter of multiple lftp commands using more and more bandwidth. It does not change the fact that multiple instances still start, for all intents and purposes, asynchronously, and I think this is fine.
Unless you really want the items to replicate one at a time, which would enforce the limiting more strictly, but it would also require that much more time for the data to actually reach the nodes/groups receiving it.
-
@Tom-Elliott In probably 99% of all scenarios, only one image ever needs to be replicated at any given time. After all, who is going to upload two images simultaneously? That would be a rare occurrence, I think.
I’m glad to hear that the code has been cleaned up/improved in the replicator/multicast areas and is more manageable now.
-
I’ve decided to resolve the thread, as the limiter IS in fact working. And you’re right: most people will only be replicating one image at a time.