FOG Replication strange behavior

processor · Jun 21, 2024, 4:19 AM

First, let me set the scene:

I have 2 FOG servers: one on the main site and one on a branch site.
Both run Ubuntu 22.04.
Each FOG server's storage node is the master on its own site.
On the main site, the branch storage node has been added to the default group, but not as master. This way we should have only one-way replication.

I can see both storage nodes on the main site FOG server dashboard.

This strange behaviour appears:
On the main site FOG server (which also holds the master storage node) I detected massive bandwidth utilisation, both input and output.
The output does not surprise me, as the master should replicate any images that are not already synced to the branch site.
The input, however, I don’t really understand, as the master should not get anything from the branch other than the image list.

So I tailed the replicator log, and this is what I got:

[06-21-24 9:00:34 am]  * Starting Image Replication.
[06-21-24 9:00:34 am]  * We are group ID: 1. We are group name: default
[06-21-24 9:00:34 am]  * We are node ID: 1. We are node name: DefaultMember
[06-21-24 9:00:34 am]  * Attempting to perform Group -> Group image replication.
[06-21-24 9:00:34 am]  | Replicating postdownloadscripts
[06-21-24 9:00:34 am]  * Found Image to transfer to 1 node
[06-21-24 9:00:34 am]  | File Name: postdownloadscripts
[06-21-24 9:00:35 am]   # postdownloadscripts: No need to sync fog.postdownload (FOG-01)
[06-21-24 9:00:36 am]  * All files synced for this item.
[06-21-24 9:00:36 am]  | Replicating postinitscripts
[06-21-24 9:00:36 am]  * Found Image to transfer to 1 node
[06-21-24 9:00:36 am]  | File Name: dev/postinitscripts
[06-21-24 9:00:37 am]   # dev/postinitscripts: No need to sync fog.postinit (FOG-01)
[06-21-24 9:00:37 am]  * All files synced for this item.
[06-21-24 9:00:37 am]  * Not syncing Image between groups
[06-21-24 9:00:37 am]  | Image Name: Android Studio
[06-21-24 9:00:37 am]  | There are no other members to sync to.
[06-21-24 9:00:37 am]  * Not syncing Image between groups
....
and it does this for all the images we host on the master node
....
[06-21-24 9:00:40 am]  | Image Name: Android Studio
[06-21-24 9:00:42 am]   # Android Studio: No need to sync d1.fixed_size_partitions (FOG-01)
[06-21-24 9:00:43 am]   # Android Studio: No need to sync d1.mbr (FOG-01)
[06-21-24 9:00:45 am]   # Android Studio: No need to sync d1.minimum.partitions (FOG-01)
[06-21-24 9:00:45 am]   # Android Studio: No need to sync d1.original.fstypes (FOG-01)
[06-21-24 9:00:45 am]   # Android Studio: No need to sync d1.original.swapuuids (FOG-01)
[06-21-24 9:00:46 am]   # Android Studio: No need to sync d1.partitions (FOG-01)
[06-21-24 9:00:48 am]   # Android Studio: No need to sync d1p1.img (FOG-01)
[06-21-24 9:00:50 am]   # Android Studio: No need to sync d1p2.img (FOG-01)
[06-21-24 9:00:50 am]  * All files synced for this item.
...
and it does this for all the images we host on the master node, then loops back to the first part endlessly.

In about 1 hour I get a log file of nearly 10000 lines for only about 90 images, where I expected fewer than 1000.

Is it normal that it loops over and over?

This is how I configured the branch storage node on the main server:

Many thanks in advance for any help.

Tom Elliott · Jun 21, 2024, 11:15 AM

@processor Yes, this is normal.

We are checking each file of an image independently to determine if it needs to be synced or not.

In the past, there was a simpler mechanism. If the data varied at all in the folder, the entire contents of the secondary node would be deleted and resynced for that image. As you can imagine, this worked, and logging was much less of course.

That said, imagine an sfdisk file (a partition file that’s simply a few bytes of text) had somebody go in and delete a line. The whole 40GB of that image would’ve been deleted and resynced over the network.

By checking each file of an image independently, we can limit the amount of network usage needed to check/sync data and provide (in my opinion) more information on exactly what’s happening during its regular process.
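
For readers who want to picture the per-file approach, here is a rough sketch in Python. It is not FOG's replicator code (FOG compares against remote nodes over the network, and the thread does not say how it decides that two files differ); the function names and the size-then-hash comparison are assumptions made purely for illustration, using two local folders.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_sync(source_dir: Path, dest_dir: Path) -> list[str]:
    """List file names in source_dir that are missing or different in dest_dir.

    Hypothetical helper: each file is judged on its own, so one changed
    partition file does not force the large .img files to be re-sent.
    """
    stale = []
    for src in sorted(p for p in source_dir.iterdir() if p.is_file()):
        dst = dest_dir / src.name
        if not dst.is_file():
            stale.append(src.name)  # missing on the destination
        elif src.stat().st_size != dst.stat().st_size:
            stale.append(src.name)  # cheap check: sizes differ
        elif file_digest(src) != file_digest(dst):
            stale.append(src.name)  # same size, different content
    return stale
```

With this kind of check, an edited d1.partitions file would be the only entry returned for that image, while unchanged files such as d1p1.img would be reported as not needing a sync, much like the "No need to sync" lines in the log above.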

processor · Jun 21, 2024, 12:08 PM

Hi @Tom-Elliott,

OK, thanks for your answer.

Is it possible to check for replication maybe once a day, at night, rather than around the clock as it is doing right now?

Proc.

Tom Elliott · Jun 27, 2024, 10:25 AM

@processor You can set the replicator sleep time (and the sleep time for all services).

FOG Configuration -> FOG Settings -> FOG_IMAGEREPSLEEPTIME

It defaults to 600 seconds (10-minute cycles).
If you set it to 86400, that should do the checks once per day.
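
As a quick sanity check on those numbers, the small Python snippet below converts a sleep time in seconds into the approximate number of replication cycles per day. The function name is made up for illustration, and the calculation ignores however long each replication pass itself takes, so real cycle counts will be somewhat lower.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def cycles_per_day(sleep_seconds: int) -> float:
    """Approximate number of replication cycles per day for a given sleep time."""
    if sleep_seconds <= 0:
        raise ValueError("sleep time must be a positive number of seconds")
    return SECONDS_PER_DAY / sleep_seconds


for sleep in (600, 86400):
    print(f"FOG_IMAGEREPSLEEPTIME={sleep}: about {cycles_per_day(sleep):g} cycle(s) per day")
```

So the default of 600 works out to roughly 144 checks per day, while 86400 gives about one.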
