Images not replicating to Storage node after upgrade
I seem to have a Storage Node replication problem from the main FOG server after upgrading it from 3594 to 5566.
The main FOG server (in Sydney, IP 192.168.23.10) and the Storage Node (in Melbourne, IP 192.168.20.8) are both running CentOS 6.7.
When I am running trunk 3594 on the main server, image replication works.
[12-01-15 11:21:17 am] * Found image to transfer to 1 group(s)
[12-01-15 11:21:17 am] | Image name: Dell XPS 13 [Win 10]
[12-01-15 11:21:17 am] * Melbourne Storage Node - SubProcess -> Transferring file `d1.fixed_size_partitions'
[12-01-15 11:21:18 am] * Melbourne Storage Node - SubProcess -> Transferring file `d1.mbr'
[12-01-15 11:21:19 am] * Melbourne Storage Node - SubProcess -> Transferring file `d1.minimum.partitions'
[12-01-15 11:21:19 am] * Melbourne Storage Node - SubProcess -> Transferring file `d1.original.fstypes'
[12-01-15 11:21:19 am] * Melbourne Storage Node - SubProcess -> Transferring file `d1.original.partitions'
and I can see that the files are being created on the Storage node’s /image folder.
If I then upgrade the main server to 5566, then all that appears in the image replicator logs are…
[12-02-15 8:41:28 am] * Starting Image Replication.
[12-02-15 8:41:28 am] * We are group ID: #1
[12-02-15 8:41:28 am] | We are group name: Sydney Storage Group
[12-02-15 8:41:28 am] * We have node ID: #1
[12-02-15 8:41:28 am] | We are node name: Sydney Storage Node
[12-02-15 8:41:28 am] * Not syncing Image between group(s)
[12-02-15 8:41:28 am] | Image Name: Dell Latitude E7x40 - Base
[12-02-15 8:41:28 am] | I am the only member
[12-02-15 8:41:28 am] * Not syncing Image between group(s)
[12-02-15 8:41:28 am] | Image Name: Dell Optiplex 9010 - Base
[12-02-15 8:41:28 am] | I am the only member
[12-02-15 8:41:28 am] * Not syncing Image between group(s)
I’m able to FTP from main server to storage node with the FOG user/pass that’s in .fogsettings on the storage node.
There are no bind options in /etc/my.cnf on either host.
From the Storage node, I am able to perform the command and make a successful connection to the database…
mysql -u fogstorage fog -pXXXXXXXXX -h192.168.23.10
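For anyone who wants to script the same kind of reachability check, here is a minimal Python sketch. It only tests whether a TCP port accepts connections; it does not verify FOG credentials or database permissions. The helper name port_open is made up for illustration, and the ports in the commented usage (21 for FTP, 3306 for MySQL) are the standard defaults, assumed rather than taken from this setup.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage with the IPs from this thread (ports are assumed defaults):
#   port_open("192.168.20.8", 21)     # FTP on the Melbourne storage node
#   port_open("192.168.23.10", 3306)  # MySQL on the main server in Sydney
```

A True result only shows the port is reachable; the FTP login and the mysql command above are still the real tests of the credentials.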
If I downgrade to 3594, Image Replication works again.
All the credentials seem to be correct in all the areas that I should be looking at.
Just not sure why it’s broken when I upgrade to the latest trunk.
At the time of writing, I have left the Storage Node running at 5566 with the main server now downgraded to 3594 where Image Replication is currently working.
Hopefully you guys can provide me with some clues as to what else I should be checking.
@Wayne-Workman Thanks Wayne… understood.
I do use the Location plugin for this very reason. I guess because it had worked in a previous trunk, I just assumed that the way I had it set up originally was how it was meant to work. In any case, I can easily change this and put all the storage nodes into the one storage group, making Sydney the Master node.
You can set this one as resolved.
@Toby777 Hopping into this thread to try to answer some of the questions you posed.
We need to flow-chart how replication works. I started flow-charting it a while back but never finished… anyways…
As far as this statement:
So correct me if I’m wrong, in my case, I would just simply tick Is Master Node checkbox in the Melbourne Storage Node since it is in its own Melbourne Storage Group. Then as long as the Primary Group is set to Sydney Storage Group in each of the Images, then replication should kick in and copy the image to the Melbourne Storage Node that’s inside the Melbourne Storage Group?
I think you’re over-complicating the design.
If you only have two locations and want to share images between the two - it’s best to have one storage group, with all your nodes in that group. Set one as master.
All uploads go to the master node for the storage group that you’ve assigned the image to. If you don’t want to image across the WAN link, use the location plugin and define locations according to geographic layout. This way Melbourne imaging tasks use the Melbourne storage node.
What you’re doing now - with two storage groups - works, and in some cases it’s the way to go, but for your situation I feel like it’s just overly complex.
About the image wiping thing - why would you want to have multiple images with the same name? Again I’ll say that all uploads first go to the master node of the storage group the image is assigned to. After that, the image is replicated to other storage nodes in the group.
A few months ago @Tom-Elliott added some logic for sharing images across groups. I think he added the idea of “Primary”. When an image is shared across groups - the image replicator replicates that image from the master node it’s on currently to the master node of the other storage group. After that, as usual, the masters replicate to other nodes in their groups.
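To make that flow concrete, here is a small Python model of the order described above: the primary group's master copies to the masters of the other groups, then each master copies to the remaining nodes in its own group. This is only a sketch of the description in this post, not FOG's actual code. The Node class and the functions master_of and replication_plan are hypothetical names, and the behaviour when a group has no master (nothing gets copied to it) is an assumption of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    group: str
    is_master: bool = False


def master_of(group, nodes):
    """Return the node flagged as master in the group, or None."""
    for node in nodes:
        if node.group == group and node.is_master:
            return node
    return None


def replication_plan(image_groups, primary_group, nodes):
    """Return (source, destination) node-name pairs for one image.

    image_groups: the groups the image is assigned to.
    primary_group: the group whose master holds the authoritative copy.
    """
    plan = []
    source = master_of(primary_group, nodes)
    if source is None:
        return plan

    # Step 1: primary master -> master of every other assigned group.
    masters = [source]
    for group in image_groups:
        if group == primary_group:
            continue
        other = master_of(group, nodes)
        if other is not None:  # assumption: no master means no transfer
            plan.append((source.name, other.name))
            masters.append(other)

    # Step 2: each master -> the other nodes in its own group.
    for master in masters:
        for node in nodes:
            if node.group == master.group and node != master:
                plan.append((master.name, node.name))
    return plan


# One group, Sydney as master (the simpler layout suggested here).
one_group = [Node("Sydney", "Australia", True), Node("Melbourne", "Australia")]
print(replication_plan(["Australia"], "Australia", one_group))
```

In this model, the single-group layout yields one transfer from Sydney to Melbourne, while two single-node groups with only Sydney flagged as master yield an empty plan, since there is no master on the other side to receive the image.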
I’ll end this post with a plea for simplicity. Replication doesn’t need to be complex. The location plugin is amazing and should always be used when nodes in the same group span a WAN link. Before just “doing it”, you should sit down and plan out how you want your storage set up. Draw it out, label everything.
@Tom-Elliott Thanks for that explanation. I think I’ve grasped the concept.
Yes your assumption is correct. Our Melbourne office is a small office actually with no I.T at all. So there would never be a need for them to update/upload an image.
Yes, I do have several images set to be in both the Sydney & Melbourne Storage Groups. However, I thought that since only the Sydney Storage Node is set to Master, only images from the Sydney node would be pushed through to the other nodes. Then there’s the issue you mentioned: if one were to upload a fresh image to the Melbourne Storage Node, it would be wiped by replication from Sydney since Sydney is set as the Master (which I was aware of from the warning next to the checkbox).
So correct me if I’m wrong, in my case, I would just simply tick the Is Master Node checkbox in the Melbourne Storage Node since it is in its own Melbourne Storage Group. Then as long as the Primary Group is set to Sydney Storage Group in each of the Images, then replication should kick in and copy the image to the Melbourne Storage Node that’s inside the Melbourne Storage Group?
Say a Host’s location was set to Melbourne, the Host was physically in Melbourne, and the Image’s Primary Group was set to Sydney. If an image was then uploaded to FOG in Melbourne, would the same thing happen? That is, because the Primary Group for the Image is still set to Sydney, would replication still wipe the local Melbourne image with the one from the Sydney node?
Edit: Or is FOG smart enough to know that the Primary Storage Group for the image being uploaded is in Sydney, and so will then replace the Image in Sydney, then once done, replicate it back to the Melbourne Storage Node?
Apologies for the long write up. Hopefully this will be the last.
@Toby777 I’m guessing that the images you want across groups (Melbourne and Sydney as of right now?) are set up to be a part of both groups; for example, an image named win7 is set up to be in both groups? The only change that I can think of is that I made what I am calling the “primary group”. The idea of this is to help ensure the images are replicated, and replicated properly. Without a primary group, both sides could potentially wreak havoc trying to replicate the data. Think of the issue like this:
Melbourne’s win7 image is freshly updated and you want the image to replicate to the Sydney group. Suppose the timing is such that Sydney’s replicator service runs before the Melbourne replication service. Because Melbourne’s image is different, Sydney starts replicating its version of the image file, which would delete the one you just uploaded. Now your fresh image is gone. This is easily fixed by simply making one group the “main” group. Yes, it does mean that if you have Sydney as the primary but want/need Melbourne as the primary because of the update, you would need to change what the primary group is.
Sorry about the long-winded post, but I felt that giving all the info I can think of would make it a bit more sense.
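The race described above can be illustrated with a toy Python model. It is not how the FOG replicator is implemented; the function names sync_without_primary and sync_with_primary and the "old"/"fresh" labels are invented for the illustration. The first function lets every group push its own copy to the others in whatever order the services happen to run, and the second always takes the copy from one designated primary group.

```python
def sync_without_primary(copies, run_order):
    """Each group's replicator pushes its own copy to all others, in run_order."""
    copies = dict(copies)  # work on a copy so the input is left untouched
    for group in run_order:
        for other in copies:
            if other != group:
                copies[other] = copies[group]
    return copies


def sync_with_primary(copies, primary):
    """Only the primary group's copy is pushed out, regardless of timing."""
    return {group: copies[primary] for group in copies}


# Melbourne has just uploaded a fresh win7 image; Sydney still has the old one.
copies = {"Sydney": "old", "Melbourne": "fresh"}

# Sydney's replicator happens to run first: the fresh upload is overwritten.
print(sync_without_primary(copies, ["Sydney", "Melbourne"]))

# With Melbourne set as the primary group, the fresh copy wins every time.
print(sync_with_primary(copies, "Melbourne"))
```

Note that with Sydney as the primary, the model also overwrites the fresh Melbourne copy, which is why the primary group has to be switched when the update comes from the other side.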
ah right… i think i understand you.
So at the moment, I have a Sydney Storage Group and a Melbourne Storage Group, in which each Storage Node is in their respective Storage Group.
What I should be doing is scrapping that and having just 1 Storage Group called Australia (for example), and then putting the Sydney & Melbourne Storage Nodes into that Australia Storage Group, with Sydney set as the Master?
Was this changed or has it always been like that?
Just strange that it worked in its current form at trunk 3594 but is different with the latest trunk.
Unless I’m missing something. Your storage node in Melbourne is in the storage group Melbourne. This is understood. But in the Melbourne group, what is the master node? IMO the Sydney server with IP 192.168.23.10 should be the master node in the Melbourne storage group. The Sydney server can also be a member of the Sydney storage group as a master node. What I see right now (reading the tea leaves a bit) is that in each storage group you only have one device. One group has a master node, while the other group has just a storage node. There is nothing to copy in this setup because each server is in its own group. You need that linking connection.
The Sydney Storage Node points to itself, the main FOG server at 192.168.23.10.
The Melbourne Storage Node points to 192.168.20.8
So in Location Management, I have Sydney and Melbourne.
In each of these, Sydney has Sydney Storage Group and Sydney Storage Node. Melbourne has Melbourne Storage Group and Melbourne Storage Node.
In Storage Management, there are the 2 Storage Nodes, one for Sydney and one for Melbourne, each pointing to its respective IP, each with its own credentials, and each pointing to its respective Storage Group. Only the Sydney Storage Node is set as the master.
I’m just looking at the obvious here.
In the top picture it says Melbourne storage node; in the bottom it says Sydney storage node. Which group is host 192.168.20.8 really in, the Sydney or the Melbourne group? Whatever group it’s in, are both 192.168.20.8 and 192.168.23.10 in the same storage group, with 192.168.23.10 configured as the master server?
I also noted the date on the second log. Is it really 12/2 there? I don’t think the date comes into play; I just want to ensure the date is consistent between the two servers.