Storage Group node replication
-
Hi,
I have a question regarding replication for additional storage groups. My FOG setup consists of 1 main server and 2 storage nodes (let’s call them nodes A and B). The main server is in a storage group by itself, and I don’t care about replication there because it’s a VM that backs up with snapshots. Node A is in a second storage group with “Is Master Node” enabled. Node B is also in the second storage group.
My problem is that images are not being replicated from Node A to Node B. Am I thinking correctly that that’s the way this should work? I see this in the replication log every 10 minutes:
[10-09-14 1:04:42 pm] * Starting Image Replication.
[10-09-14 1:04:42 pm] * We are group ID: #1
[10-09-14 1:04:42 pm] * We have node ID: #1
[10-09-14 1:04:42 pm] * I am the only member, no need to copy anything!.
It’s almost like the replication service doesn’t know about the second storage group. Thoughts? Running FOG 1.1.2 (recently downgraded from 1.2.0 because of problems with imaging XP).
Thanks!
-
Image replication happens across nodes within the same group, not from group to group.
-
What this means is that you need to look at the /opt/fog/log/fogreplicator.log file on the MASTER node of the second group to see whether replication is happening properly.
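A quick way to see which group the replicator on a given node thinks it belongs to is to pull the group IDs out of its log. This is only a rough sketch: it assumes the log lines look like the "We are group ID: #1" lines quoted earlier in this thread, and the function name `group_ids_in_log` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the distinct group IDs reported in a replicator log.
# Assumes lines of the form "... * We are group ID: #<number>".
group_ids_in_log() {
    sed -n 's/.*We are group ID: #\([0-9][0-9]*\).*/\1/p' "$1" | sort -u
}

# Example usage on the master node of the second group:
#   group_ids_in_log /opt/fog/log/fogreplicator.log
```

If the file does not exist at all, the replicator on that node has probably never gotten far enough to write to it.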
-
Ahh, good to know. I was under the impression that the main server handled replication for all groups.
So… knowing that, my next issue is that the FOGImageReplicator will not start on Node A. There are no logs in /opt/fog/log. When I run /etc/init.d/FOGImageReplicator start, I get this:
- Starting FOG Computer Imaging Solution: FOGImageReplicator [fail]
I even tried restarting and get this:
- Restarting FOG Computer Imaging Solution: FOGImageReplicator
start-stop-daemon: warning: failed to kill 12522: No such process
What might be causing the service to fail?
- Starting FOG Computer Imaging Solution: FOGImageReplicator [fail]
-
You probably need to run it with sudo:
[code]sudo /etc/init.d/FOGImageReplicator stop
sudo /etc/init.d/FOGImageReplicator start[/code]
-
Oh, duh. Okay, it appears that the service is running now:
- Stopping FOG Computer Imaging Solution: FOGImageReplicator
start-stop-daemon: warning: failed to kill 12557: No such process [ OK ]
- Starting FOG Computer Imaging Solution: FOGImageReplicator [ OK ]
However, still no logs and still no sign of replication happening.
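Side note on the "failed to kill ...: No such process" warning: it means the PID recorded for the service was no longer alive when the stop ran, so an [ OK ] on start does not by itself prove the process stays up. A minimal sketch for checking whether a PID is still alive (the function name `pid_alive` is just illustrative, not part of FOG):

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if a process with the given PID exists.
# kill -0 sends no signal; it only checks that the PID can be signalled,
# so run it as root (or as the owning user) to avoid permission errors.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Example usage, a few seconds after starting the service:
#   pgrep -f FOGImageReplicator        # list matching PIDs, if any
#   pid_alive 12557 && echo running || echo "not running"
```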
P.S. Thanks for the quick responses!
-
-
Can you screenshot this page?
http://<ipaddress of fog server>/fog/management/index.php?node=about&sub=log
(Select the "Replicator" file.) The log should look something like this:
[QUOTE][10-09-14 3:04:20 pm] * Starting Image Replication.
[10-09-14 3:04:20 pm] * We are group ID: #1
[10-09-14 3:04:20 pm] * We have node ID: #1
[10-09-14 3:04:20 pm] * Found: 2 other member(s).
[10-09-14 3:04:20 pm]
[10-09-14 3:04:20 pm] * My root: /images
[10-09-14 3:04:20 pm] * Starting Sync.
[10-09-14 3:04:20 pm] * Syncing: MC Server
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `DELLD630'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `DellD610'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `GX620'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `HPDC5700'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `HPDC7800'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `LenovoR500'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `LenovoR61'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `Toshiba'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `postdownloadscripts'
[10-09-14 3:04:20 pm] * SubProcess ->
[10-09-14 3:04:20 pm] * SubProcess -> Complete
[10-09-14 3:04:20 pm] * Syncing: Tech Windows Svr
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `DELLD630'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `DellD610'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `GX620'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `HPDC5700'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `HPDC7800'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `LenovoR500'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `LenovoR61'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `Toshiba'
[10-09-14 3:04:20 pm] * SubProcess -> Mirroring directory `postdownloadscripts'
[10-09-14 3:04:20 pm] * SubProcess ->
[10-09-14 3:04:20 pm] * SubProcess -> Complete[/QUOTE]
-
Yep - here you go:
[url=“/_imported_xf_attachments/1/1418_log-screenshot.png?:”]log-screenshot.png[/url]
-
OK, so it definitely looks like your group ID #2 does not even try to replicate. Unfortunately, I only have one group, so verifying whether this is a bug will be a little more difficult. Can you upload and attach the replication log file?
-
Sure, here it is. Some more information too, since the log will show some other details: All 3 servers were in the same group at one point, and the main server successfully replicated to the other 2 nodes. On October 8th, the 2 nodes ran out of space because they have smaller hard drives than the master. So I decided to split them out into their own group, with Node1 as the master replicating to Node2 for redundancy.
You’ll see in the log on 10-08-14 11:43:28am is when the drives ran out of space. I removed the nodes from the group at 10-08-14 2:57:35pm, which is when the main server started saying there’s only one member and it doesn’t need to copy anything. While the 2 nodes weren’t part of any group, I manually deleted the images from the /images directory. Then I created a new group and added the nodes to it. I uploaded an image without issue and I see it in the /images directory on node1 but it has never been copied to node2. Side note: When I try to deploy this image, I get an error on the client saying “Unable to locate image store”… Any chance these 2 problems are related?
Tom pointed out above that servers only handle replication within their own group, so it makes sense that the main server's log is not showing attempts for group number 2. However, there is no fogreplicator.log file on Node1 to view. When I restart FOGImageReplicator, nothing happens.
[url=“/_imported_xf_attachments/1/1421_fogreplicator.txt?:”]fogreplicator.txt[/url]
-
Update: This is resolved.
There were 2 issues:
- First: I had the wrong MySQL username/password in Config.class.php on the nodes. During the node install, I had given it a generic root account, but found out earlier today that I specifically needed the username/password for storage nodes, which I found under the storage node section in settings on the main server.
- Second: I realized the nodes couldn’t even connect to the main database, and I was able to narrow that down to the my.cnf file on the main server. It had the bind address set to 127.0.0.1. I commented that line out, restarted MySQL, and then they were able to connect.
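For anyone who lands here with the same symptoms, the bind address problem can be spotted roughly like this. It is only a sketch: the my.cnf path is an assumption (it is the usual location on Debian/Ubuntu), the function names are made up, and the host and user in the connection test are placeholders to fill in with the storage node credentials from the main server's settings.

```shell
#!/bin/sh
# Sketch: report whether a MySQL config file pins the server to loopback.
# Matches an uncommented bind-address (or bind_address) line set to 127.0.0.1.
has_loopback_bind() {
    grep -Eq '^[[:space:]]*bind[-_]address[[:space:]]*=[[:space:]]*127\.0\.0\.1' "$1"
}

check_bind() {
    if has_loopback_bind "$1"; then
        echo "loopback-only"
    else
        echo "not-loopback-only"
    fi
}

# Example usage on the main server (path is an assumption):
#   check_bind /etc/mysql/my.cnf
#
# Example connection test from a storage node (placeholders, not real values):
#   mysql -h <main-server-ip> -u <storage-node-user> -p -e 'SELECT 1;'
```

If the connection test fails from the node but works on the main server itself, the bind address (or a firewall) is the likely culprit rather than the credentials.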
After I verified everything was talking correctly, I restarted the FOGImageReplicator service and everything instantly started copying. The fogreplicator.log file on Node1 shows Group #2 as I expected it might.
This also fixed an issue that I never really cared about enough to fix… On the main FOG dashboard page under disk information, it always told me “Failed to connect to” when I selected Node1 or Node2. That’s fixed now, and I can see the disk usage for them.
Thanks for the help!
-