Host/Group replication between FOG Servers
-
@adukes40 The script from the main server looks good - that’s how it should look from the node.
Run them on the node. Yes, just copy paste.
-
@Wayne-Workman This is what I get:
Sorry, I am new with Linux, so it is all a learning curve for me.
-
@adukes40 Try it with a simple password. It goes between the empty quotes.
-
All I get is invalid syntax. No clue what I am doing wrong.
-
@adukes40 I sent you a message. Top right talk bubble.
-
I remoted in to help.
bind-address 127.0.0.1 was set in my.cnf. I commented that out, and then we were able to connect to mysql from the remote node using the fogstorage username and password.
However, when visiting the storage node’s boot.php file with a registered MAC address appended like this:
http://10.106.2.149/fog/service/ipxe/boot.php?mac=b8:ac:6f:3d:6e:a4
it spits out just this:
set fog-ip
set fog-webroot
set boot-url http://${fog-ip}/${fog-webroot}
Also, an apache2 error pops up in the node’s logs:
[Tue May 24 16:05:32.244071 2016] [:error] [pid 20780] [client 10.106.10.5:12079] PHP Fatal error: Call to a member function lastInsertId() on null in /var/www/html/fog/lib/db/pdodb.class.php on line 124
FOG version 7829
Both main and node are Ubuntu 14.04 LTS.
-
I just spoke briefly with the @Senior-Developers and they said that the storage node installation actually communicates with the main server’s DB to do certain things.
What this means is that, because of the earlier bind-address setting, the things the installer needed to do against the main server’s DB didn’t get done.
So, now that we’ve sorted out the connectivity issues, all you should need to do is re-run the installer on the storage node (no rebuild is required at all!), and then things should be working for you.
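For anyone hitting the same thing, the bind-address change and the connectivity check can be sketched roughly like this. The function name comment_bind_address is made up for illustration, /etc/mysql/my.cnf is assumed to be the config path (the usual one on Ubuntu 14.04), and <main-server-ip> is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: comment out an active bind-address line in a MySQL config file.
# Hypothetical helper; the path you pass in is an assumption about your setup.
comment_bind_address() {
    local cnf="$1"
    # Keep a backup copy next to the original before editing in place.
    cp "$cnf" "$cnf.bak"
    # Prefix the active bind-address line with '#'; already-commented lines do not match.
    sed -i -E 's/^([[:space:]]*)(bind-address[[:space:]]*=)/\1#\2/' "$cnf"
}

# On the main server (as root), something like:
#   comment_bind_address /etc/mysql/my.cnf
#   service mysql restart
# Then, from the storage node, check that the fogstorage login works remotely:
#   mysql -h <main-server-ip> -u fogstorage -p -e 'SELECT 1;'
```

If the last command prompts for the password and returns a row, the node can reach the main server’s database and re-running the installer should be able to do its work.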
-
After re-running the installer on the storage node and rebooting the main server, this is what’s in the replication logs, and it just hangs and never replicates. @Tom-Elliott
[05-24-16 9:37:59 pm] | Image name: testing
[05-24-16 9:37:59 pm] * Found Image to transfer to 2 node(s)
[05-24-16 9:37:59 pm] | I am the only member
[05-24-16 9:37:59 pm] | Image Name: testing
[05-24-16 9:37:59 pm] * Not syncing Image between group(s)
[05-24-16 9:37:59 pm] | We are node name: CA - MasterNode
[05-24-16 9:37:59 pm] * We have node ID: #1
[05-24-16 9:37:59 pm] | We are group name: default
[05-24-16 9:37:59 pm] * We are group ID: #1
[05-24-16 9:37:59 pm] * Starting Image Replication.
[05-24-16 9:37:59 pm] * Starting service loop
[05-24-16 9:37:59 pm] * Checking for new items every 600 seconds
[05-24-16 9:37:59 pm] * Starting ImageReplicator Service
[05-24-16 9:37:59 pm] Interface Ready with IP Address: 167.21.42.13
[05-24-16 9:37:59 pm] Interface Ready with IP Address: 10.103.72.49
-
I’ve verified both the main and node’s FTP credentials.
I’ve toggled the master node for both, just to reset it.
I’ve made test images and test snapins but both hang at exactly where lftp should start.
I’ve also uninstalled and reinstalled lftp.
-
Found this interesting message in /var/log/syslog:
May 24 18:20:30 MSDCATS09 kernel: [ 2560.753652] init: vsftpd main process (4005) killed by TERM signal
-
Snapins attempt to replicate, it seems, but images do not. Here’s the FTP transfer log (xferlog):
root@MSDCATS09:/var/log# cat /var/log/xferlog
Tue May 24 17:44:36 2016 1 10.103.72.49 12 /opt/fog/snapins/test.bat b _ i r fog ftp 0 * c
Tue May 24 18:14:20 2016 1 10.103.72.49 12 /opt/fog/snapins/test.bat b _ i r fog ftp 0 * c
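To make it easier to see what actually arrived, here is a rough sketch that pulls completed incoming transfers out of an xferlog-format file and filters them by path. The function name xferlog_uploads is made up, /images is only assumed to be the image store path (the FOG default), and the field positions assume the standard xferlog layout with no spaces in file names.

```shell
#!/usr/bin/env bash
# Sketch only: list completed incoming transfers from an xferlog-format file.
# Assumed field positions (standard xferlog layout):
#   $7 = remote host, $9 = file name, $12 = direction (i = incoming),
#   $14 = user name, $18 = completion status (c = complete)
xferlog_uploads() {
    local log="$1"
    local prefix="${2:-/}"   # optional path prefix to filter on
    awk -v p="$prefix" \
        '$12 == "i" && $18 == "c" && index($9, p) == 1 { print $7, $14, $9 }' \
        "$log"
}

# Example calls (paths are assumptions):
#   xferlog_uploads /var/log/xferlog /opt/fog/snapins
#   xferlog_uploads /var/log/xferlog /images
```

Against the two entries above, the snapins filter would list test.bat twice, and an /images filter would list nothing, which matches what I’m seeing.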
-
These log entries are also interesting:
cat /var/log/auth.log | grep vsftpd
May 24 17:36:01 MSDCATS09 vsftpd[21829]: pam_unix(vsftpd:auth): check pass; user unknown
May 24 17:36:01 MSDCATS09 vsftpd[21829]: pam_unix(vsftpd:auth): authentication failure; logname= uid=0 euid=0 tty=ftp ruser=apc rhost=10.103.67.62
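A rough way to see who is failing to log in is to count the vsftpd authentication failures per remote user and host. The function name vsftpd_auth_failures is made up for illustration, and the sketch assumes the pam_unix line format shown above.

```shell
#!/usr/bin/env bash
# Sketch only: count vsftpd authentication failures per remote user and host.
# Assumes pam_unix lines containing "ruser=<name> rhost=<address>".
vsftpd_auth_failures() {
    local log="$1"
    grep 'vsftpd' "$log" \
        | grep 'authentication failure' \
        | sed -E 's/.*ruser=([^ ]*) rhost=([^ ]*).*/\1 \2/' \
        | sort | uniq -c | sort -rn
}

# Example call (path as used above):
#   vsftpd_auth_failures /var/log/auth.log
```

In the excerpt above the failing login is user apc from 10.103.67.62, which is neither the fog user nor one of the addresses in the replication log, so it may well be unrelated to the replication problem.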
-
@Tom-Elliott helped out; it was a bug in replication. It should be fixed in the current version.