Replication has stopped after upgrade
-
@moses If you manually stop and start the services, do things work?
-
@Tom-Elliott Nope, though the behavior there is kind of odd:
If I restart, it says “failed”.
If I stop, then start, it says “OK” for both, but nothing in the logs, and an image that is supposed to be replicating is not.
-
@moses So you’re running a variant of Ubuntu, I’m guessing?
-
@Tom-Elliott Ubuntu 14.04
-
@moses Can you run:
sudo service vsftpd stop
sudo service FOGImageReplicator stop
sudo service FOGSnapinReplicator stop
sudo service FOGPingHosts stop
sudo service FOGMulticastManager stop
sudo service FOGScheduler stop
sleep 5
sudo service vsftpd start
sudo service FOGImageReplicator start
sudo service FOGSnapinReplicator start
sudo service FOGPingHosts start
sudo service FOGMulticastManager start
sudo service FOGScheduler start
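If retyping all of that is a pain, the same stop-then-start sequence can be sketched as a loop. This is only a convenience sketch and not something that ships with FOG: the `run` helper, the `restart_fog_services` function and the `DRY_RUN` variable are names made up for this example. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Services from the list above, in the order they are stopped and started.
SERVICES="vsftpd FOGImageReplicator FOGSnapinReplicator FOGPingHosts FOGMulticastManager FOGScheduler"

# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop every service, wait five seconds, then start them again.
restart_fog_services() {
    for svc in $SERVICES; do
        run sudo service "$svc" stop
    done
    run sleep 5
    for svc in $SERVICES; do
        run sudo service "$svc" start
    done
}

restart_fog_services
```

Run it as is to see the command list, then run it with `DRY_RUN=0` set in the environment to actually execute the commands.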
-
@Tom-Elliott okay, did that:
administrator@SVR-HQ-IMAGING:~$ sudo service vsftpd stop
vsftpd stop/waiting
administrator@SVR-HQ-IMAGING:~$ sudo service FOGImageReplicator stop
 * Stopping FOG Computer Imaging Solution: FOGImageReplicator [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGSnapinReplicator stop
 * Stopping FOG Computer Imaging Solution: FOGSnapinReplicator [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGPingHosts stop
 * Stopping FOG Computer Imaging Solution: FOGPingHosts [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGMulticastManager stop
 * Stopping FOG Computer Imaging Solution: FOGMulticastManager [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGScheduler stop
 * Stopping FOG Computer Imaging Solution: FOGScheduler [ OK ]
administrator@SVR-HQ-IMAGING:~$ sleep 5
administrator@SVR-HQ-IMAGING:~$ sudo service vsftpd start
vsftpd start/running, process 9607
administrator@SVR-HQ-IMAGING:~$ sudo service FOGImageReplicator start
 * Starting FOG Computer Imaging Solution: FOGImageReplicator [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGSnapinReplicator start
 * Starting FOG Computer Imaging Solution: FOGSnapinReplicator [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGPingHosts start
 * Starting FOG Computer Imaging Solution: FOGPingHosts [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGMulticastManager start
 * Starting FOG Computer Imaging Solution: FOGMulticastManager [ OK ]
administrator@SVR-HQ-IMAGING:~$ sudo service FOGScheduler start
Still no change, however (no replication or log changes)
-
I just updated to the latest, then deleted all images on my slave node, and quickly rebooted both master and slave machines… we’ll see how it goes.
-
@Wayne-Workman See anything on that slave? Trying to determine if this is just an issue somewhere on the master I’m running.
-
@moses It replicated every image back perfectly. There’s something with your setup. Keep in mind it could still be FOG-related, such as a credentials issue. It could also be firewall-related or SELinux-related, or you might even be out of space on the storage node. There are many possibilities.
I know you have said you’ve verified the FTP credentials, but let’s make doubly sure. There are instructions here for testing it: https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_FTP
Basically, you should SSH into your main server and try to open an FTP connection to each of the other nodes, using the user/pass set in that node’s listing under Storage Management.
Then do the opposite: from each node, try to open an FTP connection to the main server, using the main’s credentials as set in its own Storage Management listing.
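If you would rather script the check than type into an interactive FTP client, something like the following could work. It is a rough sketch that assumes `curl` is installed on the box you run it from. `check_ftp` is a made-up helper name, and the host, user and password in the usage line are placeholders for whatever Storage Management shows for the node you are testing.

```shell
#!/bin/sh
# check_ftp HOST USER PASS   (hypothetical helper, not part of FOG)
# Tries to log in over FTP and list the login directory.
# Returns 0 on success and non-zero on any failure, including curl missing.
check_ftp() {
    host=$1
    user=$2
    pass=$3
    if ! command -v curl >/dev/null 2>&1; then
        echo "curl not found" >&2
        return 2
    fi
    # An ftp:// URL ending in "/" makes curl log in and list that directory.
    if curl --silent --show-error --connect-timeout 5 \
            --user "$user:$pass" "ftp://$host/" >/dev/null; then
        echo "OK: FTP login to $host as $user"
    else
        echo "FAIL: FTP login to $host as $user" >&2
        return 1
    fi
}

# Usage (placeholder values, replace with the ones from Storage Management):
# check_ftp node-ip-or-hostname storage-user 'storage-password'
```

Run it once from the main server against each node, then from each node against the main server. Note that the password is briefly visible in the process list while curl runs, so only use this on a machine you trust.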
-
@moses Did you find out what’s going wrong with the services? Any errors in the Apache log when you try starting the service?
We’ve fixed the bandwidth.php issue. Can you please update to the latest version to see if those errors stop and the bandwidth monitor on the dashboard works for you?
-
Unfortunately, even with some hands-on help, I wasn’t able to determine what the cause was. At this point my best guess is that it’s related to Linux. Even upgrading to a newer distro didn’t help. I’m currently in the process of moving my configuration over to CentOS, once I back up my images.
-
@moses Are the services actually running after you started them by hand?
ps ax | grep FOG
-
If you run top, how much of your CPU do these services use?
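As an alternative to watching top, a one-shot listing can show the FOG processes together with their CPU and memory share. This is a sketch that assumes a Linux `ps` (procps) that understands the `pcpu` and `pmem` columns. `fog_filter` and `fog_procs` are names invented for this example.

```shell
#!/bin/sh
# Keep the header row plus any line mentioning FOG.
# Writing the pattern as [F]OG stops the filter from matching its own
# command line in the ps output.
fog_filter() {
    awk 'NR == 1 || /[F]OG/'
}

# Print PID, CPU %, memory % and command line for the FOG services.
fog_procs() {
    ps -eo pid,pcpu,pmem,args | fog_filter
}

# Usage:
# fog_procs
```

If only the header line comes back, none of the services are actually running, whatever the init script reported.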
-
After moving my installation over to CentOS, the Image Replication service now starts. Hooray!
But alas, I run into another issue. See the last line of the fogreplicator.log:
###########################################
# Free Computer Imaging Solution #
# Credits: #
# http://fogproject.org/credits #
# GNU GPL Version 3 #
###########################################
[03-08-16 2:00:59 pm] Interface Ready with IP Address: 8.67.5.309
[03-08-16 2:00:59 pm] Interface Ready with IP Address: 192.168.1.66
[03-08-16 2:00:59 pm] Interface Ready with IP Address: SVR-HQ-IMAGING
[03-08-16 2:00:59 pm] * Starting ImageReplicator Service
[03-08-16 2:00:59 pm] * Checking for new items every 600 seconds
[03-08-16 2:00:59 pm] * Starting service loop
[03-08-16 2:00:59 pm] * Starting Image Replication.
[03-08-16 2:00:59 pm] * We are group ID: #1
[03-08-16 2:00:59 pm] | We are group name: HQ
[03-08-16 2:00:59 pm] * We have node ID: #1
[03-08-16 2:00:59 pm] | We are node name: SVR-HQ-IMAGING
[03-08-16 2:00:59 pm] * Found Image to transfer to 3 group(s)
[03-08-16 2:00:59 pm] | Image name: Universal-W7P-x64-ENGINEERING
[03-08-16 2:01:00 pm] * Type: 8, File: /var/www/html/fog/lib/fog/fogbase.class.php, Line: 55, Message: Undefined index: REQUEST_METHOD
This error only appears if I add an image to one specific storage group. Should I maybe delete the storage group or the node there and re-create them?
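For anyone wanting to pull just these entries out of a long log, a small grep can do it. This is a sketch: `show_php_notices` is a made-up helper name, and the log path in the usage line is an assumption, so point it at wherever your fogreplicator.log actually lives. For reference, `Type: 8` is the numeric value of PHP’s E_NOTICE level.

```shell
#!/bin/sh
# show_php_notices LOGFILE   (hypothetical helper, not part of FOG)
# Prints, with line numbers, every log entry that carries a PHP
# "Type / File / Line / Message" record like the one shown above.
show_php_notices() {
    grep -n 'Type: [0-9]*, File: .*, Line: [0-9]*, Message: ' "$1"
}

# Usage (the path is an assumption, adjust to your install):
# show_php_notices /opt/fog/log/fogreplicator.log
```

grep exits non-zero when nothing matches, so an empty result with a failing exit status just means the log has no such entries.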
-
@moses Thanks for reporting! I am sure Tom will fix this in a second. Anyhow, I don’t think this is a showstopper for you. Usually those “Undefined index” messages are just notices, not real PHP errors. The log should keep going… I hope!
By the way, do you still have the fopen warnings in your Apache error log? A fix has been pushed this morning.
-
@Sebastian-Roth In this case, it’s preventing the replication service from proceeding any further. Replication never starts, to any nodes. If I remove that particular storage group from that image, replication works fine to all the others.
Checking on that now…
-
@Sebastian-Roth The fopen errors are gone, but now I’ve got a shiny new error (repeated many times):
[Tue Mar 08 13:46:04.596088 2016] [:error] [pid 2547] [client 192.168.1.144:60293] PHP Warning: mysqli::query(): invalid object or resource mysqli\n in /var/www/html/fog/lib/db/mysql.class.php on line 52, referer: http://192.168.1.66/fog/management/index.php?node=home
-
After updating the node in question (despite it failing at “Restarting apache2 for fog vhost”), replication has been fixed.
-
@moses In the installer’s bin/error_logs/fog_error_<versionNumberOfInstall>.log, what’s at the end of the file?