Intermittent error 504
-
@Kagashe Let me start with the standard response.
///
Let's assume this is the issue we found after FOG 1.5.4 was released.
- Change to the /etc directory from the FOG server's Linux command prompt.
- Search for the www.conf file. It can be in a number of locations depending on which version of PHP is installed. Use this command (hopefully you will find only one):
find /etc -name www.conf
- Search for this line:
php_admin_value[memory_limit] = 32M
It should have a comment mark in front of it. Remove the comment mark and replace 32M with 256M. Your entry should look like this:
php_admin_value[memory_limit] = 256M
- Save and exit your text editor.
- Reboot the fog server.
- See if that fixes what is wrong. You really should only see this strangeness under heavy load, but I guess it might show up sooner under certain conditions.
///
That should address the gateway timeout issue under load.
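The edit described in the steps above can be sketched as a small shell helper. This is only an illustration: `raise_memory_limit` is a made-up name, not part of FOG, and the script assumes GNU sed (as on Ubuntu). The demonstration runs against a throwaway file, so nothing under /etc is modified.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): uncomment the memory_limit line in a
# php-fpm pool file and set it to 256M. Assumes GNU sed for the -i option.
raise_memory_limit() {
    conf="$1"
    # Drop any leading ';' comment mark or whitespace and rewrite the value.
    sed -i 's/^[;[:space:]]*php_admin_value\[memory_limit\][[:space:]]*=.*/php_admin_value[memory_limit] = 256M/' "$conf"
}

# Demonstration on a temporary stand-in for www.conf.
demo_conf="$(mktemp)"
printf '%s\n' '[www]' ';php_admin_value[memory_limit] = 32M' > "$demo_conf"
raise_memory_limit "$demo_conf"
grep 'memory_limit' "$demo_conf"
```

On a real server the same function would be pointed at the path that find /etc -name www.conf reports, followed by the reboot from the steps above.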
-
Found it, let me give it a run and see how it goes, thanks for the help.
-
Hi George,
Made the changes and I still get the error, though less often. It seems to happen most when I add another storage location, and I'm also seeing very long load times in the browser.
-
@Kagashe How many clients with installed fog-client do you have?
Also, please take a look at the Apache error log and post it here. See my signature on where to find the log.
-
@Kagashe What does top, sorted by P (processes), say is your highest CPU user?
-
Hi Sebastian and George,
I currently have around 300 hosts and around 10 Mirrors with the locations add-on registered but aim to get to 600 odd hosts and 36 mirrors in the coming months.
Top 3 processes CPU usage is php-fpm: pool www with at most about 12% each.
Load on the system is showing as 0.34 0.33 0.29
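For anyone who wants to collect the same figures without an interactive top session, a minimal sketch follows. It assumes a Linux host with the procps version of ps (as on Ubuntu); the variable names are only for illustration.

```shell
#!/bin/sh
# Header plus the three processes using the most CPU (procps ps syntax).
top_procs="$(ps -eo pid,comm,%cpu --sort=-%cpu | head -n 4)"
echo "$top_procs"

# 1-, 5- and 15-minute load averages, the same three numbers top displays.
load_avg="$(cut -d ' ' -f 1-3 /proc/loadavg)"
echo "Load: $load_avg"
```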
System is an i3 Processor with 16GB of DDR4 Ram a 120GB solid state and a 1TB Mechanical drive mounted at /images.
Running Ubuntu Server 16.04.
The Apache log shows the following error every two seconds or so:
[Fri Nov 02 14:13:10.001117 2018] [proxy_fcgi:error] [pid 3236] [client 10.10.3.7:41454] AH01071: Got error 'PHP message: PHP Notice: A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice: A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice: A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\nPHP message: PHP Notice: A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n
php-fpm log loops the following statement
[25-Sep-2018 15:01:01] NOTICE: fpm is running, pid 29251
[25-Sep-2018 15:01:01] NOTICE: ready to handle connections
[25-Sep-2018 15:01:01] NOTICE: systemd monitor interval set to 10000ms
[25-Sep-2018 15:03:40] NOTICE: Terminating ...
[25-Sep-2018 15:03:40] NOTICE: exiting, bye-bye!
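To see which notices dominate a log like the Apache one quoted above, the entries can be condensed to a count per source file and line. The following is a rough sketch that works on a temporary file holding a shortened copy of the quoted entry; on a real server the path of the Apache error log would be substituted.

```shell
#!/bin/sh
# Temporary sample containing a shortened copy of the Apache error line above.
log="$(mktemp)"
cat > "$log" <<'EOF'
[Fri Nov 02 14:13:10.001117 2018] [proxy_fcgi:error] [pid 3236] [client 10.10.3.7:41454] AH01071: Got error 'PHP message: PHP Notice: A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 109\nPHP message: PHP Notice: A non well formed numeric value encountered in /var/www/fog/status/bandwidth.php on line 110\n
EOF

# Extract every "in <file> on line <n>" fragment and count the duplicates.
summary="$(grep -o 'in /[^ ]* on line [0-9]*' "$log" | sort | uniq -c)"
echo "$summary"
```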
-
@Kagashe First, on the errors you see about bandwidth.php: while it's a problem, it's just an annoyance message at the moment.
The fpm message is a bit troubling and I need to look into it.
Your configuration is a bit abnormal (based on what we typically see) with the number of mirrors. That is fine, it might mean that we need to tweak our configuration a bit.
There is a timeout setting I would like you to make/change to see if we can address this issue. Specifically to tell apache to wait a bit longer for php-fpm to respond than its default settings. I can see the potential if you have your mirrors behind a slow network connection that it may take a little longer to get a response.
<Proxy "fcgi://127.0.0.1:9000">
    ProxySet timeout=300
</Proxy>
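One way to stage that setting is to write it to its own file first and inspect it before it goes anywhere near the live Apache configuration. The sketch below does only that. `write_proxy_timeout` and the file name `fog-proxy-timeout.conf` are made up for illustration; where the block finally belongs depends on how Apache is laid out on the server.

```shell
#!/bin/sh
# Hypothetical helper: write the Proxy timeout block to the given file.
write_proxy_timeout() {
    target="$1"
    cat > "$target" <<'EOF'
<Proxy "fcgi://127.0.0.1:9000">
    ProxySet timeout=300
</Proxy>
EOF
}

# Demonstration in a temporary directory; the file name is an assumption.
demo_dir="$(mktemp -d)"
write_proxy_timeout "$demo_dir/fog-proxy-timeout.conf"
cat "$demo_dir/fog-proxy-timeout.conf"
```

After the block is placed in the real configuration, running apachectl configtest before restarting Apache is a cheap way to catch syntax mistakes.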
https://forums.fogproject.org/topic/11713/503-service-unavailable-error/40
and
https://forums.fogproject.org/topic/12057/fog-unresponsive-under-heavy-load/17
-
@george1421 Sorry I haven’t answered, it’s been a hectic few days. I have updated my config with the script you posted and will test it and report back. Thanks for all the help so far, it’s much appreciated.
-
@george1421 This seems to have solved the issue so far. I’ve added two more mirrors to the mix and still haven’t had the 504 error again.
-
@Kagashe So what solved it? The timeout value being added to the Apache config file?
-
@george1421 Yes that was the last change I made to the config.
-
@george1421 Aaaaaaand it just happened again
-
@Kagashe Trying to reach you on chat. See the speech bubble in the top right corner.
-
Marking as unsolved for now as the issue was worked around by George’s good advice to adjust proxy timeout but I am working on fixing the code to not have those very long timeouts at all.
-
@Kagashe Ok here we go. Can you please upgrade all your nodes to the latest dev-branch version and see if you still get timeouts? I’d even suggest reverting the proxy timeout change (if not reverted by the upgrade anyway) to see if the code improvements really do make a difference.
-
@Sebastian-Roth Have upgraded all the sites to 1.5.5.1 except one where I’m getting the following error
Adding Needed Repository…Failed!
Apart from that, all seems to be going well.
-
@Kagashe Please check the install logs, which reside in the bin directory from which you run the installer.
Are all your servers Ubuntu 16.04?
-
@Sebastian-Roth All servers are Ubuntu 16.04, below are the logs
/usr/bin/lsb_release
/bin/systemctl
Reading package lists...
Building dependency tree...
Reading state information...
ntpdate is already the newest version (1:4.2.8p4+dfsg-3ubuntu5.9).
software-properties-common is already the newest version (0.96.20.7).
python-software-properties is already the newest version (0.96.20.7).
The following packages were automatically installed and are no longer required:
libapr1 libaprutil1 libaprutil1-dbd-sqlite3 libaprutil1-ldap libbrotli1 libgd3 libjansson4 libjbig0 liblua5.2-0 libnghttp2-14 libtiff5 libwebp6 php-common
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
29 Nov 13:40:51 ntpdate[8213]: adjust time server 46.17.63.196 offset -0.019258 sec
Generating locales (this might take a while)...
en_US.UTF-8... done
Generation complete.
gpg: keyring `/tmp/tmpr79_9sh8/secring.gpg' created
gpg: keyring `/tmp/tmpr79_9sh8/pubring.gpg' created
gpg: requesting key E5267A6C from hkp server keyserver.ubuntu.com
Error: retrieving gpg key timed out.
gpg: /tmp/tmpr79_9sh8/trustdb.gpg: trustdb created
gpg: key E5267A6C: public key "Launchpad PPA for Ondřej Surý" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
gpg: keyring `/tmp/tmp0z4tf8uy/secring.gpg' created
gpg: keyring `/tmp/tmp0z4tf8uy/pubring.gpg' created
gpg: requesting key E5267A6C from hkp server keyserver.ubuntu.com
Error: retrieving gpg key timed out.
gpg: /tmp/tmp0z4tf8uy/trustdb.gpg: trustdb created
gpg: key E5267A6C: public key "Launchpad PPA for Ondřej Surý" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
-
@Kagashe said in Intermittent error 504:
gpg: requesting key E5267A6C from hkp server keyserver.ubuntu.com
Error: retrieving gpg key timed out.
There is nothing we can do. It’s not able to retrieve the key from the server. Probably a firewall is blocking that request?
To get around this you can edit the script fogproject/lib/common/functions.sh, jump to line 601 and comment that line out so it looks like this:
...
esac
;;
esac
# errorStat $?
dots "Preparing Package Manager"
$packmanUpdate >>$workingdir/error_logs/fog_error_${version}.log 2>&1
...
Now rerun the installer and you should get past the repo stuff.
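If a text editor is inconvenient, commenting out a single numbered line can also be done with sed. The sketch below is illustrative only: `comment_out_line` is a made-up helper, it assumes GNU sed, and it is demonstrated on a small temporary file rather than on functions.sh itself. The line number 601 applies only to the installer version discussed here, so check that the target line really is the errorStat call before running anything similar.

```shell
#!/bin/sh
# Hypothetical helper: prefix one line of a file with "# ", keeping its
# indentation. Assumes GNU sed for the -i option.
comment_out_line() {
    file="$1"
    lineno="$2"
    sed -i "${lineno}s/^\([[:space:]]*\)/\1# /" "$file"
}

# Demonstration on a temporary file that mimics the quoted fragment.
demo="$(mktemp)"
printf '%s\n' 'esac' ';;' 'esac' 'errorStat $?' 'dots "Preparing Package Manager"' > "$demo"
comment_out_line "$demo" 4
sed -n '4p' "$demo"
```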
-
@Sebastian-Roth Weird. I figured that was the issue, so I whitelisted the machine on my firewall just to test, but it still didn’t work and there’s nothing in the firewall logs to suggest that it’s blocking anything. It’s a strange one…
Will give your suggestion a go now.