Storage Nodes stop reporting after a while
-
The apache logs on one of the remote storage nodes is full of the below errors. @Tom-Elliott
[Wed Aug 17 13:36:15.502689 2016] [:error] [pid 34177] [client 10.51.1.53:46758] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Wed Aug 17 13:36:15.710230 2016] [:error] [pid 30710] [client 10.51.1.53:46788] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Wed Aug 17 13:36:19.494413 2016] [:error] [pid 30706] [client 10.51.1.53:46826] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Wed Aug 17 13:36:19.617840 2016] [:error] [pid 30707] [client 10.51.1.53:46856] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Wed Aug 17 13:36:23.516636 2016] [:error] [pid 30709] [client 10.51.1.53:46900] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Wed Aug 17 13:36:23.699457 2016] [:error] [pid 30708] [client 10.51.1.53:46932] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14
-
I rebooted that particular Storage Node. For the moment, it appears to be reporting its version correctly. I expect it to quit again after some time passes.
-
Update on this.
The two storage nodes I’ve rebooted are sticking around with version reporting… Man this is weird.
I guess now the next step is to reboot them all and see what happens.
-
I checked the nodes version reporting and interface reporting this morning - it’s still working.
So - I have no idea why but - the moral of the story is if you change your
fogstorage
password on the main database, after changing thesnmysqlpass
field in/opt/fog/.fogsettings
on each node and re-running the installer, reboot the node. -
I spoke too soon.
They are all not reporting again. Considering - they stayed reporting all night long into the morning before anyone got here - this must be load related somehow.
-
I wrote a script that reboots them all, and ran that. They are all now reporting again…
This wasn’t an issue until I changed the password for
fogstorage
- Why would that cause this? -
Below is the apache error log from one of the nodes, snipped from when the server rebooted to when the issue happened again. The server operated for exactly 37 minutes before falling off again. All storage nodes fell off at the same time - since they were all rebooted at the same time.
[Thu Aug 18 08:29:09.737517 2016] [mpm_prefork:notice] [pid 1504] AH00170: caught SIGWINCH, shutting down gracefully [Thu Aug 18 08:29:22.823456 2016] [core:notice] [pid 1503] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0 [Thu Aug 18 08:29:22.907692 2016] [suexec:notice] [pid 1503] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec) [Thu Aug 18 08:29:23.127056 2016] [ssl:warn] [pid 1503] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366) AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using fe80::215:5dff:fe01:a00. Set the 'ServerName' directive globally to suppress this message [Thu Aug 18 08:29:23.149139 2016] [auth_digest:notice] [pid 1503] AH01757: generating secret for digest authentication ... [Thu Aug 18 08:29:23.149672 2016] [lbmethod_heartbeat:notice] [pid 1503] AH02282: No slotmem from mod_heartmonitor [Thu Aug 18 08:29:23.150447 2016] [ssl:warn] [pid 1503] AH02292: Init: Name-based SSL virtual hosts only work for clients with TLS server name indication support (RFC 4366) [Thu Aug 18 08:29:23.230529 2016] [mpm_prefork:notice] [pid 1503] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.24 configured -- resuming normal operations [Thu Aug 18 08:29:23.230555 2016] [core:notice] [pid 1503] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND' [Thu Aug 18 09:07:34.373250 2016] [:error] [pid 2727] [client 10.51.1.53:44934] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/lib/fog/eventmanager.class.php:62\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/lib/fog/eventmanager.class.php(62): iterator_to_array(Object(RegexIterator), false)\n#4 /var/www/html/fog/commons/init.php(137): EventManager->load()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/lib/fog/eventmanager.class.php on line 62 [Thu Aug 18 09:07:38.374827 2016] [:error] [pid 2727] [client 10.51.1.53:45122] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Thu Aug 18 09:07:42.356020 2016] [:error] [pid 2727] [client 10.51.1.53:45330] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14 [Thu Aug 18 09:07:46.249272 2016] [:error] [pid 2727] [client 10.51.1.53:45518] PHP Fatal error: Uncaught exception 'UnexpectedValueException' with message 'RecursiveDirectoryIterator::__construct(/var/www/html/fog/lib/plugins/capone/html/images): failed to open dir: Too many open files' in /var/www/html/fog/commons/init.php:14\nStack trace:\n#0 [internal function]: RecursiveDirectoryIterator->__construct('/var/www/html/f...', 4096)\n#1 [internal function]: RecursiveDirectoryIterator->getChildren()\n#2 [internal function]: FilterIterator->next()\n#3 /var/www/html/fog/commons/init.php(14): iterator_to_array(Object(RegexIterator))\n#4 /var/www/html/fog/commons/init.php(121): Initiator->__construct()\n#5 /var/www/html/fog/commons/base.inc.php(10): require('/var/www/html/f...')\n#6 /var/www/html/fog/status/bandwidth.php(2): require_once('/var/www/html/f...')\n#7 {main}\n thrown in /var/www/html/fog/commons/init.php on line 14
-
Found a blurb about this error on Stack Overflow:
http://stackoverflow.com/questions/14748499/fatal-error-too-many-open-files
Quote:
Don’t store DirectoryIterator objects for later; you will get an error saying “too many open files” when you store more than the operating system limit (usually 256 or 1024).
For example, this will yield an error if the directory has too many files:
<?php $files = array(); foreach (new DirectoryIterator('myDir') as $file) { $files[] = $file; } ?>
Presumably, this approach is memory intensive as well.
source: http://php.net/manual/pt_BR/directoryiterator.construct.php#87425
-
Here is a more detailed post about it:
-
I’ve found this site that shows how to set new limits properly on Ubuntu.
https://www.drupal.org/node/474152
I’ve converted the commands for CentOS 7/Fedora/RHEL - and made it into an SSH command ran against a remote server.
ssh aifog "echo 'apache soft nofile 1600' >> /etc/security/limits.conf;echo 'apache hard nofile 65535' >> /etc/security/limits.conf;systemctl restart httpd"
Note that the above is an ssh command that uses aliases and pki-based certificates - that my project called ssh-pki-setup has already established for me. That command executes the commands in double quotes against the remote server, and it successfully added the options to the remote server’s file and restarts httpd.
If successful, I’ll do this for all storage nodes - and post commands that can be ran manually too if you don’t have so many servers to manage as I do and prefer manual.
-
After the last reboot - all storage nodes are still reporting. But I assume they will eventually fail. I’m just waiting for that to happen - to see if the one I modified hangs in there.
In the mean time, I’ve been watching the number of files open by apache with this command I put together:
lsof -l -u apache | wc -l
Open files have been steadily growing:
[root@wgfog ~]# lsof -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6316 [root@wgfog ~]# lsof -u apache -l | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6412 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6424 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6432 [root@wgfog ~]# clear [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6444 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6448 [root@wgfog ~]# [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6448 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6456 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6456 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6464 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6472 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 7010 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 7538 [root@wgfog ~]# [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6444 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6448 [root@wgfog ~]# [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6448 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6456 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6456 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6464 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 6472 [root@wgfog ~]# lsof -l -u apache | wc -l lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/42/gvfs Output information may be incomplete. 7010
-
Ok, the node I’ve been monitoring open files on - just fell off.
And - the magical limit is…
8006
files open by Apache - causes everything to just stop.And… the one storage node I modified… it stopped reporting too…
-
This article seems relevant but I don’t see how it could help in the case of FOG. https://httpd.apache.org/docs/current/vhosts/fd-limits.html
-
I think this must be a file descriptors leak in the storage node configuration.
I’m going to patch it myself after lunch.
-
It would also appear that Apache is upping the file descriptors limits as it can - but it just can’t keep up, because there are so many files getting opened so fast, and not closed.
-
Figured out the bandwidth monitor is the primary file that’s getting so many calls.
bandwidth.php
Currently hunting down every place where it’s used - including it’s define line in
config.class.php
calledSTORAGE_BANDWIDTHPATH
-
[root@mastaweb fog]# grep -rl 'fopen' . ./status/logtoview.php ./lib/fog/eventmanager.class.php ./lib/fog/reportmaker.class.php ./lib/fog/hookmanager.class.php ./lib/fog/tasktype.class.php ./lib/fog/fogbase.class.php ./lib/fog/schema.class.php ./lib/fog/fogpage.class.php ./lib/pages/fogconfigurationpage.class.php ./lib/db/mysqldump.class.php ./lib/service/fogservice.class.php ./lib/client/snapinclient.class.php ./client/download.php [root@mastaweb fog]# grep -rl 'readfile' . [root@mastaweb fog]# grep -rl 'file_get_content' . ./status/bandwidth.php ./lib/fog/fogbase.class.php
These are a list of all files that are calling fopen and file_get_contents.
-
I’m struggling to find the file descriptor leak.
For now, I’ve patched together a way to keep things going. This is a CRON script that executes ssh commands against remote servers - and uses existing pki ssh authentication. I’ve set it up to run every 2 minutes.
echo=$(command -v echo) ssh=$(command -v ssh) array=( aifog annex bmfog clfog cvfog ckfog dufog fmfog hffog jwfog lhfog prfog rofog wgfog ) for i in "${array[@]}" do $ssh $i "lsof -l -u apache | wc -l > /root/apacheOpenFiles.txt" > /dev/null 2>&1 openApacheFiles=$($ssh $i "cat /root/apacheOpenFiles.txt") if [[ "$openApacheFiles" -gt 8000 ]]; then $ssh $i "killall --user apache" > /dev/null 2>&1 fi done
-
Additionally - apache error logs are filling up really fast on all the storage nodes from this issue. I’m going to have to script clearing them.
-
Cross-linking similar thread:
https://forums.fogproject.org/topic/8314/fog-log-viewer-for-nodes-giving-error