FOG server crashing on group deployment
-
Running 1.5.4 on Kubuntu 16. I used to be able to group-deploy an image to labs of 30-60 machines; FOG would queue them itself and image every machine. Now 6-8 start imaging, and after another 6-8 queue up, the FOG server crashes. I cannot reach the web GUI and get a gateway timeout error… I tried rolling back the kernel to 14.15.2, no luck.
-
This will most likely have to do with PHP-FPM crashing. Check the PHP-FPM logs in the FOG WebUI and post the logs here.
-
Attached is the Apache error log. It contains only the entries related to PHP.
0_1529421739894_apache log.txt
-
@zagaeski Looks like it failed to rename the kernel or make the directory (check permissions?).
Anyway, that’s not the important bit: you’re getting a lot of timeouts. Do you have a log file from PHP-FPM itself (regarding its children)? On some Debian-based systems it was stuck on a max_children of 5, which is very little if you’re going to deploy to a bunch of clients.
I’ve been trying out some settings on my own FOG installation that seem to work more smoothly, if you’re up for some experimentation (though it will have to wait until tomorrow).
-
@quazz It appears max_children was set to 5. I have edited the www.conf file to increase max_children to 100. I will see what changes this makes.
-
@zagaeski There are two changes I would like you to make:
- Drop the max children back to a reasonable number between 20 and 30. In reality you should not see more than 7 processes running at any one time. Setting this value too high may starve your system of memory if something really bad goes on. Leave the min and spare children set to 5.
- There is a memory limit commented out. It probably says
;php_admin_value[memory_limit] = 32M
Change it to look like this (make sure you uncomment the line and keep the M suffix)
php_admin_value[memory_limit] = 256M
Also check to see if this value is set to 2000
pm.max_requests = 2000
Finally, php-fpm has its own log file, so errors will be in both logs.
I helped someone the other day who was having issues with multicasting, and once we got the extra copies of php-fpm out of his system and used the above settings he was able to image 24 systems via multicast without issue.
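To put a rough number on the memory concern above: if every worker reached its memory_limit at the same moment, the pool could ask for max_children times that limit. The sketch below is only back-of-the-envelope arithmetic. The function name is made up for illustration, and real workers normally use far less than the limit, but it shows why 100 children at 256M is a much bigger commitment than 20 to 30.

```python
def worst_case_mb(max_children: int, memory_limit_mb: int) -> int:
    """Upper bound on PHP memory if every worker hits its memory_limit at once."""
    return max_children * memory_limit_mb


# Compare the default (5), the suggested range (30) and the value tried earlier (100),
# all with a 256 MB per-process limit.
for children in (5, 30, 100):
    print(f"{children} workers x 256M -> {worst_case_mb(children, 256)} MB worst case")
```

With a 256M limit, 30 children top out at 7680 MB, while 100 children could in theory ask for 25600 MB, which is more than many FOG servers have.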
-
@george1421 I made the suggested edits to the www.conf file. Do I need to do anything with php-fpm? I found where the log was located.
-
@zagaeski
Those edits should be done to the php-fpm config file. Then restart php-fpm. Sorry, I misread your post; just restart php-fpm and you should be good.
-
@george1421 This seems to have worked. I will try another large group tomorrow and see what happens.
-
@zagaeski Excellent. Next time you run the multicast, run the
top
program from the FOG server console. You should see 5-7 php-fpm worker processes running. I would be interested to see whether you get different results.
-
Hi, I have the same problem: FOG crashes when deploying to groups. This is really bad.
-
@fog_rob You should probably start your own thread, because your circumstances may be different.
But in general, follow my guidance in this post: https://forums.fogproject.org/topic/12116/fog-server-crashing-on-group-deployment/5
Check that these settings are in place in the php-fpm pool config (the www.conf file), and you should have a successful group deployment.
pm.max_children = 50
pm.max_requests = 2000
php_admin_value[memory_limit] = 256M
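One way to double-check the file before restarting php-fpm is to read back the active (uncommented) settings and compare them with the values above. This is a minimal sketch, not a FOG tool: the function names are hypothetical, the path in the usage comment is an assumption based on a typical Debian/Ubuntu PHP 7.1 layout, and the memory limit is written with an explicit M suffix because PHP reads a bare number as bytes.

```python
# Values taken from the post above; memory_limit carries an explicit M suffix.
WANTED = {
    "pm.max_children": "50",
    "pm.max_requests": "2000",
    "php_admin_value[memory_limit]": "256M",
}


def read_pool_settings(text):
    """Return the active key/value pairs from php-fpm pool config text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, ';' comments and section headers such as [www].
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def mismatches(text, wanted=WANTED):
    """Map each setting that differs to a (found, wanted) pair; found is None if unset."""
    found = read_pool_settings(text)
    return {k: (found.get(k), v) for k, v in wanted.items() if found.get(k) != v}


# Usage (the path is an assumption; adjust it to your PHP version):
# with open("/etc/php/7.1/fpm/pool.d/www.conf") as f:
#     print(mismatches(f.read()))
```

An empty result means all three settings are active with the expected values. A setting that is still commented out with a leading `;` shows up as `None`.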
-
Thank you George, I’ll try that.
FYI: I tried to image more than 10 computers manually with FOG, and it crashes after 10 as well.
-
@george1421 Thank you, George, for the hint, but after modifying the file the web UI stopped working and I had to reinstall FOG.
It is probably an access-rights problem with modifying the file, but I am not a Linux expert, so hopefully this will be fixed in an update. For now I’m doing 10 computers at a time with the storage set to five to make it work.
The display of storage info goes crazy too, see attachment.
-
I just had this exact issue this week when doing a clean install of 1.5.4. The install went fantastic, but when I went out to a lab and did a group deployment, I came back the next day to find the computers boot looping due to FOG timing out. I traced it down to a PHP issue, but ultimately ended up downgrading to 1.4.4, with no issues since.
-
@fog_rob Well, that’s disappointing. I know the parameters fix the multicast imaging issues. I can understand that if you had something out of place FOG would appear to stop, since php-fpm is the engine that runs the site and the www.conf file is the config file for that engine.
-
I am having the same issues as described above. I made the required changes and am now getting an HTTP 500 error. Any recommendations on how to diagnose it? Do I need to change the permissions on the PHP file? Apparently I am running out of memory, but I have plenty…
Apache Error Log:
[Mon Jul 30 13:50:38.882162 2018] [proxy_fcgi:error] [pid 1231] [client 192.168.1.3:44270] AH01071: Got error 'PHP message: PHP Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 8192 bytes) in /var/www/html/fog/lib/fog/fogpage.class.php on line 239\nPHP message: PHP Fatal error: Allowed memory size of 2097152 bytes exhausted (tried to allocate 114688 bytes) in Unknown on line 0\n'
Here are the last lines from the FPM log after I reloaded and rebooted.
[30-Jul-2018 13:42:49] NOTICE: reloading: execvp("/usr/sbin/php-fpm7.1", {"/usr/sbin/php-fpm7.1", "--nodaemonize", "--fpm-config", "/etc/php/7.1/fpm/php-fpm.conf"})
[30-Jul-2018 13:42:49] NOTICE: using inherited socket fd=8, "127.0.0.1:9000"
[30-Jul-2018 13:42:49] NOTICE: using inherited socket fd=8, "127.0.0.1:9000"
[30-Jul-2018 13:42:49] NOTICE: fpm is running, pid 9096
[30-Jul-2018 13:42:49] NOTICE: ready to handle connections
[30-Jul-2018 13:42:49] NOTICE: systemd monitor interval set to 10000ms
[30-Jul-2018 13:43:27] NOTICE: Terminating ...
[30-Jul-2018 13:43:27] NOTICE: exiting, bye-bye!
[30-Jul-2018 13:43:27] NOTICE: fpm is running, pid 9648
[30-Jul-2018 13:43:27] NOTICE: ready to handle connections
[30-Jul-2018 13:43:27] NOTICE: systemd monitor interval set to 10000ms
[30-Jul-2018 13:43:31] NOTICE: Terminating ...
[30-Jul-2018 13:43:31] NOTICE: exiting, bye-bye!
[30-Jul-2018 13:44:41] NOTICE: fpm is running, pid 1027
[30-Jul-2018 13:44:41] NOTICE: ready to handle connections
[30-Jul-2018 13:44:41] NOTICE: systemd monitor interval set to 10000ms
-
Got it to run by upping my memory limit to 1280M. Testing now. Currently only getting 200Mb/s transfer speed on eno1.
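For anyone hitting the same error: the 2097152 bytes in the Apache log is exactly 2 MiB, far below the 256 MB that was intended. One possible explanation, which is an assumption and not something confirmed in this thread, is that the limit was entered as a bare 256. PHP reads a size without a K, M or G suffix as plain bytes, and such a tiny value would then fall back to PHP's minimum, which I believe is 2 MiB in PHP 7. The sketch below mimics that shorthand rule; `php_bytes` is a hypothetical helper, not a PHP or FOG function.

```python
def php_bytes(value: str) -> int:
    """Convert a PHP shorthand size such as '256', '256M' or '1G' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    # No suffix: the number is taken as plain bytes.
    return int(value)


for setting in ("256", "256M", "1280M"):
    print(f"memory_limit = {setting:>5} -> {php_bytes(setting)} bytes")
```

Read this way, `256` is only 256 bytes, `256M` is 268435456 bytes, and the `1280M` that got things running is 1342177280 bytes.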
-
@reese I am having the same issue. How did you safely downgrade back to 1.4.4? I had no issues there, but now I can’t image an entire lab anymore; it just locks up. Hints on the downgrade process would be appreciated!