Group deployment with snapins failed and lost GUI access. FOG server is down
-
I deploy using groups to do my monthly updates for Java and Adobe products, plus shortcut changes. I put everyone in a group like I always do and made sure everyone was associated with the snapins required for this push. I pushed to everyone in the group. When I push to this many PCs, the FOG website normally turns very slow and sometimes times out. It turned very slow as usual, and then I lost the website. I figured I would give it some time, but users started to call in and said their PCs had rebooted and Java had been uninstalled. This happened on 254 PCs today. After rebooting the FOG server I still don't have web access. I reinstalled it thinking I would gain access again. I got the login screen and got to the home page. When I clicked on Tasks to see what the FOG server was doing, I got "page can't be displayed". Now that is all I see.
I am running FOG 1.5.4.
-
I want you to search for the www.conf file (it should be under /etc) using:
find / -name www.conf
It should be in a parent directory with php-fpm in the name. You should find only one of these files. You need to edit it.
Search the file and confirm these settings are correct:
php_admin_value[memory_limit] = 256M
pm.max_requests = 2000
pm.max_children = 50
pm.min_spare_servers = 5
pm.start_servers = 5
You may find that max children is set to 35 and that memory limit is commented out and set to 32MB. Set them to match the above values, save the www.conf file, then reboot the FOG server. That "should" address the web GUI unresponsiveness.
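Before editing, it can help to see how those five settings are currently written in the file. The following is a minimal sketch, not part of FOG: the function name show_fpm_settings is made up for illustration, and the path you pass in is whatever the find command above reported.

```shell
# Print how each php-fpm pool setting from the post is currently written in www.conf.
# Lines that start with ';' are comments in this file, so a commented-out
# memory_limit shows up with its leading ';' intact.
show_fpm_settings() {
    conf="$1"   # path to www.conf, as reported by: find / -name www.conf
    for key in 'php_admin_value[memory_limit]' pm.max_requests pm.max_children \
               pm.min_spare_servers pm.start_servers; do
        found=$(awk -v k="$key" '{
            line = $0
            sub(/^[; \t]*/, "", line)                # ignore a leading ";" or indent
            if (index(line, k) == 1) {               # fixed-string match on the key
                rest = substr(line, length(k) + 1)
                if (rest ~ /^[ \t]*=/) print         # keep only "key = value" lines
            }
        }' "$conf")
        if [ -n "$found" ]; then
            printf '%s\n' "$found"
        else
            printf '%s: not set\n' "$key"
        fi
    done
}
```

Call it as show_fpm_settings followed by the path to www.conf. It only reads the file; the edits themselves still have to be made by hand, followed by the reboot described above.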
-
I did find one file, as you said. The only things I needed to change were the admin value memory limit and max children. I rebooted the server and still no joy. Somehow, since the update, when I type http://fogserver it does not automatically resolve to the FOG site like it used to. Now it gives the Apache "everything works" test page. Anyway, when I type in http://fogserver I get the Apache page, but it takes FOREVER, like 5 secs. It used to take 30 secs, so it is a lot better there. When I try the full URL I get this in Chrome:
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
-
@jhalbert OK, when you key in
http://<fogserver_ip>/fog
does it give you what you expect? This sounds like a redirect issue, but first try the above.
Also, let's start by collecting a bit more information. What host OS is your FOG server running under?
If (from the fog server linux console) you key in
top
Then press the capital P to sort by CPU usage. What programs are at the top? Do you see apache/httpd or php-fpm? (Use the q key to quit the top program when done.)
-
Website - http://fogserver/fog (I created a DNS record for the IP. To keep it simple, let's call the IP address fogserver.) I tried both fogserver and the IP with the same results: page can't be displayed, and the same in Chrome.
OS - CentOS 7.5.1804
PHP-FPM - Yes, it's running
httpd - YES (I guess in CentOS there's no Apache by itself, only httpd.)
I did a status on all of them:
systemctl status
httpd - active
php-fpm - active
CPU usage is .5 avg. <-- super low. Normally it's like 1.5 to 2.5 during normal working hours.
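For a non-interactive snapshot of the same information that top shows, the process list can be sorted by CPU and filtered down to the web stack. This is only a sketch: web_procs is a hypothetical helper name, and it assumes the process names httpd, apache2, or php-fpm appear in the second column of the ps output.

```shell
# Keep the header row plus any Apache (httpd/apache2) or php-fpm rows.
# Reads ps-style output on stdin where column 2 is the command name.
# Typical use on the FOG server:
#   ps -eo pid,comm,%cpu,%mem --sort=-%cpu | web_procs
web_procs() {
    awk 'NR == 1 || $2 ~ /^(httpd|apache2|php-fpm)/'
}
```

If nothing but the header comes back, neither service has a running process, regardless of what the service status says.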
-
It gives the test page, and it took forever: about 45 secs.
-
I've added the suggested values that seem to work for most cases. Of course, this doesn't account for the issue you're describing, @jhalbert, but it hopefully changes the default values to something more operationally valid for most other users. From the sound of things, you may have multiple PHP versions installed? Just a thought.
When you say it takes forever, is it as if the php-fpm configuration isn't taking effect, or does this happen after a php-fpm restart? What circumstances can we reproduce so that we can see the problem and come to a more suitable fix?
-
I ran php -i | grep 'PHP Version' and got the below.
PHP Version => 5.6.36
PHP Version => 5.6.36
I ran find / -name php | grep bin
/usr/bin/php
By the sound of things, I only have one version installed, and that is 5.6.36.
My setup is:
Dell server
1U
3 SAS hard drives, 10k each
RAID 0 across two drives and 1 drive not in RAID
The non-RAID drive is used by the host, which runs CentOS 7
Using QEMU as my VM software
The FOG server has 2 GB of RAM and 8 CPUs. It has the RAID volume, which gives it 600 GB of disk space
I followed your install steps found on the web and did everything suggested.
The very first install was about a year ago, and that was 1.4. There were minor problems, but you have helped me with those.
Two weeks ago, I updated to 1.5.4. Like I said, I noticed right after I updated that typing http://fogserver would give a test page. I had to type out the whole URL.
Step by step, how it happened:
The day I posted this, I used 2 groups I already had from 1.4. I changed their names to match their new job and changed the membership to match what I was doing. I clicked on Snapins and added all computers to the snapins, then reset everyone across both groups. I waited 30 minutes for the PCs to check in. I found a few new hosts because I used Group Policy to deploy the new version of the FOG client. We used a PowerShell script to spot check a few PCs and found that all had FOG client 11.16. I figured the Group Policy had worked and I was ready to deploy the updates. In my lab I ran all the snapins on 4 PCs and confirmed the scripts and installers were solid. It took 5 minutes and all 4 PCs had the updates with no issues.
Then I started on the groups. On the first group I clicked on Snapins, clicked on Adobe Reader and clicked do it now; back arrow, Adobe Flash and do it now; remove all versions of Java and do it now; Java install and do it now. I clicked on Groups and that is when I noticed the slowdown. Last time I did this, that was normal. I clicked on the next group, picked the remove all versions of Java snapin and do it now. That took a while, like 4 minutes. Next I tried Adobe Reader but I kept getting "page can't be displayed". I figured I would wait. I started to get calls that Java was not working. I told them to wait, Java would install. I waited 10 minutes and got a few more calls that PCs had been restarted, across whole departments. Java was removed on all those PCs. The rest of the tasks were never done.
I tried restarting the server and that did not help.
I tried restarting php and httpd, and did what George told me to do.
If you want, we can do a screen share and SSH into the server, and maybe we can figure this out?
-
I think I found the problem. The FOG website is super fast now. I am not sure how it happened, but SELinux was set to enforcing. I figured this out when I SSH'd to the server and got a warning that SELinux had disabled a small part of PHP. I was like WHAT!!! When I ran the command to take it out of enforcing, as root, it appeared to take it without issue. I rebooted, checked the conf file and found it still said enforcing. I changed it in the conf file and rebooted. I tried FOG and it came up in seconds. Looking at the conf file, I saw it had been set to enforcing right before I sent out the group command.
The Java uninstall script has NO REBOOT at the end. I ran the command with the no-reboot option and it did reboot the PC on all 4 PCs. However this happened, I think the script was not allowed to complete. The Java uninstall tasks are not pending on any PC. I see that yesterday around 11 am, right after George gave me those settings, tasks started to work. All the Flash tasks have stopped working and none of the PCs are progressing. I think that might have happened because I restarted the server so many times. I am going to kill all the tasks and try a small number to see what happens.
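On the SELinux point: the runtime mode and the mode applied at boot are stored separately, which would explain why the command appeared to work while the conf file still said enforcing after a reboot. Here is a minimal sketch for reading the boot-time setting. The function name selinux_boot_mode is made up for illustration, and /etc/selinux/config is assumed to be the conf file meant above (it is the usual location on CentOS 7).

```shell
# Report the SELinux mode that will apply at the next boot.
# The path argument is optional and defaults to /etc/selinux/config.
selinux_boot_mode() {
    conf="${1:-/etc/selinux/config}"
    # Take the last uncommented SELINUX= line and print only its value.
    # SELINUXTYPE= lines do not match because "=" must follow SELINUX directly.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$conf" | tail -n 1
}

# The runtime mode (what is in effect right now) comes from separate commands:
#   getenforce        # prints Enforcing, Permissive, or Disabled
#   setenforce 0      # as root: switch to permissive until the next reboot only
```

If selinux_boot_mode prints enforcing while getenforce prints Permissive, the change will be lost at the next reboot unless the SELINUX= line in the conf file is edited as well.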
I still have to type http://fogserver/fog and not just http://fogserver. Maybe you guys can help me fix that, or should I create a different post for it?
-
@jhalbert Wow, I'm glad you found that. SELinux being enabled was the last thing I would have thought of. The php-fpm settings are still the right settings. You shouldn't have issues imaging now (unless the firewall has re-enabled itself too).
-
@george1421 I left it all alone. I did run my tests and everything deployed just fine. I tried on 20 PCs and everything worked just like it did during testing. How do I close this thread? Can you close it, or how can I?