Tons of httpd processes
-
I made this script to monitor the httpd processes. The problem was, I couldn’t easily figure out where the issue was when going through thousands of lines of text that all look the same to me. I needed a way to make it more visual than plain text, so I modified my script to use color to represent the severity of the problem: it highlights each line based on the number of httpd processes returned.
So… if it returned 100 processes, the text would be highlighted bright red.
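To make the severity scale concrete, here is a minimal sketch of just the count-to-color step, pulled out into a hypothetical `severity_color` function (not part of the script itself). It assumes the same rule the script uses: 5 points of red per process, capped at 255, so anything at 51 or more processes is already full red. The red channel is padded to two hex digits so small counts still produce a valid six-digit CSS color.

```bash
#!/bin/bash
# Hypothetical helper: map an httpd process count to a CSS color "#RR0000".
# Each process adds 5 to the red channel (the "seriousness multiplier"),
# capped at 255, so 51 or more processes give full bright red.
severity_color() {
    local red=$(( $1 * 5 ))
    if (( red > 255 )); then
        red=255
    fi
    # %02x pads to two hex digits so the color always has six digits.
    printf '#%02x0000\n' "$red"
}
```

For example, `severity_color 100` prints `#ff0000` and `severity_color 1` prints `#050000`.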
Updated script:
```bash
#!/bin/bash
# Get the date.
dt="$(date +"%I:%M %p %m-%d-%Y")"

# Get number of running httpd instances ('[h]ttpd' keeps grep from counting itself).
x=$(/usr/bin/ps -ef | /usr/bin/grep '[h]ttpd' | /usr/bin/wc -l)

# Seriousness multiplier.
color=$(( x * 5 ))

# Don't let it go over the maximum.
if [[ $color -gt 255 ]]; then
    color=255
fi

# Convert to two-digit hex so the CSS color always has six digits.
hexR=$(printf '%02x' "$color")

# Print the line to the top of the file (newest entries first).
echo "<p style=\"color: #66ccff; background-color: #${hexR}0000\">$x httpd instances running. $dt</p>" |
    cat - /var/www/html/httpd.html > /var/www/html/temp && mv /var/www/html/temp /var/www/html/httpd.html
chown apache:apache /var/www/html/httpd.html
```
Here’s my `crontab -e` entry for it; this makes it run every minute:

```
* * * * * /root/monitor.sh
```
Code to create the initial HTML file on Fedora/CentOS:

```bash
rm -f /var/www/html/httpd.html
touch /var/www/html/httpd.html
chown apache:apache /var/www/html/httpd.html
```
Sample output is below. I artificially manipulated the last return value in the script just to demonstrate the color difference.
-
Yesterday was a snow day here, so there were no users, but some computers did wake up via WOL at 7:30. That had no effect on the number of httpd processes.
I’ve noticed small spikes at times when I’d presume large numbers of people are logging into computers at roughly the same time.
But I’ve seen nothing even close to the 127 reported a week ago.
So far, the highest I’ve seen since I made this monitoring tool is 36, and oddly, that was when I was updating FOG this morning.
-
OK, so I’ve found an instance where the number of httpd processes gets out of control.
Interestingly, right before that, this happened:
Looking at the results… it’s possible that at almost 8:00 AM on Monday, a lot of computers were turned on all at once. But what doesn’t make sense to me is why I don’t see this sort of spike every morning.
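One way to check whether the spike repeats every morning is to pull each day’s peak out of the page the monitor writes. Here’s a rough sketch, assuming the page format produced by the script above (one `N httpd instances running. HH:MM AM/PM MM-DD-YYYY` line per minute); `daily_peaks` is a hypothetical helper name, and it reports only the peak count per day, not when it happened.

```bash
#!/bin/bash
# Hypothetical helper: read the monitor page on stdin and print
# "MM-DD-YYYY peak" for each day, using the "N httpd instances running." lines.
daily_peaks() {
    grep -o '[0-9][0-9]* httpd instances running\. [0-9:]* [AP]M [0-9-]*' |
        # Fields: $1 = count, $5 = time, $6 = AM/PM, $7 = date.
        awk '{ n = $1 + 0
               if (!($7 in max) || n > max[$7]) max[$7] = n }
             END { for (d in max) print d, max[d] }' |
        # MM-DD-YYYY sorts correctly within a single year.
        sort
}
```

Usage: `daily_peaks < /var/www/html/httpd.html`.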
-
This phenomenon did not happen this morning… @Moderators @Developers Thoughts?
-
It just happened again, right after I snapshotted the VM and updated FOG.
Holy cow, 220 httpd processes…
-
@Jbob I’m blaming this on the new FOG client.
Whenever the FOG server updates, reboots, or has a snapshot taken, the Apache service goes down or the server becomes unresponsive for a brief time.
I think that when the FOG client cannot communicate with the server, something causes the clients to retry rapidly, over and over. Can you please look into it?
-
Looking through the logs, this happened shortly after yesterday’s episode, roughly 11 minutes later.
-
I’ve set my client check-in time to two minutes.
This was right after a snapshot was taken in Hyper-V:
This was immediately after @Jbob finished tinkering with SELinux:
-
@Jbob does the new client do anything in the event of the host shutting down?
-
I’ve made some headway on this particular issue. I’ve realized that a few clients have an unusually large number of connections to the server.
I’ve modified my `monitor.sh` script to be the following:

```bash
#!/bin/bash
netstat=/usr/bin/netstat
page=/var/www/html/httpd.html

# Prepend one line of HTML to the page so the newest entries are on top.
prepend() {
    echo "$1" | cat - "$page" > /var/www/html/temp && mv /var/www/html/temp "$page"
}

# Get the date.
dt="$(date +"%I:%M %p %m-%d-%Y")"

# Get number of running httpd instances ('[h]ttpd' keeps grep from counting itself).
x=$(/usr/bin/ps -ef | /usr/bin/grep '[h]ttpd' | /usr/bin/wc -l)

# Seriousness multiplier; don't let it go over the maximum.
color=$(( x * 5 ))
if [[ $color -gt 255 ]]; then
    color=255
fi

# Convert to two-digit hex so the CSS color always has six digits.
hexR=$(printf '%02x' "$color")

# Top 4 remote IPs by connections on port 80, joined by xargs onto one
# space-delimited line: "count IP count IP ...".
topOffenders=$("$netstat" -tn 2>/dev/null | grep ':80 ' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr | head -n 4 | xargs)
prepend "<p>Top Offenders: $topOffenders</p>"

# Total TCP connections on port 80 (-n skips slow DNS lookups).
httpCount=$("$netstat" -tn 2>/dev/null | grep -c ':80 ')
prepend "<p>Number of http connections: $httpCount</p>"

# Print the line to the file.
prepend "<p style=\"color: #66ccff; background-color: #${hexR}0000\">$x httpd instances running. $dt</p>"
chown apache:apache "$page"
```
The output now looks like this:
The “Top Offenders” area lists the number of connections from a host followed by that host’s IP, all space-delimited. I limited it to 4 IPs.
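To show how that “Top Offenders” line is built, here’s essentially the same netstat pipeline wrapped in a hypothetical `top_offenders` function that reads `netstat -tn`-style text on stdin, so it can be tried against a few made-up sample lines. It assumes the server listens on port 80 and that the 5th column is the foreign address, as in `netstat -tn` output.

```bash
#!/bin/bash
# Hypothetical helper: read `netstat -tn`-style lines on stdin and print the
# top 4 remote IPs as "count IP count IP ...".
top_offenders() {
    # Keep lines with an address ending in :80, take the 5th column (foreign
    # IP:port), drop the port, count per IP, keep the busiest 4, and let
    # xargs join them onto one space-delimited line.
    grep ':80 ' | awk '{ print $5 }' | cut -d: -f1 |
        sort | uniq -c | sort -nr | head -n 4 | xargs
}
```

On a live server this would be used as `netstat -tn | top_offenders`.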
Interestingly enough, the FOG server has around 50 connections to itself.
-
This is after a server reboot at 8:33 this morning.
-
Fun fun. This was during a FOG update, I believe.
```
12 httpd instances running. 01:34 PM 02-02-2016
Number of http connections: 2004
Top Offenders: 40 10.2.3.11 25 10.2.3.108 20 10.2.4.9 20 10.2.4.36
34 httpd instances running. 01:33 PM 02-02-2016
Number of http connections: 2245
Top Offenders: 25 10.2.3.218 23 10.2.3.157 22 10.2.3.180 22 10.2.3.108
94 httpd instances running. 01:32 PM 02-02-2016
Number of http connections: 1462
Top Offenders: 73 10.2.3.11 23 10.2.3.178 22 10.2.3.156 20 10.2.4.47
155 httpd instances running. 01:31 PM 02-02-2016
Number of http connections: 2196
Top Offenders: 47 10.2.3.11 19 10.2.3.92 19 10.2.3.162 18 10.2.4.45
69 httpd instances running. 01:30 PM 02-02-2016
Number of http connections: 386
Top Offenders: 16 10.2.3.178 15 10.2.3.218 13 10.2.3.197 13 10.2.3.153
1 httpd instances running. 01:29 PM 02-02-2016
Number of http connections: 930
Top Offenders: 20 10.2.3.222 17 10.2.4.38 17 10.2.4.1 16 10.2.3.89
48 httpd instances running. 01:28 PM 02-02-2016
Number of http connections: 1753
Top Offenders: 20 10.2.3.109 18 10.2.4.47 18 10.2.3.242 18 10.2.3.182
32 httpd instances running. 01:27 PM 02-02-2016
Number of http connections: 1472
Top Offenders: 24 10.2.3.108 23 10.2.4.40 22 10.2.3.204 20 10.2.4.36
76 httpd instances running. 01:26 PM 02-02-2016
Number of http connections: 1583
Top Offenders: 20 10.2.3.13 19 10.2.3.92 19 10.2.3.14 18 10.2.3.163
112 httpd instances running. 01:25 PM 02-02-2016
Number of http connections: 1478
Top Offenders: 20 10.2.3.218 19 10.2.3.93 18 10.2.4.47 18 10.2.3.7
115 httpd instances running. 01:24 PM 02-02-2016
Number of http connections: 1251
Top Offenders: 18 10.2.3.14 17 10.2.3.194 17 10.2.3.184 16 10.2.3.19
57 httpd instances running. 01:23 PM 02-02-2016
Number of http connections: 1762
Top Offenders: 23 10.2.3.205 21 10.2.3.108 20 10.2.3.221 20 10.2.3.180
117 httpd instances running. 01:22 PM 02-02-2016
Number of http connections: 1556
Top Offenders: 23 10.2.3.105 22 10.2.3.93 21 10.2.3.87 21 10.2.3.216
177 httpd instances running. 01:21 PM 02-02-2016
Number of http connections: 1779
Top Offenders: 21 10.2.32.235 20 10.2.3.160 19 10.2.3.218 18 10.2.4.1
203 httpd instances running. 01:20 PM 02-02-2016
Number of http connections: 1295
Top Offenders: 13 10.2.3.108 12 10.2.3.19 12 10.2.3.185 11 10.2.4.94
15 httpd instances running. 01:19 PM 02-02-2016
Number of http connections: 322
Top Offenders: 18 10.2.3.143 16 10.2.3.27 16 10.2.3.165 16 10.2.3.110
17 httpd instances running. 01:18 PM 02-02-2016
Number of http connections: 2010
Top Offenders: 54 10.2.3.11 23 10.2.3.152 21 10.2.3.163 20 10.2.4.40
```
-
So I figured out that 10.2.3.119 consistently had a high number of http connections. Turns out, that’s my desktop, which makes sense because I’m usually logged into FOG’s web management interface while I’m looking at these numbers.
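Spotting a consistently busy IP by eye works, but as a hypothetical sketch, the page the monitor writes can also be tallied: this counts how many “Top Offenders” entries each IP appears in, most frequent first. It assumes each entry sits on one line in the form `Top Offenders: count IP count IP ...`; `offender_frequency` is an illustrative name, not part of the monitor script.

```bash
#!/bin/bash
# Hypothetical helper: read the monitor page on stdin and count how many
# "Top Offenders" entries each IP appears in, most frequent first.
offender_frequency() {
    # Fields after grep: "Top" "Offenders:" count IP count IP ...
    # so the IPs are every second field starting at field 4.
    grep -o 'Top Offenders:[^<]*' |
        awk '{ for (i = 4; i <= NF; i += 2) print $i }' |
        sort | uniq -c | sort -nr
}
```

Usage: `offender_frequency < /var/www/html/httpd.html | head`.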
The spikes in httpd instances are definitely a buildup of clients wanting to connect to the FOG server, caused by me either updating, snapshotting, or rebooting the FOG server. It seems like if communication between the FOG clients and the FOG server fails, the clients start spamming the FOG server. @Jbob
I’ll confirm or rule this out with Wireshark. I’ll measure the average HTTP traffic while the FOG server is on, and again while the FOG server is off. If the traffic goes wild while the server is off, that’s a problem.
More on this later.
-
So - just ran some basic tests.
Using Wireshark with a capture filter of just `tcp port 80`, I did two captures. Each capture lasted exactly 300 seconds.
Capture 1 (baseline): packets sent to the FOG server on port 80: 61,933
Capture 2 (HTTP blocked): packets sent to the FOG server on port 80: 9,771
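The captures above were done in the Wireshark GUI. As a rough command-line equivalent, here’s a sketch assuming `tshark` (Wireshark’s CLI) is installed; the helper names, the interface, and the server IP passed in are placeholders, not details from the original tests. `rate` turns a packet count into packets per second so the two runs are easier to compare.

```bash
#!/bin/bash
# Hypothetical helpers; assumes tshark (Wireshark's command-line tool) is installed.

# Capture 300 seconds of port-80 traffic on interface $1 into file $2.
capture_port80() {
    tshark -i "$1" -f 'tcp port 80' -a duration:300 -w "$2"
}

# Count packets in capture file $1 sent to server IP $2 on port 80.
count_to_server() {
    tshark -r "$1" -Y "ip.dst == $2 && tcp.dstport == 80" | wc -l
}

# Packets per second: $1 packets over $2 seconds, to one decimal place.
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}
```

With the numbers above, `rate 61933 300` prints `206.4` and `rate 9771 300` prints `32.6`, so the blocked run saw roughly a sixth of the baseline packet rate.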
So my theory was wrong: the new client does not start spamming if the FOG server is offline. It appears that when the FOG Client realizes the FOG server is back online, all the encryption/communication comes at once, and it’s just A LOT for my server to deal with.
-
@Wayne-Workman Do we need to further look into this!?
-
@Sebastian-Roth I’m not having load issues anymore. The processor was under high utilization because Fedora 23 has issues inside Hyper-V, and the polling was causing too much load.
But Tom and, I think, Jbob have begun working on limiting the polling to just 1 per checkup instead of many.
-
I think I can make this tool into a plugin… just a thought…