Not sure if it’s a bug or a feature: high FOG server CPU on dashboard
-
So I’d ask if there is any way possible to send the bandwidth info to the client and then let the client draw/render the graph instead of the server. Or is my understanding not accurate with how it works presently?
-
@Wayne-Workman I don’t understand what you mean?
If you make the client do the rendering as you describe, you’re not testing the server, you’re testing the client’s bandwidth.
The way it works, the client does do all the rendering already, but it needs to get the data from the server.
So the first request just starts the read, but it knows nothing at that point. The second request (1 second later) is how it determines the bandwidth (and of course all the subsequent stuff).
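A minimal sketch of why two reads are needed, assuming the server returns cumulative byte counters. The field names (timeMs, rxBytes, txBytes) and the helper are made up for illustration and are not taken from the FOG source:

```javascript
// Hypothetical helper (not FOG code): derive a rate from two cumulative
// counter samples. A single sample only gives a running total, so nothing
// can be computed until the second sample arrives.
function bandwidthBetween(first, second) {
  const seconds = (second.timeMs - first.timeMs) / 1000;
  if (seconds <= 0) {
    throw new RangeError("samples must be taken at increasing times");
  }
  return {
    rxBytesPerSec: (second.rxBytes - first.rxBytes) / seconds,
    txBytesPerSec: (second.txBytes - first.txBytes) / seconds,
  };
}

// Two samples taken one second apart (made-up numbers).
const firstSample = { timeMs: 0, rxBytes: 1000000, txBytes: 500000 };
const secondSample = { timeMs: 1000, rxBytes: 1250000, txBytes: 500000 };
console.log(bandwidthBetween(firstSample, secondSample));
```

Dividing by the measured elapsed time, rather than assuming exactly one second, keeps the figure correct if the polling interval changes.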
-
@Tom-Elliott So how does the process you just described use so much CPU?
-
Every second there are two polling requests to the server.
With multiple pages (let’s just say 2 tabs for now) you have 2 more polling requests to the server.
So with 3 tabs you are up to 6 polling requests; add 3 more tabs and you’re at 12 polling requests.
The server has to process every one of those requests, and the number arriving each second scales with the number of tabs open.
It’s effectively getting DDoS’d at that point. More tabs equals more polling each time.
There isn’t a simple way to handle it, though I could just make the bandwidth a selectable element.
Now mind you, I also do the same type of checks for the client count. So really it’s being hit with 4 polling requests per second per tab. This is why things can get lost in action.
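The scaling described above can be written as a small back-of-the-envelope model. This is only an illustration; the function name and parameters are hypothetical and nothing here is FOG code:

```javascript
// Hypothetical load model: every open dashboard tab issues a fixed number
// of polling requests per cycle, so the server-side load grows linearly
// with the number of tabs.
function requestsPerSecond(tabs, requestsPerTabPerCycle, cycleSeconds) {
  return (tabs * requestsPerTabPerCycle) / cycleSeconds;
}

// 4 requests per tab per 1 second cycle, as described in the post.
for (const tabs of [1, 3, 6]) {
  console.log(tabs + " tab(s): " + requestsPerSecond(tabs, 4, 1) + " requests/second");
}
```

The third parameter is there so a longer polling cycle can be plugged into the same model.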
-
This is pure speculation, but looking in the HTTP access log, as I posted below, this page is being requested every 1 second.
POST /fog/management/index.php?node=home
What would happen if it takes longer than 1 second to render the page, before the next POST request comes in? Would increasing this page refresh to 10 seconds have any negative impact? I tried looking for a meta refresh tag in the page, but it looks like the page is being refreshed with JavaScript. I got lost tracing the source of the page refresh. Tom, without digging too deep in your memory, do you know where this page refresh request is coming from, so I can change the refresh to 10 seconds to see if that has any negative impact?
-
@george1421 The request is indeed being handled by jquery. It’s a literal timeout (setTimeout(<function>,1000)) in /var/www/fog/management/js/fog/fog.dashboard.js.
The functions that call the “refreshes” are (as you pointed out) UpdateClientCount and UpdateBandwidth. The $.ajax calls in those functions make the requests, and the timeouts are set in their complete callbacks.
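As a rough sketch of that pattern (not the actual fog.dashboard.js code), a self-rescheduling poll looks something like the following. The names startPolling, request and schedule are hypothetical; request stands in for the $.ajax call and invokes the callback it is given when the response completes:

```javascript
// Sketch of a poll loop that queues the next request only after the
// current one has completed, as happens when setTimeout is called from a
// `complete` callback. Requests therefore never overlap: a slow response
// delays the next poll instead of stacking up behind it.
function startPolling(request, intervalMs, schedule = setTimeout) {
  let stopped = false;

  function tick() {
    if (stopped) return;
    request(function onComplete() {
      // Runs once the request has finished, whether or not it succeeded.
      if (!stopped) schedule(tick, intervalMs);
    });
  }

  tick(); // first request goes out immediately
  return function stop() {
    stopped = true;
  };
}
```

With this shape, changing the refresh rate is a matter of changing intervalMs (1000 for the 1 second behaviour), provided anything that assumes a 1 second sample spacing is adjusted as well.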
Updating to 10 seconds will have a negative impact, not in the sense of polling, but the actual bandwidth determinations are calculated based on a 1 second interval. Also, the limiters (2 minutes, 10 minutes, 30 minutes, and 1 hour) are set based on 1 second intervals.
So to properly update to a 10 second refresh rate (particularly for bandwidth), we would need to recalculate the timings. For example, at a 1 second interval the 2 minute limiter is 120 data points; at 10 seconds, it’s 12. So not difficult, just a bit of coding changes.
I’ll make those edits and hopefully things will be a bit better.
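The limiter arithmetic can be sketched as follows. The helper name and the table of windows are illustrative only and are not how FOG stores these values:

```javascript
// Hypothetical helper: how many data points a graph window holds at a
// given polling interval (window length divided by sample spacing).
function limiterSamples(windowSeconds, intervalSeconds) {
  return Math.floor(windowSeconds / intervalSeconds);
}

// The four limiters mentioned above, expressed in seconds.
const windows = {
  "2 minutes": 120,
  "10 minutes": 600,
  "30 minutes": 1800,
  "1 hour": 3600,
};

for (const [label, seconds] of Object.entries(windows)) {
  console.log(
    label + ": " + limiterSamples(seconds, 1) + " points at 1 s, " +
    limiterSamples(seconds, 10) + " points at 10 s"
  );
}
```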
-
My intent wasn’t to have you change the official image. I was looking to test to see if it was a positive move or not. This would avoid mucking up something that is working.
But in the back of my head, even for dynamic bandwidth management a 10 second refresh is more than adequate. While it’s “nice to know” information, a 1 second update cycle is very fast. And what will the tech do with that information if bandwidth were to spike to 100MB/s for 3 seconds? Understand this is just my distorted view, but since this is only FYI info, let’s not tax the system too much when it could dedicate those CPU resources to actually pushing the image or managing the client.
-
And I’m done. I added the code to make it functional and accurate for a 10 second span.
-
@Tom-Elliott Well you either fixed the issue or really broke it.
Actually, watching top I see the http process pop up in the list, but never over 0.4% of the CPU. And it is only a single process that is showing up. So unless something else is impacted by this change, it looks like you nailed it.
-
I consider this issue resolved. Thank you!!
-
Cool. Thanks for the report and hopefully this helps others. Though I still don’t think 1 second was an issue, as most (I imagine) don’t just load the fog management page and leave with it sitting on the dashboard.
-
The only reason I visit the home page is for two things - the storage pie chart and bandwidth. That’s just me though.
I’m glad to see this improvement made. It’ll make FOG work a lot better on older, lower-powered systems.