FOG Server CPU usage 100%

Fernando Gietz

I have captured the browser behaviour with the Firefox developer tools. Maybe the developer team can spot something in it.

You can download the JSON file from:

https://ehubox.ehu.eus/index.php/s/y4N08G4dJGVg03R

Fernando Gietz

I have made a new capture of the browser’s performance with the Firefox developer tools, and this is what I see:

Most of the time, the browser is calling the setTimeout function.

Sebastian Roth

Probably it’s best to wait for @Tom-Elliott to look into this. He knows the web UI better than anyone else!

Fernando Gietz

Hello!!
Any news about it?

Tom Elliott

Mind running the clear memory cache on the server? It seems to me that the setTimeout is being blocked somewhere. It shouldn’t have any issues running, since it is only called when the timeout is met.

Fernando Gietz

@tom-elliott said in FOG Server CPU usage 100%:

Mind running the clear memory cache on the server?

I tried clearing the memory cache on the server with the following command:

sync; echo 3 > /proc/sys/vm/drop_caches

But the problem persists!!

Any other ideas?

Fernando Gietz

One more thing.

My hosts table has 7000 entries. I think the problem is that the code may not be optimized for that many.
I dropped 6000 entries from the table, and now the response time is more normal: 10-15 seconds.

@Tom-Elliott how does FOG build the membership list?

Wayne Workman

@fernando-gietz I was thinking about this - last time someone had this issue, it was the mysql max connections being hit.

Maybe one of the FOG updates or an OS update (especially likely with Ubuntu systems) wiped out our max connection setting.

Anyways, here’s what might fix it:
https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_MySQL#Increase_maximum_simultaneous_MySQL_connections

Make sure to do the query in mysql AND to change the conf setting too, so the change persists across reboots!
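
For illustration, the two steps could look roughly like the sketch below. The value 1000 is only a placeholder (an assumption, not a figure from the wiki), and the config file location varies by distro, so the path in the comment is an assumption as well. The script just prints the two pieces so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Placeholder value (an assumption); size it to what your server can handle.
NEW_MAX=1000

# Step 1: SQL for the running server. Takes effect immediately,
# but is lost when MySQL restarts.
runtime_sql="SET GLOBAL max_connections = ${NEW_MAX};"

# Step 2: line for the [mysqld] section of the MySQL config file
# (for example /etc/mysql/my.cnf; the path is distro-dependent),
# so the setting survives a restart.
conf_line="max_connections = ${NEW_MAX}"

printf '%s\n' "$runtime_sql" "$conf_line"

# To check the current value and apply step 1 by hand:
#   mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
#   mysql -u root -p -e "SET GLOBAL max_connections = 1000;"
```

Doing only step 1 explains the "wiped out after an update" symptom: the runtime value is gone after the next restart unless the config file carries it too.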

Wayne Workman

@fernando-gietz said in FOG Server CPU usage 100%:

My hosts table has 7000 entries

Admittedly, this is a whole lot of hosts reporting to one server.

At a past job, one of our setups had maybe 2k hosts going to one FOG server, and that server had 8 cores and 8GB of RAM. It was a pretty beefy server, but it did a ton of work: it managed a ton of technicians, a ton of images, and a ton of imaging, snapins and storage nodes. That box was really busy, and that’s with 2k hosts. You say you have 7,000 hosts.

I’d say you need a 16-core server with 16GB of RAM, MySQL configured to use more than one thread, and solid state disks to host the database. Never image from that server; make storage nodes handle the imaging loads and keep those loads off your main server. If you wanted to go even further, you could host the FOG database on its own (very fast) server, and the rest of the main FOG server on another box.

Fernando Gietz

After the UEFI issue, I finally have time for this XD

@wayne-workman said in FOG Server CPU usage 100%:

@fernando-gietz I was thinking about this - last time someone had this issue, it was the mysql max connections being hit.

Maybe one of the FOG updates or an OS update (especially likely with Ubuntu systems) wiped out our max connection setting.

Anyways, here’s what might fix it:
https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_MySQL#Increase_maximum_simultaneous_MySQL_connections

Make sure to do the query in mysql AND to change the conf setting too, so the change persists across reboots!

I have increased the connections, but the problem persists.
I have set up MySQL to log slow queries, and nothing shows up in the log.

The problem is Apache and its activity: it reaches 100% CPU. If you look at the capture of the browser activity, the problem begins when the browser tries to “Dibujar” (in English, “Draw”). I think the browser has to draw all the elements, both visible and hidden ones.
If you look at the membership page, you can see the list of members, but the browser doesn’t show the hosts that are not in the group. To see them you need to tick the checkbox, and then all of them appear. This list is not created on the fly: the browser has already “drawn” it beforehand. In my case, the browser needs to draw 7000 elements.

Tom Elliott

@fernando-gietz The problem, as I’m understanding it, however, is on the Server, not the client machine. The “Drawing” should only be happening on the Browser, so the setTimeout is doing something a little funny, though I’m not 100% sure what.

Tom Elliott

Mind checking if the Location Plugin has a bunch of erroneous entries in it? I’ve seen/heard of people having a real hard time with CPU because the mysql database is just crammed full of data. The last time I saw it, there were a bunch (tens of thousands) of entries in one of the tables. @Wayne-Workman may remember better than me. He sort of replicated this problem by accidentally uploading a binary data file (like teamviewer installer) into the CSV importer. This spammed the database with TONS of data, and each request to the database would require a rescan on the HDD, causing extremely high CPU load.

Sebastian Roth

Possibly it’s just that Apache hits a wall with too many client requests and too few worker threads?

Tom Elliott

@sebastian-roth It’s possible, but less likely. While a ton of machines hitting the system at the same time can cause issues, I think the communication is so minuscule that I’m going to chalk this up to constant reloading of the Database, which is likely in a HUGE state.

@Fernando-Gietz You could try testing this theory by getting a backup of the current database, then clearing out the larger table sets, taskLog, historyLog, snapinJobs, imagingLog, etc…

That is, tables that are “logging” or have no real value in being extremely large.

Wayne Workman

@tom-elliott said in FOG Server CPU usage 100%:

He sort of replicated this problem by accidentally uploading a binary data file (like teamviewer installer) into the CSV importer.

But you fixed that problem by verifying that the CSV uploaded was an actual CSV.

I would recommend cleaning out the DB though. Just truncate the history table, the tasks table, etc.
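
A rough sketch of that order of operations: back up first, then truncate. It assumes the database is called fog, and it uses history and imagingLog, the two log-style tables that show up in the row-count listing further down this thread. Check SHOW TABLES on your own install before running anything. The script only prints the SQL so it can be reviewed first.

```shell
#!/bin/sh
# Assumptions: database name "fog"; log-style tables that are safe to empty.
DB="fog"
LOG_TABLES="history imagingLog"

# Step 1 (run by hand first): full backup so the cleanup can be undone.
#   mysqldump -u root -p fog > fog_backup.sql

# Step 2: build one TRUNCATE statement per log table.
truncate_sql=""
for t in $LOG_TABLES; do
    truncate_sql="${truncate_sql}TRUNCATE TABLE ${t};
"
done

# Emit the SQL, selecting the database first.
printf 'USE %s;\n' "$DB"
printf '%s' "$truncate_sql"

# Review the output, then pipe it into the client, e.g.:
#   sh this_script.sh | mysql -u root -p
```

Printing instead of executing is deliberate here: TRUNCATE cannot be rolled back, so it is worth reading the statements once before they reach the server.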

Tom Elliott

@wayne-workman I know what was fixed, but I also know WHY it was causing issues. It wasn’t the “bad” data per se, but rather the “amount” of bad data in the database.

Sifting through 5000 shreds of paper (let’s say a sheet of paper is 1000 shreds each) is much harder than sifting through 5 pieces of paper, right?

So size may not be the issue, rather the quantity held within the size.

Fernando Gietz

@wayne-workman The database is not very big:

+------------------------+---------------+---------------+
| Table Name             |     Row Count | Total Size KB |
+------------------------+---------------+---------------+
| LDAPServers            |             1 |         14.32 |
| clientUpdates          |             0 |          4.00 |
| dirCleaner             |             0 |          1.00 |
| globalSettings         |           180 |         73.30 |
| greenFog               |             0 |          1.00 |
| groupMembers           |          6881 |        546.79 |
| groups                 |           313 |         53.92 |
| history                |          4268 |       1011.34 |
| hookEvents             |           277 |         31.68 |
| hostAutoLogOut         |           502 |         34.20 |
| hostMAC                |          7113 |        826.06 |
| hostScreenSettings     |           502 |         38.18 |
| hosts                  |          7008 |       1088.59 |
| imageGroupAssoc        |           279 |         18.03 |
| imagePartitionTypes    |            12 |          3.34 |
| imageTypes             |             4 |          3.22 |
| images                 |           279 |         72.12 |
| imagingLog             |          2537 |        273.77 |
| inventory              |           618 |        257.35 |
| ipxeTable              |          2097 |        171.71 |
| keySequence            |            35 |          2.79 |
| moduleStatusByHost     |          4932 |        403.68 |
| modules                |            13 |          5.66 |
| multicastSessions      |           181 |         26.43 |
| multicastSessionsAssoc |          2130 |        154.86 |
| nfsFailures            |             0 |          1.00 |
| nfsGroupMembers        |             1 |         20.22 |
| nfsGroups              |             1 |         13.05 |
| notifyEvents           |             5 |          9.14 |
| os                     |            12 |          3.26 |
+------------------------+---------------+---------------+

This server has 7000 entries in the hosts table, but it is only used by 400-500 computers.
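
For anyone who wants to produce the same kind of listing: a query along these lines against information_schema gives per-table row counts and sizes. This is a guess at an equivalent query, not necessarily the one used above, and the database name fog is an assumption. Note that table_rows is only an estimate for InnoDB tables.

```shell
#!/bin/sh
# Assumption: the FOG database is named "fog".
DB="fog"

# Row count and data+index size in KB for every table in the schema.
size_sql="SELECT table_name AS 'Table Name',
       table_rows AS 'Rows',
       ROUND((data_length + index_length) / 1024, 2) AS 'Total Size KB'
FROM information_schema.tables
WHERE table_schema = '${DB}'
ORDER BY table_name;"

printf '%s\n' "$size_sql"

# To run it (--table keeps the boxed output when input is piped):
#   sh this_script.sh | mysql -u root -p --table
```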

Fernando Gietz

In the development environment I have another server. On that one I deleted 6500 entries, and it works better.

+------------------------+---------------+---------------+
| Table Name             |     Row Count | Total Size KB |
+------------------------+---------------+---------------+
| LDAPServers            |             1 |         14.32 |
| clientUpdates          |             0 |          1.00 |
| dirCleaner             |             0 |          1.00 |
| globalSettings         |           183 |         73.87 |
| greenFog               |             0 |          1.00 |
| groupMembers           |           279 |        534.84 |
| groups                 |            28 |         48.79 |
| history                |         25895 |       5894.44 |
| hookEvents             |           278 |         31.72 |
| hostAutoLogOut         |           108 |         12.11 |
| hostMAC                |          6938 |        787.91 |
| hostScreenSettings     |           109 |         13.51 |
| hosts                  |           480 |       1036.53 |
| imageGroupAssoc        |           292 |         18.15 |
| imagePartitionTypes    |            12 |          3.34 |
| imageTypes             |             4 |          3.22 |
| images                 |           292 |         71.45 |
| imagingLog             |          1735 |        165.28 |
| inventory              |           259 |         99.80 |
| ipxeTable              |          1620 |        123.98 |
| keySequence            |            35 |          2.79 |
| moduleStatusByHost     |          6097 |        489.08 |
| modules                |            13 |          5.66 |
| multicastSessions      |          1208 |        138.77 |
| multicastSessionsAssoc |         24929 |       1417.48 |
| nfsFailures            |             0 |          1.00 |
| nfsGroupMembers        |             1 |         12.18 |
| nfsGroups              |             1 |          7.05 |
| notifyEvents           |             5 |          9.14 |
| os                     |            12 |          3.26 |
+------------------------+---------------+---------------+

Wayne Workman

@fernando-gietz said in FOG Server CPU usage 100%:

The problem is Apache and its activity: it reaches 100% CPU.

Are you using image replication?

Fernando Gietz

@wayne-workman No. I have only one server and one node. But the service is running.
