CPU maxed when Storage nodes go down



  • I have a two-fold issue.

    First: if one of my storage nodes goes down or can't connect to the server, my server's CPU maxes out. I'm not sure why, and it's a bit frustrating because of my second issue:
    My FOG server says it can't connect to 2 of my storage nodes even though they are up. I can SSH into them, and I have restarted both them and the server. The frustrating part is that I can't really dig into the logs because the web GUI is mostly unresponsive.

    I'm running:
    FOG 1.5
    CentOS 7


  • Developer

    @jherron The information is split across two different tables: host information and MAC information. So you need two commands to clean that host’s information from the database:

    mysql> DELETE FROM hostMAC WHERE hmHostID = 1272;
    ...
    mysql> DELETE FROM hosts WHERE hostID = 1272;
    ...
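A minimal sketch of the same cleanup wrapped in one transaction, so the MAC rows and the host row are removed together or not at all. It uses Python's built-in sqlite3 module with hypothetical, cut-down `hosts`/`hostMAC` tables only so it runs standalone; this is not FOG's real schema. On the FOG server you would still run the two DELETE statements in the mysql client (if the tables use a transactional engine such as InnoDB, `START TRANSACTION; ... COMMIT;` around them gives the same all-or-nothing effect).

```python
import sqlite3

# Hypothetical, cut-down stand-ins for FOG's tables: only the columns used
# below are modelled, so this is NOT FOG's real schema.
SCHEMA = """
CREATE TABLE hosts   (hostID INTEGER PRIMARY KEY, hostName TEXT);
CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY,
                      hmHostID INTEGER NOT NULL,
                      hmMAC TEXT NOT NULL UNIQUE);
"""


def delete_host(conn: sqlite3.Connection, host_id: int) -> tuple:
    """Delete a host's MAC rows, then the host row, in a single transaction.

    Returns (mac_rows_deleted, host_rows_deleted).
    """
    with conn:  # commits if both DELETEs succeed, rolls back if one raises
        macs = conn.execute(
            "DELETE FROM hostMAC WHERE hmHostID = ?", (host_id,)).rowcount
        hosts = conn.execute(
            "DELETE FROM hosts WHERE hostID = ?", (host_id,)).rowcount
    return macs, hosts
```

With host 1272 and its single MAC row (hmID 1317) loaded into these tables, `delete_host(conn, 1272)` would return `(1, 1)`; running it a second time returns `(0, 0)` because nothing is left to delete.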
    

    But I still wonder why we only see one entry while FOG seems to find duplicates. Would you be willing to add some more error output to the PHP code so we can see which MAC it considers duplicated? If so, I should be able to help you with the code. Let me know.



  • @Sebastian-Roth
    Apparently I did not put the MAC in properly. After uninstalling the client on the machine whose IP address was in the error log, there is still one entry in the DB:

    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 14046881
    Server version: 5.5.56-MariaDB MariaDB Server
    
    Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    MariaDB [(none)]> use fog;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    MariaDB [fog]> SELECT * FROM hostMAC WHERE hmMAC LIKE '%C8:D7:19:C2:E3:24%';
    +------+----------+-------------------+--------+-----------+-----------+----------------+-----------------+
    | hmID | hmHostID | hmMAC             | hmDesc | hmPrimary | hmPending | hmIgnoreClient | hmIgnoreImaging |
    +------+----------+-------------------+--------+-----------+-----------+----------------+-----------------+
    | 1317 |     1272 | c8:d7:19:c2:e3:24 |        | 1         |           |                |                 |
    +------+----------+-------------------+--------+-----------+-----------+----------------+-----------------+
    1 row in set (0.01 sec)
    
    MariaDB [fog]>
    

    How would I go about removing it from the DB? From a quick look, I might have others as well.


  • Developer

    @jherron said:

    SELECT * FROM hostMAC WHERE hmMAC LIKE '%aa:bb:cc:dd:ee:ff%';

    Did you actually put in the client’s MAC address here?

    Am I just not throwing enough resources at this, or is something fundamentally wrong with my setup?

    Well, that’s like asking "my car is not driving properly, what is wrong with it?" We can only guess as long as we don’t know much about your setup. One thing I can think of: every client with the FOG client installed regularly checks in with the FOG server, and if there are many such clients, that alone can cause this kind of load.

    You might want to set FOG_CLIENT_CHECKIN_TIME to a higher value so clients check in less often (web UI -> FOG Configuration -> FOG Settings -> FOG Client).

    Also check whether the MySQL max_connections setting has gone back to 151. As I wrote in this thread, the setting is lost when you restart MySQL or the whole server [ref].

    PS: I still think there might be something wrong with the DB on top of the load.
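The check-in load described above grows with the number of clients and shrinks as the check-in interval gets longer. A rough back-of-the-envelope sketch, assuming FOG_CLIENT_CHECKIN_TIME is an interval in seconds and that check-ins are spread evenly (real traffic is burstier, e.g. when everyone logs in at the start of the day). The helper names and the client count used below are hypothetical, not numbers from this thread.

```python
def checkins_per_second(num_clients: int, checkin_interval_s: float) -> float:
    """Average number of client check-ins the FOG server handles per second.

    Assumes each client checks in exactly once per interval and that the
    check-ins are spread evenly; bursts (e.g. at login time) will be higher.
    """
    if num_clients < 0:
        raise ValueError("num_clients must be >= 0")
    if checkin_interval_s <= 0:
        raise ValueError("checkin_interval_s must be > 0")
    return num_clients / checkin_interval_s


def interval_for_target(num_clients: int, target_per_second: float) -> float:
    """Check-in interval (seconds) needed to stay at or below a target rate."""
    if target_per_second <= 0:
        raise ValueError("target_per_second must be > 0")
    return num_clients / target_per_second
```

With a hypothetical 600 clients, a 60-second interval averages 10 check-ins per second, while a 300-second interval brings that down to 2. A load that scales with active clients would also be consistent with the low CPU usage outside working hours reported in this thread.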



  • @Sebastian-Roth said in CPU maxed when Storage nodes go down:

    SELECT * FROM hostMAC WHERE hmMAC LIKE '%aa:bb:cc:dd:ee:ff%';

    I ran the command and it comes up with an empty set again. The only difference between now and then is that I uninstalled the client on that machine. CPU utilization is still maxed out, though.

    I have also noticed that during the hours when no one is in, CPU utilization is really low. Am I just not throwing enough resources at this, or is something fundamentally wrong with my setup?


  • Developer

    @jherron Hmmm, so let’s look at it from a different angle. In the logs we always see the same client causing these messages: 10.0.60.38
    I guess this host is properly registered. Can you please find this host’s MAC address and check whether it is in the DB (put it in where you see aa:bb:cc:dd:ee:ff, but leave the % signs in place):

    mysql> use fog;
    
    mysql> SELECT * FROM hostMAC WHERE hmMAC LIKE '%aa:bb:cc:dd:ee:ff%';
    +------+----------+-------------------+--------+-----------+-----------+----------------+-----------------+
    | hmID | hmHostID | hmMAC             | hmDesc | hmPrimary | hmPending | hmIgnoreClient | hmIgnoreImaging |
    +------+----------+-------------------+--------+-----------+-----------+----------------+-----------------+
    |    2 |        2 | aa:bb:cc:dd:ee:ff |        | 1         |           |                |                 |
    +------+----------+-------------------+--------+-----------+-----------+----------------+-----------------+
    mysql> SELECT * FROM hostMAC WHERE hmHostID = 2;
    ...
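The two lookups above can also be folded into one self-join: find the host that owns a given MAC, then list every MAC registered to that host. This sketch uses Python's sqlite3 with a hypothetical, cut-down `hostMAC` table so it runs standalone; the SQL string itself should also work in the mysql client once the `?` placeholder is replaced with a quoted pattern such as '%aa:bb:cc:dd:ee:ff%'.

```python
import sqlite3

# Hypothetical, cut-down stand-in for FOG's hostMAC table (assumed columns).
SCHEMA = """
CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY,
                      hmHostID INTEGER NOT NULL,
                      hmMAC TEXT NOT NULL UNIQUE);
"""

# a = the row(s) matching the MAC pattern, b = every MAC of the same host(s).
SIBLING_MACS = """
SELECT DISTINCT b.hmID, b.hmHostID, b.hmMAC
FROM hostMAC AS a
JOIN hostMAC AS b ON b.hmHostID = a.hmHostID
WHERE a.hmMAC LIKE ?
ORDER BY b.hmHostID, b.hmID
"""


def macs_of_host_owning(conn: sqlite3.Connection, mac: str) -> list:
    """All (hmID, hmHostID, hmMAC) rows for the host(s) owning `mac`.

    The MAC is wrapped in % wildcards, like the LIKE query above.
    """
    return conn.execute(SIBLING_MACS, (f"%{mac}%",)).fetchall()
```

An empty result means no registered host has that MAC at all, which is itself useful to know for a client that keeps showing up in the logs.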
    

    Just trying to gather more information about this host that seems to be causing the messages (and possibly the load).
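To double-check that the messages really all come from one client (10.0.60.38), the Apache error log can be tallied per client IP. A minimal sketch, assuming Apache 2.4's error-log format where each entry carries `[client IP:port]`, IPv4 clients, and /var/log/httpd/error_log (the usual httpd log path on CentOS 7) as the default file; the helper names are hypothetical.

```python
import re
from collections import Counter

# Apache 2.4 error-log entries name the client as "[client 10.0.60.38:51234]".
# Assumption: IPv4 clients and that exact format.
CLIENT_RE = re.compile(r"\[client (\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\]")


def count_clients(lines, needle=None) -> Counter:
    """Count log lines per client IP, optionally only lines containing `needle`."""
    counts = Counter()
    for line in lines:
        if needle is not None and needle not in line:
            continue
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def top_clients(path="/var/log/httpd/error_log", needle=None, n=10):
    """Read an error log from disk and return the n busiest client IPs."""
    with open(path, errors="replace") as fh:
        return count_clients(fh, needle).most_common(n)
```

For example, `top_clients(needle="multiple hosts returned")` would show whether 10.0.60.38 dominates those entries or whether other clients appear as well.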



  • [btsdit@localhost ~]$ mysql -u root
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MariaDB connection id is 1831044
    Server version: 5.5.56-MariaDB MariaDB Server
    
    Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    MariaDB [(none)]> use fog;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    MariaDB [fog]> desc hostMAC;
    +-----------------+---------------+------+-----+---------+----------------+
    | Field           | Type          | Null | Key | Default | Extra          |
    +-----------------+---------------+------+-----+---------+----------------+
    | hmID            | int(11)       | NO   | PRI | NULL    | auto_increment |
    | hmHostID        | int(11)       | NO   | MUL | NULL    |                |
    | hmMAC           | varchar(59)   | NO   | UNI | NULL    |                |
    | hmDesc          | longtext      | NO   |     | NULL    |                |
    | hmPrimary       | enum('0','1') | NO   |     | NULL    |                |
    | hmPending       | enum('0','1') | NO   |     | NULL    |                |
    | hmIgnoreClient  | enum('0','1') | NO   |     | NULL    |                |
    | hmIgnoreImaging | enum('0','1') | NO   |     | NULL    |                |
    +-----------------+---------------+------+-----+---------+----------------+
    8 rows in set (0.03 sec)
    
    MariaDB [fog]> SELECT COUNT(hmMAC),hmID,hmHostID,hmMAC from hostMAC GROUP BY hmMAC HAVING COUNT(hmMAC)>1;
    Empty set (0.03 sec)
    
    MariaDB [fog]> quit
    Bye


  • Developer

    @jherron Great that you posted the log. The error message "Error multiple hosts returned for list of mac addresses" points to several hosts in the DB having the same MAC address. Let’s see if we can figure out which ones are duplicates:

    shell> mysql -u root -p
    Password:
    ...
    mysql> use fog;
    ...
    mysql> desc hostMAC;
    ...
    mysql> SELECT COUNT(hmMAC),hmID,hmHostID,hmMAC from hostMAC GROUP BY hmMAC HAVING COUNT(hmMAC)>1;
    ...
    mysql> quit
    

    Please copy the full text output and post it here in the forum so we can have a look. I hope I got the queries right, as I can’t actually replicate the duplicate issue.
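Two notes on the duplicate query, with a sketch. In MariaDB, selecting hmID and hmHostID next to COUNT() in a GROUP BY query (without ONLY_FULL_GROUP_BY) shows values from only one row of each group, so GROUP_CONCAT is one way to list every host involved. Also, hmMAC has a UNIQUE key (see the `desc hostMAC` output in this thread), so exact duplicates should not be possible in the first place. Two hypotheses worth checking, neither confirmed here: the same address stored in different case, which only a case-sensitive collation would allow; or, assuming the client sends several MACs at once (e.g. wired plus wireless), those MACs being registered to different hosts. The sketch uses Python's sqlite3 with a hypothetical, cut-down `hostMAC` table so it runs standalone; the SQL can be pasted into the mysql client with literal values in place of `?`.

```python
import sqlite3

# Hypothetical, cut-down stand-in for FOG's hostMAC table (assumed columns).
# SQLite's UNIQUE on TEXT is case-sensitive, like a _bin collation in MariaDB.
SCHEMA = """
CREATE TABLE hostMAC (hmID INTEGER PRIMARY KEY,
                      hmHostID INTEGER NOT NULL,
                      hmMAC TEXT NOT NULL UNIQUE);
"""

# Same idea as the GROUP BY ... HAVING query above, but grouped on LOWER()
# so case variants collapse together, and with every host ID listed.
DUPLICATE_MACS = """
SELECT LOWER(hmMAC) AS mac, COUNT(*) AS n, GROUP_CONCAT(hmHostID) AS host_ids
FROM hostMAC
GROUP BY LOWER(hmMAC)
HAVING COUNT(*) > 1
ORDER BY mac
"""


def duplicate_macs(conn: sqlite3.Connection) -> list:
    """(mac, count, comma-separated host IDs) for every MAC stored more than once."""
    return conn.execute(DUPLICATE_MACS).fetchall()


def hosts_for_macs(conn: sqlite3.Connection, macs: list) -> list:
    """Distinct host IDs owning any of `macs`; more than one means the list is ambiguous."""
    if not macs:
        return []
    placeholders = ",".join("?" for _ in macs)
    sql = ("SELECT DISTINCT hmHostID FROM hostMAC "
           f"WHERE LOWER(hmMAC) IN ({placeholders}) ORDER BY hmHostID")
    return [row[0] for row in conn.execute(sql, [m.lower() for m in macs])]
```

If `duplicate_macs` comes back empty (as the query in this thread did) but `hosts_for_macs` returns more than one host ID for all the MACs of the client at 10.0.60.38, that would point towards the second hypothesis.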



  • 0_1537223105054_new_error_log.txt

    Apparently it was too large, so I cut it down to this, but it should get the message across.


  • Developer

    @jherron The picture was not uploaded properly. You need to wait until you actually see the picture in the preview on the right.



  • UPDATE
    I was able to throw another vCPU at the box and got it below 100% utilization. It looks like all the DBs connected. I can't see anything wrong with the main server, but I do have a BUNCH of Apache errors that I do not understand. 0_1537200874571_error_log


 
