[dev-branch] multicast: for some hosts DB not updated after restore



  • See this post.
    Apache logs don’t contain anything of note.
    PHP-FPM log during (or shortly after) multicast restore sessions sometimes contains these warnings:

    [04-Jan-2020 16:38:01] NOTICE: [pool www] child 29241 started
    [05-Jan-2020 02:54:37] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 17 total children
    [05-Jan-2020 02:54:38] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 22 total children
    ...
    [25-Jan-2020 18:00:58] NOTICE: [pool www] child 9916 started
    [25-Jan-2020 18:54:59] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 17 total children
    [25-Jan-2020 18:55:00] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 22 total children
    [25-Jan-2020 18:55:01] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 27 total children
    

    Output of egrep '^pm\.(start|min|max)' /etc/php-fpm.d/www.conf

    pm.max_children = 50
    pm.start_servers = 5
    pm.min_spare_servers = 5
    pm.max_spare_servers = 10
    pm.max_requests = 2000
    

    Output of free -h on FOG server

                  total        used        free      shared  buff/cache   available
    Mem:           3,9G        362M        372M        8,6M        3,2G        3,5G
    Swap:          4,0G          0B        4,0G
    

    Output of lscpu | egrep '^Core|^Socket'

    Core(s) per socket:    2
    Socket(s):             1
    

    Output of ps --no-headers -o rss,cmd -C php-fpm|awk '{sum+=$1}END{print sum/NR/1024"M"}'

    21.2727M
    

    Output of mysql -u root fog -e 'select l.taskID,s.tsName from taskLog as l,taskStates as s where l.taskStateID=s.tsID and l.id between 3282 and 3348 order by l.taskID'

    +--------+-------------+
    | taskID | tsName      |
    +--------+-------------+
    | 1701   | In-Progress |
    | 1701   | Complete    |
    | 1702   | In-Progress |
    | 1702   | Complete    |
    | 1703   | In-Progress |
    | 1703   | Complete    |
    | 1704   | In-Progress |
    | 1704   | Complete    |
    | 1705   | In-Progress |
    | 1706   | In-Progress |
    | 1706   | Complete    |
    | 1707   | In-Progress |
    | 1707   | Complete    |
    | 1708   | In-Progress |
    | 1708   | Complete    |
    | 1709   | In-Progress |
    | 1709   | Complete    |
    | 1710   | In-Progress |
    | 1710   | Complete    |
    | 1711   | In-Progress |
    | 1711   | Complete    |
    | 1712   | In-Progress |
    | 1712   | Complete    |
    | 1713   | In-Progress |
    | 1713   | Complete    |
    | 1714   | In-Progress |
    | 1714   | Complete    |
    | 1715   | In-Progress |
    | 1715   | Complete    |
    | 1716   | In-Progress |
    | 1716   | Complete    |
    | 1717   | In-Progress |
    | 1717   | Complete    |
    | 1718   | In-Progress |
    | 1718   | Complete    |
    | 1719   | In-Progress |
    | 1719   | Complete    |
    | 1720   | In-Progress |
    | 1721   | In-Progress |
    | 1721   | Complete    |
    | 1722   | In-Progress |
    | 1722   | Complete    |
    | 1723   | In-Progress |
    | 1723   | Complete    |
    | 1724   | In-Progress |
    | 1724   | Complete    |
    | 1725   | In-Progress |
    | 1725   | Complete    |
    | 1726   | In-Progress |
    | 1726   | Complete    |
    | 1727   | In-Progress |
    | 1727   | Complete    |
    | 1728   | In-Progress |
    | 1728   | Complete    |
    | 1729   | In-Progress |
    | 1730   | In-Progress |
    | 1730   | Complete    |
    | 1731   | In-Progress |
    | 1732   | In-Progress |
    | 1732   | Complete    |
    | 1733   | In-Progress |
    | 1733   | Complete    |
    | 1734   | In-Progress |
    | 1735   | In-Progress |
    | 1736   | In-Progress |
    | 1736   | Complete    |
    +--------+-------------+
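
The tasks that never logged a Complete state can be picked out of output like the above mechanically. Below is a minimal Python sketch, assuming the mysql table is pasted in as text. The `sample` excerpt reuses a few rows from the output above, and the function name is only illustrative.

```python
def incomplete_tasks(table_text):
    """Return task IDs that have no Complete row in a mysql text table."""
    states = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip borders, the header and blank lines: data rows have
        # exactly two cells and a numeric task ID in the first one.
        if len(cells) != 2 or not cells[0].isdigit():
            continue
        states.setdefault(int(cells[0]), set()).add(cells[1])
    return sorted(t for t, seen in states.items() if "Complete" not in seen)


sample = """
+--------+-------------+
| taskID | tsName      |
+--------+-------------+
| 1704   | In-Progress |
| 1704   | Complete    |
| 1705   | In-Progress |
| 1706   | In-Progress |
| 1706   | Complete    |
+--------+-------------+
"""

print(incomplete_tasks(sample))  # [1705]
```

Applied to the full output above, the same logic would flag tasks 1705, 1720, 1729, 1731, 1734 and 1735.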
    


  • @george1421 said in [dev-branch] multicast: for some hosts DB not updated after restore:

    @shruggy How many systems do you typically image at the same time with multicast? How much memory do you have on the fog server?

    Usually, it’s 36 systems at once. The setup is similar to @tec618’s: FOG on a VM with 4GB RAM, but both the VM and the hosting server run CentOS 7, and it’s Xen, not KVM. PHP 7.3 from Remi’s repo.



  • OK, I will follow @shruggy’s guidance and report the results tomorrow.

    In any case, note that the FOG server is a virtual machine with Ubuntu 18.04 and 4 GB RAM. The host server runs the latest version of CentOS 7 and virtualizes with KVM.


  • Moderator

    @tec618 Can you follow Shruggy’s guidance? Update the www.conf file (it will be somewhere under /etc; hint: find /etc -name www.conf), change the process manager to static with pm = static, and set pm.max_children = 50. Save the file and then run sudo systemctl restart php-fpm to restart the php-fpm service.

    We will need to watch the available RAM on your system, since each php-fpm child process will consume some memory.
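
A rough way to size that memory budget, as a sketch only: multiply the pool size by the average resident size of one php-fpm process. The 21.27 MB figure is the average RSS reported elsewhere in this thread for a different FOG server, so treat it as a placeholder and measure your own system. The function name is illustrative.

```python
def pool_memory_mb(max_children, avg_rss_mb):
    """Estimate the total resident memory of a static PHP-FPM pool in MB."""
    return max_children * avg_rss_mb


# Assumed inputs: 50 children (the value suggested above) and an average
# RSS of 21.27 MB per child, taken from another server in this thread.
estimate = pool_memory_mb(max_children=50, avg_rss_mb=21.27)
print(f"about {estimate:.0f} MB for 50 children")
```

That comes to roughly 1 GB for 50 children. RSS counts shared pages once per process, so the estimate tends to be on the high side.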


  • Moderator

    @shruggy I’m interested in this issue. How many systems do you typically image at the same time with multicast? How much memory do you have on the fog server?



  • Hi.
    In our case the same thing is happening (30 PCs with identical hardware, and the FOG server running on Ubuntu 18.04). When multicasting to 12 PCs, some hosts showed this error message after restoring the image: “Trying to update the database: Failed”, and the database (imagingLog table) does not record the end time of the deployment.

    The Apache logs contain nothing of note and the PHP-FPM log contains no warnings. What could be causing this in our case?

    Thanks in advance



  • @Sebastian-Roth You can mark this as solved now. I didn’t go with the adjustments you suggested, though: I first wanted to try the configuration suggested at https://www.sitepoint.com/php-fpm-tuning-using-pm-static-max-performance, and it worked.

    Here is an excerpt from my current /etc/php-fpm.d/www.conf (the changed lines are the first two and the last):

    pm = static
    pm.max_children = 40
    pm.start_servers = 5
    pm.min_spare_servers = 5
    pm.max_spare_servers = 35
    pm.max_requests = 500
    

    I have a pool of 38 identical hosts.
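
As far as the PHP-FPM documentation describes it, pm.start_servers, pm.min_spare_servers and pm.max_spare_servers are only used when pm = dynamic, so with pm = static the pool size is governed by pm.max_children alone. Here is a small Python sketch that sorts the directives of an excerpt like the one above into effective and ignored ones. The helper name and the parsing are illustrative assumptions, not part of PHP-FPM.

```python
# Directives that PHP-FPM documents as used only in dynamic mode.
DYNAMIC_ONLY = {"pm.start_servers", "pm.min_spare_servers", "pm.max_spare_servers"}


def split_directives(conf_text):
    """Return (effective, ignored) settings for the configured pm mode."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ';' comments
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            settings[key] = value
    mode = settings.get("pm")  # None if the excerpt does not set pm
    ignored = {k: v for k, v in settings.items()
               if k in DYNAMIC_ONLY and mode != "dynamic"}
    effective = {k: v for k, v in settings.items() if k not in ignored}
    return effective, ignored


excerpt = """
pm = static
pm.max_children = 40
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 500
"""

effective, ignored = split_directives(excerpt)
print("effective:", effective)
print("ignored:  ", ignored)
```

For the excerpt above, this lists pm, pm.max_children and pm.max_requests as effective and the three spare-server settings as ignored.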


  • Senior Developer

    @shruggy said in [dev-branch] multicast: for some hosts DB not updated after restore:

    WARNING: [pool www] seems busy (you may need to increase pm.start_servers

    How many hosts do you have with fog-client installed? From those logs I would assume you have a lot.

    I would try adjusting /etc/php-fpm.d/www.conf to:

    pm.max_children = 100
    pm.start_servers = 10
    pm.min_spare_servers = 10
    pm.max_spare_servers = 20
    pm.max_requests = 2000
    

    Don’t forget to restart php-fpm after adjustment.

    You might also want to increase the fog-client check-in time (FOG web UI -> FOG Configuration -> FOG Settings -> …).
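
To judge how often the pool runs out of idle children before and after such a change, the "seems busy" warnings can be tallied from the PHP-FPM log. This is a minimal Python sketch, assuming the log line format quoted in this thread. The function name is illustrative, and the sample lines are copied from the log excerpt in the thread.

```python
import re

# Matches the "seems busy" warning format shown in this thread.
BUSY = re.compile(
    r"\[(?P<when>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] seems busy .*"
    r"spawning (?P<spawning>\d+) children, there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)


def busy_events(log_text):
    """Return (timestamp, spawning, idle, total) for each busy warning."""
    events = []
    for line in log_text.splitlines():
        m = BUSY.search(line)
        if m:
            events.append((m["when"], int(m["spawning"]),
                           int(m["idle"]), int(m["total"])))
    return events


sample_log = """\
[25-Jan-2020 18:00:58] NOTICE: [pool www] child 9916 started
[25-Jan-2020 18:54:59] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 17 total children
[25-Jan-2020 18:55:00] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 22 total children
"""

for when, spawning, idle, total in busy_events(sample_log):
    print(when, "spawning", spawning, "idle", idle, "total", total)
```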



  • @shruggy said in 1.5.7.89: partclone doesn't capture an image in dd mode: wrong options in fog.upload:

    After the coming Microsoft Patch Day (probably over the next weekend) I am planning to capture another disk image with this and deploy it to my pool in multi-cast mode.

    I did it last weekend and the results are mixed. Yes, the image was successfully captured and then restored to 36 PCs in multi-cast. But: On five hosts I got this error message after restoring the image:

    Reattempting to update database: Failed

    The image was restored successfully on those hosts nevertheless. Only the FOG database wasn’t updated. All 36 PCs are identical hardware.

    In the Imaging Log the End column for those five hosts says:

    -0001-11-30 00:00:00

    while the Duration column says:

    2020 years 1 month 18 days 15 hours 35 minutes 43 seconds

    It looks like somehow the data for Start timestamp got written into Duration?
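
One possible reading, offered as an assumption rather than a diagnosis: if the End column was left at MySQL's zero date (0000-00-00 00:00:00), date handling that rolls out-of-range fields over instead of rejecting them turns month 0 into December of the previous year and day 0 into the last day of the previous month, which lands on -0001-11-30. PHP is known to render zero dates this way. The Duration would then be measured from that date rather than from a real end time. The rollover arithmetic can be sketched in Python (the function names are made up for illustration):

```python
import calendar

DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_in_month(year, month):
    """Days in a month, using proleptic Gregorian leap-year rules."""
    if month == 2 and calendar.isleap(year):
        return 29
    return DAYS[month - 1]


def roll_over(year, month, day):
    """Normalize a date whose month and/or day is 0 by rolling backwards."""
    if month == 0:  # month 0 means December of the previous year
        year, month = year - 1, 12
    if day == 0:  # day 0 means the last day of the previous month
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        day = days_in_month(year, month)
    return year, month, day


def format_date(year, month, day):
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}-{month:02d}-{day:02d}"


print(format_date(*roll_over(0, 0, 0)))  # -0001-11-30
```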

