<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Fog Scheduler running at 100% CPU + SSH connection flood between nodes]]></title><description><![CDATA[<p dir="auto">Hello,</p>
<p dir="auto">Recently I noticed that snapins associated with a host no longer run, and that FOGScheduler had stopped working. While investigating, I found several issues.</p>
<p dir="auto"><strong>FOG Version:</strong><br />
Upgraded from 1.5.10 → &lt;several 1.6-beta builds in between&gt; → currently running <strong>1.6.0-beta.2297</strong></p>
<p dir="auto">Setup:</p>
<ul>
<li>FOG Server: 172.28.1.80</li>
<li>Storage Node: 172.28.1.89</li>
</ul>
<hr />
<h3>1) Replicator falsely reports image files as missing, then syncs immediately</h3>
<pre><code>[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.fixed_size_partitions(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.mbr(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.minimum.partitions(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.original.fstypes(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.original.swapuuids(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.partitions(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1.shrunken.partitions(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1p1.img(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1p2.img(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1p3.img(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist d1p4.img(storage 1)
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.fixed_size_partitions on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.fixed_size_partitions file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.mbr on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.mbr file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.minimum.partitions on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.minimum.partitions file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.original.fstypes on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.original.fstypes file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.original.swapuuids on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.original.swapuuids file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.partitions on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.partitions file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1.shrunken.partitions on storage 1
[04-07-26 9:28:10 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1.shrunken.partitions file to storage 1
[04-07-26 9:28:10 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1p1.img on storage 1
[04-07-26 9:28:11 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1p1.img file to storage 1
[04-07-26 9:28:11 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1p2.img on storage 1
[04-07-26 9:28:11 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1p2.img file to storage 1
[04-07-26 9:28:11 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1p3.img on storage 1
[04-07-26 9:28:11 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1p3.img file to storage 1
[04-07-26 9:28:11 am]   # peruswin-audit-1.2: File does not exist on master node, deleting /images/peruswin-audit-1.2/d1p4.img on storage 1
[04-07-26 9:28:11 am]  | peruswin-audit-1.2: No need to sync /images/peruswin-audit-1.2/d1p4.img file to storage 1
[04-07-26 9:28:11 am]  | CMD: lftp -e 'set xfer:log 1; set xfer:log-file "/opt/fog/log/fogreplicator.peruswin-audit-1.2.transfer.storage 1.log";set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; mirror -c --parallel=20 -R --ignore-time -vvv --exclude ".srvprivate" "/images/peruswin-audit-1.2" "/images/peruswin-audit-1.2";exit' -u fogproject,[redacted] 172.28.1.89
[04-07-26 9:28:11 am]  * Started sync for Image peruswin-audit-1.2 - Resource id #1583
[04-07-26 9:28:11 am]  | Sync finished - Resource id #602
</code></pre>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>Files exist on both server and storage under <code>/images/&lt;image&gt;</code></li>
<li>Verified with <code>find</code> and <code>lftp</code></li>
<li>Image deploys successfully to clients</li>
</ul>
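<p dir="auto">For anyone who wants to repeat the <code>find</code> check, here is a minimal sketch of one way to compare the two sides. The function name <code>image_listing</code> and the <code>/tmp</code> file names are made up for illustration, and it assumes GNU <code>find</code> (for <code>-printf</code>) on both nodes:</p>

```shell
# Print "size relative-path" for every file under a directory, sorted by path.
# image_listing is a hypothetical helper name; -printf requires GNU find.
image_listing() {
  find "$1" -type f -printf '%s %P\n' | LC_ALL=C sort -k2
}

# Usage (paths and IPs taken from the post; output file names are illustrative):
#   image_listing /images/peruswin-audit-1.2 > /tmp/master.lst   # on 172.28.1.80
#   image_listing /images/peruswin-audit-1.2 > /tmp/node.lst     # on 172.28.1.89
#   diff /tmp/master.lst /tmp/node.lst && echo "listings match"
```

<p dir="auto">If the two listings are identical, the "File does not exist" lines are unlikely to reflect what is actually on disk.</p>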
<hr />
<h3>2) SSH spam between nodes</h3>
<pre><code>Apr 07 09:42:13 fog sshd[2483177]: error: kex_exchange_identification: Connection closed by remote host
Apr 07 09:42:13 fog sshd[2483177]: Connection closed by 172.28.1.89 port 55330
Apr 07 09:42:13 fog sshd[2483178]: error: kex_exchange_identification: Connection closed by remote host
Apr 07 09:42:13 fog sshd[2483178]: Connection closed by 172.28.1.89 port 55336
Apr 07 09:42:14 fog sshd[2483179]: error: kex_exchange_identification: Connection closed by remote host
Apr 07 09:42:14 fog sshd[2483179]: Connection closed by 172.28.1.80 port 34766
Apr 07 09:42:14 fog sshd[2483180]: error: kex_exchange_identification: Connection closed by remote host
Apr 07 09:42:14 fog sshd[2483180]: Connection closed by 172.28.1.80 port 34768
</code></pre>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>Happens multiple times per second</li>
<li>Seen on both server and storage</li>
</ul>
<p dir="auto"><strong>Fix / Isolation:</strong></p>
<ul>
<li>Stopping <code>FOGMulticastManager</code> stops the SSH spam</li>
<li>Starting it again reproduces the issue</li>
</ul>
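<p dir="auto">To put a number on the flood, a rough sketch like the following counts closed connections per peer address. The function name <code>count_ssh_closes</code> is hypothetical, and the usage line assumes a systemd journal with an sshd unit called <code>ssh</code> (it may be <code>sshd</code> on other distributions):</p>

```shell
# Count "Connection closed by <addr> port N" lines per peer address.
# Reads sshd log text on stdin; count_ssh_closes is a hypothetical helper name.
count_ssh_closes() {
  sed -n 's/.*sshd\[[0-9]*]: Connection closed by \([^ ]*\) port.*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage (unit name "ssh" is an assumption, adjust for your distribution):
#   journalctl -u ssh --since "10 minutes ago" --no-pager | count_ssh_closes
```

<p dir="auto">Running it once with <code>FOGMulticastManager</code> stopped and once with it started should make the difference visible in the counts.</p>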
<hr />
<h3>3) FOGMulticastManager creates broken PHP session files (storage node)</h3>
<pre><code>session_start(): open(... Permission denied)
</code></pre>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>
<p dir="auto"><code>/var/lib/php/sessions</code> directory is correct:</p>
<pre><code>drwx-wx-wt root:www-data
</code></pre>
</li>
<li>
<p dir="auto">Session files are created as:</p>
<pre><code>-rw------- 1 root root ...
</code></pre>
</li>
<li>
<p dir="auto">Apache/PHP-FPM runs as <code>www-data</code> → cannot access them</p>
</li>
</ul>
<p dir="auto"><strong>Isolation:</strong></p>
<ul>
<li>
<p dir="auto">Stop:</p>
<pre><code>systemctl stop FOGScheduler FOGMulticastManager
</code></pre>
</li>
<li>
<p dir="auto">Delete sessions:</p>
<pre><code>find /var/lib/php/sessions -type f -name 'sess_*' -delete
</code></pre>
</li>
<li>
<p dir="auto">Errors stop</p>
</li>
<li>
<p dir="auto">Start only:</p>
<pre><code>systemctl start FOGMulticastManager
</code></pre>
</li>
<li>
<p dir="auto">Errors immediately return</p>
</li>
</ul>
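<p dir="auto">A small sketch for spotting the offending files after each step above. The function name <code>foreign_sessions</code> is made up; the default directory and user are the ones from this post, and it needs to run as root because the sessions directory is not listable by other users:</p>

```shell
# List PHP session files in a directory that are NOT owned by the given user.
# foreign_sessions is a hypothetical helper; defaults are the values from the post.
foreign_sessions() {
  dir=${1:-/var/lib/php/sessions}
  owner=${2:-www-data}
  # ls is only invoked when at least one file matches.
  find "$dir" -maxdepth 1 -type f -name 'sess_*' ! -user "$owner" -exec ls -l {} +
}

# Usage (as root):
#   foreign_sessions                 # default directory and owner
#   foreign_sessions /var/lib/php/sessions www-data
```

<p dir="auto">Empty output means every session file belongs to the web server user. Any <code>root root</code> entries that appear right after starting <code>FOGMulticastManager</code> would point at that service as the creator.</p>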
<hr />
<h3>4) Power Management warnings</h3>
<pre><code>Undefined array key "pmAction"
</code></pre>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>Many hosts have no row in the <code>powerManagement</code> table</li>
</ul>
<p dir="auto"><strong>Fix:</strong></p>
<ul>
<li>Disabling Power Management in FOG settings stops the warnings</li>
</ul>
<hr />
<h3>5) Scheduler tasks do not run</h3>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>Scheduled tasks do not execute unless <code>FOGScheduler</code> is restarted</li>
<li>After a restart, tasks run, but the scheduler later stalls again</li>
<li>New tasks are not picked up</li>
</ul>
<hr />
<h3>6) Snapins do not execute</h3>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>Snapins can be assigned</li>
<li>Running a snapin on a single host that has it associated fails</li>
<li>The same snapin runs when deployed to a group</li>
</ul>
<hr />
<h3>7) High CPU usage (PHP)</h3>
<pre><code>php (root) ~100% CPU
</code></pre>
<p dir="auto"><strong>Observed:</strong></p>
<ul>
<li>High CPU usage on both server and storage</li>
<li>Drops when stopping <code>FOGScheduler</code></li>
</ul>
<hr />
<h3>Additional notes</h3>
<ul>
<li>Manual SSH from server → storage using <code>fogproject</code> works</li>
<li>FTP (<code>lftp</code>) can list image files correctly</li>
<li>The installer was re-run on both nodes after the update</li>
</ul>
]]></description><link>http://forums.fogproject.org/topic/18146/fog-scheduler-running-at-100-cpu-ssh-connection-flood-between-nodes</link><generator>RSS for Node</generator><lastBuildDate>Tue, 07 Apr 2026 18:42:33 GMT</lastBuildDate><atom:link href="http://forums.fogproject.org/topic/18146.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 07 Apr 2026 06:43:14 GMT</pubDate><ttl>60</ttl></channel></rss>