vsftpd

AndrewG78

Hi,
I recently added a new master node (as a new storage group) to the FOG server and upgraded FOG from 1.5.4 to 1.5.7. I see high CPU load 80% of the time, even when there are no tasks to run. Two vsftpd daemons consume 20% of the CPU, plus a kworker from time to time. How can I debug this?

Sebastian Roth

@AndrewG78 Did you upgrade all your nodes? Please make sure you read and understand the important notice in the release notes for FOG 1.5.5 (and later): https://news.fogproject.org/fog-1-5-5-officially-released/

Nodes on different versions (1.5.4 vs. 1.5.5) will replicate images over and over again, because some of the hashing code had to be changed. Therefore we advise you to update all nodes in one go! Please make sure you stop replication on the master first (systemctl stop FOGImageReplicator; systemctl stop FOGSnapinReplicator), then update the storage node(s), and update the master node as the last step.

Possibly we need to add this notice to all new releases?!?

AndrewG78

@Sebastian-Roth
Thanks for the update. Yes, I read this before I started.
I updated my node to the FOG server's version at the same time, but I did not stop replication.
I'm not sure this scenario is related to my setup.
I have two separate storage groups with only one master node in each of them.
So there are no other nodes in either group for the master to replicate to.
I will disable FOGImageReplicator and FOGSnapinReplicator on the server, but I'm not sure this is the right way to solve the issue.

Sebastian Roth

@AndrewG78 Hmm, maybe I was heading down the wrong track but from the minimal information I had the impression some kind of replication would be going on.

A busy kworker quite often means high disk I/O, and to me that fit with PHP calculating checksums and FTP transferring files… Just guessing here.

AndrewG78

@Sebastian-Roth
After disabling the replication services, the FOG UI became super responsive.
No more kworker and vsftpd daemons.
Perhaps an issue in the newest version?
Does anyone have a similar setup and can confirm this bad behaviour?

Sebastian Roth

@AndrewG78 We need more information. Please check all the logs in /var/log/fog/... and upload log files here.
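If it helps with gathering the logs, here is a minimal Python sketch that prints the tail of every log file in /var/log/fog, largest file first, so the relevant parts can be pasted into the thread. The directory comes from the request above; the function names `tail` and `collect_logs`, the 50-line default, and skipping rotated `.gz` files are illustrative choices, not part of FOG.

```python
from collections import deque
from pathlib import Path


def tail(path: Path, lines: int = 50) -> list[str]:
    """Return the last `lines` lines of a text file (hypothetical helper)."""
    with path.open("r", errors="replace") as fh:
        # A bounded deque keeps only the most recent `lines` lines in memory.
        return list(deque(fh, maxlen=lines))


def collect_logs(log_dir: str = "/var/log/fog", lines: int = 50) -> str:
    """Concatenate the tail of every uncompressed log in log_dir, largest first."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file() and p.suffix != ".gz"]
    files.sort(key=lambda p: p.stat().st_size, reverse=True)
    parts = []
    for p in files:
        parts.append(f"===== {p.name} ({p.stat().st_size} bytes) =====")
        parts.extend(line.rstrip("\n") for line in tail(p, lines))
    return "\n".join(parts)
```

Run it on the server with `print(collect_logs())`. Sorting by size puts fast-growing logs, such as the multicast logs mentioned later in this thread, at the top.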

AndrewG78

@Sebastian-Roth
I think I found the reason(s).
There are 3 things I would like to clarify.
1.
Although replication services are disabled, there is still some replication done between storage groups.
In my case, I have two storage groups, and each group has one storage node.
Both nodes were masters.
The image from the new group (2) was replicated to the old default group (1).
I unchecked the Replicate checkbox on the image and also disabled Master Node for the old default group, so there is now only one master node. The old group has no master node at all.
After this, everything seems to be fine.
a)
The question is, was this proper behaviour?
I thought replication happens only among the members (nodes) of a storage group.
b)
Are there any other services that could do this replication?

2.
The high CPU load (kworker and vsftpd) was related to replication and a lack of disk space. The replication processes did not stop even when there was 0% free space.
I think this is a bug.
3.
I can see a bunch of multicast log files.
a)
Should there be smarter log rotation?
b)
"No new tasks found" is logged every 10 s. Can we change this interval somehow?

Sebastian Roth

@AndrewG78 I’ll try to answer all the things you brought up. But first let me say that so far it hasn’t been clear (from my point of view) what happened on which FOG server. Replication involves at least two parties (servers), and it’s important for me to understand which one showed the issue. I will come back to that point later.

Although replication services are disabled, there is still some replication done between storage groups.

Disabled on which server? All FOG servers?

1 a) The question is, was this a proper behaviour?
I thought replication is done only within the storage group members(nodes).

As I didn’t write the replication algorithm, I don’t know it as well as Tom would. But from reading the docs I get the impression that this is expected: https://wiki.fogproject.org/wiki/index.php?title=Replication
6. If the node currently checking is the "primary master group" for the data it's working, it will attempt replicating its data to the master of each of the other groups the data is assigned under.
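To illustrate the quoted rule, here is a minimal Python sketch, not FOG's actual (PHP) replication code, that lists which transfers would happen. The classes and field names (`StorageGroup`, `Image`, `primary_group`, `groups`) are hypothetical simplifications. It assumes that the master of an image's primary group pushes to the master of every other group the image is assigned to, that each group's master also pushes to the other nodes of its own group, and that unchecking the image's Replicate checkbox stops both.

```python
from dataclasses import dataclass, field


@dataclass
class StorageGroup:
    name: str
    master: str                                     # hostname of the group's master node
    nodes: list[str] = field(default_factory=list)  # non-master nodes in the group


@dataclass
class Image:
    name: str
    primary_group: str         # the image's "primary master group"
    groups: list[str]          # every storage group the image is assigned to
    replicate: bool = True     # mirrors the image's Replicate checkbox


def replication_targets(image: Image, groups: dict[str, StorageGroup]) -> list[tuple[str, str]]:
    """Return (source, destination) pairs under the simplified rules above."""
    if not image.replicate:
        return []
    transfers = []
    primary = groups[image.primary_group]
    # Between groups: primary master -> master of each other assigned group.
    for name in image.groups:
        if name != image.primary_group:
            transfers.append((primary.master, groups[name].master))
    # Within groups: each assigned group's master -> its own nodes.
    for name in image.groups:
        group = groups[name]
        for node in group.nodes:
            transfers.append((group.master, node))
    return transfers


# Two groups with a single master each, image primary in the new group:
groups = {
    "default": StorageGroup("default", master="fog-old"),
    "group2": StorageGroup("group2", master="fog-new"),
}
img = Image("win10", primary_group="group2", groups=["group2", "default"])
print(replication_targets(img, groups))  # [('fog-new', 'fog-old')]
```

Even with no extra nodes inside either group, an image assigned to both groups still produces one cross-group transfer, which matches what was observed here.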

1 b) Are there any other services that could do this replication?

You have two nodes and both have replication services running on them!

The high cpu load(kworker and vsftpd) was related to replication and lack of disk space. Replication processes did not stop even if there was 0% of free space.
I think this is a bug.

The vsftpd processes belong to what I would call the receiving node in this constellation. This might give you an idea which node was causing this. Disks can run out of space for many different reasons. I don’t see why our replication service should constantly check and stop replicating just because space is low. Every server needs good disk space monitoring that warns the sysadmin to take care of it. Look at it from this side: if we add a check and simply stop replicating when disk space runs low, people who don’t monitor their disk space might not notice for months and might blame us for replication not working. Although hitting a full disk is not nice, it will eventually cause trouble and make the sleeping sysadmin aware.
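As an example of the kind of disk space monitoring meant here, a minimal Python sketch that warns when free space on a filesystem drops below a threshold. The default path `/images` (FOG's usual image store) and the 10% threshold are assumptions; adjust them to your storage nodes. A real setup would more likely use an existing monitoring tool, or run a check like this from cron on every node.

```python
import shutil
import sys


def free_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100


def check(path: str = "/images", min_free_percent: float = 10.0) -> bool:
    """Print a warning and return False if free space is below the threshold."""
    pct = free_percent(path)
    if pct < min_free_percent:
        print(f"WARNING: only {pct:.1f}% free on {path}", file=sys.stderr)
        return False
    print(f"OK: {pct:.1f}% free on {path}")
    return True
```

For example, `check("/images", 15.0)` on each storage node would flag a filling disk well before replication hits 0% free space.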

3 a) Should there be some smarter log rotation ?

That is also something a sysadmin should be able to handle. Linux has logrotate, and I don’t see why we should reinvent it.

3 b) "No new tasks found "is logged every 10s - Can we change this time somehow ?

Yes, web UI -> FOG Configuration -> FOG Settings -> FOG Linux Service Sleep Times -> MULTICASTSLEEPTIME

Sorry if my answers sound a bit impolite. I don’t mean it that way! Just wanted to show you that things can be seen from the other side as well.
