Image/Snapin Replication Failed: Group Ownership

markbam

That log was from the Snapin Replicator log from the Fog Log viewer.
I’m not exactly sure on which machine the chmod command is being run. Is it FogServer sending the command over the network or the StorageNode issuing the command locally?

Server side shows all snapins as rwxrwxrwx.
Node side shows some as rwxr-xr-x but the rest are rwxrwxrwx.

Correct, the chmod fails after the transfer and the file is removed from the storage node.
The only way I’ve been able to get it to work is to change the ownership group from fogproject to www-data on the server.

So my particular issue is figuring out why Fog’s FTP once uploaded snapins as fogproject:www-data but now uploads as fogproject:fogproject. Or figuring out why the chmod wants the permissions associated with www-data instead of fogproject.

Sebastian Roth

@markbam said:

I’m not exactly sure on which machine the chmod command is being run. Is it FogServer sending the command over the network or the StorageNode issuing the command locally?

The master node issues an lftp command to sync the files over. So from what we see in the log, I would think that it’s an FTP command issued by the master node but run on the storage node. But it’s kind of strange that you can fix this with chown on the master node.

Node side shows some as rwxr-xr-x but the rest are rwxrwxrwx.

What if you make them all rwxrwxrwx on the storage node?

The only way I’ve been able to get it to work is to change the ownership group from fogproject to www-data on the server.

As I said, this is kind of strange and I do not understand it yet. Possibly I have a wrong understanding of this issue.

So my particular issue is figuring out why Fog’s FTP once uploaded snapins as fogproject:www-data but now uploads as fogproject:fogproject.

I might be wrong but I would suspect that it never uploaded as fogproject:www-data. @Tom-Elliott what do you think?

Or figuring out why the chmod wants the permissions associated with www-data instead of fogproject.

While access rights are a combination of ownership (chown) and permissions (chmod), the two are not otherwise tied to each other. But maybe I got you wrong on this one.

markbam

I’ve started with fresh installations of Ubuntu and fresh installs of both FogServer and Storage Node. What I’m seeing is that all snapins now upload as fogproject:fogproject.

However, when it goes to replicate, only about 70% are successful. The rest continue to experience the same error: “chmod: Access failed: 550 SITE CHMOD command failed”

Permissions and user/groups are the same for every item in the snapin folder. 777 fogproject:fogproject
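
One way to double-check that on both machines is to dump mode, owner, and group for every file and diff the two listings. This is only a minimal sketch, assuming GNU find and stat (as on Ubuntu); list_perms is a made-up helper name, not something FOG ships:

```shell
# Print "mode owner:group path" for every regular file directly inside DIR.
# Assumes GNU stat: %a = octal mode, %U = owner, %G = group, %n = path.
list_perms() {
    local dir="$1"
    find "$dir" -maxdepth 1 -type f -exec stat -c '%a %U:%G %n' {} + | sort -k3
}

# Example (path taken from this thread), run on both server and node:
# list_perms /opt/fog/snapins > /tmp/perms-$(hostname).txt
```

Diffing the two output files should show quickly whether any file really differs in mode or ownership between the two sides.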

Sebastian Roth

@markbam said in Image/Snapin Replication Failed: Group Ownership:

only about 70% are successful.

This doesn’t make sense to me.

@Tom-Elliott Any idea?!

Tom Elliott

@Sebastian-Roth https://stackoverflow.com/questions/25308977/site-chmod-command-failed-through-ftp-cant-figure-out-why

Maybe we need to add the chmod_enable=YES to the VSFTPD configuration?

Sebastian Roth

@Tom-Elliott But why should it work in some case but not in others??

Tom Elliott

@Sebastian-Roth Well, looking further, I don’t understand why lftp is doing chmod. Nowhere do I see it attempting to do chmod for replication elements.

This leads me to think, while the master side permissions are working, maybe the nodes trying to receive the replicated items are owned by root? Meaning maybe fogproject is not the owner on the remote nodes, rather they’re owned by root or fog?

I can only surmise that the files that are failing already exist on the remote side and are owned by a different user, likely one who does not exist on the remote side.

Hopefully that makes sense.

markbam

@Tom-Elliott

This was my initial thinking and why I started over from scratch on both the Server and Storage Node.
The failing snapins are not present on the storage node. For troubleshooting, I’ve even deleted all items in the snapins folder to try and discover a pattern to the failures. It does not seem to be consistent.

Sebastian Roth

@Tom-Elliott said in Image/Snapin Replication Failed: Group Ownership:

Hopefully that makes sense.

Oh yes it does!! That rang a bell for me.

@markbam Please manually adjust the files’ ownership to be the same on all nodes! See if that fixes the issue for you.

markbam

@Sebastian-Roth
I’m not sure I understand. The failing snapins do not exist on the storage node so I’d have nothing to adjust ownership on.

Sebastian Roth

@markbam Quite obviously I headed down the wrong lane! If I had waited a few more minutes to read your last answer, I wouldn’t have posted that.

I’ll probably need to dig into this further and test myself. Will need a bit of time though.

markbam

This may or may not be related:

To update Fog, the installer tells me I need to delete the user account fogproject. When I do so, it changes the user:group of the files in my snapins and images folder from fogproject:fogproject to fogproject:www-data.

So now I know where the www-data is coming from.

markbam

I think I’m on to something. Restating the problem: The Fog Snapin Replication log shows that the snapin transfers are successful, but the chmod then fails and the snapins are deleted from the Storage Node.

But, even though Fog records the transfer as successful, it looks like the snapins don’t actually finish their copy. The snapins only transfer ~100MB, then something goes wrong. Fog logs the transfer as successful anyway and tries to chmod which fails because the file isn’t completely there.

So I’m guessing I’m either dropping a connection or hitting an FTP timeout somewhere?
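
To confirm that a copy really is incomplete, the size and checksum of the same file can be compared on the server and on the node. A rough sketch, assuming GNU stat and sha256sum are available; fingerprint is a hypothetical helper name:

```shell
# Print "bytes sha256 filename" for one file.
# Run it on the server and on the storage node, then compare the two lines.
fingerprint() {
    local f="$1"
    printf '%s %s %s\n' \
        "$(stat -c %s "$f")" \
        "$(sha256sum "$f" | cut -d' ' -f1)" \
        "$(basename "$f")"
}

# Example (file name and path taken from this thread):
# fingerprint /opt/fog/snapins/Test1.zip
```

If the byte counts already differ, the checksum is not even needed; the transfer was cut short.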

Sebastian Roth

@markbam Good catch!! Are you sure the disk on the storage node has enough free space?

markbam

@Sebastian-Roth

Yup plenty of space. Only 34 GB of 2TB used.

markbam

I’ve been noticing some odd things happening:

“Test1.zip” starts an lftp command to transfer to /opt/fog/snapins on the Storage node as expected. Then at ~100 MB transferred, the file “Test1.zip” disappears from /opt/fog/snapins on the Storage node, BUT the FTP command is still active and transferring. The vsftpd processes still have CPU and network activity.

It seems that the file is still stuck transferring to memory but, since it ceases to physically exist, lftp can’t perform a clean termination (the chmod command).

I am able to reproduce this myself using just the command I pulled from Fog. This is run from the Fog Server:
lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; set net:limit-rate 0:128000; mirror -c --parallel=20 -R -i "Test1.zip" --ignore-time -vvv --exclude ".srvprivate" "/opt/fog/snapins" "/opt/fog/snapins"; exit' -u fogproject,'xxxxxx' xxx.xxx.xxx.xxx

With this I can see a progress bar continuing even after the file disappears from the Storage Node at ~100 MB.
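
The exact moment the file vanishes can be pinned down by polling its size on the Storage Node while the lftp command runs. A small sketch, assuming GNU stat; watch_size and its arguments are made up for illustration:

```shell
# Print a timestamped size for FILE every INTERVAL seconds, COUNT times.
# Prints "missing" whenever the file does not exist at that moment.
watch_size() {
    local file="$1"
    local count="${2:-30}"
    local interval="${3:-1}"
    local i=0
    while [ "$i" -lt "$count" ]; do
        if [ -e "$file" ]; then
            printf '%s %s bytes\n' "$(date +%T)" "$(stat -c %s "$file")"
        else
            printf '%s missing\n' "$(date +%T)"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example, run on the Storage Node during a transfer:
# watch_size /opt/fog/snapins/Test1.zip 600 1 | tee /tmp/Test1-size.log
```

The timestamps in that output can then be lined up against the FTP log on the same machine.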

Tom Elliott

@markbam Can you get logs from the remote side?

I don’t know exactly what logs to look for, likely FTP logs if you have them as well as output of dmesg

I’m wondering if selinux is stopping the connection for any files that are beyond 100M. The other thing to look at is /tmp on the remote box. If there’s not enough room there (maybe it’s only got around 100M available too?) I imagine it could be causing an unexpected issue too?

Just spit-balling of course.
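
Both guesses are quick to check on the remote box. A minimal sketch, assuming a POSIX df; free_mb and selinux_status are hypothetical helper names, and getenforce may simply be absent on a stock Ubuntu install:

```shell
# Free space in MB on the filesystem that holds PATH.
# -P forces one output line per filesystem, -k reports 1K blocks.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Report the SELinux mode if the getenforce tool is installed.
selinux_status() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "getenforce not found"
    fi
}

# Example, run on the storage node:
# free_mb /tmp; free_mb /opt/fog/snapins; selinux_status
```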

Sebastian Roth

@markbam Ok, let’s try to enable VSFTP verbose logging: Edit /etc/vsftpd/vsftpd.conf, it should look like this:

max_per_ip=200
anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
listen=YES
pam_service_name=vsftpd
userlist_enable=NO
tcp_wrappers=YES

Add the following two lines:

log_ftp_protocol=YES
vsftpd_log_file=/var/log/vsftpd.log

and change this one line:

xferlog_std_format=NO

This change is important, as it won’t log properly if the latter is still set to YES.

Now restart vsftpd service and watch the log /var/log/vsftpd.log…
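
Once protocol logging is on, the CHMOD exchange should be findable in that log. A small sketch, assuming the log path configured above; chmod_events is a hypothetical helper, and since the exact line format vsftpd writes is not shown in this thread, the filter only matches on the word CHMOD:

```shell
# Show every log line that mentions CHMOD, with its line number.
# Defaults to the log file set via vsftpd_log_file above.
chmod_events() {
    local log="${1:-/var/log/vsftpd.log}"
    grep -n -i 'chmod' "$log"
}

# Example (assuming a systemd-based install):
# sudo systemctl restart vsftpd
# chmod_events /var/log/vsftpd.log
```

The lines just before each match should show which file the failing SITE CHMOD was aimed at and what the server did with it beforehand.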

Sebastian Roth

@markbam Any news on this?

markbam

@Sebastian-Roth

Nothing definitive yet. The ftp logs aren’t showing anything out of the ordinary.
I’m now thinking somehow my network backend between the two sites could be a culprit. I’ve put in a request for more bandwidth and am waiting for that to kick in.
