Image/Snapin Replication Failed: Group Ownership

markbam

Recently some of my images and snapins have been failing to replicate due to group ownership errors. I’m certain my install got messed up from migrating it across various servers/VMs.

When a snapin is uploaded via the webgui, it’s assigned “fogproject:fogproject”. This will fail to replicate.
My solution: run “chown fogproject:www-data uploadedfile.exe” and it will successfully replicate.

Is there somewhere in the configs that I can change the group back to www-data for new uploads?

Thanks

markbam

@Sebastian-Roth

Nothing definitive yet. The ftp logs aren’t showing anything out of the ordinary.
I’m now thinking somehow my network backend between the two sites could be a culprit. I’ve put in a request for more bandwidth and am waiting for that to kick in.

Sebastian Roth

@markbam The upload happens using FTP (user fogproject), and it’s technically not possible to chown a file to a different user ID unless you are root, which is not the case in the environment the FOG services run in. So currently the “hacky” solution is that the FTP client does a chmod 777 on the uploaded files. This way the replication should work.

I just tested this on my server and it works well. Can you please have a look if the snapin files have rwxrwxrwx rights after the initial upload?
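
The check asked for here can be scripted. The following Python sketch is only an illustration and not part of FOG: `describe` and `files_not_777` are hypothetical helper names, and the directory you pass in is whatever path your install uses for snapins.

```python
import grp
import os
import pwd
import stat


def describe(path):
    """Return (mode string, owner name, group name) for one file."""
    info = os.stat(path)
    mode = stat.filemode(info.st_mode)  # e.g. "-rwxrwxrwx"
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid has no passwd entry on this host
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # gid has no group entry on this host
    return mode, owner, group


def files_not_777(directory):
    """List regular files in `directory` whose permission bits are not 777."""
    odd = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and stat.S_IMODE(os.stat(path).st_mode) != 0o777:
            odd.append(name)
    return odd
```

For example, `files_not_777("/opt/fog/snapins")` would list every snapin that lacks rwxrwxrwx; that path is an assumption about a default install and may differ on yours.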

markbam

@Sebastian-Roth

Yes, the r/w permissions look the same (777) for all new and already uploaded snapins. Only difference is the group.

In a working setup, what should the user:group be? fogproject:www-data or fogproject:fogproject?

If I understand correctly, the uploaded snapin’s user:group is determined by the account the server’s FTP is run under. I’ll take a closer look at the group permissions that fogproject runs under and whether anything looks off on my setup.

Sebastian Roth

@markbam said in Image/Snapin Replication Failed: Group Ownership:

In a working setup, what should the user:group be? fogproject:www-data or fogproject:fogproject?

Yeah, should be fogproject:fogproject.

If I understand correctly, the uploaded snapin’s user:group is determined by the account the server’s FTP is run under.

Exactly.

Recently some of my images and snapins have been failing to replicate…

Can you be more precise about how you noticed it doesn’t work? Did you see an error message in the logs? Can you post those here?

markbam

The snapin log errors:

[11-04-19 7:52:01 am] | Started sync for Snapin ExampleZippedSnapin - Resource id #859274
chmod: Access failed: 550 SITE CHMOD command failed. (./ExampleZippedSnapin.zip)
[11-04-19 7:59:59 am] | Sync finished - Resource id #859274

It then deletes the file from the node and starts trying to sync again.
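
To see which snapins are affected without reading the whole log by eye, the failure lines can be pulled out with a regular expression. This is a minimal Python sketch that assumes the log lines look exactly like the excerpt above; `failed_chmod_paths` is a hypothetical helper name, not something FOG provides.

```python
import re

# Matches the failure line format shown in the log excerpt.
CHMOD_FAILURE = re.compile(
    r"^chmod: Access failed: 550 SITE CHMOD command failed\. \((?P<path>.+)\)$"
)


def failed_chmod_paths(log_text):
    """Return the paths named in 'SITE CHMOD command failed' lines, in order."""
    paths = []
    for line in log_text.splitlines():
        match = CHMOD_FAILURE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


sample = """\
[11-04-19 7:52:01 am] | Started sync for Snapin ExampleZippedSnapin - Resource id #859274
chmod: Access failed: 550 SITE CHMOD command failed. (./ExampleZippedSnapin.zip)
[11-04-19 7:59:59 am] | Sync finished - Resource id #859274
"""

print(failed_chmod_paths(sample))  # ['./ExampleZippedSnapin.zip']
```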

As I look again, I do see that the uploads on the server do have the correct permissions of rwxrwxrwx. But when they are replicated to the node they show rwxr-xr-x.
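
The two listings differ only in the write bits for group and other. A short Python sketch makes that difference explicit; `mode_from_string` and `missing_bits` are hypothetical helper names used only for this illustration.

```python
def mode_from_string(perms):
    """Convert a 9-character string such as 'rwxr-xr-x' to its numeric mode."""
    if len(perms) != 9:
        raise ValueError("expected 9 permission characters")
    mode = 0
    for ch in perms:
        # Each position is one bit: set unless the character is '-'.
        mode = (mode << 1) | (ch != "-")
    return mode


def missing_bits(master, node):
    """Return the permission bits set on the master copy but not on the node copy."""
    return mode_from_string(master) & ~mode_from_string(node)


print(oct(mode_from_string("rwxrwxrwx")))            # 0o777
print(oct(mode_from_string("rwxr-xr-x")))            # 0o755
print(oct(missing_bits("rwxrwxrwx", "rwxr-xr-x")))   # 0o22
```

The result 0o22 is group write (0o20) plus other write (0o02), which are the bits the failed chmod would have had to add on the node.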

Sebastian Roth

@markbam Sorry, I forgot to ask: on which server do you see this? Please check access rights on the master node and the storage node. If I read the log message correctly, it tries to chmod on the storage node side and fails.

markbam

That log was from the Snapin Replicator log from the Fog Log viewer.
I’m not exactly sure on which machine the chmod command is being run. Is it FogServer sending the command over the network or the StorageNode issuing the command locally?

Server side shows all snapins as rwxrwxrwx.
Node side shows some as rwxr-xr-x but the rest are rwxrwxrwx.

Correct, the chmod fails after the transfer and the file is removed from the storage node.
The only way I’ve been able to get it to work is to change the ownership group from fogproject to www-data on the server.

So my particular issue is figuring out why Fog’s FTP once uploaded snapins as fogproject:www-data but now uploads as fogproject:fogproject. Or figuring out why the chmod wants the permissions associated with www-data instead of fogproject.

Sebastian Roth

@markbam said:

I’m not exactly sure on which machine the chmod command is being run. Is it FogServer sending the command over the network or the StorageNode issuing the command locally?

The master node issues an lftp command to sync the files over. So from what we see in the log, I would think that it’s an FTP command issued by the master node but run on the storage node. But it’s kind of strange that you can fix this by chown on the master node.

Node side shows some as rwxr-xr-x but the rest are rwxrwxrwx.

What if you make them all rwxrwxrwx on the storage node?

The only way I’ve been able to get it to work is to change the ownership group from fogproject to www-data on the server.

As I said, this is kind of strange and I do not understand it yet. Possibly I have a wrong understanding of this issue.

So my particular issue is figuring out why Fog’s FTP once uploaded snapins as fogproject:www-data but now uploads as fogproject:fogproject.

I might be wrong but I would suspect that it never uploaded as fogproject:www-data. @Tom-Elliott what do you think?

Or figuring out why the chmod wants the permissions associated with www-data instead of fogproject.

While access rights are a combination of ownership (chown) and permissions (chmod), those are not really associated with each other any further. But maybe I got you wrong on this one.

markbam

I’ve started over with fresh installations of Ubuntu and fresh installs of both FogServer and Storage Node. What I’m seeing is that all snapins now upload as fogproject:fogproject.

However, when it goes to replicate, only about 70% are successful. The rest continue to experience the same error: “chmod: Access failed: 550 SITE CHMOD command failed”

Permissions and user/groups are the same for every item in the snapin folder. 777 fogproject:fogproject

Sebastian Roth

@markbam said in Image/Snapin Replication Failed: Group Ownership:

only about 70% are successful.

This doesn’t make sense to me.

@Tom-Elliott Any idea?!

Tom Elliott

@Sebastian-Roth https://stackoverflow.com/questions/25308977/site-chmod-command-failed-through-ftp-cant-figure-out-why

Maybe we need to add chmod_enable=YES to the VSFTPD configuration?
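
Before adding the setting, it is worth checking what the node's configuration already says. The vsftpd man page spells the option `chmod_enable` and documents it as defaulting to YES, so an explicit line would mainly matter if it had been set to NO. The Python sketch below only reads configuration text; `vsftpd_values` is a hypothetical helper name, and the file location mentioned afterwards is an assumption.

```python
def vsftpd_values(config_text, name):
    """Return every value assigned to option `name`, in file order.

    vsftpd.conf uses 'option=value' lines with no spaces around '=';
    lines starting with '#' are comments.
    """
    values = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key == name:
            values.append(value)
    return values


# Hypothetical configuration text, for illustration only.
sample_conf = """\
# Example config
local_enable=YES
write_enable=YES
#chmod_enable=NO
"""

print(vsftpd_values(sample_conf, "chmod_enable"))  # []
print(vsftpd_values(sample_conf, "write_enable"))  # ['YES']
```

On Ubuntu the file is usually `/etc/vsftpd.conf` (an assumption here); an empty result means the option is not set explicitly and the server's default applies.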

Sebastian Roth

@Tom-Elliott But why should it work in some case but not in others??

Tom Elliott

@Sebastian-Roth Well, looking further, I don’t understand why lftp is doing chmod. Nowhere do I see it attempting to do chmod for replication elements.

This leads me to think that, while the master side permissions are working, maybe the files on the nodes receiving the replicated items are owned by root? Meaning maybe fogproject is not the owner on the remote nodes, and the files are instead owned by root or fog?

I can only surmise that the files that are failing already exist on the remote side and are owned by a different user, likely one who does not exist on the remote side.

Hopefully that makes sense.

markbam

@Tom-Elliott

This was my initial thinking and why I started over from scratch on both the Server and Storage Node.
The failing snapins are not present on the storage node. For troubleshooting, I’ve even deleted all items in the snapins folder to try and discover a pattern to the failures. It does not seem to be consistent.

Sebastian Roth

@Tom-Elliott said in Image/Snapin Replication Failed: Group Ownership:

Hopefully that makes sense.

Oh yes it does!! That rang a bell for me.

@markbam Please manually adjust the files’ ownership to be the same on all nodes! See if that fixes the issue for you.

markbam

@Sebastian-Roth
I’m not sure I understand. The failing snapins do not exist on the storage node so I’d have nothing to adjust ownership on.

Sebastian Roth

@markbam Quite obviously I headed down the wrong lane! If I had waited a few more minutes to read your last answer, I wouldn’t have posted that.

I’ll probably need to dig into this further and test myself. Will need a bit of time though.

markbam

This may or may not be related:

To update Fog, the installer tells me I need to delete the user account fogproject. When I do so, it changes the user:group of the files in my snapins and images folder from fogproject:fogproject to fogproject:www-data.

So now I know where the www-data is coming from.

markbam

I think I’m on to something. Restating the problem: The Fog Snapin Replication log shows that the snapin transfers are successful, but then the chmod fails and the snapins are deleted from the Storage Node.

But, even though Fog records the transfer as successful, it looks like the snapins don’t actually finish their copy. The snapins only transfer ~100MB, then something goes wrong. Fog logs the transfer as successful anyway and tries to chmod which fails because the file isn’t completely there.

So I’m guessing I’m either dropping a connection or hitting an FTP timeout somewhere?
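
One way to confirm a truncated transfer is to compare file sizes on both sides. The Python sketch below assumes you can get a directory listing from each machine (for example by running `file_sizes` on the server and on the node and comparing the results); `file_sizes` and `incomplete_copies` are hypothetical helper names, and the byte counts in the example are made up.

```python
import os


def file_sizes(directory):
    """Map each regular file name in `directory` to its size in bytes."""
    sizes = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes


def incomplete_copies(master_sizes, node_sizes):
    """Return {name: (master_size, node_size)} for files that are missing
    on the node (node_size is None) or whose sizes differ."""
    problems = {}
    for name, size in master_sizes.items():
        node_size = node_sizes.get(name)
        if node_size != size:
            problems[name] = (size, node_size)
    return problems


# Made-up sizes: the node copy stopped at roughly 100 MB.
master = {"ExampleZippedSnapin.zip": 350_000_000}
node = {"ExampleZippedSnapin.zip": 100_000_000}
print(incomplete_copies(master, node))
# {'ExampleZippedSnapin.zip': (350000000, 100000000)}
```

Because the failing files are deleted from the node after the chmod error, the node listing has to be taken while the sync is still in progress or right after the transfer step.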

Sebastian Roth

@markbam Good catch!! Are you sure the disk on the storage node has enough free space?
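
Free space can be checked quickly from Python on the storage node. This is a minimal sketch; `has_room` is a hypothetical helper name, and the path and size in the usage note are placeholders.

```python
import shutil


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

For example, `has_room("/opt/fog/snapins", 350_000_000)` asks whether a 350 MB snapin would fit; both values are assumptions to adapt to your node.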
