Image/Snapin Replication Failed: Group Ownership
Recently some of my images and snapins have been failing to replicate due to group ownership errors. I’m certain my install got messed up from migrating it across various servers/VMs.
When a snapin is uploaded via the webgui, it’s assigned “fogproject:fogproject”. This will fail to replicate.
My solution: run “chown fogproject:www-data uploadedfile.exe” and it will successfully replicate.
Is there somewhere in the configs that I can change the group back to www-data for new uploads?
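Before running chown by hand, it can help to see which uploads actually differ from the ownership you expect. Here is a minimal sketch, assuming GNU find and stat (as shipped with Ubuntu). The helper name list_mismatched is made up for illustration and is not part of FOG.

```shell
#!/bin/sh
# list_mismatched DIR OWNER:GROUP
# Hypothetical helper (not part of FOG): prints "owner:group path" for every
# regular file directly under DIR whose ownership differs from OWNER:GROUP.
list_mismatched() {
    dir=$1
    expected=$2
    find "$dir" -maxdepth 1 -type f | while IFS= read -r f; do
        # %U and %G are the owner and group names of the file (GNU stat).
        actual=$(stat -c '%U:%G' "$f")
        if [ "$actual" != "$expected" ]; then
            printf '%s %s\n' "$actual" "$f"
        fi
    done
}
```

For example, list_mismatched /opt/fog/snapins fogproject:www-data would list the snapins that still need the chown above. Empty output means everything already matches.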
@markbam Are you good with network analysis using tcpdump/wireshark? If you are keen to give it a try and need help, just let us know.
Nothing definitive yet. The ftp logs aren’t showing anything out of the ordinary.
I’m now thinking somehow my network backend between the two sites could be a culprit. I’ve put in a request for more bandwidth and am waiting for that to kick in.
@markbam Any news on this?
@markbam Ok, let’s try to enable vsftpd verbose logging. Edit /etc/vsftpd/vsftpd.conf, it should look like this:
max_per_ip=200
anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
listen=YES
pam_service_name=vsftpd
userlist_enable=NO
tcp_wrappers=YES
Add the following two lines:
and change this one line:
This change is important, as it won’t log properly if the latter is still set to YES.
Now restart the vsftpd service and watch the log
@markbam Can you get logs from the remote side?
I don’t know exactly what logs to look for, likely FTP logs if you have them as well as output of
I’m wondering if selinux is stopping the connection for any files that are beyond 100M. The other thing to look at is /tmp on the remote box. If there’s not enough room there (maybe it’s only got around 100M available too?) I imagine it could be causing an unexpected issue too?
Just spit-balling of course.
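To rule those two guesses in or out on the remote box, something like the following could be run there. It is only a sketch: free_mb and selinux_mode are hypothetical helper names, and it assumes a df that accepts -m (GNU coreutils does). On a stock Ubuntu install getenforce is usually not present, which the second helper reports rather than treating as an error.

```shell
#!/bin/sh
# free_mb DIR: print the free space, in whole megabytes, of the filesystem
# that holds DIR. -P forces one line per filesystem, -m uses 1M blocks.
free_mb() {
    df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# selinux_mode: print the current SELinux mode as reported by getenforce,
# or "not installed" when the getenforce tool is absent.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "not installed"
    fi
}
```

Running free_mb /tmp and selinux_mode on the storage node shows quickly whether /tmp is down to roughly 100M free or whether SELinux is enforcing.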
I’ve been noticing some odd things happening:
“Test1.zip” starts an lftp command to transfer to /opt/fog/snapins on the Storage node as expected. Then at ~100mb transferred, the file “Test1.zip” disappears from /opt/fog/snapins on the Storage node BUT the ftp command is still active and transferring. The vsftp processes still have cpu and network activity.
It seems that the file is still stuck transferring to memory but, since it ceases to physically exist, lftp can’t perform a clean termination (the chmod command).
I am able to reproduce this myself using just the command I pulled from Fog. This is run from the Fog Server:
lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; set net:limit-rate 0:128000; mirror -c --parallel=20 -R -i "Test1.zip" --ignore-time -vvv --exclude ".srvprivate" "/opt/fog/snapins" "/opt/fog/snapins"; exit' -u fogproject,'xxxxxx' xxx.xxx.xxx.xxx
With this I can see a progress bar continuing even after the file disappears from the Storage Node at ~100mb.
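To pin down exactly when the file vanishes, the storage node side can be polled while lftp runs. This is a rough sketch assuming GNU stat. watch_file is a hypothetical helper name, and the interval and count are whatever suits the transfer.

```shell
#!/bin/sh
# watch_file PATH INTERVAL COUNT
# Hypothetical helper: every INTERVAL seconds, COUNT times, print the time
# and either the current size of PATH in bytes or "missing" if it is gone.
watch_file() {
    path=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # stat fails if the file has disappeared, which selects the else branch.
        if size=$(stat -c %s "$path" 2>/dev/null); then
            printf '%s %s bytes\n' "$(date +%T)" "$size"
        else
            printf '%s missing\n' "$(date +%T)"
        fi
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

On the storage node, watch_file /opt/fog/snapins/Test1.zip 1 300 would print one line per second for five minutes, so the last size before "missing" shows how far the transfer got.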
Yup plenty of space. Only 34 GB of 2TB used.
@markbam Good catch!! Are you sure the disk on the storage node has enough free space?
I think I’m on to something. Restating the problem: The Fog Snapin Replication log shows that the snapin transfers are successful, but then the chmod fails and the snapins are deleted from the Storage Node.
But, even though Fog records the transfer as successful, it looks like the snapins don’t actually finish their copy. The snapins only transfer ~100MB, then something goes wrong. Fog logs the transfer as successful anyway and tries to chmod which fails because the file isn’t completely there.
So I’m guessing I’m either dropping a connection or hitting a FTP timeout somewhere?
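One way to confirm whether a copy really is incomplete is to compare size and checksum on both ends. The following is a sketch only, assuming GNU stat and sha256sum are available on the server and the node. file_digest is a hypothetical helper name.

```shell
#!/bin/sh
# file_digest PATH: print "SIZE SHA256" for PATH on one line.
# Hypothetical helper: run it against the same snapin on the server and on
# the storage node, then compare the two lines.
file_digest() {
    size=$(stat -c %s "$1") || return 1
    sum=$(sha256sum "$1" | awk '{ print $1 }') || return 1
    printf '%s %s\n' "$size" "$sum"
}
```

If file_digest /opt/fog/snapins/Test1.zip prints a different line on the node than on the server, the transfer did not complete, whatever the replication log says.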
This may or may not be related:
To update Fog, the installer tells me I need to delete the user account fogproject. When I do so, it changes the user:group of the files in my snapins and images folder from fogproject:fogproject to fogproject:www-data.
So now I know where the www-data is coming from.
@markbam Quite obviously I headed down the wrong lane! If I had waited a few more minutes to read your last answer I wouldn’t have posted that.
I’ll probably need to dig into this further and test myself. Will need a bit of time though.
I’m not sure I understand. The failing snapins do not exist on the storage node so I’d have nothing to adjust ownership on.
Hopefully that makes sense.
Oh yes it does!! That rang a bell for me.
@markbam Please manually adjust the file ownership to be the same on all nodes! See if that fixes the issue for you.
This was my initial thinking and why I started over from scratch on both the Server and Storage Node.
The failing snapins are not present on the storage node. For troubleshooting, I’ve even deleted all items in the snapins folder to try and discover a pattern to the failures. It does not seem to be consistent.
@Sebastian-Roth Well, looking further, I don’t understand why lftp is doing chmod. Nowhere do I see it attempting to do chmod for replication elements.
This leads me to think, while the master side permissions are working, maybe the nodes trying to receive the replicated items are owned by root? Meaning maybe fogproject is not the owner on the remote nodes, rather they’re owned by root or fog?
I can only surmise that the files that are failing already exist on the remote side and are owned by a different user, likely one who does not exist on the remote side.
Hopefully that makes sense.
@Tom-Elliott But why should it work in some cases but not in others??
Maybe we need to add the chmod_enabled=YES to the VSFTPD configuration?
only about 70% are successful.
This doesn’t make sense to me.
@Tom-Elliott Any idea?!
I’ve started with fresh installations of Ubuntu and fresh installs of both FogServer and Storage Node. What I’m seeing is that all snapins now upload as fogproject:fogproject.
However, when it goes to replicate, only about 70% are successful. The rest continue to experience the same error: “chmod: Access failed: 550 SITE CHMOD command failed”
Permissions and user/groups are the same for every item in the snapin folder. 777 fogproject:fogproject