Image/Snapin Replication Failed: Group Ownership

markbam

This may or may not be related:

To update Fog, the installer tells me I need to delete the user account fogproject. When I do so, it changes the user:group of the files in my snapins and images folder from fogproject:fogproject to fogproject:www-data.

So now I know where the www-data is coming from.

markbam

I think I’m on to something. Restating the problem: The Fog Snapin Replication log shows that the snapin transfers succeed, but the chmod that follows fails and the snapins are deleted from the Storage Node.

But, even though Fog records the transfer as successful, it looks like the snapins don’t actually finish their copy. The snapins only transfer ~100MB, then something goes wrong. Fog logs the transfer as successful anyway and tries to chmod which fails because the file isn’t completely there.

So I’m guessing I’m either dropping a connection or hitting an FTP timeout somewhere?

Sebastian Roth

@markbam Good catch!! Are you sure the disk on the storage node has enough free space?

markbam

@Sebastian-Roth

Yup plenty of space. Only 34 GB of 2TB used.

markbam

I’ve been noticing some odd things happening:

“Test1.zip” starts an lftp command to transfer to /opt/fog/snapins on the Storage node as expected. Then at ~100 MB transferred, the file “Test1.zip” disappears from /opt/fog/snapins on the Storage node BUT the ftp command is still active and transferring. The vsftpd processes still have CPU and network activity.

It seems that the file is still stuck transferring to memory but, since it ceases to physically exist, lftp can’t perform a clean termination (the chmod command).

I am able to reproduce this myself using just the command I pulled from Fog. This is run from the Fog Server:
lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; set net:limit-rate 0:128000; mirror -c --parallel=20 -R -i "Test1.zip" --ignore-time -vvv --exclude ".srvprivate" "/opt/fog/snapins" "/opt/fog/snapins"; exit' -u fogproject,'xxxxxx' xxx.xxx.xxx.xxx

With this I can see a progress bar continuing even after the file disappears from the Storage Node at ~100 MB.
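
To pin down the moment the file vanishes, something like the following could be left running on the Storage Node while the transfer is active. It is only a sketch: watch_file is a hypothetical helper name, not part of Fog, and the path in the usage comment is the one from the example above.

```shell
#!/bin/sh
# watch_file FILE INTERVAL COUNT
# Print a timestamped size for FILE (or "gone" if it does not exist),
# COUNT times, sleeping INTERVAL seconds between samples.
watch_file() {
    i=0
    while [ "$i" -lt "$3" ]; do
        if [ -f "$1" ]; then
            echo "$(date +%T) $(wc -c < "$1" | tr -d ' ') bytes"
        else
            echo "$(date +%T) gone"
        fi
        i=$((i + 1))
        # Skip the sleep after the final sample.
        if [ "$i" -lt "$3" ]; then
            sleep "$2"
        fi
    done
    return 0
}

# Example (on the Storage Node): sample every 2 seconds, 300 times.
# watch_file /opt/fog/snapins/Test1.zip 2 300
```

The timestamp of the first "gone" line can then be matched against the FTP log and the replication log on the Fog Server.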

Tom Elliott

@markbam Can you get logs from the remote side?

I don’t know exactly what logs to look for, likely FTP logs if you have them, as well as the output of dmesg.

I’m wondering if SELinux is stopping the connection for any files that are beyond 100M. The other thing to look at is /tmp on the remote box. If there’s not enough room there (maybe it’s only got around 100M available too?) I imagine it could be causing an unexpected issue too?

Just spit-balling of course.
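
Both guesses can be checked quickly on the remote box. A rough sketch, assuming a POSIX df and that getenforce is only present where the SELinux tools are installed; free_kb is a made-up helper name:

```shell
#!/bin/sh
# free_kb DIR: print the free space, in KiB, on the filesystem holding DIR.
# Uses the POSIX output format (-P) so the "Available" column is field 4.
free_kb() {
    df -Pk "$1" | awk 'NR==2 { print $4 }'
}

echo "/tmp free: $(free_kb /tmp) KiB"

# Report the SELinux mode only if the tool exists on this machine.
if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux mode: $(getenforce)"
else
    echo "SELinux tools not found"
fi
```

If /tmp shows roughly 100 MB free, or SELinux reports Enforcing, that would be worth following up alongside the dmesg output.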

Sebastian Roth

@markbam Ok, let’s try to enable vsftpd verbose logging: Edit /etc/vsftpd/vsftpd.conf, which should look like this:

max_per_ip=200
anonymous_enable=NO
local_enable=YES
write_enable=YES
local_umask=022
dirmessage_enable=YES
xferlog_enable=YES
connect_from_port_20=YES
xferlog_std_format=YES
listen=YES
pam_service_name=vsftpd
userlist_enable=NO
tcp_wrappers=YES

Add the following two lines:

log_ftp_protocol=YES
vsftpd_log_file=/var/log/vsftpd.log

and change this one line:

xferlog_std_format=NO

This change is important, as vsftpd won’t log properly if xferlog_std_format is still set to YES.

Now restart the vsftpd service and watch the log /var/log/vsftpd.log…
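
Before restarting, it may help to confirm that all three settings actually made it into the file. A minimal sketch: check_vsftpd_logging is a hypothetical helper, it matches whole lines exactly (so trailing spaces would count as missing), and the restart command in the comment assumes a systemd-based distribution.

```shell
#!/bin/sh
# check_vsftpd_logging CONF
# Report whether each of the three logging settings is present in CONF.
# Returns 0 only if all three are found as exact lines.
check_vsftpd_logging() {
    conf=$1
    rc=0
    for setting in \
        'log_ftp_protocol=YES' \
        'vsftpd_log_file=/var/log/vsftpd.log' \
        'xferlog_std_format=NO'
    do
        if grep -qxF "$setting" "$conf"; then
            echo "ok:      $setting"
        else
            echo "missing: $setting"
            rc=1
        fi
    done
    return $rc
}

# Example (assumes systemd; adjust for your init system):
# check_vsftpd_logging /etc/vsftpd/vsftpd.conf \
#     && sudo systemctl restart vsftpd \
#     && sudo tail -f /var/log/vsftpd.log
```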

Sebastian Roth

@markbam Any news on this?

markbam

@Sebastian-Roth

Nothing definitive yet. The ftp logs aren’t showing anything out of the ordinary.
I’m now thinking somehow my network backend between the two sites could be a culprit. I’ve put in a request for more bandwidth and am waiting for that to kick in.

Sebastian Roth

@markbam Are you comfortable analyzing network traffic with tcpdump/Wireshark? If you are keen to give it a try and need help, just let us know.
