Nothing definitive yet. The ftp logs aren’t showing anything out of the ordinary.
I’m now thinking somehow my network backend between the two sites could be a culprit. I’ve put in a request for more bandwidth and am waiting for that to kick in.
Latest posts made by markbam
-
RE: Image/Snapin Replication Failed: Group Ownership
-
RE: Image/Snapin Replication Failed: Group Ownership
I’ve been noticing some odd things happening:
Replication of “Test1.zip” starts an lftp transfer to /opt/fog/snapins on the Storage Node as expected. Then, at ~100 MB transferred, the file “Test1.zip” disappears from /opt/fog/snapins on the Storage Node BUT the FTP transfer is still active. The vsftpd processes still show CPU and network activity.
It seems that the file is still stuck transferring to memory but, since it ceases to physically exist, lftp can’t perform a clean termination (the chmod command).
I am able to reproduce this myself using just the command I pulled from Fog. This is run from the Fog Server:
lftp -e 'set ftp:list-options -a;set net:max-retries 10;set net:timeout 30; set net:limit-rate 0:128000; mirror -c --parallel=20 -R -i "Test1.zip" --ignore-time -vvv --exclude ".srvprivate" "/opt/fog/snapins" "/opt/fog/snapins"; exit' -u fogproject,'xxxxxx' xxx.xxx.xxx.xxx
With this I can see a progress bar continuing even after the file disappears from the Storage Node at ~100 MB.
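To pin down exactly when the file vanishes, the size of the target file can be polled on the Storage Node while the transfer runs. This is only a rough sketch, assuming Python 3 is available on the node; the helper names `current_size` and `watch_size` are made up for illustration and are not part of Fog.

```python
import os
import time


def current_size(path):
    """Return the file size in bytes, or None if the path does not exist."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return None


def watch_size(path, interval=1.0, samples=10):
    """Collect `samples` readings of (seconds_elapsed, size_or_None)."""
    start = time.monotonic()
    readings = []
    for index in range(samples):
        elapsed = round(time.monotonic() - start, 1)
        readings.append((elapsed, current_size(path)))
        # Sleep between readings, but not after the last one.
        if index < samples - 1:
            time.sleep(interval)
    return readings
```

Calling `watch_size("/opt/fog/snapins/Test1.zip", interval=5, samples=120)` during a transfer should show the size growing and then switching to `None` at the point where the file is removed.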
-
RE: Image/Snapin Replication Failed: Group Ownership
Yup plenty of space. Only 34 GB of 2TB used.
-
RE: Image/Snapin Replication Failed: Group Ownership
I think I’m on to something. Restating the problem: The Fog Snapin Replication log shows that the snapin transfers are successful, but then the chmod fails and the snapins are deleted from the Storage Node.
But, even though Fog records the transfer as successful, it looks like the snapins don’t actually finish their copy. The snapins only transfer ~100 MB, then something goes wrong. Fog logs the transfer as successful anyway and tries to chmod, which fails because the file isn’t completely there.
So I’m guessing I’m either dropping a connection or hitting an FTP timeout somewhere?
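One way to check the "incomplete copy" theory is to fingerprint the same snapin on both machines and compare the results. A minimal sketch, assuming Python 3 on both the Server and the Storage Node; `file_fingerprint` is a hypothetical helper, not something Fog provides.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large snapin does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Running `file_fingerprint("/opt/fog/snapins/Test1.zip")` on each side and comparing the two tuples would show whether the node copy is truncated (smaller size) or merely different (same size, different hash).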
-
RE: Image/Snapin Replication Failed: Group Ownership
This may or may not be related:
To update Fog, the installer tells me I need to delete the user account fogproject. When I do so, it changes the user:group of the files in my snapins and images folder from fogproject:fogproject to fogproject:www-data.
So now I know where the www-data is coming from.
-
RE: Image/Snapin Replication Failed: Group Ownership
@Sebastian-Roth
I’m not sure I understand. The failing snapins do not exist on the storage node so I’d have nothing to adjust ownership on.
-
RE: Image/Snapin Replication Failed: Group Ownership
This was my initial thinking and why I started over from scratch on both the Server and Storage Node.
The failing snapins are not present on the storage node. For troubleshooting, I’ve even deleted all items in the snapins folder to try and discover a pattern to the failures. It does not seem to be consistent.
-
RE: Image/Snapin Replication Failed: Group Ownership
I’ve started with fresh installations of Ubuntu and fresh installs of both FogServer and Storage Node. What I’m seeing is that all snapins now upload as fogproject:fogproject.
However, when it goes to replicate, only about 70% are successful. The rest continue to experience the same error: “chmod: Access failed: 550 SITE CHMOD command failed”
Permissions and user/groups are the same for every item in the snapin folder. 777 fogproject:fogproject
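To compare permissions and ownership between the Server and the Storage Node without eyeballing `ls -l`, something like the following could be run on each machine and the output diffed. This is a sketch under the assumption that Python 3 is installed on both; `describe_entries` is an illustrative name only.

```python
import grp
import os
import pwd
import stat


def describe_entries(directory):
    """Return (name, mode_string, owner, group) tuples, sorted by name."""
    rows = []
    for name in sorted(os.listdir(directory)):
        info = os.lstat(os.path.join(directory, name))
        try:
            owner = pwd.getpwuid(info.st_uid).pw_name
        except KeyError:
            owner = str(info.st_uid)  # uid has no passwd entry
        try:
            group = grp.getgrgid(info.st_gid).gr_name
        except KeyError:
            group = str(info.st_gid)  # gid has no group entry
        rows.append((name, stat.filemode(info.st_mode), owner, group))
    return rows
```

For a snapin with mode 777 owned by fogproject:fogproject, the row would look like `("Test1.zip", "-rwxrwxrwx", "fogproject", "fogproject")`.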
-
RE: Image/Snapin Replication Failed: Group Ownership
That log was from the Snapin Replicator log from the Fog Log viewer.
I’m not exactly sure which machine the chmod command is being run on. Is it FogServer sending the command over the network or the StorageNode issuing the command locally?
Server side shows all snapins as rwxrwxrwx.
Node side shows some as rwxr-xr-x but the rest are rwxrwxrwx.
Correct, the chmod fails after the transfer and the file is removed from the storage node.
The only way I’ve been able to get it to work is to change the ownership group from fogproject to www-data on the server.
So my particular issue is figuring out why Fog’s FTP once uploaded snapins as fogproject:www-data but now uploads as fogproject:fogproject. Or figuring out why the chmod wants the permissions associated with www-data instead of fogproject.
-
RE: Image/Snapin Replication Failed: Group Ownership
The snapin log errors:
[11-04-19 7:52:01 am] | Started sync for Snapin ExampleZippedSnapin - Resource id #859274
chmod: Access failed: 550 SITE CHMOD command failed. (./ExampleZippedSnapin.zip)
[11-04-19 7:59:59 am] | Sync finished - Resource id #859274
It then deletes the file from the node and starts trying to sync again.
As I look again, I do see that the uploads on the server do have the correct permissions of rwxrwxrwx. But when they are replicated to the node they show rwxr-xr-x.
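Since the failures don't seem to follow a pattern, it may help to tally which files hit the chmod error across a longer stretch of the Snapin Replicator log. A rough sketch, assuming the error line always has the format quoted above and that the log text has been copied out of the Fog Log viewer; `chmod_failure_counts` is a made-up helper name.

```python
import re
from collections import Counter

# Matches the error line format quoted in the log excerpt above.
CHMOD_FAILURE = re.compile(
    r"chmod: Access failed: 550 SITE CHMOD command failed\. \((?P<path>[^)]+)\)"
)


def chmod_failure_counts(log_text):
    """Count how often each file path appears in a chmod failure line."""
    return Counter(
        match.group("path") for match in CHMOD_FAILURE.finditer(log_text)
    )
```

Feeding it the three log lines above gives a count of 1 for `./ExampleZippedSnapin.zip`; over a full day of logs, files that fail on every sync cycle would stand out from ones that fail only occasionally.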