Feature request for FOG 1.6.x - Replace NFSv3
-
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
AES-NI requires CPU support, and not all CPUs have it. Some of the enterprise Intel CPUs do, but not all. I think it would be risky to rely on AES-NI being available in the CPU.
If the person doing the imaging wants to use encryption, perhaps FOS can detect if AESNI is supported and if so, use it. Otherwise fall back to something else that still provides encryption but might be slower.
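A minimal sketch of that detection, assuming FOS exposes `/proc/cpuinfo` like any other Linux system and that the transfer runs over OpenSSH. The function name `pick_cipher` is made up for illustration. Both cipher names are standard OpenSSH ciphers, but which one is actually faster on a given CPU would have to be measured.

```shell
#!/bin/sh
# pick_cipher [cpuinfo-file]
# Prints an OpenSSH cipher name: AES-GCM when the CPU advertises the
# "aes" flag (AES-NI), otherwise ChaCha20-Poly1305, which does not
# depend on AES hardware support.
pick_cipher() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # On x86 Linux the "flags" line lists "aes" when AES-NI is present.
    if grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw aes; then
        echo "aes128-gcm@openssh.com"
    else
        echo "chacha20-poly1305@openssh.com"
    fi
}

cipher=$(pick_cipher)
echo "selected cipher: $cipher"
# Hypothetical use: ssh -c "$cipher" root@192.168.10.1 ...
```

Either branch still encrypts the stream; only the cipher changes, so imaging would keep working on CPUs without AES-NI.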
-
Below are some baseline tests from my test lab between a FOG server and a target computer. The FOG server is running CentOS 7 on a Dell 7010. The target computer is also a Dell 7010. I used these lower-end systems specifically to test changes in kernel parameters, expecting that a lower-end system would show more of a change (percentage-wise) than a fast FOG server and target computer. Both the FOG server and target computer have SATA SSD drives installed.
For testing I created a 10GB file with dd containing all 0’s. I used this file to benchmark sending data between the FOG server and target computer. The network setup is the two computers on an isolated SG350 network switch.
The first test is copying a file from the FOG server to the local hard drive on the target computer. I ran the test 3 times to get an average.
# time cp /mnt/t2/r101gb.img .
real 1m36.260s
user 0m0.036s
sys 0m6.660s

# time cp /mnt/t2/r102gb.img .
real 1m36.334s
user 0m0.051s
sys 0m7.023s

# time cp /mnt/t2/r103gb.img .
real 1m35.751s
user 0m0.059s
sys 0m7.047s
The next test is using socat to copy the 10GB file from the FOG server to the target computer. Note below is only the client timing marks since this was a pull request from the FOG server.
# time socat TCP:192.168.10.1:8800 /mnt/t2/r10gb.img
real 1m31.916s
user 0m4.963s
sys 0m19.418s

# time socat TCP:192.168.10.1:8800 /mnt/t2/r10gb.img
real 1m31.916s
user 0m4.536s
sys 0m16.369s

# time socat TCP:192.168.10.1:8800 /mnt/t2/r10gb.img
real 1m31.922s
user 0m4.644s
sys 0m17.251s
So in the end there weren’t any remarkable differences in transfer times between NFSv4 and socat. It would be difficult (at this time) to make a good argument for moving away from NFSv4 given the amount of effort it would take to implement socat in FOG. Both socat and NFSv4 use a single TCP port.
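To put numbers on the comparison, the `real` times from the two tests can be averaged with a few lines of shell. This is only a sketch: the helper names `to_seconds` and `average` are made up, and the throughput figure assumes the test file is exactly 10 GiB (10737418240 bytes), which is an assumption about the file size.

```shell
#!/bin/sh
# Convert a `time`-style value such as 1m36.260s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Average a list of `time` values and print mean seconds and MB/s.
# bytes is the assumed file size (10 GiB).
average() {
    for t in "$@"; do to_seconds "$t"; done |
        awk -v bytes=10737418240 '
            { sum += $1; n++ }
            END { avg = sum / n
                  printf "%.1f s, %.1f MB/s\n", avg, bytes / avg / 1000000 }'
}

echo "NFS cp: $(average 1m36.260s 1m36.334s 1m35.751s)"
echo "socat:  $(average 1m31.916s 1m31.916s 1m31.922s)"
```

Both averages land close to what a single gigabit link can carry, which fits the observation that the protocol itself is not the limiting factor here.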
With socat we can add certificate authentication. Authentication is also available on NFSv4. At this time it’s not clear whether using certificates has an impact on transfer rates (as in full end-to-end encryption) or only affects the TLS handshake.
-
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
It would be difficult (at this time) to make a good argument for moving away from NFSv4 given the amount of effort it would take to implement socat in FOG. Both socat and NFSv4 use a single TCP port.
Could we run the server “end” on the FOS client? This way we would only use the SSH port to setup socat in client mode on the FOG server without opening an extra port. Though on the other hand people who want to use FOG with a network firewall in between (e.g. connecting two sites via VPN) would still need to handle the reverse connection (FOG server to FOS engine).
-
@Sebastian-Roth In the case of socat the term server and client are relative to the direction of data flow. Data always flows from the client to the server (processes). Understand at this point there is no encryption in the mix to add that overhead. With socat ssh is only used to initiate the FOG Server side of the push/pull. No data is flowing across that link.
Since I have the test lab, I decided to test a scp file transfer from the target computer to the FOG Server.
# time scp /mnt/t2/r10gb.img root@192.168.10.1:/images/r11gb.img
The authenticity of host '192.168.10.1 (192.168.10.1)' can't be established.
ECDSA key fingerprint is SHA256:OpIsFYWVDCr/ovMlmPPSl46jpT332P3+BHnchdxzTCI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.10.1' (ECDSA) to the list of known hosts.
root@192.168.10.1's password:
r10gb.img 100% 10GB 110.5MB/s 01:32
real 1m40.016s
user 0m43.767s
sys 0m12.531s
So on a quiet FOG server and network, scp is about 4 seconds slower than NFS and about 10 seconds slower than socat, but with scp you have the benefit of the data being encrypted. You can see the increase in user-space CPU time for scp compared with the NFS and socat transfers.
-
@george1421 The scp timing is interesting, though I suspect the time taken to accept the host key and enter the password might account for some of it. Would you mind redoing this test using an SSH key?
-
@Sebastian-Roth no problem I’ll hit that first thing in the AM. But really a 10 second (total) difference is pretty much a rounding error. The other thing I need to see is if the difference is linear or that 4 seconds difference (scp vs nfs) is just channel setup times. I did find an interesting fact about dd and file creation. You need to have more ram in your system than the size of the file you want to create with dd. I tried to create a 10GB file on a computer with 4GB of ram and it failed. When I went to 16GB of ram I was able to create a 10GB file. I’ll probably cat 2 10GB files to make a 20GB file to see if the difference is linear with scp.
If you look in the posted output scp actually reported a transfer time of 01:32 which is in line with the speed I’m getting with socat. Now something that might throw a wrench in the works is if scp can’t take an input from STDIN. It would be a shame if scp can only use real files to send. socat can be pipelined.
-
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
I did find an interesting fact about dd and file creation. You need to have more ram in your system than the size of the file you want to create with dd. I tried to create a 10GB file on a computer with 4GB of ram and it failed. When I went to 16GB of ram I was able to create a 10GB file. I’ll probably cat 2 10GB files to make a 20GB file to see if the difference is linear with scp.
I can’t imagine that is really the case. I am sure I have created temporary files using dd way bigger that the size of RAM in my machine. What error did you get?
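One possible explanation, offered as a guess since the failing command was not posted: `dd` allocates a buffer of `bs` bytes, so a very large block size (for example `bs=10G count=1`) needs that much memory, while a small block size with a large count does not. A scaled-down sketch so it runs quickly; the temporary file name is arbitrary.

```shell
#!/bin/sh
# Write a file in 1 MiB blocks; memory use is bounded by bs, not by
# the final file size. For a 10 GiB file use count=10240 instead.
out=$(mktemp)
dd if=/dev/zero of="$out" bs=1048576 count=8 2>/dev/null
size=$(wc -c < "$out" | tr -d ' ')
echo "wrote $size bytes"
rm -f "$out"
```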
If you look in the posted output scp actually reported a transfer time of 01:32 which is in line with the speed I’m getting with socat.
Sounds good.
Now something that might throw a wrench in the works is if scp can’t take an input from STDIN. It would be a shame if scp can only use real files to send. socat can be pipelined.
While scp might not be able to, the SSH protocol itself, and therefore the ssh command, is able to pipe pretty much anything you want through the tunnel.
time cat /mnt/t2/r10gb.img | ssh root@192.168.10.1 "cat > /images/r11gb.img"
Now that I think of it, we could even use it to tunnel other protocols. Can’t think of a good use of this just yet but as a dumb example we could even use NFSv4 unencrypted and pipe it through an SSH tunnel (start ssh with port forwarding local port 2049 to FOG server IP:2049 and then NFS mount towards 127.0.0.1).
-
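The tunnel idea could look roughly like the two commands below. This sketch only prints them rather than running them, so it works without a server. The address, the `/images` export and the `/mnt/t2` mount point are taken from the tests in this thread; the function names and the `FOG_SERVER` variable are placeholders. One assumption to check: the NFS server will see the forwarded connection coming from sshd on an unprivileged source port, so the export may need the `insecure` option.

```shell
#!/bin/sh
# Print (do not run) the commands for mounting NFSv4 through an SSH
# tunnel. FOG_SERVER is a placeholder variable name.
FOG_SERVER="${FOG_SERVER:-192.168.10.1}"

tunnel_cmd() {
    # -f: go to background after authentication
    # -N: do not run a remote command
    # -L: forward local port 2049 to port 2049 on the FOG server
    echo "ssh -f -N -L 2049:$FOG_SERVER:2049 root@$FOG_SERVER"
}

mount_cmd() {
    # Mount against the local end of the tunnel instead of the server.
    echo "mount -t nfs4 -o port=2049 127.0.0.1:/images /mnt/t2"
}

tunnel_cmd
mount_cmd
```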
I am personally a fan of an SSH/SCP solution. It’s a very familiar protocol, secure and pretty straightforward. SSH ports are likely already configured in firewalls as well. Also has pretty good error handling.
Tools like socat are cool, but I think a lot of people are not very familiar with them and since you’d need SSH or the like to get it going anyway, it seems like an extra step without any clear benefit (unless I’m missing something).
The only nod towards socat I’d give is that it is likely more reliable in network transfers, but this comes at the cost of needing another port open in the firewall.
It also would be kind of ironic to move away from NFS because of insecure open ports only to then turn around and open an insecure port anyway lol.
-
@Quazz said in Feature request for FOG 1.6.x - Replace NFSv3:
Tools like socat are cool, but I think a lot of people are not very familiar with them and since you’d need SSH or the like to get it going anyway, it seems like an extra step without any clear benefit (unless I’m missing something).
In the initial testing, performance between scp/socat/nfs is pretty much the same. Understand I was working with a 10GB file of all zeros, so I don’t know the impact of real data on the transfer speeds.
From FOS’ perspective I kind of put nfs and ssh in one camp and socat/netcat into another. With nfs and ssh the target computer can do a push/pull of random files under the direction of the FOG code. With socat there needs to be coordination between the FOG server and the FOS Engine because socat is a throw/catch program. I think it would be easier to use ssh as it kind of parallels the action of NFS.
It also would be kind of ironic to move away from NFS because of insecure open ports only to then turn around and open an insecure port anyway lol.
One option is to move FOS/FOG to nfsv4 and that consolidates everything down to a single well known port. With nfsv4 we can also introduce authentication so the NFS share won’t be just open to the world for writing. NFSv4 won’t address data security in transit, but it will help protect data at rest.
The downside with using port 22 ssh is there may be some policies where a certain encryption structure must be used and changing the sshd in certain circumstances will break imaging. The thought would be to then spin up a new sshd server on a different port so the sshd configuration could be tightly managed by FOG.
I’m not saying there is a right answer yet, only that this is what I see: on protocol alone, all of the methods were within a few seconds of each other for pure data transfer.
-
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
The downside with using port 22 ssh is there may be some policies where a certain encryption structure must be used and changing the sshd in certain circumstances will break imaging. The thought would be to then spin up a new sshd server on a different port so the sshd configuration could be tightly managed by FOG.
Hmmm, there are pros and cons on both sides with using default SSH on port 22 and spinning up an extra one on another port. Whichever we decide there will be setups that can’t handle it this or the other way round. So I would suggest we try to make it default to port 22 but build scripts and all in such a way that it’s fairly easy for anyone to switch to a non-standard SSH port if needed. @george1421 @Quazz What do you think?
We’ll need to work out a proof of concept over the next weeks to see if it all works anyway.
-
Some reading to consider: https://www.linuxjournal.com/content/encrypting-nfsv4-stunnel-tls
Mentions SSHFS as well (even faster than clear text NFS in their tests??)
I can’t really decide, in the end. Each approach has its own set of downsides and upsides it looks like.
What is most important? Reliability (eg NFS restarting TCP transactions), Security (encrypting the data stream), Maintainability (KISS), Performance (NFS likely slower than SSH pipe)
Additionally, I wonder if we would see differences in performance when we compare transfer performance of a static file vs a data stream. Or perhaps this consideration is irrelevant since more than likely the bottleneck won’t be network transfer anyway, right?
-
@Sebastian-Roth said in Feature request for FOG 1.6.x - Replace NFSv3:
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
The downside with using port 22 ssh is there may be some policies where a certain encryption structure must be used and changing the sshd in certain circumstances will break imaging. The thought would be to then spin up a new sshd server on a different port so the sshd configuration could be tightly managed by FOG.
Hmmm, there are pros and cons on both sides with using default SSH on port 22 and spinning up an extra one on another port. Whichever we decide there will be setups that can’t handle it this or the other way round. So I would suggest we try to make it default to port 22 but build scripts and all in such a way that it’s fairly easy for anyone to switch to a non-standard SSH port if needed. @george1421 @Quazz What do you think?
We’ll need to work out a proof of concept over the next weeks to see if it all works anyway.
I agree with trying to stick to 22 where possible, but to make it configurable. I can imagine some environments have custom ports.
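A small sketch of what "default to 22 but configurable" could look like in a script. The function name `resolve_port` and the idea of passing in a setting value are assumptions for illustration, not existing FOG code.

```shell
#!/bin/sh
# resolve_port [value]
# Prints the SSH port to use: the given value when it is a number in
# 1..65535, otherwise the default of 22.
resolve_port() {
    port="${1:-22}"
    case "$port" in
        *[!0-9]*) port=22 ;;   # reject anything that is not all digits
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        port=22
    fi
    echo "$port"
}

echo "no setting:  $(resolve_port)"
echo "custom port: $(resolve_port 2222)"
echo "bad value:   $(resolve_port abc)"
```

A script would then pass the result to `ssh -p`, so sites with a non-standard port only need to change one setting.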
-
Updated benchmarks. FOG Server 1.5.9 w/kernel 4.19.145(guess) running on Dell o7010. Target computer Dell o7010 both server and target have ssd sata drives. All copy tests use a 10GB file.
Make 10GB file on target computer to FOG hard drive over NFS
# time dd if=/dev/zero of=r10-1gb.img count=1024 bs=10485760
1024+0 records in
1024+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 93.0698 s, 115 MB/s
real 1m33.072s
user 0m0.013s
sys 0m4.699s
Copy file using scp to FOG server x3 includes entering root password on FOG server
# time scp /mnt/t2/r10gb.img root@192.168.10.1:/images/r11gb.img
The authenticity of host '192.168.10.1 (192.168.10.1)' can't be established.
ECDSA key fingerprint is SHA256:OpIsFYWVDCr/ovMlmPPSl46jpT332P3+BHnchdxzTCI.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.10.1' (ECDSA) to the list of known hosts.
root@192.168.10.1's password:
r10gb.img 100% 10GB 111.1MB/s 01:32
real 1m43.380s
user 0m44.117s
sys 0m12.580s

# time scp /mnt/t2/r10gb.img root@192.168.10.1:/images/r11gb.img
root@192.168.10.1's password:
r10gb.img 100% 10GB 111.1MB/s 01:32
real 1m35.493s
user 0m44.476s
sys 0m12.223s

# time scp /mnt/t2/r10gb.img root@192.168.10.1:/images/r11gb.img
root@192.168.10.1's password:
r10gb.img 100% 10GB 111.1MB/s 01:32
real 1m35.447s
user 0m44.404s
sys 0m11.946s
Timing using piping over ssh instead of scp
# time cat /mnt/t2/r10gb.img | ssh root@192.168.10.1 "cat > /images/r12gb.img"
root@192.168.10.1's password:
real 1m36.133s
user 0m43.906s
sys 0m11.090s

# time cat /mnt/t2/r10gb.img | ssh root@192.168.10.1 "cat > /images/r12gb.img"
root@192.168.10.1's password:
real 1m36.794s
user 0m43.751s
sys 0m12.099s
While the CPU load is heavier on both the target computer and the FOG server when using ssh, the actual copy times are almost identical between nfs, scp, and ssh. Only the CPU load increased when sshd was involved.
-
@george1421 Would it be better to use SCP or RSYNC?
Can you run an example using RSYNC to establish the “SSH” connection and transfer to see what the FOG Server and Client load looks like?
I think you’ll see the same types of speeds. I think part of the issue with the cat pipe cat “load” is due mostly to the 2 processes being opened plus the addition of the SSH establishment.
If we are just looking to test ssh, scp is the best tool for the job, though rsync will probably give us more configuration options.
-
Interesting, I repeated the same test with the 5.6.18 kernel and got faster transfer times.
Kernel 5.6.18
Straight file copy over NFS
# time cp r10gb.img /mnt/t2/
real 0m46.336s
user 0m0.052s
sys 0m7.169s

# time cp r10gb.img /mnt/t2/
real 0m48.108s
user 0m0.045s
sys 0m8.881s
Now scp
# time scp /mnt/t2/r10gb.img root@192.168.10.1:/images/r11gb.img
root@192.168.10.1's password:
r10gb.img 100% 6875MB 111.1MB/s 01:01
real 1m5.796s
user 0m29.704s
sys 0m6.750s
Now piped over ssh
# time cat /mnt/t2/r10gb.img | ssh root@192.168.10.1 "cat > /images/r12gb.img"
root@192.168.10.1's password:
real 1m5.241s
user 0m29.134s
sys 0m6.849s

# I had to repeat it a second time just to confirm it was actually a 30
# second improvement
#
# time cat /mnt/t2/r10gb.img | ssh root@192.168.10.1 "cat > /images/r12gb.img"
root@192.168.10.1's password:
real 1m6.662s
user 0m29.833s
sys 0m6.966s
So for a straight nfs copy, kernel 5.6.18 is about 45 seconds faster copying the file. For the ssh route it was about 30 seconds faster with 5.6.18 than with 4.19.145.
-
@Tom-Elliott said in Feature request for FOG 1.6.x - Replace NFSv3:
Would it be better to use SCP or RSYNC?
I don’t know the answer at the moment but I can/will surely test it. I have some screen shots of CPU loading while doing these transfers with 5.6.18 kernel. I setup rsyncd on one of my servers and I’m using it to evacuate a second physical server of data. It seems pretty fast moving 3.5GB image files. Just for disclosure this is on a 10GbE network
3,515,218,762,752 20% 176.05MB/s 5:17:22 (xfr#70, to-chk=213/284)
If ssh/encryption route is decided I want to look into the kernel to ensure it has all of the crypto APIs enabled and if enabled do they have an impact on transport times.
-
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
root@192.168.10.1's password: r10gb.img 100% 6875MB 111.1MB/s 01:01 real 1m5.796s
I assume something went wrong with the test file here. You seem to get faster copy because the file is smaller - 6875 MB vs. 10 GB in the last tests. Transfer rate in scp was and is around 111 MB/s!
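The arithmetic can be checked quickly. This sketch takes the sizes and rate straight from the scp progress lines and treats 10GB as 10240 MB, which is an assumption about how scp rounds its display; the helper name `expected_seconds` is made up.

```shell
#!/bin/sh
# Expected transfer time in seconds for a size in MB at a rate in MB/s.
expected_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

echo "6875 MB  at 111.1 MB/s: $(expected_seconds 6875 111.1) s"
echo "10240 MB at 111.1 MB/s: $(expected_seconds 10240 111.1) s"
```

That gives roughly 62 and 92 seconds, matching the 01:01 and 01:32 that scp reported, so the shorter run is explained by the smaller file rather than a faster link.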
-
@Tom-Elliott said in Feature request for FOG 1.6.x - Replace NFSv3:
Would it be better to use SCP or RSYNC?
In essence we need something that is able to pipe contents of a single file to partclone for writing to disk or the other way round. I don’t see how rsync (used for many files) or scp would help us to do this. While you can actually scp into/from stdin/out I can’t see this being much of a gain compared to using sshfs where we mount the remote filesystem directly.
-
@george1421 said in Feature request for FOG 1.6.x - Replace NFSv3:
Their testing shows that Ubuntu 20.04 moves data the fastest, then 18.04, CentOS 8 and finally CentOS 7 is the slowest.
What protocol are they using? Some proprietary stuff I’d imagine. That would break it down to subsystem IO being faster on newer kernels and Ubuntu leveraging some kind of optimized IO?!
FOG Server ssh pipeline
That picture shows both an scp and an ssh command. So either one is spawned from the other (kind of likely when I look at the many command line options and PIDs) or you ran two commands in parallel. The headline “ssh pipeline” doesn’t fit, I would think.