Can you make FOG imaging go fast?
-
After reading through this thread: https://forums.fogproject.org/topic/10456/performance-issues-with-big-images I started wondering whether any performance tuning could be done to the FOG environment to allow faster image deployments and captures. Maybe the Linux distros' system defaults ARE the most appropriate for FOG, maybe they are not.
This thread started me thinking:
-
In the referenced thread the OP's image was a 160GB single-disk raw image. That is one huge 160GB blob file. Could we see better deployment times by breaking that 160GB file into multiple 2GB files, or is the single 160GB blob just as fast? Remember we need to decompress this file as it's deployed. Would there be any performance gains from multiple 2GB files, where a 2GB file would typically fit in RAM and the 160GB file would not?
-
Would NFS be happier with a smaller file? Is there any performance tuning we can do on the NFS side? In a typical FOG configuration 85% of the data is read from the FOG server and 15% is written to it. Are there any NFS tuning parameters that suit this kind of read-heavy split?
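For example, here is the kind of NFS knob I have in mind (untested here, just a sketch; the export options, mount options and thread count are candidates to test, not recommendations, and the server IP is a placeholder):
# /etc/exports on the FOG server -- 'async' lets the server acknowledge writes before they
# reach disk, which can speed up captures at the cost of safety on a power loss
/images/dev *(rw,async,no_wdelay,no_subtree_check)
# client-side mount options worth testing -- rsize/wsize control the NFS transfer block size
mount -t nfs -o rsize=32768,wsize=32768 192.168.1.205:/images /images
# number of NFS server threads (CentOS 7 default is 8) -- set in /etc/sysconfig/nfs
# RPCNFSDCOUNT=16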
-
Is there anything we can do from a disk subsystem standpoint to let the NFS server read faster from disk? What type of disk configuration is better? Is disk caching (RAM) an option? What about a read-ahead cache? What is the impact of a FOG server with a single SATA disk versus a RAID configuration? Do SSD drives make a solid investment for the FOG server?
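As a concrete example of the read-ahead knob (just a sketch; /dev/sda is a placeholder for whatever disk or md device holds /images):
# show the current read-ahead setting, in 512-byte sectors
blockdev --getra /dev/sda
# raise it to 8MB (16384 x 512 bytes) to favor large sequential reads
blockdev --setra 16384 /dev/sda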
-
Is there anything we can do from the networking side to improve performance? Will more than one network adapter help, and under what situations (<-- hint: I worked for many years as a network engineer, I already know the answer to this one)? Would increasing the MTU size from 1500 to 9000 really make an impact on deployment times?
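For reference, the jumbo frame change itself is a one-liner (interface name is a placeholder, and the switch plus every target computer would also have to be set for MTU 9000 for it to do anything but hurt):
# temporary change, reverts at reboot
ip link set dev em1 mtu 9000
# verify the new MTU
ip link show em1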
My idea is to create a test setup in the lab and see if I can improve on the stock Linux distribution in each of the 4 areas. I might find the magic bullet to make FOG go faster, or I might find that the Linux distribution's default settings are correct and tweaking this or that adds no real value. I can say that from my production FOG server, running 2 vCPUs on a 24-core vSphere server, I can achieve about 6.2GB/min transfer rates (yes, I know this number is a bit misleading since it also includes decompression time, but it's a relative number that we all can see) for a single unicast image. I know others are able to get 10GB/min transfer rates with their setups. My plan is to use 4 older Dell 790s for this test setup (1 as the FOG server and 3 as target computers). I want to remove any virtualization layers for this testing, so I will be installing CentOS 7 on physical hardware.
My intent is to document the process I find here.
{Day 2}
As you think about the FOG imaging process, there are 3 performance domains involved here:
- Server
- Network
- Client computer
All three have a direct impact on the ability to image fast. For this thread I want to focus on the first two (server and network) because those are the ones we have the most control over.
Within the server performance domain there are several subsystems that have an impact on imaging speed:
- Disk subsystem
- NFS subsystem
- RAM memory
- Network (to the boundary of the ethernet jack)
For FOG imaging to achieve its top transfer rates, each subcomponent must be configured to move data at its fastest rate.
For the first three subcomponents (disk, RAM and NFS) I can see two different types of workloads we need to consider:
- Single unicast or multicast stream
- Multiple simultaneous unicast or multicast streams
The single unicast/multicast stream can take advantage of linear disk reads and onboard read-ahead disk caching.
Multiple data streams are a bit more complex because the data requests hit the disk in a much more random pattern.
Both workloads need to be taken into consideration.
{Day 3}
Well, after burning [wasting] a whole day trying to get a PoC RocketRAID 640 to work with CentOS 7 by attempting to recompile the driver for the Linux 3.10 kernel, I've given up trying to benchmark the differences between a single disk and a 4-disk RAID 0 array for now. I may circle back to this if I can dig up another “working” RAID controller that fits into my test setup.
{Day 4}
Well, Day 4 was a bit more productive. While this isn't a true representation of an actual workload, I set up several tests to baseline a few different server disk configurations. First let's cover the materials used in the testing.
For the “FOG server” I'm using a Dell OptiPlex 790 with 8GB of RAM. This is the mini desktop version, so I can add full-size expansion cards (like that PoC RocketRAID card). I also plan on testing the LOM network adapter as well as an Intel dual-port server adapter in a future test, so the desktop case is required. See the disk testing results here.
{Day 5}
Testing disk performance between HDD and SSD drives was not surprising. A single SSD is about 6 times faster than an HDD running on the same hardware. Because that PoC RocketRAID was a bust, I decided to use Linux's built-in software RAID for the RAID configuration part of my testing. I was really surprised at how fast the Linux software RAID was, with the HDD array topping out at 380 MB/s and the SSD array maxing out at 718 MB/s (only about twice as fast as the HDD array). This is only speculation, but I could probably get a bit more performance out of the HDD software RAID array by adding a few more disks to it. As for the SSD drives, I feel they are about maxing out the 3Gb/s SATA channel on the 790, so I wouldn't expect to see much better performance out of the SSD array by adding 1 or 2 more SSD drives. One might ask why disk speed is important at all, especially since a single GbE network adapter can only move 125 MB/s (theoretical max). Remember we have 2 workloads to consider: a single unicast stream (linear reads) and multiple unicast streams (random disk reads). The faster disk subsystem will allow faster data retrieval during multiple simultaneous unicast deployments. As we get into the networking part of the test we will see which is the better value, or has the greatest impact on speed for the money [SSD vs HDD]. I have a feeling we will find that the disk subsystem isn't the choke point in our FOG deployment server, and I'm going to speculate that a full SSD array may not be of much value.
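For reference, a 3-disk software RAID-0 like the ones used in these tests can be built with mdadm along these lines (device names and the XFS filesystem are assumptions on my part):
# stripe three whole disks into one md device
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
# put a filesystem on the array and mount it (here at /images, where FOG keeps its image store)
mkfs.xfs /dev/md0
mount /dev/md0 /images
# save the array definition so it assembles at boot
mdadm --detail --scan >> /etc/mdadm.conf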
{Day 6 to 8}
Other activities kept my attention.
{Day 9}
Network performance testing. In this section I tried to find a suitable tool to measure total available bandwidth, and I settled on iperf3. I compiled iperf3 from source with static linking to the libraries. This allowed me to copy the compiled binary to both the test FOG server and the PXE target computers without needing to worry about library dependencies. On the test FOG server I set up the receiver and then tested each PXE target computer one by one to ensure all had comparable bandwidth readings before testing in groups. My test setup is still the Dell 790 mini tower as the FOG server and Dell 790 SFF computers as the PXE target computers. The network switch is an older Linksys/Cisco SRW2008 8-port switch. Just as a reminder, I'm picking older hardware to get realistic testing results. I'm sure I could get really impressive results with new hardware, but I want real numbers. The FOG server disk subsystem is using the 3 Constellation HDDs in a Linux software RAID-0 configuration.
More to come
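For reference, a static iperf3 build and the server side can be set up roughly like this (the exact configure flags are an assumption on my part and may need adjusting per iperf3 version):
# build iperf3 so the binary has no shared library dependencies
./configure --disable-shared --enable-static LDFLAGS=-static
make
# on the FOG server, start one listener per expected client, each on its own port
./iperf3 -s -p 5201
./iperf3 -s -p 5202
# on a PXE target computer, point the client at the matching port
./iperf3 -c 192.168.1.205 -p 5201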
-
Part 4 NFS subsystem testing
This part builds on the baseline network settings from Part 3. In this test I ran the same command used for the local hard drive testing, but from the PXE target computer against the NFS share on the FOG server (/images/dev).
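For reference, the share can be mounted on the PXE target computer with something like this (I'm assuming the rw /images/dev export gets mounted at /images on the target, which is what makes the paths in the output below line up):
# mount the FOG server's rw dev share on the target computer
mount -t nfs 192.168.1.205:/images/dev /images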
[Wed Jul 26 root@fogclient /images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.68355 s, 111 MB/s
[Wed Jul 26 root@fogclient /images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.67013 s, 111 MB/s
The result of a single NFS sequential file write is 111 MB/s (6.66 GB/min).
I also performed the same commands for a sequential read over NFS:
[Wed Jul 26 root@fogclient /images]# echo 3 | tee /proc/sys/vm/drop_caches
[Wed Jul 26 root@fogclient /images]# time dd if=/images/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.69505 s, 111 MB/s (6.66GB/m)
real 0m9.697s
user 0m0.025s
sys 0m0.352s
Again we saw about 111 MB/s for a single NFS sequential read.
For this test I started 2 of the PXE target computers creating the sequential file on the NFS share at the same time. Here are the results from each PXE target computer.
#1 host
[Wed Jul 26 root@fogclient /images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 17.7051 s, 60.6 MB/s

#2 host
[Wed Jul 26 root@fogclient /images]# dd if=/dev/zero of=/images/test2.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 17.0664 s, 62.9 MB/s
As you can see, the per-host speed dropped to about 61 MB/s (3.66 GB/min), so the scaling is pretty linear.
Then I tried 3 PXE target computers creating the sequential file at the same time.
host #1
[Wed Jul 26 root@fogclient /images]# dd if=/dev/zero of=/images/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 26.362 s, 40.7 MB/s

host #2
[Wed Jul 26 root@fogclient /images]# dd if=/dev/zero of=/images/test2.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 27.1975 s, 39.5 MB/s

host #3
[Mon Jul 24 root@fogclient /images]# dd if=/dev/zero of=/images/test3.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 26.0602 s, 41.2 MB/s
Again the per-host speed dropped to about 40 MB/s (2.4 GB/min), which is still pretty linear.
-
Part 3 Network subsystem testing
iperf3 test between a single target computer and the FOG server
[Wed Jul 26 root@fogclient /images]# ./iperf3 -c 192.168.1.205 -p 5201
Connecting to host 192.168.1.205, port 5201
[ 5] local 192.168.1.207 port 43302 connected to 192.168.1.205 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[ 5]   0.00-1.00   sec   112 MBytes   935 Mbits/sec    0    362 KBytes
[ 5]   1.00-2.00   sec   112 MBytes   936 Mbits/sec    0    362 KBytes
[ 5]   2.00-3.00   sec   111 MBytes   935 Mbits/sec    0    362 KBytes
[ 5]   3.00-4.00   sec   111 MBytes   933 Mbits/sec    0    362 KBytes
[ 5]   4.00-5.00   sec   111 MBytes   935 Mbits/sec    0    362 KBytes
[ 5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0    362 KBytes
[ 5]   6.00-7.00   sec   111 MBytes   933 Mbits/sec    0    362 KBytes
[ 5]   7.00-8.00   sec   112 MBytes   937 Mbits/sec    0    362 KBytes
[ 5]   8.00-9.00   sec   111 MBytes   934 Mbits/sec    0    362 KBytes
[ 5]   9.00-10.00  sec   111 MBytes   934 Mbits/sec    0    362 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[ 5]   0.00-10.00  sec  1.09 GBytes   935 Mbits/sec    0          sender
[ 5]   0.00-10.02  sec  1.09 GBytes   932 Mbits/sec               receiver
iperf Done
iperf3 traffic test between 2 simultaneous PXE target computers and the FOG server
[Wed Jul 26 root@fogclient /images]# ./iperf3 -c 192.168.1.205 -p 5202 -i 1 -t 30
Connecting to host 192.168.1.205, port 5202
[ 5] local 192.168.1.210 port 56804 connected to 192.168.1.205 port 5202
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[ 5]   0.00-1.00   sec   112 MBytes   938 Mbits/sec   45    181 KBytes
[ 5]   1.00-2.00   sec  80.3 MBytes   673 Mbits/sec  234   48.1 KBytes
[ 5]   2.00-3.00   sec  54.3 MBytes   456 Mbits/sec  304   18.4 KBytes
[ 5]   3.00-4.00   sec  55.9 MBytes   469 Mbits/sec  313   26.9 KBytes
[ 5]   4.00-5.00   sec  56.1 MBytes   470 Mbits/sec  332   33.9 KBytes
[ 5]   5.00-6.00   sec  60.2 MBytes   505 Mbits/sec  268   43.8 KBytes
[ 5]   6.00-7.00   sec  70.5 MBytes   591 Mbits/sec  284   46.7 KBytes
[ 5]   7.00-8.00   sec  63.7 MBytes   534 Mbits/sec  232   48.1 KBytes
[ 5]   8.00-9.00   sec  49.5 MBytes   415 Mbits/sec  274   50.9 KBytes
[ 5]   9.00-10.00  sec  63.4 MBytes   532 Mbits/sec  269   43.8 KBytes
[ 5]  10.00-11.00  sec  69.2 MBytes   580 Mbits/sec  246    253 KBytes
[ 5]  11.00-12.00  sec   111 MBytes   932 Mbits/sec    0    355 KBytes
[ 5]  12.00-13.00  sec   111 MBytes   935 Mbits/sec    0    356 KBytes
[ 5]  13.00-14.00  sec   111 MBytes   931 Mbits/sec    0    358 KBytes
[ 5]  14.00-15.00  sec   111 MBytes   935 Mbits/sec    0    358 KBytes
[ 5]  15.00-16.00  sec   112 MBytes   936 Mbits/sec    0    358 KBytes
[ 5]  16.00-17.00  sec   111 MBytes   933 Mbits/sec    0    358 KBytes
^C[ 5]  17.00-17.11  sec  12.6 MBytes   932 Mbits/sec    0    358 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[ 5]   0.00-17.11  sec  1.38 GBytes   694 Mbits/sec  2801          sender
[ 5]   0.00-17.11  sec  0.00 Bytes    0.00 bits/sec                receiver
iperf3: interrupt - the client has terminated
Notable output here: at 11 seconds the retransmit (Retr) count drops to 0; that is when the first of the pair of target computers completed its run.
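To watch those retransmissions from the server side while a test runs, something like this works (assuming the net-tools and iproute packages are available):
# cumulative TCP retransmission counters since boot
netstat -s | grep -i retrans
# or per-connection detail, including retransmits and cwnd
ss -ti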
iperf3 with 3 target computers
Connecting to host 192.168.1.205, port 5202
[ 5] local 192.168.1.210 port 56816 connected to 192.168.1.205 port 5202
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[ 5]   0.00-1.00   sec   112 MBytes   937 Mbits/sec    0    356 KBytes
[ 5]   1.00-2.00   sec   111 MBytes   934 Mbits/sec    0    356 KBytes
[ 5]   2.00-3.00   sec   111 MBytes   935 Mbits/sec    0    356 KBytes
[ 5]   3.00-4.00   sec   111 MBytes   933 Mbits/sec    0    356 KBytes
[ 5]   4.00-5.00   sec   111 MBytes   935 Mbits/sec    0    372 KBytes
[ 5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec   62   70.7 KBytes
[ 5]   6.00-7.00   sec  51.3 MBytes   431 Mbits/sec  404   17.0 KBytes
[ 5]   7.00-8.00   sec  52.0 MBytes   436 Mbits/sec  261   28.3 KBytes
[ 5]   8.00-9.00   sec  56.1 MBytes   471 Mbits/sec  282   9.90 KBytes
[ 5]   9.00-10.00  sec  52.1 MBytes   437 Mbits/sec  301   21.2 KBytes
[ 5]  10.00-11.00  sec  71.8 MBytes   603 Mbits/sec  176    197 KBytes
[ 5]  11.00-12.00  sec  55.2 MBytes   463 Mbits/sec  271   29.7 KBytes
[ 5]  12.00-13.00  sec  47.9 MBytes   402 Mbits/sec  270   53.7 KBytes
[ 5]  13.00-14.00  sec  34.1 MBytes   286 Mbits/sec  264   5.66 KBytes
[ 5]  14.00-15.00  sec  39.1 MBytes   328 Mbits/sec  240   53.7 KBytes
[ 5]  15.00-16.00  sec  52.3 MBytes   439 Mbits/sec  229   49.5 KBytes
[ 5]  16.00-17.00  sec  60.6 MBytes   508 Mbits/sec  225    106 KBytes
[ 5]  17.00-18.00  sec  54.1 MBytes   454 Mbits/sec  336   26.9 KBytes
[ 5]  18.00-19.00  sec  50.9 MBytes   427 Mbits/sec  259   56.6 KBytes
[ 5]  19.00-20.00  sec  74.1 MBytes   622 Mbits/sec  209    198 KBytes
[ 5]  20.00-21.00  sec  75.1 MBytes   630 Mbits/sec  276   46.7 KBytes
[ 5]  21.00-22.00  sec  44.4 MBytes   372 Mbits/sec  282   29.7 KBytes
[ 5]  22.00-23.00  sec   103 MBytes   861 Mbits/sec   13    354 KBytes
[ 5]  23.00-24.00  sec   111 MBytes   934 Mbits/sec    0    358 KBytes
[ 5]  24.00-25.00  sec   111 MBytes   934 Mbits/sec    0    358 KBytes
[ 5]  25.00-26.00  sec   111 MBytes   934 Mbits/sec    0    359 KBytes
[ 5]  26.00-27.00  sec   111 MBytes   934 Mbits/sec    0    359 KBytes
[ 5]  27.00-28.00  sec   111 MBytes   934 Mbits/sec    0    359 KBytes
[ 5]  28.00-29.00  sec   111 MBytes   934 Mbits/sec    0    359 KBytes
[ 5]  29.00-30.00  sec   111 MBytes   934 Mbits/sec    0    359 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[ 5]   0.00-30.00  sec  2.36 GBytes   677 Mbits/sec  4360          sender
[ 5]   0.00-30.02  sec  2.36 GBytes   676 Mbits/sec               receiver
Notable info here: I started the first target computer sending, waited 5 seconds and started the second, then waited about 5 seconds more and started the third. You can almost see in the transfer rates when these target computers started and stopped.
So what did this tell us? Don't try to run 3 simultaneous all-out image transfers or you will saturate that single NIC on the server. The above tests were done with the LOM network adapter.
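To watch that saturation happen in real time on the FOG server during a multi-client run, sysstat's sar is handy (assuming the sysstat package is installed):
# per-second throughput for every interface; watch rxkB/s and txkB/s on the imaging NIC
sar -n DEV 1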
-
Part 2 Disk subsystem testing
To start this off I wanted to do a simple baseline comparison between installing FOG on a single SATA HDD using the onboard SATA controller, the same single SATA HDD on a RAID controller as a JBOD disk, and then a 4-disk RAID 0 on the RAID controller. The next step is to replace the SATA HDDs with SATA SSD drives and repeat the same steps as with the HDDs.
For the simple disk baseline I'm using the following Linux command to create a sequential 1GB file on disk and then read it back. This process is designed to simulate the single unicast workload. The command used to write the 1GB file is this:
dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
The command to read it back is:
echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/tmp/test1.img of=/dev/null bs=8k
The echo command drops the Linux page cache so we get a true read-back value instead of reading from RAM.
The disks I used are as follows:
- (3) Dell Constellation ES 1TB server hard drives [hdd] (what I had in my magic box of extra bits).
- (3) Crucial MX300 275GB SSD
I used 3 because that is how many SSD drives I had in my not-so-magic box from Amazon.
Test Process:
- Install the test drives into the 790 and install CentOS 7 1611
- No updates were applied; the install image was straight off USB
- Log in as root at the Linux command prompt
- Run the sequential write command 3 times (avg results)
- Run the sequential read command 3 times (avg results); a small loop that automates the write/read runs is sketched below
- Shut down and prep for the next test
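The write/read steps can be wrapped in a small loop so every disk configuration gets exercised exactly the same way (just a sketch; it uses the same commands and paths as above):
# three sequential-write passes
for i in 1 2 3; do
    dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
done
# three read-back passes, dropping the page cache before each one
for i in 1 2 3; do
    echo 3 | tee /proc/sys/vm/drop_caches
    time dd if=/tmp/test1.img of=/dev/null bs=8k
done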
Test 1: Single Constellation (HDD) attached to onboard SATA
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 13.9599 s, 76.9 MB/s
[root@localhost ~]#
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 13.9033 s, 77.2 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 13.7618 s, 78.0 MB/s
[root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 13.6594 s, 78.6 MB/s
[root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 13.5738 s, 79.1 MB/s
real 0m13.577s
user 0m0.040s
sys 0m0.888s
Average write speed 77 MB/s (4.7 GB/min) and average read speed 78 MB/s.
Test 2: Single MX300 (SSD) attached to onboard SATA
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.24173 s, 479 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.24117 s, 479 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.24441 s, 478 MB/s
[root@localhost ~]# echo 3 | tee /proc/sys/vm/drop_caches
[root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.10576 s, 510 MB/s
real 0m2.109s
user 0m0.018s
sys 0m0.664s
Average write speed 478 MB/s and average read speed 510 MB/s.
Test 3: 3 Constellations (HDD) in software RAID-0 configuration on the onboard SATA
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.90412 s, 370 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.78557 s, 385 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.75433 s, 390 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.802 s, 383 MB/s
[root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 2.75442 s, 390 MB/s
real 0m2.967s
user 0m0.016s
sys 0m0.461s
Average write speed 380 MB/s and average read speed 390 MB/s.
* Since this was software RAID, I feel the runs after the very first one may be tainted by some buffering in the Linux software RAID driver.
Test 4: 3 MX300 (SSD) in software RAID-0 configuration on the onboard SATA
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.4921 s, 720 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.50214 s, 715 MB/s
[root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 1.49913 s, 716 MB/s
[root@localhost ~]# echo 3 | tee /proc/sys/vm/drop_caches
[root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 1.33486 s, 804 MB/s
real 0m1.343s
user 0m0.016s
sys 0m0.385s
[root@localhost ~]# echo 3 | tee /proc/sys/vm/drop_caches
[root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 1.31937 s, 814 MB/s
real 0m1.323s
user 0m0.013s
sys 0m0.322s
Average write speed 718 MB/s and average read speed 800 MB/s.
* Since this was software RAID, I feel the runs after the very first one may be tainted by some buffering in the Linux software RAID driver.
Test 5: Dell PE2950 6i RAID with 6 x WD RE drives (HDD) in RAID-10 configuration (just a comparison test)
[root@localhost /]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.96148 s, 363 MB/s
[root@localhost /]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 2.86738 s, 374 MB/s
[root@localhost /]# echo 3 | tee /proc/sys/vm/drop_caches
[root@localhost /]# time dd if=/tmp/test1.img of=/dev/null bs=8k
131072+0 records in
131072+0 records out
1073741824 bytes (1.1 GB) copied, 3.199 s, 336 MB/s
real 0m3.367s
user 0m0.024s
sys 0m0.861s
Average write speed 368 MB/s and average read speed 336 MB/s.
* Performance values may be tainted by the current workload on the server. The intent of this test was to get a ballpark comparison of a production server vs the Dell 790 desktop.
-
@george1421 said in Can you make FOG imaging go fast?:
I can say that from my production FOG server, running 2 vCPUs on a 24-core vSphere server, I can achieve about 6.2GB/min transfer rates (yes, I know this number is a bit misleading since it also includes decompression time, but it's a relative number that we all can see) for a single unicast image.
That figure is not network transfer speed or compression/decompression speed, nor is it an aggregate; it is simply the write speed to the host's disk.
It doesn't represent or reflect network transfer speed or decompression speed. Those things are only loosely related to the write speed, just as the disk you're using is related to the write speed - but this figure does not tell you where any bottleneck is.
Trying to use this figure to gauge network transfer speed would be like trying to gauge the mail man’s speed based on how long it takes me to go check my mailbox (if the post office used that as their metric, the mailman would be fired because I check my mail every few days).
Further, your bottleneck is probably not the next person's bottleneck. My experience with multiple FOG servers on multiple types of hardware has shown that tuning FOG is a matter of balancing network throughput with a host's ability to decompress. We cannot speed up how fast a host's disk can write; its maximum write speed is still its maximum write speed no matter what we do with CPU or network or compression or RAM. The idea is simply to always have data waiting to be written to disk without delay, and to balance the CPU's ability to decompress with the network's ability to transmit to many clients at once, and with the FOG server's ability to serve many clients at once. I think this all comes back to two simple things:
- Max Clients
- Compression rate
It's a balancing act of these two things. Of course, ZSTD is the superior compression algorithm, which is why it's not one of the two simple things - but its compression rate is.
The FOG Server’s disk does play a role - but at my last job, I was clearly hitting the network’s maximum throughput bottleneck - so a solid state disk would not have helped.
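A quick way to see which side is actually the limit during a deployment is to watch the FOG server's disks and NIC at the same time (assuming the sysstat package is installed; interface and device names will vary):
# disk side -- if %util pins near 100 on the array, storage is the choke point
iostat -mx 1
# network side -- if txkB/s sits near the link rate, the NIC is the choke point
sar -n DEV 1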
At any rate, the script below is an example of how to automate the monitoring & collecting of things from FOS: https://github.com/FOGProject/fog-community-scripts/blob/master/fogAutomatedTesting/postinit.sh
That's what I'd use to collect any custom metrics you want to monitor, more quickly than doing a debug session every time and monitoring manually.