Can you make FOG imaging go fast?


  • Moderator

    After reading through this thread: https://forums.fogproject.org/topic/10456/performance-issues-with-big-images I started wondering whether any performance tuning could be done to the FOG environment to allow faster image deployment and capture. Maybe the Linux distro's system defaults ARE the most appropriate for FOG, maybe they are not.

    This thread started me thinking:

    1. In the referenced thread the OP’s image was a 160GB single-disk raw image, i.e. one huge 160GB blob file. Could we see better deployment times by breaking that 160GB file into multiple 2GB files, or is the single 160GB blob just as good? Remember we need to decompress this file as it’s deployed. Would there be any performance gain from multiple 2GB files, where a 2GB file would typically fit in RAM and the 160GB file would not?

    2. Would NFS be happier with a smaller file? Is there any performance tuning we can do on the NFS side? In a typical FOG configuration roughly 85% of the data is read from the FOG server and 15% of the data is written to it. Are there any NFS tuning parameters that suit this kind of read/write split? (See the sketch after this list for the sort of knobs I have in mind.)

    3. Is there anything we can do from a disk subsystem standpoint to allow the NFS server to read faster from disk? What type of disk configuration is better? Is disk caching (ram) an option? What about a read-ahead cache? What is the impact of a FOG server with a single sata disk versus a raid configuration? Do ssd drives make a solid investment for the FOG server?

    4. Is there anything we can do from the networking side to improve performance? Will more than one network adapter help, and in what situations? (<–hint: I worked for many years as a network engineer, I already know the answer to this one). Would increasing the MTU size from 1500 to 9000 really make an impact on deployment times?
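    As an illustration of the kind of NFS-side knobs I'm talking about in point 2, here is a rough sketch for a CentOS 7 server. The thread count and export options shown are assumptions to experiment with, not FOG's shipped defaults, and the paths are examples only:

    # /etc/sysconfig/nfs - raise the number of nfsd threads (default is 8)
    # to better service multiple simultaneous unicast streams
    RPCNFSDCOUNT=16

    # /etc/exports - example entries only; 'async' trades write safety for
    # speed on the share that receives captures, which fits the
    # 85% read / 15% write split described above
    /images      *(ro,async,no_wdelay,no_subtree_check,no_root_squash,insecure)
    /images/dev  *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure)

    # apply the export changes and restart the NFS server
    exportfs -ra
    systemctl restart nfs-server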

    My idea is to create a test setup in the lab and see if I can improve on the stock Linux distribution in each of the 4 areas. I might find the magic bullet to make FOG go faster, or I might find that the Linux distribution's default settings are correct and tweaking this or that adds no real value. I can say from my production FOG server running 2 vCPUs on a 24 core vSphere server, I can achieve about 6.2GB/min transfer rates (yes, I know this number is a bit misleading since it also includes decompression time, but it's a relative number that we all can see) for a single unicast image. I know others are able to get 10GB/min transfer rates with their setup. My plan is to use 4 older Dell 790s for this test setup (1 as the FOG server and 3 as target computers). I want to remove any virtualization layers for this testing, so I will be installing CentOS 7 on physical hardware.

    My intent is to document the process I find here.
    {Day 2}
    As you think about the FOG imaging process there are 3 performance domains involved here.

    1. Server
    2. Network
    3. Client computer

    All three have a direct impact on the ability to image fast. For this thread I want to focus on the first two (server and network) because those are the ones we should have the most control over.

    Within the server performance domain there are several subclasses that have an impact on imaging speed.

    1. Disk subsystem
    2. NFS subsystem
    3. RAM memory
    4. Network (to the boundary of the ethernet jack)

    For FOG imaging to achieve its top transfer rate, each subcomponent must be configured to move data at its fastest rate.
    For the first three subcomponents (disk, ram and nfs) I can see two different types of workloads we need to consider.

    1. Single unicast or multicast stream
    2. Multiple simultaneous unicast or multicast streams

    The single unicast / multicast stream can take advantage of linear disk reads and onboard read ahead disk caching.

    The multiple data streams are a bit more complex because of the randomness of the data requests for information on the disk.

    Both workloads need to be taken into consideration.
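
    As a small example of the read-ahead piece of this, the current per-device read-ahead can be checked and adjusted with blockdev. The device name and value below are placeholders; this is just the sort of tunable I intend to test, not a recommendation:

    # show the current read-ahead setting (in 512-byte sectors)
    blockdev --getra /dev/sda

    # example: raise read-ahead to 4MB (8192 sectors) to favor the
    # single-stream linear read workload, then re-run the benchmarks
    blockdev --setra 8192 /dev/sda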

    {Day 3}
    Well, after burning [wasting] a whole day trying to get a PoC RocketRaid 640 to work with CentOS 7 by attempting to recompile the driver for the Linux 3.10 kernel, I’ve given up trying to benchmark the differences between a single disk and a 4 disk raid 0 array for now. I may circle back to this if I can dig up another “working” raid controller that fits into my test setup.

    {Day 4}
    Well, Day 4 was a bit more productive. While this isn’t a true representation of an actual workload, I set up several tests to baseline a few different server disk configurations. First let’s cover the materials used in the testing.
    For the “FOG server” I’m using a Dell OptiPlex 790 with 8GB of ram. This is the mini desktop version so I can add full-size expansion cards (like that PoC RocketRaid card). I also plan on testing the LOM network adapter as well as an Intel dual port server adapter in a future test, so the desktop case is required. See the disk testing results below.

    {Day 5}
    Testing disk performance between hdd and ssd drives was not surprising: a single ssd is about 6 times faster than a hdd running on the same hardware. Because that PoC RocketRaid was a bust, I decided to use Linux’s built-in software raid for the raid configuration part of my testing. I was really surprised at how fast the Linux software raid was, with the hdd array topping out at 380MB/s and the ssd array maxing out at 718MB/s (only about twice as fast as the hdd array). This is only speculation, but I could probably get a bit more performance out of the hdd software raid array by adding a few more disks to it. As for the ssd drives, I feel they are about maxing out the 3Gb/s sata channel on the 790, so I wouldn’t expect much better performance out of the ssd array from adding 1 or 2 more drives.

    One might ask why disk speed is important at all, especially since a single GbE network adapter can only move 125MB/s (theoretical max). Remember we have 2 workloads to consider: a single unicast stream (linear reads) and multiple unicast streams (random disk reads). A faster disk subsystem will allow faster data retrieval during multiple simultaneous unicast deployments. As we get into the networking part of the testing we will see which is the better value, or has the greatest impact on speed for the money [ssd vs hdd]. I have a feeling we will find that the disk subsystem isn’t the choke point in our FOG deployment server, so I’m going to speculate that a full ssd array may not be of much value.
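
    For anyone who wants to reproduce the software raid portion, a minimal sketch of building a 3 disk raid-0 array with mdadm is below. The device names, filesystem choice and mount point are assumptions for illustration, not necessarily the exact layout used for the numbers above:

    # build a 3 disk raid-0 array (device names are examples - check lsblk first)
    mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

    # put a filesystem on the array and mount it where the test files will live
    mkfs.xfs /dev/md0
    mkdir -p /mnt/raidtest
    mount /dev/md0 /mnt/raidtest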

    {Day 6}
    More to come


  • Moderator

    @george1421 said in Can you make FOG imaging go fast?:

    I can say from my production FOG server running 2 vCPUs on a 24 core vSphere server, I can achieve about 6.2GB/min transfer rates (yes, I know this number is a bit misleading since it also includes decompression time, but it's a relative number that we all can see) for a single unicast image.

    That figure is not network transfer speed or compression/decompression speed, nor is it an aggregate; it is simply write speed to the host’s disk.

    It doesn’t represent or reflect network transfer speed or decompression speed. Those things are only loosely related to the write speed, just as the disk you’re using is - but this figure does not tell you where any bottleneck is.

    Trying to use this figure to gauge network transfer speed would be like trying to gauge the mailman’s speed based on how long it takes me to go check my mailbox (if the post office used that as their metric, the mailman would be fired because I only check my mail every few days).

    Further, your bottleneck is probably not the next person’s bottleneck. My experience with multiple FOG servers on multiple types of hardware has shown that tuning FOG is a matter of balancing network throughput with a host’s ability to decompress. We cannot speed up how fast a host’s disk can write; its maximum write speed is its maximum write speed no matter what we do with CPU, network, compression, or RAM. The idea is simply to always have data waiting to be written to disk without delay, and to balance the CPU’s ability to decompress with the network’s ability to transmit to many clients at once and the FOG server’s ability to serve many clients at once. This all comes back to two simple things, I think: Max Clients and compression rate.

    It’s a balancing act between these two things. Of course, ZSTD is the superior compression algorithm, which is why the algorithm itself isn’t one of the two simple things - but its compression rate is.
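
    For anyone wanting to see that compression-rate trade-off in numbers, zstd ships with a benchmark mode that reports compression ratio and speed per level. The file name below is a placeholder; a representative chunk of a real image would give the most meaningful results:

    # benchmark zstd compression levels 3 through 19 against a sample file
    zstd -b3 -e19 sample.img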

    The FOG Server’s disk does play a role - but at my last job, I was clearly hitting the network’s maximum throughput bottleneck - so a solid state disk would not have helped.

    At any rate, the script linked here is an example of how to automate monitoring and collecting data from FOS: https://github.com/FOGProject/fog-community-scripts/blob/master/fogAutomatedTesting/postinit.sh
    That’s what I’d use to collect any custom metrics you want to monitor, rather than doing a debug task every time and watching manually.
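
    On the server side (separate from the FOS script above), a quick-and-dirty way to watch NIC throughput during a deployment is to sample the kernel’s interface counters. This is only a rough sketch and the interface name is an assumption; adjust it to your hardware:

    # print transmitted MB/s once per second for the given interface
    IFACE=eth0
    while true; do
        TX1=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
        sleep 1
        TX2=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
        echo "$(date +%T)  $(( (TX2 - TX1) / 1048576 )) MB/s"
    done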


  • Moderator

    Part 2 Disk subsystem testing

    To start this off I wanted to do a simple baseline comparison between installing FOG on a single sata hdd using the onboard sata controller, the same single sata hdd on a raid controller as a JBOD disk, and then a 4 disk raid 0 on the raid controller. The next step is to replace the sata hdd drives with sata ssd drives and repeat the same tests.

    For the simple disk baseline I’m using the following Linux commands to create a sequential 1GB file on disk and then read it back. This process is designed to simulate the single unicast workload. The command used to write the 1GB file is this:
    dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    The command to read it back is:
    echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/tmp/test1.img of=/dev/null bs=8k
    The echo command drops the page cache so we get a true read-back value from disk instead of from RAM.
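
    To make the three-run averages below less tedious, the two commands can be wrapped in a small helper like this (hypothetical, run as root; dd prints its own MB/s figure on the last line):

    # repeat the write/read pair three times, dropping the page cache before each read
    for i in 1 2 3; do
        dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct 2>&1 | tail -n1
        echo 3 > /proc/sys/vm/drop_caches
        dd if=/tmp/test1.img of=/dev/null bs=8k 2>&1 | tail -n1
        rm -f /tmp/test1.img
    done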

    The disks I used are as follows

    1. (3) Dell Constellation ES 1TB server hard drives [hdd] (what I had in my magic box of extra bits).
    2. (3) Crucial MX300 275GB SSD

    I used 3 of each because that is how many ssd drives I had in my not-so-magic box from Amazon.

    Test Process:

    1. Install the test drives into the 790 and install CentOS 7 1611.
    2. No updates applied; the install image was used straight off USB.
    3. Log in as root at the Linux command prompt.
    4. Run the sequential write command 3 times (average the results).
    5. Run the sequential read command 3 times (average the results).
    6. Shut down and prep for the next test.

    Test 1: Single Constellation (hdd) attached to on board sata

    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 13.9599 s, 76.9 MB/s
    [root@localhost ~]#
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 13.9033 s, 77.2 MB/s
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 13.7618 s, 78.0 MB/s
    
    [root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 13.6594 s, 78.6 MB/s
    
    [root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 13.5738 s, 79.1 MB/s
    
    real    0m13.577s
    user    0m0.040s
    sys     0m0.888s
    

    Average speed write 77MB/s (4.7 GB/m) read 78MB/s

    Test 2: Single MX300 (ssd) attached to on board sata

    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.24173 s, 479 MB/s
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.24117 s, 479 MB/s
    
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.24441 s, 478 MB/s
    
    [root@localhost ~]# echo 3 | tee /proc/sys/vm/drop_caches
    [root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 2.10576 s, 510 MB/s
    real    0m2.109s
    user    0m0.018s
    sys     0m0.664s
    

    Average speed write 478MB/s and read 510MB/s

    Test 3: 3 Constellations (hdd) in software raid-0 configuration to on board sata

    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.90412 s, 370 MB/s
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.78557 s, 385 MB/s
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.75433 s, 390 MB/s
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.802 s, 383 MB/s
    
    [root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 2.75442 s, 390 MB/s
    
    real    0m2.967s
    user    0m0.016s
    sys     0m0.461s
    

    Average speed write 380MB/s and read 390MB/s
    * since this was a software raid, I feel the runs after the very first one may be tainted because of some buffering in the software raid driver in linux
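
    One possible way to take that buffering question out of the picture on future runs (just a suggestion, not something tested above) is to have dd bypass the page cache entirely with direct I/O. Note that direct I/O generally wants a larger block size than 8k:

    # read the test file with O_DIRECT so the page cache is bypassed
    dd if=/tmp/test1.img of=/dev/null bs=1M iflag=direct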

    Test 4: 3 MX300 (ssd) in software raid-0 configuration to on board sata

    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 1.4921 s, 720 MB/s
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 1.50214 s, 715 MB/s
    
    [root@localhost ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 1.49913 s, 716 MB/s
    
    [root@localhost ~]# echo 3 | tee /proc/sys/vm/drop_caches
    [root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 1.33486 s, 804 MB/s
    real    0m1.343s
    user    0m0.016s
    sys     0m0.385s
    
    [root@localhost ~]# echo 3 | tee /proc/sys/vm/drop_caches
    [root@localhost ~]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 1.31937 s, 814 MB/s
    real    0m1.323s
    user    0m0.013s
    sys     0m0.322s
    

    Average speed write 718MB/s and read 800MB/s
    * since this was a software raid, I feel the runs after the very first one may be tainted because of some buffering in the software raid driver in linux

    Test 5: Dell PE2950 with PERC 6i raid controller and 6 x WD RE drives (hdd) in raid-10 configuration (just a comparison test)

    [root@localhost /]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.96148 s, 363 MB/s
    
    [root@localhost /]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB) copied, 2.86738 s, 374 MB/s
    
    [root@localhost /]# echo 3 | tee /proc/sys/vm/drop_caches
    [root@localhost /]# time dd if=/tmp/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB) copied, 3.199 s, 336 MB/s
    real    0m3.367s
    user    0m0.024s
    sys     0m0.861s
    
    

    Average speed write 368MB/s and read 336MB/s
    * Performance values may be tainted by the current workload on the server. The intent of this test was to get a ballpark comparison between a production server and the Dell 790 desktop.


  • Moderator

    Part 3 Network subsystem testing

    {to be continued}


  • Moderator

    Post place holder


  • Moderator

    Post place holder


  • Moderator

    Post place holder

