Performance testing slow FOG imaging

  • Moderator

    Intro

    The intended audience of this thread is FOG admins who have one or more models of computers with slow imaging rates, while the rest of their campus images at a normal rate. The “normal rate” is a bit of a moving target because FOG imaging relies on a healthy network, a well managed FOG server, and modern target computers.

    The easiest benchmarking method a FOG admin has access to is the speed rating listed on the blue partclone screen seen during imaging. This rating is measured in data volume per minute of transfer. We need to be mindful that this “speed rating” is a composite score of the entire imaging process and not specifically network throughput. This composite score is the combination of the FOG server moving data to and from the network interface, plus network throughput, plus the target computer ingesting and decompressing the image in memory, plus the target writing the expanded image to local storage media. Any one of these components not functioning optimally will cause a lower than “normal” score in partclone.

    <Editor note: add in screen shot of partclone screen during imaging>
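    To picture why that score is a composite, here is a toy shell pipeline that exercises the same legs on any Linux box: read a compressed image, decompress it, and write the result to “disk” (a plain file here). The file names and the use of gzip are illustrative only, not FOG’s actual tooling.

```shell
# Toy version of the deploy path: read compressed data, decompress,
# write the expanded copy. Any one slow stage drags the whole rate down.
dd if=/dev/urandom of=/tmp/part.raw bs=1M count=8 2>/dev/null   # fake partition data
gzip -c /tmp/part.raw > /tmp/part.img                           # "captured" compressed image
gzip -dc /tmp/part.img > /tmp/restored.raw                      # ingest + decompress + write
cmp /tmp/part.raw /tmp/restored.raw && echo "restore OK"        # verify the round trip
```

    Wrapping the third line in time shows how long the decompress-and-write leg takes by itself.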

    Some FOG admins equate this partclone score directly to network throughput. This is an incorrect assumption. I’ve seen comments like “I’m getting partclone speeds faster than physically possible with my network, how is that possible?”. In this case the poster was getting 8.2GB/min according to partclone over a 1GbE network, which has a theoretical throughput of only 7.5GB/min. As I mentioned earlier, the partclone score is a composite score of the entire process, where network throughput is only one component; remember too that the image travels over the wire compressed, so partclone can report more data written to disk than the network actually carried. So that 8.2GB/min score tells me the poster’s network is running very well, in that the target computer is receiving the image fast enough to keep the input buffer full, and the target computer is performing well, in that it is ingesting, decompressing, and writing the image to local storage at top speed.

    I can speak for the benchmarks I see on my campus. I do have to mention a caveat here: on my campus I don't use the FOG Client, so from this perspective my FOG server is only used for imaging and not for system management. The FOG Client adds its own overhead to the FOG server that may skew your results if you have a large campus with all target computers running the FOG Client. For a well managed, pure 1GbE network with a modern (contemporary) target computer, I typically see a 6.1GB/min score in partclone. Using our enterprise infrastructure with a 10GbE core network, we typically see a 13GB/min score on the target computers. Just to contrast this, with my FOG-Pi3 server on a 1GbE network I’ve seen 5GB/min partclone scores. The point is the FOG server has minimal impact on FOG imaging rates. All of the heavy lifting (so to speak) during imaging is done by the target computer. To say it a different way, the target computer’s performance has a bigger impact on imaging than the FOG server’s.

    One thing I need to point out as you read through this thread: be mindful of the unit of scale being used. Some tools report in bits per second, others in bytes per second, and others in megabytes per minute. I will try my best to keep everything straight myself. For example, a 1GbE network runs at 1 gigabit per second, which is 125 megabytes per second, or 7.5 gigabytes per minute. They all mean the same speed, just expressed in different units.
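    As a quick sanity check, that unit conversion can be done in a couple of lines of shell (the 1000 Mb/s figure is the 1GbE example from above):

```shell
# 1GbE: divide megabits by 8 for MB/s; x60 then /1000 for GB/min.
mbps=1000                                 # link speed in megabits per second
mb_per_sec=$((mbps / 8))                  # 125 MB/s
gb_per_min=$(awk -v m="$mbps" 'BEGIN { printf "%.1f", m / 8 * 60 / 1000 }')
echo "${mbps} Mb/s = ${mb_per_sec} MB/s = ${gb_per_min} GB/min"
# prints: 1000 Mb/s = 125 MB/s = 7.5 GB/min
```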

    The remainder of this thread assumes your campus is imaging at your normal speed except for one specific model of computer. We will go through the steps to try to determine which leg of imaging is causing the partclone score to be lower than expected.

    This thread is based on the work I did several years ago in this thread: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast We will take some of the lessons learned in that thread and apply them below.

    Target system setup for testing

    1. Register your target system with the FOG server. If you can’t use the built-in registration process, manually register the target computer with the FOG server.
    2. Connect the target computer’s configuration to an image definition.
    3. Schedule a debug capture/deploy (doesn’t matter which). Before you hit the Schedule Task button, tick the Debug checkbox then schedule the task.
    4. PXE boot the target computer into FOG. You will see several screens of text on the target computer that you need to clear using the Enter key. At this point you should be at the FOS Linux command prompt.
    5. Follow the testing procedures below.

    For tips on remote debugging the target computer check out this link: <Editor note: add in link to remote debugging article when its written>

  • Moderator

    Target disk subsystem

    In this section we are going to test the target computer’s performance creating a 1GB file on local storage using the Linux dd command. The dd command will create this 1GB file and time the creation process for us. Just be aware that this is a data destructive test. The contents of your local storage device will be erased during the test. Don’t perform this storage bandwidth test on a disk where you cannot afford to lose the data.

    The hardest step in the process is finding the local storage device name, removing all partitions on the disk, and then creating a new partition for our testing.

    First let’s find the name of your local storage disk. We will use the lsblk command to locate the Linux device name. In the listing below you see the Linux device name is sda for a SATA-attached disk; it has two partitions, sda1 and sda2.

    # lsblk
    NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda      8:0    0 238.5G  0 disk 
    ├─sda1   8:1    0   512M  0 part /boot/efi
    └─sda2   8:2    0   238G  0 part /
    sr0     11:0    1  1024M  0 rom  
    

    Below is an example of an NVMe disk. In this case the device name is nvme0n1 and the partition numbers are p1 p2 p3 p4.

    # lsblk
    NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    nvme0n1     259:0    0 477.6G  0 disk
    ├─nvme0n1p1 259:1    0   100M  0 part
    ├─nvme0n1p2 259:2    0    16M  0 part
    ├─nvme0n1p3 259:3    0 476.3G  0 part
    └─nvme0n1p4 259:4    0   508M  0 part
    

    For the rest of this section we will assume you have an NVMe drive, so we will use that naming convention. We know the NVMe device name is nvme0n1. Let’s use the fdisk utility to remove all of the existing partitions on the disk. Don’t forget, as I mentioned, this is a data destructive test.

    fdisk /dev/nvme0n1
    

    Use the d command to remove each of the existing partitions on the disk. You can confirm the partitions are gone with the p command. Now create a new partition using n, then p for primary, 1 for the first partition, and accept the defaults for the remainder. Finally use the w command to write the new partition table to disk (fdisk exits after writing). Back at the FOS Linux command prompt, ensure the OS is in sync with the disk by keying in sync twice.
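    If you repeat this test often, the same answers can be fed to fdisk on stdin. This is only a sketch: it assumes a single existing partition (add one d line per extra partition), and it is every bit as destructive as the interactive session, which is why the execution line is commented out.

```shell
# fdisk answers, one per line: d=delete, n=new, p=primary, 1=partition
# number, two blank lines accept the default first/last sectors, w=write
# the table and exit. DESTRUCTIVE: this wipes the disk.
DISK=/dev/nvme0n1          # adjust to your device name from lsblk
FDISK_ANSWERS='d
n
p
1


w
'
# printf '%s' "$FDISK_ANSWERS" | fdisk "$DISK" && sync && sync
```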

    You can confirm your changes by once again using the lsblk command.

    # lsblk
    NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    nvme0n1     259:0    0 477.6G  0 disk
    └─nvme0n1p1 259:1    0 477.6G  0 part
    

    Now that we have our test partition we need to format it. Let’s format this first NVMe partition using this command.

    mkfs -t ext4 /dev/nvme0n1p1
    

    The output of this command should look similar to this

    # mkfs -t ext4 /dev/nvme0n1p1
    
    mke2fs 1.45.6 (20-Mar-2020)
    Discarding device blocks: done
    Creating filesystem with 124866880 4k blocks and 31219712 inodes
    Filesystem UUID: 5652bad-814c-4a2d-811a-fd5fb50a6dc4
    Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
    102400000
    Allocating group tables: done
    Writing inode tables: done
    Creating journal (262144 blocks): done
    Writing superblocks and filesystem accounting information: done
    

    Hang on, we are almost done with the setup. The next step is to create a directory mount point and connect the NVMe partition to it.

    mkdir /ntfs
    
    mount -t ext4 /dev/nvme0n1p1 /ntfs
    

    Issue the following command to confirm the partition is mounted.

    df -h
    
    Filesystem Size Used Avail Use% Mounted on
    /dev/root 248M 97M 139M 42% /
    /dev/nvme0n1p1 477G 26G 452G 6% /ntfs
    

    The line we are looking for is this one. It shows that the device /dev/nvme0n1p1 is connected to the /ntfs path.

    /dev/nvme0n1p1 477G 26G 452G 6% /ntfs
    

    Finally we’ve made it to the benchmarking point. Now we will use the dd command to create a 1GB file on the local disk.

    dd if=/dev/zero of=/ntfs/test1.img bs=1G count=1 oflag=direct
    
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB, 1.0GiB) copied, 0.546232 s, 2.0 GB/s
    

    In this case the dd command created the 1GB file in about half a second at a rate of 2.0 GB/s. This result is within the expected range.

    I can give you a few numbers off the top of my head that are reasonable results:
    SATA HDD (spinning disk): 40-90MB/s
    SATA SSD: 350-520MB/s
    NVMe: 950-3500MB/s

    If your results are within the above ranges for the selected storage device this part of the test was successful.
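    If you want several samples in one go, a hypothetical loop like this repeats the write test and prints the rate dd reports for each run. It uses a small file size and conv=fsync so the sketch runs anywhere; on the real target keep the bs=1G count=1 oflag=direct form shown above and point TESTFILE at /ntfs/test1.img.

```shell
# Repeat the dd write benchmark and print each run's reported rate.
TESTFILE=${TESTFILE:-/tmp/ddtest.img}   # use /ntfs/test1.img on the target
for run in 1 2 3; do
  rate=$(dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fsync 2>&1 \
         | awk '/copied/ { print $(NF-1), $NF }')   # e.g. "1.9 GB/s"
  echo "run ${run}: ${rate}"
done
rm -f "$TESTFILE"
```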

  • Moderator

    Network throughput

    In this section we are going to test the network bandwidth performance between the target computer and the FOG server. This test will involve both sending and receiving data to and from the FOG server from the target computer. The utility we will use for this test is called iperf3. The FOS Linux OS running on the target computer already has this utility installed; you will need to install this program on the FOG server because it’s not installed by default.

    For example, if your FOG server is running Ubuntu you would install iperf3 using the command sudo apt-get install iperf3. That command should work for any Debian/Ubuntu variant OS.

    Once iperf3 is installed, let’s set up the server process. On the FOG server, from a Linux console, key in the following command:

    # iperf3 -s
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    

    This will start the iperf3 server process on the FOG server. For the rest of the testing you will not need to interact with the FOG server until it’s time to stop the iperf3 service using Ctrl-C.

    If the target computer is not already in debug mode, put it into debug mode following the process in the first post. Now let’s proceed with testing the network connection.

    On the target computer’s FOS Linux command prompt key in the following command:

    # iperf3 -c <fog_server_ip>
    

    The output of the command will look like the following:

    # iperf3 -c <fog_server_ip>
    
    Connecting to host <fog_server_ip>, port 5201
    [  5] local <target_computer_ip> port 43302 connected to <fog_server_ip> port 5201
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec   112 MBytes   935 Mbits/sec    0    362 KBytes
    [  5]   1.00-2.00   sec   112 MBytes   936 Mbits/sec    0    362 KBytes
    [  5]   2.00-3.00   sec   111 MBytes   935 Mbits/sec    0    362 KBytes
    [  5]   3.00-4.00   sec   111 MBytes   933 Mbits/sec    0    362 KBytes
    [  5]   4.00-5.00   sec   111 MBytes   935 Mbits/sec    0    362 KBytes
    [  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec    0    362 KBytes
    [  5]   6.00-7.00   sec   111 MBytes   933 Mbits/sec    0    362 KBytes
    [  5]   7.00-8.00   sec   112 MBytes   937 Mbits/sec    0    362 KBytes
    [  5]   8.00-9.00   sec   111 MBytes   934 Mbits/sec    0    362 KBytes
    [  5]   9.00-10.00  sec   111 MBytes   934 Mbits/sec    0    362 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-10.00  sec  1.09 GBytes   935 Mbits/sec    0             sender
    [  5]   0.00-10.02  sec  1.09 GBytes   932 Mbits/sec                  receiver
    

    The above output is what I would expect a typical network flow to look like. The important columns to pay attention to are Bitrate and Retr.

    The Bitrate shows the throughput speed. For a 1GbE network 1000Mb/s is the theoretical maximum speed. The Retr column shows the number of times a data packet needed to be retransmitted. Ideally you should have 0 retransmissions on a well designed network.

    Below is an example you might see on a congested network.

    iperf3 -c 192.168.1.205 -p 5202 -i 1 -t 30
    Connecting to host 192.168.1.205, port 5202
    [  5] local 192.168.1.210 port 56804 connected to 192.168.1.205 port 5202
    [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
    [  5]   0.00-1.00   sec   112 MBytes   938 Mbits/sec   45    181 KBytes
    [  5]   1.00-2.00   sec  80.3 MBytes   673 Mbits/sec  234   48.1 KBytes
    [  5]   2.00-3.00   sec  54.3 MBytes   456 Mbits/sec  304   18.4 KBytes
    [  5]   3.00-4.00   sec  55.9 MBytes   469 Mbits/sec  313   26.9 KBytes
    [  5]   4.00-5.00   sec  56.1 MBytes   470 Mbits/sec  332   33.9 KBytes
    [  5]   5.00-6.00   sec  60.2 MBytes   505 Mbits/sec  268   43.8 KBytes
    [  5]   6.00-7.00   sec  70.5 MBytes   591 Mbits/sec  284   46.7 KBytes
    [  5]   7.00-8.00   sec  63.7 MBytes   534 Mbits/sec  232   48.1 KBytes
    [  5]   8.00-9.00   sec  49.5 MBytes   415 Mbits/sec  274   50.9 KBytes
    [  5]   9.00-10.00  sec  63.4 MBytes   532 Mbits/sec  269   43.8 KBytes
    [  5]  10.00-11.00  sec  69.2 MBytes   580 Mbits/sec  246    253 KBytes
    [  5]  11.00-12.00  sec   111 MBytes   932 Mbits/sec    0    355 KBytes
    [  5]  12.00-13.00  sec   111 MBytes   935 Mbits/sec    0    356 KBytes
    [  5]  13.00-14.00  sec   111 MBytes   931 Mbits/sec    0    358 KBytes
    [  5]  14.00-15.00  sec   111 MBytes   935 Mbits/sec    0    358 KBytes
    [  5]  15.00-16.00  sec   112 MBytes   936 Mbits/sec    0    358 KBytes
    [  5]  16.00-17.00  sec   111 MBytes   933 Mbits/sec    0    358 KBytes
    [  5]  17.00-17.11  sec  12.6 MBytes   932 Mbits/sec    0    358 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate         Retr
    [  5]   0.00-17.11  sec  1.38 GBytes   694 Mbits/sec  2801             sender
    [  5]   0.00-17.11  sec  0.00 Bytes  0.00 bits/sec                  receiver
    

    Note the Bitrate is impacted by the number of times data packets needed to be retransmitted (Retr) during the test. Remember this process tests the entire data path between the target computer and the FOG server. The transmitted data is generated memory to memory, so no part of the disk subsystem is being used here. While the above chart shows network congestion, it really doesn’t tell us where in the data path the congestion is. For the context of this thread, network congestion could be the cause of the slower than normal imaging performance.
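    Two quick follow-ups worth knowing. Running the client with iperf3 -c <fog_server_ip> -R reverses the direction of the test, which matters because captures and deploys stress opposite halves of the path. And if you save a long report, a small awk filter (my own helper, not part of iperf3) can pick out just the congested intervals:

```shell
# Print only the intervals whose Retr column is non-zero. The here-doc
# stands in for real output; in practice pipe iperf3 straight into awk.
awk '$NF == "KBytes" && $(NF-2) > 0 {
       print "interval " $3 ": " $(NF-2) " retransmits"
     }' <<'EOF'
[  5]   0.00-1.00   sec   112 MBytes   938 Mbits/sec   45    181 KBytes
[  5]   1.00-2.00   sec   111 MBytes   935 Mbits/sec    0    362 KBytes
EOF
# prints: interval 0.00-1.00: 45 retransmits
```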

  • Moderator

    NFS throughput

    In this section we will test file copy performance over the network. We will use the FOG server to host a file for us to copy. Later in this section we will use a previously captured disk image to deploy to our test system.

    To start off this section we will assume that you have already connected the /ntfs directory to your local hard drive partition. This step was carried out in the previous section on the [ Target Disk Subsystem ], so we will continue on from there.

    Create a new directory off the root so we can connect the FOG server’s NFS share to our target test system.

    mkdir /images
    

    Now we will connect the FOG Server’s share for image capture to the directory we just created.

    mount -o nolock,proto=tcp,rsize=32768,wsize=32768,intr,noatime "<fog_server_ip>:/images/dev" /images
    

    In this next step we will create a working directory on the FOG server and then use dd to create our 1GB test file, similar to what we did to test the local hard drive write speed.

    mkdir /images/test
    dd if=/dev/zero of=/images/test/test1.img bs=1G count=1 oflag=direct
    

    The output should look similar to this:

    # dd if=/dev/zero of=/images/test/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.64448 s, 140 MB/s
    

    You might want to repeat this process a few times to confirm you have a consistent performance number.

    Next we will turn around and read back that 1GB file we created to see what our read performance is. First we need to tell Linux not to cache any reads, then read the file back in and time it. These two commands below need to be executed one right after the other.

    echo 3 | tee /proc/sys/vm/drop_caches
    time dd if=/images/test/test1.img of=/dev/null bs=8k
    

    The output of these commands should look like this:

    # time dd if=/images/test/test1.img of=/dev/null bs=8k
    131072+0 records in
    131072+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.56148 s, 235 MB/s
    
    real	0m4.566s
    user	0m0.072s
    sys	0m1.092s
    

    So for these two VMs I’m using, running on the same Proxmox host server, I have roughly 140MB/s write and 235MB/s read performance. Note the real time stat: it took about 4.6 seconds to read that 1GB file from the FOG server.

    The next step is to time the copy rate between the FOG server and the local hard drive. We will do that with these two commands. Again, we need to tell Linux not to cache the file copy or read ahead.

    echo 3 | tee /proc/sys/vm/drop_caches
    time cp /images/test/test1.img /ntfs
    

    The output should look like this:

    # echo 3 | tee /proc/sys/vm/drop_caches
    # time cp /images/test/test1.img /ntfs
    
    real	0m7.445s
    user	0m0.009s
    sys	0m1.088s
    

    While the copy results were not given to us in MB/s, we see the copy took about 7.4 seconds. This is just a bit slower than our read speed from the FOG server NFS share. For a quick comparison, I ran the command to create the 1GB file locally on this test VM, and these are the results.

    # dd if=/dev/zero of=/ntfs/test1.img bs=1G count=1 oflag=direct
    1+0 records in
    1+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.39809 s, 199 MB/s
    

    So based on these two ratings we can tell our bottleneck is reading the file from the FOG server.
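    To compare the cp run against the other tests in the same units, divide the file size by the elapsed real time (numbers taken from the run above):

```shell
# 1 GiB = 1024 MiB copied in 7.445 s of "real" time.
awk -v mib=1024 -v secs=7.445 'BEGIN { printf "%.0f MB/s\n", mib / secs }'
# prints: 138 MB/s
```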

    For the next test we will need a partition image from a previously captured image on your FOG server. We will test the download, expand, and write process by using partclone to send a partition to your test hard drive.

    To do this

  • Moderator

    Summary of results
