Rate is at a slow crawl when trying to deploy/capture image

hvaransky

Very much a newbie FOG user here, so if this question sounds stupid, I apologize!

I am trying to both deploy and capture an image, and everything starts out great until I get to the actual deploy/capture. The rate starts out high but, within 30 minutes, drops so low (20-60MB/Min) that I end up canceling the task because it would take over 24 hours to complete. I did let one deployment run, and it did complete; the image seems to be running okay on the computer I put it on. The image that I deployed was made last July and it worked great (the rate ran at about 1.5GB/Min) the whole summer while re-imaging. I’ve only noticed it being this slow since about December. I have tried rebooting the FOG server and restarting the services, but nothing helps. We have not run any updates on the FOG server. Currently running FOG 1.4.4 on Ubuntu 14.04.

Wayne Workman

@hvaransky Most likely not a fog problem, just so you know. You should start with general Linux troubleshooting - and focus on network.

There’s a utility for Linux called ethtool that you can install and use to see the configuration of the local NIC. Run ethtool on your fog server and see if it’s configured at 1Gbps. Also, make sure the link between the fog server and target host is 1Gbps all the way through. A bad/kinked/broke/borked patch cable can cause a 1Gbps link to be derated down to 100Mbps or even 10Mbps. There’s another tool called iperf to test network throughput between two Linux boxes - install that, live-boot the target computer using an Ubuntu Desktop disk or something, and run some iperf tests to see throughput.
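
A minimal sketch of those checks, in case it helps. The helper name link_speeds, the interface name eth0 and the address 192.168.1.10 are placeholders, not values from this thread. The function reads the negotiated speed that the kernel exposes under /sys/class/net, which should match what ethtool reports; an interface that is down may show -1 or nothing at all.

```shell
# Print the negotiated link speed (Mb/s) of every interface that reports one.
# The optional argument is only there so the function can be pointed at a
# different directory; by default it reads the kernel's sysfs tree.
link_speeds() {
    base="${1:-/sys/class/net}"
    for dev in "$base"/*; do
        [ -r "$dev/speed" ] || continue
        # Some interfaces (e.g. loopback) refuse the read; skip those.
        speed=$(cat "$dev/speed" 2>/dev/null) || continue
        echo "$(basename "$dev"): ${speed} Mb/s"
    done
}

# Equivalent manual checks (placeholder interface name and address):
#   sudo ethtool eth0 | grep -E 'Speed|Duplex'
#   iperf -s                       # on the fog server
#   iperf -c 192.168.1.10 -t 30    # on the live-booted target computer
```

A 1Gbps link should show 1000 Mb/s on both ends; 100 or 10 usually points at a cable or switch port problem.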

hvaransky

@wayne-workman Thank you!! Will definitely try those options!

hvaransky

@wayne-workman I did run the ethtool utility and the speed is set at 1Gbps. I also did an ifconfig and am getting RX at 419.0MB and TX at 20.6GB. I restarted the server and reset the services yesterday as well. I thought maybe it was because I was trying to image during the day when the network is being hit hard, so I waited until most everyone left, but it still took over 7 hours to complete. Connection between server and host computer looks to be okay as well. I did look back at the imaging reports, and the same image only took 1 hour and 7 minutes last July.

Wayne Workman

@hvaransky Where is the computer you were imaging? Is it connected to the same physical switch the fog server is? If not can you move it to that switch and try again?

hvaransky

@wayne-workman The computer that I imaged last night is in the same building, but is through a different switch. I tried an image last week that took 7 hours and 44 minutes and was connected to the same switch. They are two different images, but both were made around the same time for HP desktops.

Wayne Workman

@hvaransky For the image that is deploying slowly:

What is the compression setting?
How big does the image say it is on the FOG Server?

Also, please post a screenshot of the top command on your FOG Server during the time when the largest partition is being written to disk.

hvaransky

@wayne-workman I took 2 because I’m not exactly sure what we’re looking for. The compression is set at 22 and the image size is 465GB.

0_1520868079537_Screenshot from 2018-03-12 11:10:46.png

0_1520867985169_Screenshot from 2018-03-12 11:10:52.png

Tom Elliott

@hvaransky turn compression down to 19. 20 and 22 on zstd requires a lot of memory. While the selector lets you choose that high, it is not recommended. 19, I’ve found, is the highest you can go without causing an issue with the client machine.

Wayne Workman

@hvaransky I agree with Tom, that compression setting is too high. I wanted the top output to look for any potential load issues; I don’t see any.

Tom Elliott

I should add, the higher the compression, the slower the image will be captured. For gzip, the maximum compression allowed is 9, and it is VERY slow. Zstd’s maximum compression is 22, though 19 is the highest you can do without running into memory issues, I’ve found. 19 on zstd is still faster than gzip on 9 from my experience. The reason higher compression takes longer is the extra work that has to be done to squeeze the data that much further. Hopefully this will lend some insight.

hvaransky

@tom-elliott I understand about the compression rates, but I’m not sure about the Gzip/ZSTD stuff. All of our images are set to partclone Gzip as per screenshot. Am I safe to assume that I have something set up wrong in the image management?

0_1520947088026_fog image management.png

Tom Elliott

@hvaransky the maximum gzip can compress is -9. If you set compression to 22 and gzip is the compression manager, it will be set to -9. This is very very slow.

hvaransky

@tom-elliott I changed the rate down to 6 and deployed the image again. It started out at 10GB/Min, but within 10 minutes it is down to 247MB/Min. I’m at 44% completed with it running for an hour and 22 minutes, which is definitely MUCH better, but is there something else I need to adjust to get it even quicker? The rate is still dropping (it is going down at about 2MB every 3 minutes or so). Sorry for all the questions, I really am a newbie with FOG.

Junkhacker

@hvaransky if you’re wanting high compression, you’ll want to switch to zstd. it’s faster and compresses better. don’t bother with maxing it out though, you’ll triple the time it takes to compress and only save a few % in size. comparing gzip -6 to zstd -11 ( our recommended settings ) my testing showed zstd was 10% faster at capture, 26% smaller in final file size, and 36% faster on deployment
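
For anyone who wants to see the trade-off on their own hardware, here is a rough sketch that compresses one sample file at a few of the levels discussed above and prints the size and elapsed time for each. It is only a local comparison, not how FOG itself drives the compressor during capture, and compare_levels is a hypothetical helper name. Tools that are not installed are skipped.

```shell
# Compress a sample file at several gzip/zstd levels and report the
# compressed size and wall-clock seconds for each. Output goes to wc, so
# nothing is written to disk.
compare_levels() {
    sample="$1"
    echo "original: $(wc -c < "$sample") bytes"
    for spec in "gzip -6" "gzip -9" "zstd -11" "zstd -19"; do
        tool=${spec%% *}
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$spec: skipped ($tool not installed)"
            continue
        fi
        start=$(date +%s)
        # $spec is left unquoted on purpose so it splits into tool + level.
        size=$($spec -c < "$sample" | wc -c)
        end=$(date +%s)
        echo "$spec: $size bytes in $((end - start))s"
    done
}

# Usage (the path is a placeholder for any large, representative file):
#   compare_levels /path/to/sample.img
```

Timing is in whole seconds, so use a sample of at least a few hundred MB to get meaningful numbers.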

hvaransky

@junkhacker We’re not really worried about compression size per se. We would rather it take less time for an image to deploy. (It was running at about 35 minutes per machine last summer and is now taking more than 7 hours to complete.) On the plus side, after changing the compression setting on the image I’m currently deploying, it is predicted to only take about 3 1/2 hours to finish!

Tom Elliott

@hvaransky There’s a lot of variables to consider in deploy, or capture, speed.

First is your network.
Second is the disks writing to/reading from.
Third is the compression.

As @Junkhacker stated, finding the “goldilocks” zone of compression is also useful. For example, gzip at -9 will take a long time to capture and you don’t really gain much extra compression. On top of that, the speed to deploy isn’t much better (partly due to the compression already reaching its peak).

Less data to deploy = faster network transfer, but if your disk is really old or slow the speed could be limited there. (This on deploy and capture).

I’ve found -11 on zstd to be a good zone, though I don’t have much disk space, so I use zstd on 19 (which I find is still faster than gzip on 9) during capture.

As I said, there’s a lot of variables to consider.

Also, if your fog server is replicating images and files at the same time as you are performing a capture or deploy, chances are the slowdown is due to the server being used for both at once, as the heads have to jump around the server’s hard drive.

You could try rebooting the fog server though. After all, the server is still a computer, and while it’s not necessarily a normal requirement, rebooting might solve many of the problems you’re seeing.

Also, look at your network. If you have a 1Gbps network but one switch in the path is only 100Mbps, the maximum capture/deploy speed across the network to that machine would be limited to 100Mbps (about 0.75GB/min), whereas 1Gbps would give about 7.5GB/min. (This is for uncompressed data, though.) The speed is also (as stated earlier) limited by the hard drives of both the client and server machines. Most often I’ve found the slowdown is not the network but the HDDs being read from or written to.
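
For reference, the link-speed arithmetic can be checked with a couple of small helpers. This is only a sketch: the function names are made up for illustration, the figures are theoretical line rate with no protocol overhead, and the time estimate treats the 465GB image size reported earlier as the amount of data actually moved, which ignores compression and disk limits.

```shell
# Convert a link speed in Mbps to GB per minute:
# Mbps / 8 = MB/s, * 60 = MB/min, / 1000 = GB/min.
mbps_to_gb_per_min() {
    awk -v mbps="$1" 'BEGIN { printf "%.2f\n", mbps / 8 * 60 / 1000 }'
}

# Minutes needed to move a given number of GB at a given GB/min rate.
minutes_for_image() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", gb / rate }'
}

mbps_to_gb_per_min 100      # prints 0.75
mbps_to_gb_per_min 1000     # prints 7.50
minutes_for_image 465 7.5   # prints 62
```

The rates FOG shows on screen are in the same per-minute units, so a sustained 20-60MB/Min is far below even a 10Mbps link.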

george1421

If you have some time, I’d like you to do some system benchmarking. Maybe we can find the source of your issues.

The first and easiest to test is local disk subsystem. From a linux command prompt on your fog server run these commands.

sudo dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct

run it 3 times and average the output which should look something like this:

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 13.9599 s, 76.9 MB/s
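
If averaging by hand is a nuisance, something like the following sketch could do it. avg_dd_rate is a hypothetical helper name; it assumes every run reports its rate in the same unit (MB/s in the sample above), since it only averages the number in front of the unit.

```shell
# Read dd summary lines on stdin and print the average of the reported rates.
# The rate is the second-to-last field of each "... copied, ..." line.
avg_dd_rate() {
    awk '/copied/ { sum += $(NF-1); n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Usage on the fog server (dd prints its summary on stderr, hence 2>&1):
#   for i in 1 2 3; do
#       sudo dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=direct 2>&1
#   done | avg_dd_rate
```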

Then run this command 3 times and average the output.
sudo echo 3 | tee /proc/sys/vm/drop_caches && time dd if=/tmp/test1.img of=/dev/null bs=8k

Post the results here.

And finally we need to remove the 1GB file we created.
sudo rm -f /tmp/test1.img

The next bit is network throughput. But let’s see your disk speeds to start.

hvaransky

@george1421 I couldn’t get the 2nd part of the command line to work as I kept getting permission denied. I was able to use the built in benchmarking on the disks menu to come up with the screenshot below:

0_1521034320684_benchmark test.png

We also double-checked all of the switches last night and all seem to be set properly and rebooted the FOG server. I am going to try to capture a new image with the ZSTD compression instead of Gzip. On the downside, after changing compressions on the image and it starting out super high transfer rate yesterday, it still took over 8 hours to complete and the rate almost bottomed out by the time it was an hour in.

george1421

@hvaransky If you run sudo su - first then you should be able to run the commands without the sudo at all.

It would be interesting from a benchmarking standpoint to use the same tool to give us the same relative number.

But based on the benchmark screen, I would expect you have either a SATA SSD or a multi hard drive (>6) disk array. Maybe raid 10. So your slowness is probably not your disk subsystem. So the next steps are network testing.
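
A note on the “permission denied” error from the read test: in `sudo echo 3 | tee /proc/sys/vm/drop_caches`, sudo elevates only echo, while tee (the process that actually opens the file) still runs as the normal user. Besides switching to a root shell, one way around it is to put sudo on tee instead. The sketch below wraps that in a hypothetical helper, drop_page_cache; the optional argument exists only so it can be tried against a scratch file.

```shell
# Ask the kernel to drop the page cache so the next read hits the disk.
drop_page_cache() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync    # flush pending writes first
    if [ -w "$target" ]; then
        echo 3 > "$target"                        # already running as root
    else
        echo 3 | sudo tee "$target" > /dev/null   # elevate tee, not echo
    fi
}

# Usage on the fog server:
#   drop_page_cache && time dd if=/tmp/test1.img of=/dev/null bs=8k
```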
