Performance decrease using Hyper-V Win10 clients

jkozee

Anecdotally, it appears that both image captures and image deploys take longer in 6299 than my previous 5315 installation. I am using Windows 10 clients under Hyper-V.

Is this expected/explainable behavior?

If not, I can bring up both installations and provide some metrics between the two versions. I haven’t measured the overall capture/deploy times, but it definitely takes longer for the partclone step to begin.

Thanks.

Sebastian Roth

@jkozee said:

Is this expected/explainable behavior?

In general I’d say no! Did you change the compression ratio? We usually don’t have this kind of setup around to test new versions with, so it would be great if you could find some more specific details on this. Where exactly does it take longer?

jkozee

I don’t think anything was changed between versions, but I will verify. I will perform a detailed analysis and report my findings.

Wayne Workman

One time I accidentally only had one core assigned to a VM in Hyper-V. I went through all the motions of installing FOG Trunk… performance sucked. I deleted the VM and started over, this time with 4 cores!

jkozee

I ran some tests that will hopefully prove useful.

Both the client and server are VMs on the same server. I used a single checkpoint on the client to run all of the tests. The server was tested from a checkpoint running 5315 and then upgraded to 6303 with a new checkpoint created, so that I can easily do additional tests if needed. The upgraded VM gives results similar to the new install on a VM where I originally observed the slow behavior. So, there should be no appreciable differences between the test scenarios, except for the updated FOG version.

Deployment went from 6:03 to 8:54, with the largest increases seen during “Formatting initialized partition” before Partclone and “Resizing ntfs volume” after.

Capture went from 18:03 to 25:52, with the largest increases seen during “Resizing filesystem” before Partclone and “Resizing ntfs volume” after.
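
To put the totals above in perspective, the increase can be worked out from the mm:ss values. This is only a minimal sketch in Python; the helper names (to_seconds, fmt, compare) are made up for illustration, and the four totals are the only data taken from the tests.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' string such as '6:03' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def fmt(seconds):
    """Format a number of seconds back to 'm:ss'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def compare(old, new):
    """Return (extra seconds, percent increase) going from old to new."""
    old_s, new_s = to_seconds(old), to_seconds(new)
    delta = new_s - old_s
    return delta, 100 * delta / old_s


# Overall task times: (FOG 5315, FOG 6303)
totals = {"deploy": ("6:03", "8:54"), "capture": ("18:03", "25:52")}

for task, (old, new) in totals.items():
    delta, pct = compare(old, new)
    print(f"{task}: +{fmt(delta)} ({pct:.0f}% longer)")
# deploy: +2:51 (47% longer)
# capture: +7:49 (43% longer)
```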

I will include additional data and times in separate posts for each test for closer inspection.

Please let me know if you have any ideas, or anything else you would like to see tested.

Thanks!

jkozee

Sorry, long post as I have a limit on how often I can post

5315-Capture
#0:00

Verifying network interface configuration…Done
Checking Operating System…Windows 10
Checking CPU Cores…1
Send method…NFS
Checking In…Done
Mounting File System…Done
Preparing to send image file to server…Done
Checking Mounted File System…Done
Using Image: delme
Preparing backup location…Done
Looking for Hard Disks…Done
Re-reading Partition Tables…Done
Using Hard Disk: /dev/sda
Clearing part (/dev/sda1)…Done
Mounting partition (/dev/sda1)…Done
Removing page file…Done
Removing hibernate file…No hibernate found
Clearing ntfs flag…Done
Saving original partition table…Done
Saving Partition Tables (MBR)…Done
Possible resize partition size: 11263111 k
Running resize test /dev/sda1…Done
Resize test was successful
Resizing filesystem…Done
Clearing ntfs flag…Done
Resizing partition dev/sda1…Done
Checking Hard Disks…Done
Clearing ntfs flag…Done
Now FOG will attempt to upload the image using Partclone.
Processing Partition: /dev/sda1 (1)
Using partclone.ntfs
#0:23
<<PARTCLONE>>
#18:00
Image uploaded
Restoring MBR…Done
Resizing ntfs volume (/dev/sda1)…Done
Clearing ntfs flag…Done
Stopping FOG Status Reporter…Done
#18:03

6303-Capture
#0:00

Verifying network interface configuration…Done
Checking Operating System…Windows 10
Checking CPU Cores…1
Send method…NFS
Attempting to check in…Done
Mounting File System…Done
Checking Mounted File System…Done
Checking img variable is set…Done
Preparing to send image file to server
Preparing backup location…Done
Setting permission on /images/00155d016673…Done
Removing any pre-existing files…Done
Using Image: delme
Looking for Hard Disk…Done
Reading Partition Tables…Done
Using Hard Disk: /dev/sda
Now FOG will attempt to upload the image using Partclone
Checking for fixed partitions…Done
Getting Windows/Linux Partition Count…Done
NTFS Partition count of: 1
EXTFS Partition count of: 0
Setting up any additional fixed parts
Saving original partition table…Done
Saving original disk/parts UUIDs…Done
Shrinking Partitions on disk
Clearing part (/dev/sda1)…Done
Mounting partition (/dev/sda1)…Done
Removing page file…Done
Possible resize partition size: 11263111 k
Running resize test /dev/sda1…Done
Resize test was successful
#0:18
Resizing filesystem…Done
#4:53
Resizing partition /dev/sda1…Done
Clearing ntfs flag…Done
Saving shrunken partition table
Saving Partition Tables (MBR)…Done
#4:53
<<PARTCLONE>>
#22:44
Image Uploaded
Restoring Original Partition Layout…Done
#22:44
Resizing ntfs volume (/dev/sda1)…Done
#25:49
Clearing ntfs flag…Done
Stopping FOG Status Reporter…Done
Task Complete
Updating Database…Done
Rebooting system as task is complete
reboot: Restarting system
#25:52

5315-Deploy
#0:00

Verifying network interface configuration…Done
Checking Operating System…Windows 10
Checking CPU Cores…1
Send method…NFS
Attempting to send inventory…Done
Checking In…Done
Mounting File System…Done
Checking Mounted File System…Done
Starting Image Push
Using Image: delme
Looking for Hard Disks…Done
Checking write caching status on HDD…Enabled
Erasing current MBR/GPT Tables…Done
Restoring Partition Tables (MBR)…Done
Extended partitions…Done
Expanding partition table to fill disk…Done
Processing Partition: /dev/sda1 (1)
#0:28
<<PARTCLONE>>
#6:00
Clearing ntfs flag…Done
Stopping FOG Status Reporter…Done
Resizing ntfs volume (/dev/sda1)…Done
Clearing ntfs flag…Done
Backing up and replacing BCD…Done
Changing hostname…Done
Updating Computer Database Status
Database Updated!
Task is completed, computer will now restart.

reboot: Restarting system
#6:03

6303-Deploy
#0:00

Verifying network interface configuration…Done
Checking Operating System…Windows 10
Checking CPU Cores…1
Send method…NFS
Attempting to check in…Done
Mounting File System…Done
Checking Mounted File System…Done
Checking img variable is set…Done
Attempting to send inventory…Done
Using Image: delme
Looking for Hard Disk…Done
Using Disk: /dev/sda
Write caching not supported
Preparing Partition layout
Wiping /dev/sda partition information
Erasing current MBR/GPT Tables…Done
Creating disk with new label…Done
Initializing /dev/sda with NTFS partition…Done
#0:20
Formatting initialized partition…Done
#3:53
Erasing current MBR/GPT Tables…Done
Restoring Partition Tables (MBR)…Done
Inserting Extended partitions…Done
Attempting to expand/fill partitions…Done
#3:57
<<PARTCLONE>>
#5:53
Clearing ntfs flag…Done
#5:53
Resizing ntfs volume (/dev/sda1)…Done
#8:51
Clearing ntfs flag…Done
Resetting UUIDs for /dev/sda
Resettings swap systems
Stopping FOG Status Reporter…Done
Mounting directory…Done
Changing hostname…Done
Task Complete
Updating Database…Done
Rebooting system as task is complete
reboot: Restarting system
#8:54

jkozee

I should also note that 5315 == kernel 4.3.0 and 6303 == kernel 4.4.1, as that’s probably relevant. I have not tried an older kernel on 6303, but can if needed.

jkozee

New tests indicate the slowdown exists in kernel 4.4.0 (x86_64) and 4.4.1 (x86_64), but 4.3.0 (x86_64) appears to be fine.

Sebastian Roth

@jkozee Thanks a lot for the accurate timing! Good to know where exactly the time is going. I think @Tom-Elliott is the only one who can shed light on what changed in “Resizing filesystem”, “Resizing ntfs volume” and “Formatting initialized partition”. Between 5315 and 6303 there were heaps of changes in the whole process.

New tests indicate the slowdown exists in kernel 4.4.0 (x86_64) and 4.4.1 (x86_64), but 4.3.0 (x86_64) appears to be fine.

Do you mean 6303 with kernel 4.3.0 is as fast as 5315?? Can you please verify if you see such drastic differences (where exactly? still resize ntfs…?) just by using older/newer kernel!

jkozee

@Sebastian-Roth said:

Do you mean 6303 with kernel 4.3.0 is as fast as 5315?? Can you please verify if you see such drastic differences (where exactly? still resize ntfs…?) just by using older/newer kernel!

Actually it may be faster. I ran a deploy test using the same VMs with 6307 using kernel 4.3.0, and it completed in 4:18. To be accurate, I would need to repeat all of the tests to compare 5315/4.3.0 and 6307/4.3.0 under the same server load. But it’s probably safe to say it’s as fast using the older kernel.

jkozee

Here are the metrics comparing 6307 using kernel 4.3.0 and 4.4.1. Looking at the numbers, it’s safe to say that 6307/4.3.0 is faster than 5315/4.3.0 and far faster than 6307/4.4.1 when using a VM client on Hyper-V.

Let me know if there are any more measurements required. I’ll keep the VMs around for a day or two.

6307-4.3.0-Capture
#0:00
#0:18

Resizing filesystem…Done
#0:18
#0:19
<<PARTCLONE>>
#14:19
#14:20
Resizing ntfs volume (/dev/sda1)…Done
#14:20
#14:25

6307-4.4.1-Capture
#0:00
#0:17

Resizing filesystem…Done
#4:53
#4:54
<<PARTCLONE>>
#22:42
#22:43
Resizing ntfs volume (/dev/sda1)…Done
#25:49
#25:53

6307-4.3.0-Deploy
#0:00
#0:20

Formatting initialized partition…Done
#0:20
#0:24
<<PARTCLONE>>
#4:11
#4:11
Resizing ntfs volume (/dev/sda1)…Done
#4:12
#4:14

6307-4.4.1-Deploy
#0:00
#0:20

Formatting initialized partition…Done
#3:53
#3:57
<<PARTCLONE>>
#5:42
#5:42
Resizing ntfs volume (/dev/sda1)…Done
#8:49
#8:52
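
The per-step cost in these logs can be read off mechanically: the timestamp just before a step label is taken as its start and the timestamp just after as its end. Below is a rough Python sketch of that reading, using the two capture runs above as sample data. The function name step_durations and the shortened step labels are illustrative only, and the start/end interpretation of the paired timestamps is an assumption about how the measurements were noted.

```python
import re

# A timestamp marker line such as "#4:53"
STAMP = re.compile(r"^#(\d+):(\d{2})$")


def step_durations(log):
    """Map each step label to (timestamp after it) - (timestamp before it), in seconds."""
    durations = {}
    last = None      # most recent timestamp seen, in seconds
    pending = None   # (label, start) of a step still waiting for its end timestamp
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STAMP.match(line)
        if match:
            now = int(match.group(1)) * 60 + int(match.group(2))
            if pending is not None:
                label, start = pending
                durations[label] = now - start
                pending = None
            last = now
        elif last is not None:
            # A step label: it starts at the timestamp noted just before it.
            pending = (line, last)
    return durations


CAPTURE_4_3_0 = """
#0:00
#0:18
Resizing filesystem
#0:18
#0:19
PARTCLONE
#14:19
#14:20
Resizing ntfs volume
#14:20
#14:25
"""

CAPTURE_4_4_1 = """
#0:00
#0:17
Resizing filesystem
#4:53
#4:54
PARTCLONE
#22:42
#22:43
Resizing ntfs volume
#25:49
#25:53
"""

old = step_durations(CAPTURE_4_3_0)
new = step_durations(CAPTURE_4_4_1)
for step in old:
    print(f"{step}: {old[step]}s on 4.3.0, {new[step]}s on 4.4.1")
```

Read this way, the two resize steps account for 276 s and 186 s on 4.4.1 against roughly 0 s on 4.3.0, and the Partclone step itself is also longer (1068 s against 840 s).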

Wayne Workman

@jkozee Well, I for one really appreciate your efforts with testing performances of various revisions and kernels! Perhaps you can just turn one of the VMs off and leave it alone, and wait until FOG Trunk enters into RC (release candidate) so that you can test speeds then and compare to your findings here? It’d be very appreciated.

Sebastian Roth

@jkozee Do you use the web interface to up-/downgrade kernels? I looked through the official kernel change logs but couldn’t find anything related to NTFS at all. As well I checked the buildroot (this is what is used in the inits doing all the work you see when capturing/deploying a client) change logs and couldn’t find an obvious hint on issues with the ntfs-3g progs. Hmmmm, still wondering if it is the kernel or the init??

Does anyone else see capture/deploy taking literally minutes to resize/format NTFS when image type is set to resizable? Or is this an issue only happening within Hyper-V?

jkozee

@Sebastian-Roth I tested the different kernels by downloading them to separate files using the web interface. The only difference between the last two setups I compared is the kernel parameter in the host setting using the web interface. I only see this issue on my VMs; my physical units behave normally with either kernel.

I did some additional tests this morning and here’s what I found.

Both 4.4.0 and 4.4.1 take around 3.5 minutes to complete “Formatting initialized partition” during deploy, while 4.3.0, 4.3.2, and 4.3.0CDCETHER take less than 1 second.

The VM I’ve been using has a VHDX file that lives on an SSD, and is the only thing on it. I tested using a VHDX that is on a spinning disk (desktop drive, 5600 rpm), and 4.3.0 still takes <1 sec to complete, but 4.4.1 can complete the step in about 15.5 seconds.

Something has changed in the kernel build in regards to a VM running on Hyper-V with SSD backed storage.

sudburr

I’ve noticed this problem with Hyper-V VMs on physical disks as well, though much, much worse, at 45-60 minutes stuck on Resizing Filesystem. These are brand new VMs built from scratch. The last command I run before shutdown is:

defrag c: /x /h /u /v

I have an added problem in that I can’t download other kernels (in another thread), so I have been unable to test with kernels other than 4.4.1.

sudburr

I pulled version 4.1.4 of bzImage and bzImage32 from another server.

Without replacing the inits, “Resizing Filesystem” now completes in about a minute.

Sebastian Roth

@jkozee said:

Something has changed in the kernel build in regards to a VM running on Hyper-V with SSD backed storage

From my point of view the kernel in the VM has absolutely no knowledge of the underlying filesystem/disk outside the VM. I thought it could be a fragmentation problem on the backend storage device but then you wouldn’t see a difference in speed just by booting different kernel versions. From what you said I think your test setup is pretty good (just changing the kernel parameter in the host setting and leaving everything else untouched).

There is a great way to pin this kind of issue down to exactly one version/commit. It’s called git bisect. Please read through this article and see if you want to dive into this. I am more than happy to help you along the way! Have you ever compiled a (FOG) kernel? It’s actually not too complicated. Just give it a try following this article: https://wiki.fogproject.org/wiki/index.php/Build_TomElliott_Kernel (the second half is talking about the current FOG version). Instead of make menuconfig (after downloading Tom’s kernel config) you can just run make oldconfig, so you don’t need to bother with the menu stuff.

You can build the kernels on your FOG server if you like. Just needs some disk space for the kernel git repo and some tools. As I don’t know which linux you are on I will leave this open. Ask google which packages you need to install on CentOS/Debian/… to compile the kernel. There are lots of tutorials out there.
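
For anyone unfamiliar with the idea behind git bisect, it is a binary search over an ordered history: test the midpoint between a known-good and a known-bad revision, then keep whichever half still contains the change. The Python sketch below only illustrates that search and is not how git implements it. The revision names, the culprit index and the is_bad check are all hypothetical; in practice the check would mean building that kernel, booting the Hyper-V client with it and timing the “Formatting initialized partition” step.

```python
def first_bad(revisions, is_bad):
    """Find the first bad revision by bisection.

    Assumes revisions are ordered oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and every revision after the first
    bad one is also bad.
    Returns (first bad revision, list of revisions that had to be tested).
    """
    good, bad = 0, len(revisions) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(revisions[mid])
        if is_bad(revisions[mid]):
            bad = mid    # the change is at mid or earlier
        else:
            good = mid   # the change is after mid
    return revisions[bad], tested


# Hypothetical linear history of 1001 revisions between a good and a bad kernel.
revisions = [f"rev{i}" for i in range(1001)]
CULPRIT = 637  # made-up index, for demonstration only


def is_bad(revision):
    # Stand-in for "build, boot and time the slow step" on a real kernel.
    return int(revision[3:]) >= CULPRIT


culprit, tested = first_bad(revisions, is_bad)
print(culprit, "found after testing", len(tested), "revisions")
```

The point of the design is the cost: about a thousand candidate revisions need at most ten build-and-test rounds, which is what makes bisecting between two kernel releases practical.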

jkozee

@Sebastian-Roth I’m pretty short on time right now (I’m sure everyone here can say the same thing), but compiling and testing kernels shouldn’t be a problem. I’ll try to make time this weekend, but it may have to wait until next weekend. I’ll post here if/when I make any progress.

My FOG server is slow storage backed, so I’ll need to build a new VM to make kernel compiling tolerable. My plan would be to script building the incremented versions between 4.3.2 and 4.4.0, to narrow it down. Once we have that, we can bisect between them to find what changed.

@Tom-Elliott Are the .config’s available for download for the 4.3.2 and 4.4.0 builds that you released? Do you build with the defaults, or do you tweak the .config for FOG?

@sudburr Ouch, 45-60 minutes is way more painful. Looks like the FTP issue is now resolved. How does the performance compare with 4.1.4, 4.3.2, and 4.4.1?

Tom Elliott

@jkozee Configs can be downloaded. As I improve/edit the kernel configs I update them on SVN/GIT, though I can’t possibly tell you in which specific revisions these 4.3.0 to 4.4.x changes were made.

I’m about to try building the 4.4.2 kernel (didn’t know it released) and I will pull in a 4.3.0 kernel and rip the config out of it.

jkozee

@Tom-Elliott Thanks Tom. I’ll look through the repo after I get a VM up to compile on. If you get 4.4.2 built and available, I’ll test it first, as there will be really no point in testing the other builds if it is fixed now…
