Performance decrease using Hyper-V Win10 clients
-
I don’t think anything was changed between versions, but I will verify. I will perform a detailed analysis and report my findings.
-
One time I accidentally only had one core assigned to a VM in Hyper-V. I went through all the motions of installing FOG Trunk… performance sucked. I deleted the VM and started over, this time with 4 cores!
-
I ran some tests that will hopefully prove useful.
Both the client and server are VMs on the same physical server. I used a single checkpoint on the client to run all of the tests. The server was tested from a checkpoint running 5315 and then upgraded to 6303 with a new checkpoint created, so that I can easily run additional tests if needed. The upgraded VM gives similar results to the fresh install on which I originally observed the slow behavior. So there should be no appreciable differences between the test scenarios, except for the updated FOG version.
Deployment went from 6:03 to 8:54, with most of the increase seen during “Formatting initialized partition” before Partclone and “Resizing ntfs volume” after.
Capture went from 18:03 to 25:52, with most of the increase seen during “Resizing filesystem” before Partclone and “Resizing ntfs volume” after.
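To double-check these totals, the m:ss times can be converted to seconds and compared. This is a minimal bash sketch; the helper names to_seconds and compare_runs are made up for illustration and are not part of FOG.

```bash
#!/usr/bin/env bash
# Convert an m:ss (or h:mm:ss) timestamp into seconds.
# 10# forces base 10 so fields like "08" are not treated as octal.
to_seconds() {
  local IFS=: total=0 part
  for part in $1; do
    total=$(( total * 60 + 10#$part ))
  done
  echo "$total"
}

# Print old/new durations in seconds plus the absolute and relative increase
# (the percentage uses integer division, so it is rounded down).
compare_runs() {
  local old new
  old=$(to_seconds "$1")
  new=$(to_seconds "$2")
  echo "$old -> $new s, +$(( new - old )) s (+$(( (new - old) * 100 / old ))%)"
}

compare_runs 6:03 8:54     # deploy,  prints: 363 -> 534 s, +171 s (+47%)
compare_runs 18:03 25:52   # capture, prints: 1083 -> 1552 s, +469 s (+43%)
```

So both tasks got roughly 43-47% slower, which is more than run-to-run noise would explain.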
I will include additional data and times in separate posts for each test for closer inspection.
Please let me know if you have any ideas, or anything else you would like to see tested.
Thanks!
-
Sorry for the long post; I have a limit on how often I can post.
5315-Capture
#0:00 - Verifying network interface configuration…Done
- Checking Operating System…Windows 10
- Checking CPU Cores…1
- Send method…NFS
- Checking In…Done
- Mounting File System…Done
- Preparing to send image file to server…Done
- Checking Mounted File System…Done
- Using Image: delme
- Preparing backup location…Done
- Looking for Hard Disks…Done
- Re-reading Partition Tables…Done
- Using Hard Disk: /dev/sda
- Clearing part (/dev/sda1)…Done
- Mounting partition (/dev/sda1)…Done
- Removing page file…Done
- Removing hibernate file…No hibernate found
- Clearing ntfs flag…Done
- Saving original partition table…Done
- Saving Partition Tables (MBR)…Done
- Possible resize partition size: 11263111 k
- Running resize test /dev/sda1…Done
- Resize test was successful
- Resizing filesystem…Done
- Clearing ntfs flag…Done
- Resizing partition /dev/sda1…Done
- Checking Hard Disks…Done
- Clearing ntfs flag…Done
- Now FOG will attempt to upload the image using Partclone.
- Processing Partition: /dev/sda1 (1)
- Using partclone.ntfs
#0:23
<<PARTCLONE>>
#18:00 - Image uploaded
- Restoring MBR…Done
- Resizing ntfs volume (/dev/sda1)…Done
- Clearing ntfs flag…Done
- Stopping FOG Status Reporter…Done
#18:03
6303-Capture
#0:00- Verifying network interface configuration…Done
- Checking Operating System…Windows 10
- Checking CPU Cores…1
- Send method…NFS
- Attempting to check in…Done
- Mounting File System…Done
- Checking Mounted File System…Done
- Checking img variable is set…Done
- Preparing to send image file to server
- Preparing backup location…Done
- Setting permission on /images/00155d016673…Done
- Removing any pre-existing files…Done
- Using Image: delme
- Looking for Hard Disk…Done
- Reading Partition Tables…Done
- Using Hard Disk: /dev/sda
- Now FOG will attempt to upload the image using Partclone
- Checking for fixed partitions…Done
- Getting Windows/Linux Partition Count…Done
- NTFS Partition count of: 1
- EXTFS Partition count of: 0
- Setting up any additional fixed parts
- Saving original partition table…Done
- Saving original disk/parts UUIDs…Done
- Shrinking Partitions on disk
- Clearing part (/dev/sda1)…Done
- Mounting partition (/dev/sda1)…Done
- Removing page file…Done
- Possible resize partition size: 11263111 k
- Running resize test /dev/sda1…Done
- Resize test was successful
#0:18 - Resizing filesystem…Done
#4:53 - Resizing partition /dev/sda1…Done
- Clearing ntfs flag…Done
- Saving shrunken partition table
- Saving Partition Tables (MBR)…Done
#4:53
<<PARTCLONE>>
#22:44 - Image Uploaded
- Restoring Original Partition Layout…Done
#22:44 - Resizing ntfs volume (/dev/sda1)…Done
#25:49 - Clearing ntfs flag…Done
- Stopping FOG Status Reporter…Done
- Task Complete
- Updating Database…Done
- Rebooting system as task is complete
reboot: Restarting system
#25:52
5315-Deploy
#0:00 - Verifying network interface configuration…Done
- Checking Operating System…Windows 10
- Checking CPU Cores…1
- Send method…NFS
- Attempting to send inventory…Done
- Checking In…Done
- Mounting File System…Done
- Checking Mounted File System…Done
- Starting Image Push
- Using Image: delme
- Looking for Hard Disks…Done
- Checking write caching status on HDD…Enabled
- Erasing current MBR/GPT Tables…Done
- Restoring Partition Tables (MBR)…Done
- Extended partitions…Done
- Expanding partition table to fill disk…Done
- Processing Partition: /dev/sda1 (1)
#0:28
<<PARTCLONE>>
#6:00 - Clearing ntfs flag…Done
- Stopping FOG Status Reporter…Done
- Resizing ntfs volume (/dev/sda1)…Done
- Clearing ntfs flag…Done
- Backing up and replacing BCD…Done
- Changing hostname…Done
- Updating Computer Database Status
- Database Updated!
- Task is completed, computer will now restart.
reboot: Restarting system
#6:03
6303-Deploy
#0:00- Verifying network interface configuration…Done
- Checking Operating System…Windows 10
- Checking CPU Cores…1
- Send method…NFS
- Attempting to check in…Done
- Mounting File System…Done
- Checking Mounted File System…Done
- Checking img variable is set…Done
- Attempting to send inventory…Done
- Using Image: delme
- Looking for Hard Disk…Done
- Using Disk: /dev/sda
- Write caching not supported
- Preparing Partition layout
- Wiping /dev/sda partition information
- Erasing current MBR/GPT Tables…Done
- Creating disk with new label…Done
- Initializing /dev/sda with NTFS partition…Done
#0:20 - Formatting initialized partition…Done
#3:53 - Erasing current MBR/GPT Tables…Done
- Restoring Partition Tables (MBR)…Done
- Inserting Extended partitions…Done
- Attempting to expand/fill partitions…Done
#3:57
<<PARTCLONE>>
#5:53 - Clearing ntfs flag…Done
#5:53 - Resizing ntfs volume (/dev/sda1)…Done
#8:51 - Clearing ntfs flag…Done
- Resetting UUIDs for /dev/sda
- Resetting swap systems
- Stopping FOG Status Reporter…Done
- Mounting directory…Done
- Changing hostname…Done
- Task Complete
- Updating Database…Done
- Rebooting system as task is complete
reboot: Restarting system
#8:54
-
-
I should also note that 5315 == kernel 4.3.0 and 6303 == kernel 4.4.1, as that’s probably relevant. I haven’t tried an older kernel on 6303 yet, but can if needed.
-
New tests indicate the slowdown exists in kernel 4.4.0 (x86_64) and 4.4.1 (x86_64), but 4.3.0 (x86_64) appears to be fine.
-
@jkozee Thanks a lot for the accurate timing! Good to know exactly where the time is going. I think @Tom-Elliott is the only one who can shed light on what changed in “Resizing filesystem”, “Resizing ntfs volume” and “Formatting initialized partition”. Between 5315 and 6303 there were heaps of changes throughout the whole process.
New tests indicate the slowdown exists in kernel 4.4.0 (x86_64) and 4.4.1 (x86_64), but 4.3.0 (x86_64) appears to be fine.
Do you mean 6303 with kernel 4.3.0 is as fast as 5315?? Can you please verify whether you see such drastic differences (where exactly? still resize ntfs…?) just by using the older/newer kernel!
-
@Sebastian-Roth said:
Do you mean 6303 with kernel 4.3.0 is as fast as 5315?? Can you please verify whether you see such drastic differences (where exactly? still resize ntfs…?) just by using the older/newer kernel!
Actually it may be faster. I ran a deploy test using the same VMs with 6307 using kernel 4.3.0, and it completed in 4:18. To be accurate, I would need to repeat all of the tests to compare 5315/4.3.0 and 6307/4.3.0 under the same server load. But it’s probably safe to say it’s as fast using the older kernel.
-
Here are the metrics comparing 6307 using kernel 4.3.0 and 4.4.1. Looking at the numbers, it’s safe to say that 6307/4.3.0 is faster than 5315/4.3.0 and far faster than 6307/4.4.1 when using a VM client on Hyper-V.
Let me know if there are any more measurements required. I’ll keep the VMs around for a day or two.
6307-4.3.0-Capture
#0:00
#0:18 - Resizing filesystem…Done
#0:18
#0:19
<<PARTCLONE>>
#14:19
#14:20 - Resizing ntfs volume (/dev/sda1)…Done
#14:20
#14:25
6307-4.4.1-Capture
#0:00
#0:17 - Resizing filesystem…Done
#4:53
#4:54
<<PARTCLONE>>
#22:42
#22:43 - Resizing ntfs volume (/dev/sda1)…Done
#25:49
#25:53
6307-4.3.0-Deploy
#0:00
#0:20 - Formatting initialized partition…Done
#0:20
#0:24
<<PARTCLONE>>
#4:11
#4:11 - Resizing ntfs volume (/dev/sda1)…Done
#4:12
#4:14
6307-4.4.1-Deploy
#0:00
#0:20 - Formatting initialized partition…Done
#3:53
#3:57
<<PARTCLONE>>
#5:42
#5:42 - Resizing ntfs volume (/dev/sda1)…Done
#8:49
#8:52
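The bare #m:ss marks in these runs appear to bracket each stage: the mark on a step's log line is when it started, the next mark is when it finished, and the pair around <<PARTCLONE>> is Partclone itself. Under that reading (an assumption, not stated in the post), a short bash sketch turns the marks into per-stage durations. The helper names are made up for illustration.

```bash
#!/usr/bin/env bash
# Per-stage durations for the 6307 runs, computed from the #m:ss marks.
# Assumed reading: each stage runs from its start mark to the next mark.

# m:ss (or h:mm:ss) -> seconds; 10# stops "08"/"09" being read as octal
to_seconds() {
  local IFS=: total=0 part
  for part in $1; do
    total=$(( total * 60 + 10#$part ))
  done
  echo "$total"
}

# stage_duration START END -> elapsed seconds between two marks
stage_duration() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Columns: task kernel stage start end (values copied from the runs above)
while read -r task kernel stage start end; do
  printf '%-8s %-6s %-12s %5d s\n' "$task" "$kernel" "$stage" \
    "$(stage_duration "$start" "$end")"
done <<'EOF'
capture 4.3.0 resize_fs   0:18  0:18
capture 4.3.0 partclone   0:19  14:19
capture 4.3.0 resize_ntfs 14:20 14:20
capture 4.4.1 resize_fs   0:17  4:53
capture 4.4.1 partclone   4:54  22:42
capture 4.4.1 resize_ntfs 22:43 25:49
deploy  4.3.0 format      0:20  0:20
deploy  4.3.0 partclone   0:24  4:11
deploy  4.3.0 resize_ntfs 4:11  4:12
deploy  4.4.1 format      0:20  3:53
deploy  4.4.1 partclone   3:57  5:42
deploy  4.4.1 resize_ntfs 5:42  8:49
EOF
```

Read this way, a 4.4.1 deploy spends about 213 s formatting and 187 s resizing NTFS, against roughly 1 s combined for those two steps on 4.3.0, so nearly all of the regression sits outside Partclone.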
-
@jkozee Well, I for one really appreciate your efforts with testing performances of various revisions and kernels! Perhaps you can just turn one of the VMs off and leave it alone, and wait until FOG Trunk enters into RC (release candidate) so that you can test speeds then and compare to your findings here? It’d be very appreciated.
-
@jkozee Do you use the web interface to upgrade/downgrade kernels? I looked through the official kernel changelogs but couldn’t find anything related to NTFS at all. I also checked the buildroot changelogs (buildroot is what the inits that do all the work you see when capturing/deploying a client are built with) and couldn’t find an obvious hint of issues with the ntfs-3g progs. Hmmmm, still wondering whether it is the kernel or the init??
Does anyone else see capture/deploy taking literally minutes to resize/format NTFS when image type is set to resizable? Or is this an issue only happening within Hyper-V?
-
@Sebastian-Roth I tested the different kernels by downloading them to separate files using the web interface. The only difference between the last two setups I compared is the kernel parameter in the host settings in the web interface. I only see this issue on my VMs; my physical units behave normally with either kernel.
I did some additional tests this morning and here’s what I found.
Both 4.4.0 and 4.4.1 take around 3.5 minutes to complete “Formatting initialized partition” during deploy, while 4.3.0, 4.3.2, and 4.3.0CDCETHER take less than 1 second.
The VM I’ve been using has a VHDX file that lives on an SSD, and is the only thing on it. I also tested with a VHDX on a spinning disk (5600 rpm desktop drive): 4.3.0 still takes <1 sec to complete the step, but 4.4.1 takes about 15.5 seconds.
Something has changed in the kernel build in regards to a VM running on Hyper-V with SSD backed storage.
-
I’ve noticed this problem with Hyper-V VMs on physical disks as well, though it is much, much worse: 45-60 minutes stuck on “Resizing Filesystem”. These are brand new VMs built from scratch. The last command I run before shutting down is:
defrag c: /x /h /u /v
I have the added problem that I can’t download other kernels (covered in another thread), so I have been unable to test any kernel other than 4.4.1.
-
I pulled version 4.1.4 of bzImage and bzImage32 from another server.
Without replacing the inits, “Resizing Filesystem” now completes in about a minute.
-
@jkozee said:
Something has changed in the kernel build in regards to a VM running on Hyper-V with SSD backed storage
From my point of view, the kernel in the VM has absolutely no knowledge of the underlying filesystem/disk outside the VM. I thought it could be a fragmentation problem on the backend storage device, but then you wouldn’t see a difference in speed just by booting different kernel versions. From what you said, I think your test setup is pretty good (just changing the kernel parameter in the host settings and leaving everything else untouched).
There is a great way to pin this kind of issue down to exactly one version/commit. It’s called git bisect. Please read through this article and see if you want to dive into this. I am more than happy to help you along the way! Have you ever compiled a (FOG) kernel? It’s actually not too complicated. Just give it a try following this article: https://wiki.fogproject.org/wiki/index.php/Build_TomElliott_Kernel (the second half covers the current FOG version). After downloading Tom’s kernel config, instead of running
make menuconfig
you can just run
make oldconfig
so you don’t need to bother with the menu stuff. You can build the kernels on your FOG server if you like; it just needs some disk space for the kernel git repo and some tools. As I don’t know which Linux distribution you are on, I will leave this open. Ask Google which packages you need to install on CentOS/Debian/… to compile the kernel. There are lots of tutorials out there.
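A bisect session along these lines could look roughly like the following bash sketch. It is not taken from the wiki article: the config path, the 60-second cutoff and the helper names are assumptions, and make olddefconfig is used here as the non-interactive variant of make oldconfig (new options silently take their defaults).

```bash
#!/usr/bin/env bash
# Sketch of a manual kernel bisect for the Hyper-V slowdown.
# Assumptions (not from the thread): the current directory is a git clone
# of the mainline kernel, FOG_CONFIG points at Tom's downloaded .config,
# and a "Formatting initialized partition" step of 60 s or more is "bad".

FOG_CONFIG=${FOG_CONFIG:-"$HOME/fog-kernel.config"}  # hypothetical path
SLOW_THRESHOLD=${SLOW_THRESHOLD:-60}                  # seconds, assumed cutoff

# Build bzImage from whatever commit git bisect has checked out.
build_fog_kernel() {
  cp "$FOG_CONFIG" .config || return 1
  # Like "make oldconfig", but answers new options with their defaults.
  make olddefconfig || return 1
  make -j"$(nproc)" bzImage || return 1
  echo "arch/x86/boot/bzImage"
}

# Map a measured step duration (whole seconds) to a bisect verdict.
classify_run() {
  if [ "$1" -lt "$SLOW_THRESHOLD" ]; then echo good; else echo bad; fi
}

# Record the verdict for the commit under test; git then checks out the next one.
mark_build() {
  git bisect "$(classify_run "$1")"
}
```

A session would start with git bisect start v4.4 v4.3 (bad revision first, then good). For each commit git checks out: run build_fog_kernel, deploy to the test VM with the new bzImage, time the formatting step and pass the seconds to mark_build. When git reports the first bad commit, git bisect reset returns the tree to where it started.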
-
@Sebastian-Roth I’m pretty short on time right now (I’m sure everyone here can say the same thing), but compiling and testing kernels shouldn’t be a problem. I’ll try to make time this weekend, but it may have to wait until next weekend. I’ll post here if/when I make any progress.
My FOG server is backed by slow storage, so I’ll need to build a new VM to make kernel compiling tolerable. My plan is to script builds of the incremental versions between 4.3.2 and 4.4.0 to narrow it down. Once we have that, we can bisect between them to find what changed.
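A scripted sweep like that might look as follows. This is a sketch under assumptions: it runs from a mainline kernel clone, FOG_CONFIG and OUT_DIR are hypothetical paths, and the helper names are made up. Stable point releases such as 4.3.2 are not tagged in the mainline tree (they live in the separate linux-stable repository), so the example starts from v4.3.

```bash
#!/usr/bin/env bash
# Build one kernel per tag so each can be tested before a full bisect.
FOG_CONFIG=${FOG_CONFIG:-"$HOME/fog-kernel.config"}  # hypothetical path
OUT_DIR=${OUT_DIR:-"$HOME/bzImages"}                  # hypothetical output dir

# rc_tags VERSION COUNT -> "vVERSION-rc1 ... vVERSION-rcCOUNT"
rc_tags() {
  local i tags=()
  for (( i = 1; i <= $2; i++ )); do
    tags+=("v$1-rc$i")
  done
  echo "${tags[*]}"
}

# Check out and build each tag, saving the result as bzImage-<tag>.
build_tags() {
  local tag
  mkdir -p "$OUT_DIR" || return 1
  for tag in "$@"; do
    git checkout --quiet "$tag" || return 1
    cp "$FOG_CONFIG" .config || return 1
    make olddefconfig >/dev/null || return 1   # non-interactive oldconfig
    make -j"$(nproc)" bzImage || return 1
    cp arch/x86/boot/bzImage "$OUT_DIR/bzImage-$tag" || return 1
  done
}

# Example, assuming 4.4 went through release candidates rc1 to rc8:
#   build_tags v4.3 $(rc_tags 4.4 8) v4.4
```

Each bzImage-<tag> can then be selected per host through the same kernel setting used for the 4.3.0/4.4.1 comparison; the last good and first bad tag become the endpoints for git bisect.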
@Tom-Elliott Are the .configs available for download for the 4.3.2 and 4.4.0 builds that you released? Do you build with the defaults, or do you tweak the .config for FOG?
@sudburr Ouch, 45-60 minutes is way more painful. Looks like the FTP issue is now resolved. How does the performance compare with 4.1.4, 4.3.2, and 4.4.1 ?
-
@jkozee The configs can be downloaded: as I improve/edit the kernel configs, I update them on SVN/Git, though I can’t possibly tell you in which specific revisions the 4.3.0 to 4.4.x changes were made.
I’m about to try building the 4.4.2 kernel (didn’t know it had been released), and I will pull in a 4.3.0 kernel and rip the config out of it.
-
@Tom-Elliott Thanks Tom. I’ll look through the repo after I get a VM up to compile on. If you get 4.4.2 built and available, I’ll test it first, as there will be no real point in testing the other builds if it’s already fixed…
-
@jkozee I am building 4.4.2 using the 4.3.0 config (adding the NICs that were not part of it, plus patches to other areas as needed for other support). I’ll let you know when they’re complete. Rather than publish them immediately, I’d like you to run a few tests first to see if they are working better.
-
@Tom-Elliott NP, just let me know.