Performance decrease using Hyper-V Win10 clients

jkozee

Here are the metrics comparing 6307 using kernel 4.3.0 and 4.4.1. Looking at the numbers, it’s safe to say that 6307/4.3.0 is faster than 5315/4.3.0 and far faster than 6307/4.4.1 when using a VM client on Hyper-V.

Let me know if there are any more measurements required. I’ll keep the VMs around for a day or two.

6307-4.3.0-Capture
#0:00
#0:18

Resizing filesystem…Done
#0:18
#0:19
<<PARTCLONE>>
#14:19
#14:20
Resizing ntfs volume (/dev/sda1)…Done
#14:20
#14:25

6307-4.4.1-Capture
#0:00
#0:17

Resizing filesystem…Done
#4:53
#4:54
<<PARTCLONE>>
#22:42
#22:43
Resizing ntfs volume (/dev/sda1)…Done
#25:49
#25:53

6307-4.3.0-Deploy
#0:00
#0:20

Formatting initialized partition…Done
#0:20
#0:24
<<PARTCLONE>>
#4:11
#4:11
Resizing ntfs volume (/dev/sda1)…Done
#4:12
#4:14

6307-4.4.1-Deploy
#0:00
#0:20

Formatting initialized partition…Done
#3:53
#3:57
<<PARTCLONE>>
#5:42
#5:42
Resizing ntfs volume (/dev/sda1)…Done
#8:49
#8:52
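
To make the per-step cost easier to compare, the timestamps above can be turned into durations. The sketch below assumes that the timestamp just before a step and the one just after it bracket that step; that is a reading of the logs, not something they state. The helper names `to_seconds` and `step_duration` are made up for illustration.

```python
def to_seconds(stamp):
    """Convert an 'M:SS' timestamp such as '#14:19' or '14:19' to seconds."""
    minutes, seconds = stamp.lstrip("#").split(":")
    return int(minutes) * 60 + int(seconds)


def step_duration(start, end):
    """Seconds elapsed between two 'M:SS' timestamps."""
    return to_seconds(end) - to_seconds(start)


# (start, end) pairs copied from the logs above.
# Assumption: each pair brackets the named step.
steps = {
    "4.3.0 capture, resize filesystem": ("0:18", "0:18"),
    "4.4.1 capture, resize filesystem": ("0:17", "4:53"),
    "4.3.0 deploy, format partition": ("0:20", "0:20"),
    "4.4.1 deploy, format partition": ("0:20", "3:53"),
    "4.3.0 deploy, resize ntfs volume": ("4:11", "4:12"),
    "4.4.1 deploy, resize ntfs volume": ("5:42", "8:49"),
}

for name, (start, end) in steps.items():
    elapsed = step_duration(start, end)
    print(f"{name}: {elapsed // 60}:{elapsed % 60:02d}")
```

Read this way, the extra time under 4.4.1 sits in the format and resize steps rather than in partclone itself.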

Wayne Workman

@jkozee Well, I for one really appreciate your efforts in testing the performance of various revisions and kernels! Perhaps you can just turn one of the VMs off and leave it alone until FOG Trunk enters RC (release candidate), so that you can test speeds then and compare them to your findings here? It’d be very much appreciated.

Sebastian Roth

@jkozee Do you use the web interface to up-/downgrade kernels? I looked through the official kernel change logs but couldn’t find anything related to NTFS at all. I also checked the buildroot change logs (buildroot is what is used in the inits, which do all the work you see when capturing/deploying a client) and couldn’t find an obvious hint of issues with the ntfs-3g progs. Hmmmm, still wondering if it is the kernel or the init??

Does anyone else see capture/deploy taking literally minutes to resize/format NTFS when image type is set to resizable? Or is this an issue only happening within Hyper-V?

jkozee

@Sebastian-Roth I tested the different kernels by downloading them to separate files using the web interface. The only difference between the last two setups I compared is the kernel parameter in the host setting in the web interface. I only see this issue on my VMs; my physical units behave normally with either kernel.

I did some additional tests this morning and here’s what I found.

Both 4.4.0 and 4.4.1 take around 3.5 minutes to complete “Formatting initialized partition” during deploy, while 4.3.0, 4.3.2, and 4.3.0CDCETHER take less than 1 second.

The VM I’ve been using has a VHDX file that lives on an SSD, and is the only thing on it. I also tested using a VHDX that is on a spinning disk (desktop drive, 5600 rpm), and 4.3.0 still takes <1 sec to complete, but 4.4.1 can complete the step in about 15.5 seconds.

Something has changed in the kernel build in regards to a VM running on Hyper-V with SSD backed storage.

sudburr

I’ve noticed this problem with Hyper-V VMs on physical discs as well, though much, much worse, at 45-60 minutes stuck on Resizing Filesystem. These are brand new VMs built from scratch. The last command I run before shutdown is:

defrag c: /x /h /u /v

I have an added problem in that I can’t download other kernels (see another thread), so I have been unable to test with any kernel other than 4.4.1.

sudburr

I pulled version 4.1.4 of bzImage and bzImage32 from another server.

Without replacing the inits, “Resizing Filesystem” now completes in about a minute.

Sebastian Roth

@jkozee said:

Something has changed in the kernel build in regards to a VM running on Hyper-V with SSD backed storage

From my point of view the kernel in the VM has absolutely no knowledge of the underlying filesystem/disk outside the VM. I thought it could be a fragmentation problem on the backend storage device, but then you wouldn’t see a difference in speed just by booting different kernel versions. From what you said I think your test setup is pretty good (just changing the kernel parameter in the host setting and leaving everything else untouched).

There is a great way to pin this kind of issue down to exactly one version/commit. It’s called git bisect. Please read through this article and see if you want to dive into this. I am more than happy to help you along the way! Have you ever compiled a (FOG) kernel? It’s actually not too complicated. Just give it a try following this article: https://wiki.fogproject.org/wiki/index.php/Build_TomElliott_Kernel (the second half covers the current FOG version). Instead of make menuconfig (after downloading Tom’s kernel config) you can just run make oldconfig, so you don’t need to bother with the menu stuff.

You can build the kernels on your FOG server if you like. It just needs some disk space for the kernel git repo and some tools. As I don’t know which Linux distribution you are on, I will leave this open. Ask Google which packages you need to install on CentOS/Debian/… to compile the kernel. There are lots of tutorials out there.
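
For anyone unfamiliar with the idea behind bisecting, here is a minimal sketch of the search itself in Python. It is not git and does not call git; it only shows why the number of kernels to build and test grows with the logarithm of the number of candidates. The function name `first_bad`, the `is_bad` callback and the commit labels `c0`…`c9` are all hypothetical. In practice `is_bad` would stand for "build this kernel, boot the VM, and check whether the format step is slow".

```python
def first_bad(candidates, is_bad):
    """Return the first candidate for which is_bad(candidate) is True.

    Assumptions: candidates are ordered oldest to newest, the first one is
    known good, the last one is known bad, and every candidate after the
    first bad one is bad as well.
    """
    good, bad = 0, len(candidates) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(candidates[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return candidates[bad]


# Hypothetical example: ten commits, the regression was introduced in c6.
commits = [f"c{i}" for i in range(10)]
tested = []


def slow_format(commit):
    tested.append(commit)            # record which commits needed a test run
    return int(commit[1:]) >= 6


print(first_bad(commits, slow_format), "found after testing", tested)
```

Each test halves the remaining range, which is why a few dozen builds are enough even when thousands of commits lie between the good and the bad kernel.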

jkozee

@Sebastian-Roth I’m pretty short on time right now (I’m sure everyone here can say the same thing), but compiling and testing kernels shouldn’t be a problem. I’ll try to make time this weekend, but it may have to wait until next weekend. I’ll post here if/when I make any progress.

My FOG server is slow storage backed, so I’ll need to build a new VM to make kernel compiling tolerable. My plan would be to script building the incremented versions between 4.3.2 and 4.4.0, to narrow it down. Once we have that, we can bisect between them to find what changed.

@Tom-Elliott Are the .config files available for download for the 4.3.2 and 4.4.0 builds that you released? Do you build with the defaults, or do you tweak the .config for FOG?

@sudburr Ouch, 45-60 minutes is way more painful. Looks like the FTP issue is now resolved. How does the performance compare between 4.1.4, 4.3.2, and 4.4.1?

Tom Elliott

@jkozee Configs can be downloaded. As I improve/edit the kernel configs I update them on SVN/GIT, though I can’t possibly tell you in which specific revisions the 4.3.0 to 4.4.x changes were made.

I’m about to try building the 4.4.2 kernel (didn’t know it was released) and I will pull in a 4.3.0 kernel and rip the config out of it.

jkozee

@Tom-Elliott Thanks Tom. I’ll look through the repo after I get a VM up to compile on. If you get 4.4.2 built and available, I’ll test it first, as there will be really no point in testing the other builds if it is fixed now…

Tom Elliott

@jkozee I am building the 4.4.2 using the 4.3.0 config (adding the NICs that were not part of it, and patches to other areas as needed for other support things). I’ll let you know when they’re complete. Rather than immediately publish them, I’d just like you to run a few tests to see if they are working better.

jkozee

@Tom-Elliott NP, just let me know.

Tom Elliott

@jkozee http://mastacontrola.com/bzImage (64bit)
http://mastacontrola.com/bzImage32 (32bit)

jkozee

@Tom-Elliott Got them. Give me a few minutes to fight another fire, then I’ll test and report.

jkozee

@Tom-Elliott
I didn’t run the full deploy (probably not needed at this point), but the “Formatting initialized partition” step during deploy takes 3:34, which is on par with 4.4.1. So, the new build doesn’t appear to solve this issue.

Wonder if something changed in the device block size or cache? Or maybe it does full zero out of the partition now, when it used to do a quick format?
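
One way to check the block size / cache idea would be to dump the block-queue settings the kernel exposes under /sys/block/<device>/queue while booted into each kernel, then compare the two outputs. Below is a rough Python sketch of that. Assumptions: the device is sda (as in the logs above), and Python is available in the environment where this runs, which may well not be true inside the FOG init; the same files can simply be read with cat there. The function name `read_queue_settings` is made up for illustration.

```python
from pathlib import Path

# Standard Linux block-queue attributes under /sys/block/<dev>/queue.
QUEUE_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "rotational",
    "discard_granularity",
    "discard_max_bytes",
    "scheduler",
)


def read_queue_settings(device="sda", sys_block="/sys/block"):
    """Return {attribute: value} for one block device.

    Attributes that are missing or unreadable are reported as None.
    """
    queue = Path(sys_block) / device / "queue"
    settings = {}
    for attr in QUEUE_ATTRS:
        try:
            settings[attr] = (queue / attr).read_text().strip()
        except OSError:
            settings[attr] = None
    return settings


if __name__ == "__main__":
    for name, value in read_queue_settings().items():
        print(f"{name}: {value}")
```

If the values differ between 4.3.0 and 4.4.1 on the same VM, that would point toward the block layer or the Hyper-V storage driver rather than the NTFS tools.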

Tom Elliott

@jkozee That would mean it was a binary from the ntfs-progs, in which case swapping out the kernel would not have any impact.

jkozee

@Tom-Elliott Ah, yes for ntfs. So, perhaps the block driver.

jkozee

@Tom-Elliott I kicked off a script to build the kernels. Assuming they build and boot, I’ll report my findings.

jkozee

@Tom-Elliott Hmm. 3.3.2 built but wouldn’t boot. I got a kernel panic, not syncing VFS. I used the config from https://svn.code.sf.net/p/freeghost/code/trunk/kernel/TomElliott.config.64. Is there another one I should use for 3.3.2?

Tom Elliott

@jkozee 3.3 is very old. I thought 4.3 worked and 4.4 doesn’t so I would suspect somewhere between those would be enough to start to figure out.

