Performance decrease using Hyper-V Win10 clients
-
@Tom-Elliott NP, just let me know.
-
@Tom-Elliott Got them. Give me a few minutes to fight another fire, then I’ll test and report.
-
@Tom-Elliott
I didn’t run the full deploy (probably not needed at this point), but the “Formatting initialized partition” step during deploy takes 3:34, which is on par with 4.4.1. So, the new build doesn’t appear to solve this issue. Wonder if something changed in the device block size or cache? Or maybe it now zeroes out the whole partition, when it used to do a quick format?
-
@jkozee that would mean the problem is in the ntfs-progs binary, in which case swapping out the kernel would not have any impact.
-
@Tom-Elliott Ah, yes for ntfs. So, perhaps the block driver.
-
@Tom-Elliott I kicked off a script to build the kernels. Assuming they build and boot, I’ll report my findings.
-
@Tom-Elliott Hmm. 3.3.2 built but wouldn’t boot. I got a kernel panic, not syncing VFS. I used the config from https://svn.code.sf.net/p/freeghost/code/trunk/kernel/TomElliott.config.64. Is there another one I should use for 3.3.2?
-
@jkozee 3.3 is very old. I thought 4.3 worked and 4.4 doesn’t, so testing the versions between those two should be enough to start narrowing it down.
-
@Tom-Elliott Um, yeah. That’s what I get for trying to multitask and trying to script the builds. Let me see what I’m actually doing. Sorry for wasting space here…
-
@Tom-Elliott Ok, I think I scripted builds 4.3.2 to 4.3.5 and 4.4.1, but I’ll start over just to be sure. I see the config for 4.3 on the repo at r4316. Let me start there and see what I get…
-
@jkozee look on wiki for build tomelliott kernel
Follow the instructions and please test with the additional patches. Speed up build time by adding -j $(nproc) to the make commands.
-
@jkozee said:
I see the config for 4.3 on the repo at r4316. Let me start there and see what I get…
Don’t bother too much about getting the exact config Tom used for a particular version. I’d suggest using the newest config for all the builds. As far as I know - hope this is correct -
make oldconfig
will ask you on the console if there are settings missing. Older ones will just be tossed. Also, using the same config as a base is wise so you can properly compare the different kernel versions. Otherwise you end up wondering if a change in config made the difference!
-
@Sebastian-Roth
Looks like my script wasn’t copying the .config file, so I was building with the defaults. I updated it, and it built 4.3.2, which boots fine now. I used the latest config and my script does “yes ‘’ | make oldconfig”. I’ll let it build the ones I mentioned earlier and test them. I’m about out of time for now, so I’ll post the results later.
@Tom-Elliott
I did not have time to write a sed script to include the additional patches from the wiki, but I can do that later or apply them by hand, once I have a chance to test the scripted builds. Sound reasonable?
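For anyone wanting to reproduce the scripted builds, here is a minimal sketch of the approach described above: copy one shared base config into each source tree, answer every new oldconfig prompt with its default, then build in parallel. The helper names (run, build_kernel), the DRY_RUN switch, the linux-<version> directory layout and the bzImage target are assumptions for illustration, not the actual script used in this thread.

```shell
# Hypothetical helpers, not the script from this thread.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_kernel() {
    src_dir=$1      # unpacked kernel source tree, e.g. linux-4.3.5 (assumed layout)
    base_config=$2  # the same base config reused for every version
    # Without this copy the build silently uses the default config.
    run cp "$base_config" "$src_dir/.config" || return 1
    # Accept the default for every new option; obsolete options are dropped.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "yes '' | make -C $src_dir oldconfig"
    else
        yes '' | make -C "$src_dir" oldconfig || return 1
    fi
    # One job per CPU; bzImage is assumed to be the wanted x86 target.
    run make -C "$src_dir" -j "$(nproc)" bzImage
}

# Example loop (commented out so nothing builds by accident):
#   for v in 4.3.2 4.3.5 4.4.1; do
#       build_kernel "linux-$v" TomElliott.config.64 || break
#   done
```

Running it once with DRY_RUN=1 is a cheap way to confirm the .config copy really happens before committing to a long build.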
-
Build script finished quicker than I expected. Looks like it was introduced between 4.3.5 and 4.4.1. I’ll look at git bisect when I can make the time.
-
@jkozee Great work! I am sure you will see what’s exactly causing it and when it was introduced! bisect is your friend.
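A rough sketch of how a bisect between the two versions could be driven by the format timing. The verdict helper and the 60 second threshold are made up for illustration; only the 3:34 (214 s) slow timing comes from this thread. It assumes a kernel git tree that contains both the v4.3.5 and v4.4.1 tags.

```shell
# Hypothetical helper: classify one test boot by how long the format step took.
# Prints "bad" if it was slower than the threshold, otherwise "good".
verdict() {
    elapsed_s=$1     # measured duration of the format step, in seconds
    threshold_s=$2   # cut-off chosen between your fast and slow timings
    if [ "$elapsed_s" -gt "$threshold_s" ]; then
        echo bad
    else
        echo good
    fi
}

# Typical session inside the kernel git tree (comments only, nothing runs here):
#   git bisect start v4.4.1 v4.3.5     # bad revision first, then good
#   ... build, boot the Hyper-V client, time the format step ...
#   git bisect "$(verdict 214 60)"     # 3:34 = 214 s; 60 s is an assumed threshold
#   ... repeat until git reports the first bad commit ...
#   git bisect reset
```

Each round roughly halves the remaining range of commits, so even a few thousand commits between two releases usually need only a dozen or so test boots.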
-
@Sebastian-Roth and @Tom-Elliott
The change to the kernel is actually in the scsi driver.
The commit that introduced the delay is 81988a0e6b031bc80da15257201810ddcf989e64, which applies changes to drivers/scsi/storvsc_drv.c.
I can confirm that reverting the diff on 4.4.2 brings the performance on the hyper-v client on par with 4.3.2. I can’t speak to the commit itself, as I just blindly reverted it and didn’t spend any time on digesting the patch itself.
My timings on the patched 4.4.2 were 2:14 for the deploy and 18:20 for the capture. That means the deploy is 50% faster and the capture is 27% slower than in my tests for 4.3.2. @Tom-Elliott I did not include the additional patches you mentioned either, so I would need to retest both kernels under the same server conditions (and with the additional patches applied to 4.4.2) for more accurate results.
Hope this proves useful.
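For reference, one possible way to back that commit out of 4.4.2 with git. This is a sketch, not necessarily how the revert above was done: the revert_plan helper and the revert-test branch name are hypothetical, and the function only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: print the git commands for backing one commit out of a tag.
# Nothing is executed here; run the printed lines inside a kernel git tree.
revert_plan() {
    tag=$1      # release to start from, e.g. v4.4.2
    commit=$2   # commit to back out
    echo "git checkout -b revert-test $tag"
    echo "git revert --no-edit $commit"
}

revert_plan v4.4.2 81988a0e6b031bc80da15257201810ddcf989e64
```

If later changes to drivers/scsi/storvsc_drv.c conflict with the revert, git will stop and ask for the conflict to be resolved by hand.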
-
Still seems more like the issue should be with the block device, rather than the scsi driver. Seems like it would be related to caching or block size/block alignment of the ssd.
-
To me, these lines from the commit look most interesting:
/* Ensure there are no gaps in presented sgls */
blk_queue_virt_boundary(sdevice->request_queue, PAGE_SIZE - 1);
-
So, adding that line to 4.3.2 results in performance degradation and removing it from 4.4.2 results in performance increase. Guess all that’s left is to figure out (understand) what it actually does…
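As a rough illustration of what that mask appears to mean, going by my reading of the block layer (which may be incomplete): with a virt boundary of PAGE_SIZE - 1, a segment can only be appended to a request if the previous segment ends on a page boundary and the new one starts on one. Segments that fail the check force a new request, which could plausibly mean more, smaller I/Os. The has_gap helper and the 4096-byte page size are assumptions for the sketch; this is not kernel code.

```shell
PAGE_SIZE=4096                 # assumed page size
MASK=$((PAGE_SIZE - 1))        # the value passed to blk_queue_virt_boundary

# has_gap PREV_OFFSET PREV_LEN NEXT_OFFSET
# Prints 1 if the next segment would leave a gap after the previous one
# under the boundary mask, 0 if the two can share a request.
has_gap() {
    prev_end=$(( $1 + $2 ))
    if [ $(( $3 & MASK )) -ne 0 ] || [ $(( prev_end & MASK )) -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

has_gap 0 4096 0      # full page followed by a page-aligned segment
has_gap 0 512 0       # previous segment stops mid-page
has_gap 0 4096 512    # next segment starts mid-page
```

Only the first case passes the check, so under this reading any buffer that is not made of whole, page-aligned pieces gets split up.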