Very slow cloning speed on specific model
-
@Quazz Do you see any issue with just disabling it for all nvme drives? I don’t know the impact if we did. FOS Linux is not a general purpose OS so we don’t really want or need any sleep functions at all. We really want the OS and the hardware to run as fast as possible and not be concerned about any power savings.
You are right about the postinit scripts. If we had the raw data, I’m sure we could come up with a script to disable this function on certain detected drives or just turn it off altogether. Comments??
-
On a working laptop APST was enabled also.
So I guess it is a firmware or slight hardware difference.
With APST disabled on this one, again I’m seeing speeds of 2.8 - 3.0GB/min
-
@george1421 As far as I’m aware, all disabling APST does is lock the drive to its “highest power state”. Which for the purposes of FOS isn’t a bad choice if it would otherwise malfunction.
I don’t foresee a problem doing this for all NVMe devices, but of course there might be instances we are currently unaware of where it does matter for something.
That said, FOS only runs for a little while, so odds of it being bad are very low.
-
@Duncan said in Very slow cloning speed on specific model:
Kernel 4.9.51 … Deployed the image and away it went. Full speed. building about 8gb/min
Is this all the way through or just top speed? Maybe it’s better you note down the full deploy time to compare the different situations more appropriately?!
latest kernel with APST disabled… Its now building at 2.7gb/min.
Does this really mean it’s that much slower than using the 4.9.51 kernel or is it more just a top speed thing? As I said, better we compare the time it takes to deploy the full drive.
@george1421 @Quazz I’d vote for disabling APST in FOS as we don’t need to save energy. The drive should go at full speed.
-
Definitely a difference in speeds.
Using bzImage 4.19 and init_partclone.xz I got an average of 3GB/min.
Using bzImage-4.9.51 and init.xz it started at 7GB/min and dropped, hanging around 6.6(ish)GB/min.
Both tests were on the same laptop.
-
@Sebastian-Roth So I’m wondering 2 things.
- Before 1.5.8 comes out, could/should we create a post init script with the logic that might go into FOS Linux for 1.5.8, so we can test the impact of this proposed change? That way, if the change caused problems, deleting the script would fix it. (I know I worded that a bit funny. But the idea is to test it with an approved post init script before it’s coded into 1.5.8. So if people have this issue, we can say “place this script here and test”. This would be for 1.5.7 and lower versions. A rough sketch is below this list.)
- Does the kernel parameter
nvme_core.default_ps_max_latency_us=0
have any impact on shutting off this feature right at the disk level? Better/worse/no change? If it had a positive impact then that could be integrated into the post init script and then into FOS Linux 1.5.8.
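Something like this is what I have in mind. Not official 1.5.8 code, just a minimal sketch, assuming nvme-cli is included in the init and the controllers show up with the usual /dev/nvme0, /dev/nvme1, ... names:

#!/bin/bash
# Sketch of a FOG post init script: disable APST on every NVMe controller found.
# Feature ID 0x0c is Autonomous Power State Transition; value 0 turns it off.
for ctrl in /dev/nvme[0-9]*; do
    # Skip namespace/partition nodes like /dev/nvme0n1, only talk to controllers.
    case "$ctrl" in
        *n[0-9]*) continue ;;
    esac
    [ -e "$ctrl" ] || continue
    nvme set-feature "$ctrl" -f 0x0c -v 0 || echo "Could not disable APST on $ctrl"
done

The nice part about testing it this way is that deleting the script from the postinitscripts directory puts everything back to normal.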
-
@george1421 Yes, good points:
- It’s a good idea to provide a post init script right now for people to test. I am not exactly sure which part is doing it. I think it’s
nvme set-feature -f 0x0c -v=0 /dev/nvme0
right? @Duncan @Quazz Would you like to help testing as well, @oleg-knysh?
- I have thought about the
nvme_core.default_ps_max_latency_us
parameter as well. Not sure if that is sort of doing the same thing?! Probably a bit different but might have the same outcome?! The parameter is mentioned in that Arch Linux wiki I posted below already. @Duncan Would you please test this kernel parameter for us on that problematic laptop? Go to the host’s settings in the web UI and set
nvme_core.default_ps_max_latency_us=0
as Kernel Parameter but using the default kernel (4.15.x). See what speed you get. As well, try
nvme_core.default_ps_max_latency_us=5500
(as described in the wiki), also using the default kernel. Thanks!
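To see what the drive currently reports, nvme-cli can read that feature back from a debug session. Two read-only calls, assuming the controller sits at /dev/nvme0:

nvme id-ctrl /dev/nvme0 | grep -i apsta
nvme get-feature /dev/nvme0 -f 0x0c -H

The first shows whether the controller advertises APST at all, the second reads back the current value of the APST feature.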
-
OK so I ran some tests, I hope they make sense to you all.
These were all run on the same original slow laptop I have been using since the start.
Build 1:
Host Kernel: Blank
Host Kernel Arguments: Blank
Host Init: Blank
Build speed: slow
Build 2:
Host Kernel: bzImage-4.9.51
Host Kernel Arguments: Blank
Host Init: Blank
Build speed: 6.5 - 7GB/min (ish)
Build 3:
Host Kernel: bzImage
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=0
Host Init: Blank
On screen: nvme_core.default_ps_max_latency_us=0 - not a valid identifier
Build speed: fast - 6.5 - 7GB/min (ish)
Build 4:
Host Kernel: bzImage-4.15.2
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=0
Host Init: Blank
On screen: nvme_core.default_ps_max_latency_us=0 - not a valid identifier
Build speed: fast - 6.5 - 7GB/min (ish)
Build 5:
Host Kernel: bzImage-4.9.51
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=0
Host Init: Blank
On screen: nvme_core.default_ps_max_latency_us=0 - not a valid identifier
Build speed: fast - 6.5 - 7GB/min (ish)
Build 6:
Host Kernel: bzImage
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=5500
Host Init: Blank
On screen: nvme_core.default_ps_max_latency_us=5500 - not a valid identifier
Build speed: slow
Build 7:
Host Kernel: bzImage-4.15.2
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=5500
Host Init: Blank
On screen: nvme_core.default_ps_max_latency_us=5500 - not a valid identifier
Build speed: slow
Build 8:
Host Kernel: bzImage-4.9.51
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=5500
Host Init: Blank
On screen: nvme_core.default_ps_max_latency_us=5500 - not a valid identifier
Build speed: fast - 6.5 - 7GB/min (ish)
-
@Duncan said in Very slow cloning speed on specific model:
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
First of all, let me say excellent matrix. It looks like the latency of 0 does the trick without having to use the nvme-cli command.
Second, the above error message is not really an error; it’s a spurious message caused by the way FOG converts kernel parameters into variables. The kernel parameter apparently does its job, but throws that warning, which can be ignored.
Again, well done with the truth table matrix. So it looks like you can go back to using the standard FOG kernel but just place the kernel argument
nvme_core.default_ps_max_latency_us=0
in the global kernel parameters in the FOG Configuration -> FOG Settings menu.
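For the curious, the “not a valid identifier” text comes straight from the shell: FOS turns each kernel argument into a variable, and a name containing dots gets rejected. You can reproduce it in any bash session, roughly like this:

export nvme_core.default_ps_max_latency_us=0
# bash: export: `nvme_core.default_ps_max_latency_us=0': not a valid identifier

The kernel itself has already consumed the parameter by that point, so the warning is harmless.
-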
With the setting now in place, my original laptop is now building at the 6.5GB/min I expected.
Will set a load more off soon and report back.
Again, many thanks to everyone that has helped me out over the last few weeks.
-
@Duncan Many thanks to you too!! Great work on the testing you’ve done here, awesome. I think this has given us a great set of recipes we can give people in case they run into that issue. We might even think about sending the kernel parameter
nvme_core.default_ps_max_latency_us=0
as default. @Tom-Elliott @Quazz @george1421 Do you see any issue with that?
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
As George already said, this is not an issue but more a warning. I was hoping to find some time and fix that at some point. Will do so now.
-
@Sebastian-Roth said in Very slow cloning speed on specific model:
nvme_core.default_ps_max_latency_us=0
I don’t see an issue with just adding it into sysctl inside FOS and not worrying about passing it. That way the variable conversion won’t have an issue. Also, since it’s an NVMe-specific kernel tweak, if NVMe isn’t used (i.e. a SATA disk) then the kernel “should” ignore it. I only say “should” because we don’t have a large enough sample population to say yes or no yet. But that is just my opinion.
As I said before the OP did a great job helping us come up with a sound solution. Without having the troubled hardware in front of us it would have been impossible to find a solution.
I still think adding the nvme-cli tool to FOS will add value in trying to debug issues later on too.
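One quick way to confirm the setting actually made it through on a booted FOS client (from a debug session) is to read the module parameter back from sysfs; it should print 0 when the parameter was accepted:

cat /sys/module/nvme_core/parameters/default_ps_max_latency_us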
-
@george1421 I agree with it all.
I can’t imagine a need for latency being enabled by default. I added it to 1.6 for safety. Shouldn’t be hard to port to 1.5.x
-
@Tom-Elliott I haven’t looked just yet, but there should be a sysctl.conf file in FOS Linux for 1.5.x too.
As I said before, FOS Linux isn’t a general purpose OS. We need it to image as fast as possible; power saving states are not wanted or needed. So turning off sleep states for any device should be preferred. I just noticed, as I worked on a Dell 9020, that there was a specific firmware parameter to disable APST sleep/power management states for PCIe devices. When I saw that I went, “Hey, I know what that does…”
-
Just added the parameter for 1.5.x too. Way easier to do it via the boot menu code than adding it within FOS using sysctl.
Will finally mark this solved! Thanks to everyone.
-
@Sebastian-Roth I see no harm in it, though I did run into cases where APST had to be explicitly disabled because the latency parameter wasn’t sufficient. But we can cross that bridge if it pops up.
-
@Quazz Would there be any advantage to building that command into the FOS scripts where / if a specific kernel parameter (i.e. a flag parameter) existed it would then signal the nvme-cli command to run during system startup? That way the kernel parameter could be set per machine or globally. That decision could then be up to the FOG Admin to use it or not. By default it would not run.
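A minimal sketch of what I mean, with a made-up flag name (say fogapstoff=1, which is not something FOS currently recognizes):

# Run only when the admin set the (hypothetical) fogapstoff=1 kernel parameter.
if grep -q 'fogapstoff=1' /proc/cmdline; then
    for ctrl in /dev/nvme[0-9]*; do
        case "$ctrl" in *n[0-9]*) continue ;; esac
        [ -e "$ctrl" ] && nvme set-feature "$ctrl" -f 0x0c -v 0
    done
fi

That keeps the behavior opt-in, either per host or via the global kernel arguments.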
-
@george1421 As we don’t fancy the NVMe energy saving modes I don’t see why we shouldn’t set this for everyone. Sure we don’t know the consequences yet but most of us seem pretty assured this is not causing harm but only helps. Keeping my fingers crossed. The more people test
dev-branch
the sooner we’ll know.
-
Unfortunately, nvme_core.default_ps_max_latency_us=0 hasn’t worked for me. I’ve tried both setting this manually via Host Kernel Argument using the default 1.5.7 kernel and updating to the latest dev-branch which seems to include it by default. Both result in slow transfer on an HP Elitebook 840 G6 (latest December BIOS).
Disabling APST using the init_partclone.xz and debug command that @Quazz posted gets an average transfer speed around 2.5GB/min which is a usable improvement. Is there a way to automate/improve this rather than entering debug mode each time to disable APST?
I haven’t been able to get the bzImage-4.9.51 and init-4.9.x.xz combo to work either (kernel panic with just bzImage-4.9.51 and partclone errors with both of them).
-
@Middle said in Very slow cloning speed on specific model:
Is there a way to automate/improve this rather than entering debug mode each time to disable APST?
Sure, you can use post init scripts!
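In case you haven’t used them before: on a default install they live on the FOG server in /images/dev/postinitscripts/, and fog.postinit is the hook that calls them. Something along these lines, where disable_apst.sh is just whatever you name your own script (for example the APST loop posted earlier in this topic):

# /images/dev/postinitscripts/fog.postinit
. ${postinitpath}disable_apst.sh

FOS pulls that in at the start of every imaging task, so there is no need to enter debug mode each time.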
I am really confused about why you’d get the “kernel too old” error (even when it had already started to image - very strange)?! Unfortunately our build server is offline again and I can’t build another init-4.9.x.xz right now.