Very slow cloning speed on specific model
-
OK, so I ran some tests; I hope they make sense to you all.
These were all run on the same original slow laptop I have been using since the start.
Build 1:
Host Kernel: Blank
Host Kernel Arguments: Blank
Host Init: Blank
Build speed: slow
Build 2:
Host Kernel: bzImage-4.9.51
Host Kernel Arguments: Blank
Host Init: Blank
Build speed: 6.5GB - 7GB/min (ish)
Build 3:
Host Kernel: bzImage
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=0
Host Init: Blank
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
Build speed: fast - 6.5GB - 7GB/min (ish)
Build 4:
Host Kernel: bzImage-4.15.2
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=0
Host Init: Blank
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
Build speed: fast - 6.5GB - 7GB/min (ish)
Build 5:
Host Kernel: bzImage-4.9.51
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=0
Host Init: Blank
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
Build speed: fast - 6.5GB - 7GB/min (ish)
Build 6:
Host Kernel: bzImage
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=5500
Host Init: Blank
nvme_core.default_ps_max_latency_us=5500 - not a valid identifier
Build speed: slow
Build 7:
Host Kernel: bzImage-4.15.2
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=5500
Host Init: Blank
nvme_core.default_ps_max_latency_us=5500 - not a valid identifier
Build speed: slow
Build 8:
Host Kernel: bzImage-4.9.51
Host Kernel Arguments: nvme_core.default_ps_max_latency_us=5500
Host Init: Blank
nvme_core.default_ps_max_latency_us=5500 - not a valid identifier
Build speed: fast - 6.5GB - 7GB/min (ish)
-
@Duncan said in Very slow cloning speed on specific model:
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
First of all let me say excellent matrix. It looks like the latency of 0 does the trick without having to use the nvme-cli command.
Second thing: the above error message is not really an error. It's a spurious message caused by the way FOG converts kernel parameters into variables. The kernel parameter apparently does its job but throws that warning, which can be ignored.
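To illustrate why that message shows up: a shell variable name may only contain letters, digits and underscores, so a key with a dot in it cannot become a variable. Below is a minimal Python sketch of that check. It assumes the init script simply turns every key=value token on the kernel command line into a variable; the helper name and the sample command line are made up for illustration, and this is not the actual FOS code.

```python
import re

# POSIX shell variable names: a letter or underscore, then letters, digits, underscores.
SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_cmdline(cmdline):
    """Split key=value tokens into (exportable, rejected) lists.

    Hypothetical helper: mimics a script that exports each kernel
    parameter as a shell variable and complains about invalid names.
    """
    exportable, rejected = [], []
    for token in cmdline.split():
        key, sep, _value = token.partition("=")
        if not sep:
            continue  # bare flags such as "quiet" carry no value to export
        if SHELL_NAME.fullmatch(key):
            exportable.append(token)
        else:
            rejected.append(token)
    return exportable, rejected


# Sample command line, invented for this example.
sample = "loglevel=4 type=down nvme_core.default_ps_max_latency_us=0"
ok, bad = split_cmdline(sample)
print("exportable:", ok)
print("not a valid identifier:", bad)
```

Only the variable conversion trips over the dot; the kernel still receives the parameter, which matches the results in the test matrix.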
Again, well done with the truth table matrix. So it looks like you can go back to using the standard FOG kernel but just place the kernel argument nvme_core.default_ps_max_latency_us=0 in the global kernel parameters in the FOG Configuration -> FOG Settings menu.
-
Setting now set, and my original laptop is now building at the 6.5GB/min I expected.
Will set a load more off soon and report back.
Again many thanks to everyone that has helped me out over the last few weeks.
-
@Duncan Many thanks to you too!! Great work on the testing you’ve done here, awesome. I think this has given us a great set of recipes we can give people in case they run into that issue. We might even think about sending the kernel parameter nvme_core.default_ps_max_latency_us=0 as default. @Tom-Elliott @Quazz @george1421 Do you see any issue with that?
nvme_core.default_ps_max_latency_us=0 - not a valid identifier
As George already said, this is not an issue but more a warning. I was hoping to find some time and fix that at some point. Will do so now.
-
@Sebastian-Roth said in Very slow cloning speed on specific model:
nvme_core.default_ps_max_latency_us=0
I don’t see an issue with just adding it to sysctl inside FOS and not worrying about passing it. That way the variable conversion won’t have an issue. Also, since it’s an NVMe-specific kernel tweak, if NVMe isn’t used (i.e. SATA disk) then the kernel “should” ignore it. I only say “should” because we don’t have a large enough sample population to say yes or no yet. But that is just my opinion.
As I said before the OP did a great job helping us come up with a sound solution. Without having the troubled hardware in front of us it would have been impossible to find a solution.
I still think adding the nvme-cli tool to FOS will add value in trying to debug issues later on too.
-
@george1421 I agree with it all.
I can’t imagine a need for latency being enabled by default. I added it to 1.6 for safety. Shouldn’t be hard to port to 1.5.x
-
@Tom-Elliott I haven’t looked just yet, but there should be a sysctl.conf file in FOS Linux for 1.5.x too.
As I said before FOS Linux isn’t a general purpose OS. We need it to image as fast as possible, power saving states are not wanted or needed. So turning off sleep states for any device should be preferred. I just noticed as I worked on a Dell 9020 there was a specific firmware parameter to disable APST sleep/power management states for pcie devices. When I saw that I went, “Hey I know what that does…”
-
Just added the parameter for 1.5.x too. Way easier to do it via the boot menu code than adding it within FOS using sysctl.
Will finally mark this solved! Thanks to everyone.
-
@Sebastian-Roth I see no harm in it, though I did run into cases where APST had to be explicitly disabled because the latency parameter wasn’t sufficient. But we can cross that bridge if it pops up.
-
@Quazz Would there be any advantage to building that command into the FOS scripts, so that if a specific kernel parameter (i.e. a flag parameter) existed, it would signal the nvme-cli command to run during system startup? That way the kernel parameter could be set per machine or globally. The decision to use it or not would then be up to the FOG admin. By default it would not run.
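A rough sketch of that idea, written in Python for readability (a real FOS script would more likely be shell). The flag name disableapst is hypothetical, not an existing FOG parameter. The nvme-cli call assumes the tool is available in FOS and that setting feature ID 0x0c (Autonomous Power State Transition) to 0 is the right way to turn APST off on the affected drives; neither is confirmed here.

```python
import shlex

FLAG = "disableapst"  # hypothetical flag name, not an existing FOG parameter


def apst_command(cmdline, device="/dev/nvme0"):
    """Return the nvme-cli argv to run if the flag is set, else None."""
    for token in cmdline.split():
        name, _sep, value = token.partition("=")
        # Accept both the bare flag and flag=1; anything else leaves it off.
        if name == FLAG and value in ("", "1"):
            # Assumption: feature 0x0c is APST and value 0 disables it.
            return ["nvme", "set-feature", device, "--feature-id=0x0c", "--value=0"]
    return None


# Sample command lines, invented for this example.
for sample in ("loglevel=4 disableapst=1", "loglevel=4"):
    cmd = apst_command(sample)
    print(sample, "->", shlex.join(cmd) if cmd else "nothing to run")
```

The function only builds the command and does not execute it, so the default behaviour stays "do nothing" unless the admin sets the flag per host or globally.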
-
@george1421 As we don’t fancy the NVMe energy saving modes I don’t see why we shouldn’t set this for everyone. Sure we don’t know the consequences yet but most of us seem pretty assured this is not causing harm but only helps. Keeping my fingers crossed. The more people test dev-branch the sooner we’ll know.
-
Unfortunately, nvme_core.default_ps_max_latency_us=0 hasn’t worked for me. I’ve tried both setting this manually via Host Kernel Argument using the default 1.5.7 kernel and updating to the latest dev-branch which seems to include it by default. Both result in slow transfer on an HP Elitebook 840 G6 (latest December BIOS).
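One thing that could be ruled out first is the parameter not reaching the kernel at all. A minimal sketch of such a check follows; the sysfs path is an assumption about where the module parameter is exposed and may not exist on every kernel, and the helper names are made up for illustration.

```python
from pathlib import Path

PARAM = "nvme_core.default_ps_max_latency_us"


def cmdline_value(cmdline, key=PARAM):
    """Return the value given for key on a kernel command line, or None.

    If the key is repeated, the last occurrence is returned.
    """
    found = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            found = value
    return found


def read_text_or_none(path):
    """Read a small text file, returning None if it does not exist."""
    p = Path(path)
    return p.read_text().strip() if p.is_file() else None


if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("on command line:", cmdline_value(cmdline))
    # Assumed sysfs location of the module parameter.
    print("in effect:", read_text_or_none(
        "/sys/module/nvme_core/parameters/default_ps_max_latency_us"))
```

Python may well not be present in a FOS debug session; reading the same two files with cat gives the same information.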
Disabling APST using the init_partclone.xz and debug command that @Quazz posted gets an average transfer speed around 2.5GB/min which is a usable improvement. Is there a way to automate/improve this rather than entering debug mode each time to disable APST?
I haven’t been able to get the bzImage-4.9.51 and init-4.9.x.xz combo to work either (kernel panic with just bzImage-4.9.51 and partclone errors with both of them).
-
@Middle said in Very slow cloning speed on specific model:
Is there a way to automate/improve this rather than entering debug mode each time to disable APST?
Sure, you can use post init scripts!
I am really confused on why you’d get the “kernel too old” error (even when it had already started to image - very strange)?! Unfortunately our build server is offline again and I can’t build another init-4.9.x.xz right now.
-
@Middle Looking back at @Duncan’s first post in this topic I see those details:
EliteBook 840 G6 … Toshiba KXG60ZNV256G 79VA215DKRVN
Can you please check if you actually have the exact same disk model?
As well, can you please try using Acronis to capture and deploy an image just to see if their kernel works on your hardware. I will send you a link as private message. Check the speech bubble in the top right corner.
-
@Sebastian-Roth said in Very slow cloning speed on specific model:
Can you please check if you actually have the exact same disk model?
Just checked, the disk is different. It’s Micron M.2 NVMe Gen3 x4 Model: MTFDHBA256TCK and the HP part number: L36057-001.
I’ll give Acronis a try this morning.
By removing the disk and adding to a HP 840 G5, we can image without issues (I think Duncan had this as well).
-
@Sebastian-Roth said in Very slow cloning speed on specific model:
@Duncan Just updated the kernel in https://fogproject.org/kernels/bzImage-4.9.51
Please re-download and try again.
I’ve been having a similar issue to the OP with one of these machines. I’ve updated my FOG to the latest version and got the latest kernel, but I suppose that doesn’t necessarily mean it gets the older ones.
I’ve SSH’d into my FOG environment and ran a command to download the kernel, but I’m not sure what to do with it after that point.
-
@dylan123 Now that you have downloaded the kernel, make sure it’s in the right location (/var/www/html/fog/service/ipxe/), then edit the host’s settings in the web UI and set bzImage-4.9.51 as Host Kernel. Schedule a task, boot the device and pay attention to where it says bzImage-4.9.51... ok.
-
@Sebastian-Roth said in Very slow cloning speed on specific model:
@dylan123 Now that you have downloaded the kernel, make sure it’s in the right location (/var/www/html/fog/service/ipxe/), then edit the host’s settings in the web UI and set bzImage-4.9.51 as Host Kernel. Schedule a task, boot the device and pay attention to where it says bzImage-4.9.51... ok.
Thanks Seb.
Had some issues when I attempted to run the image.
Just showing I’ve got the files in the right area -
Have put the argument in -
When it boots, It says the Kernel is old and then it goes to this -
It then just counts up from /dev/nvme0n1p2, 3, etc
Without the init argument I get a different kind of issue which ends in -
Fatal : Kernel too old
Kernel panic - not syncing: Attempting to kill init! exitcode=0x00007f00
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Attempting to kill init! exitcode=0x00007f00
I’ve only got one more machine to go, so I might just be better off doing this one from scratch, but I figured I’d mention it anyway, as I’m sure someone else will likely have a similar issue. If it’s something I can fix in the next couple of days, I’ll do it that way before manually setting it up.
-
@dylan123 We have seen this “late” message of “kernel too old” already and I still have no idea how it can happen that late in the process. When you leave the init parameter off, it uses the current init file, which shouldn’t be able to work with the 4.9.51 kernel, and ends in a kernel panic. That’s understandable. But why the heck do we see the “kernel too old” when it already booted all the way into FOS and started cloning? @Quazz Any ideas?
Is your image set to “Partclone zstd” or “Partclone gzip”?
-
@Sebastian-Roth I’m guessing that some programs/functionality got coded with a correct glibc version for that kernel version, thus allowing it to boot and such, but that certain ones for whatever reason didn’t??? (partclone perhaps?) I have no clue how or why that would happen though.
That’s the only thing I can think of anyway!
edit: perhaps here https://github.com/Thomas-Tsai/partclone/blob/master/fail-mbr/compile-mbr.sh
It might accidentally call system GCC which might use a newer glibc version, thus causing the mismatch?