Extremely Slow Deploy to NVME drives
-
We’re getting the same slow deployment speeds when deploying images to HP Elitebook 840 Gen 6 laptops. All other laptops are fine. Before the deploy starts (at very slow speeds), it will hang at the initial Partclone screen for 10 - 15 mins.
@Sebastian-Roth & @george1421 - Below is a photo of the ‘not a valid identifier’ error we see. It will also hang at this stage from time to time.
Below is the setting applied to the host in Fog.
We’re running CentOS 7.6 with Fog v1.5.7.4 and Kernel 5.1.16. I’ve tried with and without nvme_core.default_ps_max_latency_us=5500 set in the Host Kernel Arguments, and with Partclone 0.3.12, which @Quazz provided in another post.
The disks appear to be Toshiba NVMe kbg30zmv256g.
All suggestions welcome, and I’m happy to provide any more info that might help troubleshoot.
-
@Middle That error message (while it’s a valid error message) isn’t important in this case. After testing, the dot after nvme_core is what the bash shell objects to when the kernel arguments are turned into variables for use during image deployment. What IS important is that the Linux kernel sees that parameter and understands it. To test this, you can create a debug task (capture or deploy) by ticking the debug checkbox just before scheduling the task. PXE boot the target computer and, after a few screens of text, you will be dropped to the FOS Linux command prompt. Key in
sysctl -a | grep nvme
If that nvme_core parameter is set then its job is done.
Now with that said, in another thread (I need to locate and link here) the developers are testing a new init (FOS Linux virtual hard drive) with an updated version of partclone. The updated version of partclone along with the nvme_core kernel parameter seemed to fix the slow speeds with these specific NVMe drives.
Edit: here is the link in @Quazz post. Understand this is an experimental init that hasn’t been fully tested, but has shown promise on these nvme drives. https://forums.fogproject.org/topic/13620/very-slow-cloning-speed-on-specific-model/10
Edit2: Wait, I see from your picture you are already using the new inits because you have partclone 0.3.12. Hmmmm, there must be something else going on here.
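Editor's note: one way to check whether the kernel actually received the argument is to look at the kernel command line itself. The sketch below is only an illustration; `cmdline_has` is a hypothetical helper name, not part of FOG or FOS. As far as I know, `sysctl -a` only lists entries under /proc/sys, while module parameters such as this one are normally exposed under /sys/module, so an empty `sysctl -a | grep nvme` does not by itself show that the argument was ignored.

```shell
# Hypothetical helper (not part of FOG/FOS): succeed if the given argument
# appears verbatim on the kernel command line.
# The optional second argument points at a different file, which is handy for testing.
cmdline_has() {
    wanted="$1"
    cmdline_file="${2:-/proc/cmdline}"
    for arg in $(cat "$cmdline_file"); do
        [ "$arg" = "$wanted" ] && return 0
    done
    return 1
}

# Possible use at the FOS debug prompt:
#   cmdline_has nvme_core.default_ps_max_latency_us=5500 && echo "kernel saw it"
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

If the second command prints 5500, the module picked the value up; whether that path exists depends on how the FOS kernel was built.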
-
@george1421 entered debug and the nvme_core parameter doesn’t appear to be set.
-
@Middle make sure that sysctl exists on the FOS Linux system. Just run
sysctl -a
and that will print out all kernel parameters. You can pipe that to more if you want to see it a page at a time.
-
@george1421 sysctl exists and returns results. Using
sysctl -a | more
we still don’t see anything that’s setting the latency parameter.
-
@Middle OK, so sysctl exists. Let’s try this.
- Setup a debug deploy (tick the debug checkbox when you go to schedule the task)
- PXE boot the target computer.
- After several screens of text, which you clear by pressing the Enter key, you will be dropped to the FOS Linux command prompt.
- At the FOS Linux command prompt key in
sysctl -w nvme_core.default_ps_max_latency_us=5500
and press Enter
- Confirm the setting is in place with
sysctl -a | grep nvme
- If everything checks out OK then start the FOG master script with
fog
In debug mode you will need to press Enter after every step, but imaging should proceed. Let’s see if it images correctly when the kernel parameter is entered manually. If it works, we can automate the process later. Right now I want to see if there is a change.
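Editor's note: `sysctl -w` only reaches entries under /proc/sys, which may be why it can report an unknown key for a module parameter. A minimal sketch of the same manual step through sysfs follows, assuming the FOS kernel exposes the nvme_core parameter as a writable file under /sys/module. The thread does not confirm this, nor whether a value changed this way affects a controller that has already been initialised. `set_module_param` is a hypothetical helper name.

```shell
# Hypothetical helper: write a module parameter through sysfs and print
# the value read back. The optional fourth argument replaces the sysfs
# root, which is only useful for testing.
set_module_param() {
    mod="$1"
    param="$2"
    value="$3"
    sysfs_root="${4:-/sys}"
    file="$sysfs_root/module/$mod/parameters/$param"
    if [ ! -w "$file" ]; then
        echo "not writable: $file" >&2
        return 1
    fi
    echo "$value" > "$file" && cat "$file"
}

# Possible use at the FOS debug prompt, before starting the fog script:
#   set_module_param nvme_core default_ps_max_latency_us 5500
```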
-
@george1421 Back in the office and just tried this. We get unknown key for the latency parameter.
grep nvme doesn’t return the value either and running fog gets stuck at the ‘Restoring Partition Tables (GPT)’ section, so didn’t get as far as Partclone. Note we’re using Kernel 5.1.16.
Thanks for helping out.
-
@Middle Try setting the kernel argument as a global setting instead of on the host page. (FOG Configuration -> FOG Settings -> General -> Kernel args)
This problem may also be resolved with SSD firmware updates if available.
I’d also be interested in the results of kernel args
pcie_aspm=off
and
pcie_aspm=force
(do not set the latter as global).
Only set one of the 3 kernel arguments.
This problem is caused by ASPM (PCIe Active State Power Management) and how certain devices interact with it. It is a problem specifically for NVMe devices because of their PCIe connection. A lot of these drives have buggy implementations (sometimes fixed in firmware updates).
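Editor's note: to see which ASPM policy the running kernel is using, the policy file under /sys/module/pcie_aspm can be read; the active entry is the one in square brackets. This is only a sketch, assuming the FOS kernel is built with ASPM support and exposes that file. `aspm_policy` is a hypothetical helper name.

```shell
# Hypothetical helper: print the active ASPM policy, i.e. the entry that
# the kernel shows in [brackets], for example:
#   [default] performance powersave powersupersave
# The optional argument replaces the policy file path (useful for testing).
aspm_policy() {
    policy_file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$policy_file"
}

# Possible use at the FOS debug prompt:
#   aspm_policy
```

Comparing the output with and without one of the pcie_aspm kernel arguments would at least show whether the argument changed anything.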
-
@Quazz No change, I’m afraid, with the pcie_aspm args (slow transfer rate). I didn’t spot any errors like we get with the latency one; however, I do receive ‘is an unknown key’ when trying to add them in debug mode. I’ve tried with both the 5.1.16 and 4.19.64 kernels.
Incidentally, if I have kernel args set and I use debug mode, it always seems to stop at the ‘Restoring Partition Tables (GPT)’ section. Running a normal deploy at least moves on to the Partclone screen and eventually to a slow transfer rate.
I’ve also installed the Sept 27th HP BIOS and Firmware pack. I’m still looking for a firmware update specifically for the disk.
-
@Middle Unfortunately, aside from the latency kernel argument there isn’t anything else we can do from our side as far as I’m aware.
Unfortunately manufacturers don’t always check how their stuff works on linux…
-
Same here with the “nvme_core.default_ps_max_latency_us=5500 not a valid identifier” problem…
-
@DeRo93 said in Extremely Slow Deploy to NVME drives:
Same here with the “nvme_core.default_ps_max_latency_us=5500 not a valid identifier” problem…
The message is more a warning that the variable couldn’t be used in the FOS environment but it’s still properly setting the kernel parameter. So it should make a difference if that option is of any help in your case.
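Editor's note: the warning can be explained by shell naming rules. A shell variable name may only contain letters, digits and underscores, and must not start with a digit, so a name containing a dot is rejected. The sketch below assumes, as the posts above suggest, that the FOS scripts try to turn each kernel argument into a shell variable; `is_valid_identifier` is a hypothetical helper, not FOS code.

```shell
# Hypothetical helper: succeed if the argument is a legal shell variable name
# (letters, digits, underscores; not empty; not starting with a digit).
is_valid_identifier() {
    case "$1" in
        ''|[0-9]*|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Illustration: split a kernel argument into name and value, then check the name.
arg="nvme_core.default_ps_max_latency_us=5500"
name="${arg%%=*}"
if is_valid_identifier "$name"; then
    echo "$name can be exported as a shell variable"
else
    echo "$name is not a valid identifier (the kernel still receives the argument)"
fi
```

The kernel parses its command line on its own, independently of any shell, which is consistent with the explanation that the message is only a warning.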
-
Ah okay. Unfortunately this did not help and I’m not able to deploy images on the HP Elitebook 840 G6.
Do you have any other suggestions :/?
-
@DeRo93 If available, install firmware updates, BIOS updates and such.
@Developers Looking over FOS, it seems that sector size is always assumed to be 512. Could this be involved in the slow speeds? (It could potentially cause misalignment.)
Additionally, it seems sector size isn’t always correctly reported by tools such as fdisk (possibly the hardware manufacturer’s fault; I don’t know). So even if software is generally clever enough to handle it on its own, if it assumes the wrong value, we can expect worse performance (even after deployment).
-
@Quazz Sector sizes are calculated based on 512-byte sectors, but this wouldn’t necessarily cause a misalignment. The physical sector is still 4k, and 4096 divides into 8 chunks of 512 bytes, so the alignment should still happen on this disk. This is a logical allowance. There are some 4k drives that only allow 4096-byte sectors, but most drives allow logical 512-byte breakouts.
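Editor's note: the arithmetic above can be checked directly. A partition start given in 512-byte sectors sits on a 4096-byte boundary when the start sector is a multiple of 8. This is a minimal sketch; `is_aligned` is a hypothetical helper and the device name nvme0n1 in the usage comment is an assumption.

```shell
# Hypothetical helper: succeed if a partition start falls on a physical-sector
# boundary. Arguments: start sector, logical sector size (default 512),
# physical sector size (default 4096), sizes in bytes.
is_aligned() {
    start_sector="$1"
    logical="${2:-512}"
    physical="${3:-4096}"
    [ $(( ($start_sector * $logical) % $physical )) -eq 0 ]
}

# Possible use (device name is an assumption; sysfs reports "start" in 512-byte units):
#   cat /sys/block/nvme0n1/queue/logical_block_size
#   cat /sys/block/nvme0n1/queue/physical_block_size
#   is_aligned "$(cat /sys/block/nvme0n1/nvme0n1p1/start)" && echo aligned
```

For example, the common start sector 2048 is aligned (2048 × 512 = 1048576, a multiple of 4096), while the old DOS-style start sector 63 is not.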
-
@Tom-Elliott Was just wondering, since I came across someone who needed to specify a 4k sector size to get full speed out of their drive; though admittedly their speeds were far greater than those of the users having the issue in this thread, it seems! So unrelated indeed. Thanks for the info.
-
@Quazz said in Extremely Slow Deploy to NVME drives:
@DeRo93 If available, install firmware updates, BIOS updates and such.
Already done. Nothing helped :/. I hope there will be a solution in the future. We only work with HP, and I think the new ones will all get NVMe SSDs… Anyway, thank you for your help.
-
I’m curious, what drives are inside your HP 840 G6s? We currently have the same laptop at my company, but the drives in our G6s are Western Digital PC SN520 NVMe SSDs. I know there is a configuration with OPAL drives (so they are already encrypted by default).