Extremely Slow Deploy to NVME drives
-
@robbit The SG300 is a good enough switch, so let’s rule out the network infrastructure.
The 7050 most likely has an NVMe disk in it, so I’m pretty sure we can rule out the FOG server too. That leaves the target computer.
I’m trying to remember if I’ve seen this issue before with NVMe. I think so, but I see a lot of things in the threads, so I may be mixing up issues. The latest version of FOG is 1.5.7.2, which addresses a few issues that were discovered after 1.5.7 was released. How did you install FOG: using the git method or the tarball?
-
@george1421 said in Extremely Slow Deploy to NVME drives:
tarball
Tarball. I went to https://fogproject.org/download to download it and installed it as a Normal Server.
I forgot to mention, Fog is installed on a standard 3TB 3.5" HDD. The images are being stored under /images.
-
I’m not going to say this will fix the issue, but let’s set up the git method, because it will make installing future updates easier. FWIW, the download method is the tarball way.
Run these commands.
sudo apt-get update && sudo apt-get install git
sudo -i
git clone https://github.com/FOGProject/fogproject.git /root/fogproject
cd /root/fogproject/bin
./installfog.sh
Just rerun the installer; it will use the answer file you created during the first install. Hopefully that will bring you up to 1.5.7.2. If not, we will need to switch to the development branch, but let’s see if we can get to the right version this way.
FWIW, when a new release comes out you would just switch to the /root/fogproject directory and do a
git pull
then change to the bin directory and run the installfog.sh script again. No need to download and extract the updated tarball.
That 3TB spinning hard disk won’t be a problem until you have 3-4 simultaneous unicast imaging tasks running. At that point it will be the bottleneck in your FOG server, with the single NIC a close second.
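The update routine described above can be sketched as a small bash function. This is only an illustration: the function name update_fog and the DRY_RUN switch are made up for this example, and the default path assumes the repository was cloned to /root/fogproject as in the commands above.

```shell
#!/bin/bash
# Sketch: update a FOG git checkout and rerun the installer.
# update_fog and DRY_RUN are hypothetical names, not part of FOG.
# Arguments: repo path (default /root/fogproject), branch (default master).
update_fog() {
    local repo="${1:-/root/fogproject}" branch="${2:-master}"

    # With DRY_RUN set, only print the commands that would be run.
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "cd $repo" "git checkout $branch" "git pull" \
            "cd bin" "./installfog.sh"
        return 0
    fi

    # Run in a subshell so the caller's working directory is untouched.
    (
        cd "$repo" || exit 1
        git checkout "$branch" || exit 1
        git pull || exit 1
        cd bin || exit 1
        ./installfog.sh
    )
}
```

Calling `DRY_RUN=1 update_fog` prints the steps without touching anything, which is a cheap way to check the path and branch before running the installer as root.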
-
Thanks for the screenshot. Looks like I don’t have apt-get on CentOS 7. Is there another alternative? It seems like I just need to get Git, right? Tried running yum install git but got a bunch of errors about mirror sites not available.
-
@robbit OK, I just assumed you had Ubuntu, since that is what most people install for some reason. Let me translate it for CentOS (what I use).
yum update -y
yum install git -y
git clone https://github.com/FOGProject/fogproject.git /root/fogproject
cd /root/fogproject/bin
./installfog.sh
-
@george1421
Updating now. I ran ./installfog.sh and it’s taking some time to back up the database. I will report back tomorrow.
Thank you for all the help so far! I’ve also just found this thread. We have pretty much the exact same laptop:
https://forums.fogproject.org/topic/13733/hp-elitebook-830-gen-6-issues-capturing-images-and-deploying-images/5
so I may try the kernel parameters as well (unless you suggest otherwise).
-
@george1421 @robbit As for the commands to get the very latest development version (called dev-branch), you need to add one more command to what was posted earlier:
sudo -i
cd /root/fogproject/bin
git checkout dev-branch
./installfog.sh
Leaving that one command out doesn’t break anything, but you’ll end up with the current master, 1.5.7, again.
About the slow speed. Do you have Toshiba drives? https://forums.fogproject.org/topic/13620/very-slow-cloning-speed-on-specific-model
-
Thank you for that. Our HP laptops came with Western Digital PC SN520 NVMe SSDs, and I also tested with a Toshiba NVMe THNSSN5256GPUK with the same results. After the update, it still slowed to a crawl. HOWEVER:
When I set the kernel parameter under FOG Configuration -> FOG Settings -> General -> Kernel args to nvme_core.default_ps_max_latency_us=5500, it started working. I was able to deploy the image to an NVMe drive via unicast.
Test#1 w/ Fog Server 1.5.7.3
- Same results with the NVMe drives where the rate goes to a crawl of about 10MB/min
Test#2 w/ nvme_core.default_ps_max_latency_us=5500 + Fog 1.5.7.3
- Deploying at a solid speed ~5GB/min on isolated network
Whatever it is, the combination of the two has fixed the issue.
I want to thank both you @george1421 and @Sebastian-Roth for chiming in! It’s good to know there’s a very active community for this.
-
I am having this same issue. I posted about this a month ago. I tried the nvme_core.default_ps_max_latency_us=5500 kernel argument but it returned an error about it being not a valid identifier after booting. I’ve updated to 1.5.7.4. Any other suggestions?
-
@Shad0wguy said in Extremely Slow Deploy to NVME drives:
nvme_core.default_ps_max_latency_us=5500
This needs to go into a kernel parameters field: either the global one under FOG Configuration -> FOG Settings, or the host-specific kernel arguments field.
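One way to confirm that the argument actually reached the booted client is to look for it on the kernel command line in /proc/cmdline. A minimal sketch, assuming bash is available at the debug prompt; karg_value is a hypothetical helper name, not something FOG ships:

```shell
#!/bin/bash
# Sketch: print the value of a NAME=VALUE kernel argument.
# karg_value is a hypothetical helper, not part of FOG.
# Usage: karg_value NAME [CMDLINE]
# CMDLINE defaults to the contents of /proc/cmdline.
karg_value() {
    local name="$1"
    local cmdline="${2-$(cat /proc/cmdline)}"
    local -a words
    local word

    # Split the command line on whitespace without glob expansion.
    read -r -a words <<< "$cmdline"

    for word in "${words[@]}"; do
        if [[ "$word" == "$name="* ]]; then
            # Strip everything up to and including the first "=".
            printf '%s\n' "${word#*=}"
            return 0
        fi
    done
    return 1
}
```

At the FOS debug prompt, `karg_value nvme_core.default_ps_max_latency_us` should print the configured value if the setting made it through, and return non-zero otherwise.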
-
I’m beginning to run into the same problem, also with an HP 840 G6. I’ve tried the kernel arguments, but it just says ‘not a valid identifier’. I have a Samsung 850 EVO M.2 SATA 6Gb/s drive lying around, which I tested with, along with an HP 830 G5:
830 + 850 EVO: Success
830 + Original NVMe SSD: Success
840 + Original NVMe SSD: Failure
840 + 850 EVO: Success
I compared the SSDs from the 840 G6 and the 830 G5, and they are the exact same model. So while it’s of course a very small sample size, it’s pretty clear that the failure has something to do with the 840 G6 combined with NVMe.
One thing that also only happens with the 840 G6 is that it shows the following message:
udevd[3088]: inotify_add_watch(6, /dev/nvme0n1p2, 10) failed: no such file or directory
udevd[3089]: inotify_add_watch(6, /dev/nvme0n1p1, 10) failed: no such file or directory
-
@KSiig said in Extremely Slow Deploy to NVME drives:
however it just says ‘not a valid identifier’
Please take a picture of the error and post here!!
-
We’re getting the same slow deployment speeds when deploying images to HP Elitebook 840 Gen 6 laptops. All other laptops are fine. Before the deploy starts (at very slow speeds), it will hang at the initial Partclone screen for 10 - 15 mins.
@Sebastian-Roth & @george1421 - Below is a photo of the ‘not a valid identifier’ error we see. It will also hang at this stage from time to time.
Below is the setting applied to the host in Fog.
We’re running CentOS 7.6 with Fog v1.5.7.4 and Kernel 5.1.16. I’ve tried with and without nvme_core.default_ps_max_latency_us=5500 set as the Host Kernel Arguments and Partclone 0.3.12 which @Quazz provided in another post.
The disks appear to be Toshiba NVMe kbg30zmv256g.
All suggestions are welcome, and I’m happy to provide any more info that might help troubleshoot.
-
@Middle That error message (while it’s a valid error message) isn’t important in this case. After testing, the dot in nvme_core.default_ps_max_latency_us turns out to be the issue: the bash shell can’t use that name for the variables it sets up during image deployment. What IS important is that the Linux kernel sees that parameter and understands it. To test this, create a debug deployment (capture or deploy) by ticking the debug checkbox just before scheduling the task. PXE boot the target computer, and after a few screens of text you will be dropped to the FOS Linux command prompt. Key in
sysctl -a | grep nvme
If that nvme_core parameter is set then its job is done.
Now with that said, in another thread (I need to locate and link here) the developers are testing a new init (FOS Linux virtual hard drive) with an updated version of partclone. The updated version of partclone along with the nvme_core kernel parameter seemed to fix the slow speeds with these specific NVMe drives.
Edit: here is the link in @Quazz post. Understand this is an experimental init that hasn’t been fully tested, but has shown promise on these nvme drives. https://forums.fogproject.org/topic/13620/very-slow-cloning-speed-on-specific-model/10
Edit2: Wait, I see from your picture you are already using the new inits because you have partclone 0.3.12. Hmmmm, there must be something else going on here.
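To illustrate why bash complains about that argument, here is a minimal sketch of turning KEY=VALUE kernel arguments into shell variables. It is not FOG's actual init script; the function names are made up for this example. Bash variable names may only contain letters, digits, and underscores, so a key with a dot in it is rejected with "not a valid identifier", while the kernel itself can still read the argument.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers, not taken from the FOS init.

# Return 0 if the argument is a legal bash variable name.
is_valid_identifier() {
    [[ "$1" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

# Export each KEY=VALUE argument whose KEY bash accepts as a variable name.
# Keys such as nvme_core.default_ps_max_latency_us are skipped with a note
# on stderr instead of triggering bash's "not a valid identifier" error.
export_kernel_args() {
    local word key
    for word in "$@"; do
        # Ignore bare flags that carry no "=".
        [[ "$word" == *=* ]] || continue
        key="${word%%=*}"
        if is_valid_identifier "$key"; then
            export "$word"
        else
            echo "skipping $key: not a valid shell identifier" >&2
        fi
    done
    return 0
}
```

For example, `export_kernel_args loglevel=4 nvme_core.default_ps_max_latency_us=5500` sets `$loglevel` and skips the dotted key, which is consistent with the message being harmless for the kernel side of things.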
-
@george1421 entered debug and the nvme_core parameter doesn’t appear to be set.
-
@Middle Make sure that sysctl exists on the FOS Linux system. Just run
sysctl -a
and that will print out all kernel parameters. You can pipe that to more if you want to see it a page at a time.
-
@george1421 sysctl exists and returns results. Using
sysctl -a | more
we still don’t see anything that’s setting the latency parameter.
-
@Middle OK, so sysctl exists. Let’s try this.
- Set up a debug deploy (tick the debug checkbox when you go to schedule the task).
- PXE boot the target computer.
- After several screens of text, which you clear by pressing the Enter key, you will be dropped to the FOS Linux command prompt.
- At the FOS Linux command prompt, key in
sysctl -w nvme_core.default_ps_max_latency_us=5500
and press Enter.
- Confirm the setting is in place with
sysctl -a | grep nvme
- If everything checks out OK, start the FOG master script with
fog
In debug mode you will need to press Enter after every step, but imaging should proceed. Let’s see if it images correctly with the kernel parameter entered manually. If it works, we can automate the process later. Right now I want to see if there is a change.
-
@george1421 Back in the office and just tried this. We get unknown key for the latency parameter.
grep nvme doesn’t return the value either and running fog gets stuck at the ‘Restoring Partition Tables (GPT)’ section, so didn’t get as far as Partclone. Note we’re using Kernel 5.1.16.
Thanks for helping out.
-
@Middle Try setting the kernel argument as a global setting instead of on the host page. (FOG Configuration -> FOG Settings -> General -> Kernel args)
This problem may also be resolved with SSD firmware updates if available.
I’d also be interested in the results of the kernel args
pcie_aspm=off
and
pcie_aspm=force
(do not set the latter as global).
Only set one of the 3 kernel arguments at a time.
This problem is caused by ASPM (Active State Power Management) and how certain devices interact with it. It affects NVMe devices specifically because they connect over PCIe. A lot of these drives have buggy ASPM implementations (sometimes fixed in firmware updates).
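When testing these arguments from a debug task, it can help to read the values the kernel is actually using. Module parameters that the kernel exports appear as files under /sys/module/<module>/parameters/ rather than in sysctl output. A minimal sketch, assuming the running kernel exposes the nvme_core and pcie_aspm parameter files; the helper names and the SYSFS_ROOT override are made up for this example:

```shell
#!/bin/bash
# Sketch only: hypothetical helpers for reading module parameters.

# Print a module parameter from sysfs, or fail if the file is absent.
# SYSFS_ROOT is an assumed override (default /sys) to allow testing.
module_param() {
    local file="${SYSFS_ROOT:-/sys}/module/$1/parameters/$2"
    [ -r "$file" ] || return 1
    cat "$file"
}

# Print the active ASPM policy. The kernel lists all policies on one line
# and marks the active one with square brackets, e.g. "[default] powersave".
active_aspm_policy() {
    local policies re='\[([a-z]+)\]'
    policies="$(module_param pcie_aspm policy)" || return 1
    [[ "$policies" =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}
```

Running `module_param nvme_core default_ps_max_latency_us` should echo 5500 when that kernel argument took effect, and `active_aspm_policy` shows which ASPM policy is in force for the current boot.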