Slow Image Deployment Speeds after Updating BZImage Files
-
@sebastian-roth said in Slow Image Deployment Speeds after Updating BZImage Files:
@cdutko said:
In the past I have always seen transfer speeds close to 4GB/min and now we are only seeing around 500MB/min.
The transfer speed seen on the blue partclone screen is a combination of throughput of disk and network IO. So the plain number does not tell us which one is the bottleneck in your case.
Is it possible to have two different sets of bzimage files? One that works for the new equipment (albeit slower) and one that works with the old equipment (fast) ?
Yes it is. Set the old bzImage as default kernel in FOG Settings but apply the newer kernel through individual host settings for all the newer machines.
Or should I investigate something else entirely? I’m not really sure where to begin. I am on the latest FOG Project release, on an Ubuntu base.
It’s not “either/or” from my perspective. Use the different kernels now, but let’s start digging into what is causing the slowness. First off I need to ask a few questions:
- Which version of FOG did you use before the update and which version do you use now (lower right corner of the web UI)?
- Which FOS kernel is currently installed? The following command shows it:
file /var/www/{,html/}fog/service/ipxe/bzImage*
- Can you see different speeds when comparing different models? Please use unicast deployments on single machines with different hardware to see if they’re all at the same slow speed or if speeds vary. I highly recommend you take notes when testing the different machines and share the stats here. It’s very hard to help find a solution if we don’t have the details.
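If you end up with several bzImage files, a small script can pull the version out of the `file` output for your notes. This is only a sketch: the sample line is the one posted further down in this thread, and the function name and regular expression are my own assumptions about that output format, not anything FOG ships.

```python
import re

# Sample line as posted later in this thread (output of the `file` command).
SAMPLE = (
    "bzImage32: Linux kernel x86 boot executable bzImage, "
    "version 4.19.145 (sebastian@Tollana) #1 SMP Sun Sep 13 05:43:10 CDT 2020, "
    "RO-rootFS, swap_dev 0x7, Normal VGA"
)


def kernel_version(file_output_line):
    """Return (file name, kernel version) from one line of `file` output, or None."""
    # Hypothetical pattern: "<name>: ... version <digits.digits...>"
    match = re.match(
        r"^(?P<name>[^:]+):.*\bversion (?P<version>\d+(?:\.\d+)+)",
        file_output_line,
    )
    if match is None:
        return None
    return match.group("name"), match.group("version")


print(kernel_version(SAMPLE))
```

Lines that do not look like a kernel image simply return None, so the function can be run over the whole command output.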
The forum is full of these kinds of “slowness” topics and usually it’s related to specific hardware on the clients or a major issue within the network. Sure, there were also issues in the FOS init/kernel at times, but I am not aware of this kind of issue with the latest FOG versions (be it the official 1.5.9 release or the current dev-branch version).
Here is one topic talking about a major speed impact because of a specific BIOS setting. Please check the VMD setting in your BIOS: https://forums.fogproject.org/post/141675
I appreciate the detailed response. I should be able to test some of the things you suggested later this week. I’ll keep you updated on the results of my tests.
-
@george1421 said in Slow Image Deployment Speeds after Updating BZImage Files:
@cdutko I’d like to see somewhat more scientific results than “the new kernels appear slow”. In my personal experience, Linux 5.5 brought a major improvement in speed over the 4.19 series kernels in general Linux distributions.
In my mind it’s not clear whether your slowness is related to the kernel, the new version of partclone, the target hardware, or the network infrastructure. Just as a point of reference, on a well designed 1GbE network FOG should image at or about 6.1GB/min. Understand there can be some variance depending on the network design, but 5.5-6.5GB/min is what I would call normal.
I would like to devise an experiment here. You can download previous kernels from here: https://fogproject.org/kernels/ and here: https://fogproject.org/kernels/old/
What I would like to see you do is download the latest 64 bit kernel in the 4.19 series, 5.6 series, and 5.10 series. Place these files in the
/var/www/html/fog/service/ipxe
directory. Rename the images as you place them there as bzImage4.19.x, bzImage5.6.x, bzImage5.10.x. The actual names don’t matter, just so they are different. Now pick a target system that is circa 2017-2018, or just old enough that the 4.19.x series kernels will run. Go into the host definition for that target computer, update the kernel field to bzImage4.19.x, save the setting, and then deploy an image to the computer. Note the speed on the partclone screen after about 1.5 minutes of deploying the C drive image. Then, from the same computer on the same network port, test the 5.6 and 5.10 kernels (you could/should test the 5.15 series kernel too, because you say it runs at about 500MB/min). The idea is to keep all of the variables the same and only change the kernel to see if the kernel is at fault here.
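To keep the comparison honest, it helps to write the numbers down per kernel and express each as a share of the baseline. A rough sketch of that bookkeeping follows; the function name is made up, and the two figures are placeholders taken from the rough speeds mentioned in this thread, not measurements. Replace them with what partclone shows on the same machine and port.

```python
def share_of_baseline(speeds_mb_per_min, baseline):
    """Return {kernel name: speed as a fraction of the baseline kernel's speed}."""
    base = speeds_mb_per_min[baseline]
    return {kernel: speed / base for kernel, speed in speeds_mb_per_min.items()}


# Placeholder values (MB/min), NOT test results: fill in your own readings.
measurements = {
    "bzImage4.19.x": 4000.0,
    "bzImage5.15.x": 500.0,
}

for kernel, ratio in share_of_baseline(measurements, "bzImage4.19.x").items():
    print(f"{kernel}: {ratio:.0%} of baseline")
```

If only the kernel changed between runs and the ratio drops sharply, that points at the kernel; if all kernels land at the same speed, the cause is elsewhere.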
Thank you George for your detailed response. I 100% agree with you and I want to have a more scientific approach than “I changed this now speeds are bad”. There is definitely something strange going on now that I updated.
I should have some time later this week to test this out.
I’ll make sure to update you on where I end up.
-
I have uploaded some screenshots of my Partclone speeds.
This picture shows the current speeds that I am seeing today. The second picture is an older one from 2020, before the update. It is using a much older version of partclone.
I also found the old kernels that I was using before I updated. I was using this version:
bzImage32: Linux kernel x86 boot executable bzImage, version 4.19.145 (sebastian@Tollana) #1 SMP Sun Sep 13 05:43:10 CDT 2020, RO-rootFS, swap_dev 0x7, Normal VGA
So now I am at the point where I would like to try to run two different kernels: one for my newer equipment and one for the older.
For this process, are you suggesting that I modify the host hardware inventory and add the name of the specific kernel in this menu here?
I can do some modifications just for the sake of the test, but for production rollout I typically don’t register any of the computers to the Fog server unless they are a ‘master’ PC which we use to capture the images. Is it possible to set this Kernel on the ‘image’ level instead? Or is Fog ‘smart’ enough to know that for all computers being deployed with this image should be associated with the Host kernel?
-
@cdutko Well, we’d really like to know what changed your speed. There may be something else (other than FOG) that is causing this. So the idea is to keep everything the same except for the kernel to see if that has an impact. I know that with the 5.15 version some vulnerability mitigation code was enabled, so it’s unclear whether that is having a speed impact on deployment. The partclone change might also have an impact on speed. I’m leaning towards an impact on capture speed over deployment speed, but anything is possible.
From real world experience I would expect normal imaging, on a solid 1GbE network with contemporary hardware, to run in the 5.5 to 6.3GB/minute range. The thing to note is that 668MB/min is suspicious. 668MB/min == 11.13MB/sec. 100Mb networking runs at 12.5MB/sec (theoretical). It may just be a numeric coincidence, but your imaging speeds might indicate there is a 100Mb network link somewhere in your imaging path.
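The arithmetic above is easy to redo for any reading off the partclone screen. A minimal sketch, assuming decimal units and ignoring protocol overhead; the helper names are made up for illustration.

```python
def mb_per_min_to_mb_per_sec(mb_per_min):
    """Convert a partclone-style MB/min figure to MB/sec."""
    return mb_per_min / 60.0


def link_ceiling_mb_per_sec(link_megabits_per_sec):
    """Theoretical ceiling of a link in MB/sec (8 bits per byte, no overhead)."""
    return link_megabits_per_sec / 8.0


observed = mb_per_min_to_mb_per_sec(668)  # the figure from the screenshot
for link in (100, 1000):
    ceiling = link_ceiling_mb_per_sec(link)
    print(
        f"{link}Mb link: ceiling {ceiling:.1f}MB/sec, "
        f"observed {observed:.2f}MB/sec is {observed / ceiling:.0%} of it"
    )
```

A reading that sits just under the 100Mb ceiling while staying far below the 1GbE ceiling is what makes a negotiated 100Mb link worth ruling out.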
-
I would like to know as well. Currently I am still stuck with the slower speeds. I can send a picture of my network setup, but it is beyond simple really. The Fog Server is completely disconnected from the rest of my network, so it doesn’t get impacted by the other devices on my network. I have a 5-port unmanaged 10/100/1000 switch that sits between the Fog server and the 1 or 2 devices we are deploying to.
Does my question make sense? About associating the Kernel to an image versus a host?
-
@cdutko said in Slow Image Deployment Speeds after Updating BZImage Files:
Does my question make sense? About associating the Kernel to an image versus a host?
FOG currently doesn’t have that capability. The kernel is assigned to the host and the image is assigned to the host. If there is no host record then there is nothing to link the two values to. It’s missing the linking pin.
But again, we haven’t confirmed that it’s the kernel at fault yet, have we? I find it hard to believe that it IS the kernel, but anything is possible so we shouldn’t rule it out until we test it.
Another level of debugging we can do is to run an iperf test between the target computer and the FOG server to test network throughput. I also have tests for local hard drive (or SSD) speeds. It’s possible that something is slowing down writes to disk (the kernel would probably be at fault here). Right now we don’t know a lot other than it was fast at one time and now it’s slow.
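For the network test, one way to get a number you can paste into the thread is to run iperf3 with JSON output (`iperf3 -c <server> -J`) and read the received rate from the summary. This is a sketch under assumptions: it presumes iperf3 specifically (the post only says iperf) and a TCP test, and the sample string is a hand-trimmed stand-in for a real report, not a measurement.

```python
import json


def received_mbit_per_sec(iperf3_json_text):
    """Return the receiver-side rate in Mb/sec from an iperf3 JSON report (TCP test)."""
    report = json.loads(iperf3_json_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


# Hand-written, heavily trimmed stand-in for a real report: placeholder value only.
sample = '{"end": {"sum_received": {"bits_per_second": 94000000.0}}}'
print(f"{received_mbit_per_sec(sample):.0f}Mb/sec")
```

A result near 94Mb/sec would point at a 100Mb link in the path, while a healthy 1GbE path should report several hundred Mb/sec or more.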
-
Understood. Thank you George. I will take some time and do some testing with some different kernels and see where I end up.
I honestly think it is hardware/software related, but I am going to work through the steps to prove it today. My network setup hasn’t changed, and the server itself is using a SATA SSD, so read/write speeds shouldn’t be an issue either. All cabling is CAT5e or better.
Pic of switch setup:
Pic of “station”
Power supply on the back is used to power the screens, have KVM switch for USB Mouse/keyboard for navigating Fog menu (pre deployment).
-
@cdutko and @george1421 My memory is kind of hazy and I apologize for being so out of the loop lately, but I just wanted to share what I remember, in case it helps in understanding the potential cause of the issue.
As you stated George, you may remember the whole VMD configuration within the BIOS of the host machine posing a potential issue. This is most likely the culprit, though I don’t fully understand what it does exactly. However, this is due to the kernel as well.
If my memory serves me correctly, at some point there was a change in how the kernel calculates hard drive partitions, which we did have a patch for at some point. Memory is a bit hazy, but it was a patch to the partclone functions because editing a kernel is pretty difficult to do (though again I could be misremembering here).
As for adjusting the parameters, it was more about page sizing, and disabling VMD allowed the speeds to improve. It seems (to me) that the kernel element that brought the slowness issue wasn’t/isn’t really a bug, but rather an improvement to allow the VMD capabilities to actually be used appropriately. This does cause a detriment, it seems, but this is semi-intentional.
I hope this makes sense and maybe allows a better understanding of the issue, and potentially a way for us (the FOG Team) to narrow down the issue and maybe come up with a better fix than “adjust this item in your BIOS for every machine that has the problem”.
Of course you can also simply use the older kernel that doesn’t appear to have this issue.
-
@Tom-Elliott thank you for the detailed response.
My host system is quite an old build, so I don’t believe I even have those VMD settings to enable/disable even if I wanted to!
@george1421 / @Tom-Elliott what is interesting is that the slower speeds do seem to be entirely related to the kernel update. I don’t think that the old Celeron computers that I am imaging have enough horsepower to use the new/enhanced features that you guys implemented.
I reverted my bzImage files to the old kernel that I was using (4.19.145) and my speeds have returned to the “FAST” speeds that I am expecting.
I need some way now to be able to use both the old Kernel for my (for lack of a better term) Legacy computers and then use the new Kernel for my Win10 11th/12th gen builds.
-
@cdutko said in Slow Image Deployment Speeds after Updating BZImage Files:
Is it possible to set this Kernel on the ‘image’ level instead? Or is Fog ‘smart’ enough to know that for all computers being deployed with this image should be associated with the Host kernel?
No and No. The kernel can only be set for hosts and/or globally.
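The lookup described here boils down to "host setting first, global default otherwise". The sketch below is only a model of that rule as stated in this thread, not FOG’s actual code; the function name and the file names are hypothetical.

```python
from typing import Optional


def kernel_for_host(host_kernel: Optional[str], global_kernel: str) -> str:
    """Pick the kernel for a boot: the host-level value wins when set."""
    # An unregistered machine has no host record, so it can only get the global default.
    return host_kernel if host_kernel else global_kernel


# Hypothetical file names: old kernel as global default, newer one set per host.
print(kernel_for_host(None, "bzImage4.19.145"))             # unregistered or unset host
print(kernel_for_host("bzImage5.15.x", "bzImage4.19.145"))  # registered newer machine
```

With the old kernel as the global default, unregistered legacy machines keep the fast path and only the newer models need a host record.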
-
Hmm… I guess I would have to do a quick registration of the newer computers and manually associate them with the newer kernel?
I’m honestly not really sure what my best course of action is here.
-
@cdutko Please go back to the questions I posted in my first answer. Do you see the same slowness on all hardware? Have you tried different models? What kind of hardware are we talking about?
Have you checked the VMD setting in the BIOS as mentioned before?
The Spectre mitigation George mentioned is being discussed here: https://forums.fogproject.org/topic/16508/enable-or-disable-speculation_mitigations-in-the-linux-kernel (Unfortunately the test kernels are long gone.)
-
I was not seeing the slow speeds with the newer equipment we are imaging. They are 11th gen Intel systems.
I do not have any BIOS settings with respect to VMD.
The updated kernel seemed to only affect the performance of my previous, older spec units (https://www.advantech.com/en/products/1-2jkjm3/ppc-3100s/mod_1a0afa16-fab0-4642-aec4-504d02832d28).
-
@cdutko said in Slow Image Deployment Speeds after Updating BZImage Files:
The updated kernel seemed to only affect the performance of my previous, older spec units (https://www.advantech.com/en/products/1-2jkjm3/ppc-3100s/mod_1a0afa16-fab0-4642-aec4-504d02832d28).
Interesting piece of hardware. What kind of hard drive do you use in those?
We should really get into testing the different parts of the hardware to find out what is causing the slowness. Please schedule a debug deploy, boot it up and when you get to the console run the following command:
dd if=/dev/zero of=/dev/sda bs=4G count=1 oflag=direct status=progress
(depending on the drive you might need to use nvme0n1 instead of sda) - HINT: Be aware this will wipe the data off the drive!
Do this three times in a row and take a picture of the output on screen to post here.
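dd prints the bytes copied and the elapsed seconds at the end of each run, so the three runs can be averaged and put in the same GB/min unit that partclone shows. A small sketch, assuming decimal units; the function names are made up and the run values are placeholders, not real results.

```python
from statistics import mean


def mb_per_sec(bytes_copied, seconds):
    """Throughput of one dd run in MB/sec (decimal megabytes)."""
    return bytes_copied / seconds / 1_000_000


def summarize(runs):
    """Return (average MB/sec, average GB/min) over (bytes, seconds) pairs."""
    average = mean(mb_per_sec(copied, elapsed) for copied, elapsed in runs)
    return average, average * 60 / 1000


# Placeholder runs (bytes copied, seconds): replace with the figures dd prints.
runs = [(1_000_000_000, 10.0), (1_000_000_000, 8.0), (1_000_000_000, 12.5)]
avg_mb_s, avg_gb_min = summarize(runs)
print(f"average {avg_mb_s:.1f}MB/sec, about {avg_gb_min:.2f}GB/min")
```

If the disk alone cannot sustain more than the slow imaging rate, the bottleneck is disk IO under that kernel rather than the network.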
-
Yeah it is a Panel PC that we use in our printing hardware.
With that particular model we are using an SQF-SMSM2-64G-SBE; it is an SQF mSATA 640 64G MLC (40~85°C).
Am I supposed to run this command on a Client that I am deploying to? Sorry I am probably not using FOG in its intended way as I only register the hosts that I capture images from and none of the clients that I deploy images to ever get registered on the system.
Also would it make sense to do this with the Old Kernel? Since that is working… I’m assuming we want to run this Debug command with the new kernel that is ‘not working’ for me.
-
@cdutko said in Slow Image Deployment Speeds after Updating BZImage Files:
Am I supposed to run this command on a Client that I am deploying to?
Yes, and the client needs to be registered for it to run a debug deploy task. If you want to figure this out you need to look into the details and run those kinds of commands which you can’t if the client is not registered.
Also would it make sense to do this with the Old Kernel? Since that is working… I’m assuming we want to run this Debug command with the new kernel that is ‘not working’ for me.
Well I suggest you do the first round of tests using the new kernel to get some figures. This will tell us if it’s disk IO causing the slowness or not.