Slowness after upgrade to 1.5.7.102 (dev branch)



  • We have seen a major slowdown when deploying images after upgrading to a newer dev branch to fix some issues we had (I am now on 1.5.7.102). I went from roughly 13GB per minute to 2GB per minute or slower. Very frustrating. I tried capturing the image a couple of different times with different compression settings, among other things. I tried multiple Dell Optiplex models and all of them show the same slowness. I hit this issue on two separate servers (our college has 2 campuses, so two different servers). It almost feels like the kernel change did this. We were on “5.1.16 mac nvmefix” but then upgraded to “4.19.101”, which came with the 1.5.7.102 install.

    I am interested in any fix for this as my desktop support team is very frustrated at the moment. I am happy to test out any theories to help this along. Would hate for others to run into this as well. Model information of the computers used is below:

    Dell Optiplex 9010 - 8gb ram, core i5, 500gb crucial SSD drive
    Dell Optiplex 9020 - 12gb ram, core i5, 500gb crucial SSD drive
    Dell Optiplex 7050 - 16gb ram, core i7, 500gb crucial SSD drive
    Dell Optiplex 7060 - 16gb ram, core i7, not sure about the hard drive



  • @Sebastian-Roth Thank you for all that you and everyone else do. I agree that it was a good choice to make this a second topic. This is working well now and I greatly appreciate all the assistance!


  • Senior Developer

    @rogalskij Seems like I was actually wrong about you having a different issue than discussed in the other topic. Nevertheless I think it was good to have two topics to not mix up the information on who had tested what.

    Thanks for your effort. I will update the official binaries tomorrow.



  • @Sebastian-Roth Testing with the newer init with partclone 0.3.13 and kernel 4.19.101 seems to fix my slowness when deploying images. I am seeing speeds of 9GB per minute or more in my testing now. I am guessing it is the partclone version that causes the slowness in some environments. I will continue to test and use partclone 0.3.13 to verify this theory.

    speed2.jpg


  • Senior Developer

    @rogalskij said in Slowness after upgrade to 1.5.7.102 (dev branch):

    I tried using the original version of init.xz but it claimed my memory was too low or something of that nature

    Can you try using init.xz and bzImage from https://fogproject.org/binaries1.5.7.zip as well as from https://fogproject.org/binaries1.5.3.zip? Always use both init.xz and bzImage from the same ZIP archive when testing. See if any of these speed things up again. Note that you have to re-capture the image before you can deploy, because those archives use an older version of partclone that is not able to deploy newer-style images.

    You might also take a picture of the actual memory error on screen if you still get it. We might be able to figure out what it is about.
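
    Since the advice above is to always pair init.xz and bzImage from the same ZIP archive, here is a minimal Python sketch of one way to confirm that the files installed on the server really come from a given archive. It is only an illustration: the function name, the idea of matching archive members by base name, and the directory you pass in are assumptions, not part of FOG.

```python
from __future__ import annotations

import hashlib
import os
import zipfile


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def installed_matches_archive(zip_path: str, ipxe_dir: str,
                              names: tuple[str, ...] = ("init.xz", "bzImage")) -> dict[str, bool]:
    """For each name, report whether the installed file is byte-identical
    to the archive member with the same base name.

    Assumption: the archive's internal folder layout is unknown, so members
    are matched by base name only (if a base name occurs twice, the last
    one listed in the archive is used).
    """
    result: dict[str, bool] = {}
    with zipfile.ZipFile(zip_path) as archive:
        members = {os.path.basename(m): m
                   for m in archive.namelist() if not m.endswith("/")}
        for name in names:
            installed = os.path.join(ipxe_dir, name)
            if name not in members or not os.path.isfile(installed):
                result[name] = False
                continue
            with open(installed, "rb") as fh:
                installed_hash = sha256_bytes(fh.read())
            result[name] = installed_hash == sha256_bytes(archive.read(members[name]))
    return result
```

    If either entry comes back False, the installed init.xz and bzImage are probably not a matched pair from that archive, which would make the speed comparison unreliable.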



  • @Tom-Elliott I tried some newer kernels like 5.3.3 as well as what shipped with 1.5.7 (kernel version 4.19.48). I tried using the original version of init.xz but it claimed my memory was too low or something of that nature so I switched it back. The speed didn’t seem to change at all on any of the different kernels, almost like the speed wasn’t related to those kernels. Also as another piece of information I am using UEFI pxe booting with ipxe.efi as the boot file.

    I just updated to 1.5.7.115 but no change at all. Just for full disclosure.


  • Senior Developer

    @rogalskij what’s the lowest version kernel you’ve tried?



  • @Tom-Elliott It is possible it is something in my environment. Typically it is a single machine (imaged either via the console, or directly from the machine itself by a member of our desktop support team). It seemed related to the version because I first noticed it after I upgraded from 1.5.7 to the dev version of 1.5.8 and then later the dev version of 1.5.7.102. It went from like 12GB/min to 4GB/min or 1GB/min. The servers are production boxes so I can’t make too many changes without potentially affecting others but it still seems like it is related to the update I did. Perhaps the update of partclone? Also I noticed that when I pulled an image today for testing, the capture was super quick at 10GB/min but the deploys to the same machine and others like it were 4GB or less. I feel like I am missing something extremely mundane and obvious.


  • Senior Developer

    @rogalskij While I understand the slowdown is problematic, some of it seems bound to network speeds. For your tests, are you imaging only a single machine, or several at the same time? Are they on a separate network or a congested one? No matter how fast an SSD you have, these things need to be considered as well. What speed does the network in question run at?

    Please don’t take this as the only word. I just want to best understand the whole.
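
    To put the reported rates next to a network link speed, here is a small Python sketch of the unit conversion. It assumes the GB figures are decimal gigabytes; whether the rate shown on the partclone screen counts compressed bytes on the wire or decompressed data written to disk is also an assumption, so treat the result as a rough comparison only.

```python
def gb_per_min_to_mbit_per_s(gb_per_min: float) -> float:
    """Convert decimal gigabytes per minute to megabits per second."""
    bytes_per_min = gb_per_min * 1_000_000_000  # assumes 1 GB = 10^9 bytes
    bits_per_s = bytes_per_min * 8 / 60
    return bits_per_s / 1_000_000


# Rates quoted in the first post of this topic.
for label, rate in [("before upgrade", 13.0), ("after upgrade", 2.0)]:
    mbit = gb_per_min_to_mbit_per_s(rate)
    print(f"{label}: {rate} GB/min is about {mbit:.0f} Mbit/s")
```

    Under those assumptions, 13GB per minute works out to more than a 1 Gbit/s link can carry, which suggests the displayed rate is not a raw wire speed, while 2GB per minute is well below gigabit line rate.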



  • @Sebastian-Roth They are normal SATA SSDs for the most part. However, a member of the desktop team told me today that they tried to image an Optiplex 7070 with an M.2 solid state drive (the RAM-style drives) and it ran at around 1.4GB per minute.

    I will certainly try to make some changes, but the fixes that came with the dev version far outweigh the speed issues I am having at the moment. Hoping some bug fixes in the 1.5.8 version will do the trick. I will keep testing as I can. Thank you all for your help for now!


  • Senior Developer

    @rogalskij In your initial post you said:

    Dell Optiplex 9010 - 8gb ram, core i5, 500gb crucial SSD drive
    Dell Optiplex 9020 - 12gb ram, core i5, 500gb crucial SSD drive
    Dell Optiplex 7050 - 16gb ram, core i7, 500gb crucial SSD drive
    Dell Optiplex 7060 - 16gb ram, core i7, not sure about the hard drive

    Sounds pretty much like you have the same SSD in all those machines, right? Are those normal SATA SSDs or NVMe drives?


  • Senior Developer

    @rogalskij said in Slowness after upgrade to 1.5.7.102 (dev branch):

    The speed is the exact same however.

    Too bad!

    I tried using the old init.xz file from 1.5.7 but it claimed I didn’t have enough memory.

    Can you try using init.xz and bzImage from https://fogproject.org/binaries1.5.7.zip as well as from https://fogproject.org/binaries1.5.3.zip? See if any of these speed things up again. Note that you have to re-capture the image before you can deploy, because those archives use an older version of partclone that is not able to deploy newer-style images.

    Do I just deal with it until stable version of 1.5.8 comes out maybe?

    We can’t fix this without knowing what exactly is causing this. We don’t have your hardware here and so we need your help to test things and report back.



  • @Sebastian-Roth said in Slowness after upgrade to 1.5.7.102 (dev branch):

    file /var/www/html/fog/service/ipxe/bzImage533

    Good catch, I found the typo and got it to deploy. The speed is exactly the same, however. I stepped through all the steps and verified the 5.5.3 kernel, and the speed is around 950MB per minute. Way slower than my usual. Partclone is 0.3.12. I tried using the old init.xz file from 1.5.7 but it claimed I didn’t have enough memory. Very, very odd. Do I just deal with it until the stable version of 1.5.8 comes out, maybe?


  • Senior Developer

    @rogalskij Please run file /var/www/html/fog/service/ipxe/bzImage533 on your FOG server command line and post output here.

    EDIT: Reading the earlier posts I think this is a typo: bzImage533 vs. bzImage553
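
    A mismatch like bzImage533 vs. bzImage553 can also be caught by comparing the name entered in the host’s kernel field against the files actually present in the iPXE directory. The following Python sketch is only an illustration: the function name and the similarity cutoff are assumptions, and it needs to run somewhere that can read the directory.

```python
from __future__ import annotations

import difflib
import os


def check_kernel_name(ipxe_dir: str, host_kernel: str) -> tuple[bool, list[str]]:
    """Return (exists, near_matches) for the kernel file name set on a host.

    The comparison is case-sensitive, like the file lookup itself.
    """
    names = sorted(os.listdir(ipxe_dir))
    exists = host_kernel in names
    # Suggest similarly named files, which is useful when the name was mistyped.
    near = difflib.get_close_matches(host_kernel, names, n=3, cutoff=0.8)
    near = [n for n in near if n != host_kernel]
    return exists, near


# Example call, using the directory mentioned in this topic:
# exists, near = check_kernel_name("/var/www/html/fog/service/ipxe", "bzImage533")
```

    If exists is False and near contains a similar name, the host kernel field most likely holds a typo.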



  • @george1421 I tried the debug method. It booted from bzImage533 but failed to start. Below is the error I received while attempting to boot using that bzImage:

    ipxe_error.jpg


  • Moderator

    @rogalskij When you deploy an image in the web ui and then pxe boot the target computer, iPXE boots and then calls the boot.php script on the server. Right after boot.php is called it should transfer bzImage and init.xz. The transfer is super fast, so it’s possible to miss it.

    So let’s take this route. Go to tasks and close out any open tasks, then go back to schedule another deploy, but this time tick the debug checkbox before you hit the schedule task button. Now pxe boot the target computer. After a few screens of text that you need to clear with the enter key, you will be dropped to a FOS Linux command prompt. At the command prompt key in uname -a and press enter. That will print out the kernel version and name. The version number should be 5.5.3 if the proper kernel is booting.

    If that is the case, then key in fog and press enter at the command prompt. This is FOG in debug mode. You will need to press the enter key at each break point in the program, but you will be able to single step through the deployment. It will get to the partclone screen so you can see the transfer rates.



  • @george1421 Oddly, when I deploy it doesn’t show any screen that mentions bzImage or init.xz at all. Is this because I am deploying using ipxe.efi for booting? I am actually a bit ignorant of what these files do. Does the kernel boot the host client, or does ipxe.efi boot the client?



  • @george1421 I watched during the deploy but didn’t see anything relating to which bzimage version it is using. Odd. I have a feeling it is still using the other bzimage. I am going to test by backing up the current 4.19.101 bzimage and renaming the bzimage533 to “bzimage”. Just a temporary test.


  • Moderator

    @rogalskij In the web ui, go to the host definition for that specific target computer. On the main page there is a field called kernel. In that field enter the custom kernel name bzImage553 (watch your case). Then capture/deploy the image. When you pxe boot the computer, watch the iPXE screen; you should see it transfer bzImage553 to the target computer along with init.xz.



  • @george1421 said in Slowness after upgrade to 1.5.7.102 (dev branch):

    /var/www/html/fog/service/ipxe

    Just tried the new Kernel, I moved it to: /var/www/html/fog/service/ipxe and for the Optiplex 7050 in question I changed “Host Kernel” to “bzImage533”. Is that all I had to do to make that machine image using that 5.3.3 kernel? I didn’t see a way for me to know if I was using the right kernel or not.

