Upgrade from 1.5.7 to 1.5.8 issues
@Chris-Whiteley So just for clarity, the speed drop you are seeing is with which inits? The ones from Sebastian’s link or the 1.5.7 inits?
@Sebastian-Roth After a test with the new init I am still seeing the speed decrease. It takes almost double the time it used to. Pushing out my images used to take around 2:30 minutes and now it takes 4:17.
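For anyone wanting to put a number on that, here is a minimal Python sketch of the arithmetic using only the two durations quoted above (2:30 before, 4:17 after). The helper name `to_seconds` is made up for illustration and is not part of FOG.

```python
def to_seconds(clock):
    """Convert an 'm:ss' string such as '2:30' into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Durations quoted in the post above.
before = to_seconds("2:30")
after = to_seconds("4:17")

extra = after - before   # additional seconds per deployment
ratio = after / before   # how many times longer the deployment takes

print(f"extra time per deploy: {extra // 60} min {extra % 60} s")
print(f"slowdown factor: {ratio:.2f}x")
```

With these figures the deployment takes 1 min 47 s longer, a factor of roughly 1.7.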
@Chris-Whiteley 3-5 minutes is definitely a bigger deal than 0-30 seconds. I was hoping I was right, but I guess not. Have you tried the suggested changes to the kernel?
@JJ-Fullmer Thanks for the update. It is taking the machine considerably longer. Now…longer is relative at about 3-5 extra minutes, but if you have a ton to image it can be painful.
Just wanted to chime in with another report on a speed change between 1.5.7 and 1.5.8
1.5.7 ~22 GiB/min
1.5.8 ~11 GiB/min
This is on NVMe drives, and we have a gigabit port aggregation on the main deploying node (in case you’re wondering how we got it going so fast).
However, on 1.5.7 there was always a slow but steady drop in speed. It would start at 20-25 GiB/min and slowly drop every couple of seconds. But I never cared much since the ~20 GB image was done deploying in 2-3 minutes each time. In 1.5.8 it isn’t doing the speed drop and the overall time taken is about the same. It just cycles between slightly below and slightly above 11 GiB/min (e.g. 10.58 - 11.03 or something along those lines). Looking at some of my recent imaging times from just before and just after the upgrade to 1.5.8, they’re all at about 2 minutes 30 seconds. The only real variation appears to be the hardware being imaged, which is to be expected.
Point being, perhaps there isn’t actually a speed change but rather a more accurate overall average speed for the whole process instead of attempting a realtime speed? Or maybe just a generally more steady speed? Or just a better way of calculating the displayed imaging speed?
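A rough way to check this idea is to compute the whole-run average from image size and elapsed time and compare it with what the partclone screen displays. The sketch below uses only the approximate figures from this post (a ~20 GB image, about 2 minutes 30 seconds). The function name is hypothetical, and the result is only indicative: the image size is approximate, GB and GiB differ by about 7%, and the logged duration may include steps other than the data transfer itself.

```python
def average_rate_gb_per_min(image_size_gb, elapsed_seconds):
    """Average throughput over a whole deployment, in GB per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return image_size_gb / (elapsed_seconds / 60.0)


# Approximate figures quoted in this post.
image_size_gb = 20.0
elapsed_seconds = 2 * 60 + 30

overall = average_rate_gb_per_min(image_size_gb, elapsed_seconds)
print(f"overall average: {overall:.1f} GB/min")
```

Under those assumptions the whole-run average comes out at about 8 GB/min, which matches neither the ~22 nor the ~11 GiB/min shown on screen. That suggests the displayed rate and the total deploy time are measuring different things, so the durations in the imaging log are probably the safer basis for before/after comparisons.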
@Chris-Whiteley Maybe take a look in the web gui at the report viewer -> Imaging log and see if there’s actually a difference in time for your images deploying before and after the upgrade? I’m finding mine are all still within 0-30 seconds of the same time.
@Sebastian-Roth I understand completely where you are coming from and how frustrating it could be to have someone not try and help out the community. I take no offense at all.
I will get working on this right away and let you know my findings.
@Chris-Whiteley No need to say sorry. I know we are all pretty busy and I kind of regret having thrown this at you. Thanks for not taking it as offense. Wasn’t meant to.
Here is the init proposed: https://fogproject.org/inits/init-1.5.8-pc0.2.89.xz
Download it and put it in /var/www/html/fog/service/ipxe/. Either rename it to init.xz, or leave the filename as is and set that filename as the Host Init option in the settings of one of your test hosts to use it.
@Sebastian-Roth I am sorry that I didn’t post anything or submit my feedback. As a SysAdmin it is hard sometimes to find the time to start trying to dig into issues when you are busy and you know that going back to the version you had fixes the issue and you can move on. You guys have always been incredible and you have a team of people here that truly wants to help. I so appreciate the time and energy you guys spend tirelessly making this into a product I recommend to anyone that will listen to me.
I held off on doing 1.5.7.X since I had that issue with speed. I was hoping that 1.5.8 was going to be different.
While I totally understand that not everyone can be pushing the edge (e.g. using latest dev-branch) we can only fix the things we are aware of. There is no point in hoping something will be fixed if we don’t know about it beforehand. Hope you don’t get me wrong here. I don’t want to sound harsh or anything, just pointing out that we need people to test things in their environments and report when issues come up.
Anyway, let’s face it and try to figure out what’s wrong. I’d suggest I build fresh inits with the only difference of partclone being reverted to 0.2.89. If that turns out to speed things up again for you we are sure it’s just that and we can dig into finding the speed issue in the new partclone version. Will be just a few minutes till I post a link for you to download.
@Sebastian-Roth Thanks for the heads up! I have not done any imaging since upgrading. I held off on doing 1.5.7.X since I had that issue with speed. I was hoping that 1.5.8 was going to be different. Luckily I have it as a VM and I just reverted my snapshot so I could do some testing with you guys.
@Chris-Whiteley Just be aware, image format has changed between partclone 0.2.89 (FOG 1.5.7) and partclone 0.3.13 (FOG 1.5.8). While you can deploy all your old images using the newer partclone you cannot deploy images captured with 0.3.13 using partclone 0.2.89!
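To make the compatibility rule concrete, here is a small Python sketch that encodes only the two partclone versions named in this post. The function `can_deploy` is hypothetical and deliberately refuses to guess about any other version, since the thread does not say where exactly between 0.2.89 and 0.3.13 the format changed.

```python
OLD_PARTCLONE = "0.2.89"   # shipped with FOG 1.5.7
NEW_PARTCLONE = "0.3.13"   # shipped with FOG 1.5.8


def can_deploy(captured_with, deploying_with):
    """Return True if an image captured with one partclone version can be
    deployed with the other, based only on the rule stated in this thread."""
    known = {OLD_PARTCLONE, NEW_PARTCLONE}
    if captured_with not in known or deploying_with not in known:
        raise ValueError("only 0.2.89 and 0.3.13 are covered by this sketch")
    # The single failing combination: new-format image, old partclone.
    return not (captured_with == NEW_PARTCLONE and deploying_with == OLD_PARTCLONE)


for captured in (OLD_PARTCLONE, NEW_PARTCLONE):
    for deploying in (OLD_PARTCLONE, NEW_PARTCLONE):
        print(captured, "->", deploying, can_deploy(captured, deploying))
```

In practice this means any image captured while testing 1.5.8 has to be recaptured (or deployed with the newer init) if you roll back to the 0.2.89 init.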
@Chris-Whiteley Can we get you to reinstall 1.5.8?
Make sure the version of bzImage is at 4.19.100+
file /var/www/html/fog/service/ipxe/bzImage
If not, grab that from here: https://fogproject.org/kernels/Kernel.TomElliott.188.8.131.52
Download this file https://fogproject.org/binaries1.5.7.zip, take the init.xz from that zip file, and move it to the
/var/www/html/fog/service/ipxe
directory, overwriting what 1.5.8 installed.
Now try to PXE boot. The configuration you (will) currently have is the FOG server at 1.5.8 and FOS Linux with the current kernel and the 1.5.7 virtual hard drive. We don’t normally like to mix the version of FOS Linux with the version of the FOG server, but we need to see if this condition corrects the issue. Note this is not a fix, only a test condition. If need be you can run in this (specific) state until the devs can sort this out.
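If you want to script the kernel check instead of reading the `file` output by eye, something like the following Python sketch could work. It assumes the output contains the text "version X.Y.Z", which is what `file` usually prints for a Linux bzImage, but the exact format can vary between `file` releases. The two sample strings are made up for illustration and are not real output from a FOG server.

```python
import re

MINIMUM_KERNEL = (4, 19, 100)  # the "4.19.100+" requirement from this post


def kernel_version_from_file_output(file_output):
    """Extract (major, minor, patch) from `file` output, or None if absent."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", file_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def kernel_is_new_enough(file_output, minimum=MINIMUM_KERNEL):
    """True if the reported kernel version is at least `minimum`."""
    version = kernel_version_from_file_output(file_output)
    return version is not None and version >= minimum


# Hypothetical sample strings, for illustration only.
sample_new = "bzImage: Linux kernel x86 boot executable bzImage, version 4.19.123 (builder@host) #1 SMP"
sample_old = "bzImage: Linux kernel x86 boot executable bzImage, version 4.15.2 (builder@host) #1 SMP"

print(kernel_is_new_enough(sample_new))
print(kernel_is_new_enough(sample_old))
```

On a real server the input string could come from `subprocess.run(["file", "/var/www/html/fog/service/ipxe/bzImage"], capture_output=True, text=True).stdout`. Comparing versions as integer tuples avoids the string-comparison trap where "4.19.99" would sort after "4.19.100".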
@Sebastian-Roth I have done some testing and the version of partclone is v0.3.13. I have mostly NVMe machines, which will make it difficult to want to update to 1.5.8. 1.5.7 did not have the same issues with NVMe drives. I also tested it on one of my machines that I put a normal SSD in, and that machine was also very slow. It is my imaging machine that I make my golden image on; it is usually around 14 GB/min download and now it is 9.50 GB/min. So it looks like it is slow in both scenarios. The only other thing in common is that they are all Dell machines, OptiPlexes and Latitudes.
Thanks for reaching out!
@Chris-Whiteley The speed change can have different causes, and while we have had reports about this, I am not sure I see a connection with what you are seeing.
The slowness described in the topics mentioned by @george1421 was due to partclone 0.3.12, which we had added in dev-branch, but I updated it to partclone 0.3.13 just before the release of FOG 1.5.8 (as mentioned in the topics), as people reported speeds were back to normal with that version. Please pay attention to the version number you should see in the blue partclone screens and let us know what you see.
The other thing that comes to my mind is that we have heard of certain NVMe drives being very slow with newer Linux kernels. Though on the other hand this should have been the case in 1.5.7 already and I don’t see why it would come up with an update to 1.5.8.
So if you see partclone 0.3.13 and don’t have NVMe drives then we might look at an even different issue here and will need to start debugging it with your help. Start off by deploying to different hardware (if you have) and let us know if it’s consistently slower on all machines.
In the first post Sebastian posts a link to a new/updated init as well as the latest linux kernel. At this point I don’t know if these inits have been integrated into the dev branch that is at 184.108.40.206 the last I knew. But if you upgrade to 1.5.8 again then install the patched inits and updated kernel then you should see your speed return.
@Chris-Whiteley There was another thread about the same issue. Let me find it. I think there was a hot fix to partclone that resolved it. Let me find it.