Partclone Upload Stalling
-
@Sebastian-Roth said in Partclone Upload Stalling:
@jemerson93 Can you please do me a favor testing-wise? I just updated the 4.19.1 kernels to have the Hyper-V patch included as well. Can you please downgrade to that kernel version on your FOG server and test again? The hard stall shouldn’t be happening with this version anymore.
About the other issue: I am looking forward to hearing what you guys find out testing this. Maybe we can report that to the kernel developers when we find out exactly which version is causing this.
Hi Sebastian,
Unfortunately, I still got the stall. As a note, I did get the legacy image to upload on 4.19.6 yesterday. It was extremely slow but did upload. Trying again right now on 4.19.6.
I am still setting it up at my home to test as well.
-
@jemerson93 I meant the stall from the initial post.
-
@Sebastian-Roth said in Partclone Upload Stalling:
@jemerson93 I meant the stall from the initial post.
Hi Sebastian,
I did not see the stall from the initial post. Now I am just running into the self-detected stall.
-
@Sebastian-Roth said in Partclone Upload Stalling:
@jemerson93 I meant the stall from the initial post.
I also attempted 4.19.6 and got the same error. I’ll chalk it up as luck that I got it uploaded, but now I can’t re-upload.
-
@jemerson93 It’s very interesting that we see this or similar issues on many different platforms - hardware as well as virtualized environments. There does not seem to be a general answer, and these kinds of things have been around for years - when you search for rcu_sched you find tons of messages in kernel-related mailing lists and forums.
But still, this has been showing up more and more lately in the FOG forums as well, and we are not sure why yet. Moving back to the 4.15.2 kernel helped most people. We have seen issues with the newer init files not being compatible with this kernel, so I might ask you to manually download those more compatible inits (64 bit and 32 bit). Rename/backup the ones you have in /var/www/html/fog/service/ipxe and put those in place (names init.xz and init_32.xz). Then downgrade to the 4.15.2 kernel and see if you still get the ugly error you had before when trying to downgrade the kernel.
If that doesn’t work then I expect this particular issue to be a problem with Hyper-V and the Linux kernel option PAGE_TABLE_ISOLATION - Meltdown patch (ref). But that would be kind of strange as we have this option enabled in all later kernel versions. Nevertheless you can try going back to even older versions of kernel and init. Find those used in FOG 1.5.0 here (kernel 4.13.4).
Besides that, I have tried to find current information specific to Hyper-V. Not much I could find, really. https://access.redhat.com/solutions/3743631 (anyone who has access to RedHat stuff? @Wayne-Workman?)
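For anyone following along, the backup step above can be sketched in Python. This is only a minimal sketch, not an official FOG tool: the function name backup_inits and the timestamped backup suffix are made up for illustration, and it assumes the init files are named init.xz and init_32.xz in the directory mentioned in the post. It copies the files rather than renaming them, so the originals stay in place until you overwrite them with the downloaded inits.

```python
import shutil
import time
from pathlib import Path


def backup_inits(ipxe_dir, names=("init.xz", "init_32.xz"), suffix=None):
    """Copy each existing init file to <name>.<suffix> and return the backup paths.

    backup_inits is a hypothetical helper, not part of FOG.
    """
    ipxe_dir = Path(ipxe_dir)
    # Assumed naming scheme: a timestamp keeps repeated backups from colliding.
    suffix = suffix or time.strftime("bak-%Y%m%d-%H%M%S")
    backups = []
    for name in names:
        src = ipxe_dir / name
        if not src.is_file():
            continue  # nothing to back up for this name
        dst = src.with_name(f"{name}.{suffix}")
        shutil.copy2(src, dst)  # copy2 also preserves timestamps and mode bits
        backups.append(dst)
    return backups


if __name__ == "__main__":
    # Directory taken from the post; adjust if your web root differs.
    for path in backup_inits("/var/www/html/fog/service/ipxe"):
        print("backed up to", path)
```

Run it as a user that can write to the ipxe directory, then put the downloaded inits in place and downgrade the kernel as described.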
-
@Sebastian-Roth I don’t have access to their articles anymore - but when I did have access back at a previous job, I found if I google’d enough I could find the same stuff elsewhere.
Has a duplicate IP been looked for yet, for the fog server and for the VMs?
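If you keep a list of which host has which address, a quick way to spot a duplicate is to group hosts by IP. This is a minimal sketch: the host names and addresses below are invented for illustration (the addresses come from the 192.0.2.0/24 documentation range), and find_duplicate_ips is a hypothetical helper. It only checks a list you supply and does not probe the network.

```python
from collections import defaultdict


def find_duplicate_ips(assignments):
    """Return {ip: [hosts]} for every IP claimed by more than one host."""
    hosts_by_ip = defaultdict(list)
    for host, ip in assignments:
        hosts_by_ip[ip].append(host)
    return {ip: hosts for ip, hosts in hosts_by_ip.items() if len(hosts) > 1}


# Hypothetical inventory: replace with your FOG server, VMs and workstations.
inventory = [
    ("fog-server", "192.0.2.10"),
    ("hyperv-vm-1", "192.0.2.21"),
    ("hyperv-vm-2", "192.0.2.21"),
]

print(find_duplicate_ips(inventory))
```

An empty result means no address in the list is assigned twice. It says nothing about devices that are missing from the list.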
-
@Sebastian-Roth said in Partclone Upload Stalling:
@jemerson93 It’s very interesting that we see this or similar issues on many different platforms - hardware as well as virtualized environments. There does not seem to be a general answer, and these kinds of things have been around for years - when you search for rcu_sched you find tons of messages in kernel-related mailing lists and forums.
But still, this has been showing up more and more lately in the FOG forums as well, and we are not sure why yet. Moving back to the 4.15.2 kernel helped most people. We have seen issues with the newer init files not being compatible with this kernel, so I might ask you to manually download those more compatible inits (64 bit and 32 bit). Rename/backup the ones you have in /var/www/html/fog/service/ipxe and put those in place (names init.xz and init_32.xz). Then downgrade to the 4.15.2 kernel and see if you still get the ugly error you had before when trying to downgrade the kernel.
If that doesn’t work then I expect this particular issue to be a problem with Hyper-V and the Linux kernel option PAGE_TABLE_ISOLATION - Meltdown patch (ref). But that would be kind of strange as we have this option enabled in all later kernel versions. Nevertheless you can try going back to even older versions of kernel and init. Find those used in FOG 1.5.0 here (kernel 4.13.4).
Besides that, I have tried to find current information specific to Hyper-V. Not much I could find, really. https://access.redhat.com/solutions/3743631 (anyone who has access to RedHat stuff? @Wayne-Workman?)
Hi Sebastian,
My apologies for the late response. I backed up the old inits, then downloaded the ones you requested and moved them into place. I then downgraded to the 4.15.2 kernel. I did not receive the kernel panic, and the legacy image uploaded successfully. It took much longer than usual (I think about 3 and a half hours, as opposed to the usual 30 minutes), but there was no CPU stall or initial network stall.
-
@Wayne-Workman said in Partclone Upload Stalling:
@Sebastian-Roth I don’t have access to their articles anymore - but when I did have access back at a previous job, I found if I google’d enough I could find the same stuff elsewhere.
Has a duplicate IP been looked for yet, for the fog server and for the VMs?
Hi Wayne,
I did verify there are no duplicate IPs in use between the FOG server and VMs, nor are any in use on individual workstations on the LAN.
-
@jemerson93 said in Partclone Upload Stalling:
It took much longer than usual (I think about 3 and a half hours, as opposed to the usual 30 minutes), but there was no CPU stall or initial network stall.
I find it very strange that it’s going slow with 4.15.2 as well. I will re-compile those kernels too and see if they were missing the patch.
-
@jemerson93 Please test the newly compiled 4.15.2 kernels you find here: https://fogproject.org/kernels/new/
Let us know if those help in speeding up the process and still don’t have the other rcu_sched stall issue.
-
@Sebastian-Roth said in Partclone Upload Stalling:
@jemerson93 Please test the newly compiled 4.15.2 kernels you find here: https://fogproject.org/kernels/new/
Let us know if those help in speeding up the process and still don’t have the other rcu_sched stall issue.
Hi Sebastian,
It may take me a couple of days to get a response back to you. As soon as I can, I’ll do another upload and update you.