@george1421 very well, I’ll test as you specified and report the results later.
Best posts made by fenix_team
-
RE: Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job
Hi guys, I didn’t forget about this topic. I’m just currently dealing with some iPXE booting challenges due to the big differences between the system archs I have here. I’m almost finished, so I’ll soon be able to run the tests you asked for.
So far, some answers:
@Sebastian-Roth said in Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job:
Are you saying that it does work “sometimes” without an issue? Is that on the same kernel version 4.19.6 that is causing the error initially posted? Would make it even harder for us to nail this issue down.
As much as I wanted to answer it technically, the best I have is: yes, it’s kinda random. Our business model demands constant infrastructure changes as our clients point out their needs, so we have lots of machines that, although they are the same model, have slightly different CPUs and BIOS versions. That is a challenging scenario for setting up an application such as FOG as an automation tool, so on each node I have to test which FOS image is the best fit.
So far I have had 4.19.6 bzImage + FOG 1.5.5 init.xz working on about 90% of my systems with no bugs, hangs or issues of any other kind. For the ones where I did find issues, switching to 4.15.2 as suggested by @george1421 fixed the problems, but only when I used the init.xz packed with FOG 1.5.2 binaries.
Using bzImage 4.15.2 + FOG 1.5.5 init.xz gave me kernel panic “FATAL: Kernel too old” messages on every single system I tried it on. The same happens with the same init.xz on every bzImage version from that one up to 4.19.6, which does boot fine and start the task but throws the error from the title of this topic at some point during image deploy/capture tasks (it’s not always the same point, and I didn’t test other kinds of tasks).
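As far as I know, that “FATAL: kernel too old” message comes from the C library inside the init, which refuses to run on a kernel older than the minimum it was built for. Here is a minimal sketch of that kind of version comparison, just to illustrate the idea. The function names are made up by me and the minimum version in the usage lines is a hypothetical placeholder, not the real threshold of the FOG 1.5.5 init:

```python
def parse_kernel_version(release):
    """Turn a release string like '4.15.2' or '4.19.6-fog' into (4, 15, 2)."""
    numeric = release.split("-", 1)[0]          # drop any local suffix
    return tuple(int(part) for part in numeric.split(".")[:3])


def kernel_too_old(running, minimum):
    """Return True when the running kernel is older than the required minimum."""
    return parse_kernel_version(running) < parse_kernel_version(minimum)


# HYPOTHETICAL minimum, for illustration only.
ASSUMED_MINIMUM = "4.19.0"

for release in ("4.15.2", "4.19.6"):
    status = "too old" if kernel_too_old(release, ASSUMED_MINIMUM) else "ok"
    print(release, status)
```

If that is what is going on here, it would explain why the older FOG 1.5.2 init still boots on 4.15.2 while the 1.5.5 one does not.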
@Sebastian-Roth said in Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job:
Trying to figure out what might be causing this on your hardware I started by reading the kernel docs on this. Essentially it says that this can be caused by many different things (see a detailed list in the document linked) and we might need to turn on CONFIG_RCU_TRACE in the kernel to get an idea where things go wrong. But as a start we would need to have a clear picture of the exact error messages on screen.
Ok, I’ll reproduce the error scenario and take a picture of the screen. I’m doing this right now.
@Sebastian-Roth said in Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job:
@fenix_team @george1421 @Quazz Ok, I just compiled inits that should work with kernels all the way back to 4.15.x (64 bit and 32bit). Can you guys give those a try in your environments before I make those the default?
Will test it right after the rcu_sched issue.
-
RE: Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job
@george1421 yes, I understood that, I’m downloading the inits and will test it asap. What I stated was just to confirm why these issues were happening.
-
RE: Error "rcu_sched self detected stall on CPU" on legacy BIOS Capture job
@george1421 @Sebastian-Roth Hello guys! I’m here to say that I have had no problems loading tasks, neither capture nor deploy, since I updated the init files.
I’m using the latest version of bzImage and did extensive tests on the Legacy BIOS systems that were showing the “rcu_sched” warnings, and so far I have not seen them again, nor any other hanging issues.
If I can help with any other kind of tests, please let me know.
Thanks everyone, awesome work!