rcu_sched Error on Host Registration - PC Tablet w/ Dock
-
@george1421 Hmm, if I can get this working that’d be an okay workaround, provided they can still select the image being deployed without having to be registered first. I made a boot USB using the .img file you provided, but when trying to go back into the compatibility menu I get a “db_root:cannot open: /etc/target” message, followed by the same cascading “rcu_sched” errors if left alone long enough.
-
@explosivo98 said in rcu_sched Error on Host Registration - PC Tablet w/ Dock:
I get a “db_root:cannot open: /etc/target” message,
This is just an unrelated symptom of older init/kernel files. Don’t worry about that.
-
@Sebastian-Roth Ah okay. Unfortunately, after showing that message it goes back to the same repeated error. I did attempt to run the debug kernel option in the FOS menu, and it looked like it was about to do something, but then it started showing the rcu_sched error again with more bits of information in between. However, there was a bit more info regarding the system attached to this one.
-
@explosivo98 While it appears you are still nowhere, it does tell us a few things: your issue doesn’t appear to be an issue with PXE booting, because USB booting gave you the same error. I seem to recall us having this rcu_sched error before. I thought that either downgrading the FOS Linux kernel solved the problem or adding a kernel parameter did. I’m going to see if I can find the solution we used before. I can’t remember if it was an acpi kernel parameter or something else at the moment.
-
@explosivo98 Just to be clear, you have more than one of these tablets and each of the tablets is seeing the rcu stall? Because it could be bad memory too, just saying.
-
@george1421 Yeah I have a couple here and they all seem to be doing the same thing. I did find this post from a while ago about some new kernel inits that were manually created to resolve this in the past, but the links don’t seem to work anymore. Not sure if this is what you were referring to or not but in my time scrubbing the forums for the solution I did notice it but couldn’t proceed:
https://forums.fogproject.org/post/121137
-
@explosivo98 Manually register this system in the FOG UI. Then, in the host definition for this suspect system, place the following kernel parameter in the kernel args field:
clocksource=hpet
Then PXE boot that target computer again. This changes the clock source the kernel uses, which affects how the CPUs are metered for stall detection.
-
@george1421 okay, and to be sure this goes under “Host Kernel Arguments”, correct?
-
@explosivo98 Yes, I stated the name from memory. I was close but not exact.
-
@george1421 haha I got you. I went ahead and added that argument but no change
-
@explosivo98 Well nuts. One last thing I can find. Go ahead and remove the kernel parameter we just set. Let’s roll back to the 4.13.4 version of the Linux kernel from here: https://fogproject.org/kernels/Kernel.TomElliott.4.13.4.64 Same process as before: rename it to bzImage4136, place the file in the /var/www/html/fog/service/ipxe directory, then update the host definition boot kernel to bzImage4136.
I’m finding references to a regression introduced in Linux kernel 4.17 that produces an error similar to the one you are seeing.
Just so you’re aware, this isn’t specifically a FOG issue, but rather a Linux kernel error. FOG uses a stock Linux kernel as part of its FOS Linux OS.
One last thing I’m thinking of is to see if we can get around this by forcing the 32-bit kernel to load. I might expect the same results, but you never know.
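
For anyone repeating the rollback steps above, here is a rough sketch of them in Python. The URL, directory and file name come from the post; the function names, the `/tmp` download location and the 0644 permission are assumptions made for illustration, not something FOG prescribes.

```python
import shutil
import urllib.request
from pathlib import Path

# Values quoted in the post above.
KERNEL_URL = "https://fogproject.org/kernels/Kernel.TomElliott.4.13.4.64"
IPXE_DIR = Path("/var/www/html/fog/service/ipxe")
KERNEL_NAME = "bzImage4136"


def download_kernel(url: str, dest: Path) -> Path:
    """Fetch the kernel image to a local file (hypothetical helper)."""
    urllib.request.urlretrieve(url, str(dest))
    return dest


def install_kernel(downloaded: Path,
                   ipxe_dir: Path = IPXE_DIR,
                   name: str = KERNEL_NAME) -> Path:
    """Copy an already-downloaded kernel into the iPXE directory under a new name."""
    if not downloaded.is_file():
        raise FileNotFoundError(downloaded)
    if not ipxe_dir.is_dir():
        raise NotADirectoryError(ipxe_dir)
    target = ipxe_dir / name
    shutil.copyfile(downloaded, target)
    # Assumption: the web server only needs read access to the file.
    target.chmod(0o644)
    return target


# Example (run as a user allowed to write to IPXE_DIR):
#   image = download_kernel(KERNEL_URL, Path("/tmp/Kernel.TomElliott.4.13.4.64"))
#   install_kernel(image)
```

The last step, setting the host's boot kernel to bzImage4136, still happens in the host definition in the FOG UI.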
-
There might be some odd BIOS setting that causes this, worth checking out. (or perhaps a BIOS update)
Might also be worth trying an Ubuntu live USB or something to see if that will work.
-
Same results (black screen, no text) with the 4136 kernel. I looked through the BIOS and there aren’t a ton of options to change, but I did try flipping a few of them on/off with no luck. The 32-bit kernel looped back around to a dark version of the main FOG menu when I tried to choose compatibility mode, which was strange, but then when trying to deploy an image I got a “Could not boot: Error 0x71048283” message three times before it sent me to the iPXE command line.
I’m not sure if it matters, but poking around in the BIOS reminded me that these things do support Android. There are three options in the BIOS for “Droid Mode”, “Android Mode” and “Windows Mode”, all of which are set to disabled on the stock tablet except the Windows option, which says “Windows 8.x”. I don’t know if that means the drives are partitioned in such a way that it would cause issues like this, but that was about the only thing I found looking around in the BIOS for settings to change.
-
Hey all, I’m doing a bit of thread necromancy here to ask if there have ever been any breakthroughs on solving this particular error. These tablets are really the only thing in our arsenal that we still need to manually image with a Clonezilla drive and, well, everyone here loves the way FOG works now that we deploy 99% of our systems using it. I did try one of the newer kernels (4.19.64) for the heck of it but didn’t notice a difference. I wasn’t sure if there were any new experimental kernels to try that might help with this, so I figured I’d ask.
-
@explosivo98 Latest dev build is available here: https://dev.fogproject.org/blue/organizations/jenkins/fos/detail/master/115/artifacts
I don’t believe we have actively tried to hunt this particular bug down, but changes have been made to kernel configs and inits, so worth a shot! (so get both the bzImage and init.xz)
Only use these for testing those devices for now. Still some bugs to kill, but you should be able to see if you can deploy at least.
-
@Quazz dang, no luck. This tablet is x64, so there shouldn’t be a need to try the 32-bit kernel, right? Or are there some cases where the x32 config worked when the x64 wouldn’t?
-
@explosivo98 Can you test something? Manually register this tablet, then in the host definition for that tablet, enter the following in the kernel args field:
acpi=off
In other threads I think that is where we ended up getting things to work. Edit: Just quickly scanning the other threads, I also see
tsc=unstable
The other case was where the processor had many cores (>8) and we had to create a custom kernel with 64 set as the max core count.
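
After setting host kernel arguments like the ones above, it can help to confirm they actually reached the booted kernel. Below is a minimal sketch that parses a kernel command line and reports which expected parameters are missing. `/proc/cmdline` is the standard Linux location for the running kernel's command line; the function names are made up for illustration, and it assumes Python is available wherever the check is run, which may not be true inside the FOS environment itself.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: values containing quoted spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_args(cmdline: str, expected: dict) -> list:
    """Return the expected parameter names that are absent or have another value."""
    seen = parse_cmdline(cmdline)
    return [name for name, value in expected.items()
            if name not in seen or seen[name] != value]


def running_cmdline() -> str:
    """Read the command line of the currently running Linux kernel."""
    return Path("/proc/cmdline").read_text()


# Example on a live system:
#   missing_args(running_cmdline(), {"acpi": "off"})
# An empty list means the parameter was passed through as expected.
```

The same check works for `clocksource=hpet` or `tsc=unstable` by changing the expected dictionary.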
-
@george1421 !! Holy cow, that actually got me past the rcu_sched error! This is farther than I’ve ever gotten with this. Everything looked like it was going to work and I was ready to throw a party, but it failed the getHardDisk check; it can’t detect the hard drive on here now for some reason. The storage on this is a SanDisk DF4064, which is an eMMC drive if that matters. I tried single disk resizable mode as well as multi partition non-resizable and neither worked. Running the compatibility test does show a Fail for the hard drive check as well. Are there special considerations I need to make when dealing with one of these drives?
-
@explosivo98 What version of the FOS Linux kernel are you using?
Also which kernel parameter did the trick?
-
@george1421 Looks like it was the ACPI toggle that did it. I switched back to the newest production kernel after getting it working on the newest dev build, and it worked there too, so right now I’m using 4.19.64.