rcu_sched Error on Host Registration - PC Tablet w/ Dock



  • Hi all, I was thrown a bit of a curveball today when I was told we’re going to be supporting some third-party Windows tablets and will need to be able to image them. I’m getting the “rcu_sched” error when attempting to register the device as a host. I did some research here on the forums, and as far as I can tell this is an issue with the type of hardware, and downgrading the kernel to a prior version normally seems to help? The last post I found on the subject that mentioned this fix was from February, so I’m not sure if that info is still valid, or if the recommended kernel version to get around this (1.5.2) has changed since then.

    I’m not sure if the issue is that the only way to PXE boot these tablets is by plugging them into a dock with an Ethernet port, which might be obscuring the hardware type. I was told by the manufacturers that they image these using MS DISM tools and that “any imaging software that can support UEFI x64 should work”. If this is still a matter of changing the kernel to a previous version I can do that; I just wanted to check beforehand to see if that has changed, since there have been a few revisions between now and the most recent post about this.


  • Moderator

    @explosivo98 Strange that kernel should boot because the only thing added was some apple hardware IDs and an apple disk patch. Nothing for non-apples should be impacted.



  • @george1421 okay, thought I’d check. This is an x64 device.


  • Moderator

    @explosivo98 You should not need the new init for this. That kernel should work with FOG 1.5.7. Now is that tablet a 32 bit machine or a 64 bit?



  • @george1421 would I need a new init file for this as well or no? I tried rebooting with the same debug task in queue from last time and it just froze after loading (successfully) the init file. I rebooted with the task canceled and selected the system compatibility check option and it just went to a black screen after that.



  • @george1421
    Hm, the hardware ID for this looks like it’s just a generic driver; it shows as SD\GenDisk in the device manager. I’m on the command prompt now, but lsblk returns nothing and the grep commands returned “no such file” when run.

    Edit: Trying the kernel now!


  • Moderator

    Doing a bit more research on this, there are several Linux distros that have the same issue with this hardware. I did find a reference to it being fixed in the Linux 5.3.x series. I have a one-off kernel that I compiled for working around the T2 chip in a 2018 Mac that should be 5.3.(something)
    https://drive.google.com/open?id=12St-Wix1io0s0oXhgxAuLlQOVoT9548L


  • Moderator

    @explosivo98 OK, let’s collect some data on two fronts.

    1. From a running Windows computer, can you get the hardware IDs of that storage device? They will be in the device manager. I need both the vendor and device ID.

    2. This one will be a bit more involved. Reschedule a capture/deploy to that target computer, but before you hit the schedule task button, check the debug checkbox. Then PXE boot the target computer. After a few screens of text, which you will have to clear with the Enter key, you should be dropped to the FOS Linux command prompt. At the FOS Linux command prompt, key in lsblk and post the output here (you may need to take a clear picture with a mobile phone to get what we need). Keep this session running because we may need you to look at a startup log file… While you are getting screenshots, does either grep mc0 /var/log/messages or grep mmc /var/log/messages return anything?
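The disk check in step 2 could be sketched roughly as below, for use at the FOS debug prompt. This is only an illustration: the helper name find_mmc_lines is made up here, and it assumes the startup log lives at /var/log/messages as mentioned in the post. Note that grep takes the pattern first and the file second.

```shell
# Hypothetical helper (not part of FOG): print log lines that mention MMC/eMMC.
# Assumes the log is at /var/log/messages unless another path is given.
find_mmc_lines() {
    log_file="${1:-/var/log/messages}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # Pattern first, file second; -i makes the match case-insensitive.
    grep -i 'mmc' "$log_file"
}
```

At the prompt you would run lsblk first, then find_mmc_lines. If both come back empty, the kernel most likely never detected the eMMC controller at all.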



  • @george1421 yeah I am, like I said I tried switching between some of the hard drive types but nothing made a difference, but I assume since running the compatibility test returns a similar error that it probably doesn’t have anything to do with that anyway.


  • Moderator

    @explosivo98 OK, so at this point you are still stuck with the missing disk?



  • @george1421 Looks like it was the ACPI toggle that did it. I switched back to the newest production kernel after getting it working on the newest dev build, and it worked there too, so right now I’m using 4.19.64.


  • Moderator

    @explosivo98 What version of the FOS Linux kernel are you using?

    Also which kernel parameter did the trick?



  • @george1421 !! holy cow, that actually got me past the rcu_sched error! This is farther than I’ve ever gotten with this. Everything looked like it was going to work and I was ready to throw a party but it failed the getHardDisk check, it can’t detect the hard drive on here now for some reason. The storage on this is a Sandisk DF4064 which is an eMMC drive if that matters. I tried single disk resizable mode as well as multi partition non resizeable and neither worked. Running the compatibility test does show a Fail for the hard drive check as well. Are there special considerations I need to make when dealing with one of these drives?

    20200115_132132.jpg


  • Moderator

    @explosivo98 Can you test something? Manually register this tablet, then in the host definition for that tablet, in the kernel args field, enter acpi=off. In other threads I think that is where we ended up getting things to work.

    Edit: Just quickly scanning the other threads, I also see
    tsc=unstable

    The other case was where the processor had many cores (>8); there we had to create a custom kernel with 64 set as the max core count.
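To confirm that a kernel argument such as acpi=off actually reached the booted FOS kernel, one option is to check the kernel command line from the debug prompt. A minimal sketch, assuming /proc/cmdline is readable there (it is on a standard Linux system); the function name has_kernel_arg is invented for this example.

```shell
# Hypothetical helper: succeed if an exact argument is on the kernel command line.
# Reads /proc/cmdline by default; a second argument overrides the file to check.
has_kernel_arg() {
    arg="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Pad with spaces so only whole, space-separated arguments match.
    case " $(cat "$cmdline_file") " in
        *" $arg "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_kernel_arg acpi=off && echo "acpi=off is active" would print the message only if the host's kernel args field was applied on this boot.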



  • @Quazz dang, no luck. This tablet is x64 so there shouldn’t be a need to try the 32bit kernel right? Or are there some cases where the x32 config worked when the x64 wouldn’t?


  • Moderator

    @explosivo98 Latest dev build is available here: https://dev.fogproject.org/blue/organizations/jenkins/fos/detail/master/115/artifacts

    I don’t believe we have actively tried to hunt this particular bug down, but changes have been made to kernel configs and inits, so worth a shot! (so get both the bzImage and init.xz)

    Only use these for testing those devices for now. Still some bugs to kill, but you should be able to see if you can deploy at least.



  • Hey all, I’m doing a bit of thread necromancy here to ask if there’s ever been any breakthroughs on solving this particular error. These tablets are really the only thing in our arsenal that we still need to manually image with a Clonezilla drive and, well, everyone here loves the way Fog works now that we deploy 99% of our systems using it. I did try one of the newer kernels (4.19.64) for the heck of it but didn’t notice a difference. I wasn’t sure if there were any new experimental kernels to try that might help with this or what so I figured I’d ask.



  • Same results (black screen, no text) with the 4136 kernel. I looked through the BIOS and there’s not a ton of options to change, but I did try flipping a few of them on/off with no luck. 32 bit kernel looped back around to a dark version of the main FOG menu when I tried to choose compatibility mode, which was strange, but then trying to deploy an image I got a “Could not boot: Error 0x71048283” message three times before sending me to the iPXE command line.

    I’m not sure if it matters but poking around in the BIOS reminded me that these things do support Android. There’s three options in the bios for “Droid Mode” “Android Mode” and “Windows Mode”, all of which are set to be disabled on the stock tablet except the Windows option which says “Windows 8.x”. I don’t know if that means the drives are partitioned in such a way that it would cause issues like this but that was about the only thing I found looking around in the BIOS for settings to change.


  • Moderator

    There might be some odd BIOS setting that causes this, worth checking out. (or perhaps a BIOS update)

    Might also be worth trying an Ubuntu live USB or something to see if that will work.


  • Moderator

    @explosivo98 Well, nuts. One last thing I can find. Go ahead and remove the kernel parameter we just set. Let’s roll back to the 4.13 version of the Linux kernel from here: https://fogproject.org/kernels/Kernel.TomElliott.4.13.4.64 Same process as before: rename it to bzImage4136, place the file in the /var/www/html/fog/service/ipxe directory, then update the host definition boot kernel to bzImage4136.
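The download-and-rename step above could look roughly like this on the FOG server. It is a sketch only: stage_kernel is a made-up helper name, the /tmp download path is an assumption, and the URL, directory, and file name are the ones given in the post. The download itself is left as a comment so nothing touches the network until you run it yourself.

```shell
# Values taken from the post above.
KERNEL_URL="https://fogproject.org/kernels/Kernel.TomElliott.4.13.4.64"
IPXE_DIR="/var/www/html/fog/service/ipxe"
KERNEL_NAME="bzImage4136"

# Hypothetical helper: copy a downloaded kernel into place under a new name.
stage_kernel() {
    src="$1"
    dest_dir="$2"
    name="$3"
    if [ ! -s "$src" ]; then
        echo "download missing or empty: $src" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "no such directory: $dest_dir" >&2
        return 1
    fi
    cp "$src" "$dest_dir/$name"
}

# Typical use (run as a user that can write to the ipxe directory):
#   wget -O /tmp/kernel.download "$KERNEL_URL"
#   stage_kernel /tmp/kernel.download "$IPXE_DIR" "$KERNEL_NAME"
```

Copying under a separate name leaves the stock bzImage untouched, so only the host whose boot kernel field points at bzImage4136 is affected.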

    I’m finding references to a regression error introduced in linux kernel 4.17 that produces a similar error as you are seeing.

    Just so you’re aware this isn’t specifically a FOG issue, but rather a linux kernel error. FOG uses a stock linux kernel as part of its FOS Linux OS.

    One last thing I’m thinking of is to see if we can get around this by forcing the 32-bit kernel to load. I might expect the same results, but you never know.

