Linux kernel
-
@sebastian-roth The new kernel is bzImage-4.4.0-62-generic, and other 4.4.0-based kernels had similar results. Fog-* means official FOG kernels. The last line is the official Ubuntu 16.04 kernel. I guess APIC, ACPI, or some related module has a stability/compatibility issue, and Ubuntu fixed it.
-
@Maorui2k Ok, so I compared the dmesg outputs of bzImage-4.4.0-62-generic (original Ubuntu kernel but PXE booted, plus modules packed into the init.xz) and Ubuntu-16.04 (original Ubuntu kernel booted via CD). There are still some minor differences that mostly come from the fact that Ubuntu loads more kernel modules on boot. So I added some more of those to the initrd. Give the new init-ubuntu.xz a try, but I am pretty sure the drivers won’t make a difference.
The other thing I noticed is that all dmesg outputs except the Ubuntu 16.04 one show the messages below:
    Misrouted IRQ fixup and polling support enabled
    This may significantly impact system performance
This comes from FOG adding the irqpoll kernel parameter. I am not sure if this is causing the issue but I would ask you to give this a try. Make a backup copy of the file /var/www/fog/lib/fog/bootmenu.class.php and then edit it. There should be two lines with irqpoll (e.g. line 895 and 1577 in FOG version 1.4.4). Just comment out those two lines (// in PHP). It should then look like this:
    ...
    "osid=$osid",
    // "irqpoll",
    "chkdsk=$chkdsk",
    ...
Save the file and PXE boot again with the bzImage-4.4.0-62-generic kernel and init-ubuntu.xz. Try several times to see if this has any positive effect. You can also check the dmesg output (dmesg | grep Misrouted).
-
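The backup-and-comment-out step described above can also be scripted. This is only a minimal sketch: the function name comment_out_irqpoll is made up for illustration, and it assumes the entry sits on its own line as "irqpoll", (double quotes, optional trailing comma), which may not match every FOG version.

```shell
# Hypothetical helper: back up a PHP file, then comment out every line
# that consists only of the array entry "irqpoll", (assumed layout).
comment_out_irqpoll() {
    file="$1"
    # Keep an untouched copy next to the original.
    cp -p "$file" "$file.bak" || return 1
    # Prefix matching lines with "// "; lines that are already commented
    # start with "/" after the indentation and therefore do not match.
    sed 's|^\([[:space:]]*\)\("irqpoll",\{0,1\}[[:space:]]*\)$|\1// \2|' \
        "$file.bak" > "$file"
}

# Usage on the FOG server (path taken from the post above; run as root):
#   comment_out_irqpoll /var/www/fog/lib/fog/bootmenu.class.php
#   grep -n irqpoll /var/www/fog/lib/fog/bootmenu.class.php
```

To undo the change, copy the .bak file back over the edited one.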
@sebastian-roth The new initrd showed a little improvement: I did 20 reboot/shutdown runs and 70% passed. I’m not sure if this is just down to chance.
The irqpoll change had no effect, and it caused a hang during the boot process before the network initialization messages. The pass ratio was still around 70% in 10 reboot/shutdown runs.
-
@Maorui2k Could you please grab a new dmesg output of the latest initrd? I’d compare those again and hopefully we’ll figure this out at some point.
This is all really strange. I could imagine it making a difference if you boot Ubuntu from CD versus PXE booting it. But why the heck does the 3.17.3 kernel work even though it is PXE booted as well? Could you please do some more testing on those three (at least 25, better 50 each):
- 3.17.3 kernel - PXE boot
- bzImage-4.4.0-62-generic and latest initrd - PXE boot
- Ubuntu-16.04 - CD boot
Does it make a difference if you cold boot (from fully switched off state) or warm boot?
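When comparing dmesg captures like the ones requested here, the timestamps at the start of each line make every line look different. A small sketch for stripping them before running diff; the function name and the log file names in the usage comment are placeholders, not files from this thread.

```shell
# Remove the leading "[   12.345678] " timestamp from each dmesg line.
# Reads the files given as arguments, or stdin if there are none.
strip_dmesg_timestamps() {
    sed 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*\] //' "$@"
}

# Usage (file names are hypothetical):
#   strip_dmesg_timestamps dmesg-pxe.log > pxe.txt
#   strip_dmesg_timestamps dmesg-cd.log  > cd.txt
#   diff pxe.txt cd.txt | less
```

Messages still appear in a slightly different order between boots, so the diff is a starting point rather than an exact answer.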
-
@sebastian-roth log uploaded https://drive.google.com/open?id=0Bx_soHaLoSYETXhEeUVBRVllNVE dmesg-bzImage-4.4.0-62-generic-with-new-initrd.log
I did 50 times reboot/shutdown each. Here is the result.
- 3.17.3 kernel - PXE boot: 46 passed, 4 failed.
- bzImage-4.4.0-62-generic and latest initrd - PXE boot: 27 passed, 23 failed.
- Ubuntu-16.04 - CD boot: 50 passed, 0 failed.
Seems 3.17.3 also has a very small chance of failure, but that is acceptable for me.
Is it possible to use Ubuntu 16.04 kernel and generate an initrd from it? It’s the best one.
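For easier comparison, the pass/fail counts reported above can be turned into percentages. A minimal sketch; pass_rate is just an illustrative helper name.

```shell
# Print the pass rate as a whole-number percentage.
pass_rate() {
    passed="$1"
    failed="$2"
    awk -v p="$passed" -v f="$failed" 'BEGIN {
        if (p + f == 0) { print "n/a"; exit }   # no runs recorded
        printf "%.0f%%\n", 100 * p / (p + f)
    }'
}

pass_rate 46 4     # 3.17.3 kernel, PXE boot        -> 92%
pass_rate 27 23    # bzImage-4.4.0-62-generic, PXE  -> 54%
pass_rate 50 0     # Ubuntu-16.04, CD boot          -> 100%
```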
-
@Maorui2k This is a really hard one to find but I still think we are on a good way to track this down.
Is it possible to use Ubuntu 16.04 kernel and generate an initrd from it? It’s the best one.
The bzImage-4.4.0-62-generic you’ve used and tested is the original Ubuntu kernel! I did not compile this binary but just extracted it from an Ubuntu package. So we are as close to the Ubuntu CD as possible while still having full FOG functionality.
Trying something new: I just saw that the kernel I extracted from the Debian package is not exactly the same as the one found in the Ubuntu server ISO. They are very, very similar (I compared both with a hex editor) but there is a minor difference. So let’s try something else again. Here I outline the steps to download and set up the Ubuntu netinstaller. Run the following commands on your FOG server:
    wget http://old-releases.ubuntu.com/releases/16.04.2/ubuntu-16.04.2-server-amd64.iso
    mkdir extract
    cd extract
    7z x ../ubuntu-16.04.2-server-amd64.iso
    sudo cp install/vmlinuz /var/www/fog/service/ipxe
    sudo cp install/initrd.gz /var/www/fog/service/ipxe
    cd ..
    rm -rf ubuntu-16.04.2-server-amd64.iso extract/
Then rename or link those files to bzImage/init.xz so you can PXE boot them as usual. When you get to the installer “Select language” question, switch to the second virtual terminal by using the key combination Ctrl+Alt+F2 and press ENTER to open a console. Here you can get another dmesg output and do the reboot/shutdown test again.
This is as close to the Ubuntu CD as we can get - but PXE booted! Let’s see if this fails to reboot/shutdown as well.
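The "rename or link" step could look roughly like the following. This is a sketch only: the function name and the .fog-orig backup suffix are made up here, and it assumes the web server on the FOG host follows symlinks in the ipxe directory.

```shell
# Hypothetical helper: point bzImage/init.xz at the Ubuntu installer files,
# keeping any existing regular files under a ".fog-orig" suffix.
link_boot_files() {
    dir="$1"
    for pair in vmlinuz:bzImage initrd.gz:init.xz; do
        src="${pair%%:*}"
        dst="${pair##*:}"
        # Move the current FOG file out of the way once (skip if it is
        # already a symlink from an earlier run).
        if [ -e "$dir/$dst" ] && [ ! -L "$dir/$dst" ]; then
            mv "$dir/$dst" "$dir/$dst.fog-orig" || return 1
        fi
        # Relative symlink inside the same directory.
        ln -sfn "$src" "$dir/$dst" || return 1
    done
}

# Usage on the FOG server (run as root):
#   link_boot_files /var/www/fog/service/ipxe
```

To go back to the regular FOG kernel and init, remove the two symlinks and rename the .fog-orig files back.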
-
@sebastian-roth It seems there is a misunderstanding: the official Ubuntu 16.04 kernel I tested was not from the installation DVD, but from an installed system.
I tested the kernel + initrd from the installation DVD. It behaves like the kernels built by you, with about 50% failing.
The installed kernel is not the same as the one on the installation DVD. The installed kernel + initrd-ubuntu.xz has about a 75% success rate over 20 reboots.
I tried to PXE boot the installed kernel (/boot/vmlinuz-4.4.0-62-generic) and initrd (/boot/initrd.img-4.4.0-62-generic), but the rootfs failed to mount. I uploaded screen capture screen-capture-rootfs-failed.jpg https://drive.google.com/open?id=0Bx_soHaLoSYETXhEeUVBRVllNVE
It seems the handler of reboot/shutdown is different in your initrd and Ubuntu initrd. The Ubuntu reboot/shutdown is handled by systemctl. Now I wonder if my reboot/shutdown issue is a kind of system-level issue, both kernel/drivers/utilities have their roles.
How can I use the initrd of the installed Ubuntu in PXE? I think this may have a good chance to resolve my issue.
-
@Maorui2k said:
It seems there is a misunderstanding: the official Ubuntu 16.04 kernel I tested was not from the installation DVD, but from an installed system.
Ok, thanks for clarifying this point. So to sum things up: only the Ubuntu kernel booted through grub from disk and FOG kernel 3.17.3 booted via PXE properly shut down/reboot. I am sorry to say this but it doesn’t make any sense to me. Could you please re-test the 3.17.3 kernel (PXE booted)?!?!?
I tried to PXE boot the installed kernel (/boot/vmlinuz-4.4.0-62-generic) and initrd (/boot/initrd.img-4.4.0-62-generic), but the rootfs failed to mount.
Seems like the Ubuntu initrd does not have the block RAM device driver (CONFIG_BLK_DEV_RAM) included or does not load it on startup.
It seems the handler of reboot/shutdown is different in your initrd and Ubuntu initrd. The Ubuntu reboot/shutdown is handled by systemctl. Now I wonder if my reboot/shutdown issue is a kind of system-level issue, both kernel/drivers/utilities have their roles.
Be careful to not mix up the concepts here. Ubuntu only uses the initrd to pre-load kernel drivers before handing over to the system on disk. But the FOG initrd is an entire linux system that boots up over PXE and is loaded to RAM and executed. Sure this is a huge difference! But as you were saying that FOG kernel 3.17.3 is doing fine I somehow assumed that there is a chance we can find the issue.
How can I use the initrd of the installed Ubuntu in PXE? I think this may have a good chance to resolve my issue.
I don’t think this would help. Using the installed Ubuntu initrd still does not boot up to a full systemd-controlled system. As I said before, the Ubuntu initrd is not a full system but only includes a wide range of kernel drivers (most of those don’t get loaded anyway).
-
@Maorui2k Did you get to test even more? I am wondering if this is just a crappy piece of hardware and things are going wrong because of that?!?
-
@sebastian-roth Sorry, I was out again, the whole week. I retested 3.17.3+PXE 60 times, 2 failed and 58 passed.
-
@Maorui2k Ok, so this kernel is also not perfect. Way better than the 4.x kernels but still not perfect. So we could start bisecting the kernel code starting from 3.17.3 up to the current version but that would take ages doing this the way we did all the other debugging (me compiling a new kernel and uploading, you testing and replying here) and I won’t go through this.
So which other options do we have? Mind testing the installed Ubuntu system 50-60 times? Does it always work??
-
@sebastian-roth Yes, 3.17.3 is not perfect. I did some quick testing and found that starting from the 4.0 kernel (fog kernel + fog initrd + PXE) the chance of this issue rose to around 50%. I tested the installed Ubuntu 16.04 system 60 times without any failure. I also retested the latest CentOS 7, which uses a 3.10 kernel and systemd, but the failure rate was around 50% too.
As you said, we already did a lot of compiling and testing, it’s painful :-D, and got no significant changes. This should not be the right way. The recent testing result strengthened my gut feeling that this is a system level issue. The Ubuntu team found and resolved this issue somehow, but CentOS didn’t. Kernel, driver, systemd and some other unknown parts are involved in this issue. So I think the best solution should be converting the installed Ubuntu 16 into PXE boot in some way. What’s your suggestion?
-
@maorui2k said in Linux kernel:
So I think the best solution should be converting the installed Ubuntu 16 into PXE boot in some way. What’s your suggestion?
Unfortunately I don’t think this is gonna be a proper solution as we have already tried the original Ubuntu 16.04 kernel at least once. Wait, let me read through all this again. Yeah, I remember correctly: the bzImage-4.4.0-62-generic and init-ubuntu.xz (download here) are as close to the installed Ubuntu 16.04 as you can get from my point of view. It actually is the very original Ubuntu kernel binary (not one I compiled!) plus a modified FOG initrd as we still want to have the FOG functionality.
I also suggested you should try PXE booting the original Ubuntu kernel and initrd just to see if this is any better. But I think you never tried this or at least never told us about the outcome. Please give this a try and see how many times this does a proper reboot/shutdown:
    wget http://old-releases.ubuntu.com/releases/16.04.2/ubuntu-16.04.2-server-amd64.iso
    mkdir extract
    cd extract
    7z x ../ubuntu-16.04.2-server-amd64.iso
    sudo cp install/vmlinuz /var/www/fog/service/ipxe
    sudo cp install/initrd.gz /var/www/fog/service/ipxe
Then rename or link those files to bzImage/init.xz so you can PXE boot them as usual. When you get to the installer “Select language” question, switch to the second virtual terminal by using the key combination Ctrl+Alt+F2 and press ENTER to open a console. Now do the reboot/shutdown test again.
My gut feeling is that you’ll see 50% hangs again although this is so very close to the installed Ubuntu… This crappy CPU/chipset might just trigger the hang if PXE booted, which is definitely a little different than booting from disk.
-
@sebastian-roth Oops, I missed the result. I tested the kernel & initrd from the ISO in two booting ways - PXE and USB. The results were quite similar: 36 passed + 14 failed and 34 passed + 16 failed.
-
@Maorui2k Ok, then it seems like either…
- you have a magic kernel installed on disk (we’ll try that one out in a minute)
- or booting from HD is different than PXE and USB
- or systemd is playing a role here.
For the first idea, this is something we can test. Boot up your installed Ubuntu and copy /boot/vmlinuz-4.4.0-62-generic (or whichever one you boot on that system; find out with uname -a) over to your FOG server. Keep the initrd.gz that you were using from the extracted ISO file last time. See if this makes a difference.
-
@sebastian-roth I tried the 1st option, but the PC hung after the language selection screen showed up; both keyboard and power button were frozen. I didn’t see any error message, so I don’t know if it is related to the HID driver or not, but either way I cannot do the testing. Probably a driver is missing or conflicting.
-
@Maorui2k So which initrd did you use?
-
@sebastian-roth The one extracted from ISO
-
@Maorui2k Maybe the kernel version does not match up with the modules version in the initrd? What did you get from uname -r? If you get something different than 4.4.0-62-generic then try this:
    wget http://old-releases.ubuntu.com/releases/16.04.2/ubuntu-16.04.2-server-amd64.iso
    mkdir extract
    cd extract
    7z x ../ubuntu-16.04.2-server-amd64.iso
    cd install
    gunzip initrd.gz
    mkdir extract
    cd extract
    cpio -idv <../initrd
    mv lib/modules/4.4.0-62-generic/ lib/modules/$(uname -r)/
    find . -print | cpio -ov -H newc > ../initrd
    cd ..
    gzip initrd
You should end up with a newly created initrd.gz file. Copy that to the /var/www/fog/service/ipxe/ directory on your FOG server and boot from that!
-
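Before repacking, it may be worth checking which module versions an unpacked initrd actually carries and whether they match the kernel release being booted. A rough sketch; both function names are invented for illustration, and the directory argument is the folder where the initrd was unpacked with cpio.

```shell
# List the kernel versions that have a module directory in an unpacked initrd.
initrd_module_versions() {
    root="$1"
    ls "$root/lib/modules"
}

# Succeed only if the unpacked initrd has modules for the given kernel release.
initrd_has_modules_for() {
    root="$1"
    version="$2"
    [ -d "$root/lib/modules/$version" ]
}

# Usage, from inside the directory where the initrd was unpacked, with the
# release string reported by uname -r on the machine whose kernel you copied:
#   initrd_module_versions .
#   initrd_has_modules_for . 4.4.0-62-generic && echo "modules match"
```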
@sebastian-roth still failed at the same position…