Linux kernel
-
@sebastian-roth Sorry, I was out again, the whole week. I retested 3.17.3+PXE 60 times, 2 failed and 58 passed.
-
@Maorui2k OK, so this kernel is not perfect either. Way better than the 4.x kernels, but still not perfect. We could start bisecting the kernel code from 3.17.3 up to the current version, but doing that the way we did all the other debugging (me compiling a new kernel and uploading it, you testing and replying here) would take ages, and I won’t go through this.
So which other options do we have? Mind testing the installed Ubuntu system 50-60 times? Does it always work?
-
@sebastian-roth Yes, 3.17.3 is not perfect. I did some quick testing and found that starting from the 4.0 kernel (FOG kernel + FOG initrd + PXE) the chance of hitting this issue rose to around 50%. I tested the installed Ubuntu 16.04 system 60 times without a single failure. I also retested the latest CentOS 7, which uses a 3.10 kernel and systemd, but the failure rate was around 50% too.
As you said, we have already done a lot of compiling and testing, it’s painful :-D, and it brought no significant change. This is probably not the right way. The recent test results strengthened my gut feeling that this is a system-level issue. The Ubuntu team found and resolved it somehow, but CentOS didn’t. The kernel, drivers, systemd and some other unknown parts are involved in this issue. So I think the best solution should be converting the installed Ubuntu 16 into PXE boot in some way. What’s your suggestion?
-
@maorui2k said in Linux kernel:
So I think the best solution should be converting the installed Ubuntu 16 into PXE boot in some way. What’s your suggestion?
Unfortunately I don’t think this is gonna be a proper solution, as we have already tried the original Ubuntu 16.04 kernel at least once. Wait, let me read through all this again. Yeah, I remember correctly: the bzImage-4.4.0-62-generic and init-ubuntu.xz files (download here) are as close to the installed Ubuntu 16.04 as you can get, from my point of view. It actually is the very original Ubuntu kernel binary (not one I compiled!) plus a modified FOG initrd, as we still want to have the FOG functionality.
I also suggested that you try PXE booting the original Ubuntu kernel and initrd, just to see if this is any better. But I think you never tried this, or at least never told us about the outcome. Please give this a try and see how many times this does a proper reboot/shutdown:
wget http://old-releases.ubuntu.com/releases/16.04.2/ubuntu-16.04.2-server-amd64.iso
mkdir extract
cd extract
7z x ../ubuntu-16.04.2-server-amd64.iso
sudo cp install/vmlinuz /var/www/fog/service/ipxe
sudo cp install/initrd.gz /var/www/fog/service/ipxe
Then rename or link those files to bzImage/init.xz so you can PXE boot them as usual. When you get to the installer’s “Select language” question, switch to the second virtual terminal with the key combination Ctrl+Alt+F2 and press ENTER to open a console. Now do the reboot/shutdown test again.
My gut feeling is that you’ll see 50% hangs again, although this is so very close to the installed Ubuntu… This crappy CPU/chipset might just trigger the hang if PXE booted, which is definitely a little different from booting from disk.
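The "rename or link" step could be sketched roughly as follows. This is only an illustration: use_boot_files is a hypothetical helper, not part of FOG, and it assumes the web server is willing to serve symlinked files. Any real (non-symlink) bzImage or init.xz is kept as a .orig copy so the regular FOG files can be restored later.

```shell
# Hypothetical helper (not part of FOG): point bzImage and init.xz at
# other files in the same directory, keeping real originals as *.orig.
use_boot_files() {
    dir=$1
    kernel=$2
    initrd=$3
    for pair in "bzImage:$kernel" "init.xz:$initrd"; do
        name=${pair%%:*}      # link name FOG boots, e.g. bzImage
        target=${pair#*:}     # file the link should point to
        # Back up a real file once; never overwrite an existing backup.
        if [ -e "$dir/$name" ] && [ ! -L "$dir/$name" ] && [ ! -e "$dir/$name.orig" ]; then
            mv "$dir/$name" "$dir/$name.orig"
        fi
        ln -sfn "$target" "$dir/$name"
    done
}
```

It would be called as use_boot_files /var/www/fog/service/ipxe vmlinuz initrd.gz from a shell that has write access to that directory (for example a root shell).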
-
@sebastian-roth Oops, I missed posting the result. I tested the kernel & initrd from the ISO in two booting ways, PXE and USB. The results were quite similar: 36 passed + 14 failed and 34 passed + 16 failed.
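To compare these counts with the earlier runs it may help to turn them into failure rates. A small sketch, where fail_rate is a hypothetical helper and the PXE/USB order of the two results is assumed to match the order in which they are listed:

```shell
# Hypothetical helper: failure rate in percent from pass/fail counts.
fail_rate() {
    awk -v p="$1" -v f="$2" 'BEGIN { printf "%.1f\n", 100 * f / (p + f) }'
}

fail_rate 58 2    # 3.17.3 + PXE                      -> 3.3
fail_rate 36 14   # ISO kernel + initrd, PXE (assumed) -> 28.0
fail_rate 34 16   # ISO kernel + initrd, USB (assumed) -> 32.0
```

So the ISO kernel and initrd fail roughly 30% of the time either way, compared with about 3% for 3.17.3 over PXE and 0 out of 60 for the installed system.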
-
@Maorui2k OK, then it seems like either…
- you have a magic kernel installed on disk (we’ll try that one out in a minute)
- or booting from HD is different than PXE and USB
- or systemd is playing a role here.
For the first idea, this is something we can test. Boot up your installed Ubuntu and copy /boot/vmlinuz-4.4.0-62-generic (or whichever one you boot on that system, find out with uname -a) over to your FOG server. Keep the initrd.gz that you were using from the extracted ISO file last time. See if this makes a difference.
-
@sebastian-roth I tried the first option, but the PC hung after the language selection screen showed up; both keyboard and power button were frozen. I didn’t see any error message, so I don’t know whether it is related to the HID driver or not. Either way I cannot do the testing; probably a driver is missing or conflicting.
-
@Maorui2k So which initrd did you use?
-
@sebastian-roth The one extracted from ISO
-
@Maorui2k Maybe the kernel version does not match up with the modules version in the initrd? What did you get from uname -r? If you get something different than 4.4.0-62-generic then try this:
wget http://old-releases.ubuntu.com/releases/16.04.2/ubuntu-16.04.2-server-amd64.iso
mkdir extract
cd extract
7z x ../ubuntu-16.04.2-server-amd64.iso
cd install
gunzip initrd.gz
mkdir extract
cd extract
cpio -idv <../initrd
mv lib/modules/4.4.0-62-generic/ lib/modules/$(uname -r)/
find . -print | cpio -ov -H newc > ../initrd
cd ..
gzip initrd
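The rename step assumes the ISO initrd keeps its modules under lib/modules/4.4.0-62-generic. One way to check which module directories an initrd actually contains is to filter its file listing. A minimal sketch, where module_versions is a hypothetical helper and the initrd is assumed to be a plain gzip-compressed cpio archive:

```shell
# Hypothetical helper: read an archive listing (one path per line) on
# stdin and print each kernel version that has a lib/modules/ directory.
module_versions() {
    sed -n 's|^\(\./\)\{0,1\}lib/modules/\([^/][^/]*\).*|\2|p' | sort -u
}

# Usage (assumes initrd.gz is a gzip-compressed cpio archive):
#   gzip -dc initrd.gz | cpio -it 2>/dev/null | module_versions
```

Run against the original and the repacked archive, this shows whether the directory name now matches the kernel being booted. Keep in mind that $(uname -r) expands on the machine where the commands are run, so it only yields the intended name if that machine runs the kernel in question.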
You should end up with a newly created initrd.gz file. Copy that to the /var/www/fog/service/ipxe/ directory on your FOG server and boot from that!
-
@sebastian-roth still failed at the same position…
-
@Maorui2k I think I am out of ideas for now. Sorry to say this. It’s just way too complicated to merge an Ubuntu initrd with ours to make it work as a proper FOG initrd. I definitely won’t go there. Not sure what else you could try. Maybe think about buying new (proper) hardware?
-
@sebastian-roth I have no idea either… Changing hardware is not an option, too many PCs… For now I will keep using FOG 1.3.x and kernel 3.17.3; so far it meets my requirements. Maybe I will try to merge the initrd later. Anyway, thank you so much for your support!!!
-
@Maorui2k I am not saying it’s impossible to figure out, but doing this in the forums, one post every couple of days back and forth, won’t get us anywhere. If you are really keen to get this solved then send over a machine so I can test, and I’ll see what I can do. Just not sure if it’s worth sending it around the world as I am located in Germany.
You could still keep testing different Linux distros, like all kinds of live Linux CDs (system rescue CD, …) as well as installed distros, and see which ones work and which have the issue. Maybe you can figure out what they all have in common.
-
@sebastian-roth said in Linux kernel:
Just not sure if it’s worth sending it around the world as I am located in Germany.
His org could just buy you one and have it shipped to you. Of course you are deserving of much more than a crap hardware model, Sebastian.