Surface Pro 4 won't get to registration menu
-
So now we are back to eth0 only (the driver seems to be r8152). But what was the other eth device just a few hours ago? Where did it go? Maybe because you changed back to an older kernel? Sorry man, but I think you’ve lost me here. I’m not really sure which devices (dock, USB NIC) you have and which kernels you are switching back and forth.
-
@Sebastian-Roth The other eth device a few hours ago was the USB 2.0 USB-to-Ethernet adapter I was using. It helped us somehow in the testing, but I can’t remember how. I did go back to an older kernel. Here is the current config:
Surface Dock (Ethernet cable plugged into that)
Older kernel
USB booting of the Flogger OS from the USB port ON the Surface
I can test a variety of other configs, but I can’t remember what we were doing yesterday with the USB-Ethernet adapter…
-
Just playing with the Flogger OS and looking around at things. When I’m connected to the Surface Dock using the Ethernet cable, I get this before the FOG prompt:
r8152 2-2.2:1.0 eth0: v 1.08.2
I don’t know if that helps at all.
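Since the kernel messages in this thread show eth0 coming up under r8152 on one boot and under cdc-ether on another, it may help to check which driver is actually bound to each interface from a FOS/FLOGGER shell. A minimal sketch that reads the standard sysfs driver symlink (/sys/class/net/<iface>/device/driver); `nic_driver` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a fake sysfs tree.

```sh
# Hypothetical helper: print the kernel driver bound to a network interface.
#   $1 = interface name (e.g. eth0)
#   $2 = sysfs root (defaults to /sys; overridable for testing)
nic_driver() {
    local iface="$1" sysfs="${2:-/sys}"
    local link="$sysfs/class/net/$iface/device/driver"
    if [ ! -e "$link" ]; then
        echo "no driver link for $iface" >&2
        return 1
    fi
    # The driver symlink points at .../drivers/<name>; the last path component is the driver.
    basename "$(readlink -f "$link")"
}

# Example: list every interface with its driver (virtual ones such as lo have none).
#   for i in /sys/class/net/*; do
#       n=$(basename "$i"); echo "$n: $(nic_driver "$n" 2>/dev/null || echo none)"
#   done
```

Running the loop once with only the dock connected and once with the USB 2.0 adapter plugged in would show whether the two setups end up on different drivers.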
-
Feels like we are going in circles with this. If you don’t remember what you tried yesterday, how would we be able to keep track of what has been tried and which options we should look into? No offense here, just saying that it’s very hard to follow so many posts with no clear track to follow.
Re-reading your initial post and one of your very early posts, I find that we were pretty close already. You had iPXE (snponly.efi/ipxe.efi) booting up to the menu, but no kernel coming up at first. Then you posted:
Copied over the contents of a fresh trunk install to /tftpboot/ on FOG 1.2.0 server
Updated Kernel from 3.19.3 to 4.1.2 as seemed good in FOG UI
Tried imaging; it pulled ipxe.efi down from DHCP and we got our customized FOG menu (different than before)
Went to register and the machine kernel panic’d:
i8042: No Controller found
cdc-ether 2-2.2:2.- eth0 kevent 12 may have been dropped
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
Kernel Offset: disabled
-----[ end Kernel panic - not syncing: VFS: unable to mount root fs on unknown-block(1,0)]----
So we have seen the kernel panic already. To me it actually seems like the handover from netbooted iPXE to the kernel is kind of working. At least we see some kernel messages! And the other thing we know (for sure?!) is that you were able to boot the most current FOG kernel from a USB stick. So the kernel is fine on the device and netboot also seems good.
So we are left with the question: what’s going on with this kernel panic? What this panic actually means is that the root device given (in the case of FOG, the ramdisk that init.xz is loaded into) cannot be mounted. There can be other reasons for this, but in most cases it is due to a corrupted or wrong init file.
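To rule out the "corrupted or wrong init file" case on the server side, a quick check of init.xz itself can help. A minimal sketch, assuming the `xz` tool is installed on the FOG server; `check_init` is a hypothetical helper name. `xz -t` decompresses in memory and verifies the integrity check stored in the file; the second step compares the decompressed size against the `ramdisk_size` kernel argument, which the kernel reads in KiB (127000 is the value used later in this thread).

```sh
# Hypothetical helper: sanity-check an init.xz before netbooting it.
#   $1 = path to init.xz
#   $2 = ramdisk_size in KiB (defaults to 127000)
check_init() {
    local init="$1" ramdisk_kb="${2:-127000}" bytes
    # 1) Integrity: xz -t fails on truncated or corrupted data.
    if ! xz -t "$init" 2>/dev/null; then
        echo "CORRUPT: $init fails the xz integrity test"
        return 1
    fi
    # 2) Size: the decompressed image has to fit into the ramdisk.
    bytes=$(xz -dc "$init" | wc -c | tr -d ' ')
    if [ "$bytes" -gt $((ramdisk_kb * 1024)) ]; then
        echo "TOO BIG: $bytes bytes > ramdisk_size=$ramdisk_kb KiB"
        return 1
    fi
    echo "OK: $bytes bytes fits in ramdisk_size=$ramdisk_kb KiB"
}

# Example: check_init /var/www/fog/service/ipxe/init.xz 127000
```

A file that passes here can still be damaged on its way to the client, so this only narrows the problem down to "server copy is fine".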
So let’s try to download and check the most current kernel and init files by hand, just to make sure those are fine. Run these commands as root on your FOG server:
cd /var/www/fog/service/ipxe
mkdir bak
mv bzImage* init*.xz bak
wget -O kernels.sha512 https://fogproject.org/kernels/index.php
wget https://fogproject.org/kernels/bzImage
wget https://fogproject.org/kernels/bzImage32
sha512sum -c kernels.sha512
wget -O inits.sha512 https://fogproject.org/inits/index.php
wget https://fogproject.org/inits/init.xz
wget https://fogproject.org/inits/init_32.xz
sha512sum -c inits.sha512
Pay attention to whether the sha512sum commands report ‘OK’ for each file. This is really important! Then try booting your device via netboot (not via USB stick!) in UEFI mode, as you have already done several times.
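The commands above move the working files into bak before anything has been downloaded or checked, so a failed download or checksum leaves the live directory without usable files. A minimal sketch of the same steps reordered to verify first, assuming the wget lines are run in a separate staging directory and that kernels.sha512 and inits.sha512 are in standard `sha512sum` format; `install_verified` and the staging path are hypothetical, not part of FOG.

```sh
# Hypothetical helper: verify downloads in a staging dir, then swap them in.
#   $1 = staging dir containing bzImage*, init*.xz and *.sha512 files
#   $2 = live dir, e.g. /var/www/fog/service/ipxe
install_verified() {
    local staging="$1" target="$2" sums f name
    # Do nothing unless at least one checksum file exists and every one verifies
    # (sha512sum -c prints OK/FAILED per file and exits non-zero on any mismatch).
    for sums in "$staging"/*.sha512; do
        if [ ! -e "$sums" ]; then
            echo "no .sha512 files in $staging" >&2
            return 1
        fi
        if ! (cd "$staging" && sha512sum -c "$(basename "$sums")"); then
            echo "checksum FAILED in $sums; live files left untouched" >&2
            return 1
        fi
    done
    # All checks passed: back up the current files, then copy the new ones in.
    mkdir -p "$target/bak"
    for f in "$staging"/bzImage* "$staging"/init*.xz; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$target/$name" ]; then
            mv "$target/$name" "$target/bak/$name"
        fi
        cp "$f" "$target/$name"
    done
}

# Example (staging path is hypothetical; run the wget lines inside it first):
#   install_verified /root/fogstage /var/www/fog/service/ipxe \
#     && chown fog:apache /var/www/fog/service/ipxe/bzImage* /var/www/fog/service/ipxe/init*.xz
```

The chown mirrors what was done later in this thread; the right owner may differ between distributions.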
-
@Sebastian-Roth I understand that I “may” have gone in circles. Thanks to @george1421, I’m documenting all the changes I make from here on out so we don’t have that problem. I will do this now. Thanks again for all your awesome suggestions and indications of where it’s going wrong, I really do appreciate it!!
-
I “wgot” the kernels and inits and sha’d them to make sure they were okay; they both checked out. I chown’d them to fog:apache and then UEFI-booted the Surface. Here is what I have:
This second picture is an OptiPlex 9020 on the new kernels and inits. It seems there is an issue when I use the new kernels and inits with stable FOG 1.2.0 and boot via undionly.kpxe. So in order to image the other machines we have in production (OptiPlex 9020 and Latitude E7450, E7470), I drop back down to the other kernels. Thoughts?
-
@sarge_212 I still want to help!!!
-
If I remember what Tom said, you need to keep the inits (i.e. the virtual hard drive) from 1.2.0 stable together with 1.2.0, since the trunk inits have changed quite a bit. I got the impression that the inits for the trunk version were not backwards compatible, but the kernel (bzImage) could be used. Understand it’s been a long week and I may not have understood what I was told, but that is what I remember.
I have an ESXi VM that is configured for EFI booting. I can boot both ipxe.efi and snponly.efi using a trunk version of FOG, and right where the RAM drive gets loaded I get an error message basically saying the inits can’t be found. This same setup will boot a 9020, E7440, and E6430 without issue. I did note that on the 6430 the FOG menu is black with red text, while with the ESXi image it is the traditional white background with blue text. I’m going to work with my home lab this weekend to see if I can find any combination of iPXE script settings that will let me boot the ESXi VM. I have a few ideas I want to test. I agree with Sebastian that we are very close; we just need to find the right combination to have repeatable success. EFI is here to stay, much like Windows 10. We just have to learn to work with it.
[edit] I’m not looking to resolve this problem in the OP’s thread. I want to document it here since I think it is all related. The error message on the console is “Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b”. The error message in the following link is exactly like the error message in the first picture. No resolution was given in that thread.
http://askubuntu.com/questions/651974/kernel-panic-not-syncing-attempted-to-kill-init-exitcode-0x00000009
Here is a great explanation of what this cryptic error message really means (first post) http://www.linuxquestions.org/questions/linux-software-2/explained-kernel-panic-not-syncing-attempted-to-kill-init-353920/
[/edit]
[edit 2] I’ve tried every combination of ESXi configuration to change the results (or at least alter the error message), with no success. I did check the Apache access log and confirmed that bzImage and init.xz were being requested and sent to the target. In the access log I also saw that the iPXE version was the same as the one I built with rom-o-matic earlier in the day, so iPXE is as fresh as today. Next I renamed bzImage and init.xz to something else and copied in init.xz and bzImage from the 1.2.0 stable version. Booting this still gave me the same kernel panic, but with some additional detail in the error: “Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation for guidance” [/edit2]
[edit3] I was finally able to trap the error before the kernel panic. This error was trapped with the 1.2.0 stable kernel and inits.
RAMDISK: xz image found at block 0
XZ-compressed data is corrupt
EXT2-fs (ram0): error: ext2_lookup: deleted inode reference: 1565
EXT2-fs (ram0): error: remounting filesystem read-only
<the errors above were repeated about 8 times; then this error (below) was thrown>
Failed to execute /sbin/init (error -5). Attempting defaults...
So it appears that the inits are being corrupted in transit (??) or the memory they are being saved into is being damaged somehow.
[/edit3]
[edit4] Just for sanity’s sake, I changed the mode of the VM from EFI to BIOS, changed DHCP option 67 back to undionly.kpxe, and rebooted the VM. At the FOG menu I again selected quick register host. This time the VM booted into the FOG client OS. [/edit4]
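One way to narrow down "corrupted in transit" is to fetch init.xz over HTTP from a Linux machine (or the FOG server itself) and compare it with the file on disk. A minimal sketch, assuming the URL layout used in this thread and that curl is available; `same_sha512` is a hypothetical helper name. A match only shows that Apache delivers the file intact to that machine; it says nothing about what iPXE on the UEFI client does with it afterwards.

```sh
# Hypothetical helper: compare two files by SHA-512 and print MATCH or MISMATCH.
same_sha512() {
    local a b
    a=$(sha512sum < "$1" | cut -d' ' -f1)
    b=$(sha512sum < "$2" | cut -d' ' -f1)
    if [ "$a" = "$b" ]; then
        echo "MATCH"
    else
        echo "MISMATCH: $1 vs $2"
        return 1
    fi
}

# Example (URL and path as used in this thread):
#   curl -s -o /tmp/init.served.xz http://192.168.1.88/fog/service/ipxe/init.xz
#   same_sha512 /var/www/fog/service/ipxe/init.xz /tmp/init.served.xz
```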
-
@george1421 I think you’re right, but it’s been something like 3000 commits since 1.2.0. While I do try to ensure the inits will work regardless of the version (at least starting from 1.0.0), I don’t know all the possibilities.
@sarge_212 As you’re able to boot the client, can you get it to boot to a debug window and get us a lspci, lsusb output?
-
@george1421 said:
So it appears that the inits are being corrupted in transit (??) or the memory they are being saved into is being damaged somehow.
I thought about this too, but somehow hoped that it’s not true. Why would we see this on several UEFI machines? Do we need to verify init.xz when downloading it to the client (iPXE command imgverify)? Awesome stuff that you found with your VM booting in UEFI mode. Thanks for hanging in there.
@Tom-Elliott said:
@sarge_212 As you’re able to boot the client, can you get it to boot to a debug window and get us a lspci, lsusb output?
So far he has only been able to boot into debug using George’s FLOS/FLOGGER USB stick. What do you expect to see in lspci/lsusb?
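On the imgverify question: as we understand iPXE’s code-signing documentation, each file needs a detached signature in DER-encoded CMS format, which OpenSSL can produce with `openssl cms -sign`. A minimal sketch, assuming openssl is installed and a signing certificate and key already exist (plus, optionally, the CA certificate that issued them); `sign_for_ipxe` and the file names are hypothetical.

```sh
# Hypothetical helper: create <file>.sig as a detached, DER-encoded CMS signature.
#   $1 = file to sign (e.g. bzImage or init.xz)
#   $2 = signing certificate, $3 = signing key, $4 = optional CA certificate to include
sign_for_ipxe() {
    local file="$1" crt="$2" key="$3" ca="${4:-}"
    # -binary: sign the bytes as-is; -noattr: no signed attributes.
    # cms -sign produces a detached signature unless -nodetach is given.
    if [ -n "$ca" ]; then
        openssl cms -sign -binary -noattr -in "$file" -signer "$crt" -inkey "$key" \
            -certfile "$ca" -outform DER -out "$file.sig"
    else
        openssl cms -sign -binary -noattr -in "$file" -signer "$crt" -inkey "$key" \
            -outform DER -out "$file.sig"
    fi
}

# Example: sign_for_ipxe bzImage codesign.crt codesign.key ca.crt   # writes bzImage.sig
```

Signing alone is not enough: the iPXE binary must be built with the image trust commands enabled (IMAGE_TRUST_CMD) and must trust the CA, which iPXE’s build system accepts as a `TRUST=/path/to/ca.crt` make parameter. iPXE’s documentation also gives the signing certificate the codeSigning extended key usage.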
-
Taking the hint from Sebastian about image verification, here is what I have done so far.
I created a new snponly.efi image with rom-o-matic. The script chains to my dev FOG server running the trunk build, but it contains the FOG 1.2.0 stable kernels (they yield better error messages).
This is the ipxe chain command:
chain tftp://192.168.1.88/tester.ipxe
The tester.ipxe script file was populated with the code used for the quick reg action.
#!ipxe
kernel bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=192.168.1.88/fog/ consoleblank=0 debug loglevel=7 mode=autoreg
initrd init.xz
boot
So that is the testing environment.
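Because the rounds below keep editing the same tester.ipxe by hand (TFTP vs HTTP paths, ramdisk_size), a small generator can make each variant reproducible. A sketch in shell; `make_tester_ipxe` is a hypothetical name, and the kernel arguments are copied verbatim from the script above, with only the path prefix, ramdisk_size and web= value parameterized.

```sh
# Hypothetical generator for the tester.ipxe variants used in the rounds below.
#   $1 = path prefix for bzImage/init.xz ("" for TFTP-relative, or an http:// URL ending in /)
#   $2 = ramdisk_size in KiB (defaults to 127000, as in the original script)
#   $3 = web= value (defaults to the dev server from this thread)
make_tester_ipxe() {
    local base="$1" size="${2:-127000}" web="${3:-192.168.1.88/fog/}"
    cat <<EOF
#!ipxe
kernel ${base}bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=${size} web=${web} consoleblank=0 debug loglevel=7 mode=autoreg
initrd ${base}init.xz
boot
EOF
}

# Round 1 (TFTP):                  make_tester_ipxe "" > /tftpboot/tester.ipxe
# Round 3 (HTTP, bigger ramdisk):  make_tester_ipxe "http://192.168.1.88/fog/service/ipxe/" 250000 > /tftpboot/tester.ipxe
```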
[round 1]
I copied the bzImage and init.xz files (FOG 1.2.0 stable) to the /tftpboot directory so I could load them via TFTP instead of HTTP. The system was booted and the transfer was (as expected) terribly slow. But the results were the same: “corrupt init.xz”
[round 2]
I updated the iPXE file to use HTTP to download the image files:
#!ipxe
kernel http://192.168.1.88/fog/service/ipxe/bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=192.168.1.88/fog/ consoleblank=0 debug loglevel=7 mode=autoreg
initrd http://192.168.1.88/fog/service/ipxe/init.xz
boot
Same results: “corrupt init.xz”
[round 3]
Increased the size of the ram drive to 250000
Results: “corrupt init.xz”
[round 4]
Attempted to boot bzImage32 and init_32.xz
Results: boot failed
[round 5]
Reset bzImage and init.xz.
Updated tester.ipxe to use the image verify command, built a self-signed CA, and then signed both bzImage and init.xz, creating the required .sig files.
#!ipxe
imgtrust --permanent
kernel http://192.168.1.88/fog/service/ipxe/bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=250000 web=192.168.1.88/fog/ consoleblank=0 debug loglevel=7 mode=autoreg
initrd http://192.168.1.88/fog/service/ipxe/init.xz
imgverify bzImage http://192.168.1.88/fog/service/ipxe/bzImage.sig
boot
Results: imgverify command not found (!!nuts!!)
[round 6]
Rebuilt ipxe.efi to include IMAGE_TRUST_CMD. Updated DHCP to use ipxe.efi instead of snponly.efi.
Results: Failure. The ipxe.efi was not compiled with a valid certificate. Ugh! Unless ipxe.efi is compiled with the self-signing certificate, the imgverify command won’t work.
-
@Sebastian-Roth I think I’m just hoping to see NIC information. While the EFI issues are still present, for the original issue (registration failing) the output may at least help us get drivers for the NIC.
-
@Tom-Elliott In which OS should I do this? Should I run these commands in the Flogger OS, in the Ubuntu live image, or somewhere else?
-
@sarge_212 Yes, in FOS/FLOGGER.
-
@Tom-Elliott Here is the output from those 2 commands:
Let me know what else I can do, thanks!
-
So this system doesn’t have an onboard NIC? Was the USB NIC plugged in when you ran these commands?
-
@Tom-Elliott
Correct, the Surface Pro 4 does not have an onboard NIC. Let me compare the output, but I think the USB NIC was plugged in when I ran this. I might be able to boot from the USB stick in the dock with the USB 2.0 network adapter plugged in. Would that be helpful?
-
@sarge_212 I think so, or at least I hope so. I don’t really know, though.
-
@Tom-Elliott So I tried that with the USB network adapter, and it just added a device to the lsusb output. Not sure where to go from here.
-
@sarge_212 You might wanna try this bzImage_epk kernel image. Download it and put it into /var/www/fog/service/ipxe on your FOG server. I compiled this kernel with something called earlyprintk enabled, so possibly we will see some more information before we hit the kernel panic. I mean: use this kernel and try PXE booting your device with the USB NIC. Just use this kernel binary instead of the normal one. Plus, use
earlyprintk=efi
as a kernel parameter for this host: register this host/MAC by hand in the WebGUI, add the kernel parameter to it, and run a debug task with it.