Surface Pro 4 won't get to registration menu
-
I downloaded the kernels and inits with wget and checked their SHA checksums to make sure they were okay; both checked out. I chown'd them to fog:apache and then UEFI-booted the Surface. Here is what I have:
This second picture is an Optiplex 9020 on the new kernels and inits. There seems to be an issue when I switch to the new kernels and inits with stable FOG 1.2.0 and grab undionly.kpxe. So, in order to image the other machines we have in production (Optiplex 9020, Latitude E7450 and E7470), I drop back down to the previous kernels. Thoughts?
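For anyone repeating the download check described in this post, here is a minimal Python sketch of checksumming a kernel or init and comparing it with a known-good value. It assumes SHA-256 (the post doesn't say which SHA variant was used) and that you have the expected hex digest from wherever you downloaded the files; the function names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large kernels/inits never sit fully in memory."""
    h = hashlib.new(algorithm)  # algorithm choice is an assumption (default SHA-256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path, expected_hex, algorithm="sha256"):
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Usage would look like `matches_expected("/var/www/fog/service/ipxe/init.xz", "<expected digest>")`, where the path is the FOG web directory mentioned later in this thread.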
-
@sarge_212 I still want to help!!!
-
If I remember what Tom said, you need to keep the inits (i.e. the virtual hard drive) for 1.2.0 stable with 1.2.0, since the trunk inits have changed quite a bit. I got the impression that the inits for the trunk version were not backwards compatible, but the kernel (bzImage) could be used. Understand it's been a long week and I may not have understood what I was told, but that is what I remember.
I have an ESXi VM that is configured for EFI booting. I can boot both ipxe.efi and snponly.efi using a trunk version of FOG, but right where the RAM drive gets loaded I get an error message basically saying the inits can't be found. This same setup will boot a 9020, E7440 and E6430 without issue. I did note that on the 6430 the FOG menu is black with red text, while with the ESXi image it is the traditional white background with blue text. I'm going to work with my home lab this weekend to see if I can find any combination of iPXE script settings that will let me boot the ESXi VM. I have a few ideas I want to test. I agree with Sebastian that we are very close; we just need to find the right combination to have repeatable success. EFI is here to stay, much like Windows 10. We just have to learn to work with it.
[edit] I'm not looking to resolve this problem in the OP's thread; I want to document it here since I think it's all related. The error message on the console is “Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b”. In the following link, the error message is exactly like the one in the first picture. No resolution was given in that thread.
http://askubuntu.com/questions/651974/kernel-panic-not-syncing-attempted-to-kill-init-exitcode-0x00000009
Here is a great explanation of what this cryptic error message really means (first post) http://www.linuxquestions.org/questions/linux-software-2/explained-kernel-panic-not-syncing-attempted-to-kill-init-353920/
[/edit][edit 2] I've tried every combination of ESXi configuration to change the results (or at least alter the error message) with no success. I did check the Apache access log and confirmed that bzImage and init.xz were being requested and sent to the target. The access log also showed that the iPXE version was the same as the one I built with rom-o-matic earlier in the day, so iPXE is as fresh as today. Next I renamed bzImage and init.xz to something else and copied in the init.xz and bzImage from the 1.2.0 stable version. Booting this still gave me the same kernel panic, but with some additional detail in the error: “Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation for guidance” [/edit2]
[edit3] I was finally able to trap the error before the kernel panic. This error was trapped with the 1.2.0 stable kernel and inits.
RAMDISK: xz image found at block 0
XZ-compressed data is corrupt
EXT2-fs (ram0): error: ext2_lookup: deleted inode reference: 1565
EXT2-fs (ram0): error: remounting filesystem read-only
(The errors above were repeated about 8 times. Then this error was thrown:)
Failed to execute /sbin/init (error -5). Attempting defaults...
So it appears that the inits are being corrupted in transit (??) or the memory they are being saved into is being damaged somehow.
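One way to separate those two possibilities is to confirm that the copy of init.xz on the FOG server decompresses cleanly. Below is a minimal Python sketch using the standard lzma module; the function names are hypothetical. If it succeeds on the server, the file itself is intact and the damage happens on the way to, or inside, the client. A clean result here cannot rule out a problem in the kernel's own XZ decoder.

```python
import lzma


def xz_uncompressed_size(path, chunk_size=1 << 20):
    """Stream-decompress an .xz file and return the number of uncompressed bytes.

    Raises lzma.LZMAError on corrupt data and EOFError on a truncated file.
    """
    total = 0
    with lzma.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += len(chunk)
    return total


def xz_is_intact(path):
    """Return (True, size) if the file decompresses cleanly, else (False, error text)."""
    try:
        return True, xz_uncompressed_size(path)
    except (lzma.LZMAError, EOFError) as exc:
        return False, str(exc)
```

Run it against the init.xz in your FOG web directory; a `(False, ...)` result would point at the file on disk rather than at the transfer.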
[/edit3][edit4] Just for sanity's sake, I changed the mode of the VM from EFI to BIOS, changed DHCP option 67 back to undionly.kpxe and rebooted the VM. At the FOG menu I again selected the quick register host option. This time the VM booted into the FOG client OS. [/edit4]
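Switching option 67 by hand every time a machine changes firmware mode is easy to get wrong. Many DHCP servers can instead choose the boot file from the client's architecture, which the client reports in DHCP option 93 (RFC 4578). The decision logic is sketched below in Python purely as an illustration; the function name is hypothetical, and how you express this in your actual DHCP server (ISC dhcpd, Windows DHCP policies, etc.) is not covered here. Treating codes 7 and 9 both as 64-bit UEFI is an assumption based on what x64 UEFI firmware commonly sends.

```python
from typing import Optional

# Client system architecture codes from DHCP option 93 (RFC 4578).
BIOS_X86 = 0
EFI_IA32 = 6
EFI_BC = 7       # commonly sent by x64 UEFI firmware
EFI_X86_64 = 9   # also seen from x64 UEFI firmware


def boot_file_for_arch(arch: int, efi_file: str = "ipxe.efi") -> Optional[str]:
    """Pick a boot file name for DHCP option 67 based on the client's option 93 value."""
    if arch == BIOS_X86:
        return "undionly.kpxe"
    if arch in (EFI_BC, EFI_X86_64):
        return efi_file  # e.g. "ipxe.efi" or "snponly.efi"
    # 32-bit UEFI (EFI_IA32) and anything else: no suitable binary here.
    return None
```

With this kind of mapping, the same VM could be flipped between BIOS and EFI without touching the DHCP scope.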
-
@george1421 I think you're right, but it's been something like 3000 commits since 1.2.0. While I do try to ensure the inits will work regardless of the version (at least starting from 1.0.0), I don't know all possibilities.
@sarge_212 As you're able to boot the client, can you get it to boot to a debug window and get us the output of lspci and lsusb?
-
@george1421 said:
So it appears that the inits are being corrupted in transit (??) or the memory they are being saved into is being damaged somehow.
I thought about this too, but somehow hoped that it's not true. Why would we see this on several UEFI machines? Do we need to verify init.xz when downloading it to the client (iPXE command imgverify)? Awesome stuff that you found with your VM booting in UEFI mode. Thanks for hanging in there.
@Tom-Elliott said:
@sarge_212 As you’re able to boot the client, can you get it to boot to a debug window and get us a lspci, lsusb output?
Up to now he has only been able to boot into debug using George's FOS/FLOGGER USB stick. What do you expect to see in lspci/lsusb?
-
Taking Sebastian's hint about image verification, here is what I've done so far.
I created a new snponly.efi image with rom-o-matic. Its embedded script chains to my dev FOG server running the trunk build, but that server holds the FOG 1.2.0 stable kernels (they yield better error messages).
This is the ipxe chain command:
chain tftp://192.168.1.88/tester.ipxe
The tester.ipxe script file was populated with the code used for the quick reg action.
#!ipxe
kernel bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=192.168.1.88/fog/ consoleblank=0 debug loglevel=7 mode=autoreg
initrd init.xz
boot
So that is the testing environment.
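Since the kernel line gets edited in nearly every round that follows, it can help to generate tester.ipxe rather than hand-edit it. This is a minimal Python sketch; the function and its parameters are hypothetical, and the argument list is copied from the quick-registration script above.

```python
def build_ipxe_script(kernel, initrd, web, ramdisk_size=127000, extra_args=()):
    """Return the text of an iPXE script that boots a FOG kernel plus init.

    `kernel` and `initrd` may be bare names (loaded relative to the script)
    or full http:// URLs.
    """
    # iPXE normally names a downloaded image after the last component of its
    # URL; the scripts in this thread pass that name as initrd=.
    initrd_name = initrd.rsplit("/", 1)[-1]
    args = [
        "init=/sbin/init",
        f"initrd={initrd_name}",
        "root=/dev/ram0",
        "rw",
        f"ramdisk_size={ramdisk_size}",  # kernel expects KiB
        f"web={web}",
        "consoleblank=0",
        "debug",
        "loglevel=7",
        "mode=autoreg",
        *extra_args,
    ]
    lines = [
        "#!ipxe",
        f"kernel {kernel} {' '.join(args)}",
        f"initrd {initrd}",
        "boot",
    ]
    return "\n".join(lines) + "\n"
```

For example, the HTTP variant with a bigger RAM disk would be `build_ipxe_script("http://192.168.1.88/fog/service/ipxe/bzImage", "http://192.168.1.88/fog/service/ipxe/init.xz", "192.168.1.88/fog/", ramdisk_size=250000)`, written out to /tftpboot/tester.ipxe.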
[round 1]
I copied the bzImage and init.xz files (FOG 1.2.0 stable) to the /tftpboot directory so I could load them via TFTP instead of HTTP. The system was booted and the transfer was (as expected) terribly slow, but the result was the same: “corrupt init.xz”.
[round 2]
I updated tester.ipxe to use HTTP to download the image files.
#!ipxe
kernel http://192.168.1.88/fog/service/ipxe/bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 web=192.168.1.88/fog/ consoleblank=0 debug loglevel=7 mode=autoreg
initrd http://192.168.1.88/fog/service/ipxe/init.xz
boot
Same results: “corrupt init.xz”
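A related sanity check is to fetch init.xz over HTTP from another machine and compare it byte-for-byte (by hash) with the file on the FOG server's disk. The minimal Python sketch below does that with only the standard library; the function names are hypothetical, and the URL you pass would be something like http://192.168.1.88/fog/service/ipxe/init.xz from this test setup. A match rules out the web server handing out a damaged file, but not corruption inside the UEFI client's download path or memory.

```python
import hashlib
import urllib.request


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash any object with a read(n) method, chunk by chunk."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def served_copy_matches(local_path, url, timeout=30):
    """Return (match, local_hash, served_hash) for a file on disk vs. the same file via URL."""
    with open(local_path, "rb") as f:
        local = sha256_of_stream(f)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        served = sha256_of_stream(resp)
    return local == served, local, served
```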
[round 3]
Increased the size of the ram drive to 250000
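For reference, the kernel's ramdisk_size parameter is given in KiB, so 127000 is about 124 MiB and 250000 about 244 MiB. Whether the size can matter at all depends on how large init.xz is once decompressed. The minimal Python sketch below (hypothetical function names) compares the two; if the uncompressed image already fits in 127000 KiB, raising the value would not be expected to change the outcome.

```python
import lzma


def uncompressed_size(path, chunk_size=1 << 20):
    """Decompress an .xz file in chunks and count the output bytes."""
    total = 0
    with lzma.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += len(chunk)
    return total


def fits_ramdisk(uncompressed_bytes, ramdisk_size_kib):
    """ramdisk_size is in KiB (1024-byte units) on the kernel command line."""
    return uncompressed_bytes <= ramdisk_size_kib * 1024
```

Usage: `fits_ramdisk(uncompressed_size("init.xz"), 127000)`.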
Results: “corrupt init.xz”
[round 4]
Attempted to boot bzImage32 and init_32.xz
Results: boot failed
[round 5]
Reset bzImage and init.xz.
Updated tester.ipxe to use the image verify command, built a self-signed CA, and then signed both bzImage and init.xz, creating the required .sig files.
#!ipxe
imgtrust --permanent
kernel http://192.168.1.88/fog/service/ipxe/bzImage init=/sbin/init initrd=init.xz root=/dev/ram0 rw ramdisk_size=250000 web=192.168.1.88/fog/ consoleblank=0 debug loglevel=7 mode=autoreg
initrd http://192.168.1.88/fog/service/ipxe/init.xz
imgverify bzImage http://192.168.1.88/fog/service/ipxe/bzImage.sig
boot
Results: imgverify command not found (!!nuts!!)
[round 6]
Rebuilt ipxe.efi to include IMAGE_TRUST_CMD. Updated DHCP to use ipxe.efi instead of snponly.efi.
Results: failure; ipxe.efi was not compiled with a valid certificate. Ugh! Unless ipxe.efi is compiled with the self-signed certificate, the imgverify command won't work.
-
@Sebastian-Roth I think I'm just hoping to see NIC information. While the EFI issues are still present, the original issue was that registration was failing, so the output may at least help us get drivers for the NIC.
-
@Tom-Elliott In which OS should I do this? Should I run these commands in the Flogger OS, in the Ubuntu live image, or somewhere else?
-
@sarge_212 Yes in FOS/FLOGGER
-
@Tom-Elliott Here is the output from those 2 commands:
Let me know what else I can do, thanks!
-
So this system doesn't have an onboard NIC? Was the USB NIC plugged in when you ran these commands?
-
@Tom-Elliott
Correct, the Surface Pro 4 does not have an onboard NIC. Let me compare the output, but I think the USB NIC was plugged in when I ran this. I might be able to boot from the USB stick in the dock with the USB 2.0 network adapter plugged in; would that be helpful?
-
@sarge_212 I think/hope so, maybe. I don't really know, though.
-
@Tom-Elliott So I tried that with the USB network adapter, and it just added a device to the output of lsusb. Not sure where to go from here.
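When comparing lsusb output with and without the adapter plugged in, the useful part is the vendor:product ID of the new device, since that is what you'd search for to find which kernel driver (if any) supports it. The minimal Python sketch below diffs two captured lsusb outputs; the function names are hypothetical, and it keys devices by ID, so two identical adapters would collapse into one entry.

```python
import re

# Matches lines like "Bus 001 Device 004: ID 1d6b:0002 Linux Foundation 2.0 root hub".
LSUSB_LINE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def usb_ids(lsusb_output):
    """Return {'vvvv:pppp': description} for every device line in lsusb output."""
    devices = {}
    for line in lsusb_output.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m:
            vid, pid, desc = m.group(3).lower(), m.group(4).lower(), m.group(5)
            devices[f"{vid}:{pid}"] = desc
    return devices


def new_devices(before, after):
    """Devices present in `after` but not in `before`, keyed by vendor:product ID."""
    old = usb_ids(before)
    return {k: v for k, v in usb_ids(after).items() if k not in old}
```

Save the two outputs (for example `lsusb > before.txt`, plug in the adapter, `lsusb > after.txt`) and pass the file contents to `new_devices`.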
-
@sarge_212 You might wanna try this bzImage_epk kernel image. Download it and put it into /var/www/fog/service/ipxe on your FOG server. I compiled this kernel with something called earlyprintk enabled, so possibly we'll see some more information before we hit the kernel panic. I mean: use this kernel and try PXE booting your device with the USB NIC, using this kernel binary instead of the normal one. Plus use
earlyprintk=efi
as a kernel parameter for this host: register this host/MAC by hand in the WebGUI, add the kernel parameter to it and run a debug task with it.
-
@Sebastian-Roth So I can manually register the host in the WebGUI with the MAC address? I will try that out.
-
@Sebastian-Roth I've done the manual registration and pushed bzImage_epk to the host with the kernel arguments, and it did print a lot of text before the kernel panic. Video coming soon…
-
@sarge_212 Here is some video output of the pxe boot and kernel panic. Hope this helps!
-
@sarge_212 Thanks for the video. I get a rough impression, but it's impossible to read what's going on there. Are you able to take a series of pictures? Most cameras can do this, and having the camera in a fixed position would also help, I guess. As a start, can you take a clear picture of the full screen when it gets to the end (kernel panic)? Possibly we'll see something there already.
-
@Sebastian-Roth Sorry! Yeah I can do that.