Tablet PC hangs on bzImage
-
@Sebastian-Roth I’m only running one FOG server.
Here’s the hang3.pcap:
https://drive.google.com/file/d/1uA54wIeUgPbpduEJZp_RhobKP1Xh4aWq/view?usp=sharing
-
@Zerpie Ok, this time we see a bit more in the PCAP and I kind of know why my last filter was wrong. Thanks for sticking with me and being patient.
One thing (probably unrelated) that jumped out at me is that the kernel arguments (debug earlyprintk=efi loglevel=7) are not set. Maybe you just removed them after testing because we saw that the kernel does not even load properly. I just wanted to mention this in case you think they are still in place; if they are, then something else is wrong here.
So now let’s get to the interesting bits of the PCAP. Near the end we can see the client starting the download of the kernel (GET /fog/service/ipxe/bzImage_debug). And it starts off as normal. Well, kind of. Looking at the response from the web server in detail, a couple of things bug me. See the picture where I compare the startup of one of my clients (left) with yours (right):
- PHP version string showing PHP/5.6.37: it should not be possible to run a recent version of FOG with that PHP version. Please run rpm -qa | grep php and post the output here.
- Content-Length 1727680: it seems my kernel image is 4-5 times as big as yours. Why? Please run ls -alk /var/www/html/fog/service/ipxe/bzImage*
- The binary data on my side starts off with the magic MZ header [ref]; this is because the kernel is built as an EFI executable binary.
The latter two things might just be an issue with the home-brew bzImage_debug kernel, maybe even my fault when I gave you the instructions to build it. Not sure though.
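To check the MZ point on your side, here is a minimal shell sketch, assuming GNU coreutils (for head -c). It only reads the first two bytes of each kernel file and compares them with "MZ". has_mz_header is a hypothetical helper name, and the directory is the one from the ls command in the list above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed if a file starts with the two-byte
# "MZ" magic that EFI (PE/COFF) executables carry. Assumes head -c exists.
has_mz_header() {
    magic=$(head -c 2 "$1" 2>/dev/null)
    [ "$magic" = "MZ" ]
}

# Check every kernel in the FOG web directory used in this thread.
for f in /var/www/html/fog/service/ipxe/bzImage*; do
    [ -e "$f" ] || continue   # glob did not match anything
    if has_mz_header "$f"; then
        echo "$f: MZ header present"
    else
        echo "$f: no MZ header"
    fi
done
```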
But back to the kernel transfer. It does actually run for a bit, with the client sending proper TCP acknowledgment packets fairly quickly. It all seems pretty perfect. But then, after only about 20 ms, the client stalls out of nowhere and stops sending TCP ACKs or any other packets back to the server. The server on the other side keeps asking for confirmation for a couple of seconds and then gives up.
So it looks like iPXE is causing the hang, maybe while transferring data over the network or just after some amount of time. When I told you to try snponly.efi, I hoped that this might fix the issue because it uses a different network stack/driver. It didn’t help.
So before we get into building iPXE from source (as easy as building the kernel!), I would like you to try an old iPXE binary that we used with other tablets when there was an EFI timer issue in iPXE more than two years ago. That issue was fixed in iPXE, which is why I didn’t think of it until now.
ipxe.efi: https://github.com/FOGProject/fogproject/raw/9213bd2a456718b2ce00fa46de4982d35f2703be/packages/tftp/i386-efi/ipxe.efi
snponly.efi: https://github.com/FOGProject/fogproject/raw/9213bd2a456718b2ce00fa46de4982d35f2703be/packages/tftp/i386-efi/snponly.efi (only in case you want to give it a try, but I guess if the other one doesn’t do it, this one won’t either)
Just download the binary and put it into the /tftpboot directory on your FOG server (I’d suggest renaming the original binary first).
-
@Sebastian-Roth said in Tablet PC hangs on bzImage:
PHP version string showing PHP/5.6.37 - should not be possible to run a recent version of FOG with that PHP version. Please run rpm -qa | grep php and post output here.
# rpm -qa | grep php
php-gd-5.6.37-1.el7.remi.x86_64
php-mcrypt-5.6.37-1.el7.remi.x86_64
php-pecl-jsonc-1.3.10-2.el7.remi.5.6.x86_64
php-5.6.37-1.el7.remi.x86_64
php-ldap-5.6.37-1.el7.remi.x86_64
php-pdo-5.6.37-1.el7.remi.x86_64
php-process-5.6.37-1.el7.remi.x86_64
php-common-5.6.37-1.el7.remi.x86_64
php-cli-5.6.37-1.el7.remi.x86_64
php-bcmath-5.6.37-1.el7.remi.x86_64
php-mbstring-5.6.37-1.el7.remi.x86_64
php-mysqlnd-5.6.37-1.el7.remi.x86_64
php-pecl-zip-1.15.3-1.el7.remi.5.6.x86_64
php-fpm-5.6.37-1.el7.remi.x86_6
Content-Length: 1727680 seems like my kernel image is 4-5 times as big as yours is. Why? Please run ls -alk /var/www/html/fog/service/ipxe/bzImage*
# ls -alk /var/www/html/fog/service/ipxe/bzImage*
-rwxr-xr-x. 1 fog fog 8118832 Sep 6 16:21 /var/www/html/fog/service/ipxe/bzImage
-rwxr-xr-x. 1 fog fog 7562352 Sep 6 16:21 /var/www/html/fog/service/ipxe/bzImage32
-rw-r--r--. 1 root root 1727680 Sep 10 13:23 /var/www/html/fog/service/ipxe/bzImage_debug
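The Content-Length from the PCAP can be matched against these sizes: only bzImage_debug is exactly 1727680 bytes, which fits the client requesting bzImage_debug, and 8118832 / 1727680 ≈ 4.7, in line with the 4-5 times estimate. A minimal shell sketch of that matching, assuming a POSIX shell; files_with_size is a hypothetical helper, not part of FOG.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print every file whose size in bytes equals
# the first argument, e.g. a Content-Length value taken from a PCAP.
files_with_size() {
    want=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # wc -c < file prints the byte count; some wc versions pad with spaces.
        size=$(wc -c < "$f" | tr -d ' ')
        [ "$size" -eq "$want" ] && echo "$f"
    done
    return 0
}

files_with_size 1727680 /var/www/html/fog/service/ipxe/bzImage*
```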
I re-added the host kernel arguments (debug earlyprintk=efi loglevel=7) and tried the old iPXE binary, but it’s still hanging on bzImage.
-
@Zerpie said in Tablet PC hangs on bzImage:
tried the old iPXE binary, but it’s still hanging on BzImage
Too bad. So I guess we need to get into debugging iPXE. Start by building your own ipxe.efi binary following the instructions here: https://wiki.fogproject.org/wiki/index.php?title=IPXE#Compile
Which version of FOG are you running? Sorry if I asked this before, but I cannot find it in the thread.
-
-
@Sebastian-Roth I made it to the “Bake The Cake” section and built the simple 32-bit EFI binaries. I’m not sure what to do from there.
-
@Zerpie Ok, I guess you were able to boot up your client using that new iPXE binary to the same point where it hangs, right?
So we need to dive into it. Probably the easiest way to do this is getting to know the iPXE command shell. For that, create a fresh text file in the iPXE source code directory, name it shell or whatever you like, and give it the following content:
#!ipxe
shell
Now compile the binary again, but using:
make bin-i386-efi/ipxe.efi EMBED=shell
(where shell is the filename of the above script).
Now when you boot a client, it does not do all the FOG magic but throws you straight to the iPXE shell. Please run the following commands, take a picture of the screen, and post it here:
ipxe> dhcp net0
...
ipxe> ifstat
...
If you are very keen, you can go through the full set of iPXE driver tests on your own: http://ipxe.org/dev/driver
Feel free to ask questions wherever you need help with that.
-
@Sebastian-Roth said in Tablet PC hangs on bzImage:
Now compile the binary again but using: make bin-i386-efi/ipxe.efi EMBED=shell (filename of the above script)
I must be doing something wrong. I made shell.txt and placed it in /projects/ipxe/ipxe-efi/src (perhaps that’s wrong?). Then I tried compiling the binary again with that command, and this is what I get:
# make bin-i386-efi/ipxe.efi EMBED=shell
make: *** No rule to make target `shell', needed by `bin-i386-efi/embedded.o'. Stop.
-
@Zerpie I should have been clearer. The exact filename has to be used in the EMBED= parameter, and the file needs to be in the ipxe-code/src/ directory. So if you have ipxe-code/src/shell.txt, you should be able to compile using the command:
make bin-i386-efi/ipxe.efi EMBED=shell.txt
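Putting the two steps together, a minimal shell sketch. IPXE_SRC is a placeholder for your ipxe-code/src/ directory, and write_shell_script / build_shell_ipxe are hypothetical helper names. The point it illustrates is that the script file name must match what is passed to EMBED= exactly, and that make has to be run from src/.

```shell
#!/bin/sh
# Placeholder: point this at your own ipxe-code/src/ checkout.
IPXE_SRC=${IPXE_SRC:-$HOME/ipxe/src}

# Sketch (hypothetical helper): write the two-line embedded script.
# The first line is the #!ipxe signature, the second drops to the shell.
write_shell_script() {
    printf '#!ipxe\nshell\n' > "$1/shell.txt"
}

# Sketch (hypothetical helper): build from src/ so EMBED=shell.txt resolves.
build_shell_ipxe() {
    write_shell_script "$IPXE_SRC"
    (cd "$IPXE_SRC" && make bin-i386-efi/ipxe.efi EMBED=shell.txt)
}
```

Calling build_shell_ipxe leaves the result in bin-i386-efi/ipxe.efi under src/.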
-
@Sebastian-Roth Thanks, I got it now. So I compiled the binary again, but it’s still hanging on bzImage and doesn’t drop me to the iPXE shell. I must be doing something wrong. I made sure to edit dhcpd.conf again to point it back to ipxe.efi instead of snponly.efi.
Here’s what I’m still seeing when I try to boot the tablet. (Sorry for the quality; that’s the clearest I can get the text to appear in a photo.)
-
@Zerpie Did you copy the newly built binary to the TFTP directory? I forgot to mention this point and it’s not in the wiki either.
Important hint: At this point you need to be careful, because this will break PXE booting for all your clients if this is your production environment. As long as the shell iPXE binary is in place, all your 32-bit UEFI clients will use it; they won’t properly PXE boot into tasks and won’t chainload to boot from the hard disk either.
So first move the original binary out of the way:
mv /tftpboot/i386-efi/ipxe.efi /tftpboot/i386-efi/ipxe.efi.orig
Then copy the newly built binary over:
cp path_to_ipxe_code/src/bin-i386-efi/ipxe.efi /tftpboot/i386-efi/ipxe.efi
After booting the client it should drop you to the shell.
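Because the shell build affects every 32-bit UEFI client, the swap can be wrapped in a small shell sketch that also restores the original binary once testing is done. TFTP_DIR and IPXE_SRC are placeholders (IPXE_SRC stands for path_to_ipxe_code/src), and install_test_ipxe / restore_orig_ipxe are hypothetical helper names.

```shell
#!/bin/sh
# Placeholders: adjust to your FOG server and iPXE checkout.
TFTP_DIR=${TFTP_DIR:-/tftpboot/i386-efi}
IPXE_SRC=${IPXE_SRC:-$HOME/ipxe/src}

# Sketch (hypothetical helper): back up the production binary once, then
# copy the test build over it.
install_test_ipxe() {
    # Only move the original if no backup exists yet, so repeated test
    # builds never overwrite the known-good copy.
    if [ ! -e "$TFTP_DIR/ipxe.efi.orig" ]; then
        mv "$TFTP_DIR/ipxe.efi" "$TFTP_DIR/ipxe.efi.orig"
    fi
    cp "$IPXE_SRC/bin-i386-efi/ipxe.efi" "$TFTP_DIR/ipxe.efi"
}

# Sketch (hypothetical helper): put the original binary back so the other
# clients boot normally again.
restore_orig_ipxe() {
    mv "$TFTP_DIR/ipxe.efi.orig" "$TFTP_DIR/ipxe.efi"
}
```

Run install_test_ipxe before booting the tablet and restore_orig_ipxe when you are done testing.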
-
@Sebastian-Roth Thanks for the help. I was able to run the commands in the iPXE shell and here are the results.
-
@Zerpie Well, that’s interesting. We see that it uses the SNP driver, which is a more generic network stack. I was expecting to see a specific iPXE network driver here that we could try to debug. I don’t think it’s worth looking into the SNP part of the code, as it is used on many machines and is probably well tested. I guess the tablet firmware is just not playing by the rules here.
One route we still have is finding out which network chipset you use and seeing whether we can modify one of the iPXE drivers to work with it. So boot up Windows on your device and take a look at the Device Manager. In the details you will find the hardware IDs of the network card.
If it is a USB adapter, maybe you can find a different model to use with that tablet?
-
@Sebastian-Roth So the network adapter is built into the docking station that the tablet sits on, and it looks like Windows sees it simply as a USB Ethernet adapter. All it shows for the adapter in the Details tab is ASIX AX88179 USB 3.0 to Gigabit Ethernet Adapter.
I do have a few different brands of USB Ethernet adapters here that I can test out. The docking station also has USB ports, so I should be able to try out each one.
-
I tried 3 different USB Ethernet adapters, but it looks like it only saw one of them while booting up. Once in Windows, Windows could see another one, but during boot it acts as if it’s not there.
Here’s the result of running dhcp net0 and ifstat on the one that was working during boot up. Again, sorry for the quality. This little display doesn’t photograph well.
-
@Zerpie Great that you were able to test other adapters. Too bad none of them worked out of the box.
About the ASIX AX88179: it seems someone else had an issue with an adapter using the same chipset two years ago [ref]. Some time ago I started documenting all the adapters we have heard about in the forums. Sorry, this one is not confirmed to be working [ref]. That does not mean we cannot make it work, but I have not seen it working yet.
In general, not every adapter works with every device. In particular, for PXE boot to work, the device firmware (UEFI) needs to have a driver for the specific adapter. And for FOG to work, iPXE and the Linux kernel need to have a driver for the adapter as well.
I just found out that there might be a native driver in iPXE for that network chip. See ipxe-code/src/drivers/net/axge.c:
static struct usb_device_id axge_ids[] = {
    {
        .name = "ax88179",
        .vendor = 0x0b95,
        .product = 0x1790,
    },
So there is hope. Maybe we just need to figure out the USB IDs of your adapter to make it work. Can you find those in Windows Device Manager? Device Manager -> right-click the Ethernet adapter -> Properties -> Details -> select property Hardware Ids
-
@Sebastian-Roth said in Tablet PC hangs on bzImage:
Maybe we just need to figure out the USB IDs of your adapter to make it work. Can you figure those out in Windows device manager? device manager -> right click the ethernet adapter -> properties -> details -> select property Hardware Ids
Ah, thank you. I finally found it.
Under Hardware IDs it shows:
USB\VID_0B95&PID_1790&REV_0100
USB\VID_0B95&PID_1790
-
@Zerpie There might be some hope on the horizon. The IDs do match, so we just need to figure out why iPXE is using the SNP driver stack instead of this native driver. From my understanding (though I am not an iPXE developer, just someone who has been playing with it for a fair while), iPXE mostly uses native drivers if available.
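The ID comparison can also be done mechanically. A minimal shell sketch, assuming POSIX sed and tr: it pulls the VID_ and PID_ fields out of a Windows hardware ID and compares them with the ax88179 entry from axge.c (vendor 0x0b95, product 0x1790). usb_ids and matches_ax88179 are hypothetical helpers.

```shell
#!/bin/sh
# Sketch (hypothetical helper): turn a Windows hardware ID such as
# USB\VID_0B95&PID_1790&REV_0100 into lowercase vendor:product form.
usb_ids() {
    vid=$(printf '%s\n' "$1" | sed -n 's/.*VID_\([0-9A-Fa-f]\{4\}\).*/\1/p' | tr 'A-F' 'a-f')
    pid=$(printf '%s\n' "$1" | sed -n 's/.*PID_\([0-9A-Fa-f]\{4\}\).*/\1/p' | tr 'A-F' 'a-f')
    # Print only if both fields were found.
    [ -n "$vid" ] && [ -n "$pid" ] && echo "$vid:$pid"
}

# Sketch (hypothetical helper): compare against the ax88179 entry in axge_ids.
matches_ax88179() {
    [ "$(usb_ids "$1")" = "0b95:1790" ]
}

hwid='USB\VID_0B95&PID_1790&REV_0100'
if matches_ax88179 "$hwid"; then
    printf '%s matches the ax88179 entry in axge.c\n' "$hwid"
fi
```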
Looking through the axge.c driver code I found this:
*
* Asix 10/100/1000 USB Ethernet driver
*
* Large chunks of functionality are undocumented in the available
* datasheets. The gaps are deduced from combinations of the Linux
* driver, the FreeBSD driver, and experimentation with the hardware.
*/
Sounds promising, hmm?
Seems like a different make target needs to be used to enable the native driver for USB network adapters (ref). Maybe try
make bin-i386-efi/ncm--ecm--axge.efi EMBED=shell.txt
and on boot:
ipxe> dhcp net0
...
ipxe> ifstat
...
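As far as I understand the iPXE build system, a multi-driver target joins the driver names with "--", and the make target is also the name of the output file under src/. A tiny shell sketch that assembles such a target name; ipxe_target is a hypothetical helper, and the driver list is just the one from the command above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): build a make target like
# bin-i386-efi/ncm--ecm--axge.efi from a platform and a list of drivers.
ipxe_target() {
    platform=$1
    shift
    drivers=""
    for d in "$@"; do
        # Join driver names with "--".
        if [ -z "$drivers" ]; then drivers=$d; else drivers="$drivers--$d"; fi
    done
    printf 'bin-%s/%s.efi\n' "$platform" "$drivers"
}

ipxe_target i386-efi ncm ecm axge
```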
-
If it’s recognized as USB, you may want to turn on has_usb_nic=1 as a kernel argument: https://wiki.fogproject.org/wiki/index.php?title=USB_NIC_(usb_network_adapter)
Potentially this will already get you a bit further with the default binaries.
-
@Sebastian-Roth said in Tablet PC hangs on bzImage:
Maybe try make bin-i386-efi/ncm--ecm--axge.efi EMBED=shell.txt and on boot:
Once I run that command, do I need to move the file over to /tftpboot/i386-efi as well? Will it know to boot from that file? Sorry, I’ve had to put this project on the back burner for a while, and now that I’m coming back to it, it’s hard to pick up where I left off.