UEFI booting with Yoga 370
-
@Iceman344 Alright, I have tried hard to understand what’s going on here and I have had some new insights. But still no solution, I’m afraid. First off, here is another binary, 08_ipxe.efi (download), which should print out which devices it tries to remove.
Trying this binary on a UEFI MacBook I see the following:
This is not actually hanging - I just added a sleep call so I could capture a good picture of this. Knowing a bit about TCP, I see the transfer being properly finished and the connections closed (PSH = “push” the last bytes, FIN = then close the connection, ACK = acknowledging the last couple of bytes). Then we receive FIN ACK from the server and both finally terminate the connection with ACK. This is how it should be.
In the pictures you posted I see things being all over the place. In the picture you took with 05_ipxe.efi we see the client side wants to FIN ACK the connection but the server has not sent FIN yet. Looks like it kills the connection right in the middle somewhere. For 06 and 07 I see a proper PSH FIN ACK received from the server. The client sends a good-looking FIN ACK but for whatever reason does not get the final ACK from the server.
Please do me a favor and capture a packet dump of the communication as well. See George’s instructions here: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue but use
tcpdump -w output.pcap host 192.168.12.x and port 80
as we only want to see the HTTP traffic of this one client (put in the correct IP of your test client). Please upload the PCAP file somewhere and post a link here or send me a private message with the download link.
@george1421 Maybe you have an idea what could be the issue here?
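For reference, the flag names used above (PSH, FIN, ACK) are single bits in the flags byte of the TCP header. A minimal Python sketch of how a flags byte maps to those names; the function name is made up for illustration and only the six classic flags are covered:

```python
# Bit values of the classic TCP flags in the flags byte of the TCP header.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def decode_tcp_flags(flag_byte):
    """Return the names of all flags set in flag_byte (hypothetical helper)."""
    return [name for bit, name in TCP_FLAGS if flag_byte & bit]


# 0x19 is what a "PSH FIN ACK" segment carries, 0x11 a "FIN ACK".
print(decode_tcp_flags(0x19))
print(decode_tcp_flags(0x11))
```

This is only meant to make the abbreviations in a packet dump easier to read; it does not parse a capture file.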
-
@sebastian-roth Loaded up the image and took some pictures. I saw some devices being initialised, so I snapped that and the last line pushed to the screen. I don’t know if these were the devices you were referring to in the first lines of your post.
Now about the network traffic. Reading over your reasoning, I agree and am afraid that the reality is very close to that, if not worse. Taking a quick look at the dump, it is filled with retransmissions because of the “Frame check sequence” being incorrect (packet checksums don’t match). It seems that all the incorrect packets have an FCS of 0xd3600000. I’m guessing it’s dropping the packets, as from time to time the loading freezes for a decent amount of time.
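As a side note on what that check does: the Ethernet FCS is a CRC-32 computed over the frame contents, so it normally differs from frame to frame, which is why one identical value on every flagged packet looks odd. A minimal Python sketch of the check, using a made-up frame (the MAC addresses and payload are placeholders, not taken from the capture):

```python
import struct
import zlib


def ethernet_fcs(frame_without_fcs):
    """Return the 4 FCS bytes for a frame (CRC-32, least significant byte first)."""
    return struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)


def fcs_is_valid(frame_with_fcs):
    """Recompute the CRC-32 over the frame body and compare with the trailing 4 bytes."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == fcs


# Hypothetical frame: broadcast destination, placeholder source MAC,
# IPv4 EtherType, 46 bytes of zero padding as payload.
body = bytes.fromhex("ffffffffffff" "020000000001" "0800") + bytes(46)
good_frame = body + ethernet_fcs(body)

# Flip one payload byte to simulate corruption on the wire.
corrupted = bytearray(good_frame)
corrupted[20] ^= 0xFF

print(fcs_is_valid(good_frame))
print(fcs_is_valid(bytes(corrupted)))
```

Whether a capture contains the FCS at all depends on the NIC and driver, so this is a sketch of the principle rather than a tool for this particular dump.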
I also spotted this at the end of a first partial dump I did. However, I could not find the same again in the full capture. Probably the result of packet errors:
7zXZ Destination address too large XZ-compressed data is corrupt Bug in the XZ decompressor Destination physical address inappropriately aligned Destination virtual address inappropriately aligned XZ decompressor ran out of memory Input is not in the XZ format (wrong magic bytes) Input was encoded with settings that are not supported by this XZ decoder Kernel is not a valid ELF file Failed to allocate space for phdrs Avoiding potentially unsafe overlapping memcpy()! -- System halted EL64 EL32 Failed to handle fs_proto Failed to open volume initrd= Failed to alloc mem for rom Failed to read rom->vendor Failed to read rom->devid Failed to alloc mem for gdt efi_main() failed! exit_boot() failed! Failed to get handle for LOADED_IMAGE_PROTOCOL Failed to alloc lowmem for boot params Trying to load files to higher address Failed to alloc mem for pci_handle Failed to alloc mem for gdt structure efi_relocate_kernel() failed! efi= nochunk Failed to open file: Failed to get file info size Failed to get initrd info EFI stub: ERROR: Failed to alloc mem for file handle list Failed to alloc mem for file info EFI stub: ERROR: Failed to alloc highmem for files EFI stub: ERROR: We've run out of free low memory EFI stub: ERROR: Failed to read file EFI stub: ERROR: Failed to allocate usable memory for kernel. EFI stub: UEFI Secure Boot is enabled. EFI stub: ERROR: Could not determine UEFI Secure Boot status.
-
@Iceman344 Damn, I forgot to add the “skipping tcp_shutdown” part, so you ran into the same issue as earlier. Sorry for that. Try 09_ipxe.efi (download)!
Thanks for the packet capture. I’ll look into this later on when I have more time.
-
So, last update from me. The device dump did get displayed now, nicely halting at EFI. Hope this gives a good insight into the problem. As I’m leaving my current job, I’m handing this issue over to my boss. I explained the EFI testing steps and he’s been following it for some time, so he is up to date.
Thanks for all the help and support; it’s great to see dedication to an open project like this. I’ll still be around on the forum, as I’ll keep using FOG personally from now on.
-
@Iceman344 It’s been great to work with you. All the best to you.
TCP is pretty amazing! I’ve seen lots of packet dumps but it’s always a bit different and you see new things when taking a deep dive in. At first sight this looks as if the client (Realtek USB NIC) is just a bit slow processing the data it gets from the server. In this case it has to handle quite a lot of it, as kernel/initrd is more than just a few HTTP bytes. For the case where one end is slower than the other, TCP has a good set of “regulative” algorithms. One is “flow control” (a.k.a. “window size” or “sliding window protocol”). Each end can tell the other how much data it may send before it has to wait for an acknowledgement. A smaller window size slows down the transfer. But from what I can see the client does not make use of this.
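To put rough numbers on the window size effect described above: a sender can have at most one window of unacknowledged data in flight per round trip, so the window divided by the round-trip time gives an upper bound on throughput. A small Python sketch; the 1 ms round-trip time and the window sizes are assumed values for illustration, not measurements from the capture:

```python
def max_throughput_mbit(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput in Mbit/s: one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


ASSUMED_RTT = 0.001  # 1 ms, a guess for a small LAN

for window in (65535, 16384, 4096):
    print(window, "bytes ->", round(max_throughput_mbit(window, ASSUMED_RTT), 1), "Mbit/s")
```

With these assumed numbers, a receiver advertising a 4096 byte window would hold the sender well below 100 Mbit/s, which is the kind of throttling a slow client could request.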
I found an interesting post (https://ask.wireshark.org/questions/17730/retransmissions) that states:
Packets get lost for any number of reasons. Here are a few likely candidates for large number of retransmissions:
- Full Duplex / Half Duplex mismatch (check the configuration of the network card and switch interfaces)
- The server transmits data with a high speed (say 1 GBit) and the receiver is connected with a lower speed (say 100 MBit). Drops occur if the receiver is signalling a large TCP window size, found in the TCP header.
- One of your routers is configured with a quality of service rule that enforces a certain bandwidth
- A broken cable offers very poor signal quality
- A wireless network is busy or suffers from interference
My first guess is that server and client are going at different speeds. Could you please force the connection speed on the switch port on both server and client side to 100 MBit/s? In case you cannot alter the switch configuration, you could also just take an old 100 MBit/s mini switch and connect it in between. Again, take a packet dump on the FOG server and see if you still get TCP Dup ACK and TCP Retransmission packets.
“The device dump did get displayed now, nicely halting at EFI.”
Although I am not sure, I kind of hope that this is just caused by skipping the TCP cleanup stuff. Let’s see if we can get the connection/transfer to play nicely first and then see if we still hang on removing EFI…
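For anyone wondering how a retransmission is recognised in a dump: roughly speaking, it is a data segment that carries no bytes beyond what was already sent in that direction. A simplified Python sketch of that idea, working on a hand-written list of (sequence number, payload length) pairs instead of a real capture. It ignores sequence number wrap-around and is far cruder than what Wireshark actually does:

```python
def count_retransmissions(segments):
    """Count data segments that only repeat bytes already sent.

    segments: (seq, payload_len) pairs for ONE direction of one TCP connection.
    """
    highest_end = None  # highest sequence number (exclusive) seen so far
    retransmissions = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        end = seq + length
        if highest_end is not None and end <= highest_end:
            retransmissions += 1
        else:
            highest_end = end
    return retransmissions


# Made-up server-to-client segments: the fourth one repeats bytes 1461..2920.
server_to_client = [(1, 1460), (1461, 1460), (2921, 1460), (1461, 1460), (4381, 1460)]
print(count_retransmissions(server_to_client))  # 1
```

If forcing the link to 100 MBit/s helps, a count like this over the new dump should drop noticeably compared to the first capture.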
-
Yay me. I’m on this ship now too. I just received a Yoga 370 ( 20JJS2F200 ) and I’m also unable to perform operations after a UEFI network boot.
I thought it may have been due to the new hardware this model has, Thunderbolt and WiGig but disabling them has no effect.
-
@sudburr Sorry to say this but yeah!! I am really happy that you joined the team. Hope you are willing to keep this up with me and keep the debugging going. Could you please get a packet dump using this command on your FOG server while booting up one of the Yogas:
tcpdump -w output.pcap host x.x.x.x and port 80
(just replace the x.x.x.x with the client’s IP). The capture file will be about 30-40 MB because the transferred kernel and initrd are within that capture file. But we need all this information to see if you have the same issue with retransmissions and such.
I haven’t heard from @Iceman344 or his boss yet. Hope they’ll be back again as well.
-
I don’t have a lot of spare time just now for much testing; on the road; but I do have a USB-C adapter coming in and I have hooked Lenovo on the line to hopefully help out too.
-
@sudburr Did you get to test this yet?
@Iceman344 Are you still around? Who’s your boss? How would he get in touch with us?
@Brian-Hoehn Are you still on this topic?
-
Was able to squeeze in a baseline observation before I go onto other things and I’ve informed Lenovo as well.
I have the following:
1. Lenovo ThinkPad OneLink+ to RJ45 Adapter ( p/n: SC10J34224BB FRU: 00JT801 )
2. Lenovo ThinkPad USB 3.0 Ethernet Adapter ( p/n: SC10H30171/RTL8153 FRU: 03X6903 )
3. Lenovo USB-C to Ethernet Adapter ( p/n: SC10L66919/RTL8153-04 FRU: 03X7205 )
All three work as desired on a Lenovo ThinkPad 13 (20GKSOA700 BIOS 1.24) for both Legacy and UEFI network boot. I also tested 1 & 2 on a Lenovo Yoga 260 (20FES0VJ00 BIOS 1.59) successfully.
Since there isn’t a OneLink+ connector on a Yoga 370 (1.17) here are my findings for the two USB models.
2a. Using the Lenovo ThinkPad USB 3.0 Ethernet Adapter ( p/n: SC10H30171/RTL8153 FRU: 03X6903 )
- Legacy mode (no secure boot, CSM, Legacy only)
- Selecting PCI LAN > Realtek PXE B000 D14
- PXE boots okay, FOS environment hands off successfully and process succeeds
2b. Using the Lenovo ThinkPad USB 3.0 Ethernet Adapter ( p/n: SC10H30171/RTL8153 FRU: 03X6903 )
- UEFI mode (no secure boot, no CSM, UEFI only)
- Selecting PCI LAN > Thinkpad USB LAN-IPv4
- PXE boots okay, but FOS environment then hangs on black screen after loading their inits for performing the selected action
3a. Using the Lenovo USB-C to Ethernet Adapter ( p/n: SC10L66919/RTL8153-04 FRU: 03X7205 )
- Legacy mode (no secure boot, CSM, Legacy only)
- Selecting PCI LAN > IBA CL Slot 00FE v0109
- Attempts to PXE boot but fails with: PXE-E61: Media test failure, check cable ; PXE-M0F: Exiting Intel Boot Agent.
3b. Using the Lenovo USB-C to Ethernet Adapter ( p/n: SC10L66919/RTL8153-04 FRU: 03X7205 )
- UEFI mode (no secure boot, no CSM, UEFI only)
- Selecting PCI LAN > Intel Gigabit 0.0.18-IPv4
- goes directly to a blank screen for 5 seconds then returns to the F12 boot menu
There is clearly something amiss with the 370’s code.
Lenovo says that the 4X90E51405 should work, but I don’t have one … yet 8).
-
@sudburr Nice, thanks a lot for testing and posting all those details here!!
Do you think you can squeeze in another short test to capture a packet dump for us to look at? Run
tcpdump -w output.pcap host x.x.x.x and port 80
on the server, and boot up the client. When it hangs, just quit tcpdump (Ctrl-C), upload the capture and post a link here.
-
Lenovo is shipping me a 4X90E51405 for testing. I hope to be back into testing it soon.
-
Alrighty then, I have successfully UEFI and Legacy network booted a Lenovo ThinkPad Yoga 370 (20JJS2F200) with BIOS 1.17 using a Lenovo Mini-Ethernet Extension Adapter (p/n: sc10a39882aa, fru: 04x6435)
Yay another proprietary interface! (here’s the modern version).
-
@sudburr Thanks for testing! So does that mean you don’t see the same issue (hang on init.xz) that was reported initially?
-
That’s right. It works exactly as desired.
-
@sudburr You still have the other USB NIC adapters by any chance? Would you give those a try as well?
-
I already tested the others (see above) with the same BIOS and fog server configuration (1.4.4 / 4.12.3).
-
@sudburr Sorry, didn’t figure you had tested those on that very same model. Now I got it.
So that means the issue is gone. Most probably because of a new hardware release or firmware version?!?
-
No. Nothing changed on the laptop.
To get past the inits in UEFI network boot I had to use the Lenovo Mini-Ethernet Extension Adapter ( p/n: sc10a39882aa, fru: 04x6435 ) that I just acquired.
To get past the inits in Legacy network booting I can use the Lenovo ThinkPad USB 3.0 Ethernet Adapter ( p/n: SC10H30171/RTL8153 FRU: 03X6903 ) or the Lenovo Mini-Ethernet Extension Adapter ( p/n: sc10a39882aa, fru: 04x6435 ) that I just acquired.
All tests were conducted with BIOS 1.17 and fog server configuration (1.4.4 / 4.12.3).
-
@sudburr Ok, thank you very much again for all the tests and detailed information! So I’ll mark this as solved (on my todo list) now.
Hope that all the other Yoga 370 users out there find this!