UEFI booting with Yoga 370



  • I can boot it into iPXE in Legacy mode, but when I try UEFI I just get a solid black screen after the image and init load. I saw a similar issue in an older post. Has anyone found a solution to this?
    I’ve made sure Secure Boot is disabled. I also disabled SGX, but I’m not sure that would cause any problems. I can’t think of anything different from my other devices that work just fine, even my ThinkPad T570.
    I’m using the USB 3.0 adapter for this, but I don’t know if that matters at this point, as the image and init are already downloaded.
    I’m building my Windows 10 images with UEFI, so I would rather not have to change things around just to image.
    I’ve tried kernel versions 4.9.4 and 4.10.10.


  • Developer

    @sudburr Nice, thanks a lot for testing and posting all those details here!!

    Do you think you can squeeze in another short test to capture a packet dump for us to look at? Run tcpdump -w output.pcap host x.x.x.x and port 80 on the server (replace x.x.x.x with the client’s IP), then boot up the client. When it hangs, just quit tcpdump (Ctrl-C), upload the file and post a link here.



  • I was able to squeeze in a baseline observation before moving on to other things, and I’ve informed Lenovo as well.

    I have the following:

    1. Lenovo ThinkPad OneLink+ to RJ45 Adapter (p/n: SC10J34224BB FRU: 00JT801)
    2. Lenovo ThinkPad USB 3.0 Ethernet Adapter (p/n: SC10H30171/RTL8153 FRU: 03X6903)
    3. Lenovo USB-C to Ethernet Adapter (p/n: SC10L66919/RTL8153-04 FRU: 03X7205)

    All three work as desired on a Lenovo ThinkPad 13 (20GKSOA700, BIOS 1.24) for both Legacy and UEFI network boot. I also tested 1 and 2 successfully on a Lenovo Yoga 260 (20FES0VJ00, BIOS 1.59).

    Since there is no OneLink+ connector on the Yoga 370 (BIOS 1.17), here are my findings for the two USB models.

    2a. Using the Lenovo ThinkPad USB 3.0 Ethernet Adapter (p/n: SC10H30171/RTL8153 FRU: 03X6903)

    • Legacy mode (no Secure Boot, CSM, Legacy only)
    • Selecting PCI LAN > Realtek PXE B000 D14
    • PXE boots okay, the FOS environment hands off successfully and the process succeeds

    2b. Using the Lenovo ThinkPad USB 3.0 Ethernet Adapter (p/n: SC10H30171/RTL8153 FRU: 03X6903)

    • UEFI mode (no Secure Boot, no CSM, UEFI only)
    • Selecting PCI LAN > Thinkpad USB LAN-IPv4
    • PXE boots okay, but the FOS environment then hangs on a black screen after loading the inits for performing the selected action

    3a. Using the Lenovo USB-C to Ethernet Adapter (p/n: SC10L66919/RTL8153-04 FRU: 03X7205)

    • Legacy mode (no Secure Boot, CSM, Legacy only)
    • Selecting PCI LAN > IBA CL Slot 00FE v0109
    • Attempts to PXE boot but fails with: PXE-E61: Media test failure, check cable; PXE-M0F: Exiting Intel Boot Agent.

    3b. Using the Lenovo USB-C to Ethernet Adapter (p/n: SC10L66919/RTL8153-04 FRU: 03X7205)

    • UEFI mode (no Secure Boot, no CSM, UEFI only)
    • Selecting PCI LAN > Intel® Gigabit 0.0.18-IPv4
    • Goes directly to a blank screen for 5 seconds, then returns to the F12 boot menu

    There is clearly something amiss with the 370’s code.

    Lenovo says that the 4X90E51405 should work, but I don’t have one … yet 8).


  • Developer

    @sudburr Did you get to test this yet?
    @Iceman344 Are you still around? Who’s your boss? How would he get in touch with us?
    @Brian-Hoehn Are you still onto this topic?



  • I don’t have much spare time for testing just now, as I’m on the road, but I do have a USB-C adapter coming in, and I’ve got Lenovo on the line to hopefully help out too.


  • Developer

    @sudburr Sorry to say this, but yeah!! I am really happy that you joined the team. I hope you are willing to keep this up with me and keep the debugging going. Could you please get a packet dump using this command on your FOG server while booting up one of the Yogas: tcpdump -w output.pcap host x.x.x.x and port 80 (just replace x.x.x.x with the client’s IP). The capture file will be about 30-40 MB, because the transferred kernel and initrd are contained in it. But we need all this information to see if you have the same issue with retransmissions and such.

    I haven’t heard from @Iceman344 or his boss yet. Hope they’ll be back again as well.



  • Yay me. I’m on this ship now too. I just received a Yoga 370 (20JJS2F200), and I’m also unable to perform operations after a UEFI network boot.

    I thought it might be due to the new hardware this model has (Thunderbolt and WiGig), but disabling them has no effect.


  • Developer

    @Iceman344 It’s been great to work with you. All the best to you.

    TCP is pretty amazing! I’ve seen lots of packet dumps, but it’s always a bit different and you see new things when taking a deep dive in. At first sight, this looks as if the client (Realtek USB NIC) is just a bit slow processing the data it gets from the server. In this case it has to handle quite a lot of it, as the kernel/initrd is more than just a few HTTP bytes. For when one end is slower than the other, TCP has a good set of regulating algorithms. One is “flow control” (a.k.a. “window size” or “sliding window protocol”): each end tells the other how many bytes it may send before it has to wait for an acknowledgement. A smaller window slows down the transfer. But from what I can see, the client does not make use of this.
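    To make the flow-control idea concrete: the receiver advertises a window in every ACK, and the sender may keep at most that many unacknowledged bytes in flight. Below is a minimal sketch, assuming the window-scale option (RFC 7323) may have been negotiated during the handshake. The function names are hypothetical and only illustrate the arithmetic. Real senders also cap in-flight data with the congestion window, which this sketch does not model.

```python
def effective_window(window_field: int, scale_shift: int) -> int:
    """Receive window in bytes: the 16-bit TCP header field shifted left by
    the window-scale shift agreed in the SYN/SYN-ACK (0 if not negotiated)."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the TCP window field is 16 bits")
    if not 0 <= scale_shift <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    return window_field << scale_shift


def sendable_bytes(last_ack: int, next_seq: int, window: int) -> int:
    """New bytes the sender may still put on the wire: the advertised
    window minus bytes already in flight (sent but not yet ACKed).
    Sequence-number wraparound is ignored in this sketch."""
    in_flight = next_seq - last_ack
    return max(0, window - in_flight)


# A receiver that wants the server to slow down advertises a smaller window.
print(effective_window(0xFFFF, 0))         # 65535
print(effective_window(512, 7))            # 65536
print(sendable_bytes(1000, 41000, 65536))  # 25536
```

    A slow client that keeps advertising a large window (as seen in the capture) never gets the benefit of this mechanism, because the server keeps filling that window.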

    I found an interesting post (https://ask.wireshark.org/questions/17730/retransmissions) that states:

    Packets get lost for any number of reasons. Here are a few likely candidates for large number of retransmissions:

    • Full Duplex / Half Duplex mismatch (check the configuration of the network card and switch interfaces)
    • The server transmits data with a high speed (say 1 GBit) and the receiver is connected with a lower speed (say 100 MBit). Drops occur if the receiver is signalling a large TCP window size, found in the TCP header.
    • One of your routers is configured with a quality of service rule that enforces a certain bandwidth
    • A broken cable offers very poor signal quality
    • A wireless network is busy or suffers from interference
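    A rough worked example of the second point, with illustrative figures rather than numbers taken from the capture: if the server sends a full window back-to-back at 1 Gbit/s toward a port that drains at 100 Mbit/s, only a tenth of the burst can be forwarded while it arrives. The rest has to sit in a buffer, or it gets dropped. The sketch assumes the whole window goes out in a single burst and ignores framing overhead and ACK clocking. The helper name is hypothetical.

```python
def burst_backlog_bytes(window_bytes: int, in_bps: float, out_bps: float) -> float:
    """Bytes still queued once a back-to-back burst of `window_bytes` has
    arrived at in_bps while the slower link drains it at out_bps.

    While the burst arrives, the slow side forwards only out_bps / in_bps of
    it. Everything else must be buffered (or dropped)."""
    if out_bps >= in_bps:
        return 0.0
    return window_bytes * (1 - out_bps / in_bps)


# A 64 KiB window sent at 1 Gbit/s toward a 100 Mbit/s port:
print(round(burst_backlog_bytes(65536, 1e9, 100e6)))  # 58982
```

    Any buffer smaller than that along the path drops frames, which then show up as Dup ACKs and retransmissions. Forcing both ends to the same speed removes the difference.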

    My first guess is that server and client are running at different speeds. Could you please force the connection speed of the switch ports on both the server and the client side to 100 Mbit/s? If you cannot alter the switch configuration, you could also just connect an old 100 Mbit/s mini switch in between. Again, take a packet dump on the FOG server and see whether you get TCP Dup ACK and TCP Retransmission packets.
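    Wireshark labels these packets for you. As a rough sketch of the idea behind the two labels: a retransmission is a data segment that carries no new bytes, and a Dup ACK is a pure ACK that repeats the previous ACK number. This is much simpler than Wireshark’s real heuristics, which also tell apart fast retransmissions, out-of-order segments and window updates. The sketch assumes each direction of the connection has already been extracted as (seq, ack, payload_len) tuples in capture order, with no sequence-number wraparound. Both functions are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

# (seq, ack, payload_len) for one direction of one TCP connection,
# in capture order. Hypothetical input shape for this sketch.
Segment = Tuple[int, int, int]


def find_retransmissions(segments: Iterable[Segment]) -> List[int]:
    """Indexes of data segments that carry no new bytes, i.e. whose
    payload ends at or below the highest sequence number already sent."""
    highest_end = None
    hits = []
    for i, (seq, _ack, length) in enumerate(segments):
        if length == 0:
            continue  # pure ACKs carry no data to retransmit
        end = seq + length
        if highest_end is not None and end <= highest_end:
            hits.append(i)
        highest_end = end if highest_end is None else max(highest_end, end)
    return hits


def find_dup_acks(segments: Iterable[Segment]) -> List[int]:
    """Indexes of pure ACKs that repeat the previous pure ACK number,
    i.e. the receiver saying "I am still missing the byte at `ack`"."""
    prev_ack = None
    hits = []
    for i, (_seq, ack, length) in enumerate(segments):
        if length != 0:
            continue  # only payload-free segments count as Dup ACKs here
        if ack == prev_ack:
            hits.append(i)
        prev_ack = ack
    return hits


server_to_client = [(1, 0, 1000), (1001, 0, 1000), (2001, 0, 1000), (1001, 0, 1000)]
client_to_server = [(0, 1001, 0), (0, 1001, 0), (0, 1001, 0), (0, 3001, 0)]
print(find_retransmissions(server_to_client))  # [3]
print(find_dup_acks(client_to_server))         # [1, 2]
```

    If forcing 100 Mbit/s makes both lists (or Wireshark’s equivalents) shrink, that points at the speed mismatch.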

    @Iceman344 said in UEFI booting with Yoga 370:

    The device dump did get displayed now, nicely halting at EFI.

    Although I am not sure, I kind of hope that this is just caused by skipping the TCP cleanup code. Let’s see if we can get the connection/transfer to play nicely first, and then see if we still hang on removing EFI…



  • @sebastian-roth

    So, a last update from me. The device dump did get displayed now, nicely halting at EFI. I hope this gives good insight into the problem. As I’m leaving my current job, I’m handing this issue over to my boss. I explained the EFI testing steps to him, and he’s been following it for some time, so he is up to date.

    Thanks for all the help and support; it’s great to see dedication to an open project like this. I’ll still be around on the forum, as I’ll keep using FOG personally from now on.

    [Photo: DSC_0089.JPG]


  • Developer

    @Iceman344 Damn, I forgot to add the “skipping tcp_shutdown” part, so you ran into the same issue as earlier. Sorry for that. Try 09_ipxe.efi (download)!

    Thanks for the packet capture. I’ll look into this later on when I have more time.



  • @sebastian-roth I loaded up the image and took some pictures. I saw some devices being initialised, so I snapped that and the last line printed to the screen. I don’t know if these were the devices you were referring to in the first lines of your post.

    [Photo: DSC_0086.JPG]

    [Photo: DSC_0088.JPG]

    Now about the network traffic. Reading over your reasoning, I agree, and I’m afraid the reality is even worse. Taking a quick look at the dump, it is filled with retransmissions because of the “Frame check sequence” being incorrect (the packet checksums don’t match). All the incorrect packets seem to have an FCS of 0xd3600000. I’m guessing it’s dropping those packets, as from time to time the loading dump freezes for a decent amount of time.
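    For reference, the Ethernet FCS is a CRC-32 using the same polynomial and parameters as zlib’s crc32. It is computed over the frame from the destination MAC to the end of the payload and stored least-significant byte first. Below is a minimal sketch for recomputing it on a frame exported as raw bytes. The helper names are hypothetical.

    One caveat is worth checking before trusting a “bad FCS” verdict. Captures taken on the server host usually do not contain the real FCS, because most NICs strip it before the OS sees the frame. In that case Wireshark may be reading four trailing bytes as an FCS. A value that repeats identically across many frames, like 0xd3600000, can be a hint of that, since a genuine CRC-32 changes with the frame contents.

```python
import zlib


def ethernet_fcs(frame_without_fcs: bytes) -> bytes:
    """IEEE 802.3 FCS: CRC-32 over destination MAC .. end of payload,
    stored least-significant byte first."""
    return zlib.crc32(frame_without_fcs).to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """True if the last four bytes are a valid FCS for the rest of the frame."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == fcs


frame = bytes(range(60))                             # dummy 60-byte frame body
print(fcs_ok(frame + ethernet_fcs(frame)))           # True
print(fcs_ok(frame + bytes.fromhex("d3600000")))     # False
```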

    I uploaded the capture here

    I also spotted this at the end of a first partial dump I did, but could not find it again in the full capture. It’s probably the result of packet errors:

    7zXZ
    Destination address too large
    XZ-compressed data is corrupt
    Bug in the XZ decompressor
    Destination physical address inappropriately aligned
    Destination virtual address inappropriately aligned
    XZ decompressor ran out of memory
    Input is not in the XZ format (wrong magic bytes)
    Input was encoded with settings that are not supported by this XZ decoder
    Kernel is not a valid ELF file
    Failed to allocate space for phdrs
    Avoiding potentially unsafe overlapping memcpy()!
    -- System halted
    EL64 EL32
    Failed to handle fs_proto
    Failed to open volume
    initrd=
    Failed to alloc mem for rom
    Failed to read rom->vendor
    Failed to read rom->devid
    Failed to alloc mem for gdt
    efi_main() failed!
    exit_boot() failed!
    Failed to get handle for LOADED_IMAGE_PROTOCOL
    Failed to alloc lowmem for boot params
    Trying to load files to higher address
    Failed to alloc mem for pci_handle
    Failed to alloc mem for gdt structure
    efi_relocate_kernel() failed!
    efi= nochunk
    Failed to open file:
    Failed to get file info size
    Failed to get initrd info
    EFI stub: ERROR: Failed to alloc mem for file handle list
    Failed to alloc mem for file info
    EFI stub: ERROR: Failed to alloc highmem for files
    EFI stub: ERROR: We've run out of free low memory
    EFI stub: ERROR: Failed to read file
    EFI stub: ERROR: Failed to allocate usable memory for kernel.
    EFI stub: UEFI Secure Boot is enabled.
    EFI stub: ERROR: Could not determine UEFI Secure Boot status.
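    The “7zXZ” at the top of that dump is the printable part of the XZ stream magic (bytes FD 37 7A 58 5A 00). Several of the messages match the kernel’s XZ decompressor and EFI stub, so corrupted data would surface as something like “XZ-compressed data is corrupt”. To rule the files themselves in or out, you could fetch them over HTTP from another machine and confirm they decompress cleanly.

    Below is a minimal Python sketch using the standard lzma module. It assumes a standalone .xz file, for example FOG’s init.xz if your install ships the XZ-compressed init. The kernel’s XZ payload is embedded inside bzImage, so the kernel cannot be checked this way. check_xz is a hypothetical helper.

```python
import io
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # six-byte XZ stream header magic


def check_xz(data: bytes) -> str:
    """'ok' if data is a complete XZ stream whose integrity checks pass,
    otherwise a short reason. Output is streamed and discarded, so the
    decompressed content never has to fit in memory at once."""
    if not data.startswith(XZ_MAGIC):
        return "not XZ (wrong magic bytes)"
    try:
        with lzma.LZMAFile(io.BytesIO(data), format=lzma.FORMAT_XZ) as f:
            while f.read(1 << 20):  # decompress 1 MiB at a time
                pass
    except lzma.LZMAError as exc:  # bad header/CRC/check, corrupt block
        return f"corrupt: {exc}"
    except EOFError:  # stream ended before its end-of-stream marker
        return "truncated"
    return "ok"
```

    Usage: check_xz(open("init.xz", "rb").read()). The file name is only an example; point it at the file your server actually serves.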
    

  • Developer

    @Iceman344 Alright, I have tried hard to understand what’s going on here and I have had some new insights, but still no solution, I’m afraid. First off, here is another binary, 08_ipxe.efi (download), which should print out which devices it tries to remove.

    Trying this binary on a UEFI MacBook, I see the following:
    [Screenshot: vlcsnap-2017-08-29-22h16m42s682.jpg]
    This is not actually hanging; I just added a sleep call so I could capture a good picture of it. Knowing a bit about TCP, I can see the transfer being properly finished and the connection closed (PSH = “push” the last bytes, FIN = then close the connection, ACK = acknowledge the last couple of bytes). Then we receive FIN ACK from the server, and both finally terminate the connection with ACK. This is how it should be.

    In the pictures you posted, things are all over the place. In the picture you took with 05_ipxe.efi, we see that the client side wants to FIN ACK the connection, but the server has not sent a FIN yet. It looks like it kills the connection somewhere right in the middle. For 06 and 07, I see a proper PSH FIN ACK received from the server. The client sends a good-looking FIN ACK but, for whatever reason, does not get the final ACK from the server.
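    The teardown rules described above can be checked mechanically. Each FIN consumes one sequence number, so a FIN counts as acknowledged only once the peer sends an ACK for fin_seq + payload_len + 1. Below is a minimal sketch, assuming the events are transcribed by hand from the screen or from a capture as (sender, flags, seq, ack, payload_len). Flags are given as letters (F, A, P), and sequence wraparound is ignored. close_status is a hypothetical helper, not part of iPXE.

```python
from typing import Dict, Iterable, Set, Tuple

# (sender, flags, seq, ack, payload_len); sender is "client" or "server",
# flags are letters such as "FPA" (FIN, PSH, ACK). Hypothetical shape.
Event = Tuple[str, str, int, int, int]


def close_status(events: Iterable[Event]) -> str:
    """Report whether both sides sent a FIN and had it acknowledged."""
    fin_end: Dict[str, int] = {}  # sender -> ack number the peer must send
    acked: Set[str] = set()       # senders whose FIN has been acknowledged
    for sender, flags, seq, ack, length in events:
        peer = "server" if sender == "client" else "client"
        if "F" in flags and sender not in fin_end:
            fin_end[sender] = seq + length + 1  # FIN occupies one sequence number
        if "A" in flags and peer in fin_end and ack >= fin_end[peer]:
            acked.add(peer)
    problems = []
    for side in ("client", "server"):
        if side not in fin_end:
            problems.append(f"{side} never sent FIN")
        elif side not in acked:
            problems.append(f"{side} FIN never ACKed")
    return "closed cleanly" if not problems else "; ".join(problems)


clean = [("server", "FPA", 5000, 100, 200),  # server: last bytes + FIN
         ("client", "FA", 100, 5201, 0),     # client ACKs it and sends its FIN
         ("server", "A", 5201, 101, 0)]      # server ACKs the client FIN
print(close_status(clean))      # closed cleanly
print(close_status(clean[:2]))  # client FIN never ACKed
```

    The second call mirrors what the 06/07 pictures show. A 05-style sequence, where the client FINs before the server has sent its FIN, would additionally report "server never sent FIN".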

    Please do me a favor and capture a packet dump of the communication as well. Follow George’s instructions (https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue), but use tcpdump -w output.pcap host 192.168.12.x and port 80, as we only want to see the HTTP traffic of this one client (put in the correct IP of your test client). Please upload the PCAP file somewhere and post a link here, or send me a private message with the download link.
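    If you want to poke at output.pcap without Wireshark, below is a minimal Python reader for the classic libpcap format that tcpdump -w writes. It applies the same host/port 80 filter. It assumes an Ethernet capture (link type 1), IPv4 without VLAN tags or fragments, and microsecond timestamps. pcapng files and other link types (such as a capture on -i any) are rejected rather than parsed. TcpSegment and read_tcp_segments are hypothetical names.

```python
import ipaddress
import struct
from typing import Iterator, NamedTuple


class TcpSegment(NamedTuple):
    ts: float         # capture timestamp, seconds
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    ack: int
    flags: int        # raw TCP flag bits (0x01 FIN, 0x02 SYN, 0x10 ACK, ...)
    window: int       # unscaled 16-bit window field
    payload_len: int


def read_tcp_segments(data: bytes, host: str, port: int = 80) -> Iterator[TcpSegment]:
    """Yield IPv4/TCP segments to or from `host` on `port` from a classic
    pcap file with Ethernet framing."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written by a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic (microsecond) pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != 1:
        raise ValueError(f"link type {linktype} is not Ethernet")
    want = ipaddress.IPv4Address(host)
    off = 24  # skip the global header
    while off + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[off:off + 16])
        frame = data[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # too short, or not IPv4 (VLAN-tagged frames land here too)
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] != 6:
            continue  # not TCP
        (total_len,) = struct.unpack("!H", ip[2:4])
        src = ipaddress.IPv4Address(ip[12:16])
        dst = ipaddress.IPv4Address(ip[16:20])
        tcp = ip[ihl:total_len]
        if len(tcp) < 20 or want not in (src, dst):
            continue
        sport, dport, seq, ack, off_byte, flags, window = struct.unpack(
            "!HHIIBBH", tcp[:16])
        if port not in (sport, dport):
            continue
        # Use the IP total length so Ethernet padding is not counted as payload.
        payload_len = total_len - ihl - (off_byte >> 4) * 4
        yield TcpSegment(ts_sec + ts_usec / 1e6, str(src), str(dst),
                         sport, dport, seq, ack, flags, window, payload_len)
```

    Usage: read the file with open("output.pcap", "rb").read() and pass the bytes together with the client’s real IP, for example read_tcp_segments(data, "192.168.12.50"). That address is only an example.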

    @george1421 Maybe you have an idea what could be the issue here?


  • Developer

    @Iceman344 This combination of hardware, USB NIC and UEFI firmware is just a piece of s**t, I reckon. Sorry for using those words, but I can’t believe it’s hanging on one of the other shutdown functions as well, now that we skipped the TCP shutdown function. There are six, if I remember correctly…
    I will look into this when I get home.



  • @sebastian-roth At first glance, running 06_ipxe did not produce different output from number 5.
    [Photo: DSC_0084.JPG]

    However, 07_ipxe did throw out something else. This time it hung on removing devices. I’m guessing this is because of a bad handover from the bootloader to the kernel?

    [Photo: DSC_0085.JPG]

    Not ugly IMHO; it’s pretty cool to see the packets flow by.


  • Developer

    @Iceman344 I am sorry, but I still don’t see the logic behind this. Maybe I just can’t see the wood for the trees right now. I’ve again added more debug output in 06_ipxe.efi (download) and also compiled 07_ipxe.efi, which skips the TCP shutdown code altogether. Sure, this is really ugly, but let’s see what happens. Again, try it out and take a picture when it hangs. Thanks!



  • @sebastian-roth I loaded up the image and let it run for a bit (it took a while, as it had to dump all the TCP handshakes). I let the laptop stay on for a while after this, but nothing further happened.

    [Photo: DSC_0243.JPG]


  • Developer

    @Iceman344 Right now, this looks to me as if iPXE is just waiting for the TCP connection to close. For some reason, the other communication partner (the FOG server in this case, I reckon) does not seem to close the connection properly, and iPXE is waiting for it. This is just an assumption so far, as we don’t have a packet dump of the communication yet; that might be one of the next steps. Don’t get me wrong: I still think this is caused by the Realtek USB NIC… we’ll figure it out at some point, I am sure.

    But first, please try simply waiting. Use 05_ipxe.efi (again with more debug output; download), take a picture of the screen where it hangs, and then just let it sit there for a couple of hours. Check on it every now and then to see whether it goes any further or just sits at this stage forever.



  • @sebastian-roth Ah yes, this is because I was using a Thunderbolt dock that was already in place, but I had to fetch a new one, as that one was needed. The machine is still the same, however.

    So now this is the last message I get:

    [Photo: DSC_0220.JPG]


  • Developer

    @iceman344 said in UEFI booting with Yoga 370:

    … and boots nicely to the menu.

    This is because you used a different client this time (see the different MAC addresses in the pictures). No task has been scheduled for this host, and therefore you got the FOG menu. But that’s OK, I reckon. The last of the three pictures you posted lately is a good pointer to where things go wrong.

    Although we must be very close to the “hang” (as I don’t see many more function calls further down that path in the code), I still have no clue why this would stall. Possibly this is not a hang but rather an infinite loop. We’ll see from the next debug output: 04_ipxe.efi (download). Don’t worry about taking a slow-motion video if it hangs; we just need to see the very last output. If it loops over and over, a video might be handy, but I guess we could even go with a picture then. So no need for high-tech video capturing, I reckon.

    @iceman344 said in UEFI booting with Yoga 370:

    Also I totally understand the iteration issue. I’ll be here to test!

    Thanks a lot for taking this up with me.



  • @sebastian-roth So, an update! The new bootloader shows some more info and boots nicely to the menu. The problem right now is that the debug info scrolls by a little too quickly. I got some footage with a camera, but the 30 fps limit doesn’t help, so I winged it with some burst-mode pictures. Tomorrow I’ll capture the HDMI output directly to get more accurate info. Is there any way to make it dump the output to the file system it’s running on?

    Also, I totally understand the iteration issue. I’ll be here to test!

    This is as much info as I could get out of the footage.

    [Photo: DSC_0124.JPG]

    This is after selecting “Client System Information” as a test:

    [Photo: DSC_0109.JPG]

    [Photo: DSC_0111.JPG]

    Also, on a side note, I did not realise that pictures could be uploaded and inserted this way. I’ve replaced the ones in the previous posts.

