Surface 3 Fails to Image


  • Developer

    @wwarsin I am really sorry that it seems like we have just been going in circles! But I might have an idea of what is wrong here now. I spent a long time closely looking at the pictures and videos you posted and I might have found something.

    The picture and video show version 5798 (in the FOG boot logo message), which FOG polls from the server. But later on it says “Re-reading Partition Tables”. As far as I could find out in the code repository, this string was changed in version 5752. So I am pretty sure you have version 5798 installed on the server, but your init files did not get updated on the last upgrade, so the booting clients use an older version.

    Please try this on your FOG server (as root!):

    cd /var/www/fog/service/ipxe
    mv init.xz init.xz_old
    wget https://fogproject.org/inits/init.xz
    chown www-data:www-data init.xz
    

    Then try booting your surface again. Hopefully this will make a difference!
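    To double-check which init the clients actually receive, one rough test is to search the decompressed init for the old message. This is only a sketch: it assumes the init image stores its scripts as plain text (so `grep -a` can see them) and that an init still containing “Re-reading Partition Tables” predates 5752. `init_has_string` is a hypothetical helper, not part of FOG.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): succeeds (exit 0) if the
# xz-compressed init image contains the given literal string.
# Assumes the scripts inside the image are stored as plain text.
init_has_string() {
    init_file=$1
    needle=$2
    # xz -d decompresses, -c writes to stdout.
    # grep -a treats binary data as text, -F matches a fixed string,
    # -q prints nothing and only sets the exit status.
    xz -dc "$init_file" | grep -a -F -q "$needle"
}
```

    For example, `init_has_string /var/www/fog/service/ipxe/init.xz "Re-reading Partition Tables" && echo "init may be older than 5752"`. If the file is missing, xz fails, grep sees no input and the helper returns non-zero.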

    Tom has recently changed the disk/partition enumeration from lsblk to fdisk, so I am really keen to see if this works on your device out of the box. Looking forward to hearing from you!



  • @Sebastian-Roth

    Here’s a picture of the command:
    [Screenshot: surface.png]

    I have “/dev/mmcblk0” set as the Host Primary Disk.


  • Developer

    @wwarsin Looking at the last few lines of output, it seems a bit like FOG doesn’t find any partitions on your device. It says “Saving Partition Tables (GPT)” and then “Task complete” straight away.
    Can you please run a debug session on the Surface and see what you get from lsblk?
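    Besides lsblk, it may help to compare against what the kernel itself lists in /proc/partitions. A minimal sketch, assuming the standard four-column layout of that file; `list_partitions` is a hypothetical helper name, not a FOG command.

```shell
#!/bin/sh
# Hypothetical helper: print the partitions the kernel knows about for one
# disk, read from /proc/partitions (or from a file given as 2nd argument).
# Handles both naming styles: sda -> sda1, mmcblk0 -> mmcblk0p1.
list_partitions() {
    disk=$1
    table=${2:-/proc/partitions}
    # Data lines have 4 fields: major minor #blocks name.
    # The header fails the numeric test on field 1; blank lines have NF == 0.
    # Names like mmcblk0boot0 do not match, since "boot" is not p or a digit.
    awk -v d="$disk" 'NF == 4 && $1 ~ /^[0-9]+$/ && $4 ~ ("^" d "p?[0-9]+$") { print $4 }' "$table"
}
```

    In the debug shell, `list_partitions mmcblk0` printing nothing would suggest the kernel sees no partitions on the eMMC at all, which would point away from FOG's own enumeration.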



  • @Sebastian-Roth

    I’m finally back in the office for a few days! Here’s the video: https://youtu.be/UFYNan98lmw This is with the vanilla bzImage.


  • Developer

    @wwarsin Sorry for my late reply. Have been without internet for some days. Hope you are better again!

    Yes, bzImage is the kernel. If you have upgraded, re-download it and put it in the correct path (/var/www …). But there shouldn’t be a need to upgrade right now if you don’t see other issues. It’s probably best if you don’t, so that the test results stay more stable.

    Looking forward to the video.



  • @Sebastian-Roth I assume bzImage is the kernel? I downloaded this file after I upgraded to the newest SVN. I’ll hold off on upgrading FOG until this is resolved.

    I’m out for the rest of this week (food poisoning :( ) but will take a video of the Surface next week.


  • Developer

    @wwarsin said:

    I upgraded to SVN 5798 and am still using the Microsoft Model 1663 network adapter.

    Whenever you upgrade, you get the current official FOG trunk kernel, which does not include the patch (adding the mentioned IDs).

    It would probably help if you could take a video of the screen, so we have a chance to see if there is an error and exactly when things go wrong. As I don’t have a Surface device, I need your assistance to figure this out. But it looks like we are making progress. At least we seem to have networking up again, though maybe only part of the time; not sure why?!



  • @Sebastian-Roth I just tested with the vanilla bzImage and in debug mode the network works! However, when I tried to capture an image, the network fails and reports

    I upgraded to SVN 5798 and am still using the Microsoft Model 1663 network adapter.

    Edit:
    I seem to be getting mixed results now… I created the capture task and it first booted to the no-network screen. Then it booted to the white FOG splash screen, and another attempt looked like it loaded correctly to start capturing the image. However, the screen went by so fast that I couldn’t catch any errors (the Surface just restarted).

    [Screenshot: Fog_Failed_Capture.png]



  • @Sebastian-Roth Hi Sebastian, I will try to get this to you next week or the week after due to the Holidays…

    Have a great Christmas and New Years!


  • Developer

    @wwarsin I am wondering if the device is actually recognized. Could you please boot the device into debug mode? When you see the shell, unplug the USB NIC, wait a few seconds and plug it back in. Then run dmesg | tail -n 20. It would be great if you could take a picture of what you see on the screen. Hopefully we will see something similar to this: https://bugzilla.redhat.com/show_bug.cgi?id=1236679 (we don’t have the same issue in FOG; I am just posting this to show what the output might look like).
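    If the last 20 lines of dmesg are crowded with unrelated messages, a filter can keep only the lines about the USB NIC. A minimal sketch; `nic_log_lines` is a hypothetical name, and the patterns (the r8152 and cdc_ether driver names from this thread, plus generic "usb X-Y:" device events) are assumptions about what is worth keeping.

```shell
#!/bin/sh
# Hypothetical filter: read kernel log lines on stdin and keep only those
# mentioning the USB NIC drivers discussed here (r8152, cdc_ether) or
# USB device events such as "usb 1-2: new high-speed USB device ...".
nic_log_lines() {
    grep -E 'r8152|cdc_ether|usb [0-9]+-[0-9.]+:'
}
```

    After re-plugging the adapter, `dmesg | nic_log_lines | tail -n 20` would show the most recent matching lines.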

    I stumbled upon a newer driver (kernel source code) on realtek.com.tw: version 2.05.0 (2015/8/13) instead of the 1.08.2 (2014) included in kernel 4.3.3. I will try to build a kernel with that Realtek code.

    Edit (read this first): I just remembered that I only added ID 045e:07ab (Microsoft Model 1552) to the kernel I compiled last, because this was the device you reported using at first. This means that my kernel (the one I compiled 10 days ago) would not work with the Microsoft Model 1663 (045e:07c6). I compiled two new kernel images that you can find here: https://drive.google.com/folderview?id=0B-bOeHjoUmyMV095YVpsR3U5VFk&usp=sharing
    Both include the Microsoft device IDs for 1552 AND 1663! bzImage.vanilla is a plain kernel, and bzImage.realtek I compiled with the driver code from the Realtek website mentioned above. Try vanilla first, as I hope that we don’t need the newest Realtek driver. Again, boot into debug mode and try the commands. Also try re-plugging the device to see if it is recognized (see dmesg | tail -n 20).
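    To confirm which of the two adapters is plugged in (and therefore which ID the kernel must know about), the lsusb output can be checked for the two IDs given in this thread. A minimal sketch; `surface_nic_model` is a hypothetical helper, and it only matches the two Microsoft IDs mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: read `lsusb` output on stdin and report which of the
# two Microsoft USB NICs from this thread is attached, if any.
# IDs as given in the thread: 045e:07ab = Model 1552, 045e:07c6 = Model 1663.
surface_nic_model() {
    while read -r line; do
        case $line in
            *045e:07ab*) echo "Model 1552 (045e:07ab)"; return 0 ;;
            *045e:07c6*) echo "Model 1663 (045e:07c6)"; return 0 ;;
        esac
    done
    echo "no known Microsoft USB NIC found"
    return 1
}
```

    Usage in the debug shell: `lsusb | surface_nic_model`.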


  • Developer

    @wwarsin Thanks for trying and reporting. This does not look very good to me. But as I can see in one of your earlier posts, it kind of worked in the past (at least there was no “No such file or directory”). What messages did you see this time when booting into debug mode?



  • @Sebastian-Roth I downloaded the bzImage you linked to, replaced the one in /var/www/fog/service/ipxe and booted into debug mode.

    Here are the results of the two commands:

    [root@fogclient /]# ls -al /sys/class/net/eth0/device/driver/module
    ls: cannot access /sys/class/net/eth0/device/driver/module: No such file or directory
    [root@fogclient /]# cd /sys/class/net
    [root@fogclient /]# ls
    lo@
    [root@fogclient /]# dmesg | grep 8152
    [    1.070704] usbcore: registered new interface driver r8152
    [root@fogclient /]#
    

  • Developer

    @Tom-Elliott I am talking about this patch: http://svn.exactcode.de/t2/trunk/package/base/linux/surface-dock-eth.patch, as I feel this might be heading in the right direction with this issue. It should work with any kernel version, I reckon. Keep cdc_ether disabled, please.


  • Senior Developer

    @Sebastian-Roth What patch are you referring to?

    I removed my edits, yes, because you are absolutely correct that they didn’t matter anyway.

    So my custom patches are gone. I have not built a 4.3.3 kernel yet, though I am aware it was released.

    I have not added CDC_ETHER either, as I really don’t think it would matter.

    I am still doing the mmc patch (which is part of why I am slow to always update to the latest, along with working on the init scripts), but I know that part has no relevance to the issues in this thread.


  • Developer

    @wwarsin Yes, we changed the iPXE script, which is probably why you are no longer asked for the IP address. Upgrading to the latest version means that the latest kernel has been downloaded as well. I am not sure if the patch is still part of the kernel. @Tom-Elliott??

    Re-reading all the posts I saw that you had a full dmesg output posted with your very first question already (thank god you did!):

    cdc_ether 1-2.4:2.0 eth0: register 'cdc_ether' at usb-0000:00:14.0-2.4, CDC Ethernet Device, 60:45:bd:f9:62:b6
    ...
    cdc_ether 1-2.4:2.0 eth0: kevent 12 may have been dropped
    
    

    So to me this means that an older kernel version was magically able to run your USB NIC with the cdc_ether driver. I am not sure why this no longer works, even if we add this driver back to the kernel. But I am not confident in the cdc_ether driver anyway and would hope that we don’t need it at all (RTL8152 chips being blacklisted is just one thing I don’t like about it).

    So we are back to the question: Are we able to make this USB NIC work with the r8152 driver and possibly how??

    In your last post I see

    cat: /sys/class/net/eth0/carrier: No such file or directory
    cat: /sys/class/net/eth0/carrier: Invalid argument
    

    It looks like eth0 is not available on the first try but pops up at some point. My guess is that Tom removed the patch after your last try, as it didn’t seem to work. So when you upgraded to the latest version, you might have installed a kernel without the patch. But from what I see in your posts, I have a feeling that we are pretty close to getting this to work with the patch.

    Feel free to try this kernel again: https://drive.google.com/folderview?id=0B-bOeHjoUmyMV095YVpsR3U5VFk&usp=sharing
    Boot into debug mode and wait for a few seconds. Then see what you get from ls -al /sys/class/net/eth0/device/driver/module and dmesg | grep 8152
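    Since eth0 seems to appear only some time after boot, the "wait a few seconds" step can be made explicit by polling sysfs until the interface exists. A minimal sketch; `wait_for_iface` and its optional sysfs-root argument (there only so it can be tried against a fake directory) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll sysfs until a network interface appears.
# Arguments: interface name, timeout in seconds (default 10),
# optional alternative sysfs root (default /sys). Returns 1 on timeout.
wait_for_iface() {
    iface=$1
    timeout=${2:-10}
    sysroot=${3:-/sys}
    i=0
    while [ ! -e "$sysroot/class/net/$iface" ]; do
        [ "$i" -ge "$timeout" ] && return 1
        sleep 1
        i=$((i + 1))
    done
    return 0
}
```

    For example: `wait_for_iface eth0 30 && ls -al /sys/class/net/eth0/device/driver/module`.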



  • @Sebastian-Roth

    I upgraded to SVN 5762 and the Surface booted directly into debug mode (instead of my having to manually enter the IP address of the FOG server), but the network still isn’t working. I ran lsusb again because I’ve acquired the Microsoft Model 1663 (USB to Ethernet Gigabit adapter):
    045e:07c6

    If you would prefer that I test with the Model 1552 (10/100 adapter), let me know, but we’ll probably use the 1663 once we get this working.

    [Photo: 20151217_115206_resized.jpg]


  • Developer

    @Imperilled The kernel won’t make a difference in your particular case! I would really like to see your issue fixed as well. Could you please open a new topic on this? It makes it a lot easier for everyone to follow if we don’t discuss two different topics in one thread! Please let us know what error you see when trying to upload a Multiple Partition - Single Disk image and we should be able to help you with this.



  • I will try your kernel tomorrow. I need to find a solution because my images are still raw… 124 GB × 15 Surfaces takes a day to deploy with only one adapter :(


  • Developer

    @wwarsin Can you please try ifconfig -a and also ls -al /sys/class/net/eth0/device/driver/module (to see which driver is actually used)?

    And could you please also check dmesg | grep eth?
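    The driver check can be reduced to just the module name by resolving that sysfs link. A minimal sketch; `iface_driver` and its optional sysfs-root argument are hypothetical, and it assumes the usual layout where the module link ends in the module's name (e.g. .../module/r8152).

```shell
#!/bin/sh
# Hypothetical helper: print the kernel module bound to a network interface,
# i.e. the last path component of /sys/class/net/IFACE/device/driver/module.
# Prints nothing and returns 1 if the interface or the link does not exist.
iface_driver() {
    iface=$1
    sysroot=${2:-/sys}
    link=$sysroot/class/net/$iface/device/driver/module
    [ -e "$link" ] || return 1
    target=$(readlink "$link") || return 1
    # Strip everything up to the last slash to keep only the module name.
    echo "${target##*/}"
}
```

    In the debug shell, `iface_driver eth0` would print r8152 or cdc_ether, depending on which driver claimed the adapter.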



  • @Tom-Elliott @Sebastian-Roth

    I installed the latest SVN (5686) and attempted to boot with the bzImage it installed. I also replaced the bzImage file with the one from the link you provided, and I still receive the no-network error…

    ifconfig in debug mode also still only shows lo

    Error ident-mapping new memmap (0x13ac72000)!
    Starting logging: OK
    Populating /dev using udev: udevd[2950]: error creating epoll fd: Function not implemented
    done
    Initializing random number generator... done.
    Starting eth0 interface
    ip: SIOCSIFFLAGS: No such device
    cat: /sys/class/net/eth0/carrier: No such file or directory
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    cat: /sys/class/net/eth0/carrier: Invalid argument
    ssh-keygen: generating new host keys: RSA DSA ECDSA ED25519
    Starting sshd: OK
    

    I may not be able to do any testing until mid/late next week (or even the week after next) after today.


