Surface 3 Fails to Image
-
@wwarsin Yes, we changed the iPXE script which is probably why you don’t see it asking for the IP address anymore. Upgrading to the latest version means that the latest kernel has been downloaded as well. I am not sure if the patch is still part of the kernel. @Tom-Elliott??
Re-reading all the posts I saw that you had a full dmesg output posted with your very first question already (thank god you did!):
cdc_ether 1-2.4:2.0 eth0: register 'cdc_ether' at usb-0000:00:14.0-2.4, CDC Ethernet Device, 60:45:bd:f9:62:b6
...
cdc_ether 1-2.4:2.0 eth0: kevent 12 may have been dropped
So to me this means that an older kernel version was magically able to run your USB NIC device with the cdc_ether driver. I am not sure why this is not working anymore even if we add this driver back to the kernel. But I am not confident with the cdc_ether driver anyway and would hope that we don’t need it at all (RTL8152 chips being blacklisted is just one thing I don’t like about it).
So we are back to the question: Are we able to make this USB NIC work with the r8152 driver and possibly how??
In your last post I see
cat: /sys/class/net/eth0/carrier: No such file or directory
cat: /sys/class/net/eth0/carrier: Invalid argument
Looks like eth0 is not available on the first try but pops up at some point. My guess is that Tom removed the patch after your last try as it didn’t seem to work. So the upgrade to the latest version might have installed a kernel without the patch. But from what I see in your posts I have a feeling that we are pretty close to getting this to work with the patch.
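The two error messages above can be told apart: "No such file or directory" means eth0 does not exist yet, while "Invalid argument" is what the kernel returns when the carrier file is read on an interface that exists but has not been brought up. A minimal sketch of a check for the debug shell follows; the function name iface_state and the SYSFS_NET override are made up for illustration and are not part of FOG.

```shell
# Hypothetical helper (not part of FOG): classify the state of an interface.
# SYSFS_NET defaults to /sys/class/net; override it only for testing.
iface_state() {
    root=${SYSFS_NET:-/sys/class/net}
    dir="$root/$1"
    if [ ! -d "$dir" ]; then
        # Matches the "No such file or directory" case.
        echo missing
        return 0
    fi
    if carrier=$(cat "$dir/carrier" 2>/dev/null); then
        if [ "$carrier" = 1 ]; then
            echo link-up
        else
            echo no-carrier
        fi
    else
        # Reading carrier failed: the interface exists but is not up
        # (the "Invalid argument" case).
        echo not-up
    fi
}
```

Calling iface_state eth0 a few times with a second or two in between should show whether the interface goes from missing to not-up as the adapter is detected.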
Feel free to try this kernel again: https://drive.google.com/folderview?id=0B-bOeHjoUmyMV095YVpsR3U5VFk&usp=sharing
Boot into debug mode and wait for a few seconds. Then see what you get from
ls -al /sys/class/net/eth0/device/driver/module
and
dmesg | grep 8152
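To see at a glance which driver (r8152 or cdc_ether) has claimed the adapter, the driver symlink in sysfs can be resolved directly. This is only a sketch: iface_driver and SYSFS_NET are hypothetical names, and it assumes readlink -f is available in the debug shell.

```shell
# Hypothetical helper (not part of FOG): print the name of the driver
# bound to a network interface, e.g. "r8152" or "cdc_ether".
# SYSFS_NET defaults to /sys/class/net; override it only for testing.
iface_driver() {
    root=${SYSFS_NET:-/sys/class/net}
    link="$root/$1/device/driver"
    # Fail quietly if the interface or its driver link does not exist.
    [ -e "$link" ] || return 1
    # The link points at the driver directory; its last path
    # component is the driver name.
    basename "$(readlink -f "$link")"
}
```

Running iface_driver eth0 prints the driver name, or returns a non-zero status while eth0 is still missing.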
-
@Sebastian-Roth What patch are you referring to?
I removed my edits yes, cause you are absolutely correct that they didn’t matter anyway.
So my custom patches are gone. I have not built a 4.3.3 kernel yet though I am aware it was released.
I have not added CDC_ETHER either as I really don’t think it would matter either.
I am still doing the mmc patch (which, along with working on the init scripts, is part of why I am slow to update to the latest kernel all the time) but I know that part has no relevance to the issues in this thread.
-
@Tom-Elliott Talking about this patch: http://svn.exactcode.de/t2/trunk/package/base/linux/surface-dock-eth.patch as I feel like this might be going down the right lane with this issue. Should work with any kernel version I reckon. Keep cdc_ether disabled, please.
-
@Sebastian-Roth I downloaded the bzimage you linked to and replaced the one in /var/www/fog/service/ipxe and booted to
Here are the results of the two commands:
[root@fogclient /]# ls -al /sys/class/net/eth0/device/driver/module
ls: cannot access /sys/class/net/eth0/device/driver/module: No such file or directory
[root@fogclient /]# cd /sys/class/net
[root@fogclient /]# ls
lo@
[root@fogclient /]# dmesg | grep 8152
[ 1.070704] usbcore: registered new interface driver r8152
[root@fogclient /]#
-
@wwarsin Thanks for trying and reporting. Does not look very good to me. But as I can see in one of your earlier posts it has kind of worked (at least not “No such file or directory”) in the past. What messages did you see this time when booting up into debug mode??
-
@wwarsin I am wondering if the device is actually recognized. Could you please boot the device into debug mode? When you see the shell, unplug the USB NIC. Wait for a few seconds and plug it back in. Then run
dmesg | tail -n 20
It would be great if you could take a picture of what you see on the screen. Hopefully we might see something similar to this: https://bugzilla.redhat.com/show_bug.cgi?id=1236679 (we don’t have the same issue in FOG, just posting this to show what the output might look like).
I stumbled upon a newer driver (kernel source code) on realtek.com.tw: version 2.05.0 (2015/8/13) instead of the 1.08.2 (2014) included in kernel 4.3.3. Will try to build a kernel with that realtek code.
Edit (read this first): I just remembered that I only added ID 045e:07ab (Microsoft Model 1552) to the kernel I compiled last because this was the device you reported using at first. So this means that using my kernel (the one I compiled 10 days ago) would not work with Microsoft Model 1663 (045e:07c6). I compiled two new kernel images that you can find here: https://drive.google.com/folderview?id=0B-bOeHjoUmyMV095YVpsR3U5VFk&usp=sharing
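Before testing a kernel it can help to confirm which of the two adapters is actually plugged in, since the patch only matches the listed IDs. The sketch below reads the vendor and product IDs from sysfs; usb_id_present and SYSFS_USB are hypothetical names used only for this example.

```shell
# Hypothetical helper (not part of FOG): succeed if a USB device with the
# given "vendor:product" ID (lowercase hex) is currently attached.
# SYSFS_USB defaults to /sys/bus/usb/devices; override it only for testing.
usb_id_present() {
    want=$1
    root=${SYSFS_USB:-/sys/bus/usb/devices}
    for dev in "$root"/*; do
        # Interface entries have no idVendor/idProduct files; skip them.
        [ -r "$dev/idVendor" ] && [ -r "$dev/idProduct" ] || continue
        id="$(cat "$dev/idVendor"):$(cat "$dev/idProduct")"
        [ "$id" = "$want" ] && return 0
    done
    return 1
}
```

For example, usb_id_present 045e:07c6 should succeed when the Model 1663 adapter is attached, and usb_id_present 045e:07ab when it is the Model 1552.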
Both images include the Microsoft device IDs for 1552 AND 1663! bzImage.vanilla is a plain kernel and bzImage.realtek is compiled with the earlier mentioned driver code from the realtek website. Try vanilla first as I hope that we don’t need the newest realtek driver. Again, boot into debug mode and try the commands. Also try re-plugging the device to see if it is recognized (see dmesg | tail -n 20).
-
@Sebastian-Roth Hi Sebastian, I will try to get this to you next week or the week after due to the Holidays…
Have a great Christmas and New Years!
-
@Sebastian-Roth I just tested with the vanilla bzimage and in debug mode the network works! However, when I tried to capture an image the network fails and reports
I upgraded to SVN 5798 and am still using the Microsoft Model 1663 network adapter.
Edit:
I seem to be getting mixed results now… I created the capture task and it first booted to the no network screen. Then it booted to the white FOG splash screen, and another attempt looked like it loaded correctly to start capturing the image. However, the screen went by so fast that I couldn’t catch any errors (the Surface just restarted).
-
@wwarsin said:
I upgraded to SVN 5798 and am still using the Microsoft Model 1663 network adapter.
Whenever you upgrade you are using the current official FOG trunk kernel which does not have the patch included (to add the mentioned IDs).
It would probably be good if you could take a video of the screen so we have a chance to see if there is an error and when exactly things go wrong. As I don’t have a Surface device I need your assistance to figure this out. But it looks like we are making progress. At least we seem to have networking up again, though maybe only part of the time, not sure why?!
-
@Sebastian-Roth I assume bzimage is the kernel? I downloaded this file after I upgraded to the newest SVN. I’ll hold off on upgrading FOG until this is resolved.
I’m out the rest of this week (food poisoning) but will take a video of the Surface next week.
-
@wwarsin Sorry for my late reply. Have been without internet for some days. Hope you are better again!
Yes, bzImage is the kernel. Re-download it and put it in the correct path (/var/www …) if you have upgraded. But there shouldn’t be a need to upgrade right now if you don’t see other issues. It is probably best if you don’t, so that we get more stable test results.
Looking forward to the video.
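Since an upgrade overwrites the kernel again, it may be worth keeping a copy of whatever is currently installed before dropping in a test image. The following is only a sketch: install_kernel is a made-up name, and it assumes the kernel is served as bzImage from /var/www/fog/service/ipxe, the directory mentioned earlier in this thread.

```shell
# Hypothetical helper (not part of FOG): install a test kernel and keep
# a backup of the one that was there before.
# The target directory defaults to the path mentioned in this thread.
install_kernel() {
    src=$1
    dir=${2:-/var/www/fog/service/ipxe}
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Keep the previous kernel as bzImage.bak so it can be restored.
    if [ -f "$dir/bzImage" ]; then
        cp -p "$dir/bzImage" "$dir/bzImage.bak" || return 1
    fi
    cp "$src" "$dir/bzImage"
}
```

Run as root on the FOG server, for example install_kernel bzImage.vanilla; copying bzImage.bak back over bzImage restores the previous kernel.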
-
I’m finally back in the office for a few days! Here’s the video: https://youtu.be/UFYNan98lmw This is with the Vanilla bzimage.
-
@wwarsin Looking at the last few lines of output it seems a bit like FOG doesn’t find any partitions on your device. It says “Saving Partition Tables (GPT)” and then “Task complete” straight away.
Can you please run a debug session on the Surface and see what you get from
lsblk
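As a cross-check that does not depend on lsblk, the kernel’s own list of block devices can be read from /proc/partitions. The sketch below filters it down to the partitions of one disk; list_partitions is a hypothetical name, and the pattern assumes the usual naming where eMMC partitions carry a p before the number (mmcblk0p1) and SATA ones do not (sda1).

```shell
# Hypothetical helper (not part of FOG): list the partitions of one disk
# as seen by the kernel. The second argument defaults to /proc/partitions
# and exists only so the function can be tried against a sample file.
list_partitions() {
    disk=$1
    file=${2:-/proc/partitions}
    # Column 4 is the device name. Keep only names made of the disk name,
    # an optional "p", and a partition number. This skips the disk itself
    # and entries such as mmcblk0boot0.
    awk -v d="$disk" '$4 ~ ("^" d "p?[0-9]+$") { print $4 }' "$file"
}
```

If list_partitions mmcblk0 prints nothing in the debug shell, the kernel itself sees no partitions on the disk, which would point away from the enumeration code in the init.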
-
-
@wwarsin I am really sorry that it seems like we have just been going in circles! But I might have an idea of what is wrong here now. I spent a long time closely looking at the pictures and videos you posted and I might have found something.
Picture and video show version 5798 (FOG boot logo message) which FOG polls from the server. But later on it says “Re-reading Partition Tables”. As far as I could find out in the code repository this string was changed in version 5752. So I am pretty sure you have version 5798 installed on the server but your init files did not get updated on the last upgrade - so the clients booting up use an older version.
Please try this on your FOG server (as root!):
cd /var/www/fog/service/ipxe
mv init.xz init.xz_old
wget https://fogproject.org/inits/init.xz
chown www-data:www-data init.xz
Then try booting your surface again. Hopefully this will make a difference!
Tom has changed the disk/partition enumeration from lsblk to fdisk lately. So I am really keen to see if this is working on your device out of the box. Looking forward to hearing from you!
-
Success! It’s capturing the image now!
I’ll write back a little later. I’m going to try a deploy after this finishes to verify it works completely.
I appreciate all of the help you and Tom have been!
-
@Sebastian-Roth
So close! Now I’m getting the following error when I try to deploy:
Checking write caching status on HDD…Failed
Could not set caching status (enableWriteCache)
Edit: I still have “/dev/mmcblk0” set as the Host Primary Disk
-
@Tom-Elliott I think the center alignment of those pieces in the picture below doesn’t look very good. I suggest centering only the initial logo/credits box and left-aligning everything else.
-
@wwarsin Great! Can you please run another debug session and let us know what you get from
hdparm -i /dev/mmcblk0
Yeah, pretty close we are indeed!
-
Here’s the output of the command:
# hdparm -i /dev/mmcblk0
/dev/mmcblk0:
HDIO_DRIVE_CMD(identify) failed: Invalid argument
HDIO_Get_IDENTITY failed: Invalid argument
#
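This failure is what one would expect: the identify request that hdparm sends is an ATA command, and an eMMC disk such as /dev/mmcblk0 is not an ATA device. One possible way around it, sketched below, is to skip the write-cache step for such disks. This is not the actual FOG fix; is_mmc_disk is a hypothetical name and the check is based on the device name alone.

```shell
# Hypothetical helper (not the actual FOG fix): succeed if the given
# disk looks like an MMC/eMMC device, judging by its name only.
is_mmc_disk() {
    case "$(basename "$1")" in
        mmcblk*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Possible use in a deploy script (shown as comments only, untested):
#   if is_mmc_disk "$hd"; then
#       echo "Skipping write cache check on $hd (not an ATA device)"
#   else
#       hdparm -W1 "$hd"
#   fi
```

In the comment, $hd stands for whatever variable holds the Host Primary Disk; that name is an assumption made for the example.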