X1 AIO Desktop - i7 vPro network issue with Intel I219-LM [was: Make new bzImage...]
-
@mandrade Did you start with the config file from the GitHub site? I think you can just rename it as the default config, then run the menuconfig program, picking your additional network drivers. It will use those settings as the defaults as you run through menuconfig. Understand it’s been over 10 years since I built Linux kernels, but that is what I remember.
Ref: https://github.com/FOGProject/fogproject/tree/dev-branch/kernel You probably want the TomElliott.config.64 one.
[edit] It would be interesting to know what network adapter option you are loading to get this network adapter to work, so the devs can include it in the official kernel. That way you wouldn’t have to keep rebuilding the kernel as new official releases come out.
-
@george1421 said in Make new bzImage...:
think you can just rename it as the default config then run menuconfig progra
I used the standard Kitchensink one that is available when you download FOG_1.2.0, so no, I did not. I will give that a try.
Thank you very much for the suggestions and help thus far!
-
@george1421 said:
Just for clarity ven_8086&dev_156F (8086:156F) comes back to an Intel I219-LM NIC.
Sure but not every I219-LM NIC is the same. Check out this listing to see when support for the different NICs (PCI IDs) was added. That’s why I always ask people about PCI IDs. Support for the 8086:156f was added with kernel 4.1 (while 15e3 seems to have been added only very recently - kernel version 4.6).
So I guess your NIC is fully supported using the 4.5 kernel from https://fogproject.org/kernels/. Guess this is a network issue. Spanning tree, 802.1x/MAB/NAC, EEE/802.3az, auto-negotiation!? @mandrade To see if I stand correct can you please use a dumb mini switch to connect between client and your main network. Then try compatibility test again.
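For anyone comparing the ID shown by Windows with the one lspci prints: both notations carry the same vendor and device numbers. Below is a minimal sketch of the conversion. The helper name windows_hwid_to_pci_id is made up for illustration and is not part of FOG.

```python
import re

# Hypothetical helper (not part of FOG): converts a Windows hardware ID
# fragment such as "ven_8086&dev_156F" to the "vendor:device" form that
# lspci -nn prints on Linux.
_HWID = re.compile(r"ven_([0-9a-f]{4})&dev_([0-9a-f]{4})", re.IGNORECASE)


def windows_hwid_to_pci_id(hwid: str) -> str:
    match = _HWID.search(hwid)
    if match is None:
        raise ValueError(f"no VEN_/DEV_ pair found in {hwid!r}")
    vendor, device = match.groups()
    # Linux tools usually show the hex digits in lower case.
    return f"{vendor}:{device}".lower()


print(windows_hwid_to_pci_id("ven_8086&dev_156F"))
```

The result can then be looked up against the kernel's PCI ID list to see which kernel version first supported the NIC.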
@Tom-Elliott I think we should add lspci output (or simply PCI IDs) to the compatibility test screen when failing! This way people don’t need to extra-boot and find out about this information. Shouldn’t take much effort! What do you think?
-
Hey Guys,
So I’m now able to image my new machines, it works a treat!
In the end I decided to scrap attempting to build my own Kernel and went with a Trunk version of FOG. I am aware that it isn’t stable but after deploying the new bleeding edge version of FOG all my machines are supported and are imaging without any issue.
We’re about to do a bulk deploy of 21 Machines, will let everybody know how that went. Experiences thus far, although frustrating in the beginning, are good. Thanks again for all the help supplied.
-
@mandrade I’m glad you settled on a solid choice here.
The thing you have to remember is that FOG 1.2.0 is almost 3 years old. The devs have put great effort into getting the trunk build to the point it is today, at the expense of delaying the official release of 1.3.0. They felt it was vital to build support for Win10, UEFI, and GPT disks into the trunk build since 1.3.0 is the last (intended) release of FOG until FOG 2.0 comes out.
-
@george1421 Intended.
-
Hi guys,
So the feedback is two fold. Success on the all in one ThinkCentre machines. We are now deploying without issue to those.
However, we have three and soon to be more Lenovo ThinkPad X1 Carbons to deploy. They have no onboard NIC and use an external Onelink+ Dock. I can PXE boot but when it starts loading the undionly.kpxe image it hangs.
Now I have tried booting to undionly.kkpxe and it gets further. However, it stops again at 55% as can be seen below:
The NIC is a Realtek USB RTL8135
-
@mandrade I’d be willing (and really interested) to find out what’s going on there. To properly diagnose this I need a packet dump of this. Please install the package tcpdump on your FOG server and run
sudo tcpdump -i eth0 -w hang.pcap host x.x.x.x
(where x.x.x.x is the IP of the client you are trying to boot). Just leave that command sitting there and boot up your client. When it hangs, stop tcpdump (ctrl+c) and send me that hang.pcap file via mail (will send you a private message with my mail address).
Does it always hang at 55%??
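If you would rather not type the command by hand, here is a small sketch that assembles the same tcpdump call and rejects a mistyped client address before anything runs. The function name build_capture_command and the address 192.0.2.142 are placeholders for illustration only. eth0 and hang.pcap are taken from the command above.

```python
import ipaddress
import shlex


def build_capture_command(client_ip, interface="eth0", outfile="hang.pcap"):
    """Return the tcpdump argument list for capturing one client's traffic."""
    # ip_address() raises ValueError for things like "x.x.x.x" or "10.0.0".
    ipaddress.ip_address(client_ip)
    return ["sudo", "tcpdump", "-i", interface, "-w", outfile, "host", client_ip]


# 192.0.2.142 is a documentation-range placeholder, not the real client.
print(shlex.join(build_capture_command("192.0.2.142")))
```

The printed line can be pasted into a shell on the FOG server. Stop the capture with ctrl+c once the client hangs.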
-
Sure will see what I can do. Yep always at 55%.
-
@mandrade It stops at 55% on all three machines?
-
To be honest, at this stage I have only tested the one host. The idea was to first build an image to then image the other two, and subsequently others if and when they arrive. So at this point I was attempting to run an inventory to then capture that image in question.
I have an update though: if I unplug the dock and then re-plug it, it progresses through to the selection menu. However, all the menu items are in red. Also, running any of the items fails with an error saying “press ‘S’ for iPXE prompt or machine will reboot in 10 seconds”.
-
@mandrade You could also try ipxe.kpxe and ipxe.kkpxe
-
Tried both ipxe.kpxe and ipxe.kkpxe. Both times it hangs at:
“iPXE initializing devices”
-
@mandrade Then I’d suggest doing the packet capture as Sebastian described, so we can see what’s going on.
-
@Sebastian-Roth said in Make new bzImage...:
sudo tcpdump -i eth0 -w hang.pcap host
I have that tcpdump for you, let me know where you would like me to send it to.
-
@mandrade just as a test could you try a different system of the same model? If all of the models fail in the same way it is most likely the ipxe drivers or way they’re hanging that is the problem. Of course still run a tcpdump so we might see exactly what’s happening, at the network level at least. If only that one machine is seeing the issue I might suggest looking at possibly a memory issue on the machine. Of course this could be as simple as a bios firmware issue too.
-
Sure can try another machine but I think you may be right I think this may be an ipxe driver issue. I have also tried updating the BIOS version on the machine and that has made no difference. I’ve attempted to play around with the Network boot option also with no change.
I have the tcpdump here with me who could I send it to? e-mail address?
-
@mandrade Checking out your PCAP file right now. I see that the client (x.x.x.142) first requests undionly.kpxe from the FOG/TFTP server. This seems to work fine. Then there are no packets for more than 80 seconds (!) - I guess you stopped the client, right? After that the client requests undionly.kkpxe from the server. Please tell me this is because you were changing things while capturing. It’s not a problem. I just really hope that it’s not the client requesting two different files on its own. Next the client requests default.ipxe - fine - and then boot.php. Here I see that the server responds with a chunked answer. In theory iPXE should be able to handle chunked HTTP transfer properly but I am not sure if I’ve ever seen this on one of my machines. Maybe check your apache and PHP configuration to see why it would send a chunked HTTP answer.
POST /fog/service/ipxe/boot.php HTTP/1.1
Content-Length: 152
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
User-Agent: iPXE/1.0.0+ (9f91d)
Host: x.x.x.14

mac0=aa%3Abb%3Acc%3Add%3Aee%3Aff&arch=x86_64&platform=pcbios&product=20FB001XAU&manufacturer=20FB001XAU&ipxever=1.0.0+%20(9f91d)&filename=undionly.kkpxe
Above is the request, directly followed by the answer:
HTTP/1.1 200 OK
Date: Fri, 27 May 2016 02:30:55 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: close
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Set-Cookie: PHPSESSID=53k5223j5h32kjh5kj3h453345432kh; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain;charset=UTF-8

a18
#!ipxe
set fog-ip x.x.x.14
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
...
:fog.reg
kernel bzImage loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 keymap= web=x.x.x.14/fog/ consoleblank=0 rootfstype=ext4 loglevel=4 mode=autoreg
imgfetch init.xz
boot || goto MENU
...
0
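For readers not familiar with chunked transfer: the a18 before the script and the 0 after it are not part of the iPXE script. They are hexadecimal chunk-size lines, and a zero-size chunk marks the end of the body. A minimal sketch of how a client would undo that framing follows. It is a plain illustration of the HTTP/1.1 format, not iPXE's actual code, and the sample bytes are made up.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (trailers are ignored)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk terminates the body
        decoded += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    return bytes(decoded)


# Made-up sample: two small chunks followed by the terminating zero chunk.
sample = b"7\r\n#!ipxe\n\r\n5\r\nhello\r\n0\r\n\r\n"
print(decode_chunked(sample))
```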
Why do I think this has something to do with the chunked transfer? Well I am not really sure. What I see in the PCAP is that the server sends the full answer (in two chunks, first 532 bytes and second 2596 bytes) plus the final FIN,ACK packet. The client would need to ACK(nowledge) those 3128 bytes via FIN,ACK as well to finish and close the TCP connection. But the client only sends an ACK for 1981 bytes and is silent from then on as if it would have died - maybe it actually freezes at this stage?!? Are you able to ping the client from the server? Please give that a try while the PXE booting is going on. It should work up to the point where it tries to request boot.php.
If this would be a network problem - like packets gone lost - we would see the client requesting the missing bytes via TCP retransmit. But there are only TCP retransmits from the server side as the client has most probably died.
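The byte counts can be double-checked with a few lines. The three numbers are the ones quoted from the capture. The last calculation is only an observation under the assumption that the 2596-byte piece carries exactly one a18 chunk plus its framing, which would fit the numbers but is not confirmed by anything else in the capture.

```python
HEADER_SEGMENT = 532     # first piece sent by the server (from the capture)
BODY_SEGMENT = 2596      # second piece sent by the server (from the capture)
ACKED_BY_CLIENT = 1981   # last byte count the client acknowledged

total_sent = HEADER_SEGMENT + BODY_SEGMENT
unacknowledged = total_sent - ACKED_BY_CLIENT

# Assumption: one chunk of 0xa18 bytes, framed as
# "a18\r\n" + data + "\r\n" + "0\r\n\r\n".
chunk_size = int("a18", 16)
framed_body = len(b"a18\r\n") + chunk_size + len(b"\r\n") + len(b"0\r\n\r\n")

print(total_sent)      # 3128
print(unacknowledged)  # 1147
print(chunk_size)      # 2584
print(framed_body)     # 2596
```

So the client went silent with 1147 bytes of the answer still unacknowledged.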
Would you be able to build/download an iPXE binary as described in https://wiki.fogproject.org/wiki/index.php/IPXE#rom-o-matic.eu. Right at the end you want to add http,httpcore,httpconn into the “Enable Debug” field. Please take a picture or video of what you see on screen then.
-
@mandrade Ok, I found out what’s causing the chunked HTTP transfer. Possibly disabling this for the moment might fix your issue. Please edit /var/www/fog/service/ipxe/boot.php (or /var/www/html/fog/service/ipxe/boot.php) and delete (or comment) the line flush(); (yes only that single one).
No server or service restart needed. Simply boot up your client and see if it works.
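To illustrate why a single flush() can change how the answer is framed: when the whole body is buffered, the server knows its length and can send a Content-Length header. Once output has to leave before the end is known, HTTP/1.1 falls back to sized chunks. The sketch below is a toy model of that choice, not Apache's or PHP's actual logic, and frame_response is a made-up name.

```python
def frame_response(pieces, flushed_early):
    """Toy model of HTTP/1.1 body framing (headers reduced to one line)."""
    if not flushed_early:
        # Whole body is buffered, so its length is known before sending.
        body = b"".join(pieces)
        return b"Content-Length: %d\r\n\r\n" % len(body) + body
    # Output had to leave before the total was known: send sized chunks.
    out = bytearray(b"Transfer-Encoding: chunked\r\n\r\n")
    for piece in pieces:
        if piece:  # an empty piece would look like the terminating chunk
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


script = [b"#!ipxe\n", b"boot\n"]
print(frame_response(script, flushed_early=False))
print(frame_response(script, flushed_early=True))
```

Either way the client ends up with the same script. Only the framing around it differs, which is why removing the flush() call is a cheap thing to test.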
@Tom-Elliott The flush() call seems to really push the data to be sent to the client - forcing the apache webserver to send it as a chunked response because it is not allowed to wait and calculate the full content length before sending the data. Let’s wait and see if this is actually causing the trouble here. Then we might think about reverting this change partly. We don’t need to push it that hard - using flush() - I’d say
-
@Sebastian-Roth change made to remove all the extra flushing.