X1 AIO Desktop - i7 vPro network issue with Intel I219-LM [was: Make new bzImage...]
-
@Sebastian-Roth said in Make new bzImage...:
sudo tcpdump -i eth0 -w hang.pcap host
I have that tcpdump for you, let me know where you would like me to send it.
-
@mandrade just as a test could you try a different system of the same model? If all machines of that model fail in the same way, the problem is most likely the iPXE drivers or the way they hang. Of course still run a tcpdump so we might see exactly what’s happening, at the network level at least. If only that one machine is seeing the issue I might suggest looking at a possible memory issue on the machine. Of course this could be as simple as a BIOS firmware issue too.
-
Sure, I can try another machine, but I think you may be right that this is an iPXE driver issue. I have also tried updating the BIOS version on the machine and that has made no difference. I’ve also tried playing around with the network boot option, with no change.
I have the tcpdump here with me. Who could I send it to? An e-mail address?
-
@mandrade Checking out your PCAP file right now. I see that the client (x.x.x.142) first requests undionly.kpxe from the FOG/TFTP server. This seems to work fine. Then there are no packets for more than 80 seconds (!) - I guess you stopped the client, right? After that the client requests undionly.kkpxe from the server. Please tell me this is because you were changing things while capturing. It’s not a problem. I just really hope that it’s not the client requesting two different files on its own. Next the client requests default.ipxe - fine - and then boot.php. Here I see that the server responds with a chunked answer. In theory iPXE should be able to handle chunked HTTP transfer properly but I am not sure if I’ve ever seen this on one of my machines. Maybe check your Apache and PHP configuration to see why it would send a chunked HTTP answer.
POST /fog/service/ipxe/boot.php HTTP/1.1
Content-Length: 152
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
User-Agent: iPXE/1.0.0+ (9f91d)
Host: x.x.x.14

mac0=aa%3Abb%3Acc%3Add%3Aee%3Aff&arch=x86_64&platform=pcbios&product=20FB001XAU&manufacturer=20FB001XAU&ipxever=1.0.0+%20(9f91d)&filename=undionly.kkpxe
Above is the request, directly followed by the answer:
HTTP/1.1 200 OK
Date: Fri, 27 May 2016 02:30:55 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: close
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Set-Cookie: PHPSESSID=53k5223j5h32kjh5kj3h453345432kh; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain;charset=UTF-8

a18
#!ipxe
set fog-ip x.x.x.14
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
...
:fog.reg
kernel bzImage loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 keymap= web=x.x.x.14/fog/ consoleblank=0 rootfstype=ext4 loglevel=4 mode=autoreg
imgfetch init.xz
boot || goto MENU
...
0
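For anyone following along who has not looked at chunked transfer encoding before: the a18 before the script and the 0 after it in the answer above are chunk-size lines, not part of the iPXE script. Each chunk is sent as its size in hexadecimal, a CRLF, the data, and another CRLF, and a zero-size chunk ends the body. Below is a minimal sketch of a decoder in Python. The function name and the sample body are made up for illustration, and it ignores trailers, so it is not how iPXE itself parses the response.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body (trailers after the last chunk are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, terminated by CRLF.
        line_end = raw.index(b"\r\n", pos)
        # Drop any optional chunk extension after ';'.
        size_field = raw[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            # A zero-size chunk marks the end of the body.
            break
        body += raw[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return bytes(body)


# Made-up sample body, framed the same way as the answer above.
script = b"#!ipxe\nset fog-ip x.x.x.14\n"
framed = b"%x\r\n" % len(script) + script + b"\r\n0\r\n\r\n"

print(decode_chunked(framed) == script)  # True
print(int("a18", 16))                    # 2584, the chunk size announced above
```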
Why do I think this has something to do with the chunked transfer? Well, I am not really sure. What I see in the PCAP is that the server sends the full answer (in two chunks, first 532 bytes and second 2596 bytes) plus the final FIN,ACK packet. The client would need to ACK(nowledge) those 3128 bytes via FIN,ACK as well to finish and close the TCP connection. But the client only sends an ACK for 1981 bytes and is silent from then on as if it had died - maybe it actually freezes at this stage?!? Are you able to ping the client from the server? Please give that a try while the PXE booting is going on. It should work up to the point where it tries to request boot.php.
If this were a network problem - like packets getting lost - we would see the client requesting the missing bytes via TCP retransmit. But there are only TCP retransmits from the server side as the client has most probably died.
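A side note on the byte counts: the 2596 bytes of the second part would be consistent with a single chunk of 0xa18 bytes plus the chunked framing around it, which would leave the 532 bytes as the response headers. This is only arithmetic on the numbers quoted in this thread and has not been checked against the capture itself. The helper name below is made up, and it assumes lowercase hex sizes, no chunk extensions, and no trailers.

```python
def chunked_wire_size(chunk_sizes):
    """Bytes on the wire for a chunked body made of chunks with the given data sizes."""
    total = 0
    for size in chunk_sizes:
        # Size line in hex + CRLF, then the data, then the CRLF after the data.
        total += len(format(size, "x")) + 2 + size + 2
    # Terminating zero-size chunk: "0" CRLF CRLF.
    total += len("0\r\n\r\n")
    return total


body_bytes = chunked_wire_size([0xA18])
print(body_bytes)        # 2596
print(532 + body_bytes)  # 3128
```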
Would you be able to build/download an iPXE binary as described in https://wiki.fogproject.org/wiki/index.php/IPXE#rom-o-matic.eu. Right at the end you want to add http,httpcore,httpconn into the “Enable Debug” field. Please take a picture or video of what you see on screen then.
-
@mandrade Ok, I found out what’s causing the chunked HTTP transfer. Disabling this for the moment might fix your issue. Please edit /var/www/fog/service/ipxe/boot.php (or /var/www/html/fog/service/ipxe/boot.php) and delete (or comment out) the line flush(); (yes, only that single one). No server or service restart needed. Simply boot up your client and see if it works.
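To picture why a single flush() could plausibly change the response format: once part of the body has to go out before the script has finished, the server can no longer state a Content-Length up front, and in HTTP/1.1 the usual fallback is chunked encoding. The Python below is a toy model of that decision. The function and parameter names are made up, and it is not Apache's or PHP's actual code, just a sketch of the general idea.

```python
def build_response(parts, flushed_early):
    """Toy model: frame a generated body with Content-Length or chunked encoding."""
    if not flushed_early:
        # Whole body is buffered first, so its length is known before sending.
        body = b"".join(parts)
        head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
        return head + body
    # Output was pushed out before the body was complete: the total length is
    # unknown when the headers are sent, so each piece goes out as its own chunk.
    out = b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
    for part in parts:
        if part:  # an empty chunk would wrongly signal the end of the body
            out += b"%x\r\n" % len(part) + part + b"\r\n"
    return out + b"0\r\n\r\n"


parts = [b"#!ipxe\n", b"set fog-ip x.x.x.14\n"]
print(build_response(parts, flushed_early=False).decode())
print(build_response(parts, flushed_early=True).decode())
```

With flushed_early=False the headers carry Content-Length: 27 and the script follows unframed. With flushed_early=True the same script is wrapped in hex size lines and ends with a 0 chunk, like the answer seen in the capture.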
@Tom-Elliott The flush() call seems to really push the data to be sent to the client - forcing the Apache webserver to send it as a chunked response because it is not allowed to wait and calculate the full content length before sending the data. Let’s wait and see if this is actually causing the trouble here. Then we might think about partly reverting this change. We don’t need to push it that hard - using flush() - I’d say.
-
@Sebastian-Roth change made to remove all the extra flushing.
-
OK, so removing the line flush(); from /var/www/html/fog/service/ipxe/boot.php means it no longer hangs at boot.php, but it now hangs at:
-
oh and for the record, for this machine ipxe will only boot up with undionly.kkpxe. If I use undionly.kpxe it hangs.
-
I tried renaming the bg.png file and it went further, only now it hangs here:
[Image: 20160531_115220.jpg]
-
woops forgot to resize:
-
@mandrade On the picture I see that it seems to pop back to the iPXE shell after timing out on the bzImage. Does it hang there or are you able to type in commands? Please run the two iPXE commands ifstat and nstat to see more information about the state of the network (please take a picture). Also run kernel http://x.x.x.142/fog/service/ipxe/bzImage (use your FOG server IP instead of x.x.x) to see if it is able to load the Linux kernel via HTTP.
At this stage you have the same two options. Either generate a debugging enabled iPXE binary (see my message below) or take another packet dump with tcpdump and I can have a look!
-
@Sebastian-Roth said in Make new bzImage...:
Either generate a debugging enabled iPXE binary
I am able to type commands: I type ‘S’ to be taken to the iPXE shell. So I ran ifstat and nstat and also tried to load the kernel. This was the result:
-
@mandrade if undionly.kkpxe works why not just use it? Those systems that work on undionly.kpxe shouldn’t have any issues using undionly.kkpxe and the systems that are currently hanging should work as well.
-
I meant that undionly.kkpxe boots past the iPXE boot screens and into the menus where I can inventory etc. The problem comes when I select a menu item, like inventory for instance. The result you can see in the image below. I cannot inventory or take an image of this host using either undionly.kpxe or undionly.kkpxe; undionly.kkpxe just gets further in the process.
-
@mandrade Thanks for the new screenshot. I guess I need another tcpdump capture of exactly this! Can you do another one (same command as mentioned before) and send it to me?
-
Hi, I had already sent a second TCPDUMP to the supplied e-mail. Let me know if you didn’t get it I’ll re-send if need be.
-
@mandrade Based on what I see in the image in the linked reply, it appears the place where it’s trying to download the file from is unreachable. Maybe this is due to a firewall? The IP address looks correct, but what’s the IP address of the system? Maybe a DHCP command is needed?
-
Thanks for the response, Tom. Other workstations work fine, I am able to image them without issue. I did mention earlier that this setup is slightly different from the others in that this laptop doesn’t have an onboard NIC. It uses a OneLink+ mini dock. The thing is, I can see it gets an IP from the DHCP server before it starts the boot process. Somewhere along the line it loses its config somehow and then the FOG server is no longer reachable.
-
@mandrade The boot process is a bit strange.
Most likely where you’re seeing the issue is one of a couple of potential pitfalls. The biggest one, I believe, is that the TFTP files you’re using appear to be from a newer (developmental) version of FOG. As I understand this is a debug generated set of files, I suppose this isn’t the underlying issue, rather a symptom of something else.
The boot process is not as simple as the initial DHCP. The first DHCP is for PXE to grab information from. Then it hands the system off to the bootfile, in your case undionly.{k,kk}pxe. Once it is on the undionly side, it needs to go through DHCP again. Do you see the system getting an IP from there? Normally it would reboot, but if it’s just skipping and trying to boot, maybe the bootfile is doing something differently as well? Issues that could be happening, I suppose, are STP (Spanning Tree Protocol), a firewall blocking this series of data, the system being on a VLAN that can’t reach the FOG server to download the HTTP data, or possibly (unlikely) all of the above.
-
I am indeed using a trunk version of FOG. There is no firewall or VLAN separation between the host and FOG. It does get an IP, otherwise I would not be able to get to the menu to select an option, am I right?
When I select inventory from the menu list it starts to attempt to load the bzImage but fails with connection timed out. All just very strange! I then run the same thing, only this time on another host that has an onboard NIC, and BOOM it works fine. So my take is that it’s something to do with the mini dock. Perhaps drivers? I dunno, pure speculation on my part.