X1 AIO Desktop - i7 vPro network issue with Intel I219-LM [was: Make new bzImage...]
-
@mandrade It stops at 55% on all three machines?
-
To be honest, at this stage I have only tested the one host. The idea was to first build an image on it and then image the other two, and subsequently others if and when they arrive. So at this point I was attempting to run an inventory and then capture the image in question.
I have an update though: if I unplug the dock and then re-plug it, the boot progresses through to the selection menu. However, all the menu items are in red, and running any of them fails with an error saying “press ‘S’ for iPXE prompt or machine will reboot in 10 seconds”.
-
@mandrade You could also try ipxe.kpxe and ipxe.kkpxe.
-
Tried both ipxe.kpxe and ipxe.kkpxe. Both times it hangs at:
“iPXE initializing devices”
-
@mandrade Then I’d suggest doing the packet capture as Sebastian described, so we can see what’s going on.
-
@Sebastian-Roth said in Make new bzImage...:
sudo tcpdump -i eth0 -w hang.pcap host
I have that tcpdump capture for you. Let me know where you would like me to send it.
-
@mandrade Just as a test, could you try a different system of the same model? If all machines of that model fail in the same way, it is most likely the iPXE drivers, or the way they’re hanging, that is the problem. Of course still run a tcpdump so we can see exactly what’s happening, at the network level at least. If only that one machine is seeing the issue, I would suggest looking at a possible memory issue on the machine. Of course this could be as simple as a BIOS firmware issue too.
-
Sure, I can try another machine, but I think you may be right that this is an iPXE driver issue. I have also tried updating the BIOS version on the machine and that has made no difference. I’ve also played around with the network boot option, with no change.
I have the tcpdump capture here with me. Who could I send it to? Is there an e-mail address?
-
@mandrade Checking out your PCAP file right now. I see that the client (x.x.x.142) first requests undionly.kpxe from the FOG/TFTP server. This seems to work fine. Then there are no packets for more than 80 seconds (!) - I guess you stopped the client, right? After that the client requests undionly.kkpxe from the server. Please tell me this is because you were changing things while capturing. It’s not a problem. I just really hope that it’s not the client requesting two different files on its own. Next the client requests default.ipxe - fine - and then boot.php. Here I see that the server responds with a chunked answer. In theory iPXE should be able to handle chunked HTTP transfer properly but I am not sure if I’ve ever seen this on one of my machines. Maybe check your Apache and PHP configuration to see why it would send a chunked HTTP answer.
POST /fog/service/ipxe/boot.php HTTP/1.1
Content-Length: 152
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive
User-Agent: iPXE/1.0.0+ (9f91d)
Host: x.x.x.14

mac0=aa%3Abb%3Acc%3Add%3Aee%3Aff&arch=x86_64&platform=pcbios&product=20FB001XAU&manufacturer=20FB001XAU&ipxever=1.0.0+%20(9f91d)&filename=undionly.kkpxe
Above is the request, directly followed by the answer:
HTTP/1.1 200 OK
Date: Fri, 27 May 2016 02:30:55 GMT
Server: Apache/2.4.7 (Ubuntu)
Connection: close
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Set-Cookie: PHPSESSID=53k5223j5h32kjh5kj3h453345432kh; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/plain;charset=UTF-8

a18
#!ipxe
set fog-ip x.x.x.14
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
...
:fog.reg
kernel bzImage loglevel=4 initrd=init.xz root=/dev/ram0 rw ramdisk_size=127000 keymap= web=x.x.x.14/fog/ consoleblank=0 rootfstype=ext4 loglevel=4 mode=autoreg
imgfetch init.xz
boot || goto MENU
...
0
Why do I think this has something to do with the chunked transfer? Well, I am not really sure. What I see in the PCAP is that the server sends the full answer (in two TCP segments, first 532 bytes and then 2596 bytes) plus the final FIN,ACK packet. The client would need to ACK(nowledge) those 3128 bytes via FIN,ACK as well to finish and close the TCP connection. But the client only sends an ACK for 1981 bytes and is silent from then on, as if it had died - maybe it actually freezes at this stage?!? Are you able to ping the client from the server? Please give that a try while the PXE booting is going on. It should work up to the point where it tries to request boot.php.
If this were a network problem - like lost packets - we would see the client requesting the missing bytes via TCP retransmit. But there are only TCP retransmits from the server side, as the client has most probably died.
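To make the byte counts above a bit more concrete, here is a minimal Python sketch of how HTTP/1.1 chunked framing works. The helper name decode_chunked and the two-chunk sample body are made up for illustration; this is not code from FOG or iPXE. The second half only redoes the arithmetic from the capture. Assuming the body was sent as one chunk with standard CRLF framing (an inference from the a18 size line, not something confirmed in the capture), 2584 payload bytes plus 12 framing bytes would account for the 2596-byte segment.

```python
def decode_chunked(raw):
    """Decode an HTTP/1.1 chunked body (no trailers, no chunk extensions)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)   # chunk size line is hexadecimal
        pos = line_end + 2
        if size == 0:                       # a zero-size chunk ends the body
            break
        body += raw[pos:pos + size]
        pos += size + 2                     # skip the chunk data and its CRLF
    return bytes(body)


# Hypothetical two-chunk body, only there to exercise the decoder.
sample = b"7\r\n#!ipxe\n\r\n5\r\nboot\n\r\n0\r\n\r\n"
decoded = decode_chunked(sample)

# Byte counts taken from the capture discussed above.
chunk_payload = int("a18", 16)              # payload size announced by the server
framing = len(b"a18\r\n") + len(b"\r\n") + len(b"0\r\n\r\n")
second_segment = chunk_payload + framing    # size line + data + CRLF + last chunk
total_sent = 532 + 2596                     # both segments sent by the server
unacked = total_sent - 1981                 # bytes the client never acknowledged

print(decoded, chunk_payload, second_segment, total_sent, unacked)
```

In other words, the client stopped acknowledging somewhere inside the chunk data, well before the terminating zero-size chunk.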
Would you be able to build/download an iPXE binary as described in https://wiki.fogproject.org/wiki/index.php/IPXE#rom-o-matic.eu? Right at the end you want to add http,httpcore,httpconn into the “Enable Debug” field. Please take a picture or video of what you see on screen then.
-
@mandrade OK, I found out what’s causing the chunked HTTP transfer. Disabling this for the moment might fix your issue. Please edit /var/www/fog/service/ipxe/boot.php (or /var/www/html/fog/service/ipxe/boot.php) and delete (or comment out) the line flush(); (yes, only that single one). No server or service restart is needed. Simply boot up your client and see if it works.
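For anyone wondering why a single flush() would change the framing: once part of the body has to go out before the script has finished, the server cannot know the total length yet, so it cannot send a Content-Length header. A toy Python model of that decision follows. It is a simplification with made-up names (frame_response, flush_each) and sample strings, not Apache's or PHP's actual code.

```python
def frame_response(parts, flush_each):
    """Toy model of how a server might frame a response body."""
    head = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
    if not flush_each:
        # The whole body is buffered first, so its length is known up front.
        body = b"".join(parts)
        return head + b"Content-Length: %d\r\n\r\n" % len(body) + body
    # Each flushed part goes out immediately, so the total length is unknown
    # and every part is wrapped as its own chunk instead.
    out = head + b"Transfer-Encoding: chunked\r\n\r\n"
    for part in parts:
        if part:  # an empty part would look like the terminating chunk
            out += b"%x\r\n" % len(part) + part + b"\r\n"
    return out + b"0\r\n\r\n"


parts = [b"#!ipxe\n", b"boot\n"]
buffered = frame_response(parts, flush_each=False)
chunked = frame_response(parts, flush_each=True)
print(buffered.decode())
print(chunked.decode())
```

Both variants carry the same script. Only the framing differs, which is why removing the flush is a cheap thing to test.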
@Tom-Elliott The flush() call seems to really push the data to be sent to the client, forcing the Apache web server to send it as a chunked response because it is not allowed to wait and calculate the full content length before sending the data. Let’s wait and see if this is actually causing the trouble here. Then we might think about partly reverting this change. We don’t need to push it that hard with flush(), I’d say.
-
@Sebastian-Roth change made to remove all the extra flushing.
-
OK, so removing the line flush(); from /var/www/html/fog/service/ipxe/boot.php means it no longer hangs at boot.php, but it now hangs at:
-
Oh, and for the record: on this machine iPXE will only boot with undionly.kkpxe. If I use undionly.kpxe it hangs.
-
I tried renaming the bg.png file and it went further, only now it hangs here:
-
Whoops, forgot to resize:
-
@mandrade In the picture I see that it seems to drop back to the iPXE shell after timing out on the bzImage. Does it hang there or are you able to type in commands? Please run the two iPXE commands ifstat and nstat to see more information about the state of the network (please take a picture). Also run
kernel http://x.x.x.142/fog/service/ipxe/bzImage
(use your FOG server IP instead of x.x.x) to see if it is able to load the Linux kernel via HTTP.
At this stage you have the same two options. Either generate a debugging-enabled iPXE binary (see my message below) or take another packet dump with tcpdump and I can have a look!
-
@Sebastian-Roth said in Make new bzImage...:
Either generate a debugging enabled iPXE binary
I am able to type commands: I press ‘S’ to be taken to the iPXE shell. So I ran ifstat and nstat, and also tried to load the kernel. This was the result:
-
@mandrade if undionly.kkpxe works why not just use it? Those systems that work on undionly.kpxe shouldn’t have any issues using undionly.kkpxe and the systems that are currently hanging should work as well.
-
I meant that undionly.kkpxe boots past the iPXE boot screens and into the menus where I can inventory etc. The problem comes when I select a menu item, like inventory for instance. The result you can see in the image below. I cannot inventory or take an image of this host using either undionly.kpxe or undionly.kkpxe; the only difference is that undionly.kkpxe gets further in the process.
-
@mandrade Thanks for the new screenshot. I guess I need another tcpdump capture of exactly this! Can you do another one (same command as mentioned before) and send it to me?