PXE-T01 File not found (FOG running on Ubuntu 16 with DNSMASQ 2.75) And sometimes PXE-E53



  • So I have two competing issues here.

    I have set up FOG 1.5.9-RC2 on a VM running Ubuntu 16.04 LTS. I installed DNSMASQ 2.75 as a ProxyDHCP server because in this test environment I do not have a configurable DHCP server. My router is currently handling DHCP.

    The first issue is that occasionally, for no apparent reason, I can’t connect at all and get a PXE-E53 error. I have gotten this error while running TCPDUMP, and it seems as though the computer isn’t even reaching the FOG server with a DHCP request. Any ideas on how I could troubleshoot this further?

    The second issue is that sometimes I will get a DHCP address and get all the way to the point where I see something very similar to the following:
    6b427139-20ef-4385-a0f0-2d8a15379b59-image.png

    I took this picture from another post on here because I cannot take a screenshot of my PXE test; it is a physical device.

    To troubleshoot this further, I ran TCPDUMP while attempting to PXE boot. It appears the server is getting the request for undionly.kpxe.0, but then it fails for some reason.

    ca5cd874-36f1-4244-bf7c-09066a3ce490-image.png

    I followed this guide https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server for my ltsp.conf file.

    Anyone have any ideas how I can troubleshoot this more?

    Thanks!



  • @george1421 Thank you for all your help!

    I was able to get it to boot consistently by changing some adapters on the VM.

    I resolved the image capture issue as well.

    An error has been detected!
    Init Version: 20200517
    e2fsck failed to check /dev/sda1 (shrinkPartition)

    The solution was to remove the capture task and replace it with a debug task. Once in debug mode, I ran fsck on the /dev/sda1 partition. Then I changed the image type to Multiple Partition Image Single Disk Non resizable in the web GUI on the FOG server.

    Thanks again for all your help.

    • Jon

  • Moderator

    @jweick said in PXE-T01 File not found (FOG running on Ubuntu 16 with DNSMASQ 2.75) And sometimes PXE-E53:

    PXE-E52 ProxyDHCP offers were received. No DHCP offers were received.

    What is your dhcp server?

    It sounds like you have a network communications issue.

    Do you have a cheap unmanaged switch to test between the pxe booting computer and this 16 port switch?



  • @george1421

    Plugging the two computers into a 16 port gig switch produces a new error:

    PXE-E52 ProxyDHCP offers were received. No DHCP offers were received.


  • Moderator

    @jweick At this point I don’t think this is a FOG issue, because the FOG server isn’t even in the picture yet.

    As a test, do you have a dumb/cheap/$20 switch that you can put between the pxe booting computer and the building switch? I’d like to test to see if that solves the pxe booting issue. I’m kind of thinking spanning tree issues at the moment.



  • @george1421

    I think it has 0 packets because the request isn’t making it to the proxy DHCP…

    I reran the tcpdump during a successful fog boot and got the following:

    d60abdfa-86bf-4eae-8012-c323075a0273-image.png

    However the next boot failed with the same PXE-E53.

    Furthermore, when it did boot into the capture, the capture process failed. Are there verbose logs for capturing images anywhere?

    Thanks again for all the help.

    • Jon

  • Moderator

    @jweick Does that say 0 packets captured? That’s not normal. Both pxe booting computer and fog server are on the same subnet? (i.e. 192.168.2.0/24) I realize that your subnet may not be exactly as my example, but the FOG server should see something.

    We may need to define an interface to listen on if it picked the wrong one. Post the output of ip a



  • @george1421

    Yes the Ubuntu VM and Client are both on the .0 subnet.

    I believe I fixed all the issues with the IP as now when it boots it actually completes a registration.

    I ran the PCAP during a failed boot and it captured 0 packets.

    9fb5531e-322b-4cac-9ac9-fff676775da9-image.png

    I have gotten it to boot to FOG enough times to complete registration and even issue a capture; however, that has produced a new error. When I attempt to capture this device it boots normally but eventually fails with:

    An error has been detected!
    Init Version: 20200517
    e2fsck failed to check /dev/sda1 (shrinkPartition)

    Are there any capture logs that are created during the capture process that I could review for more information?

    Thanks,

    Jon


  • Moderator

    @jweick Ok with dnsmasq installed, is the target computer on the same subnet as the FOG server?

    Did you fix the issues with the IP address change? (Just making sure we are not fighting that issue with the pxe boot issue).

    If the FOG server IS on the same subnet as the target computer, then let’s use the FOG server to grab a pcap (packet capture) of the failed boot process. I have a tutorial here: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue

    Upload the pcap to a file share site and either post the link here or DM me the link using the forum chat, and I’ll take a look at it.
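
    A minimal sketch of such a capture, assuming a filter on the ports a PXE boot normally touches (DHCP on 67/68, TFTP on 69, ProxyDHCP on 4011). The function name `capture_pxe`, the interface name `ens33`, and the output file name are illustrative only; take the real interface name from `ip a`.

```shell
# Ports involved in a PXE boot: DHCP (67/68), TFTP (69), ProxyDHCP (4011).
PXE_FILTER='port 67 or port 68 or port 69 or port 4011'

# capture_pxe IFACE [OUTFILE]
# Writes matching packets to OUTFILE until interrupted with Ctrl-C.
# IFACE is an assumption here; use the name shown by `ip a`.
capture_pxe() {
  iface="$1"
  outfile="${2:-output.pcap}"
  # $PXE_FILTER is intentionally unquoted so it splits into filter words.
  sudo tcpdump -i "$iface" -w "$outfile" $PXE_FILTER
}

# Example (start on the FOG server, then PXE boot the target):
#   capture_pxe ens33 output.pcap
```

    Naming the interface explicitly avoids the case where tcpdump picks the wrong one and reports 0 packets captured.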



  • Thank you so much! That was it for my second issue.

    After I updated to 2.76 I was able to get to a FOG screen and started the host registration. However, I ran into some issues because I stupidly changed the IP of the FOG server. Originally the VM created itself on a .234 subnet. Since it’s a virtual switch, I knew I was going to have issues attaching to physical devices, so I moved it into the current subnet and duplicated the physical adapter down instead of using the virtual one. That has worked incredibly well.

    All that to say, thank you so much for the dnsmasq 2.76 suggestion. However, I am still getting sporadic PXE-E53 “no boot filename received” errors. Sporadic probably isn’t the correct word, because 60-70% of the time it fails and the remainder of the time it actually boots into FOG. Could this be related to the IP switch? Or is this just how things are if I don’t configure a DHCP server?

    Any information would be greatly appreciated! Thank you again for all your help.


  • Moderator

    The first problem is that dnsmasq 2.75 is broken. Well, not really broken; it’s just not right. Before 2.76, dnsmasq always added .0 onto the file names to comply with the old syslinux standard. This is a mess and a problem. I do have a tutorial on how to compile 2.76 on your FOG server. It takes a little skill but will work. https://forums.fogproject.org/topic/8725/compiling-dnsmasq-2-76-if-you-need-uefi-support
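
    To check whether an installed dnsmasq falls on the affected side of that cutoff, something like the sketch below could work. The helper name `dnsmasq_adds_dot0` is made up for illustration, and the 2.76 threshold is taken from the description above rather than from the dnsmasq changelog. It relies on GNU `sort -V`.

```shell
# dnsmasq_adds_dot0 VERSION
# Succeeds (exit 0) when VERSION sorts before 2.76, i.e. the releases
# described above as appending ".0" to boot file names.
dnsmasq_adds_dot0() {
  ver="$1"
  # sort -V orders version strings; the smaller one comes out first.
  lowest=$(printf '%s\n%s\n' "$ver" "2.76" | sort -V | head -n 1)
  [ "$ver" != "2.76" ] && [ "$lowest" = "$ver" ]
}

# Example: pull the installed version out of `dnsmasq --version`
# (its first line looks like "Dnsmasq version 2.75  Copyright ...").
#   installed=$(dnsmasq --version | awk 'NR==1 {print $3}')
#   if dnsmasq_adds_dot0 "$installed"; then echo "affected"; fi
```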

    If you want to see if I’m full of BS or not I’ll tell you how to hack it. In the /tftpboot directory on the fog server copy the file undionly.kpxe to undionly.kpxe.0 and uefi.efi to uefi.efi.0 (note that uefi based systems still may not work with 2.75). But once you create the .0 files your target computer should work. Understand this is not a long term fix (compiling 2.76 or later is) but it will get you going for now.
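
    The copy step described above can be sketched as follows. The file names are taken verbatim from this post; the function name `make_dot0_copies` is hypothetical, and files that do not exist in the directory are simply skipped.

```shell
# make_dot0_copies [DIR]
# Copies each boot file named in the post to "<name>.0" inside DIR
# (default /tftpboot). Missing files are skipped, so it is safe to re-run.
make_dot0_copies() {
  dir="${1:-/tftpboot}"
  for f in undionly.kpxe uefi.efi; do
    if [ -f "$dir/$f" ]; then
      cp -p "$dir/$f" "$dir/$f.0"   # -p keeps mode and timestamps
    fi
  done
}

# Example (on the FOG server; may need sudo):
#   make_dot0_copies /tftpboot
```

    Copying rather than renaming keeps the original files in place, so nothing has to be undone once dnsmasq 2.76 or later is installed.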

    It’s strange that even after several years, Ubuntu (or anyone else I could find) hasn’t released anything later than 2.75 for Ubuntu 16.04…

    If you use my config file from the link you posted earlier with dnsmasq 2.76 or later it will work straight away.

