PXE-E78 Cannot locate boot server


  • @george1421

    To experiment, I used the new Fog 1.3 install to contact that same PC with a hardware inventory request from Fog.

    I noticed that…

    1. The Wake-on-LAN (WOL) did not happen. In the past, WOL worked.
    2. When I manually turned on that PC, I got the same results: it found the Fog server and prompted me to press F8. When I did, I got the same PXE-E78 error, and in Fog the active task showed the hardware inventory as still in progress.

    I thought that this would be a way to attempt communication that did not involve TFTP boot.

    I created a pcap file using the same command you gave me before
    (sudo tcpdump -w output.pcap port 67 or port 68 or port 69 or port 4011)
    but I don’t know if the port filters on this are suitable to capture the needed info for the WOL/hardware inventory tasks…

    This pcap file is at:
    http://s000.tinyupload.com/?file_id=61578123182931059079
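
On the port-filter question: the filter above covers DHCP (67/68), TFTP (69) and proxyDHCP (4011). A Wake-on-LAN magic packet is not on any of those; it is six 0xFF bytes followed by the target MAC repeated sixteen times, commonly sent as a UDP broadcast to port 7 or 9. A minimal sketch of both pieces follows. The helper names are hypothetical, and the idea that the inventory task talks to the FOG web server over HTTP (port 80) is an assumption, not something confirmed in this thread.

```python
import re


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN magic packet for a MAC address."""
    # Accept "aa:bb:..", "aa-bb-.." or bare hex by dropping every non-hex character.
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac)
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes, then the MAC repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def capture_filter(ports):
    """Build a tcpdump-style filter expression matching any of the given ports."""
    return " or ".join(f"port {p}" for p in ports)
```

Passing something like `capture_filter([67, 68, 69, 4011, 9, 80])` to `tcpdump -w output.pcap` would widen the capture to include the assumed WOL and HTTP ports.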


  • @george1421

    On another box, I installed Ubuntu 16.04 LTS and Fog 1.3.0-RC-11.

    I shut down the dnsmasq on the 10.0.253.24 box and started dnsmasq on the new box with an ltsp.conf etc.

    I get the same behavior.

    The PC boots and finds the internal DHCP (171.xxxx) and gets to the new box whose IP is 10.0.253.23.

    It prompts me for F8 to boot from the network. Then I get:

    UD 10.0.253.23

    Which times out with the PXE-E78 error.

    I captured a new pcap for this in case this is helpful.
    This pcap is at:
    http://s000.tinyupload.com/?file_id=97921552308199994673


  • @george1421

    OK. The problem with dns and resolv.conf seems ok now. I am able to do apt-get updates and ping external places such as google.com. Oddly, the resolv.conf just shows the loopback:

    compteach@iepcomlabsrv:/etc$ cat resolv.conf
    # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
    #     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
    nameserver 127.0.0.1
    compteach@iepcomlabsrv:/etc$
    compteach@iepcomlabsrv:/etc$ ping google.com
    PING google.com (110.164.6.251) 56(84) bytes of data.
    64 bytes from mx-ll-110.164.6-251.static.3bb.co.th (110.164.6.251): icmp_seq=1 ttl=55 time=2.59 ms
    64 bytes from mx-ll-110.164.6-251.static.3bb.co.th (110.164.6.251): icmp_seq=2 ttl=55 time=2.58 ms
    64 bytes from mx-ll-110.164.6-251.static.3bb.co.th (110.164.6.251): icmp_seq=3 ttl=55 time=2.75 ms
    ^C
    --- google.com ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2003ms
    rtt min/avg/max/mdev = 2.588/2.644/2.751/0.095 ms
    

    I made the changes to the ltsp.conf
    a) comment out the port=0
    b) change the dhcp-boot
    c) change pxe-service

    However, there is no change in the behavior.

    I captured a new pcap and posted it here:
    http://s000.tinyupload.com/?file_id=29844646319354557181

    And for completeness, here is my ltsp.conf:

    compteach@iepcomlabsrv:/etc/dnsmasq.d$ cat ltsp.conf
    # Don't function as a DNS server:
    # MKS  06-Oct-2016
    #port=0
    
    # Log lots of extra information about DHCP transactions.
    log-dhcp
    
    # Dnsmasq can also function as a TFTP server. You may uninstall
    # tftpd-hpa if you like, and uncomment the next line:
    # enable-tftp
    
    # Set the root directory for files available via FTP.
    tftp-root=/tftpboot
    
    # The boot filename, Server name, Server Ip Address
    dhcp-boot=undionly.kpxe,10.0.253.24,10.0.253.24
    
    # rootpath option, for NFS
    #dhcp-option=17,/images
    
    # kill multicast
    #dhcp-option=vendor:PXEClient,6,2b
    
    # Disable re-use of the DHCP servername and filename fields as extra
    # option space. That's to avoid confusing some old or broken DHCP clients.
    dhcp-no-override
    
    # PXE menu.  The first part is the text displayed to the user.  The second is the timeout, in seconds.
    pxe-prompt="Press F8 for boot menu", 3
    
    # The known types are x86PC, PC98, IA64_EFI, Alpha, Arc_x86,
    # Intel_Lean_Client, IA32_EFI, BC_EFI, Xscale_EFI and X86-64_EFI
    # This option is first and will be the default if there is no input from the user.
    pxe-service=X86PC, "Boot from network", undionly, 10.0.253.24
    
    # A boot service type of 0 is special, and will abort the
    # net boot procedure and continue booting from local media.
    #pxe-service=X86PC, "Boot from local hard disk", 0
    
    # If an integer boot service type, rather than a basename is given, then the
    # PXE client will search for a suitable boot service for that type on the
    # network. This search may be done by multicast or broadcast, or direct to a
    # server if its IP address is provided.
    # pxe-service=x86PC, "Install windows from RIS server", 1
    
    # This range(s) is for the public interface, where dnsmasq functions
    # as a proxy DHCP server providing boot information but no IP leases.
    # Any ip in the subnet will do, so you may just put your server NIC ip here.
    # Since dnsmasq is not providing true DHCP services, you do not want it
    # handing out IP addresses.  Just put your servers IP address for the interface
    # that is connected to the network on which the FOG clients exist.
    # If this setting is incorrect, the dnsmasq may not start, rendering
    # your proxyDHCP ineffective.
    dhcp-range=10.0.253.24,proxy
    
    # This range(s) is for the private network on 2-NIC servers,
    # where dnsmasq functions as a normal DHCP server, providing IP leases.
    # dhcp-range=192.168.0.20,192.168.0.250,8h
    
    # For static client IPs, and only for the private subnets,
    # you may put entries like this:
    # dhcp-host=00:20:e0:3b:13:af,10.160.31.111,client111,infinite
    #dhcp-host=f8:0f:41:a0:04:75,net:allow
    #dhcp-ignore=#allow
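
A quick sanity check for a file like this is that every active boot-related directive names the same server IP, which is easy to get wrong when a config is copied from one box to another. The sketch below is only an illustration: the function names are hypothetical, and it does a plain text scan rather than a real dnsmasq parse.

```python
import re

# Loose IPv4 matcher; good enough for spotting addresses in a config line.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def directive_ips(config_text):
    """Map each active (uncommented) directive to the IPv4 addresses on its line."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        key, _, value = line.partition("=")
        ips = IPV4.findall(value)
        if ips:
            found.setdefault(key.strip(), []).extend(ips)
    return found


def boot_server_ips(config_text, keys=("dhcp-boot", "pxe-service", "dhcp-range")):
    """Return the distinct IPs used across the boot-related directives."""
    ips = directive_ips(config_text)
    return {ip for key in keys for ip in ips.get(key, [])}
```

If `boot_server_ips(open("/etc/dnsmasq.d/ltsp.conf").read())` returns more than one address, the directives disagree about which machine is the boot server.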
    
    
    
    

  • @george1421 More accurately, commenting out port=0 allows dnsmasq to pass the originating DNS information on to the new host. Leaving port=0 enabled essentially turns off DNS. If you’re planning to leave port=0 enabled, then you’ll likely need to change the next-server to point at an IP address rather than a hostname.

  • Moderator

    @Tom-Elliott said in PXE-E78 Cannot locate boot server:

    @mkstreet Comment the port=0 line of your ltsp.conf file and restart dnsmasq.

    I talked with Tom over IM and he said that commenting out the port=0 line makes the dnsmasq server act as a DNS server, which explains exactly what we are seeing with the resolv.conf file. While this has no impact on the next-server value being sent, it should fix name resolution on the FOG server.


  • @mkstreet Comment the port=0 line of your ltsp.conf file and restart dnsmasq.

  • Moderator

    @Sebastian-Roth I think I like your first suggestion, updating the config file with the additional IP references:

    dhcp-boot=undionly.kpxe, 10.0.253.24, 10.0.253.24
    pxe-service=X86PC, "Boot from network", undionly, 10.0.253.24
    

    If that fails, get another pcap file of the booting process to let us see what changed in the conversation.

    My second thought: wow, Ubuntu ships dnsmasq 2.68, which was released 08-Dec-2013, whereas most of the distros are at 2.72. If this doesn’t work I can set up an Ubuntu VM and compile the latest version of dnsmasq to see if that helps. But before I go through that effort, let’s see if your edits work.

  • Moderator

    I just had a look at the dnsmasq code (version 2.68-1ubuntu0.1 used in Ubuntu 14.04) and, going by the initially posted log output, it seems that next-server (mess->siaddr.s_addr in the code) is actually not being set. I think I now know what’s going on. If I remember correctly, dnsmasq in proxy mode does not have to send the next-server information in the first DHCP answer (the reply to the first DHCP discovery request). The client knows that there is a DHCP proxy server because it got a first quick message (only containing the filename), and it should contact that server (port 4011) after finishing the normal DHCP handshake that sets up its IP.

    In your case the next-server information sent by 10.0.253.1 is most probably interfering and confusing the client. I guess I need to think a little more about this to find a good solution… Maybe George has an idea.
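
To see what each server actually put in its answer, the next-server and filename can be read straight out of the fixed BOOTP header: siaddr sits at byte offset 20, the server name field at 44 and the boot file field at 108. A minimal sketch, assuming you already have the raw UDP payload of a DHCP offer (for example exported from the pcap); the function name is hypothetical and it ignores option overload of the sname/file fields.

```python
import socket


def bootp_boot_fields(payload: bytes):
    """Return (next_server, server_name, boot_file) from a BOOTP/DHCP UDP payload."""
    if len(payload) < 236:
        raise ValueError("payload shorter than the fixed 236-byte BOOTP header")
    # siaddr: 4 bytes at offset 20, the address the client should boot from.
    next_server = socket.inet_ntoa(payload[20:24])
    # sname (64 bytes) and file (128 bytes) are NUL-padded ASCII strings.
    server_name = payload[44:108].split(b"\x00", 1)[0].decode("ascii", "replace")
    boot_file = payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace")
    return next_server, server_name, boot_file
```

An offer where next-server was never set comes back as 0.0.0.0 in the first element, which is the symptom described above.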

  • Moderator

    @mkstreet I understand that it is hard or maybe impossible to change the config of that 10.0.253.1 server. As you said, dnsmasq can be used for exactly this purpose. So let’s give it another go. I’d say dnsmasq is answering faster than the other server because it is located right within your subnet. The PCAP output more or less proves this: 10.0.253.24 answered 0.5 seconds before 10.0.253.1 did. So that’s good!
    Then we only need to offer the correct PXE information to the client in one single DHCP answer. This is next-server and filename. Please modify the following lines in your config and add the server IP as shown:

    dhcp-boot=undionly.kpxe, 10.0.253.24, 10.0.253.24
    pxe-service=X86PC, "Boot from network", undionly, 10.0.253.24
    

    The latter one shouldn’t be used, but setting it correctly doesn’t hurt, I’d say. Please take another PCAP capture to see if the next-server info is now being sent by dnsmasq.

    [edit] I just saw that the information in the wiki page does not set those addresses. I haven’t played with dnsmasq in a while so this is just a quick idea. It’s kind of strange that you get an answer from dnsmasq that does not have next-server set… [/edit]


  • @george1421

    Hmmm… If I understand this correctly, then I cannot disable dhcp relay within 10.0.253.1 as other hardware in this subnet but outside my lab would still need dhcp service for other purposes.
    And, as you say, this path is getting messy.

    As for changing the main dhcp option 66, I could try to request this. Would this affect only my subnet or our whole facility? My lab is about 90% of my subnet, but the main dhcp is servicing the whole campus which is comprised of several subnets… If the option 66 will affect others outside my area, then it is hard for me to do.

    I am setting up the new version of FOG etc under VirtualBox. I think I will complete that, as it is a new clean install. I will see if this clean start resolves anything, as opposed to this – attempting to add DNSMASQ to an existing setup that (was) working.

  • Moderator

    @mkstreet said in PXE-E78 Cannot locate boot server:

    @george1421

    When I disconnect that cable to the outside, then the PC I want to load can no longer find the DHCP. So something external to the LAB must be helping get the initial requests to 10.0.253.24 ?

    This I understand: dnsmasq is not providing dhcp services for you, it’s only providing dhcpProxy services (filling in the gaps left by your main dhcp server). What is strange is that your main dhcp server is sending out itself as the next server. I simply can’t understand why it’s not working here. It SHOULD BE WORKING.

    TBH right now I’m at a loss on where to turn next; everything should be working. If you can disable the dhcp relay in your router (10.0.253.1) for the 10.0.253.0 subnet, then you can have the fog server with isc dhcp enabled supply the IP addresses (then dnsmasq is not needed either). Alternatively, on your router (10.0.253.1), add your dnsmasq server as the last dhcp server in its list. But this is starting to get messy.

    The only other thing is to see if you can get your main dhcp server to NOT send out dhcp option 66 {next-server}. But it’s not clear if this will fix the issue either.


  • @george1421

    compteach@iepcomlabsrv:~$ netstat -an|grep 4011
    udp        0      0 0.0.0.0:4011            0.0.0.0:*
    compteach@iepcomlabsrv:~$
    
    

  • @george1421

    I did some experimenting and noticed this anomaly.

    The Fog server and the host I want to load are on two switches which are connected together.
    When I attach the Ethernet cable to the outside LAN / Internet to one of those switches, then I get the behavior as noted. Meaning the DHCP gets answered but when I press F8 to network boot, I get the PXE-E78 error.

    When I disconnect that cable to the outside, then the PC I want to load can no longer find the DHCP. So something external to the LAB must be helping get the initial requests to 10.0.253.24 ?

  • Moderator

    @george1421 I’m seeing the same thing as Sebastian saw. Let me check a pcap file I captured the other day.

    This is very strange indeed. Our configurations are basically the same, except I’m running the latest version of dnsmasq to test uefi booting. In my pcap, dnsmasq is sending out its IP address in the dhcp offer for next server, but in yours it’s not sending out the next server address at all. This is the only difference. Plus, after the ACK from my main dhcp server (soho router), the target computer connects to the dhcpProxy port and then downloads the boot file.

  • Moderator

    @george1421 Can you confirm that the dhcpProxy service is running on port 4011?

    netstat -an|grep 4011

    You should see an output like:

    udp        0      0 0.0.0.0:4011            0.0.0.0:*   
    
  • Moderator

    @mkstreet Ok then iptables is disabled. I’m looking over your previous pcap file now.

    This should be working!!

  • Moderator

    @mkstreet Those lines in the dnsmasq.conf are safe ones.

    For iptables, if you key in iptables -L you should either get a response back like “I don’t know what you are talking about”, or it should show you three chains (INPUT, FORWARD, OUTPUT) with policy ACCEPT. If your iptables output looks like the one below, you have iptables turned on. The content isn’t important; if you see rules in there other than allow, you need to disable iptables.

    Chain INPUT (policy DROP)
    target     prot opt source               destination         
    ufw-before-logging-input  all  --  anywhere             anywhere            
    ufw-before-input  all  --  anywhere             anywhere            
    ufw-after-input  all  --  anywhere             anywhere            
    ufw-after-logging-input  all  --  anywhere             anywhere            
    ufw-reject-input  all  --  anywhere             anywhere            
    ufw-track-input  all  --  anywhere             anywhere            
    
    Chain FORWARD (policy DROP)
    target     prot opt source               destination         
    ufw-before-logging-forward  all  --  anywhere             anywhere            
    ufw-before-forward  all  --  anywhere             anywhere            
    ufw-after-forward  all  --  anywhere             anywhere            
    ufw-after-logging-forward  all  --  anywhere             anywhere            
    ufw-reject-forward  all  --  anywhere             anywhere            
    ufw-track-forward  all  --  anywhere             anywhere            
    
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination         
    ufw-before-logging-output  all  --  anywhere             anywhere            
    ufw-before-output  all  --  anywhere             anywhere            
    ufw-after-output  all  --  anywhere             anywhere            
    ufw-after-logging-output  all  --  anywhere             anywhere            
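
The same eyeball check can be scripted. This is a rough sketch, stricter than the rule of thumb above: it treats the firewall as open only if every built-in chain has policy ACCEPT and no rules are listed at all. The function name is hypothetical, and it expects the plain-text output of `iptables -L`.

```python
import re

# Matches headers of built-in chains, e.g. "Chain INPUT (policy ACCEPT)".
CHAIN = re.compile(r"^Chain (\S+) \(policy (\w+)\)")


def firewall_is_open(iptables_output: str) -> bool:
    """True if all built-in chains are policy ACCEPT and no rules are listed."""
    policies = {}
    rules = 0
    for raw in iptables_output.splitlines():
        line = raw.strip()
        match = CHAIN.match(line)
        if match:
            policies[match.group(1)] = match.group(2)
        elif line and not line.startswith("target"):
            # Anything that is not a chain header, column header or blank
            # line is counted as a rule (or a user-defined chain header).
            rules += 1
    return bool(policies) and rules == 0 and all(
        policy == "ACCEPT" for policy in policies.values()
    )
```

Output with ufw chains such as the listing above would come back False, while three empty ACCEPT chains come back True.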
    
    

  • I found this…

    compteach@iepcomlabsrv:/etc$ sudo iptables -L
    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination
    
    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination
    
    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination
    

  • @george1421

    For the /etc/dnsmasq.conf, the last lines are NOT commented out as shown here:

    # For debugging purposes, log each DNS query as it passes through
    # dnsmasq.
    #log-queries
    # MKS 04-Oct-2016
    log-queries
    
    # Log lots of extra information about DHCP transactions.
    #log-dhcp
    # MKS 04-Oct-2016
    log-dhcp
    
    # Include another lot of configuration options.
    #conf-file=/etc/dnsmasq.more.conf
    #conf-dir=/etc/dnsmasq.d
    # MKS 04-Oct-2016
    conf-dir=/etc/dnsmasq.d
    
    

    I am not familiar with “iptables firewall” – where should I look?

  • Moderator

    @george1421 Just to be clear, every line in /etc/dnsmasq.conf is commented out?

    And actually dnsmasq is supposed to rewrite the file. It takes what was in there, caches it internally, and then puts the loopback address in to point to itself.

    Also you disabled the iptables firewall on this server right?
