PXE-E78 Cannot locate boot server



  • We recently changed our network setup at school.

    This has led me to need to run DNSMASQ on the server with Fog.

    As I cannot make changes to the DNS etc, I followed the instructions for
    "Using FOG with an unmodifiable DHCP server"
    https://wiki.fogproject.org/wiki/index.php?title=Using_FOG_with_an_unmodifiable_DHCP_server/_Using_FOG_with_no_DHCP_server

    I followed the instructions / configuration under “DNSMASQ settings for iPXE”

    I skipped the Ubuntu-specific section “Additional Steps for 12.04.4, 12.04.5, 14.04, 14.10” because it deals with Network-Manager, which does not seem to be installed on this Ubuntu box; the Network-Manager configuration file that those instructions reference doesn’t exist either.

    When I boot a host PC, it locates the DHCP server and I get the network boot prompt set in the DNSMASQ configuration file: pxe-prompt="Press F8 for boot menu", 3

    After I select this, the screen shows: UD 10.0.253.24
    (that’s the IP Address of the Ubuntu server where Fog etc are).

    It appears to wait / search for something for about 5 seconds. Then gives up and gives the
    PXE-E78 Cannot locate boot server

    and continues booting via the local HDD.

    The Ubuntu syslog shows this but no mention of TFTP:

    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 available DHCP subnet: 10.0.253.24/255.255.255.0
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 vendor class: PXEClient:Arch:00000:UNDI:002001
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 PXE(em1) f8:0f:41:a0:04:75 proxy
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 tags: allow, known, em1
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 bootfile name: undionly.kpxe
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 broadcast response
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 sent size: 1 option: 53 message-type 2
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 sent size: 4 option: 54 server-identifier 10.0.253.24
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 sent size: 9 option: 60 vendor-class 50:58:45:43:6c:69:65:6e:74
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 sent size: 17 option: 97 client-machine-id 00:ac:b2:0d:04:25:4e:e3:11:b0:cf:9a:87:4c…
    Oct 4 08:39:24 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 sent size: 60 option: 43 vendor-encap 06:01:03:08:07:80:00:01:0a:00:fd:18:09:14…
    Oct 4 08:39:28 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 available DHCP subnet: 10.0.253.24/255.255.255.0
    Oct 4 08:39:28 iepcomlabsrv dnsmasq-dhcp[1308]: 1117783157 vendor class: PXEClient:Arch:00000:UNDI:002001
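
    The option 43 (vendor-encap) line in the log above can be decoded by hand to see what dnsmasq is telling the PXE client. The following is a minimal sketch, assuming the tag/length/value layout and sub-option names from the PXE 2.1 specification (those names are not from this thread). It only uses the 14 bytes visible in the log line; the rest is elided there and is not guessed.

```python
import ipaddress

# Names per the PXE 2.1 specification (an assumption, not taken from the thread).
PXE_SUBOPTIONS = {6: "PXE_DISCOVERY_CONTROL", 8: "PXE_BOOT_SERVERS",
                  9: "PXE_BOOT_MENU", 10: "PXE_MENU_PROMPT"}


def parse_vendor_encap(hex_string):
    """Split a colon-separated option 43 dump into (tag, declared_length, value) tuples.

    A value shorter than its declared length means the dump was truncated.
    """
    data = bytes.fromhex(hex_string.replace(":", ""))
    subopts = []
    i = 0
    while i + 1 < len(data) and data[i] != 255:  # 255 is the end marker
        tag, length = data[i], data[i + 1]
        subopts.append((tag, length, data[i + 2:i + 2 + length]))
        i += 2 + length
    return subopts


def boot_servers(value):
    """Decode PXE_BOOT_SERVERS entries: 2-byte type, 1-byte IP count, then the IPs."""
    servers = []
    i = 0
    while i + 3 <= len(value):
        server_type = int.from_bytes(value[i:i + 2], "big")
        count = value[i + 2]
        ips = [str(ipaddress.IPv4Address(value[i + 3 + 4 * n:i + 7 + 4 * n]))
               for n in range(count)]
        servers.append((server_type, ips))
        i += 3 + 4 * count
    return servers


# Only the bytes visible in the syslog line; the remainder is elided in the log.
LOGGED = "06:01:03:08:07:80:00:01:0a:00:fd:18:09:14"

for tag, length, value in parse_vendor_encap(LOGGED):
    print(tag, PXE_SUBOPTIONS.get(tag, "unknown"), length, value.hex(":"))
```

    Read this way, sub-option 8 lists a single boot server, 10.0.253.24, and sub-option 6 carries the value 3, which (if I read the PXE specification correctly) disables broadcast and multicast discovery. That would mean the client is expected to contact 10.0.253.24 directly, which matches the address shown on the client screen before the PXE-E78 error.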

    I have checked that TFTPD-HPA is up and running. If it has a log file separate from syslog, I don’t know where that is.

    I don’t know what to look at next to diagnose this further.
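
    One way to check TFTPD-HPA independently of the PXE client is to send it a single TFTP read request from another machine on the subnet and look at the first reply. This is only a rough sketch of the RFC 1350 packet format using the Python standard library; the function names are made up for illustration, and it does not acknowledge or complete the transfer.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request (opcode 1) as defined in RFC 1350."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def describe_reply(packet):
    """Summarise the first reply packet from a TFTP server."""
    opcode = struct.unpack("!H", packet[:2])[0]
    if opcode == 3:  # DATA
        block = struct.unpack("!H", packet[2:4])[0]
        return f"DATA block {block}, {len(packet) - 4} bytes"
    if opcode == 5:  # ERROR
        code = struct.unpack("!H", packet[2:4])[0]
        message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return f"ERROR {code}: {message}"
    return f"unexpected opcode {opcode}"


def probe_tftp(server, filename, timeout=5.0):
    """Send one read request to UDP port 69 and report the first reply, or a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        try:
            packet, _ = sock.recvfrom(1024)
        except socket.timeout:
            return "no reply (timeout)"
        return describe_reply(packet)
```

    Calling `probe_tftp("10.0.253.24", "undionly.kpxe")` with the server address and boot file name from the syslog excerpt should report a DATA block if the TFTP service is reachable and the file exists, and an ERROR or a timeout otherwise.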


    Ubuntu 14.04 LTS
    Fog version 3121 (I plan to upgrade to the latest FOG but want to try to change as few things as possible at once)



  • @Sebastian-Roth I will try this in about ten days when we are back in session. Thanks!!



  • @george1421 I have forwarded the first wish to the network area. I am awaiting a response.


  • Developer

    @mkstreet said:

    Then, I am guessing, my Fog Server will need to have isc-dhcp running (instead of dnsmasq)…?

    Yes! And that just sparked off an idea. Assuming that the “other” DHCP server is further away you could try running your very own ISC-DHCP server within your 10.0.253.0/16 subnet which is answering faster than the other one does anyway. This way you would force the clients to use all the DHCP information sent by your server before they get an answer from the outside one. Not very nice and I am sure it will cause trouble at some point. But it’s worth a try to see if it would work for you.

    Check all your settings in /opt/fog/.fogsettings (wiki help here) and turn on DHCP, re-run the installer and you should be up and running with a working ISC-DHCP config in just a few minutes.


  • Moderator

    @mkstreet If I had to pick one thing that I wish your current network would stop doing, it is this:

    Your primary dhcp server (172.16.1.1) should not send the dhcp option 66 {next-server} with a value of 172.16.1.1 to your subnet (10.0.253.0/16). Understand, I’m not saying this is your entire problem; there may be some other underlying issue. The second part (maybe a wish) is that, instead of 172.16.1.1, the next-server value be sent as the IP address of your FOG server. If that happens then dnsmasq is not needed.



  • @george1421

    OK. Two questions.

    1. Instead of using DNSMASQ, if I attempt to have the network changed (i.e. 172.16.1.1 and/or 10.0.253.1), what exactly needs to be changed? I will need to explain what I want in almost step by step fashion to non-native English speakers who control the network.

    2. Is there an alternative where I disconnect from the network just while I am loading the lab? Then, I am guessing, my Fog Server will need to have isc-dhcp running (instead of dnsmasq)…?


  • Moderator

    @Sebastian-Roth That, in a way, is what is already happening in my network. For some silly reason my internet router is sending the next server as itself (as seen in the pcap in my dnsmasq uefi thread). But in this situation the pxe booting client understands this, complains about getting two next server replies and does the right thing anyway. That is why I suggested trying a different client to test this, but his second client did the same thing.

    @mkstreet I think we are at the point of diminishing returns. I don’t think we (from the FOG Project) are going to be able to help you solve this remotely. You NEED to have a network engineer involved to get the problem resolved. There is something going on in your environment that needs expert hands and tools for your network. The components are set up correctly for FOG; there is some external factor that is keeping this from working correctly.


  • Developer

    @george1421 said:

    What I don’t understand is why does it not continue and contact the proxyDHCP port? You have already confirmed that the iptables or firewalld firewall is not blocking port access. We have done a side by side comparison of your dnsmasq config file to mine.

    Well, I thought I had explained that part. From what I understand, the issue is that the normal DHCP server (10.0.253.1) sends the next-server option pointing to 172.16.1.1 (see the PCAP file). This is most probably confusing the client. Would you mind trying this out, George? Add a next-server option (pointing to a totally unrelated server) to your DHCP server and see what it does to your setup. It could also be that the clients still boot properly in your case, as it all depends on how the client interprets the information.
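
    To see exactly what a DHCP reply carries as boot information, the raw payload can be inspected directly. In a DHCP packet the "next server" address lives in the fixed siaddr header field (RFC 2131), while option 66 is a separate TFTP server name option (RFC 2132). The sketch below pulls out both plus the boot file field. It assumes you already have the UDP payload bytes (for example exported from a pcap), and it ignores option overload, so treat it as a starting point only.

```python
import socket

# Fixed BOOTP/DHCP header layout from RFC 2131; siaddr is the "next server" field.
SIADDR_OFFSET = 20
FILE_OFFSET, FILE_LEN = 108, 128
OPTIONS_OFFSET = 240  # 236-byte header plus the 4-byte magic cookie


def boot_fields(payload):
    """Return next-server, boot file and option 66 from a raw DHCP payload (UDP data)."""
    siaddr = socket.inet_ntoa(payload[SIADDR_OFFSET:SIADDR_OFFSET + 4])
    boot_file = (payload[FILE_OFFSET:FILE_OFFSET + FILE_LEN]
                 .split(b"\0", 1)[0].decode("ascii", "replace"))
    option_66 = None
    i = OPTIONS_OFFSET
    while i + 1 < len(payload) and payload[i] != 255:  # 255 = end option
        if payload[i] == 0:  # pad option has no length byte
            i += 1
            continue
        code, length = payload[i], payload[i + 1]
        if code == 66:
            option_66 = payload[i + 2:i + 2 + length].decode("ascii", "replace")
        i += 2 + length
    return {"next_server": siaddr, "boot_file": boot_file, "option_66": option_66}
```

    Running this over the OFFER and ACK from the main DHCP server and over the dnsmasq proxy reply would show which of them names 172.16.1.1 and which names the FOG server, and in which field.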



  • @george1421

    1. What device is your pxe boot client?
      ** Inside my lab, we have about 36 Lenovo ThinkPad model M72z
      We have been using and loading these with Fog for about 2 years up until recently.

    2. Do you have a different brand/model of device you could use?
      ** Yes. I have “regular” desktop PC’s which are outside the lab.
      I have just tried to network boot one of those. The case says it’s an ACER Veriton.
      I see that it connects to 10.0.253.23 and then gets the same PXE-E78 error as the boxes inside the lab. I have attached a pcap of this here:
      http://s000.tinyupload.com/?file_id=48129942273982229370

    3. You did mention that you moved this FOG server from an isolated network to this new network. What device was supplying the DHCP services for that isolated network?
      ** In brief, my department (lab and offices) had its own Internet connection, separate from the rest of the campus. Odd as it seems, the ISP (essentially the state enterprise telecom entity) provided DHCP services from their central office about 2 km away. This was the network configuration when we began using Fog in about March 2014. Once Fog was set up and we had gained some experience with it, we had few or no network issues. We knew very little about that network/DHCP setup.

      From March 2014 to about August 2015, we used Fog very actively: we loaded our entire lab at least ten times and also managed separate images for our (non-lab) office desktop PCs with Fog. From August 2015 until now, we have not used Fog much, as we didn’t need to update things.

      About a month ago, my department gave up its separate connection to the state telecom provider and joined the campus-wide LAN that the rest of the campus had just implemented (with a different provider from the state telecom we had been using). Now we want to update and reload our lab. That is when we discovered that the hosts, which we had come to just turn on and take for granted, could no longer find the Fog server. This is what led us to attempt dnsmasq, and voilà, here we are today.


  • Moderator

    @mkstreet said in PXE-E78 Cannot locate boot server:

    10.0.253.23

    I’m not seeing anything different from the 10.0.253.23 system. The conversation stops after the ACK from the 172.16.1.1 dhcp server.



  • @Sebastian-Roth

    (Reply Part 2)

    I’ve disabled 10.0.253.24 (Ubu 14.04, Fog 3121) and started up 10.0.253.23 (Ubu 16.04, Fog 1.3)

    You are right that this box, which I just set up yesterday, hadn’t been made static. I’ve changed that now. After making the changes to the interfaces file, I rebooted the machine.

    Here is the interfaces file, before and after:

    before

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ cat orig_interfaces 
    # interfaces(5) file used by ifup(8) and ifdown(8)
    auto lo
    iface lo inet loopback
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ 
    

    after

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ cat interfaces
    # interfaces(5) file used by ifup(8) and ifdown(8)
    auto lo
    iface lo inet loopback
    
    auto enp1s0
    iface enp1s0 inet static
    address 10.0.253.23
    netmask 255.255.255.0
    gateway 10.0.253.1
    dns-nameservers 110.164.252.222 8.8.8.8
    
    
    

    ifconfig output:

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ ifconfig
    enp1s0    Link encap:Ethernet  HWaddr f8:0f:41:a0:04:59  
              inet addr:10.0.253.23  Bcast:10.0.253.255  Mask:255.255.255.0
              inet6 addr: fe80::fa0f:41ff:fea0:459/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:5100 errors:0 dropped:8 overruns:0 frame:0
              TX packets:2983 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:5067659 (5.0 MB)  TX bytes:433865 (433.8 KB)
    
    lo        Link encap:Local Loopback  
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:760 errors:0 dropped:0 overruns:0 frame:0
              TX packets:760 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1 
              RX bytes:117502 (117.5 KB)  TX bytes:117502 (117.5 KB)
    
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ 
    

    status for isc-dhcp

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ sudo service isc-dhcp-server status
    ● isc-dhcp-server.service
       Loaded: not-found (Reason: No such file or directory)
       Active: inactive (dead)
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ 
    

    I’ve just tried to boot the host box and got the same PXE-E78 error.
    Here is a pcap from that:
    http://s000.tinyupload.com/?file_id=00695804426205933233

    I’ve commented out the dhcp-no-override in the ltsp.conf and restarted dnsmasq.
    I get the PXE-E78 error.
    Here is the pcap for this:
    http://s000.tinyupload.com/?file_id=03577227175846412978

    Here is my ltsp.conf

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ cat ltsp.conf
    # Don't function as a DNS server:
    # port=0
    
    # Log lots of extra information about DHCP transactions.
    log-dhcp
    
    # Dnsmasq can also function as a TFTP server. You may uninstall
    # tftpd-hpa if you like, and uncomment the next line:
    # enable-tftp
    
    # Set the root directory for files available via FTP.
    tftp-root=/tftpboot
    
    # The boot filename, Server name, Server Ip Address
    dhcp-boot=undionly.kpxe,10.0.253.23,10.0.253.23
    
    # rootpath option, for NFS
    #dhcp-option=17,/images
    
    # kill multicast
    #dhcp-option=vendor:PXEClient,6,2b
    
    # Disable re-use of the DHCP servername and filename fields as extra
    # option space. That's to avoid confusing some old or broken DHCP clients.
    #
    # MKS 07-Oct-2016
    #dhcp-no-override
    
    # PXE menu.  The first part is the text displayed to the user.  The second is the timeout, in seconds.
    pxe-prompt="Press F8 for boot menu", 3
    
    # The known types are x86PC, PC98, IA64_EFI, Alpha, Arc_x86,
    # Intel_Lean_Client, IA32_EFI, BC_EFI, Xscale_EFI and X86-64_EFI
    # This option is first and will be the default if there is no input from the user.
    pxe-service=X86PC, "Boot from network", undionly, 10.0.253.23
    
    # A boot service type of 0 is special, and will abort the
    # net boot procedure and continue booting from local media.
    #pxe-service=X86PC, "Boot from local hard disk", 0
    
    # If an integer boot service type, rather than a basename is given, then the
    # PXE client will search for a suitable boot service for that type on the
    # network. This search may be done by multicast or broadcast, or direct to a
    # server if its IP address is provided.
    # pxe-service=x86PC, "Install windows from RIS server", 1
    
    # This range(s) is for the public interface, where dnsmasq functions
    # as a proxy DHCP server providing boot information but no IP leases.
    # Any ip in the subnet will do, so you may just put your server NIC ip here.
    # Since dnsmasq is not providing true DHCP services, you do not want it
    # handing out IP addresses.  Just put your servers IP address for the interface
    # that is connected to the network on which the FOG clients exist.
    # If this setting is incorrect, the dnsmasq may not start, rendering
    # your proxyDHCP ineffective.
    dhcp-range=10.0.253.23,proxy
    
    # This range(s) is for the private network on 2-NIC servers,
    # where dnsmasq functions as a normal DHCP server, providing IP leases.
    # dhcp-range=192.168.0.20,192.168.0.250,8h
    
    # For static client IPs, and only for the private subnets,
    # you may put entries like this:
    # dhcp-host=00:20:e0:3b:13:af,10.160.31.111,client111,infinite
    
    

    On this box, I did try to check iptables and got this:

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ iptables -L
    modprobe: ERROR: could not insert 'ip_tables': Operation not permitted
    iptables v1.6.0: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
    Perhaps iptables or your kernel needs to be upgraded.
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ 
    
    

    And for netstat:

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ netstat -an | grep 4011
    udp        0      0 0.0.0.0:4011            0.0.0.0:*                          
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ 
    

  • Moderator

    What I don’t understand is why does it not continue and contact the proxyDHCP port? You have already confirmed that the iptables or firewalld firewall is not blocking port access. We have done a side by side comparison of your dnsmasq config file to mine.

    The only variable we haven’t ruled out is a bad/faulty pxe client (firmware/bios). What device is your pxe boot client? Is the firmware up to date on it? Do you have a different brand/model of device you could use? Assuming your FOG server is set up correctly, we now have to start looking outside of the FOG server for the issue.

    You did mention that you moved this FOG server from an isolated network to this new network. What device was supplying the DHCP services for that isolated network?


  • Moderator

    This is your pxe boot process on the top and one from my FOG-Pi server on the bottom. The only difference is that my client is asking for a uefi file and yours is asking for the bios (legacy) file. You will see again that in yours the transactions stop at line 5. In the bottom image you will see that once the client gets the ACK back from the dhcp server, it turns around and connects to the proxyDHCP service on port 4011 to get the name of the file to download; then the client (at .16) requests the file via tftp from the FOG server. This is what is supposed to happen.

    0_1475800286160_PictCompare.png


  • Moderator

    I’ll restate again, it should be working.

    I want you to learn what I’m seeing (not that I really know what I’m looking at). If you install wireshark on your computer you can review these pcap files.

    Below is the communication as seen by your FOG server. In line 1 the client sends a discover packet (basically “hello, I’m here, I need network info”). In step 2, two devices reply with an offer (“here’s your network info”). In step 3 the client says “great, this is a list of additional stuff I need”. In step 4 your main dhcp server says “ok, here is the additional stuff you requested”; note that the dnsmasq box did not reply here because it couldn’t add anything to what it had already sent. Now here is where the process falls down. When the client gets the ACK back, it should contact the proxyDHCP service on port 4011 and request the name of the file to download, then reach out to the tftp server (listed in the next server field) and download the boot loader file (undionly.kpxe). That is what is supposed to happen. Now let me show you a side by side of what a proper exchange should look like.

    0_1475799856802_pcap_05-10-16a.png
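
    The steps described in this post can be written down as an ordered checklist, which makes it easier to say where a given capture stops. This is just an illustrative sketch: the stage labels are hypothetical shorthand for the steps above, not field names from Wireshark or dnsmasq, and you would fill in the `capture` list by reading the pcap yourself.

```python
# Stage labels are hypothetical shorthand for the steps described in this post.
EXPECTED_SEQUENCE = [
    "DISCOVER",       # client broadcasts that it needs network info
    "OFFER",          # main DHCP server and the dnsmasq proxy both answer
    "REQUEST",        # client asks for the additional information it needs
    "ACK",            # main DHCP server confirms
    "PROXY_REQUEST",  # client contacts the proxyDHCP service on UDP port 4011
    "TFTP_READ",      # client requests undionly.kpxe from the boot server
]


def first_missing_stage(observed):
    """Return the first expected stage that does not appear, in order, in `observed`."""
    position = 0
    for stage in EXPECTED_SEQUENCE:
        try:
            position = observed.index(stage, position) + 1
        except ValueError:
            return stage
    return None


# Stages visible in the failing capture as described: two offers, nothing after the ACK.
capture = ["DISCOVER", "OFFER", "OFFER", "REQUEST", "ACK"]
print(first_missing_stage(capture))  # PROXY_REQUEST
```

    For the failing capture the first missing stage is the request to port 4011, which is the point where the client gives up with PXE-E78.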



  • @Sebastian-Roth

    Yesterday, I did send a pcap from 10.0.253.24 (Ubu 14.04/Fog 3121).
    This is my original Fog setup. It is set up with a static IP.
    Here is the link to yesterday’s pcap from that box:
    http://s000.tinyupload.com/?file_id=29844646319354557181

    In addition, I created another pcap today for Tom Elliott from that box. Here is the link to that pcap:
    http://s000.tinyupload.com/?file_id=06163152893454480790

    I will now go check the settings on 10.0.253.23 (Ubu 16.xx / Fog 1.3 release candidate 11).
    I suspect you are correct that this one is not static, but I will check that now and post information about that box (and pcap).



  • @Tom-Elliott

    RE: WOL.
    Yes, I meant Wake On LAN by WLON.
    This used to work but doesn’t work now.
    Yesterday, I offered this information and the pcap as additional information that might help uncover something.

    RE: dhcp-no-override
    For DHCP-NO-OVERRIDE, I commented this out in ltsp.conf and restarted dnsmasq … on 10.0.253.24 (Ubuntu 14.04, Fog 3121) and it had no effect.

    I captured a pcap of this:
    http://s000.tinyupload.com/?file_id=06163152893454480790

    Should I put this ltsp.conf option back or leave commented out?



  • @george1421

    It’s a good question about isc-dhcp. I haven’t ever been using it as far as I know.
    But I checked this morning after booting the lab.

    For 10.0.253.24 (Ubuntu 14.04, Fog 3121), I get the following:

    compteach@iepcomlabsrv:~$ sudo service isc-dhcp-server status
    isc-dhcp-server stop/waiting
    compteach@iepcomlabsrv:~$
    compteach@iepcomlabsrv:~$
    compteach@iepcomlabsrv:~$ sudo service isc-dhcp-server stop
    stop: Unknown instance:
    compteach@iepcomlabsrv:~$
    compteach@iepcomlabsrv:~$ sudo service isc-dhcp-server status
    isc-dhcp-server stop/waiting
    compteach@iepcomlabsrv:~$
    
    

  • Moderator

    @Sebastian-Roth said in PXE-E78 Cannot locate boot server:

    @mkstreet you seem to still have the FOG server configured via DHCP!

    I was just thinking about this on the drive in this morning. This fog server was on an isolated network, so it was the dhcp server then. I was wondering whether the OP remembered to stop the isc dhcp server when he set up dnsmasq. That might cause this exact issue, since dnsmasq would not be able to bind to the udp ports if they were already in use. But dnsmasq should also complain about not being able to bind to the ports, so I pushed that idea to the back of the list of possible causes. As I stated before, this configuration should be working. Dnsmasq is not that hard to set up.
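
    A quick way to test the "ports already in use" idea is to try binding the UDP ports yourself. The helper below is a small sketch with a made-up name, using only the Python standard library. It would need to be run with dnsmasq stopped (otherwise dnsmasq's own sockets are what it finds), and with root for port 67.

```python
import errno
import socket


def udp_port_in_use(port, host="0.0.0.0"):
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES when binding a port below 1024 without root
    return False
```

    With dnsmasq stopped, `udp_port_in_use(67)` or `udp_port_in_use(4011)` returning True would suggest that some other service, such as a leftover DHCP server, is still holding the port.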


  • Developer

    @mkstreet Besides the issues we were already talking about, you still seem to have the FOG server configured via DHCP! This time I see dnsmasq answers from 10.0.253.23 in the PCAP files. This would cause problems even if everything else is fine. Make sure you set up your FOG server to have a static IP!!


  • Senior Developer

    @mkstreet What is WLON? I’m guessing it’s Wake on LAN. That only works at the “Layer 2” level. To prove it: if you have a system on the same switch as the fog server and send it a WOL packet, it should turn on (unless it’s one of the systems, like Apple’s, that only allow WOL when the machine is sleeping, not powered off).

    What if you commented out “dhcp-no-override”? If I’m understanding this particular option, per the man page:

    --dhcp-no-override
    (IPv4 only) Disable re-use of the DHCP servername and filename fields as extra option space. If it can, dnsmasq moves the boot server and filename information (from dhcp-boot) out of their dedicated fields into DHCP options. This make extra space available in the DHCP packet for options but can, rarely, confuse old or broken clients. This flag forces “simple and safe” behaviour to avoid problems in such a case.

    If I’m to understand this particular item, it prevents the configuration (in proxy mode?) from overriding the information that’s sent in the “main” packet.

