PXE-E78 Cannot locate boot server
-
@Tom-Elliott said in PXE-E78 Cannot locate boot server:
@mkstreet Comment the port=0 line of your ltsp.conf file and restart dnsmasq.
I talked with Tom over IM and he said the port=0 command makes the dnsmasq server become a DNS server and does exactly what we are seeing with the resolv.conf file. While this has no impact on the next-server being sent, it should fix the FOG server name resolution.
-
@george1421 More accurately, commenting out port=0 allows dnsmasq to pass the originating DNS information on to the new host. Leaving port=0 enabled essentially turns off DNS information. If you’re planning to leave port=0 enabled, then you’ll likely need to change the next-server to point at an IP address rather than a hostname.
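Tom’s rule of thumb can be checked mechanically against a config file: if `port=0` is active, the server address in `dhcp-boot` should be a literal IP. Below is a minimal Python sketch, assuming one directive per line and the `dhcp-boot=<filename>,<servername>,<server address>` layout used in this thread. `next_server_warning` and the other helpers are hypothetical names, not part of dnsmasq or FOG.

```python
import ipaddress


def active_directives(conf_text):
    """Return (key, value) pairs for the uncommented lines of a dnsmasq config."""
    pairs = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are not directives
        key, _, value = line.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


def is_ip_literal(text):
    """True if text is a literal IPv4/IPv6 address rather than a hostname."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def next_server_warning(conf_text):
    """Return a warning if port=0 is active and dhcp-boot names its server by hostname."""
    directives = active_directives(conf_text)
    if ("port", "0") not in directives:
        return None  # port=0 is commented out, so this particular concern does not apply
    for key, value in directives:
        if key != "dhcp-boot":
            continue
        fields = [f.strip() for f in value.split(",")]
        # Drop optional tag prefixes such as "tag:foo" or the older "net:foo".
        while fields and fields[0].startswith(("tag:", "net:")):
            fields.pop(0)
        # Remaining layout assumed: <filename>,<servername>,<server address>
        if len(fields) >= 3 and not is_ip_literal(fields[2]):
            return "port=0 is active but the dhcp-boot server address is a hostname: " + fields[2]
    return None
```

To use it, read the file (for example `/etc/dnsmasq.d/ltsp.conf`) and pass its text to `next_server_warning`. A `None` result means this particular problem was not found.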
-
OK. The problem with DNS and resolv.conf seems to be fixed now. I am able to do apt-get updates and ping external places such as google.com. Oddly, resolv.conf just shows the loopback:
compteach@iepcomlabsrv:/etc$ cat resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.0.1
compteach@iepcomlabsrv:/etc$
compteach@iepcomlabsrv:/etc$ ping google.com
PING google.com (110.164.6.251) 56(84) bytes of data.
64 bytes from mx-ll-110.164.6-251.static.3bb.co.th (110.164.6.251): icmp_seq=1 ttl=55 time=2.59 ms
64 bytes from mx-ll-110.164.6-251.static.3bb.co.th (110.164.6.251): icmp_seq=2 ttl=55 time=2.58 ms
64 bytes from mx-ll-110.164.6-251.static.3bb.co.th (110.164.6.251): icmp_seq=3 ttl=55 time=2.75 ms
^C
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 2.588/2.644/2.751/0.095 ms
I made the changes to the ltsp.conf
a) comment out the port=0
b) change the dhcp-boot
c) change pxe-service
However, there is no change in the behavior.
I captured a new pcap and posted it here:
http://s000.tinyupload.com/?file_id=29844646319354557181
And for completeness, here is my ltsp.conf:
compteach@iepcomlabsrv:/etc/dnsmasq.d$ cat ltsp.conf
# Don't function as a DNS server:
# MKS 06-Oct-2016
#port=0
# Log lots of extra information about DHCP transactions.
log-dhcp
# Dnsmasq can also function as a TFTP server. You may uninstall
# tftpd-hpa if you like, and uncomment the next line:
# enable-tftp
# Set the root directory for files available via FTP.
tftp-root=/tftpboot
# The boot filename, Server name, Server Ip Address
dhcp-boot=undionly.kpxe,10.0.253.24,10.0.253.24
# rootpath option, for NFS
#dhcp-option=17,/images
# kill multicast
#dhcp-option=vendor:PXEClient,6,2b
# Disable re-use of the DHCP servername and filename fields as extra
# option space. That's to avoid confusing some old or broken DHCP clients.
dhcp-no-override
# PXE menu. The first part is the text displayed to the user. The second is the timeout, in seconds.
pxe-prompt="Press F8 for boot menu", 3
# The known types are x86PC, PC98, IA64_EFI, Alpha, Arc_x86,
# Intel_Lean_Client, IA32_EFI, BC_EFI, Xscale_EFI and X86-64_EFI
# This option is first and will be the default if there is no input from the user.
pxe-service=X86PC, "Boot from network", undionly, 10.0.253.24
# A boot service type of 0 is special, and will abort the
# net boot procedure and continue booting from local media.
#pxe-service=X86PC, "Boot from local hard disk", 0
# If an integer boot service type, rather than a basename is given, then the
# PXE client will search for a suitable boot service for that type on the
# network. This search may be done by multicast or broadcast, or direct to a
# server if its IP address is provided.
# pxe-service=x86PC, "Install windows from RIS server", 1
# This range(s) is for the public interface, where dnsmasq functions
# as a proxy DHCP server providing boot information but no IP leases.
# Any ip in the subnet will do, so you may just put your server NIC ip here.
# Since dnsmasq is not providing true DHCP services, you do not want it
# handing out IP addresses. Just put your servers IP address for the interface
# that is connected to the network on which the FOG clients exist.
# If this setting is incorrect, the dnsmasq may not start, rendering
# your proxyDHCP ineffective.
dhcp-range=10.0.253.24,proxy
# This range(s) is for the private network on 2-NIC servers,
# where dnsmasq functions as a normal DHCP server, providing IP leases.
# dhcp-range=192.168.0.20,192.168.0.250,8h
# For static client IPs, and only for the private subnets,
# you may put entries like this:
# dhcp-host=00:20:e0:3b:13:af,10.160.31.111,client111,infinite
#dhcp-host=f8:0f:41:a0:04:75,net:allow
#dhcp-ignore=#allow
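In this config the same server address appears in three directives (`dhcp-boot`, `pxe-service` and `dhcp-range`). A quick consistency check can therefore catch a stale IP, for example after switching between the .24 and .23 boxes. Below is a minimal Python sketch. It assumes one directive per line and proxy mode, where `dhcp-range` holds a single address. `server_addresses` and `check_single_server` are hypothetical helper names.

```python
import ipaddress


def server_addresses(conf_text):
    """Collect the IPv4 addresses named by dhcp-boot, pxe-service and dhcp-range."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        key, _, value = line.partition("=")
        key = key.strip()
        if key not in ("dhcp-boot", "pxe-service", "dhcp-range"):
            continue
        for field in value.split(","):
            field = field.strip().strip('"')
            try:
                ipaddress.IPv4Address(field)
            except ValueError:
                continue  # not an address: file name, menu text, "proxy", ...
            found.setdefault(key, set()).add(field)
    return found


def check_single_server(conf_text, expected_ip):
    """Return the directive names that mention any address other than expected_ip."""
    mismatched = []
    for key, addrs in sorted(server_addresses(conf_text).items()):
        if addrs != {expected_ip}:
            mismatched.append(key)
    return mismatched
```

An empty list from `check_single_server(text, "10.0.253.24")` means every server address in the file points at that one box.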
-
On another box, I installed Ubuntu 16.04 LTS and Fog 1.3.0-RC-11.
I shut down the dnsmasq on the 10.0.253.24 box and started dnsmasq on the new box with an ltsp.conf etc.
I get the same behavior.
The PC boots and finds the internal DHCP (171.xxxx) and gets to the new box whose IP is 10.0.253.23.
It prompts me for F8 to boot from the network. Then I get:
UD 10.0.253.23
Which times out with the PXE-E78 error.
I captured a new pcap for this in case this is helpful.
This pcap is at:
http://s000.tinyupload.com/?file_id=97921552308199994673
-
To experiment, I used the new Fog 1.3 install to contact that same PC with a hardware inventory request from Fog.
I noticed that…
- The WLON did not happen. In the past, WLON worked.
- When I manually turned that PC on, I got the same results: it found the Fog server and prompted me to press F8. When I did, I got the same PXE-E78 error, and in Fog the active task showed the hardware inventory as still in progress.
I thought that this would be a way to attempt communication that did not involve TFTP boot.
I created a pcap file using the same command you gave me before
(sudo tcpdump -w output.pcap port 67 or port 68 or port 69 or port 4011)
but I don’t know if the port filters on this are suitable to capture the info needed for the WLON/hardware inventory tasks…
This pcap file is at:
http://s000.tinyupload.com/?file_id=61578123182931059079
-
@mkstreet What is WLON? I’m guessing it’s Wake on LAN? That only works at the “Layer 2” level. To prove it, take a system on the same switch as the FOG server and send it a WOL packet: it should turn on (unless it’s one of those systems, like Apple machines, that only allow WOL when the machine is sleeping, not powered off).
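For reference, a Wake-on-LAN “magic packet” is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, normally sent as a broadcast. That is why it stays inside the local Layer 2 segment unless the network forwards directed broadcasts. Below is a minimal Python sketch. UDP port 9 is a common convention and an assumption here; FOG’s own WOL sender may use a different port or method. The MAC in the test is simply the one from the commented `dhcp-host` line in the ltsp.conf posted earlier. Note that the tcpdump filter used earlier (ports 67, 68, 69 and 4011) would not capture such a packet.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError("MAC address must have 12 hex digits: " + mac)
    mac_bytes = bytes.fromhex(digits)  # raises ValueError on non-hex characters
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast (port 9 is an assumed convention)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

Running `send_wol(...)` from a machine on the same switch as the target is the Layer 2 test described above. Running it from another subnet shows whether the network forwards the broadcast.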
What if you commented out “dhcp-no-override”? Per the man page:
--dhcp-no-override
(IPv4 only) Disable re-use of the DHCP servername and filename fields as extra option space. If it can, dnsmasq moves the boot server and filename information (from dhcp-boot) out of their dedicated fields into DHCP options. This make extra space available in the DHCP packet for options but can, rarely, confuse old or broken clients. This flag forces “simple and safe” behaviour to avoid problems in such a case.
If I understand this particular item correctly, it prevents the configuration (in proxy mode?) from overriding the information that’s sent in the “main” packet.
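To see what dhcp-no-override changes in practice, you can compare the fixed BOOTP fields of a reply (siaddr, i.e. next-server, plus sname and file) with options 66 (TFTP server name) and 67 (boot file name). Below is a minimal Python sketch that decodes these from the raw UDP payload of a DHCP packet, using the field offsets from RFC 2131. `parse_boot_fields` is a hypothetical helper. It does not decode option 52 (option overload), so on an overloaded packet the sname/file fields will hold option bytes rather than names.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of DHCP options at offset 236


def parse_boot_fields(payload):
    """Extract next-server and boot file info from a BOOTP/DHCP UDP payload."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    info = {
        "siaddr": socket.inet_ntoa(payload[20:24]),  # the "next-server" field
        "sname": payload[44:108].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "opt66": None,  # TFTP server name option
        "opt67": None,  # boot file name option
    }
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 0:  # pad option has no length byte
            i += 1
            continue
        if code == 255 or i + 1 >= len(payload):  # end option or truncated packet
            break
        length = payload[i + 1]
        data = payload[i + 2:i + 2 + length]
        if code == 66:
            info["opt66"] = data.decode("ascii", "replace")
        elif code == 67:
            info["opt67"] = data.decode("ascii", "replace").rstrip("\x00")
        i += 2 + length
    return info
```

With dhcp-no-override set, you would expect the boot file name to appear in `file` rather than only in option 67.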
-
@mkstreet Besides the issues we were already talking about, you seem to still have the FOG server configured via DHCP! This time I see dnsmasq answers from 10.0.253.23 in the PCAP files. This would cause problems even if everything else is fine. Make sure you set up your FOG server with a static IP!!
-
@Sebastian-Roth said in PXE-E78 Cannot locate boot server:
@mkstreet you seem to still have the FOG server configured via DHCP!
I was just thinking about this on the drive in this morning. This FOG server was on an isolated network, so it was the DHCP server then. I was wondering if the OP remembered to stop the ISC DHCP server when he set up dnsmasq. That could cause this exact issue, since dnsmasq would not be able to bind to the UDP ports if they were already in use. But I also considered that dnsmasq should complain about not being able to bind to the ports, so I pushed that idea to the back of the list of possible causes. As I stated before, this configuration should be working. Dnsmasq is not that hard to set up.
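George’s theory can be tested directly. If another DHCP server (such as isc-dhcp-server) already holds UDP port 67, a second bind to that port fails with EADDRINUSE. Below is a minimal Python sketch, where `udp_port_in_use` is a hypothetical helper. Stop dnsmasq before running it; otherwise it will simply find dnsmasq’s own socket. Run it with sudo, because ports below 1024 normally need root. It only tells you that *something* holds the port. `sudo netstat -ulpn` or `sudo ss -ulpn` shows which process.

```python
import errno
import socket


def udp_port_in_use(port, host="0.0.0.0"):
    """Return True if another socket already holds this UDP port on host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))  # no SO_REUSEADDR, so an existing holder blocks us
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES when binding a port below 1024 without root
    finally:
        sock.close()
    return False
```

For this thread, the interesting calls are `udp_port_in_use(67)` and `udp_port_in_use(4011)` with dnsmasq stopped.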
-
It’s a good question about isc-dhcp. As far as I know, I have never used it.
But I checked this morning after booting the lab. For 10.0.253.24 (Ubuntu 14.04, Fog 3121), I get the following:
compteach@iepcomlabsrv:~$ sudo service isc-dhcp-server status
isc-dhcp-server stop/waiting
compteach@iepcomlabsrv:~$
compteach@iepcomlabsrv:~$
compteach@iepcomlabsrv:~$ sudo service isc-dhcp-server stop
stop: Unknown instance:
compteach@iepcomlabsrv:~$
compteach@iepcomlabsrv:~$ sudo service isc-dhcp-server status
isc-dhcp-server stop/waiting
compteach@iepcomlabsrv:~$
-
RE: WOL.
Yes, I meant Wake On LAN by WLON.
This used to work but doesn’t work now.
Yesterday, I offered this information and the pcap as additional information that might help uncover something.
RE: dhcp-no-override
For DHCP-NO-OVERRIDE, I commented this out in ltsp.conf and restarted dnsmasq … on 10.0.253.24 (Ubuntu 14.04, Fog 3121) and it had no effect.
I captured a pcap of this:
http://s000.tinyupload.com/?file_id=06163152893454480790
Should I put this ltsp.conf option back or leave it commented out?
-
Yesterday, I did send a pcap from 10.0.253.24 (Ubu 14.04/Fog 3121).
This is my original Fog setup. It is setup with static IP.
Here is the link to yesterday’s pcap from that box:
http://s000.tinyupload.com/?file_id=29844646319354557181
In addition, I created another pcap today for Tom Elliott from that box. Here is the link to that pcap:
http://s000.tinyupload.com/?file_id=06163152893454480790
I will now go check the settings on 10.0.253.23 (Ubu 16.xx / Fog 1.3 release candidate 11).
I suspect you are correct that this one is not static, but I will check that now and post information about that box (and pcap).
-
I’ll say it again: it should be working.
I want you to learn what I’m seeing (not that I really know what I’m looking at). If you install wireshark on your computer you can review these pcap files.
Below is the communication that is going on as viewed by your FOG server. In line 1 you see the client send a discover packet (basically “hello, I’m here, I need network info”). In step 2 you see two devices reply with an offer (“here’s your network info”). In step 3 you see the client again say “great, this is a list of additional stuff I need”. In step 4 your main DHCP server says “OK, here is the additional stuff you requested”; note that the dnsmasq box did not reply here because it couldn’t add anything to what was already sent. Now here is where the process falls down. When the client gets the ACK back, it should contact the proxyDHCP on port 4011 and request the file name to download, then reach out to the TFTP server (listed in the next-server field) and download the boot loader file (undionly.kpxe). That is what is supposed to happen. Now let me show you a side by side of what a proper exchange should look like.
-
Your PXE boot process is at the top and one from my FOG-Pi server is at the bottom. The only difference is that my client is asking for a UEFI file and yours is asking for the BIOS (legacy) file. You will again see that in yours the transactions stop at line 5. In the bottom image you will see that once the client gets the ACK back from the DHCP server, it turns around and connects to the proxyDHCP on port 4011 and gets the file name to download, then the client (at .16) requests the file via TFTP from the FOG server. This is what is supposed to happen.
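If Wireshark isn’t at hand, you can check the stopping point described above in a tcpdump capture using only the Python standard library. Below is a minimal sketch. It assumes a classic libpcap file (what `tcpdump -w` writes by default) captured on an Ethernet interface without VLAN tags; pcapng files and DHCP option parsing are not handled. `udp_flows` and `where_it_stops` are hypothetical names.

```python
import socket
import struct


def udp_flows(pcap_bytes):
    """Yield (src_ip, src_port, dst_ip, dst_port) for IPv4/UDP frames in a classic pcap."""
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap_bytes):
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14 or frame[12:14] != b"\x08\x00":
            continue  # not IPv4
        ip = frame[14:]
        if len(ip) < 20:
            continue
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 17 or len(ip) < ihl + 8:
            continue  # not UDP (protocol 17) or truncated
        src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
        yield (socket.inet_ntoa(ip[12:16]), src_port, socket.inet_ntoa(ip[16:20]), dst_port)


def where_it_stops(pcap_bytes):
    """Report whether the proxyDHCP (4011) and TFTP (69) stages appear in the capture."""
    seen_4011 = seen_tftp = False
    for _src, sport, _dst, dport in udp_flows(pcap_bytes):
        if 4011 in (sport, dport):
            seen_4011 = True
        if dport == 69:
            seen_tftp = True
    if not seen_4011:
        return "stops before proxyDHCP (no traffic on port 4011)"
    if not seen_tftp:
        return "proxyDHCP seen, but no TFTP request (port 69)"
    return "proxyDHCP and TFTP request both seen"
```

Reading one of the captures with `open(path, "rb").read()` and passing the bytes to `where_it_stops` should, per the analysis above, report that the exchange stops before port 4011.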
-
What I don’t understand is why does it not continue and contact the proxyDHCP port? You have already confirmed that the iptables or firewalld firewall is not blocking port access. We have done a side by side comparison of your dnsmasq config file to mine.
The only variable we haven’t ruled out is a bad/faulty PXE client (firmware/BIOS). What device is your PXE boot client? Is the firmware up to date on it? Do you have a different brand/model of device you could use? Assuming your FOG server is set up correctly now, we have to start looking outside of the FOG server for the issue.
You did mention that you moved this FOG server from an isolated network to this new network. What device was supplying the DHCP services for that isolated network?
-
(Reply Part 2)
I’ve disabled 10.0.253.24 (Ubu 14.04, Fog 3121) and started up 10.0.253.23 (Ubu 16.04, Fog 1.3)
You are right that this box, which I just set up yesterday, hadn’t been made static. I’ve changed that now. After making the changes to the interfaces file, I rebooted the machine.
Here is the interfaces file, before and after:
before
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ cat orig_interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$
after
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ cat interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback
auto enp1s0
iface enp1s0 inet static
address 10.0.253.23
netmask 255.255.255.0
gateway 10.0.253.1
dns-nameservers 110.164.252.222 8.8.8.8
ifconfig output:
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ ifconfig
enp1s0    Link encap:Ethernet HWaddr f8:0f:41:a0:04:59
          inet addr:10.0.253.23 Bcast:10.0.253.255 Mask:255.255.255.0
          inet6 addr: fe80::fa0f:41ff:fea0:459/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:5100 errors:0 dropped:8 overruns:0 frame:0
          TX packets:2983 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5067659 (5.0 MB) TX bytes:433865 (433.8 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:760 errors:0 dropped:0 overruns:0 frame:0
          TX packets:760 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:117502 (117.5 KB) TX bytes:117502 (117.5 KB)

ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$
status for isc-dhcp
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ sudo service isc-dhcp-server status
● isc-dhcp-server.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$
I’ve just tried to boot the host box and got the same PXE-E78 error.
Here is a pcap from that:
http://s000.tinyupload.com/?file_id=00695804426205933233
I’ve commented out dhcp-no-override in the ltsp.conf and restarted dnsmasq.
I get the PXE-E78 error.
Here is the pcap for this:
http://s000.tinyupload.com/?file_id=03577227175846412978
Here is my ltsp.conf:
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ cat ltsp.conf
# Don't function as a DNS server:
# port=0
# Log lots of extra information about DHCP transactions.
log-dhcp
# Dnsmasq can also function as a TFTP server. You may uninstall
# tftpd-hpa if you like, and uncomment the next line:
# enable-tftp
# Set the root directory for files available via FTP.
tftp-root=/tftpboot
# The boot filename, Server name, Server Ip Address
dhcp-boot=undionly.kpxe,10.0.253.23,10.0.253.23
# rootpath option, for NFS
#dhcp-option=17,/images
# kill multicast
#dhcp-option=vendor:PXEClient,6,2b
# Disable re-use of the DHCP servername and filename fields as extra
# option space. That's to avoid confusing some old or broken DHCP clients.
#
# MKS 07-Oct-2016
#dhcp-no-override
# PXE menu. The first part is the text displayed to the user. The second is the timeout, in seconds.
pxe-prompt="Press F8 for boot menu", 3
# The known types are x86PC, PC98, IA64_EFI, Alpha, Arc_x86,
# Intel_Lean_Client, IA32_EFI, BC_EFI, Xscale_EFI and X86-64_EFI
# This option is first and will be the default if there is no input from the user.
pxe-service=X86PC, "Boot from network", undionly, 10.0.253.23
# A boot service type of 0 is special, and will abort the
# net boot procedure and continue booting from local media.
#pxe-service=X86PC, "Boot from local hard disk", 0
# If an integer boot service type, rather than a basename is given, then the
# PXE client will search for a suitable boot service for that type on the
# network. This search may be done by multicast or broadcast, or direct to a
# server if its IP address is provided.
# pxe-service=x86PC, "Install windows from RIS server", 1
# This range(s) is for the public interface, where dnsmasq functions
# as a proxy DHCP server providing boot information but no IP leases.
# Any ip in the subnet will do, so you may just put your server NIC ip here.
# Since dnsmasq is not providing true DHCP services, you do not want it
# handing out IP addresses. Just put your servers IP address for the interface
# that is connected to the network on which the FOG clients exist.
# If this setting is incorrect, the dnsmasq may not start, rendering
# your proxyDHCP ineffective.
dhcp-range=10.0.253.23,proxy
# This range(s) is for the private network on 2-NIC servers,
# where dnsmasq functions as a normal DHCP server, providing IP leases.
# dhcp-range=192.168.0.20,192.168.0.250,8h
# For static client IPs, and only for the private subnets,
# you may put entries like this:
# dhcp-host=00:20:e0:3b:13:af,10.160.31.111,client111,infinite
On this box, I did try to check iptables and got this:
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ iptables -L
modprobe: ERROR: could not insert 'ip_tables': Operation not permitted
iptables v1.6.0: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$
And for netstat:
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ netstat -an | grep 4011
udp        0      0 0.0.0.0:4011            0.0.0.0:*
ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$
-
@mkstreet said in PXE-E78 Cannot locate boot server:
10.0.253.23
I’m not seeing anything different from the 10.0.253.23 system. The conversation stops after the ACK from the 172.1.1.1 dhcp server.
-
-
What device is your pxe boot client?
** Inside my lab, we have about 36 Lenovo ThinkCentre model M72z.
We had been using and loading these with Fog for about 2 years, up until recently.
-
Do you have a different brand/model of device you could use?
** Yes. I have “regular” desktop PC’s which are outside the lab.
I have just tried to network boot one of those. The case says it’s an ACER Veriton.
I see that it connects to 10.0.253.23 and then gets the same PXE-E78 error as the boxes inside the lab. I have attached a pcap of this here:
http://s000.tinyupload.com/?file_id=48129942273982229370
-
You did mention that you moved this FOG server from an isolated network to this new network. What device was supplying the DHCP services for that isolated network?
** In brief, my department (lab and offices) had its own Internet connection, separate from the rest of the campus. Although this seems very odd, the ISP (essentially the state enterprise telecom entity) provided DHCP services from their central office about 2 km away. I know it’s strange, but that is what it was. This was the network configuration when we initially began using Fog in about March 2014. Once we had Fog set up and had gotten some experience with it, we had few or no network issues. We knew even less about that network/DHCP setup. From March 2014 to about August 2015, we used Fog very actively and loaded our entire lab at least ten times, and we also had separate images for our (non-lab) office desktop PCs that we managed with Fog. From August 2015 until now, we have not used Fog much, as we didn’t need to update things. About a month ago, my department gave up our separate connection to the state telecom provider and joined the campus-wide LAN that the rest of the campus had just implemented (with a different provider than the state telecom we had been using). Now we want to update and reload our lab. That was when we discovered that the PCs, which we had come to treat as just-turn-on-and-take-for-granted, could no longer find the Fog server. This is what led us down the path of trying dnsmasq, and voila, here we are today.
-
-
@george1421 said:
What I don’t understand is why does it not continue and contact the proxyDHCP port? You have already confirmed that the iptables or firewalld firewall is not blocking port access. We have done a side by side comparison of your dnsmasq config file to mine.
Well, I thought I had explained that part. From what I understand, the issue is that the normal DHCP server (10.0.253.1) sends the next-server option pointing to 172.16.1.1 (see the PCAP file). This is most probably confusing the client. Would you mind trying this out, George? Add a next-server option (pointing to a totally unrelated server) to your DHCP server and see what it does to your setup. It could also be that the clients still boot properly in your case, as it all depends on how the client interprets the information.
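Sebastian’s point is that different replies carry different next-server (siaddr) values. Once those values have been read out of the capture (in Wireshark, the “Next server IP address” field of each Offer/ACK), comparing them is straightforward. Below is a minimal Python sketch. The (responder, next-server) pairs are assumed to be collected by hand or by a separate parser, and `conflicting_next_servers` is a hypothetical helper. The addresses in the test are the ones mentioned in this thread.

```python
from collections import defaultdict


def conflicting_next_servers(replies):
    """Group responders by the next-server they sent.

    replies: iterable of (responder_ip, next_server_ip) from DHCP/proxyDHCP answers.
    Returns {next_server: [responders]} only if more than one distinct
    next-server was offered; otherwise returns an empty dict.
    """
    by_next = defaultdict(set)
    for responder, next_server in replies:
        if next_server and next_server != "0.0.0.0":  # 0.0.0.0 means "not set"
            by_next[next_server].add(responder)
    if len(by_next) <= 1:
        return {}
    return {ns: sorted(resp) for ns, resp in by_next.items()}
```

A non-empty result is the situation described above: the client is told about two different boot servers, and what happens next depends on its PXE firmware.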
-
@Sebastian-Roth In a way, that is what is already happening in my network. For some silly reason my internet router sends itself as the next server (as seen in the pcap in my dnsmasq UEFI thread). But in that situation the PXE booting client understands this, complains about getting two next-server replies, and does the right thing anyway. That is why I suggested trying a different client to test this, but his second client did the same thing.
@mkstreet I think we are at the point of diminishing returns. I don’t think we (from the FOG Project) are going to be able to help you solve this remotely. You NEED to have a network engineer involved to get the problem resolved. There is something going on in your environment that needs expert hands and tools for your network. The components are set up correctly for FOG; there is some external force that is keeping this from working correctly.
-
OK. Two questions.
-
Instead of using DNSMASQ, if I attempt to have the network changed (i.e. 172.16.1.1 and/or 10.0.253.1), what exactly needs to be changed? I will need to explain what I want in an almost step-by-step fashion to non-native English speakers who control the network.
-
Is there an alternative where I disconnect from the network just while I am loading the lab? Then, I am guessing, my Fog Server will need to have isc-dhcp running (instead of dnsmasq)…?
-