PXE-E78 Cannot locate boot server
-
Yesterday I sent a pcap from 10.0.253.24 (Ubu 14.04/Fog 3121).
This is my original Fog setup. It is set up with a static IP.
Here is the link to yesterday’s pcap from that box:
http://s000.tinyupload.com/?file_id=29844646319354557181
In addition, I created another pcap today for Tom Elliott from that box. Here is the link to that pcap:
http://s000.tinyupload.com/?file_id=06163152893454480790
I will now go check the settings on 10.0.253.23 (Ubu 16.xx / Fog 1.3 release candidate 11).
I suspect you are correct that this one is not static, but I will check that now and post information about that box (and pcap).
-
I’ll say it again: it should be working.
I want you to learn what I’m seeing (not that I really know what I’m looking at). If you install Wireshark on your computer you can review these pcap files.
Below is the communication as seen by your FOG server. In step 1 the client sends a discover packet (basically “hello, I’m here, I need network info”). In step 2, two devices reply with an offer (“here’s your network info”). In step 3 the client answers “great, here is a list of additional stuff I need”. In step 4 your main dhcp server says “ok, here is the additional stuff you requested”. Note that the dnsmasq box did not reply here because it couldn’t add anything to what it had already sent.
Now here is where the process falls down. When the client gets the ACK back, it should contact the proxyDHCP service on port 4011 and request the name of the file to download, then reach out to the tftp server (listed in the next-server field) and download the boot loader file (undionly.kpxe). That is what is supposed to happen. Now let me show you a side by side of what a proper exchange should look like.
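To make the "where does it stop" check concrete, here is a minimal Python sketch. It is only an illustration: the step names are labels I made up for this post (they are not Wireshark field names), and the capture summary is transcribed by hand from the description above rather than parsed from the pcap.

```python
# Expected proxyDHCP PXE boot sequence, as described above.
# The step names are illustrative labels, not Wireshark field names.
EXPECTED_STEPS = [
    "DHCP Discover",       # client: "I'm here, I need network info"
    "DHCP Offer",          # main DHCP server and dnsmasq both answer
    "DHCP Request",        # client asks for the additional details
    "DHCP ACK",            # main DHCP server confirms
    "proxyDHCP Request",   # client -> dnsmasq on UDP port 4011
    "proxyDHCP ACK",       # dnsmasq returns the boot file name
    "TFTP Read Request",   # client downloads undionly.kpxe
]


def first_missing_step(observed):
    """Return the first expected step that never shows up, in order, in `observed`."""
    position = 0
    for step in EXPECTED_STEPS:
        try:
            position = observed.index(step, position) + 1
        except ValueError:
            return step
    return None


# Hand-transcribed summary of the failing capture (two offers, then it stops).
failing_capture = [
    "DHCP Discover",
    "DHCP Offer",
    "DHCP Offer",
    "DHCP Request",
    "DHCP ACK",
]

print(first_missing_step(failing_capture))  # proxyDHCP Request
```

For the failing capture this points at the proxyDHCP request on port 4011 as the first thing that never happens.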
-
This is your pxe boot process on the top and one from my FOG-Pi server on the bottom. The only difference is that my client is asking for a uefi file and yours is asking for the bios (legacy) file. You will see again that in yours the transactions stop at line 5. In the bottom image you will see that once the client gets the ACK back from the dhcp server, it turns around and connects to the proxyDHCP service on port 4011 to get the name of the file to fetch. Then the client (at .16) requests the file via tftp from the FOG server. This is what is supposed to happen.
-
What I don’t understand is why does it not continue and contact the proxyDHCP port? You have already confirmed that the iptables or firewalld firewall is not blocking port access. We have done a side by side comparison of your dnsmasq config file to mine.
The only variable we haven’t ruled out is a bad/faulty pxe client (firmware/bios). What device is your pxe boot client? Is the firmware up to date on it? Do you have a different brand/model of device you could use? Assuming your FOG server is set up correctly now, we have to start looking outside of the FOG server for the issue.
You did mention that you moved this FOG server from an isolated network to this new network. What device was supplying the DHCP services for that isolated network?
-
(Reply Part 2)
I’ve disabled 10.0.253.24 (Ubu 14.04, Fog 3121) and started up 10.0.253.23 (Ubu 16.04, Fog 1.3).
You are right that this box, which I just set up yesterday, hadn’t been made static. I’ve changed that now. After making the changes to the interfaces file, I rebooted the machine.
Here is the interfaces file, before and after:
before
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ cat orig_interfaces
    # interfaces(5) file used by ifup(8) and ifdown(8)
    auto lo
    iface lo inet loopback
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$
after
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ cat interfaces
    # interfaces(5) file used by ifup(8) and ifdown(8)
    auto lo
    iface lo inet loopback

    auto enp1s0
    iface enp1s0 inet static
        address 10.0.253.23
        netmask 255.255.255.0
        gateway 10.0.253.1
        dns-nameservers 110.164.252.222 8.8.8.8
ifconfig output:
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ ifconfig
    enp1s0    Link encap:Ethernet  HWaddr f8:0f:41:a0:04:59
              inet addr:10.0.253.23  Bcast:10.0.253.255  Mask:255.255.255.0
              inet6 addr: fe80::fa0f:41ff:fea0:459/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:5100 errors:0 dropped:8 overruns:0 frame:0
              TX packets:2983 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:5067659 (5.0 MB)  TX bytes:433865 (433.8 KB)

    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:760 errors:0 dropped:0 overruns:0 frame:0
              TX packets:760 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1
              RX bytes:117502 (117.5 KB)  TX bytes:117502 (117.5 KB)

    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$
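As a quick sanity check of the static settings, the address, netmask and gateway from the interfaces file can be run through Python's standard `ipaddress` module. This is just a sketch that uses the values shown above; it confirms they are consistent with each other, not that the network outside the box agrees with them.

```python
import ipaddress

# Values copied from the interfaces file above.
address = "10.0.253.23"
netmask = "255.255.255.0"
gateway = "10.0.253.1"

iface = ipaddress.ip_interface(f"{address}/{netmask}")

print(iface.network)                                   # 10.0.253.0/24
print(ipaddress.ip_address(gateway) in iface.network)  # True
print(iface.network.broadcast_address)                 # 10.0.253.255
```

The broadcast address matches the Bcast value in the ifconfig output, and the gateway sits inside the same network.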
status for isc-dhcp
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$ sudo service isc-dhcp-server status
    ● isc-dhcp-server.service
       Loaded: not-found (Reason: No such file or directory)
       Active: inactive (dead)
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/network$
I’ve just tried to boot the host box and got the same PXE-E78 error.
Here is a pcap from that:
http://s000.tinyupload.com/?file_id=00695804426205933233
I’ve commented out dhcp-no-override in ltsp.conf and restarted dnsmasq.
I still get the PXE-E78 error.
Here is the pcap for this:
http://s000.tinyupload.com/?file_id=03577227175846412978
Here is my ltsp.conf:
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ cat ltsp.conf
    # Don't function as a DNS server:
    # port=0

    # Log lots of extra information about DHCP transactions.
    log-dhcp

    # Dnsmasq can also function as a TFTP server. You may uninstall
    # tftpd-hpa if you like, and uncomment the next line:
    # enable-tftp

    # Set the root directory for files available via FTP.
    tftp-root=/tftpboot

    # The boot filename, Server name, Server Ip Address
    dhcp-boot=undionly.kpxe,10.0.253.23,10.0.253.23

    # rootpath option, for NFS
    #dhcp-option=17,/images

    # kill multicast
    #dhcp-option=vendor:PXEClient,6,2b

    # Disable re-use of the DHCP servername and filename fields as extra
    # option space. That's to avoid confusing some old or broken DHCP clients.
    #
    # MKS 07-Oct-2016
    #dhcp-no-override

    # PXE menu. The first part is the text displayed to the user. The second is the timeout, in seconds.
    pxe-prompt="Press F8 for boot menu", 3

    # The known types are x86PC, PC98, IA64_EFI, Alpha, Arc_x86,
    # Intel_Lean_Client, IA32_EFI, BC_EFI, Xscale_EFI and X86-64_EFI
    # This option is first and will be the default if there is no input from the user.
    pxe-service=X86PC, "Boot from network", undionly, 10.0.253.23

    # A boot service type of 0 is special, and will abort the
    # net boot procedure and continue booting from local media.
    #pxe-service=X86PC, "Boot from local hard disk", 0

    # If an integer boot service type, rather than a basename is given, then the
    # PXE client will search for a suitable boot service for that type on the
    # network. This search may be done by multicast or broadcast, or direct to a
    # server if its IP address is provided.
    # pxe-service=x86PC, "Install windows from RIS server", 1

    # This range(s) is for the public interface, where dnsmasq functions
    # as a proxy DHCP server providing boot information but no IP leases.
    # Any ip in the subnet will do, so you may just put your server NIC ip here.
    # Since dnsmasq is not providing true DHCP services, you do not want it
    # handing out IP addresses. Just put your servers IP address for the interface
    # that is connected to the network on which the FOG clients exist.
    # If this setting is incorrect, the dnsmasq may not start, rendering
    # your proxyDHCP ineffective.
    dhcp-range=10.0.253.23,proxy

    # This range(s) is for the private network on 2-NIC servers,
    # where dnsmasq functions as a normal DHCP server, providing IP leases.
    # dhcp-range=192.168.0.20,192.168.0.250,8h

    # For static client IPs, and only for the private subnets,
    # you may put entries like this:
    # dhcp-host=00:20:e0:3b:13:af,10.160.31.111,client111,infinite
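Since a config like this is mostly comments, a side-by-side comparison is easier if only the active (uncommented) lines are compared. Below is a small Python sketch of that idea. The `mine` text is an abbreviated excerpt of the file above; the `reference` text is a hypothetical second config invented purely for illustration, not anyone's actual file.

```python
def active_directives(config_text):
    """Return the uncommented, non-blank lines of a dnsmasq config as a set."""
    directives = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            directives.add(line)
    return directives


# Abbreviated excerpt of the ltsp.conf shown above.
mine = """
# port=0
log-dhcp
tftp-root=/tftpboot
#dhcp-no-override
dhcp-range=10.0.253.23,proxy
"""

# Hypothetical config to compare against (made up for this example).
reference = """
port=0
log-dhcp
tftp-root=/tftpboot
dhcp-no-override
dhcp-range=10.0.253.23,proxy
"""

# Directives that are active in the reference but not in mine.
print(sorted(active_directives(reference) - active_directives(mine)))
# ['dhcp-no-override', 'port=0']
```

Swapping the operands of the set difference shows directives that are active only in the first file.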
On this box, I did try to check iptables and got this:
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ iptables -L
    modprobe: ERROR: could not insert 'ip_tables': Operation not permitted
    iptables v1.6.0: can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
    Perhaps iptables or your kernel needs to be upgraded.
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$
And for netstat:
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$ netstat -an | grep 4011
    udp        0      0 0.0.0.0:4011            0.0.0.0:*
    ubucomlab@ubucomlab-ThinkCentre-M72z:/etc/dnsmasq.d$
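For anyone who wants to script this check, here is a minimal Python sketch that looks for a UDP socket bound to a given port in `netstat -an` style output. The function name is my own, and it assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address). It only shows that something is bound to the port on this box, not that clients can actually reach it.

```python
def udp_listeners(netstat_output, port):
    """Return local addresses of UDP sockets bound to `port` in `netstat -an` output."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address
        if len(fields) >= 5 and fields[0].startswith("udp"):
            local = fields[3]
            if local.rsplit(":", 1)[-1] == str(port):
                matches.append(local)
    return matches


# The line from the netstat output above.
sample = "udp        0      0 0.0.0.0:4011            0.0.0.0:*"

print(udp_listeners(sample, 4011))  # ['0.0.0.0:4011']
print(udp_listeners(sample, 69))    # []
```

A local address of 0.0.0.0 means the socket is bound on all interfaces.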
-
@mkstreet said in PXE-E78 Cannot locate boot server:
10.0.253.23
I’m not seeing anything different from the 10.0.253.23 system. The conversation stops after the ACK from the 172.16.1.1 dhcp server.
-
-
What device is your pxe boot client?
** Inside my lab, we have about 36 Lenovo ThinkCentre model M72z machines.
We have been using and loading these with Fog for about 2 years up until recently.
-
Do you have a different brand/model of device you could use?
** Yes. I have “regular” desktop PCs which are outside the lab.
I have just tried to network boot one of those. The case says it’s an Acer Veriton.
I see that it connects to 10.0.253.23 and then gets the same PXE-E78 error as the boxes inside the lab. I have attached a pcap of this here:
http://s000.tinyupload.com/?file_id=48129942273982229370
-
You did mention that you moved this FOG server from an isolated network to this new network. What device was supplying the DHCP services for that isolated network?
** In brief, my department (lab and offices) had its own Internet connection that was separate from the rest of the campus. Although this seems very odd, the ISP (essentially the state enterprise telecom entity) provided DHCP services from their central office about 2 km away. I know it’s strange, but that is what it was. This was the network configuration when we first began using Fog in about March 2014. Once we had Fog set up and had gained some experience with it, we had few or no network issues. We knew even less about that network/DHCP setup. From March 2014 to about August 2015 we used Fog very actively: we loaded our entire lab at least ten times, and had separate images for our (non-lab) office desktop PCs that we managed with Fog too. From August 2015 until now, we have not used Fog much as we didn’t need to update things. About a month ago, my department gave up its separate connection to the state telecom provider and joined the campus-wide LAN that the rest of the campus had just implemented (with a different provider than the state telecom we had been using). Now we want to update and reload our lab. That was when we discovered that the boxes, which had reached the point of just-turn-on-and-take-for-granted, could no longer find the Fog server. This is what led us down the path of attempting dnsmasq, and voila, here we are today.
-
-
@george1421 said:
What I don’t understand is why does it not continue and contact the proxyDHCP port? You have already confirmed that the iptables or firewalld firewall is not blocking port access. We have done a side by side comparison of your dnsmasq config file to mine.
Well, I thought I had explained that part. From what I understand, the issue is that the normal DHCP server (10.0.253.1) sends the next-server option pointing to 172.16.1.1 (see the PCAP file). This is most probably confusing the client. Would you mind trying this out, George? Add a next-server option (pointing to a totally unrelated server) to your DHCP server and see what it does to your setup. It could also be that the clients still boot properly in your case, as it all depends on how the client interprets the information.
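If you want to see the next-server value without clicking through Wireshark: it lives in the fixed BOOTP header of the DHCP reply, in the siaddr field, 20 bytes into the UDP payload (per RFC 2131). Here is a minimal Python sketch that pulls it out of a raw payload. The sample packet is synthetic and built by hand for illustration; the client address 10.0.253.50 is a made-up value, and only 172.16.1.1 is taken from this thread.

```python
import socket
import struct


def bootp_addresses(payload):
    """Extract yiaddr (offered client IP) and siaddr (next-server) from a BOOTP/DHCP payload."""
    if len(payload) < 28:
        raise ValueError("payload too short for a BOOTP header")
    # Fixed header: op, htype, hlen, hops (4 bytes), xid (4), secs (2), flags (2),
    # followed by ciaddr, yiaddr, siaddr, giaddr (4 bytes each) starting at offset 12.
    ciaddr, yiaddr, siaddr, giaddr = struct.unpack_from("!4s4s4s4s", payload, 12)
    return socket.inet_ntoa(yiaddr), socket.inet_ntoa(siaddr)


# Synthetic reply header: op=2 (BOOTREPLY), Ethernet, 6-byte MAC, arbitrary xid.
header = struct.pack("!BBBBIHH", 2, 1, 6, 0, 0x12345678, 0, 0)
addresses = b"".join(
    socket.inet_aton(ip)
    for ip in ("0.0.0.0", "10.0.253.50", "172.16.1.1", "0.0.0.0")  # ciaddr, yiaddr, siaddr, giaddr
)
sample_ack = header + addresses

print(bootp_addresses(sample_ack))  # ('10.0.253.50', '172.16.1.1')
```

A real payload would continue with chaddr, sname, file and the options, which this sketch ignores.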
-
@Sebastian-Roth That, in a way, is what is already happening in my network. For some silly reason my internet router is sending the next server as itself (as seen in the pcap in my dnsmasq uefi thread). But in this situation the pxe booting client understands this, complains about getting two next-server replies, and does the right thing anyway. That is why I suggested trying a different client to test this, but his second client did the same thing.
@mkstreet I think we are at the point of diminishing returns. I don’t think we (from the FOG Project) are going to be able to help you solve this remotely. You NEED to have a network engineer involved with this to get the problem resolved. There is something going on in your environment that needs expert hands and tools for your network. The components are set up correctly for FOG; there is some external force that is keeping this from working correctly.
-
OK. Two questions.
-
Instead of using DNSMASQ, if I attempt to have the network changed (i.e. 172.16.1.1 and/or 10.0.253.1), what exactly needs to be changed? I will need to explain what I want in almost step by step fashion to non-native English speakers who control the network.
-
Is there an alternative where I disconnect from the network just while I am loading the lab? Then, I am guessing, my Fog Server will need to have isc-dhcp running (instead of dnsmasq)…?
-
-
@mkstreet If I had to pick one thing your current network does that I would wish to stop, it is this.
I would want your primary dhcp server (172.16.1.1) to stop sending dhcp option 66 {next-server} with a value of 172.16.1.1 to your subnet (10.0.253.0/16). Understand, I’m not saying this is your entire problem; there may be some other underlying issue. The second part (maybe a wish) is that, instead of 172.16.1.1, the next-server value is sent as the IP address of your FOG server. If that happens, then dnsmasq is not needed.
-
@mkstreet said:
Then, I am guessing, my Fog Server will need to have isc-dhcp running (instead of dnsmasq)…?
Yes! And that just sparked off an idea. Assuming that the “other” DHCP server is further away, you could try running your very own ISC-DHCP server within your 10.0.253.0/16 subnet, which would answer faster than the other one anyway. This way you would force the clients to use all the DHCP information sent by your server before they get an answer from the outside one. Not very nice, and I am sure it will cause trouble at some point. But it’s worth a try to see if it would work for you.
Check all your settings in /opt/fog/.fogsettings (wiki help here) and turn on DHCP, re-run the installer and you should be up and running with a working ISC-DHCP config in just a few minutes.
-
@george1421 I have forwarded the first wish to the network area. I am awaiting a response.
-
@Sebastian-Roth I will try this in about ten days when we are back in session. Thanks!!