PXE Boot VM Success, PXE Boot Laptop Fail



  • I am running FOG 1.5.9-RC2 on Ubuntu 18.04 LTS. I have DHCP options 66 and 67 set to direct PXE boot as instructed, with option 67 being undionly.kpxe. If I use KVM to create a new machine by PXE booting, I am successful. When I try to PXE boot from various laptops, I get a TFTP timeout error: PXE-E32 and PXE-M0F. I have used Wireshark to monitor the communication. With the VM there is the expected TFTP traffic. With the laptops there is none. The network ports and switches are configured the same between the VM and FOG, and between FOG and the laptops. I am using legacy boot on the laptops. Out of curiosity I did test it the other way and had the same outcome. What might I be missing?


  • Moderator

    @dskinner OK, since it DOES work somewhere on your site, there is another possibility: an unexpected DHCP server or proxyDHCP server on the same VLAN as the FOG server is interfering with booting. So let’s help your network engineers. Let’s grab a pcap (packet capture) of the PXE booting process on the same VLAN as the FOG server. To do this we’ll use the FOG server to run tcpdump (a packet capture program). It will capture the DHCP process of the PXE booting computer as well as the file transfer of the boot loader (ipxe.efi/undionly.kpxe).

    I have a tutorial here: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue

    You can review the output pcap with Wireshark. You can take the pcap to your network engineers, review it yourself, or post it to a file share site and either PM me the link or post the link in the forum.

    I’ll tell you what you should be seeing. The DHCP process is pretty simple. There is a DHCP DISCOVER packet from the target computer, then an OFFER from one or more DHCP servers (this is where I would focus: ensure the responding DHCP servers are the correct ones and have the correct settings in the DHCP options section), then a DHCP REQUEST from the target computer, and finally an ACK from the DHCP server. So this part of the process should only take 4 network packets. If there are more, or there are repeating requests or offers, then I would look into that.

    Once the DHCP process is done, the PXE booting part starts. The target computer will take the name of the boot loader (option 67) and the boot server (option 66) given by the DHCP server and request that file via TFTP from the boot server. You should see one initial request for just the file size from the PXE booting client to the TFTP server, then the client will ask for the boot loader file itself.
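The four-packet exchange described above can be checked mechanically once the DHCP message types have been read out of the capture. Here is a minimal sketch in Python, assuming the message types for one client have already been extracted in order as plain strings. The function name `check_dora` and the input format are made up for illustration and are not part of FOG, tcpdump, or Wireshark.

```python
# The four DHCP messages expected for one PXE client, in order.
EXPECTED = ["DISCOVER", "OFFER", "REQUEST", "ACK"]


def check_dora(messages):
    """Return a list of findings for one client's DHCP exchange.

    `messages` is an ordered list of DHCP message type names read from
    a capture (hypothetical input format, for illustration only).
    """
    findings = []
    # Ignore message types that are not part of the basic exchange.
    relevant = [m for m in messages if m in EXPECTED]
    for kind in EXPECTED:
        count = relevant.count(kind)
        if count == 0:
            findings.append(f"missing {kind}")
        elif count > 1:
            findings.append(f"{count} {kind} packets (expected 1)")
    # Only check the order when exactly one of each type was seen.
    if not findings and relevant != EXPECTED:
        findings.append("packets out of the expected order")
    return findings


if __name__ == "__main__":
    print(check_dora(["DISCOVER", "OFFER", "OFFER", "REQUEST", "ACK"]))
```

A second OFFER is not necessarily an error, but it is the place to look first: it means more than one server answered the client, and each answer may carry different boot options.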



  • I restarted a PC in a different building and it immediately booted to FOG. This other building is on a different VLAN than my test PCs and the FOG server. The FOG server and the test PC are on the same VLAN. We changed VLANs on my test port and the laptop booted to FOG as intended. We are currently looking at ACL settings. I have created a ticket with our network engineers, who swore there are no differences between switches or VLANs. Beyond that, I will be setting up dnsmasq as soon as I find the resolution to the differing VLANs. Thank you for your help.


  • Senior Developer

    @dskinner said in PXE Boot VM Success, PXE Boot Laptop Fail:

    I mistyped the default gateway address.

    This could definitely cause the issue described. So you need to re-run the installer after making changes in .fogsettings.



  • @Sebastian-Roth said in PXE Boot VM Success, PXE Boot Laptop Fail:

    Just saw this and wanted to say that changing values in .fogsettings does not cause any change in FOG itself. You need to re-run the installer to see the change being applied. If you let us know what exactly you changed we might be able to tell you if this could be related or not.

    I mistyped the default gateway address.



  • @george1421 Thank you for the information. I have another project that I have to pick up for the day but I will be sure to get back to you with my results later this week.


  • Senior Developer

    @dskinner said in PXE Boot VM Success, PXE Boot Laptop Fail:

    I just found an error in .fogsettings, fixed it and restarted the FOG VM. This had no effect on the timeout issue.

    Just saw this and wanted to say that changing values in .fogsettings does not cause any change in FOG itself. You need to re-run the installer to see the change being applied. If you let us know what exactly you changed we might be able to tell you if this could be related or not.


  • Moderator

    @dskinner Sorry I didn’t see this come in.

    OK, if the FOG server and the clients are on the same VLAN and your DHCP server is inaccessible, then I recommend that you install dnsmasq on your FOG server and manage just the PXE boot information locally. dnsmasq takes about 10 minutes to install and configure if you use my tutorial.

    The instructions are for CentOS but you can adapt them for Ubuntu: https://forums.fogproject.org/topic/12796/installing-dnsmasq-on-your-fog-server

    In the tutorial there are <fog_server_ip_address> tags. Replace each entire tag with the IP address of your FOG server. Make sure you remove the greater-than and less-than signs. The rest is just copy and paste.

    Understand that in this configuration only the PXE boot info is being supplemented by dnsmasq.

    The “problem” with your off-site DHCP server is that it probably has a static DHCP option 67. With a static option 67 you can configure it to PXE boot BIOS or UEFI clients, but not both at the same time, because each mode takes a different boot loader. The dnsmasq configuration will dynamically change the boot loader name depending on the PXE booting client.
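The idea of choosing the boot loader per client can be sketched as follows. PXE clients report their architecture in DHCP option 93 (RFC 4578): value 0 is legacy BIOS, and values 7 and 9 are the ones typically seen from 64-bit UEFI firmware. This is plain Python to show the decision, not dnsmasq configuration, and the function name `boot_file_for` is made up for illustration. The file names are the two boot loaders mentioned in this thread. Other architecture values return `None` here rather than guess at a file name.

```python
# DHCP option 93 (client system architecture) values, per RFC 4578.
BIOS_ARCH = {0}         # legacy BIOS (Intel x86PC)
UEFI_X64_ARCH = {7, 9}  # values reported by 64-bit UEFI firmware


def boot_file_for(arch_code):
    """Pick a boot loader name for the architecture a PXE client reports.

    Returns None for architectures this sketch does not cover.
    """
    if arch_code in BIOS_ARCH:
        return "undionly.kpxe"
    if arch_code in UEFI_X64_ARCH:
        return "ipxe.efi"
    return None


if __name__ == "__main__":
    for code in (0, 7, 9):
        print(code, boot_file_for(code))
```

A static option 67 amounts to returning the same name for every client, which is why only one firmware mode can work at a time.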

    The only “gotcha” with dnsmasq is that if your target computers are on a different subnet than your fog (dnsmasq) server you will need to make a small adjustment to your vlan router. But in your case that isn’t an issue right now.



  • The DHCP server is a VM hosted on a Cisco blade server off campus. That is outsourced and I am not familiar with the specs on that.

    I can work with the tech on that. I have asked him to confirm the option 66 and 67 settings a few times in the process, and I should be able to get limited support from them for easier stuff. Otherwise, the laptops and the imaging server are on the same VLAN.


  • Moderator

    @dskinner said in PXE Boot VM Success, PXE Boot Laptop Fail:

    The DHCP server is a VM hosted on a Cisco blade server off campus. That is outsourced and I am not familiar with the specs on that.

    So you don’t have control of the dhcp server settings? That is OK we can work around that.

    So just to be clear the FOG server and the target computers are on the same VLAN. Is that the case moving forward too?



  • @george1421 @Sebastian-Roth The machines are on the same VLAN and the same switch, a Cisco 4510R+E. The DHCP server is a VM hosted on a Cisco blade server off campus. That is outsourced and I am not familiar with the specs on that. I just found an error in .fogsettings, fixed it and restarted the FOG VM. This had no effect on the timeout issue.


  • Moderator

    To add to Sebastian’s questions, what DHCP server are you using (manufacturer and model)? Please answer them all so we can give you a direction to move in.


  • Senior Developer

    @dskinner Is the FOG server on a different VLAN/subnet than the clients? Are you sure the DHCP server is sending the correct gateway/router address?



  • @Sebastian-Roth I have tested these. I am able to manually TFTP the file from the Windows laptop with the firewall on and off. The FOG server and the laptop are on the same VLAN as the VM that was able to PXE boot to the FOG menu without issue.
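For reference when reading a capture, the first packet a PXE client sends to the TFTP server is a read request (RRQ) for the boot loader, usually carrying the `tsize` option to ask for the file size. Here is a minimal Python sketch of what that packet looks like on the wire, following RFC 1350 and RFC 2349. The function name `build_rrq` is made up for illustration, and the sketch only builds the bytes. It does not send anything.

```python
import struct


def build_rrq(filename, want_size=True):
    """Build the bytes of a TFTP read request (RRQ) for `filename`.

    With want_size=True the request carries the tsize option with a
    value of 0, which asks the server to report the file size.
    """
    packet = struct.pack("!H", 1)                 # opcode 1 = read request
    packet += filename.encode("ascii") + b"\x00"  # NUL-terminated file name
    packet += b"octet\x00"                        # binary transfer mode
    if want_size:
        packet += b"tsize\x00" + b"0\x00"         # option name, then value
    return packet


if __name__ == "__main__":
    print(build_rrq("undionly.kpxe"))
```

If a capture taken on the FOG server never shows a packet of this shape arriving on UDP port 69 from the laptop, the client never got as far as TFTP, which points back at the DHCP answers it received.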


  • Senior Developer

    @dskinner I would suspect this to be a firewall or routing issue. But as I don’t know your network in detail it’s just a wild guess.

    Please go through the tests mentioned in the wiki and report back what you find: https://wiki.fogproject.org/wiki/index.php?title=Tftp_timeout

