Yet another PXE-M0F error topic!



  • Hello FOG community!

    I have been trying to set up FOG for a good week now, but it feels like I’m constantly running into a brick wall. Head first.
    I’ve tried searching this forum, but I’ve come up with nothing so far :(

    My FOG machine: CentOS 7 VM on an ESXi host. 10.254.10.29
    The host machine(s) I’ve been trying to PXE boot with: Dell Latitude 5480, HP EliteDesk 800 G3 SFF. Both are on DHCP and usually get an address from the 10.252.80.0/22 range.

    Our school’s DHCP, AD and all the other fun stuff are centrally managed by an external company. The DHCP server serves all the schools in our city (450 000 people). I’ve asked them to make the required modifications to the DHCP server. Here is the e-mail ticket I sent them:
    Windows Server DHCP
    Option 66 (066 Boot Server Host Name)
    String Value: 10.254.10.29
    Option 67 (067 Bootfile Name)
    String Value: undionly.kpxe

    They replied that they have made the modifications I asked for. That is probably true, because before the changes were made I was getting a TFTP timeout error.

    Now that these changes have been made, I’ve been stuck on a “PXE-M0F: Exiting Intel Boot Agent” error. I have tried configuring a Dnsmasq FOG server, but that hasn’t worked either. So far I’ve done a fresh install of CentOS and reinstalled FOG 4 times with 2 kinds of configurations:

    • Directing FOG to a Windows DHCP server
    • Using proxy DHCP (Dnsmasq)

    Both of these have still had me stuck on the PXE-M0F error.

    Just in case, here is a full transcript when I try PXE booting:

    Intel® Boot Agent CL v0.1.09
    Copyright © 1997-2013, Intel Corporation
    CLIENT MAC ADDR: A4 4C C8 10 F7 FB GUID: 44454C4C 3300 1031 8052 B2C04F364832
    CLIENT IP: 10.252.80.249 MASK: 255.255.252.0 DHCP IP: 10.254.255.16
    GATEWAY IP: 10.252.80.1
    TFTP.
    PXE-M0F: Exiting Intel Boot Agent.
    No Boot Device Found. Press any key to reboot the machine_

    Please do ask for any other information that is needed. Any help is much appreciated.

    EDIT: Here are the settings used by the current FOG install:

    
     * Here are the settings FOG will use:
     * Base Linux: Redhat
     * Detected Linux Distribution: CentOS Linux
     * Server IP Address: 10.254.10.29
     * Server Subnet Mask: 255.255.255.0
     * Interface: ens192
     * Installation Type: Normal Server
     * Internationalization: 0
     * Image Storage Location: /images
     * Using FOG DHCP: No
     * DHCP will NOT be setup but you must setup your
     | current DHCP server to use FOG for PXE services.
    
     * On a Linux DHCP server you must set: next-server and filename
    
     * On a Windows DHCP server you must set options 066 and 067
    
     * Option 066/next-server is the IP of the FOG Server: (e.g. 10.254.10.29)
     * Option 067/filename is the bootfile: (e.g. undionly.kpxe)
    
    

    EDIT 2: I am able to connect to TFTP from Windows CMD on the client machines and copy the undionly.kpxe file with no problems.

    PS C:\Windows\system32> tftp 10.254.10.29 get undionly.kpxe
    Transfer successful: 97715 bytes in 13 second(s), 7516 bytes/s
    

    EDIT 3: At this point I’m just trying everything:

    [root@FOG-server bin]# systemctl disable firewalld
    Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
    Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
    

    No good, I’ll re-enable it.

    EDIT 4: I tried the tcpdump capture described in this guide by @george1421. I had to modify the capture command a little to make it work, though. Here’s what I used:

    tcpdump -i any -w /var/output.pcap port 67 or port 68 or port 69 or port 4011
    

    I’ll attach the dump files for the curious:
    2_1525419413445_output3.pcap
    1_1525419413445_output2.pcap
    0_1525419413445_output.pcap

    After having a look at these I discovered that my host machine is requesting to read the file “boot\pxeboot.com”, from the correct location (the IP of my FOG machine (10.254.10.29)). Is this right? Or is this the root of all evil?
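For anyone wanting to confirm which file a client asks for without opening Wireshark, here is a minimal Python sketch that decodes a TFTP read request (RRQ) payload as laid out in RFC 1350, with the option pairs from RFC 2347. The function name `parse_rrq` and the sample payload are assumptions written by hand for illustration; the bytes are not taken from the attached captures.

```python
import struct


def parse_rrq(payload: bytes) -> dict:
    """Decode the UDP payload of a TFTP read request (opcode 1)."""
    (opcode,) = struct.unpack("!H", payload[:2])
    if opcode != 1:
        raise ValueError(f"not a read request (opcode {opcode})")
    # After the opcode come NUL-terminated strings:
    # filename, mode, then optional name/value option pairs.
    fields = payload[2:].split(b"\x00")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL
    if len(fields) < 2:
        raise ValueError("truncated read request")
    filename, mode, *rest = (f.decode("ascii", "replace") for f in fields)
    options = dict(zip(rest[0::2], rest[1::2]))
    return {"filename": filename, "mode": mode.lower(), "options": options}


# Hand-built sample: a request for boot\pxeboot.com with a tsize option.
sample = b"\x00\x01" + b"boot\\pxeboot.com\x00octet\x00tsize\x000\x00"
print(parse_rrq(sample))
```

A `tsize` option with value 0 (RFC 2349) is how a client asks the server for the file size before it starts the real transfer.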



  • Hi once again!

    Our problem, as it stands today, is solved. We ended up trying to install the FOG server on the same subnet as the host machines, which worked. After that we tried putting the machine back on the subnet where all our VMs are located. That broke it. Later the network admin realized that during his configuration he had accidentally left a space after the IP in option 66. After that was fixed, things started working.

    So in the end what I was chasing was a typo.
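A typo like this is easy to miss in a DHCP console because a trailing space is invisible. As a rough illustration, the Python sketch below flags stray whitespace in an option 66 string before it is handed to whoever administers the DHCP server. The helper name `check_option_66` is hypothetical, and the check assumes option 66 is meant to hold a plain IP address, as it does in this thread.

```python
import ipaddress


def check_option_66(value: str) -> list[str]:
    """Return a list of problems found in a DHCP option 66 string value."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    try:
        # Validate the trimmed value so both problems can be reported.
        ipaddress.ip_address(value.strip())
    except ValueError:
        problems.append("not a plain IP address (may be a host name)")
    return problems


for candidate in ("10.254.10.29 ", "10.254.10.29"):
    print(repr(candidate), check_option_66(candidate))
```

Printing the value with `repr()` makes the stray space visible inside the quotes.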

    A big thanks to @george1421 for being as helpful as you could during the troubleshooting process.

    I am now happily deploying my Windows 10 images. :)


  • Moderator

    @sven-ervin If you have the default route configured correctly on the fog server that is all you need to make imaging work across subnets from the fog server side.

    For the remote subnets, you need to ensure that the pxe boot options are being sent to target computers so they can locate the FOG server.

    Issues you will have when the FOG server is separated from the target computers by a router:

    1. Multicasting will not work unless you set up a multicast router.
    2. WOL may have some difficulties.
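To check whether two machines are separated by a router in the first place, a short Python sketch with the standard `ipaddress` module can compare their networks. The addresses and masks are the ones quoted in this thread (client 10.252.80.249 with mask 255.255.252.0, FOG server 10.254.10.29 with mask 255.255.255.0); the helper name `same_subnet` is just an illustration.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """True if both interface addresses (CIDR form) sit in the same network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


client = "10.252.80.249/22"   # PXE client from the boot transcript
fog = "10.254.10.29/24"       # FOG server from the installer summary

print(ipaddress.ip_interface(client).network)
print(ipaddress.ip_interface(fog).network)
print(same_subnet(client, fog))
```

When the result is `False`, every packet between the two has to be routed, which is where the DHCP relay and multicast caveats come in.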


  • A little bit of an update:

    I ended up contacting our “helpdesk” where my problem got directed to the coolest guy I have ever had the opportunity to work with.

    We wound up installing FOG-server on another machine on the same subnet as the hosts (10.252.80.0/22). After doing this everything works as expected.

    We are still working on getting it to work across different subnets. Is there any possibility that the FOG server configuration might be at fault when dealing with communication across subnets?



  • @george1421 Oh yeah, of course! My bad. I totally agree that it’s a problem with our infrastructure. As far as I can tell FOG is doing what it is supposed to do just fine.

    I also shot an email to our network administration explaining the whole situation. Let’s see where that lands us.


  • Moderator

    @sven-ervin You would have to physically move that machine to the imaging network to install your image on it. The problem at the moment is your networking infrastructure, not FOG. This is only a guess since we can’t see what the client is being told, but I suspect you have a WDS or SCCM server on your campus that is currently configured for imaging.



  • @george1421 Now that I’ve read through your last response again, there is one thing that is confusing me. If I create this dedicated imaging network, then how would I point host machines to PXE boot on this network?



  • @george1421 Alright, I will be sure to keep that in mind! But as today is Friday and I’ve already stayed in almost 3 hours too long, I’ll try to get some rest over the weekend and have a new crack at it on Monday.

    Huge thank you for all your help!


  • Moderator

    @sven-ervin One way to “get round” (I don’t like the term since it implies deception) the issue is to create a dedicated deployment network. In this case your fog server would have its roles changed a bit. Your fog server would have 2 network adapters: one connected to your isolated deployment network and one to your business network.

    The fog server would then image on the dedicated imaging network. You could manage the fog server from the business network. The fog server would supply dhcp and pxe boot info only to the imaging network. If your clients need to connect to your AD during OOBE then you would need to configure your fog server as a NAT router so that traffic could flow between the imaging network and the business network without needing any modifications to your business network infrastructure.

    In this configuration, once the system was imaged you could then move it from the imaging network to the production network without issue. To use the fog client properly you will need to use the dns name of the fog server and create what is known as a split horizon dns. For example, let’s say your fog server FQDN will be known as fog1.domain.com. On the business network that FQDN should map to the business network interface of your fog server. On the dedicated imaging network you will need to install a dns server on FOG and then create an entry for fog1.domain.com to point to the imaging network interface of the fog server. At least that is how I think it should work.
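To make the split horizon idea concrete, here is a toy Python model of it: the same name resolves to a different address depending on which network the asking client sits in. This is only a sketch of the lookup logic, not DNS server configuration. The imaging network 192.168.100.0/24 and its server address 192.168.100.1 are hypothetical values invented for the example; only 10.254.10.29 comes from the thread.

```python
import ipaddress

# Hypothetical imaging network; nothing in the thread defines one.
IMAGING_NET = ipaddress.ip_network("192.168.100.0/24")

# Answers for fog1.domain.com in each "view" of the DNS.
VIEWS = {
    "imaging": "192.168.100.1",   # hypothetical imaging-side interface
    "business": "10.254.10.29",   # business-side address from the thread
}


def resolve_fog1(client_ip: str) -> str:
    """Return the address fog1.domain.com should resolve to for this client."""
    in_imaging = ipaddress.ip_address(client_ip) in IMAGING_NET
    return VIEWS["imaging" if in_imaging else "business"]


print(resolve_fog1("192.168.100.50"))
print(resolve_fog1("10.252.80.249"))
```

In practice the two answers would live in two separate DNS servers (or views), one reachable only from the imaging network.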

    I know it’s a lot of fiddling around, but if you have a locked in environment you have to be a bit creative to be functional.



  • Well… that sucks :D

    Alright, well, I’m just going to try putting the FOG machine in the same subnet with the host machine and see if that is going to change anything.

    EDIT Deleted rant


  • Moderator

    @sven-ervin again your pcaps are strange (not that you are doing anything wrong collecting them).

    If I had to guess, your VLAN/subnet router is not using broadcasts to relay dhcp information back to the client, but is using unicasts from the dhcp relay back to the target computer. In this case Wireshark will not see the unicast messages unless the Wireshark computer is on a mirrored port to the pxe booting client (or you happen to know of a network hub still in existence at your computer. A hub would mirror all traffic to all network ports).

    So from here on out I’m going to just read the tea leaves here.

    1. Your pxe booting client is a bios/legacy mode system (dell to be specific)
    2. The client computer is being told to boot boot\pxeboot.com which is a WDS/SCCM boot loader.
    3. WDS/SCCM uses a proxydhcp configuration (it’s akin to dnsmasq), so its settings will override anything you set in the dhcp server option 67.

    So what can you do? It will be hard to overcome if wds is running in your network. You will need to configure the dhcp relay on your subnet router to not send dhcp boot information from the pxe booting subnet to your wds server. That way it won’t respond and take over the client.

    You should also check with your networking team to see why/how dhcp boot file boot\pxeboot.com is being sent to the target computer. They may have insight on this.
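If a capture does contain the DHCP reply, the boot server and boot file can be read straight from the fixed BOOTP header. The Python sketch below pulls out `siaddr` (bytes 20 to 23) and the `file` field (bytes 108 to 235), following the layout in RFC 951/RFC 2131. The function names and the sample packet are assumptions built by hand; the sample is not a packet from this network. It also deliberately ignores the options area, so a server that delivers the values only as options 66/67 would show empty fields here.

```python
import ipaddress


def boot_info(bootp: bytes) -> dict:
    """Extract next-server and boot file from the fixed BOOTP header."""
    if len(bootp) < 236:
        raise ValueError("BOOTP header is 236 bytes; payload is too short")
    boot_file = bootp[108:236].split(b"\x00", 1)[0]
    return {
        "is_reply": bootp[0] == 2,                                   # op: 2 = BOOTREPLY
        "client_ip": str(ipaddress.IPv4Address(bootp[16:20])),      # yiaddr
        "next_server": str(ipaddress.IPv4Address(bootp[20:24])),    # siaddr
        "boot_file": boot_file.decode("ascii", "replace"),          # file
    }


def build_sample() -> bytes:
    """Hand-built reply header using addresses mentioned in the thread."""
    pkt = bytearray(236)
    pkt[0] = 2
    pkt[16:20] = ipaddress.IPv4Address("10.252.80.249").packed
    pkt[20:24] = ipaddress.IPv4Address("10.254.10.29").packed
    name = b"undionly.kpxe"
    pkt[108:108 + len(name)] = name
    return bytes(pkt)


print(boot_info(build_sample()))
```

Comparing these two fields between the main DHCP server's reply and any second reply (for example from a proxydhcp service) shows which server is steering the client.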



  • First of all, thank you!

    Latest fresh FOG install config:

    * Here are the settings FOG will use:
     * Base Linux: Redhat
     * Detected Linux Distribution: CentOS Linux
     * Server IP Address: 10.254.10.29
     * Server Subnet Mask: 255.255.255.0
     * Interface: ens192
     * Installation Type: Normal Server
     * Internationalization: 0
     * Image Storage Location: /images
     * Using FOG DHCP: No
     * DHCP will NOT be setup but you must setup your
     | current DHCP server to use FOG for PXE services.
    
     * On a Linux DHCP server you must set: next-server and filename
    
     * On a Windows DHCP server you must set options 066 and 067
    
     * Option 066/next-server is the IP of the FOG Server: (e.g. 10.254.10.29)
     * Option 067/filename is the bootfile: (e.g. undionly.kpxe)
    
    
     * Are you sure you wish to continue (Y/N)
    

    Attached are 2 files. “Output 4” is from the FOG machine itself and the other one (67,68,69,4011Capture1) is from Wireshark capturing on the same ports at the same time.

    0_1525439990605_output4.pcap
    0_1525440047448_67,68,69,4011capture1.pcap

    EDIT: I did some additional testing and I’ll post these results as code snippets for easier referencing:

    FOG machine on a 10.254.10.1/24 subnet
    0_1525443426142_output5.pcap

    Host machine subnet 10.252.80.0/22
    0_1525443818489_67,68,69,4011capture2.pcap

    As far as I myself can understand there is no actual PXE communication going on in the 10.252.80.0/22 subnet.



  • I am pretty sure I had the Dnsmasq service disabled, but just to be on the safe side I am doing a complete nuke n’ pave as I’m typing this. I already have Wireshark set up on the same subnet. I shall post the results as soon as I get them.


  • Moderator

    At this point I’m not sure of your configuration. Dealing with clients on different subnets from the fog server and dhcp server is not the easiest thing to debug.

    Your pcaps are not complete and what is there is a bit strange to say the least. Understand I’m not saying you captured them wrong, they are just not what you might typically see. If they are accurate, I can understand why pxe booting is not working for you. One other comment is around the boot file name. undionly.kpxe is for bios (legacy) mode computers. A uefi based computer cannot boot with a bios (legacy) mode boot file. For uefi computers you will need to send the ipxe.efi boot file name. The point is that dhcp servers that use a single static boot file name are now problematic.
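One way around a static boot file name is to choose the file from the client architecture the PXE firmware reports in DHCP option 93 (RFC 4578). The Python sketch below only illustrates that decision; it is not how any particular DHCP server is configured. The table covers just the two files named above, and the mapping of codes 7 and 9 to ipxe.efi is an assumption that should be checked against what your clients actually send.

```python
# DHCP option 93 client architecture codes (RFC 4578) mapped to boot files.
# Only the two file names mentioned in this thread are used.
BOOT_FILES = {
    0: "undionly.kpxe",  # Intel x86PC, i.e. bios (legacy) mode
    7: "ipxe.efi",       # EFI BC (assumed to be 64-bit uefi)
    9: "ipxe.efi",       # EFI x86-64
}


def boot_file_for(arch_code: int) -> str:
    """Return the boot file name for a reported architecture code."""
    try:
        return BOOT_FILES[arch_code]
    except KeyError:
        raise ValueError(f"no boot file configured for arch code {arch_code}") from None


for code in (0, 7, 9):
    print(code, boot_file_for(code))
```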

    A proper dhcp sequence goes

    1. Discover (client to world)
    2. Offer (dhcp(s) to client)
    3. Request (client to dhcp server)
    4. ACK (dhcp server to client)
      (PXE)
    5. client requests boot image size from tftp server
    6. tftp server responds
    7. client asks for file.
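When a capture looks incomplete, it can help to check which of the four DHCP steps is the first one missing. The Python sketch below takes the DHCP message types (option 53 values: 1 Discover, 2 Offer, 3 Request, 5 ACK) in the order they appear in a capture and reports the first expected step that never shows up in sequence. The function name and the sample lists are made up for illustration and do not come from the attached pcaps.

```python
from typing import Iterable, Optional

# Expected order of DHCP message types (option 53) in a normal exchange.
EXPECTED = [(1, "Discover"), (2, "Offer"), (3, "Request"), (5, "ACK")]


def first_missing_step(seen: Iterable[int]) -> Optional[str]:
    """Return the first expected step not found in order, or None if all are."""
    remaining = iter(seen)
    for code, name in EXPECTED:
        # any() consumes the iterator up to and including the first match,
        # so later steps must appear after earlier ones.
        if not any(msg_type == code for msg_type in remaining):
            return name
    return None


print(first_missing_step([1, 2, 2, 3, 5]))  # two servers sent offers
print(first_missing_step([1, 3]))           # offer never captured
```

A result of "Offer" on a capture taken next to the client would match the unicast relay situation, where the reply exists but the capturing machine never sees it.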

    What I would recommend is a few things. Let’s go back to your original configuration with the company managed dhcp server. Since the fog server and pxe booting clients are on different subnets you shouldn’t need to change your fog server. You might shut off dnsmasq and/or isc-dhcp server if you turned them on.

    Now take a computer on the same subnet as the computer you are trying to pxe boot. Install Wireshark on it and use the same capture filter. Then pxe boot the target computer. You will be capturing the conversation between the pxe booting computer and the dhcp server. We need to find out what the main dhcp server is telling the pxe booting client. I suspect it is not what we expect.

    Also make sure the target computer is in bios (legacy) mode so it will boot the undionly.kpxe file. We can get this going, we just need to understand what is happening on your network.


