TFTP/PXE Boot Issues
-
Server
- FOG Version: 1.4.4
- OS: Ubuntu 16.04
Description
I’d like to start off by saying I’m pretty new to this stuff, so thanks in advance for your help and patience. I’ve been playing with this for the past 3 days and am having trouble getting any clients to boot into FOG. I’m using my own DHCP server, which is on the server subnet, and most of my clients are on a separate workstation subnet. Initially the FOG server was on the server subnet. Trying to PXE boot from the client subnet would time out. Trying to boot from the server subnet got a little further, but still ended up timing out as seen below.
The 172.16.50.23 IP corresponds to our System Center server, which clients used to PXE boot from, but that server is offline now. DHCP Option 66 is no longer pointing to it, but it looks like PXE is still getting that IP address from somewhere. After this happened, I changed the IP of the FOG server to be on the workstation subnet. Now PXE boot from the workstation subnet times out with PXE-E32, and trying from the server subnet gives PXE-E11 followed by PXE-E38.
I’ve tried resetting and updating the fog user password, checked my permissions, and tested the TFTP connection from Windows and Linux (Linux succeeded, Windows gives “can’t write to the local file ‘undionly.kpxe’”). I’ve been banging my head against the wall for hours and it seems like I get stuck at this point every time, despite rebuilding the server using different versions of FOG and different versions of Linux. I’d appreciate any advice anyone can give, and please let me know if you need any more info or screenshots from me.
Thanks,
Cam -
OK, just to be sure I have the right picture here: the FOG server, the PXE booting client, and the DHCP server are all on the 172.16.50.x/24 subnet?
If that is the case, we can use the FOG server to understand what is going on in your network. But first let’s make sure I know what I think I know.
-
That was correct, but I recently moved the FOG server to a different subnet for testing purposes. Ideally I’d like to house it on the 172.16.50.x/24 subnet, so I’ll move it back since the issue still exists.
-
@CamGreezy well if you could leave it for some testing it will help.
The idea is that we will use the fog server and the tcpdump utility to record the dhcp / pxe booting process to understand who the actors are in the pxe booting process. The output of the tcpdump utility we can look at with wireshark to understand who is saying what. If the fog server is on a different subnet then we won’t hear the dhcp announcements.
This tutorial will help you generate the necessary pcap file. https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
Then upload that pcap file to a google drive or dropbox and share the link with me in the forum or via FOG Project messaging if you don’t want it public.
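Once a capture exists, the fields to look for in each DHCP offer are the next server and the boot file. These can be carried as DHCP options 66 and 67, or in the fixed header fields of the packet. As a rough sketch only, here is how the options field (the bytes after the DHCP magic cookie) could be walked in Python. The function name `parse_dhcp_options` and the sample bytes are made up for illustration; in practice Wireshark does this decoding for you.

```python
import ipaddress

def parse_dhcp_options(data):
    """Walk a DHCP options field (bytes after the magic cookie).

    Each option is: code (1 byte), length (1 byte), value (length bytes).
    Code 0 is padding with no length byte; code 255 ends the list.
    """
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 255:          # end marker
            break
        if code == 0:            # pad byte
            i += 1
            continue
        if i + 1 >= len(data):   # truncated option, stop quietly
            break
        length = data[i + 1]
        options[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options

def make_option(code, value):
    """Helper for building the illustrative sample below."""
    return bytes([code, len(value)]) + value

# Illustrative sample only: an offer from 172.16.50.41 pointing at the FOG server.
sample = (
    make_option(53, bytes([2]))                    # message type 2 = offer
    + make_option(54, bytes([172, 16, 50, 41]))    # server identifier
    + make_option(66, b"172.16.50.29")             # TFTP server name
    + make_option(67, b"undionly.kpxe")            # boot file name
    + bytes([255])
)

opts = parse_dhcp_options(sample)
dhcp_server = str(ipaddress.ip_address(opts[54]))
next_server = opts[66].decode("ascii")
boot_file = opts[67].decode("ascii")
```

If two captures of the same boot show different values for option 66, that points to more than one DHCP server answering.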
-
https://drive.google.com/file/d/0B7KKPwLD3OQQU3l5RGg5MTNpSGs/view?usp=sharing is the output file when trying to boot from the same subnet.
https://drive.google.com/file/d/0B7KKPwLD3OQQZGNEdFc0WmF2Smc/view?usp=sharing is the output file when trying to boot from the workstation subnet.
-
@CamGreezy digesting it right now.
-
@CamGreezy first file (will need you to process what I will tell you)
pxe booting hardware vmware vm in bios (legacy) mode.
DHCP server 172.16.50.41 (which may also be an AD/DNS server) responds with:
- next server 172.16.50.29
- boot file undionly.kpxe
Then we see a normal request and ack sequence
Target computer queries 172.16.50.29 for undionly.kpxe file size
Target then asks for undionly.kpxe from tftp server
Then nothing… I would expect to see the target computer download that file.
Then after a timeout the client goes through a discover / offer sequence.
I don’t see any reference to 172.16.50.23 in the first one.
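For anyone reading along: the file size query and the download request in that sequence are both TFTP read requests (RRQ). The size query is an RRQ carrying the `tsize` option defined in RFC 2349. A minimal Python sketch of how those two packets are laid out, assuming octet mode; `build_rrq` is a made-up helper name, and a real PXE ROM may add other options such as `blksize`.

```python
import struct

def build_rrq(filename, options=None):
    """Build a TFTP read request: opcode 1, filename, mode, then
    optional name/value option pairs, each NUL-terminated."""
    packet = struct.pack("!H", 1)                  # opcode 1 = RRQ
    packet += filename.encode("ascii") + b"\x00"
    packet += b"octet\x00"                         # binary transfer mode
    for name, value in (options or {}).items():
        packet += name.encode("ascii") + b"\x00"
        packet += str(value).encode("ascii") + b"\x00"
    return packet

# Size query: tsize=0 asks the server to report the file size.
size_query = build_rrq("undionly.kpxe", {"tsize": 0})

# Plain download request with no options.
download = build_rrq("undionly.kpxe")
```

In the capture, the server answering the first request but the client never pulling data after the second is the part that looks wrong.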
-
@CamGreezy The second one doesn’t have any usable information. It just looks like more of the post dhcp request from the first one.
-
I can’t find anything that explains the picture you posted initially other than there is a second dhcp server somewhere with a different next server (dhcp option 66) setting.
-
Oh ok, that’s exactly the issue. Have not worked with PXE much before and thought the option 66 and 67 info would replicate to our 2nd DC, but that was not the case. Fixing that lets me boot right into FOG. Thanks for the help!
-
I expect the 2nd file didn’t give any output because it was from a different subnet. Could you please point me to the info I need to configure FOG to run across subnets before you mark this as solved?
Thanks again, feel a little silly but now I know!
-
@CamGreezy Great you found the issue. This one was a real brain buster because I didn’t see a second offer in the dhcp pcap you sent. A lot of times if there are two dhcp servers on the same network you will see an offer from both. That tells us right away what is wrong. Obviously you had a silent dhcp server when we were scanning.
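To illustrate the "offer from both" check, here is a small Python sketch that groups offers by the boot settings they hand out. The second DHCP server address (172.16.50.42) and its boot file name are hypothetical placeholders, since the thread never shows that offer; only 172.16.50.41, 172.16.50.29, 172.16.50.23 and undionly.kpxe come from the posts above.

```python
from collections import defaultdict

def conflicting_offers(offers):
    """Group DHCP offers by (next_server, boot_file).

    Returns the groups when servers disagree on where to boot from,
    or an empty dict when every offer carries the same settings.
    """
    groups = defaultdict(list)
    for offer in offers:
        key = (offer["next_server"], offer["boot_file"])
        groups[key].append(offer["dhcp_server"])
    return dict(groups) if len(groups) > 1 else {}

offers = [
    # Seen in the capture: the first DC pointing at the FOG server.
    {"dhcp_server": "172.16.50.41",
     "next_server": "172.16.50.29",
     "boot_file": "undionly.kpxe"},
    # HYPOTHETICAL: a second DC still pointing at the old server.
    # The address and boot file name here are invented for the example.
    {"dhcp_server": "172.16.50.42",
     "next_server": "172.16.50.23",
     "boot_file": "old-bootfile.example"},
]

conflicts = conflicting_offers(offers)
for (next_server, boot_file), servers in conflicts.items():
    print(f"{servers} -> next server {next_server}, boot file {boot_file}")
```

If only one offer shows up in a capture, as happened here, this kind of check finds nothing, which is why the second server was hard to spot.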
Anyway. It sounds like you are good to go now.
-
@CamGreezy said in TFTP/PXE Boot Issues:
I expect the 2nd file didn’t give any output because it was from a different subnet. Could you please point me to the info I need to configure FOG to run across subnets before you mark this as solved?
Whoops, I just marked it solved.
FOG will run across subnets as long as you have routing set up correctly on the FOG server. If the FOG server can ping hosts on remote subnets, then you should be able to image across subnets. Are you seeing a specific issue?
-
Nothing specific, just know it’s something I have to do that I haven’t yet. I was having trouble finding anything referring to that specifically on the wiki so if you have a link to any documentation that would be helpful, otherwise I’ll put some time into it and make a new post if I hit any specific issues.
-
@CamGreezy If the subnet is at the same site, then as long as routing is working you can image. It uses the same tools. There is an issue with WOL (wake on LAN): if you have different subnets, you need to allow directed broadcasts on your network. But from an imaging standpoint, as long as the FOG server can reach the client, it should work.
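On the WOL point, a minimal Python sketch of why directed broadcasts matter: the magic packet has to be sent to the broadcast address of the remote subnet, and the routers in between must be willing to forward it. The subnet 172.16.60.0/24 and the MAC address are made-up examples, not values from this thread, and this is not how FOG itself sends WOL.

```python
import ipaddress
import socket

def magic_packet(mac):
    """Wake-on-LAN payload: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def directed_broadcast(cidr):
    """Broadcast address of a subnet, e.g. the remote workstation subnet."""
    return str(ipaddress.ip_network(cidr, strict=False).broadcast_address)

def send_wol(mac, cidr, port=9):
    """Send the magic packet to the subnet's directed broadcast address.

    Only reaches the remote subnet if routers allow directed broadcasts.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (directed_broadcast(cidr), port))

# Hypothetical workstation subnet and client MAC, for illustration only.
target = directed_broadcast("172.16.60.0/24")
payload = magic_packet("00:11:22:33:44:55")
```

Calling `send_wol("00:11:22:33:44:55", "172.16.60.0/24")` would attempt the actual send.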
Now if your remote site is connected over a low speed link (like mpls) then you might want to think about installing a fog storage node at the remote location to help with the imaging process.