Unable to PXE boot on from different subnet

Defcon

@george1421 Yep, you’re right. No luck haha! Any other thoughts?

george1421

@Defcon As I pointed out earlier, I find it strange that your packet size is 558. Is that consistent across your entire network or just the subnets in question?

In the case of 10.80.x.x, is that VLAN connected at GbE speeds, or is it at a remote location (off-campus)?

Is it correct to say every computer on that vlan can’t pxe boot or is it random?

Defcon

@george1421 It’s at a remote location, and the PXE booting is random for sure. The ones that work at the moment are the Dell Optiplex GX520 desktops. I tested PXE at the high school and the packet size there is 1503. That is the same packet size I see here, where the server is located. Yikes…

george1421

@Defcon So just to confirm: at that remote location, random PCs will PXE boot?

How is that remote location connected to the main location? What technology is being used (MPLS, DSL, VPN over internet, etc.)?

Defcon

@george1421 Hey George, apologies for the slow response. I wanted to get more information regarding this, but the employee was gone on vacation.

The school is actually a direct connect, no VPN etc. Apparently we own that fiber as well. Regarding the random PCs that boot, so far it’s only the GX520 desktop that’s able to boot into PXE. All of the schools that are connected to the main district are direct connect (something new I learned). Do you think this may be a switch issue?

george1421

@Defcon OK, let’s focus on just a single computer on a single subnet that doesn’t pxe boot. Once that is identified, I would like to take a (dumb) unmanaged switch and place it between the building switch and the pxe booting computer to see if that ‘masks’ the issue.

Sebastian Roth

I feel like there are some parts of the picture still missing. So first I shall mention that the packet size is not playing a role here as far as I can see. The figure 558 is just the size of the full packet (TFTP plus headers for UDP, IP and ethernet/MAC). In the TFTP RFC (page 6) it says:

The data field is from zero to 512 bytes long.

So this is perfectly fine I reckon.
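As a quick sanity check on the 558 figure, the arithmetic can be sketched in Python. This assumes an IPv4 header without options, an untagged Ethernet frame as a capture tool typically shows it (no trailing FCS), and no negotiated TFTP block-size option; the constant and function names are made up for illustration.

```python
# Sizes in bytes. Assumptions: IPv4 without options, untagged Ethernet
# frame as captured (no FCS), default TFTP block size (no blksize option).
TFTP_DATA_MAX = 512      # maximum data field per the TFTP RFC
TFTP_HEADER = 4          # 2-byte opcode + 2-byte block number
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_HEADER = 14     # destination MAC + source MAC + EtherType


def tftp_frame_size(data_len: int = TFTP_DATA_MAX) -> int:
    """Return the captured frame size for one TFTP DATA packet."""
    if not 0 <= data_len <= TFTP_DATA_MAX:
        raise ValueError("default TFTP data field is 0..512 bytes")
    return data_len + TFTP_HEADER + UDP_HEADER + IPV4_HEADER + ETHERNET_HEADER


print(tftp_frame_size())  # 558
```

A full 512-byte data block plus the four headers adds up to exactly 558, so that size on its own does not point to fragmentation or an MTU problem.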

Second: The first transfer finishing and another one starting right after it, in the second picture of the initial post, looks strange at first. But there is a major time gap between those two, so I think this is because it’s trying again after a reboot.

@Defcon I am still missing the actual error here. The text you posted (ending with “Installation failed cannot continue”) seems to be copied from Wireshark and is probably just the readable strings within the transferred iPXE binary. Could you please describe exactly what you see when things go wrong? Take a picture or even a video of a failed PXE boot!

Defcon

@Sebastian-Roth Thank you for the reply! Sorry for the late response; people take a lot of vacation during the summer months of school.

I heard back from the technician over there, and she sent over screenshots. Here they are below.

HP 7800 Error Screen

Dell Optiplex 390 Error Screen

george1421

@Defcon I’m still of a mind to say that this is/could be a spanning tree issue. Can we assume both of these computers in the pictures above are on the same subnet in the same building?

Please test this idea by placing an unmanaged switch between the pxe booting computer and the building switch. Then pxe boot the computer. Confirm if you can get to the fog iPXE menu.

What you are experiencing is what we typically see when spanning tree (a good thing) is turned on but is not configured for one of the fast spanning tree protocols (fast-STP, RSTP, or whatever your switch mfg calls it). Placing the unmanaged switch between the pxe booting computer and the building switch will keep that building switch port from winking (dropping and re-establishing link) as the pxe booting computer starts up.
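The timing argument can be illustrated with a rough Python sketch. The 15-second forward delay is the classic 802.1D default (a port spends that long in listening and again in learning, so about 30 seconds before it forwards). The 10-second DHCP timeout used in the example is a made-up value for illustration only, not a measured figure for any particular PXE ROM or iPXE build, and the function names are hypothetical.

```python
def seconds_until_forwarding(forward_delay_s: float = 15.0,
                             edge_port: bool = False) -> float:
    """Time a classic 802.1D port waits after link-up before forwarding.

    edge_port=True models a port set to portfast/edge mode (or a link that
    never dropped because an unmanaged switch sits in between).
    """
    if edge_port:
        return 0.0
    # listening + learning, each lasting one forward delay
    return 2 * forward_delay_s


def dhcp_can_succeed(dhcp_timeout_s: float,
                     forward_delay_s: float = 15.0,
                     edge_port: bool = False) -> bool:
    """True if the port forwards traffic before the client gives up on DHCP."""
    return seconds_until_forwarding(forward_delay_s, edge_port) < dhcp_timeout_s


# Assumed 10 s client timeout, purely illustrative.
print(dhcp_can_succeed(10.0))                  # False
print(dhcp_can_succeed(10.0, edge_port=True))  # True
```

Each link reset during the boot chain restarts that wait, which is why a client can fail at different stages depending on how patient its firmware is.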

JLE

@defcon said in Unable to PXE boot on from different subnet:

On the subnet that is 10.80.x.x the Dell computer won’t PXE boot, but when I bring the computer physically on this network it boots just fine in PXE.

When you move this computer are you hooking it up to an entirely different switch? I recently ran into both of those errors you have posted a picture of. Here’s a checklist I’ve found that works for us:

IP helper address or DHCP relay set up on each VLAN, and on each switch.
Spanning tree set to rapid-pvst (because of the switch model that we have).
Portfast enabled.

Specifically with that bottom error, I had to add all of the hosts to a new group (that I called Encryp Reset), go into the general settings for that group, and reset their encryption data. I deployed an image to the hosts again and got the same error (no configuration methods succeeded). Then I rebooted the computer, and on the next cycle it worked just fine. I’ve had to do this maybe 50-60 times so far. Random Dell Optiplex 990s just seem to do it.

Defcon

I just want to say thank you for all the help! I am going to see if there is a switch lying around that I can test this with. @JLE It’s on a completely different switch. I had a feeling it was some sort of switch issue; I just wasn’t sure what. Unfortunately I don’t have access to configure the switches (apparently a third party does this??). I’ll get back to you guys once I have found an unmanaged switch and tested it out.

lmioperations

Something JLE said is the first thing I thought about (ip helper-addresses). If you can get access to log in to the switch, find the relevant documentation online for your switch and verify whether the VLAN has an ip helper-address configured.

On our ProCurves, we could check like this:

show ip helper-address

or we could configure like this:

config
vlan <vlan_number>
ip helper-address <IP_of_your_DHCP_server>
write mem
end
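For context on why the helper address matters: a DHCP discover is a broadcast and does not cross a router by itself, so the relay on the client VLAN fills in the gateway address field (giaddr) and forwards the packet to the DHCP server as unicast. Below is a minimal Python sketch of that one step, based on the BOOTP header layout. The function name and the 10.80.0.1 address are hypothetical, and a real relay does considerably more (hop limits, relay options, handling replies).

```python
import ipaddress

HOPS_OFFSET = 3        # 1-byte hop count in the BOOTP header
GIADDR_OFFSET = 24     # 4-byte relay agent (gateway) address
BOOTP_FIXED_LEN = 236  # fixed header length before the options field


def relay_stamp(packet: bytes, relay_ip: str) -> bytes:
    """Return a copy of a BOOTP/DHCP request as a relay would forward it."""
    if len(packet) < BOOTP_FIXED_LEN:
        raise ValueError("packet shorter than a BOOTP header")
    out = bytearray(packet)
    # Count this relay as one hop (capped so the byte cannot overflow).
    out[HOPS_OFFSET] = min(out[HOPS_OFFSET] + 1, 255)
    # Only the first relay sets giaddr; later relays leave it untouched.
    if out[GIADDR_OFFSET:GIADDR_OFFSET + 4] == b"\x00\x00\x00\x00":
        out[GIADDR_OFFSET:GIADDR_OFFSET + 4] = ipaddress.IPv4Address(relay_ip).packed
    return bytes(out)


# Fake discover: op=1 (request), htype=1 (Ethernet), hlen=6, hops=0, rest zeroed.
discover = bytes([1, 1, 6, 0]) + bytes(232)
stamped = relay_stamp(discover, "10.80.0.1")  # hypothetical VLAN gateway
print(stamped[HOPS_OFFSET], ipaddress.IPv4Address(stamped[24:28]))  # 1 10.80.0.1
```

The DHCP server uses giaddr to pick the right scope for the client, which is why each VLAN needs its own helper address on its routed interface.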
