Optiplex 390 with Realtek NIC
Running FOG version 1.4.2
These machines imaged back in the summer, but for some reason they now get a TFTP timeout error. Plugging an Optiplex 780 with an Intel NIC into the same drop the Realtek is plugged into works perfectly fine.
I tried downloading the latest kernel, renaming it, and pointing that host at that specific kernel, but it still did the same thing.
The DHCP server is, and always has been, set to use undionly.kpxe.
This also isn’t specific to this drop; there are other 390s doing the same thing on the other side of the building.
Just as an FYI, we had to remove the DHCP relays on the APs. The switches were apparently handling the relays anyway, so the ones on the APs were not needed.
They are supposed to handle the wireless clients getting DHCP. We have one DHCP server, and the APs relay DHCP requests to it. This is the case in all buildings. It’s really just odd that it’s only Realtek acting this way. None of the Intel NICs behave like this or have this issue. Plus, these APs were not in this building until December, which is why the machines imaged this summer just fine.
So they aren’t actually handing out addresses, they are just relaying the requests. Not sure why a wired client is trying to relay through the AP, or why only Realtek. (Thankfully we only have a handful of these.)
@adukes40 Can you explain a bit more about your network? Why are APs handing out DHCP addresses? Are you using consumer APs in your network?
A pcap will tell you a bit more about what is going on. I can tell you we had issues (many moons ago) with the 780 and 790 with spanning tree enabled on the building switch (an old Netgear switch). We finally found that we needed to enable fast spanning tree on the switch port. The device was not getting a DHCP address during PXE boot, but in Windows it was working fine. I’m not saying that this is your case here, but you might want to make sure it’s not a spanning tree issue either. Placing an unmanaged switch between the computer and the building switch is a quick test to see if it’s a spanning tree issue.
adukes40 last edited by adukes40
Quick update: I have tested at the building where the FOG and DHCP servers reside. The 390s there work just fine. I’m assuming the DHCP server grabs the request before the APs get a chance to. On the other hand, we took a 390 to a remote site where the APs need to relay to the DHCP server, and it has the exact same issue as in my building. It seems the AP is passing the DHCP correctly, but it cannot do the TFTP, which is where the timeout is occurring.
This is going to be a fun one. I will try to keep this updated as I find out new information regarding the Realtek NICs and Aerohive APs.
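One way to separate "TFTP is unreachable from this subnet" from "the Realtek PXE ROM is doing something odd" is to send a plain TFTP read request from an ordinary computer on the affected drop. The following is only a rough sketch, not part of FOG: the function names are made up for illustration, and it assumes the FOG server serves undionly.kpxe over TFTP on the standard port 69.

```python
import socket
import struct

TFTP_PORT = 69  # standard TFTP port


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request (opcode 1) as laid out in RFC 1350."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def describe_reply(packet: bytes) -> str:
    """Summarise the first packet a TFTP server sends back."""
    if len(packet) < 4:
        return "short packet"
    opcode, value = struct.unpack("!HH", packet[:4])
    if opcode == 3:  # DATA: value is the block number
        return f"DATA block {value}, {len(packet) - 4} bytes"
    if opcode == 5:  # ERROR: value is the error code
        message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        return f"ERROR {value}: {message}"
    return f"unexpected opcode {opcode}"


def probe(server: str, filename: str = "undionly.kpxe",
          timeout: float = 5.0) -> str:
    """Send one read request and report what, if anything, comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        try:
            packet, peer = sock.recvfrom(2048)
        except socket.timeout:
            return "timeout: no TFTP reply"
        except OSError as exc:
            return f"socket error: {exc}"
        return f"{peer[0]}:{peer[1]} -> {describe_reply(packet)}"
```

Calling `print(probe("192.0.2.10"))` with the real FOG server address in place of the placeholder should print a DATA line if TFTP gets through. The server replies from a new source port, so a local firewall on the test computer can also cause a timeout here.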
Welp, took the 390 over to the other building. It only sees one gateway there… something must have got jacked in this building during the switch configs. I will need to contact our state guy that handles these. I will see if I can get info out of them.
The building where this is failing is on the same campus as the FOG server, but they are separate subnets. I will take the machine over there and see if it does the same thing. If so I will attempt the pcap.
Still doesn’t make sense why only the Realtek machines are seeing this.
Possibly Realtek’s PXE implementation is a little different and that’s why. But this definitely looks like an interesting issue that you had better look into using George’s advice on wireshark/tcpdump…
@adukes40 Is the FOG server on the same subnet (vlan) as these 390s?
If they are then we can get a quick answer if you follow the instructions in this tutorial: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
If they are not, getting the answer is a bit harder but possible. You can use wireshark on a computer that’s plugged into the same subnet as the 390s to capture part of the dhcp/pxe boot process. That will give us an idea who the actors are during this pxe boot.
Having the FOG server collect the pcap will give us the best quality pcap since it will see the entire conversation. But the client has to be on the same subnet as the fog server to do that.
Upload the pcap to dropbox or a google drive and share the link with us or DM me directly and I’ll take a look at it for you.
Something I noticed about this picture I did not notice prior. This is showing two gateways. There should only be one. I booted up a Latitude 3330 with an Intel nic and I get the correct gateway. I rebooted the same 390, now instead of 3.62, it shows 3.70, along with the 1.1. I have never seen that before in 10 years.
EDIT: Digging a little more, the 3.xxx IP addresses are the DHCP reservations of our access points. Going to investigate further. Still doesn’t make sense why only the Realtek machines are seeing this.
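Once a pcap exists, the fields to check for the two-gateway question are the router option (option 3) and the relay agent address (giaddr) in each DHCP offer, along with options 66 and 67. Here is a minimal sketch of pulling those out of a raw DHCP payload, assuming the UDP payload has already been extracted from the capture. The function name and the addresses in the usage note are made up for illustration.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options


def parse_dhcp_reply(payload: bytes) -> dict:
    """Extract relay, gateway and PXE-related fields from a DHCP payload."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    ip = socket.inet_ntoa
    info = {
        "hops": payload[3],            # incremented by each relay agent
        "yiaddr": ip(payload[16:20]),  # address offered to the client
        "siaddr": ip(payload[20:24]),  # next-server address
        "giaddr": ip(payload[24:28]),  # relay agent that forwarded the request
        "routers": [],                 # option 3, may list several gateways
        "server_id": None,             # option 54
        "tftp_server": None,           # option 66
        "bootfile": None,              # option 67
    }
    pos = 240
    while pos < len(payload):
        code = payload[pos]
        if code == 255:  # end option
            break
        if code == 0:    # pad option has no length byte
            pos += 1
            continue
        if pos + 1 >= len(payload):  # truncated option, stop parsing
            break
        length = payload[pos + 1]
        value = payload[pos + 2:pos + 2 + length]
        pos += 2 + length
        if code == 3:
            info["routers"] = [ip(value[i:i + 4])
                               for i in range(0, len(value) - 3, 4)]
        elif code == 54 and length == 4:
            info["server_id"] = ip(value)
        elif code == 66:
            info["tftp_server"] = value.rstrip(b"\x00").decode("ascii", "replace")
        elif code == 67:
            info["bootfile"] = value.rstrip(b"\x00").decode("ascii", "replace")
    return info
```

If an offer seen by a 390 came back with something like `routers == ["10.0.3.62", "10.0.1.1"]` or a giaddr in the AP reservation range, that would point at the AP relay rather than the DHCP scope itself. Comparing that against the offer an Intel machine receives on the same drop would be the interesting part.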
@adukes40 If your fog server and target computers are on the same subnet (really helps to see the entire pxe booting process) then we have a tutorial on how to capture a pcap of the pxe boot dialog. The pcap tells us what is really flying down your wires. Hopefully we can find what is unique with the 390s.
Capture the pxe booting process until it fails then upload the pcap to dropbox or a google drive and share the link with us. The filter in the tutorial is crafted to only capture pxe booting and nothing else.
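For a first look at a capture before sharing it, a short script can list who is talking on the DHCP/PXE/TFTP ports. This is only a sketch and no substitute for opening the file in wireshark. It assumes a classic little- or big-endian pcap file with Ethernet framing (not pcapng), and the port set 67, 68, 69 and 4011 is my assumption about what is relevant, not necessarily the exact filter from the tutorial.

```python
import socket
import struct
from collections import Counter

PXE_PORTS = {67, 68, 69, 4011}  # DHCP server/client, TFTP, proxyDHCP


def pxe_talkers(pcap_bytes: bytes) -> Counter:
    """Count UDP packets on PXE-related ports per (src, sport, dst, dport)."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    talkers = Counter()
    pos = 24  # skip the global header
    while pos + 16 <= len(pcap_bytes):
        incl_len = struct.unpack(endian + "I", pcap_bytes[pos + 8:pos + 12])[0]
        frame = pcap_bytes[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len

        offset = 12  # EtherType follows the two MAC addresses
        if frame[offset:offset + 2] == b"\x81\x00":  # skip one 802.1Q tag
            offset += 4
        if frame[offset:offset + 2] != b"\x08\x00":  # IPv4 only
            continue
        ip = frame[offset + 2:]
        if len(ip) < 20 or ip[9] != 17:  # protocol 17 is UDP
            continue
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if len(ip) < ihl + 4:
            continue
        sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        if sport in PXE_PORTS or dport in PXE_PORTS:
            key = (socket.inet_ntoa(ip[12:16]), sport,
                   socket.inet_ntoa(ip[16:20]), dport)
            talkers[key] += 1
    return talkers
```

Running `pxe_talkers(open("output.pcap", "rb").read())` on a capture (the file name is a placeholder) gives a quick list of every address that answered on port 67. More than one responder, or a responder in the AP address range, would be worth flagging when posting the pcap. Note that only the initial TFTP request goes to port 69; the transfer itself moves to other ports and will not be counted here.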
I will be back at the shop tomorrow, I will get a pic then.
It grabs DHCP, but not TFTP. error 32 I believe, but I will get that pic for you.
"they now decide to get the TFTP timeout error."
Can you please take a picture of that and post it here? Just so we can see exactly where this happens. More often than not we see something in a picture that helps immensely in finding a solution.
@wayne-workman before testing anything else I will just fire these back real quick…
Optiplex worked summer 2017 - Correct
Optiplex 780s plugged into the ports that the 390 was using still work - so network problems are ruled out. - My thoughts
DHCP server is ruled out since you’ve not changed it at all. - Options 66&67 have not been touched.
Check your patch cables between the 390 and the building, ensure they are not kinked/ripped/broken/shredded/broken heads/clips/loose connections/etc (high schoolers are really hard on patch cables). - I have the 390 in my office I’m testing with, which is not working either. Plugged another 390 in the same spot, no go. That’s when I plugged the 780 in and it worked.
Please try a dumb mini-switch between a 390 and the building’s port - let us know if that changes things. - This is the part I still need to do. However, we just put Aerohive APs up in our school, which requires us to have our state trunk ports on the switches. On the other hand, I set the port to trunk, but since the 780 worked, this shouldn’t be an issue either.
Did you do a bios update on the 390? Have you tried updating the bios? - Have not done that yet.
Wayne Workman last edited by
@adukes40 I’m just thinking out loud here…
- The Optiplex 390 worked for you in summer of 2017.
- The Optiplex 390 no longer works.
- Optiplex 780s plugged into the ports that the 390 was using still work - so network problems are ruled out.
- Did you do a bios update on the 390? Have you tried updating the bios?
- DHCP server is ruled out since you’ve not changed it at all.
- Please try a dumb mini-switch between a 390 and the building’s port - let us know if that changes things.
- I don’t think it’s a kernel problem.
- Check your patch cables between the 390 and the building, ensure they are not kinked/ripped/broken/shredded/broken heads/clips/loose connections/etc (high schoolers are really hard on patch cables).