PXE boot halts with MTFTP message on specific platforms

mecsr

Just a few weeks ago, I had no problems PXE booting into FOG on VMware Workstation VMs and HP Elite 8000 workstations. In the last week or two they all suddenly stopped booting to PXE. The client gets an IP address and all that jazz, but then it says MTFTP… and sits there until it times out.

I played with my dnsmasq settings a little with no change.
They all have next-server and filename options set on the DHCP server and are added in FOG. They worked fine before, so what could have changed to cause this? Maybe a Linux update?

Every other computer model we use boots to PXE just fine, even in different buildings, which makes it all the more confusing that only this one model and the VMs don’t work.

Any help would be wonderful.

Thanks,
-JJ

Oh, and other general info:
FOG is running on Ubuntu Server 14.04 x64
Apache web server
version 1.20 (I am not opposed to an SVN release if there’s one that might fix this problem)

Tom Elliott

No changes, except it sounds like those systems are booting from UEFI now.

mecsr

The 8000 Elite is about 4 years old and doesn’t support UEFI. I just double-checked, and I have the VM set to legacy mode.
Or are you saying that UEFI iPXE boot is working in the latest SVN?

Sebastian Roth

Please try capturing network traffic with wireshark to see if TFTP traffic is going back and forth…
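If the FOG server has no GUI, a rough command-line equivalent is tcpdump. The sketch below only builds the capture filter; the helper name `pxe_capture_filter`, the placeholder address and the interface `eth0` are assumptions for illustration, not something taken from this setup. The filter keeps all UDP to or from the client (TFTP moves to ephemeral ports after the first request) plus the DHCP ports, because the first DHCP packets are broadcast before the client has an address.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump capture filter for one PXE client.
# $1 = the client's IP address (placeholder; substitute your own).
pxe_capture_filter() {
    printf 'udp and (host %s or port 67 or port 68)' "$1"
}

# Show the filter for a placeholder client address.
pxe_capture_filter 192.0.2.10
echo

# To capture (as root, on the FOG server; eth0 is an assumed interface name):
#   tcpdump -n -i eth0 -w pxe.pcap "$(pxe_capture_filter 192.0.2.10)"
```

The resulting pxe.pcap file can then be opened in Wireshark on any other machine.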

Tom Elliott

What dhcp server are you using?

mecsr

@Uncle Frank - As in, turn on Wireshark on the computer that is hosting the VM and monitor it from there?

@Tom Elliott - The DHCP server isn’t something I have much control over. I know it’s on a linux machine and I can make host specific options for next-server and filename. It’s managed by the college that my department is in.

I read something about option 43 relating to this, but I don’t know if that’s something I can change. [url]http://scug.be/sccm/2011/01/13/configmgr-2007-pxe-boot-amp-mtftp-defaulting-and-make-you-wait-for-10-15-minutes/[/url]

The only thing that has changed is that the port the FOG server is on got upgraded to gigabit, so it may be going to a different switch in the upper infrastructure than it did before. But if that was the problem I would imagine that it would be on more than just 2 specific models that the problem would occur. Though I did read something about portfast needing to be enabled on the switch level, but I can’t find where that was again.

There is another related problem that may be of help, and it only seems to apply to the 8000 Elites. Before, when I had a new computer that wasn’t registered in FOG and had no reserved IP address on the DHCP server, I could network boot it on the same switch as the FOG server and the dnsmasq proxy DHCP would get it to boot to FOG with no problem. Now, only on this one computer model that we have a ton of, it gives an error that says something to the effect of “No proxy dhcp requests were received” and then something about port 4011. I’ll try to recreate the error later, when I have the time, and post the exact message.

Thanks for all the help, I really appreciate it.

Thanks,
-JJ

mecsr

I did the Wireshark capture, and it showed TFTP being aborted with error code 59, which is of course undefined in Wireshark.

I attached some screenshots to help visualize the problem and the tools I have to work with…

fyi
mefog is the hostname of the fog server and the ip is 10.2.114.238
10.2.114.174 is the ip of the vm I’m trying to boot to pxe.

Thanks again,
-JJ

[url=“/_imported_xf_attachments/1/1706_dhcp options.PNG?:”]dhcp options.PNG[/url]
[url=“/_imported_xf_attachments/1/1705_dhcp entry.PNG?:”]dhcp entry.PNG[/url]
[url=“/_imported_xf_attachments/1/1707_mtftp.PNG?:”]mtftp.PNG[/url]
[url=“/_imported_xf_attachments/1/1708_wireshark.PNG?:”]wireshark.PNG[/url]

Tom Elliott

Does undionly.0 exist “real” or symlinked?

mecsr

It’s a symlink

mecsr

I’m trying the latest SVN to see if it makes a difference. I’ll also try pointing the filename to undionly.kpxe instead.

mecsr

It didn’t make a difference. =(

Sebastian Roth

AFAIK ‘TFTP Aborted’ here just means that the client tricks the server into telling it the file size before it requests the whole file. Do you see any more TFTP traffic after this (maybe leave it running for a few minutes)?

By the way, 59 is the size of the packet and NOT the error code!

For a more detailed explanation see here: [url]http://www.vercot.com/~serva/an/WindowsPXE1.html[/url] (8.7- Troubleshooting TFTP issues.)
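As a sanity check on the size-versus-code point, the arithmetic can be sketched in shell. This assumes the abort is a standard TFTP ERROR packet (2-byte opcode, 2-byte error code, NUL-terminated message) carrying the 12-character message "TFTP Aborted", sent over IPv4 and Ethernet with no header options. None of that is confirmed from the screenshots, and the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: frame length of a TFTP ERROR packet on Ethernet/IPv4.
# $1 = the error message text carried in the packet.
tftp_error_frame_len() {
    msg=$1
    # Ethernet header 14 + IPv4 header 20 + UDP header 8
    # + TFTP opcode 2 + TFTP error code 2 + message + terminating NUL 1
    echo $((14 + 20 + 8 + 2 + 2 + ${#msg} + 1))
}

tftp_error_frame_len "TFTP Aborted"   # prints 59
```

Under those assumptions a frame length of 59 is exactly what an abort with that message would produce, independent of the error code field.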

mecsr

Oh yeah, I knew that about the packet size. I just haven’t used Wireshark in a while; how silly of me.
I had left the packet capture running and noticed that there was some more TFTP traffic about 10 minutes after the TFTP abort.
It transfers 203 blocks, one packet at a time, and then sends default.ipxe.
Since I wasn’t monitoring it when this happened, I don’t know whether it eventually got to the boot screen.
So I started an upload task and stayed and watched this time.
It seems it does boot into it, yay! But it takes an extra 10 minutes on VMs and the HP 8000 Elite. And there’s the annoying proxy DHCP issue on the 8000 Elite as well.
I’m not sure I’d quite call this resolved, but I suppose having to wait an extra 10 minutes isn’t that big of a deal.

Thanks,
-JJ

Sebastian Roth

Interesting stuff… I’m sure I read something about a 10-minute issue somewhere yesterday… Ah, found it! It is in German, though: [url]https://forum.opsi.org/viewtopic.php?f=7&t=910[/url]
Try an online translator, or I can give you some hints on what they are talking about. In the first post you see two packet dumps. The first one is working perfectly; the second one (pay attention to the timestamps) has the 10-minute issue. In the last post the same guy says that he’s solved it and mentions a misconfiguration of their DHCP server. In his case vendor-specific options (option 43) were misinterpreted by the TFTP client and made it do this weird thing of timing out after 10 minutes before retrying… hope that helps.

EDIT: Just found another post in a different forum: [url]https://social.technet.microsoft.com/forums/systemcenter/en-US/db17272b-68f1-4b18-a9b7-b0391bf846d8/pxe-booting-strange-problem[/url]
And Tom is right, it boils down to the question of who configured this option and why. We see quite a few people having TFTP issues lately. Maybe it’s the same issue?
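One way to check whether the DHCP offers on your network carry option 43 at all is to look at verbose tcpdump output. This is a minimal sketch, assuming tcpdump’s usual `-vvv` text format, in which each DHCP option is printed as "<name> Option <number>, length <n>". The function name and the interface `eth0` are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `tcpdump -vvv` text on stdin and succeed
# if any DHCP packet in it carried option 43 (vendor-specific information).
offers_option_43() {
    grep -q 'Option 43, length'
}

# Usage (as root; eth0 is an assumed interface name), while a client PXE boots:
#   tcpdump -vvv -n -i eth0 -c 20 'port 67 or port 68' 2>/dev/null \
#       | offers_option_43 && echo "option 43 present"
```

If the option shows up only in offers to the affected models, that would point at a host-specific or vendor-class-specific setting on the DHCP server.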

Tom Elliott

If changing option 43 fixes it, the question then becomes: did you implement VoIP, or has VoIP been introduced on your network?

mecsr

Well, after updating to the latest SVN, watching the packets while the MTFTP step took 10 minutes on a VM, and then restarting the server, the 8000 Elites magically started working again, and it looks like the VMs did too.
Maybe option 43 was enabled somewhere I don’t have control over. It’s a common model in the college, and some people using the same DHCP server use LANDesk for their imaging, so maybe they were trying something with that. I’ll ask around; I don’t like magical solutions to problems other people might be looking for answers to.
Maybe I just had to let the MTFTP run its course a few times?

As for VoIP, that does exist on our network, lots of IP phones, but it’s never caused a problem before.

Now I am having a new problem, though. The answer’s probably somewhere else in these forums. I’m now getting a “can’t mount NFS” error, probably to do with updating FOG versions. Oh, and by the by, I really like the new interface and I’m excited to try out this Mac OS X image option. But first I gotta solve this NFS not mounting thing.

Thanks for all your help. I love these FOG forums, so very helpful all the time.

-JJ

Sebastian Roth

[quote=“mecsr, post: 42909, member: 23886”]As for VOIP, that does exist on our network, lots of ip phones, but it’s never caused a problem before.[/quote]
Kind of weird isn’t it. Just another one of those “haven’t changed anything” issues. Just kidding…

[quote=“mecsr, post: 42909, member: 23886”]I’m now getting a can’t mount nfs error.[/quote]
Try mounting the shares from another computer and see if you get any errors…
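[CODE]# Run as root on another machine; replace <server-ip> with the FOG server's address
mkdir -p /mnt/nfstest && mount -v <server-ip>:/images /mnt/nfstest
umount /mnt/nfstest
mount -v <server-ip>:/images/dev /mnt/nfstest
umount /mnt/nfstest && rmdir /mnt/nfstest[/CODE]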

mecsr

Thanks, I just tried that, but I noticed another thread where someone is having the same problem and where you told them to try the same thing, so I’m going to post the results over there:
[url]http://fogproject.org/forum/threads/problem-with-permission-denied-after-fog-server-restart.12503/[/url]

Gotta help keep the forum organized and whatnot, right? I’ll post here again if I hear back about whether or not option 43 was touched by someone else.

Thanks again for all the help.
