Using a KVM/Libvirt VM as Master Image - Odd PXE boot issues
We have some VMs that we use as master images for a bunch of machines in our training room. We update those VMs and refresh the master image on FOG from them. There are some differences in how PXE boot works with the VMs compared to the physical machines:
For an Ubuntu VM we have to intervene and press F8 to get PXE boot to happen. If we don’t, the PXE boot times out and the VM boots normally. On our physical machines, doing nothing causes the PXE environment to boot without issue.
We have one Windows XP machine which is used for the trainers’ machines, as there is one course which requires PDF slides that need one to authenticate to a secure server every time they are displayed (I know it’s really stupid but it’s not up to us). Unfortunately this feature doesn’t work in the PDF viewers on Linux, so we use the Windows XP VM to display the slides. This VM won’t boot to PXE no matter what. The PXE boot process starts but it always fails to find anything and ends up booting the VM.
Any ideas why there are these different behaviours? I would imagine that once the BIOS starts booting PXE there shouldn’t be any difference in the process of locating the PXE image file, downloading it and booting to the FOG menu, as this all happens before the OS boots up?
@mxc bumping this…
@mxc Any news on this?? PCAP file?
Wayne Workman last edited by
@mxc It could be something as simple as an IP conflict or a bad patch cable. Check the simple things first.
I know this is going to sound weird but the Linux KVM image started failing to find the tftp server as well. Once I set up tcpdump to investigate on the fog server it all started working.
I tried the same with the Windows image and now it’s working too. The only straw I can clutch at here is latency of some kind in the VM network layer. Maybe delaying the boot gave the DHCP server time to respond?
Will see what I experience going forward and try and build up a conceptual model of what is going on so I can test and verify.
I think I was wrong in my understanding that the Ubuntu instance times out. I have been trying so many things while also doing a full day’s work that I get a bit lost :)
Will do the wireshark dump tomorrow.
F12 is the key that SeaBIOS uses to let one interrupt the boot process and choose which device to boot from. It can be configured to boot straight into PXE mode if desired.
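For anyone following along: on a libvirt guest the boot device order and the interactive boot menu are set by the <boot> and <bootmenu> elements in the domain XML. Below is a minimal sketch for checking them. The helper name show_boot_config and the domain name training-master are made up for illustration, not taken from this setup.

```shell
# Print only the boot-related lines of a libvirt domain XML read from stdin.
# A <boot dev='network'/> entry listed first (or <boot order='1'/> inside the
# <interface> element) means the guest tries PXE first.
# <bootmenu enable='yes'/> turns on the interactive boot menu prompt.
show_boot_config() {
    grep -E "<boot |<bootmenu "
}

# Usage (hypothetical domain name; needs virsh on the KVM host):
#   virsh dumpxml training-master | show_boot_config
```

If the network device is not first in that output, the guest will only PXE boot when someone picks it from the boot menu by hand.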
I think I may have been mistaken about the need to push F8. The F8 setting comes from my dnsmasq configuration file. I am not sure if it’s needed, but it was added a year ago to get things working so it’s just been left there.
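A prompt like this typically comes from dnsmasq's pxe-prompt option, which sets the prompt text and an optional timeout after which the first menu entry is taken automatically. Here is a rough sketch for locating the relevant lines. The helper name find_pxe_options and the file paths are assumptions, so adjust them to your system.

```shell
# List PXE-related dnsmasq options in the given config files, prefixed with
# file name and line number. Commented-out lines are skipped.
find_pxe_options() {
    grep -HnE '^[[:space:]]*(pxe-prompt|pxe-service|dhcp-boot)=' "$@"
}

# Usage (paths are assumptions):
#   find_pxe_options /etc/dnsmasq.conf /etc/dnsmasq.d/*
```

If a pxe-prompt line with a timeout shows up, the client should continue on its own once the timeout expires, so pressing F8 would only skip the wait.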
So in short the Ubuntu PXE boot in the vm works fine.
The Windows VM is on the same infrastructure. It appears never to download anything from the TFTP server. I will run a tcpdump on the TFTP server tomorrow and see what gives.
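One way to separate the VM from the server side is to request the boot file over TFTP from another machine on the same network. This is a minimal sketch using curl, which can fetch tftp:// URLs when it is built with TFTP support. The server address 192.168.1.10 and the file name undionly.kpxe are placeholders, so substitute whatever your dnsmasq configuration hands out.

```shell
# Build a tftp:// URL from a server address and a boot file name.
tftp_url() {
    printf 'tftp://%s/%s\n' "$1" "$2"
}

# Try to download the boot file and discard it. A non-zero exit status
# means the request failed or timed out.
check_tftp() {
    curl --silent --show-error --max-time 15 --output /dev/null "$(tftp_url "$1" "$2")"
}

# Usage (address and file name are placeholders):
#   check_tftp 192.168.1.10 undionly.kpxe && echo "TFTP OK"
```

If this succeeds from another host while the Windows VM still downloads nothing, the problem is more likely somewhere between the VM's virtual NIC and the network than in the TFTP server itself.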
Thanks for the help.
Since I don’t know KVM, you may need to explain, for your issue #1, what function F8 performs. When PXE times out, what is the error code displayed? The error should be something like PXE-EXX. I can say that on ESXi, a VM set to PXE boot does so every time without anyone having to touch the keyboard.
Along with what Sebastian said, some BIOSes have the option to do different things when woken up by WOL. On some BIOSes you can configure the machine to PXE boot when it receives a WOL command but boot normally on a traditional power up/resume.
For your second issue, is this on the same virtualization infrastructure as issue #1? If you remove the OS from the equation, are both VMs acting exactly the same? When you say they fail, what are they doing?
While I don’t think this is a factor here yet (you are not booting yet), what version of FOG are you using? If you are using a newer version of FOG, look at the FOG management screen; there is a cloud in the upper part of the screen. What numbers do you see in that cloud?
@mxc Is Wake-on-LAN used when those things happen? Weird question but I’ve seen machines that react differently (PXE vs. local disk) when started by WOL… I guess this is not the case for you but just asking to make sure.
Are you absolutely sure the PXE boot times out on the Ubuntu VMs? Can you take a screenshot/picture/video of the error you see when it times out?
The best would be if you’d install Wireshark on your VM server machine. Then capture the network packets going between the client and the FOG server. Use the Wireshark display filter
bootp || tftp
and then save as a PCAP file. We can have a look and hopefully will find out what’s going on. I’ve looked at dozens of PCAP files over the years and I pretty much always found out what was causing the issue. If your server is a Linux host you can use
tcpdump -i <network-interface> -w file.pcap port 67 or port 68 or port 69
on the console.