Chainloading failed / boot looping
-
@gwhitfield Have you watched all the movies in the “Tremors” franchise?
-
The root of the issue here is the Dell 7010. The 7010 will not completely boot the iPXE kernel, which FOG needs in order to function correctly. It's not an issue with FOG, but a compatibility issue between a component that FOG uses (iPXE) and the Dell 7010 EFI firmware. When you MDT boot, you can use either a CD-ROM or a USB flash drive. This process doesn't use iPXE, so I would expect it to work. I can also boot the 7010 into the FOG Live debugger from a USB drive (similar to the way MDT boots), but again this doesn't use iPXE to boot.
What needs to happen (one of the following):
- Dell fixes their silly firmware
- The iPXE guys can work out a method to patch iPXE to work around this issue.
- You switch back to BIOS booting for this hardware platform.
-
@Wayne-Workman - Only saw the first one; enjoyed it immensely for no good reason.
-
@gwhitfield said:
I neglected to mention that I did have a positive UEFI boot and was able to upload from and re-deploy a test image to a Dell E5550 a few days ago using snponly.efi
-
When you MDT boot you can use either cdrom or usb flash drive. This process doesn’t use iPXE so I would expect this to work.
I’m fairly certain I PXE booted the 7010 UEFI using WDS to deploy the MDT image. The only reason I’m uncertain at all is simply how many different imaging-related projects I have going on… I will be able to test/confirm later today or tomorrow.
-
@gwhitfield said:
I’m fairly certain I PXE booted the 7010 UEFI using WDS to deploy the MDT image.
WDS doesn’t use iPXE.
-
@Wayne-Workman - Ahhh… they appeared similar enough in the netboot phase that I got the impression it was the same technology. I’ve been duped.
-
@Wayne-Workman said:
Oh sht that’s funny!!!
#1- I gotta see that movie again. It feels like 20 years since…
#2 - I apologize. I know just enough to be dangerous, and under the circumstances it’s sometimes hard to know what’s relevant to the discussion, much less what’s important.
-
@gwhitfield There are 5 of them now. They are all worth watching, in order, with the wife/gf.
-
@Wayne-Workman said:
@gwhitfield It’s missing kernel parameters.
I upgraded to trunk and saw no change. I was hoping the kernel parameters issue would be resolved automagically, since I never intentionally changed anything except putting different EFI boot file names in DHCP. I looked and can tell there’s tons of info on kernel parameters and editing the boot menu, but isn’t there a default setup that should be in place after a re-install? Do I need to edit something now?
-
@gwhitfield What boot file are you using at the moment? What mode (UEFI/BIOS) is the target computer in right now? Is this an OptiPlex 7010 or … something else?
We need to know all your testing details.
-
@Wayne-Workman - I’ll give up on the 7010; they’re just an experiment anyway. I’m currently testing on the ESXi 5.5 VM set to UEFI with an E1000 NIC, and also on a Dell E5550 set to UEFI, booting first from the IPv4 NIC and then the Boot Manager. They both appear to do the same thing when served the same boot file.
Scope options in DHCP are:
- 66: IP of FOG server
- 67: whatever file name I’m testing out

- Boot file ipxe.efi - they ask “Please enter tftp server:”
- Boot file snponly.efi - they ask “Please enter tftp server:”
- Boot file snp.efi - they ask “Please enter tftp server:”
- Boot file intel.efi - they ask “Please enter tftp server:”
- Boot file realtek.efi - they say “DHCP failed, hit ‘s’ for shell, reboot in 10 seconds”
- Boot file undionly.kpxe - the E5550 flashes quickly and boots to the OS; the VM looks like below (I don’t have a UEFI disk in the system, I just wanted to see the network boot work.)
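For reference, scope options 66 (TFTP server name) and 67 (bootfile name) travel as two separate type-length-value entries in the options area of a DHCP reply, while the BOOTP header also carries its own fixed `siaddr` ("next server") field (RFC 2131/2132). The Python sketch below only builds the option bytes for the two options listed above, so the layout is easy to recognise in a packet capture; it does not talk to any DHCP server, and the server address is a hypothetical placeholder, not this network's FOG server.

```python
# Sketch: how DHCP options 66 and 67 are laid out in a DHCP reply.
# Assumption: 192.168.120.10 is a hypothetical FOG server address.

TFTP_SERVER_NAME = 66  # RFC 2132: TFTP server name
BOOTFILE_NAME = 67     # RFC 2132: bootfile name
END = 255              # RFC 2132: end of options


def encode_option(code: int, value: str) -> bytes:
    """Encode one DHCP option as code, length, value (TLV)."""
    data = value.encode("ascii")
    if len(data) > 255:
        raise ValueError("one DHCP option can carry at most 255 bytes")
    return bytes([code, len(data)]) + data


def boot_options(tftp_server: str, bootfile: str) -> bytes:
    """Return option 66, option 67 and the end marker, in that order."""
    return (
        encode_option(TFTP_SERVER_NAME, tftp_server)
        + encode_option(BOOTFILE_NAME, bootfile)
        + bytes([END])
    )


if __name__ == "__main__":
    # Layout: 42 0e <14 bytes of IP text> 43 0b <"snponly.efi"> ff
    print(boot_options("192.168.120.10", "snponly.efi").hex(" "))
```

A capture of a working reply should show the same pattern (0x42 and 0x43 followed by a length byte); if option 66 is missing from the reply iPXE receives, the `siaddr` header field is the only other place the TFTP server could come from.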
-
@gwhitfield If I’m not mistaken, the “Please enter tftp server” has to do with either another DHCP server conflicting or some IP helper thingy.
-
@gwhitfield As you said, let’s try to focus on one issue! We won’t get the Dell 7010 to netboot with iPXE in UEFI mode any time soon, I reckon. As George said, he is able to netboot ESXi 5.5 VMs with iPXE using the e1000 NIC setting just fine, so it can be done. Please stick to snponly.efi, as George reported that this works for him.
From your first picture we see that getting an IP via DHCP (and the next-server information) within iPXE has worked. What changed? I kind of doubt that upgrading to the latest FOG version broke this, but we’ll work this out. What happens if you enter the TFTP (FOG server) IP? Does it get you to the menu, or does it fail as well?
I am still wondering about the output you got from accessing boot.php in your browser. Have you possibly changed your iPXE menu to be hidden? Check the settings in the web interface: FOG Configuration -> iPXE Boot Menu -> “Hide Menu” is checked???
@Quazz The message means that although iPXE was able to get an IP via DHCP, it did not receive next-server (the TFTP server, option 66) from the DHCP server. That sounds kind of weird, because we know that on the first try (PXE ROM getting IP and boot information) it got this information; otherwise it would not have been able to load the iPXE binary. It would be great to see a packet dump of a client/VM booting to that message. Wireshark on the DHCP server (display filter: bootp || tftp) would be great!
-
I would also like to know what is going on in this condition, where iPXE is getting partial or no DHCP information. Which is a bit crazy, since the PXE ROM was able to load the iPXE kernel from the boot server using the boot file value. This is not the first time I have heard of this situation. I don’t know if this condition is caused by a DHCP proxy server in the environment, or by a slow network link that keeps the switch port from going into the forwarding state until some time after the iPXE kernel needs it. While this isn’t really a FOG issue, it does tend to color the perception that FOG is not ready for production use.
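To check a capture for this condition (an address handed out to iPXE but no TFTP server), the useful packet is the DHCP reply that answers iPXE's own request, not the one that answered the PXE ROM. The Python sketch below decodes a raw DHCP payload (the UDP data of a server-to-client packet) using the fixed BOOTP/DHCP layout from RFC 2131/2132, and reports the offered address, the `siaddr` ("next server") header field, the legacy `file` header field, and options 53, 66 and 67. How you extract the payload (Wireshark export, tcpdump, etc.) is left open.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options area


def parse_dhcp(payload: bytes) -> dict:
    """Decode boot-related fields from a raw DHCP/BOOTP UDP payload."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet")

    # Walk the options area: code, length, value (pad = 0, end = 255).
    options = {}
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 0:  # pad byte, no length field
            i += 1
            continue
        if code == 255 or i + 1 >= len(payload):  # end marker or truncated
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length

    def text(raw: bytes) -> str:
        """NUL-terminated ASCII field to str."""
        return raw.split(b"\0", 1)[0].decode("ascii", "replace")

    msg_type = options.get(53)
    return {
        "yiaddr": socket.inet_ntoa(payload[16:20]),     # address offered
        "siaddr": socket.inet_ntoa(payload[20:24]),     # "next server" field
        "file": text(payload[108:236]),                 # BOOTP file field
        "msg_type": msg_type[0] if msg_type else None,  # 2 = OFFER, 5 = ACK
        "opt66": text(options.get(66, b"")),            # TFTP server name
        "opt67": text(options.get(67, b"")),            # bootfile name
    }
```

An ACK to iPXE with `siaddr` of `0.0.0.0` and an empty `opt66` would be consistent with the "Please enter tftp server" prompt; if those fields are populated, the cause is more likely elsewhere (for example a second responder or a timing issue on the switch port).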
-
@Sebastian-Roth - Current boot menu settings:
I did (and do) have the boot menu hidden, but when I un-hide it I do get the menu after entering the FOG IP. Then it fails. I did make sure of the e1000 NIC and snponly.efi settings. This environment has a 2012 Standard server doing DHCP for approximately 75 BIOS machines (no proxy). This UEFI VM is only used for testing in preparation for adding UEFI to the mix this fall. Therefore I have the policies and options set to allow BIOS and UEFI machines to grab their own boot files, which works very well for the BIOS machines. Seems like I’m almost there. I have other FOG servers doing the same thing, but they’re 2008 boxes and I can’t set policies, so I have to leave them alone or face the wrath of a lot of people not being able to boot their BIOS machines.
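Policies that hand BIOS and UEFI clients different boot files commonly key on the architecture that PXE firmware advertises in its vendor class string (option 60, for example `PXEClient:Arch:00007:UNDI:003016`). The Python sketch below mimics that decision for illustration only; it is not how Windows DHCP implements policies. The mapping is an assumption to adapt: 0 is legacy x86 BIOS, x64 UEFI firmware commonly reports 7 and some reports 9, and the file names are simply the ones discussed in this thread.

```python
from typing import Optional

# Hypothetical policy table: architecture number -> boot file.
# File names come from this thread; adjust to what your server serves.
BOOT_FILES = {
    0: "undionly.kpxe",  # x86 legacy BIOS
    7: "snponly.efi",    # x64 UEFI (most firmware reports 7)
    9: "snponly.efi",    # x64 UEFI on firmware that reports 9
}


def client_arch(vendor_class: str) -> Optional[int]:
    """Return the Arch number from a 'PXEClient:Arch:NNNNN:...' string."""
    parts = vendor_class.split(":")
    if len(parts) >= 3 and parts[0] == "PXEClient" and parts[1] == "Arch":
        try:
            return int(parts[2])
        except ValueError:
            return None
    return None


def pick_boot_file(vendor_class: str) -> Optional[str]:
    """Boot file a BIOS/UEFI policy would hand out, or None if nothing matches."""
    return BOOT_FILES.get(client_arch(vendor_class))


if __name__ == "__main__":
    print(pick_boot_file("PXEClient:Arch:00007:UNDI:003016"))  # snponly.efi
    print(pick_boot_file("PXEClient:Arch:00000:UNDI:002001"))  # undionly.kpxe
```

Keying on the architecture number rather than the full string keeps the match independent of the UNDI version suffix, which differs between NIC ROMs.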
@george1421 - Having relied HEAVILY on FOG for many years, I can say that my perception of FOG is rose-colored! It’s all just a little bump in the road, probably of my own doing rather than FOG’s.
-
@gwhitfield Just for clarity, are these two environments you mentioned (2008 DHCP and 2012 DHCP) in different broadcast domains and subnets?
As Sebastian said, the next step is to get a PCAP of the communication between the target and the DHCP server to see what is going on with this second-stage DHCP request. The first-stage request is working, since the iPXE kernel is making it to the target computer; it’s just that when the iPXE kernel issues a DHCP request, the DHCP server is not issuing the option 66 value correctly.
The preferred way is to set up Wireshark on a mirrored port. Since the DHCP communications are broadcasts, you can pick up this information from any location in the same broadcast domain. If your FOG server is on the same subnet as the target computer, you can install tcpdump on your FOG server and pick up that traffic too. This would get all of the broadcast traffic plus any unicast communication between the target and the FOG server.
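Whether the FOG server can see those broadcasts comes down to whether it shares the client's subnet (assuming, as is usual, one subnet per broadcast domain/VLAN). A quick check with Python's standard `ipaddress` module is sketched below; the server address and prefix length are hypothetical, and 192.168.120.135 is the client address that comes up later in this thread.

```python
import ipaddress


def same_subnet(server_interface: str, client_ip: str) -> bool:
    """True if client_ip lies in the network of server_interface ('addr/prefix')."""
    network = ipaddress.ip_interface(server_interface).network
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    # Hypothetical FOG server address and mask.
    print(same_subnet("192.168.120.10/24", "192.168.120.135"))  # True
    print(same_subnet("192.168.120.10/25", "192.168.120.135"))  # False
```

The second call shows why the mask matters: a /25 covers only .0 to .127, so a server at .10/25 would not see a client at .135 broadcast.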
-
@gwhitfield said:
I did (and do) have the boot menu hidden but when I un-hide it I do get the menu after entering the FOG IP. Then it fails.
Could you please be more specific about how things fail?? Which item do you select from the menu, and what happens then? Do you try to boot from the local disk? Maybe change the “Exit to Hard Drive Type (EFI)” setting (seen in your screenshot) and see if that works. Have you actually tried scheduling a task for this VM? What happens if you do so? Please let us know the exact errors you see (picture if possible)!
As well, I am still happy to have a look at the PCAP file to see what’s causing the “enter tftp server” hiccup…
-
@george1421 - The 2008 DHCP and 2012 DHCP servers are all at different locations, with different subnets and broadcast domains. Exported tcpdump (filtered as suggested) from the FOG server: 0_1456840985605_GBfogboot.csv
I’ve never used tcpdump or Wireshark; I will need to bring in a buddy to assist with a Wireshark capture if you still want one.
Did I say “THANK YOU” for your help?!
-
@gwhitfield The CSV is a good start! I think I can see some weirdness already, but unfortunately the CSV is missing the most important bits of information. Try
tcpdump -w output.pcap port 67 or port 68 or port 69 or host 192.168.120.135
on your FOG server. Make sure your client is actually getting the IP 192.168.120.135 from your DHCP server. This way we can also see the client’s HTTP request, which might be helpful as well.
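Once `output.pcap` exists, it can be opened in Wireshark or scanned with a short script. The Python sketch below reads the classic pcap format that `tcpdump -w` writes by default and yields the UDP packets on the DHCP/TFTP ports. It assumes Ethernet framing (link type 1) and untagged IPv4; pcapng files, nanosecond pcap files, VLAN-tagged frames, IP fragments and captures taken on Linux's `any` interface are not handled.

```python
import socket
import struct
from typing import BinaryIO, Iterator, Tuple

Packet = Tuple[float, str, str, int, int, bytes]


def udp_packets(f: BinaryIO, ports=(67, 68, 69)) -> Iterator[Packet]:
    """Yield (time, src, dst, sport, dport, payload) for UDP on the given ports."""
    header = f.read(24)  # pcap global header
    if len(header) < 24:
        return
    if header[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian capture file
    elif header[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian capture file
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != 1:
        raise ValueError(f"link type {linktype} not handled (Ethernet only)")

    while True:
        record = f.read(16)  # per-packet record header
        if len(record) < 16:
            break
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = f.read(incl_len)
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # too short, or EtherType is not IPv4 (0x0800)
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != 17 or len(ip) < ihl + 8:
            continue  # not UDP, or truncated
        sport, dport, udp_len = struct.unpack("!HHH", ip[ihl:ihl + 6])
        if sport in ports or dport in ports:
            yield (
                ts_sec + ts_usec / 1e6,
                socket.inet_ntoa(ip[12:16]),
                socket.inet_ntoa(ip[16:20]),
                sport,
                dport,
                ip[ihl + 8:ihl + udp_len],  # UDP length includes its 8-byte header
            )
```

Usage: open the file with `open("output.pcap", "rb")`, pass the file object to `udp_packets`, and print each tuple in order. Two DHCP exchanges per boot (PXE ROM, then iPXE) are what to look for. TFTP data moves to ephemeral ports after the initial port-69 request, so only that request shows up with the default port list.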