Chainloading failed / boot looping
-
@gwhitfield said:
I’m fairly certain I PXE booted the 7010 UEFI using WDS to deploy the MDT image.
WDS doesn’t use iPXE.
-
@Wayne-Workman - Ahhh…they appeared similar enough in the netboot phase that I got the impression it was the same technology. I’ve been duped.
-
@Wayne-Workman said:
Oh sht that’s funny!!!
#1- I gotta see that movie again. It feels like 20 years since…
#2 - I apologize. I know just enough to be dangerous and under the circumstances sometimes it’s hard to know what’s relevant to the discussion much less what’s important.
-
@gwhitfield There are 5 of them now. They are all worth watching, in order, with the wife/gf.
-
@Wayne-Workman said:
@gwhitfield It’s missing kernel parameters.
I upgraded to trunk and no change. I was hoping the kernel parameters issue would be resolved automagically since I never intentionally made any changes to anything except putting different efi boot file names in DHCP. I looked and can tell there’s tons of info on kernel parameters and editing the boot menu but isn’t there a default setup that should be in place with a re-install? Do I need to edit something now?
-
@gwhitfield What boot file are you using at the moment? What mode (UEFI/BIOS) is the target computer in right now? Is this an Optiplex 7010 or … something else?
We need to know all your testing details.
-
@Wayne-Workman - I’ll give up on the 7010, they’re just an experiment anyway. I’m currently testing on the ESXi 5.5 VM set to UEFI and E1000 nic, also testing with a Dell E5550 set to UEFI boot first from IPV4 nic and then Boot Manager. They both appear to be doing the same thing when being served the same boot file.
Scope options in DHCP are:
66 - IP of FOG server
67 - whatever file name I’m testing out.
Boot file ipxe.efi - they ask “Please enter tftp server:”
Boot file snponly.efi - they ask “Please enter tftp server:”
Boot file snp.efi - they ask “Please enter tftp server:”
Boot file intel.efi - they ask “Please enter tftp server:”
Boot file realtek.efi - They say “DHCP failed, hit ‘s’ for shell, reboot in 10 seconds”
Boot file undionly.kpxe - E5550 flashes quickly and boots to OS, VM looks like below (I don’t have a UEFI disk in the system, just wanted to see the network boot okay.)
-
@gwhitfield If I’m not mistaken, the “Please enter tftp server” has to do with either another DHCP server conflicting or some IP helper thingy.
-
@gwhitfield As you said. Let’s try to focus on one issue! We won’t get the Dell 7010 to netboot with iPXE in UEFI mode any time soon I reckon. As George said he is able to netboot ESXi 5.5 VMs with iPXE on e1000 NIC setting just fine. So it can be done. Please stick to snponly.efi as George reported this works for him.
From your first picture we see that getting an IP via DHCP (and the next-server information) within iPXE has worked. What changed? I kind of doubt that upgrading to the latest FOG version broke this. But we’ll work this out. What happens if you enter the TFTP (FOG server) IP? Does it get you to the menu, or does it fail as well?
I am still wondering about the output you got from accessing boot.php in your browser. Have you possibly changed your iPXE menu to be hidden? Check the settings in the web interface: FOG Configuration -> iPXE Boot Menu -> “Hide Menu” is checked???
@Quazz The message means that although iPXE was able to get an IP via DHCP, it did not receive next-server (option 66) from the DHCP server. That sounds kind of weird, because we know that on the first try (PXE ROM getting IP and boot information) it got the information; it would not have been able to load the iPXE binary otherwise. It would be great to see a packet dump of a client/VM booting to that message. A Wireshark capture on the DHCP server (display filter: bootp || tftp) would do!
-
I would also like to know what is going on in this condition, where iPXE is getting partial or no DHCP information. It is a bit crazy, since the PXE ROM was able to load the iPXE kernel from the boot server using the boot file value. This is not the first time I have heard of this situation. I don’t know if this condition occurs because a DHCP proxy server might be in the environment, or because a slow network link keeps the port from going into the forwarding state until some time after the iPXE kernel needs it. While this isn’t really a FOG issue, it does tend to color the perception that FOG is not ready for production use.
-
@Sebastian-Roth - Current boot menu settings:
I did (and do) have the boot menu hidden but when I un-hide it I do get the menu after entering the FOG IP. Then it fails. I did make sure of the e1000 NIC and snponly.efi settings. This environment has a 2012 Standard server doing DHCP to approx 75 BIOS machines (no proxy). This UEFI VM is only used for testing in preparation for adding UEFI to the mix this Fall. Therefore I have the policies and options set to allow BIOS and UEFI machines to grab their own boot files which works very well for the BIOS machines. Seems like I’m almost there. I have other FOG servers doing the same thing but they’re 2008 boxes and I can’t set policies so I have to leave them alone or face the wrath of a lot of people not being able to boot their BIOS machines.
@george1421 - Having relied HEAVILY on FOG for many years I can say that my perception of FOG is rose colored! It’s all just a little bump in the road, probably of my own doing rather than FOG’s.
-
@gwhitfield Just for clarity, these two environments you mentioned (2008 DHCP and 2012 DHCP) are in different broadcast domains and subnets?
As Sebastian said, the next step is to get a pcap of the communication between the target and the DHCP server to see what is going on with this second-stage DHCP request. The first-stage request is working, since the iPXE kernel is making it to the target computer; it’s just that when the iPXE kernel issues a DHCP request, the DHCP server is not issuing the option 66 value correctly.
The preferred way is to set up Wireshark on a mirrored port. Since the DHCP communications are broadcasts, you can pick up this information from any location in the same broadcast domain. If your FOG server is on the same subnet as the target computer, you can install tcpdump on your FOG server and pick up that traffic too. This would get all of the broadcast traffic plus any unicast communication between the target and the FOG server.
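For anyone reading along who wants to check a captured reply by hand, here is a minimal Python sketch of how the DHCP options field can be walked to see whether options 66 and 67 are present. It assumes you have already extracted the raw option bytes that follow the 4-byte magic cookie; the helper names (parse_dhcp_options, missing_pxe_options, _opt) and the sample replies are made up for illustration and are not taken from the capture in this thread. Keep in mind that a server can also carry the same values in the fixed siaddr/file header fields, which this sketch does not look at.

```python
from typing import Dict, List

PAD, END = 0, 255
# Option codes of interest here: 60, 66 and 67.
PXE_OPTIONS = {60: "vendor-class-identifier", 66: "tftp-server-name", 67: "bootfile-name"}


def parse_dhcp_options(options: bytes) -> Dict[int, bytes]:
    """Return {option code: raw value} from a DHCP options field.

    The field is a run of code/length/value triples; code 0 is one byte of
    padding and code 255 marks the end.
    """
    found: Dict[int, bytes] = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == END:
            break
        if code == PAD:  # padding has no length byte
            i += 1
            continue
        if i + 1 >= len(options):  # truncated option, stop quietly
            break
        length = options[i + 1]
        found[code] = options[i + 2 : i + 2 + length]
        i += 2 + length
    return found


def missing_pxe_options(options: bytes) -> List[str]:
    """Names of the boot-related options (66, 67) absent from a reply."""
    found = parse_dhcp_options(options)
    return [PXE_OPTIONS[code] for code in (66, 67) if code not in found]


def _opt(code: int, value: bytes) -> bytes:
    """Hypothetical helper: encode one option for the sample data below."""
    return bytes([code, len(value)]) + value


# Made-up sample replies: option 53 is the DHCP message type (2 = offer).
first_reply = (
    _opt(53, b"\x02")
    + _opt(66, b"192.168.120.19")
    + _opt(67, b"snponly.efi")
    + bytes([END])
)
second_reply = _opt(53, b"\x02") + bytes([END])

print(missing_pxe_options(first_reply))
print(missing_pxe_options(second_reply))
```

With the sample data, the first reply reports nothing missing and the second reports both boot options missing, which is the pattern we would expect to see if the second-stage request is answered without any boot information.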
-
@gwhitfield said:
I did (and do) have the boot menu hidden but when I un-hide it I do get the menu after entering the FOG IP. Then it fails.
Could you please be more specific on how things fail?? Which item do you select from the menu and what happens then? Do you try to boot from local disk? Maybe change the “Exit to Hard Drive Type (EFI)” (seen in your screenshot) and see if that works. Have you actually tried scheduling a task for this VM? What happens if you do so? Please let us know the exact errors you see (picture if possible)!
I am also still happy to have a look at the PCAP file to see what’s causing the “enter tftp server” hiccup…
-
@george1421 - The 2008 DHCP and 2012 DHCP servers are all at different locations with different subnets and broadcast domains. Exported tcpdump (filtered as suggested) from the FOG server: 0_1456840985605_GBfogboot.csv
Never used tcpdump or wireshark, will need to bring in a buddy to assist with a wireshark capture if you still want one.
Did I say "THANK YOU" for your help?!
-
@gwhitfield The CSV is a good start! I think I can see some weirdness already but unfortunately CSV is missing the most important bits of information. Try
tcpdump -w output.pcap port 67 or port 68 or port 69 or host 192.168.120.135
on your FOG server. Make sure your client is actually getting the IP 192.168.120.135 from your DHCP server. This way we can also see the client’s HTTP requests, which might be helpful as well.
-
@Sebastian-Roth here’s the output. IP 120.135 confirmed
0_1456847693267_output.pcap
-
@gwhitfield Your DHCP server is actually offering different information depending on the request being sent by the client. The first DHCP DORA (discovery, offer, request, ack) sequence issued by the VM’s PXE ROM comes with all the PXE info (next-server/option 66: 192.168.120.19 and filename/option 67: snponly.efi) included. Seems fine. Then the iPXE binary is loaded via TFTP and sends its DHCP discovery request. The request looks a bit different from the first one (that’s normal for iPXE!) as it provides option 175 and some other things.
Hmmmmmmmm here I noticed something that might cause the issue. In the first request the client sends vendor class identifier “PXEClient:Arch:00007:UNDI:003016” but the iPXE binary sends “PXEClient:Arch:00009:UNDI:003010”. See the difference in arch. I guess you setup vendor classes to match ID 7 only? Those classes are still a mystery to me. Some UEFI firmwares send 7 others 9 and iPXE might do 7 or 9 as well. I guess that it somehow changed when you updated to the latest iPXE binaries.
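To make the mismatch concrete, here is a small Python sketch that splits the two vendor class identifiers quoted above and checks them against a list of configured classes. The CONFIGURED_CLASSES list and the prefix-matching rule are assumptions for illustration (a guess at a setup that only defines arch 7), not a dump of the actual DHCP server configuration.

```python
from typing import Dict, List, Optional

# Assumption: the DHCP server only has a vendor class defined for arch 7.
CONFIGURED_CLASSES = ["PXEClient:Arch:00007"]


def parse_vendor_class(vci: str) -> Dict[str, object]:
    """Split e.g. 'PXEClient:Arch:00007:UNDI:003016' into arch and UNDI fields."""
    parts = vci.split(":")
    if len(parts) < 5 or parts[0] != "PXEClient" or parts[1] != "Arch" or parts[3] != "UNDI":
        raise ValueError("not a PXEClient vendor class identifier: %r" % vci)
    return {"arch": int(parts[2]), "undi": parts[4]}


def matching_class(vci: str, classes: Optional[List[str]] = None) -> Optional[str]:
    """Return the first configured class that the identifier starts with, if any."""
    for prefix in CONFIGURED_CLASSES if classes is None else classes:
        if vci.startswith(prefix):
            return prefix
    return None


rom_vci = "PXEClient:Arch:00007:UNDI:003016"   # sent by the PXE ROM
ipxe_vci = "PXEClient:Arch:00009:UNDI:003010"  # sent by the iPXE binary

for label, vci in (("PXE ROM", rom_vci), ("iPXE", ipxe_vci)):
    fields = parse_vendor_class(vci)
    print(label, "arch", fields["arch"], "matched class:", matching_class(vci))
```

Under that assumed configuration the PXE ROM request matches a class and the iPXE request matches none, so the second request would fall through to whatever the server hands out by default.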
So back to what happens next: The answer from your DHCP server comes without any PXE information whatsoever - most probably caused by the class mismatch just mentioned I hope. This is why iPXE does not find the next-server/tftp server IP by itself. -
Again: Have you ever tried registering this MAC address in the FOG web interface by hand and scheduling an upload task for it? What happens when you PXE boot the client then? Picture or video of an error would be great. Otherwise I can only guess what’s going on.
-
@Sebastian-Roth said:
Hmmmmmmmm here I noticed something that might cause the issue. In the first request the client sends vendor class identifier “PXEClient:Arch:00007:UNDI:003016” but the iPXE binary sends “PXEClient:Arch:00009:UNDI:003010”. See the difference in arch. I guess you setup vendor classes to match ID 7 only? Those classes are still a mystery to me. Some UEFI firmwares send 7 others 9 and iPXE might do 7 or 9 as well. I guess that it somehow changed when you updated to the latest iPXE binaries.
We may need to update the wiki to be sure to include all arch settings. I see it lists this for the Linux dhcp, but not for the windows 2012 setup (step 3). @Wayne-Workman
-
@george1421 said:
We may need to update the wiki to be sure to include all arch settings. I see it lists this for the Linux dhcp, but not for the windows 2012 setup (step 3). @Wayne-Workman
The steps are the same for all architecture types - you’d just change the number in step 3 and then maybe give the names something that is specific to the arch you have setup.
That said - I also understand that someone who doesn’t understand it already will be totally lost for how to set it up for additional architectures. So we do need more steps. Maybe even a video.
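Until the extra steps are written up, the idea of "same steps, different number" can be sketched in a few lines of Python. The table below is hypothetical: the arch numbers 7 and 9 and the file names are the ones mentioned in this thread, arch 0 stands for legacy BIOS clients, and the entry names are invented. It is not a copy of the wiki steps or of any DHCP server's real configuration format.

```python
from typing import Dict, List

# Hypothetical arch -> boot file table; only the file names come from this thread.
ARCH_BOOTFILES = {
    0: "undionly.kpxe",  # legacy BIOS clients
    7: "snponly.efi",    # UEFI, arch value sent by the VM's PXE ROM
    9: "snponly.efi",    # UEFI, arch value sent by the iPXE binary
}


def vendor_class_entries(arch_bootfiles: Dict[int, str]) -> List[Dict[str, str]]:
    """Build one class/policy entry per architecture; only the number changes."""
    entries = []
    for arch, bootfile in sorted(arch_bootfiles.items()):
        code = "%05d" % arch
        entries.append(
            {
                "name": "PXEClient (arch %s)" % code,  # invented display name
                "match": "PXEClient:Arch:%s" % code,   # prefix of the vendor class identifier
                "bootfile": bootfile,                  # value to serve as option 67
            }
        )
    return entries


for entry in vendor_class_entries(ARCH_BOOTFILES):
    print(entry["name"], "|", entry["match"], "|", entry["bootfile"])
```

Each printed row corresponds to one pass through the setup steps: define a class for that match string, then attach a policy that serves the listed boot file.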
Also - in case anyone is wondering what the heck we are talking about, we are talking about this: https://wiki.fogproject.org/wiki/index.php?title=BIOS_and_UEFI_Co-Existence