Chainloading failed / boot looping
-
@gwhitfield said:
I did (and do) have the boot menu hidden, but when I un-hide it I do get the menu after entering the FOG IP. Then it fails.
Could you please be more specific about how things fail? Which item do you select from the menu and what happens then? Do you try to boot from the local disk? Maybe change the “Exit to Hard Drive Type (EFI)” setting (seen in your screenshot) and see if that works. Have you actually tried scheduling a task for this VM? What happens if you do? Please let us know the exact errors you see (a picture if possible)!
I’m also still happy to have a look at a PCAP file to see what’s causing the “enter tftp server” hiccup…
-
@george1421 - The 2008 DHCP and 2012 DHCP servers are at different locations, with different subnets and broadcast domains. Here is the exported tcpdump (filtered as suggested) from the FOG server: 0_1456840985605_GBfogboot.csv
I’ve never used tcpdump or Wireshark, so I’ll need to bring in a buddy to help with a Wireshark capture if you still want one.
Did I say THANK YOU for your help?!
-
@gwhitfield The CSV is a good start! I think I can see some weirdness already, but unfortunately the CSV is missing the most important bits of information. Try
tcpdump -w output.pcap port 67 or port 68 or port 69 or host 192.168.120.135
on your FOG server. Make sure your client is actually getting the IP 192.168.120.135 from your DHCP server. Including that host in the filter lets us see the client’s HTTP requests as well, which might be helpful.
-
@Sebastian-Roth Here’s the output. IP 192.168.120.135 confirmed.
0_1456847693267_output.pcap
-
@gwhitfield Your DHCP server is actually offering different information depending on the request the client sends. The first DHCP DORA (discover, offer, request, ack) sequence, issued by the VM’s PXE ROM, comes with all the PXE info included (next-server/option 66: 192.168.120.19 and filename/option 67: snponly.efi). Seems fine. Then the iPXE binary is loaded via TFTP and sends its own DHCP discover. That request looks a bit different from the first one (that’s normal for iPXE!) as it provides option 175 and some other things.
Hmm, here I noticed something that might cause the issue. In the first request the client sends the vendor class identifier “PXEClient:Arch:00007:UNDI:003016”, but the iPXE binary sends “PXEClient:Arch:00009:UNDI:003010”. Note the difference in arch. I guess you set up vendor classes to match ID 7 only? Those classes are still a mystery to me: some UEFI firmwares send 7, others 9, and iPXE might send 7 or 9 as well. I guess that somehow changed when you updated to the latest iPXE binaries.
So, back to what happens next: the answer from your DHCP server comes without any PXE information whatsoever, most probably because of the class mismatch I just mentioned. This is why iPXE does not find the next-server/TFTP server IP by itself.
-
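To compare the two exchanges without relying on Wireshark’s decoder, here is a minimal Python sketch that walks the options area of a single DHCP packet and pulls out option 60 (vendor class), 66 (TFTP server name) and 67 (bootfile name). It assumes you already have the raw bytes starting at the magic cookie, which sits right after the 236-byte fixed BOOTP header; the function names are hypothetical. It ignores the fixed header’s siaddr and file fields, which can also carry the next-server address and boot file name.

```python
from typing import Optional

# Every DHCP options area starts with this 4-byte magic cookie (RFC 2131/2132).
MAGIC_COOKIE = bytes([99, 130, 83, 99])


def parse_dhcp_options(data: bytes) -> dict[int, bytes]:
    """Parse the TLV-encoded options area, starting at the magic cookie.

    Returns {option_code: raw_value}. Pad (0) is skipped, End (255) stops.
    """
    if data[:4] != MAGIC_COOKIE:
        raise ValueError("options area does not start with the DHCP magic cookie")
    options: dict[int, bytes] = {}
    i = 4
    while i < len(data):
        code = data[i]
        if code == 0:  # Pad: a single byte with no length field
            i += 1
            continue
        if code == 255 or i + 1 >= len(data):  # End option, or truncated data
            break
        length = data[i + 1]
        options[code] = data[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def pxe_summary(options: dict[int, bytes]) -> dict[str, Optional[str]]:
    """Decode the options that matter when comparing PXE and iPXE exchanges."""

    def text(code: int) -> Optional[str]:
        value = options.get(code)
        return None if value is None else value.decode("ascii", errors="replace")

    return {
        "vendor_class": text(60),  # e.g. PXEClient:Arch:00007:UNDI:003016
        "tftp_server": text(66),
        "bootfile": text(67),
    }


if __name__ == "__main__":
    # Hand-built example offer: message type (53) = OFFER, plus options 66 and 67.
    offer = (
        MAGIC_COOKIE
        + bytes([53, 1, 2])
        + bytes([66, 14]) + b"192.168.120.19"
        + bytes([67, 11]) + b"snponly.efi"
        + bytes([255])
    )
    print(pxe_summary(parse_dhcp_options(offer)))
```

An offer that came back without PXE information, like the second one in the capture, would show None for both tftp_server and bootfile.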
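A small sketch of how the two identifiers break down, assuming the usual PXE layout “PXEClient:Arch:xxxxx:UNDI:yyyzzz” (client architecture, then UNDI major/minor version). `policy_matches` is a hypothetical stand-in for a DHCP policy, not how Windows DHCP actually evaluates its conditions; it only shows why a class that matches arch 7 alone misses the iPXE request.

```python
import re
from typing import Optional

# PXE vendor class identifier layout: "PXEClient:Arch:xxxxx:UNDI:yyyzzz",
# where xxxxx is the client architecture and yyy/zzz the UNDI major/minor version.
VENDOR_CLASS_RE = re.compile(r"^PXEClient:Arch:(\d{5}):UNDI:(\d{3})(\d{3})$")


def parse_vendor_class(vci: str) -> Optional[tuple[int, int, int]]:
    """Return (arch, undi_major, undi_minor), or None if the string doesn't match."""
    m = VENDOR_CLASS_RE.match(vci)
    if m is None:
        return None
    arch, major, minor = (int(g) for g in m.groups())
    return arch, major, minor


def policy_matches(vci: str, allowed_arches: set[int]) -> bool:
    """Hypothetical policy check: is this client's arch one we hand PXE options to?"""
    parsed = parse_vendor_class(vci)
    return parsed is not None and parsed[0] in allowed_arches


if __name__ == "__main__":
    firmware = "PXEClient:Arch:00007:UNDI:003016"  # sent by the VM's PXE ROM
    ipxe = "PXEClient:Arch:00009:UNDI:003010"      # sent by the iPXE binary
    print(policy_matches(firmware, {7}))  # True
    print(policy_matches(ipxe, {7}))      # False: reply carries no PXE options
    print(policy_matches(ipxe, {7, 9}))   # True
```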
Again: Have you ever tried registering this MAC address in the FOG web interface by hand and scheduling an upload task for it? What happens when you PXE boot the client then? Picture or video of an error would be great. Otherwise I can only guess what’s going on.
-
@Sebastian-Roth said:
Hmm, here I noticed something that might cause the issue. In the first request the client sends the vendor class identifier “PXEClient:Arch:00007:UNDI:003016”, but the iPXE binary sends “PXEClient:Arch:00009:UNDI:003010”. Note the difference in arch. I guess you set up vendor classes to match ID 7 only? Those classes are still a mystery to me: some UEFI firmwares send 7, others 9, and iPXE might send 7 or 9 as well. I guess that somehow changed when you updated to the latest iPXE binaries.
We may need to update the wiki to make sure it includes all arch settings. I see it lists them for the Linux DHCP setup, but not for the Windows 2012 setup (step 3). @Wayne-Workman
-
@george1421 said:
We may need to update the wiki to make sure it includes all arch settings. I see it lists them for the Linux DHCP setup, but not for the Windows 2012 setup (step 3). @Wayne-Workman
The steps are the same for all architecture types: you’d just change the number in step 3 and maybe give the names something specific to the arch you have set up.
That said, I also understand that someone who doesn’t already understand it will be totally lost on how to set it up for additional architectures. So we do need more steps. Maybe even a video.
Also, in case anyone is wondering what the heck we are talking about, it’s this wiki article: https://wiki.fogproject.org/wiki/index.php?title=BIOS_and_UEFI_Co-Existence
-
@Wayne-Workman said:
That said, I also understand that someone who doesn’t already understand it will be totally lost on how to set it up for additional architectures. So we do need more steps.
Maybe just add some text under step 3 to rinse, wash, and repeat for “PXEClient:Arch:00002”, “PXEClient:Arch:00006”, “PXEClient:Arch:00008” and “PXEClient:Arch:00009”. Windows admins who don’t read through the Linux section would not know these values are also required. (And in full disclosure, I did not create them either. Will do now…)
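For anyone working through the wiki, here is a minimal Python sketch that prints the data string for each vendor class listed above (plus 00000 for BIOS). The arch labels follow the option 93 codes from RFC 4578, except that 7 and 9 are simply both treated as 64-bit UEFI, as seen in this thread; the suggested class names are made up for illustration.

```python
# Architecture codes (DHCP option 93, RFC 4578) relevant to this thread.
# 7 and 9 both show up from 64-bit UEFI clients in practice, so both get a class.
ARCH_LABELS = {
    0: "BIOS (x86 PC)",
    2: "EFI Itanium",
    6: "UEFI 32-bit (IA32)",
    7: "UEFI 64-bit (reported as 7)",
    8: "EFI Xscale",
    9: "UEFI 64-bit (reported as 9)",
}


def vendor_class_data(arch: int) -> str:
    """Vendor class data string for one arch, zero-padded to five digits."""
    return f"PXEClient:Arch:{arch:05d}"


def classes_to_create(arches: list[int]) -> list[tuple[str, str]]:
    """(suggested class name, data string) pairs, one per architecture."""
    rows = []
    for a in sorted(arches):
        label = ARCH_LABELS.get(a, f"arch {a}")
        rows.append((f"PXEClient {label}", vendor_class_data(a)))
    return rows


if __name__ == "__main__":
    for name, data in classes_to_create([0, 2, 6, 7, 8, 9]):
        print(f"{data}  ->  {name}")
```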
-
@Sebastian-Roth said:
Again: Have you ever tried registering this MAC address in the FOG web interface by hand and scheduling an upload task for it? What happens when you PXE boot the client then? Picture or video of an error would be great. Otherwise I can only guess what’s going on.
Yes, the MAC is registered and an image task runs if/when one is scheduled. It appears that if there’s no task, it doesn’t fall through to the HD as the next option.
-
@gwhitfield said:
Yes, the MAC is registered and an image task runs if/when one is scheduled. It appears that if there’s no task, it doesn’t fall through to the HD as the next option.
That is handled by your exit condition (i.e. sanboot, grub, exit, etc.). Some systems need a different exit condition, I’m sorry to say.
-
@george1421 - For the time being I decided to simplify and set the policies to:
1. Arch=00000 (BIOS) - undionly.kpxe (works great)
2. Arch<>00000 (all others) - snponly.efi
I was hoping this would give anything not reporting as a BIOS machine the same boot file, regardless of architecture. It doesn’t seem to have worked.
-
@gwhitfield That won’t work for 32 bit UEFI systems.
-
@gwhitfield Yeah, I agree with Wayne, plus I don’t think wildcard matches are supported. You have to spell out each one to get a match. No Windows cheating here.
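To illustrate Wayne’s point, here is a sketch comparing the catch-all policy with one class per arch. The file names undionly.kpxe and snponly.efi come from this thread; the 32-bit path `i386-efi/snponly.efi` is a placeholder assumption. A 32-bit UEFI firmware (arch 6) can only run 32-bit EFI binaries, so the catch-all hands it a file it cannot execute. The sketch does not model how Windows DHCP evaluates its conditions.

```python
from typing import Optional

# Hypothetical per-arch table. The 32-bit entry is a placeholder path (assumption):
# a 32-bit UEFI client needs an IA32 build of iPXE, not the 64-bit one.
BOOT_FILES = {
    0: "undionly.kpxe",          # BIOS
    6: "i386-efi/snponly.efi",   # 32-bit UEFI (placeholder path)
    7: "snponly.efi",            # 64-bit UEFI
    9: "snponly.efi",            # 64-bit UEFI (some firmware/iPXE report 9)
}


def explicit_policy(arch: int) -> Optional[str]:
    """One class per arch; an unknown arch gets no boot file at all."""
    return BOOT_FILES.get(arch)


def catch_all_policy(arch: int) -> str:
    """The 'Arch=00000 vs. everything else' idea from the post above."""
    return "undionly.kpxe" if arch == 0 else "snponly.efi"


if __name__ == "__main__":
    for arch in (0, 6, 7, 9):
        print(arch, catch_all_policy(arch), explicit_policy(arch))
    # For arch 6 the catch-all returns the 64-bit snponly.efi.
```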
-
Hold your horses. I just changed the boot order to look at some image details, and the VM won’t boot to the HD. I think my test VM got reverted to a BIOS image on a UEFI disk. I’m guessing that’s the explanation for the chainloading failure. Reinstalling the OS now, should know shortly…
-
@gwhitfield - Nope, not it. Correcting the image on the disk didn’t change anything. However, at some point recently I stopped being prompted for the IP address, which I hope is a good thing; I’m pretty sure that happened when I changed the DHCP policies. I remember that with some of our older FOG machines we had to correct the chainloading by adding/editing some files. Could this be as simple as that? I did try all the different exit conditions, and they all behave exactly the same way.
-
@gwhitfield - Another capture after reinstalling the OS and changing the DHCP policy:
0_1456859580183_output2.pcap
-
@gwhitfield OK, the PCAP file looks good now from my point of view! I reckon the DHCP setup is correct now. Reading through the whole thread again, I think the exit type is really the only issue left on your VM. Booting into a task works fine, as you said. @george1421 Which exit type do you use on your ESXi 5.5?
So we can go back and see if you can get things working on your Dell E5550 as well. Remember that you can set the exit type for each host, so play with those settings! If you cannot make it work on the E5550, you can try other iPXE binaries (and set up your DHCP to hand out binaries depending on the client’s MAC address). It’s probably best to open a completely new discussion on PXE booting the E5550 in UEFI mode if you can’t make it work right away.
-
@Sebastian-Roth
Funny, no matter what I set as my exit type, I see this as the HD boot instruction:
:fog.local
chain -ar ${boot-url}/service/ipxe/grub.exe --config-file="rootnoverify (hd0);chainloader +1" || goto MENU
but when I look at Wayne’s upload I see this:
:fog.local
sanboot --no-describe --drive 0x80 || goto MENU
I would expect to see a different response each time I change the exit type.
Edit: Actually, the boot parameters only change when I change the BIOS exit type; they don’t change when I change the EFI exit type. Maybe I have a corrupt file of some sort?
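One possible explanation, offered as an assumption rather than a diagnosis: if the server chooses the exit line based on the platform the client reports, a client that isn’t recognized as EFI always gets the BIOS exit type, no matter what the EFI setting says. The Python sketch below models that, together with the per-host exit type setting. It is a hypothetical model, not FOG’s code; the function names and fallback order are made up, and `platform` mirrors the values of iPXE’s ${platform} setting ("pcbios" or "efi").

```python
from typing import Optional

# Hypothetical model, not FOG's boot-menu code. The grub and sanboot lines are
# copied from the post above; "exit" is iPXE's own exit command.
EXIT_LINES = {
    "grub": 'chain -ar ${boot-url}/service/ipxe/grub.exe '
            '--config-file="rootnoverify (hd0);chainloader +1" || goto MENU',
    "sanboot": "sanboot --no-describe --drive 0x80 || goto MENU",
    "exit": "exit",
}


def effective_exit_type(platform: str, global_bios: str, global_efi: str,
                        host_bios: Optional[str] = None,
                        host_efi: Optional[str] = None) -> str:
    """Per-host setting first, then the global default.

    Only a client reporting platform "efi" ever looks at the EFI settings.
    """
    if platform == "efi":
        return host_efi or global_efi
    return host_bios or global_bios


def fog_local_label(exit_type: str) -> str:
    """Render the :fog.local section for the chosen exit type."""
    return ":fog.local\n" + EXIT_LINES[exit_type]


if __name__ == "__main__":
    # Client not identified as EFI: changing the EFI exit type changes nothing.
    for efi_choice in ("exit", "sanboot"):
        print(fog_local_label(effective_exit_type("pcbios", "grub", efi_choice)))
```

Under this assumption, checking which platform the client is reported as would be a reasonable next step before suspecting a corrupt file.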
-
This works for my VM, though I don’t understand why, nor do I know whether it’s a sign that something is still wrong with my setup. My OS comes up if no tasks are pending, and I am able to upload and deploy an image. I haven’t tried multicasting, but I’m assuming for now that it will work fine too. The current boot file is ipxe.efi, but snponly.efi and snp.efi also tested fine.
I will work on the Dell E5550 laptop tomorrow and let you know what happens. THANK YOU EVERYONE!!! (Edit 3/2/16: E5550 works just fine)