Host Startup; Booting into LVM Disk Fails
-
@dholtz-docbox one of the two is incorrect.
Let’s go this way. Your DHCP server is .10.1, so is your FOG/dnsmasq server .10.42?
If that is the case, both IP addresses below need to point to your FOG/dnsmasq server or the target will not PXE boot. In this setup dnsmasq only supplies the {next server} and {boot file}; the IP address itself comes from your DHCP server.
-
@george1421 : Yeah, I noticed that PXE won’t boot this way, which makes sense. I reverted the changes. I guess what I have been trying to figure out is what Wayne mentioned: that the “Duplicate option 66 (next server) from DHCP proxy and DHCP server” message is present. So far, every one of my network captures has yielded the same results. The only thing that strikes me as odd is that both 10.1.10.1 and 10.1.10.42 are trying to make an offer. That said, only 10.1.10.1 ACKs, but it has no boot filename or anything. If everything is set up the way it should be, 10.1.10.1 should be supplying “undionly.kpxe” in its boot filename, right?
Edit> Oh, re-reading your previous post, and reading what I wrote again, something might have clicked. So… 10.1.10.1 is serving the IP and 10.1.10.42 is serving the next-server, which is its IP, 10.1.10.42, correct? So… this would be typical behavior, if that is all correct… Which brings me back to not being sure why there is a duplicate option 66.
-
@dholtz-docbox Understand I did not read the entire thread, so I’m not sure of the root of your issue (sorry, very busy today). But with dhcpProxy (dnsmasq) you will see two offers, and they are offering different things (you can see that if you dig into the packet payload). You should get the ACK from your DHCP server, which is what you are seeing. That locks in the address for the client. 10.1.10.42 is the dhcpProxy, so it will be supplying DHCP options 66 and 67 (if you set it up correctly). The payload of options 66 and 67 must point to your FOG server because that is where the client gets the iPXE boot file.
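To make the "two offers, different things" point concrete, here is a minimal Python sketch. It is only an illustration: the dictionary field names and the client address 10.1.10.50 are made up, and it does not reproduce whatever logic prints the "Duplicate option 66" message. It simply reports which boot-related fields both offers fill in.

```python
# Hypothetical sample data: the field names and the client address are
# invented for illustration. Only 10.1.10.1, 10.1.10.42 and undionly.kpxe
# come from the thread.
BOOT_FIELDS = ("next_server", "boot_file")


def overlapping_boot_fields(server_offer, proxy_offer):
    """Return the boot-related fields that BOTH offers fill in."""
    return [
        field
        for field in BOOT_FIELDS
        if server_offer.get(field) and proxy_offer.get(field)
    ]


# The regular DHCP server hands out the address but no boot information.
dhcp_server_offer = {
    "from": "10.1.10.1",
    "your_ip": "10.1.10.50",  # made-up client address
    "next_server": None,
    "boot_file": None,
}

# The dhcpProxy (dnsmasq) hands out boot information but no address.
proxy_offer = {
    "from": "10.1.10.42",
    "your_ip": None,
    "next_server": "10.1.10.42",
    "boot_file": "undionly.kpxe",
}

print(overlapping_boot_fields(dhcp_server_offer, proxy_offer))  # prints []
```

An empty list is the expected split of duties. If the regular DHCP server also filled in a next server, the function would return ['next_server'], which is the kind of overlap worth checking for in a capture.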
-
When I’m debugging PXE booting I like to use this command from the FOG server (assuming the FOG server, target computer and DHCP server are on the same subnet):
tcpdump -w output.pcap port 67 or port 68 or port 69 or port 4011
Since DHCP is broadcast based, any computer can pick this up, but capturing from the FOG server also gets you the unicast dhcpProxy (4011) and TFTP (69) communications. If you run the tcpdump, boot the target computer to the error, and then post the pcap here (you can also look at it with Wireshark), I can tell you whether it’s correct or not.
BUT, if you are getting to the FOG iPXE menu then this is not your problem. Getting to the iPXE menu is where the DHCP/PXE process stops and hands over to the iPXE kernel, which is used to load the FOG Engine (the customized Linux OS that captures and deploys images on the target computer).
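For reference, the ports in that capture filter can be written down with the role each one plays. The Python sketch below rebuilds the same filter expression from a small table; the helper names are made up for illustration, and the role descriptions reflect the standard port assignments rather than anything specific to FOG.

```python
# Ports from the tcpdump command above, with the role each one plays
# while a client PXE boots.
PXE_PORTS = {
    67: "DHCP/BOOTP server",
    68: "DHCP/BOOTP client",
    69: "TFTP (boot file download)",
    4011: "proxyDHCP (unicast)",
}


def capture_filter(ports):
    """Build a tcpdump filter expression such as 'port 67 or port 68'."""
    return " or ".join(f"port {port}" for port in ports)


def tcpdump_command(outfile="output.pcap"):
    """Return the tcpdump argument list (hypothetical helper; not executed here)."""
    return ["tcpdump", "-w", outfile, capture_filter(PXE_PORTS)]


for port, role in PXE_PORTS.items():
    print(f"{port:>5}  {role}")
print(" ".join(tcpdump_command()))
```

The last line prints the same command as above, so the table can double as a checklist of what to look for in the capture.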
-
@george1421 : Absolutely - I just want to make sure I put the right information out there in light of that.
Will Wireshark explicitly show options 66 and 67? I guess I don’t know how to validate that “the payload of DHCP 66 and 67” are correct. My assumption is that 66 is the “Next server IP address” and that 67 is “Boot file name”.
-
@george1421 : I do get into the FOG iPXE menu, so I guess… that’s good to know. It seems to be related to the kernel then…? Which I believe was a path I was on earlier, but wasn’t sure where I was going with it at the time.
-
@dholtz-docbox OK, while this is just a picture of the pcap, I can see that whoever sent this packet (just off the screen) is sending the next server (option 66) as 10.1.10.42 (hopefully your FOG server) and the boot file (DHCP option 67) as undionly.kpxe. In my opinion this is a proper DHCP offer response from dnsmasq.
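On the question of how to validate these values: as far as I know, the "Next server IP address" and "Boot file name" lines in Wireshark are the fixed BOOTP header fields (siaddr and file), while options 66 and 67 appear separately in the options list, and a server may use either or both. The Python sketch below pulls all four out of a raw DHCP payload (the UDP payload, not the whole Ethernet frame). The function name is hypothetical and option overloading (option 52) is ignored, so treat it as a rough check rather than a full parser.

```python
import socket

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"


def parse_boot_info(payload: bytes) -> dict:
    """Extract next-server and boot-file details from a raw DHCP payload."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")

    info = {
        # Fixed BOOTP header: siaddr is bytes 20-23, file is bytes 108-235.
        "siaddr": socket.inet_ntoa(payload[20:24]),
        "file": payload[108:236].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "option66": None,  # TFTP server name
        "option67": None,  # boot file name
    }

    # Options follow the magic cookie as code/length/value triples.
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):  # truncated option
            break
        length = payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if code in (66, 67):
            text = value.rstrip(b"\x00").decode("ascii", "replace")
            info[f"option{code}"] = text
        i += 2 + length
    return info
```

For the offer described above, you would expect either siaddr or option66 to be 10.1.10.42, and either file or option67 to be undionly.kpxe.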
-
@george1421 : Thank you for clarifying that, that was my suspicion in the end too.
Also, thank you for taking the time to revisit this topic. I know I am close, given what successes I have had so far.
-
@dholtz-docbox said in Host Startup; Booting into LVM Disk Fails:
@george1421 : I do get into the FOG iPXE menu, so I guess… that’s good to know. It seems to be related to the kernel then…? Which I believe was a path I was on earlier, but wasn’t sure where I was going with it at the time.
You are correct then, it’s not a PXE/DHCP issue. I guess I need to read the thread now.
-
@george1421 OK, I’m being super lazy now. What is the current issue then?
-
Well, the good news there is that we chopped off a pretty huge segment of what I thought to be wrong, which was certainly a big question of mine initially. So, pending a few screenshots I linked, I don’t think there is much to catch up on.
-
@george1421 : Haha, okay, let’s start over, knowing what we know now…
-
@dholtz-docbox If you are capturing a Linux image that uses LVM, the last I knew the LVM disk structure wasn’t supported. Understand I don’t work with cloning Linux systems, so I can’t give you first-hand experience with them. But the last I knew, only normal partitions are supported.
@Developers Is LVM now supported with FOG 1.3.0RCx?
-
After discussing what I have, the current issue is as follows…
My system has two (2) drives; the first drive /dev/sda is not formatted or touched, and the second drive /dev/sdb is formatted with Ubuntu 14.04.5 - Server using LVM. There is nothing complex about the LVM partition - other than it being LVM - as it only contains one primary partition currently - along with any of its other typical partitions.
This is where the issue comes into play.
I have the host registered with the FOG server, and all I am trying to do is boot the host into its system through PXE. However, every time it goes to boot into the drive, it either hangs - if using SANBOOT - or hits a chainloading loop - when using EXIT.
I am told the chainloading loop is easier to deal with than the SANBOOT issue.
Since then, I have been trying to narrow down various things, and DHCP configuration was the first to square away.
-
@george1421 Just thinking after I hit the submit button: deploy to a target computer (of course it will fail on reboot), then go into FOG and schedule another deployment, but be sure you pick the DEBUG deploy option. Then PXE boot the target computer. That will boot you into the FOG Engine and drop you to a command prompt after you press Enter a few times. From there you can check the disk using standard Linux tools to see if you can detect anything wrong with the disk structure.
-
@george1421 To bring you up to speed, scroll down and look at the big photo he posted. In there, iPXE is saying: “Duplicate option 66 (next server) from DHCP proxy and DHCP server.”
This is the least of the problems though; for some reason the iPXE boot script isn’t getting generated right by the FOG server.
-
@Wayne-Workman : Oh! Right - that was the other concern of mine I have on my whiteboard. Thanks for bringing that up again. Should I just… delete /var/www/fog and call .foginstall again on the FOG Server? Or what would be the best way to invalidate this?
-
@dholtz-docbox Looking at a thread from a while ago here:
https://forums.fogproject.org/topic/8736/chainloading-failed
Assign an image to this host - it doesn’t need to be a legit image, just make one up in Image Management, and assign it. See if the chainloading error goes away or not.
-
I also wanted to update you on this…
I believe some services needed to be restarted after upgrading to RC-14, or something needed to be restarted. But after looking at the DHCP issues and talking w/ George, I have queried it again, for EXIT mode, and now I get…
Edit> I removed the output. It was misleading, because I had the PXE menu enabled. George and I are talking, and when looking at the SANBOOT option, it tries to boot into 0x80 instead of 0x81, which would be the second drive - which is the drive I am looking to boot from.
My SANBOOT output is…
#!ipxe
set fog-ip 10.1.10.42
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
sanboot --no-describe --drive 0x80
The EXIT output is still the same, with the last line being exit.
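For anyone following along, the 0x80 versus 0x81 distinction comes from BIOS drive numbering, where hard disks are counted upward from 0x80. The Python sketch below shows the arithmetic and what the last line of the script would need to look like for the second drive. The helper names are made up, and this is not how FOG generates its boot script; it only illustrates the numbering.

```python
# BIOS numbers hard disks starting at 0x80: first disk 0x80, second 0x81, ...
BIOS_FIRST_HARD_DISK = 0x80


def bios_drive_number(disk_index: int) -> int:
    """Map a zero-based disk index to its BIOS drive number."""
    if not 0 <= disk_index <= 0x7F:
        raise ValueError("disk index out of range for BIOS drive numbers")
    return BIOS_FIRST_HARD_DISK + disk_index


def sanboot_line(disk_index: int) -> str:
    """Format the sanboot line seen in the script output for a given disk."""
    return f"sanboot --no-describe --drive {bios_drive_number(disk_index):#04x}"


print(sanboot_line(0))  # first drive, what the script currently emits
print(sanboot_line(1))  # second drive, the one I want to boot from
```

The first call reproduces the line in my output above; the second ends in 0x81, which matches the drive I am trying to boot.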
-
Immediately after calling the link I gave you earlier, check the apache error logs for entries from the moment prior.
Web Interface -> FOG Configuration -> Log Viewer -> Apache Error
Sort by newest first, be aware of the timestamps. Copy/paste what you find.