Host Startup; Booting into LVM Disk Fails
-
Unfortunately, I have to move forward with the LVM installation for now. Given the time until our next release, it is too risky to greenlight the change to EXT. That said, I had the same issue with both SANBOOT and EXIT with the EXT install, so my wild guess is that the problem is the same one.
I was also hopeful that installing EXT would eliminate the extra boot options in my BIOS. That was not the case, so maybe they are related to Server Edition instead? I just find it odd that, after installing Ubuntu to the disk, I can no longer boot from the disk directly and must boot from one of those Ubuntu options. Which one I pick also matters, as the other one appears to have no operating system on it. Perhaps it’s the swap?
As far as the other information…
- host mac => 08:60:6e:fa:05:af
- server ip => 10.1.10.42
and here’s an image of the boot output…
-
@dholtz-docbox Ok, in your browser click on this link and give us a copy of the output:
10.1.10.42/fog/service/ipxe/boot.php?mac=08:60:6e:fa:05:af
It appears you have your DHCP server set up for FOG, and you also have dnsmasq set up for FOG? This isn’t necessary. Choose one or the other; I’d suggest using the full DHCP server.
And what boot file are you using again? I don’t think you listed that earlier. You can quickly find this just by looking at what you have set in DHCP option 067.
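If a browser is not handy, the same request can be scripted. This is only a minimal sketch, assuming Python 3 and plain HTTP to the server; the function names `fog_boot_url` and `fetch_boot_script` are made up for illustration and are not part of FOG. It builds the boot.php URL from the server IP and host MAC given in this thread and downloads whatever iPXE script the server returns.

```python
from urllib.request import urlopen


def fog_boot_url(server_ip, mac, webroot="fog"):
    """Build the boot.php URL for one host MAC (hypothetical helper)."""
    return f"http://{server_ip}/{webroot}/service/ipxe/boot.php?mac={mac.lower()}"


def fetch_boot_script(server_ip, mac, timeout=5.0):
    """Download the iPXE script the server generates for this MAC."""
    with urlopen(fog_boot_url(server_ip, mac), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (needs network access to the FOG server):
# print(fetch_boot_script("10.1.10.42", "08:60:6e:fa:05:af"))
```

The returned text should be the same script you would see in the browser, so it can be pasted into a reply or diffed against the output from another server.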
-
#!ipxe
set fog-ip 10.1.10.42
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
exit
-
As far as whether I have both DHCP and DNSMasq set up for FOG: I should only have DNSMasq. Is there a way to validate this? The .fogsettings for the server are as follows…
## Start of FOG Settings
## Created by the FOG Installer
## Version: 1.3.0-RC-11
## Install time: Fri 30 Sep 2016 05:26:11 PM EDT
ipaddress='10.1.10.42'
interface='eth0'
submask='255.255.255.0'
routeraddress='10.1.10.1'
plainrouter='10.1.10.1'
dnsaddress='# No dns added'
username='fog'
password='{password}'
osid='2'
osname='Debian'
dodhcp='N'
bldhcp='0'
dhcpd=''
blexports='1'
installtype='N'
snmysqluser='root'
snmysqlpass=''
snmysqlhost='localhost'
installlang='0'
donate='0'
storageLocation='/images'
fogupdateloaded=1
docroot='/var/www/'
webroot='/fog/'
caCreated='yes'
startrange=''
endrange=''
bootfilename='undionly.kpxe'
packages='apache2 bc build-essential cpp curl g++ gawk gcc gzip htmldoc lftp libapache2-mod-php5 libc6 libcurl3 m4 mysql-client mysql-server net-tools nfs-kernel-server openssh-server php5 php5-cli php5-curl php5-fpm php5-gd php5-json php5-ldap php5-mcrypt php5-mysqlnd php-gettext sysv-rc-conf tar tftpd-hpa tftp-hpa vsftpd wget xinetd zlib1g '
noTftpBuild=''
notpxedefaultfile=''
sslpath='/opt/fog/snapins/ssl/'
backupPath='/home/'
php_ver='5'
php_verAdds='-5.6'
sslprivkey='/opt/fog/snapins/ssl//.srvprivate.key'
## End of FOG Settings
Then I have the following dnsmasq.conf…
# Don't function as a DNS server:
port=0
# Log lots of extra information about DHCP transactions.
log-dhcp
# Set the root directory for files available via FTP.
tftp-root=/tftpboot
# The boot filename, Server name, Server Ip Address
dhcp-boot=undionly.kpxe,,10.1.10.42
# Disable re-use of the DHCP servername and filename fields as extra
# option space. That's to avoid confusing some old or broken DHCP clients.
dhcp-no-override
# PXE menu. The first part is the text displayed to the user. The second is the timeout, in seconds.
pxe-prompt="Booting FOG Client", 0
# The known types are x86PC, PC98, IA64_EFI, Alpha, Arc_x86,
# Intel_Lean_Client, IA32_EFI, BC_EFI, Xscale_EFI and X86-64_EFI
# This option is first and will be the default if there is no input from the user.
pxe-service=X86PC, "Boot to FOG", undionly
pxe-service=X86-64_EFI, "Boot to FOG UEFI", ipxe
dhcp-range=10.1.10.1,proxy
Edit> I would be led to believe that only DNSMasq should be handled because of…
dodhcp='N' bldhcp='0'
… no?
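One way to check those two values without eyeballing the whole file is a small script. This is a rough sketch, assuming the file keeps the shell-style `key='value'` layout shown above and that `dodhcp='Y'` or `bldhcp='1'` would mean the installer configured DHCP on this box (that reading of the two keys is an assumption taken from this thread, not from FOG documentation). `parse_fogsettings` and `installer_manages_dhcp` are hypothetical helper names.

```python
import shlex


def parse_fogsettings(text):
    """Parse shell-style key='value' lines into a dict, skipping comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips the surrounding quotes the same way a shell would.
        parts = shlex.split(value)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def installer_manages_dhcp(settings):
    """Assumed meaning: 'Y' / '1' indicate the installer set up DHCP here."""
    return settings.get("dodhcp", "").upper() == "Y" or settings.get("bldhcp") == "1"


# Example call (the path is an assumption; adjust to where your file lives):
# with open("/opt/fog/.fogsettings") as handle:
#     print(installer_manages_dhcp(parse_fogsettings(handle.read())))
```

Note this only reflects what the installer was told to configure on this one machine. It says nothing about other DHCP servers answering on the subnet, which still has to be checked with a packet capture.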
-
You had me thinking…
At one point I used one of the other VMs on the network to do an installation, where it was set up using FOG’s DHCP. Is it possible something is still lingering on this machine? I am going to shut it off and give it a whack in the meantime…
Edit> Nope. No difference.
-
@dholtz-docbox said in Host Startup; Booting into LVM Disk Fails:
… no?
Read what your screenshot says.
“Duplicate option 66 (next server) from DHCP proxy and DHCP server.” This means your DHCP server is configured for FOG, and dnsmasq is configured for FOG. dnsmasq will refuse to run on a box that is also serving DHCP, so these two things are on two different systems. You don’t need both, and a full DHCP server is superior by far.
Use wireshark with the bootp filter to determine where the other DHCP server is, and do some ipconfig /release /renew commands to get them to respond - but you should already know where it is really.
-
@dholtz-docbox said in Host Startup; Booting into LVM Disk Fails:
#!ipxe
set fog-ip 10.1.10.42
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
exit
So, here’s the output my server gives for an unregistered host (the mac you gave), and the output of a registered host I have here. I’m on 1.3.0 RC-14.
#!ipxe
set fog-ip 10.2.1.11
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
cpuid --ext 29 && set arch x86_64 || set arch i386
iseq ${platform} efi && set key 0x1b || set key 0x01
iseq ${platform} efi && set keyName ESC || set keyName CTRL + A
prompt --key ${key} --timeout 4000 Booting... (Press ${keyName} to access the menu) && goto menuAccess || sanboot --no-describe --drive 0x80
:menuAccess
login
params
param mac0 ${net0/mac}
param arch ${arch}
param platform ${platform}
param username ${username}
param password ${password}
param menuaccess 1
param debug 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain -ar http://10.2.1.11/fog/service/ipxe/boot.php##params

#!ipxe
set fog-ip 10.2.1.11
set fog-webroot fog
set boot-url http://${fog-ip}/${fog-webroot}
cpuid --ext 29 && set arch x86_64 || set arch i386
iseq ${platform} efi && set key 0x1b || set key 0x01
iseq ${platform} efi && set keyName ESC || set keyName CTRL + A
prompt --key ${key} --timeout 4000 Booting... (Press ${keyName} to access the menu) && goto menuAccess || sanboot --no-describe --drive 0x80
:menuAccess
login
params
param mac0 ${net0/mac}
param arch ${arch}
param platform ${platform}
param username ${username}
param password ${password}
param menuaccess 1
param debug 1
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain -ar http://10.2.1.11/fog/service/ipxe/boot.php##params
Have you modified any files yourself or modified your iPXE boot menu at all?
-
Yeah, I was wondering about that line - it has been bothering me. I have never used wireshark, so let me grab it and look around.
You have a lot more output than I do.
I have not edited anything though; I have only run the installfog.sh script over the existing installation.
Let me go and determine who else is serving DHCP real quick.
Edit> Which machine should I be using wireshark on? I’m not quite sure how to use the tool in the requested manner.
Edit> Nevermind. Wireshark was starting up with one of its windows resized to its minimum height - the window which lists the packets. I am looking through it now.
Edit> Am I reading this right: both 10.1.10.1 and 10.1.10.42 are serving DHCP, whereas with DNSMasq only 10.1.10.1 should be serving it?
Edit> I believe that is right, reading into how DHCP works. There should only be one DHCP Offer, correct? Both 10.1.10.1 and 10.1.10.42 offering would indicate that 10.1.10.42 is set up, somehow, to serve its own DHCP? I guess my next question is: if my dnsmasq.conf is correct, what is lingering that is causing 10.1.10.42 to also serve DHCP?
-
@dholtz-docbox TBH: I did not read the entire thread here, but your dhcp process looks normal if you are running dnsmasq in dhcpPROXY mode. In that case your primary dhcp server is 10.1.10.42 and 10.1.10.1 is the dnsmasq.
Looking down a bit more, 10.1.10.1 is giving the ack, so that tells me that 10.1.10.1 is your dhcp server. The way I read this pcap is that you DO have two dhcp servers on your subnet. If .10.1 is your defined dhcp server for your subnet, you need to understand what .10.42 is doing. It should not be issuing an Offer if it has nothing to offer the client sending the discover.
-
@george1421 : I am not sure if our terminology is in sync, but to clarify… the router which handles DHCP is 10.1.10.1 and the FOG Server is 10.1.10.42. When you say “primary DHCP”, from whose point-of-view is that? My current assumption is that the primary DHCP would be the one serving IP’s; where the FOG Server would be the DNSMasq?
Further, I agree, 10.1.10.42 should NOT be offering anything. But when I look at Wireshark, 10.1.10.42 is the one with a boot filename, whereas 10.1.10.1 has no real supporting information.
That said… does my dnsmasq.conf need to be flipped around, such that…
dhcp-boot=undionly.kpxe,,10.1.10.1
and…
dhcp-range=10.1.10.42,proxy
… or something like that? Curious if I had something flipped around incorrectly.
-
@dholtz-docbox one of the two is incorrect.
Let’s go this way. Your dhcp server is .10.1, so then is your fog/dnsmasq server .10.42?
If that is the case, both IP addresses below need to point to your FOG/dnsmasq server or the target will not pxe boot. In this setup dnsmasq is only supplying the {next server} and {boot file}, not an IP address; that comes from your dhcp server.
-
@george1421 : Yeah, I noticed that PXE won’t boot this way, which makes sense. I reverted the changes. What I have been trying to figure out is what Wayne mentioned: why the “Duplicate option 66 (next server) from DHCP proxy and DHCP server” message is present. So far, every one of my network captures has yielded the aforementioned results. The only thing that strikes me as odd is the fact that both 10.1.10.1 and 10.1.10.42 are trying to make an offer. That said, only 10.1.10.1 ACKs, but it has no boot filename or anything. If everything is set up the way it should be, 10.1.10.1 should be supplying “undionly.kpxe” in its boot filename, right?
Edit> Oh, re-reading your previous post, and reading what I wrote again, something might have clicked. So… 10.1.10.1 is serving the IP and 10.1.10.42 is serving the next-server, which is its IP, 10.1.10.42, correct? So… this would be typical behavior, if that is all correct… Which brings me back to not being sure why there is a duplicate option 66.
-
@dholtz-docbox Understand I did not read the entire thread, so I’m not sure of the root of your issue (sorry, very busy today). But using dhcpProxy (dnsmasq) you will see two offers, and they are offering different things (you can see that if you dig into the packet payload). You should get the ack from your dhcp server (which is what you are seeing). That is locking in the address for the client. .10.42 is the dhcpPROXY, so it will be supplying dhcp options 66 and 67 (if you set it up correctly). The payload of dhcp 66 and 67 must point to your FOG server because that is where it’s getting the iPXE boot file.
-
When I’m debugging pxe booting I like to use this command from the FOG server (assuming the fog server, target computer and dhcp server are on the same subnet)
tcpdump -w output.pcap port 67 or port 68 or port 69 or port 4011
Since dhcp is broadcast based, any computer can pick this up, but when it is done from the fog server you will also get the unicast dhcpProxy (4011) and the tftp (69) communications. If you want to do this with tcpdump, boot the target computer to the error and then post the pcap here (which you can also look at with wireshark), and I can tell you if it’s correct or not. BUT, if you are getting to the FOG iPXE menu then this is not your problem, because getting to the iPXE menu is where the dhcp/pxe process stops and then transitions over to the iPXE kernel, which is used to load the FOG Engine (the customized linux OS that captures and deploys images on the target computer).
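For a quick sanity check of the capture before posting it, something like the following can summarize what was caught. It is only a sketch under several assumptions: the file is a classic pcap (not pcapng), the link type is Ethernet, frames are untagged IPv4, and only the UDP destination port is inspected. `count_pxe_packets` and `PXE_PORTS` are names invented for this example.

```python
import struct
from collections import Counter

# Destination ports matched by the tcpdump filter above.
PXE_PORTS = {67: "dhcp-server", 68: "dhcp-client", 69: "tftp", 4011: "proxydhcp"}


def count_pxe_packets(pcap_bytes):
    """Count UDP packets per PXE-related destination port in a classic pcap."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    counts = Counter()
    offset = 24  # size of the pcap global header
    while offset + 16 <= len(pcap_bytes):
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        # 14-byte Ethernet header; EtherType 0x0800 is IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        if frame[23] != 17:  # IPv4 protocol field, 17 = UDP
            continue
        udp_start = 14 + (frame[14] & 0x0F) * 4  # skip the IPv4 header
        if len(frame) < udp_start + 4:
            continue
        _src_port, dst_port = struct.unpack_from("!HH", frame, udp_start)
        if dst_port in PXE_PORTS:
            counts[PXE_PORTS[dst_port]] += 1
    return counts


# Example call on the file written by the tcpdump command:
# with open("output.pcap", "rb") as handle:
#     print(count_pxe_packets(handle.read()))
```

If the summary shows DHCP traffic but nothing to port 4011 or 69, the target never got as far as talking to the proxy or fetching the boot file. Wireshark remains the better tool for reading the actual packet contents.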
-
@george1421 : Absolutely - I just want to make sure I put the right information out there in light of that.
Will Wireshark explicitly show options 66 and 67? I guess I don’t know how to validate that “the payload of DHCP 66 and 67” are correct. My assumption is that 66 is the “Next server IP address” and that 67 is “Boot file name”.
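For reference, RFC 2132 defines option 66 as the TFTP server name and option 67 as the bootfile name. The “Next server IP address” and “Boot file name” labels in Wireshark belong to the fixed BOOTP header fields (siaddr and file), which carry similar information but are not the options themselves. To see how the options are laid out on the wire, here is a minimal sketch of a parser for the options area, assuming you already have the raw bytes that follow the DHCP magic cookie; `parse_dhcp_options` and `describe_pxe_options` are hypothetical names.

```python
def parse_dhcp_options(options):
    """Parse code/length/value triples from the bytes after the magic cookie."""
    parsed = {}
    index = 0
    while index < len(options):
        code = options[index]
        if code == 0:  # pad option, single byte with no length
            index += 1
            continue
        if code == 255:  # end option
            break
        if index + 1 >= len(options):  # truncated option, stop parsing
            break
        length = options[index + 1]
        parsed[code] = options[index + 2:index + 2 + length]
        index += 2 + length
    return parsed


def describe_pxe_options(options):
    """Return options 66 and 67 as text, when present."""
    names = {66: "tftp_server_name", 67: "bootfile_name"}
    parsed = parse_dhcp_options(options)
    return {
        name: parsed[code].rstrip(b"\x00").decode("ascii", errors="replace")
        for code, name in names.items()
        if code in parsed
    }
```

In Wireshark itself, expanding the DHCP/BOOTP section of an Offer lists each option by number, so 66 and 67 can be read directly there when the sender includes them.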
-
@george1421 : I do get into the FOG iPXE menu, so I guess… that’s good to know. It seems to be related to the kernel then…? Which I believe was a path I was on earlier, but wasn’t sure where I was going with it at the time.
-
@dholtz-docbox OK, while this is just a picture of the pcap, I can see that whoever sent this packet (just off the screen) is sending the next server (option 66) as 10.1.10.42 (hopefully your fog server) and the boot file (dhcp option 67) as undionly.kpxe. This is a proper dhcp offer response from dnsmasq in my opinion.
-
@george1421 : Thank you for clarifying that, that was my suspicion in the end too.
Also, thank you for taking the time to revisit this topic. I know I am close, given what successes I have had so far.
-
@dholtz-docbox said in Host Startup; Booting into LVM Disk Fails:
@george1421 : I do get into the FOG iPXE menu, so I guess… that’s good to know. It seems to be related to the kernel then…? Which I believe was a path I was on earlier, but wasn’t sure where I was going with it at the time.
You are correct then, it’s not a pxe/dhcp issue. I guess I need to read the thread now.
-
@george1421 OK, I’m being super lazy now. What is the current issue then?