Client boots from FOG server in one network, but registers with FOG server in different network
-
Hey Folks,
I was creating an additional FOG server for another building, and after I got FOG installed on it, I copied over my ltsp.conf file to the 2nd server. I also copied the undionly.kpxe to undionly.0 (rather than symlinking them).
Somehow I must have messed up the boot kernels in my original FOG server, because machines that pxe boot in this network now actually try to register with the FOG server in the other building! My hunch is that I copied undionly.kpxe from server2 over to server1 by mistake, rather than copying it to undionly.0 on the same server.
Server1 is on the 10.105.0.0 network, and Server2 is on 10.95.0.0 network.
Here’s what I see in syslog, when I grep for dnsmasq:
Sep 8 16:25:31 server1 dnsmasq-dhcp[1088]: 3451364178 available DHCP subnet: 10.105.0.45/255.255.0.0
Sep 8 16:25:31 server1 dnsmasq-dhcp[1088]: 3451364178 vendor class: PXEClient:Arch:00000:UNDI:002001
Sep 8 16:25:31 server1 dnsmasq-dhcp[1088]: 3451364178 PXE(eth0) 78:2b:cb:b7:a3:52 proxy
Sep 8 16:25:31 server1 dnsmasq-dhcp[1088]: 3451364178 tags: eth0
Sep 8 16:25:31 server1 dnsmasq-dhcp[1088]: 3451364178 bootfile name: undionly.kpxe
Sep 8 16:25:31 server1 dnsmasq-dhcp[1088]: 3451364178 next server: 10.95.0.45
Given the above, I’m thinking that I need to regenerate (or edit) my undionly.kpxe and undionly.0 files on Server1, as I assume that that’s the one that is telling my pxe booted clients where to find the next server.
If I’m barking up the wrong tree, please tell me where else I should be looking. My ltsp.conf files on Server1 and Server2 both look good.
Thanks a lot in advance!
M
-
It occurs to me now, after hunting around some more, that I could perhaps achieve what I want simply by following the directions used to change an IP address. In my case, I wouldn’t even need to change the .fogsettings file, but would just need to rerun the installer.
Thoughts?
-
undionly.kpxe doesn’t have anything in it that tells it to use a certain server. It accepts arguments.
It’s dnsmasq. It’s bleeding over to the other building somehow.
-
@Wayne-Workman Well, I wondered about that, so I disabled the port on the switch that connects server2 to the network. When I did that, that machine hung when trying to pxeboot and I got an error something like file not found (can’t recall the exact wording, but was the kind of error I received when tftp server wasn’t working).
At one point I did have the wrong broadcast entry in the interface file of server2. The IP address, gateway, and subnet mask correctly specified the network for building2, but the broadcast entry was for building1. I fixed that, and restarted the server, etc. Could that info be lingering in the network somehow? How could I tell?
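For anyone double-checking a broadcast entry like the one described above, here is a minimal sketch that derives the expected broadcast address from an IP and netmask. The function name broadcast_for is made up for illustration, and the 255.255.0.0 mask for server2 is an assumption based on the mask shown for server1 in the dnsmasq log.

```shell
# Hypothetical helper: print the broadcast address for an IPv4 address and netmask.
# The body runs in a subshell, so the IFS change does not leak to the caller.
broadcast_for() (
    IFS=.
    # Split both dotted quads into eight positional parameters:
    # $1..$4 are the IP octets, $5..$8 are the netmask octets.
    set -- $1 $2
    # Broadcast octet = IP octet OR the inverted mask octet.
    printf '%d.%d.%d.%d\n' \
        "$(( $1 | (255 - $5) ))" \
        "$(( $2 | (255 - $6) ))" \
        "$(( $3 | (255 - $7) ))" \
        "$(( $4 | (255 - $8) ))"
)
```

Calling broadcast_for with server2's address and the assumed mask gives the value the broadcast line in the interface file should carry; anything in the 10.105.x.x range there would point back at building1.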
-
@MarcB I suspect the IP did change and PXE is looking at the wrong server. Edit the /tftpboot/default.ipxe file to look at the appropriate server.
-
Here’s /tftpboot/default.ipxe on server1 (it contains the address of server1):
#!ipxe
cpuid --ext 29 && set arch x86_64 || set arch i386
params
param mac0 ${net0/mac}
param arch ${arch}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://10.105.0.45/fog/service/ipxe/boot.php##params
And here’s the same file from Server2 (it contains the correct address for Server2):
#!ipxe
cpuid --ext 29 && set arch x86_64 || set arch i386
params
param mac0 ${net0/mac}
param arch ${arch}
isset ${net1/mac} && param mac1 ${net1/mac} || goto bootme
isset ${net2/mac} && param mac2 ${net2/mac} || goto bootme
:bootme
chain http://10.95.0.45/fog/service/ipxe/boot.php##params
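A quick way to compare the two files without eyeballing them is to pull out just the host on the chain line. This is only a sketch: chain_host is a made-up name, and it assumes the chain line uses plain http:// as in the files posted above.

```shell
# Hypothetical helper: print the host that an iPXE script chains to over HTTP.
chain_host() {
    # Keep only the part between "http://" and the next "/" on the chain line.
    sed -n 's|^chain http://\([^/]*\)/.*|\1|p' "$1"
}
```

Running chain_host against /tftpboot/default.ipxe on each server should print that server's own address. If it does (as it does here), the wrong "next server" is coming from somewhere other than this file.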
-
I know this is a crazy problem to track down, as there were so many changes. I’d like to mention that server1 has worked great for years, and that server2 worked in building2 and allowed me to image the machine there that I needed to.
-
More looking at logs on server1 revealed several instances of this:
Sep 8 07:17:31 Server1 dnsmasq-dhcp[1160]: DHCP, proxy on subnet 10.95.0.45
Sep 8 07:17:31 Server1 dnsmasq-dhcp[1160]: DHCP, proxy on subnet 10.105.0.45
This makes me think that server1 is indeed the one that’s screwed up, and is telling pxe clients to boot to the server in building2.
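To spot this kind of thing faster, the proxy lines can be boiled down to a list of unique addresses. A rough sketch, assuming the log lines look exactly like the ones above; proxy_subnets is a made-up name, and /var/log/syslog in the usage note is an assumption about where the syslog lives.

```shell
# Hypothetical helper: read syslog lines on stdin and print each address
# that dnsmasq reports it is doing proxy DHCP for, once.
proxy_subnets() {
    sed -n 's/.*DHCP, proxy on subnet \([0-9.]*\).*/\1/p' | sort -u
}
```

Usage would be something like: grep dnsmasq /var/log/syslog | proxy_subnets. A server that only serves one building should list only its own address; a second address means dnsmasq picked up a second proxy configuration from somewhere.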
-
Okay - I found the error. I had created a copy of ltsp.conf in /etc/dnsmasq.d on server1 and renamed it ltsp.conf.server2. I scp’d it to server2, but the copy was still sitting in /etc/dnsmasq.d on server1. I just reread the README and see that every file in that directory will be read by dnsmasq unless it has “.dpkg-dist”, “.dpkg-old” or “.dpkg-new” appended to the end.
I (incorrectly) figured that renaming the file would keep it from being read.
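In case it helps someone else, here is a small sketch that lists which files in a directory dnsmasq would pick up under the rule from the README. The function name is made up. Beyond the three .dpkg-* suffixes from the README, it also skips names ending in ~, names starting with a dot, and names wrapped in #, which is my understanding of what dnsmasq always ignores in a conf-dir; treat that part as an assumption and check the man page for your version.

```shell
# Hypothetical helper: list the file names in a directory that dnsmasq
# would read as configuration, given the Debian default exclusions.
list_dnsmasq_conf_files() {
    dir=$1
    # The * glob does not match dotfiles, so those are skipped implicitly.
    for path in "$dir"/*; do
        [ -f "$path" ] || continue
        name=${path##*/}
        case "$name" in
            *.dpkg-dist|*.dpkg-old|*.dpkg-new) continue ;;  # from the README
            *~|'#'*'#') continue ;;                         # editor backups
        esac
        printf '%s\n' "$name"
    done
}
```

Running list_dnsmasq_conf_files /etc/dnsmasq.d on server1 would have shown both ltsp.conf and ltsp.conf.server2, which is exactly the problem: the rename did nothing to hide the second file.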
Thanks for your help, and for your great, great product!
-
@MarcB said in Client boots from FOG server in one network, but registers with FOG server in different network:
I just reread the README and see that every file in that directory will be read by dnsmasq unless it has “.dpkg-dist”, “.dpkg-old” or “.dpkg-new” appended to the end.
Lol… that would explain what I found long ago, and ranted about in this thread:
https://forums.fogproject.org/topic/8127/has-something-changed-with-uefi/24
@Wayne-Workman said in Has something changed with UEFI?:
Also, I want to point out some stuff with dnsmasq that has tripped me up before.
Firstly, it uses WHAT-EVER it finds inside /etc/dnsmasq.d. Doesn’t matter what it’s named. ltsp.conf, ltsp.conf.old, MyXmasWishList.txt - it does not care. If you have backup configurations in there, move them somewhere else.
Maybe that issue is resolved in the newer version, I don’t know.
Second - when dnsmasq sends out its ProxyDHCP - it appends .0 to the filename it gives. You could do some complex stuff with symbolic links, but I prefer not to. I prefer to copy the file I want to use. In your case, let’s go with ipxe.efi. You’d copy that like so:
cp /tftpboot/ipxe.efi /tftpboot/ipxe.efi.0
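One downside of copying instead of symlinking is that the .0 copy can go stale when the original is replaced. A minimal sketch of a check-and-copy step follows; ensure_dot0_copy is a made-up name, not part of FOG or dnsmasq.

```shell
# Hypothetical helper: make sure FILE.0 exists and is byte-identical to FILE.
ensure_dot0_copy() {
    src=$1
    dst=$1.0
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        echo "up to date: $dst"
    else
        # Missing or different: refresh the copy.
        cp "$src" "$dst" && echo "copied: $dst"
    fi
}
```

For the example above that would be ensure_dot0_copy /tftpboot/ipxe.efi, rerun whenever the boot file is updated.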