Hosts are looking for tftp server.
-
@george1421 I still don’t understand why we don’t see an offer packet from your main dhcp server.
What is your dhcp server?
Is the pxe booting target computer on the same subnet as the fog server?
What I see is a non-standard pxe boot. The client is getting an ip address from somewhere because it’s talking to the tftp server.
-
@george1421 OK, I’ll try tomorrow.
This is all very strange. Yesterday only one of these hosts asked for a TFTP server; today 15 hosts did.
I have used FOG with this config for almost 4 months on different HP models, and only once or twice did a host ask me for a TFTP server.
-
@george1421 The client gets its IP from the University’s DHCP server. In the lab I use dnsmasq as the proxy DHCP server. My server, 192.168.149.43, is configured as an IP helper on my lab’s switch, which serves 4 subnets and 400 hosts. We have only one NAT address for all hosts, 132.208…, which every host uses as its exit address. It works like a router. The University’s DHCP server is on 132.208. and works fine; it hands out IPs.
Exit address 192.168.148.1
-
@marted Well, what I’m saying is that if the fog server and the pxe booting computer are on the same subnet, then what I see in the pcap is non-standard.
The standard dhcp/pxe boot is this.
Client -> Discover
DHCP -> Offer
ProxyDHCP -> Offer
Client -> Request
DHCP -> ACK
Client to ProxyDHCP -> Boot info request (udp port 4011)
ProxyDHCP to Client -> Boot info (udp port 4011)
Client -> TFTP server boot file size
TFTP -> Client boot file size is
Client -> TFTP server give me file XXXX
That is what I expect to see.
It’s possible that your dhcp-relay service is sending the off-subnet dhcp offer via unicast; in that case the fog server wouldn’t see it.
BUT what I see in the pcap is
Discover
Offer from proxy dhcp
then right away another Discover from the client. This means it didn’t get an acceptable offer to give it an IP address. This is non-standard.
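One way to check a pcap for who actually sent Offers is sketched below. It is only a sketch under stated assumptions: the helper name `offer_sources` is hypothetical, the pcap file name is a placeholder, and the field name `dhcp.option.dhcp` assumes Wireshark/tshark 3.0 or later (older releases used the `bootp.` prefix). If only the proxy’s address (192.168.149.43 in this thread) shows up, the main dhcp server’s Offer never reached the capture point.

```shell
#!/bin/sh
# Read "message-type<TAB>source-ip" lines on stdin and print each distinct
# source address that sent a DHCP Offer (DHCP message type 2).
offer_sources() {
    awk '$1 == 2 { print $2 }' | sort -u
}

# Assumed usage (file name is a placeholder):
#   tshark -r pxe-boot.pcap -Y dhcp -T fields -e dhcp.option.dhcp -e ip.src | offer_sources
```

In a standard exchange the list should contain two addresses: the main dhcp server (or its relay) and the proxy.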
-
@george1421 I recently reinstalled the server with HTTPS.
The first time I installed it I entered the DNS IP for the server, but this time I just skipped it. Maybe this is the problem. Our DNS server is also the DHCP server.
-
@marted As I see it right now this is NOT a FOG problem. You are not even to FOG yet. I’ll say that as long as the FOG server is NOT your DHCP server, there isn’t a fog configuration that would cause what I’m seeing so far. I’m not saying this to redirect any fault here. It’s just that at this stage the communication is between the target computer and your dhcp server.
-
@george1421 When a host boots I see it take an IP from DHCP, I see the DHCP address, I see the FOG server listed as the proxy DHCP, and everything is fine up to the tftp server prompt. I type in the tftp server and it boots into the FOG menu. Like I said, 25 hosts boot and 10 to 15 of them ask for the tftp server, BUT they already have an IP and are ready; only the tftp server is missing. The next time I send a task, other hosts ask for the tftp server; it is always different hosts.
-
@marted Well, let’s start tomorrow by fixing the dnsmasq setting, then grab another pcap of the pxe boot process. Let’s make new assumptions based on the correct (and well tested) dnsmasq file.
There ARE tweaks we can make to the dnsmasq configuration file to cover certain circumstances.
Like in this section
pxe-service=X86PC, "Boot to FOG", undionly.kpxe
pxe-service=X86-64_EFI, "Boot to FOG UEFI", ipxe.efi
pxe-service=BC_EFI, "Boot to FOG UEFI PXE-BC", ipxe.efi
we can specify the boot server in the services line. It would look like this
pxe-service=X86PC, "Boot to FOG", undionly.kpxe, 192.168.149.43
pxe-service=X86-64_EFI, "Boot to FOG UEFI", ipxe.efi, 192.168.149.43
pxe-service=BC_EFI, "Boot to FOG UEFI PXE-BC", ipxe.efi, 192.168.149.43
It’s not typical that we need to do that, but in certain environments it is necessary.
-
@george1421 Thank you so much. I’ll fix that tomorrow and will post the result. Thanks again!
-
@george1421 said in Hosts are looking for tftp server.:
I still don’t understand why we don’t see an offer packet from your main dhcp server.
Because the capture was taken on the FOG server (I guess) and the DHCP offers & ACKs are not broadcast but sent directly (unicast MAC) to the client.
-
@Sebastian-Roth Tell me how to run the test with Wireshark to see the actual situation. Thanks
-
@marted For a single client you could use a monitoring port on the switch or connect it to a hub to capture the traffic. But it’s quite a task to do and you still don’t get the full truth. You’d need to capture on the DHCP server to get all the packets. But make sure you filter on capture, or filter later using display filters and export to a new PCAP, so we don’t have all your network traffic in it.
Capture filter:
port 67 or port 68 or port 4011
On the other hand you won’t see the TFTP requests on the FOG server this way.
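A minimal sketch of how that capture filter could be passed to tcpdump is shown here. The function names, the interface name `eth0`, and the output path are placeholders rather than anything from this thread. The optional `tftp` argument adds port 69, which only catches the initial TFTP request, since the transfer itself continues on other ports.

```shell
#!/bin/sh
# Print a BPF capture filter for DHCP (67/68) and ProxyDHCP (4011) traffic.
# Pass "tftp" as the first argument to also include initial TFTP requests (port 69).
pxe_capture_filter() {
    filter='port 67 or port 68 or port 4011'
    if [ "${1:-}" = "tftp" ]; then
        filter="$filter or port 69"
    fi
    printf '%s\n' "$filter"
}

# Capture matching packets to a pcap file; needs root.
# $1 = interface (placeholder default eth0), $2 = output file, $3 = optional "tftp".
capture_pxe() {
    iface=${1:-eth0}
    out=${2:-/tmp/pxe-boot.pcap}
    tcpdump -n -i "$iface" -w "$out" "$(pxe_capture_filter "${3:-}")"
}

# Example, run on the DHCP server or on a witness machine on a mirror port:
#   capture_pxe eth0 /tmp/pxe-boot.pcap
```

Stop the capture with Ctrl-C once the affected hosts have finished (or failed) booting, then upload the resulting file.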
-
@george1421 @Sebastian-Roth I changed the options in dnsmasq and restarted, and nothing changed: always 5 to 10 different hosts ask for the tftp server after getting an IP from the University’s DHCP.
-
@marted Are these 5-10 hosts all on the same subnet as the fog server? There is something going on here that isn’t apparent.
-
@george1421 Yes, all of them. The next time I shut the hosts down and started them with Wake-on-LAN, a different 10-15 asked for the tftp server, always in the same room of 25 hosts.
-
@george1421 if I just press enter without entering tftp server it gives this
Same host on the next boot
-
@george1421 Now one more thing: when I boot manually, host by host, with F12, every host boots correctly with no problems. The problem comes only when I try to boot them all with a task and Wake-on-LAN. I have the impression that there is a limit on how many hosts can connect to the tftp server at the same time. This is a new, very fast model with an 8th Gen i7 and a 1 TB SSD, and it boots in a few seconds.
-
@marted Well let me say that iPXE is working exactly as it was programmed to do. If it doesn’t receive pxe boot information from either the dhcp server or a proxydhcp server then it will prompt the user.
ref: https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript
This is a dhcp (proxydhcp) issue and not anything to do with tftp. If it was a tftp issue the iPXE boot loader would not be running on the target computer asking for a boot server.
Along the lines of a random dhcp issue, that can come from having two or more dhcp servers on your campus that have different configurations for the subnets, where the first dhcp server that responds wins the election. Now that a proxydhcp server is involved, if the proxydhcp server doesn’t respond in time, or responds too late, the client will not use (or have) any pxe boot information.
I’ll ask the question again: is the computer that is showing this random ask for tftp server on the same subnet as the fog server? If so then there is something wrong with the proxy dhcp process, because since it’s on the same subnet as the pxe booting computer it should hear the discover every time (the first pcap was showing that). What I did not see in the first pcap was the main dhcp server responding. Based on what I’m seeing in the pcap I would say the main dhcp is either responding sluggishly or random dhcp servers are in play.
If the target computer is on a different subnet, then you will need to load wireshark on a witness computer with the capture filters that Sebastian provided. This will only allow us to see the dhcp process, but at least we can see what actors are involved here.
IMO the issue at the moment is a network infrastructure one and not anything to do with FOG, other than we need network booting to work to get FOG to work. Since we don’t know your networking infrastructure we can only make suggestions where to look based on our experiences and intimately knowing how FOG works.
-
@george1421 @Sebastian-Roth You’re right. This is an issue with dnsmasq (the DHCP proxy server), not FOG. If you want, move the topic.
dnsmasq is not able to handle many requests at a time. In all the tests I made yesterday I found that with up to 10-12 computers at a time there are no issues. Like I said earlier in my posts, the problem is ONLY with this new model we have, because they boot simultaneously and I guess almost all of them “ask” the proxy dhcp for information at the same time. Like you said, if the proxy is not able to handle the request for a host, this host will pass to the next dhcp in the network, and because we don’t have a third dhcp in the network, it will return to the main dhcp (DHCP of the University). We see this request in the wireshark file as a request to the exit IP 192.168.148.1 and an answer from it.
Now the question is how to fix this situation. In this closed private network we have 10 rooms, each with 25 computers, all of them (250) on 4 subnets: 192.168.148.0, 192.168.149.0, 192.168.150.0 and 192.168.151.0. The FOG server is a virtual server fixed at 192.168.149.43 and configured on our private lab switch as an IP helper (DHCP proxy). For almost 5 months now there have been no issues with FOG booting. Like I said, this is the first time we have a problem like this, simply because in the other rooms, with the old models, when I send a task for 25 hosts they don’t wake on LAN at exactly the same time, and so they don’t “ask” the dhcp proxy for information at the same time. The new model hosts, as far as I can see, do.
My questions (I am just asking; I don’t know whether the questions are correct or not):
Is it possible to set up dnsmasq to handle requests one at a time, so that it is able to process all requests?
Can we have a second port open to handle part of the requests?
Or a second dnsmasq on the same server?
Or a second server with only dnsmasq installed, which passes on only the information that leads to the real FOG server?
Or getting (installing) a better network card?
If you have some other suggestion I am open to listen.
I know it is always possible just to boot the hosts one at a time with F12, which works 100%, or to make small groups of 5-10 hosts for this model, but I like very much the way FOG can handle many hosts at a time.
Another thing: I turn off all hosts in the evening, and when I wake them on LAN room by room in the morning, it is just in this room that I have to go and reboot again manually or enter the tftp server info.
I hope to find some solution!
Thanks again for all your help
-
@marted That’s an interesting one. From what you describe it really sounds as if dnsmasq is not able to serve all of them at the same time. If that’s the case we should be able to see this in the logs. First figure out which log file is used:
grep "dnsmasq" /var/log/messages /var/log/syslog /var/log/daemon.log
Depending on the Linux OS you have, the logging might be in a different file. When you have found it, schedule a deploy task for those Dell AIO 7470 hosts and run
tail -f /var/log/syslog | grep --line-buffered "dnsmasq" | tee /tmp/dnsmasq.log
to see all the log messages coming in live as well as save them to a separate log file, /tmp/dnsmasq.log.
Together with a list of MAC addresses of the Dell AIO 7470 hosts and the log file you should be able to see which ones got the PXE/TFTP information and which didn’t on that run. Maybe there are hints in the log that one was skipped. Not sure. Upload the log file here if you need help with finding anything in it.
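The comparison of the MAC list against the log could be sketched roughly as follows. This is only an illustration: the helper name `pxe_log_summary` and the file `macs.txt` (one MAC address per line, which you would create yourself) are hypothetical, and it assumes dnsmasq writes the client MAC address into its log lines in the same colon-separated form. It makes no assumption about the rest of the log format; it just counts the lines that mention each MAC.

```shell
#!/bin/sh
# Usage: pxe_log_summary MAC_LIST LOG_FILE
# For every MAC address in MAC_LIST (one per line), print "<mac> <count>",
# where <count> is the number of lines in LOG_FILE that mention that MAC
# (case-insensitive, fixed-string match).
pxe_log_summary() {
    while IFS= read -r mac; do
        [ -n "$mac" ] || continue                       # skip empty lines
        # grep -c prints 0 but exits non-zero when nothing matches
        count=$(grep -i -c -F -- "$mac" "$2" || true)
        printf '%s %s\n' "$mac" "$count"
    done < "$1"
}

# Example:
#   pxe_log_summary macs.txt /tmp/dnsmasq.log
```

Hosts that show a count of 0 never appeared in the dnsmasq log during that run, which would point at requests that were missed rather than answered too late.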