Hosts are looking for tftp server.
-
@marted Well, let’s start tomorrow by fixing the dnsmasq settings, then grab another pcap of the pxe boot process. Let’s make new assumptions based on the correct (and well tested) dnsmasq file.
There ARE tweaks we can make to the dnsmasq configuration file to cover certain circumstances.
Like in this section
pxe-service=X86PC, "Boot to FOG", undionly.kpxe
pxe-service=X86-64_EFI, "Boot to FOG UEFI", ipxe.efi
pxe-service=BC_EFI, "Boot to FOG UEFI PXE-BC", ipxe.efi
we can specify the boot server in the services line. It would look like this
pxe-service=X86PC, "Boot to FOG", undionly.kpxe, 192.168.149.43
pxe-service=X86-64_EFI, "Boot to FOG UEFI", ipxe.efi, 192.168.149.43
pxe-service=BC_EFI, "Boot to FOG UEFI PXE-BC", ipxe.efi, 192.168.149.43
It’s not typical that we need to do that, but in certain environments it is necessary.
-
@george1421 thank you so much. I’ll fix that tomorrow and will post the result. Thanks again!
-
@george1421 said in Hosts are looking for tftp server.:
I still don’t understand why we don’t see an offer packet from your main dhcp server.
Because the capture was taken on the FOG server (I guess), and the DHCP OFFERs and ACKs are not broadcast but sent directly (unicast MAC) to the client.
-
@Sebastian-Roth tell me how to run the test with Wireshark to see the actual situation. Thanks
-
@marted For a single client you could use a monitoring port on the switch or connect it to a hub to capture the traffic. But it’s quite a task and you still don’t get the full truth. You’d need to capture on the DHCP server to get all the packets. Either way, make sure you filter, at capture time or afterwards with display filters, and export to a new PCAP so we don’t have all your network traffic in it.
Capture filter:
port 67 or port 68 or port 4011
On the other hand you won’t see the TFTP requests on the FOG server this way.
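If you prefer the command line over the Wireshark GUI, the same capture filter can be handed to tcpdump. This is only a minimal sketch: the function name `capture_pxe`, the interface name and the output path are placeholders, not something taken from your setup.

```shell
#!/bin/sh
# Capture filter from above: DHCP (ports 67/68) and proxyDHCP (port 4011).
PXE_FILTER='port 67 or port 68 or port 4011'

# capture_pxe IFACE OUTFILE
# Writes only the DHCP/proxyDHCP packets seen on IFACE to OUTFILE.
# IFACE and OUTFILE are placeholders; tcpdump usually needs root.
# Nothing is captured until the function is actually called.
capture_pxe() {
    iface=$1
    outfile=$2
    # -n: no name resolution, -i: interface, -w: write raw packets to a pcap
    tcpdump -n -i "$iface" -w "$outfile" "$PXE_FILTER"
}
```

Call it as, for example, `capture_pxe eth0 /tmp/pxe-dhcp.pcap` (as root), start the hosts, then stop the capture with Ctrl+C and upload the resulting file.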
-
@george1421 @Sebastian-Roth I changed the options in dnsmasq and restarted it, and nothing changed: every time, 5 to 10 different hosts ask for the tftp server after getting an IP from the University’s DHCP.
-
@marted Are these 5-10 hosts all on the same subnet as the fog server? There is something going on here that isn’t apparent.
-
@george1421 Yes, all of them. The next time I shut the hosts down and started them with wake on LAN, a different 10-15 asked for the tftp server, always in the same room of 25 hosts.
-
@george1421 if I just press enter without entering tftp server it gives this
Same host on the next boot
-
@george1421 now one more thing: when I boot manually host by host with F12, every host boots correctly with no problems. The problem comes only when I try to boot them all with a task and wake on LAN. I have the impression that there is a limit on the number of hosts that can connect to the tftp server at the same time. This is a new, very fast model with an 8th Gen i7 and a 1 TB SSD, and it boots in a few seconds.
-
@marted Well let me say that iPXE is working exactly as it was programmed to do. If it doesn’t receive pxe boot information from either the dhcp server or a proxydhcp server then it will prompt the user.
ref: https://github.com/FOGProject/fogproject/blob/master/src/ipxe/src/ipxescript
This is a dhcp (proxydhcp) issue and not anything to do with tftp. If it was a tftp issue the iPXE boot loader would not be running on the target computer asking for a boot server.
Along the lines of a random dhcp issue: that can come from having two or more dhcp servers on your campus with different configurations for the subnets, where the first dhcp server that responds wins the election. Now that a proxydhcp server is involved, if the proxydhcp server doesn’t respond in time, or responds too late, the client will not use (or have) any pxe boot information.
I’ll ask the question again: is the computer that randomly asks for the tftp server on the same subnet as the fog server? If so, then there is something wrong with the proxy dhcp process, because since it’s on the same subnet as the pxe booting computer it should hear the discover every time (the first pcap was showing that). What I did not see in the first pcap was the main dhcp server responding. Based on what I’m seeing in the pcap I would say the main dhcp is either responding sluggishly or random dhcp servers are in play.
If the target computer is on a different subnet, then you will need to load wireshark on a witness computer with the capture filters that Sebastian provided. This will only allow us to see the dhcp process, but at least we can see what actors are involved here.
IMO the issue at the moment is a network infrastructure one and not anything to do with FOG, other than we need network booting to work to get FOG to work. Since we don’t know your networking infrastructure we can only make suggestions where to look based on our experiences and intimately knowing how FOG works.
-
@george1421 @Sebastian-Roth you’re right. This is an issue with dnsmasq (the DHCP proxy server), not FOG. If you want, move the topic to a better place.
Dnsmasq is not able to handle many requests at a time. In all the tests I ran yesterday, there were no issues with up to 10-12 computers at a time. As I said in my earlier posts, the problem occurs ONLY with this new model, because these machines boot simultaneously and, I guess, almost all of them ask the proxy DHCP for information at the same time. As you said, if the proxy cannot handle the request for a host, that host moves on to the next DHCP server in the network, and because we don’t have a third DHCP server, it falls back to the main one (the University’s DHCP). We see this in the Wireshark file as a request to the exit IP 192.168.148.1 and an answer from it.
Now the question is how to fix this. In this closed private network we have 10 rooms with 25 computers each; all 250 are spread over 4 subnets: 192.168.148.0, 192.168.149.0, 192.168.150.0 and 192.168.151.0. The FOG server is a virtual server fixed at 192.168.149.43 and configured on our private lab switch as an IP helper (DHCP proxy). For almost 5 months now we have had no issues booting with FOG. As I said, this is the first time we have had a problem like this, simply because the old models in the other rooms don’t wake on LAN at exactly the same time when I send a task to 25 hosts, so they don’t ask the DHCP proxy for information at the same time. The new model hosts, as far as I can see, do.
My questions (I am just asking; I don’t know whether these are the right questions):
Is it possible to setup the dnsmasq to handle requests one at a time and like this to be able to proceed all requests?
Can we have second port open to handle part of the requests?
or second dnsmasq on the same server?
or second server only with dnsmasq installed which will transfer only the information which leads to the real FOG server?
or installing a better network card?
If you have any other suggestions, I am open to them.
I know it is always possible to boot the hosts one at a time with F12, which works 100%, or to make small groups of 5-10 hosts for this model, but I very much like the way FOG can handle many hosts at a time.
Another thing: I turn off all hosts in the evening, and when I wake them on LAN room by room in the morning, in this room alone I have to go and reboot them manually or enter the tftp server info.
I hope to find some solution!
Thanks again for all your help
-
@marted That’s an interesting one. From what you describe it really sounds as if dnsmasq is not able to serve all of them at the same time. If that’s the case we should be able to see this in the logs. First figure out which log file is used:
grep "dnsmasq" /var/log/messages /var/log/syslog /var/log/daemon.log
Depending on the Linux OS you have, the logging might be in a different file. When you have found it, schedule a deploy task for those Dell AIO 7470 hosts and run
tail -f /var/log/syslog | grep "dnsmasq" | tee /tmp/dnsmasq.log
to see all the log messages coming in live as well as save them to a separate log file, /tmp/dnsmasq.log. Together with a list of MAC addresses of the Dell AIO 7470 hosts, the log file should let you see which hosts got the PXE/TFTP information and which didn’t on that run. Maybe there are hints in the log that one was skipped. Not sure. Upload the log file here if you need help with finding anything in it.
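To compare the MAC list against the log, something like the following could help. It is a rough sketch, not a tested tool: the function name `pxe_report` and the MAC list file are made up for illustration, and it assumes dnsmasq writes a line of the form `PXE(<interface>) <MAC> proxy` for each proxyDHCP reply it sends.

```shell
#!/bin/sh
# pxe_report LOGFILE MACFILE
# For every MAC address in MACFILE (one per line), print the MAC followed by
# the number of "PXE(<iface>) <mac>" lines found in LOGFILE.
# Assumption: dnsmasq logs one such line per proxyDHCP reply it sends.
pxe_report() {
    logfile=$1
    macfile=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r mac || [ -n "$mac" ]; do
        # Skip blank lines in the MAC list.
        [ -n "$mac" ] || continue
        # -i: ignore upper/lower case in the MAC, -c: count matching lines.
        # "|| true" keeps a zero count from looking like a failure.
        count=$(grep -ci "PXE(.*) $mac" "$logfile" || true)
        printf '%s %s\n' "$mac" "$count"
    done < "$macfile"
}
```

Run it as `pxe_report /tmp/dnsmasq.log /tmp/macs.txt`. A host that shows a count of 0 never got a PXE reply logged during that run.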
-
@Sebastian-Roth I got the log file dnsmasq.log
These are the MAC addresses which asked for the tftp server
00:4e:01:c5:f4:67
00:4e:01:c5:fa:98
00:4e:01:c5:e7:c4
00:4e:01:c5:a5:9a
-
@marted Well done!
First thing I notice is that we see pretty much every request coming in twice in the logs. Makes me wonder if this might confuse the clients as they probably get two responses from that as well. Probably these duplicated messages come from the IP helper?!
Though it’s interesting you get a 100% success rate on PXE booting when it’s not a multicast.
As well further down in the log we see it repeat the same log messages three times before it goes on to actually send out the information:
Mar 9 12:55:47 foglabunix dnsmasq-dhcp[744]: 1635745377 available DHCP subnet: 192.168.149.43/255.255.252.0
Mar 9 12:55:47 foglabunix dnsmasq-dhcp[744]: 1635745377 vendor class: PXEClient:Arch:00007:UNDI:003010
Mar 9 12:55:47 foglabunix dnsmasq-dhcp[744]: 1635745377 user class: iPXE
Mar 9 12:55:47 foglabunix dnsmasq-dhcp[744]: 1635745377 available DHCP subnet: 192.168.149.43/255.255.252.0
Mar 9 12:55:47 foglabunix dnsmasq-dhcp[744]: 1635745377 vendor class: PXEClient:Arch:00007:UNDI:003010
Mar 9 12:55:47 foglabunix dnsmasq-dhcp[744]: 1635745377 user class: iPXE
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 available DHCP subnet: 192.168.149.43/255.255.252.0
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 vendor class: PXEClient:Arch:00007:UNDI:003010
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 user class: iPXE
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 PXE(ens32) 00:4e:01:c6:36:08 proxy
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 tags: UEFI, ens32
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 bootfile name: ipxe.efi
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 server name: 192.168.149.43
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 next server: 192.168.149.43
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 sent size: 1 option: 53 message-type 5
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 sent size: 4 option: 54 server-identifier 192.168.149.43
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 sent size: 9 option: 60 vendor-class 50:58:45:43:6c:69:65:6e:74
Mar 9 12:55:48 foglabunix dnsmasq-dhcp[744]: 1635745377 sent size: 17 option: 97 client-machine-id ...
See if you can figure out why all the DHCP messages seem to be duplicates in your network. This might be the key. Not sure though but it’s still worth looking at and fixing it.
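To put a number on the repetition, the log can be grouped by the ID that follows the `dnsmasq-dhcp[pid]:` tag. A rough sketch only: the function name `requests_per_xid` is made up, and it assumes that this number is the DHCP transaction ID and that dnsmasq logs one `vendor class:` line per PXE packet it receives.

```shell
#!/bin/sh
# requests_per_xid LOGFILE
# Counts "vendor class:" lines per ID and prints "<id> <count>", sorted.
# Assumptions: the number right after "dnsmasq-dhcp[pid]:" is the DHCP
# transaction ID, and one "vendor class:" line is logged per received packet.
requests_per_xid() {
    awk '
        /dnsmasq-dhcp\[/ && /vendor class:/ {
            # Locate the syslog tag; the ID is the field right after it.
            for (i = 1; i < NF; i++) {
                if ($i ~ /^dnsmasq-dhcp\[/) { seen[$(i + 1)]++; break }
            }
        }
        END { for (xid in seen) print xid, seen[xid] }
    ' "$1" | sort
}
```

Run it as `requests_per_xid /tmp/dnsmasq.log`. IDs with a count above 1 are the ones where the same request reached dnsmasq more than once, and comparing those against the hosts that failed may show whether the duplicates matter.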
-
@Sebastian-Roth said in Hosts are looking for tftp server.:
grep “dnsmasq” /var/log/
I have just looked at the tftpd log and something is wrong. Look at the times of the two tests I ran today
root@foglabunix:/var/log# systemctl status tftpd-hpa
● tftpd-hpa.service - LSB: HPA's tftp server
   Loaded: loaded (/etc/init.d/tftpd-hpa; generated)
   Active: active (running) since Mon 2020-03-09 12:13:55 EDT; 2h 2min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 1473 ExecStart=/etc/init.d/tftpd-hpa start (code=exited, status=0/SUCCESS)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/tftpd-hpa.service
           └─1509 /usr/sbin/in.tftpd --listen --user root --address :69 -s /tftpboot
Mar 09 12:55:51 foglabunix in.tftpd[3843]: tftp: client does not accept options
Mar 09 12:55:51 foglabunix in.tftpd[3845]: tftp: client does not accept options
Mar 09 12:55:51 foglabunix in.tftpd[3849]: tftp: client does not accept options
Mar 09 12:55:51 foglabunix in.tftpd[3851]: tftp: client does not accept options
Mar 09 12:55:51 foglabunix in.tftpd[3853]: tftp: client does not accept options
Mar 09 13:24:03 foglabunix in.tftpd[6395]: tftp: client does not accept options
Mar 09 13:24:03 foglabunix in.tftpd[6406]: tftp: client does not accept options
Mar 09 13:24:03 foglabunix in.tftpd[6419]: tftp: client does not accept options
Mar 09 13:24:03 foglabunix in.tftpd[6421]: tftp: client does not accept options
Mar 09 13:24:03 foglabunix in.tftpd[6432]: tftp: client does not accept options
and all log from today
Mar 9 11:34:40 foglabunix in.tftpd[10796]: tftp: client does not accept options
Mar 9 11:35:35 foglabunix in.tftpd[10979]: tftp: client does not accept options
Mar 9 12:24:07 foglabunix in.tftpd[14779]: tftp: client does not accept options
Mar 9 12:25:07 foglabunix in.tftpd[14950]: tftp: client does not accept options
Mar 9 12:33:42 foglabunix in.tftpd[15559]: tftp: client does not accept options
Mar 9 12:13:55 foglabunix tftpd-hpa[1473]: * Starting HPA's tftpd in.tftpd
Mar 9 12:13:55 foglabunix tftpd-hpa[1473]: ...done.
Mar 9 12:39:34 foglabunix in.tftpd[2389]: tftp: client does not accept options
Mar 9 12:39:36 foglabunix in.tftpd[2391]: tftp: client does not accept options
Mar 9 12:39:36 foglabunix in.tftpd[2393]: tftp: client does not accept options
Mar 9 12:39:36 foglabunix in.tftpd[2395]: tftp: client does not accept options
Mar 9 12:39:44 foglabunix in.tftpd[2411]: tftp: client does not accept options
Mar 9 12:39:44 foglabunix in.tftpd[2413]: tftp: client does not accept options
Mar 9 12:39:44 foglabunix in.tftpd[2415]: tftp: client does not accept options
Mar 9 12:39:44 foglabunix in.tftpd[2417]: tftp: client does not accept options
Mar 9 12:39:44 foglabunix in.tftpd[2419]: tftp: client does not accept options
Mar 9 12:39:45 foglabunix in.tftpd[2421]: tftp: client does not accept options
Mar 9 12:40:05 foglabunix in.tftpd[2455]: tftp: client does not accept options
Mar 9 12:40:05 foglabunix in.tftpd[2457]: tftp: client does not accept options
Mar 9 12:40:05 foglabunix in.tftpd[2458]: tftp: client does not accept options
Mar 9 12:40:05 foglabunix in.tftpd[2461]: tftp: client does not accept options
Mar 9 12:40:05 foglabunix in.tftpd[2462]: tftp: client does not accept options
Mar 9 12:55:41 foglabunix in.tftpd[3796]: tftp: client does not accept options
Mar 9 12:55:41 foglabunix in.tftpd[3798]: tftp: client does not accept options
Mar 9 12:55:41 foglabunix in.tftpd[3800]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3815]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3817]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3819]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3821]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3823]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3825]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3826]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3827]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3831]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3833]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3834]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3837]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3838]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3840]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3843]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3845]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3847]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3849]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3851]: tftp: client does not accept options
Mar 9 12:55:51 foglabunix in.tftpd[3853]: tftp: client does not accept options
Mar 9 12:56:12 foglabunix in.tftpd[3890]: tftp: client does not accept options
Mar 9 13:02:59 foglabunix in.tftpd[4521]: tftp: client does not accept options
Mar 9 13:04:02 foglabunix in.tftpd[4599]: tftp: client does not accept options
Mar 9 13:23:53 foglabunix in.tftpd[6370]: tftp: client does not accept options
Mar 9 13:23:53 foglabunix in.tftpd[6372]: tftp: client does not accept options
Mar 9 13:23:53 foglabunix in.tftpd[6374]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6394]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6395]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6398]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6401]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6400]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6402]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6406]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6408]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6409]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6411]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6413]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6416]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6418]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6419]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6421]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6424]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6426]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6428]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6430]: tftp: client does not accept options
Mar 9 13:24:03 foglabunix in.tftpd[6432]: tftp: client does not accept options
Mar 9 13:24:24 foglabunix in.tftpd[6479]: tftp: client does not accept options
Mar 9 13:24:33 foglabunix in.tftpd[6490]: tftp: client does not accept options
Mar 9 13:25:35 foglabunix in.tftpd[6553]: tftp: client does not accept options
Mar 9 13:31:52 foglabunix in.tftpd[7174]: tftp: client does not accept options
Mar 9 13:32:53 foglabunix in.tftpd[7241]: tftp: client does not accept options
Mar 9 13:52:48 foglabunix in.tftpd[9009]: tftp: client does not accept options
-
@marted said in Hosts are looking for tftp server.:
tftp: client does not accept options
As far as I know this is ok. It means that the client requests the size and TFTP server just says it doesn’t support querying size. I have seen this often. Should not cause a problem.
Have you looked at why DHCP queries come in duplicated?
-
@Sebastian-Roth said in Hosts are looking for tftp server.:
Have you looked at why DHCP queries come in duplicated?
I have no idea. I looked in the config file. Nothing is different from your example in the wiki. Today I’ll run a test with tcpdump on port 69 to see the traffic on the server. I’ll also check these dnsmasq options:
--tftp-no-fail
Do not abort startup if specified tftp root directories are inaccessible.
--tftp-max=<connections>
Set the maximum number of concurrent TFTP connections allowed. This defaults to 50. When serving a large number of TFTP connections, per-process file descriptor limits may be encountered. Dnsmasq needs one file descriptor for each concurrent TFTP connection and one file descriptor per unique file (plus a few others). So serving the same file simultaneously to n clients will use require about n + 10 file descriptors, serving different files simultaneously to n clients will require about (2*n) + 10 descriptors. If --tftp-port-range is given, that can affect the number of concurrent connections.
--tftp-no-blocksize
Stop the TFTP server from negotiating the "blocksize" option with a client. Some buggy clients request this option but then behave badly when it is granted.
--tftp-port-range=<start>,<end>
A TFTP server listens on a well-known port (69) for connection initiation, but it also uses a dynamically-allocated port for each connection. Normally these are allocated by the OS, but this option specifies a range of ports for use by TFTP transfers. This can be useful when TFTP has to traverse a firewall. The start of the range cannot be lower than 1025 unless dnsmasq is running as root. The number of concurrent TFTP connections is limited by the size of the port range.
I’ll also try to capture a log with a different client model to see if there is a difference.
-
@Sebastian-Roth said in Hosts are looking for tftp server.:
Have you looked at why DHCP queries come in duplicated?
I think I can explain this (or at least make up something that sounds good).
What I saw in a previous pcap on this issue was with the target computer on the same subnet as the FOG server (running dnsmasq) but the main dhcp server is on a different subnet. When the target issued a DHCP discover, there was an OFFER from dnsmasq (as it should) but there was also an OFFER from the dhcp-helper service on the subnet router. This OFFER from the dhcp-helper service was a reflection of the dhcp OFFER from dnsmasq.
(educated guess follows) The dhcp-helper service is configured to listen on the interface where the fog server is as well as the target computer. It is configured this way to allow the remote dhcp server to reply to dhcp requests on the local subnet. This is standard and typical. Now for dnsmasq to reply with pxe boot information for remote subnets we would typically add the dnsmasq server as the last server in the dhcp-helper service. This would then inform the dnsmasq server when a client was pxe booting on a remote subnet. The problem comes where the dhcp-helper service is listening on the same subnet where the dnsmasq server is. The dnsmasq server replies with an OFFER directly to the target computer, but the dhcp-helper service also hears the DISCOVER and, as it’s programmed to, forwards the DISCOVER to dnsmasq. Dnsmasq replies to the dhcp-helper service, which then echoes that OFFER back to the target computer, generating 2 offers from the same service (dnsmasq) from only one DISCOVER request.
-
@marted said in Hosts are looking for tftp server.:
The Dnsmasq is not capable to handle many requests at a time.
It’s possible on a really busy FOG server that dnsmasq doesn’t have enough time to respond to all of the requests, but I find that a bit hard to believe. You could try to move dnsmasq to a standalone linux server to see if it helps. But I don’t think the speed of dnsmasq is your issue here.
While it’s not a clean solution, you could place a dnsmasq server on each of the 4 subnets and then remove the fog server from the dhcp-helper service. Each dnsmasq server on each subnet would be responsible for providing the pxe boot information for just that subnet. Just thinking out of the box, but a raspberry pi running raspbian would work for the dnsmasq server on each subnet. A standalone VM (not the fog server) with dnsmasq running with an interface on each subnet would also work.