Some hosts are unable to get an address through DHCP

mageta52

I’m sorry, I forgot to post the results! No, it is still not working.

If I let the hosts boot into their old OS, I am able to ping the FOG server, so I don’t believe there is a physical communication issue happening.

Wayne Workman

@mageta52 Then we will need to see the output of ip addr show and cat /etc/dhcp/dhcpd.conf, maybe it’s something simple.

mageta52

[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:13:20:04:2f:3d brd ff:ff:ff:ff:ff:ff
    inet 192.168.235.52/24 brd 192.168.235.255 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 fe80::213:20ff:fe04:2f3d/64 scope link 
       valid_lft forever preferred_lft forever
3: enp4s2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 00:03:47:ad:c3:61 brd ff:ff:ff:ff:ff:ff
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:62:65:da brd ff:ff:ff:ff:ff:ff
    inet 192.168.124.1/24 brd 192.168.124.255 scope global virbr0
       valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue master virbr0 state DOWN group default qlen 500
    link/ether 52:54:00:62:65:da brd ff:ff:ff:ff:ff:ff

[root@localhost ~]# cat /etc/dhcp/dhcpd.conf 
# DHCP Server Configuration file\n#see /usr/share/doc/dhcp*/dhcpd.conf.sample
# This file was created by FOG
#Definition of PXE-specific options
# Code 1: Multicast IP Address of bootfile
# Code 2: UDP Port that client should monitor for MTFTP Responses
# Code 3: UDP Port that MTFTP servers are using to listen for MTFTP requests
# Code 4: Number of seconds a client must listen for activity before trying
#         to start a new MTFTP transfer
# Code 5: Number of seconds a client must listen before trying to restart
#         a MTFTP transfer
option space PXE;
option PXE.mtftp-ip code 1 = ip-address;
option PXE.mtftp-cport code 2 = unsigned integer 16;
option PXE.mtftp-sport code 3 = unsigned integer 16;
option PXE.mtftp-tmout code 4 = unsigned integer 8;
option PXE.mtftp-delay code 5 = unsigned integer 8;
option arch code 93 = unsigned integer 16;
use-host-decl-names on;
ddns-update-style interim;
ignore client-updates;
# Specify subnet of ether device you do NOT want service.
# For systems with two or more ethernet devices.
# subnet 136.165.0.0 netmask 255.255.0.0 {}
subnet 192.168.235.0 netmask 255.255.255.0{
    option subnet-mask 255.255.255.0;
    range dynamic-bootp 192.168.235.10 192.168.235.254;
    default-lease-time 21600;
    max-lease-time 43200;
    #option routers 0.0.0.0
    #option routers 0.0.0.0
    next-server 192.168.235.52;
    class "Legacy" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00000";
        filename "undionly.kkpxe";
    }
    class "UEFI-32-2" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00002";
        filename "i386-efi/ipxe.efi";
    }
    class "UEFI-32-1" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00006";
        filename "i386-efi/ipxe.efi";
    }
    class "UEFI-64-1" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00007";
        filename "ipxe.efi";
    }
    class "UEFI-64-2" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00008";
        filename "ipxe.efi";
    }
    class "UEFI-64-3" {
        match if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:00009";
        filename "ipxe.efi";
    }
}

Wayne Workman

@mageta52 The configuration looks fine. Can you attach a laptop or something to this network and see if you can get DHCP using the laptop’s OS?

mageta52

The hosts to be imaged all have a previous OS on them, so I can just boot them up and switch them to obtain an address automatically since they’re already hooked up to the server via the switch.

I tried /release and /renew a couple of times on the machines that failed to get an address during PXE, and both could get an address when booted into Windows. DHCP seems to be working fine.

So I went back to PXE booting. One of the hosts registered fine, so I moved on to the next, which failed at DHCP. The same happened with the second host.

Wayne Workman

@mageta52 OK then. This is a switch or hardware issue. What model of computers are you using, and what model of switch? Is it a mini switch or an enterprise-grade switch like a Cisco Catalyst? If it’s a managed switch, what is your configuration? Do you have portfast enabled? Spanning tree? 802.3az (Energy-Efficient Ethernet) power saving options?

mageta52

These are custom-built 1RU machines with an ASRock Z97E motherboard.

The switch is a D-Link DGS-1024D. I looked it up to confirm that it is indeed unmanaged.

Wayne Workman

@mageta52 Can you try a different boot file? It might help. Right now, for legacy, you have undionly.kkpxe configured.

This is in your /etc/dhcp/dhcpd.conf file. They are labeled pretty well in there, it’s the one named “Legacy”.

If the NIC on your motherboard is Realtek, use realtek.kpxe, and if it’s Intel, use intel.kpxe.

You might also try ipxe.kpxe.

The computers need to be fully powered off and back on for the settings to take effect. On the FOG server, after making a change, restart DHCP with systemctl restart dhcpd

Also, do you know if the motherboard is operating in BIOS mode or UEFI mode? The above instructions are for BIOS.
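
The boot file change described in this post can be sketched roughly as follows. This is only an illustration: set_legacy_bootfile is a hypothetical helper name, not part of FOG, and it assumes GNU sed and the class "Legacy" layout shown in the dhcpd.conf posted earlier in the thread.

```shell
# Hypothetical helper: swap the boot file inside the class "Legacy" block only.
# $1 = path to dhcpd.conf, $2 = new boot file name (for example intel.kpxe)
set_legacy_bootfile() {
    conf="$1"
    bootfile="$2"
    # Limit the substitution to the lines from 'class "Legacy"' to its closing brace,
    # so the UEFI classes keep their own filename entries.
    sed -i "/class \"Legacy\"/,/}/ s|filename \"[^\"]*\";|filename \"$bootfile\";|" "$conf"
}

# Usage on the FOG server (as root); keep a backup, validate, then restart:
#   cp /etc/dhcp/dhcpd.conf /etc/dhcp/dhcpd.conf.bak
#   set_legacy_bootfile /etc/dhcp/dhcpd.conf intel.kpxe
#   dhcpd -t -cf /etc/dhcp/dhcpd.conf && systemctl restart dhcpd
```

Running dhcpd -t before the restart only checks the configuration syntax, which avoids taking DHCP down because of a typo.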

Sebastian Roth

@mageta52 said:

I tried /release and /renew a couple of times on the machines that failed to get an address during PXE, and both could get an address when booted into Windows. DHCP seems to be working fine.

Sounds very much like a spanning tree issue to me. Can you please try connecting an unmanaged mini switch between the client and your D-Link DGS-1024D switch? Does PXE boot work then? Search the wiki for spanning tree and port fast!

mageta52

Looks like they’re running BIOS, not UEFI. The NIC is an Intel NIC.

After changing the boot file, I get the same error with both of your suggestions:
“Waiting for link up on net0… Down (http://ipxe.org/38086101)
DHCP failed, hit S for pxe shell, rebooting in 10 seconds”

mageta52

@Sebastian-Roth I checked the manual for this switch and the only mention of spanning tree is in the glossary. It gives no mention of the feature anywhere else, and I don’t think that a flat switch like this even supports it.

Wayne Workman

@Sebastian-Roth

@mageta52 said in Some hosts are unable to get an address through DHCP:

The switch is a D-Link DGS-1024D. I looked it up to confirm that it is indeed unmanaged.

mageta52

@Wayne-Workman So, I took the switch out of there and put in a different one: a Trendnet TE100-S16, which is another unmanaged switch.

The pattern I’m seeing is that one host will make it through and get registered, then I move on to the next PC, and it fails to get through DHCP.

I’m not sure what the deciding factor is on whether or not they get through, but I’ve never had more than one get through consecutively.

Sebastian Roth

@mageta52 Are you familiar with capturing a packet dump (network packets) from the wire using wireshark or tcpdump? This might be really helpful!

mageta52

@Sebastian-Roth I could try analyzing the traffic coming into the fog server during an attempted PXE boot to see what’s going on, I’ll report back in a while with the findings.

Sebastian Roth

@mageta52 You are more than welcome to upload a pcap file here in the forums or send me a private message if you need help with finding the issue in the packet dump. As you can read in the forums, we’ve done this a couple of times, and I feel this is one of the best ways to help people debug their network issues. Once you start looking at the packets and understand what’s going on, that is when you find the solution. No worries, we’ll help you with that.

mageta52

Once again, the same pattern: machine 1 gets through and can register, machine 2 fails and stalls out.

I’m not terribly familiar with tcpdump, but it’s built in, so this is what I got from the second machine…

16:52:11.886022 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:14.893389 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:14.893609 IP 192.168.235.52 > 192.168.235.17: ICMP echo request, id 35325, seq 0, length 28
16:52:15.887222 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:15.894726 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:16.889224 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:17.891224 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:18.902506 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:18.902730 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:19.903230 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:19.903360 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:20.905230 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:22.911623 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:22.911836 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:23.912965 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:23.913229 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:24.915229 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:26.920739 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:26.920963 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:27.922086 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:27.923228 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:28.925225 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:30.929858 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:30.930083 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:31.931204 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:31.931234 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:32.933228 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:34.938973 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:34.939191 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:35.939229 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:35.939363 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:36.941229 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:38.948090 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 548
16:52:38.948310 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:39.949434 IP 192.168.235.52.bootps > 255.255.255.255.bootpc: UDP, length 300
16:52:39.951227 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28
16:52:40.953226 ARP, Request who-has 192.168.235.17 tell 192.168.235.52, length 28

It looks like it wants to assign 192.168.235.17, but is unable to. My guess is that the ARP is to make sure that the address is not already in use, but I’m not able to figure out whether that’s coming from the PC or from the FOG server. Unfortunately, I lost the successful exchange with the first machine; the output got blown away by the data transfer during the machine registration. If more info is needed, I can provide it. If there are any switches to turn on with tcpdump for better results, let me know and I’ll run the test again.
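
In tcpdump's one-line text output, the address after "tell" is the host sending the ARP request, so the ARP lines above come from 192.168.235.52, the FOG server. A small tally of a saved text log makes the pattern easier to see. This is a minimal sketch: summarize_dhcp_log is a hypothetical helper, and it assumes the log has the same line format as the output pasted above.

```shell
# Hypothetical helper: count DHCP and ARP lines in a saved tcpdump text log.
# $1 = path to a file containing tcpdump one-line output
summarize_dhcp_log() {
    awk '
        # Client to server: source port bootpc (68), destination port bootps (67)
        $2 == "IP" && $3 ~ /\.bootpc$/ && $5 ~ /\.bootps:$/ { client++ }
        # Server to client: source port bootps (67), destination port bootpc (68)
        $2 == "IP" && $3 ~ /\.bootps$/ && $5 ~ /\.bootpc:$/ { server++ }
        # ARP requests: field 7 is the requester, followed by a comma
        $2 == "ARP," && $3 == "Request" {
            asker = $7
            sub(/,$/, "", asker)
            arp[asker]++
        }
        END {
            printf "client_requests=%d\n", client
            printf "server_replies=%d\n", server
            for (ip in arp) printf "arp_requests_from[%s]=%d\n", ip, arp[ip]
        }
    ' "$1"
}

# Usage: save the text output to a file first, then run
#   summarize_dhcp_log /tmp/second_machine.txt
```

The file name in the usage line is only a placeholder. The text log does not show the DHCP message types or the client MAC address, so a full packet capture is still needed to tell which step of the exchange fails.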

Wayne Workman

@mageta52 Those are just ARP broadcasts. We need to see everything. Look here: https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_TFTP#Troubleshooting
There are instructions in there for tcpdump.

Sebastian Roth

@mageta52 This is not looking bad. But we need the full “content” of all those packets. You just need to add the -w command line parameter to dump to a file. A filter is probably a good idea as well: tcpdump -w /tmp/dhcp_works_sometimes.pcap udp or arp. Leave the command running and do your client bootups. After one success and one failure, stop tcpdump (Ctrl+C) and upload that file to the forum.
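
The capture suggested above can be wrapped as in the sketch below. The function names are hypothetical, the interface name enp2s0 is taken from the ip addr output earlier in the thread, and the narrower port filter is an assumption made here to keep the file small, not part of the original suggestion.

```shell
# Hypothetical wrapper around the capture suggested above.
# $1 = interface to listen on (enp2s0 on this FOG server)
# $2 = output pcap file
capture_dhcp() {
    # -i: interface; -w: write raw packets to a file instead of printing them.
    # Ports 67 and 68 are the DHCP server and client ports (bootps and bootpc).
    tcpdump -i "$1" -w "$2" 'port 67 or port 68 or arp'
}

# Hypothetical helper: read a capture back with MAC addresses and details.
# $1 = pcap file written by capture_dhcp
read_capture() {
    # -nn: no name or port lookups; -e: print link-level (MAC) headers;
    # -v: verbose decoding; -r: read packets from a file
    tcpdump -nn -e -v -r "$1"
}

# Usage (as root): start the capture, boot one working and one failing client,
# stop with Ctrl+C, then inspect the result:
#   capture_dhcp enp2s0 /tmp/dhcp_works_sometimes.pcap
#   read_capture /tmp/dhcp_works_sometimes.pcap
```

The broader "udp or arp" filter from the post also records TFTP traffic, which is useful if the failure turns out to happen after DHCP. The port filter here trades that away for a smaller file.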

mageta52

@Sebastian-Roth I’m completely swamped at work this week, and have to abandon this for a while, but I hope to capture the data on Monday.
