
Trouble with DHCP after loading undionly.kpxe (xcp-ng)

FOG Problems (Solved)
Tags: dhcp boot error
  • magikmw
    last edited by magikmw Sep 18, 2019, 5:02 PM Sep 18, 2019, 1:13 PM

    Hi,

    Some details first:

    • FOG v1.5.7 (current GitHub master); DHCP was not set up during installation; IP 10.0.20.32/24
    • DHCP is served by the router at 10.0.20.1/24, with options 66/67 set to 10.0.20.32 / undionly.kpxe
    • both FOG and the client I’m testing with run as VMs on an xcp-ng hypervisor, in the same VLAN
    • the router and the hypervisor are connected via a single switch with no STP or similar enabled
    • both VMs have no networking problems once booted, and the client receives an IP via DHCP without issue
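    For reference, options 66 (TFTP server name) and 67 (bootfile name) are defined in RFC 2132 as ordinary code/length/value options carrying an ASCII string. Below is a minimal Python sketch of how the two values above would look on the wire; the helper name encode_option is hypothetical, and this only illustrates the packet layout, not the router's configuration syntax.

```python
def encode_option(code: int, value: str) -> bytes:
    """Encode one DHCP option as: code (1 byte), length (1 byte), value bytes."""
    data = value.encode("ascii")
    if not 1 <= len(data) <= 255:
        raise ValueError("option value must be 1..255 bytes long")
    return bytes([code, len(data)]) + data


# Values from the setup described above.
opt66 = encode_option(66, "10.0.20.32")     # TFTP server name
opt67 = encode_option(67, "undionly.kpxe")  # Bootfile name

print(opt66.hex(" "))
print(opt67.hex(" "))
```

In a capture, these are the bytes to look for in the router's OFFER when checking that the boot server and file name actually reach the client.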

    The problem:
    PXE seems to pick up an IP from DHCP and loads undionly.kpxe correctly; however, it fails at the next step (screenshot: ipxe_dhcp_fail.png).
    I’ve seen several threads with similar errors, but nothing I found worked (the usual advice to disable STP or use PortFast doesn’t seem to apply).

    Troubleshooting I did already:

    • I’ve dumped the DHCP/TFTP traffic passing through the hypervisor, available here: (link removed)
    • It seems the iPXE client either doesn’t receive or doesn’t accept the DHCP offer.
    • Dropping to the iPXE shell and trying to ping anything results in a connection failure.
    • From the shell I can set a static IP address, which lets me ping but not boot (the autoboot and dhcp net0 commands just fail).
    • I’ve tried all the .kpxe, .kkpxe and .pxe files available in /tftpboot, with identical results.
    • The legacy boot file (pxelinux.0.old) does boot into a FOG menu and even allows booting from the hard drive; however, the register option results in a kernel panic (which I think is to be expected).

    I’m at a total loss. The only thing I haven’t tried yet is booting a physical machine, to rule out a problem with the hypervisor networking. Even then, I’m not sure what would cause this. There are no other network problems in the environment.

    I hope someone can help. Let me know if you need any more information.

      • george1421 Moderator @magikmw
        last edited by Sep 18, 2019, 3:12 PM

        @magikmw Try using ipxe.kpxe instead of undionly.kpxe as the iPXE boot loader. It’s possible that the UNDI component of your hypervisor isn’t compatible with iPXE, and one of the native drivers built into ipxe.kpxe may work better than the UNDI driver.

        Since I don’t know that hypervisor, what NIC emulation is your VM configured for?

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

        • magikmw @george1421
          last edited by Sep 18, 2019, 3:50 PM

          @george1421 ipxe.kpxe produced the exact same result (I’ve tried it before, too).

          I have two options for NIC emulation: Realtek RTL8139 (default) and Intel e1000.

          • george1421 Moderator @magikmw
            last edited by Sep 18, 2019, 3:52 PM

            @magikmw So I would pick the e1000 emulation if I had a choice.

            Now, is that network connection bridged or NAT’d? I see the target system is getting 10.0.20.206; is that the correct IP address for your network?

            • magikmw @george1421
              last edited by Sep 18, 2019, 3:58 PM

              @george1421 Alright. I’ve tried with e1000, no change.

              The VM connection is bridged via virtual switch to a physical switch between the host and router.
              Both the virtual switch and the physical switch trunk a VLAN to the router/DHCP (it’s transparent to the VM).
              The host’s connection is a bonded 2-port NIC.
              10.0.20.206 is correct (DHCP range is 10.0.20.200-240).
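              A quick sanity check of those addresses, as a minimal Python sketch using the standard ipaddress module. It assumes the pool 10.0.20.200-240 is inclusive at both ends; the client's lease should fall inside it, while the FOG server (10.0.20.32) and the router (10.0.20.1) sit outside the pool and so can't collide with a lease.

```python
import ipaddress

SUBNET = ipaddress.ip_network("10.0.20.0/24")
# Assumption: "10.0.20.200-240" means .200 through .240 inclusive.
POOL_START = ipaddress.ip_address("10.0.20.200")
POOL_END = ipaddress.ip_address("10.0.20.240")


def in_pool(addr: str) -> bool:
    """True if addr is inside the /24 and inside the DHCP pool."""
    ip = ipaddress.ip_address(addr)
    return ip in SUBNET and POOL_START <= ip <= POOL_END


for name, addr in [("client lease", "10.0.20.206"),
                   ("FOG server", "10.0.20.32"),
                   ("router/DHCP", "10.0.20.1")]:
    print(f"{name:12} {addr:12} in pool: {in_pool(addr)}")
```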

              • Sebastian Roth Moderator
                last edited by Sep 18, 2019, 4:36 PM

                @magikmw said in Trouble with DHCP after loading undionly.kpxe (xcp-ng):

                I’ve dumped DHCP/TFTP traffic passing through the hypervisor,

                That’s great, and it looks really interesting. First, I noticed that packets seem to be partly duplicated in the PCAP. I see 18 DHCP Discover packets from your client within one second (0.85 s really) before the DHCP server sends an Offer. That’s a very slow response for a local network. The same goes for the subsequent DHCP Request and DHCP ACK: 9 Requests (this time within a very short time) before the ACK is sent. That looks really strange to me.
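                The counting described above can be reproduced with a small helper once the capture has been reduced to (timestamp, message type) pairs, for example by exporting them from Wireshark. This is a hypothetical Python sketch, not a FOG or Wireshark tool. Duplicate copies of one frame captured on several interfaces inflate the count, so it is worth filtering to a single interface first. The sample timestamps are made up for illustration and are not taken from the capture in this thread.

```python
import math
from typing import List, Optional, Tuple

# One capture event: (timestamp in seconds, DHCP message type), in capture order.
Event = Tuple[float, str]


def retries_before_reply(events: List[Event], request: str,
                         reply: str) -> Tuple[int, float]:
    """Count `request` messages sent before the first `reply`, plus the time
    from the first `request` to that reply (NaN if no reply was seen)."""
    count = 0
    first_ts: Optional[float] = None
    for ts, kind in events:
        if kind == request:
            count += 1
            if first_ts is None:
                first_ts = ts
        elif kind == reply and first_ts is not None:
            return count, ts - first_ts
    return count, math.nan


# Made-up sample, not data from the capture in this thread.
sample = [(0.00, "DISCOVER"), (0.05, "DISCOVER"), (0.25, "DISCOVER"),
          (0.50, "OFFER"), (0.51, "REQUEST"), (0.52, "ACK")]
print(retries_before_reply(sample, "DISCOVER", "OFFER"))
print(retries_before_reply(sample, "REQUEST", "ACK"))
```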

                To make a long story short, I just noticed that in the first round (BIOS PXE boot) there are two Offer and two ACK packets, one of each VLAN-tagged and one of each untagged. In the second round (iPXE) I only see DHCP Offers with a VLAN tag (ID 20, by the way). So to me it seems like the DHCP server behaves differently depending on the DHCP Discover packet. That’s even stranger than the rest.
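                To tell tagged and untagged copies apart when going through a capture, each frame can be checked for an 802.1Q header (EtherType 0x8100, VLAN ID in the low 12 bits of the tag), and its DHCP message type can be read from option 53. The following is a minimal Python sketch that assumes Ethernet II framing, IPv4 and well-formed frames (there is no bounds checking). classify_frame and build_dhcp_frame are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import struct
from typing import Optional, Tuple

DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
              5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
BOOTP_FIXED_LEN = 236               # op..file fields before the magic cookie


def classify_frame(frame: bytes) -> Optional[Tuple[Optional[int], str]]:
    """Return (VLAN ID or None, DHCP message type) for an Ethernet II frame
    carrying DHCP over IPv4/UDP, or None for anything else.
    No bounds checking: a truncated frame raises an exception."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    offset = 14
    vlan_id = None
    if ethertype == 0x8100:
        # 802.1Q: 2-byte TCI (priority, DEI, 12-bit VLAN ID), then the real EtherType.
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        vlan_id = tci & 0x0FFF
        offset = 18
    if ethertype != 0x0800:            # not IPv4
        return None
    ihl = (frame[offset] & 0x0F) * 4   # IPv4 header length in bytes
    if frame[offset + 9] != 17:        # protocol field: 17 = UDP
        return None
    udp = offset + ihl
    src_port, dst_port = struct.unpack_from("!HH", frame, udp)
    if not {src_port, dst_port} <= {67, 68}:  # DHCP server/client ports
        return None
    i = udp + 8 + BOOTP_FIXED_LEN
    if frame[i:i + 4] != MAGIC_COOKIE:
        return None
    i += 4
    while i < len(frame):
        code = frame[i]
        if code == 255:                # End option
            break
        if code == 0:                  # Pad option
            i += 1
            continue
        length = frame[i + 1]
        if code == 53 and length == 1:  # DHCP Message Type
            return vlan_id, DHCP_TYPES.get(frame[i + 2], "UNKNOWN")
        i += 2 + length
    return None


def build_dhcp_frame(msg_type: int, vlan_id: Optional[int] = None) -> bytes:
    """Build a minimal synthetic DHCP frame (zeroed addresses) for trying classify_frame."""
    bootp = bytes(BOOTP_FIXED_LEN) + MAGIC_COOKIE + bytes([53, 1, msg_type, 255])
    udp = struct.pack("!HHHH", 68, 67, 8 + len(bootp), 0) + bootp
    ip = bytes([0x45]) + bytes(8) + bytes([17]) + bytes(10)  # 20-byte header, UDP
    eth = b"\xff" * 6 + bytes(6)  # broadcast destination, zero source MAC
    if vlan_id is not None:
        eth += struct.pack("!HH", 0x8100, vlan_id & 0x0FFF)
    eth += struct.pack("!H", 0x0800)
    return eth + ip + udp


print(classify_frame(build_dhcp_frame(1)))              # untagged DISCOVER
print(classify_frame(build_dhcp_frame(2, vlan_id=20)))  # OFFER tagged with VLAN 20
```

Run over every frame of a capture, a check like this makes the tagged/untagged split visible per interface, which is also how one frame seen both on a VLAN trunk and on an untagged bridge would show up as an apparent duplicate.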

                VLAN is the key I suppose! Can’t you terminate the VLAN on the switch or hypervisor?

                Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

                Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

                • magikmw @Sebastian Roth
                  last edited by magikmw Sep 18, 2019, 5:02 PM Sep 18, 2019, 9:32 PM

                  @Sebastian-Roth
                  I’m sorry, I think that’s a dead end. I made the mistake of dumping packets from all interfaces on the host without thinking about it. Here’s just the dump from the bridge, showing the packets as they appear outside the VM’s interface: (link removed)

                  The untagged packets were the same packets seen from the VM’s perspective: as I mentioned, the VLAN is transparent to the VM, which behaves as if it were plugged into a dumb switch.

                  I’m looking into dumping the packets from just the VM’s virtual interface, but it’s a bit tricky as the interface is only created after the VM is set to start, so I’ll have to juggle it a bit.

                  Anyway, the DISCOVER and OFFER packets do look different before and after the *.kpxe file is fetched from FOG, with the ‘after’ ones being a few bytes bigger. Here’s a diff of two of them:

                  < Frame 1: 441 bytes on wire (3528 bits), 441 bytes captured (3528 bits)
                  ---
                  > Frame 2: 458 bytes on wire (3664 bits), 458 bytes captured (3664 bits)
                  14,17c14,17
                  <     Transaction ID: 0x512c7b59
                  <     Seconds elapsed: 8
                  <     Bootp flags: 0x0000 (Unicast)
                  <         0... .... .... .... = Broadcast flag: Unicast
                  ---
                  >     Transaction ID: 0xaa16170a
                  >     Seconds elapsed: 4
                  >     Bootp flags: 0x8000, Broadcast flag (Broadcast)
                  >         1... .... .... .... = Broadcast flag: Broadcast
                  48c48
                  <         Length: 21
                  ---
                  >         Length: 23
                  55a56
                  >         Parameter Request List Item: (26) Interface MTU
                  59a61
                  >         Parameter Request List Item: (119) Domain Search
                  71,72c73,74
                  <         Length: 45
                  <         Value: b105018086100e2201011901012101011801011101011301…
                  ---
                  >         Length: 60
                  >         Value: b1050800000000eb03010000170101220101160101130101…
                  79c81
                  <         Client Identifier (UUID): f278572e-dd57-1b1a-2e8b-c3d9b21795b9
                  ---
                  >         Client Identifier (UUID): 2e5778f2-57dd-1a1b-2e8b-c3d9b21795b9
                  82c84
                  < 
                  ---
                  >  
                  

                  Apparently the new one is a broadcast instead of unicast? I’m not sure what the significance is.
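                  One more detail in the DISCOVER diff: the two Client Identifier UUIDs are the same 16 bytes with the first three fields byte-swapped. That suggests the BIOS PXE ROM and iPXE simply serialize the machine UUID in a different byte order, rather than identifying a different client. A quick check with Python's standard uuid module:

```python
import uuid

pxe_rom = uuid.UUID("f278572e-dd57-1b1a-2e8b-c3d9b21795b9")  # frame 1 (before the .kpxe)
ipxe = uuid.UUID("2e5778f2-57dd-1a1b-2e8b-c3d9b21795b9")     # frame 2 (after the .kpxe)

# Reinterpret the same 16 bytes with the first three fields little-endian
# (the "bytes_le" layout); the result is the other UUID.
swapped = uuid.UUID(bytes_le=pxe_rom.bytes)
print(swapped)
print(swapped == ipxe)
```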

                  Same for offer:

                  < Frame 3: 346 bytes on wire (2768 bits), 346 bytes captured (2768 bits)
                  < Ethernet II, Src: Ubiquiti_bd:c7:6f (78:8a:20:bd:c7:6f), Dst: 6e:4e:8f:1e:0b:fb (6e:4e:8f:1e:0b:fb)
                  ---
                  > Frame 7: 359 bytes on wire (2872 bits), 359 bytes captured (2872 bits)
                  > Ethernet II, Src: Ubiquiti_bd:c7:6f (78:8a:20:bd:c7:6f), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
                  7c7
                  < Internet Protocol Version 4, Src: 10.0.20.1, Dst: 10.0.20.202
                  ---
                  > Internet Protocol Version 4, Src: 10.0.20.1, Dst: 255.255.255.255
                  14c14
                  <     Transaction ID: 0x512c7b59
                  ---
                  >     Transaction ID: 0xaa16170a
                  16c16
                  <     Bootp flags: 0x0000 (Unicast)
                  ---
                  >     Bootp flags: 0x8000, Broadcast flag (Broadcast)
                  33a34
                  >     Option: (119) Domain Search
                  35d35
                  < 
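                  The OFFER diff matches what RFC 2131 (section 4.1) prescribes. With giaddr and ciaddr both zero, a server unicasts the OFFER to the client's hardware address and yiaddr when the broadcast flag is clear, and broadcasts it to 255.255.255.255 when the flag is set. So the iPXE offers depend on broadcasts reaching the VM, while the BIOS PXE offers did not, which fits the bond problem reported later in the thread. Below is a minimal Python sketch of that rule. offer_destination is a hypothetical helper, DHCPNAK handling is omitted, and giaddr/ciaddr are assumed to be 0.0.0.0 in the example because the diff doesn't show them.

```python
import ipaddress
from typing import Optional, Tuple

BROADCAST_FLAG = 0x8000  # high bit of the 16-bit BOOTP "flags" field
ZERO = ipaddress.IPv4Address("0.0.0.0")


def offer_destination(flags: int, ciaddr: str, giaddr: str,
                      yiaddr: str, chaddr: str) -> Tuple[str, Optional[str]]:
    """Where a server sends a DHCPOFFER/DHCPACK per RFC 2131, section 4.1.

    Returns (IP destination, Ethernet destination); None for the Ethernet
    destination means ordinary IP delivery (ARP etc.) is used.
    """
    if ipaddress.IPv4Address(giaddr) != ZERO:
        return giaddr, None          # relayed: back to the relay agent
    if ipaddress.IPv4Address(ciaddr) != ZERO:
        return ciaddr, None          # client already has an address
    if flags & BROADCAST_FLAG:
        return "255.255.255.255", "ff:ff:ff:ff:ff:ff"
    return yiaddr, chaddr            # unicast to the offered IP and client MAC


client_mac = "6e:4e:8f:1e:0b:fb"
# Frame 3 (flags 0x0000) and frame 7 (flags 0x8000) from the diff above.
# yiaddr for frame 7 isn't shown; it doesn't affect the broadcast result.
print(offer_destination(0x0000, "0.0.0.0", "0.0.0.0", "10.0.20.202", client_mac))
print(offer_destination(0x8000, "0.0.0.0", "0.0.0.0", "10.0.20.202", client_mac))
```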
                  
                  • magikmw
                    last edited by Sep 18, 2019, 10:56 PM

                    I’ve figured it out! It turns out I had a misconfigured NIC bond: the switch was set up for a static LAG, while the host just used active-active backup. I’ve reconfigured both to use proper LACP, effectively bonding the two ports on both devices, and now the broadcasts work and the FOG menu boots!

                    Thanks to both @george1421 and @Sebastian-Roth for your help; I don’t think I’d have found the energy to go digging for this stuff if you hadn’t pushed me in the right direction.
                    It turns out it wasn’t really about FOG or PXE, but it did help me find an issue with my network I had no idea existed. Goes to show how interconnected technology is.

                    Here’s the article that helped me configure the LACP, if anyone faces a similar problem: https://support.citrix.com/article/CTX135690 (xcp-ng is basically opensource Citrix Hypervisor, formerly XenServer).

                    • Sebastian Roth Moderator
                      last edited by Sep 19, 2019, 7:01 AM

                      @magikmw Glad to see you found and fixed this. It’s good to know you really know your way around this network stuff. It’s very hard for us to diagnose and help out with that kind of thing.

                      Copyright © 2012-2024 FOG Project