
    Pxe-Boot gets hung up on TFTP

    FOG Problems
    • zaccx32

      Hello,
      We have been using FOG for deployment for a while with no issues until recently. Most machines boot perfectly, but random machines will contact DHCP and then get stuck on TFTP. They eventually boot, but image deployment takes hours and sometimes days. If we move a “problem” machine to a new VLAN, it boots just fine. We are primarily working with Dell OptiPlex 3010, 3020, 3040, and 3050 machines doing a legacy boot. Any ideas? Thanks!

      • george1421 (Moderator)

        @zaccx32 said in Pxe-Boot gets hung up on TFTP:

        get stuck on TFTP

        1. What does "gets stuck" mean?
        2. What error do you see?
        3. What mode is this computer in, BIOS or UEFI?
        4. Is the FOG server on the same subnet as the target computer?
        5. Is it predictable, and can you create the error on demand?
        6. Do all of the troubled systems have the latest available firmware (esp. the 3010s)?
        7. What device is your DHCP server (manufacturer and model)?

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!

        • zaccx32

          What does this mean, gets stuck?? - When the machine tries to contact TFTP, it normally takes a fraction of a second. However, certain machines will sit there and try to contact it for 15 minutes.
          What error do you see? - There are no errors. It eventually boots up but takes over 30 minutes to boot into Windows.
          What mode is this computer in bios or uefi? - Legacy
          Is the FOG server on the same subnet as the target computer? - All of the affected machines are in a different subnet from the FOG server.
          Is it predictable and you can create the error on demand? - This is not predictable and happens at random. We will swap the machine out with one of the same make and model and it will boot fine. We have not been able to reproduce it on demand.
          All of the troubled systems have the latest available firmware (esp the 3010s)? This is a good question. Is there a good way to check the version of the FOG client on the machines?
          What device is your dhcp server (mfg and model) - Windows Server 2012

          Thanks in advance! I am new to this world and have been learning on the go so I apologize for any miscommunication!

          • george1421 (Moderator) @zaccx32

            @zaccx32 said in Pxe-Boot gets hung up on TFTP:

            All of the troubled systems have the latest available firmware (esp the 3010s)? This is a good question. Is there a good way to check the version of the FOG client on the machines?

            I should have said the Dell BIOS version.

            So the same computer on the same network port will sometimes go fast and other times wait 15 minutes? If so, that sounds like network infrastructure. The iPXE boot loader is pretty small (< 100KB), so it should load lightning quick. If you could predict the delay, it would be interesting to get a pcap (packet capture) of the PXE booting process from a mirrored port using Wireshark.

            I had an idea: if you had a rogue DHCP server, it might cause this issue without anything timing out. Again, if you can predict the problem, you can capture at least the DHCP part using a witness computer on the same subnet as the failing computer. For Wireshark you would use the capture filter port 67 or port 68. It would be interesting to see if you are getting more than one OFFER packet, and whether you know the source host of each OFFER.
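
            To make the OFFER check concrete, here is a minimal Python sketch that counts DHCPOFFER messages per responding server. It assumes you have already extracted the raw UDP payloads of the captured DHCP packets from the pcap. The function names and the `make_payload` helper are hypothetical and exist only for illustration; the IP addresses are placeholders.

```python
import socket
from collections import Counter

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPTIONS_OFFSET = 240  # 236-byte BOOTP fixed header + 4-byte magic cookie
DHCPOFFER = 2         # value of option 53 (message type) for an OFFER


def parse_options(payload: bytes) -> dict[int, bytes]:
    """Return the DHCP options found in a raw BOOTP/DHCP UDP payload."""
    if len(payload) < OPTIONS_OFFSET or payload[236:240] != DHCP_MAGIC_COOKIE:
        return {}
    options: dict[int, bytes] = {}
    i = OPTIONS_OFFSET
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[i + 1]
        options[code] = payload[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def offers_by_server(payloads: list[bytes]) -> Counter:
    """Count DHCPOFFER messages per server identifier (option 54)."""
    counts: Counter = Counter()
    for payload in payloads:
        opts = parse_options(payload)
        server_id = opts.get(54, b"")
        if opts.get(53) == bytes([DHCPOFFER]) and len(server_id) == 4:
            counts[socket.inet_ntoa(server_id)] += 1
    return counts


def make_payload(message_type: int, server_ip: str) -> bytes:
    """Hypothetical helper: build a bare-bones DHCP payload for the demo."""
    options = bytes([53, 1, message_type, 54, 4]) + socket.inet_aton(server_ip)
    return bytes(236) + DHCP_MAGIC_COOKIE + options + b"\xff"


# Placeholder data: two OFFERs from one server, one from another, one ACK.
sample = [
    make_payload(2, "10.0.0.1"),
    make_payload(2, "10.0.0.1"),
    make_payload(2, "10.0.0.2"),
    make_payload(5, "10.0.0.1"),
]
print(offers_by_server(sample))
```

            More than one key in the result means more than one DHCP server answered the DISCOVER, which is the situation worth investigating.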


            • zaccx32

              I just did a capture with Wireshark. We are getting a single OFFER packet, and there is a successful three-way handshake with the gateway, but then no other communication. It also randomly happened on another machine that was working fine this morning. I also updated the BIOS, and that made no difference.

              • george1421 (Moderator) @zaccx32

                @zaccx32 Ok on a witness computer you get all 4 packets (discover, offer, request, ack) and they happen pretty quickly.

                If you look at the OFFER packet, in the DHCP (BOOTP) header, above the DHCP options, there should be a {next-server} field, and that should be the IP address of the FOG server. Down a little bit there should be a {boot-file} field; for BIOS computers it should be undionly.kpxe. If both are there, then scroll down a bit to DHCP options 66 and 67. Those values should mirror the header exactly.
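
                The same comparison can be sketched in Python, assuming you have the raw UDP payload of the OFFER as bytes. The offsets follow the standard BOOTP layout (next-server at byte 20, boot-file at bytes 108 to 235, options after the magic cookie at byte 240). The function names are hypothetical, and the sketch ignores option overload (option 52), so treat it as an illustration rather than a complete parser.

```python
import socket

MAGIC_COOKIE = b"\x63\x82\x53\x63"


def _options(payload: bytes) -> dict[int, bytes]:
    """Walk the DHCP options that follow the magic cookie."""
    opts: dict[int, bytes] = {}
    i = 240
    while i < len(payload) and payload[i] != 255:   # 255 = end option
        if payload[i] == 0:                         # pad option, no length
            i += 1
            continue
        if i + 1 >= len(payload):
            break                                   # truncated option
        length = payload[i + 1]
        opts[payload[i]] = payload[i + 2 : i + 2 + length]
        i += 2 + length
    return opts


def _text(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def pxe_fields(payload: bytes) -> dict[str, str]:
    """Extract the header fields and options that steer PXE booting."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    opts = _options(payload)
    return {
        "next_server": socket.inet_ntoa(payload[20:24]),  # siaddr
        "boot_file": _text(payload[108:236]),             # file
        "option_66": _text(opts.get(66, b"")),            # TFTP server name
        "option_67": _text(opts.get(67, b"")),            # bootfile name
    }


def mismatches(fields: dict[str, str]) -> list[str]:
    """List the places where the header and the options disagree."""
    problems = []
    if fields["next_server"] != fields["option_66"]:
        problems.append("next-server differs from option 66")
    if fields["boot_file"] != fields["option_67"]:
        problems.append("boot-file differs from option 67")
    return problems
```

                Note that option 66 may legitimately hold a host name instead of an IP address, in which case the first check reports a difference that you would have to resolve by hand.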

                Now, if you can get a mirror port set up, what we should see right after the ACK is the computer attempting to reach out to the {next-server} and download the {boot-file} using the TFTP protocol. To capture all of that in one pcap on a mirror port, you would need to use the capture filter port 67 or port 68 or port 69. I think something must be falling down somewhere between the ACK and the TFTP download. If you start up tcpdump on the FOG server, you can capture the TFTP request without needing a mirror port. So you would use Wireshark on the witness computer and tcpdump on the FOG server to get the entire picture. For tcpdump you can get the commands here: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue There has to be something going on in that 15-minute PXE booting gap, like a lot of retransmissions or something.
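
                If a packet capture is hard to arrange, a rough alternative is to time a TFTP read request from a witness computer on the affected subnet. The sketch below is a minimal probe, not a full TFTP client: it sends one read request and measures how long the first DATA block takes to arrive. The function names are hypothetical, and the server address in the usage comment is a placeholder for your FOG server.

```python
import socket
import struct
import time

TFTP_PORT = 69
OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


def time_first_block(server: str, filename: str, timeout: float = 5.0):
    """Return seconds until the first DATA block arrives, or None.

    The probe never acknowledges the block, so the server will retry a few
    times and then drop the transfer on its own.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        try:
            # The server answers from a new ephemeral port, so accept any source.
            reply, _ = sock.recvfrom(4 + 512)
        except socket.timeout:
            return None
        elapsed = time.monotonic() - start
    if len(reply) < 4:
        return None
    opcode = struct.unpack("!H", reply[:2])[0]
    if opcode == OP_ERROR:
        message = reply[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
        raise RuntimeError(f"TFTP error: {message}")
    return elapsed if opcode == OP_DATA else None


# Usage (placeholder address, run from a machine on the affected subnet):
#   print(time_first_block("192.0.2.10", "undionly.kpxe"))
```

                On a healthy path the first block should arrive in milliseconds; a timeout or a multi-second delay from the affected subnet would point at the network path and not at the PXE client.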


                • zaccx32 @george1421

                  @george1421
                  I have been running more captures on a machine that is booting very slowly and have noticed that it is getting multiple OFFER packets. Two are from a single server and another is from a different server. We do know what each of the servers is. We do have two DHCP servers within the same scope that are for load balancing.

                  • george1421 (Moderator) @zaccx32

                    @zaccx32 said in Pxe-Boot gets hung up on TFTP:

                    We do have two DHCP servers within the same scope that are for load balancing

                    Make sure that both are set up to support PXE booting. We’ve seen situations where one is configured and the other is not, and the clients will get random PXE boot results depending on which DHCP server responds first.


                    • zaccx32 @george1421

                      @george1421
                      Hey George,
                      I took a look at the two DHCP servers, and both have the same server options with the correct boot server host name and bootfile name. I have not noticed anything out of the ordinary with DNS and DHCP. I did find a way to identify which machines are being affected by looking for extremely slow copy speeds when deploying files with PDQ. 4MB files will take over an hour to copy on machines that are being affected. I have also noticed that the time to boot is inconsistent. Most times it takes 15-20 minutes, but occasionally it takes 5 minutes and then 20 minutes on the next boot. When doing a pcap, I have noticed quite a few DNS query name errors as well. Pcap.PNG DHCP.PNG

                      • george1421 (Moderator) @zaccx32

                        @zaccx32 said in Pxe-Boot gets hung up on TFTP:

                        I did find a way to identify which machines are being affected by looking for extremely slow copy speeds when deploying files with PDQ. 4MB files will take over an hour to copy on machines that are being affected.

                        This here kind of tells me it's network infrastructure. What I would do is look at the network switch port (hopefully you have a managed switch) and check the port counters. See if you are getting a lot of CRC errors. Can you generalize and say all computers from area A have a problem but not those from area B? If so, the trouble may be on an uplink port between the area A switch and the next switch in line. Again, the port counters might give you a clue as to what is not right. IMO, if you can duplicate the error with 2 different servers, then it's probably not the FOG server at fault.

                        I did notice something in your DHCP screenshot. It's not a problem with FOG, but in your Polycom scope you should probably remove the undionly.kpxe boot file name. It's not relevant to a VoIP phone and may cause problems.

                        The other thing: since your DHCP server supports policies, you might want to take a look at this wiki page to set up policies for both BIOS and UEFI booting: https://wiki.fogproject.org/wiki/index.php/BIOS_and_UEFI_Co-Existence#Using_Windows_Server_2012_.28R1_and_later.29_DHCP_Policy

                        Copyright © 2012-2024 FOG Project