    SVN 4380 Cloud 5419 (on Ubuntu 14.04.3) Fog not consistently tftp booting from location

    • Malos

      Not 100% sure if this is a “bug report” or not, and not sure how to even write this one out as I like to provide as much info as possible but…

      I’m having the darndest time getting a FOG client to TFTP boot (and, more importantly, pull down the image itself locally) consistently from a storage node in a remote subnet. Sometimes boot.php on the master install directs clients to image from the local storage node, which is 100% the desired behavior, and sometimes boot.php directs the client to pull down bzImage etc. from the master server.

      On each storage node, I am replacing the storage node IP in the chain line of /tftpboot/default.ipxe with the IP of the master install. This follows some documentation I read once upon a time for an older version of FOG, and it worked fine until recently.
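A minimal sketch of that edit, for illustration only. The sample file content, both IP addresses, and the helper name `point_chain_at_master` are assumptions and not taken from a real install; only the idea of swapping the host in the `chain` line comes from the post above.

```python
import re

def point_chain_at_master(ipxe_text, master_ip):
    """Rewrite the host part of every `chain http(s)://...` line to master_ip."""
    # Capture everything up to the host, then replace only the host itself.
    pattern = re.compile(r"^(\s*chain\s+https?://)[^/\s]+", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + master_ip, ipxe_text)

# Hypothetical default.ipxe content on a storage node (10.0.2.5 is made up).
sample = "#!ipxe\nchain http://10.0.2.5/fog/service/ipxe/boot.php\n"
rewritten = point_chain_at_master(sample, "10.0.0.1")
print(rewritten)
```

Doing this by hand in an editor is equivalent; the point is only that the line after `chain` ends up naming the master install rather than the node itself.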

      Poking around the location table in the database, I see that the lTftpEnabled column contains o (the lowercase letter, not zero) for every location that shows as TFTP Boot Enabled in the GUI, rather than the 1 I would expect.

      In fact, changing this “o” to any value at all appears to retain the TFTP Boot Enabled setting in the GUI. Changing this value to 0 does change TFTP Boot Enabled in the GUI to “unchecked” as one would expect. Unchecking it in the GUI and saving the change actually blanks out this field, rather than changing it to zero.
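The observations above are consistent with the value being evaluated the way PHP converts a string to a boolean, where only the empty string and "0" count as false. Whether FOG actually checks the column this way is an assumption; the thread does not confirm it. A small sketch of that rule, with a hypothetical helper name:

```python
def php_string_is_true(value):
    """Mimic PHP's (bool) cast for a string or NULL column value."""
    if value is None:
        return False          # NULL converts to false in PHP
    return value not in ("", "0")  # every other string is true

# Values mentioned in the post: "o" and "1" read as enabled, "0" and blank do not.
for candidate in ["o", "1", "anything", "0", ""]:
    print(repr(candidate), php_string_is_true(candidate))
```

Under that assumption, the stray `o` would be harmless to the checkbox state, which matches what the GUI shows.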

      I don’t know if the database oddity noted above is relevant at all, but the behavior I expect is that if a location is set for a host, the host pulls boot and image files strictly from that local TFTP-enabled storage node.

      Sorry if this is rambling. I’m having a difficult time nailing down problem duplication steps because it doesn’t seem to be acting consistently either way.

      • Tom Elliott

        Are you using the location plugin? Tftp enabled is only for locations and it is not truly tftp directing. All that option does is tell the host to get its bzImage and init.xz from its specified location. What plugins do you have and what are their relevant entries?

        Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

        Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

        Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        • Malos @Tom Elliott

          @Tom-Elliott Yes, the location plugin is installed. Having bzImage and init.xz pulled from the specified location is exactly what I want, so I’m not off my rocker (yet), even if my initial understanding of what TFTP enabled does was a bit off!

          Each storage node referenced by a given location is local to all clients in that location. Unfortunately, sometimes the client boots and pulls down bzImage and init (and then the image) from the master server, which is not local to that client.

          Location plugin is the only one installed, and it lists its location as “…/lib/plugins/location/”

          • Sebastian Roth (Moderator)

            @Malos said:

            … sometimes … sometimes … sometimes …

            Are you able to reproduce under which circumstances clients boot from the right/wrong server? To me this sounds like there are several DHCP servers offering information to the clients. Sometimes they get the “correct” info first but sometimes not.
            Are you willing and able to hook a hub in front of one of your clients and capture the traffic using wireshark/tcpdump? I’d be really interested to see the packet dump. Hopefully we can figure things out this way.

            • Malos @Sebastian Roth

              @Sebastian-Roth

              Got something concrete that I picked up on finally!

              When a host with a pending task boots, it pulls down bzImage and init (and then the image) from the master server. If I then shut off the host halfway through the task and reboot it, it pulls bzImage, init and the image from the correct storage node location. Rinse and repeat, and it pulls from the master again; repeat once more, and it pulls from the node, and so on.

              This flipping action happens very consistently once the task has started.

              OK! Now, if I power off the host before the bzimage and init pulldown finishes (so, before the screen flashes and clears over to the imaging process itself), booting the host again will pull everything down from the same server just as before. So it’s almost like something gets toggled in the database side of things, perhaps in the task itself right as the image kicks off that might be causing this?

              • Malos @Sebastian Roth

                @Sebastian-Roth
                It looks like whatever is causing the flip is changing the taskNFSMemberID column of the task (in the tasks table) between 1 (the master server in my case) and 3 (the correct location storage node).
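One way to watch for that flip is to read the column after each reboot and count how often it changes. The sketch below runs against an in-memory SQLite stand-in rather than a real FOG database. The `tasks` table and `taskNFSMemberID` column are named in the post; the `taskID` key column, the task number 7, and both helper names are assumptions.

```python
import sqlite3

def task_node_id(conn, task_id):
    """Return taskNFSMemberID for one task, or None if the task does not exist."""
    # taskID as the key column is an assumption about the schema.
    row = conn.execute(
        "SELECT taskNFSMemberID FROM tasks WHERE taskID = ?", (task_id,)
    ).fetchone()
    return None if row is None else row[0]

def count_flips(observations):
    """Count how many consecutive observations differ from the one before."""
    return sum(1 for prev, cur in zip(observations, observations[1:]) if prev != cur)

# Stand-in database: one task that starts on node 1 and is later moved to node 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (taskID INTEGER PRIMARY KEY, taskNFSMemberID INTEGER)")
conn.execute("INSERT INTO tasks VALUES (7, 1)")
seen = [task_node_id(conn, 7)]
conn.execute("UPDATE tasks SET taskNFSMemberID = 3 WHERE taskID = 7")
seen.append(task_node_id(conn, 7))
print(seen, count_flips(seen))
```

Against a MySQL-backed install the same query would go through a MySQL driver, which typically uses `%s` placeholders instead of `?`.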

                I would be willing to capture a dump somehow if you feel it would be helpful, but I’m very certain that there are not multiple DHCP servers, as there’s no other noted issues in my environment.

                • Wayne Workman @Malos

                  @Malos said:

                  I would be willing to capture a dump somehow if you feel it would be helpful, but I’m very certain that there are not multiple DHCP servers, as there’s no other noted issues in my environment.

                  https://wiki.fogproject.org/wiki/index.php/Troubleshoot_TFTP
                  There are steps in there for doing a capture on the fog server.

                  But, since we’re looking at DHCP specifically - you can simply do a capture with Wireshark using any computer that is connected to the same network that you’ll be booting the trouble-host on. The capturing computer will hear all of the broadcast messages on the network and that’s what Sebastian was wanting to look at.

                  Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG!
                  Daily Clean Installation Results:
                  https://fogtesting.fogproject.us/
                  FOG Reporting:
                  https://fog-external-reporting-results.fogproject.us/

                  • Sebastian Roth (Moderator)

                    @Wayne-Workman Thanks for explaining and pointing this out. You are right: we’d actually see all the broadcasts and don’t really need a hub. But @Malos’s findings sound pretty reasonable (and reproducible), and I think we’d better look down that alley before bringing out the big guns. 😉

                    • Sebastian Roth (Moderator)

                      I’ve poked through the code a little, and to me it seems like things might be going wrong here: lib/reg-task/TaskQueue.class.php
                      But I don’t know enough about the PHP code and @Tom-Elliott needs to have a look I suppose.

                      • Tom Elliott @Sebastian Roth

                        @Sebastian-Roth I found and fixed the issue last night. Thanks for pointing that out, but this particular problem was related to the change items hook of the location plugin and the location association class. I was trying to get the storage node from the association, which doesn’t maintain the node or group information. The other half of it was that the storage group was getting the list of all enabled nodes, not just the enabled nodes within its group. This should be fully fixed now.
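The second half of that fix can be pictured as a missing filter. This is an illustrative sketch in Python, not FOG's PHP code: the `Node` type, its fields, the group numbers, and both function names are hypothetical. Node IDs 1 and 3 simply echo the master and location node mentioned earlier in the thread.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    group_id: int
    enabled: bool

def enabled_nodes_any_group(nodes):
    """Behaviour as described for the bug: group membership is ignored."""
    return [n for n in nodes if n.enabled]

def enabled_nodes_in_group(nodes, group_id):
    """Behaviour as described for the fix: enabled nodes of one group only."""
    return [n for n in nodes if n.enabled and n.group_id == group_id]

# Hypothetical layout: master node 1 in group 1, location node 3 in group 2.
nodes = [Node(1, 1, True), Node(3, 2, True), Node(4, 2, False)]
print([n.node_id for n in enabled_nodes_any_group(nodes)])
print([n.node_id for n in enabled_nodes_in_group(nodes, 2)])
```

With the unfiltered list, a task for the remote location can land on either node, which would fit the alternating behaviour reported above.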

                        • Malos @Tom Elliott

                          @Tom-Elliott Confirmed, tasks are pulling down the boot files and image data from the correct node consistently, and updating nodes pulls from the newly set node as well. Awesome work, thanks!

                          Copyright © 2012-2024 FOG Project