
    Crash due to timeout in tg3 kernel module: tg3_stop_block timed out, ofs=4c00, enable_bit=2

      Tom Elliott

      What does the command dmesg show? It sounds to me like we just need to add the tg3 firmware module to the build like all the other tg3 nics I had to do before.
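To pull the relevant lines out of a long `dmesg` dump, something like the following could help. It is only a sketch: the pattern is modelled on the message quoted in this topic's title, the function name is made up for illustration, and reading both fields as hexadecimal is an assumption about how the driver prints them.

```python
import re

# Pattern modelled on the message in this topic's title. The comma after the
# offset is optional because the exact punctuation may differ between kernels.
TG3_TIMEOUT = re.compile(
    r"tg3_stop_block timed out, ofs=(?P<ofs>[0-9a-fA-F]+),? enable_bit=(?P<bit>[0-9a-fA-F]+)"
)


def find_tg3_timeouts(dmesg_text):
    """Return (offset, enable_bit) pairs for each tg3_stop_block timeout line.

    Both fields are parsed as hexadecimal, which is an assumption.
    """
    hits = []
    for line in dmesg_text.splitlines():
        match = TG3_TIMEOUT.search(line)
        if match:
            hits.append((int(match.group("ofs"), 16), int(match.group("bit"), 16)))
    return hits


# The message from the topic title, used as sample input.
sample = "tg3_stop_block timed out, ofs=4c00, enable_bit=2"
for offset, bit in find_tg3_timeouts(sample):
    print(f"timeout at register offset 0x{offset:x}, enable bit 0x{bit:x}")
```

Saving the output of `dmesg` to a file and passing its contents to `find_tg3_timeouts` would list every such timeout from one boot.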

      Please help us build the FOG community with everyone involved. It's not just about coding - way more we need people to test things, update documentation and most importantly work on uniting the community of people enjoying and working on FOG! Get in contact with me (chat bubble in the top right corner) if you want to join in.

      Web GUI issue? Please check apache error (debian/ubuntu: /var/log/apache2/error.log, centos/fedora/rhel: /var/log/httpd/error_log) and php-fpm log (/var/log/php*-fpm.log)

      Please support FOG if you like it: https://wiki.fogproject.org/wiki/index.php/Support_FOG

        Paulo.Guedes @Tom Elliott

        @tom-elliott
        Hello Tom, I have added a few dmesg logs in the messages below. I think it’s not related to the firmware, since the kernel builds OK, but the module crashes.

        Hello all, it’s a real pleasure to finally say that IT WORKED!!! Wow, it finally worked! I almost can’t believe it. Thank you so much for all your help.

        Ahem. The solution was found by Sebastian (thanks, Sebastian!!!). Here I just describe the process.

        This message thread contains the solution and a patch. It describes precisely our failure scenario: the same NIC, booting over the network, a 10/100 switch, and the tg3 kernel module breaking with a timeout.
        https://www.mail-archive.com/netdev@vger.kernel.org/msg189347.html

        The patch:
        https://www.mail-archive.com/netdev@vger.kernel.org/msg189923/0001-tg3-Add-clock-override-support-for-5762.patch

        The kernel version: 4.13.3
        https://www.kernel.org/pub/linux/kernel/v4.x/
        https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.13.3.tar.xz

        Basically I followed the instructions to rebuild a static image.
        Download the kernel and the patch; extract the kernel, apply the patch. Build an image (mine was a 64 bit one).
        https://wiki.fogproject.org/wiki/index.php?title=Build_TomElliott_Kernel
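The download, extract and patch steps above could be laid out roughly as below. This is a sketch, not the wiki's procedure: the two URLs are the ones given in this post, while the use of `wget`, the `-p1` strip level (assumed because the patch file name looks like `git format-patch` output) and the helper names are assumptions. The kernel configuration and build itself are left to the wiki page.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# URLs taken from this post.
KERNEL_URL = "https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.13.3.tar.xz"
PATCH_URL = (
    "https://www.mail-archive.com/netdev@vger.kernel.org/msg189923/"
    "0001-tg3-Add-clock-override-support-for-5762.patch"
)


def filename(url):
    """Return the last path component of a URL."""
    return PurePosixPath(urlparse(url).path).name


def prepare_steps(kernel_url=KERNEL_URL, patch_url=PATCH_URL):
    """Return (working_directory, argv) pairs; nothing is executed here."""
    tarball = filename(kernel_url)
    patch = filename(patch_url)
    source_dir = tarball[: -len(".tar.xz")]
    return [
        (".", ["wget", kernel_url]),
        (".", ["wget", patch_url]),
        (".", ["tar", "-xf", tarball]),
        # Check that the patch applies before touching the tree.
        (source_dir, ["patch", "-p1", "--dry-run", "-i", f"../{patch}"]),
        (source_dir, ["patch", "-p1", "-i", f"../{patch}"]),
    ]


for cwd, argv in prepare_steps():
    print(f"(in {cwd}) {' '.join(argv)}")
```

Printing the commands instead of running them keeps the sketch harmless; each pair could be passed to `subprocess.run(argv, cwd=cwd, check=True)` once the list looks right.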

        Install the build inside fog, then try to image something over ethernet with the regular procedure: using pxe to boot.

        Without the patch, the deploy will fail with a timeout crash inside tg3. With it, it should work flawlessly.

        If you wish, I’ve built a 64-bit image, ready to be used inside fog. Here it is.
        https://goo.gl/n1qBES

        Regards,
        Paulo
        p.s.: I really hope nothing has changed inside the firmware repository, and the fix is not due to a new firmware. Maybe it’s worth trying the same kernel with the same firmware repository, but without the patch (to see if it breaks). Anyway, it works, and this is what matters:)

          Sebastian Roth Moderator

          @Paulo-Guedes Oh, that’s really great to hear that we have at least figured this out! Probably a real pleasure to see it image nicely now!!!

          We are more than happy to add the patch to the FOG kernel, but we should also look into whether it will make it into the official kernel as well. The last comment on the mailing list was:

          Good. We will work on required changes and upstream proper patch after
          sanity test with multiple speeds.

          Can anyone figure out if and where this patch made it into the upstream kernel? If not we ought to push the developers to do so.

            Paulo.Guedes @Sebastian Roth

            @sebastian-roth
            As far as I can tell, the patch for tg3 was not inside the release candidates for the current kernel. I’ve tested 4.15-RC8 and it was not working. Then RC9 was released (no idea about it). Two days ago a brand new stable version was released. Will try it and see what happens.
            https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.15.tar.xz

            I just checked the changelog and it mentions nothing related to tg3, tigon, timeout or broadcom. I would bet this patch is not in here yet. Here it is.
            https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.14.15
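The changelog check described above can be repeated for any downloaded ChangeLog file with a few lines of Python. This is a minimal sketch: the keyword list is the one named in this post, and the function name is hypothetical.

```python
# Keywords mentioned in this post; matching is case-insensitive.
KEYWORDS = ("tg3", "tigon", "timeout", "broadcom")


def matching_lines(changelog_text, keywords=KEYWORDS):
    """Return (line_number, line) pairs for lines containing any keyword."""
    hits = []
    for number, line in enumerate(changelog_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.strip()))
    return hits
```

Reading the file with `open(path, encoding="utf-8", errors="replace").read()` and passing the text in gives an empty list when none of the keywords occur. An empty result only shows the words are absent from the changelog, not that the driver is untouched.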

            I will try to run more tests today. One with 4.13.3 without the patch, to see if it breaks (and hence whether the patch is the real fix). And another with 4.15 (with and without the patch), to see if it is fixed and, in case it’s not, whether the patch applies cleanly and works. Meanwhile, yesterday I wrote in another thread (with the same bug), asking people there to double-check our findings. Maybe they can take a look too, and see what happens.

              Sebastian Roth Moderator

              @Paulo-Guedes Yeah right, seems like the patch didn’t make it into the kernel yet. Probably a good idea to get in contact with the guy posting the patch. You can find his e-mail address in the patch file! Definitely send him a short message to see what the current state is and tell him that the fix is working great to fix your issue.

                Paulo.Guedes @Sebastian Roth

                @sebastian-roth
                Hello Sebastian, all,

                1. Stable kernels 4.13.3 and 4.15 crash without the patch. Patch is not merged yet in the main branch.

                2. Stable kernels 4.13.3 and 4.15 work great with the patch: no timeouts on tg3. Fast transfers on gigabit links and 10/100 links.

                3. Wrote to the patch author as Sebastian suggested, with my results and asking when it will be merged. Waiting for his answers. Patch has a slight offset for 4.15 (2 lines, probably new comments or code) but works anyway. Will keep you updated on this.

                4. Deploy for single machines (in parallel without multicast) is finally checked. Tested overnight with a bunch of machines and it’s ok.

                5. If you wish, I can upload the patched 4.15 kernel tomorrow, just in case someone wants to use it.

                6. Multicast deploy for groups of machines is working too, but much slower (about 10x) than my 10/100 network could transfer. Same network, same machines, no cable touched, nothing reset and… the deploy already starts at a slow speed (between 100 and 200 MB/min). Just reporting. I will start reading about it, to try to understand the problem. If anyone can point me in the right direction, please answer this message.
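For reference on item 6, the multicast rates can be set against the raw line rate of a 100 Mbit/s link. The conversion below is plain arithmetic and assumes decimal megabytes and no protocol overhead, so real unicast transfers will sit somewhat below this ceiling.

```python
def mbit_per_s_to_mb_per_min(mbit_per_s):
    """Convert a link speed in Mbit/s to MB/min (decimal units, no overhead)."""
    bytes_per_second = mbit_per_s * 1_000_000 / 8
    return bytes_per_second * 60 / 1_000_000


ceiling = mbit_per_s_to_mb_per_min(100)
print(f"100 Mbit/s ceiling: {ceiling:.0f} MB/min")

# Rates reported for the multicast deploy in item 6.
for observed in (100, 200):
    print(f"{observed} MB/min is {observed / ceiling:.0%} of that ceiling")
```

The same function gives the ceiling for a gigabit link when called with `1000`.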

                  Sebastian Roth Moderator

                  @Paulo-Guedes Great stuff! Keep it up and I am sure we’ll have you up and running soon.

                  About multicast… First, please open a new thread on this topic. I don’t like to mix things up all in one thread. And then keep in mind that it’s always the slowest part of the chain that limits the speed. So if there is just one single client with a crappy hard drive, it will slow down all the other hosts. So I’d start by testing multicast in groups of maybe 3 to 5 machines each and see if those are all going at the same slow pace or if some groups are faster than others.
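Splitting a host list into small test groups, as suggested above, could look like this. It is a sketch only: the host names are hypothetical, and the groups would still have to be created in FOG by hand.

```python
def split_into_groups(hosts, size=5):
    """Split a list of hosts into consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


# Hypothetical host names, for illustration only.
hosts = [f"lab-pc-{n:02d}" for n in range(1, 13)]
for index, group in enumerate(split_into_groups(hosts, size=5), start=1):
    print(f"group {index}: {', '.join(group)}")
```

If one group is clearly slower than the rest, halving that group again narrows down which client is holding the session back.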

                    Sebastian Roth Moderator

                    @Tom-Elliott Paulo told me that he’s sent a message to the guy at Broadcom to ask if the fix would be included in the main line kernel at some point but he hasn’t got an answer from him. So I am wondering if you are happy adding the patch to our kernel for now? Paulo has had huge trouble and the patch solved the network issues for him. Take a look here: https://www.mail-archive.com/netdev@vger.kernel.org/msg189923/0001-tg3-Add-clock-override-support-for-5762.patch

                      Paulo.Guedes

                      Hello, just updating.

                      1. No answer so far from Broadcom. Tom, adding the patch would be good.

                      2. Added a link to this discussion in another thread. I think it’s the same problem.
                        Maybe they can also report on the problem.
                        https://forums.fogproject.org/topic/9976/hp-elitedesk-705-g2-mini

                      3. Mentioned the patch and test results in another forum. Hope this helps the patch to enter the main kernel faster.
                        https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664

                        Sebastian Roth Moderator

                        Somehow I lost track of this. Luckily I came across it again and have now added the patch, as it seems it still hasn’t made it into the mainline kernel. I also added the patch information to our wiki article on kernel compiling. Just in case anyone reads this thread and wonders where it all went.

                        @Paulo-Guedes Have you ever heard back from that broadcom guy?

                          Sebastian Roth Moderator

                          @Paulo-Guedes Ahh, I just saw that a fix was actually added upstream in July this year: https://lkml.org/lkml/2018/7/23/671 (I just hadn’t noticed it a little further down in the code)

                          Can you confirm this is fixing your issue? Have you used one of the official FOG kernels since then? Which versions?

                          Copyright © 2012-2024 FOG Project