Hello, sorry for taking so long to answer this. Too many (time-sensitive) things at work. I managed to find out a few new details about this bug. It seems that this AMD architecture is still not well supported.
This message mentions a few changes to the tg3 (Tigon3) driver, including a workaround that is specific to my network card. I tested it, but it does not fix the problem.
https://lkml.org/lkml/2017/12/31/125
<…>
Siva Reddy Kallam (3):
tg3: Update copyright
tg3: Add workaround to restrict 5762 MRRS to 2048
tg3: Enable PHY reset in MTU change path for 5720
<…>
According to this thread (last post: 2018-01-16), the tg3 patch aimed at my specific Ethernet card (5762) still does not solve the issue.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664
Meanwhile, I downloaded and rebuilt the latest Linux release candidate, which includes this patch for the tg3 module.
4.15-rc8 is available here:
https://git.kernel.org/torvalds/t/linux-4.15-rc8.tar.gz
The bzImage file was built as a static TomElliott 64-bit image.
https://wiki.fogproject.org/wiki/index.php?title=Build_TomElliott_Kernel
Unfortunately, my tests with this kernel showed no improvement on the timeout issue. The problem still happens. I tried a few kernel parameters, without success. This is a vanilla kernel (plus the TomElliott config). It is not tainted, although it includes the firmware repository.
However, I finally got kernel logs. You can check them at the links below.
log_01_acpi_off.txt
https://pastebin.com/FGQNiLqk
log_02_maxcpus_1.txt
https://pastebin.com/2eEJnA3Z
log_03_nmi_watchdog_off.txt
https://pastebin.com/Su44AqiX
log_04_nmi_watchdog_off.txt
https://pastebin.com/4ja0UZ0c
log_05_noapic_nolapic.txt
https://pastebin.com/fZNJbME5
I used the following sets of kernel parameters. Some were inspired by the logs (tsc), some were just to… see what happens.
debug loglevel=7
debug loglevel=7 acpi=off
debug loglevel=7 acpi=off tsc=unstable
debug loglevel=7 acpi=off tsc=unstable maxcpus=1
debug loglevel=7 acpi=off tsc=unstable maxcpus=1 nmi_watchdog=0
debug loglevel=7 acpi=off tsc=unstable maxcpus=1 nmi_watchdog=0 noapic nolapic
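When cycling through parameter sets like these, it can help to confirm that a given boot actually picked up the intended set before reading its log. Below is a minimal sketch of that check, assuming the running kernel exposes its command line at /proc/cmdline; the helper names `parse_cmdline` and `missing_params` are hypothetical, and quoted values containing spaces are not handled.

```python
from pathlib import Path

# Sentinel used to tell "parameter absent" apart from "bare flag" (None).
_MISSING = object()


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(expected, actual):
    """Return names from `expected` that are absent or have a different value in `actual`."""
    want = parse_cmdline(expected)
    have = parse_cmdline(actual)
    return [name for name, value in want.items()
            if have.get(name, _MISSING) != value]


if __name__ == "__main__":
    # One of the parameter sets listed above.
    expected = "debug loglevel=7 acpi=off tsc=unstable maxcpus=1"
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():  # only present on a running Linux system
        print("missing or different:", missing_params(expected, cmdline.read_text()))
```

An empty list means every expected parameter was present with the expected value; extra parameters added by the boot loader are ignored.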
Sometimes it’s difficult to get logs, because the machine hangs right after the network stops working.
Here is the MRRS patch for tg3, which targets the 5762 hardware. My test kernel has it applied, but it still does not fix the problem.
https://github.com/torvalds/linux/commit/4419bb1cedcda0272e1dc410345c5a1d1da0e367#diff-ee9b0abeec638cc316efd5b30e0e01e8
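One way to check whether the MRRS restriction is in effect might be to look at the MaxReadReq value that `lspci -vv` reports for the card. This is only a sketch: it assumes the workaround shows up in the PCIe Device Control register and that the output uses the usual `MaxPayload N bytes, MaxReadReq N bytes` line. The helper names are hypothetical, and the sample text is made up for illustration, not taken from this machine.

```python
import re


def max_read_request(lspci_text):
    """Return the first MaxReadReq value (in bytes) found in `lspci -vv` text, or None."""
    match = re.search(r"MaxReadReq\s+(\d+)\s+bytes", lspci_text)
    return int(match.group(1)) if match else None


def mrrs_within_limit(lspci_text, limit=2048):
    """True if a MaxReadReq value was found and does not exceed `limit` bytes."""
    value = max_read_request(lspci_text)
    return value is not None and value <= limit


# Illustrative sample only (hypothetical values, not real output from this system).
SAMPLE = (
    "\t\tDevCtl:\tCorrErr- NonFatalErr- FatalErr- UnsupReq-\n"
    "\t\t\tMaxPayload 128 bytes, MaxReadReq 4096 bytes\n"
)

if __name__ == "__main__":
    print("MaxReadReq:", max_read_request(SAMPLE))
    print("within 2048-byte limit:", mrrs_within_limit(SAMPLE))
```

In practice the text would come from `lspci -vv -s <device address>` run as root for the tg3 device, instead of the sample string.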
Any ideas? Would you like logs with other parameters? Is there anything I can do to provide further information? lsusb? lspci? lscpu? Anything else?
Regards,
Paulo
P.S.: By the way, I also saw network issues on a live Ubuntu image (17.10.1), on both the wired (tg3) and wireless (iwlwifi) network cards.