DHCP Lease Failing on 1.5.5 after upgrade - again
-
@Sebastian-Roth Yes, I will have access to a few machines to run tests.
-
@totoro Ok, here we go. I started building all the kernel versions between 4.18.3 and 4.19.1 to see if I can pinpoint a specific kernel version that introduced the issue. Find the kernels here: https://fogproject.org/kernels/r8169/ (64-bit only, as your machines seem to be that architecture).
Please test upwards, starting from 4.18.4. Download the kernels manually, put them in /var/www/html/fog/service/ipxe/ and then set Host Kernel in the host settings of your test machine. Schedule a task for that machine, boot it up and see if it is able to get an IP from your DHCP server. If it already fails on 4.18.4, please go back to 4.18.3 and make sure it works with that, just to rule out that something else is causing the issue at that point. Be aware that this is only the very first stage. To be able to send a proper bug report to the kernel developers, we’ll probably need to test individual commits between the official kernel releases as well. Stay tuned!
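For anyone repeating these steps, the download-and-place part can be scripted. This is a minimal sketch that only reuses the base URL and target directory from the post above; the kernel file name is passed in as an argument because the exact names on the server are not listed here, and the helper names `kernel_url`, `kernel_dest` and `install_kernel` are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch one test kernel and place it where FOG's iPXE files live.
# Base URL and directory are taken from the post; the file name is supplied
# by the caller (no file names are assumed here).

KERNEL_BASE_URL="https://fogproject.org/kernels/r8169"
KERNEL_DIR="/var/www/html/fog/service/ipxe"

# Print the download URL for a given kernel file name.
kernel_url() {
    printf '%s/%s\n' "$KERNEL_BASE_URL" "$1"
}

# Print the local destination path; an optional second argument overrides
# the target directory.
kernel_dest() {
    printf '%s/%s\n' "${2:-$KERNEL_DIR}" "$1"
}

# Download the kernel straight to its destination.
# Needs write access to the target directory (e.g. run with sudo).
install_kernel() {
    name="$1"
    dest=$(kernel_dest "$name" "$2")
    curl -fSL -o "$dest" "$(kernel_url "$name")"
}
```

Usage would be `install_kernel <file name from the download listing>`, after which Host Kernel still has to be set for the test machine as described above.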
-
@Sebastian-Roth I tested almost all the kernels. The problem comes back when I go from the 64-bit 4.18.20 kernel to the 64-bit 4.19 kernel, so the problem was introduced in the latter.
I hope it helps.
-
@totoro Great, thanks for testing. I’ll look into the code changes between 4.18.20 and 4.19 and will probably compile some more kernels for you to test soon.
-
@Sebastian-Roth No more tests?
-
@totoro Sorry for the delay. Turns out there have been major changes in that part of the code between those versions:
git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable/
git diff --stat v4.18.20 v4.19
...
 drivers/net/ethernet/realtek/Kconfig |    3 +-
 drivers/net/ethernet/realtek/r8169.c | 1120 +++++--------
...
I am still working my way through those changes to see how we can properly test which of those 1120 changed lines is causing the issue…
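To turn that diff into a list of candidate commits, something like the following could be run inside the linux-stable clone. It is a sketch, assuming the two tags used above; `--no-merges` drops merge commits so that only actual patches touching the Realtek driver directory remain, oldest first. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: list the commits between two tags that touch the Realtek driver.
# Run inside a linux-stable clone. Defaults to the range from the post.
list_driver_commits() {
    from="${1:-v4.18.20}"
    to="${2:-v4.19}"
    git log --oneline --reverse --no-merges "${from}..${to}" \
        -- drivers/net/ethernet/realtek/
}

# Count those commits (tr strips the padding some wc versions add).
count_driver_commits() {
    list_driver_commits "$@" | wc -l | tr -d ' '
}
```

The count this prints need not match the number of test builds exactly, since some commits may not build or may be grouped differently.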
-
@totoro Ok, I think it’s best to compile binaries for each and every commit that seems related to that network driver. In the same download location you will find new binaries named bzImage_r8169_..., numbered from 1 to 64 (I might not have compiled and uploaded all of them yet, but will do so soon). Please test them one by one and see exactly where the problem starts.
-
@totoro I just updated the binaries again to include the commit hash in the name, just to make sure we don’t mix anything up in the later analysis. Some were recompiled as well. I hope you have not started testing yet.
Please start from bzImage_r8169_01_... and go through to bzImage_r8169_51...
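If each build behaves the same on every boot, testing them one by one is not strictly necessary: a binary search (the same idea behind `git bisect`) needs at most about 6 boots for 51 builds instead of 51. This sketch assumes that every build either always gets a lease or never does, that once a build fails every later build fails too, and that the highest build number given is a failing one. The helper names are hypothetical; `ask_tester` just prompts the person at the test machine.

```shell
#!/bin/sh
# Sketch: binary search over the numbered r8169 test builds.
# Assumptions: results are deterministic per build, failures are
# monotonic (once broken, stays broken), and build HIGH fails.

# File name prefix for build N, e.g. 1 -> bzImage_r8169_01_
build_prefix() {
    printf 'bzImage_r8169_%02d_' "$1"
}

# Interactive check: exit status 0 means "build N got a DHCP lease".
# The prompt goes to stderr so it does not mix with first_bad's output.
ask_tester() {
    printf 'Boot %s... and schedule a task. Got a DHCP lease? [y/n] ' \
        "$(build_prefix "$1")" >&2
    read -r answer
    [ "$answer" = "y" ]
}

# first_bad LOW HIGH CHECK_CMD
# Prints the lowest build number in [LOW, HIGH] for which CHECK_CMD fails.
first_bad() {
    lo=$1
    hi=$2
    check=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$check" "$mid"; then
            lo=$(( mid + 1 ))   # build mid works: the break is later
        else
            hi=$mid             # build mid fails: the break is here or earlier
        fi
    done
    echo "$lo"
}
```

Usage would be `first_bad 1 51 ask_tester`. If results turn out to vary between boots of the same build, the search gives a misleading answer, and the one-by-one approach is the safer choice.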
-
@totoro Any news? Did you get to test some of the binaries yet?
-
@totoro I would appreciate you letting us know if you are still interested in debugging this issue any further.
-
@Sebastian-Roth Sorry, lots of work here. I’m running tests today.
-
@Sebastian-Roth Hi again, this is driving me crazy… sometimes it’s not working, and other times it works again (on the same PC where we always had the problem before). Sometimes it fails right at PXE boot, too. So I searched the web for a BIOS or hardware issue, and it looks like these machines do have a problem. I don’t think the problem comes from FOG but from a random bug somewhere in the BIOS. Thanks for your help, and sorry for taking up your time.
-
@totoro Thanks for letting me know. But are you sure it’s random even with the exact same kernel booted every time? I am just asking because the kernel builds I provided could have an alternating outcome. Maybe 1_... works, 2_... fails and 3_... works again. Just an idea…
-
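One way to check that idea is to boot each build several times and write down the outcome every time. This sketch assumes a hypothetical plain-text log with one `<build-number> <ok|fail>` line per boot; the awk summary then shows, per build, how many boots got a lease and whether the result was stable or mixed.

```shell
#!/bin/sh
# Sketch: summarize repeated boot results per build.
# Assumed log format: one line per boot, "<build-number> <ok|fail>".
# Output: "<build> <ok-count>/<boots> <verdict>", sorted by build number,
# where verdict is stable-ok, stable-fail or mixed.
summarize_results() {
    awk '
        { boots[$1]++; if ($2 == "ok") ok[$1]++ }
        END {
            for (b in boots) {
                o = ok[b] + 0
                if (o == boots[b]) verdict = "stable-ok"
                else if (o == 0) verdict = "stable-fail"
                else verdict = "mixed"
                printf "%s %d/%d %s\n", b, o, boots[b], verdict
            }
        }' "$1" | sort -n
}
```

A build marked `mixed` would support the "random" observation, while a clean split between `stable-ok` and `stable-fail` builds would point back at a specific commit.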
@Sebastian-Roth Sometimes we have a problem before the PXE menu even appears: the PXE boot waits for an IP address and never receives one. Other times, PXE says to check the cable… I have no idea other than a BIOS or hardware problem. We tried changing the switch, and putting the FOG server and the client directly on the same switch, with no change… Perhaps this hardware series has a problem, but it’s strange that it only happens with PXE.