@Sebastian-Roth From memory, we replaced the kernels (both 32 & 64 bit) and the inits/bzImage, and kept expanding to more and more files during troubleshooting to try to find the culprit. Nothing helped other than upgrading to 1.5.8. I assume it’s something Dell specific, some weird OEM magic that messes something up somewhere…
Posts made by no0NE
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
Sorry for reviving an old thread (of mine).
Just wanted to post the solution in case anyone runs into the same issue and finds this thread while looking for a fix.
We solved this around the time FOG 1.5.8 came out, in early spring 2020.
No matter what we did, we still had the issue with ~7-10% of our Dell 7070 Ultra PC’s after we received the final batch of user PC’s.
Soon after 1.5.8 rolled out i tried upgrading our FOG server to it, and afterwards it all worked fine with the troublesome PC’s!
We tried copying over the kernel and, as far as we know, all the files loaded up until the imaging session starts, from the test environment (1.5.8) to production (1.5.7), but the issue was still there…? After a while we upgraded the production FOG server to 1.5.8 and it started working there as well…
Something in 1.5.7 glitches in some cases with specific hardware, even on identical models. I can’t figure out any difference, other than perhaps some glitch that depends on the MAC address in use, if for some reason the NIC driver misbehaves with MAC’s ending in certain values…? 1.5.8 solved it, but the kernel itself was not what fixed it.
(If anyone knows what could cause this, i’m just technically curious to find out the reason for the errors. Is there any developer who knows what changed, other than the kernel, that could have solved this?)
You can close this otherwise dead thread
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
@Quazz said in Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG:
https://bugzilla.redhat.com/show_bug.cgi?id=1652865
Some other people having a similar issue, problem possibly fixed on newer kernel version.
@no0NE Go to the Kernel Update page and try grabbing the Kernel 5.1.16
Thanks for the info!
I’m quite sure it’s related to the kernels: i’ve tried Ubuntu 19.10 with kernel 5.3.0-18 and it works fine there… (What confused me and made me start this thread is that the other PC’s of the same model, 1 to start with and now 3 tested, all worked fine with the same firmware, OS’s, network/cables etc…!)
I forgot to mention that i also tried the TomElliott 5.1.16 (“mac/nvme fix”) kernel, with the same results, during my kernel troubleshooting this Monday.
So i can’t DOA the PC to Dell, but i’ll try to call their support and ask what may differ between the physical PC’s in terms of settings, hardware, firmware etc.
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
UPDATE; new BIOS released 21st October, 1.1.2. Updated to this version, same issue remains on this PC.
- Fresh Win 10 install from Microsoft: network works perfectly. All Ubuntu versions i’ve tried also work fine, including the latest, 19.10 (kernel 5.3.0-18).
- Debian Buster, PXE FOG 1.5.7 etc.: same issue, while everything still works perfectly on the 1st PC.
Today (22nd October) i got information from our branch office that they’ve managed to run FOG with our help on the 2 Dell 7070 Ultras there, and they work fine as well! Very strange that this single PC would behave like this, and only in a Linux environment. If it weren’t for it working fine in Windows i’d DOA it, but i’ll reach out to Dell for any help on what may be causing this.
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
Thanks for the feedback.
We switched to Dell from another brand, so we don’t have any adapters lying around, but i have a Dell DA200 USB-C dongle with ethernet at home. Good idea to try that, i’ll bring it tomorrow just to test!
I quickly tried some version of Ubuntu live (i think 18.04) on it last week and it worked that time. But i’ve now downloaded Debian Buster and tried that as well, just to be as sure as possible, and it’s the same problem there! Only on one of the PC’s: Debian Buster on “the first” PC of the same model continues to work fine with the same cables & BIOS here as well… Thanks for pushing me to actually double check that again!
I’ll have to submit this PC to Dell as DOA soon, but i’m unsure how they’ll see this since it works in Windows. I hope they have enough goodwill, with us being a new Dell customer with a big order coming in.
Before doing that, i’m reinstalling Windows 10 manually now from USB / Win 10 media creator, just to make sure that still works.
Btw, we began our environment change focused on the 3060, then the 3070 came and we were preparing to purchase that model, but we slowly moved towards optimizing our physical environment with the 7070 Ultra instead and landed on this model to purchase now. It’s quite different from the 3060/3070 since it’s built on laptop parts to begin with. It’s still quite expandable and configurable, but there’s no full size PCI-e etc.
UPDATE
I talked with a Dell tech who helped me brainstorm a few things, and we double-checked the revision of the NIC, which is the same on both/all machines (Rev 11 of the Intel I219-LM ethernet NIC). We didn’t get any wiser, so basically we’re still at the same spot.
What i’m concluding is that it seems to work on the newest kernels even on this troublesome PC, so i’m leaning towards building my own kernel, which should sort it out (something i’d probably have done by now, if it weren’t for all the other PC’s already working…)
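To narrow down where the behaviour changes, the kernels mentioned in this thread can be sorted numerically and bracketed. This is only a rough sketch: `RESULTS`, `version_key` and `bracket` are names made up for illustration, and the bundled “.48” kernel is assumed here to be 4.19.48.

```python
import re

# Results reported in this thread for the affected 7070 Ultra.
# True = NIC behaved normally, False = massive packet loss.
RESULTS = {
    "4.19.48": False,   # bundled FOG kernel (".48"; full version assumed)
    "4.19.64": False,   # Kernel.TomElliott.4.19.64.64
    "5.1.16": False,    # TomElliott "mac/nvme fix" kernel
    "5.3.0-18": True,   # Ubuntu 19.10
}


def version_key(version):
    """Turn '5.3.0-18' into (5, 3, 0) so versions compare numerically."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return tuple(int(part) for part in match.groups())


def bracket(results):
    """Return (newest failing, oldest working) version; None if a side is empty."""
    failing = [v for v, ok in results.items() if not ok]
    working = [v for v, ok in results.items() if ok]
    newest_failing = max(failing, key=version_key) if failing else None
    oldest_working = min(working, key=version_key) if working else None
    return newest_failing, oldest_working


print(bracket(RESULTS))
```

With the results above this brackets the change between 5.1.16 and 5.3.0-18. Distribution kernels carry their own patches and configuration, so a bracket like this is only a hint about where to look, not proof that a specific upstream release fixed anything.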
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
@Sebastian-Roth said in Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG:
@no0NE Have you had the first device (working fine) on the same switch port and with the same cable that you now see the problem with the second “faulty” device?
Absolutely sure the firmware is the exact same version on both?
Yes & Yes, sadly. Other than that, i’ve tried several other straight & crossover cables to be fully sure. I’m running out of ideas to troubleshoot, hence this post… Good suggestions if i hadn’t already checked, thanks!
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
@george1421
Thanks for the reply!
As you say, the issue is not on the server side, but rather on the client.
The Dell 7070 Ultra is a desktop built with laptop components (google it, it’s quite cool). It’s powered through USB-C from our monitors (65 W), together with DP & USB hub functionality, but the ethernet is connected separately, directly to the ethernet jack of the PC (Intel I219-LM network card).
I ping and scan the traffic going to the PC from my admin PC, the FOG server and the other PC of the same model. If i start the ping while the PC is in Windows all’s fine, but as soon as it boots into FOG the packet loss starts to occur.
To isolate as many sources of problems as possible, i’ve also connected ethernet directly between the FOG server and this PC, and the packet loss is still the same: massive!
I’ve also tried with dumb & smart/L3 switches and the results are the same…
Thanks for trying to help brainstorm how to find the error source, since i might’ve missed something, but with these points i think i’ve eliminated your possible fault points.
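To put numbers on the ping tests described above and compare them between machines, something like the following could be run from the admin PC or the FOG server. It is a minimal sketch that assumes the Linux (iputils) `ping` output format; the function names are made up for illustration.

```python
import re
import subprocess

# Summary lines as printed by Linux (iputils) ping, for example:
#   20 packets transmitted, 14 received, 30% packet loss, time 19234ms
#   rtt min/avg/max/mdev = 0.412/812.500/5012.100/900.000 ms
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)
RTT_RE = re.compile(r"min/avg/max/\w+ = ([\d.]+)/([\d.]+)/([\d.]+)/[\d.]+ ms")


def parse_ping_summary(output):
    """Extract packet counts, loss and RTT figures from ping's summary."""
    summary = SUMMARY_RE.search(output)
    if summary is None:
        raise ValueError("no ping summary found in output")
    result = {
        "sent": int(summary.group(1)),
        "received": int(summary.group(2)),
        "loss_percent": float(summary.group(3)),
    }
    rtt = RTT_RE.search(output)
    if rtt is not None:  # the rtt line is missing when every packet was lost
        result["rtt_min_ms"] = float(rtt.group(1))
        result["rtt_avg_ms"] = float(rtt.group(2))
        result["rtt_max_ms"] = float(rtt.group(3))
    return result


def measure(host, count=20):
    """Ping `host` `count` times and return the parsed summary."""
    proc = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_ping_summary(proc.stdout)
```

Calling `measure()` with the address of the affected PC once while it is in Windows and once after it has booted into FOG gives two comparable sets of figures for loss and response time.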
-
RE: Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
Just to add; most of the time, trying to start the imaging process on “PC #2” fails after it has tried to check in for a while, with an error stating “no route to host” or similar connection related errors (due to all the packet loss…), followed by an automated reboot after 1 min. Sometimes though it slowly gets to the PartClone stage and then is stuck there at ~0.5-1 MB/s with continued massive packet loss.
I’d like to try it out on further PC’s of this model, it feels like a NIC/Hardware issue if it’s only this PC, but the first PC works repeatedly every time (20+ times). And it works without issues in Windows/UEFI…
The packet loss against this PC is the same when pinged from any PC on the network, incl. from the FOG server itself.
Latest available UEFI/BIOS for this model (v 1.0.2)
-
Massive packet loss/NIC issues with new Dell 7070 Ultra in FOG
Hi.
Our company recently purchased a few Dell 7070 Ultra to start preparing our environment for a change to this PC/Setup in our production environment.
So far i’ve successfully captured and deployed a test image from 1 of my 2 test PC’s of this model. My problem is that the 2nd PC, of the exact same model & batch, suddenly gets massive packet loss and response times ranging from ~500-5000+ ms, with a lot of dropped packets altogether, right after loading the bzImage/kernel file and throughout the rest of the imaging process. The result is imaging taking a weekend instead of ~5 min. Imaging also works correctly with our legacy hardware running undionly.kkpxe/BIOS.
Once in Windows/UEFI/anywhere else than FOG the PC has standard response times and everything works perfectly, showing that the NIC seems to work fine…
The problem starts before any imaging/capturing begins, as soon as the kernel is loaded, which probably points to a driver issue. What confuses me is why the first PC works like a charm every time in that case…
I’ve manually upgraded the kernel to Kernel.TomElliott.4.19.64.64 (from included .48 kernel) - no difference.
At this stage i’d like to try further PC’s from this model, but it will take a couple of months before that’s possible.
So my question is, do you have anything else to point me in a direction to troubleshoot further, or is there a newer kernel/drivers that might simply work better? It is a brand new model and even a completely new series from Dell after all…
I’ve tried swapping ports & cables between the PC that works and the one that doesn’t; it’s always this specific PC that fails, with any combination of cables etc… I’ve had one imaging run where 99% of the process suddenly seemed to work and i managed to deploy the image to the PC, but that’s once in about 50+ tries. Randomly during imaging it might start working with ~1 ms response for 5-10 sec, and then it stops working again. Sometimes (maybe 10% of the time), if i pull the ethernet cable out for a couple of seconds and put it back in, it also works for the first 5-10 seconds… Really feels like a driver issue.
Do you think it’s a new/other kernel version that should solve this, or a newer FOG version altogether or something else?
FOG 1.5.7 stable, ARM (FOG test environment on Raspberry Pi 4, 4 GB, Raspbian Buster, latest updates as of last week)
Imaging/capturing; Dell 7070 Ultra, i5 8365U, 8 GB RAM, UEFI, with default ipxe.efi & .48 x64 & .64 x64 FOG/Tom kernels
NIC; Intel I219-LM
UPDATE; new BIOS released 21st October, 1.1.2. Updated to this version, same issue remains on this PC. Fresh Win 10 install from Microsoft: network works perfectly. Debian Buster, FOG 1.5.7 etc.: same issue, while everything still works perfectly on the 1st PC.
Today (22nd October) i got information from our branch office that they’ve managed to run FOG with our help on the 2 Dell 7070 Ultras there, and they work fine as well! Very strange that this single PC would behave like this, and only in a Linux environment.
Any help greatly appreciated, thanks!
Best regards,
Robin, IT Specialist. (With client PC environment responsibility among other things)