Very slow registering host and pushing images to Dell Optiplex 7070 ultras.
-
We are trying to image the new Dell OptiPlex 7070 Ultras and it has been hit or miss: some work fine and others move like a slug. It’s even really slow to register the hosts to our FOG server, and getting into the imaging process is just as slow. I have one computer that has been running for 2 hours now and it’s just stuck on the first Partclone screen. I’m in the process of imaging 7 computers now, and out of those 7 only 3 worked like normal. I have tried different kernels as well and this has not helped at all. Any ideas?
-
Just to show you how slow it is:
-
Also, we are getting this every so often:
-
@darkxeno What version of FOG are you using? What version of the FOS Linux kernel are you using? You can see the kernel version on the FOG Configuration page. It would be a number like 4.19.64.
From the uploaded picture, it looks like the client is (potentially) on a different subnet than the FOG server?
Also, on the systems that can’t get an IP address: if you place a dumb (read: cheap) unmanaged switch between your building switch and the PXE booting computer, does it work correctly every time?
-
@george1421 I’m running FOG 1.5.7, and my kernel is bzImage Version: 4.19.64.
Also, yes, they are in different subnets. We have been using FOG for years and I’ve always had FOG in a different subnet. I just did 50 other computers earlier this week with no issue; it just seems to be hit or miss with the 7070 Ultras. I normally use a dumb switch on my imaging bench, but I have also connected to our normal switch and the same issue keeps happening.
-
@darkxeno First, you have to remember I don’t know your network design, so I need to ask 20 questions to understand how things are set up. We’ve already ruled out a spanning tree issue, since you’ve used a cheap switch on your bench, and so far the issue is only with the 7070s. You are on the latest version of FOG with a moderately new kernel.
For the 7070s, can you get us the hardware ID of that NIC from Windows Device Manager? We’ll need both the vendor and device IDs. Then I can look into the Linux kernel to see if there is a driver for it. If it’s a Realtek NIC, I have a test kernel you can try that has an updated NIC driver specifically for Realtek 8168 series NICs.
-
@george1421 Give me a few seconds and I will get that to you.
Also, this is another issue that happens as well:
-
@george1421 This is what I got from Device Manager:
PCI\VEN_8086&DEV_15BE&SUBSYS_091B1028&REV_11
PCI\VEN_8086&DEV_15BE&SUBSYS_091B1028
PCI\VEN_8086&DEV_15BE&CC_020000
PCI\VEN_8086&DEV_15BE&CC_0200
Edit: Looks like it’s an Intel I219-V Ethernet Controller.
-
This is the error I get when I do get to Partclone:
-
@darkxeno said in Very slow registering host and pushing images to Dell Optiplex 7070 ultras.:
So on the Linux side it would be [8086:15BE]. While I know that network adapter, that may have been a new model. But I looked in the Linux docs and that driver was added in kernel version 4.12.x. You are at 4.19.64, so it’s been supported for quite a while.
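For anyone repeating this lookup, here is a minimal sketch of the conversion from a Windows hardware ID string to the vendor:device pair that Linux tools display. The function name `windows_hwid_to_pci_id` is made up for this example; it only pulls the `VEN_` and `DEV_` fields out of the string and does not check whether any driver supports the device.

```python
import re

# Matches the 4-digit hex vendor and device fields of a Windows PCI hardware ID.
_HWID_RE = re.compile(r"VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")


def windows_hwid_to_pci_id(hwid: str) -> str:
    """Return 'vendor:device' in lowercase hex, e.g. '8086:15be'.

    Hypothetical helper for illustration; it does no driver lookup.
    """
    match = _HWID_RE.search(hwid)
    if match is None:
        raise ValueError(f"no VEN_/DEV_ fields found in {hwid!r}")
    vendor, device = match.groups()
    return f"{vendor.lower()}:{device.lower()}"


# The first ID string posted above from Device Manager.
print(windows_hwid_to_pci_id(r"PCI\VEN_8086&DEV_15BE&SUBSYS_091B1028&REV_11"))
# prints 8086:15be
```

The resulting pair is what you would search for in the kernel sources or compare against the bracketed IDs in `lspci -nn` output.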
-
@george1421 Do you think it has anything to do with the Ultra having an NVMe drive?
-
@darkxeno Well, what’s not clear at the moment is whether it’s a network issue or a disk (NVMe) issue. We are seeing select newer NVMe drives having a slowness issue when writing the image to disk. That issue is being discussed in this thread: https://forums.fogproject.org/topic/13777/extremely-slow-deploy-to-nvme-drives/28
One thing you can test is to go into the global kernel parameters under FOG Configuration and add this kernel parameter:
nvme_core.default_ps_max_latency_us=0
You will see an error with that name during booting, but it’s only a warning; the kernel parameter will still be set correctly. The warning appears because there is a dot in the parameter name. See if that kernel parameter changes anything with write speed. What it does is keep the NVMe drive from using any low power mode during imaging. We are seeing random successes with that.
We might also want to get you to move to the dev branch of FOG, which will take your FOG version to 1.5.7.55 or later. The developers added an NVMe command line tool that we can use to push commands directly to the NVMe disk. But let’s first see if the kernel parameter masks the issue.
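If you want to confirm the parameter actually reached the booted kernel, a rough check is to look for it in the kernel command line. The sketch below parses a command line string; the helper names and the sample command line are invented for this example and are not taken from a real FOS boot. On a live Linux system the real string can be read from `/proc/cmdline`.

```python
PARAM = "nvme_core.default_ps_max_latency_us"


def parse_kernel_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def low_power_states_disabled(cmdline: str) -> bool:
    """True if the NVMe latency parameter is present and set to 0."""
    return parse_kernel_cmdline(cmdline).get(PARAM) == "0"


# Made-up command line for illustration only.
sample = "loglevel=4 initrd=init.xz root=/dev/ram0 rw nvme_core.default_ps_max_latency_us=0"
print(low_power_states_disabled(sample))  # prints True

# On a booted Linux system (assumption: /proc is mounted):
# with open("/proc/cmdline") as f:
#     print(low_power_states_disabled(f.read()))
```

This only shows that the parameter was passed at boot; whether it helps write speed still has to be judged from the imaging run itself.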
-
@george1421 I will try that and update you on Monday when I get back to work. Thanks for your help today.
-
@george1421 That seemed to do the trick. I’ll do some more Monday and let you know. I will go ahead and move to the dev branch as well.
-
@darkxeno Thank you for the feedback.
@Sebastian-Roth tagging you only so you are aware of the issue with the Dell system disks too.
-
@george1421 Yeah, seems like we have another model with the slow disk issue here. Good to know the kernel parameter did help in this case.
@darkxeno About all the other pre-Partclone (blue screen) issues on bootup: those have nothing to do with the slowness issue and have different causes. Since we see you made it past that point several times, I am wondering if it’s some network/DHCP issue that happens only sometimes. Maybe you have a rogue DHCP server in your network?
-
@Sebastian-Roth said in Very slow registering host and pushing images to Dell Optiplex 7070 ultras.:
Maybe you have a rogue DHCP server in your network?
Just as a comment, I’ve seen organizations that have primary and secondary DHCP servers set up intentionally, but only one of them has the PXE boot information. But looking at the OP’s picture, it looks like FOS Linux is unable to get an IP address. At that point the PXE boot information isn’t needed. It does make me think it’s a networking (or possibly a DHCP server) issue. We probably can/should trap that error with Wireshark so we can see who is saying what to the target computer.
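As a rough illustration of what "who is saying what" means in a DHCP capture, the sketch below pulls the message type and the server identifier out of a raw DHCP message (the UDP payload). The function name `summarize_dhcp` is made up for this example and the test packet is hand-built, not taken from a real capture; Wireshark shows the same fields without any code.

```python
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def _ip(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)


def summarize_dhcp(payload: bytes) -> dict:
    """Extract client MAC, offered IP, message type and server ID from a DHCP message."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP message")
    summary = {
        "client_mac": ":".join(f"{b:02x}" for b in payload[28:34]),  # chaddr
        "offered_ip": _ip(payload[16:20]),                           # yiaddr
        "message_type": None,
        "server_id": None,
    }
    i = 240  # options start right after the magic cookie
    while i < len(payload):
        code = payload[i]
        if code == 255:          # end option
            break
        if code == 0:            # pad option, no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break                # truncated option, stop parsing
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:    # DHCP message type
            summary["message_type"] = MESSAGE_TYPES.get(value[0], str(value[0]))
        elif code == 54 and len(value) == 4:  # server identifier
            summary["server_id"] = _ip(value)
        i += 2 + length
    return summary


# Hand-built OFFER for illustration: server 10.0.0.1 offers 10.0.0.50.
pkt = bytearray(240)
pkt[16:20] = bytes([10, 0, 0, 50])
pkt[28:34] = bytes.fromhex("aabbccddeeff")
pkt[236:240] = DHCP_MAGIC_COOKIE
pkt += bytes([53, 1, 2, 54, 4, 10, 0, 0, 1, 255])
print(summarize_dhcp(bytes(pkt)))
```

If OFFER messages for the same client MAC carry different `server_id` values, more than one DHCP server is answering on that subnet, which is the situation described above.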
-
@Sebastian-Roth I’m curious about this. For one, we have been using FOG for years, since v0.27, and haven’t had this issue before. And it’s only been on these 7070 Ultras we just got in. I just did an image on the regular OptiPlex 7070s and had no issue with them at all. However, this past Tuesday we had a battery backup fail, and it kept failing every time we thought we had it back up, so we had a lot of unsafe shutdowns. So that could be a possibility. However, I pushed those other images after all of that was resolved and we didn’t have those issues. But we’ll do a Wireshark capture and check. I’m curious now.
-
@darkxeno The first thing you have to keep in mind is that the XX70 generation is still very new. The FOG Project relies heavily on the Linux kernel developers for hardware support. When hardware support is added by the Linux kernel developers, the FOG Project devs integrate that kernel into FOS Linux. The Linux kernel developers will always be behind in releasing supporting hardware drivers, because hardware vendors and chip manufacturers sometimes release hardware in a closed manner where they release the chips and provide only the Windows driver.
But from a Wireshark perspective, we can see what is flying down the wire using a witness computer connected to the same subnet, or the FOG server if it’s on the same subnet as the PXE booting computer. The FOG server is preferable since it would also collect unicast communications between the target computer and the FOG server. If the FOG server can be used (i.e. same subnet as the target computer), I have a tutorial for that: https://forums.fogproject.org/topic/9673/when-dhcp-pxe-booting-process-goes-bad-and-you-have-no-clue
If you are using Wireshark from a witness computer (i.e. a computer plugged into the same subnet as the PXE booting computer), you can (and should) use this capture filter:
udp port 67 or udp port 68
That will capture only the DHCP/BOOTP part of the booting process. This is possible because DHCP uses broadcast messages to communicate, so all hosts on the subnet will receive these broadcast DHCP messages.
-
@george1421 Thanks for the info. I’ll get that done Monday and I’ll let you know what we find.