• Hello!

    I have been using FOG for the past 6 months with great success, thank you so much to the developers!

    We just got in a new Lenovo model, the T14 Gen 2. I created my golden image and it captured with no problems. Now, however, when I go to deploy the image, I get very slow deployment speed. I have done some troubleshooting and I can’t determine the cause.
    Sometimes it goes really slowly (40 hours for a 29GB image) and sometimes it hangs in various places before partclone even runs. There is no problem PXE booting or sending inventory to the server.

    1. I have the FOG server (running kernel 5.10.34, FOG version 1.5.9) on a switch that the PCs are directly connected to. I tested with this switch removed from the equation and got the same speed (~10-50MB/min).
    2. I tried deploying a different image to the same PC just in case my new image was causing the problem, same result.
    3. I tried deploying an image to a different model PC (Lenovo T490) and it worked perfectly, usually about 5-10 minutes to deploy.
    4. I have checked top on the server, and it is hardly under any load at all.
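
    As a rough cross-check of the numbers above, the following is a minimal sketch that converts between image size, transfer rate and deployment time. The function names are made up for illustration, and it assumes 1 GB = 1024 MB and whole-number inputs, so the results are only approximate. By this arithmetic, 40 hours for a 29GB image works out to roughly 12MB/min, which sits at the low end of the 10-50MB/min range.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking deployment figures.
# Assumes 1 GB = 1024 MB; integer arithmetic, results are rounded down.

# Average rate in MB/min implied by an image size (GB) and a duration (hours).
avg_rate_mb_per_min() {
    size_gb=$1
    hours=$2
    echo $(( size_gb * 1024 / (hours * 60) ))
}

# Estimated deployment time in hours for an image size (GB) at a rate (MB/min).
eta_hours() {
    size_gb=$1
    rate_mb_per_min=$2
    echo $(( size_gb * 1024 / rate_mb_per_min / 60 ))
}
```

    For example, `avg_rate_mb_per_min 29 40` gives the average rate for the slow deployment, and `eta_hours 29 50` gives the expected duration at the upper end of the observed range.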

    This feels like a network issue, but like I said, on another laptop it works perfectly.

    Can anyone please help point me in the right direction?

    Thank you!


  • @saad-naeem
    @dustindizzle11

    Thank you guys both for the heads up!
    Just got more of these models in stock.
    Didn’t think about it until the imaging hung. (I have not updated FOG kernel…)

    Using USB-C LAN adapters and those are helping to get the job done.

    I may try updating the FOG kernel, in which case I’ll post here about results.

    -Maurice


  • @mmarquis
    I had the same problem with these Gen 2 laptops

    Problem:
    It seems like these laptops ship with a bad driver for the wired Intel I219-V Ethernet adapter.
    Wireless, USB-to-LAN or USB-C adapters work fine for me.

    Solution:

    • Either update the BIOS on the freshly unboxed laptop first and then deploy your image as usual (using Ethernet LAN),
    • or use a USB-C or USB-to-LAN adapter to deploy your image and update the BIOS afterwards.

    I have tested this on a couple of laptops. Once the BIOS is updated, the onboard Ethernet works fine and you can deploy your image over Ethernet LAN again.

  • Senior Developer

    @dustindizzle11 Nice! Let us know what you find.

    We have released a new FOS Linux kernel version since then. Though it’s only a minor update, you might give it a try. Also try out older kernels (e.g. 4.x) just in case.

    You might also test Live Linux ISOs with different kernel versions. If you find that a more recent kernel (newer than 5.10.53) fixes the problem, then let us know.


  • Not hijacking this post, but I just wanted to “upvote” this and say I am having the exact same problem with the exact same model, the Lenovo T14 Gen 2. Before finding this post I had narrowed it down to what you guys have found so far (that the NIC driver in the kernel is most likely the issue). I am using a LAN adapter as a workaround, but if I find anything more I will let you guys know.

    Dustin

  • Senior Developer

    @mmarquis Well too bad we can’t fix this easily. As a next step I would grab a Linux Live boot ISO with a recent Linux kernel and see if networking is slow there as well. Maybe use Arch, burn a CD/DVD or write the ISO to a USB key drive and boot it up. When it’s up check if you have an IP and can ping the FOG server. Then mount the FOG NFS share and copy one of the image files over:

    mkdir /images
    mount -t nfs -o nolock,proto=tcp,rsize=32768,intr,noatime 192.168.x.y:/images /images
    

    To get some stats when copying you can use different tools but I am not sure if those are part of the Arch Live ISO. See which one is working for you:

    pv /images/IMAGENAME/d1p3.img > /tmp/test.img
    rsync --progress /images/IMAGENAME/d1p3.img /tmp
    dd if=/images/IMAGENAME/d1p3.img of=/tmp/test.img status=progress
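
    If none of those tools are available, the following is a minimal sketch that times a plain cp and prints the average throughput. The function name timed_copy is made up for illustration, and it only assumes cp, date and wc are present. Timing has one-second resolution, so use a large image file for a meaningful number.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST and print the average throughput.
# Relies only on cp, date and wc; timing resolution is one second.
timed_copy() {
    src=$1
    dst=$2
    start=$(date +%s)
    cp "$src" "$dst" || return 1
    end=$(date +%s)
    elapsed=$(( end - start ))
    # Avoid dividing by zero when the copy finishes within a second.
    if [ "$elapsed" -lt 1 ]; then
        elapsed=1
    fi
    bytes=$(wc -c < "$dst")
    echo "$(( bytes / 1048576 / elapsed )) MB/s"
}
```

    Usage would be something like `timed_copy /images/IMAGENAME/d1p3.img /tmp/test.img`. Keep in mind that /tmp on a live system is often RAM-backed, so pick an image file that fits.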
    

  • @sebastian-roth
    Hi Sebastian,

    Today I put the PC and FOG server on an even dumber switch than they were already on. (Just kidding, but this one is a 5 port un-managed Netgear switch we all know so well.)

    Same behavior on the dumber switch unfortunately.

    I’m about to send out my last one, the one I’ve been testing with. I don’t know if we’ll ever get this model again, lol. I really hope the next time I image with FOG it goes back to normal. I’ll have new machines in next week, so I suppose we’ll see what happens.

    I appreciate the help, at this point it’s just the mystery of the thing for me. Let me know if you have any other thoughts or ideas.

  • Senior Developer

    @mmarquis said in Lenovo T14 Gen 2:

    Can’t say I know what I’m looking at, but I hope you find something cool in there.

    Unfortunately not. It’s an Intel network chip using the Linux e1000e driver from what we see in the outputs. I was hoping to find some messages that point to network driver issues but there is nothing in the dmesg output. Looks all clean.

    Did you try my suggestion on using a dumb mini switch to connect the Lenovo T14 Gen 2 to your normal network switch?


  • @sebastian-roth

    dmesg_bootup.txt
    dmesg_deploy1.txt

    Can’t say I know what I’m looking at, but I hope you find something cool in there.

    Thank you!

  • Senior Developer

    @mmarquis Well that’s interesting. I suggest you schedule a debug deploy task for another host which doesn’t have the network problems to make sure all the steps outlined really work in your setup. We have done this with many other people over the years and it usually works.

    The system booting up to do the actual work is a lightweight custom Linux OS based on buildroot.org. We call it FOS. It has no iptables tools and no rules set. It’s still based on old school init.d scripts instead of systemd. To check if SSHd is running: ps ax | grep ssh

    Do you see a proper IP address being assigned in the ip a s output? On the one hand I could imagine the network issue to be a problem, but then I think a task would fail way earlier in the process if the network were completely down, because at the beginning of a task it checks into the FOG server and errors out on failure. Why would that work but not SSH? Possible but kind of unlikely.

    Edit: Now that I read your post again and think about it I wonder if it’s just inbound traffic that is problematic on the Lenovo. But still, response packets from the task checkin obviously make it through. Can’t imagine that inbound TCP SYN packets are dropped because of a driver issue.

    The other option you have to get the dmesg outputs over is using a USB thumb drive (best formatted with FAT32). Plug that in, use lsblk to find its device name, mount it and copy over the text files. In this case you only have one command shell, on the device itself. So take the first dmesg output and copy it to the USB drive before starting the task (command fog). Step through the task and when you have had enough of waiting in a slow partclone screen you should be able to cancel that with ctrl-c to get back to the shell and grab another dmesg output.
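
    The USB route could look roughly like the sketch below. The function name copy_to_usb, the device name /dev/sdb1 and the mount point /mnt/usb are all assumptions for illustration; check the lsblk output for the real device name first. It also assumes the running kernel can mount the drive's filesystem.

```shell
#!/bin/sh
# Hypothetical helper: mount a USB partition, copy files onto it, unmount.
# The mount point defaults to /mnt/usb (an assumption); override via USB_MNT.
copy_to_usb() {
    dev=$1
    shift
    mnt=${USB_MNT:-/mnt/usb}
    mkdir -p "$mnt" &&
    mount "$dev" "$mnt" &&
    cp "$@" "$mnt"/ &&
    sync &&          # flush writes before the drive is removed
    umount "$mnt"
}

# Example (device name is an assumption, verify with lsblk):
#   dmesg > dmesg_bootup.txt
#   copy_to_usb /dev/sdb1 dmesg_bootup.txt
```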


  • @sebastian-roth

    I’m having trouble ssh’ing into this thing.
    I even put them on their own little network, and I can ping from the Linux session to my Windows device, but I can’t ping the PC we’re troubleshooting.

    The bash shell doesn’t recognize “iptables” or “systemctl”
    It feels like we’re now troubleshooting something else, so I apologize. But I’m not able to do the steps you outlined.
    (Incidentally, I can ping and SSH to the FOG server from the same Windows PC.)

    I’m not sure if this is just the same network driver issue we’re facing in general. Any thoughts?

  • Senior Developer

    @mmarquis Sounds like a network (driver) issue. Finding and fixing this issue is probably going to be a long endeavor with deep knowledge of the Linux kernel involved. Though there are a few easy things you can try before diving into the big ocean.

    There is a slight chance it’s some kind of EEE (Energy-Efficient Ethernet) thing causing this. I suppose that would only happen if the Lenovo T14 Gen 2 is connected to an EEE-capable switch. So you might check the switch settings and disable EEE (on a single port or altogether) or, even easier, grab an old dumb mini switch and hook that in between. That way EEE should not be triggered in the driver because the dumb mini switch lacks EEE functionality.
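
    EEE can also be checked and switched off on the client side with ethtool, if the environment you boot has it. That is an assumption: it may not be part of FOS, but most Live Linux ISOs ship it or can install it. A minimal sketch follows; the function name disable_eee and the interface name eth0 are placeholders, so take the real interface name from the ip a s output.

```shell
#!/bin/sh
# Hypothetical helper: show the current EEE state of an interface, then
# turn EEE off. Assumes ethtool is installed and the driver supports EEE.
disable_eee() {
    iface=$1
    if ! command -v ethtool >/dev/null 2>&1; then
        echo "ethtool not found" >&2
        return 127
    fi
    ethtool --show-eee "$iface"          # print current EEE settings
    ethtool --set-eee "$iface" eee off   # disable EEE on this interface
}

# Example (interface name is an assumption):
#   disable_eee eth0
```

    The setting does not survive a reboot, so it has to be applied in the same session in which you test the transfer speed.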

    Second thing you might want to look at is getting a full dmesg output to see if there is information on why speed is so slow.

    • Schedule a new deploy task for one of your Lenovo T14 Gen 2 devices but this time enable the Schedule as debug task checkbox just before you hit the button to create the task in the FOG web UI.
    • Boot the device up as usual and you will end up in a command shell (after you hit ENTER twice as shown on the screen).
    • Run command ip a s to find out the IP address pulled from the DHCP server and then passwd to set a temporary root user password in this FOS session.
    • Use PuTTY or any other SSH command tool to connect to the device and login as root. Now you have two command shells.
    • Run command dmesg > dmesg_bootup.txt in the SSH command window. Output will be written to a text file and you won’t see it on screen. Leave the SSH command shell open, we’ll need that later again.
    • Use WinSCP or any other SSH file transfer tool to login and copy over the dmesg_bootup.txt file to your computer. Leave that connected as well.
    • Now go back to the command shell on the device itself and fire up the FOG task using the simple command fog. You need to step through the process by pressing the ENTER key. Go all the way through to where you have the blue partclone window with slow speed (possibly not the first partclone screen but one of the later ones with the biggest partition).
    • Go back to the SSH command shell and get another dmesg output: dmesg > dmesg_deploy1.txt
    • See if you can copy the new dmesg_deploy1.txt to your computer as well using WinSCP - reload file listing to see the newly created file.
    • At this stage we should have enough information and you can stop the task by shutting down the Lenovo T14 Gen 2 through a halt command on the SSH terminal and cancel it in the FOG web UI.

    Upload the text files here in the forums or to an external file share and post a link here.
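
    If you want a first look at the two files yourself, a minimal sketch like the one below can help. The function names are made up for illustration; e1000e is the driver named elsewhere in this thread, so pass a different pattern if your device uses another one. It only needs grep and can be run on any machine you copied the files to.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved dmesg snapshots.

# Print lines of a dmesg file that mention the NIC driver
# (default pattern: e1000e; pass another pattern as the second argument).
nic_lines() {
    file=$1
    pattern=${2:-e1000e}
    grep -i -- "$pattern" "$file"
}

# Print lines that appear in the later snapshot but not in the earlier one,
# i.e. kernel messages logged while the task was running.
new_messages() {
    before=$1
    after=$2
    grep -F -x -v -f "$before" "$after"
}

# Example:
#   new_messages dmesg_bootup.txt dmesg_deploy1.txt
#   nic_lines dmesg_deploy1.txt
```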


  • @sebastian-roth

    Okay, so doing a little more troubleshooting today.

    Plugged the T14 into a Lenovo USB-C docking station and presto! Now we’re getting good speeds again.

    I believe this would indicate some weirdness with the LAN card. Do you guys recommend any way of fixing that?

  • Senior Developer

    @george1421 @Tom-Elliott What do you think about the speed? Doesn’t look like an IO issue on the disk I would think.


  • @sebastian-roth
    Thank you Sebastian!

    I scoured the BIOS settings and did some googling, I don’t see any VMD settings on this BIOS.

    For some reason I’m having trouble uploading an image,
    But here’s a link to a picture of the screen:
    https://ibb.co/vV8N5Pg

    Regardless, the output is:
    4.0 GB copied in 2.23 s (1.9 GB/s)

    (I can type out all of the output if needed, but I figured that was the key data.)

  • Senior Developer

    @mmarquis said in Lenovo T14 Gen 2:

    Now it dropped back down to 15MB/min with a ~30+hr estimate.

    Hmm, anyway it was worth the try. What @george1421 was referring to (I think) is this post in the forums: https://forums.fogproject.org/post/141640

    You want to read all of this as they found this to be caused by some UEFI setting (“Storage Controller for VMD”). The USB key connected to the computer just kind of masked the issue but they were able to fix it by changing that particular UEFI setting. It could still be the same (VMD) thing in your case, only that the USB key is not masking the issue. So still check your settings to see if you can find it on the Lenovo T14 Gen 2 as well.

    Other than that you’ll need to do some debugging to find out if it’s disk or network IO causing the slowness. Disk IO is easier to test, so start with that. Cancel the current deploy task and schedule a new one for this same host, but this time mark the checkbox for debug in the web UI just before you click the “Create Task” button.

    Now boot the host machine up as usual and it should bring you to a command shell. Here you run lsblk to find out what the hard drive is named - could be /dev/sda or /dev/nvme.... Now with that run the following two commands:

    Warning: The dd command will write zeros to your disk and for that reason will ERASE any data on it. It won’t cause any harm to your disk but data will be wiped out!! Use with caution and make sure you think about it and understand before going ahead.

    hdparm -Tt /dev/sda
    dd if=/dev/zero of=/dev/sda bs=1G count=4 oflag=direct
    

    Take a picture of the output on screen (or note it down) and post here.


  • @sebastian-roth
    Thank you Sebastian!

    I did as you suggested.

    So, it seemed to have an effect. At first it was hitting 300+MB/min and my estimate was around 2 hours. Now it dropped back down to 15MB/min with a ~30+hr estimate. 😞

  • Senior Developer

    @mmarquis Try restarting the whole imaging with the USB key connected right from the early startup. So shut the host down hard and PXE boot into the existing task again.


  • @george1421
    Thank you for the response!

    Ha! That is strange. I think this little black bar popped up when I did that:
    “sd 0:0:0:0: [sda] No Caching mode page found”

    Have given it about 5 minutes, no change in speed.

  • Moderator

    @mmarquis This is going to sound strange, but insert a usb flash drive (any size) and see if imaging speed returns to normal.
