Slowdown Unicast and Multicast after upgrading FOG Server
-
Here is the output:
/dev/sda:

ATA device, with non-removable media
    Model Number:       Samsung SSD 860 EVO 500GB
    Serial Number:      S3Z2NB1KA50028H
    Firmware Revision:  RVT01B6Q
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
    Used: unknown (minor revision code 0x005e)
    Supported: 11 8 7 6 5
    Likely used: 11
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:    16514064
    LBA    user addressable sectors:   268435455
    LBA48  user addressable sectors:   976773168
    Logical  Sector size:                   512 bytes
    Physical Sector size:                   512 bytes
    Logical Sector-0 offset:                  0 bytes
    device size with M = 1024*1024:      476940 MBytes
    device size with M = 1000*1000:      500107 MBytes (500 GB)
    cache/buffer size  = unknown
    Form Factor: 2.5 inch
    Nominal Media Rotation Rate: Solid State Device
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 1  Current = 1
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
            SET_MAX security extension
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
            Write-Read-Verify feature set
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Gen3 signaling speed (6.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Phy event counters
       *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
       *    DMA Setup Auto-Activate optimization
            Device-initiated interface power management
       *    Asynchronous notification (eg. media change)
       *    Software settings preservation
            Device Sleep (DEVSLP)
       *    SMART Command Transport (SCT) feature set
       *    SCT Write Same (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
       *    reserved 69[4]
       *    DOWNLOAD MICROCODE DMA command
       *    SET MAX SETPASSWORD/UNLOCK DMA commands
       *    WRITE BUFFER DMA command
       *    READ BUFFER DMA command
       *    Data Set Management TRIM supported (limit 8 blocks)
       *    Deterministic read ZEROs after TRIM
Security:
    Master password revision code = 65534
            supported
    not     enabled
    not     locked
            frozen
    not     expired: security count
            supported: enhanced erase
    4min for SECURITY ERASE UNIT. 8min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538e408a5e55
    NAA             : 5
    IEEE OUI        : 002538
    Unique ID       : e408a5e55
Device Sleep:
    DEVSLP Exit Timeout (DETO): 50 ms (drive)
    Minimum DEVSLP Assertion Time (MDAT): 30 ms (drive)
Checksum: correct
-
@mp12 I assume you ran the last hdparm from a debug console on the target computer. If so, let’s run this one too:
hdparm -Tt /dev/sda
That should give us the disk performance test. I’m not totally convinced it’s a target computer issue, but we need to start collecting data where we can.
-
/dev/sda:
 Timing cached reads:   29692 MB in  1.99 seconds = 14911.18 MB/sec
 Timing buffered disk reads: 1614 MB in  3.00 seconds = 537.87 MB/sec
-
@mp12 Read performance is what I would expect from an SSD drive.
Next we will test write performance. For this we need to collect the structure of the existing SSD drive. What we need to find is a partition that has at least 1GB of free disk space.
Show me the output from this command:
lsblk
(executed on the target computer in a debug console)
NOTE: The document I’m working from is referenced here: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast
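A minimal sketch of what such a write test could look like once a suitable partition is mounted. The helper name `write_test`, the file name `fog_write_test.img`, and the use of `conv=fsync` are assumptions made for illustration and are not taken from the referenced document; writing zeros only approximates real imaging traffic.

```shell
#!/bin/sh
# Sketch of a sequential write test with dd (hypothetical helper, not part of FOG).
# Usage: write_test DIRECTORY [SIZE_MB]
write_test() {
    target_dir=$1
    size_mb=${2:-1024}                       # default: 1 GB of test data
    test_file="$target_dir/fog_write_test.img"

    # conv=fsync makes dd flush the data to disk before it reports the rate,
    # so the page cache does not inflate the number.
    # dd prints its statistics on stderr; keep only the final summary line.
    dd if=/dev/zero of="$test_file" bs=1M count="$size_mb" conv=fsync 2>&1 | tail -n 1

    # Remove the test file again so the partition is left as it was.
    rm -f "$test_file"
}
```

With the chosen partition mounted on, say, /mnt (an assumed mount point), `write_test /mnt` prints dd's summary line, which includes the measured write rate.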
-
@george1421 He did some write tests earlier and got around 100 MB/s from RAM to disk using dd.
-
@Quazz Interesting, on the previous dd test. I would like to see this dd test, and then the next step is an iperf test. That will test the local disk and then the network without involving the NFS stack or partclone. At least in my mind, that is how I would break it down. Something had to have changed besides FOG.
-
Server
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from x.x.x.x, port 50672
[  5] local x.x.x.x port 5201 connected to x.x.x.x port 50674
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   108 MBytes   903 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   942 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   942 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   942 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   942 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   942 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   942 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   942 Mbits/sec
[  5]  10.00-10.04  sec  4.35 MBytes   937 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.04  sec  1.10 GBytes   939 Mbits/sec   11             sender
[  5]   0.00-10.04  sec  1.10 GBytes   938 Mbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Client
Connecting to host x.x.x.x, port 5201
[  5] local x.x.x.x port 50674 connected to x.x.x.x port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   947 Mbits/sec    3    258 KBytes
[  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    364 KBytes
[  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    2    232 KBytes
[  5]   3.00-4.00   sec   112 MBytes   943 Mbits/sec    1    318 KBytes
[  5]   4.00-5.00   sec   112 MBytes   943 Mbits/sec    2    211 KBytes
[  5]   5.00-6.00   sec   112 MBytes   943 Mbits/sec    0    364 KBytes
[  5]   6.00-7.00   sec   112 MBytes   943 Mbits/sec    1    267 KBytes
[  5]   7.00-8.00   sec   112 MBytes   943 Mbits/sec    1    364 KBytes
[  5]   8.00-9.00   sec   112 MBytes   943 Mbits/sec    0    366 KBytes
[  5]   9.00-10.00  sec   112 MBytes   943 Mbits/sec    1    282 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   11             sender
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  receiver

iperf Done.
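To compare the network figures with disk rates, which are usually quoted in MB/s, the iperf summary lines can be converted by dividing the Mbits/sec value by 8. The following is only a sketch: the helper name `iperf_summary_mbytes` is made up for illustration, and it assumes the summary lines look like the ones above (a value followed by `Mbits/sec`, with `sender` or `receiver` as the last word).

```shell
#!/bin/sh
# Convert the Mbits/sec figure of iperf3 summary lines to MB/s (hypothetical helper).
# Reads iperf3 output on stdin and prints one line per sender/receiver summary.
iperf_summary_mbytes() {
    awk '/sender|receiver/ {
        for (i = 2; i <= NF; i++)
            if ($i == "Mbits/sec")
                # 8 bits per byte: Mbits/sec divided by 8 gives MB/s
                printf "%s %.3f MB/s\n", $NF, $(i - 1) / 8
    }'
}

# Example with the client-side sender summary line from above:
printf '%s\n' '[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   11             sender' |
    iperf_summary_mbytes
# prints: sender 117.875 MB/s
```

So the link itself moves roughly 118 MB/s, which is close to what a gigabit connection can carry.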
-
@mp12 I found somewhat similar situations on Google where the conclusion was to use Secure Erase on the drive first. Worth a try just to see if it helps?
-
Did my best. After trying I noticed that the “frozen state” was active. Removing the SSD power cable while the PC was powered on cleared the frozen state. Then I was able to do a secure erase. A dd from ramdisk to /dev/sda ran at the same speed as before. So no luck at all.
-
@Sebastian-Roth @Quazz @george1421
I have some good and bad news.
First the good news:
I created a deploy using the FOG wiki: https://wiki.fogproject.org/wiki/index.php/Debug_Mode#Win_7
For that I booted a Clonezilla (2.6.4-10-amd64) flash drive and mounted the NFS share from the FOG server. I started the deploy with the following command:
cat /images/IMAGEPATH/d1p2.img | zstd -d | sudo partclone.restore -O /dev/sda2 -N -f -i
I also tried a deploy with the FOG client and still get one-third of the expected speed.
I think there is something wrong in the deploy process.
The main difference I can see is that FOS uses Partclone 0.3.12 and Clonezilla uses 0.3.13.
-
@mp12 I vaguely recalled someone having this problem before with similar outcomes.
Linking here for further reference: https://forums.fogproject.org/topic/13733/hp-elitebook-830-gen-6-issues-capturing-images-and-deploying-images/10
Our kernels and inits have since received a few upgrades, however.
Are you on the dev-branch, by the way? I don’t believe 1.5.7 was launched with partclone 0.3.12 (that’s for the upcoming release).
If not, then try the init and kernel files from https://dev.fogproject.org/blue/organizations/jenkins/fos/detail/master/122/artifacts
-
@Quazz said in Slowdown Unicast and Multicast after upgrading FOG Server:
Are you on the dev-branch
Yes I am on the dev-branch 1.5.7.109 (bzImage Version: 4.19.101)
-
@Quazz said in Slowdown Unicast and Multicast after upgrading FOG Server:
I vaguely recalled someone having this problem before with similar outcomes.
Linking here for further reference: https://forums.fogproject.org/topic/13733/hp-elitebook-830-gen-6-issues-capturing-images-and-deploying-images/10
Tried the bzImage529 and checked that the RAMDISK size is correct.
I also tried the following kernels, which all end in a kernel panic:
4.13.4, 4.11.1, 4.10.1, 4.9.11 and 4.8.11
Other kernels, starting with 4.15.2 and above, seem to work but not with sufficient speed.
-
@mp12 said in Slowdown Unicast and Multicast after upgrading FOG Server:
For that I booted a Clonezilla (2.6.4-10-amd64) flash drive
From what you wrote so far I would expect the kernel to make the difference. What kernel is in the Clonezilla you used? Boot to a command shell and run:
uname -a
-
Linux debian 5.3.0-1-amd64 #1 SMP Debian 5.3.7-1 (2019-10-19) x86_64 GNU/Linux
-
@Sebastian-Roth I mean, the strange part is that the older versions gave him proper speed too. And it’s not like this is a universal problem since others have not experienced such a large performance difference between FOG versions.
So I guess it’s time for more information to try and pin down the source of it all.
@mp12 Can you post a full spec list of a troubled machine? (or perhaps even two or three different ones if you have that)
It almost assuredly has to be some kind of interaction between certain kinds of hardware and the Linux kernel (and its config), so we have to try and narrow it down or at least get a clearer picture of what we’re dealing with.
-
@Quazz Would the complete FOG inventory for this system give us enough data, or do we need to dig deeper? If it doesn’t give us all of the data, at least it would be a start. What would be grand is if the OP had 2 systems on his campus where one worked correctly and the other is slow. Then we could compare and contrast these two systems.
@Sebastian-Roth Is there a place where we can still download the 1.5.5 or 1.5.6 binaries zip file? I’m wondering whether replacing the current bzImage and init.xz with the ones from 1.5.5 or 1.5.6 changes the performance of these systems. While I highly doubt it’s the FOG server, this would at least isolate the issue to the FOS Linux install (unless that is our conclusion already).
-
@george1421 Good point on trying the older binaries. Though I’d expect that you get issues going back to very early binary versions like 1.5.3… They are all available on the fogproject.org website:
https://fogproject.org/binaries1.5.6.zip
https://fogproject.org/binaries1.5.5.zip
and so on all the way to 1.3.0…
-
@Sebastian-Roth Yes, going back to the 1.5.3 version of FOS (at least for the inits) would be a good before-and-after test of whether the upgrades caused this slowness. All the OP needs to do is download and extract the init.xz file from the zip file and move it to the FOG server to test.
-
@george1421 @mp12 Use init and kernel binaries from the same archive, as you will run into kernel panics quite easily if you do otherwise.
@mp12 I am wondering if you see the same slowness on many different types of hardware or if it’s all machines with the Samsung SSD 860 EVO 500GB??
EDIT: Reading through the whole topic again I stumbled upon this:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   11             sender
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  receiver
Eleven retransmits (the Retr column) during a 10 second test seems like a lot to me. So we might be looking at a combination of issues here.
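The total can be cross-checked against the per-second lines of the client output. A small sketch, assuming the iperf3 client layout shown earlier in the thread, where the Retr value directly follows `Mbits/sec`; the helper name `sum_retransmits` is made up for illustration:

```shell
#!/bin/sh
# Sum the per-interval Retr column of iperf3 client output (hypothetical helper).
# Reads iperf3 output on stdin; the sender/receiver summary lines are skipped
# so the total is not counted twice.
sum_retransmits() {
    awk '!/sender|receiver/ {
        for (i = 1; i < NF; i++)
            if ($i == "Mbits/sec" && $(i + 1) ~ /^[0-9]+$/)
                total += $(i + 1)
    }
    END { print total + 0 }'
}

# Example with the first three client intervals from the earlier post:
sum_retransmits <<'EOF'
[  5]   0.00-1.00   sec   113 MBytes   947 Mbits/sec    3    258 KBytes
[  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0    364 KBytes
[  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    2    232 KBytes
EOF
# prints: 5
```

Fed the full client output, the ten interval values (3, 0, 2, 1, 2, 0, 1, 1, 0, 1) add up to the 11 shown in the summary, spread across the whole run rather than concentrated in one burst.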