Capture speed very fast, deploy starts at nearly 7mb/min and at 10% starts slowing down gradually to 3mb/min!! please help?




  • Developer

    @kevin-talbot said in Capture speed very fast, deploy starts at nearly 7mb/min and at 10% starts slowing down gradually to 3mb/min!! please help?:

    SA400S37

    So far I haven’t found anyone talking about speed issues with those drives anywhere on the net. But looking at the specs, I see that this drive uses TLC NAND flash.

    To quote someone from this Reddit post:

    There are different types of FLASH that differ vastly in reliability. Read up on SLC, MLC and TLC. Basically, the Pro series stores less data in each FLASH cell. The more data you cram into each FLASH cell, the less reliable the disk is.


  • Moderator

    @kevin-talbot What would be interesting to see is this…

    Use a disk benchmarking tool like CrystalDiskMark and compare the write performance between the two disks. The tool used doesn’t matter; you just need to use the same tool on both systems to get a relative index between the two drives. The most important number for FOG imaging is sequential writes.
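    If the client is booted from Linux live media, a rough stand-in for CrystalDiskMark is a sustained `dd` write. This is only a sketch: the `/mnt/testfile` path is a placeholder, and the 8 GB size is an assumption chosen to be larger than a typical budget SSD’s SLC write cache, so a mid-transfer throughput collapse (the pattern described in this thread) becomes visible.

    ```shell
    # Sustained sequential-write check; point /mnt/testfile at the SSD under test.
    # oflag=direct bypasses the page cache so the rate reflects the disk, not RAM;
    # status=progress prints the throughput live so you can watch for a drop-off.
    dd if=/dev/zero of=/mnt/testfile bs=1M count=8192 oflag=direct status=progress
    rm /mnt/testfile
    ```

    Run the identical command against both drive models and compare the sustained (not initial) rate.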



  • Ok everybody, I think I have nailed it. However, I don’t know if I can do anything about it!

    Turns out we changed the model of SSD we put into our oldest client machines (Dell Optiplex 380’s)

    I put a brand new (out of the packet) “OLD MODEL” SSD in the same client machine I’ve been testing, and it deployed in 4 minutes flat… compared to the new model SSD, which takes 15 mins!!!

    Question is, can I do anything about it, as the new model SSD is supposedly faster than the old model SSD?

    Both Kingston:

    Old model = SHFS37A/120G
    New model = SA400S37/120G

    Can anyone think of anything that can be done as we have around 150 of these new model SSD’s…grrrr ???


  • Developer

    @Kevin-Talbot I feel this will be an interesting one to figure out, as it could really be anything in the chain, I reckon. From the description so far it’s not possible to pin it down yet.

    The node server and client machine both have brand new SSD drives taken out of their packets today!

    Well, maybe you just got a bad new one there; you never know. I am not saying this is it, but as we don’t have an idea yet, we need to try it all out.

    Now have VM running Ubuntu server 16.04.3 and fog 1.4.4

    Maybe there is an issue with the VM. What kind of virtualization technology do you use?

    I’d start by taking a look at the network traffic to see if it actually slows down (because of lost packets or whatever) or if the connection breaks altogether. By the way, is this multicast or unicast? Have you tried both yet? Have tcpdump installed. While the job is running, issue tcpdump -w /tmp/speed.pcap host x.x.x.x (put in the client’s IP address to filter out just that traffic). Leave that running for 10 seconds and stop it with Ctrl+C. Now check the size of the file with ls -alh /tmp/speed.pcap; if it’s less than 1 MB you can easily capture again (overwrite the same file or rename it to keep a second one) for one or two minutes. If the first capture is bigger than 1 MB, just stop there for now. In any case, please upload those captures to your Dropbox/Google Drive and post a link here so we can have a look.
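    The capture steps above can be sketched as a short script; `x.x.x.x` stands for the client’s IP address, exactly as in the post (capturing typically needs root).

    ```shell
    # Capture ~10 seconds of traffic to/from the client while a deploy is running.
    tcpdump -w /tmp/speed.pcap host x.x.x.x &
    TCPDUMP_PID=$!
    sleep 10
    kill "$TCPDUMP_PID"       # same effect as stopping the foreground capture with Ctrl+C
    ls -alh /tmp/speed.pcap   # under 1 MB? capture again for one or two minutes
    ```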

    Other things you could test:

    • Try different client machines.
    • Try multicast vs. unicast.
    • Try mounting the NFS share and copying the image by hand (in case this is only happening in unicast) using rsync -a --progress <src> <dst> to get an idea of transfer speed.
    • Run memtest on client and server just to make sure.
    • Set things up on an isolated network switch, or even connect client and server with a single network cable (a crossover cable in the old days), if your server is set up to hand out DHCP to the client (if not, I’d set it up so you can run this test).
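    The manual-copy test from the list above might look like the following sketch. The server IP (`192.168.1.10`), export path (`/images`) and image name (`MyImage`) are placeholders; substitute your own FOG server details.

    ```shell
    # Mount the FOG image store over NFS and copy one image by hand,
    # watching whether the transfer rate decays the same way a deploy does.
    mkdir -p /mnt/fogimages
    mount -t nfs 192.168.1.10:/images /mnt/fogimages
    rsync -a --progress /mnt/fogimages/MyImage /tmp/MyImage
    umount /mnt/fogimages
    ```

    If the raw rsync copy holds a steady rate while a unicast deploy does not, the problem is more likely in the imaging chain than in the network or disks.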

  • Moderator

    @kevin-talbot Basic troubleshooting dictates that we first check if this problem affects other images. Do you have other images you can try to deploy to see if you get the same problem?



  • It’s the same issue as this: http://www.edugeek.net/forums/o-s-deployment/103004-tasks-slowing-down.html

    But I don’t know much about the init file?



  • @wayne-workman The old rig could deploy an image at a constant 5gb/min and that was an optiplex 360 with 2gb ram and a 1tb mechanical 7200 hard drive running ubuntu desktop 12 and fog1.2

    I think this all started when I started upgrading ubuntu and fog on the above old hardware to ubuntu 14 and fog 1.4.4

    I thought the 1tb hard drive had a bad block so thought I would start fresh, however the new rig I build today which is the latest ubuntu server 16.04.3 and fog 1.4.4 with all updates on much newer/faster hardware is deploying much much slower!

    I think this is a fog issue but can’t pin point it yet! I have seen mention of replacing the init image with a previous versions init file???



  • @sebastian-roth The node server and client machine both have brand new SSD drives taken out of their packets today!


  • Moderator

    I’ve seen this several times. When the write rate starts to slowly drop down to nothingness, that means the transfer has actually stopped; it stopped the moment the speed started gradually declining. The displayed write speed is just a running average, not the actual instantaneous rate.
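    The "gradual decline" follows directly from the arithmetic of a cumulative average: once the transfer stalls, the byte count stays fixed while elapsed time keeps growing, so the displayed rate decays smoothly toward zero even though nothing is moving. A quick illustration (the 7000 MB figure is made up for the example):

    ```shell
    # Suppose 7000 MB arrive in the first 10 minutes, then the transfer stalls.
    # The cumulative-average display still "declines gradually" afterwards:
    for t in 10 15 20 30; do
      awk -v t="$t" 'BEGIN { printf "t=%2d min  avg=%.0f MB/min\n", t, 7000 / t }'
    done
    ```

    This prints averages of 700, 467, 350 and 233 MB/min, the same smooth slide described in the thread, despite zero bytes moving after minute ten.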


  • Developer

    @Kevin-Talbot Maybe it is just the disk (client or server) having an issue. Have you tried different clients yet?



  • Hi Tom thanks for the reply!

    It’s a brand new node; I built it today and it’s been rebooted multiple times.

    I have rebuilt the whole thing from scratch as it was all on ageing hardware.

    Now have VM running Ubuntu server 16.04.3 and fog 1.4.4 all bang up to date. (database)
    Also Dell Optiplex 3020, i3 chip with 8gb ram, new SSD, running the same (node)

    I have tried different compression levels and types, different patch cables, rebooted the main switch, etc. Deployments start at different rates due to compression, but all end up coming down to 3gb/min, which is no good.

    On the old hardware I used to get constant gb/min speed for the duration of the deploy.

    I have now taken to plugging the node and host into a separate gig switch and isolating them once imaging has started, and it’s doing the same thing!

    Very frustrating, I must be missing something very simple somewhere?


  • Senior Developer

    Have you tried rebooting the FOG server? Any number of variables can cause slowdown too: hard drive fragmentation, hard drive errors, network issues, cable problems, RAM issues, etc. I’d start with the simplest; restart the FOG server and see if it helps clear the issue up for you.
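    A few quick server-side checks cover most of the failure modes listed above. This is a sketch; the device names (`/dev/sda`, `eth0`) and the `/images` store path are assumptions to adapt to your hardware, and `smartctl` comes from the smartmontools package.

    ```shell
    # Disk health: a failing drive often passes short reads but fails sustained writes.
    smartctl -H /dev/sda
    # NIC counters: nonzero RX/TX errors or drops point at cable/switch trouble.
    ip -s link show eth0
    # A nearly full image store can also stall deploys.
    df -h /images
    ```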



  • I thought that had cured it, as it started at over 8gb/min, but it’s now down to less than 3gb/min at 20%.

    Please help?



  • I am just trying Partclone Zstd on level 10 to see what happens! It was on the default Partclone Gzip level 6.



  • Sorry, yes, I meant Gb/min!

    Brand new Ubuntu Server 16.04.3 running fog 1.4.4

    i3 chip, 8gb ram and SSD

    40gb image used to deploy to one host in approx 6 min on the old version of fog.


  • Moderator

    7mb/min is like crippled turtle speed; I think you meant Gb/min. Also, you didn’t give us any of the details we always need, like FOG version and host model. What makes you think anything is even wrong? Does the capture complete?

