Multicast very slow
-
@plegrand OK very good. This tells us that your network between the fog server and the target computer is good. You are getting ~28MB/s transfer rate. That’s a bit of a lie, because your 100Mb/s network can only transfer about 12MB/s. The rate shown is the rate at which partclone can expand the image onto the disk of the target computer. But if you can’t feed partclone fast enough, that rate will drop off quickly. If the image expansion rate is faster than the network speed, your network rate is good.
Now can you run the same test with a computer located on the switch far away from the FOG server? In your drawing, the switch I want to test is the one on the left. This test will cover the full network path between the fog server and the target PC.
One question I didn’t ask, how many target computers are in your multicast session?
-
@george1421
To answer your last question, it was a session with 15 target computers, and these target computers were very near the fog server. On my drawing it’s the switch just below the fog server.
-
@plegrand Well that makes me think a little differently if they are that near to your fog server.
OK, let’s do a new test for multicast. Let’s test 2 computers and then 3 computers. These computers must be the same model as the one used in your first test, so that we remove the variable of different models.
What I expect to see for 2 computers is about the same speed as 1 computer in the multicast. For 3 computers, slightly less than 2 computers. What I want to find out is at what point we go from an acceptable speed to a bad one.
I can tell you with unicasting on a pure 1GbE network you can fill up a 1GbE link with 3 simultaneous unicast deployments of FOG with modern target computers. That is a concern for the link to the fog server and switch to switch links mainly. Under a multicast you should get just slightly less than unicast speeds.
-
@plegrand Also, I just looked into these switches thinking they were old. They are not (at least performance wise); they have better capacity than the Cisco small business SG300 switch.
So I have to ask you, why are you running 100Mb/s to the desktops?
Do you have the capabilities to run a second network wire from the switch where the fog server is to the switch where the target computers are? (I’ll explain more if you can).
-
@george1421
With a target computer far from the fog server it seems to work as well (screenshot: multicast-205.jpg).
-
@george1421
I’m not sure I understand correctly.
Actually, my diagram is not really accurate.
The fog server is on a Gb port.
All desktop clients are on 100Mb ports.
I can’t do anything else, there are not enough Gb ports.
-
@george1421 I don’t have enough time for these tests at the moment, but I’ll do them in the near future.
-
@george1421 said in Multicast very slow:
Do you have the capabilities to run a second network wire from the switch where the fog server is to the switch where the target computers are? (I’ll explain more if you can).
That is pretty much already the case.
The switch (OS6250) where all my target computers are connected is just below the switch where the fog server is (OS6450),
and the two are directly connected by a 1Gb link.
So only the target computers are on 100Mb links.
-
@plegrand said in Multicast very slow:
I can’t do anything else, there are not enough Gb ports.
When I looked up the switch configuration it said that all ports are 10/100/1000 rated. Maybe I did not understand.
If the OS6250 switch is only 100Mb/s, then I understand why you have 100Mb/s to the desktops.
OK on delaying the test. There is a bottleneck someplace we just need to find it.
-
@george1421
The 6450 has all of its ports at 10/100/1000.
The 6250 does not; only its uplink ports are 1000.
-
@george1421 said in Multicast very slow:
Under a multicast you should get just slightly less than unicast speeds.
So what is the benefit of multicast if it’s not faster, apart from not flooding the network?
I’m going to run some tests this morning with 2, 3 and 4 target computers.
I will tell you the results.
-
@george1421
I ran some tests:
Multicast session (test) with 2 target computers: about 1.7GB/min rate
Multicast session (test) with 3 target computers: about 1.7GB/min rate
Multicast session (test) with 4 target computers:
The first 2 target computers didn’t start and I can’t understand why.
So I removed these two computers from the session (test)
and created a new multicast session (test2) for them.
Then there were 2 sessions running (test and test2), and all the target computers had about a 1.15GB/min rate.
All the computers have 2 partitions and strangely the first partition is slower than the second: about 700MB/min rate
-
@plegrand I’d say keep on doing more tests. If you see some machines not joining the session then you might want to cancel it and restart the FOG server just to have a clean test setup on every run.
All the computers have 2 partitions and strangely the first partition is slower than the second : about 700MB/min rate
From my point of view this is another hint that it’s not so much a network/switch issue but more due to the clients actually deploying the data to disk. There might be some clients with a dying disk that is causing the slowdown for all clients in one session. You know that with multicast the slowest link of the chain dictates the overall speed! While it causes way less network traffic than unicast for a group of computers, it has the caveat of being regulated by the slowest part speed-wise.
-
@plegrand said in Multicast very slow:
So what is the benefit of multicast if it’s not faster, apart from not flooding the network?
The answer is simple. Let’s say you want to send the same image to 5 computers and your base image is 20GB in size.
With multicast we send out one 20GB image, whether it goes to 5, 20 or 100 systems. With unicast, for those 5 systems you would have to transmit 100GB worth of data over your network. So from a network load standpoint you will get less network impact with multicast.
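The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the multicast figure ignores any retransmitted blocks.

```python
def unicast_traffic_gb(image_gb: float, clients: int) -> float:
    """Unicast: every client receives its own copy of the image."""
    return image_gb * clients


def multicast_traffic_gb(image_gb: float, clients: int) -> float:
    """Multicast: one stream is shared by all clients (retransmissions ignored)."""
    return image_gb if clients > 0 else 0.0


IMAGE_GB = 20  # base image size used in the post above

for n in (5, 20, 100):
    uni = unicast_traffic_gb(IMAGE_GB, n)
    multi = multicast_traffic_gb(IMAGE_GB, n)
    print(f"{n:>3} clients: unicast {uni:.0f} GB, multicast {multi:.0f} GB")
```

For 5 clients this gives 100GB for unicast against 20GB for multicast, and the multicast number stays at 20GB as the client count grows.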
-
@plegrand said in Multicast very slow:
Then there is 2 sessions running (test and test2) and all the target computers have about 1.15GB/min rate
Just so I’m clear on this. When you were able to get 4 computers imaging your transfer rate was 1.15GB/min? That’s still 19MB/sec. You are still above the 100Mb/s theoretical limit.
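As a rough check of those numbers, here is a small Python sketch of the unit conversion. It assumes decimal units (1GB = 1000MB) and ignores protocol overhead; the helper names are hypothetical.

```python
def gb_per_min_to_mb_per_s(gb_per_min: float) -> float:
    """Convert a GB/min figure into MB/s (decimal units: 1 GB = 1000 MB)."""
    return gb_per_min * 1000 / 60


def link_ceiling_mb_per_s(link_mbit_per_s: float) -> float:
    """Theoretical ceiling of a link in MB/s: 8 bits per byte, no overhead."""
    return link_mbit_per_s / 8


rate = gb_per_min_to_mb_per_s(1.15)   # rate reported for the 4-computer test
ceiling = link_ceiling_mb_per_s(100)  # 100Mb/s port on the target computers

print(f"reported rate: {rate:.1f} MB/s")
print(f"100Mb/s link ceiling: {ceiling:.1f} MB/s")
```

The reported figure comes out higher than the link ceiling, which fits the earlier explanation that the displayed rate is the speed at which partclone expands the image onto disk, not the raw network rate.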
All the computers have 2 partitions and strangely the first partition is slower than the second : about 700MB/min rate
I can explain this. The number is based on an incorrect calculation. The issue is that the first partition is pretty small, like 500MB. It transfers so fast that the speed numbers get skewed. The second partition typically holds the contents of the drive. You can see this if you look at the disk manager in Windows. Look at the size of the first partition.
I might need to explain how image multicasting works. There is one computer (the FOG server) that is sending the image out. As each multicast client boots up, it checks in with the multicast sender through a discovery process. The FOG server configures the multicast sender service to wait for X number of clients to check in before going, or, after the first client checks in, to wait for XX seconds before going even if not all have checked in. Once the multicast stream starts, no late clients can check in (they are blocked).
In the image stream the FOG server sends out the first block of data, then stops. It waits for every multicast receiver (target computer) to respond with “OK!”. The FOG server will not send the next block until it hears “OK!” from every client. If something happens and one client didn’t get the block correctly, it will send “Retrans” back to the FOG server, and the FOG server will resend that block to the client computer (while the others sit and wait until everyone replies with “OK!”).
This is why we say multicasting can only go as fast as the slowest computer in the multicast stream. Consider you have 4 8-core desktops, all with SSD drives, and one system with a Pentium 4 and a slow HDD. If you imaged them all in one stream, the 8-core systems would image at the rate the Pentium 4 system can write data to its slow HDD. If you have a system with a failing hard drive, and the checksum of the block transferred to it doesn’t match the checksum of the block on the disk, it will send a “Retrans” command back to the FOG server while the other clients wait. The point is, when everything works it works well; when you have one bad actor, everyone suffers.
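To make the "slowest computer sets the pace" point concrete, here is a deliberately simplified Python model of the lock-step behaviour described above. It is not the real multicast protocol: the block size, the client rates and the function name are all hypothetical, and the network itself is assumed not to be the bottleneck.

```python
import math


def multicast_time_s(image_mb, block_mb, client_rates_mb_s, retrans_blocks=0):
    """Estimate session time when each block waits for every client's "OK!".

    client_rates_mb_s: rate at which each client can take in and write data.
    retrans_blocks: number of blocks that had to be sent a second time.
    """
    slowest = min(client_rates_mb_s)           # everyone waits for this client
    blocks = math.ceil(image_mb / block_mb)    # blocks needed for the image
    return (blocks + retrans_blocks) * block_mb / slowest


IMAGE_MB = 20_000  # 20GB image (decimal units)
BLOCK_MB = 10      # hypothetical block size

fast_only = multicast_time_s(IMAGE_MB, BLOCK_MB, [200, 200, 200, 200])
with_slow = multicast_time_s(IMAGE_MB, BLOCK_MB, [200, 200, 200, 200, 20])
with_bad_disk = multicast_time_s(IMAGE_MB, BLOCK_MB, [200, 200, 200, 200, 20],
                                 retrans_blocks=100)

print(f"4 fast clients:           {fast_only:.0f} s")
print(f"plus 1 slow client:       {with_slow:.0f} s")
print(f"plus 100 retransmissions: {with_bad_disk:.0f} s")
```

In this model, adding one client that writes ten times slower makes the whole session take ten times longer, and every retransmitted block adds more waiting time for all of the clients.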