Deploy (Unicast): Wrong number of client per node.

Tom Elliott

@mp12 Is this still a problem?

I updated the code base for the getUsed/getQueued hosts. I’m not sure it will do anything, but a status update would still be nice if at all possible.

mp12

@Tom-Elliott upgraded to r6285.

No effect here. Started task with four clients.
Max client on both nodes is set to two.

Only node1 processes the four image tasks.

Tom Elliott

@mp12 This seems, to me, to be a timing issue. What happens if you slowly let the clients check in rather than all try to beat the clock?

mp12

@Tom-Elliott thanks for the tip.

Same setup as before.

node1 max client 2
node2 max client 2
number of deployed clients 4

Started the deploys with a delay of 10 sec. between each client.
I tested it via basic tasks/deploy and with the clients boot menu/quick image.

The clients got split correctly. This seems to fix the problem?!

Tested a delay of 7 sec. and the split got messed up.

Tom Elliott

@mp12 this gives me a starting point on what I can do to try to fix the problem you’re seeing. I don’t know why time matters but apparently it does.

I’ll see if I can make things work properly but trying to race between quite literally seconds is not a simple feat.

Wayne Workman

@Tom-Elliott Don’t let the clients start themselves - have the server do it.

Tom Elliott

@Wayne-Workman What?

mp12

@Tom-Elliott maybe a delay between the WOL packets would be enough? I think 10 sec. should do, and it would also protect the electrical fuse against overload when all clients power on at once.
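To make the staggered wake-up idea concrete, here is a minimal sketch in Python. It is not how FOG sends WOL; the function names, the broadcast address and the port are assumptions chosen for illustration, and the MAC addresses in the usage note are placeholders. The 10 second default only mirrors the delay that worked in the test above.

```python
import socket
import time


def build_magic_packet(mac):
    """Build a standard WOL magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def send_broadcast(packet, address="255.255.255.255", port=9):
    """Send one packet as a UDP broadcast (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (address, port))


def wake_staggered(macs, delay_seconds=10.0, send=None, sleep=time.sleep):
    """Wake hosts one by one, pausing between packets (not after the last)."""
    send = send or send_broadcast
    for index, mac in enumerate(macs):
        if index > 0:
            sleep(delay_seconds)
        send(build_magic_packet(mac))
```

Calling `wake_staggered(["00:11:22:33:44:55", "66:77:88:99:aa:bb"])` would wake the second host 10 seconds after the first. The `send` and `sleep` parameters exist only so the pacing can be tried without touching the network.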

Wayne Workman

@Tom-Elliott The only way that a queue would accidentally overfill is if the clients were checking for themselves and then seeing that a slot is open (when two check at once) and going ahead and starting themselves - instead of the server evaluating each request and saying yes or no to it.

Sounds like it takes roughly 7 seconds from a client thinking it’s good to go - to the queue count being updated.
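A rough sketch of what "the server says yes or no" could look like, in Python. This is only an illustration under assumptions, not FOG's code: `SlotAllocator`, `claim` and `simulate` are hypothetical names, and picking the least-used node is an assumed policy. The point is that the check for a free slot and the claim of that slot happen under one lock, so two clients arriving together cannot both take the last slot.

```python
import threading


class SlotAllocator:
    """Hypothetical server-side allocator: check and claim are one atomic step."""

    def __init__(self, max_clients):
        # max_clients maps node name -> maximum simultaneous clients.
        self._max = dict(max_clients)
        self._used = {name: 0 for name in max_clients}
        self._lock = threading.Lock()

    def claim(self):
        """Return the node a client may use, or None if it has to queue."""
        with self._lock:
            free = [n for n in self._max if self._used[n] < self._max[n]]
            if not free:
                return None
            # Assumed policy: least-used node first, ties go to the first node.
            node = min(free, key=lambda n: self._used[n])
            self._used[node] += 1
            return node

    def usage(self):
        with self._lock:
            return dict(self._used)


def simulate(max_clients, n_clients):
    """Let n_clients ask for a slot from separate threads at the same time."""
    allocator = SlotAllocator(max_clients)
    results = []
    results_lock = threading.Lock()

    def client():
        node = allocator.claim()
        with results_lock:
            results.append(node)

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return allocator.usage(), results
```

With `simulate({"node1": 2, "node2": 2}, 5)` no node ever goes over its limit, and the fifth client gets `None` and would have to queue, no matter how close together the requests arrive.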

Tom Elliott

I believe this is now fixed, but confirmation would be good. I’m marking this as solved, but it can be unsolved again if this is still not working.

mp12

@Tom-Elliott

FOG r6439

Still having problems.

node1 max client: 3
node2 max client: 3
number of clients in the deployed group: 6

–> node1 deploys five clients.
–> node2 deploys zero clients.
–> one client is queued.

Tom Elliott

@mp12 What kind of tasks are you running? Are they ALL deploys?

mp12

@Tom-Elliott

I am running an instant group deploy (basic tasks). The group has six clients.

Tom Elliott

@mp12 They are not multicast?

mp12

@Tom-Elliott

No multicast!
Just a normal deploy “type=1”

Tom Elliott

@mp12 And you’re 100% sure you’re running 6439? I ask because I tested this same type of thing quite a lot yesterday.

Granted with only two hosts, but I started both systems at the same time. One won in the battle and the other was pushed to the back.

mp12

@Tom-Elliott

Yes 100% sure. Just did an upgrade when I read your post from yesterday.
I will test with two and four clients tomorrow.

Tom Elliott

@mp12 do all 5 systems actually start receiving the image, or do only three receive the image, and the other 2 wait in line?

See, splitting between multiple nodes is not a straightforward thing. You can still queue many systems to receive an image.

At the time the systems are booting, they’re not magically going to switch between using different nodes. The reason is that the clients haven’t started doing anything yet. From the client’s perspective (when it’s booting up) it sees the same node as the optimal node it needs to use. The “load” isn’t even calculated until the first system checks in. If 5 systems boot up and decide to use the same optimal node, there’s nothing I can really do about it.
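As a toy illustration of that effect, the following Python sketch compares clients that all decide from the same "nobody has checked in yet" snapshot with clients that check in one after another. It is not FOG's node selection code; `pick_node` and its "lowest load relative to max clients" rule are assumptions made for the example.

```python
def pick_node(load, max_clients):
    """Assumed rule: lowest load relative to max clients, ties go to the first node."""
    return min(load, key=lambda node: load[node] / max_clients[node])


def boot_all_at_once(n_clients, max_clients):
    """Every client decides from the same snapshot, taken before anyone checked in."""
    snapshot = {node: 0 for node in max_clients}
    result = {node: 0 for node in max_clients}
    for _ in range(n_clients):
        result[pick_node(snapshot, max_clients)] += 1
    return result


def boot_one_by_one(n_clients, max_clients):
    """Each client checks in, and is counted, before the next one decides."""
    load = {node: 0 for node in max_clients}
    for _ in range(n_clients):
        load[pick_node(load, max_clients)] += 1
    return load
```

With two nodes of max 3 clients each and 5 clients, `boot_all_at_once` puts all five on node1, while `boot_one_by_one` ends with three on node1 and two on node2.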

mp12

@Tom-Elliott said:

“@mp12 do all 5 systems actually start receiving the image, or do only three receive the image, and the other 2 wait in line?”

They all received the image at the same time. Yesterday I ran 4 clients on node1. No split between both nodes.

“See, splitting between multiple nodes is not a straightforward thing. You can still queue many systems to receive an image.”

I see, splitting is not as easy as I thought.
I just remembered that in FOG 0.32 I never had problems with the splitting.

Now I am running another test with the same node config as before.

Started four group deploys (unicast) where each group has two clients.
Between each group deploy there is a delay of a few minutes. I thought this would let the clients find a proper node.
In my “Active Tasks” I see eight tasks (six of them are running and two are queued). So far so good.

When I log into the shell of both nodes I see:

node1: six connected clients receiving images via nfs.
node2: zero clients.

Tom Elliott

@mp12 so both nodes are a part of the same storage group and contain the same images?
