Deploy (Unicast): Wrong number of clients per node.
-
@mp12 This seems, to me, to be a timing issue. What happens if you let the clients check in slowly rather than having them all try to beat the clock?
-
@Tom-Elliott thanks for the tip.
Same setup as before:
node1 max clients: 2
node2 max clients: 2
number of deployed clients: 4
I started the deploys with a delay of 10 sec. between each client.
I tested it via Basic Tasks/Deploy and via the clients' boot menu/Quick Image. The clients got split correctly. So this seems to fix the problem?!
With a delay of 7 sec. the split got messed up.
-
@mp12 this gives me a starting point for fixing the problem you're seeing. I don't know why timing matters, but apparently it does.
I'll see if I can make things work properly, but dealing with a race that comes down to literally a few seconds is not a simple feat.
-
@Tom-Elliott Don’t let the clients start themselves - have the server do it.
-
@Wayne-Workman What?
-
@Tom-Elliott maybe a delay between the WOL packets would be enough? I think 10 sec. should do it, and it also protects the electrical fuse from blowing when all machines power on at once.
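A minimal sketch of the staggered wake-up idea, assuming the server sends standard Wake-on-LAN magic packets (6 bytes of 0xFF followed by the target MAC repeated 16 times) as a UDP broadcast to port 9. The function names and the 10-second default are hypothetical illustrations taken from the suggestion above, not FOG's own WOL code.

```python
import socket
import time


def magic_packet(mac):
    """Build a standard WOL magic packet: 6 x 0xFF + MAC repeated 16 times (102 bytes)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def udp_broadcast(packet, addr=("255.255.255.255", 9)):
    """Send one packet as a UDP broadcast (port 9 is a common WOL choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, addr)


def wake_staggered(macs, delay=10.0, send=udp_broadcast, sleep=time.sleep):
    """Wake hosts one at a time, waiting `delay` seconds between packets.

    `send` and `sleep` are injectable so the schedule can be tested
    without touching the network or actually waiting.
    """
    for i, mac in enumerate(macs):
        if i:  # no wait before the first host
            sleep(delay)
        send(magic_packet(mac))
```

Spacing the packets this way means each client boots, and checks in, roughly `delay` seconds after the previous one, which is the same effect as the manual 10-second test above. It only narrows the race; it does not remove it.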
-
@Tom-Elliott The only way a queue would accidentally overfill is if the clients check for themselves, see that a slot is open (when two check at once), and go ahead and start themselves, instead of the server evaluating each request and saying yes or no to it.
Sounds like it takes roughly 7 seconds from a client deciding it's good to go to the queue count being updated.
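To illustrate the difference described above, here is a minimal sketch (plain Python threads, not FOG code; `Node`, `racy_checkin` and `server_checkin` are hypothetical names). In the racy version every client reads the slot count before any client updates it, which models the delay between "I'm good to go" and the count changing. In the server version the check and the increment happen under one lock, so each request gets a single yes/no answer.

```python
import threading


class Node:
    """Hypothetical storage node with a per-node client limit."""

    def __init__(self, max_clients):
        self.max_clients = max_clients
        self.active = 0
        self.lock = threading.Lock()


def racy_checkin(node, barrier, results):
    # The client checks for itself...
    slot_open = node.active < node.max_clients
    # ...and every client finishes checking before anyone updates the count.
    barrier.wait()
    if slot_open:
        with node.lock:  # the increment is safe, but the decision was made on stale data
            node.active += 1
        results.append("started")
    else:
        results.append("queued")


def server_checkin(node, results):
    # The server decides: check and increment are one atomic step.
    with node.lock:
        if node.active < node.max_clients:
            node.active += 1
            decision = "started"
        else:
            decision = "queued"
    results.append(decision)


def run(mode, n_clients=4, max_clients=2):
    node = Node(max_clients)
    results = []
    barrier = threading.Barrier(n_clients)
    if mode == "racy":
        threads = [threading.Thread(target=racy_checkin, args=(node, barrier, results))
                   for _ in range(n_clients)]
    else:
        threads = [threading.Thread(target=server_checkin, args=(node, results))
                   for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count("started"), results.count("queued")


print(run("racy"))    # (4, 0): all four see an open slot and start
print(run("server"))  # (2, 2): two start, two are queued
```

With the same numbers as the test above (max 2 clients, 4 clients deployed), the racy check-in overfills the node, while the server-side decision keeps it at its limit.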
-
I believe this is now fixed, but confirmation would be good. I'm marking this as solved, but it can be unsolved again if it is still (or again) not working.
-
FOG r6439
Still having problems.
node1 max clients: 3
node2 max clients: 3
number of clients in the deployed group: 6
–> node1 deploys five clients.
–> node2 deploys zero clients.
–> one client is queued.
-
@mp12 What kind of tasks are you running? Are they ALL deploys?
-
I am running an instant group deploy (Basic Tasks). The group has six clients.
-
@mp12 They are not multicast?
-
No multicast!
Just a normal deploy "type=1".
-
@mp12 And you’re 100% sure you’re running 6439? I ask because I tested this same type of thing quite a lot yesterday.
Granted, that was with only two hosts, but I started both systems at the same time. One won the battle and the other was pushed to the back.
-
Yes, 100% sure. I upgraded as soon as I read your post from yesterday.
I will test with two and four clients tomorrow.
-
@mp12 do all 5 systems actually start receiving the image, or do only three receive the image, and the other 2 wait in line?
See, splitting between multiple nodes is not a straightforward thing. You can still queue many systems to receive an image.
At the time the systems are booting, they're not magically going to switch between different nodes. This is because the clients haven't started doing anything yet. From the client's perspective (when it's booting up), it sees the same node as the optimal node to use. The "load" isn't even calculated until the first system checks in. If 5 systems boot up and decide to use the same optimal node, there's nothing I can really do about it.
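A rough model of the explanation above, assuming (as described) that a booting client picks the "optimal" node from a load figure that is still zero because nobody has checked in yet. This is a hypothetical illustration, not FOG's actual node-selection code; `pick_optimal`, `assign_with_boot_snapshot` and `assign_at_checkin` are made-up names.

```python
def pick_optimal(loads):
    """Node with the fewest active clients; ties go to the first listed node."""
    return min(loads, key=lambda name: loads[name])


def assign_with_boot_snapshot(n_clients, nodes):
    # Every client decides at boot, when no load has been recorded yet,
    # so every client sees the same "optimal" node.
    snapshot = {name: 0 for name in nodes}
    choices = [pick_optimal(snapshot) for _ in range(n_clients)]
    return {name: choices.count(name) for name in nodes}


def assign_at_checkin(n_clients, nodes, max_per_node):
    # The decision is made per check-in against the current load,
    # and clients are queued once every node is full.
    loads = {name: 0 for name in nodes}
    queued = 0
    for _ in range(n_clients):
        candidates = {n: load for n, load in loads.items() if load < max_per_node}
        if candidates:
            loads[pick_optimal(candidates)] += 1
        else:
            queued += 1
    return loads, queued


print(assign_with_boot_snapshot(5, ["node1", "node2"]))  # {'node1': 5, 'node2': 0}
print(assign_at_checkin(6, ["node1", "node2"], 3))       # ({'node1': 3, 'node2': 3}, 0)
```

The first result matches the symptom in this thread (everything lands on node1); the second shows what a split would look like if the node were chosen only when each client checks in.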
-
@Tom-Elliott said:
@mp12 do all 5 systems actually start receiving the image, or do only three receive the image, and the other 2 wait in line?
They all received the image at the same time. Yesterday I ran 4 clients on node1. No split between the two nodes.
See, splitting between multiple nodes is not a straightforward thing. You can still queue many systems to receive an image.
I see splitting is not as easy as I thought.
I just remembered that in FOG 0.32 I never had problems with the splitting.
Now I am running another test with the same node config as before.
I started four group deploys (unicast), each group with two clients.
Between each group deploy there is a delay of a few minutes. I thought the clients would then be able to find a proper node.
In my “Active Tasks” I see eight tasks (six of them are running and two are queued). So far so good.
When I log into the shell of both nodes I see:
node1: six connected clients receiving images via NFS.
node2: zero clients.
-
@mp12 so both nodes are a part of the same storage group and contain the same images?
-
Yes they do.
When I start single deploys with one client and a delay of 10 seconds between them, everything works fine. Group deploys always end up with the splitting error.
-
@mp12 Is this actually causing a problem for you, or are you just trying to help make fog better?