Display errors on queued computers (Unicast)

tian

There are some display errors (and/or bugs?) when the maximum allowed number of computers is deploying (Unicast) and further computers have been queued.

The first strange thing is that “Attempting to check in … Failed” is displayed every 5 seconds, each time the waiting time increases:
0_1458815654252_2016-03-24 Fog queue display error 1.jpg
In the scheduler log I can see this message every 5 seconds (the host names match the queued computers):

[03-24-16 11:26:26 am] * 6 active tasks awaiting check-in.
[03-24-16 11:26:26 am] | Sending WOL Packet(s)
[03-24-16 11:26:26 am] 		- Host: pc1 WOL sent to all macs associated
[03-24-16 11:26:27 am] 		- Host: pc2 WOL sent to all macs associated
[03-24-16 11:26:27 am] 		- Host: pc3 WOL sent to all macs associated
[03-24-16 11:26:27 am] 		- Host: pc4 WOL sent to all macs associated
[03-24-16 11:26:28 am] 		- Host: pc5 WOL sent to all macs associated
[03-24-16 11:26:28 am] 		- Host: pc6 WOL sent to all macs associated
[03-24-16 11:26:28 am] * No tasks found!

The second error is that all the queued computers display “No open slots, There are 0 before me.”

When a slot becomes free again, partclone begins to deploy the image to one (not all) of the queued computers.

Also, the minutes are displayed just as “i”:
1_1458815654268_2016-03-24 Fog queue display error 2.jpg

Please tell me if you need some more information.

Tom Elliott

As you’re on 6917 according to the picture, can you try updating and tell us if this is still happening?

tian

@Tom-Elliott It is the same (on the queued computers’ screens and in the scheduler log) in version 6929.

Tom Elliott

@tian What’s the node’s maxclients? What’s the group’s total slots? What’s the group’s available slots?

The node’s max clients can be found in storage management; the group’s total and available slots are found on the dashboard page. Change the disk usage selector to choose a node within the same group the host is on.

tian

@Tom-Elliott
What’s the node’s maxclients? -> 13
What’s the group’s total slots? -> ?
What’s the group’s available slots? -> ?

I don’t know how to get the last two values. This is what I can choose on the dashboard disk usage graph:
0_1458830004653_2016-03-24 fog dashboard.png
(We only have a single storage node/group, the one that was created by FOG by default.)

When I click on the disk usage graph, I only get the server’s hardware information.

Tom Elliott

@tian So you have a few other tasks already running. Those messages are correct then.

Tom Elliott

Basically, “No open slots” means all slots are taken up. “There are 0 before me” means that once a tasking completes, this host will start.

This timing is reset every 5 minutes or so, so you might sometimes see the info shift. It’s all based on timing and on how many clients are tasked and queued.
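As a rough illustration of the messages described above, here is a minimal Python sketch of how a per-host queue position could be derived. This is not FOG's actual code: the function name `queue_status`, the "starting" message, and the first-come, first-served ordering are assumptions made for the example. The values (13 max clients, six queued hosts pc1 to pc6) are the ones mentioned in this thread.

```python
def queue_status(max_clients, active, queued):
    """Return the message each queued host would see (hypothetical helper).

    max_clients: the node's maxclients setting
    active:      hosts currently occupying a slot
    queued:      waiting hosts, in the order they checked in
    """
    # Never report fewer than zero open slots.
    open_slots = max(max_clients - len(active), 0)
    messages = {}
    for position, host in enumerate(queued):
        if position < open_slots:
            # Hypothetical wording: this host can take a free slot right away.
            messages[host] = "Slot available, starting"
        else:
            # Only hosts that are still waiting ahead of this one are counted.
            before = position - open_slots
            messages[host] = f"No open slots, There are {before} before me."
    return messages


# All 13 slots are busy and six hosts are waiting.
active_hosts = [f"active{i}" for i in range(13)]
waiting_hosts = ["pc1", "pc2", "pc3", "pc4", "pc5", "pc6"]
for host, message in queue_status(13, active_hosts, waiting_hosts).items():
    print(host, "->", message)
```

Under these assumptions only pc1 would show "0 before me", and pc6 would show "5 before me".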

tian

@Tom-Elliott There is only one group deployment task running. I just wondered because I’m sure some time ago the output was different.

“There are 0 before me” is now displayed for every one of the 6 queued computers. On earlier FOG versions (like 0.32) there were also messages saying that 0, 1, 2, 3 … are in front of a computer. So there is no “fixed queue” anymore? (And the queued hosts are picked randomly now?)

Everything else is fine at the moment (pxe, partclone, renaming, active directory, snapins, …) so if this is the normal output now I just have to get used to it.

Tom Elliott

@tian Did all 6 show the same message in the first 5 minutes of it “waiting”?

Tom Elliott

I feel I need to explain a bit more, beyond the “0 before me” message.

The i is Linux/SQL shorthand for minutes (date(‘Y-m-d H:i:s’)).

This is because the m flag is usually taken by the “month” designator.

I could have the data show a TON more info for you (5 seconds), (2 minutes, 35 seconds), etc…

But that screws up the formatting of the line, so I chose to go with the shorthand designations as used by the date command. This is there solely to show you, in a somewhat nicer way, how long it’s been waiting. I thought about TLAs for this, but just went with the single letter for now.
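To make the two display styles discussed in this thread concrete, here is a small Python sketch that formats an elapsed waiting time in seconds. It is only an illustration: the function names `format_shorthand` and `format_colon` are hypothetical, the exact on-screen layout FOG uses is an assumption, and the single letters simply follow the date(‘Y-m-d H:i:s’) format string quoted above.

```python
def format_shorthand(seconds):
    """Format elapsed time with single-letter units (H, i, s)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        # "i" stands for minutes because "m" is taken by the month.
        parts.append(f"{minutes}i")
    parts.append(f"{secs}s")
    return " ".join(parts)


def format_colon(seconds):
    """Format elapsed time as H:MM:SS (zero padding is a choice made here)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_shorthand(155))  # 2i 35s
print(format_colon(503))      # 0:08:23
```

The shorthand form stays short for small values, while the colon form has a fixed width, which may matter for the line formatting concern mentioned above.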

Wayne Workman

@Tom-Elliott

A colon would look nicer.
0:5:35 left…

0:2:29 left…

Tom Elliott

@Wayne-Workman That’s not a timer showing how long you have until you get to image; it’s a timer showing how long the host has been waiting.

tian

@Tom-Elliott said:

@tian Did all 6 show the same message in the first 5 minutes of it “waiting”?

I just watched the queued computers for 10+ minutes. All of the queued computers show “There are 0 before me” from the very beginning. The number never changes to 1, 2, 3 or anything else.

Also thanks for the explanation for the time format.

Wayne Workman

@Tom-Elliott Well, still, it’d look nicer.

In line for 0:8:23

Tom Elliott

All should now be set up properly.

Please update and, if you can, test with multiple hosts and let me know.

tian

@Tom-Elliott I just did a test with version 6971. It looks different now, but it still doesn’t seem completely correct.
The number in “There are x before me” now counts up, but it is again the same number on all the queued computers:

With four computers waiting, all of them display “3”.
With five computers waiting, all of them display “4”.
In the end, all six queued computers display “There are 5 before me”.
I waited 10+ minutes again, but the computers never displayed different numbers.

Here you can see the change of the number (that takes place on all computers) when one more computer is waiting:
0_1459244785571_2016-03-29 Fog queue display 3.jpg

Tom Elliott

@tian Can you update and try again? I readjusted how the counting works, in the hope that it will behave a little better.

Cpasjuste

Hi,

Just a short message to say that I have the same problem with SVN 6975: I get an “attempting to check in” loop when trying to restore an image. It was working fine before I updated (I don’t remember which version I was on previously, but it was SVN 69xx).

Tom Elliott

@Cpasjuste What do you mean? Pictures please if possible.

Cpasjuste

@Tom-Elliott Sure, here it is! I create a basic task (deploy), and then it just loops on “attempting to check in”. This task worked fine (same machine/image) a few hours ago, just before I updated to the latest git (svn).

