Display errors on queued computers (Unicast)



  • There are some display errors (and/or bugs?) when the max. allowed number of computers is deploying (Unicast) and some computers have been queued.

    The first strange thing is that “Attempting to check in … Failed” is always displayed every 5 seconds, whenever the displayed time increases:
    0_1458815654252_2016-03-24 Fog queue display error 1.jpg
    In the scheduler log I can see this message every 5 seconds (the host names match the queued computers):

    [03-24-16 11:26:26 am] * 6 active tasks awaiting check-in.
    [03-24-16 11:26:26 am] | Sending WOL Packet(s)
    [03-24-16 11:26:26 am] 		- Host: pc1 WOL sent to all macs associated
    [03-24-16 11:26:27 am] 		- Host: pc2 WOL sent to all macs associated
    [03-24-16 11:26:27 am] 		- Host: pc3 WOL sent to all macs associated
    [03-24-16 11:26:27 am] 		- Host: pc4 WOL sent to all macs associated
    [03-24-16 11:26:28 am] 		- Host: pc5 WOL sent to all macs associated
    [03-24-16 11:26:28 am] 		- Host: pc6 WOL sent to all macs associated
    [03-24-16 11:26:28 am] * No tasks found!
    

    The second error is that all the queued computers display “No open slots, There are 0 before me.”

    When there is a free slot again partclone begins to deploy the image to one (not all) of the queued computers.

    Also the minutes are just displayed as “i” :
    1_1458815654268_2016-03-24 Fog queue display error 2.jpg

    Please tell me if you need some more information.


  • Senior Developer

    @Wayne-Workman They’re based on the time. If all of them have the exact same checkin time, they will ALL say “0 before me”, because it doesn’t know which is which (in which case I’ll likely fall back to the ID of the tasking).

    For now it’s working but a bit odd.
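
To make the tie problem concrete, here is a minimal Python sketch. It is not FOG's actual code: the `QueuedTask` type, its fields, and both counting functions are hypothetical names chosen for illustration. It assumes the “before me” count is the number of queued tasks with an earlier check-in time, and shows how falling back to the task ID gives each host a distinct position when all times are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedTask:
    task_id: int       # hypothetical unique ID of the tasking
    checkin_time: int  # hypothetical check-in time, in whole seconds


def before_me_by_time(me, queue):
    """Count tasks whose check-in time is strictly earlier than mine."""
    return sum(1 for t in queue if t.checkin_time < me.checkin_time)


def before_me_with_id_fallback(me, queue):
    """Count tasks ahead of me, breaking check-in ties by task ID."""
    my_key = (me.checkin_time, me.task_id)
    return sum(1 for t in queue if (t.checkin_time, t.task_id) < my_key)


# Six hosts queued by one group task, all checking in within the same second.
queue = [QueuedTask(task_id=i, checkin_time=1000) for i in range(1, 7)]

time_only = [before_me_by_time(t, queue) for t in queue]
with_fallback = [before_me_with_id_fallback(t, queue) for t in queue]

print(time_only)      # [0, 0, 0, 0, 0, 0]
print(with_fallback)  # [0, 1, 2, 3, 4, 5]
```

With time as the only key, every host sees “0 before me”; comparing the pair (time, ID) keeps the time-based order where times differ and only uses the ID to separate ties.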


  • Moderator

    @Tom-Elliott Will the number be just all over the place or will it be consistent? Like if 30 computers all check in at exactly the same moment and maximum clients is set to 10 and 10 start, will all other 20 show that “There are 20 before me” ?

    Just trying to get an understanding.


  • Senior Developer

    @Wayne-Workman The count issue is based off of timing now. So if 5 hosts check in at the same time, they all have the same count value.
    Normally there is a delay, but in the case of group tasks, this might not exist.

    Normal cases will be accurate, and I suppose this one is accurate as well.


  • Moderator

    @Tom-Elliott But the count issue is solved? The “There are 5 before me” message is accurate?


  • Senior Developer

    @tian My worry for the display is that all the clients have the same time starting out. Because of this, they all get updated at the same time too. I update the check in times if they have timed out now, so order should be maintained. However, if all times are identical, there’s no real way to know which one is first or not. Again, it’s a minimal problem, though I can understand the “huh?” issue.



  • @Tom-Elliott Maybe the group deployment task is not working correctly with this in the trunk versions. Apart from the wrong display, everything else is working. We also use just one server (and on it the default storage/node). We mostly use group deployment tasks; single deployment tasks are not used that often and wouldn’t reach the limit we set.

    Thanks for all the effort you put into solving this.


  • Senior Developer

    @tian I’m fairly sure it should work properly. I only say this because I have the time setup now and I tested with a 1 client node and 3 tasked hosts.

    I don’t know the exact reasoning that these things occur, but I don’t think they’re hindering anything. Maybe group tasks will suck for this?



  • @Tom-Elliott said:

    @tian While I’m glad we’re closer, I hope to have finally gotten this more properly solved. The checkin process was constantly updating the time which is why you would see the numbers change (depending on the other host checking in).

    It should be good now, Hopefully.

    I hate to tell you, but now (version 6981) all queued computers display “There are 0 before me” again … (It would also be fine just to display the total number of computers waiting, or just the time waited, if this problem is consuming too much time at the moment.)


  • Senior Developer

    @tian While I’m glad we’re closer, I hope to have finally gotten this more properly solved. The checkin process was constantly updating the time which is why you would see the numbers change (depending on the other host checking in).

    It should be good now, Hopefully.
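
The effect described above can be sketched in Python. This is an illustration only, not FOG's implementation: `poll`, `positions`, `simulate`, and the host names are hypothetical. It assumes each queued host polls periodically and that its position is the number of hosts with an earlier recorded check-in time.

```python
def poll(checkins, host, now, refresh_every_poll):
    """Record a check-in; optionally overwrite the time on every poll."""
    if refresh_every_poll or host not in checkins:
        checkins[host] = now


def positions(checkins):
    """Map each host to the number of hosts that checked in strictly earlier."""
    return {
        host: sum(1 for other in checkins.values() if other < t)
        for host, t in checkins.items()
    }


def simulate(refresh_every_poll):
    checkins = {}
    # pc1, pc2, pc3 first check in at t=0, 1, 2, then poll again in another order.
    events = [(0, "pc1"), (1, "pc2"), (2, "pc3"),
              (5, "pc3"), (6, "pc1"), (7, "pc2")]
    for now, host in events:
        poll(checkins, host, now, refresh_every_poll)
    return positions(checkins)


refreshing = simulate(refresh_every_poll=True)
stable = simulate(refresh_every_poll=False)

print(refreshing)  # {'pc1': 1, 'pc2': 2, 'pc3': 0}
print(stable)      # {'pc1': 0, 'pc2': 1, 'pc3': 2}
```

When the time is overwritten on every poll, the order follows whichever host polled most recently, so the displayed numbers jump around. Keeping the first check-in time preserves the original arrival order.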



  • @Tom-Elliott Thanks for your hard work. Now the numbers are changing and are different. But these are totally mixed up now (Version 6977):
    1_1459263030972_2016-03-29 Fog queue display 5.jpg
    0_1459263030972_2016-03-29 Fog queue display 4.jpg
    The pictures were taken from different computers.
    Often the same numbers are still displayed on different computers at the same time (maybe the 5-second interval is too long to tell whether there are moments with duplicate numbers).

    Is there anything else I can provide?

    (In FOG 0.32 the queued numbers were/are fine, with the difference that there was/is a delay until free slots can be used by queued computers. Now in the recent versions the computers that start up first get the free slots, but that is OK.)



  • @Tom-Elliott

    Thanks and no problem, we know what we sign up for when we use the devel branch :) I’ll update in a few minutes and let you know.
    Edit: I confirm that it’s fixed.


  • Senior Developer

    @Cpasjuste Should be fixed, sorry about that.



  • Just a little more information I just found: if I click the “force task to start” button in “Task Management”, then the deploy task does start. By the way, if you need some testing, just send me a mail. I have good Linux and development knowledge, which may help with debugging if needed.



  • @Tom-Elliott Sure, here it is! So I create a basic task (deploy), then it just loops on “attempting to check in”. This task worked fine (same machine/image) a few hours ago, just before I updated to the latest git (svn).



  • Senior Developer

    @Cpasjuste What do you mean? Pictures please if possible.



  • Hi,

    Just a little message to say that I have the same problem with SVN 6975: I get an “attempting to check in” loop when trying to restore an image. It was working fine before I updated (I don’t remember which version I was on previously, but it was SVN 69xx).


  • Senior Developer

    @tian Can you update and try again? I re-adjusted how the counting is to work, in hopes it will work a little better.



  • @Tom-Elliott I just did a test with version 6971 and it looks different now, but it still doesn’t seem completely correct.
    The number in “There are x before me” is counting now, but it displays the same number on all the queued computers again:

    • When there are four computers waiting, it is “3” on all waiting computers
    • with five computers it is “4” on all waiting computers
    • at the end all six queued computers display “There are 5 before me”

    I waited 10+ minutes again, but the computers didn’t display different numbers.

    Here you can see the change of the number (that takes place on all computers) when one more computer is waiting:
    0_1459244785571_2016-03-29 Fog queue display 3.jpg


  • Senior Developer

    All should now be setup properly.

    Please update and, if you can, test with multiple hosts and let me know.

