So I’m seeing what the issue is, where it comes from, and why it happens.
It’s not really a bug, but rather a feature working as designed.
There is a timeout, as you guys suspected, and it is set in:
FOG Configuration -> FOG Settings -> General Settings -> FOG_CHECKIN_TIMEOUT
It defaults to 600 seconds (10 minutes), and perhaps it needs to be set longer in some environments.
That said, I think this explains what you’re seeing: once a queued system has gone longer than the timeout since its last check-in, it no longer holds its place, so another system thinks it’s next in line and the actual ordering is lost. You could try a 15 or 20 minute timeout (900 or 1200 seconds) and see if it gives better results.
I think the idea of the timeout is to give systems that have been waiting, while no other tasks are progressing, a chance to run. For example, suppose somebody started a job on a system but then decided to power it down. Without this timeout, that checked-in but now powered-down system would hold up every system queued behind it forever.
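To make the trade-off concrete, here is a toy Python model of the behaviour described above. It is only a sketch under assumptions: the names `QueuedHost`, `is_stale` and `next_host` are hypothetical, and this is not how FOG actually implements its queue. It just shows how a system that stops checking in loses its place once the timeout passes, and how a longer timeout keeps the original order intact for longer.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of a FOG-style task queue; NOT FOG's actual code.
CHECKIN_TIMEOUT = 600  # seconds, the FOG_CHECKIN_TIMEOUT default (10 minutes)


@dataclass
class QueuedHost:
    name: str
    queued_at: int     # seconds: when the host joined the queue
    last_checkin: int  # seconds: last time the host contacted the server


def is_stale(host: QueuedHost, now: int, timeout: int) -> bool:
    # A host that has not checked in within `timeout` seconds is treated as gone
    # (e.g. powered down), so it no longer holds its place in line.
    return now - host.last_checkin > timeout


def next_host(queue: List[QueuedHost], now: int,
              timeout: int = CHECKIN_TIMEOUT) -> Optional[QueuedHost]:
    # Walk the queue in arrival order and return the first host that is not stale.
    for host in sorted(queue, key=lambda h: h.queued_at):
        if not is_stale(host, now, timeout):
            return host
    return None


queue = [
    QueuedHost("pc-01", queued_at=0, last_checkin=0),    # queued first, then went quiet
    QueuedHost("pc-02", queued_at=10, last_checkin=650),
    QueuedHost("pc-03", queued_at=20, last_checkin=650),
]

print(next_host(queue, now=700).name)                # pc-02: pc-01 is past the 600 s timeout
print(next_host(queue, now=700, timeout=1200).name)  # pc-01: a 20-minute timeout keeps its place
```

With the default 600 seconds, pc-02 jumps ahead of pc-01 in this model; raising the timeout to 1200 (20 minutes) keeps pc-01 at the front, at the cost of a genuinely powered-down system blocking the queue for longer.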