Fog Client Aware of Image Queue Depth

  • I would like to request that the Task Reboot function of the FOG Client be aware of the image queue depth. That way, users’ productivity is not impeded during an image rollout while they stare at a PXE screen informing them they are in line. The end result would be a rolling image deployment, since the FOG client would not trigger a reboot unless there was an open slot.

  • That’s very promising to hear Wayne! I must admit that you shot beyond my skill set with your bash script and use of the API. I look forward to your “generic” solution!

  • I’ve gotta say I was really looking forward to making this happen - so I’m just going to do this on my own in a generic fashion and make it available for all.

  • @AngryScientist This would be a great place for the new FOG API to help out. I think it’s easy to write a script that slowly deploys via the API. If you know roughly how long your systems take to image, you can use that figure for your sleep values. Take the following (untested) bash script as an example. Note that this script doesn’t need to run on the FOG server, since it communicates with the FOG server over curl:

    hosts="host1 host2 host3"
    # Replace fog-server below with your FOG server's address.
    for host in $hosts; do
        curl --silent -k \
            -H 'Content-Type: application/json' \
            -H 'fog-user-token: ZjBkMmE3YmI5NmUzZDcxYTliYzNkZTc4MmJhNTFiYTQ3Mzc2MTg5MzYxMThmNjA5NDYyMjllMTA5YzE0NWUxMjFiNzkyMTc5OTMwZjFhZGM5NWIxMTc3YWZmNTU2MmMwYjFhNjg0NjVmMTkyMGZkNDQxYmY0MzI1NWNkMzQyM2M=' \
            -H 'fog-api-token: MzI2NDY1NjY2NjM0MzUzMDMzMzA2MzM1MzEzNzYyMzg2NTYyNjQ2MjMxMzczMTM0NjY2NDM0NjUzOTM2NjIzNDM4MzQ2NDM3MzY2MzM2MzMzNjYyMzUzODY0MzUzNDYyMzgzMDY2NjQzNTMxMzI2MzM5NjYzNjYzMzMzMzM0MzA=' \
            "http://fog-server/fog/host/${host}/task" \
            -d '{"taskTypeID":1,"shutdown": true}'
        sleep 400
    done

    The above script is pretty simple; there’s not much in the way of intelligent tasking, just a no-brain loop with a sleep. If you want to see more intelligent task scheduling with the FOG API, look at the project I’ve made that does automated testing of FOG’s imaging in the link below. With just a little bit of work, this could become a system that deploys hosts from a group (getting them from a MySQL query, or maybe the FOG API?) and then cycles through each one, waiting for one to complete before beginning the next.
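    The “cycle through each one, waiting for one to complete” idea can be sketched in bash. Note this is only a sketch: the server URL, token variables, and especially the `task_active` response check are my assumptions, not verified FOG API behavior, so adjust them to your install.

```shell
#!/usr/bin/env bash
# Sketch of a rolling deploy: start one host's task, poll until it's done,
# then move on. The server URL, token variables, and the response check in
# task_active are assumptions -- adjust them to your FOG install.
FOG_SERVER="${FOG_SERVER:-http://fog-server}"

start_deploy() {    # queue a deploy-with-shutdown task for host $1
    curl --silent -k \
        -H 'Content-Type: application/json' \
        -H "fog-user-token: $FOG_USER_TOKEN" \
        -H "fog-api-token: $FOG_API_TOKEN" \
        "$FOG_SERVER/fog/host/$1/task" \
        -d '{"taskTypeID":1,"shutdown": true}'
}

task_active() {     # succeed while host $1 still has a task (hypothetical check)
    curl --silent -k \
        -H "fog-user-token: $FOG_USER_TOKEN" \
        -H "fog-api-token: $FOG_API_TOKEN" \
        "$FOG_SERVER/fog/host/$1" | grep -q '"task"'
}

rolling_deploy() {  # one host at a time; the next starts only when the previous is done
    for host in "$@"; do
        start_deploy "$host"
        while task_active "$host"; do
            sleep 30        # poll rather than guess a fixed sleep value
        done
        echo "done: $host"
    done
}

# Usage: rolling_deploy host1 host2 host3
```

    Polling like this replaces the fixed `sleep 400` guess, so a slow host doesn’t get overlapped and a fast host doesn’t waste time.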

    The files below are involved specifically in automated image testing. All of them require the settings file, which contains all the settings. You should find that I’ve made the settings file pretty plug-n-play with variables.

    In particular, check out this (line 61):
    Here, that calls another script that starts imaging and then backgrounds the process. What you could probably do is not background the process, put it inside a for loop, and loop through all your hosts one at a time. If there’s any sign of trouble, you can just kill the script. It’s possible to get fancier and keep 2 or 3 going at a time instead of just one. In reality, I’ve never had a ‘perfect’ large FOG deployment where nothing went wrong. Something always goes wrong: loose or kinked patch cables, pulled power cords, dying HDDs. You name it, it happens every time. So most likely you’ll need more than one host going at once. That way a dead box doesn’t stop the show, and you can go handle the bad box and either get it going or kick it out of the Task Manager in the web GUI.
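    The “keep 2 or 3 going at a time” idea can be sketched with plain shell job control. Here `deploy_one` is a placeholder for “start the task via the API and wait for it to finish” (for example, the curl call from the earlier script plus a completion check); swap in your real logic.

```shell
#!/usr/bin/env bash
# Sketch: run deployments in batches of max_jobs so one dead box doesn't
# stop the show. deploy_one is a placeholder for the real start-and-wait
# logic against the FOG API.
max_jobs=3

deploy_one() {
    sleep 1                 # placeholder for the actual image deployment
    echo "imaged: $1"
}

throttled_deploy() {
    local n=0
    for host in "$@"; do
        deploy_one "$host" &
        n=$((n + 1))
        if [ "$n" -ge "$max_jobs" ]; then
            wait            # let the current batch finish before starting more
            n=0
        fi
    done
    wait                    # let the final partial batch finish
}

# Usage: throttled_deploy host1 host2 ... host50
```

    This batches for portability; on bash 4.3+ you could use `wait -n` instead to start a new host the moment any one finishes, keeping a true sliding window of in-flight deployments.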

    These files contain a lot more than you need; you’ll want to prune them way down for your purposes. If you have questions about anything, just ask here in this thread and I’ll respond. Make sure to ping me, like @Wayne-Workman

  • @Quazz You are absolutely correct, but it’s the end users’ productivity being affected. I cannot bring down the facility for hours.

  • Moderator

    @AngryScientist 50 desktops at 300 MB/min multicast means a 30 GB image is deployed to all in 100 minutes, or 1 h 40 min.

    50 desktops at 5 GB/min unicast, 2 at a time, means a 30 GB image is deployed to all in 150 minutes, or 2 h 30 min.

    Making multicast faster 😉

    Of course it wouldn’t be fun for users to wait that long regardless, I suppose.

  • @Quazz The network in the facility is a hodgepodge of whatever we could get our hands on, and as a result the multicast throughput is abysmal: 300 MB per minute. With unicast I get upwards of 5 GB per minute.

  • Moderator

    @AngryScientist Why would a multicast require additional hardware?

  • Thanks for everyone’s hard work regardless of the outcome of this discussion!

    I suppose I should edit the topic title, because I realize what I’m truly looking for, in an end-game sense, is a rolling deployment. With my current hardware, I have reduced the queue depth on our single node to 2, and deploying to 50 in-use hosts causes mass panic akin to a zombie apocalypse. I don’t have the option to gather the hardware and do a multicast, because the facility runs 24/7 with each shift using the same computer systems, and there aren’t enough of them to go around as it is.

  • The problem with any approach is that we use a “queued” system to determine which node a task will operate from. While detecting x number of tasks plus x number of slots available on a node is simple on a single-node system, such a method becomes overly complicated to figure out when you have 2, 3, 4, or any other number of nodes within your storage groups. Adjusting the task type to a delayed/“scheduled” deployment seems like a relatively decent approach, except that we shouldn’t have to change a deploy task’s type to “scheduled” just for this. And that approach runs into the same pitfall: it only works on a single-node system.
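    For what it’s worth, the per-node slot check being discussed can be sketched as a small helper. The `name:activeTasks:maxClients` triples here are purely illustrative stand-ins for whatever a FOG API call or MySQL query would actually return; this is a sketch of the logic, not of the real data source.

```shell
#!/usr/bin/env bash
# Sketch: report the first node with an open slot, or fail if all are full.
# Input is "name:activeTasks:maxClients" triples, standing in for whatever
# a FOG API call or MySQL query would actually return.
open_slot_node() {
    for triple in "$@"; do
        local node="${triple%%:*}"
        local rest="${triple#*:}"
        local active="${rest%%:*}"
        local max="${rest##*:}"
        if [ "$active" -lt "$max" ]; then
            echo "$node"    # this node can take another deploy right now
            return 0
        fi
    done
    return 1                # every node is full; keep the client waiting
}

# Usage: a client-side reboot would only fire when this succeeds, e.g.
#   open_slot_node node1:10:10 node2:7:10   -> prints "node2"
```

    As the post above notes, the hard part isn’t this check itself but doing it consistently across multiple nodes in a storage group without racing other clients for the same slot.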

  • I’m not familiar with the specifics of the internal mechanics, but it occurred to me that it could also be handled server side: push the deployment schedule for all not-in-progress tasks to a future time, then when an in-progress deployment completes successfully, change the next task’s time to instant, and once those begin, push the rest back to a future date again. That also sounds shaky…

  • Moderator

    Afaik, the way it works now, FOG is only aware of a queue once clients are already at that screen. Meaning that clients on their way to that screen aren’t counted towards the queue, and thus they get put there anyway.

    An “easy” (I’m guessing) way to quickly bandage this would be to check the number of deployment tasks for a node against that node’s specified max clients and queue “dynamically”. But this sounds shaky, tbh.