RC6 - Snapins no longer working
-
Since upgrading to RC6, snapins no longer deploy.
When sending a single snapin to a host, the status of the task changes to checked in after a while, but stays on that status indefinitely.
fog.log on the client:
------------------------------------------------------------------------------
---------------------------------SnapinClient---------------------------------
------------------------------------------------------------------------------
 4-8-2016 10:03 Client-Info Client Version: 0.11.4
 4-8-2016 10:03 Client-Info Client OS: Windows
 4-8-2016 10:03 Client-Info Server Version: 1.3.0-RC-6
 4-8-2016 10:03 Middleware::Response Success
 4-8-2016 10:03 SnapinClient Snapin Found:
 4-8-2016 10:03 SnapinClient ID: -1
 4-8-2016 10:03 SnapinClient Name:
 4-8-2016 10:03 SnapinClient Created: -1
 4-8-2016 10:03 SnapinClient Action:
 4-8-2016 10:03 SnapinClient Pack: False
 4-8-2016 10:03 SnapinClient Hide: False
 4-8-2016 10:03 SnapinClient Server:
 4-8-2016 10:03 SnapinClient TimeOut: -1
 4-8-2016 10:03 SnapinClient RunWith:
 4-8-2016 10:03 SnapinClient RunWithArgs:
 4-8-2016 10:03 SnapinClient Args:
 4-8-2016 10:03 SnapinClient File:
 4-8-2016 10:03 SnapinClient ERROR: Snapin hash does not exist
------------------------------------------------------------------------------
-
Fairly sure we got this figured out tonight. I've added the changes to the head state of SVN and Git, so the fixes are now part of 1.3.0-RC-8; no need to wait.
Thanks @Wayne-Workman for the TeamViewer session, which helped us narrow down what the issue was.
For all following along, it basically boiled down to file hashing taking far too long on large files. The checker would fail if the script took longer than 30 seconds, and it would also fail if the connection timed out (the timeout defaulted to 15 seconds). So by default, if a large file was not hashed within 45 seconds, the snapin would fail completely. The fix, for simplicity's sake, is to let the hashing call run without a time limit and to increase the connection timeout to a day. This could have been avoided by storing the hash in the database, but I have a hard time trusting that: if somebody manually updated a file, the hash check would always fail.
For huge snapins (> 5 GB) I'd recommend installing the software in your image rather than relying on the snapin system to install it. I say this because, even without hashing, transferring such a large file (especially to many hosts) creates a lot of bandwidth usage (leaving the server less to perform imaging with if needed), and the file would be requested once per host. Add in the hashing (which is another way the client and server help prevent bad files) and you have one big mess of load and I/O issues.
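To see why hashing time grows with file size, here is a minimal sketch of chunked file hashing. This is illustrative Python, not FOG's actual code (FOG's server side is PHP), and the choice of SHA-512 is an assumption; the point is that the digest must read every byte, so a fixed 30-second limit inevitably fails on multi-gigabyte snapins.

```python
# Hypothetical sketch: stream a large file through a hash in fixed-size
# chunks so memory stays flat. Time taken is proportional to file size,
# which is why a hard 30-second script timeout breaks on big snapins.
import hashlib

def hash_file(path, chunk_size=4 * 1024 * 1024):
    """Return the SHA-512 hex digest of a file, read 4 MiB at a time."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Streaming in chunks keeps memory use constant regardless of snapin size, but the total wall-clock time still scales linearly with the file, hence the fix of removing the time limit rather than speeding up the hash.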
-
Confirmed issue.
Sorry, this was me working on commonizing and making a friendlier filesize checker/getter.
The file this references to get the hash was missing a required item that allows access to the rest of the FOG information. It is now fixed in RC-7.
-
@Tom-Elliott When is RC-7 expected to be released?
-
@Wayne-Workman I don’t know. It’s literally only 2 days old.
-
So, I know this thread has been marked as solved already,
but my building is on RC-6 and we have the same problem with snapins. None of our snapins work, and we’re hurting over it.
I’m highly anticipating RC-7, and I hope it is released soon.
-
Confirmed working again in RC-7.
I deployed 1,200 single snapins this morning to mixed groups; some hosts got one snapin, some got three. It's knocking them out pretty quickly; the FOG server has a 1 Gbps connection and it's pegged right now.
-
This exact problem still exists in the current working-RC-8 branch, for multi-node fog systems with locations enabled.
I have an 800 MB MSI that I cannot deploy from anywhere except the main server.
-
@Wayne-Workman Nothing changed for snapins.
-
@Tom-Elliott Then it was never fixed for locations.
-
@Wayne-Workman Yes it was. You just need to get all items on the same page. All nodes need the update, not just the main.
-
@Tom-Elliott All nodes are on RC-7; the main is on working-RC-8.
-
@Wayne-Workman Right but with the replication issue, it’s likely unable to use the location properly.
-