Fairly sure we got this figured out tonight. I've pushed the changes to the head state of SVN and Git, so the fixes are now part of 1.3.0-RC-8; no need to wait.
Thanks @Wayne-Workman for the TeamViewer session, which helped us narrow down what the issue was.
For all following along, it basically boiled down to file hashing taking far too long on large files. The check would fail if the hashing script took longer than 30 seconds, and it would also fail if the connection took too long (the connection timeout defaulted to 15 seconds). So by default, if a large file couldn't be hashed within about 45 seconds, the snapin would fail completely. The fix, for simplicity's sake, is to let the hashing call run with no time limit and to raise the connection timeout to a day. This could have been avoided by keeping a hash in the DB, but I have a hard time trusting that approach: if somebody manually updates a file on disk, the stored hash would no longer match and the snapin would always fail.
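For anyone curious why hashing alone can blow past a 30 second cap, here's a minimal sketch of chunked file hashing in Python. This is not FOG's actual code (the server side is PHP); SHA-512, the chunk size, and the file path are all just illustrative assumptions. The point is that even with flat memory use, wall-clock time grows linearly with file size, so a multi-GB snapin can easily exceed a fixed script timeout on a slow disk.

```python
import hashlib
import time

def hash_snapin(path, chunk_size=1024 * 1024):
    """Hash a (potentially huge) file in 1 MiB chunks.

    Reading in chunks keeps memory use flat, but the total time
    still scales with file size -- a 5 GB snapin can easily take
    longer than 30 seconds on slow storage.
    """
    h = hashlib.sha512()  # example algorithm, not necessarily FOG's
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

if __name__ == '__main__':
    start = time.time()
    # hypothetical snapin path, purely for illustration
    digest = hash_snapin('/opt/fog/snapins/bigsnapin.exe')
    print('sha512: %s (took %.1fs)' % (digest, time.time() - start))
```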
For huge snapins (> 5 GB) I'd recommend installing the software in your image rather than relying on the snapin system to install it. I say this because, even without hashing, transferring such a large file (especially out to many hosts) creates a lot of bandwidth usage (leaving the server with less to perform imaging if needed), and the file would be read that many times just to start the transfers. Add in the hashing (which is another way the client and server help prevent bad files) and you have one big mess of load and I/O issues.