RC10 Broken Items on upgrade

adukes40

@Wayne-Workman Well the workstation says 12:30 and the latest timestamp in the log says 12:30… of course I am typing this at 12:30. Could the server time have been knocked out of sync? All of our workstations are set with GPO to pull time from NTP from our state network.

Wayne Workman

@adukes40 I don’t think the local FOG server’s time has any impact on this particular thing. When @Joe-Schmitt builds the client, he signs it afterwards, and I assume his system has the right timezone and time/date set, because we’ve been using client 0.11.5 for some time now and the update to it went fine. If there was an issue with its signature, everyone would have brought this up when it was released. Please try to re-run the installer. It’ll re-download the client; maybe that will fix it.

Also - your bandwidth problem is because all your hosts are re-downloading the new fog client from the server over and over, because signature checking of the signed file is failing. You could also try to delete the web directory and then re-run the installer to ensure you have the latest files correctly. Delete it with rm -rf /var/www/fog;rm -rf /var/www/html/fog and then re-run the installer.

Tom Elliott

@Wayne-Workman It would fail to sign if his date/time were broken. This cert was generated by me on a system whose time had been synced with pool.ntp.org. If there were a problem with his time he wouldn’t have been able to sign it at all. The time issue that’s being seen would be from the client’s time being off.

This same message can appear from the server’s own certs too though, as the time could’ve been off on the server when the SSL certs were made. I do try to sync the time with an NTP source first, but that doesn’t necessarily mean it will work. This is unlikely to be the issue though. I don’t know what is, but let’s try to narrow down what is causing the bandwidth usage first. We’ve stopped replication services, so we now know it isn’t a replication issue causing the problem. We have essentially forced all snapins to fail almost immediately so as to limit them as a potential issue point. All I’m seeing in the logs is a bunch of ::1 requests, which are IPv6 localhost TCP calls. If switching to pure socket connections doesn’t fix this I need to remote in at some point to see exactly what’s going on. For now I recommend not rerunning the installer as it will more or less cause us to have to start from scratch.

Wayne Workman

@Tom-Elliott said in RC10 Broken Items on upgrade:

don’t know what is, but let’s try to narrow down what is causing the bandwidth usage first.

It’s the hosts downloading the client over and over. It says so in the log he posted. The file fails signature authentication, so it just tries again the next iteration.

Wayne Workman

@Joe-Schmitt do you have any ideas or thoughts?

Joe Schmitt

@adukes40 for now globally disable the client auto updating (in fog settings). That will resolve your bandwidth issues and make your clients operable. When I get more free time I’ll take a look at what’s going on.

adukes40

@Joe-Schmitt Went ahead and turned that off. Took forever to get through the GUI, but managed. Still running sluggish and timing out, and flooding the pipe. That will clear up here at 6PM because all the student workstations are set to shut down.

EDIT: Actually the settings didn’t take. Now that the workstations have shut down I can get into the GUI at will. Auto updating of the client is NOW turned off.

Wayne Workman

Crosslinking similar threads:
https://forums.fogproject.org/topic/8556/gui-not-responsive/15

Joe Schmitt

@adukes40 I will need to remote into a problematic machine to be able to identify exactly what is going wrong. Send me a chat message when you’re available for a remoting session (preferably using teamviewer).

adukes40

Just a slight update. I came in this morning and traffic from the master sites was basically null, and imaging from remote nodes seems to kick off. (There are some weird lines that show up before the image kicks off. I will need to get a screenshot of those; it might not be a concerning issue.) Testing the image now to see if we are able to complete imaging. Also Tom, I do believe I set the snapins back to ‘’ instead of ‘abc’.

adukes40

OK, so now after the image comes down, the master image is still using the 11.2 client, but it was complaining in the log that “Snapin hash does not exist”. I manually updated it to 11.5 (since I had to turn the auto update off) and 11.5 still gave the same message. (Also weird: it says created today at 7:29, but this script has existed since May/June, unless this is by design.)

---------------------------------SnapinClient---------------------------------

9/15/2016 7:55 AM Client-Info Client Version: 0.11.5
9/15/2016 7:55 AM Client-Info Client OS: Windows
9/15/2016 7:55 AM Client-Info Server Version: 1.3.0-RC-10
9/15/2016 7:55 AM Middleware::Response Success
9/15/2016 7:55 AM SnapinClient Snapin Found:
9/15/2016 7:55 AM SnapinClient ID: 10357
9/15/2016 7:55 AM SnapinClient Name: 2-InstallScript1
9/15/2016 7:55 AM SnapinClient Created: 2016-09-15 07:29:11
9/15/2016 7:55 AM SnapinClient Action: reboot
9/15/2016 7:55 AM SnapinClient Pack: False
9/15/2016 7:55 AM SnapinClient Hide: False
9/15/2016 7:55 AM SnapinClient Server:
9/15/2016 7:55 AM SnapinClient TimeOut: 0
9/15/2016 7:55 AM SnapinClient RunWith: powershell.exe
9/15/2016 7:55 AM SnapinClient RunWithArgs: -ExecutionPolicy Bypass -NoProfile -File
9/15/2016 7:55 AM SnapinClient Args:
9/15/2016 7:55 AM SnapinClient File: InstallScript1.ps1
9/15/2016 7:55 AM SnapinClient ERROR: Snapin hash does not exist

Thiago

@adukes40
To solve “Snapin hash does not exist”, I updated all my snapins to generate the hash.
When I opened a snapin to edit, the hash field (which is read-only) was empty. I clicked the update button and the field was filled.
After this, the snapin was installed.
Edit: for some I had to delete and recreate the snapin by uploading the file again.

adukes40

@Thiago Hopefully your “edit” isn’t what I need to do lol. I have 746 snapins.

It does appear clicking on UPDATE is filling in the hash. I will test this in a few minutes. But is there a way to “update” them all at once?

Tom Elliott

@adukes40 Resetting with update snapins set sHash='' should force all snapins to update when they are next checked in. I think the create date is coming from when the snapin tasking was created.

adukes40

@Tom-Elliott This morning before I manually hit Update, the boxes were blank as @Thiago mentioned.

I just started an image and I am at the snapin install part. Still complaining about Snapin Hash does not exist, even tho:

[Screenshot attachment: 0_1473945878625_upload-aae00411-c835-4a8c-9bf0-0ac48122b646]

adukes40

While looking at the thread about failing to add snapins, I went and looked at my ownership. It looks like it is now set to fog:www-data. Is this right?

Because a few months ago I had to run this:

chown -R fog:root /opt/fog/snapins

I don’t know where this www-data came from

This is what I mean:

drwxr-xr-x 2 root root 4096 Sep 15 09:32 log/
drwxr-xr-x 9 root root 4096 May 24 13:50 service/
drwxrwxr-x 3 fog www-data 20480 Aug 25 14:42 snapins/

Tom Elliott

@adukes40 This is correct. So long as the first part is fog you will be fine. www-data is the Apache user on Debian-based systems. Permissions are set that way because the directory is typically accessed using web calls. It’s just to ensure we can do what we need.
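
For anyone who wants to check this on their own server without eyeballing the ls output, here is a minimal Python sketch. It only reads the owner and group of a path; the helper names owner_and_group and owned_by are made up for illustration and are not part of FOG.

```python
import grp
import os
import pwd


def owner_and_group(path):
    """Return (user, group) owning `path`, using numeric ids if names are unknown."""
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)
    return user, group


def owned_by(path, expected_user):
    """True when the owning user of `path` matches `expected_user`."""
    return owner_and_group(path)[0] == expected_user
```

On the server discussed here, owned_by("/opt/fog/snapins", "fog") is the check that matters; the group may be root or www-data depending on the distribution.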

adukes40

@Tom-Elliott Ok, just making sure something didn’t go haywire. Not sure what else to do about my snapins tho. I have a snapshot of my server from Aug 29 I can revert to, but I do not have any snapshots of my nodes, which I assume would cause issues. Just in case we need to explore that route.

adukes40

Wait, so after I update the hash, does this need to replicate out to the nodes? Because we still have the replication turned off from the other day. If it does, this might be why my remote site (where I am) still isn’t working.

Tom Elliott

@adukes40 The problem with just updating is that the hash is coming from the file. Because you’re not uploading a file, the hash returned is likely the same for all snapins.
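
To illustrate the point about the hash coming from the file, here is a rough Python sketch. It assumes the snapin hash is a SHA-512 digest of the file contents, which this thread does not confirm, and the helper names file_sha512 and shared_hashes are hypothetical. The first function hashes a file on disk; the second takes a mapping of snapin name to stored hash (however you export it from the database) and reports any hash shared by more than one snapin, which would suggest the value did not come from each snapin’s own file.

```python
import hashlib
from collections import defaultdict


def file_sha512(path, chunk_size=1024 * 1024):
    """Hash a file's contents in chunks so large snapins are not read in one go."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shared_hashes(stored):
    """Map each hash used by more than one snapin to the sorted snapin names.

    `stored` is a dict of snapin name -> stored hash (assumed input format).
    """
    by_hash = defaultdict(list)
    for name, value in stored.items():
        by_hash[value].append(name)
    return {h: sorted(names) for h, names in by_hash.items() if len(names) > 1}
```

If shared_hashes returns an empty dict, every snapin has a distinct stored hash; comparing each stored value against file_sha512 of the matching file under the snapins directory would then show whether it reflects the file actually on disk.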
