Middleware:: Response ERROR: Object reference not set to an instance of an object

Tom Elliott

@Darrin-Enerson Do you have a lot of snapins?

Darrin Enerson

@Tom-Elliott Each host has the same set of 17 snapins associated with it.

Wayne Workman

@Darrin-Enerson Please post a log from just one of the affected hosts. Grab a copy of it after imaging finishes and after the fog client does all its stuff. We need a full log, but it doesn’t need to be overly long. As long as all the happenings are in it, we can use it.

Darrin Enerson

@Wayne-Workman The system won’t let me upload it as a file and it’s too long for the posting requirements. Here’s the Google Drive link to one of the logs though: https://drive.google.com/file/d/0BzdlDm_GwIvwdllHakduWGdwUTQ/view?usp=sharing.

This comes from a host that was one of the last to finish imaging in this batch. As such, the other hosts were already joining the domain and attempting to apply snapins. Since whatever is causing this issue also makes the web interface increasingly unstable, this one failed to even join the domain. Let me know what logs, if any, you need from the server itself.

Wayne Workman

@Darrin-Enerson On that computer, open up cmd and run ping fogserver. What happens?

Darrin Enerson

@Wayne-Workman

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.
C:\Users\IT>ping fogserver
Pinging fogserver.beaconacademy.com [172.16.12.6] with 32 bytes of data:
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Ping statistics for 172.16.12.6:
   Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
   Minimum = 0ms, Maximum = 0ms, Average = 0ms
C:\Users\IT>

I can pull up the FOG management page from that computer as well.

Wayne Workman

@Darrin-Enerson I read in the OP that these are laptops. I would imagine that this has something to do with ping fogserver either not working on LAN or on Wifi.

Darrin Enerson

@Wayne-Workman On these, the WiFi settings aren’t loaded until the computers join the domain, so they’re on an entirely LAN-based connection until then. As I mentioned, if I image each of these one at a time it works without a hitch, but in a batch of 24 it fails seemingly randomly and hammers the FOG web interface into instability in the process. I haven’t tried smaller batches yet because I didn’t want to destroy the client logs if they could be helpful. I will note, however, that in FOG 1.2.0 with the old client this setup doesn’t cause any issues, so it’s definitely related to 1.3.0 and the new client.

Darrin Enerson

As an update to this I can image smaller batches of 10 with no issue. That would tend to indicate a load, memory leak, or process leak issue. Let me know if you need more troubleshooting on my end to narrow this down. I’ll proceed in smaller batches for now so that I can get everything prepped by my deadlines but I can set up larger batches for testing this as needed.

Wayne Workman

@Darrin-Enerson Ah. I’ve actually seen this before. @Tom-Elliott remember me telling you about that?

So, the fix: limit your maximum clients in the storage node settings. That’s in Storage Management.

You should also look into Multicast - it’ll be your best friend ever.

Darrin Enerson

I was able to do a bit more testing on this today and think I may have discovered the root cause of the issue. The problem doesn’t actually appear to be in the imaging system but rather in the snapin system. I say this because I can image 25 computers simultaneously with no snapins and don’t encounter any issues. However, if I deploy to the same 25 with snapins, it exhibits this behavior as soon as the snapins start applying. When a snapin is first pushed out, it looks to be running a hash function on the server and client, presumably to make sure the client received an unaltered file before executing it. The problem is that if a number of these tasks start at roughly the same time, they max out the CPU and RAM on the server and strange things start to happen. I haven’t found the lower limit of where this starts to occur, but I can reproduce it with my process every time I try it, so I can do any further troubleshooting needed as well as provide an entire batch of logs.
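To illustrate the load pattern described above, here is a minimal Python sketch of chunked file hashing with a cap on how many hashes run at once. It is not FOG's code: the SHA-512 choice, the `MAX_CONCURRENT_HASHES` value, and the function names are all assumptions made for the example.

```python
import hashlib
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cap: how many hash jobs may read and digest a file at once.
MAX_CONCURRENT_HASHES = 2
_hash_slots = threading.BoundedSemaphore(MAX_CONCURRENT_HASHES)


def hash_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so memory use stays flat."""
    digest = hashlib.sha512()  # assumed algorithm, for illustration only
    with _hash_slots:  # wait for a free slot instead of piling onto the CPU
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


def hash_for_hosts(path, host_count):
    """Simulate host_count hosts asking for the same file's hash at once."""
    with ThreadPoolExecutor(max_workers=host_count) as pool:
        return list(pool.map(lambda _: hash_file(path), range(host_count)))
```

With 25 simulated hosts, all 25 requests still complete, but only two files are being read and digested at any moment, which trades some latency for a bounded load on the server.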

Wayne Workman

@Darrin-Enerson How large are your snapins, in size?

Tom Elliott

@Wayne-Workman I don’t think size alone is the issue. It’s the number of hosts hashing, which I’m fairly confident we got working properly for rc9. Load on the server will still likely happen if all the files are relatively large. I’m currently working on a few kinks with the database, which should have limited impact on people, if you want to give it a test just to see whether snapins are less taxing on your server. Just talk to Wayne or me about how to implement it, as I don’t want everybody jumping ship. The working RC branch is what will become the next RC release, but I’m still working on at least one semi-major issue and don’t want everybody just giving it a shot.

Tom Elliott

With the release of rc10 I’ve solved this thread.

Darrin Enerson

@Tom-Elliott Thanks Tom :). I’ll have a chance to do a confirmation test in my setup sometime next week and let you know if I encounter any issues related to this.
