Middleware:: Response ERROR: Object reference not set to an instance of an object
-
@Tom-Elliott If you want a whole slew of fog.log files from these hosts I can provide them or I can deploy some debugging scripts if you want since it seems like I can reproduce the issue on command currently. I can even send you our image for testing if you’d like.
For reference we’re using a gold master image with the auto driver install method from the wiki. The only software installed in the image is the FOG Client 0.11.5 with the FOGService set to “Disabled” before sysprep. We are using a KMS key so setupcomplete.cmd runs afterward without an issue. As part of the auto driver install we push up a fresh copy of setupcomplete.cmd right after the image is finished deploying that contains the following:
net STOP "FOGService"
start "Reset C:\Drivers Perms" /wait "icacls" C:\drivers /reset /T >>"C:\FOGDrivers.log" 2>&1
sc config FOGService start= auto
shutdown -t 0 -r
I did notice on the run that I left over the weekend that the client did try to update and then started throwing the errors so I updated the client in the image before retrying. After that deployment all clients exhibited the issue immediately. I can provide the logs from that second run.
Oddly I don’t see this issue if I only image a single station, but if I image the entire batch of 24 it seems to show up consistently. As I mentioned, I noticed that the server was getting hammered with sha2sum processes as soon as the clients start checking in with a full batch of 24. The specs on the VM the server was running on are 1 core at 2.7 GHz, 4 GB RAM, and 10 Gb VMXNET3 Ethernet. I have since upped this to 2 cores at 2.7 GHz and 8 GB RAM but have yet to retest, just in case you need me to reproduce this issue.
-
@Darrin-Enerson Do you have a lot of snapins?
-
@Tom-Elliott Each host has the same set of 17 snapins associated to them.
-
@Darrin-Enerson Please post a log from just one of the affected hosts. Grab a copy of it after imaging finishes and after the fog client does all its stuff. We need a full log, but it doesn’t need to be overly long. As long as all the happenings are in it, we can use it.
-
@Wayne-Workman The system won’t let me upload it as a file and it’s too long for the posting requirements. Here’s the Google Drive link to one of the logs though: https://drive.google.com/file/d/0BzdlDm_GwIvwdllHakduWGdwUTQ/view?usp=sharing.
This comes from a host that was one of the last to finish imaging in this batch. As such the other hosts were already going through joining the domain and attempting to apply snapins. Since whatever is causing this issue also causes the web interface to become increasingly unstable this one failed to even join the domain. Let me know what logs, if any, you need from the server itself.
-
@Darrin-Enerson On that computer, open up cmd and
ping fogserver
what happens?
-
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\IT>ping fogserver

Pinging fogserver.beaconacademy.com [172.16.12.6] with 32 bytes of data:
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63
Reply from 172.16.12.6: bytes=32 time<1ms TTL=63

Ping statistics for 172.16.12.6:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\IT>
I can pull up the FOG management page from that computer as well.
-
@Darrin-Enerson I read in the OP that these are laptops. I would imagine that this has something to do with
ping fogserver
either not working on LAN or on WiFi.
-
@Wayne-Workman On these, the WiFi settings aren’t loaded until the computers join the domain, so they’re entirely on a LAN connection until then. As I mentioned, if I image each of these one at a time it works without a hitch, but in a batch of 24 it fails seemingly at random and hammers the FOG web interface into instability in the process. I haven’t tried smaller batches yet because I didn’t want to destroy the client logs if they could be helpful. I will note, however, that in FOG 1.2.0 with the old client this setup doesn’t cause any issues, so it’s definitely related to 1.3.0 and the new client.
-
As an update to this I can image smaller batches of 10 with no issue. That would tend to indicate a load, memory leak, or process leak issue. Let me know if you need more troubleshooting on my end to narrow this down. I’ll proceed in smaller batches for now so that I can get everything prepped by my deadlines but I can set up larger batches for testing this as needed.
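For anyone wanting to script the same workaround, here is a minimal sketch in Python of splitting a host list into groups of at most 10. The host names and the `batches` helper are hypothetical and are not part of FOG; how each group actually gets queued for deployment is left out.

```python
from typing import Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


# Hypothetical host names: 24 machines split into groups of at most 10.
hosts = [f"lab-{n:02d}" for n in range(1, 25)]
groups = list(batches(hosts, 10))
print([len(g) for g in groups])
```

Each group would be deployed and allowed to finish its snapins before the next one starts.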
-
@Darrin-Enerson Ah. I’ve actually seen this before. @Tom-Elliott remember me telling you about that?
So, the fix: limit your maximum clients in the storage node settings. That’s in Storage Management.
You should also look into Multicast - it’ll be your best friend ever.
-
I was able to do a bit more testing on this today and think I may have discovered the root cause of the issue. The problem doesn’t actually appear to be in the imaging system but rather in the snapin system. I say this because I can image 25 computers simultaneously with no snapins and don’t encounter any issues, however, if I deploy to the same 25 with snapins it exhibits this behavior as soon as the snapins start applying. The issue appears to be that when a snapin is first pushed out it looks to be running a hash function on the server and client, presumably to make sure it received an unaltered file before executing. The problem is that if a number of these tasks start at roughly the same time it maxes out the CPU and RAM on the server and strange things start to happen. I haven’t found the lower limit of where this starts to occur but I can reproduce it with my process every time I try it so can do any further troubleshooting needed as well as provide an entire batch of logs.
-
@Darrin-Enerson How large are your snapins, in terms of file size?
-
@Wayne-Workman I don’t think size alone is the issue. It’s the number of hosts hashing, which I’m fairly confident we got working properly for rc9. Load on the server will still likely happen if all the files are relatively large. I’m currently working on a few kinks with the database, which should have limited impact on people, if you wanted to give it a test just to see whether snapins hit your server less hard. Just talk to Wayne or myself about how to implement it, as I don’t want everybody jumping ship. The working RC branch is what will become the next RC release, but I’m still working on at least one semi-major issue and don’t want everybody just giving it a shot.
-
With the release of rc10, I’ve marked this thread as solved.
-
@Tom-Elliott Thanks Tom :). I’ll have a chance to do a confirmation test in my setup sometime next week and let you know if I encounter any issues related to this.