SOLVED FOG Client stops reporting & working
- FOG Version: 1.3.0 RC-14
- OS: Fedora 21
- Service Version: 0.11.5
- OS: Win10 LTSB x64
I noticed this issue because our computers stopped shutting down overnight. Looking into it, I found that the log file for my computer hasn’t been written to since the 19th, 5 days ago. This started happening when we upgraded from RC-10 to RC-14. Yes, that’s a big jump, but I was not at work for a month and nobody did it while I was gone. From the logs, it looks like my computer has been on since the 19th of this month. Attached are the log and some screenshots.
Looking at the fog user log, it looks like the display manager is at fault.
@michael_f A Windows crash would explain it (especially if the disk was in the middle of a write request). I’ll push out a patch to try and address this issue.
@Wayne-Workman Maybe I’ve got a clue. I too have a blanked settings.json file.
The Windows event log shows an unexpected shutdown at 14:17:56.
Windows Explorer shows that settings.json was last changed at the same time.
So I guess the file was open for an update when Windows crashed.
It’s all smoke and mirrors until someone finds a better clue.
@Hanz I still disagree. Seeing as the images were updated then deployed I still lean towards .NET being the culprit, in what way I have no idea.
@Wayne-Workman Then, without updates, it seems like that would rule out the .NET framework updates… I will add that I only noticed this after changing from RC-23 to Tom’s working RC_24 candidate. After an uninstall/reinstall the .json file was correct, then a reboot wiped it. So far, after reverting to RC_23, the client doesn’t seem to exhibit this behavior, even after several reboots.
This is absolutely the same bug I and one other have experienced. The server address cannot be found because it is lost from the settings.json file, because the settings.json file gets completely wiped out.
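For anyone wanting to confirm whether a machine is hit by the blanked file, here is a minimal Python sketch that classifies a settings file as missing, blank, invalid, or ok. It is not part of the FOG client. The function name `settings_state` is made up for illustration, the path is whatever you pass in, and it deliberately does not check for any particular key, since the exact key names are not shown in this thread.

```python
import json
from pathlib import Path


def settings_state(path):
    """Classify a settings file as 'missing', 'blank', 'invalid', or 'ok'."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    raw = p.read_bytes()
    # A wiped file may be zero bytes, only whitespace, or NUL padding.
    if not raw.strip(b"\x00 \t\r\n"):
        return "blank"
    try:
        data = json.loads(raw.decode("utf-8-sig"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "invalid"
    # Expect a non-empty JSON object; which keys it holds is not checked here.
    return "ok" if isinstance(data, dict) and data else "invalid"
```

Running this across a lab (for example from a snapin or a login script) would at least tell you how many clients are affected before you start reinstalling.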
The only difference? I don’t have WSUS here, and these computers in this building do not automatically update. We make a new image every summer for every model, and image everything every summer - so because of this we don’t worry about updates during the school year.
@Joe-Schmitt event viewer, I unfortunately lost that log after uninstall and reinstall of client…
reset token, and uninstall-reinstalled client
11/16/2016 8:44 AM Main Overriding exception handling
11/16/2016 8:44 AM Main Bootstrapping Zazzles
11/16/2016 8:44 AM Controller Initialize
11/16/2016 8:44 AM Zazzles Creating main thread
11/16/2016 8:44 AM Zazzles Service construction complete
11/16/2016 8:44 AM Controller Start
11/16/2016 8:44 AM Service Starting service
11/16/2016 8:44 AM Middleware::Configuration ERROR: Invalid parameters
11/16/2016 8:44 AM Service ERROR: ServerAddress not found! Exiting.
11/16/2016 8:52 AM Main Overriding exception handling
11/16/2016 8:52 AM Main Bootstrapping Zazzles
11/16/2016 8:52 AM Controller Initialize
11/16/2016 8:52 AM Zazzles Creating main thread
11/16/2016 8:52 AM Zazzles Service construction complete
11/16/2016 8:52 AM Controller Start
11/16/2016 8:52 AM Service Starting service
11/16/2016 8:52 AM Middleware::Configuration ERROR: Invalid parameters
11/16/2016 8:52 AM Service ERROR: ServerAddress not found! Exiting.
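To pull just the failures out of a log like the one above, a rough Python sketch follows. It assumes one entry per line in the layout seen here (date, time, AM/PM, a single-word module name, then the message). That layout is inferred from this excerpt only, and `find_errors` is a made-up helper name, not anything shipped with the client.

```python
import re

# Assumed layout: "<M/D/YYYY> <H:MM> <AM|PM> <module> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4}) (?P<time>\d{1,2}:\d{2}) (?P<ampm>AM|PM) "
    r"(?P<module>\S+) (?P<message>.*)$"
)


def find_errors(lines):
    """Return (timestamp, module, message) for entries whose message starts with 'ERROR:'."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m["message"].startswith("ERROR:"):
            stamp = f"{m['date']} {m['time']} {m['ampm']}"
            hits.append((stamp, m["module"], m["message"]))
    return hits
```

Fed the excerpt above, this would report the "Invalid parameters" and "ServerAddress not found!" entries for each service start, which is the signature of the blanked settings file discussed in this thread.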
This isn’t all of it, unfortunately, as I stated before, but I think most of the above is due to the empty .json file.
All the “chinese/gibberish” was during the authentication portion when the client first starts up…
@Hanz where did that error log come from? Event viewer? If it’s from the client’s log, please upload the entire log file.
@Wayne-Workman I also had this occur, oddly enough after running a snapin that forces the client to update via WSUS. I recently added .NET 4.6.1 to WSUS.
yup. here’s event viewer from that client The performance counter name string value in the registry is not formatted correctly. The malformed string is 뢭傞碻�鬋쏄军⍮휒㾲籍봯禵ᦂ瞣㛂⯦闚椂↛抋篾귩恫嗋⸑蓣톽�쾐⨨莐潋㌍胡빌폍伤밧縒�鬋웘趲ප᥀巽뼉랥躎衙┱禌㾬됯㏨궒外㈦慙ﭯ潘ﯦ랢ꤎ懨䄩賩ꗼ�唹훐⩦�ẏ⾟疽褎⌼凴䞛枢ﴙᚅ옝햌Ш↊ᱰ貼Ӗ醑᥆㔥㎦Ὢ苻☍▩臄ꮨ僭烻�釶⧐첨륗ꃕ⽷㢭�ᬓ욄䤘똎ꖷ뜕뷲훴洬勳꓅뛰ⵏ敺铨苌哅쇇寒楣ᗳ땦샄쿆ﳸᚅᄽ鱔�⬈잃䯧浯厃ݖ�麳䋺꺋릎揍쟏ކꞯ봮蒵慻ኋ풭聕慻ꛓ젭ฦ崘ᆶ㾾ّ腳㗢⧹籛䖩蹯ඝ䎈﨨旑䮋굯䗙阪ꍢ佦빏桟᧑禛捲굃쥏�쑚胬�좮圠ߺ쭷䳭瓕㐕꺯墽䣣씳鳎綸ø쿹淗팅嫵䛕髠텊떪㠷醛ୟꯊ핚隂➼㋳쏔晶럤ﷲ顝༷㛲ㇿ灳ᰲ䷯矂퉷₾�䎡똪胉ᜬ馿ꗶ㏏켗ꔙ柄⽱ꉬ蟊贘톙鷤留閸ꗯ䀡膒릹㣩㮃廹죘쓈⛛曬夐뙶빖摷帛룒䚄�됯㍍땫ᢦ戼㑦灰ﻻ�㻩䌲㔔뎤틕慻쪃鈐�㝚ิ뿮⯷浨リ齃鹸ྵ抃똭�ﹹ辗켟䛶渊㝛躴㙎匔ꋅ့畹⅟䅆汐攳섡증뿋蕖멥ꉘᏎ휻︇栿媼楔ꜣ転亥ി医�뽟﷼꼆鹣㣎ᶉꌾ轭끿뙲師߿꼔檑ꢖꪶ羚緓晴埻욽得⡇넃¶鲱鶤᯾ﱘ뤗鈋裂煉⒑貵뎸כּ뉗욻푍붆杍촛깺㛯揝뮳ꄠꍢ饋㛎䳅輧뭩롫㳄䪄☙c豤ᖳ澷督㉁る鞀멒곮綦묭驼浡⩐䘣᧣㿷묶䠗쩆ꎄл㻶ﻟ뉃ڻ耼萐葽쉅탺⻬顈譍钕Ć펰�员䰙⊰벹槚糈줣搗끘贵�㇒ꮃ壕║㫇럐瓥摮˷猉ત榢懺뷵쏝銻㗝舂㊤咧썛嶳붯�अ劵庛╘杼ꩥ�㒽죉褓瀍䏨脖죜볚⒉案랴´뉗槻ʏ䑈㮜㍯쩏齣ௌ᪒嫖ᵤ솭挖錝쯝嚡燀ꪌ뙼ྥ눷腻斤틠ᩆꊥ�껙⮇巙紼糭현䨙ᩰ涏⠒츞掕幪ᒻ찄晅Ῐ�₼ⷩ�봔揗䑦౫덇쟞쨘춣ힼ⛏룛�䤅◩ᑐ宺ꦝ긵睤�薨䭆딐ﰶ췘ք唀劯텆辑䳙㉘骓篙帯㦛菇䑧ላ鳂ʢ͉糨賐൨僔蔱㇢䴝쎄㝣ᘓ찤䯕漣枾拂泇갦᧱な驒懑磷嫬苕➄뤦浧뷚ẋ嚳蚵ꡥ塅₌鈝㾒⁞껡ᤦ䝅握贚夑䁱ᣏꖶ込短␗咎꫁䢖랣䱾뮿愆衰↛赣刬㬞幦沐躻㔨쌷챴禘붍곈樝⠍툚띏瘶␗棐�ⶑ淯↮䤗邷骪톕뛇ם飉簯찍끑蕋�遱ꉆᱭⓂþ们虈뛩䧏씂�襾焢ꆪꔔ썍괽寊氥瞭京�ਖ਼쟴྇茶�翯竽岽䡽ꢎ樳䓰砠શ뗳베宪㚷鹲懹戲�囹鞾㭚ଫꈍ쵓╏系跿塀㋐봏뎲揝�붨䅃�㙥ῼ. The first DWORD in the Data section contains the index value to the malformed string while the second and third DWORDs in the Data section contain the last valid index values.
I got a bunch of “chinese” in my client log when this occurred
I also had another warning saying something about the registry being open and in use as well.
Hope it helps guys.
Ever since deploying the debug 0.11.5 client that Joe gave me, the issue has not reappeared. It’s been about 20 or so days, I think that’s plenty of time. It could be possible that it is a bug in the client, but at this point and being unable to reproduce, I think it’s a non issue.
For all those that experience the settings.json file going blank, just uninstall and reinstall the fog client. You might do this manually or with a very carefully crafted startup script via GPO, that checks for the presence of a dummy file to indicate if it should uninstall/reinstall or not.
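The dummy-file idea can be sketched as follows. This is only the decision logic, written in Python for illustration. The names `reinstall_once` and `marker_path` are hypothetical, and the actual uninstall/install commands are left to a callable you supply, because the installer's command-line options are not covered in this thread.

```python
from pathlib import Path


def reinstall_once(marker_path, reinstall):
    """Run `reinstall` only if the marker file is absent, then create the marker.

    `reinstall` is a caller-supplied callable that returns True on success;
    the real uninstall/install commands are deliberately left out here.
    """
    marker = Path(marker_path)
    if marker.exists():
        return False  # already handled on an earlier boot
    if reinstall():
        marker.parent.mkdir(parents=True, exist_ok=True)
        marker.touch()
        return True
    return False  # failed; no marker was written, so the next boot retries
```

Writing the marker only after a successful reinstall means a failed attempt is retried on the next startup, while a successful one never runs twice.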
@Tom-Elliott Thanks for the explanation. I am now in the process of upgrading all four fog instances I manage to trunk. This way I might be helpful to advance this further. Keep it rolling!
@jhuebner 0.11.0 was NOT by any means the version that shipped with FOG 1.2.0.
The reason a file might become corrupted in the new client, particularly with 0.11 on 1.2.0 (and potentially others), is that the client gets its configuration information from the server. This configuration information has the potential to change each cycle, but to ensure the client keeps operating while the server is unreachable, we update the information stored locally. If this data is corrupted the client usually cleans itself up to ensure nothing goes wrong, but if it can’t do that and the file is “open”, it is possible for the file to become corrupted.
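As a general illustration of how the "file is open when something goes wrong" risk is commonly reduced (this is not a description of how the FOG client actually writes its settings), here is a minimal Python sketch of the write-to-a-temp-file-then-rename pattern. The function name `write_json_atomically` is made up for this example.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Write `data` as JSON via a temp file so a crash mid-write should leave the old file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # rename over the target in one step
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The point of the design is that the real file is never opened for writing, so an interrupted update can at worst leave a stray temp file behind. How strong that guarantee is in practice depends on the operating system and filesystem.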
We haven’t figured out WHAT is causing this and have been working to try to replicate the problem. To me, it seems like a windows update, but this is only my opinion.
Thanks for your quick reply.
I was using the Agent 0.11.0 with 1.2.0, I think that’s the one that came with 1.2.0. I am of the opinion that, whatever version, nothing should kill the settings file so that the agent/zazzles does not work anymore. I am in the good position that our production environment is more of a lab-style character (university campus) - so the world does not end when the agent is not working. I just have to do more manual work which I of course try to avoid.
Besides that: I really like the project and appreciate the work you put into this!
@jhuebner The new client never worked with the version 1.2.0 of FOG. The new client was ONLY introduced during the trunk builds. It’s no wonder why they got corrupted, they had nothing to work from if you really did have the new client and FOG 1.2.0.
I don’t know if this is related:
I had an issue where the content of the settings.json got corrupted on half of the computers and I had to reinstall the agent manually. I don’t know what killed the contents of the file. That happened with the stable version of fog (1.2.0). After reinstalling the agent I upgraded to trunk and am now waiting to see what happens next.
@Wayne-Workman I’m only giving the information as I’m seeing it.
This appears to have started in late July, early August, correct? Which would’ve been around the time (potentially) an image was being updated? I don’t know.
I still am leaning towards some windows update, and particularly a Windows Update in regards to the .NET Framework.
These are the only guesses I have at this point, and with the “debug client” we should’ve seen something by now if it was server related. I suppose there could still be an issue in the client, but considering the running client has probably surpassed the number of cycles it took before the issue appeared, I would imagine this is not the issue either.
@Tom-Elliott Between our summer deployment and now, we’ve run zero Windows updates. We can’t risk downtime, Windows updates are not trustworthy enough, and the long gap of downtime in the summer is too good an opportunity not to use for deploying an updated image.
I’m very strongly thinking this is related to a Windows update. Particularly one in regards to the .NET framework.