FOG 0.32: Server crashed and was restored from backups, but imaged clients fail to boot (hal.dll missing)
Firstly, thanks for the work that has gone into the new forums. I’m sorry that it took a problem with FOG for me to finally sign up to the new one.
Here’s what happened to leave me in my current predicament:
A couple of days ago, the hard drive in our fog server failed (a 500GB Seagate ES drive). Although inconvenient, I of course had backups so it wasn’t the end of the world.
I installed a new hard drive in the fog server (2TB Hitachi), and proceeded to reinstall the operating system (Ubuntu Server 10.04.3 32-bit, the same as the fog server was running previously).
I then installed Fog 0.32 (same as I was previously running).
Once I had verified that the fog installation was working, I then proceeded to restore my backups. I had backups of the following: the snapins, the image files, the fog.sql database file.
After restoring my backups, everything seemed to be back to normal. The fog web interface showed all of my clients, snapins and image files.
As a test, I tried running memtest from the fog PXE menu on a client. This worked perfectly. I then tried to register a new client via the PXE menu, which also worked perfectly.
Finally, I attempted to reimage a client using our main SOE image that I had copied over from backup. Everything appeared to go fine, right up until the client rebooted after imaging completed. The first time it attempted to boot into Windows, the following message appeared: “Windows could not start because the following file is missing or corrupt: <windows root>\system32\hal.dll. Please reinstall a copy of the above file”
I have tried on two different machines (a Dell GX620 and a GX280) and the result is the same: the image deploys fine, but on first reboot you get the missing hal.dll message. Both of these machines worked perfectly with the same image prior to the fog server crashing and being rebuilt.
Is it possible that there is some setting I have missed that would be causing this to happen? Any other possible causes for this behaviour? As mentioned, the image I am restoring is our main SOE image that I have restored from backup, that previously worked perfectly on all our machines.
One thing that I will try today is uploading a new copy of our SOE image from my SOE building machine, so I can test if it is a case of there being something wrong with the image file that I restored from backup.
Thanks in advance for any help you can offer, if you need any additional information to help me work this out please let me know.
A follow-up for those interested.
Re-uploading a fresh SOE image to Fog solved the issue: once I had done that it was fine. I'm still not 100% sure why the image files I restored from backup didn't work, but maybe it could be related to the following…
I use the standard Fog backup script, which does a good job of getting all the important stuff. One thing it doesn’t back up by default, though, is the /tftpboot/ directory. You don’t really need to back this up unless, like me, you have made changes to anything PXE-related.
In my case, I had applied the chainloading fix described here:
Of course, that was ages ago and I had forgotten about it… until all of our newer Dell machines (OptiPlex 790) started rebooting when they tried to load the PXE boot menu :( Once I reapplied the fix it was fine though.
The end result is that I manually added the /tftpboot/ directory to my Fog backup script, just to make things easier next time I have to do a restore (hopefully not for a while!).
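For anyone wanting to do the same, here is a rough sketch of what archiving /tftpboot/ alongside the other backups might look like. It is not part of the standard Fog backup script; the function name `archive_directory` and the destination path in the usage note are my own assumptions, so adjust them to your setup.

```python
import tarfile
import time
from pathlib import Path


def archive_directory(source_dir, backup_dir):
    """Write a timestamped .tar.gz of source_dir into backup_dir; return its path.

    Hypothetical helper for illustration, not part of FOG itself.
    """
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")

    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = target_dir / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores entries as relative paths (tftpboot/...), so the
        # archive can be unpacked anywhere for inspection before files are
        # copied back into place.
        tar.add(source, arcname=source.name)
    return archive_path
```

Calling `archive_directory("/tftpboot", "/backup/fog")` from a nightly job would then leave a dated copy of any PXE changes next to the rest of the backup (`/backup/fog` is just a placeholder). Reading /tftpboot/ may need root, depending on its permissions.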
This is a good reminder to keep a spare copy of your SOE… I will make a backup on an old spare drive tomorrow. I wouldn’t want to have these same problems!
A quick update- as a test I uploaded an image of a computer I had in my office (IBM T60 laptop) to a brand new image file I created via the web interface. I then restored the image back to the laptop, and it worked perfectly- no hal.dll missing error.
So, obviously there is something wrong/missing from the image file I restored from backup although I’m not sure what. I’ll proceed to re-upload the image of our main SOE from my SOE building machine and see how that goes. A few pieces of software needed updating anyway, so no big deal :-)
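One way to narrow down "wrong/missing" would be to compare the restored image directory against the backup copy file by file. The sketch below is only that, a sketch: it assumes both copies are reachable as ordinary directories, and the names `sha256_of` and `compare_trees` are made up for illustration. It can only show that the restore differs from the backup; it would not catch a problem that was already present in the backup itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large image files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(backup_dir, restored_dir):
    """Return (missing, different) lists of relative paths.

    missing:   files present under backup_dir but absent under restored_dir
    different: files present in both whose size or SHA-256 differs
    """
    backup = Path(backup_dir)
    restored = Path(restored_dir)
    missing, different = [], []

    for src in sorted(p for p in backup.rglob("*") if p.is_file()):
        rel = src.relative_to(backup)
        dst = restored / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif (src.stat().st_size != dst.stat().st_size
              or sha256_of(src) != sha256_of(dst)):
            # The size check is a cheap shortcut; hashing only runs when sizes match.
            different.append(rel.as_posix())
    return missing, different
```

For example, `compare_trees("/mnt/backup/images", "/images")` would list anything that did not make it across intact (both paths are assumptions about where the copies live). File ownership and permissions are not checked here, so those would need a separate look.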