SOLVED Windows 10 Error on deployment only on 1st attempts...
So I have had the following issue for the last few months and haven’t been able to figure out what’s going on. It only seems to affect my Windows 10 images, including ones I have just created, and only on the first attempt FOG makes to image the machine. After the client machine reboots from the error, it images just fine. The error is that it is unable to locate the image store (/bin/fog.download).
Based on my searches, it seems to be an issue with FTP… I did notice that my deployment server somehow now has both a “fog” and a “fogproject” user account that it uses. I have to use one for my storage node settings and the other for the FTP username and password under the TFTP settings. If I try to use the same account for both, I can no longer update my kernel, and other functions break.
The only thing I can think of is that somehow things got screwy when I went from being on some test builds for troubleshooting last year back to a stable build when 1.5.6 came out.
Any suggestions for fixing this are greatly appreciated.
@Sebastian-Roth That’s fine. Things have been pretty swamped over here, so I had to put the issue on the back burner. The issue doesn’t always show up, so troubleshooting has been time-consuming.
@george1421 Here is what shows up as mounted on the storage node:
Firewalld stopped and disabled.
sestatus shows SELinux in permissive mode.
@george1421 I had a meeting this afternoon and will be off the next few days, so I won’t be able to test this until Monday. I will post results when I get in.
@jflippen The connection refused response is telling (maybe)
On 10.59.181.12, if you run the command
showmount -e 127.0.0.1
it should show you the network shares that are exported.
Also, on 10.59.181.12 you do have SELinux set to permissive and the Linux firewall disabled?
There is another possibility: that I have the NFS mount command wrong for FOS. I don’t think so, but anything is possible.
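If you’d rather check the result than read it by eye, here is a minimal sketch. `export_listed` is a hypothetical helper, not part of FOG; it assumes the usual `showmount -e` layout (an “Export list for …” header line, then one exported path per line in the first column). On a default FOG install I’d expect both /images and /images/dev to be listed.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOG): read `showmount -e` output on stdin
# and exit 0 if the given path appears in the export list, 1 otherwise.
export_listed() {
    # NR > 1 skips the "Export list for ..." header line;
    # the first column of each remaining line is the exported path.
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example, run on the storage node:
#   showmount -e 127.0.0.1 | export_listed /images && echo "/images is exported"
#   showmount -e 127.0.0.1 | export_listed /images/dev && echo "/images/dev is exported"
```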
@george1421 Okay. I was issuing the debug task from the web GUI instead of deploying an image with debugging.
So, when I try mounting the volume before running the fog command, I get denied:
However when I then run the FOG command the imaging goes without a hitch:
After the imaging completes the share is mapped no problem.
@jflippen Oh wait, I just saw something in the previous picture.
Fatal Error: Unknown request type: Null
Are you USB booting into FOS? If so, you need to schedule the task in the web UI before you pick option 1 on the GRUB menu. If you don’t, it will throw that error.
Now, back on task: your Surface has an IP address of 10.153.2.65. The FOG server is on 10.59.10.12 and the storage node on 10.59.181.12.
So, with these conditions, run the following from your Surface Pro while it is running FOS:
mkdir /images
mount -t nfs 10.59.181.12:/images /images
That should mount the NFS share on your storage node. That is the point where it errors out in your initial picture: it can’t mount the /images share on your storage node.
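Since the failure reportedly only hits the first attempt, one way to confirm that from the FOS debug shell is to repeat the manual mount a few times and note which attempt succeeds. A minimal sketch: `retry` is a hypothetical helper, not part of FOS, and the attempt count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOS): run a command up to $1 times,
# sleeping $2 seconds between attempts. Prints which attempt succeeded
# and returns 0, or returns 1 if every attempt failed.
retry() {
    attempts="$1"; delay="$2"; shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            echo "attempt $n: ok"
            return 0
        fi
        echo "attempt $n: failed" >&2
        n=$((n + 1))
        # Only wait if another attempt is still coming.
        [ "$n" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example from the FOS debug shell (storage node IP from this thread):
#   mkdir -p /images
#   retry 3 5 mount -t nfs 10.59.181.12:/images /images
```

If the output is consistently “attempt 2: ok”, that points at something on the path that rejects only the first connection rather than at the export itself.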
@george1421 Here is a pic of the first command… the 2nd command brought back an empty result on the surface:
The storage node IP is 10.59.181.12.
@jflippen OK, on this target computer, where you are at the FOS Linux command prompt, what are the results of these commands:
ip addr show
lspci -nn|grep -i net
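A variation on `ip addr show` that is easier to read in a photo: `ip -o -4 addr show` prints one line per IPv4 address. The sketch below assumes that one-line format (index, interface name, `inet`, address/prefix, …) and filters out loopback. `ipv4_addrs` is a hypothetical helper, not part of FOS.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOS): read `ip -o -4 addr show` output
# on stdin and print "interface address/prefix" for each non-loopback
# IPv4 address. Exits 1 if none is found (e.g. DHCP never completed).
ipv4_addrs() {
    # Field 2 is the interface name, field 3 the family keyword,
    # field 4 the address with its prefix length.
    awk '$3 == "inet" && $2 != "lo" { print $2, $4; found = 1 } END { exit !found }'
}

# Example in the FOS debug shell:
#   ip -o -4 addr show | ipv4_addrs || echo "no IPv4 address assigned"
#   lspci -nn | grep -i net
```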
EDIT: Also, from your picture, what is the storage node IP address? (It’s off the screen.)
@george1421 They are just load sharing. The devices being imaged are on a different subnet than the storage nodes and fog server. Network team said there should be no blocking between sites on the firewall or the switch configs.
@Sebastian-Roth Here’s what I get when I run the fog command during a debug session on my Surface. It doesn’t even make it past the first step.
@jflippen How are your storage nodes set up? Are you using the location plugin to direct clients to specific storage nodes? -OR- are they set up just for load sharing (i.e. the first one available gets the imaging job)?
I see from your picture that your FOG server is at 10.59.10.12 and the storage server you are using is at 10.59.181.12. Are they on different subnets? What subnet is the PXE-booting client on when it attempts to connect to 10.59.181.12? Do you have any screening/firewall routers in between that might be blocking the NFS mount?
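None of the posts state the subnet masks, so any prefix length here is an assumption. A minimal sketch that checks whether two IPv4 addresses fall in the same network for a given prefix length; `ip_to_int` and `same_subnet` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split "a.b.c.d" into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: same_subnet IP1 IP2 PREFIXLEN
# Exits 0 if both addresses share the same network under that prefix.
same_subnet() {
    a=$(ip_to_int "$1"); b=$(ip_to_int "$2")
    # Build the netmask, trimmed to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

For example, `same_subnet 10.59.10.12 10.59.181.12 24` fails while the same call with `16` succeeds, and the client at 10.153.2.65 is in a different /16 from both servers. Being on different subnets is only a problem if something in between filters NFS: the mount needs TCP/UDP 2049, and an NFSv3 mount also needs rpcbind (port 111) and mountd.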
However the LAG setup didn’t seem to have issues previous to the 1.5.6 update.
Well, that’s a good point. On the other hand, we haven’t changed that bit of FOG in a while, and I can’t yet see how it would be related to anything we did change in the last few months.
Let’s see what we can figure out. Please schedule a debug deploy task on the machine you last saw this issue on, boot it up, and hit ENTER twice to get to the shell. Now start the process and step through it until you hit the error. It should throw you back to the shell. Now run the following commands and take a picture of the output you get:
ls -al /images
mount
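To check the `mount` result without reading the whole table, here is a minimal sketch that looks /images up in /proc/mounts (assuming FOS exposes it, as Linux normally does) and prints the NFS source mounted there. `nfs_source_for` is a hypothetical helper, not part of FOS.

```shell
#!/bin/sh
# Hypothetical helper (not part of FOS): read /proc/mounts-style lines on
# stdin (source, mount point, fs type, options, ...) and print the source
# of any NFS mount on the given directory. Exits 1 if there is none.
nfs_source_for() {
    awk -v dir="$1" '$2 == dir && $3 ~ /^nfs/ { print $1; found = 1 } END { exit !found }'
}

# Example in the FOS debug shell:
#   ls -al /images
#   nfs_source_for /images < /proc/mounts || echo "/images is not an NFS mount"
```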
@Sebastian-Roth maybe… I know our network team has been tightening up security. Our main FOG server and our storage nodes are all set up with link aggregation (NIC teaming with LACP) with dual Gigabit ports. However the LAG setup didn’t seem to have issues previous to the 1.5.6 update.
@jflippen That’s really strange. “Unable to locate image store” usually means that the system is not able to properly mount the NFS share on the FOG server. That shouldn’t happen at random! Maybe you have some kind of issue in your network?