Fog 1.5.0 Storage Node Problems with PXE booting?
-
I recently deployed a new storage node to one of our remote locations, which is connected via MPLS, as I always do when spinning up a new site.
The problem I am having is shown in this VIDEO.
Any help would be greatly appreciated, I have been struggling with this for hours.
Thanks.
-
If I had to guess, either the disk is set up for GPT, or there’s no OS on that disk?
I’m only guessing here as I don’t have the environment readily available.
-
Watching more of the video, I think the 4xx error you saw is it attempting to redirect. It’s almost like the machine is getting information from multiple different sources.
One side got an IP on 10.x.x.x.
Another got an IP on 192.168.x.x. (That’s just from the FOS system once it finally made it in there. I didn’t pay close enough attention to the PXE DHCP responses.)
-
@tom-elliott
Yes, the 192.168.x.x is the management FOG server.
-
@Greg-Plamondon Although the IP/network addresses seemed a bit mixed up at first I watched the video a couple of times and I guess they’re not.
The local on-site subnet is 10.30.100.x/24, with the local FOG server sitting outside of that subnet at 10.30.40.39. Gateway information is fine for the client, so I don’t see why this shouldn’t work. And then we have the master node at 192.168.10.238.
Let’s start with the exit type. When no job is scheduled for the client it comes up with the FOG menu and tries to boot from hard disk using the GRUB method. This does not work for your client (not exactly sure why) but you can try different exit types (either as a global setting or specific to that client in the host’s settings). Try the simple EXIT type as it works quite often.
So now to when you have a task scheduled. To me this looks like your client is receiving proper address information from the FOG DHCP server, but when contacting the FOG servers via HTTP it gets different answers. One time we see an HTTP 4xx error. Does this happen often? Take a look at the main server’s Apache error and access logs to check whether this is really the case.
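One way to do that matching, as a minimal sketch: the helper below (http_errors is a made-up name, not part of FOG) counts 4xx and 5xx responses per client address in an Apache access log. It assumes the default common/combined log format, where the status code is the ninth whitespace-separated field. The log path in the comment is also an assumption and differs between distributions.

```shell
#!/bin/sh
# Summarise HTTP 4xx/5xx responses per client IP and status code.
# Assumes Apache common/combined log format:
#   field 1 = client address, field 9 = HTTP status code.
http_errors() {
    awk '$9 ~ /^[45][0-9][0-9]$/ { print $1, $9 }' "$1" \
        | sort | uniq -c | sort -rn
}

# Example call (path is an assumption, e.g. on a CentOS-style install):
#   http_errors /var/log/httpd/access_log
```

Each output line is a count followed by the client address and the status code, so a PXE client that keeps getting 4xx answers should stand out at the top.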
Then when it gets further on the second try the client receives a new IP address (10.30.100.190 instead of 10.30.100.191). Not saying that this has to be an issue but I find it strange. Usually DHCP servers remember leases and hand out the same IP to the same client (MAC address) for a certain amount of time.
The error “Failed to get an IP via DHCP! Tried on interface(s): eth0” is a bit misleading. Behind the scenes the client tries to reach out to the FOG server via HTTP/S and fails with that same message. But we changed that message months ago and so I think your storage node might not be FOG version 1.5.0!!
-
@sebastian-roth said in Fog 1.5.0 Storage Node Problems with PXE booting?:
DHCP! Tried on interface(s): eth0” is a bit misleading. Behind the scenes the client tries to reach out to the FOG server via HTTP/S and fails with that same message. But w
Thanks for the reply. The exit works fine; the client in question doesn’t have an OS installed, and I think that’s why we are seeing that.
I will look into the Apache logs on the server and see what I can find. I have also noticed that some clients from time to time take a second attempt at PXE booting to initiate the scheduled task.
Here is the git log -1 output for the storage node:
[root@30fogserver fogproject]# git log -1
commit e7164d0f844e2ad47d16e587f441e1c0e9f28d4a
Author: Tom Elliott <tommygunsster@xxxxx.com>
Date: Mon Feb 26 13:48:02 2018 -0500

    Push up 1.5.0
[root@30fogserver fogproject]#
-
I don’t know if this has anything to do with my current issues… but every now and then when accessing different menu options on the FOG server I get this:
If I refresh the page it takes me back to the FOG server console page…
-
@greg-plamondon Hint: When you get the HTTP 500 errors, inspect your Apache error log file to see if Apache or PHP threw an error.
-
@greg-plamondon I see 30fogserver in the console output and 10fogserver in your browser window. Sure you don’t mix up things? Which is which?
The message “Failed to get an IP via DHCP! Tried on interface(s): eth0” points to the init files (/var/www/fog/service/ipxe/init*) possibly being old although the node was installed with FOG 1.5.0. I remember @Tom-Elliott saying that some kernel images didn’t update when pushing out 1.5.0. Possibly that was the case for the inits as well. It’s fixed for the kernels and so you might try downloading the latest init binaries again:
sudo -i
cd /var/www/fog/service/ipxe/
mv init.xz init.xz.bak
mv init_32.xz init_32.xz.bak
wget https://fogproject.org/inits/init.xz
wget https://fogproject.org/inits/init_32.xz
chown fog init*
-
@sebastian-roth said in Fog 1.5.0 Storage Node Problems with PXE booting?:
10fogserver
Sorry, I should have been more clear… the 10fogserver is management, 30fogserver is the node we are having issues with.
I was only mentioning the browser issue because I thought it may be causing the PXE problem.
I will try the new init files and update.
Thanks!
-
I updated the inits and the problem persists.
-
@Greg-Plamondon Do you still see the exact same error message?? Please post a picture of the screen when it boots into FOS and tries to get an IP three times in a row.
-
@sebastian-roth
Here is a new video
-
At the point where it tries three times, it looks like it’s picking up an IP address. Where it appears to be failing is in contacting the FOG server to prove it has a path. (Hint: we need to get a few extra echo statements in there to say “trying to contact the FOG server at IP address xxxx”.) Now I don’t know if it’s the storage node it tries to connect to or the master FOG server. But I feel at this point it’s failing to communicate, so it tries 3 times and gives up.
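To illustrate the kind of extra echo statements meant here, a rough sketch follows. It is not the actual FOS init code: check_fog_web is a hypothetical helper, it uses curl rather than whatever the init really uses, and the URL in the comment (including the /fog/ path) is an assumption based on the addresses seen in this thread.

```shell
#!/bin/sh
# Try to reach a FOG web URL a few times, printing the target on every
# attempt so it is obvious which server the client is talking to.
check_fog_web() {
    url=$1
    tries=${2:-3}
    i=1
    while [ "$i" -le "$tries" ]; do
        echo "Attempt $i of $tries: contacting $url"
        # -f: treat HTTP errors (4xx/5xx) as failure, -s: silent
        if curl -fs -o /dev/null --max-time 10 "$url"; then
            echo "Reached $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "Unable to reach $url after $tries attempts"
    return 1
}

# Example call from a machine on the client subnet (URL is an assumption):
#   check_fog_web http://10.30.40.39/fog/index.php
```

Running it once against the storage node and once against the master server would show which of the two the client subnet cannot reach.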
-
@Greg-Plamondon Thanks for the video! Again I see the message “Failed to get an IP via DHCP! Tried on interface(s): eth0” which is not what I’d expect with the current init files. I just downloaded and extracted the init.xz file and it surely has the current message “Either DHCP failed or we were unable to access ${web}/index.php for connection testing.” in it!
So please follow these steps:
sudo -i
cd /var/www/fog/service/ipxe/
mv init.xz init.xz.bak
mv init_32.xz init_32.xz.bak
wget https://fogproject.org/inits/init.xz
wget https://fogproject.org/inits/init_32.xz
chown fog init*
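To check which of the two messages an init file on the node actually contains, something like the following sketch might help. init_message is a made-up helper name. It assumes the xz tool is installed and that the message text is stored uncompressed inside the decompressed image, so a plain text search can find it.

```shell
#!/bin/sh
# Report which DHCP failure message an xz-compressed init image contains.
# Prints "old", "current" or "unknown".
init_message() {
    # xz -dc decompresses to stdout; grep -a treats binary data as text.
    if xz -dc "$1" | grep -aq 'Failed to get an IP via DHCP'; then
        echo old
    elif xz -dc "$1" | grep -aq 'Either DHCP failed or we were unable to access'; then
        echo current
    else
        echo unknown
    fi
}

# Example call on the storage node:
#   init_message /var/www/fog/service/ipxe/init.xz
```

If this prints "current" but the client still shows the old message, the client is probably loading its init from somewhere other than this file.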
-
@sebastian-roth
The kernel panics with Kernel too old.
Video
-
@Greg-Plamondon Sorry for the long delay. You want to update the kernel images as well. Similar commands but download URL is https://fogproject.org/kernels/bzImage
Typing on my mobile here so this is off the top of my head. Will check and add the full command set soon.
-
@Greg-Plamondon Here you go:
sudo -i
cd /var/www/fog/service/ipxe/
mv bzImage bzImage.bak
mv bzImage32 bzImage32.bak
wget https://fogproject.org/kernels/bzImage
wget https://fogproject.org/kernels/bzImage32
chown fog bzImage*
But I am still wondering about the version of FOG. Somehow feels like there is something wrong?! Maybe delete and re-clone the git repo?
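The backup-and-download steps above leave the node without a kernel if a download fails halfway. A slightly more defensive variant could look like the sketch below. refresh_file is a hypothetical helper, not a FOG tool. It downloads to a temporary file first and only swaps it in when wget succeeded and the file is not empty.

```shell
#!/bin/sh
# Download URL to DEST, keeping the previous DEST as DEST.bak.
# The existing file is only replaced if the download succeeded.
refresh_file() {
    url=$1
    dest=$2
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    if wget -q -O "$tmp" "$url" && [ -s "$tmp" ]; then
        if [ -e "$dest" ]; then
            cp -p "$dest" "${dest}.bak"
        fi
        # mktemp creates the file with mode 0600; make it world-readable
        # again so the web server can serve it.
        chmod 644 "$tmp"
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example calls (run as root in /var/www/fog/service/ipxe/):
#   refresh_file https://fogproject.org/kernels/bzImage bzImage
#   refresh_file https://fogproject.org/kernels/bzImage32 bzImage32
#   chown fog bzImage*
```

The chown is left to the caller, as in the original commands.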
-
@sebastian-roth
Here is a new VIDEO with the new kernels and inits.
Just so I am clear, what is the git repository link I should be using?
-
@Greg-Plamondon The new video is interesting as it seems to show that it finds the interface eth0 but is not able to bring it up within 35 seconds. What kind of client model is this? Are you able to FOS boot this very same machine at a different location? I kind of doubt it, but what do I know.
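For testing the link by hand from a debug shell on the client, a minimal sketch along these lines could be used. wait_for_link is a made-up helper and not what the FOS init does internally. It polls the carrier file that Linux exposes under /sys/class/net, and the 35 second default only mirrors the timeout seen in the video. The SYSFS_NET override is purely there to make the function testable.

```shell
#!/bin/sh
# Wait until a network interface reports carrier (link up), or time out.
wait_for_link() {
    iface=$1
    timeout=${2:-35}
    root=${SYSFS_NET:-/sys/class/net}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Reading carrier fails while the interface is administratively
        # down; treat that the same as "no link".
        state=$(cat "$root/$iface/carrier" 2>/dev/null) || state=0
        if [ "$state" = 1 ]; then
            echo "$iface: link up after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$iface: no link after ${timeout}s"
    return 1
}

# Example call:
#   wait_for_link eth0 60
```

If the link only comes up with a longer timeout, that would point at slow link negotiation (for example on the switch port) rather than at the FOG server.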