Imaging works in VM and not on bare metal
-
I have been working for days to get this server working the way I want. I have several bootable ISOs working fine on VMs hosted on VMware ESXi. When I try to do the same thing on bare metal, it says "Reason: mount: mounting x.x.x.x:/images/dev on /images failed: Connection timed out."
Again, the VMs do not have this issue, just the bare metal. I can boot things like System Rescue CD from it and that works just fine; however, that is hosted only on the HTTP server. Not sure what I’m doing wrong, and I have been at this for days.
System: Debian 10 fresh install only for FOG
FOG: 1.5.9 fresh install
The VM hosting it has 8 2 GHz cores and 8 GB of RAM, on VMware 6.5
Tried too many things to list
The switch is a Cisco SG 200-08 with spanning tree disabled
Giving up for the night and will be back on tomorrow after work.
Please help.
Claw
-
@claw22000 said in Imaging works in VM and not on bare metal:
When I try to connect to do the same thing it says "Reason: mount: mounting x.x.x.x:/images/dev on /images failed: Connection timed out."
Again the VMs do not have this issue just the bare metal.
Not sure I understand this. The mount error on /images/dev happens when you want to capture an image from a host, right? Does that error happen only when you capture from bare metal, and not when capturing an image from a VM?
I would look at the firewall rules on the VM host then.
-
@Claw22000 I’m with @Sebastian-Roth on this.
Typically, “Connection timed out” means it cannot reach the other side for whatever reason.
There are many possible reasons, but when one side works and the other doesn’t, it’s most commonly firewall related.
My guess as to why VMs work but bare metal doesn’t is that your VMs are running on the same network segment as the FOG server. Or, if you have VLANs, the VM network VLAN is allowed to reach the FOG server, whereas your physical network VLAN is not.
Just my thoughts. There are many other reasons this could be the case too, for example:
FOG is a VM on a “NAT” network, and the same goes for the other VMs. The NAT network would only work within itself. External sources, without routing, would not be able to reach the FOG network.
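A quick way to rule the "different segment" theory in or out is to compare the addresses involved. The following is only a minimal sketch: the helper names `ip_to_int` and `same_subnet` are made up for illustration, the addresses in the usage note are examples, and it assumes plain IPv4 with a known prefix length.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    # Split on dots into this function's positional parameters,
    # then restore the field separator.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed (exit 0) if two IPv4 addresses fall in the
# same network for the given prefix length, e.g. 24 for 255.255.255.0.
same_subnet() {
    ip1=$(ip_to_int "$1")
    ip2=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
}
```

For example, `same_subnet 192.168.1.10 192.168.1.50 24 && echo same` compares a client address against the server address with a /24 mask. If the bare metal client and the FOG server do not match, traffic between them has to be routed, and a router or firewall is then in the path.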
-
@sebastian-roth
It happens only on the bare metal, and it doesn’t work on capture or deploy. I tried to take an image captured on the VM and deploy it, and also tried capturing from a machine I wanted to image.
I am also trying bare metal to bare metal today. Everything boots fine over PXE and is able to request registration, but cannot image. I tried pulling the image from the same VM that worked before and it will not image. When I try to boot a Kubuntu install I set up, it will PXE boot but will not mount the /images folder. The Kubuntu machine is bare metal.
Not sure what I’m doing wrong. This is a fresh install; all I have done is add a few menu items for the following:
Kubuntu install
Ubuntu Server install
Windows 10 install
System Rescue CD 8
The System Rescue CD seems to work, though. It is, however, hosted only over HTTP; it doesn’t need NFS since it loads the image into RAM.
Not sure what I’m doing wrong. I’m off the rest of the night, so if anyone wants to help, dive in. All of it is in a test environment so I can document how it works for when I create an actual server, so I am not worried about letting someone dig around in it.
If you are willing, of course.
Claw
-
@tom-elliott
If it boots off of the HTTP sources and not the NFS, could that still be a firewall issue? I use pfSense and have never had a single issue with it. I have been running it for close to a decade and have hosted many kinds of servers over the years, so it would be strange to see it causing an issue now.
I’m starting to wonder if it’s an issue with the hardware (bare metal side). I have 2 old Lenovo M72E systems I’m using for this experiment. Is it possible that the NIC is the cause?
Claw
-
@claw22000 said in Imaging works in VM and not on bare metal:
If it boots off of the HTTP sources and not the NFS, could that still be a firewall issue?
Sure! HTTP is port 80 (or 443 for HTTPS) and NFS uses completely different ports. So a correctly configured firewall would not let both through unless it was told to for a good reason.
While it is possible the NIC in the Lenovo M72E could cause an issue, I don’t think it would cause the “mount: mounting x.x.x.x:/images/dev on /images failed: Connection timed out.” error. This is because on boot up the FOS inits do a quick network check to see if the FOG server is reachable. If the NIC had a general issue, that check would fail and it would bail out much earlier.
If you need further help with this, we need to know what your setup really looks like. Why is a pfSense between the FOG server and the machines you PXE boot? Are they in two different subnets? And if it’s not pfSense, maybe there is a local firewall on the VM host?
If you have Linux installed on one of your PCs, you can just boot that up and do NFS mount testing: https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_NFS (I know this article is pretty dated, but you will find useful information in there too)
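To make the port difference concrete, here is a rough sketch of a TCP reachability check that could be run from a Linux machine on the bare metal side. It assumes bash and the coreutils `timeout` command are available; `port_status` and `FOG_SERVER` are names made up for this example. Port 80 is HTTP, 111 is rpcbind (the portmapper) and 2049 is NFS.

```shell
# Hypothetical helper: report whether a TCP connection to host $1, port $2
# can be opened within 3 seconds, using bash's /dev/tcp pseudo-device.
# "unreachable" covers both refused and silently dropped connections.
port_status() {
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo open
    else
        echo unreachable
    fi
}

# Example (FOG_SERVER is a placeholder for your server's address):
# for port in 80 111 2049; do
#     echo "$port: $(port_status "$FOG_SERVER" "$port")"
# done
```

If 80 comes back open while 111 or 2049 does not, something between the two machines is treating the NFS traffic differently. On a client with the NFS tools installed, `rpcinfo -p` and `showmount -e` against the server give a similar picture, including the mountd port, which is not fixed by default.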
-
@sebastian-roth said in Imaging works in VM and not on bare metal:
If you need further help with this, we need to know what your setup really looks like. Why is a pfSense between the FOG server and the machines you PXE boot? Are they in two different subnets? And if it’s not pfSense, maybe there is a local firewall on the VM host?
pfSense is not between the systems (hosts) and the server. It is only a firewall to the outside world.
Everything is on the same subnet.
I followed the instructions to make sure the firewall is disabled and tried the debug test.
It came back with:
mount: mounting x.x.x.x:/images on /images failed: connection timed out
Is it possibly Debian 10? Do I need a different OS? I typically use Ubuntu Server, but several spots online say it doesn’t work and recommend using Debian.
In the end I don’t care which OS it is. Since it’s inside my network, it doesn’t need a firewall. I typically disable them on all my servers, since only the ports that need to be reachable from outside are open, and none of them are SSH or anything like it.
-
@claw22000 said in Imaging works in VM and not on bare metal:
came back…
mount: mounting x.x.x.x:/images on /images failed: connection timed out
But it does work from local VMs, right? Can you do the mount test in a VM as well?
I have used FOG on Debian 10 and others have too, so I would say Debian is not to blame in general.
-
@sebastian-roth
What I did to test this further was to install and recreate the FOG server on a bare metal machine, then try it the other way round. Neither the bare metal machine nor the VM can image on this. The only way it works is VM to VM. Tonight after work I will turn on the VM and do the test again, VM to VM.
I appreciate all the suggestions, as I have no idea what the problem could be.
Claw
-
@sebastian-roth
Tried VM to VM and was able to use the mount command to connect; I just can’t connect from bare metal to the server.
I also tried reinstalling using Ubuntu, and it tested exactly the same: VM to VM can connect, and bare metal to VM cannot.
Claw
-
@Claw22000 So what exactly is between bare metal and the FOG server? Host firewall in the VM server? Layer 7 switch with application layer gateway/firewall?
-
@sebastian-roth
My setup is this:
Hypervisor: VMware ESXi, 6.5.0, 14320405
Model: PowerEdge R815
Processor Type: AMD Opteron Processor 6276
Logical Processors: 64
RAM: 186 GB
2 NICs are in use, plus 1 DRAC.
1 NIC goes directly to the modem and is only accessed by pfSense.
The other NIC is shared across all the VMs, with pfSense feeding the internet.
The DRAC and internal network run to an SG 200-08 8-Port Gigabit Smart Switch (spanning tree is disabled).
This runs to the other side of the lab to a Netgear JGS516 16-port unmanaged switch.
All computers in the house are then wired to this switch.
The VM for the FOG server has:
4 CPUs across 2 sockets with at least 1 GHz reservation
8 GB RAM with at least 4 GB reservation
HD is thick provisioned with 300 GB. I will increase this once I document how this works, and it will get a dedicated drive.
NIC adapter type is VMXNET 3
Install steps of current server:
sudo -i
wget https://github.com/FOGProject/fogproject/archive/1.5.9.tar.gz
tar -xzvf 1.5.9.tar.gz
rm 1.5.9.tar.gz
cd fogproject-1.5.9/bin
./installfog.sh
click button in browser
press enter in terminal
log in
change default password
create new user for self
chmod -R 777 /images
chown fogproject:nogroup /images
Info after all this is done:
sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

/etc/hosts.deny
# /etc/hosts.deny: list of hosts that are _not_ allowed to access the system.
# See the manual pages hosts_access(5) and hosts_options(5).
#
# Example: ALL: some.host.name, .some.domain
# ALL EXCEPT in.fingerd: other.host.name, .other.domain
#
# If you're going to protect the portmapper use the name "rpcbind" for the
# daemon name. See rpcbind(8) and rpc.mountd(8) for further information.
#
# The PARANOID wildcard matches any host whose name does not match its
# address.
#
# You may wish to enable this to ensure any programs that don't
# validate looked up hostnames still leave understandable logs. In past
# versions of Debian this has been the default.
# ALL: PARANOID

/etc/exports
/images *(ro,sync,no_wdelay,no_subtree_check,insecure_locks,no_root_squash,insecure,fsid=0)
/images/dev *(rw,async,no_wdelay,no_subtree_check,no_root_squash,insecure,fsid=1)

ls -al /
drwxr-xr-x 23 root root 4096 Mar 20 22:50 .
drwxr-xr-x 23 root root 4096 Mar 20 22:50 ..
lrwxrwxrwx 1 root root 7 Jul 31 2020 bin -> usr/bin
drwxr-xr-x 3 root root 4096 Mar 20 18:57 boot
drwxr-xr-x 2 root root 4096 Mar 20 17:15 cdrom
drwxr-xr-x 17 root root 4000 Mar 20 18:51 dev
drwxr-xr-x 102 root root 4096 Mar 21 15:55 etc
drwxr-xr-x 4 root root 4096 Mar 20 22:41 home
drwxrwxrwx 5 fogproject nogroup 4096 Mar 20 23:15 images
lrwxrwxrwx 1 root root 7 Jul 31 2020 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Jul 31 2020 lib32 -> usr/lib32
lrwxrwxrwx 1 root root 9 Jul 31 2020 lib64 -> usr/lib64
lrwxrwxrwx 1 root root 10 Jul 31 2020 libx32 -> usr/libx32
drwx------ 2 root root 16384 Mar 20 17:14 lost+found
drwxr-xr-x 2 root root 4096 Jul 31 2020 media
drwxr-xr-x 2 root root 4096 Jul 31 2020 mnt
drwxr-xr-x 3 root root 4096 Mar 20 22:41 opt
dr-xr-xr-x 291 root root 0 Mar 20 18:51 proc
drwx------ 5 root root 4096 Mar 21 07:27 root
drwxr-xr-x 36 root root 1140 Mar 21 15:51 run
lrwxrwxrwx 1 root root 8 Jul 31 2020 sbin -> usr/sbin
drwxr-xr-x 6 root root 4096 Mar 20 18:52 snap
drwxr-xr-x 4 root root 4096 Mar 20 22:41 srv
-rw------- 1 root root 4294967296 Mar 20 17:18 swap.img
dr-xr-xr-x 13 root root 0 Mar 20 18:51 sys
drwxr-xr-x 5 fogproject root 4096 Mar 20 22:50 tftpboot
drwxr-xr-x 2 root root 4096 Mar 20 22:50 tftpboot.prev
drwxrwxrwt 15 root root 4096 Mar 21 15:39 tmp
drwxr-xr-x 14 root root 4096 Jul 31 2020 usr
drwxr-xr-x 14 root root 4096 Mar 20 22:34 var

ls -al /images
-rwxrwxrwx 1 fogproject root 0 Mar 21 07:20 .mntcheck
drwxrwxrwx 3 fogproject root 4096 Mar 20 23:15 dev
drwxrwxrwx 2 fogproject root 4096 Mar 20 22:50 postdownloadscripts
drwxrwxrwx 2 root root 4096 Mar 20 23:15 win10basic

ls -al /images/dev
drwxrwxrwx 3 fogproject root 4096 Mar 20 23:15 .
drwxrwxrwx 5 fogproject nogroup 4096 Mar 20 23:15 ..
-rwxrwxrwx 1 fogproject root 0 Mar 20 22:50 .mntcheck
drwxrwxrwx 2 fogproject root 4096 Mar 20 22:50 postinitscripts
Hope this helps. I really appreciate your help. Sorry it takes so long to get back sometimes; I work 50 - 60 hours a week, so I get wrapped up a lot.
Claw
-
@Claw22000 Can’t see anything obvious causing the described issue.
When you boot up a machine (bare metal), are you able to ping the FOG server and access its web interface in the browser from that machine?
-
@sebastian-roth Yes, I can access the web interface and ping the server. I’m stumped. I would be glad to let you see it yourself if you have time. I’m a mid to low level Linux user, so I might be missing something obvious.
Claw
-
@sebastian-roth OK, some progress today. I tried the following commands and the mounts failed:
mkdir /images
mkdir /images/dev
mount -o nolock,proto=tcp,rsize=32768,intr,noatime x.x.x.x:/images /images
mount -o nolock,proto=tcp,rsize=32768,intr,noatime x.x.x.x:/images/dev/ /images/dev
However, the following commands worked:
mount -o nolock,proto=udp,rsize=32768,intr,noatime x.x.x.x:/images /images
mount -o nolock,proto=udp,rsize=32768,intr,noatime x.x.x.x:/images/dev/ /images/dev
I am able to list the files in the folder and it works correctly.
So now the question is why TCP doesn’t work on the bare metal but does work on the VM.
Suggestions?
Claw
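For repeating this comparison on other machines, the manual test can be wrapped in a small script. This is only a sketch: `nfs_opts` and `compare_transports` are names invented here, the mount options are the ones from the commands in this post, and it has to be run as root with your own server address, export, and mount point.

```shell
# Hypothetical helper: build the mount options used in the manual test
# for a given transport ("tcp" or "udp").
nfs_opts() {
    echo "nolock,proto=$1,rsize=32768,intr,noatime"
}

# Hypothetical helper: try one export over TCP and then UDP and report
# which transport mounts. Arguments: server address, export path, mount point.
compare_transports() {
    server=$1
    export_path=$2
    mountpoint=$3
    mkdir -p "$mountpoint"
    for proto in tcp udp; do
        if mount -o "$(nfs_opts "$proto")" "$server:$export_path" "$mountpoint"; then
            echo "$proto: mounted"
            # Unmount again so the next transport starts from a clean state.
            umount "$mountpoint"
        else
            echo "$proto: failed"
        fi
    done
}
```

Called as `compare_transports x.x.x.x /images /images`, it prints one line per transport. Running it from a VM and from a bare metal machine gives a side-by-side record of where TCP fails.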
-
@claw22000 So I presume there is a firewall between the FOG server VM and the bare metal machines.
The reason VM to VM works is that they reside on the same side of the switch, within the same subnet as the FOG VM. Your firewall likely allows port 80/443 from bare metal to the FOG VM network. UDP may be fully allowed on the firewall? I’m not 100% sure of the network layout, but this seems like a firewall issue. The only reason I can think of for UDP working is that maybe an assumption was made that the FOG server needed multicast capabilities.
-
I appreciate the help. When you say firewall, are you talking about my pfSense box or about something that resides on the FOG server?
Claw
-
@claw22000 said in Imaging works in VM and not on bare metal:
I appreciate the help. When you say firewall, are you talking about my pfSense box or about something that resides on the FOG server?
From what you have posted so far (Debian 10 and the output of the iptables command), I would not think this is an issue on the FOG server itself.
While I would not think the SG 200-08 (Cisco, right?) or the Netgear JGS516 block such traffic, it’s still worth trying to rule those out one by one. Please connect one of the bare metal machines directly to the SG 200-08 and see if that makes a difference. If NFS in TCP mode still doesn’t work, could you then take the Cisco switch out of the setup by connecting the Netgear uplink cable to your ESXi host directly? Just for a quick test, I mean.
-
@claw22000 The unfortunate part is we don’t know. Could it be the pfSense box? Yes. Could it be a switch misconfiguration? Possibly.
Based on the information you’ve given us so far, though, it really seems to be a firewall type thing. Does this mean it absolutely is? No. As @sebastian-roth has alluded to, we have to take out and replace variables to more definitively get to the root of the issue.
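One way to narrow the variables down is to watch on the FOG server whether the client's TCP packets arrive at all while it retries the mount. A rough sketch, assuming tcpdump is installed on the server: `nfs_capture_filter`, `IFACE` and `CLIENT_IP` are placeholders invented for this example, and the filter only covers rpcbind (111) and NFS (2049), not the mountd port.

```shell
# Hypothetical helper: build a tcpdump filter matching traffic to or from
# one client on the rpcbind (111) and NFS (2049) ports, TCP and UDP alike.
nfs_capture_filter() {
    echo "host $1 and (port 111 or port 2049)"
}

# Example, run as root on the FOG server while the client retries the mount.
# IFACE and CLIENT_IP are placeholders for the interface name and client address:
# tcpdump -n -i "$IFACE" "$(nfs_capture_filter "$CLIENT_IP")"
```

If nothing from the client shows up over TCP, the packets are being lost before they reach the server. If they do show up and the server replies, the replies are being lost on the way back. Either result points at which side of the path to take apart next.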
-
@Tom-Elliott From the description so far I wouldn’t think that pfSense is connected in between.