@Sebastian-Roth I just realized I never responded to this. Ooops!
I would be happy to look at this if you are still up for it.
Thank you. I understand the complexity issue and removing the DB manually is easy enough.
I installed a Storage Node on Ubuntu 18.04 LTS and was surprised to see that it installed MySQL server (or rather, MariaDB). As near as I can tell, the nodes don’t actually use the local database, and instead use the FOG master server’s database.
Am I missing something? Does the Node need that local database, or can I just disable the service?
I like the /etc/exports.d approach the best.
My age is showing: I didn’t even know this functionality had been added to exportfs or I’d have been using it.
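The /etc/exports.d approach can be sketched roughly as follows. On nfs-utils versions that support it, exportfs reads every file ending in .exports under /etc/exports.d in addition to /etc/exports, so an installer could keep its entries in a file of its own and never touch the administrator's file. This is a minimal sketch, not what the FOG installer does today; the function names, the fog.exports file name, the /images default, and the export options are all illustrative assumptions.

```shell
#!/bin/sh
# Minimal sketch, assuming nfs-utils reads /etc/exports.d/*.exports.
# Function names, the fog.exports file name and the export options below are
# illustrative assumptions, not what the FOG installer actually writes.

# Hypothetical helper: write FOG's exports to their own drop-in file so the
# administrator-owned /etc/exports is never modified.
write_fog_exports() {
    exports_d="${1:-/etc/exports.d}"   # target directory (overridable for testing)
    images_dir="${2:-/images}"         # FOG image store (assumed default path)

    mkdir -p "$exports_d" || return 1
    # Write to a temp name that does not end in .exports (so exportfs ignores
    # it), then rename it into place in one step.
    tmp="$exports_d/.fog.exports.$$"
    cat > "$tmp" <<EOF || return 1
$images_dir *(ro,sync,no_subtree_check,no_root_squash,insecure,fsid=0)
$images_dir/dev *(rw,sync,no_subtree_check,no_root_squash,insecure,fsid=1)
EOF
    mv "$tmp" "$exports_d/fog.exports"
}

# Re-read all export tables (needs root); skipped if exportfs is not installed.
reload_exports() {
    if command -v exportfs >/dev/null 2>&1; then
        exportfs -ra
    fi
}
```

Running `write_fog_exports && reload_exports` as root would leave /etc/exports alone, and backing FOG's exports out later would just mean deleting that one file and re-running `exportfs -ra`.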
To be fair, there’s a reason it has long been recommended that the FOG server run nothing but FOG.
If I followed that advice for every service I stood up that suggested it, I’d have a pile of VMs, Docker containers, or a slew of physical systems (or a combination of all three) that need to be managed. Each of those solutions brings its own administrative overhead and issues. It’s trading one problem for another.
In an ideal world, this is great. In the real world where resources are limited, I have to make decisions about what can go where, and what’s easiest to manage.
Please see the NFS section of the .fogsettings documentation, and set it to not overwrite your exports file:
I read that document. It says “rebuild” and “rebuilt”. When I have my engine in my car rebuilt, I expect to get back my old engine with new parts in it. I do not expect to get back just the new parts.
The language here matters.
We run nightly backups that keep several weeks of history, so restoring the file took only a few minutes. Only three clients got wedged to the point where they had to be rebooted.
But we were lucky because we only have three dozen or so clients in our lab. In my previous role, I was part of a team that supported roughly 20,000 clients. Imagine the havoc wreaked when an exports file with a few hundred filesystems in it goes away. Batch jobs start hanging, user login sessions hang, and so on. In most cases, it’s just a stale NFS file handle message, but not all applications tolerate an outage of minutes to hours (depending on how long it takes for the issue to get noticed).
I appreciate that it’s a small development team, but this stuff has real impact. Overwriting a file that is not owned by the software and is a critical, shared resource rubs me the wrong way, especially when it takes down infrastructure. And yes, I saw the help, but the use of inconsistent language makes it easy to make faulty assumptions about what the installer is actually doing:
-E --no-exportbuild Skip building nfs file
-P --no-pxedefault Do not overwrite pxe default file
One option speaks of building, another says overwrite. The latter sounds dangerous, the former does not. But, more to the point, sometimes I make mistakes because my life is not 100% error-free, even when I am paying close attention. Software can help stop me from doing something I shouldn’t.
Here is a simple solution for the installer that would mitigate this problem: grep the file for the lines you want to add, and if you don’t see them, use echo with an append:
if ! grep -q '^my line regex' /etc/exports 2>/dev/null
then
    echo "my line" >> /etc/exports
fi
The worst case is that someone has modified the line you were going to add, and you end up adding a duplicate.
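The same check could be wrapped in a small reusable function for each line the installer needs. This is a sketch, not FOG installer code, and the helper name append_if_missing is hypothetical. It matches whole lines literally (grep -x -F) instead of by regex, so characters such as * and ( in export options are not treated as regex syntax, and it adds a newline first if the file does not end with one, so the new entry is not glued onto the last existing line.

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line. -x matches the whole line, -F treats the text
# literally, -q suppresses output (only the exit status matters).
append_if_missing() {
    file=$1
    line=$2

    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0   # already present: leave the file untouched
    fi

    # If the file is non-empty and its last byte is not a newline, add one.
    # ($(...) strips a trailing newline, so the result is empty only when
    # the file already ends with one.)
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi

    printf '%s\n' "$line" >> "$file"
}
```

Exact matching does not remove the caveat above: if someone changed the options on an existing line, the function will still append a second entry for the same path.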
Folks, the FOG installer blindly overwrites /etc/exports with no warning, no chance to interrupt it, and without backing up the old one. This is bad, bad behavior. I can’t emphasize enough how bad this is.
I see now, after I got bit by this, that there are many complaints about this behavior. Well, there should be, because this is terrible. Just adding menu options to installfog.sh is not enough warning. You need to alert people when you are about to overwrite a file that has been customized and give them a chance to stop it, just like every Linux distribution’s package manager on the planet does. A simple checksum can tell you that you are about to do that.
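A checksum guard along those lines might look like this: compare the SHA-256 of the live file with the checksum recorded the last time the installer wrote it, and if they differ, back the file up and ask before overwriting. This is roughly the idea behind dpkg's conffile prompts, but the code is only a sketch under stated assumptions: the function names and the state file are hypothetical, it is not how installfog.sh currently behaves, and it relies on coreutils sha256sum. With no recorded checksum (first install), an existing file is treated as customized.

```shell
#!/bin/sh
# Sketch: refuse to silently clobber a config file that was edited locally.
# Assumption: the installer saves the SHA-256 of each file it writes into a
# state file (hypothetical path). Requires coreutils sha256sum.

# Print only the hex digest of a file.
file_sha256() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# safe_overwrite TARGET NEWFILE STATEFILE
# Returns 0 after installing NEWFILE; non-zero if the user (or a
# non-interactive run) declined to overwrite a locally modified TARGET.
safe_overwrite() {
    target=$1
    new=$2
    state=$3

    if [ -f "$target" ]; then
        current=$(file_sha256 "$target")
        if [ -f "$state" ]; then
            recorded=$(cat "$state")
        else
            recorded=""   # never written by us: treat existing file as customized
        fi

        if [ "$current" != "$recorded" ]; then
            # Always keep a timestamped copy before doing anything else.
            backup="$target.bak.$(date +%Y%m%d%H%M%S)"
            cp -p "$target" "$backup" || return 1
            echo "WARNING: $target differs from what the installer last wrote;" \
                 "saved a copy as $backup" >&2

            # Without a terminal there is nobody to ask, so refuse.
            if [ ! -t 0 ]; then
                echo "Refusing to overwrite $target non-interactively." >&2
                return 1
            fi
            printf 'Overwrite %s? [y/N] ' "$target" >&2
            read -r answer
            case $answer in
                [Yy]*) ;;
                *) return 1 ;;
            esac
        fi
    fi

    cp "$new" "$target" || return 1
    file_sha256 "$target" > "$state"   # remember what we wrote for next time
}
```

For example, `safe_overwrite /etc/exports /tmp/fog.exports /var/lib/fog-installer/exports.sha256` (all three paths hypothetical) would back up and prompt on the first run against a hand-edited /etc/exports, and overwrite without prompting only while the file still matches what the installer last wrote.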
This worked! Thank you so much for the debugging and the resolution.
FOG has been a huge boon to our lab, and getting this last, niggling issue resolved is very much appreciated.
Thanks, @Sebastian-Roth. I will try this out.
I am an experienced C# developer so .NET isn’t the hangup. I just prefer solutions that are natively supported by the environment they run in. There is less complexity, and less complexity means fewer problems such as this one.
That being said, I’d rather get this working on mono since that’s what’s currently distributed and supported. So, yes, any hints the client developer might have would be greatly appreciated.
@Sebastian-Roth Probably Python since it has widespread support. Honestly, I don’t even need the full client. I just need the host renamer and the task manager for scheduling reboots. (And I can survive without the latter. I am using FOG in a small lab environment with only a few dozen clients.)
I tried a source build of Mono 3.10, which is the version mentioned in the links above, but the build failed on a missing symbol internal to the mono package, which was not encouraging. Deep dives into mono source builds are not where I want to be spending my time.
Is it possible to just fix the cert store by copying it from a working Ubuntu system? I know very little about mono.
I’m doing a build from source (or will be, once I resolve the proxy server issues with the build package). Will post an update when I have one.
Having a native client, or at least one written in a high-level language that’s native to Linux, would be nice. Is there a spec for the client API?
@Sebastian-Roth Nope, I was searching on other terms and hadn’t seen this one yet.
I’ll give it a shot.
I see the same problem on Fedora 27, which uses the same mono repo as CentOS 7.
I should add that this is a fresh install of CentOS 7.5.
Server
Ubuntu 16.04.5 LTS
4.4.98 kernel
Fog Server v1.5.5
Client
CentOS 7.5
3.10.0-957 kernel
Fog Client 0.11.16
On client, /opt/fog-service/fog.log at startup:
2/27/2019 12:26 PM Main Overriding exception handling
2/27/2019 12:26 PM Main Bootstrapping Zazzles
2/27/2019 12:26 PM Controller Initialize
2/27/2019 12:26 PM Controller Start
2/27/2019 12:26 PM Service Starting service
2/27/2019 12:26 PM Bus Became bus server
2/27/2019 12:26 PM Bus Emmiting message on channel: Status
2/27/2019 12:26 PM Service Invoking early JIT compilation on needed binaries
------------------------------------------------------------------------------
--------------------------------Authentication--------------------------------
------------------------------------------------------------------------------
2/27/2019 12:26 PM Client-Info Version: 0.11.16
2/27/2019 12:26 PM Client-Info OS: Linux
2/27/2019 12:26 PM Middleware::Authentication Waiting for authentication timeout to pass
2/27/2019 12:26 PM Middleware::Communication Download: http://XXXXX/fog/management/other/ssl/srvpublic.crt
2/27/2019 12:26 PM Middleware::Authentication ERROR: Could not authenticate
2/27/2019 12:26 PM Middleware::Authentication ERROR: Value cannot be null.
Parameter name: authority
Where XXXXX is a FQDN.
My Ubuntu 16.04.5 and 18.04.2 clients are working, so I suspect this may be an issue with mono packaging for CentOS. I used the instructions here:
There are no errors in the apache error log.
Access log shows HTTP OK response codes:
X.X.X.X:80 n.n.n.n - - [27/Feb/2019:12:26:35 -0800] "GET /fog/management/other/ca.cert.der HTTP/1.1" 200 1525 "-" "-"
X.X.X.X:80 n.n.n.n - - [27/Feb/2019:12:26:37 -0800] "GET /fog/management/other/ssl/srvpublic.crt HTTP/1.1" 200 1958 "-" "-"
X.X.X.X:80 n.n.n.n - - [27/Feb/2019:12:36:37 -0800] "GET /fog/management/other/ssl/srvpublic.crt HTTP/1.1" 200 1958 "-" "-"
X.X.X.X:80 n.n.n.n - - [27/Feb/2019:12:36:37 -0800] "GET /fog/service/getversion.php?clientver&newService&json HTTP/1.1" 200 554 "-" "-"
X.X.X.X:80 n.n.n.n - - [27/Feb/2019:12:36:37 -0800] "GET /fog/service/getversion.php?newService&json HTTP/1.1" 200 552 "-" "-"
X.X.X.X:80 n.n.n.n - - [27/Feb/2019:12:34:37 -0800] "GET /fog/management/index.php?sub=requestClientInfo&mac=OMITTED&newService&json HTTP/1.1" 200 552 "-" "-"
Where X.X.X.X is the server IP and n.n.n.n is the client IP. tcpdump shows that the public cert is getting transferred in its entirety. The MAC address for requestClientInfo is correct.