Popularity Contest
-
As I sit here working on making the daily FOG installation tests more reliable, I wondered to myself when I should remove the older OSs (centos7, rhel7, etc). I thought it’d be nice to know how often each OS is used to host a FOG installation. So I propose the following.
I can create a publicly available API that simply accepts a “string” being submitted. Using this, I can put together a popularity contest type thing that will allow us to see what people are installing FOG on. I can build the API (I have scaffolding that I can use to get this done pretty fast) and run it, and make the results publicly viewable.
Within the FOG Installer scripts, I can add a question about participating in the popularity contest. I think this should default to “Yes”. If “Yes” is chosen, a simple curl call is executed sending the output of
lsb_release --all
to the API.
Thoughts on this? I’m willing to put it all together and submit the pull requests for it. Though wanted feedback before I went after it.
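To make the installer question concrete, here is a rough sketch of a prompt that defaults to "Yes". The function name and the prompt wording are made up for illustration, not taken from the FOG installer.

```shell
#!/bin/sh
# Hypothetical helper: ask whether to take part in the popularity contest.
# Pressing Enter (empty input) counts as "yes"; only n/no declines.
ask_participation() {
    # The prompt goes to stderr so that stdout carries only the answer.
    printf 'Participate in the FOG popularity contest? (Y/n) ' >&2
    read -r answer || answer=""
    case "$answer" in
        [Nn]|[Nn][Oo]) echo "no" ;;
        *)             echo "yes" ;;
    esac
}
```

The installer would then only run the curl call when the function prints "yes".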
-
You know the fog server or the web ui does a query (somewhere) to find out the latest version of FOG vs what is installed. Is there a way to send the FOG version number during that query?
What we really need to know (from a demographics standpoint) is
- how many active FOG installs are out there running
- What version of FOG is currently running
- What host OS and version is the #2 running on
As long as we stick to those values only I don’t see any infringement on anyone’s privacy. There would be no way to tie the organization name to the data. What is really needed is prospective data. I know we could add that to FOG moving forward, but we really need to know what is already out there.
The other thing is to only support a distro supported OS. When the distro drops support for an OS then FOG should too. i.e. should FOG still support Ubuntu 12.04 or 14.04? RHEL/CentOS 7 goes EOL on June 30th, 2024; does that mean FOG will support CentOS 7 until 2024? Is that the right thing to do? (asking)
-
George, these are great ideas and questions. This is the beauty of open source. Collaboration.
I’ve revised my thoughts now. Here they are.
- Generate a GUID on the FOG system during installation. This will be unique, but not tied to any particular organization.
- On the fog web login screen, send a request with the GUID, FOG Version, and OS Version details to an API.
- Request should be non-blocking. If it fails or times out, it should not impact the login process.
- API records GUID, FOG Version, OS Version, and a datetime stamp.
These things allow:
- Not double-counting any system due to the GUID.
- The ability to know how many are running in the last month/week whatever.
- How many of each FOG Version is out there currently running
- How many of each OS is out there running FOG.
- Presumably see how long a version lives on average, among other things.
- This protects privacy as the GUID does not tie to any organization, and simply allows us to not double-count entries.
Thoughts?
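A rough sketch of the two mechanical pieces, the one-time GUID and a request that cannot hold up the caller. The proposal puts the request on the web login screen; the sketch uses shell and curl only to show the behavior. The function names, form field names, the 5 second limit, and the idea of keeping the GUID in a file are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: create the GUID once, then reuse it on later runs.
# $1 = path of the file that stores the GUID
get_or_create_guid() (
    guid_file="$1"
    if [ ! -s "$guid_file" ]; then
        # Linux exposes a random UUID here; uuidgen is a fallback where installed.
        if [ -r /proc/sys/kernel/random/uuid ]; then
            cat /proc/sys/kernel/random/uuid > "$guid_file"
        else
            uuidgen > "$guid_file"
        fi
    fi
    cat "$guid_file"
)

# Hypothetical helper: fire-and-forget check-in.
# $1 = GUID, $2 = FOG version, $3 = OS version, $4 = API URL (placeholder)
send_checkin() {
    # --max-time caps the whole transfer; the trailing & means the caller
    # never waits, so a failure or timeout cannot slow anything down.
    curl --silent --max-time 5 \
        --data-urlencode "guid=$1" \
        --data-urlencode "fog_version=$2" \
        --data-urlencode "os_version=$3" \
        "$4" >/dev/null 2>&1 &
}
```

Because the GUID is read back from the file on every later call, the same system always reports the same ID, which is what makes de-duplication possible on the API side.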
-
@Wayne-Workman Nice one. Thanks for bringing it up. While I hate all the tracking stuff going on nowadays I think it can be used in a good and totally anonymous way to help improve things.
FOG already has something like this, but it might need improvement. I can take a look in the next week or so.
I am wondering if generating a GUID is really a good idea as it would make tracking possible again. Don’t like the idea. On the other hand we do need some form of ID to distinguish between…
-
@Sebastian-Roth, @Wayne-Workman, @george1421 I agree.
I think the GUID generation can be done relatively simply. We may already have a GUID generated at install time. (FOG_UUID in FOG Settings I think.) Though I don’t know how generalized it is, and it’s more a random ID, not an actual UUID (in UUID Standard)
So my thoughts:
Keep this “polling/analytics” separate from the main fog code base (Maybe under utils?)
Ask, on next update/install, if the admin would like to send analytics data to fog, and potentially what to send?
Store this section to its own configuration file.
So the prompt would read something like:
Would you like to send analytic data for FOG? (y/N)
We would like to collect the information once a week, on Sundays at 03:00:00. (This allows us to delete entries older than a week.)
The information we would like to collect is:
- Randomly Generated UUID -> This allows us to track the information anonymously, but also not continually add the request to the queue, data is inserted once, then updated as needed.
- FOG Version -> This will allow us to track what versions of FOG are currently in use.
- OS -> This tells us what type of OS you are using to host the fog server.
- OS Version -> This tells us what version of OS you are using to host the fog server
- Timestamp -> Just the date and time this was sent.
We allow the user to select what they’re comfortable sending. I think this is just a better method overall.
The utility should simply create a simple shell script to send the data the user would like in real time. To collect the FOG Version, we should be able to do a lookup of the /var/www/fog/lib/fog/system.class.php and look up FOG_VERSION in the file.
OS and OS Version should look at lsb_release -a if the command exists, or look at /etc/os-release, or fallback as needed. Finally, the utility would then create the crontab to perform the task regularly.
Getting UUID =
echo $(cat /proc/sys/kernel/random/uuid)
We call the configuration file something like /opt/foganalytics/.foganalytics
The script would be called something like /opt/foganalytics/foganalytics
Crontab would look like:
0 3 * * 0 /usr/bin/bash /opt/foganalytics/foganalytics >/dev/null 2>&1
Does this all make sense and sound doable?
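As a rough sketch of the lookups described above: the function names are made up, and the sed pattern assumes the version is declared on a line shaped like define('FOG_VERSION', '1.5.9'); which may not match the real file exactly.

```shell
#!/bin/sh
# Hypothetical helper: extract the version from a line such as
#   define('FOG_VERSION', '1.5.9');
# $1 = path to system.class.php. The line layout is an assumption.
get_fog_version() {
    sed -n "s/.*define(['\"]FOG_VERSION['\"],[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1" | head -n 1
}

# Hypothetical helper: read name and version from an os-release style file.
# The file is sourced in a subshell so its variables do not leak out.
os_from_os_release() (
    . "$1"
    printf '%s %s\n' "$NAME" "$VERSION_ID"
)

# Prefer lsb_release if the command exists, then /etc/os-release, then give up.
get_os_info() {
    if command -v lsb_release >/dev/null 2>&1; then
        printf '%s %s\n' "$(lsb_release -si)" "$(lsb_release -sr)"
    elif [ -r /etc/os-release ]; then
        os_from_os_release /etc/os-release
    else
        echo "unknown unknown"
    fi
}
```

On a FOG server this would be called as get_fog_version /var/www/fog/lib/fog/system.class.php followed by get_os_info.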
-
Great approach Tom. If a crontab is set up to only send once per week, the unique identifier becomes totally unnecessary, because as you suggested, we can delete the collected data on Friday and have all new fresh data by Monday - or keep all the data, and build reports based on the last week only.
One issue I see to this approach: having potentially many thousand fog servers all reporting in at the same time I think isn’t going to work out. I’d suggest the minute value of the Crontab be randomized between 0 and 59. This will spread the load out for the receiving API.
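A minimal sketch of the randomized minute, assuming the installer writes the cron entry itself. The helper names are made up; the script path is the one suggested earlier in the thread.

```shell
#!/bin/sh
# Pick a minute in 0..59 from the kernel's random source.
random_minute() {
    echo $(( $(od -An -N2 -tu2 /dev/urandom) % 60 ))
}

# Hypothetical helper: build the weekly entry with the given minute.
cron_line() {
    printf '%s 3 * * 0 /usr/bin/bash /opt/foganalytics/foganalytics >/dev/null 2>&1\n' "$1"
}

# Each server gets its own minute, chosen once at install time.
cron_line "$(random_minute)"
```

Since the minute is chosen once and written into the crontab, each server keeps a stable slot and the load on the API is spread across the hour.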
-
@developers what branch would you like the PR to be created to?
-
@Wayne-Workman i realize i’m coming into this really late, but why not randomize what day of the week we collect the data as well, and we can just look at the window of “last 7 days” when we look at the data?
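The "last 7 days" window could be sketched like this on the reporting side. The record format (epoch seconds followed by an OS name, one record per line) is purely hypothetical, as is the function name.

```shell
#!/bin/sh
# Hypothetical record format: "<epoch-seconds> <os-name>", one per line.
# Prints only the records received within 7 days before "now".
# $1 = current time in epoch seconds; records are read from stdin.
last_7_days() {
    awk -v now="$1" 'now - $1 <= 7 * 86400'
}
```

Usage would be something like last_7_days "$(date +%s)" < records.txt, and the day each server reports on stops mattering as long as every server reports once per week.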
-
@Junkhacker I think this is the right move.
-
I have the information collection portion of the server completed. Code is currently sitting in my fork of fog-community-scripts.
Here’s what the request looks like right now:
curl -X POST -H "Content-Type: application/json" -d '{"fog_version":"12.34.56","os_name":"Debian","os_version":"10"}' http://fog-popularity-contest.theworkmans.us:/api/records
^ That’s a live link, should work for whoever tries it. If it’s stored ok, the reply is
{"message":"record recorded"}
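For anyone scripting against it, a small sketch of building that body and checking the reply. The helper names are made up, no JSON escaping is done, and treating an exact string match as success is an assumption about how the server answers.

```shell
#!/bin/sh
# Build the JSON body from three values.
# Assumes the values contain no quotes or backslashes (nothing is escaped).
# $1 = FOG version, $2 = OS name, $3 = OS version
build_payload() {
    printf '{"fog_version":"%s","os_name":"%s","os_version":"%s"}' "$1" "$2" "$3"
}

# Succeeds only if the reply is exactly the success message quoted above.
is_recorded() {
    [ "$1" = '{"message":"record recorded"}' ]
}
```

The body from build_payload would be passed to curl with -d, together with the same Content-Type header as in the request above, and the captured reply handed to is_recorded.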
Here’s a screenshot of the first test record saved, and the table layout.
I’ll work on the client side next, will follow the conclusion of the guidance provided by everyone here.
I do plan on adding https to the server side.
As far as presentation of the data, not sure on that yet. Kicking around ideas of rendering some basic graphs in .png format every day and serving those, as well as making a daily database dump available for download.
-
@Wayne-Workman shall we make the stats public, because “why not” and to be transparent about what we collect?
-
@Junkhacker absolutely. In fact, I was planning to make daily dumps of the database available.
-
@george1421 Sorry I have not found enough time to look into this last week. Would have been nice to add this to the upcoming FOG release but I think there are a few questions still to answer, so I won’t rush into it.
Using CRON and keeping it kind of separate from the other code seems like a good idea to me too. Still wondering if using a UUID is better than clearing the DB and waiting for new input once a week. I do like the latter approach but would spread the time even more. Across one whole week! Plus we have people in different timezones, which adds to the randomization as well. So if we don’t need a UUID, is there a point in using one?
Making it an installer question, I am wondering if many people would just take the default No and we would lack a fair amount of stats?! Currently we have a global user count polling and inserting information on the FOG web UI logon screen anyhow. Adding general stats like OS version and brand doesn’t make it a real analytics tool (I don’t think we should name it analytics!) and so I am wondering if there is really a point in asking via the installer as we have done stats for a long time.
-
Got a bit more done on this today.
The comparison between my work and the dev-branch can be seen here.
This is what the settings look like:
This is what the cron job looks like - this is dynamically generated when the installer runs:
This is what the logging on your local FOG Server will look like:
I’ll be using the daily installation tests against my own fogproject fork to test against all the major OSs here shortly. I’ll validate the cron jobs get created correctly on each and then I’ll manually change all of the cron jobs to run every minute. This is so I can test immediately rather than waiting a week.
-
@Wayne-Workman Well done! Haven’t had a chance to test but looks fine on first sight.
May I ask you to rename it all. The word “analytics” has just so much background to it - we don’t do real analytics here…
-
What do you want it to be called?
Tom had used the “analytics” term in his suggestions which is why I named everything like that, as later on it might be expanded to do more than simply collecting versions. Originally I named everything “popularity”
-
@Wayne-Workman, @Sebastian-Roth
I think analytics is the appropriate term, even if the information we’re using is minimal.
I’ve added similar code, slightly revised.
Please take a look and let me know:
https://github.com/FOGProject/fogproject/commit/1435d1b1852e452295359fb045f3bb38f01ba18f
This introduces the information - but I had a typo which is where the “second” link comes in.
AND
https://github.com/FOGProject/fogproject/commit/acfc6445d3b8f9497bf2b63e69e3faec0c51ee51
Thank you,
-
@Tom-Elliott Looks fine to me. Should I submit a pull request to the dev-branch since you’ve already added this to the 1.6 branch?
-
@Wayne-Workman I think that’d work best, yes.
-
I’ve validated that the installation code as I’ve written it works against all the major distributions - and I’ve also discovered my daily test for Ubuntu 20 was really Ubuntu 18 (fixed now).
Below is what the data looks like in my test environment. It’s interesting to note that RedHat changes its distribution name from major release to major release. It’s of no consequence for our use, but I think that’s interesting.
The server-side’s code is in this PR:
https://github.com/FOGProject/fog-community-scripts/pull/62
The FOG Server’s code is in this PR:
https://github.com/FOGProject/fogproject/pull/405