Fog 1.5.7 Locations Plugin pulling image from wrong storage node

Kiweegie

Hi all

Ubuntu 18.04.3, FOG version 1.5.7

I’m running the Location plugin with 2 servers in head office. One is a normal server set as the Master node for this site and one other main site, both of which share a common image. The other is a Storage server set as the Master node for all other remote sites, mainly smaller branch offices.

Let’s refer to these servers as Georgia1 (Normal Master) and Georgia2 (Storage Master) in head office.

Remaining servers are:

Colorado (Storage server replicating from Georgia1)
Arkansas, California and Delaware (Storage servers replicating from Georgia2)

I’ve set up 2 storage groups StorageA and StorageB

StorageA is made up of Georgia1 and Colorado
StorageB is made up of Georgia2, Arkansas, California and Delaware

I have set up locations as follows

Georgia
Georgia-Rep *
Arkansas
California
Colorado
Delaware

* This is in the same location but set to handle imaging of machines internally, to test the images which will be deployed to remote sites (I may be overthinking this).

I’ve linked hosts to groups, and those groups to locations.

When imaging a machine in Georgia with its location set to Georgia or Georgia-Rep it seems to be pulling the image from Colorado storage node.

These are in different sites, on different subnets. Is there a log file somewhere I can check to determine the logic being applied?

regards Tom

I have locations set up for each site, plus 2 for head office to reflect the different storage groups and so that we can image in the main office using images for the remote sites for testing. So:

Main office = Site1A and Site1B
Remote offices Sites 2-12

Sebastian Roth

@Kiweegie This is a quite complex scenario and while your description is fairly detailed I am still struggling to find the knot hole. I’ll just poke into what jumps at me first. No idea if this will lead us the right way.

I’ve linked hosts to groups, and those groups to locations.

I hope you are aware that groups in the FOG world are just a tool to push settings to a group of hosts rather than setting them for each host individually. What that means is that if you add a host to a group it won’t inherit the settings. You need to “push out a specific setting” to the hosts of a group whenever you add a new host to it.

When imaging a machine in Georgia with its location set to Georgia or Georgia-Rep it seems to be pulling the image from Colorado storage node.

What is the option max clients set to for all these nodes? I suppose this is a unicast deployment, not multicast, right?

Is there a log file somewhere I can check to determine the logic being applied somewhere?

I know this would be great but doesn’t exist. We’ll probably need to go through the code to see why this happens. But please take a look at the things I mentioned above first.

Kiweegie

Hi Sebastian, appreciate the comments. To help others follow along, here’s the setup I’m using, with names changed to protect the innocent.

That will hopefully make it easier to see what I’ve tried to setup. All sites barring Connecticut are on the MPLS, Connecticut is in a hosted site connected over site-to-site VPN.

I understand the concept of Host groups. I’ve just used them to apply settings en masse, as they’re designed to do. Just trying to give as complete an overview of the setup as possible so there aren’t any surprises.

Max clients on each node is default of 10 but we’ve not yet tried to deploy to more than a half dozen at a time. I assume your comment here is referring to if we go over the 10 max clients at once it needs to look for a slot on another available storage node which makes sense.

One more question if I may? When an image is uploaded, how does the replication to remote nodes handle the changes? I’d assume the nature of the image means it copies the whole image over again? Or can it sort through the changed files in some way and just upload the deltas?

regards Tom

Sebastian Roth

@Kiweegie Ok, I think most of my initial ideas were wrong. But it’s still good to know and check the easy stuff before we get our hands dirty. Started to look at the code and discussed things a little bit with @Tom-Elliott who has a lot more experience with the PHP code than I have. He might have found an issue in the code and pushed out a change that might fix this.

Would you be willing to assist with testing this? The quick one would be to manually add this. But you’d do us a huge favor if you take the step and upgrade to the latest dev-branch where we work on those fixes. There have been a fair amount of changes between 1.5.7 and now, so this is not a minor step. It’s up to you. We will support you if anything goes wrong with this!

Best if you’d have your FOG server running as VMs and can take snapshots to roll back quickly. Be aware there is no easy way back to 1.5.7 other than snapshots or image backup of the server. Don’t get me wrong, we have really done a lot of testing with all this but there is no guarantee for there not being a single issue and I want to play open cards. We really need people with huge environments to test things as we as devs can’t set this all up as well.

Kiweegie

@Sebastian-Roth Hi Sebastian & @Tom-Elliott, more than happy to help gents, you’re doing me the favour by being so quick to respond to requests and with really good/helpful suggestions. I only wish paid support was this good…

I’ll snapshot the VMs for roll back, no problem. I presume every one of the nodes needs updated this way or can I get away with (initially at least) the 2 master nodes in my head office? Only ask as there are 12 nodes to snapshot/update over 10 different vCenters/ESXi hosts. If need be I can do that, though.

Let me know on the above and I shall proceed accordingly. I’m technically on leave tomorrow but the imaging solution here is getting a lot of focus up the food chain, as it were, so I’m keen to get working as soon as I can. I can dial in from home tomorrow to focus on this without other distractions.

regards Tom

Sebastian Roth

@Kiweegie Great to hear you can do some testing!

I presume every one of the nodes needs updated this way or can I get away with (initially at least) the 2 master nodes in my head office?

Yeah, definitely a good question. In the location settings, did you say PXE boot from location? If yes, then updating the storage node in question could be enough. If all clients PXE boot from the master server then I’d update that one/two. I am trying to get my head around all the major changes we had since 1.5.7 and if one of them could cause an issue if you leave some of the nodes on 1.5.7. Well, we added some database security and I am sure all your storage nodes will fail to connect to the master(s) DB as soon as the master is updated to dev-branch. Not exactly sure about the other way round but I think only updating a storage node might work.

My suggestion: If possible set PXE booting for Colorado location to yes (location setting “Use inits and kernels from this node”). Make sure clients boot properly. Update that one node to dev-branch and see if deployment of a single client pulls the image from Colorado server.

Update to dev-branch:

su -
git clone https://github.com/FOGProject/fogproject
cd fogproject
git checkout dev-branch
cd bin
./installfog.sh

EDIT: You will most certainly need to adjust a setting, because otherwise clients booting the updated init files from your storage node will fail! FOG web UI -> FOG Configuration -> FOG Settings -> TFTP Server -> KERNEL RAMDISK SIZE is probably set to 127000 and you need to set it to 275000. This doesn’t hurt even if the other nodes are still on 1.5.7. It just means that machines with less than 512 MB of RAM will fail to run a FOG task.
EDIT2: Just realized that this change was made even before 1.5.7 was out. So you might check, but it should be set to 275000 already.

Kiweegie

@Sebastian-Roth said in Fog 1.5.7 Locations Plugin pulling image from wrong storage node:

git checkout dev-branch

Hi @Sebastian-Roth

I’ve snapshotted both VMs for the 2 Georgia servers, GEO01VFOG01 and GEO01VFOG02, and uplifted both to the latest dev version, 1.5.7.109.

KERNEL RAMDISK SIZE was already showing as 275000

Can see on this version it forces a root mysql password to be added.

Also that the fogstorage DB password used previously was deemed not secure enough so new one has been generated. From memory the storage node setup calls this user account so do we need to update all the storage nodes to use this pw? If so how or is it simply easiest to uplift all the nodes to dev instance?

regards Tom

Tom Elliott

@Kiweegie from the GUI you should be able to get the fogstorage password from FOG Configuration Page->FOG Settings->Storage Password. I forget the exact string to look for but should be pretty close

Kiweegie

@Tom-Elliott @Sebastian-Roth

Thanks Tom, can see that in the GUI ok and it matches what was displayed on screen during installation.

FOG configuration page > FOG Settings > FOG Storage Nodes > STORAGENODE MYSQLPASS

My question was around whether the storage nodes need to be updated to reflect this change? Per this section in the installer unless I’m mistaken.

What is the username to access the database?
This information is storage in the management portal under
'FOG Configuration' ->
'FOG Settings' ->
'FOG Storage Nodes' ->
'FOG_STORAGENODE_MYSQLUSER'. Username [fogstorage]:
What is the password to access the database?
This information is storage in the management portal under
'FOG Configuration' ->
'FOG Settings' ->
'FOG Storage Nodes' ->
'FOG_STORAGENODE_MYSQLPASS'.  Password:

I’ve updated only the 2 master nodes (normal and storage) in Georgia at this stage and kicked off an image deployment task to a machine in Wisconsin. It’s still picking up the Alabama storage node instead.

To that end I’m going to kill that task, uplift all nodes onto the 1.5.7.109 dev-branch version and try again.

regards Tom

Kiweegie

All storage nodes have now been uplifted to the latest dev version, 1.5.7.109. I edited /opt/fog/.fogsettings on each first to amend it to the new fogstorage creds.

I’ve now deployed an image task to a Wisconsin desktop and it’s no longer picking up the Alabama storage node; this time it has selected California.
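For anyone repeating the credential change described above on several storage nodes, here is a minimal sketch of how the edit could be scripted. It assumes GNU sed (as shipped with Ubuntu) and assumes the password line in .fogsettings is named `snmysqlpass`, which is not confirmed anywhere in this thread, so check your own file first. The function name `set_fog_setting` is made up for this example.

```shell
#!/bin/sh
# set_fog_setting FILE KEY VALUE
# Replace the line KEY='...' in FILE with KEY='VALUE', keeping a .bak copy.
# Hypothetical helper; the key name used below (snmysqlpass) is an assumption.
set_fog_setting() {
    file=$1
    key=$2
    value=$3

    # Refuse to continue if the key is not present, so nothing is changed silently.
    grep -q "^${key}=" "$file" || { echo "no ${key}= line in $file" >&2; return 1; }

    # Keep a backup before touching the file.
    cp -p "$file" "$file.bak" || return 1

    # Escape characters that are special in a sed replacement (\, & and the | delimiter).
    escaped=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')

    # GNU sed in-place edit of the single matching line.
    sed -i "s|^${key}=.*|${key}='${escaped}'|" "$file"
}
```

On a storage node this would be called as `set_fog_setting /opt/fog/.fogsettings snmysqlpass 'new-password-from-the-master'`. A password containing a single quote would break the quoting in the file and needs handling by hand.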

Going back over my setup, each location is set with Use inits and kernels from this node option checked.

There is a location for each site with 2 in Georgia (head office), 2nd of which I had setup as location for Georgia but linked to the storage node. I don’t think this is actually required so have removed this location entry.

There are only 2 storage groups, Toys’RUs and Mattel with latter being replicated out to all the storage nodes.

All storage nodes therefore are pointed to the Mattel Storage group with exception of the Georgia Head office (normal FOG) server which points to Toys’RUs.

I “think” everything is set as it should be. Is there a log file which shows which storage node is being selected and why?

cheers Tom

Sebastian Roth

@Kiweegie I am wondering why you updated the Georgia nodes first but possibly my description was a bit confusing. As long as things are up and running now with all nodes being on dev-branch that’s fine.

I have gone through the code now a few more times but can’t see any obvious problems with it. Though it’s very hard doing this kind of debugging based solely on assumptions. Could you please connect to your database on the master node, run a query as follows and post the full output here:

shell> mysql -u root -p
Password:
...
mysql> use fog;
...
mysql> SELECT * FROM location;
...

Kiweegie

@Sebastian-Roth said in Fog 1.5.7 Locations Plugin pulling image from wrong storage node:

SELECT * FROM location;

+-----+--------------------+-------+-----------------+----------------+------------+---------------------+--------------+
| lID | lName              | lDesc | lStorageGroupID | lStorageNodeID | lCreatedBy | lCreatedTime        | lTftpEnabled |
+-----+--------------------+-------+-----------------+----------------+------------+---------------------+--------------+
|   2 | Alabama            |       |               2 |              4 | fog        | 2020-01-31 23:37:10 | 1            |
|   3 | Connecticut        |       |               2 |              3 | fog        | 2020-01-31 23:37:28 | 1            |
|   4 | Louisiana          |       |               2 |              7 | fog        | 2020-01-31 23:39:08 | 1            |
|   5 | Arizona            |       |               2 |              6 | fog        | 2020-01-31 23:39:32 | 1            |
|   6 | California         |       |               2 |              8 | fog        | 2020-01-31 23:41:10 | 1            |
|   7 | Arkansas           |       |               2 |              5 | fog        | 2020-01-31 23:41:35 | 1            |
|   8 | Colorado Shop      |       |               2 |              9 | fog        | 2020-01-31 23:42:51 | 1            |
|   9 | Kentucky           |       |               2 |             12 | fog        | 2020-02-01 01:22:42 | 1            |
|  10 | South Dakota       |       |               2 |             11 | fog        | 2020-02-01 01:23:37 | 1            |
|  11 | Wisconsin          |       |               2 |             10 | fog        | 2020-02-01 01:23:53 | 1            |
|  12 | Georgia            |       |               1 |              1 | fog        | 2020-02-01 01:24:45 | 1            |
|  14 | Colorado Office    |       |               1 |             13 | kiweegie   | 2020-02-01 15:34:10 | 1            |
+-----+--------------------+-------+-----------------+----------------+------------+---------------------+--------------+

Sebastian Roth

@Kiweegie Sorry, I’d need info from another query as well:

SELECT ngmID,ngmMemberName,ngmIsMasterNode,ngmGroupID,ngmIsEnabled,ngmHostname,ngmMaxClients FROM nfsGroupMembers;

Kiweegie

@Sebastian-Roth said in Fog 1.5.7 Locations Plugin pulling image from wrong storage node:

SELECT ngmID,ngmMemberName,ngmIsMasterNode,ngmGroupID,ngmIsEnabled,ngmHostname,ngmMaxClients FROM nfsGroupMembers;

Here you go

+-------+---------------+-----------------+------------+--------------+----------------+---------------+
| ngmID | ngmMemberName | ngmIsMasterNode | ngmGroupID | ngmIsEnabled | ngmHostname    | ngmMaxClients |
+-------+---------------+-----------------+------------+--------------+----------------+---------------+
|     1 | GEO01VFOG01   | 1               |          1 | 1            | 10.166.136.199 |            10 |
|     2 | GEO01VFOG02   | 1               |          2 | 1            | 10.166.136.198 |            10 |
|     3 | CON01VFOG01   | 0               |          2 | 1            | 192.168.7.1    |            10 |
|     4 | ALA02VFOG01   | 0               |          2 | 1            | 192.168.1.1    |            10 |
|     5 | ARK01VFOG01   | 0               |          2 | 1            | 192.168.9.1    |            10 |
|     6 | ARI01VFOG01   | 0               |          2 | 1            | 192.168.11.1   |            10 |
|     7 | LOU01VFOG01   | 0               |          2 | 1            | 192.168.6.1    |            10 |
|     8 | CAL01VFOG01   | 0               |          2 | 1            | 192.168.2.1    |            10 |
|     9 | COL02VFOG01   | 0               |          2 | 1            | 192.168.4.1    |            10 |
|    10 | WIS02VFOG01   | 0               |          2 | 1            | 192.168.10.1   |            10 |
|    11 | SOU01VFOG01   | 0               |          2 | 1            | 192.168.8.1    |            10 |
|    12 | KEN02VFOG01   | 0               |          2 | 1            | 192.168.5.1    |            10 |
|    13 | COL01VFOG01   | 0               |          1 | 1            | 192.168.3.1    |            10 |
+-------+---------------+-----------------+------------+--------------+----------------+---------------+
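The two result sets above can also be cross-checked in one go. The following is a rough sketch only: it uses the table and column names that appear in the query output in this thread, and it assumes that `lStorageNodeID` in `location` refers to `ngmID` in `nfsGroupMembers` (the posted rows are consistent with that, e.g. Alabama points at node 4, ALA02VFOG01). The variable and function names are invented for the example.

```shell
#!/bin/sh
# Join each location to the storage node it references and flag group mismatches.
# Assumption: location.lStorageNodeID references nfsGroupMembers.ngmID.
LOCATION_NODE_QUERY='
SELECT l.lID, l.lName, l.lStorageGroupID,
       n.ngmMemberName, n.ngmHostname, n.ngmGroupID,
       (l.lStorageGroupID = n.ngmGroupID) AS groupsMatch
FROM location AS l
LEFT JOIN nfsGroupMembers AS n ON n.ngmID = l.lStorageNodeID
ORDER BY l.lID;'

# Run the query against the fog database on the master node.
# mysql prompts for the root password because -p is given without a value.
show_location_nodes() {
    mysql -u root -p -D fog -e "$LOCATION_NODE_QUERY"
}
```

Calling `show_location_nodes` on the master prints one row per location. A 0 or NULL in the `groupsMatch` column would point at a location whose node is missing or sits in a different storage group. Checking the rows posted in this thread by hand, every location references a node in its own group, so the configuration itself looks consistent.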

Sebastian Roth

@Kiweegie Sorry for the delay! I have some time tomorrow to work on this. I’ll try to set things up as close to what you have and try to find what’s wrong.

Kiweegie

@Sebastian-Roth Appreciate the time and effort Sebastian. Let me know if you need anything tested my side please.

regards Tom

Kiweegie

Hi @Sebastian-Roth

In case it leads to any pointers or brainwaves.

I thought initially the main server was imaging fine as images sent from GEO01VFOG01 to devices within Georgia site pick up OK. However I can see an example this morning where a machine is trying to image from COL01VFOG01 (Colorado Office) which is the only storage node in the Toys 'R Us storage group. (I’ve updated my spreadsheet below which didn’t accurately reflect this).

In the greater number of cases imaging from Georgia office to Georgia machines is working but imaging to remote sites is pulling from the wrong storage node.

Looking at the dashboard, I can see that the display for the Mattel storage group shows 99.9999999 instead of the split shown for the Toys 'R Us group.

This could be purely cosmetic, with the display not designed to handle more than 99 clients. I presume this shows the amount of client activity per storage group?

I can’t seem to find any logs for the plugins, are there any?

regards Tom

Sebastian Roth

@Kiweegie Not sure if you saw the chat bubble in the top right corner yet. Trying to contact you via PM.

Kiweegie

@Sebastian-Roth I did now, and have replied

Kiweegie

Thanks to a support session from @Sebastian-Roth this evening we’ve confirmed that this is actually working as designed.

There are 2 things at play.

One: Location plugin

Even with the Location plugin installed, the Active Tasks list initially shows the task as linked to the wrong storage node. This lasts only until the client boots and checks in, at which point the entry updates to show the storage node that is actually being used.

Two: BIOS boot order on clients

In our case, due to changes in the BIOS boot order on the remote site computers, they never saw the FOG task on boot, so the task still shows the wrong storage node on the Active Tasks page.

I will resolve the BIOS issue on the remote sites and then test again and confirm back that all is working as designed.

regards Tom
