Fog 1.5.7 Locations Plugin pulling image from wrong storage node



  • Hi all

    Ubuntu 18.04.3, FOG version 1.5.7

I’m running the Locations plugin with two servers in head office: one normal server set as the Master node for this site and one other main site (both of which share a common image), plus one Storage server set as the Master node for all other remote sites, mainly smaller branch offices.

Let’s refer to these servers as Georgia1 (Normal Master) and Georgia2 (Storage Master) in head office.

    Remaining servers are:

    Colorado (Storage server replicating from Georgia1)
    Arkansas, California and Delaware (Storage servers replicating from Georgia2)

I’ve set up two storage groups, StorageA and StorageB.

    StorageA is made up of Georgia1 and Colorado
    StorageB is made up of Georgia2, Arkansas, California and Delaware

    I have set up locations as follows

    Georgia
    Georgia-Rep *
    Arkansas
    California
    Colorado
    Delaware

    * This is in the same location but set to handle imaging of machines internally, to test the images which will be deployed to remote sites (I may be overthinking this).

    I’ve linked hosts to groups, and those groups to locations.

When imaging a machine in Georgia with its location set to Georgia or Georgia-Rep, it seems to be pulling the image from the Colorado storage node.

These are in different sites, on different subnets. Is there a log file somewhere I can check to determine the logic being applied?

    regards Tom

    I have locations setup for each site plus 2 for head office to reflect the different storage groups and also so we can image in main office using images for the remote sites for testing. So:

Main office = Site1A and Site1B
Remote offices = Sites 2-12


  • Senior Developer

    @AlexPDX Please open a new topic as your request has nothing to do with the initial post as far as I can see. Just copy and paste your text from this.





  • Thanks to a support session from @Sebastian-Roth this evening we’ve confirmed that this is actually working as designed.

    There are 2 things at play.

    One: Location plugin

Although the Location plugin is installed, the Active Tasks list initially shows the task as linked to the wrong storage node. This lasts only until the client boots and checks in, at which point the list updates to show the correct storage node actually being used.

    Two: BIOS boot order on clients

In our case, due to changes in the BIOS boot order on the remote-site computers, they never saw the FOG task on boot, so the task still showed the wrong storage node on the Active Tasks page.

    I will resolve the BIOS issue on the remote sites and then test again and confirm back that all is working as designed.

    regards Tom



  • @Sebastian-Roth I did now, and have replied 🙂


  • Senior Developer

    @Kiweegie Not sure if you saw the chat bubble in the top right corner yet. Trying to contact you via PM.



  • Hi @Sebastian-Roth

    In case it leads to any pointers or brainwaves.

I thought initially the main server was imaging fine, as images sent from GEO01VFOG01 to devices within the Georgia site pick up OK. However, I can see an example this morning where a machine is trying to image from COL01VFOG01 (Colorado Office), which is the only storage node in the Toys 'R Us storage group. (I’ve updated my spreadsheet below, which didn’t accurately reflect this.)

In most cases imaging from the Georgia office to Georgia machines is working, but imaging to remote sites is pulling from the wrong storage node.

Looking at the dashboard, I can see the display for the Mattel storage group shows 99.9999999

    d530d3d3-aa13-44a8-b443-c84d420265e2-image.png

    Instead of the split shown by the Toys 'R Us group

    bf3269c6-dc97-43e6-b38f-dc113c47d990-image.png

This could be purely cosmetic, with the display not meant to handle more than 99 clients. I presume this shows client activity per storage group?

    I can’t seem to find any logs for the plugins, are there any?

    regards Tom



  • @Sebastian-Roth Appreciate the time and effort Sebastian. Let me know if you need anything tested my side please.

    regards Tom


  • Senior Developer

    @Kiweegie Sorry for the delay! I have some time tomorrow to work on this. I’ll try to set things up as close to what you have and try to find what’s wrong.



  • @Sebastian-Roth said in Fog 1.5.7 Locations Plugin pulling image from wrong storage node:

    SELECT ngmID,ngmMemberName,ngmIsMasterNode,ngmGroupID,ngmIsEnabled,ngmHostname,ngmMaxClients FROM nfsGroupMembers;

    Here you go

    +-------+---------------+-----------------+------------+--------------+----------------+---------------+
    | ngmID | ngmMemberName | ngmIsMasterNode | ngmGroupID | ngmIsEnabled | ngmHostname    | ngmMaxClients |
    +-------+---------------+-----------------+------------+--------------+----------------+---------------+
    |     1 | GEO01VFOG01   | 1               |          1 | 1            | 10.166.136.199 |            10 |
    |     2 | GEO01VFOG02   | 1               |          2 | 1            | 10.166.136.198 |            10 |
    |     3 | CON01VFOG01   | 0               |          2 | 1            | 192.168.7.1    |            10 |
    |     4 | ALA02VFOG01   | 0               |          2 | 1            | 192.168.1.1    |            10 |
    |     5 | ARK01VFOG01   | 0               |          2 | 1            | 192.168.9.1    |            10 |
    |     6 | ARI01VFOG01   | 0               |          2 | 1            | 192.168.11.1   |            10 |
    |     7 | LOU01VFOG01   | 0               |          2 | 1            | 192.168.6.1    |            10 |
    |     8 | CAL01VFOG01   | 0               |          2 | 1            | 192.168.2.1    |            10 |
    |     9 | COL02VFOG01   | 0               |          2 | 1            | 192.168.4.1    |            10 |
    |    10 | WIS02VFOG01   | 0               |          2 | 1            | 192.168.10.1   |            10 |
    |    11 | SOU01VFOG01   | 0               |          2 | 1            | 192.168.8.1    |            10 |
    |    12 | KEN02VFOG01   | 0               |          2 | 1            | 192.168.5.1    |            10 |
    |    13 | COL01VFOG01   | 0               |          1 | 1            | 192.168.3.1    |            10 |
    +-------+---------------+-----------------+------------+--------------+----------------+---------------+
    

  • Senior Developer

    @Kiweegie Sorry, I’d need info from another query as well:

    SELECT ngmID,ngmMemberName,ngmIsMasterNode,ngmGroupID,ngmIsEnabled,ngmHostname,ngmMaxClients FROM nfsGroupMembers;
    


  • @Sebastian-Roth said in Fog 1.5.7 Locations Plugin pulling image from wrong storage node:

    SELECT * FROM location;

    +-----+--------------------+-------+-----------------+----------------+------------+---------------------+--------------+
    | lID | lName              | lDesc | lStorageGroupID | lStorageNodeID | lCreatedBy | lCreatedTime        | lTftpEnabled |
    +-----+--------------------+-------+-----------------+----------------+------------+---------------------+--------------+
    |   2 | Alabama            |       |               2 |              4 | fog        | 2020-01-31 23:37:10 | 1            |
    |   3 | Connecticut        |       |               2 |              3 | fog        | 2020-01-31 23:37:28 | 1            |
    |   4 | Louisiana          |       |               2 |              7 | fog        | 2020-01-31 23:39:08 | 1            |
    |   5 | Arizona            |       |               2 |              6 | fog        | 2020-01-31 23:39:32 | 1            |
    |   6 | California         |       |               2 |              8 | fog        | 2020-01-31 23:41:10 | 1            |
    |   7 | Arkansas           |       |               2 |              5 | fog        | 2020-01-31 23:41:35 | 1            |
    |   8 | Colorado Shop      |       |               2 |              9 | fog        | 2020-01-31 23:42:51 | 1            |
    |   9 | Kentucky           |       |               2 |             12 | fog        | 2020-02-01 01:22:42 | 1            |
    |  10 | South Dakota       |       |               2 |             11 | fog        | 2020-02-01 01:23:37 | 1            |
    |  11 | Wisconsin          |       |               2 |             10 | fog        | 2020-02-01 01:23:53 | 1            |
    |  12 | Georgia            |       |               1 |              1 | fog        | 2020-02-01 01:24:45 | 1            |
    |  14 | Colorado Office    |       |               1 |             13 | kiweegie   | 2020-02-01 15:34:10 | 1            |
    +-----+--------------------+-------+-----------------+----------------+------------+---------------------+--------------+
    
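As a cross-check of the two query outputs in this thread, the location rows can be joined to nfsGroupMembers on lStorageNodeID = ngmID. That relationship is inferred from the data above rather than confirmed against the FOG schema, so treat this as a sketch (only a subset of rows reproduced):

```python
import sqlite3

# Sketch: recreate a subset of the two tables posted in this thread in an
# in-memory SQLite DB and join location.lStorageNodeID to nfsGroupMembers.ngmID.
# The foreign-key relationship is an assumption based on the data shown.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE nfsGroupMembers (ngmID INT, ngmMemberName TEXT, ngmGroupID INT);
CREATE TABLE location (lID INT, lName TEXT, lStorageGroupID INT, lStorageNodeID INT);
""")
con.executemany("INSERT INTO nfsGroupMembers VALUES (?,?,?)", [
    (1, "GEO01VFOG01", 1), (10, "WIS02VFOG01", 2), (13, "COL01VFOG01", 1),
])
con.executemany("INSERT INTO location VALUES (?,?,?,?)", [
    (11, "Wisconsin", 2, 10), (12, "Georgia", 1, 1), (14, "Colorado Office", 1, 13),
])
# Which storage node should each location resolve to?
rows = con.execute("""
    SELECT l.lName, m.ngmMemberName
    FROM location l JOIN nfsGroupMembers m ON l.lStorageNodeID = m.ngmID
    ORDER BY l.lID
""").fetchall()
for name, node in rows:
    print(f"{name} -> {node}")
```

If this mapping is right, Wisconsin resolves to WIS02VFOG01, so a Wisconsin task pulling from the Alabama or California node would point at the node-selection logic rather than the location table itself.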

  • Senior Developer

    @Kiweegie I am wondering why you updated the Georgia nodes first but possibly my description was a bit confusing. As long as things are up and running now with all nodes being on dev-branch that’s fine.

    I have gone through the code now a few more times but can’t see any obvious problems with it. Though it’s very hard doing this kind of debugging based solely on assumptions. Could you please connect to your database on the master node, run a query as follows and post the full output here:

    shell> mysql -u root -p
    Password:
    ...
    mysql> use fog;
    ...
    mysql> SELECT * FROM location;
    ...
    


All storage nodes have now been uplifted to the latest dev version 1.5.709. I edited /opt/fog/.fogsettings on each to set the new fogstorage creds first.
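For anyone repeating this, the edit boiled down to changing the fogstorage password line in /opt/fog/.fogsettings on each node before re-running the installer. A minimal sketch of that edit, assuming the file uses simple key='value' lines and a key named snmysqlpass (both assumptions from memory, not verified against the installer):

```python
import re

# Sketch: rewrite a key='value' line as found in a .fogsettings-style file.
# The key name "snmysqlpass" is an assumption from memory; check your own
# /opt/fog/.fogsettings for the exact keys before scripting against it.
def set_fogsettings_value(text: str, key: str, value: str) -> str:
    """Replace (or append) a key='value' line in .fogsettings-style text."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    replacement = f"{key}='{value}'"
    if pattern.search(text):
        return pattern.sub(replacement, text)
    # Key not present yet: append it on its own line.
    return text + ("" if text.endswith("\n") else "\n") + replacement + "\n"

sample = "snmysqluser='fogstorage'\nsnmysqlpass='oldpassword'\n"
updated = set_fogsettings_value(sample, "snmysqlpass", "newpassword")
print(updated)
```

After the edit, re-running ./installfog.sh on the node should pick the new credentials up from .fogsettings.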

I’ve deployed an image task to a Wisconsin desktop now and it’s no longer picking up the Alabama storage node; this time it’s selected California.

Going back over my setup, each location is set with the “Use inits and kernels from this node” option checked.

There is a location for each site, with 2 in Georgia (head office), the second of which I had set up as a location for Georgia but linked to the storage node. I don’t think this is actually required, so I have removed this location entry.

There are only 2 storage groups, Toys 'R Us and Mattel, with the latter being replicated out to all the storage nodes.

    All storage nodes are therefore pointed to the Mattel storage group, with the exception of the Georgia head office (normal FOG) server, which points to Toys 'R Us.

    I “think” everything is set as it should be. Is there a log file which shows which storage node is being selected and why?

    cheers Tom



  • @Tom-Elliott @Sebastian-Roth

Thanks Tom, I can see that in the GUI OK and it matches what was displayed on screen during installation.

    FOG configuration page > FOG Settings > FOG Storage Nodes > STORAGENODE MYSQLPASS

My question was around whether the storage nodes need to be updated to reflect this change, per this section in the installer (unless I’m mistaken):

    What is the username to access the database?
    This information is storage in the management portal under
    'FOG Configuration' ->
    'FOG Settings' ->
    'FOG Storage Nodes' ->
    'FOG_STORAGENODE_MYSQLUSER'. Username [fogstorage]:
    What is the password to access the database?
    This information is storage in the management portal under
    'FOG Configuration' ->
    'FOG Settings' ->
    'FOG Storage Nodes' ->
    'FOG_STORAGENODE_MYSQLPASS'.  Password:
    

I’ve updated the 2 master nodes (normal and storage) in Georgia only at this stage and kicked off an image deployment task to a machine in Wisconsin. It’s still picking up the storage node from Alabama.

    To which end I’m going to kill that task and uplift all nodes onto the 1.5.709 dev branch and try again.

    regards Tom


  • Senior Developer

@Kiweegie From the GUI you should be able to get the fogstorage password from FOG Configuration Page -> FOG Settings -> Storage Password. I forget the exact string to look for, but it should be pretty close.



  • @Sebastian-Roth said in Fog 1.5.7 Locations Plugin pulling image from wrong storage node:

    git checkout dev-branch

    Hi @Sebastian-Roth

I’ve snapshotted the VMs for the 2 Georgia servers, GEO01VFOG01 and GEO01VFOG02, and uplifted both to the latest dev version 1.5.7.109.

    KERNEL RAMDISK SIZE was already showing as 275000

I can see that this version forces a root MySQL password to be added.

Also, the fogstorage DB password used previously was deemed not secure enough, so a new one has been generated. From memory the storage node setup uses this user account, so do we need to update all the storage nodes to use this password? If so, how? Or is it simply easiest to uplift all the nodes to the dev instance?

    regards Tom


  • Senior Developer

    @Kiweegie Great to hear you can do some testing!

    i presume every one of the nodes needs updated this way or can I get away with (initially at least) the 2 master nodes in my head office?

Yeah, definitely a good question. In the location settings, did you say PXE boot from location? If yes, then updating the storage node in question could be enough. If all clients PXE boot from the master server, then I’d update that one/two. I am trying to get my head around all the major changes we’ve had since 1.5.7 and whether one of them could cause an issue if you leave some of the nodes on 1.5.7. Well, we added some database security, and I am sure all your storage nodes will fail to connect to the master’s DB as soon as the master is updated to dev-branch. I’m not exactly sure about the other way round, but I think updating only a storage node might work.

My suggestion: if possible, set PXE booting for the Colorado location to yes (location setting “Use inits and kernels from this node”). Make sure clients boot properly. Update that one node to dev-branch and see if deployment of a single client pulls the image from the Colorado server.

    Update to dev-branch:

    su -
    git clone https://github.com/FOGProject/fogproject
    cd fogproject
    git checkout dev-branch
    cd bin
    ./installfog.sh
    

EDIT: Most certainly you will need to adjust a setting, because otherwise clients booting the updated init files from your storage node will fail! FOG web UI -> FOG Configuration -> FOG Settings -> TFTP Server -> KERNEL RAMDISK SIZE is probably set to 127000 and you need to set it to 275000. This doesn’t hurt even if the other nodes are still on 1.5.7. It just means that machines with less than 512 MB of RAM will fail to run a FOG task.
EDIT2: Just figured that this change came in even before 1.5.7 was out. So you might check, but it should be set to 275000 already.



@Sebastian-Roth Hi Sebastian & @Tom-Elliott, more than happy to help gents, you’re doing me the favour by being so quick to respond to requests and with really good/helpful suggestions. I only wish paid support was this good…

I’ll snapshot the VMs for rollback, no problem. I presume every one of the nodes needs updating this way, or can I get away with (initially at least) the 2 master nodes in my head office? I only ask as there are 12 nodes to snapshot/update over 10 different vCenters/ESXi hosts. If need be, I can do them all though.

Let me know on the above and I shall proceed accordingly. I’m technically on leave tomorrow, but the imaging solution here is getting a lot of focus up the food chain, as it were, so I’m keen to get it working as soon as I can. I can dial in from home tomorrow to focus on this without other distractions.

    regards Tom


  • Senior Developer

@Kiweegie OK, I think most of my initial ideas were wrong. But it’s still good to know and to check the easy stuff before we get our hands dirty. I started to look at the code and discussed things a little with @Tom-Elliott, who has a lot more experience with the PHP code than I have. He might have found an issue in the code and has pushed out a change that might fix this.

Would you be willing to assist with testing this? The quick way would be to add the change manually. But you’d do us a huge favor if you take the step and upgrade to the latest dev-branch where we work on those fixes. There have been a fair number of changes between 1.5.7 and now, so this is not a minor step. It’s up to you. We will support you if anything goes wrong with this!

It’s best if you have your FOG servers running as VMs and can take snapshots to roll back quickly. Be aware there is no easy way back to 1.5.7 other than snapshots or an image backup of the server. Don’t get me wrong, we have done a lot of testing with all this, but there is no guarantee there isn’t a single issue left, and I want to play with open cards. We really need people with large environments to test things, as we devs can’t set all this up ourselves.

