Fog 1.5.2 ignores image storage group association



  • A little background. I have three storage groups set up in our environment, one for our main campus and one at each of our two satellite campuses. The FOG server resides on our main campus and there is a storage node at each satellite. Each storage node is in a separate group. It’s set up this way because there are images we don’t want to replicate to the satellite campuses, and to conserve bandwidth across the WAN.

    campus | storage group | server
    main | sg1 | fog
    sat1 | sg2 | storage1
    sat2 | sg3 | storage2

    Everything had been working fine in 1.4. We just updated to 1.5.2 to start testing for our next deployment and I’m running into an issue.

    When I deploy an image that’s only associated with our main campus storage group, it will sometimes use one of the storage servers that is in a different storage group. This happens with any image I try. I’ve deleted and recreated the image definition and still see the same behavior. It seems to be randomly picking a storage server regardless of whether the image is associated with its storage group or not.



  • That seems to have fixed the issue. I ran two images successfully.

    Thanks.


  • Senior Developer

    Please repull the working branch. I did a bunch of testing and found out the issue. I believe I’ve now addressed this particular issue appropriately.



  • @tom-elliott said in Fog 1.5.2 ignores image storage group association:

    Are you saying the image is not assigned to the storage group it’s attempting to pull the image from?

    Yes, that’s exactly what’s happening.

    I’m open to it being a configuration issue but I don’t know where it is. If you need troubleshooting information from my system let me know and I’ll post it.

    How my storage nodes are set up:
    0_1524142060714_Storage_nodes.png

    The image’s storage group assignment:
    0_1524142038814_image_stroage_group.png

    Here’s what the tasks page looks like before the client checks in. The correct node is in the “Working with node” column.
    0_1524142106456_task_before_client_checkin.png

    This is what happens when the client checks in: the “Working with node” value changes to one that isn’t associated with the image.
    0_1524142150224_task_after_client_checkin.png


  • Senior Developer

    I’m not seeing where the problem is.

    You’re saying the image storage group association isn’t being followed. Here’s the PHP code, and I’ll try to break it down so it’s easier to follow:

         /**
          * Gets the storage group
          *
          * @throws Exception
          * @return object
          */
         public function getStorageGroup()
         {
             $groupids = $this->get('storagegroups');
             $count = count($groupids);
             if ($count < 1) {
                 $groupids = self::getSubObjectIDs('StorageGroup');
                 $groupids = @min($groupids);
                 if ($groupids < 1) {
                     throw new Exception(_('No viable storage groups found'));
                 }
             }
             $primaryGroup = array();
             foreach ((array)$groupids as &$groupid) {
                 if (!$this->getPrimaryGroup($groupid)) {
                     continue;
                 }
                 $primaryGroup[] = $groupid;
                 unset($groupid);
             }
             if (count($primaryGroup) < 1) {
                 $primaryGroup = @min((array)$groupids);
             } else {
                 $primaryGroup = array_shift($primaryGroup);
             }
    
             return new StorageGroup($primaryGroup);
         }
    

    $groupids = $this->get('storagegroups'); This grabs all of the image’s assigned groups.
    $count = count($groupids); This just gets a count of the groups found.

    Below, if no storage groups were found above, we have to find one to use. This is most likely where the problem is, though I’m not sure how. I can change this stanza so that it throws an error instead, though that can cause many other issues.

    if ($count < 1) {
        $groupids = self::getSubObjectIDs('StorageGroup');
        $groupids = @min($groupids);
        if ($groupids < 1) {
            throw new Exception(_('No viable storage groups found'));
        }
    }
    

    The code below just tries to get the primary group. (As images can belong to multiple groups, the primary group is the “preferred” group to use for the image, particularly for replication, but it should also be preferred for imaging tasks.)

    $primaryGroup = array();
    foreach ((array)$groupids as &$groupid) {
        if (!$this->getPrimaryGroup($groupid)) {
            continue;
        }
        $primaryGroup[] = $groupid;
        unset($groupid);
    }
    

    This check just ensures a primary group is set. If no primary group was found, we need to set it to something, so it falls back to the lowest group ID (a secondary check similar to the one above). If one was found (there should only be one anyway), it sets the primaryGroup ID to that.

    if (count($primaryGroup) < 1) {
        $primaryGroup = @min((array)$groupids);
    } else {
        $primaryGroup = array_shift($primaryGroup);
    }
    

    return new StorageGroup($primaryGroup); This just returns the now-set primary group.
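
    To make the two fallback paths easier to see, here is a minimal standalone sketch of the same selection logic using plain arrays instead of FOG’s classes. The function name pickStorageGroup and the $primaryFlags map are hypothetical stand-ins for getStorageGroup() and getPrimaryGroup(); this is an illustration of the flow described above, not the actual FOG code.

```php
<?php
/**
 * Sketch only: mirrors the selection flow described above with plain arrays.
 *
 * @param int[]  $imageGroupIds group IDs assigned to the image (may be empty)
 * @param int[]  $allGroupIds   every storage group ID that exists (assumed input)
 * @param bool[] $primaryFlags  hypothetical map of group ID => "is primary for this image"
 */
function pickStorageGroup(array $imageGroupIds, array $allGroupIds, array $primaryFlags): int
{
    $groupIds = $imageGroupIds;
    if (count($groupIds) < 1) {
        // First fallback: the image has no groups, so use the lowest group ID overall.
        if (count($allGroupIds) < 1) {
            throw new Exception('No viable storage groups found');
        }
        $groupIds = [min($allGroupIds)];
    }

    // Collect the groups flagged as primary for this image.
    $primary = [];
    foreach ($groupIds as $groupId) {
        if (!empty($primaryFlags[$groupId])) {
            $primary[] = $groupId;
        }
    }

    // Second fallback: no primary flagged, so use the lowest candidate ID.
    return count($primary) > 0 ? $primary[0] : min($groupIds);
}

// Image assigned to groups 2 and 3, with 3 flagged primary.
var_dump(pickStorageGroup([2, 3], [1, 2, 3], [3 => true])); // int(3)

// Image with no group assignment: falls back to the lowest existing group.
var_dump(pickStorageGroup([], [1, 2, 3], [])); // int(1)
```

    In this sketch, the only way to end up with a group the image is not assigned to is the first fallback, which runs only when the image has no groups at all.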

    I know there’s a lot of information there. Basically, this is where I’m looking. The boot menu calls this particular method during tasking to grab the storage group.

    Are you saying the image is not assigned to the storage group it’s attempting to pull the image from?



  • I updated to the working branch this morning. It shows the version as 1.5.2.5 now. Still doing the same thing.

    I don’t have the location plugin installed.

    One thing I did notice while testing this morning: in the task view, the “Working with node” column shows the correct node until the client connects.

    The workflow was:
    Create task.
    Check Active Tasks; “Working with node” is correct.
    Boot client into FOG.
    Client checks in.
    Check Active Tasks; “Working with node” has changed to an incorrect node.
    Client fails because the image doesn’t exist on that node.
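
    The failure in the last step comes down to a membership check: a node should only be handed the task if its storage group is one of the image’s groups. Below is a minimal sketch of that check using the layout from my table. The numeric group IDs (1 for sg1, 2 for sg2, 3 for sg3), the array shape, and the function name eligibleNodes are assumptions for illustration, not FOG’s API.

```php
<?php
/**
 * Sketch only: return the names of nodes whose storage group is one of
 * the image's assigned groups. Names and array layout are hypothetical.
 *
 * @param array[] $nodes         list of ['name' => string, 'group' => int]
 * @param int[]   $imageGroupIds group IDs the image is associated with
 * @return string[]
 */
function eligibleNodes(array $nodes, array $imageGroupIds): array
{
    $names = [];
    foreach ($nodes as $node) {
        // Strict comparison so an integer ID never matches a stray string.
        if (in_array($node['group'], $imageGroupIds, true)) {
            $names[] = $node['name'];
        }
    }
    return $names;
}

// Layout from the table: one node per storage group (IDs assumed).
$nodes = [
    ['name' => 'fog',      'group' => 1],
    ['name' => 'storage1', 'group' => 2],
    ['name' => 'storage2', 'group' => 3],
];

// An image associated only with the main campus group should only see "fog".
print_r(eligibleNodes($nodes, [1]));
```

    With my setup, an image tied only to sg1 should never be served from storage1 or storage2, which is what I’m seeing after the client checks in.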


  • Senior Developer

    I believe I may have found the problem, though I have no means to test. Would you mind checking out the working branch and installing it to see if this is fixed?

    cd /path/to/fogproject/git
    git checkout working
    git pull
    cd bin
    ./installfog.sh -y
    

  • Senior Developer

    Is this at all, possibly, related to the location plugin? This will help me figure out where things are going awry. (Do you use the location plugin by chance? I don’t recall changes to how storage group associations happen).


 
