Storagenode and queues
-
Server
- FOG Version: RC-25
- OS: Ubuntu 14
Client
- Service Version: 11.5
- OS: Windows 7
Description
Hi guys, I have four storage nodes, each with 5 slots available. The main server's slots are set to 0 so the GUI/machine is always responsive.
Earlier I launched an imaging task for 16 machines. 5 went in and started, but the rest just sat in the queue as if no slots were available, even though there are 15 more! Checking the machines using
http://fogserver/fog/service/ipxe/boot.php?mac=e0:69:95:dd:03:af
I can see that the queued ones I looked at are all waiting for the same storage node instead of spreading out over all of them. Is this normal? I would not have thought so, but I'm not sure how to resolve it.
-
If you’re using the location plugin and these 16 machines are all assigned to the same location, then this behavior is perfectly normal.
Do you know if you’re using the location plugin?
-
@Wayne-Workman
I do have it enabled, but I don't set a location on any of the hosts, and all the storage nodes are in the same group/location.
-
@theWizard If all storage nodes are assigned to the same location and are all in the same group, there’s no point in using the location plugin. You can uninstall it in the plugin management area, then try again, see what happens.
-
@Wayne-Workman
There is a point, because the machines that are used to build images on are set to the main server. When uploading/downloading on those machines, 1 slot is opened up and there is no need to wait for replication (we can be updating and uploading images at the same time as deploying machines with the image they will be replacing, so our replication only runs once a day).
Location plugin aside, if I open 1 slot on the server and then deploy 5 machines, only 1 will start and the others will be queued. I can disable the location plugin for testing purposes though, no problem.
-
@Wayne-Workman
OK, I disabled the location plugin and set the main server queue to 1, meaning there is 1 slot on the main server and 5 on each of the four nodes (21 total).
I launched two machines to image and only one started. The other sits in the queue, I assume because the result of
http://fogserver/fog/service/ipxe/boot.php?mac=28:f1:0e:15:34:e8
is trying to use the server instead of any of the storage nodes.
-
@theWizard Did you disable the location plugin, or completely uninstall it? Also are you sure the image you’re working with exists on the other nodes?
-
I’m going to set up a test at home to see if queuing works right. I’ll be back in a few hours.
-
@Wayne-Workman
Yea, I checked all nodes and the images are there as they should be. Like I say, if I set the queue on the main server to 0 it will then use a single storage node and not any others, i.e. with a queue of 5 on node A, if I fire a group of 20 to deploy, only 5 will start and the other 15 will sit in the queue even though nodes B, C and D have 5 slots each too.
Yea, I removed it from installed plugins.
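To make the expected numbers concrete, here is a small Python sketch of the slot arithmetic. It assumes a simple "most free slots first" rule purely for illustration; it is not FOG's actual scheduling code, and the node names and the `assign_tasks` helper are hypothetical.

```python
def assign_tasks(max_clients, task_count):
    """Greedy sketch: give each task to the node with the most free slots.

    max_clients: dict of node name -> slot count (names are hypothetical).
    Returns (running, queued), where running maps node -> tasks started.
    """
    running = {node: 0 for node in max_clients}
    queued = 0
    for _ in range(task_count):
        # Pick the node that currently has the most free slots.
        node = max(max_clients, key=lambda n: max_clients[n] - running[n])
        if max_clients[node] - running[node] > 0:
            running[node] += 1
        else:
            queued += 1  # every node is full, so this task has to wait
    return running, queued


# Main server at 0 slots, four nodes at 5 slots each, 20 tasks.
all_nodes = {"main": 0, "A": 5, "B": 5, "C": 5, "D": 5}
print(assign_tasks(all_nodes, 20))

# Same 20 tasks when only node A is ever considered.
print(assign_tasks({"A": 5}, 20))
```

In the first call every task gets a slot and nothing queues. The second call mirrors the symptom in this thread: when only one node is considered, 5 tasks start and 15 queue.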
-
Ok, just tested this at home with 2 storage nodes, both on RC-25. I put both in the same storage group and waited for replication to complete. I didn’t install the Location Plugin. I set both storage nodes to 1 maximum client, and then I started imaging two computers with the same image.
It load balanced properly (see pictures below). So now the question is: how do you have your system set up? Can you post a screenshot of your Storage Management area, like I did in the 1st picture below, so we can see how you have your nodes, groups, masters, and max clients arranged, please?
-
Thinking more on this, it could be a DB connection issue or a mismatched version issue. For each of your storage nodes, visit this URL, replacing x.x.x.x with that node's IP address.
x.x.x.x/fog/service/getversion.php
- All should respond
- None should give any sort of error
- All should be on the exact same version as the main server, which should be RC-25.
Output for them all should look like:
1.3.0-RC-25
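If there are a lot of nodes, the same check can be scripted. The following is a minimal Python sketch, assuming the nodes answer over plain `http://` and return only the version string; the helper names and the placeholder IP addresses are made up for illustration.

```python
import urllib.request

EXPECTED = "1.3.0-RC-25"  # version the main server reports in this thread


def fetch_version(ip, timeout=5):
    """Fetch the version string from one node (assumes plain http)."""
    url = f"http://{ip}/fog/service/getversion.php"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()


def find_mismatches(versions, expected=EXPECTED):
    """Return only the nodes whose reported version differs from expected."""
    return {ip: v for ip, v in versions.items() if v != expected}


def check_nodes(ips, fetch=fetch_version):
    """Query every node; connection errors are recorded as mismatches."""
    versions = {}
    for ip in ips:
        try:
            versions[ip] = fetch(ip)
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            versions[ip] = f"ERROR: {exc}"
    return find_mismatches(versions)


# Example call with placeholder addresses (replace with your node IPs):
# print(check_nodes(["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]))
```

An empty result means every node responded and matched the expected version; anything else lists the nodes to look at.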
-
Completed on each one with no errors at all, each one returning RC-25, including the server.
Attached is the management page image.
-
@Wayne-Workman
I think I cracked it. After you mentioned DB connection issues I checked everything using Workbench and found the fogstorage user had no schema privileges at all! Not sure how they would disappear, but after adding them back in, FOG seems to be picking other storage nodes correctly.
Thank you for the help and guidance.
This can be marked as no bug, but it may be helpful for others to check if they encounter similar issues.
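For anyone checking the same thing without Workbench, running `SHOW GRANTS FOR 'fogstorage'@'%';` in MySQL lists the account's privileges. Below is a rough Python sketch that scans that output for a real grant on the FOG schema. The schema name `fog`, the `'%'` host, the sample grant lines, and the `has_schema_privileges` helper are all assumptions for illustration; which privileges FOG actually requires is not covered here.

```python
import re

# Matches lines such as: GRANT ALL PRIVILEGES ON `fog`.* TO 'fogstorage'@'%'
GRANT_RE = re.compile(
    r"^GRANT\s+(?P<privs>.+?)\s+ON\s+(?P<target>\S+)\s+TO\s",
    re.IGNORECASE,
)


def has_schema_privileges(grant_lines, schema="fog"):
    """Return True if any grant gives more than USAGE on the schema."""
    for line in grant_lines:
        match = GRANT_RE.match(line.strip())
        if not match:
            continue
        privs = match.group("privs").strip().upper()
        target = match.group("target").replace("`", "")
        if privs == "USAGE":
            continue  # USAGE means "no privileges", only the right to log in
        if target in ("*.*", f"{schema}.*"):
            return True
    return False


# Hypothetical SHOW GRANTS output before and after restoring privileges.
broken = ["GRANT USAGE ON *.* TO 'fogstorage'@'%'"]
fixed = broken + ["GRANT ALL PRIVILEGES ON `fog`.* TO 'fogstorage'@'%'"]
print(has_schema_privileges(broken), has_schema_privileges(fixed))
```

An account that only shows `USAGE` can log in but cannot read or write the schema, which would fit the behaviour described above.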