Storage Nodes Not Providing Images
-
@george1421 said in Storage Nodes Not Providing Images:
Each fog server will fill its 1GbE network uplink connection with 3-4 simultaneous unicast sessions
In my experience, unicast will saturate a 1Gbps link with 3 simultaneous sessions. With 2 simultaneous sessions, it is almost saturated.
But yeah, you need to use Multicast. But still, we should explore why the nodes aren’t working. It’s probably something simple.
-
@wayne-workman said in Storage Nodes Not Providing Images:
In my experience, unicast will saturate a 1Gbps link with 3 simultaneous sessions. With 2 simultaneous sessions, it is almost saturated.
That was my experience too. I was being a bit generous saying 3-4, since I included the performance hit of the disk subsystem. But it’s exactly the 3rd stream where the link gets saturated and the retransmits shoot up, if all you are doing is moving data. If you have a slower disk subsystem, that relieves some of the pressure on the network.
ref: https://forums.fogproject.org/topic/10459/can-you-make-fog-imaging-go-fast/5
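For a rough feel of the numbers, here is a back-of-the-envelope sketch in Python. It assumes the link is shared evenly between sessions and ignores protocol overhead and retransmits; the function names are made up for illustration. Also keep in mind that, as far as I understand it, Partclone reports the rate of data written on the client after decompression, so its GB/min figure can be higher than what actually crosses the wire.

```python
def link_capacity_gb_per_min(link_gbps=1.0):
    """Raw link capacity in gigabytes per minute (no protocol overhead)."""
    bytes_per_sec = link_gbps * 1e9 / 8   # bits per second -> bytes per second
    return bytes_per_sec * 60 / 1e9       # bytes per second -> GB per minute


def per_session_share(link_gbps, sessions):
    """Even split of the raw capacity across simultaneous unicast sessions."""
    return link_capacity_gb_per_min(link_gbps) / sessions


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n} session(s): {per_session_share(1.0, n):.2f} GB/min each")
```

A 1GbE link tops out at 7.5 GB/min of raw data, so by the third stream each client can be fed at most about 2.5 GB/min over the wire.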
-
Thank you both for all the information. I have considered link aggregation but have not worked on implementing such a solution yet. I have not tried a multicast session yet but have just started reading more on it.
In terms of imaging speeds, before I made the nodes I was usually getting about 2.9 to 5.3 GB/min according to Partclone, depending on the age of the hard drives in the machines being imaged. That was while imaging at least 5-7 machines at once. When only imaging 2-4, the speeds were often higher. Each of the servers has 1TB SSDs, so they are pretty quick when it comes to their own disk speeds. I’ll look at the link saturation tomorrow when I get in the office and follow up here. Thank you for the helpful links!
Here is the Storage Node webpage from the GUI:
Here are the screen shots of the specific details per node. I played around with replication speeds but that did not make a difference.
-Master Node-
-Storage Node 1-
-Storage Node 2-
-
@voison I’ve not fully looked through the photos - but I did notice the passwords for the storage nodes are extremely short. The FOG installer would not have set such short passwords, it sets really long ones. So my first thought is that the passwords are simply wrong, which means replication never happened, which means these nodes don’t have copies of the images, which means they won’t be chosen by the fog server. Have a look at this article: https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_FTP
AGAIN, just a hunch. Maybe the passwords are right, idk.
-
Well, an interesting find. No, the passwords were not correct. I checked both of the nodes (/opt/fog/.fogsettings) and their passwords did differ from the passwords in the web gui. I tried updating the web gui, however it has not fixed the problem; all unicast images are still being pulled from the Master Node. I do remember watching the replicator log for a while when I first created the nodes. So, that being said, I made a little comparison of the /images/ directories on all three servers. The master node does have more than the other two, but these are just old images that we deleted out of FOG but not off of the disk. Everything in the directories on the storage servers is the same as what is found in the web gui. I would say that the replicator did run at some point, even with the differing passwords. Is there any kind of manual system refresh I need to do after updating the passwords to associate the storage nodes to the master node?
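For anyone repeating this check, here is a minimal Python sketch of the comparison. It assumes .fogsettings holds shell-style KEY='value' lines and that the key of interest is called password; both are assumptions on my part, so check the actual file. The function names are hypothetical.

```python
import shlex


def parse_settings(text):
    """Parse shell-style KEY='value' lines into a dict, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split strips the surrounding quotes the way a shell would
        parts = shlex.split(raw)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def password_matches(settings_text, gui_password, key="password"):
    """True if the value stored under `key` (assumed name) equals the GUI value."""
    return parse_settings(settings_text).get(key) == gui_password


if __name__ == "__main__":
    # Reading this file usually needs root on the node.
    with open("/opt/fog/.fogsettings") as fh:
        print(password_matches(fh.read(), input("Password shown in web gui: ")))
```

This only tells you whether the two values agree. It does not change anything on the node or in the database.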
-
@voison The fog installer has mechanisms built into it to correct an incorrect password for the local account used and for the credentials stored in the DB for the node the installer is running on. The easiest way to correct this stuff is to ensure you have your desired password for a node set inside of /opt/fog/.fogsettings and just rerun the installer on that node. These mechanisms won’t work though if there’s a DB connectivity problem between the nodes and the DB. There are troubleshooting steps for that here: https://wiki.fogproject.org/wiki/index.php?title=Troubleshoot_MySQL
-
@wayne-workman Good call on the MySQL issue. The database on the Master Node would not allow a login for ‘fogstorage’ from anything except localhost. I fixed the database issues for both storage nodes and confirmed that all three machines can access the MySQL server. After this I looked over the .fogsettings file on both storage nodes, configured them accordingly, then ran the installer again. Everything completed successfully. I attempted another batch of images, and nothing changed. I then went to see what was happening with the replicator, just out of curiosity, and it seemed that the Master Node was trying to replicate something, but neither of the storage nodes seemed to recognize anything. While writing this, they have both since come back with “disabled replication” messages:
[01-30-18 11:20:31 pm] * Starting ImageReplicator Service
[01-30-18 11:20:31 pm] * Checking for new items every 600 seconds
[01-30-18 11:20:31 pm] * Starting service loop
[01-30-18 11:20:31 pm] * | This is not the master node
[01-30-18 11:30:31 pm] * | This is not the master node
[01-30-18 11:55:59 pm] * * Image replication is globally disabled
For a very short brief moment both of the servers were showing up on both the home page in the bandwidth graph as well as on the web gui > Fog Configuration > Kernel Versions; however they have now disappeared from both. I’m not sure if that helps with the problem but I figured I would report it.
Do I need to re-run the installer on the master node as well?
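The log lines above follow a "[timestamp] * message" layout, so they are easy to scan with a script when comparing nodes. Below is a small Python sketch that parses that layout and flags the two messages seen here. The layout is inferred only from the lines pasted in this post, and the function names are hypothetical. It only classifies messages; it does not say whether they indicate a fault.

```python
import re
from datetime import datetime

# Matches "[01-30-18 11:20:31 pm] * message" as seen in the pasted log.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s*(?P<msg>.*)$")


def parse_log(text):
    """Return (timestamp, message) tuples for every line that matches the layout."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        # %p is locale dependent; this assumes an English/C locale for am/pm.
        ts = datetime.strptime(match.group("ts"), "%m-%d-%y %I:%M:%S %p")
        # Drop the leading "*" and "|" decoration in front of the message.
        msg = match.group("msg").lstrip("*| ").strip()
        entries.append((ts, msg))
    return entries


def summarize(entries):
    """Flag the two messages that appear in the log shown in this post."""
    return {
        "not_master": any("not the master node" in m for _, m in entries),
        "globally_disabled": any("globally disabled" in m for _, m in entries),
    }
```

Running summarize(parse_log(text)) on each node's replicator log gives a quick side-by-side of which nodes report which message.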
-
@voison Replication activities are only ‘conducted’ from master nodes. Meaning the non-master nodes don’t really do anything other than receive data that the master sends. So all of the logs that matter are on your master node. If it says it’s replicating, just give it some time.
Here’s a reference thread also where someone posted both master and non-master logs. In the non-master log, you can see the same message ‘Image replication is globally disabled’ but in the master one you can see it’s going.
https://forums.fogproject.org/topic/10891/image-replication-not-working/10
@voison said in Storage Nodes Not Providing Images:
For a very short brief moment both of the servers were showing up on both the home page in the bandwidth graph as well as on the web gui > Fog Configuration > Kernel Versions; however they have now disappeared from both.
When a lot of bandwidth is being taken by replication, the graphs sort of crap out. They are not the best/most resilient graphs really, they are only meant to give you ‘an idea’ of what’s happening. If you want to know exactly what the throughputs are, you should use a CLI tool like iftop or something.
-
Thank you Wayne, that is good to know about the replication scripts.
Sorry I disappeared for a bit, I had to go out of town for a little while. Unfortunately, now that I have returned, there has been no change in the status of this issue. I have confirmed that the replicator ran perfectly fine, and the /images directories on all three servers are identical. What would you all suggest I try next? At this point I am starting to consider exporting all of the images to a hard drive and starting over from scratch.
The server is still running well, even while supporting as many unicast streams as it does. However I would like to try to solve this issue. I have gone back over this evening and checked on all of the passwords, thoroughly eliminating that as a possible culprit.
-
@voison If you have some time this weekend, I am willing to help you out via a screen share - PM me.