The Vertica database has elastic grow/shrink capabilities which are not currently supported by the Vertica guest agent for Trove.
Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/vertica-grow-shrink-cluster
The Vertica guest agent currently does not leverage the underlying elastic capabilities of Vertica. Adding grow and shrink support will enable a user to grow a cluster to accommodate more data or to improve query performance, while shrinking a cluster helps avoid the costs associated with overprovisioning.
As Vertica was architected from the ground up to be a clustered system, adding and removing nodes is relatively simple in comparison to other datastores.
A minimum k-safety configuration option will be added for Vertica to allow the operator to choose the desired level of fault tolerance.
The following public API calls will be made available to the Vertica datastore.
Support for the following existing CLI calls.
No changes should be necessary to accomplish these actions.
To enable more efficient grow and shrink, local data segmentation will be enabled on Vertica. This creates additional local, logical segments of data on each node so that data can be shipped between nodes more easily. The number of local segments is configurable with the scaling factor variable. Local segmentation has the drawback of making tables with many hundreds of projections less efficient.
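As a minimal sketch of what enabling local segmentation could look like, the helper below builds the SQL statements the guest agent might run. It assumes the Vertica meta-functions `SET_SCALING_FACTOR` and `ENABLE_LOCAL_SEGMENTS` are available in the deployed Vertica version; the helper name `local_segmentation_sql` is hypothetical and not part of Trove.

```python
def local_segmentation_sql(scaling_factor):
    """Build the SQL needed to turn on local segmentation.

    Hypothetical helper: the function name is illustrative, and the two
    Vertica meta-functions used here are assumed to exist in the
    installed Vertica version.
    """
    # Reject booleans explicitly, since bool is a subclass of int.
    if isinstance(scaling_factor, bool) or not isinstance(scaling_factor, int):
        raise ValueError("scaling_factor must be an integer")
    if scaling_factor < 1:
        raise ValueError("scaling_factor must be at least 1")
    return [
        # Number of local segments created per node.
        "SELECT SET_SCALING_FACTOR(%d);" % scaling_factor,
        # Turn on local segmentation for the database.
        "SELECT ENABLE_LOCAL_SEGMENTS();",
    ]
```

The statements are returned rather than executed so that the caller can decide how to run them, for example through whatever SQL execution path the guest agent already uses.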
Growing a cluster involves two main steps.
First, a new “host” must be added to the cluster, which in the case of Trove means a new instance. The update_vertica script is then called; like the install_vertica script, it handles installation of the Vertica binaries.
Second, the host must be added as a node to the database. The adminTools utility is called with the db_add_node command to register the host with the database.
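The two grow steps can be sketched as an ordered pair of commands. This is an illustration only: the install paths, the `--add-hosts` and `--rpm` options of update_vertica, and the `-d`/`-s` flags of adminTools are assumptions that should be checked against the installed Vertica version, and the helper name `grow_commands` is hypothetical. A real invocation would typically also need database credentials, which are omitted here.

```python
def grow_commands(new_hosts, db_name, rpm_path):
    """Return the commands for growing a cluster, in execution order.

    Hypothetical helper; paths and flags are assumptions, not a
    confirmed description of the guest agent implementation.
    """
    if not new_hosts:
        raise ValueError("at least one new host is required")
    hosts = ",".join(new_hosts)
    return [
        # Step 1: add the hosts to the cluster and install the binaries.
        ["/opt/vertica/sbin/update_vertica",
         "--add-hosts", hosts, "--rpm", rpm_path],
        # Step 2: register the hosts as nodes of the database.
        ["/opt/vertica/bin/admintools", "-t", "db_add_node",
         "-d", db_name, "-s", hosts],
    ]
```

Returning argument lists instead of shell strings avoids quoting problems if the commands are later passed to a process-execution utility.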
Removing a node from a Vertica cluster proceeds inversely to addition, with an extra check to ensure that the minimum k-safety level of the system is maintained.
If a user attempts to remove a node that would lower the k-safety level below the configured level, an error will be thrown.
After the k-safety check, the host is removed from the database. As with grow, the adminTools utility is called, this time using the db_remove_node command.
Then the host is removed from the cluster, using the same update_vertica script but with the --remove-hosts option.
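The shrink sequence mirrors grow in reverse order, which the sketch below makes explicit. As before, the install paths and the `-d`/`-s` flags of adminTools are assumptions to verify against the installed Vertica version, credentials are omitted, and the helper name `shrink_commands` is hypothetical. The k-safety check is assumed to have passed before these commands are built.

```python
def shrink_commands(hosts_to_remove, db_name):
    """Return the commands for shrinking a cluster, in execution order.

    Hypothetical helper; assumes the minimum k-safety check has
    already succeeded. Paths and flags are assumptions.
    """
    if not hosts_to_remove:
        raise ValueError("at least one host to remove is required")
    hosts = ",".join(hosts_to_remove)
    return [
        # Step 1: remove the nodes from the database.
        ["/opt/vertica/bin/admintools", "-t", "db_remove_node",
         "-d", db_name, "-s", hosts],
        # Step 2: remove the hosts from the cluster.
        ["/opt/vertica/sbin/update_vertica", "--remove-hosts", hosts],
    ]
```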
Vertica defines three K-safety levels for the number of nodes K that could fail while allowing the cluster to continue to operate: K=0 for clusters with 1 or 2 nodes, K=1 for clusters with 3 or 4 nodes, and K=2 for 5 or more.
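The mapping from node count to K-safety level, and the check applied before a shrink, can be sketched as follows. The function names and the `min_k_safety` parameter are hypothetical stand-ins for the proposed configuration option; only the node-count thresholds come from the levels listed above.

```python
def max_k_safety(node_count):
    """Highest K-safety level a cluster of node_count nodes can have.

    Follows the levels above: K=0 for 1-2 nodes, K=1 for 3-4 nodes,
    K=2 for 5 or more.
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return min(2, (node_count - 1) // 2)


def check_shrink(current_nodes, nodes_to_remove, min_k_safety):
    """Raise ValueError if the shrink would violate the configured minimum.

    min_k_safety is a hypothetical name for the operator-defined
    minimum level. Returns the K-safety level after the shrink.
    """
    remaining = current_nodes - nodes_to_remove
    if remaining < 1:
        raise ValueError("cannot remove every node from the cluster")
    k_after = max_k_safety(remaining)
    if k_after < min_k_safety:
        raise ValueError(
            "shrinking to %d nodes gives K=%d, below the minimum K=%d"
            % (remaining, k_after, min_k_safety))
    return k_after
```

For example, with a minimum level of 1, shrinking a five-node cluster by two nodes is allowed (three nodes remain, K=1), while shrinking it by three nodes is rejected (two nodes remain, K=0).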
Rather than prevent a user from removing nodes that would result in a lower k-safety value, it is up to the operator to define a minimum level of safety she is willing to accept. For example, in some cases it may be that the costs associated with overprovisioning the cluster outweigh the risk of data being unavailable.
Trove could enforce a minimum k-safety level to ensure the integrity of the cluster, but this could be too restrictive.
Integration tests will be added or modified as needed in order to test grow/shrink with the new int-test framework.
The documentation should be updated to reflect that grow and shrink are supported for Vertica clusters.
There will be some minor changes to the UI to support grow and shrink buttons for the cluster.
The Vertica documentation recommends that local data segmentation be used with a number of nodes that is a power of two. Some experimentation will be required to see whether violating this recommendation is still worthwhile compared to not using local data segmentation at all.