The Cassandra datastore currently does not support any backup/restore strategy.
Once this work is complete, the following Trove CLI command will be fully functional with Cassandra:
- create --backup
We are implementing full backup using node snapshots, following the procedure outlined in the Nodetool manual. Nodetool can take a snapshot of one or more keyspaces. A snapshot can be restored by moving all *.db files from a snapshot directory to the respective keyspace directory, overwriting any existing files.
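The copy-back step can be sketched as follows. This is only an illustration, not the Trove implementation; it assumes Cassandra's default on-disk layout of data/&lt;keyspace&gt;/&lt;table&gt;/snapshots/&lt;snapshot-name&gt;, and the paths are hypothetical.

```shell
# Sketch: copy a named snapshot's *.db files back into their parent
# table directories, assuming the default data/<ks>/<table>/snapshots/<name>
# layout. Paths and names are illustrative.
restore_snapshot() {
    local data_dir="$1" snapshot_name="$2"
    local snap table_dir
    for snap in "$data_dir"/*/*/snapshots/"$snapshot_name"; do
        [ -d "$snap" ] || continue
        # Strip the trailing "/snapshots/<name>" to get the table directory.
        table_dir="${snap%/snapshots/*}"
        cp -f "$snap"/*.db "$table_dir"/
    done
}
```

For example, `restore_snapshot /var/lib/cassandra/data my-backup-id` would copy every `*.db` file of the `my-backup-id` snapshot back next to the live data files.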
When a snapshot is taken, Cassandra starts writing all changes to new data files, keeping the old ones in the same state as when the snapshot was taken. The data storage must have enough capacity to accommodate the backlog of all changes throughout the duration of the backup operation, until the snapshots get cleaned up.
Backups are streamed to and from remote storage as (TAR) archives. The general procedure for creating and restoring such an archive is outlined below.
Unique backup IDs will be used for snapshot names, to avoid collisions between concurrent backups.
The Backup Procedure:
1. Make sure the database is up and running.
2. Clear any existing snapshots (nodetool clearsnapshot) with the same name as the one about to be created.
3. Take a snapshot of all keyspaces (nodetool snapshot).
4. Collect all *.db files from the snapshot directories.
5. Package the snapshot files into a single TAR archive (compressing and/or encrypting as required) while streaming the output to Swift storage under the database_backups container.
6. Transform the paths so that the backup can be restored simply by extracting the archive directly into an existing data directory. This ensures we can always restore an old backup even if the standard guest agent data directory changes.
7. Clear the created snapshots as in step 2.
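The procedure above can be sketched as a shell pipeline. The nodetool and swift invocations are left as comments because they require a live node and object store, and their flags are assumptions; the path-transform step is shown as a working function. The snapshot tag is illustrative, while the database_backups container name comes from the procedure above.

```shell
# Clear any stale snapshots, then take a fresh one (illustrative flags):
# nodetool clearsnapshot -t "$BACKUP_ID"
# nodetool snapshot -t "$BACKUP_ID"

# Collect the snapshot *.db files and strip the "snapshots/<name>" path
# component so the archive extracts directly into any data directory.
archive_snapshots() {
    local data_dir="$1" snapshot_name="$2" out="$3"
    (cd "$data_dir" &&
     find . -path "*/snapshots/$snapshot_name/*.db" -print0 |
     tar -czf "$out" --null --transform "s|/snapshots/$snapshot_name||" -T -)
}

# Stream the archive to Swift, then clear the created snapshots:
# archive_snapshots /var/lib/cassandra/data "$BACKUP_ID" - |
#     swift upload database_backups --object-name "$BACKUP_ID.tar.gz" -
# nodetool clearsnapshot -t "$BACKUP_ID"
```

GNU tar's --transform rewrites member names at archive time, so a member such as ks1/tbl1/snapshots/&lt;name&gt;/data.db is stored as ks1/tbl1/data.db and extracts straight into an existing data directory.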
The Restore Procedure:
Instances are created as single-node clusters, so a restored instance should also belong to its own cluster. The original cluster name property has to be reset to match the new unique ID of the restored instance. This ensures that the restored instance forms a new single-node cluster rather than joining one with the original node or with other instances restored from the same backup. The cluster name is stored in the database and must match the configured value; Cassandra fails to start otherwise.
A general ‘cluster_name’ reset procedure is:
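One possible sequence is sketched below. The CQL statement, the nodetool flush arguments, and the configuration path are assumptions based on Cassandra 2.1 conventions, not the Trove implementation.

```shell
# Rewrite the cluster_name line in cassandra.yaml (illustrative path).
set_config_cluster_name() {
    local conf="$1" new_name="$2"
    sed -i "s|^cluster_name:.*|cluster_name: '$new_name'|" "$conf"
}

# With the node still running on the restored data, change the stored
# name in the system keyspace and flush it to disk:
# cqlsh -e "UPDATE system.local SET cluster_name = '$NEW_NAME' WHERE key = 'local';"
# nodetool flush system
# Then make the configuration match and restart:
# set_config_cluster_name /etc/cassandra/cassandra.yaml "$NEW_NAME"
# <restart cassandra>
```

Both the value stored in system.local and the cluster_name in cassandra.yaml must be changed, since Cassandra refuses to start when they disagree.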
The ‘superuser’ (“root”) password stored in the system keyspace needs to be reset before we can start up with restored data.
A general password reset procedure is:
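A sketch of one such sequence follows. It assumes Cassandra 2.1's PasswordAuthenticator and system_auth.credentials layout; the file path, the variable names, and the exact CQL are illustrative assumptions.

```shell
# Toggle the authenticator line in cassandra.yaml (illustrative path).
set_authenticator() {
    local conf="$1" value="$2"
    sed -i "s|^authenticator:.*|authenticator: $value|" "$conf"
}

# Disable authentication, replace the stored credentials, re-enable it:
# set_authenticator /etc/cassandra/cassandra.yaml AllowAllAuthenticator
# <restart cassandra>
# cqlsh -e "UPDATE system_auth.credentials SET salted_hash = '$NEW_HASH'
#           WHERE username = 'cassandra';"   # $NEW_HASH: a bcrypt hash
# set_authenticator /etc/cassandra/cassandra.yaml PasswordAuthenticator
# <restart cassandra>
```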
Unit tests will be added to validate the implemented functions and non-trivial code paths. Functional tests are not implemented as part of this patch set.
The datastore documentation should be updated to reflect the enabled features. Also note the new configuration options: the backup/restore namespaces and backup_strategy for the Cassandra datastore.
References:
- Documentation on Nodetool utility for Cassandra 2.1: http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsNodetool_r.html
- Manual on Backup and Restore for Cassandra 2.1: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_restore_c.html
- Documentation on Cassandra 2.1: http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
- Database and User Functions for Cassandra: https://blueprints.launchpad.net/trove/+spec/cassandra-database-user-functions
- Configuration Groups for Cassandra: https://blueprints.launchpad.net/trove/+spec/cassandra-configuration-groups