Replication functionality needs to be added to the Redis datastore.
Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/redis-replication
At present, only single Redis instances can be created. While this is useful, it is also desirable to have multiple slaves that replicate from a designated master. This spec addresses that functionality.
Redis replication is a master-slave replication that is very simple to use and configure. It allows Redis slaves to be exact copies of master servers.
Redis replication has the following features:
To improve performance, persistence can be turned off on the master node. This, however, can lead to data loss if the system reboots and Redis restarts automatically: the master comes back with an empty dataset, and the slaves then replicate that empty state. For this reason, the master node will be required to have persistence enabled.
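The persistence requirement could be checked before an instance is accepted as a master. The following is a minimal sketch, not the Trove implementation: the function name `persistence_enabled` is hypothetical, and it assumes the caller has already fetched the `save` and `appendonly` values (for example via CONFIG GET) into a dict.

```python
def persistence_enabled(config):
    """Return True if RDB snapshots or the append-only file are enabled.

    `config` is assumed to be a dict of Redis settings, e.g.
    {"save": "900 1 300 10", "appendonly": "no"}.
    """
    # An empty "save" value means no RDB save points are configured.
    rdb_enabled = bool(config.get("save", "").strip())
    # AOF persistence is on only when appendonly is explicitly "yes".
    aof_enabled = config.get("appendonly", "no").strip().lower() == "yes"
    return rdb_enabled or aof_enabled


if __name__ == "__main__":
    print(persistence_enabled({"save": "900 1 300 10", "appendonly": "no"}))
    print(persistence_enabled({"save": "", "appendonly": "no"}))
```

A replication strategy could refuse to attach slaves when this check returns False.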
Creating a Redis replication network is handled by the Redis SLAVEOF command. A new instance (or set of instances) will be created and the SLAVEOF command executed on each one. Having Trove create a backup and restore it is not necessary, as Redis has this capability built into the SLAVEOF command. This means that the Redis replication strategy will need to bypass the creation of a backup to add to the snapshot info, and the taskmanager will need to be modified to handle this case.
Note: Redis replication could be enabled using the current backup/restore implementation; however, once the slave restarts (or starts for the first time), it will automatically do a full resync, thus rendering the backup obsolete.
Enough disk space must be available on the master node to allow Redis to persist its data to storage.
Note: Starting in version 2.8.18, Redis has the (experimental) ability to stream the backup directly to the slave. Since this behaviour is still considered experimental (in version 3.0), a specific version of Redis will not be required - beyond being >= 2.8 - as the feature could be removed in a future release. If it exists, however, it can be used by Redis to increase performance on systems with slow disks. A configuration parameter will be provided to allow operators to turn this feature on.
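The version rules in the note above can be sketched as follows. This is an illustration only: the names `parse_version`, `use_diskless_sync`, and `operator_enabled` are hypothetical, and the version string is assumed to be a plain dotted number such as the `redis_version` field reported by INFO.

```python
MIN_REPLICATION_VERSION = (2, 8)      # minimum Redis version per this spec
MIN_DISKLESS_VERSION = (2, 8, 18)     # first version with diskless sync


def parse_version(version):
    """Turn a dotted version string such as '2.8.18' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def use_diskless_sync(redis_version, operator_enabled):
    """Decide whether diskless sync should be switched on.

    Raises ValueError if the Redis version is older than 2.8.
    Diskless sync is used only if the operator opted in and the
    installed Redis is new enough to provide it.
    """
    version = parse_version(redis_version)
    if version < MIN_REPLICATION_VERSION:
        raise ValueError("Redis >= 2.8 is required, found %s" % redis_version)
    return bool(operator_enabled) and version >= MIN_DISKLESS_VERSION


if __name__ == "__main__":
    print(use_diskless_sync("3.0.0", operator_enabled=True))
    print(use_diskless_sync("2.8.17", operator_enabled=True))
```

Keeping the feature behind an operator flag means nothing changes for deployments that do not opt in, even when the installed Redis supports it.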
The Redis configuration file on each slave will have the corresponding values set so that the slave status is maintained across subsequent database restarts. As part of the slave configuration, all slaves will also be set to read-only. As with the MySQL implementation, slave-of-slave will not be allowed. The feature could be augmented to include this in the future.
The steps to create a replication network are as follows:
Create the necessary configuration file. This will have the following settings:
- slaveof <master_ip> <master_port>
- repl-diskless-sync-delay (if more than one slave is specified)
Create ‘n’ new slave instances with the correct configuration file
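The configuration step above might be rendered along these lines. This is a rough sketch under stated assumptions: `render_slave_config` is a hypothetical helper, the default delay of 5 seconds is chosen only for illustration, and `slave-read-only yes` is included because the spec requires all slaves to be read-only.

```python
def render_slave_config(master_ip, master_port, slave_count,
                        diskless_sync_delay=5):
    """Build the replication-related lines of a slave's redis.conf.

    `diskless_sync_delay` is an illustrative default, not a value
    taken from the spec.
    """
    lines = [
        # Point this instance at its master.
        "slaveof {0} {1}".format(master_ip, master_port),
        # Slaves reject writes from clients.
        "slave-read-only yes",
    ]
    # The spec lists this setting only when more than one slave is requested.
    if slave_count > 1:
        lines.append(
            "repl-diskless-sync-delay {0}".format(diskless_sync_delay))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_slave_config("10.0.0.5", 6379, slave_count=3))
```

The same rendered fragment would be used for each of the 'n' slaves, since every slave points at the same master.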
The current API for detach-replica will need to be implemented. No additions to the API are anticipated.
The current APIs for failover (both eject-replica-source and promote-to-replica-source) will need to be implemented. When ejecting the current replica source, a slave needs to be chosen as the new one. This will be done by overriding the _most_current_replica() method so that it queries each slave and chooses the one with the smallest value for 'master_last_io_seconds_ago'. This, presumably, will be the one with the most current data.
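The selection rule could look roughly like the sketch below. It is not the actual _most_current_replica() override: the helper names and the `replica_infos` mapping (replica id to raw INFO replication text) are hypothetical. It also assumes that a negative value means the link to the master is down, and ranks such a slave last.

```python
def parse_info(info_text):
    """Parse the 'key:value' lines of Redis INFO output into a dict."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as '# Replication'.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def most_current_replica(replica_infos):
    """Return the id of the slave that heard from the master most recently.

    `replica_infos` maps a replica id to its raw INFO replication text.
    """
    def lag(item):
        seconds = int(parse_info(item[1])["master_last_io_seconds_ago"])
        # Assumption: a negative value means the link is down, so rank last.
        return float("inf") if seconds < 0 else seconds

    return min(replica_infos.items(), key=lag)[0]


if __name__ == "__main__":
    infos = {
        "replica-a": "# Replication\r\nrole:slave\r\nmaster_last_io_seconds_ago:7\r\n",
        "replica-b": "# Replication\r\nrole:slave\r\nmaster_last_io_seconds_ago:2\r\n",
    }
    print(most_current_replica(infos))
```

Seconds since the last master interaction is only a proxy for data freshness, which is why the spec says "presumably".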
The default values for the following config options will need to be updated:
Existing Python bindings are sufficient, and no changes are anticipated.
Once these changes are implemented, the following Trove CLI commands will now be fully functional with Redis:
- create --replica_of <id> --replica_count <n>
The following files will need to be added to the guest agent, where the corresponding implementation will reside:
The following existing files will be updated:
- guestagent/datastore/experimental/redis/manager.py
- guestagent/datastore/experimental/redis/service.py
- guestagent/datastore/experimental/redis/system.py
No backwards compatibility issues are anticipated.
No alternative solutions are proposed at this time.
No new tests are deemed to be required (beyond the requisite unit tests). The int-tests group for Redis will be modified to run replication-related commands during integration test runs.
Datastore-specific documentation should be modified to indicate that replication is now supported by Redis, along with the corresponding detach/failover commands.