Redis Replication

Replication functionality needs to be added to the Redis datastore.

Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/redis-replication

Problem Description

At present, only single instances of Redis can be created. This is useful, but it is also desirable to have multiple slaves that replicate from a designated master. This spec addresses that functionality.

Proposed Change

Redis replication is a master-slave replication that is very simple to use and configure. It allows Redis slaves to be exact copies of master servers. [1]

Replication

Redis replication has the following features:

  • Asynchronous
  • Multiple slaves
  • Slave-of-slave
  • Non-blocking initialization of slaves

To improve performance, persistence can be turned off on the master node. However, this can lead to data loss if the system reboots and Redis restarts automatically: the master comes back with an empty data set, and the slaves then replicate that empty state. For this reason, the master node will be required to have persistence enabled.
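As a rough illustration of the persistence requirement, the sketch below checks a dictionary of Redis configuration values (for example, as returned by CONFIG GET) and reports whether either form of persistence is on. The helper name `persistence_enabled` is hypothetical and not part of Trove; the check assumes that an empty `save` value means RDB snapshots are disabled and that `appendonly yes` means the append-only file is in use.

```python
def persistence_enabled(config):
    """Return True if the given Redis settings enable RDB or AOF persistence.

    `config` is a plain dict of setting name -> string value (hypothetical
    input shape chosen for this example).
    """
    # AOF persistence is on when 'appendonly' is set to 'yes'.
    aof_on = config.get("appendonly", "no").strip().lower() == "yes"
    # RDB snapshots are on when at least one 'save' point is configured.
    rdb_on = config.get("save", "").strip() != ""
    return aof_on or rdb_on


with_rdb = persistence_enabled({"appendonly": "no", "save": "900 1 300 10"})
without_any = persistence_enabled({"appendonly": "no", "save": ""})
```

A check along these lines could run before a master is accepted as a replica source, but where it would live in Trove is left open here.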

Creating a Redis replication network is handled by the Redis SLAVEOF command. A new instance (or set of instances) will be created and the SLAVEOF command executed on each one. Having Trove create a backup and restore it is not necessary, as Redis has this capability built into the SLAVEOF command. This means that the Redis replication strategy will need to bypass the creation of a backup to add to the snapshot info, and the taskmanager will need to be modified to handle this case.
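The SLAVEOF step described above can be sketched as follows. This is a minimal illustration, assuming a client object that exposes an `execute_command` method in the style of redis-py; `RecordingClient` and `attach_slaves` are hypothetical names used only for this example, and the stand-in client records commands instead of talking to a server.

```python
class RecordingClient:
    """Hypothetical stand-in for a Redis client; records commands it is given."""

    def __init__(self):
        self.commands = []

    def execute_command(self, *args):
        # A real client would send the command to the Redis server here.
        self.commands.append(args)
        return "OK"


def attach_slaves(slave_clients, master_host, master_port):
    """Issue SLAVEOF <host> <port> on each newly created slave."""
    for client in slave_clients:
        client.execute_command("SLAVEOF", master_host, str(master_port))


# Example: two new instances attached to a master at an assumed address.
slaves = [RecordingClient(), RecordingClient()]
attach_slaves(slaves, "10.0.0.5", 6379)
```

No backup or restore call appears in the sketch because the initial synchronization is performed by Redis itself once SLAVEOF is issued.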

Note: Redis replication could be enabled using the current backup/restore
implementation. However, once the slave restarts (or starts for the first
time), it will automatically do a full resync, thus rendering the backup
obsolete. [1]

Enough disk space must be available on the master node to allow Redis to persist its data to storage.

Note: Starting in version 2.8.18, Redis has the (experimental) ability to
stream the backup directly to the slave.  Since this behaviour is still
considered experimental (in version 3.0), a specific version of Redis will
not be required - beyond being >= 2.8 - as the feature could be removed in
a future release.  If it exists, however, it can be used by Redis to
increase performance on systems with slow disks.  A configuration parameter
will be provided to allow operators to turn this feature on.

The Redis configuration file on each slave will have the corresponding values set so that the slave status is maintained across database restarts. As part of the slave configuration, all slaves will also be set to read-only. As with the MySQL implementation, slave-of-slave will not be allowed. The feature could be augmented to include this in the future.

The steps to create a replication network are as follows:

  • Create the necessary configuration file. This will have the following settings:

    • slaveof <master_ip> <master_port>
    • slave-read-only
    • repl-diskless-sync-delay (if more than one slave is specified)
  • Create ‘n’ new slave instances with the correct configuration file
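
The configuration settings listed in the steps above can be sketched as a small rendering function. This is an illustration only: `render_slave_config` is a hypothetical helper, the `yes` value for `slave-read-only` follows from the requirement that all slaves be read-only, and the default delay of 5 seconds is an assumption made for the example.

```python
def render_slave_config(master_ip, master_port, slave_count,
                        diskless_sync_delay=5):
    """Build the replication section of a slave's Redis configuration file."""
    lines = [
        "slaveof {} {}".format(master_ip, master_port),
        "slave-read-only yes",
    ]
    # The delay only matters when several slaves are created together, so
    # it is emitted only if more than one slave is requested.
    if slave_count > 1:
        lines.append(
            "repl-diskless-sync-delay {}".format(diskless_sync_delay))
    return "\n".join(lines) + "\n"


single = render_slave_config("10.0.0.5", 6379, slave_count=1)
several = render_slave_config("10.0.0.5", 6379, slave_count=3)
```

Writing these values to the configuration file, rather than only issuing SLAVEOF at runtime, is what keeps the instance a slave after a restart.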

Detach Replica

The current API for detach-replica will need to be implemented. No additions to the API are anticipated.
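
One plausible shape for the detach operation is sketched below. Redis turns a slave back into a standalone server with `SLAVEOF NO ONE`; the sketch pairs that command with removal of the `slaveof` line from the configuration text so that the instance does not rejoin the master on restart. The names `DETACH_COMMAND` and `strip_slaveof` are hypothetical, and the spec does not prescribe this approach.

```python
# Command that stops replication on a running slave.
DETACH_COMMAND = ("SLAVEOF", "NO", "ONE")


def strip_slaveof(config_text):
    """Return the configuration text with any 'slaveof' directive removed."""
    kept = [
        line for line in config_text.splitlines()
        if not line.strip().lower().startswith("slaveof ")
    ]
    return "\n".join(kept) + ("\n" if kept else "")


before = "slaveof 10.0.0.5 6379\nslave-read-only yes\n"
after = strip_slaveof(before)
```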

Failover

The current APIs for failover (both eject-replica-source and promote-to-replica-source) will need to be implemented. When ejecting the current replica source, a slave needs to be chosen as the new one. This will be done by overriding the _most_current_replica() method so that it queries each slave and chooses the one with the smallest value for ‘master_last_io_seconds_ago’. This, presumably, will be the one with the most current data.
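
The selection rule described above can be sketched as follows. The example parses the text returned by the Redis INFO command for each slave and picks the slave with the smallest `master_last_io_seconds_ago`. The function names and the input shape (a dict of slave name to INFO text) are assumptions made for illustration; the real `_most_current_replica()` override would work with Trove's own instance objects. Slaves reporting a negative value are skipped, on the assumption that this indicates a broken link to the master.

```python
def parse_info(text):
    """Parse 'key:value' lines from Redis INFO output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as '# Replication'.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def most_current_replica(info_by_slave):
    """Return the name of the slave that heard from the master most recently."""
    candidates = {}
    for name, text in info_by_slave.items():
        seconds = int(parse_info(text)["master_last_io_seconds_ago"])
        if seconds >= 0:  # negative values are treated as "link down"
            candidates[name] = seconds
    if not candidates:
        raise ValueError("no slave has a live link to the master")
    return min(candidates, key=candidates.get)


replicas = {
    "slave-a": "# Replication\r\nrole:slave\r\nmaster_last_io_seconds_ago:4\r\n",
    "slave-b": "# Replication\r\nrole:slave\r\nmaster_last_io_seconds_ago:1\r\n",
    "slave-c": "# Replication\r\nrole:slave\r\nmaster_last_io_seconds_ago:-1\r\n",
}
best = most_current_replica(replicas)
```

Because the metric has one-second resolution, two slaves can tie; the sketch simply returns whichever tied slave `min` encounters first.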

Configuration

The default values for the following config options will need to be updated:

  • replication_strategy

Database

None

Public API

None

Public API Security

None

Python API

Existing Python bindings are sufficient, and no changes are anticipated.

CLI (python-troveclient)

Once these changes are implemented, the following Trove CLI commands will now be fully functional with Redis:

  • create --replica_of <id> --replica_count <n>
  • eject-replica-source
  • promote-to-replica-source
  • detach-replica

Internal API

None

Guest Agent

The following files will need to be added to the guest agent, where the corresponding implementation will reside:

guestagent/strategies/replication/experimental/redis_sync.py

The following existing files will be updated:

guestagent/datastore/experimental/redis/manager.py
guestagent/datastore/experimental/redis/service.py
guestagent/datastore/experimental/redis/system.py

No backwards compatibility issues are anticipated.

Alternatives

No alternative solutions are proposed at this time.

Implementation

Assignee(s)

Primary assignee:
peterstac

Milestones

Target Milestone for completion:
Liberty-2

Work Items

  • Create replication strategy for Redis.
  • Implement API calls for detach_replica, promote_to_replica_source and eject_replica_source.

Upgrade Implications

None

Dependencies

None

Testing

No new tests are deemed to be required (beyond the requisite unit tests). The int-tests group for Redis will be modified to run replication-related commands during integration test runs.

Documentation Impact

Datastore specific documentation should be modified to indicate that replication is now supported by Redis, along with the corresponding detach/failover commands.

References

[1] Redis Replication: http://redis.io/topics/replication