More periodic tasks to slave for Juno¶
In the Icehouse development cycle we gave deployers the option to offload most reads from nova-compute periodic tasks to a DB replication slave. We will continue this work in Juno by “slaveifying” the rest of the periodic tasks where appropriate.
Currently the accepted way to scale the database for reads and writes in Nova is to do a multi-master setup or use some sort of database clustering. The problem with this approach is that while read scalability is potentially increased by making more hardware resources available (CPU, RAM, iops, etc). Write scalability is decreased and more operational complexity is inherited.
I would like to continue the work done in Icehouse by completing the “slaveification” of periodic tasks.
There are alternative ways to scale reads and writes both:
-Handling scaling within the application through some sort of sharding scheme. -Handle scaling at the DB level.
We have a sharding model, cells, in Nova currently. It could be argued that time would be better spent improving this approach rather than spending time trying to scale it using available DB technologies.
Data model impact¶
REST API impact¶
Other end user impact¶
No negative changes, hopefully this allows us to take some load off of a “write master” and offload them to a slave or slaves.
Other deployer impact¶
If a deployer changes the slave_connection configuration parameter in the database section it is assumed that they are accepting the behavior of having all reads from periodic tasks be sent to that connection. The deployer needs to be educated and aware of the implication of running a database replication slave and fetching actionable data from said slave. These include, but may not be limited to:
-Need for monitoring of the slave status -Operational staff familiar with maintenance of replication slaves -Possibility to operate on data that is slightly out of date
Developers should consider which reads might benefit from optionally using a slave handle. When new reads are introduced, consider the context in which the code is called. Will it matter if this code operates on possibly out of date data? Is the benefit of offloading reads greater than an inconvenience caused by acting on old data?
- Primary assignee:
- Other contributors:
Slaveify the following periodic tasks in nova/compute/manager.py
update_available_resource _run_pending_deletes _instance_usage_audit _poll_bandwidth_usage _poll_volume_usage
We will need to have an object for bw_usage, this is covered by https://blueprints.launchpad.net/nova/+spec/compute-manager-objects-juno
Currently there is no testing in Tempest for reads going to the alternate slave handle. We should add a replication slave to our test runs and test the periodic tasks with and without slave_connection enabled.
The operations guide should be updated and provide instructions with references to MySQL and Postgres documentation on setting up and maintaining slaves. We should also talk about HA possibilities with asynchronous slaves and various automation frameworks that deal with this problem. It would also be good to explain that while being able to specify a slave_connection is primarily a scaling feature, the ability to use it for availability purposes is there.
The original blueprint with code history and discussion: https://blueprints.launchpad.net/nova/+spec/db-slave-handle
The Icehouse blueprint: https://blueprints.launchpad.net/nova/+spec/periodic-tasks-to-db-slave