More periodic tasks to slave for Juno

https://blueprints.launchpad.net/nova/+spec/juno-slaveification

In the Icehouse development cycle we gave deployers the option to offload most reads from nova-compute periodic tasks to a DB replication slave. We will continue this work in Juno by “slaveifying” the rest of the periodic tasks where appropriate.

Problem description

Currently the accepted way to scale the database for reads and writes in Nova is to do a multi-master setup or use some sort of database clustering. The problem with this approach is that while read scalability is potentially increased by making more hardware resources available (CPU, RAM, IOPS, etc.), write scalability is decreased and more operational complexity is inherited.

Proposed change

I would like to continue the work done in Icehouse by completing the “slaveification” of periodic tasks.
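To illustrate the pattern, here is a minimal sketch of what a slaveified read looks like, assuming the use_slave keyword established by the Icehouse work and Juno-era import paths; the method name is hypothetical and the real task bodies in nova/compute/manager.py differ::

    from nova import objects
    from nova.openstack.common import periodic_task

    @periodic_task.periodic_task
    def _some_periodic_read_task(self, context):
        # Runs inside ComputeManager (nova/compute/manager.py); self.host is
        # the compute host the manager serves. The read below goes to the
        # slave handle when slave_connection is configured; writes continue
        # to use the master connection.
        instances = objects.InstanceList.get_by_host(
            context, self.host, use_slave=True)
        for instance in instances:
            pass  # act on data that may be slightly out of date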

Alternatives

There are alternative ways to scale both reads and writes:

- Handle scaling within the application through some sort of sharding scheme.
- Handle scaling at the DB level.

We already have a sharding model in Nova: cells. It could be argued that time would be better spent improving that approach rather than trying to scale the database using available DB technologies.

Data model impact

None

REST API impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

No negative changes are expected; ideally this allows us to take some read load off of the "write master" and offload it to a slave or slaves.

Other deployer impact

If a deployer changes the slave_connection configuration option in the [database] section, it is assumed that they accept the behavior of having all reads from periodic tasks sent to that connection. The deployer needs to be educated about and aware of the implications of running a database replication slave and fetching actionable data from it. These include, but may not be limited to:

- Need for monitoring of the slave status
- Operational staff familiar with maintenance of replication slaves
- Possibility to operate on data that is slightly out of date

See https://wiki.openstack.org/wiki/Slave_usage
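For reference, enabling the behavior is a one-line change in nova.conf; the connection URL below is only an example::

    [database]
    # Reads from slaveified periodic tasks go here; all writes and
    # non-slaveified reads continue to use the primary "connection" option.
    slave_connection = mysql://nova:secret@replica-host/nova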

Developer impact

Developers should consider which reads might benefit from optionally using a slave handle. When new reads are introduced, consider the context in which the code is called. Will it matter if this code operates on possibly out-of-date data? Is the benefit of offloading reads greater than the inconvenience caused by acting on old data?

Implementation

Assignee(s)

Primary assignee:

<geekinutah>

Other contributors:

<None>

Work Items

Slaveify the following periodic tasks in nova/compute/manager.py (an illustrative sketch of one of them follows the list):

- update_available_resource
- _run_pending_deletes
- _instance_usage_audit
- _poll_bandwidth_usage
- _poll_volume_usage
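As an example of the change involved, a slaveified _run_pending_deletes could read the instances awaiting cleanup from the slave handle; slightly stale data only means an instance gets cleaned up on a later run. The filter values and expected_attrs below are illustrative, not the exact implementation::

    from nova import objects

    def _run_pending_deletes(self, context):
        """Retry pending instance file deletes, reading via the slave."""
        filters = {'deleted': True, 'host': self.host, 'cleaned': False}
        instances = objects.InstanceList.get_by_filters(
            context, filters, expected_attrs=['system_metadata'],
            use_slave=True)
        for instance in instances:
            # Cleanup writes (e.g. marking the instance cleaned and calling
            # instance.save()) still go to the master connection.
            pass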

Dependencies

We will need to have an object for bw_usage; this is covered by https://blueprints.launchpad.net/nova/+spec/compute-manager-objects-juno

Testing

Currently there is no testing in Tempest for reads going to the alternate slave handle. We should add a replication slave to our test runs and test the periodic tasks with and without slave_connection enabled.

Documentation Impact

The operations guide should be updated to provide instructions, with references to MySQL and PostgreSQL documentation, on setting up and maintaining slaves. We should also discuss HA possibilities with asynchronous slaves and the various automation frameworks that deal with this problem. It would also be good to explain that while the ability to specify a slave_connection is primarily a scaling feature, it can also be used for availability purposes.

References

https://wiki.openstack.org/wiki/Slave_usage

The original blueprint with code history and discussion: https://blueprints.launchpad.net/nova/+spec/db-slave-handle

The Icehouse blueprint: https://blueprints.launchpad.net/nova/+spec/periodic-tasks-to-db-slave