Provide rolling upgrade steps in keystone-manage to allow for zero downtime database upgrades.
OpenStack services are moving towards support for rolling upgrades; today nova, neutron and swift claim such support, and other services (e.g. glance) are working on it. See the References section for more information on support in other services. Prior to Newton, keystone did not claim any such support, but we have decided to add it.
First, it is important to state that there are actually two intertwined features when people talk about rolling upgrades;
In Newton, we plan to support both (1) and (2). We have already initiated support for rolling upgrades by only allowing additive changes to the database (this is protected via tests).
To support just the rolling upgrade part, keystone would allow (given we have only permitted additive database changes) the following sequence:
In order to support such rolling upgrades, some migration scripts cannot leave the database in the state they would ideally like. An example of this is where a new attribute is added which has a default which needs to be written manually by code (rather than a server default). If, during a rolling upgrade, new entities are created via un-migrated nodes, such a new attribute will not get the default. This causes a race condition between the process trying to clean up such data and the processes creating it, and requires that the new release of keystone understand both data schemas in order to function properly. See: Bug 1496500.
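The missing-default problem can be reproduced in a few lines. This is a minimal sketch using an in-memory SQLite database; the widget table, its enabled column and the helper functions are hypothetical names chosen for illustration and are not taken from keystone's schema or from the referenced bug.

```python
import sqlite3


def create_from_old_release(conn, widget_id, name):
    """Release X code: unaware of the new column, so it never sets it."""
    conn.execute(
        "INSERT INTO widget (id, name) VALUES (?, ?)", (widget_id, name)
    )


def create_from_new_release(conn, widget_id, name):
    """Release X+1 code: writes the default from application code."""
    conn.execute(
        "INSERT INTO widget (id, name, enabled) VALUES (?, ?, ?)",
        (widget_id, name, 1),
    )


def rows_missing_default(conn):
    """Return the ids of rows whose new attribute was never populated."""
    cur = conn.execute(
        "SELECT id FROM widget WHERE enabled IS NULL ORDER BY id"
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widget (id TEXT PRIMARY KEY, name TEXT)")
create_from_old_release(conn, "w1", "created-before-upgrade")

# Additive migration: a new column with no server default, followed by a
# one-off backfill performed by the migration script.
conn.execute("ALTER TABLE widget ADD COLUMN enabled INTEGER")
conn.execute("UPDATE widget SET enabled = 1 WHERE enabled IS NULL")

# During the rolling upgrade both releases keep creating entities.
create_from_new_release(conn, "w2", "from-upgraded-node")
create_from_old_release(conn, "w3", "from-unmigrated-node")

missing = rows_missing_default(conn)
```

Only the row written by the un-migrated node after the backfill ends up without the default, which is why a single clean-up pass cannot be relied on while old nodes are still serving requests.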
Support for zero downtime upgrades without service interruption would require keystone to support the gradual on-the-fly migration of data as they are modified, with support for a final manual “migrate anything left” action. Such on-the-fly support is implemented in the database object layer by Nova and Neutron, which have support for looking in both the old and new locations for data.
To solve that problem without writing and maintaining the necessary Python code, we can instead rely on database triggers. Database triggers would give us the ability to continue coding each release against a single schema, regardless of the actual schema underlying the application (which must at least be a superset of the schema the application knows about). More specifically, when the previous release writes to the database schema it knows about, database triggers would also update the new schema. When the next release writes to the database schema it knows about, database triggers would also update the old schema.
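To make the trigger idea concrete, here is a small sketch using SQLite through Python's sqlite3 module. It assumes a hypothetical widget table in which release X+1 moves data from legacy_name to a new display_name column; the table, column and trigger names are illustrative only, and trigger syntax on the databases keystone actually targets will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Schema known to release X.
    CREATE TABLE widget (id TEXT PRIMARY KEY, legacy_name TEXT);

    -- Additive change: the column release X+1 will use.
    ALTER TABLE widget ADD COLUMN display_name TEXT;

    -- Fill in whichever column the inserting release did not know about.
    CREATE TRIGGER widget_sync_insert AFTER INSERT ON widget
    BEGIN
        UPDATE widget
        SET display_name = COALESCE(NEW.display_name, NEW.legacy_name),
            legacy_name = COALESCE(NEW.legacy_name, NEW.display_name)
        WHERE id = NEW.id;
    END;

    -- Writes by release X to the old column are copied to the new one.
    CREATE TRIGGER widget_sync_old_to_new
    AFTER UPDATE OF legacy_name ON widget
    WHEN NEW.legacy_name IS NOT NEW.display_name
    BEGIN
        UPDATE widget SET display_name = NEW.legacy_name WHERE id = NEW.id;
    END;

    -- Writes by release X+1 to the new column are copied to the old one.
    CREATE TRIGGER widget_sync_new_to_old
    AFTER UPDATE OF display_name ON widget
    WHEN NEW.display_name IS NOT NEW.legacy_name
    BEGIN
        UPDATE widget SET legacy_name = NEW.display_name WHERE id = NEW.id;
    END;
    """
)


def fetch(conn, widget_id):
    """Return (legacy_name, display_name) for one row."""
    return conn.execute(
        "SELECT legacy_name, display_name FROM widget WHERE id = ?",
        (widget_id,),
    ).fetchone()


# Release X writes only the column it knows about.
conn.execute("INSERT INTO widget (id, legacy_name) VALUES ('w1', 'alpha')")
after_old_insert = fetch(conn, "w1")

# Release X+1 writes only the new column.
conn.execute("UPDATE widget SET display_name = 'beta' WHERE id = 'w1'")
after_new_update = fetch(conn, "w1")
```

The WHEN clauses stop the two update triggers from firing each other once both columns agree, so each release can keep reading and writing the single schema it was coded against.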
This approach allows us to eliminate:
It is proposed that we add new capabilities to keystone-manage which solve the existing rolling upgrade problem and fit within the context of the full rolling upgrades and zero downtime cross-project initiative.
There seems to be little or no commonality among the actual commands used by other services’ *-manage utilities for full zero downtime support. In order to see how our proposed initial support fits within this full support, here are the conceptual steps that will eventually be needed to upgrade a multi-node configuration running code release X to X+1, where, for example, X+1 moves data from one column to a new column:
For keystone in this release, it is proposed that we support all the above commands:
keystone-manage db_sync --expand: Expands the database schema by performing purely “additive” operations such as creating new columns, indexes, tables, and triggers. This can be run while all nodes are still on the X release.
The new schema will begin to be populated by triggers while the X release continues to write to the old schema.
keystone-manage db_sync --migrate: Will perform on-the-fly data migrations from old schema to new schema, while all nodes serving requests are still running the X release.
keystone-manage db_sync --contract: Removes any old schema and triggers from the database once all nodes are running X+1.
The keystone-manage db_sync command (without options) will still be supported, first to ensure that existing tooling and upgrade processes (which do not try to execute a rolling upgrade) will continue to operate, and second to provide a way to “force a database upgrade to completion” in case a deployer runs into problems with a rolling upgrade. Once a keystone-manage db_sync command (without options) is executed, however, nodes running old code are no longer supported. Running keystone-manage db_sync in this fashion will execute all the phases (including the contract phase, if it is safe to do so), and set the database migration status. This ensures that subsequent rolling upgrade attempts at the next release are possible.
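The ordering of the phases, and the way a plain db_sync runs whatever remains, can be sketched as a small state tracker. This is only an illustration under stated assumptions: the MigrationStatus class and its methods are hypothetical, they are not keystone code, and the sketch does not model the "if it is safe to do so" check on the contract phase.

```python
PHASES = ("expand", "migrate", "contract")


class MigrationStatus:
    """Hypothetical record of which upgrade phases have completed."""

    def __init__(self):
        self.completed = []

    def next_phase(self):
        """Return the next phase that is allowed to run, or None if done."""
        if len(self.completed) < len(PHASES):
            return PHASES[len(self.completed)]
        return None

    def run(self, phase):
        """Run a single phase, refusing to skip ahead or repeat one."""
        expected = self.next_phase()
        if phase != expected:
            raise RuntimeError(
                f"cannot run {phase!r}; next phase is {expected!r}"
            )
        # A real tool would apply that phase's migrations here.
        self.completed.append(phase)

    def run_all(self):
        """Mimic db_sync without options: run every remaining phase."""
        while self.next_phase() is not None:
            self.run(self.next_phase())


status = MigrationStatus()
status.run("expand")   # corresponds to: keystone-manage db_sync --expand
status.run_all()       # corresponds to: keystone-manage db_sync
```

Recording the completed phases is what lets a later upgrade start again from the expand phase with a known database state.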
In terms of implementation, the keystone-manage db_sync --expand, keystone-manage db_sync --migrate and keystone-manage db_sync --contract phases will be driven by sqlalchemy-migrate repositories.
The proposed approach is designed to support both deployers who upgrade at major release cycles and those who track master more closely.
One other aspect is that, in line with other services, we will not support rolling upgrades across two releases. For example, once you are running Newton, we will not support a rolling upgrade directly to the P release; you will need to go to Ocata first.
We could just use keystone-manage db_sync as the --expand step, but it would still need to print a reminder to run the additional commands. Operators would then have no set of commands that runs without printing a warning, which does not seem a good idea for production.
We could just use totally different keystone-manage commands, and not try to make this fit the general trend for now:
keystone-manage db_sync --initial-migration
keystone-manage db_sync --complete-migration
Developers would need to be aware of the new database migration repositories, and the requirements for each of them.
Additional assignees:
- Dolph Mathews (dolphm)
- Dave Chen (davechen)
Operator guides will need to be updated.