Database strategy for rolling upgrades¶
https://blueprints.launchpad.net/glance/+spec/database-strategy-for-rolling-upgrades
This spec outlines a database modification strategy for Glance that will facilitate zero-downtime rolling upgrades and make it possible for Glance to assert the assert:supports-zero-downtime-upgrade tag.
Problem description¶
In order to apply the assert:supports-zero-downtime-upgrade tag [GVV2] to Glance, the assert:supports-rolling-upgrade tag [GVV1] must first be asserted. It states that
The project has a defined plan that allows operators to roll out new code to subsets of services, eliminating the need to restart all services on new code simultaneously.
In order to assert the assert:supports-zero-downtime-upgrade tag, Glance must completely eliminate API downtime of the control plane during the upgrade.
A key issue for Glance in this regard is how to handle database changes
required for release N while release N-1 code is still running. We outline a
strategy by which this may be accomplished below.
Note
In what follows, we make the assumption that an operator interested in rolling upgrades will meet us halfway, that is, will be using an up-to-date version of the underlying DBMS that supports online schema changes that allow as much concurrency as possible.
Proposed change¶
We propose an expand and contract strategy to implement database changes in such a way that a rolling upgrade may be accomplished within the confines of a single release. Some OpenStack services, such as Cinder, that have addressed this problem have chosen to make database changes over a sequence of releases, but we believe that given the structure of Glance and typical usage patterns, database changes can be made and finalized within a single release. Such an approach is preferable for an open source project in which the cast of characters may change considerably from cycle to cycle.
We first present an overview of the upgrade strategy and then provide a detailed example of how this will work for a change that will be occurring in the Ocata release.
Overview¶
The following diagram depicts a typical upgrade of an OpenStack service. The older services are shut down completely (usually during a maintenance window), the new code is deployed, and finally the new services are started. Obviously, this involves some downtime for the users. To minimize or eliminate downtime, the services can be upgraded in a rolling fashion, that is, upgrading a few services at a time. This results in a situation where both old (say N-1) and new (say N) services must co-exist for a certain period of time. In a straightforward upgrade where the new services have no database changes associated with them, the services can co-exist right from the onset as both rely on the same schema.
[Diagram: conventional upgrade with downtime. The N-1 services are stopped, release N is deployed (code and/or config upgraded and the database migrated), and then the N services are started; the interval between stopping N-1 and starting N is downtime.]
However, in the presence of database changes it isn’t yet possible for the services to co-exist. The primary reason is the way we do database changes/migrations currently. A typical migration in Glance is an atomic change that includes both schema and corresponding data migrations together. While the schema migrations perform the necessary additions/removals/modifications to the schema, data migrations perform the corresponding changes to the data. Depending on the nature of the schema changes, this approach is at times backwards-incompatible. That is, older services may not be able to run with the new schema. This essentially limits the ability of old and new services to co-exist and consequently prohibits rolling upgrades.
To achieve rolling upgrades, database migrations need to be done in such a way that both old and new services can co-exist over a period of time. A well-known strategy is to re-imagine database changes in expand and contract style instead of one atomic change. With the expand and contract style, we achieve the desired schema changes in two distinct steps:
Expand: In the expand step, we make only additive changes that are required by the new services. This keeps the schema intact for older services to run alongside the new services. The typical schema changes that fall into this category are adding columns and tables. An exception to this additive-only change strategy is that it may be necessary to remove some constraints in order to allow database triggers (discussed below) to work.

Contract: All the other changes, that is, non-additive changes, are grouped into the contract step. Changes like removing a column, a table, and/or constraints are made in this step. Additionally, if any constraints were removed during the expand step, they are restored during the contract phase. Any database triggers installed during the expand phase are also removed at this point. (A sketch of the two steps as migration scripts follows this list.)
This breakup gives us the ability to perform the minimum required changes first (while keeping schema compatibility with old services) and delay the other changes until a later point in time. Therefore, we always first expand the database in order to start a rolling upgrade while the old services are still running. Once the database is expanded, the new columns and tables are created. However, they would be empty. At this point, we should start migrating the data over to the new column. But, at the same time, it is important to keep the new and old columns in sync: any writes to the old column must be synced to the new column, and vice versa (although we don’t write to the new column yet, we have to keep the old column in sync when the new services come up and start writing to the new column while the services co-exist). We use database triggers to keep the columns in sync.
We add the triggers to the database along with the additive changes during the database expand. At this point, we start migrating the data over to the new column. However, because the old services are live at this point, we migrate the data in small batches to avoid excessive load on the database and thereby any interruption to the old services. These migrations can be scheduled to run during low-traffic hours to minimize the impact on older services. Once the data migrations finish and the new column is populated and ready for use, we start deploying the new services.
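Concretely, such a sync-trigger pair might look like the following sketch for MySQL, with hypothetical column names. A real pair must usually also translate between the old and new representations rather than copy values verbatim, as the Ocata example later in this spec requires:

    # Installed during database expand, dropped during contract (sketch only).
    from alembic import op

    def add_sync_triggers():
        # Old code fills only old_column on INSERT; derive new_column from it.
        op.execute("""
            CREATE TRIGGER images_sync_insert BEFORE INSERT ON images
            FOR EACH ROW
            SET NEW.new_column = NEW.old_column
        """)
        # On UPDATE, propagate whichever column the writer actually changed.
        op.execute("""
            CREATE TRIGGER images_sync_update BEFORE UPDATE ON images
            FOR EACH ROW
            BEGIN
                IF NOT (NEW.old_column <=> OLD.old_column) THEN
                    SET NEW.new_column = NEW.old_column;
                ELSEIF NOT (NEW.new_column <=> OLD.new_column) THEN
                    SET NEW.old_column = NEW.new_column;
                END IF;
            END
        """)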
We deploy services in small batches by taking some nodes out of rotation, waiting for them to drain connections, upgrading the services, and putting the nodes back into rotation. It is during this period that old and new services co-exist. When the new services come up, they start reading from and writing to the new column. Any data written to the new column is synced over to the old column (by the triggers added during database expand) and is available for older services to consume. Once all the older services are upgraded, it is safe to contract the database. This ensures that we reach the desired state of the database schema. We also drop the database triggers during the database contract because the old column ceases to exist and only the new column remains in use.
[Diagram: rolling-upgrade timeline. While the N-1 services keep running, the database is expanded and the triggers are added; the data migrations then run to completion; the N deploy starts and N co-exists with N-1 until the N-1 services are gone; finally the database is contracted and the triggers are dropped.]
To summarize, as shown in the above diagram, we split the database migrations into schema and data migrations. The schema migrations can be additive or contractive in nature, or a combination of both. Additive schema migrations are run before the upgrade begins to prepare the database for new services while it is still usable by old services. (This phase is also called “database expand”.) During database expand, we also add triggers on old and new columns to keep them in sync. Once the database is expanded, we start migrating the data over to the new column in small batches. When the data migrations are complete, we upgrade the old services in a rolling fashion. Once all the old services are upgraded, we run the contractive migrations on the database. (This phase is also called “database contract”.) The triggers are also dropped during database contract.
In addition to the description of the process given above, here are a few constraints on how upgrades will work:
A typical upgrade is complete only when the entire expand-migrate-contract cycle for a release is performed. We do not propose to support an N-1 to N upgrade while an N-2 to N-1 upgrade is in progress.
“Leapfrogging” releases (that is, allowing a direct N-2 to N upgrade, skipping N-1) is not supported.
It’s possible that in a single release there may be multiple features, worked on independently by different developers, that will require some kind of database modification. What we are proposing in this spec is that for each release, there will be a single expand-migrate-contract operation from the operator’s point of view. In other words, all feature teams will have to coordinate so that all expands are performed, followed by all migrations, and concluding with all contractions. This will be easy for features whose changes are completely independent, but may be more difficult for others. However, preserving zero-downtime database changes will be a Glance project priority once this spec has been approved, so such interactions will be addressed on the specs for features.
Note
The current Glance spec template asks this question in the “Data model impact” section:
What database migrations will accompany this change?
This should be modified along the following lines. (Note: this is only a suggestion, we can argue about the best wording on the patch that modifies the spec template.)
Glance is committed to zero-downtime database migrations. Explain what database migrations will accompany this change. Do they have the potential to interfere with the database migrations for other specs that have been approved for this cycle?
Keep in mind that our goal here is to achieve the upgrade tags. While it’s not prohibited to exceed them, they do specify a baseline for achievement that’s been adopted by the OpenStack community. Hence simply meeting the requirements for the tags is a worthwhile goal.
Steps¶
Let’s look at a rolling-upgrade strategy for Glance in more detail. Consider the case where a database change is made such that data stored in “the old column” in release N-1 will be stored in “the new column” in release N. The following are the steps we take to achieve a rolling upgrade.
Expand Database¶
Goal: Prepare the database for N by expanding the schema
As shown in the below diagram, initially we have N-1 reading from and writing to the old column. We then expand the schema for N, which introduces the new column, while N-1 is still running. This should have minimal to no impact on the N-1 services.
Note
It is important to note that while database expand operations are required to be strictly additive in nature, adding constraints can sometimes be disruptive as they are known to lock the table. This is alleviated by online DDL capabilities in MySQL 5.6 for InnoDB. So, simple changes may not be a concern. (In any case, the plan proposed in this spec adds constraints during the contract phase only.)
[Diagram: database expand. The N-1 services read from and write to the old column throughout; expanding the database adds the new column and installs the triggers linking the two columns, after which the data migrations can start.]
While expanding the database, we also add triggers that keep the old and new columns in sync.
Deliverable: We propose to make this available by extending the glance-manage utility with an expand command. The database could be expanded by running glance-manage db expand.
Migrate Data¶
Goal: Populate the new column(s) for N to use
At this point, only release N-1 code is running and it continues to read from and write to the old column as shown in the below diagram. All writes made by N-1 to the old column are synced to the new column. While the triggers slowly start populating the new column, we commence the background data migrations to populate data into the new column in a non-intrusive manner.
[Diagram: data migration. The N-1 services continue reading from and writing to the old column; the triggers mirror those writes into the new column while the batched data migrations run until no rows remain.]
Deliverable: We propose to extend the glance-manage utility to migrate the data in batches. The batch size could be controlled with an optional parameter, for example, max_rows. The parameter would allow operators to schedule migrations of no more than a given number of rows at a time, in case they have a large database and want to run the migration only during off-peak times. Without the optional parameter, all rows would be migrated. The utility will return an appropriate response if it is run and finds that there are no rows left to migrate.

For example: glance-manage db migrate --max_rows=10.
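As a rough sketch of what glance-manage db migrate could do internally, assuming MySQL and SQLAlchemy (the UPDATE statement and column names are placeholders; a real migration transforms values rather than copying them):

    import sqlalchemy as sa

    engine = sa.create_engine(
        "mysql+pymysql://glance:secret@localhost/glance")  # illustrative DSN

    def migrate_batch(max_rows):
        """Migrate at most max_rows rows; return how many were changed."""
        stmt = sa.text(
            "UPDATE images SET new_column = old_column "
            "WHERE new_column IS NULL "
            "LIMIT %d" % int(max_rows))  # MySQL allows LIMIT on an UPDATE
        with engine.begin() as conn:     # one small transaction per batch
            return conn.execute(stmt).rowcount

    # Repeat until a batch reports 0 rows, e.g. during off-peak hours:
    while migrate_batch(max_rows=10):
        pass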
Deploy¶
Goal: Deploy N by upgrading N-1 in a rolling fashion and have both versions co-exist during the deploy
As the new column is now ready to use, we start deploying N in small batches. Release N-1 has no idea that an upgrade is occurring, but the release N code co-exists with the N-1 services as shown in the below diagram. While the N-1 and N services are using the old and new columns respectively, the triggers keep the two columns in sync whenever there is a database write. This enables N to see the updates made by N-1 and vice versa.
[Diagram: rolling deploy. The remaining N-1 services read/write the old column while the newly deployed N services read/write the new column; the triggers keep the two columns in sync for the duration of the deploy.]
Note
As the N-1 and N services co-exist, users may notice inconsistent behavior in certain situations. Typically, a new release is backwards-compatible with the previous release. As such, all requests should exhibit similar behavior across both versions. However, some changes to the API (for example: bug fixes) may result in different behavior across two releases. So, a user may witness different responses to similar requests depending upon which service processes the request.
Similarly, user requests for new features introduced with the new version may fail when they are processed by an older service. While this inconsistency is not desirable, it can be seen as a decent compromise over incurring downtime during the upgrade.
Contract Database¶
Goal: Complete the schema migrations desired in N
When all the services are upgraded, the old column is unused and the new column is the source of truth henceforth. The old column is now ready for removal. At this point, we contract the database, which drops the old column and the triggers added during database expand.
[Diagram: database contract. Only the N services remain, reading from and writing to the new column; contracting the database drops the old column and the triggers.]
Note
In addition to removing the unused columns and tables, SQL constraints such as nullability, uniqueness, and defaults must be set here.
Deliverable: We propose to make this available by extending the glance-manage utility with a contract command. The database could be contracted by running glance-manage db contract.
Rolling Upgrade Process for Operators¶
Following is the process to upgrade Glance with zero downtime:
Backup Glance database.
Choose an arbitrary Glance node or provision a new node to install the new release. If an existing Glance node is chosen, gracefully stop the Glance services.
Upgrade the chosen node to the new release and update the configuration accordingly. However, the Glance services MUST NOT be started just yet.
Using the upgraded node, expand the database using the command glance-manage db expand. Then, schedule the data migrations using the command glance-manage db migrate --max_rows=<max. row count>. Data migrations must be scheduled to run until no more rows are left to migrate.
Start the Glance processes on the first node.
Taking one node at a time from the remaining nodes, stop the Glance processes, upgrade to the new release (and corresponding configuration) and start the Glance processes.
Note
Before stopping the Glance processes on a node, one may choose to wait until all the existing connections drain out. This could be achieved by taking the node out of rotation. This way all the requests that are currently being processed will get a chance to finish processing. However, some Glance requests like uploading and downloading images may last a long time. This increases the wait time to drain out all connections and consequently the time to upgrade Glance completely. On the other hand, stopping the Glance services before the connections drain out will present the user with errors. This can at times be seen as downtime as well. Hence, an operator must be judicious when stopping the services.
Contract the database by running the command glance-manage db contract from any one of the nodes.
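Step 4’s “run until no more rows are left” requirement could be scripted. The sketch below assumes, hypothetically, that glance-manage db migrate reports when nothing remains to migrate; the exact output is a deliverable of this spec, not settled here:

    import subprocess
    import time

    def run_data_migrations(max_rows=1000, pause_seconds=60):
        while True:
            result = subprocess.run(
                ["glance-manage", "db", "migrate",
                 "--max_rows", str(max_rows)],
                capture_output=True, text=True, check=True)
            # Hypothetical completion message; see the deliverable above.
            if "no rows" in result.stdout.lower():
                break
            time.sleep(pause_seconds)  # throttle between batches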
Example¶
To understand how this would work in action, consider the following example of a Glance database change proposed for Ocata.
Note
This does not prescribe the actual Ocata database change. It is included here as a realistic example for a sanity check of this proposal.
The “old column”: Newton (release N-1): the boolean is_public column in the images table. This column has nullable=False and default=False.
The “new column”: Ocata (release N): an enum (or string; the key point is that it’s a different data type) visibility column in the images table. This column can have one of the values ‘public’, ‘private’, ‘shared’, or ‘community’. After the database contraction has completed, this column will have nullable=False and default=’private’. (During the migrate and deploy phases, this column will probably have nullable=True with no default.)
Using the proposed strategy, the database upgrade would proceed as follows.
Pre-upgrade: Version N-1 code reads from and writes to is_public.

Expand Database: Add the visibility column and the appropriate triggers to keep the old and new values in sync.

Migrate Data: Crawl the ‘images’ table. For any row where visibility is null, set the value for visibility as follows:
- If is_public is ‘1’: set visibility to ‘public’.
- If is_public is ‘0’: if the image has any members, set visibility to ‘shared’; otherwise set visibility to ‘private’.

Any data written by N-1 code to the old column is migrated (by the triggers) using the above criteria.

Deploy: Deploy N code in a rolling fashion. The N code will start using the visibility column.

Here’s an analysis of database activity.
Write operations

The v1 API: nothing to worry about; it has no concept of visibility.

The v2 API:

visibility set to ‘public’:
- N-1: will put ‘1’ in is_public. Triggers will put ‘public’ in visibility.
- N: will put ‘public’ in visibility. Triggers will put ‘1’ in is_public.

visibility set to ‘private’:
- N-1: will put ‘0’ in is_public. Triggers will put ‘private’ in visibility.
- N: will put ‘private’ in visibility. Triggers will put ‘0’ in is_public.

visibility set to ‘community’:
- N-1: the call will fail at the API level and will never hit the database.
- N: will put ‘community’ in visibility. Triggers will put ‘0’ in is_public.
Note
This essentially means that a community image will be considered a private image by N-1. Thus, barring the owner, a community image won’t be visible to anyone. Since N-1 has no notion of community images, this behavior can be seen as consistent with respect to N-1. However, it may be confusing to the owner of the community image, for whom the image will appear as community with N and private with N-1. Thus, the owner may try to change the visibility again. To discourage this, we may prohibit any writes to the is_public column when it holds ‘0’ and the visibility column holds ‘community’. This could again be done using the same triggers that we added during database expand. The first alternative mentioned in the alternatives section avoids this situation.
visibility set to ‘shared’:
- N-1: the call will fail at the API level and will never hit the database.
- N: will write ‘shared’ in visibility. Triggers will write ‘0’ in is_public; this will allow image sharing to continue to work properly on the release N-1 nodes as well as via the v1 API on all nodes.
Read operations
Read operations across API versions and releases should remain unaffected as the triggers keep both old and new columns in sync by translating the data appropriately.
Contract Database: The only API nodes running are version N, and the is_public column is no longer in use. Drop is_public, and set nullable=False and default=’private’ on the visibility column (per the target schema described above).
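As a sanity check, the Migrate Data rules above can be expressed as a single batched statement. This sketch assumes MySQL, that images.id is the primary key, and that membership is recorded in an image_members table with a deleted flag; it is not the actual Ocata migration:

    import sqlalchemy as sa

    # Apply the is_public -> visibility mapping to one batch of rows that
    # neither the triggers nor earlier batches have migrated yet.
    VISIBILITY_BATCH = sa.text("""
        UPDATE images
        SET visibility = CASE
            WHEN is_public = 1 THEN 'public'
            WHEN EXISTS (SELECT 1 FROM image_members
                         WHERE image_members.image_id = images.id
                           AND image_members.deleted = 0)
                THEN 'shared'
            ELSE 'private'
        END
        WHERE visibility IS NULL
        LIMIT 1000
    """)

This statement would be executed in a loop (as in the migrate_batch sketch earlier) until its rowcount reaches zero.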
Alternatives¶
This is a small variant of the strategy described above. The fundamental idea behind that strategy is: when both versions co-exist, sync the writes made by one set of services so that they are available for the other to consume. We achieve this using triggers. On the other hand, what if we eliminate the need to sync? That is, what if we disallow any writes to both the old and the new columns while the services co-exist? This can be achieved by using triggers again. Essentially, the triggers we add during the database expand step will intercept and disallow writes to the old and new columns.

For the example given above, all requests attempting to change the visibility of an image would fail for the duration of the deploy step, where the services co-exist. Reads would be permitted, however. Once the deploy is finished and we contract the database (the triggers are dropped here), writes to the new column would be permitted as usual. This gives us a way to eliminate the need for syncing data across columns. Consequently, there is much less complexity in the triggers and the upgrade is less error-prone. However, it is important to note that one may see an increased error rate during the deploy due to the disallowed writes. Although the API will be responsive throughout this entire period (and hence “up”), the increased rate of 5xx responses will make it impossible to assert the assert:supports-zero-downtime-upgrade tag [GVV2]. Since being able to assert this tag is the aim of this spec, this alternative is not acceptable.

A second well-known alternative replaces the use of triggers by migrating the data online from within the application. While the triggers approach migrates the data online on a database write operation, this approach migrates the data on demand in the event of a database read operation.
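For concreteness, the write-blocking trigger required by the first alternative might look like the following sketch, assuming MySQL 5.5+ (for SIGNAL) and the is_public/visibility columns from the example above:

    from alembic import op

    def add_blocking_trigger():
        # Reject any write that touches either visibility column while
        # the N-1 and N services co-exist; dropped during contract.
        op.execute("""
            CREATE TRIGGER images_block_visibility BEFORE UPDATE ON images
            FOR EACH ROW
            BEGIN
                IF NOT (NEW.is_public <=> OLD.is_public)
                   OR NOT (NEW.visibility <=> OLD.visibility) THEN
                    SIGNAL SQLSTATE '45000'
                    SET MESSAGE_TEXT =
                        'visibility changes are frozen during upgrade';
                END IF;
            END
        """)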
Approaches taken by other OpenStack projects:
Nova: See [NOV1].
Cinder: See [CIN1].
Keystone: See [KEY1].
Data model impact¶
None
REST API impact¶
There would be no impact to REST API contracts as such.
Security impact¶
None
Notifications impact¶
None
Other end user impact¶
None
Performance Impact¶
The background data migration will consume extra database resources, but this can be managed if the migration script is carefully written.
Other deployer impact¶
Deployers who intend to deploy Glance the old way, that is, with downtime, remain unaffected.
Each step of the migration requires operator intervention.
Developer impact¶
Any developer working on a feature that requires database changes must write additional code to support the rolling upgrade strategy outlined in this document. By confining the database changes to a single release, however, developers of release N+1 do not have to worry about completing procedures begun during the migration of release N-1 to N.
Implementation¶
Assignee(s)¶
- Primary assignee:
alex_bash hemanthm
- Other contributors:
nikhil
Work Items¶
Write documentation for rolling upgrade (developer docs).
Write documentation for rolling upgrade (operator docs).
Introduce expand/contract migration streams and the corresponding glance-manage CLI.
Work with the developers of Ocata features that require database changes to implement code to follow the rolling upgrade strategy. These include:
community images and enhanced image sharing
image import
Dependencies¶
None
Testing¶
In order to assert the rolling upgrades tag, Glance must have full stack integration testing with a service arrangement that is representative of a meaningful-to-operators rolling upgrade scenario.
Ideally these tests will be able to simulate Glance running at scale, since, as discussed above, some DBMS problems may not be revealed in a small test database.
Documentation Impact¶
Developer documentation: the upgrade strategy.
Operator documentation:
configuration options for putting the code into the various modes
running the database scripts
References¶
https://governance.openstack.org/reference/tags/assert_supports-zero-downtime-upgrade.html
https://specs.openstack.org/openstack/keystone-specs/specs/mitaka/online-schema-migration.html