Rolling Upgrades
Problem description
OpenStack operators often shy away from upgrading to the latest OpenStack
release because they are concerned about how intrusive upgrades are. This
prevents operators from realizing the full value of their OpenStack cloud,
specifically access to a continuously improving platform.
User Stories
- As a Cloud User, I want to experience a stable, regularly updated
OpenStack platform in order to utilize new features, bug fixes and
security enhancements, so that my cloud development experience is
consistently world-class.
- As a Cloud Operator, I want to provide my users with a reliable and
highly available OpenStack platform so that they do not experience any
data plane or control plane downtime.
- As a Cloud Operator, I want to have confidence in my ability to
perform OpenStack cloud upgrades so that I can perform them on a
monthly basis.
- As a Cloud Operator, I want to be able to roll back the most recent cloud
upgrade I initiated in the event of issues so that I can be confident
that, even if errors occur, I will still avoid data plane or
control plane downtime.
- As a Cloud Operator, I want to be able to define the characteristics of
a rolling reboot of my data plane and control plane hosts so that my
users are not impacted by a rolling upgrade.
Usage Scenario Examples
- Successful upgrade
  - Cloud Operator schedules an OpenStack upgrade to the latest release
  - Cloud Operator can be assured that API contracts are backward compatible
  - Cloud Operator performs the upgrade by following simple documentation
  - Cloud Operator notifies users of the successful upgrade and of the
    availability of new features and enhancements
  - Cloud Operator schedules the next upgrade for one month later to take
    advantage of backports and security updates
- Unsuccessful upgrade
  - Cloud Operator schedules an OpenStack upgrade to the latest six-month
    release
  - While performing the upgrade, Cloud Operator notices an unexpected error
  - Cloud Operator rolls back the upgrade to a previously known, error-free
    state
- Immediate upgrade
  - Cloud Operator is informed that a security vulnerability has been found
    in an OpenStack service and that a patch is available for the current
    release
  - Cloud Operator schedules an upgrade to the patched version
  - Once the upgrade completes successfully, the Cloud Operator’s cloud is
    no longer vulnerable
- Rolling upgrade on the data plane
  - Cloud Operator schedules an OpenStack upgrade for a security
    vulnerability that requires rebooting all data plane hosts
  - Cloud Operator initiates the upgrade and reboots the data plane hosts
    through an automated, configurable process
  - Cloud Users are unaffected by the reboots
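The automated, configurable rolling reboot described in the data plane scenario might look roughly like the minimal Python sketch below. It assumes that the only configurable characteristic is the maximum fraction of hosts that may be down at once. The names `plan_batches`, `rolling_reboot`, `max_unavailable`, and the `reboot`/`is_healthy` callbacks are hypothetical and illustrative; they are not part of any OpenStack API. A real tool would also need to live-migrate or evacuate instances off each host before rebooting it so that Cloud Users are unaffected, and this sketch leaves that step out.

```python
# Hypothetical sketch: function names and the max_unavailable knob are
# illustrative assumptions, not an OpenStack API.
import math
from typing import Callable, List, Sequence


def plan_batches(hosts: Sequence[str], max_unavailable: float) -> List[List[str]]:
    """Split hosts into reboot batches so that at most max_unavailable
    (a fraction in (0, 1]) of the hosts are down at once. Each batch
    contains at least one host, even if that exceeds the fraction, so the
    plan always makes progress."""
    if not 0 < max_unavailable <= 1:
        raise ValueError("max_unavailable must be in (0, 1]")
    batch_size = max(1, math.floor(len(hosts) * max_unavailable))
    return [list(hosts[i:i + batch_size]) for i in range(0, len(hosts), batch_size)]


def rolling_reboot(
    hosts: Sequence[str],
    reboot: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_unavailable: float = 0.25,
) -> List[str]:
    """Reboot hosts batch by batch. Raise RuntimeError as soon as a batch
    does not come back healthy, so that later batches are left untouched
    and the operator can investigate or roll back. Otherwise return the
    rebooted hosts in order."""
    done: List[str] = []
    for batch in plan_batches(hosts, max_unavailable):
        for host in batch:
            reboot(host)
        failed = [h for h in batch if not is_healthy(h)]
        if failed:
            raise RuntimeError(f"hosts unhealthy after reboot: {failed}")
        done.extend(batch)
    return done
```

For example, `rolling_reboot([f"compute-{i}" for i in range(8)], print, lambda h: True, 0.25)` reboots the eight hosts two at a time. Stopping at the first unhealthy batch is a deliberate choice: it bounds the impact of a bad upgrade to one batch, which fits the rollback user story above.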
Opportunity/Justification
The intrusiveness of upgrades is a major reason why enterprises fail to gain
the full value of their OpenStack cloud. Upgrades have never been easy, and in
many environments they require downtime of both the control plane and the data
plane. This is an inherently un-cloud-like characteristic of the OpenStack
platform. Fixing upgrades would clear up many of the concerns that limit
OpenStack adoption today.
Gaps
Upgrades today require downtime of the data plane and of network connectivity,
and often of the control plane as well.
External References
None.
Glossary
Control Plane: hosts or infrastructure that operate OpenStack services
Data Plane: hosts or infrastructure that are managed by OpenStack services