Supporting an OpenStack cloud environment beyond a single major release requires that the deployment automation engine managing the environment be able to upgrade OpenStack control plane and data plane software between major releases.
The new features of OpenStack must be made available to users of Fuel with minimal impact on their workloads, i.e. virtual machines, connected virtual resources and applications that run on top of this infrastructure.
The upgrade of an OpenStack environment involves upgrading the following components (see diagrams below):
These requirements define the functional aspect of the solution. The proposed procedure upgrades an environment running OpenStack and installed by Fuel from the 6.x to the 7.0 release and meets the following criteria:
No in-place upgrade is supported in the 7.0 release. Every host must be reinstalled from scratch during the procedure. In-place upgrade will be implemented in the future.
Only core OpenStack services are upgraded:
Upgrade of OpenStack services assumes that OpenStack APIs will be made read-only for the period of the upgrade procedure.
The following requirements define characteristics of the solution.
Downtime of storage, network and compute resources due to the upgrade procedure must be kept to a minimum by using live migration techniques where possible.
The upgrade solution must work on reference architectures that include the following components:
- High availability architecture (including Galera MySQL, HAProxy and Corosync/Pacemaker)
- Ubuntu operating system
- KVM hypervisor
- Neutron networking manager with OVS+VLAN plugin
- Cinder virtual block storage volumes
- Ceph shared storage for volumes and ephemeral data
- Ceph shared storage for images and object store
The upgrade solution must not require users to provide more than 3 hardware servers in addition to the servers already in their environment.
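The architecture and hardware requirements above could be checked before an upgrade starts. The following is a minimal sketch only: the setting names and values in `SUPPORTED` and the function `preflight_problems` are hypothetical and are not part of Fuel. Storage backends are left out because the sketch does not assume how Fuel records them.

```python
# Hypothetical setting names and values; the spec lists the components in prose only.
SUPPORTED = {
    "mode": {"ha"},
    "operating_system": {"ubuntu"},
    "hypervisor": {"kvm"},
    "network": {"neutron-ovs-vlan"},
}

# Taken from the requirement above: at most 3 additional hardware servers.
MAX_EXTRA_SERVERS = 3


def preflight_problems(env, extra_servers):
    """Return a list of human-readable reasons why the upgrade cannot start."""
    problems = [
        "unsupported {}: {!r}".format(key, env.get(key))
        for key, allowed in sorted(SUPPORTED.items())
        if env.get(key) not in allowed
    ]
    if extra_servers > MAX_EXTRA_SERVERS:
        problems.append(
            "needs {} extra servers, limit is {}".format(
                extra_servers, MAX_EXTRA_SERVERS
            )
        )
    return problems
```

An empty list means that the environment, as described by these hypothetical settings, matches the supported reference architecture and stays within the hardware limit.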
We propose to develop and implement a solution that makes it possible to upgrade an OpenStack environment in Fuel from version 2014.2.2-6.1 to version 2015.1-7.0. This solution will rely on certain functions of the Fuel installer and will have an external component that orchestrates the upgrade process.
This proposal only covers the external upgrade orchestration script. Implementation of the Fuel installer functions used by this script is out of scope of this proposal.
The upgrade strategy implemented in the proposed upgrade script involves installing new Controllers side by side with the ones being upgraded. Resource nodes are redirected to the new Controllers and eventually upgraded with minimal movement of data. Resource nodes are nodes with Compute and/or Storage roles. They are upgraded by reinstallation on the same hardware, keeping user data intact on storage devices that are separate from the operating system boot device on the node. Reinstallation is carried out by the Fuel installer.
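The ordering implied by this strategy can be sketched as a simple plan builder. This is an illustration under stated assumptions, not the script's implementation: the role labels in `RESOURCE_ROLES`, the action names and the function `plan_side_by_side` are all hypothetical.

```python
# Hypothetical role labels for "Compute and/or Storage" nodes.
RESOURCE_ROLES = {"compute", "storage"}


def is_resource_node(roles):
    """A Resource node carries at least one Compute or Storage role."""
    return bool(set(roles) & RESOURCE_ROLES)


def plan_side_by_side(nodes, new_controllers):
    """Build an ordered list of (action, node_name) steps.

    nodes maps node name -> list of roles in the original environment.
    new_controllers lists the names of the Controllers to install side by side.
    """
    # 1. New Controllers are installed next to the original ones.
    plan = [("install_controller", name) for name in new_controllers]
    resource = [name for name, roles in nodes.items() if is_resource_node(roles)]
    # 2. Resource nodes are redirected to the new Controllers.
    plan += [("redirect_to_new_controllers", name) for name in resource]
    # 3. Resource nodes are reinstalled on the same hardware, keeping user data.
    plan += [("reinstall_keeping_data", name) for name in resource]
    return plan
```

Original Controllers never appear in the plan as Resource nodes, which reflects that they are replaced rather than reinstalled in this strategy.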
The reason to have an external script that performs the operations outlined above is that it has to orchestrate at least 2 OpenStack environments: the original one picked for upgrade and the new, upgraded one. Fuel can currently handle only a single environment at a time. It doesn’t have a component that can orchestrate multiple environments.
The proposed solution to the upgrade problem includes the following general steps, described below in more detail:
We propose to develop a script called octane that will carry out the stages of the upgrade procedure outlined above. Every step is implemented as a subcommand of the main script:
If the upgraded environment does not work for some reason, the user can revert the procedure by switching back to the original Controllers. In 7.0, the script will not support the full revert procedure. Documentation will describe the path for disaster recovery. The path is generally as follows:
The alternative to the side-by-side upgrade strategy is a fully in-place upgrade. In that case, neither data nor metadata is moved in the cloud. All software components are updated on the same set of hardware. Metadata is converted into the format of the new version. Data remains where it was.
This type of upgrade should, in theory, be more seamless than the side-by-side variant. However, in complex architectures like the Fuel HA Reference Architecture, multiple components that interact with each other make an in-place upgrade extremely difficult. Various race conditions in the upgrade flow can cause severe interruptions to the virtual infrastructure and the workloads running on top of it.
A potential solution to this problem (to be researched in future releases) is to use containers for all OpenStack and platform services in the cluster.
The eventual goal of upgrade user story in Fuel is to make it possible to upgrade OpenStack control plane and data plane in-place without interruption of virtual resources and end user’s workloads.
The proposed procedure imposes limitations on supported network architectures. By default, the Neutron VLAN plugin is supported as the most widely used network manager plugin. However, it is possible to upgrade clusters deployed with other network managers using Fuel plugins. Upgradeability of Fuel plugins is beyond the scope of this proposal.
The upgrade script itself does not require any changes in Fuel or OpenStack data models. Accompanying proposals for new functions in Fuel that the upgrade script uses, on the other hand, might have an impact on data models. That impact is described in the corresponding specifications.
The upgrade script doesn’t have an impact on the REST API. Supporting features proposed for Fuel might have such an impact. This is described in more detail in the corresponding specifications.
This change implements the upgrade process as an external script that orchestrates 2 OpenStack environments: original and new version.
The proposed solution depends on the ability to upgrade the Fuel Master node. Before upgrading a cluster, the user needs to upgrade the Fuel Master node. This allows the user to create an Upgrade Seed environment with the 2015.1-7.0 release version and install the Controller nodes that will be used in the upgraded OpenStack cloud. It also makes it possible to upgrade Compute nodes by reinstalling them with the 2015.1-7.0 version of OpenStack.
Upgrade is a high-risk procedure from a security standpoint. It requires administrative access to both environments involved in the upgrade.
End users of upgrade script are cloud operators wanting to upgrade their clouds. This proposal introduces a new CLI tool for them that guides them through the upgrade procedure.
Users of the cloud are impacted by this procedure. During the upgrade, cloud APIs are in maintenance mode and inaccessible, so users can’t provision resources.
Existing virtual machines in the cloud might experience temporary network disruptions in the course of the upgrade procedure due to restarting of OpenStack virtual networking. Live migration used in upgrade of Compute nodes might cause virtual machines to be suspended for short periods of time.
Performance of existing virtual resources might be impacted by the upgrade procedure. Upgrade of Ceph OSD nodes involves reboot, and that may lead to degraded performance of storage provided to virtual machines.
To upgrade an environment installed using one or more plugins, the following requirements must be satisfied:
The proposed script can be packaged as a Python application and distributed with Fuel as a part of the Fuel repository, or separately via the Python package management system (pip).
This change will require the whole Upgrade CI infrastructure to be built. This script must be run against any changes that are being backported to the 7.0 branch.
This is an overview of the architecture of the upgrade script and of how its parts interact during the procedure.
The Fuel API manages a single environment and performs operations on nodes in that environment. The side-by-side upgrade concept implies that some operations have to be performed on more than one environment at a time. This logic doesn’t belong in the Fuel API and must be implemented as an external script.
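The split between single-environment operations and cross-environment logic can be illustrated as follows. This is a sketch under stated assumptions: `EnvClient` is a hypothetical in-memory stand-in for a client bound to one environment and does not reproduce the real Fuel API, and `Orchestrator` is an illustrative name for the external script's role.

```python
class EnvClient:
    """Hypothetical stand-in for a client that manages exactly one environment."""

    def __init__(self, env_id):
        self.env_id = env_id
        self.nodes = set()

    def add_node(self, node):
        self.nodes.add(node)

    def remove_node(self, node):
        self.nodes.remove(node)


class Orchestrator:
    """Cross-environment logic that lives outside the single-environment client."""

    def __init__(self, original, seed):
        self.original = original
        self.seed = seed

    def move_node(self, node):
        """Move a node from the original environment to the upgraded one."""
        if node not in self.original.nodes:
            raise ValueError("node {!r} is not in environment {}".format(
                node, self.original.env_id))
        # Each call below touches one environment only; the orchestrator is
        # the only place that knows about both of them.
        self.original.remove_node(node)
        self.seed.add_node(node)
```

A usage example: create `EnvClient(1)` for the original environment and `EnvClient(2)` for the seed, then call `Orchestrator(original, seed).move_node("node-5")` for each Resource node in turn.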
Testing of the script itself will require upgrading the Fuel Master node during the integration/system test run:
Documentation for the upgrade script must be integrated into the Operations Guide. It must replace the description of the experimental manual upgrade procedure from 5.1.1 to 6.x.