Update pacemaker and corosync infrastructure (Corosync 2.x)
https://blueprints.launchpad.net/fuel/+spec/corosync-2
The next iteration of Corosync and Pacemaker improvements, driven by
scaling requirements, better Pacemaker management, and support for new
operating systems.
Problem description
The current Pacemaker implementation has some limitations:
- Does not allow deploying a large number of OpenStack Controllers
- Operations on the CIB use almost 100% of the CPU on the Controller
- The Corosync shutdown process takes a long time
- No support for newer OSes such as CentOS 7 or Ubuntu 14.04
- Current Fuel Architecture is limited to Corosync 1.x and
Pacemaker 1.x
- The Pacemaker service can be run only as a plugin for the Corosync
service. We cannot restart Pacemaker separately from Corosync, or vice
versa.
- The Fuel fork of the corosync module contains many tunings for parallel
deployment of controllers. These cannot be contributed upstream yet
because the code bases have diverged too far.
Proposed change
- Support Fuel Controllers with Corosync 2.3.3 and Pacemaker 1.1.12
packages for CentOS 6.5 and Ubuntu 14.04
- Run Pacemaker service separated from Corosync (ver:1)
- Take the puppet corosync module from puppetlabs and integrate it. That
would allow installing and configuring a Corosync cluster with Pacemaker
without spending additional resources on code maintenance.
- Move all custom Fuel changes for the corosync and pacemaker providers
to a separate pacemaker module. That would keep custom changes from
interfering with the upstream code.
Alternatives
- Continue to develop and support the Fuel fork of the corosync module
in order to make it compatible with Corosync 2, without help from the
Puppet community
- Leave Corosync 1.x infrastructure as is
Upgrade impact
- Corosync 2.x is NOT compatible with previous versions of Corosync [0].
Please make sure to upgrade all nodes at once (full-downtime patching)
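The all-at-once requirement can be expressed as a pre-flight check. The
following is a minimal sketch and not part of Fuel: the function names and
the input mapping of node name to version string are hypothetical, version
strings are assumed to start with the upstream version (no package epoch),
and collecting the versions from the nodes is left out.

```python
def corosync_major(version):
    """Return the major version number from a string such as '2.3.3'."""
    return int(version.split(".")[0])


def mixed_corosync_majors(node_versions):
    """Group nodes by Corosync major version.

    node_versions: dict of node name -> version string (hypothetical input).
    Returns a dict of major version -> sorted list of node names.
    """
    groups = {}
    for node, version in node_versions.items():
        groups.setdefault(corosync_major(version), []).append(node)
    return {major: sorted(nodes) for major, nodes in groups.items()}


def is_safe_to_form_cluster(node_versions):
    """True only when every node runs the same Corosync major version."""
    return len(mixed_corosync_majors(node_versions)) <= 1
```

A check of this kind would refuse to bring the cluster up while some nodes
still run Corosync 1.x and others already run 2.x.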
Notifications impact
None
Other end user impact
- If the Corosync service is started or restarted, the Pacemaker service
should be (re)started right after it. Otherwise, the inter-service
communication layer would be broken.
- The Corosync service cannot be stopped gracefully before the Pacemaker
service. When shutting down, the Pacemaker service should be stopped
first.
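The ordering rules above can be sketched as a small helper. This is an
illustration only: the function names are hypothetical, and the
`service <name> <action>` command form is an assumption about how the
services are controlled on the target hosts.

```python
def service_sequence(operation):
    """Return the ordered (service, action) steps for a cluster operation.

    Corosync is started before Pacemaker; Pacemaker is stopped before
    Corosync. A restart is a full stop followed by a full start.
    """
    start = [("corosync", "start"), ("pacemaker", "start")]
    stop = [("pacemaker", "stop"), ("corosync", "stop")]
    if operation == "start":
        return start
    if operation == "stop":
        return stop
    if operation == "restart":
        return stop + start
    raise ValueError("unknown operation: %s" % operation)


def as_commands(steps):
    """Render steps as shell command strings (assumed command form)."""
    return ["service %s %s" % (name, action) for name, action in steps]
```

For example, `as_commands(service_sequence("stop"))` lists the Pacemaker
stop command before the Corosync one.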
Other deployer impact
None
Developer impact
- All changes for custom pacemaker providers should go to the separate
pacemaker module.
- Any changes not related to the providers should be made in the corosync
module and contributed upstream as well
Implementation
Work Items
- Replace Corosync 1.x infrastructure with Corosync 2.3.3 and
Pacemaker 1.1.12 at the staging mirrors
- Adapt puppet modules for corosync and pacemaker for Corosync 2.x
- Synchronize corosync manifest with puppetlabs as well
- Push staging mirrors to the public ones once the manifests are ready
Dependencies
- Corosync 2.3.3 and Pacemaker 1.1.12 packages with dependency libraries
Testing
- Standard swarm testing is required.
- Manual HA testing is required.
- Rally testing is preferred but not mandatory.
Acceptance criteria
- OpenStack clouds deployed by Fuel pass OSTF tests with
Corosync 2.
Documentation Impact
- The High Availability guide should be reviewed. For Ubuntu, the crm
tool stays as is, but the documentation should also be extended with
the pcs equivalents for CentOS
- The upgrade/patching impact should be described: upgrading to
Corosync 2.x assumes full downtime for the cloud
References