Update pacemaker and corosync infrastructure (Corosync 2.x)
The next iteration of Corosync and Pacemaker improvements, driven by
scaling requirements, better Pacemaker management, and new OS support.
The current Pacemaker implementation has several limitations:
- Does not allow deploying a large number of OpenStack Controllers
- Operations on the CIB utilize almost 100% of CPU on the Controller
- The Corosync shutdown process takes a long time
- No support for new OSes such as CentOS 7 or Ubuntu 14.04
- The current Fuel architecture is limited to Corosync 1.x, where the
  Pacemaker service can run only as a plugin of the Corosync service.
  We cannot restart Pacemaker separately from Corosync and vice versa
- The Fuel fork of the corosync module contains many tunings for parallel
  deployment of controllers which cannot be contributed upstream yet
  because the code bases have diverged significantly
- Support Fuel Controllers with Corosync 2.3.3 and Pacemaker 1.1.12
  packages for CentOS 6.5 and Ubuntu 14.04
- Run the Pacemaker service separately from Corosync (ver: 1)
- Get the puppet corosync module from Puppet Labs and integrate it. That
  would allow installing and configuring a Corosync cluster with Pacemaker
  without additional resources for code maintenance.
- Move all custom Fuel changes for the corosync and pacemaker providers
  to a separate pacemaker module. That would keep the custom changes from
  interfering with the upstream code.
- Continue to develop and support the Fuel fork of the corosync module in
  order to make it compatible with Corosync 2 without help from upstream
  puppet
- Leave the Corosync 1.x infrastructure as is
- Corosync 2.x is NOT compatible with previous versions of Corosync.
  Make sure to upgrade all nodes at once (full-downtime patching)
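To illustrate the separation described above: under Corosync 1.x, Pacemaker was loaded as a Corosync plugin through a `service { ver: 0 }` stanza, while `ver: 1` (and Corosync 2.x, where the plugin infrastructure is removed entirely) runs pacemakerd as its own daemon. A minimal Corosync 2.x configuration sketch follows; the network addresses and node list are placeholders, not values from this spec:

```
totem {
  version: 2
  # unicast (udpu) transport scales better than multicast for
  # large controller counts
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.0.0    # placeholder management network
    mcastport: 5405
  }
}

nodelist {
  node { ring0_addr: 10.0.0.2 }
  node { ring0_addr: 10.0.0.3 }
  node { ring0_addr: 10.0.0.4 }
}

quorum {
  # votequorum replaces the quorum handling of the old Pacemaker plugin
  provider: corosync_votequorum
}

# Note: no "service { name: pacemaker; ver: ... }" stanza -- in
# Corosync 2.x Pacemaker always runs as a separate daemon (pacemakerd).
```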
Other end user impact
- If the Corosync service is started or restarted, the Pacemaker service
  must be (re)started afterwards as well. Otherwise, the inter-service
  communication layer would be broken.
- The Corosync service cannot be stopped gracefully before the Pacemaker
  service. When shutting down, the Pacemaker service should be turned off
  first
- The deployment process will be improved and will require less time, as
  CIB operations will no longer consume 100% of CPU time
- Corosync 2 has many improvements that allow clusters of up to 100
  Controllers. Corosync 1.x scales only up to 10-16 nodes
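The start/stop ordering above can be expressed declaratively on systemd-based distributions (e.g. CentOS 7); on CentOS 6.5 and Ubuntu 14.04 the same ordering has to be enforced by init script priorities instead. A sketch of a systemd drop-in, assuming the standard `corosync.service` and `pacemaker.service` unit names:

```
# /etc/systemd/system/pacemaker.service.d/order.conf (sketch)
[Unit]
# Start Pacemaker only after Corosync is up; on shutdown, systemd
# reverses this order, so Pacemaker stops before Corosync.
After=corosync.service
Requires=corosync.service
```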
Other deployer impact
- All changes for the custom pacemaker providers should go to the
  separate pacemaker module
- Any changes not related to the providers should be made in the corosync
  module and contributed upstream as well
- Replace the Corosync 1.x infrastructure with Corosync 2.3.3 and
  Pacemaker 1.1.12 on the staging mirrors
- Adapt the puppet modules for corosync and pacemaker to Corosync 2.x
- Synchronize the corosync manifests with puppetlabs as well
- Push the staging mirrors to the public ones once the manifests are ready
- Corosync 2.3.3 and Pacemaker 1.1.12 packages with dependency libraries
  are required
- Standard swarm testing is required.
- Manual HA testing is required.
- Rally testing is preferred but not mandatory.
- OpenStack clouds deployed by Fuel must pass OSTF tests
- The High Availability guide should be reviewed. For Ubuntu, the crm
  tool stays as is, but the documentation should also be enhanced with
  pcs equivalents for CentOS
- The upgrade/patching impact should be described: upgrading to
  Corosync 2.x assumes full downtime for the cloud
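As an illustration of the crm/pcs duality mentioned above, a few common equivalents are listed below. This is not exhaustive, exact syntax depends on the crmsh and pcs versions shipped, and the node and resource names (`node-1`, `p_vip`) are hypothetical:

```
# crmsh (Ubuntu)               # pcs (CentOS)
crm status                     pcs status
crm configure show             pcs config
crm node standby node-1        pcs cluster standby node-1
crm node online node-1         pcs cluster unstandby node-1
crm resource cleanup p_vip     pcs resource cleanup p_vip
```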