Improve Corosync and Pacemaker management
This is the next iteration of Corosync and Pacemaker improvements, driven by
scaling requirements, better Pacemaker management, and new OS support.
The current Pacemaker implementation has some limitations:
- It does not allow deploying a large number of OpenStack Controllers
- Operations on the CIB use almost 100% of the CPU on the Controller
- The Corosync shutdown process takes a long time
- No support for newer OSes such as CentOS 7 or Ubuntu 14.04
- The current Fuel architecture is limited to Corosync 1.x and Pacemaker 1.x
- The Puppet service provider for Pacemaker does not disable Upstart or System V
services by default
- The current implementation does not specify ordering between resources
- Diff operations against the Corosync CIB require saving data to a file rather
than keeping all data in memory
- The debug process for OCF scripts is not unified and requires a lot of actions from
- Not granular enough
- OpenStack services are not managed by Pacemaker
- Compute nodes are not in the Pacemaker cluster and therefore lack a viable
control plane for their compute/nova services.
- Support Fuel Controllers with Corosync 2.0 packages
- Get the Puppet corosync module from Puppet Labs and integrate it
- Rename OCF resources. Remove __old from resource names
- Refactor service provider and include disabling of the same services under
- Refactor the provider and remove the file-based diff operation
- Add wrapper handler for OCF scripts or unify debug handling of OCF scripts
- Move Pacemaker and Corosync installation to its own stage. Create a separate
corosync.pp to make it more granular
- Add all OpenStack services to Pacemaker and configure ordering between them
- Use monit as an additional control plane for services on compute nodes
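
The idea of diffing CIB data without writing it to a file can be sketched as follows. This is only an illustration in Python, not the Fuel provider itself; the function name `cib_diff` and the sample XML are hypothetical, and the comparison is a plain line-based diff rather than a CIB-aware one.

```python
import difflib
from typing import List


def cib_diff(before_xml: str, after_xml: str) -> List[str]:
    """Return a unified diff of two CIB snapshots held entirely in memory.

    Hypothetical helper: both snapshots are passed as strings, so no
    temporary files are created for the comparison.
    """
    return list(
        difflib.unified_diff(
            before_xml.splitlines(),
            after_xml.splitlines(),
            fromfile="cib-before",
            tofile="cib-after",
            lineterm="",
        )
    )
```

An empty result means the two snapshots are textually identical, so no update would need to be pushed to the cluster.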
The changes are not critical and do not affect deployment or Cluster
- Since resources will be renamed, the upgrade process should delete the old
resources on upgrade and delete the new resource names on rollback.
- Corosync 2.x is NOT compatible with previous versions of Corosync (1.3/1.4).
Please make sure to upgrade all nodes at once (full-downtime patching)
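
The rename handling during upgrade and rollback can be sketched like this. It is a minimal illustration under stated assumptions: the suffix is taken to be `_old`, the resource names in the usage note are examples only, and the helper names are hypothetical rather than part of Fuel.

```python
from typing import Dict, List

# Assumption: legacy resources are recognized by this trailing suffix.
OLD_SUFFIX = "_old"


def rename_plan(resources: List[str]) -> Dict[str, str]:
    """Map each resource carrying the legacy suffix to its new name."""
    return {
        name: name[: -len(OLD_SUFFIX)]
        for name in resources
        if name.endswith(OLD_SUFFIX)
    }


def upgrade_deletions(resources: List[str]) -> List[str]:
    """Old names that an upgrade would delete."""
    return sorted(rename_plan(resources))


def rollback_deletions(resources: List[str]) -> List[str]:
    """New names that a rollback would delete."""
    return sorted(rename_plan(resources).values())
```

For example, given `["vip__management_old", "p_haproxy"]`, only the first name is planned for renaming; resources without the suffix are left untouched.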
Other end user impact
- The deployment process will improve and take less time, because CIB
operations will no longer require 100% CPU time
- Corosync 2.0 includes many improvements that allow up to 100
Controllers. Corosync 1.0 scales up to 10-16 nodes
Other deployer impact
- The enhanced Pacemaker provider requires some refactoring of the Puppet
manifests in Fuel Library:
- Upstream corosync manifests will replace our in-memory diff invention with
the standard approach: crm, pcs, or cibadmin --patch '<xml patch>' directly.
- Renaming VIP primitives could require additional orchestration refactoring
- The new Pacemaker/monit control plane for OpenStack services would require
appropriate changes in manifests as well.
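
As a rough illustration of the cibadmin-based approach, the sketch below only builds the command line and does not run it. It assumes the patch is passed inline through cibadmin's `--xml-text` option; the helper names are hypothetical, and the upstream manifests may invoke the tools differently.

```python
import shlex
from typing import List


def cibadmin_patch_command(xml_patch: str) -> List[str]:
    """Build an argv list that applies an XML patch to the CIB.

    Assumption: the patch is supplied inline via --xml-text. The command is
    only constructed here, never executed.
    """
    if not xml_patch.strip():
        raise ValueError("xml_patch must not be empty")
    return ["cibadmin", "--patch", "--xml-text", xml_patch]


def as_shell(argv: List[str]) -> str:
    """Render an argv list as a safely quoted shell command string."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Keeping the command as an argv list avoids shell quoting problems with XML content; `as_shell` is useful only for logging.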
- Replace Corosync 1.0 with Corosync 2.0
- Synchronize corosync manifest with puppetlabs
- Refactor the Puppet service core provider. It should:
- Disable systemd/Upstart/System V when the corosync system
provider is enabled
- Redesign Puppet manifests to start all OCF scripts via
- Add OpenStack services to Pacemaker
- Configure ordering between services in Pacemaker
- Configure monit for OpenStack services on compute nodes
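
The ordering work item can be sketched as generating one order constraint per adjacent pair in a start-up chain. This is an illustration only: it assumes pcs as the tool and its `pcs constraint order <first> then <second>` form, the resource names are placeholders, and the real service order is not defined here.

```python
from typing import List


def ordering_commands(chain: List[str]) -> List[List[str]]:
    """Build pcs argv lists that order each resource after its predecessor.

    `chain` lists hypothetical resource names in the desired start order.
    The commands are only constructed, never executed.
    """
    return [
        ["pcs", "constraint", "order", first, "then", second]
        for first, second in zip(chain, chain[1:])
    ]
```

A chain of three resources yields two constraints; a single resource yields none.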
- Corosync 2.x packages
- Monit packages
- Standard swarm testing is required.
- Manual HA testing is required.
- Rally testing is preferred but not mandatory.
- The new control plane for OpenStack services requires manual testing.
- New debug wrappers for OCF require manual testing.
- OpenStack clouds deployed by Fuel pass OSTF tests with
Corosync 2.0 and the new Pacemaker/monit control plane for services.
- Debug wrappers for OCF produce enough information without being too
verbose.
- VIP resources do not contain an _old postfix in their names.
- The Upstart/System V control plane is disabled for services managed via
- The High Availability guide should be reviewed. For Ubuntu, the crm tool stays
as is, but the documentation should also be extended with pcs
equivalents for CentOS
- Upgrade/patching impact should be described: upgrading to Corosync 2.0
assumes full downtime for the cloud
- Changes to OCF debugging approach with bash wrappers should be described
- Renaming of VIP resources should be mentioned
- If OpenStack services become managed by Pacemaker and monit, the related
changes to their new control plane should be described