https://blueprints.launchpad.net/fuel/+spec/controller-maintenance-mode
Support for maintenance mode on controllers
First: | There is no unified way to reach the needed state across all operating systems (OS) used within Fuel. Each OS has its own way of doing it, so we have to use OS-dependent algorithms. Single-user mode, which is common to all of them, does not provide network access and behaves differently on each OS. |
---|---|
Second: | When we stop services or switch runlevels to reach maintenance mode (MM), we very often end up with “garbage” (zombie processes, locks, memory leaks, etc.). This prevents maintenance work from being done properly. |
Third: | Dependencies and interactions between services. If we want to stop a service manually, we have to keep all of its dependencies in mind and take care of them as well. |
Fourth: | Similar modes in other operating systems, for example “Windows safe mode”, have a mechanism that enforces the mode automatically after an unexpected emergency reboot. We need the same, with two constraints: we usually have no access to the console, and automatic enforcement of MM should tolerate some “emergency” reboots. |
Fifth: | Some services, such as corosync, have their own “maintenance mode” that lets us do the same things. However, they may do it differently from what we require, or they may be absent from the current cloud configuration. |
Sixth: | HA services may treat a node in MM as a node in a “failed state”, because the services on it do not stop their work properly. |
We will create a common procedure and a unified interface for all operating systems used by Fuel. It will let us enforce the MM state and return to operational mode in the same way on every operating system. Under the hood it will be based on boot scripts and mechanisms that are specific to each operating system. We may introduce some changes to these mechanisms to obtain the proper set of services running in MM.
This procedure is not an OpenStack service, but a unification of recovery procedures across all OSs. It will give us the same user interface on all systems in use.
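One way to picture the split between the unified interface and the OS-specific mechanisms is sketched below. This is only an illustration under assumptions: the class names, the `os-a`/`os-b` identifiers and the returned strings are hypothetical and do not come from the spec, and the stand-in backend only reports what a real one might do.

```python
from abc import ABC, abstractmethod


class MaintenanceBackend(ABC):
    """OS-specific mechanism behind the unified interface (hypothetical)."""

    @abstractmethod
    def enter(self) -> str:
        """Arrange for the boot process to pause in maintenance mode."""

    @abstractmethod
    def leave(self) -> str:
        """Resume the paused boot into operational mode."""


class RecordingBackend(MaintenanceBackend):
    """Stand-in backend that only reports what a real one would do."""

    def __init__(self, os_name: str) -> None:
        self.os_name = os_name

    def enter(self) -> str:
        return f"{self.os_name}: pause boot in MM"

    def leave(self) -> str:
        return f"{self.os_name}: resume boot"


# Registry keyed by an OS identifier; the keys are placeholders.
BACKENDS = {
    "os-a": RecordingBackend("os-a"),
    "os-b": RecordingBackend("os-b"),
}


def get_backend(os_id: str) -> MaintenanceBackend:
    """Pick the backend for the detected OS, failing loudly if unknown."""
    try:
        return BACKENDS[os_id]
    except KeyError:
        raise ValueError(f"no maintenance backend for {os_id!r}") from None
```

The caller only ever sees `enter()` and `leave()`, so the user-facing behaviour stays the same regardless of which backend is selected.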
We suggest creating a “umm” utility that will enforce maintenance mode on the system and resume normal operation.
Usage:
umm status - check MM status
umm on [command to execute in MM] - enforce MM [and execute the command when MM is reached]
umm off [reboot] - continue the boot process [or reboot] into operational mode
umm enable - enable MM functionality
umm disable - disable MM functionality
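The command layout above could be parsed roughly as follows. This is a minimal sketch using Python's standard `argparse` module, not the actual implementation of the utility; the `build_parser` function and the attribute names `action`, `command` and `reboot` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the five umm subcommands (illustrative)."""
    parser = argparse.ArgumentParser(prog="umm")
    sub = parser.add_subparsers(dest="action", required=True)

    sub.add_parser("status", help="check MM status")

    on = sub.add_parser("on", help="enforce MM")
    # Everything after "on" is kept as the command to run once MM is reached.
    on.add_argument("command", nargs=argparse.REMAINDER)

    off = sub.add_parser("off", help="continue boot into operational mode")
    # The optional literal word "reboot" requests a reboot instead.
    off.add_argument("reboot", nargs="?", choices=["reboot"])

    sub.add_parser("enable", help="enable MM functionality")
    sub.add_parser("disable", help="disable MM functionality")
    return parser
```

For example, `build_parser().parse_args(["on", "echo", "hello"])` yields `action == "on"` with the trailing words collected in `command`, while `parse_args(["off"])` leaves `reboot` unset.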
To avoid the “garbage” described in the second problem, maintenance mode will be reached only by rebooting, pausing the boot process at the appropriate state, and resuming it when we want to switch back to the operational state.
It lets us:
To do that, we will modify the boot and shutdown mechanism and create a state in which only the network, the SSH daemon, and the services they need are running.
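Working out which services belong to such a state amounts to taking the required services and following their dependencies transitively. The sketch below illustrates that idea only; the function name, the service names and the dependency map are hypothetical, and a real system would take this information from its own init configuration.

```python
def services_for_mm(required, depends_on):
    """Return `required` plus everything it transitively depends on."""
    keep = set()
    stack = list(required)
    while stack:
        service = stack.pop()
        if service in keep:
            continue  # already visited, also guards against cycles
        keep.add(service)
        stack.extend(depends_on.get(service, ()))
    return keep


# Hypothetical dependency map: service -> services it needs.
DEPENDS_ON = {
    "sshd": ["networking"],
    "networking": ["udev"],
    "nova-api": ["networking", "rabbitmq"],
    "rabbitmq": ["networking"],
}

MM_SERVICES = services_for_mm(["networking", "sshd"], DEPENDS_ON)
```

With this map, `MM_SERVICES` contains `sshd`, `networking` and `udev`, while `nova-api` and `rabbitmq` are left out because nothing required in MM depends on them.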
We will modify the current boot process to enforce MM automatically if the system goes through some “unexpected” reboots within a set period of time.
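The decision could look like the check below: count the unexpected reboots that fall inside a time window and enforce MM once the count reaches a limit. This is a sketch under assumptions; the spec does not define the window or the limit, so `window_seconds` and `reboot_limit` are hypothetical parameters, and timestamps are plain seconds.

```python
def should_enforce_mm(reboot_times, now, window_seconds, reboot_limit):
    """True when at least `reboot_limit` unexpected reboots are recent.

    `reboot_times` holds timestamps (in seconds) of unexpected reboots;
    only those within `window_seconds` before `now` are counted.
    """
    recent = [t for t in reboot_times if 0 <= now - t <= window_seconds]
    return len(recent) >= reboot_limit
```

A limit greater than one lets the system tolerate an isolated emergency reboot without dropping into MM.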
The full set of needed files will be provided as a single Puppet class. We will also provide a UMM task for granular deployment in 6.1.
None
None
None
None
None
None
None
None
Primary assignee: | Peter Zhurba |
---|---|
QA: | Veronika Krayneva |
Documentation: | Peter Zhurba, Dmitry Klenov |
Reviewer: | Vladimir Kuklin |
None
All actions are performed on the same controller. Once finished with these actions, move on to another controller.
Operations Guide -> “Maintenance Mode” will be added.
Terminology Reference -> “Maintenance Mode” will be added.