Support Watcher HA active-active Mode

https://blueprints.launchpad.net/watcher/+spec/support-watcher-ha-active-active-mode

Watcher Decision Engine and Applier currently don’t support HA active-active mode.

Problem description

In a real deployment there is often more than one controller, and Watcher Decision Engine and Applier are deployed on the controllers. Currently only one Watcher Decision Engine and one Applier can be active at a time. If more than one were active, the Decision Engines would create duplicate action plans for CONTINUOUS audits, and there would be no way to tell which Applier is executing a given action plan.

Another problem is how to keep the Cluster Data Models (CDMs) of the Decision Engines in sync. Currently a Decision Engine updates its data model based on notifications from Nova. With several Decision Engines, the notifications need to be broadcast to all of them. An alternative that does not depend on notifications is to update the CDM before each audit.

Use Cases

As an operator, I want to enable Watcher Decision Engine and Applier on all controllers.

Proposed change

If more than one Watcher Decision Engine is enabled, we need to know which Decision Engine is handling each Audit. To solve this, we can add a new ‘host’ field to the Audit table. When the audit state is updated from PENDING to ONGOING, the host field is set to the hostname of the Watcher Decision Engine that picked up the audit.
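The host-recording step and the later host check for CONTINUOUS audits can be sketched as follows. This is a minimal illustration, not Watcher code: the `Audit` dataclass and the `claim_audit` and `should_run_continuous` helpers are hypothetical names, and a real implementation would need to perform the state update atomically in the database.

```python
import socket
from dataclasses import dataclass
from typing import Optional

PENDING = "PENDING"
ONGOING = "ONGOING"


@dataclass
class Audit:
    """Hypothetical stand-in for a row of the audit table."""
    uuid: str
    audit_type: str = "ONESHOT"
    state: str = PENDING
    host: Optional[str] = None  # the new field proposed by this spec


def claim_audit(audit: Audit, host: Optional[str] = None) -> Audit:
    """Move an audit from PENDING to ONGOING and record the claiming host."""
    if audit.state != PENDING:
        raise ValueError(f"audit {audit.uuid} is {audit.state}, not {PENDING}")
    audit.state = ONGOING
    # Default to the local hostname when no explicit host is given.
    audit.host = host or socket.gethostname()
    return audit


def should_run_continuous(audit: Audit, local_host: str) -> bool:
    """Return True only on the Decision Engine host recorded in the audit."""
    return audit.host == local_host
```

With this shape, a Decision Engine on `controller-2` that sees a CONTINUOUS audit claimed by `controller-1` simply skips it.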

So, the changes will include:

  • Each Watcher Decision Engine should consume notifications from Nova in broadcast mode to keep the CDMs in sync.

  • Add a new ‘host’ field to the audit table

  • Add a new ‘host’ field to the actionplan table

  • Record the host when updating the audit’s state from PENDING to ONGOING

  • Record the host when updating the actionplan’s state from PENDING to ONGOING

  • For a CONTINUOUS audit, compare the host running the Decision Engine with the host value recorded in the audit table. If they differ, skip the audit.

  • When the Applier process starts, if there are ONGOING actionplans recorded with the same host, cancel these stale actionplans. The Actions of these stale actionplans should also be marked as CANCELLED.

  • An Action Plan covers different resources, which are represented in its Actions. If a user wants to run more than one action plan at the same time, Watcher should check whether the new action plan’s resources overlap with those of an already running one. If so, Watcher should prevent the new Action Plan from running by raising an appropriate error.
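The last two items above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Action` and `ActionPlan` dataclasses, the `ActionPlanConflict` exception, and both helper functions are hypothetical and only model the behaviour described in this spec, not Watcher's actual objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

ONGOING = "ONGOING"
CANCELLED = "CANCELLED"


@dataclass
class Action:
    """Hypothetical action; resource_id names the resource it touches."""
    resource_id: str
    state: str = "PENDING"


@dataclass
class ActionPlan:
    """Hypothetical action plan with the proposed 'host' field."""
    uuid: str
    state: str = "PENDING"
    host: Optional[str] = None
    actions: List[Action] = field(default_factory=list)

    def resources(self) -> Set[str]:
        return {action.resource_id for action in self.actions}


class ActionPlanConflict(Exception):
    """Hypothetical error raised when two plans share a resource."""


def cancel_stale_plans(plans: List[ActionPlan], local_host: str) -> List[ActionPlan]:
    """On Applier start, cancel ONGOING plans recorded for this host."""
    stale = [p for p in plans if p.state == ONGOING and p.host == local_host]
    for plan in stale:
        plan.state = CANCELLED
        # The spec asks for the plan's actions to be marked CANCELLED too.
        for action in plan.actions:
            action.state = CANCELLED
    return stale


def check_no_overlap(new_plan: ActionPlan, plans: List[ActionPlan]) -> None:
    """Raise if new_plan touches a resource used by a running plan."""
    for plan in plans:
        if plan.state != ONGOING or plan is new_plan:
            continue
        shared = new_plan.resources() & plan.resources()
        if shared:
            raise ActionPlanConflict(
                f"plan {new_plan.uuid} overlaps with {plan.uuid} on {sorted(shared)}"
            )
```

An Applier would call `cancel_stale_plans` once at startup with its own hostname, and `check_no_overlap` before launching each new action plan.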

Alternatives

None

Data model impact

  • Add a new ‘host’ field to the audit table

  • Add a new ‘host’ field to the actionplan table
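As a rough illustration of the schema change, the sketch below adds a nullable ‘host’ column to an existing table using Python's built-in sqlite3 module. The table names, the `VARCHAR(255)` type, and the `add_host_column` helper are assumptions made for the example; the actual change would go through the project's own database migration tooling.

```python
import sqlite3


def add_host_column(conn: sqlite3.Connection, table: str) -> None:
    """Add a nullable 'host' column to the table if it is not there yet."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "host" not in columns:
        # Nullable, so existing rows stay valid with host = NULL.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN host VARCHAR(255)")


# Example with hypothetical, minimal tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audits (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("CREATE TABLE action_plans (id INTEGER PRIMARY KEY, state TEXT)")
for table_name in ("audits", "action_plans"):
    add_host_column(conn, table_name)
```

Keeping the column nullable means audits and action plans created before the upgrade do not need to be backfilled.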

REST API impact

None

Security impact

None

Notifications impact

Add a ‘host’ field to AuditPayload and ActionPlanPayload.

Other end user impact

None

Performance Impact

None

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

alexchadin

Other contributors:

licanwei

Work Items

  • Add broadcast notifications for DE

  • Update database to add a new field ‘host’

  • Record host when changing state from PENDING to ONGOING

  • Add checking host for CONTINUOUS audit

  • Check for stale actionplans when starting Applier

Dependencies

None

Testing

Unit tests for each change.

Documentation Impact

Appropriate documentation should be added in a new HA section.

References

None

History

None