Watcher Continuous Optimization

https://blueprints.launchpad.net/watcher/+spec/continuously-optimization

Problem description

A cluster can currently be optimized by the various Strategies only when an Administrator triggers them. Launching a recommended Action Plan manually is not always suitable, since the state of the cluster changes constantly. It would be better to have two ways of launching an audit: triggering it manually or launching it periodically. We propose to add continuous optimization to the Watcher project as a continuous type of audit object.

The main purpose of this change is to design and implement an active mode for Watcher audits.

This specification relates to blueprint: https://blueprints.launchpad.net/watcher/+spec/continuously-optimization

Use Cases

As an administrator, I would like to create a periodic audit so that I can continuously optimize my cloud infrastructure. I can specify the period in seconds with the --period parameter, for example to launch an audit every 600 seconds.

As an administrator, I would like to be able to remove a continuous audit.

As an administrator, I would like to be able to update the period of an audit.

Project Priority

Essential for Newton-2

Proposed change

The Watcher system enables a private cloud administrator to launch an Audit on an OpenStack cluster in order to optimize it with regard to one or several goals. An Audit is an optimization request.

There are two types of audits:
  • ONESHOT: the audit will only be executed once.

  • CONTINUOUS: the audit will be executed regularly with a given frequency.

We propose to use the APScheduler library to schedule the continuous audits. Note: this library is already in the OpenStack global requirements.

APScheduler provides several scheduler implementations for running jobs at a specific interval. The scheduler that best matches our requirements is the BackgroundScheduler, for which APScheduler provides an example.

The DecisionEngineManager (watcher/decision_engine/manager.py) class will need to be amended in order to instantiate the new ContinuousAuditManager class.

The ContinuousAuditManager class will contain the BackgroundScheduler as well as the logic for managing the continuous audits. We should also create a ContinuousAuditJob class in charge of supervising one Audit. This class will contain the APScheduler job and its associated audit.

We can easily add new audits or remove old ones on the fly with BackgroundScheduler. Existing continuous audits should therefore be added automatically by the decision engine at startup.

Then, the ContinuousAuditManager will manage the audits in an event-driven fashion. To do that, we should modify the ‘post’, ‘patch’ and ‘delete’ methods in the API source file so that they send immediate notification messages. The notifications generated by Watcher are in JSON format and are placed on an AMQP queue named watcher.status. The queue name must be configurable.
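One possible shape for these two classes is sketched below. The method names (add_audit, update_period, remove_audit) and the injected execute_audit callable are assumptions made for illustration. RecordingScheduler is a stand-in that mimics only the three BackgroundScheduler calls used here, so the sketch runs without APScheduler installed.

```python
class ContinuousAuditJob(object):
    """Pairs one continuous audit with the scheduler job that re-runs it."""

    def __init__(self, audit_uuid, period, job):
        self.audit_uuid = audit_uuid
        self.period = period
        self.job = job  # handle returned by the scheduler


class ContinuousAuditManager(object):
    """Owns the scheduler and the bookkeeping for continuous audits."""

    def __init__(self, scheduler, execute_audit):
        self.scheduler = scheduler
        self.execute_audit = execute_audit
        self.jobs = {}  # audit_uuid -> ContinuousAuditJob

    def add_audit(self, audit_uuid, period):
        job = self.scheduler.add_job(
            self.execute_audit, "interval", seconds=period,
            args=[audit_uuid], id=audit_uuid)
        self.jobs[audit_uuid] = ContinuousAuditJob(audit_uuid, period, job)

    def update_period(self, audit_uuid, period):
        self.scheduler.reschedule_job(
            audit_uuid, trigger="interval", seconds=period)
        self.jobs[audit_uuid].period = period

    def remove_audit(self, audit_uuid):
        self.scheduler.remove_job(audit_uuid)
        del self.jobs[audit_uuid]


class RecordingScheduler(object):
    """Hypothetical stand-in that records the scheduler calls it receives."""

    def __init__(self):
        self.calls = []

    def add_job(self, func, trigger, **kwargs):
        self.calls.append(("add_job", kwargs["id"], kwargs["seconds"]))
        return kwargs["id"]

    def reschedule_job(self, job_id, trigger, **kwargs):
        self.calls.append(("reschedule_job", job_id, kwargs["seconds"]))

    def remove_job(self, job_id):
        self.calls.append(("remove_job", job_id, None))


scheduler = RecordingScheduler()
manager = ContinuousAuditManager(scheduler, execute_audit=lambda uuid: None)
manager.add_audit("audit-1", 600)
manager.update_period("audit-1", 3600)
period_after_update = manager.jobs["audit-1"].period
manager.remove_audit("audit-1")
```

Passing the scheduler in through the constructor is a design choice of this sketch: it lets unit tests swap in a stub, while the decision engine would pass a started BackgroundScheduler.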

The ContinuousAuditManager will consume these events in order to update the status of the audits.

Immediate Notification Examples

{
    "event_type": "audit.create",
    "timestamp": "2016-03-12 17:01:29.899834",
    "message_id": "1234653e-ce46-4a82-979f-a9286cac5258",
    "priority": "INFO",
    "publisher_id": "<service name>:<the host where the service runs>",
    "payload": {
        "watcher_object.namespace": "watcher",
        "watcher_object.name": "Audit",
        "watcher_object.version": "1.0",
        "watcher_object.data": {
            "audit_uuid": "840eeb3e-3486-11e6-ac61-9e71128cae77",
            "type": "CONTINUOUS",
            "state": "PENDING",
            "period": 3600
        }
    }
}

{
    "event_type": "audit.update",
    "timestamp": "2016-03-12 17:01:29.899834",
    "message_id": "1234653e-ce46-4a82-979f-a9286cac5258",
    "priority": "INFO",
    "publisher_id": "<service name>:<the host where the service runs>",
    "payload": {
        "watcher_object.namespace": "watcher",
        "watcher_object.name": "Audit",
        "watcher_object.version": "1.0",
        "watcher_object.data": {
            "audit_uuid": "840eeb3e-3486-11e6-ac61-9e71128cae77",
            "type": "CONTINUOUS",
            "state": "ONGOING",
            "period": 3600
        }
    }
}

{
    "event_type": "audit.delete",
    "timestamp": "2016-03-12 17:01:29.899834",
    "message_id": "1234653e-ce46-4a82-979f-a9286cac5258",
    "priority": "INFO",
    "publisher_id": "<service name>:<the host where the service runs>",
    "payload": {
        "watcher_object.namespace": "watcher",
        "watcher_object.name": "Audit",
        "watcher_object.version": "1.0",
        "watcher_object.data": {
            "audit_uuid": "840eeb3e-3486-11e6-ac61-9e71128cae77",
            "type": "CONTINUOUS",
            "state": "SUCCEEDED",
            "period": 3600
        }
    }
}
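
As an illustration of how a consumer could react to these three event types, the following sketch applies a notification to an in-memory map of scheduled audits. The handle_notification function and the audit_uuid-to-period map are hypothetical; only the message fields come from the examples above.

```python
import json


def handle_notification(raw_message, scheduled):
    """Apply one audit notification to `scheduled` (audit_uuid -> period)."""
    message = json.loads(raw_message)
    data = message["payload"]["watcher_object.data"]
    audit_uuid = data["audit_uuid"]
    event_type = message["event_type"]

    if event_type == "audit.delete" or data["type"] != "CONTINUOUS":
        # Deleted, or not a continuous audit: nothing to re-trigger.
        scheduled.pop(audit_uuid, None)
    elif event_type in ("audit.create", "audit.update"):
        scheduled[audit_uuid] = data["period"]
    return scheduled


def make_event(event_type, period):
    # Builds a message with only the fields the handler reads.
    return json.dumps({
        "event_type": event_type,
        "payload": {
            "watcher_object.data": {
                "audit_uuid": "840eeb3e-3486-11e6-ac61-9e71128cae77",
                "type": "CONTINUOUS",
                "period": period,
            }
        },
    })


scheduled = {}
handle_notification(make_event("audit.create", 3600), scheduled)
after_create = dict(scheduled)
handle_notification(make_event("audit.update", 600), scheduled)
after_update = dict(scheduled)
handle_notification(make_event("audit.delete", 600), scheduled)
after_delete = dict(scheduled)
```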

The notification logic isn’t yet available in Watcher. We will work on this with the watcher-notifications-ovo blueprint. So, for the first implementation of this spec, we will manage the audits by periodically querying the Watcher database in order to update the running audits and their periods.

APScheduler can also store jobs in a database. In this way, the jobs survive decision engine restarts and maintain their state. This feature is interesting, but for the first implementation of the continuous Audit we will use the memory backend.

To keep track of triggered audits, a notification has to be pushed on the message bus every time the audit is re-triggered. When a new action plan is proposed, all previously generated action plans (and their actions) with the same Audit Template become obsolete, so Watcher should change their state to CANCELLED.
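
The polling approach boils down to comparing what the database says with what is currently scheduled. A minimal sketch of that comparison follows; diff_audits and its two dictionary arguments are hypothetical names, and how the database rows are fetched is left out.

```python
def diff_audits(db_audits, scheduled):
    """Compare the database state with the scheduled jobs.

    db_audits: audit_uuid -> period for CONTINUOUS audits in the database.
    scheduled: audit_uuid -> period for jobs currently in the scheduler.
    Returns (to_add, to_update, to_remove).
    """
    to_add = {uuid: period for uuid, period in db_audits.items()
              if uuid not in scheduled}
    to_update = {uuid: period for uuid, period in db_audits.items()
                 if uuid in scheduled and scheduled[uuid] != period}
    to_remove = sorted(uuid for uuid in scheduled if uuid not in db_audits)
    return to_add, to_update, to_remove


# One polling pass: "a" is new, "b" changed its period, "c" was deleted.
db_audits = {"a": 600, "b": 3600}
scheduled = {"b": 1800, "c": 600}
to_add, to_update, to_remove = diff_audits(db_audits, scheduled)
```

Each polling pass would then add a job for every entry in to_add, reschedule those in to_update, and remove those in to_remove.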
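
The cancellation rule can be sketched as follows. The dictionary keys are hypothetical, and the sketch assumes that only plans still in the RECOMMENDED state are cancelled, since this spec does not say which states qualify. Cancelling the associated actions is omitted.

```python
def cancel_obsolete_action_plans(action_plans, new_plan):
    """Mark older plans for the same audit template as CANCELLED.

    Returns the UUIDs of the plans whose state was changed.
    """
    cancelled = []
    for plan in action_plans:
        same_template = (plan["audit_template_uuid"]
                         == new_plan["audit_template_uuid"])
        is_other_plan = plan["uuid"] != new_plan["uuid"]
        # Assumption: only plans that were proposed but never started.
        if same_template and is_other_plan and plan["state"] == "RECOMMENDED":
            plan["state"] = "CANCELLED"
            cancelled.append(plan["uuid"])
    return cancelled


new_plan = {"uuid": "p4", "audit_template_uuid": "t1", "state": "RECOMMENDED"}
plans = [
    {"uuid": "p1", "audit_template_uuid": "t1", "state": "RECOMMENDED"},
    {"uuid": "p2", "audit_template_uuid": "t1", "state": "SUCCEEDED"},
    {"uuid": "p3", "audit_template_uuid": "t2", "state": "RECOMMENDED"},
    new_plan,
]
cancelled = cancel_obsolete_action_plans(plans, new_plan)
```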

Alternatives

  • To use Congress to automatically trigger audits when some conditions are met.

  • To use a cronjob which triggers new audit regularly via python-watcherclient.

Data model impact

A new integer field, ‘period’, must be added to the Audit model. Its default value is 3600 (seconds).

REST API impact

The ‘period’ field has to be added as an Audit attribute.

Security impact

None expected.

Notifications impact

None expected.

Other end user impact

Support for ‘period’ field must be added to the python-watcherclient and to the watcher-dashboard.

Performance Impact

No specific performance impact is expected.

Other deployer impact

No specific deployer impact is envisaged.

Developer impact

This will not impact other developers working on OpenStack.

Implementation

Assignee(s)

Primary assignee:

Alexander Chadin <alexchadin>

Other contributors:

Vladimir Ostroverkhov <Ostroverkhov>

Jean-Emile DARTOIS <jed56>

Work Items

Part 1

  • Implement a ContinuousAuditManager that uses APScheduler.

  • Implement ContinuousAuditJob class.

  • Implement the logic to add new audits or remove old ones on the fly with BackgroundScheduler by periodically querying the Watcher database (Audit.list()).

  • Adapt API to support period field.

  • Update python-watcherclient to support the period argument.

  • Add changes to watcher-dashboard to support CONTINUOUS type.

  • Implement appropriate unit tests to test various scenarios.

Part 2

  • This part must wait until watcher-notifications-ovo is implemented.

  • Load the registered audits in the watcher database during decision engine start.

  • Implement the logic to add new audits or remove old ones on the fly with BackgroundScheduler by subscribing to the events.

Dependencies

This spec depends on the watcher-notifications-ovo blueprint.

Testing

Appropriate unit tests will be adapted to the new changes.

Documentation Impact

It will be necessary to add new content relating to this change.

References

No references.

History

No history.