Add notifications about resources CRUD and node states

https://bugs.launchpad.net/ironic/+bug/1606520

This spec proposes adding new notifications to ironic: notifications for CRUD (create, update, or delete) operations on resources, and notifications for node state changes (provision state, maintenance and console state).

Problem description

Resource indexing services like Searchlight [1] require notifications about the creation, update or deletion of a resource. CRUD notifications are currently not implemented in ironic, and an efficient Searchlight plugin cannot be built without them. The Searchlight plugin could also use ironic node notifications for provision state, maintenance and console state to keep Searchlight’s index of ironic resources up to date.

Apart from Searchlight, there is also the use case of a monitoring service that caches all notification payloads along with their event types (start, end, error, etc.), which an operator can query to see whether ironic is behaving properly. For example, if there are many more start notifications for node create than end notifications, it may mean that the database is not behaving properly, or that messaging is having a hard time delivering messages between the API and the conductor. This is a separate case from Searchlight: Searchlight, for example, does not need to know the payload of the node create start notification, as there is no actual node yet, but for monitoring purposes it may be useful.
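The monitoring check described above can be sketched as a comparison of start and end counts per operation. This is only an illustration: the function name and the idea of feeding it a flat list of event types are assumptions, not part of ironic or of any existing monitoring service.

```python
from collections import Counter


def unfinished_operations(event_types):
    """Return operations that have more 'start' than 'end' notifications.

    Hypothetical helper: event types are expected in the form
    "baremetal.<resource>.<action>.<status>".
    """
    starts = Counter()
    ends = Counter()
    for event_type in event_types:
        # Split "baremetal.node.create.start" into operation and status.
        operation, _, status = event_type.rpartition(".")
        if status == "start":
            starts[operation] += 1
        elif status == "end":
            ends[operation] += 1
    # Keep only operations where some starts have no matching end.
    return {
        op: starts[op] - ends[op]
        for op in starts
        if starts[op] > ends[op]
    }


events = [
    "baremetal.node.create.start",
    "baremetal.node.create.end",
    "baremetal.node.create.start",
    "baremetal.node.create.start",
]
print(unfinished_operations(events))
```

A persistently large gap for one operation would be the signal an operator looks for; what counts as "large" is deployment-specific and is not defined by this spec.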

Proposed change

As a general note for all CRUD notifications, *.start and *.error event payloads will be ignored by Searchlight, as in both cases it would mean that the resource representation has not changed or, in the case of *create* notifications, that the resource was not created.
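To illustrate why only the *.end events matter to an indexing consumer, here is a minimal sketch of a consumer that keeps a dictionary keyed by resource UUID. It is not Searchlight's plugin code; the function name, the dictionary-based index and the assumption that every payload carries a uuid field are all hypothetical.

```python
def apply_crud_notification(index, event_type, payload):
    """Update a simple in-memory index from one CRUD notification.

    Hypothetical consumer: 'index' maps resource UUID to its last
    known payload.
    """
    _service, _resource, action, status = event_type.split(".")
    if status != "end":
        # start/error: the stored representation has not changed.
        return index
    if action == "delete":
        index.pop(payload["uuid"], None)
    else:
        # create and update both store the latest representation.
        index[payload["uuid"]] = payload
    return index


index = {}
apply_crud_notification(index, "baremetal.node.create.start", {"uuid": "n1"})
apply_crud_notification(index, "baremetal.node.create.end", {"uuid": "n1", "name": "a"})
apply_crud_notification(index, "baremetal.node.update.error", {"uuid": "n1", "name": "b"})
print(index)
```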

Node CRUD notifications

The following event types will be added:

  • “baremetal.node.create.start”;
  • “baremetal.node.create.end”;
  • “baremetal.node.create.error”;
  • “baremetal.node.update.start”;
  • “baremetal.node.update.end”;
  • “baremetal.node.update.error”;
  • “baremetal.node.delete.start”;
  • “baremetal.node.delete.end”;
  • “baremetal.node.delete.error”.

Priority level - INFO or ERROR (for “error” status). Payload contains all fields from base NodePayload with additional fields: chassis_uuid, instance_info, driver_info. Secrets in the node fields will be masked. raid_config and target_raid_config fields are excluded because they can contain low-level disk and vendor information. If/when there is a use case for them, they can be added in the future. All these notifications will be implemented at the API level.
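As a rough sketch of how such a payload could be assembled, the snippet below drops the excluded RAID fields and masks secrets in driver_info and instance_info. The helper names, the mask string and the rule that any key containing "password" is a secret are assumptions made for this example; the spec does not define how secrets are detected.

```python
SECRET_MARKER = "******"  # assumed mask value
EXCLUDED_FIELDS = ("raid_config", "target_raid_config")


def mask_secrets(value):
    """Recursively mask dictionary values whose key looks like a secret.

    Assumption: a key containing "password" marks a secret.
    """
    if isinstance(value, dict):
        return {
            key: SECRET_MARKER if "password" in key.lower() else mask_secrets(item)
            for key, item in value.items()
        }
    return value


def build_node_crud_payload(node):
    """Build a CRUD payload dict from a node dict without mutating it."""
    payload = {k: v for k, v in node.items() if k not in EXCLUDED_FIELDS}
    for field in ("driver_info", "instance_info"):
        payload[field] = mask_secrets(payload.get(field, {}))
    return payload


node = {
    "uuid": "n1",
    "chassis_uuid": "c1",
    "driver_info": {"ipmi_address": "10.0.0.5", "ipmi_password": "secret"},
    "instance_info": {},
    "raid_config": {"logical_disks": []},
}
print(build_node_crud_payload(node))
```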

Port CRUD notifications

The following event types will be added:

  • “baremetal.port.create.start”;
  • “baremetal.port.create.end”;
  • “baremetal.port.create.error”;
  • “baremetal.port.update.start”;
  • “baremetal.port.update.end”;
  • “baremetal.port.update.error”;
  • “baremetal.port.delete.start”;
  • “baremetal.port.delete.end”;
  • “baremetal.port.delete.error”.

Priority level - INFO or ERROR (for “error” status). Payload contains these fields: uuid, node_uuid, address, extra, local_link_connection, pxe_enabled, created_at, updated_at. These notifications will be implemented at the API level. In addition, “baremetal.port.create.*” will be emitted by the ironic-conductor service when a driver creates a port (examples are [2] and [3]).

Chassis CRUD notifications

The following event types will be added:

  • “baremetal.chassis.create.start”;
  • “baremetal.chassis.create.end”;
  • “baremetal.chassis.create.error”;
  • “baremetal.chassis.update.start”;
  • “baremetal.chassis.update.end”;
  • “baremetal.chassis.update.error”;
  • “baremetal.chassis.delete.start”;
  • “baremetal.chassis.delete.end”;
  • “baremetal.chassis.delete.error”.

Priority level - INFO or ERROR (for “error” status). Payload contains these fields: uuid, extra, description, created_at, updated_at. All these notifications will be implemented at the API level.
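The node, port and chassis CRUD event types listed in the sections above share one naming pattern, baremetal.<resource>.<action>.<status>, and one priority rule. A small sketch that enumerates them follows; the constant and function names are illustrative only and are not ironic identifiers.

```python
from itertools import product

RESOURCES = ("node", "port", "chassis")
ACTIONS = ("create", "update", "delete")
STATUSES = ("start", "end", "error")


def crud_event_types():
    """Return every CRUD event type proposed for the three resources."""
    return [
        f"baremetal.{resource}.{action}.{status}"
        for resource, action, status in product(RESOURCES, ACTIONS, STATUSES)
    ]


def priority_for(event_type):
    """ERROR for '.error' events, INFO for everything else."""
    return "ERROR" if event_type.endswith(".error") else "INFO"


for event_type in crud_event_types():
    print(priority_for(event_type), event_type)
```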

Node provision state notifications

These notifications will be implemented via TaskManager methods and emitted by the ironic-conductor service.

Types of events for node provision state:

  • “baremetal.node.provision_set.start”;
  • “baremetal.node.provision_set.end”;
  • “baremetal.node.provision_set.error”;
  • “baremetal.node.provision_set.success”.

Types of state changes in ironic and the corresponding events:

  • Start transition, spawning a working thread: “start” notification with INFO level.
  • End transition, cleaning target_provision_state: “end” notification with INFO level.
  • Error events processing: “error” notification with ERROR level.
  • A provision_state change that does not start a worker and is neither “end” nor “error”: “success” notification with INFO level. Examples are DEPLOYING <-> DEPLOYWAIT, AVAILABLE -> MANAGEABLE.

Payload contains all fields from base NodePayload with additional fields: instance_info, previous_provision_state, previous_target_provision_state, event (FSM event that triggered the state change). To use the provision state notifications efficiently, all related node changes (such as setting last_error or maintenance) should be done before event processing.
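The mapping from the four kinds of provision state change to an event type and level can be sketched as below. The function and its boolean parameters are hypothetical and do not correspond to TaskManager's actual interface; the order in which the conditions are checked is also an assumption of this example.

```python
EVENT_PREFIX = "baremetal.node.provision_set"


def provision_event(is_error, starts_worker, clears_target):
    """Pick the event type and level for one provision state change.

    Hypothetical parameters:
      is_error      - the change results from error event processing
      starts_worker - the transition spawns a working thread
      clears_target - the transition clears target_provision_state
    """
    if is_error:
        status, level = "error", "ERROR"
    elif starts_worker:
        status, level = "start", "INFO"
    elif clears_target:
        status, level = "end", "INFO"
    else:
        # State changed without a worker and without finishing,
        # e.g. DEPLOYING -> DEPLOYWAIT.
        status, level = "success", "INFO"
    return f"{EVENT_PREFIX}.{status}", level


print(provision_event(is_error=False, starts_worker=False, clears_target=False))
```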

Node maintenance notifications

The following event types will be added:

  • “baremetal.node.maintenance_set.start”;
  • “baremetal.node.maintenance_set.end”;
  • “baremetal.node.maintenance_set.error”.

Priority level - INFO or ERROR (for “error” status). Payload contains all fields from base NodePayload. All these notifications will be implemented at the API level and reflect maintenance changes to a node due to a user request. There won’t be any explicit node maintenance notifications for maintenance changes done internally by ironic. Since these internal changes occur as a result of trying to change the node’s state (e.g. provision, power), one of the other notifications that is emitted will “cover” these internal maintenance changes.

Node console notifications

The following event types will be added:

  • “baremetal.node.console_set.start”;
  • “baremetal.node.console_set.end”;
  • “baremetal.node.console_set.error”;
  • “baremetal.node.console_restore.start”;
  • “baremetal.node.console_restore.end”;
  • “baremetal.node.console_restore.error”.

The console_set action is used when starting or stopping the console is initiated via an API request. The console_restore action is used when the console_enabled flag is already set in the DB for a node and a console restart via the driver is required (due to a dead or restarted ironic-conductor process). Priority level - INFO or ERROR (for “error” status). Payload contains all fields from base NodePayload. All these notifications will be implemented in the ironic-conductor because setting a node’s console is an asynchronous request, so ironic-conductor can easily emit notifications for the start and end of the change.

Alternatives

Periodically polling ironic resources via API.

Data model impact

None

State Machine Impact

None

REST API impact

None

Client (CLI) impact

None

RPC API impact

None

Driver API impact

None

Nova driver impact

None

Ramdisk impact

None

Security impact

None

Other end user impact

None

Scalability impact

If notifications are enabled, they can create high load on the message bus during node deployments in large environments.

Performance Impact

None

Other deployer impact

Deployers should set the already existing notification_level config option appropriately.

Developer impact

  • If a developer creates resources in a driver, the corresponding notifications should be emitted.
  • For a provision state change, all related node updates should be done before event processing.

Implementation

Assignee(s)

Primary assignee:
  • yuriyz
Other contributors:
  • vdrok
  • mariojv

Work Items

  • Implement node provision state change notifications.
  • Implement CRUD notifications and node maintenance notifications.
  • Implement console notifications.
  • Add notifications to the current ironic code that creates resources in the drivers.
  • Fix ironic code that updates nodes after event processing.

Dependencies

Patch with base NodePayload [4].

Testing

Unit tests will be added.

Upgrades and Backwards Compatibility

None

Documentation Impact

The new notifications will be documented.