Aodh Notifier¶
Launchpad blueprint: https://blueprints.launchpad.net/vitrage/+spec/aodh-notifier
The Evaluator performs root cause analysis on the Vitrage Graph and may determine that an alarm should be created, deleted or otherwise updated. The Vitrage Notifier service notifies other components of such changes. Among other things, the Vitrage Notifier is responsible for handling Aodh alarms.
This blueprint describes the implementation of the Vitrage Notifier for notifying Aodh of Vitrage alarms.
+------------------+          +------------------+          +------------------+
| Aodh             <--+       |                  |          |                  |
+------------------+  | Update|     Vitrage      |  Raise   |     Vitrage      |
                      +-------|                  <----------|                  |
+------------------+  | Alarm |     Notifier     |  Alarm   |    Evaluator     |
| Other components <--+       |                  |          |                  |
+------------------+          +------------------+          +------------------+
Problem description¶
Vitrage should be capable of creating, deleting and otherwise updating alarms as requested by the Evaluator Engine. The Notifier is responsible for ensuring these updates are executed. This blueprint starts with Aodh alarms.
Main challenges:
- There is no way to define a ‘custom alarm’ in Aodh.
- Vitrage alarms are based on resources, so the resource information must be passed to Aodh.
- Several alarms of the same type can be triggered at the same time, each for a different resource. For example, if there is an alarm on a host, Vitrage will raise a deduced alarm on every instance on that host.
- It is not clear how a user can ask to be notified of updates to Vitrage alarms.
Proposed change¶
The Vitrage Notifier will be separate from the Evaluator, because the two have different scaling and performance requirements. The Vitrage Notifier will expose an API to the Vitrage Evaluator with create, delete and update alarm operations.
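As an illustration only, the create/delete/update API could be shaped roughly as follows. This is a minimal sketch, not the Vitrage implementation: `AlarmRequest`, `AlarmNotifier` and `InMemoryNotifier` are hypothetical names introduced here, and the in-memory class merely stands in for a real target such as Aodh.

```python
import abc
from dataclasses import asdict, dataclass


@dataclass
class AlarmRequest:
    """Hypothetical request the Evaluator hands to the Notifier."""
    name: str
    resource_id: str
    description: str = ""
    severity: str = "moderate"


class AlarmNotifier(abc.ABC):
    """Hypothetical interface: one implementation per notified component."""

    @abc.abstractmethod
    def create_alarm(self, request: AlarmRequest) -> str:
        """Create the alarm in the target component and return its id."""

    @abc.abstractmethod
    def update_alarm(self, alarm_id: str, **changes) -> None:
        """Apply the given field changes to an existing alarm."""

    @abc.abstractmethod
    def delete_alarm(self, alarm_id: str) -> None:
        """Remove the alarm from the target component."""


class InMemoryNotifier(AlarmNotifier):
    """Toy implementation that keeps alarms in a dict instead of calling Aodh."""

    def __init__(self):
        self.alarms = {}
        self._counter = 0

    def create_alarm(self, request):
        self._counter += 1
        alarm_id = f"alarm-{self._counter}"
        self.alarms[alarm_id] = asdict(request)
        return alarm_id

    def update_alarm(self, alarm_id, **changes):
        self.alarms[alarm_id].update(changes)

    def delete_alarm(self, alarm_id):
        del self.alarms[alarm_id]
```

Keeping the interface this narrow would let the Evaluator stay unaware of which component is being notified; an Aodh-specific class would implement the same three methods.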
In Aodh, Vitrage alarms will be defined as event alarms, which seems to be the most appropriate option. The resource ID will be defined in the alarm query.
Vitrage deduced alarms will look like this:
Property                  | Value
--------------------------+--------------------------------------
alarm_actions             | []
alarm_id                  | 4a3cb988-a620-4bf3-87f7-077c751c408f
description               | Instance is unreachable
enabled                   | True
event_type                | vitrage.alarm.instance_unreachable
insufficient_data_actions | []
name                      | vitrage_instance_unreachable_1
ok_actions                | []
project_id                | 5542b27142154f30b32dea6238aa81aa
query                     | [{'field': 'resource_id', 'type': '', 'value': 'b0bf3635-d9e8-4624-9793-7aac82948c0a', 'op': 'eq'}]
repeat_actions            | False
severity                  | moderate
state                     | alarm
type                      | event
user_id                   | 8ab65ef808b245e3ba234b7b3554cb94
In this example, Vitrage triggers a deduced alarm that an instance is unreachable due to a failure in the public switch (which was detected by Nagios). There will be several alarms with the same event_type, each with a different instance ID in its query.
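The following sketch shows how one alarm definition per affected instance might be built, all sharing the same event_type. It is an assumption-laden illustration: the helper names are hypothetical, the nesting of `event_type` and `query` under `event_rule` follows my understanding of the Aodh v2 alarm body rather than anything stated in this blueprint, and the numeric name suffix is only a guess based on the example name above.

```python
def build_event_alarm(name, event_type, resource_id,
                      description="", severity="moderate"):
    """Return a dict describing one event alarm (hypothetical helper)."""
    return {
        "name": name,
        "description": description,
        "type": "event",
        "severity": severity,
        "enabled": True,
        "repeat_actions": False,
        "alarm_actions": [],
        "ok_actions": [],
        "insufficient_data_actions": [],
        # Assumption: Aodh expects the rule fields nested under "event_rule".
        "event_rule": {
            "event_type": event_type,
            "query": [{"field": "resource_id", "type": "",
                       "value": resource_id, "op": "eq"}],
        },
    }


def deduced_alarms_for_host(instance_ids):
    """Build one 'instance unreachable' alarm per instance on a failed host."""
    return [
        build_event_alarm(
            # Assumption: a running index keeps the alarm names unique.
            name=f"vitrage_instance_unreachable_{index}",
            event_type="vitrage.alarm.instance_unreachable",
            resource_id=instance_id,
            description="Instance is unreachable",
        )
        for index, instance_id in enumerate(instance_ids, start=1)
    ]
```

Calling `deduced_alarms_for_host` with two instance IDs yields two definitions that differ only in their name and in the `value` of their query.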
There are two options for triggering Vitrage alarms in Aodh; neither is perfect.
Alternative 1¶
Vitrage will create an event alarm in Aodh. Then, it will send a notification to the message bus. The notification will be converted to a Ceilometer event, which will trigger the Aodh alarm.
The exact notification and event formats are still TBD.
The main problem with this solution is that the Aodh alarm will be created on the fly and triggered immediately, so it will be impossible for another project to register a webhook on the alarm before it is triggered. It will be possible to see Vitrage alarms in list-alarms, but not to be notified when they are first triggered.
Alternative 2¶
Vitrage will create an event alarm in Aodh, with ‘alarm’ state. The event itself will never be sent, so the alarm state will remain ‘alarm’.
The problem with this solution is that Aodh will not send a notification about the alarm being triggered. But since Alternative 1 also makes it impossible to register a webhook on the alarm before it is triggered, there is no real difference between the two options.
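The claim that the two alternatives behave the same for a late subscriber can be illustrated with a toy model. `FakeAodh` below is a hypothetical in-memory stand-in, not Aodh or its API, and the webhook URL is a placeholder; the sketch only models the order of operations described above.

```python
class FakeAodh:
    """Toy in-memory stand-in for Aodh; not the real service or its API."""

    def __init__(self):
        self.alarms = {}      # alarm name -> alarm dict
        self.delivered = []   # (webhook_url, alarm_name) pairs that were "called"

    def create_event_alarm(self, name, event_type, resource_id,
                           state="insufficient data"):
        self.alarms[name] = {
            "event_type": event_type,
            "resource_id": resource_id,
            "state": state,
            "alarm_actions": [],
        }

    def register_webhook(self, name, url):
        self.alarms[name]["alarm_actions"].append(url)

    def handle_event(self, event_type, resource_id):
        """Move matching alarms to 'alarm' and call webhooks registered so far."""
        for name, alarm in self.alarms.items():
            matches = (alarm["event_type"] == event_type
                       and alarm["resource_id"] == resource_id)
            if matches and alarm["state"] != "alarm":
                alarm["state"] = "alarm"
                self.delivered.extend((url, name) for url in alarm["alarm_actions"])

    def list_alarms(self, state):
        return sorted(n for n, a in self.alarms.items() if a["state"] == state)


def raise_alternative_1(aodh, name, event_type, resource_id):
    """Create the alarm, then trigger it at once through an event."""
    aodh.create_event_alarm(name, event_type, resource_id)
    # Stands for: notification on the bus -> Ceilometer event -> Aodh.
    aodh.handle_event(event_type, resource_id)


def raise_alternative_2(aodh, name, event_type, resource_id):
    """Create the alarm directly in 'alarm' state; no event is ever sent."""
    aodh.create_event_alarm(name, event_type, resource_id, state="alarm")


def run(raise_alarm):
    """Raise one alarm, then let a subscriber register a webhook afterwards."""
    aodh = FakeAodh()
    name = "vitrage_instance_unreachable_1"
    raise_alarm(aodh, name, "vitrage.alarm.instance_unreachable",
                "b0bf3635-d9e8-4624-9793-7aac82948c0a")
    aodh.register_webhook(name, "http://example.com/hook")  # too late
    return aodh.list_alarms("alarm"), aodh.delivered
```

In this model, `run(raise_alternative_1)` and `run(raise_alternative_2)` return the same result: the alarm appears in the list of alarms in 'alarm' state, and no webhook call is recorded.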
Data model impact¶
None
REST API impact¶
None
Versioning impact¶
None
Other end user impact¶
None
Deployer impact¶
For Alternative 1 - the notification->event configuration must be defined
For Alternative 2 - None
Developer impact¶
None
Horizon impact¶
None
Implementation¶
Assignee(s)¶
- Primary assignee:
idan-hefetz
Work Items¶
None
Dependencies¶
None
Testing¶
This blueprint requires unit tests and Tempest tests.
Documentation Impact¶
For Alternative 1 - the notification->event configuration must be documented
For Alternative 2 - None
References¶
Vitrage wiki page: https://wiki.openstack.org/wiki/Vitrage
Vitrage use cases: https://github.com/openstack/vitrage/blob/master/doc/source/vitrage-use-cases.rst