Support node history¶
This spec proposes adding history support for nodes, which is useful for identifying issues.
Currently ironic uses a single last_error field to record error information when an operation fails. This field is easily overwritten, so tracing the root cause means searching logs on the conductor host, which may be located anywhere in the cloud. To make bare metal management easier, it would be handy to have a history of a node, especially of its errors and state transitions.
The proposal is to introduce a new table to store those events and provide API support to retrieve them.
Introduces a new table named node_history and a db object; see Data model impact for the schema definition.
Implements an API layer to support node history queries. Node history is meant to be query-only.
Only two kinds of events will be logged in this proposal:
Everything that goes to last_error; this also covers node maintenance state changes.
The range could be extended in the future according to requirements, but that is not included in this spec.
Introduces a periodic task to remove node history entries that exceed a specified maximum number; the maximum will be settable through configuration options.
A history module provides the history interface abstraction, with two implementations: none and database.
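The backend abstraction described above could look roughly like the following. This is a minimal sketch, not the ironic implementation: the class names, the record() method, and get_backend() are hypothetical, and the database backend is replaced by an in-memory list so the example stays self-contained.

```python
import abc


class HistoryBackend(abc.ABC):
    """Hypothetical interface; the spec does not define method names."""

    @abc.abstractmethod
    def record(self, node_id, event, conductor=None, user=None):
        """Store one history event for a node."""


class NoneBackend(HistoryBackend):
    """Discards every event, which effectively disables the feature."""

    def record(self, node_id, event, conductor=None, user=None):
        return None


class DatabaseBackend(HistoryBackend):
    """Stand-in for the database backend; keeps rows in a list (assumption)."""

    def __init__(self):
        self.rows = []

    def record(self, node_id, event, conductor=None, user=None):
        row = {'node_id': node_id, 'event': event,
               'conductor': conductor, 'user': user}
        self.rows.append(row)
        return row


# Maps a backend name (none or database) to an implementation.
BACKENDS = {'none': NoneBackend, 'database': DatabaseBackend}


def get_backend(name='none'):
    """Return a backend instance; 'none' is the default, as in the spec."""
    return BACKENDS[name]()
```

Callers would only depend on record(), so switching the backend option would not require touching the code paths that log events.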
Other solutions exist, such as using a log collector and aggregator, but they need more integration work and are not directly supported by ironic.
Data model impact¶
A new database table will be added with the following schema:

    from alembic import op
    import sqlalchemy as sa

    op.create_table(
        'node_history',
        sa.Column('created_at', sa.DateTime(), nullable=True),
        sa.Column('updated_at', sa.DateTime(), nullable=True),
        sa.Column('id', sa.Integer(), nullable=False),
        sa.Column('uuid', sa.String(length=36), nullable=False),
        sa.Column('conductor', sa.String(length=255), nullable=True),
        sa.Column('event', sa.Text(), nullable=True),
        sa.Column('node_id', sa.Integer(), nullable=True),
        sa.Column('user', sa.String(length=32), nullable=True),
        sa.PrimaryKeyConstraint('id'),
        sa.UniqueConstraint('uuid', name='uniq_history0uuid'),
        sa.ForeignKeyConstraint(['node_id'], ['nodes.id'], ),
        # Index on node_id; positional items must precede keyword arguments.
        sa.Index('node_id', 'node_id'),
        mysql_ENGINE='InnoDB',
        mysql_DEFAULT_CHARSET='UTF8'
    )
event is a string that conveys what happened to the node; the content will
be truncated to 1000 characters.
conductor is the hostname of the conductor that recorded the entry.
user is the requestor of the operation, taken from the context; for the Identity
service it's a string with fixed length.
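As an illustration of the field descriptions above, the sketch below builds a dict shaped like a node_history row and applies the 1000-character truncation to event. The function name make_history_entry and the constant MAX_EVENT_LENGTH are hypothetical; only the column names and the limit come from this spec.

```python
import datetime
import uuid

MAX_EVENT_LENGTH = 1000  # truncation limit stated in this spec


def make_history_entry(node_id, event, conductor, user=None):
    """Build a dict shaped like a node_history row (illustrative only)."""
    return {
        'uuid': str(uuid.uuid4()),  # 36 characters, fits String(36)
        'created_at': datetime.datetime.now(datetime.timezone.utc),
        'node_id': node_id,
        'conductor': conductor,  # hostname of the recording conductor
        'user': user,  # requestor taken from the request context
        # event is nullable in the schema, so None passes through untouched
        'event': event[:MAX_EVENT_LENGTH] if event is not None else None,
    }
```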
State Machine Impact¶
REST API impact¶
The following microversioned endpoints will be added to support querying node history. Clients using an earlier microversion will receive 404.
Retrieve the list of events logged for this node. The fields returned by default include created_at. The event will be truncated to 255 characters to give brief information. Detailed history entries will be returned if detail is set to True in the query string.
For a normal request, 200 is returned.
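The difference between the brief and the detailed list view can be sketched as follows. This is only an illustration under assumptions: the function list_history is hypothetical, and the brief field set used here (uuid, created_at, and a truncated event) is a guess, since this text only confirms created_at and the 255-character limit.

```python
BRIEF_EVENT_LENGTH = 255  # brief-view limit stated in this spec


def list_history(entries, detail=False):
    """Return API-style views of history entries.

    With detail=True every stored field is returned unchanged. Otherwise
    only a brief view is returned; its exact field set is an assumption.
    """
    if detail:
        return [dict(e) for e in entries]
    return [
        {
            'uuid': e['uuid'],
            'created_at': e['created_at'],
            # event may be NULL in the table, so fall back to ''
            'event': (e['event'] or '')[:BRIEF_EVENT_LENGTH],
        }
        for e in entries
    ]
```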
Get detailed information of an event.
For a normal request, 200 is returned.
Client (CLI) impact¶
“openstack baremetal” CLI¶
OSC will be enhanced to support the following operations:
openstack baremetal node history list: list all events kept for this node
openstack baremetal node history show <uuid>: show a specific node event
RPC API impact¶
Driver API impact¶
Nova driver impact¶
Other end user impact¶
Node events could occupy a considerable amount of space in the database when this feature is enabled, depending on the number of bare metal nodes and the level of activity. In such cases the configuration options of this feature should be evaluated.
The new periodic task and database access will use some resources, but the overhead should be trivial.
Other deployer impact¶
Adds the following configuration options to change the behavior of this feature:
[conductor]node_history_backend: can be none or database.
none does nothing and effectively disables this feature; this is the default.
[conductor]node_history_max_entries: how many events ironic should keep. The oldest events will be removed once the maximum is reached. The default is 300, the minimum value is 1.
[conductor]node_history_cleanup_interval: the interval, in seconds, at which the periodic clean up task is scheduled. One day by default. Setting it to 0 disables periodic clean up.
[conductor]node_history_cleanup_batch_num: the maximum number of entries that will be removed during one clean up operation.
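How the clean up options interact can be sketched as below. This is a simplified illustration and not the proposed periodic task: the function select_entries_to_prune is hypothetical, it works on plain dicts instead of database rows, and it assumes the maximum is applied per node, which this spec does not state explicitly.

```python
from collections import defaultdict


def select_entries_to_prune(entries, max_entries=300, batch_num=None):
    """Pick ids of the oldest entries that exceed max_entries per node.

    entries: iterable of dicts with 'id', 'node_id' and 'created_at'.
    batch_num: optional cap on how many ids one clean up pass returns.
    """
    by_node = defaultdict(list)
    for entry in entries:
        by_node[entry['node_id']].append(entry)

    to_remove = []
    for node_entries in by_node.values():
        # Oldest first, so the slice below drops the oldest surplus entries.
        node_entries.sort(key=lambda e: e['created_at'])
        excess = len(node_entries) - max_entries
        if excess > 0:
            to_remove.extend(e['id'] for e in node_entries[:excess])

    if batch_num is not None:
        # Anything left over waits for the next clean up pass.
        to_remove = to_remove[:batch_num]
    return to_remove
```

With a batch cap in place, a large backlog is removed over several passes instead of in one long-running delete.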
Other events could be added once this spec is implemented.
- Primary assignee:
- Other contributors:
Implements the proposed work:
The history module and two backends, namely none and database
Log history at the proper code paths
The feature will be covered by unit tests.
Upgrades and Backwards Compatibility¶
This feature is disabled by default.
Documentation will be updated.