Add action event fault details¶
https://blueprints.launchpad.net/nova/+spec/action-event-fault-details
This blueprint proposes adding fault details to failed instance action events.
Problem description¶
Currently, the instance action event details that a non-admin owner of a
server sees do not contain any useful information about what caused the
failure of the action. For example, if a cold migration fails, showing the
server’s event information with
openstack server event show <server> <request-id>
returns:
{
    "events": [
        {
            "finish_time": "2019-11-13T16:18:27.000000",
            "start_time": "2019-11-13T16:18:26.000000",
            "event": "cold_migrate",
            "result": "Error"
        },
        {
            "finish_time": "2019-11-13T16:18:27.000000",
            "start_time": "2019-11-13T16:18:26.000000",
            "event": "conductor_migrate_server",
            "result": "Error"
        }
    ]
}
From this response, the user cannot tell what actually caused the failure.
If the server status is not ERROR but some operation failed, the user cannot get the fault details either because server faults are only shown for servers in ERROR or DELETED status. But instance actions can be shown for a server in any status (and even for deleted servers since microversion 2.21).
Use Cases¶
As a non-admin user, I would like to know the details about the failure when
the server is not in ERROR status. Although I can’t see the exact
traceback, at least I can try other approaches based on the details.
Proposed change¶
In a new microversion, expose the details field in the Show Server
Action Details API:
GET /servers/{server_id}/os-instance-actions/{request_id}
Add a new policy to control the visibility of a set of instance action attributes. Its default rule is ‘rule:system_reader_api’ (the legacy rule is ‘rule:admin_api’).
The event “details” are the same as the fault.message that the user would
see when the server is in ERROR status. For NovaExceptions that would be
the actual exception message, but for non-NovaExceptions it is just the
exception class name (in that case the exception_to_dict function sets
message = fault.__class__.__name__).
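The rule above can be illustrated with a minimal, self-contained sketch. It is not the nova implementation: the NovaException and NoValidHost classes below are simplified stand-ins, and event_details_from_fault is a hypothetical helper name chosen only for this example.

```python
class NovaException(Exception):
    """Simplified stand-in for nova's base exception (assumption)."""

    msg_fmt = "An unknown exception occurred."

    def __init__(self, message=None):
        # Fall back to the class-level default message when none is given.
        self.message = message or self.msg_fmt
        super().__init__(self.message)

    def format_message(self):
        return self.message


class NoValidHost(NovaException):
    """Stand-in subclass used only to demonstrate the behaviour."""

    msg_fmt = "No valid host was found."


def event_details_from_fault(fault):
    """Return the text to store as the event "details" (hypothetical helper).

    NovaExceptions carry a message meant for users, so it is used as-is.
    Any other exception is reduced to its class name, so that arbitrary
    exception text is not shown to a non-admin user.
    """
    if isinstance(fault, NovaException):
        return fault.format_message()
    return fault.__class__.__name__
```

With these stand-ins, a NoValidHost fault yields its message, while a generic KeyError yields only the string "KeyError", regardless of the arguments it was raised with.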
Alternatives¶
Add a user_message field on NovaException which, if present, gets percolated
up to a new field similar to details. However, this may be as painful
as documenting all the different error types.
Data model impact¶
None. The details column already exists in the instance_actions_events
table, and it is a TEXT column, so it should be large enough to hold
exception fault messages.
REST API impact¶
In a new microversion, expose the details parameter in the following
API response:
GET /servers/{server_id}/os-instance-actions/{request_id}
{
    "events": [
        {
            "finish_time": "2019-11-13T16:18:27.000000",
            "start_time": "2019-11-13T16:18:26.000000",
            "event": "cold_migrate",
            "result": "Error",
            "details": "No valid host was found."
        },
        {
            "finish_time": "2019-11-13T16:18:27.000000",
            "start_time": "2019-11-13T16:18:26.000000",
            "event": "conductor_migrate_server",
            "result": "Error",
            "details": "No valid host was found."
        }
    ]
}
The value of the details attribute is only populated after checking
that the policy matches ‘rule:system_reader_api’.
With the new microversion, the “details” key is always returned with each event dict, but the value may be null for old records or for events that did not fail.
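As a rough sketch of the response behaviour described here, the function below builds the view of a single event. It is illustrative only: format_event and its show_details parameter are hypothetical names, show_details stands for the outcome of the policy check, and returning null when the check fails is one reading of the text above rather than a confirmed detail of the implementation.

```python
def format_event(event, show_details):
    """Build the API view of one instance action event (hypothetical helper).

    event: dict holding the stored event record.
    show_details: True when the caller passed the policy check.
    """
    view = {
        "event": event["event"],
        "start_time": event["start_time"],
        "finish_time": event.get("finish_time"),
        "result": event.get("result"),
        # The key is always present; None is serialized as JSON null.
        "details": None,
    }
    if show_details:
        # Old records and events that did not fail have no stored details,
        # so the value stays None for them.
        view["details"] = event.get("details")
    return view
```

For a failed cold_migrate record with details "No valid host was found.", the view carries that string only when show_details is true. A successful event returns "details": null in either case.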
Security impact¶
There is a chance of a security impact with this change because we could be leaking sensitive information about the deployment to a non-admin end user, but we already do so through server faults, so this shouldn’t be worse. See bug 1851587 about faults.
Add a new policy so that the deployer can decide whether to expose this fault information to end users.
Notifications impact¶
None
Other end user impact¶
None
Performance Impact¶
None
Other deployer impact¶
None
Developer impact¶
Developers have to be careful about what information they put into NovaExceptions, since it could leak sensitive information to a non-admin end user.
Upgrade impact¶
None. The details column already exists in the database.
Implementation¶
Assignee(s)¶
- Primary assignee:
brinzhang
Feature Liaison¶
- Feature liaison:
brinzhang
Work Items¶
- Add details to the InstanceActionEvent object and populate it. The populating part requires some work.
- Modify the API to expose the details field in GET responses that expose instance action events.
- Add related tests.
- Add docs for the new microversion.
Dependencies¶
None
Testing¶
Add related unit tests for negative scenarios.
Add related functional tests (API samples).
Tempest testing should not be necessary for this change.
Documentation Impact¶
Update the API reference for the new microversion.
References¶
- [1] “Thoughts on exposing exception type to non-admins in instance action event”, openstack-discuss mailing list: http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010775.html
History¶
Release Name | Description
---|---
Ussuri | Introduced