Persist Error Message

Errors that occur in Trove should be easy to retrieve so that the end user can see exactly what is happening with their database instance.

Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/persist-error-message

Problem Description

Historically, it has been very difficult to determine the cause of a failure in Trove, because errors may be logged in multiple places, none of which are available to the end user. With the advent of notifications in Trove, however, it is now feasible to persist error messages in the database so that they can be retrieved and displayed.

Proposed Change

Each server will register a callback with the notification framework. Whenever a notification is sent, this callback will be invoked, and any error the notification reports can then be saved in the database. The user can later retrieve this information with the ‘trove show’ command.

For errors that occur outside the framework of notifications, a direct call will be made to persist the error. Not all errors will need to be persisted, so an initial set will be proposed that can be enhanced over time as the need arises.

Configuration

No configuration changes are anticipated.

Database

A new table (instance_faults) will be added to the Trove schema:

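+-------------+--------------+-------------+-------------------------------------------+
| Column      | Type         | Allow Nulls | Description                               |
+-------------+--------------+-------------+-------------------------------------------+
| id          | varchar(64)  | No          | ID of fault (autogenerated)               |
| instance_id | varchar(64)  | No          | ID of instance that the fault occurred on |
| message     | varchar(255) | No          | Error message of the fault                |
| details     | text(65535)  | No          | Extra details (e.g. stack trace)          |
| created     | DateTime     | No          | Created date                              |
| updated     | DateTime     | No          | Updated date                              |
| deleted     | tinyint(1)   | Yes         | Deleted flag                              |
| deleted_at  | DateTime     | Yes         | Deleted date                              |
+-------------+--------------+-------------+-------------------------------------------+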
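As a rough illustration of the proposed schema, the sketch below creates an equivalent table in an in-memory SQLite database and stores one fault. SQLite is used only so that the example runs standalone. The primary key on id, the use of a UUID for the autogenerated ID, and the ISO 8601 timestamp strings are assumptions; the actual Trove migration may differ.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Sketch only: column names, types and nullability follow the table above.
# The PRIMARY KEY constraint is an assumption.
SCHEMA = """
CREATE TABLE instance_faults (
    id          VARCHAR(64)  NOT NULL PRIMARY KEY,
    instance_id VARCHAR(64)  NOT NULL,
    message     VARCHAR(255) NOT NULL,
    details     TEXT         NOT NULL,
    created     DATETIME     NOT NULL,
    updated     DATETIME     NOT NULL,
    deleted     TINYINT(1),
    deleted_at  DATETIME
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

instance_id = "73cfc462-dd59-4dc1-9d32-95954171775f"
now = datetime.now(timezone.utc).isoformat()
fault_id = str(uuid.uuid4())  # "autogenerated" ID; using a UUID is an assumption

conn.execute(
    "INSERT INTO instance_faults "
    "(id, instance_id, message, details, created, updated, deleted) "
    "VALUES (?, ?, ?, ?, ?, ?, 0)",
    (fault_id, instance_id, "A guest error occurred",
     "Traceback (most recent call last): ...", now, now),
)

# Look up the non-deleted fault for the instance.
row = conn.execute(
    "SELECT message FROM instance_faults "
    "WHERE instance_id = ? AND deleted = 0",
    (instance_id,),
).fetchone()
print(row[0])
```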

Public API

The only change to the public API will be the addition of a ‘fault’ data structure that is returned when requesting instance details. This will look like:

'fault' :
{
    'created': <date>,
    'message': 'error message',
    'details': 'potential stack trace',
},

The ‘details’ value will only be returned if the request is made by an admin user.
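The admin-only rule could be implemented along the following lines. This is a sketch under stated assumptions: fault_view, the is_admin flag, and the dict-based fault record are hypothetical names chosen for illustration, not part of Trove's view code.

```python
from datetime import datetime


def fault_view(fault, is_admin):
    """Build the 'fault' structure for an instance-details response.

    `fault` is a hypothetical dict holding a persisted fault record.
    """
    view = {
        "created": fault["created"].isoformat(),
        "message": fault["message"],
    }
    if is_admin:
        # Stack traces are exposed to admin users only.
        view["details"] = fault["details"]
    return view


record = {
    "created": datetime(2016, 5, 6, 21, 30, 6),
    "message": "A guest error occurred",
    "details": "Traceback (most recent call last): ...",
}

print(fault_view(record, is_admin=False))
print(fault_view(record, is_admin=True))
```

Omitting the key entirely for non-admin users, rather than returning an empty value, is one possible reading of "only be available"; the spec does not say which form the response takes.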

Public API Security

No security issues are anticipated. Since the messages persisted are all exception messages that are already broadcast as notifications, none should contain sensitive information. If any are found to contain such information, they should be treated as bugs and fixed accordingly (none have been discovered so far).

Python API

No changes are anticipated to the Python API.

CLI (python-troveclient)

The Trove CLI ‘show’ command may now display new fault data:

+-------------------+----------------------------------------------------+
| Property          | Value                                              |
+-------------------+----------------------------------------------------+
| created           | 2016-05-06T21:28:53                                |
| datastore         | mysql                                              |
| datastore_version | 5.6                                                |
| fault_date        | 2016-05-06T21:30:06                                |
| fault_details     | Traceback (most recent call last):                 |
|                   |   File "/<snip>/manager.py", line 265, in prepare  |
|                   |     cluster_config, snapshot, modules)             |
|                   |   File "/<snip>/manager.py", line 355, in _prepare |
|                   |     raise RuntimeError("A guest error occurred")   |
|                   | RuntimeError: A guest error occurred               |
| fault_message     | A guest error occurred                             |
| flavor            | 15                                                 |
| id                | 73cfc462-dd59-4dc1-9d32-95954171775f               |
| ip                | 10.66.25.8                                         |
| name              | myinst2                                            |
| status            | ACTIVE                                             |
| updated           | 2016-05-06T21:28:58                                |
| volume            | 1                                                  |
| volume_used       | 0.1                                                |
+-------------------+----------------------------------------------------+
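The sample output suggests that the nested ‘fault’ structure is flattened into fault_date, fault_message and fault_details properties before display. A minimal sketch of such a flattening step follows. The helper name and the key-to-property mapping are inferred from the sample output and are assumptions, not the actual python-troveclient code.

```python
# Hypothetical mapping from keys in the API 'fault' structure to CLI
# property names, inferred from the sample output above.
FAULT_PROPERTIES = {
    "created": "fault_date",
    "message": "fault_message",
    "details": "fault_details",
}


def flatten_fault(instance):
    """Return a copy of `instance` with 'fault' expanded into fault_* keys."""
    flat = {key: value for key, value in instance.items() if key != "fault"}
    fault = instance.get("fault") or {}
    for key, prop in FAULT_PROPERTIES.items():
        if key in fault:  # 'details' is absent for non-admin users
            flat[prop] = fault[key]
    return flat


instance = {
    "id": "73cfc462-dd59-4dc1-9d32-95954171775f",
    "name": "myinst2",
    "fault": {
        "created": "2016-05-06T21:30:06",
        "message": "A guest error occurred",
    },
}

flat = flatten_fault(instance)
for prop in sorted(flat):
    print(f"{prop:<18} {flat[prop]}")
```

An instance without a fault passes through unchanged, so the extra rows would appear only when an error has been persisted.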

Internal API

No changes need to be made to this API.

Guest Agent

No changes need to be made to the guest agent.

Alternatives

We could continue to require access to the logs and/or Nova instances to determine what happened when an error occurs.

Dashboard Impact (UX)

The relevant fault fields need to be exposed in the instance details that correspond to the ‘show’ command.

Implementation

Assignee(s)

Primary assignee:
[peterstac]

Milestones

Newton

Work Items

The work will be undertaken within a single task.

Upgrade Implications

No upgrade issues are expected.

Dependencies

None.

Testing

Scenario tests will be enhanced to verify that errors are persisted in the database and can be retrieved.

Documentation Impact

This is a net-new feature, and as such will require documentation.

References

None

Appendix

None