Trove Logging

Provide end-user access to various types of logs on guest instances.

Launchpad Blueprint: https://blueprints.launchpad.net/trove/+spec/datastore-log-operations

Problem Description

In the current implementation, a user cannot retrieve any logs from the guest agent without SSH access to the instance. The user should be able to retrieve the logs defined by the datastore.

Proposed Change

This document outlines a proposal to provide access to guest logs by storing them in a Swift container.

Each datastore will define a number of log files that it can make available to the user. A log is either a system log, such as the Trove guest-agent log, which is always enabled, or a user log, such as the MySQL Slow Query Log, which the user can enable or disable.

The contents of the appropriate log files will be copied from the guest instance to a Swift container via execution of a log-publish command, and may then be streamed to the user. Subsequent invocations of the log-publish command will only copy log entries made since the last log-publish command to efficiently utilize network resources and minimize impact on database performance.
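The incremental behaviour described above can be sketched in a few lines of Python. This is only an illustration, assuming the guest keeps track of how many bytes of the log file it has already published; the name `read_pending` and the temporary file standing in for a guest log are hypothetical and not part of the proposal.

```python
import os
import tempfile


def read_pending(path, published_bytes):
    """Return the bytes appended to `path` after offset `published_bytes`."""
    with open(path, "rb") as log_file:
        log_file.seek(published_bytes)
        return log_file.read()


# A temporary file stands in for a log file on the guest instance.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first entry\n")
    log_path = tmp.name

# The first publish copies everything written so far.
published = len(read_pending(log_path, 0))

# More log entries arrive after the first publish.
with open(log_path, "ab") as log_file:
    log_file.write(b"second entry\n")

# The second publish copies only the entries made since the first one.
delta = read_pending(log_path, published)
os.remove(log_path)
```

Only `delta` would need to be sent to Swift on the second invocation, which is what keeps network usage and database impact low.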

Swift Log Container

Each log will be composed of multiple objects stored in a Swift container under a predefined prefix, with each object holding a subset of the logging information. The entire log can be reconstructed by concatenating the objects in the container that pertain to the specified log; their order can be determined by examining the metadata of each object, or simply by sorting the objects alphabetically by name (assuming a suitable naming convention). The prefix will be generated from a known pattern, so that all the corresponding objects can be retrieved from Swift using this prefix. The suggested prefix is: ‘%(instance_id)s/%(datastore)s-%(log)s/’
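As a rough sketch of the prefix pattern and the reconstruction by name ordering, the following Python uses a plain dictionary in place of a Swift container. The helper names `log_prefix` and `reconstruct_log` and the second object name are assumptions made for illustration only.

```python
# Pattern taken from the text above; the helper names below are hypothetical.
PREFIX_PATTERN = "%(instance_id)s/%(datastore)s-%(log)s/"


def log_prefix(instance_id, datastore, log):
    """Build the Swift object prefix for one log of one instance."""
    return PREFIX_PATTERN % {
        "instance_id": instance_id,
        "datastore": datastore,
        "log": log,
    }


def reconstruct_log(objects, prefix):
    """Concatenate the objects under `prefix`, ordered by object name.

    `objects` maps object names to their contents (bytes) and stands in
    for a Swift container in this sketch.
    """
    names = sorted(name for name in objects if name.startswith(prefix))
    return b"".join(objects[name] for name in names)


prefix = log_prefix("fa8c452e-6568-438a-9f04-b2878290fb90", "mysql", "guest")
container = {
    prefix + "log-2015-11-18T21:30:00.000000": b"second chunk\n",
    prefix + "log-2015-11-18T21:28:09.975754": b"first chunk\n",
    "fa8c452e-6568-438a-9f04-b2878290fb90/mysql-guest_metafile": b"{}",
}
full_log = reconstruct_log(container, prefix)
```

Sorting by name works here because the timestamp suffix sorts chronologically; a different naming convention would need the metadata-based ordering instead.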


Each log will store a metadata file in the container, which will have the following information associated with it:

Key        Value
Log Name   Name of log
Log Type   SYS or USER
Log File   File where data is published from
Log Hash   Hash of log file
Log Size   Log file size at last publish
Log Lines  Log file line count at last publish

A list of all the instances with logs in the container could be viewed with the following command:

$ swift list database_logs --delimiter='/'

Response:

fa8c452e-6568-438a-9f04-b2878290fb90/

A list of all the logs for an instance could be viewed with the following command:

$ swift list database_logs --delimiter='/' --prefix fa8c452e-6568-438a-9f04-b2878290fb90/

Response:

fa8c452e-6568-438a-9f04-b2878290fb90/mysql-guest/
fa8c452e-6568-438a-9f04-b2878290fb90/mysql-guest_metafile

A list of all the actual log files associated with a specific datastore log could be viewed with the following command:

$ swift list database_logs --delimiter='/' --prefix fa8c452e-6568-438a-9f04-b2878290fb90/mysql-guest/

Response:

fa8c452e-6568-438a-9f04-b2878290fb90/mysql-guest/log-2015-11-18T21:28:09.975754
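The three listings above depend on how Swift combines a prefix with a delimiter. The following is a simplified, in-memory emulation of that roll-up in Python, meant only to show why each command returns what it does; `list_with_delimiter` is a hypothetical helper, not a Swift or Trove function.

```python
def list_with_delimiter(names, prefix="", delimiter="/"):
    """Emulate a container listing restricted by prefix and delimiter.

    Names that contain the delimiter after the prefix are rolled up into
    a single pseudo-directory entry ending at the first delimiter.
    """
    results = []
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        entry = prefix + head + sep if sep else name
        if entry not in results:
            results.append(entry)
    return results


instance = "fa8c452e-6568-438a-9f04-b2878290fb90"
object_names = [
    instance + "/mysql-guest/log-2015-11-18T21:28:09.975754",
    instance + "/mysql-guest_metafile",
]

instances = list_with_delimiter(object_names)
logs = list_with_delimiter(object_names, prefix=instance + "/")
parts = list_with_delimiter(object_names, prefix=instance + "/mysql-guest/")
```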

Configuration

Datastore CONF settings:

  • guest_log_exposed_logs - used to configure which logs will be exposed by the logging component
  • guest_log_container_name - pattern used to generate the name of the container which will hold the logs
  • guest_log_long_query_time - the threshold time used to identify a long-running query
  • guest_log_limit - max size (in bytes) for any chunk pushed to Swift
  • guest_log_expiry - time in seconds after which the log component is removed from the Swift container (0 for no expiry)

guest_log_exposed_logs

A guest_log_exposed_logs config value will enumerate the logs available for a datastore. By default, the guestagent log would be supported by all datastores plus any other logs defined and supported by the datastore’s guestagent manager.

Excerpt from trove-guestagent.conf:

guest_log_exposed_logs = error,guest,slow_query,general

Any USER log not mentioned in the guest_log_exposed_logs list will be disabled by default. A special value of “ALL” will enable all logs supported by the guest agent manager.

An admin user will always be able to see all the logs, and to publish and view the contents, regardless of whether they are ‘exposed’ or not.
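One way the guest_log_exposed_logs value might be interpreted is sketched below. The function name `exposed_logs` and the list of supported logs are assumptions for illustration; the sketch covers only the comma-separated list and the special “ALL” value, not the admin override.

```python
def exposed_logs(conf_value, supported):
    """Return the supported logs that the configuration exposes.

    `conf_value` is the raw comma-separated setting; `supported` is the
    list of logs the datastore's guest agent manager knows about.
    """
    requested = [item.strip() for item in conf_value.split(",") if item.strip()]
    if "ALL" in requested:
        return list(supported)
    return [name for name in supported if name in requested]


# Hypothetical set of logs supported by a MySQL guest agent manager.
supported_logs = ["error", "general", "guest", "slow_query"]

some_logs = exposed_logs("error, guest", supported_logs)
all_logs = exposed_logs("ALL", supported_logs)
```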

guest_log_container_name

The name of the container used to store log components can be specified in the configuration file.

Excerpt from trove-guestagent.conf:

guest_log_container_name = database_logs

The value shown above is the default value.

guest_log_long_query_time

The amount of time to set for ‘slow_query’ logging. This value is datastore specific, and may mean different things for different datastores. For example, MySQL has a slow query log that these queries are written into, whereas PostgreSQL would use the field to decide what queries to write into its general log.

Log Rotation

Many systems will use log rotation to ensure that logs do not exceed the amount of available disk space on a system. At any point in time, the current log file could be renamed to “<logfile>.1” (or some other name) and a new log file started for ongoing log messages.

To account for this, the logging feature will keep track of a hash of the first line of the current log file that exists during a log-publish operation. The current hash value will be stored in the x-container-meta-log-header-digest value associated with the log file container. Subsequent log-publish operations will use the hash value to determine whether the log has indeed been rotated. If so, the current container will be purged and the new log file published to it.
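The rotation check can be illustrated with a short Python sketch. The proposal does not name a hash algorithm, so SHA-256 is assumed here, and the function names `header_digest` and `has_rotated` are hypothetical.

```python
import hashlib


def header_digest(first_line):
    """Hash the first line of a log file (SHA-256 is an assumption)."""
    return hashlib.sha256(first_line).hexdigest()


def has_rotated(current_first_line, stored_digest):
    """Compare the current first line against the digest stored at last publish."""
    if stored_digest is None:
        # Nothing has been published yet, so there is nothing to compare.
        return False
    return header_digest(current_first_line) != stored_digest


# Digest recorded during an earlier log-publish operation.
stored = header_digest(b"2015-11-18 21:28:09 guest agent started\n")

same_file = has_rotated(b"2015-11-18 21:28:09 guest agent started\n", stored)
new_file = has_rotated(b"2015-11-19 00:00:01 guest agent started\n", stored)
```

When `has_rotated` returns true, the published objects would be purged before the new log file is published, as described above.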

Database

n/a

Public API

For log-list:

Request:

GET v1/instance/{id}/log

Response:

{
    'logs' : [
        {
            'name': 'guest',
            'type': 'SYS',
            'status': 'Ready',
            'published': '0',
            'pending': '4234',
            'container': 'None',
            'prefix': 'None',
            'metafile': '<id>/mysql-guest_metafile',
        },
        {
            'name': 'general',
            'type': 'USER',
            'status': 'Disabled',
            'published': '0',
            'pending': '0',
            'container': 'None',
            'prefix': 'None',
            'metafile': '<id>/mysql-general_metafile',
        },
        {
            'name': 'slow_query',
            'type': 'USER',
            'status': 'Partial',
            'published': '1009',
            'pending': '304',
            'container': 'database_logs',
            'prefix': '<id>/mysql-slow_query/',
            'metafile': '<id>/mysql-slow_query_metafile',
        },
    ]
}

For log-show:

Request:

POST v1/instance/{id}/log
{ 'name': 'guest' }

Response:

{
    'log': {
        'name': 'guest',
        'type': 'SYS',
        'status': 'Partial',
        'published': 218913,
        'pending': 2636234,
        'container': 'database_logs',
        'prefix': '<id>/mysql-guest/',
        'metafile': '<id>/mysql-guest_metafile',
    }
}

For log-enable:

Request:

POST v1/instance/{id}/log
{ 'name': 'general', 'enable': 'True' }

Response:

{
    'log': {
        'name': 'general',
        'type': 'USER',
        'status': 'Enabled',
        'published': '0',
        'pending': '0',
        'container': 'None',
        'prefix': 'None',
        'metafile': '<id>/mysql-general_metafile',
    }
}

For log-disable:

Request:

POST v1/instance/{id}/log
{ 'name': 'general', 'disable': 'True' }

Response:

{
    'log': {
        'name': 'general',
        'type': 'USER',
        'status': 'Disabled',
        'published': '30103',
        'pending': '0',
        'container': 'log-mysql-general-<id>',
        'prefix': '<id>/mysql-general/',
        'metafile': '<id>/mysql-general_metafile',
    }
}

For log-publish: (Note that ‘publish’ will automatically ‘enable’ a log)

Request:

POST v1/instance/{id}/log
{ 'name': 'guest', 'publish': 'True' }

Response:

{
    'log': {
        'name': 'guest',
        'type': 'SYS',
        'status': 'Published',
        'published': '443',
        'pending': '0',
        'container': 'log-mysql-guest-<id>',
        'prefix': '<id>/mysql-guest/',
        'metafile': '<id>/mysql-guest_metafile',
    }
}

For log-discard:

Request:

POST v1/instance/{id}/log
{ 'name': 'general', 'discard': 'True' }

Response:

{
    'log': {
        'name': 'general',
        'type': 'USER',
        'status': 'Ready',
        'published': '0',
        'pending': '30103',
        'container': 'None',
        'prefix': 'None',
        'metafile': '<id>/mysql-general_metafile',
    }
}
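
All of the actions above other than log-list post a small body to the same URL, differing only in the action flag. A minimal sketch of building such a body in Python follows; the helper name `build_log_request` is hypothetical, and the 'True' string values simply mirror the request examples in this section.

```python
import json

# Action flags shown in the request examples of this section.
LOG_ACTIONS = ("enable", "disable", "publish", "discard")


def build_log_request(name, action=None):
    """Return the JSON body for a log request.

    With no action the body corresponds to log-show; otherwise the named
    action flag is added alongside the log name.
    """
    body = {"name": name}
    if action is not None:
        if action not in LOG_ACTIONS:
            raise ValueError("unknown log action: %s" % action)
        body[action] = "True"
    return json.dumps(body, sort_keys=True)


show_body = build_log_request("general")
publish_body = build_log_request("general", "publish")
```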

Python API

def log_list(self, instance):
    """Get a list of all guest logs.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :rtype: list of :class:`DataStoreLog`.
    """

def log_show(self, instance, log):
    """Show details of a log.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to show
    :rtype: :class:`DataStoreLog`.
    """

def log_enable(self, instance, log):
    """Enable the writing of a log.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to enable
    :rtype: :class:`DataStoreLog`.
    """

def log_disable(self, instance, log):
    """Disable the writing of a log.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to disable
    :rtype: :class:`DataStoreLog`.
    """

def log_publish(self, instance, log, disable=None, discard=None):
    """Publish guest log to Swift container.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to publish
    :param disable: Turn off <log>
    :param discard: Delete the associated container
    :rtype: :class:`DataStoreLog`.
    """

def log_discard(self, instance, log):
    """Discard (delete) the published log container in Swift.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to discard
    :rtype: :class:`DataStoreLog`.
    """

def log_generator(self, instance, log, publish=None, lines=50):
    """Return generator to yield the last <lines> lines of guest log.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to retrieve
    :param publish: Publish updates before displaying log
    :param lines: Display last <lines> lines of log (0 for all lines)
    :rtype: generator function to yield log as chunks.
    """

def log_save(self, instance, log, publish=None, filename=None):
    """Saves a guest log to a file.

    :param instance: The :class:`Instance` (or its ID) of the database
    instance to get the log from.
    :param log: The type of <log> to save
    :param publish: Publish updates before saving log
    :param filename: File to save the log to (if not given, a name based on the log name is used)
    :rtype: Filename to which log was saved
    """

CLI (python-troveclient)

Log List

The log-list command provides information about each log available on a given Trove instance.

$ trove log-list <instance>
+------------+------+-------------+-----------+---------+---------------+---------------------+
| Name       | Type | Status      | Published | Pending | Container     | Prefix              |
+------------+------+-------------+-----------+---------+---------------+---------------------+
| error      | SYS  | Unavailable |         0 |       0 | None          |                     |
| general    | USER | Published   |      1009 |       0 | database_logs | <id>/mysql-general/ |
| guest      | SYS  | Ready       |         0 |  499850 | None          |                     |
| slow_query | USER | Disabled    |         0 |       0 | None          |                     |
+------------+------+-------------+-----------+---------+---------------+---------------------+

Column     Description
Name       Name of the log component
Type       SYS: System log, always on
           USER: Managed by user
Status     Disabled: Initial state of a USER log
           Enabled: Initial state of a SYS log or a USER log with no data in it
           Unavailable: SYS log that has no data in it
           Ready: Log has data available for publishing
           Published: Log file has been fully published
           Partial: Log file has been partially published
           Rotated: Log file has rotated, so next publish will delete the container first
           Restart Required: Datastore requires a restart in order to begin writing to the log file
           Restart Completed: Internal state so the guest log knows to begin reporting the actual state again
Published  Amount of data published to container
Pending    Amount of data available to be published by log-publish
Container  Swift container that holds the components of the log
Prefix     Prefix to send to Swift to get just the relevant log parts

Note: Where the values for ‘Container’ and ‘Prefix’ in the example above are ‘None’ or empty, no log-publish operation has yet been executed against that log.

Log Show

The log-show command provides full information about a specific log available on a given Trove instance.

$ trove log-show <instance> general
 +--------------+-----------------------------+
 | Property     | Value                       |
 +--------------+-----------------------------+
 | name         | general                     |
 | type         | USER                        |
 | status       | Enabled                     |
 | published    | 135                         |
 | pending      | 2156                        |
 | container    | database_logs               |
 | prefix       | <id>/mysql-general/         |
 | metafile     | <id>/mysql-general_metafile |
 +--------------+-----------------------------+

Log Enable

The log-enable command will instruct the guest agent to begin writing information to the specified log file. Only ‘USER’ logs can be enabled, as ‘SYS’ logs are enabled by default (and cannot be disabled). Depending on the datastore, this may cause the log to go into a ‘Restart Required’ state, where it will remain until the datastore is restarted. This can be configured on a per-datastore basis, and should only be done if there is no way to start the logging dynamically (e.g. PostgreSQL must be restarted in order to change logging, so it would require this configuration).

$ trove log-enable <instance> general
 +--------------+-----------------------------+
 | Property     | Value                       |
 +--------------+-----------------------------+
 | name         | general                     |
 | type         | USER                        |
 | status       | Enabled                     |
 | published    | 0                           |
 | pending      | 0                           |
 | container    | None                        |
 | prefix       | None                        |
 | metafile     | <id>/mysql-general_metafile |
 +--------------+-----------------------------+

Log Disable

The log-disable command will instruct the guest agent to stop writing information to the specified log file. Only ‘USER’ logs can be disabled. As with log-enable, this may cause the log to go into a ‘Restart Required’ state. See log-enable for more details.

$ trove log-disable <instance> general
 +--------------+-----------------------------+
 | Property     | Value                       |
 +--------------+-----------------------------+
 | name         | general                     |
 | type         | USER                        |
 | status       | Disabled                    |
 | published    | 34658                       |
 | pending      | 2532                        |
 | container    | database_logs               |
 | prefix       | <id>/mysql-general/         |
 | metafile     | <id>/mysql-general_metafile |
 +--------------+-----------------------------+

Log Publish

The log-publish command will instruct the guest agent to push any updates to the specified log to the Swift container, which will be created if required. One log-publish command could result in multiple objects being pushed to the Swift container in order to keep each object below the maximum object size as configured by the guest_log_limit CONF value.
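Splitting one publish into several objects can be sketched as below. The function name `split_chunks` is hypothetical, and the sketch splits on raw byte boundaries; the proposal does not say whether chunks should also be aligned to line boundaries.

```python
def split_chunks(data, limit):
    """Split `data` into consecutive pieces of at most `limit` bytes each."""
    if limit <= 0:
        raise ValueError("limit must be a positive number of bytes")
    return [data[start:start + limit] for start in range(0, len(data), limit)]


# 25 bytes of pending log data with a (deliberately tiny) limit of 10 bytes.
pending_data = b"0123456789" * 2 + b"abcde"
chunks = split_chunks(pending_data, 10)
```

Each element of `chunks` would become one object under the log's prefix, so concatenating the objects in order yields the original data.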

The log-publish command will execute asynchronously. When it is executed, the Trove instance will be put in the LOGGING state, and will return to ACTIVE once all objects have been pushed to the logging container and the command has finished successfully.

When an object is pushed to the Swift container, an X-Delete-After header is used to specify a time-to-live for the container object. This will result in objects automatically being removed from the container after the period of time specified by the guest_log_expiry CONF value.
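A minimal sketch of deriving the upload headers from the expiry setting follows. The helper name `expiry_headers` is hypothetical; the treatment of 0 as "no expiry" follows the description of guest_log_expiry in the Configuration section.

```python
def expiry_headers(expiry_seconds):
    """Return the headers to send with an object upload.

    A value of 0 means the object never expires, so no header is added.
    """
    if expiry_seconds > 0:
        return {"X-Delete-After": str(expiry_seconds)}
    return {}


one_day = expiry_headers(86400)
never = expiry_headers(0)
```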

An optional --disable parameter will be supported to disable logging for a particular USER log. An optional --discard parameter will be supported to first discard (delete) the associated container.

$ trove log-publish <instance> slow_query
 +--------------+--------------------------------+
 | Property     | Value                          |
 +--------------+--------------------------------+
 | name         | slow_query                     |
 | type         | USER                           |
 | status       | Published                      |
 | published    | 43242                          |
 | pending      | 0                              |
 | container    | database_logs                  |
 | prefix       | <id>/mysql-slow_query/         |
 | metafile     | <id>/mysql-slow_query_metafile |
 +--------------+--------------------------------+

Log Discard

The log-discard command will discard (delete) the container where the current log information resides.

$ trove log-discard <instance> general
 +--------------+-----------------------------+
 | Property     | Value                       |
 +--------------+-----------------------------+
 | name         | general                     |
 | type         | USER                        |
 | status       | Enabled                     |
 | published    | 0                           |
 | pending      | 37190                       |
 | container    | None                        |
 | prefix       | None                        |
 | metafile     | <id>/mysql-general_metafile |
 +--------------+-----------------------------+

Log Tail

By default, log-tail outputs the 50 lines at the end of the log. With the --lines=n option, log-tail will output the last n lines of the log. If n is negative, output will start past line n and continue to the end; --lines=0 will output the entire log.
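The line-selection rules can be expressed as a small Python function. This is a sketch of one reading of the rules, in which a negative n skips the first |n| lines; the name `select_lines` is hypothetical.

```python
def select_lines(lines, n=50):
    """Pick the lines to display according to the --lines rules.

    n > 0: the last n lines; n == 0: every line;
    n < 0: everything after the first abs(n) lines.
    """
    if n == 0:
        return list(lines)
    if n > 0:
        return list(lines[-n:])
    return list(lines[abs(n):])


log_lines = [str(number) for number in range(1, 101)]

default_tail = select_lines(log_lines)
last_three = select_lines(log_lines, 3)
skip_97 = select_lines(log_lines, -97)
everything = select_lines(log_lines, 0)
```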

With the --publish option, log-tail will first execute a log-publish command and wait for the log to be published before beginning output.

It should be noted that the actual display of the log will take place in the Python API only. There will be no REST APIs to facilitate display of the log; such APIs would put undue stress on the system due to the requirements of buffering and streaming from Swift.

$ trove log-tail <instance> slow_query --publish
/usr/local/mysql/libexec/mysqld, Version: 3.23.54-log, started with:
Tcp port: 3306  Unix socket: /tmp/mysql.sock
Time                 Id Command    Argument
# Time: 030207 15:03:33
# User@Host: wsuser[wsuser] @ localhost.localdomain [127.0.0.1]
# Query_time: 13  Lock_time: 0  Rows_sent: 117  Rows_examined: 234
use wsdb;
SELECT l FROM un WHERE ip='209.xx.xxx.xx';

Log Save

Like the log-tail command, the log-save command will execute in the python-troveclient, but where log-tail will output the log to the console, the log-save command will save the log into a file in the filesystem. This will allow the user to download extremely large log files without overwhelming the client or browser.

With the --file option, the log will be saved to the named file. Without the --file option, the log will be output to <logname>.log in the current directory.

With the --publish option, log-save will first execute a log-publish command and wait for the log to be published before saving it.

It should be noted that the actual saving of the log will take place in the Python API only. There will be no REST APIs to facilitate saving of the log; such APIs would put undue stress on the system due to the requirements of buffering and streaming from Swift.

$ trove log-save <instance> slow_query --file=/tmp/my.log --publish

Public API Security

The Swift containers will be created with the same security credentials as used for backup.

Internal API

Appropriate API methods will be added to both the task manager and the guest.

No changes to existing APIs are foreseen.

Guest Agent

No compatibility issues are foreseen.

Alternatives

Alternate API

An alternative API could be implemented that does away with some of the simpler commands.

Note: This alternate section was modified after describing the process to documentation personnel, where it was determined that having the ‘simple’ commands made it significantly easier to understand the logging process.


OpenStack functionality:

API          Description
log-list     As above
log-show     Removed
log-enable   Removed
log-disable  Removed
log-publish  As above
log-discard  Removed
log-tail     As above
log-save     As above

--publish by default

There has also been a suggestion to make the --publish option the default on the log-tail and log-save commands. If this were done, it seems unlikely that anyone would ever execute --no-publish and the log-publish command would really only ever execute “log-publish --disable”, so it would be better to just eliminate the --publish/--no-publish options, have log-tail and log-save always publish first, and replace the log-publish command with log-disable.

Admin override for guest log

It is possible for the operator to exclude the guest agent log from the list of logs returned to the user. Doing so, however, will not prevent the operator from seeing the guest log, as they would access it through the admin account. Currently the admin user will still stream the logs to the same tenant as the Trove user. This could be enhanced (in the future) to have the admin user provide a tenant that could be used to host the log containers.

Dashboard Impact (UX)

TBD (section added after approval)

Implementation

Assignee(s)

Primary assignee:
vgnbkr (morgan@tesora.com)
Secondary assignees:
peterstac (peter@tesora.com)
atomic77 (atomic@tesora.com)
Documentation:
laurelm (lmichaels@tesora.com)

Milestones

Target Milestone for completion:
Mitaka-1

Work Items

This component has been largely implemented.

Upgrade Implications

As this is entirely new functionality, no upgrade implications are foreseen.

Dependencies

n/a

Testing

An integration scenario test will be added to retrieve the guest log, execute a command, retrieve the guest log again and confirm that additional logging details were captured. The log will then be deleted via the log-discard command, and the removal of the files from the Swift container will be verified. The TestHelper class will be enhanced to include the ‘default’ log list (which is simply ‘guest’), and the MySQL helper will add the additional defined USER logs.

Documentation Impact

New documentation describing the added functionality will need to be written.

Appendix

None