V3 Diagnostics - common output
https://blueprints.launchpad.net/nova/+spec/v3-diagnostics
Currently there is no defined format for VM diagnostics. This BP will ensure that all of the drivers that provide VM diagnostics will have a consistent format.
NOTE: this cannot be used for V2 as there may be existing deployments that parse the current output of the V2 diagnostics.
Problem description
In V2 the VM diagnostics are a ‘blob’ of data returned by each hypervisor. The goal here is to define formally what output the drivers supporting the API should return, where possible. In addition, a driver will be able to return additional data if it chooses.
Proposed change
Introduce a new virt driver method, get_instance_diagnostics(self, context, instance), that returns a predefined structure.
This is a new driver method rather than a change to the existing one, because a separate method is much cleaner than adding conditionals that check whether the legacy or the new format is expected. We should also consider deprecating get_diagnostics. The new method should be documented in the virt driver API.
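A minimal sketch of how the two methods might sit side by side in a base driver class. `BaseDriver` and `FakeDriver` are hypothetical names, not Nova's actual classes; the legacy method emits a `DeprecationWarning` to illustrate the suggested deprecation, and a driver that does not implement the new method raises `NotImplementedError`.

```python
import warnings


class BaseDriver:
    """Hypothetical base virt driver illustrating the two diagnostics methods."""

    def get_diagnostics(self, instance):
        """Legacy method: returns a driver-defined blob (deprecation candidate)."""
        warnings.warn("get_diagnostics is deprecated; use "
                      "get_instance_diagnostics", DeprecationWarning,
                      stacklevel=2)
        raise NotImplementedError()

    def get_instance_diagnostics(self, context, instance):
        """New method: returns diagnostics in the common, predefined format."""
        raise NotImplementedError()


class FakeDriver(BaseDriver):
    """Hypothetical driver that implements only the new method."""

    def get_instance_diagnostics(self, context, instance):
        return {'state': 'running', 'driver': 'fake-driver'}
```

Keeping the methods separate means callers pick a format by choosing a method, with no format-detection branches inside either one.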
The proposal is for the drivers to return the following information in an object. A diagnostics Model() class will be introduced, which the virt drivers will instantiate and populate. The class will have a method to serialize itself to JSON so that the API can return JSON to the user. A field that the driver does not populate will return a default value set in this class.
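A minimal sketch of such a model, assuming Python dataclasses. The names `Diagnostics`, `add_cpu` and `serialize` are illustrative, and the default values shown (None, empty lists, zeroed memory) are assumptions rather than values fixed by this spec.

```python
import dataclasses
import json
from typing import Any, Dict, List, Optional


@dataclasses.dataclass
class Diagnostics:
    """Illustrative diagnostics model instantiated and populated by a virt driver."""
    state: Optional[str] = None
    driver: Optional[str] = None
    hypervisor_os: Optional[str] = None
    uptime: Optional[int] = None
    config_drive: Optional[bool] = None
    cpu_details: List[Dict[str, Any]] = dataclasses.field(default_factory=list)
    nic_details: List[Dict[str, Any]] = dataclasses.field(default_factory=list)
    disk_details: List[Dict[str, Any]] = dataclasses.field(default_factory=list)
    # Assumed default for unpopulated memory details.
    memory_details: Dict[str, int] = dataclasses.field(
        default_factory=lambda: {'maximum': 0, 'used': 0})
    driver_private_data: Dict[str, Any] = dataclasses.field(default_factory=dict)

    def add_cpu(self, time=0):
        """Append one vCPU entry; time is CPU time in nanoseconds."""
        self.cpu_details.append({'time': time})

    def serialize(self) -> str:
        """Return the diagnostics as a JSON string for the API layer."""
        data = dataclasses.asdict(self)
        # Derive the counts from the detail arrays so the two cannot disagree.
        data['num_cpus'] = len(self.cpu_details)
        data['num_nics'] = len(self.nic_details)
        data['num_disks'] = len(self.disk_details)
        return json.dumps(data, sort_keys=True)
```

For example, `Diagnostics(state='running')` followed by `add_cpu(time=1024)` serializes with `num_cpus` equal to 1 and `uptime` as `null`.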
The table below lists each key and describes the value returned:
| Key | Description |
|---|---|
| state | The current state of the VM. Example values are: ‘pending’, ‘running’, ‘paused’, ‘shutdown’, ‘crashed’, ‘suspended’ and ‘building’ (String) |
| driver | A string denoting the driver on which the VM is running. Examples may be: ‘libvirt’, ‘xenapi’, ‘hyperv’ and ‘vmwareapi’ (String) [Admin only; the key is omitted for non-admin users] |
| hypervisor_os | A string denoting the hypervisor OS (String) [Admin only; the key is omitted for non-admin users] |
| uptime | The amount of time in seconds that the VM has been running (Integer) |
| num_cpus | The number of vCPUs (Integer) |
| num_nics | The number of vNICs (Integer) |
| num_disks | The number of disks (Integer) |
| cpu_details | An array of details (a dictionary) per vCPU (see below) |
| nic_details | An array of details (a dictionary) per vNIC (see below) |
| disk_details | An array of details (a dictionary) per disk (see below) |
| memory_details | A dictionary of memory details (see below) |
| config_drive | Indicates whether the config drive is supported on the instance (Boolean) |
| driver_private_data | A dictionary of private data from the driver. This is driver specific; each driver can return information valuable for diagnosing VM issues. The raw data should be versioned. |
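A minimal sketch of how the API layer might hide the admin-only keys from non-admin callers. The `ADMIN_ONLY_KEYS` constant and the `filter_for_user` function are hypothetical names; only the two keys marked "Admin only" in the table are filtered.

```python
from typing import Any, Dict

# Keys marked "Admin only" in the table above.
ADMIN_ONLY_KEYS = frozenset({'driver', 'hypervisor_os'})


def filter_for_user(diagnostics: Dict[str, Any], is_admin: bool) -> Dict[str, Any]:
    """Return a copy of the diagnostics, dropping admin-only keys for non-admins."""
    if is_admin:
        return dict(diagnostics)
    return {key: value for key, value in diagnostics.items()
            if key not in ADMIN_ONLY_KEYS}
```

Returning a copy leaves the driver's original dictionary untouched, so the same data can be filtered differently per request.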
Note: a number of the above details are common to all drivers. These values will be filled in by the Nova compute manager before it invokes the driver call. The virt-driver-specific values will be filled in, where possible, by the virt driver. If the virt driver is unable to provide a specific field, that field will not be reported in the diagnostics.
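For example, the compute manager fills in the common fields and then merges in the data returned by the driver:
```python
def get_instance_diagnostics(self, context, instance):
    """Retrieve diagnostics for an instance on this host."""
    current_power_state = self._get_power_state(context, instance)
    if current_power_state == power_state.RUNNING:
        LOG.audit(_("Retrieving diagnostics"), context=context,
                  instance=instance)
        # Fields common to all drivers are filled in by the compute manager.
        diagnostics = {}
        diagnostics['state'] = instance.vm_state
        ...
        # Driver-specific fields are merged on top of the common ones.
        driver_diags = self.driver.get_instance_diagnostics(instance)
        diagnostics.update(driver_diags)
        return diagnostics
    # Handling of an instance that is not running is not shown here.
```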
The CPU details are an array containing one dictionary per virtual CPU:
| Key | Description |
|---|---|
| time | CPU time in nanoseconds (Integer) |
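Because `time` is reported per vCPU in nanoseconds, a consumer might aggregate it or compare two samples as below. Both helpers are hypothetical and not part of the proposed API; `cpu_utilization` assumes the vCPUs are listed in the same order in both samples.

```python
from typing import Dict, List

NS_PER_SECOND = 1_000_000_000


def total_cpu_seconds(cpu_details: List[Dict[str, int]]) -> float:
    """Sum the per-vCPU 'time' values (nanoseconds) and convert to seconds."""
    total_ns = sum(cpu.get('time', 0) for cpu in cpu_details)
    return total_ns / NS_PER_SECOND


def cpu_utilization(before: List[Dict[str, int]], after: List[Dict[str, int]],
                    interval_seconds: float) -> float:
    """Average utilization (0.0 to 1.0) across all vCPUs between two samples."""
    delta_ns = sum(a['time'] - b['time'] for b, a in zip(before, after))
    return delta_ns / (interval_seconds * NS_PER_SECOND * len(after))
```

For example, two vCPUs where only one was busy for a full one-second interval give a utilization of 0.5.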
The network details are an array containing one dictionary per virtual NIC:
| Key | Description |
|---|---|
| mac_address | MAC address of the interface (String) |
| rx_octets | Received octets (Integer) |
| rx_errors | Received errors (Integer) |
| rx_drop | Received packets dropped (Integer) |
| rx_packets | Received packets (Integer) |
| tx_octets | Transmitted octets (Integer) |
| tx_errors | Transmit errors (Integer) |
| tx_drop | Transmit packets dropped (Integer) |
| tx_packets | Transmitted packets (Integer) |
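A sketch of one `nic_details` entry as a typed record, assuming every counter defaults to 0 when the driver cannot read it. `NicDetails` and `total_octets` are hypothetical names; converting the record with `dataclasses.asdict` yields exactly the keys in the table above.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class NicDetails:
    """Illustrative per-vNIC record matching the network details table."""
    mac_address: Optional[str] = None
    rx_octets: int = 0
    rx_errors: int = 0
    rx_drop: int = 0
    rx_packets: int = 0
    tx_octets: int = 0
    tx_errors: int = 0
    tx_drop: int = 0
    tx_packets: int = 0

    def total_octets(self) -> int:
        """Octets received plus octets transmitted on this vNIC."""
        return self.rx_octets + self.tx_octets
```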
The disk details are an array containing one dictionary per virtual disk:
| Key | Description |
|---|---|
| id | Disk ID (String) |
| read_bytes | Disk reads in bytes (Integer) |
| read_requests | Read requests (Integer) |
| write_bytes | Disk writes in bytes (Integer) |
| write_requests | Write requests (Integer) |
| errors_count | Disk errors (Integer) |
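A minimal sketch of aggregating the per-disk counters into instance-wide totals. `disk_totals` is a hypothetical helper; it treats a counter missing from an entry as 0 and skips the string `id` field.

```python
from typing import Dict, List

# Integer counters from the disk details table (the string 'id' is excluded).
DISK_COUNTERS = ('read_bytes', 'read_requests', 'write_bytes',
                 'write_requests', 'errors_count')


def disk_totals(disk_details: List[Dict[str, object]]) -> Dict[str, int]:
    """Sum each counter across all disks; a missing counter counts as 0."""
    return {key: sum(int(disk.get(key, 0)) for disk in disk_details)
            for key in DISK_COUNTERS}
```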
The memory details are a single dictionary:
| Key | Description |
|---|---|
| maximum | Amount of memory provisioned for the VM in MB (Integer) |
| used | Amount of memory used by the VM in MB (Integer) |
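Since both values are in MB, the fraction of provisioned memory in use follows directly. `memory_usage_percent` is a hypothetical helper; it returns None when `maximum` is missing or zero, or when `used` is missing, rather than guessing.

```python
from typing import Dict, Optional


def memory_usage_percent(memory_details: Dict[str, int]) -> Optional[float]:
    """Percentage of provisioned memory in use, or None if it cannot be computed."""
    maximum = memory_details.get('maximum')
    used = memory_details.get('used')
    if not maximum or used is None:
        return None
    return 100.0 * used / maximum
```

With the fake driver's values below (512 MB provisioned, 256 MB used) this returns 50.0.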
Below is an example of the dictionary data returned by the fake driver:
```python
{'state': 'running',
 'driver': 'fake-driver',
 'hypervisor_os': 'fake-os',
 'uptime': 7,
 'num_cpus': 1,
 'num_nics': 1,
 'num_disks': 1,
 'cpu_details': [{'time': 1024}],
 'nic_details': [{'rx_octets': 0,
                  'rx_errors': 0,
                  'rx_drop': 0,
                  'rx_packets': 0,
                  'tx_octets': 0,
                  'tx_errors': 0,
                  'tx_drop': 0,
                  'tx_packets': 0}],
 'disk_details': [{'read_bytes': 0,
                   'read_requests': 0,
                   'write_bytes': 0,
                   'write_requests': 0,
                   'errors_count': 0}],
 'memory_details': {'maximum': 512, 'used': 256},
 'driver_private_data': {'version': 1,
                         'memory': {'actual': 220160,
                                    'rss': 200164}}}
```
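A sketch of a consistency check a consumer could run on such a dictionary, assuming the key names from the tables above. `check_diagnostics` is a hypothetical helper: it verifies that each `num_*` count agrees with the length of its detail array and that any private data is a dictionary carrying a `version` key.

```python
from typing import Any, Dict, List

# (count key, details key) pairs whose values must agree.
COUNT_PAIRS = (('num_cpus', 'cpu_details'),
               ('num_nics', 'nic_details'),
               ('num_disks', 'disk_details'))


def check_diagnostics(diags: Dict[str, Any]) -> List[str]:
    """Return the problems found; an empty list means the data is consistent."""
    problems = []
    for count_key, details_key in COUNT_PAIRS:
        # Only compare when both keys were reported by the driver.
        if count_key in diags and details_key in diags:
            if diags[count_key] != len(diags[details_key]):
                problems.append('%s=%s but %s has %d entries' % (
                    count_key, diags[count_key], details_key,
                    len(diags[details_key])))
    private = diags.get('driver_private_data')
    if private is not None and (not isinstance(private, dict)
                                or 'version' not in private):
        problems.append('driver_private_data must be a dict with a version key')
    return problems
```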
Alternatives
Continue with the same format that V2 has. This is problematic because it prevents us from building a common user interface that can query VM states, for example in Tempest.
We can add an extension to the V2 API that will enable us to return the information defined in this spec.
Data model impact
None
REST API impact
The V3 diagnostics API will no longer return data defined by the driver but it will return common data defined in this spec.
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
None
Other deployer impact
Deployers will get better insight into the state of a VM, which will make troubleshooting easier.
We should consider adding this support to V2. To preserve backward compatibility, we could introduce a configuration flag that selects the legacy format.
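A sketch of how such a flag might select the output format. The option name `legacy_diagnostics` is hypothetical, and it is passed as a plain argument here for illustration rather than read from the service configuration. `ExampleDriver` is likewise a hypothetical stub, and its legacy blob key is invented.

```python
class ExampleDriver:
    """Hypothetical stub driver used only to illustrate both code paths."""

    def get_diagnostics(self, instance):
        return {'cpu0_time': 1024}  # hypothetical legacy, driver-defined blob

    def get_instance_diagnostics(self, context, instance):
        return {'state': 'running', 'num_cpus': 1}  # common format


def get_diagnostics_for_api(driver, context, instance, legacy_diagnostics=True):
    """Return legacy or common-format diagnostics depending on the flag.

    Defaulting the hypothetical legacy_diagnostics option to True keeps
    existing V2 consumers working until they opt in to the new format.
    """
    if legacy_diagnostics:
        return driver.get_diagnostics(instance)
    return driver.get_instance_diagnostics(context, instance)
```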
Developer impact
None
Implementation
Assignee(s)
- Primary assignee:
Gary Kotton - garyk
- Other contributors:
Bob Ball - bob-ball
Work Items
All work items were in review during the Icehouse cycle. They were broken up as follows:
- VM diagnostics (v3 API only)
- XenAPI
- libvirt
- VMware
Dependencies
None
Testing
Once the code is approved, we will add Tempest tests for the V3 API that do the following (unless the underlying driver returns 501 Not Implemented, which may be the case if the driver does not support the method):
- Check that the returned driver is one of the supported in-tree drivers (at the moment only libvirt, VMware and XenAPI support the V3 method).
- Check that the number of CPUs matches the flavor.
- Check that the disk data matches the flavor.
- Check that the memory matches the flavor.
- If a Cinder volume has been attached, check that the correct number of disks is attached.
- Check that the number of vNICs matches the running instance.
- If the private data is present, check that it is a dictionary and has a ‘version’ key.
In addition, when tests fail, we can retrieve the V3 diagnostics to help debug and isolate the problem.
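A sketch of a few of the checks listed above, assuming the flavor is available as a dictionary with `vcpus` and `ram` (MB) keys, as the Nova flavor API reports them. `check_against_flavor` is a hypothetical helper, not an existing Tempest function, and the driver names in `SUPPORTED_DRIVERS` are assumed from the examples in the main table.

```python
from typing import Any, Dict, List

# Assumed in-tree driver names, taken from the 'driver' examples above.
SUPPORTED_DRIVERS = frozenset({'libvirt', 'vmwareapi', 'xenapi'})


def check_against_flavor(diags: Dict[str, Any], flavor: Dict[str, int],
                         expected_nics: int) -> List[str]:
    """Compare diagnostics with the flavor and instance; return any mismatches."""
    mismatches = []
    # 'driver' is admin-only, so only check it when present.
    if 'driver' in diags and diags['driver'] not in SUPPORTED_DRIVERS:
        mismatches.append('unsupported driver %s' % diags['driver'])
    if diags.get('num_cpus') != flavor['vcpus']:
        mismatches.append('num_cpus %s != flavor vcpus %s'
                          % (diags.get('num_cpus'), flavor['vcpus']))
    maximum = diags.get('memory_details', {}).get('maximum')
    if maximum != flavor['ram']:
        mismatches.append('memory maximum %s MB != flavor ram %s MB'
                          % (maximum, flavor['ram']))
    if diags.get('num_nics') != expected_nics:
        mismatches.append('num_nics %s != expected %s'
                          % (diags.get('num_nics'), expected_nics))
    return mismatches
```

Returning a list rather than asserting on the first mismatch lets a failing test report every discrepancy at once.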
Documentation Impact
We can now at least document the fields that are returned and their meaning.
If we do decide to update the V2 support, we will also need to update:
- http://docs.openstack.org/user-guide-admin/common/nova_show_usage_statistics_for_hosts_instances.html
- http://docs.openstack.org/user-guide/content/usage_statistics.html
- http://docs.openstack.org/user-guide/content/novaclient_commands.html
- http://docs.openstack.org/trunk/openstack-ops/content/lay_of_the_land.html#diagnose-compute
We will need to make sure that we update all of the equivalent V3 docs. The information in the tables above will be what we add to the documentation.
References
- https://wiki.openstack.org/wiki/Nova_VM_Diagnostics
- https://bugs.launchpad.net/nova/+bug/1240043