Nova - Cyborg Interaction¶
https://blueprints.launchpad.net/nova/+spec/nova-cyborg-interaction
This specification describes the Nova - Cyborg interaction needed to create and manage instances with accelerators, and the changes needed in Nova to accomplish that.
Problem description¶
Scope¶
Nova and Cyborg need to interact in many areas for handling instances with accelerators. While this spec covers the gamut, specific areas are covered in detail in other specs. We list all the areas below, identify which specific parts are covered by other specs, and describe what is covered in this spec.
Representation: Cyborg shall represent devices as nested resource providers under the compute node (except possibly for disaggregated servers), accelerator types as resource classes and accelerators as inventory in Placement. The properties needed for scheduling are represented as traits. This is specified by [1]. This spec does not dwell on this topic.
Discovery and Updates: Among the devices discovered in a host, Cyborg intends to claim only those that are not included under the PCI Whitelisting mechanism. Cyborg shall update Placement in a way that is compatible with the virt driver’s update of Placement. These aspects are addressed in sections Coexistence with PCI whitelists and Placement update respectively.
User requests for accelerators: Users usually request compute resources via flavors. However, since the requests for devices may be highly varied, placing them in flavors may result in flavor explosion. We avoid that by expressing device requests in a device profile [2]. The relationship between device profiles and flavors is explored in Section User requests.
When an instance creation (boot) request is made, the contents of a device profile shall be translated to request groups in the request spec; the syntax in request groups is covered in Section Updating the Request Spec.
Instance scheduling: Nova shall use the Placement data populated by Cyborg to schedule instances. This spec does not dwell on this topic.
Assignment of accelerators: We introduce the concept of Accelerator Request objects in Section Accelerator Requests. The workflow to create and use them is summarized in Section Nova changes for Assignment workflow. The same section also highlights the Nova changes needed. The details of the Cyborg API implementation for this workflow are covered in Cyborg specs ([3]).
Instance operations: The behavior with respect to accelerators for all standard instance operations is defined in [4]. This spec does not dwell on this topic.
Use Cases¶
A user requests an instance with one or more accelerators of different types assigned to it.
An operator may provide users with both Device as a Service and Accelerated Function as a Service in the same cluster (see [1]).
The following use cases are not addressed in Train but are of long term interest:
A user requests to add one or more accelerators to an existing instance.
Live migration with accelerators.
Proposed change¶
Coexistence with PCI whitelists¶
The operator tells Nova which PCI devices to claim and use by configuring the PCI Whitelists mechanism. In addition, the operator installs Cyborg drivers in compute nodes and configures/enables them. Those drivers may then discover and report some PCI devices. The operator must ensure that both configurations are compatible.
Ideally, there should be a single way for the operator to identify which PCI devices should be claimed by Nova and which by Cyborg. This could be along the lines suggested in [5] or [6]. If such a mechanism could be agreed upon by all stakeholders, Cyborg could adopt it.
Until that point, the operator tells Cyborg which devices to claim by using Cyborg’s configuration file. The operator must ensure that this is compatible with the PCI whitelists configured in Nova.
Placement update¶
Cyborg shall call Placement API directly to represent devices and accelerators. Some of the intended use cases for the API invocation are:
Create or delete child RPs under the compute node RP.
Create or delete custom RCs and custom traits.
Associate traits with RPs or remove such association.
Update RP inventory.
Cyborg shall not modify the RPs created by any other component, such as Nova virt drivers.
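As a minimal sketch (not actual Cyborg code) of the kinds of Placement REST calls listed above, the following uses a keystoneauth1 adapter; the credentials, RP/trait/RC names and microversion are illustrative assumptions:

from keystoneauth1 import adapter, session
from keystoneauth1.identity import v3

# Illustrative credentials; a real deployment loads these from Cyborg's config.
auth = v3.Password(auth_url='http://controller/identity/v3',
                   username='cyborg', password='secret', project_name='service',
                   user_domain_name='Default', project_domain_name='Default')
placement = adapter.Adapter(session=session.Session(auth=auth),
                            service_type='placement',
                            default_microversion='1.30')

parent_uuid = '<compute-node-rp-uuid>'   # placeholder

# Create a child RP under the compute node RP.
rp = placement.post('/resource_providers',
                    json={'name': 'compute-1_FPGA_0',
                          'parent_provider_uuid': parent_uuid}).json()

# Create a custom resource class and a custom trait (both idempotent PUTs).
placement.put('/resource_classes/CUSTOM_FPGA_EXAMPLE')
placement.put('/traits/CUSTOM_FPGA_EXAMPLE_TRAIT')

# Associate the trait with the new RP.
placement.put('/resource_providers/%s/traits' % rp['uuid'],
              json={'resource_provider_generation': rp['generation'],
                    'traits': ['CUSTOM_FPGA_EXAMPLE_TRAIT']})

# Report an inventory of one accelerator on the RP. (A real agent would
# re-read the RP generation rather than assume it incremented by one.)
placement.put('/resource_providers/%s/inventories/FPGA' % rp['uuid'],
              json={'resource_provider_generation': rp['generation'] + 1,
                    'total': 1})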
User requests¶
The user request for accelerators is encapsulated in a device profile [2], which is created and managed by the admin via the Cyborg API.
A device profile may be viewed as a ‘flavor for devices’. Accordingly, the instance request should include both a flavor and a device profile. However, that requires a change to the Nova API for instance creation. To mitigate the impact of such changes on users and operators, we propose to do this in phases.
In the initial phase, Nova API remains as today. The device profile is folded into the flavor as an extra spec by the operator, as below:
openstack flavor set --property 'accel:device_profile=<profile_name>' flavor
Thus the standard Nova API can be used to create an instance with only the flavor (without device profiles), like this:
openstack server create --flavor f .... # instance creation
In the future, device profile may be used by itself to specify accelerator resources for the instance creation API.
Updating the Request Spec¶
When the user submits a request to create an instance, as described in Section User requests, Nova needs to call a Cyborg API to get back the resource request groups in the device profile and merge them into the request spec. (This is along the lines of the scheme proposed for Neutron [7].)
This call, like all the others that Nova would make to Cyborg APIs, is done through a Keystone-based adapter that would locate the Cyborg service, similar to the way Nova calls Placement. A new Cyborg client module shall be added to Nova, to encapsulate such calls and to provide Cyborg-specific functionality.
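As a rough illustration, the new client module could build such an adapter along the following lines; the config group name ('cyborg') and the overall structure are assumptions rather than settled interfaces, while 'accelerator' is Cyborg's service type in the Keystone catalog:

from keystoneauth1 import adapter as ks_adapter
from keystoneauth1 import loading as ks_loading

def get_cyborg_adapter(conf):
    # Load auth and session settings from an assumed [cyborg] section of the
    # Nova configuration, then locate Cyborg via the service catalog.
    auth = ks_loading.load_auth_from_conf_options(conf, 'cyborg')
    sess = ks_loading.load_session_from_conf_options(conf, 'cyborg', auth=auth)
    return ks_adapter.Adapter(session=sess, service_type='accelerator')

# Example call from the client module (endpoint as in the workflow below):
#   resp = get_cyborg_adapter(CONF).get('/v2/device_profiles?name=%s' % name)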
VM images in Glance may be associated with image properties (other than image traits), such as bitstream/function IDs needed for that image. So, Nova should pass the VM image UUID from the request spec to Cyborg. This is TBD.
The groups in the device profile are numbered by Cyborg. The request groups that are merged into the request spec are numbered by Nova. These numberings would not be the same in general, i.e., the N-th device profile group may not correspond to the N-th request group in the request spec.
When the device profile request groups are added to other request groups in the flavor, the group_policy of the flavor shall govern the overall semantics of all request groups.
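To make the numbering and merging concrete, here is a minimal sketch with illustrative names and a simplified request group shape (the real code would build Nova RequestGroup objects [8]); the group dictionary format with resources:* and trait:* keys follows the device profile spec [2]:

def device_profile_groups_to_request_groups(dp_groups):
    """Turn Cyborg device profile groups into numbered request groups."""
    request_groups = []
    for idx, group in enumerate(dp_groups):
        resources = {}
        required_traits = set()
        for key, value in group.items():
            prefix, _, name = key.partition(':')
            if prefix == 'resources':
                resources[name] = int(value)
            elif prefix == 'trait' and value == 'required':
                required_traits.add(name)
        request_groups.append({
            # Unique per device profile group, numbered from zero; used later
            # to match ARQs to the RP chosen for this group.
            'requester_id': 'device_profile_%d' % idx,
            'resources': resources,            # e.g. {'FPGA': 1}
            'required_traits': required_traits,
        })
    return request_groups

# An illustrative device profile group as Cyborg might return it:
#   {'resources:FPGA': '1', 'trait:CUSTOM_FPGA_EXAMPLE': 'required'}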
Accelerator Requests¶
An accelerator request (ARQ) is an object that represents the state of the request for an accelerator to be assigned to an instance. The creation and management of ARQs are handled by Cyborg, and ARQs are persisted in the Cyborg database.
An ARQ, by definition, represents a request for a single accelerator. The device profile in the user request may have N request groups, each asking for M accelerators; then N * M ARQs will be created for that device profile.
When an ARQ is initially created by Cyborg, it is not yet associated with a specific host name or a device resource provider. So it is said to be in an unbound state. Subsequently, Nova calls Cyborg to bind the ARQ to a host name, a device RP UUID and an instance UUID. If the instance fails to spawn, Nova would unbind the ARQ without deleting it. On instance termination, Nova would delete the ARQs after unbinding them.
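For illustration only, an ARQ can be pictured as carrying roughly the following state; the field names are assumptions based on this spec, not Cyborg's actual schema:

from dataclasses import dataclass
from typing import Optional

@dataclass
class AcceleratorRequest:
    uuid: str
    device_profile_name: str
    device_profile_group_id: int     # which profile group this ARQ came from
    # Empty while unbound; filled in when Nova asks Cyborg to bind the ARQ,
    # and cleared again if Nova unbinds it (e.g. after a failed spawn).
    hostname: Optional[str] = None
    device_rp_uuid: Optional[str] = None
    instance_uuid: Optional[str] = None

    @property
    def is_bound(self):
        return self.instance_uuid is not None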
Each ARQ needs to be matched to the specific RP in the allocation candidate that Nova has chosen, before the ARQ is bound. Since Placement does not match RPs to request groups, this must be done in the Cyborg client module of Nova (cyborg-client-module). The matching is done using the requester_id field in the RequestGroup object ([8]) as below:
The order of request groups in a device profile is not significant, but it is preserved by Cyborg. Thus, each device profile request group has a unique index.
When the device profile request groups returned by Cyborg are added to the request spec, the requester_id field is set to ‘device_profile_<N>’ for the N-th device profile request group (starting from zero). The device profile name need not be included here because there is only one device profile per request spec.
When Cyborg creates an ARQ for a device profile, it embeds the device profile request group index in the ARQ before returning it to Nova.
The matching is done in two steps:
Each ARQ is mapped to a specific request group in the request spec using the requester_id field.
Each request group is mapped to a specific RP using the same logic as the Neutron bandwidth provider ([9]).
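A minimal sketch of this two-step matching, with illustrative names; the provider mapping (requester_id to chosen RP UUIDs) is assumed to be derivable from the selected allocation candidate, as for the Neutron bandwidth feature [9]:

def match_arqs_to_rps(arqs, provider_mapping):
    """Return a {arq_uuid: device_rp_uuid} mapping for the chosen candidate.

    arqs: unbound ARQs from Cyborg, each carrying its device profile group index.
    provider_mapping: {requester_id: [rp_uuid, ...]} for the selected host.
    """
    matches = {}
    for arq in arqs:
        # Step 1: ARQ -> request group, via the 'device_profile_<N>' naming.
        requester_id = 'device_profile_%d' % arq.device_profile_group_id
        # Step 2: request group -> RP, from the provider mapping; all the
        # accelerators of one granular group come from the same provider.
        matches[arq.uuid] = provider_mapping[requester_id][0]
    return matches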
Nova changes for Assignment workflow¶
This section summarizes the workflow details for Phase 1. The changes needed in Nova are marked with NEW.
NEW: A Cyborg client module is added to nova (cyborg-client-module). All Cyborg API calls are routed through that.
1. The Nova API server receives a POST /servers API request with a flavor that includes a device profile name.
2. NEW: The Nova API server calls the Cyborg API GET /v2/device_profiles?name=$device_profile_name and gets back the device profile request groups. These are added to the request spec.
3. The Nova scheduler invokes Placement and gets a list of allocation candidates. It selects one of those candidates and makes claim(s) in Placement. The Nova conductor then sends a build_and_run_instances RPC message to the Nova compute manager.
4. NEW: Nova calls the Cyborg API POST /v2/accelerator_requests with the device profile name. Cyborg creates a set of unbound ARQs for that device profile and returns them to Nova. (The call may originate from the Nova conductor or the compute manager; that will be settled in code review.)
5. NEW: The Cyborg client in Nova matches each ARQ to the resource provider picked for that accelerator, as described in Section Accelerator Requests.
6. NEW: The Nova compute manager calls the Cyborg API PATCH /v2/accelerator_requests to bind each ARQ to the host name, the device's RP UUID and the instance UUID. This is an asynchronous call which prepares or reconfigures the device in the background.
7. NEW: Cyborg, on completion of the bindings (successfully or otherwise), calls Nova's POST /os-server-external-events API with:
   { "events": [
       { "name": "arq_resolved",
         "tag": $arq_uuid,
         "server_uuid": $instance_uuid,
         "status": "ok"  # or "failed"
       },
       ...
   ] }
8. NEW: The Nova virt driver waits for the notification, subject to the timeout mentioned in Section Other deployer impact. It then calls the Cyborg REST API GET /v2/accelerator_requests?instance=<uuid>&bind_state=resolved.
9. NEW: The Nova virt driver uses the attach handles returned from the Cyborg call to compose PCI passthrough devices into the VM's definition.
10. NEW: If there is any error after binding has been initiated, Nova must unbind the relevant ARQs by calling the Cyborg API. It may then retry on another host or delete the (unbound) ARQs for the instance.
This flow is captured by the following sequence diagram, in which the Nova conductor and scheduler are together represented as the Nova controller. The ARQ creation is shown in the Nova compute manager only for concreteness; it may happen in the controller instead.
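As a rough, end-to-end illustration of the NEW compute-side steps, the sketch below strings together the bind call, the wait for Cyborg's external event, and the fetch of resolved ARQs. The PATCH payload shape, the wait_for_arq_bind_events() helper and the response handling are assumptions for illustration; the exact Cyborg API is defined in [3]:

def bind_and_wait_for_arqs(cyborg, arq_to_rp, hostname, instance_uuid,
                           timeout=300):
    # Kick off asynchronous binding of every ARQ (one JSON-patch list per ARQ;
    # the exact payload format is Cyborg's to define).
    patch = {
        arq_uuid: [
            {'path': '/hostname', 'op': 'add', 'value': hostname},
            {'path': '/device_rp_uuid', 'op': 'add', 'value': rp_uuid},
            {'path': '/instance_uuid', 'op': 'add', 'value': instance_uuid},
        ]
        for arq_uuid, rp_uuid in arq_to_rp.items()
    }
    cyborg.patch('/v2/accelerator_requests', json=patch)

    # Wait for the os-server-external-events notification from Cyborg,
    # bounded by arq_binding_timeout (hypothetical helper shown here).
    if not wait_for_arq_bind_events(instance_uuid, timeout=timeout):
        # Real code would unbind the ARQs and raise a Nova-specific exception
        # so that the spawn is aborted or retried on another host.
        raise RuntimeError('ARQ binding failed or timed out')

    # Fetch the resolved ARQs; their attach handles (e.g. PCI addresses) let
    # the virt driver compose passthrough devices into the guest definition.
    return cyborg.get('/v2/accelerator_requests'
                      '?instance=%s&bind_state=resolved' % instance_uuid)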
Alternatives¶
It is possible to have an external agent create ARQs from device profiles by calling Cyborg, and then feed those pre-created ARQs to the Nova instance creation API, analogous to Neutron ports. We do not take that approach yet because it requires changes to the Nova instance creation API.
It is possible to have the Nova virt driver poll for the completion of Cyborg ARQ binding. That is not preferred, partly because it does not match the pattern of interaction with other services such as Neutron.
Data model impact¶
None
REST API impact¶
None. A new extra spec key accel:device_profile is added to the flavor.
Security impact¶
None
Notifications impact¶
Nova may choose to add additional notifications for Cyborg API calls.
Other end user impact¶
None
Performance Impact¶
The extra calls to the Cyborg REST API may potentially impact Nova conductor/scheduler throughput. This is mitigated by making some critical Cyborg operations asynchronous.
Other deployer impact¶
The deployer needs to set up the clouds.yaml file so that Nova can call the Cyborg REST API.
The deployer needs to configure a new tunable in nova-cpu.conf:
* arq_binding_timeout (integer): Time in seconds for the Nova compute manager to wait for Cyborg to notify that ARQ binding is done. Timeout is fatal, i.e., VM startup is aborted with an exception. Default: 300.
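A sketch of how such a tunable might be registered with oslo.config; the option group name ('cyborg') and the placement of this code are assumptions, only the option name and default come from this spec:

from oslo_config import cfg

cyborg_group = cfg.OptGroup('cyborg', title='Cyborg interaction options')
arq_binding_timeout_opt = cfg.IntOpt(
    'arq_binding_timeout', default=300, min=1,
    help='Seconds for the compute manager to wait for Cyborg to report that '
         'ARQ binding is done; on timeout the VM startup is aborted.')

def register_opts(conf):
    conf.register_group(cyborg_group)
    conf.register_opt(arq_binding_timeout_opt, group=cyborg_group)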
Developer impact¶
Define two new standard resource classes: FPGA and PGPU.
We have VGPU and VGPU_DISPLAY_HEAD RCs defined already, but we propose PGPU as a separate RC for the following reasons:
Both VGPU and VGPU_DISPLAY_HEAD RCs specifically refer to virtual GPUs. We need a different one for physical GPUs.
It will be subject to separate quotas/limits in Keystone.
Using the PCI_DEVICE RC is too general: we want quotas specifically for a GPU RC.
Upgrade impact¶
None
Implementation¶
Assignee(s)¶
Sundar Nadathur
Work Items¶
See the steps marked NEW in Section Nova changes for Assignment workflow.
Dependencies¶
Testing¶
There need to be unit tests and functional tests for the Nova changes. Specifically, there needs to be a functional test fixture that mocks the Cyborg API calls.
There need to be tempest tests for the end-to-end flow, including failure modes. The tempest tests should be targeted at a fake driver (in addition to real hardware, if any) and tied to the Nova Zuul gate.
Documentation Impact¶
Device profile creation needs to be documented in Cyborg, as noted in [2].
The need for the operator to fold the device profile into the flavor needs to be documented.
References¶
History¶
Release Name | Description
---|---
Train | Introduced