Cyborg FPGA Model Proposal

Blueprint url is not available yet https://blueprints.launchpad.net/openstack-cyborg/+spec/cyborg-fpga-modelling

This spec proposes the DB modelling schema for tracking reprogrammable resources

Problem description

A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing. Their advantage lies in that they are sometimes significantly faster for some applications because of their parallel nature and optimality in terms of the number of gates used for a certain process. Hence, using FPGA for application acceleration in cloud has been becoming desirable. Cyborg as a management framwork for heterogeneous accelerators, tracking and deploying FPGAs are much needed features.

Use Cases

When user requests FPGA resources, scheduler will use placement agent 1 to select appropriate hosts that have the requested FPGA resources.

When a FPGA type resource is allocated to a VM, Cyborg needs to track down which exact device has been assigned in the database. On the other hand, when the resource is released, Cyborg will need to be detached and free the exact resource.

When a new device is plugged in to the system(host), Cyborg needs to discover it and store it into the database

Proposed change

We need to add 2 more tables to Cyborg database, one for tracking all the deployables and one for arbitrary key-value pairs of deplyable associated attirbutes. These tables are named as Deployables and Attributes.

Deployables table consists of all the common attributes columns as well as a parent_id and a root_id. The parent_id will point to the associated parent deployable and the root_id will point to the associated root deployable. By doing this, we can form a nested tree structure to represent different hierarchies. In addition, there will a foreign key named accelerator_id reference to the accelerators table. For the case where FPGA has not been loaded any bitstreams on it, they will still be tracked as a Deployable but no other Deployables referencing to it. For instance, a network of FPGA hierarchies can be formed using deployables in following scheme:

                        -------------------
    ------------------->|Deployable - FPGA|<--------------------
    |                   -------------------                    |
    |                           /\                             |
    | root_id                  /  \  parent_id/root_id         |
    |                         /    \                           |
    |         -----------------    -----------------           |
    |         |Deployable - PF|    |Deployable - PF|           |
    |         -----------------    -----------------           |
    |               /\                                         |
    |              /  \  parent_id                     root_id |
    |             /    \                                       |
-----------------      -----------------                       |
|Deployable - VF|      |Deployable - VF| -----------------------
-----------------      -----------------

Attributes table consists of a key and a value columns to represent arbitrary k-v pairs.

For instance, bitstream_id and function kpi can be tracked in this table. In addition, a foreign key deployable_id refers to the Deployables table and a parent_attribute_id to form nested structured attribute relationships.

Cyborg needs to have object classes to represent different types of deployables(e.g. FPGA, Physical Functions, Virtual Functions etc).

Cyborg Agent needs to add feature to discover the FPGA resources from FPGA driver and report them to the Cyborg DB through the conductor.

Conductor needs to add couple of sets of APIs for different types of deployable resources.

Alternatives

Alternativly, instead of having a flat table to track arbitrary hierarchies, we can use two different tables in Cyborg database, one for physical functions and one for virtual functions. physical_functions should have a foreign key constraint to reference the id in Accelerators table. In addition, virtual_functions should have a foreign key constraint to reference the id in physical_functions.

The problems with this design are as follows. First, it can only track up to 3 hierarchies of resources. In case we need to add another layer, a lot of migaration work will be required. Second, even if we only need to add some new attribute to the existing resource type, we need to create new migration scripts for them. Overall the maintenance work is tedious.

Data model impact

As discussed in previous sections, two tables will be added: Deployables and Attributes:

CREATE TABLE Deployables
  (
    id           INTEGER NOT NULL ,     /*Primary Key*/
    parent_id    INTEGER ,              /*Pointer to the parent deployable's primary key*/
    root_id      INTEGER ,              /*Pointer to the root deployable's primary key*/
    name         VARCHAR2 (32 BYTE) ,   /*Name of the deployable*/
    pcie_address VARCHAR2 (32 BYTE) ,   /*pcie address which can be used for passthrough*/
    uuid         VARCHAR2 (32 BYTE) ,   /*uuid v4 format for the deployable itself*/
    node_id      VARCHAR2 (32 BYTE) ,   /*uuid v4 format to identify which host this deployable is located*/
    board        VARCHAR2 (16 BYTE) ,   /*Identify the model of the deployable(e.g. KU115)*/
    vendor       VARCHAR2 (16 BYTE) ,   /*Identify the vendor of the deployable(e.g. Xilinx)*/
    version      VARCHAR2 (32 BYTE) ,   /*Identify the version of the deployable(e.g. 1.2a)*/
    type         VARCHAR2 (32) ,        /*Identify the type of the deployable(e.g. FPGA/PF/VF)*/
    assignable   CHAR (1) ,             /*Represent if the deployable can be assigned to users*/
    instance_id  VARCHAR2 (32 BYTE) ,   /*Represent which instance this deployable has been assigned to*/
    availability INTEGER NOT NULL,      /*enum type to represent the status of the deployable(e.g. acclocated/claimed)*/
    accelerator_id INTEGER NOT NULL     /*foreign key references to the accelerator table*/
  ) ;
ALTER TABLE Deployables ADD CONSTRAINT Deployables_PK PRIMARY KEY ( id ) ;
ALTER TABLE Deployables ADD CONSTRAINT Deployables_accelerators_FK FOREIGN KEY ( accelerator_id ) REFERENCES accelerators ( id ) ;


CREATE TABLE Attributes
  (
    id            INTEGER NOT NULL ,    /*Primary Key*/
    deployable_id INTEGER NOT NULL ,    /*foreign key references to the Deployables table*/
    KEY CLOB ,                          /*Attribute Key*/
    value CLOB ,                        /*Attribute Value*/
    parent_attribute_id INTEGER         /*Pointer to the parent attribute's primary key*/
  ) ;
ALTER TABLE Attributes ADD CONSTRAINT Attributes_PK PRIMARY KEY ( id ) ;
ALTER TABLE Attributes ADD CONSTRAINT Attributes_Deployables_FK FOREIGN KEY ( deployable_id ) REFERENCES Deployables ( id ) ON
DELETE CASCADE ;

RPC API impact

Two sets of conductor APIs need to be added. 1 set for physical functions, 1 set for virtual functions

Physical function apis:

def physical_function_create(context, values)
def physical_function_get_all_by_filters(context, filters, sort_key='created_at', sort_dir='desc', limit=None, marker=None, columns_to_join=None)
def physical_function_update(context, uuid, values, expected=None)
def physical_function_destroy(context, uuid)

Virtual function apis:

def virtual_function_create(context, values)
def virtual_function_get_all_by_filters(context, filters, sort_key='created_at', sort_dir='desc', limit=None, marker=None, columns_to_join=None)
def virtual_function_update(context, uuid, values, expected=None)
def virtual_function_destroy(context, uuid)

REST API impact

Since these tables are not exposed to users for modifying/adding/deleting, Cyborg will only add two extra REST APIs to allow user query information related to deployables and their attributes.

API for retrieving Deployable’s information:

Url: {base_url}/accelerators/deployable/{uuid}
Method: GET
URL Params:
    GET: uuid --> get deplyable by uuid

Data Params:
    None

Success Response:
    GET:
        Code: 200
        Content: { deployable: {id : 12, parent_id: 11, root_id: 10, ....}}

Error Response
    Code: 401 UNAUTHORIZED
    Content: { error : "Log in" }
    OR
    Code: 422 Unprocessable Entry
    Content: { error : "deployable uuid invalid" }

Sample Call:
    To get the deployable with uuid=2864a139-c2cd-4f9f-abf3-44eb3f09b83c
    $.ajax({
      url: "/accelerators/deployable/2864a139-c2cd-4f9f-abf3-44eb3f09b83c",
      dataType: "json",
      type : "get",
      success : function(r) {
        console.log(r);
      }
    });

API for retrieving list of Deployables with filters/attirbutes:

Url: {base_url}/accelerators/deployable
Method: GET
URL Params:
    None

Data Params:
    k-v pairs for filtering

Success Response:
    GET:
        Code: 200
        Content: { deployables: [{id : 12, parent_id: 11, root_id: 10, ....}]}

Error Response
    Code: 401 UNAUTHORIZED
    Content: { error : "Log in" }
    OR
    Code: 422 Unprocessable Entry
    Content: { error : "deployable uuid invalid" }

Sample Call:
    To get a list of FPGAs with no bitstream loaded.
    $.ajax({
      url: "/accelerators/deployable",
      data: {
        "bitstream_id": None,
        "type": "FPGA"
      },
      dataType: "json",
      type : "get",
      success : function(r) {
        console.log(r);
      }
    });

API for retrieving Deployable attributes’ information:

Url: {base_url}/accelerators/deployable/{uuid}/attribute/{key}
Method: GET
URL Params:
    GET: uuid --> uuid for the associated deployable
         key  --> key for the associated deployable

Data Params:
    None

Success Response:
    GET:
        Code: 200
        Content: { attribute: {key : value}}

Error Response
    Code: 401 UNAUTHORIZED
    Content: { error : "Log in" }
    OR
    Code: 422 Unprocessable Entry
    Content: { error : "attirbute key invalid" }

Sample Call:
    To get the value of key=kpi for deployable with id=2864a139-c2cd-4f9f-abf3-44eb3f09b83c
    $.ajax({
      url: "/accelerators/deployable/2864a139-c2cd-4f9f-abf3-44eb3f09b83c/attribute/kpi",
      dataType: "json",
      type : "get",
      success : function(r) {
        console.log(r);
      }
    });

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

None

Developer impact

There will be new functionalities available to the dev because of this work.

Implementation

Assignee(s)

Primary assignee:

Li Liu <liliu1@huawei.com>

Work Items

  • Create migration scripts to add two more tables to the database

  • Create models in sqlalchemy as well as related conductor APIs

  • Create corespoinding objects

  • Create Conductor APIs to allow resourece reporting

Dependencies

Testing

  • Unit tests will be added test Cyborg generic driver.

Documentation Impact

Document FPGA Modelling in the Cyborg project

References

1

https://docs.openstack.org/nova/latest/user/placement.html

History

Revisions

Release

Description

Queens

Introduced