New Cyborg Generic Driver Proposal
This spec proposes an initial design for the Cyborg generic device driver.
Problem description
Cyborg currently supports FPGA and GPU devices, but it does not yet support generic accelerators.
In general, these generic devices are accelerators built for specific scenarios. For example:
AI chips, which can be used for AI training and inference.
Security accelerators, which can be used for encryption and decryption.
To support these accelerators, we propose to improve the existing GenericDriver so that it can manage these devices.
Use Cases
As an AI chip vendor, I want to add a driver to Cyborg, but the existing driver types do not meet our requirements. We need the driver to provide firmware upgrade, device configuration, and device stats query.
As a security accelerator vendor, I want to add a driver to Cyborg that provides security accelerator configuration and device stats query.
Proposed change
1. The Generic driver change
As the initial version, the new GenericDriver should include the following attributes:
VENDOR: the vendor name of the driver.
TYPE: the type of the driver, such as “FPGA” or “GPU”.
The following interfaces should also be included:
discover(): Discover the specific accelerator.
update(control_path, image_path): Upgrade the device firmware with a specific image.
control_path: the image update control path of the device.
image_path: the path of the firmware binary image. The image is downloaded by the Cyborg agent.
get_stats(): Collect device stats. The get_stats method collects information from the device about its capabilities and state, such as temperature, power, voltage, and packet count. The response of get_stats should follow the current Cyborg device-deploy-accelerator model. A sample response looks like:

    {
      "device": {
        "device_name": "XXX",                # Standard properties
        "device_number": "RFD1644N48373",    # Standard properties
        "properties": {                      # Vendor/Custom properties
          "id": "1",
          "temperature": "26",
          "volt": "",
          "packet_count": "",
          "memory": {
            "model": "DDR4",
            "description": ""
          },
          "board": { },
          "flash": { }
        },
        "deploy": [
          {
            "accelerator": {
              "acc_name": "",                # Standard properties
              "properties": {                # Vendor/Custom properties
              }
            }
          },
          {
            "accelerator": {
              "acc_name": "",                # Standard properties
              "properties": {                # Vendor/Custom properties
              }
            }
          }
        ]
      }
    }
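As a rough illustration of how a consumer might check a get_stats response against the fields the sample marks as standard, the sketch below walks the device-deploy-accelerator structure and reports what is missing. The helper name `missing_standard_fields` is hypothetical, and treating `device_name`, `device_number`, and `acc_name` as required is an assumption drawn only from the "Standard properties" comments in the sample, not from an existing Cyborg API.

```python
def missing_standard_fields(stats):
    """Return the standard fields missing from a get_stats() style response.

    Hypothetical helper: the required keys are an assumption based on the
    "Standard properties" markers in the sample response.
    """
    device = stats.get("device")
    if not isinstance(device, dict):
        return ["device"]

    missing = []
    # Device-level standard properties.
    for key in ("device_name", "device_number"):
        if key not in device:
            missing.append("device." + key)

    # Each deployed accelerator should carry its own standard name.
    for index, entry in enumerate(device.get("deploy", [])):
        accelerator = entry.get("accelerator", {})
        if "acc_name" not in accelerator:
            missing.append("device.deploy[%d].accelerator.acc_name" % index)
    return missing


sample = {
    "device": {
        "device_name": "XXX",
        "device_number": "RFD1644N48373",
        "properties": {"id": "1", "temperature": "26"},
        "deploy": [
            {"accelerator": {"acc_name": "", "properties": {}}},
            {"accelerator": {"properties": {}}},  # acc_name left out on purpose
        ],
    }
}

for field in missing_standard_fields(sample):
    print("missing:", field)
```

Vendor or custom values under `properties` are deliberately not checked, since the sample leaves their content up to each driver.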
This GenericDriver should not be used directly. Once it is added, every new device needs a new driver that inherits from GenericDriver.
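The attributes and interfaces listed above could be sketched as an abstract base class, which also enforces the rule that GenericDriver is not used directly. This is a minimal sketch, not the actual Cyborg implementation: the use of Python's `abc` module is an assumption, and `ExampleAIChipDriver`, its `"AI_CHIP"` type string, and all returned values are hypothetical placeholders.

```python
import abc


class GenericDriver(abc.ABC):
    """Sketch of the proposed base class; not meant to be instantiated."""

    VENDOR = None  # vendor name of the driver
    TYPE = None    # driver type, such as "FPGA" or "GPU"

    @abc.abstractmethod
    def discover(self):
        """Discover the specific accelerators."""

    @abc.abstractmethod
    def update(self, control_path, image_path):
        """Upgrade the device firmware with the image at image_path."""

    @abc.abstractmethod
    def get_stats(self):
        """Collect stats in the device-deploy-accelerator model."""


class ExampleAIChipDriver(GenericDriver):
    """Hypothetical vendor driver; every value below is a placeholder."""

    VENDOR = "example_vendor"
    TYPE = "AI_CHIP"  # hypothetical type name, not defined by this spec

    def discover(self):
        return [{"device_name": "example-ai-chip-0"}]

    def update(self, control_path, image_path):
        # A real driver would push the image through control_path here.
        return {"control_path": control_path, "image_path": image_path}

    def get_stats(self):
        return {
            "device": {
                "device_name": "example-ai-chip-0",
                "device_number": "",
                "properties": {"temperature": "26"},
                "deploy": [],
            }
        }


driver = ExampleAIChipDriver()
print(driver.VENDOR, driver.TYPE, driver.discover())
```

With this layout, calling `GenericDriver()` raises `TypeError` because the abstract methods are not implemented, so only concrete per-device subclasses can be created.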
2. Existing driver change
We should also improve the FPGA and GPU drivers so that they inherit from this driver and implement the base driver interface.
The generic FPGA driver interface:
The discover, update, and get_stats interfaces should be implemented in the FPGA driver. The interface below is specific to FPGAs:
program(control_path, image_path): Program the FPGA with the provided bitstream image.
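One way to picture the FPGA driver is as a subclass that implements the base interfaces and adds `program` on top. The sketch below is illustrative only: the compact `GenericDriver` stand-in, the `ExampleFPGADriver` class, the in-memory dictionaries that stand in for hardware state, and the paths are all hypothetical, and a real driver would talk to the device instead.

```python
import abc


class GenericDriver(abc.ABC):
    """Compact stand-in for the proposed base class (assumed shape)."""

    VENDOR = None
    TYPE = None

    @abc.abstractmethod
    def discover(self):
        """Discover the specific accelerators."""

    @abc.abstractmethod
    def update(self, control_path, image_path):
        """Upgrade the device firmware."""

    @abc.abstractmethod
    def get_stats(self):
        """Collect device stats."""


class ExampleFPGADriver(GenericDriver):
    """Hypothetical FPGA driver that records calls instead of using hardware."""

    VENDOR = "example_vendor"
    TYPE = "FPGA"

    def __init__(self):
        self.firmware = {}    # control_path -> firmware image path
        self.bitstreams = {}  # control_path -> programmed bitstream path

    def discover(self):
        return [{"device_name": "example-fpga-0"}]

    def update(self, control_path, image_path):
        self.firmware[control_path] = image_path

    def get_stats(self):
        # One deploy entry per programmed bitstream (an illustrative choice).
        deploy = [
            {"accelerator": {"acc_name": image, "properties": {}}}
            for image in self.bitstreams.values()
        ]
        return {
            "device": {
                "device_name": "example-fpga-0",
                "device_number": "",
                "properties": {},
                "deploy": deploy,
            }
        }

    def program(self, control_path, image_path):
        """FPGA-specific interface: program the device with a bitstream."""
        self.bitstreams[control_path] = image_path


fpga = ExampleFPGADriver()
fpga.program("/example/control/path", "/example/images/bitstream.bin")
print(fpga.get_stats()["device"]["deploy"])
```

Keeping `program` out of the base class means drivers for devices without a bitstream concept, such as GPUs, only need the three shared interfaces.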
The generic GPU driver interface:
The discover, update, and get_stats interfaces should be implemented in the GPU driver.
Alternatives
None
Data model impact
None
REST API impact
None
Security impact
None
Notifications impact
None
Other end user impact
None
Performance Impact
None
Other deployer impact
None
Developer impact
None
Implementation
Assignee(s)
Primary assignee:
Yikun Jiang <yikunkero@gmail.com>
Sundar Nadathur <sundar.nadathur@intel.com>
wangzhh <wangzh21@lenovo.com>
Work Items
Improve the generic driver for generic devices
Improve the existing FPGA driver
Improve the existing GPU driver
Dependencies
Testing
Documentation Impact
None
References
None
History
Release Name | Description
---|---
Train | Introduced