Promote agent vendor passthru to core API¶
https://bugs.launchpad.net/ironic/+bug/1570841
This spec suggests making the current agent vendor passthru API (lookup and heartbeat) first class API endpoints and deprecate the agent vendor passthru interface.
Problem description¶
The vendor passthru was designed for vendors to place their specific features before they get wider adoption in Ironic and get promoted to the core API.
However, when IPA is used (which is the only ramdisk available in-tree), these two API endpoints play the critical role in both deployment and cleaning processes. Thus every IPA-based driver must mix agent vendor passthru into its vendor passthru.
There was a bug in the drac driver when it was not done, and the driver did not work with IPA.
This proposal also tries to reduce the amount of data sent to and from unauthenticated endpoints. The current vendor passthru API accepts the whole inventory and returns the whole node record, including IPMI credentials.
Proposed change¶
Create new API endpoints for lookup and heartbeat - see REST API impact for details.
Extend the deploy interface with a heartbeat method - see Driver API impact for details.
Alternatives¶
Continue doing what we do now.
Make the lookup call driver-dependent (as the passthru used to be).
This looks like unnecessary complication (e.g. we have to pass a driver to IPA from the conductor nowadays).
We could use this change to move the unauthenticated endpoints away from the main API completely. This could be done by introducing a new API service, say
ironic-agent-api
, serving only these two endpoints. Then we will recommend operators to make this service only listen on the provisioning network, but not on the network accessible to users.Arguably the same result can be achieved by configuring a WSGI container (Apache mod_wsgi or similar), so it might not be worth complication.
Data model impact¶
None
State Machine Impact¶
None
REST API impact¶
Two new endpoints are added. Both endpoints are NOT authenticated.
GET /v1/lookup?addresses=MAC1,MAC2&node_uuid=UUID
Look up node details for further use in the ramdisk.
No body; at least one of the following URL parameters must be present:
addresses
comma-separated list of hardware addresses (e.g. MAC) from the node for lookup;node_uuid
node UUID, if known (e.g. by inspection).
If
node_uuid
is present,addresses
are ignored.By default only return a node if it’s in one of transient states:
deploying
,deploy wait
,cleaning
,clean wait
,inspecting
,inspect wait
. Deployers who need lookup to always work will be able to set a new option[api]restrict_lookup
toFalse
.Note
In theory, we don’t need
-ing
states here either. But when we reboot during cleaning, we don’t currently reset the state toclean wait
. The other-ing
states are supported in case 3rd party drivers have similar restrictions.Response: HTTP 200 with JSON body containing keys:
config
dictionary for passing configuration options from conductor to the ramdisk. For the IPA ramdisk only one is currently used:heartbeat_timeout
timeout (in seconds) between heart beats from the ramdisk, expected by Ironic.
node
partial node representation as a JSON object, with the following fields sent:properties
for root device hints,instance_info
for disk sizing details,uuid
node UUID,driver_internal_info
for passing other runtime information.
More fields can be exposed with time with an appropriate API version bump.
Error codes:
400 - bad request,
404 - a node was not found.
POST /v1/heartbeat/<UUID>
Record a heartbeat message from the ramdisk.
Body is a JSON with fields:
callback_url
- the IPA URL to call back. Note that for potential non-IPA-based drivers it might have a different meaning (e.g. if we agree on the ansible driver, this can be an SSH “URL” for it).
Response: HTTP 202 with no body.
Error codes:
400 - bad request,
404 - a node was not found,
409 - node is locked (should be retried by the ramdisk).
A new API version will be introduced to cover both endpoints.
Client (CLI) impact¶
Both endpoints will be exposed in the Python API for the ironic client as:
ironic.node.lookup(addresses, node_uuid=None)
ironic.node.heartbeat(node_uuid, callback_url)
However, as they are not intended for end users, they will not be exposed in both CLI.
“ironic” CLI¶
None
“openstack baremetal” CLI¶
None
RPC API impact¶
A new RPC call is created to connect the heart beat API endpoint and
the new deploy driver method: heartbeat
(async).
Driver API impact¶
A new method is added to the deploy driver interface:
def heartbeat(self, task, callback_url):
"""Record a heart beat for the node.
:param task: a task manager task
:param callback_url: a URL to use to call to the ramdisk
:return: None
"""
LOG.warning('Got heartbeat message from node %(node)s, but the driver '
'%(driver)s does not support heartbeating',
{'node': task.node.uuid, 'driver': task.node.driver})
The heartbeat method from BaseAgentVendor
will be refactored to a separate
mix-in class for reusing in both AgentDeploy
and BaseAgentVendor
.
The new method will not be abstract to allow drivers that use a different approach (e.g. which do not have a ramdisk at all). The default implementation will do nothing to account for deploy drivers which do not need heart beats.
The new method will receive a shared node lock. It is up to the implementation to upgrade the lock to exclusive, if required.
Nova driver impact¶
None
Ramdisk impact¶
None
Security impact¶
This change will expose unauthenticated API to lookup a node by its MAC addresses. It does not have any impact on most deployments, as both in-tree deploy methods (iscsi and http) already expose such API.
After the complete switch to the new API endpoints is finished, it will no longer be possible to fetch the whole node knowing its MAC address without authentication. Only limited fields will be available. Notably, the power credentials are not sent in the new API endpoints.
We should clearly note that any deploy implementation should treat the incoming data in the new
heartbeat
call with care. Particularly, no sensitive information should be ever sent to the endpoint designated by thecallback_url
parameter.
Other end user impact¶
None
Scalability impact¶
None
Performance Impact¶
Unlike the old lookup passthru, the new lookup endpoint will not use RPC, lowering load on the message queue and the conductor.
Other deployer impact¶
An update of the IPA image will be recommended to make it use the new API.
New option
restrict_lookup
in theapi
section (boolean, defaults toTrue
) - whether to restrict the new lookup API to only certain states in which lookup is expected.
Developer impact¶
3rd party driver developers should stop using the BaseAgentVendor
class
in their drivers and just use the AgentDeploy
class.
3rd party drivers should document whether they require the restrict_lookup
option to be False
for correct functioning.
Implementation¶
Assignee(s)¶
Dmitry Tantsur (lp: divius, irc: dtanstur)
Jim Rollenhagen (irc: jroll)
Work Items¶
Create new deploy interface methods
Implement them in the AgentDeploy
Create new RPC calls and API endpoints
Switch IPA to use the new endpoints, and fall back to old ones on failure
Dependencies¶
None
Testing¶
Testing will be conducted as part of the current gate tests.
Upgrades and Backwards Compatibility¶
Full backward compatibility will be guaranteed independent of upgrade order between IPA and ironic itself.
The BaseAgentVendor
class will be deprecated, but stay for some time,
following the usual deprecation policy. Old IPA images will be able to run by
using the old passthru API.
The new IPA image will try to hit the new endpoints first, and will fall back to the old ones on getting HTTP 406 Not Acceptable (meaning, the API version is not supported).
Documentation Impact¶
Document how to implement new deploy drivers with the new heartbeat method.
Document the potential security issues with both endpoints.