Active Node Creation

https://bugs.launchpad.net/ironic/+bug/1526315

This spec proposes a slightly more permissive interaction with the ironic API so that an operator can migrate an existing hardware fleet to be managed by ironic.

Problem description

At present the ironic API explicitly sets the state of newly created nodes to the first step in the ironic workflow.

As part of hardware fleet lifecycle management, an operator expects to be able to migrate inventory and control systems for their hardware fleet utilizing their existing inventory data and allocation records. Ultimately this means that an imported host MAY already be allocated and unavailable for immediate allocation.

For an operator of multiple distinct OpenStack infrastructures, it is reasonable to permit migrating running baremetal hosts from one “system” to another “system”, where both are ultimately components of a larger infrastructure, without immediately reprovisioning the hardware.

Proposed change

Allow an API client to transition a node directly from the MANAGEABLE state to the ACTIVE state, bypassing actual deployment of the node.

  • Creation of a new API provision_state verb of adopt that invokes the state transition of ADOPTING.

  • Creation of a new machine state transition of ADOPTING which is valid only in the MANAGEABLE state and allows an operator to directly move a node to ACTIVE state. This transition depends upon a successful interface validation. Failure of this transition shall move the node to the ADOPTFAIL state, which allows users to identify nodes that failed adoption.

  • Creation of a new machine state of ADOPTFAIL which a machine is set to upon the ADOPTING transition failing. This state will allow a user to re-attempt ADOPTING via adopt, or attempt to transition the node to the MANAGEABLE state via manage. Additionally, the ADOPTFAIL state will be listed in the list of states that permit node deletion from the ironic database.

  • API client update to provide CLI interface to invoke this feature.

  • Creation of explicit documentation covering:

    - Use cases of the feature while explicitly predicating that proper
      operation requires node validation to succeed.
    - Explicitly detail that it is the operator's responsibility to
      define the node with all relevant configuration; otherwise the
      node could fail the provision state operations of ``rebuild``
      and ``delete``, which would make manual intervention
      necessary.
    - Explain the basic mechanics of the use of the adoption feature
      to users in order to help convey the importance of the correct
      information being populated.
    

Alternatives

The logical alternative is to remove restrictions on what an API client posts, allowing the caller to explicitly request that a node be created in ACTIVE state. As the community desires that the node be fully functional upon import, along with driver interface validation, the feature lends itself to implementation as a state transition rather than pure API logic.

Alternatively, we could craft operator documentation to assist operators in directly loading nodes into the ironic database, coupled with the caveats of doing so, and require that the documentation be updated in lock-step with any database schema changes.

Data model impact

None

State Machine Impact

Implementation of a new state transition from MANAGEABLE state to ACTIVE state utilizing an intermediate state of ADOPTING, which performs the following actions:

  1. Triggering the conductor node take_over logic.

  2. Upon completion the node state is set to ACTIVE.

Should the take_over logic fail in the ADOPTING state, the node will be moved to ADOPTFAIL state, from which a user will be able to retry the adoption operation, return the node to MANAGEABLE state via manage, or delete the node.

Addition of ADOPTFAIL into the DELETE_ALLOWED_STATES list.
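
The transitions described in this section can be sketched as a small lookup table. This is a simplified illustration only, not ironic's state machine code: the state string values, the ``done`` and ``fail`` event names, and the ``take_over`` callable standing in for the conductor take_over logic are all assumptions made for the example.

```python
# Simplified, hypothetical model of the proposed transitions.
# State string values and the "done"/"fail" event names are assumptions.
MANAGEABLE = "manageable"
ADOPTING = "adopting"
ADOPTFAIL = "adopt failed"
ACTIVE = "active"

# (current state, verb or internal event) -> next state
TRANSITIONS = {
    (MANAGEABLE, "adopt"): ADOPTING,
    (ADOPTING, "done"): ACTIVE,
    (ADOPTING, "fail"): ADOPTFAIL,
    (ADOPTFAIL, "adopt"): ADOPTING,
    (ADOPTFAIL, "manage"): MANAGEABLE,
}

# Only the addition proposed here is shown; existing entries are omitted.
DELETE_ALLOWED_STATES = {ADOPTFAIL}


def next_state(state, event):
    """Return the state reached from ``state`` on ``event``."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state!r}")


def adopt(state, take_over):
    """Run the adoption flow and return the resulting state.

    ``take_over`` is a stand-in callable for the conductor take_over logic.
    """
    state = next_state(state, "adopt")
    try:
        take_over()
    except Exception:
        return next_state(state, "fail")
    return next_state(state, "done")
```

With this table, ``adopt(MANAGEABLE, take_over)`` yields ACTIVE when ``take_over`` succeeds and ADOPTFAIL when it raises, and the adopt verb is rejected from any state other than MANAGEABLE or ADOPTFAIL.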

REST API impact

Addition of a new state verb of adopt that triggers a transition to ADOPTING state. This verb will be unavailable to clients requesting an insufficient API version.

The API micro-version will need to be incremented as a result of this change.
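
Gating the verb on the requested version could look roughly like the sketch below. The minimum version number is a placeholder, since this spec does not state which micro-version introduces adopt, and the function names are hypothetical rather than taken from the ironic API code.

```python
# Placeholder value: the spec does not state the actual micro-version.
ADOPT_MIN_VERSION = (1, 17)


def parse_version(value):
    """Turn a "major.minor" version string into a comparable tuple."""
    major, minor = value.split(".")
    return int(major), int(minor)


def is_verb_allowed(verb, requested_version):
    """Return False when ``adopt`` is requested with too old a version."""
    if verb == "adopt":
        return parse_version(requested_version) >= ADOPT_MIN_VERSION
    # Other verbs are outside the scope of this sketch.
    return True
```

Comparing integer tuples rather than strings matters here, because a string comparison would wrongly order "1.9" after "1.17".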

Client (CLI) impact

Update of the ironicclient CLI to detail that adopt is a possible provision state option.

Update of the ironicclient micro-version.

RPC API impact

None

Driver API impact

None

Nova driver impact

None

Ramdisk impact

N/A

Security impact

None

Other end user impact

None

Scalability impact

None

Performance Impact

The API impact is minimal. A user of this feature must make multiple API calls to bring a node to ACTIVE state.

Users performing bulk loads of hosts may find this somewhat problematic, as each node requires separate API calls to create, validate, and adopt it, on top of API calls to poll the current state of the node before proceeding to the next step. Bulk loaders should also be cognizant of their configurations, as these could result in the conductors consuming disk space and network bandwidth if items need to be staged on the conductor to support the node’s normal operation.
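
To make the per-node call count concrete, the sketch below walks one node through the sequence against an in-memory stand-in. ``FakeIronic`` and all of its methods are hypothetical and do not reflect the ironicclient interface; the initial ``enroll`` state and the extra ``manage`` step (needed because adoption is valid only from MANAGEABLE) are assumptions of the example.

```python
import time


class FakeIronic:
    """In-memory stand-in for an API client; every method is hypothetical."""

    def __init__(self):
        self.states = {}
        self.pending = {}
        self.calls = 0  # counts simulated API requests

    def create_node(self, name):
        self.calls += 1
        self.states[name] = "enroll"  # assumed initial state

    def validate(self, name):
        self.calls += 1
        return True

    def set_provision_state(self, name, verb):
        self.calls += 1
        # Transitions are asynchronous; the fake settles on the next poll.
        self.pending[name] = {"manage": "manageable", "adopt": "active"}[verb]

    def get_state(self, name):
        self.calls += 1
        if name in self.pending:
            self.states[name] = self.pending.pop(name)
        return self.states[name]


def wait_for(client, name, target, attempts=10, delay=0.0):
    """Poll until the node reaches ``target`` or attempts run out."""
    for _ in range(attempts):
        if client.get_state(name) == target:
            return
        time.sleep(delay)
    raise TimeoutError(f"{name} did not reach {target}")


def adopt_node(client, name):
    """Create, validate, and adopt one node, polling between steps."""
    client.create_node(name)
    client.set_provision_state(name, "manage")
    wait_for(client, name, "manageable")
    if not client.validate(name):
        raise RuntimeError(f"validation failed for {name}")
    client.set_provision_state(name, "adopt")
    wait_for(client, name, "active")
```

Even in this best case, where each poll succeeds on the first attempt, the fake records six requests per node, which is the overhead a bulk loader would multiply across its inventory.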

Other deployer impact

Allows easier adoption of ironic by managers of pre-existing hardware fleets.

There is the potential that an operator could define a hardware fleet with the bare minimum configuration needed to initially add the nodes to ironic. As a result, an operator could inadvertently act upon a node for which insufficient information is defined. This risk will be covered in the resulting documentation, together with guidance on preventing it when a user attempts to adopt an inventory that is already “cloudy”.

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

juliaashleykreger

Other contributors:

None

Work Items

  • Conductor State Machine Changes

  • API microversion update and appropriate logic

  • CLI node-set-provision-state option addition

  • Documentation updates

Dependencies

None

Testing

Addition of unit tests as well as tempest tests to explicitly test the interface.

Upgrades and Backwards Compatibility

This feature will not be usable by older API clients via the API micro-version interface.

Documentation Impact

Documentation will need to be updated to represent this new feature.

References

None