This blueprint extends ML2 port binding to support hierarchical network topologies.
Note to reviewers: A similar blueprint was accepted for the Juno release and successfully implemented, but the review was not completed early enough in the cycle for the code to be merged. This specification has been significantly updated to reflect decisions made during implementation and in response to review comments, as well as to improve understandability. It describes the final state of the code reviewed during Juno. Other than rebasing, no significant changes should be required to the previously reviewed code.
The ML2 plugin does not adequately support hierarchical network topologies. A hierarchical virtual network uses different network segments, potentially of different types (VLAN, VXLAN, GRE, proprietary fabric, ...), at different levels. It might be made up of one or more top-level static network segments along with dynamically allocated network segments at lower levels. For example, virtual network traffic between ToR and core switches could be encapsulated using VXLAN segments, while traffic for those same virtual networks between the ToR switches and the compute nodes could use dynamically allocated VLAN segments.
                        +-------------+
                        |             |
                        | Core Switch |
                        |             |
                        +---+-----+---+
                      VXLAN |     | VXLAN
                +-----------+     +------------+
                |                              |
         +------+-----+                 +------+-----+
         |            |                 |            |
         | ToR Switch |                 | ToR Switch |
         |            |                 |            |
         +---+---+----+                 +---+----+---+
        VLAN |   | VLAN                VLAN |    | VLAN
     +-------+   +----+                +----+    +------+
     |                |                |                |
+----+----+      +----+----+      +----+----+      +----+----+
|         |      |         |      |         |      |         |
| Compute |      | Compute |      | Compute |      | Compute |
| Node    |      | Node    |      | Node    |      | Node    |
|         |      |         |      |         |      |         |
+---------+      +---------+      +---------+      +---------+
Dynamically allocating segments at lower levels of the hierarchy is particularly important in allowing Neutron deployments to scale beyond the 4K limit per physical network for VLANs. VLAN allocations can be managed at lower levels of the hierarchy, allowing many more than 4K virtual networks to exist and be accessible to compute nodes as VLANs, as long as each link from ToR switch to compute node needs no more than 4K VLANs simultaneously.
Support for allocating dynamic segments was merged to ML2 during Juno. But there is currently no way for these dynamic segments to be allocated by one mechanism driver supporting the ToR switch, and used by a different mechanism driver supporting the L2 agent on the compute node. What is needed is the ability for an initial mechanism driver to partially bind to a static segment of a port’s virtual network, and provide a dynamic segment that can be bound by a different mechanism driver at the next level, continuing until the binding is complete.
Note that the diagram above shows static VXLAN segments connecting ToR and core switches, but this most likely isn’t the current ‘vxlan’ network_type where tunnel endpoints are managed via RPCs between the Neutron server and L2 agents. It would instead be a network_type specific to both the encapsulation format and the way tunnel endpoints are managed among the switches. Each network_type value should identify a well-defined standard or proprietary protocol, enabling interoperability where desired, and coexistence within heterogeneous deployments.
ML2 will support hierarchical network topologies by binding ports to mechanism drivers and network segments at each level of the hierarchy. For example, one mechanism driver might bind to a static VXLAN segment of the network, causing a ToR switch to bridge that network to a dynamically allocated VLAN on the link(s) to the compute node(s) connected to that switch, while a second mechanism driver, such as the existing OVS or HyperV driver, would bind the compute node to that dynamic VLAN.
Supporting hierarchical network topologies impacts the ML2 driver APIs and ML2’s port binding data model, but does not change Neutron’s REST APIs in any way.
A new function and property will be added to the PortContext class in the driver API to enable hierarchical port binding.
class PortContext(object):
    # ...

    @abc.abstractmethod
    def continue_binding(self, segment_id, next_segments_to_bind):
        pass

    @abc.abstractproperty
    def segments_to_bind(self):
        pass
The new continue_binding() method can be called from within a mechanism driver’s bind_port() method as an alternative to the existing set_binding() method. As is currently the case, if a mechanism driver can complete a binding, it calls PortContext.set_binding(segment_id, vif_type, vif_details, status). If the mechanism driver can only partially establish a binding, it will instead call continue_binding(segment_id, next_segments_to_bind). If the mechanism driver cannot bind at all, it simply returns without calling either of these.
As with set_binding(), the segment_id passed to continue_binding() indicates the segment that this driver is binding to. The next_segments_to_bind parameter specifies the set of network segments that can be used by the next stage of binding for the port. It will typically contain a dynamically allocated segment that the next driver will use to continue or complete the binding.
Currently, mechanism drivers try to bind using the segments from the PortContext.network.network_segments property. These are the network’s static segments. The new PortContext.segments_to_bind property should now be used instead by all drivers. For the initial stage of binding, it will contain the same segments as PortContext.network.network_segments. But for subsequent stages, it will contain the segment(s) passed as next_segments_to_bind to PortContext.continue_binding() by the previous stage driver.
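The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reviewed implementation: FakePortContext, TorSwitchDriver, AgentDriver, the ‘fabric_vxlan’ network_type, the segment dictionary layout, and the VLAN numbering are all hypothetical stand-ins. Only the continue_binding(), set_binding(), and segments_to_bind names come from this specification.

```python
# Hypothetical stand-ins for illustration only; not Neutron's real classes.
class FakePortContext:
    """Records what a driver does during one stage of binding."""

    def __init__(self, segments_to_bind):
        self.segments_to_bind = segments_to_bind
        self.result = None  # stays None if the driver cannot bind

    def continue_binding(self, segment_id, next_segments_to_bind):
        self.result = ("continue", segment_id, next_segments_to_bind)

    def set_binding(self, segment_id, vif_type, vif_details, status=None):
        self.result = ("set", segment_id, vif_type, vif_details, status)


class TorSwitchDriver:
    """Partially binds a static fabric segment and hands down a VLAN."""

    def __init__(self, physical_network):
        self.physical_network = physical_network
        self._next_vlan = 100  # arbitrary starting VLAN ID for the sketch

    def _allocate_dynamic_vlan(self):
        # Assumption: a real driver would request a dynamic segment from ML2.
        segment = {"id": "dyn-%d" % self._next_vlan,
                   "network_type": "vlan",
                   "physical_network": self.physical_network,
                   "segmentation_id": self._next_vlan}
        self._next_vlan += 1
        return segment

    def bind_port(self, context):
        for segment in context.segments_to_bind:
            if segment["network_type"] == "fabric_vxlan":
                context.continue_binding(segment["id"],
                                         [self._allocate_dynamic_vlan()])
                return
        # Cannot bind: return without calling either method.


class AgentDriver:
    """Completes the binding on a VLAN the host's agent has a mapping for."""

    def __init__(self, mapped_physical_networks):
        self.mapped = set(mapped_physical_networks)

    def bind_port(self, context):
        for segment in context.segments_to_bind:
            if (segment["network_type"] == "vlan"
                    and segment["physical_network"] in self.mapped):
                context.set_binding(segment["id"], "ovs",
                                    {"port_filter": True}, "ACTIVE")
                return


# First stage: segments_to_bind holds the network's static segments.
static_segments = [{"id": "static-1", "network_type": "fabric_vxlan",
                    "physical_network": None, "segmentation_id": 5000}]
top = FakePortContext(static_segments)
TorSwitchDriver("tor1").bind_port(top)

# Second stage: segments_to_bind holds what the previous driver passed on.
_, _, next_segments = top.result
bottom = FakePortContext(next_segments)
AgentDriver({"tor1"}).bind_port(bottom)
```

In this sketch the agent driver never sees the static segment; it only iterates over segments_to_bind, which is why existing drivers need to switch to that property.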
The ML2 plugin currently tries to bind using all registered mechanism drivers in the order they are specified in the mechanism_drivers config variable. To support hierarchical binding, only a minor change is needed to avoid any possibility of binding loops in misconfigured or misbehaving deployments. At each stage of binding, any drivers that have already bound at a higher level using any of the current level’s set of segments to bind are excluded. This approach allows the same driver to partially bind at multiple levels of a hierarchical network using different segments, but not using the same segment. Also, if a limit on the total number of binding levels is exceeded, binding will fail and an error will be logged.
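The loop-avoidance rule and the level limit can be illustrated with a simplified sketch. It makes several assumptions that are not part of this specification: drivers are modelled as plain callables that return their outcome instead of calling methods on a PortContext, the limit value of 10 is arbitrary, and the function and key names are hypothetical.

```python
MAX_BINDING_LEVELS = 10  # assumed value; the text does not specify the limit


def bind_port_levels(drivers, static_segments, max_levels=MAX_BINDING_LEVELS):
    """Return a list of binding levels, or None if binding fails.

    drivers is an ordered list of (name, try_bind) pairs. try_bind(segments)
    returns None if it cannot bind, (segment_id, None) for a complete
    binding, or (segment_id, next_segments) for a partial binding.
    """
    levels = []
    segments = list(static_segments)
    while len(levels) < max_levels:
        segment_ids = {seg["id"] for seg in segments}
        for name, try_bind in drivers:
            # Skip drivers that already bound at a higher level using any
            # of the segments offered at this level.
            already_used = any(
                lvl["driver"] == name and lvl["segment_id"] in segment_ids
                for lvl in levels)
            if already_used:
                continue
            outcome = try_bind(segments)
            if outcome is None:
                continue
            segment_id, next_segments = outcome
            levels.append({"driver": name, "segment_id": segment_id})
            if next_segments is None:
                return levels  # fully bound
            segments = list(next_segments)
            break  # move on to the next level
        else:
            return None  # no driver could bind at this level
    return None  # limit on binding levels exceeded


def tor(segments):
    # Hypothetical switch driver: partially binds a fabric segment.
    for seg in segments:
        if seg["network_type"] == "fabric_vxlan":
            return seg["id"], [{"id": "dyn-1", "network_type": "vlan"}]
    return None


def agent(segments):
    # Hypothetical agent driver: completes the binding on a VLAN segment.
    for seg in segments:
        if seg["network_type"] == "vlan":
            return seg["id"], None
    return None


def misbehaving(segments):
    # Hands the same segments back, which would loop without the exclusion.
    return segments[0]["id"], segments


static = [{"id": "static-1", "network_type": "fabric_vxlan"}]
good = bind_port_levels([("tor", tor), ("agent", agent)], static)
looped = bind_port_levels([("bad", misbehaving)], static)
```

Here the misbehaving driver binds once, is then excluded because the same segment is offered again, and binding fails cleanly instead of looping.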
Finally, to enable mechanism drivers to see the details of hierarchical (as well as normal) bindings, the PortContext’s bound_segment, original_bound_segment, bound_driver, and original_bound_driver properties will be replaced with a new set of properties:
class PortContext(object):
    # ...

    @abc.abstractproperty
    def binding_levels(self):
        pass

    @abc.abstractproperty
    def original_binding_levels(self):
        pass

    @abc.abstractproperty
    def top_bound_segment(self):
        pass

    @abc.abstractproperty
    def original_top_bound_segment(self):
        pass

    @abc.abstractproperty
    def bottom_bound_segment(self):
        pass

    @abc.abstractproperty
    def original_bottom_bound_segment(self):
        pass
The binding_levels and original_binding_levels properties return a list of dictionaries describing each binding level if the port is/was fully or partially bound. Keys for BOUND_DRIVER and BOUND_SEGMENT are defined, and additional keys can be added in future versions if needed. The first entry describes the top-level binding, which always involves one of the port’s network’s static segments. When fully bound, the last entry describes the bottom-level binding that supplied the port’s binding:vif_type and binding:vif_details attribute values.
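As an illustration of the shape of this data, the following sketch shows what binding_levels might contain for a fully bound two-level port, and how the top and bottom entries relate to it. The string values of the keys, the driver names, and the segment dictionaries are assumptions made for the example; the helper functions are hypothetical and only mirror the property names proposed above.

```python
# Assumed string values for the keys named in the text.
BOUND_DRIVER = "bound_driver"
BOUND_SEGMENT = "bound_segment"

# Hypothetical contents for a port bound through a ToR switch and an agent.
binding_levels = [
    {BOUND_DRIVER: "tor_switch_driver",
     BOUND_SEGMENT: {"id": "static-1", "network_type": "fabric_vxlan"}},
    {BOUND_DRIVER: "openvswitch",
     BOUND_SEGMENT: {"id": "dyn-100", "network_type": "vlan"}},
]


def top_bound_segment(levels):
    """Segment of the first (top-level) binding, or None if unbound."""
    return levels[0][BOUND_SEGMENT] if levels else None


def bottom_bound_segment(levels):
    """Segment of the last binding level, or None if unbound.

    This is the bottom-level binding only when the port is fully bound.
    """
    return levels[-1][BOUND_SEGMENT] if levels else None


top = top_bound_segment(binding_levels)
bottom = bottom_bound_segment(binding_levels)
```

For a non-hierarchical binding the list has a single entry, so the top and bottom segments are the same.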
In the presence of hierarchical bindings, some mechanism drivers that used the old bound_segment, original_bound_segment, bound_driver, and/or original_bound_driver properties need to access the top-level binding, while other drivers need to access the bottom-level binding. Therefore, the old properties will be replaced by new sets of properties providing access to each of these.
In order to store multiple levels of binding information for access by mechanism drivers, changes to the ML2 database schema are required. The driver and segment columns will be moved from the existing ml2_port_bindings and ml2_dvr_port_bindings tables to a new ml2_port_binding_levels table. This table will have port_id, host, and level columns as primary keys. No separate DVR-specific table will be needed.
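A rough sketch of the proposed table, using Python's built-in sqlite3 module rather than Neutron's actual models and migrations, may help show why the composite key suffices. The column types, the sample rows, and the binding_levels_for() helper are assumptions for illustration; only the table name, the key columns, and the driver and segment columns come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ml2_port_binding_levels (
        port_id TEXT NOT NULL,
        host    TEXT NOT NULL,
        level   INTEGER NOT NULL,
        driver  TEXT,
        segment TEXT,
        PRIMARY KEY (port_id, host, level)
    )
""")

# Hypothetical rows: one port bound at two levels on one host, and the same
# port bound on a second host, as a distributed router port might be.
rows = [
    ("port-1", "compute-1", 0, "tor_switch_driver", "static-1"),
    ("port-1", "compute-1", 1, "openvswitch", "dyn-100"),
    ("port-1", "compute-2", 0, "openvswitch", "static-2"),
]
conn.executemany(
    "INSERT INTO ml2_port_binding_levels VALUES (?, ?, ?, ?, ?)", rows)


def binding_levels_for(conn, port_id, host):
    """Return (driver, segment) pairs for one port and host, top level first."""
    cursor = conn.execute(
        "SELECT driver, segment FROM ml2_port_binding_levels "
        "WHERE port_id = ? AND host = ? ORDER BY level",
        (port_id, host))
    return cursor.fetchall()


levels = binding_levels_for(conn, "port-1", "compute-1")
```

Because host is part of the key, per-host bindings of the same port fit in the same table, which is consistent with the statement that no DVR-specific table is needed.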
A DB migration will be provided that moves existing binding data to the new table on upgrade. Downgrades will preserve existing single-level binding data, but there is no sensible way to preserve existing multi-level bindings on downgrade.
No REST API changes are proposed in this specification.
Using the existing providernet and multiprovidernet API extensions, only the top-level static segments of a network are accessible. There is no current need to expose dynamic segments through REST APIs. The portbindings extension could potentially be modified in the future to expose more information about multi-level bindings if needed.
As mechanism drivers for specific network fabric technologies are developed, new network_type values may be defined that will be visible through the providernet and multiprovidernet extensions. But no new network_type values are being introduced in this specific BP.
Hierarchical network topologies should enable OpenStack deployments to scale to larger numbers of networks.
Hierarchical network topologies may improve tenant network throughput by utilizing low-overhead encapsulation techniques such as VLAN at the compute node level, while specialized network switches handle the more advanced encapsulation needed to scale to very large numbers of networks.
No impact, since this blueprint is at layer 2.
No change is required when deploying non-hierarchical network topologies. When deploying hierarchical network topologies, additional mechanism drivers will need to be configured. Additionally, when VLANs are used for the bottom-level bindings, L2 agent configurations will be impacted as described in the next few paragraphs.
Mechanism drivers determine whether they can bind to a network segment by looking at the network_type and any other relevant information. For example, if network_type is ‘flat’ or ‘vlan’, the L2 agent mechanism drivers look at the physical_network and, using agents_db info, make sure the L2 agent on that host has a mapping for the segment’s physical_network. This is how the existing mechanism drivers work, and this will not be changed by this BP.
With hierarchical port binding, where a ToR switch is using dynamic VLAN segments and the hosts connected to it are using a standard L2 agent, the L2 agents on the hosts will be configured with a mapping for a physical_network name that corresponds to the scope at which the switch assigns dynamic VLANs.
If dynamic VLAN segments are assigned at the switch scope, then each ToR switch should have a unique corresponding physical_network name. The switch’s mechanism driver will use this physical_network name in the dynamic segments it creates as partial bindings. The L2 agents on the hosts connected to that switch must have a (bridge or interface) mapping for that same physical_network name, allowing any of the normal L2 agent mechanism drivers to complete the binding.
If dynamic VLAN segments are instead assigned at the switch port scope, then each switch port would have a corresponding unique physical_network name, and only the host connected to that port should have a mapping for that physical_network.
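The effect of the two scoping choices on which hosts can complete a binding can be sketched as follows. The naming scheme in physical_network_for(), the host and bridge names, and the agent_can_bind() helper are hypothetical; the check itself only mirrors the physical_network mapping test described above for ‘flat’ and ‘vlan’ segments.

```python
def physical_network_for(switch, port=None):
    """Hypothetical naming scheme for the scope of dynamic VLAN allocation."""
    return switch if port is None else "%s-%s" % (switch, port)


def agent_can_bind(segment, agent_mappings):
    """True if the host's L2 agent has a mapping for the segment's physnet."""
    return (segment["network_type"] in ("flat", "vlan")
            and segment["physical_network"] in agent_mappings)


# Switch scope: every host behind tor1 maps the same physical_network.
switch_scope = {
    "compute-1": {physical_network_for("tor1"): "br-eth1"},
    "compute-2": {physical_network_for("tor1"): "br-eth1"},
}

# Switch port scope: each host maps only the physical_network of its port.
port_scope = {
    "compute-1": {physical_network_for("tor1", "port1"): "br-eth1"},
    "compute-2": {physical_network_for("tor1", "port2"): "br-eth1"},
}

# Dynamic segments as the switch's mechanism driver might create them.
switch_segment = {"network_type": "vlan", "segmentation_id": 100,
                  "physical_network": physical_network_for("tor1")}
port_segment = {"network_type": "vlan", "segmentation_id": 100,
                "physical_network": physical_network_for("tor1", "port1")}

hosts_for_switch_segment = sorted(
    host for host, mappings in switch_scope.items()
    if agent_can_bind(switch_segment, mappings))
hosts_for_port_segment = sorted(
    host for host, mappings in port_scope.items()
    if agent_can_bind(port_segment, mappings))
```

With switch scope, any host behind the switch can complete the binding; with switch port scope, only the host attached to that port can.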
Since multi-level network deployments will typically involve a particular vendor’s proprietary switches, that vendor should supply documentation and deployment tools to assist the administrator.
Mechanism drivers that support hierarchical bindings will use the additional driver API call(s). Other drivers will only need a very minor update to use PortContext.segments_to_bind in place of PortContext.network.network_segments, and the new properties for accessing the top or bottom bound segment or driver.
This change provides new capabilities for ML2 mechanism drivers to support hierarchical network topologies. This supports the community’s interest in allowing Neutron to scale to larger deployments. It has been discussed at ML2 sub-team meetings, and at the Juno and Kilo summits.
The alternative is for vendors that need to support hierarchical network topologies to do so through monolithic plugins rather than ML2, or through ML2 mechanism drivers that require customized L2 agents.
As was done during Juno, the implementation will consist of a series of three patches:
Use of this feature requires the dynamic segment feature that was merged during Juno.
Unit tests will verify all the new driver API methods and properties, and that hierarchical port bindings can be established with a test mechanism driver. Real mechanism drivers using hierarchical bindings will be tested in the corresponding 3rd party CI, where the required switch hardware is available.
No new tempest tests are required because there is no change in application level behavior. 3rd party CI testing using the existing tempest tests will adequately exercise the hierarchical port binding functionality.
No new functional tests are required because there are no system level interactions that cannot be tested in unit tests.
There are no REST API changes to test.
Deployer documentation is the responsibility of 3rd parties that provide mechanism drivers that establish hierarchical bindings.
No user documentation changes are required.
The primary developer documentation is derived from the doc strings in the driver API source file, which describe the APIs and their behavior in detail. If an ML2 driver writers guide is ever written, it will need to cover hierarchical port binding.