Neutron Routed Networks
In Neutron, there is a priority effort to support routed networks. A routed network, in this context, is a physical network infrastructure that implements scaled networks by routing instead of large L2 broadcast domains. For example, deployers may have routers at each top-of-rack. Instead of a single VLAN covering the deployment, each rack would have its own VLAN and the routers will provide reachability to the rest of the racks over L3. Operators want Neutron to model this like a single network. This has implications for Nova scheduling and possibly migration.
Neutron has its own spec describing how routed networks will be handled on the Neutron side. Each L2 network is referred to as a segment. Other terminology is under discussion in that spec.
For Nova, this has a couple of specific implications. First, IP subnets will have affinity to particular network segments. Second, compute hosts will have L2 reachability to (typically) only one segment within a network. This means that IP addresses assigned to ports are constrained to a potentially small subset of compute hosts.
Currently, Nova requires an IP address on a port. If that requirement were kept, and the IP address were constrained to a small subset of compute hosts, then the scheduler would have to constrain scheduling to that subset. This would be a severe and artificial constraint on the scheduler. To avoid it, Neutron needs to be able to leave the IP address unassigned until after the port is bound to a host. After host binding, Nova can still fail the build for a deferred IP port if an IP is still not allocated.
A related but much less severe constraint is that of IP availability across segments. Some segments might be exhausted and that should be considered by the scheduler. This is a resource that is under the control of Neutron and hence will need a resource provider created to manage it for the Nova scheduler.
For move operations involving the scheduler (e.g. live migrations), the VM already has an IP address. For that IP address to continue to work, the VM must be migrated to another host with reachability to the same network segment. Forced move operations that bypass the scheduler may cause a failure at binding time if the segment is not available on the new host.
In the following use cases, there is an assumption that all segments in Neutron can be associated with one or more aggregates in Nova via the proposed new openstack resource-pool create and openstack resource-pool add aggregate commands and the associated REST API.
User has a port without a binding to a segment and provides it to nova boot. Such a port will not have an IP address until after the scheduler places the instance and the port gets bound to that host. Then, Neutron can assign an IP address from a segment which that compute host can reach.
In this use case, the scheduler must take into consideration the availability of IP addresses in each of the segments. For example, there could be some segments in the network which are out of addresses completely.
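The availability check described above can be sketched as a simple host filter. This is only an illustration, not Nova's scheduler code: the function name, the host-to-segment mapping, and the free-address counts are all hypothetical inputs that would in practice come from aggregates and resource providers.

```python
def hosts_with_free_ips(host_segments, free_ips):
    """Return hosts that can reach at least one segment with a free address.

    host_segments: dict of host name -> set of segment ids reachable from it
                   (hypothetical; stands in for the aggregate mapping).
    free_ips:      dict of segment id -> number of unallocated addresses
                   (hypothetical; stands in for resource provider data).
    """
    return sorted(
        host
        for host, segments in host_segments.items()
        if any(free_ips.get(segment, 0) > 0 for segment in segments)
    )


host_segments = {
    "compute1": {"seg-a"},
    "compute2": {"seg-b"},
    "compute3": {"seg-b"},
}
free_ips = {"seg-a": 0, "seg-b": 12}  # seg-a is completely out of addresses

candidates = hosts_with_free_ips(host_segments, free_ips)
print(candidates)
```

Here compute1 is dropped because the only segment it can reach is exhausted.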
A similar use case is to add an additional port to an existing instance. In this case, the segment and IP address of the new port will be set when the new port is bound to the compute host. Since the port was unbound to begin with, there should be no restriction.
Binding may fail in this case if all of the segments available to the host are out of IP addresses.
User has a port that has an IP address and thus is effectively attached to a segment (but not bound to a host). The user provides it to nova boot. Nova will ask Neutron for the segment to which the port is attached by getting the details of the port. Given that segment, the scheduler should place the instance on a compute host belonging to the corresponding aggregate.
A similar use case is to add an additional port to an existing instance. In this case, the segment of the new port must match a segment available to the instance’s host. If not, adding the port to the instance should fail.
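The two "add a port to an existing instance" cases (a port with no segment yet, and a port already tied to a segment) could be checked along these lines. This is a minimal sketch; the exception class and function name are hypothetical and are not part of Nova.

```python
class SegmentMismatch(Exception):
    """Raised when a port's segment is not reachable from the instance's host."""


def check_port_attachable(port_segment_id, host_segment_ids):
    """Validate attaching a port to an instance that is already on a host.

    port_segment_id:  segment the port is tied to, or None if its IP
                      allocation is still deferred.
    host_segment_ids: set of segments reachable from the instance's host.
    """
    if port_segment_id is None:
        # No segment yet: segment and IP are chosen when the port is bound.
        return
    if port_segment_id not in host_segment_ids:
        raise SegmentMismatch(
            f"segment {port_segment_id} is not available on this host")


check_port_attachable(None, {"seg-a"})     # deferred port, accepted
check_port_attachable("seg-a", {"seg-a"})  # matching segment, accepted
```

A port on any other segment raises SegmentMismatch, which corresponds to the failure case described above.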
User calls Nova boot and passes a network id. The Nova scheduler will call Neutron to create a port, will place the instance, and then will call Neutron to update the port with binding details. Neutron will use the host binding to set the segment and allocate the IP.
Any move operation that calls out to the scheduler. In this case, the port already has an IP address. That IP address is only viable in the same segment. The scheduler must only consider target hosts that belong to the same segment (or aggregate).
Neutron will be a resource provider as described in the generic resource pools specification and its dependencies. I imagine that Neutron will create and maintain aggregates corresponding to its segments so that Nova has the same mapping as Neutron does of hosts to segments.
Next, Neutron creates a resource_pool for each of the segments. The pool has a resource class (e.g. “IPV4_ADDRESS” or “IPV6_ADDRESS”) in common with other resource pools but each pool is specific to a segment id. The linkage is set by setting the UUID of the resource pool equal to the UUID of the segment in Neutron. Resource pools are linked to the host aggregates.
The resource pool has a record in an inventories table for IPs as a resource class. It effectively gives the capacity of the pool from Nova’s perspective:
capacity = (total - reserved) * allocation_ratio
Neutron will call Nova’s REST API to set “total” to the size of the allocation pool(s) on the subnets. This will remain mostly static but could change if the allocation pool is updated in a subnet-update call. The allocation_ratio will always be 1.0 in this use case.
Neutron sets reserved to the total number of addresses which are consumed outside of Nova’s purview. This includes overhead such as DHCP and DNS addresses consumed from the subnets’ allocation pool which Neutron shares with Nova. This is expected to remain mostly constant but might change a little more often than the total if new overhead ports are allocated in Neutron.
The allocations table indicates how much of the capacity has been consumed by Nova.
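As a worked example of the capacity formula, the sketch below models one inventory record and the amount Nova has consumed. The class and the numbers are illustrative assumptions only; they are not the actual Nova data model.

```python
from dataclasses import dataclass


@dataclass
class IpInventory:
    """Hypothetical stand-in for one inventories record of a segment's pool."""
    total: int                     # size of the subnet allocation pool(s)
    reserved: int                  # addresses used outside Nova (DHCP, DNS, ...)
    allocation_ratio: float = 1.0  # always 1.0 for IP addresses

    def capacity(self) -> int:
        # capacity = (total - reserved) * allocation_ratio
        return int((self.total - self.reserved) * self.allocation_ratio)

    def available(self, used: int) -> int:
        # `used` is the sum of what the allocations table records for Nova.
        return self.capacity() - used


inv = IpInventory(total=253, reserved=3)
print(inv.capacity())           # addresses Nova may consume in total
print(inv.available(used=200))  # addresses still free for new ports
```

With 253 addresses in the allocation pool and 3 reserved, the capacity is 250; after 200 allocations, 50 remain.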
There can be a race to consume IP resources for any given segment. In current Nova, the claim is made on the compute node after scheduling is done. This can result in a race to consume IPs if the IP resource is getting low. With the claim being made by the compute node, a failure to collect the claim can be very costly since the compute node has already started the process of claiming and consuming other resources.
To reduce the cost of a failed claim this spec depends on John G’s spec for pre-allocating before scheduling and moving the port update to the conductor.
Regarding the use cases where the user has a port and brings that port to Nova to create an instance (or to add it to an existing instance), they appear the same at first:
nova boot --nic port_id=$PORT_ID
Nova will make a call to Neutron to get or create a port and will receive the details of the port in the response. In those details, Neutron will include the segment_id of each fixed_ip on the port if it is bound to a segment. This segment_id will be used to look up the resource provider for IP addresses on the segment.
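Extracting the segment from the port details might look like the following sketch. The placement of a "segment_id" key inside each fixed_ips entry is an assumption based on the description above; the real payload shape is defined by the Neutron spec, and the port dictionary here is made up for illustration.

```python
def segment_ids_from_port(port):
    """Collect the segment ids referenced by a port's fixed IPs.

    Assumes each fixed_ips entry may carry a "segment_id" key; this shape
    is an assumption of the sketch, not a confirmed Neutron response format.
    """
    return {
        fixed_ip["segment_id"]
        for fixed_ip in port.get("fixed_ips", [])
        if fixed_ip.get("segment_id") is not None
    }


port = {
    "id": "hypothetical-port-id",
    "fixed_ips": [
        {"ip_address": "10.0.1.5", "segment_id": "seg-a"},
        {"ip_address": "10.0.1.6", "segment_id": "seg-a"},
    ],
}
segments = segment_ids_from_port(port)
print(segments)
```

An empty result means the port is not yet tied to any segment, so no segment-specific resource provider lookup is needed.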
For Nova to allow deferring IP allocation on a port, a new attribute will be added to the Neutron port called ip_allocation. It will have one of three values: “immediate,” “deferred,” or “none.” Ports with “immediate” ip_allocation act like ports do today: it is expected that an IP will be allocated on port create. Ports with “deferred” ip_allocation will have an IP address allocated on port update when host binding information is provided. Ports with “none” in ip_allocation are not intended to have an IP address allocation at all. It is beyond the scope of this patch to handle ports with “none.”
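The three ip_allocation values suggest a check like the one sketched here. The function name and the plain-dictionary port are hypothetical; the sketch only encodes the rules stated in this spec, including that a deferred port must have an IP once it has been bound, and it treats a missing attribute as “immediate” as an assumption about today's behavior.

```python
def port_ips_acceptable(port, bound):
    """Return True if the port's IP state is acceptable at this stage.

    port:  dict with optional "ip_allocation" and "fixed_ips" keys
           (hypothetical shape used for illustration).
    bound: whether host binding details have already been sent to Neutron.
    """
    mode = port.get("ip_allocation", "immediate")
    has_ip = bool(port.get("fixed_ips"))

    if mode == "immediate":
        # Behaves like ports do today: an IP must exist from port create.
        return has_ip
    if mode == "deferred":
        # No IP is required before binding, but one must exist afterwards.
        return has_ip if bound else True
    if mode == "none":
        # Explicitly out of scope for this spec.
        raise NotImplementedError("ports without IP allocation are not handled")
    raise ValueError(f"unknown ip_allocation value: {mode!r}")


deferred_port = {"ip_allocation": "deferred", "fixed_ips": []}
print(port_ips_acceptable(deferred_port, bound=False))
print(port_ips_acceptable(deferred_port, bound=True))
```

The second call returns False, which corresponds to Nova failing the build when a deferred port still has no IP after host binding.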
One alternative was considered around trying to eliminate races for IP resource between Nova and Neutron. It involved significantly more active maintenance of the reserved field on the resource provider and required that the allocation was conditionally recorded depending on the scenario.
This alternative was rejected because of its complexity, in favor of the current proposal.
Data model impact
REST API impact
Other end user impact
Users who create a port with Neutron and bring it to Nova will notice that the port doesn’t have an IP address when the network is routed.
Operators will notice the use of host aggregates which correspond to Neutron segments and their corresponding resource providers.
The preceding spec to prepare Nova for network-aware scheduling has some performance effects that should be noted here, although this spec does not add to those. It moves port get/create to before the scheduler, which adds some overhead. It also moves the port update to the conductor, which will significantly reduce the overhead involved when port update fails due to exhausted IP address resources.
Other deployer impact
Since this work is co-dependent on work in Neutron, there are some upgrade considerations. If routed networks are not used in Neutron then there is no problem. Existing networks and new non-routed networks will still work the way they do today. Since routed networks are an optional new feature, this will only affect operators who wish to take advantage of it.
The best thing for operators to do will be to upgrade both services before attempting to configure a routed provider network. However, I’ll discuss the implications of rolling upgrades.
Consider if the Neutron API is upgraded and Nova is not. Neutron will not have the generic resource provider API endpoint available. Neutron will need to handle this gracefully taking advantage of microversioning in the Nova API. Neutron will poll infrequently to discover when Nova has been upgraded and will make use of the API when it becomes available.
In the meantime, it will be possible to create routed networks in Neutron but scheduling will not be IP resource aware. So, if segments run out of addresses, a boot will fail whenever a VM is scheduled to one of these segments and Nova's attempt to create a port there fails.
Finally, the deferred IP allocation use case will not work because Nova will refuse to use a port without an IP address until it has been upgraded. The use cases that don’t involve deferred IP allocation will work until the above IP exhaustion problem is encountered.
If Nova is upgraded and Neutron is not, then there is no problem because routed provider networks and deferred IP address ports are not possible.
Get segment_id, if available, from the port in the pre-schedule phase on the conductor. Use that segment_id to look up the resource provider for IP address.
Allow deferred or no IP addresses on ports by looking at the ip_allocation attribute on the port.
Neutron to curate host aggregates and resource pools within Nova. (This is Neutron acting as a client to the Nova API, isn’t it? So, it isn’t really a Nova work item.)
This is co-dependent on the Neutron spec mentioned above. It also depends on the resource providers work, which has merged in Nova, and on the newly created spec to prepare for network-aware scheduling.
All new functionality will be covered with unit tests. We’ll be looking to create a multi-node job to run on Neutron and Nova which tests out routed networks. It will include tests specifically for the use cases mentioned in this spec.
The OpenStack Administrator Guide will be updated.