Neutron Distributed Virtual Router (DVR) implements L3 routers across the compute nodes, so that traffic between a tenant's VMs is routed without hitting the controller node (East-West routing).
DVR also implements a floating IP namespace on every compute node where the VMs are located, so VMs with floating IPs can forward traffic to the external network without reaching the controller node (North-South routing).
DVR provides the legacy SNAT behavior for the default SNAT of all private VMs: the SNAT service is not distributed, it is centralized and hosted on the service node.
Currently, Neutron L3 routers are deployed on specific nodes (controller nodes) through which all compute traffic flows.
Problem 1: Inter-VM traffic flows through the controller node
In this case, even traffic between VMs that belong to the same tenant but reside on different subnets has to hit the controller node to get routed between the subnets. This affects performance and scalability.
Problem 2: VMs with floating IPs also send and receive packets through the controller node routers
Today, floating IP (DNAT) translation is done on the controller node, and the external network gateway port is available only on the controller. Any traffic that goes from a VM to the external network therefore has to pass through the controller node. The controller node becomes a single point of failure, and the traffic heavily loads it. This affects performance and scalability.
The proposal is to distribute L3 Routers across compute nodes when required by VMs. This implies having external network access on each compute node.
In this case, an enhanced L3 agent runs on each and every compute node (this is not a new agent, but an updated version of the existing L3 agent). Based on the configuration in the l3_agent.ini file, the enhanced L3 agent behaves either in legacy (centralized router) mode or in distributed router mode.
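A minimal sketch of the corresponding l3_agent.ini settings, assuming the standard Neutron agent_mode option (its values are legacy, dvr and dvr_snat):

  # l3_agent.ini on a compute node: distributed routing only
  [DEFAULT]
  agent_mode = dvr

  # l3_agent.ini on a controller node: distributed routing plus centralized SNAT
  [DEFAULT]
  agent_mode = dvr_snat

With agent_mode = legacy the agent keeps the existing centralized router behavior.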
Also, a new namespace for floating IPs is created by the L3 agent itself on the specific compute node where the VM is located. Each compute node will have one such FloatingIP namespace per external network, shared among the tenants. This additional namespace and an external gateway port are created on a compute node for the external traffic to flow through whenever VMs with floating IPs reside on that node. The port consumes an additional IP address from the external network.
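For illustration, the namespace layout on such a compute node would look roughly like this (the qrouter-/fip- naming follows the usual Neutron convention; the UUIDs are placeholders):

  # ip netns
  qrouter-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee   # distributed router namespace
  fip-11111111-2222-3333-4444-555555555555       # shared FloatingIP namespace, one per external network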
Default SNAT functionality will still be centralized and will be running on controller nodes.
The metadata agent will be distributed as well: it will be hosted on all compute nodes, and the metadata proxy will be hosted on all distributed routers.
This implementation is specific to ML2 with the OVS mechanism driver. All three types of segmentation are supported: GRE, VXLAN and VLAN.
No Distributed SNAT
Neutron DVR provides the legacy SNAT behavior for the default SNAT of all private VMs: the SNAT service is not distributed, it is centralized and hosted on the service node. Thus the current DVR architecture is not fully fault tolerant; outbound traffic of VMs without floating IPs still goes through a single L3 agent node and is still prone to failures of that single node.
Only with ML2-OVS/L2-pop
The DVR feature is supported only by the ML2 plugin with the OVS mechanism driver. If tunnel segmentation (VXLAN, GRE) is used, the L2 population mechanism should be enabled as well.
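A minimal sketch of the relevant plugin and agent settings, assuming the standard ML2 and OVS agent option names:

  # ml2_conf.ini
  [ml2]
  mechanism_drivers = openvswitch,l2population

  # OVS agent configuration
  [agent]
  l2_population = True
  enable_distributed_routing = True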
OVS and Kernel versions
Proper operation of DVR requires Open vSwitch 2.1 or newer; VXLAN segmentation additionally requires kernel 3.13 or newer.
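These requirements can be checked on a node with, for example:

  ovs-vsctl --version    # expect 2.1 or newer
  uname -r               # expect 3.13 or newer when VXLAN is used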
No bare metal support
Distributed routers rely on the local L3 agent (residing on the compute node) for address translation, so only legacy routers should be used for bare metal instances.
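A legacy router can still be created explicitly for such cases (a sketch, assuming admin credentials and the standard neutron CLI; the router name is a placeholder):

  neutron router-create bm-router --distributed False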
Fuel Library related changes
Fuel Web related changes
Compute nodes:
-----------------
network_scheme:
  endpoints:
    br-ex:
      IP: none
-----------------
quantum_settings:
  DVR: true
-----------------
Controller nodes:
-----------------
network_scheme:
  endpoints:
    br-ex:
      IP:
      - 172.16.0.3/24
-----------------
quantum_settings:
  DVR: true
-----------------
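On the Neutron server side, making new tenant routers distributed by default is typically controlled by the router_distributed option (a sketch, assuming the standard neutron.conf option name):

  # neutron.conf on the controller
  [DEFAULT]
  router_distributed = True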
No FUEL REST API changes.
The upgrade path from a legacy to a distributed router is supported; it is a 3-step process, sketched below.
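A minimal sketch of the procedure, assuming the standard neutron CLI (the router name is a placeholder):

  # 1. Take the router administratively down
  neutron router-update demo-router --admin_state_up=False
  # 2. Switch it to distributed mode
  neutron router-update demo-router --distributed=True
  # 3. Bring the router back up
  neutron router-update demo-router --admin_state_up=True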
The distributed->legacy downgrade is not officially supported in Kilo; it may work, but needs to be tested.
Inter-VM traffic between tenant subnets no longer needs to reach the router on the controller node; it is routed locally on the compute node. This increases performance substantially.
Also, floating IP traffic of a VM reaches the external network directly from the compute node, instead of going through the router on the controller node.
Dataplane testing results from a 25-node bare metal environment show significant performance improvement for both East-West and North-South (with floating IPs) scenarios.
This will likely depend on enabling L2 population for tunneling, which is a separate effort. However, we will not wait for it and will enable L2 population as part of the DVR effort if needed.
It also correlates with the upgrade-openstack-puppet-modules blueprint, as all required changes might already be in master in the upstream manifests.
All existing HA/destructive tests should pass on an environment with DVR enabled. Additional scenarios should include:
Shaker scenarios should be run on a bare metal environment with DVR enabled. A significant increase in performance is expected for East-West and North-South (with floating IPs) topologies. Some of the results have already been obtained (see the "Performance Impact" section of this document).
The ability to enable DVR support in Neutron should be documented in the Fuel Deployment Guide.