The aim of this blueprint is to add high-availability features for virtual routers to Fuel. L3 HA (VRRP) support covers only access to VMs and access from VMs to the Internet.
After an L3 agent failover, the L3 HA feature restores connectivity faster than router rescheduling does.
Currently we are able to spawn many l3 agents; however, each l3 agent is a SPOF. If an l3 agent fails, all virtual routers hosted by this agent are lost, and consequently all VMs connected to these virtual routers are isolated from external networks and possibly from other tenant networks. The existing rescheduling mechanism has one big issue: with thousands of routers, the rescheduling and configuration process takes hours to finish.
The idea of this spec is to schedule each virtual router to at least two l3 agents; this number can be increased by changing a parameter in the neutron configuration file.
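As a rough illustration of the parameter mentioned above, the following sketch writes the agents-per-router limits into a `[DEFAULT]` section using Python's `configparser`. The option names `min_l3_agents_per_router` and `max_l3_agents_per_router` are the upstream Neutron ones and are an assumption here, since this spec does not name the parameter; the value 3 is only an example.

```python
import configparser
import io


def render_l3_ha_limits(min_agents=2, max_agents=3):
    """Return a neutron.conf fragment with the HA scheduling limits.

    The option names are assumed from upstream Neutron; the spec only
    refers to "a parameter in the neutron configuration file".
    """
    if min_agents < 2:
        # The spec schedules every HA router to at least two l3 agents.
        raise ValueError("an HA router needs at least two l3 agents")

    conf = configparser.ConfigParser()
    conf["DEFAULT"] = {
        "min_l3_agents_per_router": str(min_agents),
        "max_l3_agents_per_router": str(max_agents),
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


fragment = render_l3_ha_limits(min_agents=2, max_agents=3)
print(fragment)
```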
L3 HA starts a keepalived instance in every router namespace. The router instances talk to one another over a dedicated HA network, one per tenant. This network is created under the blank tenant to hide it from the CLI and GUI. The HA network is a regular Neutron tenant network and uses the default segmentation technology. Each HA router has an ‘HA’ device in its namespace: when an HA router is created, it is scheduled to a number of network nodes, and a port on the tenant’s HA network is created for each of those nodes. The keepalived traffic is forwarded through the HA device (as specified in the keepalived.conf file used by the keepalived instance in the router namespace).
            +----+                          +----+
            |    |                          |    |
    +-------+ QG +------+           +-------+ QG +------+
    |       |    |      |           |       |    |      |
    |       +-+--+      |           |       +-+--+      |
    |     VIPs|         |           |         |VIPs     |
    |         |      +--+-+      +--+-+       |         |
    |         +      |    |      |    |       +         |
    |  KEEPALIVED+---+ HA +------+ HA +----+KEEPALIVED  |
    |         +      |    |      |    |       +         |
    |         |      +--+-+      +--+-+       |         |
    |     VIPs|         |           |         |VIPs     |
    |       +-+--+      |           |       +-+--+      |
    |       |    |      |           |       |    |      |
    +-------+ QR +------+           +-------+ QR +------+
            |    |                          |    |
            +----+                          +----+
The Neutron Advanced Configuration section needs a checkbox for enabling L3 HA. This checkbox cannot be enabled if DVR is turned on.
An additional option, ‘neutron_l3_ha’, will be added to openstack.yaml. It will be marked as incompatible with Neutron DVR.
No Fuel REST API changes.
The following options should be passed to neutron::server class in order to enable L3 HA and disable legacy rescheduling:
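The effect of these options can be checked in the neutron.conf generated on a controller. The sketch below assumes that they end up in `[DEFAULT]` as the upstream Neutron options `l3_ha` and `allow_automatic_l3agent_failover`; these names are not taken from this spec, and the sample file content is made up for the example.

```python
import configparser

# Hypothetical excerpt of a generated neutron.conf (assumed option names).
SAMPLE_NEUTRON_CONF = """\
[DEFAULT]
l3_ha = True
allow_automatic_l3agent_failover = False
"""


def l3_ha_enabled(conf_text):
    """Return True if L3 HA is on and legacy rescheduling is off."""
    # Interpolation is disabled so that '%' in other values is harmless.
    conf = configparser.ConfigParser(interpolation=None)
    conf.read_string(conf_text)
    ha = conf.getboolean("DEFAULT", "l3_ha", fallback=False)
    rescheduling = conf.getboolean(
        "DEFAULT", "allow_automatic_l3agent_failover", fallback=False
    )
    return ha and not rescheduling


print(l3_ha_enabled(SAMPLE_NEUTRON_CONF))
```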
Upgrading a legacy router to an HA router is not supported in Liberty, but the feature will be backported from upstream.
A legacy router can then be upgraded to an HA router in three steps:
    neutron router-update router1 --admin_state_up=False
    neutron router-update router1 --ha True
    neutron router-update router1 --admin_state_up=True
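For scripting the conversion across many routers, the three steps can be generated programmatically. This is a minimal sketch that only builds the command lines shown above and does not execute them; the helper name `router_ha_commands` is hypothetical, and passing `to_ha=False` for the reverse direction assumes that the same flag is used for it.

```python
def router_ha_commands(router, to_ha=True):
    """Build the three `neutron router-update` calls as argv lists.

    The router is set administratively down, its HA flag is changed,
    and it is brought back up again.
    """
    base = ["neutron", "router-update", router]
    return [
        base + ["--admin_state_up=False"],
        base + ["--ha", str(to_ha)],
        base + ["--admin_state_up=True"],
    ]


for argv in router_ha_commands("router1"):
    print(" ".join(argv))
```

Each list could be handed to `subprocess.check_call` on a node where the neutron client and credentials are available.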
It will also be possible to convert an HA router back to a legacy router.
HA L3 is based on Keepalived (VRRP), which provides the following features:
The L3 HA feature uses a service network called the “HA network” for VRRP messages. This network is created for every tenant, so it has to be taken into account if the number of tunnels (or VLANs) available for Neutron private networks is limited.
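The capacity impact is simple arithmetic: every tenant consumes one extra segmentation ID for its HA network. The sketch below illustrates this; the function name and the numbers are hypothetical and only serve as an example.

```python
def remaining_segments(total_ids, tenant_networks, tenants):
    """Segmentation IDs left after tenant networks and HA networks.

    One HA network is counted per tenant, as described above.
    """
    used = tenant_networks + tenants
    if used > total_ids:
        raise ValueError("not enough segmentation IDs for the HA networks")
    return total_ids - used


# Example: a VLAN range of 31 IDs, 20 private networks, 10 tenants.
print(remaining_segments(total_ids=31, tenant_networks=20, tenants=10))  # prints 1
```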
The ability to enable L3 HA support in Neutron should be documented in the Fuel Deployment Guide.
The keepalived version must be either 1.2.13 or newer than 1.2.16 (done for Ubuntu 14.04, already satisfied in CentOS 7).
checking compatibility with plugins
bug fixing/backport from upstream
Since this implementation relies on Keepalived, Keepalived has to be installed on each l3 node. Version 1.2.10 is required for IPv6 support. Safe versions: 1.2.13 and anything newer than 1.2.16.
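The "safe versions" rule can be expressed as a small check. This is a sketch that assumes a plain dotted numeric version string such as `1.2.13`; distribution package versions with epochs or suffixes would need to be stripped first, which is not handled here.

```python
def parse_version(text):
    """Turn '1.2.13' into (1, 2, 13) for tuple comparison."""
    return tuple(int(part) for part in text.split("."))


def is_safe_keepalived(version):
    """Apply the rule above: exactly 1.2.13, or newer than 1.2.16."""
    parsed = parse_version(version)
    return parsed == (1, 2, 13) or parsed > (1, 2, 16)


for candidate in ("1.2.10", "1.2.13", "1.2.16", "1.2.19"):
    print(candidate, is_safe_keepalived(candidate))
```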
All existing HA/destructive tests should pass on an environment with L3 HA enabled.
An environment with L3 HA enabled should pass all tests currently run on the Scale Lab with no significant performance degradation. No additional Rally scenarios are needed to test the specifics of L3 HA.
Pass the functional acceptance test: after the active L3 agent fails, connectivity is re-established and fewer than 10 packets are lost.
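The packet-loss criterion could be checked automatically from the output of a ping that runs during the failover. The sketch below assumes the summary line format printed by the Linux iputils `ping`; the sample output is invented for the example, and the function names are hypothetical.

```python
import re

# Matches e.g. "60 packets transmitted, 54 received, 10% packet loss".
PING_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packets_lost(ping_output):
    """Return the number of lost packets from a ping summary line."""
    match = PING_SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = (int(group) for group in match.groups())
    return sent - received


def failover_acceptable(ping_output, max_lost=10):
    """Criterion from the spec: fewer than 10 packets may be lost."""
    return packets_lost(ping_output) < max_lost


sample = "60 packets transmitted, 54 received, 10% packet loss, time 59081ms"
print(packets_lost(sample), failover_acceptable(sample))
```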