Neutron/L3 High Availability VRRP

Launchpad blueprint:

https://blueprints.launchpad.net/neutron/+spec/l3-high-availability

The aim of this blueprint is to add high availability (HA) features to virtual routers.

High availability features will be implemented as extensions and drivers. A first driver on the agent side will be based on Keepalived.

A new scheduler will also be added so that multiple instances of the same router can be spawned on several agents for redundancy.

The DVR blueprint will leverage this proposal as a service node specifically for SNAT traffic. See the DVR blueprint reference at the end of this specification.

Problem description

Currently it is possible to run more than one l3 agent, and each l3 agent can handle more than one external network. However, each l3 agent is a single point of failure (SPOF).

If an l3 agent fails, all virtual routers of this agent will be lost, and consequently all VMs connected to these virtual routers will be isolated.

Proposed change

For the Neutron server side:

The idea of this blueprint is to schedule a virtual router to at least two l3 agents, but this limit could be increased by changing a parameter in the neutron configuration file.
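The scheduling rule above can be illustrated with a minimal sketch. The function name, the constant name, and the random selection policy are assumptions made for illustration; the blueprint only states that a router is scheduled to at least two l3 agents and that this number is configurable.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical name: the spec only says the minimum of two agents can be
# raised through a parameter in the neutron configuration file.
MIN_L3_AGENTS_PER_ROUTER = 2


def pick_agents_for_router(
    live_agents: Sequence[str],
    wanted: int = MIN_L3_AGENTS_PER_ROUTER,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return `wanted` distinct agent ids, or raise if redundancy is impossible."""
    if wanted < 2:
        raise ValueError("an HA router needs at least two l3 agents")
    candidates = sorted(set(live_agents))  # drop duplicates, stable order
    if len(candidates) < wanted:
        raise RuntimeError(
            f"need {wanted} l3 agents, only {len(candidates)} available"
        )
    # Assumption: agents are picked at random; a real scheduler could
    # instead prefer the least loaded agents.
    rng = rng or random.Random()
    return rng.sample(candidates, wanted)
```

For example, `pick_agents_for_router(["agent-a", "agent-b", "agent-c"])` returns two distinct agent ids, while calling it with a single agent raises an error because no redundancy is possible.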

For the Neutron L3 agent side:

The current router interface management in the l3 agent will be abstracted so that drivers can be plugged in for that purpose. As a first driver implementation, an HA Keepalived driver will be added. All the IPs will be converted to VIPs.

In order to hide the HA traffic from the tenant's point of view, an HA network will be added, and all the virtual router instances will be connected to this network through an HA port.

Flows:

        +----+                          +----+
        |    |                          |    |
+-------+ QG +------+           +-------+ QG +------+
|       |    |      |           |       |    |      |
|       +-+--+      |           |       +-+--+      |
|     VIPs|         |           |         |VIPs     |
|         |      +--+-+      +--+-+       |         |
|         +      |    |      |    |       +         |
|  KEEPALIVED+---+ HA +------+ HA +----+KEEPALIVED  |
|         +      |    |      |    |       +         |
|         |      +--+-+      +--+-+       |         |
|     VIPs|         |           |         |VIPs     |
|       +-+--+      |           |       +-+--+      |
|       |    |      |           |       |    |      |
+-------+ QR +------+           +-------+ QR +------+
        |    |                          |    |
        +----+                          +----+
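To make the flow above more concrete, the following sketch renders the kind of `vrrp_instance` block a Keepalived driver might generate for one router instance. The function name, the instance naming scheme, the interface names in the usage note, and the choice of starting every instance in `BACKUP` state are assumptions; only the Keepalived keywords themselves are standard configuration syntax.

```python
from typing import Dict, List


def render_vrrp_instance(
    ha_interface: str,
    vr_id: int,
    priority: int,
    vips: Dict[str, List[str]],
) -> str:
    """Render one keepalived vrrp_instance block as text.

    `vips` maps a device name (for example a QR or QG interface) to the
    CIDR addresses that keepalived should manage on that device.
    """
    if not 1 <= vr_id <= 255:
        raise ValueError("VRRP virtual_router_id must be in 1..255")
    lines = [
        f"vrrp_instance VR_{vr_id} {{",
        # Assumption: all instances start as BACKUP and the election is
        # decided by priority.
        "    state BACKUP",
        # VRRP advertisements travel over the dedicated HA port.
        f"    interface {ha_interface}",
        f"    virtual_router_id {vr_id}",
        f"    priority {priority}",
        "    virtual_ipaddress {",
    ]
    for device in sorted(vips):
        for cidr in vips[device]:
            lines.append(f"        {cidr} dev {device}")
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"
```

A call such as `render_vrrp_instance("ha-1a2b", 1, 50, {"qr-aaaa": ["10.0.0.1/24"], "qg-bbbb": ["172.24.4.2/24"]})` produces a block in which the router's internal and gateway addresses are both listed as VIPs, so they move together when the active instance changes.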

In phase 2 of the Keepalived driver implementation, the driver will start a conntrackd instance so that established connections are not lost when failing over from the active instance to a standby one.

Alternatives

The first driver is going to be based on Keepalived. Alternative drivers could be based on other protocols, for example the Common Address Redundancy Protocol (CARP).

A configuration parameter will be added to specify whether virtual routers are HA by default. In addition, an admin-only API is introduced to allow admins to migrate existing routers to HA mode.

Data model impact

Two new columns will be added to the router_extra_attributes table in order to specify whether the virtual router will be HA or not and to specify the virtual router id.

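+----------------+------+-----------+---------------+-----------------------+----------------------+
| Attribute Name | Type | Access    | Default Value | Validation/Conversion | Description          |
+================+======+===========+===============+=======================+======================+
| ha             | bool | RW, admin | False         | N/A                   | Set router as HA     |
+----------------+------+-----------+---------------+-----------------------+----------------------+
| ha_vr_id       | int  | RW, admin | N/A           | N/A                   | HA virtual router id |
+----------------+------+-----------+---------------+-----------------------+----------------------+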

The ha_vr_id will be limited to 255 because the VRRP virtual router identifier (VRID) is an 8-bit field with valid values from 1 to 255. This limit will have to be removed when a new driver without this limitation is introduced.
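The VRID limit can be illustrated with a small allocation sketch. The function name and the "lowest free id" policy are assumptions, as is the scope in which ids must be unique (here, whatever set of already used ids the caller passes in).

```python
from typing import Iterable

# Valid VRRP virtual router ids are 1..255.
VRID_MIN, VRID_MAX = 1, 255


def allocate_vr_id(used: Iterable[int]) -> int:
    """Return the lowest VRID in 1..255 that is not in `used`."""
    taken = set(used)
    for candidate in range(VRID_MIN, VRID_MAX + 1):
        if candidate not in taken:
            return candidate
    raise RuntimeError("all 255 VRRP virtual router ids are in use")
```

With this policy, `allocate_vr_id([1, 2, 4])` returns 3, and the call fails once every id from 1 to 255 is taken, which is the point at which the VRRP-based driver cannot host another HA router in the same scope.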

A new table will be introduced to specify the association between a router, the agents and the HA ports that are going to be used for the HA administrative traffic.

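+----------------+------+-----------+---------------+-----------------------+-----+----------------+
| Attribute Name | Type | Access    | Default Value | Validation/Conversion | Key | Description    |
+================+======+===========+===============+=======================+=====+================+
| port_id        | UUID | RW, admin | N/A           | N/A                   | PRI | HA port id     |
+----------------+------+-----------+---------------+-----------------------+-----+----------------+
| router_id      | UUID | RW, admin | N/A           | N/A                   |     |                |
+----------------+------+-----------+---------------+-----------------------+-----+----------------+
| l3_agent_id    | UUID | RW, admin | N/A           | N/A                   |     |                |
+----------------+------+-----------+---------------+-----------------------+-----+----------------+
| priority       | int  | RW, admin | 50            | N/A                   |     |                |
+----------------+------+-----------+---------------+-----------------------+-----+----------------+
| state          | enum | RW, admin | N/A           | N/A                   |     | active/standby |
+----------------+------+-----------+---------------+-----------------------+-----+----------------+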
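As an illustration of the association table, the sketch below models one row as a plain Python dataclass. The class names and the helper function are hypothetical, and a real implementation would define this as a database model; the field names, the default priority of 50, and the active/standby states come from the table.

```python
import enum
from dataclasses import dataclass
from typing import Iterable, List


class HAState(enum.Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class HARouterAgentPortBinding:  # hypothetical class name
    port_id: str        # primary key: the HA port used for HA traffic
    router_id: str
    l3_agent_id: str
    state: HAState      # no default, matching the N/A default in the table
    priority: int = 50  # default value from the table


def agents_hosting(
    bindings: Iterable[HARouterAgentPortBinding], router_id: str
) -> List[str]:
    """Return the l3 agent ids that host an instance of the given router."""
    return [b.l3_agent_id for b in bindings if b.router_id == router_id]
```

Each HA router is expected to have one such row per agent it is scheduled to, so a router scheduled to two agents yields two bindings with distinct HA ports.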

REST API impact

router-create: Create a router for a given tenant.

::

    router-create --name another_router --ha=true

Only admins can set this attribute. Tenants need not be aware of it, so it is not visible to them.

Request

::

    POST /v2.0/routers
    Accept: application/json

    {
        "router": {
            "name": "another_router",
            "admin_state_up": true,
            "ha": true
        }
    }

Response

::

    {
        "router": {
            "status": "ACTIVE",
            "external_gateway_info": null,
            "name": "another_router",
            "admin_state_up": true,
            "ha": true,
            "tenant_id": "6b96ff0cb17a4b859e1e575d221683d3",
            "id": "8604a0de-7f6b-409a-a47c-a1cc7bc77b2e"
        }
    }
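The create request can also be built programmatically. The following sketch uses only the Python standard library and constructs the request without sending it. The function name, the endpoint URL, and the token value are placeholders; `X-Auth-Token` is the usual header for passing a Keystone token.

```python
import json
import urllib.request


def build_create_ha_router_request(
    endpoint: str, token: str, name: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2.0/routers request with ha=true."""
    body = {"router": {"name": name, "admin_state_up": True, "ha": True}}
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/v2.0/routers",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Token": token,  # must be an admin token to set "ha"
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is expected to have the shape shown above.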

router-show: Show information of a given router.

Request

::

    GET /v2.0/routers/a9254bdb-2613-4a13-ac4c-adc581fba50d
    Accept: application/json

Response

::

    {
        "routers": [
            {
                "status": "ACTIVE",
                "external_gateway_info": {
                    "network_id": ""
                },
                "name": "router1",
                "admin_state_up": true,
                "ha": true,
                "tenant_id": "33a40233088643acb66ff6eb0ebea679",
                "id": "a9254bdb-2613-4a13-ac4c-adc581fba50d"
            }
        ]
    }

router-update: Update a given router.

Only admins can update the HA mode of a router.

Admin only context:

::

    neutron router-update router1 --ha=True

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

There will be no network performance impact. Spawning a new virtual router may take slightly longer because the Keepalived and conntrackd processes have to be started.

Other deployer impact

Since this implementation relies on Keepalived, Keepalived will have to be deployed on each l3 node. Keepalived version 1.2.0 is required for IPv6 support.

In addition, conntrackd must run on each node.

There is no plan to automatically migrate existing virtual routers to HA virtual routers when upgrading a previous OpenStack installation. After an upgrade, with the l3_ha configuration parameter set to "True", newly created routers will be HA while older ones will remain unchanged. Cloud admins can migrate existing virtual routers to HA routers by using the new API. This API is not exposed to tenants.
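The upgrade behaviour described above can be summarised in a short sketch. Both function names are hypothetical, and the rule that an explicit admin-supplied `ha` value takes precedence over the `l3_ha` default is an assumption, not something the blueprint states.

```python
from typing import Dict, Iterable, List, Optional


def effective_ha(requested_ha: Optional[bool], l3_ha_default: bool) -> bool:
    """Decide whether a newly created router is HA.

    Assumption: an explicit value from an admin wins; otherwise the
    l3_ha configuration default applies.
    """
    return l3_ha_default if requested_ha is None else requested_ha


def routers_to_migrate(routers: Iterable[Dict[str, object]]) -> List[str]:
    """Return ids of existing routers that are not HA yet.

    Such routers stay unchanged after an upgrade until an admin migrates
    them through the admin-only API.
    """
    return [str(r["id"]) for r in routers if not r.get("ha", False)]
```

An admin could use the second helper to list pre-upgrade routers and then update each one with the HA attribute set.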

Developer impact

None

Implementation

Assignee(s)

Primary assignee:

Sylvain Afchain <sylvain-afchain>

Other contributors:

Assaf Muller <amuller>

Work Items

  1. HA L3 Extension, DB bases

  2. HA L3 Scheduler

  3. Keepalived manager

  4. L3 agent driver abstraction introduction, Keepalived driver

  5. Conntrackd support

Dependencies

None

Testing

The code will be covered by unit tests. Once multi-node testing is available, Tempest tests will be introduced.

A document explaining how to test all the patches during the review process is maintained here:

https://docs.google.com/document/d/1P2OnlKAGMeSZTbGENNAKOse6B2TRXJ8keUMVvtUCUSM

Documentation Impact

Document deployer impacts.

References

https://review.openstack.org/#/q/topic:bp/l3-high-availability,n,z

https://git.openstack.org/cgit/openstack/neutron-specs/tree/specs/juno/neutron-ovs-dvr.rst

https://wiki.openstack.org/wiki/Neutron/L3_High_Availability_VRRP