Report HA Router Master
Highly available routers are new functionality that was merged as part of the l3-high-availability blueprint. HA routers are scheduled on multiple L3 agents; however, the cloud operator has no way of knowing where the active instance is.
A cloud operator can know which L3 agents are hosting a router, but not where the active instance is. Legacy routers may be manually moved from one agent to another. With HA routers, the equivalent is moving the active instance, but that is not currently possible. The first step is to know where the active instance is, which this blueprint addresses. Setting the location of the active instance is out of scope and will be addressed in the future.
The operator might want to perform node maintenance, which is made easier by manually moving routers off the node. Likewise, the operator might want to see the state of routers after a failover (did the active instance actually fail over?).
Currently, l3-agent-list-hosting-router shows all L3 agents hosting the router. It will now also show the HA state (active, standby or fault) of the router on every agent.
+-----------+------+----------------+-------+----------+
| id        | host | admin_state_up | alive | ha_state |
+-----------+------+----------------+-------+----------+
| 534c4b37- | net1 | True           | :-)   | active   |
| da2730c6- | net2 | True           | :-)   | standby  |
| 7abcd991- | net3 | True           | xxx   | fault    |
+-----------+------+----------------+-------+----------+
Keepalived doesn’t support a way to query the current VRRP state. The only way to learn it is therefore to use notifier scripts. These scripts are executed when a state transition occurs, and receive the new state (master, backup or fault).
Every time we reconfigure keepalived (when the router is created or updated), we tell it to execute a Python script (maintained as part of the repository).
The script will:
1. Write the new state to a file in $state_path/ha_confs/router_id/state.
2. Notify the agent that a transition has occurred via a Unix domain socket.
Step 1 happens in the script, and not in the agent after it receives the notification, because we want to record the state transition whenever it happens so that it isn’t lost if the agent is down. Since keepalived does not expose a way to query the current state, a state transition that occurred but was not written down would be lost forever.
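The two steps above can be sketched roughly as follows. This is a minimal illustration and not the actual neutron-keepalived-state-change code: the function names, the socket path argument and the "router_id state" wire format are all assumptions made for the example.

```python
import os
import socket


def write_state(state_path, router_id, state):
    """Step 1: persist the new VRRP state so it survives an agent outage."""
    conf_dir = os.path.join(state_path, "ha_confs", router_id)
    os.makedirs(conf_dir, exist_ok=True)
    state_file = os.path.join(conf_dir, "state")
    with open(state_file, "w") as f:
        f.write(state)
    return state_file


def notify_agent(socket_path, router_id, state):
    """Step 2: best-effort notification over a Unix domain socket.

    Returns False if the agent cannot be reached. The message format
    used here is hypothetical.
    """
    message = f"{router_id} {state}".encode()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(socket_path)
            sock.sendall(message)
        return True
    except OSError:
        return False


def handle_transition(state_path, socket_path, router_id, state):
    # The write always happens first, so the state is on disk even when
    # the notification fails because the agent is down.
    write_state(state_path, router_id, state)
    return notify_agent(socket_path, router_id, state)
```

Because the write precedes the notification, an agent that was down during the transition can still recover the state from the file when it restarts.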
The L3 agent will start and stop the metadata proxy when it receives a notification. This saves memory by running the proxy only on the active instance, which can be important at scale as every proxy takes 20+ MB.
The L3 agent will batch these state change notifications over a period of T seconds. When T seconds have passed and no new notifications have arrived, it will send an RPC message to the server with a map of router ID to VRRP state on that specific agent. In other words, once an event is received by the agent, it batches all subsequent events over a period of T seconds, and when the timer goes off it sends all of the state changes to the controller in a single message. If a router changes state multiple times during the batching period, the agent will only send the most recent state.
Additionally, every time the agent starts it gets a list of routers scheduled on the agent. The agent will now loop through these routers, collect their HA states from disk and update the server. This catches any state changes that occurred while the agent was down.
Sending the RPC message will be retried if the management network is temporarily down or the agent is disconnected from it.
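One way to picture the batching is the sketch below. It assumes the timer starts on the first event and fires T seconds later, and it keeps only the latest state per router. The class name and the send callback are hypothetical and stand in for the agent's RPC call; this is not the Nova notifier code that the implementation intends to reuse.

```python
import threading


class StateChangeBatcher:
    """Collect router state changes and send them as one batch."""

    def __init__(self, interval, send):
        self._interval = interval   # T, in seconds
        self._send = send           # callable taking {router_id: state}
        self._pending = {}
        self._timer = None
        self._lock = threading.Lock()

    def notify(self, router_id, state):
        with self._lock:
            # A later state for the same router overwrites the earlier one.
            self._pending[router_id] = state
            if self._timer is None:
                self._timer = threading.Timer(self._interval, self.flush)
                self._timer.daemon = True
                self._timer.start()

    def flush(self):
        with self._lock:
            batch, self._pending = self._pending, {}
            if self._timer is not None:
                self._timer.cancel()
            self._timer = None
        if batch:
            self._send(batch)
```

With this shape, a router that flaps from master to backup within one window results in a single entry in the map that is sent.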
On receiving the RPC message, the server will persist this information. The tables are already set up for this: each router has an entry in the HA bindings table per agent it is scheduled to, and the record contains the VRRP state on that specific agent. The controller will also persist the last time a state change was received, so that in a split brain situation the admin would be able to understand which is the ‘real’ master by observing the timestamps.
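The server-side bookkeeping might look something like the following in-memory sketch. The HABinding class, its field names and both helper functions are illustrative assumptions; the real data lives in the HA bindings table, and how the admin interprets the timestamps is left open here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class HABinding:
    """Hypothetical stand-in for one router-to-agent binding row."""
    router_id: str
    agent_host: str
    state: str = "standby"
    state_changed_at: Optional[datetime] = None


def update_states(bindings, agent_host, states, now=None):
    """Apply one agent's {router_id: state} report to its bindings."""
    now = now or datetime.now(timezone.utc)
    for router_id, state in states.items():
        binding = bindings.get((router_id, agent_host))
        if binding is None:
            continue  # router is not scheduled to this agent
        if binding.state != state:
            binding.state = state
            binding.state_changed_at = now


def active_bindings(bindings, router_id):
    """Bindings reporting 'active' for a router, oldest change first."""
    found = [b for b in bindings.values()
             if b.router_id == router_id and b.state == "active"]
    return sorted(found, key=lambda b: b.state_changed_at)
```

If active_bindings returns more than one entry for a router, the admin is looking at a possible split brain and can compare the recorded change times.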
Optionally*, the server will look for dead agents (those that have not sent heartbeats in a while) and will mark their HA routers as down. This will aid the main use case of a hypervisor dying (and therefore being unable to report any state changes) and another hypervisor taking over all of its routers. In this case the API will return ‘active’ for all routers on both machines until the server notices that the first agent died and marks its routers as down.
* This is an optional enhancement that could be added after the main change lands, if we find it appropriate.
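A rough sketch of the dead-agent check, under stated assumptions: heartbeats and HA states are plain dictionaries here, the function name is hypothetical, the 75-second threshold is only an illustrative default, and routers on dead agents are moved to ‘fault’ as the work items suggest.

```python
from datetime import datetime, timedelta, timezone


def mark_dead_agents(heartbeats, ha_states, now=None,
                     agent_down_time=timedelta(seconds=75)):
    """Move HA routers on silent agents to the 'fault' state.

    heartbeats: {agent_host: last heartbeat time}
    ha_states:  {(router_id, agent_host): state}, updated in place
    Returns the set of agents considered dead.
    """
    now = now or datetime.now(timezone.utc)
    dead = {host for host, seen in heartbeats.items()
            if now - seen > agent_down_time}
    for router_id, host in ha_states:
        if host in dead:
            ha_states[(router_id, host)] = "fault"
    return dead
```

After such a pass, only the surviving hypervisor would still be reported as ‘active’ for the routers it took over.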
Data Model Impact
The HA state of every router to agent binding is persisted in the L3HARouterAgentPortBinding table. It is currently unused. A DB migration will be necessary to add timestamps as well as the ‘fault’ state, as currently only ‘active’ and ‘standby’ can be persisted.
REST API Impact
l3-agent-list-hosting-router will now return an extra column that can be ‘active’, ‘standby’ or ‘fault’ for HA routers, or None for other types of routers.
keepalived runs as root, as does the transition script that it invokes. The transition script talks to the agent via a Unix domain socket.
Other End User Impact
python-neutronclient will support the new ha_state column. It will show ‘active’, ‘standby’ or ‘fault’ when a proper response is received. ‘-’ will be displayed if None is received, either from an old server or for non-HA routers.
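The client-side display rule amounts to a small fallback, sketched below. The function name and the dictionary shape of an agent entry are assumptions for illustration and are not taken from python-neutronclient.

```python
def format_ha_state(agent):
    """Return the text shown in the ha_state column for one agent entry."""
    # An old server omits the key entirely; non-HA routers report None.
    state = agent.get("ha_state")
    return state if state is not None else "-"
```

Both the missing key and an explicit None fall through to ‘-’, so the client does not need to know which server version it is talking to.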
Assuming two L3 agents and 1,000 routers hosted on each, a failover from node 1 to node 2 should induce only a single RPC call from node 2 to the server, and a single DB transaction.
Other Deployer Impact
Instead of neutron-keepalived-state-change notifying the agent via a Unix domain socket, the agent could poll for the state of all HA routers every T seconds. It would then diff the new states against a cached copy and notify the server of any changes. One could argue that this is simpler to implement and maintain, but it is less performant.
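For comparison, the polling alternative could be sketched like this, assuming the agent reads the per-router state files under $state_path/ha_confs/<router_id>/state on each pass. The function names are hypothetical.

```python
import os


def read_states(state_path, router_ids):
    """Read the last recorded HA state of each router from disk."""
    states = {}
    for router_id in router_ids:
        path = os.path.join(state_path, "ha_confs", router_id, "state")
        try:
            with open(path) as f:
                states[router_id] = f.read().strip()
        except OSError:
            continue  # no state recorded yet for this router
    return states


def diff_states(cached, current):
    """Return only the routers whose state differs from the cached copy."""
    return {router_id: state for router_id, state in current.items()
            if cached.get(router_id) != state}
```

Each polling pass touches one file per HA router even when nothing changed, which is where the performance cost of this alternative comes from. The same disk read is also what the agent would need at startup to report the states it missed while it was down.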
- Primary assignee:
Assaf Muller <amuller>
The current keepalived notifier bash scripts are generated inline. They will be replaced by a Python script maintained as part of the repository. The script will be available as neutron-keepalived-state-change and will be invoked by keepalived.
At first, the script will replicate the existing behavior of the bash scripts: write the new state to disk and start up or shut down the metadata proxy.
The script must also notify the agent of the state change via a Unix domain socket. Starting and stopping the metadata proxy will be moved to the agent.
The RPC message that updates HA router states will be implemented (it already exists but cannot be used without changing its format).
The agent will batch up state change notifications into a single RPC message. The Nova notifier mechanism batches notifications and the code will be reused.
The API must expose the new ha_state column.
The L3 agent must report HA states after it starts.
Add the fault state and state change timestamp via a DB migration patch.
Optional: The controller will look for dead agents and move their HA routers to the fault state.
L3 HA cannot be tested in Tempest without multi-node support. L3 HA is the first candidate to be tested when in-tree integration tests are introduced via the integration-tests blueprint.
The L3 agent already has functional testing in place. Two new tests will be added:
When a state change occurs, the notification arrives at the agent.
When multiple state changes occur, the RPC call is sent to the server with the expected parameters.
The RPC and DB methods will be tested with unit tests.
The changes to the API and CLI require documentation.
The CLI client documentation must be updated.
The Neutron API change must be documented.