Multiple L3 and DHCP agents in Neutron

https://blueprints.launchpad.net/fuel/+spec/fuel-multiple-l3-agents

In Fuel 5.1 and earlier, the HA network solution was based on a single neutron-l3-agent and a single DHCP agent, which could be switched between controllers.

This blueprint describes how to use multiple L3 and DHCP agents instead of a single agent of each type. This is required to improve network scalability and Neutron performance.

Problem description

When a virtual router was created in Neutron, it was scheduled to an L3 agent (if several agents were running, one of the alive agents was chosen at random). Before Juno, the Neutron server did not monitor the lifecycle of the agent serving a router. If the L3 agent service stopped on its node, or the node lost connectivity, the Neutron server did not reschedule the router to another L3 agent. As a result, Neutron itself offered no HA network solution.
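A minimal sketch of the pre-Juno behavior described above, assuming a simplified in-memory model: the `Agent` class, `schedule_router` and the host names are hypothetical and are not Neutron code. The router is bound once to a randomly chosen alive agent, and nothing moves it when that agent later dies.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """Simplified stand-in for an L3 agent record (hypothetical)."""
    host: str
    alive: bool = True


def schedule_router(router_id, agents, bindings, rng=random):
    """Bind a router to one alive agent chosen at random.

    The binding is stored once and never revisited: there is no
    rescheduling step, which is the gap described above.
    """
    candidates = [a for a in agents if a.alive]
    if not candidates:
        raise RuntimeError("no alive L3 agents")
    chosen = rng.choice(candidates)
    bindings[router_id] = chosen
    return chosen


agents = [Agent("node-1"), Agent("node-2"), Agent("node-3")]
bindings = {}
first = schedule_router("router-a", agents, bindings)

# The serving agent dies, but nothing moves the router away from it.
first.alive = False
print(bindings["router-a"].host, "alive:", bindings["router-a"].alive)
```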

Proposed change

Juno introduced several solutions to this problem. The simplest one is to use Neutron's internal router rescheduling mechanism, in which the Neutron server automatically monitors the lifecycle of L3 agents. If an agent is marked as dead, the Neutron server safely moves all routers associated with it to an alive agent on another node, and auxiliary resources created by the dead agent, such as additional interfaces and iptables rules, are removed. In some cases, however, auxiliary resources remain on the node of the dead agent and can still affect connectivity to instances, for example when the L3 agent process is alive but has lost its connection to the message broker. To avoid such problems, an additional monitoring and cleanup mechanism should be added, and it must be easy to use from Pacemaker. The current rescheduling script must also be modified to match the proposed changes.
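The rescheduling idea can be sketched as follows, assuming each agent reports a heartbeat timestamp and counts as dead once its last heartbeat is older than a timeout. The names `AgentState`, `reschedule_dead_agents` and `AGENT_DOWN_TIME`, the 75-second value and the round-robin spreading are illustrative assumptions, not Neutron's implementation; cleanup of auxiliary resources on the dead node is outside this sketch.

```python
from dataclasses import dataclass

# Assumed heartbeat timeout in seconds (illustrative value only).
AGENT_DOWN_TIME = 75


@dataclass
class AgentState:
    host: str
    last_heartbeat: float  # timestamp in seconds of the last report


def is_alive(agent, now, down_time=AGENT_DOWN_TIME):
    """An agent counts as alive if it reported within down_time seconds."""
    return now - agent.last_heartbeat <= down_time


def reschedule_dead_agents(agents, bindings, now):
    """Move routers bound to dead agents onto alive agents on other nodes.

    `bindings` maps router id -> host of the serving agent and is updated
    in place. Rescheduled routers are spread round-robin over the alive
    agents. Returns a list of (router, old_host, new_host) tuples.
    """
    alive = [a.host for a in agents if is_alive(a, now)]
    dead = {a.host for a in agents if not is_alive(a, now)}
    if not alive:
        return []  # no target agent available; leave bindings unchanged
    moves = []
    for router in sorted(bindings):
        old_host = bindings[router]
        if old_host in dead:
            new_host = alive[len(moves) % len(alive)]
            bindings[router] = new_host
            moves.append((router, old_host, new_host))
    return moves


now = 1000.0
agents = [
    AgentState("node-1", now - 10),
    AgentState("node-2", now - 200),  # missed heartbeats: considered dead
    AgentState("node-3", now - 5),
]
bindings = {"r1": "node-2", "r2": "node-1", "r3": "node-2"}
moves = reschedule_dead_agents(agents, bindings, now)
print(moves)
```

Because the dead host never appears in the alive list, every rescheduled router lands on an agent on a different node, matching the behavior described above.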

This feature keeps connectivity to instances stable even if one or more L3 agents fail. It also makes it possible to distribute routers effectively across all available agents, which improves network performance.
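To illustrate how routers can be spread across all available agents, here is a minimal least-loaded placement sketch: each new router goes to the alive agent currently hosting the fewest routers. This is one possible strategy chosen for illustration. `pick_least_loaded`, `place_routers` and the data layout are hypothetical, and this is not necessarily the scheduler that Fuel configures.

```python
from collections import Counter


def pick_least_loaded(alive_hosts, bindings):
    """Return the alive host with the fewest routers; ties go to the lowest name."""
    if not alive_hosts:
        raise RuntimeError("no alive L3 agents")
    load = Counter(bindings.values())
    return min(sorted(alive_hosts), key=lambda h: load[h])


def place_routers(router_ids, alive_hosts, bindings):
    """Place each new router on the currently least-loaded alive host."""
    for router in router_ids:
        bindings[router] = pick_least_loaded(alive_hosts, bindings)
    return bindings


bindings = {"r1": "node-1", "r2": "node-1"}
place_routers(["r3", "r4", "r5"], ["node-1", "node-2", "node-3"], bindings)
print(Counter(bindings.values()))
```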

For DHCP agents, multiple-agent mode is implemented as an experimental feature and is disabled by default.
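In multiple-agent mode each network is served by several DHCP agents at once. The sketch below assumes the per-network agent count corresponds to the dhcp_agents_per_network setting listed in the Implementation section and fills free slots with the least-loaded alive agents. `schedule_network` and the data layout are hypothetical.

```python
from collections import Counter


def schedule_network(network_id, alive_hosts, hosting, agents_per_network=3):
    """Attach a network to up to `agents_per_network` DHCP agents.

    `hosting` maps network id -> set of hosts whose DHCP agent serves it.
    Alive agents already serving the network are kept and dead ones are
    dropped; free slots are filled with the least-loaded alive agents.
    If fewer agents are alive than requested, all of them serve the network.
    """
    current = {h for h in hosting.get(network_id, set()) if h in alive_hosts}
    load = Counter(h for hosts in hosting.values() for h in hosts)
    spare = sorted((h for h in alive_hosts if h not in current),
                   key=lambda h: (load[h], h))
    needed = max(agents_per_network - len(current), 0)
    hosting[network_id] = current | set(spare[:needed])
    return hosting[network_id]


hosting = {"net-a": {"node-1", "node-2", "node-3"}}
alive = ["node-1", "node-2", "node-3", "node-4"]
served = schedule_network("net-b", alive, hosting)
print(sorted(served))
```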

Alternatives

The Juno release introduces the DVR (Distributed Virtual Router) L3 agent, which could serve as an alternative router solution. However, it serves only VMs with floating IP addresses and does not change the behavior for VMs without a floating IP. It also does not change the behavior of DHCP agents.

Another solution is based on VRRP. Its drawback is that it does not cover the situation where both VRRP nodes are down. It also requires an external rescheduling mechanism.

Data model impact

None

REST API impact

None

Upgrade impact

None

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

- Recovery time after a Neutron agent failure will decrease.
- The load on each individual controller will decrease.
- Customers will be able to add any number of nodes running Neutron agents, so network scalability will grow.

Other deployer impact

None

Developer impact

None

Implementation

The following options are available in astute.yaml:
- quantum_settings/L3/multiple_dhcp_agents (default: false)
- quantum_settings/L3/dhcp_agents_per_network (default: 3)
- quantum_settings/L3/multiple_l3_agents (default: true)

The OCF scripts for the L3 and DHCP agents have a new “multiple_agents” option that allows running the agents in non-singleton mode.

The cluster::neutron::l3 and cluster::neutron::dhcp classes have a new “multiple_agents” option that configures the agents to run in multiple-agent mode.

The cluster::neutron::dhcp class has a new “agents_per_net” option (default: 3) that sets the number of DHCP agents serving each network. This default was chosen for performance reasons.
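How these astute.yaml options could feed the Puppet class parameters can be sketched as below, assuming the YAML has already been parsed into a Python dict (for example with a YAML library). The helper `neutron_ha_params` and the exact option-to-parameter mapping are illustrative assumptions; only the option names and defaults come from the list above.

```python
# Defaults of the quantum_settings/L3 options listed above.
L3_DEFAULTS = {
    "multiple_dhcp_agents": False,
    "dhcp_agents_per_network": 3,
    "multiple_l3_agents": True,
}


def neutron_ha_params(astute):
    """Map a parsed astute.yaml dict to class parameters (hypothetical mapping).

    Missing options fall back to the documented defaults.
    """
    settings = dict(L3_DEFAULTS)
    l3_section = (astute.get("quantum_settings") or {}).get("L3") or {}
    settings.update(l3_section)
    return {
        "cluster::neutron::l3": {
            "multiple_agents": bool(settings["multiple_l3_agents"]),
        },
        "cluster::neutron::dhcp": {
            "multiple_agents": bool(settings["multiple_dhcp_agents"]),
            "agents_per_net": int(settings["dhcp_agents_per_network"]),
        },
    }


# Example: enable only the experimental multiple-DHCP-agent mode.
params = neutron_ha_params({"quantum_settings": {"L3": {"multiple_dhcp_agents": True}}})
print(params)
```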

Backward compatibility

The “multiple_agents” option of the OCF scripts controls the behavior of the L3 and DHCP agents. In addition, to restore the old-style behavior of the L3/DHCP agents, the clone size of the corresponding Pacemaker resources should be reduced to “1”.
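The effect of the clone size can be illustrated with a simplified model of how many agent instances run. It assumes Pacemaker starts at most one instance of a cloned resource per node (clone-node-max of 1) and at most clone-max instances in total. With a clone size of 1 there is a single active agent, which gives the old-style behavior. `running_agents` is a hypothetical helper.

```python
def running_agents(online_controllers, clone_max, clone_node_max=1):
    """Number of agent instances a cloned Pacemaker resource can run.

    Simplified model (assumption): at most clone_node_max instances per
    online node and at most clone_max instances in total.
    """
    return max(0, min(clone_max, online_controllers * clone_node_max))


# Multiple-agent mode: one agent per controller.
print(running_agents(online_controllers=3, clone_max=3))
# Old-style behavior: clone size reduced to 1, so a single active agent,
# still a single instance after one controller goes offline.
print(running_agents(online_controllers=3, clone_max=1))
print(running_agents(online_controllers=2, clone_max=1))
```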

Work Items

- Update Puppet manifests to enable multiple L3 agents
- Add the necessary patches to Neutron for additional agent monitoring
- Edit the rescheduling script and the Pacemaker OCF script to support multiple-agent behavior

Assignee(s)

Sergey Vasilenko, Eugene Nikanorov, Oleg Bondarev, Sergey Kolekonov

Dependencies

None

Documentation Impact

The new Neutron server behavior when L3 agents die should be reflected in the documentation, so that possible problems can be debugged correctly.

References

None

Testing

- Deploy an HA cluster.
- Verify that all instances remain available via floating IPs and keep Internet access even if a whole controller fails, and in specific failure cases such as message broker failures.

OpenStack
