https://blueprints.launchpad.net/fuel/+spec/fuel-multiple-l3-agents
In Fuel 5.1 and earlier, the HA network solution was based on a single neutron-l3-agent and a single DHCP agent, each of which could be switched between controllers.
This blueprint describes a way to use multiple L3 and DHCP agents instead of a single one of each. This is required for network scalability and for Neutron performance improvements.
When a virtual router was created in Neutron, it was scheduled to an L3 agent (with multiple agents, one of the alive agents was chosen at random). Before Juno, the Neutron server did not monitor the life cycle of the agent serving a router. If the L3 agent service stopped on its node, or connectivity to that node was lost, the Neutron server did not reschedule the router to another L3 agent. Neutron itself therefore provided no HA network solution.
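The pre-Juno behavior described above can be illustrated with a minimal sketch. It is not Neutron code: `schedule_router`, the `agents` mapping (host name to alive flag) and the `bindings` mapping (router to host) are hypothetical names introduced only for this illustration.

```python
import random


def schedule_router(router_id, agents, bindings, rng=random):
    """Bind a router to one randomly chosen alive L3 agent.

    `agents` maps a host name to an alive flag and `bindings` maps a
    router ID to the chosen host. Both structures are assumptions made
    for this sketch, not Neutron data structures.
    """
    alive = sorted(host for host, is_alive in agents.items() if is_alive)
    if not alive:
        raise RuntimeError("no alive L3 agent to schedule the router to")
    bindings[router_id] = rng.choice(alive)
    return bindings[router_id]


agents = {"node-1": True, "node-2": False}
bindings = {}
schedule_router("router-1", agents, bindings)

# The agent dies later. Nothing in this model re-evaluates the binding,
# so the router stays attached to the dead agent.
agents["node-1"] = False
```

Because the binding is decided once and never revisited, the router loses service as soon as its agent fails, which is the gap the rest of this blueprint addresses.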
In Juno, several solutions to this problem were introduced. The easiest one is to use the internal Neutron router rescheduling mechanism. In that case the Neutron server automatically monitors the life cycle of L3 agents. If an agent is marked as dead, the Neutron server safely moves all routers associated with it to an alive agent on another node, and the auxiliary resources created by the dead agent, such as additional interfaces and iptables rules, are removed. In some cases, however, auxiliary resources are kept on the dead node and can affect connectivity to instances, for example when the L3 agent is alive but has lost its connection to the message broker. To avoid such problems, an additional monitoring and cleanup mechanism should be added. It must be easily usable by Pacemaker. The current rescheduling script must be modified to match the proposed changes.
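The rescheduling idea can be sketched as follows. This is a simplified model under stated assumptions, not the Neutron implementation: the `L3Agent` class, the heartbeat-based liveness check, the 75-second `AGENT_DOWN_TIME` threshold and the least-loaded target selection are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed threshold: an agent with no heartbeat for this many seconds
# is treated as dead.
AGENT_DOWN_TIME = 75.0


@dataclass
class L3Agent:
    host: str
    last_heartbeat: float
    routers: set = field(default_factory=set)


def is_alive(agent, now, down_time=AGENT_DOWN_TIME):
    """An agent is alive if its last heartbeat is recent enough."""
    return now - agent.last_heartbeat <= down_time


def reschedule_routers(agents, now, down_time=AGENT_DOWN_TIME):
    """Move routers from dead agents to the least-loaded alive agent.

    Returns a mapping of router ID to the host it was moved to.
    If no agent is alive, nothing is moved.
    """
    alive = [a for a in agents if is_alive(a, now, down_time)]
    moved = {}
    if not alive:
        return moved
    for agent in agents:
        if is_alive(agent, now, down_time):
            continue
        for router_id in sorted(agent.routers):
            # Least-loaded selection is an assumption of this sketch;
            # ties are broken by host name to keep it deterministic.
            target = min(alive, key=lambda a: (len(a.routers), a.host))
            target.routers.add(router_id)
            moved[router_id] = target.host
        agent.routers.clear()
    return moved


agents = [
    L3Agent("node-1", last_heartbeat=0.0, routers={"r1", "r2"}),
    L3Agent("node-2", last_heartbeat=100.0, routers={"r3"}),
    L3Agent("node-3", last_heartbeat=100.0),
]
moved = reschedule_routers(agents, now=100.0)
```

The sketch only updates the router-to-agent mapping. It does not model the case highlighted above, where an agent that merely lost its message broker connection still holds interfaces and iptables rules on its node; that cleanup is what the additional Pacemaker-driven mechanism is meant to cover.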
This feature keeps connections to instances permanent and stable even if one or more L3 agents fail. It also allows routers to be distributed effectively between all available agents to improve network performance.
For the DHCP agent, multiple-agent mode is implemented as an experimental feature and is disabled by default.
In the Juno release the DVR L3 agent was introduced. It looks like an alternative router solution, but it serves only VMs with floating IP addresses and doesn’t change the behavior for VMs without a floating IP (FIP). It also doesn’t change the behavior of DHCP agents.
There’s another solution based on VRRP. The problem is that it doesn’t cover the situation where both VRRP nodes are down. It also needs an external rescheduling mechanism.
None
None
None
None
None
None
None
None
The “multiple_agents” option of the OCF scripts controls the behavior of the L3 and DHCP agents. In addition, to get the old-style behavior of the L3/DHCP agents, the clone size of the corresponding Pacemaker resources should be decreased to “1”.
Sergey Vasilenko, Eugene Nikanorov, Oleg Bondarev, Sergey Kolekonov
None
The new Neutron server behavior when L3 agents are dead should be described in the documentation so that possible problems can be debugged correctly.
None