|Author:||Rossella Sblendido <firstname.lastname@example.org>|
|Author:||Marios Andreou <email@example.com>|
This blueprint focuses on paying down the technical debt for the OVS L2 agent, as already discussed during the Kilo design summit. The goal of this work is to improve the code quality of the OVS L2 agent, in particular with respect to scalability and performance. These improvements will be evaluated by performing stress tests using Rally and comparing the results before and after this change. Test coverage for the OVS L2 agent will also be improved as part of this blueprint.
Several areas of the L2 agent can be improved to boost performance and scalability. This blueprint tackles the following areas: RPC, device processing and the OVSDB monitor. Each area is analyzed in detail in the next section. Orthogonal to this blueprint, there’s a spec that was completed in Kilo to use the OVS Python library instead of the CLI.
We propose changes for each area identified in the problem description.
With the current implementation, if there’s an error in the communication with the plugin during the agent loop, the sync flag is set to true and the agent performs a complete resync at the next iteration. This means that all the devices will be processed again by the agent. To avoid that, when an error occurs the device being processed will be put in a list of devices in error, and the resync will be performed only for those devices. The agent’s reaction to errors should be improved as well. The agent should analyze the error and perform one of the following actions:
The L2 agent doesn’t have a reliable way of ensuring the state reported on the server side is consistent with the state applied on the backend. Providing a solution to this problem is out of scope for this blueprint.
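The partial resync proposed above for plugin communication errors can be sketched as follows. This is only a minimal illustration, not the agent's actual code: `PluginError`, `process_devices` and `agent_iteration` are hypothetical names, and the per-device callback stands in for whatever RPC calls the agent makes.

```python
class PluginError(Exception):
    """Hypothetical stand-in for a failed RPC exchange with the plugin."""


def process_devices(devices, process_one):
    """Process each device and return the set of devices that failed."""
    failed = set()
    for device in devices:
        try:
            process_one(device)
        except PluginError:
            # Remember only this device instead of flagging a full resync.
            failed.add(device)
    return failed


def agent_iteration(new_devices, failed_devices, process_one):
    """Run one loop iteration over new devices plus earlier failures only."""
    to_process = set(new_devices) | set(failed_devices)
    return process_devices(to_process, process_one)
```

The set returned by one iteration is passed as `failed_devices` to the next, so devices that were processed successfully are never touched again.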
The OVS DB monitor knows which devices have been added, and which ones have been removed. It can therefore be used to generate the events that the agent needs to process, at least the ones initiated by changes on the host such as vif plug and vif unplug. Nevertheless, we just use the OVS DB monitor to “signal” that an event occurred and then scan the bridge again to gather information which was already retrieved by the OVS DB monitor.
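To illustrate how monitor output could drive processing directly, the sketch below folds a batch of monitor events into added and removed port sets without rescanning the bridge. The `(action, port_name)` tuple format and the function name are assumptions made for this example; they do not reflect the real OVSDB monitor output format.

```python
def fold_monitor_events(events):
    """Collapse a batch of monitor events into (added, removed) port sets.

    Each event is a hypothetical (action, port_name) tuple where action is
    "insert" or "delete". For a given port, the last event in the batch wins.
    """
    added, removed = set(), set()
    for action, port in events:
        if action == "insert":
            added.add(port)
            removed.discard(port)
        elif action == "delete":
            removed.add(port)
            added.discard(port)
    return added, removed
```

A port that is plugged and unplugged within the same batch ends up only in the removed set, which is harmless if removing an unknown port is a no-op for the agent.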
Leveraging the OVS DB monitor in this way can also simplify the process of transforming the agent event processing mechanism from a polling loop to a queue-based mechanism. Events can be initiated either on the host itself (e.g. vif plugged) or by the neutron server (e.g. security group membership changed). In many cases these events can be processed independently. Adding new events to queues will make it simpler to enable multiple workers that consume these events, while ensuring that events with a prerequisite event are executed in the appropriate order.
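One possible shape for such a queue-based mechanism is sketched below, assuming that ordering only matters between events for the same device. Every name here (`EventDispatcher`, `submit`, the handler signature) is hypothetical. Each device is mapped to a fixed worker queue, so events for one device are handled in submission order while different devices can be handled in parallel.

```python
import queue
import threading
import zlib


class EventDispatcher:
    """Fan events out to worker threads, keeping per-device ordering."""

    def __init__(self, handler, num_workers=2):
        self._handler = handler
        self._queues = [queue.Queue() for _ in range(num_workers)]
        self._threads = [
            threading.Thread(target=self._run, args=(q,), daemon=True)
            for q in self._queues
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, device, event):
        # A stable hash pins every event for a device to the same queue.
        index = zlib.crc32(device.encode("utf-8")) % len(self._queues)
        self._queues[index].put((device, event))

    def _run(self, q):
        while True:
            item = q.get()
            if item is None:  # Sentinel: stop this worker.
                break
            self._handler(*item)

    def stop(self):
        """Let workers drain their queues, then wait for them to exit."""
        for q in self._queues:
            q.put(None)
        for thread in self._threads:
            thread.join()
```

Pinning a device to one queue is a simple way to respect prerequisites between events on the same port; dependencies that span devices would need a different scheme.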
It would also be possible to use different queues to handle events with different priorities according to their criticality. This can however be done in a subsequent iteration, since it would be an enhancement rather than debt repayment.
There’s an ongoing effort to modify the OVS Python library to make events regarding ports being added or deleted available to its clients. Even though the OVS Python library currently runs a monitor to update its local cache of interfaces, it doesn’t make those events (device added or removed) available to the user of the library. Terry Wilson (otherwiseguy) is working on it.
The L2 agent could make use of this notification system when the change is merged upstream. This is out of scope for this blueprint though.
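As a rough illustration of priority-based handling, the sketch below uses a single standard-library `queue.PriorityQueue` rather than several queues. The priority constants and the class name are hypothetical; a sequence counter keeps events of equal priority in arrival order.

```python
import itertools
import queue

# Hypothetical priority levels; lower numbers are served first.
PRIORITY_HIGH = 0
PRIORITY_LOW = 10


class PrioritizedEvents:
    """Serve events by priority, FIFO within the same priority."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._counter = itertools.count()

    def put(self, priority, event):
        # The counter breaks ties so events are never compared directly.
        self._queue.put((priority, next(self._counter), event))

    def get(self):
        _, _, event = self._queue.get()
        return event

    def empty(self):
        return self._queue.empty()
```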
Modify port update to specify which change occurred to the port
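A port update that names the change could, for instance, carry only the fields that differ. The payload layout and the `build_port_update` helper below are purely hypothetical and are not the format defined by this blueprint; they only show the idea of letting the agent react to the specific change instead of reprocessing the whole port.

```python
def build_port_update(port_id, old_port, new_port):
    """Return a hypothetical port_update payload listing changed fields.

    Only keys present in new_port are compared; keys that disappeared
    from the port are ignored in this sketch.
    """
    changed = {
        key: value
        for key, value in new_port.items()
        if old_port.get(key) != value
    }
    return {"port_id": port_id, "changed_fields": changed}
```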
Performance should be improved. It’s not possible to quantify it now but the following is expected:
This change has been discussed during the Kilo design summit and supports the focus for Kilo to pay down technical debt.
In the end, this blueprint is a list of small changes. Every small change can be discussed and several slightly different variants can be proposed. However, the only general alternatives to this blueprint are to leave the agent as it is or to write a completely new one.
No new tests
Functional tests for ip_lib and ovs_lib
Currently there’s no functional test for the agent. The following cases will be tested: