This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode

Bulk zone update throttling

https://blueprints.launchpad.net/designate/+spec/notify-throttling

Implement a mechanism to throttle the delivery of NOTIFY transactions when a large number of zones are updated at the same time.

Problem description

If a large number of zones are updated in a short time, a correspondingly large number of NOTIFY transactions is sent to the nameservers with no delay, leading to a burst of incoming AXFR requests. This might hit bottlenecks in MiniDNS and the storage layer in terms of CPU, I/O or network bandwidth.

A typical trigger is the update of an NS record in a Pool containing many zones.

The autonomous refreshing of zones performed by resolvers can also trigger a similar burst of AXFR requests. This can happen on recently started resolvers, where the refresh timers can share the same values across many zones.

Related to bug https://bugs.launchpad.net/designate/+bug/1498462

Proposed change

Implement a mechanism for enqueuing and delayed delivery of notify transactions at a configurable throttle speed.

Also, implement staggering of zone refresh requests by randomizing the refresh interval.

API Changes

Expose the count of zones flagged for delayed notify in the Admin API as “/reports/counts/zones_pending_notify”.

Central Changes

Implement support for a new database column “pending_notify” and set it to True every time a Pool NS record is updated.

Storage Changes

Add a new boolean database column “pending_notify” on Zones. Implement a migration script to add the column to existing databases, defaulting to False. In the future, the column might default to True.
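The column addition can be illustrated with a minimal standalone sketch. It uses Python's built-in sqlite3 module and a simplified, hypothetical "zones" table; it is not the project's migration tooling or its real schema, and only shows the intended effect of the migration on existing rows.

```python
import sqlite3


def add_pending_notify_column(conn):
    """Add the pending_notify flag to an existing zones table.

    Existing rows pick up the default value 0 (False).
    """
    conn.execute(
        "ALTER TABLE zones "
        "ADD COLUMN pending_notify BOOLEAN NOT NULL DEFAULT 0"
    )
    conn.commit()


# Hypothetical, simplified schema used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zones (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO zones (name) VALUES ('example.org.')")

add_pending_notify_column(conn)
rows = conn.execute("SELECT name, pending_notify FROM zones").fetchall()
```

After the migration, the pre-existing zone carries the flag with its default value, so no Notify is queued for it until Central sets the flag.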

Other Changes

Implement a Task in Zone Manager to periodically fetch a set of zones that need to receive a Notify, starting with the oldest in terms of last update time. The task frequency and the maximum set size can be configured to throttle the rate of outgoing Notify transactions. Zone Manager will reset the “pending_notify” flag once the Notify has been sent.
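One run of the periodic task could look like the sketch below. It is an illustration under stated assumptions, not the Zone Manager implementation: the "zones" table layout, the "updated_at" column, the notify_batch function and the send_notify callback are all hypothetical names, and sqlite3 stands in for the real storage layer.

```python
import sqlite3


def notify_batch(conn, batch_size, send_notify):
    """Send a Notify for up to batch_size pending zones, oldest update first.

    Returns the names of the zones handled in this run.
    """
    rows = conn.execute(
        "SELECT id, name FROM zones WHERE pending_notify = 1 "
        "ORDER BY updated_at ASC LIMIT ?",
        (batch_size,),
    ).fetchall()
    for zone_id, name in rows:
        send_notify(name)  # hypothetical callback that emits the NOTIFY
        # Reset the flag so the zone is not picked up again.
        conn.execute(
            "UPDATE zones SET pending_notify = 0 WHERE id = ?", (zone_id,)
        )
    conn.commit()
    return [name for _, name in rows]


# Hypothetical schema and data used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zones (id INTEGER PRIMARY KEY, name TEXT, "
    "updated_at INTEGER, pending_notify BOOLEAN NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO zones (name, updated_at, pending_notify) VALUES (?, ?, ?)",
    [
        ("c.example.", 30, 1),
        ("a.example.", 10, 1),
        ("b.example.", 20, 1),
        ("d.example.", 5, 0),  # not flagged, never notified
    ],
)

sent = []
first = notify_batch(conn, 2, sent.append)
second = notify_batch(conn, 2, sent.append)
```

With a maximum set size of 2, the first run handles the two oldest flagged zones and the second run handles the remaining one. The scheduler would call the function once per configured task interval.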

Alternatives

N/A

Implementation

The throttling queue is implemented as a new database column containing a boolean flag. See Central Changes and Storage Changes.

Also, new zones will be created with a uniformly random refresh time between a minimum and a maximum value.
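A minimal sketch of the staggering step follows. The function name and the bounds used in the example are hypothetical; the spec only states that the value is drawn uniformly between a configured minimum and maximum.

```python
import random


def staggered_refresh(min_refresh, max_refresh, rng=random):
    """Return a refresh time in seconds drawn uniformly from [min, max]."""
    if min_refresh > max_refresh:
        raise ValueError("min_refresh must not exceed max_refresh")
    # randint is inclusive on both ends.
    return rng.randint(min_refresh, max_refresh)


# Hypothetical bounds: refresh somewhere between 3500 and 3600 seconds.
rng = random.Random(0)
refresh_values = [staggered_refresh(3500, 3600, rng) for _ in range(1000)]
```

Because each zone gets its own value, zones created together no longer reach their refresh timers at the same instant.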

Design considerations

The throttling queue could be implemented outside of the database:

  • No need to create an extra database column
  • No increased database I/O

We propose using the database for the following reasons:

  • Zone Manager is the best candidate to handle the delayed Notify. Currently there is no way for Central to send a list of Zones to Zone Manager other than through the database
  • The queue can support delayed Notify for changes other than Pool NS record updates
  • Ability to monitor the queue size and ETA to inform the user and for debugging
  • A persistent queue can survive Zone Manager unhandled exceptions or restarts
  • The increased database load is negligible compared to the existing traffic
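The ETA mentioned above can be approximated from the queue size and the two throttle parameters. The sketch below is a rough estimate only: the function and parameter names are hypothetical, and it assumes every task run handles a full batch and that no new zones are flagged in the meantime.

```python
import math


def estimate_drain_seconds(pending_zones, batch_size, interval_seconds):
    """Rough time needed to empty the notify queue.

    Number of task runs required, times the interval between runs.
    """
    if batch_size <= 0 or interval_seconds <= 0:
        raise ValueError("batch_size and interval_seconds must be positive")
    runs = math.ceil(pending_zones / batch_size)
    return runs * interval_seconds


# Hypothetical figures: 10000 flagged zones, 100 zones per run, one run every 5 s.
eta = estimate_drain_seconds(10000, 100, 5)
```

With these example figures the queue needs 100 runs, so the estimate is 500 seconds.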

Risk analysis

  • Zone Manager fails to run the Notify delivery task. The nameservers will eventually refresh the zone anyway. Impact: slow update propagation. Mitigation: expose the notification queue length to the user through the Admin API and by logging.
  • A big notification queue takes a considerable time to be handled. Impact: potentially prevents more urgent changes from being delivered quickly. Mitigation: encourage users to configure the throttling parameters; provide sensible default values. Implementing a concept of notification priority seems unnecessary.

Assignee(s)

Primary assignee:
Federico Ceratto https://launchpad.net/~federico-ceratto

Milestones

Target Milestone for completion:
Liberty-3

Work Items

  • Implement refresh time staggering
  • Implement Notify throttling
  • Add throttle parameters to configuration files
  • Document throttling mechanism
  • Write unit and functional tests
  • Test throttling and staggering on devstack

Dependencies

N/A