This work is licensed under a Creative Commons Attribution 3.0 Unported License. http://creativecommons.org/licenses/by/3.0/legalcode
Implement a mechanism to throttle the delivery of NOTIFY transactions when a large number of zones are updated at the same time.
If a large number of zones are updated in a short time, a correspondingly large number of NOTIFY transactions are sent to the nameservers with no delay, leading to a burst of incoming AXFR requests. This might overload MiniDNS and the storage layer in terms of CPU, I/O, or network bandwidth.
A typical trigger is the update of an NS record in a Pool containing many zones.
The autonomous refreshing of zones performed by resolvers can also trigger a similar burst of AXFR requests. This can happen on recently started resolvers, where the refresh timers can share the same values across many zones.
Related to bug https://bugs.launchpad.net/designate/+bug/1498462
Implement a mechanism for enqueuing and delayed delivery of notify transactions at a configurable throttle speed.
Also, implement staggering of zone refresh requests by randomizing the refresh interval.
Expose the count of zones flagged for delayed notify in the Admin API as “/reports/counts/zones_pending_notify”.
Implement support for a new database column “pending_notify” and set it to True every time a Pool NS record is updated.
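As a minimal sketch of the flagging step, assuming a simplified `zones` table with `pool_id` and `pending_notify` columns (the table layout and the function name `flag_pool_zones` are illustrative, not Designate's actual storage API), a single bulk update could mark every zone in the affected Pool:

```python
import sqlite3


def flag_pool_zones(conn, pool_id):
    """Set pending_notify on every zone in the given pool.

    Hypothetical helper: the table and column layout are assumptions
    made for illustration only.
    """
    cursor = conn.execute(
        "UPDATE zones SET pending_notify = 1 WHERE pool_id = ?",
        (pool_id,),
    )
    conn.commit()
    # rowcount is the number of zones that were flagged.
    return cursor.rowcount
```

A single statement keeps the cost of a Pool NS record update independent of the number of NOTIFY messages that will eventually be sent.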
Add a new boolean database column “pending_notify” on Zones. Implement a migration script to add the column to existing databases, defaulting to False. In the future, the column might default to True.
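The migration could look roughly like the following sketch. It uses SQLite from the Python standard library purely for illustration; the real migration would be written with the project's own migration tooling, and the `upgrade` function name and the `zones` table layout are assumptions:

```python
import sqlite3


def upgrade(conn):
    """Add the pending_notify column to zones, defaulting to False.

    Illustrative only: existing rows receive the default value 0.
    The check makes the migration safe to run more than once.
    """
    columns = [row[1] for row in conn.execute("PRAGMA table_info(zones)")]
    if "pending_notify" not in columns:
        conn.execute(
            "ALTER TABLE zones "
            "ADD COLUMN pending_notify BOOLEAN NOT NULL DEFAULT 0"
        )
        conn.commit()
```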
Implement a Task in Zone Manager to periodically fetch a set of zones that need to receive a NOTIFY, starting with the oldest in terms of last update time. The task frequency and the maximum set size can be configured to throttle the rate of outgoing NOTIFY transactions. Zone Manager will reset the “pending_notify” flag once done.
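One run of such a task can be sketched as follows. This is a simplified illustration, not the Zone Manager implementation: the `zones` table with `updated_at` and `pending_notify` columns, the function names, and the `send_notify` callback are all assumptions.

```python
import sqlite3


def fetch_pending_batch(conn, batch_size):
    """Return up to batch_size flagged zone ids, oldest update first."""
    rows = conn.execute(
        "SELECT id FROM zones WHERE pending_notify = 1 "
        "ORDER BY updated_at ASC LIMIT ?",
        (batch_size,),
    ).fetchall()
    return [row[0] for row in rows]


def run_notify_task(conn, batch_size, send_notify):
    """Send one throttled batch of notifies and reset the flags.

    send_notify is a hypothetical callable taking a zone id. The
    scheduler is expected to call this function once per interval.
    """
    zone_ids = fetch_pending_batch(conn, batch_size)
    for zone_id in zone_ids:
        send_notify(zone_id)
        conn.execute(
            "UPDATE zones SET pending_notify = 0 WHERE id = ?",
            (zone_id,),
        )
    conn.commit()
    return zone_ids
```

With a batch size of B and a task interval of T seconds, at most B notifies are sent every T seconds, so the two settings together define the throttle speed.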
The throttling queue is implemented as a new database column containing a boolean flag. See Central Changes and Storage Changes.
Also, new zones will be created with a uniformly random refresh time between a minimum and a maximum value.
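A minimal sketch of the randomized refresh time, assuming the bounds are given in whole seconds (the function name `random_refresh_time` and the example bounds are illustrative, not existing configuration options):

```python
import random


def random_refresh_time(minimum, maximum, rng=random):
    """Pick a refresh time uniformly at random in [minimum, maximum].

    Both bounds are inclusive. rng can be replaced with a seeded
    random.Random instance to make the choice reproducible.
    """
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return rng.randint(minimum, maximum)


# Example: choose a refresh time between one and two hours.
refresh = random_refresh_time(3600, 7200)
```

Spreading the refresh values this way means that zones created together do not all refresh at the same moment.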
The throttling queue could be implemented outside of the database:

- No need to create an extra database column
- No increased database I/O
We propose using the database for the following reasons:

- Zone Manager is the best candidate to handle the delayed Notify. Currently there is no way for Central to send a list of Zones to Zone Manager other than through the database
- The queue can support delayed Notify for changes other than Pool NS record updates
- Ability to monitor the queue size and ETA to inform the user and for debugging
- A persistent queue can survive Zone Manager unhandled exceptions or restarts
- The increased database load is negligible compared to the existing traffic
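To illustrate the monitoring point, the time needed to drain the queue can be estimated from the pending count, the batch size, and the task interval. The sketch below is a rough upper bound under the assumption that every run sends a full batch; the function and parameter names are hypothetical:

```python
import math


def notify_eta_seconds(pending_zones, batch_size, interval_seconds):
    """Estimate how long the queue takes to drain, in seconds.

    Assumes one batch of at most batch_size zones per interval and
    that the first batch is sent up to one full interval from now.
    """
    if batch_size <= 0 or interval_seconds <= 0:
        raise ValueError("batch_size and interval_seconds must be positive")
    runs = math.ceil(pending_zones / batch_size)
    return runs * interval_seconds
```

For example, 10000 pending zones with a batch size of 100 and a 5 second interval need 100 runs, or about 500 seconds.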