Using the aggregation pipeline instead of a map-reduce job in MongoDB¶
https://blueprints.launchpad.net/ceilometer/+spec/mongodb-aggregation-pipeline
Problem description¶
Currently, a GET “/v2/meters/<meter_name>/statistics” request with the MongoDB backend starts a native map-reduce job in the MongoDB instance. Tests and investigation show that this job performs poorly with a huge number of samples (several million and above). For example, the job processes ~10000 samples per second on my test environment (16 GB RAM, 8 CPU, 1 TB disk, 15000000 samples), so the job for 15M samples takes ~1500 seconds. That is much longer than the default API timeout of 1 minute.
Of course, with the Gnocchi dispatcher we don’t have this issue with statistics, but users who rely only on the MongoDB backend will have trouble with alarm evaluation and building user reports.
Proposed change¶
Add an implementation of the get_meter_statistics method based on the MongoDB aggregation pipeline framework.
From the MongoDB docs: “This framework is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result. The most basic pipeline stages provide filters that operate like queries and document transformations that modify the form of the output document. Other pipeline operations provide tools for grouping and sorting documents by specific field or fields as well as tools for aggregating the contents of arrays, including arrays of documents. In addition, pipeline stages can use operators for tasks such as calculating the average or concatenating a string. The pipeline provides efficient data aggregation using native operations within MongoDB, and is the preferred method for data aggregation in MongoDB.”
My research shows that the aggregation pipeline is up to ~10 times faster than the native map-reduce job: processing 15M samples in the same test environment takes 128 seconds vs. ~1500 seconds with map-reduce.
The aggregation pipeline framework offers a rich set of stages and operators, which allows us to support all existing “statistics” features, as the sketch below illustrates.
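A minimal sketch (not the final implementation) of such a query with pymongo, assuming Ceilometer’s MongoDB schema where samples live in the “meter” collection with counter_name, counter_volume and timestamp fields; the cpu_util meter name is only an example:

    import pymongo

    client = pymongo.MongoClient("mongodb://localhost:27017")
    db = client.ceilometer

    # Filter the requested meter, then compute all aggregates required
    # by the /statistics API in a single $group stage.
    pipeline = [
        {"$match": {"counter_name": "cpu_util"}},
        {"$group": {
            "_id": None,
            "count": {"$sum": 1},
            "sum": {"$sum": "$counter_volume"},
            "avg": {"$avg": "$counter_volume"},
            "min": {"$min": "$counter_volume"},
            "max": {"$max": "$counter_volume"},
            "duration_start": {"$min": "$timestamp"},
            "duration_end": {"$max": "$timestamp"},
        }},
    ]

    for row in db.meter.aggregate(pipeline):
        print(row)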
This change affects only the performance of the statistics request in the Ceilometer MongoDB backend and does not affect the API or other backends.
Risks:
This framework has specific limits. Each pipeline stage is restricted to 100 MB of RAM; beyond that it needs to write temporary files with intermediate stage results to disk. To avoid failures caused by excessive memory use, in MongoDB>=2.6 we can pass the option allowDiskUse=True to the aggregation command, which allows intermediate stage data to be written to temporary files. So the primary risks of this approach are the need for free disk space and the slower performance of disk writes and reads.
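With pymongo, the option can be passed directly to the aggregate call (a sketch, reusing the pipeline from above):

    # allowDiskUse (MongoDB >= 2.6) lets stages that exceed the 100 MB
    # in-memory limit spill intermediate results to temporary files on
    # disk instead of aborting the command.
    results = db.meter.aggregate(pipeline, allowDiskUse=True)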
According to our research and the MongoDB docs, the “$sort” stage produces the largest amount of intermediate data for the following stages; in practice it materializes data whose size is close to that of a new index. At the same time, sorting on an indexed field (like timestamp in our meter collection) does not need any additional data on disk, because the request uses the existing index for sorting. Other stages work with already processed and grouped data and use additional space only in the worst case (a huge number of resources grouped by resource_id in one request).
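A sketch of how to arrange the pipeline so that the sort can be served by the existing index (field names as in the sketch above; $first/$last are just examples of order-dependent operators):

    pipeline = [
        {"$match": {"counter_name": "cpu_util"}},
        # Placed at the start of the pipeline (right after $match), this
        # $sort can be answered from the existing index on "timestamp"
        # and needs no additional disk space.
        {"$sort": {"timestamp": 1}},
        # $first/$last rely on the order established by the $sort above.
        {"$group": {
            "_id": "$resource_id",
            "first_volume": {"$first": "$counter_volume"},
            "last_volume": {"$last": "$counter_volume"},
        }},
    ]
    results = list(db.meter.aggregate(pipeline, allowDiskUse=True))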
Even when it has to write temporary files to disk, the aggregation command in this case is still up to several times faster than map-reduce.
This MongoDB mechanism also limits the size of the final result document to 16 MB, the same as a map-reduce job.
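Note, however, that with MongoDB>=2.6 the aggregation command can return its results through a cursor, in which case the 16 MB BSON limit applies to each individual result document rather than to the result set as a whole. A sketch, assuming pymongo 3.x, where aggregate() already returns a CommandCursor:

    # Results stream back in batches through a cursor; only each single
    # aggregated document must stay under the 16 MB BSON limit.
    for group in db.meter.aggregate(pipeline, allowDiskUse=True):
        handle_group(group)  # hypothetical per-group handler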
Alternatives¶
Alternatively, we may try to improve the performance of the native map-reduce job instead.
Data model impact¶
None
REST API impact¶
None
Security impact¶
None
Pipeline impact¶
None
Other end user impact¶
None
Performance/Scalability Impacts¶
Improves the performance of the GET “/v2/meters/<meter_name>/statistics” request.
Other deployer impact¶
None
Developer impact¶
None
Implementation¶
Assignee(s)¶
- Primary assignee:
ityaptin
- Ongoing maintainer:
idegtiarev
Work Items¶
Implement the get_meter_statistics function with the aggregation pipeline framework.
Future lifecycle¶
None
Dependencies¶
None
Testing¶
The current tests check that the “statistics” request works correctly.
Documentation Impact¶
None