7.5. Summary

In this chapter, we covered the major aggregation types and how you can combine them to get insights about documents matching a query:

Aggregations help you get an overall view of query results by counting terms and computing statistics from resulting documents.

Aggregations supersede facets in Elasticsearch: there are more types of aggregations, and you can combine them to get deeper insights into the data.

There are two main types of aggregations: bucket and metrics.

Metrics aggregations calculate statistics over a set of documents, such as the minimum, maximum, or average value of a numeric field.

Some metrics aggregations are calculated with approximation algorithms, which lets them scale much better than exact calculations. The percentiles and cardinality aggregations work this way.

Bucket aggregations put documents into one or more buckets and return counters for those buckets, for example, the most frequent posters in a forum. You can nest sub-aggregations under bucket aggregations, which makes each sub-aggregation run once for every bucket generated by the parent. You can use this nesting, for example, to get the average number of comments for blog posts matching each tag.
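As a minimal sketch of this nesting, the request body below places an avg aggregation under a terms aggregation. It is built as a plain Python dictionary so it can be inspected without a running cluster. The field names (tags, comments) and the aggregation names (by_tag, avg_comments) are assumptions chosen for illustration, not taken from a particular index.

```python
import json

# Hypothetical fields: "tags" (not analyzed / keyword) and "comments" (numeric).
def avg_comments_per_tag(size=10):
    """Build a search body: one bucket per tag, average comments in each."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "by_tag": {
                # Parent bucket aggregation: one bucket per distinct tag.
                "terms": {"field": "tags", "size": size},
                # Sub-aggregation: runs once inside each tag bucket.
                "aggs": {
                    "avg_comments": {"avg": {"field": "comments"}}
                },
            }
        },
    }

body = avg_comments_per_tag()
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a search request. Each bucket in the response would then carry its own avg_comments value next to its document count.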

The top_hits aggregation can be used as a sub-aggregation to implement result grouping. The terms aggregation is typically used for "most frequent" use cases, such as the top users, locations, or items. Other multi-bucket aggregations are variations of the terms aggregation, such as the significant_terms aggregation, which returns those words that appear more often in the query results than in the overall index.

The range and date_range aggregations are useful for categorizing numeric and date fields. The histogram and date_histogram aggregations are similar, but they use fixed intervals instead of manually defined ranges.
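To make the contrast concrete, here is a hedged sketch that builds both kinds of definitions for the same numeric field. The field name attendees, the boundaries, and the interval are assumptions for illustration. The helper functions are hypothetical conveniences, not part of any client library.

```python
import json

def range_agg(field, boundaries):
    """Manually defined ranges: below the first boundary, between each
    consecutive pair, and from the last boundary upward."""
    ranges = [{"to": boundaries[0]}]
    ranges += [{"from": lo, "to": hi}
               for lo, hi in zip(boundaries, boundaries[1:])]
    ranges.append({"from": boundaries[-1]})
    return {"range": {"field": field, "ranges": ranges}}

def histogram_agg(field, interval):
    """Fixed-width buckets: only the interval is specified."""
    return {"histogram": {"field": field, "interval": interval}}

body = {
    "size": 0,
    "aggs": {
        "manual_ranges": range_agg("attendees", [4, 6]),
        "fixed_intervals": histogram_agg("attendees", 2),
    },
}
print(json.dumps(body, indent=2))
```

With range, you decide where every bucket starts and ends. With histogram, you only pick the interval and the buckets follow from the data.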

Single-bucket aggregations, such as the global, filter, filters, and missing aggregations, are used to change the document set on which other aggregations run, which defaults to the documents returned by the query.
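The following sketch shows how these single-bucket aggregations change the scope of a terms aggregation within one request. The query, the field names (name, tags, date), and all aggregation names are assumptions made up for this example.

```python
import json

def scoped_aggs():
    """Build one search body that runs aggregations on different document sets."""
    top_tags = {"terms": {"field": "tags"}}
    return {
        "query": {"match": {"name": "elasticsearch"}},
        "size": 0,
        "aggs": {
            # Default scope: only documents matching the query.
            "top_tags_in_results": top_tags,
            # global: ignores the query, so sub-aggregations see all documents.
            "all_documents": {
                "global": {},
                "aggs": {"top_tags_overall": top_tags},
            },
            # filter: narrows the query results before sub-aggregating.
            "tagged_search": {
                "filter": {"term": {"tags": "search"}},
                "aggs": {"co_occurring_tags": top_tags},
            },
            # missing: one bucket of matching documents with no "date" value.
            "without_date": {"missing": {"field": "date"}},
        },
    }

body = scoped_aggs()
print(json.dumps(body, indent=2))
```

Comparing top_tags_in_results with top_tags_overall in the response would show how the query shifts the tag distribution relative to the whole index.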