10.3. Making the best use of caches

One of Elasticsearch's strong points, if not the strongest, is that you can query billions of documents in milliseconds on commodity hardware. One of the reasons this is possible is smart caching. You might have noticed that after indexing lots of data, the second run of a query can be orders of magnitude faster than the first. That's because of caching: when you combine filters and queries, for example, the filter cache plays an important role in keeping your searches fast.

In this section we’ll discuss the filter cache and two other types of caches: the shard query cache, useful when you run aggregations on static indices because it caches the overall result, and the operating system caches, which keep your I/O throughput high by caching indices in memory.

Finally, we’ll show you how to keep all those caches warm by running queries at each refresh with index warmers. Let’s start by looking at the main type of Elasticsearch-specific cache—the filter cache—and how you can run your searches to make the best use of it.

10.3.1. Filters and filter caches

In chapter 4 you saw that lots of queries have a filter equivalent. Let’s say that you want to look for events on the get-together site that happened in the last month. To do that, you could use the range query or the equivalent range filter.

In chapter 4 we said that of the two, we recommend using the filter, because it’s cacheable. The range filter is cached by default, but you can control whether a filter is cached or not through the _cache flag.
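For example, a sketch of the filter version (assuming the events' date field used later in this chapter; note that a range filter using now as a boundary won't be cached, as pointed out later in this section) might look like this:

# Hedged sketch: find events from the last month with a range filter
% curl localhost:9200/get-together/event/_search?pretty -d '{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "date": {
            "gte": "now-1M"
          }
        }
      }
    }
  }
}'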

Tip

Elasticsearch 2.0 will cache, by default, only frequently used filters and only on bigger segments (those that were merged at least once). This should prevent caching too aggressively while still catching frequent filters and optimizing them. More implementation details can be found in the Elasticsearch[5] and Lucene[6] issues about filter caching.

  5. https://github.com/elastic/elasticsearch/pull/8573
  6. https://issues.apache.org/jira/browse/LUCENE-6077

The _cache flag applies to all filters; for example, the following snippet will filter events with "elasticsearch" in the verbatim tag but won't cache the results:

% curl localhost:9200/get-together/group/_search?pretty -d '{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "tags.verbatim": "elasticsearch",
          "_cache": false
        }
      }
    }
  }
}'

Note

Although all filters have the _cache flag, it doesn’t apply in 100% of cases. For the range filter, if you use "now" as one of the boundaries, the flag is ignored. For the has_child or has_parent filters, the _cache flag doesn’t apply at all.

Filter cache

The results of a filter that's cached are stored in the filter cache. This cache is allocated at the node level, like the index buffer size you saw earlier. It defaults to 10% of the JVM heap, but you can change it from elasticsearch.yml according to your needs. If you use filters a lot and cache them, it might make sense to increase the size. For example:

indices.cache.filter.size: 30%

How do you know if you need more (or less) filter cache? By monitoring your actual usage. As we’ll explore in chapter 11 on administration, Elasticsearch exposes lots of metrics, including the amount of filter cache that’s actually used and the number of cache evictions. An eviction happens when the cache gets full and Elasticsearch drops the least recently used (LRU) entry in order to make room for the new one.
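For instance, a quick way to see those two counters (a rough sketch; node stats output is covered in more detail in chapter 11) is to pull the filter_cache section from the node stats API:

# Rough sketch: look for the memory size and evictions counters under filter_cache
% curl -s 'localhost:9200/_nodes/stats/indices?pretty' | grep -A 3 filter_cache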

In some use cases, filter cache entries have a short lifespan. For example, users typically filter get-together events by a particular subject, refine their queries until they find what they want, and then leave. If nobody else searches for events on the same subject, that cache entry sticks around doing nothing until it eventually gets evicted. A full cache with many evictions hurts performance because every search consumes CPU cycles to squeeze in new cache entries by evicting old ones.

In such use cases, to prevent evictions from happening exactly when queries are run, it makes sense to set a time to live (TTL) on cache entries. You can do that on a per-index basis by adjusting index.cache.filter.expire. For example, the following snippet will expire filter caches after 30 minutes:

% curl -XPUT localhost:9200/get-together/_settings -d '{
  "index.cache.filter.expire": "30m"
}'

Besides making sure you have enough room in your filter caches, you need to run your filters in a way that takes advantage of these caches.

Combining filters

You often need to combine filters—for example, when you’re searching for events in a certain time range, but also with a certain number of attendees. For best performance, you’ll need to make sure that caches are well used when filters are combined and that filters run in the right order.

To understand how to best combine filters, we need to revisit a concept discussed in chapter 4: bitsets. A bitset is a compact array of bits, and it’s used by Elasticsearch to cache whether a document matches a filter or not. Most filters (such as the range and terms filter) use bitsets for caching. Other filters, such as the script filter, don’t use bitsets because Elasticsearch has to iterate through all documents anyway. Table 10.1 shows which of the important filters use bitsets and which don’t.

Table 10.1. Which filters use bitsets

Filter type                      Uses bitset
term                             Yes
terms                            Yes, but you can configure it differently, as we'll explain in a bit
exists/missing                   Yes
prefix                           Yes
regexp                           No
nested/has_parent/has_child      No
script                           No
geo filters (see appendix A)     No

For filters that don't use bitsets, you can still set _cache to true in order to cache the results of that exact filter. Bitsets are different from simply caching the results because they have the following characteristics:

They’re compact and easy to create, so the overhead of creating the cache when the filter is first run is insignificant.

They’re stored per individual filter; for example, if you use a term filter in two different queries or within two different bool filters, the bitset of that term can be reused.

They’re easy to combine with other bitsets. If you have two queries that use bitsets, it’s easy for Elasticsearch to do a bitwise AND or OR in order to figure out which documents match the combination.

To take advantage of bitsets, you need to combine filters that use them in a bool filter that will do that bitwise AND or OR, which is easy for your CPU. For example, if you want to show only groups where either Lee is a member or that contain the tag elasticsearch, it could look like this:

"filter": {

"bool": {

"should": [

{

"term": {

"tags.verbatim": "elasticsearch"

}

},

{

"term": {

"members": "lee"

}

}

]

}

}

The alternative to combining filters is using the and, or, and not filters. These filters work differently because unlike the bool filter, they don’t use bitwise AND or OR. They run the first filter, pass the matching documents to the next one, and so on. As a result, and, or, and not filters are better when it comes to combining filters that don’t use bitsets. For example, if you want to show groups having at least three members, with events organized in July 2013, the filter might look like this:

"filter": {

"and": [

{

"has_child": {

"type": "event",

"filter": {

"range": {

"date": {

"from": "2013-07-01T00:00",

"to": "2013-08-01T00:00"

}

}

}

}

},

{

"script": {

"script": "doc['members'].values.length > minMembers",

"params": {

"minMembers": 2

}

}

}

]

}

If you’re using both bitset and nonbitset filters, you can combine the bitset ones in a bool filter and put that bool filter in an and/or/not filter, along with the nonbitset filters. For example, in the next listing you’ll look for groups with at least two members where either Lee is one of them or the group is about Elasticsearch.

Listing 10.6. Combine bitset filters in a bool filter inside an and/or/not filter
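The listing body isn't reproduced in this extract; a reconstruction based on the description above, reusing the term and script filters shown earlier (minMembers is set to 1 here so that matching groups need more than one member), might look like this:

"filter": {
  "and": [
    {
      "bool": {
        "should": [
          { "term": { "tags.verbatim": "elasticsearch" } },
          { "term": { "members": "lee" } }
        ]
      }
    },
    {
      "script": {
        "script": "doc['members'].values.length > minMembers",
        "params": {
          "minMembers": 1
        }
      }
    }
  ]
}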

Whether you combine filters with the bool, and, or, or not filters, the order in which those filters are executed is important. Cheaper filters, such as the term filter, should be placed before expensive filters, such as the script filter. This makes the expensive filter run on a smaller set of documents: those that already matched the previous filters.

Running filters on field data

So far, we’ve discussed how bitsets and cached results make your filters faster. Some filters use bitsets; some can cache the overall results. Some filters can also run on field data. We first discussed field data in chapter 6 as an in-memory structure that keeps a mapping of documents to terms. This mapping is the opposite of the inverted index, which maps terms to documents. Field data is typically used when sorting and during aggregations, but some filters can use it, too: the terms and the range filters.

Note

An alternative to the in-memory field data is to use doc values, which are calculated at index time and stored on disk with the rest of your index. As we pointed out in chapter 6, doc values work for numeric and not-analyzed string fields. In Elasticsearch 2.0, doc values will be used by default for those fields because holding field data in the JVM heap is usually not worth the performance increase.

A terms filter can have lots of terms, and a range filter with a wide range will (under the hood) match lots of numbers (and numbers are also terms). Normal execution of those filters will try to match every term separately and return the set of unique documents, as illustrated in figure 10.6.

Figure 10.6. By default, the terms filter is checking which documents match each term, and it intersects the lists.

As you can imagine, filtering on many terms could get expensive because there would be many lists to intersect. When the number of terms is large, it can be faster to take the actual field values one by one and see if the terms match instead of looking in the index, as illustrated in figure 10.7.

Figure 10.7. Field data execution means iterating through documents but no list intersections.

These field values would be loaded in the field data cache by setting execution to fielddata in the terms or range filters. For example, the following range filter will get events that happened in 2013 and will be executed on field data:

"filter": {

"range": {

"date": {

"gte": "2013-01-01T00:00",

"lt": "2014-01-01T00:00"

}, "execution": "fielddata"

}

}

Using field data execution is especially useful when the field data is already used by a sort operation or an aggregation. For example, running a terms aggregation on the tags field will make a subsequent terms filter for a set of tags faster because the field data is already loaded.

Other execution modes for the terms filter: bool and and/or

The terms filter has other execution modes, too. Whereas the default execution mode (called plain) builds a bitset to cache the overall result, you can set it to bool in order to have a bitset for each term instead. This is useful when you have different terms filters that have lots of terms in common.

Also, there are and/or execution modes that perform a similar process, except the individual term filters are wrapped in an and/or filter instead of a bool filter.

Usually, the and/or approach is slower than bool because it doesn’t take advantage of bitsets. and/or might be faster if the first term filters match only a few documents, which makes subsequent filters extremely fast.
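For example, a terms filter that keeps one bitset per term instead of a single bitset for the overall result might look like the following sketch (the tag values are only illustrative):

"filter": {
  "terms": {
    "tags.verbatim": ["elasticsearch", "big data", "open source"],
    "execution": "bool"
  }
}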

To sum up, you have three options for running your filters:

Caching them in the filter cache, which is great when filters are reused

Not caching them if they aren’t reused

Running terms and range filters on field data, which is good when you have many terms, especially if the field data for that field is already loaded

Next, we’ll look at the shard query cache, which is good for when you reuse entire search requests over static data.

10.3.2. Shard query cache

The filter cache is purpose-built to make parts of a search—namely filters that are configured to be cached—run faster. It’s also segment-specific: if some segments get removed by the merge process, other segments’ caches remain intact. By contrast, the shard query cache maintains a mapping between the whole request and its results on the shard level, as illustrated in figure 10.8. If a shard has already answered an identical request, it can serve it from the cache.

Figure 10.8. The shard query cache is more high-level than the filter cache.

As of version 1.4, results cached at the shard level are limited to the total number of hits (not the hits themselves), aggregations, and suggestions. That's why (in version 1.5, at least) the shard query cache works only when your query has search_type set to count.

Note

By setting search_type to count in the URI parameters, you tell Elasticsearch that you're not interested in the query results, only in their number. We'll look at count and other search types later in this section. In Elasticsearch 2.0, setting size to 0 will also work, and search_type=count will be deprecated.

The shard query cache entries differ from one request to another, so they apply only to a narrow set of requests. If you’re searching for a different term or running a slightly different aggregation, it will be a cache miss. Also, when a refresh occurs and the shard’s contents change, all shard query cache entries are invalidated. Otherwise, new matching documents could have been added to the index, and you’d get outdated results from the cache.

This narrowness of cache entries makes the shard query cache valuable only when shards rarely change and you have many identical requests. For example, if you’re indexing logs and have time-based indices, you may often run aggregations on older indices that typically remain unchanged until they’re deleted. These older indices are ideal candidates for a shard query cache.

To enable the shard query cache by default on the index level, you can use the indices update settings API:

% curl -XPUT localhost:9200/get-together/_settings -d '{
  "index.cache.query.enable": true
}'

Tip

As with all index settings, you can enable the shard query cache at index creation, but it makes sense to do that only if your new index gets queried a lot and updated rarely.

For every query, you can also enable or disable the shard query cache, overriding the index-level setting, by adding the query_cache parameter. For example, to cache the frequent top_tags aggregation on our get-together index, even if the default is disabled, you can run it like this:

% URL="localhost:9200/get-together/group/_search"

% curl "$URL?search_type=count&query_cache&pretty" -d '{

"aggs": {

"top_tags": {

"terms": {

"field": "tags.verbatim"

}

}

}

}'

Like the filter cache, the shard query cache has a size configuration parameter. The limit can be changed at the node level by adjusting indices.cache.query.size from elasticsearch.yml, from the default of 1% of the JVM heap.
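For example, a hedged elasticsearch.yml tweak (the 2% value is only illustrative) could be:

# Illustrative value; size it based on how many identical requests you serve
indices.cache.query.size: 2%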

When sizing the JVM heap itself, you need to make sure you have enough room for both the filter and the shard query caches. If memory (especially the JVM heap) is limited, you should lower the cache sizes to leave more room for the memory that index and search requests use anyway, in order to avoid out-of-memory exceptions.

Also, you need to have enough free RAM besides the JVM heap to allow the operating system to cache indices stored on disk; otherwise you’ll have a lot of disk seeks.

Next we’ll look at how you can balance the JVM heap with the OS caches and why that matters.

10.3.3. JVM heap and OS caches

If Elasticsearch doesn’t have enough heap to finish an operation, it throws an out-of-memory exception that effectively makes the node crash and fall out of the cluster. This puts an extra load on other nodes as they replicate and relocate shards in order to get back to the configured state. Because nodes are typically equal, this extra load is likely to make at least another node run out of memory. Such a domino effect can bring down your entire cluster.

When the JVM heap is tight, even if you don’t see an out-of-memory error in the logs, the node may become just as unresponsive. This can happen because the lack of memory pressures the garbage collector (GC) to run longer and more often in order to free memory. As the GC takes more CPU time, there’s less computing power on the node for serving requests or even answering pings from the master, causing the node to fall out of the cluster.

Too much GC? Let’s search the web for some GC tuning tips!

When GC is taking a lot of CPU time, the engineer in us is tempted to find that magic JVM setting that will cure everything. More often than not, it’s the wrong place to search for a solution because heavy GC is just a symptom of Elasticsearch needing more heap than it has.

Although increasing the heap size is an obvious solution, it’s not always possible. The same applies to adding more data nodes. Instead, you can look at a number of tricks to reduce your heap usage:

Reduce the index buffer size that we discussed in section 10.2.

Reduce the filter cache and/or shard query cache.

Reduce the size value of searches and aggregations (for aggregations, you also have to take care of shard_size).

If you have to make do with large sizes, you can add some non-data and non-master nodes to act as clients. They’ll take the hit of aggregating per-shard results of searches and aggregations.
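Such a client node is simply one with both roles switched off in its elasticsearch.yml; a minimal sketch:

# Sketch of a client-only node: it won't hold data or become master,
# but it can still receive requests and aggregate per-shard results
node.master: false
node.data: false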

Finally, Elasticsearch uses another cache type to work around the way Java does garbage collection. There’s a young generation space where new objects are allocated. They’re “promoted” to old generation if they’re needed for long enough or if lots of new objects are allocated and the young space fills up. This last problem appears especially with aggregations, which have to iterate through large sets of documents and create lots of objects that might be reused with the next aggregation.

Normally you want these potentially reusable objects used by aggregations to be promoted to the old generation instead of some random temporary objects that just happen to be there when the young generation fills up. To achieve this, Elasticsearch implements a PageCacheRecycler[8] where big arrays used by aggregations are kept from being garbage collected. This default page cache is 10% of the total heap, and in some cases it might be too much (for example, you have 30 GB of heap, making the cache a healthy 3 GB). You can control the size of this cache from elasticsearch.yml via cache.recycler.page.limit.heap.

8. https://github.com/elastic/elasticsearch/issues/4557
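If you want to shrink it, you could lower that limit in elasticsearch.yml; the 5% value below is only an example:

# Example value for the page cache recycler limit mentioned above
cache.recycler.page.limit.heap: 5%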

Still, there are times when you’d need to tune your JVM settings (although the defaults are very good), such as when you have almost enough memory but the cluster has trouble when some rare but long GC pauses kick in. You have some options to make GC kick in more often but stop the world less, effectively trading overall throughput for better latency:

Increase the survivor space (lower -XX:SurvivorRatio) or the whole young generation (lower -XX:NewRatio) compared to the overall heap. You can check whether this is needed by monitoring the different generations.[9] More space gives the young GC more time to clean up short-lived objects before they get promoted to the old generation, where a GC will stop the world for longer. But making these spaces too large will make the young GC work too hard and become inefficient, because longer-living objects have to be copied between the two survivor spaces.

9. Sematext's SPM can do that for you, as described in appendix D.

Use the G1 GC (-XX:+UseG1GC), which will dynamically allocate space for different generations and is optimized for large-memory, low-latency use cases. It's not used as the default as of version 1.5 because there are still some bugs showing up[10] on 32-bit machines, so make sure you test it thoroughly before using G1 in production.

10. https://wiki.apache.org/lucene-java/JavaBugs
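For example, assuming the stock 1.x startup scripts (which pass the ES_JAVA_OPTS environment variable on to the JVM), you could experiment with a bigger survivor space like this:

# Hedged sketch: a lower SurvivorRatio means larger survivor spaces;
# monitor GC times before and after such a change
ES_JAVA_OPTS="-XX:SurvivorRatio=4" ./bin/elasticsearch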

Can you have too large of a heap?

It might have been obvious that a heap that's too small is bad, but having a heap that's too large isn't great either. A heap size of more than 32 GB will automatically make pointers uncompressed and waste memory. How much wasted memory? It depends on the use case: it can vary from as little as 1 GB out of 32 GB if you're doing mostly aggregations (which use big arrays that have few pointers) to something like 10 GB if you're using filter caches a lot (which have many small entries with many pointers). If you really need more than 32 GB of heap, you're sometimes better off running two or more nodes on the same machine, each with less than 32 GB of heap, and dividing the data between them through sharding.

Note

If you end up with multiple Elasticsearch nodes on the same physical machine, you need to make sure that two replicas of the same shard aren't allocated on the same physical machine under different Elasticsearch nodes. Otherwise, if a physical machine goes down, you'll lose two copies of that shard. To prevent this, you can use shard allocation, as described in chapter 11.

Even below 32 GB, too much heap still isn't ideal (in fact, at exactly 32 GB you already lose compressed pointers, so it's best to stick with 31 GB as a maximum). The RAM on your servers that isn't occupied by the JVM is typically used by the operating system to cache indices that are stored on disk. This is especially important if you have magnetic or network storage, because fetching data from the disk while running a query will delay its response. Even with fast SSDs, you'll get the best performance if the amount of data you need to store on a node can fit in its OS caches.

So far we’ve seen that a heap that’s too small is bad because of GC and out-of-memory issues, and one that’s too big is bad, too, because it diminishes OS caches. What’s a good heap size, then?

Ideal heap size: follow the half rule

Without knowing anything about the actual heap usage for your use case, the rule of thumb is to allocate half of the node’s RAM to Elasticsearch, but no more than 32 GB. This “half” rule often gives a good balance between heap size and OS caches.

If you can monitor the actual heap usage (and we’ll show you how to do that in chapter 11), a good heap size is just large enough to accommodate the regular usage plus any spikes you might expect. Memory usage spikes could happen—for example, if someone decides to run a terms aggregation with size 0 on an analyzed field with many unique terms. This will force Elasticsearch to load all terms in memory in order to count them. If you don’t know what spikes to expect, the rule of thumb is again half: set a heap size 50% higher than your regular usage.
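For instance, on a server with 32 GB of RAM, a sketch of the half rule (using the ES_HEAP_SIZE variable read by the 1.x startup scripts) would be:

# Hedged example: give half of a 32 GB machine to the heap,
# leaving the other half for OS caches
ES_HEAP_SIZE=16g ./bin/elasticsearch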

For OS caches, you depend mostly on the RAM of your servers. That being said, you can design your indices in a way that works best with your operating system's caching. For example, if you're indexing application logs, you can expect that most indexing and searching will involve recent data. With time-based indices, the latest index is more likely to fit in the OS cache than the whole dataset, making most operations faster. Searches on older data will often have to hit the disk, but users are more likely to expect and tolerate slow response times on these rare searches that span longer periods of time. In general, if you can put "hot" data in the same set of indices or shards by using time-based indices, user-based indices, or routing, you'll make better use of OS caches.

All the caches we discussed so far—filter caches, shard query caches, and OS caches—are typically built when a query first runs. Loading up the caches makes that first query slower, and the slowdown increases with the amount of data and the complexity of the query. If that slowdown becomes a problem, you can warm up the caches in advance by using index warmers, as you’ll see next.

10.3.4. Keeping caches up with warmers

A warmer allows you to define any kind of search request: it can contain queries, filters, sort criteria, and aggregations. Once it’s defined, the warmer will make Elasticsearch run the query with every refresh operation. This will slow down the refresh, but the user queries will always run on “warm” caches.

Warmers are useful when first-time queries are too slow and it’s preferable for the refresh operation to take that hit rather than the user. If our get-together site example had millions of events and consistent search performance was important, warmers would be useful. Slower refreshes shouldn’t concern you too much, because you expect groups and events to be searched for more often than they’re modified.

To define a warmer on an existing index, you’d issue a PUT request to the index’s URI, with _warmer as the type and the chosen warmer name as an ID, as shown in listing 10.7. You can have as many warmers as you want, but keep in mind that the more warmers you have, the slower your refreshes will be.

Typically, you’d use a few popular queries as your warmers. For example, in the following listing, you’ll put two warmers: one for upcoming events and one for popular group tags.

Listing 10.7. Two warmers for upcoming events and popular group tags

curl -XPUT 'localhost:9200/get-together/event/_warmer/upcoming_events' -d '{
  "sort": [
    {
      "date": { "order": "desc" }
    }
  ]
}'
{"acknowledged": true}

curl -XPUT 'localhost:9200/get-together/group/_warmer/top_tags' -d '{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "tags.verbatim"
      }
    }
  }
}'
{"acknowledged": true}

Later on, you can get the list of warmers for an index by doing a GET request on the _warmer type:

curl localhost:9200/get-together/_warmer?pretty

You can also delete warmers by sending a DELETE request to the warmer's URI:

curl -XDELETE localhost:9200/get-together/_warmer/top_tags

If you’re using multiple indices, it makes sense to register warmers at index creation. To do that, define them under the warmers key in the same way you do with mappings and settings, as shown in the following listing.
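That listing isn't reproduced in this extract, but a sketch of the idea (the index name is made up, and types restricts the warmer to the group type) might look like this:

# Hypothetical index name; the warmer body reuses the top_tags aggregation
% curl -XPUT 'localhost:9200/hot_index' -d '{
  "warmers": {
    "top_tags": {
      "types": ["group"],
      "source": {
        "aggs": {
          "top_tags": {
            "terms": {
              "field": "tags.verbatim"
            }
          }
        }
      }
    }
  }
}'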

If new indices are created automatically, which might occur if you’re using time-based indices, you can define warmers in an index template that will be applied automatically to newly created indices. We’ll talk more about index templates in chapter 11, which is all about how to administer your Elasticsearch cluster.

So far we’ve talked about general solutions: how to keep caches warm and efficient to make your searches fast, how to group requests to reduce network latency, and how to configure segment refreshing, flushing, and storing in order to make your indexing and searching fast. All of this also should reduce the load on your cluster.

Next we'll talk about narrower best practices that apply to specific use cases, such as making your scripts fast or doing deep paging efficiently.
