10.5. Summary

In this chapter we looked at a number of optimizations you can apply to increase the capacity and responsiveness of your cluster:

Use the bulk API to combine multiple index, create, update, or delete operations in the same request.
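To make the bulk request format concrete, here is a minimal Python sketch that serializes a list of operations into the newline-delimited body the bulk API expects. Each operation gets one action-and-metadata line, followed by a source line for everything except delete. The helper `build_bulk_body`, the index name `get-together`, the type `group`, and the documents are illustrative assumptions. The resulting string would be sent as the body of a POST to the `_bulk` endpoint.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    action is "index", "create", "update", or "delete". For update, source is
    a partial-update body such as {"doc": {...}}; for delete it must be None.
    """
    lines = []
    for action, metadata, source in operations:
        # First line of each operation: the action and its metadata.
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue  # delete operations carry no source line
        lines.append(json.dumps(source))
    # The bulk API reads newline-delimited JSON; the body must end with "\n".
    return "\n".join(lines) + "\n"


# Hypothetical index, type, and IDs, for illustration only.
meta = {"_index": "get-together", "_type": "group"}
operations = [
    ("index", dict(meta, _id="10"), {"name": "Elasticsearch Denver"}),
    ("update", dict(meta, _id="11"), {"doc": {"organizer": "Lee"}}),
    ("delete", dict(meta, _id="12"), None),
]
body = build_bulk_body(operations)
print(body, end="")
```

Three operations become five lines, because the delete has no source line. All of them travel in one HTTP request instead of three.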

To combine multiple get or search requests, you can use the multiget or multisearch API, respectively.
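The sketch below shows the shape of both request bodies. A multiget body lists the documents to fetch, and would go to the `_mget` endpoint. A multisearch body is newline-delimited like a bulk body: a header line naming the index, then a line with the search itself. It would go to `_msearch`. The helper names, the index `get-together`, the type `group`, and the queries are hypothetical.

```python
import json


def build_mget_body(index, doc_type, ids):
    """Body for a multiget request: one entry per document to fetch."""
    return {"docs": [{"_index": index, "_type": doc_type, "_id": i} for i in ids]}


def build_msearch_body(searches):
    """Body for a multisearch request: for each search, a header line naming
    the index, then a line with the search body, all newline-delimited."""
    lines = []
    for header, search_body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(search_body))
    return "\n".join(lines) + "\n"


# Hypothetical index, type, IDs, and queries.
mget = build_mget_body("get-together", "group", ["1", "2"])
msearch = build_msearch_body([
    ({"index": "get-together"}, {"query": {"match": {"name": "elasticsearch"}}}),
    ({"index": "get-together"}, {"query": {"match": {"name": "hadoop"}}}),
])
print(json.dumps(mget))
print(msearch, end="")
```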

A flush operation commits in-memory Lucene segments to disk when the indexing buffer is full, the transaction log is too large, or too much time has passed since the last flush.

A refresh makes new segments, flushed or not, available for searching. During heavy indexing, it’s best to refresh less often by increasing the refresh interval, or to disable refresh altogether.
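As a minimal sketch, assuming the index-level `refresh_interval` setting, the function below builds the settings body you might send to an index's `_settings` endpoint before and after a bulk load. A value of `"-1"` disables periodic refresh, and `"1s"` is the default interval. The function name `refresh_settings` is hypothetical.

```python
def refresh_settings(bulk_loading):
    """Index-settings body toggling refresh around a heavy indexing phase.

    "-1" disables periodic refresh; "1s" is the default interval, restored
    once the bulk load is over so new documents become searchable again.
    """
    return {"index": {"refresh_interval": "-1" if bulk_loading else "1s"}}


before_load = refresh_settings(bulk_loading=True)   # PUT /<index>/_settings
after_load = refresh_settings(bulk_loading=False)   # PUT /<index>/_settings
print(before_load, after_load)
```

Restoring the interval afterward matters. While refresh is disabled, newly indexed documents stay invisible to searches.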

The merge policy can be tuned for more or fewer segments. Fewer segments make searches faster, but more CPU time goes into merging. More segments make indexing faster because less time is spent merging, but searches will be slower.
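Here is a sketch of that trade-off, assuming the tiered merge policy and its `index.merge.policy.segments_per_tier` setting (Lucene's default for it is 10). A higher value tolerates more segments and so favors indexing. A lower value merges more aggressively and so favors searching. The helper and both numbers are illustrative, not recommendations.

```python
def merge_policy_settings(favor):
    """Index-settings body for the tiered merge policy.

    segments_per_tier is roughly how many similar-sized segments a tier may
    hold before merges kick in: higher means less merging (faster indexing,
    more segments to search), lower means more merging (fewer segments,
    faster searches). The numbers are illustrative only.
    """
    segments_per_tier = {"indexing": 20, "search": 5}[favor]
    return {"index.merge.policy.segments_per_tier": segments_per_tier}


print(merge_policy_settings("indexing"))
print(merge_policy_settings("search"))
```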

An optimize operation forces a merge, which works well for static indices that receive lots of searches. Store throttling can limit indexing performance by making merges fall behind; increase or remove the limits if you have fast I/O.
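As a sketch, the requests below prepare a static index, using the Elasticsearch 1.x names: the cluster-level `indices.store.throttle.type` setting and the `_optimize` endpoint with `max_num_segments`. Later versions replaced optimize with a force-merge API. The helper `static_index_requests` and the index name are hypothetical. It returns `(method, path, body)` tuples rather than calling a cluster.

```python
def static_index_requests(index, fast_io):
    """Requests, as (method, path, body) tuples, for preparing a static index.

    With fast storage, store throttling is switched off first so merges are
    not held back; optimize then merges the index down to a single segment.
    """
    requests = []
    if fast_io:
        requests.append(("PUT", "/_cluster/settings",
                         {"transient": {"indices.store.throttle.type": "none"}}))
    requests.append(("POST", f"/{index}/_optimize?max_num_segments=1", None))
    return requests


for method, path, body in static_index_requests("get-together", fast_io=True):
    print(method, path, body)
```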

Combine filters that use bitsets (term filters, for example) in a bool filter, and combine filters that don’t (such as script or geo filters) in and/or/not filters.
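The sketch below builds such a combination in the Elasticsearch 1.x filter DSL. The bitset-backed term filters go into one `bool` filter. That bool filter is then chained with a `geo_distance` filter inside an `and` filter, placed first with the intent that the per-document filter only has to check documents the bool filter already matched. The helper `combine_filters` and the fields `tags`, `verbose`, and `location` are hypothetical.

```python
def combine_filters(bitset_filters, other_filters):
    """Build an Elasticsearch 1.x filter from two groups of filters.

    Bitset-backed filters (term filters, for example) share one bool filter;
    filters that check documents one by one (geo or script filters, for
    example) are chained after it in an and filter.
    """
    bool_filter = {"bool": {"must": bitset_filters}}
    if not other_filters:
        return bool_filter
    return {"and": [bool_filter] + other_filters}


# Hypothetical field names and values.
query = {"query": {"filtered": {"filter": combine_filters(
    [{"term": {"tags": "elasticsearch"}}, {"term": {"verbose": False}}],
    [{"geo_distance": {"distance": "50km",
                       "location": {"lat": 40.0, "lon": -105.0}}}],
)}}}
print(query)
```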

Cache counts and aggregations in the shard query cache if you have static indices.
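Here is a hedged sketch using the names from Elasticsearch 1.4, where the shard query cache was introduced. The cache is enabled with the index setting `index.cache.query.enable`, and a request opts in with `query_cache=true` together with the count search type, whose results were the only ones that version cached. Later versions renamed both. The helper `cached_count_url` and the index name are hypothetical. The aggregations themselves would go in the request body.

```python
from urllib.parse import urlencode

# Index-creation body enabling the shard query cache (Elasticsearch 1.4 name;
# later versions renamed this setting).
CREATE_INDEX_BODY = {"settings": {"index.cache.query.enable": True}}


def cached_count_url(index):
    """Search URL for a hits-free (count) request that may use the cache."""
    params = urlencode({"search_type": "count", "query_cache": "true"})
    return f"/{index}/_search?{params}"


print(cached_count_url("get-together"))
```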

Monitor the JVM heap and leave enough headroom that you don’t experience heavy garbage collection or out-of-memory errors, but leave some RAM for the operating system caches, too.
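As a worked illustration, the sketch below applies a widely used rule of thumb rather than a fixed rule. It gives about half of the RAM to the heap, so the rest stays available for the OS file-system cache. It also caps the heap below the roughly 32 GB point where the JVM stops using compressed object pointers. The 30 GB cap and the function name are assumptions. `ES_HEAP_SIZE` is the environment variable Elasticsearch 1.x reads for the heap size.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=30):
    """Rule-of-thumb heap size: half of RAM, capped below the ~32 GB point
    where the JVM stops using compressed object pointers. The remaining RAM
    stays available for the operating system's file-system cache."""
    return min(total_ram_gb // 2, cap_gb)


for ram in (8, 64, 128):
    heap = suggested_heap_gb(ram)
    print(f"{ram} GB RAM -> ES_HEAP_SIZE={heap}g, {ram - heap} GB left for the OS")
```

On a 128 GB machine, the cap leaves most of the memory to the OS cache instead of a very large heap that takes longer to garbage-collect.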

Use index warmers if the first query is too slow and you don’t mind slower indexing.

If you have room for bigger indices, using ngrams and shingles instead of fuzzy, wildcard, or phrase queries should make your searches faster.
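To see why this helps, the Python below is a simplified model of what ngram and shingle token filters emit, not Elasticsearch's actual implementation. The parameter names `min_gram` and `max_gram` mirror the ngram filter's settings. A misspelled query like "denvr" already shares terms such as "de", "den", and "env" with "denver", so a plain term match can find it without a fuzzy query's edit-distance expansion. A two-word phrase becomes a single shingle term. The price is a larger index, because every extra term is stored.

```python
def ngrams(term, min_gram=2, max_gram=3):
    """All substrings of term with length between min_gram and max_gram."""
    return [term[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(term) - n + 1)]


def shingles(tokens, size=2):
    """Adjacent token groups (word n-grams), joined with a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(ngrams("denver"))
print(sorted(set(ngrams("denvr")) & set(ngrams("denver"))))
print(shingles(["elasticsearch", "in", "action"]))
```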

You can often avoid using scripts by creating new fields with needed data in your documents before indexing them.
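For example, instead of a script that counts a group's members for every document at query time, you could store the count when indexing. The fields `members` and `members_count` and the helper `with_derived_fields` are hypothetical. The query side then needs only an ordinary range filter on the precomputed field.

```python
def with_derived_fields(doc):
    """Return a copy of doc with a precomputed members_count field, so
    searches can filter on it directly instead of running a script."""
    enriched = dict(doc)
    enriched["members_count"] = len(doc.get("members", []))
    return enriched


group = {"name": "Elasticsearch Denver", "members": ["Lee", "Mike", "Radu"]}
doc = with_derived_fields(group)  # this is what gets indexed

# At search time, a plain range filter replaces the per-document script.
search_filter = {"range": {"members_count": {"gte": 3}}}
print(doc, search_filter)
```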

Try to use Lucene expressions, term statistics, and field data in your scripts whenever they fit. If your scripts don’t need to change often, look at appendix B to learn how to write a native script in an Elasticsearch plugin.

Use dfs_query_then_fetch if document frequencies aren’t balanced across shards; it gathers term statistics from all shards before scoring, at the cost of an extra round trip.
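A small worked example shows why this matters. The sketch uses Lucene's classic TF-IDF inverse document frequency, idf = 1 + ln(numDocs / (docFreq + 1)), and ignores every other scoring factor. The two shards and their statistics are invented. With plain query_then_fetch, each shard computes idf from its own counts. With the DFS phase, every shard uses the same index-wide value.

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one query term: (num_docs, doc_freq) per shard.
shards = {"shard_a": (100, 1), "shard_b": (100, 50)}

# query_then_fetch: each shard scores with its own statistics.
local = {name: idf(n, df) for name, (n, df) in shards.items()}

# dfs_query_then_fetch: statistics are summed across shards first.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_docs, total_df)

for name, value in local.items():
    print(f"{name}: local idf {value:.2f}, global idf {global_idf:.2f}")
```

Here the same term weighs about 4.91 on shard_a but only 1.67 on shard_b. Equally relevant documents would therefore rank differently depending on where they live. With global statistics, both shards use 2.35.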

Use the count search type if you don’t need any hits, and the scan search type if you need to retrieve many of them.
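As a sketch using the Elasticsearch 1.x parameters (both search types were deprecated in later releases), the helpers below build the two kinds of search URLs. A count search returns only totals and aggregations. A scan search is paired with a `scroll` keep-alive. Its first response carries a scroll ID instead of hits, and the hits are then pulled page by page from the `_search/scroll` endpoint, with `size` applying per shard. The helper names, default values, and index name are hypothetical.

```python
from urllib.parse import urlencode


def count_url(index):
    """Only totals and aggregations come back; no hits are fetched."""
    return f"/{index}/_search?" + urlencode({"search_type": "count"})


def scan_url(index, keep_alive="1m", size_per_shard=100):
    """Start a scan: the response carries a scroll ID, and each following
    scroll page returns up to size_per_shard hits from every shard."""
    return f"/{index}/_search?" + urlencode(
        {"search_type": "scan", "scroll": keep_alive, "size": size_per_shard})


print(count_url("get-together"))
print(scan_url("get-together"))
```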