11.1. Improving defaults

Although the out-of-the-box Elasticsearch configuration will satisfy the needs of most users, it’s important to note that it’s a highly flexible system that can be tuned beyond its default settings for increased performance.

Most production uses of Elasticsearch fall into the category of occasional full-text search, but a growing number of deployments are pushing formerly edge-case uses into more common installations: using Elasticsearch as the sole source of data, as a log aggregator, and even in hybrid storage architectures where it works alongside other database types. These exciting new uses open the door to exploring interesting ways to tune and optimize the Elasticsearch default settings.

11.1.1. Index templates

Creating new indices and associated mappings in Elasticsearch is normally a simple task once the initial design planning has been completed. But there are some scenarios in which future indices must be created with the same settings and mappings as the previous ones. These scenarios include the following:

Log aggregation— In this situation a daily log index is needed for efficient querying and storage, much as rolling log file appenders work. A common example of this is found in cloud-based deployments, where distributed systems push their logs onto a central Elasticsearch cluster.

Configuring the cluster to handle automatic templating of log data by day helps organize the data and eases searching for the proverbial needle in the haystack.

Regulatory compliance— Here blocks of data must be either kept or removed after a certain time period to meet compliance standards, as in financial sector companies where Sarbanes-Oxley compliance is mandated. These sorts of mandates require organized record keeping where template systems shine.

Multi-tenancy— Systems that create new tenants dynamically often have a need to compartmentalize tenant-specific data.

Templates have their uses when a proven and repeatable pattern is needed for homogeneous data storage. The automated nature of how Elasticsearch applies templates is also an attractive feature.

Creating a template

As the name suggests, an index template is applied automatically to any newly created index whose name matches a predefined pattern, ensuring uniform index settings across all matching indices. The index-creation event must match the template pattern for the template to be applied. There are two ways to apply index templates to newly created indices in Elasticsearch:

By way of the REST API

By a configuration file

The former assumes a running cluster; the latter does not and is often used in predeployment scenarios, typically by a DevOps engineer or system administrator preparing a production environment.

In this section we’ll illustrate a simple index template used for log aggregation, so that your log-aggregation tool gets a new index created each day. At the time of this writing, Logstash is the most popular log-aggregation tool used alongside Elasticsearch, and its integration is seamless, so focusing on Logstash-to-Elasticsearch index template creation makes the most sense.

By default, Logstash makes API calls using the daily timestamp appended to the index name; for example, logstash-11-09-2014. Assuming you’re using the Elasticsearch default settings, which allow for automatic index creation, once Logstash makes a call to your cluster with a new event, the new index will be created with a name of logstash-11-09-2014, and the document type will be automapped. You’ll use the REST API method first, as shown here:
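
A minimal sketch of such a request, assuming the template name logging_index (referenced again later in this section) and the same settings, mapping, and alias as the file-based example shown shortly (the mapping body is left elided, as it is there), could look like the following:

curl -XPUT 'localhost:9200/_template/logging_index' -d '{
  "template" : "logstash-*",
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  },
  "mappings" : { ... },
  "aliases" : { "november" : {} }
}'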

Using the PUT command, you instruct Elasticsearch to apply this template whenever an index call matching the logstash-* pattern is received. In this case, when Logstash posts a new event to Elasticsearch and an index doesn’t exist by the name given, a new one will be created using this template.

This template also goes a bit further in applying an alias, so you can group all of these indices under a given month. You’ll have to update the alias in the template manually each month, but it affords a convenient way to group log-event indices by month.

Templates configured on the file system

If you want to configure templates on the file system, which sometimes makes them easier to manage and maintain, that option exists. Configuration files must follow these simple rules:

Template configurations must be in JSON format. For convenience, name them with a .json extension: <FILENAME>.json.

Template definitions should be located in the Elasticsearch configuration location under a templates directory. This path is defined in the cluster’s configuration file (elasticsearch.yml) as path.conf; for example, /config/templates/*.

Template definitions should be placed in the directories of nodes that are eligible to be elected as master.

Using the previous template definition, your template.json file will look like this:

{
  "template" : "logstash-*",
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  },
  "mappings" : { ... },
  "aliases" : { "november" : {} }
}

Just as when defining the template via the REST API, every new index matching the logstash-* pattern will now have this template applied.

Multiple template merging

Elasticsearch also enables you to configure multiple templates with different settings. You can then expand on the previous example and configure one template to handle monthly log events and another that acts as a catchall across all log indices, as the following listing shows.

Listing 11.1. Configuring multiple templates
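
A sketch of what the two template definitions could look like follows; the template names, the November-specific pattern, the date-field mapping, and the catchall alias name are assumptions used for illustration:

# November-specific template: higher order, so its settings win when merged
curl -XPUT 'localhost:9200/_template/logging_index_1' -d '{
  "order" : 1,
  "template" : "logstash-11-*",
  "mappings" : {
    "_default_" : {
      "properties" : {
        "date" : { "type" : "date", "store" : false }
      }
    }
  },
  "aliases" : { "november" : {} }
}'

# Catchall template: lowest order, applied first to every logstash index
curl -XPUT 'localhost:9200/_template/logging_index_2' -d '{
  "order" : 0,
  "template" : "logstash-*",
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "_default_" : {
      "properties" : {
        "date" : { "type" : "date", "store" : true }
      }
    }
  },
  "aliases" : { "all_logs" : {} }
}'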

In the previous example, the topmost template is responsible for the November-specific logs because it matches index names beginning with "logstash-11-". The second template acts as a catchall, aggregating all logstash indices and even containing a different setting for the date mapping.

One thing to note about this configuration is the order attribute. The template with the lowest order number is applied first, and templates with higher order numbers then override it. Because of this, the two templates’ settings are merged, with the effect that none of the November log events have the date field stored.

Retrieving index templates

To retrieve a list of all templates, a convenience API exists:

curl -XGET localhost:9200/_template/

Likewise, you’re able to retrieve either one or many individual templates by name:

curl -XGET localhost:9200/_template/logging_index
curl -XGET localhost:9200/_template/logging_index_1,logging_index_2

Or you can retrieve all template names that match a pattern:

curl -XGET localhost:9200/_template/logging*

Deleting index templates

Deleting an index template is achieved by using the template name. In the previous section, we defined a template as such:

curl -XPUT 'localhost:9200/_template/logging_index' -d '{ ... }'

To delete this template, use the template name in the request:

curl -XDELETE 'localhost:9200/_template/logging_index'

11.1.2. Default mappings

As you learned in chapter 2, mappings enable you to define specific fields, their types, and even how Elasticsearch will interpret and store them. Furthermore, you learned how Elasticsearch supports dynamic mapping in chapter 3, removing the need to define your mappings at index-creation time; instead, those mappings are dynamically generated based on the content of the initial document you index. This section, much like the previous one that covered index templates, will introduce you to the concept of specifying default mappings, which act as a convenience utility for repetitive mapping creation.

We just showed you how index templates can be used to save time and add uniformity across similar datatypes. Default mappings have the same beneficial effects and can be thought of in the same vein as templates for mapping types. Default mappings are most often used when there are indices with similar fields. Specifying a default mapping in one place removes the need to repeatedly specify it across every index.

Mapping is not retroactive

Note that specifying a default mapping doesn’t apply the mapping retroactively. Default mappings are applied only to newly created types.

Consider the following example, where you want to specify a default setting for how you store the _source for all of your mappings, except for a Person type:

curl -XPUT 'localhost:9200/streamglue/_mapping/events' -d '{
  "Person" : {
    "_source" : { "enabled" : false }
  },
  "_default_" : {
    "_source" : { "enabled" : true }
  }
}'

In this case, all new mappings will by default store the document _source, but any mapping of type Person, by default, will not. Note that you can override this behavior in individual mapping specifications.

Dynamic mappings

By default, Elasticsearch employs dynamic mapping: the ability to determine the datatype for new fields within a document. You may have experienced this when you first indexed a document and noticed that Elasticsearch dynamically created a mapping for it as well as the datatype for each of the fields. You can alter this behavior by instructing Elasticsearch to ignore new fields or even throw exceptions on unknown fields. You’d normally want to restrict the addition of new fields to prevent data pollution and to maintain control over the schema definition.

Disabling dynamic mapping

Note also that you can disable the dynamic creation of new mappings for unmapped types by setting index.mapper.dynamic to false in your elasticsearch.yml configuration.
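
As a sketch, that single setting in elasticsearch.yml would look like this:

# elasticsearch.yml
# Disable dynamic creation of mappings for unmapped types
index.mapper.dynamic: false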

The next listing shows how to add a dynamic mapping.

Listing 11.2. Adding a dynamic mapping
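
A minimal sketch of such mappings, assuming a person type in first_index whose only mapped fields are email and created_date with dynamic set to strict, plus a hypothetical second_index where dynamic is simply switched off, might look like this:

# Strict: unknown fields cause an exception at index time
curl -XPOST 'localhost:9200/first_index' -d '{
  "mappings" : {
    "person" : {
      "dynamic" : "strict",
      "properties" : {
        "email" : { "type" : "string" },
        "created_date" : { "type" : "date" }
      }
    }
  }
}'

# False: unknown fields are silently ignored rather than indexed
curl -XPOST 'localhost:9200/second_index' -d '{
  "mappings" : {
    "person" : {
      "dynamic" : false,
      "properties" : {
        "email" : { "type" : "string" },
        "created_date" : { "type" : "date" }
      }
    }
  }
}'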

The first mapping restricts the creation of new fields in the person mapping. If you attempt to insert a document with an unmapped field, Elasticsearch will respond with an exception and not index the document. For instance, try to index a document with an additional first_name field:

curl -XPOST 'localhost:9200/first_index/person' -d '{
  "email" : "[email protected]",
  "created_date" : "2014-09-01",
  "first_name" : "Bob"
}'

Here’s the response:

{
  "error" : "StrictDynamicMappingException[mapping set to strict, dynamic introduction of [first_name] within [person] is not allowed]",
  "status" : 400
}

Dynamic mapping and templating together

This section wouldn’t be complete if we didn’t cover how dynamic mapping and dynamic templates work together, allowing you to apply different mappings depending on the field name or datatype.

Earlier we explored how index templates can be used to autodefine newly created indices for a uniform set of indices and mappings. We can expand on this idea now by incorporating what we’ve covered with dynamic mappings.

The following example solves a simple problem when dealing with data comprising UUIDs. These are unique alphanumeric strings that contain hyphen separators, such as "b20d5470-d7b4-11e3-9fa6-25476c6788ce". You don’t want Elasticsearch analyzing them with the default analyzer, because it would split the UUID on the hyphens when building the index tokens. Because you want to be able to search by the complete UUID string, you need Elasticsearch to store the entire string as a single token. In this case, you need to instruct Elasticsearch to not analyze any string field whose name ends in "_guid":
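
A sketch of such a dynamic template follows; the index name, type name, and template key are illustrative assumptions:

# Any new string field whose name ends in _guid is stored as a single, unanalyzed token
curl -XPUT 'localhost:9200/events_index' -d '{
  "mappings" : {
    "event" : {
      "dynamic_templates" : [
        {
          "guid_fields" : {
            "match" : "*_guid",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        }
      ]
    }
  }
}'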

In this example, a dynamic template is used to dynamically map fields that match a certain name and type, giving you more control over how your data is stored and made searchable by Elasticsearch. As an additional note, you can use the path_match or path_unmatch keyword, which allows you to match or exclude fields from the dynamic template using dot notation; for instance, you could match something like person.*.email. Using this logic, you would see a match on a data structure such as this:

{
  "person" : {
    "user" : {
      "email" : "[email protected]"
    }
  }
}
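
A hypothetical dynamic template keyed on path_match for that structure (the index, type, and template names are assumptions) could be defined like this:

# Matches any email field one level below person, for example person.user.email
curl -XPUT 'localhost:9200/people_index' -d '{
  "mappings" : {
    "account" : {
      "dynamic_templates" : [
        {
          "nested_email_fields" : {
            "path_match" : "person.*.email",
            "mapping" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        }
      ]
    }
  }
}'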

Dynamic templates are a convenient method of automating some of the more tedious aspects of Elasticsearch management. Next, we’ll explore allocation awareness.