9.1. Adding nodes to your Elasticsearch cluster

Even if you don’t end up in a situation at work like the one just described, during the course of your experimentation with Elasticsearch you’ll eventually come to the point where you need to add more processing power to your Elasticsearch cluster.

Perhaps you need to search and index data in your indices faster, with more parallelization; perhaps you’ve run out of disk space on your machine; or perhaps your Elasticsearch node is now running out of memory when performing queries against your data. In these cases, the easiest way to improve performance is usually to turn your single Elasticsearch node into an Elasticsearch cluster by adding more nodes, which you first learned about in chapter 2. Elasticsearch makes it easy to scale horizontally by adding nodes to your cluster so they can share the indexing and searching workload. By adding nodes to your Elasticsearch cluster, you’ll soon be able to handle indexing and searching the millions of groups and events headed your way.

9.1.1. Adding nodes to your cluster

The first step in creating an Elasticsearch cluster is to add another node (or nodes) alongside the single node you already have. Adding a node to your local development environment is as simple as extracting the Elasticsearch distribution to a separate directory, entering the directory, and running the bin/elasticsearch command. Elasticsearch will automatically pick the next port available to bind to (in this case, 9201) and automatically join the existing node like magic! If you want to go one step further, there’s no need to even extract the Elasticsearch distribution again; multiple instances of Elasticsearch can run from the same directory without interfering with one another.
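The "next available port" behavior can be pictured with a short Python sketch. This is not Elasticsearch's own code; it only illustrates the idea of walking a port range and binding to the first free port. The function name `next_free_port` is hypothetical, and the 9200 to 9300 range is assumed here as the default HTTP port range.

```python
import socket


def next_free_port(start=9200, end=9300, host="127.0.0.1"):
    """Return the first port in [start, end] that can be bound on host.

    Illustrative only: mimics a process probing a port range and taking
    the first port that no other process currently holds.
    """
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already in use, try the next one
            return port
    raise RuntimeError(f"no free port between {start} and {end}")


if __name__ == "__main__":
    # If a first node already holds 9200, this would typically print 9201.
    print(next_free_port())
```

With one node already listening on 9200, a second process probing the same range would fail to bind 9200 and settle on the next free port, which is why the second node in the text ends up on 9201.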

Now that you have a second Elasticsearch node added to the cluster, you can run the health command from before and see how the status of the cluster has changed, as shown in the following listing.

Listing 9.1. Getting cluster health for a two-node cluster

There are now no unassigned shards in this cluster, as you can see from the unassigned_shards count, which is zero. How exactly did the shards end up on the other node? Take a look at figure 9.1 and see what happens to the test index before and after adding a node to the cluster. On the left side, the primary shards for the test index have all been assigned to Node1, whereas the replica shards are unassigned. In this state, the cluster is yellow because all primary shards have a home, but the replica shards don’t. Once a second node is added, the unassigned replica shards are assigned to the new Node2, which causes the cluster to move to the green state.

Figure 9.1. Shard allocation for the test index for one node transitioning to two nodes
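The relationship between shard assignment and cluster color described above can be sketched in a few lines of Python. This is a simplified model, not the Elasticsearch implementation: each shard copy is a hypothetical `(shard_id, role, node)` tuple, and the two-shard index is an arbitrary assumption chosen for brevity. The red state (a primary without a home) is included for completeness even though this passage discusses only yellow and green.

```python
def cluster_status(copies):
    """Derive a health color from a list of (shard_id, role, node) tuples.

    node is None when the copy is unassigned.
    """
    if any(role == "primary" and node is None for _sid, role, node in copies):
        return "red"     # at least one primary shard has no home
    if any(node is None for _sid, _role, node in copies):
        return "yellow"  # all primaries assigned, some replicas are not
    return "green"       # every primary and replica is assigned


def unassigned_shards(copies):
    """Count copies that are not assigned to any node."""
    return sum(1 for _sid, _role, node in copies if node is None)


# One node: primaries live on Node1, replicas have nowhere to go.
one_node = [
    (0, "primary", "Node1"), (0, "replica", None),
    (1, "primary", "Node1"), (1, "replica", None),
]
# Two nodes: the replicas are assigned to the new Node2.
two_nodes = [
    (0, "primary", "Node1"), (0, "replica", "Node2"),
    (1, "primary", "Node1"), (1, "replica", "Node2"),
]

print(cluster_status(one_node), unassigned_shards(one_node))
print(cluster_status(two_nodes), unassigned_shards(two_nodes))
```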

When another node is added, Elasticsearch will automatically try to balance out the shards among all nodes. Figure 9.2 shows how the same shards are distributed across three Elasticsearch nodes in the same cluster. Notice that there’s no ban on having primary and replica shards on the same node as long as the primary and replica shards for the same shard number aren’t on the same node.

Figure 9.2. Shard allocation for the test index with three Elasticsearch nodes

If even more nodes are added to this cluster, Elasticsearch will try to balance the number of shards evenly across all nodes because each node added in this way shares the burden by taking a portion of the data (in the form of shards). Congratulations, you just horizontally scaled your Elasticsearch cluster!
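To make the placement rule concrete, here is a minimal Python sketch of a greedy allocator. It is an assumption-laden toy, not Elasticsearch's actual allocator: it only places unassigned copies (it never relocates shards that already have a home), it looks at shard counts alone, and the `allocate` function and `(shard_id, role, node)` tuples are hypothetical names introduced for illustration. The one rule it does enforce is the one from the text: two copies of the same shard number never share a node.

```python
def allocate(copies, nodes):
    """Assign each unassigned (shard_id, role, None) copy to a node.

    A copy goes to the least-loaded node that does not already hold a
    copy of the same shard; if no such node exists it stays unassigned.
    """
    load = {n: 0 for n in nodes}
    holders = {}  # shard_id -> set of nodes holding a copy of that shard
    for sid, _role, node in copies:
        if node is not None:
            load[node] += 1
            holders.setdefault(sid, set()).add(node)

    result = []
    for sid, role, node in copies:
        if node is None:
            taken = holders.setdefault(sid, set())
            candidates = [n for n in nodes if n not in taken]
            if candidates:
                node = min(candidates, key=lambda n: load[n])
                load[node] += 1
                taken.add(node)
        result.append((sid, role, node))
    return result


def fresh_index(shards):
    """All copies of an index with one replica per shard, unassigned."""
    return [(s, r, None) for s in range(shards) for r in ("primary", "replica")]


single = allocate(fresh_index(2), ["Node1"])             # replicas stay unassigned
pair = allocate(single, ["Node1", "Node2"])              # replicas land on Node2
trio = allocate(fresh_index(5), ["Node1", "Node2", "Node3"])
print(single)
print(pair)
print(trio)
```

Running the sketch with one node leaves every replica unassigned, and rerunning it with a second node places them there, mirroring the yellow-to-green transition described earlier.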

Adding nodes to your Elasticsearch cluster comes with substantial benefits, the primary being high availability and increased performance. When replicas are enabled (which they are by default), Elasticsearch will automatically promote a replica shard to a primary in the event the primary shard can’t be located, so even if you lose the node where the primary shards for your index are, you’ll still be able to access the data in your indices. This distribution of data among nodes also increases performance because search and get requests can be handled by both primary and replica shards, as you’ll recall from figure 2.9. Scaling this way also adds more memory to the cluster as a whole, so if memory-intensive searches and aggregations are taking too long or causing your cluster to run out of memory, adding more nodes is almost always an easy way to handle more numerous and complex operations.
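Replica promotion can be illustrated with a small Python simulation. Again, this is a sketch under stated assumptions rather than Elasticsearch's real failover logic: `fail_node` and the `(shard_id, role, node)` tuples are hypothetical, and the model simply promotes the first surviving replica it finds for each lost primary.

```python
def fail_node(copies, dead_node):
    """Return the shard copies after dead_node leaves the cluster.

    Copies on the dead node become unassigned. For each primary that was
    lost, one surviving replica of the same shard is promoted to primary,
    and the lost copy is treated as an unassigned replica.
    """
    # Shards whose primary lived on the failed node.
    lost = {sid for sid, role, node in copies
            if role == "primary" and node == dead_node}

    # Choose one surviving replica per lost primary.
    promote = {}
    for sid, role, node in copies:
        if sid in lost and role == "replica" and node not in (None, dead_node):
            promote.setdefault(sid, node)

    result = []
    for sid, role, node in copies:
        if node == dead_node:
            if sid in promote:
                role = "replica"  # a replica elsewhere took over as primary
            node = None           # this copy no longer has a home
        elif role == "replica" and sid in promote and promote[sid] == node:
            role = "primary"      # promoted in place on its current node
        result.append((sid, role, node))
    return result


before = [
    (0, "primary", "Node1"), (0, "replica", "Node2"),
    (1, "primary", "Node2"), (1, "replica", "Node1"),
]
after = fail_node(before, "Node1")
print(after)
```

In this example every shard still has an assigned primary after Node1 is lost, so the data stays accessible; only the replica copies are left waiting for a node to host them.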

Now that you’ve turned your Elasticsearch node into a true cluster by adding a node, you may be wondering how each node was able to discover and communicate with the other node or nodes. In the next section, we’ll talk about Elasticsearch’s node discovery methods.