2.2. Understanding the physical layout: nodes and shards

Understanding how data is physically laid out boils down to understanding how Elasticsearch scales. Although chapter 9 is dedicated entirely to scaling, in this section we'll introduce you to how scaling works by looking at how multiple nodes work together in a cluster, how data is divided into shards and replicated, and how indexing and searching work with multiple shards and replicas.

To understand the big picture, let’s review what happens when an Elasticsearch index is created. By default, each index is made up of five primary shards, each with one replica, for a total of ten shards, as illustrated in figure 2.3.

Figure 2.3. A three-node cluster with an index divided into five shards with one replica per shard
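You aren't stuck with the defaults: you can specify the number of primary shards and replicas when you create an index. The following command is a minimal sketch, assuming a node is running on localhost:9200; the index name test-index is made up for illustration:

# create an index with explicit shard and replica counts
# (test-index is a hypothetical name)
curl -XPUT 'localhost:9200/test-index' -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'

Here the values match the defaults, so the result is the same as creating the index with no settings at all.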

As you’ll see next, replicas are good for reliability and search performance. Technically, a shard is a directory of files where Lucene stores the data for your index. A shard is also the smallest unit that Elasticsearch moves from node to node.

2.2.1. Creating a cluster of one or more nodes

A node is an instance of Elasticsearch. When you start Elasticsearch on your server, you have a node. If you start Elasticsearch on another server, it’s another node. You can even have more nodes on the same server by starting multiple Elasticsearch processes.

Multiple nodes can join the same cluster. As we'll discuss later in this chapter, starting nodes with the same cluster name and otherwise default settings is enough to make a cluster. With a cluster of multiple nodes, the same data can be spread across multiple servers. This helps performance because Elasticsearch has more resources to work with. It also helps reliability: if you have at least one replica per shard, any node can disappear and Elasticsearch will still serve you all the data.

For an application that's using Elasticsearch, having one or more nodes in a cluster is transparent. By default, you can connect to any node in the cluster and work with the whole data just as if you had a single node.
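You can see this transparency at work by asking any single node about the state of the entire cluster. A quick sketch, assuming a node listens on localhost:9200:

# any node can report on the whole cluster
curl 'localhost:9200/_cluster/health?pretty'

The reply includes the cluster status, the number of nodes, and the number of active primary and total shards, no matter which node you asked.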

Although clustering is good for performance and availability, it has its disadvantages: you have to make sure nodes can communicate with each other quickly enough and that you won’t have a split brain (two parts of the cluster that can’t communicate and think the other part dropped out). To address such issues, chapter 9 discusses scaling out.

What happens when you index a document?

By default, when you index a document, it’s first sent to one of the primary shards, which is chosen based on a hash of the document’s ID. That primary shard may be located on a different node, like it is on Node 2 in figure 2.4, but this is transparent to the application.

Figure 2.4. Documents are indexed to random primary shards and their replicas. Searches run on complete sets of shards, regardless of their status as primaries or replicas.

Then the document is sent to be indexed in all of that primary shard’s replicas (see the left side of figure 2.4). This keeps replicas in sync with data from the primary shards. Being in sync allows replicas to serve searches and to be automatically promoted to primary shards in case the original primary becomes unavailable.
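If you want to see this layout for yourself, the _cat/shards API lists each shard of an index, marks it as primary (p) or replica (r), and shows which node holds it. A sketch, assuming the get-together index exists on a local node:

# v adds column headers to the output
curl 'localhost:9200/_cat/shards/get-together?v'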

What happens when you search an index?

When you search an index, Elasticsearch has to look in a complete set of shards for that index (see right side of figure 2.4). Those shards can be either primary or replicas because primary and replica shards typically contain the same documents. Elasticsearch distributes the search load between the primary and replica shards of the index you’re searching, making replicas useful for both search performance and fault tolerance.

Next we’ll look at the details of what primary and replica shards are and how they’re allocated in an Elasticsearch cluster.

2.2.2. Understanding primary and replica shards

Let’s start with the smallest unit Elasticsearch deals with, a shard. A shard is a Lucene index: a directory of files containing an inverted index. An inverted index is a structure that enables Elasticsearch to tell you which document contains a term (a word) without having to look at all the documents.

Elasticsearch index vs. Lucene index

You’ll see the word “index” used frequently as we discuss Elasticsearch; here’s how the terminology works.

An Elasticsearch index is broken down into chunks: shards. A shard is a Lucene index, so an Elasticsearch index is made up of multiple Lucene indices. This makes sense because Elasticsearch uses Apache Lucene as its core library to index your data and search through it.

Throughout this book, whenever you see the word “index” by itself, it refers to an Elasticsearch index. If we’re digging into the details of what’s in a shard, we’ll specifically use the term “Lucene index.”

In figure 2.5, you can see what sort of information the first primary shard of your get-together index may contain. The shard get-together0, as we'll call it from now on, is a Lucene index: an inverted index. By default, it stores the original document's content plus additional information, such as the term dictionary and term frequencies, which help with searching.

Figure 2.5. Term dictionary and frequencies in a Lucene index

The term dictionary maps each term to identifiers of documents containing that term (see figure 2.5). When searching, Elasticsearch doesn’t have to look through all the documents for that term—it uses this dictionary to quickly identify all the documents that match.
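For example, a tiny term dictionary over three hypothetical documents might look like this (terms and document IDs are made up for illustration):

term             documents containing it
denver           1, 3
elasticsearch    1, 2, 3
gathering        2

A search for "denver" only has to consult this one entry to learn that documents 1 and 3 match.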

Term frequencies give Elasticsearch quick access to the number of appearances of a term in a document. This is important for calculating the relevancy score of results. For example, if you search for “denver”, documents that contain “denver” many times are typically more relevant. Elasticsearch gives them a higher score, and they appear higher in the list of results. By default, the ranking algorithm is TF-IDF, as we explained in chapter 1, section 1.1.2, but you have a lot more options. We’ll discuss search relevancy in great detail in chapter 6.
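In one common formulation (Lucene's actual implementation layers normalization factors on top of this), the weight of a term in a document is

tf-idf(term, doc) = tf(term, doc) * idf(term)
idf(term) = log(total_docs / docs_containing_term)

where tf rewards documents that mention the term often and idf rewards terms that are rare across the index, so a match on an uncommon word counts for more than a match on a word that appears everywhere.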

A shard can be either a primary or a replica shard, with replicas being exactly that: copies of the primary shard. A replica can serve search requests, and it can be promoted to become the new primary shard if the original primary is lost.

An Elasticsearch index is made up of one or more primary shards and zero or more replica shards. In figure 2.6, you can see that the Elasticsearch get-together index is made up of six total shards: two primary shards (the darker boxes) and two replicas for each shard (the lighter boxes) for a total of four replicas.

You can change the number of replicas per shard at any time because replicas can always be created or removed. This doesn’t apply to the number of primary shards an index is divided into; you have to decide on the number of shards before creating the index.
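For example, you could drop the get-together index to one replica per shard with a single call to the index settings API. A sketch, again assuming a local node:

# replicas can be changed on a live index;
# the primary shard count cannot
curl -XPUT 'localhost:9200/get-together/_settings' -d '{
  "number_of_replicas": 1
}'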

Keep in mind that too few shards limit how much you can scale, but too many shards impact performance. The default setting of five is typically a good start. You’ll learn more in chapter 9, which is all about scaling. We’ll also explain how to add/remove replica shards dynamically.

All the shards and replicas you've seen so far are distributed across nodes within an Elasticsearch cluster. Next we'll look at some details of how Elasticsearch distributes shards and replicas in a cluster of one or more nodes.

2.2.3. Distributing shards in a cluster

The simplest Elasticsearch cluster has one node: one machine running one Elasticsearch process. When you installed Elasticsearch in chapter 1 and started it, you created a one-node cluster.

As you add more nodes to the same cluster, existing shards get balanced between all nodes. As a result, both indexing and search requests that work with those shards benefit from the extra power of your added nodes. Scaling this way (by adding nodes to a cluster) is called horizontal scaling; you add more nodes, and requests are then distributed so they all share the work. The alternative to horizontal scaling is to scale vertically; you add more resources to your Elasticsearch node, perhaps by dedicating more processors to it if it’s a virtual machine, or adding RAM to a physical machine. Although vertical scaling helps performance almost every time, it’s not always possible or cost-effective. Using shards enables you to scale horizontally.

Suppose you want to scale your get-together index, which currently has two primary shards and no replicas. As shown in figure 2.7, the first option is to scale vertically by upgrading the node: for example, adding more RAM, more CPUs, faster disks, and so on. The second option is to scale horizontally by adding another node and having your data distributed between the two nodes.

Figure 2.7. To improve performance, scale vertically (upper-right) or scale horizontally (lower-right).

We talk more about performance in chapter 10. For now, let’s see how indexing and searching work across multiple shards and replicas.

2.2.4. Distributed indexing and searching

At this point you might wonder how indexing and searching work with multiple shards spread across multiple nodes.

Let’s take indexing, as shown in figure 2.8. The Elasticsearch node that receives your indexing request first selects the shard to index the document to. By default, documents are distributed evenly between shards: for each document, the shard is determined by hashing its ID string. Each shard has an equal hash range, with equal chances of receiving the new document. Once the target shard is determined, the current node forwards the document to the node holding that shard. Subsequently, that indexing operation is replayed by all the replicas of that shard. The indexing command successfully returns after all the available replicas finish indexing the document.

Figure 2.8. Indexing operation is forwarded to the responsible shard and then to its replicas.
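The shard choice described above comes down to a simple calculation. Conceptually (the exact hash function is an implementation detail), it looks like this:

shard_number = hash(document_id) % number_of_primary_shards

This formula is also why the number of primary shards is fixed at index creation: changing the divisor would change where every existing document belongs.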

With searching, the node that receives the request forwards it to a set of shards containing all your data. Using round-robin, Elasticsearch selects an available copy of each shard (which can be a primary or a replica) and forwards the search request to it. As shown in figure 2.9, Elasticsearch then gathers results from those shards, aggregates them into a single reply, and forwards that reply back to the client application.

Figure 2.9. Search request is forwarded to primary/replica shards containing a complete set of data. Then results are aggregated and sent back to the client.
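If you're curious which shard copies a search on your index would go to, the _search_shards API returns them without actually running a search. A sketch, assuming the get-together index on a local node:

# shows the nodes and shards a search would be routed to
curl 'localhost:9200/get-together/_search_shards?pretty'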

By default, primary and replica shards get hit by searches in round-robin, assuming all nodes in your cluster are equally fast (identical hardware and software configurations). If that’s not the case, you can organize your data or configure your shards to prevent the slower nodes from becoming a bottleneck. We explore such options further in chapter 9. For now, let’s start indexing documents in the single-node Elasticsearch cluster that you started in chapter 1.