9.6. Scaling strategies

It might seem easy enough to add nodes to a cluster to increase performance, but this is a case where a bit of planning up front goes a long way toward getting the best performance out of your cluster.

Every use of Elasticsearch is different, so you’ll have to pick the best options for your cluster based on how you’ll index data, as well as how you’ll search it. In general, though, there are at least three things you’ll want to consider when planning for a production Elasticsearch cluster: over-sharding, splitting data between indices and shards, and maximizing throughput.

9.6.1. Over-sharding

Let’s start by talking about over-sharding. Over-sharding is the practice of intentionally creating more shards for an index than you currently need, so you have room to add nodes and grow in the future; this is best illustrated by a diagram, so take a look at figure 9.8.

Figure 9.8. A single node with a single shard and two nodes trying to scale a single shard

In figure 9.8, you’ve created your get-together index with a single shard and no replicas. But what happens when you add another node?

Whoops! You’ve removed any benefit you’d get from adding nodes to the cluster. Adding another node doesn’t let you scale, because all of the indexing and querying load is still handled by the node holding the single shard. Because a shard is the smallest unit Elasticsearch can move between nodes, it’s a good idea to always make sure you have at least as many primary shards in your cluster as you plan to have nodes: if you currently have a 5-node cluster with 11 primary shards, you have room to grow when you need to add more nodes to handle additional requests. Using the same example, if you suddenly need more than 11 nodes, you won’t be able to distribute the primary shards across all of them, because you’ll have more nodes than shards and some nodes will hold no primary shard for this index.
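As a sketch of what that planning looks like in practice, here’s how you might create the get-together index with 11 primary shards up front, leaving headroom to grow toward 11 nodes later. The localhost:9200 endpoint and the replica count here are assumptions for illustration:

```
# Create the index with 11 primary shards so an eventual 11-node
# cluster can still hold one primary per node (illustrative values)
curl -XPUT 'localhost:9200/get-together' -d '{
  "settings": {
    "number_of_shards": 11,
    "number_of_replicas": 1
  }
}'
```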

That’s easy to fix, you might say: “I’ll just create an index with 100 primary shards!” It may seem like a good idea at first, but there’s a hidden cost to each shard Elasticsearch has to manage. Because each shard is a complete Lucene index, as you learned in chapter 1, each shard requires a number of file descriptors for every segment it contains, as well as a memory overhead. By creating too many shards for an index, you may be consuming memory that could be put to better use elsewhere, or you could end up hitting the machine’s file descriptor or RAM limits. In addition, your data will be split across 100 different shards, lowering the compression rate you would have gotten if you had picked a more reasonable number.

It’s worth noting that there’s no perfect shard-to-index ratio for all use cases; Elasticsearch picks a sensible default of five shards for the general case, but it’s always important to think about how you plan to grow (or shrink) in the future when choosing the number of shards to create an index with. Don’t forget: once an index has been created, the number of primary shards can never be changed for that index! You don’t want to find yourself re-indexing a large portion of your data six months down the line because there wasn’t enough planning up front. We’ll talk more about this in the next chapter when we discuss indexing in depth.
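Because the primary-shard count is locked in at creation time, it’s worth double-checking what an existing index was created with before you plan further growth. A quick way to do that (again assuming the default localhost:9200 endpoint):

```
# Inspect the shard and replica settings of an existing index
curl 'localhost:9200/get-together/_settings?pretty'
```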

Along the same lines as choosing the number of shards to create an index with, you’ll also need to decide exactly how to split your data across indices in Elasticsearch.

9.6.2. Splitting data into indices and shards

Unfortunately, there’s currently no way to increase or decrease the number of primary shards in an index, but you can always plan your data to span multiple indices. This is another perfectly valid way to split data. Taking the get-together example, there’s nothing stopping you from creating an index for every city where events occur. For example, if you expect a larger number of events in New York than in Sacramento, you could create a sacramento index with two primary shards and a newyork index with four primary shards. Or you could segment the data by date, creating an index for each year an event occurs or is created: 2014, 2015, 2016, and so on. Segmenting data this way can also speed up searching, because the right data already lives in the right place: if a customer wants only events or groups from 2014 or 2015, you need to search only those indices rather than the entire get-together data set.
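As a rough sketch of the per-city and per-year layouts just described (the index names, shard counts, and query string are illustrative examples, not fixed recommendations):

```
# Smaller city, fewer primary shards
curl -XPUT 'localhost:9200/sacramento' -d '{
  "settings": { "number_of_shards": 2 }
}'

# Bigger city, more primary shards
curl -XPUT 'localhost:9200/newyork' -d '{
  "settings": { "number_of_shards": 4 }
}'

# Search only the 2014 and 2015 indices by listing both in the URL
curl 'localhost:9200/2014,2015/_search?q=elasticsearch'
```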

Another way to plan using indices is with aliases. An alias acts like a pointer to an index or a set of indices, and it allows you to change the indices it points to at any time. This is incredibly useful for segmenting your data in a semantic way; you could create an alias called last-year that points to the 2014 index; then, when January 1, 2016 rolls around, you can change the alias to point to the 2015 index. This technique is commonly used when indexing date-based information (like log files) so that data can be segmented by date on a monthly/weekly/daily basis, and an alias named current can always point to the data that should be searched, without the application having to change the name of the index it searches every time the segment rolls over. Aliases also have almost zero overhead and allow an incredible level of flexibility, so experimentation is encouraged. We’ll talk about aliases in more depth later in this chapter.
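For example, swapping what last-year points to when the calendar rolls over might look like the following (index and alias names are taken from the example above); the _aliases endpoint applies all the listed actions together, so there’s no window where the alias is missing:

```
# Move the last-year alias from the 2014 index to the 2015 index
curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "2014", "alias": "last-year" } },
    { "add":    { "index": "2015", "alias": "last-year" } }
  ]
}'
```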

When creating indices, don’t forget that each index has its own shards, so you’ll still incur the overhead of every shard you create; be careful not to create too many shards by creating too many indices, using resources that could be better spent handling requests. Once you know how your data will be laid out in the cluster, you can work on tweaking the node configuration to maximize your throughput.

9.6.3. Maximizing throughput

Maximizing throughput is one of those fuzzy, hazy terms that can mean an awful lot of things. Are you trying to maximize indexing throughput? Make searches faster? Execute more searches at once? There are different ways to tweak Elasticsearch for each goal. For example, if you received thousands of new groups and events, how would you go about indexing them as fast as possible? One way to make indexing faster is to temporarily reduce the number of replica shards in your cluster. When indexing data, by default the request won’t complete until the data exists on the primary shard as well as on all replicas, so it may be advantageous to reduce the number of replicas to zero (or to one, if you’re not comfortable with the risk of running without redundancy) while indexing, and then increase the number back to one or more once the period of heavy indexing has completed.
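A minimal sketch of that replica dance, using the update-settings API (the get-together index name and the localhost:9200 endpoint are assumptions for illustration):

```
# Before the heavy indexing: drop replicas, accepting the risk
curl -XPUT 'localhost:9200/get-together/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# ... bulk indexing happens here ...

# After the heavy indexing: restore one (or more) replicas
curl -XPUT 'localhost:9200/get-together/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'
```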

What about searches? Searches can be made faster by adding more replicas, because either a primary or a replica shard can serve a search request. To illustrate this, check out figure 9.9, which shows a three-node cluster where the last node can’t help with search requests until it holds a copy of the data.

Figure 9.9. Additional replicas handling search and aggregations

But don’t forget that creating more shards in an Elasticsearch cluster comes with a small overhead in file descriptors and memory. If the volume of searches grows too high for the nodes in the cluster to keep up, consider adding nodes with both node.data and node.master set to false. These nodes can then handle incoming requests, distribute the requests to the data nodes, and collect the results for responses. This way, the nodes searching the shards don’t have to handle connections from search clients; they only need to search their shards. We’ll talk more about different ways of speeding up both indexing and searching in the next chapter.
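For reference, the relevant settings in such a node’s elasticsearch.yml would look something like this sketch:

```
# elasticsearch.yml for a node that only routes requests and
# gathers results: it holds no data and is not master-eligible
node.data: false
node.master: false
```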