9.4. Upgrading Elasticsearch nodes

There comes a point with every installation of Elasticsearch when it’s time to upgrade to the latest version. We recommend running the latest version of Elasticsearch because new features are continually being added and bugs fixed. That said, depending on the constraints of your environment, upgrading may be more or less complex.

Upgrade caveats

Before we get to the upgrading instructions, it’s important to understand that there are some limitations when upgrading Elasticsearch instances. Once you’ve upgraded an Elasticsearch server, it can’t be downgraded if any new documents have been written. When you upgrade a production instance, you should always back up your data first. We’ll talk more about backing up your data in chapter 11.

Another important thing to consider is that although Elasticsearch can handle a mixed-version environment easily, there have been cases where different JVM versions serialize information differently, so we recommend that you not mix different JVM versions within the same Elasticsearch cluster.

The simplest way to upgrade an Elasticsearch cluster is to shut down all nodes and then upgrade each Elasticsearch installation with whatever method you originally used—for example, extracting the distribution if you used the .tar.gz distribution or installing the .deb package using dpkg if you’re using a Debian-based system. Once each node has been upgraded, you can restart the entire cluster and wait for Elasticsearch to reach the green state. Voila, upgrade done!
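
For example, one way to wait for the green state is to poll the cluster health API. The following curl sketch assumes a node listening on localhost:9200 and uses an arbitrary timeout; adjust both for your environment:

  curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&timeout=120s'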

Shutting down the whole cluster may not always be an option, though; in many situations downtime can’t be tolerated, even during off-peak hours. Thankfully, you can perform a rolling restart to upgrade your Elasticsearch cluster while still serving indexing and search requests.

9.4.1. Performing a rolling restart

A rolling restart is another way of restarting your cluster in order to upgrade a node or make a nondynamic configuration change without sacrificing the availability of your data. This can be particularly good for production deployments of Elasticsearch. Instead of shutting down the whole cluster at once, you shut nodes down one at a time. This process is slightly more involved than a full restart because of the multiple steps required.

The first step in performing a rolling restart is to decide whether you want Elasticsearch to automatically rebalance shards while each individual node is down. Most people don’t want Elasticsearch to start its automatic recovery when a node leaves the cluster for an upgrade, because that would mean rebalancing data for every single node being restarted. In reality the data is still there; the node just needs to be restarted and to rejoin the cluster in order for it to be available.

For most people, it makes sense not to shift data around the cluster while performing the upgrade. You can accomplish this by setting cluster.routing.allocation.enable to none for the duration of the upgrade. To clarify, the entire process looks like this:

  1. Disable allocation for the cluster.
  2. Shut down the node that will be upgraded.
  3. Upgrade the node.
  4. Start the upgraded node.
  5. Wait until the upgraded node has joined the cluster.
  6. Enable allocation for the cluster.
  7. Wait for the cluster to return to a green state.

Repeat this process for each node that needs to be upgraded. To disable allocation for the cluster, you can use the cluster settings API with the following settings:
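
The following is a curl sketch, assuming a node listening on localhost:9200 and using the transient scope so the setting doesn’t survive a full cluster restart:

  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
      "cluster.routing.allocation.enable": "none"
    }
  }'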

Once you run this command, Elasticsearch will no longer rebalance shards around the cluster. For instance, if a primary shard is lost for an index because the node it resided on is shut down, Elasticsearch will still turn the replica shard into a new primary, but a new replica won’t be created. While in this state, you can safely shut down the single Elasticsearch node and perform the upgrade.

After upgrading the node, make sure that you reenable allocation for the cluster; otherwise you’ll be wondering why Elasticsearch doesn’t automatically replicate your data in the future! You can reenable allocation by setting the cluster.routing.allocation.enable setting to all instead of none, like this:
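
Again, a curl sketch against a node on localhost:9200, using the transient scope:

  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
      "cluster.routing.allocation.enable": "all"
    }
  }'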

You need to perform these two book-ending steps, disabling allocation and reenabling allocation, for every node being upgraded. If you performed them only once at the beginning and once at the end, the shards on each upgraded node wouldn’t be allocated as that node rejoined, and your cluster would be red once you’d upgraded multiple nodes. By reenabling allocation and waiting for the cluster to return to a green state after each node is upgraded, your data is allocated and available when you move on to the next node that needs to be upgraded. Repeat these steps for each node until you have a fully upgraded cluster.

There’s one more thing to mention in this section, and that’s indices that don’t have replicas. The previous examples all assume the data has at least one replica, so that a node going down doesn’t remove access to the data. If you have an index with no replicas, you can use the decommissioning steps we covered in section 9.3.1 to move all the data off the node before shutting it down to perform the upgrade.
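
As a refresher, decommissioning boils down to excluding the node from shard allocation so Elasticsearch relocates its data elsewhere. A curl sketch against a node on localhost:9200, where the IP address is a placeholder for the node you’re about to shut down:

  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
      "cluster.routing.allocation.exclude._ip": "192.168.1.10"
    }
  }'

Once the node no longer holds any shards, it’s safe to shut it down and upgrade it.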

9.4.2. Minimizing recovery time for a restart

You may notice that even with the disable and enable allocation steps, it can still take a while for the cluster to return to a green state when upgrading a single node. Unfortunately, this is because the recovery Elasticsearch performs works at the level of shard segments rather than individual documents. This means that the Elasticsearch node sending data to be replicated is saying, “Do you have segments_1?” If the other node doesn’t have the file, or the file isn’t identical, the entire segment file is copied. So a large amount of data may be copied even though the documents themselves are the same. Until Elasticsearch has a way of verifying the last document written in a segment file, it has to copy over any differing files when replicating data between the primary shard and the replica shard.
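
If you’re curious what’s actually being copied while a node recovers, the recovery progress is exposed through the cat API covered in the next section; a curl sketch against a node on localhost:9200:

  curl -XGET 'localhost:9200/_cat/recovery?v'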

There are two ways to make the segment files identical on the primary and replica shards. The first is to use the optimize API that we’ll talk about in chapter 10 to create a single, large segment for both the primary and the replica. The second is to toggle the number of replicas to 0 and then back to a higher number; this ensures that all replica copies have the same segment files as the primary shard. It also means that for a short period you’ll have only a single copy of the data, so beware of doing this in a production environment!
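
The replica toggle can be done through the index settings API. A sketch for a hypothetical index named my_index, dropping to zero replicas and then restoring a single replica:

  curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index.number_of_replicas": 0
  }'

  # once the change has been applied, restore the replicas
  curl -XPUT 'localhost:9200/my_index/_settings' -d '{
    "index.number_of_replicas": 1
  }'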

Finally, in order to minimize recovery time, you can also halt indexing data into the cluster while you’re performing the node upgrade.

Now that we’ve covered upgrading a node, let’s cover a helpful API for getting information out of the cluster in a more human-friendly way: the _cat API.