6.3. Boosting

Boosting is the process of modifying the relevance of a document. There are two types of boosting: you can boost a document while you index it or when you query for it. An index-time boost is stored in the index, and the only way to change it is to re-index the document. We therefore recommend query-time boosting: it’s the most flexible option and lets you change your mind about which fields or terms are important without re-indexing your data.

Consider the get-together index. If you’re searching for a group, a match on the group’s title is more important than a match on its description. Take the Elasticsearch Berlin group: its title contains only the most important information, that the group is focused on Elasticsearch in the Berlin area, whereas its description may contain many more terms. The title should therefore carry more weight than the description, and boosting is how you accomplish that.

Before you start, though, it’s important to mention that boost numbers are not exact multipliers: boost values are normalized when scores are computed. For example, if you give every field a boost of 10, the boosts are normalized back to 1, so effectively no boost is applied. Think of boost numbers as relative; boosting the name field by 3 means that the name field is roughly three times as important as the other fields.
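A short derivation shows why uniform boosts cancel out. It assumes Lucene’s classic TF-IDF practical scoring function (`TFIDFSimilarity`) and writes $b_t$ for the boost on query term $t$; other similarity models normalize differently, so treat this as an illustration of the classic case only:

```latex
\operatorname{score}(q,d) = \operatorname{coord}(q,d)\cdot\operatorname{queryNorm}(q)\cdot
  \sum_{t\in q}\operatorname{tf}(t,d)\,\operatorname{idf}(t)^2\,b_t\,\operatorname{norm}(t,d),
\qquad
\operatorname{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t\in q}\bigl(\operatorname{idf}(t)\,b_t\bigr)^2}}

% Replace every boost b_t with c\,b_t (c > 0, e.g. c = 10):
\operatorname{queryNorm}'(q) = \frac{1}{\sqrt{\sum_{t\in q}\bigl(\operatorname{idf}(t)\,c\,b_t\bigr)^2}}
  = \frac{1}{c}\,\operatorname{queryNorm}(q),
\qquad
\sum_{t\in q}\operatorname{tf}(t,d)\,\operatorname{idf}(t)^2\,(c\,b_t)\,\operatorname{norm}(t,d)
  = c\sum_{t\in q}\operatorname{tf}(t,d)\,\operatorname{idf}(t)^2\,b_t\,\operatorname{norm}(t,d)
```

The factor $1/c$ from the query norm and the factor $c$ from the sum cancel, so the score is unchanged. Only the ratios between boosts shift the relative weight of terms or fields.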

6.3.1. Boosting at index time

As we mentioned, in addition to boosting a document during a query, you can also boost it at index time. Even though we don’t recommend this type of boosting, as you’ll see shortly, it can still be useful in some cases, so let’s talk about how to set it up.

For this type of boosting, you set the boost parameter in the mapping for your field. For example, to boost the name field for the group type, you’d create an index with a mapping like the one in the next listing.

Listing 6.3. Boosting the name field in the group type at index time
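A minimal sketch of such a mapping follows. It assumes the older mapping syntax this chapter is written against (a `group` mapping type and the `string` field type, neither of which newer Elasticsearch releases accept) and a hypothetical boost of 2.0. The `MAPPING` variable and `create_index` function are illustrative names; the `Content-Type` header is required from Elasticsearch 6.0 on and harmless on older versions.

```shell
# Index-time boost declared in the field mapping (older mapping syntax).
# The boost of 2.0 is a hypothetical value chosen for illustration.
MAPPING='{
  "mappings": {
    "group": {
      "properties": {
        "name": {
          "type": "string",
          "boost": 2.0
        }
      }
    }
  }
}'

# Create the get-together index with this mapping on a local node.
create_index() {
  curl -XPUT 'localhost:9200/get-together' \
    -H 'Content-Type: application/json' -d "$MAPPING"
}
```

Run `create_index` before indexing any groups; changing the 2.0 later means re-indexing every document.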

After specifying this mapping for the index, any document that’s indexed automatically has a boost applied to the terms in the name field (stored with the document in the Lucene index). Again, remember that this boost value is fixed, which means if you decide you want to change it, you’ll need to re-index.

Another reason to avoid index-time boosting is that boost values are stored as low-precision values in Lucene’s internal index structure: only a single byte is used to store the floating-point number, so precision can be lost when the final score of a document is calculated.

The final reason to avoid index-time boosting is that the boost is applied to every term in the field. Matching multiple terms in the boosted field therefore multiplies the boost, increasing the field’s weight even more.

Because of these issues with boosting at index time, it’s much better to boost when performing the queries, as you’ll see next.

6.3.2. Boosting at query time

There are quite a few ways to perform boosting when searching. If you’re using the basic match, multi_match, simple_query_string, or query_string queries, you control the boost either on a per-term or per-field basis. Almost all of Elasticsearch’s query types support boosting. If this isn’t flexible enough, you can control the boosting in a more fine-grained manner with the function_score query, which we’ll cover a little later in the chapter.

With the match query, you can boost the query by adding the boost parameter, as shown in the next listing. Boosting the query means that every term found in the queried field gets a boost.

Listing 6.4. Query-time boosting using the match query
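A minimal sketch of such a request, consistent with the description that follows (two match queries combined in a bool query, with a boost only on the first). The 2.5 boost, the choice to boost the name field, and the `BODY` and `search` names are assumptions for illustration.

```shell
# Two match queries combined with bool; only the first carries a boost,
# so matches on name weigh more than matches on description.
# The boost of 2.5 is a hypothetical value.
BODY='{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": { "query": "elasticsearch big data", "boost": 2.5 } } },
        { "match": { "description": { "query": "elasticsearch big data" } } }
      ]
    }
  }
}'

# Send the query to the get-together index on a local node.
search() {
  curl -XPOST 'localhost:9200/get-together/_search?pretty' \
    -H 'Content-Type: application/json' -d "$BODY"
}
```

Call `search` and compare the scores with and without the boost to see its effect.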

This also works for other queries that Elasticsearch provides, such as the term query, the prefix query, and so on. In the previous example, notice that a boost was added only to the first match query, so the first match query now has a bigger impact on the final score than the second. Boosting a query makes sense only when you combine multiple queries using the bool query or the and/or/not queries: because a boost is relative, a single query on its own has nothing to be weighted against.
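As a minimal sketch of the same parameter on other query types, the request below combines a boosted term query with an unboosted prefix query in a bool query. The search values, the boost of 3, and the `BODY` and `search` names are hypothetical.

```shell
# boost works the same way on term and prefix queries: the term query on
# name (hypothetical boost 3) outweighs the unboosted prefix query.
BODY='{
  "query": {
    "bool": {
      "should": [
        { "term": { "name": { "value": "elasticsearch", "boost": 3 } } },
        { "prefix": { "description": { "value": "elastic" } } }
      ]
    }
  }
}'

# Send the query to the get-together index on a local node.
search() {
  curl -XPOST 'localhost:9200/get-together/_search?pretty' \
    -H 'Content-Type: application/json' -d "$BODY"
}
```

The term query isn’t analyzed, so the value is written in lowercase to match the lowercased tokens the default analyzer produces.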

6.3.3. Queries spanning multiple fields

For queries that span multiple fields, such as the multi_match query, you also have access to an alternative syntax. You can specify the boost for the entire multi_match, similar to the match query with the boost parameter you’ve already seen, as shown in the next listing.

Listing 6.5. Specify a boost for the entire multi_match query

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d'{
  "query": {
    "multi_match": {
      "query": "elasticsearch big data",
      "fields": ["name", "description"],
      "boost": 2.5
    }
  }
}'

Or you can boost only particular fields by using a special syntax: appending a caret (^) and the boost value to a field name tells Elasticsearch to boost only that field. The following listing shows the previous query again, but instead of boosting the entire query, it boosts only the name field.

Listing 6.6. Boosting on the name field only
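A minimal sketch of the per-field form, reusing the query from Listing 6.5. The boost factor of 3 is a hypothetical value, and `BODY` and `search` are illustrative names.

```shell
# Per-field boost with the caret syntax: only name is boosted
# (the factor 3 is hypothetical); description keeps the default boost.
BODY='{
  "query": {
    "multi_match": {
      "query": "elasticsearch big data",
      "fields": ["name^3", "description"]
    }
  }
}'

# Send the query to the get-together index on a local node.
search() {
  curl -XPOST 'localhost:9200/get-together/_search?pretty' \
    -H 'Content-Type: application/json' -d "$BODY"
}
```

Unlike the top-level boost parameter, the caret changes the weight of name relative to description inside the same multi_match query.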

In the query_string query, you can boost individual terms with a similar syntax: append a caret (^) and the boost value to the term. An example that searches for “elasticsearch” and “big data” and boosts “elasticsearch” by 3 would look like the next listing.

Listing 6.7. Boosting individual terms in query_string queries
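A minimal sketch of such a query follows. The boost of 3 comes from the text; combining the two parts with AND and matching “big data” as a quoted phrase are assumptions, as are the `BODY` and `search` names.

```shell
# Term-level boost in query_string: "elasticsearch" is boosted by 3,
# and the phrase "big data" is matched with the default boost.
# Inside the JSON string, the phrase quotes are escaped as \".
BODY='{
  "query": {
    "query_string": {
      "query": "elasticsearch^3 AND \"big data\""
    }
  }
}'

# Send the query to the get-together index on a local node.
search() {
  curl -XPOST 'localhost:9200/get-together/_search?pretty' \
    -H 'Content-Type: application/json' -d "$BODY"
}
```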

As we mentioned before, keep in mind when boosting either fields or terms that a boost is a relative value and not an absolute multiplier. If you boost all of the terms you’re searching for by the same amount, it’s the same as though you boosted none of them because Lucene normalizes the boost values. Remember that boosting a field by 4 doesn’t automatically mean that the score for that field will be multiplied by 4, so don’t worry if the score isn’t an exact multiplication.

Because query-time boosting is so flexible, play around with it! Don’t be afraid to experiment with your dataset until you get the relevance you want from your results. Changing the boost is as easy as adjusting a number in the query you send to Elasticsearch.