6.4. Understanding how a document was scored with explain

Before we go much further into customizing the scoring of documents, we should cover how you can break down the scoring of a document on a result-by-result basis, with the actual numbers Lucene is using under the hood. This is helpful in understanding why one document matches a query better than another from Elasticsearch’s perspective.

This is called explaining the score, and you can tell Elasticsearch to do it by specifying the explain=true flag, either on the URL when sending the request or by setting the explain flag to true in the body of the request itself. This can be useful in explaining why a document was scored a particular way, but it has another use: explaining why a document didn’t match a query. This turns out to be useful if you expect a document to match a query but it isn’t returned in the results.

Before we get to that, though, let’s take a look at an example of explaining the results of a query in the next listing.

Listing 6.8. Setting the explain flag in the request body
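
A minimal sketch of such a request, assuming a hypothetical index named get-together whose documents have a description field, sets the flag in the request body alongside the query:

curl -XPOST 'localhost:9200/get-together/_search?pretty' -d '{
  "query": {
    "match": {
      "description": "elasticsearch"
    }
  },
  "explain": true
}'

Alternatively, you can append explain=true to the URL's query string and leave the request body unchanged.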

You can see in this listing how to add the explain parameter. This, in turn, produces verbose output. Let’s take a look at the first result returned from this request:
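
A hypothetical, heavily trimmed version of that first hit follows; the real response nests these values and is considerably more verbose, but the key pieces look roughly like this:

{
  "_score": 0.4809364,
  "_explanation": {
    "value": 0.4809364,
    "description": "weight(description:elasticsearch), product of:",
    "details": [
      { "value": 1.0,       "description": "tf(termFreq(description:elasticsearch)=1)" },
      { "value": 1.5389965, "description": "idf(docFreq=6, maxDocs=12)" },
      { "value": 0.3125,    "description": "fieldNorm(doc=0)" }
    ]
  }
}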

The added part of this response is the new _explanation key, which contains a breakdown of each of the different parts of the score. In this case, you’re searching the description for “elasticsearch,” and the term “elasticsearch” occurs once in the description of the document, so the term frequency (TF) for that term is 1.

Likewise, the inverse document frequency (IDF) explanation shows that the term “elasticsearch” occurs in 6 out of the 12 documents in this index. Finally, you can also see the normalization for this field, which Lucene uses internally. These scores multiplied together determine the final score:

1.0 × 1.5389965 × 0.3125 = 0.4809364.

Keep in mind that this is only a simple example with a single query term, and we looked only at the explanation for a single document. The explanation can be extremely verbose and much more difficult to understand for more complex queries. It's also important to mention that using the explain feature adds overhead to Elasticsearch when querying, so make sure you use it only to debug a query rather than specifying it with every request by default.

6.4.1. Explaining why a document did not match

We mentioned earlier that explain has another use. Just as you can get an explanation of how the score was calculated for a particular matching document, you can also use the special explain API to see why a document didn't match a query.

In this case, though, you can't simply add the explain parameter to a search request; instead, there's a separate API for this, as shown in the next listing.

Listing 6.9. Explain API to discover why a document didn’t match a query
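
As a sketch, assuming a hypothetical document with ID 4 of type group in the same get-together index, the request sends the query to that document's _explain endpoint (the exact URL and response layout depend on your Elasticsearch version):

curl -XPOST 'localhost:9200/get-together/group/4/_explain?pretty' -d '{
  "query": {
    "match": {
      "description": "elasticsearch"
    }
  }
}'

A trimmed response for a non-matching document looks roughly like this:

{
  "matched": false,
  "explanation": {
    "value": 0.0,
    "description": "no matching term"
  }
}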

In this example, because the term “elasticsearch” doesn’t occur in the description field for this document, the explanation is a simple “no matching term.” You can also use this API to get the score of a single document if you know the document’s ID.

Armed with this tool for determining how documents are scored, experiment. Play around. Don't be afraid to use the tools in this book to modify your scoring.

Next, before we get into the meat of tweaking the score, we'll talk about the performance impact of scoring and what you can do if you find that scoring is taking too long.