6.5. Reducing scoring impact with query rescoring

Something we haven’t talked about yet is the impact of scoring on the speed of the system. For most ordinary queries, computing the score of a document adds only a small amount of overhead, because the Lucene team has heavily optimized TF-IDF scoring.

In some cases, however, scoring can be more resource-intensive:

Scoring with a script runs that script to calculate the score of every document that matches the query.

A phrase query with a large slop (discussed in section 4.2.1) has to search for words within a certain distance of each other.

In those cases, you may want to lessen the impact of the scoring algorithm running on millions or billions of documents.

To address this, Elasticsearch has a feature called rescoring. Rescoring means that an initial query is performed, and then a second round of scoring is computed on the results that are returned; hence the name. This means that a potentially expensive query, such as one that uses a script, can be executed on only the top 1,000 hits retrieved by a much cheaper match query. Let’s look at an example of using rescore in the next listing.

Listing 6.10. Using rescore to score a subset of matching documents
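A request body for such a search might be sketched as follows. This is a minimal illustration rather than a definitive listing: the phrase text "elasticsearch hadoop", the slop of 5, and both weight values are assumptions chosen for illustration. The body is built as a Python dictionary and serialized to JSON, the form in which it would be sent to the search endpoint.

```python
import json

# Assumed values for illustration only: the phrase "elasticsearch hadoop",
# the slop of 5, and both weights are not taken from the text.
search_body = {
    # Cheap first pass: a plain match query on the title field.
    "query": {"match": {"title": "elasticsearch"}},
    "rescore": {
        # Only the top 20 hits of the first pass (on each shard) are rescored.
        "window_size": 20,
        "query": {
            # Expensive second pass: a phrase query with a high slop.
            "rescore_query": {
                "match_phrase": {
                    "title": {"query": "elasticsearch hadoop", "slop": 5}
                }
            },
            # Relative importance of the original and the rescore query.
            "query_weight": 0.8,
            "rescore_query_weight": 1.3,
        },
    },
}

print(json.dumps(search_body, indent=2))
```

The printed JSON could then be passed as the body of a search request against the index of your choice.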

In this example you search for all the documents that have “elasticsearch” in the title and then take the top 20 results and rescore them, using a phrase query with a high level of slop. Even though a phrase query with a high slop value can be expensive to run, you don’t have to worry, because the query will run on only the top 20 documents instead of potentially millions or billions of documents. You can use the query_weight and rescore_query_weight parameters to weight each of the queries, depending on how much you want the score to be determined by the initial query and the rescore query. You can also use multiple rescore queries in sequence, each one taking the results of the previous one as its input.
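How the two weights interact can be illustrated with a small simulation. This is a simplified model, not Elasticsearch's implementation: it assumes the two scores are combined as a weighted sum, and it leaves hits outside the window untouched. The function name rescore_hits, the document IDs, and all scores are hypothetical.

```python
def rescore_hits(hits, rescore_fn, window_size,
                 query_weight=1.0, rescore_query_weight=1.0):
    """Rescore the top `window_size` hits of a first-pass result.

    hits: list of (doc_id, score) pairs, sorted by descending score.
    rescore_fn: returns the second-pass score for a doc_id
                (0.0 when the rescore query does not match).
    """
    window, rest = hits[:window_size], hits[window_size:]
    # Combine both scores as a weighted sum (an assumption of this sketch).
    rescored = [
        (doc_id, query_weight * score + rescore_query_weight * rescore_fn(doc_id))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Hits outside the window keep their first-pass order and score.
    return rescored + rest


# Hypothetical first-pass scores from the cheap query.
first_pass = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
# Hypothetical scores from the expensive phrase query.
phrase_scores = {"b": 2.0, "c": 0.25}

result = rescore_hits(
    first_pass,
    lambda doc_id: phrase_scores.get(doc_id, 0.0),
    window_size=3,
    query_weight=0.5,
    rescore_query_weight=2.0,
)
print(result)  # [('b', 5.0), ('a', 1.5), ('c', 1.0), ('d', 0.5)]
```

Here the expensive function runs on three documents only, and document b moves to the top because the rescore query is weighted more heavily than the initial one. Chaining rescore queries corresponds to passing the returned list into another call with a different rescore_fn.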