E.3. Functionality tricks

Just as you can filter registered queries based on their metadata, you can query on this metadata and use the score to decide which query is more relevant. In this section, we’ll look at how this works and also at using aggregations to get better insights on matching queries.

Note

Remember that for queries and aggregations, you’ll run them on the registered queries, not on the percolated documents. This means you’ll get ranking and statistics on the queries, not on the documents.

If the logic of querying queries sounds a bit twisted, let’s start with another functionality trick: highlighting. This one is more straightforward because the highlighted text comes from the percolated document.

E.3.1. Highlighting percolated documents

Highlighting lets you know which words from the document you’re percolating matched the query. In appendix C we discussed the features of highlighting in the context of regular queries, and all of them work with the percolator, too.

If you ran listing E.4, you can try a highlighted percolation by adding a highlight section to your percolate request. You should also specify a size value in order to place a limit on how many queries to highlight:

% curl 'localhost:9200/smart-percolate/event/_percolate?pretty' -d '{
  "doc": {
    "title": "Nesting Elasticsearch Aggregations"
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  },
  "size": 2
}'

For each query, you’ll see the matching terms from the percolated document:

"_index" : "smart-percolate",
"_id" : "1",
"highlight" : {
  "title" : [ "Nesting <em>Elasticsearch</em> <em>Aggregations</em>" ]
}
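If you need the matched words themselves rather than the marked-up text, a small helper can pull them out of each match. The following is a minimal sketch in Python, assuming the default <em> highlight tags and a match shaped like the fragment above; the function name highlighted_terms is hypothetical.

```python
import re

def highlighted_terms(match):
    """Collect the terms wrapped in <em> tags for one percolator match."""
    terms = []
    # "highlight" maps each field name to a list of highlighted fragments.
    for fragments in match.get("highlight", {}).values():
        for fragment in fragments:
            terms.extend(re.findall(r"<em>(.*?)</em>", fragment))
    return terms

# One match, shaped like the response fragment shown above.
match = {
    "_index": "smart-percolate",
    "_id": "1",
    "highlight": {
        "title": ["Nesting <em>Elasticsearch</em> <em>Aggregations</em>"]
    },
}

print(highlighted_terms(match))  # ['Elasticsearch', 'Aggregations']
```

If you configure custom highlight tags, the regular expression would need to change accordingly.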

Scoring, on the other hand, works “upside down,” just like the percolator itself: queries are scored, not the percolated documents.

E.3.2. Ranking matching queries

Let’s take the use case of contextual advertising. A user is looking at blog posts on your website, and you have some ads registered as queries. During page load, you can percolate the post against those queries to see which ads are appropriate for the displayed content. This allows you to show tech ads for tech posts, holiday ads for holiday posts, and so on. But you have limited ad space, so which ads are you going to show?

How about sorting ads by some criterion, like the revenue you get for each ad? Then you can use a size value to get back only as many ads as you can display.

To sort registered queries by the value of a field, you can use the function score query, which was introduced in chapter 6. In the following listing, you’ll use it to sort ads by the value of ad_price.

Listing E.5. Sorting registered queries by a metadata value

Note that the function score query doesn’t do any filtering—although that’s possible, too—it simply defines the _score value, which is used for sorting.
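As a rough sketch of the kind of request body this describes, the following Python builds a percolate body that scores each matching ad by its ad_price and keeps only the top few. The post fields (title, body) and the helper name ad_percolate_body are hypothetical, and the exact request shape is an assumption that should be checked against the percolator documentation for your Elasticsearch version.

```python
import json

def ad_percolate_body(post, max_ads):
    """Build a percolate request body that ranks matching ads by ad_price."""
    return {
        "doc": post,
        "query": {
            "function_score": {
                # Score each registered query by the value of its
                # ad_price metadata field (no filtering happens here).
                "field_value_factor": {"field": "ad_price"}
            }
        },
        "sort": "_score",   # order the matching queries by that score
        "size": max_ads,    # return only as many ads as you can display
    }

# Hypothetical blog post to percolate against the registered ads.
post = {
    "title": "Scaling Elasticsearch",
    "body": "Tips for running a cluster in production",
}

body = ad_percolate_body(post, 5)
print(json.dumps(body, indent=2))
```

The printed JSON could be passed to curl with -d against a _percolate endpoint, in the same way as the highlighting request earlier in this section.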

At this point, you might be wondering why you sort on _score and not on the ad_price field directly. There are two reasons:

Percolator supports sorting only on _score (as of version 1.4).

In practice, you probably want to combine multiple sort criteria. In the case of ads, you might want to throw a random value into the mix to make sure you show all ads eventually; just increase the odds for the expensive ones. The function score query allows you to define different weights for different criteria and combine them.
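One way to mix a random value with the price could be sketched as a function score query with two weighted functions whose results are summed. This is an illustration under assumptions, not a tested recipe: the helper name mixed_ad_score_query, the weights, and the seed are hypothetical, and the per-function weight option is assumed to be available in your version (it appeared around 1.4).

```python
import json

def mixed_ad_score_query(price_weight, random_weight, seed):
    """Build a function_score query that combines ad_price with randomness."""
    return {
        "function_score": {
            "functions": [
                # Expensive ads get a higher score...
                {
                    "field_value_factor": {"field": "ad_price"},
                    "weight": price_weight,
                },
                # ...while a random component gives every ad a chance.
                {
                    "random_score": {"seed": seed},
                    "weight": random_weight,
                },
            ],
            # Add the two weighted function results together.
            "score_mode": "sum",
        }
    }

query = mixed_ad_score_query(price_weight=2, random_weight=1, seed=42)
print(json.dumps(query, indent=2))
```

This query would take the place of the query section of a percolate request. Raising price_weight relative to random_weight favors expensive ads more strongly; the right balance depends on the range of your ad_price values.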

Finally, you might want to get more insight about how matching queries are distributed. You can get this through aggregations.

E.3.3. Aggregations on matching query metadata

Let’s say you’re responsible for an online shop’s search feature. When a new product is added, you want to make sure the description matches searches of users normally looking for this type of product.

If you register user searches as percolator queries, you can percolate a product document before submitting it to predict how often that product would show up in searches. If the product shows in too few or too many searches, it could be a problem. In these situations, you can get more information about the distribution of these matching queries by running an aggregation on a metadata field or even the actual query text.
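To make the idea concrete, here is a minimal sketch of the two request bodies involved: one that registers a user search together with its text as metadata, and one that percolates a product while running a terms aggregation on that metadata. The field names description and query_terms, the sample texts, and the helper names are assumptions made for illustration; only the aggregation name top_query_terms is taken from the response shown later in this section.

```python
import json

def register_search_body(search_text):
    """Body for registering a user search as a percolator query."""
    return {
        "query": {"match": {"description": search_text}},
        # Metadata: keep the search text so it can be aggregated on later.
        "query_terms": search_text,
    }

def percolate_with_top_terms_body(product):
    """Body for percolating a product and aggregating on query terms."""
    return {
        "doc": product,
        "aggs": {
            "top_query_terms": {
                "terms": {"field": "query_terms"}
            }
        },
    }

# Hypothetical user searches and product document.
searches = [register_search_body(text) for text in ("cheap PC", "cheap PC linux")]
product = {"description": "PC with linux installed that is cheap to maintain"}

print(json.dumps(searches, indent=2))
print(json.dumps(percolate_with_top_terms_body(product), indent=2))
```

Because the aggregation runs on the registered queries that match, its buckets describe what users searched for, not what the product document contains.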

In the listing that follows, you’ll prepare and then run a percolation on user searches, aggregating on the query terms. In this example, the term cheap will appear in the top terms for matching queries, suggesting that price is important for users looking at this type of product.

Listing E.6. Using aggregations to get matching query metadata and term statistics

The aggregations part of the response would look like this:

"aggregations" : {
  "top_query_terms" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [ {
      "key" : "cheap",
      "doc_count" : 2
    }, {
      "key" : "pc",
      "doc_count" : 2
    }, {
      "key" : "linux",
      "doc_count" : 1
    } ]
  }
}

If cheap is the top term here, and the computer you’re adding is indeed cheap, it would be good to add that term to the description so that people searching for this type of product will find it.
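This check can be automated. The sketch below, under the assumption that the buckets look like the ones in the response above, lists the top query terms that don't appear in a product description. The helper name missing_top_terms and the sample description are hypothetical, and splitting on whitespace is only a crude stand-in for the analysis Elasticsearch performs.

```python
def missing_top_terms(buckets, description):
    """Return top query terms that are absent from the description."""
    # Naive tokenization: lowercase and split on whitespace.
    words = set(description.lower().split())
    return [bucket["key"] for bucket in buckets if bucket["key"] not in words]

# Buckets as returned in the aggregations part of the response above.
buckets = [
    {"key": "cheap", "doc_count": 2},
    {"key": "pc", "doc_count": 2},
    {"key": "linux", "doc_count": 1},
]

description = "Desktop PC with Linux preinstalled"
print(missing_top_terms(buckets, description))  # ['cheap']
```

Whether a missing term should actually be added remains an editorial decision: it only makes sense if the term truthfully describes the product.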

The key thing to remember here is that as with most of this appendix, features like aggregations work on registered queries and not on the percolated documents. We don’t call percolation “search upside down” for nothing!