F.2. Autocomplete suggesters

If autocomplete was cool in 2005, now it’s a must: any search without it looks ancient. You expect a good autocomplete to help you search faster (especially on mobile devices) and better (you type in e, so it should know you’re looking for Elasticsearch), but also to let you explore popular options (“elasticsearch tutorial” is actually a good idea!). Finally, a good autocomplete reduces the load on your main search system, especially if you have some sort of instant search available, where you jump directly to a popular result without executing the full-blown search.

A good autocomplete has to be fast and relevant: fast because it has to generate suggestions as the user is typing, and relevant because you don’t want to suggest a query with no results or one that isn’t likely to be useful.

You can help with the quality of suggestions by keeping what would be good candidates, such as successful products or queries, in a separate index. You could then run the prefix queries we introduced in chapter 4 to generate suggestions. But those queries might not be fast enough because ideally you need to come up with a suggestion before the user types the next character.

The completion and context suggesters help you build a faster autocomplete. They’re built on Lucene’s Suggest module, keeping data in memory in finite state transducers (FSTs). FSTs are essentially graphs that are able to store terms in a way that’s compressed and easy to retrieve. Figure F.6 illustrates how the terms index, search, and suggest would be stored.

Figure F.6. In-memory FSTs help you get fast suggestions based on a prefix.

The actual implementation is a bit more complex—because it allows you to add weights, for instance— but you can imagine why in-memory FSTs are fast: you just have to follow the paths and see that prefix s would lead to search and suggest.
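To make the "follow the paths" idea concrete, here is a minimal Python sketch. It uses a plain character trie, which is only a stand-in for the idea: a real FST also shares suffixes and stores weights, and Lucene's implementation is far more compact. The class and method names are hypothetical.

```python
class PrefixIndex:
    """Toy character trie; illustrates prefix lookup, not a real FST."""

    def __init__(self):
        self.root = {}

    def add(self, term):
        node = self.root
        for ch in term:
            node = node.setdefault(ch, {})
        node[None] = term  # None marks the end of a complete term

    def suggest(self, prefix):
        # Walk down the path spelled by the prefix.
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        # Collect every complete term below the node the prefix leads to.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key is None:
                    found.append(child)
                else:
                    stack.append(child)
        return sorted(found)


index = PrefixIndex()
for term in ("index", "search", "suggest"):
    index.add(term)

print(index.suggest("s"))
print(index.suggest("i"))
```

Looking up s only visits the branch below the s node, which is why the cost depends on the prefix and the number of matches rather than on the total number of stored terms.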

Next, we’ll look at how the Completion Suggester works, then move on to the Context Suggester, which is an extension of it, much like the phrase suggester we discussed earlier is an extension of the simpler term suggester.

Note

For versions 2.0 and later, a new Completion Suggester is planned. It should have all the features of the current Completion and Context suggesters, plus a few more (like flexible scoring based on geo distance or edit distance). The basic principles remain the same, though. For more information on Completion Suggester Version 2, take a look at the main issue here: https://github.com/elastic/elasticsearch/issues/8909. When this suggester is released, you should see updated documentation in the Suggesters page: www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html.

F.2.1. Completion Suggester

To tell Elasticsearch that you want to store suggestions in FSTs for autocomplete, you need to define a field in the mapping with type set to completion. The easiest way to store suggestions is by adding such a field as a multi-field to a field that you’re already indexing, as in the following listing. There, you’ll index places like restaurants, and you’ll add a suggest subfield to each place’s name field.

Listing F.7. Simple autocomplete based on existing data

If such a simple autocomplete implementation isn’t enough—for example, because results aren’t ranked— there are quite a few options that can help you improve relevancy. Some of them have to be done at index time (for example, you can add a weight to each suggestion), whereas others work at search time (you can enable fuzziness). On top of all this, suggestions can have payloads, where you can store document IDs that you can use for instant search.

Improving relevancy at index time

As with regular searches on string fields, the input text is analyzed at both index time and search time. That’s why Pizza Hut matched p. You can control analysis through the index_analyzer and search_analyzer options. For example, if you want case-sensitive suggestions (so that only P matches, not p), you can use the keyword analyzer:

"suggest": {
  "type": "completion",
  "index_analyzer": "keyword",
  "search_analyzer": "keyword"
}

If you need more information about analysis, you’ll find it in chapter 6.

In most cases, you’ll keep suggestions in a separate field, index, or even a separate Elasticsearch cluster. This lets you control suggestions based on how they perform and scale suggesters separately from the main search system.

When suggestions are in a different field, you can separate the inputs you match from the suggestion you provide (output). For example, a document like this

{
  "name": {
    "input": "phone",
    "output": "iphone"
  }
}

would let you suggest iphone for the input text ph. Also, you can provide multiple inputs:

{
  "name": {
    "input": ["iphone", "phone"],
    "output": "iphone"
  }
}

Finally, you can rank suggestions based on weights you provide at index time. In the next listing, you’ll combine inputs, outputs, and weights to implement autocomplete on top of group tags for the get-together use case you’ve been running for most of this book.

Listing F.8. Using weights, inputs, and outputs
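Independently of the Elasticsearch listing, the interplay of inputs, outputs, and weights can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not how Elasticsearch implements it; the suggest function, the sample tags, and the weights are made up for illustration, and lowercasing stands in for analysis.

```python
def suggest(entries, text, size=5):
    """Return outputs whose inputs start with `text`, highest weight first."""
    best = {}  # output -> highest weight among matching entries
    needle = text.lower()  # crude stand-in for analysis at search time
    for entry in entries:
        if any(i.lower().startswith(needle) for i in entry["input"]):
            out = entry["output"]
            best[out] = max(best.get(out, 0), entry["weight"])
    # Sort by descending weight, then alphabetically to break ties.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [out for out, _ in ranked[:size]]


# Hypothetical group tags with made-up weights.
entries = [
    {"input": ["big data", "data"], "output": "big data", "weight": 8},
    {"input": ["data visualization", "visualization"],
     "output": "data visualization", "weight": 3},
    {"input": ["denver"], "output": "denver", "weight": 5},
]

print(suggest(entries, "d"))
print(suggest(entries, "vis"))
```

Typing d matches big data through its second input (data), and the weight decides that it ranks above denver and data visualization.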

Improving relevancy at search time

When you run the suggest request, you can decide which suggestions will appear. As with other suggesters, size lets you control how many suggestions to return. Then, if you want to tolerate typos, you need a fuzzy object under the completion object of your suggest request. With fuzzy search enabled this way, you can configure additional options, like the following:

fuzziness, which allows you to specify the maximum allowed edit distance

min_length, where you specify the minimum length of the input text at which fuzzy search is enabled

prefix_length, which improves performance at the cost of flexibility by treating this many leading characters as correct

All those options go under the completion object of your suggest request:

% curl 'localhost:9200/autocomplete/_suggest?pretty' -d '{
  "tags-autocomplete": {
    "text": "daata",
    "completion": {
      "field": "tags",
      "size": 3,
      "fuzzy": {
        "fuzziness": 2,
        "min_length": 4,
        "prefix_length": 2
      }
    }
  }
}'
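To see how the three fuzzy options interact, here is a simplified Python model. It assumes plain Levenshtein distance (insertions, deletions, and substitutions only) and compares the input against prefixes of each candidate; Lucene's fuzzy suggester differs in details such as transpositions, so treat this as an approximation. The function names are hypothetical.

```python
def prefix_edit_distance(text, candidate):
    """Smallest Levenshtein distance between text and any prefix of candidate."""
    # previous[j] = distance between the text consumed so far and candidate[:j]
    previous = list(range(len(candidate) + 1))
    for i, tc in enumerate(text, start=1):
        current = [i]
        for j, cc in enumerate(candidate, start=1):
            cost = 0 if tc == cc else 1
            current.append(min(previous[j] + 1,          # drop tc from the text
                               current[j - 1] + 1,       # add cc to the text
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return min(previous)


def fuzzy_match(text, candidate, fuzziness=1, min_length=3, prefix_length=1):
    if candidate.startswith(text):
        return True   # an exact prefix match always counts
    if len(text) < min_length:
        return False  # input too short: fuzzy matching stays off
    if text[:prefix_length] != candidate[:prefix_length]:
        return False  # the leading characters must match exactly
    rest = prefix_edit_distance(text[prefix_length:], candidate[prefix_length:])
    return rest <= fuzziness


options = dict(fuzziness=2, min_length=4, prefix_length=2)
for tag in ("data", "database", "big data"):
    print(tag, fuzzy_match("daata", tag, **options))
```

With the settings from the request above, daata still reaches data and database because the first two characters match and the rest is within two edits, whereas big data is rejected by the prefix_length check before any distance is computed.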

Implementing instant search with payloads

Many search solutions let you go directly to a specific result when clicking on a suggestion instead of running that search. Figure F.7 shows an example from SoundCloud.

Figure F.7. Instant search lets you jump to the result without running an actual search.

To implement this in Elasticsearch, you’d put a payload in your completion field, and that payload would be the ID of the document you’re suggesting. You can then use the ID to get the document, as you’ll do in the next listing.

Listing F.9. Payload lets you get documents instead of searching for the suggested text
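On the application side, the flow could look like the following Python sketch. It assumes the suggest response lists each suggestion's options with a text and a payload, and that the payload carries the document ID under a key named id; the sample response, the index and type names, and both helper functions are hypothetical.

```python
def payloads_for(response, suggestion_name):
    """Collect (text, payload) pairs from a completion suggest response."""
    pairs = []
    for entry in response.get(suggestion_name, []):
        for option in entry.get("options", []):
            if "payload" in option:
                pairs.append((option["text"], option["payload"]))
    return pairs


def document_url(base_url, index, doc_type, doc_id):
    """Build the URL of a GET-by-ID request for the suggested document."""
    return "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)


# Hypothetical response; the payload shape is whatever you indexed.
sample_response = {
    "name-autocomplete": [{
        "text": "elas",
        "offset": 0,
        "length": 4,
        "options": [
            {"text": "Elasticsearch Denver", "score": 1.0, "payload": {"id": "3"}},
        ],
    }]
}

for text, payload in payloads_for(sample_response, "name-autocomplete"):
    print(text, "->", document_url("http://localhost:9200", "get-together",
                                   "group", payload["id"]))
```

The point of the design is that clicking a suggestion costs one GET by ID instead of a full search.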

The Completion Suggester returns all results matching the input text, which might work well with something like SoundCloud. But some use cases require filtering, like your get-together site: you only want to suggest events reasonably close to the user and ignore the others. To do this, you’ll need the Context Suggester, which is built to add filtering functionality on top of the Completion Suggester.

F.2.2. Context Suggester

The Context Suggester allows you to filter on a context, which can be a category (term) or a geo location. To enable these contexts, you need to specify them in the mapping and then provide contexts in documents and in your suggest requests.

Defining contexts in the mapping

You can add one or more context values to your completion field in the mapping. Each context has a type, which can be either category or geo. For geo contexts, you need to specify a precision value:

"name": {
  "type": "completion",
  "context": {
    "location": {
      "type": "geo",
      "precision": "100km"
    },
    "category": {
      "type": "category"
    }
  }
}

Contexts under the hood

Contexts work on top of the same FST structure that the Completion Suggester uses. To enable filtering, the context would be used as a prefix to the actual suggestion, like search_lucene if search is the category and lucene is the text you want to match.
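The prefixing trick can be sketched in Python as follows. This assumes an underscore separator as in the search_lucene example and uses a plain list instead of an FST; the helper names and sample data are hypothetical.

```python
SEPARATOR = "_"


def index_key(category, text):
    """Key stored in the structure: context first, then the suggestion text."""
    return category + SEPARATOR + text


def suggest_in_category(keys, category, prefix):
    """Filter by category and prefix with a single prefix comparison."""
    wanted = index_key(category, prefix)
    skip = len(category) + len(SEPARATOR)
    return sorted(k[skip:] for k in keys if k.startswith(wanted))


keys = [
    index_key("search", "lucene"),
    index_key("search", "elasticsearch"),
    index_key("storage", "lucene index files"),
]

print(suggest_in_category(keys, "search", "lu"))
print(suggest_in_category(keys, "storage", "lu"))
```

Because the category is part of the key, filtering costs nothing extra: a lookup for search_lu never reaches entries stored under storage.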

For geo contexts, the prefix is a geohash, like abcde. As you saw in appendix A about geo search, a geohash indicates a rectangular area on the map, and the longer the string, the higher the precision. For example, gc is a rectangle taking up most of Britain and Ireland, whereas gcp only goes from London to Southampton.[3]

[3] Snapshot taken from GeohashExplorer: http://geohash.gofreerange.com/.

Given a point on the map, you can approximate it with a geohash more or less precisely, depending on the hash length. For suggestions, you’d typically pick a precision that would reflect how near a point of interest should be to the current location. For example, restaurants would work with more precise hashes (like 10 km wide) than get-together events (which may be 100 km wide), assuming that users are more likely to drive farther for a monthly event than for a burger.
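To see how hash length relates to precision, here is a short Python sketch of the standard geohash encoding: longitude and latitude bits are interleaved, and every five bits become one base-32 character. The function name is hypothetical, and this is a plain illustration rather than the code Elasticsearch runs.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length):
    """Encode a point as a geohash with the given number of characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# Central London at increasing precision.
for n in (2, 3, 5):
    print(n, geohash_encode(51.5074, -0.1278, n))
```

Each extra character halves the cell several more times, so a shorter hash is always a prefix of the longer one for the same point. That prefix property is what lets a geo context act as a prefix filter.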

Adding contexts to documents and suggest requests

With the mapping in place, you’d put contexts in documents under the context field of your completion:

{
  "name": {
    "input": "Elasticsearch Denver",
    "context": {
      "location": {
        "lat": 39.752337,
        "lon": -105.00083
      },
      "category": ["big data"]
    }
  }
}

When fetching suggestions, you should add a context value to your completion request as well:

% curl 'localhost:9200/autocomplete/_suggest?pretty' -d '{
  "name-autocomplete": {
    "text": "denv",
    "completion": {
      "field": "name",
      "context": {
        "category": "big data",
        "location": {
          "lat": 39,
          "lon": -105
        }
      }
    }
  }
}'

Troubleshooting Context (and Completion) Suggester errors

Normally, if you define contexts and run the Context Suggester with no context in the request, you’ll get an error for every shard:

"reason": "BroadcastShardOperationFailedException[[autocomplete][0] ]; nested: ElasticsearchException[failed to execute suggest]; nested: ElasticsearchIllegalArgumentException[suggester [completion] requires context to be setup]; "

But if you really need to specify contexts for only some of your requests and documents, you can specify a default value in the mapping:

"name": {
  "type": "completion",
  "context": {
    "category": {
      "type": "category",
      "default": "default_category"
    }
  }
}

Then you can index documents without categories:

{
  "name": {
    "input": "test meeting"
  }
}

Finally, if the user doesn’t enter any filtering context, you can fill in the default value in your application; this is also possible with a geo context:

"name-autocomplete": {
  "text": "te",
  "completion": {
    "field": "name",
    "context": {
      "category": "default_category"
    }
  }
}
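In application code, filling in the default could look like this Python sketch. The helper name and its parameters are hypothetical; the only requirement is that the fallback string matches the default configured in the mapping.

```python
import json

DEFAULT_CATEGORY = "default_category"  # must match "default" in the mapping


def build_suggest_body(text, category=None, field="name",
                       name="name-autocomplete"):
    """Build a suggest request body, falling back to the default context."""
    return {
        name: {
            "text": text,
            "completion": {
                "field": field,
                # Use the caller's category if given, else the default one.
                "context": {"category": category or DEFAULT_CATEGORY},
            },
        }
    }


print(json.dumps(build_suggest_body("te"), indent=2))
print(json.dumps(build_suggest_body("denv", category="big data"), indent=2))
```

Keeping the fallback in one helper means every request carries a context, so the "requires context to be setup" error can't be triggered by a code path that forgot to set one.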

From a functionality standpoint, this works as if you have the Context Suggester when you need it and the Completion Suggester when you don’t. But in both cases you might get suggestions from deleted documents. This happens because the FSTs used under the hood are built for each Lucene segment in the index, and they never get changed until the segment is deleted during merging (when the FST gets deleted as well). As you may recall from chapter 3, when a document is deleted, it’s not really gone from the segment; it’s just marked as deleted.

Although searches are smart enough to filter out deleted documents, suggesters are not, at least in version 1.4. Until this is addressed in the new Completion Suggester (see https://github.com/elastic/elasticsearch/issues/8909), you can work around this issue by changing your merge policy or optimizing so you have as few deleted documents in the index as possible. For more information on merges, go to chapter 10, section 10.2.2.