Appendix C. Highlighting

Highlighting indicates why a document matches a query by emphasizing the matching terms. This gives the user an idea of what the document is about and also shows its relationship to the query, as shown in figure C.1.

Figure C.1. Highlighting shows why a document matched a query.

Although figure C.1 is taken from DuckDuckGo, Elasticsearch offers highlighting functionality, too. For example, you can search for “elasticsearch” in get-together event titles and make that word stand out like this:

"title" : [ "Introduction to <em>Elasticsearch</em>" ],
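To make the idea concrete, here is a toy highlighter in Python that wraps matching words in <em> tags and reproduces the output above. It is only an illustration: the function name toy_highlight is hypothetical, and its "analysis" (splitting on word characters and lowercasing) is a simplifying assumption. Elasticsearch's highlighters use the field's real analyzer and are considerably more involved.

```python
import re

def toy_highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap every word in `text` whose lowercase form is in `terms`.

    Toy stand-in for analysis: a "word" is a run of \\w characters and
    matching is case-insensitive. This is NOT Elasticsearch's implementation.
    """
    wanted = {t.lower() for t in terms}

    def wrap(match):
        word = match.group(0)
        # Keep the original casing of the word, only add the tags around it.
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)

print(toy_highlight("Introduction to Elasticsearch", {"elasticsearch"}))
# Introduction to <em>Elasticsearch</em>
```

Note that the query term was lowercase while the title keeps its capital "E"; the highlight preserves the stored text and only marks where the match occurred.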

To get such highlighting, you’ll need three things, and we’ll discuss them in detail in this appendix:

A highlight section in your search request, at the same level as query and aggregations

A list of fields you want to be highlighted, like the event name or its description

Highlighted fields included in _source or stored individually

Note

All fields are included in _source by default but aren’t stored individually. You can find more information about _source and stored fields in chapter 3, section 3.4.1.

After you do the basic highlighting, you might want to turn some knobs. In this appendix, we’ll also discuss the most important highlighting options:

What to match— For example, you can show a snippet of a field even when it contains no terms to highlight, so that all documents display the same fields. Or you might want to use a different query for highlighting than the one you use for searching.
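As a sketch of both matching options, the following request body asks for a snippet of a description field even when nothing in it matches, and supplies a separate query used only for highlighting. The option names no_match_size and highlight_query are Elasticsearch highlight parameters, but the field names, the 100-character size, and the queries are illustrative assumptions; check your version's documentation for which highlighters support each option.

```python
# Search on "title", but highlight "description" using a different query.
search_body = {
    "query": {"match": {"title": "elasticsearch"}},
    "highlight": {
        "fields": {
            "description": {
                # Return the first 100 characters of the field even if no
                # term in it matches, so every hit shows the same fields.
                "no_match_size": 100,
                # Used only to decide what to highlight, not which
                # documents match.
                "highlight_query": {
                    "match": {"description": "elasticsearch"}
                },
            }
        }
    },
}
```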

How fragments should look— With large fields, you typically don't get back their entire contents with highlighted terms; you get one or more fragments of text around those terms. You can configure how many fragments to return, in which order they should be shown, and how big they should be.

How to highlight— You can change the default <em> and </em> tags to something else. If you stick to HTML tags, you can have Elasticsearch HTML-encode the fragments (for example, escaping ampersand (&) characters) so your application can render them correctly.
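The fragment and tag options above map onto highlight parameters roughly as in this sketch. The parameter names (pre_tags, post_tags, encoder, fragment_size, number_of_fragments, order) are Elasticsearch highlight options; the <b> tags, the 80-character size, and the choice of two fragments are arbitrary values picked for illustration.

```python
search_body = {
    "query": {"match": {"description": "elasticsearch"}},
    "highlight": {
        # Replace the default <em>...</em> with <b>...</b>.
        "pre_tags": ["<b>"],
        "post_tags": ["</b>"],
        # Escape HTML in the text (e.g. & becomes &amp;) before adding the tags.
        "encoder": "html",
        "fields": {
            "description": {
                "fragment_size": 80,       # approximate fragment length, in characters
                "number_of_fragments": 2,  # at most two fragments per field
                "order": "score",          # best-scoring fragments first
            }
        },
    },
}
```

By default, fragments come back in the order they appear in the text; "order": "score" sorts them by relevance instead. Setting number_of_fragments to 0 returns the whole field content highlighted.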

We'll also discuss different highlighting implementations. The default implementation is called plain; it re-analyzes the text from _source or stored fields in order to find the relevant terms to highlight. This process can become too expensive for big fields, like the contents of a blog post. Alternatively, you can use the Postings Highlighter or the Fast Vector Highlighter. Both require you to change the mapping so that Elasticsearch stores additional data: term offsets for the Postings Highlighter and term vectors for the Fast Vector Highlighter. Both changes increase your index size and require more computing power at index time.
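Here is a sketch of the mapping changes each alternative needs, assuming the 1.x-era mapping syntax (string fields, a hypothetical event type) and a matching highlight section that picks a highlighter per field. index_options set to offsets enables the Postings Highlighter, and term_vector set to with_positions_offsets enables the Fast Vector Highlighter. In later Elasticsearch versions the string type became text and the Postings Highlighter was replaced by the unified highlighter, so adapt accordingly.

```python
mapping = {
    "event": {  # hypothetical type name, following the get-together examples
        "properties": {
            "description": {
                "type": "string",
                # Postings Highlighter: store term offsets in the postings.
                "index_options": "offsets",
            },
            "title": {
                "type": "string",
                # Fast Vector Highlighter: store term vectors
                # with positions and offsets.
                "term_vector": "with_positions_offsets",
            },
        }
    }
}

# Choose a highlighter per field at search time ("plain" is the default).
highlight = {
    "fields": {
        "description": {"type": "postings"},
        "title": {"type": "fvh"},
    }
}
```

The extra data is written at index time, so existing documents must be reindexed before these highlighters can use it.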

Each highlighting implementation comes with its own set of features, and we’ll talk about them later in this appendix. But first, let’s deal with the basics of highlighting.