4.4. Beyond match and filter queries

General-purpose queries that we’ve discussed so far, such as the query_string and the match queries, are particularly useful when the user is faced with a search box because you can run such a query with the words the user types in.

To narrow the scope of a search, some user interfaces also include other elements next to the search box, such as a calendar widget that allows you to search for newly created groups or a check box for filtering events that have a location already established.

4.4.1. Range query and filter

The range query and filter are self-explanatory; they’re used to query for values within a certain range and can be used for numbers, dates, and even strings.

To use the range query, you specify the top and bottom values for a field. For example, to search for all groups created after June 1 and before September 1, 2012 in the index, use the following query:

Or you could use a filter instead:
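As a rough sketch, the two request bodies can be built side by side in Python. The field name created_on and the yyyy-MM-dd date format are assumptions made for illustration (neither is given in the text above); the filtered wrapper follows the same filter syntax used for the prefix filter in section 4.4.2.

```python
import json

# Assumption: groups store their creation date in a field named "created_on"
# as yyyy-MM-dd strings; adjust both to match your own mapping.
DATE_FIELD = "created_on"

# After June 1 and before September 1, 2012 (both bounds exclusive).
bounds = {"gt": "2012-06-01", "lt": "2012-09-01"}

# The range expressed as a query.
range_query = {"query": {"range": {DATE_FIELD: bounds}}}

# The same range expressed as a filter, wrapped in a filtered query
# whose query part matches every document.
range_filter = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"range": {DATE_FIELD: bounds}},
        }
    }
}

if __name__ == "__main__":
    # Either body could be passed to curl -d or sent with any HTTP client.
    print(json.dumps(range_query, indent=2))
    print(json.dumps(range_filter, indent=2))
```

The only difference between the two bodies is where the range clause sits: directly under query, or under filter inside a filtered query.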

See table 4.2 for the meaning of the parameters gt, gte, lt, and lte.

Table 4.2. Range query parameters

Parameter   Meaning
gt          Search for fields greater than the value, not including the value.
gte         Search for fields greater than the value, including the value.
lt          Search for fields less than the value, not including the value.
lte         Search for fields less than the value, including the value.

The range query also supports ranges of strings, so if you wanted to search for all the groups in the get-together index with names between "c" and "e", you could search using the following:

% curl 'localhost:9200/get-together/_search' -d '
{
  "query": {
    "range": {
      "name": {
        "gt": "c",
        "lt": "e"
      }
    }
  }
}'
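The four bounds from table 4.2 behave the same way for strings as for numbers. The following Python sketch models the inclusive and exclusive comparisons on plain strings; it only illustrates the semantics and isn't how Elasticsearch evaluates a range internally. The function in_range and the sample terms are hypothetical.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True if value satisfies every bound that was supplied."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


# Hypothetical lowercased name terms, for illustration only.
terms = ["boulder", "c", "cooking", "denver", "e", "elasticsearch"]

if __name__ == "__main__":
    # Exclusive bounds: the terms "c" and "e" themselves are left out.
    print([t for t in terms if in_range(t, gt="c", lt="e")])
    # Inclusive bounds: "c" and "e" are kept.
    print([t for t in terms if in_range(t, gte="c", lte="e")])
```

Because strings compare lexicographically, a term such as "elasticsearch" sorts after "e" and falls outside the range even when the upper bound is inclusive.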

When you use the range query, think long and hard about whether a filter would be a better choice. Because documents that fall into the range of the query have a binary match (“Yes, this document is in the range” or “No, this document isn’t in the range”), the range query doesn’t need to be a query. For better performance, it should be a filter. If you’re unsure whether to make it a query or a filter, make it a filter. In 99% of cases, making a range query a filter is the right thing to do.

4.4.2. Prefix query and filter

Similar to the term query, the prefix query and filter allow you to search for a term starting with the given prefix, where the prefix isn’t analyzed before searching. For example, to search the index for all events whose title contains a term that starts with “liber,” the following query is used:

% curl 'localhost:9200/get-together/event/_search' -d '
{
  "query": {
    "prefix": {
      "title": "liber"
    }
  }
}'

And, similarly, you can use a filter instead of a regular query, which has almost the same syntax:

% curl 'localhost:9200/get-together/event/_search' -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "prefix": {
          "title": "liber"
        }
      }
    }
  }
}'

But wait! What happens if you were to send the same request but with “Liber” instead of “liber”? Because the search prefix isn’t analyzed before being sent, it won’t find the terms that have been lowercased in the index. This is because of the way Elasticsearch analyzes documents and queries, which we cover in much more depth in chapter 5.

Because of this behavior, the prefix query is a good choice for autocompletion of a partial term that a user enters if the term is part of the index. For example, you could provide a categories input box when existing categories are already known. If a user was typing terms that were part of an index, you could take the text entered into a search box by the user, lowercase it, and then use a prefix query to see what other results show up. Once you have matching results from a prefix query, you can offer them as suggestions while the user is typing. But if you need to analyze the term or want an amount of fuzziness in the results, it’s probably better to stick with the match_phrase_prefix query for autocomplete functionality. We’ll talk more about suggestions and suggesters in appendix F.
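The lowercase-then-prefix idea can be sketched in a few lines of Python. This is only a model of the matching behavior against an in-memory set of terms; the helper prefix_suggestions and the sample terms are hypothetical, and a real application would send the lowercased text in a prefix query instead.

```python
def prefix_suggestions(user_input, indexed_terms, lowercase=True):
    """Return the indexed terms that start with the text typed so far."""
    prefix = user_input.lower() if lowercase else user_input
    return sorted(t for t in indexed_terms if t.startswith(prefix))


# Hypothetical terms as they might sit in the index after lowercasing.
indexed_terms = {"liberty", "liberal", "library", "denver"}

if __name__ == "__main__":
    # Sent as typed, "Liber" matches nothing because the indexed terms
    # are all lowercase.
    print(prefix_suggestions("Liber", indexed_terms, lowercase=False))
    # Lowercased first, the same input matches the expected terms.
    print(prefix_suggestions("Liber", indexed_terms))
```

Lowercasing on the client stands in for the analysis step that the prefix query skips, which is why it only works when the index itself holds lowercased terms.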

4.4.3. Wildcard query

You may be tempted to think of the wildcard query as a way to search with regular expressions, but in truth, the wildcard query is closer to the way shell wildcard globbing works; for example, running ls *foo?ar matches words such as “myfoobar,” “foocar,” and “thefoodar.”

Using a string, you can allow Elasticsearch to substitute either any number of characters (including none of them) for the * wildcard or a single character for the ? wildcard.

For example, a query for “ba*n” would match “bacon,” “barn,” “ban,” and “baboon” because the * can be any character sequence, whereas a query for “ba?n” would match only “barn” because ? must match a single character at all times. Listing 4.22 demonstrates these wildcard queries using a new index called wildcard-test.
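Because the wildcard query behaves much like shell globbing, Python’s fnmatch module gives a quick way to check which terms a pattern would accept. Treat this as an approximation: fnmatch also gives special meaning to square brackets, which the patterns here avoid. The field name title and the helper names are assumptions for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_body(field, pattern):
    """Build a wildcard query body for the given field and pattern."""
    return {"query": {"wildcard": {field: pattern}}}


def matching(pattern, terms):
    """Return the terms a glob-style pattern accepts (case-sensitive)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


words = ["bacon", "barn", "ban", "baboon"]

if __name__ == "__main__":
    # * stands for any run of characters, including none at all.
    print(matching("ba*n", words))
    # ? stands for exactly one character.
    print(matching("ba?n", words))
    # Assumption: the terms live in a field named "title".
    print(wildcard_body("title", "ba?n"))
```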

You can also mix and match with multiple * and ? characters to match a more complex wildcard pattern, but keep in mind that when a string is analyzed, spaces are stripped out by default, so ? can’t match a space if spaces aren’t indexed.

Listing 4.22. Example wildcard query

Something to note when using this query is that the wildcard query isn’t as lightweight as other queries like the match query; the sooner a wildcard character (* or ?) occurs in the query term, the more work Lucene and Elasticsearch have to do to match it. Take, for example, the search term “h*”; Elasticsearch must now match every term starting with “h”. If the term was “hi*”, Elasticsearch would only have to search through every term starting with “hi”, which is a smaller subset of all terms starting with “h”. Because of this overhead and performance considerations, be careful to test the wildcard query on a copy of your data before putting these queries into production! We’ll talk more about a similar query, the regexp query, in chapter 6, where we discuss searching with relevancy.
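To get a feel for why an early wildcard costs more, the following Python sketch counts how many terms share the literal text that precedes the first wildcard. It’s a simplified model of the candidate set, not Lucene’s actual matching algorithm, and the term list and helper names are made up for illustration.

```python
def literal_prefix(pattern):
    """Return the part of the pattern before the first * or ? wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "*?":
            return pattern[:i]
    return pattern


def candidate_terms(pattern, terms):
    """Return the terms that must be examined: those sharing the literal prefix."""
    prefix = literal_prefix(pattern)
    return [t for t in terms if t.startswith(prefix)]


# Hypothetical indexed terms.
terms = ["apple", "hello", "hi", "high", "hill", "home"]

if __name__ == "__main__":
    for pattern in ("hi*", "h*", "*i"):
        print(pattern, len(candidate_terms(pattern, terms)))
```

In this model, a pattern that starts with a wildcard has an empty literal prefix, so every term in the list becomes a candidate.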