4.1. Structure of a search request

An Elasticsearch search request is sent to the server either as a JSON document in the request body or as parameters in the URL. Because all search requests follow the same format, it’s helpful to understand the components that you can change for each search request. Before we discuss the different components, we have to talk about the scope of your search request.

4.1.1. Specifying a search scope

All REST search requests use the _search REST endpoint and can be either a GET request or a POST request. You can search an entire cluster or you can limit the scope by specifying the names of indices or types in the request URL. The following listing provides example search URLs that limit the scope of searches.

Listing 4.1. Limiting the search scope in the URL
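As a rough sketch of how such scoped URLs are put together, the following Python snippet assembles them without contacting a server. The address localhost:9200 is an assumed local node, the event type and the index named other are hypothetical names used only for illustration, and type segments in the URL apply only to Elasticsearch versions that still support mapping types.

```python
BASE = "http://localhost:9200"  # assumed address of a local node

def search_url(indices=None, types=None, base=BASE):
    """Build a _search URL, optionally limited to indices and types."""
    parts = [base]
    if indices:
        parts.append(",".join(indices))
    elif types:
        parts.append("_all")  # a type segment needs an index segment before it
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/".join(parts)

print(search_url())                               # entire cluster
print(search_url(["get-together"]))               # one index
print(search_url(["get-together"], ["event"]))    # one type in one index
print(search_url(types=["event"]))                # one type in all indices
print(search_url(["get-together", "other"]))      # several indices
```

Every variant ends in the same _search endpoint; only the path segments in front of it change.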

Besides indices, you can also use aliases to search through multiple indices. This method is often used to search through all available time-stamped indices. Think of indices in the format logstash-yyyymmdd, with one alias called logstash that points to all of them. You can then do a basic search limited to all logstash-based indices: curl 'localhost:9200/logstash/_search'.

For the best performance, limit your queries to the smallest number of indices and types possible, because anything Elasticsearch doesn’t have to search means faster responses. Remember that each search request has to be sent to all shards of an index; the more indices you send search requests to, the more shards are involved.

Now that you know how to limit the scope for your search request, the next step is to discuss the basic components of the search request.

4.1.2. Basic components of a search request

Once you’ve selected the indices to search, you need to configure the most important components of the search request. These components control how many documents are returned, which documents rank best, and which documents you don’t want in your results:

query— The most important component for your search request, this part configures the best documents to return based on a score, as well as the documents you don’t want to return. This component is configured using the query DSL and the filter DSL. An example is to search for all events with the word “elasticsearch” in the title limited to events in this year.

size— The number of documents to return.

from— Together with size, from is used for pagination. Be careful, though: to determine the second page of 10 items, Elasticsearch has to calculate the top 20 items. The deeper the page, the more items have to be calculated, so fetching a page from the middle of a large result set is expensive.

_source— Specifies how the _source field is returned. The default is to return the complete _source field. By configuring _source, you filter the fields that are returned. Use this if your indexed documents are big and you don’t need the full content in your result. Be aware that you shouldn’t disable the _source field in your index mappings if you want to use this. See the note for the difference between using fields and _source.

sort— The default sorting is based on the score for a document. If you don’t care about the score or you expect a lot of documents with the same score, adding a sort helps you to control which documents get returned.
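To make the pagination cost described under from concrete, here is a small Python sketch of the arithmetic. The helper names page_window and slice_hits are hypothetical, and the list of numbers stands in for a ranked result list; nothing here talks to Elasticsearch.

```python
def page_window(page, size=10):
    """Translate a 1-based page number into from/size values."""
    offset = (page - 1) * size  # from is zero-based
    return {"from": offset, "size": size, "ranked": offset + size}

def slice_hits(ranked_hits, offset, size):
    """Pick the hits a from/size pair would return from a ranked list."""
    return ranked_hits[offset:offset + size]

print(page_window(2))     # second page of 10: the top 20 hits must be ranked
print(page_window(500))   # a deep page: the top 5000 hits must be ranked
print(slice_hits(list(range(1, 21)), 7, 5))   # positions 8 through 12
```

The ranked value grows with the page number even though size stays the same, which is why deep pages cost more than early ones.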

Note

Before version 1 of Elasticsearch, fields was the component to use for filtering the fields to return. This is still possible; the behavior is to return stored fields if available. If no stored field is available, the field is obtained from the source. If you don’t explicitly store fields in the index, it’s better to use the _source component. With _source filtering, Elasticsearch doesn’t have to check for a stored field before obtaining the field from the _source.

Results start and page size

The aptly named from and size fields specify the offset to start results from and the size of each “page” of results. For example, if you send a from value of 7 and a size of 5, Elasticsearch returns the 8th, 9th, 10th, 11th, and 12th results (because the from parameter starts at 0, specifying 7 starts at the 8th result). If these two parameters aren’t sent, Elasticsearch defaults to starting at the first result (the “0th”) and sends 10 results with the response.

There are two distinct ways of sending a search request to Elasticsearch. In the next section we discuss URL-based search requests; after that we discuss request body–based search requests. The basic components discussed so far are used in both mechanisms.

URL-based search request

In this section you’ll create a URL-based search request using the basic components discussed in the previous section. The URL-based search is meant for quick curl-based requests, and not all search features are exposed through it. In the following listing, the request searches for all events and asks for the second page of 10 items.

Listing 4.2. Paginating results using from and size
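A request of this kind can be sketched in Python as follows. This is an illustration rather than the book’s own curl command: the host localhost:9200 is an assumed local node, the helper name paged_search_url is hypothetical, and the snippet only builds the URL without sending it.

```python
from urllib.parse import urlencode

def paged_search_url(path, page, size=10):
    """Return a URL asking for the given 1-based page of results."""
    params = {"from": (page - 1) * size, "size": size}
    return f"http://localhost:9200/{path}/_search?{urlencode(params)}"

url = paged_search_url("get-together", page=2)
print(url)
```

The resulting string can be passed to curl inside single quotes so that the shell leaves the & character alone.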

In listing 4.3, the search request returns the default first 10 events, ordered by date in ascending order. If you want to, you can combine this with the pagination settings from the previous listing. Also try the same search request in descending (desc) order and check that the order of the events changes.

Listing 4.3. Changing the order of the results
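The sort parameter takes a field name and an order separated by a colon. The sketch below builds both the ascending and the descending variant so they can be compared; the host and the helper name sorted_search_url are assumptions, and no request is sent.

```python
from urllib.parse import urlencode

def sorted_search_url(path, field, order="asc"):
    """Return a URL that sorts results on one field."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    # safe=":" keeps the field:order separator readable in the URL
    query = urlencode({"sort": f"{field}:{order}"}, safe=":")
    return f"http://localhost:9200/{path}/_search?{query}"

print(sorted_search_url("get-together", "date", "asc"))
print(sorted_search_url("get-together", "date", "desc"))
```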

In listing 4.4 you limit the fields from the _source that you want in the response. Imagine you only want the title of the event together with its date. Again, you want the events ordered by date. You configure the _source component to ask for the title and date only. More options for the _source are explained in the next section when we discuss the request body–based search. The response in the listing shows one of the hits.

Listing 4.4. Limiting the fields from source that you want in the response
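In the URL form, the _source parameter takes a comma-separated list of field names. A minimal sketch, assuming a local node at localhost:9200 and a hypothetical helper named source_filtered_url, could look like this:

```python
from urllib.parse import urlencode

def source_filtered_url(path, fields, sort=None):
    """Return a URL that asks for only the given _source fields."""
    params = {"_source": ",".join(fields)}
    if sort:
        params["sort"] = sort
    # safe=":," keeps the list and sort separators readable in the URL
    return f"http://localhost:9200/{path}/_search?" + urlencode(params, safe=":,")

url = source_filtered_url("get-together", ["title", "date"], sort="date:asc")
print(url)
```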

So far you’ve only created search requests using the match_all query. The query and filter DSL is discussed in section 4.2, but we do think it’s important to show how you can create a URL-based search request where you want to return only documents containing the word “elasticsearch” in the title, as in the next listing. Again you sort by date. Notice the q=title:elasticsearch part. This is where you specify that you want to query on the field title for the word “elasticsearch.”

Listing 4.5. Changing the order of the results

With q= you indicate you want to provide a query in the search request. With title:elasticsearch you specify that you’re looking for the word “elasticsearch” in the title field.
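The following sketch builds such a URL and then parses it back, which makes the two halves of the q value visible. The host is an assumed local node and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {"q": "title:elasticsearch", "sort": "date:asc"}
url = "http://localhost:9200/get-together/_search?" + urlencode(params, safe=":")
print(url)

# Parsing the URL back shows the field and the term inside the q value.
q_value = parse_qs(urlsplit(url).query)["q"][0]
field, _, term = q_value.partition(":")
print(field, term)
```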

We leave it up to you to try out the query and check that the response contains only events with the word “elasticsearch” in the title. Feel free to play around with other words and fields. Again, you can combine the mentioned components of the search API in one query.

Now that you’re comfortable with search requests using the URL, you’re ready to move on to the request body–based search requests.

4.1.3. Request body–based search request

In the previous section we demonstrated how to use the basic search request components in URL-based queries. This is a nice way of interacting with Elasticsearch if you’re on the command line, for instance. When executing more advanced searches, using request body–based searches gives you more flexibility and more options. Even when using request body–based searches, some of the components can be provided in the URL as well. We focus in this section on the request body because we already discussed all URL-based configurations in the previous section. The example in the following listing searches for the second page of the get-together index when all documents are matched.

Listing 4.6. Paginating results using from and size
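A request body for this search can be sketched as a Python dictionary that is serialized to JSON. The helper name paged_body is hypothetical; the keys query, match_all, from, and size are the components named in the text.

```python
import json

def paged_body(page, size=10):
    """Build a request body that matches everything and returns one page."""
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * size,  # zero-based offset of the first hit
        "size": size,
    }

body = paged_body(2)
print(json.dumps(body, indent=2))
```

The printed JSON is what would go into the body of a GET or POST request to the _search endpoint of the get-together index.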

Other than noticing the "query" section, which is an object in every query, don’t worry about the "match_all" section yet. We talk about it in section 4.2 when discussing the query and filter DSL.

Fields returned with results

The next element that all search requests share is the list of fields Elasticsearch should return for each matching document. This is specified by sending the _source component with the search request. If no _source is specified with the request, Elasticsearch returns either the entire _source of the document by default, or, if the _source isn’t stored, only the metadata about the matching document: _id, _type, _index, and _score.

The previous query is used in the following listing, returning the name and date fields of each matching group.

Listing 4.7. Filtering the returned _source
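As a sketch, the body below asks for only the name and date fields. The small filter_source helper is a hypothetical local imitation of what the filtered _source of a hit would contain, and the sample document is invented for illustration.

```python
import json

body = {
    "query": {"match_all": {}},
    "_source": ["name", "date"],  # only these fields come back per hit
}
print(json.dumps(body))

def filter_source(document, fields):
    """Local imitation of _source filtering on exact field names."""
    return {key: value for key, value in document.items() if key in fields}

# Invented sample document, not taken from a real index.
sample = {"name": "Example group", "date": "2013-01-01", "description": "long text"}
print(filter_source(sample, body["_source"]))
```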

Wildcards in returned fields with _source

Not only can you return a list of fields, you can also specify wildcards. For example, if you wanted to return both a "name" and "nation" field, you could specify _source: "na*". You can also specify multiple wildcards using an array of wildcard strings, like _source: ["name.*", "address.*"].
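To get a feel for which field names a wildcard pattern selects, the sketch below uses Python’s fnmatch module as a stand-in for Elasticsearch’s own matching. That substitution is an assumption: the two are not guaranteed to agree in every edge case, and the field names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical field names of a document.
FIELDS = ["name", "nation", "address.street", "address.city", "date"]

def select(patterns, fields=FIELDS):
    """Return the fields matched by at least one wildcard pattern."""
    return [f for f in fields if any(fnmatchcase(f, p) for p in patterns)]

print(select(["na*"]))
print(select(["name.*", "address.*"]))
```

With these sample fields, the first pattern picks up both name and nation, while the second selects only the fields nested under address because no field is nested under name.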

Not only can you specify which fields to include, you can also specify which fields you don’t want to return. The next listing gives an example.

Listing 4.8. Filtering the returned _source showing include and exclude
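A body of this shape can be sketched as follows. The include and exclude keys follow the wording used here; later Elasticsearch versions spell them includes and excludes. The field names are hypothetical, and the kept_fields helper is only a local approximation (using fnmatch) of how include and exclude patterns combine.

```python
from fnmatch import fnmatchcase

body = {
    "query": {"match_all": {}},
    "_source": {
        "include": ["location.*", "date"],     # fields to return
        "exclude": ["location.geolocation"],   # fields to drop again
    },
}

def kept_fields(fields, source_filter):
    """Approximate the filter: keep included fields that are not excluded."""
    def matches(field, patterns):
        return any(fnmatchcase(field, p) for p in patterns)
    return [f for f in fields
            if matches(f, source_filter["include"])
            and not matches(f, source_filter["exclude"])]

# Hypothetical field names of a document.
fields = ["name", "date", "location.name", "location.geolocation"]
print(kept_fields(fields, body["_source"]))
```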

Sort order for results

The last element most searches include is the sort order for the results. If no sort order is specified, Elasticsearch returns matching documents sorted by the _score value in descending order, with the most relevant (highest-scoring) documents first. You can sort on any number of fields by specifying a list of fields or field maps in the sort value; to choose ascending or descending order for a field, use a map from the field name to the order instead of a bare field name. For example, using the previous organizer search, you can return results sorted first by creation date, starting with the oldest; then by the name of the get-together group, in reverse alphabetical order; and finally by the _score of the result, as shown in the following listing.
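The sort value described above can be sketched as a list that mixes field maps with a bare field name. The field names created_on and name are assumptions made for illustration, the sample documents are invented, and the local_sort helper only imitates the first two sort keys in plain Python.

```python
import json

body = {
    "query": {"match_all": {}},
    "sort": [
        {"created_on": "asc"},   # oldest first
        {"name": "desc"},        # then reverse alphabetical
        "_score",                # finally by relevance
    ],
}
print(json.dumps(body["sort"]))

def local_sort(docs):
    """Imitate created_on asc, then name desc, using two stable sorts."""
    by_name = sorted(docs, key=lambda d: d["name"], reverse=True)
    return sorted(by_name, key=lambda d: d["created_on"])

# Invented sample documents.
docs = [
    {"name": "A", "created_on": "2013-01-01"},
    {"name": "B", "created_on": "2013-01-01"},
    {"name": "C", "created_on": "2014-01-01"},
]
print([d["name"] for d in local_sort(docs)])
```

Later entries in the list only break ties left by earlier ones, which is why A and B, sharing a creation date, end up ordered by name.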

When sorting on multivalued fields (tags, for instance), you don’t know how the sorting uses the values. It will pick one to sort on, but you can’t know which one. The same is true for analyzed fields, because an analyzed field usually results in multiple terms as well. Therefore it’s best to sort on not-analyzed or numeric fields.

The basic components in action

Now that we’ve covered the basic search components, the next listing shows an example of a search request that uses them all.

Listing 4.10. Query with all four elements: scope, pagination, fields, and sort order
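Putting the pieces together, a request with scope, pagination, returned fields, and sort order can be sketched like this. The group type and the field names name, organizer, description, and created_on are hypothetical, the host is an assumed local node, and the snippet only prints the URL and body instead of sending them.

```python
import json

# Scope: one index and one type, expressed in the URL.
url = "http://localhost:9200/get-together/group/_search"

body = {
    "query": {"match_all": {}},
    "from": 0,                                        # pagination: offset
    "size": 10,                                       # pagination: page size
    "_source": ["name", "organizer", "description"],  # returned fields
    "sort": [{"created_on": "desc"}],                 # sort order
}

print(url)
print(json.dumps(body, indent=2))
```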

Before we go into more details on the query and filter API, we have to cover one other item: the structure of the search response.

4.1.4. Understanding the structure of a response

Let’s look at an example search and see what the response looks like. The next listing searches for groups about “elasticsearch.” For brevity we used the URL-based search.

Listing 4.11. Example search request and response
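The general shape of a search response can be explored with a short Python sketch. The sample_response dictionary is hand-written and every value in it is invented; only the key names (took, timed_out, _shards, hits, total, max_score, and the per-hit _index, _type, _id, _score, and _source) reflect the response format. The summarize helper is hypothetical.

```python
# Hand-written sample in the shape of a search response; all values invented.
sample_response = {
    "took": 2,                 # time spent on the search, in milliseconds
    "timed_out": False,
    "_shards": {"total": 2, "successful": 2, "failed": 0},
    "hits": {
        "total": 2,            # number of matching documents
        "max_score": 0.9,
        "hits": [
            {"_index": "get-together", "_type": "group", "_id": "3",
             "_score": 0.9, "_source": {"name": "Example group A"}},
            {"_index": "get-together", "_type": "group", "_id": "2",
             "_score": 0.7, "_source": {"name": "Example group B"}},
        ],
    },
}

def summarize(response):
    """Pull the most commonly used values out of a search response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # later versions wrap the count in an object
        total = total["value"]
    rows = [(h["_id"], h["_score"], h.get("_source", {}).get("name"))
            for h in hits["hits"]]
    return {"took_ms": response["took"], "timed_out": response["timed_out"],
            "total": total, "rows": rows}

print(summarize(sample_response))
```

Using get for _source keeps the helper working when a hit carries only metadata.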

Remember that if you don’t store either the _source of the document or the fields, you won’t be able to retrieve the value from Elasticsearch!

Now that you’re familiar with the basic components of a search request, there’s one component that we haven’t really discussed yet: the query and filter DSL. This was done on purpose, because the topic is so big it deserves its own section.