Searchable data

In this chapter, you’ll again use the dataset built around the get-together website from previous examples. This dataset contains two types of documents: groups and events. To follow along and run the queries yourself, download and run the populate.sh script to populate an Elasticsearch index. The sample output in this chapter was produced right after a fresh run of the script, so run it again if you want your results to match.

To download the script, see the source code for the book at https://github.com/dakrone/elasticsearch-inaction.

To start off, we discuss the components common to all search requests and results so you’ll understand what a search request and its result look like in general. We then move on to the query and filter DSL, one of the main elements of the search API. Next, we discuss the differences between queries and filters, followed by a look at some of the most commonly used filters and queries. If you’re wondering how Elasticsearch calculates the score for documents, don’t worry; we discuss that in chapter 6, where we talk about searching with relevancy. Finally, we provide a quick-and-dirty guide to help you choose which query and filter combination to use for a particular application. Make sure to check it out if there seem to be too many types of queries and filters to keep straight!

Before we start, let’s revisit what happens when you perform a search in Elasticsearch (see figure 4.1). The REST API search request is first sent to the node you choose to connect to, which in turn sends the search request to one copy of every shard (either the primary or a replica) of the index or indices being queried. Once enough information has been collected from all shards to sort and rank the results, only the shards holding the documents that will be returned are asked for the relevant content.

Figure 4.1. How a search request is routed; the index consists of two shards and one replica per shard. After locating and scoring the documents, only the top 10 documents are fetched.
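The two phases shown in the figure can be sketched in a few lines of Python. This is a toy in-memory model, not Elasticsearch code: the SHARDS data, the scores, the group names, and the functions query_phase, fetch_phase, and search are all made up for illustration. The sketch assumes each shard first reports only document IDs and scores, and that full documents are then requested only from the shards that hold the winning hits.

```python
import heapq

# Hypothetical shards: shard name -> {doc_id: (score, source)}.
# In a real cluster the scores come from running the query on each shard.
SHARDS = {
    "shard0": {
        "1": (0.9, {"name": "Denver Clojure"}),
        "2": (0.2, {"name": "Big Data Boulder"}),
    },
    "shard1": {
        "3": (0.7, {"name": "Elasticsearch Denver"}),
        "4": (0.4, {"name": "Denver Ruby"}),
    },
}


def query_phase(shard, size):
    """Return the shard's local top `size` hits as (score, doc_id) pairs."""
    hits = [(score, doc_id) for doc_id, (score, _source) in shard.items()]
    return heapq.nlargest(size, hits)


def fetch_phase(shard, doc_ids):
    """Return the full source for the requested documents only."""
    return {doc_id: shard[doc_id][1] for doc_id in doc_ids}


def search(shards, size=10):
    # Query phase: ask every shard for its best candidates (IDs and scores).
    candidates = []
    for shard_name, shard in shards.items():
        for score, doc_id in query_phase(shard, size):
            candidates.append((score, shard_name, doc_id))

    # The coordinating node merges the candidates into a global top `size`.
    top = heapq.nlargest(size, candidates)

    # Fetch phase: contact only the shards that hold a winning document.
    wanted = {}
    for _score, shard_name, doc_id in top:
        wanted.setdefault(shard_name, []).append(doc_id)
    sources = {}
    for shard_name, doc_ids in wanted.items():
        sources.update(fetch_phase(shards[shard_name], doc_ids))

    return [
        {"_id": doc_id, "_score": score, "_shard": shard_name,
         "_source": sources[doc_id]}
        for score, shard_name, doc_id in top
    ]


if __name__ == "__main__":
    for hit in search(SHARDS, size=2):
        print(hit["_id"], hit["_score"], hit["_source"]["name"])
```

With size=2, both shards take part in the query phase and both are asked for content, because each holds one of the two best hits. With size=1, only shard0 would be contacted in the fetch phase.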

This search routing behavior is configurable; the default behavior is shown in figure 4.1 and is called “query_then_fetch.” We’ll look at how to change it later on in chapter 10. For now, let’s look at the basic structure that all Elasticsearch search requests share.
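As a rough illustration of where this setting lives, the sketch below builds (but doesn't send) a search request that names the search type explicitly through the search_type URL parameter. The helper build_search_request is hypothetical, and the index name get-together and the host localhost:9200 are assumptions about your local setup; adjust them to match your own index.

```python
import json
from urllib.parse import urlencode


def build_search_request(index, query, search_type="query_then_fetch",
                         host="http://localhost:9200"):
    """Build the URL and JSON body for a search request without sending it.

    `index` and `host` are assumptions about the local setup.
    """
    params = urlencode({"search_type": search_type})
    url = f"{host}/{index}/_search?{params}"
    body = json.dumps({"query": query})
    return url, body


if __name__ == "__main__":
    url, body = build_search_request("get-together", {"match_all": {}})
    print(url)
    print(body)
```

Because query_then_fetch is the default, leaving the parameter off should result in the same routing behavior; spelling it out only makes the choice visible in the request.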