8.3. Nested type: connecting nested documents

The nested type is defined in the mapping in much the same way as the object type, which we’ve already discussed. Internally, nested documents are indexed as different Lucene documents. To indicate that you want to use the nested type instead of the object type, you have to set type to nested, as you’ll see in section 8.3.1.

From an application’s perspective, indexing nested documents is the same as indexing objects because the JSON document indexed as an Elasticsearch document looks the same. For example:

{
  "name": "Elasticsearch News",
  "members": [
    {
      "first_name": "Lee",
      "last_name": "Hinman"
    },
    {
      "first_name": "Radu",
      "last_name": "Gheorghe"
    }
  ]
}

At the Lucene level, Elasticsearch will index the root document and all the members objects in separate documents. But it will put them in a single block, as shown in figure 8.8.

Figure 8.8. A block of documents in Lucene storing the Elasticsearch document with nested-type objects

Documents of a block will always stay together, ensuring they get fetched and queried with the minimum number of operations.
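As a rough illustration of that layout, the following Python sketch expands one Elasticsearch document into a list standing in for a Lucene block. The function name to_lucene_block is made up for this example, and placing the root document last is an assumption about the block layout rather than something stated here.

```python
def to_lucene_block(doc, nested_field):
    """Expand one Elasticsearch document into a list of Lucene-style documents.

    Each nested object becomes its own small document; the root document
    (without the nested field) is placed last. The ordering is an assumption
    of this sketch.
    """
    children = [
        {f"{nested_field}.{key}": value for key, value in child.items()}
        for child in doc.get(nested_field, [])
    ]
    root = {key: value for key, value in doc.items() if key != nested_field}
    return children + [root]


group = {
    "name": "Elasticsearch News",
    "members": [
        {"first_name": "Lee", "last_name": "Hinman"},
        {"first_name": "Radu", "last_name": "Gheorghe"},
    ],
}

block = to_lucene_block(group, "members")
for lucene_doc in block:
    print(lucene_doc)
```

The point of the sketch is only that one JSON document turns into three adjacent documents: two for the members and one for the group itself.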

Now that you know how nested documents work, let’s see how to make Elasticsearch use them. You have to specify that you want them nested at index time and at search time:

Inner objects must have a nested mapping, to get them indexed as separate documents in the same block.

Nested queries and filters must be used to make use of those blocks while searching.

We’ll discuss how you can do each in the next two sections.

8.3.1. Mapping and indexing nested documents

The nested mapping looks similar to the object mapping, except instead of the type being object, you have to make it nested. In the following listing you’ll define a mapping with a nested type field and index a document that contains an array of nested objects.

Listing 8.3. Mapping and indexing nested documents

JSON objects with the nested mapping, like the ones you indexed in this listing, allow you to search them with nested queries and filters. We’ll explore those searches in a bit, but the thing to remember now is that nested queries and filters allow you to search within the boundaries of such documents. For example, you’ll be able to search for groups with members with the first name “Lee” and the last name “Hinman.” Nested queries won’t do cross-object matches, thus avoiding unexpected matches such as “Lee” with the last name “Gheorghe.”
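A mapping and document of this kind might look roughly like the sketch below, written as Python dictionaries that serialize to the JSON bodies you would send. The type name group-nested is taken from the sample reply later in this section; the rest of the layout is an assumption for illustration and may differ from the actual listing.

```python
import json

# Hypothetical mapping body: "members" is declared as nested instead of object.
mapping = {
    "group-nested": {
        "properties": {
            "name": {"type": "string"},
            "members": {
                "type": "nested",
                "properties": {
                    "first_name": {"type": "string"},
                    "last_name": {"type": "string"},
                },
            },
        }
    }
}

# The document itself looks exactly like one you would index with the object type.
document = {
    "name": "Elasticsearch News",
    "members": [
        {"first_name": "Lee", "last_name": "Hinman"},
        {"first_name": "Radu", "last_name": "Gheorghe"},
    ],
}

mapping_json = json.dumps(mapping, indent=2)
document_json = json.dumps(document, indent=2)
print(mapping_json)
print(document_json)
```

Only the mapping mentions the nested type; the document body carries no hint of it.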

Enabling cross-object matches

In some situations, you might need cross-object matches as well. For example, if you’re searching for a group that has both Lee and Radu, a query like this would work for the regular JSON objects we discussed in the section on object type:

"query": {
  "bool": {
    "must": [
      {
        "term": {
          "members.first_name": "lee"
        }
      },
      {
        "term": {
          "members.first_name": "radu"
        }
      }
    ]
  }
}

This query would work because when you have everything in the same document, both criteria will match.

With nested documents, a query structured this way won’t work because members objects would be stored in separate Lucene documents. And there’s no members object that will match both criteria: there’s one for Lee and one for Radu, but there’s no document containing both.
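The difference between the two behaviors can be simulated in a few lines of Python. This is only a toy model of the matching semantics, not how Elasticsearch evaluates queries; the helper names object_match and nested_match are invented for the sketch, and terms are compared in lowercase to mimic analyzed strings.

```python
def flatten(doc, field):
    """Object-type view: every inner value lands in one multi-valued field."""
    flat = {}
    for inner in doc[field]:
        for key, value in inner.items():
            flat.setdefault(f"{field}.{key}", set()).add(value.lower())
    return flat


def object_match(doc, field, criteria):
    """All criteria are checked against the single flattened document."""
    flat = flatten(doc, field)
    return all(term in flat.get(name, set()) for name, term in criteria)


def nested_match(doc, field, criteria):
    """All criteria must hold inside the same inner object."""
    return any(
        all(inner.get(name.split(".", 1)[1], "").lower() == term
            for name, term in criteria)
        for inner in doc[field]
    )


group = {
    "name": "Elasticsearch News",
    "members": [
        {"first_name": "Lee", "last_name": "Hinman"},
        {"first_name": "Radu", "last_name": "Gheorghe"},
    ],
}

lee_and_radu = [("members.first_name", "lee"), ("members.first_name", "radu")]
lee_gheorghe = [("members.first_name", "lee"), ("members.last_name", "gheorghe")]

print(object_match(group, "members", lee_and_radu))   # cross-object match
print(nested_match(group, "members", lee_and_radu))   # no single member is both
print(object_match(group, "members", lee_gheorghe))   # the unwanted match
print(nested_match(group, "members", lee_gheorghe))   # boundaries respected
```

The same property that blocks the unwanted "Lee Gheorghe" match also blocks the legitimate "Lee and Radu" search, which is why you may want both views of the data.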

In such situations, you might want to have both: objects for when you want cross-object matches and nested documents for when you want to avoid them. Elasticsearch lets you do that through a couple of mapping options: include_in_root and include_in_parent.

include_in_root

Adding include_in_root to your nested mapping will index the inner members objects twice: one time as a nested document and one time as an object within the root document, as shown in figure 8.9.

Figure 8.9. With include_in_root, fields of nested documents are indexed in the root document, too.

The following mapping will let you use nested queries for the nested documents and regular queries for when you need cross-object matches:

"members": {
  "type": "nested",
  "include_in_root": true,
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" }
  }
}
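To make the "indexed twice" idea concrete, here is a small Python model of what ends up where. The function index_nested and its include_in_root parameter are stand-ins invented for this sketch; they only mimic the effect of the mapping option.

```python
def index_nested(doc, field, include_in_root=False):
    """Return (nested_docs, root_doc) for a document with one nested field."""
    nested_docs = [
        {f"{field}.{key}": value for key, value in inner.items()}
        for inner in doc[field]
    ]
    root = {key: value for key, value in doc.items() if key != field}
    if include_in_root:
        # Second copy: inner fields are also flattened into the root document.
        for inner in doc[field]:
            for key, value in inner.items():
                root.setdefault(f"{field}.{key}", []).append(value)
    return nested_docs, root


group = {
    "name": "Elasticsearch News",
    "members": [
        {"first_name": "Lee", "last_name": "Hinman"},
        {"first_name": "Radu", "last_name": "Gheorghe"},
    ],
}

nested_only, plain_root = index_nested(group, "members")
nested_docs, root_with_copy = index_nested(group, "members", include_in_root=True)
print(plain_root)
print(root_with_copy)
```

With the option enabled, the root document carries both first names, so a regular query for Lee and Radu can match it, while the nested documents remain available for boundary-aware queries.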

include_in_parent

Elasticsearch allows you to have multiple levels of nested documents. For example, if your group can have members as its nested children, members can have children of their own, such as the comments they posted on that group. Figure 8.10 illustrates this hierarchy.

Figure 8.10. include_in_parent indexes a nested document’s field into the immediate parent, too.

With the include_in_root option you just saw, you can add the fields at any level to the root document (in this case, the grandparent). There’s also an include_in_parent option, which allows you to index the fields of one nested document into the immediate parent document. For example, the following listing will include the comments in the members documents.

Listing 8.4. Using include_in_parent when there are multiple nested levels
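A two-level mapping along these lines could be sketched as follows, again as a Python dictionary that serializes to the JSON mapping body. The comments field names date and comment are hypothetical, and the exact layout is an assumption for illustration.

```python
import json

# Hypothetical two-level nested mapping: members, each with nested comments.
# The "date" and "comment" field names are made up for this sketch.
members_mapping = {
    "members": {
        "type": "nested",
        "properties": {
            "first_name": {"type": "string"},
            "last_name": {"type": "string"},
            "comments": {
                "type": "nested",
                # Also index comment fields into the enclosing members document.
                "include_in_parent": True,
                "properties": {
                    "date": {"type": "date"},
                    "comment": {"type": "string"},
                },
            },
        },
    }
}

print(json.dumps(members_mapping, indent=2))
```

Here include_in_parent sits on the inner comments level, so the copy goes into each members document rather than into the group at the root.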

By now you’re probably wondering how you’d query these nested structures. This is exactly what we’ll look at next.

8.3.2. Searches and aggregations on nested documents

As with mappings, when you run searches and aggregations on nested documents you’ll need to specify that the objects you’re looking at are nested. There are nested queries, filters, and aggregations that help you achieve this. Running these special queries and aggregations will trigger Elasticsearch to join the different Lucene documents within the same block and treat the resulting data as the same Elasticsearch document.

The way to search within nested documents is to use the nested query or nested filter. As you might expect after chapter 4, these are equivalent, with the traditional differences between queries and filters:

Queries calculate score; thus they’re able to return results sorted by relevance.

Filters don’t calculate score, making them faster and easier to cache.

Tip

In particular, the nested filter isn’t cached by default. You can change this by setting _cache to true, as you can do in all filters.

If you want to run aggregations on nested fields (for example, to get the most frequent group members), you’ll have to wrap them in a nested aggregation. If sub-aggregations have to refer to the parent Lucene document, such as when showing top group tags for each member, you can go up the hierarchy with the reverse_nested aggregation.

Nested query and filter

When you run a nested query or filter, you need to specify the path argument to tell Elasticsearch where in the Lucene block those nested objects are located. In addition to that, your nested query or filter will wrap a regular query or filter, respectively. In the next listing, you’ll search for members with the first name “Lee” and the last name “Gheorghe,” and you’ll see that the document indexed in listing 8.3 won’t match because you have only Lee Hinman and Radu Gheorghe and no member called Lee Gheorghe.

Listing 8.5. Nested query example
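As a sketch of what such a request body can look like, the Python helper below builds a nested query that wraps a bool query with one term clause per field. The helper name nested_query is invented here; the path argument and the nested/bool/term structure follow the description above.

```python
import json


def nested_query(path, terms):
    """Wrap term clauses in a bool query inside a nested query on `path`."""
    must = [{"term": {f"{path}.{field}": value}} for field, value in terms.items()]
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"bool": {"must": must}},
            }
        }
    }


# Both criteria have to match within the same members object.
body = nested_query("members", {"first_name": "lee", "last_name": "gheorghe"})
print(json.dumps(body, indent=2))
```

You would send the printed JSON to the _search endpoint of your index. Against a group that only has Lee Hinman and Radu Gheorghe as members, this query is expected to return no hits.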

A nested filter would look exactly the same as the nested query you just saw. You’ll have to replace the word query with filter.

Searching in multiple levels of nesting

Elasticsearch also allows you to have multiple levels of nesting. For example, back in listing 8.4, you added a mapping that nests on two levels: members and their comments. To search in the comments nested documents, you’d have to specify members.comments as the path, as shown in the following listing.

Listing 8.6. Indexing and searching multiple levels of nested documents
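A minimal sketch of such a document and query follows. The comment texts and the comment field name are hypothetical; the only detail taken from the text is that the path is the full members.comments.

```python
import json

# Hypothetical group document with comments nested under each member.
group = {
    "name": "Elasticsearch News",
    "members": [
        {
            "first_name": "Lee",
            "last_name": "Hinman",
            "comments": [{"date": "2013-12-22", "comment": "hello world"}],
        },
        {
            "first_name": "Radu",
            "last_name": "Gheorghe",
            "comments": [{"date": "2013-11-16", "comment": "hi everybody"}],
        },
    ],
}

# Nested query that targets the second nesting level by its full path.
comments_query = {
    "query": {
        "nested": {
            "path": "members.comments",
            "query": {"term": {"members.comments.comment": "hello"}},
        }
    }
}

print(json.dumps(comments_query, indent=2))
```

Field names inside the wrapped query are written with the full path as well, just as members.first_name was at the first level.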

Aggregating scores of nested objects

The nested query calculates the score, but we didn’t mention how. Let’s say you have three members in a group: Lee Hinman, Radu Gheorghe, and another guy called Lee Smith. If you have a nested query for “Lee,” it will match two members. Each inner member document will get its own score, depending on how well it matches the criteria. But the query coming from the application is for group documents, so Elasticsearch will need to give back a score for the whole group document. At this point, there are four options, which can be specified with the score_mode option:

avg: This is the default option, which will take the scores of the matching inner documents and return their average score.

total: This will sum up the matching inner documents’ scores and return it, which is useful when the number of matches counts.

max: The maximum inner document score is returned.

none: No score is kept or counted toward the total document score.
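The four modes boil down to simple arithmetic over the scores of the matching inner documents. The Python function below is an illustrative model only; combine_scores is a made-up name, the sample scores are invented, and representing none as a contribution of 0.0 is an assumption of this sketch.

```python
def combine_scores(inner_scores, score_mode="avg"):
    """Combine scores of matching nested documents into one root-level score."""
    if not inner_scores:
        raise ValueError("at least one matching inner document is required")
    if score_mode == "avg":
        return sum(inner_scores) / len(inner_scores)
    if score_mode == "total":
        return sum(inner_scores)
    if score_mode == "max":
        return max(inner_scores)
    if score_mode == "none":
        return 0.0  # assumption: the nested match adds nothing to the score
    raise ValueError(f"unknown score_mode: {score_mode}")


# Two members called Lee matched, with made-up scores.
lee_scores = [1.0, 3.0]
for mode in ("avg", "total", "max", "none"):
    print(mode, combine_scores(lee_scores, mode))
```

With total, a group with many matching members outranks a group with a single strong match, which is the behavior you want when the number of matches matters.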

If you’re thinking that there are too many options for including the nested type in the root or parent and the score options, see table 8.1 for a quick reference on all those options and when they’re useful.

Table 8.1. Nested type options

Option | Description | Example
include_in_parent: true | Indexes the nested document in the parent document, too. | "first_name:Lee AND last_name:Hinman", for which you need the nested type, as well as "first_name:Lee AND first_name:Radu", for which you need the object type.
include_in_root: true | Indexes the nested document in the root document. | Same scenario as previously, but you have multiple layers; for example, event>members>comments.
score_mode: avg | The average score of matching nested documents counts. | Search for groups hosting events about Elasticsearch.
score_mode: total | Sums up nested document scores. | Search for groups hosting most events that have to do with Elasticsearch.
score_mode: max | Maximum nested document score. | Search for groups hosting top events about Elasticsearch.
score_mode: none | No score counts toward the total score. | Filter groups hosting events about

Getting which inner document matched

When you index big documents with many nested subdocuments in them, you might wonder which of the nested documents matched a specific nested query (in this case, which of the group members matched a query looking for lee in first_name). Starting with Elasticsearch 1.5, you can add an inner_hits object within your nested query or filter to show the matching nested documents. Like your main search request, it supports options such as from and size:

"query": {
  "nested": {
    "path": "members",
    "query": {
      "term": {
        "members.first_name": "lee"
      }
    },
    "inner_hits": {
      "from": 0,
      "size": 1
    }
  }
}

The reply will contain an inner_hits object for each matching document, looking much like a regular query reply, except that each document is a nested subdocument:

"_source":{
  "name": "Elasticsearch News",
[...]
"inner_hits" : {
  "members" : {
    "hits" : {
      "total" : 1,
      "max_score" : 1.4054651,
      "hits" : [ {
        "_index" : "get-together",
        "_type" : "group-nested",
        "_id" : "1",
        "_nested" : {
          "field" : "members",
          "offset" : 0
        },
        "_score" : 1.4054651,
        "_source":{"first_name":"Lee","last_name":"Hinman"}
      } ]
    }
  }

In order to identify the subdocument, you can look at the _nested object. field is the path of the nested object, and offset shows the location of that nested document in the array. In this case, Lee is the first member.
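On the application side, pulling the matching members out of a parsed reply could look like the following Python sketch. The function name matched_members is invented, and the hit dictionary is a trimmed, hand-built stand-in for one entry of the reply shown above.

```python
def matched_members(hit, field="members"):
    """Return (offset, source) pairs for nested documents listed in inner_hits."""
    inner = hit.get("inner_hits", {}).get(field, {}).get("hits", {}).get("hits", [])
    return [
        (entry["_nested"]["offset"], entry["_source"])
        for entry in inner
        if entry["_nested"]["field"] == field
    ]


# Trimmed stand-in for one hit of the search reply.
hit = {
    "_source": {"name": "Elasticsearch News"},
    "inner_hits": {
        "members": {
            "hits": {
                "total": 1,
                "max_score": 1.4054651,
                "hits": [
                    {
                        "_nested": {"field": "members", "offset": 0},
                        "_score": 1.4054651,
                        "_source": {"first_name": "Lee", "last_name": "Hinman"},
                    }
                ],
            }
        }
    },
}

for offset, member in matched_members(hit):
    print(offset, member["first_name"], member["last_name"])
```

The offset lets you map each inner hit back to its position in the members array of the root document.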

Nested sorting

In most use cases you’d sort root documents by score, but you can also sort them based on numeric values of inner nested documents. This would be done in a similar way to sorting on other fields, as you saw in chapter 6. For example, if you have a price aggregator site with products as root documents and offers from various shops as nested documents, you can sort products by their lowest offer price. Similar to the score_mode option you’ve seen before, you can specify a mode option and take the min, max, sum, or avg value of nested documents as the sort value for the root document:

"sort": [
  {
    "offers.price": {
      "mode": "min",
      "order": "asc"
    }
  }
]

Elasticsearch will be smart about it and figure out that offers.price is located in the offers object (if that’s what you defined in the mapping) and access the price field under those nested documents for sorting.
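The effect of the mode option can be modeled in plain Python. This is a toy simulation with invented product data and helper names, not a description of how Elasticsearch sorts internally.

```python
def nested_sort_value(doc, path, field, mode="min"):
    """Reduce the values of `field` across nested documents to one sort value."""
    values = [inner[field] for inner in doc[path]]
    reducers = {
        "min": min,
        "max": max,
        "sum": sum,
        "avg": lambda v: sum(v) / len(v),
    }
    return reducers[mode](values)


def sort_by_nested(docs, path, field, mode="min", order="asc"):
    """Sort root documents by the reduced value of a nested field."""
    return sorted(
        docs,
        key=lambda doc: nested_sort_value(doc, path, field, mode),
        reverse=(order == "desc"),
    )


# Made-up products, each with offers from different shops.
products = [
    {"name": "laptop", "offers": [{"shop": "a", "price": 900}, {"shop": "b", "price": 850}]},
    {"name": "phone", "offers": [{"shop": "a", "price": 500}, {"shop": "c", "price": 950}]},
]

by_min = [p["name"] for p in sort_by_nested(products, "offers", "price", mode="min")]
by_max = [p["name"] for p in sort_by_nested(products, "offers", "price", mode="max")]
print(by_min)
print(by_max)
```

The same two products come out in opposite orders depending on whether the cheapest or the most expensive offer is used as the sort value.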

Nested and reverse nested aggregations

In order to do aggregations on nested type objects, you have to use the nested aggregation. This is a single-bucket aggregation, where you indicate the path to the nested object containing your field. As shown in figure 8.11, the nested aggregation triggers Elasticsearch to do the necessary joins in order for other aggregations to work properly on the indicated path.

Figure 8.11. Nested aggregation doing necessary joins for other aggregations to work on the indicated path

For example, you’d normally run a terms aggregation on a member name field in order to get the top users by the number of groups they’re part of. If that name field is stored within the members nested type object, you’ll wrap that terms aggregation in a nested aggregation that has the path set to members:

% curl "localhost:9200/get-together/group/_search?pretty" -d '{
  "aggregations" : {
    "members" : {
      "nested" : {
        "path" : "members"
      },
      "aggregations" : {
        "frequent_members" : {
          "terms" : {
            "field" : "members.name"
          }
        }
      }
    }
  }
}'

You can put more aggregations under the members nested aggregation and Elasticsearch will know to look in the members nested documents for all of them.
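Conceptually, the nested aggregation makes the terms aggregation count nested member documents instead of root documents. The Python sketch below imitates that with invented sample groups; it's an analogy for the result, not for the mechanism.

```python
from collections import Counter


def frequent_members(groups, size=10):
    """Count how often each member name appears across all nested members."""
    counts = Counter(
        member["name"] for group in groups for member in group["members"]
    )
    return counts.most_common(size)


# Made-up groups, each with nested members that have a name field.
groups = [
    {"tags": ["elasticsearch", "search"], "members": [{"name": "lee"}, {"name": "radu"}]},
    {"tags": ["elasticsearch", "big data"], "members": [{"name": "lee"}]},
]

top_members = frequent_members(groups)
print(top_members)
```

Each (name, count) pair corresponds to one bucket of the terms aggregation, with the count taken over member documents.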

There are use cases where you’d need to navigate back to the parent or root document. For example, you want each of the obtained frequent members to show the top group tags. To do that, you’ll use the reverse_nested aggregation, which will tell Elasticsearch to go up the nested hierarchy:

"frequent_members" : {
  "terms" : {
    "field" : "members.name"
  },
  "aggregations": {
    "back_to_group": {
      "reverse_nested": {},
      "aggregations": {
        "tags_per_member": {
          "terms": {
            "field": "tags"
          }
        }
      }
    }
  }
}
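In the same spirit, the result of going back up with reverse_nested can be imitated in Python: for every member name, collect the tags of the groups that member belongs to. The data and the function name tags_per_member are invented for this sketch.

```python
from collections import Counter, defaultdict


def tags_per_member(groups):
    """For each member name, count the tags of the groups they belong to."""
    tags = defaultdict(Counter)
    for group in groups:
        # Step down into the nested members, then back up to the group's tags.
        for name in {member["name"] for member in group["members"]}:
            tags[name].update(group["tags"])
    return tags


# Made-up groups with root-level tags and nested members.
groups = [
    {"tags": ["elasticsearch", "search"], "members": [{"name": "lee"}, {"name": "radu"}]},
    {"tags": ["elasticsearch", "big data"], "members": [{"name": "lee"}]},
]

result = tags_per_member(groups)
for name, counts in sorted(result.items()):
    print(name, counts.most_common())
```

The tags field lives on the group, not on the member, which is why the aggregation has to leave the nested context before it can count tags.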

The nested and reverse_nested aggregations can effectively be used to tell Elasticsearch in which Lucene document to look for the fields of the next aggregation. This gives you the flexibility to use all the aggregation types you saw in chapter 7 for nested documents, just as you could use them for objects. The only downside of this flexibility is the performance penalty.

Performance considerations

We’ll cover performance in more detail in chapter 10, but in general you can expect nested queries and aggregations to be slower than their object counterparts. That’s because Elasticsearch needs to do some extra work to join multiple documents within a block. But because of the underlying implementation using blocks, these queries and aggregations are much faster than they would be if you had to join completely separate Elasticsearch documents.

This block implementation also has its drawbacks. Because nested documents are stuck together, updating or adding one inner document requires re-indexing the whole ensemble. Applications also have to work with the nested documents as part of a single JSON document.

If your nested documents become big, as they would in a get-together site if you had one document per group and all its events nested, a better option might be to use separate Elasticsearch documents and define parent-child relations between them.

Using nested type to define document relationships: pros and cons

Before moving on, here’s a quick recap of why you should (or shouldn’t) use nested documents. The plus points:

Nested types are aware of object boundaries: no more matches for “Radu Hinman”!

You can index the whole document at once, as you would with objects, after you define your nested mapping.

Nested queries and aggregations join the parent and child parts, and you can run any query across the union. No other option described in this chapter provides this feature.

Query-time joins are fast because all Lucene documents making up the Elasticsearch document are together in the same block in the same segment.

You can include child documents in parents to get all the functionality from objects if you need it. This functionality is transparent for your application.

The downsides:

Queries will be slower than their object equivalents. If objects provide you all the needed functionality, they’re the better option because they’re faster.

Updating a child will re-index the whole document.