8.2. Having objects as field values

As you saw back in chapter 2, documents in Elasticsearch can be hierarchical. For example, in the code samples, an event of the get-together site has its location as an object with two fields—name and geolocation:

{
  "title": "Using Hadoop with Elasticsearch",
  "location": {
    "name": "SkillsMatter Exchange",
    "geolocation": "51.524806,-0.099095"
  }
}

If you’re familiar with Lucene, you may ask yourself, “How can Elasticsearch documents be hierarchical when Lucene supports only flat structures?” With objects, Elasticsearch flattens hierarchies internally by putting each inner field with its full path as a separate field in Lucene. You can see the process in figure 8.6.

Figure 8.6. JSON hierarchical structure stored as a flat structure in Lucene
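To make the flattening concrete, here is a minimal Python sketch of the idea. It is only a model of the behavior described above, not Elasticsearch's or Lucene's actual code; the flatten function and its names are made up for illustration.

```python
def flatten(doc, prefix=""):
    """Return a flat dict that maps dotted field paths to leaf values.

    Illustrative model only: inner objects disappear and each inner
    field is stored under its full path, such as "location.name".
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the inner object, carrying the path so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


event = {
    "title": "Using Hadoop with Elasticsearch",
    "location": {
        "name": "SkillsMatter Exchange",
        "geolocation": "51.524806,-0.099095",
    },
}

flat_event = flatten(event)
for path, value in flat_event.items():
    print(path, "=", value)
```

The result has three top-level entries (title, location.name, and location.geolocation), which is the flat shape the figure depicts.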

Typically, when you want to search in an event’s location name, you’ll refer to it as location.name. We’ll look at that in section 8.2.2, but before we go into searching, let’s define a mapping and see how to index some documents.

8.2.1. Mapping and indexing objects

By default, inner object mappings are automatically detected. In listing 8.1 you’ll index a hierarchical document and see how the detected mapping looks. If those event documents look familiar to you, it’s because the code samples store the location of an event in an object, too. You can go to https://github.com/dakrone/elasticsearch-in-action to get the code samples now if you haven’t done so already.

Listing 8.1. Inner JSON objects mapped as the object type

You can see that the inner object has a list of properties just like the root JSON object has. You configure field types from inner objects in the same way you do for fields in the root object. For example, you can upgrade location.address to have multiple fields, as you saw in chapter 3. This will allow you to index the address in different ways, such as having a not_analyzed version for exact matches in addition to the default analyzed version.
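As a rough sketch of what such an upgraded mapping could look like, the following Python snippet builds a mapping body in which location.address gets a not_analyzed sub-field. It assumes the string type and multi-field syntax of the Elasticsearch 1.x line that not_analyzed implies; the sub-field name verbatim is a hypothetical choice, and the exact mapping in listing 8.1 may differ.

```python
import json

# Mapping body for the "event" type. The sub-field name "verbatim" is a
# hypothetical choice for illustration, not taken from the book's listing.
event_mapping = {
    "event": {
        "properties": {
            "location": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "address": {
                        # Default version: analyzed, for full-text matches.
                        "type": "string",
                        "fields": {
                            # Extra version: indexed as-is, for exact matches.
                            "verbatim": {
                                "type": "string",
                                "index": "not_analyzed",
                            }
                        },
                    },
                },
            }
        }
    }
}

mapping_json = json.dumps(event_mapping, indent=2)
print(mapping_json)
```

With a mapping like this, the exact-match version would be addressed by extending the path once more, as location.address.verbatim.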

Tip

If you need to look at core types or how to use multi-fields, you can revisit chapter 3. For more details on analysis, go back to chapter 5.

The mapping for a single inner object will also work if you have multiple such objects in an array. For example, if you index the following document, the mapping from listing 8.1 will stay the same:

{
  "title": "Introduction to objects",
  "location": [
    {
      "name": "Elasticsearch in Action book",
      "address": "chapter 8"
    },
    {
      "name": "Elasticsearch Guide",
      "address": "elasticsearch/reference/current/mapping-object-type.html"
    }
  ]
}
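One way to see why the mapping stays the same is to model mapping detection with a toy function. The sketch below is an assumption-laden simplification, not Elasticsearch's dynamic mapping logic: it handles only strings and objects, assumes a field is consistently one or the other, and the name infer_properties is invented here.

```python
def infer_properties(obj):
    """Toy model of mapping detection for strings and inner objects.

    Arrays are treated as "one or more values of the same field", so an
    array of objects yields the same properties as a single object.
    """
    props = {}
    for key, value in obj.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                inner = props.setdefault(
                    key, {"type": "object", "properties": {}}
                )
                inner["properties"].update(infer_properties(item))
            else:
                props.setdefault(key, {"type": "string"})
    return props


single_location = {
    "title": "Introduction to objects",
    "location": {
        "name": "Elasticsearch in Action book",
        "address": "chapter 8",
    },
}

many_locations = {
    "title": "Introduction to objects",
    "location": [
        {"name": "Elasticsearch in Action book", "address": "chapter 8"},
        {
            "name": "Elasticsearch Guide",
            "address": "elasticsearch/reference/current/mapping-object-type.html",
        },
    ],
}

same_mapping = infer_properties(single_location) == infer_properties(many_locations)
print(same_mapping)
```

In this model the array never shows up in the mapping at all; only the fields of the objects inside it do.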

To summarize, working with objects and arrays of objects in the mapping is very much like working with the fields and arrays you saw in chapter 3. Next we’ll look at searches, which also work like the ones you saw in chapters 4 and 6.

8.2.2. Searching in objects

By default, Elasticsearch recognizes and indexes hierarchical JSON documents with inner objects without requiring you to define anything up front. As you can see in figure 8.7, the same goes for searching: you refer to a field of an inner object by specifying the full path to that field, such as location.name.

Figure 8.7. You can search in an object’s field by specifying that field’s full path.

As you worked through chapters 2 and 4, you indexed documents from the code samples. You can now search through events happening in offices, as in listing 8.2, where you’ll specify the full path of location.name as the field to search on.

Tip

If you didn’t index the documents from the code samples yet, you can do it now by cloning the repository at https://github.com/dakrone/elasticsearch-in-action and running the populate.sh script.

Listing 8.2. Searching in location.name from events indexed by the code samples
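The body of the listing isn't reproduced here. As a stand-in, the following Python sketch builds a request body that searches on the full path location.name. It is not the original listing: the choice of a match query and the search term office are assumptions made for illustration.

```python
import json

# Assumed query shape: a match query on the inner field's full path.
# The query type and the term "office" are illustrative assumptions.
search_body = {
    "query": {
        "match": {
            "location.name": "office"
        }
    }
}

request_json = json.dumps(search_body, indent=2)
print(request_json)
```

A body like this would be sent to the same get-together/event/_search endpoint used elsewhere in this section.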

Aggregations

While searching, treat object fields like location.name in the same way as any other field. This also works with the aggregations that you saw in chapter 7. For example, the following terms aggregation gets the most-used words in the location.name field to help you build a word cloud:

% curl "localhost:9200/get-together/event/_search?pretty" -d '{
  "aggregations": {
    "location_cloud": {
      "terms": { "field": "location.name" }
    }
  }
}'
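To get a feel for what this aggregation computes, here is a small Python model of it. It is a sketch under stated assumptions: lowercasing and splitting on whitespace stands in for the real analyzer, the second and third events are hypothetical, and terms_aggregation is an invented helper, not an Elasticsearch API.

```python
from collections import Counter


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def terms_aggregation(docs, field_path, size=10):
    """Count, for each term, how many documents contain it in the field."""
    counts = Counter()
    for doc in docs:
        value = doc
        # Walk the dotted path, for example "location" and then "name".
        for part in field_path.split("."):
            value = value.get(part, {}) if isinstance(value, dict) else {}
        if isinstance(value, str):
            # A set, because each document counts once per term.
            counts.update(set(analyze(value)))
    return counts.most_common(size)


events = [
    {"location": {"name": "SkillsMatter Exchange"}},
    {"location": {"name": "Acme Office"}},      # hypothetical event
    {"location": {"name": "Downtown Office"}},  # hypothetical event
]

top_terms = terms_aggregation(events, "location.name")
print(top_terms[0])
```

Because two of the three sample events contain the term office, it comes out on top with a count of 2; every other term appears in one document only.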

Objects work best for one-to-one relationships

One-to-one relationships are the perfect use case for objects: you can search in the inner object’s fields as if they were fields in the root document. That’s because they are! At the Lucene level, location.name is another field in the same flat structure.

You can also have one-to-many relationships with objects by putting them in arrays. For example, take a group with multiple members. If each member had its own object, you’d represent them like this:

"members": [

  {
    "first_name": "Lee", "last_name": "Hinman"
  },
  {
    "first_name": "Radu", "last_name": "Gheorghe"
  }

]

You can still search for members.first_name:lee and it will match “Lee” as expected. But you need to keep in mind that in Lucene the structure of the document looks more like this:

"members.first_name": ["Lee", "Radu"],

"members.last_name": ["Hinman", "Gheorghe"]

This works well only if you search in a single field, even if you have multiple criteria on that field. If you search for members.first_name:lee AND members.last_name:gheorghe, the document will still match because it matches each of those two criteria. This happens even though there’s no member named Lee Gheorghe because Elasticsearch throws everything in the same document and it’s not aware of boundaries between objects. To have Elasticsearch understand those boundaries, you can use the nested type, covered next.
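The cross-object match can be reproduced with a short Python model. This is a sketch of the behavior described above, not of Lucene internals: lowercasing stands in for analysis, and flatten_objects and matches_all are hypothetical helpers.

```python
def flatten_objects(doc):
    """Merge the fields of all objects in an array into multi-valued fields."""
    flat = {}
    for key, value in doc.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for inner_key, inner_value in item.items():
                    path = f"{key}.{inner_key}"
                    flat.setdefault(path, []).append(inner_value.lower())
            else:
                flat.setdefault(key, []).append(item)
    return flat


def matches_all(flat_doc, criteria):
    """AND query: every field must contain its term somewhere."""
    return all(
        term in flat_doc.get(field, []) for field, term in criteria.items()
    )


group = {
    "members": [
        {"first_name": "Lee", "last_name": "Hinman"},
        {"first_name": "Radu", "last_name": "Gheorghe"},
    ]
}

flat_group = flatten_objects(group)

# The flat document matches, because each criterion is checked separately.
cross_match = matches_all(
    flat_group,
    {"members.first_name": "lee", "members.last_name": "gheorghe"},
)

# Checking inside each original object shows no such member exists.
member_exists = any(
    m["first_name"].lower() == "lee" and m["last_name"].lower() == "gheorghe"
    for m in group["members"]
)

print(cross_match, member_exists)
```

The flat document satisfies both criteria even though no single member does, because the information about which first name belongs with which last name is lost once the fields are merged.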

Using objects to define document relationships: pros and cons

Before moving on, here’s a quick recap of why you should (or shouldn’t) use objects. The plus points:

They’re easy to use. Elasticsearch detects them by default; in most cases you don’t have to define anything special up front to index objects.

You can run queries and aggregations on objects as you would do with flat documents. That’s because at the Lucene level they are flat documents.

No joins are involved. Because everything is in the same document, using objects will give you the best performance of any of the options discussed in this chapter.

The downside:

There are no boundaries between objects. If you need such functionality, you need to look at other options (nested, parent-child, and denormalizing) and possibly combine them with objects if it suits your use case.