Chapter 8. Relations among documents

This chapter covers

Objects and arrays of objects

Nested mapping, queries, and filters

Parent mapping, has_parent, and has_child queries and filters

Denormalization techniques

Some data is inherently relational. For example, with the get-together site we’ve used throughout the book, there are groups of people with the same interests and events organized by those groups. How might you search for groups that host events about a certain topic?

If your data has a flat structure, you can skip this chapter and move on to scaling out, which is discussed in chapter 9. This is typically the case for logs, where you have independent fields, such as timestamp, severity, and message. If, on the other hand, your data contains related entities, such as blog posts and comments or users and products, then by now you may wonder how best to represent those relationships in your documents so you can run queries and aggregations across them.

Elasticsearch doesn’t have joins like an SQL database does. As we’ll discuss in section 8.4 on denormalizing (duplicating data), that’s because query-time joins in a distributed system are typically slow, and Elasticsearch strives to be real time and return query results in milliseconds. On the upside, there are multiple ways to define relationships in Elasticsearch. You can, for example, search for events based on their locations or search for groups based on properties of the events they host. In this chapter we’ll explore all the possibilities for defining relationships among documents in Elasticsearch (object types, nested documents, parent-child relationships, and denormalizing), along with the advantages and disadvantages of each.
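To make the options concrete before looking at each in detail, here is a minimal sketch in plain Python of how one group and its events might be laid out under each approach. It doesn't call Elasticsearch at all: the sample group, the events, the function names, and the parent_id key are all made up for illustration, and the real mappings and request syntax may look different. Object and nested types share the same document shape here, because the difference between them lies in the mapping rather than in the document itself.

```python
# Hypothetical sample data: one group and the events it hosts.
group = {"name": "Elasticsearch Denver", "tags": ["elasticsearch", "search"]}
events = [
    {"title": "Introduction to Elasticsearch", "date": "2013-03-10"},
    {"title": "Hadoop and Elasticsearch", "date": "2013-06-17"},
]


def as_embedded(group, events):
    """Object or nested style: the events live inside the group document."""
    return [{**group, "events": [dict(e) for e in events]}]


def as_parent_child(group, events, group_id="1"):
    """Parent-child style: separate documents; each event points at its group.

    The "id" and "parent_id" keys are illustrative only, not Elasticsearch syntax.
    """
    parent = {"id": group_id, **group}
    children = [{"parent_id": group_id, **e} for e in events]
    return [parent] + children


def as_denormalized(group, events):
    """Denormalized style: every event carries its own copy of the group data."""
    return [{**e, "group": dict(group)} for e in events]


if __name__ == "__main__":
    for layout in (as_embedded, as_parent_child, as_denormalized):
        docs = layout(group, events)
        print(layout.__name__, "->", len(docs), "document(s)")
```

The number of documents each layout produces hints at the trade-offs discussed in this chapter: embedding keeps everything in one document, parent-child keeps entities separate but linked, and denormalizing repeats the group data once per event.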