8.6. Application-side joins

Instead of denormalizing, another option for the groups and members relationship is to keep them in separate indices and do the joins from your application. Much like Elasticsearch does with parent-child, it requires you to store IDs to indicate which member belongs to which group, and you have to query both.

For example, if you have a query for groups with “Denver” in the name, where “Lee” or “Radu” is a member, you can first run a bool query on members to find the ones named Lee or Radu. Once you have their IDs, you can run a second query on groups, adding the member IDs in a terms filter next to the Denver query. The whole process is illustrated in figure 8.21.

Figure 8.21. Application-side joins require you to run two queries.
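
The two-step process can also be sketched from the application's side. What follows is a minimal sketch in Python, not a definitive implementation: the search callable stands in for whatever client you use, and the index names (members, groups) and the name field are assumptions made for illustration. The members field on group documents, which holds member IDs, follows the terms filter example in this section.

```python
def build_member_query(names):
    """First query: match members whose name is any of the given names.

    The "name" field is an assumed mapping, not taken from the text.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match": {"name": n}} for n in names]
            }
        }
    }


def extract_ids(response):
    """Pull the document IDs out of a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def build_group_query(name_term, member_ids):
    """Second query: groups matching name_term, filtered by member IDs."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"name": name_term}},
                "filter": {"terms": {"members": member_ids}},
            }
        }
    }


def application_side_join(search, group_term, member_names):
    """Run both queries; search(index, body) is a hypothetical callable
    that sends the body to the given index and returns the parsed response."""
    member_response = search("members", build_member_query(member_names))
    member_ids = extract_ids(member_response)
    if not member_ids:
        # No matching members, so no group can match: skip the second query.
        return []
    group_response = search("groups", build_group_query(group_term, member_ids))
    return group_response["hits"]["hits"]
```

Calling application_side_join(search, "Denver", ["Lee", "Radu"]) would run the two queries in order. Keep in mind that a search returns only 10 hits by default, so a real application would need to raise size or page through the first query's results to collect every member ID.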

This works well when there aren’t many matching members. But if you want to include all members from a city, for example, the second query will have to run a terms filter with possibly thousands of members, making it expensive. Still, there are some things you can do:

When you run the first query, if you need only member IDs, you can disable retrieving the _source field to reduce traffic:

    "query": {
      "filtered": {
        [...]
      }
    },
    "_source": false

In the second query, if you have lots of IDs, it might be faster to execute the terms filter on field data:

    "query": {
      "filtered": {
        "filter": {
          "terms": {
            "members": [1, 4],
            "execution": "fielddata"
          }
        }
      }
    }
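
Because the field data execution mode tends to pay off only when the ID list is long, an application might switch it on conditionally. The helper below is a minimal sketch of that idea; the function name and the threshold of 1,000 IDs are arbitrary assumptions for illustration, not recommended values, and the right cutoff would have to be measured on your own data.

```python
def build_terms_filter(member_ids, fielddata_threshold=1000):
    """Build a terms filter on the members field.

    fielddata_threshold is a hypothetical cutoff: above it, the filter
    asks for execution on field data; below it, the default is used.
    """
    ids = list(member_ids)
    terms = {"members": ids}
    if len(ids) > fielddata_threshold:
        terms["execution"] = "fielddata"
    return {"terms": terms}
```

With a short list such as [1, 4], the helper leaves the execution setting out, so Elasticsearch uses its default mode.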

We’ll cover more about performance in chapter 10, but when you model document relations, it ultimately comes down to picking your battles.