A.5. Shape intersections

Elasticsearch can index documents with shapes, such as polygons showing the area of a park, and filter documents based on whether those shapes overlap other shapes, such as the city center. By default, it does this through the geohashes discussed in the previous section. The process is illustrated in figure A.4: each shape is approximated (we’ll discuss precision later) by a group of rectangles defined by geohashes. When you search, Elasticsearch only has to check whether at least one geohash of one shape overlaps a geohash of the other shape.

Figure A.4. Shapes represented in geohashes. Searching for shapes matching shape 1 will return shape 2.
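To make the idea in figure A.4 concrete, here is a minimal Python sketch of geohash-based overlap detection. It is only an illustration under simplifying assumptions, not how Elasticsearch implements it: shapes are reduced to bounding boxes given as (min_lon, min_lat, max_lon, max_lat), all cells have the same length, and the helper names (encode, cover_box, shapes_overlap) are hypothetical.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def encode(lat, lon, length):
    """Standard geohash encoding: interleave longitude and latitude bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)

def cover_box(min_lon, min_lat, max_lon, max_lat, length):
    """Return the set of geohash cells of the given length that touch a box."""
    total_bits = 5 * length
    width = 360.0 / 2 ** ((total_bits + 1) // 2)  # cell width in degrees
    height = 180.0 / 2 ** (total_bits // 2)       # cell height in degrees
    first_i = math.floor((min_lon + 180) / width)
    last_i = math.floor((max_lon + 180) / width)
    first_j = math.floor((min_lat + 90) / height)
    last_j = math.floor((max_lat + 90) / height)
    cells = set()
    for i in range(first_i, last_i + 1):
        for j in range(first_j, last_j + 1):
            # Encode the center of each grid cell to get its geohash.
            center_lat = -90 + (j + 0.5) * height
            center_lon = -180 + (i + 0.5) * width
            cells.add(encode(center_lat, center_lon, length))
    return cells

def shapes_overlap(box_a, box_b, length):
    """Two shapes 'overlap' if they share at least one geohash cell."""
    return bool(cover_box(*box_a, length) & cover_box(*box_b, length))

# Two small boxes that are close together but do not actually touch
# (made-up coordinates).
box_a = (10.00, 57.00, 10.01, 57.01)
box_b = (10.02, 57.02, 10.03, 57.03)
print(shapes_overlap(box_a, box_b, 4))  # coarse cells
print(shapes_overlap(box_a, box_b, 7))  # finer cells
```

With coarse cells (length 4), both boxes fall into the same cell and are reported as overlapping even though they don't touch; with finer cells (length 7) they are kept apart. This is the approximation error that the precision setting controls.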

A.5.1. Indexing shapes

Let’s say you have a shape of a park that’s a polygon with four corners. To index it, you’d first have to define a mapping of that shape field—let’s call it area—of type geo_shape. With the mapping in place, you can start indexing documents: the area field of each document would have to mention that the shape’s type is polygon and show the array of coordinates for that polygon, as shown in the next listing.
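As a rough sketch of what the mapping and a document body could look like, the following Python snippet builds both as plain dictionaries and prints them as JSON. The field name area comes from the text; the park name, the coordinates, and the polygon_shape helper are made up for illustration, and the exact request layout (index name, type name, endpoint) depends on your Elasticsearch version. Coordinates follow the GeoJSON convention of [longitude, latitude], and the helper repeats the first corner at the end because GeoJSON expects a polygon ring to be closed.

```python
import json

# Hypothetical mapping fragment: one field named "area" of type geo_shape.
mapping = {"properties": {"area": {"type": "geo_shape"}}}

def polygon_shape(corners):
    """Build a GeoJSON-style polygon from [lon, lat] corners, closing the ring."""
    ring = [list(corner) for corner in corners]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # last point must equal the first
    return {"type": "polygon", "coordinates": [ring]}

# Hypothetical park document with a four-cornered area (made-up coordinates).
park = {
    "name": "Example Park",
    "area": polygon_shape([[45, 30], [46, 30], [46, 32], [45, 32]]),
}

print(json.dumps(mapping, indent=2))
print(json.dumps(park, indent=2))
```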

Polygons aren’t the only shape type Elasticsearch supports. You can have multiple polygons in a single shape (type: multipolygon). There are also the point and multipoint types, one or more chained lines (linestring, multilinestring), rectangles (envelope), and more. You can find the complete list here: www.elastic.co/guide/en/elasticsearch/reference/current/mapping-geo-shape-type.html.

The amount of space a shape occupies in your index depends heavily on how you index it. Because geohashes can only approximate most shapes, it’s up to you to define how small those geohash rectangles can be. The smaller they are, the better the resolution of the approximation, but your index size increases because smaller geohash cells have longer strings and, more importantly, more parent ngrams to index as well. Depending on where you want to be in this tradeoff, you’ll specify a precision parameter in your mapping, which defaults to 50m. With the default, the approximated shape can be off by up to 50m in the worst case.
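The tradeoff can be estimated with some arithmetic. Each geohash character adds 5 bits, split alternately between longitude and latitude, so every extra character makes cells roughly 32 times smaller in area while adding one more prefix per cell. The sketch below computes approximate cell sizes at the equator; the Earth dimensions are rounded constants and the function names are hypothetical. How Elasticsearch itself translates the precision setting into a cell size may differ in detail.

```python
EQUATOR_M = 40_075_017        # approximate circumference at the equator (meters)
POLE_TO_POLE_M = 20_003_931   # approximate length of a meridian (meters)

def geohash_cell_size(length):
    """Approximate (width, height) in meters of a geohash cell at the equator."""
    total_bits = 5 * length
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    return EQUATOR_M / 2 ** lon_bits, POLE_TO_POLE_M / 2 ** lat_bits

def shortest_length_for(precision_m, max_length=12):
    """Smallest geohash length whose cells fit within precision_m on both sides."""
    for length in range(1, max_length + 1):
        width, height = geohash_cell_size(length)
        if max(width, height) <= precision_m:
            return length
    return max_length

for length in range(5, 10):
    width, height = geohash_cell_size(length)
    # A cell of this length also has `length` prefixes (itself and its parents).
    print(f"length {length}: about {width:,.1f}m x {height:,.1f}m")

print(shortest_length_for(50))
```

Under these assumptions, cells of length 7 are about 153m on each side and cells of length 8 are about 38m by 19m, so a 50m precision calls for geohashes of roughly eight characters.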

A.5.2. Filtering overlapping shapes

With your park documents indexed, let’s say you have another four-cornered shape that represents your city center. To see which parks are at least partly in the city center, you’d use the geo shape filter. You can provide the shape definition of your city center in the filter, as shown in the following listing.

Listing A.4. geo shape filter example

If you followed listing A.3, you should see that the indexed shape matches. Change the coordinates in the query to something like [[95, 30.5], [96, 30.5], [95, 31.5], [96, 32.5]], and the query won’t return any hits because there’s no common geohash to trigger an overlap.
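For orientation, here is a minimal Python sketch of what the body of such a search request could look like, assuming the filtered-query syntax of Elasticsearch 1.x (later versions express filters differently) and the area field from the mapping discussed earlier. The city-center coordinates and the geo_shape_filter helper are made up for illustration.

```python
import json

def geo_shape_filter(field, shape):
    """Wrap a shape definition in a 1.x-style filtered query with a geo_shape filter."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "geo_shape": {
                        field: {"shape": shape}
                    }
                }
            }
        }
    }

# Hypothetical city-center polygon; [lon, lat] pairs, ring closed on the first point.
city_center = {
    "type": "polygon",
    "coordinates": [[[45.5, 30.5], [46.5, 30.5], [46.5, 31.5], [45.5, 31.5], [45.5, 30.5]]],
}

body = geo_shape_filter("area", city_center)
print(json.dumps(body, indent=2))
```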

Geohashes are powerful because they provide a way to do geospatial search using the same underlying mechanisms as the term queries we discussed throughout the book. Although geohashes are only an approximation of a point or a shape, using them is typically faster than doing calculations or range filtering on raw latitude and longitude numbers, as you saw in the first part of the appendix.