Chapter 1. Introducing Elasticsearch

This chapter covers

Understanding search engines and the issues they address

How Elasticsearch fits in the context of search engines

Typical scenarios for Elasticsearch

Features Elasticsearch provides

Installing Elasticsearch

We use search everywhere these days. And that’s a good thing, because search helps you finish tasks quickly and easily. Whether you’re buying something from an online shop or visiting a blog, you expect to have a search box somewhere to help you find what you’re looking for without scanning the entire website. Maybe it’s me, but when I (Radu) wake up in the morning, I wish I could enter the kitchen and type in “bowl” in a search box somewhere and have my favorite bowl highlighted.

We’ve also come to expect those search boxes to be smart. I don’t want to have to type the entire word “bowl”; I expect the search box to come up with suggestions, and I don’t want results and suggestions to come to me in random order. I want the search to give me the most relevant results first and to guess what I want, if that’s possible. For example, if I search for “laptop” in an online shop but have to scroll through laptop accessories before I get to a laptop, I’m likely to go somewhere else after the first page of results. And this need for relevant results and suggestions isn’t only because we’re in a hurry and spoiled with good search interfaces; it’s also because there’s more and more stuff to choose from. For example, a friend asked me to help her buy a new laptop. Typing “best laptop for my friend” in the search box of an online store that sells thousands of laptops wouldn’t be effective. Good keyword searching is often not enough; you need some statistics on the results so you can narrow them down to what the user is interested in. I narrowed down my laptop search by selecting the size of the screen, the price range, and so on, until I had only five or so laptops to choose from.

Finally, there’s the matter of performance, because nobody wants to wait. I’ve seen websites where you search for something and get the results in a few minutes. Minutes! For a search!

If you want to provide search for your data, you’ll have to deal with all these issues: returning relevant search results, returning statistics, and doing all that quickly. This is where search engines like Elasticsearch come into play, because they’re built to meet exactly those challenges. You can deploy a search engine on top of a relational database to create indices and speed up SQL queries. Or you can index data from your NoSQL data store to add search capabilities there. You can do that with Elasticsearch, and it works well with document-oriented stores like MongoDB because data is represented in Elasticsearch as documents, too. Modern search engines like Elasticsearch also do a good job of storing your data, so you can use one as a NoSQL data store with powerful search capabilities.

Elasticsearch is open-source and distributed, and it’s built on top of Apache Lucene,[1] an open-source search engine library that allows you to implement search functionality in your own Java application. Elasticsearch takes this Lucene functionality and extends it to make storing, indexing, and searching faster, easier, and, as the name suggests, elastic. Also, your application doesn’t need to be written in Java to work with Elasticsearch; you can send data over HTTP in JSON to index, search, and manage your Elasticsearch cluster.

[1] More information about Apache Lucene can be found at http://lucene.apache.org/core/.
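
To make the idea of talking to Elasticsearch with JSON over HTTP more concrete, here is a minimal Python sketch that builds, but does not send, one indexing request and one search request. Several things in it are assumptions for illustration rather than anything this chapter specifies: a node listening on localhost:9200, the index name shop, the helper names build_index_request and build_search_request, and the /_doc/ URL layout, which follows recent Elasticsearch releases (older releases put a type name in that position).

```python
import json
from urllib.request import Request

# Assumption: a single Elasticsearch node on the default local address.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (without sending) an HTTP request that would index one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_search_request(index, field, text):
    """Build (without sending) a full-text search on one field using a match query."""
    query = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical document and query, printed so you can see what would go over the wire.
    for req in (
        build_index_request("shop", "1", {"name": "laptop", "screen_inches": 15}),
        build_search_request("shop", "name", "laptop"),
    ):
        print(req.get_method(), req.full_url)
        print(req.data.decode("utf-8"))
```

With a node actually running, each request could be passed to urllib.request.urlopen to send it. Nothing here depends on Python: any language that can issue HTTP requests and produce JSON could do the same.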

This chapter expounds on these searching and data features, and you’ll learn how to use them throughout this book. First, let’s take a closer look at the challenges search engines are typically confronted with and Elasticsearch’s approach to solving them.