5.1.3. Token filtering

Once the block of text has been converted into tokens, Elasticsearch applies what are called token filters to each token. A token filter takes a token as input and can modify it, add new tokens, or remove the token entirely. One of the most useful and common examples is the lowercase token filter, which lowercases each token to ensure that you’ll be able to find a get-together about “NoSql” when searching for the term “nosql.” Tokens can pass through more than one token filter, each doing something different to mold the data into the best format for your index.

In the example in figure 5.1 there are three token filters: the first lowercases the tokens, the second removes the stopword “and” (we’ll talk about stopwords later in this chapter), and the third uses synonyms to add the term “tools” alongside “technologies.”
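To make the filter chain concrete, here is a minimal Python sketch of those three steps. This is only a conceptual illustration, not how Elasticsearch implements its filters internally, and the stopword list and synonym mapping below are invented for the example:

```python
# Conceptual sketch of a three-stage token filter chain:
# lowercase -> stopword removal -> synonym expansion.
# (Illustrative only; not Elasticsearch's actual implementation.)

STOPWORDS = {"and"}                      # assumed stopword list
SYNONYMS = {"technologies": ["tools"]}   # assumed synonym mapping

def lowercase_filter(tokens):
    # Lowercase every token so "NoSql" matches a search for "nosql"
    return [t.lower() for t in tokens]

def stopword_filter(tokens):
    # Drop tokens that appear in the stopword list
    return [t for t in tokens if t not in STOPWORDS]

def synonym_filter(tokens):
    # Emit each token, then any synonyms configured for it
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out

def apply_filters(tokens):
    # Run the tokens through each filter in order
    for f in (lowercase_filter, stopword_filter, synonym_filter):
        tokens = f(tokens)
    return tokens

print(apply_filters(["big", "data", "technologies", "and", "NoSql"]))
# -> ['big', 'data', 'technologies', 'tools', 'nosql']
```

Note that the order matters: lowercasing first means the stopword and synonym lookups only need to handle lowercase terms, which is also why filter ordering is something you configure explicitly in a real analyzer.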