Elasticsearch Boolean Queries

How to Query Elasticsearch With Boolean Queries

Updated March 2021

Boolean queries in Elasticsearch are a popular query type because of their versatility and ease of use. Boolean queries, or bool queries, find or match documents by combining boolean clauses. In most cases the filter clause is the one to reach for, because filter clauses skip scoring and their results can be cached for faster search times.

In this article we will describe how a boolean query is written and work through several example queries.

Elasticsearch Boolean Clauses

The four boolean clauses used for bool queries are filter, must, must_not, and should.

filter: Filter is used to pare down the dataset; a document either passes a filter or is excluded by it. Filter queries can be used to reduce datasets to a particular date or date range, a specific location, or other exact matches. It is important to understand that filtering improves search performance: filter clauses are not scored, and Elasticsearch can automatically cache frequently used filters. When the same filter runs again, its results can be served from the cache. We will go into more depth about filtering below.

must: Must is similar to the “and” operator used when making a Google search. Using must tells Elasticsearch that a matching document needs to satisfy every query under the must clause. If you have more than one query, then all of those queries need to match, and each one contributes to the score.

must_not: Must_not is similar to the “not” operator used when making a Google search. It is the opposite of the must clause. Using must_not tells Elasticsearch to exclude any document that matches one of the queries under the must_not clause. Like filter, must_not does not contribute to the score.

should: Ideally, matching documents satisfy all of the queries in the should clause, but they do not have to. Scoring is used to rank the matches. Further down in this post we have a section on how scoring is used in Elasticsearch. Simply put, the more should queries a matched document satisfies, the higher the resulting score for that document. By default, a match needs to satisfy at least one of the should queries only when the bool query has no must or filter clause; if either is present, the should queries are optional and only affect the score. This minimum can be changed using the parameter minimum_should_match.
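As a rough illustration of how the four clauses fit together, the sketch below assembles a bool query body as a plain Python dictionary. The helper name build_bool_query and the field names and values in the usage example are hypothetical and chosen only for illustration; the result is simply the JSON body that would be sent to a _search endpoint.

```python
import json


def build_bool_query(must=None, filters=None, must_not=None, should=None,
                     minimum_should_match=None):
    """Assemble a bool query body, skipping clauses that were not supplied."""
    bool_body = {}
    for name, clauses in (("must", must), ("filter", filters),
                          ("must_not", must_not), ("should", should)):
        if clauses:
            bool_body[name] = list(clauses)
    if minimum_should_match is not None:
        bool_body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_body}}


# Hypothetical fields and values, for illustration only.
example = build_bool_query(
    must=[{"term": {"text": "avocado"}}],
    filters=[{"term": {"state": "Texas"}}],
    must_not=[{"term": {"text": "frozen"}}],
    should=[{"term": {"text": "organic"}}],
)
print(json.dumps(example, indent=2))
```

Clauses that are left out are simply omitted from the body, which mirrors how bool queries are usually written by hand.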

Filter Query for Faster Search Performance

We want to highlight how important the filter clause is for reducing search times in Elasticsearch. 

If there is a query that is commonly used, then writing that query with the filter clause is ideal because Elasticsearch can keep the results of frequently used filters in memory for fast retrieval.

In this example we are creating a filter for customers in Texas that have a total spend of at least $200.00. Because both conditions sit in the filter clause, they are not scored and are eligible for caching by Elasticsearch.

As a note, “gte” refers to greater-than-or-equal-to.

GET customers/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"state": "Texas"}},
        {"range": {"total_spend": {"gte": 200.00}}}
      ]
    }
  }
}

For some searches it makes sense to use scoring to rank the relevance of matches. Scored clauses are not cached the way filters are, though, so keep that in mind. One way of handling this limitation is to put the parts of the query that don’t require scoring in a filter clause, and let the scored portion of the query run only against the documents that pass the filter. With that, let’s dive into scoring.
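A minimal sketch of that split, written in Python as a dictionary builder: the scored part goes under must and the unscored conditions go under filter. The helper name filtered_scored_query and the “notes” field with its search text are assumptions made up for this example; the filter conditions reuse the customer example above.

```python
def filtered_scored_query(scored_clauses, filter_clauses):
    """Put scored clauses under must and unscored conditions under filter."""
    return {
        "query": {
            "bool": {
                "must": list(scored_clauses),
                "filter": list(filter_clauses),
            }
        }
    }


query = filtered_scored_query(
    # Hypothetical scored clause: rank customers by a free-text notes field.
    scored_clauses=[{"match": {"notes": "loyalty program"}}],
    # Unscored conditions taken from the customer example.
    filter_clauses=[
        {"term": {"state": "Texas"}},
        {"range": {"total_spend": {"gte": 200.00}}},
    ],
)
print(query)
```

Only the clause under must influences the ranking here; the two filter conditions just decide which documents are considered at all.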

Scoring of Matches

Elasticsearch assigns each match a score that ranks the matched documents by their relevance to the search parameters. Scores are floating-point numbers, and the greater the score, the better the document matches the search query.

There are mechanisms to override scoring, for instance when using Elasticsearch for retail purposes.  There is a good post about customizing scoring available here.  

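For the purposes of this article, however, we will stick to the traditional use of scoring. Scores are calculated by assessing the matched documents against the components of the search. For instance, if the search is a match query that requires every word to be present:

GET food/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {"text": {"query": "chicken fried steak", "operator": "and"}}
      }
    }
  }
}

Then Elasticsearch will analyze the query string and consider each of the three words in the “text” field separately, roughly as if the query was written as the three term queries below. (A single term query on “chicken fried steak” would not be split this way, because term queries look for the exact, unanalyzed value.)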

GET food/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"text": "chicken"}},
        {"term": {"text": "fried"}},
        {"term": {"text": "steak"}}
      ]
    }
  }
}

Filters do not affect scoring. 
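To make that point concrete, here is a toy model in Python. It is not Elasticsearch’s real scoring formula: as a loudly stated simplification, every matched must clause adds 1.0 to the score, while filter clauses only decide whether a document is kept. The function names and the sample document are hypothetical.

```python
def term_matches(doc, clause):
    """Toy matcher: a term clause matches if the value is among the field's tokens."""
    field, value = next(iter(clause["term"].items()))
    return value in doc.get(field, [])


def toy_score(doc, bool_query):
    """Return None if the document is excluded, else a simplified score.

    Assumption: each must clause contributes 1.0; filter clauses contribute nothing.
    """
    must = bool_query.get("must", [])
    filters = bool_query.get("filter", [])
    if not all(term_matches(doc, clause) for clause in must + filters):
        return None
    return float(len(must))


doc = {"text": ["chicken", "fried", "steak"], "state": ["Texas"]}
as_must = {"must": [{"term": {"text": "steak"}}, {"term": {"state": "Texas"}}]}
as_filter = {"must": [{"term": {"text": "steak"}}],
             "filter": [{"term": {"state": "Texas"}}]}

print(toy_score(doc, as_must))    # 2.0
print(toy_score(doc, as_filter))  # 1.0
```

Both versions keep the same document, but moving the state condition from must to filter removes its contribution to the score.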

The exact scoring equation is fairly involved, and more can be found about it here.

Elasticsearch Bool Query Examples

Let us run through a few example queries.  In this first example we are searching through a grocery inventory for avocados. 

Ideally, the matches also contain the terms organic and/or California. However, neither term is required, because both are listed under the should clause rather than the must clause, and the query already has a must clause.

Matches that include both the terms organic and California will have a higher score than the matches that only contain one of those terms.

GET grocery/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {"text": "avocado"}
      },
      "should": [
        {"term": {"text": "organic"}},
        {"term": {"text": "California"}}
      ]
    }
  }
}

In this next example we manually adjust the value for the minimum number of should matches.  Here we are searching through a movie database.  We want to find comedies that have at least two of the listed awards:  Academy Award, Golden Globe, and/or People’s Choice. 

Because this query has a must clause, the should clauses would be optional by default (a minimum of 0; the default is 1 only when there is no must or filter clause). Here we set the minimum value to 2. In our returned matches only comedy films that have received at least two of the awards will be listed. Those movies that received three awards will also be listed as matches and will have a higher score than those movies with only two awards.

GET movies/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {"text": "comedy"}
      },
      "should": [
        {"term": {"text": "Academy Award"}},
        {"term": {"text": "Golden Globe"}},
        {"term": {"text": "People's Choice"}}
      ],
      "minimum_should_match": 2
    }
  }
}
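The effect of minimum_should_match can be sketched with a small Python model. It handles only the integer form of the parameter, counts matched should clauses instead of computing a real relevance score, and treats the “text” field as a list of exact values. The movie titles and helper names are hypothetical.

```python
def effective_minimum(bool_query):
    """Number of should clauses a document must match (integer form only)."""
    if "minimum_should_match" in bool_query:
        return bool_query["minimum_should_match"]
    if not bool_query.get("should"):
        return 0
    has_required = bool(bool_query.get("must") or bool_query.get("filter"))
    return 0 if has_required else 1


def term_matches(doc, clause):
    """Toy matcher: a term clause matches if the value is among the field's values."""
    field, value = next(iter(clause["term"].items()))
    return value in doc.get(field, [])


def select(docs, bool_query):
    """Return (title, matched_should_count) pairs, most should matches first."""
    needed = effective_minimum(bool_query)
    hits = []
    for doc in docs:
        if not all(term_matches(doc, c) for c in bool_query.get("must", [])):
            continue
        matched = sum(term_matches(doc, c) for c in bool_query.get("should", []))
        if matched >= needed:
            hits.append((doc["title"], matched))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data.
movies = [
    {"title": "Movie A", "text": ["comedy", "Academy Award", "Golden Globe", "People's Choice"]},
    {"title": "Movie B", "text": ["comedy", "Golden Globe"]},
    {"title": "Movie C", "text": ["comedy", "Academy Award", "People's Choice"]},
    {"title": "Movie D", "text": ["drama", "Academy Award", "Golden Globe"]},
]
bool_query = {
    "must": [{"term": {"text": "comedy"}}],
    "should": [
        {"term": {"text": "Academy Award"}},
        {"term": {"text": "Golden Globe"}},
        {"term": {"text": "People's Choice"}},
    ],
    "minimum_should_match": 2,
}
print(select(movies, bool_query))  # [('Movie A', 3), ('Movie C', 2)]
```

Removing minimum_should_match from bool_query would let Movie B back in, since the must clause makes the should clauses optional by default.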

For this final example, we are reviewing weather conditions over a period of time. Here we are searching an index that has temperatures for cities all over the globe. We want to determine which cities had temperatures at or below 0.00 °C between March 4, 2021 and March 8, 2021. We are using the “lte” parameter here, which means less-than-or-equal-to.

GET weather/_search
{
  "query": {
    "bool": {
      "must": [
        {"query_string": {"query": "log_timestamp:[2021-03-04 TO 2021-03-08]"}},
        {"range": {"temperature": {"lte": 0.00}}}
      ]
    }
  }
}
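Since neither condition in this example needs a relevance score, one possible alternative is to express both as range queries under filter. The Python sketch below builds that body; it assumes log_timestamp is mapped as a date field, and the helper name freezing_cities_query is hypothetical.

```python
from datetime import date


def freezing_cities_query(start, end, max_temp_c=0.0):
    """Build a filter-only bool query for a date window and a temperature ceiling."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Inclusive bounds, like the [start TO end] syntax in query_string.
                    {"range": {"log_timestamp": {"gte": start.isoformat(),
                                                 "lte": end.isoformat()}}},
                    {"range": {"temperature": {"lte": max_temp_c}}},
                ]
            }
        }
    }


query = freezing_cities_query(date(2021, 3, 4), date(2021, 3, 8))
print(query)
```

Written this way, both conditions are unscored and eligible for caching, which fits the earlier advice about preferring filter for exact conditions.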

Elasticsearch Querying and Indexing Resources

We want to direct readers of this article to two other articles on similar topics around Elasticsearch indexing and querying.  The first is How to Index Elasticsearch.  That article defines how indices are used to both organize and distribute data within a cluster. 

The second article is How to Query Elasticsearch in Kibana.  This article covers both Lucene and Kibana Query Syntax (KQL) and gives examples of querying with both.

As always, feel welcome to reach out to us with your questions.

