
How to Query OpenSearch With Boolean Queries

Published November 2022

OpenSearch boolean queries find or match documents using boolean clauses. 

In this article we describe how to construct a boolean query, or bool query.  We will also work through several example OpenSearch bool queries.

OpenSearch Boolean Clauses

The four boolean clauses used for bool queries are filter, must, must_not, and should.

filter: Filter prunes the dataset; a document either passes a filter or is excluded by it.  Filter queries can reduce datasets to a specific date range, location, or other exact match.

Filtering improves search performance because OpenSearch can cache the results of filter clauses. When the same filter runs again, the results can be served from the cache.  We go into more depth about filtering below.

must: Must is like the “and” operator used when making a Google search.  Must tells OpenSearch that matching documents need to satisfy all the queries that fall under the must clause.  If you have more than one query, then all those queries need to match.

must_not: Must_not is like the “not” operator used when making a Google search.  It is the opposite of the must clause.  Must_not tells OpenSearch that matching documents cannot satisfy any of the queries that fall under the must_not clause.

should: Ideally, matching documents satisfy all the queries in the should clause, but they don’t have to.  Scoring ranks the matches.

Further down in this article we have a section on how OpenSearch uses scoring. Simply put, the more should queries that a matched document has, the higher the score for that document. 

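By default, a match needs to satisfy at least one of the should queries only when the bool query has no must or filter clause.  If a must or filter clause is present, the should queries are optional and only affect the score.  This minimum value is adjustable using the parameter minimum_should_match.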
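To make the four clauses concrete, here is a minimal Python sketch that mimics how a bool query decides whether a document matches. It is a toy model that runs locally, not OpenSearch code: it only understands term clauses with exact equality and an integer minimum_should_match, and the function names, field names, and sample documents are all made up for illustration.

```python
def term_matches(doc, clause):
    """True if a {"term": {field: value}} clause equals the document's field value."""
    [(field, value)] = clause["term"].items()
    return doc.get(field) == value


def bool_matches(doc, bool_query):
    """Toy model of bool-query matching (term clauses only, integer minimum only)."""
    must = bool_query.get("must", [])
    filt = bool_query.get("filter", [])
    must_not = bool_query.get("must_not", [])
    should = bool_query.get("should", [])

    # Default minimum: 1 if there are should clauses and no must/filter, else 0.
    default_minimum = 1 if should and not (must or filt) else 0
    minimum = bool_query.get("minimum_should_match", default_minimum)

    if not all(term_matches(doc, c) for c in must + filt):
        return False  # every must and filter clause has to match
    if any(term_matches(doc, c) for c in must_not):
        return False  # any must_not match excludes the document
    return sum(term_matches(doc, c) for c in should) >= minimum


# Hypothetical sample documents.
docs = [
    {"id": 1, "category": "fruit", "label": "organic", "status": "ok"},
    {"id": 2, "category": "fruit", "label": "conventional", "status": "ok"},
    {"id": 3, "category": "fruit", "label": "organic", "status": "recalled"},
]

with_must = {
    "must": [{"term": {"category": "fruit"}}],
    "must_not": [{"term": {"status": "recalled"}}],
    "should": [{"term": {"label": "organic"}}],
}
should_only = {"should": [{"term": {"label": "organic"}}]}

print([d["id"] for d in docs if bool_matches(d, with_must)])    # [1, 2]
print([d["id"] for d in docs if bool_matches(d, should_only)])  # [1, 3]
```

In the first query the should clause is optional because a must clause is present, so document 2 still matches. In the second query the should clause is the only clause, so at least one should query has to match.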

Filter Query for Faster Search Performance

Filtered queries are stored in the cache and retrieved from memory. This makes the filter clause important for reducing search times in OpenSearch.  It’s best practice to create a filter for regularly used queries. 

In this example we are creating a filter for customers in Chicago that have a total spend of at least $200.00.  OpenSearch can cache the results of these filter clauses automatically.

As a note, “gte” refers to greater-than-or-equal-to.

				
GET customers/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "city": "Chicago" } },
        { "range": { "total_spend": { "gte": 200.00 } } }
      ]
    }
  }
}

				
			

For some searches it makes sense to use scoring to rank the relevancy of matches. However, scored clauses do not benefit from the filter cache.

One way of handling this limitation is to break the search up into two pieces.  First, create a filtered cache for the aspects of the query that don’t require scoring.  Then, run the scored portion of the query on the filtered subset of the dataset.
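One way to picture this split is a single bool query that carries both pieces: the unscored conditions go under filter, and the clauses that should influence ranking go under must. The Python sketch below only builds and prints the request body. The helper name and the field names (city, total_spend, notes) are hypothetical, and the match query on notes is just an example of a scored clause.

```python
import json


def filtered_scored_query(filters, scored):
    """Build a bool query: `filters` prune without scoring, `scored` clauses rank the rest."""
    return {"query": {"bool": {"filter": filters, "must": scored}}}


body = filtered_scored_query(
    filters=[
        {"term": {"city": "Chicago"}},                # hypothetical field
        {"range": {"total_spend": {"gte": 200.00}}},  # hypothetical field
    ],
    scored=[{"match": {"notes": "loyalty program"}}],  # hypothetical field
)

# The printed JSON can be used as the body of a GET <index>/_search request.
print(json.dumps(body, indent=2))
```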

With that, let’s discuss scoring.

Scoring of Matches

OpenSearch assigns scores to matches based on their relevance to the query.  Scores are represented by a non-negative floating-point number. The greater the score, the better the document matches the search query.

There are ways to override scoring, for instance when using OpenSearch for retail purposes.  There is a helpful post about customizing scoring available here.  It’s a post about Elasticsearch but applies to OpenSearch queries as well.  For the purposes of this article, we will stick to the traditional use of scoring. 

Scores are calculated by comparing the matched documents to the components of the query.  Filters do not affect scoring. 
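The Python sketch below illustrates the idea that must and should clauses add to a score while filters do not. It is deliberately simplified: it gives one point per matching clause, which is not how OpenSearch computes relevance (the default is BM25). The function names, fields, and documents are invented for the illustration, and the documents are assumed to have already matched the query.

```python
def clause_hits(doc, clauses):
    """Count the {"term": {field: value}} clauses whose value appears in the doc's field."""
    hits = 0
    for clause in clauses:
        [(field, value)] = clause["term"].items()
        if value in doc.get(field, set()):
            hits += 1
    return hits


def toy_score(doc, bool_query):
    """Stand-in score: one point per matching must/should clause; filter is ignored."""
    scoring_clauses = bool_query.get("must", []) + bool_query.get("should", [])
    return float(clause_hits(doc, scoring_clauses))


base = {
    "must": [{"term": {"text": "avocado"}}],
    "should": [{"term": {"text": "organic"}}, {"term": {"text": "california"}}],
}
# Same query plus a filter clause; the toy score should not change.
with_filter = dict(base, filter=[{"term": {"store": "chicago"}}])

# Hypothetical documents; each field holds a set of tokens.
docs = {
    "both": {"text": {"avocado", "organic", "california"}, "store": {"chicago"}},
    "one": {"text": {"avocado", "organic"}, "store": {"chicago"}},
    "none": {"text": {"avocado"}, "store": {"chicago"}},
}

for name, doc in docs.items():
    print(name, toy_score(doc, base), toy_score(doc, with_filter))
```

Each document gets the same toy score with and without the filter, and the document that satisfies both should clauses ranks highest.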

The exact scoring equation is detailed, and more can be found about it here.  Again, this is documentation for Elasticsearch but also applies to OpenSearch.

OpenSearch Bool Query Examples

Let’s work through a few example queries.

Example Query 1

In this first example we are searching through a grocery inventory for avocados. 

Ideally, the matches also contain the term organic and/or California. However, matches do not need to be both organic and grown in California, or even either one. This is because both terms are listed under the should clause, not the must clause, and the query already has a must clause.

Documents that include both organic and California will have higher scores than matches that only contain organic or California.

				
					
GET grocery/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "text": "avocado" }
      },
      "should": [
        { "term": { "text": "organic" } },
        { "term": { "text": "California" } }
      ]
    }
  }
}

				
			

Example Query 2

In this next example we adjust the value for the minimum number of should matches.  Here we are searching through a movie database.  We want to find comedies that have at least 2 of the listed awards:  Academy Award, Golden Globe, and/or People’s Choice. 

Because this query has a must clause, the default minimum number of should matches is 0, but here we changed the minimum value to 2.

Our matches will only include comedy films that have received at least 2 of the awards. 

The films that received all 3 awards are also matches.  These films will have a higher score than the films with only 2 awards.

				
GET movies/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "text": "comedy" }
      },
      "should": [
        { "term": { "text": "Academy Award" } },
        { "term": { "text": "Golden Globe" } },
        { "term": { "text": "People's Choice" } }
      ],
      "minimum_should_match": 2
    }
  }
}

				
			

Example Query 3

For this final example, we are reviewing weather conditions over a period of time.  We are searching a database that has temperatures for cities around the globe. 

We want to determine what cities had temperatures at or below 0.00 °C between January 15, 2022 and October 20, 2022. 

We are using the “lte” parameter here which means less-than-or-equal-to.

				
GET weather/_search
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "log_timestamp:[2022-01-15 TO 2022-10-20]" } },
        { "range": { "temperature": { "lte": 0.00 } } }
      ]
    }
  }
}
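As an alternative sketch, the date window can also be written as a range query instead of a query_string, and both conditions can sit under filter since this search does not need relevance scoring. The Python below only builds and prints the request body. It assumes log_timestamp is mapped as a date field that accepts yyyy-MM-dd values, and the helper name is made up for illustration.

```python
import json
from datetime import date


def freezing_query(start, end, max_temp_c=0.0):
    """Bool query for readings at or below max_temp_c between two dates (inclusive)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Assumes log_timestamp is a date field accepting yyyy-MM-dd.
                    {"range": {"log_timestamp": {"gte": start.isoformat(),
                                                 "lte": end.isoformat()}}},
                    {"range": {"temperature": {"lte": max_temp_c}}},
                ]
            }
        }
    }


body = freezing_query(date(2022, 1, 15), date(2022, 10, 20))
print(json.dumps(body, indent=2))
```

Building the dates with datetime.date also guards against swapped month and day values, since an invalid date raises an error before any request is sent.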

				
			

OpenSearch Querying Resources

We want to direct readers to another article on OpenSearch queries.  Vector search is growing in popularity because of its ability to find similarities between items/documents.  Check out our post Vector Search on OpenSearch to learn more about it.
