Elasticsearch Query With “Not Contains” Example
Elasticsearch is a distributed full-text search engine where most queries work by matching documents that contain tokens. To find documents that do not contain a token or pattern, we typically use a bool query with a must_not clause. However, care is needed, because the semantics depend on the analyzer, the field mapping, and whether the field is indexed as a keyword or as analyzed text. This article walks through several ways to implement an Elasticsearch “not contains” query and their practical uses.
1. What is Elasticsearch?
Elasticsearch is an open-source, distributed search and analytics engine built on top of the Apache Lucene library. It is widely used for full-text search, structured and unstructured data querying, log analytics, and real-time monitoring applications.
In simple terms, Elasticsearch allows you to store large amounts of data and perform lightning-fast searches across them. It forms the core of the Elastic Stack (ELK Stack), which includes Kibana for visualization, Logstash for data processing, and Beats for lightweight data shipping.
Elasticsearch indexes documents in JSON format and makes them searchable through a powerful Query DSL (Domain Specific Language). This enables users to perform complex queries like regexp, wildcard, or must_not filters, making it ideal for scenarios such as searching logs, products, or text data with conditions like “elasticsearch query not contains”.
Overall, Elasticsearch combines scalability, speed, and flexibility, making it one of the most popular search and analytics solutions used in modern software architectures.
1.1 Setup on Docker
In this section, we will set up a single-node Elasticsearch instance using Docker. Docker provides an easy and isolated environment to run Elasticsearch without worrying about dependencies or system configurations. Below is a simple docker-compose.yml configuration using the official Elasticsearch image:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.10.2
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
    ports:
      - 9200:9200
    volumes:
      - esdata:/usr/share/elasticsearch/data
volumes:
  esdata:
Start the container using: docker compose up -d. Once started, Elasticsearch will be available at http://localhost:9200. You can verify it by running: curl http://localhost:9200. Next, we create an index named products with a custom analyzer. This index will have two main fields:
- name — a text field using a lowercase analyzer for analyzed text searches
- name.keyword — a keyword subfield for exact match, wildcard, and regexp queries
1.1.1 Index Creation with Mappings and Analyzer
This command creates the products index with custom mappings and a lowercase analyzer to ensure consistent tokenization and case-insensitive search.
curl -X PUT "http://localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lc_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "lc_analyzer",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "description": { "type": "text", "analyzer": "lc_analyzer" }
    }
  }
}
'
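Before indexing any data, you can sanity-check the custom analyzer with the _analyze API. The request below runs lc_analyzer against a sample string and should return the lowercase tokens red, cotton, and tshirt:
curl -X GET "http://localhost:9200/products/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "lc_analyzer",
  "text": "Red Cotton Tshirt"
}
'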
Now, let’s index some sample documents using the _bulk API for efficiency:
curl -X POST "http://localhost:9200/products/_bulk?refresh=wait_for" -H 'Content-Type: application/json' -d'
{ "index": { "_id": 1 }}
{ "name": "Red cotton tshirt", "description": "Comfortable red shirt" }
{ "index": { "_id": 2 }}
{ "name": "Blue denim jeans", "description": "Dark blue jeans" }
{ "index": { "_id": 3 }}
{ "name": "Green cotton shorts", "description": "Light green shorts" }
{ "index": { "_id": 4 }}
{ "name": "Black leather jacket", "description": "Stylish jacket" }
'
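Since the bulk request uses refresh=wait_for, the documents are searchable as soon as the call returns. You can confirm this with the _count API, which should report a count of 4:
curl -X GET "http://localhost:9200/products/_count"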
After indexing, you can start querying your data. To exclude documents containing certain terms or patterns, Elasticsearch provides the bool query with a must_not clause. This allows you to implement “not contains” logic efficiently.
2. Code Examples
2.1 Regexp Queries with must_not for Exclusions
Use regular expression queries against the keyword subfield for predictable substring/regex semantics. Regexp queries on analyzed text are less predictable because of tokenization.
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        { "regexp": { "name.keyword": ".*tshirt.*" }},
        { "regexp": { "name.keyword": ".*jacket" }}
      ]
    }
  }
}
'
This query uses a bool query with a must clause containing match_all to consider all documents initially, and a must_not clause with two regexp queries applied on the name.keyword field. Using keyword ensures the entire string is matched without tokenization, making the regex predictable. The first pattern .*tshirt.* excludes any document whose name contains “tshirt” anywhere, while the second pattern .*jacket excludes documents whose name ends with “jacket”. Note that Lucene regular expressions are implicitly anchored to the whole string, so no $ anchor is needed; $ would in fact be treated as a literal character. Together, these rules filter out unwanted products and return only the documents that do not match either pattern.
{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "hits": [
      { "_id": "2", "_source": { "name": "Blue denim jeans", "description": "Dark blue jeans" } },
      { "_id": "3", "_source": { "name": "Green cotton shorts", "description": "Light green shorts" } }
    ]
  }
}
In this example, documents containing “tshirt” or ending with “jacket” in the name field are excluded, leaving only the remaining products.
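Keep in mind that name.keyword preserves the original casing, so these regexp exclusions are case-sensitive. In Elasticsearch 7.10 and later, the regexp query accepts a case_insensitive parameter. As a sketch, the following variant would exclude any name containing “red” in any casing, dropping “Red cotton tshirt” from the results:
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        { "regexp": { "name.keyword": { "value": ".*red.*", "case_insensitive": true } }}
      ]
    }
  }
}
'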
2.2 Wildcard Queries with must_not for Exclusions
Wildcard queries are simpler for basic substring-style exclusions. Use them on the keyword field. Beware: wildcards that start with * can be slow on large datasets.
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        { "wildcard": { "name.keyword": "*cotton*" }}
      ]
    }
  }
}
'
This query uses a bool query with match_all in the must clause to initially consider all documents, and a must_not clause containing a wildcard query on the name.keyword field. The wildcard pattern *cotton* excludes any document whose name contains the substring “cotton”. Using the keyword subfield ensures the entire field value is evaluated without tokenization. While wildcard queries are simple and intuitive for substring matching, patterns with leading asterisks (*) can be slower on large datasets because they require scanning many terms in the index.
{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "hits": [
      { "_id": "2", "_source": { "name": "Blue denim jeans", "description": "Dark blue jeans" } },
      { "_id": "4", "_source": { "name": "Black leather jacket", "description": "Stylish jacket" } }
    ]
  }
}
In this output, the query has excluded all documents where the name field contains the substring “cotton”. As a result, only the documents with names “Blue denim jeans” and “Black leather jacket” are returned. Documents like “Red cotton tshirt” and “Green cotton shorts” are filtered out because they match the wildcard pattern *cotton*.
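When the pattern is anchored at the start of the value, a prefix query is a cheaper alternative to a leading wildcard, because Elasticsearch can seek directly to the matching terms instead of scanning the term dictionary. As a sketch, this variant excludes products whose name starts with “Red”, returning everything except “Red cotton tshirt”:
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        { "prefix": { "name.keyword": "Red" }}
      ]
    }
  }
}
'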
2.3 Query String Queries with must_not for Exclusions
query_string supports expressive Lucene syntax (wildcards, boolean operators). It can search multiple fields and honor analyzers when used on text fields.
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        {
          "query_string": {
            "fields": ["description"],
            "query": "dark OR comfortable"
          }
        }
      ]
    }
  }
}
'
This query uses a bool query with match_all in the must clause to initially consider all documents, and a must_not clause containing a query_string query on the description field. The query string “dark OR comfortable” tells Elasticsearch to exclude any document whose description contains either the word “dark” or “comfortable”. Because description is a text field with a lowercase analyzer, the matching is case-insensitive and token-based. This approach allows complex exclusion logic using Boolean operators and multiple fields while still respecting analyzers.
{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "hits": [
      { "_id": "3", "_source": { "name": "Green cotton shorts", "description": "Light green shorts" } },
      { "_id": "4", "_source": { "name": "Black leather jacket", "description": "Stylish jacket" } }
    ]
  }
}
In this output, the query has excluded documents whose description contains the words “dark” or “comfortable”. Therefore, documents like “Red cotton tshirt” and “Blue denim jeans” are filtered out, leaving only “Green cotton shorts” and “Black leather jacket” in the search results.
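Because query_string can target several fields at once, a single must_not clause can also exclude matches across both name and description. As a sketch against our mapping, the following would exclude “Blue denim jeans” (blue in the name) and “Black leather jacket” (jacket in both fields), leaving only the two cotton products:
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        {
          "query_string": {
            "fields": ["name", "description"],
            "query": "blue OR jacket"
          }
        }
      ]
    }
  }
}
'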
2.4 Filtering with must_not Using Match and a Custom Analyzer
A match query is the most common choice for natural-language fields. To exclude documents that contain a token, ensure your analyzer tokenizes the content the way you expect. We’ll demonstrate how a custom analyzer affects the results.
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "must_not": [
        { "match": { "name": "blue" }}
      ]
    }
  }
}
'
This query uses a bool query with match_all in the must clause to initially consider all documents, and a must_not clause with a match query on the name field to exclude any document containing the token “blue”. Because the name field uses a custom lowercase analyzer (lc_analyzer), the matching is case-insensitive, so “Blue” and “blue” are treated the same.
{
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "hits": [
      { "_id": "1", "_source": { "name": "Red cotton tshirt", "description": "Comfortable red shirt" } },
      { "_id": "3", "_source": { "name": "Green cotton shorts", "description": "Light green shorts" } }
    ]
  }
}
In this output, the document “Blue denim jeans” is excluded because its name contains the token “blue”, leaving only the remaining products in the search results.
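In real applications, the must clause is rarely just match_all: combining a positive match with must_not expresses “contains X but not Y”. As a sketch, the following should return only “Red cotton tshirt” from the sample data, since it is the only cotton product whose name does not contain “shorts”:
curl -X GET "http://localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match": { "name": "cotton" }},
      "must_not": [
        { "match": { "name": "shorts" }}
      ]
    }
  }
}
'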
3. Conclusion
There are multiple ways to express “not contains” semantics in Elasticsearch. The right approach depends on a few factors: whether you need token-aware (analyzed) exclusions or literal substring exclusions; the performance trade-offs, since wildcard and regexp queries can be expensive; and whether you need case-insensitive or partial-token matching, which analyzers such as lowercase or n-gram filters can provide.


