How to Build Production‑Ready Chatbots with OpenSearch
Why OpenSearch for chatbots?
OpenSearch gives you the pieces you need to power modern chatbots: fast vector search for semantic retrieval, neural sparse + hybrid search to blend keyword and semantic signals, search pipelines for query orchestration and reranking, and built‑in conversation memory plus RAG processors to talk to an LLM. That means you can keep your knowledge base in OpenSearch, retrieve the right context in milliseconds, and generate grounded answers with your LLM of choice.
Vector search
OpenSearch implements k‑nearest neighbors (k‑NN) over knn_vector fields with approximate (ANN) and exact methods, so you can store document embeddings and retrieve the most similar chunks at low latency.
Neural sparse + hybrid search
You can generate sparse embeddings (token:weight pairs) and combine them with dense vectors and keyword BM25 in a hybrid query—often improving relevance over any single technique.
Pipelines & reranking
Search pipelines orchestrate steps like query rewriting, normalization, and reranking (e.g., via Cohere/Amazon Bedrock models or by-document fields) without moving data out of OpenSearch.
RAG & conversation memory
OpenSearch provides conversation memory APIs and a RAG processor that fetches prior messages + retrieved docs and sends them to an LLM—then stores the LLM reply back into memory.
A reference architecture for an OpenSearch‑powered chatbot
At a high level, the flow looks like this: documents are chunked and embedded into an OpenSearch vector index at ingest; at query time, the user's question is embedded (dense and/or sparse) and run through a search pipeline that retrieves, normalizes, and reranks candidate chunks; finally, the RAG step sends the retrieved chunks and the conversation history to your LLM to compose the final answer. The steps below walk through each piece.
Want help building a chatbot with OpenSearch?
Dattell designs, deploys, and operates OpenSearch‑backed chatbots end‑to‑end—embedding pipelines, hybrid retrieval, search pipelines, conversation memory/RAG, and production SLAs across AWS/Azure/GCP or on‑prem.
Step 1 — Create a vector index for your knowledge base
PUT kb_docs
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "doc_id":  { "type": "keyword" },
      "title":   { "type": "text" },
      "content": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "cosinesimil",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
Step 2 — Generate embeddings automatically at ingest
PUT _ingest/pipeline/kb_embed
{
  "description": "Create embeddings for content -> embedding",
  "processors": [
    {
      "text_embedding": {
        "model_id": "",
        "field_map": { "content": "embedding" }
      }
    }
  ]
}
POST kb_docs/_doc?pipeline=kb_embed
{ "doc_id":"kb-001", "title":"Reset VPN", "content":"Steps to reset the VPN on macOS..." }
Step 3 — Retrieval: dense, sparse, and hybrid
Dense k‑NN
POST kb_docs/_search
{
  "size": 5,
  "query": {
    "knn": {
      "embedding": {
        "vector": [/* your query embedding */],
        "k": 50
      }
    }
  }
}
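If you'd rather not compute the query embedding in your application, a neural query can embed the query text server-side with the same model used at ingest (a sketch; the model_id placeholder is whatever your deployed text-embedding model returned):

POST kb_docs/_search
{
  "size": 5,
  "query": {
    "neural": {
      "embedding": {
        "query_text": "How do I reset my VPN on macOS?",
        "model_id": "<your-text-embedding-model-id>",
        "k": 50
      }
    }
  }
}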
Neural sparse + hybrid
OpenSearch can generate sparse embeddings and combine them with BM25 and/or dense vectors in a hybrid query—often a strong default for chatbots that must respect keywords, acronyms, and product names while capturing semantics.
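A sketch of what that can look like, assuming a sparse field named content_sparse populated by a sparse_encoding ingest processor and placeholder model IDs; hybrid queries are scored through the normalization step in the search pipeline set up in Step 4:

GET kb_docs/_search?search_pipeline=hybrid_pipeline
{
  "size": 5,
  "query": {
    "hybrid": {
      "queries": [
        { "match": { "content": "reset VPN macOS" } },
        {
          "neural_sparse": {
            "content_sparse": {
              "query_text": "How do I reset my VPN on macOS?",
              "model_id": "<your-sparse-encoding-model-id>"
            }
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "How do I reset my VPN on macOS?",
              "model_id": "<your-text-embedding-model-id>",
              "k": 50
            }
          }
        }
      ]
    }
  }
}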
Step 4 — Orchestrate with a search pipeline (+ optional reranking)
Search pipelines let you modularize retrieval steps server‑side: normalize scores, enrich queries, call ML for inference, and rerank results—so your app code stays simple.
PUT /_search/pipeline/hybrid_pipeline
{
  "request_processors": [
    { "filter_query": { "query": { "term": { "is_public": true } } } }
  ],
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": { "technique": "min_max" },
        "combination": { "technique": "arithmetic_mean" }
      }
    }
  ],
  "response_processors": [
    { "rerank": { "by_field": { "target_field": "rerank_score" } } }
  ]
}
Attach it per‑request:
GET kb_docs/_search?search_pipeline=hybrid_pipeline
OpenSearch documents the primitives (request/response processors) and shows end‑to‑end examples for reranking (by field or with cross‑encoders/hosted models).
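For model-based reranking, the rerank response processor can call a hosted or cross-encoder reranker instead of reading a document field. A minimal sketch, assuming you have already registered a connector-backed reranker model (the pipeline name and model ID are placeholders):

PUT /_search/pipeline/rerank_ml_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": { "model_id": "<your-reranker-model-id>" },
        "context": { "document_fields": ["content"] }
      }
    }
  ]
}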
Step 5 — Add conversation memory and RAG
OpenSearch ships conversation memory (stores message history) and a RAG processor that sends retrieved docs + prior turns to your LLM, then stores the answer back in memory.
Create memory:
POST /_plugins/_ml/memory/
{ "name": "customer-support-chat" }
Use RAG at query time:
GET /kb_docs/_search
{
  "query": { "match": { "content": "How do I reset my VPN?" } },
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "gpt-3.5-turbo",
      "llm_question": "User asks: How to reset VPN on macOS?",
      "memory_id": "",
      "context_size": 5,
      "message_size": 5
    }
  }
}
Under the hood, the pipeline fetches search results and recent messages, then calls the LLM via your connector. It persists the reply and returns both the documents and the generated answer.
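You can sanity-check what was stored by listing the messages saved in a memory (the memory ID below is a placeholder for the ID returned when you created the memory):

GET /_plugins/_ml/memory/<memory_id>/messages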
Putting it together from an app (Python)
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=False,
)

# 1) Embed the user question with the same model used at ingest (768 dimensions here)
query_vec = embed("How do I reset my VPN on macOS?")  # embed() is your own embedding function

# 2) Retrieve candidates (dense example shown; for hybrid, send a hybrid query through your pipeline)
resp = client.search(
    index="kb_docs",
    body={
        "size": 8,
        "query": {
            "knn": {"embedding": {"vector": query_vec, "k": 64}}
        },
    },
    params={"search_pipeline": "hybrid_pipeline"},  # optional
)

# 3) (Option A) Build a prompt from resp["hits"]["hits"] and call the LLM from app code
# 4) (Option B) Let OpenSearch do RAG + memory; see the REST example above
Security & governance checklist
Here are a few baseline security measures that apply to most use cases. Many deployments will have additional requirements, so don't treat this list as exhaustive.
- Enforce index-level and field-level security for confidential data.
- Use document-level security and per-tenant routing if the chatbot serves multiple teams or customers.
- Log all prompts, retrieved chunks, and model outputs for auditability.
- Add guardrails such as prompt templates, allow-listed tools, and timeouts in your pipeline and application.
Want help building a chatbot with OpenSearch?
Dattell designs, deploys, and operates OpenSearch‑backed chatbots end‑to‑end—embedding pipelines, hybrid retrieval, search pipelines, conversation memory/RAG, and production SLAs across AWS/Azure/GCP or on‑prem.
Check out our OpenSearch Support Services page for more information on how we support teams with their AI workflows.