This complete developer guide shows you how to build semantic search in Elasticsearch 9 using the multilingual-e5-small embedding model.
Unlike traditional keyword search (BM25), semantic search uses vector embeddings to understand meaning and context.
In my last article, we implemented Elasticsearch vector search in Spring Boot. Here, we focus on the setup and on executing JSON queries directly against Elasticsearch endpoints.
What We Will Learn
- How to install Python and required ML dependencies
- How to import a HuggingFace model into Elasticsearch using Eland
- How to deploy ML models in Elasticsearch
- How to create a dense_vector index
- How to build an ingest pipeline for automatic embeddings
- How to index documents with vectors
- How to perform kNN semantic search
- How to implement hybrid search (semantic + keyword)
Step 1: Verify Elasticsearch ML Capabilities
GET https://localhost:9200/_ml/info
If ML is not enabled, enable it in elasticsearch.yml:
xpack.ml.enabled: true
Step 2: Install Python and Required Packages
Install Python 3.9+
python --version
Download the installer from python.org. Python 3.11 is currently a stable choice for these ML tools.
Install Eland and Transformers
py -3.11 -m pip install "eland[pytorch]"
This will install:
- torch
- transformers
- tqdm
- sentencepiece
- required ML libraries
Why These Packages?
- torch -> Required for model execution
- transformers -> Loads HuggingFace models
- eland -> Imports ML models into Elasticsearch
Step 3: Import Multilingual-E5 Model into Elasticsearch
We will import: intfloat/multilingual-e5-small
C:\Users\only2\AppData\Local\Programs\Python\Python311\Scripts\eland_import_hub_model.exe `
  --url https://localhost:9200 `
  -u elastic `
  -p <password> `
  --ca-certs "C:\D\soft\elasticsearch-9.2.5\config\certs\http_ca.crt" `
  --hub-model-id intfloat/multilingual-e5-small `
  --task-type text_embedding `
  --es-model-id multilingual-e5-small
Model import and deployment are part of Elastic ML, which requires a trial or Enterprise license.
Step 4: Deploy the Model
POST https://localhost:9200/_ml/trained_models/multilingual-e5-small/deployment/_start
Test Inference
POST https://localhost:9200/_ml/trained_models/multilingual-e5-small/_infer
{
"docs": [
{
"text_field": "Elasticsearch enables semantic search."
}
]
}
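If the model is deployed correctly, the response wraps the embedding in an inference_results array. The shape below is a sketch with the 384 values abbreviated:

{
  "inference_results": [
    {
      "predicted_value": [0.0132, -0.0417, ...]
    }
  ]
}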
Step 5: Create a Vector Index Using dense_vector
PUT https://localhost:9200/semantic-docs
{
"mappings": {
"properties": {
"content": { "type": "text" },
"content_vector": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "cosine",
"index_options": {
"type": "bbq_hnsw",
"m": 16,
"ef_construction": 100
}
}
}
}
}
Important: multilingual-e5-small outputs 384-dimensional embeddings, so dims in the mapping must be 384.
Step 6: Create an Ingest Pipeline for Automatic Embedding Generation
PUT https://localhost:9200/_ingest/pipeline/semantic-pipeline
{
"processors": [
{
"inference": {
"model_id": "multilingual-e5-small",
"field_map": {
"content": "text_field"
},
"target_field": "ml_output"
}
},
{
"script": {
"source": "ctx.content_vector = ctx.ml_output.predicted_value; ctx.remove('ml_output');"
}
}
]
}
Step 7: Index Documents with Embeddings
POST https://localhost:9200/semantic-docs/_doc?pipeline=semantic-pipeline
{
"content": "Vector embeddings help computers understand natural language meaning."
}
Repeat for multiple documents.
{
"content": "Elasticsearch enables semantic search using vector embeddings."
}
{
"content": "Machine learning improves information retrieval accuracy."
}
{
"content": "Spring Boot integrates easily with Elasticsearch for search applications."
}
{
"content": "Traditional keyword search relies on exact term matching."
}
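Instead of one request per document, the documents above can also be sent in a single _bulk request (NDJSON: one action line, then one document per line); the pipeline parameter still applies:

POST https://localhost:9200/semantic-docs/_bulk?pipeline=semantic-pipeline
{ "index": {} }
{ "content": "Vector embeddings help computers understand natural language meaning." }
{ "index": {} }
{ "content": "Elasticsearch enables semantic search using vector embeddings." }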
Step 8: Perform kNN Semantic Search
POST https://localhost:9200/semantic-docs/_search
{
  "knn": {
    "field": "content_vector",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "multilingual-e5-small",
        "model_text": "machine learning search"
      }
    },
    "k": 3,
    "num_candidates": 100
  }
}
Elasticsearch:
- Generates an embedding for the query text
- Computes cosine similarity against the stored vectors
- Returns the nearest neighbors
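What "cosine similarity" means here can be sketched in a few lines of Python. With similarity: cosine, Elasticsearch ranks hits by (1 + cosine) / 2 so scores stay positive; the tiny vectors below are stand-ins for the real 384-dimensional embeddings:

```python
import math

def cosine(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def es_cosine_score(a, b):
    # With similarity: cosine, Elasticsearch scores hits as (1 + cosine) / 2
    return (1 + cosine(a, b)) / 2

# Tiny stand-ins for real 384-dimensional embeddings
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.6, 1.0]  # parallel to query_vec, so cosine is 1.0
score = es_cosine_score(query_vec, doc_vec)
```

Parallel vectors score 1.0, orthogonal ones 0.5, which is why higher scores mean closer meaning.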
Hybrid Search (Semantic + BM25)
Combine keyword and vector search for production systems:
POST https://localhost:9200/semantic-docs/_search
{
"knn": {
"field": "content_vector",
"query_vector_builder": {
"text_embedding": {
"model_id": "multilingual-e5-small",
"model_text": "machine learning search"
}
},
"k": 3,
"num_candidates": 100
},
"query": {
"match": {
"content": "machine learning"
}
}
}
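When knn and query appear together as above, Elasticsearch sums their scores per document. Conceptually this is a score fusion, sketched below as a weighted sum over hypothetical per-document scores (the weights, document ids, and score values are illustrative, not values Elasticsearch produces):

```python
def fuse(bm25_scores, knn_scores, bm25_weight=1.0, knn_weight=1.0):
    """Combine keyword (BM25) and vector (kNN) scores per document id.

    A document missing from one result set contributes 0 for that part,
    mirroring how a hit found only by kNN still gets a final score.
    """
    doc_ids = set(bm25_scores) | set(knn_scores)
    fused = {
        doc_id: bm25_weight * bm25_scores.get(doc_id, 0.0)
              + knn_weight * knn_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores for three documents
bm25 = {"doc_1": 2.4, "doc_2": 0.9}
knn = {"doc_2": 0.95, "doc_3": 0.80}
ranking = fuse(bm25, knn)
```

Tuning the weights (via boost values in the real query) lets you decide how much exact-term matching should outrank semantic closeness.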
Why Use Semantic Search in Elasticsearch?
- Understands context instead of exact words
- Handles synonyms naturally
- Improves relevance dramatically
- Works across multiple languages
Document Indexing with Semantic Embeddings
Once you have text embeddings working, the next natural step is document-level semantic search. Instead of treating a document as a single block of text, we break it into smaller chunks: paragraphs, sections, or fixed word-length segments. Each chunk is then converted into a dense vector using the multilingual-e5-small model, making it searchable in a semantic vector space. This allows you to find not just the right document, but the exact passage or sentence that matches a query.
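A minimal chunker for the fixed word-length strategy might look like the sketch below; the overlap parameter is a common addition so that a sentence split across a boundary still appears whole in one chunk, not something this guide's pipeline requires:

```python
def chunk_words(text, max_words=100, overlap=20):
    """Split text into overlapping word-window chunks for per-chunk embedding."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already covers the tail
    return chunks

# A 250-word document becomes three overlapping 100-word chunks
chunks = chunk_words(" ".join(str(i) for i in range(250)))
```

Each returned chunk would then be indexed as its own document with its document_id, page_number, and chunk_index.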
Index Setup
We define an Elasticsearch index with fields for document_id, page_number, chunk_index, content, and content_vector. If you created the simpler semantic-docs index earlier, delete it first (DELETE https://localhost:9200/semantic-docs), since a PUT against an existing index fails:
PUT https://localhost:9200/semantic-docs
{
"mappings": {
"properties": {
"document_id": { "type": "keyword" },
"page_number": { "type": "integer" },
"chunk_index": { "type": "integer" },
"content": { "type": "text" },
"content_vector": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "cosine",
"index_options": {
"type": "bbq_hnsw",
"m": 16,
"ef_construction": 100
}
}
}
}
}
Ingest Pipeline
Next, we set up an ingest pipeline that automatically converts each chunk into an embedding when it's indexed:
PUT https://localhost:9200/_ingest/pipeline/semantic-pipeline
{
"processors": [
{
"inference": {
"model_id": "multilingual-e5-small",
"field_map": { "content": "text_field" },
"target_field": "ml_output",
"inference_config": { "text_embedding": {} }
}
},
{
"script": {
"source": "ctx.content_vector = ctx.ml_output.predicted_value; ctx.remove('ml_output');"
}
}
]
}
Indexing Document Chunks
When a document is indexed, each chunk passes through the pipeline, and the resulting vector is stored alongside the text:
POST https://localhost:9200/semantic-docs/_doc?pipeline=semantic-pipeline
{
"document_id": "doc_001",
"page_number": 1,
"chunk_index": 0,
"content": "Elasticsearch is a distributed search engine designed for full-text and semantic search."
}
Semantic Search in Documents
You can then perform semantic searches to find the most relevant chunks:
POST https://localhost:9200/semantic-docs/_search
{
"size": 3,
"knn": {
"field": "content_vector",
"query_vector_builder": {
"text_embedding": {
"model_id": "multilingual-e5-small",
"model_text": "How does Elasticsearch understand text meaning?"
}
},
"k": 3,
"num_candidates": 100
}
}
The response includes metadata like document_id, page_number, and chunk_index, letting you pinpoint the exact passage that matches the query. You can also aggregate results by document_id for top document-level matches. This approach extends your text embedding pipeline with almost no extra work, and any new document can be chunked and indexed to immediately support semantic search.
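Collapsing chunk-level hits into document-level results, as described above, can be sketched client-side. The hits list below mirrors the shape of an Elasticsearch response (_source fields as in this guide; the scores are hypothetical):

```python
def top_documents(hits, size=2):
    """Roll chunk hits up to documents, keeping each document's best chunk."""
    best = {}
    for hit in hits:
        doc_id = hit["_source"]["document_id"]
        if doc_id not in best or hit["_score"] > best[doc_id]["_score"]:
            best[doc_id] = hit
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:size]

# Hypothetical chunk hits: doc_001 matched twice, doc_002 once
hits = [
    {"_score": 0.97, "_source": {"document_id": "doc_001", "chunk_index": 2}},
    {"_score": 0.91, "_source": {"document_id": "doc_002", "chunk_index": 0}},
    {"_score": 0.88, "_source": {"document_id": "doc_001", "chunk_index": 5}},
]
top = top_documents(hits)
```

Server-side, a terms aggregation or field collapsing on document_id achieves a similar roll-up without post-processing.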
Frequently Asked Questions (FAQ)
What is dense_vector in Elasticsearch?
dense_vector is a field type that stores numerical vector embeddings for similarity search.
What is multilingual-e5-small?
A HuggingFace embedding model that generates 384-dimensional sentence embeddings.
What similarity metric should I use?
Cosine similarity is recommended for E5 models.
Conclusion
We now have a complete semantic search pipeline, from model import to hybrid queries, built entirely inside Elasticsearch.