This complete developer guide shows you how to build semantic search in Elasticsearch 9 using the multilingual-e5-small embedding model.
Unlike traditional keyword search (BM25), semantic search uses vector embeddings to understand meaning and context.
In my last article, we implemented Elasticsearch vector search in Spring Boot. Here, we focus on the setup and on executing JSON queries directly against Elasticsearch endpoints.
What We Will Learn
- How to install Python and required ML dependencies
- How to import a HuggingFace model into Elasticsearch using Eland
- How to deploy ML models in Elasticsearch
- How to create a dense_vector index
- How to build an ingest pipeline for automatic embeddings
- How to index documents with vectors
- How to perform kNN semantic search
- How to implement hybrid search (semantic + keyword)
Step 1: Verify Elasticsearch ML Capabilities
GET https://localhost:9200/_ml/info
If ML is not enabled, enable it in elasticsearch.yml:
xpack.ml.enabled: true
Step 2: Install Python and Required Packages
Install Python 3.9+
python --version
Download the installer from python.org. Python 3.11 is currently a stable choice for these ML tools.
Install Eland and Transformers
py -3.11 -m pip install "eland[pytorch]"
This will install:
- torch
- transformers
- tqdm
- sentencepiece
- required ML libraries
Why These Packages?
- torch -> Required for model execution
- transformers -> Loads HuggingFace models
- eland -> Imports ML models into Elasticsearch
Step 3: Import Multilingual-E5 Model into Elasticsearch
We will import: intfloat/multilingual-e5-small
C:\Users\only2\AppData\Local\Programs\Python\Python311\Scripts\eland_import_hub_model.exe `
  --url https://localhost:9200 `
  -u elastic `
  -p <password> `
  --ca-certs "C:\D\soft\elasticsearch-9.2.5\config\certs\http_ca.crt" `
  --hub-model-id intfloat/multilingual-e5-small `
  --task-type text_embedding `
  --es-model-id multilingual-e5-small
Model import and deployment are part of Elastic ML, which requires a trial or Enterprise license.
Step 4: Deploy the Model
POST https://localhost:9200/_ml/trained_models/multilingual-e5-small/deployment/_start
Test Inference
POST https://localhost:9200/_ml/trained_models/multilingual-e5-small/_infer
{
"docs": [
{
"text_field": "Elasticsearch enables semantic search."
}
]
}
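If the model is deployed correctly, the response wraps the embedding in an inference_results array. The shape below is a sketch with the 384 values abbreviated:

{
  "inference_results": [
    {
      "predicted_value": [0.0132, -0.0417, ...]
    }
  ]
}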
Step 5: Create a Vector Index Using dense_vector
PUT https://localhost:9200/semantic-docs
{
"mappings": {
"properties": {
"content": { "type": "text" },
"content_vector": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "cosine",
"index_options": {
"type": "bbq_hnsw",
"m": 16,
"ef_construction": 100
}
}
}
}
}
Important: multilingual-e5-small outputs 384-dimensional embeddings, so dims in the mapping must be 384.
Step 6: Create an Ingest Pipeline for Automatic Embedding Generation
PUT https://localhost:9200/_ingest/pipeline/semantic-pipeline
{
"processors": [
{
"inference": {
"model_id": "multilingual-e5-small",
"field_map": {
"content": "text_field"
},
"target_field": "ml_output"
}
},
{
"script": {
"source": "ctx.content_vector = ctx.ml_output.predicted_value; ctx.remove('ml_output');"
}
}
]
}
Step 7: Index Documents with Embeddings
POST https://localhost:9200/semantic-docs/_doc?pipeline=semantic-pipeline
{
"content": "Vector embeddings help computers understand natural language meaning."
}
Repeat for multiple documents.
{
"content": "Elasticsearch enables semantic search using vector embeddings."
}
{
"content": "Machine learning improves information retrieval accuracy."
}
{
"content": "Spring Boot integrates easily with Elasticsearch for search applications."
}
{
"content": "Traditional keyword search relies on exact term matching."
}
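Instead of one request per document, the documents above can also be sent in a single _bulk request (NDJSON: one action line, then one document per line); the pipeline parameter still applies:

POST https://localhost:9200/semantic-docs/_bulk?pipeline=semantic-pipeline
{ "index": {} }
{ "content": "Vector embeddings help computers understand natural language meaning." }
{ "index": {} }
{ "content": "Elasticsearch enables semantic search using vector embeddings." }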
Step 8: Perform kNN Semantic Search
POST https://localhost:9200/semantic-docs/_search
{
  "knn": {
    "field": "content_vector",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "multilingual-e5-small",
        "model_text": "machine learning search"
      }
    },
    "k": 3,
    "num_candidates": 100
  }
}
Elasticsearch:
- Generates an embedding for the query text
- Computes cosine similarity against the stored vectors
- Returns the nearest neighbors
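What "cosine similarity" means here can be sketched in a few lines of Python. With similarity: cosine, Elasticsearch ranks hits by (1 + cosine) / 2 so scores stay positive; the tiny vectors below are stand-ins for the real 384-dimensional embeddings:

```python
import math

def cosine(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def es_cosine_score(a, b):
    # With similarity: cosine, Elasticsearch scores hits as (1 + cosine) / 2
    return (1 + cosine(a, b)) / 2

# Tiny stand-ins for real 384-dimensional embeddings
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.6, 1.0]  # parallel to query_vec, so cosine is 1.0
score = es_cosine_score(query_vec, doc_vec)
```

Parallel vectors score 1.0, orthogonal ones 0.5, which is why higher scores mean closer meaning.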
Hybrid Search (Semantic + BM25)
Combine keyword and vector search for production systems:
POST https://localhost:9200/semantic-docs/_search
{
"knn": {
"field": "content_vector",
"query_vector_builder": {
"text_embedding": {
"model_id": "multilingual-e5-small",
"model_text": "machine learning search"
}
},
"k": 3,
"num_candidates": 100
},
"query": {
"match": {
"content": "machine learning"
}
}
}
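When knn and query appear together as above, Elasticsearch sums their scores per document. Conceptually this is a score fusion, sketched below as a weighted sum over hypothetical per-document scores (the weights, document ids, and score values are illustrative, not values Elasticsearch produces):

```python
def fuse(bm25_scores, knn_scores, bm25_weight=1.0, knn_weight=1.0):
    """Combine keyword (BM25) and vector (kNN) scores per document id.

    A document missing from one result set contributes 0 for that part,
    mirroring how a hit found only by kNN still gets a final score.
    """
    doc_ids = set(bm25_scores) | set(knn_scores)
    fused = {
        doc_id: bm25_weight * bm25_scores.get(doc_id, 0.0)
              + knn_weight * knn_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest combined score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores for three documents
bm25 = {"doc_1": 2.4, "doc_2": 0.9}
knn = {"doc_2": 0.95, "doc_3": 0.80}
ranking = fuse(bm25, knn)
```

Tuning the weights (via boost values in the real query) lets you decide how much exact-term matching should outrank semantic closeness.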
Why Use Semantic Search in Elasticsearch?
- Understands context instead of exact words
- Handles synonyms naturally
- Improves relevance dramatically
- Works across multiple languages
Document Indexing with Semantic Embeddings
Once you have text embeddings working, the next natural step is document-level semantic search. Instead of treating a document as a single block of text, we break it into smaller chunks: paragraphs, sections, or fixed word-length segments. Each chunk is then converted into a dense vector using the multilingual-e5-small model, making it searchable in a semantic vector space. This allows you to find not just the right document, but the exact passage or sentence that matches a query.
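A minimal chunker for the fixed word-length strategy might look like the sketch below; the overlap parameter is a common addition so that a sentence split across a boundary still appears whole in one chunk, not something this guide's pipeline requires:

```python
def chunk_words(text, max_words=100, overlap=20):
    """Split text into overlapping word-window chunks for per-chunk embedding."""
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already covers the tail
    return chunks

# A 250-word document becomes three overlapping 100-word chunks
chunks = chunk_words(" ".join(str(i) for i in range(250)))
```

Each returned chunk would then be indexed as its own document with its document_id, page_number, and chunk_index.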
Index Setup
We define an Elasticsearch index with fields for document_id, page_number, chunk_index, content, and content_vector. If you created the simpler semantic-docs index earlier, delete it first (DELETE https://localhost:9200/semantic-docs), since a PUT against an existing index fails:
PUT https://localhost:9200/semantic-docs
{
"mappings": {
"properties": {
"document_id": { "type": "keyword" },
"page_number": { "type": "integer" },
"chunk_index": { "type": "integer" },
"content": { "type": "text" },
"content_vector": {
"type": "dense_vector",
"dims": 384,
"index": true,
"similarity": "cosine",
"index_options": {
"type": "bbq_hnsw",
"m": 16,
"ef_construction": 100
}
}
}
}
}
Ingest Pipeline
Next, we set up an ingest pipeline that automatically converts each chunk into an embedding when it's indexed:
PUT https://localhost:9200/_ingest/pipeline/semantic-pipeline
{
"processors": [
{
"inference": {
"model_id": "multilingual-e5-small",
"field_map": { "content": "text_field" },
"target_field": "ml_output",
"inference_config": { "text_embedding": {} }
}
},
{
"script": {
"source": "ctx.content_vector = ctx.ml_output.predicted_value; ctx.remove('ml_output');"
}
}
]
}
Indexing Document Chunks
When a document is indexed, each chunk passes through the pipeline, and the resulting vector is stored alongside the text:
POST https://localhost:9200/semantic-docs/_doc?pipeline=semantic-pipeline
{
"document_id": "doc_001",
"page_number": 1,
"chunk_index": 0,
"content": "Elasticsearch is a distributed search engine designed for full-text and semantic search."
}
Semantic Search in Documents
You can then perform semantic searches to find the most relevant chunks:
POST https://localhost:9200/semantic-docs/_search
{
"size": 3,
"knn": {
"field": "content_vector",
"query_vector_builder": {
"text_embedding": {
"model_id": "multilingual-e5-small",
"model_text": "How does Elasticsearch understand text meaning?"
}
},
"k": 3,
"num_candidates": 100
}
}
The response includes metadata like document_id, page_number, and chunk_index, letting you pinpoint the exact passage that matches the query. You can also aggregate results by document_id for top document-level matches. This approach extends your text embedding pipeline with almost no extra work, and any new document can be chunked and indexed to immediately support semantic search.
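Collapsing chunk-level hits into document-level results, as described above, can be sketched client-side. The hits list below mirrors the shape of an Elasticsearch response (_source fields as in this guide; the scores are hypothetical):

```python
def top_documents(hits, size=2):
    """Roll chunk hits up to documents, keeping each document's best chunk."""
    best = {}
    for hit in hits:
        doc_id = hit["_source"]["document_id"]
        if doc_id not in best or hit["_score"] > best[doc_id]["_score"]:
            best[doc_id] = hit
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:size]

# Hypothetical chunk hits: doc_001 matched twice, doc_002 once
hits = [
    {"_score": 0.97, "_source": {"document_id": "doc_001", "chunk_index": 2}},
    {"_score": 0.91, "_source": {"document_id": "doc_002", "chunk_index": 0}},
    {"_score": 0.88, "_source": {"document_id": "doc_001", "chunk_index": 5}},
]
top = top_documents(hits)
```

Server-side, a terms aggregation or field collapsing on document_id achieves a similar roll-up without post-processing.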
Frequently Asked Questions (FAQ)
What is dense_vector in Elasticsearch?
dense_vector is a field type that stores numerical vector embeddings for similarity search.
What is multilingual-e5-small?
A HuggingFace embedding model that generates 384-dimensional sentence embeddings.
What similarity metric should I use?
Cosine similarity is recommended for E5 models.
Conclusion
We now have a complete semantic search pipeline, from model import to hybrid queries, built entirely inside Elasticsearch.