Let's build our own AI-powered document search application. In this guide, we'll go from zero to a fully working Retrieval-Augmented Generation (RAG) document search system using Spring AI, Ollama, and ChromaDB.
1. Project Setup
Add the required Maven dependencies for building a document search app with Spring AI, Ollama, and ChromaDB. We are using Spring Boot version 4.0.5.
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-chroma</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
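Since the dependencies above omit explicit versions, they are expected to be managed by the Spring AI BOM. A typical dependencyManagement entry looks like this (the version shown is an example; use the release matching your setup):

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```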
2. Environment Setup
Ollama Setup
Ollama is a lightweight tool that lets you run and manage large language models (LLMs) locally on your machine, enabling intelligent query understanding and response generation for our document search application.
Install and run Ollama locally. You can pull the required models like:
ollama pull llama3
You can refer to my previous article, which covers setting it up locally in greater detail. I have it running locally for this demo.
ChromaDB Setup
ChromaDB is an open-source vector database designed to store and search embeddings efficiently. It powers our document search application by retrieving the most relevant document chunks based on semantic similarity.
Setting up ChromaDB in a local environment is very straightforward; you can follow the official docs.
Start ChromaDB locally:
chroma run --host 0.0.0.0 --port 8000 --path C:\D\soft\chroma
Verify if ChromaDB is running:
curl http://127.0.0.1:8000/api/v2/heartbeat
{
"nanosecond heartbeat": 1775841192193573200
}
3. Spring AI ChromaDB Configuration
Spring AI provides Spring Boot auto-configuration for the Chroma Vector Store. This acts as the core storage layer for our document search system, enabling fast semantic retrieval of indexed documents.
spring:
  ai:
    vectorstore:
      chroma:
        client:
          host: http://127.0.0.1
          port: 8000
        initialize-schema: true
        collection-name: knowledge_store
With this configuration, we can directly autowire VectorStore in our document ingestion and search pipeline.
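With VectorStore autowired, a quick semantic lookup can be sketched like this (a minimal illustration; the query text and topK value are example choices):

```java
@Autowired
private VectorStore vectorStore;

public List<Document> findRelevant(String query) {
    // Embeds the query and returns the closest chunks from ChromaDB
    return vectorStore.similaritySearch(
            SearchRequest.builder()
                    .query(query)
                    .topK(5)
                    .build());
}
```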
4. Spring AI Ollama Configuration
Spring AI provides Spring Boot auto-configuration for Ollama to handle embeddings and natural language queries for semantic document search.
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2
          temperature: 0.7
      embedding:
        options:
          model: nomic-embed-text
      init:
        pull-model-strategy: never
        timeout: 60s
Tip: nomic-embed-text is great for high-quality embeddings.
With this configuration, Spring AI calls EmbeddingModel under the hood for generating embeddings while indexing and searching documents.
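If you want to see what the auto-configured EmbeddingModel produces, you can call it directly (a small sketch; the bean is provided by Spring AI's Ollama auto-configuration):

```java
@Autowired
private EmbeddingModel embeddingModel;

public void inspectEmbedding() {
    // nomic-embed-text returns a fixed-size float vector per input text
    float[] vector = embeddingModel.embed("Spring AI document search");
    log.info("Embedding dimension: {}", vector.length);
}
```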
5. Document Ingestion
This is a critical step in building a document search app. In this step, raw documents are converted into searchable chunks, embedded, and stored in ChromaDB.
- Accept raw documents
- Split into searchable chunks
- Convert chunks -> embeddings
- Store in vector DB
File Ingestion
This method processes an uploaded file by detecting its type (PDF or others), extracting its content, and preparing it for indexing in the document search system.
public int ingestFile(MultipartFile file) {
String filename = file.getOriginalFilename() != null
? file.getOriginalFilename() : "unknown";
String contentType = file.getContentType() != null
? file.getContentType() : "application/octet-stream";
log.info("Ingesting file '{}' , size={} bytes", filename, file.getSize());
try {
Resource resource = toResource(file, file.getOriginalFilename());
List<Document> documents;
if (isPdf(filename, contentType)) {
documents = readPdf(resource);
} else {
documents = readWithTika(resource);
}
return splitAndStore(documents, filename);
} catch (IOException ex) {
throw new RuntimeException("Failed to read file: " + filename, ex);
}
}
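The toResource and isPdf helpers are not shown above; a possible implementation (an assumption, not the author's exact code) could be:

```java
private boolean isPdf(String filename, String contentType) {
    return "application/pdf".equalsIgnoreCase(contentType)
            || filename.toLowerCase().endsWith(".pdf");
}

private Resource toResource(MultipartFile file, String filename) throws IOException {
    // Wrap the upload in an in-memory Resource that keeps the original filename
    return new ByteArrayResource(file.getBytes()) {
        @Override
        public String getFilename() {
            return filename;
        }
    };
}
```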
Ingesting PDF Files
This method reads PDF content using spring-ai-pdf-document-reader and converts it into structured documents ready for indexing and search.
private List<Document> readPdf(Resource resource) {
var config = PdfDocumentReaderConfig.builder()
.withPageExtractedTextFormatter(
ExtractedTextFormatter.builder()
.withNumberOfBottomTextLinesToDelete(3) // strip footers
.withNumberOfTopPagesToSkipBeforeDelete(1)
.build())
.withPagesPerDocument(1)
.build();
return new PagePdfDocumentReader(resource, config).get();
}
Key TokenTextSplitter Configuration
This method splits documents into smaller chunks using a token-based splitter and enriches them with metadata to improve search relevance.
private int splitAndStore(List<Document> documents, String source) {
documents.forEach(doc -> doc.getMetadata().putAll(buildMetadata(source)));
var splitter = TokenTextSplitter.builder()
.withChunkSize(CHUNK_SIZE)
.withMinChunkSizeChars(MIN_CHUNK_SIZE_CHARS)
.withMinChunkLengthToEmbed(5)
.withMaxNumChunks(10_000)
.withKeepSeparator(true)
.build();
List<Document> chunks = splitter.apply(documents);
log.info("Split '{}' into {} chunks (chunkSize={}, minChunkSizeChars={})",
source, chunks.size(), CHUNK_SIZE, MIN_CHUNK_SIZE_CHARS);
// Embed and store - Spring AI calls EmbeddingModel under the hood
vectorStore.add(chunks);
int count = chunks.size();
totalChunks.addAndGet(count);
log.info("Stored {} chunks from '{}'. Total in store: {}", count, source, totalChunks.get());
return count;
}
- withChunkSize - 300-500 tokens is a good starting range
- withMinChunkSizeChars - avoids tiny fragments
- withMinChunkLengthToEmbed - skips chunks too short to embed usefully
- withMaxNumChunks - safety limit against runaway splitting
- withKeepSeparator - preserves readability of chunks
- Chunk overlap - 30-50 tokens of overlap between chunks also helps retrieval; note that TokenTextSplitter's builder does not expose an overlap option, so this requires a splitter that supports it
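The buildMetadata helper used in splitAndStore is not shown above; one possible shape (hypothetical, attaching the source filename and an ingestion timestamp) is:

```java
private Map<String, Object> buildMetadata(String source) {
    // Metadata is stored alongside each chunk and can later be used for filtering
    return Map.of(
            "source", source,
            "ingestedAt", Instant.now().toString());
}
```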
Ingesting Sample Documents
We are now ready to index documents for our document search system. Start the application and make the following HTTP request to upload a document. This will generate embeddings and store them in ChromaDB.
curl --location 'http://localhost:8080/api/documents/upload' \
--form 'file=@sample-docs/spring-ai-overview.txt'
With document ingestion complete, our searchable document index is now ready. Let's move ahead and implement the query and retrieval system.
6. Document Search System
Search Client Configuration
Configure Spring AI client for processing user queries and interacting with Ollama.
@Bean
public ChatClient chatClient(ChatModel chatModel) {
return ChatClient.builder(chatModel)
.defaultSystem(SYSTEM_PROMPT)
.build();
}
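The SYSTEM_PROMPT constant is not defined above; an illustrative example (adjust the wording for your use case) might be:

```java
private static final String SYSTEM_PROMPT = """
        You are a document search assistant. Answer questions using only
        the context retrieved from the indexed documents. If the context
        does not contain the answer, say that you don't know.
        """;
```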
Search Memory
This configuration maintains recent interactions to improve query understanding and search relevance.
@Bean
public ChatMemory chatMemory() {
return MessageWindowChatMemory.builder()
.chatMemoryRepository(new InMemoryChatMemoryRepository())
.maxMessages(20)
.build();
}
Search Service
This service handles user queries, performs semantic search on indexed documents, and streams relevant results.
public SseEmitter chatStream(ChatRequest request) {
    SseEmitter emitter = new SseEmitter(120_000L);
    Flux<String> tokenStream = chatClient.prompt()
            .user(request.getMessage())
            .advisors(buildAdvisors(request.getConversationId()))
            .stream()
            .content();
    // Forward each token to the client and close the stream when done
    tokenStream.subscribe(
            token -> { try { emitter.send(token); } catch (IOException ex) { emitter.completeWithError(ex); } },
            emitter::completeWithError,
            emitter::complete);
    return emitter;
}
Spring AI Advisors
This method builds a list of advisors by combining query context with a RAG-based retrieval mechanism to fetch relevant document chunks.
private List<Advisor> buildAdvisors(String conversationId) {
List<Advisor> advisors = new ArrayList<>();
advisors.add(MessageChatMemoryAdvisor.builder(chatMemory)
.conversationId(conversationId)
.build());
advisors.add(QuestionAnswerAdvisor.builder(vectorStore)
.searchRequest(ragSearchRequest())
.build());
return advisors;
}
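The ragSearchRequest() method referenced above is not shown; a reasonable sketch using Spring AI's SearchRequest builder (the topK and threshold values are example choices) is:

```java
private SearchRequest ragSearchRequest() {
    return SearchRequest.builder()
            .topK(5)                   // retrieve the 5 closest chunks
            .similarityThreshold(0.5)  // drop weakly related matches
            .build();
}
```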
7. Searching Documents
These endpoints allow users to search documents using natural language queries, supporting both streaming (real-time responses via SSE) and standard synchronous responses.
@PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter chatStream(@RequestBody ChatRequest request) {
log.info("Streaming chat request [conversationId={}]", request.getConversationId());
return chatService.chatStream(request);
}
@PostMapping
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest request) {
log.info("Chat request [conversationId={}]", request.getConversationId());
return ResponseEntity.ok(chatService.chat(request));
}
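The ChatRequest and ChatResponse payloads are not shown above; a minimal shape matching the getters used earlier (an assumption about the author's DTOs) could be:

```java
public class ChatRequest {
    private String message;
    private String conversationId;

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }
    public String getConversationId() { return conversationId; }
    public void setConversationId(String conversationId) { this.conversationId = conversationId; }
}

public record ChatResponse(String answer, String conversationId) { }
```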
The overall flow looks like this: User Query → Embedding → Vector Search → Relevant Document Chunks → Ollama → Final Response
Conclusion
We've successfully built an AI-powered, RAG-based document search application using Spring AI, Ollama, and ChromaDB. This architecture is scalable, runs entirely on local infrastructure, and is well suited to building intelligent search experiences over your own documents.