Build an AI-Powered Document Search App (RAG) with Spring AI, Ollama & ChromaDB

Let's build our own AI-powered document search application. In this guide, we'll go from zero to a fully working RAG (Retrieval-Augmented Generation)-based document search system using Spring AI, Ollama, and ChromaDB.

1. Project Setup

Add the required Maven dependencies for building a document search app using Spring AI, Ollama, and ChromaDB. We are using Spring Boot version 4.0.5.


<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-chroma</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
 

2. Environment Setup

Ollama Setup

Ollama is a lightweight tool that lets you run and manage large language models (LLMs) locally on your machine, enabling intelligent query understanding and response generation for our document search application.

Install and run Ollama locally, then pull the models used later in this guide (a chat model and an embedding model):

ollama pull llama3.2
ollama pull nomic-embed-text

You can refer to my previous article for a more detailed guide on setting it up locally. I have it running locally for this demo.


ChromaDB Setup

ChromaDB is an open-source vector database designed to store and search embeddings efficiently. It powers our document search application by retrieving the most relevant document chunks based on semantic similarity.

Setting up ChromaDB in a local environment is very straightforward; you can follow the official docs.

Start ChromaDB locally:

chroma run --host 0.0.0.0 --port 8000 --path C:\D\soft\chroma

Verify if ChromaDB is running:

curl http://127.0.0.1:8000/api/v2/heartbeat
{
  "nanosecond heartbeat": 1775841192193573200
}

3. Spring AI ChromaDB Configuration

Spring AI provides Spring Boot auto-configuration for the Chroma Vector Store. This acts as the core storage layer for our document search system, enabling fast semantic retrieval of indexed documents.


spring:
  ai:
    vectorstore:
      chroma:
        client:
          host: http://127.0.0.1
          port: 8000
        initialize-schema: true
        collection-name: knowledge_store

With this configuration, we can directly autowire VectorStore in our document ingestion and search pipeline.
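As a quick sanity check (a minimal sketch, not part of the final pipeline; the class name is illustrative), the autowired VectorStore can be queried directly once some documents are indexed. Depending on your Spring AI version, the chunk accessor may be getContent() instead of getText():

```java
@Service
public class QuickSearchService {

    private final VectorStore vectorStore;

    public QuickSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Returns the text of the chunks most similar to the query
    public List<String> preview(String query) {
        return vectorStore.similaritySearch(query).stream()
                .map(Document::getText)
                .toList();
    }
}
```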

4. Spring AI Ollama Configuration

Spring AI provides Spring Boot auto-configuration for Ollama to handle embeddings and natural language queries for semantic document search.


spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3.2
          temperature: 0.7
      embedding:
        options:
          model: nomic-embed-text
      init:
        pull-model-strategy: never
        timeout: 60s

Tip: nomic-embed-text is great for high-quality embeddings.

With this configuration, Spring AI calls EmbeddingModel under the hood for generating embeddings while indexing and searching documents.
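If you want to see what the auto-configured EmbeddingModel produces, you can call it directly. This is an illustrative smoke test (not required by the app); nomic-embed-text typically yields 768-dimensional vectors, but the dimension depends on the model you configured:

```java
@Component
public class EmbeddingSmokeTest implements CommandLineRunner {

    private final EmbeddingModel embeddingModel;

    public EmbeddingSmokeTest(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @Override
    public void run(String... args) {
        // Embeds a sample sentence and prints the vector dimension
        float[] vector = embeddingModel.embed("Spring AI makes RAG simple");
        System.out.println("Embedding dimension: " + vector.length);
    }
}
```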

5. Document Ingestion

This is a critical step in building a document search app. In this step, raw documents are converted into searchable chunks, embedded, and stored in ChromaDB.

  • Accept raw documents
  • Split into searchable chunks
  • Convert chunks -> embeddings
  • Store in vector DB

File Ingestion

This method processes an uploaded file by detecting its type (PDF or others), extracting its content, and preparing it for indexing in the document search system.


public int ingestFile(MultipartFile file) {

    String filename = file.getOriginalFilename() != null
            ? file.getOriginalFilename() : "unknown";
    String contentType = file.getContentType() != null
            ? file.getContentType() : "application/octet-stream";
    log.info("Ingesting file '{}', size={} bytes", filename, file.getSize());

    try {
        Resource resource = toResource(file, filename);
        List<Document> documents;

        if (isPdf(filename, contentType)) {
            documents = readPdf(resource);
        } else {
            documents = readWithTika(resource);
        }
        return splitAndStore(documents, filename);

    } catch (IOException ex) {
        throw new RuntimeException("Failed to read file: " + filename, ex);
    }
}
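The method above relies on two small helpers, toResource and isPdf, which are not shown in the listing. A minimal sketch (the ByteArrayResource-backed implementation is an assumption, not the article's exact code) could look like this:

```java
// Wraps the uploaded bytes as a Resource, preserving the original filename
// so document readers can record it in metadata
private Resource toResource(MultipartFile file, String filename) throws IOException {
    byte[] bytes = file.getBytes();
    return new ByteArrayResource(bytes) {
        @Override
        public String getFilename() {
            return filename;
        }
    };
}

// Detects PDFs by content type first, falling back to the file extension
private boolean isPdf(String filename, String contentType) {
    return "application/pdf".equalsIgnoreCase(contentType)
            || filename.toLowerCase().endsWith(".pdf");
}
```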

Ingesting PDF Files

This method reads PDF content using spring-ai-pdf-document-reader and converts it into structured documents ready for indexing and search.


private List<Document> readPdf(Resource resource) {
    var config = PdfDocumentReaderConfig.builder()
            .withPageExtractedTextFormatter(
                    ExtractedTextFormatter.builder()
                            .withNumberOfBottomTextLinesToDelete(3)  // strip footers
                            .withNumberOfTopPagesToSkipBeforeDelete(1)
                            .build())
            .withPagesPerDocument(1)
            .build();

    return new PagePdfDocumentReader(resource, config).get();
}
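The non-PDF branch uses the spring-ai-tika-document-reader dependency added earlier, which handles DOCX, TXT, HTML, and other Tika-supported formats. A minimal sketch of readWithTika:

```java
// Reads any Tika-supported format into a list of Documents
private List<Document> readWithTika(Resource resource) {
    return new TikaDocumentReader(resource).get();
}
```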

Key TokenTextSplitter Configuration

This method splits documents into smaller chunks using a token-based splitter and enriches them with metadata to improve search relevance.


private int splitAndStore(List<Document> documents, String source) {
    documents.forEach(doc -> doc.getMetadata().putAll(buildMetadata(source)));

    var splitter = TokenTextSplitter.builder()
            .withChunkSize(CHUNK_SIZE)
            .withMinChunkSizeChars(MIN_CHUNK_SIZE_CHARS)
            .withMinChunkLengthToEmbed(5)
            .withMaxNumChunks(10_000)
            .withKeepSeparator(true)
            .build();

    List<Document> chunks = splitter.apply(documents);
    log.info("Split '{}' into {} chunks (chunkSize={}, minChunkSizeChars={})",
            source, chunks.size(), CHUNK_SIZE, MIN_CHUNK_SIZE_CHARS);

    // Embed and store - Spring AI calls EmbeddingModel under the hood
    vectorStore.add(chunks);

    int count = chunks.size();
    totalChunks.addAndGet(count);

    log.info("Stored {} chunks from '{}'. Total in store: {}", count, source, totalChunks.get());
    return count;
}

  • withChunkSize - 300-500 tokens recommended
  • withMinChunkSizeChars - avoids tiny fragments
  • withMinChunkLengthToEmbed - skips chunks too short to be worth embedding
  • withMaxNumChunks - safety limit against runaway documents
  • withKeepSeparator - preserves readability of chunk text

Note that the builder above configures no chunk overlap. A 30-50 token overlap between adjacent chunks usually improves retrieval quality, so add it if your splitter exposes such an option.
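splitAndStore also calls a buildMetadata helper that is not shown above. A minimal sketch (the "source" and "ingestedAt" keys are illustrative choices; in the service it can equally be a private method) that tags each chunk before embedding:

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical buildMetadata helper referenced by splitAndStore
public final class ChunkMetadata {

    private ChunkMetadata() {
    }

    public static Map<String, Object> buildMetadata(String source) {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("source", source);                       // original filename
        metadata.put("ingestedAt", Instant.now().toString()); // ingestion time
        return metadata;
    }
}
```

Storing the source filename per chunk makes it easy to show users which document an answer came from.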

Ingesting Sample Documents

We are now ready to index documents for our document search system. Start the application and make the following HTTP request to upload a document. This will generate embeddings and store them in ChromaDB.


curl --location 'http://localhost:8080/api/documents/upload' \
--form 'file=@sample-docs/spring-ai-overview.txt'

With document ingestion complete, our searchable document index is now ready. Let's move ahead and implement the query and retrieval system.

6. Document Search System

Search Client Configuration

Configure Spring AI client for processing user queries and interacting with Ollama.


@Bean
public ChatClient chatClient(ChatModel chatModel) {
    return ChatClient.builder(chatModel)
            .defaultSystem(SYSTEM_PROMPT)
            .build();
}
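The SYSTEM_PROMPT constant is not defined in the article; a typical RAG system prompt (the wording here is an illustrative assumption, adjust it to your domain) constrains the model to answer from the retrieved context:

```java
// Illustrative system prompt for a RAG assistant
public final class Prompts {

    private Prompts() {
    }

    public static final String SYSTEM_PROMPT = """
            You are a document search assistant.
            Answer strictly from the provided context.
            If the context does not contain the answer, say you don't know.
            Keep answers concise and mention the source document when possible.
            """;
}
```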

Search Memory

This configuration maintains recent interactions to improve query understanding and search relevance.


@Bean
public ChatMemory chatMemory() {
    return MessageWindowChatMemory.builder()
            .chatMemoryRepository(new InMemoryChatMemoryRepository())
            .maxMessages(20)
            .build();
}

Search Service

This service handles user queries, performs semantic search on indexed documents, and streams relevant results.


public SseEmitter chatStream(ChatRequest request) {
    SseEmitter emitter = new SseEmitter(120_000L);

    Flux<String> tokenStream = chatClient.prompt()
            .user(request.getMessage())
            .advisors(buildAdvisors(request.getConversationId()))
            .stream()
            .content();

    // Stream tokens to the client; close the emitter when done or on error
    tokenStream.subscribe(
            token -> {
                try {
                    emitter.send(SseEmitter.event().data(token));
                } catch (IOException ex) {
                    emitter.completeWithError(ex);
                }
            },
            emitter::completeWithError,
            emitter::complete);

    return emitter;
}

Spring AI Advisors

This method builds a list of advisors by combining query context with a RAG-based retrieval mechanism to fetch relevant document chunks.


private List<Advisor> buildAdvisors(String conversationId) {
    List<Advisor> advisors = new ArrayList<>();

    advisors.add(MessageChatMemoryAdvisor.builder(chatMemory)
            .conversationId(conversationId)
            .build());

    advisors.add(QuestionAnswerAdvisor.builder(vectorStore)
            .searchRequest(ragSearchRequest())
            .build());

    return advisors;
}
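buildAdvisors references a ragSearchRequest() helper that is not shown in the article. A reasonable sketch (the topK and similarityThreshold values are assumptions you should tune for your corpus):

```java
// Retrieve the top 4 chunks, ignoring weak matches below the threshold
private SearchRequest ragSearchRequest() {
    return SearchRequest.builder()
            .topK(4)
            .similarityThreshold(0.5)
            .build();
}
```

A higher threshold reduces irrelevant context at the cost of occasionally missing useful chunks.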

7. Searching Documents

These endpoints allow users to search documents using natural language queries, supporting both streaming (real-time responses via SSE) and standard synchronous responses.


@PostMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public SseEmitter chatStream(@RequestBody ChatRequest request) {
    log.info("Streaming chat request [conversationId={}]", request.getConversationId());
    return chatService.chatStream(request);
}

@PostMapping
public ResponseEntity<ChatResponse> chat(@RequestBody ChatRequest request) {
    log.info("Chat request [conversationId={}]", request.getConversationId());
    return ResponseEntity.ok(chatService.chat(request));
}
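The ChatRequest and ChatResponse payloads used by these endpoints are not defined in the article. A minimal sketch matching the accessors used in the service layer (the ChatResponse fields are an assumption; each class would live in its own file):

```java
// Request body for both endpoints
class ChatRequest {

    private String message;
    private String conversationId;

    public String getMessage() { return message; }
    public void setMessage(String message) { this.message = message; }

    public String getConversationId() { return conversationId; }
    public void setConversationId(String conversationId) { this.conversationId = conversationId; }
}

// Response body for the synchronous endpoint
record ChatResponse(String answer, String conversationId) { }
```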

The overall flow looks like this: User Query → Embedding → Vector Search → Relevant Document Chunks → Ollama → Final Response


Conclusion

We've successfully built an AI-powered document search application (RAG) using Spring AI, Ollama, and ChromaDB. This architecture is scalable, efficient, and perfect for building intelligent search experiences over your documents.


About The Author

I write about cryptography, web security, and secure software development. Creator of practical crypto validation tools at Devglan.