Retrieval-Augmented Generation (RAG) is becoming one of the most practical approaches for building AI-powered applications on top of private enterprise knowledge.
Instead of relying only on the pretrained knowledge of an LLM, a RAG system retrieves relevant contextual information from your own documents and injects it into the prompt before generating the final response.
In this tutorial, we will build a complete Spring AI RAG application using:
- Spring Boot 4
- Spring AI
- PGVector
- PostgreSQL
- Ollama
- PDF ingestion
- Semantic vector search
This architecture is extremely useful for building AI-powered search engines, knowledge bases, documentation assistants, and SEO-focused content indexing systems.
In our previous article, we built a similar RAG application using ChromaDB:
Build a Spring AI RAG App with ChromaDB
Once this retrieval layer is ready, you can extend the same implementation to build a conversational chatbot using Spring AI:
Build an AI Chat Application using Spring AI
You can also add Redis-based caching on top of this RAG pipeline for improving response time and reducing repeated vector retrieval operations:
Add Redis Caching to Spring AI Applications
Local AI Infrastructure Setup
Before building the RAG pipeline, we need a local AI infrastructure for:
- Running PostgreSQL with PGVector
- Managing the database using pgAdmin
- Running local LLMs and embedding models using Ollama
We will use Docker Compose to provision the entire stack locally.
services:
  # PostgreSQL + pgvector
  postgres:
    image: pgvector/pgvector:pg16
    container_name: spring-ai-rag-postgres
    environment:
      POSTGRES_DB: spring_ai_rag
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    command: >
      postgres
      -c shared_buffers=256MB
      -c max_connections=200
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d spring_ai_rag"]
      interval: 10s
      timeout: 5s
      retries: 5

  # pgAdmin
  pgadmin:
    image: dpage/pgadmin4:latest
    container_name: spring-ai-rag-pgadmin
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@local.dev
      PGADMIN_DEFAULT_PASSWORD: admin
    ports:
      - "5050:80"
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - pgadmin_data:/var/lib/pgadmin
    restart: unless-stopped

  # Ollama
  ollama:
    image: ollama/ollama:latest
    container_name: spring-ai-rag-ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

volumes:
  postgres_data:
  pgadmin_data:
  ollama_data:
Start the infrastructure using:
docker compose up -d
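Once the containers are up, a quick sanity check confirms everything is reachable. The second command hits Ollama's standard REST endpoint for listing local models; the port matches the Compose mapping above:

docker compose ps
curl http://localhost:11434/api/tags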
This setup gives us:
- PostgreSQL + PGVector for vector storage and semantic search
- pgAdmin for database management
- Ollama for running local chat and embedding models
Running the entire stack locally is extremely useful for building private, offline and cost-effective AI applications without depending on external AI APIs.
Project Dependencies
We are using Spring Boot 4 with Spring AI 2.0.0-M5.
<properties>
    <java.version>17</java.version>
    <spring-ai.version>2.0.0-M5</spring-ai.version>
</properties>
The following dependencies are required for:
- Ollama integration
- PGVector vector store
- PDF ingestion
- Apache Tika document parsing
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webmvc</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-ollama</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>
</dependencies>
Setting up PGVector using PostgreSQL
Instead of using a separate vector database, we will use PostgreSQL with the pgvector extension.
PGVector allows PostgreSQL to store embeddings and perform similarity search directly inside the database.
PostgreSQL and Ollama are already running from the Docker Compose setup above, so we can move straight to preparing the database.
Enable pgvector Extension
After PostgreSQL starts, enable the vector extension manually.
CREATE EXTENSION IF NOT EXISTS vector;
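If you prefer the command line over pgAdmin, one way to run this is through psql inside the container (container name taken from the Compose file above):

docker exec -it spring-ai-rag-postgres psql -U postgres -d spring_ai_rag -c "CREATE EXTENSION IF NOT EXISTS vector;"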
Create Vector Table
We are using nomic-embed-text, which produces 768-dimensional embeddings, so the vector column must be declared with a matching dimension.
CREATE TABLE document_embedding (
    id UUID PRIMARY KEY,
    content TEXT,
    metadata JSON,
    embedding VECTOR(768)
);
Create Vector Index
HNSW indexing significantly improves vector similarity search performance. We index with vector_cosine_ops to match the cosine distance that Spring AI's PGVector store uses by default.
CREATE INDEX document_embedding_embedding_idx
ON document_embedding
USING hnsw (embedding vector_cosine_ops);
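Under the hood, a cosine similarity query against this table looks roughly like the SQL below; the vector literal is truncated for illustration, since a real query supplies all 768 values. Spring AI's PGVector store generates an equivalent query for us using pgvector's <=> cosine-distance operator:

-- placeholder vector shown truncated; a real query needs all 768 values
SELECT id, content
FROM document_embedding
ORDER BY embedding <=> '[0.012, -0.047, ...]'
LIMIT 5;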
Pull Ollama Models
Since Ollama is running inside Docker, we need to pull the embedding and chat models.
docker exec -it spring-ai-rag-ollama ollama pull nomic-embed-text
docker exec -it spring-ai-rag-ollama ollama pull llama3.2
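Verify that both models were downloaded:

docker exec -it spring-ai-rag-ollama ollama list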
Spring AI Configuration
The following configuration connects Spring AI with PostgreSQL, PGVector and Ollama.
spring:
  application:
    name: spring-ai-rag
  datasource:
    url: jdbc:postgresql://localhost:5432/spring_ai_rag
    username: postgres
    password: postgres
  ai:
    vectorstore:
      pgvector:
        initialize-schema: false
        table-name: document_embedding
        dimensions: 768
    ollama:
      base-url: http://localhost:11434
      embedding:
        options:
          model: nomic-embed-text
      chat:
        options:
          model: llama3.2
          temperature: 0.7
          num-ctx: 8192
Here:
- nomic-embed-text generates vector embeddings
- llama3.2 generates conversational responses
- document_embedding stores vectorized chunks
Building the ChatClient Configuration
We configure a reusable Spring AI ChatClient with a strict system prompt.
This is important for reducing hallucinations and forcing the LLM to answer only from retrieved context.
@Configuration
@RequiredArgsConstructor
public class ChatClientConfig {

    private static final String SYSTEM_PROMPT = """
            You are a helpful AI assistant.
            Answer the user's question ONLY using the provided context.
            If the answer is not present in the context, say:
            "I could not find relevant information."
            Do not make up or assume information.
            """;

    @Bean
    public ChatClient chatClient(ChatModel chatModel) {
        return ChatClient.builder(chatModel)
                .defaultSystem(SYSTEM_PROMPT)
                .build();
    }
}
Building the Document Ingestion Pipeline
The ingestion layer is responsible for:
- Reading PDF files
- Extracting text
- Chunking content
- Generating embeddings
- Storing vectors in PGVector
DocumentIngestionService
@Slf4j
@Service
@RequiredArgsConstructor
public class DocumentIngestionService {

    private static final int CHUNK_SIZE = 500;
    private static final int MIN_CHUNK_SIZE_CHARS = 350;
    private static final long MAX_FILE_SIZE = 10 * 1024 * 1024; // 10 MB

    private final VectorStore vectorStore;

    public int ingestFile(MultipartFile file) {
        validateFile(file);

        String filename = Optional.ofNullable(file.getOriginalFilename())
                .orElse("unknown");
        String contentType = Optional.ofNullable(file.getContentType())
                .orElse("application/octet-stream");
        long fileSize = file.getSize();
        String documentId = UUID.randomUUID().toString();

        log.info("Starting document ingestion: filename={} contentType={} size={} bytes documentId={}",
                filename, contentType, fileSize, documentId);

        try {
            Resource resource = toResource(file, filename);

            // Use the PDF reader for PDFs, fall back to Tika for other formats
            List<Document> documents = isPdf(filename, contentType)
                    ? readPdf(resource)
                    : readWithTika(resource);

            enrichMetadata(documents, documentId, filename, contentType, fileSize);
            return splitAndStore(documents, filename);
        } catch (IOException ex) {
            log.error("Failed to ingest file '{}'", filename, ex);
            throw RagAppException.internalError(
                    "Failed to ingest file: " + filename,
                    ex
            );
        }
    }

    // readPdf and the chunking logic are covered in the next sections;
    // the remaining helpers are sketched below.
}
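The helpers validateFile, isPdf and toResource are not shown in the original service, so the following is a minimal sketch of what they might look like under our assumptions: a size limit check, PDF detection by content type or extension, and an in-memory resource that preserves the original filename. RagAppException.badRequest is assumed to exist alongside the internalError factory used above.

private void validateFile(MultipartFile file) {
    // Reject empty uploads and anything above the configured size limit
    if (file.isEmpty()) {
        throw RagAppException.badRequest("Uploaded file is empty");
    }
    if (file.getSize() > MAX_FILE_SIZE) {
        throw RagAppException.badRequest("File exceeds the 10 MB limit");
    }
}

private boolean isPdf(String filename, String contentType) {
    // Detect PDFs by content type or file extension
    return "application/pdf".equalsIgnoreCase(contentType)
            || filename.toLowerCase().endsWith(".pdf");
}

private Resource toResource(MultipartFile file, String filename) throws IOException {
    // Wrap the upload in an in-memory Resource that keeps the original
    // filename, which the document readers record as source metadata
    byte[] bytes = file.getBytes();
    return new ByteArrayResource(bytes) {
        @Override
        public String getFilename() {
            return filename;
        }
    };
}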
PDF Reader Configuration
We use the Spring AI PDF reader with page-level extraction.
private List<Document> readPdf(Resource resource) {
    var config = PdfDocumentReaderConfig.builder()
            .withPageExtractedTextFormatter(
                    ExtractedTextFormatter.builder()
                            .withNumberOfBottomTextLinesToDelete(3)
                            .withNumberOfTopPagesToSkipBeforeDelete(1)
                            .build())
            .withPagesPerDocument(1)
            .build();

    return new PagePdfDocumentReader(resource, config).get();
}
This helps preserve page-level context while performing vector retrieval.
Chunking Documents
Large documents are split into semantic chunks before generating embeddings.
var splitter = TokenTextSplitter.builder()
        .withChunkSize(CHUNK_SIZE)
        .withMinChunkSizeChars(MIN_CHUNK_SIZE_CHARS)
        .withMinChunkLengthToEmbed(5)
        .withMaxNumChunks(10_000)
        .withKeepSeparator(true)
        .build();
Chunking is one of the most important parts of any RAG system because it directly impacts retrieval quality.
Store Embeddings in PGVector
vectorStore.add(chunks);
Spring AI automatically:
- Generates embeddings using Ollama
- Stores vectors inside PostgreSQL
- Persists metadata alongside vectors
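Tying the splitter and the vector store together, the splitAndStore helper referenced in the service might look like this; a sketch under our assumptions, since the article does not show its body:

private int splitAndStore(List<Document> documents, String filename) {
    var splitter = TokenTextSplitter.builder()
            .withChunkSize(CHUNK_SIZE)
            .withMinChunkSizeChars(MIN_CHUNK_SIZE_CHARS)
            .withMinChunkLengthToEmbed(5)
            .withMaxNumChunks(10_000)
            .withKeepSeparator(true)
            .build();

    // Split page-level documents into token-bounded chunks, then embed
    // and persist them in PGVector in one call
    List<Document> chunks = splitter.apply(documents);
    vectorStore.add(chunks);

    log.info("Stored {} chunks from '{}'", chunks.size(), filename);
    return chunks.size();
}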
Document Ingestion API
We expose a multipart upload endpoint for ingesting PDF files.
@Slf4j
@RestController
@RequestMapping("/api/documents")
@RequiredArgsConstructor
public class DocumentController {

    private final DocumentIngestionService ingestionService;

    @PostMapping(value = "/ingest", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public ResponseEntity<DocIngestionResponse> ingestDocument(
            @RequestParam("file") @NotNull MultipartFile file) {
        int chunks = ingestionService.ingestFile(file);
        return ResponseEntity.ok(DocIngestionResponse.builder()
                .source(file.getOriginalFilename())
                .chunksStored(chunks)
                .message("Successfully ingested %d chunks from '%s'"
                        .formatted(chunks, file.getOriginalFilename()))
                .build());
    }
}
Building the Retrieval Layer
The RAG retrieval pipeline performs:
- Similarity search in PGVector
- Retrieval of relevant chunks
- Prompt augmentation
- LLM response generation
RAG Service
@Service
@RequiredArgsConstructor
@Slf4j
public class RagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public String ask(String query) {
        log.info("Processing RAG query: {}", query);

        // Retrieve the top 5 most similar chunks from PGVector
        List<Document> documents = vectorStore.similaritySearch(
                SearchRequest.builder()
                        .query(query)
                        .topK(5)
                        .build()
        );

        if (documents == null || documents.isEmpty()) {
            return "No relevant information found.";
        }

        // Assemble the retrieved chunks into a single context block
        String context = documents.stream()
                .map(doc -> """
                        Source: %s
                        Content:
                        %s
                        """.formatted(
                        doc.getMetadata().get("source"),
                        doc.getText()
                ))
                .collect(Collectors.joining("\n\n"));

        String userPrompt = buildUserPrompt(context, query);

        return chatClient.prompt()
                .user(userPrompt)
                .call()
                .content();
    }

    // buildUserPrompt is shown in the next section
}
Prompt Template
We externalize the user prompt using Spring AI PromptTemplate.
private String buildUserPrompt(String context, String query) {
    PromptTemplate template = new PromptTemplate(
            new ClassPathResource("prompts/rag-user-prompt.st")
    );
    return template.render(Map.of(
            "context", context,
            "query", query
    ));
}
This keeps prompts maintainable and reusable.
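The template file itself is not shown in this article; a minimal prompts/rag-user-prompt.st could look like the following, using the {placeholder} syntax that PromptTemplate's default StringTemplate renderer expects:

Answer the question using ONLY the provided context.

Context:
{context}

Question:
{query}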
RAG Chat API
We expose a query endpoint for performing semantic retrieval.
@Slf4j
@RequiredArgsConstructor
@RestController
@RequestMapping("/api/rag")
public class ChatController {

    private final RagService ragService;

    @GetMapping("/ask")
    public String ask(@RequestParam String query) {
        return ragService.ask(query);
    }
}
Testing the RAG Application
Step 1: Ingest PDF
POST /api/documents/ingest
Upload a PDF file using Postman.
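Alternatively, upload with curl (assuming the application runs on the default port 8080; the PDF filename here is just an example):

curl -F "file=@spring-reference.pdf" http://localhost:8080/api/documents/ingest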
Step 2: Query the System
GET /api/rag/ask?query=What is Spring AI?
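With curl, let it URL-encode the query for you (again assuming port 8080):

curl -G --data-urlencode "query=What is Spring AI?" http://localhost:8080/api/rag/ask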
Spring AI will:
- Generate query embeddings
- Search PGVector using cosine similarity
- Retrieve relevant chunks
- Inject context into prompt
- Generate grounded answers using Ollama
Why Use PGVector for RAG?
PGVector is becoming increasingly popular because it allows teams to build vector search systems directly on top of PostgreSQL.
This removes the need for maintaining a separate vector database while still providing:
- Semantic similarity search
- Vector indexing
- Metadata filtering
- Transactional consistency
- Operational simplicity
For SEO-focused AI systems, semantic retrieval enables:
- Improved contextual indexing
- Knowledge graph style retrieval
- Better long-tail query matching
- Context-aware AI answer generation
- Content enrichment pipelines
Conclusion
In this tutorial, we built a complete Retrieval-Augmented Generation (RAG) application using Spring AI, PostgreSQL, PGVector and Ollama.
We implemented:
- PDF ingestion
- Document chunking
- Embedding generation
- PGVector similarity search
- Prompt augmentation
- Spring AI ChatClient integration
This architecture forms the foundation for building:
- AI chatbots
- Enterprise search engines
- Knowledge assistants
- Semantic indexing platforms
- Context-aware AI applications
The complete source code can be found on GitHub here.
In upcoming tutorials, we will extend this implementation further by adding:
- Chat memory
- Streaming responses
- Redis caching
- Metadata filtering
- Hybrid search
- Conversational RAG