Semantic Caching with Redis, LSH & Elasticsearch in Spring Boot

Modern AI search systems rely heavily on semantic search powered by embeddings. Instead of matching keywords, semantic search understands context and meaning.

However, semantic search systems have one big drawback: they are computationally expensive. Each query typically requires:

  • Embedding generation
  • Vector similarity search
  • Ranking and filtering

If your application receives thousands of similar queries, running the full pipeline repeatedly becomes inefficient.

This is where semantic caching becomes extremely powerful.

Instead of recomputing results, we cache previous results based on semantic similarity.

In this article we will build a production-grade semantic caching architecture using:

  • Local cache (Caffeine)
  • Redis LSH cache
  • Redis Vector cache
  • Elasticsearch semantic search

Why Semantic Caching?

Imagine users searching for:

  • "how to fix java memory leak"
  • "java memory leak troubleshooting"
  • "debug memory leak in java"

All three queries are semantically similar. Running the full semantic search pipeline every time is wasteful.

Semantic caching allows us to reuse results from similar queries.


Production Architecture

Below is the optimized architecture used in this implementation.

(Architecture diagram: local Caffeine cache → Redis LSH cache → Redis vector cache → Elasticsearch)

Why This Architecture Works Well

  • Local Cache handles repeated queries instantly
  • LSH cache detects semantically similar queries quickly
  • Vector cache performs approximate vector similarity search
  • Elasticsearch acts as the final semantic search engine

Request Flow

(Request flow diagram: each cache layer is checked in order, and a hit at any layer short-circuits the layers below it)

This layered architecture drastically reduces the load on Elasticsearch.



Maven Dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>5.1.0</version>
</dependency>

<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>3.1.8</version>
</dependency>

<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>9.2.5</version>
</dependency>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>9.2.5</version>
</dependency>

Configuration

RedisConfig.java

Creates the Redis client connection used across the application.

    @Configuration
    public class RedisConfig {

        @Autowired
        private RedisProperties redisProperties;

        @Bean
        public JedisPool jedisPool() {
            JedisPoolConfig config = new JedisPoolConfig();
            config.setMaxTotal(200);
            config.setMaxIdle(50);
            config.setMinIdle(10);
            config.setTestOnBorrow(true);
            config.setTestWhileIdle(true);
            return new JedisPool(config, redisProperties.getHost(), redisProperties.getPort());
        }

        @Bean
        public UnifiedJedis jedis() {
            return new UnifiedJedis("redis://" + redisProperties.getHost() + ":" + redisProperties.getPort());
        }
    }

CacheConfig.java

Configures the local Caffeine cache.

    @Configuration
    public class CacheConfig {

        @Bean
        public Cache<String, String> localCache() {
            return Caffeine.newBuilder()
                    .maximumSize(10_000)
                    .expireAfterWrite(Duration.ofMinutes(30))
                    .build();
        }
    }


Embedding Service

The embedding service converts text queries into vector embeddings using the Elasticsearch inference API.


    public float[] generate(String text) {

        List<Map<String, JsonData>> docs = new ArrayList<>();
        docs.add(Map.of("text_field", JsonData.of(text)));
        ...
        ...

        float[] arr = new float[vector.size()];
        for (int i = 0; i < vector.size(); i++) {
            arr[i] = vector.get(i).floatValue();
        }
        return arr;
    }

LSH Hash Service

Locality Sensitive Hashing (LSH) groups similar inputs into the same hash bucket. For simplicity, this implementation hashes the normalized query text, so it only buckets queries that are identical after normalization; a true LSH scheme would hash the embedding vector instead.

    @Service
    public class LshHashService {

        // Polynomial hash over the normalized query text.
        // Note: this matches only queries that normalize to the same string.
        public long hash(String text) {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            long hash = 1125899906842597L;
            for (byte b : bytes)
                hash = 31 * hash + b;
            return hash;
        }
    }

Combined with query normalization, this gives fast bucket matching of trivially rephrased queries.
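For reference, a vector-based LSH could be sketched with random hyperplanes (SimHash): each bit of the hash records which side of a random hyperplane the embedding falls on, so vectors with high cosine similarity tend to share most bits. This class and its parameters are illustrative only and not part of the project above.

```java
import java.util.Random;

// Illustrative sketch: random-hyperplane LSH (SimHash) over embedding vectors.
class RandomHyperplaneLsh {

    private final float[][] hyperplanes; // one random hyperplane per hash bit

    RandomHyperplaneLsh(int numBits, int dim, long seed) {
        Random rnd = new Random(seed);
        hyperplanes = new float[numBits][dim];
        for (int b = 0; b < numBits; b++)
            for (int d = 0; d < dim; d++)
                hyperplanes[b][d] = (float) rnd.nextGaussian();
    }

    // Each bit records which side of a hyperplane the vector falls on.
    long hash(float[] vector) {
        long bits = 0L;
        for (int b = 0; b < hyperplanes.length; b++) {
            float dot = 0f;
            for (int d = 0; d < vector.length; d++)
                dot += hyperplanes[b][d] * vector[d];
            if (dot >= 0)
                bits |= 1L << b;
        }
        return bits;
    }
}
```

Note that this variant requires the embedding before the hash, which would move the LSH lookup after embedding generation in the pipeline.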


Local Cache Layer

The fastest cache layer using Caffeine.

If an identical query was executed recently, the response is returned instantly.

@Service
public class LocalCacheService {

    @Autowired
    private Cache<String,String> cache;

    public String get(String key){
        return cache.getIfPresent(key);
    }

    public void put(String key,String value){
        cache.put(key,value);
    }

}

Redis LSH Cache

This cache stores mappings from LSH hash -> cached results.

If a new query maps to an existing hash bucket, we can reuse results.

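This layer's code is not shown in the article, so here is a minimal sketch. `KeyValueStore` is a stand-in for the two `UnifiedJedis` calls the real service would make (`get` and `setex`); the `lsh:` key prefix and the 30-minute TTL are assumptions.

```java
// Minimal stand-in for the two UnifiedJedis calls this layer needs.
interface KeyValueStore {
    String get(String key);
    void setex(String key, long ttlSeconds, String value);
}

// Sketch of the Redis LSH cache: one bucket per hash value, with a TTL so
// stale results age out. The "lsh:" prefix and 30-minute TTL are assumptions.
class LshCacheService {

    private static final long TTL_SECONDS = 1800;

    private final KeyValueStore kv;

    LshCacheService(KeyValueStore kv) {
        this.kv = kv;
    }

    static String bucketKey(long hash) {
        return "lsh:" + Long.toHexString(hash);
    }

    String get(long hash) {
        return kv.get(bucketKey(hash));
    }

    void store(long hash, String response) {
        kv.setex(bucketKey(hash), TTL_SECONDS, response);
    }
}
```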

Redis Vector Cache

If LSH cache misses, we run a vector similarity search in Redis.

@Service
public class VectorCacheService {

    @Autowired
    private JedisPool jedisPool;

    @Autowired
    private UnifiedJedis jedis;

    public String search(float[] vector) {
        byte[] vec = VectorUtil.toBytes(vector);
        Query q = new Query("*=>[KNN 1 @vector $vec AS score]")
                .addParam("vec", vec)
                .setSortBy("score", true)
                .returnFields("response", "score")
                .dialect(2);
        SearchResult result = jedis.ftSearch("semantic_cache_idx", q);
        if (result.getTotalResults() == 0)
            return null;

        Document doc = result.getDocuments().get(0);
        // KNN 1 always returns the single nearest neighbor, even a poor one,
        // so treat anything beyond a distance threshold as a cache miss.
        // The 0.15 cutoff is an example value and should be tuned.
        double score = Double.parseDouble((String) doc.get("score"));
        if (score > 0.15)
            return null;

        return (String) doc.get("response");
    }

    public void store(String query, float[] vector, String response) {
        String key = "cache:" + UUID.randomUUID();
        try (Jedis jedis = jedisPool.getResource()) {
            Map<String, String> map = new HashMap<>();
            map.put("query", query);
            map.put("response", response);
            jedis.hset(key, map);
            jedis.hset(
                    key.getBytes(),
                "vector".getBytes(),
                    VectorUtil.toBytes(vector)
            );
        }
    }
}
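`VectorUtil.toBytes` is referenced above but never shown. A plausible implementation (an assumption, since the article omits it) simply packs the floats as little-endian FLOAT32 bytes, which is the raw format Redis vector fields expect:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Plausible VectorUtil implementation; the article does not show this helper.
// Redis vector fields store raw little-endian FLOAT32 bytes.
final class VectorUtil {

    private VectorUtil() {
    }

    static byte[] toBytes(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector)
            buf.putFloat(v);
        return buf.array();
    }

    static float[] fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++)
            vector[i] = buf.getFloat();
        return vector;
    }
}
```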

Redis uses an HNSW index for fast approximate nearest-neighbor search.
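Note that the `semantic_cache_idx` index must exist before the first `ftSearch` call. A matching definition could look like the following `redis-cli` command; the `cache:` prefix matches the keys written in `store`, while the 384 dimension is an assumption that depends on your embedding model:

```
FT.CREATE semantic_cache_idx ON HASH PREFIX 1 cache: SCHEMA response TEXT vector VECTOR HNSW 6 TYPE FLOAT32 DIM 384 DISTANCE_METRIC COSINE
```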


Elasticsearch Semantic Search

If no cache layers return results, we execute the full semantic search in Elasticsearch. This is the same query that we built in our last article.

SearchResponse<Map> response = client.search(s -> s
        .index("documents")
        .knn(k -> k
                .field("content_vector")
                .queryVector(vector)
                .k(5)
                .numCandidates(50)
        ),
        Map.class
);

Hybrid Semantic Search Service

This service orchestrates the entire pipeline.

@Service
public class HybridSemanticSearchService {

    @Autowired
    private LocalCacheService localCache;

    @Autowired
    private LshCacheService lshCache;

    @Autowired
    private EmbeddingCacheService embeddingCache;

    @Autowired
    private VectorCacheService vectorCache;

    @Autowired
    private EmbeddingService embeddingService;

    @Autowired
    private ElasticsearchService esSearch;

    @Autowired
    private QueryNormalizer normalizer;

    @Autowired
    private LshHashService hashService;

    public String search(String query) throws Exception {

        String normalized = normalizer.normalize(query);

        String result = localCache.get(normalized);

        if (Objects.nonNull(result)) {
            return result;
        }

        long hash = hashService.hash(normalized);

        result = lshCache.get(hash);

        if (Objects.nonNull(result)) {
            localCache.put(normalized, result);
            return result;
        }

        float[] vector = embeddingCache.get(normalized);

        if (Objects.isNull(vector)) {
            vector = embeddingService.generate(query);
            embeddingCache.store(normalized, vector);
        }

        result = vectorCache.search(vector);

        if (Objects.nonNull(result)) {
            localCache.put(normalized, result);
            lshCache.store(hash, result);
            return result;
        }

        result = esSearch.search(query, vector);

        if (Objects.nonNull(result)) {
            localCache.put(normalized, result);
            lshCache.store(hash, result);
            vectorCache.store(normalized, vector, result);
        }

        return result;
    }

}
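`QueryNormalizer` is referenced above but not shown. A minimal sketch (the exact normalization rules are an assumption) lower-cases the query, strips punctuation, and collapses whitespace so trivially different phrasings share a cache key:

```java
import java.util.Locale;

// Sketch of the QueryNormalizer used by the pipeline above; the exact
// normalization rules are an assumption.
class QueryNormalizer {

    String normalize(String query) {
        return query.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9\\s]", " ") // drop punctuation
                .replaceAll("\\s+", " ")         // collapse whitespace
                .trim();
    }
}
```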

Performance Improvements

Search Type            Latency
Direct Elasticsearch   120-300 ms
Redis Vector Cache     10-20 ms
LSH Cache              2-5 ms
Local Cache            < 1 ms

With layered caching, most queries are served within a few milliseconds.


Conclusion

Semantic caching dramatically improves the scalability of AI search systems.

By combining:

  • Local cache
  • LSH semantic grouping
  • Redis vector similarity search
  • Elasticsearch semantic search

We get a highly optimized architecture capable of handling large query volumes with extremely low latency.

This architecture is similar to what many large-scale AI search platforms use internally.


About The Author

I write about cryptography, web security, and secure software development. Creator of practical crypto validation tools at Devglan.
