Redis Vector Database: A Comprehensive Guide

A vector database efficiently stores, manages, and queries vector embeddings, capturing the semantic information of unstructured data like text and images. Redis, through Redis Stack, serves as a powerful vector database, enabling advanced similarity searches and machine learning applications, making it ideal for semantic search engines, recommendation systems, and more.

Introduction

Redis, traditionally known as a key-value store, has evolved to accommodate more complex data structures and operations. This guide covers how Redis can be used as a vector database, enabling efficient handling of unstructured data and performing vector similarity searches.

Quick Start Guide

Understanding Vector Databases

Vector databases handle unstructured data, that is, data without a predefined schema, such as text, images, video, or music. Vectorizing such data means mapping it to a sequence of numbers, which places it as a point in an N-dimensional space. Embeddings produced by machine learning models are the most widely used form of vectorization because they capture complex patterns and semantic meaning.
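To make the idea of "closeness" in an N-dimensional space concrete, here is a minimal sketch that compares vectors by cosine similarity. The three-dimensional vectors are hand-made illustrative values, not output from a real embedding model, and the variable names are assumptions for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made 3-dimensional "embeddings" (illustrative values only).
kids_bike = [0.9, 0.1, 0.0]
toddler_bike = [0.8, 0.2, 0.1]
road_racer = [0.1, 0.9, 0.3]

similar = cosine_similarity(kids_bike, toddler_bike)
different = cosine_similarity(kids_bike, road_racer)
print(f"kids vs toddler: {similar:.3f}")
print(f"kids vs racer:   {different:.3f}")
```

Vectors that point in nearly the same direction score close to 1, while unrelated ones score much lower. Real embeddings work the same way, only with hundreds of dimensions.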

Setting Up a Vector Database

To use Redis as a vector database, follow these steps:

  1. Set Up Redis Stack
  • Create a free account on Redis Cloud.
  • Follow instructions to create a database with all Redis Stack features.
  • Alternatively, install Redis Stack locally using installation guides.
  • Ensure your Redis server has JSON and search and query features enabled.
  2. Install Required Python Packages
  • Set up a virtual environment.
  • Install the following packages:
    pip install redis pandas sentence-transformers tabulate
  3. Connect to Redis
  • Instantiate the Redis client in Python:
    import redis

    client = redis.Redis(host="localhost", port=6379, decode_responses=True)

Creating and Storing Vector Embeddings

  1. Fetch and Store Demo Data
  • Fetch the dataset:
    import requests

    url = "https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started/main/data/bikes.json"
    response = requests.get(url)
    bikes = response.json()
  • Store data in Redis using JSON.SET:
    pipeline = client.pipeline()
    for i, bike in enumerate(bikes, start=1):
        redis_key = f"bikes:{i:03}"
        pipeline.json().set(redis_key, "$", bike)
    pipeline.execute()
  2. Create Vector Embeddings
  • Use a pre-trained model to generate embeddings:
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("msmarco-distilbert-base-v4")
    descriptions = [bike["description"] for bike in bikes]
    embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
    VECTOR_DIMENSION = len(embeddings[0])
  • Store embeddings in Redis:
    # Rebuild the keys used when the bikes were stored, in the same order.
    bike_keys = [f"bikes:{i:03}" for i in range(1, len(bikes) + 1)]
    pipeline = client.pipeline()
    for key, embedding in zip(bike_keys, embeddings):
        pipeline.json().set(key, "$.description_embeddings", embedding)
    pipeline.execute()
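The steps above depend on each embedding being written back to the document it was computed from, so the key order matters. The sketch below shows the zero-padded key scheme on its own. The helper name bike_keys and its width parameter are assumptions for illustration, not part of any Redis API.

```python
def bike_keys(count, prefix="bikes", width=3):
    """Return keys like 'bikes:001' for items 1..count, in insertion order."""
    return [f"{prefix}:{i:0{width}}" for i in range(1, count + 1)]


keys = bike_keys(3)
print(keys)

# Zero padding keeps lexicographic order equal to numeric order,
# but only while every number fits in the chosen width.
print(sorted(bike_keys(12)) == bike_keys(12))
print(sorted(bike_keys(1000)) == bike_keys(1000))
```

With a width of 3, sorting the keys as strings reproduces insertion order for up to 999 documents. Beyond that, "bikes:1000" sorts before "bikes:101", so a wider padding (or generating the keys rather than sorting them) is the safer choice.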

Creating an Index and Querying

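  1. Create an Index
  • Define the schema and create the index. Each field gets an alias so that queries can refer to it by name (for example, @vector):
    from redis.commands.search.field import NumericField, TagField, TextField, VectorField
    # The module name may depend on the redis-py version
    # (indexDefinition in older releases, index_definition in newer ones).
    from redis.commands.search.indexDefinition import IndexDefinition, IndexType

    schema = (
        TextField("$.model", as_name="model"),
        TextField("$.brand", as_name="brand"),
        NumericField("$.price", as_name="price"),
        TagField("$.type", as_name="type"),
        TextField("$.description", as_name="description"),
        VectorField(
            "$.description_embeddings",
            "FLAT",
            {"TYPE": "FLOAT32", "DIM": VECTOR_DIMENSION, "DISTANCE_METRIC": "COSINE"},
            as_name="vector",
        ),
    )
    # The documents were stored with JSON.SET, so index JSON rather than hashes.
    definition = IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON)
    client.ft("idx:bikes_vss").create_index(fields=schema, definition=definition)
  2. Perform Vector Search
  • Encode queries and perform K-nearest neighbors (KNN) search:
    import numpy as np
    from redis.commands.search.query import Query

    queries = ["Bike for small kids", "Best Mountain bikes for kids"]
    encoded_queries = embedder.encode(queries)
    # KNN syntax requires query dialect 2 or later.
    query = (
        Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
        .sort_by("vector_score")
        .return_fields("vector_score", "id", "brand", "model", "description")
        .dialect(2)
    )
    for encoded_query in encoded_queries:
        query_vector = np.array(encoded_query, dtype=np.float32).tobytes()
        result_docs = client.ft("idx:bikes_vss").search(query, {"query_vector": query_vector}).docs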
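Conceptually, a KNN query against a FLAT index with the COSINE metric compares the query vector with every stored vector and keeps the closest ones. The following pure-Python sketch imitates that behaviour on a tiny in-memory dictionary. It is not how Redis is implemented internally, and the keys and two-dimensional vectors are made up for illustration. It assumes the score is the cosine distance, that is, 1 minus the cosine similarity, so smaller means more similar.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(query, docs, k):
    """Brute-force search: score every document, return the k nearest."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Illustrative 2-dimensional vectors standing in for real embeddings.
docs = {
    "bikes:001": [1.0, 0.0],
    "bikes:002": [0.0, 1.0],
    "bikes:003": [0.7, 0.7],
    "bikes:004": [-1.0, 0.0],
}

nearest = knn([1.0, 0.1], docs, k=3)
for key, score in nearest:
    print(key, round(score, 3))
```

Sorting by ascending distance mirrors the sort_by("vector_score") call in the query: the first result is the best match.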

Redis Cloud as a Vector Database

Redis Cloud offers a managed Redis service, simplifying the deployment and maintenance of Redis instances. This section explores the benefits and features of using Redis Cloud as a vector database, based on an experience shared during the Vector Search Hackathon organized by the MLOps Community, Redis, and Saturn Cloud.

Benefits of Managed Redis

  • High Availability: Redis Cloud manages sharding, availability, uptime, and throughput, allowing teams to focus on development rather than maintenance.
  • Ease of Use: Redis modules like RediSearch run complex operations such as indexing and vector search directly within the Redis database, eliminating the need for external implementations.

Using Redis as a Vector Database

  • Data Loading and Storage
    • Redis efficiently loads large volumes of vectors, making near real-time data processing feasible.
  • Redis Object Mapping with Python
    • Using the Pydantic-based models from Redis OM, data can be directly stored and queried in Redis:
      from redis_om import Field, HashModel, Migrator

      class Paper(HashModel):
          paper_id: str
          title: str
          year: int = Field(index=True)  # indexed so it can be used in find()
          authors: str
          categories: str
          abstract: str

      Migrator().run()  # create the search index for the model

      paper_instance = Paper(
          paper_id="0704.3780",
          title="Stochastic Optimization Algorithms",
          year=2006,
          authors="---",  # placeholder; every declared field is required
          categories="cs.NE",
          abstract="---",
      )
      paper_instance.save()
  • Querying and Vector Search
    • Queries can be performed using the same model, providing a seamless experience:
      import numpy as np

      results = Paper.find(Paper.year > 2004).all()

      # search_index, redis_conn, and vec are defined elsewhere in the application.
      vector_query = search_index.vector_query(number_of_results=50)
      redis_conn.ft("papers").search(
          vector_query,
          query_params={"vec_param": np.array(vec, dtype=np.float32).tobytes()},
      )

Conclusion

Redis, enhanced by RediSearch and managed through Redis Cloud, offers robust capabilities for storing and querying vector data. This turns Redis from a simple key-value store into a capable vector database for AI applications, search engines, and similar workloads.

FAQs

What is a vector database?

A vector database is designed to store, manage, and query vector embeddings, which are numerical representations of data in an N-dimensional space. These embeddings capture the semantic information of unstructured data like text, images, and videos, enabling efficient similarity searches and other machine learning tasks.

How does Redis function as a vector database?

Redis can function as a vector database through Redis Stack, which includes modules like RediSearch for indexing and querying vector embeddings. Redis Stack allows you to store vectors, retrieve them, and perform vector similarity searches efficiently.

What are vector embeddings?

Vector embeddings are dense, low-dimensional numerical representations of data generated by machine learning models. These embeddings capture the semantic meaning and patterns in the data, making it possible to perform tasks like similarity searches and clustering.

How do you set up Redis Stack for vector databases?

To set up Redis Stack for vector databases, you can either use Redis Cloud or install Redis Stack on your local machine. Ensure that the JSON and search modules are configured. For Python, you will need libraries like redis, pandas, sentence-transformers, and optionally tabulate.

What Python packages are required to work with Redis as a vector database?

You will need the following Python packages:

  • redis for Redis client interactions
  • pandas for data manipulation
  • sentence-transformers for generating embeddings
  • tabulate (optional) for rendering Markdown tables

How do you connect to a Redis server in Python?

You can connect to a Redis server in Python by instantiating the Redis client with the appropriate host, port, and optionally, the password. For example:

import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)

For Redis Cloud, use the provided connection string and credentials.
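One way to switch between a local server and a hosted one without editing code is to read the connection details from the environment. The sketch below is only an illustration: the variable names REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD and the helper redis_settings are assumptions, not names that Redis or Redis Cloud define.

```python
import os


def redis_settings(env=None):
    """Build keyword arguments for redis.Redis from environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("REDIS_HOST", "localhost"),
        "port": int(env.get("REDIS_PORT", "6379")),
        "password": env.get("REDIS_PASSWORD") or None,
        "decode_responses": True,
    }


# Usage (needs the redis package and a reachable server):
#   import redis
#   client = redis.Redis(**redis_settings())
#   client.ping()

local_defaults = redis_settings({})
print(local_defaults)
```

With no variables set, the helper falls back to the local defaults used throughout this guide.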

How do you create vector embeddings from data in Redis?

To create vector embeddings, first fetch the data, then use a pre-trained model like Sentence-BERT to generate embeddings. Store these embeddings in Redis as JSON documents. Here’s an example:

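import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("msmarco-distilbert-base-v4")
keys = sorted(client.keys("bikes:*"))
# A JSONPath query returns a list of matches per key, so flatten the result.
nested = client.json().mget(keys, "$.description")
descriptions = [item for matches in nested for item in matches]
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()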
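When a path such as $.description is read with JSON.MGET, the reply holds one list of matches per key rather than a flat list of strings. The sketch below pairs each key with its first match while preserving order, using a simulated reply instead of a live server. The helper name descriptions_by_key and the sample descriptions are assumptions for illustration.

```python
def descriptions_by_key(keys, mget_response):
    """Pair each key with its first JSONPath match, keeping key order."""
    result = {}
    for key, matches in zip(keys, mget_response):
        if not matches:
            # Fail loudly rather than letting keys and texts drift apart.
            raise ValueError(f"no description found for {key}")
        result[key] = matches[0]
    return result


# Simulated reply: one list of matches per key (made-up descriptions).
keys = ["bikes:001", "bikes:002"]
simulated_response = [["A small bike for kids"], ["A fast road bike"]]

paired = descriptions_by_key(keys, simulated_response)
print(paired)
```

Raising on an empty match keeps a missing description from silently shifting every later embedding onto the wrong document.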

How do you create an index for vector searches in Redis?

Create an index using the FT.CREATE command with a vector field. Specify the JSON path, index type, and distance metric (e.g., cosine similarity). Here’s an example:

FT.CREATE idx:bikes_vss ON JSON
  PREFIX 1 bikes:
  SCHEMA
    $.model AS model TEXT
    $.brand AS brand TEXT
    $.price AS price NUMERIC
    $.type AS type TAG
    $.description AS description TEXT
    $.description_embeddings AS vector VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE
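The number that follows the algorithm name in the VECTOR clause (6 in the command above) is the count of attribute tokens that come after it: three names plus three values. The sketch below builds that clause from a dictionary so the count cannot drift out of sync. The helper vector_clause is an assumption for illustration and is not part of redis-py.

```python
def vector_clause(algorithm, attributes):
    """Return the VECTOR clause of FT.CREATE as a list of string tokens."""
    flat = []
    for name, value in attributes.items():
        flat += [name, str(value)]
    # The count covers every token after it: names and values alike.
    return ["VECTOR", algorithm, str(len(flat))] + flat


clause = vector_clause(
    "FLAT",
    {"TYPE": "FLOAT32", "DIM": 768, "DISTANCE_METRIC": "COSINE"},
)
print(" ".join(clause))
```

Adding a fourth attribute would raise the count to 8, which is easy to get wrong when the command is written by hand.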

How do you perform a vector similarity search in Redis?

To perform a vector similarity search, encode the query using the same model used for the embeddings and execute a K-nearest neighbors (KNN) query. Here’s an example:

query = Query('(*)=>[KNN 3 @vector $query_vector AS vector_score]').sort_by('vector_score').return_fields('vector_score', 'id', 'brand', 'model', 'description').dialect(2)
encoded_query = embedder.encode(["Bike for small kids"]).astype(np.float32).tobytes()
results = client.ft("idx:bikes_vss").search(query, {'query_vector': encoded_query}).docs

What are some practical applications of Redis as a vector database?

Redis as a vector database is useful for applications such as:

  • Semantic search engines
  • Recommendation systems
  • Image and video similarity searches
  • Chatbots and conversational AI
  • Personalized content delivery

How do you handle data types when storing vectors in Redis?

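When storing vectors in Redis, use the same element type everywhere (e.g., np.float32) and make sure it matches the TYPE declared in the index. In JSON documents, as used in this guide, store each vector as an array of numbers. In hashes, and for the query vector passed as a search parameter, convert the vector to bytes. Mixing types, such as indexing FLOAT32 vectors but sending float64 bytes, can cause the query to fail or return misleading results.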
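To see what "converting a vector to bytes" means in practice, the sketch below packs a list of floats as 32-bit values using only the standard library. It assumes little-endian byte order, which is what NumPy's tobytes() produces on most common platforms. The helper names are made up for this example.

```python
import struct


def to_float32_bytes(vector):
    """Pack floats as little-endian 32-bit values, 4 bytes per dimension."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob):
    """Unpack a little-endian float32 blob back into a list of floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


blob = to_float32_bytes([0.5, -1.25, 2.0])
print(len(blob))                 # 4 bytes per dimension
print(from_float32_bytes(blob))  # these values are exact in float32

# Most decimals are only approximated in float32.
approx = from_float32_bytes(to_float32_bytes([0.1]))[0]
print(approx == 0.1)
```

A 768-dimensional FLOAT32 vector therefore occupies 3072 bytes, and a blob of any other length does not match an index declared with DIM 768.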

What are the advantages of using Redis Cloud for vector databases?

Using Redis Cloud for vector databases offers several advantages:

  • High availability and automatic failover
  • Simplified setup and maintenance
  • Scalability without managing infrastructure
  • Access to advanced Redis modules like RediSearch for vector similarity searches

Published by
Nilesh Payghan
