Redis Vector Database
A vector database efficiently stores, manages, and queries vector embeddings, which capture the semantic information of unstructured data like text and images. Redis, through Redis Stack, can serve as a vector database: it supports similarity search and machine learning workloads such as semantic search engines and recommendation systems.
Redis, traditionally known as a key-value store, has evolved to accommodate more complex data structures and operations. This guide covers how Redis can be used as a vector database, enabling efficient handling of unstructured data and performing vector similarity searches.
Vector databases handle unstructured data, which lacks a predefined schema: text, images, videos, or music. Vectorizing this data means mapping it to a sequence of numbers that represents it as a point in an N-dimensional space. Machine learning models produce these vectors, called embeddings, which capture complex patterns and semantic meaning, and this is why the approach is widely used.
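To make "mapping data to a sequence of numbers" concrete, here is a minimal sketch. It uses a toy word-count scheme rather than a learned embedding model, and the `vectorize` helper and the four-word vocabulary are made up for illustration. A real embedding model produces dense vectors that also place related words near each other, which a word count cannot do.

```python
def vectorize(text, vocabulary):
    """Map a text to one number per vocabulary term (a toy word count)."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


# Hypothetical 4-term vocabulary, so every text becomes a 4-dimensional vector.
vocabulary = ["bike", "kids", "mountain", "road"]

print(vectorize("Mountain bike for kids", vocabulary))  # [1, 1, 1, 0]
print(vectorize("Road bike", vocabulary))               # [1, 0, 0, 1]
```

Every text ends up with the same number of dimensions, which is what lets a database compare any two items numerically.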
To use Redis as a vector database, follow these steps:
```shell
pip install redis pandas sentence-transformers tabulate
```
```python
import redis

# Connect to a local Redis Stack instance; decode_responses returns str instead of bytes.
client = redis.Redis(host="localhost", port=6379, decode_responses=True)
```
```python
import requests

# Fetch the sample bikes dataset.
url = "https://raw.githubusercontent.com/bsbodden/redis_vss_getting_started/main/data/bikes.json"
response = requests.get(url)
bikes = response.json()
```
```python
# Store each bike as a JSON document under the keys bikes:001, bikes:002, ...
pipeline = client.pipeline()
for i, bike in enumerate(bikes, start=1):
    redis_key = f"bikes:{i:03}"
    pipeline.json().set(redis_key, "$", bike)
pipeline.execute()
```
```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Encode every bike description into a float32 embedding.
embedder = SentenceTransformer("msmarco-distilbert-base-v4")
descriptions = [bike["description"] for bike in bikes]
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()
VECTOR_DIMENSION = len(embeddings[0])
```
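```python
# The zero-padded suffix makes the sorted keys line up with the order of `embeddings`.
bike_keys = sorted(client.keys("bikes:*"))

# Add each embedding to its JSON document.
pipeline = client.pipeline()
for key, embedding in zip(bike_keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
```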
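```python
from redis.commands.search.field import NumericField, TagField, TextField, VectorField
# Module path in redis-py 4.x/5.x; newer releases may expose it as index_definition.
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

# Aliases (as_name) let queries use @vector, brand, model, and so on instead of JSON paths.
schema = (
    TextField("$.model", as_name="model"),
    TextField("$.brand", as_name="brand"),
    NumericField("$.price", as_name="price"),
    TagField("$.type", as_name="type"),
    TextField("$.description", as_name="description"),
    VectorField(
        "$.description_embeddings",
        "FLAT",
        {"TYPE": "FLOAT32", "DIM": VECTOR_DIMENSION, "DISTANCE_METRIC": "COSINE"},
        as_name="vector",
    ),
)
client.ft("idx:bikes_vss").create_index(
    fields=schema,
    definition=IndexDefinition(prefix=["bikes:"], index_type=IndexType.JSON),
)
```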
```python
import numpy as np
from redis.commands.search.query import Query

queries = ["Bike for small kids", "Best Mountain bikes for kids"]
encoded_queries = embedder.encode(queries)

# The KNN syntax and the $query_vector parameter require query dialect 2.
query = (
    Query("(*)=>[KNN 3 @vector $query_vector AS vector_score]")
    .sort_by("vector_score")
    .return_fields("vector_score", "id", "brand", "model", "description")
    .dialect(2)
)

for encoded_query in encoded_queries:
    result_docs = client.ft("idx:bikes_vss").search(
        query,
        {"query_vector": np.array(encoded_query, dtype=np.float32).tobytes()},
    ).docs
```
Redis Cloud offers a managed Redis service, simplifying the deployment and maintenance of Redis instances. This section explores the benefits and features of using Redis Cloud as a vector database, based on an experience shared during the Vector Search Hackathon organized by the MLOps Community, Redis, and Saturn Cloud.
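```python
from redis_om import Field, HashModel, Migrator


class Paper(HashModel):
    paper_id: str
    title: str
    year: int = Field(index=True)  # indexed so Paper.find() can filter on it
    authors: str
    categories: str
    abstract: str


Migrator().run()  # create the search index for the model

paper_instance = Paper(
    paper_id="0704.3780",
    title="Stochastic Optimization Algorithms",
    year=2006,
    authors="---",  # placeholder, like the abstract
    categories="cs.NE",
    abstract="---",
)
paper_instance.save()
```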
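```python
import numpy as np

# Structured query through Redis OM: papers published after 2004.
results = Paper.find(Paper.year > 2004).all()

# Vector query through redis-py. `search_index`, `redis_conn`, and `vec` are
# assumed to be defined elsewhere; they are not shown in this excerpt.
vector_query = search_index.vector_query(number_of_results=50)
redis_conn.ft("papers").search(
    vector_query,
    query_params={"vec_param": np.array(vec, dtype=np.float32).tobytes()},
)
```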
Redis, enhanced by RediSearch and managed through Redis Cloud, offers robust capabilities for handling and querying vector data. This transforms Redis from a simple key-value store into a powerful vector database, opening up new possibilities for applications in AI, search engines, and beyond.
A vector database is designed to store, manage, and query vector embeddings, which are numerical representations of data in an N-dimensional space. These embeddings capture the semantic information of unstructured data like text, images, and videos, enabling efficient similarity searches and other machine learning tasks.
Redis can function as a vector database through Redis Stack, which includes modules like RediSearch for managing and querying vector embeddings. Redis Stack allows you to store vectors, retrieve them, and perform vector similarity searches efficiently.
Vector embeddings are dense, fixed-length numerical representations of data generated by machine learning models. These embeddings capture the semantic meaning and patterns in the data, making it possible to perform tasks like similarity searches and clustering.
To set up Redis Stack for vector databases, you can either use Redis Cloud or install Redis Stack on your local machine. Ensure that the JSON and search modules are configured. For Python, you will need libraries like redis, pandas, sentence-transformers, and optionally tabulate.
You will need the following Python packages:
- redis: for Redis client interactions
- pandas: for data manipulation
- sentence-transformers: for generating embeddings
- tabulate (optional): for rendering Markdown tables

You can connect to a Redis server in Python by instantiating the Redis client with the appropriate host, port, and optionally, the password. For example:
```python
import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)
```
For Redis Cloud, use the provided connection string and credentials.
To create vector embeddings, first fetch the data, then use a pre-trained model like Sentence-BERT to generate embeddings. Store these embeddings in Redis inside the JSON documents. Here’s an example:
```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("msmarco-distilbert-base-v4")
keys = sorted(client.keys("bikes:*"))
# mget with a JSONPath returns one list per key, so flatten the result.
descriptions = [d for sub in client.json().mget(keys, "$.description") for d in sub]
embeddings = embedder.encode(descriptions).astype(np.float32).tolist()

pipeline = client.pipeline()
for key, embedding in zip(keys, embeddings):
    pipeline.json().set(key, "$.description_embeddings", embedding)
pipeline.execute()
```
Create an index using the FT.CREATE command with a vector field. Specify the JSON path, index type, and distance metric (e.g., cosine similarity). Here’s an example:
```
FT.CREATE idx:bikes_vss ON JSON
  PREFIX 1 bikes:
  SCHEMA
    $.model AS model TEXT
    $.brand AS brand TEXT
    $.price AS price NUMERIC
    $.type AS type TAG
    $.description AS description TEXT
    $.description_embeddings AS vector VECTOR FLAT 6 TYPE FLOAT32 DIM 768 DISTANCE_METRIC COSINE
```
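The `6` after `FLAT` is easy to get wrong: it is the number of attribute tokens that follow (three name/value pairs, so six tokens). The sketch below shows one way to derive that count instead of typing it by hand. The `vector_field_args` helper is hypothetical, and the `AS vector` alias is an assumption added so that queries can refer to the field as `@vector`.

```python
def vector_field_args(path, alias, algorithm, attributes):
    """Build the FT.CREATE schema tokens for a single vector field."""
    tokens = []
    for name, value in attributes.items():
        tokens.extend([name, str(value)])
    # The count is the number of tokens after it, not the number of pairs.
    return [path, "AS", alias, "VECTOR", algorithm, str(len(tokens)), *tokens]


args = vector_field_args(
    "$.description_embeddings",
    "vector",
    "FLAT",
    {"TYPE": "FLOAT32", "DIM": 768, "DISTANCE_METRIC": "COSINE"},
)
command_fragment = " ".join(args)
print(command_fragment)
```

Adding another attribute to the dictionary changes the count automatically, which avoids a mismatch between the number and the attributes that follow it.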
To perform a vector similarity search, encode the query using the same model used for the embeddings and execute a K-nearest neighbors (KNN) query. Here’s an example:
```python
import numpy as np
from redis.commands.search.query import Query

query = Query('(*)=>[KNN 3 @vector $query_vector AS vector_score]').sort_by('vector_score').return_fields('vector_score', 'id', 'brand', 'model', 'description').dialect(2)
encoded_query = embedder.encode(["Bike for small kids"]).astype(np.float32).tobytes()
results = client.ft("idx:bikes_vss").search(query, {'query_vector': encoded_query}).docs
```
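Conceptually, a KNN query against a FLAT index with the COSINE metric compares the query vector with every stored vector and keeps the closest k. The sketch below reproduces that idea in plain Python. It illustrates the technique and is not how Redis implements it; the three-dimensional vectors and the `knn` helper are made up. Redis reports cosine distance (1 minus cosine similarity), so smaller scores mean closer matches.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def knn(query, documents, k):
    """Scan every document and return the k closest as (distance, key) pairs."""
    scored = ((cosine_distance(query, vector), key) for key, vector in documents.items())
    return heapq.nsmallest(k, scored)


# Hypothetical stored embeddings, keyed like the bikes documents.
documents = {
    "bikes:001": [0.9, 0.1, 0.2],
    "bikes:002": [0.1, 0.9, 0.7],
    "bikes:003": [0.8, 0.2, 0.3],
}

results = knn([1.0, 0.0, 0.1], documents, k=2)
for distance, key in results:
    print(key, round(distance, 3))
```

Because every vector is scanned, the cost grows linearly with the number of documents.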
Redis as a vector database is useful for applications such as semantic search engines and recommendation systems.
When storing vectors in Redis, ensure that you consistently use the same data type (e.g., np.float32). Convert the vectors to bytes before storing them, and use the same data type when retrieving them to avoid discrepancies.
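The following sketch shows why the data type matters, using only the standard library `struct` module as a stand-in for NumPy's `astype(np.float32).tobytes()`. The helper names are hypothetical, and little-endian byte order is assumed, which matches NumPy's native order on common x86 and ARM machines.

```python
import struct


def to_float32_bytes(vector):
    """Pack a list of floats as little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob):
    """Unpack bytes produced by to_float32_bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


vector = [0.25, -1.5, 3.0]  # values chosen to be exactly representable in float32
blob = to_float32_bytes(vector)
as_float64 = struct.pack(f"<{len(vector)}d", *vector)  # same values as 64-bit floats

print(len(blob), len(as_float64))  # 12 24
print(from_float32_bytes(blob))    # [0.25, -1.5, 3.0]
```

The same three numbers take 12 bytes as float32 and 24 bytes as float64, so a vector serialized with the wrong type no longer matches the size implied by the index's `TYPE` and `DIM`.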
Using Redis Cloud for vector databases offers the advantages of a managed service: it simplifies the deployment and maintenance of Redis instances.