What Are Vector Databases?

Building AI-Powered Search and Recommendations

Louis Bouchard

Dec 12, 2024 • 7 min read

Watch the video:

How can Netflix or Spotify recommend the best series to watch or song to listen to, when it’s hard to find what we want even in a basic Excel sheet? While spreadsheets are pretty straightforward, platforms like Netflix know your interests and store them efficiently in their databases. And it’s not just a bunch of Excel sheets. A spreadsheet is like a traditional database, whereas vector databases power the recommendation systems we see today. In this article, we dive into what precisely a vector database is, how it stores and searches data, the difference between an index and a database, and the newest trends in vector databases. These are all really useful concepts for an AI engineer working with LLMs today. Let’s dive right in.

A traditional database, as I mentioned, is similar to spreadsheets. You have rows and columns, and when you try to search for something, you get exact matches. A vector database, on the other hand, tries to understand the context and gives you items or data based on similarity. Initially, all the items are stored as a vector index.

Here, indexing means organizing vector embeddings, which are numerical representations of data such as our images or text, so that similar items are stored close to each other and can be found efficiently. The data is then retrieved based on how similar it is to the question or input vector.

Let’s say you have a database of animal images. All the animals are grouped based on how similar they are. For example, two photos of a cat would be closer together than an image of a cat and a zebra. Suppose you want to find something similar to a tiger without tiger images in the database. In that case, you might get pictures of a cat because they are the next closest in similarity. This process of grouping, searching, and retrieving based on similarity is what we call indexing. Our vector index is simply our vectors after this indexing process is run, which just means our vectors are now properly indexed.
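To make the animal example concrete, here is a minimal sketch of similarity search. The three-dimensional embeddings and the names `animal_embeddings` and `tiger_query` are made up for illustration; real embeddings come from a model and have far more dimensions. The search below is brute force (it compares the query with every stored vector), whereas a real vector index organizes the vectors so that most comparisons can be skipped.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example.
animal_embeddings = {
    "cat_photo_1": [0.90, 0.10, 0.05],
    "cat_photo_2": [0.85, 0.15, 0.10],
    "zebra_photo": [0.10, 0.90, 0.30],
}

def most_similar(query, embeddings, top_k=1):
    """Brute-force search: score every stored vector against the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Made-up embedding of a tiger image; there is no tiger in the database.
tiger_query = [0.80, 0.30, 0.10]
nearest = most_similar(tiger_query, animal_embeddings, top_k=2)
print(nearest)
```

With these toy numbers, both cat photos rank above the zebra, which mirrors the "no tiger, so we get cats" behaviour described above.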

When a user sends a question to your LLM application, the question is converted into a vector, and the vector index efficiently searches the content and returns the most similar items. But, as we’ve said, there may not be any images of tigers. Or, even worse, we may find very similar images of cats that we assume to be tigers. Thus, people have introduced various ways to improve this database search by combining the vector index approach with old-school approaches, such as keyword filtering. Here, we basically filter down our database based on its metadata so that we already have fewer vectors to look into and compare. The filtering process can be performed either before or after the vector search itself.

If you choose to do this before, we call it pre-filtering. While this can help reduce the search space, it may also cause the system to overlook relevant results that don’t match the metadata filter criteria. Extensive metadata filtering may also slow the processing due to the added computational overhead.

If you choose to do the filtering after your vector search, we call it post-filtering. This ensures that all relevant results are considered first based on the context of the data itself. Still, it may also introduce additional overhead and slow the processing, as irrelevant results must be filtered out after the search is complete.
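The difference between the two orderings is easier to see in code. This is a minimal sketch with hypothetical records and a made-up `species` metadata field; the helper names are ours and do not correspond to any particular vector database API.

```python
def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical records: an embedding plus one metadata field.
records = [
    {"id": "a", "vec": [1.0, 0.0], "species": "cat"},
    {"id": "b", "vec": [0.9, 0.1], "species": "dog"},
    {"id": "c", "vec": [0.8, 0.2], "species": "dog"},
    {"id": "d", "vec": [0.2, 0.8], "species": "cat"},
]

def vector_search(query, candidates, top_k):
    """Return the top_k candidates ranked by similarity to the query."""
    ranked = sorted(candidates, key=lambda r: dot(query, r["vec"]), reverse=True)
    return ranked[:top_k]

def pre_filter_search(query, species, top_k):
    # Filter on metadata first, then search only the remaining vectors.
    allowed = [r for r in records if r["species"] == species]
    return [r["id"] for r in vector_search(query, allowed, top_k)]

def post_filter_search(query, species, top_k):
    # Search everything first, then drop results that fail the filter.
    hits = vector_search(query, records, top_k)
    return [r["id"] for r in hits if r["species"] == species]

query = [1.0, 0.0]
pre = pre_filter_search(query, "cat", top_k=2)
post = post_filter_search(query, "cat", top_k=2)
print(pre, post)
```

In this toy setup, pre-filtering returns two cats, while post-filtering returns only one because a dog took one of the two top slots before the filter ran. Real systems often work around this by asking the vector search for more than `top_k` results before filtering.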

In simple terms, a vector index is a search mechanism used to filter a database. But vector databases do more than just search — they handle various database management tasks differently.

First, a vector database provides access control, which is key in any database. In a company, different users need access to specific information. Typically, we manage this with metadata filtering. In a regular database, this is straightforward. In a vector database, the metadata filtering happens first, followed by the vector search. This sequence ensures users access only the information they’re authorized to see.

Another key point is making sure the system keeps running even if something goes wrong, which is known as fault tolerance. In a vector database, data is split across several nodes — think of them as storage units that each handle a chunk of the data. To keep things running smoothly, these nodes usually have backups. So, if one part of the database fails, the system can easily switch to a backup and keep going without missing a beat.
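Here is a minimal sketch of the failover idea, assuming a hypothetical `Node` class and a simple "try each replica in order" policy. Real systems add health checks, replication protocols, and consistency guarantees that are not shown here.

```python
class Node:
    """Hypothetical storage node holding a copy of one chunk of the data."""

    def __init__(self, name, vectors, healthy=True):
        self.name = name
        self.vectors = vectors
        self.healthy = healthy

    def fetch(self, key):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return self.vectors[key]

def fetch_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    for node in replicas:
        try:
            return node.fetch(key), node.name
        except ConnectionError:
            continue  # this copy is unavailable, try the next one
    raise RuntimeError("all replicas failed")

data = {"doc-1": [0.1, 0.2, 0.3]}
primary = Node("primary", data, healthy=False)  # simulate a failed node
backup = Node("backup", dict(data))             # backup holds its own copy
vector, served_by = fetch_with_failover([primary, backup], "doc-1")
print(served_by, vector)
```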

When you’re working with a large dataset and need to keep things running smoothly, vector databases use a technique called sharding. Basically, the data is split into smaller pieces called shards. When you run a query, each shard gets it, and the results are pulled together, making data retrieval fast and efficient.
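The "each shard gets the query, then results are pulled together" pattern is often called scatter-gather. Below is a minimal single-process sketch; the routing function `shard_for` and the toy data are assumptions for illustration, and in a real deployment each shard would live on a different node and be queried in parallel.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shard_for(item_id, num_shards):
    # Deterministic toy routing based on the bytes of the id.
    return sum(item_id.encode()) % num_shards

def build_shards(items, num_shards):
    """Split the items into num_shards smaller dictionaries."""
    shards = [dict() for _ in range(num_shards)]
    for item_id, vec in items.items():
        shards[shard_for(item_id, num_shards)][item_id] = vec
    return shards

def search_shard(shard, query, top_k):
    """Each shard returns its own local top_k as (score, id) pairs."""
    scored = [(dot(query, vec), item_id) for item_id, vec in shard.items()]
    return heapq.nlargest(top_k, scored)

def search_all(shards, query, top_k):
    """Scatter the query to every shard, then merge the partial results."""
    partial = []
    for shard in shards:
        partial.extend(search_shard(shard, query, top_k))
    return [item_id for _, item_id in heapq.nlargest(top_k, partial)]

items = {
    "v1": [1.0, 0.0],
    "v2": [0.7, 0.3],
    "v3": [0.0, 1.0],
    "v4": [0.9, 0.1],
    "v5": [0.2, 0.8],
}
shards = build_shards(items, num_shards=2)
top = search_all(shards, [1.0, 0.0], top_k=2)
print(top)
```

Because every shard returns its own top results, the merged list is the same as if we had searched all the vectors in one place.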

Vector databases are also built with reliability and ease of use in mind. They support regular backups to protect against data loss and offer APIs and SDKs that make it easier for developers to build and integrate their solutions. So, it’s all about making sure everything runs smoothly and is easy to work with.

Now that we understand how vector databases work, how do we choose the right one?

Firstly, look at its performance. Look at how many questions the database can process per second, how quickly it returns results (also known as its latency) and how fast new data can be added. Next, consider the type of indexing the vector database uses. Different vector databases use different methods under the hood. Some might be faster for certain types of queries, while others might be more efficient with memory usage. It’s like choosing the right tool for the job — each has its strengths depending on what you need. Another important factor to consider is how user-friendly the API support is. It can make a big difference for developers. As your data grows, you’ll need a database that can scale with it. There are two ways to scale: vertically, which means adding more powerful hardware, and horizontally, which means adding more nodes to your system. Look at how well it can scale and the options it provides. And, of course, we cannot forget about pricing. By considering these aspects, you’ll be better equipped to choose a database that works best for your task. You can easily find a ton of vector database options with a quick Google search, and we show our favourite ones in the course.

Now that you know how to pick the right vector database, it’s time to implement it! Let’s go through some best practices for implementing vector databases.

First, check if your vector database offers built-in functions to preprocess the data. Make sure your data is clean, with no missing values, and that you’re normalizing your data. Normalization scales your data to a common range, ensuring that each feature contributes equally to the distance calculations. This can significantly improve the accuracy of similarity searches.
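As one example of normalization, here is a minimal sketch of L2 normalization, which rescales each embedding to length 1. This is a common choice for embeddings, but it is our assumption here, not the only option (per-feature scaling such as min-max is another). The helper names are ours.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)  # leave all-zero vectors unchanged
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

short_vec = [1.0, 2.0]
long_vec = [10.0, 20.0]  # same direction, ten times the magnitude

# Without normalization, the raw dot product rewards the longer vector.
raw_self = dot(short_vec, short_vec)
raw_other = dot(short_vec, long_vec)

# After normalization, both vectors are treated as equally similar.
unit_short = l2_normalize(short_vec)
unit_long = l2_normalize(long_vec)
normalized_score = dot(unit_short, unit_long)
print(raw_self, raw_other, normalized_score)
```

Once vectors are unit length, the dot product equals the cosine similarity, so magnitude differences no longer distort the ranking.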

Next, optimize how user queries are handled. Instead of processing them one at a time, batch multiple queries together. Apply metadata filtering before performing a vector search; this reduces the number of vectors to be searched, speeding up retrieval. Leverage the parallel processing capabilities of your vector database by distributing user queries across multiple processors or nodes to improve response times. If similar questions come up frequently, cache their results to avoid redundant computations. All these practices can help improve the performance of your vector database over time, and we will go further into them in the course.
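Here is a minimal sketch of the batching and caching ideas using only the Python standard library. `DOCUMENTS`, `cached_search`, and `batch_search` are hypothetical names, and the cache only catches exact repeats; matching "similar" questions would need something extra, such as comparing query embeddings. Real batching also sends many queries in a single request to the database, which this in-process loop only imitates.

```python
from functools import lru_cache

# Hypothetical stored embeddings (tuples, so queries can be cache keys too).
DOCUMENTS = {
    "doc_cats": (0.9, 0.1),
    "doc_zebras": (0.1, 0.9),
}

search_calls = 0  # counts how often the "expensive" search actually runs

@lru_cache(maxsize=1024)
def cached_search(query):
    """Return the best-matching document id for a query vector (a tuple)."""
    global search_calls
    search_calls += 1
    return max(
        DOCUMENTS,
        key=lambda doc_id: sum(q * x for q, x in zip(query, DOCUMENTS[doc_id])),
    )

def batch_search(queries):
    """Answer several queries in one call; repeated queries hit the cache."""
    return [cached_search(tuple(q)) for q in queries]

results = batch_search([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(results, search_calls)
```

Three queries come in, but the search only runs twice because the third is an exact repeat of the first.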

Lastly, monitor your vector database regularly using different metrics to ensure there are no issues. Key metrics like query latency, which measures the time it takes to retrieve results, and throughput, which tells us how many queries the system can handle per second, help you understand if the database can efficiently manage a high volume of requests. Analyzing these metrics allows you to fine-tune the database configurations for better usability and user experience.
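Both metrics can be measured with a simple timing loop. This is a minimal sketch: `fake_search` is a stand-in for a real database call, and the metric names are our own. Many vector databases and monitoring tools report these numbers for you, so treat this as an illustration of what they mean.

```python
import time

def measure(search_fn, queries):
    """Run every query once and report latency and throughput."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "p95_latency_s": latencies[p95_index],  # 95% of queries were this fast or faster
        "throughput_qps": len(queries) / elapsed if elapsed > 0 else float("inf"),
    }

def fake_search(query):
    # Placeholder for a real vector search call.
    return sum(x * x for x in query)

stats = measure(fake_search, [[0.1, 0.2]] * 200)
print(stats)
```

Tracking a high percentile such as p95 alongside the mean is useful because a few slow queries can hide behind a good average.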

The next big trend in vector databases is serverless vector databases. Early vector databases have some big hurdles. One major issue is that storage and computing are tightly linked, making them expensive to run. Serverless databases fix this by separating them using something called geometric partitioning. This breaks the index into smaller pieces, so searches are more focused and compute resources are used only when needed, cutting down costs. However, this can sometimes increase the time it takes to get a response.
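To give an intuition for partitioning an index by geometry, here is a minimal sketch in which vectors are grouped by their nearest centroid and a query only scans the closest group. The hand-picked `CENTROIDS` and all function names are assumptions for illustration; this is not the implementation of any specific serverless product.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical partition centres, fixed by hand for this example.
CENTROIDS = {"left": [0.0, 0.0], "right": [10.0, 10.0]}

def nearest_centroid(vec):
    return min(CENTROIDS, key=lambda name: distance(vec, CENTROIDS[name]))

def build_partitions(vectors):
    """Assign every vector to the partition of its closest centroid."""
    partitions = {name: {} for name in CENTROIDS}
    for vec_id, vec in vectors.items():
        partitions[nearest_centroid(vec)][vec_id] = vec
    return partitions

def search(partitions, query, top_k=1):
    # Only the partition closest to the query is scanned.
    target = nearest_centroid(query)
    candidates = partitions[target]
    ranked = sorted(candidates, key=lambda vid: distance(query, candidates[vid]))
    return target, ranked[:top_k]

vectors = {"a": [1.0, 1.0], "b": [2.0, 0.5], "c": [9.0, 9.5], "d": [11.0, 10.0]}
partitions = build_partitions(vectors)
partition_used, hits = search(partitions, [9.5, 9.0])
print(partition_used, hits)
```

Since only one partition is touched per query, the others can stay in cheap storage until they are needed, which is where the cost savings come from. Loading a partition that is not yet in memory is also one plausible source of the extra response time mentioned above.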

Another challenge is handling multiple users on the same hardware without wasting resources. Let’s imagine this: you and your friend use Netflix. You stream HD movies every night, while your friend only watches a movie once a month. You need constant, high-speed data, which requires a lot of hardware, while your friend doesn’t need as much. If both of you were given the same amount of hardware, resources might be wasted on your friend’s end to meet your high-demand needs. To solve this, the system groups users with similar usage patterns together, ensuring that everyone gets the resources they need without impacting others.

Also, keeping data fresh and ready for quick searches can be tough for early databases, especially with large data volumes. When you add new data to a vector database, you want it to be searchable immediately. However, adding it to the main index can take some time, which can slow things down. To solve this, some systems use a freshness layer, like a temporary cache for new data. This allows for fast queries while the data is gradually integrated into the main index.
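The freshness layer can be sketched as a small buffer that is searched together with the main index. The `FreshIndex` class below is hypothetical and keeps everything in plain dictionaries; in a real system the main index would be an optimized structure and the merge would run in the background.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class FreshIndex:
    """Toy index with a main store plus a buffer for newly added vectors."""

    def __init__(self):
        self.main_index = {}   # stands in for the large, optimized index
        self.fresh_layer = {}  # small buffer holding recent writes

    def add(self, vec_id, vec):
        # New data lands in the fresh layer, so it is searchable right away.
        self.fresh_layer[vec_id] = vec

    def merge(self):
        """Move buffered vectors into the main index."""
        self.main_index.update(self.fresh_layer)
        self.fresh_layer.clear()

    def search(self, query, top_k=1):
        # Queries look at both the main index and the fresh layer.
        combined = {**self.main_index, **self.fresh_layer}
        ranked = sorted(combined, key=lambda vid: dot(query, combined[vid]), reverse=True)
        return ranked[:top_k]

index = FreshIndex()
index.add("old", [0.5, 0.5])
index.merge()                    # "old" now lives in the main index
index.add("new", [1.0, 0.0])     # "new" is only in the fresh layer
hits = index.search([1.0, 0.0])
print(hits)
```

The new vector is returned even though it has not been merged yet, which is exactly the behaviour the freshness layer is meant to provide.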

These features show that serverless vector databases offer flexible, cost-efficient, and scalable solutions, resulting in more accurate and thorough searches. They are quite an interesting option for most startups!

Whether you’re building recommendation systems like Netflix, Spotify, or any AI-driven application, vector databases provide the performance, scalability, and flexibility needed to handle large, complex datasets.

I know this article was packed with lots of tips and insights, but maybe a bit too much. Fortunately, we will detail all of these approaches in the course with clear and concrete code examples. I invite you to have a look at our practical RAG course for more insights and learnings.

Thank you for reading through to the end, and I hope you found this article useful!
