Agents need more from vector search than RAG ever did



What is the role of vector databases in the world of agentic AI? That's a question organizations have been grappling with in recent months. The story had real momentum: as large language models scaled to million-token context windows, a plausible argument spread among enterprise architects that purpose-built vector search was a stopgap, not infrastructure. Agents would handle the memory and retrieval problem themselves. Vector databases were an artifact of the RAG era.

Production evidence points the other way.

Qdrant, the Berlin-based open-source vector search company, announced a $50 million Series B on Thursday, two years after its $28 million Series A. The timing is not random: the company is also shipping version 1.17 of its platform. Together, they make a specific argument: the search problem did not shrink when agents arrived. It expanded and got more complicated.

"People make a few requests every few minutes," Qdrant CEO and co-founder Andre Zayarni told VentureBeat. "Agents make hundreds or even thousands of requests per second, just gathering information to make decisions."

That shift changes infrastructure requirements in ways RAG-era deployments were never designed to handle.

Why agents need a search layer that memory can’t replace

Agents work on data they have never been trained on: proprietary enterprise data, fresh data, millions of documents that change constantly. Context windows manage session state. They do not provide high-recall search over this data, maintain search quality as it changes, or sustain the query volumes that autonomous decision-making generates.

"Most AI memory frameworks out there use some form of vector memory," Zayarni said.

The implication is straightforward: even tools positioned as storage alternatives rely on the same underlying search infrastructure.

Three failure modes appear when this search layer is not built for the load. At document scale, a missed result is not a latency problem; it is a decision-quality problem, compounded at every search pass in an agent's chain. Freshness degrades under write load, because newly inserted data sits in unoptimized segments until indexing matures, making searches over the newest data slower and less accurate precisely when current data matters most. And in distributed infrastructure, a single slow replica pushes latency into every concurrent tool call in an agent's chain; latency a human user shrugs off as an inconvenience, an autonomous agent cannot.
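The compounding effect behind the first failure mode is easy to quantify. A back-of-the-envelope sketch, with purely illustrative numbers (none come from Qdrant or the article): if each retrieval pass has recall r and an agent chains k passes, the chance that every pass surfaced what it needed falls as r to the k, assuming the passes are independent.

```python
# Illustrative only: how per-pass recall compounds across an agent's
# chained retrieval steps. The numbers are invented for the example.

def chained_recall(per_pass_recall: float, passes: int) -> float:
    """Probability that every retrieval pass in a chain succeeded,
    assuming independent passes where each step depends on the last."""
    return per_pass_recall ** passes

# A 95%-recall search looks fine for a single human query...
print(round(chained_recall(0.95, 1), 3))   # 0.95
# ...but across a 20-step agent chain, a flawless run becomes a coin flip's worse.
print(round(chained_recall(0.95, 20), 3))  # 0.358
```

This is why a recall shortfall that is tolerable for human search becomes a decision-quality problem for agents: the misses multiply.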

Qdrant's 1.17 release addresses each directly. A relevance feedback query improves recall by adjusting similarity scores in a second search pass, using signals generated by a lightweight model, without retraining the deployed embedding model. A delayed fan-out feature queries a second replica when the first exceeds a configurable latency threshold. And a new cluster-wide telemetry API replaces node-by-node troubleshooting with a single view across the whole cluster.
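The delayed fan-out idea is a hedged-request pattern, and it can be sketched in a few lines of plain asyncio. This is a conceptual illustration only, not Qdrant's implementation; the replica functions, delays, and threshold are all invented for the example.

```python
import asyncio

# Conceptual sketch of delayed (hedged) fan-out: query the primary
# replica first, and only fan out to a backup if the primary has not
# answered within a latency threshold. All names and timings here are
# invented; this is not Qdrant's internal API.

async def query_replica(name: str, delay: float) -> str:
    await asyncio.sleep(delay)          # stand-in for a real search call
    return f"results from {name}"

async def delayed_fanout(threshold: float = 0.05) -> str:
    primary = asyncio.create_task(query_replica("primary", delay=0.2))
    done, _ = await asyncio.wait({primary}, timeout=threshold)
    if done:                            # primary answered within the threshold
        return primary.result()
    # Primary is slow: hedge with a backup and take whichever finishes first.
    backup = asyncio.create_task(query_replica("backup", delay=0.01))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()                   # drop the straggler
    return done.pop().result()

print(asyncio.run(delayed_fanout()))    # the fast backup wins here
```

The appeal of the delayed variant over always fanning out is cost: the second request is sent only when the first is already late, so the cluster pays the extra load only on the slow tail.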

Why Qdrant doesn’t want to be called a vector database anymore

Almost every major database now supports vectors as a data type, from hyperscalers to traditional relational systems. That shift changed the competitive question. The data type is now table stakes; what remains specialized is search quality at production scale.

That distinction is why Zayarni no longer wants Qdrant called a vector database.

"We are building a data retrieval layer for the era of artificial intelligence," he said. "Databases are for storing user data. If the quality of search results is important, you need a search engine."

His advice for teams just starting out: use the vector support already in your stack. Teams that migrate to dedicated search do so when the scale problem bites.

"We see companies come to us every day saying they started with Postgres and thought it was good enough – and it wasn’t."

Qdrant's architecture, written in Rust, gives it memory efficiency and low-level performance control that higher-level languages cannot match at the same cost. Its open-source foundation compounds the advantage: community feedback and developer adoption let a company of Qdrant's size compete with vendors that have far larger engineering teams.

"Without it, we wouldn't be where we are today," Zayarni said.

How two production teams found the limits of general-purpose databases

Companies building AI systems on Qdrant make the same argument from different angles: agents need a search layer, and conversational or context memory is no substitute for it.

GlassDollar helps businesses including Siemens and Mahle find and evaluate startups. Search is the core product: a user describes a need in natural language and gets a ranked shortlist from a corpus of millions of companies. The architecture performs query expansion on every request: a single query becomes multiple parallel queries, each retrieving candidates from a different angle, before the results are merged and reranked. This is an agentic search pattern, not a RAG pattern, and it takes purpose-built search infrastructure to sustain at volume.
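That fan-out-and-merge pattern can be sketched abstractly. The following is a toy illustration with an invented in-memory corpus and a keyword-overlap score standing in for vector similarity; it is not GlassDollar's pipeline, only the shape of it: expand, search each angle, merge, rerank.

```python
# Toy sketch of the agentic fan-out pattern: one query becomes several
# sub-queries, each probes the corpus from a different angle, and the
# candidate pools are merged and reranked. Corpus and scoring invented.

CORPUS = {
    "acme-robotics": {"keywords": {"robotics", "automation", "sensors"}},
    "deepparse":     {"keywords": {"nlp", "documents", "automation"}},
    "voltcell":      {"keywords": {"batteries", "energy"}},
}

def search(query_terms: set[str], k: int = 2) -> list[tuple[str, int]]:
    """One retrieval angle: score companies by keyword overlap."""
    scored = [(name, len(query_terms & doc["keywords"]))
              for name, doc in CORPUS.items()]
    return sorted(scored, key=lambda pair: -pair[1])[:k]

def expanded_search(query: str) -> list[str]:
    # Expansion step (hardcoded here; a real system would derive these
    # angles from the natural-language query).
    angles = [
        {"automation", "robotics"},   # capability angle
        {"documents", "nlp"},         # domain angle
    ]
    merged: dict[str, int] = {}
    for angle in angles:
        for name, score in search(angle):
            merged[name] = merged.get(name, 0) + score  # combine scores
    # Rerank the merged candidate pool.
    return [name for name, _ in sorted(merged.items(), key=lambda x: -x[1])]

print(expanded_search("startups automating document workflows"))
# -> ['deepparse', 'acme-robotics']
```

The point of the pattern is that one user intent generates several concurrent searches, which is exactly the load multiplication the article describes.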

The company migrated off Elasticsearch after reaching 10 million indexed documents. After switching to Qdrant, it cut infrastructure costs by about 40%, retired the keyword-based compensation layer it had maintained to cover Elasticsearch's recall gaps, and saw a 3x increase in user engagement.

"We measure success by recall," GlassDollar product manager Kamen Kanev told VentureBeat. "If the top companies aren't in the results, nothing else matters. The user loses trust."
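Recall as a success metric is simple to state: of the items known to be relevant, what fraction shows up in the top-k results? A minimal sketch, with an invented result list and relevance set:

```python
# Minimal recall@k metric, with invented example data.

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of known-relevant items appearing in the top k results."""
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

# Hypothetical example: three companies truly belong on the shortlist,
# and the search surfaced two of them in its top five.
relevant = {"acme", "deepparse", "voltcell"}
results = ["acme", "other1", "deepparse", "other2", "other3"]

print(recall_at_k(results, relevant, k=5))  # 2 of 3 relevant found
```

The metric makes the quote concrete: a missed top company shows up directly as a lower recall number, regardless of how fast or cheap the query was.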

Agent memory and extended context windows cannot absorb the workload GlassDollar needs, either.

"This is an infrastructure problem," Kanev said. "It's not something you solve by expanding the context window."

Another Qdrant user, &AI, builds infrastructure for patent litigation. Its AI agent, Andy, performs semantic searches across hundreds of millions of documents spanning decades and multiple jurisdictions. Patent attorneys will not act on AI-generated legal text, which means every conclusion the agent draws must be grounded in a real document.

"Our entire architecture is designed to minimize hallucination risk by making search the base primitive, not generation," &AI founder and CTO Herbie Turner told VentureBeat.

For &AI, the agent layer and the search layer are separate by design.

"Our patent agent Andy is built on Qdrant," Turner said. "The agent is an interface. The vector database is ground truth."

Three signals that it’s time to move beyond your current setup

A practical starting point: use the vector capability already in your stack. The evaluation question is not whether to add vector search; it is when your current setup stops being adequate. Three signals mark that point: search quality is directly tied to business outcomes; query patterns involve expansion, multi-stage reranking, or parallel tool calls; or data volume reaches tens of millions of documents.

At that point, the evaluation turns operational: how much visibility does your current setup give you into what is happening across a distributed cluster, and how much performance headroom does it have when agent request volume climbs?

"There is a lot of buzz right now about what will replace the search layer," Kanev said. "But for anyone building a product where search quality is a product, where missed results have real business consequences, you need a dedicated search infrastructure."



