Context architecture replaces RAG as agentic AI pushes enterprise search to its limits

Redis made its name as a caching layer that prevents web applications from crashing under load. The problem it targets now has the same structure, but is more difficult to solve: production AI agents fail not because the models are wrong, but because the data underlying them is scattered, outdated, and structured for people, not machines. Search pipelines built for single queries cannot handle the volume generated by agents.

The gap that Redis targets is structural: agents query more data than human users, but most search layers are built for the human-scale problem. Launched Monday, Redis Iris is the company’s answer: a context and storage platform that sits between an agent and the data it needs to operate on. The platform combines real-time data ingestion, a semantic interface that automatically generates MCP tools from business data models, and an agent storage server built on Redis Flex, a rewritten storage engine that runs 99% of data in flash at only one-tenth the cost of in-memory storage.

Enterprise RAG infrastructure is in an active transition phase. VentureBeat’s Q1 2026 VB Pulse RAG Infrastructure Market Tracker found that buyer intent for hybrid search tripled from 10.3% to 33.3% between January and March. For the first time, search optimization topped the ranking as the highest enterprise investment priority. Custom in-house search stacks rose from 24.1% to 35.6% as enterprises outgrew off-the-shelf options. Redis isn’t the only infrastructure vendor reading these signals; several data platform providers have made moves around agent context layers in recent weeks.

The scale mismatch is the structural argument behind the launch.

"Companies will have more agents than people," Redis CEO Rowan Trollope told VentureBeat. "More agents than people means orders of magnitude more load on back-end systems."

From cache to context

Trollope traces the parallel to the mobile era: when legacy backends built for branch checkouts suddenly had to serve a million smartphone users, Redis became the caching layer that absorbed the load without a complete overhaul of those backends.

What’s different this time is that agents can’t write their own middleware. In the mobile era, a developer would sit down with a database administrator, define the queries the application needed, and code the caching logic into the middleware layer. Agents cannot do this. They must find the right information at runtime through pre-built interfaces or they stall.

"It’s like the analogy of a grocery store and a refrigerator," he said. "If you have to go to the grocery store to buy food every time you have to go make your sandwich, that’s not very efficient. You put a refrigerator in every house and keep some food. And that’s where we still tend to exist in the infrastructure stack."

What is included in Redis Iris?

Iris ships with five components that together cover data ingestion, semantic access, storage, and caching.

Redis Data Integration. Now generally available. RDI uses change data capture pipelines to continuously sync data from databases, repositories, and document stores to Redis, with connectors for Oracle, Snowflake, Databricks, and Postgres.

Context Retriever. Now in preview. Developers define the semantic model of their business data using Pydantic models, and Redis automatically generates MCP tools that agents use to query that data directly, with row-level access control enforced on the server side. Trollope describes the transition from classical RAG as a directional inversion. "This is just a move to allow the agent to pull the data instead of pre-fetching it and stuffing it into the pipeline," he said.

Agent Memory. Now in preview. Saves short- and long-term state between sessions so that agents carry the context without retrieving it each turn.

Redis Flex. A rewritten storage engine that runs 99% of data on SSDs and 1% in RAM, delivering petabyte-scale retrieval at millisecond latencies.

Redis Search and LangCache. The search and semantic caching foundation underneath the platform. LangCache reduces unnecessary model calls by caching responses and reusing them for semantically similar requests.
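To make the semantic caching idea concrete, here is a minimal sketch, not LangCache's actual API. It assumes a toy bag-of-words vector in place of a real embedding model, and `SemanticCache`, `call_model`, and the 0.8 similarity threshold are all hypothetical. The point is only the shape: look for a sufficiently similar earlier prompt before paying for a model call.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache keyed by prompt similarity rather than exact text."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def lookup(self, prompt: str):
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))


model_calls = 0


def call_model(prompt: str) -> str:
    # Placeholder for an expensive model call; it only counts invocations.
    global model_calls
    model_calls += 1
    return f"answer to: {prompt}"


def answer(cache: SemanticCache, prompt: str) -> str:
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached  # cache hit: no model call
    response = call_model(prompt)
    cache.store(prompt, response)
    return response


cache = SemanticCache()
first = answer(cache, "What is the refund policy for orders?")
second = answer(cache, "what is the refund policy for orders")  # near-duplicate
third = answer(cache, "How do I reset my password?")            # unrelated
```

In this sketch the second prompt reuses the first response, so only two of the three requests reach the model. A production system would use real embeddings and would also need an expiry policy so cached answers do not outlive the data they describe.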

What analysts say

The data industry is generally moving in the same direction. Every major database vendor is making a context layer argument.

Traditional database vendors, including Oracle, are combining context and memory layers to bring relational databases into the age of agentic AI. Purpose-built vector database vendors, including Pinecone, are doing the same thing, creating a new knowledge layer for agentic AI context. Independent context layers such as View are also part of the emerging landscape.

Trollope frames Redis’ position as structurally distinct from this competition.

"No one else needs to lose for us to win," he said. Many Redis deployments already run MongoDB or Oracle as a system of record. Iris mirrors and caches those systems instead of displacing them. Redis is launching Iris with native connectors on the Snowflake marketplace.

Stephanie Walter, AI Stack Practice Lead at HyperFRAME Research, lays out the market context. "The market is converging to the same conclusion: agents don’t need more tokens or better models. They need a controlled, current, low-latency context," Walter said.

Her reading of Redis’ differentiation focuses on runtime, latency-sensitive operational state, and where Redis already sits on the stack with near real-time data.

"The pitch is not “better RAG” but “agents need live context, memory and fast search when they’re actually working,”" she said.

Whether it comes from Redis or another vendor, every context layer technology will have to solve a management challenge to succeed.

"Agentic AI will not scale in the enterprise if each agent becomes a new cost center, a new access risk, and a new management exception," Walter said. "The winning context layers will be those that run agents faster, cheaper, and more securely."

Getting context wrong is not an option for real-time clinical AI

Mangoes.ai is one of the companies that already has to answer these questions in production, in a setting where the cost of getting it wrong is measured in patient outcomes.

Amit Lamba, founder and CEO of Mangoes.ai, runs a real-time voice AI platform deployed in large healthcare facilities where patients and clinicians ask live questions about treatment, scheduling and case histories. Mangoes.ai built its stack on top of Redis from the ground up.

"Lookup, storage, and session state are all done through Redis, so we don’t stitch together separate tools and hope they talk to each other," Lamba said.

The problem that Iris’s dynamic memory capability solves is what happens in a complex session.

"Think about an hour-long group therapy session," Lamba said. "You need to know who said what, when, and be able to convey the correct information to the therapist in the moment. This is not a simple search problem."

The platform runs multiple specialized agents in parallel: one for entity identification, one for relationship grounding, and one for job history integration.

"The dynamic memory capability is almost perfectly suited to the problem we are solving," Lamba said.

What this means for businesses

For enterprises building their AI stack around RAG, the search layer that got them to production is no longer enough to keep them there.

The RAG era gives way to context architecture. The classic RAG model pushed data to the agent before the model was called. Production deployments change that: agents treat the data layer as a live resource rather than a preloaded payload, pulling what is needed at runtime through tool calls. Teams still optimizing RAG pipelines are solving last year’s problem.
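The difference between the two shapes can be shown with a small sketch. Everything here is hypothetical (the `inventory` store, `build_prompt_push`, and `get_stock` are illustrative names, not part of any Redis product). It assumes only that the underlying data can change while an agent is still working.

```python
# Toy data layer: a value that changes while the agent is working.
inventory = {"widget": 5}


def build_prompt_push(question: str, store: dict) -> dict:
    # Push shape: a snapshot of the data is copied into the prompt
    # before the model is called, so it is frozen at that moment.
    return {"question": question, "context": dict(store)}


def get_stock(store: dict, item: str) -> int:
    # Pull shape: a tool the agent calls at runtime; it reads the
    # store at the moment of the call.
    return store[item]


prompt = build_prompt_push("How many widgets are left?", inventory)

# The underlying data changes after the prompt was assembled.
inventory["widget"] = 2

stale = prompt["context"]["widget"]   # value frozen in the preloaded payload
live = get_stock(inventory, "widget") # value read through the tool call
```

The preloaded payload still reports the old count, while the tool call sees the current one. That gap is the practical argument for treating the data layer as a live resource, at the cost of an extra round trip per lookup.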

The semantic layer is now the production infrastructure. The model that defines the business entities, their relationships, and the access rules between them must be built, versioned, and maintained with the same discipline as the data pipeline. Most organizations do not have the staff or structure to do this. Enterprises that define context architecture now are the ones that won’t have to rebuild it when agent workloads scale.
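As a rough illustration of what maintaining such a layer involves, the sketch below derives a query tool from a typed entity model and applies a row-level access rule on the server side. It is not the Context Retriever API: standard-library dataclasses stand in for Pydantic models so the example has no dependencies, and `Order`, `make_query_tool`, and the region-based access rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Order:
    # Hypothetical business entity; real models would describe real data.
    order_id: str
    customer: str
    region: str
    total: float


# Toy in-memory rows standing in for data synced into the context layer.
ORDERS = [
    Order("o-1", "acme", "emea", 120.0),
    Order("o-2", "acme", "amer", 80.0),
    Order("o-3", "globex", "emea", 45.5),
]


def make_query_tool(model: type, rows: list) -> dict:
    """Derive a tool description and handler from a dataclass model."""
    allowed = {f.name for f in fields(model)}

    def handler(caller_regions: set, **filters) -> list:
        unknown = set(filters) - allowed
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        # Row-level access rule, applied before any filtering, so the
        # agent never receives rows outside the caller's regions.
        visible = [r for r in rows if r.region in caller_regions]
        return [
            r for r in visible
            if all(getattr(r, k) == v for k, v in filters.items())
        ]

    return {
        "name": f"query_{model.__name__.lower()}",
        "parameters": sorted(allowed),
        "handler": handler,
    }


order_tool = make_query_tool(Order, ORDERS)

# An agent acting for an EMEA-only caller asks for acme's orders.
emea_results = order_tool["handler"]({"emea"}, customer="acme")
```

Because the tool is generated from the model, renaming a field or changing an access rule changes what agents can ask for, which is why the model needs the same versioning discipline as the pipeline feeding it.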

The budget is already in motion. VB Pulse Q1 2026 data shows search optimization investment rising from 19% to 28.9% during the quarter, outpacing evaluation spend for the first time. Organizations that spent the previous year measuring search quality are now spending money to fix it. The context layer is an active purchase decision, not a roadmap element.

"The first buyer question should not be “Do I need a vector database, long context, memory, or a context engine?” It should be “What does this agent need to know, how recent does that knowledge need to be, who is allowed to access it, and what does each search cost?”" Walter said.
