Enterprise AI agents fail because they forget what they learned

RAG architectures are good at one thing: uncovering semantically relevant documents. They stop there.

A framework called the decision context graph aims to close the gap between retrieval and decision-making by giving agents structured memory, time-aware reasoning, and transparent decision logic. Rippletide, a startup in the Neo4j ecosystem, has built one. Its key capability is non-regression: agents can freeze validated action sequences and build on them over time.

“The key point you want is non-regressivity: how do you make sure you can incorporate previous discoveries when an agent creates something new?” said Yann Bilien, co-founder and chief scientist of Rippletide.

Why RAG doesn’t go far enough

Enterprise context is spread across ERP tools, logs, databases, vector stores, and policy documents. Generative AI tools can draw on all of these (via keyword search, SQL queries, or full RAG pipelines), but search has a ceiling.

For one, the retrieved information may not match the decision being made, which can lead to hallucinations. And even when agents take in the right information, there is often no guidance on how to reach a well-grounded decision.

In other words, RAG retrieves documents, not decision context. “Everybody starts with RAG: Pull the relevant documents, put them in the prompt, let the model figure it out,” says Wyatt Mayham of Northwest AI Consulting.

While this works well for chatbots, it “immediately breaks down” for agents that need to make decisions and take action, he noted. “The biggest thing developers struggle with is the gap between retrieval and execution.”

A retrieved document doesn’t tell the agent whether it still applies, whether it has been superseded, or whether a conflicting rule takes precedence, Mayham said. “Agents need decision context, not just information.”

In practice, this might mean knowing that a pricing exception has expired, that a safety policy only applies in certain jurisdictions, or that a standard operating procedure was updated last month. “Miss any of that and the agent makes a confident mistake,” Mayham said.

Without structured decision context, agents combine incompatible rules, invent constraints to fill gaps, and rely on what Bilien calls “probabilistic estimates on unlimited data.” Errors are hard to reproduce because builders can’t trace why the agent made a particular choice.

Compounding errors are also a real problem, Mayham said: a small miss rate at each step becomes “disastrous” across a multistep workflow. “That’s the main reason most enterprise agents never leave the pilot stage.”
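The following back-of-the-envelope Python sketch puts numbers on the compounding problem. It assumes every step succeeds independently with the same probability. That is a simplification, not something taken from the article. Under that assumption, end-to-end success equals the per-step accuracy raised to the number of steps. At 95% per step, a 10-step workflow therefore finishes cleanly only about 60% of the time.

```python
def end_to_end_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps with equal accuracy."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Print a small grid of per-step accuracies against workflow lengths.
for accuracy in (0.99, 0.95, 0.90):
    for steps in (1, 5, 10, 20):
        rate = end_to_end_success(accuracy, steps)
        print(f"{accuracy:.0%} per step, {steps:>2} steps -> {rate:.1%} end to end")
```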

How decision context graphs arrive at the appropriate response

A decision context graph addresses this by encoding a structured map of what applies, which rules govern, and when.

The framework is optimized for one question: "Given this situation, what context currently applies?" Time is treated as a first-class dimension: every rule, decision, and exception is tagged with the period during which it is in effect.
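This minimal sketch treats time as a first-class dimension. It assumes each rule or fact is stored as a version with a validity interval. The `Version` class, the `as_of` helper, and the example keys are hypothetical illustrations, not Rippletide's schema or API. Because the query answers "what was in effect on this date?", the same store can both explain past decisions and drive current ones.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One time-bounded version of a rule or fact (hypothetical schema)."""
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # exclusive end date; None means still in effect

    def in_effect(self, on: date) -> bool:
        return self.valid_from <= on and (self.valid_to is None or on < self.valid_to)


def as_of(versions: list[Version], key: str, on: date) -> Optional[Version]:
    """Return the version of `key` in effect on the given date, or None if none applies."""
    candidates = [v for v in versions if v.key == key and v.in_effect(on)]
    # If intervals overlap, prefer the version that took effect most recently.
    return max(candidates, key=lambda v: v.valid_from, default=None)


history = [
    Version("sop/returns", "v1: refunds within 30 days", date(2024, 1, 1), date(2024, 6, 1)),
    Version("sop/returns", "v2: refunds within 14 days", date(2024, 6, 1)),
    Version("pricing/acme-exception", "10% discount", date(2024, 1, 1), date(2024, 4, 1)),
]

then, now = date(2024, 3, 15), date(2024, 7, 1)
print(as_of(history, "sop/returns", then).value)      # v1: refunds within 30 days
print(as_of(history, "sop/returns", now).value)       # v2: refunds within 14 days
print(as_of(history, "pricing/acme-exception", now))  # None: the exception has expired
```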

“The goal is to explicitly resolve missing, inconsistent, or conflicting data when building the graph, to avoid potential [errors] once the agent is running,” Bilien said.

The system is based on three principles:

Applicability: Logic is encoded explicitly, so the agent knows which rules to retrieve and apply in a given situation. Context is returned only when its conditions are met.
Time-aware memory: Every rule, decision, and exception is time-bound. This lets agents reason about "what was true then and what is true now," and replay or explain their decisions.
Decision traces: The system can explain how it got from A to B and the "why" behind it (for example, why one piece of context was included and another was not). Agents are also given "decision path" examples of how similar cases were handled before.
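The applicability and decision-trace principles can be sketched together. Each context item carries explicit conditions, and the selection step records why every item was included or excluded. The `Rule` class, the `select_context` function, and the policy names are hypothetical. This flat-list version only illustrates the logic, not how a graph database would store it.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A context item with explicit applicability conditions (hypothetical schema)."""
    rule_id: str
    text: str
    conditions: dict = field(default_factory=dict)  # attribute -> required value


def select_context(rules: list[Rule], situation: dict) -> tuple[list[Rule], list[tuple[str, str, str]]]:
    """Return the applicable rules plus a trace explaining every include/exclude decision."""
    included, trace = [], []
    for rule in rules:
        mismatches = [
            f"{attr}={situation.get(attr)!r} (requires {required!r})"
            for attr, required in rule.conditions.items()
            if situation.get(attr) != required
        ]
        if mismatches:
            trace.append((rule.rule_id, "excluded", "; ".join(mismatches)))
        else:
            included.append(rule)
            reason = "all conditions met" if rule.conditions else "unconditional"
            trace.append((rule.rule_id, "included", reason))
    return included, trace


rules = [
    Rule("safety-ca", "CA-specific safety policy", {"jurisdiction": "CA"}),
    Rule("safety-tx", "TX-specific safety policy", {"jurisdiction": "TX"}),
    Rule("ppe-general", "Company-wide PPE policy"),
]
included, trace = select_context(rules, {"jurisdiction": "CA", "site": "warehouse"})
for rule_id, decision, reason in trace:
    print(f"{rule_id:<12} {decision:<9} {reason}")
```

Keeping the trace alongside the result is what makes a choice explainable after the fact: the excluded TX policy carries its own reason instead of silently disappearing.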

During setup, unstructured data is ingested and structured into an ontology: which entities exist, which rules apply, and what the exceptions are. A neuro-symbolic approach handles both the pattern recognition and the encoding of formal, machine-readable logic. Over time, the system refines its knowledge base as new decisions are made.

“Neuro-symbolic consists of two parts: a neural part that gives agents great autonomy, and a symbolic part to reduce the amount of information needed and provide control,” Bilien said.

Agent behaviors are tested at build time (pre-production) to validate them and refine improvements. This reduces risk and compute requirements at inference time, he said.

Agents that learn without regressing

When it comes to non-regression, the key is convergence of both intelligence (the models) and knowledge (shared across agents), Bilien said. It’s important that agents can explore. When they don’t know how to perform a task, they can try different possibilities, usually in a controlled environment or simulation (such as a support bot that tries multiple response patterns).

Then, “once a solution is evaluated as satisfactory, the graph freezes that sequence of actions,” Bilien said. Future exploration then starts from this “stable base of proven behaviors” to prevent newly acquired skills from overwriting previously learned good behavior.
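This simplified sketch illustrates the "freeze and build on it" idea. It assumes behaviors are keyed by a situation label and that an external evaluator decides what counts as satisfactory. `BehaviorLibrary`, `act`, and the action names are hypothetical. A frozen sequence is stored as an immutable tuple and cannot be overwritten. Later exploration therefore happens only for situations that have no proven behavior yet.

```python
from typing import Callable, Iterable, Optional


class BehaviorLibrary:
    """Hypothetical store of validated action sequences; frozen entries are never overwritten."""

    def __init__(self) -> None:
        self._frozen: dict[str, tuple[str, ...]] = {}

    def freeze(self, situation: str, actions: Iterable[str]) -> tuple[str, ...]:
        if situation in self._frozen:
            raise ValueError(f"{situation!r} already has a frozen behavior")
        self._frozen[situation] = tuple(actions)  # tuples are immutable once stored
        return self._frozen[situation]

    def lookup(self, situation: str) -> Optional[tuple[str, ...]]:
        return self._frozen.get(situation)


def act(
    library: BehaviorLibrary,
    situation: str,
    explore: Callable[[str], Iterable[list[str]]],
    is_satisfactory: Callable[[str, list[str]], bool],
) -> Optional[tuple[str, ...]]:
    """Reuse a proven behavior if one exists; otherwise explore and freeze the first good one."""
    proven = library.lookup(situation)
    if proven is not None:
        return proven
    for candidate in explore(situation):
        if is_satisfactory(situation, candidate):
            return library.freeze(situation, candidate)
    return None  # nothing satisfactory found, so nothing is frozen


explore_calls = []


def explore(situation):
    """Hypothetical explorer that proposes candidate action sequences in turn."""
    explore_calls.append(situation)
    yield ["apologize", "close_ticket"]
    yield ["look_up_order", "issue_refund", "confirm_with_customer"]


def is_satisfactory(situation, actions):
    return "confirm_with_customer" in actions  # stand-in for a real evaluator


library = BehaviorLibrary()
first = act(library, "refund_request", explore, is_satisfactory)
second = act(library, "refund_request", explore, is_satisfactory)  # reuses, no exploration
print(first)
print(first == second, len(explore_calls))
```

In this sketch, reuse is an exact lookup. Generalizing a frozen sequence to similar situations, which the article also describes, would require a similarity measure that is not shown here.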

Before acting on or affecting a customer, the agent checks the graph: Does this violate a rule? Is it hallucinating? Is it staying within limits? Can the solution generalize to similar cases?
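The pre-action checks can be sketched as a guard that returns the list of failed checks. The forbidden-action list, the set of known sources, and the spending limit are all hypothetical stand-ins. The known-source check serves here as a crude proxy for "is it hallucinating?". The generalization question is left to a separate evaluation step.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    """An action the agent wants to take (hypothetical fields)."""
    name: str
    amount: float
    cited_sources: list[str] = field(default_factory=list)


def pre_action_checks(
    action: ProposedAction,
    *,
    forbidden_actions: set[str],
    known_sources: set[str],
    max_amount: float,
) -> list[str]:
    """Return the checks the action fails; an empty list means it may proceed."""
    failures = []
    if action.name in forbidden_actions:
        failures.append(f"rule violation: {action.name!r} is not allowed")
    # Crude grounding proxy: the action must cite at least one source,
    # and every cited source must exist in the graph.
    unknown = [s for s in action.cited_sources if s not in known_sources]
    if not action.cited_sources:
        failures.append("ungrounded: no sources cited")
    elif unknown:
        failures.append(f"ungrounded: unknown sources {unknown}")
    if action.amount > max_amount:
        failures.append(f"out of bounds: {action.amount} exceeds limit {max_amount}")
    return failures


policy = dict(forbidden_actions={"close_account"}, known_sources={"sop/returns"}, max_amount=100)
print(pre_action_checks(ProposedAction("issue_refund", 40.0, ["sop/returns"]), **policy))
print(pre_action_checks(ProposedAction("issue_refund", 250.0, ["made-up-policy"]), **policy))
```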

At the macro level, the system evaluates outcomes: Did the behavior improve long-term performance? Did it generalize to similar contexts? Did it retain previous capabilities?
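The macro-level questions map naturally onto a release gate that compares per-task success rates before and after an update. `evaluate_update`, its thresholds, and the task names are illustrative assumptions:

- "improved" means at least one task now scores higher;
- "generalized" means held-out similar cases clear a minimum score;
- "retained" means no previously measured task drops by more than the tolerance.

```python
def evaluate_update(
    before: dict[str, float],
    after: dict[str, float],
    similar_cases: list[str],
    *,
    tolerance: float = 0.01,
    min_generalization: float = 0.9,
) -> dict[str, bool]:
    """Hypothetical release gate answering the three macro-level questions."""
    # Improved: at least one task (old or new) now scores higher than before.
    improved = any(score > before.get(task, 0.0) for task, score in after.items())
    # Generalized: held-out similar cases all clear a minimum success rate.
    generalized = all(after.get(task, 0.0) >= min_generalization for task in similar_cases)
    # Retained: no previously measured task drops by more than the tolerance.
    retained = all(after.get(task, 0.0) >= score - tolerance for task, score in before.items())
    return {
        "improved": improved,
        "generalized": generalized,
        "retained": retained,
        "accept": improved and generalized and retained,
    }


before = {"refunds": 0.97, "shipping": 0.92}
good = {"refunds": 0.97, "shipping": 0.93, "warranty": 0.95}
bad = {"refunds": 0.90, "shipping": 0.99, "warranty": 0.95}  # gains shipping, loses refunds
print(evaluate_update(before, good, ["warranty"]))
print(evaluate_update(before, bad, ["warranty"]))
```

The second update is rejected even though one task improves sharply. That is the non-regression property the article describes: new gains do not excuse lost capabilities.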

“This determinism is key for agents to manage reliability at scale,” Bilien said. The result is behavior that is more consistent, predictable, explainable, and auditable, with stronger control.

“You want your agents to be able to learn for themselves when they encounter something they don’t know,” he said. “You want them to be able to explore and find new solutions.”

Moving past "episodic" learning

Although the team initially assumed it would deploy reinforcement learning (RL) everywhere, "it was actually very difficult in an enterprise setting," Bilien said. "Data is scarce for some specific use cases and messy for others."

Historically, turning raw data into something reliable enough for predictions was a manual, time-consuming challenge, but “now we’ve entered a new era where building ontologies automatically with agents is possible,” Bilien said.

Classical supervised fine-tuning techniques can cause oscillations, with models forgetting the last skill they learned while learning the next one. In general, learning is not compositional, compression is “dramatic,” and models improve “episodically” rather than continuously, so they keep failing on new or unprecedented tasks.

As Bilien points out: “If you have a regression every time, you’ll never have a complete self-learning model.”

In enterprise use cases such as banking, where millions of transactions are processed daily, a high level of reliability is essential, he noted. “One question I ask all customers: Is 95% enough? In many use cases, it’s not. You need 99.999%. A 1% difference is huge.”
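The gap between those reliability levels is easier to see in absolute numbers. This sketch assumes an illustrative one million transactions per day; the article says only "millions." It uses `Decimal` so the rates are exact.

```python
from decimal import Decimal


def expected_failures(transactions: int, success_rate: str) -> Decimal:
    """Expected failed transactions; the rate is a string so Decimal arithmetic stays exact."""
    rate = Decimal(success_rate)
    if not Decimal(0) <= rate <= Decimal(1):
        raise ValueError("success_rate must be between 0 and 1")
    return transactions * (1 - rate)


DAILY_TRANSACTIONS = 1_000_000  # illustrative volume; the article says only "millions"
for rate in ("0.95", "0.99", "0.99999"):
    failures = expected_failures(DAILY_TRANSACTIONS, rate)
    print(f"{rate} success rate -> {failures:,.0f} failed transactions per day")
```

At that volume, 95% reliability leaves 50,000 failures a day, 99% leaves 10,000, and 99.999% leaves 10.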

Decision context graphs can close this gap, he argues: when the same customer support question is asked over and over, the agent gives a “satisfactory” answer predictably and without regression, while retaining its autonomy.

Encoding applicability and temporal validity into a structured graph, rather than relying on the LLM to infer them, is a "sound approach" to a real limitation of existing retrieval frameworks, Mayham said. The open question is whether automatic ontology generation will hold up against the messy, heterogeneous data that enterprises actually have. "That's always the hard part," he said.
