As enterprises face reliability challenges, AI agents are entering a period of reinvention



As enterprise AI agents move into production, organizations face a growing reliability challenge. Many teams are discovering that LLM performance alone does not determine whether agents are successful in production. End-to-end AI workflows must survive crashes, maintain state, recover from failures, manage inference costs, and coordinate across APIs, tools, and enterprise systems.

After the first wave focused on rapid deployment, organizations must now revisit first-generation applications and redesign their initial agent architectures around workflow orchestration, observability, management and recovery, said Preeti Somal, senior vice president of Temporal Technologies, during the recent AI Impact Series event in New York.

“We have a lot of customers who come to us where they’re building version 2.0 of the same agent,” Somal said. “They had to move really fast, but they didn’t care about the plumbing. Things fall apart and burn and then they go back to rebuilding with a solid foundation.”

For Temporal, a workflow orchestration company whose infrastructure predates the current wave of agentic AI, the shift reflects a broader enterprise realization: production AI systems require durable execution, state management, visibility into workflows, and mechanisms to recover when models or downstream systems fail.

Agentic AI has amplified familiar engineering challenges

“These examples are not necessarily new,” Somal said. “The AI just powers them up.”

Agentic systems introduce additional complexity because they often involve long-running, multi-step processes that span multiple services, models, APIs, and tools. A single workflow can call several large language models, access search engines, launch external applications, and manage state over hours or days. The engineering questions, Somal said, often come up only after deployment.

“People will write agents, but they haven’t thought about what happens if the agent crashes,” Somal said. “Should I run the entire agent flow again?”

For businesses operating under cost constraints, the answer matters. Restarting workflows after failures can drive up inference costs, add latency, and create a poor customer experience.

Somal compared the present moment to an earlier era of enterprise cloud adoption, when organizations migrated workloads first and only later realized they would have to redesign the underlying architectures if those workloads were to hold up over the long term.

“The rush to do AI in a world where you haven’t even modernized your application reminds me a little bit of the lift and shift that’s happening in the cloud,” Somal said. “Everybody realized that you spend more money in the cloud, and we haven’t gotten any value there.”

Why long-running agents are forcing a new architecture

Enterprise workflows increasingly involve agents that execute over long windows, sometimes spanning many hours, while interacting with tools and systems. Reliability becomes difficult when workflows persist over time, and the difficulty touches both state and memory, two ideas that are often used interchangeably in AI conversations.

State refers to the execution of the workflow: which step the agent is on, which actions have already completed, and where to resume after a failure. Memory, or context, captures the information an agent carries between interactions or tasks.

“You want to recover the state of the agent, what step and what actions were taken, and if something crashes, where relative to the context and memory fragment,” Somal said.
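The distinction between state and memory can be made concrete with a minimal sketch. The class names, step names, and fields below are hypothetical and chosen only for illustration; they do not come from Temporal or from any customer system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExecutionState:
    """Where the workflow is: which steps finished and where to resume."""
    steps: List[str]
    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        # The resume point is the first step that has not completed yet.
        for step in self.steps:
            if step not in self.completed:
                return step
        return None


@dataclass
class AgentMemory:
    """What the agent knows: context carried between interactions or tasks."""
    notes: Dict[str, str] = field(default_factory=dict)


# Hypothetical three-step workflow.
state = ExecutionState(steps=["transcribe", "summarize", "draft_note"])
memory = AgentMemory()

# After the first step finishes, both records change, but for different reasons.
state.completed.append("transcribe")                         # progress
memory.notes["transcript"] = "(hypothetical transcript text)"  # context

resume_at = state.next_step()  # "summarize"
```

If the process crashes at this point, the state record says where to pick up, while the memory record says what the agent already knew when it stopped.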

This distinction becomes increasingly important as enterprises move beyond simple chatbot interactions to longer-running business processes. Somal pointed to a healthcare example from customer Abridge, where a workflow processes physician visits in multiple stages, including audio processing, summarization, model calls and post-visit summary generation.

“There’s not just one piece of this flow,” Somal said. “Taking and slicing videos, getting summaries, calling LLMs, creating post-visit summaries, it’s all organized.”

The implication for enterprises is that successful agents increasingly depend on systems that can withstand disruptions, coordinate across services, and maintain continuity over time.

The rise of the deterministic backbone

A useful framework for enterprise AI design is the deterministic backbone, which is how Somal described Temporal’s role.

“It means the way you want to go,” Somal said. “It summons the brain, but if the brain doesn’t respond, it will summon it again. If the brain responds, but the next step fails, it will continue from where it failed.”

In this framework, the language model acts as a probabilistic system that produces variable results, while the orchestration software maintains execution reliability around it. The concept is important because enterprise systems require more and more consistency even as models remain non-deterministic. A procurement workflow, health care summary, customer support escalation, or compliance process cannot simply fail silently because a model call times out or an external dependency crashes.
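A minimal sketch of that idea, using only the Python standard library, might look like the following. It is not Temporal's API. The function `run_workflow`, the JSON checkpoint file, the step names, and the simulated failures are all assumptions made for illustration. The orchestration loop is fixed and predictable, while the step functions stand in for model calls and downstream systems that may fail.

```python
import json
import tempfile
from pathlib import Path


def run_workflow(steps, checkpoint_path, max_attempts=3):
    """Run steps in order, retrying failures and skipping steps already recorded."""
    path = Path(checkpoint_path)
    # Load results of steps that completed in an earlier run, if any.
    done = json.loads(path.read_text()) if path.exists() else {}
    for name, fn in steps:
        if name in done:
            continue  # finished before the crash; do not pay for it again
        for attempt in range(1, max_attempts + 1):
            try:
                done[name] = fn(done)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; the checkpoint keeps everything finished so far
        # Persist after every step so a later crash resumes from here.
        path.write_text(json.dumps(done))
    return done


calls = {"model": 0, "publish": 0}


def call_model(results):
    """Hypothetical model call that times out once, then responds."""
    calls["model"] += 1
    if calls["model"] == 1:
        raise TimeoutError("model did not respond")
    return "summary text"


def publish(results):
    """Hypothetical downstream call that is unavailable for its first three tries."""
    calls["publish"] += 1
    if calls["publish"] <= 3:
        raise ConnectionError("downstream system unavailable")
    return "published: " + results["summarize"]


steps = [("summarize", call_model), ("publish", publish)]
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_workflow(steps, checkpoint)       # model is retried; publish exhausts its retries
except ConnectionError:
    pass
result = run_workflow(steps, checkpoint)  # second run resumes at the publish step
```

In this sketch the model is called twice in total, once for the timeout and once for the successful retry. The second run never calls it again, because the checkpoint already holds the summary.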

“The most important thing you care about is making sure you recover and that you don’t pay taxes if something goes wrong,” Somal said.

Reliability, visibility and the economics of token spending

As enterprise leaders evaluate AI ROI, cost visibility is a growing concern. Long-running agents often make multiple model calls across complex workflows, which can create opaque spending patterns. Somal described one operational advantage of orchestration as visibility into where costs are accumulating. Because workflows can be observed step-by-step, teams can see where tokens are consumed in the agent process.

“You have a view of the entire stream in a single pane of glass,” Somal said. “You can now see where you’re spending tokens in an agent that takes just a few steps and makes many different system calls.”
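Step-level cost visibility can be sketched as a simple ledger that attributes token usage to the workflow step that incurred it. The `TokenLedger` class, the step names, and the token counts below are hypothetical and exist only to illustrate the idea; a real system would take these numbers from the model provider's usage metadata.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates token usage per workflow step."""

    def __init__(self):
        self.by_step = defaultdict(int)

    def record(self, step, tokens):
        self.by_step[step] += tokens

    def report(self):
        # Most expensive steps first.
        return sorted(self.by_step.items(), key=lambda kv: kv[1], reverse=True)


ledger = TokenLedger()
ledger.record("summarize", 1200)
ledger.record("draft_note", 800)
ledger.record("summarize", 300)  # a second model call inside the same step

top_step, top_tokens = ledger.report()[0]  # ("summarize", 1500)
```

Grouping by step rather than by individual call is a design choice: it answers the question of which part of the agent's process is expensive, which is the question a team tuning a workflow usually starts with.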

Workflow recovery also improves cost efficiency. Without durable orchestration, a late-stage failure can force organizations to redo the entire process, including all previous model invocations. Systems designed around recovery can continue executing from the point of disruption, Somal said.

“You take it from the scene of the accident,” Somal said. “We save you the cost of running the agent again from the first step.”
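The saving is simple arithmetic, as in the sketch below. The per-step token counts are made-up numbers, not measurements, and the functions count only the tokens spent on the recovery pass, assuming each step's cost is the same on every run.

```python
def rerun_cost(step_costs):
    """Tokens spent when a failed run starts over from the first step."""
    return sum(step_costs)


def resume_cost(step_costs, failed_index):
    """Tokens spent when execution continues from the failed step."""
    return sum(step_costs[failed_index:])


# Hypothetical token cost of each step in a four-step workflow.
step_costs = [1000, 4000, 2500, 500]
failed_index = 3  # the last step failed

saved = rerun_cost(step_costs) - resume_cost(step_costs, failed_index)  # 8000 - 500 = 7500
```

The later the failure occurs, the larger the gap, because more completed work would otherwise be repeated.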

Enterprises must build paved roads and engage the expertise of partners

Governance is another concern emerging with the spread of agentic AI. Instead of adopting fully managed agent systems wholesale, enterprises increasingly want standardized internal frameworks that enforce safeguards while preserving flexibility, with features such as management controls, model selection policies, identity systems, cost management and observability, Somal said.

“Businesses are looking at building these paved roads,” Somal said. “Taking something off the shelf might not work because there are all these other requirements.”

As organizations revisit first-generation deployments, challenges like these look less like modeling problems and more like systems engineering problems. Temporal is positioned to help enterprises take that next step in part because, at many organizations, it was already in place as part of broader modernization programs before AI became a strategic priority.

“Temporal is already in the facility,” Somal said. “It feels very natural to take that and extend it to AI and agent platforms.”


