
Getting AI agents to work reliably in production, not just in demos, is proving more difficult for enterprises than expected. Fragmented data, uncertain workflows, and runaway escalation rates slow deployment across industries.
“The technology itself often works well in demonstrations,” said Sanchit Vir Gogia, senior analyst at Greyhound Research. “The challenge begins when it is required to operate within the complexity of a real organization.”
Burley Kawasaki, who oversees agent deployment at Creatio, and his team have developed a methodology built around three disciplines: data virtualization to work around data-lake latency; treating agents as digital workers with their own dashboards, KPIs, and management layer; and tight tuning loops that gradually move agents toward higher autonomy.
In simpler use cases, Kawasaki says these practices have allowed agents to handle 80-90% of tasks on their own. With further tuning, he estimates agents could operate autonomously in at least half of use cases, even in more complex deployments.
“People did a lot of testing with proof of concepts, a lot of testing out there,” Kawasaki told VentureBeat. “But now in 2026, we’re starting to focus on critical workflows that either drive operational efficiency or generate additional revenue.”
Why agents fail in production
Enterprises are eager to adopt agentic AI in one form or another (often out of fear of being left behind, sometimes before they have identified tangible real-world use cases), but they face significant bottlenecks around data architecture, integration, monitoring, security, and workflow design.
Gogia said the first hurdle is almost always data. Enterprise information rarely exists in a neat or uniform form; it is scattered across SaaS applications, internal databases, and other data stores, some structured and some not.
Even when enterprises solve the data-retrieval problem, integration remains a major challenge. Agents rely on APIs and automation hooks to interact with applications, but many enterprise systems were built long before such autonomous interaction was a possibility, Gogia noted.
This can result in incomplete or inconsistent APIs, and systems can react unpredictably when accessed programmatically. Organizations also face challenges when attempting to automate processes that have never been formally defined, Gogia said.
“Many workflows depend on tacit knowledge,” he said. That is, employees know how to handle exceptions they’ve seen before without explicit instructions, but those unwritten rules become glaringly apparent as gaps when a workflow has to be translated into automation logic.
Tuning loop
Kawasaki explained that Creatio launches agents inside a “limited framework with clear safeguards,” followed by a tuning and validation phase. Teams review initial results, adjust as necessary, then retest until accuracy reaches an acceptable level.
This loop usually follows this pattern:

- Design-time tuning (before go-live): Performance is improved through prompt engineering, context grounding, role definitions, workflow design, and grounding in data and documentation.

- Human-in-the-loop correction (at runtime): People approve, edit, or resolve exceptions. Where humans intervene most often (escalations or approvals), teams define stronger rules, provide more context, update workflow steps, or narrow the agent’s tool access.

- Ongoing optimization (after go-live): Teams continue to monitor exception rates and outcomes, then adjust iteratively as needed, improving accuracy and autonomy over time.
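The loop above can be sketched in code. The following is a hedged illustration of the pattern, not Creatio's implementation; the confidence threshold, rule format, and callback signatures are all assumptions made for the example:

```python
# Illustrative sketch: a minimal human-in-the-loop tuning cycle. The agent
# attempts each task; low-confidence results are escalated to a person, and
# the human's correction is folded back into the rule set for future runs.
from dataclasses import dataclass, field

@dataclass
class TuningLoop:
    confidence_threshold: float = 0.8          # below this, escalate to a human
    rules: list = field(default_factory=list)  # rules accumulated from corrections
    escalations: int = 0
    completed: int = 0

    def handle(self, task, agent_fn, human_review_fn):
        """Run one task through the agent, escalating when confidence is low."""
        result, confidence = agent_fn(task, self.rules)
        if confidence < self.confidence_threshold:
            # Runtime human-in-the-loop: a person approves, edits, or resolves
            # the exception, and the correction becomes a new rule.
            self.escalations += 1
            result, new_rule = human_review_fn(task, result)
            if new_rule:
                self.rules.append(new_rule)
        self.completed += 1
        return result

    def escalation_rate(self):
        return self.escalations / self.completed if self.completed else 0.0
```

Over time, as corrections turn into rules and context, the escalation rate falls, which is the signal teams use to decide when an agent is ready for more autonomy.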
Kawasaki’s team applies retrieval-augmented generation (RAG) to ground agents in enterprise knowledge bases, CRM data, and other proprietary sources.
Once agents are deployed in the wild, they are monitored through a dashboard that provides performance analytics, conversation insights, and auditability. In essence, agents are treated as digital workers, with their own management layer of dashboards and KPIs.
For example, an onboarding agent plugs into a standard dashboard that provides agent monitoring and telemetry. This is where the platform layer (orchestration, management, security, workflow execution, monitoring, and UI deployment) sits “above the LLM,” Kawasaki said.
Users see a dashboard of agents in use, along with each agent’s processes, workflows, and executed outcomes. They can drill into an individual record (such as a case or update) to see a step-by-step execution log and associated communications, which supports tracing, debugging, and agent tuning. The most common fixes involve prompts and logic, business rules, operational context, and tool access, Kawasaki said.
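The kind of per-record, step-by-step execution log described above could look like the following sketch. It is purely illustrative; the class, field names, and trace format are assumptions, not any vendor's actual telemetry schema:

```python
# Hypothetical sketch of a step-by-step execution log: every agent action is
# recorded with a timestamp and status so a single run can be traced, debugged,
# and used to tune the agent.
import time

class ExecutionLog:
    def __init__(self, agent_id: str, record_id: str):
        self.agent_id = agent_id
        self.record_id = record_id   # the individual case/update being drilled into
        self.steps = []

    def log_step(self, action: str, detail: str, status: str = "ok"):
        self.steps.append({
            "ts": time.time(),
            "action": action,
            "detail": detail,
            "status": status,
        })

    def trace(self):
        """Render the step-by-step trace a user sees when drilling into a record."""
        return [f"{i + 1}. {s['action']} [{s['status']}]: {s['detail']}"
                for i, s in enumerate(self.steps)]
```

Aggregating statuses across many such logs is what feeds the dashboard-level KPIs (completion rates, escalation counts) that treat the agent as a digital worker.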
The biggest problems after deployment:

- High exception workloads: Early spikes in escalations are common until guardrails and workflows are adjusted.

- Data quality and completeness: Missing or inconsistent fields and documents can trigger escalations; teams must decide which data to prioritize for grounding and which checks to automate.

- Auditability and credibility: Regulated customers in particular require clear logging, approvals, role-based access control (RBAC), and audit trails.
“We always explain that you have to take the time to train agents,” Katherine Kostereva, CEO of Creatio, told VentureBeat. “When you launch an agent, it doesn’t work perfectly right away; it takes time to train it fully, and then the number of errors decreases.”
"Data readiness" doesn't always require a major overhaul
“Is my data ready?” is a common early question when deploying agents. Businesses know data access matters, but they can be put off by the prospect of a massive data-consolidation project.
But data virtualization can allow agents to access core systems while sidestepping typical data lake and warehouse latencies. Kawasaki’s team has built a platform that integrates with data in place, and is now working on an approach that surfaces data as a virtual object, which can be processed and used like a standard object in UIs and workflows. This way, they don’t have to “copy or duplicate” large amounts of data into the database.
The technique can be useful in areas such as banking, where transaction volume is simply too large to move into a CRM but is “still valuable for AI analysis and triggers,” Kawasaki said.
Once integration and virtual objects are in place, teams can assess data completeness, consistency, and availability and identify low-friction starting points (such as document-heavy or unstructured workflows).
Kawasaki stressed the importance of “really using the data in the underlying systems, which tend to be the purest or the source of truth anyway.”
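One way to picture the virtual-object approach is as a thin read-through wrapper that queries the system of record on demand, with a short-lived cache, rather than replicating rows into the CRM. The sketch below is a hedged illustration of that idea; the API, TTL, and fetch callback are assumptions for the example:

```python
# Illustrative sketch of a "virtual object": records are fetched from the
# underlying system of record on demand (the purest source of truth) instead
# of being copied or duplicated into the local database.
import time

class VirtualObject:
    def __init__(self, fetch_fn, ttl_seconds: float = 60.0):
        self._fetch = fetch_fn   # queries the source system on demand
        self._ttl = ttl_seconds  # how long a fetched record stays fresh
        self._cache = {}         # key -> (timestamp, record)

    def get(self, key):
        entry = self._cache.get(key)
        if entry and (time.time() - entry[0]) < self._ttl:
            return entry[1]      # fresh enough: no round-trip to the source
        record = self._fetch(key)          # read through to the source system
        self._cache[key] = (time.time(), record)
        return record
```

For the banking example, `fetch_fn` would wrap a call into the core transaction system, so agents can reason over high-volume data that never physically lands in the CRM.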
Matching agents to work
Kawasaki says autonomous (or near-autonomous) agents are best suited to high-volume workflows with “clear structure and controlled risk”: for example, onboarding, document intake and approval in loan processing, or standardized support tasks such as status updates and requests.
“Especially when you can link them to very specific processes within the industry – you can really measure and deliver hard ROI,” he said.
For example, financial institutions are often siloed by nature: commercial credit teams operate in one environment, wealth management in another. But an autonomous agent can look across departments and their individual data stores to identify commercial clients who might be good candidates for, say, wealth management or advisory services.
“You’d think it would be an obvious opportunity, but nobody’s looking across all the silos,” Kawasaki said. Some banks applying agents to this scenario have seen “millions of dollars in additional revenue benefits,” he said, without naming specific institutions.
In other cases, however, particularly in regulated industries, longer-running agents are not only preferred but necessary: for multi-step tasks such as gathering evidence, summarizing it, comparing it across systems, drafting communications, and producing verifiable justifications.
“The agent doesn’t respond to you right away,” Kawasaki said. “Completing tasks can take hours, even days.”
This requires orchestrated agent execution rather than “a single giant prompt,” he said. The approach divides the work into deterministic steps performed by sub-agents, with memory and context management maintained across steps and time intervals. Grounding with RAG helps keep outputs tied to approved sources, and users can extend that grounding to file shares and other document repositories.
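The orchestration pattern Kawasaki describes can be sketched in a few lines. This is an illustrative Python sketch, not Creatio's implementation; the pipeline shape, step names, and checkpoint hook are hypothetical:

```python
# Sketch of orchestrated agent execution: work is split into deterministic
# steps run by sub-agents, shared context is threaded between steps, and an
# optional checkpoint lets a human review each interim artifact.

def run_pipeline(steps, context, checkpoint=None):
    """Execute sub-agent steps in order, carrying context from step to step."""
    for name, sub_agent in steps:
        context[name] = sub_agent(context)  # deterministic step, one sub-agent
        if checkpoint:
            # Human-in-the-loop review of the interim artifact (summary,
            # extracted facts, draft recommendation, etc.).
            context[name] = checkpoint(name, context[name])
    return context

# Hypothetical sub-agents for a multi-step evidence-gathering task.
steps = [
    ("evidence", lambda ctx: f"evidence for {ctx['case']}"),
    ("summary",  lambda ctx: f"summary of {ctx['evidence']}"),
    ("draft",    lambda ctx: f"draft based on {ctx['summary']}"),
]
```

Because each step is deterministic and its artifact is stored in the shared context, a correction at any checkpoint automatically flows into every later step.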
This model usually doesn’t require special retraining or a new foundation model. Whichever model a business uses (GPT, Claude, Gemini), performance improves through instructions, role definitions, governed tool access, workflows, and data grounding, Kawasaki said.
The feedback loop puts “extra attention” on checkpoints, he said. People review interim artifacts (such as summaries, extracted facts, or draft recommendations) and correct errors; those corrections are then translated into better rules and retrieval sources, narrower tool scopes, and improved templates.
“What’s important for this style of autonomous agent is that you mix the best of both worlds: the dynamic reasoning of AI with the control and power of true orchestration,” Kawasaki said.
Ultimately, agents require coordinated changes across enterprise architecture, new orchestration frameworks, and robust access controls, Gogia said. Agents should be assigned identities that limit their privileges and keep them within bounds. Observability is essential: monitoring tools can record task-completion rates, escalation events, system interactions, and error patterns. This kind of evaluation should be ongoing, and agents should be tested on how they react to new scenarios and unusual inputs.
“The moment an AI system can act autonomously, enterprises must answer several questions that rarely arise when deploying copilots,” Gogia said. For example: Which systems is the agent allowed to access? What actions can it take without approval? Which actions should always require a human decision? How will each action be recorded and audited?
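Gogia's questions map naturally onto code. Below is a minimal sketch (all names and policies are hypothetical, not from any particular platform) of an agent identity with an explicit allow-list of systems, approval-gated actions, and an audit trail:

```python
# Hedged sketch of agent guardrails: each agent gets an identity with scoped
# system access, some actions always require a human decision, and every
# attempted action is written to an audit trail.

class AgentIdentity:
    def __init__(self, name, allowed_systems, approval_required):
        self.name = name
        self.allowed_systems = set(allowed_systems)      # systems the agent may touch
        self.approval_required = set(approval_required)  # actions needing a human decision
        self.audit_trail = []                            # (system, action, outcome)

    def act(self, system, action, approved=False):
        if system not in self.allowed_systems:
            self.audit_trail.append((system, action, "denied"))
            raise PermissionError(f"{self.name} may not access {system}")
        if action in self.approval_required and not approved:
            # Escalate rather than execute: a human must decide.
            self.audit_trail.append((system, action, "pending-approval"))
            return "escalated"
        self.audit_trail.append((system, action, "executed"))
        return "done"
```

The audit trail doubles as the record regulated customers require, and its "denied" and "pending-approval" entries are exactly the escalation events an observability layer would monitor.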
“Enterprises that underestimate the challenge often get stuck with demonstrations that look impressive but don’t survive real operational complexity,” Gogia said.





