
Enterprises that build and deploy agents face a persistent problem: it takes engineers a long time to learn that an agent is getting something wrong, and the cycle of finding and fixing the failure typically requires a human at every step.
LangSmith, LangChain’s monitoring and evaluation platform, has launched a new capability in public beta that aims to make this problem more manageable. LangSmith Engine automates the entire chain, detecting production failures, diagnosing root causes against the live codebase, drafting fixes, and guarding against regressions, all in one automated pass.
LangSmith Engine gives AI engineers a faster way to experiment, but it’s entering a crowded field: Anthropic, OpenAI, and Google all offer observability and evaluation tools on their own platforms.
LangSmith Engine looks at failures
LangChain said in a blog post that a typical agent development cycle begins with monitoring to understand what the agent is doing, followed by identifying gaps, making changes to instructions and tools, and creating ground-truth data sets. Developers then run experiments and check for regressions before shipping the agent.
The problem, according to LangChain, is that customers often don’t spot failure patterns when reviewing traces, so the same bugs keep recurring, and there is no targeted evaluator to catch the problem when it reappears in production.
LangSmith Engine monitors production traces for several types of signals: “obvious errors, online evaluator failures, trace anomalies, negative user feedback, and unusual behavior such as the user asking questions the agent is not designed to answer.”
The engine then reads the live codebase, identifies the culprit, and generates a pull request before proposing a custom evaluator for that specific failure pattern. A human steps in at the approval stage.
It builds on LangSmith’s existing tracing and evaluation infrastructure and also works with enterprises’ own evaluation results.
Unlike monitoring tools such as Weights & Biases, Arize Phoenix, and HoneyHive, LangSmith Engine runs the entire chain automatically (detecting the failure, diagnosing the root cause, and drafting a fix) and brings a human in only at the approval stage.
Model providers bring evaluators to their platforms
While LangChain sees this evaluation cycle as a pain point for many enterprises, Engine arrives as larger model providers begin offering monitoring tools on their own platforms. That means businesses could choose an end-to-end platform rather than adding LangSmith Engine to their existing workflows.
Anthropic’s Claude Managed Agents combines deployment, evaluation, and orchestration in a single package. OpenAI’s Frontier offers a similar end-to-end platform for building, managing, and evaluating enterprise agents, though both have faced questions from enterprises wary of committing to a single vendor.
However, practitioners note that not every enterprise wants to consolidate evaluation and observability into a single provider’s platform.
Third-party observability is standard for many businesses, Leigh Coney, founder and general counsel at Workwise Solutions, told VentureBeat.
“One fund I work with runs Claude for analytics and GPT for separate workflows. If the observability lives inside each provider’s tools, now you have two systems that can’t talk to each other. Your compliance team can’t create a single audit trail,” he said. “So third-party observability survives because multi-model is already the default in the enterprise, and someone has to sit between the providers.”
Jessica Arredondo Murphy, CEO and co-founder of True Fit, said independent platforms like LangSmith need to prove their value to businesses and “answer the long-standing question of whether they are a cross-model transaction layer for quality and reliability.”
“Enterprises aren’t converging on first-party model provider tools the way model providers would prefer. What I’m seeing is a pragmatic divide: teams will use first-party tools for quick onboarding and early-stage debugging, but they tend to want more flexibility when it comes to production reliability, manageability, and the long term,” she said.
LangSmith Engine is now available in public beta. Teams can connect a tracing project, optionally connect their repositories, and Engine will automatically start diagnosing issues from production traces.




