
Most orchestration frameworks are built for agents that run in seconds or minutes. As agents begin running for hours – and in some cases days – those frameworks begin to crack.
Several model providers, including Anthropic with Claude Code and OpenAI with Codex, have introduced early support for long-horizon agents through multi-session tasks, subagents, and background execution. However, these systems often still assume that agents operate within time-bound workflows, even when they run for long stretches.
Open-source model provider Moonshot AI wants to go beyond that with its new model, Kimi K2.6.
Moonshot says the model is designed for continuous execution and has autonomously handled internal use cases, including agents working for hours and, in one case, five days of monitoring and incident response.
But the increasing use of this type of agent exposes a critical gap: most orchestration frameworks are not designed for persistent, state-aware execution. Moonshot, whose Kimi K2.6 relies on swarms of agents, claims its orchestration approach comes closer to managing stateful agents.
Challenges of managing long-running agents
While some enterprises prefer to bring their own orchestration frameworks to agent ecosystems, model providers and agent platforms recognize that offering agent management remains a competitive advantage.
Other model providers have begun to explore long-running agents through multi-session tasks and background execution. Anthropic’s Claude Code, for example, has a lead agent direct other agents based on a user-defined set of instructions. OpenAI’s Codex works similarly.
Kimi K2.6 approaches orchestration with an enhanced version of Agent Swarms that can manage up to 300 sub-agents “performing 4,000 coordinated steps simultaneously,” Moonshot AI wrote in a blog post. Compared to both Claude Code and Codex, K2.6 relies on the model rather than predefined roles to define orchestration.
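The distinction between the two approaches can be sketched in a toy example. This is entirely illustrative and not Moonshot's implementation: a predefined-role orchestrator maps task types to fixed subagents through a static table, while a model-driven orchestrator asks a planner model (stubbed out here) to emit the delegation plan at runtime.

```python
# Illustrative sketch only -- all agent names and the planner stub are hypothetical.

# Predefined-role orchestration: a fixed mapping decides which subagent runs.
ROLE_TABLE = {"code": "coder_agent", "test": "tester_agent"}

def orchestrate_fixed(task_type: str) -> str:
    """Look up the subagent for a task type in a static role table."""
    return ROLE_TABLE[task_type]

# Model-driven orchestration: a model chooses the delegation at runtime.
def planner_model(task_description: str) -> list[str]:
    """Stand-in for a model call that returns a delegation plan."""
    if "compile" in task_description:
        return ["spawn: parser_agent", "spawn: codegen_agent"]
    return ["spawn: generalist_agent"]

def orchestrate_model_driven(task_description: str) -> list[str]:
    """Delegate based on the planner's output rather than a fixed table."""
    return planner_model(task_description)

print(orchestrate_fixed("code"))                       # coder_agent
print(orchestrate_model_driven("compile SysY source"))
```

The trade-off: the fixed table is predictable and auditable, while the model-driven plan can adapt to tasks no role table anticipated, which is what makes it harder to govern.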
Kimi K2.6 is now available on Hugging Face, via the API, and through Kimi Code and the Kimi app.
Practitioners who experiment with long-horizon agents say the fragility runs deeper than prompting alone can fix.
Practitioner Maxim Saplin wrote in a blog post: “That doesn’t mean subagents are useless. It just shows that orchestration is still fragile. Right now, it feels more like a product and training problem than something you can solve by writing a fairly rigid directive.”
The problem posed by long-running agents is that it is difficult to maintain their state, especially as the environment keeps changing while they work. An agent constantly calls various tools and APIs or accesses databases during runtime. Most existing agents, by contrast, run for only one or two executions, calling various tools for a minute at most.
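One common mitigation, not specific to any framework named here, is to checkpoint the agent's state to durable storage after every tool call, so a multi-hour run can resume where it left off after a crash rather than starting over. A minimal sketch, where the file name, state shape, and stand-in tools are all illustrative assumptions:

```python
import json
from pathlib import Path

CHECKPOINT = Path("agent_state.json")  # hypothetical checkpoint location

def load_state() -> dict:
    """Resume from the last checkpoint, or start fresh."""
    if CHECKPOINT.exists():
        return json.loads(CHECKPOINT.read_text())
    return {"step": 0, "history": []}

def save_state(state: dict) -> None:
    """Persist state via write-then-rename so a crash mid-write can't corrupt it."""
    tmp = CHECKPOINT.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(CHECKPOINT)

def run_agent(tools, max_steps: int) -> dict:
    """Run a loop of tool calls, checkpointing after every step."""
    state = load_state()
    while state["step"] < max_steps:
        tool = tools[state["step"] % len(tools)]  # stand-in for a model-chosen tool
        state["history"].append(tool(state))
        state["step"] += 1
        save_state(state)  # durable after every tool call
    return state

# Example with two trivial stand-in "tools"
state = run_agent([lambda s: "checked_db", lambda s: "called_api"], max_steps=4)
print(state["step"])  # 4
```

Real systems would checkpoint to a database rather than a local file, but the principle is the same: if the process dies at step 3,000 of 4,000, the next invocation resumes from step 3,000.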
Mark Lambert, chief product officer at ArmorCode, which builds an autonomous security platform for enterprises, told VentureBeat in an email that deployment is already outpacing governance.
"These agent systems can now generate code and system changes faster than most organizations can review, patch or manage them. This will require more than an additional scan. Organizations will need stronger AI governance that provides context, prioritization and accountability, so teams can manage Kimi and other AI-generated risks before they become cumulative exposure," Lambert said.
Agents that run for a long time can also fail without a clear fallback. Just as important, these agents often lack a well-defined set of tasks and dynamically adjust their plans as they work.
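The "no clear fallback" failure mode is usually addressed by giving each tool call an explicit retry budget and a named fallback, so a days-long run fails loudly at a known point instead of silently drifting. A sketch under those assumptions, with all names illustrative:

```python
import time

class BudgetExceeded(Exception):
    """Raised when a tool call exhausts its retry budget with no fallback."""

def call_with_fallback(tool, retries: int = 3, backoff: float = 0.1, fallback=None):
    """Retry a flaky tool call with exponential backoff; fall back explicitly
    rather than letting the agent continue on missing data."""
    for attempt in range(retries):
        try:
            return tool()
        except Exception:
            time.sleep(backoff * (2 ** attempt))  # exponential backoff between attempts
    if fallback is not None:
        return fallback()
    raise BudgetExceeded(f"{tool.__name__} failed after {retries} attempts")

# Example: a tool that always fails, paired with an explicit cached fallback
def flaky_api():
    raise ConnectionError("upstream down")

result = call_with_fallback(flaky_api, retries=2, backoff=0, fallback=lambda: "cached_value")
print(result)  # cached_value
```

The point is not the retry loop itself but the contract: every step either succeeds, degrades to a declared fallback, or raises a typed error the orchestrator can act on.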
Kunal Anand, chief product officer at F5, told VentureBeat in an email that long-horizon agents represent a bigger architectural change than most companies are preparing for.
“We’ve moved from scripts to services, to containers, to functions, and now to agents as persistent infrastructure. This creates categories that we don’t yet have good names for: agent runtime, agent gateway, agent identity provider, agent network. The API gateway pattern becomes something that needs to understand goals and workflows, not just endpoints and verbs.”
13 hours – and even five days – of running
Even as enterprises begin to look at long-horizon agents, understanding how to orchestrate them is critical, because model capabilities are starting to outpace orchestration innovation.
Moonshot AI said the model was built for reflective tasks: "real-world problems that typically require weeks or months of collective human effort." In a separate whitepaper provided to VentureBeat, Moonshot claims that K2.6 built a complete SysY compiler from scratch in 10 hours – the equivalent of a team of four engineers working for two months – and passed all 140 functional tests without human intervention.
The team applied K2.6 to complex engineering tasks, including an overhaul of an eight-year-old open-source financial compliance engine. Moonshot’s engineers described a 13-hour run that “iterated through 12 optimization strategies, launching more than 1,000 tool calls to precisely modify more than 4,000 lines of code.”
Moonshot said one of its teams used K2.6 to run an agent autonomously for five days, handling monitoring, incident response, and system operations.





