Forget fine-tuning. RAG rots context. Hypernetworks build the model your agent needs.

Enterprise teams keep watching the same thing happen. An AI agent demos beautifully, goes into production and stalls: it works for a short time, then needs a human to top up its context and inspect the output, and the promised efficiency drains into supervision. The agent did the job; you watched. This is one reason many agent pilots never become production systems.

The pitch on the other side of that wall is the one every team wants to believe: an agent that handles a long case on its own, overnight if necessary, leaving a human to approve only the last 10%. Whether that is possible turns on a problem the orchestration conversation mostly skips. When AI firm Chroma tested 18 leading models, each lost accuracy as its input grew. That is a feature of how attention works, not a gap that a stronger model closes. An agent that is fed more and more of your work does not stabilize. It gets shakier.

This is the layer beneath the orchestration race. Routing, continuous execution, and observability all assume that each agent is competent enough to be worth coordinating in the first place. The deeper question is how long an agent can work before a person has to step in, and that depends on where your company’s knowledge sits relative to the model. Both standard fixes leave people in the loop.

Why teaching a model your business keeps you in the loop

Frontier models keep getting more capable, yet the gap is not closing, because it is not a capability issue. It is about where your knowledge sits relative to the model, and businesses have two standard ways to put it there.

The first is fine-tuning, which bakes knowledge into the weights. It is subject to catastrophic forgetting, a problem identified in the 1980s and still unresolved in 2026: teaching a model something new erodes what it already knows. Teams work around it by isolating each task in its own fine-tuned model or adapter, which produces a sprawling model zoo and drives up both cost and management overhead. And the fine-tuned model becomes a stale snapshot the day a policy changes, which is when the slow, expensive retraining cycle begins.

The second is in-context learning, which skips retraining by placing the relevant policies into the prompt at runtime. This is where context rot bites. Retrieval narrows down what goes into the prompt, but a retrieval miss looks the same as a confident answer, and both cost and latency rise with every token added.

The two failures rhyme. With fine-tuning, the model can work confidently from last quarter's policy. With in-context learning, it can confidently work past a detail it lost in the middle of a long prompt. Either way the output looks equally reliable, so you can’t tell which parts are wrong without checking them all. That is why the human can never leave. Many teams use both at once, fine-tuning the stable knowledge and retrieving the rest. This mitigates each failure but eliminates neither: for any given output, you still can’t be sure the model is working from context that is both current and correct, so you check it.

The third way: create an on-demand expert model

The third approach is moving from research into early products. Instead of retraining a model or stuffing its prompt, a generator builds a small, task-specific model on demand from your policies. The generator is a hypernetwork: a network whose output is the weights of another network.
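To make "a network whose output is the weights of another network" concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not any vendor's or paper's implementation: `ToyHypernetwork`, `adapted_forward`, the single linear layer, and the task embeddings are all hypothetical, and the generated weights take the form of a low-rank (LoRA-style) update added to a fixed base matrix.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ToyHypernetwork:
    """Hypothetical toy: maps a task embedding to the weights of a low-rank adapter."""

    def __init__(self, embed_dim, d_model, rank, seed=0):
        rng = random.Random(seed)
        self.d_model, self.rank = d_model, rank
        n_out = 2 * d_model * rank  # entries of A (rank x d_model) plus B (d_model x rank)
        # The generator's own parameters: one linear layer, embed_dim -> n_out.
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(embed_dim)]

    def generate(self, task_embedding):
        """Return (A, B). The output of this network is another network's weights."""
        flat = matmul([task_embedding], self.w)[0]
        d, r = self.d_model, self.rank
        a = [flat[i * d:(i + 1) * d] for i in range(r)]  # rank x d_model
        rest = flat[r * d:]
        b = [rest[i * r:(i + 1) * r] for i in range(d)]  # d_model x rank
        return a, b


def adapted_forward(x, base_w, a, b):
    """Apply the base weights plus the generated low-rank update: x @ (W + B @ A)."""
    delta = matmul(b, a)  # d_model x d_model
    w = [[bw + dw for bw, dw in zip(brow, drow)] for brow, drow in zip(base_w, delta)]
    return matmul([x], w)[0]


hyper = ToyHypernetwork(embed_dim=3, d_model=4, rank=2)
base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity base layer

# Two made-up task embeddings yield two different adapters from the same generator.
a_audit, b_audit = hyper.generate([1.0, 0.0, 0.0])
a_risk, b_risk = hyper.generate([0.0, 1.0, 0.0])
y_audit = adapted_forward([1.0, 2.0, 3.0, 4.0], base, a_audit, b_audit)
```

The design point is that nothing task-specific is stored: the base layer stays fixed, and a policy change means calling `generate` again with a new embedding rather than retraining. In a real system the embedding would come from encoding the policy text, and the generator would be trained rather than randomly initialized.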

The idea was named in 2016; its application to generating expert language models from text or documents is new and active. Sakana AI's Text-to-LoRA, presented at ICML 2025, generates a model adapter from a plain-language description in a single pass, and a 2026 system called SHINE describes hypernetwork-generated adapters as a promising new frontier precisely because they remove both the retraining cost of fine-tuning and the context limits of prompting.

Rather than training and maintaining adapters one by one, the point of generating them is to fold a large library of per-task LoRAs into a single network that can produce them on demand, including for tasks it has not seen.

The neat part is how this closes the loop on the problem above: the per-task adapters that teams maintain to avoid catastrophic forgetting are the same objects the hypernetwork generates automatically. The model zoo stops being a management headache and becomes a generated product.

Underneath it all, the case for small models was made most directly in a 2025 paper by Nvidia researchers: for the narrow, repetitive tasks that populate agent workflows, small models are capable enough and 10 to 30 times cheaper than frontier generalists. Nace.AI, a Palo Alto company with a $21.5 million seed round in May, is the clearest commercial example. Its core technology, a generator it calls MetaModel, produces parameter adjustments for a model at inference time from the company policies that govern regulated work: audit, compliance, risk assessment. The company says its agents handle most of the workflow while human experts validate the output, a split it markets as 90/10.

A comparison of three approaches

	Fine-tuning	In-context / RAG	Hypernetwork-generated model
Where business knowledge lives	In the model's weights	In the prompt, re-supplied every run	In weights generated on demand
Cost to update on policy change	High: retrain	Low: edit the source	Low: regenerate
Staleness	High: snapshot	Low	Low: regenerated from current policy
Cost and latency per call	Low	High, grows with context	Low at run time
Dominant failure mode	Forgetting; model zoo	Context rot; silent retrieval misses	Generator quality; calibration
Who owns the improving asset?	Whoever trains the model	Whoever holds the data warehouse	Depends on where the generator and feedback live

Why a hypernetwork-generated model raises the bar for autonomy

A narrow, current, small model has a smaller surface on which to go wrong. Fewer errors, confined to a known domain, mean fewer cases the agent has to escalate to a person, which is the real basis for any high-autonomy claim. This is also where a number like 90/10 comes from: it is not a preset quota but a result of how little the system hands back. Reported autonomy ratios are best read as measurements of the architecture rather than as settings.
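The idea that an autonomy ratio is an outcome rather than a setting can be sketched in a few lines of Python. Everything here is an illustrative assumption: `StepResult`, `needs_human`, `autonomy_ratio`, the 0.8 threshold, and the premise that each step carries a usable confidence score and an in-domain flag. The only thing configured is the escalation rule; the ratio is whatever falls out when that rule meets real work.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    step_id: str
    confidence: float  # assumed: a calibrated confidence in [0, 1] reported per step
    in_domain: bool    # assumed: whether the step falls inside the policy domain


def needs_human(result, threshold=0.8):
    """Escalate when the step is out of domain or the model is unsure."""
    return (not result.in_domain) or result.confidence < threshold


def autonomy_ratio(results, threshold=0.8):
    """Share of steps completed without handing back to a person (measured, not set)."""
    if not results:
        raise ValueError("no results to measure")
    escalated = sum(needs_human(r, threshold) for r in results)
    return (len(results) - escalated) / len(results)


# Ten made-up steps from one overnight run: one low-confidence step gets escalated.
run = [StepResult(f"step-{i}", 0.95, True) for i in range(9)]
run.append(StepResult("step-9", 0.55, True))

ratio = autonomy_ratio(run)
handed_back = [r.step_id for r in run if needs_human(r)]
```

With this made-up run, nine of ten steps clear the rule, so the measured ratio is 0.9. Feed the same rule a noisier model or out-of-domain work and the number drops on its own, which is why the ratio says more about the architecture than about anything a vendor configured.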

Two design choices decide whether this autonomy is reliable or just fast. The first is grounding: linking each output to its source so that the reviewer can check the work instead of redoing it. This is exactly what research models such as HalluGuard are built for: they label each claim as supported or unsupported and cite the passage they rely on. Nace ships its agents with reasoning models and reasoning traces for the same reason. The 10% review only means something if a person can confirm an output's provenance in seconds.
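A rough sketch of what claim-level labeling with citations looks like as a data structure follows. It is not how HalluGuard or any production verifier works: support is approximated here by plain word overlap, and `ground_claim`, the `min_overlap` threshold, and the policy passages are all invented for illustration. The point is the shape of the output a reviewer receives.

```python
import re


def tokens(text):
    """Lowercase word tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ground_claim(claim, passages, min_overlap=0.6):
    """Label a claim as supported or unsupported and cite the best-matching passage.

    Support is approximated by the share of the claim's tokens found in a passage.
    A real system would use a trained verifier instead of word overlap.
    """
    claim_tokens = tokens(claim)
    best_id, best_score = None, 0.0
    for passage_id, text in passages.items():
        score = len(claim_tokens & tokens(text)) / max(len(claim_tokens), 1)
        if score > best_score:
            best_id, best_score = passage_id, score
    supported = best_score >= min_overlap
    return {
        "claim": claim,
        "label": "supported" if supported else "unsupported",
        "citation": best_id if supported else None,
        "score": round(best_score, 2),
    }


# Made-up policy passages keyed by a hypothetical section id.
policies = {
    "policy-4.2": "Expenses above 500 dollars require approval from a director.",
    "policy-7.1": "Vendor contracts must be reviewed annually by legal.",
}

checked = [
    ground_claim("Expenses above 500 dollars require director approval.", policies),
    ground_claim("Travel bookings need sign-off within two days.", policies),
]
```

The first claim comes back labeled supported with `policy-4.2` as its citation; the second comes back unsupported with no citation. A reviewer can then open one cited section for each supported claim and spend real attention only on the unsupported ones.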

The second is the feedback loop, and it raises the question every buyer should be asking: when your experts approve the output, whose model improves, and where does it live? This decides whether the compounding asset belongs to the vendor or to you. Arrangements vary. For example, Nace uses an external network of certified experts for some tasks and the client’s own staff for direct enterprise deployments, with the resulting model stored in the client’s cloud. Each choice puts the learning and the ownership in a different place.

Where the third way hits its limits

The approach is still early, and several questions will decide how far it goes. Calibration is the linchpin: the value rests on a model that knows when it is uncertain. That is itself unsettled, as recent work on generating these adapters has found that they do not automatically improve calibration over conventional fine-tuning, with gains appearing only under specific conditions.
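Calibration can be measured rather than asserted. One common metric is expected calibration error (ECE): group predictions into confidence bins, then take the weighted average of the gap between each bin's mean confidence and its observed accuracy. The sketch below is a generic implementation of that metric, not something taken from the work cited above, and the sample numbers are made up.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up example: a model that says 0.9 on every answer.
stated = [0.9] * 10
well_calibrated = expected_calibration_error(stated, [True] * 9 + [False])
overconfident = expected_calibration_error(stated, [True] * 5 + [False] * 5)
```

When nine of ten answers are right, stated confidence matches accuracy and the error is essentially zero. When only five are right, the error is about 0.4. A model with a large gap cannot be trusted to decide for itself which work to hand back, which is why calibration sits underneath every autonomy claim.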

The quality of the generated model also depends heavily on the policy data it is built from, which puts a premium on data curation. And scale is an open research frontier: so far, the hypernetworks shown in published work have been small. Here’s where Nace’s work gets interesting: in our interview, the company said it has pushed its generator well beyond published scales, tracked how performance grew, begun sharing results publicly, and is now preparing a scaling law for peer review. If it holds up, it would help answer one of the open questions in the field, and it’s a paper worth watching for.

Whichever approach wins, the work still ends with a human, and that handover is a design problem of its own. When Deloitte Australia delivered a government report worth about 440,000 Australian dollars, it shipped with fictitious citations and a fictitious court quotation after senior review, because reviewers were examining sound-looking output, not its provenance. Controlled research suggests the pattern is general: experts challenged the same flawed recommendation less often when it was labeled as coming from AI.

Article 14 of the EU AI Act now names this tendency: automation bias. The lesson is not about any one vendor: a high degree of autonomy concentrates human attention on a thin, late slice of the work, so the value of that review depends entirely on the reviewer’s ability to check provenance quickly, which comes back to grounding each output in its source.

What to build and what to ask before buying

The honest takeaway: it’s usually not orchestration or model size that’s holding your agents back, it’s whether the model knows your job well enough to be left alone, and the right fit depends on the job. If you want to automate a long, repetitive, high-volume process end to end, running most of your internal audits overnight with your own experts checking the final slice, a hypernetwork-generated model is the approach most likely to run cheaply and work unattended for long enough. For a short task that finishes in a few steps and never needs to run unattended, the gap between this and a well-prompted frontier model narrows almost to nothing and is not worth the integration cost.

When a vendor pitches autonomous or specialist agents, four questions cut through the pitch.

Where does business knowledge live: in weights, in the prompt, or generated on demand?
What does each output come with so the reviewer can check it instead of redoing it?
What decides which work is escalated to a person?
And whose model improves from this feedback, and where does it live?

The answers, not the headline ratio, tell you what you’re getting.

The hypernetwork approach is the most credible attempt so far to give a small model knowledge of a particular business without the forgetting of fine-tuning or the need to explain everything again on every run. It is also the least proven, and the parts that matter most, calibration and scale, are still under review. Pilot it now on the right kind of work. For the wrong kind, the integration will cost you more than a well-prompted frontier model would.
