Over the weekend, Andrej Karpathy, the influential ex-Tesla AI leader, founding member of OpenAI, and the researcher who coined the term "vibe coding," posted on X about a new open-source project: auto-research.
It wasn't a finished model or a massive corporate product. By his own admission, it was a simple, 630-line script, available on GitHub under the permissive, enterprise-friendly MIT License. But the ambition was big: to automate the scientific method with AI agents while we humans sleep.
"The goal is to engineer your agents to provide the fastest research progress indefinitely and without any input from you," he said on X.
The system operates as an autonomous optimization loop. The AI agent is given a training script and a fixed computing budget (typically 5 minutes on a GPU).
It reads its source code, formulates a hypothesis for improvement (for example, changing the learning rate or architectural depth), modifies the code, runs an experiment, and evaluates the results.
If the validation loss, measured in bits per byte (val_bpb), improves, the change is kept; if not, the agent reverts and tries again. In one overnight run, Karpathy's agent completed 126 experiments, driving the loss down from 0.9979 to 0.9697.
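The keep-or-revert loop described above can be sketched in a few lines. This is a minimal, hypothetical reconstruction, not Karpathy's actual script: the `propose_change` mutation step, the toy loss surface inside `run_experiment`, and all parameter names and values are illustrative assumptions.

```python
import random

def propose_change(config):
    """Hypothetical hypothesis step: tweak one hyperparameter at random."""
    candidate = dict(config)
    key = random.choice(list(candidate))
    candidate[key] *= random.choice([0.5, 2.0])
    return candidate

def run_experiment(config):
    """Stand-in for a real training run; returns a toy val_bpb-style loss."""
    # Invented loss surface with a sweet spot near lr=0.004, depth=16.
    return 0.9 + abs(config["lr"] - 0.004) * 100 + abs(config["depth"] - 16) * 0.01

def auto_research(config, budget):
    best_loss = run_experiment(config)
    for _ in range(budget):
        candidate = propose_change(config)
        loss = run_experiment(candidate)
        if loss < best_loss:           # improvement: keep the change
            config, best_loss = candidate, loss
        # otherwise: revert, i.e. simply discard the candidate
    return config, best_loss

random.seed(0)
config, loss = auto_research({"lr": 0.001, "depth": 8}, budget=126)
```

Here `budget=126` mirrors the 126 experiments in the overnight run; in the real project each experiment is a full training run under a fixed compute budget rather than a closed-form toy loss.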
Karpathy has since stated that after leaving the agent running on a "depth=12" model for two days, it successfully completed about 700 autonomous changes.
The agent found about 20 additional improvements that transfer seamlessly to larger models. Together, these changes reduced the "GPT-2 time" leaderboard metric from 2.02 hours to 1.80 hours, an 11% efficiency gain in a project Karpathy believed was already finely tuned.
"To see an agent doing this whole process from scratch and by itself is… wild," Karpathy wrote, noting that the agent had caught issues with attention scope and alignment that he had missed when working by hand.
This is more than a productivity hack; it is a fundamental change in how intelligence is refined. By automating the "scientific method" for code, Karpathy turned machine-learning research into an evolutionary process that runs at the speed of silicon rather than the speed of human thought.
The post also showed the broader AI and machine-learning community on X that this kind of process can be applied far beyond computer science: to marketing, health, and essentially any field that requires research.
The reaction was swift and viral, with Karpathy's post garnering more than 8.6 million views in two days as builders and researchers scrambled to scale up the "Karpathy loop."
Varun Mathur, CEO of the AI tool-aggregator platform Hyperspace AI, took the single-agent loop and distributed it over a peer-to-peer network. Each node running the Hyperspace agent became an autonomous explorer.
On the night of March 8-9, 35 autonomous agents on the Hyperspace network conducted 333 completely unsupervised experiments. The results were a masterclass in emergent strategy:
Hardware diversity as a feature: Mathur noted that while agents on H100 GPUs could "brute force" aggressive learning rates, CPU-only agents on laptops were forced to be clever. These "weak" agents focused on initialization strategies (such as Kaiming and Xavier init) and normalization choices because they could not rely on raw throughput.
Gossip-based discovery: Using the GossipSub protocol, agents shared their wins in real time. When one agent discovered that Kaiming initialization reduced loss by 21%, the idea spread like a digital virus: within hours, 23 other agents had incorporated the discovery into their hypotheses.
Timeline compression: In just 17 hours, these agents independently reinvented ML building blocks, such as RMSNorm and input connections, that took human researchers nearly eight years to formalize at labs like Google Brain and OpenAI.
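RMSNorm, one of the rediscovered building blocks, is simple enough to show in full. The sketch below is a plain-Python illustration of the published technique, not code from the Hyperspace agents:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: rescale a vector by the reciprocal of its root mean square.

    Unlike LayerNorm, it subtracts no mean and adds no bias term, which
    makes it cheaper to compute while often training just as stably.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

out = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
# RMS of [3, 4] is sqrt(25 / 2) ≈ 3.536, so out ≈ [0.849, 1.131]
```

After normalization the output vector has unit root mean square, which is what keeps activations from drifting in scale as they pass through deep networks.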
While ML purists focused on loss curves, the business world saw a different revolution. Eric Siu, founder of the advertising agency Single Grain, applied auto-research to the marketing "experiment cycle."
"Most marketing teams run ~30 experiments per year," Siu wrote on X. "The next generation will run 36,500+. Easily." He continued:
"They will run experiments while sleeping. Current marketing teams run 20-30 experiments per year. If they're 'good,' maybe 52. A new landing page. New ad creative. Maybe a subject-line test. This is considered 'data-driven marketing.' But next-generation marketing systems will run 36,500+ experiments per year."
Siu's framework replaces the training script with a marketing asset: a landing page, ad creative, or cold email. The agent changes a variable (a subject line or CTA), deploys it, measures the "positive response rate," and keeps or rejects the change.
Siu claims this creates a proprietary map of what resonates with specific audiences: a moat built not from code but from accumulated experiment history. "Winning companies won't have better marketers," he wrote; "they will have faster testing cycles."
Despite the enthusiasm, GitHub discussions revealed a community struggling with the consequences of such rapid, automated progress.
The over-optimization trap: Researcher alexithual raised a serious concern: "Aren't you worried that running too many experiments will eventually 'corrupt' the validation set?" The fear is that, with enough agents, hyperparameters will be optimized for quirks of the validation data rather than for general intelligence.
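The concern is easy to demonstrate. In the hypothetical simulation below, every candidate has identical true skill, yet selecting the best measured score from thousands of noisy evaluations on a fixed validation set still produces an apparent gain; the numbers and noise scale are invented for illustration.

```python
import random

random.seed(42)

def measured_val_score(true_skill=0.0, noise=0.01):
    """Measured score = true skill + noise from evaluating on a finite val set."""
    return true_skill + random.gauss(0, noise)

def best_of(n_experiments):
    """Select the best *measured* score; every candidate's true skill is 0."""
    return max(measured_val_score() for _ in range(n_experiments))

few = best_of(5)       # a handful of experiments: modest apparent gain
many = best_of(5000)   # thousands of experiments: inflated apparent gain
# The "improvement" in `many` is pure selection on validation noise:
# on fresh data, every one of these candidates would score the same.
```

This is why serious auto-research setups need a held-out test set the agents never see, or periodic refreshes of the validation data.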
The meaning of the gains: User Samionb asked whether the drop from 0.9979 to 0.9697 was really meaningful. Karpathy's response was characteristically direct: "What we're doing is optimizing performance per unit of compute… these are real and significant gains."
The human element: On X, a user working as Head of Growth at a crypto platform documented an overnight run on a Mac Mini M4: while 26 of the 35 experiments failed or crashed, the seven successful runs meant "the model became simpler and better."
That gain, more in less time, was achieved without any human intervention.
The release of auto-research points to a future in which, thanks to simple AI guidance mechanisms, the human role shifts from "experimenter" to "experiment designer."
As tools like DarkMatter, Optimization Arena, and NanoClaw emerge to support this movement, the bottleneck of AI progress is no longer the coding ability of the "meat computer" (Karpathy's description of the human brain); it is our ability to define the boundaries of the search.
Andrej Karpathy changed the vibe again. We are no longer just coding models; we are planting ecosystems that learn while we sleep.