Nvidia’s new open-weights Nemotron 3 Super combines three different architectures to outperform gpt-oss and Qwen.



Multi-agent systems built for long-horizon tasks such as software engineering or cybersecurity testing can generate up to 15 times the token volume of a standard conversation, straining the efficiency of enterprise deployments.

But today, Nvidia moved to help solve this problem with the release of Nemotron 3 Super, a 120-billion-parameter hybrid model whose weights are now available on Hugging Face.

By combining different architectural philosophies – state-space models, transformers, and a novel "latent" Mixture-of-Experts design – Nvidia aims to deliver the specialized depth that agentic workflows demand without the bloat inherent in dense reasoning models, all available for commercial use under a mostly open license.

Triple hybrid architecture

At the heart of Nemotron 3 Super is a complex architectural triad that balances memory efficiency against reasoning precision. The model uses a hybrid Mamba-Transformer backbone, combining Mamba-2 layers with strategically placed Transformer attention layers.

To understand the implications for enterprise production, consider the "needle in a haystack" problem. The Mamba-2 layers function as a high-speed backbone, handling the vast majority of sequential processing with linear time complexity. This allows the model to maintain a massive 1-million-token context window without the KV cache exploding. However, pure state-space models often struggle with associative recall.

To fix this, Nvidia strategically inserts Transformer attention layers as "global anchors," ensuring the model can accurately retrieve specific facts buried deep within a codebase or a stack of financial statements.
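Nvidia has not published layer-by-layer code in this announcement, so the following PyTorch sketch is purely illustrative: a toy linear-time block stands in for Mamba-2, interleaved with standard attention blocks playing the "global anchor" role. The interleaving ratio, the dimensions, and the simplified recurrence are all assumptions, not Nemotron 3's actual design.

```python
# Toy sketch of a hybrid SSM/Transformer stack (illustrative only).
import torch
import torch.nn as nn

class ToySSMBlock(nn.Module):
    """Stand-in for a Mamba-2 layer: a gated cumulative recurrence that
    costs O(sequence_length), so there is no growing KV cache."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        h = torch.cumsum(self.in_proj(x), dim=1) / steps  # running prefix average
        return x + h * torch.sigmoid(self.gate(x))        # residual + gating

class ToyAttentionBlock(nn.Module):
    """Stand-in for the 'global anchor' attention layers that restore
    exact associative recall (needle-in-a-haystack lookups)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(dim=256, n_layers=12, attn_every=4):
    # Mostly SSM layers, with an attention layer every `attn_every`
    # blocks -- the ratio here is purely illustrative.
    layers = [ToyAttentionBlock(dim) if (i + 1) % attn_every == 0
              else ToySSMBlock(dim) for i in range(n_layers)]
    return nn.Sequential(*layers)

x = torch.randn(2, 128, 256)          # (batch, tokens, hidden)
print(build_hybrid_stack()(x).shape)  # torch.Size([2, 128, 256])
```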

Alongside this backbone, the model introduces a latent Mixture of Experts (LatentMoE). A traditional Mixture of Experts (MoE) routes tokens to experts in the full hidden dimension, which becomes a computational bottleneck as the model scales. LatentMoE solves this by projecting tokens into a compressed latent space before forwarding them to experts.

This "expert compression" allowing the model to consult four times as many experts for the same computational cost. This granularity is critical for agents who must switch between Python syntax, SQL logic, and conversational reasoning during a turn.

Multi-Token Prediction (MTP) accelerates the model further. While standard models predict one next token at a time, MTP predicts several future tokens simultaneously. Acting as an "internal draft model," it enables native speculative decoding that can deliver up to 3x wall-clock speedups on structured generation tasks such as code or tool calls.
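The accept/reject logic of speculative decoding is easy to see in miniature. In this toy sketch, draft_k_tokens stands in for the cheap MTP heads and true_next for the full model's verified choices; both are invented stand-ins, not Nemotron's API. Drafted tokens are confirmed until the first mismatch, so each full forward pass yields between 1 and k tokens.

```python
# Toy demonstration of the speculative-decoding accept/reject loop.
import random

def true_next(ctx, k):
    """Stand-in for the full model's next-token choices."""
    return [(sum(ctx) + i) % 7 for i in range(1, k + 1)]

def draft_k_tokens(ctx, k):
    """Cheap MTP guesses -- here, a noisy copy of the 'true' tokens."""
    return [t if random.random() < 0.8 else -1 for t in true_next(ctx, k)]

def speculative_step(ctx, k=4):
    draft = draft_k_tokens(ctx, k)
    verified = true_next(ctx, k)      # one full pass scores all k drafts at once
    out = list(ctx)
    for guess, truth in zip(draft, verified):
        out.append(truth)             # the verified token is always kept
        if guess != truth:            # first mismatch ends the "free" run
            break
    return out                        # gains 1..k tokens per full forward pass

ctx = [1, 2, 3]
for _ in range(5):
    ctx = speculative_step(ctx)
print(ctx)
```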

Blackwell advantage

The most significant technical breakthrough in Nemotron 3 Super for enterprises is its optimization for the Nvidia Blackwell GPU platform. By pretraining natively in NVFP4 (4-bit floating point), Nvidia has achieved a step change in production efficiency.

On Blackwell, the model generates results four times faster than 8-bit models running on the previous-generation Hopper architecture, with no loss of accuracy.
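For intuition about what 4-bit floating point costs in precision, here is a NumPy simulation of E2M1 quantization with a per-block scale, roughly the shape of NVFP4, which pairs 4-bit E2M1 elements with per-block scale factors. This only simulates the rounding behavior; real NVFP4 arithmetic runs in Blackwell tensor-core hardware.

```python
# Simulated 4-bit float (E2M1) quantization with per-block scaling.
import numpy as np

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # positive E2M1 values

def quantize_fp4_like(x, block=16):
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / E2M1_GRID[-1]  # per-block scale
    scale[scale == 0] = 1.0
    scaled = x / scale
    # Round each magnitude to the nearest representable E2M1 value.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    return np.sign(scaled) * E2M1_GRID[idx] * scale               # dequantized view

w = np.random.randn(4, 16).astype(np.float32)
w_q = quantize_fp4_like(w).reshape(4, 16)
print("max abs rounding error:", np.abs(w - w_q).max())
```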

In terms of practical performance, Nemotron 3 Super stands out as a specialist in grounded, agentic research.

It currently ranks #1 on the DeepResearch Bench, a benchmark that measures the ability of artificial intelligence to perform comprehensive, multi-stage research on large document sets.

| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
|---|---|---|---|
| General knowledge | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| Reasoning | | | |
| AIME25 (no tools) | 90.21 | 90.36 | 92.50 |
| HMMT Feb 25 (no tools) | 93.67 | 91.40 | 90.00 |
| HMMT Feb 25 (with tools) | 94.73 | 89.55 | — |
| GPQA (no tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | 80.09 | — |
| LiveCodeBench (v5, 2024-07 to 2024-12) | 81.19 | 78.93 | 88.00 |
| SciCode (subtask) | 42.05 | 42.00 | 39.00 |
| HLE (no tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | 19.0 | — |
| Agentic | | | |
| Terminal Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal Bench Core 2.0 | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.9 |
| SWE-Bench (OpenCode) | 59.20 | 67.40 | — |
| SWE-Bench (Codex) | 53.73 | 61.20 | — |
| SWE-Bench Multilingual (OpenHands) | 45.78 | 30.80 | — |
| TauBench V2: Airline | 56.25 | 66.0 | 49.2 |
| TauBench V2: Retail | 62.83 | 62.6 | 67.80 |
| TauBench V2: Telecom | 64.36 | 95.00 | 66.00 |
| TauBench V2: Average | 61.15 | 74.53 | 61.0 |
| BrowseComp with Search | 31.28 | 33.89 | — |
| BIRD Bench | 41.80 | 38.25 | — |
| Chat and instruction following | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI Multi-Challenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-V2 | 73.88 | 75.15 | 90.26 |
| Long context | | | |
| AA-LCR | 58.31 | 66.90 | 51.00 |
| RULER @ 256k | 96.30 | 96.74 | 52.30 |
| RULER @ 512k | 95.67 | 95.95 | 46.70 |
| RULER @ 1M | 91.75 | 91.33 | 22.30 |
| Multilingual | | | |
| MMLU-ProX (language average) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |

(— indicates no score reported in the source.)

It also exhibits significant throughput advantages, achieving 2.2 times the throughput of gpt-oss-120B and 7.5 times that of Qwen3.5-122B at high-volume settings.

A special “open” license – commercial use, but with important caveats

Nemotron 3 Super is released under the Nvidia Open Model License Agreement (updated October 2025), which provides an enabling framework for enterprise adoption, albeit with distinct "protective" clauses that differentiate it from pure open-source licenses such as MIT or Apache 2.0.

Key provisions for enterprise users:

  • Commercial use: The license expressly states that the models "can be used commercially" and grants a perpetual, worldwide, royalty-free license to sell and distribute products based on the model.

  • Ownership of outputs: Nvidia makes no claim to the outputs generated by the model; responsibility for and ownership of those outputs rest entirely with the user.

  • Derivative works: Businesses are free to create and own "Derivative Models" (fine-tuned versions), provided they include the required attribution notice: "Licensed by Nvidia Corporation under the Nvidia Open Model License."

The "Red lines":

The license contains two critical termination triggers that production teams must monitor:

  1. Guardrails: The license terminates automatically if a user bypasses or removes the model's built-in "guardrails" (technical limitations or safety mechanisms) without implementing a "substantially similar" replacement, as the terms of use require.

  2. Litigation trigger: If a user files a copyright or patent claim against Nvidia alleging that the model infringes their IP, their license to use the model terminates immediately.

This structure lets Nvidia grow its commercial ecosystem while protecting itself from "IP trolling" and ensuring the model cannot be stripped of its safety features for malicious use.

“The team really cooked”

The release generated significant buzz in the developer community. Chris Alexiuk, Principal Product Research Engineer at Nvidia, heralded the launch on X (@llm_wizard) as a "SUPER DAY," emphasizing the model's speed and transparency. "Model: FAST. Model: SMART. Model: THE MOST OPEN MODEL WE’VE EVER MADE," he posted, highlighting the release of 10 trillion tokens of training data and the training recipes, not just the weights.

Industry adoption reflects this enthusiasm:

  • Cloud and hardware: The model is available as an Nvidia NIM microservice, can run on-premises through Dell AI Factory or HPE, and is live on Google Cloud and Oracle, with AWS and Azure to follow shortly. (A sketch of a typical API call appears after this list.)

  • Production agents: Companies such as CodeRabbit (software development) and Greptile are integrating the model to handle large-scale codebase analysis, while industry leaders such as Siemens and Palantir are deploying it to automate complex workflows in manufacturing and cybersecurity.
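NIM microservices expose an OpenAI-compatible endpoint, so integration typically looks like any other chat-completions call. The snippet below is a hypothetical sketch: the base URL, port, and model ID are placeholders to be replaced with the values from the actual NIM container documentation.

```python
# Hypothetical example of calling a locally hosted NIM container
# through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder: your NIM host/port
    api_key="not-needed-locally",         # local NIM deployments may not require a key
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super",      # placeholder model ID
    messages=[
        {"role": "system", "content": "You are a code-review agent."},
        {"role": "user", "content": "Summarize the risks in this diff: ..."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```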

As Kari Briski, Nvidia's vice president of AI software, puts it: "As companies move beyond chatbots to multi-agent applications, they’re facing an explosion of context."

Nemotron 3 Super is Nvidia’s answer to that explosion – a model that delivers the "brainpower" of a 120B-parameter system with the operational efficiency of a smaller specialist. The message for the enterprise is clear: the "thinking tax" is finally coming down.


