
Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model, LFM2.5-230M, and enterprises would do well to consider it for data mining and local deployment on smartphones, laptops and robots.
It’s a 230 million-parameter foundation model expressly designed for on-device agent workflows and, as noted in a Liquid blog post, this small size makes it workable almost "everywhere." According to Liquid, it also outperforms models more than 4X its size on selected benchmarks, notably beating Alibaba’s Qwen3.5-0.8B (Guide), with 800 million parameters, in data extraction, as well as Google’s Gemma 3 1B, with 1 billion parameters.
The model targets developers and engineers building lightweight data mining pipelines and autonomous edge systems.
Operating under a commercial license for dual use, the model remains free for individuals and companies with less than $10 million in annual revenue, while larger corporations require a paid enterprise agreement.
This release differentiates itself from other small AI models by using the LFM2 architecture to achieve high inference speed without the massive memory overhead of parameter-heavy transformers.
While major AI companies such as Anthropic, OpenAI, Google, Microsoft, Meta and others have scaled parameter counts to hundreds of billions or even trillions to achieve frontier performance, a parallel race focuses entirely on off-premise and on-premise applications.
Liquid AI’s launch of LFM2.5-230M signals a major shift toward architectural efficiency over brute-force scaling. By compressing 19 trillion pre-training tokens into 230 million parameters, the company demonstrates that edge devices do not need massive computing power or persistent cloud connections to execute complex, multi-step agent workflows.
How the LFM2.5-230M works
Unlike standard transformer architectures, LFM2.5-230M is built on the LFM2 architecture, a hybrid system that combines gated short-range convolutions with grouped query attention to process information efficiently.
For those following the evolution of efficient architectures, Liquid’s approach pursues a familiar conceptual goal: efficiently handling long contexts and sequential data on edge hardware without the quadratic memory overhead of pure attention mechanisms. The model supports a large 32K context window, allowing it to accept large files or continuous streams of robot telemetry.
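To make the idea of a gated short-range convolution more concrete, here is a minimal toy sketch in plain Python. It is not Liquid’s implementation: the function name, the scalar (single-channel) sequence, and the way the gates are passed in as precomputed values are all assumptions made for illustration. The point it shows is that each output position only looks back over a short, fixed-size kernel, so per-token work does not grow with sequence length the way full attention does.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_short_conv(xs, gate_in, gate_out, kernel):
    """Toy causal 1-D convolution with multiplicative input and output gates.

    xs       : input sequence (list of floats)
    gate_in  : pre-activation input-gate values, one per position (assumed given)
    gate_out : pre-activation output-gate values, one per position (assumed given)
    kernel   : short filter; kernel[0] weights the current position,
               kernel[1] the previous position, and so on
    """
    # Input gate: scale each element by a value in (0, 1).
    gated = [sigmoid(g) * x for g, x in zip(gate_in, xs)]
    out = []
    for t in range(len(xs)):
        acc = 0.0
        # Causal window: only the current and earlier positions contribute.
        for k, w in enumerate(kernel):
            if t - k >= 0:
                acc += w * gated[t - k]
        # Output gate: scale the convolved value.
        out.append(sigmoid(gate_out[t]) * acc)
    return out


if __name__ == "__main__":
    open_gates = [50.0, 50.0, 50.0]  # sigmoid(50) is effectively 1.0
    print(gated_short_conv([2.0, 4.0, 6.0], open_gates, open_gates, [0.5, 0.5]))
```

In a real model the gates would be computed from the input by learned projections and the operation would run across many channels at once; the sketch keeps them as plain lists so the data flow stays visible.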
The performance graphs presented in the release make the architectural efficiency visually clear. The model maintains a memory footprint of under 400MB while achieving prefill and decode speeds that surpass comparable models such as Gemma 3 1B IT and Granite 4.0-H-350M.
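A quick back-of-the-envelope calculation helps put the sub-400MB figure in context. The sketch below only counts raw weight storage at a few common precisions; the article does not state which precision Liquid measured, and the calculation ignores the KV cache, activations and runtime overhead, so treat it as a rough bound rather than a reproduction of the reported number.

```python
PARAMS = 230_000_000  # parameter count quoted in the article


def weight_megabytes(params: int, bits_per_weight: int) -> float:
    """Raw weight storage in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return params * bits_per_weight / 8 / 1_000_000


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_megabytes(PARAMS, bits):.0f} MB")
    # 16-bit weights: 460 MB
    # 8-bit weights: 230 MB
    # 4-bit weights: 115 MB
```

Under these assumptions, 16-bit weights alone would already exceed 400MB, which suggests the quoted footprint refers to a quantized build.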
On the Samsung Galaxy S25 Ultra, equipped with a Qualcomm Snapdragon Gen4 processor, the model reaches a decoding speed of 213 tokens per second. Even on the highly constrained Raspberry Pi 5, the model maintains a decoding speed of 42 tokens per second. Additionally, internal benchmarking shows that the GPU inference stack delivers lower latency than competing small models at all concurrency levels.
Why it matters to businesses
To understand why a model with 230 million parameters is needed, you need to look at how businesses currently manage data.
Organizations have traditionally relied on rigid, rules-based Extract, Transform, Load (ETL) scripts to move and process data. However, these legacy systems are very fragile; a simple change in document layout or schema update can disrupt the entire pipeline.
To address this, the industry is shifting toward "AI ETL," where machine learning maps fields, detects schema drift, and automatically adapts to changes. In a modern lightweight data mining pipeline, an AI model connects to unstructured sources like PDFs, emails, or web forms and structures the data into formats like JSON without requiring hard-coded rules.
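As a rough illustration of what such a pipeline step might look like, the sketch below wraps a model call in a small validation layer. Everything here is hypothetical: the `extract_record` helper, the invoice schema, and the `fake_model` stub that stands in for a real on-device model. The design point is that even when the model replaces hard-coded parsing rules, the surrounding code still checks the JSON it returns before passing it downstream.

```python
import json
from typing import Callable

# Hypothetical target schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "total": (int, float),
    "currency": str,
}


def extract_record(text: str, model: Callable[[str], str]) -> dict:
    """Ask a model for JSON and validate it against REQUIRED_FIELDS.

    `model` is any callable that takes a prompt and returns a string;
    it is a stand-in for whatever inference runtime is actually used.
    """
    prompt = (
        "Extract invoice_number, total and currency from the text below "
        "and answer with a single JSON object.\n\n" + text
    )
    record = json.loads(model(prompt))  # raises json.JSONDecodeError if malformed
    if not isinstance(record, dict):
        raise ValueError("model output is not a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    return record


def fake_model(prompt: str) -> str:
    """Stub used only for this example; returns a fixed, well-formed answer."""
    return '{"invoice_number": "INV-001", "total": 129.5, "currency": "EUR"}'


if __name__ == "__main__":
    print(extract_record("Invoice INV-001, total 129.50 EUR", fake_model))
```

Records that fail validation can be retried or routed to a human reviewer instead of breaking the pipeline.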
It is not economically viable for businesses to use a large flagship model such as Claude Opus 4.6 (which costs $5.00 per million input tokens) to analyze daily invoices, format addresses, or route telemetry data.
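To see how the per-token price adds up, consider a simple estimate. The $5.00 rate comes from the figure quoted above; the document volume and tokens-per-document values are purely hypothetical, and the calculation covers input tokens only, so real bills that include output tokens would be higher.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 5.00  # USD, rate quoted in the article


def daily_input_cost(
    documents_per_day: int,
    tokens_per_document: int,
    price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS,
) -> float:
    """Estimated daily spend on input tokens, in USD."""
    total_tokens = documents_per_day * tokens_per_document
    return total_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical workload: 100,000 documents a day at 2,000 tokens each.
    cost = daily_input_cost(100_000, 2_000)
    print(f"${cost:,.2f} per day")  # $1,000.00 per day
```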
This is where models like the LFM2.5-230M are critical. Designed explicitly as a lightweight extraction engine, it allows companies to automate repetitive formatting and data analysis at a fraction of the computational cost and latency, running directly on on-premises hardware instead of relying on expensive, persistent cloud API calls.
Small models compared: LFM and the 3B class
In mid-2026, the AI industry is seeing a renaissance of "small" models, but the definition of "small" varies wildly.
Recently, the bantamweight community was stunned by Weibo’s VibeThinker-3B, a 3 billion-parameter model built on a Qwen2-style backbone. Scoring a whopping 94.3 on the AIME 2026 math benchmark, it competes with 600 billion-parameter behemoths through aggressive data curation and reinforcement learning.
Similarly, Google’s Gemma 4 family, which recently surpassed 200 million downloads, is pushing the boundaries of AI and includes E2B (2 billion parameters), a model designed specifically for mobile and IoT deployments.
In contrast, Liquid AI’s LFM2.5-230M operates in a completely different weight class. With just 230 million parameters, it’s about a tenth the size of Google’s smallest Gemma 4 and of VibeThinker-3B.
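The size gap can be checked with simple division. The sketch below uses the nominal parameter counts quoted in this article (actual counts for each model may differ slightly) and a hypothetical `size_ratio` helper.

```python
LFM_PARAMS = 230_000_000  # LFM2.5-230M, as quoted in the article

# Nominal parameter counts as quoted in the article.
OTHER_MODELS = {
    "Qwen3.5-0.8B": 800_000_000,
    "Gemma 3 1B": 1_000_000_000,
    "Gemma 4 E2B": 2_000_000_000,
    "VibeThinker-3B": 3_000_000_000,
}


def size_ratio(other_params: int, base_params: int = LFM_PARAMS) -> float:
    """How many times larger another model is than the base model."""
    return other_params / base_params


if __name__ == "__main__":
    for name, params in OTHER_MODELS.items():
        print(f"{name}: {size_ratio(params):.1f}x the size of LFM2.5-230M")
```

On these nominal figures, the 2 billion and 3 billion-parameter models come out at roughly 9 and 13 times the size of LFM2.5-230M, which is consistent with the "about a tenth" description.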
Because of its microscopic footprint, LFM2.5-230M isn’t designed to compete in reasoning-heavy workloads like advanced math, coding, or creative writing, a limitation that Liquid AI openly acknowledges.
However, in its areas of data extraction and tool calling, the model punches well above its weight class.
Benchmarks released by Liquid AI show that LFM2.5-230M scored 43.26 on the BFCLv3 tool-use benchmark, beating IBM’s Granite 4.0-350M (39.58) and outpacing larger 1 billion-parameter models such as Google’s Gemma 3 1B.
On CaseReportBench, a data extraction benchmark, it scores 22.51, edging out Qwen3.5-0.8B (Guide).
LFM2.5-230M suggests that while 3 billion-parameter models like VibeThinker handle advanced math, a 230 million-parameter model can be the better, highly optimized choice for handling structured tool calls and keeping agent pipelines running efficiently on limited hardware.
Use in advanced robotics
Because it excels at tool calling, LFM2.5-230M functions primarily as a skill selection layer. Liquid demonstrated this capability by embedding the model in the Unitree G1 humanoid robot.
The fully on-device model successfully processes complex environmental commands through the robot’s NVIDIA Jetson Orin computing module.
As noted on the company’s technical blog, the model receives a free-form instruction, such as *"Hold still for 2 seconds, then walk forward at a speed of 1 meter per second for 3 meters, hold one knee forward for 5 seconds and walk backward 3 meters at a speed of 0.5 meters per second,"* and automatically turns it into a structured multi-step plan that calls pre-engineered low-level skills provided by NVIDIA’s SONIC framework.
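The article does not publish the plan format or the skill names used in the demo, so the following is only a hypothetical sketch of what a skill selection layer might hand to a robot controller. The `SKILLS` registry, the JSON layout, and the `validate_plan` helper are all invented for illustration; they show the general pattern of checking a model-generated plan against a fixed set of known skills before anything is executed.

```python
import json

# Hypothetical skill registry: skill name -> required argument names.
SKILLS = {
    "stand_still": {"duration_s"},
    "walk": {"direction", "speed_mps", "distance_m"},
    "kneel": {"duration_s"},
}


def validate_plan(raw: str) -> list:
    """Parse a JSON plan and check every step against the SKILLS registry."""
    steps = json.loads(raw)
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON list of steps")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: must be a JSON object")
        name = step.get("skill")
        if name not in SKILLS:
            raise ValueError(f"step {i}: unknown skill {name!r}")
        args = step.get("args", {})
        if not isinstance(args, dict) or set(args) != SKILLS[name]:
            raise ValueError(f"step {i}: wrong arguments for {name!r}")
    return steps


# One possible structured rendering of the instruction quoted above.
EXAMPLE_PLAN = json.dumps([
    {"skill": "stand_still", "args": {"duration_s": 2}},
    {"skill": "walk", "args": {"direction": "forward", "speed_mps": 1.0, "distance_m": 3}},
    {"skill": "kneel", "args": {"duration_s": 5}},
    {"skill": "walk", "args": {"direction": "backward", "speed_mps": 0.5, "distance_m": 3}},
])

if __name__ == "__main__":
    for step in validate_plan(EXAMPLE_PLAN):
        print(step["skill"], step["args"])
```

Keeping the registry small and explicit means a malformed or unexpected step is rejected up front rather than reaching the robot’s low-level controller.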
Both the base and trained models are immediately available on Hugging Face, with day-one support across the inference ecosystem: llama.cpp (GGUF), MLX, vLLM, SGLang and ONNX.
Dual-use, dedicated LFM Open License
Liquid AI ships the LFM2.5-230M under the LFM Open License v1.0. Despite the word "open" in the title, this is not an Open Source Initiative (OSI) compliant license; it operates as a limited, dual-use commercial framework.
For independent developers, researchers, and early-stage startups, the license works the same way as open source software.
Users receive a perpetual, worldwide, royalty-free license to reproduce, modify, and distribute the model, provided they retain the original copyright notices and prominently display any changes.
However, the license includes a strict "Commercial Restriction." Any entity with annual revenue of $10 million or more loses the right to use the model commercially under this agreement.
Large enterprises that exceed this revenue threshold must enter into a separate, paid commercial contract with Liquid AI to deploy the model in production.
This strategy protects the company from free appropriation of its intellectual property by major tech conglomerates, while still seeding adoption of the model at the grassroots developer level.





