Liquid AI has released LFM2-VL, a new generation of vision-language foundation models designed for efficient deployment across a wide range of hardware, from smartphones and laptops to wearables and embedded systems.
The models promise low-latency performance, strong accuracy, and flexibility for real-world applications.
LFM2-VL builds on the company’s existing LFM2 architecture, launched just over a month ago. The company says LFM2 offers the “fastest on-device foundation models on the market” thanks to its approach of generating “weights,” or model settings, on the fly for each input (known as a linear input-varying, or LIV, system). LFM2-VL extends that architecture into multimodal processing, supporting both text and image inputs at variable resolutions.
According to Liquid AI, the models deliver up to twice the GPU inference speed of comparable vision-language models while maintaining competitive performance on common benchmarks.
“Efficiency is our product,” Liquid AI co-founder and CEO Ramin Hasani said in a post on X announcing the new model family.
Two variants for different needs
The release includes two model sizes:
- LFM2-VL-450M: a hyper-efficient model with fewer than half a billion parameters (internal settings), aimed at highly resource-constrained environments.
- LFM2-VL-1.6B: a more capable model that remains lightweight enough for single-GPU and on-device deployment.
Both variants process images at native resolutions up to 512×512 pixels, avoiding distortion or unnecessary upscaling.
For larger images, the system applies non-overlapping patching and adds a thumbnail for global context, enabling the model to capture both fine detail and the broader scene.
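To make the tiling concrete, here is a minimal sketch of how non-overlapping patching with a global thumbnail can work. The 512-pixel tile size follows the native resolution described above; the exact tiling and token layout LFM2-VL uses are not specified in the release, so treat this as an illustration rather than the model’s actual preprocessing.

```python
from PIL import Image

TILE = 512  # native resolution the encoder handles directly (per the article)

def tile_image(img: Image.Image, tile: int = TILE):
    """Split an oversized image into non-overlapping tiles plus a thumbnail."""
    w, h = img.size
    if w <= tile and h <= tile:
        # Small images pass through at native resolution; no patching needed
        return [img], None
    patches = []
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            # Edge tiles may be smaller than tile x tile
            patches.append(img.crop((left, top, min(left + tile, w), min(top + tile, h))))
    # A downscaled copy preserves global context alongside the detail tiles
    thumbnail = img.copy()
    thumbnail.thumbnail((tile, tile))
    return patches, thumbnail
```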
Background on Liquid AI
Liquid AI was founded by former researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) with the goal of building AI architectures that move beyond the widely used transformer model.
The company’s flagship innovation, the Liquid Foundation Models (LFMs), are based on principles from dynamical systems, signal processing, and numerical linear algebra, producing general-purpose AI models capable of handling text, video, audio, time series, and other sequential data.
Unlike traditional architectures, Liquid’s approach aims to deliver competitive or superior performance using significantly fewer computational resources, allowing for real-time adaptability during inference while keeping memory requirements low. This makes LFMs well suited to both large-scale enterprise use cases and resource-limited edge deployments.
In July, the company expanded its platform strategy with the launch of the Liquid Edge AI Platform (LEAP), a cross-platform SDK designed to make it easier for developers to run small language models directly on mobile and embedded devices.
LEAP offers OS-agnostic support for iOS and Android, integration with both Liquid’s own models and other open-source SLMs, and a built-in library with models as small as 300MB, small enough for modern phones with minimal RAM.
Its companion app, Apollo, lets developers test models entirely offline, in line with Liquid AI’s emphasis on privacy-preserving, low-latency AI. Together, LEAP and Apollo reflect the company’s commitment to decentralizing AI execution, reducing reliance on cloud infrastructure, and enabling developers to build optimized, task-specific models for real-world environments.
Speed/quality trade-offs and technical design
LFM2-VL uses a modular architecture combining a language model backbone, a SigLIP2 NaFlex vision encoder, and a multimodal projector.
The projector includes a two-layer MLP connector with pixel unshuffle, which reduces the number of image tokens and improves throughput.
Users can adjust parameters such as the maximum number of image tokens or patches, letting them balance speed and quality for a given deployment scenario. Training involved roughly 100 billion multimodal tokens, sourced from open datasets and in-house synthetic data.
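To illustrate what a pixel-unshuffle connector does, here is a sketch in PyTorch. The hidden sizes, activation function, and unshuffle ratio are placeholder assumptions, not LFM2-VL’s published configuration; the point is how unshuffling folds spatial patches into channels to cut the image token count before the MLP projects into the language model’s embedding space.

```python
import torch
import torch.nn as nn

class PixelUnshuffleProjector(nn.Module):
    """Illustrative two-layer MLP connector with pixel unshuffle."""

    def __init__(self, vision_dim: int = 1024, text_dim: int = 2048, ratio: int = 2):
        super().__init__()
        # Folds each ratio x ratio block of patches into the channel dim,
        # shrinking the token count by ratio**2
        self.unshuffle = nn.PixelUnshuffle(ratio)
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim * ratio**2, text_dim),
            nn.GELU(),  # activation choice is an assumption
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, tokens, dim) from the vision encoder,
        # with tokens forming a square h x w patch grid
        b, seq, d = feats.shape
        h = w = int(seq ** 0.5)
        grid = feats.transpose(1, 2).reshape(b, d, h, w)
        grid = self.unshuffle(grid)                # (b, d*r^2, h/r, w/r)
        tokens = grid.flatten(2).transpose(1, 2)   # (b, seq/r^2, d*r^2)
        return self.mlp(tokens)                    # project into LM embedding space
```

With a ratio of 2, for example, a 32×32 grid of 1,024 image tokens enters the language model as only 256 tokens, which is where the throughput gain comes from.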
Performance and benchmarks
The models achieve competitive benchmark results across a range of vision-language evaluations. LFM2-VL-1.6B scores well on RealWorldQA (65.23), InfoVQA (58.68), and OCRBench (742), and maintains strong results in multimodal reasoning tasks.

In inference testing, LFM2-VL achieved the fastest GPU processing times in its class when tested on a standard workload of a 1024×1024 image and a short prompt.

Licensing and availability
LFM2-VL models are available now on Hugging Face, along with example fine-tuning code in Colab. They are compatible with Hugging Face transformers and TRL.
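The article ships no usage code, but since the checkpoints are compatible with Hugging Face transformers, loading one should look roughly like the following sketch. The repository id and the image-text-to-text classes are assumptions based on the announced model names and the standard transformers API; consult the model card on Hugging Face for the exact invocation.

```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

# Repo id assumed from the announced model name; verify on Hugging Face
model_id = "LiquidAI/LFM2-VL-1.6B"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Standard chat-template flow for a combined image + text prompt
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("photo.jpg")},
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```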
The models are released under a custom “LFM1.0 license.” Liquid AI describes the license as based on Apache 2.0 principles, but the full text has not yet been published.
The company has indicated that commercial use will be permitted under certain conditions, with different terms for companies above and below $10 million in annual revenue.
With LFM2-VL, Liquid AI aims to make high-performance multimodal AI more accessible for on-device and resource-limited deployments without sacrificing capability.
