Meta Introduces Spirit LM open source model that combines text and speech inputs/outputs

Last updated: October 19, 2024 10:42 am
Published October 19, 2024


Just in time for Halloween 2024, Meta has unveiled Meta Spirit LM, the company's first open-source multimodal language model capable of seamlessly integrating text and speech inputs and outputs.

As such, it competes directly with OpenAI's GPT-4o (also natively multimodal) and other multimodal models such as Hume's EVI 2, as well as dedicated text-to-speech and speech-to-text offerings such as ElevenLabs.

Designed by Meta's Fundamental AI Research (FAIR) team, Spirit LM aims to address the limitations of existing AI voice experiences by offering more expressive and natural-sounding speech generation, while learning tasks across modalities like automatic speech recognition (ASR), text-to-speech (TTS), and speech classification.

Unfortunately for entrepreneurs and business leaders, the model is currently available only for non-commercial use under Meta's FAIR Noncommercial Research License, which grants users the right to use, reproduce, modify, and create derivative works of the Meta Spirit LM models, but only for noncommercial purposes. Any distribution of those models or derivatives must also comply with the noncommercial restriction.

A new approach to text and speech

Traditional AI models for voice rely on automatic speech recognition to transcribe spoken input, which is then processed by a language model whose output is converted back into speech using text-to-speech techniques.

While effective, this process often sacrifices the expressive qualities inherent to human speech, such as tone and emotion. Meta Spirit LM introduces a more advanced solution by incorporating phonetic, pitch, and tone tokens to overcome these limitations.
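The architectural difference can be sketched in a few lines of Python. The functions below are illustrative stand-ins, not Meta's or Spirit LM's actual API: a cascaded pipeline reduces speech to plain text at the ASR step, so prosodic cues captured at the input never reach the output.

```python
# Hypothetical sketch of a cascaded voice pipeline -- these stubs illustrate
# the architecture only; they are not part of any Meta or Spirit LM API.

def asr(audio):
    # Cascaded pipelines reduce speech to plain text: prosody is dropped here.
    return audio["text"]

def llm(text):
    # Stand-in language model: produces a textual reply to the transcript.
    return f"Reply to: {text}"

def tts(text):
    # TTS must invent prosody from scratch, since none survived the
    # text bottleneck between the ASR and language-model stages.
    return {"text": text, "pitch": None, "tone": None}

def cascaded_pipeline(audio):
    return tts(llm(asr(audio)))

spoken_input = {"text": "I can't believe it!", "pitch": "rising", "tone": "excited"}
out = cascaded_pipeline(spoken_input)
print(out["pitch"], out["tone"])  # prints: None None -- expressive cues were lost
```

A model that instead operates on interleaved phonetic, pitch, and tone tokens keeps that expressive information in-band from input to output.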


Meta has released two versions of Spirit LM:

• Spirit LM Base: Uses phonetic tokens to process and generate speech.

• Spirit LM Expressive: Includes additional tokens for pitch and tone, allowing the model to capture more nuanced emotional states, such as excitement or sadness, and reflect those in the generated speech.

Both models are trained on a combination of text and speech datasets, allowing Spirit LM to perform cross-modal tasks like speech-to-text and text-to-speech, while maintaining the natural expressiveness of speech in its outputs.
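As a rough illustration of what "additional tokens for pitch and tone" means, the sketch below builds the two styles of token sequence by hand. The token names and values are invented for illustration and are not Spirit LM's actual vocabulary, which is described in Meta's research paper.

```python
# Hypothetical token streams -- names and values are invented for
# illustration, not Spirit LM's real tokenizer output.

def base_tokens(units):
    # Base-style sequence: phonetic (speech-unit) tokens only.
    return [f"[Hu{u}]" for u in units]

def expressive_tokens(units, pitches, styles):
    # Expressive-style sequence: phonetic tokens interleaved with coarser
    # pitch and style tokens that carry prosodic information in-band.
    seq = []
    for u, p, s in zip(units, pitches, styles):
        seq.append(f"[Hu{u}]")
        if p is not None:
            seq.append(f"[Pi{p}]")
        if s is not None:
            seq.append(f"[St{s}]")
    return seq

units = [12, 47, 47, 3]
print(base_tokens(units))
print(expressive_tokens(units, [5, None, 9, None], [2, None, None, None]))
```

Because the pitch and style tokens sit in the same sequence the language model predicts, expressiveness is modeled directly rather than bolted on by a separate TTS stage.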

Open source but noncommercial: available for research only

In line with Meta's commitment to open science, the company has made Spirit LM fully open source, providing researchers and developers with the model weights, code, and supporting documentation to build upon.

Meta hopes that the open nature of Spirit LM will encourage the AI research community to explore new methods for integrating speech and text in AI systems.

The release also includes a research paper detailing the model's architecture and capabilities.

Mark Zuckerberg, Meta's CEO, has been a strong advocate for open-source AI, stating in a recent open letter that AI has the potential to "increase human productivity, creativity, and quality of life" while accelerating advancements in areas like medical research and scientific discovery.

Applications and future potential

Meta Spirit LM is designed to learn new tasks across various modalities, such as:

• Automatic Speech Recognition (ASR): Converting spoken language into written text.

• Text-to-Speech (TTS): Generating spoken language from written text.


• Speech Classification: Identifying and categorizing speech based on its content or emotional tone.

The Spirit LM Expressive model goes a step further by incorporating emotional cues into its speech generation.

For instance, it can detect and reflect emotional states like anger, surprise, or joy in its output, making interaction with AI more human-like and engaging.

This has significant implications for applications like virtual assistants, customer service bots, and other interactive AI systems where more nuanced and expressive communication is essential.

A broader effort

Meta Spirit LM is part of a broader set of research tools and models that Meta FAIR is releasing to the public. This includes an update to Meta's Segment Anything Model 2.1 (SAM 2.1) for image and video segmentation, which has been used across disciplines like medical imaging and meteorology, as well as research on improving the efficiency of large language models.

Meta's overarching goal is to achieve advanced machine intelligence (AMI), with an emphasis on developing AI systems that are both powerful and accessible.

The FAIR team has been sharing its research for more than a decade, aiming to advance AI in a way that benefits not just the tech community, but society as a whole. Spirit LM is a key component of this effort, supporting open science and reproducibility while pushing the boundaries of what AI can achieve in natural language processing.

What's next for Spirit LM?

With the release of Meta Spirit LM, Meta is taking a significant step forward in the integration of speech and text in AI systems.


By offering a more natural and expressive approach to AI-generated speech, and by making the model open source, Meta is enabling the broader research community to explore new possibilities for multimodal AI applications.

Whether in ASR, TTS, or beyond, Spirit LM represents a promising advance in the field of machine learning, with the potential to power a new generation of more human-like AI interactions.

