
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant

Last updated: July 5, 2025 6:21 am
Published July 5, 2025
Windowing technique for (a) tokenizing and (b) decoding long-form speech to allow extrapolation to longer decoding lengths. Credit: arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Recently, spoken language models (SLMs) have been highlighted as a next-generation technology that overcomes the limitations of text-based language models: they learn directly from human speech, without text, to understand and generate both linguistic and non-linguistic information.

However, existing models show significant limitations in generating the long-duration content required for podcasts, audiobooks, and voice assistants.

Ph.D. candidate Sejin Park, from Professor Yong Man Ro’s research team at the Korea Advanced Institute of Science and Technology (KAIST) School of Electrical Engineering, has overcome these limitations by developing “SpeechSSM,” which enables consistent and natural speech generation without time constraints.

The work has been published on the arXiv preprint server and will be presented at ICML (International Conference on Machine Learning) 2025.

A major advantage of SLMs is that they process speech directly, without an intermediate text conversion step. This lets them exploit the distinctive acoustic characteristics of human speakers and rapidly generate high-quality speech, even in large-scale models.

However, existing models have struggled to maintain semantic and speaker consistency over long durations. Capturing fine detail means breaking speech into very small fragments, which raises the “speech token resolution” (the number of tokens per second of audio) and, with it, memory consumption.
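To see why fine-grained tokenization strains memory, consider the arithmetic for a long generation. The token rate r = 25 tokens per second used here is a hypothetical round number chosen for illustration, not a figure from the paper; T is a 16-minute generation (960 seconds):

```latex
\begin{aligned}
N   &= r \cdot T = 25\,\text{tokens/s} \times 960\,\text{s} = 24{,}000\ \text{tokens} \\
N^2 &= (24{,}000)^2 = 5.76 \times 10^{8}\ \text{pairwise attention scores}
\end{aligned}
```

Doubling the token rate doubles N but quadruples N², the number of pairwise scores a full-attention layer must compute over the whole sequence.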

SpeechSSM employs a “hybrid structure” that alternates “attention layers,” which focus on recent information, with “recurrent layers,” which retain the overall narrative flow (long-term context). This keeps the story flowing smoothly and coherently even when speech is generated for a long time.


Moreover, memory usage and computational load do not increase sharply with input length, enabling stable and efficient training as well as the generation of long-duration speech.
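The hybrid design and the bounded-memory claim can be illustrated with a toy, pure-Python sketch. It is not SpeechSSM’s architecture: the layer classes, the 2:1 recurrent-to-attention ratio, the window size, and the decay constant are all illustrative assumptions. What it shows is structural: a recurrent layer carries one fixed-size state forward, and a sliding-window attention layer keeps at most `window` past frames, so neither grows with the length of the stream.

```python
import math
from collections import deque


class LinearRecurrentLayer:
    """Toy linear recurrence (an exponential moving average):
    h_t = a * h_{t-1} + (1 - a) * x_t.
    The state is one fixed-size vector, so memory stays constant
    no matter how many steps have been processed."""

    def __init__(self, dim: int, decay: float = 0.9):
        self.decay = decay
        self.state = [0.0] * dim

    def step(self, x: list[float]) -> list[float]:
        self.state = [self.decay * h + (1.0 - self.decay) * xi
                      for h, xi in zip(self.state, x)]
        return self.state


class LocalAttentionLayer:
    """Toy sliding-window attention over the last `window` inputs.
    Keys and values are the raw inputs (no learned projections); the
    cache is a bounded deque, so memory is capped at `window` vectors."""

    def __init__(self, window: int):
        self.cache = deque(maxlen=window)

    def step(self, q: list[float]) -> list[float]:
        self.cache.append(q)
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in self.cache]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        return [sum(w * v[i] for w, v in zip(weights, self.cache)) / z
                for i in range(len(q))]


def build_hybrid_stack(num_layers: int, dim: int, window: int):
    """Hypothetical pattern: two recurrent layers, then one local-attention layer."""
    return [LocalAttentionLayer(window) if i % 3 == 2 else LinearRecurrentLayer(dim)
            for i in range(num_layers)]


def run(layers, inputs):
    """Stream frame vectors through the stack one step at a time."""
    out = None
    for x in inputs:
        h = x
        for layer in layers:
            h = [a + b for a, b in zip(h, layer.step(h))]  # residual connection
        out = h
    return out
```

Feeding 5,000 frames or 5 million frames through this stack uses the same per-layer memory; a full-attention layer, by contrast, would have to cache every past frame.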

SpeechSSM processes unbounded speech sequences by dividing speech data into short, fixed units (windows), processing each unit independently, and then combining them to create long speech.
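The windowing idea (cut an unbounded stream into fixed-size pieces, handle each piece on its own, and stitch the results back together) can be sketched generically. The window size and the per-window function `fn` are placeholders for what would in practice be a tokenizer or decoder; the article does not give the paper’s window lengths or how window boundaries are treated, so this non-overlapping version is an assumption.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def split_into_windows(seq: Sequence[T], window: int) -> list[Sequence[T]]:
    """Cut a sequence into consecutive, non-overlapping chunks of `window` items.
    The final chunk is shorter if len(seq) is not a multiple of `window`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [seq[i:i + window] for i in range(0, len(seq), window)]


def process_in_windows(seq: Sequence[T], window: int,
                       fn: Callable[[Sequence[T]], list[U]]) -> list[U]:
    """Apply `fn` to each window independently, then concatenate the results."""
    out: list[U] = []
    for chunk in split_into_windows(seq, window):
        out.extend(fn(chunk))  # each window sees only its own samples
    return out
```

Because `fn` never sees more than `window` items, the cost of each call is bounded regardless of how long the full sequence is; only the concatenated output grows.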

In addition, for the speech synthesis stage it uses a “non-autoregressive” audio synthesis model (SoundStorm), which generates many parts of the output at once instead of slowly producing one unit at a time, enabling fast generation of high-quality speech.
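The speed difference comes from the number of sequential model calls. The sketch below contrasts token-by-token autoregressive decoding with a MaskGIT-style iterative parallel decoder, the general family of confidence-based masked decoding that SoundStorm builds on. The predictor callables, the cosine masking schedule, and the eight-iteration default are illustrative assumptions, not SoundStorm’s actual configuration.

```python
import math
from typing import Callable, Optional


def autoregressive_decode(n: int, predict_next: Callable[[list[int]], int]):
    """One model call per token: token t is predicted from tokens 0..t-1."""
    tokens: list[int] = []
    calls = 0
    for _ in range(n):
        tokens.append(predict_next(tokens))
        calls += 1
    return tokens, calls


def parallel_decode(n: int,
                    predict_all: Callable[[list[Optional[int]]], tuple[list[int], list[float]]],
                    iterations: int = 8):
    """Start fully masked (None); at each iteration, predict every position at
    once and commit the most confident predictions. The number of model calls
    equals `iterations`, independent of n."""
    tokens: list[Optional[int]] = [None] * n
    calls = 0
    for it in range(1, iterations + 1):
        preds, conf = predict_all(tokens)
        calls += 1
        # Cosine schedule: positions that should remain masked after this step
        # (reaches 0 on the final iteration).
        remaining = math.floor(n * math.cos(math.pi / 2 * it / iterations))
        masked = [i for i, t in enumerate(tokens) if t is None]
        masked.sort(key=lambda i: conf[i], reverse=True)
        for i in masked[:max(0, len(masked) - remaining)]:
            tokens[i] = preds[i]
    return tokens, calls
```

With toy predictors that return `i % 7` for position `i`, a 100-token sequence takes 100 calls autoregressively but 8 calls with the parallel decoder, and both produce the same tokens.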

While existing models were typically evaluated on short speech of about 10 seconds, Se Jin Park created new evaluation tasks for speech generation based on the team’s self-built benchmark dataset, “LibriSpeech-Long,” which supports evaluating generations of up to 16 minutes of speech.

Whereas PPL (perplexity), an existing evaluation metric for speech models, only indicates grammatical correctness, she proposed new metrics such as “SC-L (semantic coherence over time),” which assesses content coherence over time, and “N-MOS-T (naturalness mean opinion score over time),” which assesses naturalness over time, enabling more effective and precise evaluation.
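To make the contrast concrete: perplexity collapses a whole generation into one number computed from the model’s per-token probabilities, whereas an “over time” metric reports a score per segment, so drift late in a long generation stays visible. The sketch shows the standard perplexity formula and a hypothetical segment-wise coherence curve; the embedding inputs, the per-segment split, and the use of cosine similarity to the prompt are assumptions for illustration, and the paper’s exact SC-L definition may differ.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """PPL = exp(-(1/N) * sum_i log p(token_i | context))."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_over_time(prompt_emb: list[float],
                        segment_embs: list[list[float]]) -> list[float]:
    """Hypothetical SC-L-style curve: similarity of each successive segment of
    the generated speech (e.g. one embedding per minute) to the prompt."""
    return [cosine(prompt_emb, seg) for seg in segment_embs]
```

A model that assigns probability 0.25 to every token has a perplexity of 4 whether it stays on topic or not; the per-segment curve is what would reveal a drop in coherence partway through.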

These new evaluations confirmed that speech generated by the SpeechSSM spoken language model consistently featured the specific people mentioned in the initial prompt, while new characters and events unfolded naturally and remained contextually consistent, even over long-duration generation.

This contrasts sharply with existing models, which tended to lose track of the topic and fall into repetition during long-duration generation.

Sejin Park explained, “Existing spoken language models had limitations in long-duration generation, so our goal was to develop a spoken language model capable of generating long-duration speech for real-world human use.”


She added, “This research achievement is expected to contribute greatly to various types of voice content creation and to voice AI fields such as voice assistants, by maintaining consistent content over long contexts and responding more efficiently and quickly in real time than existing methods.”

This research, with Se Jin Park as first author, was conducted in collaboration with Google DeepMind.

More information:
Se Jin Park et al, Long-Form Speech Generation with Spoken Language Models, arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Accompanying demo: SpeechSSM Publications.

Journal information:
arXiv


Provided by
The Korea Advanced Institute of Science and Technology (KAIST)


Citation:
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant (2025, July 4)
retrieved 5 July 2025
from https://techxplore.com/information/2025-07-speechssm-possibilities-hour-ai-voice.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.


