
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant

Last updated: July 5, 2025 6:21 am
Published July 5, 2025
Windowing technique for (a) tokenizing and (b) decoding long-form speech to allow extrapolation in decoding lengths. Credit: arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Recently, spoken language models (SLMs) have been highlighted as a next-generation technology that overcomes the limitations of text-based language models by learning from human speech without text, so that they can understand and generate both linguistic and non-linguistic information.

However, existing models show significant limitations in generating the long-duration content required for podcasts, audiobooks, and voice assistants.

Ph.D. candidate Sejin Park, from Professor Yong Man Ro’s research team at the Korea Advanced Institute of Science and Technology’s (KAIST) School of Electrical Engineering, has succeeded in overcoming these limitations by developing “SpeechSSM,” which enables consistent and natural speech generation without time constraints.

The work has been published on the arXiv preprint server and is set to be presented at ICML (International Conference on Machine Learning) 2025.

A major advantage of SLMs is their ability to process speech directly, without intermediate text conversion. This lets them leverage the unique acoustic characteristics of human speakers and generate high-quality speech rapidly, even in large-scale models.

However, existing models have had difficulty maintaining semantic and speaker consistency over long-duration speech. Capturing very detailed information by breaking speech down into fine fragments increases the “speech token resolution” and, with it, memory consumption.

SpeechSSM employs a “hybrid structure” that alternately places “attention layers,” which focus on recent information, and “recurrent layers,” which remember the overall narrative flow (long-term context). This allows the story to flow smoothly without losing coherence, even when generating speech for a long time.
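The alternating layout described above can be illustrated with a toy sketch. This is not SpeechSSM's implementation: the scalar features, the `decay` constant, the `window` size, the residual connections, and the layer `pattern` are all hypothetical choices made only to show how a recurrent layer (one running state for the whole past) and a local attention layer (a view of recent positions only) can be stacked.

```python
import math


def recurrent_layer(xs, decay=0.9):
    """Linear recurrence: a single running state summarises the entire past."""
    state, out = 0.0, []
    for x in xs:
        state = decay * state + (1.0 - decay) * x
        out.append(state)
    return out


def local_attention_layer(xs, window=4):
    """Softmax self-attention restricted to the most recent `window` positions.

    Queries, keys, and values are the raw scalar inputs (no learned projections).
    """
    out = []
    for t, query in enumerate(xs):
        recent = xs[max(0, t - window + 1): t + 1]
        scores = [query * key for key in recent]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, recent)) / total)
    return out


def hybrid_stack(xs, pattern=("recurrent", "attention", "recurrent", "attention")):
    """Apply the layers in `pattern` order, each with a residual connection."""
    for kind in pattern:
        layer = recurrent_layer if kind == "recurrent" else local_attention_layer
        xs = [x + y for x, y in zip(xs, layer(xs))]
    return xs


if __name__ == "__main__":
    print(hybrid_stack([0.1, 0.2, 0.3, 0.4, 0.5]))
```

In this sketch the recurrent layer carries information from arbitrarily far back through one number, while the attention layer never looks further back than `window` steps.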


Furthermore, memory usage and computational load do not increase sharply with input length, enabling stable and efficient training and the generation of long-duration speech.
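A back-of-the-envelope comparison shows why such a design keeps memory roughly flat. The sketch below counts cached values for a model that attends over every past token versus a hybrid whose attention layers keep only a bounded window and whose recurrent layers keep one state vector each. The layer counts, dimension, and window size are made-up numbers, and the assumption that attention is limited to a fixed window is ours, not a figure from the paper.

```python
def full_attention_cache(num_tokens, num_layers, dim):
    """Keys and values are cached for every past token in every layer."""
    return 2 * num_tokens * num_layers * dim


def hybrid_cache(num_tokens, attn_layers, rec_layers, dim, window):
    """Attention caches at most `window` tokens; each recurrent layer keeps one state vector."""
    attention = 2 * min(num_tokens, window) * attn_layers * dim
    recurrent = rec_layers * dim
    return attention + recurrent


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        full = full_attention_cache(n, num_layers=12, dim=64)
        hybrid = hybrid_cache(n, attn_layers=4, rec_layers=8, dim=64, window=512)
        print(n, full, hybrid)
```

Once the sequence is longer than the window, the hybrid count stops growing, while the full-attention count keeps scaling linearly with the number of tokens.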

SpeechSSM effectively processes unbounded speech sequences by dividing speech data into short, fixed units (windows), processing each unit independently, and then combining them to create long speech.
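The split, process, and combine idea can be sketched as follows. The helper names, the optional `overlap` between neighbouring windows, and the rule of dropping the overlapping prefix when merging are assumptions for illustration; the article only states that fixed windows are processed independently and then combined.

```python
def split_into_windows(tokens, window, overlap=0):
    """Cut a sequence into fixed-size windows that share `overlap` items with their neighbour."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not tokens:
        return []
    step = window - overlap
    last_start = max(len(tokens) - overlap, 1)
    return [tokens[i:i + window] for i in range(0, last_start, step)]


def merge_windows(chunks, overlap=0):
    """Concatenate processed windows, dropping the part each one shares with its predecessor."""
    merged = []
    for index, chunk in enumerate(chunks):
        merged.extend(chunk if index == 0 else chunk[overlap:])
    return merged


def process_window(chunk):
    """Hypothetical stand-in for per-window tokenizing or decoding."""
    return [2 * value for value in chunk]


if __name__ == "__main__":
    tokens = list(range(10))
    windows = split_into_windows(tokens, window=4, overlap=1)
    decoded = [process_window(w) for w in windows]  # each window handled on its own
    print(merge_windows(decoded, overlap=1))
```

Because every window has the same bounded length, the cost of handling one window does not depend on how long the full recording is.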

Additionally, in the speech generation phase, it uses a “non-autoregressive” audio synthesis model (SoundStorm), which rapidly generates multiple parts at once instead of slowly creating one character or one word at a time, enabling the fast generation of high-quality speech.
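The difference in the number of model calls can be shown with a toy decoder. This is not SoundStorm's actual procedure: `toy_model` is a hypothetical stand-in that returns random tokens and confidences, and the even per-round commit schedule is an assumption. The sketch only illustrates filling many positions per call instead of one.

```python
import math
import random

MASK = None  # marker for a position that has not been generated yet


def toy_model(tokens, rng):
    """Hypothetical predictor: proposes (token, confidence) for every masked slot in one call."""
    return {i: (rng.randrange(1024), rng.random())
            for i, tok in enumerate(tokens) if tok is MASK}


def parallel_decode(length, rounds, seed=0):
    """Fill all positions in at most `rounds` model calls, most confident proposals first."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    calls = 0
    for r in range(rounds):
        if MASK not in tokens:
            break
        proposals = toy_model(tokens, rng)
        calls += 1
        # Spread the remaining masked slots evenly over the remaining rounds.
        keep = math.ceil(len(proposals) / (rounds - r))
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:keep]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens, calls


def autoregressive_calls(length):
    """One model call per generated position."""
    return length


if __name__ == "__main__":
    _, calls = parallel_decode(length=100, rounds=8)
    print("parallel calls:", calls, "autoregressive calls:", autoregressive_calls(100))
```

The parallel version needs a fixed number of rounds regardless of sequence length, whereas one-at-a-time generation needs as many calls as there are positions.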

While earlier work typically evaluated models on short speech samples of about 10 seconds, Se Jin Park created new evaluation tasks for speech generation based on the team's self-built benchmark dataset, “LibriSpeech-Long,” which supports generating up to 16 minutes of speech.

Compared to PPL (perplexity), an existing speech model evaluation metric that only indicates grammatical correctness, she proposed new evaluation metrics such as “SC-L (semantic coherence over time)” to assess content coherence over time, and “N-MOS-T (naturalness mean opinion score over time)” to evaluate naturalness over time, enabling more effective and precise evaluation.
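To make the contrast concrete, the sketch below computes standard perplexity from per-token probabilities and, separately, a simple "coherence over time" curve. The second function is not the paper's SC-L definition, which the article does not spell out. It is a hypothetical illustration that compares a bag-of-words vector of the prompt with successive segments of a transcript, so that drift away from the prompt shows up as a falling score.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """exp of the average negative log-probability assigned to each token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def coherence_over_time(prompt, transcript, segment_words=8):
    """Similarity between the prompt and each consecutive transcript segment."""
    reference = Counter(prompt.lower().split())
    words = transcript.lower().split()
    return [cosine(reference, Counter(words[i:i + segment_words]))
            for i in range(0, len(words), segment_words)]


if __name__ == "__main__":
    prompt = "alice met bob at the harbour"
    transcript = ("alice met bob at the harbour and alice greeted bob "
                  "then the weather report listed rain for every city")
    print(coherence_over_time(prompt, transcript, segment_words=6))
```

A single perplexity number says nothing about where in a long sample the content starts to wander, which is the gap a per-segment score is meant to expose.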

These new evaluations confirmed that speech generated by the SpeechSSM spoken language model consistently featured the specific individuals mentioned in the initial prompt, and that new characters and events unfolded naturally and in a contextually consistent way, despite long-duration generation.

This contrasts sharply with existing models, which tended to lose their topic easily and exhibit repetition during long-duration generation.

Sejin Park explained, “Existing spoken language models had limitations in long-duration generation, so our goal was to develop a spoken language model capable of generating long-duration speech for actual human use.”


She added, “This research achievement is expected to greatly contribute to various types of voice content creation and voice AI fields like voice assistants, by maintaining consistent content in long contexts and responding more efficiently and quickly in real time than existing methods.”

This research, with Se Jin Park as the first author, was conducted in collaboration with Google DeepMind.

More information:
Se Jin Park et al, Long-Form Speech Generation with Spoken Language Models, arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Accompanying demo: SpeechSSM Publications.

Journal information:
arXiv


Provided by
The Korea Advanced Institute of Science and Technology (KAIST)


Citation:
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant (2025, July 4)
retrieved 5 July 2025
from https://techxplore.com/news/2025-07-speechssm-possibilities-hour-ai-voice.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


