
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant

Last updated: July 5, 2025 6:21 am
Published July 5, 2025
Windowing technique for (a) tokenizing and (b) decoding long-form speech to allow extrapolation in decoding lengths. Credit: arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Recently, spoken language models (SLMs) have been highlighted as a next-generation technology that surpasses the limitations of text-based language models by learning human speech without text to understand and generate linguistic and non-linguistic information.

However, existing models show significant limitations in generating the long-duration content required for podcasts, audiobooks, and voice assistants.

Ph.D. candidate Sejin Park, from Professor Yong Man Ro’s research team at the Korea Advanced Institute of Science and Technology’s (KAIST) School of Electrical Engineering, has succeeded in overcoming these limitations by developing “SpeechSSM,” which enables consistent and natural speech generation without time constraints.

The work has been published on the arXiv preprint server and is set to be presented at ICML (International Conference on Machine Learning) 2025.

A major advantage of SLMs is their ability to process speech directly, without intermediate text conversion. They leverage the unique acoustic characteristics of human speakers, allowing for the rapid generation of high-quality speech even in large-scale models.

However, existing models have had difficulty maintaining semantic and speaker consistency over long-duration speech. Capturing very detailed information by breaking speech down into fine fragments increases the “speech token resolution” and, with it, memory consumption.

SpeechSSM employs a “hybrid structure” that alternately places “attention layers” focusing on recent information and “recurrent layers” that remember the overall narrative flow (long-term context). This allows the story to flow smoothly without losing coherence even when generating speech for a long time.
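The alternation of layer types can be illustrated with a toy sketch. It is not the SpeechSSM implementation: the scalar "tokens", the layer pattern, and the `WINDOW` and `DECAY` values are all assumptions chosen for illustration. It only shows the idea that a recurrent layer carries a fixed-size summary of everything seen so far, while an attention layer looks at a bounded window of recent inputs.

```python
import math
from collections import deque

WINDOW = 4   # hypothetical local-attention span, in tokens
DECAY = 0.9  # hypothetical decay rate of the recurrent state


def recurrent_step(x, state):
    """Fold the new input into a fixed-size running state (long-term context)."""
    state = DECAY * state + (1.0 - DECAY) * x
    return x + state, state


def local_attention_step(x, recent):
    """Softmax attention over the last WINDOW inputs only (recent information)."""
    recent.append(x)  # a deque with maxlen=WINDOW discards the oldest entry
    scores = [x * k for k in recent]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    context = sum(w * v for w, v in zip(weights, recent)) / sum(weights)
    return x + context


def run_hybrid(tokens, pattern=("recurrent", "recurrent", "attention")):
    """Pass a stream of scalar 'tokens' through the alternating layer stack."""
    states = [0.0 if kind == "recurrent" else deque(maxlen=WINDOW)
              for kind in pattern]
    outputs = []
    for x in tokens:
        h = x
        for i, kind in enumerate(pattern):
            if kind == "recurrent":
                h, states[i] = recurrent_step(h, states[i])
            else:
                h = local_attention_step(h, states[i])
        outputs.append(h)
    return outputs, states


outputs, states = run_hybrid([0.1 * (i % 5) for i in range(1000)])
# Count what the stack had to keep around after 1000 tokens.
memory_slots = sum(1 if isinstance(s, float) else len(s) for s in states)
print(len(outputs), memory_slots)
```

However long the input stream is, this toy stack keeps only the two recurrent states plus at most `WINDOW` cached inputs, which is the property that lets a hybrid design run over long sequences without its memory growing.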

Furthermore, memory usage and computational load do not increase sharply with input length, enabling stable and efficient training and the generation of long-duration speech.

SpeechSSM effectively processes unbounded speech sequences by dividing speech data into short, fixed units (windows), processing each unit independently, and then combining them to create long speech.
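A minimal sketch of the windowing idea follows, under stated assumptions: the function names, the window and overlap sizes, and the "drop the repeated overlap" stitching rule are hypothetical and are not taken from the paper. Each window is processed on its own, so the cost per window is fixed regardless of the total length.

```python
def split_into_windows(samples, window, overlap):
    """Cut a sequence into fixed-length windows that share `overlap` items."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    windows = []
    start = 0
    while True:
        windows.append(samples[start:start + window])
        if start + window >= len(samples):
            break  # this window reached the end of the sequence
        start += step
    return windows


def stitch(windows, overlap):
    """Join processed windows, dropping the repeated overlap of later windows."""
    out = list(windows[0])
    for w in windows[1:]:
        out.extend(w[overlap:])
    return out


def process(window):
    """Stand-in for per-window work such as tokenizing or decoding."""
    return [2 * s for s in window]


samples = list(range(11))
windows = split_into_windows(samples, window=4, overlap=1)
result = stitch([process(w) for w in windows], overlap=1)
print(result)
```

Here the stand-in `process` is element-wise, so the stitched result matches processing the whole sequence at once. A real tokenizer or decoder depends on context inside each window, which is why overlapping windows are useful at the seams.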

Additionally, in the speech generation phase, it uses a “Non-Autoregressive” audio synthesis model (SoundStorm), which rapidly generates multiple parts at once instead of slowly creating one character or one word at a time, enabling the fast generation of high-quality speech.

While existing work typically evaluated models on short speech of about 10 seconds, Se Jin Park created new evaluation tasks for speech generation based on a self-built benchmark dataset, “LibriSpeech-Long,” which supports generation of up to 16 minutes of speech.

Whereas PPL (perplexity), an existing speech model evaluation metric, only indicates grammatical correctness, she proposed new evaluation metrics such as “SC-L (semantic coherence over time)” to assess content coherence over time, and “N-MOS-T (naturalness mean opinion score over time)” to evaluate naturalness over time, enabling more effective and precise evaluation.
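To make the contrast concrete, here is a hedged sketch. The `perplexity` function uses the standard definition (the exponential of the average negative log-probability per token). The `coherence_over_time` function is only an illustrative stand-in for the idea of scoring coherence per time segment: it is not the paper's SC-L definition, and the bag-of-words vectors are an assumption that replaces a real semantic embedding model.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Standard perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def bag_of_words(text):
    """Toy text vector; a real system would use a semantic embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def coherence_over_time(prompt, transcript_words, segment_len):
    """Similarity of each successive transcript segment to the prompt."""
    prompt_vec = bag_of_words(prompt)
    scores = []
    for start in range(0, len(transcript_words), segment_len):
        segment = " ".join(transcript_words[start:start + segment_len])
        scores.append(cosine(prompt_vec, bag_of_words(segment)))
    return scores


ppl = perplexity([math.log(0.25)] * 4)
words = "alice met bob again then something else entirely happened".split()
scores = coherence_over_time("alice met bob", words, segment_len=3)
print(ppl, scores)
```

A single perplexity number says nothing about where in a long sample the content starts to drift, whereas a per-segment score exposes a drop-off as generation continues. In this toy transcript, only the first segment shares any words with the prompt.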

Through these new evaluations, it was confirmed that speech generated by the SpeechSSM spoken language model consistently featured the specific individuals mentioned in the initial prompt, and that new characters and events unfolded naturally and in a contextually consistent way, despite long-duration generation.

This contrasts sharply with existing models, which tended to lose their topic easily and exhibit repetition during long-duration generation.

Sejin Park explained, “Existing spoken language models had limitations in long-duration generation, so our goal was to develop a spoken language model capable of generating long-duration speech for actual human use.”

She added, “This research achievement is expected to greatly contribute to various types of voice content creation and voice AI fields like voice assistants, by maintaining consistent content in long contexts and responding more efficiently and quickly in real time than existing methods.”

This research, with Se Jin Park as the first author, was conducted in collaboration with Google DeepMind.

More information:
Se Jin Park et al, Long-Form Speech Generation with Spoken Language Models, arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Accompanying demo: SpeechSSM Publications.

Journal information:
arXiv


Provided by
The Korea Advanced Institute of Science and Technology (KAIST)


Citation:
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant (2025, July 4)
retrieved 5 July 2025
from https://techxplore.com/information/2025-07-speechssm-possibilities-hour-ai-voice.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


