
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant

Last updated: July 5, 2025 6:21 am
Published July 5, 2025
Windowing technique for (a) tokenizing and (b) decoding long-form speech to allow extrapolation in decoding lengths. Credit: arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Recently, spoken language models (SLMs) have been highlighted as a next-generation technology that surpasses the limitations of text-based language models by learning human speech without text to understand and generate linguistic and non-linguistic information.

However, existing models show significant limitations in generating the long-duration content required for podcasts, audiobooks, and voice assistants.

Ph.D. candidate Sejin Park, from Professor Yong Man Ro’s research team at the Korea Advanced Institute of Science and Technology’s (KAIST) School of Electrical Engineering, has succeeded in overcoming these limitations by developing “SpeechSSM,” which enables consistent and natural speech generation without time constraints.

The work has been published on the arXiv preprint server and is set to be presented at ICML (International Conference on Machine Learning) 2025.

A major advantage of SLMs is their ability to process speech directly, without intermediate text conversion. By leveraging the unique acoustic characteristics of human speakers, they can rapidly generate high-quality speech even at large model scales.

However, existing models faced difficulties in maintaining semantic and speaker consistency for long-duration speech due to increased “speech token resolution” and memory consumption when capturing very detailed information by breaking down speech into fine fragments.

SpeechSSM employs a “hybrid structure” that alternately places “attention layers” focusing on recent information and “recurrent layers” that remember the overall narrative flow (long-term context). This allows the story to flow smoothly without losing coherence even when generating speech for a long time.


Furthermore, memory usage and computational load do not increase sharply with input length, enabling stable and efficient learning and the generation of long-duration speech.
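To illustrate why such a design can keep memory flat, here is a minimal Python sketch. It is not the SpeechSSM architecture: the class `ToyHybridState`, the plain average standing in for attention, and the exponential moving average standing in for a learned recurrent layer are all assumptions made for illustration. The point is only that a bounded window of recent inputs plus a fixed-size recurrent state occupies the same amount of memory whether 10 or 10,000 steps have been processed.

```python
from collections import deque


class ToyHybridState:
    """Fixed-size state: a bounded window of recent inputs plus one recurrent summary.

    Hypothetical stand-in, not the model described in the paper.
    """

    def __init__(self, window: int, decay: float = 0.9):
        self.recent = deque(maxlen=window)  # local context; old items are dropped
        self.summary = 0.0                  # recurrent state; always a single value
        self.decay = decay

    def step(self, x: float) -> float:
        # Recurrent update: an exponential moving average carries long-term context.
        self.summary = self.decay * self.summary + (1.0 - self.decay) * x
        # Local context: a plain average over the window stands in for attention.
        self.recent.append(x)
        local = sum(self.recent) / len(self.recent)
        return 0.5 * local + 0.5 * self.summary

    def memory_cells(self) -> int:
        # Number of values held in state, independent of how many steps were seen.
        return len(self.recent) + 1


def run(n_steps: int, window: int = 4) -> int:
    state = ToyHybridState(window=window)
    for t in range(n_steps):
        state.step(float(t % 7))
    return state.memory_cells()


short_run = run(10)
long_run = run(10_000)
```

Both `short_run` and `long_run` report the same number of stored values, whereas a model that attends over its full history would have to keep every past step.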

SpeechSSM effectively processes unbounded speech sequences by dividing speech data into short, fixed units (windows), processing each unit independently, and then combining them to create long speech.
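The windowing idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, `toy_tokenize` is a made-up stand-in for a real speech tokenizer, and the windows here do not overlap, which the actual method may handle differently.

```python
from typing import Callable, List, Sequence


def split_into_windows(samples: Sequence[int], window: int) -> List[List[int]]:
    """Cut a sequence into consecutive fixed-size windows (the last may be shorter)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [list(samples[i:i + window]) for i in range(0, len(samples), window)]


def process_windowed(
    samples: Sequence[int],
    window: int,
    process_window: Callable[[List[int]], List[int]],
) -> List[int]:
    """Process each window independently, then concatenate the per-window results."""
    out: List[int] = []
    for chunk in split_into_windows(samples, window):
        out.extend(process_window(chunk))
    return out


def toy_tokenize(chunk: List[int]) -> List[int]:
    # Hypothetical tokenizer: emits one "token" per pair of samples (their sum).
    return [sum(chunk[i:i + 2]) for i in range(0, len(chunk), 2)]


audio = list(range(10))  # toy "audio" of 10 samples
tokens = process_windowed(audio, window=4, process_window=toy_tokenize)
```

Because each window is handled on its own, the cost of any single call depends only on the window size, so the same routine applies to inputs of any length.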

Additionally, in the speech generation phase, it uses a “Non-Autoregressive” audio synthesis model (SoundStorm), which rapidly generates multiple parts at once instead of slowly creating one character or one word at a time, enabling the fast generation of high-quality speech.

While existing work typically evaluated short speech samples of about 10 seconds, Se Jin Park created new evaluation tasks, based on the self-built benchmark dataset “LibriSpeech-Long,” for speech generation of up to 16 minutes.

Compared to PPL (Perplexity), an existing speech model evaluation metric that only indicates grammatical correctness, she proposed new evaluation metrics such as “SC-L (semantic coherence over time)” to assess content coherence over time, and “N-MOS-T (naturalness mean opinion score over time)” to evaluate naturalness over time, enabling more effective and precise evaluation.
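For readers unfamiliar with perplexity, the short Python sketch below computes it in its standard form: the exponential of the average negative log-probability a model assigns to the tokens it observes. The probability values are invented for illustration and do not come from the paper. Note that the result is a single aggregate number for the whole sequence, so by itself it cannot show how quality changes over the course of a long generation.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-probability of the observed tokens)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Toy inputs (made up): a model that is unsure among 4 options at every step,
# versus one that assigns high probability to each observed token.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
confident = perplexity([0.9, 0.8, 0.95])
```

A lower value means the model found the sequence more predictable; the uniform case works out to a perplexity of about 4, matching the four equally likely options.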

Through these new evaluations, it was confirmed that speech generated by the SpeechSSM spoken language model consistently featured specific individuals mentioned in the initial prompt, and that new characters and events unfolded naturally and in a contextually consistent way, despite long-duration generation.

This contrasts sharply with existing models, which tended to easily lose their topic and exhibit repetition during long-duration generation.

Sejin Park explained, “Existing spoken language models had limitations in long-duration generation, so our goal was to develop a spoken language model capable of generating long-duration speech for actual human use.”


She added, “This research achievement is expected to greatly contribute to various types of voice content creation and voice AI fields like voice assistants, by maintaining consistent content in long contexts and responding more efficiently and quickly in real time than existing methods.”

This research, with Se Jin Park as the first author, was conducted in collaboration with Google DeepMind.

More information:
Se Jin Park et al, Long-Form Speech Generation with Spoken Language Models, arXiv (2024). DOI: 10.48550/arxiv.2412.18603

Accompanying demo: SpeechSSM Publications.

Journal information:
arXiv


Provided by
The Korea Advanced Institute of Science and Technology (KAIST)


Citation:
Researcher develops ‘SpeechSSM,’ opening up possibilities for a 24-hour AI voice assistant (2025, July 4)
retrieved 5 July 2025
from https://techxplore.com/information/2025-07-speechssm-possibilities-hour-ai-voice.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.


