Voice AI that actually converts: New TTS model boosts sales 15% for major brands

Last updated: June 7, 2025 6:05 am
Published June 7, 2025

Generating voices that are not only humanlike and nuanced but also diverse remains a struggle in conversational AI.

At the end of the day, people want to hear voices that sound like them, or are at least natural, not just the 20th-century American broadcast standard.

Startup Rime is tackling this challenge with Arcana text-to-speech (TTS), a new spoken language model that can quickly generate “infinite” new voices of varying genders, ages, demographics and languages based on nothing more than a simple text description of the intended characteristics.

The model has helped boost customer sales by 15% for the likes of Domino’s and Wingstop.

“It’s one thing to have a really high-quality, life-like, real person-sounding model,” Lily Clifford, Rime CEO and co-founder, told VentureBeat. “It’s another to have a model that can not just create one voice, but infinite variability of voices along demographic lines.”

A voice model that ‘acts human’

Rime’s multimodal and autoregressive TTS model was trained on natural conversations with real people (as opposed to voice actors). Users simply type in a text prompt describing a voice with the desired demographic characteristics and language.

For instance: ‘I want a 30-year-old female who lives in California and is into software,’ or ‘Give me an Australian man’s voice.’

“Every time you do that, you’re going to get a different voice,” said Clifford.

Rime’s Mist v2 TTS model was built for high-volume, business-critical applications, allowing enterprises to craft unique voices for their business needs. “The customer hears a voice that allows for a natural, dynamic conversation without needing a human agent,” said Clifford.

For those looking for out-of-the-box options, meanwhile, Rime offers eight flagship speakers with unique characteristics:

  • Luna (female, chill but excitable, Gen-Z optimist)
  • Celeste (female, warm, laid-back, fun-loving)
  • Orion (male, older, African-American, comfortable)
  • Ursa (male, 20 years old, encyclopedic knowledge of 2000s emo music)
  • Astra (female, young, wide-eyed)
  • Esther (female, older, Chinese American, loving)
  • Estelle (female, middle-aged, African-American, sounds so sweet)
  • Andromeda (female, young, breathy, yoga vibes)

The model can switch between languages, and can whisper, be sarcastic and even mocking. Arcana can also insert laughter into speech when given the token <laugh>. This can return varied, realistic outputs, from “a small chuckle to a big guffaw,” Rime says. The model can also interpret <chuckle>, <sigh> and even <hum> correctly, even though it wasn’t explicitly trained to do so.

“It infers emotion from context,” Rime writes in a technical paper. “It laughs, sighs, hums, audibly breathes and makes subtle mouth noises. It says ‘um’ and other disfluencies naturally. It has emergent behaviors we’re still discovering. In short, it acts human.”
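
As a rough illustration of how inline cues such as <chuckle> or <sigh> sit inside ordinary script text, the sketch below splits a script into spoken text and non-verbal tokens. It is plain Python and does not call any Rime API; the function name `split_script` and the token set are assumptions made for this example, and the article does not describe how Arcana handles these tokens internally.

```python
import re
from typing import List, Tuple

# Token names taken from the article; this set is illustrative, not an official list.
NONVERBAL_TOKENS = {"laugh", "chuckle", "sigh", "hum"}
TOKEN_PATTERN = re.compile(r"<(\w+)>")


def split_script(script: str) -> List[Tuple[str, str]]:
    """Split a script into ordered ("text", ...) and ("token", ...) parts.

    Angle-bracket tags that are not in NONVERBAL_TOKENS are labeled "unknown".
    """
    parts: List[Tuple[str, str]] = []
    pos = 0
    for match in TOKEN_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            parts.append(("text", text))
        name = match.group(1).lower()
        kind = "token" if name in NONVERBAL_TOKENS else "unknown"
        parts.append((kind, name))
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


if __name__ == "__main__":
    for part in split_script("Oh no <chuckle> that was close. <sigh>"):
        print(part)
```

A preprocessing step like this could be used to check a script for unsupported tags before it is sent to a TTS system.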

Capturing natural conversations

Rime’s model generates audio tokens that are decoded into speech using a codec-based approach, which Rime says provides for “faster-than-real-time synthesis.” At launch, time to first audio was 250 milliseconds and public cloud latency was roughly 400 milliseconds.
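
The two speed measures in this paragraph can be made concrete with a small sketch. It assumes a hypothetical streaming source that yields audio chunks: the `fake_stream` generator stands in for any TTS stream and is not Rime’s API. `time_to_first_chunk` measures the delay before the first chunk arrives, and `real_time_factor` expresses “faster than real time” as synthesis time divided by audio duration, where a value below 1 means audio is produced faster than it plays back.

```python
import time
from typing import Iterable, Iterator, Optional


def time_to_first_chunk(chunks: Iterable[bytes]) -> Optional[float]:
    """Return seconds elapsed until the stream yields its first chunk.

    Returns None if the stream is empty.
    """
    start = time.perf_counter()
    for _ in chunks:
        return time.perf_counter() - start
    return None


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def fake_stream(delay_seconds: float, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption for this sketch)."""
    time.sleep(delay_seconds)  # simulated delay before the first audio arrives
    for _ in range(n_chunks):
        yield b"\x00" * 320


if __name__ == "__main__":
    ttfa = time_to_first_chunk(fake_stream(0.05))
    print(f"time to first audio: {ttfa * 1000:.0f} ms")
    print(f"real-time factor: {real_time_factor(0.5, 2.0):.2f}")
```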

Arcana was trained in three stages:

  • Pre-training: Rime used open-source large language models (LLMs) as a backbone and pre-trained on a large group of text-audio pairs to help Arcana learn general linguistic and acoustic patterns.
  • Supervised fine-tuning with a “massive” proprietary dataset.
  • Speaker-specific fine-tuning: Rime identified the speakers it found “most exemplary” among its dataset, conversations and reliability.

Rime’s data incorporates sociolinguistic conversation techniques (factoring in social context like class, gender, location), idiolect (individual speech habits) and paralinguistic nuances (non-verbal aspects of communication that go along with speech).

The model was also trained on accent subtleties, filler words (those subconscious ‘uhs’ and ‘ums’) as well as pauses, prosodic stress patterns (intonation, timing, stressing of certain syllables) and multilingual code-switching (when multilingual speakers switch back and forth between languages).

The company has taken a unique approach to collecting all this data. Clifford explained that, typically, model builders will gather snippets from voice actors, then create a model to reproduce the characteristics of that person’s voice based on text input. Or, they’ll scrape audiobook data.

“Our approach was very different,” she explained. “It was, ‘How do we create the world’s largest proprietary data set of conversational speech?’”

To do so, Rime built its own recording studio in a basement in San Francisco and spent several months recruiting people off Craigslist, through word-of-mouth, or by casually gathering themselves, friends and family. Rather than scripted conversations, they recorded natural conversations and chitchat.

They then annotated voices with detailed metadata, encoding gender, age, dialect, speech affect and language. This has allowed Rime to achieve 98 to 100% accuracy.

Clifford noted that they are constantly augmenting this dataset.

“How do we get it to sound personal? You’re never going to get there if you’re just using voice actors,” she said. “We did the insanely hard thing of collecting really naturalistic data. The huge secret sauce of Rime is that these aren’t actors. These are real people.”

A ‘personalization harness’ that creates bespoke voices

Rime intends to give customers the ability to find the voices that will work best for their application. They built a “personalization harness” tool that lets users run A/B tests with various voices. After a given interaction, the API reports back to Rime, which provides an analytics dashboard identifying the best-performing voices based on success metrics.

Of course, customers have different definitions of what constitutes a successful call. In food service, that might be upselling an order of fries or extra wings.
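
To make the idea of comparing voices on a customer-defined success metric concrete, here is a minimal sketch of the aggregation step. It is not Rime’s harness or API: the `CallOutcome` record, the function names and the voice labels `voice_a` and `voice_b` are all hypothetical, and “success” is simply whatever the customer chooses to log, such as an accepted upsell.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class CallOutcome:
    voice: str     # label of the voice variant used on the call (hypothetical)
    success: bool  # customer-defined outcome, e.g. an upsell was accepted


def success_rates(outcomes: Iterable[CallOutcome]) -> Dict[str, float]:
    """Return the fraction of successful calls for each voice."""
    successes: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for outcome in outcomes:
        totals[outcome.voice] += 1
        successes[outcome.voice] += int(outcome.success)
    return {voice: successes[voice] / totals[voice] for voice in totals}


def best_voice(outcomes: Iterable[CallOutcome]) -> Optional[str]:
    """Return the voice with the highest success rate, or None if there is no data."""
    rates = success_rates(outcomes)
    if not rates:
        return None
    return max(rates, key=rates.get)


if __name__ == "__main__":
    calls = [
        CallOutcome("voice_a", True),
        CallOutcome("voice_a", False),
        CallOutcome("voice_b", True),
        CallOutcome("voice_b", True),
    ]
    print(success_rates(calls))
    print(best_voice(calls))
```

With only a handful of calls per voice, differences like these are mostly noise; a real comparison would need enough traffic per variant and some form of significance check before declaring a winner.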

“The goal for us is how do we create an application that makes it easy for our customers to run these experiments themselves?” said Clifford. “Because our customers aren’t voice casting directors, neither are we. The challenge becomes how to make that personalization analytics layer really intuitive.”

Another KPI customers are maximizing for is the caller’s willingness to talk to the AI. They’ve found that, when switching to Rime, callers are 4X more likely to talk to the bot.

“For the first time ever, people are like, ‘No, you don’t need to transfer me. I’m perfectly willing to talk to you,’” said Clifford. “Or, when they’re transferred, they say ‘Thank you.’” (In fact, 20% are cordial when ending conversations with a bot.)

Powering 100 million calls a month

Rime counts among its customers Domino’s, Wingstop, ConverseNow and Ylopo. They do a lot of work with large contact centers, enterprise developers building interactive voice response (IVR) systems and telecoms, Clifford noted.

“When we switched to Rime we saw an immediate double-digit improvement in the likelihood of our calls succeeding,” said Akshay Kayastha, director of engineering at ConverseNow. “Working with Rime means we solve a ton of the last-mile problems that come up in shipping a high-impact application.”

Ylopo CPO Ge Juefeng noted that, for his company’s high-volume outbound application, they need to build immediate trust with the consumer. “We tested every model on the market and found that Rime’s voices converted customers at the highest rate,” he reported.

Rime is already helping power close to 100 million phone calls a month, said Clifford. “If you call Domino’s or Wingstop, there’s an 80 to 90% chance that you hear a Rime voice,” she said.

Looking ahead, Rime will push more into on-premises offerings to support low latency. In fact, they anticipate that, by the end of 2025, 90% of their volume will be on-prem. “The reason for that is you’re never going to be as fast if you’re running these models in the cloud,” said Clifford.

Also, Rime continues to fine-tune its models to address other linguistic challenges, such as phrases the model has never encountered, like Domino’s tongue-tying “Meatza ExtravaganZZa.” As Clifford noted, even if a voice is personalized, natural and responds in real time, it’s going to fail if it can’t handle a company’s unique needs.

“There are still a lot of problems that our competitors see as last-mile problems, but that our customers see as first-mile problems,” said Clifford.

