Within just two years of its inception, ElevenLabs, the AI voice startup founded by former Google and Palantir employees, has hit unicorn status. The company today announced it has raised $80 million in a series B round of funding, growing its valuation more than ten-fold to $1.1 billion.
The investment was co-led by existing investors Andreessen Horowitz (a16z), former GitHub CEO Nat Friedman and former Apple AI leader Daniel Gross, with participation from Sequoia Capital and SV Angel. It comes six months after the company's $19 million series A round, which valued it at about $100 million.
ElevenLabs, which has mastered the art of using machine learning for voice cloning and synthesis across languages, said it plans to use the capital to advance its research and build on its product offerings. It also announced a slew of new features, including a tool for dubbing full-length movies and a new marketplace where users will be able to sell their cloned voices for money.
They are expected to roll out over the coming weeks.
Making content universally accessible
In a world where languages and dialects change with every region, localizing content for everyone is a massive challenge. Traditionally, the approach has been to produce in English or another mainstream language and hire dubbing artists for select markets with growth potential; the artists then re-record the content in the target language for distribution. The problem is that these manual dubs often stray far from the original performance, and the approach does not scale to widespread distribution, especially for smaller production teams.
Former Google machine learning engineer Piotr Dabkowski and ex-Palantir deployment strategist Mati Staniszewski, who both hail from Poland, witnessed this problem firsthand when they saw poorly dubbed movies. This challenge inspired them to launch ElevenLabs, a company on a mission to make all content universally accessible in any language and voice with the power of AI.
ElevenLabs debuted in 2022 and has been growing steadily since. In its initial phase, it made waves with a text-to-speech model that synthesized natural-sounding AI voices in English. The model then expanded into Eleven Multilingual v1 and v2, which added support for synthesis in more languages, including Polish, German, Spanish, French, Italian, Portuguese and Hindi. Alongside this, the company developed Voice Lab, where users could clone their own voices or generate entirely new synthetic voices (by randomly sampling vocal parameters) to use with the synthesis tool. This allowed them to convert text of their choice, like the script of a podcast, into audio content in their preferred voice and language.
“ElevenLabs’ technology combines context awareness and high compression to deliver ultra-realistic speech. Rather than generate sentences one by one, the company’s proprietary model is built to understand word relationships and adjusts delivery based on the wider context. It also has no hardcoded features, meaning it can dynamically predict thousands of voice characteristics while generating speech,” Staniszewski told VentureBeat.
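For developers, this synthesis capability is exposed through ElevenLabs' public API. As a rough illustration of how a text-to-speech request might look, here is a minimal Python sketch; the API key, voice ID, model name and voice settings below are illustrative placeholders rather than values taken from the company's documentation:

```python
import requests

# Placeholder credentials: a real voice ID would come from the user's Voice Lab,
# and the API key from the ElevenLabs dashboard.
API_KEY = "YOUR_ELEVENLABS_API_KEY"
VOICE_ID = "your-cloned-voice-id"

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={
        "xi-api-key": API_KEY,
        "Content-Type": "application/json",
    },
    json={
        "text": "Welcome to this week's episode of the podcast.",
        "model_id": "eleven_multilingual_v2",  # assumed multilingual model name
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    },
)
response.raise_for_status()

# The endpoint returns raw audio bytes, which can be saved as an MP3 file.
with open("episode_intro.mp3", "wb") as f:
    f.write(response.content)
```

The same request pattern, swapping in a different voice ID or model identifier, covers the cloned-voice and multilingual use cases described above.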
A million users and counting
Within a few months of launching the tools in beta, ElevenLabs gained significant traction, with over a million users coming aboard. The company also built on its AI voice research by launching AI Dubbing, a speech-to-speech conversion tool that allows users to translate audio and video into 29 languages while preserving the original speaker's voice and emotion. Today, it counts 41% of the Fortune 500 among its customers, including notable content publishers such as Storytel, The Washington Post and TheSoul Publishing.
“We are constantly entering into new B2B partnerships, with over 100 established to date. AI voices have wide applicability – from enabling creators to enhance audience experiences, to broadening access to education and providing innovative solutions in publishing, entertainment, and accessibility,” Staniszewski noted.
Now, as its user base continues to grow, ElevenLabs is also innovating on the product side to give users a richer set of features to work with. This is where the new Dubbing Studio workflow comes in.
The workflow builds on the AI Dubbing product, giving professional users a dedicated set of tools to not only dub entire movies into the language of their choice but also generate and edit the transcripts, translations and timecodes, allowing for additional hands-on control over production. Like AI Dubbing, it supports 29 languages, but it lacks one element critical to full content localization: lip-syncing.
This means that if a movie is localized with the tool, only the audio will be dubbed into the target language; the lip movements in the video will remain as they were in the original. Staniszewski confirmed that the company is currently laser-focused on delivering the best audio experience but hopes to add this capability in the future.
Marketplace to sell AI voices and more to come
In addition to the Dubbing Studio, ElevenLabs is launching an accessibility app that converts text or URLs into audio, as well as a Voice Library, a marketplace of sorts where users can sell their AI-cloned voices for money. The company is giving users the flexibility to define the availability and compensation terms for their AI-generated voices, but notes that sharing a voice will be a multi-step process involving several layers of verification. The move will give users a broader set of voice models to work with while giving the creators of those models an opportunity to earn.
“Before sharing a voice, users must pass a voice captcha verification by reading a text prompt within a specific timeframe to confirm their voice matches the training samples. This, along with our team’s moderation and manual approval, ensures authentic, user-verified voices can be shared and monetized,” the founder and CEO said.
As these features hit general availability, expected over the coming weeks, ElevenLabs hopes to draw more customers from different segments. The company said it plans to use this capital, which takes its total raised to $101 million, to advance its AI voice research, expand infrastructure and develop new vertical-specific products, while building strong safety controls at the same time, including a classifier that can identify AI-generated audio.
“Over the next years, we aim to build our position as the global leader in voice AI research and product deployment. We also plan to develop increasingly advanced tools tailored to professional users and use cases,” Staniszewski said.
Other players in the AI-powered voice and speech generation space include MURF.AI, Play.ht and WellSaid Labs. According to Market.us, the global market for such tools stood at $1.2 billion in 2022 and is estimated to reach nearly $5 billion by 2032, a CAGR of just over 15.4%.