Weeks after AI voice startup ElevenLabs launched its Sound Effects text-to-sound AI offering, the company is releasing an open-source tool to showcase its potential. In “about 15 seconds,” this tool lets creators generate sound effect samples for their videos by analyzing the imported clip and offering multiple options.
While developers can access the app’s code on GitHub, ElevenLabs has also published a website where the public can try out its Sound Effects API.
When you upload a video, the so-called Video to Sound Effects app extracts four frames at one-second intervals on the client side. Then, it sends those frames and a prompt to OpenAI’s GPT-4o to create a custom text-to-sound effects prompt. That prompt is then used to generate a sound effect with ElevenLabs’s Sound Effects API. Finally, the video and audio are combined on the client side into a single file, ready for download, that can be up to 22 seconds long.
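The frame-sampling step can be sketched in a few lines. This is only an illustration, not the app’s actual code: the function name `frame_timestamps`, the choice to start sampling at 0 seconds, and the rule of skipping timestamps past the end of a short clip are all assumptions.

```python
from typing import List


def frame_timestamps(
    duration_s: float, n_frames: int = 4, interval_s: float = 1.0
) -> List[float]:
    """Return the times (in seconds) at which frames would be sampled.

    Assumption: sampling starts at 0 s. Timestamps that fall at or beyond
    the end of the clip are dropped, so short clips yield fewer frames.
    """
    if duration_s <= 0:
        return []
    candidates = [i * interval_s for i in range(n_frames)]
    return [t for t in candidates if t < duration_s]


if __name__ == "__main__":
    # A 10-second clip yields four timestamps one second apart.
    print(frame_timestamps(10.0))
    # A 2.5-second clip only has room for three.
    print(frame_timestamps(2.5))
```

In a real client, each timestamp would be used to seek the video and capture a still, which is then sent to the vision model along with the prompt.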
“We view it as a proof of concept of what people will be able to do with our SFX API,” Ammaar Reshi, ElevenLabs’ design lead, tells VentureBeat. “AI video creators are often looking for the perfect sound effect and we felt like we could speed up the workflow intelligently by understanding the frames of their videos and then suggesting the best output.” He says the company is excited about the different kinds of dynamic experiences people might build with this API, highlighting immersive video games as one example where sounds may be generated based on a player’s interaction.
The aforementioned API allows developers to build fully custom AI sound effects from a short description. ElevenLabs charges 100 characters per generation with an automatic duration, or 25 characters per second with a set duration.
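As a rough illustration of that pricing, the two rates can be expressed as a small calculation. The function name `generation_cost` is hypothetical, and rounding fractional results up to a whole character is an assumption; the article does not say how ElevenLabs rounds.

```python
import math
from typing import Optional

AUTO_DURATION_COST = 100  # characters per generation, automatic duration
COST_PER_SECOND = 25      # characters per second, set duration


def generation_cost(duration_s: Optional[float] = None) -> int:
    """Estimate the character cost of one sound effect generation.

    Pass no duration for the automatic-duration rate, or a duration in
    seconds for the per-second rate. Rounding up is an assumption.
    """
    if duration_s is None:
        return AUTO_DURATION_COST
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return math.ceil(COST_PER_SECOND * duration_s)


if __name__ == "__main__":
    print(generation_cost())      # automatic duration
    print(generation_cost(4))     # four-second effect
    print(generation_cost(22))    # 22-second effect
```

Under these rates, a set duration of four seconds costs the same as an automatic-duration generation; longer set durations cost more.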
In a brief test, the video-to-sound effects app appeared straightforward. After uploading an audio-free video of a car navigating an all-terrain environment, ElevenLabs’ AI generated four options, all sounding like a car driving on a gravel road. But while it’s amusing to apply sound effects to clips, the real potential may lie in integrating this capability into a larger system.
And as the AI video generation space heats up, ElevenLabs may be looking to stay ahead of everyone, developing new audio features it knows will be in demand by developers, filmmakers and creators.