Stable Audio 2.0: The new frontier in AI-generated music
Unleash your musical genius with Stable Audio 2.0! Transform prompts into rich, full-length tracks or revamp sounds with groundbreaking audio-to-audio features. Elevate your artistry today!
Introducing the latest in AI music generation
AI Media Cafe is excited to present the latest innovation in artificial intelligence music creation: Stable Audio 2.0. This groundbreaking model is capable of producing high-fidelity, full-length music tracks up to three minutes long in 44.1 kHz stereo quality, all from a simple natural language prompt.
The capabilities of this new model extend beyond mere text-to-audio conversion. It now includes audio-to-audio features, allowing users to upload their own audio clips and transform them into a diverse range of sounds using natural language instructions. This update significantly enhances the generation of sound effects and the ability to transfer styles, offering artists and musicians unprecedented levels of flexibility and creative control.
Stable Audio 2.0 builds on its predecessor, Stable Audio 1.0, which launched in September 2023 as the first AI music generation tool of its kind to produce high-quality 44.1 kHz music using latent diffusion technology. Stable Audio has since earned recognition as one of TIME’s Best Inventions of 2023.
As of today, this new model is accessible at no cost on the Stable Audio website and is slated for future integration with the Stable Audio API.
Enhanced creative toolkit for audio production
Stable Audio 2.0 represents our most sophisticated audio model to date, broadening the creative toolkit available to artists and musicians. It supports both text-to-audio and audio-to-audio prompts, enabling the creation of melodies, backing tracks, stems, and sound effects that can take the creative process to new heights.
Comprehensive track generation
What sets Stable Audio 2.0 apart from other advanced models is its ability to generate entire songs, complete with structured compositions that feature intros, developments, outros, and even stereo sound effects, all within a three-minute duration.
Expanding the horizons of sound design
The model enhances the production of sound and audio effects, enabling the creation of everything from the subtle click of a keyboard to the thunderous applause of an audience or the ambient buzz of urban life, providing new avenues to enrich audio projects.
Seamless style adaptation
A novel feature of this model is its style transfer capability, which modifies either newly generated or uploaded audio during the generation process. This function allows users to tailor the output’s theme to match the specific style and tone of their project.
Behind the scenes: technical advancements
The architecture of the Stable Audio 2.0 latent diffusion model is intricately designed to facilitate the generation of full tracks with coherent structures. To accomplish this, every component of the system has been optimized for enhanced performance over extended time scales. A newly developed, highly compressed autoencoder condenses raw audio waveforms into much shorter representations. Meanwhile, a diffusion transformer (DiT), similar to the one used in Stable Diffusion 3, replaces the previous U-Net model due to its superior ability to handle data over long sequences. This combination allows the model to recognize and replicate the large-scale structures that are crucial for producing high-quality musical compositions.
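To give a rough sense of why compression matters here, the short Python sketch below counts the raw samples in a three-minute, 44.1 kHz stereo track and compares that with the sequence length a transformer would see after compression. The duration, sample rate, and channel count come from this article; the downsampling factor of 2048 and the helper names are illustrative assumptions only, since the article does not state the autoencoder's actual compression ratio.

```python
import math

SAMPLE_RATE_HZ = 44_100   # from the article: 44.1 kHz
MAX_DURATION_S = 3 * 60   # from the article: up to three minutes
CHANNELS = 2              # from the article: stereo


def raw_frames(duration_s: int = MAX_DURATION_S,
               sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of time steps (frames) in the raw waveform."""
    return duration_s * sample_rate_hz


def latent_length(frames: int, downsample_factor: int) -> int:
    """Sequence length if a hypothetical autoencoder emits one latent
    vector per `downsample_factor` input frames (rounded up)."""
    return math.ceil(frames / downsample_factor)


if __name__ == "__main__":
    frames = raw_frames()
    # ASSUMPTION: 2048 is an illustrative ratio, not a figure from the article.
    latents = latent_length(frames, downsample_factor=2048)
    print(f"raw frames per channel: {frames:,}")
    print(f"raw samples (stereo):   {frames * CHANNELS:,}")
    print(f"latent sequence length: {latents:,}")
```

Under that assumed ratio, a sequence of several million time steps shrinks to a few thousand, which is the kind of length a diffusion transformer can attend over in a single pass.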
Details on the research and technical specifics will be disclosed in an upcoming research paper.
Commitment to ethical use and copyright protection
As with the initial version, Stable Audio 2.0 has been trained on a dataset from AudioSparx, which includes over 800,000 audio files featuring music, sound effects, and single-instrument stems, along with corresponding text metadata. Artists contributing to AudioSparx had the option to exclude their work from the training data for the Stable Audio model.
To safeguard creators’ copyrights, we collaborate with Audible Magic and apply its content recognition (ACR) technology to audio uploads. This enables real-time content matching to help prevent copyright infringement.
Stable Radio: A showcase of AI-generated music
Stable Radio, a 24/7 live stream that showcases tracks produced exclusively by Stable Audio, is currently broadcasting on the Stable Audio YouTube channel.
Discover the model and begin your free creative journey on the Stable Audio website today. For the latest updates on Stable Audio, be sure to visit our website regularly.