
Stable Audio 2.0: The new frontier in AI-generated music

Unleash your musical genius with Stable Audio 2.0! Transform prompts into rich, full-length tracks or revamp sounds with groundbreaking audio-to-audio features. Elevate your artistry today!

By Emily Johnson

Published April 11, 2024

Introducing the latest in AI music generation

AI Media Cafe is excited to present the latest innovation in artificial intelligence music creation: Stable Audio 2.0. This groundbreaking model can produce high-fidelity, full-length music tracks up to three minutes long in 44.1 kHz stereo quality, all from a simple natural language prompt.

The capabilities of this new model extend beyond text-to-audio conversion. It now includes audio-to-audio features, allowing users to upload their own audio clips and transform them into a diverse range of sounds using natural language instructions. This update significantly enhances sound effect generation and style transfer, offering artists and musicians unprecedented flexibility and creative control.

Stable Audio 2.0 builds on the success of its predecessor, Stable Audio 1.0, which launched in September 2023 as the first AI music generation tool of its kind to produce high-quality 44.1 kHz music. Like Stable Audio 1.0, the new model uses latent diffusion technology. Stable Audio 1.0 was named one of TIME’s Best Inventions of 2023.

As of today, this new model is accessible at no cost on the Stable Audio website and is slated for future integration with the Stable Audio API.

Enhanced creative toolkit for audio production

Stable Audio 2.0 represents our most sophisticated audio model to date, broadening the creative toolkit available to artists and musicians. It supports both text-to-audio and audio-to-audio prompts, enabling the creation of melodies, backing tracks, stems, and sound effects that can take the creative process to new heights.

Comprehensive track generation

What sets Stable Audio 2.0 apart from other advanced models is its ability to generate entire songs of up to three minutes, with structured compositions that include an intro, development, and outro, as well as stereo sound effects.

Expanding the horizons of sound design

The model enhances the production of sound and audio effects, enabling the creation of everything from the subtle click of a keyboard to the thunderous applause of an audience or the ambient buzz of urban life, providing new avenues to enrich audio projects.

Seamless style adaptation

A novel feature of this model is its style transfer capability, which effortlessly alters either newly generated or uploaded audio within the generation process. This function allows users to tailor the output’s theme to match the specific style and tone of their project.

Behind the scenes: technical advancements

The architecture of the Stable Audio 2.0 latent diffusion model is designed to generate full tracks with coherent structures. To accomplish this, every component of the system has been optimized for performance over extended time scales. A newly developed, highly compressed autoencoder condenses raw audio waveforms into much shorter representations. Meanwhile, a diffusion transformer (DiT), similar to the one used in Stable Diffusion 3, replaces the previous U-Net model because it handles data over long sequences better. This combination allows the model to recognize and replicate the large-scale structures that are crucial for producing high-quality musical compositions.
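To show why a highly compressed autoencoder matters for a transformer backbone, here is a back-of-the-envelope sketch in Python. It is not Stability AI's code. The only figures taken from the article are the 44.1 kHz sample rate, stereo output, and three-minute maximum length. The downsampling factor of 2048 is a hypothetical value chosen for illustration, since the architecture details have not yet been published. The sketch also assumes full self-attention, whose cost grows with the square of the sequence length, and assumes the stereo channels are folded into the feature dimension rather than lengthening the sequence.

```python
import math

# Figures stated in the article: up to 3 minutes of 44.1 kHz stereo audio.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
MAX_SECONDS = 180

# HYPOTHETICAL: the article does not disclose the autoencoder's temporal
# compression factor; 2048 is an illustrative value only.
DOWNSAMPLE_FACTOR = 2048


def raw_samples_per_channel(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of waveform samples in one channel of a clip."""
    return int(seconds * sample_rate)


def latent_frames(seconds: float, factor: int = DOWNSAMPLE_FACTOR) -> int:
    """Latent sequence length after temporal downsampling, rounded up.

    Assumes (hypothetically) that stereo channels are stacked into each
    latent frame's features, so they do not add to the sequence length.
    """
    return math.ceil(raw_samples_per_channel(seconds) / factor)


def attention_pairs(sequence_length: int) -> int:
    """Full self-attention compares every position with every other: n * n."""
    return sequence_length * sequence_length


if __name__ == "__main__":
    raw = raw_samples_per_channel(MAX_SECONDS)
    latent = latent_frames(MAX_SECONDS)
    print(f"raw samples per channel: {raw:,}")
    print(f"total raw values (stereo): {raw * CHANNELS:,}")
    print(f"latent frames (factor {DOWNSAMPLE_FACTOR}): {latent:,}")
    print(f"attention pairs on latents: {attention_pairs(latent):,}")
    print(f"attention pairs on raw audio: {attention_pairs(raw):,}")
```

Under these assumptions, a three-minute track becomes a latent sequence of a few thousand frames instead of nearly eight million samples per channel, which cuts the number of attention pairs by roughly the square of the downsampling factor (about four million times). That kind of reduction is what makes it practical for a transformer to attend across an entire song at once.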

Details on the research and technical specifics will be disclosed in an upcoming research paper.

Commitment to ethical use and copyright protection

As with the initial version, Stable Audio 2.0 has been trained on a dataset from AudioSparx, which includes over 800,000 audio files featuring music, sound effects, and single-instrument stems, along with corresponding text metadata. Artists contributing to AudioSparx had the option to exclude their work from the training data for the Stable Audio model.

To safeguard creator copyrights, we partner with Audible Magic and use its automatic content recognition (ACR) technology to check audio uploads. This enables real-time content matching to help prevent copyright infringement.

Stable Radio: A showcase of AI-generated music

Stable Radio, a 24/7 live stream that showcases tracks produced exclusively by Stable Audio, is currently broadcasting on the Stable Audio YouTube channel.

Discover the model and begin your free creative journey on the Stable Audio website today. For the latest updates on Stable Audio, be sure to visit our website regularly.