Songs generated by artificial intelligence are getting longer, but not necessarily better

Users can now upload their own audio samples to Stable Audio 2.0, an audio generation model from Stability AI, and use prompts to transform them into AI-generated songs. But the songs aren’t going to win any Grammys just yet.

When the first version of Stable Audio launched in September 2023, it offered clips of up to only 90 seconds, and only to some paying users, so people could make little more than short clips to experiment with. Stable Audio 2.0 offers a full three-minute clip, about the length of most radio-friendly songs. All uploaded audio must be free of copyright.

Unlike OpenAI’s audio generation model, Voice Engine, which is available only to a limited number of users, Stability AI has made Stable Audio free and publicly available through its website and, soon, its API.

According to Stability AI, one major difference between Stable Audio 2.0 and its predecessor is the ability to produce songs that actually sound like songs, complete with an intro, progression, and outro.

The company let me try out Stable Audio, and I must admit that I’m still a long way from channeling my inner Beyoncé. With the prompt “folk pop song with American vibes” (I meant Americana, by the way), Stable Audio produced a song that, in places, sounds like it belongs in my Mountain Vibes Listening Wednesday Morning Spotify playlist. However, it also added what appear to be vocals. Another Verge reporter says they sound like whale sounds. I’m more worried that I may have accidentally summoned a creature into my house.

In theory, I could tweak the audio to better suit my taste, because Stable Audio 2.0 has new features that let users customize their projects by adjusting the prompt strength (how closely the prompt should be followed) and how much of the uploaded audio will be modified. Users can also add sound effects, such as keyboard taps or a roaring crowd.

Strange Gregorian whale noises aside, it’s no surprise that AI-generated songs still sound weird and soulless. My colleague Wes Davis reflected on this after listening to a song created by Suno. Other companies, such as Meta and Google, have also experimented with AI audio generation, but they have not yet released their models to the public while they gather developer feedback and work on the soulless-sound problem.

According to a press release from Stability AI, Stable Audio is trained on data from AudioSparx, which claims a library of more than 800,000 audio samples. Stability AI says artists signed to AudioSparx could opt out of having their work used to train the model. Ed Newton-Rex, Stability AI’s former vice president for audio, quit the company soon after Stable Audio launched, citing training on copyrighted audio as one of his reasons. For this version, Stability AI says it partnered with Audible Magic to use its content recognition technology to track unauthorized material and keep it off the platform.

Stable Audio 2.0 does a better job than the previous version of making music that sounds like songs, but it’s not quite there yet. If the model insists on adding vocals of some kind, maybe the next version will have more discernible language.