An impressive new AI system from Google can generate music in any genre given a text description. But the company, fearing the risks, has no immediate plans to release it.
Called MusicLM, Google’s certainly isn’t the first generative AI system for song. There have been other attempts, including Riffusion, an AI that composes music by visualizing it, as well as Dance Diffusion, Google’s own AudioLM and OpenAI’s Jukebox. But owing to technical limitations and limited training data, none have been able to produce songs that are particularly complex or high-fidelity.
MusicLM is perhaps the first that can.
Detailed in an academic paper this week, MusicLM was trained on a data set of unlabeled music to learn to generate coherent songs from descriptions of, as the creators put it, “significant complexity” (e.g. “enchanting jazz song with a memorable saxophone solo and a solo singer” or “Berlin ’90s techno with a low bass and strong kick”). Its songs, remarkably, sound something like what a human artist might compose, albeit not necessarily as inventive or musically cohesive.
Indeed, it’s hard to overstate just how good the samples sound given there isn’t a musician or an instrumentalist in the loop. Even when fed somewhat long and meandering descriptions, MusicLM manages to capture nuances like instrumental riffs, melodies and moods.
The caption for the sample below, for instance, included the bit “induces the experience of being lost in space,” and it certainly delivers on that front (at least to my ears):
Here’s another sample, generated from a description starting with the sentence “The main soundtrack of an arcade game.” Plausible, no?
MusicLM’s capabilities extend beyond generating short clips of songs. The Google researchers show that the system can build on existing melodies, whether hummed, sung, whistled or played on an instrument. Moreover, MusicLM can take several descriptions written in sequence (e.g. “time to meditate,” “time to wake up,” “time to run,” “time to give 100%”) and create a sort of melodic “story” or narrative ranging up to several minutes in length, perfectly fit for a movie soundtrack.
The sample below came from the sequence “electronic song played in a videogame,” “meditation song played next to a river,” “fire,” “fireworks.”
That’s not to suggest MusicLM’s flawless — far from it, truthfully. Some of the samples have a distorted quality to them, an unavoidable side effect of the training process. And while MusicLM can technically generate vocals, including choral harmonies, many leave a lot to be desired. Most of the “lyrics” range from barely coherent to pure gibberish, sung by synthesized voices that sound like amalgamations of several artists.
Still, the Google researchers note the many ethical challenges posed by a system like MusicLM, including an unfortunate tendency to incorporate copyrighted material from training data into the generated songs. During an experiment, they found that about 1% of the music the system generated was directly replicated from the songs on which it trained, a threshold apparently high enough to discourage them from releasing MusicLM in its current state.
“We acknowledge the risk of potential misappropriation of creative content associated to the use case,” the coauthors of the paper wrote. “We strongly emphasize the need for more future work in tackling these risks associated to music generation.”
Assuming MusicLM or a system like it is one day made available, it seems inevitable that major legal issues will come to the fore. They already have, albeit around simpler AI systems. In 2020, Jay-Z’s record label filed copyright strikes against a YouTube channel, Vocal Synthesis, for using AI to create Jay-Z covers of songs like Billy Joel’s “We Didn’t Start the Fire.” After initially removing the videos, YouTube reinstated them, finding the takedown requests were “incomplete.” But deepfaked music still stands on murky legal ground.
A whitepaper authored by Eric Sunray, now a legal intern at the Music Publishers Association, argues that AI music generators like MusicLM violate music copyright by creating “tapestries of coherent audio from the works they ingest in training, thereby infringing the United States Copyright Act’s reproduction right.” Following the release of Jukebox, critics have also questioned whether training AI models on copyrighted musical material constitutes fair use. Similar concerns have been raised around the training data used in image-, code- and text-generating AI systems, which is often scraped from the web without creators’ knowledge.
From a user perspective, Waxy’s Andy Baio speculates that music generated by an AI system would be considered a derivative work, in which case only the original elements would be protected by copyright. Of course, it’s unclear what might be considered “original” in such music; using this music commercially means entering uncharted waters. It’s a simpler matter if generated music is used for purposes protected under fair use, like parody and commentary, but Baio expects that courts would have to make case-by-case judgments.