AI model adds music to its repertoire

by Andrew Cochran

An AI technology that generates believable text produces tunes, too. Both come from OpenAI, a California company dedicated to developing artificial general intelligence.

Their music model is called MuseNet.

OpenAI says MuseNet can generate an original composition up to four minutes long, scored for up to 10 instruments, in many styles. One example is a jazz trio.

The system is given a short passage of music that acts as a ‘prompt’, a new twist on the old idea of ‘hum a few bars.’ But instead of recalling an old favourite, the algorithm takes its cue from the prompt and generates a completely new piece. For example, a country prompt will trigger a tune with the characteristics of country music.

The machine determines the features of country music by assessing distinguishing patterns from thousands of country tunes used to ‘train’ the system. MuseNet generates in many different styles.

Transforming music and text

The technical term for the AI engine is a ‘large-scale transformer model’. The computer predicts what comes next based on the sequence that came before. Given a set of notes, it predicts the next one, constantly updating and predicting (very quickly) as the piece evolves.
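The predict-then-extend loop described above can be sketched in a few lines of Python. This is only a toy illustration, not OpenAI's method: it swaps the transformer for a simple frequency count that looks at just the previous note, whereas a transformer weighs the whole preceding sequence. The function names and the three training melodies are hypothetical.

```python
from collections import Counter, defaultdict


def train(melodies):
    """Count which note follows each note in the training melodies."""
    follows = defaultdict(Counter)
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            follows[current][nxt] += 1
    return follows


def generate(follows, prompt, length):
    """Extend the prompt one note at a time, feeding each prediction back in."""
    piece = list(prompt)
    while len(piece) < length:
        candidates = follows.get(piece[-1])
        if not candidates:
            break  # no known continuation for this note
        # Greedy choice: the most frequent follower (ties broken alphabetically).
        next_note = min(candidates, key=lambda n: (-candidates[n], n))
        piece.append(next_note)
    return piece


# Hypothetical training data: three short melodies written as note names.
training_melodies = [
    ["C", "E", "G", "E", "C"],
    ["C", "E", "G", "C"],
    ["E", "G", "E", "C"],
]
model = train(training_melodies)
print(generate(model, ["C"], 6))
```

Each new note is appended to the piece and becomes part of the input for the next prediction, which is the same loop the article describes, only with a far cruder predictor.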

The transformer model works the same whether it’s processing notes or words. In February 2019, OpenAI showed its text-generating model, GPT-2. With it came concerns about the possibilities for malicious uses of AI-generated text. Artificial music is less contentious, but the underlying concept is the same: generate ‘new’ based on ‘known.’

The model knows nothing about language or music, only their patterns and how they vary according to different styles.

For example, news reporting and academic writing are different styles of writing, and classical and jazz are different styles of music. The system becomes familiar with the characteristic structures and nuances of each by assessing very large collections as examples.

In AI-speak, this is known as unsupervised learning because the algorithm arrives at an ‘understanding’ on its own, without human-labelled examples.

The result is chameleon-like. The newly generated material takes on the style it’s given: rapid, short notes for a Bach style, sweeping flourishes in a movie-theme style, etc.

Mix-and-match

Styles can be blended, too. The model can be prompted with Bach and then asked to interpolate Lady Gaga. It isn’t mixing music but patterns. Here is an example of a Bon Jovi style prompted by six notes from Chopin.

Current limitations

The company says there can be unsuccessful combinations, such as Chopin with drums. Transformer shortcomings are more noticeable in text, such as repeating words or passages, switching topics unnaturally, or mixing up worlds, for example, describing ‘a fire underwater.’

As with many AI models, these are early days. The amount generated is limited by algorithmic memory: the extent to which the algorithm can ‘remember’ earlier sequences. Beyond that limit, consistency is lost.
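The idea of a limited memory can be sketched as a sliding window over the piece so far. This is a minimal sketch under assumptions: the function name and the memory size of three notes are hypothetical, chosen only to make the effect visible, and the article does not state MuseNet's actual memory size.

```python
def visible_context(sequence, memory):
    """Return the most recent `memory` items, the only part the model can still 'see'."""
    start = max(0, len(sequence) - memory)
    return sequence[start:]


# Hypothetical piece of eight notes and a memory of three notes.
piece = ["C", "E", "G", "A", "F", "D", "B", "C"]
print(visible_context(piece, 3))
```

Anything earlier than the window, such as the opening notes here, can no longer influence what is generated next, which is why long pieces drift.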

Still, GPT-2 text was sufficiently coherent over multiple paragraphs to prompt concerns about spreading untruths. No similar flags have gone up about its sibling MuseNet, although critics are split on its contribution to music.

Our take

Computer-generated music is not new — think of all those drum machines or melody tracks for solo performers. This breakthrough is about a machine composing by itself in many musical styles.
The bigger advance is demonstrating there can be more than one kind of generative output from the same type of AI model: in other words, the same transformer approach generating either music or text.
A future phase of AI, when the same machine is capable of many or most things, is known as artificial general intelligence, or AGI. This outcome is not AGI, but, significantly, it comes from a company that directs its research toward achieving AGI and is a step along that path.
Music is safer than text for experimentation. Sour notes are likely the worst outcome when a new trial falls short. With artificial text, the stakes are much higher until there are better methods of detecting fakery.
There will be several knock-ons: When an algorithm generates text or music that is original, does it constitute creativity? How is computer-generated text or music treated in copyright law? If it adversely affects or infringes on the works of others, who is responsible?

SEE RELATED STORIES

OpenAI’s MuseNet generates AI music at the push of a button
THE VERGE | April 26, 2019 | by Jon Porter | ‘Lady Gaga’s Poker Face in the style of Mozart? Sure, why not’

This AI-generated musak shows us the limit of artificial creativity
MIT TECHNOLOGY REVIEW | April 26, 2019 | by Will Knight | ‘A powerful AI algorithm can dream up music that echoes Bach or the Beatles, but it isn’t real creativity.’

MuseNet generates original songs in seconds, from Bollywood to Bach (or both)
TECHCRUNCH | April 25, 2019 | by Devin Coldewey | ‘What makes it impressive is that a single model does this reliably across so many types of music.’

OpenAI’s MuseNet AI generates novel 4-minute songs with 10 instruments across a range of genres and styles
VENTUREBEAT | April 25, 2019 | by Kyle Wiggers | ‘…a small but noteworthy step forward in autonomous music generation research…’
