Google’s MusicLM AI can generate music from just text descriptions


Jan 29, 2023, 04:02 pm

MusicLM has been trained on 2.8 lakh hours of audio clips (Photo credit: Google)

Google has created an impressive AI model, named MusicLM, that can generate audio samples in different genres simply by following text descriptions.

MusicLM can take multiple text descriptions and follow them in sequence to produce a melodic story or narrative-type music. It can even build on existing melodies that you hum, whistle, sing, or play.

Let’s talk about its potential in detail.

Why does this story matter?

  • The era of AI and automation in content creation has already begun with OpenAI’s ChatGPT, which has taken the world by storm with its distinctive approach to answering human queries.
  • Now, Google’s intriguing AI model, MusicLM, is here to answer text-based queries, but in the form of music. It can even take images along with their captions as input to produce audio clips.

MusicLM: Everything you need to know

Google’s MusicLM is a generative AI system for songs.

According to the tech giant, the model has been trained on a dataset of 2,80,000 hours of music so that it can learn how to generate coherent songs from descriptions of “significant complexity.”

It generates music at 24kHz that stays consistent throughout a clip, which can range from a few seconds to several minutes in length.

The AI model can create music based on different genres

Google’s MusicLM can produce both short and long-form music in almost any genre.

It can even pick up a melody that is hummed, whistled, sung, or played and use it as the basis for a clip.

The AI model supports all major musical genres including jazz, techno, British indie rock, hip-hop, reggae, folk, and more.

However, the company has not mentioned any India-centric music genres like Hindustani Classical or Carnatic.

It can produce audio from pictures and captions

MusicLM’s capabilities go beyond just creating audio clips/samples.

Even when it is given one or more fairly lengthy descriptions, it can pick up their subtleties and follow them in sequence to produce a soundtrack.

Additionally, MusicLM can be guided by a combination of pictures and captions. It can even generate audio featuring specific instruments.

The description can also specify a musician’s experience level, or ask for music inspired by places and events.

Is it available for public use?

Google’s MusicLM can deliver both beginner and professional-level music, provided the user specifies the requirements (style, tone, genre, etc.) in the text description.

At the moment, the AI model isn’t ready for public use. However, to support future research, Google will soon release MusicCaps, a dataset of over 5,500 music-text pairs with descriptions provided by experts.

What are the current limitations of MusicLM?

In all honesty, MusicLM isn’t perfect yet. Some of the generated samples have a distorted quality.

Notably, the AI model can mimic vocals, but the words come out as nonsensical language, and the synthesized voices sound like a blend of several artists.

Google’s researchers found that about 1% of the generated music directly replicated songs the system was trained on, which could lead to copyright violations.

