The Atlantic Publishes Searchable Database of Music Used to Train AI Models

Atlantic reporter Alex Reisner has made four music datasets used to train AI models fully searchable by the public. The largest contains 12 million tracks, and the listings show that artists from Lady Gaga to Radiohead appear in AI training pipelines.

Sara Montes de Oca

JUN 20, 2026 · 11:09 PM ET · 2 MIN READ

A journalist at The Atlantic has identified four datasets containing millions of music tracks that have been used to train artificial intelligence models, and has made those datasets fully searchable by the public — exposing a wide range of artists whose work appears in AI training pipelines without their explicit consent for commercial use.

Atlantic reporter Alex Reisner uncovered the four datasets and built a searchable tool on The Atlantic's AI Watchdog site, where anyone can look up whether songs, books, or other media appear in the training data.

Two of the datasets are substantial in scale, containing 12 million and 9 million tracks respectively. The remaining two are considerably smaller but still represent meaningful volumes of training material, each containing more than 100,000 songs.

The datasets have been downloaded thousands of times, according to Reisner. While it is impossible to determine precisely who has used them, both Google and Stability AI have acknowledged using them in published research papers.

The range of artists whose work appears in the data is broad. Pop acts including Lady Gaga and Fred Again.. sit alongside rock and hip-hop acts such as Radiohead, Bruce Springsteen, and Wu-Tang Clan. Electronic artist Aphex Twin and experimental composer Hainbach also appear in the datasets.

Some of the source material, including tracks from the Free Music Archive dataset, is freely available for personal streaming but requires a commercial license for other uses — a distinction that becomes legally significant when the music is used to train AI models.

The technical process of obtaining the audio is not straightforward. As Reisner explains, three of the four datasets are distributed as lists of links pointing to songs hosted on YouTube or Spotify, rather than as audio files themselves. AI developers then use automated tools to download the actual audio, and some of those tools are capable of bypassing platform logins, advertisements, and other mechanisms designed to generate revenue for creators. Using such tools violates the terms of service of those platforms.
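
To make the "list of links" format concrete, the following is a minimal sketch in Python of how a link-only dataset could be inspected. It assumes a hypothetical layout of one URL per line, which the article does not confirm. The sample URLs, the host table, and the function names are all illustrative placeholders. The sketch only classifies links by hosting platform and downloads nothing.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample: the real datasets' file layout is not described
# in the article, so one URL per line is an assumption.
SAMPLE_LINKS = """\
https://www.youtube.com/watch?v=EXAMPLE_ID_1
https://open.spotify.com/track/EXAMPLE_ID_2
https://youtu.be/EXAMPLE_ID_3
https://example.org/audio/track.mp3
"""

# Illustrative host-to-platform table (an assumption, not exhaustive).
PLATFORM_BY_HOST = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "open.spotify.com": "spotify",
}


def platform_of(url: str) -> str:
    """Return the platform label for a URL, or "other" if unrecognized."""
    host = (urlparse(url.strip()).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return PLATFORM_BY_HOST.get(host, "other")


def count_platforms(text: str) -> Counter:
    """Count links per platform, skipping blank lines."""
    return Counter(
        platform_of(line) for line in text.splitlines() if line.strip()
    )


counts = count_platforms(SAMPLE_LINKS)
print(dict(counts))
```

A file like this carries no audio, only pointers to it, which is why the downloading step described above is where platform terms of service come into play.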

The disclosure arrives amid a broader industry debate over the legality and ethics of training AI on copyrighted content without licensing agreements or compensation to rights holders. Music publishers and record labels have pursued litigation against several AI music generation companies over similar concerns.

The searchable database lowers the barrier for artists and rights holders to determine whether their work is included, potentially informing future legal challenges or licensing demands directed at AI developers who have relied on these datasets.

━ ABOUT THE REPORTER

Sara Montes de Oca

Sara Montes de Oca is the Editor in Chief of TechEchelon. Previously a correspondent and producer in Washington, D.C., covering business, finance, and politics.
