AI companies face mounting lawsuits, and they could avoid future legal trouble if they were able to remove copyright-infringing data from their trained models.
“Follow the data” is good advice for understanding the provenance of large language models, particularly when it comes to vexing issues such as copyright infringement, potential privacy breaches, and bias. But it’s easier said than done, as training data often comes from massive collections that combine multiple types of information from many sources.
Researchers in Amazon’s AWS division are working on ways to purge pieces of a training set from a model without discarding everything and retraining from scratch.
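The article does not detail the AWS approach. One published technique in this area is SISA training (Bourtoule et al., "Machine Unlearning"), which splits the training set into shards, trains a separate model per shard, and ensembles them; deleting a record then only requires retraining the one shard model that saw it. A minimal toy sketch of that idea, assuming a trivial averaging "model" for illustration (the function names and the averaging model are illustrative, not from the article or from Amazon's work):

```python
# Toy SISA-style sharded training: unlearning a record only retrains
# the single shard model that contained it, not the whole ensemble.
import statistics

def train_shard(shard):
    # Stand-in "model": the mean of the shard's values.
    return statistics.mean(shard)

def train(data, n_shards=4):
    # Partition the data into disjoint shards and train one model each.
    shards = [data[i::n_shards] for i in range(n_shards)]
    models = [train_shard(s) for s in shards]
    return shards, models

def predict(models):
    # Ensemble prediction: average the shard models' outputs.
    return statistics.mean(models)

def unlearn(shards, models, value):
    # Remove one record, then retrain only its shard's model,
    # so the record's influence is exactly erased at low cost.
    for i, shard in enumerate(shards):
        if value in shard:
            shard.remove(value)
            models[i] = train_shard(shard)
            break
    return shards, models
```

The key property is that the deleted record influenced exactly one shard model, so retraining that one model provably removes its contribution; the cost scales with the shard size rather than the full training set.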
‘Disgorgement’: Amazon researchers suggest ways to get rid of bad AI data | SEMAFOR | May 1, 2024 | by Katyanna Quach