The AI race has reached a new fever pitch, with companies like Google, Meta, Anthropic and OpenAI locked in an all-out sprint to develop advanced AI models. But as these companies burn through the internet’s high-quality training data at an astonishing rate, some AI companies are resorting to controversial tactics — including the alleged mass scraping of news articles.
Now, the journalism world is beginning to fight back against what some are calling the “largest theft in the United States.” Here’s where things stand in the escalating battle between Big Tech’s data-hungry AIs and a news industry determined to protect its content.
Throwing Up Walls Against Bots
Concerned news organizations — including Graham Media Group, The New York Times, The Guardian, Hearst and Hubbard Broadcasting — have already blocked AI chatbots like OpenAI's ChatGPT and Google's Gemini from scraping their sites. That list keeps growing by the day.
Why the sudden alarm? Many publishers, analysts and press freedom advocates see the rise of AI scraping as an existential threat — not only to their business models, but to the fundamental integrity of journalism itself.
They worry that training chatbots on news articles, without oversight, could turbocharge the already challenging problems of misinformation and synthetic content online.
“It is clearly possible that some groups or organizations use and fine-tune models to create tailored disinformation that suits their projects or their purpose,” warned Vincent Berthier of Reporters Without Borders in an interview with VOA News.
Newsrooms Look For Ways To Defend Their Content Against AI Scraping