From ad tech tax to AI data brokers: the new middlemen keep 100%, publishers say

Publishers are increasingly concerned that AI data scrapers are staging a hostile takeover of their intellectual property, extracting content without compensation to fuel competing generative AI products.

Key Points

  • Media analyst Matthew Scott Goldstein estimates the "scraper economy" is now a $1 billion industry.
  • Reports identify nearly 40 vendors that scrape content for AI training, including Perplexity, Firecrawl, Exa, Tavily, and Bright Data.
  • Industry executives compare the current landscape to the Napster era, noting that scrapers often ignore "no-crawl" directives and evade defensive website tools.
  • Some scrapers are rebranding as "agentic infrastructure" to justify large-scale data consumption while avoiding licensing agreements with original content creators.
  • While some publishers, such as USA Today, have secured AI licensing deals, many others struggle to stop their content from being scraped through third-party syndication portals.

Why It Matters

The rise of unauthorized data scraping threatens the long-term economic viability of digital publishing by devaluing original content and creating competing AI-driven platforms. Without a standardized marketplace to govern and price this data consumption, publishers face a significant loss of control over their intellectual property and revenue streams.
Digiday, published by Jessica Davies