The New York Times Updates its Terms Against Content Scraping

The revised terms also bar website crawlers, the tools that index web pages for search results, from using the publication's content to train LLMs or other AI tools. A breach of these terms could lead to penalties, although their exact nature remains unspecified.

Sara Montes de Oca

AUG 11, 2023 · 04:09 PM ET · 2 MIN READ

Editorial

Katie Gardner, a partner at Gunderson Dettmer, observed, "While restrictions on data scraping are commonplace in terms of service, it's rare to see a direct mention of AI training."

AI models such as ChatGPT depend on vast amounts of content, including journalistic articles, to produce results. This raises concerns for publishers with subscription models, because AI tools might replicate and redistribute their content without attribution, potentially undermining revenue and trust.

Publishers struggle to tell whether a given crawler is visiting to improve search engine visibility or to collect data for AI training. As reported by Digiday, some publishers are exploring ways to block these crawlers. Meanwhile, crawlers like Common Crawl have entered into agreements with giants like OpenAI, Meta, and Google for AI training purposes.

This week OpenAI introduced GPTBot, a web crawler that gathers data to improve its AI models, along with a way for publishers to manage the crawler's access to their sites. However, major players such as Bing and Google have yet to introduce comparable controls.
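
For readers who manage a site, access controls of this kind are typically expressed in a robots.txt file, and OpenAI's documentation identifies its crawler by the user-agent token GPTBot. The sketch below is illustrative only: the robots.txt contents, the example.com URL, and the `is_allowed` helper are assumptions, not taken from any publisher's actual configuration. It uses Python's standard `urllib.robotparser` to check which crawlers such a file would admit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative publisher site:
# block OpenAI's GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Compliance with robots.txt is voluntary on the crawler's side, so a rule like this signals a publisher's preference rather than enforcing it.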

The Washington Post scrutinized Google’s C4 dataset and found content from prominent sites, including The New York Times, used in training LLMs.

Chris Pedigo from Digital Content Next notes that other publishers are now revisiting their terms of service in light of these developments.

Towards Licensing Agreements

The response from AI companies to these updated terms remains to be seen. Given the potential legal pitfalls, AI firms and top-tier publishers are in talks to form licensing agreements similar to the one between OpenAI and The Associated Press.

The aim is not just monetary compensation. Publishers are also pushing for citations of their content and for procedures within AI firms that ensure content accuracy.

Pedigo emphasizes the importance of quality, stating, "For any licensing agreements, publishers want their information to maintain a certain brand standard."

━ ABOUT THE REPORTER

Sara Montes de Oca

Sara Montes de Oca is the Editor in Chief of TechEchelon. She was previously a correspondent and producer in Washington, D.C., covering business, finance, and politics.
