Google Unveils Google-Extended: A New Tool Empowering Website Publishers to Control Data Use for AI Training

Google introduces Google-Extended, allowing publishers to opt out of data utilization for AI model training while staying accessible on Google Search.

In a move to give website publishers more control over their data, Google has announced the launch of “Google-Extended,” a new tool that lets publishers manage how their data is used to train the company’s AI models. The feature allows websites to continue being crawled and indexed by Googlebot while preventing their content from being used to develop those AI models.

Enhanced Control Over AI Training Data

Google-Extended gives website publishers the ability to decide whether their sites contribute to improving Bard and the Vertex AI generative APIs, providing direct control over how their content is used for AI training without affecting its visibility on the web. This development comes in response to growing concerns about the use of publicly available data scraped from the web to train AI systems, particularly after Google confirmed it had used such data to train its AI chatbot, Bard.

The tool is implemented through the robots.txt file, a standard text file that tells web crawlers which parts of a website they may access. By adding a Google-Extended directive to this file, publishers can express their data-use preferences without changing how their site is crawled for Search.
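In practice, opting out works like any other robots.txt directive: the publisher adds a rule targeting the Google-Extended user agent token. A minimal sketch of a robots.txt that blocks Google-Extended site-wide while leaving Googlebot (and thus Search indexing) untouched might look like this:

```
# Block content from being used for Google AI model training
User-agent: Google-Extended
Disallow: /

# Googlebot remains unaffected, so the site stays in Search results
User-agent: Googlebot
Allow: /
```

Because Google-Extended is a control token rather than a separate crawler, blocking it does not change how pages are fetched or ranked; it only signals that the crawled content should not be used for AI training.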

Adapting to Evolving AI Landscape

Google acknowledges the expanding landscape of AI applications and pledges to explore additional machine-readable approaches to empower web publishers with even more choices and control. The company assures that it will provide further updates in the near future.

Navigating the Complex Web of Data Usage

The introduction of Google-Extended reflects a broader trend among website publishers who are increasingly concerned about the use of their data for AI training. Many prominent sites, including The New York Times, CNN, Reuters, and Medium, have already taken measures to block web crawlers used by organizations like OpenAI for data scraping and AI model training.

However, distinguishing Google from other web crawlers presents a unique challenge. Complete blocking of Google’s crawlers is not a viable option for many websites, as it would result in them being excluded from Google Search results. To address this issue, some sites, like The New York Times, have resorted to legal measures by updating their terms of service to prohibit companies from using their content for AI training.
