Proposal: Meta Tag for AI Consent Management #9334

brennancaldwell · 2023-05-25T15:27:38Z

Introduction

With the rapid growth of artificial intelligence, and especially of machine learning models that train on web data, the issue of data usage consent has become more relevant than ever. Currently, there is no standard way for website owners to state whether they consent to AI models crawling their content or using it for training. This proposal seeks to address this issue by introducing a new HTML meta tag called ai-consent.

The Proposed Solution

I propose the introduction of an HTML meta tag named ai-consent. This tag would have a content attribute with the following possible values:

all: The website owner consents to the use of their content for both AI model training and live search operations.
search-only: The website owner consents to the use of their content for live search operations only, provided the source website is cited by the AI agent. They do not consent to the use of their content for AI model training.
none: The website owner does not consent to the use of their content by AI for any purpose.

The tag would appear in the <head> of an HTML document. For example:

<meta name="ai-consent" content="all">
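As a rough illustration of how a consuming crawler might read this tag, here is a minimal Python sketch using only the standard library. The function names read_ai_consent and allowed_uses are hypothetical, and two behaviors are assumptions of the sketch rather than part of the proposal: an unrecognized content value is ignored, and a missing tag is treated as granting nothing.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# The three values defined in the proposal above.
VALID_VALUES = {"all", "search-only", "none"}


class _AiConsentParser(HTMLParser):
    """Records the first <meta name="ai-consent"> tag with a recognized value."""

    def __init__(self) -> None:
        super().__init__()
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Note: this sketch does not check that the tag sits inside <head>.
        if tag != "meta" or self.value is not None:
            return
        attr = {name: (val or "") for name, val in attrs}
        if attr.get("name", "").strip().lower() == "ai-consent":
            content = attr.get("content", "").strip().lower()
            # Assumption: unrecognized values are skipped, not treated as errors.
            if content in VALID_VALUES:
                self.value = content


def read_ai_consent(html: str) -> Optional[str]:
    """Return "all", "search-only", "none", or None if no valid tag is present."""
    parser = _AiConsentParser()
    parser.feed(html)
    return parser.value


def allowed_uses(value: Optional[str]) -> Dict[str, bool]:
    """Map a tag value to permitted uses.

    Assumption: a missing tag (None) grants nothing; the proposal itself
    does not define what absence means.
    """
    return {
        "training": value == "all",
        "search": value in ("all", "search-only"),
    }
```

For example, allowed_uses(read_ai_consent(page_html)) would report search as permitted and training as not permitted for a page that declares search-only.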

Use Cases and Examples

Below are some examples of how the ai-consent tag could be used:

A news website owner wants their articles to be included in both AI training and search results. They would use:

<meta name="ai-consent" content="all">

A personal blog author does not want their content included in AI model training but is fine with it being used for live search results, provided the blog is cited. They would use:

<meta name="ai-consent" content="search-only">

A privacy-focused website's owner does not want their content used by AI at all. They would use:

<meta name="ai-consent" content="none">

Considerations

This proposal introduces a method for website owners to manage consent regarding AI data usage and is similar in intent to the noindex value of the robots meta tag. However, it does not enforce the consent. It would be the responsibility of AI creators and operators to respect and enforce these tags, which might not happen short of robust regulation. Additionally, the proposed tag would need to be included in popular web crawlers' whitelists of meta tags.

Conclusion

The proposed ai-consent meta tag provides a standard method for website owners to express their consent for AI data usage. It would promote transparency and respect for website owners' data preferences, contributing to a more ethical web environment for AI.

rthrejheytjyrtj545 · 2023-05-25T16:59:12Z

Why should the author have to explicitly choose none to indicate that they do not agree? What does the absence of this metadata mean?

Doesn't this proposal duplicate the existing license link type? Interested parties can already create a mechanism like CC REL and provide the appropriate legal background. This is an organizational issue, not a technological one.

brennancaldwell · 2023-05-25T17:35:46Z

These are great points! Thank you for pointing these out. I had considered just proposing all and search-only -- I believe the default assumption should be no consent.

I also agree that this is more a question of organization than technology. The details of implementation aren't important to me so much as agreeing on a standard for establishing consent specifically in the case of model training and search. Perhaps this can indeed be handled using a license link tag.

rthrejheytjyrtj545 · 2023-05-25T18:22:18Z

By the way, if you keep it as something similar to DNT, you can move the proposal to the Microformats Wiki (where it will be officially recognized as a specification), or take it to WICG. Also, bikeshedding: something like notraining and nosnipping would sound more “vanilla”.

brennancaldwell · 2023-05-25T19:08:08Z

Thank you!

rthrejheytjyrtj545 · 2023-05-25T19:09:27Z

No problem. What I suggested to you in the comment above is a move away from metadata in favor of a link type.

You can, of course, write a specification and submit it to MetaExtensions, but this is a chore, and the rule “However, a new metadata name should not be created in any of the following cases: If the name is for something expected to have processing requirements in user agents; in that case it ought to be standardized” might apply, given that crawlers are also user agents in some sense. So <link href = . rel = training/> might be a good option...

ramijwar · 2023-05-26T09:04:42Z

wow that's awesome

saschanaz · 2023-06-23T22:44:28Z

FYI, DeviantArt and SketchFab came up with <meta name="robots" content="noai">.

myakura · 2023-06-27T05:23:04Z

I believe that bots can also crawl non-HTML resources, such as source code or images. Wouldn't it be better to define this in (or on top of) the robots.txt protocol?
https://datatracker.ietf.org/doc/html/rfc9309

rthrejheytjyrtj545 · 2023-06-27T10:10:09Z

@myakura, no, because there will be countless crawlers in the future, and authors cannot be expected to keep track of them all. In addition, no one wants to limit crawling in this case, only the use of the collected content.

jfhr · 2023-08-15T22:34:35Z

One consideration here is that crawlers would need to download each individual page to find out if it has an ai-consent meta tag. Downloading lots of pages just to find out you can't use them is a waste of money - as long as this is a voluntary standard, companies would be less incentivized to respect it at all.

The robots.txt standard avoids exactly that problem by having a single file for an entire origin. Perhaps a similar file could be introduced for AI consent management, e.g.:

All: /documentation
Search-Only: /weblog
None: /personal

This could be hosted under a well-known URI such as /.well-known/ai-consent.txt
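To make the idea concrete, here is a minimal Python sketch of how a crawler might interpret such a file. The "Directive: /path" syntax is taken from the example above. Everything else is an assumption of the sketch: the function names parse_ai_consent_txt and consent_for_path are hypothetical, malformed lines are skipped, paths are matched by prefix as in robots.txt, and the longest matching prefix wins.

```python
from typing import List, Optional, Tuple

# Directive names from the example file, compared case-insensitively.
VALUES = {"all", "search-only", "none"}


def parse_ai_consent_txt(text: str) -> List[Tuple[str, str]]:
    """Parse "Directive: /path" lines into (path_prefix, value) pairs."""
    rules: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # assumption: blank or malformed lines are skipped
        directive, _, prefix = line.partition(":")
        value = directive.strip().lower()
        prefix = prefix.strip()
        if value in VALUES and prefix.startswith("/"):
            rules.append((prefix, value))
    return rules


def consent_for_path(rules: List[Tuple[str, str]], path: str) -> Optional[str]:
    """Return the value of the longest matching prefix, or None if no rule matches."""
    best: Optional[Tuple[str, str]] = None
    for prefix, value in rules:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, value)
    return best[1] if best else None
```

With the three example rules, a request for /weblog/2023/post would resolve to search-only, while a path not covered by any rule returns None and leaves the default up to the consumer.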

domenic added the labels addition/proposal (New features or enhancements) and needs implementer interest (Moving the issue forward requires implementers to express interest) on Jun 1, 2023

evayde mentioned this issue Jul 2, 2023

Proposal: Meta Tag for AI Generated Content #9479

Open
