A meta tag for annotation preferences

Intro

The robots.txt convention (it is not a standard) began on a mailing list in 1994 with this observation:

“In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren't welcome for various reasons. Sometimes these reasons were robot specific, e.g. certain robots swamped servers with rapid-fire requests, or retrieved the same files repeatedly. In other situations robots traversed parts of WWW servers that weren't suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).

“These incidents indicated the need for established mechanisms for WWW servers to indicate to robots which parts of their server should not be accessed. This standard addresses this need with an operational solution.”

Unlike web crawlers, annotation clients impose no operational burden on servers: an annotation client typically operates on a page that a human has already fetched into a browser. We mention robots.txt here only because it’s a well-understood mechanism by which a site owner can express preferences for how other software agents should (or should not) interact with the site.

The goal of this proposal is to reduce the need for site owners to deploy annotation-blocking technology when they feel that annotations overlaid directly on their pages are intrusive.

Proposal

We propose an HTML meta tag that can take one of these forms:

  1. <meta name="annotation-archive" content="https://web.archive.org"> means: “Please send a copy of this page to the Wayback Machine and annotate it there.”
  2. <meta name="annotation-archive" content="https://archive.is"> means the same thing for archive.is.
  3. <meta name="annotation-url" content="[URL]"> means: “Please annotate at a copy of the page provided for the purpose.”

For the annotation-archive case, values other than those above can indicate other archives. All we need from an archive to do this is the ability to save a page programmatically and acquire the URL of the archived page.
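As a rough illustration of how a client might detect these tags, here is a minimal sketch using Python's standard html.parser module. Only the tag names annotation-archive and annotation-url come from the proposal; the class and function names are hypothetical, and the choice to honor the first matching tag is an assumption that the proposal does not address.

```python
from html.parser import HTMLParser

# Tag names taken from the proposal; everything else here is illustrative.
PREFERENCE_NAMES = ("annotation-archive", "annotation-url")


class AnnotationPreferenceParser(HTMLParser):
    """Collects the first annotation-preference meta tag found in a page."""

    def __init__(self):
        super().__init__()
        self.preference = None  # (name, content) once a matching tag is seen

    def handle_starttag(self, tag, attrs):
        # Assumption: if several matching tags exist, the first one wins.
        if tag != "meta" or self.preference is not None:
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip()
        if name in PREFERENCE_NAMES and content:
            self.preference = (name, content)


def find_annotation_preference(html_text):
    """Return (name, content) for the page's preference tag, or None."""
    parser = AnnotationPreferenceParser()
    parser.feed(html_text)
    parser.close()
    return parser.preference
```

A page with no such tag yields None, which a client would presumably treat as "annotate in place as usual."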

When an annotation client detects this tag, it should:

  1. For forms 1 and 2 above, save a copy of the page to the named archive and capture the URL of the saved copy.
  2. Bind annotations to the alternate URL: the saved copy's URL for forms 1 and 2, or the URL given in the tag for form 3.
  3. Display annotations normally, as overlays on those alternate pages.
  4. Not display annotations on the origin pages, but direct users to the alternate pages where annotation is invited.
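
The client behavior listed above could be sketched roughly as follows. This is not a definitive implementation: the function names are hypothetical, the preference is assumed to arrive as a (name, content) pair already read from the meta tag, and save_to_archive is a stand-in for whatever programmatic save mechanism a given archive offers.

```python
def resolve_annotation_target(page_url, preference, save_to_archive):
    """Return the URL that annotations should be bound to.

    preference      -- (name, content) from the page's meta tag, or None
    save_to_archive -- hypothetical callable (archive_base, page_url) that
                       saves the page and returns the URL of the saved copy
    """
    if preference is None:
        return page_url  # no preference expressed: annotate in place
    name, content = preference
    if name == "annotation-archive":
        # Steps 1 and 2: archive the page, then bind to the saved copy.
        return save_to_archive(content, page_url)
    if name == "annotation-url":
        # Step 2 for the annotation-url form: bind to the owner's copy.
        return content
    return page_url  # unrecognized tag name: fall back to normal behavior


def should_overlay(current_url, target_url):
    """Steps 3 and 4: overlay annotations only on the page they are bound to."""
    return current_url == target_url


def fake_archive(archive_base, page_url):
    # Stand-in for testing only; the URL shape is made up for this example.
    return f"{archive_base}/saved/{page_url}"


origin = "https://example.org/article"
target = resolve_annotation_target(
    origin, ("annotation-archive", "https://web.archive.org"), fake_archive
)
```

With a preference in place, should_overlay(origin, target) is false, so the client would direct the user to target instead of drawing annotations over the origin page.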