The robots.txt convention (it is not a standard) began on a mailing list in 1994 with this observation:
In 1993 and 1994 there have been occasions where robots have visited WWW servers where they weren't welcome for various reasons. Sometimes these reasons were robot specific, e.g. certain robots swamped servers with rapid-fire requests, or retrieved the same files repeatedly. In other situations robots traversed parts of WWW servers that weren't suitable, e.g. very deep virtual trees, duplicated information, temporary information, or cgi-scripts with side-effects (such as voting).
These incidents indicated the need for established mechanisms for WWW servers to indicate to robots which parts of their server should not be accessed. This standard addresses this need with an operational solution.
Annotation clients do not impose the kind of operational burden on servers that web crawlers can. An annotation client typically operates on a page that’s already been fetched into a browser by a human. We mention robots.txt here only because it’s a well-understood mechanism for a site owner to express preferences for how other software agents should interact (or not) with the site.
The goal of this proposal is to reduce the need to deploy annotation-blocking technology in cases where a site owner feels that directly overlaying annotations on a page is intrusive.
An HTML meta tag that can take one of these forms:
For the annotation-archive case, values other than those above can indicate other archives. All an archive needs to support this is the ability to save a page programmatically and return the URL of the archived copy.
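To make that archive requirement concrete, here is a minimal TypeScript sketch of the two pieces an annotation client would need: reading the page’s declared preference, and asking an archive to save the page and hand back the archived URL. The meta tag name used here is a hypothetical placeholder, not part of the proposal, and the archiving call assumes the Internet Archive’s Save Page Now endpoint; a real client would also have to deal with CORS and any authentication the archive requires.

```typescript
// A minimal sketch, assuming a browser context (DOM + fetch). The meta tag
// name "annotation-policy" is a hypothetical placeholder for whichever
// name/content forms the proposal actually defines.

/** Read the page's declared annotation preference, if any. */
function readAnnotationPolicy(doc: Document): string | null {
  const meta = doc.querySelector<HTMLMetaElement>(
    'meta[name="annotation-policy"]'
  );
  return meta ? meta.content.trim() : null;
}

/**
 * Save a page programmatically and return the URL of the archived copy.
 * This assumes the Internet Archive's "Save Page Now" endpoint, whose
 * response names the snapshot in a Content-Location header; any archive
 * offering the same two capabilities (save a page, return its URL) would do.
 */
async function archivePage(pageUrl: string): Promise<string | null> {
  const response = await fetch(`https://web.archive.org/save/${pageUrl}`);
  // Reading this header from a browser depends on CORS exposure, and the
  // archive may require authentication for programmatic saves.
  const snapshotPath = response.headers.get("content-location");
  return snapshotPath ? `https://web.archive.org${snapshotPath}` : null;
}
```

The archived URL is what the client would then load and annotate in place of the live page.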
When an annotation client detects this tag, it should: