1. Abstract

Declare machine-readable content licensing terms for compliant AI and crawler systems when the site needs them.

Really Simple Licensing is an emerging rights-expression layer for communicating content usage and licensing terms separately from robots.txt crawl permission.

2. Classification

Check ID: rsl
Check version: 1.0.0
Package path: lib/checks/rsl/versions/1.0.0
Category: AI Discoverability
Subcategory: Bot Access Control
Check group: Bot Policy
Check group ID: bot-policy
Maturity: Emerging recommendation
Scope: site
Check weight: 1

3. Input And Output Contracts

Input: [email protected]
Output: [email protected]
Resources inspected: /robots.txt, HTTP Link headers with rel=license, HTML link elements with rel=license, inline application/rsl+xml scripts

4. Scoring Semantics

Step ID	Title	Weight	Description
`discover-rsl`	Discover RSL declarations	`0.25`	Find RSL License records in robots.txt, Link headers, HTML links, or inline RSL XML.
`fetch-rsl`	Fetch RSL license documents	`0.25`	Fetch declared RSL license URLs and record content type and reachability.
`validate-rsl`	Validate RSL XML semantics	`0.4`	Validate RSL XML root, namespace, content/license structure, usage/payment/legal vocabulary, and max-age.
`review-rsl`	Review RSL integration	`0.1`	Report weaker integration warnings such as non-preferred content type or mixed discovery surfaces.

5. Package Documentation

RSL Check v1.0.0

Status

Version: 1.0.0
Check identifier: rsl
Input contract: [email protected]
Output contract: [email protected]
Scope: site

Abstract

This check validates Really Simple Licensing discovery and RSL XML license documents. RSL is an emerging rights and licensing signal, not an RFC 9309 crawl-permission rule. Absence is warning-level evidence. When an RSL declaration is present, this version validates the discovery surface, fetches linked license documents, and validates core RSL XML semantics.

Motivation

RSL lets publishers expose machine-readable licensing terms for crawlers and AI systems without overloading robots.txt access rules. It carries a stronger market signal than purely ad hoc AI policy files: RSL has a published specification, industry participants, and support from infrastructure providers and publishers. It is still emerging, however, and this check does not assume that major AI providers comply with it.

Normative Model

This version recognizes RSL declarations from:

/robots.txt License: records.
HTTP Link headers with rel="license" and type="application/rsl+xml".
HTML <link rel="license" type="application/rsl+xml" href="...">.
Inline HTML <script type="application/rsl+xml">.
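
The robots.txt surface can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `find_license_records`, the returned dictionary shape, and the choice to resolve relative values against the site base URL are all assumptions made for the example.

```python
from urllib.parse import urljoin, urlparse


def find_license_records(robots_txt: str, base_url: str):
    """Return (records, malformed) for License: lines in a robots.txt body.

    Hypothetical helper: names and return shape are illustrative only.
    """
    records, malformed = [], []
    current_agents = []    # user-agent group context kept as evidence
    in_agent_run = False   # True while reading consecutive User-agent lines
    for line_no, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments (and URL fragments)
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agent_run:
                current_agents = []  # a new group starts here
            current_agents.append(value)
            in_agent_run = True
            continue
        in_agent_run = False
        if field != "license":
            continue
        # Assumption: relative values are resolved against the site base URL.
        resolved = urljoin(base_url, value)
        parsed = urlparse(resolved)
        entry = {"line": line_no, "value": value, "user_agents": list(current_agents)}
        if value and parsed.scheme in ("http", "https") and parsed.netloc:
            records.append({**entry, "url": resolved})
        else:
            malformed.append(entry)
    return records, malformed
```

Keeping the line number and the user-agent group alongside each record mirrors the evidence this document says the check emits for License: records.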

RSL license documents are XML. This version validates:

Root <rsl> element.
Namespace https://rslstandard.org/rsl.
At least one <content> element.
Each <content> has url.
Each <content> has at least one <license>.
Each <license> has at least one <permits> or <prohibits>.
Valid usage tokens: all, ai-all, ai-train, ai-input, ai-index, search.

Valid payment types: purchase, subscription, training, crawl, use, contribution, attribution, free.

Valid legal types: terms, privacy, license, copyright, other.
max-age, when present, is a positive integer.
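
The rules above could be approximated with the standard-library XML parser, as in the sketch below. It is not the package's validator. The function name `validate_rsl` and the issue strings are hypothetical, every `<permits>` and `<prohibits>` body is treated as a whitespace-separated list of usage tokens (as in this document's example), and `max-age` is checked on whichever element carries it because this document does not say where it appears.

```python
import xml.etree.ElementTree as ET

RSL_NS = "https://rslstandard.org/rsl"
USAGE_TOKENS = {"all", "ai-all", "ai-train", "ai-input", "ai-index", "search"}
PAYMENT_TYPES = {"purchase", "subscription", "training", "crawl", "use",
                 "contribution", "attribution", "free"}
LEGAL_TYPES = {"terms", "privacy", "license", "copyright", "other"}


def _q(name: str) -> str:
    """Qualify a local element name with the RSL namespace."""
    return f"{{{RSL_NS}}}{name}"


def validate_rsl(xml_text: str) -> list[str]:
    """Return issue strings; an empty list means no issue was found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != _q("rsl"):
        return [f"expected <rsl> in namespace {RSL_NS}, got {root.tag}"]

    issues = []
    # Assumption: max-age may sit on any element, so every element is checked.
    for element in root.iter():
        value = element.get("max-age")
        if value is not None and not (
            value.isascii() and value.isdigit() and int(value) > 0
        ):
            issues.append(f"max-age is not a positive integer: {value!r}")

    contents = root.findall(_q("content"))
    if not contents:
        issues.append("no <content> element")
    for content in contents:
        if not content.get("url"):
            issues.append("<content> lacks url")
        licenses = content.findall(_q("license"))
        if not licenses:
            issues.append("<content> has no <license>")
        for lic in licenses:
            rules = lic.findall(_q("permits")) + lic.findall(_q("prohibits"))
            if not rules:
                issues.append("<license> has neither <permits> nor <prohibits>")
            for rule in rules:
                for token in (rule.text or "").split():
                    if token not in USAGE_TOKENS:
                        issues.append(f"invalid usage token: {token}")
            for kind, allowed in (("payment", PAYMENT_TYPES), ("legal", LEGAL_TYPES)):
                for element in lic.findall(_q(kind)):
                    if element.get("type") not in allowed:
                        issues.append(f"invalid {kind} type: {element.get('type')}")
    return issues
```

Returning a flat list of issues keeps the sketch small; a real check would more likely attach each issue to the element and license it came from.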

The RSL specification defines application/rsl+xml. This media type was not found in the IANA media type registry during the research pass, so this check treats XML-compatible types as warning-level acceptable and prefers application/rsl+xml.
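
The media type preference described above can be sketched as a small classifier. The function name `classify_content_type` and its three labels are hypothetical; how the package maps them onto pass, warning, and failure outcomes is not shown here.

```python
def classify_content_type(header_value):
    """Classify a Content-Type header value for a fetched RSL document.

    Hypothetical labels: "preferred", "xml-compatible", or "other".
    """
    # Ignore parameters such as "; charset=utf-8" and normalize case.
    media_type = (header_value or "").split(";", 1)[0].strip().lower()
    if media_type == "application/rsl+xml":
        return "preferred"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml-compatible"  # acceptable, but reported as a warning
    return "other"
```

For example, `classify_content_type("text/xml; charset=utf-8")` yields `"xml-compatible"`, which corresponds to the warning-level case rather than a failure.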

Applicability

The check applies to every scanned site. It looks for RSL declarations across robots.txt, HTTP headers, HTML links, and inline RSL XML.

If no RSL declarations are found, the result is warning.

Pass Criteria

At least one RSL discovery surface is present.
Every discovered RSL URL resolves to an HTTP(S) URL.
Every linked RSL document is reachable and non-empty.
Every RSL document is XML with the expected root and namespace.
Every <content> and <license> satisfies the required structure.
Usage, payment, legal, and max-age values are valid.

Warning Criteria

No RSL declarations are found.
A linked RSL document is XML-compatible but not served as application/rsl+xml.

Multiple discovery surfaces are used and should be kept consistent.
A <content url> is a pattern or relative path rather than a concrete absolute HTTP(S) URL.

The same usage token appears in both <permits> and <prohibits>; RSL says prohibition wins, but publishers should review the intent.
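
The overlap warning can be illustrated with plain set operations. This is a sketch under simplifying assumptions: the helper name `usage_overlap` is hypothetical, tokens are compared literally, and broader tokens such as all or ai-all are not expanded into the narrower tokens they may cover.

```python
def usage_overlap(permits: str, prohibits: str):
    """Return (overlap, effective) for one license's usage token lists.

    overlap:   tokens named in both lists (the warning case)
    effective: permitted tokens left after prohibition wins
    """
    permitted = set(permits.split())
    prohibited = set(prohibits.split())
    overlap = sorted(permitted & prohibited)
    effective = sorted(permitted - prohibited)
    return overlap, effective
```

A license that permits `search ai-input ai-train` and prohibits `ai-train` reports `ai-train` as the overlapping token and is left with `ai-input` and `search` as effectively permitted.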

Failure Criteria

An RSL discovery record is malformed.
A linked RSL document is unreachable or empty.
A linked document is not XML.
The XML lacks <rsl> root or the RSL namespace.
The XML lacks <content>.
A <content> lacks url or contains no <license>.
A <license> lacks both <permits> and <prohibits>.
Usage, payment, legal, or max-age values are invalid.

Evidence Model

The result emits:

Discovery sources: robots, HTTP Link, HTML link, or inline script.
Robots line numbers and user-agent group context for License: records.
Resolved RSL URLs and malformed discovery records.
Fetch status, content type, and fetch failures for linked documents.
RSL validation issues and warnings.
Summary counts for content and license elements.
Per-license permit, prohibit, payment, legal, and overlap evidence.

Validation And Scoring Steps

Step	Weight	Purpose
`discover-rsl`	0.25	Find RSL declarations across robots, headers, HTML, and inline XML.
`fetch-rsl`	0.25	Fetch linked RSL license documents.
`validate-rsl`	0.40	Validate RSL XML structure and vocabulary.
`review-rsl`	0.10	Report integration warnings.
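
The weights in the table sum to 1.0, so one plausible way to combine step results is a weighted sum. The sketch below assumes each step yields a score between 0 and 1; that per-step scale, the function name `weighted_score`, and the combination rule itself are assumptions, since this document lists only the weights.

```python
STEP_WEIGHTS = {
    "discover-rsl": 0.25,
    "fetch-rsl": 0.25,
    "validate-rsl": 0.40,
    "review-rsl": 0.10,
}


def weighted_score(step_scores: dict[str, float]) -> float:
    """Combine per-step scores in [0, 1] into one weighted total."""
    if set(step_scores) != set(STEP_WEIGHTS):
        raise ValueError("expected exactly one score per step")
    for step, score in step_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {step} is outside [0, 1]: {score}")
    return sum(STEP_WEIGHTS[step] * step_scores[step] for step in STEP_WEIGHTS)
```

Under these assumptions, a site that discovers and fetches its RSL document cleanly but fails validation outright would total roughly 0.6, which shows how heavily the validation step is weighted.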

Standard Behavior

Use robots.txt for crawl permission and RSL for licensing terms:

User-agent: *
Allow: /

License: https://example.com/rsl.xml

Serve the linked document as RSL XML:

<rsl xmlns="https://rslstandard.org/rsl">
  <content url="https://example.com/articles/*">
    <license>
      <permits>search ai-input</permits>
      <prohibits>ai-train</prohibits>
      <payment type="subscription">https://example.com/license</payment>
      <legal type="terms">https://example.com/terms</legal>
    </license>
  </content>
</rsl>

Non-Standard And Real-World Behavior

Some implementations may serve RSL XML as application/xml, text/xml, or a generic +xml media type while the RSL-specific media type is still emerging. This check warns rather than fails for XML-compatible media types.

RSL discovery can appear in HTTP headers, HTML, and inline scripts. This version validates those homepage-level surfaces in addition to robots.txt.
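
The two HTML surfaces can be sketched with the standard-library HTML parser. The class name `RslHtmlDiscovery` and its attributes are hypothetical, and the sketch only collects candidates: resolving hrefs against the page URL and validating inline documents would happen elsewhere.

```python
from html.parser import HTMLParser

RSL_TYPE = "application/rsl+xml"


class RslHtmlDiscovery(HTMLParser):
    """Collect RSL <link> hrefs and inline RSL <script> bodies from a page."""

    def __init__(self):
        super().__init__()
        self.link_hrefs = []        # href values from matching <link> elements
        self.inline_documents = []  # raw XML text from matching <script> elements
        self._in_rsl_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        is_rsl = attr.get("type", "").strip().lower() == RSL_TYPE
        rel_tokens = attr.get("rel", "").lower().split()
        if tag == "link" and is_rsl and "license" in rel_tokens:
            if attr.get("href"):
                self.link_hrefs.append(attr["href"])
        elif tag == "script" and is_rsl:
            self._in_rsl_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies arrive as raw text, possibly in several chunks.
        if self._in_rsl_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_rsl_script:
            self.inline_documents.append("".join(self._buffer).strip())
            self._in_rsl_script = False
```

Feeding the homepage HTML to an instance and then reading `link_hrefs` and `inline_documents` gives the candidate declarations for the later fetch and validation steps.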

Non-Goals And Limitations

This check warns, rather than fails, when a site does not publish RSL.
This check does not verify crawler compliance.
This check does not decide legal enforceability.
This check does not perform full XSD validation.
This check does not validate RSS, XMP, ID3, EPUB, or other embedded metadata surfaces in v1.0.0.

This check does not consume sibling outputs from robots.txt, AI bot rules, TDMRep, Content Signal, Web Bot Auth, ai.txt, or llms.txt.

References

Source: lib/checks/rsl/versions/1.0.0/docs.md

6. Version Changelog

rsl v1.0.0 Changelog

Initial versioned package for rsl.

Discovers emerging RSL License: records in robots.txt.
Validates declared license locations as HTTP(S) URLs.
Treats missing RSL records as warning-level evidence.
Adds isolated versioned runtime ownership.
Detects HTTP Link, HTML link, and inline RSL XML declarations.
Fetches linked RSL documents and validates core RSL XML semantics.

Source: lib/checks/rsl/versions/1.0.0/changelog.md