1. Abstract

Declare AI content usage preferences when the site intentionally publishes a machine-readable usage policy.

Content usage preference signals communicate intended downstream AI use separately from robots.txt crawl permission. They can express training and search preferences for compliant systems without replacing access-control rules.

2. Classification

Check ID: content-signals
Check version: 1.0.0
Package path: lib/checks/content-signals/versions/1.0.0
Category: AI Discoverability
Subcategory: Bot Access Control
Check group: Bot Policy
Check group ID: bot-policy
Maturity: Informational
Scope: site
Check weight: 1

3. Input And Output Contracts

Input: [email protected]
Output: [email protected]
Resources inspected: /robots.txt, Content-Usage response header

4. Scoring Semantics

Step ID	Title	Weight	Description
`http-header`	Find Content-Usage HTTP headers	`0.3`	Inspect the scanned page response for IETF-style Content-Usage policy headers.
`robots-records`	Find robots.txt usage policy records	`0.35`	Fetch robots.txt and inspect Content-Usage and Content-Signal records.
`validate-declarations`	Validate declared usage preferences	`0.35`	Validate declared preference names, values, and legacy completeness.

5. Package Documentation

Content Signal Check v1.0.0

Status

Version: 1.0.0
Check identifier: content-signals
Input contract: [email protected]
Output contract: [email protected]
Scope: site

Abstract

This check detects machine-readable AI content usage preferences. Version 1.0.0 recognizes both the deployed Content-Signal convention and the newer IETF AI Preferences shape using Content-Usage.

The check is isolated: it fetches /robots.txt and inspects the scanned page's response headers itself. It does not consume output from the generic robots.txt check.
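Because the check reads the scanned page's headers itself, it has to cope with HTTP field names being case-insensitive and with a field appearing more than once. The sketch below works on a plain list of (name, value) pairs so it needs no HTTP library. The function name content_usage_headers is hypothetical, and the sketch does not describe the package's actual fetch code.

```python
def content_usage_headers(headers: list[tuple[str, str]]) -> list[str]:
    """Collect every Content-Usage value from (name, value) header pairs."""
    # HTTP field names are case-insensitive, and a response may carry the
    # field more than once, so compare lowercased names and keep all matches.
    return [
        value.strip()
        for name, value in headers
        if name.strip().lower() == "content-usage"
    ]


# Hypothetical response headers for the scanned page.
response_headers = [
    ("Content-Type", "text/html"),
    ("content-usage", "train-ai=n, search=y"),
]
print(content_usage_headers(response_headers))  # ['train-ai=n, search=y']
```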

Motivation

AI usage preferences are not crawl permission. robots.txt Allow and Disallow records express acquisition policy for compliant crawlers. Content usage preference signals express intended downstream use, such as whether crawlable content may be used for AI training or AI search.

Because the market is still emerging, the absence of a signal is treated as a warning rather than a failure. A site can improve its result on this check by publishing a machine-readable AI usage policy.

Normative Model

This version recognizes two declaration families:

IETF-style Content-Usage declarations.
Legacy/deployed Content-Signal declarations.

IETF-style declarations may appear as follows. The second, path-scoped form applies to robots.txt records:

Content-Usage: train-ai=n, search=y
Content-Usage: /docs/ train-ai=n, search=y
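Both forms above can be split into an optional path scope and a list of key=value preferences. Here is a minimal sketch that assumes a leading token beginning with "/" is the path scope and that entries are separated by commas. This is a simplified split, not a full structured-field parser, and the name parse_content_usage is hypothetical.

```python
from typing import Optional


def parse_content_usage(value: str) -> tuple[Optional[str], dict[str, str], list[str]]:
    """Split a Content-Usage value into (path scope, preferences, malformed entries)."""
    path: Optional[str] = None
    text = value.strip()
    # A leading token beginning with "/" is read as a robots.txt path scope.
    if text.startswith("/"):
        parts = text.split(maxsplit=1)
        path = parts[0]
        text = parts[1] if len(parts) > 1 else ""
    preferences: dict[str, str] = {}
    malformed: list[str] = []
    for entry in text.split(","):
        key, sep, val = (part.strip() for part in entry.partition("="))
        # Anything that is not a non-empty key=value pair is malformed syntax.
        if not sep or not key or not val:
            malformed.append(entry.strip())
        else:
            preferences[key] = val
    return path, preferences, malformed


print(parse_content_usage("/docs/ train-ai=n, search=y"))
# ('/docs/', {'train-ai': 'n', 'search': 'y'}, [])
```

Malformed entries are collected rather than raised, which matches the failure criterion for malformed preference syntax while still letting the evidence report every entry.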

The supported v1.0.0 IETF terms are:

train-ai with values y or n.
search with values y or n.
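Validating the two IETF terms is a lookup against an allowed-value table. The sketch below is illustrative only: IETF_TERMS and validate_ietf are hypothetical names. It sorts already-parsed terms into valid, invalid, and unrecognized, and it deliberately skips any completeness check because IETF-style declarations may be partial.

```python
IETF_TERMS = {"train-ai": {"y", "n"}, "search": {"y", "n"}}


def validate_ietf(preferences: dict[str, str]) -> dict[str, list[str]]:
    """Sort parsed Content-Usage terms into valid, invalid, and unrecognized."""
    result: dict[str, list[str]] = {"valid": [], "invalid": [], "unrecognized": []}
    for term, value in preferences.items():
        allowed = IETF_TERMS.get(term)
        if allowed is None:
            result["unrecognized"].append(term)  # extension term
        elif value in allowed:
            result["valid"].append(term)
        else:
            result["invalid"].append(f"{term}={value}")
    # No completeness check: IETF-style declarations may be partial.
    return result


print(validate_ietf({"train-ai": "n", "search": "maybe", "x-example": "1"}))
# {'valid': ['train-ai'], 'invalid': ['search=maybe'], 'unrecognized': ['x-example']}
```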

Legacy declarations may appear as:

Content-Signal: ai-train=no, search=yes, ai-input=no

The supported v1.0.0 legacy terms are:

ai-train with values yes or no.
search with values yes or no.
ai-input with values yes or no.
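Legacy declarations differ from the IETF shape in two ways: they use yes/no values, and they are expected to name all three terms. The sketch below parses a Content-Signal value and reports missing terms so that a partial declaration can be told apart from a complete one. The names check_content_signal, LEGACY_TERMS, and LEGACY_VALUES are hypothetical, as is the layout of the returned dictionary.

```python
LEGACY_TERMS = ("ai-train", "search", "ai-input")
LEGACY_VALUES = {"yes", "no"}


def check_content_signal(value: str) -> dict:
    """Validate a Content-Signal value and report legacy completeness."""
    parsed: dict[str, str] = {}
    malformed: list[str] = []
    for entry in value.split(","):
        key, sep, val = (part.strip() for part in entry.partition("="))
        if not sep or not key or not val:
            malformed.append(entry.strip())
        else:
            parsed[key] = val
    invalid = [
        f"{term}={parsed[term]}"
        for term in LEGACY_TERMS
        if term in parsed and parsed[term] not in LEGACY_VALUES
    ]
    missing = [term for term in LEGACY_TERMS if term not in parsed]
    return {
        "preferences": parsed,
        "malformed": malformed,
        "invalid": invalid,
        "missing": missing,
        # Complete means all three terms are present with yes/no values.
        "complete": not missing and not invalid,
    }


print(check_content_signal("ai-train=no, search=yes")["missing"])  # ['ai-input']
```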

Applicability

The check applies to every scanned site. It inspects the scanned page's response headers for Content-Usage and /robots.txt for Content-Usage and Content-Signal records.

If no declaration is found, the result is a warning.
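Locating the robots.txt records, together with the line numbers that the evidence model reports, can be done line by line. This is a minimal sketch: it assumes that "#" starts a comment, that field names are case-insensitive, and that a record is "Field: value" split at the first colon. The name find_robots_records is hypothetical. An empty result corresponds to the warning case above.

```python
FAMILIES = {"content-usage": "Content-Usage", "content-signal": "Content-Signal"}


def find_robots_records(robots_txt: str) -> list[dict]:
    """Return Content-Usage and Content-Signal records with 1-based line numbers."""
    records = []
    for line_no, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop trailing comments, then split "Field: value" at the first colon.
        line = raw.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        family = FAMILIES.get(name.strip().lower())
        if sep and family:
            records.append({"family": family, "line": line_no, "value": value.strip()})
    return records


robots = """User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=no
Content-Usage: /docs/ train-ai=n  # docs only
"""
for record in find_robots_records(robots):
    print(record)
```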

Pass Criteria

At least one recognized declaration is present.
All declared recognized terms use valid values for their declaration family.
IETF-style Content-Usage declarations may be partial; unspecified preferences remain unspecified.
Legacy Content-Signal declarations pass when ai-train, search, and ai-input are all present with valid values.

Warning Criteria

No Content-Usage or Content-Signal declaration is found.
A legacy Content-Signal declaration is present but partial.
A declaration includes unrecognized extension terms while at least one recognized term remains valid.

Failure Criteria

A declared record contains malformed preference syntax.
A recognized term uses an invalid value.
Declarations exist, but none include a recognized preference term.
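The pass, warning, and failure criteria can be combined into a single outcome. The sketch below assumes that failure conditions take precedence over warnings, which the criteria imply but do not state outright. RecordCheck and classify are hypothetical names, and each RecordCheck summarizes one declared record after parsing and validation.

```python
from dataclasses import dataclass, field


@dataclass
class RecordCheck:
    family: str  # "Content-Usage" or "Content-Signal"
    malformed: list[str] = field(default_factory=list)
    invalid: list[str] = field(default_factory=list)
    valid: list[str] = field(default_factory=list)  # recognized terms with valid values
    unrecognized: list[str] = field(default_factory=list)
    legacy_missing: list[str] = field(default_factory=list)


def classify(records: list[RecordCheck]) -> str:
    if not records:
        return "warning"  # no declaration found
    if any(r.malformed or r.invalid for r in records):
        return "fail"  # malformed syntax or invalid value for a recognized term
    if not any(r.valid for r in records):
        return "fail"  # declarations exist, but none use a recognized term
    if any(r.legacy_missing for r in records if r.family == "Content-Signal"):
        return "warning"  # partial legacy declaration
    if any(r.unrecognized for r in records):
        return "warning"  # extension terms alongside a valid recognized term
    return "pass"


print(classify([RecordCheck("Content-Usage", valid=["train-ai"])]))  # pass
```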

Evidence Model

The result emits:

Declared records from response headers.
Declared records from /robots.txt.
Record source, directive family, line number where available, optional path scope, parsed preferences, invalid entries, and warnings.
Summary counts by declaration family.
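One way to shape this evidence is to use one record object per declaration plus a per-family count. The sketch below is illustrative only. DeclaredRecord, build_evidence, and the "records"/"summary" keys are hypothetical and do not describe the package's output contract. Header records have no line number, so that field is optional.

```python
from collections import Counter
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DeclaredRecord:
    source: str  # "header" or "robots.txt"
    family: str  # "Content-Usage" or "Content-Signal"
    line: Optional[int] = None  # robots.txt line number; None for headers
    path: Optional[str] = None  # optional path scope
    preferences: dict[str, str] = field(default_factory=dict)
    invalid: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def build_evidence(records: list[DeclaredRecord]) -> dict:
    return {
        "records": [asdict(r) for r in records],
        # Summary counts by declaration family.
        "summary": dict(Counter(r.family for r in records)),
    }


evidence = build_evidence([
    DeclaredRecord("header", "Content-Usage", preferences={"search": "y"}),
    DeclaredRecord("robots.txt", "Content-Signal", line=2),
])
print(evidence["summary"])  # {'Content-Usage': 1, 'Content-Signal': 1}
```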

Validation And Scoring Steps

Step	Weight	Purpose
`http-header`	0.30	Inspect the scanned page response for `Content-Usage` headers.
`robots-records`	0.35	Fetch `/robots.txt` and inspect `Content-Usage` and `Content-Signal` records.
`validate-declarations`	0.35	Validate preference terms, values, and legacy completeness.

Standard Behavior

Content-Usage is the preferred standards-track shape for new implementations. Use train-ai=n to decline AI training use, and search=y when AI search use is allowed. Path-scoped robots.txt records are accepted for publishers that need different policy by URL prefix.
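The check accepts path-scoped records but does not say how a consumer should pick among several of them. One plausible approach borrows the robots.txt rule from RFC 9309, where the most specific (longest) matching path wins. In this sketch, an unscoped record acts as the site-wide fallback. That resolution rule, and the name select_policy, are assumptions for illustration only.

```python
from typing import Optional


def select_policy(
    url_path: str, scoped: dict[Optional[str], dict[str, str]]
) -> Optional[dict[str, str]]:
    """Pick the preferences whose path prefix is the longest match for url_path.

    The None key holds an unscoped (site-wide) record, used as the fallback.
    """
    best_prefix: Optional[str] = None
    for prefix in scoped:
        if prefix is not None and url_path.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    if best_prefix is not None:
        return scoped[best_prefix]
    return scoped.get(None)


policies = {None: {"train-ai": "n", "search": "y"}, "/docs/": {"train-ai": "y"}}
print(select_policy("/docs/api", policies))  # {'train-ai': 'y'}
print(select_policy("/blog/", policies))  # {'train-ai': 'n', 'search': 'y'}
```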

Non-Standard And Real-World Behavior

Content-Signal is accepted as deployed evidence because it appears in market guidance and production tooling. This version treats a complete legacy declaration as passing and a partial legacy declaration as warning-level evidence.

Non-Goals And Limitations

This check does not determine whether crawlers comply with the declared preference.
This check does not replace robots.txt crawl access validation.
This check does not validate legal enforceability.
This check does not require every site to publish AI usage preferences.
This check does not consume results from sibling checks such as TDMRep, ai.txt, Web Bot Auth, RSL, or AI bot rules.

References

Source: lib/checks/content-signals/versions/1.0.0/docs.md

6. Version Changelog

content-signals v1.0.0 Changelog

Initial versioned package for content-signals.

Migrated v1.0.0 runtime ownership into the versioned package. The check now recognizes both deployed Content-Signal records and IETF-style Content-Usage declarations, treats absence as warning-level, warns on partial legacy declarations, and fails malformed declared records.

Source: lib/checks/content-signals/versions/1.0.0/changelog.md