1. Abstract
Declare AI content usage preferences when the site intentionally publishes machine-readable usage policy.
Content usage preference signals communicate intended downstream AI use separately from robots.txt crawl permission. They can express training and search preferences for compliant systems without replacing access-control rules.
2. Classification
- Check ID: content-signals
- Check version: 1.0.0
- Package path: lib/checks/content-signals/versions/1.0.0
- Category: AI Discoverability
- Subcategory: Bot Access Control
- Check group: Bot Policy
- Check group ID: bot-policy
- Maturity: Informational
- Scope: site
- Check weight: 1
3. Input And Output Contracts
- Input: [email protected]
- Output: [email protected]
- Resources inspected: /robots.txt, Content-Usage response header
4. Scoring Semantics
| Step ID | Title | Weight | Description |
|---|---|---|---|
| http-header | Find Content-Usage HTTP headers | 0.3 | Inspect the scanned page response for IETF-style Content-Usage policy headers. |
| robots-records | Find robots.txt usage policy records | 0.35 | Fetch robots.txt and inspect Content-Usage and Content-Signal records. |
| validate-declarations | Validate declared usage preferences | 0.35 | Validate declared preference names, values, and legacy completeness. |
5. Package Documentation
Content Signal Check v1.0.0
Status
- Version: 1.0.0
- Check identifier: content-signals
- Input contract: [email protected]
- Output contract: [email protected]
- Scope: site
Abstract
This check detects machine-readable AI content usage preferences. Version 1.0.0 recognizes both the deployed Content-Signal convention and the newer IETF AI Preferences shape using Content-Usage.
The check is isolated: it fetches /robots.txt and inspects the scanned page's response headers itself. It does not consume output from the generic robots.txt check.
Motivation
AI usage preferences are not crawl permission. robots.txt Allow and Disallow records express acquisition policy for compliant crawlers. Content usage preference signals express intended downstream use, such as whether crawlable content may be used for AI training or AI search.
Because the market is still emerging, absence of a signal is treated as a warning rather than a failure. A site can improve its result on this check by publishing a machine-readable AI usage policy.
Normative Model
This version recognizes two declaration families:
- IETF-style Content-Usage declarations.
- Legacy/deployed Content-Signal declarations.
IETF-style declarations may appear as:
Content-Usage: train-ai=n, search=y
Content-Usage: /docs/ train-ai=n, search=y
The supported v1.0.0 IETF terms are:
- train-ai with values y or n.
- search with values y or n.
Legacy declarations may appear as:
Content-Signal: ai-train=no, search=yes, ai-input=no
The supported v1.0.0 legacy terms are:
- ai-train with values yes or no.
- search with values yes or no.
- ai-input with values yes or no.
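The two declaration families above can be parsed with one small routine. The following is a minimal sketch, not the package's implementation: the function name `parse_declaration`, the returned dictionary shape, and the rule that a leading token starting with `/` is a path scope are all assumptions made for illustration.

```python
# Recognized terms and values per declaration family, as listed above.
FAMILIES = {
    "content-usage": {"train-ai": {"y", "n"}, "search": {"y", "n"}},
    "content-signal": {
        "ai-train": {"yes", "no"},
        "search": {"yes", "no"},
        "ai-input": {"yes", "no"},
    },
}


def parse_declaration(line):
    """Parse one 'Name: [path] key=value, ...' record (hypothetical helper).

    Returns None when the line is not a Content-Usage or Content-Signal record.
    """
    name, sep, rest = line.partition(":")
    family = name.strip().lower()
    if not sep or family not in FAMILIES:
        return None
    rest = rest.strip()
    path = None
    # Assumption: a leading token that starts with '/' is a path scope.
    if rest.startswith("/"):
        path, _, rest = rest.partition(" ")
    preferences, unrecognized, invalid = {}, [], []
    for item in filter(None, (part.strip() for part in rest.split(","))):
        key, eq, value = item.partition("=")
        key, value = key.strip().lower(), value.strip().lower()
        if not eq or not key or not value:
            invalid.append(item)        # malformed preference syntax
        elif key not in FAMILIES[family]:
            unrecognized.append(key)    # extension term
        elif value not in FAMILIES[family][key]:
            invalid.append(item)        # recognized term with an invalid value
        else:
            preferences[key] = value
    return {
        "family": family,
        "path": path,
        "preferences": preferences,
        "unrecognized": unrecognized,
        "invalid": invalid,
    }
```

Under these assumptions, `parse_declaration("Content-Usage: /docs/ train-ai=n, search=y")` yields the path scope `/docs/` with both preferences parsed, while a line such as `User-agent: *` returns `None`.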
Applicability
The check applies to every scanned site. It inspects /robots.txt and the scanned response headers for Content-Usage or Content-Signal records.
If no declaration is found, the result is warning.
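The discovery step described above can be sketched as a scan over already-fetched inputs. This is an illustrative sketch only: `find_records` and its record shape are hypothetical, fetching is left out, and stripping `#` comments from robots.txt lines is an assumption about how the file is read.

```python
DIRECTIVES = ("content-usage", "content-signal")


def find_records(robots_txt, headers):
    """Collect raw declaration records from robots.txt text and response headers.

    `robots_txt` is the file body as a string; `headers` maps header names
    to values. Both are assumed to have been fetched by the caller.
    """
    records = []
    for name, value in headers.items():
        if name.lower() in DIRECTIVES:
            records.append({
                "source": "header",
                "family": name.lower(),
                "line": None,               # headers carry no line number
                "value": value.strip(),
            })
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # assumption: drop '#' comments
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in DIRECTIVES:
            records.append({
                "source": "robots.txt",
                "family": name.strip().lower(),
                "line": number,              # 1-based line number
                "value": value.strip(),
            })
    return records
```

An empty list from this scan corresponds to the "no declaration found" case, which the check reports as a warning.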
Pass Criteria
- At least one recognized declaration is present.
- All declared recognized terms use valid values for their declaration family.
- IETF-style Content-Usage declarations may be partial; unspecified preferences remain unspecified.
- Legacy Content-Signal declarations pass when ai-train, search, and ai-input are all present with valid values.
Warning Criteria
- No Content-Usage or Content-Signal declaration is found.
- A legacy Content-Signal declaration is present but partial.
- A declaration includes unrecognized extension terms while at least one recognized term remains valid.
Failure Criteria
- A declared record contains malformed preference syntax.
- A recognized term uses an invalid value.
- Declarations exist, but none include a recognized preference term.
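The pass, warning, and failure criteria above can be read as a small decision function. The sketch below is one possible reading, not the package's logic: the `classify` name and record shape are hypothetical, failure is assumed to take precedence over warning, and legacy completeness is assumed to be judged per record.

```python
LEGACY_REQUIRED = {"ai-train", "search", "ai-input"}


def classify(records):
    """Map parsed records to 'pass', 'warning', or 'fail' (hypothetical helper).

    Each record is a dict with:
      'family'       - 'content-usage' or 'content-signal'
      'preferences'  - recognized terms that carry valid values
      'unrecognized' - extension terms that are not recognized
      'invalid'      - malformed entries or recognized terms with bad values
    """
    if not records:
        return "warning"    # no declaration found
    if any(r["invalid"] for r in records):
        return "fail"       # malformed syntax or invalid value
    if not any(r["preferences"] for r in records):
        return "fail"       # declarations exist, none recognized
    for r in records:
        if r["unrecognized"]:
            return "warning"    # extension terms alongside valid ones
        if r["family"] == "content-signal" and LEGACY_REQUIRED - set(r["preferences"]):
            return "warning"    # partial legacy declaration
    return "pass"
```

With this reading, a partial Content-Usage record passes, whereas a Content-Signal record missing `ai-input` produces a warning.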
Evidence Model
The result emits:
- Declared records from response headers.
- Declared records from /robots.txt.
- Record source, directive family, line number where available, optional path scope, parsed preferences, invalid entries, and warnings.
- Summary counts by declaration family.
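The evidence fields listed above might be modeled as a small record type. This is a sketch under assumptions: the `DeclaredRecord` class, its field names, and the `summarize` helper are hypothetical and are not taken from the package's output contract.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeclaredRecord:
    """One declared record as it might appear in the evidence (hypothetical)."""
    source: str                     # "header" or "robots.txt"
    family: str                     # "content-usage" or "content-signal"
    line: Optional[int] = None      # line number where available
    path: Optional[str] = None      # optional path scope
    preferences: dict = field(default_factory=dict)
    invalid: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def summarize(records):
    """Return summary counts keyed by declaration family."""
    return dict(Counter(r.family for r in records))
```

Keeping `line` and `path` optional mirrors the "where available" and "optional" wording in the list above.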
Validation And Scoring Steps
| Step | Weight | Purpose |
|---|---|---|
| http-header | 0.30 | Inspect the scanned page response for Content-Usage headers. |
| robots-records | 0.35 | Fetch /robots.txt and inspect Content-Usage and Content-Signal records. |
| validate-declarations | 0.35 | Validate preference terms, values, and legacy completeness. |
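The step weights in the table sum to 1.0. The source does not state how step results are combined, so the following is only a sketch that assumes a normalized weighted sum of per-step scores in the range 0 to 1; `weighted_score` is a hypothetical helper.

```python
# Step weights taken from the table above.
STEP_WEIGHTS = {
    "http-header": 0.30,
    "robots-records": 0.35,
    "validate-declarations": 0.35,
}


def weighted_score(step_scores):
    """Combine per-step scores in [0, 1] into one weighted score.

    Assumption: steps missing from `step_scores` count as 0.0, and the
    result is normalized by the total weight.
    """
    total_weight = sum(STEP_WEIGHTS.values())
    weighted = sum(
        weight * step_scores.get(step, 0.0)
        for step, weight in STEP_WEIGHTS.items()
    )
    return weighted / total_weight
```

Under this assumption, a site with a valid robots.txt record but no header would score the sum of the `robots-records` and `validate-declarations` weights.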
Standard Behavior
Content-Usage is the preferred standards-track shape for new implementations. Use train-ai=n to decline AI training use, and search=y when AI search use is allowed. Path-scoped robots.txt records are accepted for publishers that need different policy by URL prefix.
Non-Standard And Real-World Behavior
Content-Signal is accepted as deployed evidence because it appears in market guidance and production tooling. This version treats a complete legacy declaration as passing and a partial legacy declaration as warning-level evidence.
Non-Goals And Limitations
- This check does not determine whether crawlers comply with the declared preference.
- This check does not replace robots.txt crawl access validation.
- This check does not validate legal enforceability.
- This check does not require every site to publish AI usage preferences.
- This check does not consume results from sibling checks such as TDMRep, ai.txt, Web Bot Auth, RSL, or AI bot rules.
References
Source: lib/checks/content-signals/versions/1.0.0/docs.md
6. Version Changelog
content-signals v1.0.0 Changelog
Initial versioned package for content-signals.
Migrated v1.0.0 runtime ownership into the versioned package. The check now recognizes both deployed Content-Signal records and IETF-style Content-Usage declarations, treats absence as warning-level, warns on partial legacy declarations, and fails malformed declared records.
Source: lib/checks/content-signals/versions/1.0.0/changelog.md