Check specification

content-signals 1.0.0

Content Signal

Detects deployed Content-Signal records and IETF-style Content-Usage AI preference declarations.

Assessment Suite: 2026.06.10
Maturity: Informational
Category: AI Discoverability
Subcategory: Bot Policy

1. Abstract

Declare AI content usage preferences when the site intentionally publishes machine-readable usage policy.

Content usage preference signals communicate intended downstream AI use separately from robots.txt crawl permission. They can express training and search preferences for compliant systems without replacing access-control rules.

2. Classification

Check ID: content-signals
Check version: 1.0.0
Package path: lib/checks/content-signals/versions/1.0.0
Category: AI Discoverability
Subcategory: Bot Access Control
Check group: Bot Policy
Check group ID: bot-policy
Maturity: Informational
Scope: site
Check weight: 1

3. Input And Output Contracts

Resources inspected: /robots.txt, Content-Usage response header

4. Scoring Semantics

| Step ID | Title | Weight | Description |
| --- | --- | --- | --- |
| http-header | Find Content-Usage HTTP headers | 0.3 | Inspect the scanned page response for IETF-style Content-Usage policy headers. |
| robots-records | Find robots.txt usage policy records | 0.35 | Fetch robots.txt and inspect Content-Usage and Content-Signal records. |
| validate-declarations | Validate declared usage preferences | 0.35 | Validate declared preference names, values, and legacy completeness. |

5. Package Documentation

Content Signal Check v1.0.0

Status

Abstract

This check detects machine-readable AI content usage preferences. Version 1.0.0 recognizes both the deployed Content-Signal convention and the newer IETF AI Preferences shape using Content-Usage.

The check is isolated: it fetches /robots.txt and inspects the scanned page's response headers itself. It does not consume output from the generic robots.txt check.

Motivation

AI usage preferences are not crawl permission. robots.txt Allow and Disallow records express acquisition policy for compliant crawlers. Content usage preference signals express intended downstream use, such as whether crawlable content may be used for AI training or AI search.

Because the market is still emerging, the absence of a signal is treated as a warning rather than a failure. A site can improve its result on this check by publishing a machine-readable AI usage policy.

Normative Model

This version recognizes two declaration families:

  • IETF-style Content-Usage declarations.
  • Legacy/deployed Content-Signal declarations.

IETF-style declarations may appear as:

Content-Usage: train-ai=n, search=y
Content-Usage: /docs/ train-ai=n, search=y

The supported v1.0.0 IETF terms are:

  • train-ai with values y or n.
  • search with values y or n.

Legacy declarations may appear as:

Content-Signal: ai-train=no, search=yes, ai-input=no

The supported v1.0.0 legacy terms are:

  • ai-train with values yes or no.
  • search with values yes or no.
  • ai-input with values yes or no.
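
The two declaration families above can be parsed with one small routine. The following is a minimal sketch rather than the package's actual parser: the function name `parse_declaration`, the returned dict shape, the case-insensitive matching, and the rule that a leading token starting with "/" is the path scope are all assumptions made for illustration.

```python
# Supported terms per declaration family, as listed above for v1.0.0.
IETF_TERMS = {"train-ai": {"y", "n"}, "search": {"y", "n"}}
LEGACY_TERMS = {
    "ai-train": {"yes", "no"},
    "search": {"yes", "no"},
    "ai-input": {"yes", "no"},
}


def parse_declaration(line):
    """Parse 'Name: [path] key=value, key=value' into a dict, or None.

    Hypothetical helper: returns None when the line is not a
    Content-Usage or Content-Signal declaration.
    """
    name, sep, value = line.partition(":")
    family = name.strip().lower()
    if not sep or family not in ("content-usage", "content-signal"):
        return None
    terms = IETF_TERMS if family == "content-usage" else LEGACY_TERMS

    value = value.strip()
    path = None
    # Assumption: an optional path scope is a leading token starting with "/".
    if value.startswith("/"):
        path, _, value = value.partition(" ")

    prefs, invalid, unknown = {}, [], []
    for item in filter(None, (part.strip() for part in value.split(","))):
        key, eq, val = item.partition("=")
        # Assumption: names and values are compared case-insensitively.
        key, val = key.strip().lower(), val.strip().lower()
        if not eq or not key or not val:
            invalid.append(item)      # malformed preference syntax
        elif key not in terms:
            unknown.append(key)       # unrecognized extension term
        elif val not in terms[key]:
            invalid.append(item)      # recognized term with an invalid value
        else:
            prefs[key] = val
    return {"family": family, "path": path, "prefs": prefs,
            "invalid": invalid, "unknown": unknown}
```

Keeping malformed entries and unrecognized terms in separate lists lets a later step treat them differently, since the criteria below distinguish extension terms (warning) from bad syntax or values (failure).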

Applicability

The check applies to every scanned site. It inspects /robots.txt and the scanned response headers for Content-Usage or Content-Signal records.

If no declaration is found, the result is warning.
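
The header half of this inspection might look like the sketch below. It assumes the response headers are available as (name, value) tuples, which is the shape returned by `http.client.HTTPResponse.getheaders()`; the function name `usage_headers` is hypothetical. Header names are matched case-insensitively, as HTTP requires.

```python
def usage_headers(headers):
    """Return (family, value) pairs for usage-policy response headers.

    `headers` is an iterable of (name, value) tuples. An empty result
    means no declaration was found in the response headers.
    """
    wanted = ("content-usage", "content-signal")
    return [(name.lower(), value.strip())
            for name, value in headers
            if name.lower() in wanted]
```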

Pass Criteria

  • At least one recognized declaration is present.
  • All declared recognized terms use valid values for their declaration family.
  • IETF-style Content-Usage declarations may be partial; unspecified preferences remain unspecified.
  • Legacy Content-Signal declarations pass when ai-train, search, and ai-input are all present with valid values.

Warning Criteria

  • No Content-Usage or Content-Signal declaration is found.
  • A legacy Content-Signal declaration is present but partial.
  • A declaration includes unrecognized extension terms while at least one recognized term remains valid.

Failure Criteria

  • A declared record contains malformed preference syntax.
  • A recognized term uses an invalid value.
  • Declarations exist, but none include a recognized preference term.
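
The pass, warning, and failure criteria can be read as a single decision procedure. The sketch below is one possible reading, not the package's code. It assumes each record is a dict with `family`, `prefs` (valid recognized terms), `invalid` (malformed or bad-value entries), and `unknown` (extension terms). It also assumes that failure takes precedence over warning and that legacy completeness is judged per record, neither of which the criteria state explicitly.

```python
LEGACY_REQUIRED = {"ai-train", "search", "ai-input"}


def classify(records):
    """Return 'pass', 'warning', or 'fail' for a list of parsed records."""
    if not records:
        return "warning"              # no declaration found
    if any(r["invalid"] for r in records):
        return "fail"                 # malformed syntax or invalid value
    if not any(r["prefs"] for r in records):
        return "fail"                 # declarations exist, none recognized
    for r in records:
        if r["unknown"]:
            return "warning"          # extension terms next to valid ones
        is_legacy = r["family"] == "content-signal"
        if is_legacy and not LEGACY_REQUIRED <= set(r["prefs"]):
            return "warning"          # partial legacy declaration
    return "pass"
```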

Evidence Model

The result emits:

  • Declared records from response headers.
  • Declared records from /robots.txt.
  • Record source, directive family, line number where available, optional path scope, parsed preferences, invalid entries, and warnings.

  • Summary counts by declaration family.
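
As an illustration of the evidence shape listed above, each record could be modeled as a small data class. The class name, field names, and the `summarize` helper are hypothetical and do not reflect the package's actual output schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeclaredRecord:
    source: str                       # "header" or "robots.txt"
    family: str                       # "content-usage" or "content-signal"
    line: Optional[int] = None        # line number, where available
    path: Optional[str] = None        # optional path scope
    preferences: dict = field(default_factory=dict)
    invalid: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def summarize(records):
    """Count records by declaration family."""
    return dict(Counter(record.family for record in records))
```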

Validation And Scoring Steps

| Step | Weight | Purpose |
| --- | --- | --- |
| http-header | 0.30 | Inspect the scanned page response for Content-Usage headers. |
| robots-records | 0.35 | Fetch /robots.txt and inspect Content-Usage and Content-Signal records. |
| validate-declarations | 0.35 | Validate preference terms, values, and legacy completeness. |
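
The step weights sum to 1.0. This document does not say how step results are combined, so the sketch below assumes a plain weighted sum of per-step scores in the range 0 to 1, with missing steps counted as 0. The function name `weighted_score` is hypothetical.

```python
import math

# Step weights copied from the table above.
STEP_WEIGHTS = {
    "http-header": 0.30,
    "robots-records": 0.35,
    "validate-declarations": 0.35,
}


def weighted_score(step_scores):
    """Combine per-step scores in [0, 1] into one score in [0, 1].

    Assumption: steps combine as a weighted sum; missing steps count as 0.
    """
    return sum(weight * step_scores.get(step, 0.0)
               for step, weight in STEP_WEIGHTS.items())


# The weights cover the whole score (compared with a float tolerance).
assert math.isclose(sum(STEP_WEIGHTS.values()), 1.0)
```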

Standard Behavior

Content-Usage is the preferred standards-track shape for new implementations. Use train-ai=n to decline AI training use, and search=y when AI search use is allowed. Path-scoped robots.txt records are accepted for publishers that need different policy by URL prefix.
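
To make the path-scoped form concrete, the sketch below scans robots.txt text for usage-policy records and keeps their line numbers. It is an illustration under assumptions: the function name `find_robots_records` and the sample file are made up, and stripping "#" comments follows ordinary robots.txt convention rather than anything this check specifies.

```python
def find_robots_records(robots_txt):
    """Return (line_number, family, value) for usage-policy lines."""
    records = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        name, sep, value = line.partition(":")
        family = name.strip().lower()
        if sep and family in ("content-usage", "content-signal"):
            records.append((number, family, value.strip()))
    return records


# Hypothetical robots.txt: a site-wide policy plus a /docs/ override.
sample = """User-agent: *
Allow: /
Content-Usage: train-ai=n, search=y
Content-Usage: /docs/ train-ai=y   # docs may be used for training
"""
records = find_robots_records(sample)
```

Here the Allow record still governs crawl access, while the two Content-Usage records state usage preferences for the whole site and for the /docs/ prefix respectively.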

Non-Standard And Real-World Behavior

Content-Signal is accepted as deployed evidence because it appears in market guidance and production tooling. This version treats a complete legacy declaration as passing and a partial legacy declaration as warning-level evidence.

Non-Goals And Limitations

  • This check does not determine whether crawlers comply with the declared preference.

  • This check does not replace robots.txt crawl access validation.
  • This check does not validate legal enforceability.
  • This check does not require every site to publish AI usage preferences.
  • This check does not consume results from sibling checks such as TDMRep, ai.txt, Web Bot Auth, RSL, or AI bot rules.

References

Source: lib/checks/content-signals/versions/1.0.0/docs.md

6. Version Changelog

content-signals v1.0.0 Changelog

Initial versioned package for content-signals.

Migrated v1.0.0 runtime ownership into the versioned package. The check now recognizes both deployed Content-Signal records and IETF-style Content-Usage declarations, treats absence as warning-level, warns on partial legacy declarations, and fails malformed declared records.

Source: lib/checks/content-signals/versions/1.0.0/changelog.md