Check specification

ai-bot-rules 1.0.0

AI bot rules in robots.txt

Classifies the robots.txt policy a site declares for provider-specific AI crawlers.

Assessment Suite
2026.06.10
Maturity
Established
Category
AI Discoverability
Subcategory
Bot Access Control

1. Abstract

Declare deliberate robots.txt rules for major AI training, AI search, user-triggered, and dataset crawlers.

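AI crawler product tokens serve different purposes: some identify training crawlers, others identify AI search, user-triggered retrieval, or dataset crawlers. Declaring an explicit robots.txt group per token makes the site's training, search, and retrieval access policy auditable for compliant crawler operators.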

2. Classification

Check ID
ai-bot-rules
Check version
1.0.0
Package path
lib/checks/ai-bot-rules/versions/1.0.0
Category
AI Discoverability
Subcategory
Bot Access Control
Check group
Bot Policy
Check group ID
bot-policy
Maturity
Established
Scope
site
Check weight
1

3. Input And Output Contracts

Resources inspected
/robots.txt

4. Scoring Semantics

| Step ID | Title | Weight | Description |
|---|---|---|---|
| fetch-robots | Fetch robots.txt | 0.25 | Fetch robots.txt before inspecting AI crawler rules. |
| classify-ai-bots | Classify AI crawler rules | 0.55 | Evaluate explicit AI crawler User-agent groups and effective root-path policy. |
| policy-review | Review AI crawler policy risks | 0.2 | Warn on broad search crawler blocks or likely policy mistakes. |
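
The weights in the table above sum to 1. As a minimal sketch of how they might combine into a single score, the snippet below assumes each step yields a result between 0 and 1 and that steps contribute linearly. Neither assumption is stated by this specification, and `weighted_score` is a hypothetical name used only for illustration.

```python
# Step weights copied from the Scoring Semantics table.
STEP_WEIGHTS = {
    "fetch-robots": 0.25,
    "classify-ai-bots": 0.55,
    "policy-review": 0.2,
}


def weighted_score(step_results: dict) -> float:
    """Combine per-step results (each in [0, 1]) into one score in [0, 1].

    Assumption: steps combine linearly and a missing step counts as 0.
    """
    total = sum(STEP_WEIGHTS.values())
    earned = sum(
        weight * step_results.get(step, 0.0)
        for step, weight in STEP_WEIGHTS.items()
    )
    return earned / total
```

Under these assumptions, a site that serves robots.txt but declares no AI crawler groups would earn only the `fetch-robots` share of the score.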

5. Package Documentation

AI Bot Rules Check v1.0.0

Checks whether /robots.txt declares explicit policy for major AI training, AI search, user-triggered retrieval, and dataset crawlers.

This check is separate from the generic robots.txt check. The generic check validates discoverability and RFC-shaped parsing. This check interprets provider-specific AI crawler product tokens as Bot Access Control evidence.

Input Contract

[email protected]

Requires the scan origin. The check fetches ${origin}/robots.txt.
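
As an illustration of the `${origin}/robots.txt` construction, the sketch below derives the robots.txt URL from an origin string. The helper name `robots_url` and its handling of stray paths are assumptions made for this example, not part of the input contract.

```python
from urllib.parse import urlsplit


def robots_url(origin: str) -> str:
    """Return the robots.txt URL for an http(s) origin (hypothetical helper)."""
    parts = urlsplit(origin)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) origin: {origin!r}")
    # An origin is scheme + host (+ port); any path, query, or fragment
    # on the input is dropped before appending /robots.txt.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"
```

For example, `robots_url("https://example.com:8443/blog/")` yields `https://example.com:8443/robots.txt`.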

Output Contract

[email protected]

Emits stepped evidence for fetch, AI crawler classification, and policy review.

Pass Criteria

  • /robots.txt is available.
  • At least one explicit User-agent group is present for a known AI crawler token.

  • No broad search crawler blocks are detected at /.

Warning Criteria

  • Explicit AI crawler rules exist, but broad search crawlers such as Googlebot or Applebot are blocked at /. This may be intentional, but it often indicates that the publisher meant to use narrower tokens such as Google-Extended or Applebot-Extended.

Failure Criteria

  • No robots.txt content is available.
  • No explicit User-agent rules are found for known AI crawler tokens.
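
The pass, warning, and failure criteria above can be read as a small decision procedure. The following is a sketch under that reading only: the function name `verdict`, its parameters, and the `"pass"`/`"warn"`/`"fail"` labels are hypothetical and do not describe the package's actual output contract.

```python
from typing import List, Optional


def verdict(
    robots_body: Optional[str],
    explicit_ai_groups: int,
    blocked_search_bots: List[str],
) -> str:
    """Map the documented criteria to "pass", "warn", or "fail"."""
    # Failure: no robots.txt content is available.
    if robots_body is None or not robots_body.strip():
        return "fail"
    # Failure: no explicit User-agent group for a known AI crawler token.
    if explicit_ai_groups == 0:
        return "fail"
    # Warning: explicit AI rules exist, but a broad search crawler
    # (for example Googlebot or Applebot) is blocked at /.
    if blocked_search_bots:
        return "warn"
    return "pass"
```

The failure checks run first so that a warning is only ever reported for a site that already declares explicit AI crawler rules, which matches the wording of the warning criterion.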

Crawler Purpose Model

| Purpose | Examples |
|---|---|
| AI training | GPTBot, ClaudeBot, Amazonbot, Bytespider, Meta-ExternalAgent |
| AI training opt-out token | Google-Extended, Applebot-Extended |
| AI search | OAI-SearchBot, Claude-SearchBot, PerplexityBot, Amzn-SearchBot, YouBot |
| User-triggered retrieval | ChatGPT-User, Claude-User, Perplexity-User, Amzn-User |
| Dataset crawlers | CCBot |
| General search | Googlebot, Applebot, bingbot |
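
The purpose model above is essentially a lookup from product token to purpose. A minimal sketch of such a lookup follows; the tokens are the examples from the table, while the names `CRAWLER_PURPOSES` and `purpose_of` and the case-insensitive matching are assumptions of this example.

```python
from typing import Optional

# Example tokens per purpose, copied from the Crawler Purpose Model table.
CRAWLER_PURPOSES = {
    "AI training": ["GPTBot", "ClaudeBot", "Amazonbot", "Bytespider", "Meta-ExternalAgent"],
    "AI training opt-out token": ["Google-Extended", "Applebot-Extended"],
    "AI search": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Amzn-SearchBot", "YouBot"],
    "User-triggered retrieval": ["ChatGPT-User", "Claude-User", "Perplexity-User", "Amzn-User"],
    "Dataset crawlers": ["CCBot"],
    "General search": ["Googlebot", "Applebot", "bingbot"],
}

# Reverse index keyed by lowercased token, so lookups ignore case.
PURPOSE_BY_TOKEN = {
    token.lower(): purpose
    for purpose, tokens in CRAWLER_PURPOSES.items()
    for token in tokens
}


def purpose_of(token: str) -> Optional[str]:
    """Return the purpose for a known crawler token, or None if unknown."""
    return PURPOSE_BY_TOKEN.get(token.strip().lower())
```

Keeping Google-Extended and Applebot-Extended in their own purpose makes it straightforward to tell a training opt-out apart from a block on the general search crawlers Googlebot and Applebot.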

For each known crawler, this version reports effective policy at / as allowed, blocked, or unspecified. Exact crawler groups take precedence over User-agent: * fallback groups.
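
The precedence rule can be sketched with a deliberately simplified robots.txt parser. Several choices below are assumptions of this example rather than documented behavior: only the patterns `/`, `/*`, and `/$` are treated as matching the root path, the longest matching pattern wins with Allow winning ties, a group that applies but has no root-matching rule counts as allowed, and "unspecified" means that neither an exact group nor a `*` group exists. The function names are hypothetical.

```python
ROOT_PATTERNS = {"/", "/*", "/$"}  # simplification: patterns that match "/"


def parse_groups(robots_txt: str) -> dict:
    """Map each lowercased User-agent token to its (directive, path) rules."""
    groups = {}
    current_agents = []
    collecting_agents = False  # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not collecting_agents:
                current_agents = []  # a new group starts here
                collecting_agents = True
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            collecting_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def effective_root_policy(robots_txt: str, token: str) -> str:
    """Return "allowed", "blocked", or "unspecified" for `token` at /."""
    groups = parse_groups(robots_txt)
    key = token.lower()
    if key in groups:
        rules = groups[key]  # exact group takes precedence
    elif "*" in groups:
        rules = groups["*"]  # fallback group
    else:
        return "unspecified"
    matching = [
        (len(path), directive == "allow")
        for directive, path in rules
        if path in ROOT_PATTERNS
    ]
    if not matching:
        return "allowed"  # assumption: no root rule means default allow
    _, is_allow = max(matching)  # longest pattern wins; Allow wins ties
    return "allowed" if is_allow else "blocked"
```

With a file that contains `User-agent: *` / `Disallow: /` followed by `User-agent: GPTBot` / `Allow: /`, this sketch reports GPTBot as allowed because its exact group overrides the fallback, while a crawler without its own group, such as CCBot, is reported as blocked.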

Scoring Steps

| Step | Weight | Purpose |
|---|---|---|
| fetch-robots | 0.25 | Fetch robots.txt. |
| classify-ai-bots | 0.55 | Classify explicit AI crawler groups and effective root-path policy. |
| policy-review | 0.2 | Warn on broad search crawler blocks or likely policy mistakes. |

Current v1.0.0 Coverage

This version checks:

  • Explicit AI crawler User-agent groups in robots.txt.
  • Effective allow/block/unspecified policy at /.
  • Purpose grouping for training, AI search, user-triggered retrieval, dataset, and general search crawlers.

  • Broad Googlebot and Applebot blocking warnings.

This version does not validate:

  • Whether crawlers comply with the declared policy.
  • WAF/CDN blocks, IP verification, or server logs.
  • Content-Signal, TDMRep, ai.txt, Web Bot Auth, or RSL; those are sibling Bot Access Control checks.

References

Source: lib/checks/ai-bot-rules/versions/1.0.0/docs.md

6. Version Changelog

ai-bot-rules v1.0.0 Changelog

Initial versioned package for ai-bot-rules.

  • Classifies known AI crawler tokens by purpose.
  • Evaluates effective root-path allow/block/unspecified policy.
  • Warns on broad Googlebot and Applebot blocks that may indicate accidental search crawler blocking.

Source: lib/checks/ai-bot-rules/versions/1.0.0/changelog.md