1. Abstract
Publish a concise llms.txt index that helps agents discover useful public site context.
llms.txt is an emerging Markdown convention for giving language-model clients curated context and links before they crawl broadly. Broken, private, or low-signal links make the file much less useful even when it exists.
2. Classification
- Check ID: llms
- Check version: 1.0.0
- Package path: lib/checks/llms/versions/1.0.0
- Category: AI Discoverability
- Subcategory: Content Readiness
- Check group: Page Structure
- Check group ID: page-structure
- Maturity: Emerging recommendation
- Scope: site
- Check weight: 1
3. Input And Output Contracts
- Input: [email protected]
- Output: [email protected]
- Resources inspected: /llms.txt, /llms-full.txt
4. Scoring Semantics
| Step ID | Title | Weight | Description |
|---|---|---|---|
| fetch | Fetch root llms.txt | 20 | Fetches the required root /llms.txt resource and records status, content type, and size. |
| body-shape | Validate Markdown discovery shape | 30 | Checks for a compatible text/Markdown response, H1 title, useful content length, and usable links. |
| structure-quality | Score llms.txt structure and usefulness | 20 | Evaluates proposed llms.txt structure, sectioning, link labels, Markdown links, and unsafe/private targets. |
| link-probes | Probe sampled linked resources | 25 | Safely probes a bounded sample of linked resources and warns or fails when targets are broken or not agent-readable. |
| llms-full | Inspect optional llms-full.txt | 0 | Records whether the optional full corpus file exists and appears text/Markdown compatible. |
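The table lists step weights but not how they combine into a final score. The following is a minimal sketch of one possible aggregation, assuming each step reports a result between 0.0 and 1.0. The function name `weighted_score` and the normalization are hypothetical, not the check's confirmed behavior. Because the listed weights sum to 95, the sketch divides by their total rather than assuming 100.

```python
# Step weights copied from the table above; the aggregation itself is an assumption.
STEP_WEIGHTS = {
    "fetch": 20,
    "body-shape": 30,
    "structure-quality": 20,
    "link-probes": 25,
    "llms-full": 0,
}


def weighted_score(step_results: dict[str, float]) -> float:
    """Combine per-step results (0.0 to 1.0) into a 0-100 score.

    Hypothetical aggregation: normalizes by the sum of the listed weights,
    so a zero-weight step such as llms-full never changes the total.
    Missing steps count as 0.0 and out-of-range results are clamped.
    """
    total_weight = sum(STEP_WEIGHTS.values())
    earned = sum(
        weight * min(max(step_results.get(step, 0.0), 0.0), 1.0)
        for step, weight in STEP_WEIGHTS.items()
    )
    return round(100 * earned / total_weight, 1)
```

Under this assumption, a site that passes every weighted step scores 100.0 whether or not /llms-full.txt exists.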
5. Package Documentation
llms.txt Check v1.0.0
Status
- Version: 1.0.0
- Check identifier: llms
- Input contract: [email protected]
- Output contract: [email protected]
- Scope: site
- Maturity: emerging recommendation
Abstract
This check validates whether a site publishes a useful root /llms.txt file for agent-readable discovery. It treats llms.txt as an emerging Markdown convention, not as an IETF, W3C, browser, search engine, or crawler-control standard. The check verifies that the root file exists, has a Markdown/text shape, contains a top-level title and useful links, follows the proposed structure where practical, and advertises links that are safe and reachable enough to help agents.
/llms-full.txt is inspected only as optional supporting evidence. It cannot replace the required root /llms.txt index.
Motivation
Language-model clients benefit from concise, curated site context before they crawl broadly. The llms.txt proposal gives site owners a simple Markdown index where they can describe the site and point agents toward important documentation, API references, policies, examples, and other high-signal resources.
The file is only useful when its links are curated, public, descriptive, and working. A syntactically present file with broken, private, misleading, or low-signal links may create more confusion than value.
Normative Model
The check uses the public llms.txt proposal as the primary model:
- A site should publish a root /llms.txt resource.
- The resource should be Markdown or plain text compatible.
- The document should begin with a single H1 title naming the site, product, project, or documentation corpus.
- A short blockquote summary after the H1 is preferred.
- H2 sections should group important links.
- Links should be Markdown links with descriptive labels.
- /llms-full.txt may provide a larger complete corpus, but the concise index remains /llms.txt.
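To make the structural model concrete, here is a rough sketch that extracts the H1 title, blockquote summary, H2 sections, and Markdown links from a small sample file. The sample content, the `parse_llms_txt` helper, and the regular expression are illustrative assumptions. The check's real parser may work differently, and this line-based approach is not a CommonMark parser.

```python
import re

# Matches inline Markdown links of the form [label](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Hypothetical sample file following the proposed layout.
SAMPLE = """# Example Docs

> Public documentation for the Example API.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first call
- [API reference](/docs/api.md): endpoints and schemas
"""


def parse_llms_txt(text: str) -> dict:
    """Collect the structural signals the proposal describes."""
    h1_titles, h2_sections, links = [], [], []
    has_summary = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# "):
            h1_titles.append(stripped[2:].strip())
        elif stripped.startswith("## "):
            h2_sections.append(stripped[3:].strip())
        elif stripped.startswith(">") and h1_titles and not h2_sections:
            # A blockquote between the H1 and the first H2 counts as the summary.
            has_summary = True
        links.extend(LINK_RE.findall(stripped))
    return {
        "h1_titles": h1_titles,
        "has_summary": has_summary,
        "h2_sections": h2_sections,
        "links": links,  # list of (label, target) tuples
    }
```

Calling `parse_llms_txt(SAMPLE)` yields one H1, a detected summary, one H2 section, and two labeled links, which is the shape the proposal prefers.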
The check also recognizes adjacent standards:
- RFC 7763 registers text/markdown as a media type for Markdown resources.
- CommonMark defines a stable Markdown syntax family, but llms.txt does not require strict CommonMark conformance.
- RFC 9309 defines robots.txt crawler policy. llms.txt is not an access-control or AI-training opt-out mechanism.
Applicability
This check applies to public websites, documentation sites, API providers, SaaS products, developer platforms, open-source projects, and content sites that want to expose curated agent-readable context.
It is especially relevant when a site has public documentation, API references, changelogs, support docs, pricing, policy pages, examples, or Markdown mirrors.
Pass Criteria
The root /llms.txt resource must:
- Return a successful HTTP response.
- Use a compatible content type such as text/markdown, another Markdown-marked type, or text/plain.
- Not be an HTML error page, login wall, or JavaScript shell.
- Include an H1 title.
- Contain enough readable text to be useful.
- Include at least one usable HTTP(S), same-origin, or root-relative link.
- Avoid private, internal, credentialed, non-HTTP(S), admin, account, dashboard, checkout, preview, staging, or similar sensitive targets.
- Have at least one sampled linked target that is reachable and agent-readable.
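The usable-link and unsafe-target criteria above could be approximated as follows. This is a sketch under stated assumptions: the `classify_link` function, its return labels, the `.local` and `.internal` host suffixes, and the default base URL are hypothetical, and the sensitive keywords are taken from the list above rather than from the check's actual rules.

```python
import ipaddress
from urllib.parse import urljoin, urlsplit

# Keywords taken from the pass criteria; matching them this way is an assumption.
SENSITIVE_SEGMENTS = {"admin", "account", "dashboard", "checkout", "preview", "staging"}


def classify_link(href: str, base: str = "https://example.com/llms.txt") -> str:
    """Return "usable" or a short reason the link is treated as unsafe."""
    url = urlsplit(urljoin(base, href))  # resolves root-relative links against the base
    if url.scheme not in ("http", "https"):
        return "non-http"
    if url.username or url.password:
        return "credentialed"
    host = (url.hostname or "").lower()
    if host == "localhost" or host.endswith((".local", ".internal")):
        return "internal-host"
    try:
        ip = ipaddress.ip_address(host)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return "private-address"
    except ValueError:
        pass  # not an IP literal, so treat it as an ordinary hostname
    path_segments = {segment.lower() for segment in url.path.split("/") if segment}
    host_labels = set(host.split("."))
    if SENSITIVE_SEGMENTS & (path_segments | host_labels):
        return "sensitive-target"
    return "usable"
```

Keyword matching on whole path segments and host labels keeps a public page such as /docs/accounting from being flagged just because it contains "account" as a substring.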
Warnings are emitted when the file is present but weak:
- Missing blockquote summary.
- Missing H2 sections.
- Missing Markdown-formatted links.
- Vague link labels such as raw URLs or "click here".
- Few or no links to agent-friendly resources such as Markdown, API docs, reference docs, changelogs, OpenAPI, JSON, XML, schema, examples, policies, or support docs.
- A minority of sampled links are broken or not clearly agent-readable.
- /llms-full.txt is present but is not text/Markdown compatible.
Absence of /llms-full.txt is informational only in this version.
Failures are emitted when:
- /llms.txt is missing or unreachable.
- The root file is not text/Markdown compatible.
- The body looks like HTML instead of Markdown/text.
- The document lacks an H1 title.
- The document is too thin to be useful.
- No usable links are present.
- Unsafe/private links are advertised.
- Most sampled links are broken, or none of the sampled links return agent-readable content.
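The link-probe boundaries between pass, warn, and fail can be expressed as a small decision function. The sketch below assumes "most" means strictly more than half of the sampled links. The source does not define the exact threshold, and the `ProbeResult` type and `link_probe_verdict` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    url: str
    reachable: bool
    agent_readable: bool


def link_probe_verdict(results: list[ProbeResult]) -> str:
    """Map sampled probe results to "pass", "warn", or "fail"."""
    if not results:
        return "fail"  # no safe public link could be probed
    broken = sum(1 for r in results if not r.reachable)
    readable = sum(1 for r in results if r.reachable and r.agent_readable)
    # Assumption: "most" is read as strictly more than half of the sample.
    if readable == 0 or broken * 2 > len(results):
        return "fail"
    if broken > 0 or readable < len(results):
        return "warn"  # a minority of links are broken or not clearly readable
    return "pass"
```

With this reading, a sample split evenly between working and broken links produces a warning rather than a failure.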
Evidence Model
The check emits step evidence for:
- /llms.txt fetch status, content type, byte/text length, and excerpt.
- H1 title and line number.
- Blockquote summary presence.
- H2 section headings.
- Link counts by total, usable, unsafe, Markdown-formatted, raw URL, internal, external, descriptive labels, and agent-friendly hints.
- Sample link records with labels, normalized URLs, same-origin status, safety classification, and agent-friendly hints.
- Bounded link probes with method, final URL, status code, content type, reachability, agent-readability, errors, and short excerpts.
- Optional /llms-full.txt presence, status, content type, length, and likely full-corpus signal.
The check caps excerpts and does not log full files or full linked-resource bodies.
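A capped evidence record might look like the sketch below. The 200-character limit, the `FetchEvidence` field names, and the helper functions are assumptions for illustration. The source states only that excerpts are capped, not the limit or the record layout.

```python
from dataclasses import dataclass

MAX_EXCERPT_CHARS = 200  # hypothetical cap; the source does not state the limit


def cap_excerpt(body: str, limit: int = MAX_EXCERPT_CHARS) -> str:
    """Collapse whitespace and truncate so full bodies are never logged."""
    collapsed = " ".join(body.split())
    if len(collapsed) <= limit:
        return collapsed
    return collapsed[:limit].rstrip() + "..."


@dataclass
class FetchEvidence:
    url: str
    status: int
    content_type: str
    byte_length: int
    excerpt: str


def fetch_evidence(url: str, status: int, content_type: str, body: bytes) -> FetchEvidence:
    """Build step evidence that records sizes but keeps only a short excerpt."""
    text = body.decode("utf-8", errors="replace")
    return FetchEvidence(url, status, content_type, len(body), cap_excerpt(text))
```

Recording the byte length separately means the evidence still shows how large the resource was even though only the excerpt is retained.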
Validation And Scoring Steps
- Fetch root llms.txt.
  - Weight: 20%.
  - Required.
  - Fails when /llms.txt cannot be fetched successfully.
- Validate Markdown discovery shape.
  - Weight: 30%.
  - Checks content type, H1 title, text length, HTML leakage, and usable links.
  - Fails when the file does not meet the minimum discovery shape.
- Score structure and usefulness.
  - Weight: 20%.
  - Checks blockquote summary, H2 sections, Markdown links, descriptive labels, agent-friendly hints, duplicate links, and unsafe links.
  - Warns for weak structure.
  - Fails for unsafe/private links.
- Probe sampled linked resources.
  - Weight: 25%.
  - Probes a bounded sample of up to 25 links.
  - Prioritizes links that look high-value for agents.
  - Uses HEAD first and falls back to lightweight GET.
  - Follows redirects with timeouts and short excerpts.
  - Fails when no safe public links can be probed, most sampled links are broken, or none return agent-readable content.
  - Warns when a minority of sampled links are broken or weak.
- Inspect optional llms-full.txt.
  - Weight: 0%.
  - Records whether /llms-full.txt exists and appears text/Markdown compatible.
  - Informational only in this version.
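The HEAD-first probing strategy from the link-probe step can be sketched with the Python standard library. This is an illustrative approximation, not the check's implementation. The `http_send` and `probe` names, the 5-second timeout, and the set of status codes that trigger the GET fallback are all assumptions. The sender is injectable so the fallback logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status code, Content-Type header)


def http_send(method: str, url: str, timeout: float = 5.0) -> Response:
    """Send one request with urllib, which follows redirects by default."""
    request = urllib.request.Request(url, method=method)
    try:
        # The body is never read here, which keeps the GET fallback lightweight.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Content-Type", "")


def probe(url: str, send: Callable[[str, str], Response] = http_send) -> dict:
    """Try HEAD first; fall back to GET when HEAD is rejected or fails."""
    method = "HEAD"
    try:
        status, content_type = send("HEAD", url)
        # Assumption: these statuses suggest the server mishandles HEAD.
        if status in (403, 405) or status >= 500:
            raise OSError(f"HEAD returned {status}")
    except OSError:
        method = "GET"
        try:
            status, content_type = send("GET", url)
        except OSError as error:
            return {"url": url, "method": method, "reachable": False, "error": str(error)}
    return {
        "url": url,
        "method": method,
        "status": status,
        "content_type": content_type,
        "reachable": 200 <= status < 300,
    }
```

A caller would run `probe` only on links already classified as safe and public, and only on the bounded sample of up to 25 links described above.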
Standard Behavior
There is no formal llms.txt standard today. This check therefore does not claim compliance with an IETF, W3C, browser, or search engine standard.
The strongest standards-backed assertions are:
- Markdown resources should use an appropriate Markdown or text media type.
- Markdown syntax should be parseable as a broadly compatible Markdown family.
- Crawler access control belongs to robots.txt, not llms.txt.
The check accepts text/plain because production llms.txt files, including those on prominent documentation sites, commonly use it.
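The media-type rule can be illustrated with a short helper. This is a sketch only: the `is_text_markdown_compatible` name is hypothetical, and treating any media type that contains "markdown" as a Markdown-marked type is an assumption about how the check interprets that phrase.

```python
def is_text_markdown_compatible(content_type: str) -> bool:
    """Accept text/plain, text/markdown, and other Markdown-marked media types."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/plain" or "markdown" in media_type
```

An empty or missing Content-Type header is rejected here rather than defaulted to text/plain, so a server that sends no type is not silently accepted.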
Non-Standard And Real-World Behavior
Production implementations vary:
- Some documentation sites publish compact text/plain files with long lines and dense Markdown links.
- Some developer platforms publish both /llms.txt and /llms-full.txt.
- Some docs platforms expose per-section llms.txt, per-section llms-full.txt, and .md mirrors.
- WordPress and documentation generators can produce llms.txt automatically.
- Some useful real-world files do not perfectly follow the proposal's blockquote and section layout.
This version treats the proposal structure as quality evidence rather than an absolute syntax gate. It remains stricter about unsafe links, broken links, missing titles, and missing root discovery.
Non-Goals And Limitations
This check does not:
- Treat llms.txt as crawler policy, AI-training consent, or access control.
- Prove that OpenAI, Google, Anthropic, Perplexity, or another AI platform honors the file.
- Crawl every linked resource recursively.
- Validate the semantic correctness of every linked page.
- Prove strict CommonMark conformance.
- Require /llms-full.txt.
- Require every linked resource to be Markdown.
- Fetch private, credentialed, internal, localhost, or non-HTTP(S) targets.
Link usefulness is heuristic. The check uses labels, URL patterns, content types, reachability, and short response excerpts as evidence, but it cannot fully know a site's editorial intent.
References
- llmstxt.org/
- github.com/AnswerDotAI/llms-txt
- www.rfc-editor.org/rfc/rfc7763
- spec.commonmark.org/0.31.2
- www.rfc-editor.org/rfc/rfc9309
- developers.openai.com/api/docs/bots
- developers.google.com/search/docs/crawling-indexing/google-common-crawlers
- developers.cloudflare.com/docs-for-agents
- developer.yoast.com/features/llms-txt/functional-specification
Source: lib/checks/llms/versions/1.0.0/docs.md
6. Version Changelog
llms v1.0.0 Changelog
- Migrated runtime ownership into lib/checks/llms/versions/1.0.0.
- Requires root /llms.txt discovery instead of allowing /llms-full.txt to substitute for the concise index.
- Added proposal-aware Markdown structure validation for H1 title, summary, H2 sections, Markdown links, descriptive labels, unsafe targets, and agent-friendly link hints.
- Added bounded link probing for sampled llms.txt links with warnings or failures for broken, unsafe, or non-agent-readable targets.
- Added informational inspection of optional /llms-full.txt.
- Expanded versioned documentation with standards, platform guidance, real-world behavior, pass/warn/fail boundaries, evidence, and limitations.
Source: lib/checks/llms/versions/1.0.0/changelog.md