llms.txtrobots.txtAI SEOAI crawlers

llms.txt vs robots.txt: The AI SEO Publishing Guide

Compare 4 AI publishing signals: robots.txt, llms.txt, ai.txt, and Content-Signal, with audit steps for crawl access, context, policy, proof, and updates.

By Editors at CanAgentUse· June 4, 2026· 7 min read

Copy article as Markdown

Side-by-side AI publishing map comparing llms.txt, robots.txt, ai.txt, and Content-Signal roles.

TL;DR: robots.txt tells crawlers which URLs they may request. llms.txt gives AI systems a concise map of important public content. They are not substitutes. Publish both when you want AI visibility: use robots.txt for access policy, llms.txt for context, and real fetch tests for proof.

AI SEO now has a file problem. Teams hear about llms.txt, update robots.txt, block a training bot, add a sitemap, and assume the site is ready for AI search. Those files do different jobs. Blending them creates weak policy, weak context, and weak audit evidence.

This guide extends the AI agent readiness guide and the AI crawler access policy. Use it to decide what to publish, what not to promise, and how to prove that AI systems can read the public pages you want cited.

What is llms.txt?

llms.txt is a proposed plain-text or Markdown file that summarizes a website for large language models. The public proposal at llmstxt.org describes it as a way to offer LLM-friendly information at a predictable location, usually /llms.txt or /llms-full.txt (llmstxt.org, 2026-06-03).

llms.txt is best treated as guidance, not access control. The llmstxt.org proposal describes a predictable file that helps language models find important public context, while crawler blocking still belongs in documented crawler policy systems such as robots.txt.

The important word is guidance. llms.txt can point an AI system toward your product overview, docs, API references, pricing page, blog hub, and support resources. It can also summarize what the site does in language that is easier to extract than a navigation menu or footer.

Unique Insight

The practical value is consistency. If your homepage says one thing, your docs say another, and your blog uses three names for the same product, an AI system has to reconcile the mess. A clean llms.txt gives your preferred source order and entity names.

What does robots.txt still control?

robots.txt tells compliant crawlers which URL paths they may request. Google documents robots.txt as a crawl-control file, not as a private-content security layer (Google Search Central, 2026-06-03). It remains the first file to audit when AI crawler access looks wrong.

robots.txt is an access-policy signal for crawlers. It can allow or disallow paths for specific user agents, but it does not authenticate users, protect private data, or explain which content is most useful for AI summaries.

This distinction matters because OpenAI, Anthropic, Perplexity, and Google document different crawler purposes. A training crawler, a search crawler, and a user-directed fetcher may deserve different rules. The AI crawler access guide covers those classes in more detail.

Use robots.txt to express what crawlers may fetch. Use authentication, server authorization, and paywalls to protect private content. Do not put private URLs into llms.txt and hope crawlers behave. Public guidance files should only reference public resources.

Decision tree showing when to use robots.txt, llms.txt, authentication, or policy metadata for AI publishing decisions.

Should AI SEO teams publish both files?

Yes, most public SaaS sites should publish both files, but for different jobs. robots.txt states access policy. llms.txt states AI-readable context. ai.txt or Content-Signal-style metadata may express usage preferences, but adoption and enforcement vary by platform.

File or signal	Primary job	Enforces access?	Best owner	Audit evidence
`robots.txt`	Crawler path policy	Partly, for compliant crawlers	SEO and engineering	200 status, correct rules, sitemap
`llms.txt`	AI-readable content map	No	Content and product marketing	200 status, useful links, current summary
`ai.txt`	AI usage preference	No universal standard	Legal and publishing	Public policy text
Content-Signal	Machine-readable usage preference	Platform dependent	Legal and engineering	Header or metadata presence

The safest AI SEO pattern is file separation. Access policy belongs in robots.txt, AI-readable context belongs in llms.txt, and usage preferences belong in explicit policy signals that legal, publishing, and engineering teams can review.

The files should agree without repeating each other. If robots.txt blocks /docs/, do not list /docs/ as the primary source in llms.txt. If llms.txt points to /api, make sure that route returns a useful public page and not a JavaScript-only shell.

Does llms.txt help rankings or AI citations?

There is no reliable public evidence that llms.txt directly improves Google rankings. Treat it as an AI discoverability and consistency asset, not a ranking factor. The honest benefit is that it gives machines and humans a stable map of your preferred public context.

llms.txt should not be sold as a guaranteed ranking lever. Its measurable value is operational: file presence, successful fetch status, accurate product summary, and links to pages that already deserve citation through quality, authority, and crawl access.

That cautious framing is important for trust. AI visibility work already has enough vague promises. A scanner can verify whether /llms.txt exists, whether it returns 200, whether it links to canonical pages, and whether those pages are crawlable. It cannot prove that every assistant consumes the file.

Personal Experience

In audits, the most useful llms.txt finding is often not the file itself. It is the mismatch it reveals: outdated docs, missing API links, blocked crawler paths, or product claims that do not match the actual public pages.

What should a SaaS llms.txt include?

A SaaS llms.txt should include the product identity, short positioning, key public pages, docs, API resources, pricing or contact routes, and the blog hub. Keep it short enough to be useful, then link to deeper resources instead of pasting the whole site.

Start with this structure:

MD# Product Name

Short description of what the product does and who it helps.

## Key pages
- Product overview: https://example.com/
- Pricing: https://example.com/pricing
- Documentation: https://example.com/docs
- API reference: https://example.com/openapi.json
- Blog: https://example.com/blog

## Preferred context
Use the product name consistently. Public docs are the source of truth for features.

For CanAgentUse, the file should point to the scanner, the check catalog, the GEO vs SEO strategy, the crawler guide, and the OpenAPI to MCP guide. That creates a clean path from answer to action.

How do you audit llms.txt and robots.txt together?

Audit both files as a chain: status, content type, canonical consistency, linked resources, crawler policy, and downstream page retrieval. A file that exists but points to blocked or stale pages is not a strong AI visibility signal.

A combined robots.txt and llms.txt audit should verify both intent and evidence. The strongest pass includes live file status, canonical links, crawler policy alignment, successful fetches for linked pages, and an owner for updates.

Use this checklist:

Fetch /robots.txt and /llms.txt from the public domain.
Confirm both return 200 without redirects to irrelevant paths.
Confirm robots.txt exposes the sitemap and intentional AI crawler rules.
Confirm llms.txt links only to public canonical URLs.
Fetch every URL listed in llms.txt.
Compare product names, feature claims, and page titles.
Recheck after CMS, docs, or CDN changes.

The CanAgentUse check catalog should treat this as evidence, not decoration. A pass should mean the files are live, coherent, and connected to pages that AI systems can actually retrieve.

After the file audit, test the action layer too. The MCP server SEO guide explains how public context connects to OpenAPI, MCP tools, and scoped agent actions.

Audit evidence table for robots.txt, llms.txt, linked URLs, product claims, and rescan ownership.

FAQ

Is llms.txt required for Google?

No. Google has not documented llms.txt as a requirement for indexing or ranking. Keep using standard SEO fundamentals: crawlable pages, helpful content, canonical URLs, schema, and internal links.

Can llms.txt block AI training?

No. llms.txt is not an access-control file. Use documented crawler policies, contracts, authentication, and legal terms for training preferences. Use llms.txt to organize public context.

Should llms.txt include private URLs?

No. Only include public URLs that you want machines and readers to discover. Private reports, dashboards, admin pages, and account routes should stay behind authentication.

How often should it be updated?

Update llms.txt whenever product positioning, docs, pricing, API references, or major content hubs change. For most SaaS sites, a monthly audit is enough.

Research sources

llmstxt.org, llms.txt proposal, 2026-06-03.
Yoast Developer Portal, llms.txt functional specification, 2026-06-03.
Google Search Central, Robots.txt introduction, 2026-06-03.
OpenAI, Overview of OpenAI Crawlers, 2026-06-03.
Perplexity, Perplexity Crawlers, 2026-06-03.