1. Abstract
Make page content easy for AI answer engines to extract, cite, and attribute without promising inclusion in any proprietary answer surface.
Generative answer systems work best with visible, self-contained, evidence-backed content, clear entities, trustworthy attribution, structured page sections, and crawler-accessible HTML. These signals improve machine understanding even though they do not guarantee AI citations.
2. Classification
- Check ID: geo-readiness
- Check version: 1.0.0
- Package path: lib/checks/geo-readiness/versions/1.0.0
- Category: GEO, AIO and AEO
- Subcategory: GEO, AIO & AEO
- Check group: AI Answer Readiness
- Check group ID: ai-answer-readiness
- Maturity: Emerging recommendation
- Scope: page
- Check weight: 1
3. Input And Output Contracts
- Input: [email protected]
- Output: [email protected]
- Resources inspected: Initial HTML, Citation-ready passages, Entity clarity, Structured extraction signals, Trust signals, AI crawler access
4. Scoring Semantics
| Step ID | Title | Weight | Description |
|---|---|---|---|
| extractable-html | Extractable HTML | 0.18 | Verify meaningful visible text exists in the initial HTML without depending on client-side rendering. |
| entity-clarity | Entity clarity | 0.18 | Align title, h1, description, and primary visible copy around the page's main entity or topic. |
| citable-passages | Citable passages | 0.24 | Identify self-contained explanatory passages with context and evidence signals. |
| structured-extraction | Structured extraction | 0.18 | Check headings, lists, tables, definitions, summaries, and structured data that help answer extraction. |
| source-and-trust | Source and trust signals | 0.12 | Check authorship, publisher, freshness dates, source links, and accountability signals. |
| ai-retrieval-access | AI retrieval access | 0.10 | Evaluate robots.txt and snippet controls for relevant AI retrieval and user-triggered crawlers. |
5. Package Documentation
GEO Readiness Check v1.0.0
Status
- Version: 1.0.0
- Check identifier: geo-readiness
- Input contract: [email protected]
- Output contract: [email protected]
- Scope: page
Abstract
The GEO Readiness check evaluates whether a public HTML page is easy for AI answer engines and retrieval systems to extract, understand, cite, and attribute. It checks initial HTML extractability, entity clarity, self-contained citable passages, structured extraction patterns, source and trust signals, and access controls for relevant AI retrieval crawlers.
This check does not promise inclusion in ChatGPT, Perplexity, Google AI Overviews, AI Mode, or any other proprietary answer surface.
Motivation
Generative answer systems can only use what they can fetch, parse, understand, and attribute. Pages with visible explanatory content, clear entities, evidence-backed answer passages, structured sections, and accountable sources are easier to quote or summarize than pages that rely on vague copy, hidden JavaScript-rendered content, generic metadata, or blocked crawlers.
GEO readiness complements traditional SEO, AIO readiness, AEO readiness, structured data checks, robots.txt, llms.txt, and policy/licensing checks. It is a page-level extraction and citability check, not a query-level visibility monitor.
Normative Model
The check models GEO readiness as six independent validation steps:
| Step | Weight | Purpose |
|---|---|---|
| extractable-html | 0.18 | Verify meaningful visible text exists in the initial HTML without depending on client-side rendering. |
| entity-clarity | 0.18 | Align title, h1, description, and primary visible copy around the page's main entity or topic. |
| citable-passages | 0.24 | Identify self-contained explanatory passages with context and evidence signals. |
| structured-extraction | 0.18 | Check headings, lists, tables, definitions, summaries, and structured data that help answer extraction. |
| source-and-trust | 0.12 | Check authorship, publisher, freshness dates, source links, and accountability signals. |
| ai-retrieval-access | 0.10 | Evaluate robots.txt and snippet controls for relevant AI retrieval and user-triggered crawlers. |
Step weights are normalized decimals and sum to 1.0.
The check uses the page's initial HTML as primary evidence. It may use homepage response headers for X-Robots-Tag and fetch /robots.txt from the scanned origin for path-specific AI crawler access evaluation.
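As a rough illustration of how the step weights combine, the sketch below computes a weighted 0-100 score. The weights are copied from the table above; the names `STEP_WEIGHTS` and `weighted_score`, and the assumption that each step reports its own score on a 0-100 scale, are hypothetical and not taken from the package.

```python
# Step weights copied from the Normative Model table.
# The dictionary and function names are illustrative, not the package's API.
STEP_WEIGHTS = {
    "extractable-html": 0.18,
    "entity-clarity": 0.18,
    "citable-passages": 0.24,
    "structured-extraction": 0.18,
    "source-and-trust": 0.12,
    "ai-retrieval-access": 0.10,
}


def weighted_score(step_scores: dict[str, float]) -> float:
    """Combine per-step scores (assumed to be 0-100) into one 0-100 score."""
    missing = set(STEP_WEIGHTS) - set(step_scores)
    if missing:
        raise ValueError(f"missing step scores: {sorted(missing)}")
    # Because the weights sum to 1.0, the result stays on the 0-100 scale.
    return sum(STEP_WEIGHTS[step] * step_scores[step] for step in STEP_WEIGHTS)
```

Under these assumptions, a page that scores 100 on every step except citable-passages, where it scores 50, lands at roughly 88.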
Applicability
The check applies to public HTML pages where the page is expected to be understood, summarized, compared, cited, or attributed by AI answer engines.
Applicable pages include:
- Homepages and landing pages.
- Product, service, and feature pages.
- Blog posts, guides, docs, reports, and FAQs.
- Public comparison, pricing, help, and knowledge-base pages.
The check is lower-signal for login-only pages, thin utility routes, binary assets, API JSON, redirects, non-HTML resources, and pages intentionally blocked from public crawling.
Pass Criteria
The check passes when the weighted score is at least 90/100.
A passing page normally has:
- Meaningful visible text in the initial HTML.
- Clear title, single h1, description, and primary-topic alignment.
- At least one self-contained explanatory passage or answer block that can be quoted with context.
- Visible evidence signals for factual claims, such as source links, dates, statistics, named research, or reports.
- Structured headings and sections that expose definitions, comparisons, grouped facts, or steps.
- Relevant typed JSON-LD or equivalent structured entity evidence.
- Authorship, publisher, freshness, source, contact/about, policy, or sameAs/entity trust signals where appropriate.
- No path-specific robots.txt rule blocking relevant AI retrieval or user-triggered crawlers from the scanned URL.
- No snippet controls that remove important text from extraction.
Warning Criteria
The check warns when the weighted score is from 50 through 89.
Warnings include:
- Initial HTML has visible content but is thin, low-density, or likely dependent on client-side rendering.
- Paragraphs are too short, too long, vague, or missing visible evidence signals.
- Title, h1, description, and opening copy are weakly aligned.
- JSON-LD is present but does not expose useful typed entity evidence.
- Heading hierarchy, summaries, lists, tables, definitions, or FAQ-style sections are incomplete.
- Source, author, publisher, date, sameAs, about/contact, policy, or citation signals are missing.
- robots.txt could not be fetched, making AI crawler access unconfirmed.
- nosnippet, max-snippet:0, X-Robots-Tag, or data-nosnippet constrains extractable text.
- AI policy/training crawlers are blocked. These are reported separately from retrieval crawler blockers.
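The snippet controls named in the warnings can be detected with a small directive parser. The sketch below is one possible approach, not the package's implementation: it takes the comma-separated value of a meta robots tag or an X-Robots-Tag header and reports nosnippet and max-snippet:0. The function name is hypothetical, and the sketch does not handle X-Robots-Tag values that carry a user-agent prefix.

```python
def snippet_restrictions(directives: str) -> list[str]:
    """Return the snippet-limiting directives found in a robots directive string.

    `directives` is assumed to be the raw comma-separated content of a
    meta robots tag or an X-Robots-Tag header without a user-agent prefix.
    """
    found = []
    for raw in directives.split(","):
        token = raw.strip().lower()
        if token == "nosnippet":
            found.append("nosnippet")
        elif token.startswith("max-snippet:"):
            # Only a limit of 0 removes the snippet entirely.
            value = token.split(":", 1)[1].strip()
            if value == "0":
                found.append("max-snippet:0")
    return found
```

An empty result means neither directive was present; data-nosnippet is an HTML attribute and would need a separate check against the parsed markup.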
Failure Criteria
The check fails when the weighted score is below 50.
Failures include:
- The page has almost no visible, extractable explanatory text in the initial HTML.
- The page entity cannot be determined from title, h1, description, or opening content.
- No meaningful citation-length or evidence-backed explanatory passages are present.
- Relevant AI retrieval or user-triggered crawlers are blocked for the scanned URL path.
- Snippet controls remove text the page appears to need for extraction.
- The page exposes materially misleading machine-readable content compared with visible content.
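The pass, warning, and fail thresholds stated above map to a simple classifier. This is a minimal sketch with a hypothetical function name. The document gives the warning band as 50 through 89 and does not say how fractional scores are rounded, so the sketch assumes that anything at or above 50 and below 90 is a warning.

```python
def classify(score: float) -> str:
    """Map a weighted 0-100 score to a check status."""
    if score >= 90:
        return "pass"
    if score >= 50:
        # Assumption: fractional scores between 89 and 90 count as warnings.
        return "warning"
    return "fail"
```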
Evidence Model
The result emits:
- Final URL, raw HTML byte count, and visible word count.
- Step-level score and weight.
- Title, h1 count, h1 snippets, description, topic overlap, and missing entity terms.
- Candidate and citable passages with word counts, snippets, headings, and improvement reasons.
- Heading outline, heading issues, summaries, table/list counts, FAQ pattern evidence, definition counts, JSON-LD count, and JSON-LD @type values.
- Source and trust evidence including author, publisher, dates, sameAs values, source links, about/contact links, and policy links.
- /robots.txt fetch status, excerpt, crawler-specific access decisions, matched user-agent groups, matched rules, and scanned URL path.
- Meta robots, X-Robots-Tag, and data-nosnippet extraction controls.
Evidence must not include full HTML, credentials, cookies, authorization headers, or private data.
Validation And Scoring Steps
- Parse the initial HTML and extract visible text while ignoring script, style, SVG, and noscript content.
- Score initial HTML extractability using visible word count, text density, HTML size, and client-rendering root signals.
- Score entity clarity using title, h1 count, description, topic overlap, and title/h1 terms missing from the description.
- Inspect paragraph passages. A useful citation passage is a heuristic, not a standard: roughly 80-220 words, self-contained, and supported by visible evidence signals.
- Score structured extraction using heading structure, summaries, tables with headers, lists, FAQ patterns, definition sentences, and typed JSON-LD.
- Score source and trust signals using authorship, publisher/organization, freshness dates, source links, about/contact links, policy links, and sameAs/entity references.
- Fetch /robots.txt from the origin and evaluate the scanned URL path against relevant AI retrieval, user-triggered, and policy crawlers.
- Inspect snippet controls from meta robots, X-Robots-Tag, and data-nosnippet.
- Compute the weighted 0-100 score using the published step weights. Pass is >=90, warning is 50-89, and fail is <50.
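The first and fourth steps above can be sketched with the Python standard library. The example extracts visible text while skipping script, style, SVG, and noscript content, then applies the rough 80-220 word window as a length test. The class and function names are hypothetical, the length test ignores the self-containment and evidence criteria, and the check's actual parser and heuristics may differ.

```python
from html.parser import HTMLParser

# Elements whose content is ignored, as listed in the first step.
SKIPPED_TAGS = {"script", "style", "svg", "noscript"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that sit outside the skipped elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def in_citable_length(passage: str, low: int = 80, high: int = 220) -> bool:
    """Length test only: is the passage within the heuristic word window?"""
    return low <= len(passage.split()) <= high
```

The word count of `visible_text(html)` could also feed the extractability step, although text density and client-rendering signals would need additional inputs.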
Standard Behavior
Robots evaluation follows RFC 9309-oriented behavior:
- User-agent groups are parsed from /robots.txt.
- Relevant groups are matched case-insensitively by crawler product token.
- Matching groups with the most specific user-agent token are evaluated.
- Path rules are evaluated against the scanned URL path and query.
- Allow and Disallow path patterns support * and $.
- The most specific matching rule wins; tied Allow rules win over Disallow.
- Empty or absent matching rules allow crawling for the evaluated path.
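The rule-matching behavior listed above can be illustrated with a short sketch. It assumes the user-agent group has already been selected and that its rules arrive as (directive, pattern) pairs. The function names are hypothetical, and pattern length in characters stands in for the octet-based specificity that RFC 9309 describes.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Each * matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide access for `path` (path plus query) within one user-agent group."""
    best_len = -1
    best_allow = True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            length = len(pattern)
            # Longest pattern wins; on a tie, Allow beats Disallow.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow
```

With the rules `Disallow: /private/` and `Allow: /private/public-page`, the path `/private/secret` is blocked while `/private/public-page` is allowed, because the Allow pattern is the longer match.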
The check evaluates retrieval and user-triggered crawlers separately from policy or training crawlers. Google-Extended is reported as a Google Gemini/Vertex AI product-policy signal, not as a Google Search or AI Overview crawler.
Non-Standard And Real-World Behavior
GEO is an emerging practice, not a formal web standard. The check treats llms.txt, sameAs links, FAQ patterns, and paragraph length targets as supporting evidence rather than mandatory requirements.
The check recognizes that platforms use different crawlers and purposes:
- OAI-SearchBot for OpenAI search retrieval.
- ChatGPT-User for user-triggered OpenAI fetches.
- PerplexityBot and Perplexity-User for Perplexity retrieval and user-triggered fetches.
- ClaudeBot, Claude-SearchBot, and Claude-User for Anthropic retrieval and user-triggered fetches.
- GPTBot for OpenAI model improvement.
- Google-Extended for Google Gemini/Vertex AI product policy.
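Because the check reports retrieval and user-triggered crawlers separately from policy or training crawlers, a lookup keyed by product token is one way to keep the two groups apart. The grouping below mirrors the list above; the set names, the function name, and the returned labels are hypothetical.

```python
# Grouping copied from the crawler list above; tokens are stored lowercase
# because product tokens are matched case-insensitively.
RETRIEVAL_OR_USER_TRIGGERED = {
    "oai-searchbot",
    "chatgpt-user",
    "perplexitybot",
    "perplexity-user",
    "claudebot",
    "claude-searchbot",
    "claude-user",
}
POLICY_OR_TRAINING = {"gptbot", "google-extended"}


def crawler_group(product_token: str) -> str:
    """Return which reporting group a crawler product token belongs to."""
    token = product_token.strip().lower()
    if token in RETRIEVAL_OR_USER_TRIGGERED:
        return "retrieval-or-user-triggered"
    if token in POLICY_OR_TRAINING:
        return "policy-or-training"
    return "other"
```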
Non-Goals And Limitations
This check does not:
- Promise inclusion or ranking in any AI answer surface.
- Perform query-level AI visibility monitoring.
- Replace traditional SEO indexability checks.
- Fully validate structured data; specialized structured-data checks own that behavior.
- Fully validate /robots.txt; the robots-txt check owns full site-level robots syntax validation.
- Require llms.txt, RSL, FAQ schema, Wikipedia presence, Reddit mentions, YouTube mentions, or brand-mention metrics as hard pass criteria.
- Execute arbitrary client-side JavaScript to recover hidden content.
AI systems are proprietary and change frequently. A page can be technically ready and still not be selected or cited. A page can also be cited despite imperfect structure if it has unique, authoritative information.
References
- www.rfc-editor.org/rfc/rfc9309.html
- developers.google.com/search/docs/appearance/ai-features
- developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
- developers.google.com/search/docs/crawling-indexing/google-common-crawlers
- developers.google.com/search/docs/appearance/structured-data/intro-structured-data
- developers.google.com/search/docs/fundamentals/creating-helpful-content
- platform.openai.com/docs/bots
- support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
- docs.perplexity.ai/guides/bots
- www.w3.org/WAI/tutorials/page-structure/headings
- schema.org/
- llmstxt.org/
- arxiv.org/abs/2311.09735
Source: lib/checks/geo-readiness/versions/1.0.0/docs.md
6. Version Changelog
geo-readiness v1.0.0 Changelog
- Added RFC 9309-oriented path-specific AI crawler access evaluation.
- Added OAI-SearchBot, Claude-SearchBot, Claude-User, and Perplexity-User crawler evidence.
- Separated AI retrieval and user-triggered crawler access from policy/training crawler signals such as GPTBot and Google-Extended.
- Added snippet-control, initial HTML extractability, JSON-LD type, and source/trust evidence.
Source: lib/checks/geo-readiness/versions/1.0.0/changelog.md