1. Abstract
Expose a useful Markdown representation through negotiation or an explicit alternate URL.
Markdown representations give agents a cleaner form of the page, while browsers keep receiving normal HTML and caches keep the two representations safely separate.
2. Classification
- Check ID: markdown-negotiation
- Check version: 1.0.0
- Package path: lib/checks/markdown-negotiation/versions/1.0.0
- Category: AI Discoverability
- Subcategory: Content Readiness
- Check group: Page Structure
- Check group ID: page-structure
- Maturity: Established
- Scope: page
- Check weight: 1
3. Input And Output Contracts
- Input: [email protected]
- Output: [email protected]
- Resources inspected: Accept: text/markdown responses, rel=alternate text/markdown links, .md mirrors
4. Scoring Semantics
| Step ID | Title | Weight | Description |
|---|---|---|---|
| markdown-representation | Markdown representation | 0.3 | Find at least one usable Markdown representation for the current page. |
| negotiated-markdown | Same-URL negotiation | 0.2 | Validate Accept: text/markdown support and Vary: Accept cache safety when the page URL negotiates Markdown. |
| markdown-format | Markdown format validation | 0.2 | Classify Markdown structure, dialect signals, and source leakage risks such as raw HTML or MDX/JSX. |
| advertised-alternate | Advertised Markdown alternate | 0.2 | Validate Link or HTML rel=alternate text/markdown discovery for dedicated Markdown URLs. |
| conventional-md-mirror | Conventional .md mirror | 0.1 | Record common .md mirror conventions as partial support when not advertised. |
5. Package Documentation
Markdown Negotiation Check v1.0.0
Check identifier: markdown-negotiation
Abstract
This check validates whether a page exposes a useful Markdown representation for agents through standards-aligned HTTP negotiation, an explicitly advertised alternate URL, or a conservative real-world .md mirror convention.
Scope
The check covers page-level Markdown representations. It is not a general llms.txt check, a full-document corpus check, or a test of whether every page on a site has Markdown. It validates the current page URL and the Markdown URLs that can be discovered from that page.
Standards Basis
text/markdown is the registered Markdown media type. A successful Markdown representation SHOULD return Content-Type: text/markdown, commonly with charset=utf-8. Optional media type parameters such as variant MAY be present, but this version does not require any specific Markdown variant.
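As an illustration only, here is a minimal Python sketch of how a checker might split a Content-Type value into its media type and optional parameters (charset, variant) using the standard library's email header parser. The function names parse_content_type and is_registered_markdown are hypothetical and are not taken from the package.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type value into (media_type, params).

    Hypothetical helper: the stdlib email parser handles quoting and
    lowercases the media type; a missing or malformed value falls back
    to "text/plain", which never counts as registered Markdown.
    """
    msg = Message()
    msg["Content-Type"] = header_value or ""
    media_type = msg.get_content_type()
    # get_params() returns [(media_type, ""), (name, value), ...]
    params = {name.lower(): value for name, value in (msg.get_params() or [])[1:]}
    return media_type, params


def is_registered_markdown(header_value):
    """True only for the registered text/markdown media type."""
    media_type, _ = parse_content_type(header_value)
    return media_type == "text/markdown"
```

Parameters such as variant are returned but not required, matching the rule that this version accepts any Markdown variant.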
HTTP clients can request Markdown with Accept: text/markdown. When the same URL returns different representations based on Accept, the Markdown response MUST include Vary: Accept so shared caches do not mix HTML and Markdown representations.
Dedicated Markdown alternate URLs are different from same-URL negotiation. If /docs/page.md is a separate resource and does not vary by Accept, it does not need Vary: Accept.
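The difference between the two cases can be expressed as a small Python sketch. The path labels reuse the evidence model's negotiated/advertised/conventional values; the function names are hypothetical. This sketch requires Accept to be listed by name and does not attempt to interpret Vary: *.

```python
def vary_includes_accept(vary_header):
    """True if a Vary header value lists the Accept field (case-insensitive)."""
    if not vary_header:
        return False
    fields = {name.strip().lower() for name in vary_header.split(",")}
    return "accept" in fields


def cache_safety_ok(selected_path, vary_header):
    """Apply the Vary: Accept rule only to same-URL negotiation.

    selected_path is "negotiated", "advertised", or "conventional".
    A dedicated .md URL is its own resource and does not vary by Accept.
    """
    if selected_path != "negotiated":
        return True
    return vary_includes_accept(vary_header)
```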
Markdown syntax itself is not a single IETF-standardized grammar. RFC 7763 registers the media type, and RFC 7764 records Markdown variant guidance. CommonMark provides a stable baseline specification for interoperable Markdown parsing, while GitHub Flavored Markdown extends that baseline with widely deployed features such as tables, task list items, strikethrough, and autolinks.
Discovery Paths
The check evaluates three paths in order:
- Same-URL negotiation: fetch the page URL with Accept: text/markdown.
- Advertised alternate: inspect HTTP Link headers and HTML <link rel="alternate" type="text/markdown" href="..."> elements, then fetch the advertised URL.
- Conventional mirror: try conservative real-world Markdown mirror patterns, such as appending .md to the page path or using index.html.md for directory URLs.
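The advertised-alternate path can be sketched in Python with the standard library's html.parser and a deliberately simplified Link header parser. This is a hypothetical illustration (all function and class names are invented here), not the package's discovery code, and it leaves fetching to the caller.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


def parse_link_header(value):
    """Yield (target, params) from a Link header.

    Simplified RFC 8288 parsing: it does not handle commas or
    semicolons inside quoted parameter values.
    """
    for match in re.finditer(r"<([^>]*)>([^<]*)", value or ""):
        target = match.group(1)
        rest = match.group(2).strip().rstrip(",")
        params = {}
        for part in rest.split(";"):
            if "=" in part:
                key, _, val = part.partition("=")
                params[key.strip().lower()] = val.strip().strip('"')
        yield target, params


def _is_markdown_alternate(rel, media_type):
    rels = (rel or "").lower().split()  # rel may hold several link types
    media = (media_type or "").split(";")[0].strip().lower()
    return "alternate" in rels and media == "text/markdown"


class _AlternateLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate" type="text/markdown">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: value or "" for name, value in attrs}
        if _is_markdown_alternate(attr.get("rel"), attr.get("type")) and attr.get("href"):
            self.hrefs.append(attr["href"])


def advertised_markdown_urls(page_url, link_header, html):
    """Return absolute Markdown alternate URLs, Link header entries first."""
    urls = [
        urljoin(page_url, target)
        for target, params in parse_link_header(link_header)
        if _is_markdown_alternate(params.get("rel"), params.get("type"))
    ]
    parser = _AlternateLinkParser()
    parser.feed(html or "")
    parser.close()
    urls.extend(urljoin(page_url, href) for href in parser.hrefs)
    return list(dict.fromkeys(urls))  # de-duplicate, keep discovery order
```

Relative href values are resolved against the page URL, so a site can advertise /docs/page.md from either the HTTP response or the HTML head.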
Advertised alternates are standards-aligned. Conventional mirrors are common in documentation ecosystems, especially around llms.txt, but are weaker evidence unless linked or advertised.
Normative Requirements
- A same-URL Markdown response MUST return a 2xx status, Content-Type: text/markdown, useful Markdown body content, and Vary: Accept.
- An advertised Markdown alternate MUST be discoverable through HTTP Link or HTML rel=alternate metadata with type="text/markdown".
- An advertised Markdown alternate SHOULD return a 2xx status, Content-Type: text/markdown, and useful Markdown body content.
- A conventional .md mirror SHOULD be advertised with rel=alternate or HTTP Link metadata if it is intended as the canonical Markdown representation.
- text/plain Markdown is treated as partial support and returns a warning rather than a full pass, because the registered Markdown media type is text/markdown.
- 406 Not Acceptable for Accept: text/markdown is a valid HTTP outcome, but it means the page does not support same-URL Markdown negotiation.
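One way to fold these requirements into a per-candidate verdict is sketched below in Python. The function name, the order of the checks, and the "pass"/"warn"/"fail" labels are assumptions made for illustration; body_ok stands in for the format validation described in the next section.

```python
def evaluate_response(path, status, content_type, vary, body_ok):
    """Return (verdict, reason) for one Markdown candidate.

    path is "negotiated", "advertised", or "conventional"; body_ok is
    the result of format validation. All names here are hypothetical.
    """
    if path == "negotiated" and status == 406:
        return "fail", "406: same-URL Markdown negotiation is not supported"
    if not 200 <= status < 300:
        return "fail", f"non-2xx status {status}"
    media = (content_type or "").split(";")[0].strip().lower()
    if media not in ("text/markdown", "text/plain"):
        return "fail", f"unexpected media type {media or '(missing)'}"
    if not body_ok:
        return "fail", "body is not useful Markdown"
    if path == "negotiated":
        vary_fields = {name.strip().lower() for name in (vary or "").split(",")}
        if "accept" not in vary_fields:
            return "fail", "negotiated response lacks Vary: Accept"
    if media == "text/plain":
        return "warn", "text/plain Markdown is partial support"
    if path == "conventional":
        return "warn", "unadvertised .md mirror is partial support"
    return "pass", "meets the requirements"
```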
Markdown Format Validation
The check validates that the returned body is structurally useful Markdown, not merely a response with a Markdown media type. This version uses a deterministic structural classifier rather than a full CommonMark parser. It records dialect and feature evidence so reports and future check versions can distinguish clean Markdown from partial or source-leaking implementations.
The classifier records:
- Heading structure, word count, and body excerpt.
- Markdown links and reference-style links.
- Lists, fenced code blocks, and tables.
- GitHub Flavored Markdown signals: tables, task list items, strikethrough, and autolinks.
- YAML frontmatter presence and malformed frontmatter.
- Fenced JSON-LD blocks, which are used by some agent-oriented Markdown renderers to preserve structured data.
- Admonition/directive syntax such as :::note and !!! warning, which is common in documentation generators but not part of baseline CommonMark.
- Raw HTML tag density.
- MDX or JSX source leakage, such as imports, exports, or capitalized component tags.
The body fails format validation when it is too thin, lacks a Markdown heading, looks like plain text, contains heavy raw HTML, exposes MDX/JSX source, or starts YAML frontmatter without closing it.
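The failure conditions above can be approximated with a regex-based structural scan. The Python sketch below is an assumption-laden illustration, not the check's classifier: the thresholds MIN_WORDS and MAX_TAGS_PER_100_WORDS, the issue labels, and the function name validate_markdown are all hypothetical, and only ATX (#) headings are recognized.

```python
import re

# Assumed thresholds: the documented check does not publish its cut-offs.
MIN_WORDS = 50
MAX_TAGS_PER_100_WORDS = 10

FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,}).*?^ {0,3}\1", re.MULTILINE | re.DOTALL)
HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+\S", re.MULTILINE)  # ATX headings only
LINK = re.compile(r"\[[^\]]+\]\([^)\s]+\)|^ {0,3}\[[^\]]+\]:[ \t]+\S", re.MULTILINE)
LIST_ITEM = re.compile(r"^ {0,3}(?:[-*+]|\d+[.)])[ \t]+\S", re.MULTILINE)
# Lowercase tag name followed by space, "/" or ">": skips <https://...> autolinks.
HTML_TAG = re.compile(r"</?[a-z][a-z0-9-]*(?=[\s/>])[^>]*>")
MDX_IMPORT_EXPORT = re.compile(
    r"^(?:import\s[^\n]*\sfrom\s|export\s+(?:default|const|let|var|function)\b)",
    re.MULTILINE,
)
JSX_COMPONENT = re.compile(r"</?[A-Z][A-Za-z0-9.]*[\s/>]")  # capitalized tags


def validate_markdown(body):
    """Return a list of structural issues; an empty list means the body passes."""
    issues = []
    text = body.lstrip("\ufeff")
    if text.startswith("---\n") and not re.search(
        r"^(?:---|\.\.\.)[ \t]*$", text[4:], re.MULTILINE
    ):
        issues.append("unclosed-frontmatter")

    prose = FENCE.sub("", text)  # ignore code inside fenced blocks
    words = len(re.findall(r"\w+", HTML_TAG.sub(" ", prose)))
    tags = len(HTML_TAG.findall(prose))
    has_heading = HEADING.search(prose) is not None

    if words < MIN_WORDS:
        issues.append("too-thin")
    if not has_heading:
        issues.append("no-heading")
    if not (has_heading or LINK.search(prose) or LIST_ITEM.search(prose) or FENCE.search(text)):
        issues.append("plain-text")
    if tags * 100 / max(words, 1) > MAX_TAGS_PER_100_WORDS:
        issues.append("heavy-raw-html")
    if MDX_IMPORT_EXPORT.search(prose) or JSX_COMPONENT.search(prose):
        issues.append("mdx-jsx-leakage")
    return issues
```

Stripping fenced code before scanning keeps an import statement inside a Python or JavaScript example from being mistaken for MDX source.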
Dialect Classification
The selected Markdown candidate is classified into one of these profiles:
- commonmark-like: baseline Markdown structure without strong extension signals.
- gfm-like: Markdown with GitHub Flavored Markdown extension signals.
- frontmatter-markdown: Markdown with YAML frontmatter metadata.
- llms-txt-like: an agent-oriented index shape with a top-level heading, section headings, and Markdown link lists.
- html-heavy: a Markdown response that appears to contain a raw HTML dump.
- mdx-like: a Markdown response that appears to expose MDX/JSX source.
- plain-text-like: a response with no meaningful Markdown structure.
Only the first four profiles are considered acceptable, and only when the body is substantive. The html-heavy, mdx-like, and plain-text-like profiles fail, because many generic agents consume Markdown without a site-specific build pipeline or DOM renderer.
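A hypothetical Python sketch of the profile selection follows. It assumes a prior scan has already produced feature counts (the Features fields are invented for illustration), and the precedence order, which checks the failing profiles first, is an assumption rather than documented behavior.

```python
from dataclasses import dataclass

ACCEPTABLE = {"commonmark-like", "gfm-like", "frontmatter-markdown", "llms-txt-like"}


@dataclass
class Features:
    """Hypothetical feature counts produced by a structural scan."""
    h1_count: int = 0
    subheading_count: int = 0   # h2-h6
    link_count: int = 0
    link_list_items: int = 0    # list items whose content is a Markdown link
    list_items: int = 0
    code_fences: int = 0
    gfm_signals: int = 0        # tables, task items, strikethrough, autolinks
    frontmatter: bool = False
    heavy_html: bool = False
    mdx_jsx: bool = False


def classify_dialect(f):
    """Pick one profile; the precedence order here is an assumption."""
    if f.mdx_jsx:
        return "mdx-like"
    if f.heavy_html:
        return "html-heavy"
    if not (f.h1_count or f.subheading_count or f.link_count or f.list_items or f.code_fences):
        return "plain-text-like"
    if f.h1_count == 1 and f.subheading_count > 0 and f.link_list_items > 0:
        return "llms-txt-like"
    if f.frontmatter:
        return "frontmatter-markdown"
    if f.gfm_signals:
        return "gfm-like"
    return "commonmark-like"


def dialect_acceptable(profile, substantive):
    """Only the four clean profiles pass, and only for a substantive body."""
    return substantive and profile in ACCEPTABLE
```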
Real-World Implementation Notes
Many modern documentation sites expose Markdown through dedicated .md URLs instead of same-URL content negotiation. The llms.txt proposal recommends a root llms.txt index and Markdown page copies by appending .md; for directory-like pages, index.html.md is a known convention. Stripe, Next.js, and Docusaurus plugin ecosystems all provide examples of Markdown documentation mirrors or llms.txt-driven Markdown discovery.
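The llms.txt mirror convention reduces to a small URL rewrite. In this hypothetical Python sketch (mirror_candidate is an invented name), directory-like URLs map to index.html.md and other paths get .md appended; dropping the query string and fragment is an assumption of the sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def mirror_candidate(page_url):
    """Conventional .md mirror URL for a page, following the llms.txt convention.

    Directory-like URLs (empty path or trailing slash) get index.html.md;
    other paths get .md appended. Query and fragment are dropped.
    """
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html.md"
    else:
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, https://example.com/docs/ maps to https://example.com/docs/index.html.md, while https://example.com/docs/page maps to https://example.com/docs/page.md.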
This check therefore does not require Accept negotiation as the only valid implementation. It prefers standards-aligned negotiation or advertised alternates, and reports unadvertised .md mirrors as partial support.
Cloudflare's Markdown for Agents implementation is a useful non-standard reference point for negotiated Markdown: it serves Markdown for Accept: text/markdown, emits Content-Type: text/markdown; charset=utf-8, includes Vary: accept, strips page chrome, may include YAML frontmatter from page metadata, and preserves JSON-LD in fenced code blocks. This check does not require Cloudflare-specific headers, but it records these document-shape signals when present.
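The fenced JSON-LD signal can be detected with a small Python sketch. Which info strings a renderer uses for these blocks is not specified here, so the accepted set (json, json-ld, jsonld, ld+json) and the requirement that the parsed object carry an @context key are assumptions; jsonld_blocks is a hypothetical name.

```python
import json
import re

# Opening fence with an info string, body, and a matching closing fence.
FENCED_BLOCK = re.compile(
    r"^(`{3,}|~{3,})[ \t]*([\w+-]*)[^\n]*\n(.*?)^\1[ \t]*$",
    re.MULTILINE | re.DOTALL,
)
JSONLD_INFO_STRINGS = {"json", "json-ld", "jsonld", "ld+json"}  # assumed


def jsonld_blocks(markdown):
    """Return JSON-LD objects preserved in fenced code blocks."""
    found = []
    for match in FENCED_BLOCK.finditer(markdown):
        info, content = match.group(2).lower(), match.group(3)
        if info not in JSONLD_INFO_STRINGS:
            continue
        try:
            data = json.loads(content)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue
        if isinstance(data, dict) and "@context" in data:
            found.append(data)
    return found
```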
Evidence Model
The check records:
- Selected representation path: negotiated, advertised, or conventional.
- Candidate URLs checked.
- HTTP status, Content-Type, parsed media type, and Vary headers.
- Markdown quality evidence: dialect, heading presence, word count, structural features, issues, and excerpt.
- Whether cache safety is required for the selected path.
- Whether the selected path is standards-advertised or only conventionally discoverable.
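The recorded evidence could be held in a record like the hypothetical Python dataclass below. The field names are invented to mirror the list above, and treating both negotiated and advertised paths as standards-aligned is an assumption of this sketch.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional

SelectedPath = Literal["negotiated", "advertised", "conventional"]


@dataclass
class MarkdownEvidence:
    """Hypothetical evidence record mirroring the documented evidence fields."""
    selected_path: Optional[SelectedPath] = None
    candidate_urls: list[str] = field(default_factory=list)
    status: Optional[int] = None
    content_type: Optional[str] = None
    media_type: Optional[str] = None
    vary: Optional[str] = None
    dialect: Optional[str] = None
    has_heading: bool = False
    word_count: int = 0
    features: list[str] = field(default_factory=list)
    issues: list[str] = field(default_factory=list)
    excerpt: str = ""

    @property
    def cache_safety_required(self) -> bool:
        # Only same-URL negotiation must send Vary: Accept.
        return self.selected_path == "negotiated"

    @property
    def standards_advertised(self) -> bool:
        # Assumption: negotiation and advertised alternates both count as
        # standards-aligned; a bare .md mirror does not.
        return self.selected_path in ("negotiated", "advertised")

    def to_report(self) -> dict:
        report = asdict(self)
        report["cache_safety_required"] = self.cache_safety_required
        report["standards_advertised"] = self.standards_advertised
        return report
```

Deriving the two boolean flags from selected_path, rather than storing them separately, keeps them from contradicting the selected path.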
Scoring Model
The check uses five weighted steps:
- Markdown representation: 30%.
- Same-URL negotiation: 20%.
- Markdown format validation: 20%.
- Advertised Markdown alternate: 20%.
- Conventional .md mirror: 10%.
Same-URL negotiation and advertised Markdown alternates can pass when their HTTP metadata and body quality are correct. A conventional but unadvertised .md mirror is reported as partial support because agents may find it only by convention. Missing, non-Markdown, thin, or cache-unsafe representations fail.
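The weighted arithmetic can be sketched in Python, assuming that a pass earns a step's full weight, a warning (partial support) earns half, and a failure earns nothing. The half-credit figure and the names WEIGHTS, CREDIT, and weighted_score are assumptions; only the weights themselves come from the scoring table.

```python
# Step weights from the Scoring Semantics table.
WEIGHTS = {
    "markdown-representation": 0.3,
    "negotiated-markdown": 0.2,
    "markdown-format": 0.2,
    "advertised-alternate": 0.2,
    "conventional-md-mirror": 0.1,
}
# Assumed credit per outcome; the 0.5 for partial support is not documented.
CREDIT = {"pass": 1.0, "warn": 0.5, "fail": 0.0}


def weighted_score(outcomes):
    """Combine per-step outcomes into a 0..1 score; missing steps count as fail."""
    total = sum(WEIGHTS[step] * CREDIT[outcomes.get(step, "fail")] for step in WEIGHTS)
    return round(total, 4)
```

Under these assumptions, a page where only the representation, format, and advertised-alternate steps pass scores 0.3 + 0.2 + 0.2 = 0.7.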
Limitations
This check is page-local. It does not crawl a full docs corpus, validate every llms.txt link, compare full semantic equivalence between HTML and Markdown, or prove formal CommonMark/GFM conformance. Format validation is structural and conservative: it is designed to catch empty responses, plain text, raw HTML dumps, malformed frontmatter, and MDX/JSX source leakage.
References
- www.rfc-editor.org/rfc/rfc7763
- www.rfc-editor.org/rfc/rfc7764
- www.rfc-editor.org/rfc/rfc9110#name-accept
- www.rfc-editor.org/rfc/rfc9110#name-vary
- www.rfc-editor.org/rfc/rfc8288
- html.spec.whatwg.org/multipage/links.html#link-type-alternate
- spec.commonmark.org/0.31.2
- github.github.com/gfm
- llmstxt.org/
- developers.cloudflare.com/fundamentals/reference/markdown-for-agents
Source: lib/checks/markdown-negotiation/versions/1.0.0/docs.md
6. Version Changelog
markdown-negotiation v1.0.0 Changelog
Initial versioned package for markdown-negotiation.
Source: lib/checks/markdown-negotiation/versions/1.0.0/changelog.md