1. Abstract

Expose a useful Markdown representation through negotiation or an explicit alternate URL.

Markdown representations give agents a cleaner form of the page, while browsers still receive normal HTML and caches keep the two representations separate.

2. Classification

Check ID: markdown-negotiation
Check version: 1.0.0
Package path: lib/checks/markdown-negotiation/versions/1.0.0
Category: AI Discoverability
Subcategory: Content Readiness
Check group: Page Structure
Check group ID: page-structure
Maturity: Established
Scope: page
Check weight: 1

3. Input And Output Contracts

Input: [email protected]
Output: [email protected]
Resources inspected: Accept: text/markdown, rel=alternate, text/markdown, .md

4. Scoring Semantics

Step ID	Title	Weight	Description
`markdown-representation`	Markdown representation	`0.3`	Find at least one usable Markdown representation for the current page.
`negotiated-markdown`	Same-URL negotiation	`0.2`	Validate Accept: text/markdown support and Vary: Accept cache safety when the page URL negotiates Markdown.
`markdown-format`	Markdown format validation	`0.2`	Classify Markdown structure, dialect signals, and source leakage risks such as raw HTML or MDX/JSX.
`advertised-alternate`	Advertised Markdown alternate	`0.2`	Validate Link or HTML rel=alternate text/markdown discovery for dedicated Markdown URLs.
`conventional-md-mirror`	Conventional .md mirror	`0.1`	Record common .md mirror conventions as partial support when not advertised.

5. Package Documentation

Markdown Negotiation Check v1.0.0

Check identifier: markdown-negotiation

Abstract

This check validates whether a page exposes a useful Markdown representation for agents through standards-aligned HTTP negotiation, an explicitly advertised alternate URL, or a conservative real-world .md mirror convention.

Scope

The check covers page-level Markdown representations. It is not a general llms.txt check, a full-document corpus check, or a test of whether every page on a site has Markdown. It validates the current page URL and the Markdown URLs that can be discovered from that page.

Standards Basis

text/markdown is the registered Markdown media type. A successful Markdown representation SHOULD return Content-Type: text/markdown, commonly with charset=utf-8. Optional media type parameters such as variant MAY be present, but this version does not require any specific Markdown variant.
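As an illustration of the media type rule above, the following minimal sketch splits a Content-Type header into a media type and its parameters. The function names are hypothetical and not taken from the check package, and the parser is deliberately simple: it assumes no quoted parameter value contains a semicolon.

```python
from typing import Dict, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into (lowercase media type, parameters).

    Simplified sketch: assumes parameter values never contain ';'.
    """
    parts = [part.strip() for part in header_value.split(";")]
    media_type = parts[0].lower()
    params: Dict[str, str] = {}
    for part in parts[1:]:
        if "=" in part:
            name, value = part.split("=", 1)
            # Parameter names are case-insensitive; values keep their case.
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params


def is_markdown_media_type(header_value: str) -> bool:
    """True when the media type is text/markdown, whatever its parameters."""
    media_type, _ = parse_content_type(header_value)
    return media_type == "text/markdown"
```

Under these assumptions, `text/markdown; charset=utf-8; variant=GFM` is accepted without requiring any particular variant, while `text/plain` is not.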

HTTP clients can request Markdown with Accept: text/markdown. When the same URL returns different representations based on Accept, the Markdown response MUST include Vary: Accept so shared caches do not mix HTML and Markdown representations.

Dedicated Markdown alternate URLs are different from same-URL negotiation. If /docs/page.md is a separate resource and does not vary by Accept, it does not need Vary: Accept.
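The cache-safety rule for the two cases can be sketched as follows. The function names and the path labels passed to `is_cache_safe` are illustrative assumptions, not the package's API.

```python
from typing import Optional


def vary_includes_accept(vary_header: Optional[str]) -> bool:
    """True when a Vary header value lists Accept (field names are case-insensitive)."""
    if not vary_header:
        return False
    tokens = {token.strip().lower() for token in vary_header.split(",")}
    return "accept" in tokens


def is_cache_safe(path: str, vary_header: Optional[str]) -> bool:
    """Apply the Vary requirement only to same-URL negotiation.

    `path` is assumed to be "negotiated" for same-URL negotiation; any other
    value is treated as a dedicated Markdown URL that does not vary by Accept.
    """
    if path == "negotiated":
        return vary_includes_accept(vary_header)
    return True
```

For example, a negotiated response with `Vary: Accept-Encoding, Accept` is cache-safe, while the same response without Accept in Vary is not.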

Markdown syntax itself is not a single IETF-standardized grammar. RFC 7763 registers the text/markdown media type, and RFC 7764 provides guidance on Markdown variants and registers several of them. CommonMark provides a stable baseline specification for interoperable Markdown parsing, while GitHub Flavored Markdown extends that baseline with widely deployed features such as tables, task list items, strikethrough, and autolinks.

Discovery Paths

The check evaluates three paths in order:

Same-URL negotiation: fetch the page URL with Accept: text/markdown.
Advertised alternate: inspect HTTP Link headers and HTML <link rel="alternate" type="text/markdown" href="...">, then fetch the advertised URL.
Conventional mirror: try conservative real-world Markdown mirror patterns such as appending .md to the page path or using index.html.md for directory URLs.

Advertised alternates are standards-aligned. Conventional mirrors are common in documentation ecosystems, especially around llms.txt, but are weaker evidence unless linked or advertised.
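The second and third discovery paths can be sketched roughly as below. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the Link header parser is regex-based and ignores edge cases such as commas inside quoted values, and the mirror helper drops the query string and fragment.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin, urlsplit, urlunsplit


def alternates_from_link_header(link_header: str, base_url: str) -> List[str]:
    """Return Markdown alternates advertised in an HTTP Link header value."""
    found: List[str] = []
    # Each link-value looks like: <target>; rel="alternate"; type="text/markdown"
    for target, raw_params in re.findall(r"<([^>]*)>([^<]*)", link_header):
        params: Dict[str, str] = {}
        for piece in raw_params.split(";"):
            if "=" in piece:
                name, value = piece.split("=", 1)
                cleaned = value.strip().rstrip(",").strip().strip('"')
                params[name.strip().lower()] = cleaned.lower()
        is_alternate = "alternate" in params.get("rel", "").split()
        if is_alternate and params.get("type") == "text/markdown":
            found.append(urljoin(base_url, target))
    return found


class _AlternateLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate" type="text/markdown">."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_alternate = "alternate" in attr.get("rel", "").lower().split()
        is_markdown = attr.get("type", "").strip().lower() == "text/markdown"
        if is_alternate and is_markdown and attr.get("href"):
            self.hrefs.append(attr["href"])


def alternates_from_html(html: str, base_url: str) -> List[str]:
    """Return absolute URLs of Markdown alternates advertised in HTML."""
    parser = _AlternateLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def conventional_mirror_candidates(page_url: str) -> List[str]:
    """Guess unadvertised mirrors: index.html.md for directories, else append .md."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith(".md"):
        return []  # already a Markdown URL
    mirror_path = path + "index.html.md" if path.endswith("/") else path + ".md"
    return [urlunsplit((parts.scheme, parts.netloc, mirror_path, "", ""))]
```

A caller would still fetch each returned URL and validate its status, Content-Type, and body before treating it as a usable representation.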

Normative Requirements

A same-URL Markdown response MUST return a 2xx status, Content-Type: text/markdown, useful Markdown body content, and Vary: Accept.
An advertised Markdown alternate MUST be discoverable through HTTP Link or HTML rel=alternate metadata with type="text/markdown".
An advertised Markdown alternate SHOULD return a 2xx status, Content-Type: text/markdown, and useful Markdown body content.
A conventional .md mirror SHOULD be advertised with rel=alternate or HTTP Link metadata if it is intended as the canonical Markdown representation.
Markdown served as text/plain is treated as partial support and returns a warning rather than a full pass, because the registered Markdown media type is text/markdown.
406 Not Acceptable for Accept: text/markdown is a valid HTTP outcome but means the page does not support same-URL Markdown negotiation.
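One possible reading of the same-URL requirements above, expressed as a decision function. The outcome labels (`pass`, `warning`, `fail`, `unsupported`) and the order of the checks are assumptions made for illustration; the inputs are expected to be already parsed from the response.

```python
def evaluate_same_url_response(
    status: int,
    media_type: str,
    vary_has_accept: bool,
    body_is_useful: bool,
) -> str:
    """Classify a response to a request sent with Accept: text/markdown."""
    if status == 406:
        # Valid HTTP, but the page does not negotiate Markdown at this URL.
        return "unsupported"
    if not 200 <= status <= 299:
        return "fail"
    if media_type not in ("text/markdown", "text/plain"):
        return "fail"  # e.g. the server ignored Accept and returned text/html
    if not body_is_useful:
        return "fail"
    if not vary_has_accept:
        return "fail"  # cache-unsafe: shared caches could mix HTML and Markdown
    if media_type == "text/plain":
        return "warning"  # partial support: not the registered media type
    return "pass"
```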

Markdown Format Validation

The check validates that the returned body is structurally useful Markdown, not merely a response with a Markdown media type. This version uses a deterministic structural classifier rather than a full CommonMark parser. It records dialect and feature evidence so reports and future check versions can distinguish clean Markdown from partial or source-leaking implementations.

The classifier records:

Heading structure, word count, and body excerpt.
Markdown links and reference-style links.
Lists, fenced code blocks, and tables.
GitHub Flavored Markdown signals: tables, task list items, strikethrough, and autolinks.
YAML frontmatter presence and malformed frontmatter.
Fenced JSON-LD blocks, which are used by some agent-oriented Markdown renderers to preserve structured data.
Admonition/directive syntax such as :::note and !!! warning, which is common in documentation generators but not part of baseline CommonMark.
Raw HTML tag density.
MDX or JSX source leakage, such as imports, exports, or capitalized component tags.

The body fails format validation when it is too thin, lacks a Markdown heading, looks like plain text, contains heavy raw HTML, exposes MDX/JSX source, or starts YAML frontmatter without closing it.
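A heuristic sketch of these failure conditions follows. It is not the package's classifier: the thresholds, issue labels, and regular expressions are assumptions chosen for illustration. It recognizes only ATX (`#`) headings and can misfire on prose lines that begin with "import" or "export".

```python
import re
from typing import List

MIN_WORDS = 20                 # assumed "too thin" threshold
MAX_HTML_TAGS_PER_WORD = 0.1   # assumed "heavy raw HTML" threshold

FENCED_CODE = re.compile(r"^(```|~~~).*?^\1[^\n]*$", re.MULTILINE | re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
ATX_HEADING = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)
LIST_OR_LINK = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+\S|\[[^\]]+\]\([^)]+\)", re.MULTILINE)
HTML_TAG = re.compile(r"</?[a-z][a-z0-9-]*(?:\s[^<>]*)?/?>")
MDX_SOURCE = re.compile(r"^(?:import|export)\s|<[A-Z][A-Za-z0-9.]*[\s/>]", re.MULTILINE)


def format_issues(body: str) -> List[str]:
    """Return the list of structural problems found; empty means acceptable."""
    text = body.strip()
    issues: List[str] = []

    # Frontmatter that opens with '---' but never closes.
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        if not any(line.strip() == "---" for line in lines[1:]):
            issues.append("unclosed-frontmatter")

    # Ignore code samples so their contents do not look like headings or MDX.
    prose = INLINE_CODE.sub("", FENCED_CODE.sub("", text))
    word_count = len(prose.split())
    if word_count < MIN_WORDS:
        issues.append("too-thin")

    has_heading = bool(ATX_HEADING.search(prose))
    if not has_heading:
        issues.append("no-heading")
    has_fence = FENCED_CODE.search(text) is not None
    if not has_heading and not has_fence and not LIST_OR_LINK.search(prose):
        issues.append("plain-text")

    if word_count and len(HTML_TAG.findall(prose)) / word_count > MAX_HTML_TAGS_PER_WORD:
        issues.append("html-heavy")
    if MDX_SOURCE.search(prose):
        issues.append("mdx-source")
    return issues
```

Stripping fenced and inline code before the heading and MDX checks is a design choice in this sketch: it keeps a legitimate code sample containing `import os` from being reported as source leakage.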

Dialect Classification

The selected Markdown candidate is classified into one of these profiles:

commonmark-like: baseline Markdown structure without strong extension signals.
gfm-like: Markdown with GitHub Flavored Markdown extension signals.
frontmatter-markdown: Markdown with YAML frontmatter metadata.
llms-txt-like: an agent-oriented index shape with a top-level heading, section headings, and Markdown link lists.
html-heavy: a Markdown response that appears to contain a raw HTML dump.
mdx-like: a Markdown response that appears to expose MDX/JSX source.
plain-text-like: a response with no meaningful Markdown structure.

Only the first four profiles are acceptable, and only when the body is substantive. The html-heavy, mdx-like, and plain-text-like profiles fail because many generic agents consume Markdown without a site-specific build pipeline or DOM renderer.
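The mapping from detected signals to a profile could look like the sketch below. The `Signals` fields and, in particular, the precedence among profiles are assumptions; the source lists the profiles but does not state which one wins when several signals are present.

```python
from dataclasses import dataclass

ACCEPTABLE_PROFILES = {
    "commonmark-like",
    "gfm-like",
    "frontmatter-markdown",
    "llms-txt-like",
}


@dataclass
class Signals:
    """Hypothetical boolean signals produced by a structural classifier."""
    has_heading: bool = False
    has_markdown_structure: bool = False  # links, lists, fenced code, tables
    has_frontmatter: bool = False
    has_gfm_extensions: bool = False
    looks_like_link_index: bool = False   # top-level heading, sections, link lists
    html_heavy: bool = False
    mdx_source: bool = False


def classify_dialect(signals: Signals) -> str:
    """Pick one profile; failing profiles are checked first (assumed order)."""
    if signals.mdx_source:
        return "mdx-like"
    if signals.html_heavy:
        return "html-heavy"
    if not (signals.has_heading or signals.has_markdown_structure):
        return "plain-text-like"
    if signals.looks_like_link_index:
        return "llms-txt-like"
    if signals.has_frontmatter:
        return "frontmatter-markdown"
    if signals.has_gfm_extensions:
        return "gfm-like"
    return "commonmark-like"


def is_acceptable(profile: str, substantive: bool) -> bool:
    """A profile passes only if it is in the accepted set and the body is substantive."""
    return substantive and profile in ACCEPTABLE_PROFILES
```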

Real-World Implementation Notes

Many modern documentation sites expose Markdown through dedicated .md URLs instead of same-URL content negotiation. The llms.txt proposal recommends a root llms.txt index and Markdown page copies by appending .md; for directory-like pages, index.html.md is a known convention. Stripe, Next.js, and Docusaurus plugin ecosystems all provide examples of Markdown documentation mirrors or llms.txt-driven Markdown discovery.

This check therefore does not require Accept negotiation as the only valid implementation. It prefers standards-aligned negotiation or advertised alternates, and reports unadvertised .md mirrors as partial support.

Cloudflare's Markdown for Agents implementation is a useful non-standard reference point for negotiated Markdown: it serves Markdown for Accept: text/markdown, emits Content-Type: text/markdown; charset=utf-8, includes Vary: accept, strips page chrome, may include YAML frontmatter from page metadata, and preserves JSON-LD in fenced code blocks. This check does not require Cloudflare-specific headers, but it records these document-shape signals when present.

Evidence Model

The check records:

Selected representation path: negotiated, advertised, or conventional.
Candidate URLs checked.
HTTP status, Content-Type, parsed media type, and Vary headers.
Markdown quality evidence: dialect, heading presence, word count, structural features, issues, and excerpt.
Whether cache safety is required for the selected path.
Whether the selected path is standards-advertised or only conventionally discoverable.
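For readers who want a concrete shape, the recorded evidence could be modeled along these lines. All class and field names here are hypothetical; the package's actual output contract is not shown in this document.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CandidateEvidence:
    """One Markdown URL that was checked, with its HTTP metadata."""
    url: str
    status: Optional[int] = None
    content_type: Optional[str] = None  # raw header value
    media_type: Optional[str] = None    # parsed, e.g. "text/markdown"
    vary: Optional[str] = None


@dataclass
class MarkdownQuality:
    """Body-quality evidence for the selected candidate."""
    dialect: str
    has_heading: bool
    word_count: int
    features: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)
    excerpt: str = ""


@dataclass
class MarkdownEvidence:
    """Top-level evidence record for one page."""
    selected_path: Optional[str]        # "negotiated", "advertised", or "conventional"
    candidates: List[CandidateEvidence]
    quality: Optional[MarkdownQuality]
    cache_safety_required: bool
    standards_advertised: bool


# Usage: an advertised alternate that needs no Vary: Accept.
example = MarkdownEvidence(
    selected_path="advertised",
    candidates=[
        CandidateEvidence(
            url="https://example.com/docs/page.md",
            status=200,
            content_type="text/markdown; charset=utf-8",
            media_type="text/markdown",
        )
    ],
    quality=MarkdownQuality(dialect="gfm-like", has_heading=True, word_count=420),
    cache_safety_required=False,
    standards_advertised=True,
)
example_as_dict = asdict(example)  # plain dict, convenient for JSON reports
```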

Scoring Model

The check uses five weighted steps:

Markdown representation: 30%.
Same-URL negotiation: 20%.
Markdown format validation: 20%.
Advertised Markdown alternate: 20%.
Conventional .md mirror: 10%.

Same-URL negotiation and advertised Markdown alternates can pass when their HTTP metadata and body quality are correct. A conventional but unadvertised .md mirror is reported as partial support because agents may find it only by convention. Missing, non-Markdown, thin, or cache-unsafe representations fail.
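The weighting can be illustrated with a small calculation. The step IDs and weights come from the scoring table; the credit given to each outcome, and the treatment of steps without a recorded outcome as zero credit, are assumptions, since the source does not say how warnings or non-applicable steps are credited.

```python
from typing import Dict

STEP_WEIGHTS: Dict[str, float] = {
    "markdown-representation": 0.3,
    "negotiated-markdown": 0.2,
    "markdown-format": 0.2,
    "advertised-alternate": 0.2,
    "conventional-md-mirror": 0.1,
}

# Assumed credit per outcome; "warning" stands for partial support.
OUTCOME_CREDIT: Dict[str, float] = {"pass": 1.0, "warning": 0.5, "fail": 0.0}


def weighted_score(outcomes: Dict[str, str]) -> float:
    """Sum weight * credit over all steps; steps with no outcome count as fail."""
    return sum(
        weight * OUTCOME_CREDIT[outcomes.get(step, "fail")]
        for step, weight in STEP_WEIGHTS.items()
    )


# Usage: a page with only an unadvertised .md mirror whose body is well formed.
mirror_only = weighted_score(
    {
        "markdown-representation": "pass",
        "markdown-format": "pass",
        "conventional-md-mirror": "warning",
    }
)
```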

Limitations

This check is page-local. It does not crawl a full docs corpus, validate every llms.txt link, compare full semantic equivalence between HTML and Markdown, or prove formal CommonMark/GFM conformance. Format validation is structural and conservative: it is designed to catch empty responses, plain text, raw HTML dumps, malformed frontmatter, and MDX/JSX source leakage.

References

Source: lib/checks/markdown-negotiation/versions/1.0.0/docs.md

6. Version Changelog

markdown-negotiation v1.0.0 Changelog

Initial versioned package for markdown-negotiation.

Source: lib/checks/markdown-negotiation/versions/1.0.0/changelog.md