1. Abstract
Expose modified and published dates for freshness-aware retrieval, citation, and ranking.
Freshness signals help agents, crawlers, and search systems decide whether content is current enough to cite, summarize, cache, or compare against newer sources.
2. Classification
- Check ID: content-freshness
- Check version: 1.0.0
- Package path: lib/checks/content-freshness/versions/1.0.0
- Category: AI Discoverability
- Subcategory: Content Readiness
- Check group: Page Structure
- Check group ID: page-structure
- Maturity: Established
- Scope: page
- Check weight: 1
3. Input And Output Contracts
- Input: [email protected]
- Output: [email protected]
- Resources inspected: Last-Modified, dateModified, datePublished, dateCreated, uploadDate, article:modified_time, dcterms.modified
4. Scoring Semantics
| Step ID | Title | Weight | Description |
|---|---|---|---|
| http-last-modified | Validate Last-Modified header | 25 | Checks whether the page response exposes a parseable HTTP Last-Modified date. |
| structured-dates | Validate structured freshness dates | 50 | Collects Schema.org freshness dates from JSON-LD, Microdata, and RDFa. |
| visible-meta-dates | Validate metadata freshness dates | 25 | Collects Open Graph, Dublin Core, and generic meta freshness dates. |
| sitemap-lastmod | Corroborate with sitemap lastmod | 0 | Discovers sitemaps from robots.txt and standard paths, then records matching page lastmod evidence when available. |
| date-consistency | Check date validity and consistency | 0 | Fails on invalid dates, future dates, or impossible date ordering such as dateModified before datePublished. |
5. Package Documentation
Content Freshness Signals Check v1.0.0
Status
- Version: 1.0.0
- Check identifier: content-freshness
- Input contract: [email protected]
- Output contract: [email protected]
- Scope: page
- Maturity: established
Abstract
This check validates whether a page exposes trustworthy content freshness signals for agents, crawlers, search systems, caches, and citation workflows. It collects dates from HTTP headers, Schema.org structured data, Open Graph, Dublin Core, generic metadata, and matching sitemap <lastmod> entries.
Schema.org freshness dates are syntax-neutral in this version. JSON-LD, Microdata, and RDFa all count when they expose valid properties such as datePublished, dateModified, dateCreated, or uploadDate.
Motivation
Agents need to decide whether a page is current enough to cite, summarize, compare, cache, or use as operational guidance. A page with clear published and updated dates is easier to trust than a page with no date, invalid dates, conflicting dates, or dates that describe unrelated events.
Freshness is not one single standard. It is a layered evidence model. HTTP validators describe representation freshness, structured data describes content freshness, and metadata helps platforms extract the same facts consistently.
Normative Model
The check recognizes these primary models:
- HTTP Last-Modified from RFC 9110 for the selected representation's last modification time.
- Schema.org datePublished, dateModified, dateCreated, and uploadDate on content-bearing entities.
- Open Graph article dates: article:published_time, article:modified_time, and article:expiration_time.
- Dublin Core terms: dc.date, dcterms.created, dcterms.issued, and dcterms.modified.
- Sitemap <lastmod> entries are fetched as corroborating page freshness signals when the scanned URL appears in a discovered sitemap. Robots.txt is used only to discover Sitemap: locations.
- Feed dates are recognized as related corroborating signals, but this version does not fetch feed resources.
The check treats dateModified earlier than datePublished, invalid dates, and future-dated freshness claims as correctness failures.
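The three correctness failures named above can be illustrated with a minimal sketch. It is not the package's implementation: the function names, the issue labels, and the five-minute clock-skew allowance are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumed allowance; the check's real clock-skew value is not documented here.
CLOCK_SKEW = timedelta(minutes=5)


def parse_iso(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 date or date-time; return None when unparseable."""
    try:
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat naive values as UTC so later comparisons are well defined.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def find_date_issues(
    published: Optional[str], modified: Optional[str], now: datetime
) -> List[str]:
    """Return hypothetical issue labels for invalid, future, or misordered dates."""
    issues: List[str] = []
    usable = {}
    for name, raw in (("datePublished", published), ("dateModified", modified)):
        if raw is None:
            continue
        value = parse_iso(raw)
        if value is None:
            issues.append(f"invalid:{name}")
        elif value > now + CLOCK_SKEW:
            issues.append(f"future:{name}")
        else:
            usable[name] = value
    # Ordering is only checked when both dates are usable.
    if "datePublished" in usable and "dateModified" in usable:
        if usable["dateModified"] < usable["datePublished"]:
            issues.append("ordering:dateModified<datePublished")
    return issues
```

Passing `now` explicitly keeps the sketch deterministic; a real run would likely use the scan time.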
Applicability
This check applies to public pages whose usefulness depends on recency, including articles, blog posts, documentation, API docs, changelogs, product documentation, policy pages, tutorials, news, support articles, and high-value evergreen resources.
The check may not apply to static app/product homepages that expose SoftwareApplication, WebApplication, or MobileApplication schema but do not present editorial/update content and do not claim freshness.
Pass Criteria
A page passes when:
- It exposes a valid HTTP Last-Modified header when the representation timestamp can be reliably determined.
- It exposes at least one valid structured freshness date through JSON-LD, Microdata, or RDFa.
- It exposes at least one metadata freshness date through Open Graph, Dublin Core, or a clear meta date field.
- All dates are parseable.
- No freshness date is future-dated beyond a small clock-skew allowance.
- Modified/uploaded dates are not earlier than published/issued dates.
- Matching sitemap <lastmod> is valid when present. It is corroborating evidence, not a required pass condition.
Warnings are emitted when:
- Only one or two freshness layers are present.
- Last-Modified is absent but page-level dates exist.
- Sitemap <lastmod> and structured modified/upload dates do not roughly agree.
- Date semantics are weak, such as generic meta name="date" without clearer published/modified labeling.
Failures are emitted when:
- A freshness-sensitive page has no usable date signals.
- Any collected freshness date is invalid.
- Any collected freshness date is future-dated.
- A modified/uploaded date is earlier than a published/issued date.
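The pass, warning, and failure boundaries above can be summarized in a rough sketch. It deliberately simplifies: it ignores the applicability rules, the weak-semantics warning, and the sitemap agreement warning, and the function name and return labels are assumptions rather than the check's actual output.

```python
from typing import List


def classify_outcome(
    has_http: bool, has_structured: bool, has_meta: bool, issues: List[str]
) -> str:
    """Map layer presence and correctness issues to a coarse outcome label."""
    layers = sum([has_http, has_structured, has_meta])
    # Any invalid, future, or misordered date fails, as does having no signals.
    if issues or layers == 0:
        return "fail"
    # One or two layers present: usable, but worth a warning.
    if layers < 3:
        return "warn"
    return "pass"
```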
Evidence Model
The check emits:
- lastModified: raw HTTP Last-Modified value.
- httpDate: parsed HTTP date evidence.
- structuredDates: syntax-neutral Schema.org dates from JSON-LD, Microdata, and RDFa.
- structuredDateFormats: counts by structured syntax.
- metaDates: Open Graph, Dublin Core, and generic meta dates.
- timeDates: retained as an empty compatibility field; visible <time datetime> dates are not collected as freshness evidence in this version.
- sitemapDate: matching sitemap <lastmod> for the scanned page when found.
- sitemap: robots.txt discovery details, sitemap fetch attempts, parsed sitemap summaries, and matched URL evidence.
- validDateCount and invalidDateCount.
- issues: invalid, future, ordering, or consistency findings.
Each date record includes:
- source syntax or channel.
- property name.
- raw value.
- parsed ISO value when parseable.
- validity flag.
- schema types when available.
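One way to picture a date record is as a small data class with one field per item listed above. The field names (`source`, `property`, `raw`, `iso`, `valid`, `schema_types`) and the `make_record` helper are hypothetical; the actual serialized shape is defined by the output contract, not by this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DateRecord:
    source: str                 # syntax or channel, e.g. "json-ld" or "open-graph"
    property: str               # property name, e.g. "dateModified"
    raw: str                    # value exactly as found on the page
    iso: Optional[str]          # normalized ISO value, or None when unparseable
    valid: bool                 # True when the raw value parsed
    schema_types: List[str] = field(default_factory=list)


def make_record(
    source: str, prop: str, raw: str, schema_types: Optional[List[str]] = None
) -> DateRecord:
    """Build a record, normalizing parseable values to UTC ISO 8601."""
    try:
        parsed = datetime.fromisoformat(raw.strip().replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        iso = parsed.astimezone(timezone.utc).isoformat()
    except ValueError:
        iso = None
    return DateRecord(source, prop, raw, iso, iso is not None, list(schema_types or []))
```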
Validation And Scoring Steps
- Validate Last-Modified.
  - Weight: 25.
  - Counts when the header is present and parseable.
- Validate structured freshness dates.
  - Weight: 50.
  - Counts when JSON-LD, Microdata, or RDFa exposes a valid Schema.org freshness date.
- Validate metadata freshness dates.
  - Weight: 25.
  - Counts when Open Graph, Dublin Core, or generic meta dates expose valid freshness dates.
- Corroborate with sitemap <lastmod>.
  - Guardrail/corroboration step.
  - Discovers sitemaps from robots.txt Sitemap: lines and /sitemap.xml.
  - Follows a bounded number of sitemap-index children.
  - Records a matching page <lastmod> when found.
  - Does not fail merely because sitemap evidence is absent.
- Check date validity and consistency.
  - Guardrail step.
  - Invalid, future, or impossible date ordering failures override partial scoring.
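The weighting above can be sketched as a simple sum over the three weighted steps, with the guardrail applied last. The documentation says guardrail failures override partial scoring but does not state the resulting score, so returning 0 here is an assumption, as are the names `STEP_WEIGHTS` and `score_page`.

```python
from typing import Iterable, List

# Weights taken from the scoring table; the two guardrail steps carry weight 0.
STEP_WEIGHTS = {
    "http-last-modified": 25,
    "structured-dates": 50,
    "visible-meta-dates": 25,
}


def score_page(passed_steps: Iterable[str], guardrail_issues: List[str]) -> int:
    """Sum the weights of passed steps unless a guardrail issue overrides them."""
    if guardrail_issues:
        # Assumption: an override collapses the score to zero.
        return 0
    passed = set(passed_steps)
    return sum(weight for step, weight in STEP_WEIGHTS.items() if step in passed)
```

For example, a page with valid structured and metadata dates but no Last-Modified header would score 75 under these assumptions.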
Standard Behavior
HTTP Last-Modified is a standards-backed representation freshness signal. It should describe when the selected representation was last modified and should not be later than the response date.
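As a rough illustration, the header can be parsed with Python's standard `email.utils.parsedate_to_datetime` and compared against the response `Date` header. The function names and the returned dictionary shape are assumptions for this sketch, not the check's own evidence format.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_http_date(value: str) -> Optional[datetime]:
    """Parse an HTTP date string; return None when it cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def check_last_modified(headers: Dict[str, str]) -> dict:
    """Validate Last-Modified and ensure it is not later than the Date header."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("last-modified")
    if raw is None:
        return {"lastModified": None, "httpDate": None, "valid": False}
    modified = parse_http_date(raw)
    response_date = parse_http_date(lowered["date"]) if "date" in lowered else None
    valid = modified is not None and (
        response_date is None or modified <= response_date
    )
    return {
        "lastModified": raw,
        "httpDate": modified.isoformat() if modified else None,
        "valid": valid,
    }
```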
Schema.org dates are vocabulary-backed structured content signals. JSON-LD is common, but it is not exclusive. Microdata and RDFa are valid structured-data syntaxes and must count when they expose equivalent Schema.org properties.
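The sketch below shows one way to collect Schema.org freshness properties from JSON-LD using only the Python standard library. It covers JSON-LD alone; Microdata and RDFa would need their own extractors, which are omitted here. The class, function, and key names are illustrative assumptions.

```python
import json
from html.parser import HTMLParser
from typing import List

FRESHNESS_PROPS = ("datePublished", "dateModified", "dateCreated", "uploadDate")


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def _walk(node, found: List[dict]) -> None:
    """Recursively visit dicts and lists, recording freshness properties."""
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        elif not isinstance(types, list):
            types = []
        for prop in FRESHNESS_PROPS:
            value = node.get(prop)
            if isinstance(value, str):
                found.append({"property": prop, "raw": value, "schemaTypes": types})
        for value in node.values():
            _walk(value, found)
    elif isinstance(node, list):
        for item in node:
            _walk(item, found)


def extract_jsonld_dates(html: str) -> List[dict]:
    collector = JsonLdCollector()
    collector.feed(html)
    found: List[dict] = []
    for block in collector.blocks:
        try:
            _walk(json.loads(block), found)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks rather than failing the whole page.
    return found
```

Walking nested values means dates inside `@graph` arrays or embedded entities are also picked up.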
Open Graph and Dublin Core are non-HTTP metadata vocabularies in wide production use. They are useful corroborating evidence, especially on editorial and CMS-backed pages.
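A minimal sketch of metadata collection, assuming the properties listed in the Normative Model plus a generic `date` meta name. The mapping, the channel labels, and the function names are hypothetical, and the real check may recognize additional keys.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical mapping from meta key to the channel it belongs to.
META_DATE_KEYS = {
    "article:published_time": "open-graph",
    "article:modified_time": "open-graph",
    "article:expiration_time": "open-graph",
    "dc.date": "dublin-core",
    "dcterms.created": "dublin-core",
    "dcterms.issued": "dublin-core",
    "dcterms.modified": "dublin-core",
    "date": "generic",
}


class MetaDateCollector(HTMLParser):
    """Collect <meta> tags whose property or name is a known date key."""

    def __init__(self) -> None:
        super().__init__()
        self.dates: List[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Open Graph uses property=..., Dublin Core and generic dates use name=...
        key = (attributes.get("property") or attributes.get("name") or "").lower()
        content = attributes.get("content")
        if key in META_DATE_KEYS and content:
            self.dates.append(
                {"channel": META_DATE_KEYS[key], "property": key, "raw": content}
            )


def extract_meta_dates(html: str) -> List[dict]:
    collector = MetaDateCollector()
    collector.feed(html)
    return collector.dates
```

Keys are lowercased before matching because sites emit both `DCTERMS.modified` and `dcterms.modified`.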
Sitemap <lastmod> is a standards-backed sitemap field for the linked URL's last modification date. In this check it is used only when the scanned page's URL is found in the sitemap. The Last-Modified header on robots.txt or the sitemap file itself does not count as page freshness.
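The discovery and matching behavior can be sketched with the standard library. Fetching, sitemap-index traversal, and its bounds are left out, and the trailing-slash normalization used to match URLs is an assumption rather than the check's documented rule. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls_from_robots(robots_txt: str) -> List[str]:
    """Return the values of every Sitemap: line in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def find_lastmod(sitemap_xml: str, page_url: str) -> Optional[str]:
    """Return the raw <lastmod> for page_url, or None when there is no match."""
    root = ET.fromstring(sitemap_xml)
    target = page_url.rstrip("/")  # Assumed normalization: ignore trailing slash.
    for url in root.iter(SITEMAP_NS + "url"):
        loc = (url.findtext(SITEMAP_NS + "loc") or "").strip()
        if loc.rstrip("/") == target:
            lastmod = url.findtext(SITEMAP_NS + "lastmod")
            return lastmod.strip() if lastmod else None
    return None
```

Only a matching `<url>` entry yields evidence; headers on robots.txt or on the sitemap file itself are never consulted.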
Non-Standard And Real-World Behavior
Real sites often mix layers:
- News and blog pages commonly expose Article JSON-LD plus Open Graph article dates.
- CMS templates often emit Dublin Core or generic meta dates.
- Some pages expose Microdata or RDFa instead of JSON-LD.
- Some servers omit Last-Modified even when page-level dates are available.
- Some sites publish accurate page-level <lastmod> in XML sitemaps and advertise sitemap locations from robots.txt.
- Some pages expose build timestamps, event dates, copyright years, or generated-at times that should not be treated as content freshness.
This version accepts multiple date syntaxes but still requires dates to be parseable and semantically plausible.
Non-Goals And Limitations
This check does not:
- Prove that a page was actually changed on the claimed date.
- Fetch feeds to corroborate page dates.
- Detect every possible natural-language date in visible text.
- Treat copyright years as freshness signals.
- Treat visible <time datetime>, event dates, course start dates, sale dates, or unrelated sidebar dates as page freshness unless they are exposed through recognized structured or metadata freshness properties.
- Require Last-Modified on pages where the origin cannot reliably determine representation modification time.
References
- www.rfc-editor.org/rfc/rfc9110.html
- html.spec.whatwg.org/multipage/text-level-semantics.html#the-time-element
- schema.org/dateModified
- schema.org/datePublished
- schema.org/CreativeWork
- ogp.me/
- www.dublincore.org/specifications/dublin-core/dcmi-terms
- www.sitemaps.org/protocol.html
- developers.google.com/search/docs/appearance/publication-dates
- developers.google.com/search/docs/appearance/structured-data/article
Source: lib/checks/content-freshness/versions/1.0.0/docs.md
6. Version Changelog
content-freshness v1.0.0 Changelog
- Migrated runtime ownership into lib/checks/content-freshness/versions/1.0.0.
- Added version-aware logging through the versioned check package boundary.
- Expanded structured freshness extraction from JSON-LD only to syntax-neutral Schema.org dates from JSON-LD, Microdata, and RDFa.
- Added Open Graph article dates, Dublin Core dates, and generic meta date evidence.
- Stopped collecting visible time[datetime] values as freshness evidence because visible page dates may describe publication, events, cards, or other non-modified dates.
- Added bounded sitemap <lastmod> corroboration using robots.txt Sitemap: discovery and /sitemap.xml.
- Added date validity checks for invalid dates, future-dated freshness claims, and modified dates earlier than published dates.
- Expanded versioned documentation with standards, platform guidance, real-world behavior, pass/warn/fail boundaries, evidence, and limitations.
Source: lib/checks/content-freshness/versions/1.0.0/changelog.md