1. Abstract
Expose machine-readable page entities and relationships through a recognized structured-data syntax.
Structured data gives agents explicit entities, relationships, and page meaning that are harder to infer reliably from visual layout alone.
2. Classification
- Check ID: structured-data
- Check version: 1.0.0
- Package path: lib/checks/structured-data/versions/1.0.0
- Category: AI Discoverability
- Subcategory: Content Readiness
- Check group: Page Structure
- Check group ID: page-structure
- Maturity: Established
- Scope: page
- Check weight: 1
3. Input And Output Contracts
- Input: [email protected]
- Output: [email protected]
- Resources inspected: JSON-LD, Microdata, RDFa
4. Scoring Semantics
| Step ID | Title | Weight | Description |
|---|---|---|---|
| format-presence | Recognized structured data format | 0.3 | Detect JSON-LD, Microdata, or RDFa structured data in the page HTML. |
| parseability | Parseability | 0.25 | Validate that detected structured data can be parsed without fatal syntax errors. |
| entity-typing | Schema.org entity typing | 0.25 | Confirm that detected structured data exposes typed entities rather than empty markup. |
| format-consistency | Format consistency | 0.2 | Warn when multiple syntaxes are mixed and fail when the same entity conflicts across syntaxes. |
| page-relevant-schema-family | Page-relevant schema family | 0.2 | When visible page intent is specific enough, verify structured data includes a matching primary Schema.org family. |
| minimum-useful-schema-fields | Minimum useful schema fields | 0.15 | Validate minimum fields for detected primary schema families without requiring unrelated schema types. |
| schema-policy-recommendations | Schema policy recommendations | 0.05 | Warn when detected primary schema misses search feature recommended fields. |
| supporting-schema-linkage | Supporting schema linkage | 0.1 | Check whether primary schema is connected to supporting entities such as Organization, WebSite, BreadcrumbList, Offer, Person, or ImageObject. |
5. Package Documentation
Structured Data Check v1.0.0
Status
- Version: 1.0.0
- Check identifier: structured-data
- Input contract: [email protected]
- Output contract: [email protected]
- Scope: page
Abstract
This check verifies whether a page exposes machine-readable structured data through JSON-LD, Microdata, or RDFa. It also performs the schema-family validation that used to live in separate long-tail schema checks.
The active structured-data check now covers 15 merged schema families:
- faq-page-schema
- breadcrumb-schema
- article-schema
- product-schema
- software-application-schema
- local-business-schema
- event-schema
- media-schema
- qa-page-schema
- discussion-forum-schema
- profile-page-schema
- course-schema
- job-posting-schema
- recipe-schema
- service-schema
author-attribution and organization-website-schema remain separate top-level checks because they measure site trust and publisher identity, which go beyond page-family schema validation.
Motivation
Agents, search engines, and semantic parsers can infer some meaning from visible HTML, but structured data gives them explicit entities, relationships, identifiers, names, URLs, and content types. JSON-LD, Microdata, and RDFa are different syntaxes for expressing that machine-readable model. A robust page can use any of them, but mixed or contradictory implementations are harder to debug and can cause different consumers to extract different facts.
The long-tail schema checks were merged here because exposing every family as a standalone report check inflated the suite count and made schema disproportionately affect the overall score. Inside this check, family-specific schema validation still produces evidence, but only applies when the visible page intent or existing markup makes the family relevant.
Normative Model
The check recognizes:
- JSON-LD blocks using script[type="application/ld+json"].
- Microdata entities using itemscope, itemtype, and itemprop.
- RDFa subjects using typeof, property, resource, and about.
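As an illustration of the recognition rules above, the following minimal sketch detects which syntaxes appear in a page using only the Python standard library. The names FormatDetector and detect_formats are hypothetical, not taken from the package. The sketch also assumes that Microdata is flagged only on itemscope and RDFa only on typeof, so that attributes such as property on Open Graph meta tags are not counted; the actual check may use different triggers.

```python
from html.parser import HTMLParser


class FormatDetector(HTMLParser):
    """Collects which structured-data syntaxes appear in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute names arrive lowercased
        script_type = (attr.get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self.found.add("json-ld")
        # Assumption: an entity-opening attribute is required, not just a property.
        if "itemscope" in attr:
            self.found.add("microdata")
        if "typeof" in attr:
            self.found.add("rdfa")


def detect_formats(html: str) -> list:
    """Return the detected syntax names in sorted order."""
    detector = FormatDetector()
    detector.feed(html)
    return sorted(detector.found)
```

An empty result corresponds to the case where no recognized structured-data syntax is found.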
The check treats JSON-LD as the preferred implementation format for new Schema.org markup because current search platform guidance recommends JSON-LD where possible. Microdata and RDFa remain accepted when they expose valid, typed, consistent entities.
Types are normalized by removing https://schema.org/ and schema: prefixes.
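A minimal sketch of this normalization, assuming a hypothetical helper named normalize_type. Only the two prefixes stated above are stripped; whether the check also handles variants such as an http:// prefix is not stated here, so the sketch leaves them untouched.

```python
SCHEMA_PREFIXES = ("https://schema.org/", "schema:")


def normalize_type(raw: str) -> str:
    """Strip a known Schema.org prefix from a type name, if present."""
    value = raw.strip()
    for prefix in SCHEMA_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value
```

With this helper, https://schema.org/Product and schema:Product both compare equal to a bare Product, which is what allows types from JSON-LD, Microdata, and RDFa to be matched against each other.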
Applicability
The structured-data syntax checks apply to public HTML pages where machine-readable page meaning, identity, content type, commerce, editorial metadata, navigation, or local/business facts would help agents understand the page.
The merged schema-family checks are stricter. A family is applicable only when:
- visible page text contains enough specific intent hints for that family, or
- the page already declares one of that family's Schema.org types.
If neither condition is true, the family is recorded as not_applicable and does not affect the check score.
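The applicability rule can be sketched as a small predicate. This is an illustration under assumptions: the function name, the substring matching, and the min_hint_matches threshold of 2 are hypothetical, because the text above only says the page must contain "enough" specific intent hints.

```python
def is_family_applicable(visible_text, declared_types, intent_hints,
                         expected_types, min_hint_matches=2):
    """Return True when a schema family should be evaluated for this page."""
    text = visible_text.lower()
    # Condition 1: enough intent hints appear in the visible text.
    matches = sum(1 for hint in intent_hints if hint in text)
    intent_matched = matches >= min_hint_matches
    # Condition 2: the page already declares one of the family's types.
    already_declared = bool(set(declared_types) & set(expected_types))
    return intent_matched or already_declared


RECIPE_HINTS = ("recipe", "ingredients", "cook time", "prep time",
                "nutrition", "instructions")
status = (
    "applicable"
    if is_family_applicable("Soup recipe with ingredients", set(),
                            RECIPE_HINTS, {"Recipe"})
    else "not_applicable"
)
```

A family that returns False here would be recorded as not_applicable and left out of the score.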
Schema Family Matrix
| Merged family | Applicable when visible page intent includes | Expected Schema.org types | Pass | Fail | Warning |
|---|---|---|---|---|---|
| FAQPage schema | FAQ or question/answer content | FAQPage | Applicable family has FAQPage and useful fields such as mainEntity. | FAQ intent is detected but no FAQPage family is present. | Schema is present but minimum useful fields or supporting linkage are incomplete. |
| Breadcrumb schema | Breadcrumb or breadcrumb navigation text | BreadcrumbList | Applicable family has BreadcrumbList and useful fields such as itemListElement. | Breadcrumb intent is detected but no BreadcrumbList is present. | Schema is present but field/linkage quality is incomplete. |
| Article schema | Article, blog, news, guide, report, author, byline, published, or updated signals | Article, BlogPosting, NewsArticle, Report, Review | Applicable family has a matching Article-family type and minimum fields such as headline, author, and datePublished. | Editorial intent is detected but no matching Article-family type is present. | Matching schema exists but minimum fields or supporting linkage are incomplete. |
| Product schema | Product, price, pricing, buy, cart, checkout, SKU, stock, availability, or offer signals | Product, ProductGroup | Applicable family has Product schema and useful fields such as name, description, and offers. | Commerce intent is detected but no Product/ProductGroup schema is present. | Product schema exists but field/linkage quality is incomplete. |
| SoftwareApplication schema | Software, app, SaaS, platform, API, SDK, integration, download, or app-store signals | SoftwareApplication, WebApplication, MobileApplication | Applicable family has matching software schema and useful fields such as name, applicationCategory, and operatingSystem. | Software intent is detected but no matching application schema is present. | Matching schema exists but field/linkage quality is incomplete. |
| LocalBusiness schema | Address, hours, directions, location, restaurant, store, clinic, office, or near-me signals | LocalBusiness | Applicable family has LocalBusiness and useful fields such as name, address, and telephone. | Local-business intent is detected but no LocalBusiness schema is present. | Schema exists but field/linkage quality is incomplete. |
| Event schema | Event, webinar, conference, ticket, venue, starts, schedule, or agenda signals | Event, BroadcastEvent | Applicable family has Event schema and useful fields such as name, startDate, and location. | Event intent is detected but no Event/BroadcastEvent schema is present. | Event schema exists but field/linkage quality is incomplete. |
| Media schema | Video, watch, transcript, duration, thumbnail, or upload-date signals | ImageObject, VideoObject, Clip | Applicable family has matching media schema and useful media fields. | Media intent is detected but no matching media schema is present. | Media schema exists but field/linkage quality is incomplete. |
| QAPage schema | Question/answer, asked, answered, or FAQ-like signals | QAPage | Applicable family has QAPage and useful fields such as mainEntity. | Q&A intent is detected but no QAPage schema is present. | Schema exists but field/linkage quality is incomplete. |
| DiscussionForumPosting schema | Thread, forum, discussion, reply, replies, or posts signals | DiscussionForumPosting | Applicable family has discussion schema and useful fields such as headline, author, and datePublished. | Discussion intent is detected but no DiscussionForumPosting schema is present. | Schema exists but field/linkage quality is incomplete. |
| ProfilePage schema | Profile, author bio, about me, person, member, or team-member signals | ProfilePage | Applicable family has ProfilePage and useful fields such as mainEntity. | Profile intent is detected but no ProfilePage schema is present. | Schema exists but field/linkage quality is incomplete. |
| Course schema | Course, lesson, curriculum, enroll, instructor, or syllabus signals | Course | Applicable family has Course and useful fields such as name, description, and provider. | Course intent is detected but no Course schema is present. | Course schema exists but field/linkage quality is incomplete. |
| JobPosting schema | Job, career, role, employment, salary, apply now, hiring, or job-location signals | JobPosting | Applicable family has JobPosting and useful fields such as title, hiringOrganization, and jobLocation. | Job intent is detected but no JobPosting schema is present. | Job schema exists but field/linkage quality is incomplete. |
| Recipe schema | Recipe, ingredients, cook time, prep time, nutrition, or instructions signals | Recipe | Applicable family has Recipe and useful fields such as name, recipeIngredient, and recipeInstructions. | Recipe intent is detected but no Recipe schema is present. | Recipe schema exists but field/linkage quality is incomplete. |
| Service schema | Commerce/service, price, pricing, offer, product, buy, cart, or checkout signals where a service is represented | Service | Applicable family has Service and useful fields such as name and provider. | Service/commerce intent is detected but no Service schema is present. | Service schema exists but field/linkage quality is incomplete. |
Step Results
The check emits seven validation steps:
- Recognized structured data format.
  - Weight: 0.30
  - Pass: JSON-LD, Microdata, or RDFa is present.
  - Fail: no recognized structured-data syntax is found.
- Parseability.
  - Weight: 0.25
  - Pass: detected structured data parses into entities.
  - Fail: detected structured data has fatal parse issues.
  - Not applicable: no structured-data syntax was found.
- Schema.org entity typing.
  - Weight: 0.25
  - Pass: at least one extracted entity has an explicit type.
  - Fail: structured data exists but no typed Schema.org entity is extracted.
  - Not applicable: no structured-data syntax was found.
- Format consistency.
  - Weight: 0.20
  - Pass: one syntax is used, or duplicated entities agree.
  - Warning: multiple syntaxes are mixed without detected same-entity conflicts.
  - Fail: duplicated entities conflict on id, name, or url.
  - Not applicable: no structured-data syntax was found.
- Page-relevant schema family.
  - Weight: 0.20
  - Pass: specific visible page intent has a matching merged schema family.
  - Fail: specific visible page intent is detected but no matching family is present.
  - Not applicable: no eligible page-family intent or existing merged-family schema is detected.
- Minimum useful schema fields.
  - Weight: 0.15
  - Pass: best matching family node has at least 75% of the minimum useful fields.
  - Warning: some minimum fields are present but coverage is below 75%.
  - Fail: a matching family node exists but none of its minimum useful fields are present.
  - Not applicable: no eligible primary or page-relevant family node is present.
- Supporting schema linkage.
  - Weight: 0.10
  - Pass: supporting entities or linked @id references are present.
  - Warning: primary/page-relevant schema exists without supporting linkage.
  - Not applicable: no eligible primary or page-relevant family node is present.
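The 75% threshold for the minimum useful schema fields step can be illustrated with a short sketch. The function name and the rule that empty strings, lists, and objects count as missing are assumptions made for the example.

```python
EMPTY_VALUES = (None, "", [], {})


def field_coverage_status(node, minimum_fields):
    """Classify a family node by how many minimum useful fields it carries."""
    if node is None or not minimum_fields:
        return "not_applicable"
    # Assumption: a field counts as present only when its value is non-empty.
    present = [f for f in minimum_fields if node.get(f) not in EMPTY_VALUES]
    coverage = len(present) / len(minimum_fields)
    if coverage >= 0.75:
        return "pass"
    if present:
        return "warning"
    return "fail"


article_fields = ["headline", "author", "datePublished"]
partial_article = {"@type": "Article", "headline": "Hello", "author": "A. Writer"}
partial_status = field_coverage_status(partial_article, article_fields)
```

In this example two of three fields are present, which is about 67% coverage and therefore below the pass threshold.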
not_applicable and informational steps are excluded from step-score denominators.
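One way to read this rule is as a weighted average over the steps that actually produced a verdict. The sketch below is illustrative only: the function name is hypothetical, and the half credit given to warnings is an assumption, since how warnings contribute to the score is not stated here.

```python
# Assumption: warnings earn half credit; the real value is not documented here.
CREDIT = {"pass": 1.0, "warning": 0.5, "fail": 0.0}


def step_score(steps):
    """Weighted score over scored steps, or None when nothing was scored."""
    # not_applicable and informational steps have no entry in CREDIT,
    # so they drop out of both the numerator and the denominator.
    scored = [s for s in steps if s["status"] in CREDIT]
    total_weight = sum(s["weight"] for s in scored)
    if total_weight == 0:
        return None
    earned = sum(s["weight"] * CREDIT[s["status"]] for s in scored)
    return earned / total_weight


example_steps = [
    {"id": "format-presence", "weight": 0.30, "status": "pass"},
    {"id": "parseability", "weight": 0.25, "status": "pass"},
    {"id": "entity-typing", "weight": 0.25, "status": "pass"},
    {"id": "format-consistency", "weight": 0.20, "status": "warning"},
    {"id": "page-relevant-schema-family", "weight": 0.20, "status": "not_applicable"},
]
example_score = step_score(example_steps)
```

Because the not_applicable step is dropped from the denominator, a page with no family intent is not penalized for lacking family schema.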
Evidence Model
The result evidence includes:
- formats: per-format presence, validity, count, errors, types, and extracted entity summaries.
- formatsFound: detected syntaxes.
- primaryFormat: one of json-ld, microdata, rdfa, mixed, or none.
- mixedFormats: whether more than one syntax was found.
- entityCount: number of extracted entity summaries.
- schemaTypes: normalized Schema.org type names found across syntaxes.
- entities: capped entity summaries with format provenance.
- conflicts: same-entity property conflicts across syntaxes.
- schemaFamilies: one row per merged schema family with id, title, status, applicable, intentMatched, expectedTypes, and presentTypes.
Each entity summary records:
- format
- source
- types
- id
- name
- url
- selected extracted properties
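For readers who want a concrete picture of the entity summary, here is a hypothetical Python rendering of its shape. The field types and the meaning of source are assumptions. The rule that primaryFormat becomes mixed whenever more than one syntax is found is also an assumption, inferred from the listed values rather than confirmed by the package.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EntitySummary:
    """Illustrative shape of one extracted entity summary."""
    format: str                      # "json-ld", "microdata", or "rdfa"
    source: str                      # assumed: where in the page it was found
    types: List[str]
    id: Optional[str] = None
    name: Optional[str] = None
    url: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


def primary_format(formats_found):
    """Derive a primaryFormat value from the detected syntaxes (assumed rule)."""
    unique = set(formats_found)
    if not unique:
        return "none"
    if len(unique) > 1:
        return "mixed"
    return next(iter(unique))
```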
Standard Behavior
JSON-LD is parsed from application/ld+json script blocks. The check flattens top-level arrays and @graph nodes and records @type, @id, name, headline, and url where available.
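The flattening described above can be sketched with the standard json module. The helper names flatten_jsonld and summarize_node are hypothetical. The sketch does no context expansion, and it assumes a json.loads failure is what would surface as a fatal parse issue.

```python
import json


def flatten_jsonld(raw):
    """Return the entity nodes from one application/ld+json block."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    top_level = data if isinstance(data, list) else [data]
    nodes = []
    for item in top_level:
        if not isinstance(item, dict):
            continue
        graph = item.get("@graph")
        if isinstance(graph, list):
            # Lift each @graph member to the same level as top-level nodes.
            nodes.extend(n for n in graph if isinstance(n, dict))
        else:
            nodes.append(item)
    return nodes


def summarize_node(node):
    """Record the fields the check reads, where available."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return {
        "types": types,
        "id": node.get("@id"),
        "name": node.get("name"),
        "headline": node.get("headline"),
        "url": node.get("url"),
    }
```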
Microdata is parsed from itemscope entities. The check reads Schema.org types from itemtype, stable identifiers from itemid, and simple properties from descendant itemprop attributes.
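A simplified illustration of this Microdata reading, built on the standard html.parser module. The class name is hypothetical. The sketch assumes well-formed nesting, reads property values from content, href, or src before falling back to element text, and does not link nested items to their parent, so it covers less than a full Microdata parser.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.entities = []
        self._open = []        # one slot per open element: its entity or None
        self._capture = None   # (entity, prop, depth, text parts)

    def _owner(self):
        # Nearest enclosing itemscope entity, if any.
        return next((e for e in reversed(self._open) if e is not None), None)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        owner = self._owner()
        entity = None
        if "itemscope" in attr:
            entity = {"types": (attr.get("itemtype") or "").split(),
                      "id": attr.get("itemid"), "properties": {}}
            self.entities.append(entity)
        is_void = tag in VOID_TAGS
        if not is_void:
            self._open.append(entity)
        prop = attr.get("itemprop")
        if prop and owner is not None and entity is None:
            value = attr.get("content") or attr.get("href") or attr.get("src")
            if value is not None:
                owner["properties"][prop] = value
            elif not is_void and self._capture is None:
                self._capture = (owner, prop, len(self._open), [])

    def handle_data(self, data):
        if self._capture is not None:
            self._capture[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        if self._capture is not None and self._capture[2] == len(self._open):
            owner, prop, _, parts = self._capture
            owner["properties"][prop] = "".join(parts).strip()
            self._capture = None
        self._open.pop()
```

Type values are kept as written in itemtype; prefix normalization would be applied as a separate step.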
RDFa is parsed from elements with typeof. The check reads types from typeof, stable identifiers from resource or about, and simple properties from descendant property attributes.
Non-Standard And Real-World Behavior
Real sites sometimes:
- Use JSON-LD for site identity and Microdata for products inherited from ecommerce templates.
- Leave old RDFa or Microdata fragments in templates after migrating to JSON-LD.
- Duplicate Organization, Product, or BreadcrumbList entities in more than one syntax.
- Emit property-only Microdata or RDFa fragments without a clear enclosing entity.
- Use incomplete JSON-LD blocks generated by tag managers or CMS plugins.
This version allows mixed syntaxes when extracted entities are consistent. It warns because one primary syntax is easier to maintain. It fails only when the same apparent entity has conflicting id, name, or url values across syntaxes.
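The pass, warning, and fail outcomes for mixed syntaxes can be sketched as follows. How the check decides that two entities are the "same apparent entity" is not specified here, so this sketch assumes entities from different formats match when they share at least one normalized type. The function names are hypothetical.

```python
from itertools import combinations

COMPARED_FIELDS = ("id", "name", "url")


def find_conflicts(entities):
    """List field-level disagreements between matching cross-format entities."""
    conflicts = []
    for left, right in combinations(entities, 2):
        if left["format"] == right["format"]:
            continue
        # Assumption: a shared type means the same apparent entity.
        shared = set(left["types"]) & set(right["types"])
        if not shared:
            continue
        for name in COMPARED_FIELDS:
            a, b = left.get(name), right.get(name)
            if a and b and a != b:
                conflicts.append({
                    "types": sorted(shared),
                    "field": name,
                    "values": {left["format"]: a, right["format"]: b},
                })
    return conflicts


def consistency_status(entities):
    formats = {e["format"] for e in entities}
    if not formats:
        return "not_applicable"
    if find_conflicts(entities):
        return "fail"
    return "warning" if len(formats) > 1 else "pass"
```

A field that is missing on one side is not treated as a conflict in this sketch; only two differing, non-empty values are.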
Non-Goals And Limitations
This check does not fully expand JSON-LD contexts, execute JSON-LD framing, or validate every Schema.org property range. It performs pragmatic page-family diagnostics for agent-readiness scoring, not a full Google rich-result eligibility audit.
References
- schema.org/
- schema.org/docs/gs.html
- developers.google.com/search/docs/appearance/structured-data/intro-structured-data
- www.w3.org/TR/json-ld11
- html.spec.whatwg.org/multipage/microdata.html
- www.w3.org/TR/rdfa-core
- developers.google.com/search/docs/appearance/structured-data/sd-policies
- developers.google.com/search/docs/appearance/structured-data/search-gallery
Source: lib/checks/structured-data/versions/1.0.0/docs.md
6. Version Changelog
structured-data v1.0.0 Changelog
Initial versioned package for structured-data.
- Detects JSON-LD, Microdata, and RDFa.
- Reports the primary structured-data syntax and mixed-format usage.
- Emits typed entity summaries with format provenance.
- Warns when multiple syntaxes are mixed without conflicts.
- Fails when the same apparent entity conflicts across syntaxes.
Source: lib/checks/structured-data/versions/1.0.0/changelog.md