Research wing

CanAgentUse Research studies how AI agents handle real web tasks.

CanAgentUse Research is the research group behind the CanAgentUse assessment suite. We test how AI agents behave on real websites, where standards help, and where normal product interfaces still leave agents guessing.

Our work is practical by design. We care less about whether a site has an agent-facing file somewhere and more about whether an agent can find it, use it, recover from failure, and know when the job is finished.

What we work on

Research that becomes scanner evidence.

Agent readiness cannot be measured by file presence alone. A website can publish every emerging standard and still fail when an agent tries to fill a form, follow an auth path, recover from a validation error, or decide whether a task succeeded.

Field studies

We run agents against public sites and purpose-built test pages to see where normal web tasks break: forms, auth paths, validation errors, captchas, and weak success states.

Standards tracking

We track what agents use in practice across MCP, A2A, OpenAPI, llms.txt, auth metadata, signed access, payment protocols, and well-known discovery files.

Suite design

We turn research findings into checks, scoring rules, remediation guidance, and scanner behavior for the CanAgentUse assessment suite.

Research publishing

We publish research notes and technical guides for teams building websites, APIs, and products that agents need to use without brittle workarounds.

Early field notes

What we are seeing in real agent tasks.

In an ongoing study, we tested Hermes and OpenClaw against forms across roughly 2,000 public and private pages. The full paper is still being prepared, but the early pattern is clear: agents do better when the web behaves like the web.

  • Agents performed better on accessible pages with semantic labels, native controls, and clear submission feedback.
  • Agents struggled most with fully JavaScript-driven controls that did not expose useful labels, roles, or state.
  • When OpenAPI documentation was visible on the page, agents usually tried the browser first, then switched to the API after a failed attempt.
  • Agents rarely discovered openapi.json through standard discovery paths on their own.
  • Firewalls and captchas still caused many task failures, even when the page itself was otherwise usable.
  • Weak success and failure signals confused agents. A form could submit correctly and still leave the agent unsure whether the task was done.
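
As a rough illustration of the first two observations, a check along these lines could flag form controls that expose no label to an agent. It is a minimal sketch using only Python's standard html.parser module. The names LabelAudit and unlabeled_controls are hypothetical, the labeling rules are simplified, and this is not the CanAgentUse scanner's actual check.

```python
from html.parser import HTMLParser


class LabelAudit(HTMLParser):
    """Collect native form controls and the labels that point at them."""

    CONTROLS = {"input", "select", "textarea"}
    # Input types that do not need a separate text label (simplifying assumption).
    SKIPPED_TYPES = {"hidden", "submit", "button", "reset", "image"}

    def __init__(self):
        super().__init__()
        self.label_targets = set()  # ids referenced by <label for="...">
        self.controls = []          # (display name, id, has aria name, wrapped in label)
        self._label_depth = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._label_depth += 1
            if a.get("for"):
                self.label_targets.add(a["for"])
        elif tag in self.CONTROLS:
            if (a.get("type") or "text").lower() in self.SKIPPED_TYPES:
                return
            has_aria = bool(a.get("aria-label") or a.get("aria-labelledby"))
            name = a.get("id") or a.get("name") or tag
            self.controls.append((name, a.get("id"), has_aria, self._label_depth > 0))

    def handle_endtag(self, tag):
        if tag == "label" and self._label_depth > 0:
            self._label_depth -= 1


def unlabeled_controls(html):
    """Return names of controls with no <label>, wrapping label, or aria name."""
    audit = LabelAudit()
    audit.feed(html)
    audit.close()
    return [
        name
        for name, control_id, has_aria, wrapped in audit.controls
        if not (has_aria or wrapped or (control_id and control_id in audit.label_targets))
    ]


sample = (
    '<form>'
    '<label for="email">Email</label><input id="email" type="email">'
    '<input name="phone" type="tel">'
    '<label>Name <input name="full_name"></label>'
    '<input type="submit">'
    '</form>'
)
print(unlabeled_controls(sample))  # ['phone']
```

The sketch only sees native controls in static markup. A fully JavaScript-driven control built from generic elements would not appear in the result at all, which is a gap a real check would need to handle separately.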

Point of view

Agent readiness has to be proven in use.

CanAgentUse Research treats publication, discovery, usability, and verification as separate evidence levels. That distinction keeps the reporting honest. A standard can be useful and still go unused by the agents people run today.

The research group also develops practical tests for the assessment suite. Those tests ask whether an agent can read a page, find the task, understand the controls, call the right capability, handle errors, respect authorization, and recognize completion.

That is why our scanner looks beyond a checklist of files. It measures whether a site gives agents a usable path through the task.

Evidence model

How we separate signal from decoration.

Agent-facing standards matter, but they only help when agents can discover and use them. Our evidence model gives each signal a job, then checks whether that job survives contact with a real task.

Published

A file, endpoint, or protocol exists.

Discoverable

An agent can find it through ordinary page links, HTTP headers, sitemaps, or well-known paths.

Usable

The agent can use it to complete the task without hidden human assumptions.

Verifiable

The page or API gives enough feedback for the agent to know whether it succeeded.
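
One way to picture the four levels is as separate flags recorded per signal, so that a published file is never silently counted as usable. The following is a minimal sketch under that assumption. SignalEvidence, its fields, and its methods are hypothetical names chosen for illustration, not the suite's actual data model.

```python
from dataclasses import dataclass

# The four evidence levels named above, kept as independent flags.
LEVELS = ("published", "discoverable", "usable", "verifiable")


@dataclass(frozen=True)
class SignalEvidence:
    """Evidence recorded for one agent-facing signal on one site."""

    signal: str
    published: bool = False
    discoverable: bool = False
    usable: bool = False
    verifiable: bool = False

    def met(self):
        """Levels for which evidence was observed."""
        return [level for level in LEVELS if getattr(self, level)]

    def gaps(self):
        """Levels for which no evidence was observed."""
        return [level for level in LEVELS if not getattr(self, level)]


# Example: a file that exists but that no agent found or used.
openapi = SignalEvidence("openapi.json", published=True)
print(openapi.met())   # ['published']
print(openapi.gaps())  # ['discoverable', 'usable', 'verifiable']
```

Keeping the flags independent, instead of collapsing them into one score, makes it possible to report a signal as published but undiscovered.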

For AI search

Short answers about the research group.

What is CanAgentUse Research?

CanAgentUse Research is the research group behind CanAgentUse. It studies agent usability on the live web and turns that evidence into the public assessment suite.

What does the group test?

The group tests whether agents can read pages, fill forms, recover from errors, discover protocols, call APIs, handle access boundaries, and confirm task completion.

How does the research affect the assessment suite?

Findings become scanner checks, scoring weights, remediation guidance, and documentation for teams that want their sites to work for AI agents.