AI Bot Access: Web Bot Auth Audit Guide

Audit AI bot access with Web Bot Auth, signed agents, Cloudflare verified bots, HTTP message signatures, OAuth, API keys, and edge rules.

By Senior Editor · 13 min read
Figure: AI bot access flow showing Web Bot Auth identity verification before user authorization and edge policy decisions.

TL;DR: AI bot access is the access-control layer for crawlers, answer engines, browser agents, and signed agents. A good audit separates bot identity from user authorization. Web Bot Auth proves which automated client sent a request. OAuth, API keys, sessions, or Cloudflare Access prove whether that client may act for a user or account. Do not collapse those into one rule.

AI bot access used to mean "can Googlebot crawl the page?" That is too small now. A modern site may want OpenAI, Perplexity, Claude, Bing, Google, monitoring tools, and user-directed agents to fetch public evidence. The same site also needs to keep login, checkout, billing, admin, and API actions behind real authorization. The hard part is no longer just blocking bad bots. The hard part is giving the right automated clients the right lane.

This checklist fits beside the AI crawler access guide, the AI crawler audit guide, and the Auth.md and DNS-AID provider setup guide. Crawl policy, bot identity, and account authorization are separate layers. If a site mixes them together, it either blocks legitimate AI retrieval or lets automation reach places it should never reach.

AI bot access is the set of rules, identity checks, auth checks, edge decisions, and logs that decides how automated AI clients reach a website. Bot authentication proves the automated client. User authorization proves permission to act for a person, workspace, account, or service. Signed agents are bots or browser agents that attach HTTP message signatures so the receiving site or CDN can verify identity cryptographically.

What is bot authentication?

Bot authentication is how a site verifies that an automated request really came from the bot, crawler, or agent it claims to be. A User-Agent string is not authentication. Anyone can send GPTBot, Googlebot, or ClaudeBot in a header. A good bot-auth system needs an independent proof: reverse DNS, known IP ranges, CDN verification, a verified-bot directory, mTLS, or an HTTP message signature.
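
As one illustration of an independent proof, here is a minimal sketch of a forward-confirmed reverse DNS check. The suffix list is a placeholder for illustration, and the function names are hypothetical. Real suffixes should come from each bot operator's own documentation.

```python
import socket
from typing import Callable, Iterable

# Placeholder suffixes for illustration only; confirm real values with the
# bot operator's published documentation before relying on them.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> set:
    """Return every address the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_forward_confirmed(
    ip: str,
    trusted_suffixes: Iterable[str] = TRUSTED_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], set] = forward_lookup,
) -> bool:
    """True only if the PTR name is trusted and resolves back to the same IP."""
    suffixes = tuple(trusted_suffixes)
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # No PTR record: treat as unverified, never as trusted.
    if not hostname.endswith(suffixes):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolvers are injectable so the check can be exercised without network access. Note that the User-Agent header never appears in the function: the claim in the header only decides which suffix list to test against.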

Cloudflare describes Web Bot Auth as a bot verification method based on cryptographic signatures in HTTP messages. The request carries Signature, Signature-Input, and Signature-Agent headers. The Signature-Agent points to a public key directory. The verifier uses the public key and signed request components to decide whether the request was really signed by that bot or agent.

That proof still has a narrow job. Bot auth answers "who sent this automated request?" It does not answer "is this bot allowed to buy something, edit a record, download private data, or act for Alice?" Sensitive actions still need user or service authorization.

What are the main types of AI bot auth?

AI bot access usually combines several identity and authorization methods. Treat them as layers, not rivals.

| Method | What it proves | Good for | Weakness |
| --- | --- | --- | --- |
| User-Agent | Claimed software name | Basic logging and robots rules | Spoofable |
| Reverse DNS and IP verification | Request came from known infrastructure | Search crawlers and old verified-bot flows | Brittle for distributed agents |
| Cloudflare verified bot | Cloudflare classified request as known good bot | Search, monitoring, SEO crawlers | Depends on CDN classification |
| Web Bot Auth | Request was signed by a known bot or agent key | Signed agents and modern bot identity | Requires key directory and signature handling |
| OAuth/OIDC | User or account delegated access | Private APIs, account data, user actions | More setup and consent design |
| API key or bearer token | Service credential or server-to-server access | Backend automation | Often overbroad if scopes are missing |
| Cloudflare Access or mTLS | Network or identity-aware perimeter | Internal apps, staging, admin paths | Not a public crawler solution |

The safest pattern is boring: public retrieval can rely on crawler policy plus bot identity. Account-specific reads need OAuth, session auth, or API keys. Writes, purchases, billing, deletion, and admin actions need scoped authorization, confirmation, and logs.

How does Web Bot Auth work?

Web Bot Auth uses HTTP Message Signatures for automated traffic. The bot or agent signs selected request components, publishes a public key directory, and sends signature headers with each request. Cloudflare's docs currently use Ed25519 examples and describe a key directory at /.well-known/http-message-signatures-directory with a JSON Web Key Set.
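
To make the key directory concrete, here is a minimal sketch of matching a keyid against a JSON Web Key Set. It assumes the keyid is the RFC 7638 JWK thumbprint of an Ed25519 (OKP) public key, which is what the keyid="thumbprint" placeholder in the example request suggests. The function names are hypothetical.

```python
import base64
import hashlib
import json
from typing import Optional


def jwk_thumbprint(jwk: dict) -> str:
    """Compute the RFC 7638 SHA-256 thumbprint of an OKP public JWK."""
    # Only the required members are hashed, sorted by name, with no whitespace.
    required = {"crv": jwk["crv"], "kty": jwk["kty"], "x": jwk["x"]}
    canonical = json.dumps(required, separators=(",", ":"), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # base64url without padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def find_key(directory: dict, keyid: str) -> Optional[dict]:
    """Return the directory key whose thumbprint equals keyid, else None."""
    for jwk in directory.get("keys", []):
        is_okp = jwk.get("kty") == "OKP" and "crv" in jwk and "x" in jwk
        if is_okp and jwk_thumbprint(jwk) == keyid:
            return jwk
    return None


# The "x" value is the example Ed25519 public key from RFC 8037, used here
# purely as sample data.
directory = {
    "keys": [
        {"kty": "OKP", "crv": "Ed25519",
         "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo"}
    ]
}
keyid = jwk_thumbprint(directory["keys"][0])
print(keyid, find_key(directory, keyid) is not None)
```

A keyid that matches nothing in the directory should end verification as a failure, not trigger a fallback to User-Agent matching.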

The practical flow looks like this:

  1. Generate a signing key pair for the bot or agent.
  2. Publish the public keys in a key directory.
  3. Register the bot, agent, or key directory when using a verified-bot program.
  4. Sign requests with Signature and Signature-Input.
  5. Send Signature-Agent pointing to the key directory.
  6. Verify the signature, key ID, timestamp, expiry, tag, and signed components.
  7. Feed the verified identity into the edge policy.

A request signed this way looks like the following example:

GET /docs/agent-readiness HTTP/1.1
Host: example.com
User-Agent: ExampleAgent/1.0
Signature-Agent: "https://agent.example/.well-known/http-message-signatures-directory"
Signature-Input: sig1=("@authority" "signature-agent");created=1780000000;expires=1780000060;keyid="thumbprint";alg="ed25519";tag="web-bot-auth"
Signature: sig1=:base64-signature:

The details matter. Signature-Agent needs to be an HTTPS URI and, in Cloudflare's implementation, must be included in the signed component list. created and expires reduce replay risk. keyid maps the signature to a public key. The tag tells the verifier this signature is for bot authentication, not some unrelated signing scheme.
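
To show what "signed components" means in practice, here is a minimal sketch that assembles the RFC 9421 signature base for the example request above. The function name is hypothetical, and the sketch stops before the Ed25519 verification step, which needs a cryptography library that is not shown here.

```python
def build_signature_base(components: list, params: str) -> str:
    """Assemble an RFC 9421 signature base.

    components: (name, value) pairs in the order listed in Signature-Input.
    params: the inner list plus its parameters, i.e. everything after "sig1=".
    """
    lines = [f'"{name}": {value}' for name, value in components]
    # The final line always covers the signature parameters themselves.
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)  # No trailing newline.


params = (
    '("@authority" "signature-agent");created=1780000000;expires=1780000060;'
    'keyid="thumbprint";alg="ed25519";tag="web-bot-auth"'
)
base = build_signature_base(
    [
        ("@authority", "example.com"),
        # The header value is used as sent, including its surrounding quotes.
        ("signature-agent",
         '"https://agent.example/.well-known/http-message-signatures-directory"'),
    ],
    params,
)
print(base)
```

The verifier checks the signature over exactly these bytes. If a proxy rewrites the Host header or drops Signature-Agent, the base changes and verification fails, which is the intended behavior.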

What is the Cloudflare allowed bots list?

Cloudflare maintains an internal directory of verified bots and signed agents. For a public view, use the Cloudflare Radar Bots Directory. For automation, use the Cloudflare Radar bots API, which supports filters such as botVerificationStatus=VERIFIED, kind=AGENT, kind=BOT, and response format.

curl "https://api.cloudflare.com/client/v4/radar/bots?botVerificationStatus=VERIFIED&format=JSON" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"

curl "https://api.cloudflare.com/client/v4/radar/bots?kind=AGENT&format=JSON" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN"

Do not copy a static allowlist into a blog post and treat it as permanent. Bot operators, categories, and signatures change. In a real audit, record the list source, the query used, the date checked, and the rule that consumes the result.
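
One way to capture that evidence is a small record built alongside the query. This is a sketch only: the record field names and the rule name are hypothetical, and the bot names would come from the API response, whose shape is not assumed here.

```python
import json
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

RADAR_BOTS_URL = "https://api.cloudflare.com/client/v4/radar/bots"


def radar_query_url(**filters: str) -> str:
    """Build a Radar bots API URL from query filters, in a stable order."""
    return f"{RADAR_BOTS_URL}?{urlencode(sorted(filters.items()))}"


def evidence_record(
    query_url: str,
    consuming_rule: str,
    bot_names: list,
    checked_at: Optional[datetime] = None,
) -> dict:
    """Record the list source, query, check date, and consuming rule."""
    checked_at = checked_at or datetime.now(timezone.utc)
    return {
        "source": "Cloudflare Radar bots API",
        "query": query_url,
        "checked_at": checked_at.isoformat(),
        "consuming_rule": consuming_rule,  # Hypothetical rule name.
        "bot_count": len(bot_names),
        "bots": sorted(bot_names),
    }


url = radar_query_url(botVerificationStatus="VERIFIED", format="JSON")
record = evidence_record(url, "allow-verified-bots-public-docs", ["GPTBot", "ClaudeBot"])
print(json.dumps(record, indent=2))
```

Storing the query next to the result lets a later audit rerun the same request and compare the two lists.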

Cloudflare also documents a managed AI Bots rule. As of the referenced docs, the named list includes Amazonbot, Applebot, Bytespider, ClaudeBot, DuckAssistBot, Google-CloudVertexBot, GoogleOther, GPTBot, Meta-ExternalAgent, PetalBot, TikTokSpider, and CCBot. Cloudflare notes that verified AI crawlers and similar unverified bots may also be included, and that categories can change.

Which Cloudflare fields matter for bot access?

On Cloudflare, the policy usually starts with verified bot and bot-management fields. Exact availability depends on plan and product, but these are the important concepts to audit:

| Cloudflare signal | Meaning | Audit use |
| --- | --- | --- |
| cf.bot_management.verified_bot | Request originates from a Cloudflare allowed bot | Skip unnecessary challenges for trusted crawlers |
| cf.bot_management.signed_agent | Request originated from a known signed agent | Segment signed agent traffic |
| cf.verified_bot_category | Bot category such as AI crawler or search crawler | Allow search while restricting training crawlers |
| cf.bot_management.score | Bot score from 1 to 99, where lower scores mean the request is more likely automated | Challenge or block likely automation on sensitive paths |
| cf.bot_management.detection_ids | Specific heuristic detections | Investigate scraping, account abuse, or odd clients |

A common Cloudflare mistake is allowing or blocking all bots with one rule. Better: allow verified search and monitoring bots to public content, log signed agents first, challenge unknown automation on account paths, and require account auth for private APIs.

Allow or skip public docs:
cf.bot_management.verified_bot and starts_with(http.request.uri.path, "/docs")

Challenge likely automation on login:
cf.bot_management.score lt 30
and not cf.bot_management.verified_bot
and http.request.uri.path in {"/login" "/account" "/checkout"}

Log signed agents during rollout:
cf.bot_management.signed_agent

Use Log first where possible. Move to Skip, Managed Challenge, or Block only after reviewing Security Events and origin logs.

How do you validate bot auth correctly?

Validate bot auth in two directions: can trusted automation reach public evidence, and are sensitive actions still protected?

| Test | Expected result | Evidence |
| --- | --- | --- |
| Known search bot to public docs | Allowed or clean 200/301 | Status, verified bot field, no challenge |
| AI crawler to public content | Allowed or blocked by explicit policy | Bot category and rule ID |
| Unknown bot with spoofed UA | Not treated as trusted | No verified flag, challenged or logged |
| Signed agent with valid signature | Verified as signed or allowed by identity policy | Signature fields, key ID, rule match |
| Signed agent with expired signature | Rejected or falls back to untrusted handling | Expiry failure, no verified identity |
| Account page without user auth | 401, 403, login, or Access challenge | Auth challenge and no data leakage |
| API action without OAuth/API key | 401 with correct auth metadata | WWW-Authenticate, OAuth metadata |
| API action with least-privilege token | Allowed only for scoped task | Token scope, request ID, audit log |

This is where shallow bot-access guidance breaks down. A signed request to /pricing and a signed request to /api/delete-account cannot share one verdict. The first is a retrieval decision. The second is an authorization decision.
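
A test matrix like the one above can be encoded so that every audit run checks the same expectations. The sketch below is one possible shape, not a finished harness: the class names are hypothetical, and the status-code sets are assumptions that each site should replace with its own policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    name: str
    allowed_statuses: frozenset
    verified: Optional[bool] = None  # None: verification state is not asserted.


@dataclass(frozen=True)
class Observation:
    status: int
    verified: bool  # Whether the edge treated the request as a verified bot.


def check(exp: Expectation, obs: Observation) -> list:
    """Return human-readable failures; an empty list means the case passed."""
    failures = []
    if obs.status not in exp.allowed_statuses:
        failures.append(f"{exp.name}: unexpected status {obs.status}")
    if exp.verified is not None and obs.verified != exp.verified:
        failures.append(f"{exp.name}: verified={obs.verified}, expected {exp.verified}")
    return failures


# Assumed status sets for four rows of the matrix.
MATRIX = [
    Expectation("known search bot to public docs", frozenset({200, 301}), verified=True),
    Expectation("spoofed UA to public docs", frozenset({200, 403}), verified=False),
    Expectation("account page without user auth", frozenset({302, 401, 403})),
    Expectation("API action without token", frozenset({401})),
]

# A spoofed client that the edge wrongly marks as verified is a failure
# even though the status code alone looks fine.
print(check(MATRIX[1], Observation(status=200, verified=True)))
```

Keeping the verified flag separate from the status code matters because a spoofed bot may legitimately receive a 200 on public content while still being untrusted.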

How do you validate Web Bot Auth headers?

For incoming signed traffic, capture the raw headers before the CDN or app normalizes them. Then check:

  1. Signature-Agent exists, is HTTPS, and points to the expected key directory.
  2. Signature-Input includes signature-agent and the signed request components.
  3. created and expires are present and the validity window is short.
  4. keyid maps to a key in the directory.
  5. tag is set for Web Bot Auth.
  6. The signature verifies over the exact signature base.
  7. Failed verification does not silently become trusted traffic.
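
The metadata checks in this list (everything except the cryptographic verification in step 6) can be sketched as follows. The parser is deliberately simplified: it handles a single-label Signature-Input like the example in this guide and is not a full structured-fields parser. The 300-second ceiling is an assumed policy value, not a number from any specification.

```python
import re
from urllib.parse import urlsplit

MAX_WINDOW_SECONDS = 300  # Assumed policy ceiling; tune to your own risk model.

_INPUT_RE = re.compile(r'^\s*([a-z*][a-z0-9_.*-]*)=\(([^)]*)\)(.*)$')
_PARAM_RE = re.compile(r';\s*([a-z*][a-z0-9_.*-]*)=("(?:[^"\\]|\\.)*"|[^;]+)')


def parse_signature_input(value: str):
    """Return (label, components, params) for a single-label Signature-Input."""
    match = _INPUT_RE.match(value)
    if not match:
        raise ValueError("unparseable Signature-Input")
    label, inner, rest = match.groups()
    components = re.findall(r'"([^"]+)"', inner)
    params = {}
    for name, raw in _PARAM_RE.findall(rest):
        raw = raw.strip()
        # Quoted values are strings; bare values are treated as integers.
        params[name] = raw[1:-1] if raw.startswith('"') else int(raw)
    return label, components, params


def check_web_bot_auth(signature_agent: str, signature_input: str, now: int) -> list:
    """Return a list of problems; an empty list means the metadata checks pass."""
    problems = []
    agent_url = signature_agent.strip().strip('"')
    if urlsplit(agent_url).scheme != "https":
        problems.append("Signature-Agent is not an HTTPS URI")
    try:
        _, components, params = parse_signature_input(signature_input)
    except ValueError:
        return problems + ["Signature-Input could not be parsed"]
    if "signature-agent" not in components:
        problems.append("signature-agent is not a signed component")
    created, expires = params.get("created"), params.get("expires")
    if not isinstance(created, int) or not isinstance(expires, int):
        problems.append("created or expires is missing")
    else:
        if not created <= now < expires:
            problems.append("signature is not valid at this time")
        if expires - created > MAX_WINDOW_SECONDS:
            problems.append("validity window is too long")
    if not params.get("keyid"):
        problems.append("keyid is missing")
    if params.get("tag") != "web-bot-auth":
        problems.append("tag is not web-bot-auth")
    return problems
```

Any non-empty result should route the request to untrusted handling. A production verifier would likely also allow a small clock-skew tolerance and confirm that the key directory URL matches the expected operator.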

If Cloudflare verifies the request, inspect Security Events, Bot Management fields, and Workers request.cf.botManagement where available. If you verify at origin, use the same evidence model: key directory fetched, key selected, signature base built, signature passed or failed, decision logged.

Figure: Comparison of User-Agent strings, IP ranges, and Web Bot Auth signatures as bot identity evidence.

How should bot auth and user auth work together?

Think in two gates.

Gate one is client identity: the site decides whether the requester is Googlebot, GPTBot, a signed browser agent, a monitoring bot, or unknown automation. Web Bot Auth, verified-bot directories, reverse DNS, IP validation, and bot scores live here.

Gate two is resource authorization: the site decides whether the requester may access this resource for this user, account, workspace, or service. OAuth, OIDC, API keys, bearer tokens, session cookies, Cloudflare Access, and mTLS live here.

| Resource | Bot identity requirement | User/account auth requirement |
| --- | --- | --- |
| Public docs | Crawler policy or verified bot | None |
| Public product page | Usually none, optional bot policy | None |
| Search-only preview endpoint | Verified search or signed retrieval agent | None or rate-limited token |
| Account dashboard | Bot identity is not enough | Session, OAuth, or Access |
| Order status API | Bot identity is not enough | Scoped OAuth or API key |
| Purchase, refund, deletion | Bot identity is not enough | Strong auth, scope, confirmation, logs |

The benefit of signed bot auth is not that it lets agents do everything. The benefit is that it removes identity theater. Once the site knows which automated client is present, it can apply precise policy instead of treating every Chrome-looking agent as either human or hostile.
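
The two gates can be sketched as a single decision function. This is an illustration under stated assumptions: the sensitivity tiers, field names, scope strings, and decision labels are all hypothetical, and a real deployment would split this logic between edge rules and the application.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    ACCOUNT_READ = "account_read"
    WRITE = "write"


@dataclass(frozen=True)
class Request:
    sensitivity: Sensitivity
    bot_verified: bool                      # Gate one: signature or verified-bot result.
    token_scopes: frozenset = frozenset()   # Gate two: scopes granted to the caller.
    required_scope: str = ""
    user_confirmed: bool = False


def decide(req: Request) -> str:
    """Apply gate one to public paths and gate two to everything else."""
    if req.sensitivity is Sensitivity.PUBLIC:
        # Verified identity earns a clean pass; others fall to crawler policy.
        return "allow" if req.bot_verified else "crawler_policy"
    # Bot identity is never a substitute for authorization beyond this point.
    if req.required_scope not in req.token_scopes:
        return "deny_401"
    if req.sensitivity is Sensitivity.WRITE and not req.user_confirmed:
        return "require_confirmation"
    return "allow"


# A verified signed agent calling a deletion endpoint without a token is denied.
print(decide(Request(Sensitivity.WRITE, bot_verified=True)))
```

The bot_verified flag is only read on the public branch, which encodes the rule from the table: for account and write paths, bot identity is not enough.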

What should a signed-agent audit report?

A signed-agent audit should report the whole decision chain:

| Audit area | What to capture |
| --- | --- |
| Policy intent | Which bots, agents, and crawler categories are allowed or blocked |
| Cloudflare directory evidence | Radar directory/API query, bot kind, category, operator, verification status |
| Edge rule evidence | Rule expression, action, order, and Security Events sample |
| Signature evidence | Signature-Agent, Signature-Input, key ID, timestamp, expiry, signature result |
| Public-content behavior | Status codes for docs, blog, pricing, robots, llms.txt, sitemap |
| Private-path behavior | Login, checkout, account, API, admin, and write-action outcomes |
| User auth evidence | OAuth metadata, protected-resource metadata, scopes, API-key behavior |
| Logs | Request ID, bot fields, signature fields, token subject, scope, decision |

Figure: AI bot access audit flow from request capture through signature verification to policy decision.

The final report should say something concrete, not "bot access passed." Better: "Verified search bots can fetch public docs. Cloudflare's AI Bots managed rule blocks training crawlers. Signed agents are logged on public routes but still require OAuth for account APIs. Spoofed GPTBot traffic is not treated as verified. Checkout and deletion endpoints return 401 without scoped user authorization."

What are the benefits of signed agent auth?

Signed agent auth gives site owners a better control surface:

  • Less User-Agent spoofing because identity is tied to a private key.
  • Better logs because the verifier can record bot identity, key ID, and signature status.
  • More precise Cloudflare rules because signed agents can be segmented from generic bots.
  • Safer AI access because public retrieval and private action paths can diverge.
  • Better agent ecosystem incentives because transparent agents can earn access without pretending to be browsers.

It also helps bot operators. A legitimate agent can prove itself without publishing brittle IP ranges for every execution environment. That matters for browser agents, hosted agents, and user-directed agents where traffic does not look like an old search crawler.

What standards are emerging around this?

Web Bot Auth builds on RFC 9421 HTTP Message Signatures and is being discussed through the IETF Web Bot Auth working group. Cloudflare's docs describe active drafts for a key directory and bot-auth protocol. Recent IETF materials describe Signature-Agent, Accept-Signature, nonces, anti-replay concerns, and implementation work across several languages.

The pattern is already visible, even while the ecosystem is still moving: bot identity is shifting away from spoofable labels and IP-only heuristics toward cryptographic proof. That will not replace OAuth, API keys, or user consent. It gives those systems a cleaner requester identity to work with.

Common mistakes

Do not make these mistakes:

  • Treat User-Agent as proof.
  • Allow all verified bots into account or checkout paths.
  • Block all AI bots without checking whether search, answer, or assistant retrieval should remain available.
  • Treat a valid Web Bot Auth signature as user consent.
  • Publish a key directory but forget to sign requests.
  • Use long signature expiry windows.
  • Fail closed on public docs without realizing AI answer systems can no longer fetch citations.
  • Fail open on private APIs because the requester was a signed agent.
  • Skip logs for rejected signed-agent attempts.
  • Copy a stale bot list instead of using Cloudflare Radar or API data.

FAQ

Is Web Bot Auth the same as OAuth?

No. Web Bot Auth proves the identity of an automated client. OAuth proves delegated access to a protected resource. A signed agent may still need OAuth before reading account data or taking action for a user.

Does Cloudflare have a public verified bot list?

Cloudflare exposes a public Radar Bots Directory and a Radar bots API. Cloudflare also maintains internal verified-bot and signed-agent directories used by its products. Use the public directory or API for audit evidence, and avoid hardcoding a stale copy.

Which Cloudflare bots are blocked by the AI Bots rule?

Cloudflare's docs currently name bots such as Amazonbot, Applebot, Bytespider, ClaudeBot, DuckAssistBot, Google-CloudVertexBot, GoogleOther, GPTBot, Meta-ExternalAgent, PetalBot, TikTokSpider, and CCBot. The rule may also include verified AI crawlers and similar unverified bots, and Cloudflare says categories can change.

Should signed agents be allowed everywhere?

No. A signed agent has stronger identity proof, not universal permission. Allow or log signed agents on public retrieval paths first. Require user or service authorization for private data, writes, payments, and admin workflows.

What does CanAgentUse check?

CanAgentUse checks policy intent, AI bot-specific rules, transport status, signed-agent evidence, public key-directory signals, protected paths, and logs. The goal is not a vanity score. It is evidence showing which bots can fetch public content and which sensitive paths remain protected.
