AgentGrade

How AI Agents Actually Browse the Web

The problem with assumptions

Most website optimization assumes a visitor fetches your page via HTTP, sees the headers, follows redirects, and renders HTML. AI agents break every one of these assumptions.

ChatGPT

ChatGPT's browsing tool fetches pages live via HTTP, but the model never sees the raw response, only the text the tool layer extracts from it.

What this means: Content negotiation works silently (the tool layer handles it), but the model only sees the extracted text. Serve clean, structured text and your content will be more useful to ChatGPT.
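As a rough illustration of what server-side content negotiation can look like, the sketch below picks between an HTML and a markdown representation based on the Accept header. It assumes a client that lists text/markdown in Accept; whether ChatGPT's tool layer sends that value is an assumption, and the names parse_accept and choose_representation are hypothetical.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_representation(
    accept_header: str,
    available: tuple[str, ...] = ("text/html", "text/markdown"),
) -> str:
    """Return the best available media type; fall back to the first one."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type in ("*/*", "text/*"):
            return available[0]
    return available[0]
```

A request with `Accept: text/markdown, text/html;q=0.8` would get the markdown version under this sketch, while a typical browser header still gets HTML.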

Perplexity

Perplexity uses a multi-stage retrieval pipeline.

What this means: Your robots.txt rules for PerplexityBot may not stop Perplexity's stealth crawlers, which do not identify themselves under that user agent. Structured content with clear headings helps its span extraction find the right passages.
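To see why a PerplexityBot rule only binds crawlers that announce themselves, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URL, and the is_allowed helper are illustrative assumptions, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block PerplexityBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate robots.txt rules for one user agent and one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule matches only the declared user agent string.
declared = is_allowed(ROBOTS_TXT, "PerplexityBot", "https://example.com/docs")
# A crawler presenting a generic browser user agent falls under "*".
undeclared = is_allowed(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/docs")
```

Because robots.txt is matched on the user agent a client chooses to send, a crawler that presents a generic browser string is evaluated under the wildcard group instead.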

Gemini

Gemini's most common browsing mode never hits your server at all; it reads from Google's existing index instead.

What this means: Your site needs to be indexed by Googlebot for Gemini to see it. Adding <link rel="alternate" href="/llms.txt"> to your HTML gives Google's crawler a way to discover the llms.txt relationship, though indexing is not guaranteed. JSON-LD structured data also survives the indexing pipeline.
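A minimal sketch of emitting JSON-LD for a page follows. The choice of the schema.org Article type, the field set, and the json_ld_script helper are assumptions for illustration; pick the type and properties that match your content.

```python
import json


def json_ld_script(headline: str, url: str, date_published: str) -> str:
    """Build a <script type="application/ld+json"> tag for an article page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",  # assumed type; other schema.org types may fit better
        "headline": headline,
        "url": url,
        "datePublished": date_published,
    }
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

The returned tag would go in the page's head, next to the link element mentioned above.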

What to do about it

| Action | Helps with |
| --- | --- |
| Serve llms.txt with clean markdown | ChatGPT, Perplexity |
| Add <link rel="alternate" href="/llms.txt"> | Gemini (via Google index) |
| Add JSON-LD structured data | Gemini (via Google index) |
| Don't block Google-Extended in robots.txt | Gemini |
| Use RFC 9421 signatures for bot auth | ChatGPT Agent Mode verification |
| Serve structured content with clear headings | Perplexity span extraction |
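
To make the "clear headings" recommendation concrete, the sketch below splits a markdown document into heading-delimited sections, roughly the kind of unit a passage extractor could work with. This is not Perplexity's actual pipeline; the split_by_headings function is hypothetical, and it deliberately ignores edge cases such as "#" lines inside fenced code.

```python
import re

# ATX-style markdown heading: one to six "#" characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split markdown into sections, one per heading, keeping the body text."""
    sections = []
    current = {"heading": "", "level": 0, "lines": []}

    def has_content(section: dict) -> bool:
        return bool(section["heading"]) or any(l.strip() for l in section["lines"])

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if has_content(current):
                sections.append(current)
            current = {
                "heading": match.group(2),
                "level": len(match.group(1)),
                "lines": [],
            }
        else:
            current["lines"].append(line)
    if has_content(current):
        sections.append(current)

    return [
        {"heading": s["heading"], "level": s["level"], "text": "\n".join(s["lines"]).strip()}
        for s in sections
    ]
```

A page whose sections each answer one question under a descriptive heading yields self-contained passages here, whereas a single undivided block yields one large section with no label.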
