## The problem with assumptions

Most website optimization assumes a visitor that fetches your page over HTTP, reads the response headers, follows redirects, and renders the HTML. AI agents break these assumptions, each in a different way.

## ChatGPT

ChatGPT's browsing tool fetches pages live via HTTP, but the model never sees the raw response:

- **Text extraction only** — HTML is stripped to ~4,096 tokens of plain text before the model sees it
- **No headers** — the model never knows Content-Type, status codes, or redirects
- **SearchGPT intermediate** — a secondary model checks for prompt injection before content reaches the main model
- **Agent Mode** sends a generic Chrome UA (`Chrome/138.0.0.0`) and identifies itself with RFC 9421 HTTP Message Signatures, not the User-Agent string
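
To make the Agent Mode bullet concrete, here is a minimal sketch that pulls the signature label, covered components, and parameters out of an RFC 9421 `Signature-Input` header. The helper name `parse_signature_input` and the sample header value are assumptions made for illustration. The regex only handles the simple single-signature form, and the sketch does not verify the signature itself; that step needs the sender's published public key and a crypto library.

```python
import re


def parse_signature_input(header: str) -> dict:
    """Split a single-signature Signature-Input value into its parts.

    Hypothetical helper: a simplified parser, not a full
    Structured Fields implementation.
    """
    match = re.fullmatch(r'\s*([A-Za-z0-9_*.-]+)=\(([^)]*)\)(.*)', header)
    if match is None:
        raise ValueError("not a recognizable Signature-Input value")
    label, components, raw_params = match.groups()

    # Parameters follow the component list as ;key=value pairs.
    params = {}
    for part in raw_params.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip().strip('"')

    return {
        "label": label,
        "components": re.findall(r'"([^"]+)"', components),
        "params": params,
    }


# Sample value invented for illustration; real requests carry their own.
sample = (
    'sig1=("@method" "@authority" "@path");'
    'created=1735689600;keyid="example-key";alg="ed25519"'
)
parsed = parse_signature_input(sample)
print(parsed["components"])       # ['@method', '@authority', '@path']
print(parsed["params"]["keyid"])  # example-key
```

A server would use the `keyid` to look up the sender's public key before trusting anything else in the request, since the User-Agent string alone proves nothing.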

**What this means:** Content negotiation works silently (the tool layer handles it), but the model only sees the extracted text. Serve clean, structured text and your content will be more useful to ChatGPT.
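
A rough way to preview what survives this kind of extraction is to strip the markup yourself and truncate. The real pipeline is not public, so everything here is an assumption: the `extract_text` helper, the whitespace split standing in for a tokenizer, and the 4,096 budget borrowed from the figure above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script, style, and noscript bodies."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str, max_tokens: int = 4096) -> str:
    """Return plain text cut to max_tokens whitespace-separated words.

    Words are a crude stand-in for model tokens (assumption).
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return " ".join(words[:max_tokens])


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Pricing</h1><p>Plans start at <b>$10</b> per month</p>"
    "<script>track()</script></body></html>"
)
print(extract_text(page))  # Pricing Plans start at $10 per month
```

Running a page through something like this shows quickly whether the key facts appear early in the text or get pushed past the cutoff by navigation and boilerplate.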

## Perplexity

Perplexity uses a multi-stage retrieval pipeline:

- **Stealth crawlers** — 3-6 million requests/day with generic Chrome UAs and rotating IPs, not `PerplexityBot`
- **Hybrid ranking** — BM25 keyword matching + vector similarity to find relevant passages
- **Atomic span retrieval** — extracts specific text spans rather than full pages
- **Separate index** — maintains its own crawled index alongside web search results
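
The hybrid ranking step can be sketched as a weighted blend of a BM25 score and a vector similarity score. This is an illustration of the general technique, not Perplexity's implementation: the weight `alpha`, the BM25 constants, and the bag-of-words cosine similarity (standing in for a learned embedding model) are all assumptions.

```python
import math
from collections import Counter

PUNCT = ".,:;!?()"


def tokenize(text: str) -> list[str]:
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def bm25_scores(query: str, passages: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each passage against the query with the BM25 formula."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    query_terms = set(tokenize(query))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine_scores(query: str, passages: list[str]) -> list[float]:
    """Cosine similarity over word counts (stand-in for embeddings)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    out = []
    for passage in passages:
        d = Counter(tokenize(passage))
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        dot = sum(q[t] * d[t] for t in q)
        out.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return out


def hybrid_rank(query: str, passages: list[str], alpha: float = 0.5) -> list[int]:
    """Return passage indices, best first, blending both signals."""
    bm25 = bm25_scores(query, passages)
    top = max(bm25) or 1.0  # normalize BM25 into the 0..1 range
    vec = cosine_scores(query, passages)
    combined = [alpha * (s / top) + (1 - alpha) * v for s, v in zip(bm25, vec)]
    return sorted(range(len(passages)), key=lambda i: combined[i], reverse=True)


passages = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is open Monday to Friday.",
    "Shipping takes three to five business days.",
]
order = hybrid_rank("how many days for refunds", passages)
print(passages[order[0]])  # Refunds are issued within 14 days of purchase.
```

The passage that names the query terms directly wins on both signals, which is one reason short, self-contained passages tend to be retrieved more reliably than answers spread across a page.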

**What this means:** Your `robots.txt` rules for `PerplexityBot` may not stop their stealth crawlers. Structured content with clear headings helps their span extraction find the right passages.
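
The gap is easy to reproduce with Python's standard `urllib.robotparser`: a rule keyed on `PerplexityBot` matches only requests that declare that token. The rule set and the generic Chrome User-Agent string below are illustrative assumptions, and compliance with `robots.txt` is voluntary for any crawler in the first place.

```python
from urllib.robotparser import RobotFileParser

# Example rule set: block only the declared Perplexity crawler.
rules = """\
User-agent: PerplexityBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

declared = "PerplexityBot"
# A generic browser string chosen for illustration.
generic = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(parser.can_fetch(declared, "https://example.com/pricing"))  # False
print(parser.can_fetch(generic, "https://example.com/pricing"))   # True
```

If blocking matters, it has to happen at a layer that does not depend on the self-reported User-Agent, such as a firewall or bot-management rule.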

## Gemini

Gemini's most common browsing mode never hits your server at all:

- **Index-based** — `url_context` reads from Google's internal index, not live HTTP. When tested, no request appeared in server logs
- **Screenshot-based** — Project Mariner renders the page visually for tasks that need it
- **Rejected markdown** — Gemini CLI rejected `Accept: text/markdown` responses in early testing

**What this means:** Your site needs to be indexed by Googlebot for Gemini to see it. Adding `<link rel="alternate" href="/llms.txt">` to your HTML gives Google's crawler a discoverable pointer to your [llms.txt](/kb/llms-txt), although indexing is never guaranteed. JSON-LD structured data also survives the indexing pipeline.
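
As a minimal sketch, both tags can be generated together for a page's `<head>`. The `head_snippet` helper, the `WebPage` type, and the example values are assumptions; pick the schema.org type and properties that actually describe your content.

```python
import json


def head_snippet(title: str, url: str, description: str) -> str:
    """Build the llms.txt link tag plus a JSON-LD block (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": title,
        "url": url,
        "description": description,
    }
    # Escape "</" so a value can never close the script element early.
    json_ld = json.dumps(data, indent=2).replace("</", "<\\/")
    return (
        '<link rel="alternate" href="/llms.txt">\n'
        '<script type="application/ld+json">\n'
        f"{json_ld}\n"
        "</script>"
    )


print(head_snippet(
    "Pricing",
    "https://example.com/pricing",
    "Plans and prices for Example.",
))
```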

## What to do about it

| Action | Helps with |
|--------|------------|
| Serve `llms.txt` with clean markdown | ChatGPT, Perplexity |
| Add `<link rel="alternate" href="/llms.txt">` | Gemini (via Google index) |
| Add JSON-LD structured data | Gemini (via Google index) |
| Don't block `Google-Extended` in [robots.txt](/kb/robots-txt) | Gemini |
| Verify RFC 9421 signatures to authenticate bots | ChatGPT Agent Mode verification |
| Serve structured content with clear headings | Perplexity span extraction |
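
One way to serve clean markdown without breaking clients that expect HTML is to negotiate on the `Accept` header and default to HTML. The sketch below is deliberately conservative: it returns markdown only when the client lists `text/markdown` explicitly, ignores wildcards, and the `pick_representation` name is an assumption.

```python
def pick_representation(accept_header: str) -> str:
    """Return 'text/markdown' or 'text/html' for a given Accept header.

    Simplified: wildcards such as */* never select markdown.
    """
    prefs = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        prefs[media_type] = quality

    markdown_q = prefs.get("text/markdown", 0.0)
    html_q = prefs.get("text/html", 0.0)
    if markdown_q > 0 and markdown_q >= html_q:
        return "text/markdown"
    return "text/html"


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(pick_representation(browser))                         # text/html
print(pick_representation("text/markdown, text/html;q=0.5"))  # text/markdown
```

When responses differ by `Accept`, send `Vary: Accept` so caches keep the two representations apart.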

## Learn more

- [Dejan.ai: Google's URL Context Tool](https://dejan.ai/blog/googles-new-url-context-tool/)
- [Dejan.ai: AI Mode Is Not Live Web](https://dejan.ai/blog/ai-mode-is-not-live-web/)
- [Cloudflare: Perplexity Stealth Crawlers](https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/)
- [SeatGeek: Chasing Signature](https://chairnerd.seatgeek.com/chasing-signature/)

## Related

- [SKILL.md](/kb/skills)
- [OpenAPI](/kb/openapi)
- [A2A](/kb/a2a)
- [WebMCP](/kb/webmcp)