robots.txt for Agents

What are agent-oriented robots.txt directives?

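A standard robots.txt file tells web crawlers which parts of a site they may access. By adding User-agent groups for AI-specific crawlers, you signal that your site is aware of AI agents and welcomes them.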

Example

Add the following to your /robots.txt:

User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
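
One way to check that these rules behave as intended is to parse them with Python's standard-library urllib.robotparser. This is only a sketch: the helper name check_agents and the https://example.com/docs/ URL are assumptions made for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this page, inlined so the check runs offline.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
"""

EXAMPLE_AGENTS = [
    "GPTBot",
    "Claude-Web",
    "Anthropic-AI",
    "PerplexityBot",
    "Google-Extended",
]


def check_agents(robots_txt, agents, url):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical URL; substitute a page on your own site.
allowed = check_agents(EXAMPLE_ROBOTS_TXT, EXAMPLE_AGENTS, "https://example.com/docs/")

# For contrast: a group that blocks one crawler from the whole site.
blocked = check_agents("User-agent: GPTBot\nDisallow: /\n", ["GPTBot"], "https://example.com/docs/")

for agent, ok in allowed.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

To test a live site instead, RobotFileParser also offers set_url() and read(), which fetch the file over HTTP before can_fetch() is called.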

Known AI crawlers

User-agent	Operator	Purpose
GPTBot	OpenAI	Training and browsing
Claude-Web	Anthropic	Web browsing
Anthropic-AI	Anthropic	Training
PerplexityBot	Perplexity	Search and answers
Google-Extended	Google	AI training
Applebot	Apple	Siri and AI features
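
If you maintain the list of crawlers in code, the Allow groups can be generated rather than written by hand. The following is a minimal sketch, assuming you want to allow every crawler in the table above for the whole site; the function name build_robots_txt and its path parameter are hypothetical, not part of any standard tool.

```python
# User-agent tokens taken from the table above.
AI_CRAWLERS = [
    "GPTBot",
    "Claude-Web",
    "Anthropic-AI",
    "PerplexityBot",
    "Google-Extended",
    "Applebot",
]


def build_robots_txt(agents, path="/"):
    """Build one 'User-agent / Allow' group per agent, separated by blank lines."""
    groups = [f"User-agent: {agent}\nAllow: {path}" for agent in agents]
    return "\n\n".join(groups) + "\n"


generated = build_robots_txt(AI_CRAWLERS)
print(generated)
```

The output can be appended to an existing /robots.txt. Groups for other crawlers, such as a `User-agent: *` group, are left untouched because each group applies only to the agents it names.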

Why it matters

Many sites block AI crawlers by default. Explicitly allowing them signals that your content and APIs are designed for agent use.

Specification maturity

Established convention. robots.txt is a long-standing web standard (RFC 9309). The AI-specific User-agent strings are defined by the individual AI companies.

Learn more

RFC 9309: Robots Exclusion Protocol (the robots.txt specification)