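## What are agent-oriented robots.txt directives?

A standard robots.txt file tells web crawlers which paths they may fetch. By adding User-agent groups for AI-specific crawlers, you signal that your site is aware of AI agents and welcomes them.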

## Example

Add the following to your `/robots.txt`:

```
User-agent: GPTBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Anthropic-AI
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```
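
To check that rules like these behave as intended, the sketch below parses a robots.txt string with Python's standard `urllib.robotparser` and reports which agents may fetch a given URL. The `allowed_agents` helper, the `example.com` URL, the `SomeOtherBot` name, and the extra `User-agent: *` group are assumptions added for illustration; they are not part of the example above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two AI crawlers are allowed everywhere, while
# the catch-all group (added only for this sketch) blocks /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
"""


def allowed_agents(robots_txt, agents, url):
    """Return a dict mapping each agent name to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "PerplexityBot", "SomeOtherBot"],
    "https://example.com/private/report",
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Here the named AI crawlers match their own groups and are allowed, while an unlisted agent falls through to the `*` group and is blocked. Real crawlers may resolve overlapping rules differently from this parser, so treat the result as a sanity check rather than a guarantee.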

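## Known AI crawlers

| User-agent | Operator | Purpose |
|---|---|---|
| GPTBot | OpenAI | Model training (user-initiated browsing uses the separate `ChatGPT-User` agent) |
| Claude-Web | Anthropic | Web browsing |
| Anthropic-AI | Anthropic | Training |
| PerplexityBot | Perplexity | Search and answers |
| Google-Extended | Google | AI training (a control token honored by Google's existing crawlers, not a separate crawler) |
| Applebot | Apple | Siri and search features (AI training use is governed by the separate `Applebot-Extended` token) |

Operators rename and add tokens over time. Anthropic, for example, now documents `ClaudeBot` as its crawler, so verify this list against each operator's current crawler documentation.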

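## Why it matters

Many sites now block AI crawlers in their robots.txt. Explicitly allowing them signals that your content and APIs are intended for use by agents.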

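## Specification maturity

**Established convention.** robots.txt is a long-standing web standard, formalized as RFC 9309. The AI-specific User-agent strings are defined by each AI company individually, and honoring the rules is voluntary on the crawler's part.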

## Learn more

- [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309): the robots.txt specification

## Related

- [llms.txt](/kb/zh/llms-txt)