2 months ago · db04b99f4e
--- a/docs/chapter11/第十一章
+++ b/docs/chapter11/第十一章
@@ -86,7 +86,7 @@ $$
 
				 
			
 
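 <strong>Agentic RL</strong> is a new paradigm that treats the LLM as a learnable policy embedded in a sequential decision loop. In this framework, the agent interacts with the external world in a dynamic environment, executes multi-step actions to complete complex tasks, receives intermediate feedback that guides subsequent decisions, and optimizes the long-term cumulative reward rather than a single-step reward.
 
-Let us work through a concrete example to understand the difference. In the PBRFT setting, the user asks "Please explain what reinforcement learning is", the model generates a complete answer, and the answer is scored directly on its quality. In the Agentic RL setting, the user requests "Help me analyze the code quality of this GitHub repository", and the agent has to go through several steps: first it calls the GitHub API to fetch repository information, successfully obtains the repository structure and file list, and receives a +0.1 rewar; then it reads the main code files, successfully obtains the code content, and receives a +0.1 reward; next it analyzes the code quality soundly and receives a +0.2 reward; finally it generates a high-quality analysis report and receives a +0.6 reward. The total reward is the sum over all steps: 1.0.
+Let us work through a concrete example to understand the difference. In the PBRFT setting, the user asks "Please explain what reinforcement learning is", the model generates a complete answer, and the answer is scored directly on its quality. In the Agentic RL setting, the user requests "Help me analyze the code quality of this GitHub repository", and the agent has to go through several steps: first it calls the GitHub API to fetch repository information, successfully obtains the repository structure and file list, and receives a +0.1 reward; then it reads the main code files, successfully obtains the code content, and receives a +0.1 reward; next it analyzes the code quality soundly and receives a +0.2 reward; finally it generates a high-quality analysis report and receives a +0.6 reward. The total reward is the sum over all steps: 1.0.
 
 As this shows, the key characteristics of Agentic RL are multi-step interaction, actions that change the environment state at every step, feedback available at every step, and optimization of the quality of the task as a whole.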
			
 
				 
			
@@ -136,7 +136,7 @@ Agentic RL 的目标是赋予 LLM 智能体六大核心能力，如图 11.2 所
 
				 
			
 
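 <strong>Reasoning</strong> is the process of logically deriving conclusions from given information, and it is a core capability of an agent. Traditional CoT prompting relies on few-shot examples and generalizes poorly; SFT can only imitate the reasoning patterns in its training data and struggles to innovate. The advantage of reinforcement learning is that it learns effective reasoning strategies through trial and error, discovers reasoning paths absent from the training data, and learns when deep thinking is needed and when a quick answer suffices. A reasoning task can be modeled as a sequential decision problem: given a question $q$, the agent must generate a reasoning chain $c = (c_1, c_2, ..., c_n)$ and a final answer $a$. The reward function is usually designed as $r(q, c, a) = 1$ if $a = a^*$ else $0$, and the training objective is $\max_\theta \mathbb{E}_{q, (c,a) \sim \pi_\theta} [r(q, c, a)]$. In this way, the model learns to generate high-quality reasoning chains rather than merely memorizing answers.
 
-<strong>Tool Use</strong> is the agent's ability to call external too to complete tasks. In tool-use tasks, the action space is extended to $a_t \in \{a_t^{\text{think}}, a_t^{\text{tool}}\}$, where $a_t^{\text{think}}$ generates a thinking step and $a_t^{\text{tool}} = (\text{tool\_name}, \text{arguments})$ calls a tool. Reinforcement learning lets the agent learn when a tool is needed, which tool to choose, and how to combine multiple tools. For example, when solving a math problem, the agent has to learn when to use a calculator, when to use a code interpreter, and when to reason directly.
+<strong>Tool Use</strong> is the agent's ability to call external tools to complete tasks. In tool-use tasks, the action space is extended to $a_t \in \{a_t^{\text{think}}, a_t^{\text{tool}}\}$, where $a_t^{\text{think}}$ generates a thinking step and $a_t^{\text{tool}} = (\text{tool\_name}, \text{arguments})$ calls a tool. Reinforcement learning lets the agent learn when a tool is needed, which tool to choose, and how to combine multiple tools. For example, when solving a math problem, the agent has to learn when to use a calculator, when to use a code interpreter, and when to reason directly.
 
 <strong>Memory</strong> is the agent's ability to retain and reuse past information, which is crucial for long-horizon tasks. An LLM's context window is limited, and static retrieval strategies (such as RAG) cannot be optimized for the task at hand. Reinforcement learning lets the agent learn a memory-management policy: deciding which information is worth remembering, when to update memory, and when to delete outdated information. This resembles human working memory: we actively manage the information in our minds, keeping what matters and forgetting what is irrelevant.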
			
 
				 
			
--- a/docs/chapter12/第十二章智能体性能评估.md
+++ b/docs/chapter12/第十二章智能体性能评估.md
@@ -1496,7 +1496,7 @@ https://huggingface.co/spaces/gaia-benchmark/leaderboard
 
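   <p>Figure 12.4 BFCL evaluation flowchart</p>
 </div>
 
-Before submitting, you can manually inspect the generated JSONL file:
+Before submitting, you can manually inspect the generated JSON file:
 
 ```python
 import json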
			
--- a/docs/chapter7/Chapter7-Building-Your-Agent-Framework.md
+++ b/docs/chapter7/Chapter7-Building-Your-Agent-Framework.md
@@ -612,7 +612,7 @@ The content of this section will perform framework refactoring based on the thre
 
				 
			
 
 ### 7.4.1 SimpleAgent
 
-SimpleAgent is the most basic Agent implementation, demonstrating how to build a complete conversational agent on the framework foundation. We will rewrite SimpleAgent by inheriting the framework base class. First, create a `my_simple_agent.py` file in your project directory:
+SimpleAgent is the most basic Agent implementation, demonstrating how to build a complete conversational agent on the framework foundation. We will extend the existing `SimpleAgent` class and override its core methods to build a more extensible version. First, create a `my_simple_agent.py` file in your project directory:
 
 ```python
 # my_simple_agent.py
			
@@ -622,7 +622,7 @@ from hello_agents import SimpleAgent, HelloAgentsLLM, Config, Message
 
 class MySimpleAgent(SimpleAgent):
     """
     Rewritten simple conversation Agent
-    Demonstrates how to build custom Agent based on framework base class
+    Demonstrates how to build a custom Agent by extending SimpleAgent
     """
 
     def __init__(
			
@@ -640,7 +640,7 @@ class MySimpleAgent(SimpleAgent):
 
         print(f"✅ {name} initialization complete, tool calling: {'enabled' if self.enable_tool_calling else 'disabled'}")
 ```
 
-Next, we need to override the abstract method `run` of the Agent base class. SimpleAgent supports optional tool calling functionality, which also facilitates expansion in subsequent chapters:
+Next, we need to override the `run` method. SimpleAgent supports optional tool calling functionality, which also facilitates expansion in subsequent chapters:
 
 ```python
 # Continue adding in my_simple_agent.py
			
--- a/docs/chapter7/第七章构建你的Agent框架.md
+++ b/docs/chapter7/第七章构建你的Agent框架.md
@@ -612,7 +612,7 @@ class Agent(ABC):
 
				 
			
 
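 ### 7.4.1 SimpleAgent
 
-SimpleAgent is the most basic Agent implementation; it shows how to build a complete conversational agent on top of the framework. We will rewrite SimpleAgent by inheriting from the framework base class. First, create a `my_simple_agent.py` file in your project directory:
+SimpleAgent is the most basic Agent implementation; it shows how to build a complete conversational agent on top of the framework. We will inherit from the framework's existing `SimpleAgent` class and override its core methods to implement an extensible version. First, create a `my_simple_agent.py` file in your project directory:
 
 ```python
 # my_simple_agent.py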
			
@@ -640,7 +640,7 @@ class MySimpleAgent(SimpleAgent):
 
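         print(f"✅ {name} initialization complete, tool calling: {'enabled' if self.enable_tool_calling else 'disabled'}")
 ```
 
-Next, we need to override the abstract method `run` of the Agent base class. SimpleAgent supports optional tool calling, which also makes it easier to extend in later chapters:
+Next, we need to override the `run` method. SimpleAgent supports optional tool calling, which also makes it easier to extend in later chapters:
 
 ```python
 # Continue adding in my_simple_agent.py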