Merge pull request #111 from jjyaoao/main

docs: update English version && full text review
jjyaoao 7 months ago
parent
commit
eefe292383
41 changed files with 29648 additions and 1620 deletions
  1. 27 23
      README.md
  2. 166 0
      README_EN.md
  3. 48 0
      docs/Preface.md
  4. 27 23
      docs/README.md
  5. 158 0
      docs/README_EN.md
  6. 29 0
      docs/_sidebar_en.md
  7. 621 0
      docs/chapter1/Chapter1-Introduction-to-Agents.md
  8. 87 83
      docs/chapter1/第一章 初识智能体.md
  9. 2444 0
      docs/chapter10/Chapter10-Agent-Communication-Protocols.md
  10. 185 181
      docs/chapter10/第十章 智能体通信协议.md
  11. 2696 0
      docs/chapter11/Chapter11-Agentic-RL.md
  12. 199 195
      docs/chapter11/第十一章 Agentic-RL.md
  13. 2766 0
      docs/chapter12/Chapter12-Agent-Performance-Evaluation.md
  14. 179 175
      docs/chapter12/第十二章 智能体性能评估.md
  15. 1583 0
      docs/chapter13/Chapter13-Intelligent-Travel-Assistant.md
  16. 153 149
      docs/chapter13/第十三章 智能旅行助手.md
  17. 2160 0
      docs/chapter14/Chapter14-Automated-Deep-Research-Agent.md
  18. 159 155
      docs/chapter14/第十四章 自动化深度研究智能体.md
  19. 1885 0
      docs/chapter15/Chapter15-Building-Cyber-Town.md
  20. 170 166
      docs/chapter15/第十五章 构建赛博小镇.md
  21. 1011 0
      docs/chapter16/Chapter16-Graduation-Project.md
  22. 139 135
      docs/chapter16/第十六章 毕业设计.md
  23. 567 0
      docs/chapter2/Chapter2-History-of-Agents.md
  24. 12 8
      docs/chapter2/第二章 智能体发展史.md
  25. 1014 0
      docs/chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md
  26. 45 41
      docs/chapter3/第三章 大语言模型基础.md
  27. 1309 0
      docs/chapter4/Chapter4-Building-Classic-Agent-Paradigms.md
  28. 83 79
      docs/chapter4/第四章 智能体经典范式构建.md
  29. 1065 0
      docs/chapter5/Chapter5-Building-Agents-with-Low-Code-Platforms.md
  30. 5 1
      docs/chapter5/第五章 基于低代码平台的智能体搭建.md
  31. 1343 0
      docs/chapter6/Chapter6-Framework-Development-Practice.md
  32. 4 0
      docs/chapter6/第六章 框架开发实践.md
  33. 2083 0
      docs/chapter7/Chapter7-Building-Your-Agent-Framework.md
  34. 62 58
      docs/chapter7/第七章 构建你的Agent框架.md
  35. 2083 0
      docs/chapter8/Chapter8-Memory-and-Retrieval.md
  36. 4 0
      docs/chapter8/第八章 记忆与检索.md
  37. 2816 0
      docs/chapter9/Chapter9-Context-Engineering.md
  38. 147 143
      docs/chapter9/第九章 上下文工程.md
  39. 0 0
      docs/images/5-figures/dify-12.png
  40. 106 1
      docs/index.html
  41. 8 4
      docs/前言.md

+ 27 - 23
README.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./README_EN.md">English</a> | 中文
+</div>
+
 <div align='center'>
   <img src="./docs/images/hello-agents.png" alt="alt text" width="100%">
   <h1>Hello-Agents</h1>
@@ -14,9 +18,9 @@
 
 ## 🎯 项目介绍
 
-&emsp;&emsp;如果说2024年是"百模大战"的元年,那么2025年无疑开启了"Agent元年"。技术的焦点正从训练更大的基础模型,转向构建更聪明的智能体应用。然而,当前系统性、重实践的教程却极度匮乏。为此,我们发起了 Hello-Agents 项目,希望能为社区提供一本从零开始、理论与实战并重的智能体系统构建指南。
+&emsp;&emsp;如果说 2024 年是"百模大战"的元年,那么 2025 年无疑开启了"Agent 元年"。技术的焦点正从训练更大的基础模型,转向构建更聪明的智能体应用。然而,当前系统性、重实践的教程却极度匮乏。为此,我们发起了 Hello-Agents 项目,希望能为社区提供一本从零开始、理论与实战并重的智能体系统构建指南。
 
-&emsp;&emsp;Hello-Agents 是Datawhale社区的<strong>系统性智能体学习教程</strong>。如今Agent构建主要分为两派,一派是Dify,Coze,n8n这类软件工程类Agent,其本质是流程驱动的软件开发,LLM作为数据处理的后端;另一派则是AI原生的Agent,即真正以AI驱动的Agent。本教程旨在带领大家深入理解并构建后者——真正的AI Native Agent。教程将带领你穿透框架表象,从智能体的核心原理出发,深入其核心架构,理解其经典范式,并最终亲手构建起属于自己的多智能体应用。我们相信,最好的学习方式就是动手实践。希望这本教程能成为你探索智能体世界的起点,能够从一名大语言模型的"使用者",蜕变为一名智能体系统的"构建者"。
+&emsp;&emsp;Hello-Agents 是 Datawhale 社区的<strong>系统性智能体学习教程</strong>。如今 Agent 构建主要分为两派,一派是 Dify,Coze,n8n 这类软件工程类 Agent,其本质是流程驱动的软件开发,LLM 作为数据处理的后端;另一派则是 AI 原生的 Agent,即真正以 AI 驱动的 Agent。本教程旨在带领大家深入理解并构建后者——真正的 AI Native Agent。教程将带领你穿透框架表象,从智能体的核心原理出发,深入其核心架构,理解其经典范式,并最终亲手构建起属于自己的多智能体应用。我们相信,最好的学习方式就是动手实践。希望这本教程能成为你探索智能体世界的起点,能够从一名大语言模型的"使用者",蜕变为一名智能体系统的"构建者"。
 
 ## 📚 快速开始
 
@@ -33,9 +37,9 @@
 - 📖 <strong>Datawhale 开源免费</strong> 完全免费学习本项目所有内容,与社区共同成长
 - 🔍 <strong>理解核心原理</strong> 深入理解智能体的概念、历史与经典范式
 - 🏗️ <strong>亲手实现</strong> 掌握热门低代码平台和智能体代码框架的使用
-- 🛠️ <strong>自研框架[HelloAgents](https://github.com/jjyaoao/helloagents)</strong> 基于Openai原生API从零构建一个自己的智能体框架
+- 🛠️ <strong>自研框架[HelloAgents](https://github.com/jjyaoao/helloagents)</strong> 基于 Openai 原生 API 从零构建一个自己的智能体框架
 - ⚙️ <strong>掌握高级技能</strong> 一步步实现上下文工程、Memory、协议、评估等系统性技术
-- 🤝 <strong>模型训练</strong> 掌握Agentic RL,从SFT到GRPO的全流程实战训练LLM
+- 🤝 <strong>模型训练</strong> 掌握 Agentic RL,从 SFT 到 GRPO 的全流程实战训练 LLM
 - 🚀 <strong>驱动真实案例</strong> 实战开发智能旅行助手、赛博小镇等综合项目
 - 📖 <strong>求职面试</strong> 学习智能体求职相关面试问题
 
@@ -47,20 +51,20 @@
 | <strong>第一部分:智能体与语言模型基础</strong> |  |  |
 | [第一章 初识智能体](./docs/chapter1/第一章%20初识智能体.md) | 智能体定义、类型、范式与应用 | ✅ |
 | [第二章 智能体发展史](./docs/chapter2/第二章%20智能体发展史.md) | 从符号主义到 LLM 驱动的智能体演进 | ✅ |
-| [第三章 大语言模型基础](./docs/chapter3/第三章%20大语言模型基础.md) | Transformer、提示、主流LLM及其局限 | ✅ |
+| [第三章 大语言模型基础](./docs/chapter3/第三章%20大语言模型基础.md) | Transformer、提示、主流 LLM 及其局限 | ✅ |
 | <strong>第二部分:构建你的大语言模型智能体</strong> |  |  |
 | [第四章 智能体经典范式构建](./docs/chapter4/第四章%20智能体经典范式构建.md) | 手把手实现 ReAct、Plan-and-Solve、Reflection | ✅ |
-| [第五章 基于低代码平台的智能体搭建](./docs/chapter5/第五章%20基于低代码平台的智能体搭建.md) | 了解Coze、Dify、n8n等低代码智能体平台使用 | ✅ |
+| [第五章 基于低代码平台的智能体搭建](./docs/chapter5/第五章%20基于低代码平台的智能体搭建.md) | 了解 Coze、Dify、n8n 等低代码智能体平台使用 | ✅ |
 | [第六章 框架开发实践](./docs/chapter6/第六章%20框架开发实践.md) | AutoGen、AgentScope、LangGraph 等主流框架应用 | ✅ |
-| [第七章 构建你的Agent框架](./docs/chapter7/第七章%20构建你的Agent框架.md) | 从0开始构建智能体框架 | ✅ |
+| [第七章 构建你的Agent框架](./docs/chapter7/第七章%20构建你的Agent框架.md) | 从 0 开始构建智能体框架 | ✅ |
 | <strong>第三部分:高级知识扩展</strong> |  |  |
 | [第八章 记忆与检索](./docs/chapter8/第八章%20记忆与检索.md) | 记忆系统,RAG,存储 | ✅ |
 | [第九章 上下文工程](./docs/chapter9/第九章%20上下文工程.md) | 持续交互的"情境理解" | ✅ |
 | [第十章 智能体通信协议](./docs/chapter10/第十章%20智能体通信协议.md) | MCP、A2A、ANP 等协议解析 | ✅ |
-| [第十一章 Agentic-RL](./docs/chapter11/第十一章%20Agentic-RL.md) | 从SFT到GRPO的LLM训练实战 | ✅ |
+| [第十一章 Agentic-RL](./docs/chapter11/第十一章%20Agentic-RL.md) | 从 SFT 到 GRPO 的 LLM 训练实战 | ✅ |
 | [第十二章 智能体性能评估](./docs/chapter12/第十二章%20智能体性能评估.md) | 核心指标、基准测试与评估框架 | ✅ |
 | <strong>第四部分:综合案例进阶</strong> |  |  |
-| [第十三章 智能旅行助手](./docs/chapter13/第十三章%20智能旅行助手.md) | MCP与多智能体协作的真实世界应用 | ✅ |
+| [第十三章 智能旅行助手](./docs/chapter13/第十三章%20智能旅行助手.md) | MCP 与多智能体协作的真实世界应用 | ✅ |
 | [第十四章 自动化深度研究智能体](./docs/chapter14/第十四章%20自动化深度研究智能体.md) | DeepResearch Agent 复现与解析 | ✅ |
 | [第十五章 构建赛博小镇](./docs/chapter15/第十五章%20构建赛博小镇.md) | Agent 与游戏的结合,模拟社会动态 | ✅ |
 | <strong>第五部分:毕业设计及未来展望</strong> |  |  |
@@ -68,11 +72,11 @@
 
 ### 社区贡献精选 (Community Blog)
 
-&emsp;&emsp;欢迎大家将在学习 Hello-Agents 或 Agent 相关技术中的独到见解、实践总结,以 PR 的形式贡献到社区精选。如果是独立于正文的内容,也可以投稿至Extra-Chapter!<strong>期待你的第一次贡献!</strong>
+&emsp;&emsp;欢迎大家将在学习 Hello-Agents 或 Agent 相关技术中的独到见解、实践总结,以 PR 的形式贡献到社区精选。如果是独立于正文的内容,也可以投稿至 Extra-Chapter!<strong>期待你的第一次贡献!</strong>
 
 | 社区精选 | 内容总结 |
 | --- | --- |
-| [01-Agent面试题总结](./Extra-Chapter/Extra01-面试问题总结.md) | Agent岗位相关面试问题 |
+| [01-Agent面试题总结](./Extra-Chapter/Extra01-面试问题总结.md) | Agent 岗位相关面试问题 |
 | [01-Agent面试题答案](./Extra-Chapter/Extra01-参考答案.md) | 相关面试问题答案 |
 | [02-上下文工程内容补充](./Extra-Chapter/Extra02-上下文工程补充知识.md) | 上下文工程内容扩展 |
 
@@ -87,22 +91,22 @@
 
 &emsp;&emsp;欢迎你,未来的智能系统构建者!在开启这段激动人心的旅程之前,请允许我们给你一些清晰的指引。
 
-&emsp;&emsp;本项目内容兼顾理论与实战,旨在帮助你系统性地掌握从单个智能体到多智能体系统的设计与开发全流程。因此,尤其适合有一定编程基础的 <strong>AI开发者、软件工程师、在校学生</strong> 以及对前沿 AI 技术抱有浓厚兴趣的 <strong>自学者</strong>。在学习本项目之前,我们希望你具备基础的 Python 编程能力,并对大语言模型有基本的概念性了解(例如,知道如何通过 API 调用一个 LLM)。项目的重点是应用与构建,因此你无需具备深厚的算法或模型训练背景。
+&emsp;&emsp;本项目内容兼顾理论与实战,旨在帮助你系统性地掌握从单个智能体到多智能体系统的设计与开发全流程。因此,尤其适合有一定编程基础的 <strong>AI 开发者、软件工程师、在校学生</strong> 以及对前沿 AI 技术抱有浓厚兴趣的 <strong>自学者</strong>。在学习本项目之前,我们希望你具备基础的 Python 编程能力,并对大语言模型有基本的概念性了解(例如,知道如何通过 API 调用一个 LLM)。项目的重点是应用与构建,因此你无需具备深厚的算法或模型训练背景。
 
 &emsp;&emsp;项目分为五大部分,每一部分都是通往下一阶段的坚实阶梯:
 
-- <strong>第一部分:智能体与语言模型基础</strong>(第1章~第3章),我们将从智能体的定义、类型与发展历史讲起,为你梳理"智能体"这一概念的来龙去脉。随后,我们会快速巩固大语言模型的核心知识,为你的实践之旅打下坚实的理论地基。
+- <strong>第一部分:智能体与语言模型基础</strong>(第一章~第三章),我们将从智能体的定义、类型与发展历史讲起,为你梳理"智能体"这一概念的来龙去脉。随后,我们会快速巩固大语言模型的核心知识,为你的实践之旅打下坚实的理论地基。
 
-- <strong>第二部分:构建你的大语言模型智能体</strong>(第4章~第7章),这是你动手实践的起点。你将亲手实现 ReAct 等经典范式,体验 Coze 等低代码平台的便捷,并掌握 Langgraph 等主流框架的应用。最终,我们还会带你从零开始构建一个属于自己的智能体框架,让你兼具“用轮子”与“造轮子”的能力。
+- <strong>第二部分:构建你的大语言模型智能体</strong>(第四章~第七章),这是你动手实践的起点。你将亲手实现 ReAct 等经典范式,体验 Coze 等低代码平台的便捷,并掌握 Langgraph 等主流框架的应用。最终,我们还会带你从零开始构建一个属于自己的智能体框架,让你兼具“用轮子”与“造轮子”的能力。
 
-- <strong>第三部分:高级知识扩展</strong>(第8章~第12章),在这一部分,你的智能体将“学会”思考与协作。我们将使用第二部分的自研框架,深入探索记忆与检索、上下文工程、Agent训练等核心技术,并学习多智能体间的通信协议。最终,你将掌握评估智能体系统性能的专业方法。
+- <strong>第三部分:高级知识扩展</strong>(第八章~第十二章),在这一部分,你的智能体将“学会”思考与协作。我们将使用第二部分的自研框架,深入探索记忆与检索、上下文工程、Agent 训练等核心技术,并学习多智能体间的通信协议。最终,你将掌握评估智能体系统性能的专业方法。
 
-- <strong>第四部分:综合案例进阶</strong>(第13章~第15章),这里是理论与实践的交汇点。你将把所学融会贯通,亲手打造智能旅行助手、自动化深度研究智能体,乃至一个模拟社会动态的赛博小镇,在真实有趣的项目中淬炼你的构建能力。
+- <strong>第四部分:综合案例进阶</strong>(第十三章~第十五章),这里是理论与实践的交汇点。你将把所学融会贯通,亲手打造智能旅行助手、自动化深度研究智能体,乃至一个模拟社会动态的赛博小镇,在真实有趣的项目中淬炼你的构建能力。
 
-- <strong>第五部分:毕业设计及未来展望</strong>(第16章),在旅程的终点,你将迎来一个毕业设计,构建一个完整的、属于你自己的多智能体应用,全面检验你的学习成果。我们还将与你一同展望智能体的未来,探索激动人心的前沿方向。
+- <strong>第五部分:毕业设计及未来展望</strong>(第十六章),在旅程的终点,你将迎来一个毕业设计,构建一个完整的、属于你自己的多智能体应用,全面检验你的学习成果。我们还将与你一同展望智能体的未来,探索激动人心的前沿方向。
 
 
-&emsp;&emsp;智能体是一个飞速发展且极度依赖实践的领域。为了获得最佳的学习效果,我们在项目的`code`文件夹内提供了配套的全部代码,强烈建议你<strong>将理论与实践相结合</strong>。请务必亲手运行、调试甚至修改项目里提供的每一份代码。欢迎你随时关注 Datawhale 以及其他 Agent相关社区,当遇到问题时,你可以随时在本项目的 issue 区提问。
+&emsp;&emsp;智能体是一个飞速发展且极度依赖实践的领域。为了获得最佳的学习效果,我们在项目的`code`文件夹内提供了配套的全部代码,强烈建议你<strong>将理论与实践相结合</strong>。请务必亲手运行、调试甚至修改项目里提供的每一份代码。欢迎你随时关注 Datawhale 以及其他 Agent 相关社区,当遇到问题时,你可以随时在本项目的 issue 区提问。
 
 &emsp;&emsp;现在,准备好进入智能体的奇妙世界了吗?让我们即刻启程!
 
@@ -118,15 +122,15 @@
 ## 🙏 致谢
 
 ### 核心贡献者
-- [陈思州-项目负责人](https://github.com/jjyaoao) (Datawhale成员, 全文写作和校对)
-- [孙韬-项目负责人](https://github.com/fengju0213) (Datawhale成员, 第九章内容和校对)  
-- [姜舒凡-项目负责人](https://github.com/Tsumugii24)(Datawhale成员, 章节习题设计和校对)
-- [黄佩林-Datawhale意向成员](https://github.com/HeteroCat) (Agent开发工程师, 第五章内容贡献者)
+- [陈思州-项目负责人](https://github.com/jjyaoao) (Datawhale 成员, 全文写作和校对)
+- [孙韬-项目负责人](https://github.com/fengju0213) (Datawhale 成员, 第九章内容和校对)  
+- [姜舒凡-项目负责人](https://github.com/Tsumugii24)(Datawhale 成员, 章节习题设计和校对)
+- [黄佩林-Datawhale意向成员](https://github.com/HeteroCat) (Agent 开发工程师, 第五章内容贡献者)
 - [曾鑫民-Agent工程师](https://github.com/fancyboi999) (牛客科技, 第十四章案例开发)
 
 ### Extra-Chapter 贡献者
 - [WH](https://github.com/WHQAQ11) (内容贡献者)
-- [周奥杰-DW贡献者团队](https://github.com/thunderbolt-fire) (西安交通大学, Extra02内容贡献)
+- [周奥杰-DW贡献者团队](https://github.com/thunderbolt-fire) (西安交通大学, Extra02 内容贡献)
 
 ### 特别感谢
 - 感谢 [@Sm1les](https://github.com/Sm1les) 对本项目的帮助与支持

+ 166 - 0
README_EN.md

@@ -0,0 +1,166 @@
+<div align="right">
+  English | <a href="./README.md">中文</a>
+</div>
+
+<div align='center'>
+  <img src="./docs/images/hello-agents.png" alt="alt text" width="100%">
+  <h1>Hello-Agents</h1>
+  <h3>🤖 Building Agent Systems from Scratch: Principles and Practice</h3>
+  <p><em>From foundational theory to practical applications, master the design and implementation of agent systems</em></p>
+  <img src="https://img.shields.io/github/stars/datawhalechina/Hello-Agents?style=flat&logo=github" alt="GitHub stars"/>
+  <img src="https://img.shields.io/github/forks/datawhalechina/Hello-Agents?style=flat&logo=github" alt="GitHub forks"/>
+  <img src="https://img.shields.io/badge/language-English-brightgreen?style=flat" alt="Language"/>
+  <a href="https://github.com/datawhalechina/Hello-Agents"><img src="https://img.shields.io/badge/GitHub-Project-blue?style=flat&logo=github" alt="GitHub Project"></a>
+  <a href="https://datawhalechina.github.io/hello-agents/"><img src="https://img.shields.io/badge/Online%20Reading-green?style=flat&logo=gitbook" alt="Online Reading"></a>
+</div>
+
+---
+
+## 🎯 Project Introduction
+
+&emsp;&emsp;If 2024 was the year of the "Battle of a Hundred Models," then 2025 has undoubtedly ushered in the "Year of Agents." The focus of technology is shifting from training larger foundation models to building smarter agent applications. However, systematic, practice-oriented tutorials are extremely scarce. For this reason, we launched the Hello-Agents project, hoping to provide the community with a comprehensive guide to building agent systems from scratch, balancing theory and practice.
+
+&emsp;&emsp;Hello-Agents is a **systematic agent learning tutorial** from the Datawhale community. Today, agent development is mainly divided into two schools: one is software engineering-oriented agents like Dify, Coze, and n8n, which are essentially process-driven software development with LLMs serving as data processing backends; the other is AI-native agents, truly AI-driven agents. This tutorial aims to lead you to deeply understand and build the latter—truly AI Native Agents. The tutorial will guide you through the surface of frameworks, starting from the core principles of agents, delving into their core architecture, understanding their classic paradigms, and ultimately building your own multi-agent applications. We believe that the best way to learn is through hands-on practice. We hope this tutorial can be your starting point for exploring the world of agents, transforming you from a "user" of large language models to a "builder" of agent systems.
+
+## 📚 Quick Start
+
+### Online Reading
+**[🌐 Click here to start reading online](https://datawhalechina.github.io/hello-agents/)** - No download required, learn anytime, anywhere
+
+**[📖 Cookbook (Beta)](https://book.heterocat.com.cn/)**
+
+### Local Reading
+If you wish to read locally or contribute content, please refer to the learning guide below.
+
+### ✨ What Will You Gain?
+
+- 📖 **Datawhale Open Source & Free** - Learn all project content completely free, grow with the community
+- 🔍 **Understand Core Principles** - Deeply understand agent concepts, history, and classic paradigms
+- 🏗️ **Hands-on Implementation** - Master popular low-code platforms and agent code frameworks
+- 🛠️ **Self-developed Framework [HelloAgents](https://github.com/jjyaoao/helloagents)** - Build your own agent framework from scratch based on OpenAI native API
+- ⚙️ **Master Advanced Skills** - Step-by-step implementation of context engineering, Memory, protocols, evaluation, and other systematic technologies
+- 🤝 **Model Training** - Master Agentic RL, from SFT to GRPO full-process practical LLM training
+- 🚀 **Drive Real Cases** - Practical development of intelligent travel assistants, cyber towns, and other comprehensive projects
+- 📖 **Job Interviews** - Learn agent-related interview questions for job hunting
+
+## 📖 Content Navigation
+
+| Chapter | Key Content | Status |
+| --- | --- | --- |
+| [Preface](./docs/Preface.md) | Project origin, background, and reader suggestions | ✅ |
+| **Part 1: Agent and Language Model Fundamentals** |  |  |
+| [Chapter 1: Introduction to Agents](./docs/chapter1/Chapter1-Introduction-to-Agents.md) | Agent definition, types, paradigms, and applications | ✅ |
+| [Chapter 2: History of Agents](./docs/chapter2/Chapter2-History-of-Agents.md) | Evolution from symbolism to LLM-driven agents | ✅ |
+| [Chapter 3: Large Language Model Fundamentals](./docs/chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md) | Transformer, prompts, mainstream LLMs and their limitations | ✅ |
+| **Part 2: Building Your LLM Agent** |  |  |
+| [Chapter 4: Classic Agent Paradigm Construction](./docs/chapter4/Chapter4-Building-Classic-Agent-Paradigms.md) | Hands-on implementation of ReAct, Plan-and-Solve, Reflection | ✅ |
+| [Chapter 5: Low-Code Platform Agent Development](./docs/chapter5/Chapter5-Building-Agents-with-Low-Code-Platforms.md) | Understanding Coze, Dify, n8n and other low-code agent platforms | ✅ |
+| [Chapter 6: Framework Development Practice](./docs/chapter6/Chapter6-Framework-Development-Practice.md) | AutoGen, AgentScope, LangGraph and other mainstream framework applications | ✅ |
+| [Chapter 7: Building Your Agent Framework](./docs/chapter7/Chapter7-Building-Your-Agent-Framework.md) | Building an agent framework from scratch | ✅ |
+| **Part 3: Advanced Knowledge Extension** |  |  |
+| [Chapter 8: Memory and Retrieval](./docs/chapter8/Chapter8-Memory-and-Retrieval.md) | Memory systems, RAG, storage | ✅ |
+| [Chapter 9: Context Engineering](./docs/chapter9/Chapter9-Context-Engineering.md) | "Contextual understanding" for continuous interaction | ✅ |
+| [Chapter 10: Agent Communication Protocols](./docs/chapter10/Chapter10-Agent-Communication-Protocols.md) | MCP, A2A, ANP and other protocol analysis | ✅ |
+| [Chapter 11: Agentic-RL](./docs/chapter11/Chapter11-Agentic-RL.md) | Practical LLM training from SFT to GRPO | ✅ |
+| [Chapter 12: Agent Performance Evaluation](./docs/chapter12/Chapter12-Agent-Performance-Evaluation.md) | Core metrics, benchmarks, and evaluation frameworks | ✅ |
+| **Part 4: Comprehensive Case Studies** |  |  |
+| [Chapter 13: Intelligent Travel Assistant](./docs/chapter13/Chapter13-Intelligent-Travel-Assistant.md) | Real-world applications of MCP and multi-agent collaboration | ✅ |
+| [Chapter 14: Automated Deep Research Agent](./docs/chapter14/Chapter14-Automated-Deep-Research-Agent.md) | DeepResearch Agent reproduction and analysis | ✅ |
+| [Chapter 15: Building a Cyber Town](./docs/chapter15/Chapter15-Building-Cyber-Town.md) | Combining agents with games, simulating social dynamics | ✅ |
+| **Part 5: Capstone Project and Future Outlook** |  |  |
+| [Chapter 16: Capstone Project](./docs/chapter16/Chapter16-Graduation-Project.md) | Build your own complete multi-agent application | ✅ |
+
+### Community Contributions
+
+&emsp;&emsp;We welcome everyone to contribute their unique insights and practical summaries from learning Hello-Agents or Agent-related technologies to the community selection in the form of PRs. If the content is independent of the main text, you can also submit it to Extra-Chapter! **Looking forward to your first contribution!**
+
+| Community Selection | Content Summary |
+| --- | --- |
+| [01-Agent Interview Questions Summary](./Extra-Chapter/Extra01-Interview-Questions-EN.md) | Agent position-related interview questions |
+| [01-Agent Interview Answers](./Extra-Chapter/Extra01-Interview-Answers-EN.md) | Related interview question answers |
+| [02-Context Engineering Supplement](./Extra-Chapter/Extra02-Context-Engineering-Supplement-EN.md) | Context engineering content extension |
+
+### PDF Version Download
+
+&emsp;&emsp;*<strong>This Hello-Agents PDF tutorial is completely open source and free. To prevent various marketing accounts from adding watermarks and selling it to multi-agent system beginners, we have pre-added a Datawhale open source logo watermark that does not affect reading in the PDF file. Please understand~</strong>*
+
+> *Hello-Agents PDF: https://github.com/datawhalechina/Hello-Agents/releases/tag/PDF (Not yet completed)*  
+> *Hello-Agents PDF Domestic Download: https://www.datawhale.cn/learn/summary/XXX* 
+
+## 💡 How to Learn
+
+&emsp;&emsp;Welcome, future builder of intelligent systems! Before embarking on this exciting journey, please allow us to give you some clear guidance.
+
+&emsp;&emsp;This project balances theory and practice, aiming to help you systematically master the entire process of designing and developing from single agents to multi-agent systems. Therefore, it is especially suitable for **AI developers, software engineers, students** with some programming foundation, as well as **self-learners** with a strong interest in cutting-edge AI technology. Before learning this project, we hope you have basic Python programming skills and a basic conceptual understanding of large language models (for example, knowing how to call an LLM through an API). The focus of the project is on application and construction, so you do not need a deep background in algorithms or model training.
+
+&emsp;&emsp;The project is divided into five major parts, each being a solid step towards the next stage:
+
+- **Part 1: Agent and Language Model Fundamentals** (Chapters 1-3), we will start from the definition, types, and development history of agents, sorting out the ins and outs of the concept of "agents." Then, we will quickly consolidate the core knowledge of large language models, laying a solid theoretical foundation for your practical journey.
+
+- **Part 2: Building Your LLM Agent** (Chapters 4-7), this is the starting point of your hands-on practice. You will personally implement classic paradigms such as ReAct, experience the convenience of low-code platforms like Coze, and master the application of mainstream frameworks like Langgraph. Finally, we will also guide you to build your own agent framework from scratch, giving you the ability to both "use wheels" and "build wheels."
+
+- **Part 3: Advanced Knowledge Extension** (Chapters 8-12), in this part, your agent will "learn" to think and collaborate. We will use the self-developed framework from Part 2 to deeply explore core technologies such as memory and retrieval, context engineering, and Agent training, and learn communication protocols between multi-agents. Finally, you will master professional methods for evaluating agent system performance.
+
+- **Part 4: Comprehensive Case Studies** (Chapters 13-15), this is the intersection of theory and practice. You will integrate what you have learned and personally create intelligent travel assistants, automated deep research agents, and even a cyber town that simulates social dynamics, tempering your construction ability in real and interesting projects.
+
+- **Part 5: Capstone Project and Future Outlook** (Chapter 16), at the end of the journey, you will face a capstone project, building a complete multi-agent application of your own, comprehensively testing your learning outcomes. We will also look forward to the future of agents with you, exploring exciting frontier directions.
+
+&emsp;&emsp;Agents are a rapidly developing field that is extremely dependent on practice. To achieve the best learning effect, we provide all supporting code in the project's `code` folder. We strongly recommend that you **combine theory with practice**. Please be sure to personally run, debug, and even modify every piece of code provided in the project. You are welcome to follow Datawhale and other Agent-related communities at any time. When you encounter problems, you can ask questions in the issue area of this project at any time.
+
+&emsp;&emsp;Now, are you ready to enter the wonderful world of agents? Let's start right away!
+
+## 🤝 How to Contribute
+
+We are an open-source community and welcome any form of contribution!
+
+- 🐛 **Report Bugs** - Found content or code issues, please submit an Issue
+- 💡 **Make Suggestions** - Have good ideas for the project, welcome to initiate discussions
+- 📝 **Improve Content** - Help improve the tutorial, submit your Pull Request
+- ✍️ **Share Practice** - Share your learning notes and projects in "Community Contributions"
+
+## 🙏 Acknowledgments
+
+### Core Contributors
+- [Chen Sizhou - Project Lead](https://github.com/jjyaoao) (Datawhale member, full text writing and proofreading)
+- [Sun Tao - Project Lead](https://github.com/fengju0213) (Datawhale member, Chapter 9 content and proofreading)  
+- [Jiang Shufan - Project Lead](https://github.com/Tsumugii24) (Datawhale member, chapter exercise design and proofreading)
+- [Huang Peilin - Datawhale Prospective Member](https://github.com/HeteroCat) (Agent Development Engineer, Chapter 5 content contributor)
+- [Zeng Xinmin - Agent Engineer](https://github.com/fancyboi999) (Niuke Technology, Chapter 14 case development)
+
+### Extra-Chapter Contributors
+- [WH](https://github.com/WHQAQ11) (Content contributor)
+- [Zhou Aojie - DW Contributor Team](https://github.com/thunderbolt-fire) (Xi'an Jiaotong University, Extra02 content contribution)
+
+### Special Thanks
+- Thanks to [@Sm1les](https://github.com/Sm1les) for help and support for this project
+- Thanks to all developers who have contributed to this project ❤️
+
+<div align=center style="margin-top: 30px;">
+  <a href="https://github.com/datawhalechina/Hello-Agents/graphs/contributors">
+    <img src="https://contrib.rocks/image?repo=datawhalechina/Hello-Agents" />
+  </a>
+</div>
+
+## Star History
+
+<div align='center'>
+    <img src="./docs/images/star-history-2025111.png" alt="Datawhale" width="90%">
+</div>
+
+<div align="center">
+  <p>⭐ If this project helps you, please give us a Star!</p>
+</div>
+
+## About Datawhale
+
+<div align='center'>
+    <img src="./docs/images/datawhale.png" alt="Datawhale" width="30%">
+    <p>Scan the QR code to follow the Datawhale official account and get more high-quality open source content</p>
+</div>
+
+---
+
+## 📜 Open Source License
+
+This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).
+

+ 48 - 0
docs/Preface.md

@@ -0,0 +1,48 @@
+<div align="right">
+  English | <a href="./前言.md">中文</a>
+</div>
+
+# Preface
+
+Since the end of 2022, Large Language Models (LLMs) represented by ChatGPT have swept across the world like a technological tsunami, completely transforming how we interact with artificial intelligence. The powerful natural language understanding and generation capabilities of LLMs have shown us a glimpse of the path toward Artificial General Intelligence (AGI). However, as the initial amazement settled, developers began exploring the next frontier: how to make AI not just a "question-answering" tool, but an "actor" capable of autonomous planning, tool invocation, and solving complex problems?
+
+The answer is **Agents**.
+
+If 2024 was the inaugural year of the "battle of a hundred models," then 2025 has undoubtedly ushered in the "Year of Agents." We see that the technological focus is shifting from training larger and more powerful foundation models to building smarter and more efficient agent applications. Individual agents can already handle tasks in specific domains, while Multi-Agent Systems (MAS), where multiple agents collaborate through division of labor, cooperation, and even debate to accomplish grand goals, are viewed as the key to unlocking the full potential of LLMs and solving complex real-world problems.
+
+However, there is an obvious gap in the current ecosystem: on one hand, there is a dizzying array of Agent frameworks and applications emerging continuously; on the other hand, there is an extreme scarcity of systematic knowledge. Most tutorials focus on API calls for specific frameworks, leaving learners "knowing how but not knowing why," still feeling powerless when facing complex requirements. We lack a practical guide that can penetrate framework appearances, start from first principles, and systematically explain agent design, construction, and collaboration.
+
+In view of this, we launched the Hello-Agents project, hoping to provide the community with a guide for building agent systems from scratch, balancing theory and practice. We will not only lead you to appreciate the most cutting-edge technologies in the agent field but also guide you to delve into their core architecture, understand their classic paradigms, and ultimately build your own multi-agent applications with your own hands.
+
+We believe that the best way to learn is through hands-on practice. We hope this tutorial can become your starting point for exploring the world of agents, enabling you to transform from a "user" of large language models to a "builder" of agent systems.
+
+## Suggestions for Readers
+
+Welcome, future intelligent system builder! Before embarking on this exciting journey, please allow us to give you some small suggestions.
+
+Before reading this project, we hope you:
+
+- Have basic Python programming skills.
+
+- Have a basic conceptual understanding of large language models (for example, know how to obtain LLM APIs).
+
+- Rest assured, you don't need a deep background in algorithms or model training; the project focuses on application and construction.
+
+This project is divided into five parts, covering basics to practice, progressing step by step, layer by layer:
+
+**Part One (Fundamentals)**: We will lay the foundation of core knowledge about artificial intelligence and LLMs, giving you a macro understanding of the background of agent emergence.
+
+**Part Two (Single Agent)**: This is where your hands-on practice begins. We will guide you to build a fully functional single agent from scratch, deeply understanding its internal "mental" structure.
+
+**Part Three (Advanced)**: Here, your agent will "learn" to think, possess memory and tools, and master communication protocols between agents, ultimately completing the evaluation closed loop.
+
+**Part Four (Practice)**: This is where the core value of the project lies. You will integrate all learned knowledge through a series of carefully designed comprehensive cases, tempering true gold in practice.
+
+**Part Five (Outlook)**: The end of the journey is a new beginning. You will personally create your "graduation project," drawing a perfect conclusion to your learning journey.
+
+"What is learned on paper is superficial; to truly understand, one must practice." To achieve the best learning effect, we provide all supporting code in the project's `code` folder. We strongly recommend combining theory with practice. Please be sure to personally run, debug, and even modify every piece of code provided in the project. We encourage you to apply what you've learned to real scenarios that interest you—this is the ultimate purpose of learning.
+
+Finally, as an open-source project, we warmly welcome your participation and contribution. When you encounter problems, you can ask questions in our community; when you have new ideas or discoveries, you are also welcome to join the project's co-construction at any time.
+
+Thank you for choosing to read Hello-Agents. We wish you happy learning and unlimited exploration!
+

+ 27 - 23
docs/README.md

@@ -1,3 +1,6 @@
+<div align="right">
+  <a href="./README_EN.md">English</a> | 中文
+</div>
 <div align='center'>
   <img src="./images/hello-agents.png" alt="alt text" width="100%">
   <h1>Hello-Agents</h1>
@@ -10,13 +13,14 @@
   <a href="https://datawhalechina.github.io/hello-agents/"><img src="https://img.shields.io/badge/在线阅读-Online%20Reading-green?style=flat&logo=gitbook" alt="Online Reading"></a>
 </div>
 
+
 ---
 
 ## 🎯 项目介绍
 
-&emsp;&emsp;如果说2024年是"百模大战"的元年,那么2025年无疑开启了"Agent元年"。技术的焦点正从训练更大的基础模型,转向构建更聪明的智能体应用。然而,当前系统性、重实践的教程却极度匮乏。为此,我们发起了 Hello-Agents 项目,希望能为社区提供一本从零开始、理论与实战并重的智能体系统构建指南。
+&emsp;&emsp;如果说 2024 年是"百模大战"的元年,那么 2025 年无疑开启了"Agent 元年"。技术的焦点正从训练更大的基础模型,转向构建更聪明的智能体应用。然而,当前系统性、重实践的教程却极度匮乏。为此,我们发起了 Hello-Agents 项目,希望能为社区提供一本从零开始、理论与实战并重的智能体系统构建指南。
 
-&emsp;&emsp;Hello-Agents 是Datawhale社区的<strong>系统性智能体学习教程</strong>。如今Agent构建主要分为两派,一派是Dify,Coze,n8n这类软件工程类Agent,其本质是流程驱动的软件开发,LLM作为数据处理的后端;另一派则是AI原生的Agent,即真正以AI驱动的Agent。本教程旨在带领大家深入理解并构建后者——真正的AI Native Agent。教程将带领你穿透框架表象,从智能体的核心原理出发,深入其核心架构,理解其经典范式,并最终亲手构建起属于自己的多智能体应用。我们相信,最好的学习方式就是动手实践。希望这本教程能成为你探索智能体世界的起点,能够从一名大语言模型的"使用者",蜕变为一名智能体系统的"构建者"。
+&emsp;&emsp;Hello-Agents 是 Datawhale 社区的<strong>系统性智能体学习教程</strong>。如今 Agent 构建主要分为两派,一派是 Dify,Coze,n8n 这类软件工程类 Agent,其本质是流程驱动的软件开发,LLM 作为数据处理的后端;另一派则是 AI 原生的 Agent,即真正以 AI 驱动的 Agent。本教程旨在带领大家深入理解并构建后者——真正的 AI Native Agent。教程将带领你穿透框架表象,从智能体的核心原理出发,深入其核心架构,理解其经典范式,并最终亲手构建起属于自己的多智能体应用。我们相信,最好的学习方式就是动手实践。希望这本教程能成为你探索智能体世界的起点,能够从一名大语言模型的"使用者",蜕变为一名智能体系统的"构建者"。
 
 ## 🌐 在线阅读
 
@@ -29,9 +33,9 @@
 - 📖 <strong>Datawhale 开源免费</strong> 完全免费学习本项目所有内容,与社区共同成长
 - 🔍 <strong>理解核心原理</strong> 深入理解智能体的概念、历史与经典范式
 - 🏗️ <strong>亲手实现</strong> 掌握热门低代码平台和智能体代码框架的使用
-- 🛠️ <strong>自研框架[HelloAgents](https://github.com/jjyaoao/helloagents)</strong> 基于Openai原生API从零构建一个自己的智能体框架
+- 🛠️ <strong>自研框架[HelloAgents](https://github.com/jjyaoao/helloagents)</strong> 基于 Openai 原生 API 从零构建一个自己的智能体框架
 - ⚙️ <strong>掌握高级技能</strong> 一步步实现上下文工程、Memory、协议、评估等系统性技术
-- 🤝 <strong>模型训练</strong> 掌握Agentic RL,从SFT到GRPO的全流程实战训练LLM
+- 🤝 <strong>模型训练</strong> 掌握 Agentic RL,从 SFT 到 GRPO 的全流程实战训练 LLM
 - 🚀 <strong>驱动真实案例</strong> 实战开发智能旅行助手、赛博小镇等综合项目
 - 📖 <strong>求职面试</strong> 学习智能体求职相关面试问题
 
@@ -43,20 +47,20 @@
 | <strong>第一部分:智能体与语言模型基础</strong> |  |  |
 | [第一章 初识智能体](./chapter1/第一章%20初识智能体.md) | 智能体定义、类型、范式与应用 | ✅ |
 | [第二章 智能体发展史](./chapter2/第二章%20智能体发展史.md) | 从符号主义到 LLM 驱动的智能体演进 | ✅ |
-| [第三章 大语言模型基础](./chapter3/第三章%20大语言模型基础.md) | Transformer、提示、主流LLM及其局限 | ✅ |
+| [第三章 大语言模型基础](./chapter3/第三章%20大语言模型基础.md) | Transformer、提示、主流 LLM 及其局限 | ✅ |
 | <strong>第二部分:构建你的大语言模型智能体</strong> |  |  |
 | [第四章 智能体经典范式构建](./chapter4/第四章%20智能体经典范式构建.md) | 手把手实现 ReAct、Plan-and-Solve、Reflection | ✅ |
-| [第五章 基于低代码平台的智能体搭建](./chapter5/第五章%20基于低代码平台的智能体搭建.md) | 了解Coze、Dify、n8n等低代码智能体平台使用 | ✅ |
+| [第五章 基于低代码平台的智能体搭建](./chapter5/第五章%20基于低代码平台的智能体搭建.md) | 了解 Coze、Dify、n8n 等低代码智能体平台使用 | ✅ |
 | [第六章 框架开发实践](./chapter6/第六章%20框架开发实践.md) | AutoGen、AgentScope、LangGraph 等主流框架应用 | ✅ |
-| [第七章 构建你的Agent框架](./chapter7/第七章%20构建你的Agent框架.md) | 从0开始构建智能体框架 | ✅ |
+| [第七章 构建你的Agent框架](./chapter7/第七章%20构建你的Agent框架.md) | 从 0 开始构建智能体框架 | ✅ |
 | <strong>第三部分:高级知识扩展</strong> |  |  |
 | [第八章 记忆与检索](./chapter8/第八章%20记忆与检索.md) | 记忆系统,RAG,存储 | ✅ |
 | [第九章 上下文工程](./chapter9/第九章%20上下文工程.md) | 持续交互的"情境理解" | ✅ |
 | [第十章 智能体通信协议](./chapter10/第十章%20智能体通信协议.md) | MCP、A2A、ANP 等协议解析 | ✅ |
-| [第十一章 Agentic-RL](./chapter11/第十一章%20Agentic-RL.md) | 从SFT到GRPO的LLM训练实战 | ✅ |
+| [第十一章 Agentic-RL](./chapter11/第十一章%20Agentic-RL.md) | 从 SFT 到 GRPO 的 LLM 训练实战 | ✅ |
 | [第十二章 智能体性能评估](./chapter12/第十二章%20智能体性能评估.md) | 核心指标、基准测试与评估框架 | ✅ |
 | <strong>第四部分:综合案例进阶</strong> |  |  |
-| [第十三章 智能旅行助手](./chapter13/第十三章%20智能旅行助手.md) | MCP与多智能体协作的真实世界应用 | ✅ |
+| [第十三章 智能旅行助手](./chapter13/第十三章%20智能旅行助手.md) | MCP 与多智能体协作的真实世界应用 | ✅ |
 | [第十四章 自动化深度研究智能体](./chapter14/第十四章%20自动化深度研究智能体.md) | DeepResearch Agent 复现与解析 | ✅ |
 | [第十五章 构建赛博小镇](./chapter15/第十五章%20构建赛博小镇.md) | Agent 与游戏的结合,模拟社会动态 | ✅ |
 | <strong>第五部分:毕业设计及未来展望</strong> |  |  |
@@ -64,11 +68,11 @@
 
 ### 社区贡献精选 (Community Blog)
 
-&emsp;&emsp;欢迎大家将在学习 Hello-Agents 或 Agent 相关技术中的独到见解、实践总结,以 PR 的形式贡献到社区精选。如果是独立于正文的内容,也可以投稿至Extra-Chapter!<strong>期待你的第一次贡献!</strong>
+&emsp;&emsp;欢迎大家将在学习 Hello-Agents 或 Agent 相关技术中的独到见解、实践总结,以 PR 的形式贡献到社区精选。如果是独立于正文的内容,也可以投稿至 Extra-Chapter!<strong>期待你的第一次贡献!</strong>
 
 | 社区精选 | 内容总结 |
 | --- | --- |
-| [01-Agent面试题总结](../Extra-Chapter/Extra01-面试问题总结.md) | Agent岗位相关面试问题 |
+| [01-Agent面试题总结](../Extra-Chapter/Extra01-面试问题总结.md) | Agent 岗位相关面试问题 |
 | [01-Agent面试题答案](../Extra-Chapter/Extra01-参考答案.md) | 相关面试问题答案 |
 | [02-上下文工程内容补充](../Extra-Chapter/Extra02-上下文工程补充知识.md) | 上下文工程内容扩展 |
 
@@ -83,22 +87,22 @@
 
 &emsp;&emsp;欢迎你,未来的智能系统构建者!在开启这段激动人心的旅程之前,请允许我们给你一些清晰的指引。
 
-&emsp;&emsp;本项目内容兼顾理论与实战,旨在帮助你系统性地掌握从单个智能体到多智能体系统的设计与开发全流程。因此,尤其适合有一定编程基础的 <strong>AI开发者、软件工程师、在校学生</strong> 以及对前沿 AI 技术抱有浓厚兴趣的 <strong>自学者</strong>。在学习本项目之前,我们希望你具备基础的 Python 编程能力,并对大语言模型有基本的概念性了解(例如,知道如何通过 API 调用一个 LLM)。项目的重点是应用与构建,因此你无需具备深厚的算法或模型训练背景。
+&emsp;&emsp;本项目内容兼顾理论与实战,旨在帮助你系统性地掌握从单个智能体到多智能体系统的设计与开发全流程。因此,尤其适合有一定编程基础的 <strong>AI 开发者、软件工程师、在校学生</strong> 以及对前沿 AI 技术抱有浓厚兴趣的 <strong>自学者</strong>。在学习本项目之前,我们希望你具备基础的 Python 编程能力,并对大语言模型有基本的概念性了解(例如,知道如何通过 API 调用一个 LLM)。项目的重点是应用与构建,因此你无需具备深厚的算法或模型训练背景。
 
 &emsp;&emsp;项目分为五大部分,每一部分都是通往下一阶段的坚实阶梯:
 
-- <strong>第一部分:智能体与语言模型基础</strong>(第1章~第3章),我们将从智能体的定义、类型与发展历史讲起,为你梳理"智能体"这一概念的来龙去脉。随后,我们会快速巩固大语言模型的核心知识,为你的实践之旅打下坚实的理论地基。
+- <strong>第一部分:智能体与语言模型基础</strong>(第一章~第三章),我们将从智能体的定义、类型与发展历史讲起,为你梳理"智能体"这一概念的来龙去脉。随后,我们会快速巩固大语言模型的核心知识,为你的实践之旅打下坚实的理论地基。
 
-- <strong>第二部分:构建你的大语言模型智能体</strong>(第4章~第7章),这是你动手实践的起点。你将亲手实现 ReAct 等经典范式,体验 Coze 等低代码平台的便捷,并掌握 Langgraph 等主流框架的应用。最终,我们还会带你从零开始构建一个属于自己的智能体框架,让你兼具“用轮子”与“造轮子”的能力。
+- <strong>第二部分:构建你的大语言模型智能体</strong>(第四章~第七章),这是你动手实践的起点。你将亲手实现 ReAct 等经典范式,体验 Coze 等低代码平台的便捷,并掌握 Langgraph 等主流框架的应用。最终,我们还会带你从零开始构建一个属于自己的智能体框架,让你兼具“用轮子”与“造轮子”的能力。
 
-- <strong>第三部分:高级知识扩展</strong>(第8章~第12章),在这一部分,你的智能体将“学会”思考与协作。我们将使用第二部分的自研框架,深入探索记忆与检索、上下文工程、Agent训练等核心技术,并学习多智能体间的通信协议。最终,你将掌握评估智能体系统性能的专业方法。
+- <strong>第三部分:高级知识扩展</strong>(第八章~第十二章),在这一部分,你的智能体将“学会”思考与协作。我们将使用第二部分的自研框架,深入探索记忆与检索、上下文工程、Agent 训练等核心技术,并学习多智能体间的通信协议。最终,你将掌握评估智能体系统性能的专业方法。
 
-- <strong>第四部分:综合案例进阶</strong>(第13章~第15章),这里是理论与实践的交汇点。你将把所学融会贯通,亲手打造智能旅行助手、自动化深度研究智能体,乃至一个模拟社会动态的赛博小镇,在真实有趣的项目中淬炼你的构建能力。
+- <strong>第四部分:综合案例进阶</strong>(第十三章~第十五章),这里是理论与实践的交汇点。你将把所学融会贯通,亲手打造智能旅行助手、自动化深度研究智能体,乃至一个模拟社会动态的赛博小镇,在真实有趣的项目中淬炼你的构建能力。
 
-- <strong>第五部分:毕业设计及未来展望</strong>(第16章),在旅程的终点,你将迎来一个毕业设计,构建一个完整的、属于你自己的多智能体应用,全面检验你的学习成果。我们还将与你一同展望智能体的未来,探索激动人心的前沿方向。
+- <strong>第五部分:毕业设计及未来展望</strong>(第十六章),在旅程的终点,你将迎来一个毕业设计,构建一个完整的、属于你自己的多智能体应用,全面检验你的学习成果。我们还将与你一同展望智能体的未来,探索激动人心的前沿方向。
 
 
-&emsp;&emsp;智能体是一个飞速发展且极度依赖实践的领域。为了获得最佳的学习效果,我们在项目的`code`文件夹内提供了配套的全部代码,强烈建议你<strong>将理论与实践相结合</strong>。请务必亲手运行、调试甚至修改项目里提供的每一份代码。欢迎你随时关注 Datawhale 以及其他 Agent相关社区,当遇到问题时,你可以随时在本项目的 issue 区提问。
+&emsp;&emsp;智能体是一个飞速发展且极度依赖实践的领域。为了获得最佳的学习效果,我们在项目的`code`文件夹内提供了配套的全部代码,强烈建议你<strong>将理论与实践相结合</strong>。请务必亲手运行、调试甚至修改项目里提供的每一份代码。欢迎你随时关注 Datawhale 以及其他 Agent 相关社区,当遇到问题时,你可以随时在本项目的 issue 区提问。
 
 &emsp;&emsp;现在,准备好进入智能体的奇妙世界了吗?让我们即刻启程!
 
@@ -114,15 +118,15 @@
 ## 🙏 致谢
 
 ### 核心贡献者
-- [陈思州-项目负责人](https://github.com/jjyaoao) (Datawhale成员, 全文写作和校对)
-- [孙韬-项目负责人](https://github.com/fengju0213) (Datawhale成员, 第九章内容和校对)  
-- [姜舒凡-项目负责人](https://github.com/Tsumugii24)(Datawhale成员, 章节习题设计和校对)
-- [黄佩林-Datawhale意向成员](https://github.com/HeteroCat) (Agent开发工程师, 第五章内容贡献者)
+- [陈思州-项目负责人](https://github.com/jjyaoao) (Datawhale 成员, 全文写作和校对)
+- [孙韬-项目负责人](https://github.com/fengju0213) (Datawhale 成员, 第九章内容和校对)  
+- [姜舒凡-项目负责人](https://github.com/Tsumugii24)(Datawhale 成员, 章节习题设计和校对)
+- [黄佩林-Datawhale意向成员](https://github.com/HeteroCat) (Agent 开发工程师, 第五章内容贡献者)
 - [曾鑫民-Agent工程师](https://github.com/fancyboi999) (牛客科技, 第十四章案例开发)
 
 ### Extra-Chapter 贡献者
 - [WH](https://github.com/WHQAQ11) (内容贡献者)
-- [周奥杰-DW贡献者团队](https://github.com/thunderbolt-fire) (西安交通大学, Extra02内容贡献)
+- [周奥杰-DW贡献者团队](https://github.com/thunderbolt-fire) (西安交通大学, Extra02 内容贡献)
 
 ### 特别感谢
 - 感谢 [@Sm1les](https://github.com/Sm1les) 对本项目的帮助与支持

+ 158 - 0
docs/README_EN.md

@@ -0,0 +1,158 @@
+<div align='center'>
+  <img src="./images/hello-agents.png" alt="alt text" width="100%">
+  <h1>Hello-Agents</h1>
+  <h3>🤖 Building Agent Systems from Scratch: Principles and Practice Tutorial</h3>
+  <p><em>From fundamental theory to practical applications, comprehensively master the design and implementation of agent systems</em></p>
+  <img src="https://img.shields.io/github/stars/datawhalechina/Hello-Agents?style=flat&logo=github" alt="GitHub stars"/>
+  <img src="https://img.shields.io/github/forks/datawhalechina/Hello-Agents?style=flat&logo=github" alt="GitHub forks"/>
+  <img src="https://img.shields.io/badge/language-English-blue?style=flat" alt="Language"/>
+  <a href="https://github.com/datawhalechina/Hello-Agents"><img src="https://img.shields.io/badge/GitHub-Project-blue?style=flat&logo=github" alt="GitHub Project"></a>
+  <a href="https://datawhalechina.github.io/hello-agents/"><img src="https://img.shields.io/badge/Online%20Reading-在线阅读-green?style=flat&logo=gitbook" alt="Online Reading"></a>
+</div>
+
+---
+
+## 🎯 Project Introduction
+
+&emsp;&emsp;If 2024 was the inaugural year of the "battle of a hundred models," then 2025 has undoubtedly ushered in the "Year of Agents." The technological focus is shifting from training larger foundation models to building smarter agent applications. However, systematic, practice-oriented tutorials are extremely scarce. For this reason, we launched the Hello-Agents project, hoping to provide the community with a guide for building agent systems from scratch, balancing theory and practice.
+
+&emsp;&emsp;Hello-Agents is a **systematic agent learning tutorial** from the Datawhale community. Agent construction today falls into two main schools: software-engineering-style Agents built on platforms such as Dify, Coze, and n8n, which are essentially workflow-driven software development with the LLM serving as a data-processing backend; and AI-native Agents, which are genuinely driven by AI. This tutorial aims to help you deeply understand and build the latter: true AI Native Agents. It will guide you past the surface of frameworks, starting from the core principles of agents, delving into their core architecture, working through their classic paradigms, and ultimately building your own multi-agent applications. We believe the best way to learn is hands-on practice. We hope this tutorial becomes your starting point for exploring the world of agents, helping you grow from a "user" of large language models into a "builder" of agent systems.
+
+## 🌐 Online Reading
+
+**[🌐 Click here to start online reading](https://datawhalechina.github.io/hello-agents/)**
+
+**[📖 Cookbook (Beta)](https://book.heterocat.com.cn/)**
+
+### ✨ What Will You Gain?
+
+- 📖 **Free and Open Source from Datawhale** - Learn all content of this project completely free and grow together with the community
+- 🔍 **Understand Core Principles** - Deeply understand the concepts, history, and classic paradigms of agents
+- 🏗️ **Hands-on Implementation** - Master the use of popular low-code platforms and agent code frameworks
+- 🛠️ **Self-developed Framework [HelloAgents](https://github.com/jjyaoao/helloagents)** - Build your own agent framework from scratch based on OpenAI native API
+- ⚙️ **Master Advanced Skills** - Step by step implement systematic technologies such as context engineering, Memory, protocols, and evaluation
+- 🤝 **Model Training** - Master Agentic RL through hands-on LLM training across the full pipeline, from SFT to GRPO
+- 🚀 **Drive Real Cases** - Practical development of comprehensive projects such as intelligent travel assistants and cyber towns
+- 📖 **Job Interviews** - Learn agent-related interview questions for job hunting
+
+## 📖 Content Navigation
+
+| Chapter | Key Content | Status |
+| --- | --- | --- |
+| [Preface](./Preface.md) | Project origin, background, and reader suggestions | ✅ |
+| **Part One: Agent and Language Model Fundamentals** |  |  |
+| [Chapter 1: Introduction to Agents](./chapter1/Chapter1-Introduction-to-Agents.md) | Agent definition, types, paradigms, and applications | ✅ |
+| [Chapter 2: History of Agents](./chapter2/Chapter2-History-of-Agents.md) | Evolution from symbolism to LLM-driven agents | ✅ |
+| [Chapter 3: Large Language Model Fundamentals](./chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md) | Transformer, prompts, mainstream LLMs and their limitations | ✅ |
+| **Part Two: Building Your Large Language Model Agent** |  |  |
+| [Chapter 4: Building Classic Agent Paradigms](./chapter4/Chapter4-Building-Classic-Agent-Paradigms.md) | Hands-on implementation of ReAct, Plan-and-Solve, Reflection | ✅ |
+| [Chapter 5: Agent Building Based on Low-Code Platforms](./chapter5/Chapter5-Building-Agents-with-Low-Code-Platforms.md) | Understanding the use of low-code agent platforms like Coze, Dify, n8n | ✅ |
+| [Chapter 6: Framework Development Practice](./chapter6/Chapter6-Framework-Development-Practice.md) | Application of mainstream frameworks such as AutoGen, AgentScope, LangGraph | ✅ |
+| [Chapter 7: Building Your Agent Framework](./chapter7/Chapter7-Building-Your-Agent-Framework.md) | Building an agent framework from scratch | ✅ |
+| **Part Three: Advanced Knowledge Extension** |  |  |
+| [Chapter 8: Memory and Retrieval](./chapter8/Chapter8-Memory-and-Retrieval.md) | Memory systems, RAG, storage | ✅ |
+| [Chapter 9: Context Engineering](./chapter9/Chapter9-Context-Engineering.md) | "Contextual understanding" for continuous interaction | ✅ |
+| [Chapter 10: Agent Communication Protocols](./chapter10/Chapter10-Agent-Communication-Protocols.md) | Analysis of protocols such as MCP, A2A, ANP | ✅ |
+| [Chapter 11: Agentic-RL](./chapter11/Chapter11-Agentic-RL.md) | Practical LLM training from SFT to GRPO | ✅ |
+| [Chapter 12: Agent Performance Evaluation](./chapter12/Chapter12-Agent-Performance-Evaluation.md) | Core metrics, benchmarks, and evaluation frameworks | ✅ |
+| **Part Four: Comprehensive Case Studies** |  |  |
+| [Chapter 13: Intelligent Travel Assistant](./chapter13/Chapter13-Intelligent-Travel-Assistant.md) | Real-world application of MCP and multi-agent collaboration | ✅ |
+| [Chapter 14: Automated Deep Research Agent](./chapter14/Chapter14-Automated-Deep-Research-Agent.md) | DeepResearch Agent reproduction and analysis | ✅ |
+| [Chapter 15: Building a Cyber Town](./chapter15/Chapter15-Building-Cyber-Town.md) | Combination of Agents and games, simulating social dynamics | ✅ |
+| **Part Five: Graduation Project and Future Outlook** |  |  |
+| [Chapter 16: Graduation Project](./chapter16/Chapter16-Graduation-Project.md) | Build your own complete multi-agent application | ✅ |
+
+### Community Contribution Highlights (Community Blog)
+
+&emsp;&emsp;We welcome everyone to contribute their unique insights and practical summaries from learning Hello-Agents or Agent-related technologies to the community highlights in the form of PRs. If the content is independent of the main text, you can also submit it to Extra-Chapter! **Looking forward to your first contribution!**
+
+| Community Highlights | Content Summary |
+| --- | --- |
+| [01-Agent Interview Questions Summary](../Extra-Chapter/Extra01-面试问题总结.md) | Agent position-related interview questions |
+| [01-Agent Interview Answers](../Extra-Chapter/Extra01-参考答案.md) | Answers to related interview questions |
+| [02-Context Engineering Content Supplement](../Extra-Chapter/Extra02-上下文工程补充知识.md) | Context engineering content extension |
+
+### PDF Version Download
+
+&emsp;&emsp; *<strong>This Hello-Agents PDF tutorial is completely open source and free. To prevent various marketing accounts from adding watermarks and selling it to multi-agent system beginners, we have pre-added Datawhale open-source logo watermarks that do not affect reading in the PDF file. Please understand~</strong>*
+
+> *Hello-Agents PDF: https://github.com/datawhalechina/Hello-Agents/releases/tag/PDF (not yet completed)*  
+> *Hello-Agents PDF domestic download address: https://www.datawhale.cn/learn/summary/XXX* 
+
+## 💡 How to Learn
+
+&emsp;&emsp;Welcome, future intelligent system builder! Before embarking on this exciting journey, please allow us to give you some clear guidance.
+
+&emsp;&emsp;This project balances theory and practice, aiming to help you systematically master the entire process of designing and developing from single agents to multi-agent systems. Therefore, it is especially suitable for **AI developers, software engineers, students** with some programming foundation, as well as **self-learners** with a strong interest in cutting-edge AI technology. Before learning this project, we hope you have basic Python programming skills and a basic conceptual understanding of large language models (for example, know how to call an LLM through an API). The project focuses on application and construction, so you don't need a deep background in algorithms or model training.
+
+&emsp;&emsp;The project is divided into five major parts, each being a solid step toward the next stage:
+
+- **Part One: Agent and Language Model Fundamentals** (Chapters 1-3), we will start from the definition, types, and development history of agents, sorting out the ins and outs of the concept of "agents" for you. Then, we will quickly consolidate core knowledge of large language models, laying a solid theoretical foundation for your practical journey.
+
+- **Part Two: Building Your Large Language Model Agent** (Chapters 4-7), this is the starting point of your hands-on practice. You will personally implement classic paradigms such as ReAct, experience the convenience of low-code platforms like Coze, and master the application of mainstream frameworks like LangGraph. Finally, we will guide you to build your own agent framework from scratch, giving you the ability to both "use wheels" and "build wheels."
+
+- **Part Three: Advanced Knowledge Extension** (Chapters 8-12), in this part, your agent will "learn" to think and collaborate. We will use the self-developed framework from Part Two to deeply explore core technologies such as memory and retrieval, context engineering, and Agent training, and learn communication protocols between multi-agents. Finally, you will master professional methods for evaluating agent system performance.
+
+- **Part Four: Comprehensive Case Studies** (Chapters 13-15), this is where theory and practice converge. You will integrate what you've learned, personally create intelligent travel assistants, automated deep research agents, and even a cyber town simulating social dynamics, tempering your construction abilities in real and interesting projects.
+
+- **Part Five: Graduation Project and Future Outlook** (Chapter 16), at the end of the journey, you will face a graduation project, building a complete multi-agent application of your own, comprehensively testing your learning outcomes. We will also look forward to the future of agents with you, exploring exciting frontier directions.
+
+&emsp;&emsp;Agents are a rapidly developing field that heavily relies on practice. To achieve the best learning effect, we provide all supporting code in the project's `code` folder. We strongly recommend **combining theory with practice**. Please be sure to personally run, debug, and even modify every piece of code provided in the project. You are welcome to follow Datawhale and other Agent-related communities at any time. When you encounter problems, you can ask questions in the issue section of this project at any time.
+
+&emsp;&emsp;Now, are you ready to enter the wonderful world of agents? Let's set off immediately!
+
+## 🤝 How to Contribute
+
+We are an open-source community and welcome any form of contribution!
+
+- 🐛 **Report Bugs** - If you find content or code issues, please submit an Issue
+- 💡 **Make Suggestions** - If you have good ideas for the project, feel free to start a discussion
+- 📝 **Improve Content** - Help improve the tutorial, submit your Pull Request
+- ✍️ **Share Practice** - Share your learning notes and projects in "Community Contribution Highlights"
+
+## 🙏 Acknowledgments
+
+### Core Contributors
+- [Chen Sizhou - Project Leader](https://github.com/jjyaoao) (Datawhale member, full text writing and proofreading)
+- [Sun Tao - Project Leader](https://github.com/fengju0213) (Datawhale member, Chapter 9 content and proofreading)  
+- [Jiang Shufan - Project Leader](https://github.com/Tsumugii24) (Datawhale member, chapter exercise design and proofreading)
+- [Huang Peilin - Datawhale Prospective Member](https://github.com/HeteroCat) (Agent Development Engineer, Chapter 5 content contributor)
+- [Zeng Xinmin - Agent Engineer](https://github.com/fancyboi999) (Niuke Technology, Chapter 14 case development)
+
+### Extra-Chapter Contributors
+- [WH](https://github.com/WHQAQ11) (Content contributor)
+- [Zhou Aojie - DW Contributor Team](https://github.com/thunderbolt-fire) (Xi'an Jiaotong University, Extra02 content contribution)
+
+### Special Thanks
+- Thanks to [@Sm1les](https://github.com/Sm1les) for help and support for this project
+- Thanks to all developers who have contributed to this project ❤️
+
+<div align=center style="margin-top: 30px;">
+  <a href="https://github.com/datawhalechina/Hello-Agents/graphs/contributors">
+    <img src="https://contrib.rocks/image?repo=datawhalechina/Hello-Agents" />
+  </a>
+</div>
+
+## Star History
+
+<div align='center'>
+    <img src="./images/star-history-2025111.png" alt="Datawhale" width="90%">
+</div>
+
+<div align="center">
+  <p>⭐ If this project helps you, please give us a Star!</p>
+</div>
+
+## About Datawhale
+
+<div align='center'>
+    <img src="./images/datawhale.png" alt="Datawhale" width="30%">
+    <p>Scan the QR code to follow the Datawhale official account and get more quality open-source content</p>
+</div>
+
+---
+
+## 📜 Open Source License
+
+This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).
+

+ 29 - 0
docs/_sidebar_en.md

@@ -0,0 +1,29 @@
+- [Hello-Agents](./README_EN.md)
+  - [Preface](./Preface.md)
+  
+- <strong>Part I: Fundamentals of Agents and Language Models</strong>
+  - [Chapter 1 Introduction to Agents](./chapter1/Chapter1-Introduction-to-Agents.md)
+  - [Chapter 2 History of Agents](./chapter2/Chapter2-History-of-Agents.md)
+  - [Chapter 3 Fundamentals of Large Language Models](./chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md)
+
+- <strong>Part II: Building Your Large Language Model Agent</strong>
+  - [Chapter 4 Building Classic Agent Paradigms](./chapter4/Chapter4-Building-Classic-Agent-Paradigms.md)
+  - [Chapter 5 Building Agents with Low-Code Platforms](./chapter5/Chapter5-Building-Agents-with-Low-Code-Platforms.md)
+  - [Chapter 6 Framework Development Practice](./chapter6/Chapter6-Framework-Development-Practice.md)
+  - [Chapter 7 Building Your Agent Framework](./chapter7/Chapter7-Building-Your-Agent-Framework.md)
+
+- <strong>Part III: Advanced Knowledge Extension</strong>
+  - [Chapter 8 Memory and Retrieval](./chapter8/Chapter8-Memory-and-Retrieval.md)
+  - [Chapter 9 Context Engineering](./chapter9/Chapter9-Context-Engineering.md)
+  - [Chapter 10 Agent Communication Protocols](./chapter10/Chapter10-Agent-Communication-Protocols.md)
+  - [Chapter 11 Agentic-RL](./chapter11/Chapter11-Agentic-RL.md)
+  - [Chapter 12 Agent Performance Evaluation](./chapter12/Chapter12-Agent-Performance-Evaluation.md)
+
+- <strong>Part IV: Comprehensive Case Studies</strong>
+  - [Chapter 13 Intelligent Travel Assistant](./chapter13/Chapter13-Intelligent-Travel-Assistant.md)
+  - [Chapter 14 Automated Deep Research Agent](./chapter14/Chapter14-Automated-Deep-Research-Agent.md)
+  - [Chapter 15 Building Cyber Town](./chapter15/Chapter15-Building-Cyber-Town.md)
+
+- <strong>Part V: Graduation Project and Future Outlook</strong>
+  - [Chapter 16 Graduation Project](./chapter16/Chapter16-Graduation-Project.md)
+

+ 621 - 0
docs/chapter1/Chapter1-Introduction-to-Agents.md

@@ -0,0 +1,621 @@
+<div align="right">
+  English | <a href="./第一章 初识智能体.md">中文</a>
+</div>
+
+# Chapter 1: Introduction to Agents
+
+Welcome to the world of agents! In today's era where the wave of artificial intelligence is sweeping across the globe, **Agents** have become one of the core concepts driving technological transformation and application innovation. Whether your aspiration is to become a researcher or engineer in the AI field, or you hope to deeply understand the cutting edge of technology as an observer, mastering the essence of agents will be an indispensable part of your knowledge system.
+
+Therefore, in this chapter, let's return to the fundamentals and explore several questions together: What is an agent? What are its main types? How does it interact with the world we live in? Through these discussions, we hope to lay a solid foundation for your future learning and exploration.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-0.png" alt="Figure description" width="90%"/>
+  <p>Figure 1.1 Basic interaction loop between agent and environment</p>
+</div>
+
+## 1.1 What is an Agent?
+
+When exploring any complex concept, it's best to start with a concise definition. In the field of artificial intelligence, an agent is defined as any entity that can perceive its **Environment** through **Sensors**, and **autonomously** take **Actions** through **Actuators** to achieve specific goals.
+
+This definition contains four fundamental elements of an agent's existence. The environment is the external world in which the agent operates. For an autonomous vehicle, the environment is the dynamically changing road traffic; for a trading algorithm, the environment is the ever-changing financial market. The agent is not isolated from the environment—it continuously perceives the environmental state through its sensors. Cameras, microphones, radar, or data streams returned by various **Application Programming Interfaces (APIs)** are all extensions of its perceptual capabilities.
+
+After acquiring information, the agent needs to take actions to influence the environment, changing its state through actuators. Actuators can be physical devices (such as robotic arms or steering wheels) or virtual tools (such as executing code or calling a service).
+
+However, what truly endows an agent with "intelligence" is its **Autonomy**. An agent is not merely a program that passively responds to external stimuli or strictly executes preset instructions; it can make independent decisions based on its perceptions and internal state to achieve its design goals. This closed loop from perception to action forms the foundation of all agent behavior, as shown in Figure 1.1.
+
+### 1.1.1 Agents from a Traditional Perspective
+
+Before the current wave of **Large Language Models (LLMs)**, pioneers in artificial intelligence had already spent decades exploring and building the concept of "agents." These paradigms, which we now call "traditional agents," are not a single static concept but have undergone a clear evolutionary path from simple to complex, from passive reaction to active learning.
+
+The starting point of this evolution is the structurally simplest **Simple Reflex Agent**. Their decision-making core consists of "condition-action" rules explicitly designed by engineers, as shown in Figure 1.2. A classic automatic thermostat works this way: if the sensor perceives that the room temperature is higher than the set value, it activates the cooling system.
+
+This type of agent relies entirely on current perceptual input and has no memory or predictive capability. It is like a digitized instinct: reliable and efficient, but unable to handle complex tasks that require understanding context. Its limitations raise a key question: What should an agent do if the current state of the environment is insufficient as the sole basis for decision-making?
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-1.png" alt="Figure description" width="90%"/>
+  <p>Figure 1.2 Decision logic diagram of a simple reflex agent</p>
+</div>
+
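The condition-action rule described above can be sketched in a few lines. This is a minimal illustration, not code from the tutorial: the `Thermostat` class, the `setpoint_c` field, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Simple reflex agent: acts on the current percept only, with no memory."""

    setpoint_c: float  # hypothetical target temperature in degrees Celsius

    def act(self, room_temp_c: float) -> str:
        # Condition-action rule: IF temperature > setpoint THEN turn cooling on.
        if room_temp_c > self.setpoint_c:
            return "cooling_on"
        return "cooling_off"


agent = Thermostat(setpoint_c=24.0)
for reading in (22.5, 24.0, 27.1):
    print(reading, "->", agent.act(reading))
```

Note that `act` depends only on its argument. Nothing is stored between calls, which is exactly why this kind of agent cannot use context.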
+To answer this question, researchers introduced the concept of "state" and developed **Model-Based Reflex Agents**. This type of agent has an internal **World Model** used to track and understand aspects of the environment that cannot be directly perceived. It attempts to answer: "What is the world like now?" For example, when an autonomous vehicle drives through a tunnel and its camera temporarily cannot perceive the vehicle ahead, its internal model still maintains an estimate of that vehicle's existence, speed, and position. This internal model gives the agent a primitive form of "memory," so its decisions no longer depend solely on instantaneous perception but rest on a more coherent and complete understanding of the world state.
+
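As a rough sketch of the tunnel example, the agent below keeps an internal estimate of the lead vehicle and falls back on prediction when no observation is available. The class name, the constant-speed assumption, and the 30 m safe gap are illustrative choices, not part of the original text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeadVehicleModel:
    """Internal world model: estimated position and speed of the car ahead."""

    position_m: float
    speed_mps: float

    def update(self, dt_s: float, observed_position_m: Optional[float]) -> None:
        if observed_position_m is not None:
            # Sensor works: refresh the speed estimate and position from the percept.
            self.speed_mps = (observed_position_m - self.position_m) / dt_s
            self.position_m = observed_position_m
        else:
            # Sensor is blind (e.g. in a tunnel): predict assuming constant speed.
            self.position_m += self.speed_mps * dt_s


def choose_action(model: LeadVehicleModel, own_position_m: float,
                  safe_gap_m: float = 30.0) -> str:
    # The decision uses the modeled state, not the raw (possibly missing) percept.
    gap = model.position_m - own_position_m
    return "brake" if gap < safe_gap_m else "keep_speed"


model = LeadVehicleModel(position_m=50.0, speed_mps=20.0)
model.update(dt_s=1.0, observed_position_m=70.0)  # camera sees the car
model.update(dt_s=1.0, observed_position_m=None)  # tunnel: no observation
model.update(dt_s=1.0, observed_position_m=None)
print(model.position_m, choose_action(model, own_position_m=85.0))
```

The design point is that `choose_action` never looks at the sensor directly, so a missing observation does not make the lead vehicle "disappear" from the agent's view of the world.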
+However, merely understanding the world is not enough; an agent also needs clear goals. This led to the development of **Goal-Based Agents**. Unlike the previous two types, they do not merely react to the environment but proactively select actions that lead toward a specific future state. The question this type of agent needs to answer is: "What should I do to achieve my goal?" A classic example is a GPS navigation system: your goal is to reach the office, and the agent plans an optimal route using a search algorithm (such as A*) over map data (its world model). The core capability of this type of agent lies in its consideration of, and planning for, the future.
+
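To make the planning step concrete, here is a small A* search on a grid map. It is a sketch under simplifying assumptions: the map is a 4-connected grid with unit step cost, and the Manhattan distance serves as the heuristic. The function name `a_star` and the grid encoding are illustrative.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest path on a 4-connected grid (0 = free, 1 = blocked), or None."""

    def h(cell):
        # Manhattan distance: never overestimates with unit step costs.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:  # walk back through the parents
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == 1:
                continue
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None


city = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(a_star(city, start=(0, 0), goal=(2, 0)))
```

The goal appears explicitly as an input, and the agent searches over future states before acting. That is the difference from the reflex agents above, which map percepts to actions directly.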
+Going further, real-world goals are often not singular. We not only want to reach the office but also want the shortest time, the most fuel-efficient route, and to avoid congestion. When multiple goals need to be balanced, **Utility-Based Agents** emerge. They assign a utility value to every possible world state, representing the level of satisfaction. The agent's core goal is no longer simply to achieve a specific state but to maximize expected utility. It needs to answer a more complex question: "Which behavior will bring me the most satisfactory result?" This architecture allows the agent to learn to balance conflicting goals, making its decisions closer to human rational choice.
+
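A small sketch of expected-utility maximization for the commuting example follows. The routes, probabilities, and weights are made-up numbers chosen only to illustrate the trade-off between travel time and fuel.

```python
# Each route maps to a list of (probability, outcome) pairs; all values are
# hypothetical and only serve to illustrate the calculation.
ROUTES = {
    "highway": [
        (0.7, {"minutes": 25, "fuel_l": 3.0}),  # free-flowing traffic
        (0.3, {"minutes": 50, "fuel_l": 3.5}),  # congestion
    ],
    "city": [
        (1.0, {"minutes": 35, "fuel_l": 2.0}),
    ],
}


def utility(outcome, time_weight=1.0, fuel_weight=5.0):
    # Higher is better, so costs enter with a negative sign.
    return -(time_weight * outcome["minutes"] + fuel_weight * outcome["fuel_l"])


def expected_utility(outcomes):
    return sum(p * utility(o) for p, o in outcomes)


best = max(ROUTES, key=lambda name: expected_utility(ROUTES[name]))
for name, outcomes in ROUTES.items():
    print(name, round(expected_utility(outcomes), 2))
print("chosen route:", best)
```

Changing `time_weight` or `fuel_weight` changes which route wins, which is how a utility function encodes the balance between conflicting goals.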
+So far, the agents we've discussed, although increasingly complex in functionality, still rely on the prior knowledge of human designers for their core decision-making logic, whether rules, models, or utility functions. What if an agent could learn autonomously through interaction with the environment without relying on presets?
+
+This is the core idea of **Learning Agents**, and **Reinforcement Learning (RL)** is the most representative path to realizing this idea. A learning agent contains a performance element (the various types of agents we discussed earlier) and a learning element. The learning element continuously modifies the performance element's decision-making strategy by observing the results of the performance element's actions in the environment.
+
+Imagine an AI learning to play chess. It might start by making random moves, but when it finally wins a game, the system gives it a positive reward. Through extensive self-play, the learning element gradually discovers which moves are more likely to lead to ultimate victory. AlphaGo is a milestone achievement of this philosophy. In the complex game of Go, through reinforcement learning, it discovered many effective strategies that surpass existing human knowledge.
+
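The reward-driven learning loop can be illustrated with tabular Q-learning on a toy problem. This sketch does not play chess or Go. It assumes a hypothetical five-cell corridor in which only reaching the rightmost cell is rewarded, and the agent explores with purely random moves.

```python
import random

N_STATES = 5        # cells 0..4 on a line; cell 4 is the rewarding terminal state
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment dynamics: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # performance element: random exploration
            nxt, reward, done = step(state, action)
            # Learning element: move the estimate toward reward + discounted future value.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

No rule about "go right" is ever written down. The preference emerges from the reward signal alone, which is the same principle, at a vastly smaller scale, as learning strong moves through self-play.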
+From simple thermostats, to cars with internal models, to navigation systems that plan routes, to decision-makers that weigh trade-offs, and finally to learners that improve through experience, this evolutionary path traces how traditional artificial intelligence approached building machine intelligence. These paradigms laid a solid and necessary foundation for understanding today's more cutting-edge agent paradigms.
+
+### 1.1.2 New Paradigm Driven by Large Language Models
+
+The emergence of large language models, represented by **GPT (Generative Pre-trained Transformer)**, is significantly changing how agents are built and what they can do. Agents driven by large language models (LLM agents) have a core decision-making mechanism fundamentally different from that of traditional agents, which gives them a series of entirely new characteristics.
+
+This transformation can be clearly seen from the comparison of the two in multiple dimensions such as core engine, knowledge source, and interaction method, as shown in Table 1.1. In short, the capabilities of traditional agents stem from engineers' explicit programming and knowledge construction, and their behavior patterns are deterministic and bounded; while LLM agents, through pre-training on massive data, have acquired implicit world models and powerful emergent capabilities, enabling them to handle complex tasks in a more flexible and general way.
+
+<div align="center">
+  <p>Table 1.1 Core comparison between traditional agents and LLM-driven agents</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-2.png" alt="Figure description" width="90%"/>
+</div>
+
+This difference enables LLM agents to directly process high-level, ambiguous, and context-rich natural language instructions. Let's use an "intelligent travel assistant" as an example to illustrate.
+
+Before the emergence of LLM agents, planning a trip typically meant manually switching between multiple dedicated applications (such as weather, maps, and booking websites), with the user acting as the information integrator and decision-maker. An LLM agent, however, can integrate this process. When it receives an ambiguous instruction like "plan a trip to Xiamen," its working method reflects the following points:
+
+- **Planning and Reasoning**: The agent first decomposes this high-level goal into a series of logical subtasks, for example: `[Confirm travel preferences] -> [Query destination information] -> [Draft itinerary] -> [Book tickets and accommodation]`. This is an internal, model-driven planning process.
+- **Tool Use**: When executing the plan, the agent identifies information gaps and proactively calls external tools to fill them. For example, it will call a weather query interface to get real-time weather, and based on the information "rain is forecast," it will tend to recommend indoor activities in subsequent planning.
+- **Dynamic Adjustment**: During the interaction, the agent treats user feedback (such as "this hotel exceeds the budget") as new constraints and adjusts subsequent actions accordingly, re-searching and recommending options that meet the new requirements. The entire process of "**check weather → adjust itinerary → book hotel**" demonstrates its ability to dynamically modify its behavior based on context.
+
+In summary, we are shifting from developing specialized automation tools to building systems that can autonomously solve problems. The core is no longer writing code but guiding a general "brain" to plan, act, and learn.
+
+### 1.1.3 Types of Agents
+
+Following the review of agent evolution above, this section will classify agents from three complementary dimensions.
+
+(1) **Classification Based on Internal Decision Architecture**
+
+The first classification dimension is the complexity of the agent's internal decision architecture. This perspective was systematically proposed in "Artificial Intelligence: A Modern Approach"<sup>[1]</sup>. As described in Section 1.1.1, the evolutionary path of traditional agents itself forms the most classic classification ladder: it runs from simple **reactive** agents, to **model-based** agents that introduce internal models, and then to the more forward-looking **goal-based** and **utility-based** agents. Additionally, **learning capability** is a meta-capability that can be added to any of the above types, enabling them to improve through experience.
+
+(2) **Classification Based on Time and Reactivity**
+
+In addition to the complexity of their internal architecture, agents can also be classified by how much time they spend on a decision. This perspective focuses on whether an agent acts immediately after receiving information or only after deliberate planning. It reveals a core trade-off in agent design: the balance between **Reactivity**, which pursues speed, and **Deliberation**, which pursues optimal solutions, as shown in Figure 1.3.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-3.png" alt="Figure description" width="90%"/>
+  <p>Figure 1.3 Relationship between agent decision time and quality</p>
+</div>
+
+- **Reactive Agents**
+
+This type of agent makes nearly instantaneous responses to environmental stimuli with extremely low decision latency. They typically follow a direct mapping from perception to action, with no or minimal future planning. The **simple reactive** and **model-based** agents mentioned above belong to this category.
+
+Their core advantage lies in **fast speed and low computational overhead**, which is crucial in dynamic environments requiring rapid decision-making. For example, a vehicle's airbag system must react within milliseconds of a collision—any delay could lead to serious consequences; similarly, high-frequency trading robots must rely on reactive decision-making to capture fleeting market opportunities. However, the cost of this speed is "short-sightedness." Due to lack of long-term planning, reactive agents easily fall into local optima and struggle to complete complex tasks requiring multi-step coordination.
+
+- **Deliberative Agents**
+
+In contrast to reactive agents, deliberative (or planning) agents engage in complex thinking and planning before acting. They do not immediately react to perceptions but first use their internal world model to systematically explore various future possibilities, evaluate the consequences of different action sequences, in hopes of finding an optimal path to achieve goals. **Goal-based** and **utility-based** agents are typical deliberative agents.
+
+Their decision-making process can be likened to a chess player. They don't just look at the immediate move but anticipate possible opponent responses and plan out subsequent moves, even dozens of moves ahead. This deliberative capability enables them to handle complex tasks requiring long-term vision, such as formulating a business plan or planning a long-distance trip. Their advantage lies in the strategic nature and foresight of their decisions. However, the flip side of this advantage is high time and computational costs. In rapidly changing environments, when a deliberative agent is still deep in thought, the best moment to act may have long passed.
+
+- **Hybrid Agents**
+
+Complex tasks in the real world often require both immediate reactions and long-term planning. For example, the intelligent travel assistant we mentioned earlier needs to adjust recommendations based on user's immediate feedback (such as "this hotel is too expensive") (reactivity), while also being able to plan a complete multi-day travel itinerary (deliberation). Therefore, hybrid agents emerged, aiming to combine the advantages of both and achieve a balance between reaction and planning.
+
+A classic hybrid architecture is hierarchical design: the lower layer is a fast reactive module that handles emergencies and basic actions; the upper layer is a deliberative planning module responsible for formulating long-term goals. Modern LLM agents demonstrate a more flexible hybrid mode. They typically operate in a "think-act-observe" loop, cleverly integrating both modes:
+
+- **Reasoning**: In the "thinking" phase, the LLM analyzes the current situation and plans the next reasonable action. This is a deliberative process.
+- **Acting & Observing**: In the "acting" and "observing" phases, the agent interacts with external tools or the environment and immediately receives feedback. This is a reactive process.
+
+Through this approach, the agent decomposes a grand task requiring long-term planning into a series of "planning-reaction" micro-loops. This enables it to flexibly respond to immediate environmental changes while ultimately completing complex long-term goals through coherent steps.
+
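The classic hierarchical design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference architecture: the `ROAD_MAP` graph, the `obstacle_ahead` percept, and the three function names are all hypothetical. The reactive layer is a single condition-action rule that always gets the first say, and the deliberative layer is a breadth-first search that plans a shortest route when no emergency is pending.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical road map used by the planner: node -> reachable neighbours.
ROAD_MAP: Dict[str, List[str]] = {
    "hotel": ["museum", "park"],
    "museum": ["harbor"],
    "park": ["harbor"],
    "harbor": [],
}

def reactive_layer(percept: Dict[str, bool]) -> Optional[str]:
    """Fast condition-action rule: respond immediately to an emergency."""
    if percept.get("obstacle_ahead"):
        return "brake"
    return None

def deliberative_layer(start: str, goal: str) -> List[str]:
    """Slow planning: breadth-first search for a shortest route."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ROAD_MAP.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return []  # no route exists

def hybrid_step(percept: Dict[str, bool], position: str, goal: str) -> str:
    """The reactive layer gets the first say; otherwise follow the plan."""
    reflex = reactive_layer(percept)
    if reflex is not None:
        return reflex
    plan = deliberative_layer(position, goal)
    return f"move_to:{plan[1]}" if len(plan) > 1 else "stay"

print(hybrid_step({"obstacle_ahead": True}, "hotel", "harbor"))
print(hybrid_step({"obstacle_ahead": False}, "hotel", "harbor"))
```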
+(3) **Classification Based on Knowledge Representation**
+
+This is a more fundamental classification dimension that explores what form the knowledge used by agents for decision-making exists in their "minds." This question is at the core of a debate that has lasted more than half a century in the field of artificial intelligence and has shaped two distinctly different AI cultures.
+
+- **Symbolic AI**
+
+Symbolism, often called traditional artificial intelligence, has a core belief: intelligence stems from logical operations on symbols. The symbols here are human-readable entities (such as words, concepts), and operations follow strict logical rules, as shown on the left side of Figure 1.4. This is like a meticulous librarian organizing world knowledge into clear rule bases and knowledge graphs.
+
+Its main advantage lies in transparency and interpretability. Since reasoning steps are explicit, its decision-making process can be fully traced, which is crucial in high-risk fields such as finance and healthcare. However, its "Achilles' heel" is fragility: it relies on a complete rule system, and in a real world full of ambiguity and exceptions, any situation the rules do not cover can cause the system to fail. Because every rule must be elicited from experts and encoded by hand, closing these gaps runs into the so-called "knowledge acquisition bottleneck."
+
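Both the transparency and the fragility of the symbolic approach show up even in a toy rule base. The sketch below is only an illustration: the rules, the fact names, and the `classify` helper are hypothetical and not taken from any real expert system. Every conclusion is returned together with the rule that produced it, while an input that no rule anticipates simply yields no answer.

```python
from typing import Optional, Set, Tuple

# Hypothetical hand-written rules: (required facts, conclusion).
RULES = [
    ({"has_four_legs", "is_furry", "meows"}, "cat"),
    ({"has_four_legs", "is_furry", "barks"}, "dog"),
]

def classify(facts: Set[str]) -> Tuple[Optional[str], str]:
    """Return (conclusion, explanation); conclusion is None if no rule fires."""
    for conditions, conclusion in RULES:
        if conditions <= facts:  # every required fact is present
            return conclusion, f"{sorted(conditions)} -> {conclusion}"
    return None, "no rule matched"

# Covered case: the decision can be traced back to an explicit rule.
print(classify({"has_four_legs", "is_furry", "meows"}))
# Uncovered case: a three-legged cat was never anticipated by the rule author.
print(classify({"has_three_legs", "is_furry", "meows"}))
```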
+- **Sub-symbolic AI**
+
+Sub-symbolism, or connectionism, provides a completely different picture. Here, knowledge is not explicit rules but implicitly distributed in a complex network composed of numerous neurons, representing statistical patterns learned from massive data. Neural networks and deep learning are its representatives.
+
+As shown in the middle of Figure 1.4, if symbolic AI is a librarian, then sub-symbolic AI is like a babbling child. A child does not learn to recognize cats from rules like "cats have four legs, are furry, and meow"; after seeing thousands of cat pictures, the neural network in the child's brain simply comes to recognize the visual pattern behind the concept "cat." The power of this approach lies in its pattern recognition capability and its robustness to noisy data. It can easily handle unstructured data such as images and sounds, which are extremely difficult for symbolic AI.
+
+However, this powerful intuitive capability comes with opacity. A sub-symbolic system is typically viewed as a **Black Box**. It can identify a cat in a picture with amazing accuracy, but if you ask it "why do you think this is a cat?", it likely cannot provide a logically sound explanation. Additionally, it performs poorly on pure logical reasoning tasks and sometimes produces hallucinations that seem reasonable but are factually incorrect.
+
+- **Neuro-Symbolic AI**
+
+For a long time, symbolism and sub-symbolism developed like two parallel lines. To overcome the limitations of both paradigms, the idea of a "grand reconciliation" emerged: neuro-symbolic AI, also called the neuro-symbolic hybrid approach. Its goal is to merge the advantages of both paradigms, creating hybrid agents that can learn from data like neural networks and perform logical reasoning like symbolic systems. It attempts to bridge the gap between perception and cognition, intuition and rationality. The dual-system theory that Nobel laureate Daniel Kahneman presents in his book "Thinking, Fast and Slow" provides an excellent analogy for understanding neuro-symbolism<sup>[2]</sup>, as shown in Figure 1.4:
+
+- **System 1** is a fast, intuitive, parallel thinking mode, similar to the powerful pattern recognition capability of sub-symbolic AI.
+- **System 2** is slow, methodical, logic-based deliberative thinking, just like the reasoning process of symbolic AI.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-4.png" alt="Figure description" width="90%"/>
+  <p>Figure 1.4 Knowledge representation paradigms of symbolism, sub-symbolism, and neuro-symbolic hybrid</p>
+</div>
+
+Human intelligence stems from the collaborative work of these two systems. Similarly, a truly robust AI needs to combine the strengths of both. An agent driven by a large language model is an excellent practical example of neuro-symbolism. Its core is a huge neural network, which gives it pattern recognition and language generation capabilities. When it works, however, it generates a series of structured intermediate steps, such as thoughts, plans, or API calls, all of which are explicit, operable symbols. In this way, it achieves a preliminary fusion of perception and cognition, intuition and rationality.
+
+## 1.2 Composition and Operating Principles of Agents
+
+### 1.2.1 Task Environment Definition
+
+To understand how an agent operates, we must first understand the **task environment** in which it operates. In the field of artificial intelligence, the **PEAS model** is typically used to precisely describe a task environment, analyzing its **Performance measure, Environment, Actuators, and Sensors**. Taking the intelligent travel assistant mentioned above as an example, Table 1.2 below shows how to use the PEAS model to specify its task environment.
+
+<div align="center">
+  <p>Table 1.2 PEAS description of intelligent travel assistant</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-6.png" alt="Figure description" width="90%"/>
+</div>
+
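A PEAS description is simply a four-field record, so it can be written down as a small data structure. The sketch below is an assumption-laden illustration: the `PEAS` class is hypothetical, and the entries are plausible examples for a travel assistant rather than a transcription of Table 1.2.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PEAS:
    """Four-part description of a task environment."""
    performance: List[str]
    environment: List[str]
    actuators: List[str]
    sensors: List[str]

# Illustrative entries only; they are not copied from Table 1.2.
travel_assistant = PEAS(
    performance=["user satisfaction", "itinerary stays within budget"],
    environment=["weather services", "booking websites", "the user"],
    actuators=["tool/API calls", "messages sent to the user"],
    sensors=["API responses", "user input"],
)

# Print the specification one dimension per line.
for name, items in vars(travel_assistant).items():
    print(f"{name}: {', '.join(items)}")
```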
+In practice, the digital environment in which LLM agents operate exhibits several complex characteristics that directly affect agent design.
+
+First, the environment is typically **partially observable**. For example, when a travel assistant queries flights, it cannot obtain all real-time seat information from all airlines at once. It can only see partial data returned by the flight booking API it calls, which requires the agent to have memory (remembering queried routes) and exploration (trying different query dates) capabilities.
+
+Second, the results of actions are not always deterministic. Based on the predictability of results, environments can be divided into **deterministic** and **stochastic**. The task environment of a travel assistant is a typical stochastic environment. When it searches for ticket prices, two adjacent calls may return different ticket prices and remaining seat numbers, requiring the agent to have the ability to handle uncertainty, monitor changes, and make timely decisions.
+
+Additionally, there may be other actors in the environment, forming a **multi-agent** environment. For a travel assistant, other users' booking behaviors, other automated scripts, and even airlines' dynamic pricing systems are all other "agents" in the environment. Their actions (for example, booking the last discounted ticket) directly change the state of the environment in which the travel assistant operates, placing higher demands on the agent's rapid response and strategy selection.
+
+Finally, almost all tasks occur in **sequential** and **dynamic** environments. "Sequential" means current actions affect the future; while "dynamic" means the environment itself may change while the agent is making decisions. This requires the agent's "perceive-think-act-observe" loop to be able to quickly and flexibly adapt to a continuously changing world.
+
+### 1.2.2 Agent Operating Mechanism
+
+After defining the task environment in which an agent operates, let's explore its core operating mechanism. An agent does not complete tasks in one go but interacts with the environment through a continuous loop. This core mechanism is called the **Agent Loop**. As shown in Figure 1.5, this loop describes the dynamic interaction process between the agent and the environment, forming the foundation of its autonomous behavior.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-5.png" alt="Figure description" width="90%"/>
+  <p>Figure 1.5 Basic loop of agent-environment interaction</p>
+</div>
+
+This loop mainly contains the following interconnected stages:
+
+1. **Perception**: This is the starting point of the loop. The agent receives input information from the environment through its sensors (for example, API listening ports, user input interfaces). This information, i.e., **Observation**, can be either the user's initial instruction or feedback on environmental state changes caused by the previous action.
+2. **Thought**: After receiving observation information, the agent enters its core decision-making stage. For LLM agents, this is typically an internal reasoning process driven by large language models. As shown in the figure, the "thought" stage can be further subdivided into two key links:
+   - **Planning**: Based on current observations and its internal memory, the agent updates its understanding of the task and environment and formulates or adjusts an action plan. This may involve decomposing complex goals into a series of more specific subtasks.
+   - **Tool Selection**: Based on the current plan, the agent selects the most suitable tool from its available tool library to execute the next step and determines the specific parameters needed to call that tool.
+3. **Action**: After decision-making is complete, the agent executes specific actions through its actuators. This typically manifests as calling a selected tool (such as a code interpreter or search engine API), thereby influencing the environment with the intent to change its state.
+
+Action is not the end of the loop. The agent's action causes a **state change** in the **environment**, which then produces a new **observation** as result feedback. This new observation is captured by the agent's perception system in the next round of the loop, forming a continuous "perceive-think-act-observe" closed loop. It is through continuously repeating this loop that the agent gradually advances the task, evolving from the initial state toward the goal state.
+
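The closed loop described above can be made concrete with a deliberately tiny example. In this sketch the "thought" stage is a hand-written rule rather than an LLM, and the `ThermostatEnv` class, the `think` function, and the action names are all hypothetical. What it shows is only the control flow: observe, decide, act, let the environment change state, and observe again until the goal state is reached.

```python
from typing import List, Tuple

class ThermostatEnv:
    """Toy environment: a room whose temperature the agent can change."""
    def __init__(self, temperature: int):
        self.temperature = temperature

    def observe(self) -> int:
        return self.temperature

    def apply(self, action: str) -> None:
        # The action causes a state change in the environment.
        if action == "heat":
            self.temperature += 1
        elif action == "cool":
            self.temperature -= 1

def think(observation: int, target: int) -> str:
    """Decide the next action from the latest observation."""
    if observation < target:
        return "heat"
    if observation > target:
        return "cool"
    return "stop"

def run_agent(env: ThermostatEnv, target: int, max_steps: int = 10) -> List[Tuple[int, str]]:
    trace = []
    for _ in range(max_steps):
        observation = env.observe()          # Perception
        action = think(observation, target)  # Thought
        trace.append((observation, action))
        if action == "stop":                 # goal state reached
            break
        env.apply(action)                    # Action, which changes the state
    return trace

print(run_agent(ThermostatEnv(18), target=21))
```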
+### 1.2.3 Agent Perception and Action
+
+In engineering practice, to enable LLMs to effectively drive this loop, we need a clear **Interaction Protocol** to regulate information exchange between it and the environment.
+
+In many modern agent frameworks, this protocol is embodied in the structured definition of each agent output. The agent's output is no longer a single natural language response but a piece of text following a specific format that explicitly shows its internal reasoning process and final decision.
+
+This structure typically contains two core parts:
+
+- **Thought**: This is a "snapshot" of the agent's internal decision-making. It articulates in natural language how the agent analyzes the current situation, reviews the observation results from the previous step, engages in self-reflection and problem decomposition, and ultimately plans the next specific action.
+- **Action**: This is the specific operation the agent decides to impose on the environment based on its thinking, typically expressed as a function call.
+
+For example, an agent planning a trip might generate the following formatted output:
+
+```text
+Thought: The user wants to know the weather in Beijing. I need to call the weather query tool.
+Action: get_weather(city="Beijing")
+```
+
+The `Action` field here constitutes an instruction to the external world. An external **Parser** will capture this instruction and call the corresponding `get_weather` function.
+
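One possible shape for such a parser is sketched below. It is an assumption, not the parser of any particular framework: the `parse_action` name and its return format are hypothetical. It uses the standard-library `ast` module to read the call expression, and `ast.literal_eval` to accept only literal arguments, so the model's text is never executed as code. Both positional and keyword arguments are handled. A malformed call raises `SyntaxError` or `ValueError`, which a real loop would need to catch.

```python
import ast
import re
from typing import Any, Dict, List, Tuple

def parse_action(llm_output: str) -> Tuple[str, List[Any], Dict[str, Any]]:
    """Extract (tool_name, positional_args, keyword_args) from an Action line."""
    match = re.search(r"^Action:\s*(.+)$", llm_output, re.MULTILINE)
    if match is None:
        raise ValueError("no Action line found")
    call = ast.parse(match.group(1).strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("Action is not a simple function call")
    # literal_eval only accepts literals, so arbitrary expressions are rejected.
    args = [ast.literal_eval(arg) for arg in call.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.id, args, kwargs

output = 'Thought: I need the weather.\nAction: get_weather("Beijing")'
print(parse_action(output))
```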
+After the action is executed, the environment returns a result. For example, the `get_weather` function might return a JSON object containing detailed weather data. However, raw machine-readable data (such as JSON) typically contains redundant information that the LLM doesn't need to focus on, and the format doesn't conform to its natural language processing habits.
+
+Therefore, an important responsibility of the perception system is to play the role of a sensor: processing and encapsulating this raw output into concise, clear natural language text, i.e., observation.
+
+```text
+Observation: Beijing's current weather is sunny, temperature 25 degrees Celsius, light breeze.
+```
+
+This `Observation` text is fed back to the agent as the main input information for the next round of the loop, for it to conduct a new round of `Thought` and `Action`.
+
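The conversion from raw tool output to an observation can be sketched as follows. The JSON field names (`condition`, `temp_c`, `wind`, and the redundant `station_id` and `ts`) and the `to_observation` helper are hypothetical; a real tool would have its own schema. The point is that only the fields the LLM needs survive, phrased as one line of natural language.

```python
import json

def to_observation(city: str, raw_json: str) -> str:
    """Condense a raw JSON tool result into one natural-language line."""
    data = json.loads(raw_json)
    # Field names below are hypothetical; adapt them to the real tool output.
    parts = [
        f"{city}'s current weather is {data['condition']}",
        f"temperature {data['temp_c']} degrees Celsius",
    ]
    if "wind" in data:
        parts.append(data["wind"])
    # Fields such as station_id or ts are deliberately dropped.
    return "Observation: " + ", ".join(parts) + "."

raw = '{"condition": "sunny", "temp_c": 25, "wind": "light breeze", "station_id": "X1", "ts": 1700000000}'
print(to_observation("Beijing", raw))
```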
+In summary, through this rigorous loop composed of Thought, Action, and Observation, LLM agents can effectively combine their internal language reasoning capabilities with real information and tool operation capabilities from the external environment.
+
+## 1.3 Hands-on Experience: Implementing Your First Agent in 5 Minutes
+
+In the previous sections, we learned about the agent's task environment, core operating mechanism, and the `Thought-Action-Observation` interaction paradigm. While theoretical knowledge is important, the best way to learn is through hands-on practice. In this section, we will guide you to build a working intelligent travel assistant from scratch using a few simple lines of Python code. This process will follow the theoretical loop we just learned, allowing you to intuitively experience how an agent "thinks" and interacts with external "tools." Let's get started!
+
+In this case, our goal is to build an intelligent travel assistant that can handle step-by-step tasks. The user task to be solved is defined as: "Hello, please help me check today's weather in Beijing, and then recommend a suitable tourist attraction based on the weather." To complete this task, the agent must demonstrate clear logical planning capabilities. It needs to first call the weather query tool and use the obtained observation results as the basis for the next step. In the next round of the loop, it then calls the attraction recommendation tool to arrive at the final suggestion.
+
+### 1.3.1 Preparation
+
+To access web APIs from a Python program, we need an HTTP library. `requests` is the most popular and easy-to-use choice in the Python community. `tavily-python` is the client for Tavily, an AI search API that returns real-time web search results; an API key can be obtained by registering on the [official website](https://www.tavily.com/). `openai` is the official Python SDK provided by OpenAI for calling large language model services such as GPT. Please install them first with the following command:
+
+```bash
+pip install requests tavily-python openai
+```
+
+(1) Instruction Template
+
+The key to driving a real LLM lies in **Prompt Engineering**. We need to design an "instruction template" that tells the LLM what role it should play, what tools it has, and how to format its thinking and actions. This is the "manual" for our agent, which will be passed to the LLM as `system_prompt`.
+
+```python
+AGENT_SYSTEM_PROMPT = """
+You are an intelligent travel assistant. Your task is to analyze user requests and use available tools to solve problems step by step.
+
+# Available Tools:
+- `get_weather(city: str)`: Query real-time weather for a specified city.
+- `get_attraction(city: str, weather: str)`: Search for recommended tourist attractions based on city and weather.
+
+# Action Format:
+Your response must strictly follow the following format. First is your thinking process, then the specific action you want to execute.
+Thought: [Here is your thinking process and next step plan]
+Action: [Here is the tool you want to call, in the format function_name(arg_name="arg_value")]
+
+# Task Completion:
+When you have collected enough information to answer the user's final question, you must use `finish(answer="...")` after the Action: field to output the final answer.
+
+Let's begin!
+"""
+```
+
+(2) Tool 1: Query Real Weather
+
+We will use the free weather query service `wttr.in`, which can return weather data for a specified city in JSON format. Here is the code to implement this tool:
+
+```python
+import requests
+
+def get_weather(city: str) -> str:
+    """
+    Query real weather information by calling the wttr.in API.
+    """
+    # API endpoint, we request data in JSON format
+    url = f"https://wttr.in/{city}?format=j1"
+
+    try:
+        # Make network request; the timeout keeps the agent from hanging on a slow server
+        response = requests.get(url, timeout=10)
+        # Raise an HTTPError for 4xx/5xx status codes
+        response.raise_for_status()
+        # Parse returned JSON data
+        data = response.json()
+
+        # Extract current weather conditions
+        current_condition = data['current_condition'][0]
+        weather_desc = current_condition['weatherDesc'][0]['value']
+        temp_c = current_condition['temp_C']
+
+        # Format as natural language return
+        return f"{city} current weather: {weather_desc}, temperature {temp_c} degrees Celsius"
+
+    except requests.exceptions.RequestException as e:
+        # Handle network errors
+        return f"Error: Network problem encountered when querying weather - {e}"
+    except (KeyError, IndexError) as e:
+        # Handle data parsing errors
+        return f"Error: Failed to parse weather data, city name may be invalid - {e}"
+```
+
+(3) Tool 2: Search and Recommend Tourist Attractions
+
+We will define a second tool, `get_attraction`, that searches the internet for suitable attractions based on the city and the weather conditions:
+
+```python
+import os
+from tavily import TavilyClient
+
+def get_attraction(city: str, weather: str) -> str:
+    """
+    Based on city and weather, use Tavily Search API to search and return optimized attraction recommendations.
+    """
+    # 1. Read API key from environment variable
+    api_key = os.environ.get("TAVILY_API_KEY")
+    if not api_key:
+        return "Error: TAVILY_API_KEY environment variable not configured."
+
+    # 2. Initialize Tavily client
+    tavily = TavilyClient(api_key=api_key)
+
+    # 3. Construct a precise query
+    query = f"'{city}' most worthwhile tourist attractions and reasons in '{weather}' weather"
+
+    try:
+        # 4. Call API, include_answer=True will return a comprehensive answer
+        response = tavily.search(query=query, search_depth="basic", include_answer=True)
+
+        # 5. Tavily's returned results are already very clean and can be used directly
+        # response['answer'] is a summary answer based on all search results
+        if response.get("answer"):
+            return response["answer"]
+
+        # If there's no comprehensive answer, format raw results
+        formatted_results = []
+        for result in response.get("results", []):
+            formatted_results.append(f"- {result['title']}: {result['content']}")
+
+        if not formatted_results:
+            return "Sorry, no relevant tourist attraction recommendations found."
+
+        return "Based on search, found the following information for you:\n" + "\n".join(formatted_results)
+
+    except Exception as e:
+        return f"Error: Problem occurred when executing Tavily search - {e}"
+```
+
+Finally, we put all tool functions into a dictionary for the main loop to call:
+
+```python
+# Put all tool functions into a dictionary for easy subsequent calling
+available_tools = {
+    "get_weather": get_weather,
+    "get_attraction": get_attraction,
+}
+```
+
+### 1.3.2 Connecting to Large Language Models
+
+The agent's autonomous decision-making capability comes from the LLM. Many LLM service providers (including OpenAI and Azure) and open-source model serving frameworks (such as Ollama and vLLM) expose interfaces compatible with the OpenAI API, and this de facto standardization is a great convenience for developers. We will therefore implement a universal client, `OpenAICompatibleClient`, that can connect to any LLM service compatible with the OpenAI interface specification.
+
+```python
+from openai import OpenAI
+
+class OpenAICompatibleClient:
+    """
+    A client for calling any LLM service compatible with the OpenAI interface.
+    """
+    def __init__(self, model: str, api_key: str, base_url: str):
+        self.model = model
+        self.client = OpenAI(api_key=api_key, base_url=base_url)
+
+    def generate(self, prompt: str, system_prompt: str) -> str:
+        """Call LLM API to generate response."""
+        print("Calling large language model...")
+        try:
+            messages = [
+                {'role': 'system', 'content': system_prompt},
+                {'role': 'user', 'content': prompt}
+            ]
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                stream=False
+            )
+            answer = response.choices[0].message.content
+            print("Large language model responded successfully.")
+            return answer
+        except Exception as e:
+            print(f"Error occurred when calling LLM API: {e}")
+            return "Error: Error occurred when calling language model service."
+```
+
+To instantiate this class, you need to provide three pieces of information: `API_KEY`, `BASE_URL`, and `MODEL_ID`. The specific values depend on the service you use (such as the official OpenAI API, Azure, or a locally served model via Ollama). If you don't have access to these yet, you can refer to [1.2 API Setup](https://datawhalechina.github.io/handy-multi-agent/#/chapter1/1.2.api-setup) in another Datawhale tutorial.
+
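One way to keep these three values out of the source file is to read them from environment variables and fail early when one is missing. The sketch below is an assumption, not part of the tutorial's code: the variable names `LLM_API_KEY`, `LLM_BASE_URL`, and `LLM_MODEL_ID`, the `load_llm_config` helper, and the example values are all hypothetical. The returned keys are chosen to match the constructor parameters of `OpenAICompatibleClient`, so the dictionary can be unpacked directly into it.

```python
import os
from typing import Dict, Mapping

def load_llm_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the three settings, failing early if one is missing."""
    # LLM_API_KEY, LLM_BASE_URL and LLM_MODEL_ID are hypothetical variable names.
    config = {
        "api_key": env.get("LLM_API_KEY", ""),
        "base_url": env.get("LLM_BASE_URL", ""),
        "model": env.get("LLM_MODEL_ID", ""),
    }
    missing = [name for name, value in config.items() if not value]
    if missing:
        raise ValueError(f"missing LLM settings: {', '.join(missing)}")
    return config

# Example values only; substitute those of your own provider.
example_env = {
    "LLM_API_KEY": "sk-example",
    "LLM_BASE_URL": "https://api.example.com/v1",
    "LLM_MODEL_ID": "example-model",
}
config = load_llm_config(example_env)
print(config["base_url"], config["model"])
# llm = OpenAICompatibleClient(**config)  # uses the class defined above
```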
+### 1.3.3 Executing the Action Loop
+
+The main loop below will integrate all components and drive the LLM to make decisions through formatted prompts.
+
+```python
+import os
+import re
+
+# --- 1. Configure LLM client ---
+# Please replace these with the credentials and address of the service you use
+API_KEY = "YOUR_API_KEY"
+BASE_URL = "YOUR_BASE_URL"
+MODEL_ID = "YOUR_MODEL_ID"
+TAVILY_API_KEY = "YOUR_TAVILY_API_KEY"
+os.environ['TAVILY_API_KEY'] = TAVILY_API_KEY
+
+llm = OpenAICompatibleClient(
+    model=MODEL_ID,
+    api_key=API_KEY,
+    base_url=BASE_URL
+)
+
+# --- 2. Initialize ---
+user_prompt = "Hello, please help me check today's weather in Beijing, and then recommend a suitable tourist attraction based on the weather."
+prompt_history = [f"User request: {user_prompt}"]
+
+print(f"User input: {user_prompt}\n" + "="*40)
+
+# --- 3. Run main loop ---
+for i in range(5): # Set maximum number of loops
+    print(f"--- Loop {i+1} ---\n")
+
+    # 3.1. Build Prompt
+    full_prompt = "\n".join(prompt_history)
+
+    # 3.2. Call LLM for thinking
+    llm_output = llm.generate(full_prompt, system_prompt=AGENT_SYSTEM_PROMPT)
+    print(f"Model output:\n{llm_output}\n")
+    prompt_history.append(llm_output)
+
+    # 3.3. Parse and execute action
+    action_match = re.search(r"Action: (.*)", llm_output, re.DOTALL)
+    if not action_match:
+        print("Parse error: Action not found in model output.")
+        break
+    action_str = action_match.group(1).strip()
+
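+    if action_str.startswith("finish"):
+        # re.DOTALL lets the final answer span several lines
+        finish_match = re.search(r'finish\(answer="(.*)"\)', action_str, re.DOTALL)
+        final_answer = finish_match.group(1) if finish_match else action_str
+        print(f"Task completed, final answer: {final_answer}")
+        break
+
+    # Split the call into tool name and raw argument string
+    tool_match = re.search(r"(\w+)\((.*)\)", action_str, re.DOTALL)
+    if not tool_match:
+        print("Parse error: Action is not a valid tool call.")
+        break
+    tool_name, args_str = tool_match.groups()
+    kwargs = dict(re.findall(r'(\w+)="([^"]*)"', args_str))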
+
+    if tool_name in available_tools:
+        observation = available_tools[tool_name](**kwargs)
+    else:
+        observation = f"Error: Undefined tool '{tool_name}'"
+
+    # 3.4. Record observation results
+    observation_str = f"Observation: {observation}"
+    print(f"{observation_str}\n" + "="*40)
+    prompt_history.append(observation_str)
+```
+
+Through the above steps, we have built a complete agent driven by a real LLM. Its core lies in the combination of "tools" and "prompt engineering," which is precisely the design essence of current mainstream agent frameworks such as LangChain and LlamaIndex.
+
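Before spending API credits, it can help to dry-run the same control flow offline. The sketch below is an assumption-based test harness, not part of the tutorial's code: `ScriptedLLM`, `stub_tools`, and `run_loop` are hypothetical names, the scripted replies are hand-written stand-ins for model output, and the parsing is a condensed version of the regular expressions used in the main loop. It needs no network access and no API key.

```python
import re

class ScriptedLLM:
    """Stand-in for the real client: replays canned replies in order."""
    def __init__(self, replies):
        self._replies = iter(replies)

    def generate(self, prompt: str, system_prompt: str) -> str:
        return next(self._replies)

# Stub tools with fixed outputs, so no network or API key is needed.
stub_tools = {
    "get_weather": lambda city: f"{city} current weather: Sunny, temperature 26 degrees Celsius",
    "get_attraction": lambda city, weather: f"On a {weather.lower()} day in {city}, visit the Summer Palace.",
}

def run_loop(llm, tools, user_prompt, max_loops=5):
    """Thought-Action-Observation loop; returns the final answer or None."""
    history = [f"User request: {user_prompt}"]
    for _ in range(max_loops):
        output = llm.generate("\n".join(history), system_prompt="")
        history.append(output)
        action = re.search(r"Action: (.*)", output, re.DOTALL)
        if not action:
            return None
        action_str = action.group(1).strip()
        finish = re.search(r'finish\(answer="(.*)"\)', action_str, re.DOTALL)
        if finish:
            return finish.group(1)
        tool_name = re.search(r"(\w+)\(", action_str).group(1)
        kwargs = dict(re.findall(r'(\w+)="([^"]*)"', action_str))
        if tool_name in tools:
            observation = tools[tool_name](**kwargs)
        else:
            observation = f"Error: Undefined tool '{tool_name}'"
        history.append(f"Observation: {observation}")
    return None  # loop budget exhausted without a finish action

script = [
    'Thought: I need the weather first.\nAction: get_weather(city="Beijing")',
    'Thought: It is sunny, now find an attraction.\nAction: get_attraction(city="Beijing", weather="Sunny")',
    'Thought: I have enough information.\nAction: finish(answer="It is sunny in Beijing; visit the Summer Palace.")',
]
answer = run_loop(ScriptedLLM(script), stub_tools, "Check Beijing's weather and recommend an attraction.")
print(answer)
```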
+### 1.3.4 Running Case Analysis
+
+The following output fully demonstrates a successful agent execution process. Through analysis of this three-round loop, we can clearly see the core capabilities of the agent in solving problems.
+
+```bash
+User input: Hello, please help me check today's weather in Beijing, and then recommend a suitable tourist attraction based on the weather.
+========================================
+--- Loop 1 ---
+
+Calling large language model...
+Large language model responded successfully.
+Model output:
+Thought: First need to get Beijing's weather today, then recommend tourist attractions based on the weather.
+Action: get_weather(city="Beijing")
+
+Observation: Beijing current weather: Sunny, temperature 26 degrees Celsius
+========================================
+--- Loop 2 ---
+
+Calling large language model...
+Large language model responded successfully.
+Model output:
+Thought: Now I know Beijing's weather today is sunny with moderate temperature, I can recommend a suitable tourist attraction based on this information.
+Action: get_attraction(city="Beijing", weather="Sunny")
+
+Observation: The most worthwhile tourist attractions in Beijing on sunny days are the Summer Palace for its beautiful lake views and ancient architecture. Another recommendation is the Great Wall for its spectacular scenery and historical significance.
+========================================
+--- Loop 3 ---
+
+Calling large language model...
+Large language model responded successfully.
+Model output:
+Thought: I have obtained two attraction suggestions suitable for sunny days, now I can provide a satisfactory response to the user based on this information.
+Action: finish(answer="Today's weather in Beijing is sunny with a temperature of 26 degrees Celsius, very suitable for outdoor activities. I recommend you visit the Summer Palace to enjoy the beautiful lake views and ancient architecture, or go to the Great Wall to experience its spectacular scenery and profound historical significance. Hope you have a pleasant trip!")
+
+Task completed, final answer: Today's weather in Beijing is sunny with a temperature of 26 degrees Celsius, very suitable for outdoor activities. I recommend you visit the Summer Palace to enjoy the beautiful lake views and ancient architecture, or go to the Great Wall to experience its spectacular scenery and profound historical significance. Hope you have a pleasant trip!
+```
+
+This simple travel assistant case demonstrates four basic capabilities of an agent built on the `Thought-Action-Observation` paradigm: task decomposition, tool invocation, context understanding, and result synthesis. Through continuous iteration of this loop, the agent transforms a vague user intent into a series of specific, executable steps and ultimately achieves the goal.
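
To make the tool-invocation step concrete, the following is a minimal sketch of how an `Action:` line like the ones in the transcript above could be turned into a tool name and keyword arguments. It assumes the `function_name(arg_name="arg_value")` format shown in the output; the names `parse_action`, `ACTION_RE`, and `KWARG_RE` are hypothetical and are not taken from the chapter's own code.

```python
import re

# Hypothetical patterns, written for the Action format shown in the transcript.
# Matches `Action: tool_name(...)`; the argument text may span several lines.
ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)", re.DOTALL)
# Matches keyword arguments written as name="value" (escaped quotes allowed).
KWARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_action(llm_output: str):
    """Return (tool_name, kwargs) for the Action in the text, or None if absent."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return None
    tool_name, raw_args = match.group(1), match.group(2)
    kwargs = {name: value for name, value in KWARG_RE.findall(raw_args)}
    return tool_name, kwargs


if __name__ == "__main__":
    sample = 'Thought: First get the weather.\nAction: get_weather(city="Beijing")'
    print(parse_action(sample))
```

A loop built on such a parser would dispatch on the returned tool name and stop when the name is `finish`. Real model output is less regular than this transcript, so a production parser would need to handle malformed actions as well.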
+
+## 1.4 Collaboration Modes of Agent Applications
+
+In the previous section, we gained a deep understanding of the internal operating loop of an agent by building one ourselves. In broader application scenarios, however, we increasingly act as users of agents and as their collaborators. Based on the agent's role in a task and its degree of autonomy, collaboration modes fall mainly into two types: in one, the agent is an efficient tool deeply integrated into our workflow; in the other, it is an autonomous collaborator that works with other agents to complete complex goals.
+
+### 1.4.1 Agents as Developer Tools
+
+In this mode, agents are deeply integrated into developers' workflows as powerful auxiliary tools. They enhance rather than replace the developer's role, automating tedious, repetitive tasks so developers can focus more on creative core work. This human-machine collaboration approach greatly improves the efficiency and quality of software development.
+
+Currently, the market has seen the emergence of multiple excellent AI programming assistance tools. While they all improve development efficiency, they differ in implementation paths and functional focus:
+
+- **GitHub Copilot**: As one of the most influential products in this field, Copilot was jointly developed by GitHub and OpenAI. It is deeply integrated into mainstream editors such as Visual Studio Code and is renowned for its powerful code auto-completion capabilities. When developers write code, Copilot can provide real-time suggestions for entire lines or even entire function blocks. In recent years, it has also expanded conversational programming capabilities through Copilot Chat, allowing developers to solve programming problems through chat within the editor.
+- **Claude Code**: Claude Code is an AI programming assistant developed by Anthropic, designed to help developers efficiently complete coding tasks in the terminal through natural language instructions. It can understand complete codebase structures, perform operations such as code editing, testing, and debugging, and supports full-process development from describing functionality to code implementation. Claude Code also provides a headless mode suitable for CI, pre-commit hooks, build scripts, and other automation scenarios, providing developers with a powerful command-line programming experience.
+- **Trae**: As an emerging AI programming tool, Trae focuses on providing developers with intelligent code generation and optimization services. It analyzes code patterns through deep learning technology and can provide developers with precise code suggestions and automated refactoring solutions. Trae's distinctive feature is its lightweight design and fast response capability, particularly suitable for scenarios requiring frequent iteration and rapid prototyping.
+- **Cursor**: Unlike the above tools that mainly exist as plugins or integrated features, Cursor has chosen a more integrated path—it is itself an AI-native code editor. Rather than adding AI functionality to existing editors, it made AI interaction a core feature from the design stage. In addition to top-tier code generation and chat capabilities, it emphasizes letting AI understand the context of the entire codebase, thereby achieving deeper Q&A, refactoring, and debugging.
+
+Of course, there are many other excellent tools not listed here, but they all point to a clear trend: AI is deeply integrating into the entire software development lifecycle, profoundly reshaping the efficiency boundaries and development paradigms of software engineering by building efficient human-machine collaborative workflows.
+
+### 1.4.2 Agents as Autonomous Collaborators
+
+In contrast to agents that serve as tools assisting humans, the second interaction mode raises the degree of automation considerably: agents become autonomous collaborators. In this mode, we no longer guide the AI step by step through every action but delegate a high-level goal to it. The agent, like a true project team member, independently plans, reasons, executes, and reflects until it delivers results. This transformation from assistant to collaborator has brought LLM agents further into public view. It marks the evolution of our relationship with AI from "command-execute" to "goal-delegate." Agents are no longer passive tools but active goal pursuers.
+
+Approaches to achieving this autonomous collaboration are flourishing. Numerous frameworks and products have emerged, from the early BabyAGI and AutoGPT to more mature frameworks such as CrewAI, AutoGen, MetaGPT, and LangGraph, and together they are driving rapid development in this field. Although specific implementations vary greatly, their architectural paradigms can be roughly grouped into several mainstream directions:
+
+1. **Single-Agent Autonomous Loop**: This is an early typical paradigm, represented by projects like **AgentGPT**. Its core is a general agent that continuously self-prompts and iterates through a "think-plan-execute-reflect" closed loop to complete an open-ended high-level goal.
+2. **Multi-Agent Collaboration**: This is currently the most mainstream exploration direction, aiming to solve complex problems by simulating human team collaboration modes. It can be further subdivided into different modes:
+   - **Role-Playing Dialogue**: Like the **CAMEL** framework, which assigns clear roles and communication protocols to two agents (for example, "programmer" and "product manager"), allowing them to collaboratively complete tasks in a structured dialogue.
+   - **Organized Workflow**: Like **MetaGPT** and **CrewAI**, which simulate a "virtual team" with clear division of labor (such as a software company or consulting group). Each agent has preset responsibilities and workflows (SOPs), collaborating in a hierarchical or sequential manner to produce high-quality complex outputs (such as complete codebases or research reports).
+   - **AutoGen** and **AgentScope** provide more flexible dialogue modes, allowing developers to customize complex interaction networks between agents.
+3. **Advanced Control Flow Architecture**: Frameworks such as **LangGraph** focus more on providing agents with more powerful underlying engineering foundations. They model the agent's execution process as a state graph, enabling more flexible and reliable implementation of complex processes such as loops, branches, backtracking, and human intervention.
+
+These different architectural paradigms collectively drive autonomous agents from theoretical concepts toward broader practical applications, enabling them to handle increasingly complex real-world tasks. In our subsequent chapters, we will also experience the differences and advantages between different types of frameworks.
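
The state-graph idea behind the third direction can be illustrated with a small sketch in plain Python. This is not LangGraph's API: the node functions, the `run_graph` helper, and the `"END"` marker are all hypothetical names chosen for illustration. Each node updates a shared state and returns the name of the next node, which is enough to express a branch and a retry loop.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], str]  # a node updates the state and names the next node


def plan(state: State) -> str:
    state["plan"] = "draft an answer"
    return "execute"


def execute(state: State) -> str:
    # Pretend the first attempt fails and the second succeeds.
    attempts = int(state.get("attempts", 0)) + 1
    state["attempts"] = attempts
    state["ok"] = attempts >= 2
    return "review"


def review(state: State) -> str:
    # Branch: finish on success, otherwise loop back to `execute`.
    return "END" if state["ok"] else "execute"


def run_graph(nodes: Dict[str, Node], start: str, state: State, max_steps: int = 10) -> List[str]:
    """Walk the graph from `start` and return the sequence of visited nodes."""
    trace: List[str] = []
    current = start
    for _ in range(max_steps):  # hard cap so a faulty graph cannot loop forever
        if current == "END":
            break
        trace.append(current)
        current = nodes[current](state)
    return trace


if __name__ == "__main__":
    graph = {"plan": plan, "execute": execute, "review": review}
    print(run_graph(graph, "plan", {}))
```

Keeping control flow in an explicit graph rather than in free-form prompts makes it easier to add checkpoints, such as a node that pauses for human approval, without changing the other nodes.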
+
+### 1.4.3 Differences Between Workflow and Agent
+
+After understanding the two modes of agents as "tools" and "collaborators," it is necessary to discuss the differences between Workflow and Agent. Although both aim to achieve task automation, their underlying logic, core characteristics, and applicable scenarios are fundamentally different.
+
+Simply put, **Workflow makes AI execute instructions step by step, while Agent gives AI freedom to autonomously achieve goals.**
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-18.png" alt="Figure description" width="90%"/>
+  <p>Figure 1.6 Differences between Workflow and Agent</p>
+</div>
+
+As shown in Figure 1.6, workflow is a traditional automation paradigm whose core is **pre-defined, structured orchestration of a series of tasks or steps**. It is essentially a precise, static flowchart that specifies which operations to execute under what conditions and in what order. A typical case: a company's expense reimbursement approval process. Employee submits reimbursement form (trigger) -> If amount is less than 500 yuan, directly approved by department manager -> If amount is greater than 500 yuan, first approved by department manager, then forwarded to CFO for approval -> After approval, notify finance department to make payment. Every step and every judgment condition of the entire process is precisely preset.
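
As a rough illustration of how rigid such a workflow is, the reimbursement process can be written as a few hard-coded branches. This is only a sketch: the function name `route_reimbursement` and the step labels are hypothetical, and because the description above does not say what happens at exactly 500 yuan, the code assumes that case also goes to the CFO.

```python
from typing import List


def route_reimbursement(amount_yuan: float) -> List[str]:
    """Return the ordered approval steps for one reimbursement form."""
    if amount_yuan < 0:
        raise ValueError("amount must be non-negative")

    steps = ["department manager approval"]
    # Assumption: the text covers "< 500" and "> 500"; exactly 500 is treated as CFO-level here.
    if amount_yuan >= 500:
        steps.append("CFO approval")
    steps.append("finance department payment")
    return steps


if __name__ == "__main__":
    print(route_reimbursement(120))
    print(route_reimbursement(2600))
```

Every path through the function is known before it runs. A case the designer did not anticipate, such as a foreign-currency claim, needs a code change rather than a new decision made at run time.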
+
+Unlike workflows, agents based on large language models are **autonomous, goal-oriented systems**. They not only execute preset instructions but can also understand the environment to a certain extent, reason, formulate plans, and dynamically take actions to achieve final goals. LLMs play the role of the "brain" in this process. A typical example is the intelligent travel assistant we wrote in Section 1.3. When we give it a new instruction, for example: **"Hello, please help me check today's weather in Beijing, and then recommend a suitable tourist attraction based on the weather."** Its processing fully demonstrates its autonomy:
+
+1. **Planning and Tool Invocation:** The agent first breaks down the task into two steps: ① Query weather; ② Recommend attractions based on weather. Then, it autonomously selects and calls the "weather query API," passing "Beijing" as a parameter.
+2. **Reasoning and Decision-Making:** Suppose the API returns "sunny, light breeze." The agent's LLM brain will reason based on this information: "Sunny days are suitable for outdoor activities." Then, based on this judgment, it will filter outdoor attractions in Beijing from its knowledge base or through search engine tools, such as the Forbidden City, Summer Palace, Temple of Heaven Park, etc.
+3. **Generate Results:** Finally, the agent will synthesize the information and provide a complete, natural-sounding answer: "Today's weather in Beijing is sunny with a light breeze, very suitable for outdoor activities. I recommend you visit the Summer Palace, where you can boat on Kunming Lake and enjoy the beautiful royal garden scenery."
+
+In this process, there are no hard-coded rules like `if weather=sunny then recommend Summer Palace`. If the weather is "rainy," the agent will autonomously reason and recommend indoor venues such as the National Museum or Capital Museum. **This ability to dynamically reason and make decisions based on real-time information is the core value of agents.**
+
+## 1.5 Chapter Summary
+
+In this chapter, we embarked on an introductory journey to explore agents. Our journey began with the most fundamental questions:
+
+- **What are large language model-driven agents?** We first clarified their definition and saw that modern agents are entities with real capabilities of their own. They are no longer just scripts executing preset programs but decision-makers capable of autonomous reasoning and tool use.
+- **How do agents work?** We delved into the operating mechanism of agent-environment interaction. We learned that this continuous closed loop is the foundation for agents to process information, make decisions, influence the environment, and adjust their behavior based on feedback.
+- **How to build an agent?** This was the practical core of this chapter. Using the "intelligent travel assistant" as an example, we built a complete agent driven by a real LLM.
+- **What are the mainstream application paradigms of agents?** Finally, we turned to broader application domains. We explored two mainstream agent interaction modes: one is "developer tools," represented by GitHub Copilot and Cursor, which enhance human workflows; the other is "autonomous collaborators," represented by frameworks like CrewAI, MetaGPT, and AgentScope, which can independently complete high-level goals. We also explained the differences between Workflow and Agent.
+
+Through this chapter, we have built a foundational understanding of agents. So how did agents evolve step by step from their initial conception to the present? In the next chapter, we will explore the development history of agents and trace the field back to its origins.
+
+## Exercises
+
+> **Note**: Some of the following exercises do not have standard answers. The focus is on cultivating learners' critical, in-depth thinking and hands-on skills with agent systems.
+
+1. Please analyze whether the **subject** in the following four `cases` qualifies as an agent. If so, what type of agent does it belong to (can be analyzed from multiple classification dimensions), and explain your reasoning:
+
+   `Case A`: **A supercomputer conforming to von Neumann architecture**, with peak computing power of up to 2 EFLOPS
+
+   `Case B`: **Tesla's autonomous driving system** is driving on a highway when it suddenly detects an obstacle ahead and needs to make a braking or lane-change decision within milliseconds
+
+   `Case C`: **AlphaGo** is playing against a human player and needs to evaluate the current situation and plan the optimal strategy for dozens of moves ahead
+
+   `Case D`: **ChatGPT acting as an intelligent customer service** is handling a user complaint and needs to query order information, analyze the problem cause, provide solutions, and soothe user emotions
+
+2. Suppose you need to design a task environment for an "intelligent fitness coach." This agent can:
+   - Monitor users' physiological data such as heart rate and exercise intensity through wearable devices
+   - Dynamically adjust training plans based on users' fitness goals (fat loss/muscle gain/endurance improvement)
+   - Provide real-time voice guidance and motion correction during user exercise
+   - Evaluate training effectiveness and provide dietary recommendations
+
+   Please use the PEAS model to completely describe this agent's task environment and analyze what characteristics this environment has (such as partially observable, stochastic, dynamic, etc.).
+
+3. An e-commerce company is considering two approaches to handle after-sales refund requests:
+
+   Approach A (`Workflow`): Design a fixed process, for example:
+
+   A.1 For general products within 7 days, amounts `< 100 RMB` are automatically approved; `100-500 RMB` are reviewed by customer service; `> 500 RMB` require supervisor approval; special products (such as customized items) are always rejected
+
+   A.2 For products beyond 7 days, regardless of amount, they can only be reviewed by customer service or approved by supervisors;
+
+   Approach B (`Agent`): Build an agent system that understands refund policies, analyzes user historical behavior, evaluates product conditions, and autonomously decides whether to approve refunds
+
+   Please analyze:
+   - What are the advantages and disadvantages of these two approaches?
+   - Under what circumstances is `Workflow` more suitable? When does `Agent` have advantages? If you were the head of this e-commerce company, which approach would you prefer?
+   - Is there an Approach C that can combine both approaches to achieve complementary strengths?
+
+4. Based on the intelligent travel assistant in Section 1.3, please consider how to add the following features (you can just describe the design ideas or further attempt code implementation):
+
+   > **Hint**: Think about how to modify the `Thought-Action-Observation` loop to implement these features.
+
+   - Add a "memory" feature that allows the agent to remember user preferences (such as liking historical and cultural attractions, budget range, etc.)
+   - When recommended attraction tickets are sold out, the agent can automatically recommend alternative options
+   - If the user consecutively rejects 3 recommendations, the agent can reflect and adjust its recommendation strategy
+
+5. Kahneman's "System 1" (fast intuition) and "System 2" (slow reasoning) theory<sup>[2]</sup> provides a good analogy for neuro-symbolic AI. Please first conceive a specific agent application scenario, then explain in the scenario:
+
+   > **Hint**: Medical diagnosis assistants, legal consulting robots, financial risk control systems, etc., are all common application scenarios
+
+   - Which tasks should be handled by "System 1"?
+   - Which tasks should be handled by "System 2"?
+   - How do these two systems work together to achieve the final goal?
+
+6. Although large language model-driven agent systems demonstrate powerful capabilities, they still have many limitations. Please analyze the following questions:
+   - Why do agents or agent systems sometimes produce "hallucinations" (generating seemingly reasonable but actually incorrect information)?
+   - In the case in Section 1.3, we set the maximum number of loops to 5. Without this limit, what problems might the agent encounter?
+   - How to evaluate an agent's "intelligence" level? Is using only accuracy metrics sufficient?
+
+## References
+
+[1] RUSSELL S, NORVIG P. Artificial Intelligence: A Modern Approach[M]. 4th ed. London: Pearson, 2020.
+
+[2] KAHNEMAN D. Thinking, Fast and Slow[M]. New York: Farrar, Straus and Giroux, 2011.
+

+ 87 - 83
docs/chapter1/第一章 初识智能体.md

@@ -1,6 +1,10 @@
+<div align="right">
+  <a href="./Chapter1-Introduction-to-Agents.md">English</a> | 中文
+</div>
+
 # 第一章 初识智能体
 
-欢迎来到智能体的世界!在人工智能浪潮席卷全球的今天,<strong>智能体(Agent)</strong>已成为驱动技术变革与应用创新的核心概念之一。无论你的志向是成为AI领域的研究者、工程师,还是希望深刻理解技术前沿的观察者,掌握智能体的本质,都将是你知识体系中不可或缺的一环。
+欢迎来到智能体的世界!在人工智能浪潮席卷全球的今天,<strong>智能体(Agent)</strong>已成为驱动技术变革与应用创新的核心概念之一。无论你的志向是成为 AI 领域的研究者、工程师,还是希望深刻理解技术前沿的观察者,掌握智能体的本质,都将是你知识体系中不可或缺的一环。
 
 因此,在本章,让我们回到原点,一起探讨几个问题:智能体是什么?它有哪些主要的类型?它又是如何与我们所处的世界进行交互的?通过这些讨论,希望能为你未来的学习和探索打下坚实的基础。
 
@@ -17,14 +21,14 @@
 
 获取信息后,智能体需要采取行动来对环境施加影响,它通过执行器来改变环境的状态。执行器可以是物理设备(如机械臂、方向盘)或虚拟工具(如执行一段代码、调用一个服务)。
 
-然而,真正赋予智能体"智能"的,是其<strong>自主性(Autonomy)</strong>。智能体并非只是被动响应外部刺激或严格执行预设指令的程序,它能够基于其感知和内部状态进行独立决策,以达成其设计目标。这种从感知到行动的闭环,构成了所有智能体行为的基础,如图1.1所示。
+然而,真正赋予智能体"智能"的,是其<strong>自主性(Autonomy)</strong>。智能体并非只是被动响应外部刺激或严格执行预设指令的程序,它能够基于其感知和内部状态进行独立决策,以达成其设计目标。这种从感知到行动的闭环,构成了所有智能体行为的基础,如图 1.1 所示。
 
 
 ### 1.1.1 传统视角下的智能体
 
 在当前<strong>大语言模型(Large Language Model, LLM)</strong>的热潮出现之前,人工智能的先驱们已经对“智能体”这一概念进行了数十年的探索与构建。这些如今我们称之为“传统智能体”的范式,并非单一的静态概念,而是经历了一条从简单到复杂、从被动反应到主动学习的清晰演进路线。
 
-这个演进的起点,是那些结构最简单的<strong>反射智能体(Simple Reflex Agent)</strong>。它们的决策核心由工程师明确设计的“条件-动作”规则构成,如图1.2所示。经典的自动恒温器便是如此:若传感器感知的室温高于设定值,则启动制冷系统。
+这个演进的起点,是那些结构最简单的<strong>反射智能体(Simple Reflex Agent)</strong>。它们的决策核心由工程师明确设计的“条件-动作”规则构成,如图 1.2 所示。经典的自动恒温器便是如此:若传感器感知的室温高于设定值,则启动制冷系统。
 
 这种智能体完全依赖于当前的感知输入,不具备记忆或预测能力。它像一种数字化的本能,可靠且高效,但也因此无法应对需要理解上下文的复杂任务。它的局限性引出了一个关键问题:如果环境的当前状态不足以作为决策的全部依据,智能体该怎么办?
 
@@ -35,7 +39,7 @@
 
 为了回答这个问题,研究者们引入了“状态”的概念,发展出<strong>基于模型的反射智能体(Model-Based Reflex Agent)</strong>。这类智能体拥有一个内部的<strong>世界模型(World Model)</strong>,用于追踪和理解环境中那些无法被直接感知的方面。它试图回答:“世界现在是什么样子的?”。例如,一辆在隧道中行驶的自动驾驶汽车,即便摄像头暂时无法感知到前方的车辆,它的内部模型依然会维持对那辆车存在、速度和预估位置的判断。这个内部模型让智能体拥有了初级的“记忆”,使其决策不再仅仅依赖于瞬时感知,而是基于一个更连贯、更完整的世界状态理解。
 
-然而,仅仅理解世界还不够,智能体需要有明确的目标。这促进了<strong>基于目标的智能体(Goal-Based Agent)</strong>的发展。与前两者不同,它的行为不再是被动地对环境做出反应,而是主动地、有预见性地选择能够导向某个特定未来状态的行动。这类智能体需要回答的问题是:“我应该做什么才能达成目标?”。经典的例子是GPS导航系统:你的目标是到达公司,智能体会基于地图数据(世界模型),通过搜索算法(如A*算法)来规划(Planning)出一条最优路径。这类智能体的核心能力体现在了对未来的考量与规划上。
+然而,仅仅理解世界还不够,智能体需要有明确的目标。这促进了<strong>基于目标的智能体(Goal-Based Agent)</strong>的发展。与前两者不同,它的行为不再是被动地对环境做出反应,而是主动地、有预见性地选择能够导向某个特定未来状态的行动。这类智能体需要回答的问题是:“我应该做什么才能达成目标?”。经典的例子是 GPS 导航系统:你的目标是到达公司,智能体会基于地图数据(世界模型),通过搜索算法(如 A*算法)来规划(Planning)出一条最优路径。这类智能体的核心能力体现在了对未来的考量与规划上。
 
 更进一步,现实世界的目标往往不是单一的。我们不仅希望到达公司,还希望时间最短、路程最省油并且避开拥堵。当多个目标需要权衡时,<strong>基于效用的智能体(Utility-Based Agent)</strong>便随之出现。它为每一个可能的世界状态都赋予一个效用值,这个值代表了满意度的高低。智能体的核心目标不再是简单地达成某个特定状态,而是最大化期望效用。它需要回答一个更复杂的问题:“哪种行为能为我带来最满意的结果?”。这种架构让智能体学会在相互冲突的目标之间进行权衡,使其决策更接近人类的理性选择。
 
@@ -43,24 +47,24 @@
 
 这便是<strong>学习型智能体(Learning Agent)</strong>的核心思想,而<strong>强化学习(Reinforcement Learning, RL)</strong>是实现这一思想最具代表性的路径。一个学习型智能体包含一个性能元件(即我们前面讨论的各类智能体)和一个学习元件。学习元件通过观察性能元件在环境中的行动所带来的结果来不断修正性能元件的决策策略。
 
-想象一个学习下棋的AI。它开始时可能只是随机落子,当它最终赢下一局时,系统会给予它一个正向的奖励。通过大量的自我对弈,学习元件会逐渐发现哪些棋路更有可能导向最终的胜利。AlphaGo是这一理念的一个里程碑式的成就。它在围棋这一复杂博弈中,通过强化学习发现了许多超越人类既有知识的有效策略。
+想象一个学习下棋的 AI。它开始时可能只是随机落子,当它最终赢下一局时,系统会给予它一个正向的奖励。通过大量的自我对弈,学习元件会逐渐发现哪些棋路更有可能导向最终的胜利。AlphaGo 是这一理念的一个里程碑式的成就。它在围棋这一复杂博弈中,通过强化学习发现了许多超越人类既有知识的有效策略。
 
 从简单的恒温器,到拥有内部模型的汽车,再到能够规划路线的导航、懂得权衡利弊的决策者,最终到可以通过经验自我进化的学习者。这条演进之路,展示了传统人工智能在构建机器智能的道路上所经历的发展脉络。它们为我们今天理解更前沿的智能体范式,打下了坚实而必要的基础。
 
 ### 1.1.2 大语言模型驱动的新范式
 
-以<strong>GPT(Generative Pre-trained Transformer)</strong>为代表的大语言模型的出现,正在显著改变智能体的构建方法与能力边界。由大语言模型驱动的LLM智能体,其核心决策机制与传统智能体存在本质区别,从而赋予了其一系列全新的特性。
+以<strong>GPT(Generative Pre-trained Transformer)</strong>为代表的大语言模型的出现,正在显著改变智能体的构建方法与能力边界。由大语言模型驱动的 LLM 智能体,其核心决策机制与传统智能体存在本质区别,从而赋予了其一系列全新的特性。
 
-这种转变,可以从两者在核心引擎、知识来源、交互方式等多个维度的对比中清晰地看出,如表1.1所示。简而言之,传统智能体的能力源于工程师的显式编程与知识构建,其行为模式是确定且有边界的;而LLM智能体则通过在海量数据上的预训练,获得了隐式的世界模型与强大的涌现能力,使其能够以更灵活、更通用的方式应对复杂任务。
+这种转变,可以从两者在核心引擎、知识来源、交互方式等多个维度的对比中清晰地看出,如表 1.1 所示。简而言之,传统智能体的能力源于工程师的显式编程与知识构建,其行为模式是确定且有边界的;而 LLM 智能体则通过在海量数据上的预训练,获得了隐式的世界模型与强大的涌现能力,使其能够以更灵活、更通用的方式应对复杂任务。
 
 <div align="center">
-  <p>表 1.1 传统智能体与LLM驱动智能体的核心对比</p>
+  <p>表 1.1 传统智能体与 LLM 驱动智能体的核心对比</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-2.png" alt="图片描述" width="90%"/>
 </div>
 
-这种差异使得LLM智能体可以直接处理高层级、模糊且充满上下文信息的自然语言指令。让我们以一个“智能旅行助手”为例来说明。
+这种差异使得 LLM 智能体可以直接处理高层级、模糊且充满上下文信息的自然语言指令。让我们以一个“智能旅行助手”为例来说明。
 
-在LLM智能体出现之前,规划旅行通常意味着用户需要在多个专用应用(如天气、地图、预订网站)之间手动切换,并由用户自己扮演信息整合与决策的角色。而一个LLM智能体则能将这个流程整合起来。当接收到“规划一次厦门之旅”这样的模糊指令时,它的工作方式体现了以下几点:
+在 LLM 智能体出现之前,规划旅行通常意味着用户需要在多个专用应用(如天气、地图、预订网站)之间手动切换,并由用户自己扮演信息整合与决策的角色。而一个 LLM 智能体则能将这个流程整合起来。当接收到“规划一次厦门之旅”这样的模糊指令时,它的工作方式体现了以下几点:
 
 - <strong>规划与推理</strong>:智能体首先会将这个高层级目标分解为一系列逻辑子任务,例如:`[确认出行偏好] -> [查询目的地信息] -> [制定行程草案] -> [预订票务住宿]`。这是一个内在的、由模型驱动的规划过程。
 - <strong>工具使用</strong>:在执行规划时,智能体识别到信息缺口,会主动调用外部工具来补全。例如,它会调用天气查询接口获取实时天气,并基于“预报有雨”这一信息,在后续规划中倾向于推荐室内活动。
@@ -78,7 +82,7 @@
 
 (2)<strong>基于时间与反应性的分类</strong>
 
-除了内部架构的复杂性,还可以从智能体处理决策的时间维度进行分类。这个视角关注智能体是在接收到信息后立即行动,还是会经过深思熟虑的规划再行动。这揭示了智能体设计中一个核心权衡:追求速度的<strong>反应性(Reactivity)</strong>与追求最优解的<strong>规划性(Deliberation)</strong>之间的平衡,如图1.3所示。
+除了内部架构的复杂性,还可以从智能体处理决策的时间维度进行分类。这个视角关注智能体是在接收到信息后立即行动,还是会经过深思熟虑的规划再行动。这揭示了智能体设计中一个核心权衡:追求速度的<strong>反应性(Reactivity)</strong>与追求最优解的<strong>规划性(Deliberation)</strong>之间的平衡,如图 1.3 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-3.png" alt="图片描述" width="90%"/>
@@ -101,44 +105,44 @@
 
 现实世界的复杂任务,往往既需要即时反应,也需要长远规划。例如,我们之前提到的智能旅行助手,既要能根据用户的即时反馈(如“这家酒店太贵了”)调整推荐(反应性),又要能规划出为期数天的完整旅行方案(规划性)。因此,混合式智能体应运而生,它旨在结合两者的优点,实现反应与规划的平衡。
 
-一种经典的混合架构是分层设计:底层是一个快速的反应模块,处理紧急情况和基本动作;高层则是一个审慎的规划模块,负责制定长远目标。而现代的LLM智能体,则展现了一种更灵活的混合模式。它们通常在一个“思考-行动-观察”的循环中运作,巧妙地将两种模式融为一体:
+一种经典的混合架构是分层设计:底层是一个快速的反应模块,处理紧急情况和基本动作;高层则是一个审慎的规划模块,负责制定长远目标。而现代的 LLM 智能体,则展现了一种更灵活的混合模式。它们通常在一个“思考-行动-观察”的循环中运作,巧妙地将两种模式融为一体:
 
-- <strong>规划(Reasoning)</strong> :在“思考”阶段,LLM分析当前状况,规划出下一步的合理行动。这是一个审议过程。
+- <strong>规划(Reasoning)</strong> :在“思考”阶段,LLM 分析当前状况,规划出下一步的合理行动。这是一个审议过程。
 - <strong>反应(Acting & Observing)</strong> :在“行动”和“观察”阶段,智能体与外部工具或环境交互,并立即获得反馈。这是一个反应过程。
 
 通过这种方式,智能体将一个需要长远规划的宏大任务,分解为一系列“规划-反应”的微循环。这使其既能灵活应对环境的即时变化,又能通过连贯的步骤,最终完成复杂的长期目标。
 
 <strong>(3)基于知识表示的分类</strong>
 
-这是一个更根本的分类维度,它探究智能体用以决策的知识,究竟是以何种形式存于其“思想”之中。这个问题是人工智能领域一场持续半个多世纪的辩论核心,并塑造了两种截然不同的AI文化。
+这是一个更根本的分类维度,它探究智能体用以决策的知识,究竟是以何种形式存于其“思想”之中。这个问题是人工智能领域一场持续半个多世纪的辩论核心,并塑造了两种截然不同的 AI 文化。
 
-- <strong>符号主义AI(Symbolic AI)</strong>
+- <strong>符号主义 AI(Symbolic AI)</strong>
 
-符号主义,常被称为传统人工智能,其核心信念是:智能源于对符号的逻辑操作。这里的符号是人类可读的实体(如词语、概念),操作则遵循严格的逻辑规则,如图1.4左侧所示。这好比一位一丝不苟的图书管理员,将世界知识整理为清晰的规则库和知识图谱。
+符号主义,常被称为传统人工智能,其核心信念是:智能源于对符号的逻辑操作。这里的符号是人类可读的实体(如词语、概念),操作则遵循严格的逻辑规则,如图 1.4 左侧所示。这好比一位一丝不苟的图书管理员,将世界知识整理为清晰的规则库和知识图谱。
 
 其主要优势在于透明和可解释。由于推理步骤明确,其决策过程可以被完整追溯,这在金融、医疗等高风险领域至关重要。然而,其“阿喀琉斯之踵”在于脆弱性:它依赖于一个完备的规则体系,但在充满模糊和例外的现实世界中,任何未被覆盖的新情况都可能导致系统失灵,这就是所谓的“知识获取瓶颈”。
 
-- <strong>亚符号主义AI(Sub-symbolic AI)</strong>
+- <strong>亚符号主义 AI(Sub-symbolic AI)</strong>
 
 亚符号主义,或称连接主义,则提供了一幅截然不同的图景。在这里,知识并非显式的规则,而是内隐地分布在一个由大量神经元组成的复杂网络中,是从海量数据中学习到的统计模式。神经网络和深度学习是其代表。
 
-如图1.4中间所示,如果说符号主义AI是图书管理员,那么亚符号主义AI就像一个牙牙学语的孩童 。他不是通过学习“猫有四条腿、毛茸茸、会喵喵叫”这样的规则来认识猫的,而是在看过成千上万张猫的图片后,大脑中的神经网络能辨识出“猫”这个概念的视觉模式 。这种方法的强大之处在于其模式识别能力和对噪声数据的鲁棒性 。它能够轻松处理图像、声音等非结构化数据,这在符号主义AI看来是极其困难的任务。
+如图 1.4 中间所示,如果说符号主义 AI 是图书管理员,那么亚符号主义 AI 就像一个牙牙学语的孩童 。他不是通过学习“猫有四条腿、毛茸茸、会喵喵叫”这样的规则来认识猫的,而是在看过成千上万张猫的图片后,大脑中的神经网络能辨识出“猫”这个概念的视觉模式 。这种方法的强大之处在于其模式识别能力和对噪声数据的鲁棒性 。它能够轻松处理图像、声音等非结构化数据,这在符号主义 AI 看来是极其困难的任务。
 
 然而,这种强大的直觉能力也伴随着不透明性。亚符号主义系统通常被视为一个<strong>黑箱(Black Box)</strong>。它能以惊人的准确率识别出图片中的猫,但你若问它“为什么你认为这是猫?”,它很可能无法给出一个合乎逻辑的解释。此外,它在纯粹的逻辑推理任务上表现不佳,有时会产生看似合理却事实错误的幻觉 。
 
-- <strong>神经符号主义AI(Neuro-Symbolic AI)</strong>
+- <strong>神经符号主义 AI(Neuro-Symbolic AI)</strong>
 
-长久以来,符号主义和亚符号主义这两大阵营如同两条平行线,各自发展。为克服上述两种范式的局限,一种“大和解”的思想开始兴起,这就是神经符号主义AI,也称神经符号混合主义。它的目标,是融合两大范式的优点,创造出一个既能像神经网络一样从数据中学习,又能像符号系统一样进行逻辑推理的混合智能体。它试图弥合感知与认知、直觉与理性之间的鸿沟。诺贝尔经济学奖得主丹尼尔·卡尼曼(Daniel Kahneman)在其著作《思考,快与慢》(Thinking, Fast and Slow)中提出的双系统理论,为我们理解神经符号主义提供了一个绝佳的类比<sup>[2]</sup>,如图1.4所示:
+长久以来,符号主义和亚符号主义这两大阵营如同两条平行线,各自发展。为克服上述两种范式的局限,一种“大和解”的思想开始兴起,这就是神经符号主义 AI,也称神经符号混合主义。它的目标,是融合两大范式的优点,创造出一个既能像神经网络一样从数据中学习,又能像符号系统一样进行逻辑推理的混合智能体。它试图弥合感知与认知、直觉与理性之间的鸿沟。诺贝尔经济学奖得主丹尼尔·卡尼曼(Daniel Kahneman)在其著作《思考,快与慢》(Thinking, Fast and Slow)中提出的双系统理论,为我们理解神经符号主义提供了一个绝佳的类比<sup>[2]</sup>,如图 1.4 所示:
 
-- <strong>系统1</strong>是快速、凭直觉、并行的思维模式,类似于亚符号主义AI强大的模式识别能力。
-- <strong>系统2</strong>是缓慢、有条理、基于逻辑的审慎思维,恰如符号主义AI的推理过程。
+- <strong>系统 1</strong>是快速、凭直觉、并行的思维模式,类似于亚符号主义 AI 强大的模式识别能力。
+- <strong>系统 2</strong>是缓慢、有条理、基于逻辑的审慎思维,恰如符号主义 AI 的推理过程。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-4.png" alt="图片描述" width="90%"/>
   <p>图 1.4 符号主义、亚符号主义与神经符号混合主义的知识表示范式</p>
 </div>
 
-人类的智能,正源于这两个系统的协同工作。同样,一个真正鲁棒的AI,也需要兼具二者之长。大语言模型驱动的智能体是神经符号主义的一个极佳实践范例。其内核是一个巨大的神经网络,使其具备模式识别和语言生成能力。然而,当它工作时,它会生成一系列结构化的中间步骤,如思想、计划或API调用,这些都是明确的、可操作的符号。通过这种方式,它实现了感知与认知、直觉与理性的初步融合。
+人类的智能,正源于这两个系统的协同工作。同样,一个真正鲁棒的 AI,也需要兼具二者之长。大语言模型驱动的智能体是神经符号主义的一个极佳实践范例。其内核是一个巨大的神经网络,使其具备模式识别和语言生成能力。然而,当它工作时,它会生成一系列结构化的中间步骤,如思想、计划或 API 调用,这些都是明确的、可操作的符号。通过这种方式,它实现了感知与认知、直觉与理性的初步融合。
 
 
 
@@ -146,17 +150,17 @@
 
 ### 1.2.1 任务环境定义
 
-要理解智能体的运作,我们必须先理解它所处的<strong>任务环境</strong>。在人工智能领域,通常使用<strong>PEAS模型</strong>来精确描述一个任务环境,即分析其<strong>性能度量(Performance)、环境(Environment)、执行器(Actuators)和传感器(Sensors)</strong> 。以上文提到的智能旅行助手为例,下表1.2展示了如何运用PEAS模型对其任务环境进行规约。
+要理解智能体的运作,我们必须先理解它所处的<strong>任务环境</strong>。在人工智能领域,通常使用<strong>PEAS 模型</strong>来精确描述一个任务环境,即分析其<strong>性能度量(Performance)、环境(Environment)、执行器(Actuators)和传感器(Sensors)</strong> 。以上文提到的智能旅行助手为例,下表 1.2 展示了如何运用 PEAS 模型对其任务环境进行规约。
 
 <div align="center">
-  <p>表 1.2 智能旅行助手的PEAS描述</p>
+  <p>表 1.2 智能旅行助手的 PEAS 描述</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-6.png" alt="图片描述" width="90%"/>
 </div>
 
 
-在实践中,LLM智能体所处的数字环境展现出若干复杂特性,这些特性直接影响着智能体的设计。
+在实践中,LLM 智能体所处的数字环境展现出若干复杂特性,这些特性直接影响着智能体的设计。
 
-首先,环境通常是<strong>部分可观察的</strong>。例如,旅行助手在查询航班时,无法一次性获取所有航空公司的全部实时座位信息。它只能通过调用航班预订API,看到该API返回的部分数据,这就要求智能体必须具备记忆(记住已查询过的航线)和探索(尝试不同的查询日期)的能力。
+首先,环境通常是<strong>部分可观察的</strong>。例如,旅行助手在查询航班时,无法一次性获取所有航空公司的全部实时座位信息。它只能通过调用航班预订 API,看到该 API 返回的部分数据,这就要求智能体必须具备记忆(记住已查询过的航线)和探索(尝试不同的查询日期)的能力。
 
 其次,行动的结果也并非总是确定的。根据结果的可预测性,环境可分为<strong>确定性</strong>和<strong>随机性</strong>。旅行助手的任务环境就是典型的随机性环境。当它搜索票价时,两次相邻的调用返回的机票价格和余票数量都可能不同,这就要求智能体必须具备处理不确定性、监控变化并及时决策的能力。
 
@@ -166,7 +170,7 @@
 
 ### 1.2.2 智能体的运行机制
 
-在定义了智能体所处的任务环境后,我们来探讨其核心的运行机制。智能体并非一次性完成任务,而是通过一个持续的循环与环境进行交互,这个核心机制被称为 <strong>智能体循环 (Agent Loop)</strong>。如图1.5所示,该循环描述了智能体与环境之间的动态交互过程,构成了其自主行为的基础。
+在定义了智能体所处的任务环境后,我们来探讨其核心的运行机制。智能体并非一次性完成任务,而是通过一个持续的循环与环境进行交互,这个核心机制被称为 <strong>智能体循环 (Agent Loop)</strong>。如图 1.5 所示,该循环描述了智能体与环境之间的动态交互过程,构成了其自主行为的基础。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-5.png" alt="图片描述" width="90%"/>
@@ -175,17 +179,17 @@
 
 这个循环主要包含以下几个相互关联的阶段:
 
-1. <strong>感知 (Perception)</strong>:这是循环的起点。智能体通过其传感器(例如,API的监听端口、用户输入接口)接收来自环境的输入信息。这些信息,即<strong>观察 (Observation)</strong>,既可以是用户的初始指令,也可以是上一步行动所导致的环境状态变化反馈。
-2. <strong>思考 (Thought)</strong>:接收到观察信息后,智能体进入其核心决策阶段。对于LLM智能体而言,这通常是由大语言模型驱动的内部推理过程。如图所示,“思考”阶段可进一步细分为两个关键环节:
+1. <strong>感知 (Perception)</strong>:这是循环的起点。智能体通过其传感器(例如,API 的监听端口、用户输入接口)接收来自环境的输入信息。这些信息,即<strong>观察 (Observation)</strong>,既可以是用户的初始指令,也可以是上一步行动所导致的环境状态变化反馈。
+2. <strong>思考 (Thought)</strong>:接收到观察信息后,智能体进入其核心决策阶段。对于 LLM 智能体而言,这通常是由大语言模型驱动的内部推理过程。如图所示,“思考”阶段可进一步细分为两个关键环节:
    - <strong>规划 (Planning)</strong>:智能体基于当前的观察和其内部记忆,更新对任务和环境的理解,并制定或调整一个行动计划。这可能涉及将复杂目标分解为一系列更具体的子任务。
    - <strong>工具选择 (Tool Selection)</strong>:根据当前计划,智能体从其可用的工具库中,选择最适合执行下一步骤的工具,并确定调用该工具所需的具体参数。
-3. <strong>行动 (Action)</strong>:决策完成后,智能体通过其执行器(Actuators)执行具体的行动。这通常表现为调用一个选定的工具(如代码解释器、搜索引擎API),从而对环境施加影响,意图改变环境的状态。
+3. <strong>行动 (Action)</strong>:决策完成后,智能体通过其执行器(Actuators)执行具体的行动。这通常表现为调用一个选定的工具(如代码解释器、搜索引擎 API),从而对环境施加影响,意图改变环境的状态。
 
 行动并非循环的终点。智能体的行动会引起<strong>环境 (Environment)</strong> 的<strong>状态变化 (State Change)</strong>,环境随即会产生一个新的<strong>观察 (Observation)</strong> 作为结果反馈。这个新的观察又会在下一轮循环中被智能体的感知系统捕获,形成一个持续的“感知-思考-行动-观察”的闭环。智能体正是通过不断重复这一循环,逐步推进任务,从初始状态向目标状态演进。
 
 ### 1.2.3 智能体的感知与行动
 
-在工程实践中,为了让LLM能够有效驱动这个循环,我们需要一套明确的<strong>交互协议 (Interaction Protocol)</strong> 来规范其与环境之间的信息交换。
+在工程实践中,为了让 LLM 能够有效驱动这个循环,我们需要一套明确的<strong>交互协议 (Interaction Protocol)</strong> 来规范其与环境之间的信息交换。
 
 在许多现代智能体框架中,这一协议体现在对智能体每一次输出的结构化定义上。智能体的输出不再是单一的自然语言回复,而是一段遵循特定格式的文本,其中明确地展示了其内部的推理过程与最终决策。
 
@@ -203,7 +207,7 @@ Action: get_weather("北京")
 
 这里的`Action`字段构成了对外部世界的指令。一个外部的<strong>解析器 (Parser)</strong> 会捕捉到这个指令,并调用相应的`get_weather`函数。
 
-行动执行后,环境会返回一个结果。例如,`get_weather`函数可能返回一个包含详细天气数据的JSON对象。然而,原始的机器可读数据(如JSON)通常包含LLM无需关注的冗余信息,且格式不符合其自然语言处理的习惯。
+行动执行后,环境会返回一个结果。例如,`get_weather`函数可能返回一个包含详细天气数据的 JSON 对象。然而,原始的机器可读数据(如 JSON)通常包含 LLM 无需关注的冗余信息,且格式不符合其自然语言处理的习惯。
 
 因此,感知系统的一个重要职责就是扮演传感器的角色:将这个原始输出处理并封装成一段简洁、清晰的自然语言文本,即观察。
 
@@ -213,17 +217,17 @@ Observation: 北京当前天气为晴,气温25摄氏度,微风。
 
 这段`Observation`文本会被反馈给智能体,作为下一轮循环的主要输入信息,供其进行新一轮的`Thought`和`Action`。
 
-综上所述,通过这个由Thought、Action、Observation构成的严谨循环,LLM智能体得以将内部的语言推理能力,与外部环境的真实信息和工具操作能力有效地结合起来。
+综上所述,通过这个由 Thought、Action、Observation 构成的严谨循环,LLM 智能体得以将内部的语言推理能力,与外部环境的真实信息和工具操作能力有效地结合起来。
 
 ## 1.3 动手体验:5 分钟实现第一个智能体
 
-在前面的小节,我们学习了智能体的任务环境、核心运行机制以及 `Thought-Action-Observation` 交互范式。理论知识固然重要,但最好的学习方式是亲手实践。在本节中,我们将引导您使用几行简单的Python代码,从零开始构建一个可以工作的智能旅行助手。这个过程将遵循我们刚刚学到的理论循环,让您直观地感受到一个智能体是如何“思考”并与外部“工具”互动的。让我们开始吧!
+在前面的小节,我们学习了智能体的任务环境、核心运行机制以及 `Thought-Action-Observation` 交互范式。理论知识固然重要,但最好的学习方式是亲手实践。在本节中,我们将引导您使用几行简单的 Python 代码,从零开始构建一个可以工作的智能旅行助手。这个过程将遵循我们刚刚学到的理论循环,让您直观地感受到一个智能体是如何“思考”并与外部“工具”互动的。让我们开始吧!
 
 在本案例中,我们的目标是构建一个能处理分步任务的智能旅行助手。需要解决的用户任务定义为:"你好,请帮我查询一下今天北京的天气,然后根据天气推荐一个合适的旅游景点。"要完成这个任务,智能体必须展现出清晰的逻辑规划能力。它需要先调用天气查询工具,并将获得的观察结果作为下一步的依据。在下一轮循环中,它再调用景点推荐工具,从而得出最终建议。
 
 ### 1.3.1 准备工作
 
-为了能从Python程序中访问网络API,我们需要一个HTTP库。`requests`是Python社区中最流行、最易用的选择。`tavily-python`是一个强大的AI搜索API客户端,用于获取实时的网络搜索结果,可以在[官网](https://www.tavily.com/)注册后获取API。`openai`是OpenAI官方提供的Python SDK,用于调用GPT等大语言模型服务。请先通过以下命令安装它们::
+为了能从 Python 程序中访问网络 API,我们需要一个 HTTP 库。`requests`是 Python 社区中最流行、最易用的选择。`tavily-python`是一个强大的 AI 搜索 API 客户端,用于获取实时的网络搜索结果,可以在[官网](https://www.tavily.com/)注册后获取 API 密钥。`openai`是 OpenAI 官方提供的 Python SDK,用于调用 GPT 等大语言模型服务。请先通过以下命令安装它们:
 
 ```bash
 pip install requests tavily-python openai
@@ -231,7 +235,7 @@ pip install requests tavily-python openai
 
 (1)指令模板
 
-驱动真实LLM的关键在于<strong>提示工程(Prompt Engineering)</strong>。我们需要设计一个“指令模板”,告诉LLM它应该扮演什么角色、拥有哪些工具、以及如何格式化它的思考和行动。这是我们智能体的“说明书”,它将作为`system_prompt`传递给LLM。
+驱动真实 LLM 的关键在于<strong>提示工程(Prompt Engineering)</strong>。我们需要设计一个“指令模板”,告诉 LLM 它应该扮演什么角色、拥有哪些工具、以及如何格式化它的思考和行动。这是我们智能体的“说明书”,它将作为`system_prompt`传递给 LLM。
 
 ```
 AGENT_SYSTEM_PROMPT = """
@@ -253,9 +257,9 @@ Action: [这里是你要调用的工具,格式为 function_name(arg_name="arg_
 """
 ```
 
-(2)工具1:查询真实天气
+(2)工具 1:查询真实天气
 
-我们将使用免费的天气查询服务 `wttr.in`,它能以JSON格式返回指定城市的天气数据。下面是实现该工具的代码:
+我们将使用免费的天气查询服务 `wttr.in`,它能以 JSON 格式返回指定城市的天气数据。下面是实现该工具的代码:
 
 ```python
 import requests
@@ -282,17 +286,17 @@ def get_weather(city: str) -> str:
         temp_c = current_condition['temp_C']
         
         # 格式化成自然语言返回
-        return f"{city}当前天气{weather_desc},气温{temp_c}摄氏度"
+        return f"{city}当前天气:{weather_desc},气温{temp_c}摄氏度"
         
     except requests.exceptions.RequestException as e:
         # 处理网络错误
-        return f"错误查询天气时遇到网络问题 - {e}"
+        return f"错误:查询天气时遇到网络问题 - {e}"
     except (KeyError, IndexError) as e:
         # 处理数据解析错误
-        return f"错误解析天气数据失败,可能是城市名称无效 - {e}"
+        return f"错误:解析天气数据失败,可能是城市名称无效 - {e}"
 ```
 
-(3)工具2:搜索并推荐旅游景点
+(3)工具 2:搜索并推荐旅游景点
 
 我们将定义一个新工具 `get_attraction`,它会根据城市和天气状况,在互联网上搜索合适的景点:
 
@@ -307,7 +311,7 @@ def get_attraction(city: str, weather: str) -> str:
     # 1. 从环境变量中读取API密钥
     api_key = os.environ.get("TAVILY_API_KEY")
     if not api_key:
-        return "错误未配置TAVILY_API_KEY环境变量。"
+        return "错误:未配置TAVILY_API_KEY环境变量。"
 
     # 2. 初始化Tavily客户端
     tavily = TavilyClient(api_key=api_key)
@@ -332,10 +336,10 @@ def get_attraction(city: str, weather: str) -> str:
         if not formatted_results:
              return "抱歉,没有找到相关的旅游景点推荐。"
 
-        return "根据搜索,为您找到以下信息\n" + "\n".join(formatted_results)
+        return "根据搜索,为您找到以下信息:\n" + "\n".join(formatted_results)
 
     except Exception as e:
-        return f"错误执行Tavily搜索时出现问题 - {e}"
+        return f"错误:执行Tavily搜索时出现问题 - {e}"
 ```
 
 最后,我们将所有工具函数放入一个字典,供主循环调用:
@@ -352,7 +356,7 @@ available_tools = {
 
 ### 1.3.2 接入大语言模型
 
-当前,许多LLM服务提供商(包括OpenAI、Azure、以及众多开源模型服务框架如Ollama、vLLM等)都遵循了与OpenAI API相似的接口规范。这种标准化为开发者带来了极大的便利。智能体的自主决策能力来源于LLM。我们将实现一个通用的客户端 `OpenAICompatibleClient`,它可以连接到任何兼容OpenAI接口规范的LLM服务。
+当前,许多 LLM 服务提供商(包括 OpenAI、Azure、以及众多开源模型服务框架如 Ollama、vLLM 等)都遵循了与 OpenAI API 相似的接口规范。这种标准化为开发者带来了极大的便利。智能体的自主决策能力来源于 LLM。我们将实现一个通用的客户端 `OpenAICompatibleClient`,它可以连接到任何兼容 OpenAI 接口规范的 LLM 服务。
 
 ```python
 from openai import OpenAI
@@ -383,14 +387,14 @@ class OpenAICompatibleClient:
             return answer
         except Exception as e:
             print(f"调用LLM API时发生错误: {e}")
-            return "错误调用语言模型服务时出错。"
+            return "错误:调用语言模型服务时出错。"
 ```
 
-要实例化此类,您需要提供三个信息:`API_KEY`、`BASE_URL` 和 `MODEL_ID`,具体值取决于您使用的服务商(如OpenAI官方、Azure、或Ollama等本地模型),如果暂时没有渠道获取,可以参考Datawhale另一本教程的[1.2 API设置](https://datawhalechina.github.io/handy-multi-agent/#/chapter1/1.2.api-setup)。
+要实例化此类,您需要提供三个信息:`API_KEY`、`BASE_URL` 和 `MODEL_ID`,具体值取决于您使用的服务商(如 OpenAI 官方、Azure、或 Ollama 等本地模型),如果暂时没有渠道获取,可以参考 Datawhale 另一本教程的[1.2 API 设置](https://datawhalechina.github.io/handy-multi-agent/#/chapter1/1.2.api-setup)。
 
 ### 1.3.3 执行行动循环
 
-下面的主循环将整合所有组件,并通过格式化后的Prompt驱动LLM进行决策。
+下面的主循环将整合所有组件,并通过格式化后的 Prompt 驱动 LLM 进行决策。
 
 ```python
 import re
@@ -430,7 +434,7 @@ for i in range(5): # 设置最大循环次数
     # 3.3. 解析并执行行动
     action_match = re.search(r"Action: (.*)", llm_output, re.DOTALL)
     if not action_match:
-        print("解析错误模型输出中未找到 Action。")
+        print("解析错误:模型输出中未找到 Action。")
         break
     action_str = action_match.group(1).strip()
 
@@ -446,7 +450,7 @@ for i in range(5): # 设置最大循环次数
     if tool_name in available_tools:
         observation = available_tools[tool_name](**kwargs)
     else:
-        observation = f"错误未定义的工具 '{tool_name}'"
+        observation = f"错误:未定义的工具 '{tool_name}'"
 
     # 3.4. 记录观察结果
     observation_str = f"Observation: {observation}"
@@ -454,7 +458,7 @@ for i in range(5): # 设置最大循环次数
     prompt_history.append(observation_str)
 ```
 
-通过以上步骤,我们构建了一个完整的、由真实LLM驱动的智能体。其核心在于“工具”和“提示工程”的结合,这正是当前主流智能体框架(如LangChain、LlamaIndex等)的设计精髓。
+通过以上步骤,我们构建了一个完整的、由真实 LLM 驱动的智能体。其核心在于“工具”和“提示工程”的结合,这正是当前主流智能体框架(如 LangChain、LlamaIndex 等)的设计精髓。
 
 ### 1.3.4 运行案例分析
 
@@ -471,7 +475,7 @@ for i in range(5): # 设置最大循环次数
 Thought: 首先需要获取北京今天的天气情况,之后再根据天气情况来推荐旅游景点。
 Action: get_weather(city="北京")
 
-Observation: 北京当前天气Sunny,气温26摄氏度
+Observation: 北京当前天气:Sunny,气温26摄氏度
 ========================================      
 --- 循环 2 ---
 
@@ -505,18 +509,18 @@ Action: finish(answer="今天北京的天气是晴朗的,气温26摄氏度,
 
 在这种模式下,智能体被深度集成到开发者的工作流中,作为一种强大的辅助工具。它增强而非取代开发者的角色,通过自动化处理繁琐、重复的任务,让开发者能更专注于创造性的核心工作。这种人机协同的方式,极大地提升了软件开发的效率与质量。
 
-目前,市场上涌现了多款优秀的AI编程辅助工具,它们虽然均能提升开发效率,但在实现路径和功能侧重上各有千秋:
+目前,市场上涌现了多款优秀的 AI 编程辅助工具,它们虽然均能提升开发效率,但在实现路径和功能侧重上各有千秋:
 
-- <strong>GitHubCopilot</strong>: 作为该领域最具影响力的产品之一,Copilot 由 GitHub 与 OpenAI 联合开发。它深度集成于 Visual Studio Code等主流编辑器中,以其强大的代码自动补全能力而闻名。开发者在编写代码时,Copilot 能实时提供整行甚至整个函数块的建议。近年来,它也通过 Copilot Chat 扩展了对话式编程的能力,允许开发者在编辑器内通过聊天解决编程问题。
+- <strong>GitHub Copilot</strong>: 作为该领域最具影响力的产品之一,Copilot 由 GitHub 与 OpenAI 联合开发。它深度集成于 Visual Studio Code 等主流编辑器中,以其强大的代码自动补全能力而闻名。开发者在编写代码时,Copilot 能实时提供整行甚至整个函数块的建议。近年来,它也通过 Copilot Chat 扩展了对话式编程的能力,允许开发者在编辑器内通过聊天解决编程问题。
 - <strong>Claude Code</strong>: Claude Code 是由 Anthropic 开发的 AI 编程助手,旨在通过自然语言指令帮助开发者在终端中高效地完成编码任务。它能够理解完整的代码库结构,执行代码编辑、测试和调试等操作,支持从描述功能到代码实现的全流程开发。Claude Code 还提供了无交互(headless)模式,适用于 CI、pre-commit hooks、构建脚本和其他自动化场景,为开发者提供了强大的命令行编程体验。
 - <strong>Trae</strong>: 作为新兴的 AI 编程工具,Trae 专注于为开发者提供智能化的代码生成和优化服务。它通过深度学习技术分析代码模式,能够为开发者提供精准的代码建议和自动化重构方案。Trae 的特色在于其轻量级的设计和快速响应能力,特别适合需要频繁迭代和快速原型开发的场景。
-- <strong>Cursor</strong>: 与上述主要作为插件或集成功能存在的工具不同,Cursor 则选择了一条更具整合性的路径,它本身就是一个AI原生的代码编辑器。它并非在现有编辑器上增加AI功能,而是在设计之初就将AI交互作为核心。除了具备顶级的代码生成和聊天能力外,它更强调让AI理解整个代码库的上下文,从而实现更深层次的问答、重构和调试。
+- <strong>Cursor</strong>: 与上述主要作为插件或集成功能存在的工具不同,Cursor 则选择了一条更具整合性的路径,它本身就是一个 AI 原生的代码编辑器。它并非在现有编辑器上增加 AI 功能,而是在设计之初就将 AI 交互作为核心。除了具备顶级的代码生成和聊天能力外,它更强调让 AI 理解整个代码库的上下文,从而实现更深层次的问答、重构和调试。
 
 当然还有许多优秀的工具没有列举,不过它们共同指向了一个明确的趋势:AI 正在深度融入软件开发的全生命周期,通过构建高效的人机协同工作流,深刻地重塑着软件工程的效率边界与开发范式。
 
 ### 1.4.2 作为自主协作者的智能体
 
-与作为工具辅助人类不同,第二种交互模式将智能体的自动化程度提升到了一个全新的层次,自主协作者。在这种模式下,我们不再是手把手地指导AI完成每一步,而是将一个高层级的目标委托给它。智能体会像一个真正的项目成员一样,独立地进行规划、推理、执行和反思,直到最终交付成果。这种从助手到协作者的转变,使得LLM智能体更深的进入了大众的视野。它标志着我们与AI的关系从“命令-执行”演变为“目标-委托”。智能体不再是被动的工具,而是主动的目标追求者。
+与作为工具辅助人类不同,第二种交互模式将智能体的自动化程度提升到了一个全新的层次:自主协作者。在这种模式下,我们不再是手把手地指导 AI 完成每一步,而是将一个高层级的目标委托给它。智能体会像一个真正的项目成员一样,独立地进行规划、推理、执行和反思,直到最终交付成果。这种从助手到协作者的转变,使得 LLM 智能体更深地进入了大众的视野。它标志着我们与 AI 的关系从“命令-执行”演变为“目标-委托”。智能体不再是被动的工具,而是主动的目标追求者。
 
 当前,实现这种自主协作的思路百花齐放,涌现了大量优秀的框架和产品,从早期的 BabyAGI、AutoGPT,到如今更为成熟的 CrewAI、AutoGen、MetaGPT、LangGraph 等优秀框架,共同推动着这一领域的高速发展。虽然具体实现千差万别,但它们的架构范式大致可以归纳为几个主流方向:
 
@@ -526,26 +530,26 @@ Action: finish(answer="今天北京的天气是晴朗的,气温26摄氏度,
 
 这些不同的架构范式,共同推动着自主智能体从理论构想走向更广泛的实际应用,使其有能力应对日益复杂的真实世界任务。在我们的后续章节中,也会感受不同类型框架之间的差异和优势。
 
-### 1.4.3 Workflow和Agent的差异
+### 1.4.3 Workflow 和 Agent 的差异
 
-在理解了智能体作为“工具”和“协作者”两种模式后,我们有必要对Workflow和Agent的差异展开讨论,尽管它们都旨在实现任务自动化,但其底层逻辑、核心特征和适用场景却截然不同。
+在理解了智能体作为“工具”和“协作者”两种模式后,我们有必要对 Workflow 和 Agent 的差异展开讨论,尽管它们都旨在实现任务自动化,但其底层逻辑、核心特征和适用场景却截然不同。
 
 简单来说,<strong>Workflow 是让 AI 按部就班地执行指令,而 Agent 则是赋予 AI 自由度去自主达成目标。</strong>
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/1-figures/1757242319667-18.png" alt="图片描述" width="90%"/>
-  <p>图 1.6 Workflow和Agent的差异</p>
+  <p>图 1.6 Workflow 和 Agent 的差异</p>
 </div>
 
-如图1.6所示,工作流是一种传统的自动化范式,其核心是<strong>对一系列任务或步骤进行预先定义的、结构化的编排</strong>。它本质上是一个精确的、静态的流程图,规定了在何种条件下、以何种顺序执行哪些操作。一个典型的案例:某企业的费用报销审批流程。员工提交报销单(触发)-> 如果金额小于500元,直接由部门经理审批 -> 如果金额大于500元,先由部门经理审批,再流转至财务总监审批 -> 审批通过后,通知财务部打款。整个过程的每一步、每一个判断条件都被精确地预先设定。
+如图 1.6 所示,工作流是一种传统的自动化范式,其核心是<strong>对一系列任务或步骤进行预先定义的、结构化的编排</strong>。它本质上是一个精确的、静态的流程图,规定了在何种条件下、以何种顺序执行哪些操作。一个典型的案例:某企业的费用报销审批流程。员工提交报销单(触发)-> 如果金额小于 500 元,直接由部门经理审批 -> 如果金额大于 500 元,先由部门经理审批,再流转至财务总监审批 -> 审批通过后,通知财务部打款。整个过程的每一步、每一个判断条件都被精确地预先设定。
 
-与工作流不同,基于大型语言模型的智能体是一个<strong>具备自主性的、以目标为导向的系统</strong>。它不仅仅是执行预设指令,而是能够在一定程度上理解环境、进行推理、制定计划,并动态地采取行动以达成最终目标。LLM在其中扮演着“大脑”的角色。一个典型的例子,便是我们在1.3节中写的智能旅行助手。当我们向它下达一个新指令,例如:<strong>“你好,请帮我查询一下今天北京的天气,然后根据天气推荐一个合适的旅游景点。”</strong> 它的处理过程充分展现了其自主性:
+与工作流不同,基于大型语言模型的智能体是一个<strong>具备自主性的、以目标为导向的系统</strong>。它不仅仅是执行预设指令,而是能够在一定程度上理解环境、进行推理、制定计划,并动态地采取行动以达成最终目标。LLM 在其中扮演着“大脑”的角色。一个典型的例子,便是我们在 1.3 节中写的智能旅行助手。当我们向它下达一个新指令,例如:<strong>“你好,请帮我查询一下今天北京的天气,然后根据天气推荐一个合适的旅游景点。”</strong> 它的处理过程充分展现了其自主性:
 
-1. <strong>规划与工具调用:</strong> Agent首先会把任务拆解为两个步骤:① 查询天气;② 基于天气推荐景点。随即,它会自主选择并调用“天气查询API”,并将“北京”作为参数传入。
-2. <strong>推理与决策:</strong> 假设API返回结果为“晴朗,微风”。Agent的LLM大脑会基于这个信息进行推理:“晴天适合户外活动”。接着,它会根据这个判断,在它的知识库或通过搜索引擎这个工具中,筛选出北京的户外景点,如故宫、颐和园、天坛公园等。
-3. <strong>生成结果:</strong> 最后,Agent会综合信息,给出一个完整的、人性化的回答:“今天北京天气晴朗,微风,非常适合户外游玩。为您推荐前往【颐和园】,您可以在昆明湖上泛舟,欣赏美丽的皇家园林景色。”
+1. <strong>规划与工具调用:</strong> Agent 首先会把任务拆解为两个步骤:① 查询天气;② 基于天气推荐景点。随即,它会自主选择并调用“天气查询 API”,并将“北京”作为参数传入。
+2. <strong>推理与决策:</strong> 假设 API 返回结果为“晴朗,微风”。Agent 的 LLM 大脑会基于这个信息进行推理:“晴天适合户外活动”。接着,它会根据这个判断,在它的知识库或通过搜索引擎这个工具中,筛选出北京的户外景点,如故宫、颐和园、天坛公园等。
+3. <strong>生成结果:</strong> 最后,Agent 会综合信息,给出一个完整的、人性化的回答:“今天北京天气晴朗,微风,非常适合户外游玩。为您推荐前往【颐和园】,您可以在昆明湖上泛舟,欣赏美丽的皇家园林景色。”
 
-在这个过程中,没有任何写死的`if天气=晴天 then 推荐颐和园`的规则。如果天气是“雨天”,Agent会自主推理并推荐国家博物馆、首都博物馆等室内场所。<strong>这种基于实时信息进行动态推理和决策的能力,正是Agent的核心价值所在。</strong>
+在这个过程中,没有任何写死的`if天气=晴天 then 推荐颐和园`的规则。如果天气是“雨天”,Agent 会自主推理并推荐国家博物馆、首都博物馆等室内场所。<strong>这种基于实时信息进行动态推理和决策的能力,正是 Agent 的核心价值所在。</strong>
 
 
 
@@ -555,8 +559,8 @@ Action: finish(answer="今天北京的天气是晴朗的,气温26摄氏度,
 
 - <strong>什么是大语言模型驱动的智能体?</strong> 我们首先明确了其定义,理解了现代智能体是具备了能力的实体。它不再仅仅是执行预设程序的脚本,而是能够自主推理和使用工具的决策者。
 - <strong>智能体如何工作?</strong> 我们深入探讨了智能体与环境交互的运行机制。我们了解到,这个持续的闭环是智能体处理信息、做出决策、影响环境并根据反馈调整自身行为的基础。
-- <strong>如何构建智能体?</strong> 这是本章的实践核心。我们以“智能旅行助手”为例,亲手构建了一个完整的、由真实LLM驱动的智能体。
-- <strong>智能体有哪些主流的应用范式?</strong> 最后,我们将视野投向了更广阔的应用领域。我们探讨了两种主流的智能体交互模式:一是以GitHub Copilot和Cursor等为代表的、增强人类工作流的“开发者工具”;二是以CrewAI、MetaGPT和AgentScope等框架为代表的、能够独立完成高层级目标的“自主协作者”。同时讲解了Workflow与Agent的差异。
+- <strong>如何构建智能体?</strong> 这是本章的实践核心。我们以“智能旅行助手”为例,亲手构建了一个完整的、由真实 LLM 驱动的智能体。
+- <strong>智能体有哪些主流的应用范式?</strong> 最后,我们将视野投向了更广阔的应用领域。我们探讨了两种主流的智能体交互模式:一是以 GitHub Copilot 和 Cursor 等为代表的、增强人类工作流的“开发者工具”;二是以 CrewAI、MetaGPT 和 AgentScope 等框架为代表的、能够独立完成高层级目标的“自主协作者”。同时讲解了 Workflow 与 Agent 的差异。
 
 通过本章的学习,我们建立了一个关于智能体的基础认知框架。那么,它是如何一步步从最初的构想演进至今的呢?在下一章中,我们将探索智能体的发展历史,一段追本溯源的旅程即将开始!
 
@@ -568,13 +572,13 @@ Action: finish(answer="今天北京的天气是晴朗的,气温26摄氏度,
 
 1. 请分析以下四个 `case` 中的<strong>主体</strong>是否属于智能体,如果是,那么属于哪种类型的智能体(可以从多个分类维度进行分析),并说明理由:
 
-   `case A`:<strong>一台符合冯·诺依曼结构的超级计算机</strong>,拥有高达每秒2EFlop的峰值算力
+   `case A`:<strong>一台符合冯·诺依曼结构的超级计算机</strong>,拥有高达每秒 2EFlop 的峰值算力
 
    `case B`:<strong>特斯拉自动驾驶系统</strong>在高速公路上行驶时,突然检测到前方有障碍物,需要在毫秒级做出刹车或变道决策
 
    `case C`:<strong>AlphaGo</strong>在与人类棋手对弈时,需要评估当前局面并规划未来数十步的最优策略
 
-   `case D`:<strong>ChatGPT扮演的智能客服</strong>在处理用户投诉时,需要查询订单信息、分析问题原因、提供解决方案并安抚用户情绪
+   `case D`:<strong>ChatGPT 扮演的智能客服</strong>在处理用户投诉时,需要查询订单信息、分析问题原因、提供解决方案并安抚用户情绪
 
 2. 假设你需要为一个"智能健身教练"设计任务环境。这个智能体能够:
    - 通过可穿戴设备监测用户的心率、运动强度等生理数据
@@ -582,42 +586,42 @@ Action: finish(answer="今天北京的天气是晴朗的,气温26摄氏度,
    - 在用户运动过程中提供实时语音指导和动作纠正
    - 评估训练效果并给出饮食建议
 
-   请使用PEAS模型完整描述这个智能体的任务环境,并分析该环境具有哪些特性(如部分可观察、随机性、动态性等)。
+   请使用 PEAS 模型完整描述这个智能体的任务环境,并分析该环境具有哪些特性(如部分可观察、随机性、动态性等)。
 
 3. 某电商公司正在考虑两种方案来处理售后退款申请:
    
-   方案A(`Workflow`):设计一套固定流程,例如:
+   方案 A(`Workflow`):设计一套固定流程,例如:
 
-   A.1 对于一般商品且在7天之内,金额 `< 100RMB` 自动通过;`100-500RMB `由客服审核;`>500RMB` 需主管审批;而特殊商品(如定制品)一律拒绝退款
+   A.1 对于一般商品且在 7 天之内,金额 `< 100RMB` 自动通过;`100-500RMB` 由客服审核;`>500RMB` 需主管审批;而特殊商品(如定制品)一律拒绝退款
    
-   A.2 对于超过7天的商品,无论金额,只能由客服审核或主管审批;
+   A.2 对于超过 7 天的商品,无论金额,只能由客服审核或主管审批;
    
-   方案B(`Agent`):搭建一个智能体系统,让它理解退款政策、分析用户历史行为、评估商品状况,并自主决策是否批准退款
+   方案 B(`Agent`):搭建一个智能体系统,让它理解退款政策、分析用户历史行为、评估商品状况,并自主决策是否批准退款
    
    请分析:
    - 这两种方案各自的优缺点是什么?
    - 在什么情况下 `Workflow` 更合适?什么情况下 `Agent` 更有优势?如果你是该电商公司的负责人,你更倾向于采用哪种方案?
-   - 是否存在一个方案C,能够结合两种方案,达到扬长避短的效果?
+   - 是否存在一个方案 C,能够结合两种方案,达到扬长避短的效果?
    
-4. 在1.3节的智能旅行助手基础上,请思考如何添加以下功能(可以只描述设计思路,也可以进一步尝试代码实现):
+4. 在 1.3 节的智能旅行助手基础上,请思考如何添加以下功能(可以只描述设计思路,也可以进一步尝试代码实现):
 
    > <strong>提示</strong>:思考如何修改 `Thought-Action-Observation` 循环来实现这些功能。
 
    - 添加一个"记忆"功能,让智能体记住用户的偏好(如喜欢历史文化景点、预算范围等)
    - 当推荐的景点门票已售罄时,智能体能够自动推荐备选方案
-   - 如果用户连续拒绝了3个推荐,智能体能够反思并调整推荐策略
+   - 如果用户连续拒绝了 3 个推荐,智能体能够反思并调整推荐策略
 
-5. 卡尼曼的"系统1"(快速直觉)和"系统2"(慢速推理)理论<sup>[2]</sup>为神经符号主义AI提供了很好的类比。请首先构思一个具体的智能体的落地应用场景,然后说明场景中的:
+5. 卡尼曼的"系统 1"(快速直觉)和"系统 2"(慢速推理)理论<sup>[2]</sup>为神经符号主义 AI 提供了很好的类比。请首先构思一个具体的智能体的落地应用场景,然后说明场景中的:
 
    > <strong>提示</strong>:医疗诊断助手、法律咨询机器人、金融风控系统等都是常见的应用场景
 
-   - 哪些任务应该由"系统1"处理?
-   - 哪些任务应该由"系统2"处理?
+   - 哪些任务应该由"系统 1"处理?
+   - 哪些任务应该由"系统 2"处理?
    - 这两个系统如何协同工作以达成最终目标?
 
 6. 尽管大语言模型驱动的智能体系统展现出了强大的能力,但它们仍然存在诸多局限。请分析以下问题:
    - 为什么智能体或智能体系统有时会产生"幻觉"(生成看似合理但实际错误的信息)?
-   - 在1.3节的案例中,我们设置了最大循环次数为5次。如果没有这个限制,智能体可能会陷入什么问题?
+   - 在 1.3 节的案例中,我们设置了最大循环次数为 5 次。如果没有这个限制,智能体可能会陷入什么问题?
    - 如何评估一个智能体的"智能"程度?仅使用准确率指标是否足够?
 
 

+ 2444 - 0
docs/chapter10/Chapter10-Agent-Communication-Protocols.md

@@ -0,0 +1,2444 @@
+<div align="right">
+  English | <a href="./第十章%20智能体通信协议.md">中文</a>
+</div>
+
+# Chapter 10: Agent Communication Protocols
+
+In previous chapters, we built fully functional standalone agents with reasoning, tool invocation, and memory capabilities. However, when attempting to build more complex AI systems, natural questions arise: **How can agents efficiently interact with the external world? How can multiple agents collaborate with each other?**
+
+This is precisely the core problem that agent communication protocols aim to solve. This chapter will introduce three communication protocols to the HelloAgents framework: **MCP (Model Context Protocol)** for standardized communication between agents and tools, **A2A (Agent-to-Agent Protocol)** for peer-to-peer collaboration between agents, and **ANP (Agent Network Protocol)** for building large-scale agent networks. These three protocols together form the infrastructure layer for agent communication.
+
+By the end of this chapter, you will understand the design philosophy behind agent communication protocols, know how the three mainstream protocols differ, and be able to choose the appropriate one for a given problem.
+
+## 10.1 Agent Communication Protocol Fundamentals
+
+### 10.1.1 Why Communication Protocols Are Needed
+
+Recall the ReAct agent we built in Chapter 7, which already possesses powerful reasoning and tool invocation capabilities. Let's look at a typical usage scenario:
+
+```python
+from hello_agents import ReActAgent, HelloAgentsLLM
+from hello_agents.tools import CalculatorTool, SearchTool
+
+llm = HelloAgentsLLM()
+agent = ReActAgent(name="AI Assistant", llm=llm)
+agent.add_tool(CalculatorTool())
+agent.add_tool(SearchTool())
+
+# Agent can complete tasks independently
+response = agent.run("Search for the latest AI news and calculate the total market value of related companies")
+```
+
+This agent works well, but it faces three fundamental limitations. First is the **tool integration dilemma**: whenever we need to access a new external service (such as the GitHub API, a database, or the file system), we must write a specialized Tool class. This is labor-intensive, and tools written by different developers are not compatible with each other. Second is the **capability expansion bottleneck**: the agent's capabilities are limited to its predefined tool set, and it cannot dynamically discover and use new services. Third is the **lack of collaboration**: when a task is complex enough to require several specialized agents (such as a researcher, a writer, and an editor), we can only coordinate them through manual orchestration.
+
+Let's make these limitations concrete. Suppose you want to build an intelligent research assistant that needs to access GitHub, a database, and a weather service:
+
+```python
+# Traditional approach: Manually integrate each service
+class GitHubTool(BaseTool):
+    """Need to manually write GitHub API adapter"""
+    def run(self, repo_url):
+        # Lots of API calling code...
+        pass
+
+class DatabaseTool(BaseTool):
+    """Need to manually write database adapter"""
+    def run(self, query):
+        # Database connection and query code...
+        pass
+
+class WeatherTool(BaseTool):
+    """Need to manually write weather API adapter"""
+    def run(self, location):
+        # Weather API calling code...
+        pass
+
+# Each new service requires repeating this process
+agent.add_tool(GitHubTool())
+agent.add_tool(DatabaseTool())
+agent.add_tool(WeatherTool())
+```
+
+This approach has obvious problems: code duplication (each tool must handle HTTP requests, error handling, and authentication on its own), maintenance difficulty (an API change requires modifying every related tool), no reuse (tools from other developers cannot be used directly), and poor scalability (each new service requires extensive coding work).
+
+The **core value of communication protocols** is precisely to solve these problems. It provides a set of standardized interface specifications that allow agents to access various external services in a unified way without needing to write specialized adapters for each service. This is like the Internet's TCP/IP protocol, which allows different devices to communicate with each other without needing to write specialized communication code for each type of device.
+
+With communication protocols, the above code can be simplified to:
+
+```python
+from hello_agents.tools import MCPTool
+
+# Connect to MCP server, automatically obtain all tools
+mcp_tool = MCPTool()  # Built-in server provides basic tools
+
+# Or connect to professional MCP servers
+github_mcp = MCPTool(server_command=["npx", "-y", "@modelcontextprotocol/server-github"])
+database_mcp = MCPTool(server_command=["python", "database_mcp_server.py"])
+
+# Agent automatically obtains all capabilities without manually writing adapters
+agent.add_tool(mcp_tool)
+agent.add_tool(github_mcp)
+agent.add_tool(database_mcp)
+```
+
+The changes brought by communication protocols are fundamental: **Standardized interfaces** allow different services to provide unified access methods, **interoperability** enables seamless integration of tools from different developers, **dynamic discovery** allows agents to discover new services and capabilities at runtime, and **scalability** enables systems to easily add new functional modules.
+
+### 10.1.2 Comparison of Three Protocol Design Philosophies
+
+Agent communication protocols are not a single solution, but a series of standards designed for different communication scenarios. This chapter uses the three currently mainstream protocols MCP, A2A, and ANP as examples for practice. Below is an overview comparison.
+
+**(1) MCP: Bridge Between Agents and Tools**
+
+MCP (Model Context Protocol) was proposed by the Anthropic team<sup>[1]</sup>, and its core design philosophy is to **standardize the communication method between agents and external tools/resources**. Imagine that your agent needs to access various services such as file systems, databases, GitHub, Slack, etc. The traditional approach is to write specialized adapters for each service, which is not only labor-intensive but also difficult to maintain. MCP defines a unified protocol specification that allows all services to be accessed in the same way.
+
+MCP's design philosophy is "context sharing". It is not just an RPC (Remote Procedure Call) protocol, but more importantly, it allows agents and tools to share rich contextual information. As shown in Figure 10.1, when an agent accesses a code repository, the MCP server can not only provide file content but also provide contextual information such as code structure, dependency relationships, and commit history, enabling the agent to make more intelligent decisions.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-1.png" alt="" width="85%"/>
+  <p>Figure 10.1 MCP Design Philosophy</p>
+</div>
+
+**(2) A2A: Dialogue Between Agents**
+
+A2A (Agent-to-Agent Protocol) was proposed by the Google team<sup>[2]</sup>, and its core design philosophy is to **implement peer-to-peer communication between agents**. Unlike MCP, which focuses on communication between agents and tools, A2A focuses on how agents collaborate with each other. This design allows agents to engage in dialogue, negotiation, and collaboration like human teams.
+
+A2A's design philosophy is "peer-to-peer communication". As shown in Figure 10.2, in an A2A network, each agent is both a service provider and a service consumer. Agents can actively initiate requests and also respond to requests from other agents. This peer-to-peer design avoids the bottleneck of centralized coordinators, making the agent network more flexible and scalable.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-2.png" alt="" width="85%"/>
+  <p>Figure 10.2 A2A Design Philosophy</p>
+</div>
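
The peer-to-peer idea can be illustrated with a small sketch. This is not the A2A wire protocol or the a2a-sdk API: `PeerAgent`, `connect`, `handle`, and `ask` are hypothetical names chosen only to show that every agent plays both the provider and the consumer role, with no central coordinator.

```python
from typing import Dict


class PeerAgent:
    """Hypothetical peer: it can both send requests and answer them."""

    def __init__(self, name: str, skill: str) -> None:
        self.name = name
        self.skill = skill
        self.peers: Dict[str, "PeerAgent"] = {}

    def connect(self, other: "PeerAgent") -> None:
        # Connections are symmetric: neither side acts as a coordinator.
        self.peers[other.name] = other
        other.peers[self.name] = self

    def handle(self, sender: str, request: str) -> str:
        # Provider role: answer a request coming from another agent.
        return f"{self.name} ({self.skill}) handled '{request}' for {sender}"

    def ask(self, peer_name: str, request: str) -> str:
        # Consumer role: send a request to a connected peer.
        return self.peers[peer_name].handle(self.name, request)


researcher = PeerAgent("researcher", "search")
writer = PeerAgent("writer", "writing")
researcher.connect(writer)

print(researcher.ask("writer", "draft a summary"))
print(writer.ask("researcher", "find sources"))
```

Both calls go through the same `ask`/`handle` pair, which is the point: either agent can initiate, and either can respond.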
+
+**(3) ANP: Infrastructure for Agent Networks**
+
+ANP (Agent Network Protocol) is a conceptual protocol framework<sup>[3]</sup> that is currently maintained by the open-source community and does not yet have a mature ecosystem. Its core design philosophy is to **build infrastructure for large-scale agent networks**. If MCP solves "how to access tools" and A2A solves "how to dialogue with other agents", then ANP solves "how to discover and connect agents in large-scale networks".
+
+ANP's design philosophy is "decentralized service discovery". In a network containing hundreds or thousands of agents, how can agents find the services they need? As shown in Figure 10.3, ANP provides service registration, discovery, and routing mechanisms, allowing agents to dynamically discover other services in the network without needing to pre-configure all connection relationships.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-3.png" alt="" width="85%"/>
+  <p>Figure 10.3 ANP Design Philosophy</p>
+</div>
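
To make registration and discovery concrete, here is a minimal sketch. It assumes a single in-memory registry, which is a deliberate simplification: ANP aims at decentralized discovery, and this toy keeps all state in one place. `ServiceRecord` and `InMemoryRegistry` are hypothetical names, not part of any ANP implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServiceRecord:
    service_id: str
    service_type: str
    endpoint: str


class InMemoryRegistry:
    """Toy registry; a real agent network would distribute this state."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceRecord] = {}

    def register(self, record: ServiceRecord) -> None:
        # Re-registering the same service_id overwrites the old record.
        self._services[record.service_id] = record

    def discover(self, service_type: Optional[str] = None) -> List[ServiceRecord]:
        # With no filter, return every known service.
        return [
            record
            for record in self._services.values()
            if service_type is None or record.service_type == service_type
        ]


registry = InMemoryRegistry()
registry.register(ServiceRecord("calculator", "math", "http://localhost:8080"))
registry.register(ServiceRecord("translator", "nlp", "http://localhost:8081"))

for record in registry.discover("math"):
    print(record.service_id, record.endpoint)
```

An agent that needs a math service asks the registry at runtime instead of carrying a hard-coded endpoint, which is the property the text describes.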
+
+Finally, Table 10.1 compares the three protocols side by side:
+
+<div align="center">
+  <p>Table 10.1 Comparison of Three Protocols</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-1.png" alt="" width="85%"/>
+</div>
+
+**(4) How to Choose the Right Protocol?**
+
+All of these protocols are still at an early stage of development. MCP's ecosystem is relatively mature, but how up to date a given tool is depends on its maintainers, so we recommend preferring MCP tools backed by large companies.
+
+The key to choosing a protocol lies in understanding your needs:
+
+- If your agent needs to access external services (files, databases, APIs), choose **MCP**
+- If you need multiple agents to collaborate on tasks, choose **A2A**
+- If you want to build a large-scale agent ecosystem, consider **ANP**
+
+### 10.1.3 HelloAgents Communication Protocol Architecture Design
+
+After understanding the design philosophies of the three protocols, let's see how to implement and use them in the HelloAgents framework. Our design goal is: **Enable learners to use these protocols in the simplest way while maintaining sufficient flexibility to handle complex scenarios**.
+
+As shown in Figure 10.4, the HelloAgents communication protocol architecture adopts a three-layer design, from bottom to top: protocol implementation layer, tool encapsulation layer, and agent integration layer.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-4.png" alt="" width="85%"/>
+  <p>Figure 10.4 HelloAgents Communication Protocol Design</p>
+</div>
+
+**(1) Protocol Implementation Layer**: This layer contains the concrete implementations of the three protocols. MCP is implemented on top of the FastMCP library and provides both client and server functionality; A2A is implemented on top of Google's official a2a-sdk; ANP is our own lightweight implementation, providing service discovery and network management. An official ANP [implementation](https://github.com/agent-network-protocol/AgentConnect) also exists, but because it may change in future iterations, we only simulate the concept here.
+
+**(2) Tool Encapsulation Layer**: This layer encapsulates protocol implementations into a unified Tool interface. MCPTool, A2ATool, and ANPTool all inherit from BaseTool, providing a consistent `run()` method. This design allows agents to use different protocols in the same way.
+
+**(3) Agent Integration Layer**: This layer is the integration point between agents and protocols. All agents (ReActAgent, SimpleAgent, etc.) use protocol tools through the Tool System without needing to care about underlying protocol details.
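
The layering can be sketched in a few lines. The classes below are hypothetical stand-ins, not the framework's `BaseTool`, `MCPTool`, or `A2ATool`; they only illustrate the assumption that every protocol wrapper exposes the same `run()` method, so agent-side code never branches on the protocol.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchTool(ABC):
    """Hypothetical base class; only the shared run() contract is modeled."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def run(self, parameters: Dict[str, Any]) -> str:
        ...


class EchoMCPTool(SketchTool):
    def run(self, parameters: Dict[str, Any]) -> str:
        # A real wrapper would forward this call to an MCP server.
        return f"[mcp] {parameters['tool_name']}({parameters.get('arguments', {})})"


class EchoA2ATool(SketchTool):
    def run(self, parameters: Dict[str, Any]) -> str:
        # A real wrapper would send the message to a remote agent.
        return f"[a2a] message to agent: {parameters['message']}"


def dispatch(tools: Dict[str, SketchTool], tool_name: str, parameters: Dict[str, Any]) -> str:
    """Agent-side code: look a tool up by name and call run()."""
    if tool_name not in tools:
        return f"Error: unknown tool '{tool_name}'"
    return tools[tool_name].run(parameters)


tools = {tool.name: tool for tool in (EchoMCPTool("mcp"), EchoA2ATool("a2a"))}
print(dispatch(tools, "mcp", {"tool_name": "add", "arguments": {"a": 10, "b": 20}}))
print(dispatch(tools, "a2a", {"message": "hello"}))
```

Adding a third protocol under this design means adding one more subclass; `dispatch` stays unchanged.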
+
+### 10.1.4 Learning Objectives and Quick Experience for This Chapter
+
+Let's first look at the learning content for Chapter 10:
+
+```
+hello_agents/
+├── protocols/                          # Communication protocol module
+│   ├── mcp/                            # MCP protocol implementation (Model Context Protocol)
+│   │   ├── client.py                   # MCP client (supports 5 transport methods)
+│   │   ├── server.py                   # MCP server (FastMCP wrapper)
+│   │   └── utils.py                    # Utility functions (create_context/parse_context)
+│   ├── a2a/                            # A2A protocol implementation (Agent-to-Agent Protocol)
+│   │   └── implementation.py           # A2A server/client (based on a2a-sdk, optional dependency)
+│   └── anp/                            # ANP protocol implementation (Agent Network Protocol)
+│       └── implementation.py           # ANP service discovery/registration (conceptual implementation)
+└── tools/builtin/                      # Built-in tools module
+    └── protocol_tools.py               # Protocol tool wrappers (MCPTool/A2ATool/ANPTool)
+```
+
+This chapter focuses on application: the learning objective is to be able to apply these protocols in your own projects. Because the protocols are still at an early stage of development, there is little value in reimplementing them from scratch. Before starting the practical work, let's prepare the development environment:
+
+```bash
+# Install HelloAgents framework (Chapter 10 version)
+pip install "hello-agents[protocol]==0.2.2"
+
+# Install NodeJS, refer to documentation in Additional-Chapter
+```
+
+Let's experience the basic functionality of the three protocols with the simplest code:
+
+```python
+from hello_agents.tools import MCPTool, A2ATool, ANPTool
+
+# 1. MCP: Access tools
+mcp_tool = MCPTool()
+result = mcp_tool.run({
+    "action": "call_tool",
+    "tool_name": "add",
+    "arguments": {"a": 10, "b": 20}
+})
+print(f"MCP calculation result: {result}")  # Output: 30.0
+
+# 2. ANP: Service discovery
+anp_tool = ANPTool()
+anp_tool.run({
+    "action": "register_service",
+    "service_id": "calculator",
+    "service_type": "math",
+    "endpoint": "http://localhost:8080"
+})
+services = anp_tool.run({"action": "discover_services"})
+print(f"Discovered services: {services}")
+
+# 3. A2A: Agent communication
+a2a_tool = A2ATool("http://localhost:5000")
+print("A2A tool created successfully")
+```
+
+This simple example demonstrates the core functionality of the three protocols. In the following sections, we will look at the detailed usage and best practices of each protocol in depth.
+
+
+## 10.2 MCP Protocol in Practice
+
+Now, let's dive into MCP and master how to enable agents to access external tools and resources.
+
+### 10.2.1 MCP Protocol Concept Introduction
+
+**(1) MCP: The "USB-C" for Agents**
+
+Imagine that your agent might need to do many things simultaneously, such as:
+- Read documents from the local file system
+- Query PostgreSQL databases
+- Search code on GitHub
+- Send Slack messages
+- Access Google Drive
+
+Traditionally, you would need to write adapter code for each service, handling different APIs, authentication methods, error handling, etc. This is not only labor-intensive but also difficult to maintain. More importantly, different LLM platforms have vastly different function call implementations, requiring extensive code rewrites when switching models.
+
+MCP's emergence changed all this. Just as USB-C unified the connection methods for various devices, **MCP unified the interaction methods between agents and external tools**. Whether you use Claude, GPT, or other models, as long as they support the MCP protocol, they can seamlessly access the same tools and resources.
+
+**(2) MCP Architecture**
+
+The MCP protocol adopts a three-layer architecture of Host, Client, and Server. Let's understand how these components work together through the scenario in Figure 10.5.
+
+Suppose you are using Claude Desktop and asking: "What documents are on my desktop?"
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-5.png" alt="" width="85%"/>
+  <p>Figure 10.5 MCP Case Demonstration</p>
+</div>
+
+**Responsibilities of the Three-Layer Architecture:**
+
+1. **Host (Host Layer)**: Claude Desktop acts as the Host, responsible for receiving user questions and interacting with the Claude model. The Host is the interface users directly interact with, managing the entire conversation flow.
+
+2. **Client (Client Layer)**: When the Claude model decides it needs to access the file system, the MCP Client built into the Host is activated. The Client is responsible for establishing connections with the appropriate MCP Server, sending requests, and receiving responses.
+
+3. **Server (Server Layer)**: The file system MCP Server is called, executes the actual file scanning operation, accesses the desktop directory, and returns the list of found documents.
+
+**Complete Interaction Flow:** User question → Claude Desktop (Host) → Claude model analysis → Needs file information → MCP Client connection → File system MCP Server → Execute operation → Return result → Claude generates answer → Display on Claude Desktop
+
+The advantage of this architectural design lies in **separation of concerns**: The Host focuses on user experience, the Client focuses on protocol communication, and the Server focuses on specific functionality implementation. Developers only need to focus on developing the corresponding MCP Server without caring about the implementation details of the Host and Client.
+
+**(3) Core Capabilities of MCP**
+
+As shown in Table 10.2, the MCP protocol provides three core capabilities, forming a complete tool access framework:
+
+<div align="center">
+  <p>Table 10.2 MCP Core Capabilities</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-2.png" alt="" width="85%"/>
+</div>
+
+The difference between these three capabilities is: **Tools are active** (execute operations), **Resources are passive** (provide data), **Prompts are instructive** (provide templates).
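
The distinction can be illustrated with a toy object that holds one entry of each kind. `ToyServer` and its method names are hypothetical and do not mirror the MCP SDK; the sketch only shows that a tool runs code, a resource returns stored data, and a prompt fills in a template.

```python
from typing import Any, Callable, Dict


class ToyServer:
    """Hypothetical container for the three capability kinds."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., Any]] = {}
        self.resources: Dict[str, str] = {}
        self.prompts: Dict[str, str] = {}

    def call_tool(self, name: str, **arguments: Any) -> Any:
        # Active: executes an operation with the given arguments.
        return self.tools[name](**arguments)

    def read_resource(self, uri: str) -> str:
        # Passive: returns data that already exists.
        return self.resources[uri]

    def get_prompt(self, name: str, **variables: str) -> str:
        # Instructive: returns a template with its placeholders filled in.
        return self.prompts[name].format(**variables)


server = ToyServer()
server.tools["add"] = lambda a, b: a + b
server.resources["file:///notes.txt"] = "remember to water the plants"
server.prompts["summarize"] = "Summarize the following text in one sentence: {text}"

print(server.call_tool("add", a=10, b=20))
print(server.read_resource("file:///notes.txt"))
print(server.get_prompt("summarize", text="hello"))
```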
+
+**(4) MCP Workflow**
+
+Let's understand the complete workflow of MCP through a specific example, as shown in Figure 10.6:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-6.png" alt="" width="85%"/>
+  <p>Figure 10.6 MCP Case Demonstration</p>
+</div>
+
+A key question is: **How does Claude (or other LLMs) decide which tools to use?**
+
+When a user asks a question, the complete tool selection process is as follows:
+
+1. **Tool Discovery Phase**: After the MCP Client connects to the Server, it first calls `list_tools()` to obtain description information for all available tools (including tool name, function description, parameter definition)
+
+2. **Context Building**: The Client converts the tool list into a format the LLM can understand and adds it to the system prompt. For example:
+   ```
+   You can use the following tools:
+   - read_file(path: str): Read the content of the file at the specified path
+   - search_code(query: str, language: str): Search in the codebase
+   ```
+
+3. **Model Reasoning**: The LLM analyzes the user's question and available tools, deciding whether to call tools and which tool to call. This decision is based on the tool descriptions and current conversation context
+
+4. **Tool Execution**: If the LLM decides to use a tool, the Client executes the selected tool through the MCP Server and obtains the result
+
+5. **Result Integration**: The tool execution result is sent back to the LLM, which combines the result to generate the final answer
+
+This process is **fully automated**: the LLM decides whether and how to use a tool based on the tool descriptions it is given, so the quality of those descriptions directly affects tool selection. Writing clear and accurate tool descriptions is therefore crucial.
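+
The context-building step (step 2 above) can be sketched in a few lines. This is only an illustration under stated assumptions: `format_tools_for_prompt` and `example_tools` are hypothetical names, not part of HelloAgents or any MCP SDK, and the tool dictionaries simply mirror the `name` / `description` / `inputSchema` fields a server returns from `list_tools()`.

```python
from typing import Any

# Map JSON Schema type names to the short type hints used in the prompt text.
_TYPE_HINTS = {"string": "str", "number": "float", "integer": "int", "boolean": "bool"}


def format_tools_for_prompt(tools: list[dict[str, Any]]) -> str:
    """Render MCP-style tool descriptions as a plain-text block for the system prompt."""
    lines = ["You can use the following tools:"]
    for tool in tools:
        props = tool.get("inputSchema", {}).get("properties", {})
        params = ", ".join(
            f"{name}: {_TYPE_HINTS.get(info.get('type', ''), 'any')}"
            for name, info in props.items()
        )
        description = tool.get("description", "No description")
        lines.append(f"- {tool['name']}({params}): {description}")
    return "\n".join(lines)


# Hypothetical tool list, shaped like the output of list_tools().
example_tools = [
    {
        "name": "read_file",
        "description": "Read the content of the file at the specified path",
        "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
    },
    {
        "name": "search_code",
        "description": "Search in the codebase",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}, "language": {"type": "string"}},
        },
    },
]

print(format_tools_for_prompt(example_tools))
```

Because the prompt text is derived directly from each tool's description and schema, a vague description reaches the model unchanged, which is why description quality matters so much.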
+
+**(5) Differences Between MCP and Function Calling**
+
+Many developers ask: **I'm already using Function Calling, why do I still need MCP?** Let's understand their differences through Table 10.3.
+
+<div align="center">
+  <p>Table 10.3 Function Calling vs MCP Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-3.png" alt="" width="85%"/>
+</div>
+
+Here we use the example of an agent needing to access GitHub repositories and the local file system to compare two implementations of the same task in detail.
+
+**Method 1: Using Function Calling**
+
+```python
+# Step 1: Define functions for each LLM provider
+# OpenAI format
+openai_tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "search_github",
+            "description": "Search GitHub repositories",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {"type": "string", "description": "Search keywords"}
+                },
+                "required": ["query"]
+            }
+        }
+    }
+]
+
+# Claude format
+claude_tools = [
+    {
+        "name": "search_github",
+        "description": "Search GitHub repositories",
+        "input_schema": {  # Note: not parameters
+            "type": "object",
+            "properties": {
+                "query": {"type": "string", "description": "Search keywords"}
+            },
+            "required": ["query"]
+        }
+    }
+]
+
+# Step 2: Implement tool functions yourself
+def search_github(query):
+    import requests
+    response = requests.get(
+        "https://api.github.com/search/repositories",
+        params={"q": query}
+    )
+    return response.json()
+
+# Step 3: Handle different model response formats
+# (`openai_response` / `claude_response` stand for the objects returned by each provider's SDK)
+import json
+
+# OpenAI response
+if openai_response.choices[0].message.tool_calls:
+    tool_call = openai_response.choices[0].message.tool_calls[0]
+    result = search_github(**json.loads(tool_call.function.arguments))
+
+# Claude response
+if claude_response.content[0].type == "tool_use":
+    tool_use = claude_response.content[0]
+    result = search_github(**tool_use.input)
+```
+
+**Method 2: Using MCP**
+
+```python
+from hello_agents.protocols import MCPClient
+
+# Step 1: Connect to community-provided MCP server (no need to implement yourself)
+github_client = MCPClient([
+    "npx", "-y", "@modelcontextprotocol/server-github"
+])
+
+fs_client = MCPClient([
+    "npx", "-y", "@modelcontextprotocol/server-filesystem", "."
+])
+
+# Step 2: Unified calling method (model-independent)
+async with github_client:
+    # Automatically discover tools
+    tools = await github_client.list_tools()
+
+    # Call tool (standardized interface)
+    result = await github_client.call_tool(
+        "search_repositories",
+        {"query": "AI agents"}
+    )
+
+# Step 3: Any model supporting MCP can use it
+# OpenAI, Claude, Llama, etc. all use the same MCP client
+```
+
+First, it should be clarified that Function Calling and MCP are complementary rather than competing. Function Calling is a core capability of large language models: it reflects the model's own intelligence, letting it decide when to call a function and generate the corresponding call parameters accurately. MCP, in contrast, is an infrastructure protocol: it solves the engineering problem of how tools connect to models by describing and calling tools in a standardized way.
+
+A simple analogy helps: Function Calling is like learning the skill of "how to make a phone call", including when to dial, how to talk to the other party, and when to hang up. MCP is the globally unified "telephone communication standard" that ensures any phone can reach any other.
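+
One way to see how the two fit together: a single MCP tool description can be mechanically translated into each provider's Function Calling format. The sketch below assumes a tool dictionary with `name`, `description`, and `inputSchema` fields as returned by `list_tools()`; the helper names `to_openai_tool` and `to_claude_tool` are hypothetical, and the target shapes are the two formats shown in Method 1.

```python
from typing import Any


def to_openai_tool(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Wrap an MCP tool description in the OpenAI-style function schema."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool["inputSchema"],
        },
    }


def to_claude_tool(mcp_tool: dict[str, Any]) -> dict[str, Any]:
    """Wrap an MCP tool description in the Claude-style tool schema."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        "input_schema": mcp_tool["inputSchema"],  # note: not "parameters"
    }


# Hypothetical MCP tool description, defined once on the server side.
mcp_tool = {
    "name": "search_repositories",
    "description": "Search GitHub repositories",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search keywords"}},
        "required": ["query"],
    },
}

print(to_openai_tool(mcp_tool)["function"]["name"])
print(sorted(to_claude_tool(mcp_tool)))
```

The tool is described once, in MCP terms, and the provider-specific wrapping becomes a thin adapter rather than something every tool author has to repeat.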
+
+After understanding their complementary relationship, let's next see how to use the MCP protocol in HelloAgents.
+
+### 10.2.2 Using MCP Client
+
+HelloAgents implements complete MCP client functionality based on FastMCP 2.0. We provide both asynchronous and synchronous APIs to suit different usage scenarios. For most applications, the asynchronous API is recommended because it handles concurrent requests and long-running operations better. The steps below walk through its use.
+
+**(1) Connecting to MCP Server**
+
+The MCP client supports multiple connection methods, with the most common being Stdio mode (communicating with local processes through standard input/output):
+
+```python
+import asyncio
+from hello_agents.protocols import MCPClient
+
+async def connect_to_server():
+    # Method 1: Connect to community-provided file system server
+    # npx will automatically download and run the @modelcontextprotocol/server-filesystem package
+    client = MCPClient([
+        "npx", "-y",
+        "@modelcontextprotocol/server-filesystem",
+        "."  # Specify root directory
+    ])
+
+    # Use async with to ensure connection is properly closed
+    async with client:
+        # Use client here
+        tools = await client.list_tools()
+        print(f"Available tools: {[t['name'] for t in tools]}")
+
+    # Method 2: Connect to custom Python MCP server
+    client = MCPClient(["python", "my_mcp_server.py"])
+    async with client:
+        # Use client...
+        pass
+
+# Run async function
+asyncio.run(connect_to_server())
+```
+
+**(2) Discovering Available Tools**
+
+After successful connection, the first step is usually to query what tools the server provides:
+
+```python
+async def discover_tools():
+    client = MCPClient(["npx", "-y", "@modelcontextprotocol/server-filesystem", "."])
+
+    async with client:
+        # Get all available tools
+        tools = await client.list_tools()
+
+        print(f"Server provides {len(tools)} tools:")
+        for tool in tools:
+            print(f"\nTool name: {tool['name']}")
+            print(f"Description: {tool.get('description', 'No description')}")
+
+            # Print parameter information
+            if 'inputSchema' in tool:
+                schema = tool['inputSchema']
+                if 'properties' in schema:
+                    print("Parameters:")
+                    for param_name, param_info in schema['properties'].items():
+                        param_type = param_info.get('type', 'any')
+                        param_desc = param_info.get('description', '')
+                        print(f"  - {param_name} ({param_type}): {param_desc}")
+
+asyncio.run(discover_tools())
+
+# Output example:
+# Server provides 5 tools:
+#
+# Tool name: read_file
+# Description: Read file content
+# Parameters:
+#   - path (string): File path
+#
+# Tool name: write_file
+# Description: Write file content
+# Parameters:
+#   - path (string): File path
+#   - content (string): File content
+```
+
+**(3) Calling Tools**
+
+When calling tools, simply provide the tool name and parameters conforming to JSON Schema:
+
+```python
+async def use_tools():
+    client = MCPClient(["npx", "-y", "@modelcontextprotocol/server-filesystem", "."])
+
+    async with client:
+        # Read file
+        result = await client.call_tool("read_file", {"path": "my_README.md"})
+        print(f"File content:\n{result}")
+
+        # List directory
+        result = await client.call_tool("list_directory", {"path": "."})
+        print(f"Current directory files: {result}")
+
+        # Write file
+        result = await client.call_tool("write_file", {
+            "path": "output.txt",
+            "content": "Hello from MCP!"
+        })
+        print(f"Write result: {result}")
+
+asyncio.run(use_tools())
+```
+
+Here's a safer way to call MCP services for reference:
+
+```python
+async def safe_tool_call():
+    client = MCPClient(["npx", "-y", "@modelcontextprotocol/server-filesystem", "."])
+
+    async with client:
+        try:
+            # Try to read a potentially non-existent file
+            result = await client.call_tool("read_file", {"path": "nonexistent.txt"})
+            print(result)
+        except Exception as e:
+            print(f"Tool call failed: {e}")
+            # Can choose to retry, use default value, or report error to user
+
+asyncio.run(safe_tool_call())
+```
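+
The comment above mentions retrying as one option. A minimal sketch of that idea follows, assuming nothing about `MCPClient` beyond the fact that a tool call is awaitable: `call_with_retry` is a hypothetical helper, and `flaky_read` is a stand-in for a real `client.call_tool(...)` call so the example runs on its own.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def call_with_retry(
    call: Callable[[], Awaitable[Any]],
    retries: int = 3,
    base_delay: float = 0.5,
    timeout: float = 10.0,
) -> Any:
    """Await call() with a per-attempt timeout, retrying with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return await asyncio.wait_for(call(), timeout=timeout)
        except Exception as exc:  # a real client would narrow this to transient errors
            last_error = exc
            if attempt < retries - 1:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Tool call failed after {retries} attempts") from last_error


async def demo() -> str:
    attempts = {"count": 0}

    async def flaky_read() -> str:
        # Stand-in for a tool call; fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("temporary failure")
        return "file content"

    return await call_with_retry(flaky_read, retries=3, base_delay=0.01)


print(asyncio.run(demo()))
```

With a connected client, the same helper could wrap a real call, for example `await call_with_retry(lambda: client.call_tool("read_file", {"path": "notes.txt"}))`. Retrying only makes sense for transient failures; a missing file will fail the same way every time.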
+
+**(4) Accessing Resources**
+
+Besides tools, MCP servers can also provide resources:
+
+```python
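+# Assumes `client` is already connected, i.e. this runs inside `async with client:`.
+# Like list_tools(), these calls are assumed to be coroutines and are awaited.
+
+# List available resources
+resources = await client.list_resources()
+print(f"Available resources: {[r['uri'] for r in resources]}")
+
+# Read resource
+resource_content = await client.read_resource("file:///path/to/resource")
+print(f"Resource content: {resource_content}")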
+```
+
+**(5) Using Prompt Templates**
+
+MCP servers can provide predefined prompt templates:
+
+```python
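+# Assumes `client` is already connected, i.e. this runs inside `async with client:`.
+# Like list_tools(), these calls are assumed to be coroutines and are awaited.
+
+# List available prompts
+prompts = await client.list_prompts()
+print(f"Available prompts: {[p['name'] for p in prompts]}")
+
+# Get prompt content
+prompt = await client.get_prompt("code_review", {"language": "python"})
+print(f"Prompt content: {prompt}")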
+```
+
+**(6) Complete Example: Using GitHub MCP Service**
+
+Let's see how to use the community-provided GitHub MCP service through a complete example, using the encapsulated MCP Tools:
+
+```python
+"""
+GitHub MCP Service Example
+
+Note: Need to set environment variable
+    Windows (PowerShell): $env:GITHUB_PERSONAL_ACCESS_TOKEN="your_token_here"
+    Linux/macOS: export GITHUB_PERSONAL_ACCESS_TOKEN="your_token_here"
+"""
+
+from hello_agents.tools import MCPTool
+
+# Create GitHub MCP tool
+github_tool = MCPTool(
+    server_command=["npx", "-y", "@modelcontextprotocol/server-github"]
+)
+
+# 1. List available tools
+print("📋 Available tools:")
+result = github_tool.run({"action": "list_tools"})
+print(result)
+
+# 2. Search repositories
+print("\n🔍 Search repositories:")
+result = github_tool.run({
+    "action": "call_tool",
+    "tool_name": "search_repositories",
+    "arguments": {
+        "query": "AI agents language:python",
+        "page": 1,
+        "perPage": 3
+    }
+})
+print(result)
+
+```
+
+### 10.2.3 MCP Transport Methods Explained
+
+An important feature of the MCP protocol is **transport agnosticism**. This means the MCP protocol itself does not depend on specific transport methods and can run on different communication channels. HelloAgents, based on FastMCP 2.0, provides complete transport method support, allowing you to choose the most appropriate transport mode based on actual scenarios.
+
+**(1) Transport Methods Overview**
+
+HelloAgents' `MCPClient` supports five transport methods, each with different use cases, as shown in Table 10.4:
+
+<div align="center">
+  <p>Table 10.4 MCP Transport Methods Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-4.png" alt="" width="85%"/>
+</div>
+
+**(2) Transport Method Usage Examples**
+
+```python
+from hello_agents.tools import MCPTool
+
+# 1. Memory Transport - in-memory transport (for testing)
+# No parameters specified, uses built-in demo server
+mcp_tool = MCPTool()
+
+# 2. Stdio Transport - Standard input/output transport (local development)
+# Use command list to start local server
+mcp_tool = MCPTool(server_command=["python", "examples/mcp_example_server.py"])
+
+# 3. Stdio Transport with Args - Command transport with parameters
+# Can pass additional parameters
+mcp_tool = MCPTool(server_command=["python", "examples/mcp_example_server.py", "--debug"])
+
+# 4. Stdio Transport - Community server (npx method)
+# Use npx to start community MCP server
+mcp_tool = MCPTool(server_command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "."])
+
+# 5. HTTP/SSE/StreamableHTTP Transport
+# Note: MCPTool is mainly for Stdio and Memory transport
+# For HTTP/SSE and other remote transports, recommend using MCPClient directly
+```
+
+**(3) Memory Transport**
+
+Use case: Unit testing, rapid prototyping
+
+```python
+from hello_agents.tools import MCPTool
+
+# Use built-in demo server (Memory transport)
+mcp_tool = MCPTool()
+
+# List available tools
+result = mcp_tool.run({"action": "list_tools"})
+print(result)
+
+# Call tool
+result = mcp_tool.run({
+    "action": "call_tool",
+    "tool_name": "add",
+    "arguments": {"a": 10, "b": 20}
+})
+print(result)
+```
+
+**(4) Stdio Transport - Standard Input/Output Transport**
+
+Use case: Local development, debugging, Python script servers
+
+```python
+from hello_agents.tools import MCPTool
+
+# Method 1: Use custom Python server
+mcp_tool = MCPTool(server_command=["python", "my_mcp_server.py"])
+
+# Method 2: Use community server (file system)
+mcp_tool = MCPTool(server_command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "."])
+
+# List tools
+result = mcp_tool.run({"action": "list_tools"})
+print(result)
+
+# Call tool
+result = mcp_tool.run({
+    "action": "call_tool",
+    "tool_name": "read_file",
+    "arguments": {"path": "README.md"}
+})
+print(result)
+```
+
+**(5) HTTP Transport**
+
+Use case: Production environment, remote services, microservice architecture
+
+```python
+# Note: MCPTool is mainly for Stdio and Memory transport
+# For HTTP/SSE and other remote transports, recommend using underlying MCPClient
+
+import asyncio
+from hello_agents.protocols import MCPClient
+
+async def test_http_transport():
+    # Connect to remote HTTP MCP server
+    client = MCPClient("http://api.example.com/mcp")
+
+    async with client:
+        # Get server information
+        tools = await client.list_tools()
+        print(f"Remote server tools: {len(tools)} tools")
+
+        # Call remote tool
+        result = await client.call_tool("process_data", {
+            "data": "Hello, World!",
+            "operation": "uppercase"
+        })
+        print(f"Remote processing result: {result}")
+
+# Note: Requires actual HTTP MCP server
+# asyncio.run(test_http_transport())
+```
+
+**(6) SSE Transport - Server-Sent Events Transport**
+
+Use case: Real-time communication, streaming processing, long connections
+
+```python
+# Note: MCPTool is mainly for Stdio and Memory transport
+# For SSE transport, recommend using underlying MCPClient
+
+import asyncio
+from hello_agents.protocols import MCPClient
+
+async def test_sse_transport():
+    # Connect to SSE MCP server
+    client = MCPClient(
+        "http://localhost:8080/sse",
+        transport_type="sse"
+    )
+
+    async with client:
+        # SSE is especially suitable for streaming processing
+        result = await client.call_tool("stream_process", {
+            "input": "Large data processing request",
+            "stream": True
+        })
+        print(f"Streaming processing result: {result}")
+
+# Note: Requires MCP server supporting SSE
+# asyncio.run(test_sse_transport())
+```
+
+**(7) StreamableHTTP Transport - Streaming HTTP Transport**
+
+Use case: HTTP scenarios requiring bidirectional streaming communication
+
+```python
+# Note: MCPTool is mainly for Stdio and Memory transport
+# For StreamableHTTP transport, recommend using underlying MCPClient
+
+import asyncio
+from hello_agents.protocols import MCPClient
+
+async def test_streamable_http_transport():
+    # Connect to StreamableHTTP MCP server
+    client = MCPClient(
+        "http://localhost:8080/mcp",
+        transport_type="streamable_http"
+    )
+
+    async with client:
+        # Supports bidirectional streaming communication
+        tools = await client.list_tools()
+        print(f"StreamableHTTP server tools: {len(tools)} tools")
+
+# Note: Requires MCP server supporting StreamableHTTP
+# asyncio.run(test_streamable_http_transport())
+```
+
+### 10.2.4 Using MCP Tools in Agents
+
+Previously, we learned how to use the MCP client directly. But in practical applications, we prefer to have agents **automatically** call MCP tools rather than manually writing calling code. HelloAgents provides the `MCPTool` wrapper, allowing MCP servers to seamlessly integrate into the agent's tool chain.
+
+**(1) Automatic Expansion Mechanism of MCP Tools**
+
+HelloAgents' `MCPTool` has a feature: **automatic expansion**. When you add an MCP tool to an Agent, it automatically expands all tools provided by the MCP server into independent tools, allowing the Agent to call them like ordinary tools.
+
+**Method 1: Using Built-in Demo Server**
+
+We previously implemented calculator tool functions, and here we convert them into MCP services. This is the simplest usage method.
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import MCPTool
+
+agent = SimpleAgent(name="Assistant", llm=HelloAgentsLLM())
+
+# No configuration needed, automatically uses built-in demo server
+mcp_tool = MCPTool(name="calculator")
+agent.add_tool(mcp_tool)
+# ✅ MCP tool 'calculator' expanded into 6 independent tools
+
+# Agent can directly use expanded tools
+response = agent.run("Calculate 25 times 16")
+print(response)  # Output: The result of 25 times 16 is 400
+```
+
+**Tools after automatic expansion**:
+
+- `calculator_add` - Addition calculator
+- `calculator_subtract` - Subtraction calculator
+- `calculator_multiply` - Multiplication calculator
+- `calculator_divide` - Division calculator
+- `calculator_greet` - Friendly greeting
+- `calculator_get_system_info` - Get system information
+
+When the Agent calls, it only needs to provide parameters, for example: `[TOOL_CALL:calculator_multiply:a=25,b=16]`, and the system will automatically handle type conversion and MCP calls.
+
+**Method 2: Connecting to External MCP Servers**
+
+In actual projects, you need to connect to more powerful MCP servers. These servers can be:
+- **Official or community-provided servers** (such as file system, GitHub, database, etc.)
+- **Custom servers you write yourself** (encapsulating business logic)
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import MCPTool
+
+agent = SimpleAgent(name="File Assistant", llm=HelloAgentsLLM())
+
+# Example 1: Connect to community-provided file system server
+fs_tool = MCPTool(
+    name="filesystem",  # Specify unique name
+    description="Access local file system",
+    server_command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "."]
+)
+agent.add_tool(fs_tool)
+
+# Example 2: Connect to custom Python MCP server
+# For how to write custom MCP servers, refer to Section 10.5
+custom_tool = MCPTool(
+    name="custom_server",  # Use different name
+    description="Custom business logic server",
+    server_command=["python", "my_mcp_server.py"]
+)
+agent.add_tool(custom_tool)
+
+# Agent can now automatically use these tools!
+response = agent.run("Please read the my_README.md file and summarize its main content")
+print(response)
+```
+
+When using multiple MCP servers, be sure to specify a different name for each MCPTool. This name will be added as a prefix to the expanded tool names to avoid conflicts. For example: `name="fs"` will expand to `fs_read_file`, `fs_write_file`, etc. If you need to write your own MCP server to encapsulate specific business logic, refer to Section 10.5.
+
+**(2) How MCP Tool Automatic Expansion Works**
+
+Understanding the automatic expansion mechanism helps you better use MCP tools. Let's dive into how it works:
+
+```python
+# User code
+fs_tool = MCPTool(name="fs", server_command=[...])
+agent.add_tool(fs_tool)
+
+# What happens internally:
+# 1. MCPTool connects to server, discovers 14 tools
+# 2. Creates wrapper for each tool:
+#    - fs_read_text_file (parameters: path, tail, head)
+#    - fs_write_file (parameters: path, content)
+#    - ...
+# 3. Registers to Agent's tool registry
+
+# Agent call
+response = agent.run("Read README.md")
+
+# Inside Agent:
+# 1. Identifies need to call fs_read_text_file
+# 2. Generates parameters: path=README.md
+# 3. Wrapper converts to MCP format:
+#    {"action": "call_tool", "tool_name": "read_text_file", "arguments": {"path": "README.md"}}
+# 4. Calls MCP server
+# 5. Returns file content
+```
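+
The expansion steps described in the comments above can be modelled without a real server. The following is a rough sketch, not the actual `MCPTool` implementation: `expand_tools` and `make_wrapper` are hypothetical names, and each wrapper only builds the request dictionary instead of sending it to an MCP server.

```python
from typing import Any, Callable

Wrapper = Callable[..., dict[str, Any]]


def make_wrapper(tool_name: str) -> Wrapper:
    """Return a function that turns keyword arguments into an MCP-style request."""

    def wrapper(**arguments: Any) -> dict[str, Any]:
        return {"action": "call_tool", "tool_name": tool_name, "arguments": arguments}

    return wrapper


def expand_tools(prefix: str, server_tools: list[dict[str, Any]]) -> dict[str, Wrapper]:
    """Create one wrapper per server tool, keyed by '<prefix>_<tool name>'."""
    return {f"{prefix}_{tool['name']}": make_wrapper(tool["name"]) for tool in server_tools}


# Hypothetical tool list, as a server might report it during discovery.
registry = expand_tools("fs", [{"name": "read_text_file"}, {"name": "write_file"}])

print(sorted(registry))
print(registry["fs_read_text_file"](path="README.md"))
```

The prefix lives only in the registry key; the wrapper sends the server its original tool name, which is why two servers that both expose a `read_file` tool can coexist under different prefixes.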
+
+The system automatically converts types based on tool parameter definitions:
+
+```python
+# Agent calls calculator
+agent.run("Calculate 25 times 16")
+
+# Agent generates: a=25,b=16 (string)
+# System automatically converts to: {"a": 25.0, "b": 16.0} (number)
+# MCP server receives correct number type
+```
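+
The conversion described above might look roughly like this. It is a simplified sketch under assumptions: `convert_arguments` is a hypothetical helper, the argument string uses the `a=25,b=16` form shown earlier, and values that themselves contain commas are not handled.

```python
from typing import Any


def convert_arguments(raw: str, input_schema: dict[str, Any]) -> dict[str, Any]:
    """Parse 'a=25,b=16' and cast each value to the type declared in the schema."""
    casters = {
        "number": float,
        "integer": int,
        "boolean": lambda v: v.lower() in ("true", "1", "yes"),
        "string": str,
    }
    properties = input_schema.get("properties", {})
    arguments: dict[str, Any] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # skip empty segments such as a trailing comma
        key, _, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Parameters missing from the schema are kept as strings.
        declared = properties.get(key, {}).get("type", "string")
        arguments[key] = casters.get(declared, str)(value)
    return arguments


# Hypothetical schema for a multiply tool with two numeric parameters.
multiply_schema = {
    "type": "object",
    "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
}

print(convert_arguments("a=25,b=16", multiply_schema))
```

Driving the cast from the schema rather than guessing from the value keeps a string parameter such as `path=2024` from being silently turned into a number.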
+
+**(3) Practical Case: Intelligent Document Assistant**
+
+Let's build a complete intelligent document assistant. Here we demonstrate with a simple multi-agent orchestration:
+
+```python
+"""
+Multi-Agent Collaborative Intelligent Document Assistant
+
+Uses two SimpleAgents for division of labor:
+- Agent1: GitHub search expert
+- Agent2: Document generation expert
+"""
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import MCPTool
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv(dotenv_path="../HelloAgents/.env")
+
+print("="*70)
+print("Multi-Agent Collaborative Intelligent Document Assistant")
+print("="*70)
+
+# ============================================================
+# Agent 1: GitHub Search Expert
+# ============================================================
+print("\n[Step 1] Creating GitHub search expert...")
+
+github_searcher = SimpleAgent(
+    name="GitHub Search Expert",
+    llm=HelloAgentsLLM(),
+    system_prompt="""You are a GitHub search expert.
+Your task is to search GitHub repositories and return results.
+Please return clear, structured search results, including:
+- Repository name
+- Brief description
+
+Keep it concise, don't add extra explanations."""
+)
+
+# Add GitHub tool
+github_tool = MCPTool(
+    name="gh",
+    server_command=["npx", "-y", "@modelcontextprotocol/server-github"]
+)
+github_searcher.add_tool(github_tool)
+
+# ============================================================
+# Agent 2: Document Generation Expert
+# ============================================================
+print("\n[Step 2] Creating document generation expert...")
+
+document_writer = SimpleAgent(
+    name="Document Generation Expert",
+    llm=HelloAgentsLLM(),
+    system_prompt="""You are a document generation expert.
+Your task is to generate structured Markdown reports based on provided information.
+
+The report should include:
+- Title
+- Introduction
+- Main content (listed in points, including project names, descriptions, etc.)
+- Summary
+
+Please output the complete Markdown format report content directly, do not use tools to save."""
+)
+
+# Add file system tool
+fs_tool = MCPTool(
+    name="fs",
+    server_command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "."]
+)
+document_writer.add_tool(fs_tool)
+
+# ============================================================
+# Execute Task
+# ============================================================
+print("\n" + "="*70)
+print("Starting task execution...")
+print("="*70)
+
+try:
+    # Step 1: GitHub search
+    print("\n[Step 3] Agent1 searching GitHub...")
+    search_task = "Search for GitHub repositories about 'AI agent', return the top 5 most relevant results"
+
+    search_results = github_searcher.run(search_task)
+
+    print("\nSearch results:")
+    print("-" * 70)
+    print(search_results)
+    print("-" * 70)
+
+    # Step 2: Generate report
+    print("\n[Step 4] Agent2 generating report...")
+    report_task = f"""
+Based on the following GitHub search results, generate a Markdown format research report:
+
+{search_results}
+
+Report requirements:
+1. Title: # AI Agent Framework Research Report
+2. Introduction: Explain this is a GitHub project survey about AI Agents
+3. Main findings: List found projects and their features (including names, descriptions, etc.)
+4. Summary: Summarize common characteristics of these projects
+
+Please output the complete Markdown format report directly.
+"""
+
+    report_content = document_writer.run(report_task)
+
+    print("\nReport content:")
+    print("=" * 70)
+    print(report_content)
+    print("=" * 70)
+
+    # Step 3: Save report
+    print("\n[Step 5] Saving report to file...")
+    import os
+    try:
+        with open("report.md", "w", encoding="utf-8") as f:
+            f.write(report_content)
+        print("✅ Report saved to report.md")
+
+        # Verify file
+        file_size = os.path.getsize("report.md")
+        print(f"✅ File size: {file_size} bytes")
+    except Exception as e:
+        print(f"❌ Save failed: {e}")
+
+    print("\n" + "="*70)
+    print("Task completed!")
+    print("="*70)
+
+except Exception as e:
+    print(f"\n❌ Error: {e}")
+    import traceback
+    traceback.print_exc()
+
+```
+
+During this process, `github_searcher` calls `gh_search_repositories` to search GitHub projects. Its results are passed to `document_writer` as input for generating the report, and the script then saves the report to `report.md`.
+
+### 10.2.5 MCP Community Ecosystem
+
+A huge advantage of the MCP protocol is its **rich community ecosystem**. Anthropic and community developers have created a large number of ready-made MCP servers, covering various scenarios such as file systems, databases, API services, etc. This means you don't need to write tool adapters from scratch and can directly use these verified servers.
+
+Here are three resource repositories for the MCP community:
+
+1. **Awesome MCP Servers** (https://github.com/punkpeye/awesome-mcp-servers)
+   - Community-maintained curated list of MCP servers
+   - Contains various third-party servers
+   - Categorized by function, easy to find
+
+2. **MCP Servers Website** (https://mcpservers.org/)
+   - Web directory of MCP servers
+   - Provides search and filtering functions
+   - Contains usage instructions and examples
+
+3. **Official MCP Servers** (https://github.com/modelcontextprotocol/servers)
+   - Servers officially maintained by Anthropic
+   - Highest quality, most complete documentation
+   - Contains implementations of commonly used services
+
+Tables 10.5 and 10.6 show commonly used official MCP servers and popular community MCP servers:
+
+<div align="center">
+  <p>Table 10.5 Commonly Used Official MCP Servers</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-5.png" alt="" width="85%"/>
+</div>
+
+<div align="center">
+  <p>Table 10.6 Popular Community MCP Servers</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-6.png" alt="" width="85%"/>
+</div>
+
+Here are some particularly interesting use cases you could try building:
+
+1. **Automated Web Testing (Playwright)**
+
+   ```python
+   # Agent can automatically:
+   # - Open browser to visit website
+   # - Fill forms and submit
+   # - Screenshot to verify results
+   # - Generate test reports
+   playwright_tool = MCPTool(
+       name="playwright",
+       server_command=["npx", "-y", "@playwright/mcp"]
+   )
+   ```
+
+2. **Intelligent Note Assistant (Obsidian + Perplexity)**
+   ```python
+   # Agent can:
+   # - Search latest tech news (Perplexity)
+   # - Organize into structured notes
+   # - Save to Obsidian knowledge base
+   # - Automatically establish links between notes
+   ```
+
+3. **Project Management Automation (Jira + GitHub)**
+   ```python
+   # Agent can:
+   # - Create Jira tasks from GitHub Issues
+   # - Sync code commits to Jira
+   # - Automatically update Sprint progress
+   # - Generate project reports
+   ```
+
+4. **Content Creation Workflow (YouTube + Notion + Spotify)**
+
+   ```python
+   # Agent can:
+   # - Get YouTube video subtitles
+   # - Generate content summaries
+   # - Save to Notion database
+   # - Play background music (Spotify)
+   ```
+
+We hope this section encourages you to explore more MCP use cases, and contributions to HelloAgents are welcome! Next, let's learn about the A2A protocol.
+
+## 10.3 A2A Protocol in Practice
+
+A2A (Agent-to-Agent) is a protocol that supports direct communication and collaboration between agents.
+
+### 10.3.1 Protocol Design Motivation
+
+The MCP protocol addresses the interaction between agents and tools, while the A2A protocol addresses collaboration between agents. In a task that requires several agents to work together (such as a researcher, a writer, and an editor), the agents need to communicate, delegate tasks, negotiate capabilities, and synchronize state.
+
+Traditional central coordinator (star topology) solutions have three main problems:
+
+- **Single Point of Failure**: Coordinator failure leads to overall system paralysis.
+- **Performance Bottleneck**: All communication goes through the central node, limiting concurrency.
+- **Difficult to Scale**: Adding or modifying agents requires changing central logic.
+
+The A2A protocol adopts a peer-to-peer (P2P) architecture (mesh topology) in which agents communicate directly, which removes the central node behind the problems above. At its core are two abstract concepts, **Task** and **Artifact**, which are its biggest difference from MCP, as shown in Table 10.7.
+
+<div align="center">
+  <p>Table 10.7 A2A Core Concepts</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-7.png" alt="" width="85%"/>
+</div>
+
+To implement management of the collaboration process, A2A defines a standardized lifecycle for tasks, including states such as creation, negotiation, delegation, in-progress, completion, and failure, as shown in Figure 10.7.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-7.png" alt="" width="85%"/>
+  <p>Figure 10.7 A2A Task Lifecycle</p>
+</div>
+
+
+This mechanism enables agents to perform task negotiation, progress tracking, and exception handling.
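+
A task lifecycle like this is naturally modelled as a small state machine. The sketch below is illustrative only: the state names follow the list given above, but the allowed transitions are an assumption made for the example (they are not taken from the A2A specification or Figure 10.7), and `TaskState`, `ALLOWED`, and `Task` are hypothetical names.

```python
from enum import Enum


class TaskState(Enum):
    CREATED = "created"
    NEGOTIATING = "negotiating"
    DELEGATED = "delegated"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions for illustration; the protocol's actual rules may differ.
ALLOWED = {
    TaskState.CREATED: {TaskState.NEGOTIATING, TaskState.FAILED},
    TaskState.NEGOTIATING: {TaskState.DELEGATED, TaskState.FAILED},
    TaskState.DELEGATED: {TaskState.IN_PROGRESS, TaskState.FAILED},
    TaskState.IN_PROGRESS: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),  # terminal state
    TaskState.FAILED: set(),  # terminal state
}


class Task:
    """Tracks one task's current state and the history of states it passed through."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.state = TaskState.CREATED
        self.history = [TaskState.CREATED]

    def transition(self, new_state: TaskState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"Cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


task = Task("report-001")
for step in (TaskState.NEGOTIATING, TaskState.DELEGATED, TaskState.IN_PROGRESS, TaskState.COMPLETED):
    task.transition(step)

print([state.value for state in task.history])
```

Recording the history gives progress tracking almost for free, and rejecting invalid transitions is the simplest form of the exception handling mentioned above.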
+
+The A2A request lifecycle describes the four main steps a request goes through: agent discovery, authentication, the send-message API, and the send-message-stream API. Figure 10.8, taken from the flowchart on the official website, shows this flow and the interaction between the client, the A2A server, and the authentication server.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-8.png" alt="" width="85%"/>
+  <p>Figure 10.8 A2A Request Lifecycle</p>
+</div>
+
+### 10.3.2 A2A Protocol in Practice
+
+Most existing A2A implementations are still sample code, and even the Python implementations are fairly cumbersome. Here we therefore only approximate the protocol's ideas, implementing part of its functionality on top of the A2A SDK.
+
+**(1) Creating a Simple A2A Agent**
+
+Let's create an A2A agent, again using the calculator case as a demonstration:
+
+```python
+from hello_agents.protocols.a2a.implementation import A2AServer, A2A_AVAILABLE
+
+def create_calculator_agent():
+    """Create a calculator agent"""
+    if not A2A_AVAILABLE:
+        print("❌ A2A SDK not installed, please run: pip install a2a-sdk")
+        return None
+
+    print("🧮 Creating calculator agent")
+
+    # Create A2A server
+    calculator = A2AServer(
+        name="calculator-agent",
+        description="Professional mathematical calculation agent",
+        version="1.0.0",
+        capabilities={
+            "math": ["addition", "subtraction", "multiplication", "division"],
+            "advanced": ["power", "sqrt", "factorial"]
+        }
+    )
+
+    # Add basic calculation skills
+    @calculator.skill("add")
+    def add_numbers(query: str) -> str:
+        """Addition calculation"""
+        try:
+            # Simple parsing of "calculate 5 + 3" format (lowercased so "Calculate" also matches)
+            parts = query.lower().replace("calculate", "").replace("plus", "+").replace("add", "+")
+            if "+" in parts:
+                numbers = [float(x.strip()) for x in parts.split("+")]
+                result = sum(numbers)
+                return f"Calculation result: {' + '.join(map(str, numbers))} = {result}"
+            else:
+                return "Please use format: calculate 5 + 3"
+        except Exception as e:
+            return f"Calculation error: {e}"
+
+    @calculator.skill("multiply")
+    def multiply_numbers(query: str) -> str:
+        """Multiplication calculation"""
+        try:
+            parts = query.lower().replace("calculate", "").replace("times", "*").replace("×", "*")
+            if "*" in parts:
+                numbers = [float(x.strip()) for x in parts.split("*")]
+                result = 1
+                for num in numbers:
+                    result *= num
+                return f"Calculation result: {' × '.join(map(str, numbers))} = {result}"
+            else:
+                return "Please use format: calculate 5 * 3"
+        except Exception as e:
+            return f"Calculation error: {e}"
+
+    @calculator.skill("info")
+    def get_info(query: str) -> str:
+        """Get agent information"""
+        return f"I am {calculator.name}, can perform basic mathematical calculations. Supported skills: {list(calculator.skills.keys())}"
+
+    print(f"✅ Calculator agent created successfully, supported skills: {list(calculator.skills.keys())}")
+    return calculator
+
+# Create agent
+calc_agent = create_calculator_agent()
+if calc_agent:
+    # Test skills
+    print("\n🧪 Testing agent skills:")
+    test_queries = [
+        "Get information",
+        "Calculate 10 + 5",
+        "Calculate 6 * 7"
+    ]
+
+    for query in test_queries:
+        if "information" in query.lower():
+            result = calc_agent.skills["info"](query)
+        elif "+" in query:
+            result = calc_agent.skills["add"](query)
+        elif "*" in query or "×" in query:
+            result = calc_agent.skills["multiply"](query)
+        else:
+            result = "Unknown query type"
+
+        print(f"  📝 Query: {query}")
+        print(f"  🤖 Reply: {result}")
+        print()
+```
+
+**(2) Custom A2A Agent**
+
+You can also create your own A2A agent. Here is a simple demonstration:
+
+```python
+from hello_agents.protocols.a2a.implementation import A2AServer, A2A_AVAILABLE
+
+def create_custom_agent():
+    """Create custom agent"""
+    if not A2A_AVAILABLE:
+        print("Please install A2A SDK first: pip install a2a-sdk")
+        return None
+
+    # Create agent
+    agent = A2AServer(
+        name="my-custom-agent",
+        description="My custom agent",
+        capabilities={"custom": ["skill1", "skill2"]}
+    )
+
+    # Add skills
+    @agent.skill("greet")
+    def greet_user(name: str) -> str:
+        """Greet user"""
+        return f"Hello, {name}! I am a custom agent."
+
+    @agent.skill("calculate")
+    def simple_calculate(expression: str) -> str:
+        """Simple calculation"""
+        try:
+            # Restrict input to digits and basic operators before calling eval()
+            allowed_chars = set('0123456789+-*/(). ')
+            if all(c in allowed_chars for c in expression):
+                result = eval(expression)
+                return f"Calculation result: {expression} = {result}"
+            else:
+                return "Error: Only basic mathematical operations supported"
+        except Exception as e:
+            return f"Calculation error: {e}"
+
+    return agent
+
+# Create and test custom agent
+custom_agent = create_custom_agent()
+if custom_agent:
+    # Test skills
+    print("Testing greeting skill:")
+    result1 = custom_agent.skills["greet"]("Zhang San")
+    print(result1)
+
+    print("\nTesting calculation skill:")
+    result2 = custom_agent.skills["calculate"]("10 + 5 * 2")
+    print(result2)
+```
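
The character whitelist in `simple_calculate` blocks names and function calls, but it still hands the string to `eval`, and an input such as `9**9**9` passes the filter while being very expensive to compute. As one possible alternative, the sketch below walks the parsed expression with the standard-library `ast` module and evaluates only the four basic operators. The `safe_eval` and `_eval_node` names are hypothetical helpers for this example and are not part of HelloAgents.

```python
import ast
import operator

# Only these operators are evaluated; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate a restricted arithmetic AST node."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("Unsupported expression")


def safe_eval(expression: str):
    """Evaluate +, -, *, / and parentheses without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


print(safe_eval("10 + 5 * 2"))
```

Exponentiation, names, and calls all raise `ValueError` here, and malformed input raises `SyntaxError`, so a skill wrapping this helper would catch both and return an error message.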
+
+### 10.3.3 Using HelloAgents A2A Tools
+
+HelloAgents provides a unified A2A tool interface.
+
+**(1) Creating A2A Agent Server**
+
+First, let's create an Agent server:
+
+```python
+from hello_agents.protocols import A2AServer
+import threading
+import time
+
+# Create researcher Agent service
+researcher = A2AServer(
+    name="researcher",
+    description="Agent responsible for searching and analyzing materials",
+    version="1.0.0"
+)
+
+# Define skills
+@researcher.skill("research")
+def handle_research(text: str) -> str:
+    """Handle research requests"""
+    import re
+    match = re.search(r'research\s+(.+)', text, re.IGNORECASE)
+    topic = match.group(1).strip() if match else text
+
+    # Actual research logic (simplified here)
+    result = {
+        "topic": topic,
+        "findings": f"Research results about {topic}...",
+        "sources": ["Source 1", "Source 2", "Source 3"]
+    }
+    return str(result)
+
+# Start service in background
+def start_server():
+    researcher.run(host="localhost", port=5000)
+
+if __name__ == "__main__":
+    server_thread = threading.Thread(target=start_server, daemon=True)
+    server_thread.start()
+
+    print("✅ Researcher Agent service started at http://localhost:5000")
+
+    # Keep program running
+    try:
+        while True:
+            time.sleep(1)
+    except KeyboardInterrupt:
+        print("\nService stopped")
+```
+
+**(2) Creating A2A Agent Client**
+
+Now, let's create a client to communicate with the server:
+
+```python
+from hello_agents.protocols import A2AClient
+
+# Create client to connect to researcher Agent
+client = A2AClient("http://localhost:5000")
+
+# Send research request
+response = client.execute_skill("research", "research AI applications in healthcare")
+print(f"Received response: {response.get('result')}")
+
+# Output:
+# Received response: {'topic': 'AI applications in healthcare', 'findings': 'Research results about AI applications in healthcare...', 'sources': ['Source 1', 'Source 2', 'Source 3']}
+```
+
+**(3) Creating Agent Network**
+
+For collaboration among multiple Agents, we can connect multiple Agents to each other:
+
+```python
+from hello_agents.protocols import A2AServer, A2AClient
+import threading
+import time
+
+# 1. Create multiple Agent services
+researcher = A2AServer(
+    name="researcher",
+    description="Researcher"
+)
+
+@researcher.skill("research")
+def do_research(text: str) -> str:
+    import re
+    match = re.search(r'research\s+(.+)', text, re.IGNORECASE)
+    topic = match.group(1).strip() if match else text
+    return str({"topic": topic, "findings": f"Research results for {topic}"})
+
+writer = A2AServer(
+    name="writer",
+    description="Writer"
+)
+
+@writer.skill("write")
+def write_article(text: str) -> str:
+    import ast
+    import re
+    match = re.search(r'write\s+(.+)', text, re.IGNORECASE)
+    content = match.group(1).strip() if match else text
+
+    # Try to parse research data (literal_eval accepts only Python literals, unlike eval)
+    try:
+        data = ast.literal_eval(content)
+        topic = data.get("topic", "Unknown topic")
+        findings = data.get("findings", "No research results")
+    except (ValueError, SyntaxError, TypeError, AttributeError):
+        topic = "Unknown topic"
+        findings = content
+
+    return f"# {topic}\n\nBased on research: {findings}\n\nArticle content..."
+
+editor = A2AServer(
+    name="editor",
+    description="Editor"
+)
+
+@editor.skill("edit")
+def edit_article(text: str) -> str:
+    import re
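+    # re.DOTALL lets the match span the multi-line article instead of stopping at the first newline
+    match = re.search(r'edit\s+(.+)', text, re.IGNORECASE | re.DOTALL)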
+    article = match.group(1).strip() if match else text
+
+    result = {
+        "article": article + "\n\n[Edited and optimized]",
+        "feedback": "Article quality is good",
+        "approved": True
+    }
+    return str(result)
+
+# 2. Start all services
+threading.Thread(target=lambda: researcher.run(port=5000), daemon=True).start()
+threading.Thread(target=lambda: writer.run(port=5001), daemon=True).start()
+threading.Thread(target=lambda: editor.run(port=5002), daemon=True).start()
+time.sleep(2)  # Wait for services to start
+
+# 3. Create clients to connect to each Agent
+researcher_client = A2AClient("http://localhost:5000")
+writer_client = A2AClient("http://localhost:5001")
+editor_client = A2AClient("http://localhost:5002")
+
+# 4. Collaboration workflow
+def create_content(topic):
+    # Step 1: Research
+    research = researcher_client.execute_skill("research", f"research {topic}")
+    research_data = research.get('result', '')
+
+    # Step 2: Write
+    article = writer_client.execute_skill("write", f"write {research_data}")
+    article_content = article.get('result', '')
+
+    # Step 3: Edit
+    final = editor_client.execute_skill("edit", f"edit {article_content}")
+    return final.get('result', '')
+
+# Usage
+result = create_content("AI applications in healthcare")
+print(f"\nFinal result:\n{result}")
+```
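
The pipeline above passes dictionaries between agents as strings. One alternative worth considering is to serialize with JSON, which gives every agent the same well-defined wire format regardless of implementation language. The sketch below shows only the hand-off, using plain functions in place of A2A skills. The `research` and `write` functions are hypothetical stand-ins, not HelloAgents APIs.

```python
import json


def research(topic: str) -> str:
    """Stand-in for the researcher skill: returns a JSON string."""
    return json.dumps({"topic": topic, "findings": f"Research results for {topic}"})


def write(payload: str) -> str:
    """Stand-in for the writer skill: accepts JSON, falls back to raw text."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    topic = data.get("topic", "Unknown topic")
    findings = data.get("findings", payload)
    return f"# {topic}\n\nBased on research: {findings}"


article = write(research("AI applications in healthcare"))
print(article)
```

The fallback branch keeps the writer usable when an upstream agent sends plain text, which mirrors the behavior of the original `write_article` skill.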
+
+### 10.3.4 Using A2A Tools in Agents
+
+Now let's see how to integrate A2A into HelloAgents agents.
+
+**(1) Using A2ATool Wrapper**
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import A2ATool
+from dotenv import load_dotenv
+
+load_dotenv()
+llm = HelloAgentsLLM()
+
+# Assume a researcher Agent service is already running at http://localhost:5000
+
+# Create coordinator Agent
+coordinator = SimpleAgent(name="Coordinator", llm=llm)
+
+# Add A2A tool, connect to researcher Agent
+researcher_tool = A2ATool(
+    name="researcher",
+    description="Researcher Agent, can search and analyze materials",
+    agent_url="http://localhost:5000"
+)
+coordinator.add_tool(researcher_tool)
+
+# Coordinator can call researcher Agent
+response = coordinator.run("Please have the researcher help me research AI applications in education")
+print(response)
+```
+
+**(2) Practical Case: Intelligent Customer Service System**
+
+Let's build a complete intelligent customer service system with three Agents:
+- **Receptionist**: Analyzes customer question types
+- **Technical Expert**: Answers technical questions
+- **Sales Consultant**: Answers sales questions
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import A2ATool
+from hello_agents.protocols import A2AServer
+import threading
+import time
+from dotenv import load_dotenv
+
+load_dotenv()
+llm = HelloAgentsLLM()
+
+# 1. Create technical expert Agent service
+tech_expert = A2AServer(
+    name="tech_expert",
+    description="Technical expert, answers technical questions"
+)
+
+@tech_expert.skill("answer")
+def answer_tech_question(text: str) -> str:
+    import re
+    match = re.search(r'answer\s+(.+)', text, re.IGNORECASE)
+    question = match.group(1).strip() if match else text
+    # In actual applications, this would call LLM or knowledge base
+    return f"Technical answer: Regarding '{question}', I suggest you check our technical documentation..."
+
+# 2. Create sales consultant Agent service
+sales_advisor = A2AServer(
+    name="sales_advisor",
+    description="Sales consultant, answers sales questions"
+)
+
+@sales_advisor.skill("answer")
+def answer_sales_question(text: str) -> str:
+    import re
+    match = re.search(r'answer\s+(.+)', text, re.IGNORECASE)
+    question = match.group(1).strip() if match else text
+    return f"Sales answer: Regarding '{question}', we have special offers..."
+
+# 3. Start services
+threading.Thread(target=lambda: tech_expert.run(port=6000), daemon=True).start()
+threading.Thread(target=lambda: sales_advisor.run(port=6001), daemon=True).start()
+time.sleep(2)
+
+# 4. Create receptionist Agent (using HelloAgents' SimpleAgent)
+receptionist = SimpleAgent(
+    name="Receptionist",
+    llm=llm,
+    system_prompt="""You are a customer service receptionist, responsible for:
+1. Analyzing customer question types (technical questions or sales questions)
+2. Forwarding questions to appropriate experts
+3. Organizing expert answers and returning them to customers
+
+Please remain polite and professional."""
+)
+
+# Add technical expert tool
+tech_tool = A2ATool(
+    agent_url="http://localhost:6000",
+    name="tech_expert",
+    description="Technical expert, answers technical-related questions"
+)
+receptionist.add_tool(tech_tool)
+
+# Add sales consultant tool
+sales_tool = A2ATool(
+    agent_url="http://localhost:6001",
+    name="sales_advisor",
+    description="Sales consultant, answers price and purchase-related questions"
+)
+receptionist.add_tool(sales_tool)
+
+# 5. Handle customer inquiries
+def handle_customer_query(query):
+    print(f"\nCustomer inquiry: {query}")
+    print("=" * 50)
+    response = receptionist.run(query)
+    print(f"\nCustomer service reply: {response}")
+    print("=" * 50)
+
+# Test different types of questions
+if __name__ == "__main__":
+    handle_customer_query("How do I call your API?")
+    handle_customer_query("What is the price of the enterprise version?")
+    handle_customer_query("How do I integrate it into my Python project?")
+```
+
+**(3) Advanced Usage: Agent Negotiation**
+
+The A2A protocol also supports negotiation mechanisms between Agents:
+
+```python
+from hello_agents.protocols import A2AServer, A2AClient
+import threading
+import time
+
+# Create two Agents that need to negotiate
+agent1 = A2AServer(
+    name="agent1",
+    description="Agent 1"
+)
+
+@agent1.skill("propose")
+def handle_proposal(text: str) -> str:
+    """Handle negotiation proposals"""
+    import ast
+    import re
+
+    # Parse proposal
+    match = re.search(r'propose\s+(.+)', text, re.IGNORECASE)
+    proposal_str = match.group(1).strip() if match else text
+
+    try:
+        # literal_eval accepts only Python literals, unlike eval
+        proposal = ast.literal_eval(proposal_str)
+        task = proposal.get("task")
+        deadline = proposal.get("deadline")
+
+        # Evaluate proposal
+        if deadline >= 7:  # Need at least 7 days
+            result = {"accepted": True, "message": "Proposal accepted"}
+        else:
+            result = {
+                "accepted": False,
+                "message": "Timeline too tight",
+                "counter_proposal": {"deadline": 7}
+            }
+        return str(result)
+    except (ValueError, SyntaxError, TypeError, AttributeError):
+        return str({"accepted": False, "message": "Invalid proposal format"})
+
+agent2 = A2AServer(
+    name="agent2",
+    description="Agent 2"
+)
+
+@agent2.skill("negotiate")
+def negotiate_task(text: str) -> str:
+    """Initiate negotiation"""
+    import re
+
+    # Parse task and deadline
+    match = re.search(r'negotiate\s+task:(.+?)\s+deadline:(\d+)', text, re.IGNORECASE)
+    if match:
+        task = match.group(1).strip()
+        deadline = int(match.group(2))
+
+        # Build the proposal for agent1 (this simplified example returns it instead of sending it)
+        proposal = {"task": task, "deadline": deadline}
+        return str({"status": "negotiating", "proposal": proposal})
+    else:
+        return str({"status": "error", "message": "Invalid negotiation request"})
+
+# Start services
+threading.Thread(target=lambda: agent1.run(port=7000), daemon=True).start()
+threading.Thread(target=lambda: agent2.run(port=7001), daemon=True).start()
+```
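
The example above starts both services but stops short of running a negotiation round. The sketch below shows what the exchange could look like, with the network removed so that it runs on its own. `evaluate_proposal` mirrors the acceptance rule of the `propose` skill. The `negotiate` loop and its `max_rounds` parameter are assumptions for illustration. In a real deployment the call to `evaluate_proposal` would be a remote skill invocation through an A2A client.

```python
def evaluate_proposal(proposal: dict, min_days: int = 7) -> dict:
    """Accept the proposal or answer with a counter-proposal (same rule as agent1)."""
    deadline = proposal.get("deadline")
    if not isinstance(deadline, int):
        return {"accepted": False, "message": "Invalid proposal format"}
    if deadline >= min_days:
        return {"accepted": True, "message": "Proposal accepted"}
    return {
        "accepted": False,
        "message": "Timeline too tight",
        "counter_proposal": {"deadline": min_days},
    }


def negotiate(task: str, deadline: int, max_rounds: int = 3) -> dict:
    """Hypothetical initiator loop: adopt each counter-proposal and retry."""
    proposal = {"task": task, "deadline": deadline}
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        reply = evaluate_proposal(proposal)
        if reply["accepted"]:
            return {"agreed": True, "rounds": rounds, "proposal": proposal}
        counter = reply.get("counter_proposal")
        if counter is None:
            break  # The other side rejected without offering an alternative
        proposal = {**proposal, **counter}
    return {"agreed": False, "rounds": rounds, "proposal": proposal}


result = negotiate("write report", deadline=3)
print(result)
```

Here the initiator simply accepts whatever the other side counters with. A more realistic agent would weigh the counter-proposal against its own constraints before retrying.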
+
+## 10.4 ANP Protocol in Practice
+
+MCP addresses tool invocation and A2A addresses peer-to-peer agent collaboration, as covered in Sections 10.2 and 10.3. The Agent Network Protocol (ANP) targets a different problem: building and managing **large-scale, open agent networks**.
+
+### 10.4.1 Protocol Goals
+
+When a network contains a large number of agents with different functions (e.g., natural language processing, image recognition, data analysis, etc.), the system faces a series of challenges:
+
+- **Service Discovery**: When a new task arrives, how to quickly find agents capable of handling that task?
+- **Intelligent Routing**: If multiple agents can handle the same task, how to choose the most suitable one (e.g., based on load, cost, etc.) and dispatch the task to it?
+- **Dynamic Scaling**: How to make newly joined agents discoverable and callable by other members?
+
+The design goal of ANP is to provide a standardized mechanism to solve the above service discovery, routing selection, and network scalability problems.
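
To make the routing question concrete, the sketch below picks a service by combining load and cost into one score. It is a toy illustration only: the `Candidate` class, the `pick_service` function, and the weights are assumptions for this example and are not defined by ANP.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    service_id: str
    load: float   # 0.0 (idle) to 1.0 (saturated)
    price: float  # cost per call, arbitrary units


def pick_service(candidates, load_weight=0.7, price_weight=0.3):
    """Return the candidate with the lowest weighted score, or None if empty."""
    if not candidates:
        return None
    # Normalize price to 0..1 so it is comparable with load
    max_price = max(c.price for c in candidates) or 1.0

    def score(c):
        return load_weight * c.load + price_weight * (c.price / max_price)

    return min(candidates, key=score)


candidates = [
    Candidate("nlp_agent_1", load=0.3, price=0.01),
    Candidate("nlp_agent_2", load=0.7, price=0.02),
    Candidate("nlp_agent_3", load=0.2, price=0.05),
]
balanced = pick_service(candidates)
load_only = pick_service(candidates, load_weight=1.0, price_weight=0.0)
print(balanced.service_id, load_only.service_id)
```

With the default weights the cheaper, moderately loaded service wins. Routing purely on load instead selects the least busy service regardless of price.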
+
+To achieve its design goals, ANP defines the following core concepts, as shown in Table 10.8:
+
+<div align="center">
+  <p>Table 10.8 ANP Core Concepts</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-8.png" alt="" width="85%"/>
+</div>
+
+We also draw on the official [Getting Started Guide](https://github.com/agent-network-protocol/AgentNetworkProtocol/blob/main/docs/chinese/ANP入门指南.md) to introduce ANP's architectural design, as shown in Figure 10.9.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-9.png" alt="" width="85%"/>
+  <p>Figure 10.9 ANP Overall Process</p>
+</div>
+
+
+In this flowchart, the main steps include:
+
+**1. Service Discovery and Matching:** First, Agent A uses a public discovery service to query based on semantic or functional descriptions to locate Agent B that meets its task requirements. The discovery service establishes an index by pre-crawling standard endpoints (`.well-known/agent-descriptions`) exposed by each agent, thereby achieving dynamic matching between service demanders and providers.
+
+**2. DID-based Identity Verification:** At the start of interaction, Agent A uses its private key to sign a request containing its own DID. After Agent B receives it, it parses the DID to obtain the corresponding public key and uses it to verify the authenticity of the signature and the integrity of the request, thereby establishing trusted communication between both parties.
+
+**3. Standardized Service Execution:** After identity verification passes, Agent B responds to the request, and both parties exchange data or invoke services (such as booking, querying, etc.) according to predefined standard interfaces and data formats. Standardized interaction processes are the foundation for achieving cross-platform and cross-system interoperability.
+
+In summary, the core of this mechanism is using DID to build a decentralized trust foundation and leveraging standardized description protocols to achieve dynamic service discovery. This approach enables agents to form collaborative networks on the internet securely and efficiently without requiring central coordination.
+
+### 10.4.2 Using ANP Service Discovery
+
+**(1) Creating Service Discovery Center**
+
+```python
+from hello_agents.protocols import ANPDiscovery, register_service
+
+# Create service discovery center
+discovery = ANPDiscovery()
+
+# Register Agent services
+register_service(
+    discovery=discovery,
+    service_id="nlp_agent_1",
+    service_name="NLP Processing Expert A",
+    service_type="nlp",
+    capabilities=["text_analysis", "sentiment_analysis", "ner"],
+    endpoint="http://localhost:8001",
+    metadata={"load": 0.3, "price": 0.01, "version": "1.0.0"}
+)
+
+register_service(
+    discovery=discovery,
+    service_id="nlp_agent_2",
+    service_name="NLP Processing Expert B",
+    service_type="nlp",
+    capabilities=["text_analysis", "translation"],
+    endpoint="http://localhost:8002",
+    metadata={"load": 0.7, "price": 0.02, "version": "1.1.0"}
+)
+
+print("✅ Service registration completed")
+```
+
+**(2) Discovering Services**
+
+```python
+from hello_agents.protocols import discover_service
+
+# Find by type
+nlp_services = discover_service(discovery, service_type="nlp")
+print(f"Found {len(nlp_services)} NLP services")
+
+# Select service with lowest load
+best_service = min(nlp_services, key=lambda s: s.metadata.get("load", 1.0))
+print(f"Best service: {best_service.service_name} (load: {best_service.metadata['load']})")
+```
+
+**(3) Building Agent Network**
+
+```python
+from hello_agents.protocols import ANPNetwork
+
+# Create network
+network = ANPNetwork(network_id="ai_cluster")
+
+# Add nodes
+for service in discovery.list_all_services():
+    network.add_node(service.service_id, service.endpoint)
+
+# Establish connections (based on capability matching)
+network.connect_nodes("nlp_agent_1", "nlp_agent_2")
+
+stats = network.get_network_stats()
+print(f"✅ Network construction completed, total {stats['total_nodes']} nodes")
+```
+
+### 10.4.3 Practical Case
+
+Let's build a complete distributed task scheduling system:
+
+```python
+from hello_agents.protocols import ANPDiscovery, register_service
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools.builtin import ANPTool
+import random
+from dotenv import load_dotenv
+
+load_dotenv()
+llm = HelloAgentsLLM()
+
+# 1. Create service discovery center
+discovery = ANPDiscovery()
+
+# 2. Register multiple compute nodes
+for i in range(10):
+    register_service(
+        discovery=discovery,
+        service_id=f"compute_node_{i}",
+        service_name=f"Compute Node {i}",
+        service_type="compute",
+        capabilities=["data_processing", "ml_training"],
+        endpoint=f"http://node{i}:8000",
+        metadata={
+            "load": random.uniform(0.1, 0.9),
+            "cpu_cores": random.choice([4, 8, 16]),
+            "memory_gb": random.choice([16, 32, 64]),
+            "gpu": random.choice([True, False])
+        }
+    )
+
+print(f"✅ Registered {len(discovery.list_all_services())} compute nodes")
+
+# 3. Create task scheduler Agent
+scheduler = SimpleAgent(
+    name="Task Scheduler",
+    llm=llm,
+    system_prompt="""You are an intelligent task scheduler, responsible for:
+1. Analyzing task requirements
+2. Selecting the most suitable compute node
+3. Assigning tasks
+
+When selecting nodes, consider: load, CPU cores, memory, GPU, and other factors."""
+)
+
+# Add ANP tool
+anp_tool = ANPTool(
+    name="service_discovery",
+    description="Service discovery tool, can find and select compute nodes",
+    discovery=discovery
+)
+scheduler.add_tool(anp_tool)
+
+# 4. Intelligent task assignment
+def assign_task(task_description):
+    print(f"\nTask: {task_description}")
+    print("=" * 50)
+
+    # Let Agent intelligently select node
+    response = scheduler.run(f"""
+    Please select the most suitable compute node for the following task:
+    {task_description}
+
+    Requirements:
+    1. List all available nodes
+    2. Analyze characteristics of each node
+    3. Select the most suitable node
+    4. Explain selection reasoning
+    """)
+
+    print(response)
+    print("=" * 50)
+
+# Test different types of tasks
+assign_task("Train a large deep learning model, requires GPU support")
+assign_task("Process large amounts of text data, requires high memory")
+assign_task("Run lightweight data analysis task")
+```
+
+The following example shows simple load balancing on top of service discovery:
+
+```python
+from hello_agents.protocols import ANPDiscovery, register_service
+import random
+
+# Create service discovery center
+discovery = ANPDiscovery()
+
+# Register multiple services of the same type
+for i in range(5):
+    register_service(
+        discovery=discovery,
+        service_id=f"api_server_{i}",
+        service_name=f"API Server {i}",
+        service_type="api",
+        capabilities=["rest_api"],
+        endpoint=f"http://api{i}:8000",
+        metadata={"load": random.uniform(0.1, 0.9)}
+    )
+
+# Load balancing function
+def get_best_server():
+    """Select server with lowest load"""
+    servers = discovery.discover_services(service_type="api")
+    if not servers:
+        return None
+
+    best = min(servers, key=lambda s: s.metadata.get("load", 1.0))
+    return best
+
+# Simulate request allocation
+for i in range(10):
+    server = get_best_server()
+    print(f"Request {i+1} -> {server.service_name} (load: {server.metadata['load']:.2f})")
+
+    # Update load (simulated)
+    server.metadata["load"] += 0.1
+```
+
+## 10.5 Building Custom MCP Servers
+
+In previous sections, we learned how to use existing MCP services. We also learned about the characteristics of different protocols. Now, let's learn how to build our own MCP server.
+
+### 10.5.1 Creating Your First MCP Server
+
+**(1) Why Build a Custom MCP Server?**
+
+Although you can directly use public MCP services, in many practical application scenarios, you need to build custom MCP servers to meet specific needs.
+
+Main motivations include the following:
+
+- **Encapsulating Business Logic**: Encapsulate enterprise-specific business processes or complex operations as standardized MCP tools for unified invocation by agents.
+- **Accessing Private Data**: Create a secure and controllable interface or proxy for accessing internal databases, APIs, or other private data sources that cannot be exposed to the public network.
+- **Performance Optimization**: Perform deep optimization for high-frequency calls or application scenarios with strict response latency requirements.
+- **Custom Feature Extension**: Implement specific functions not provided by standard MCP services, such as integrating proprietary algorithm models or connecting to specific hardware devices.
+
+**(2) Teaching Case: Weather Query MCP Server**
+
+Let's start with a simple weather query server and gradually learn MCP server development:
+
+```python
+#!/usr/bin/env python3
+"""Weather Query MCP Server"""
+
+import json
+import requests
+import os
+from datetime import datetime
+from typing import Dict, Any
+from hello_agents.protocols import MCPServer
+
+# Create MCP server
+weather_server = MCPServer(name="weather-server", description="Real weather query service")
+
+CITY_MAP = {
+    "Beijing": "Beijing", "Shanghai": "Shanghai", "Guangzhou": "Guangzhou",
+    "Shenzhen": "Shenzhen", "Hangzhou": "Hangzhou", "Chengdu": "Chengdu",
+    "Chongqing": "Chongqing", "Wuhan": "Wuhan", "Xi'an": "Xi'an",
+    "Nanjing": "Nanjing", "Tianjin": "Tianjin", "Suzhou": "Suzhou"
+}
+
+
+def get_weather_data(city: str) -> Dict[str, Any]:
+    """Get weather data from wttr.in"""
+    city_en = CITY_MAP.get(city, city)
+    url = f"https://wttr.in/{city_en}?format=j1"
+    response = requests.get(url, timeout=10)
+    response.raise_for_status()
+    data = response.json()
+    current = data["current_condition"][0]
+
+    return {
+        "city": city,
+        "temperature": float(current["temp_C"]),
+        "feels_like": float(current["FeelsLikeC"]),
+        "humidity": int(current["humidity"]),
+        "condition": current["weatherDesc"][0]["value"],
+        "wind_speed": round(float(current["windspeedKmph"]) / 3.6, 1),
+        "visibility": float(current["visibility"]),
+        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    }
+
+
+# Define tool function
+def get_weather(city: str) -> str:
+    """Get current weather for specified city"""
+    try:
+        weather_data = get_weather_data(city)
+        return json.dumps(weather_data, ensure_ascii=False, indent=2)
+    except Exception as e:
+        return json.dumps({"error": str(e), "city": city}, ensure_ascii=False)
+
+
+def list_supported_cities() -> str:
+    """List all supported Chinese cities"""
+    result = {"cities": list(CITY_MAP.keys()), "count": len(CITY_MAP)}
+    return json.dumps(result, ensure_ascii=False, indent=2)
+
+
+def get_server_info() -> str:
+    """Get server information"""
+    info = {
+        "name": "Weather MCP Server",
+        "version": "1.0.0",
+        "tools": ["get_weather", "list_supported_cities", "get_server_info"]
+    }
+    return json.dumps(info, ensure_ascii=False, indent=2)
+
+
+# Register tools to server
+weather_server.add_tool(get_weather)
+weather_server.add_tool(list_supported_cities)
+weather_server.add_tool(get_server_info)
+
+
+if __name__ == "__main__":
+    weather_server.run()
+```
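
Because `get_weather_data` mixes the HTTP request with the response parsing, it can only be exercised with network access. One option, sketched below, is to move the parsing into its own function so that it can be checked against a fixed payload. The `parse_current_condition` name and the sample payload are hypothetical. The payload is hand-written for illustration and reuses only the field names read by the server code above.

```python
def parse_current_condition(city: str, payload: dict) -> dict:
    """Extract the fields used by the weather tool from a wttr.in-style payload."""
    current = payload["current_condition"][0]
    return {
        "city": city,
        "temperature": float(current["temp_C"]),
        "humidity": int(current["humidity"]),
        "condition": current["weatherDesc"][0]["value"],
        # Convert km/h to m/s and keep one decimal place
        "wind_speed": round(float(current["windspeedKmph"]) / 3.6, 1),
    }


# Hand-written sample payload (illustrative values, not real data)
sample = {
    "current_condition": [
        {
            "temp_C": "10",
            "humidity": "94",
            "windspeedKmph": "6",
            "weatherDesc": [{"value": "Light rain"}],
        }
    ]
}

parsed = parse_current_condition("Beijing", sample)
print(parsed)
```

With this split, the network-facing function only fetches JSON and delegates to the parser, and the unit conversion can be verified offline.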
+
+**(3) Testing Custom MCP Server**
+
+Then create a test script:
+
+```python
+#!/usr/bin/env python3
+"""Test Weather Query MCP Server"""
+
+import asyncio
+import json
+import sys
+import os
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'HelloAgents'))
+from hello_agents.protocols.mcp.client import MCPClient
+
+
+async def test_weather_server():
+    server_script = os.path.join(os.path.dirname(__file__), "14_weather_mcp_server.py")
+    client = MCPClient(["python", server_script])
+
+    try:
+        async with client:
+            # Test 1: Get server information
+            info = json.loads(await client.call_tool("get_server_info", {}))
+            print(f"Server: {info['name']} v{info['version']}")
+
+            # Test 2: List supported cities
+            cities = json.loads(await client.call_tool("list_supported_cities", {}))
+            print(f"Supported cities: {cities['count']} cities")
+
+            # Test 3: Query Beijing weather
+            weather = json.loads(await client.call_tool("get_weather", {"city": "Beijing"}))
+            if "error" not in weather:
+                print(f"\nBeijing weather: {weather['temperature']}°C, {weather['condition']}")
+
+            # Test 4: Query Shenzhen weather
+            weather = json.loads(await client.call_tool("get_weather", {"city": "Shenzhen"}))
+            if "error" not in weather:
+                print(f"Shenzhen weather: {weather['temperature']}°C, {weather['condition']}")
+
+            print("\n✅ All tests completed!")
+
+    except Exception as e:
+        print(f"❌ Test failed: {e}")
+
+
+if __name__ == "__main__":
+    asyncio.run(test_weather_server())
+```
+
+**(4) Using Custom MCP Server in Agent**
+
+```python
+"""Using Weather MCP Server in Agent"""
+
+import os
+from dotenv import load_dotenv
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import MCPTool
+
+load_dotenv()
+
+
+def create_weather_assistant():
+    """Create weather assistant"""
+    llm = HelloAgentsLLM()
+
+    assistant = SimpleAgent(
+        name="Weather Assistant",
+        llm=llm,
+        system_prompt="""You are a weather assistant that can query city weather.
+Use the get_weather tool to query weather, supports Chinese city names.
+"""
+    )
+
+    # Add weather MCP tool
+    server_script = os.path.join(os.path.dirname(__file__), "14_weather_mcp_server.py")
+    weather_tool = MCPTool(server_command=["python", server_script])
+    assistant.add_tool(weather_tool)
+
+    return assistant
+
+
+def demo():
+    """Demo"""
+    assistant = create_weather_assistant()
+
+    print("\nQuery Beijing weather:")
+    response = assistant.run("How's the weather in Beijing today?")
+    print(f"Answer: {response}\n")
+
+
+def interactive():
+    """Interactive mode"""
+    assistant = create_weather_assistant()
+
+    while True:
+        user_input = input("\nYou: ").strip()
+        if user_input.lower() in ['quit', 'exit']:
+            break
+        response = assistant.run(user_input)
+        print(f"Assistant: {response}")
+
+
+if __name__ == "__main__":
+    import sys
+    if len(sys.argv) > 1 and sys.argv[1] == "demo":
+        demo()
+    else:
+        interactive()
+```
+
+Running the script in interactive mode produces output similar to the following:
+
+```
+🔗 Connecting to MCP server...
+✅ Connection successful!
+🔌 Connection disconnected
+✅ Tool 'mcp_get_weather' registered.
+✅ Tool 'mcp_list_supported_cities' registered.
+✅ Tool 'mcp_get_server_info' registered.
+✅ MCP tool 'mcp' expanded into 3 independent tools
+
+You: I want to query Beijing's weather
+🔗 Connecting to MCP server...
+✅ Connection successful!
+🔌 Connection disconnected
+Assistant: The current weather in Beijing is as follows:
+
+- Temperature: 10.0°C
+- Feels like: 9.0°C
+- Humidity: 94%
+- Weather condition: Light rain
+- Wind speed: 1.7 m/s
+- Visibility: 10.0 km
+- Timestamp: October 9, 2025 13:46:40
+
+Please bring rain gear and adjust your clothing according to weather changes.
+```
+
+### 10.5.2 Uploading MCP Server
+
+We created a real weather query MCP server. Now, let's publish it to the Smithery platform so developers worldwide can use our service.
+
+(1) What is Smithery?
+
+[Smithery](https://smithery.ai/) is a community registry and hosting platform for MCP servers, playing a role similar to PyPI for Python or npm for Node.js. Through Smithery, users can:
+
+- 🔍 Discover and search for MCP servers
+- 📦 Install MCP servers with one click
+- 📊 View server usage statistics and ratings
+- 🔄 Automatically get server updates
+
+(2) Preparing for Publication
+
+First, organize the project into a standard publishing layout. A ready-made version of this folder is available in the `code` directory for reference:
+
+```
+weather-mcp-server/
+├── README.md           # Project documentation
+├── LICENSE            # Open source license
+├── Dockerfile         # Docker build configuration (recommended)
+├── pyproject.toml     # Python project configuration (required)
+├── requirements.txt   # Python dependencies
+├── smithery.yaml      # Smithery configuration file (required)
+└── server.py          # MCP server main file
+```
+
+Note that `smithery.yaml` is the configuration file for the Smithery platform:
+```yaml
+name: weather-mcp-server
+displayName: Weather MCP Server
+description: Real-time weather query MCP server based on HelloAgents framework
+version: 1.0.0
+author: HelloAgents Team
+homepage: https://github.com/yourusername/weather-mcp-server
+license: MIT
+categories:
+  - weather
+  - data
+tags:
+  - weather
+  - real-time
+  - helloagents
+  - wttr
+runtime: container
+build:
+  dockerfile: Dockerfile
+  dockerBuildPath: .
+startCommand:
+  type: http
+tools:
+  - name: get_weather
+    description: Get current weather for a city
+  - name: list_supported_cities
+    description: List all supported cities
+  - name: get_server_info
+    description: Get server information
+```
+
+Configuration explanation:
+
+- `name`: Unique identifier for the server (lowercase, hyphen-separated)
+- `displayName`: Display name
+- `description`: Brief description
+- `version`: Version number (follows semantic versioning)
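+- `runtime`: Runtime type; `container` here, so the server is built from the Dockerfile
+- `build`: Location of the Dockerfile and the Docker build path
+- `startCommand`: How the server is started; `type: http` here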
+- `tools`: Tool list
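
The naming and versioning rules listed above lend themselves to a quick check before publishing. The sketch below is a minimal illustration, not part of Smithery or HelloAgents: `check_metadata`, the two regular expressions, and the unique-tool-name rule are assumptions introduced here, and the version pattern only covers the plain `MAJOR.MINOR.PATCH` form.

```python
import re

# Hypothetical patterns mirroring the field descriptions above.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")  # lowercase, hyphen-separated
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")           # simplified MAJOR.MINOR.PATCH


def check_metadata(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not NAME_RE.match(meta.get("name", "")):
        problems.append("name must be lowercase and hyphen-separated")
    if not SEMVER_RE.match(meta.get("version", "")):
        problems.append("version must look like MAJOR.MINOR.PATCH")
    # Assumption: tool names should not repeat within one server.
    tool_names = [tool.get("name") for tool in meta.get("tools", [])]
    if len(tool_names) != len(set(tool_names)):
        problems.append("tool names must be unique")
    return problems


# The same values as in the smithery.yaml shown above, written as a plain dict.
meta = {
    "name": "weather-mcp-server",
    "version": "1.0.0",
    "tools": [
        {"name": "get_weather"},
        {"name": "list_supported_cities"},
        {"name": "get_server_info"},
    ],
}
print(check_metadata(meta))  # prints []
```

A check like this is cheap to run locally and catches naming mistakes before the repository is submitted.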
+
+`pyproject.toml` is the standard configuration file for Python projects. Smithery requires it because the project is later packaged and run as a server:
+
+```toml
+[build-system]
+requires = ["setuptools>=61.0", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "weather-mcp-server"
+version = "1.0.0"
+description = "Real-time weather query MCP server based on HelloAgents framework"
+readme = "README.md"
+license = {text = "MIT"}
+authors = [
+    {name = "HelloAgents Team", email = "xxx"}
+]
+requires-python = ">=3.10"
+dependencies = [
+    "hello-agents>=0.2.1",
+    "requests>=2.31.0",
+]
+
+[project.urls]
+Homepage = "https://github.com/yourusername/weather-mcp-server"
+Repository = "https://github.com/yourusername/weather-mcp-server"
+"Bug Tracker" = "https://github.com/yourusername/weather-mcp-server/issues"
+
+[tool.setuptools]
+py-modules = ["server"]
+```
+
+
+Configuration explanation:
+
+- `[build-system]`: Specifies build tool (setuptools)
+- `[project]`: Project metadata
+  - `name`: Project name
+  - `version`: Version number (follows semantic versioning)
+  - `dependencies`: Project dependency list
+  - `requires-python`: Python version requirement
+- `[project.urls]`: Project-related links
+- `[tool.setuptools]`: setuptools configuration
+
+Although Smithery can generate a Dockerfile automatically, providing a custom one makes the deployment more predictable:
+
+```dockerfile
+# Single-stage build for weather-mcp-server
+FROM python:3.12-slim-bookworm AS base
+
+# Set working directory
+WORKDIR /app
+
+# Install system dependencies (no packages are listed yet; add package names to the install step as needed)
+RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy project files
+COPY pyproject.toml requirements.txt ./
+COPY server.py ./
+
+# Install Python dependencies
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt
+
+# Set environment variables
+ENV PYTHONUNBUFFERED=1
+ENV PORT=8081
+
+# Expose port (Smithery uses 8081)
+EXPOSE 8081
+
+# Health check (placeholder: it only confirms that the Python interpreter starts)
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD python -c "import sys; sys.exit(0)"
+
+# Run the MCP server
+CMD ["python", "server.py"]
+```
+
+Dockerfile configuration explanation:
+
+- **Base Image**: `python:3.12-slim-bookworm` - Lightweight Python image
+- **Working Directory**: `/app` - Application root directory
+- **Port**: `8081` - Smithery platform standard port
+- **Start Command**: `python server.py` - Run MCP server
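
The `HEALTHCHECK` above runs `python -c "import sys; sys.exit(0)"`, which passes whenever the interpreter starts, regardless of whether the server is reachable. A stricter probe could test the port instead. The sketch below assumes the server accepts TCP connections on `PORT` inside the container; `port_is_open` and `health_exit_code` are hypothetical helpers, not part of HelloAgents or Smithery.

```python
import os
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def health_exit_code(host: str = "127.0.0.1") -> int:
    """Map the probe result to a process exit code: 0 is healthy, 1 is unhealthy."""
    port = int(os.environ.get("PORT", "8081"))  # same default as ENV PORT above
    return 0 if port_is_open(host, port) else 1
```

Saved as a script (for example a hypothetical `healthcheck.py` that ends with `raise SystemExit(health_exit_code())`), it could replace the inline command in the `HEALTHCHECK` instruction.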
+
+Next, fork the `hello-agents` repository, take the source code from the `code` directory, and create a repository named `weather-mcp-server` under your own GitHub account, replacing `yourusername` with your GitHub username.
+
+(3) Submit to Smithery
+
+Open your browser and visit [https://smithery.ai/](https://smithery.ai/). Log in to Smithery using your GitHub account. Click the "Publish Server" button on the page, enter your GitHub repository URL: `https://github.com/yourusername/weather-mcp-server`, and wait for publication.
+
+Once publication is complete, you will see a page similar to the one shown in Figure 10.10:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-10.png" alt="" width="85%"/>
+  <p>Figure 10.10 Smithery Publication Success Page</p>
+</div>
+
+
+
+Once the server is successfully published, users can use it in the following ways:
+
+Method 1: Through Smithery CLI
+
+```bash
+# Install Smithery CLI
+npm install -g @smithery/cli
+
+# Install your server
+smithery install weather-mcp-server
+```
+
+Method 2: Configure in Claude Desktop
+
+```json
+{
+  "mcpServers": {
+    "weather": {
+      "command": "smithery",
+      "args": ["run", "weather-mcp-server"]
+    }
+  }
+}
+```
+
+Method 3: Use in HelloAgents
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools.builtin.protocol_tools import MCPTool
+
+agent = SimpleAgent(name="Weather Assistant", llm=HelloAgentsLLM())
+
+# Use Smithery-installed server
+weather_tool = MCPTool(
+    server_command=["smithery", "run", "weather-mcp-server"]
+)
+agent.add_tool(weather_tool)
+
+response = agent.run("How's the weather in Beijing today?")
+```
+
+These are only examples; there are more ways to use the server that you can explore on your own. Figure 10.11 shows the information displayed once an MCP tool is published: the service name "Weather", its unique identifier `@jjyaoao/weather-mcp-server`, and its status. The Tools area lists the methods we just implemented, and the Connect area provides what is needed to connect to the service, including its **access URL** and **configuration snippets** for multiple languages and environments. To learn more, see the [server page](https://smithery.ai/server/@jjyaoao/weather-mcp-server).
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-11.png" alt="" width="85%"/>
+  <p>Figure 10.11 Successfully Published MCP Tool on Smithery</p>
+</div>
+
+Now it's time to create your own MCP server!
+
+
+
+## 10.6 Chapter Summary
+
+This chapter systematically introduced three core protocols for agent communication: MCP, A2A, and ANP, and explored their design philosophies, application scenarios, and practical methods.
+
+**Protocol Positioning:**
+
+- **MCP (Model Context Protocol)**: The bridge between agents and tools. It provides a unified tool access interface and suits enhancing the capabilities of an individual agent.
+- **A2A (Agent-to-Agent Protocol)**: The dialogue system between agents. It supports direct communication and task negotiation and suits close collaboration in small teams.
+- **ANP (Agent Network Protocol)**: The "internet" for agents. It provides service discovery, routing, and load balancing mechanisms and suits building large-scale, open agent networks.
+
+**HelloAgents Integration Solution**
+
+In the `HelloAgents` framework, all three protocols are abstracted behind the same `Tool` interface, so developers can add different levels of communication capability to an agent in a uniform way:
+
+```python
+# Unified Tool interface
+from hello_agents.tools import MCPTool, A2ATool, ANPTool
+
+# All protocols can be added to an existing Agent instance as Tools (constructor arguments elided)
+agent.add_tool(MCPTool(...))
+agent.add_tool(A2ATool(...))
+agent.add_tool(ANPTool(...))
+```
+
+**Practical Experience Summary**
+
+- Prioritize using mature community MCP services to reduce unnecessary redundant development.
+- Choose appropriate protocols based on system scale: A2A is recommended for small-scale collaboration scenarios, while ANP should be used for large-scale network scenarios.
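
The selection guidance in this summary can be condensed into a small helper. This is only a sketch of the rule of thumb: `recommend_protocols` and the `small_team_limit` threshold of 10 agents are illustrative assumptions, not values defined by any of the protocols or by HelloAgents.

```python
from enum import Enum


class Protocol(Enum):
    MCP = "MCP"  # agent-to-tool access
    A2A = "A2A"  # direct agent-to-agent collaboration
    ANP = "ANP"  # discovery and routing in large agent networks


def recommend_protocols(needs_tools: bool, agent_count: int,
                        open_network: bool = False,
                        small_team_limit: int = 10) -> list[Protocol]:
    """Apply the chapter's rule of thumb; the threshold is a hypothetical default."""
    recommended = []
    if needs_tools:
        recommended.append(Protocol.MCP)
    if agent_count > 1:
        # Large or open networks call for ANP; small closed teams for A2A.
        large = open_network or agent_count > small_team_limit
        recommended.append(Protocol.ANP if large else Protocol.A2A)
    return recommended


print([p.value for p in recommend_protocols(needs_tools=True, agent_count=3)])
# prints ['MCP', 'A2A']
```

The function returns a list because the protocols are complementary: a system can use MCP for tool access alongside A2A or ANP for agent-to-agent communication.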
+
+After completing this chapter, it is recommended that you:
+
+1. **Hands-on Practice**:
+   - Build your own MCP server
+   - Create multi-agent collaboration systems using protocols
+   - Explore strategies for combining MCP, A2A, and ANP
+2. **In-depth Learning**:
+   - Read MCP official documentation: https://modelcontextprotocol.io
+   - Read A2A official documentation: https://a2a-protocol.org/latest/
+   - Read ANP official documentation: https://agent-network-protocol.com/guide/
+3. **Participate in Community**:
+   - Contribute new MCP services to the community
+   - Share your own developed agent implementation cases
+   - Join technical standard discussions for the related protocols, ask questions in Issues, or contribute new example cases to HelloAgents
+
+**Congratulations on completing Chapter 10!**
+
+You have now mastered the core knowledge of agent communication protocols. Keep up the good work! 🚀
+
+## Exercises
+
+> **Note**: Some exercises do not have standard answers. The focus is on cultivating learners' comprehensive understanding and practical ability in agent communication protocols.
+
+1. This chapter introduced three agent communication protocols: MCP, A2A, and ANP. Please analyze:
+
+   - Section 10.1.2 compared the design philosophies of the three protocols. Please analyze in depth: Why does MCP emphasize "context sharing", A2A emphasize "conversational collaboration", and ANP emphasize "network topology"? What core problems do these design philosophies solve respectively?
+   - Suppose you want to build an "intelligent customer service system" that requires the following functions: (1) Access customer database and order system; (2) Multiple professional customer service agents collaborate to handle complex problems; (3) Support large-scale concurrent user requests. Please select the most appropriate protocol for each function and explain your reasoning.
+   - Can the three protocols be used in combination? Please design a practical application scenario showing how to use MCP, A2A, and ANP simultaneously to build a complete agent system. Draw a system architecture diagram and explain the responsibilities of each protocol.
+
+2. MCP (Model Context Protocol) is the standard protocol for agent-tool communication. Based on the content in Section 10.2, please think deeply:
+
+   > **Note**: This is a hands-on exercise; working through it in practice is recommended.
+
+   - In the MCP server implementation in Section 10.2.3, we defined core methods such as `list_tools` and `call_tool`. Please extend this implementation by adding a new MCP server that provides the following tools: (1) Database query tool; (2) Data visualization tool; (3) Report generation tool. The tools should be able to work together to complete complex data analysis tasks.
+   - The MCP protocol supports two important concepts: "Resources" and "Prompts", but this chapter mainly focuses on "Tools". Please consult the MCP official documentation to understand the design purposes of Resources and Prompts, and design an application scenario showing how to use these three core concepts to build a more powerful agent system.
+   - MCP uses JSON-RPC 2.0 as the underlying communication protocol and communicates between processes via stdio. Please analyze: What are the advantages and limitations of this design? If you need to support remote MCP servers (accessed via HTTP/WebSocket), how should the current implementation be extended?
+
+3. A2A (Agent-to-Agent Protocol) supports conversational collaboration between agents. Based on the content in Section 10.3, please complete the following extended practice:
+
+   > **Note**: This is a hands-on exercise; working through it in practice is recommended.
+
+   - In the "research team" case in Section 10.3.4, researchers and writers collaborate through the A2A protocol to complete paper writing. Please extend this case by adding a third agent "Reviewer", which can review paper quality and provide revision suggestions. Design the collaboration process among the three agents and implement complete code.
+   - The A2A protocol defines message types such as `task` and `task_result`. Please analyze: If conflicts occur during collaboration (such as two agents having different opinions on the same issue), how should a conflict resolution mechanism be designed? Please extend the A2A protocol by adding message types such as "negotiation" and "voting".
+   - Compare the A2A protocol with multi-agent frameworks such as AutoGen and CAMEL introduced in Chapter 6: What is the relationship between A2A as a standard protocol and these frameworks? Can they replace each other? Please design a solution that allows agents based on the A2A protocol to communicate with agents in the AutoGen framework.
+
+4. ANP (Agent Network Protocol) supports large-scale agent networks. Based on the content in Section 10.4, please analyze in depth:
+
+   - Section 10.4.2 introduced ANP's network topology design, including star, mesh, hierarchical, and other structures. Please analyze: Which topology suits which scenarios? If the network grows from 10 agents to 1000 agents, how should the topology evolve?
+   - The ANP protocol supports "routing" and "discovery" mechanisms, allowing agents to dynamically find suitable collaboration partners. Please design an "intelligent routing algorithm": automatically select the optimal message routing path based on task type, agent capabilities, network load, and other factors.
+   - In the "smart city" case in Section 10.4.4, multiple agents collaborate to manage city systems. Please think: If a critical agent (such as a traffic management agent) fails, how should the entire system respond? Please design a "fault tolerance mechanism", including fault detection, backup switching, state recovery, and other functions.
+
+5. Security and privacy protection of agent communication protocols are key issues in practical applications. Please think:
+
+   - In the MCP client implementation in Section 10.2.4, agents can call any tool provided by the MCP server. Please analyze: What security risks does this design have? If the MCP server provides dangerous operations (such as deleting files, executing system commands), how should a permission control mechanism be designed?
+   - A2A and ANP protocols involve communication between multiple agents, which may contain sensitive information (such as user privacy data, business secrets). Please design an "end-to-end encryption" solution: ensure that messages are not eavesdropped or tampered with during transmission, while supporting agent identity authentication and access control.
+   - In large-scale agent networks, malicious agents may send false information, launch denial-of-service attacks, or steal data from other agents. Please design a "trust evaluation system": dynamically evaluate the trustworthiness of each agent based on historical behavior, collaboration quality, community evaluation, and other factors, and adjust communication strategies accordingly.
+
+## References
+
+[1] Anthropic. (2024). *Model Context Protocol*. Retrieved October 7, 2025, from https://modelcontextprotocol.io/
+
+[2] The A2A Project. (2025). *A2A Protocol: An open protocol for agent-to-agent communication*. Retrieved October 7, 2025, from https://a2a-protocol.org/
+
+[3] Chang, G., Lin, E., Yuan, C., Cai, R., Chen, B., Xie, X., & Zhang, Y. (2025). *Agent Network Protocol technical white paper*. arXiv. https://doi.org/10.48550/arXiv.2508.00007
+

+ 185 - 181
docs/chapter10/第十章 智能体通信协议.md

@@ -1,8 +1,12 @@
+<div align="right">
+  <a href="./Chapter10-Agent-Communication-Protocols.md">English</a> | 中文
+</div>
+
 # 第十章 智能体通信协议
 
-在前面的章节中,我们构建了功能完备的单体智能体,它们具备推理、工具调用和记忆能力。然而,当我们尝试构建更复杂的AI系统时,自然会有疑问:<strong>如何让智能体与外部世界高效交互?如何让多个智能体相互协作?</strong>
+在前面的章节中,我们构建了功能完备的单体智能体,它们具备推理、工具调用和记忆能力。然而,当我们尝试构建更复杂的 AI 系统时,自然会有疑问:<strong>如何让智能体与外部世界高效交互?如何让多个智能体相互协作?</strong>
 
-这正是智能体通信协议要解决的核心问题。本章将为HelloAgents框架引入三种通信协议:<strong>MCP(Model Context Protocol)</strong>用于智能体与工具的标准化通信,<strong>A2A(Agent-to-Agent Protocol)</strong>用于智能体间的点对点协作,<strong>ANP(Agent Network Protocol)</strong>用于构建大规模智能体网络。这三种协议共同构成了智能体通信的基础设施层。
+这正是智能体通信协议要解决的核心问题。本章将为 HelloAgents 框架引入三种通信协议:<strong>MCP(Model Context Protocol)</strong>用于智能体与工具的标准化通信,<strong>A2A(Agent-to-Agent Protocol)</strong>用于智能体间的点对点协作,<strong>ANP(Agent Network Protocol)</strong>用于构建大规模智能体网络。这三种协议共同构成了智能体通信的基础设施层。
 
 通过本章的学习,您将掌握智能体通信协议的设计理念和实践技能,理解三种主流协议的设计差异,学会如何选择合适的协议来解决实际问题。
 
@@ -10,7 +14,7 @@
 
 ### 10.1.1 为何需要通信协议
 
-回顾我们在第七章构建的ReAct智能体,它已经具备了强大的推理和工具调用能力。让我们看一个典型的使用场景:
+回顾我们在第七章构建的 ReAct 智能体,它已经具备了强大的推理和工具调用能力。让我们看一个典型的使用场景:
 
 ```python
 from hello_agents import ReActAgent, HelloAgentsLLM
@@ -25,7 +29,7 @@ agent.add_tool(SearchTool())
 response = agent.run("搜索最新的AI新闻,并计算相关公司的市值总和")
 ```
 
-这个智能体工作得很好,但它面临着三个根本性的限制。首先是<strong>工具集成的困境</strong>:每当需要访问新的外部服务(如GitHub API、数据库、文件系统),我们都必须编写专门的Tool类。这不仅工作量大,而且不同开发者编写的工具无法互相兼容。其次是<strong>能力扩展的瓶颈</strong>:智能体的能力被限制在预先定义的工具集内,无法动态发现和使用新的服务。最后是<strong>协作的缺失</strong>:当任务复杂到需要多个专业智能体协作时(如研究员+撰写员+编辑),我们只能通过手动编排来协调它们的工作。
+这个智能体工作得很好,但它面临着三个根本性的限制。首先是<strong>工具集成的困境</strong>:每当需要访问新的外部服务(如 GitHub API、数据库、文件系统),我们都必须编写专门的 Tool 类。这不仅工作量大,而且不同开发者编写的工具无法互相兼容。其次是<strong>能力扩展的瓶颈</strong>:智能体的能力被限制在预先定义的工具集内,无法动态发现和使用新的服务。最后是<strong>协作的缺失</strong>:当任务复杂到需要多个专业智能体协作时(如研究员+撰写员+编辑),我们只能通过手动编排来协调它们的工作。
 
 让我们通过一个更具体的例子来理解这些限制。假设你要构建一个智能研究助手,它需要:
 
@@ -55,9 +59,9 @@ agent.add_tool(DatabaseTool())
 agent.add_tool(WeatherTool())
 ```
 
-这种方式存在明显的问题:代码重复(每个工具都要处理HTTP请求、错误处理、认证等),难以维护(API变更需要修改所有相关工具),无法复用(其他开发者的工具无法直接使用),扩展性差(添加新服务需要大量编码工作)。
+这种方式存在明显的问题:代码重复(每个工具都要处理 HTTP 请求、错误处理、认证等),难以维护(API 变更需要修改所有相关工具),无法复用(其他开发者的工具无法直接使用),扩展性差(添加新服务需要大量编码工作)。
 
-<strong>通信协议的核心价值</strong>正是解决这些问题。它提供了一套标准化的接口规范,让智能体能够以统一的方式访问各种外部服务,而无需为每个服务编写专门的适配器。这就像互联网的TCP/IP协议,它让不同的设备能够相互通信,而不需要为每种设备编写专门的通信代码。
+<strong>通信协议的核心价值</strong>正是解决这些问题。它提供了一套标准化的接口规范,让智能体能够以统一的方式访问各种外部服务,而无需为每个服务编写专门的适配器。这就像互联网的 TCP/IP 协议,它让不同的设备能够相互通信,而不需要为每种设备编写专门的通信代码。
 
 有了通信协议,上面的代码可以简化为:
 
@@ -81,42 +85,42 @@ agent.add_tool(database_mcp)
 
 ### 10.1.2 三种协议设计理念比较
 
-智能体通信协议并非单一的解决方案,而是针对不同通信场景设计的一系列标准。在本章以目前业界主流的三种协议MCP、A2A和ANP为例进行实践,下面是一个总览的比较。
+智能体通信协议并非单一的解决方案,而是针对不同通信场景设计的一系列标准。在本章以目前业界主流的三种协议 MCP、A2A 和 ANP 为例进行实践,下面是一个总览的比较。
 
 <strong>(1)MCP:智能体与工具的桥梁</strong>
 
-MCP(Model Context Protocol)由Anthropic团队提出<sup>[1]</sup>,其核心设计理念是<strong>标准化智能体与外部工具/资源的通信方式</strong>。想象一下,你的智能体需要访问文件系统、数据库、GitHub、Slack等各种服务。传统做法是为每个服务编写专门的适配器,这不仅工作量大,而且难以维护。MCP通过定义统一的协议规范,让所有服务都能以相同的方式被访问。
+MCP(Model Context Protocol)由 Anthropic 团队提出<sup>[1]</sup>,其核心设计理念是<strong>标准化智能体与外部工具/资源的通信方式</strong>。想象一下,你的智能体需要访问文件系统、数据库、GitHub、Slack 等各种服务。传统做法是为每个服务编写专门的适配器,这不仅工作量大,而且难以维护。MCP 通过定义统一的协议规范,让所有服务都能以相同的方式被访问。
 
-MCP的设计哲学是"上下文共享"。它不仅仅是一个RPC(远程过程调用)协议,更重要的是它允许智能体和工具之间共享丰富的上下文信息。如图10.1所示,当智能体访问一个代码仓库时,MCP服务器不仅能提供文件内容,还能提供代码结构、依赖关系、提交历史等上下文信息,让智能体能够做出更智能的决策。
+MCP 的设计哲学是"上下文共享"。它不仅仅是一个 RPC(远程过程调用)协议,更重要的是它允许智能体和工具之间共享丰富的上下文信息。如图 10.1 所示,当智能体访问一个代码仓库时,MCP 服务器不仅能提供文件内容,还能提供代码结构、依赖关系、提交历史等上下文信息,让智能体能够做出更智能的决策。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-1.png" alt="" width="85%"/>
-  <p>图 10.1 MCP设计思想</p>
+  <p>图 10.1 MCP 设计思想</p>
 </div>
 
 <strong>(2)A2A:智能体间的对话</strong>
 
-A2A(Agent-to-Agent Protocol)协议由Google团队提出<sup>2</sup>,其核心设计理念是<strong>实现智能体之间的点对点通信</strong>。与MCP关注智能体与工具的通信不同,A2A关注的是智能体之间如何相互协作。这种设计让智能体能够像人类团队一样进行对话、协商和协作。
+A2A(Agent-to-Agent Protocol)协议由 Google 团队提出<sup>2</sup>,其核心设计理念是<strong>实现智能体之间的点对点通信</strong>。与 MCP 关注智能体与工具的通信不同,A2A 关注的是智能体之间如何相互协作。这种设计让智能体能够像人类团队一样进行对话、协商和协作。
 
-A2A的设计哲学是"对等通信"。如图10.2所示,在A2A网络中,每个智能体既是服务提供者,也是服务消费者。智能体可以主动发起请求,也可以响应其他智能体的请求。这种对等的设计避免了中心化协调器的瓶颈,让智能体网络更加灵活和可扩展。
+A2A 的设计哲学是"对等通信"。如图 10.2 所示,在 A2A 网络中,每个智能体既是服务提供者,也是服务消费者。智能体可以主动发起请求,也可以响应其他智能体的请求。这种对等的设计避免了中心化协调器的瓶颈,让智能体网络更加灵活和可扩展。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-2.png" alt="" width="85%"/>
-  <p>图 10.2 A2A设计思想</p>
+  <p>图 10.2 A2A 设计思想</p>
 </div>
 
 <strong>(3)ANP:智能体网络的基础设施</strong>
 
-ANP(Agent Network Protocol)是一个概念性的协议框架<sup>3</sup>,目前由开源社区维护,还没有成熟的生态,其核心设计理念是<strong>构建大规模智能体网络的基础设施</strong>。如果说MCP解决的是"如何访问工具",A2A解决的是"如何与其他智能体对话",那么ANP解决的是"如何在大规模网络中发现和连接智能体"。
+ANP(Agent Network Protocol)是一个概念性的协议框架<sup>3</sup>,目前由开源社区维护,还没有成熟的生态,其核心设计理念是<strong>构建大规模智能体网络的基础设施</strong>。如果说 MCP 解决的是"如何访问工具",A2A 解决的是"如何与其他智能体对话",那么 ANP 解决的是"如何在大规模网络中发现和连接智能体"。
 
-ANP的设计哲学是"去中心化服务发现"。在一个包含成百上千个智能体的网络中,如何让智能体能够找到它需要的服务?如图10.3所示,ANP提供了服务注册、发现和路由机制,让智能体能够动态地发现网络中的其他服务,而不需要预先配置所有的连接关系。
+ANP 的设计哲学是"去中心化服务发现"。在一个包含成百上千个智能体的网络中,如何让智能体能够找到它需要的服务?如图 10.3 所示,ANP 提供了服务注册、发现和路由机制,让智能体能够动态地发现网络中的其他服务,而不需要预先配置所有的连接关系。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-3.png" alt="" width="85%"/>
-  <p>图 10.3 ANP设计思想</p>
+  <p>图 10.3 ANP 设计思想</p>
 </div>
 
-最后在表10.1中,让我们通过一个对比表格来更清晰地理解这三种协议的差异:
+最后在表 10.1 中,让我们通过一个对比表格来更清晰地理解这三种协议的差异:
 
 <div align="center">
   <p>表 10.1 三种协议对比</p>
@@ -125,7 +129,7 @@ ANP的设计哲学是"去中心化服务发现"。在一个包含成百上千个
 
 <strong>(4)如何选择合适的协议?</strong>
 
-目前的协议还处于发展早期,MCP的生态相对成熟,不过各种工具的时效性取决于维护者,更推荐选择大公司背书的MCP工具。
+目前的协议还处于发展早期,MCP 的生态相对成熟,不过各种工具的时效性取决于维护者,更推荐选择大公司背书的 MCP 工具。
 
 选择协议的关键在于理解你的需求:
 
@@ -133,22 +137,22 @@ ANP的设计哲学是"去中心化服务发现"。在一个包含成百上千个
 - 如果你需要多个智能体相互协作完成任务,选择<strong>A2A</strong>
 - 如果你要构建大规模的智能体生态系统,考虑<strong>ANP</strong>
 
-### 10.1.3 HelloAgents通信协议架构设计
+### 10.1.3 HelloAgents 通信协议架构设计
 
-在理解了三种协议的设计理念后,让我们看看如何在HelloAgents框架中实现和使用它们。我们的设计目标是:<strong>让学习者能够以最简单的方式使用这些协议,同时保持足够的灵活性以应对复杂场景</strong>。
+在理解了三种协议的设计理念后,让我们看看如何在 HelloAgents 框架中实现和使用它们。我们的设计目标是:<strong>让学习者能够以最简单的方式使用这些协议,同时保持足够的灵活性以应对复杂场景</strong>。
 
-如图10.4所示,HelloAgents的通信协议架构采用三层设计,从底层到上层分别是:协议实现层、工具封装层和智能体集成层。
+如图 10.4 所示,HelloAgents 的通信协议架构采用三层设计,从底层到上层分别是:协议实现层、工具封装层和智能体集成层。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-4.png" alt="" width="85%"/>
-  <p>图 10.4 HelloAgents通信协议设计</p>
+  <p>图 10.4 HelloAgents 通信协议设计</p>
 </div>
 
-<strong>(1)协议实现层</strong>:这一层包含了三种协议的具体实现。MCP基于FastMCP库实现,提供客户端和服务器功能;A2A基于Google官方的a2a-sdk实现;ANP是我们自研的轻量级实现,提供服务发现和网络管理功能,当然目前也有官方的[实现](https://github.com/agent-network-protocol/AgentConnect),考虑到后期的迭代,因此这里只做概念的模拟。
+<strong>(1)协议实现层</strong>:这一层包含了三种协议的具体实现。MCP 基于 FastMCP 库实现,提供客户端和服务器功能;A2A 基于 Google 官方的 a2a-sdk 实现;ANP 是我们自研的轻量级实现,提供服务发现和网络管理功能,当然目前也有官方的[实现](https://github.com/agent-network-protocol/AgentConnect),考虑到后期的迭代,因此这里只做概念的模拟。
 
-<strong>(2)工具封装层</strong>:这一层将协议实现封装成统一的Tool接口。MCPTool、A2ATool和ANPTool都继承自BaseTool,提供一致的`run()`方法。这种设计让智能体能够以相同的方式使用不同的协议。
+<strong>(2)工具封装层</strong>:这一层将协议实现封装成统一的 Tool 接口。MCPTool、A2ATool 和 ANPTool 都继承自 BaseTool,提供一致的`run()`方法。这种设计让智能体能够以相同的方式使用不同的协议。
 
-<strong>(3)智能体集成层</strong>:这一层是智能体与协议的集成点。所有的智能体(ReActAgent、SimpleAgent等)都通过Tool System来使用协议工具,无需关心底层的协议细节。
+<strong>(3)智能体集成层</strong>:这一层是智能体与协议的集成点。所有的智能体(ReActAgent、SimpleAgent 等)都通过 Tool System 来使用协议工具,无需关心底层的协议细节。
 
 ### 10.1.4 本章学习目标与快速体验
 
@@ -211,101 +215,101 @@ print("A2A工具创建成功")
 这个简单的示例展示了三种协议的核心功能。在接下来的章节中,我们将深入学习每种协议的详细用法和最佳实践。
 
 
-## 10.2 MCP协议实战
+## 10.2 MCP 协议实战
 
-现在,让我们深入学习MCP,掌握如何让智能体访问外部工具和资源。
+现在,让我们深入学习 MCP,掌握如何让智能体访问外部工具和资源。
 
-### 10.2.1 MCP协议概念介绍
+### 10.2.1 MCP 协议概念介绍
 
 <strong>(1)MCP:智能体的"USB-C"</strong>
 
 想象一下,你的智能体可能需要同时做很多事情,例如:
 - 读取本地文件系统的文档
-- 查询PostgreSQL数据库
-- 搜索GitHub上的代码
-- 发送Slack消息
-- 访问Google Drive
+- 查询 PostgreSQL 数据库
+- 搜索 GitHub 上的代码
+- 发送 Slack 消息
+- 访问 Google Drive
 
-传统方式下,你需要为每个服务编写适配器代码,处理不同的API、认证方式、错误处理等。这不仅工作量大,而且难以维护。更重要的是,不同LLM平台的function call实现差异巨大,切换模型时需要重写大量代码。
+传统方式下,你需要为每个服务编写适配器代码,处理不同的 API、认证方式、错误处理等。这不仅工作量大,而且难以维护。更重要的是,不同 LLM 平台的 function call 实现差异巨大,切换模型时需要重写大量代码。
 
-MCP的出现改变了这一切。它就像USB-C统一了各种设备的连接方式一样,<strong>MCP统一了智能体与外部工具的交互方式</strong>。无论你使用Claude、GPT还是其他模型,只要它们支持MCP协议,就能无缝访问相同的工具和资源。
+MCP 的出现改变了这一切。它就像 USB-C 统一了各种设备的连接方式一样,<strong>MCP 统一了智能体与外部工具的交互方式</strong>。无论你使用 Claude、GPT 还是其他模型,只要它们支持 MCP 协议,就能无缝访问相同的工具和资源。
 
-<strong>(2)MCP架构</strong>
+<strong>(2)MCP 架构</strong>
 
-MCP协议采用Host、Client、Servers三层架构设计,让我们通过图10.5的场景来理解这些组件如何协同工作。
+MCP 协议采用 Host、Client、Servers 三层架构设计,让我们通过图 10.5 的场景来理解这些组件如何协同工作。
 
-假设你正在使用Claude Desktop询问:"我桌面上有哪些文档?"
+假设你正在使用 Claude Desktop 询问:"我桌面上有哪些文档?"
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-5.png" alt="" width="85%"/>
-  <p>图 10.5 MCP案例演示</p>
+  <p>图 10.5 MCP 案例演示</p>
 </div>
 
 <strong>三层架构的职责:</strong>
 
-1. <strong>Host(宿主层)</strong>:Claude Desktop作为Host,负责接收用户提问并与Claude模型交互。Host是用户直接交互的界面,它管理整个对话流程。
+1. <strong>Host(宿主层)</strong>:Claude Desktop 作为 Host,负责接收用户提问并与 Claude 模型交互。Host 是用户直接交互的界面,它管理整个对话流程。
 
-2. <strong>Client(客户端层)</strong>:当Claude模型决定需要访问文件系统时,Host中内置的MCP Client被激活。Client负责与适当的MCP Server建立连接,发送请求并接收响应。
+2. <strong>Client(客户端层)</strong>:当 Claude 模型决定需要访问文件系统时,Host 中内置的 MCP Client 被激活。Client 负责与适当的 MCP Server 建立连接,发送请求并接收响应。
 
-3. <strong>Server(服务器层)</strong>:文件系统MCP Server被调用,执行实际的文件扫描操作,访问桌面目录,并返回找到的文档列表。
+3. <strong>Server(服务器层)</strong>:文件系统 MCP Server 被调用,执行实际的文件扫描操作,访问桌面目录,并返回找到的文档列表。
 
-<strong>完整的交互流程:</strong>用户问题 → Claude Desktop(Host) → Claude模型分析 → 需要文件信息 → MCP Client连接 → 文件系统MCP Server → 执行操作 → 返回结果 → Claude生成回答 → 显示在Claude Desktop上
+<strong>完整的交互流程:</strong>用户问题 → Claude Desktop(Host) → Claude 模型分析 → 需要文件信息 → MCP Client 连接 → 文件系统 MCP Server → 执行操作 → 返回结果 → Claude 生成回答 → 显示在 Claude Desktop 上
 
-这种架构设计的优势在于<strong>关注点分离</strong>:Host专注于用户体验,Client专注于协议通信,Server专注于具体功能实现。开发者只需专注于开发对应的MCP Server,无需关心Host和Client的实现细节。
+这种架构设计的优势在于<strong>关注点分离</strong>:Host 专注于用户体验,Client 专注于协议通信,Server 专注于具体功能实现。开发者只需专注于开发对应的 MCP Server,无需关心 Host 和 Client 的实现细节。
 
-<strong>(3)MCP的核心能力</strong>
+<strong>(3)MCP 的核心能力</strong>
 
-如表10.2所示,MCP协议提供了三大核心能力,构成完整的工具访问框架:
+如表 10.2 所示,MCP 协议提供了三大核心能力,构成完整的工具访问框架:
 
 <div align="center">
-  <p>表 10.2 MCP核心能力</p>
+  <p>表 10.2 MCP 核心能力</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-2.png" alt="" width="85%"/>
 </div>
 
-这三种能力的区别在于:<strong>Tools是主动的</strong>(执行操作),<strong>Resources是被动的</strong>(提供数据),<strong>Prompts是指导性的</strong>(提供模板)。
+这三种能力的区别在于:<strong>Tools 是主动的</strong>(执行操作),<strong>Resources 是被动的</strong>(提供数据),<strong>Prompts 是指导性的</strong>(提供模板)。
 
-<strong>(4)MCP的工作流程</strong>
+<strong>(4)MCP 的工作流程</strong>
 
-让我们通过一个具体例子来理解MCP的完整工作流程,如图10.6所示:
+让我们通过一个具体例子来理解 MCP 的完整工作流程,如图 10.6 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-6.png" alt="" width="85%"/>
-  <p>图 10.6 MCP案例演示</p>
+  <p>图 10.6 MCP 案例演示</p>
 </div>
 
-一个关键问题是:<strong>Claude(或其他LLM)是如何决定使用哪些工具的?</strong> 
+一个关键问题是:<strong>Claude(或其他 LLM)是如何决定使用哪些工具的?</strong> 
 
 当用户提出问题时,完整的工具选择流程如下:
 
-1. <strong>工具发现阶段</strong>:MCP Client连接到Server后,首先调用`list_tools()`获取所有可用工具的描述信息(包括工具名称、功能说明、参数定义)
+1. <strong>工具发现阶段</strong>:MCP Client 连接到 Server 后,首先调用`list_tools()`获取所有可用工具的描述信息(包括工具名称、功能说明、参数定义)
 
-2. <strong>上下文构建</strong>:Client将工具列表转换为LLM能理解的格式,添加到系统提示词中。例如:
+2. <strong>上下文构建</strong>:Client 将工具列表转换为 LLM 能理解的格式,添加到系统提示词中。例如:
    ```
    你可以使用以下工具:
    - read_file(path: str): 读取指定路径的文件内容
    - search_code(query: str, language: str): 在代码库中搜索
    ```
 
-3. <strong>模型推理</strong>:LLM分析用户问题和可用工具,决定是否需要调用工具以及调用哪个工具。这个决策基于工具的描述和当前对话上下文
+3. <strong>模型推理</strong>:LLM 分析用户问题和可用工具,决定是否需要调用工具以及调用哪个工具。这个决策基于工具的描述和当前对话上下文
 
-4. <strong>工具执行</strong>:如果LLM决定使用工具,Client通过MCP Server执行所选工具,获取结果
+4. <strong>工具执行</strong>:如果 LLM 决定使用工具,Client 通过 MCP Server 执行所选工具,获取结果
 
-5. <strong>结果整合</strong>:工具执行结果被送回给LLM,LLM结合结果生成最终回答
+5. <strong>结果整合</strong>:工具执行结果被送回给 LLM,LLM 结合结果生成最终回答
 
-这个过程是<strong>完全自动化</strong>的,LLM会根据工具描述的质量来决定是否使用以及如何使用工具。因此,编写清晰、准确的工具描述至关重要。
+这个过程是<strong>完全自动化</strong>的,LLM 会根据工具描述的质量来决定是否使用以及如何使用工具。因此,编写清晰、准确的工具描述至关重要。
 
-<strong>(5)MCP与Function Calling的差异</strong>
+<strong>(5)MCP 与 Function Calling 的差异</strong>
 
-很多开发者会问:<strong>我已经在用Function Calling了,为什么还需要MCP?</strong> 让我们通过表10.3来理解它们的区别。
+很多开发者会问:<strong>我已经在用 Function Calling 了,为什么还需要 MCP?</strong> 让我们通过表 10.3 来理解它们的区别。
 
 <div align="center">
   <p>表 10.3 Function Calling 与 MCP 对比</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-3.png" alt="" width="85%"/>
 </div>
 
-这里我们以智能体需要访问GitHub仓库和本地文件系统为例子来详细对比同一个任务的两种实现
+这里我们以智能体需要访问 GitHub 仓库和本地文件系统为例子来详细对比同一个任务的两种实现
 
-<strong>方式1:使用Function Calling</strong>
+<strong>方式 1:使用 Function Calling</strong>
 
 ```python
 # 步骤1:为每个LLM提供商定义函数
@@ -363,7 +367,7 @@ if response.content[0].type == "tool_use":
     result = search_github(**tool_use.input)
 ```
 
-<strong>方式2:使用MCP</strong>
+<strong>方式 2:使用 MCP</strong>
 
 ```python
 from hello_agents.protocols import MCPClient
@@ -398,13 +402,13 @@ async with github_client:
 
 了解了它们之间的互补关系后,我们接下来看看如何在 HelloAgents 中使用 MCP 协议。
 
-### 10.2.2 使用MCP客户端
+### 10.2.2 使用 MCP 客户端
 
-HelloAgents基于FastMCP 2.0实现了完整的MCP客户端功能。我们提供了异步和同步两种API,以适应不同的使用场景。对于大多数应用,推荐使用异步API,它能更好地处理并发请求和长时间运行的操作。下面我们将提供一个拆解的操作演示。
+HelloAgents 基于 FastMCP 2.0 实现了完整的 MCP 客户端功能。我们提供了异步和同步两种 API,以适应不同的使用场景。对于大多数应用,推荐使用异步 API,它能更好地处理并发请求和长时间运行的操作。下面我们将提供一个拆解的操作演示。
 
-<strong>(1)连接到MCP服务器</strong>
+<strong>(1)连接到 MCP 服务器</strong>
 
-MCP客户端支持多种连接方式,最常用的是Stdio模式(通过标准输入输出与本地进程通信):
+MCP 客户端支持多种连接方式,最常用的是 Stdio 模式(通过标准输入输出与本地进程通信):
 
 ```python
 import asyncio
@@ -481,7 +485,7 @@ asyncio.run(discover_tools())
 
 <strong>(3)调用工具</strong>
 
-调用工具时,只需提供工具名称和符合JSON Schema的参数:
+调用工具时,只需提供工具名称和符合 JSON Schema 的参数:
 
 ```python
 async def use_tools():
@@ -506,7 +510,7 @@ async def use_tools():
 asyncio.run(use_tools())
 ```
 
-在这里提供一种更为安全的方式来调用MCP服务,可供参考:
+在这里提供一种更为安全的方式来调用 MCP 服务,可供参考:
 
 ```python
 async def safe_tool_call():
@@ -526,7 +530,7 @@ asyncio.run(safe_tool_call())
 
 <strong>(4)访问资源</strong>
 
-除了工具,MCP服务器还可以提供资源(Resources):
+除了工具,MCP 服务器还可以提供资源(Resources):
 
 ```python
 # 列出可用资源
@@ -540,7 +544,7 @@ print(f"资源内容:{resource_content}")
 
 <strong>(5)使用提示模板</strong>
 
-MCP服务器可以提供预定义的提示模板(Prompts):
+MCP 服务器可以提供预定义的提示模板(Prompts):
 
 ```python
 # 列出可用提示
@@ -552,9 +556,9 @@ prompt = client.get_prompt("code_review", {"language": "python"})
 print(f"提示内容:{prompt}")
 ```
 
-<strong>(6)完整示例:使用GitHub MCP服务</strong>
+<strong>(6)完整示例:使用 GitHub MCP 服务</strong>
 
-让我们通过一个完整的例子来看如何使用社区提供的GitHub MCP服务,我们将采用封装好的MCP Tools来:
+让我们通过一个完整的例子来看如何使用社区提供的 GitHub MCP 服务,我们将采用封装好的 MCP Tools 来:
 
 ```python
 """
@@ -592,16 +596,16 @@ print(result)
 
 ```
 
-### 10.2.3 MCP传输方式详解
+### 10.2.3 MCP 传输方式详解
 
-MCP协议的一个重要特性是<strong>传输层无关性</strong>(Transport Agnostic)。这意味着MCP协议本身不依赖于特定的传输方式,可以在不同的通信通道上运行。HelloAgents基于FastMCP 2.0,提供了完整的传输方式支持,让你可以根据实际场景选择最合适的传输模式。
+MCP 协议的一个重要特性是<strong>传输层无关性</strong>(Transport Agnostic)。这意味着 MCP 协议本身不依赖于特定的传输方式,可以在不同的通信通道上运行。HelloAgents 基于 FastMCP 2.0,提供了完整的传输方式支持,让你可以根据实际场景选择最合适的传输模式。
 
 <strong>(1)传输方式概览</strong>
 
-HelloAgents的`MCPClient`支持五种传输方式,每种都有不同的使用场景,如表10.4所示:
+HelloAgents 的`MCPClient`支持五种传输方式,每种都有不同的使用场景,如表 10.4 所示:
 
 <div align="center">
-  <p>表 10.4 MCP传输方式对比</p>
+  <p>表 10.4 MCP 传输方式对比</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-4.png" alt="" width="85%"/>
 </div>
 
@@ -741,9 +745,9 @@ async def test_sse_transport():
 # asyncio.run(test_sse_transport())
 ```
 
-<strong>(7)StreamableHTTP Transport - 流式HTTP传输</strong>
+<strong>(7)StreamableHTTP Transport - 流式 HTTP 传输</strong>
 
-适用场景:需要双向流式通信的HTTP场景
+适用场景:需要双向流式通信的 HTTP 场景
 
 ```python
 # 注意:MCPTool 主要用于 Stdio 和 Memory 传输
@@ -768,17 +772,17 @@ async def test_streamable_http_transport():
 # asyncio.run(test_streamable_http_transport())
 ```
 
-### 10.2.4 在智能体中使用MCP工具
+### 10.2.4 在智能体中使用 MCP 工具
 
-前面我们学习了如何直接使用MCP客户端。但在实际应用中,我们更希望让智能体<strong>自动</strong>调用MCP工具,而不是手动编写调用代码。HelloAgents提供了`MCPTool`包装器,让MCP服务器无缝集成到智能体的工具链中。
+前面我们学习了如何直接使用 MCP 客户端。但在实际应用中,我们更希望让智能体<strong>自动</strong>调用 MCP 工具,而不是手动编写调用代码。HelloAgents 提供了`MCPTool`包装器,让 MCP 服务器无缝集成到智能体的工具链中。
 
-<strong>(1)MCP工具的自动展开机制</strong>
+<strong>(1)MCP 工具的自动展开机制</strong>
 
-HelloAgents的`MCPTool`有一个特性:<strong>自动展开</strong>。当你添加一个MCP工具到Agent时,它会自动将MCP服务器提供的所有工具展开为独立的工具,让Agent可以像调用普通工具一样调用它们。
+HelloAgents 的`MCPTool`有一个特性:<strong>自动展开</strong>。当你添加一个 MCP 工具到 Agent 时,它会自动将 MCP 服务器提供的所有工具展开为独立的工具,让 Agent 可以像调用普通工具一样调用它们。
 
-<strong>方式1:使用内置演示服务器</strong>
+<strong>方式 1:使用内置演示服务器</strong>
 
-我们在之前实现过计算器的工具函数,在这里将他转化为MCP的服务。这是最简单的使用方式。
+我们在之前实现过计算器的工具函数,在这里将他转化为 MCP 的服务。这是最简单的使用方式。
 
 ```python
 from hello_agents import SimpleAgent, HelloAgentsLLM
@@ -805,11 +809,11 @@ print(response)  # 输出:25 乘以 16 的结果是 400
 - `calculator_greet` - 友好问候
 - `calculator_get_system_info` - 获取系统信息
 
-Agent调用时只需提供参数,例如:`[TOOL_CALL:calculator_multiply:a=25,b=16]`,系统会自动处理类型转换和MCP调用。
+Agent 调用时只需提供参数,例如:`[TOOL_CALL:calculator_multiply:a=25,b=16]`,系统会自动处理类型转换和 MCP 调用。
 
-<strong>方式2:连接外部MCP服务器</strong>
+<strong>方式 2:连接外部 MCP 服务器</strong>
 
-在实际项目中,你需要连接到功能更强大的MCP服务器。这些服务器可以是:
+在实际项目中,你需要连接到功能更强大的 MCP 服务器。这些服务器可以是:
 - <strong>社区提供的官方服务器</strong>(如文件系统、GitHub、数据库等)
 - <strong>你自己编写的自定义服务器</strong>(封装业务逻辑)
 
@@ -841,11 +845,11 @@ response = agent.run("请读取my_README.md文件,并总结其中的主要内
 print(response)
 ```
 
-当使用多个MCP服务器时,务必为每个MCPTool指定不同的name,这个name会作为前缀添加到展开的工具名前,避免冲突。例如:`name="fs"` 会展开为 `fs_read_file`、`fs_write_file` 等。如果你需要编写自己的MCP服务器来封装特定的业务逻辑,请参考10.5节内容。
+当使用多个 MCP 服务器时,务必为每个 MCPTool 指定不同的 name,这个 name 会作为前缀添加到展开的工具名前,避免冲突。例如:`name="fs"` 会展开为 `fs_read_file`、`fs_write_file` 等。如果你需要编写自己的 MCP 服务器来封装特定的业务逻辑,请参考 10.5 节内容。
 
-<strong>(2)MCP工具自动展开的工作原理</strong>
+<strong>(2)MCP 工具自动展开的工作原理</strong>
 
-理解自动展开机制有助于你更好地使用MCP工具。让我们深入了解它是如何工作的:
+理解自动展开机制有助于你更好地使用 MCP 工具。让我们深入了解它是如何工作的:
 
 ```python
 # 用户代码
@@ -1024,42 +1028,42 @@ except Exception as e:
 
 ```
 
-`github_searcher`会在这个过程中调用`gh_search_repositories`搜索GitHub项目。得到的结果会返回给`document_writer`当做输入,进一步指导报告的生成,最后保存报告到report.md。
+`github_searcher`会在这个过程中调用`gh_search_repositories`搜索 GitHub 项目。得到的结果会返回给`document_writer`当做输入,进一步指导报告的生成,最后保存报告到 report.md。
 
-### 10.2.5 MCP社区生态
+### 10.2.5 MCP 社区生态
 
-MCP协议的一个巨大优势是<strong>丰富的社区生态</strong>。Anthropic和社区开发者已经创建了大量现成的MCP服务器,涵盖文件系统、数据库、API服务等各种场景。这意味着你不需要从零开始编写工具适配器,可以直接使用这些经过验证的服务器。
+MCP 协议的一个巨大优势是<strong>丰富的社区生态</strong>。Anthropic 和社区开发者已经创建了大量现成的 MCP 服务器,涵盖文件系统、数据库、API 服务等各种场景。这意味着你不需要从零开始编写工具适配器,可以直接使用这些经过验证的服务器。
 
-这里给出MCP社区的三个资源库:
+这里给出 MCP 社区的三个资源库:
 
 1. <strong>Awesome MCP Servers</strong> (https://github.com/punkpeye/awesome-mcp-servers)
-   - 社区维护的MCP服务器精选列表
+   - 社区维护的 MCP 服务器精选列表
    - 包含各种第三方服务器
    - 按功能分类,易于查找
 
 2. <strong>MCP Servers Website</strong> (https://mcpservers.org/)
-   - 官方MCP服务器目录网站
+   - 官方 MCP 服务器目录网站
    - 提供搜索和筛选功能
    - 包含使用说明和示例
 
 3. <strong>Official MCP Servers</strong> (https://github.com/modelcontextprotocol/servers)
-   - Anthropic官方维护的服务器
+   - Anthropic 官方维护的服务器
    - 质量最高、文档最完善
    - 包含常用服务的实现
 
-表10.5和10.6给出常用的官方MCP服务器和社区热门MCP服务器:
+表 10.5 和 10.6 给出常用的官方 MCP 服务器和社区热门 MCP 服务器:
 
 <div align="center">
-  <p>表 10.5 常用官方MCP服务器</p>
+  <p>表 10.5 常用官方 MCP 服务器</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-5.png" alt="" width="85%"/>
 </div>
 
 <div align="center">
-  <p>表 10.6 社区热门MCP服务器</p>
+  <p>表 10.6 社区热门 MCP 服务器</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-6.png" alt="" width="85%"/>
 </div>
 
-以下是一些特别有趣的案例TODO可供参考:
+以下是一些特别有趣的案例 TODO 可供参考:
 
 1. <strong>自动化网页测试(Playwright)</strong>
    
@@ -1103,15 +1107,15 @@ MCP协议的一个巨大优势是<strong>丰富的社区生态</strong>。Anthro
    # - 播放背景音乐(Spotify)
    ```
 
-通过这一节内容的讲解,希望你能探索更多MCP的实现案例,也欢迎投稿至Helloagents!接下来,让我们学习 A2A 协议。
+通过这一节内容的讲解,希望你能探索更多 MCP 的实现案例,也欢迎投稿至 Helloagents!接下来,让我们学习 A2A 协议。
 
-## 10.3 A2A协议实战
+## 10.3 A2A 协议实战
 
 A2A(Agent-to-Agent)是一种支持智能体之间直接通信与协作的协议。
 
 ### 10.3.1 协议设计动机
 
-MCP协议解决了智能体与工具的交互,而A2A协议则解决智能体之间的协作问题。在一个需要多智能体(如研究员、撰写员、编辑)协作的任务中,它们需要通信、委托任务、协商能力和同步状态。
+MCP 协议解决了智能体与工具的交互,而 A2A 协议则解决智能体之间的协作问题。在一个需要多智能体(如研究员、撰写员、编辑)协作的任务中,它们需要通信、委托任务、协商能力和同步状态。
 
 传统的中央协调器(星型拓扑)方案存在三个主要问题:
 
@@ -1119,37 +1123,37 @@ MCP协议解决了智能体与工具的交互,而A2A协议则解决智能体
 - <strong>性能瓶颈</strong>:所有通信都经过中心节点,限制了并发。
 - <strong>扩展困难</strong>:增加或修改智能体需要改动中心逻辑。
 
-A2A协议采用点对点(P2P)架构(网状拓拓),允许智能体直接通信,从根本上解决了上述问题。它的核心是<strong>任务(Task)</strong>和<strong>工件(Artifact)</strong>这两个抽象概念,这是它与MCP最大的区别,如表10.7所示。
+A2A 协议采用点对点(P2P)架构(网状拓拓),允许智能体直接通信,从根本上解决了上述问题。它的核心是<strong>任务(Task)</strong>和<strong>工件(Artifact)</strong>这两个抽象概念,这是它与 MCP 最大的区别,如表 10.7 所示。
 
 <div align="center">
-  <p>表 10.7 A2A核心概念</p>
+  <p>表 10.7 A2A 核心概念</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-7.png" alt="" width="85%"/>
 </div>
 
-为实现对协作过程的管理,A2A为任务定义了标准化的生命周期,包括创建、协商、代理、执行中、完成、失败等状态,可见图10.7。
+为实现对协作过程的管理,A2A 为任务定义了标准化的生命周期,包括创建、协商、代理、执行中、完成、失败等状态,可见图 10.7。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-7.png" alt="" width="85%"/>
-  <p>图 10.7 A2A任务周期</p>
+  <p>图 10.7 A2A 任务周期</p>
 </div>
 
 
 该机制使智能体可以进行任务协商、进度跟踪和异常处理。
 
-A2A 请求生命周期是一个序列,详细说明了请求遵循的四个主要步骤:代理发现、身份验证、发送消息 API 和发送消息流 API。下图10.8借鉴了官网的流程图,用来展示了操作流程,说明了客户端、A2A 服务器和身份验证服务器之间的交互。
+A2A 请求生命周期是一个序列,详细说明了请求遵循的四个主要步骤:代理发现、身份验证、发送消息 API 和发送消息流 API。下图 10.8 借鉴了官网的流程图,用来展示了操作流程,说明了客户端、A2A 服务器和身份验证服务器之间的交互。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-8.png" alt="" width="85%"/>
-  <p>图 10.8 A2A请求生命周期</p>
+  <p>图 10.8 A2A 请求生命周期</p>
 </div>
 
-### 10.3.2 使用A2A协议实战
+### 10.3.2 使用 A2A 协议实战
 
-A2A现有实现大部分为`Sample Code`,并且即使有Python的实现也较为繁琐,因此这里我们只采用模拟协议思想的方式,通过A2A-SDK来继承部分功能实现。
+A2A 现有实现大部分为`Sample Code`,并且即使有 Python 的实现也较为繁琐,因此这里我们只采用模拟协议思想的方式,通过 A2A-SDK 来继承部分功能实现。
 
 <strong>(2)创建简单的 A2A 智能体</strong>
 
-让我们创建一个A2A的智能体,同样是计算器案例作为演示:
+让我们创建一个 A2A 的智能体,同样是计算器案例作为演示:
 
 ```python
 from hello_agents.protocols.a2a.implementation import A2AServer, A2A_AVAILABLE
@@ -1298,9 +1302,9 @@ if custom_agent:
 
 HelloAgents 提供了统一的 A2A 工具接口。
 
-<strong>(1)创建A2A Agent服务端</strong>
+<strong>(1)创建 A2A Agent 服务端</strong>
 
-首先,让我们创建一个Agent服务端:
+首先,让我们创建一个 Agent 服务端:
 
 ```python
 from hello_agents.protocols import A2AServer
@@ -1347,7 +1351,7 @@ if __name__ == "__main__":
         print("\n服务已停止")
 ```
 
-<strong>(2)创建A2A Agent客户端</strong>
+<strong>(2)创建 A2A Agent 客户端</strong>
 
 现在,让我们创建一个客户端来与服务端通信:
 
@@ -1365,9 +1369,9 @@ print(f"收到响应:{response.get('result')}")
 # 收到响应:{'topic': 'AI在医疗领域的应用', 'findings': '关于AI在医疗领域的应用的研究结果...', 'sources': ['来源1', '来源2', '来源3']}
 ```
 
-<strong>(3)创建Agent网络</strong>
+<strong>(3)创建 Agent 网络</strong>
 
-对于多个Agent的协作,我们可以让多个Agent相互连接:
+对于多个 Agent 的协作,我们可以让多个 Agent 相互连接:
 
 ```python
 from hello_agents.protocols import A2AServer, A2AClient
@@ -1457,11 +1461,11 @@ result = create_content("AI在医疗领域的应用")
 print(f"\n最终结果:\n{result}")
 ```
 
-### 10.3.4 在智能体中使用A2A工具
+### 10.3.4 在智能体中使用 A2A 工具
 
-现在让我们看看如何将A2A集成到HelloAgents的智能体中。
+现在让我们看看如何将 A2A 集成到 HelloAgents 的智能体中。
 
-<strong>(1)使用A2ATool包装器</strong>
+<strong>(1)使用 A2ATool 包装器</strong>
 
 ```python
 from hello_agents import SimpleAgent, HelloAgentsLLM
@@ -1491,7 +1495,7 @@ print(response)
 
 <strong>(2)实战案例:智能客服系统</strong>
 
-让我们构建一个完整的智能客服系统,包含三个Agent:
+让我们构建一个完整的智能客服系统,包含三个 Agent:
 - <strong>接待员</strong>:分析客户问题类型
 - <strong>技术专家</strong>:回答技术问题
 - <strong>销售顾问</strong>:回答销售问题
@@ -1582,9 +1586,9 @@ if __name__ == "__main__":
     handle_customer_query("如何集成到我的Python项目中?")
 ```
 
-<strong>(3)高级用法:Agent间协商</strong>
+<strong>(3)高级用法:Agent 间协商</strong>
 
-A2A协议还支持Agent间的协商机制:
+A2A 协议还支持 Agent 间的协商机制:
 
 ```python
 from hello_agents.protocols import A2AServer, A2AClient
@@ -1651,11 +1655,11 @@ threading.Thread(target=lambda: agent1.run(port=7000), daemon=True).start()
 threading.Thread(target=lambda: agent2.run(port=7001), daemon=True).start()
 ```
 
-## 10.4 ANP协议实战
+## 10.4 ANP 协议实战
 
-在MCP协议解决了工具调用、A2A协议解决点对点智能体协作之后,ANP协议则专注于解决大规模、开放网络环境下的智能体管理问题。
+在 MCP 协议解决了工具调用、A2A 协议解决点对点智能体协作之后,ANP 协议则专注于解决大规模、开放网络环境下的智能体管理问题。
 
-在10.2和10.3节中,我们学习了MCP(工具访问)和A2A(智能体协作)。现在,让我们学习ANP(Agent Network Protocol)协议,它专注于构建<strong>大规模、开放的智能体网络</strong>。
+在 10.2 和 10.3 节中,我们学习了 MCP(工具访问)和 A2A(智能体协作)。现在,让我们学习 ANP(Agent Network Protocol)协议,它专注于构建<strong>大规模、开放的智能体网络</strong>。
 
 ### 10.4.1 协议目标
 
@@ -1665,34 +1669,34 @@ threading.Thread(target=lambda: agent2.run(port=7001), daemon=True).start()
 - <strong>智能路由</strong>:如果多个智能体都能处理同一任务,如何选择最合适的一个(如根据负载、成本等)并向其分派任务?
 - <strong>动态扩展</strong>:如何让新加入网络的智能体被其他成员发现和调用?
 
-ANP的设计目标就是提供一套标准化的机制,来解决上述的服务发现、路由选择和网络扩展性问题。
+ANP 的设计目标就是提供一套标准化的机制,来解决上述的服务发现、路由选择和网络扩展性问题。
 
-为实现其设计目标,ANP定义了以下几个核心概念,如表10.8所示:
+为实现其设计目标,ANP 定义了以下几个核心概念,如表 10.8 所示:
 
 <div align="center">
-  <p>表 10.8 ANP核心概念</p>
+  <p>表 10.8 ANP 核心概念</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-table-8.png" alt="" width="85%"/>
 </div>
 
-我们同样借用官方的[入门指南](https://github.com/agent-network-protocol/AgentNetworkProtocol/blob/main/docs/chinese/ANP入门指南.md)来介绍ANP的架构设计,如图10.9所示
+我们同样借用官方的[入门指南](https://github.com/agent-network-protocol/AgentNetworkProtocol/blob/main/docs/chinese/ANP入门指南.md)来介绍 ANP 的架构设计,如图 10.9 所示
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-9.png" alt="" width="85%"/>
-  <p>图 10.9 ANP整体流程</p>
+  <p>图 10.9 ANP 整体流程</p>
 </div>
 
 
 在这个流程图里,主要包括以下几个步骤:
 
-<strong>1. 服务的发现与匹配:</strong>首先,智能体A通过一个公开的发现服务,基于语义或功能描述进行查询,以定位到符合其任务需求的智能体B。该发现服务通过预先爬取各智能体对外暴露的标准端点(`.well-known/agent-descriptions`)来建立索引,从而实现服务需求方与提供方的动态匹配。
+<strong>1. 服务的发现与匹配:</strong>首先,智能体 A 通过一个公开的发现服务,基于语义或功能描述进行查询,以定位到符合其任务需求的智能体 B。该发现服务通过预先爬取各智能体对外暴露的标准端点(`.well-known/agent-descriptions`)来建立索引,从而实现服务需求方与提供方的动态匹配。
 
-<strong>2. 基于DID的身份验证:</strong>在交互开始时,智能体A使用其私钥对包含自身DID的请求进行签名。智能体B收到后,通过解析该DID获取对应的公钥,并以此验证签名的真实性与请求的完整性,从而建立起双方的可信通信。
+<strong>2. 基于 DID 的身份验证:</strong>在交互开始时,智能体 A 使用其私钥对包含自身 DID 的请求进行签名。智能体 B 收到后,通过解析该 DID 获取对应的公钥,并以此验证签名的真实性与请求的完整性,从而建立起双方的可信通信。
 
-<strong>3. 标准化的服务执行:</strong>身份验证通过后,智能体B响应请求,双方依据预定义的标准接口和数据格式进行数据交换或服务调用(如预订、查询等)。标准化的交互流程是实现跨平台、跨系统互操作性的基础。
+<strong>3. 标准化的服务执行:</strong>身份验证通过后,智能体 B 响应请求,双方依据预定义的标准接口和数据格式进行数据交换或服务调用(如预订、查询等)。标准化的交互流程是实现跨平台、跨系统互操作性的基础。
 
-总而言之,该机制的核心是利用DID构建了一个去中心化的信任根基,并借助标准化的描述协议实现了服务的动态发现。这套方法使得智能体能够在无需中央协调的前提下,安全、高效地在互联网上形成协作网络。
+总而言之,该机制的核心是利用 DID 构建了一个去中心化的信任根基,并借助标准化的描述协议实现了服务的动态发现。这套方法使得智能体能够在无需中央协调的前提下,安全、高效地在互联网上形成协作网络。
 
-### 10.4.2 使用ANP服务发现
+### 10.4.2 使用 ANP 服务发现
 
 <strong>(1)创建服务发现中心</strong>
 
@@ -1740,7 +1744,7 @@ best_service = min(nlp_services, key=lambda s: s.metadata.get("load", 1.0))
 print(f"最佳服务:{best_service.service_name} (负载: {best_service.metadata['load']})")
 ```
 
-<strong>(3)构建Agent网络</strong>
+<strong>(3)构建 Agent 网络</strong>
 
 ```python
 from hello_agents.protocols import ANPNetwork
@@ -1881,22 +1885,22 @@ for i in range(10):
     server.metadata["load"] += 0.1
 ```
 
-## 10.5 构建自定义MCP服务器
+## 10.5 构建自定义 MCP 服务器
 
-在前面的章节中,我们学习了如何使用现有的MCP服务。并且也了解到了不同协议的特点。现在,让我们学习如何构建自己的MCP服务器。
+在前面的章节中,我们学习了如何使用现有的 MCP 服务。并且也了解到了不同协议的特点。现在,让我们学习如何构建自己的 MCP 服务器。
 
 ### 10.5.1 创建你的第一个 MCP 服务器
 
 <strong>(1)为什么要构建自定义 MCP 服务器?</strong>
 
-虽然可以直接使用公开的MCP服务,但在许多实际应用场景中,需要构建自定义的MCP服务器以满足特定需求。
+虽然可以直接使用公开的 MCP 服务,但在许多实际应用场景中,需要构建自定义的 MCP 服务器以满足特定需求。
 
 主要动机包括以下几点:
 
-- <strong>封装业务逻辑</strong>:将企业内部特有的业务流程或复杂操作封装为标准化的MCP工具,供智能体统一调用。
-- <strong>访问私有数据</strong>:创建一个安全可控的接口或代理,用于访问内部数据库、API或其他无法对公网暴露的私有数据源。
+- <strong>封装业务逻辑</strong>:将企业内部特有的业务流程或复杂操作封装为标准化的 MCP 工具,供智能体统一调用。
+- <strong>访问私有数据</strong>:创建一个安全可控的接口或代理,用于访问内部数据库、API 或其他无法对公网暴露的私有数据源。
 - <strong>性能专项优化</strong>:针对高频调用或对响应延迟有严苛要求的应用场景,进行深度优化。
-- <strong>功能定制扩展</strong>:实现标准MCP服务未提供的特定功能,例如集成专有算法模型或连接特定的硬件设备。
+- <strong>功能定制扩展</strong>:实现标准 MCP 服务未提供的特定功能,例如集成专有算法模型或连接特定的硬件设备。
 
 <strong>(2)教学案例:天气查询 MCP 服务器</strong>
 
@@ -2120,7 +2124,7 @@ if __name__ == "__main__":
 请注意携带雨具,并根据天气变化适当调整着装。
 ```
 
-### 10.5.2 上传MCP服务器
+### 10.5.2 上传 MCP 服务器
 
 我们创建了一个真实的天气查询 MCP 服务器。现在,让我们将它发布到 Smithery 平台,让全世界的开发者都能使用我们的服务。
 
@@ -2189,7 +2193,7 @@ tools:
 - `entrypoint`: 入口文件
 - `tools`: 工具列表
 
-`pyproject.toml`是 Python 项目的标准配置文件,Smithery要求必须包含此文件,因为后续会打包成一个server:
+`pyproject.toml`是 Python 项目的标准配置文件,Smithery 要求必须包含此文件,因为后续会打包成一个 server:
 
 ```toml
 [build-system]
@@ -2276,17 +2280,17 @@ Dockerfile 配置说明:
 - <strong>端口</strong>: `8081` - Smithery 平台标准端口
 - <strong>启动命令</strong>: `python server.py` - 运行 MCP 服务器
 
-在这里,我们需要Fork`hello-agents`仓库,得到`code`中的源码,并使用自己的github创建一个名为`weather-mcp-server`的仓库,将`yourusername`改为自己github的Username。
+在这里,我们需要 Fork`hello-agents`仓库,得到`code`中的源码,并使用自己的 github 创建一个名为`weather-mcp-server`的仓库,将`yourusername`改为自己 github 的 Username。
 
 (3)提交到 Smithery
 
 打开浏览器,访问 [https://smithery.ai/](https://smithery.ai/)。使用 GitHub 账号登录 Smithery。点击页面上的 "Publish Server" 按钮,输入你的 GitHub 仓库 URL:`https://github.com/yourusername/weather-mcp-server`,即可等待发布。
 
-一旦发布完成,可以看到类似这样的页面,如图10.10所示:
+一旦发布完成,可以看到类似这样的页面,如图 10.10 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-10.png" alt="" width="85%"/>
-  <p>图 10.10 Smithery发布成功页面 </p>
+  <p>图 10.10 Smithery 发布成功页面 </p>
 </div>
 
 
@@ -2333,20 +2337,20 @@ agent.add_tool(weather_tool)
 response = agent.run("北京今天天气怎么样?")
 ```
 
-当然,这里只是举例,还有更多的用法可以自行探索,下图10.11展示了当MCP工具发布成功会包含的信息,显示服务的名称“天气”,其唯一标识符 `@jjyaoao/weather-mcp-server`,以及状态信息。Tools区域就是我们刚刚实现的方法,Connect区则提供了连接和使用此服务所需的技术信息,包括服务的<strong>接入URL地址</strong>和多种语言/环境下的<strong>配置代码片段</strong>。如果想要更加深入了解可以点击这个[链接](https://smithery.ai/server/@jjyaoao/weather-mcp-server)。
+当然,这里只是举例,还有更多的用法可以自行探索,下图 10.11 展示了当 MCP 工具发布成功会包含的信息,显示服务的名称“天气”,其唯一标识符 `@jjyaoao/weather-mcp-server`,以及状态信息。Tools 区域就是我们刚刚实现的方法,Connect 区则提供了连接和使用此服务所需的技术信息,包括服务的<strong>接入 URL 地址</strong>和多种语言/环境下的<strong>配置代码片段</strong>。如果想要更加深入了解可以点击这个[链接](https://smithery.ai/server/@jjyaoao/weather-mcp-server)。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/10-figures/10-11.png" alt="" width="85%"/>
-  <p>图 10.11 Smithery发布成功的MCP工具 </p>
+  <p>图 10.11 Smithery 发布成功的 MCP 工具 </p>
 </div>
 
-现在是时候去创造你的MCP服务器了!
+现在是时候去创造你的 MCP 服务器了!
 
 
 
 ## 10.6 本章总结
 
-本章系统性地介绍了智能体通信的三种核心协议:MCP、A2A与ANP,并探讨了它们的设计理念、应用场景与实践方法。
+本章系统性地介绍了智能体通信的三种核心协议:MCP、A2A 与 ANP,并探讨了它们的设计理念、应用场景与实践方法。
 
 <strong>协议定位:</strong>
 
@@ -2354,7 +2358,7 @@ response = agent.run("北京今天天气怎么样?")
 - <strong>A2A (Agent-to-Agent Protocol)</strong>: 作为智能体之间的对话系统,支持直接通信与任务协商,适用于小规模团队的紧密协作。
 - <strong>ANP (Agent Network Protocol)</strong>: 作为智能体的“互联网”,提供服务发现、路由与负载均衡机制,适用于构建大规模、开放的智能体网络。
 
-<strong>HelloAgents的集成方案</strong>
+<strong>HelloAgents 的集成方案</strong>
 
 在`HelloAgents`框架中,这三种协议被统一抽象为工具(Tool),实现了无缝集成,允许开发者灵活地为智能体添加不同层级的通信能力:
 
@@ -2370,23 +2374,23 @@ agent.add_tool(ANPTool(...))
 
 <strong>实战经验总结</strong>
 
-- 优先利用成熟的社区MCP服务,以减少不必要的重复开发。
-- 根据系统规模选择合适的协议:小规模协作场景推荐使用A2A,大规模网络场景则应采用ANP。
+- 优先利用成熟的社区 MCP 服务,以减少不必要的重复开发。
+- 根据系统规模选择合适的协议:小规模协作场景推荐使用 A2A,大规模网络场景则应采用 ANP。
 
 完成本章后,建议你:
 
 1. <strong>动手实践</strong>:
-   - 构建自己的MCP服务器
-   - 利用协议创建多Agent协作系统
-   - MCP、A2A与ANP的组合应用策略
+   - 构建自己的 MCP 服务器
+   - 利用协议创建多 Agent 协作系统
+   - MCP、A2A 与 ANP 的组合应用策略
 2. <strong>深入学习</strong>:
-   - 阅读MCP官方文档:https://modelcontextprotocol.io
-   - 阅读A2A官方文档:https://a2a-protocol.org/latest/
-   - 阅读ANP官方文档:https://agent-network-protocol.com/guide/
+   - 阅读 MCP 官方文档:https://modelcontextprotocol.io
+   - 阅读 A2A 官方文档:https://a2a-protocol.org/latest/
+   - 阅读 ANP 官方文档:https://agent-network-protocol.com/guide/
 3. <strong>参与社区</strong>:
-   - 向社区贡献新的MCP服务
+   - 向社区贡献新的 MCP 服务
    - 分享个人开发的智能体实现案例
-   - 参与相关协议的技术标准讨论,也可以在Issue提问或是直接帮助Helloagents支持新的example案例
+   - 参与相关协议的技术标准讨论,也可以在 Issue 提问或是直接帮助 Helloagents 支持新的 example 案例
 
 <strong>恭喜你完成第十章的学习!</strong>
 
@@ -2396,38 +2400,38 @@ agent.add_tool(ANPTool(...))
 
 > <strong>提示</strong>:部分习题没有标准答案,重点在于培养学习者对智能体通信协议的综合理解和实践能力。
 
-1. 本章介绍了三种智能体通信协议:MCP、A2A和ANP。请分析:
+1. 本章介绍了三种智能体通信协议:MCP、A2A 和 ANP。请分析:
 
-   - 在10.1.2节中对比了三种协议的设计理念。请深入分析:为什么MCP强调"上下文共享",A2A强调"对话式协作",而ANP强调"网络拓扑"?这些设计理念分别解决了什么核心问题?
+   - 在 10.1.2 节中对比了三种协议的设计理念。请深入分析:为什么 MCP 强调"上下文共享",A2A 强调"对话式协作",而 ANP 强调"网络拓扑"?这些设计理念分别解决了什么核心问题?
    - 假设你要构建一个"智能客服系统",需要以下功能:(1)访问客户数据库和订单系统;(2)多个专业客服智能体协作处理复杂问题;(3)支持大规模并发用户请求。请为每个功能选择最合适的协议,并说明理由。
-   - 三种协议是否可以组合使用?请设计一个实际应用场景,展示如何同时使用MCP、A2A和ANP来构建一个完整的智能体系统。画出系统架构图并说明各协议的职责。
+   - 三种协议是否可以组合使用?请设计一个实际应用场景,展示如何同时使用 MCP、A2A 和 ANP 来构建一个完整的智能体系统。画出系统架构图并说明各协议的职责。
 
-2. MCP(Model Context Protocol)是智能体与工具通信的标准协议。基于10.2节的内容,请深入思考:
+2. MCP(Model Context Protocol)是智能体与工具通信的标准协议。基于 10.2 节的内容,请深入思考:
 
    > <strong>提示</strong>:这是一道动手实践题,建议实际操作
 
-   - 在10.2.3节的MCP服务器实现中,我们定义了`list_tools`、`call_tool`等核心方法。请扩展这个实现,添加一个新的MCP服务器,提供以下工具:(1)数据库查询工具;(2)数据可视化工具;(3)报表生成工具。要求工具之间能够协作完成复杂的数据分析任务。
-   - MCP协议支持"资源"(Resources)和"提示"(Prompts)两个重要概念,但本章主要聚焦于"工具"(Tools)。请查阅MCP官方文档,了解Resources和Prompts的设计目的,并设计一个应用场景,展示如何利用这三个核心概念构建更强大的智能体系统。
-   - MCP使用JSON-RPC 2.0作为底层通信协议,通过stdio进行进程间通信。请分析:这种设计有什么优势和局限性?如果需要支持远程MCP服务器(通过HTTP/WebSocket访问),应该如何扩展当前的实现?
+   - 在 10.2.3 节的 MCP 服务器实现中,我们定义了`list_tools`、`call_tool`等核心方法。请扩展这个实现,添加一个新的 MCP 服务器,提供以下工具:(1)数据库查询工具;(2)数据可视化工具;(3)报表生成工具。要求工具之间能够协作完成复杂的数据分析任务。
+   - MCP 协议支持"资源"(Resources)和"提示"(Prompts)两个重要概念,但本章主要聚焦于"工具"(Tools)。请查阅 MCP 官方文档,了解 Resources 和 Prompts 的设计目的,并设计一个应用场景,展示如何利用这三个核心概念构建更强大的智能体系统。
+   - MCP 使用 JSON-RPC 2.0 作为底层通信协议,通过 stdio 进行进程间通信。请分析:这种设计有什么优势和局限性?如果需要支持远程 MCP 服务器(通过 HTTP/WebSocket 访问),应该如何扩展当前的实现?
 
-3. A2A(Agent-to-Agent Protocol)支持智能体间的对话式协作。基于10.3节的内容,请完成以下扩展实践:
+3. A2A(Agent-to-Agent Protocol)支持智能体间的对话式协作。基于 10.3 节的内容,请完成以下扩展实践:
 
    > <strong>提示</strong>:这是一道动手实践题,建议实际操作
 
-   - 在10.3.4节的"研究团队"案例中,研究员和撰写员通过A2A协议协作完成论文写作。请扩展这个案例,添加第三个智能体"审稿人"(Reviewer),它能够评审论文质量并提出修改建议。设计三个智能体之间的协作流程,并实现完整的代码。
-   - A2A协议定义了`task`、`task_result`等消息类型。请分析:如果协作过程中出现冲突(如两个智能体对同一问题有不同意见),应该如何设计冲突解决机制?请扩展A2A协议,添加"协商"(negotiation)和"投票"(voting)等消息类型。
-   - 对比A2A协议与第六章介绍的AutoGen、CAMEL等多智能体框架:A2A作为标准协议,与这些框架的关系是什么?它们能否互相替代?请设计一个方案,让基于A2A协议的智能体能够与AutoGen框架中的智能体进行通信。
+   - 在 10.3.4 节的"研究团队"案例中,研究员和撰写员通过 A2A 协议协作完成论文写作。请扩展这个案例,添加第三个智能体"审稿人"(Reviewer),它能够评审论文质量并提出修改建议。设计三个智能体之间的协作流程,并实现完整的代码。
+   - A2A 协议定义了`task`、`task_result`等消息类型。请分析:如果协作过程中出现冲突(如两个智能体对同一问题有不同意见),应该如何设计冲突解决机制?请扩展 A2A 协议,添加"协商"(negotiation)和"投票"(voting)等消息类型。
+   - 对比 A2A 协议与第六章介绍的 AutoGen、CAMEL 等多智能体框架:A2A 作为标准协议,与这些框架的关系是什么?它们能否互相替代?请设计一个方案,让基于 A2A 协议的智能体能够与 AutoGen 框架中的智能体进行通信。
 
-4. ANP(Agent Network Protocol)支持大规模智能体网络。基于10.4节的内容,请深入分析:
+4. ANP(Agent Network Protocol)支持大规模智能体网络。基于 10.4 节的内容,请深入分析:
 
-   - 在10.4.2节中介绍了ANP的网络拓扑设计,包括星型、网状、分层等结构。请分析:在什么场景下应该选择哪种拓扑结构?如果网络规模从10个智能体扩展到1000个智能体,拓扑结构应该如何演进?
-   - ANP协议支持"路由"(routing)和"发现"(discovery)机制,让智能体能够动态找到合适的协作伙伴。请设计一个"智能路由算法":根据任务类型、智能体能力、网络负载等因素,自动选择最优的消息路由路径。
-   - 在10.4.4节的"智能城市"案例中,多个智能体协作管理城市系统。请思考:如果某个关键智能体(如交通管理智能体)出现故障,整个系统应该如何应对?请设计一个"容错机制",包括故障检测、备份切换、状态恢复等功能。
+   - 在 10.4.2 节中介绍了 ANP 的网络拓扑设计,包括星型、网状、分层等结构。请分析:在什么场景下应该选择哪种拓扑结构?如果网络规模从 10 个智能体扩展到 1000 个智能体,拓扑结构应该如何演进?
+   - ANP 协议支持"路由"(routing)和"发现"(discovery)机制,让智能体能够动态找到合适的协作伙伴。请设计一个"智能路由算法":根据任务类型、智能体能力、网络负载等因素,自动选择最优的消息路由路径。
+   - 在 10.4.4 节的"智能城市"案例中,多个智能体协作管理城市系统。请思考:如果某个关键智能体(如交通管理智能体)出现故障,整个系统应该如何应对?请设计一个"容错机制",包括故障检测、备份切换、状态恢复等功能。
 
 5. 智能体通信协议的安全性和隐私保护是实际应用中的关键问题。请思考:
 
-   - 在10.2.4节的MCP客户端实现中,智能体可以调用MCP服务器提供的任何工具。请分析:这种设计存在什么安全风险?如果MCP服务器提供了危险操作(如删除文件、执行系统命令),应该如何设计权限控制机制?
-   - A2A和ANP协议涉及多个智能体之间的通信,可能包含敏感信息(如用户隐私数据、商业机密)。请设计一个"端到端加密"方案:确保消息在传输过程中不被窃听或篡改,同时支持智能体身份认证和访问控制。
+   - 在 10.2.4 节的 MCP 客户端实现中,智能体可以调用 MCP 服务器提供的任何工具。请分析:这种设计存在什么安全风险?如果 MCP 服务器提供了危险操作(如删除文件、执行系统命令),应该如何设计权限控制机制?
+   - A2A 和 ANP 协议涉及多个智能体之间的通信,可能包含敏感信息(如用户隐私数据、商业机密)。请设计一个"端到端加密"方案:确保消息在传输过程中不被窃听或篡改,同时支持智能体身份认证和访问控制。
    - 在大规模智能体网络中,恶意智能体可能会发送虚假信息、发起拒绝服务攻击或窃取其他智能体的数据。请设计一个"信任评估系统":根据智能体的历史行为、协作质量、社区评价等因素,动态评估每个智能体的可信度,并据此调整通信策略。
 
 ## 参考文献

+ 2696 - 0
docs/chapter11/Chapter11-Agentic-RL.md

@@ -0,0 +1,2696 @@
+<div align="right">
+  English | <a href="./第十一章%20Agentic-RL.md">中文</a>
+</div>
+
+# Chapter 11 Agentic-RL
+
+## 11.1 From LLM Training to Agentic RL
+
+In previous chapters, we implemented various agent paradigms and communication protocols. However, these agents often perform poorly on more complex tasks, which naturally raises several questions: **How can we give agents stronger reasoning capabilities? How can we teach agents to use tools more effectively? How can we enable agents to improve themselves?**
+
+This is precisely the core problem that Agentic RL (agent training based on reinforcement learning) aims to solve. This chapter will introduce reinforcement learning training capabilities to the HelloAgents framework, enabling you to train agents with advanced capabilities such as reasoning and tool use. We will start from the basics of LLM training and gradually delve into practical techniques such as Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), ultimately building a complete agent training pipeline.
+
+### 11.1.1 From Reinforcement Learning to Agentic RL
+
+In Section 2.4.2 of Chapter 2, we introduced agents based on reinforcement learning. Reinforcement Learning (RL) is a learning paradigm focused on solving sequential decision-making problems. The agent interacts directly with the environment and learns, through trial and error, how to maximize long-term reward.
+
+Now, let's apply this framework to LLM agents. Consider a mathematical problem-solving agent that needs to answer questions like this:
+
+```
+Question: Janet's ducks lay 16 eggs per day. She eats three for breakfast
+every morning and bakes muffins for her friends every day with four.
+She sells the remainder at the farmers' market daily for $2 per fresh
+duck egg. How much in dollars does she make every day at the farmers' market?
+```
+
+This problem requires multi-step reasoning: first calculate the number of eggs Janet has left each day (16 - 3 - 4 = 9), then calculate her income (9 × 2 = 18). We can map this task to the reinforcement learning framework:
+
+- **Agent**: LLM-based reasoning system
+- **Environment**: Mathematical problems and verification system
+- **State**: Current problem description and existing reasoning steps
+- **Action**: Generate next reasoning step or final answer
+- **Reward**: Whether the answer is correct (correct +1, incorrect 0)
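
The mapping above can be made concrete with a minimal sketch of the reward signal. The function names (`extract_final_number`, `correctness_reward`) and the convention of treating the last number in a response as its final answer are illustrative assumptions, not part of the HelloAgents API.

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number that appears in the text, or None if there is none.

    Assumption: the final answer is the last number mentioned in the response.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def correctness_reward(response: str, ground_truth: float) -> float:
    """Binary reward: 1.0 if the final number matches the ground truth, else 0.0."""
    answer = extract_final_number(response)
    if answer is None:
        return 0.0
    return 1.0 if abs(answer - ground_truth) < 1e-6 else 0.0


# A hypothetical model response to the duck-egg question.
response = "Eggs left: 16 - 3 - 4 = 9. Income: 9 * 2 = 18. The answer is 18"
print(correctness_reward(response, 18.0))
```

A sparse, outcome-only reward like this is easy to verify automatically, which is one reason math problems are a convenient starting point for RL training.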
+
+Traditional supervised learning methods have three core limitations. First, data quality completely determines training quality: the model can only imitate its training data, so it rarely surpasses that data. Second, the model lacks the ability to explore and only passively learns the paths that humans provide. Third, long-term goals are hard to optimize, because the intermediate steps of multi-step reasoning cannot be optimized precisely.
+
+Reinforcement learning opens up new possibilities. By letting agents autonomously generate multiple candidate answers and receive rewards based on correctness, they can learn which reasoning paths are better, which steps are critical, and even discover problem-solving methods that are better than the human annotations<sup>[8]</sup>. This is the core idea of Agentic RL: treat the LLM as a learnable policy, embed it in the agent's perception-decision-execution loop, and optimize multi-step task performance through reinforcement learning.
+
+### 11.1.2 LLM Training Landscape
+
+Before diving into Agentic RL, we first need to understand the complete LLM training process. Training a powerful LLM (such as GPT, Claude, or Qwen) typically involves two main stages: pretraining and post-training. As shown in Figure 11.1, these two stages form the complete evolutionary path of an LLM from "language model" to "conversational assistant".
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-1.png" alt="" width="85%"/>
+  <p>Figure 11.1 LLM Training Landscape</p>
+</div>
+
+The **pretraining stage** is the first stage of LLM training. Its goal is for the model to learn basic language patterns and world knowledge. This stage uses massive amounts of text data (usually at the terabyte scale) and trains the model through self-supervised learning. The most common pretraining task is causal language modeling, also known as next-token prediction.
+
+Given the preceding tokens $x_1, x_2, ..., x_{t-1}$, the model needs to predict the next token $x_t$:
+
+$$
+\mathcal{L}_{\text{pretrain}} = -\sum_{t=1}^{T} \log P(x_t | x_1, x_2, ..., x_{t-1}; \theta)
+$$
+
+Here $\theta$ denotes the model parameters and $P(x_t | x_1, ..., x_{t-1}; \theta)$ is the probability the model assigns to the next token. The goal is to minimize the negative log-likelihood, i.e., to maximize the probability of predicting the correct token. For example, given the text "The cat sat on the", the model should predict that the next word is most likely "mat". Through training on massive amounts of text, the model gradually learns grammar rules (which word sequences are valid), semantic knowledge (relationships between words), world knowledge (factual information about the world), and basic reasoning abilities.
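
To make the pretraining loss concrete, here is a minimal sketch that evaluates the negative log-likelihood for a handful of token probabilities. The probability values are made up for illustration and the helper name `next_token_nll` is an assumption; a real training loop would compute these probabilities from model logits.

```python
import math
from typing import Sequence


def next_token_nll(token_probs: Sequence[float]) -> float:
    """Sum of negative log-probabilities that the model assigned to the correct tokens."""
    return -sum(math.log(p) for p in token_probs)


# Hypothetical probabilities assigned to each correct next token of
# "The cat sat on the mat" (one value per predicted position).
confident = [0.9, 0.8, 0.9, 0.95, 0.7]
uncertain = [0.2, 0.1, 0.3, 0.25, 0.05]

print(f"confident model loss: {next_token_nll(confident):.3f}")
print(f"uncertain model loss: {next_token_nll(uncertain):.3f}")
```

The model that puts more probability mass on the correct tokens gets the lower loss, which is exactly what minimizing $\mathcal{L}_{\text{pretrain}}$ rewards.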
+
+The pretraining stage is characterized by massive data volume, high computational cost, self-supervised learning, and a focus on general language understanding and generation capabilities.
+
+The **post-training stage** addresses the shortcomings of pretrained models. Although pretrained models have powerful language capabilities, they are only "next word prediction" models: they do not know how to follow human instructions, generate helpful, harmless, and honest answers, refuse inappropriate requests, or interact with humans in a conversational manner. Post-training solves these problems by aligning the model with human preferences and values.
+
+Post-training typically includes three steps. The first step is **Supervised Fine-Tuning (SFT)**<sup>[15]</sup>, with the goal of making the model learn to follow instructions and dialogue formats. Training data consists of (prompt, completion) pairs, and the training objective is similar to pretraining, still maximizing the probability of correct output:
+
+$$
+\mathcal{L}_{\text{SFT}} = -\sum_{i=1}^{N} \log P(y_i | x_i; \theta)
+$$
+
+Here $x_i$ is the input prompt, $y_i$ is the expected output, and $N$ is the number of training samples. SFT needs a comparatively small amount of data, relies on manual annotation, produces results quickly, and mainly teaches task formats and basic capabilities.
+
+The second step is **Reward Modeling (RM)**. Although SFT models can follow instructions, the quality of their answers varies. We need a way to evaluate answer quality, which is the role of the reward model<sup>[13,14]</sup>. The reward model is trained on preference comparison data: each sample contains two answers to the same question, one better (chosen) and one worse (rejected). The training objective is to learn human preferences:
+
+$$
+\mathcal{L}_{\text{RM}} = -\mathbb{E}_{(x, y_w, y_l)} [\log \sigma(r_\phi(x, y_w) - r_\phi(x, y_l))]
+$$
+
+Here $r_\phi(x, y)$ is the reward model, which takes a (prompt, answer) pair as input and outputs a quality score; $y_w$ is the better answer (chosen), $y_l$ is the worse answer (rejected), and $\sigma$ is the sigmoid function. The goal is for the reward model to give higher scores to better answers.
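
The pairwise loss above can be checked numerically with a few lines of plain Python. This is a sketch for a single preference pair with hand-picked scores; the function name `pairwise_rm_loss` is an assumption, and a real reward model would produce the scores from a neural network.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function; adequate for the small score gaps used here."""
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_rm_loss(score_chosen: float, score_rejected: float) -> float:
    """-log(sigmoid(r_w - r_l)) for one (chosen, rejected) preference pair."""
    return -math.log(sigmoid(score_chosen - score_rejected))


# Hypothetical reward-model scores for the chosen and rejected answers.
print(f"correct ranking:   {pairwise_rm_loss(2.0, 0.0):.4f}")
print(f"no preference:     {pairwise_rm_loss(0.0, 0.0):.4f}")
print(f"inverted ranking:  {pairwise_rm_loss(0.0, 2.0):.4f}")
```

The loss shrinks as the chosen answer is scored further above the rejected one and grows when the ranking is inverted, so gradient descent pushes the two scores apart in the right direction.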
+
+The third step is **Reinforcement Learning Fine-tuning**. With the reward model in place, we can use reinforcement learning to optimize the language model so that it generates higher-quality answers. The most classic algorithm is PPO (Proximal Policy Optimization)<sup>[1]</sup>, with the training objective:
+
+$$
+\mathcal{L}_{\text{PPO}} = \mathbb{E}_{x, y \sim \pi_\theta} [r_\phi(x, y)] - \beta \cdot D_{KL}(\pi_\theta || \pi_{\text{ref}})
+$$
+
+Here $\pi_\theta$ is the current policy (the language model being trained), $\pi_{\text{ref}}$ is the reference policy (in this scenario, typically the SFT model), $r_\phi(x, y)$ is the reward model score, $D_{KL}$ is the KL divergence, which keeps the policy from drifting too far from the reference, and $\beta$ is the coefficient that balances the two terms. This objective is maximized: the model seeks higher reward while not deviating too far from the original model.
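
The trade-off between reward and KL penalty can be illustrated on a toy problem with only two possible answers. This sketch evaluates the regularized objective for fixed distributions; it is not a PPO implementation, and the distributions, rewards, and function names are all illustrative assumptions.

```python
import math
from typing import Dict


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) for discrete distributions defined over the same keys."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)


def regularized_objective(policy: Dict[str, float], reference: Dict[str, float],
                          rewards: Dict[str, float], beta: float) -> float:
    """Expected reward under the policy minus beta times KL(policy || reference)."""
    expected_reward = sum(policy[a] * rewards[a] for a in policy)
    return expected_reward - beta * kl_divergence(policy, reference)


# Toy setup: two candidate answers, one rewarded by the reward model.
reference = {"good": 0.5, "bad": 0.5}
rewards = {"good": 1.0, "bad": 0.0}
moderate = {"good": 0.8, "bad": 0.2}      # shifts toward the rewarded answer
extreme = {"good": 0.999, "bad": 0.001}   # almost deterministic

for name, policy in [("reference", reference), ("moderate", moderate), ("extreme", extreme)]:
    value = regularized_objective(policy, reference, rewards, beta=0.5)
    print(f"{name:>9}: objective = {value:.4f}")
```

With $\beta = 0.5$ the moderate policy scores higher than the extreme one, because the KL penalty outweighs the small extra reward; with $\beta = 0$ the ordering flips, which shows how $\beta$ controls the distance from the reference model.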
+
+Traditional RLHF (Reinforcement Learning from Human Feedback)<sup>[5]</sup> requires a large amount of manually annotated preference data, which is costly. To reduce costs, researchers proposed RLAIF (Reinforcement Learning from AI Feedback)<sup>[7]</sup>, which uses powerful AI models (such as GPT-4) in place of human annotators. The RLAIF workflow has four steps: use the SFT model to generate multiple candidate answers, use a powerful AI model to score and rank them, train a reward model on the AI scores, and run reinforcement learning against that reward model. Experiments show that RLAIF's effectiveness is close to or even exceeds that of RLHF, at significantly lower cost<sup>[11]</sup>.
+
+### 11.1.3 Core Philosophy of Agentic RL
+
+After understanding the basic training process of LLM, let's look at the difference between Agentic RL and traditional training methods. Traditional post-training (which we call PBRFT: Preference-Based Reinforcement Fine-Tuning) mainly focuses on optimizing single-turn dialogue quality: given a user question, the model generates an answer, then receives a reward based on answer quality. This approach is suitable for optimizing conversational assistants, but for agent tasks requiring multi-step reasoning, tool use, and long-term planning, it falls short.
+
+**Agentic RL** is a new paradigm that treats LLM as a learnable policy embedded in a sequential decision-making loop. In this framework, agents need to interact with the external world in dynamic environments, execute multi-step actions to complete complex tasks, obtain intermediate feedback to guide subsequent decisions, and optimize long-term cumulative rewards rather than single-step rewards.
+
+Let's understand this difference through a concrete example. In the PBRFT scenario, a user asks "Please explain what reinforcement learning is", the model generates a complete answer, and that answer is scored directly on its quality. In the Agentic RL scenario, a user requests "Help me analyze the code quality of this GitHub repository", and the agent must work through several steps: first it calls the GitHub API and successfully obtains the repository structure and file list (+0.1 reward); then it reads the main code files and obtains their contents (+0.1 reward); next it produces a sound analysis of the code quality (+0.2 reward); finally it generates a high-quality analysis report (+0.6 reward). The total reward is the sum over all steps: 1.0.
+
+As this example shows, the key features of Agentic RL are multi-step interaction, actions that change the environment state, feedback available at every step, and optimization of overall task completion quality.
+
+Reinforcement learning is formalized based on the Markov Decision Process (MDP) framework. MDP is defined by a five-tuple $(S, A, P, R, \gamma)$: state space $S$, action space $A$, state transition function $P(s'|s,a)$, reward function $R(s,a)$, discount factor $\gamma$. Let's compare PBRFT and Agentic RL from the MDP perspective, as shown in Table 11.1.
+
+<div align="center">
+  <p>Table 11.1 Comparison of PBRFT and Agentic RL</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-1.png" alt="" width="85%"/>
+</div>
+
+In terms of state, PBRFT's state $s_0$ consists only of the user prompt, the time span is $T=1$ (single step), and the state does not change: $s_0 = \text{prompt}$. Agentic RL's state $s_t$ contains historical observations and context, the time span is $T \gg 1$ (multi-step), and the state evolves with the actions taken: $s_t = (\text{prompt}, o_1, o_2, ..., o_t)$, where $o_t$ is the observation at step $t$ (such as tool return values or environment feedback).
+
+In terms of action, PBRFT's action space contains a single action type, text generation: $a = y \sim \pi_\theta(y|s_0)$. Agentic RL's action space includes text generation, tool invocation, environment operations, and other types: $a_t \in \{a_t^{\text{text}}, a_t^{\text{tool}}\}$, where, for example, $a_t^{\text{text}}$ generates a thinking process or an answer and $a_t^{\text{tool}}$ calls a calculator, search engine, or other tool.
+
+In terms of transition function, PBRFT has no meaningful state transition: the episode ends after a single action, which can be written as the degenerate transition $P(s'|s,a) = \delta(s' - s_{\text{terminal}})$. In Agentic RL the state changes dynamically with the actions and the environment: $s_{t+1} \sim P(s_{t+1}|s_t, a_t)$. For example, after the agent calls a search tool, the next state includes the search results.
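+
The state evolution described above can be sketched in a few lines of Python. This is an illustrative sketch, not HelloAgents code: `AgentState`, `step`, and `fake_search` are hypothetical names introduced here, and the environment is a toy function standing in for a real tool.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class AgentState:
    """State s_t = (prompt, o_1, ..., o_t)."""
    prompt: str
    observations: Tuple[str, ...] = ()


def step(state: AgentState, action: str, env: Callable[[str], str]) -> AgentState:
    """Apply one action and return s_{t+1}, which appends the new observation."""
    observation = env(action)
    return AgentState(state.prompt, state.observations + (observation,))


def fake_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool."""
    return f"results for: {query}"


s0 = AgentState(prompt="What is the capital of France?")
s1 = step(s0, "search: capital of France", fake_search)
print(len(s0.observations), len(s1.observations))
```

In the PBRFT setting there would be no call to `step` at all: the episode ends after the single generated answer.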
+
+In terms of reward, PBRFT has only a single-step reward $r(s_0, a)$, given at the end of the task: $R_{\text{PBRFT}} = r(s_0, y)$, usually produced by a reward model, $r(s_0, y) = r_\phi(s_0, y)$. Agentic RL has multi-step rewards $r(s_t, a_t)$ and can give partial rewards at intermediate steps, so the return is:
+
+$$
+R_{\text{Agentic}} = \sum_{t=0}^{T} \gamma^t r(s_t, a_t)
+$$
+
+Here $\gamma \in [0,1]$ is the discount factor, and $r(s_t, a_t)$ can be a sparse reward (given only at task completion, such as +1 for a correct answer), a dense reward (given at each step, such as +0.1 for a successful tool call), or a combination of both.
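+
As a small worked example of the return formula, the sketch below computes the discounted sum for the per-step rewards of the repository-analysis example from earlier in this section. The function name `discounted_return` is introduced here for illustration only.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Compute sum_t gamma^t * r_t for one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Per-step rewards from the repository-analysis example in this section.
step_rewards = [0.1, 0.1, 0.2, 0.6]

undiscounted = discounted_return(step_rewards, gamma=1.0)
discounted = discounted_return(step_rewards, gamma=0.9)
print(round(undiscounted, 3), round(discounted, 4))
```

With $\gamma = 1$ the return is the plain sum 1.0 quoted in the example. With $\gamma = 0.9$ the large final reward is discounted by $0.9^3$, so the return drops to about 0.789.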
+
+In terms of objective function, PBRFT maximizes single-step expected reward:
+
+$$
+J_{\text{PBRFT}}(\theta) = \mathbb{E}_{s_0, y \sim \pi_\theta} [r(s_0, y)]
+$$
+
+While Agentic RL maximizes cumulative discounted reward:
+
+$$
+J_{\text{Agentic}}(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[\sum_{t=0}^{T} \gamma^t r(s_t, a_t)\right]
+$$
+
+Where $\tau = (s_0, a_0, s_1, a_1, ..., s_T)$ is the complete trajectory.
+
+This is not merely a difference in technical detail but a fundamental shift in thinking. PBRFT asks "how can the model generate a better single answer": it optimizes answer quality, focuses on language expression, and makes a single-step decision. Agentic RL asks "how can the agent complete a complex task": it optimizes task completion, focuses on action strategy, and plans over multiple steps. This shift lets an LLM evolve from a "conversational assistant" into an "autonomous agent" that actively seeks information, knows when and how to use external tools, is willing to take seemingly roundabout intermediate steps toward the final goal, and learns from its mistakes.
+
+Agentic RL aims to endow LLM agents with six core capabilities, as shown in Figure 11.2.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-2.png" alt="" width="85%"/>
+  <p>Figure 11.2 Six Core Capabilities of Agentic RL</p>
+</div>
+
+**Reasoning** refers to the process of logically deriving conclusions from given information, which is the core capability of agents. Traditional CoT prompting methods rely on few-shot examples with limited generalization ability; SFT can only imitate reasoning patterns in training data, making it difficult to innovate. The advantage of reinforcement learning is learning effective reasoning strategies through trial and error, discovering reasoning paths not in training data, learning when deep thinking is needed and when quick answers are possible. Reasoning tasks can be modeled as sequential decision problems. Given question $q$, the agent needs to generate reasoning chain $c = (c_1, c_2, ..., c_n)$ and final answer $a$. The reward function is typically designed as $r(q, c, a) = 1$ if $a = a^*$ else $0$, with training objective $\max_\theta \mathbb{E}_{q, (c,a) \sim \pi_\theta} [r(q, c, a)]$. Through this approach, the model learns to generate high-quality reasoning chains, not just memorize answers.
+
+**Tool Use** refers to the agent's ability to call external tools to complete tasks. In tool use tasks, the action space expands to $a_t \in \{a_t^{\text{think}}, a_t^{\text{tool}}\}$, where $a_t^{\text{think}}$ is generating thinking process, $a_t^{\text{tool}} = (\text{tool\_name}, \text{arguments})$ is calling tools. Reinforcement learning allows agents to learn when to use tools, which tool to choose, and how to combine multiple tools. For example, when solving math problems, agents need to learn when to use calculators, when to use code interpreters, and when to reason directly.
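+
The expanded action space can be sketched as two small data types plus a dispatcher. This is a minimal illustration under stated assumptions: `ThinkAction`, `ToolAction`, `TOOLS`, and `execute` are hypothetical names, and the single toy `calculator` tool stands in for real tools.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Union


@dataclass
class ThinkAction:
    """a_t^think: free-form reasoning text."""
    text: str


@dataclass
class ToolAction:
    """a_t^tool = (tool_name, arguments)."""
    tool_name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


Action = Union[ThinkAction, ToolAction]

# Hypothetical tool registry with a single toy tool.
TOOLS: Dict[str, Callable[..., Any]] = {
    "calculator": lambda a, b: a + b,
}


def execute(action: Action) -> str:
    """Return the observation produced by an action."""
    if isinstance(action, ToolAction):
        result = TOOLS[action.tool_name](**action.arguments)
        return f"tool result: {result}"
    # Thinking only adds to the context; it does not touch the outside world.
    return f"thought: {action.text}"


print(execute(ThinkAction("I should add the two numbers.")))
print(execute(ToolAction("calculator", {"a": 48, "b": 24})))
```

What reinforcement learning adds on top of such a dispatcher is the policy that decides, at each step, which of the two action types to emit and with which arguments.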
+
+**Memory** refers to the agent's ability to retain and reuse past information, which is crucial for long-term tasks. LLM's context window is limited, and static retrieval strategies (such as RAG) cannot be optimized for tasks. Reinforcement learning allows agents to learn memory management strategies: deciding which information is worth remembering, when to update memory, and when to delete outdated information. This is similar to human working memory, where we actively manage information in our brains, retaining important information and forgetting irrelevant information.
+
+**Planning** refers to the ability to formulate action sequences to achieve goals. Traditional CoT is linear thinking and cannot backtrack; prompt engineering uses static planning templates that are difficult to adapt to new situations. Reinforcement learning allows agents to learn dynamic planning: discovering effective action sequences through trial and error, learning to balance short-term and long-term benefits. For example, in multi-step tasks, agents may need to first execute some seemingly "detour" steps, such as collecting information, before ultimately completing the task.
+
+**Self-Improvement** refers to the agent's ability to review its own output, correct errors, and optimize strategies. Reinforcement learning allows agents to learn self-reflection: identifying their own errors, analyzing failure causes, and adjusting strategies. This capability enables agents to continuously improve without human intervention, similar to human "learning from mistakes".
+
+**Perception** refers to the ability to understand multimodal information. For example, reinforcement learning can enhance visual reasoning capabilities, allowing models to learn to use visual tools and learn visual planning. This enables agents to not only understand text but also understand and operate in the visual world.
+
+### 11.1.4 HelloAgents' Agentic RL Design
+
+After understanding the core philosophy of Agentic RL, let's see how to implement these capabilities in the HelloAgents framework.
+
+In terms of technology selection, we integrate the TRL (Transformer Reinforcement Learning) framework<sup>[9]</sup> and use the Qwen3-0.6B model<sup>[10]</sup>. TRL is Hugging Face's reinforcement learning library; it is mature, stable, feature-complete, and easy to integrate. Qwen3-0.6B is a small language model from Alibaba Cloud; at 0.6B parameters it can be trained on an ordinary GPU, performs well for its size, and is free and open source.
+
+HelloAgents' Agentic RL module adopts a four-layer architecture design, as shown in Figure 11.3.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-3.png" alt="" width="85%"/>
+  <p>Figure 11.3 HelloAgents Agentic RL Architecture</p>
+</div>
+
+From bottom to top, the four layers are:
+
+- **Dataset Layer** (bottom): contains the `GSM8KDataset` class and the `create_sft_dataset()` and `create_rl_dataset()` functions, and is responsible for data loading and format conversion.
+- **Reward Function Layer**: contains the `MathRewardFunction` base class, the `AccuracyReward` accuracy reward, the `LengthPenaltyReward` length penalty, the `StepReward` step reward, and the convenience creation functions `create_*_reward()`. It defines what good behavior is.
+- **Trainer Layer**: contains `SFTTrainerWrapper` and `GRPOTrainerWrapper`, and is responsible for the specific training logic and LoRA support.
+- **Unified Interface Layer** (top): provides the unified training tool `RLTrainingTool`, which supports four operations: `action="train"` (train a model), `action="load_dataset"` (load a dataset), `action="create_reward"` (create a reward function), and `action="evaluate"` (evaluate a model).
+
+### 11.1.5 Quick Start Example
+
+Before diving in, let's quickly run through the complete training process. Because this chapter is heavy on theory and hands-on debugging can be tedious, we focus on learning to use the tools rather than building them. First, install the HelloAgents framework:
+
+```bash
+# Install HelloAgents framework (Chapter 11 version)
+pip install "hello-agents[rl]==0.2.5"
+
+# Or install from source
+cd HelloAgents
+pip install -e ".[rl]"
+```
+
+Then run the quick training example:
+
+```python
+import json
+
+from hello_agents.tools import RLTrainingTool
+
+# Create RL training tool
+rl_tool = RLTrainingTool()
+
+# 1. Quick test: SFT training (10 samples, 1 epoch)
+sft_result_str = rl_tool.run({
+    "action": "train",
+    "algorithm": "sft",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/quick_test_sft",
+    "max_samples": 10,      # Only use 10 samples for quick test
+    "num_epochs": 1,        # Only train 1 epoch
+    "batch_size": 2,
+    "use_lora": True        # Use LoRA to accelerate training
+})
+
+sft_result = json.loads(sft_result_str)
+print(f"\n✓ SFT training completed, model saved at: {sft_result['output_dir']}")
+
+# 2. GRPO training (5 samples, 1 epoch)
+grpo_result_str = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",  # Use base model
+    "output_dir": "./models/quick_test_grpo",
+    "max_samples": 5,       # Only use 5 samples for quick test
+    "num_epochs": 1,
+    "batch_size": 2,        # Batch size and num_generations (8) must satisfy a divisibility constraint; 2 is used here
+    "use_lora": True
+})
+
+grpo_result = json.loads(grpo_result_str)
+print(f"\n✓ GRPO training completed, model saved at: {grpo_result['output_dir']}")
+
+# 3. Evaluate model
+eval_result_str = rl_tool.run({
+    "action": "evaluate",
+    "model_path": "./models/quick_test_grpo",
+    "max_samples": 10,      # Evaluate on 10 test samples
+    "use_lora": True
+})
+
+eval_result = json.loads(eval_result_str)
+print(f"\n✓ Evaluation completed:")
+print(f"  - Accuracy: {eval_result['accuracy']}")
+print(f"  - Average reward: {eval_result['average_reward']}")
+print(f"  - Test samples: {eval_result['num_samples']}")
+
+print("\n" + "=" * 50)
+print("🎉 Congratulations! You have completed training your first Agentic RL model!")
+print("=" * 50)
+print(f"\nModel paths:")
+print(f"  SFT model: {sft_result['output_dir']}")
+print(f"  GRPO model: {grpo_result['output_dir']}")
+```
+
+This quick example demonstrates the complete training process: SFT training teaches the model the basic reasoning format and dialogue pattern, GRPO training optimizes the reasoning strategy through reinforcement learning to improve accuracy, and model evaluation measures the result on the test set. Do not be surprised if accuracy is very low after this run: the model has seen only 10 SFT samples and 5 GRPO samples (well under 1% of the 7,473 training samples) and trained for a single epoch.
+
+## 11.2 Datasets and Reward Functions
+
+Datasets and reward functions are the two cornerstones of reinforcement learning training. Datasets define the tasks the agent needs to learn, and reward functions define what good behavior is. In this section, we will learn how to prepare training data and design reward functions.
+
+### 11.2.1 GSM8K Mathematical Reasoning Dataset
+
+Mathematical reasoning is an ideal task for evaluating LLM reasoning capabilities. First, math problems have clear correct answers that can be automatically evaluated without manual annotation or complex reward models. Second, solving math problems requires decomposing problems and step-by-step derivation, which is a typical scenario for multi-step reasoning. Finally, learned reasoning capabilities can transfer to other domains with strong generalization. In contrast, open-ended Q&A tasks (such as "How to learn programming?") have answer quality that is difficult to objectively evaluate and requires extensive manual annotation.
+
+GSM8K (Grade School Math 8K)<sup>[4]</sup> is a high-quality dataset of grade school math word problems. As shown in Table 11.2, it contains 7,473 training samples and 1,319 test samples; the difficulty is at elementary school math level (grades 2-8), all problems are word problems, and each takes 2-8 reasoning steps to solve.
+
+<div align="center">
+  <p>Table 11.2 GSM8K Dataset Statistics</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-2.png" alt="" width="85%"/>
+</div>
+
+Let's look at a typical GSM8K problem:
+
+```
+Question: Natalia sold clips to 48 of her friends in April, and then she sold half
+      as many clips in May. How many clips did Natalia sell altogether in April
+      and May?
+
+Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May.
+      Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.
+      #### 72
+
+Final Answer: 72
+```
+
+This problem requires two steps of reasoning: first calculate the quantity sold in May (half of 48), then calculate the total (April + May). The `<<48/2=24>>` in the answer is a marker for intermediate calculation steps, and `#### 72` marks the final answer.
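+
The two markers described above are easy to handle with the standard library. The sketch below is a minimal illustration rather than the HelloAgents implementation; `parse_gsm8k_answer` is a hypothetical helper name.

```python
import re
from typing import Tuple


def parse_gsm8k_answer(answer: str) -> Tuple[str, str]:
    """Split a GSM8K answer into (clean reasoning text, final answer)."""
    reasoning, _, final = answer.partition("####")
    # Drop calculator annotations such as <<48/2=24>>.
    reasoning = re.sub(r"<<.*?>>", "", reasoning).strip()
    # Remove thousands separators so "1,000" becomes "1000".
    final = final.strip().replace(",", "")
    return reasoning, final


raw = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)
reasoning, final = parse_gsm8k_answer(raw)
print(final)
print(reasoning)
```

The cleaned reasoning text is what a supervised example would show the model, while the value after `####` is what a reward function compares against.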
+
+The GSM8K dataset needs to be converted to different formats to adapt to different training methods, as shown in Figure 11.4.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-4.png" alt="" width="85%"/>
+  <p>Figure 11.4 GSM8K Data Format Conversion</p>
+</div>
+
+
+The original format comes directly from the dataset and contains the question and the answer (with solution steps), which makes it suitable for human reading. The SFT format is used for supervised fine-tuning: the question is converted into a dialogue-format prompt, and the complete solution becomes the completion. For example:
+
+```python
+{
+    "prompt": "<|im_start|>user\nNatalia sold clips to 48 of her friends...<|im_end|>\n<|im_start|>assistant\n",
+    "completion": "Let me solve this step by step.\n\nStep 1: ...\n\nFinal Answer: 72<|im_end|>"
+}
+```
+
+The key points: the prompt uses the model's dialogue template (such as Qwen's `<|im_start|>` marker) and contains the user question, while the completion contains the complete solution process and the answer. This way the model learns both how to format its output and how to reason step by step.
+
+RL format is used for reinforcement learning, only providing questions and correct answers, not solution processes. For example:
+
+```python
+{
+    "prompt": "<|im_start|>user\nNatalia sold clips to 48 of her friends...<|im_end|>\n<|im_start|>assistant\n",
+    "ground_truth": "72"
+}
+```
+
+The key points: the prompt is the same as in the SFT format, but `ground_truth` contains only the final answer (used to calculate the reward), so the model must generate the complete reasoning process itself. This design pushes the model to learn autonomous reasoning rather than simply memorizing answers.
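+
The two record shapes can be built from one raw question/answer pair as sketched below. The helper names `to_sft_record` and `to_rl_record` are hypothetical, and the template string is written out by hand to mirror the examples above; in practice the prompt would normally come from the tokenizer's own chat template rather than a hard-coded string.

```python
from typing import Dict

# Qwen-style chat markers, copied from the example records above.
USER_TEMPLATE = "<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"


def to_sft_record(question: str, solution: str) -> Dict[str, str]:
    """SFT: the completion carries the full worked solution."""
    return {
        "prompt": USER_TEMPLATE.format(question=question),
        "completion": f"{solution}<|im_end|>",
    }


def to_rl_record(question: str, final_answer: str) -> Dict[str, str]:
    """RL: only the final answer is kept, for computing the reward."""
    return {
        "prompt": USER_TEMPLATE.format(question=question),
        "ground_truth": final_answer,
    }


question = "What is 48 + 24?"
sft = to_sft_record(question, "Step 1: 48 + 24 = 72\n\nFinal Answer: 72")
rl = to_rl_record(question, "72")
print(sft["prompt"] == rl["prompt"])
print(sorted(rl))
```

Both records share the same prompt; they differ only in what supervision travels with it.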
+
+As shown in Table 11.3, the three formats each have their uses.
+
+<div align="center">
+  <p>Table 11.3 Data Format Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-3.png" alt="" width="85%"/>
+</div>
+
+HelloAgents provides convenient dataset loading functions. Let's load and view the dataset through code:
+
+```python
+from hello_agents.tools import RLTrainingTool
+import json
+
+# Create tool
+rl_tool = RLTrainingTool()
+
+# 1. Load SFT format dataset
+sft_result = rl_tool.run({
+    "action": "load_dataset",
+    "format": "sft",
+    "max_samples": 5  # Only load 5 samples to view
+})
+sft_data = json.loads(sft_result)
+
+print(f"Dataset size: {sft_data['dataset_size']}")
+print(f"Data format: {sft_data['format']}")
+print(f"Sample keys: {sft_data['sample_keys']}")
+
+# 2. Load RL format dataset
+rl_result = rl_tool.run({
+    "action": "load_dataset",
+    "format": "rl",
+    "max_samples": 5
+})
+rl_data = json.loads(rl_result)
+
+print(f"Dataset size: {rl_data['dataset_size']}")
+print(f"Data format: {rl_data['format']}")
+print(f"Sample keys: {rl_data['sample_keys']}")
+```
+
+As can be seen, SFT format contains complete solution processes for supervised learning; RL format only contains final answers, and the model needs to generate reasoning processes itself. The `max_samples` parameter controls the number of samples loaded, convenient for quick testing.
+
+### 11.2.2 Reward Function Design
+
+Reward functions are the core of reinforcement learning, defining what "good behavior" is. A good reward function can guide agents to learn correct strategies, while a poor reward function may lead to training failure or learning wrong behaviors.
+
+In reinforcement learning, the reward function $r(s, a)$ or $r(s, a, s')$ assigns a numerical reward to each action of the agent. The agent's goal is to maximize cumulative reward:
+
+$$
+J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[\sum_{t=0}^{T} \gamma^t r(s_t, a_t)\right]
+$$
+
+For mathematical reasoning tasks, we can simplify to:
+
+$$
+r(q, a) = f(a, a^*)
+$$
+
+Where $q$ is the question, $a$ is the answer generated by the model, $a^*$ is the correct answer, and $f$ is the evaluation function.
+
+Reward function design directly affects training effectiveness. A good reward function clearly defines success, provides an informative learning signal (for example, partial credit rather than all-or-nothing), keeps variance manageable, and is easy to adjust and combine. A poor reward function may give rewards only at the end of the task with no intermediate feedback, be open to reward hacking (the agent finds a "cheating" way to earn high rewards), mix conflicting objectives, or have such high variance that training fails to converge.
+
+HelloAgents provides three built-in reward functions that can be used individually or in combination, as shown in Figure 11.5.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-5.png" alt="" width="85%"/>
+  <p>Figure 11.5 Reward Function Design</p>
+</div>
+
+**(1) Accuracy Reward**
+
+Accuracy Reward (AccuracyReward) is the most basic reward function, only caring whether the answer is correct. Mathematical definition:
+
+$$
+r_{\text{acc}}(a, a^*) = \begin{cases}
+1 & \text{if } a = a^* \\
+0 & \text{otherwise}
+\end{cases}
+$$
+
+Where $a$ is the answer generated by the model and $a^*$ is the correct answer. This is a binary reward function, getting 1 point for correct answers and 0 for incorrect ones.
+
+Implementation requires handling answer extraction and comparison. Model output may contain large amounts of text, and we need to extract the final answer. Common extraction methods include: finding numbers after "Final Answer:", finding numbers after "####" marker, using regular expressions to extract the last number. Answer comparison needs to handle numerical precision (such as 72.0 and 72 should be considered the same), unit conversion (such as 1000 and 1k), and format differences (such as "72" and "seventy-two").
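+
The extraction order and the tolerant comparison described above can be sketched as follows. This is an illustrative sketch, not the `AccuracyReward` source: the helper names are hypothetical, and it covers only numeric precision and thousands separators, not unit conversion ("1k") or spelled-out numbers ("seventy-two").

```python
import re
from typing import Optional

# A number with optional sign, thousands separators, and decimal part.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def extract_answer(text: str) -> Optional[float]:
    """Try 'Final Answer:', then '####', then the last number in the text."""
    for pattern in (rf"Final Answer:\s*({NUMBER})", rf"####\s*({NUMBER})"):
        match = re.search(pattern, text)
        if match:
            return float(match.group(1).replace(",", ""))
    numbers = re.findall(NUMBER, text)
    if numbers:
        return float(numbers[-1].replace(",", ""))
    return None


def is_correct(prediction: Optional[float], truth: str, tol: float = 1e-6) -> bool:
    """Compare numerically so that 72.0 and 72 count as the same answer."""
    if prediction is None:
        return False
    return abs(prediction - float(truth.replace(",", ""))) <= tol


output = "Step 1: 48/2 = 24\nStep 2: 48+24 = 72\nFinal Answer: 72.0"
print(is_correct(extract_answer(output), "72"))
```

Falling back to the last number is convenient but fragile: a trailing remark containing a number would be picked up instead of the answer, which is one reason explicit markers are preferred.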
+
+Usage example:
+
+```python
+from hello_agents.tools import RLTrainingTool
+import json
+rl_tool = RLTrainingTool()
+
+# Create accuracy reward function
+reward_result = rl_tool.run({
+    "action": "create_reward",
+    "reward_type": "accuracy"
+})
+reward_data = json.loads(reward_result)
+
+print(f"Reward type: {reward_data['reward_type']}")
+print(f"Description: {reward_data['description']}")
+
+# Note: The create_reward operation of RLTrainingTool returns configuration information,
+# the actual reward function will be automatically created and used during training
+```
+
+Illustrative reward values for a few predictions (the snippet above only prints the reward configuration):
+
+```
+Prediction: 72, Ground truth: 72, Reward: 1.0
+Prediction: 72.0, Ground truth: 72, Reward: 1.0
+Prediction: 73, Ground truth: 72, Reward: 0.0
+```
+
+Advantages of the accuracy reward: it is simple and direct, easy to understand and implement, and well suited to tasks with a clear correct answer. Disadvantages: the reward is sparse, because only fully correct answers are rewarded; it cannot distinguish "close to correct" from "completely wrong"; and it may leave the model without useful feedback early in training.
+
+**(2) Length Penalty**
+
+Length Penalty (LengthPenaltyReward) encourages the model to generate concise answers, avoiding verbosity. Mathematical definition:
+
+$$
+r_{\text{length}}(a, a^*, l) = r_{\text{acc}}(a, a^*) \cdot \bigl(1 - \alpha \cdot \max(0, l - l_{\text{target}})\bigr)
+$$
+
+Here $l$ is the length of the generated text (character count or token count), $l_{\text{target}}$ is the target length, and $\alpha$ is the penalty coefficient (default 0.001). The penalty is applied only when the answer is correct, so the model gains nothing by producing short but incorrect answers.
+
+Design rationale: if the answer is incorrect, the reward is 0 regardless of length; if the answer is correct and the length is within the target, the reward is 1; if the answer is correct but too long, the reward is $1 - \alpha \cdot (l - l_{\text{target}})$. For example, with a target length of 200 characters, an actual length of 500 characters, and a penalty coefficient of 0.001, the reward is $1 - 0.001 \times (500 - 200) = 0.7$.
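+
The three cases in the design rationale translate directly into code. The function below is a sketch under stated assumptions, not the `LengthPenaltyReward` source: the name and signature are hypothetical, the defaults come from the worked example above, and the clamp at 0 is an added safeguard that the text does not mention.

```python
def length_penalty_reward(
    is_correct: bool,
    length: int,
    target_length: int = 200,
    alpha: float = 0.001,
) -> float:
    """Reward 0 for wrong answers; otherwise 1 minus a penalty on excess length."""
    if not is_correct:
        return 0.0
    excess = max(0, length - target_length)
    # Assumption: clamp at 0 so a very long correct answer never goes negative.
    return max(0.0, 1.0 - alpha * excess)


for correct, length in [(True, 50), (True, 200), (True, 500), (False, 50)]:
    print(correct, length, round(length_penalty_reward(correct, length), 3))
```

Gating on correctness first is what keeps the model from trading accuracy for brevity.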
+
+Usage example:
+
+```python
+# Create length penalty reward function
+reward_result = rl_tool.run({
+    "action": "create_reward",
+    "reward_type": "length_penalty",
+    "max_length": 1024,      # Maximum length
+    "penalty_weight": 0.001  # Penalty weight
+})
+reward_data = json.loads(reward_result)
+
+print(f"Reward type: {reward_data['reward_type']}")
+print(f"Description: {reward_data['description']}")
+print(f"Max length: {reward_data['max_length']}")
+print(f"Penalty weight: {reward_data['penalty_weight']}")
+```
+
+Illustrative reward values (target length 200, penalty coefficient 0.001):
+
+```
+Prediction: 72, Ground truth: 72, Length: 50, Reward: 1.000
+Prediction: 72, Ground truth: 72, Length: 200, Reward: 1.000
+Prediction: 72, Ground truth: 72, Length: 500, Reward: 0.700
+Prediction: 73, Ground truth: 72, Length: 50, Reward: 0.000
+```
+
+Advantages of the length penalty: it encourages concise expression, discourages redundant content, and helps control inference cost (shorter output means fewer tokens). Disadvantages: it may suppress detailed reasoning, the penalty coefficient needs careful tuning, and the optimal length varies greatly from task to task.
+
+**(3) Step Reward**
+
+Step Reward (StepReward) encourages the model to generate clear reasoning steps, improving interpretability. Mathematical definition:
+
+$$
+r_{\text{step}}(a, a^*, s) = r_{\text{acc}}(a, a^*) \cdot (1 + \beta \cdot s)
+$$
+
+Here $s$ is the number of detected reasoning steps and $\beta$ is the step reward coefficient (default 0.1). As with the length penalty, the step bonus is given only when the answer is correct; an incorrect answer scores 0 no matter how many steps it shows.
+
+Step detection methods include finding "Step 1:", "Step 2:" markers, counting newline characters, and using regular expressions to match reasoning patterns. For example, a correct answer with 3 clear steps gets a reward of $1 + 0.1 \times 3 = 1.3$.
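+
A minimal sketch of marker-based step detection follows. It is not the `StepReward` source: `count_steps` and `step_reward` are hypothetical names, and counting distinct "Step N:" markers is just one of the detection methods listed above.

```python
import re


def count_steps(text: str) -> int:
    """Count distinct 'Step N:' markers in the generated text."""
    return len(set(re.findall(r"Step\s+(\d+)\s*:", text)))


def step_reward(is_correct: bool, text: str, beta: float = 0.1) -> float:
    """Give the step bonus only when the final answer is correct."""
    if not is_correct:
        return 0.0
    return 1.0 + beta * count_steps(text)


solution = (
    "Step 1: 48 / 2 = 24\n"
    "Step 2: 48 + 24 = 72\n"
    "Step 3: state the result\n"
    "Final Answer: 72"
)
print(count_steps(solution), round(step_reward(True, solution), 2))
```

Counting distinct step numbers rather than raw matches makes it slightly harder to inflate the reward by repeating the same marker, although padding with extra numbered steps remains possible.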
+
+Usage example:
+
+```python
+# Create step reward function
+reward_result = rl_tool.run({
+    "action": "create_reward",
+    "reward_type": "step",
+    "step_bonus": 0.1  # 0.1 reward per step
+})
+reward_data = json.loads(reward_result)
+
+print(f"Reward type: {reward_data['reward_type']}")
+print(f"Description: {reward_data['description']}")
+print(f"Step bonus: {reward_data['step_bonus']}")
+```
+
+Illustrative reward values (step bonus 0.1):
+
+```
+Prediction: 72, Ground truth: 72, Steps: 0, Reward: 1.00
+Prediction: 72, Ground truth: 72, Steps: 2, Reward: 1.20
+Prediction: 72, Ground truth: 72, Steps: 5, Reward: 1.50
+Prediction: 73, Ground truth: 72, Steps: 5, Reward: 0.00
+```
+
+Advantages of the step reward: it encourages interpretable reasoning, makes generated answers easier to verify and debug, and helps the model learn systematic thinking. Disadvantages: the model may pad its output with redundant steps to collect more reward, step count has to be balanced against answer quality, and step detection may be inaccurate.
+
+In practical applications, we typically combine multiple reward functions to balance different objectives. Common combination strategies include:
+
+**Accuracy + Length Penalty**: Encourages concise correct answers, suitable for dialogue systems and Q&A systems. Formula:
+
+$$
+r = r_{\text{acc}} \cdot \bigl(1 - \alpha \cdot \max(0, l - l_{\text{target}})\bigr)
+$$
+
+**Accuracy + Step Reward**: Encourages detailed reasoning processes, suitable for educational scenarios and explainable AI. Formula:
+
+$$
+r = r_{\text{acc}} \cdot (1 + \beta \cdot s)
+$$
+
+**Three-way Balance**: Comprehensively optimizes answer quality, conciseness, and interpretability. Formula:
+
+$$
+r = r_{\text{acc}} \cdot \bigl(1 - \alpha \cdot \max(0, l - l_{\text{target}}) + \beta \cdot s\bigr)
+$$
+
+The weights $\alpha$ and $\beta$ need to be tuned carefully so that no single objective dominates the total reward.
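+
The three-way balance can be sketched as a single function. This follows the earlier rule that the length penalty and step bonus apply only to correct answers; the function name, signature, and default values are assumptions for illustration, not part of `RLTrainingTool`.

```python
def combined_reward(
    is_correct: bool,
    length: int,
    num_steps: int,
    target_length: int = 200,
    alpha: float = 0.001,
    beta: float = 0.1,
) -> float:
    """Accuracy, minus a length penalty, plus a step bonus (correct answers only)."""
    if not is_correct:
        return 0.0
    length_penalty = alpha * max(0, length - target_length)
    step_bonus = beta * num_steps
    return 1.0 - length_penalty + step_bonus


# A correct answer that is 100 characters over target and shows 3 steps.
print(round(combined_reward(True, length=300, num_steps=3), 3))
```

With these defaults, each extra step is worth as much as 100 characters of excess length, which shows how the ratio of $\beta$ to $\alpha$ decides whether the model is pushed toward longer step-by-step answers or shorter ones.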
+
+Usage example:
+
+```python
+# Combined reward function: accuracy + length penalty + step reward
+# Note: RLTrainingTool currently supports single reward type
+# Combined rewards need to be specified through reward_fn parameter in training configuration
+# This shows how to configure different types of reward functions
+
+# Accuracy reward
+accuracy_result = rl_tool.run({
+    "action": "create_reward",
+    "reward_type": "accuracy"
+})
+print("Accuracy reward:", json.loads(accuracy_result)['description'])
+
+# Length penalty reward
+length_result = rl_tool.run({
+    "action": "create_reward",
+    "reward_type": "length_penalty",
+    "max_length": 1024,
+    "penalty_weight": 0.001
+})
+print("Length penalty reward:", json.loads(length_result)['description'])
+
+# Step reward
+step_result = rl_tool.run({
+    "action": "create_reward",
+    "reward_type": "step",
+    "step_bonus": 0.1
+})
+print("Step reward:", json.loads(step_result)['description'])
+```
+
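+For example, a correct answer that exceeds the target length by 100 characters and contains 3 detected steps is scored as follows (with $\alpha = 0.001$ and $\beta = 0.1$):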
+
+```
+Combined reward: 1.200
+  - Accuracy: 1.0
+  - Length penalty: -0.100
+  - Step reward: +0.3
+```
+
+As shown in Table 11.4, different reward functions are suitable for different application scenarios.
+
+<div align="center">
+  <p>Table 11.4 Reward Function Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-4.png" alt="" width="85%"/>
+</div>
+
+### 11.2.3 Custom Datasets and Reward Functions
+
+Although HelloAgents provides the GSM8K dataset and common reward functions, in practical applications you may need to use your own dataset or design specific reward functions. This section will introduce how to extend the framework.
+
+Before using custom datasets, you need to understand the data requirements for two training formats:
+
+**SFT Format**: Used for supervised fine-tuning, needs to contain the following fields:
+- `prompt`: Input prompt (containing system and user messages)
+- `completion`: Expected output
+- `text`: Complete dialogue text (optional)
+
+**RL Format**: Used for reinforcement learning, needs to contain the following fields:
+- `question`: Original question
+- `prompt`: Input prompt (containing system and user messages)
+- `ground_truth`: Correct answer
+- `full_answer`: Complete answer (including reasoning process)
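+
Before passing custom data to training, it can help to check that every record carries the fields listed above. The checker below is a hypothetical helper, not part of HelloAgents; it treats the SFT `text` field as optional, as stated in the list.

```python
from typing import Dict, Iterable, List

# Required fields per format, taken from the lists above.
REQUIRED_FIELDS = {
    "sft": ("prompt", "completion"),
    "rl": ("question", "prompt", "ground_truth", "full_answer"),
}


def missing_fields(records: Iterable[Dict[str, str]], format_type: str) -> List[str]:
    """Return 'index:field' entries for every required field that is absent."""
    required = REQUIRED_FIELDS[format_type]
    problems = []
    for index, record in enumerate(records):
        for name in required:
            if name not in record:
                problems.append(f"{index}:{name}")
    return problems


sft_records = [{"prompt": "What is 2+2?", "completion": "2+2=4"}]
rl_records = [{"prompt": "What is 2+2?", "ground_truth": "4"}]
print(missing_fields(sft_records, "sft"))
print(missing_fields(rl_records, "rl"))
```

An empty list means the records have the expected shape; otherwise each entry names the record index and the missing field.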
+
+**(1) Converting with format_math_dataset**
+
+The simplest method is to prepare raw data containing `question` and `answer` fields, then use the `format_math_dataset()` function for automatic conversion:
+
+```python
+from datasets import Dataset
+from hello_agents.rl import format_math_dataset
+
+# 1. Prepare raw data
+custom_data = [
+    {
+        "question": "What is 2+2?",
+        "answer": "2+2=4. #### 4"
+    },
+    {
+        "question": "What is 5*3?",
+        "answer": "5*3=15. #### 15"
+    },
+    {
+        "question": "What is 10+7?",
+        "answer": "10+7=17. #### 17"
+    }
+]
+
+# 2. Convert to Dataset object
+raw_dataset = Dataset.from_list(custom_data)
+
+# 3. Convert to SFT format
+sft_dataset = format_math_dataset(
+    dataset=raw_dataset,
+    format_type="sft",
+    model_name="Qwen/Qwen3-0.6B"
+)
+print(f"SFT dataset: {len(sft_dataset)} samples")
+print(f"Fields: {sft_dataset.column_names}")
+
+# 4. Convert to RL format
+rl_dataset = format_math_dataset(
+    dataset=raw_dataset,
+    format_type="rl",
+    model_name="Qwen/Qwen3-0.6B"
+)
+print(f"RL dataset: {len(rl_dataset)} samples")
+print(f"Fields: {rl_dataset.column_names}")
+```
+
+**(2) Directly Passing Custom Dataset**
+
+When using RLTrainingTool, you can directly pass a custom dataset through the `custom_dataset` parameter:
+
+```python
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# SFT training
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "sft",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/custom_sft",
+    "num_epochs": 3,
+    "batch_size": 4,
+    "use_lora": True,
+    "custom_dataset": sft_dataset  # Directly pass custom dataset
+})
+
+# GRPO training
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/custom_grpo",
+    "num_epochs": 2,
+    "batch_size": 2,
+    "use_lora": True,
+    "custom_dataset": rl_dataset  # Directly pass custom dataset
+})
+```
+
+**(3) Registering Custom Dataset (Recommended)**
+
+For datasets that need to be used multiple times, registration is recommended:
+
+```python
+# 1. Register dataset
+rl_tool.register_dataset("my_math_dataset", rl_dataset)
+
+# 2. Use registered dataset
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "dataset": "my_math_dataset",  # Use registered dataset name
+    "output_dir": "./models/custom_grpo",
+    "num_epochs": 2,
+    "use_lora": True
+})
+```
+
+Reward functions are used to evaluate the quality of answers generated by the model. Custom reward functions need to follow this signature:
+
+```python
+from typing import List
+import re
+
+def custom_reward_function(
+    completions: List[str],
+    **kwargs
+) -> List[float]:
+    """
+    Custom reward function
+
+    Args:
+        completions: List of completion texts generated by the model
+        **kwargs: Other parameters, typically including:
+            - ground_truth: List of correct answers
+            - Other dataset fields
+
+    Returns:
+        List of reward values (each value between 0.0-1.0)
+    """
+    ground_truths = kwargs.get("ground_truth", [])
+    rewards = []
+
+    for completion, truth in zip(completions, ground_truths):
+        reward = 0.0
+
+        # Extract answer
+        numbers = re.findall(r'-?\d+\.?\d*', completion)
+        if numbers:
+            try:
+                pred = float(numbers[-1])
+                truth_num = float(truth)
+                error = abs(pred - truth_num)
+
+                # Give different rewards based on error
+                if error < 0.01:
+                    reward = 1.0  # Completely correct
+                elif error < 1.0:
+                    reward = 0.8  # Very close
+                elif error < 5.0:
+                    reward = 0.5  # Close
+
+                # Extra reward: encourage showing reasoning steps
+                if "step" in completion.lower() or "=" in completion:
+                    reward += 0.1
+
+            except ValueError:
+                reward = 0.0
+
+        rewards.append(min(reward, 1.0))  # Limit maximum value to 1.0
+
+    return rewards
+```
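+
+Before handing a reward function to training, it can help to check it against the contract described above (one reward per completion, each in the 0.0-1.0 range). The following is a minimal sketch; `check_reward_function` and `exact_match_reward` are hypothetical helpers written for this illustration and are not part of HelloAgents.
+
+```python
+from typing import Callable, List
+
+
+def exact_match_reward(completions: List[str], **kwargs) -> List[float]:
+    """Toy reward for illustration: 1.0 if the completion ends with the truth string."""
+    ground_truths = kwargs.get("ground_truth", [])
+    return [
+        1.0 if completion.strip().endswith(str(truth)) else 0.0
+        for completion, truth in zip(completions, ground_truths)
+    ]
+
+
+def check_reward_function(reward_fn: Callable[..., List[float]]) -> List[float]:
+    """Call reward_fn on a tiny batch and verify the length and range of its output."""
+    completions = ["2 + 2 = 4", "5 + 3 = 9"]
+    ground_truth = ["4", "8"]
+    rewards = reward_fn(completions, ground_truth=ground_truth)
+    assert len(rewards) == len(completions), "expected one reward per completion"
+    assert all(0.0 <= r <= 1.0 for r in rewards), "rewards should stay in [0.0, 1.0]"
+    return rewards
+
+
+print(check_reward_function(exact_match_reward))  # [1.0, 0.0]
+```
+
+A check like this also catches a missing `ground_truth` argument: because the reward functions above iterate with `zip`, an empty `ground_truth` list silently yields an empty reward list.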
+
+There are two ways to use custom reward functions:
+
+**(1) Direct Passing**
+
+```python
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/custom_grpo",
+    "custom_dataset": rl_dataset,
+    "custom_reward": custom_reward_function  # Directly pass reward function
+})
+```
+
+**(2) Registration (Recommended)**
+
+```python
+# 1. Register reward function under the dataset's name so it is matched automatically
+rl_tool.register_reward_function("my_math_dataset", custom_reward_function)
+
+# 2. Use registered reward function
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "dataset": "my_math_dataset",
+    "output_dir": "./models/custom_grpo"
+    # Reward function will automatically use registered function with same name as dataset
+})
+```
+
+Here is a complete example of custom dataset and reward function:
+
+```python
+from datasets import Dataset
+from hello_agents.tools import RLTrainingTool
+from hello_agents.rl import format_math_dataset
+import re
+from typing import List
+
+# 1. Prepare custom data
+custom_data = [
+    {"question": "What is 2+2?", "answer": "2+2=4. #### 4"},
+    {"question": "What is 5+3?", "answer": "5+3=8. #### 8"},
+    {"question": "What is 10+7?", "answer": "10+7=17. #### 17"}
+]
+
+# 2. Convert to training format
+raw_dataset = Dataset.from_list(custom_data)
+rl_dataset = format_math_dataset(raw_dataset, format_type="rl")
+
+# 3. Define custom reward function
+def tolerant_reward(completions: List[str], **kwargs) -> List[float]:
+    """Reward function with tolerance"""
+    ground_truths = kwargs.get("ground_truth", [])
+    rewards = []
+
+    for completion, truth in zip(completions, ground_truths):
+        numbers = re.findall(r'-?\d+\.?\d*', completion)
+        if numbers:
+            try:
+                pred = float(numbers[-1])
+                truth_num = float(truth)
+                error = abs(pred - truth_num)
+
+                if error < 0.01:
+                    reward = 1.0
+                elif error < 5.0:
+                    reward = 0.5
+                else:
+                    reward = 0.0
+            except ValueError:
+                reward = 0.0
+        else:
+            reward = 0.0
+
+        rewards.append(reward)
+
+    return rewards
+
+# 4. Create tool and register
+rl_tool = RLTrainingTool()
+rl_tool.register_dataset("my_dataset", rl_dataset)
+rl_tool.register_reward_function("my_dataset", tolerant_reward)
+
+# 5. Train
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "dataset": "my_dataset",
+    "output_dir": "./models/custom_grpo",
+    "num_epochs": 2,
+    "batch_size": 2,
+    "use_lora": True
+})
+```
+
+## 11.3 SFT Training
+
+Supervised Fine-Tuning (SFT) is the first step of reinforcement learning training and the most important foundation. SFT allows the model to learn the basic format of tasks, dialogue patterns, and preliminary reasoning capabilities. Without the foundation of SFT, directly conducting reinforcement learning often fails because the model doesn't even know the basic output format.
+
+### 11.3.1 Why SFT is Needed
+
+Before starting reinforcement learning, we first need SFT training. Although pretrained models have powerful language capabilities, they don't know how to complete specific tasks: their training objective is to predict the next token, not to solve math problems or use tools; their output is free text, while we need structured output (such as "Step 1: ..., Step 2: ..., Final Answer: ..."); and they haven't seen task-related data, so they don't know what a "good" reasoning process looks like.
+
+The role of SFT is to teach the model the basic rules of the task. First, it teaches the output format, so the model knows how to organize answers (such as using "Step 1" and "Final Answer" markers). Second, it teaches reasoning patterns: through examples, the model learns how to decompose problems and derive answers step by step. Third, it establishes baseline capabilities, providing a reasonable starting point for subsequent reinforcement learning. Finally, it reduces the exploration space: reinforcement learning doesn't need to start from scratch and can optimize on top of the SFT model.
+
+Let's understand the importance of SFT through a comparative experiment. Suppose we directly use a pretrained model to solve GSM8K problems:
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Load pretrained model
+model_name = "Qwen/Qwen3-0.6B"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+# Test question
+question = """Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"""
+
+# Construct input
+prompt = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+# Generate answer
+outputs = model.generate(**inputs, max_new_tokens=200)
+response = tokenizer.decode(outputs[0], skip_special_tokens=False)
+
+print("Pretrained model's answer:")
+print(response)
+```
+
+The pretrained model's output might be:
+
+```text
+<|im_start|>user
+Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?<|im_end|>
+<|im_start|>assistant
+I can help you with that. Let me think about this problem. Natalia sold clips to her friends. In April she sold 48 clips. In May she sold half as many. So we need to find out how many she sold in May and then add them together. But I'm not sure how to calculate half of 48. Maybe I should use a calculator? Or maybe I can just estimate...
+```
+
+As can be seen, although the pretrained model understands the question, its output is verbose, lacks structure, gives no clear answer, and follows a confused reasoning process. Such output cannot be used for reinforcement learning because we cannot extract an answer from it, and therefore cannot evaluate its quality or provide an effective reward signal.
+
+Now let's look at the model after SFT:
+
+```python
+# Load SFT model (assuming already trained)
+sft_model_path = "./models/sft_model"
+sft_model = AutoModelForCausalLM.from_pretrained(sft_model_path)
+
+# Use the same question
+outputs = sft_model.generate(**inputs, max_new_tokens=200)
+response = tokenizer.decode(outputs[0], skip_special_tokens=False)
+
+print("SFT model's answer:")
+print(response)
+```
+
+SFT model's output:
+
+```text
+<|im_start|>user
+Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?<|im_end|>
+<|im_start|>assistant
+Let me solve this step by step.
+
+Step 1: Calculate clips sold in May
+Natalia sold half as many clips in May as in April.
+Clips in May = 48 / 2 = 24
+
+Step 2: Calculate total clips
+Total = April + May = 48 + 24 = 72
+
+Final Answer: 72<|im_end|>
+```
+
+As can be seen, the SFT model's output has a clear structure (using "Step 1", "Step 2", and "Final Answer" markers), correct reasoning, a clear answer, and a uniform format. Such output can be used for reinforcement learning because we can extract answers, calculate rewards, and optimize the policy.
+
+As shown in Figure 11.6, SFT is the bridge from pretrained models to reinforcement learning.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-6.png" alt="" width="85%"/>
+  <p>Figure 11.6 Role of SFT in Training Pipeline</p>
+</div>
+
+### 11.3.2 LoRA: Parameter-Efficient Fine-Tuning
+
+Directly fine-tuning the entire model requires substantial computational resources and memory. For Qwen3-0.6B (0.6B parameters), full fine-tuning requires about 12GB memory (FP16) or 24GB memory (FP32). For larger models (such as 7B, 13B), full fine-tuning is almost impossible on consumer-grade GPUs.
+
+LoRA (Low-Rank Adaptation)<sup>[3]</sup> is a parameter-efficient fine-tuning method that only trains a small number of additional parameters while keeping the original model parameters frozen. The core idea of LoRA is: parameter changes during model fine-tuning can be represented by low-rank matrices.
+
+Assume the original model's weight matrix is $W \in \mathbb{R}^{d \times k}$, and the fine-tuned weight is $W' = W + \Delta W$. LoRA assumes $\Delta W$ can be decomposed into the product of two low-rank matrices:
+
+$$
+\Delta W = BA
+$$
+
+Where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, $r \ll \min(d, k)$ is the rank.
+
+During forward propagation, the output is:
+
+$$
+h = Wx + \Delta Wx = Wx + BAx
+$$
+
+The original model parameters $W$ remain frozen, only training $B$ and $A$.
+
+Parameter count comparison: original model parameter count is $d \times k$, LoRA parameter count is $d \times r + r \times k = r(d + k)$. When $r \ll \min(d, k)$, LoRA parameter count is much smaller than the original model. For example, for $d=4096, k=4096, r=8$, original model parameter count is $4096 \times 4096 = 16,777,216$, LoRA parameter count is $8 \times (4096 + 4096) = 65,536$, a 256-fold reduction in parameters!
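+
+The arithmetic above can be reproduced with a few lines of Python. This is a small sketch for checking the numbers; `lora_param_counts` is a hypothetical helper name introduced here for illustration.
+
+```python
+from typing import Tuple
+
+
+def lora_param_counts(d: int, k: int, r: int) -> Tuple[int, int, float]:
+    """Return (full parameter count, LoRA parameter count, reduction factor)."""
+    full = d * k        # parameters in the original weight matrix W
+    lora = r * (d + k)  # parameters in B (d x r) plus A (r x k)
+    return full, lora, full / lora
+
+
+full, lora, factor = lora_param_counts(d=4096, k=4096, r=8)
+print(f"full={full:,} lora={lora:,} reduction={factor:.0f}x")
+# full=16,777,216 lora=65,536 reduction=256x
+```
+
+Trying other ranks shows the trade-off directly: the LoRA parameter count grows linearly in $r$, so doubling the rank halves the reduction factor.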
+
+LoRA's advantages follow from this: significantly reduced memory usage, faster training, easy deployment (only the small adapter weights need to be stored and shipped), and a lower risk of overfitting because far fewer parameters are trained. The trade-off is that results are usually somewhat worse than full-parameter fine-tuning.
+
+Table 11.5 compares LoRA with full fine-tuning at different model scales.
+
+<div align="center">
+  <p>Table 11.5 LoRA vs Full Fine-Tuning Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-5.png" alt="" width="85%"/>
+</div>
+
+LoRA's key hyperparameters include:
+
+- Rank ($r$): controls the rank of the LoRA matrices. A larger rank means stronger expressiveness but more parameters. Typical values are 4-64, default 8.
+- Alpha ($\alpha$): the LoRA scaling factor. The actual update is $\Delta W = \frac{\alpha}{r} BA$, so $\alpha$ controls the strength of LoRA's influence. It is typically set to 1-2 times the rank; the examples in this chapter use 2 times the rank.
+- target_modules: specifies which layers LoRA is applied to. Attention layers (q_proj, k_proj, v_proj, o_proj) are the usual choice; MLP layers (gate_proj, up_proj, down_proj) can also be included.
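+
+To make the scaled update $\Delta W = \frac{\alpha}{r} BA$ concrete, here is a minimal pure-Python sketch of a LoRA forward pass on tiny matrices. The helper names (`matvec`, `lora_forward`) and all numbers are made up for illustration; real training code would use a tensor library. Initializing $B$ to zero, as is standard for LoRA, makes the adapted layer start out identical to the frozen one.
+
+```python
+from typing import List
+
+Matrix = List[List[float]]
+Vector = List[float]
+
+
+def matvec(m: Matrix, v: Vector) -> Vector:
+    """Multiply matrix m by vector v."""
+    return [sum(a * b for a, b in zip(row, v)) for row in m]
+
+
+def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
+                 alpha: float, r: int) -> Vector:
+    """Compute h = W x + (alpha / r) * B (A x); W stays frozen."""
+    base = matvec(W, x)               # frozen path
+    update = matvec(B, matvec(A, x))  # low-rank path
+    scale = alpha / r
+    return [b + scale * u for b, u in zip(base, update)]
+
+
+# Toy shapes: d = 2, k = 3, r = 1
+W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # d x k
+A = [[1.0, 1.0, 1.0]]                   # r x k
+B_zero = [[0.0], [0.0]]                 # d x r, zero-initialized
+B_trained = [[0.5], [-1.0]]             # d x r, after some hypothetical updates
+x = [1.0, 2.0, 3.0]
+
+h_start = lora_forward(W, A, B_zero, x, alpha=2.0, r=1)
+h_tuned = lora_forward(W, A, B_trained, x, alpha=2.0, r=1)
+print(h_start)  # [7.0, 2.0], identical to W x
+print(h_tuned)  # [13.0, -10.0]
+```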
+
+### 11.3.3 SFT Training Practice
+
+Now let's conduct SFT training using HelloAgents. The complete training process includes: preparing dataset, configuring LoRA, setting training parameters, starting training, and saving model.
+
+Basic training example:
+
+```python
+from hello_agents.tools import RLTrainingTool
+
+# Create training tool
+rl_tool = RLTrainingTool()
+
+# SFT training
+result = rl_tool.run({
+    # Training configuration
+    "action": "train",
+    "algorithm": "sft",
+
+    # Model configuration
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/sft_model",
+
+    # Data configuration
+    "max_samples": 100,     # Use 100 samples for quick test
+
+    # Training parameters
+    "num_epochs": 3,        # Train for 3 epochs
+    "batch_size": 4,        # Batch size
+    "learning_rate": 5e-5,  # Learning rate
+
+    # LoRA configuration
+    "use_lora": True,       # Use LoRA
+    "lora_rank": 8,         # LoRA rank
+    "lora_alpha": 16,       # LoRA alpha
+})
+
+print(f"\n✓ Training completed!")
+print(f"  - Model save path: {result['model_path']}")
+print(f"  - Training samples: {result['num_samples']}")
+print(f"  - Training epochs: {result['num_epochs']}")
+print(f"  - Final loss: {result['final_loss']:.4f}")
+```
+
+If the loss gradually decreases during training, it indicates the model is learning.
+
+**(1) Training Parameter Details**
+
+Let's understand the meaning and tuning suggestions for each training parameter in detail.
+
+**Data Parameters**:
+
+- `max_samples`: Number of training samples to use. For quick testing, use 100-1000 samples; for complete training, recommend using all data (7473 samples). More data usually brings better results, but training time is also longer.
+- `split`: Dataset split, default "train". Can be set to "train[:1000]" to use only the first 1000 samples.
+
+**Training Parameters**:
+
+- `num_epochs`: Number of training epochs. 1 epoch means traversing the entire dataset once. Too few (1-2 epochs) may underfit, too many (>10 epochs) may overfit. Recommend starting from 3 epochs, observe loss curve and adjust.
+- `batch_size`: Number of samples used per update. Larger is more stable but uses more memory. Recommend adjusting based on memory: 4GB memory use batch_size=1-2, 8GB memory use batch_size=4-8, 16GB memory use batch_size=8-16.
+- `learning_rate`: Learning rate, controls parameter update step size. Too small (1e-6) converges slowly, too large (1e-3) may not converge. SFT recommends 5e-5, LoRA can be slightly larger (1e-4).
+
+**LoRA Parameters**:
+
+- `use_lora`: Whether to use LoRA. Recommend always enabling unless there is sufficient memory.
+- `lora_rank`: LoRA rank, controls expressiveness. 4-8 suitable for small tasks, 16-32 suitable for complex tasks, 64 suitable for large-scale fine-tuning.
+- `lora_alpha`: LoRA scaling factor, usually set to 2 times the rank. When rank=8, alpha=16; when rank=16, alpha=32.
+
+**Optimizer Parameters**:
+
+- `optimizer`: Optimizer type, default "adamw". AdamW is the most commonly used choice, can also try "sgd" or "adafactor".
+- `weight_decay`: Weight decay, prevents overfitting. Default 0.01, can try 0.001-0.1.
+- `warmup_ratio`: Learning rate warmup ratio. The learning rate increases linearly during the first `warmup_ratio` fraction of training steps, then decays linearly. Default 0.1 (warmup for the first 10% of steps).
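+
+The warmup-then-decay schedule described for `warmup_ratio` can be sketched as a plain function of the step index. This is an illustrative sketch, not the HelloAgents implementation; `linear_warmup_decay` is a hypothetical name, and the exact rounding of the warmup step count is an assumption.
+
+```python
+def linear_warmup_decay(step: int, total_steps: int, peak_lr: float,
+                        warmup_ratio: float = 0.1) -> float:
+    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
+    warmup_steps = int(total_steps * warmup_ratio)
+    if warmup_steps > 0 and step < warmup_steps:
+        # Warmup phase: scale from 0 up to peak_lr
+        return peak_lr * step / warmup_steps
+    # Decay phase: scale from peak_lr down to 0
+    decay_steps = max(total_steps - warmup_steps, 1)
+    return peak_lr * max(total_steps - step, 0) / decay_steps
+
+
+for step in (0, 50, 100, 550, 1000):
+    lr = linear_warmup_decay(step, total_steps=1000, peak_lr=5e-5)
+    print(f"step {step:4d}: lr = {lr:.2e}")
+```
+
+With 1000 total steps and `warmup_ratio=0.1`, the learning rate reaches its peak at step 100 and returns to 0 at step 1000.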
+
+**(2) Complete Training Example**
+
+Let's conduct a complete SFT training using all data and best practices:
+
+```python
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# Complete SFT training
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "sft",
+
+    # Model configuration
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/sft_full",
+
+    # Data configuration
+    "max_samples": None,    # Use all data (7473 samples)
+
+    # Training parameters
+    "num_epochs": 3,
+    "batch_size": 8,
+    "learning_rate": 5e-5,
+    "warmup_ratio": 0.1,
+    "weight_decay": 0.01,
+
+    # LoRA configuration
+    "use_lora": True,
+    "lora_rank": 16,        # Use larger rank
+    "lora_alpha": 32,
+    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
+
+    # Other configurations
+    "save_steps": 500,      # Save every 500 steps
+    "logging_steps": 100,   # Log every 100 steps
+    "eval_steps": 500,      # Evaluate every 500 steps
+})
+
+print(f"Training completed! Model saved at: {result['model_path']}")
+```
+
+This configuration is suitable for training on GPUs with 8GB memory, estimated to take 30-60 minutes.
+
+**(3) Training Monitoring and Debugging**
+
+During training, we need to monitor three key metrics:
+
+- **Loss**: should gradually decrease. If it doesn't decrease, the learning rate may be too small or the data may have problems; if it decreases and then rises, the learning rate may be too large or the model may be overfitting.
+- **Gradient Norm**: should stay in a reasonable range of 0.1-10. Too large (>100) indicates gradient explosion and requires reducing the learning rate; too small (<0.01) indicates vanishing gradients and requires checking the model configuration.
+- **Learning Rate**: should follow the warmup strategy, increasing linearly for the first 10% of steps and then decaying linearly to 0.
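+
+The gradient-norm thresholds quoted above can be turned into a small check to run over logged values. This is only a sketch: `diagnose_grad_norm` is a hypothetical helper, and the "borderline" label for values between the stated ranges is an assumption, since the text does not classify them.
+
+```python
+def diagnose_grad_norm(grad_norm: float) -> str:
+    """Classify a gradient norm using the thresholds from the text."""
+    if grad_norm > 100:
+        return "exploding: reduce the learning rate"
+    if grad_norm < 0.01:
+        return "vanishing: check the model configuration"
+    if 0.1 <= grad_norm <= 10:
+        return "healthy"
+    # Values in (0.01, 0.1) or (10, 100] are not classified in the text
+    return "borderline: keep watching"
+
+
+for value in (0.005, 0.05, 1.2, 250.0):
+    print(f"{value}: {diagnose_grad_norm(value)}")
+```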
+
+Common problems during training and their solutions:
+
+- **Out of memory**: reduce batch_size or max_length, use gradient accumulation, or use a smaller model.
+- **Slow training**: increase batch_size, reduce logging frequency, or use mixed precision training.
+- **Loss doesn't decrease**: increase the learning rate, check the data format, or train for more epochs.
+- **Overfitting**: increase weight_decay, reduce the number of training epochs, or use more data.
+
+### 11.3.4 Model Evaluation
+
+After training is complete, we need to evaluate the model's effectiveness. Evaluation metrics include:
+
+- **Accuracy**: Proportion of completely correct answers, most direct metric, range 0-1, higher is better.
+
+- **Average Reward**: Average reward across all samples, comprehensively considering accuracy, length, steps and other factors, range depends on reward function design.
+
+- **Reasoning Quality**: Clarity and logic of reasoning process, requires manual evaluation or specialized evaluation models.
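+
+As a rough illustration of how accuracy can be computed, the sketch below extracts the number after a "Final Answer:" marker (the format shown in the SFT output earlier) and compares it with the ground truth. The helpers `extract_final_answer` and `accuracy` are hypothetical and are not the evaluation code used by HelloAgents.
+
+```python
+import re
+from typing import List, Optional
+
+
+def extract_final_answer(text: str) -> Optional[float]:
+    """Return the number following 'Final Answer:', or None if it is absent."""
+    match = re.search(r"Final Answer:\s*(-?\d+(?:\.\d+)?)", text)
+    return float(match.group(1)) if match else None
+
+
+def accuracy(completions: List[str], ground_truths: List[str],
+             tol: float = 1e-6) -> float:
+    """Fraction of completions whose extracted answer matches the ground truth."""
+    if not completions:
+        return 0.0
+    correct = 0
+    for completion, truth in zip(completions, ground_truths):
+        pred = extract_final_answer(completion)
+        if pred is not None and abs(pred - float(truth)) < tol:
+            correct += 1
+    return correct / len(completions)
+
+
+completions = [
+    "Step 1: 48 / 2 = 24\nStep 2: 48 + 24 = 72\nFinal Answer: 72",
+    "Step 1: 48 + 22 = 70\nFinal Answer: 70",
+    "I'm not sure how to calculate half of 48.",
+]
+print(accuracy(completions, ["72", "72", "72"]))  # 1 of 3 correct
+```
+
+Completions without an extractable answer count as incorrect, which is why unstructured output from an untuned model scores poorly even when its reasoning is partly right.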
+
+Using HelloAgents to evaluate models:
+
+```python
+import json
+
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# Evaluate SFT model
+eval_result = rl_tool.run({
+    "action": "evaluate",
+    "model_path": "./models/sft_full",
+    "max_samples": 100,     # Evaluate on 100 test samples
+    "use_lora": True,
+})
+
+eval_data = json.loads(eval_result)
+print(f"\nEvaluation results:")
+print(f"  - Accuracy: {eval_data['accuracy']}")
+print(f"  - Average reward: {eval_data['average_reward']}")
+print(f"  - Test samples: {eval_data['num_samples']}")
+```
+
+For small models like Qwen3-0.6B, achieving 40-50% accuracy on GSM8K after SFT is normal. Through reinforcement learning, we can further improve to 60-70%.
+
+To better understand SFT's effectiveness, we can compare models at different stages:
+
+```python
+# Evaluate pretrained model (without SFT)
+base_result = rl_tool.run({
+    "action": "evaluate",
+    "model_path": "Qwen/Qwen3-0.6B",
+    "max_samples": 100,
+    "use_lora": False,
+})
+base_data = json.loads(base_result)
+
+# Evaluate SFT model
+sft_result = rl_tool.run({
+    "action": "evaluate",
+    "model_path": "./models/sft_full",
+    "max_samples": 100,
+    "use_lora": True,
+})
+sft_data = json.loads(sft_result)
+
+# Compare results
+print("Model comparison:")
+print(f"Pretrained model accuracy: {base_data['accuracy']}")
+print(f"SFT model accuracy: {sft_data['accuracy']}")
+```
+
+In this section, we learned about SFT's importance (learning format, establishing baseline), LoRA principles (low-rank decomposition, parameter efficiency), SFT training practice (parameter configuration, training monitoring), and model evaluation (accuracy, comparative analysis).
+
+## 11.4 GRPO Training
+
+After completing SFT training, we have obtained a model capable of generating structured answers. However, the SFT model has only learned to "imitate" the reasoning process in training data and hasn't truly learned to "think". Reinforcement learning can allow the model to optimize reasoning strategies through trial and error, thereby surpassing the quality of training data.
+
+### 11.4.1 From PPO to GRPO
+
+In the field of reinforcement learning, PPO (Proximal Policy Optimization)<sup>[1]</sup> is one of the most classic algorithms. PPO ensures training stability by limiting the magnitude of policy updates. However, PPO has some problems in LLM training: it requires training a Value Model, increasing training complexity and memory usage; it requires maintaining four models simultaneously (Policy Model, Reference Model, Value Model, Reward Model), making engineering implementation complex; training is unstable, prone to reward collapse or policy degradation.
+
+GRPO (Group Relative Policy Optimization)<sup>[2]</sup> is a simplified PPO variant designed for LLMs. Its core ideas are: dropping the Value Model and using group-relative rewards instead of absolute rewards; simplifying the training process so that only the Policy Model and Reference Model are required; and improving training stability by reducing the risk of reward collapse.
+
+Let's understand GRPO's principles through mathematical formulas. PPO's objective function is:
+
+$$
+\mathcal{L}_{\text{PPO}}(\theta) = \mathbb{E}_{s,a \sim \pi_\theta} \left[ \min\left( \frac{\pi_\theta(a|s)}{\pi_{\text{old}}(a|s)} A(s,a), \text{clip}\left(\frac{\pi_\theta(a|s)}{\pi_{\text{old}}(a|s)}, 1-\epsilon, 1+\epsilon\right) A(s,a) \right) \right]
+$$
+
+Where $A(s,a)$ is the advantage function, requiring Value Model to estimate:
+
+$$
+A(s,a) = Q(s,a) - V(s) = r(s,a) + \gamma V(s') - V(s)
+$$
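+
+The effect of the `min` and `clip` terms in the PPO objective is easiest to see on single numbers. The sketch below evaluates the bracketed term for one sample; `ppo_clipped_term` is a hypothetical helper, and the ratio and advantage values are made up for illustration.
+
+```python
+def ppo_clipped_term(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
+    """Per-sample PPO term: min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
+    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
+    return min(ratio * advantage, clipped_ratio * advantage)
+
+
+# Good action (A > 0): gains from pushing the ratio above 1 + eps are capped
+print(ppo_clipped_term(ratio=1.5, advantage=1.0))
+# Bad action (A < 0): the more pessimistic of the two values is kept
+print(ppo_clipped_term(ratio=0.5, advantage=-1.0))
+# Ratio inside the clipping range: the term is simply ratio * A
+print(ppo_clipped_term(ratio=1.1, advantage=2.0))
+```
+
+In the first case the term evaluates to about 1.2 rather than 1.5, so there is no incentive to move the policy further in a single update; this is how PPO limits the magnitude of policy updates.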
+
+GRPO's objective function is simplified to:
+
+$$
+\mathcal{L}_{\text{GRPO}}(\theta) = \mathbb{E}_{s,a \sim \pi_\theta} \left[ \frac{\pi_\theta(a|s)}{\pi_{\text{ref}}(a|s)} \cdot (r(s,a) - \bar{r}_{\text{group}}) \right] - \beta \cdot D_{KL}(\pi_\theta || \pi_{\text{ref}})
+$$
+
+Where $\bar{r}_{\text{group}}$ is the group average reward and $\beta$ is the KL divergence penalty coefficient. Key differences are: GRPO uses $r(s,a) - \bar{r}_{\text{group}}$ instead of advantage function $A(s,a)$, no need for Value Model; GRPO uses group-relative rewards, reducing reward variance; GRPO adds KL divergence penalty, preventing policy from deviating too far.
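+
+The simplified GRPO objective above can be evaluated for a single group of sampled answers as follows. This sketch follows the formula as written in this section and is not a full trainer; `grpo_group_objective` is a hypothetical helper, the log-probabilities are made-up numbers, and the KL term is passed in as a precomputed value.
+
+```python
+import math
+from typing import List
+
+
+def grpo_group_objective(logp_theta: List[float], logp_ref: List[float],
+                         rewards: List[float], kl: float,
+                         beta: float = 0.05) -> float:
+    """Average of ratio * (r - group mean) over the group, minus beta * KL."""
+    mean_reward = sum(rewards) / len(rewards)
+    terms = [
+        math.exp(lt - lr) * (r - mean_reward)  # ratio = pi_theta / pi_ref
+        for lt, lr, r in zip(logp_theta, logp_ref, rewards)
+    ]
+    return sum(terms) / len(terms) - beta * kl
+
+
+rewards = [1.0, 1.0, 0.0, 0.8]
+logp_ref = [-1.0, -1.0, -1.0, -1.0]
+
+# Policy identical to the reference: relative rewards cancel out
+same = grpo_group_objective(logp_ref, logp_ref, rewards, kl=0.0)
+
+# Policy that favors the better answers and disfavors the wrong one
+logp_theta = [-0.5, -0.5, -1.5, -1.0]
+shifted = grpo_group_objective(logp_theta, logp_ref, rewards, kl=0.0)
+
+print(f"same policy: {same:.4f}, shifted policy: {shifted:.4f}")
+```
+
+When the policy equals the reference, the relative rewards sum to zero and the objective is zero; shifting probability toward above-average answers makes it positive, which is the direction the update follows.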
+
+Figure 11.7 compares the PPO and GRPO training processes.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-7.png" alt="" width="85%"/>
+  <p>Figure 11.7 PPO vs GRPO Training Process</p>
+</div>
+
+As can be seen, GRPO eliminates Value Model training, greatly simplifying the process.
+
+Table 11.6 gives a detailed comparison of PPO and GRPO.
+
+<div align="center">
+  <p>Table 11.6 PPO vs GRPO Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-6.png" alt="" width="85%"/>
+</div>
+
+
+
+For LLM training, GRPO is often the more practical choice because it is simpler, tends to be more stable, and uses less memory.
+
+### 11.4.2 GRPO Training Practice
+
+Now let's conduct GRPO training using HelloAgents. The prerequisite for GRPO training is completing SFT training, because GRPO requires a reasonable initial policy.
+
+Basic GRPO training example:
+
+```python
+from hello_agents.tools import RLTrainingTool
+
+# Create training tool
+rl_tool = RLTrainingTool()
+
+# GRPO training
+result = rl_tool.run({
+    # Training configuration
+    "action": "train",
+    "algorithm": "grpo",
+
+    # Model configuration
+    "model_name": "./models/sft_full",  # Start from SFT model
+    "output_dir": "./models/grpo_model",
+
+    # Data configuration
+    "max_samples": 100,     # Use 100 samples for quick test
+
+    # Training parameters
+    "num_epochs": 3,
+    "batch_size": 4,
+    "learning_rate": 1e-5,  # GRPO learning rate usually smaller than SFT
+
+    # GRPO-specific parameters
+    "num_generations": 4,   # Generate 4 answers per question
+    "kl_coef": 0.05,        # KL divergence penalty coefficient
+
+    # LoRA configuration
+    "use_lora": True,
+    "lora_rank": 16,
+    "lora_alpha": 32,
+
+    # Reward function configuration
+    "reward_type": "accuracy",  # Use accuracy reward
+})
+
+print(f"\n✓ Training completed!")
+print(f"  - Model save path: {result['model_path']}")
+print(f"  - Training samples: {result['num_samples']}")
+print(f"  - Training epochs: {result['num_epochs']}")
+print(f"  - Average reward: {result['average_reward']:.4f}")
+```
+
+If average reward gradually increases and KL divergence remains in a reasonable range during GRPO training, it indicates training is proceeding normally.
+
+GRPO has some specific parameters that need to be understood and tuned.
+
+**Generation Parameters**:
+
+- `num_generations`: How many answers to generate per question. More answers give a more reliable group average, but computational cost grows proportionally. Typical values are 4-8. Generating multiple answers serves to calculate group-relative rewards and to increase the diversity of training signals.
+- `max_new_tokens`: Maximum number of tokens to generate per answer. Too few may truncate answers, too many wastes computation. Recommend 256-512.
+- `temperature`: Generation temperature, controls randomness. 0 means greedy decoding, 1 means standard sampling. GRPO recommends 0.7-1.0, maintaining some exploration.
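+
+To see how `temperature` controls randomness, the sketch below applies temperature scaling to a small set of made-up logits. `softmax_with_temperature` is a hypothetical helper; a temperature of exactly 0 is not handled by the formula and corresponds to greedy decoding (taking the argmax) instead.
+
+```python
+import math
+from typing import List
+
+
+def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
+    """Convert logits to probabilities after dividing by the temperature."""
+    if temperature <= 0:
+        raise ValueError("temperature must be positive; use argmax for greedy decoding")
+    scaled = [logit / temperature for logit in logits]
+    max_scaled = max(scaled)  # subtract the max for numerical stability
+    exps = [math.exp(s - max_scaled) for s in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+
+logits = [2.0, 1.0, 0.5]
+for t in (0.7, 1.0):
+    probs = softmax_with_temperature(logits, t)
+    print(t, [round(p, 3) for p in probs])
+```
+
+Lower temperatures concentrate probability on the top token, so the answers within a group become more similar; GRPO needs some variation within the group for the relative rewards to carry signal.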
+
+**Optimization Parameters**:
+
+- `learning_rate`: GRPO's learning rate is usually smaller than SFT because we don't want to deviate too far from the SFT model. Recommend 1e-5 to 5e-5.
+- `kl_coef`: KL divergence penalty coefficient, controls magnitude of policy updates. Too small (0.01) may cause policy to deviate too far, too large (0.5) may limit learning. Recommend 0.05-0.1.
+- `clip_range`: Policy ratio clipping range, similar to PPO's epsilon. Recommend 0.2.
+
+**Reward Parameters**:
+
+- `reward_type`: Reward function type, can be "accuracy", "length_penalty", "step", or "combined".
+- `reward_config`: Additional configuration for reward function, such as target length for length penalty, coefficient for step reward, etc.
+
+Let's conduct a complete GRPO training using all data and best practices:
+
+```python
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# Complete GRPO training
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+
+    # Model configuration
+    "model_name": "./models/sft_full",
+    "output_dir": "./models/grpo_full",
+
+    # Data configuration
+    "max_samples": None,    # Use all data
+
+    # Training parameters
+    "num_epochs": 3,
+    "batch_size": 4,
+    "learning_rate": 1e-5,
+    "warmup_ratio": 0.1,
+
+    # GRPO-specific parameters
+    "num_generations": 4,
+    "max_new_tokens": 512,
+    "temperature": 0.8,
+    "kl_coef": 0.05,
+    "clip_range": 0.2,
+
+    # LoRA configuration
+    "use_lora": True,
+    "lora_rank": 16,
+    "lora_alpha": 32,
+
+    # Reward function configuration
+    "reward_type": "combined",
+    "reward_config": {
+        "components": [
+            {"type": "accuracy", "weight": 1.0},
+            {"type": "length_penalty", "weight": 0.5, "target_length": 200},
+            {"type": "step", "weight": 0.3, "step_bonus": 0.1}
+        ]
+    },
+
+    # Other configurations
+    "save_steps": 500,
+    "logging_steps": 100,
+})
+
+print(f"Training completed! Model saved at: {result['model_path']}")
+```
+
+### 11.4.3 GRPO Training Process Analysis
+
+Let's deeply understand GRPO's training process and see what happens at each step.
+
+**(1) Training Loop**
+
+GRPO's training loop includes the following steps:
+
+1. **Sampling Phase**: For each question, use current policy to generate multiple answers (`num_generations`). These answers form a "group" for calculating relative rewards.
+
+2. **Reward Calculation**: Calculate reward $r_i$ for each generated answer. Rewards can be accuracy, length penalty, step reward, or their combination.
+
+3. **Relative Reward**: Calculate group average reward $\bar{r} = \frac{1}{N}\sum_{i=1}^{N} r_i$, then calculate relative reward $\hat{r}_i = r_i - \bar{r}$. The benefit of this is reducing reward variance and making training more stable.
+
+4. **Policy Update**: Use relative rewards to update policy, while adding KL divergence penalty to prevent policy from deviating too far from reference model.
+
+5. **Repeat**: Repeat above steps until all training epochs are complete.
+
+Let's understand through a specific example:
+
+```python
+# Assume we have a question
+question = "What is 48 + 24?"
+
+# Generate 4 answers
+answers = [
+    "48 + 24 = 72. Final Answer: 72",      # Correct
+    "48 + 24 = 72. Final Answer: 72",      # Correct
+    "48 + 24 = 70. Final Answer: 70",      # Incorrect
+    "Let me think... 72. Final Answer: 72" # Correct but verbose
+]
+
+# Calculate rewards (assuming using accuracy + length penalty)
+rewards = [1.0, 1.0, 0.0, 0.8]  # 4th answer penalized for verbosity
+
+# Calculate group average reward: (1.0 + 1.0 + 0.0 + 0.8) / 4 = 0.7
+avg_reward = sum(rewards) / len(rewards)
+
+# Calculate relative rewards (rounded to suppress floating-point noise)
+relative_rewards = [round(r - avg_reward, 2) for r in rewards]
+# [0.3, 0.3, -0.7, 0.1]
+#   0.3  -> correct and concise, positive relative reward
+#  -0.7  -> incorrect, negative relative reward
+#   0.1  -> correct but verbose, smaller relative reward
+
+# Policy update: increase probability of first two answers, decrease probability of third answer
+```
+
+As can be seen, the relative reward mechanism encourages the model to generate answers "better than average" rather than simply pursuing high rewards. This can reduce reward variance and improve training stability.
+
+**(2) KL Divergence Penalty**
+
+KL divergence penalty is a key component of GRPO, preventing policy from deviating too far from the reference model. KL divergence is defined as:
+
+$$
+D_{KL}(\pi_\theta || \pi_{\text{ref}}) = \mathbb{E}_{s,a \sim \pi_\theta} \left[ \log \frac{\pi_\theta(a|s)}{\pi_{\text{ref}}(a|s)} \right]
+$$
+
+In practice, we calculate KL divergence for each token, then sum:
+
+$$
+D_{KL} = \sum_{t=1}^{T} \log \frac{\pi_\theta(a_t|s, a_{<t})}{\pi_{\text{ref}}(a_t|s, a_{<t})}
+$$
+
+The larger the KL divergence, the greater the difference between current policy and reference model. By adding KL divergence penalty term $-\beta \cdot D_{KL}$, we limit the magnitude of policy updates, avoiding "forgetting" knowledge learned during SFT phase.
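+
+The per-token sum above is straightforward to compute once the log-probabilities of the generated tokens under both models are available. The following sketch uses made-up log-probabilities; `sequence_kl_estimate` and `kl_penalty` are hypothetical helper names.
+
+```python
+from typing import List
+
+
+def sequence_kl_estimate(logp_theta_tokens: List[float],
+                         logp_ref_tokens: List[float]) -> float:
+    """Sum over tokens of log pi_theta(a_t) - log pi_ref(a_t)."""
+    if len(logp_theta_tokens) != len(logp_ref_tokens):
+        raise ValueError("both sequences must cover the same tokens")
+    return sum(lt - lr for lt, lr in zip(logp_theta_tokens, logp_ref_tokens))
+
+
+def kl_penalty(kl: float, beta: float) -> float:
+    """Penalty term -beta * D_KL added to the objective."""
+    return -beta * kl
+
+
+# Log-probabilities of three generated tokens under each model (illustrative)
+logp_theta = [-0.1, -0.2, -0.3]
+logp_ref = [-0.2, -0.2, -0.5]
+
+kl = sequence_kl_estimate(logp_theta, logp_ref)
+print(f"KL estimate: {kl:.3f}, penalty: {kl_penalty(kl, beta=0.05):.4f}")
+```
+
+Because the tokens are sampled from the current policy, this sum is a single-sample estimate of the expectation in the definition; individual sequences can give negative values even though the true KL divergence is non-negative.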
+
+The choice of `kl_coef` ($\beta$) is important:
+
+- Too small (0.01): Policy may deviate too far, causing output format confusion or quality degradation
+- Too large (0.5): Policy updates are limited, learning is slow, difficult to surpass SFT model
+- Recommended (0.05-0.1): Balance exploration and stability
+
+**(3) Training Monitoring**
+
+During GRPO training, we need to monitor the following metrics:
+
+- **Average Reward**: Should gradually increase. If reward doesn't increase, learning rate may be too small, KL penalty too large, or reward function design unreasonable. If reward rises then falls, may be overfitting or reward collapse.
+
+- **KL Divergence**: Should remain in reasonable range (0.01-0.1). If KL divergence is too large (>0.5), policy deviates too far, need to increase kl_coef or reduce learning rate. If KL divergence is too small (<0.001), policy is barely updating, need to reduce kl_coef or increase learning rate.
+
+- **Accuracy**: Should gradually improve. This is the most intuitive metric, reflecting the model's actual capability.
+
+- **Generation Quality**: Need manual inspection of generated answers to ensure correct format and clear reasoning.
+
+HelloAgents integrates two mainstream training monitoring tools: Weights & Biases (wandb) and TensorBoard.
+
+**Method 1: Using Weights & Biases (Recommended)**
+
+Weights & Biases is currently the most popular machine learning experiment tracking platform, providing powerful visualization and experiment management features.
+
+```python
+import os
+
+# 1. Set up wandb (need to register account first: https://wandb.ai)
+os.environ["WANDB_PROJECT"] = "hello-agents-grpo"  # Project name
+os.environ["WANDB_LOG_MODEL"] = "false"            # Don't upload model files
+
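+# 2. Enable wandb in the training configuration
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/grpo_monitored",
+    "num_epochs": 2,
+    "batch_size": 2,
+    "use_lora": True,
+    "use_wandb": True,                     # Same monitoring flag used by the pipeline in Section 11.6.1
+    "wandb_project": "hello-agents-grpo",  # Matches WANDB_PROJECT above
+    # Once enabled, wandb logs the training metrics automatically
+})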
+
+# After training completes, visit https://wandb.ai to view training curves
+```
+
+wandb will automatically log the following metrics:
+- `train/reward`: Average reward
+- `train/kl`: KL divergence
+- `train/loss`: Training loss
+- `train/learning_rate`: Learning rate
+- `train/epoch`: Training epoch
+
+**Method 2: Using TensorBoard**
+
+TensorBoard is the visualization toolkit from the TensorFlow project; it also works with PyTorch training.
+
+```python
+# 1. TensorBoard logs will be automatically created in output_dir during training
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/grpo_tb",
+    "num_epochs": 2,
+    "batch_size": 2,
+    "use_lora": True,
+})
+
+# 2. Launch TensorBoard to view training curves
+# Run in command line:
+# tensorboard --logdir=./models/grpo_tb
+# Then visit http://localhost:6006
+```
+
+**Method 3: Offline Monitoring (No External Tools Required)**
+
+If you don't want to use wandb or TensorBoard, you can also monitor through training logs:
+
+```python
+# Training process will print detailed logs
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/grpo_simple",
+    "num_epochs": 2,
+    "batch_size": 2,
+    "use_lora": True,
+})
+
+# Log example:
+# Epoch 1/2 | Step 100/500 | Reward: 0.45 | KL: 0.023 | Loss: 1.234
+# Epoch 1/2 | Step 200/500 | Reward: 0.52 | KL: 0.031 | Loss: 1.156
+# ...
+```
+
+You may run into several common problems during GRPO training. If the reward does not increase, the learning rate may be too small or the KL penalty too large (limiting policy updates), the reward function may be poorly designed, or the SFT model may be too weak. In that case, increase the learning rate (e.g., from 1e-5 to 5e-5), reduce kl_coef (e.g., from 0.1 to 0.05), review the reward function, or retrain the SFT model.
+
+If KL divergence explodes (exceeding 0.5 or even 1.0) and the format of generated answers breaks down, the learning rate is usually too large, the KL penalty too small, or the reward function too aggressive. Reduce the learning rate (e.g., from 5e-5 to 1e-5), increase kl_coef (e.g., from 0.05 to 0.1), adjust the reward function, or apply gradient clipping.
+
+If generation quality degrades (accuracy improves, but the format becomes messy and the reasoning unclear), the reward function may focus only on accuracy and ignore other quality metrics, the KL penalty may be too small and let the model drift too far from the SFT model, or the model may be overfitting. In that case, use a combined reward function that optimizes several metrics at once, increase kl_coef to stay closer to the SFT model, reduce the number of training epochs, or add training data.
+
+GRPO training uses more memory than SFT because it generates multiple answers per prompt and stores reference model outputs, so it is prone to out-of-memory (OOM) errors. To reduce memory usage, lower num_generations (e.g., from 8 to 4), batch_size (e.g., from 4 to 2), or max_new_tokens (e.g., from 512 to 256), or enable gradient checkpointing and mixed-precision training.
+
+## 11.5 Model Evaluation and Analysis
+
+After training is complete, we need to evaluate the model comprehensively. Accuracy alone is not enough: we also need to analyze the model's reasoning quality, error patterns, and generalization ability. This section introduces how to systematically evaluate and analyze Agentic RL models.
+
+### 11.5.1 Evaluation Metric System
+
+A good evaluation system should be multi-dimensional, measuring model capabilities from different angles. We divide evaluation metrics into three categories: accuracy metrics, efficiency metrics, and quality metrics.
+
+**(1) Accuracy Metrics**
+
+Accuracy metrics measure whether the model can arrive at correct answers.
+
+**Accuracy**: The most basic metric: the proportion of answers that are completely correct. Calculation formula:
+$$
+\text{Accuracy} = \frac{\text{Number of correct answers}}{\text{Total number of questions}}
+$$
+
+Its advantages are that it is simple, intuitive, and easy to compare. Its disadvantage is that it cannot distinguish "nearly correct" from "completely wrong", so it may be too coarse for complex tasks.
+
+**Top-K Accuracy**: Generate K answers per question and count the question as correct if at least one answer is correct. Calculation formula:
+$$
+\text{Accuracy@K} = \frac{\text{Number of questions with at least one correct answer}}{\text{Total number of questions}}
+$$
+
+This metric reflects the model's "potential", i.e., whether correct answers can be found through multiple sampling.
+
+**Numerical Error**: For mathematical problems, we can compute the mean absolute error between the true value $y_i$ and the predicted value $\hat{y}_i$ over $N$ questions. Calculation formula:
+
+$$
+\text{Error} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|
+$$
+
+This metric can distinguish "nearly correct" (e.g., predicted 72.5, actual 72) from "completely wrong" (e.g., predicted 100, actual 72).
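+
+The three accuracy metrics above can be computed directly from per-question results. The sketch below is illustrative only: the function names and input layouts (lists of booleans and numbers) are assumptions and are not the HelloAgents evaluation API.
+
+```python
+def accuracy(correct_flags):
+    """Fraction of questions whose single answer is correct."""
+    return sum(correct_flags) / len(correct_flags)
+
+
+def accuracy_at_k(sampled_flags):
+    """sampled_flags[i] holds K booleans, one per sampled answer to question i.
+
+    A question counts as solved if at least one of its K answers is correct.
+    """
+    return sum(any(flags) for flags in sampled_flags) / len(sampled_flags)
+
+
+def mean_absolute_error(y_true, y_pred):
+    """Average absolute gap between true and predicted numeric answers."""
+    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
+
+
+# The two predictions from the text: 72.5 and 100 against a true value of 72
+near_miss = mean_absolute_error([72], [72.5])  # 0.5
+far_miss = mean_absolute_error([72], [100])    # 28.0
+```
+
+Plain accuracy scores both predictions as wrong, while the absolute error separates the near miss from the far miss.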
+
+**(2) Efficiency Metrics**
+
+Efficiency metrics measure the cost of generating answers.
+
+**Average Length**: Average number of tokens in generated answers, where $|y_i|$ denotes the token count of the $i$-th generated answer (not an absolute value). Calculation formula:
+
+$$
+\text{Avg Length} = \frac{1}{N} \sum_{i=1}^{N} |y_i|
+$$
+
+Shorter answers mean lower inference cost and faster response speed.
+
+**Reasoning Steps**: Average number of reasoning steps per answer, where $s_i$ is the number of steps in the $i$-th answer. Calculation formula:
+
+$$
+\text{Avg Steps} = \frac{1}{N} \sum_{i=1}^{N} s_i
+$$
+
+An appropriate number of steps (2-5) indicates that the model decomposes problems systematically; too many steps may indicate redundant reasoning.
+
+**Inference Time**: Time required to generate one answer. This metric is important in actual deployment, affecting user experience.
+
+**(3) Quality Metrics**
+
+Quality metrics measure readability and explainability of answers.
+
+**Format Correctness**: Whether answers conform to expected format (e.g., containing markers like "Step 1", "Final Answer"). Calculation formula:
+$$
+\text{Format Correctness} = \frac{\text{Number of correctly formatted answers}}{\text{Total number of answers}}
+$$
+
+Correct format is a basic requirement; answers with confused format are difficult to use even if results are correct.
+
+**Reasoning Coherence**: Whether reasoning steps are logically coherent. This metric usually requires manual evaluation or specialized evaluation models.
+
+**Explainability**: Whether answers are easy to understand and verify. Answers with clear steps are more explainable than answers that directly give results.
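+
+Format correctness and the step count from the efficiency metrics can both be estimated with simple pattern matching on the "Step N" and "Final Answer" markers mentioned above. This is a rough sketch under stated assumptions: the helper names are hypothetical, and whitespace-separated words stand in for real tokenizer tokens.
+
+```python
+import re
+
+# Matches markers such as "Step 1" or "step 12"
+STEP_PATTERN = re.compile(r"Step\s+\d+", re.IGNORECASE)
+
+
+def count_steps(answer):
+    """Number of 'Step N' markers found in an answer."""
+    return len(STEP_PATTERN.findall(answer))
+
+
+def is_well_formatted(answer):
+    """True if the answer has at least one step marker and a final answer marker."""
+    return count_steps(answer) >= 1 and "Final Answer" in answer
+
+
+def quality_summary(answers):
+    """Aggregate format correctness, average steps, and average length."""
+    n = len(answers)
+    return {
+        "format_correctness": sum(is_well_formatted(a) for a in answers) / n,
+        "avg_steps": sum(count_steps(a) for a in answers) / n,
+        # Whitespace-separated words as a proxy for tokens (an approximation)
+        "avg_length": sum(len(a.split()) for a in answers) / n,
+    }
+```
+
+Reasoning coherence and explainability cannot be captured this way; they still need manual review or an evaluation model, as noted above.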
+
+Table 11.7 compares these metrics.
+
+<div align="center">
+  <p>Table 11.7 Evaluation Metric Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-7.png" alt="" width="85%"/>
+</div>
+
+
+### 11.5.2 Evaluation Practice
+
+HelloAgents provides comprehensive evaluation functionality, capable of calculating multiple metrics at once.
+
+```python
+import json
+
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# Comprehensive evaluation
+print("=" * 50)
+print("Comprehensive GRPO Model Evaluation")
+print("=" * 50)
+
+result = rl_tool.run({
+    "action": "evaluate",
+    "model_path": "./models/grpo_full",
+    "max_samples": 200,
+    "use_lora": True,
+
+    # Evaluation configuration
+    "metrics": [
+        "accuracy",           # Accuracy
+        "accuracy_at_k",      # Top-K accuracy
+        "average_length",     # Average length
+        "average_steps",      # Average steps
+        "format_correctness", # Format correctness
+    ],
+    "k": 3,  # Top-3 accuracy
+})
+
+# Parse results
+eval_data = json.loads(result)
+
+# Print results
+print(f"\nEvaluation results:")
+print(f"  Accuracy: {eval_data['accuracy']}")
+print(f"  Average reward: {eval_data['average_reward']}")
+print(f"  Test samples: {eval_data['num_samples']}")
+```
+
+We can compare performance of pretrained model, SFT model, and GRPO model:
+
+```python
+import json
+
+# Evaluate three models
+models = [
+    ("Pretrained Model", "Qwen/Qwen3-0.6B", False),
+    ("SFT Model", "./models/sft_full", True),
+    ("GRPO Model", "./models/grpo_full", True),
+]
+
+results = []
+for name, path, use_lora in models:
+    print(f"\nEvaluating {name}...")
+    result = rl_tool.run({
+        "action": "evaluate",
+        "model_path": path,
+        "max_samples": 200,
+        "use_lora": use_lora,
+        "metrics": ["accuracy", "average_length", "format_correctness"],
+    })
+    # run() returns a JSON string, so parse it before indexing
+    results.append((name, json.loads(result)))
+
+# Print comparison table (assumes the metric values are floats)
+print("\n" + "=" * 70)
+print(f"{'Model':<18} {'Accuracy':<12} {'Avg Length':<15} {'Format Correct':<15}")
+print("=" * 70)
+for name, metrics in results:
+    print(f"{name:<18} {metrics['accuracy']:<12.2%} {metrics['average_length']:<15.1f} {metrics['format_correctness']:<15.2%}")
+print("=" * 70)
+```
+
+### 11.5.3 Error Analysis
+
+Knowing accuracy alone is not enough; we need to analyze which types of problems the model gets wrong in order to guide subsequent improvements. Model errors can be divided into four categories:
+
+- **Calculation errors**: The reasoning steps are correct but the calculation is wrong (e.g., "48/2=25"), indicating insufficient numerical calculation ability.
+- **Reasoning errors**: Flawed logic leads to the wrong problem-solving approach (e.g., adding first and then dividing instead of dividing first and then adding), indicating insufficient logical reasoning ability.
+- **Comprehension errors**: The problem is misunderstood (e.g., the question asks for the "total" but only part of it is calculated), indicating insufficient language understanding ability.
+- **Format errors**: The answer is correct but the format does not meet requirements (e.g., the "Final Answer:" marker is missing), indicating insufficient format learning.
+
+Error analysis example:
+
+```python
+import json
+
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# Evaluate and collect error samples
+result = rl_tool.run({
+    "action": "evaluate",
+    "model_path": "./models/grpo_full",
+    "max_samples": 200,
+    "use_lora": True,
+    "return_details": True,  # Return detailed results
+})
+
+# Analyze error samples (run() returns a JSON string, so parse it first)
+eval_data = json.loads(result)
+errors = eval_data['errors']  # Error sample list
+print(f"Total errors: {len(errors)}")
+
+# Classify by error type
+error_types = {
+    "Calculation Error": 0,
+    "Reasoning Error": 0,
+    "Comprehension Error": 0,
+    "Format Error": 0,
+}
+
+for error in errors:
+    prediction = error['prediction']
+
+    # Simple heuristic classification. It cannot tell calculation errors from
+    # reasoning errors, so "Reasoning Error" stays at 0 unless a finer check
+    # (for example, manual labeling) is added here.
+    if "Final Answer:" not in prediction:
+        error_types["Format Error"] += 1
+    elif "Step" in prediction:
+        # Has reasoning steps, may be calculation or reasoning error
+        error_types["Calculation Error"] += 1
+    else:
+        error_types["Comprehension Error"] += 1
+
+# Print error distribution
+print("\nError type distribution:")
+for error_type, count in error_types.items():
+    # Guard against division by zero when there are no errors
+    percentage = count / len(errors) * 100 if errors else 0.0
+    print(f"  {error_type}: {count} ({percentage:.1f}%)")
+```
+
+Output example:
+
+```bash
+Total errors: 76
+
+Error type distribution:
+  Calculation Error: 32 (42.1%)
+  Reasoning Error: 18 (23.7%)
+  Comprehension Error: 22 (28.9%)
+  Format Error: 4 (5.3%)
+```
+
+As can be seen, calculation errors are the main error type (42.1%), indicating the model's numerical calculation ability needs strengthening. Format errors are rare (5.3%), indicating SFT training was effective. We can also analyze the model's performance on problems of different difficulty:
+
+```python
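+import json
+
+# Group by number of reasoning steps
+step_groups = {
+    "Easy (1-2 steps)": [],
+    "Medium (3-4 steps)": [],
+    "Hard (5+ steps)": [],
+}
+
+# run() returns a JSON string; parse it unless it is already a dict
+eval_data = json.loads(result) if isinstance(result, str) else result
+
+for sample in eval_data['details']: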
+    steps = sample['ground_truth_steps']  # Number of steps in true answer
+    correct = sample['correct']
+
+    if steps <= 2:
+        step_groups["Easy (1-2 steps)"].append(correct)
+    elif steps <= 4:
+        step_groups["Medium (3-4 steps)"].append(correct)
+    else:
+        step_groups["Hard (5+ steps)"].append(correct)
+
+# Calculate accuracy for each group
+print("\nAccuracy at different difficulty levels:")
+for group_name, results in step_groups.items():
+    if len(results) > 0:
+        accuracy = sum(results) / len(results)
+        print(f"  {group_name}: {accuracy:.2%} ({len(results)} samples)")
+```
+
+Output example:
+
+```bash
+Accuracy at different difficulty levels:
+  Easy (1-2 steps): 78.50% (85 samples)
+  Medium (3-4 steps): 58.30% (96 samples)
+  Hard (5+ steps): 31.60% (19 samples)
+```
+
+As can be seen, the model performs well on easy problems (78.5%) but poorly on hard problems (31.6%). This indicates the model's multi-step reasoning ability needs improvement.
+
+### 11.5.4 Improvement Directions
+
+Based on evaluation and analysis results, we can determine improvement directions for the model, as shown in Figure 11.8.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-8.png" alt="" width="85%"/>
+  <p>Figure 11.8 Model Improvement Iteration Process</p>
+</div>
+
+This is a continuous iteration process: train model → evaluate performance → analyze errors → identify problems → select improvement direction → retrain. Through multiple iterations, model performance will continuously improve.
+
+## 11.6 Complete Training Pipeline Practice
+
+In previous sections, we learned about data preparation, SFT training, GRPO training, and model evaluation separately. Now, let's integrate this knowledge to complete an end-to-end Agentic RL training pipeline.
+
+### 11.6.1 End-to-End Training Pipeline
+
+A complete Agentic RL training pipeline includes the following stages: data preparation, SFT training, SFT evaluation, GRPO training, GRPO evaluation, and model deployment, as shown in Figure 11.9.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-9.png" alt="" width="85%"/>
+  <p>Figure 11.9 End-to-End Training Pipeline</p>
+</div>
+
+Let's implement this pipeline as a complete script. Note that its final stage saves the collected results to disk rather than deploying the model:
+
+```python
+"""
+Complete Agentic RL Training Pipeline
+End-to-end example from data preparation to model deployment
+"""
+
+from hello_agents.tools import RLTrainingTool
+import json
+from datetime import datetime
+
+class AgenticRLPipeline:
+    """Agentic RL Training Pipeline"""
+
+    def __init__(self, config_path="config.json"):
+        """
+        Initialize training pipeline
+
+        Args:
+            config_path: Configuration file path
+        """
+        self.rl_tool = RLTrainingTool()
+        self.config = self.load_config(config_path)
+        self.results = {}
+
+    def load_config(self, config_path):
+        """Load configuration file"""
+        with open(config_path, 'r') as f:
+            return json.load(f)
+
+    def log(self, message):
+        """Log message"""
+        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+        print(f"[{timestamp}] {message}")
+
+    def stage1_prepare_data(self):
+        """Stage 1: Data Preparation"""
+        self.log("=" * 50)
+        self.log("Stage 1: Data Preparation")
+        self.log("=" * 50)
+
+        # Load and check dataset
+        result = self.rl_tool.run({
+            "action": "load_dataset",
+            "format": "sft",
+            "max_samples": self.config["data"]["max_samples"],
+        })
+
+        # Parse JSON result
+        dataset_info = json.loads(result)
+
+        self.log(f"✓ Dataset loaded")
+        self.log(f"  - Samples: {dataset_info['dataset_size']}")
+        self.log(f"  - Format: {dataset_info['format']}")
+        self.log(f"  - Data columns: {', '.join(dataset_info['sample_keys'])}")
+
+        self.results["data"] = dataset_info
+
+        return dataset_info
+
+    def stage2_sft_training(self):
+        """Stage 2: SFT Training"""
+        self.log("\n" + "=" * 50)
+        self.log("Stage 2: SFT Training")
+        self.log("=" * 50)
+
+        sft_config = self.config["sft"]
+
+        result = self.rl_tool.run({
+            "action": "train",
+            "algorithm": "sft",
+            "model_name": self.config["model"]["base_model"],
+            "output_dir": sft_config["output_dir"],
+            "max_samples": self.config["data"]["max_samples"],
+            "num_epochs": sft_config["num_epochs"],
+            "batch_size": sft_config["batch_size"],
+            "use_lora": True,
+            # Training monitoring configuration
+            "use_wandb": self.config.get("monitoring", {}).get("use_wandb", False),
+            "use_tensorboard": self.config.get("monitoring", {}).get("use_tensorboard", True),
+            "wandb_project": self.config.get("monitoring", {}).get("wandb_project", None),
+        })
+
+        # Parse JSON result
+        result_data = json.loads(result)
+
+        self.log(f"✓ SFT training completed")
+        self.log(f"  - Model path: {result_data['output_dir']}")
+        self.log(f"  - Status: {result_data['status']}")
+
+        self.results["sft_training"] = result_data
+
+        return result_data["output_dir"]
+
+    def stage3_sft_evaluation(self, model_path):
+        """Stage 3: SFT Evaluation"""
+        self.log("\n" + "=" * 50)
+        self.log("Stage 3: SFT Evaluation")
+        self.log("=" * 50)
+
+        result = self.rl_tool.run({
+            "action": "evaluate",
+            "model_path": model_path,
+            "max_samples": self.config["eval"]["max_samples"],
+            "use_lora": True,
+        })
+        eval_data = json.loads(result)
+
+        self.log(f"✓ SFT evaluation completed")
+        self.log(f"  - Accuracy: {eval_data['accuracy']}")
+        self.log(f"  - Average reward: {eval_data['average_reward']}")
+
+        self.results["sft_evaluation"] = eval_data
+
+        return eval_data
+
+    def stage4_grpo_training(self, sft_model_path):
+        """Stage 4: GRPO Training"""
+        self.log("\n" + "=" * 50)
+        self.log("Stage 4: GRPO Training")
+        self.log("=" * 50)
+
+        grpo_config = self.config["grpo"]
+
+        result = self.rl_tool.run({
+            "action": "train",
+            "algorithm": "grpo",
+            "model_name": sft_model_path,
+            "output_dir": grpo_config["output_dir"],
+            "max_samples": self.config["data"]["max_samples"],
+            "num_epochs": grpo_config["num_epochs"],
+            "batch_size": grpo_config["batch_size"],
+            "use_lora": True,
+            # Training monitoring configuration
+            "use_wandb": self.config.get("monitoring", {}).get("use_wandb", False),
+            "use_tensorboard": self.config.get("monitoring", {}).get("use_tensorboard", True),
+            "wandb_project": self.config.get("monitoring", {}).get("wandb_project", None),
+        })
+
+        # Parse JSON result
+        result_data = json.loads(result)
+
+        self.log(f"✓ GRPO training completed")
+        self.log(f"  - Model path: {result_data['output_dir']}")
+        self.log(f"  - Status: {result_data['status']}")
+
+        self.results["grpo_training"] = result_data
+
+        return result_data["output_dir"]
+
+    def stage5_grpo_evaluation(self, model_path):
+        """Stage 5: GRPO Evaluation"""
+        self.log("\n" + "=" * 50)
+        self.log("Stage 5: GRPO Evaluation")
+        self.log("=" * 50)
+
+        result = self.rl_tool.run({
+            "action": "evaluate",
+            "model_path": model_path,
+            "max_samples": self.config["eval"]["max_samples"],
+            "use_lora": True,
+        })
+        eval_data = json.loads(result)
+
+        self.log(f"✓ GRPO evaluation completed")
+        self.log(f"  - Accuracy: {eval_data['accuracy']}")
+        self.log(f"  - Average reward: {eval_data['average_reward']}")
+
+        self.results["grpo_evaluation"] = eval_data
+
+        return eval_data
+
+    def stage6_save_results(self):
+        """Stage 6: Save Results"""
+        self.log("\n" + "=" * 50)
+        self.log("Stage 6: Save Results")
+        self.log("=" * 50)
+
+        # Save training results
+        results_path = "training_results.json"
+        with open(results_path, 'w') as f:
+            json.dump(self.results, f, indent=2)
+
+        self.log(f"✓ Results saved to: {results_path}")
+
+    def run(self):
+        """Run complete pipeline"""
+        try:
+            # Stage 1: Data preparation
+            self.stage1_prepare_data()
+
+            # Stage 2: SFT training
+            sft_model_path = self.stage2_sft_training()
+
+            # Stage 3: SFT evaluation
+            self.stage3_sft_evaluation(sft_model_path)
+
+            # Stage 4: GRPO training
+            grpo_model_path = self.stage4_grpo_training(sft_model_path)
+
+            # Stage 5: GRPO evaluation
+            self.stage5_grpo_evaluation(grpo_model_path)
+
+            # Stage 6: Save results
+            self.stage6_save_results()
+
+            self.log("\n" + "=" * 50)
+            self.log("✓ Training pipeline completed!")
+            self.log("=" * 50)
+
+        except Exception as e:
+            self.log(f"\n✗ Training failed: {str(e)}")
+            raise
+
+# Usage example
+if __name__ == "__main__":
+    # Create configuration file
+    config = {
+        "model": {
+            "base_model": "Qwen/Qwen3-0.6B"
+        },
+        "data": {
+            "max_samples": 1000  # Use 1000 samples
+        },
+        "sft": {
+            "output_dir": "./models/sft_model",
+            "num_epochs": 3,
+            "batch_size": 8,
+        },
+        "grpo": {
+            "output_dir": "./models/grpo_model",
+            "num_epochs": 3,
+            "batch_size": 4,
+        },
+        "eval": {
+            "max_samples": 200,
+            "sft_accuracy_threshold": 0.40  # SFT accuracy threshold
+        },
+        "monitoring": {
+            "use_wandb": False,  # Whether to use Wandb
+            "use_tensorboard": True,  # Whether to use TensorBoard
+            "wandb_project": "agentic-rl-pipeline"  # Wandb project name
+        }
+    }
+
+    # Save configuration
+    with open("config.json", 'w') as f:
+        json.dump(config, f, indent=2)
+
+    # Run training pipeline
+    pipeline = AgenticRLPipeline("config.json")
+    pipeline.run()
+```
+
+Run this script to see the complete training process.
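+
+The configuration above defines `sft_accuracy_threshold`, but the pipeline never reads it. One possible use is a gate between Stage 3 and Stage 4 that skips GRPO when the SFT model is too weak. The helper below is a hypothetical sketch: the name `passes_sft_gate` is invented for illustration, and because the exact type of the `accuracy` field is not shown in this chapter, it accepts a float as well as strings such as "45.2%" or "0.452".
+
+```python
+def passes_sft_gate(eval_data, threshold):
+    """Return True if the SFT accuracy is high enough to continue to GRPO.
+
+    eval_data is the parsed evaluation result (a dict with an 'accuracy' key).
+    """
+    accuracy = eval_data["accuracy"]
+    if isinstance(accuracy, str):
+        text = accuracy.strip()
+        if text.endswith("%"):
+            # Percent string such as "45.2%" becomes 0.452
+            accuracy = float(text.rstrip("%")) / 100
+        else:
+            accuracy = float(text)
+    return accuracy >= threshold
+```
+
+Inside `run()`, the value returned by `stage3_sft_evaluation` could be passed to this helper together with `self.config["eval"]["sft_accuracy_threshold"]`, stopping early or logging a warning when it returns `False`.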
+
+Running tips:
+
+**Start Small**: Don't train on all the data at once. First use 100-1000 samples for quick iteration to validate the process and parameters, then scale up once the approach is confirmed to work. This can save significant time and computational resources.
+
+**Data Quality Check**: Check data quality before training: ensure that the format is correct, the answers are accurate, and there are no duplicate samples. You can use the following code:
+
+```python
+def check_data_quality(dataset):
+    """Check data quality"""
+    issues = []
+
+    # Check required fields; stop early because the later checks depend on them
+    required_fields = ["prompt", "completion"]
+    missing = [field for field in required_fields if field not in dataset.column_names]
+    if missing:
+        return [f"Missing field: {field}" for field in missing]
+
+    # Check null or empty values
+    for i, sample in enumerate(dataset):
+        if not sample["prompt"] or not sample["completion"]:
+            issues.append(f"Sample {i} contains null values")
+
+    # Check duplicates
+    prompts = [s["prompt"] for s in dataset]
+    duplicates = len(prompts) - len(set(prompts))
+    if duplicates > 0:
+        issues.append(f"Found {duplicates} duplicate samples")
+
+    return issues
+
+# Usage (assumes `dataset` is an already loaded Hugging Face `datasets.Dataset`)
+issues = check_data_quality(dataset)
+if issues:
+    print("Data quality issues:")
+    for issue in issues:
+        print(f"  - {issue}")
+else:
+    print("✓ Data quality check passed")
+```
+
+**Data Augmentation**: If data volume is insufficient, consider data augmentation, such as rewriting questions (keeping answers unchanged), generating similar questions, or back translation. But be careful to maintain data quality and avoid introducing noise.
+
+### 11.6.2 Hyperparameter Tuning
+
+Hyperparameter tuning is key to improving model performance. Here are some commonly used tuning strategies.
+
+**(1) Grid Search**
+
+Grid Search is the simplest tuning method, traversing all parameter combinations and selecting the best set.
+
+```python
+import itertools
+import json
+
+# Define parameter grid
+param_grid = {
+    "learning_rate": [1e-5, 5e-5, 1e-4],
+    "lora_rank": [8, 16, 32],
+    "kl_coef": [0.05, 0.1, 0.2],
+}
+
+best_accuracy = 0
+best_params = None
+
+# Traverse all combinations (3 x 3 x 3 = 27 full training runs)
+for lr, rank, kl in itertools.product(
+    param_grid["learning_rate"],
+    param_grid["lora_rank"],
+    param_grid["kl_coef"],
+):
+    print(f"Testing parameters: lr={lr}, rank={rank}, kl={kl}")
+
+    # Train model (run() returns a JSON string, so parse it)
+    train_result = json.loads(rl_tool.run({
+        "action": "train",
+        "algorithm": "grpo",
+        "learning_rate": lr,
+        "lora_rank": rank,
+        "kl_coef": kl,
+        # Other parameters...
+    }))
+
+    # Evaluate model, using the output_dir reported by the training run
+    eval_result = json.loads(rl_tool.run({
+        "action": "evaluate",
+        "model_path": train_result["output_dir"],
+    }))
+
+    # Update best parameters
+    if eval_result["accuracy"] > best_accuracy:
+        best_accuracy = eval_result["accuracy"]
+        best_params = {"lr": lr, "rank": rank, "kl": kl}
+
+print(f"Best parameters: {best_params}")
+print(f"Best accuracy: {best_accuracy:.2%}")
+```
+
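+Grid search is simple and direct, and it is guaranteed to find the best combination within the grid (though not necessarily the global optimum, which may lie between grid points). Its disadvantage is high computational cost: the number of runs grows exponentially with the number of parameters, so it becomes impractical when there are many of them.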
+
+**(2) Random Search**
+
+Random Search randomly samples parameter combinations, more efficient than grid search.
+
+```python
+import math
+import random
+
+# Define parameter ranges
+param_ranges = {
+    "learning_rate": (1e-6, 1e-4),  # Log-uniform distribution
+    "lora_rank": [4, 8, 16, 32, 64],
+    "kl_coef": (0.01, 0.5),
+}
+
+best_accuracy = 0
+best_params = None
+
+# Random sampling N times
+N = 10
+for i in range(N):
+    # Randomly sample parameters from the ranges defined above
+    lr_low, lr_high = param_ranges["learning_rate"]
+    lr = 10 ** random.uniform(math.log10(lr_low), math.log10(lr_high))  # Log-uniform
+    rank = random.choice(param_ranges["lora_rank"])
+    kl = random.uniform(*param_ranges["kl_coef"])
+
+    print(f"[{i+1}/{N}] Testing parameters: lr={lr:.2e}, rank={rank}, kl={kl:.3f}")
+
+    # Train and evaluate (same as above), then update best_accuracy and best_params
+    # ...
+
+print(f"Best parameters: {best_params}")
+print(f"Best accuracy: {best_accuracy:.2%}")
+```
+
+Random search is efficient and well suited to large parameter spaces. Its disadvantage is that it may miss the optimal solution.
+
+**(3) Bayesian Optimization**
+
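+Bayesian optimization fits a probabilistic model to the results of past trials and uses it to choose which parameters to try next. Libraries such as Optuna, whose default sampler is TPE, implement this kind of model-guided search: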
+
+```python
+import json
+
+import optuna
+
+def objective(trial):
+    """Optimization objective function"""
+    # Sample parameters (suggest_float replaces the deprecated
+    # suggest_loguniform / suggest_uniform methods)
+    lr = trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True)
+    rank = trial.suggest_categorical("lora_rank", [8, 16, 32])
+    kl = trial.suggest_float("kl_coef", 0.01, 0.5)
+
+    # Train model (run() returns a JSON string, so parse it)
+    train_result = json.loads(rl_tool.run({
+        "action": "train",
+        "algorithm": "grpo",
+        "learning_rate": lr,
+        "lora_rank": rank,
+        "kl_coef": kl,
+        # Other parameters...
+    }))
+
+    # Evaluate model, using the output_dir reported by the training run
+    eval_result = json.loads(rl_tool.run({
+        "action": "evaluate",
+        "model_path": train_result["output_dir"],
+    }))
+
+    return eval_result["accuracy"]
+
+# Create study
+study = optuna.create_study(direction="maximize")
+study.optimize(objective, n_trials=20)
+
+# Print best parameters
+print(f"Best parameters: {study.best_params}")
+print(f"Best accuracy: {study.best_value:.2%}")
+```
+
+Bayesian optimization is sample-efficient and can find good parameters quickly. Its disadvantages are that it is more complex to set up and requires an additional library.
+
+Table 11.8 compares the different tuning methods.
+
+<div align="center">
+  <p>Table 11.8 Hyperparameter Tuning Method Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-8.png" alt="" width="85%"/>
+</div>
+
+### 11.6.3 Distributed Training
+
+As data volume and model scale grow, single-GPU training becomes very slow, and distributed training is needed to speed it up. HelloAgents builds on TRL and Hugging Face Accelerate, so it supports multi-GPU and multi-node distributed training out of the box.
+
+**Solution Selection Recommendations**:
+
+- **Single Machine Multi-GPU (2-8 GPUs)**: Use DDP, simple and efficient
+- **Large Models (>7B)**: Use DeepSpeed ZeRO-2 or ZeRO-3
+- **Multi-Node Cluster**: Use DeepSpeed ZeRO-3 + Offload
+
+**(1) Configure Accelerate**
+
+First, create an Accelerate configuration file by running the following command:
+
+```bash
+accelerate config
+```
+
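+Answer the prompts to choose a configuration. The exact wording and order of the prompts vary across Accelerate versions; a session looks roughly like this: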
+
+```
+In which compute environment are you running?
+> This machine
+
+Which type of machine are you using?
+> multi-GPU
+
+How many different machines will you use?
+> 1
+
+Do you wish to optimize your script with torch dynamo?
+> NO
+
+Do you want to use DeepSpeed?
+> YES
+
+Which DeepSpeed config file do you want to use?
+> ZeRO-2
+
+How many GPU(s) should be used for distributed training?
+> 4
+```
+
+This will generate a configuration file at `~/.cache/huggingface/accelerate/default_config.yaml`.
+
+**(2) Training with DDP**
+
+**Distributed Data Parallel (DDP)** is the simplest distributed solution: each GPU holds a complete copy of the model, the data is split across GPUs, and gradients are synchronized after each backward pass.
+
+**Accelerate Configuration File** (`multi_gpu_ddp.yaml`):
+
+```yaml
+compute_environment: LOCAL_MACHINE
+distributed_type: MULTI_GPU
+num_processes: 4  # Number of GPUs
+machine_rank: 0
+num_machines: 1
+gpu_ids: all
+mixed_precision: fp16
+```
+
+**Training Script** (`train_script.py`, no modification needed):
+
+```python
+from hello_agents.tools import RLTrainingTool
+
+rl_tool = RLTrainingTool()
+
+# Training code remains unchanged
+result = rl_tool.run({
+    "action": "train",
+    "algorithm": "grpo",
+    "model_name": "Qwen/Qwen3-0.6B",
+    "output_dir": "./models/grpo_ddp",
+    "num_epochs": 3,
+    "batch_size": 4,  # Batch size per GPU
+    "use_lora": True,
+})
+```
+
+**Launch Training**:
+
+```bash
+# Using configuration file
+accelerate launch --config_file multi_gpu_ddp.yaml train_script.py
+
+# Or specify the parameters directly
+accelerate launch --multi_gpu --num_processes 4 --mixed_precision fp16 train_script.py
+```
+
+**(3) Training with DeepSpeed ZeRO**
+
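+**DeepSpeed ZeRO** significantly reduces per-GPU memory usage by sharding training state across GPUs: ZeRO-1 shards optimizer states, ZeRO-2 additionally shards gradients, and ZeRO-3 additionally shards model parameters. This makes larger models and batch sizes possible.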
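+
+A rough sense of what sharding saves can be obtained from the accounting used in the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states (master weights, momentum, and variance). The sketch below applies that accounting per ZeRO stage. It is an estimate under stated assumptions: it covers full fine-tuning only, ignores activations and buffers, and is not meant to reproduce measured numbers (with LoRA, far fewer parameters carry gradients and optimizer states). The function name is hypothetical.
+
+```python
+def zero_model_state_bytes(num_params, num_gpus, stage):
+    """Estimate per-GPU bytes for model states under a given ZeRO stage.
+
+    stage 0 means plain data parallelism (no sharding).
+    """
+    params = 2 * num_params   # fp16 weights
+    grads = 2 * num_params    # fp16 gradients
+    optim = 12 * num_params   # fp32 master weights + Adam momentum + variance
+
+    if stage >= 1:
+        optim /= num_gpus     # ZeRO-1: shard optimizer states
+    if stage >= 2:
+        grads /= num_gpus     # ZeRO-2: also shard gradients
+    if stage >= 3:
+        params /= num_gpus    # ZeRO-3: also shard parameters
+
+    return params + grads + optim
+
+
+# Example: compare stages for a 0.6B-parameter model on 4 GPUs
+for stage in range(4):
+    gib = zero_model_state_bytes(0.6e9, 4, stage) / 2**30
+    print(f"ZeRO stage {stage}: ~{gib:.2f} GiB of model states per GPU")
+```
+
+The estimate shows why optimizer states dominate: they account for 12 of the 16 bytes per parameter, so even ZeRO-1 removes most of the redundancy.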
+
+**ZeRO-2 Configuration File** (`deepspeed_zero2.yaml`):
+
+```yaml
+compute_environment: LOCAL_MACHINE
+distributed_type: DEEPSPEED
+num_processes: 4
+machine_rank: 0
+num_machines: 1
+gpu_ids: all
+mixed_precision: fp16
+deepspeed_config:
+  gradient_accumulation_steps: 4
+  gradient_clipping: 1.0
+  offload_optimizer_device: none
+  offload_param_device: none
+  zero3_init_flag: false
+  zero_stage: 2  # ZeRO-2
+```
+
+**ZeRO-3 Configuration File with CPU Offload** (`deepspeed_zero3.yaml`):
+
+```yaml
+compute_environment: LOCAL_MACHINE
+distributed_type: DEEPSPEED
+num_processes: 4
+machine_rank: 0
+num_machines: 1
+gpu_ids: all
+mixed_precision: fp16
+deepspeed_config:
+  gradient_accumulation_steps: 4
+  gradient_clipping: 1.0
+  offload_optimizer_device: cpu  # Offload optimizer states to CPU
+  offload_param_device: cpu      # Offload parameters to CPU
+  zero3_init_flag: true
+  zero_stage: 3  # ZeRO-3
+```
+
+**Launch Training**:
+
+```bash
+# ZeRO-2
+accelerate launch --config_file deepspeed_zero2.yaml train_script.py
+
+# ZeRO-3
+accelerate launch --config_file deepspeed_zero3.yaml train_script.py
+```
+
+Table 11.9 compares the memory usage of these methods when training the Qwen3-0.6B model:
+
+<div align="center">
+  <p>Table 11.9 Memory Comparison (Qwen3-0.6B Model)</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/11-figures/11-table-9.png" alt="" width="85%"/>
+</div>
+
+**(4) Multi-Node Training**
+
+For ultra-large-scale training, multiple nodes (machines) can be used.
+
+**Main Node Configuration** (`multi_node_main.yaml`):
+
+```yaml
+compute_environment: LOCAL_MACHINE
+distributed_type: DEEPSPEED
+num_processes: 16  # 4 nodes x 4 GPUs
+machine_rank: 0    # Main node
+num_machines: 4
+main_process_ip: 192.168.1.100  # Main node IP
+main_process_port: 29500
+gpu_ids: all
+mixed_precision: fp16
+deepspeed_config:
+  zero_stage: 3
+  offload_optimizer_device: cpu
+  offload_param_device: cpu
+```
+
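+**Worker Node Configuration** (one copy of the main node file per worker, for example `multi_node_worker1.yaml`, with `machine_rank` set to 1, 2, or 3):
+
+```yaml
+machine_rank: 1  # Worker node 1
+# All other settings are the same as on the main node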
+```
+
+**Launch Training**:
+
+```bash
+# On main node
+accelerate launch --config_file multi_node_main.yaml train_script.py
+
+# On worker node 1
+accelerate launch --config_file multi_node_worker1.yaml train_script.py
+
+# On worker node 2
+accelerate launch --config_file multi_node_worker2.yaml train_script.py
+
+# On worker node 3
+accelerate launch --config_file multi_node_worker3.yaml train_script.py
+```
+
+**(5) Distributed Training Best Practices**
+
+**1. Batch Size Adjustment**
+
+In distributed training, total batch size = `per_device_batch_size × num_gpus × gradient_accumulation_steps`
+
+```python
+# Single GPU: batch_size=4, gradient_accumulation=4, total_batch=16
+# 4-GPU DDP: batch_size=4, gradient_accumulation=1, total_batch=16 (kept consistent)
+```
+
+**2. Learning Rate Scaling**
+
+Use the square-root scaling rule: `lr_new = lr_base × sqrt(total_batch_size_new / total_batch_size_base)`
+
+```python
+# Baseline: single GPU, batch=16, lr=5e-5
+# 4 GPUs: batch=64, lr=5e-5 × sqrt(64/16) = 1e-4
+```
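
The batch-size formula and the learning-rate formula above can be checked with a few lines of Python. This is a minimal sketch: the helper names `total_batch_size` and `sqrt_scaled_lr` are illustrative and do not come from Accelerate or HelloAgents. Note that the formula scales the learning rate with the square root of the batch-size ratio.

```python
import math


def total_batch_size(per_device_batch_size: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Effective batch size seen by one optimizer step."""
    return per_device_batch_size * num_gpus * grad_accum_steps


def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate with the square root of the batch-size ratio."""
    return base_lr * math.sqrt(new_batch / base_batch)


# Trading gradient accumulation for GPUs keeps the total batch size unchanged
single_gpu = total_batch_size(per_device_batch_size=4, num_gpus=1, grad_accum_steps=4)
ddp_4gpu = total_batch_size(per_device_batch_size=4, num_gpus=4, grad_accum_steps=1)
print(single_gpu, ddp_4gpu)

# Growing the total batch from 16 to 64 scales the learning rate by sqrt(4) = 2
print(sqrt_scaled_lr(5e-5, base_batch=16, new_batch=64))
```

If the total batch size stays the same, as in the first comparison, the learning rate needs no adjustment.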
+
+**3. Monitoring and Debugging**
+
+```bash
+# Enable verbose logging
+export ACCELERATE_LOG_LEVEL=INFO
+
+# Enable NCCL debugging (multi-node)
+export NCCL_DEBUG=INFO
+
+# Check GPU utilization
+watch -n 1 nvidia-smi
+```
+
+### 11.6.4 Production Deployment
+
+After training is complete, the model needs to be deployed to a production environment. Here are some deployment recommendations.
+
+**(1) Model Export**
+
+Merge the LoRA weights into the base model for easier deployment:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+
+# Load base model
+base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
+
+# Load LoRA weights
+model = PeftModel.from_pretrained(base_model, "./models/grpo_model")
+
+# Merge weights
+merged_model = model.merge_and_unload()
+
+# Save merged model
+merged_model.save_pretrained("./models/merged_model")
+
+# Save tokenizer
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
+tokenizer.save_pretrained("./models/merged_model")
+
+print("✓ Model exported to: ./models/merged_model")
+```
+
+**(2) Inference Optimization**
+
+Use quantization and optimization techniques to accelerate inference:
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+
+# Load model with 8-bit quantization (requires the bitsandbytes package)
+model = AutoModelForCausalLM.from_pretrained(
+    "./models/merged_model",
+    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization
+    device_map="auto",  # Auto device allocation
+)
+
+tokenizer = AutoTokenizer.from_pretrained("./models/merged_model")
+
+# Inference
+def generate_answer(question):
+    prompt = f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=512,
+        temperature=0.7,
+        do_sample=True,
+    )
+
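+    # Decode only the newly generated tokens, dropping the prompt and special tokens
+    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
+    return response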
+
+# Test
+question = "What is 48 + 24?"
+answer = generate_answer(question)
+print(answer)
+```
+
+**(3) API Service**
+
+Create an inference service using FastAPI:
+
+```python
+from fastapi import FastAPI
+from pydantic import BaseModel
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+app = FastAPI()
+
+# Load model
+model = AutoModelForCausalLM.from_pretrained("./models/merged_model")
+tokenizer = AutoTokenizer.from_pretrained("./models/merged_model")
+
+class Question(BaseModel):
+    text: str
+    max_tokens: int = 512
+
+class Answer(BaseModel):
+    text: str
+    confidence: float
+
+@app.post("/generate", response_model=Answer)
+def generate(question: Question):
+    """Generate answer"""
+    prompt = f"<|im_start|>user\n{question.text}<|im_end|>\n<|im_start|>assistant\n"
+    inputs = tokenizer(prompt, return_tensors="pt")
+
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=question.max_tokens,
+        do_sample=True,  # Sampling must be enabled for temperature to take effect
+        temperature=0.7,
+        return_dict_in_generate=True,
+        output_scores=True,
+    )
+
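+    # Decode only the newly generated tokens, dropping the prompt and special tokens
+    new_tokens = outputs.sequences[0][inputs["input_ids"].shape[1]:]
+    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
+
+    # Placeholder confidence (simplified version)
+    confidence = 0.8  # In practice, derive this from the output probabilities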
+
+    return Answer(text=response, confidence=confidence)
+
+# Run: uvicorn api:app --host 0.0.0.0 --port 8000
+```
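
The service above returns a fixed placeholder confidence. One possible way to derive a confidence from output probabilities is the geometric mean of the generated tokens' probabilities, sketched below. This is only one of several reasonable definitions, and the function name `sequence_confidence` is hypothetical. The sketch assumes the per-token log-probabilities have already been extracted; with Transformers they could be computed from the generation scores, for example via `model.compute_transition_scores(outputs.sequences, outputs.scores, normalize_logits=True)`.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, a value in [0, 1].

    Assumption: token_logprobs holds natural-log probabilities of the
    generated tokens only (not the prompt tokens).
    """
    if not token_logprobs:
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


# Example with three generated tokens whose probabilities are 0.9, 0.8, and 0.95
example_logprobs = [math.log(0.9), math.log(0.8), math.log(0.95)]
print(round(sequence_confidence(example_logprobs), 3))
```

Averaging in log space keeps long answers from being penalized merely for their length, which a plain product of probabilities would do.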
+
+
+
+## 11.7 Chapter Summary
+
+In this chapter, we systematically covered the theory and practice of Agentic RL, from basic concepts to a complete training pipeline and from data preparation to model deployment. Let's review the main content of this chapter.
+
+**(1) Essence of Agentic RL**
+
+Agentic RL treats the LLM as a learnable policy, embeds it into the agent's perception-decision-execution loop, and optimizes agent performance on multi-step tasks through reinforcement learning. Its core differences from traditional PBRFT (Preference-Based Reinforcement Fine-Tuning) are:
+
+- **Task Nature**: From single-turn dialogue optimization to multi-step sequential decision-making
+- **State Space**: From static prompts to dynamically evolving environment states
+- **Action Space**: From pure text generation to text + tools + environment operations
+- **Reward Design**: From single-step quality assessment to long-term cumulative returns
+- **Optimization Objective**: From short-term response quality to long-term task success
+
+**(2) Six Core Capabilities**
+
+Agentic RL aims to enhance six core capabilities of agents:
+
+1. **Reasoning**: Multi-step logical deduction, learning reasoning strategies
+2. **Tool Use**: API/tool invocation, learning when and how to use
+3. **Memory**: Long-term information retention, learning memory management
+4. **Planning**: Action sequence planning, learning dynamic planning
+5. **Self-Improvement**: Self-reflection optimization, learning from mistakes
+6. **Perception**: Multimodal understanding, visual reasoning and tool use
+
+**(3) Training Pipeline**
+
+The complete Agentic RL training pipeline includes:
+
+1. **Pretraining**: Learning language knowledge on large-scale text (usually using existing pretrained models)
+2. **Supervised Fine-Tuning (SFT)**: Learning task format and basic reasoning ability
+3. **Reinforcement Learning (RL)**: Optimizing reasoning strategies through trial and error, surpassing training data quality
+
+Among these, SFT is the foundation and RL is the enhancement. Without an SFT foundation, RL rarely succeeds; without RL optimization, models can only imitate the training data.
+
+If you want to study Agentic RL in depth, we recommend the following path:
+
+**Foundation Stage**
+
+1. **Reinforcement Learning Basics**: Learn basic concepts like MDP, policy gradient, PPO
+2. **LLM Basics**: Understand technologies like Transformer, pretraining, fine-tuning
+3. **Practice HelloAgents**: Run example code from this chapter, understand complete pipeline
+
+**Advanced Stage**
+
+1. **Deep Dive into TRL**: Learn TRL library implementation, understand details of algorithms like SFT and GRPO
+2. **Custom Datasets**: Train models using your own datasets
+3. **Custom Reward Functions**: Design reward functions suitable for your tasks
+4. **Parameter Tuning**: Systematically tune hyperparameters, improve model performance
+
+**Expert Stage**
+
+1. **Multi-Step Reasoning**: Research long-sequence reasoning tasks
+2. **Tool Learning**: Enable agents to learn tool use
+3. **Multi-Agent**: Research multi-agent collaboration
+4. **Cutting-Edge Papers**: Read latest research papers, follow frontier progress
+
+
+
+We hope this chapter helps you understand and master Agentic RL technology, apply this knowledge in your own projects, and build more intelligent Agent systems!
+
+
+
+## References
+
+[1] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. *arXiv preprint arXiv:1707.06347*.
+
+[2] Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Zhang, M., ... & Guo, D. (2024). DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. *arXiv preprint arXiv:2402.03300*.
+
+[3] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. *arXiv preprint arXiv:2106.09685*.
+
+[4] Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., ... & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems. *arXiv preprint arXiv:2110.14168*.
+
+[5] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., ... & Lowe, R. (2022). Training language models to follow instructions with human feedback. *Advances in Neural Information Processing Systems*, 35, 27730-27744.
+
+[6] Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., & Finn, C. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. *arXiv preprint arXiv:2305.18290*.
+
+[7] Lee, H., Phatale, S., Mansoor, H., Lu, K., Mesnard, T., Bishop, C., ... & Rastogi, A. (2023). RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. *arXiv preprint arXiv:2309.00267*.
+
+[8] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., ... & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. *Advances in Neural Information Processing Systems*, 35, 24824-24837.
+
+[9] von Werra, L., Belkada, Y., Tunstall, L., Beeching, E., Thrush, T., Lambert, N., & Huang, S. (2020). TRL: Transformer Reinforcement Learning. *GitHub repository*. https://github.com/huggingface/trl
+
+[10] Qwen Team. (2025). Qwen3 Technical Report. *arXiv preprint arXiv:2505.09388*.
+
+[11] Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., ... & Kaplan, J. (2022). Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. *arXiv preprint arXiv:2204.05862*.
+
+[12] Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., ... & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. *arXiv preprint arXiv:2203.11171*.
+
+[13] Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep Reinforcement Learning from Human Preferences. *Advances in Neural Information Processing Systems*, 30.
+
+[14] Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., ... & Christiano, P. F. (2020). Learning to summarize with human feedback. *Advances in Neural Information Processing Systems*, 33, 3008-3021.
+
+[15] Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., ... & Irving, G. (2019). Fine-Tuning Language Models from Human Preferences. *arXiv preprint arXiv:1909.08593*.
+
+## Exercises
+
+> **Note**: Some exercises do not have standard answers; the focus is on cultivating learners' comprehensive understanding and practical ability in Agentic RL and agent training.
+
+1. This chapter introduced the evolution from LLM training to Agentic RL. Please analyze:
+
+   - In Table 11.1 of Section 11.1.3, the differences between PBRFT (Preference-Based Reinforcement Fine-Tuning) and Agentic RL under the MDP framework are compared. Please explain in depth: Why does Agentic RL's state space $s_t = (\text{prompt}, o_1, o_2, ..., o_t)$ include historical observations, while PBRFT's state $s_0 = \text{prompt}$ only includes the initial prompt? What impact does this difference have on the training process and final results?
+   - Suppose you want to train an "intelligent code debugging assistant" that needs to: (1) analyze code to find bugs; (2) consult documentation to understand API usage; (3) modify code; (4) run tests to verify fix effectiveness. Please map this task to the reinforcement learning framework, clearly defining state space, action space, reward function, and state transition function.
+   - Section 11.1.1 mentioned that traditional supervised learning has the limitation of "difficulty optimizing long-term objectives". Please design a specific multi-step reasoning task (such as mathematical proof, complex problem solving), demonstrating why supervised learning struggles to optimize intermediate steps, while reinforcement learning can solve this problem through delayed rewards.
+
+2. SFT (Supervised Fine-Tuning) and GRPO (Group Relative Policy Optimization) are two core training methods in this chapter. Based on Sections 11.2 and 11.3, please think deeply:
+
+   > **Note**: This is a hands-on practice question, actual operation recommended
+
+   - In the SFT training code in Section 11.2.4, we used LoRA (Low-Rank Adaptation) technology to reduce training parameters. Please analyze: What is the core idea of LoRA? Why can it achieve effects close to full parameter fine-tuning with a small number of parameters (such as 0.16%)? Under what circumstances should LoRA be chosen over full parameter fine-tuning?
+   - What advantages does the GRPO algorithm (Section 11.3) have compared to traditional PPO algorithm? Please compare the training processes of both, analyzing how GRPO simplifies the training process and improves stability through "group-relative rewards". If applying GRPO to other tasks (such as code generation, dialogue optimization), what adjustments are needed?
+   - Based on the code in Section 11.2.5, please extend the SFT training pipeline, adding the following features: (1) support for multi-turn dialogue data training; (2) add data augmentation strategies (such as synonym rewriting, difficulty adjustment); (3) implement visualization monitoring of training process (such as loss curves, sample quality assessment).
+
+3. Reward function design is a core challenge of Agentic RL. Based on Section 11.3.3, please complete the following extended practice:
+
+   > **Note**: This is a hands-on practice question, actual operation recommended
+
+   - In Section 11.3.3, we designed a simple binary reward for GSM8K math problems (correct +1, incorrect 0). Please design a more refined reward function that can: (1) give partial rewards for partially correct answers; (2) score the reasonableness of the reasoning process; (3) penalize overly verbose or inefficient solution paths. How should this reward function be implemented?
+   - Reward function design often requires domain knowledge. Please design reward functions for the following three different agent tasks: (1) code generation assistant (need to consider code correctness, readability, efficiency); (2) customer service dialogue agent (need to consider problem resolution rate, user satisfaction, response time); (3) game AI (need to consider win rate, strategy diversity, adversarial robustness).
+   - In practical applications, reward functions may have "reward hacking" problems: agents find shortcuts to obtain high rewards but don't actually complete tasks. Please give examples of this phenomenon and design defense mechanisms to avoid reward hacking.
+
+4. In the "Mathematical Reasoning Agent Training" case in Section 11.4, we saw the complete training pipeline. Please analyze in depth:
+
+   - The case used the GSM8K dataset for training and evaluation. Please analyze: What are the characteristics of this dataset? What type of reasoning ability is it suitable for training? If training an agent capable of handling more complex mathematical problems (such as advanced mathematics, mathematical proofs), how should the dataset and training methods be extended?
+   - In the training results in Section 11.4.3, we observed accuracy improvement on the training set, but there may be overfitting risks. Please design a "generalization ability assessment" plan: How to test whether the model truly learned mathematical reasoning rather than memorizing training data? How to improve generalization ability through regularization, data augmentation and other techniques?
+   - The training in the case is offline (using pre-collected datasets). Please design an "online learning" plan: agents continuously collect user feedback during actual use and automatically update the model. What technical challenges need to be considered in this plan (such as data quality control, catastrophic forgetting, safety assurance)?
+
+5. An important application of Agentic RL is enabling agents to learn tool use. Please think:
+
+   - Section 11.1.3 mentioned that Agentic RL is suitable for optimizing tasks "requiring multi-step reasoning, tool use, long-term planning". Please design a "tool learning" training plan: Given a set of tools (such as search engine, calculator, code executor), how to train agents to learn to choose appropriate tools at appropriate times? How should the reward function be designed?
+   - Tool use often involves complex dependencies (such as "must first call tool A to obtain information before calling tool B"). Please design a "hierarchical reinforcement learning" plan: high-level policy responsible for task planning, low-level policy responsible for tool invocation. How to train this hierarchical structure? How to coordinate optimization objectives of high and low levels?
+   - In practical applications, the number of tools may be very large (such as 50+ APIs), and direct training may face "low exploration efficiency" problems. Please design a "curriculum learning" plan: start training from simple tasks (using few tools), gradually increasing task difficulty and number of tools. How should this plan design curriculum sequence? How to assess whether agents are ready to enter the next stage?
+

File diff suppressed because it is too large
+ 199 - 195
docs/chapter11/第十一章 Agentic-RL.md


+ 2766 - 0
docs/chapter12/Chapter12-Agent-Performance-Evaluation.md

@@ -0,0 +1,2766 @@
+<div align="right">
+  English | <a href="./第十二章%20智能体性能评估.md">中文</a>
+</div>
+
+# Chapter 12: Agent Performance Evaluation
+
+In previous chapters, we built the core functionality of the HelloAgents framework, implementing various agent paradigms, tool systems, memory mechanisms, and reinforcement learning training. When building agent systems, we also need to solve a core problem: **How to objectively evaluate agent performance?** Specifically, we need to answer the following questions:
+
+1. Does the agent possess the expected capabilities?
+2. How does it perform on different tasks?
+3. What level is it at compared to other agents?
+
+This chapter adds a **Performance Evaluation System** to HelloAgents. We will examine the theoretical foundations of agent evaluation and implement evaluation tools.
+
+## 12.1 Agent Evaluation Fundamentals
+
+### 12.1.1 Why Agent Evaluation is Needed
+
+We now have SimpleAgent, which already possesses powerful reasoning and tool invocation capabilities. Let's look at a typical usage scenario:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import SearchTool
+
+# Create LLM and agent
+llm = HelloAgentsLLM()
+
+# Create a system prompt emphasizing tool use
+system_prompt = """You are an AI assistant that can use search tools to obtain the latest information.
+
+When you need to search for information, please use the following format:
+[TOOL_CALL:search:search keywords]
+
+For example:
+- [TOOL_CALL:search:latest AI news]
+- [TOOL_CALL:search:Python programming tutorial]
+
+Please use the search tool to obtain the latest information before answering questions."""
+
+agent = SimpleAgent(name="AI Assistant", llm=llm, system_prompt=system_prompt)
+
+# Add search tool
+agent.add_tool(SearchTool())
+
+# Example: Use search tool to answer questions
+response = agent.run("What are the latest AI technology development trends?")
+print(f"\nAnswer: {response}")
+```
+
+This agent works, but we face a core problem: how do we objectively evaluate its performance? When we optimize prompts or switch LLM models, how do we know whether there is real improvement? Before deploying to a production environment, how do we ensure the agent is reliable? Answering these questions requires systematic evaluation.
+
+The core value of agent evaluation lies in providing standardized methods to measure agent capabilities. Through evaluation, we can quantify agent performance with specific numerical metrics, objectively compare the merits of different design solutions, promptly discover agent weaknesses in specific scenarios, and prove agent reliability to users.
+
+Unlike traditional software testing, agent evaluation faces unique challenges. The first is output uncertainty: the same question may have multiple correct answers, so a simple right-or-wrong check is often not enough. The second is the diversity of evaluation criteria: different tasks require different evaluation methods. Tool invocation needs to check function signatures, while Q&A tasks need to evaluate semantic similarity. The last is high evaluation cost: each evaluation requires numerous API calls, potentially costing hundreds of yuan or more.
+
+To address these challenges, academia and industry have proposed multiple standardized **Benchmarks**. These benchmarks provide unified datasets, evaluation metrics, and scoring methods, enabling us to evaluate and compare different agent systems under the same standards.
+
+### 12.1.2 Overview of Mainstream Evaluation Benchmarks
+
+Several influential benchmarks have emerged in the agent evaluation field. Below are some mainstream evaluation benchmarks and metrics:
+
+**(1) Tool Invocation Capability Evaluation**
+
+Tool invocation is one of the core capabilities of agents. Agents need to understand user intent, select appropriate tools, and correctly construct function calls. Related evaluation benchmarks include:
+
+- **BFCL (Berkeley Function Calling Leaderboard)**<sup>[1]</sup>: Launched by UC Berkeley. It includes 1,120+ test samples across four categories (simple, multiple, parallel, irrelevance) and is evaluated with an AST matching algorithm. The dataset is moderate in size and the community is active.
+- **ToolBench**<sup>[2]</sup>: Launched by Tsinghua University. It includes 16,000+ real-world APIs and covers complex tool usage scenarios.
+- **API-Bank**<sup>[3]</sup>: Launched by Alibaba DAMO Academy. It includes 53 commonly used API tools and focuses on evaluating how well agents understand API documentation and invoke APIs.
+
+**(2) General Capability Evaluation**
+
+These benchmarks evaluate an agent's overall performance on real-world tasks, including multi-step reasoning, knowledge application, and multimodal understanding:
+
+- **GAIA (General AI Assistants)**<sup>[4]</sup>: Jointly launched by Meta AI and Hugging Face. It includes 466 real-world problems divided into three difficulty levels (Level 1/2/3) and evaluates multi-step reasoning, tool use, file processing, and web browsing. Answers are scored with a Quasi Exact Match algorithm, and the tasks are realistic and comprehensive.
+- **AgentBench**<sup>[5]</sup>: Launched by Tsinghua University, includes 8 tasks in different domains, comprehensively evaluates agent general capabilities.
+- **WebArena**<sup>[6]</sup>: Launched by CMU, evaluates agent task completion and web interaction capabilities in real web environments.
+
+**(3) Multi-Agent Collaboration Evaluation**
+
+Evaluates the ability of multiple agents to work collaboratively:
+
+- **ChatEval**<sup>[7]</sup>: Evaluates quality of multi-agent dialogue systems.
+- **SOTOPIA**<sup>[8]</sup>: Evaluates agent interaction capabilities in social scenarios.
+- **Custom Collaboration Scenarios**: Evaluation tasks designed according to specific application scenarios.
+
+**(4) Common Evaluation Metrics**
+
+Different benchmarks use different evaluation metrics, common ones include:
+
+- **Accuracy Metrics**: Accuracy, Exact Match, F1 Score, used to measure answer correctness.
+- **Efficiency Metrics**: Response Time, Token Usage, used to measure execution efficiency.
+- **Robustness Metrics**: Error Rate, Failure Recovery, used to measure fault tolerance.
+- **Collaboration Metrics**: Communication Efficiency, Task Completion, used to measure collaboration effectiveness.
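
As a concrete illustration of the accuracy metrics listed above, the sketch below computes exact match, token-level F1, and accuracy over plain-text answers. It is a minimal sketch under simple assumptions (lowercasing and whitespace tokenization as the only normalization). The function names are illustrative and are not the HelloAgents evaluation API.

```python
from collections import Counter
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a deliberately simple normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of samples whose prediction exactly matches the reference."""
    pairs = list(zip(predictions, references))
    if not pairs:
        return 0.0
    return sum(exact_match(p, r) for p, r in pairs) / len(pairs)


print(exact_match("Beijing ", "beijing"))
print(token_f1("the capital is Beijing", "Beijing"))
print(accuracy(["a", "b"], ["a", "c"]))
```

Exact match rewards only fully correct answers, while token-level F1 gives partial credit, which is why Q&A benchmarks often report both.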
+
+### 12.1.3 HelloAgents Evaluation System Design
+
+Considering learning curve and practicality, this chapter will focus on the following evaluation scenarios:
+
+1. **BFCL**: Evaluate tool invocation capability
+   - Selection rationale: Moderate dataset size, clear evaluation metrics, active community
+   - Applicable scenarios: Evaluate agent function call accuracy
+
+2. **GAIA**: Evaluate general AI assistant capability
+   - Selection rationale: Realistic tasks, difficulty grading, strong comprehensiveness
+   - Applicable scenarios: Evaluate agent comprehensive problem-solving capability
+
+3. **Data Generation Quality Evaluation**: Evaluate LLM-generated data quality
+   - Selection rationale: This case provides a complete demonstration of using an agent to create data and then evaluate it
+   - Applicable scenarios: Evaluate quality of generated training data and test data
+   - Evaluation methods: LLM Judge, Win Rate, manual verification
+
+Through these three evaluation scenarios, we will build a complete evaluation system. Figure 12.1 shows our evaluation system construction approach.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-1.png" alt="" width="85%"/>
+  <p>Figure 12.1 HelloAgents Evaluation System Architecture</p>
+</div>
+
+
+
+### 12.1.4 Chapter Learning Objectives and Quick Experience
+
+Let's first look at the code modules covered in Chapter 12:
+
+```
+hello_agents/
+├── evaluation/                         # Evaluation module
+│   └── benchmarks/                     # Evaluation benchmark implementation
+│       ├── bfcl/                       # BFCL evaluation implementation
+│       │   ├── dataset.py              # BFCL dataset loader
+│       │   ├── evaluator.py            # BFCL evaluator (AST matching)
+│       │   ├── metrics.py              # BFCL-specific metrics
+│       │   └── ast_matcher.py          # AST matching algorithm
+│       ├── gaia/                       # GAIA evaluation implementation
+│       │   ├── dataset.py              # GAIA dataset loader
+│       │   ├── evaluator.py            # GAIA evaluator (quasi exact match)
+│       │   ├── metrics.py              # GAIA-specific metrics
+│       │   └── quasi_exact_match.py    # Quasi exact match algorithm
+│       └── data_generation/            # Data generation evaluation implementation
+│           ├── dataset.py              # AIME dataset loader
+│           ├── llm_judge.py            # LLM Judge evaluator
+│           └── win_rate.py             # Win Rate evaluator
+└── tools/builtin/                      # Built-in tools module
+    ├── bfcl_evaluation_tool.py         # BFCL evaluation tool
+    ├── gaia_evaluation_tool.py         # GAIA evaluation tool
+    ├── llm_judge_tool.py               # LLM Judge tool
+    └── win_rate_tool.py                # Win Rate tool
+```
+
+The learning objective for this chapter is to be able to apply the evaluation tools. Let's first prepare the development environment:
+
+```bash
+# Install HelloAgents framework (Chapter 12 version)
+pip install "hello-agents[evaluation]==0.2.7"
+
+# Set environment variables
+export HF_TOKEN="your_huggingface_token"     # For GAIA dataset (setup steps will follow)
+
+# Since the official `bfcl-eval` package requires numpy<=2.0.0, which conflicts with HelloAgents main dependencies, separate installation is needed
+pip install "numpy==1.26.4" bfcl-eval
+```
+
+The following sections introduce each evaluation method and its usage in detail.
+
+## 12.2 BFCL: Tool Invocation Capability Evaluation
+
+### 12.2.1 BFCL Benchmark Introduction
+
+BFCL (Berkeley Function Calling Leaderboard) is a function calling capability evaluation benchmark launched by UC Berkeley<sup>[1]</sup>. In agent systems, tool calling is one of the core capabilities. Agents need to complete the following tasks:
+
+1. **Understand Task Requirements**: Extract key information from user's natural language description
+2. **Select Appropriate Tools**: Choose the most suitable tool from available tool set
+3. **Construct Function Calls**: Correctly fill in function name and parameters
+4. **Handle Complex Scenarios**: Support advanced scenarios like multi-function calls, parallel calls
+
+The BFCL benchmark contains four evaluation categories of increasing difficulty. It starts with the most basic single function call (Simple), moves on to scenarios requiring multiple function calls (Multiple), then to complex scenarios requiring parallel calls of multiple functions (Parallel), and finally to scenarios where the agent must judge whether a function call is needed at all (Irrelevance). These four categories cover the tool calling scenarios that agents may encounter in practical applications, as shown in Table 12.1:
+
+<div align="center">
+  <p>Table 12.1 Four Evaluation Categories in BFCL Benchmark</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-1.png" alt="" width="85%"/>
+</div>
+
+The BFCL evaluation process follows a standard benchmark procedure: load the dataset and select an evaluation category, run the agent to obtain predictions, parse each prediction into an Abstract Syntax Tree (AST), and judge whether it is correct with the AST matching algorithm. This is repeated for all test samples, after which metrics such as accuracy are calculated and an evaluation report is generated. The complete evaluation process is shown in Figure 12.2:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-2.png" alt="" width="85%"/>
+  <p>Figure 12.2 BFCL Evaluation Process Diagram</p>
+</div>
+
+**(1) BFCL Dataset Structure**
+
+The BFCL dataset uses JSON format, with each test sample containing the following fields:
+
+```json
+{
+  "id": "simple_001",
+  "question": "What's the weather like in Beijing today?",
+  "function": [
+    {
+      "name": "get_weather",
+      "description": "Get the current weather for a location",
+      "parameters": {
+        "type": "object",
+        "properties": {
+          "location": {
+            "type": "string",
+            "description": "The city name"
+          }
+        },
+        "required": ["location"]
+      }
+    }
+  ],
+  "ground_truth": [
+    {
+      "name": "get_weather",
+      "arguments": {
+        "location": "Beijing"
+      }
+    }
+  ]
+}
+```
+
+**Key Field Descriptions:**
+
+- `question`: User's natural language request
+- `function`: List of available functions (including function signatures and descriptions)
+- `ground_truth`: Standard answer (expected function call)
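
To make the relationship between these fields concrete, the sketch below loads a trimmed copy of the sample above and checks that a function call fits the schema declared in `function`. It is a minimal illustration, not the official BFCL loader: the helper `check_call_against_schema` is hypothetical and only verifies function names and argument names, not argument types.

```python
import json

# Trimmed copy of the sample shown above (descriptions removed for brevity)
SAMPLE = json.loads("""
{
  "id": "simple_001",
  "question": "What's the weather like in Beijing today?",
  "function": [
    {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  ],
  "ground_truth": [
    {"name": "get_weather", "arguments": {"location": "Beijing"}}
  ]
}
""")


def check_call_against_schema(call: dict, functions: list) -> list:
    """Return a list of problems; an empty list means the call fits the schema."""
    schemas = {f["name"]: f for f in functions}
    if call["name"] not in schemas:
        return [f"unknown function: {call['name']}"]
    params = schemas[call["name"]]["parameters"]
    problems = []
    for required in params.get("required", []):
        if required not in call["arguments"]:
            problems.append(f"missing required argument: {required}")
    for arg in call["arguments"]:
        if arg not in params.get("properties", {}):
            problems.append(f"unexpected argument: {arg}")
    return problems


for truth in SAMPLE["ground_truth"]:
    print(truth["name"], check_call_against_schema(truth, SAMPLE["function"]))
```

A check like this can catch malformed predictions cheaply before the stricter comparison against `ground_truth` is run.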
+
+**(2) AST Matching Explanation**
+
+BFCL uses **AST Matching (Abstract Syntax Tree Matching)** as its core evaluation algorithm instead of simple string matching. The core idea is: **Parse function calls into syntax trees, then compare tree structure and node values**.
+
+Given predicted function call $P$ and standard answer $G$, the AST matching function is defined as:
+
+$$
+\text{AST\_Match}(P, G) = \begin{cases}
+1 & \text{if } \text{AST}(P) \equiv \text{AST}(G) \\
+0 & \text{otherwise}
+\end{cases}
+$$
+
+Where $\text{AST}(x)$ denotes parsing a function call into an abstract syntax tree, and $\equiv$ denotes syntax tree equivalence.
+
+Two syntax trees are equivalent when they satisfy three conditions:
+
+- **Function names are identical**: names are compared as exact strings, so `get_weather` and `get_temperature` are different functions.
+- **Parameter key-value sets are equal, ignoring order**: `f(a=1, b=2)` is equivalent to `f(b=2, a=1)`.
+- **Each parameter value is semantically equivalent**: equivalent expressions match (`f(x=2+3)` is equivalent to `f(x=5)`), as do different string representations (`f(s="hello")` is equivalent to `f(s='hello')`).
+
+For multi-function call scenarios, the prediction must contain the same number of calls as the standard answer and every call must match, but the call order can differ (set matching).
+
+**AST Matching Examples:**
+
+```python
+# Example 1: Different parameter order (match successful)
+Prediction: get_weather(city="Beijing", unit="celsius")
+Standard: get_weather(unit="celsius", city="Beijing")
+Result: ✅ Match successful
+
+# Example 2: Equivalent expression (match successful)
+Prediction: calculate(x=2+3)
+Standard: calculate(x=5)
+Result: ✅ Match successful
+
+# Example 3: Wrong function name (match failed)
+Prediction: get_temperature(city="Beijing")
+Standard: get_weather(city="Beijing")
+Result: ❌ Match failed
+
+# Example 4: Wrong parameter value (match failed)
+Prediction: get_weather(city="Shanghai")
+Standard: get_weather(city="Beijing")
+Result: ❌ Match failed
+```
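
The matching rules above can be sketched with Python's standard `ast` module. This is a minimal illustration, not BFCL's official checker: the helper names `parse_call` and `ast_match` are hypothetical, only keyword arguments are handled, and only the four basic arithmetic operators are folded into constants.

```python
import ast
import operator

# Arithmetic operators that this sketch folds into constants (an assumption).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _fold(node):
    """Reduce a literal or a simple arithmetic expression to a Python value."""
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_fold(node.left), _fold(node.right))
    return ast.literal_eval(node)  # raises ValueError for non-literals


def parse_call(source):
    """Parse 'f(a=1, b=2)' into (function_name, {parameter: value})."""
    call = ast.parse(source.strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a simple function call: {source!r}")
    # Comparing dicts ignores keyword order; quoting style is lost on parsing.
    return call.func.id, {kw.arg: _fold(kw.value) for kw in call.keywords}


def ast_match(prediction, standard):
    """Return True when both calls share a name and equivalent parameters."""
    return parse_call(prediction) == parse_call(standard)


print(ast_match('get_weather(city="Beijing", unit="celsius")',
                'get_weather(unit="celsius", city="Beijing")'))
print(ast_match("calculate(x=2+3)", "calculate(x=5)"))
print(ast_match('get_temperature(city="Beijing")', 'get_weather(city="Beijing")'))
```

Parsing each call into a name plus a parameter dictionary makes the order-insensitive comparison a plain equality check, which keeps the sketch short.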
+
+**(3) BFCL Evaluation Metrics**
+
+BFCL uses the following metrics to evaluate agent performance:
+
+**1. Accuracy**
+
+Accuracy is the core metric, defined as the proportion of samples whose AST match succeeds:
+
+$$
+\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \text{AST\_Match}(P_i, G_i)
+$$
+
+Where:
+- $N$ is the total number of samples
+- $P_i$ is the prediction result of the $i$-th sample
+- $G_i$ is the standard answer of the $i$-th sample
+- $\text{AST\_Match}(P_i, G_i) \in \{0, 1\}$ is the AST matching function
+
+**2. AST Match Rate**
+
+Same as accuracy, emphasizing use of AST matching algorithm:
+
+$$
+\text{AST Match Rate} = \text{Accuracy}
+$$
+
+**3. Category-wise Accuracy**
+
+For each category $c \in \{\text{simple}, \text{multiple}, \text{parallel}, \ldots\}$, calculate the accuracy for that category:
+
+$$
+\text{Accuracy}_c = \frac{1}{|D_c|} \sum_{i \in D_c} \text{AST\_Match}(P_i, G_i)
+$$
+
+Where $D_c$ is the sample set of category $c$, $|D_c|$ is the number of samples in that category.
+
+**4. Weighted Accuracy**
+
+Considering difficulty weights of different categories:
+
+$$
+\text{Weighted Accuracy} = \sum_{c} w_c \cdot \text{Accuracy}_c
+$$
+
+Where $w_c$ is the weight of category $c$, satisfying $\sum_c w_c = 1$.
+
+**5. Error Rate**
+
+Proportion of samples that failed to correctly call functions:
+
+$$
+\text{Error Rate} = 1 - \text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} (1 - \text{AST\_Match}(P_i, G_i))
+$$
+
+**Metric Interpretation:**
+
+- **Accuracy = 1.0**: All samples are completely correct
+- **Accuracy = 0.8**: 80% of samples correct, 20% of samples incorrect
+- **Accuracy = 0.0**: All samples are incorrect
+
+**Category Accuracy Example:**
+
+```python
+# Assume evaluation results
+simple_accuracy = 0.95      # Simple category: 95% correct
+multiple_accuracy = 0.82    # Multiple category: 82% correct
+parallel_accuracy = 0.68    # Parallel category: 68% correct
+
+# Weighted accuracy (assuming equal weights)
+weighted_accuracy = (simple_accuracy + multiple_accuracy + parallel_accuracy) / 3  # ≈ 0.817
+```
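
The metric definitions above can be computed together from per-sample match results. The sketch below is illustrative only: the function name `summarize` and the `(category, matched)` input shape are assumptions, and equal weights are used when none are supplied.

```python
from collections import defaultdict


def summarize(results, weights=None):
    """Compute BFCL-style metrics from (category, matched) pairs."""
    per_category = defaultdict(list)
    for category, matched in results:
        per_category[category].append(1 if matched else 0)

    total = sum(len(flags) for flags in per_category.values())
    correct = sum(sum(flags) for flags in per_category.values())
    accuracy = correct / total if total else 0.0

    # Accuracy_c = matches in category c / samples in category c
    category_accuracy = {c: sum(f) / len(f) for c, f in per_category.items()}

    # Assumption: equal weights when the caller does not supply any.
    if weights is None:
        weights = {c: 1 / len(category_accuracy) for c in category_accuracy}
    weighted = sum(weights[c] * category_accuracy[c] for c in category_accuracy)

    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "category_accuracy": category_accuracy,
        "weighted_accuracy": weighted,
    }


sample_results = [("simple", True), ("simple", True),
                  ("multiple", True), ("multiple", False)]
print(summarize(sample_results))
```

Note that overall accuracy and weighted accuracy coincide only when the weights are proportional to the category sizes.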
+
+**(4) BFCL Official Evaluation Tool**
+
+BFCL provides official CLI tool for evaluation:
+
+```bash
+# Install BFCL evaluation tool (the PyPI package is named bfcl-eval)
+pip install bfcl-eval
+
+# Run official evaluation
+bfcl evaluate \
+    --model Qwen/Qwen3-8B \
+    --test-category simple_python
+```
+
+Advantages of using the official evaluation tool: it uses the official AST matching algorithm, evaluation results are completely consistent with the leaderboard, supports all BFCL v4 categories, and can automatically generate detailed evaluation reports.
+
+
+### 12.2.2 Obtaining BFCL Dataset
+
+The BFCL dataset can be obtained through the following methods:
+
+**Method 1: Clone from Official GitHub Repository (Recommended)**
+
+This is the most reliable method, obtaining complete dataset and ground truth:
+
+```bash
+# Clone BFCL repository
+git clone https://github.com/ShishirPatil/gorilla.git temp_gorilla
+cd temp_gorilla/berkeley-function-call-leaderboard
+
+# View BFCL v4 dataset
+ls bfcl_eval/data/
+# Output: BFCL_v4_simple_python.json  BFCL_v4_multiple.json  BFCL_v4_parallel.json  ...
+
+# View ground truth
+ls bfcl_eval/data/possible_answer/
+# Output: BFCL_v4_simple_python.json  BFCL_v4_multiple.json  ...
+```
+
+This method is recommended because the repository contains the complete ground truth (standard answers), its data format matches the official evaluation tool exactly, the official evaluation scripts can be used directly, and it includes the latest BFCL v4 release.
+
+**Method 2: Load Official Data Using HelloAgents**
+
+After cloning repository, load data using HelloAgents:
+
+```python
+from hello_agents.evaluation import BFCLDataset
+
+# Load BFCL official data
+dataset = BFCLDataset(
+    bfcl_data_dir="./temp_gorilla/berkeley-function-call-leaderboard/bfcl_eval/data",
+    category="simple_python"  # BFCL v4 category
+)
+
+# Load data (including test data and ground truth)
+data = dataset.load()
+
+print(f"✅ Loaded {len(data)} test samples")
+print(f"✅ Loaded {len(dataset.ground_truth)} ground truth")
+# Output:
+# ✅ Loaded 400 test samples
+# ✅ Loaded 400 ground truth
+```
+
+The loader works in four steps: it loads test data from `bfcl_eval/data/`, loads ground truth from `bfcl_eval/data/possible_answer/`, merges the two automatically, and preserves the original BFCL data format. The BFCL v4 dataset categories are listed in Table 12.2.
+
+<div align="center">
+  <p>Table 12.2 Four Evaluation Categories in BFCL Benchmark</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-2.png" alt="" width="85%"/>
+</div>
+
+You can also view available categories through code:
+
+```python
+# Get all supported categories
+categories = dataset.get_available_categories()
+print(f"Supported categories: {categories}")
+# Output: ['simple_python', 'simple_java', 'simple_javascript', 'multiple', ...]
+```
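
The merge step that the loader performs can be sketched as follows. This is an illustration, not the actual `BFCLDataset` code: it assumes each BFCL file stores one JSON object per line and that test samples and answers share an `id` field. The names `read_jsonl` and `merge_ground_truth` are hypothetical.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def merge_ground_truth(samples, answers):
    """Attach each answer's ground truth to the test sample with the same id."""
    truth_by_id = {a["id"]: a.get("ground_truth") for a in answers}
    return [{**s, "ground_truth": truth_by_id.get(s["id"])} for s in samples]


# In-memory stand-ins for a test file and its possible_answer counterpart.
test_lines = ['{"id": "simple_python_0", "question": "Find the area of a triangle"}']
answer_lines = [
    '{"id": "simple_python_0", "ground_truth": [{"calculate_triangle_area": {"base": [10]}}]}'
]

merged = merge_ground_truth(read_jsonl(test_lines), read_jsonl(answer_lines))
print(merged[0]["ground_truth"])
```

Samples without a matching answer keep `ground_truth` set to `None`, so missing answers are easy to detect before evaluation starts.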
+
+### 12.2.3 Implementing BFCL Evaluation in HelloAgents
+
+Now let's see how to implement BFCL evaluation in the HelloAgents framework. We provide three usage methods:
+
+**Method 1: Using BFCLEvaluationTool (Recommended)**
+
+This is the simplest method: a single `run()` call completes the evaluation, report generation, and official evaluation:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import BFCLEvaluationTool
+
+# 1. Create agent to be evaluated
+llm = HelloAgentsLLM()
+agent = SimpleAgent(name="TestAgent", llm=llm)
+
+# 2. Create BFCL evaluation tool
+bfcl_tool = BFCLEvaluationTool()
+
+# 3. Run evaluation (automatically complete all steps)
+results = bfcl_tool.run(
+    agent=agent,
+    category="simple_python",  # Evaluation category
+    max_samples=5              # Number of evaluation samples (0 means all)
+)
+
+# 4. View results
+print(f"Accuracy: {results['overall_accuracy']:.2%}")
+print(f"Correct: {results['correct_samples']}/{results['total_samples']}")
+```
+
+**Run Output:**
+
+```
+============================================================
+BFCL One-Click Evaluation
+============================================================
+
+Configuration:
+   Evaluation category: simple_python
+   Sample count: 5
+   Agent: TestAgent
+
+============================================================
+Step 1: Run HelloAgents Evaluation
+============================================================
+✅ BFCL dataset loaded
+   Data directory: ./temp_gorilla/berkeley-function-call-leaderboard/bfcl_eval/data
+   Category: simple_python
+   Sample count: 400
+   Ground truth count: 400
+
+🔧 Starting BFCL evaluation...
+   Progress: 1/5
+   Progress: 5/5
+
+✅ BFCL evaluation complete
+   Overall accuracy: 100.00%
+   simple_python: 100.00% (5/5)
+
+📊 Evaluation results:
+   Accuracy: 100.00%
+   Correct: 5/5
+
+============================================================
+Step 2: Export BFCL Format Results
+============================================================
+✅ BFCL format results exported
+   Output file: ./evaluation_results/bfcl_official/BFCL_v4_simple_python_result.json
+
+============================================================
+Step 3: Run BFCL Official Evaluation
+============================================================
+✅ Result file copied to: ./result/Qwen_Qwen3-8B/BFCL_v4_simple_python_result.json
+
+🔄 Running command: bfcl evaluate --model Qwen/Qwen3-8B --test-category simple_python --partial-eval
+
+============================================================
+BFCL Official Evaluation Results
+============================================================
+📊 Evaluation results summary:
+Model,Overall Acc,simple_python
+Qwen/Qwen3-8B,100.00,100.00
+
+🎯 Final results:
+   Accuracy: 100.00%
+   Correct: 5/5
+
+============================================================
+Step 4: Generate Evaluation Report
+============================================================
+📄 Report generated: ./evaluation_reports/bfcl_report_20251011_005938.md
+
+Accuracy: 100.00%
+Correct: 5/5
+```
+
+**Auto-generated Markdown Report:**
+
+After evaluation completes, a detailed Markdown report is automatically generated, including:
+
+```markdown
+# BFCL Evaluation Report
+**Generated**: 2025-10-11 00:59:38
+
+## 📊 Evaluation Overview
+
+- **Agent**: TestAgent
+- **Evaluation Category**: simple_python
+- **Overall Accuracy**: 100.00%
+- **Correct Samples**: 5/5
+
+## 📈 Detailed Metrics
+
+### Category Accuracy
+
+- **simple_python**: 100.00% (5/5)
+
+## 📝 Sample Details
+
+| Sample ID | Question | Prediction | Ground Truth | Correct |
+|-----------|----------|------------|--------------|---------|
+| simple_python_0 | Find the area of a triangle... | [{'name': 'calculate_triangle_area'...}] | [{'function_name': {'base': [10]...}}] | ✅ |
+| simple_python_1 | Calculate the factorial of 5... | [{'name': 'calculate_factorial'...}] | [{'function_name': {'number': [5]}}] | ✅ |
+...
+
+## 📊 Accuracy Visualization
+Accuracy: ██████████████████████████████████████████████████ 100.00%
+
+## 💡 Recommendations
+- ✅ Excellent performance! Agent shows outstanding tool calling capabilities.
+```
+
+**Method 2: Using One-Click Evaluation Script**
+
+Suitable for quick command-line evaluation. In this chapter's accompanying code examples, we provide `04_run_bfcl_evaluation.py`, supporting direct command-line evaluation:
+
+```bash
+# Run evaluation script
+python chapter12/04_run_bfcl_evaluation.py --category simple_python --samples 10
+
+# Specify model name (for BFCL official evaluation)
+python examples/04_run_bfcl_evaluation.py \
+    --category simple_python \
+    --samples 10 \
+    --model-name "Qwen/Qwen3-8B"
+```
+
+The script supports three parameters: `--category` specifies the evaluation category (default `simple_python`), `--samples` specifies the number of evaluation samples (default 5; 0 means all), and `--model-name` specifies the model name used for BFCL official evaluation (default `Qwen/Qwen3-8B`).
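
A command-line interface with these three parameters could be declared with `argparse` as shown below. This is a sketch of the interface described above, not the accompanying script itself; the function name `build_parser` and the help texts are assumptions.

```python
import argparse


def build_parser():
    """Declare the three options described above with their stated defaults."""
    parser = argparse.ArgumentParser(description="Run a BFCL evaluation")
    parser.add_argument("--category", default="simple_python",
                        help="evaluation category")
    parser.add_argument("--samples", type=int, default=5,
                        help="number of samples to evaluate (0 means all)")
    parser.add_argument("--model-name", default="Qwen/Qwen3-8B",
                        help="model name used for BFCL official evaluation")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--samples", "10"])
print(args.category, args.samples, args.model_name)
```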
+
+**Method 3: Directly Using Dataset and Evaluator**
+
+Suitable for scenarios requiring custom evaluation process:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.evaluation import BFCLDataset, BFCLEvaluator
+
+# 1. Create agent
+llm = HelloAgentsLLM()
+agent = SimpleAgent(name="TestAgent", llm=llm)
+
+# 2. Load dataset
+dataset = BFCLDataset(
+    bfcl_data_dir="./temp_gorilla/berkeley-function-call-leaderboard/bfcl_eval/data",
+    category="simple_python"
+)
+data = dataset.load()
+
+# 3. Create evaluator
+evaluator = BFCLEvaluator(
+    dataset=dataset,
+    category="simple_python",
+    evaluation_mode="ast"  # Use AST matching mode
+)
+
+# 4. Run evaluation
+results = evaluator.evaluate(agent, max_samples=10)
+
+# 5. View results
+print(f"Accuracy: {results['overall_accuracy']:.2%}")
+print(f"Correct: {results['correct_samples']}/{results['total_samples']}")
+
+# 6. Export BFCL format results (optional)
+evaluator.export_to_bfcl_format(
+    results,
+    output_path="./evaluation_results/my_results.json"
+)
+```
+
+Through these three methods, we can choose appropriate evaluation methods based on different needs. If you just want to quickly understand agent performance, using BFCLEvaluationTool's one-click evaluation is most convenient; if you need batch evaluation or integration into CI/CD pipeline, using command-line scripts is more suitable; if you need deep customization of evaluation process or integration into your own system, directly using Dataset and Evaluator provides maximum flexibility.
+
+
+
+
+### 12.2.4 BFCL Official Evaluation Tool Integration
+
+Previously we learned how to use HelloAgents' built-in evaluation functionality. In fact, `BFCLEvaluationTool` has **automatically integrated BFCL official evaluation tool**, allowing you to obtain authoritative, comparable evaluation results.
+
+The entire evaluation process includes four steps: first load test data from BFCL v4 dataset, then use HelloAgents to run evaluation and obtain agent prediction results, next export results to BFCL official format (JSONL), and finally use official evaluation script to calculate final scores. This process ensures evaluation results are completely consistent with BFCL leaderboard, as shown in Figure 12.3:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-3.png" alt="" width="85%"/>
+  <p>Figure 12.3 HelloAgents Loading BFCL Evaluation Process</p>
+</div>
+
+When using `BFCLEvaluationTool`, official evaluation **runs automatically** (enabled by default):
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import BFCLEvaluationTool
+
+# Create agent
+llm = HelloAgentsLLM()
+agent = SimpleAgent(name="TestAgent", llm=llm)
+
+# Create evaluation tool
+bfcl_tool = BFCLEvaluationTool()
+
+# Run evaluation (automatically runs official evaluation)
+results = bfcl_tool.run(
+    agent=agent,
+    category="simple_python",
+    max_samples=5,
+    # run_official_eval=True  # Default is True, can be omitted
+    model_name="Qwen/Qwen3-8B"  # Optional, specify model name
+)
+```
+
+The tool executes the complete evaluation process automatically: it runs the HelloAgents evaluation to obtain predictions, exports the results in BFCL format to the `evaluation_results/bfcl_official/` directory, copies the result file to `result/{model_name}/` as the official evaluation tool requires, runs the BFCL official evaluation command to compute scores, and finally displays the official results and generates a Markdown evaluation report.
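
The sample log output maps the model name `Qwen/Qwen3-8B` to the directory `Qwen_Qwen3-8B` and names the result file `BFCL_v4_simple_python_result.json`. A helper that reproduces this path might look like the sketch below. The function name `official_result_path` is hypothetical, and replacing `/` with `_` is inferred from that log line rather than taken from the tool's source.

```python
from pathlib import Path


def official_result_path(model_name, category, root="./result"):
    """Build the location where the copied result file is expected."""
    safe_model = model_name.replace("/", "_")  # "Qwen/Qwen3-8B" -> "Qwen_Qwen3-8B"
    return Path(root) / safe_model / f"BFCL_v4_{category}_result.json"


print(official_result_path("Qwen/Qwen3-8B", "simple_python").as_posix())
```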
+
+**Official Evaluation Output Example:**
+
+```
+============================================================
+Step 3: Run BFCL Official Evaluation
+============================================================
+
+✅ Result file copied to:
+   ./result/Qwen_Qwen3-8B/BFCL_v4_simple_python_result.json
+
+🔄 Running command: bfcl evaluate --model Qwen/Qwen3-8B --test-category simple_python --partial-eval
+
+============================================================
+BFCL Official Evaluation Results
+============================================================
+
+📊 Evaluation results summary:
+Model,Overall Acc,simple_python
+Qwen/Qwen3-8B,100.00,100.00
+
+🎯 Final results:
+   Accuracy: 100.00%
+   Correct: 5/5
+```
+
+If you want to manually control the evaluation process, you can disable automatic official evaluation:
+
+```python
+# Disable official evaluation
+results = bfcl_tool.run(
+    agent=agent,
+    category="simple_python",
+    max_samples=5,
+    run_official_eval=False  # Disable official evaluation
+)
+
+# Then manually run official evaluation
+import subprocess
+subprocess.run([
+    "bfcl", "evaluate",
+    "--model", "Qwen/Qwen3-8B",
+    "--test-category", "simple_python",
+    "--partial-eval"
+])
+```
+
+You can also manually generate reports:
+
+```python
+# Run evaluation
+results = bfcl_tool.run(agent, category="simple_python", max_samples=5)
+
+# Manually generate report
+report = bfcl_tool.generate_report(
+    results,
+    output_file="./my_reports/custom_report.md"
+)
+
+# Print report content
+print(report)
+```
+
+
+
+### 12.2.5 Core Component Implementation Details
+
+In previous sections, we learned how to use BFCL evaluation tools. Now let's dive into how HelloAgents evaluation system's core components are implemented. Understanding these implementation details not only helps you better use the evaluation system, but also allows you to customize and extend according to your own needs.
+
+**(1) BFCLDataset: Dataset Loader**
+
+BFCLDataset is responsible for loading and managing BFCL dataset:
+
+````python
+from typing import Any, Dict, List, Optional
+
+
+class BFCLDataset:
+    """BFCL dataset loader"""
+
+    def __init__(self, category: str = "simple", local_data_path: Optional[str] = None):
+        self.category = category
+        self.local_data_path = local_data_path
+        self.data = []
+
+    def load(self) -> List[Dict[str, Any]]:
+        """Load dataset"""
+        # Load from local first
+        if self.local_data_path:
+            return self._load_from_local()
+        # Otherwise load from Hugging Face
+        return self._load_from_huggingface()
+````
+
+Because the BFCL dataset lives in the official repository, the recommended approach is to clone a local copy for evaluation. The loader falls back to Hugging Face only when no local data path is provided.
+
+**(2) BFCLEvaluator: Evaluation Executor**
+
+BFCLEvaluator is responsible for executing the evaluation process. Its core is the `evaluate()` method, which coordinates the entire evaluation process:
+
+````python
+class BFCLEvaluator:
+    """BFCL evaluator"""
+
+    def evaluate(self, agent: Any, max_samples: Optional[int] = None) -> Dict[str, Any]:
+        """Execute evaluation"""
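+        results = []
+
+        # Load the samples; a falsy max_samples (None or 0) means "evaluate all"
+        samples = self.dataset.load()
+        if max_samples:
+            samples = samples[:max_samples]
+
+        for item in samples:
+            # 1. Construct prompt
+            prompt = self._build_prompt(item)
+
+            # 2. Call agent
+            response = agent.run(prompt)
+
+            # 3. Extract function calls
+            predicted_calls = self._extract_function_calls(response)
+
+            # 4. Compare with ground truth
+            is_correct = self._compare_calls(predicted_calls, item["ground_truth"])
+
+            results.append({
+                "id": item["id"],
+                "prediction": predicted_calls,
+                "ground_truth": item["ground_truth"],
+                "is_correct": is_correct
+            })
+
+        # 5. Aggregate the counts that callers read from the result
+        correct = sum(1 for r in results if r["is_correct"])
+        return {
+            "results": results,
+            "total_samples": len(results),
+            "correct_samples": correct,
+            "overall_accuracy": correct / len(results) if results else 0.0
+        }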
+````
+
+The evaluator's design rests on three core points. The first is prompt construction: the questions and function definitions in the dataset must be converted into prompts the agent can understand. The second is function call extraction: function calls must be extracted from the agent's response in multiple formats (JSON, code blocks, etc.). The third is AST matching: comparing function calls through abstract syntax trees is more accurate than simple string matching.
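
To make the first point concrete, here is one way a prompt could be assembled from a sample's `question` and `function` fields. The wording of the prompt and the function name `build_prompt` are assumptions for illustration; the actual `_build_prompt` may differ.

```python
import json


def build_prompt(item):
    """Render one BFCL-style sample as a plain-text prompt (hypothetical format)."""
    functions = json.dumps(item["function"], indent=2, ensure_ascii=False)
    return (
        "You can call the following functions:\n"
        f"{functions}\n\n"
        f"User request: {item['question']}\n"
        'Respond with JSON: {"name": <function name>, "arguments": {...}}'
    )


sample = {
    "question": "What's the weather like in Beijing today?",
    "function": [{"name": "get_weather", "parameters": {"type": "object"}}],
}
print(build_prompt(sample))
```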
+
+Let's look at the implementation of function call extraction:
+
+```python
+def _extract_function_calls(self, response: str) -> List[Dict[str, Any]]:
+    """Extract function calls from response
+
+    Supports multiple formats:
+    1. JSON format: {"name": "func", "arguments": {...}}
+    2. Code block format: ```python\nfunc(arg1=val1)\n```
+    3. Plain text format: func(arg1=val1)
+    """
+    calls = []
+
+    # Try JSON parsing (a single object or a list of objects)
+    try:
+        json_match = re.search(r'\[.*\]|\{.*\}', response, re.DOTALL)
+        if json_match:
+            data = json.loads(json_match.group())
+            if isinstance(data, dict) and "name" in data:
+                calls.append(data)
+            elif isinstance(data, list):
+                calls.extend(c for c in data if isinstance(c, dict) and "name" in c)
+    except json.JSONDecodeError:
+        pass
+
+    # Try code block extraction
+    code_blocks = re.findall(r'```(?:python)?\n(.*?)\n```', response, re.DOTALL)
+    for code in code_blocks:
+        # Parse Python function calls
+        parsed_calls = self._parse_python_calls(code)
+        calls.extend(parsed_calls)
+
+    return calls
+```
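
The helper `_parse_python_calls` is referenced above but not shown. A possible standalone version is sketched below, under stated assumptions: it only collects top-level calls to plain function names, only keyword arguments are kept, and non-literal values are stored as source text via `ast.unparse` (Python 3.9+). The real helper may behave differently.

```python
import ast


def parse_python_calls(code):
    """Collect top-level function calls in `code` as name/arguments dicts."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return []  # not valid Python, so nothing to extract

    calls = []
    for stmt in tree.body:
        node = stmt.value if isinstance(stmt, ast.Expr) else None
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            continue
        arguments = {}
        for kw in node.keywords:
            try:
                arguments[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError):
                # Keep non-literal expressions as their source text.
                arguments[kw.arg] = ast.unparse(kw.value)
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


print(parse_python_calls('get_weather(city="Beijing", unit="celsius")'))
```

Returning dictionaries with `name` and `arguments` keys matches the shape used by the JSON branch, so both extraction paths feed the same comparison step.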
+
+**(3) BFCLMetrics: Metrics Calculator**
+
+BFCLMetrics is responsible for calculating various evaluation metrics:
+
+````python
+class BFCLMetrics:
+    """BFCL metrics calculator"""
+
+    def compute_metrics(self, results: List[Dict[str, Any]]) -> Dict[str, Any]:
+        """Compute all metrics"""
+        return {
+            "accuracy": self._compute_accuracy(results),
+            "ast_match_rate": self._compute_ast_match_rate(results),
+            "parameter_accuracy": self._compute_parameter_accuracy(results),
+            "f1_score": self._compute_f1_score(results),
+            "category_statistics": self._compute_category_stats(results)
+        }
+````
+
+**AST Matching Implementation**:
+
+AST matching is the core technology of BFCL evaluation. It is more intelligent than simple string matching and can identify semantically equivalent function calls:
+
+```python
+def _ast_match(self, pred_call: Dict, true_call: Dict) -> bool:
+    """Match function calls using AST
+
+    Advantages of AST matching:
+    1. Ignore parameter order: func(a=1, b=2) equivalent to func(b=2, a=1)
+    2. Recognize equivalent expressions: 2+3 equivalent to 5
+    3. Ignore whitespace and format differences
+    """
+    # 1. Function name must match exactly
+    if pred_call.get("name") != true_call.get("name"):
+        return False
+
+    # 2. Convert parameters to AST nodes
+    pred_args = self._args_to_ast(pred_call.get("arguments", {}))
+    true_args = self._args_to_ast(true_call.get("arguments", {}))
+
+    # 3. Compare AST nodes
+    return ast.dump(pred_args) == ast.dump(true_args)
+
+def _args_to_ast(self, args: Dict[str, Any]) -> ast.AST:
+    """Convert parameter dictionary to AST node"""
+    # Construct a virtual function call; sort keys so parameter order does not affect the comparison
+    code = f"func({', '.join(f'{k}={repr(v)}' for k, v in sorted(args.items()))})"
+    tree = ast.parse(code)
+    return tree.body[0].value  # Return Call node
+```
+
+**(4) Tool Encapsulation: BFCLEvaluationTool**
+
+Finally, we encapsulate these components into a Tool so it can be directly called by agents:
+
+````python
+class BFCLEvaluationTool(Tool):
+    """BFCL evaluation tool"""
+
+    def __init__(self, local_data_path: Optional[str] = None):
+        super().__init__(
+            name="bfcl_evaluation",
+            description="Evaluate agent's tool calling capability"
+        )
+        self.local_data_path = local_data_path
+        self.dataset = None
+        self.evaluator = None
+        self.metrics_calculator = BFCLMetrics()
+
+    def run(self, parameters: Dict[str, Any]) -> str:
+        """Execute evaluation"""
+        # 1. Load dataset
+        self.dataset = BFCLDataset(...)
+
+        # 2. Create evaluator
+        self.evaluator = BFCLEvaluator(...)
+
+        # 3. Run evaluation
+        results = self.evaluator.evaluate(...)
+
+        # 4. Calculate metrics
+        metrics = self.metrics_calculator.compute_metrics(...)
+
+        # 5. Return JSON results
+        return json.dumps(results, ensure_ascii=False)
+````
+
+The tool's design follows three principles. First, it inherits from the Tool base class to follow HelloAgents' tool specification, which ensures seamless integration with the framework. Second, it validates parameters strictly, checking required parameters and reporting friendly error messages. Third, it formats results as a JSON string for easy parsing and display. This modular design yields an evaluation system that is both easy to use and flexible: users can call the high-level Tool interface to complete an evaluation quickly, or work with the low-level components to customize it for special needs.
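
The parameter validation principle can be illustrated with a small standalone check. The rules below (a required `category` and a non-negative integer `max_samples` defaulting to 0) are assumptions based on the usage shown earlier in this chapter, and `validate_parameters` is a hypothetical helper rather than part of HelloAgents.

```python
def validate_parameters(parameters):
    """Check tool parameters and return a normalized copy (hypothetical rules)."""
    if "category" not in parameters:
        raise ValueError("Missing required parameter 'category', e.g. 'simple_python'")

    max_samples = parameters.get("max_samples", 0)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_samples, bool) or not isinstance(max_samples, int) or max_samples < 0:
        raise ValueError("'max_samples' must be a non-negative integer (0 means all)")

    return {"category": parameters["category"], "max_samples": max_samples}


print(validate_parameters({"category": "simple_python", "max_samples": 5}))
```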
+
+### 12.2.6 Extension and Optimization Recommendations
+
+The previous sections showed how to use HelloAgents for BFCL evaluation. Note that the current implementation is a simple reproduction based on SimpleAgent that covers basic BFCL evaluation functionality. The BFCL benchmark contains multiple difficulty levels and scenarios, so achieving higher leaderboard scores requires further optimization and extension.
+
+**(1) Limitations of Current Implementation**
+
+Our current SimpleAgent implementation focuses on building the evaluation process, and its tool calling capability has room for improvement. SimpleAgent uses a custom tool calling format, `[TOOL_CALL:tool_name:parameters]`, which the LLM must learn and apply on its own. In complex scenarios, its performance may not match agents that use native function calling. Additionally, we have only tested basic categories such as simple_python; more complex scenarios such as multiple, parallel, and irrelevance still need targeted optimization.
+
+**(2) Directions for Improving BFCL Scores**
+
+To further improve BFCL scores, you can work in three directions. The first is optimizing the agent's tool calling capability: consider using LLMs that support native function calling (such as GPT-4 or Claude), or improve the prompts so the LLM better understands the tool calling format. The second is expanding the tool library: BFCL tests involve many types of functions, so you can pre-implement common tool types based on the test dataset's characteristics to improve the agent's tool coverage. The third is designing different strategies for different difficulty levels: in multiple scenarios the agent needs to plan multi-step tool calling sequences, in parallel scenarios it needs to identify tool calls that can run in parallel, and in irrelevance scenarios it needs to judge whether a tool call is needed at all.
+
+**(3) Practice Recommendations**
+
+For developers who want better BFCL results, we recommend the following strategy. Start with the simple category and make sure basic single function calls work reliably, since this is the foundation for all later optimization. Then test more complex categories such as multiple and parallel step by step, analyze the failure cases, and identify the agent's weak points. While optimizing, study the high-scoring models on the BFCL leaderboard to learn from their design ideas and techniques. Finally, validate with the official evaluation tool so that the optimized results remain consistent with leaderboard standards.
+
+Here are some suggestions for further processing during evaluation:
+
+**1. Progressive Evaluation**
+
+Start from small samples, gradually increase sample count:
+
+```python
+# Step 1: Quick test (5 samples)
+results_quick = bfcl_tool.run(agent, category="simple_python", max_samples=5)
+
+# Step 2: Medium-scale test (50 samples), only if the quick test passes
+if results_quick['overall_accuracy'] > 0.8:
+    results_medium = bfcl_tool.run(agent, category="simple_python", max_samples=50)
+
+    # Step 3: Full evaluation (all samples), only if the medium-scale test passes
+    if results_medium['overall_accuracy'] > 0.8:
+        results_full = bfcl_tool.run(agent, category="simple_python", max_samples=0)
+```
+
+**2. Multi-Category Evaluation**
+
+Evaluate tasks of different difficulties:
+
+```python
+categories = ["simple_python", "multiple", "parallel", "irrelevance"]
+
+for category in categories:
+    print(f"\nEvaluating category: {category}")
+    results = bfcl_tool.run(agent, category=category, max_samples=10)
+    print(f"Accuracy: {results['overall_accuracy']:.2%}")
+```
+
+**3. Comparative Evaluation**
+
+Compare agents with different configurations:
+
+```python
+# Configuration 1: Default prompt
+agent1 = SimpleAgent(name="Agent-Default", llm=llm)
+results1 = bfcl_tool.run(agent1, category="simple_python", max_samples=10)
+
+# Configuration 2: Optimized prompt
+agent2 = SimpleAgent(name="Agent-Optimized", llm=llm)
+# ... Set optimized system prompt ...
+results2 = bfcl_tool.run(agent2, category="simple_python", max_samples=10)
+
+# Compare results
+print(f"Default configuration accuracy: {results1['overall_accuracy']:.2%}")
+print(f"Optimized configuration accuracy: {results2['overall_accuracy']:.2%}")
+```
+
+If your evaluation results are good, consider submitting to BFCL official leaderboard!
+
+**Step 1: Prepare Submission Materials**
+
+1. Model description document
+2. Evaluation result files (all categories)
+3. Model access method (API or open-source link)
+
+**Step 2: Submit to GitHub**
+
+Visit BFCL official repository and submit Pull Request according to instructions:
+
+- Repository: https://github.com/ShishirPatil/gorilla
+- Submission guide: Refer to `CONTRIBUTING.md`
+
+**Step 3: Wait for Review**
+
+BFCL team will review your submission and verify result accuracy. After approval, your model will appear on the official leaderboard!
+
+
+
+## 12.3 GAIA: General AI Assistant Capability Evaluation
+
+### 12.3.1 GAIA Benchmark Introduction
+
+GAIA (General AI Assistants) is an evaluation benchmark jointly launched by Meta AI and Hugging Face, focusing on evaluating AI assistants' **general capabilities**<sup>[2]</sup>. Unlike BFCL's focus on tool calling, GAIA evaluates agents' comprehensive performance in real-world tasks.
+
+GAIA's design philosophy is: **Real-world problems often require comprehensive application of multiple capabilities**. An excellent AI assistant needs more than tool calling; it also needs:
+
+- **Multi-step Reasoning**: Decompose complex problems into multiple sub-problems
+- **Knowledge Application**: Utilize built-in knowledge and external knowledge bases
+- **Multimodal Understanding**: Process multiple inputs like text, images, files
+- **Web Browsing**: Obtain latest information from the internet
+- **File Operations**: Read and process files in various formats
+
+**(1) GAIA Dataset Structure**
+
+After understanding GAIA's evaluation philosophy, let's dive into the specific structure of GAIA dataset. GAIA contains 466 carefully designed real-world problems. These problems are divided into three difficulty levels based on complexity and required reasoning steps, from simple zero-step reasoning tasks to difficult tasks requiring multi-step complex reasoning, comprehensively covering various scenarios agents may encounter in practical applications, as shown in Table 12.3:
+
+<div align="center">
+  <p>Table 12.3 GAIA Dataset Difficulty Level Distribution</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-3.png" alt="" width="85%"/>
+</div>
+
+A sample from the GAIA dataset looks like this:
+
+```json
+{
+  "task_id": "gaia_001",
+  "Question": "What is the total population of the top 3 most populous cities in California?",
+  "Level": 2,
+  "Final answer": "12847521",
+  "file_name": "",
+  "file_path": "",
+  "Annotator Metadata": {
+    "Steps": [
+      "Search for most populous cities in California",
+      "Get population data for top 3 cities",
+      "Sum the populations"
+    ],
+    "Number of steps": 3,
+    "How long did this take?": "5 minutes",
+    "Tools": ["web_search", "calculator"]
+  }
+}
+```
+
+**Key Field Descriptions:**
+- `Question`: Question description
+- `Level`: Difficulty level (1-3)
+- `Final answer`: Standard answer (may be number, text, or file)
+- `file_name/file_path`: Attachment file (if any)
+- `Annotator Metadata`: Metadata provided by annotator (reasoning steps, required tools, etc.)
+
+**(2) Quasi Exact Match Introduction**
+
+GAIA scores answers with the **Quasi Exact Match** algorithm, its officially defined evaluation standard. The core idea is to **normalize both answers first, then compare them for exact equality**.
+
+Given a predicted answer $A_{\text{pred}}$ and a standard answer $A_{\text{true}}$, the quasi exact match function is defined as:
+
+$$
+\text{Quasi\_Exact\_Match}(A_{\text{pred}}, A_{\text{true}}) = \begin{cases}
+1 & \text{if } \mathcal{N}(A_{\text{pred}}) = \mathcal{N}(A_{\text{true}}) \\
+0 & \text{otherwise}
+\end{cases}
+$$
+
+Where $\mathcal{N}(\cdot)$ is the normalization function, which applies different rules depending on the answer type:
+
+- **Numeric answers**: remove comma separators (`1,000` → `1000`) and unit symbols (`$100` → `100`, `50%` → `50`). For example, `"$1,234.56"` normalizes to `"1234.56"`.
+- **String answers**: convert to lowercase (`"Apple"` → `"apple"`), remove articles (`"the apple"` → `"apple"`), collapse extra spaces (`"hello  world"` → `"hello world"`), and remove trailing punctuation (`"hello."` → `"hello"`). For example, `"The United States"` normalizes to `"united states"`.
+- **List answers**: split the elements on commas, apply string normalization to each element, sort alphabetically, then rejoin with commas. For example, `"Paris, London, Berlin"` normalizes to `"berlin,london,paris"`.
+
+**Normalization Examples:**
+
+```text
+# Numeric answer
+Original answer: "$1,234.56"
+Normalized: "1234.56"
+
+# String answer
+Original answer: "The United States of America"
+Normalized: "united states of america"
+
+# List answer
+Original answer: "Paris, London, Berlin"
+Normalized: "berlin,london,paris"
+```
+
+**(3) GAIA Evaluation Metrics**
+
+GAIA uses the following metrics to evaluate agent performance:
+
+**1. Exact Match Rate**
+
+Exact match rate is GAIA's core metric, defined as the proportion of samples with successful quasi exact matching:
+
+$$
+\text{Exact Match Rate} = \frac{1}{N} \sum_{i=1}^{N} \text{Quasi\_Exact\_Match}(A_{\text{pred},i}, A_{\text{true},i})
+$$
+
+Where:
+- $N$ is the total number of samples
+- $A_{\text{pred},i}$ is the predicted answer of the $i$-th sample
+- $A_{\text{true},i}$ is the standard answer of the $i$-th sample
+- $\text{Quasi\_Exact\_Match}(\cdot, \cdot) \in \{0, 1\}$ is the quasi exact match function
+
+**2. Level-wise Accuracy**
+
+For each difficulty level $\ell \in \{1, 2, 3\}$, calculate the accuracy for that level:
+
+$$
+\text{Accuracy}_\ell = \frac{1}{|D_\ell|} \sum_{i \in D_\ell} \text{Quasi\_Exact\_Match}(A_{\text{pred},i}, A_{\text{true},i})
+$$
+
+Where $D_\ell$ is the sample set of difficulty level $\ell$, $|D_\ell|$ is the number of samples at that level.
+
+**3. Difficulty Progression Drop Rate**
+
+Measures how much the agent's accuracy degrades, relative to the easier level, as difficulty increases:
+
+$$
+\text{Drop Rate}_{\ell \to \ell+1} = \frac{\text{Accuracy}_\ell - \text{Accuracy}_{\ell+1}}{\text{Accuracy}_\ell}
+$$
+
+- $\text{Drop Rate}_{1 \to 2}$: Drop rate from Level 1 to Level 2
+- $\text{Drop Rate}_{2 \to 3}$: Drop rate from Level 2 to Level 3
+
+**4. Average Reasoning Steps**
+
+Measures the average number of steps the agent takes on the tasks it answers correctly:
+
+$$
+\text{Avg Steps} = \frac{1}{N_{\text{correct}}} \sum_{i \in \text{Correct}} \text{steps}_i
+$$
+
+Where $N_{\text{correct}}$ is the number of correctly answered samples, $\text{steps}_i$ is the number of reasoning steps for the $i$-th sample.
+
+**Metric Interpretation:**
+
+- **Exact Match Rate = 1.0**: All samples are completely correct
+- **Exact Match Rate = 0.5**: 50% of samples correct, 50% of samples incorrect
+- **Drop Rate = 0.3**: Accuracy at the harder level is 30% lower, in relative terms, than at the easier level
+- **Drop Rate = 0.0**: Difficulty increase doesn't affect accuracy (ideal case)
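+
+The four metrics above can be computed directly from per-sample records. The sketch below is a minimal illustration under stated assumptions, not the HelloAgents implementation: the function name `gaia_metrics` and the record fields `level`, `correct`, and `steps` are hypothetical names chosen for this example.
+
```python
from collections import defaultdict
from typing import Any, Dict, List


def gaia_metrics(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute GAIA-style metrics from per-sample records.

    Each record is assumed to look like
    {"level": 1, "correct": True, "steps": 3} (hypothetical field names).
    """
    total = len(records)
    correct = [r for r in records if r["correct"]]

    # Exact match rate: share of samples that passed quasi exact match
    exact_match_rate = len(correct) / total if total else 0.0

    # Level-wise accuracy: correct / total within each difficulty level
    counts = defaultdict(lambda: [0, 0])  # level -> [correct, total]
    for r in records:
        counts[r["level"]][0] += int(r["correct"])
        counts[r["level"]][1] += 1
    level_accuracy = {lvl: c / n for lvl, (c, n) in sorted(counts.items())}

    # Drop rate between adjacent levels; skipped when the easier level
    # is missing or has zero accuracy, because the ratio is undefined there
    drop_rate = {}
    for lvl in (1, 2):
        easier, harder = level_accuracy.get(lvl), level_accuracy.get(lvl + 1)
        if easier and harder is not None:
            drop_rate[f"{lvl}->{lvl + 1}"] = (easier - harder) / easier

    # Average reasoning steps, over correctly answered samples only
    avg_steps = sum(r["steps"] for r in correct) / len(correct) if correct else 0.0

    return {
        "exact_match_rate": exact_match_rate,
        "level_accuracy": level_accuracy,
        "drop_rate": drop_rate,
        "avg_steps": avg_steps,
    }
```
+
+Skipping the drop rate when the easier level scores zero avoids a division by zero; how to report that case is a design choice the formula itself leaves open.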
+
+**Evaluation Example:**
+
+Suppose we evaluated 10 samples and obtained the results shown in Table 12.4:
+
+<div align="center">
+  <p>Table 12.4 GAIA Evaluation Example Results</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-4.png" alt="" width="85%"/>
+</div>
+
+To calculate metrics for this case, refer to the Python script below:
+
+```python
+# 1. Exact match rate
+total_samples = 10
+correct_samples = 7  # Samples 1,2,3,5,6,8,9
+exact_match_rate = correct_samples / total_samples  # 0.70
+
+# 2. Level-wise accuracy
+level_1_correct, level_1_total = 3, 3  # Samples 1,2,3
+level_1_accuracy = level_1_correct / level_1_total  # 1.00
+
+level_2_correct, level_2_total = 2, 3  # Samples 5,6
+level_2_accuracy = level_2_correct / level_2_total  # 0.67
+
+level_3_correct, level_3_total = 2, 4  # Samples 8,9
+level_3_accuracy = level_3_correct / level_3_total  # 0.50
+
+# 3. Difficulty progression drop rate (relative drop between adjacent levels)
+drop_rate_1_to_2 = (level_1_accuracy - level_2_accuracy) / level_1_accuracy  # 0.33
+drop_rate_2_to_3 = (level_2_accuracy - level_3_accuracy) / level_2_accuracy  # 0.25
+
+print(f"Exact match rate: {exact_match_rate:.2%}")  # 70.00%
+print(f"Level 1 accuracy: {level_1_accuracy:.2%}")  # 100.00%
+print(f"Level 2 accuracy: {level_2_accuracy:.2%}")  # 66.67%
+print(f"Level 3 accuracy: {level_3_accuracy:.2%}")  # 50.00%
+print(f"Level 1→2 drop rate: {drop_rate_1_to_2:.2%}")  # 33.33%
+print(f"Level 2→3 drop rate: {drop_rate_2_to_3:.2%}")  # 25.00%
+```
+
+**Result Analysis:**
+
+- **Overall Performance**: 70% exact match rate, good performance
+- **Difficulty Sensitivity**: 33% drop from Level 1 to Level 2, indicating significant degradation in medium difficulty tasks
+- **Capability Boundary**: Level 3 accuracy is 50%, indicating room for improvement in complex tasks
+
+The larger the drop rate, the more obvious the agent's capability degradation when handling complex tasks.
+
+**(4) GAIA Official System Prompt**
+
+GAIA requires a specific system prompt so that model output conforms to the evaluation format:
+
+```python
+GAIA_SYSTEM_PROMPT = """You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER].
+
+YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
+
+If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.
+
+If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
+
+If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string."""
+```
+
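+GAIA has strict requirements for the answer format: answers must be given as `FINAL ANSWER: [answer]`; numeric answers must not use comma separators or unit symbols; string answers must not use articles or abbreviations; and list answers must be comma-separated, with the number or string rule applied to each element. The prompt itself does not prescribe an element order for lists; the alphabetical sorting described earlier is applied during normalization.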
+
+### 12.3.2 Obtaining GAIA Dataset
+
+**Important Note**: GAIA is a **gated dataset**; you must request access on HuggingFace before you can download it.
+
+**Step 1: Apply for Access Permission**
+
+1. Visit https://huggingface.co/datasets/gaia-benchmark/GAIA
+2. Click "Request access" button
+3. Fill out application form (usually approved within seconds)
+4. Get your HuggingFace Token: https://huggingface.co/settings/tokens
+
+**Step 2: Configure Environment Variables**
+
+Add your HuggingFace token to the `.env` file:
+
+```bash
+# HuggingFace API configuration
+HF_TOKEN=hf_your_token_here
+```
+
+**Method 1: Automatic Download Using HelloAgents (Recommended)**
+
+HelloAgents automatically handles GAIA dataset download and caching:
+
+```python
+from hello_agents.evaluation import GAIADataset
+import os
+
+# Ensure HF_TOKEN is set, this line is not needed if .env is configured
+os.environ["HF_TOKEN"] = "hf_your_token_here"
+
+# Automatically download to ./data/gaia/
+dataset = GAIADataset(
+    dataset_name="gaia-benchmark/GAIA",
+    split="validation",  # or "test"
+    level=1  # Optional: 1, 2, 3, None(all)
+)
+items = dataset.load()
+
+print(f"Loaded {len(items)} test samples")
+# Output: Loaded 53 test samples (Level 1)
+```
+
+**Working Principle**:
+
+- On the first run, `snapshot_download` downloads the entire dataset to `./data/gaia/`
+- The dataset contains 114 files (questions, images, PDFs, etc.)
+- Later runs load directly from the local copy, which is much faster
+
+**Dataset Directory Structure**:
+```
+./data/gaia/
+├── 2023/
+│   ├── validation/
+│   │   ├── metadata.jsonl  (165 questions)
+│   │   ├── *.png, *.pdf, *.csv, *.xlsx  (attachment files)
+│   └── test/
+│       ├── metadata.jsonl  (301 questions)
+│       └── ... (attachment files)
+├── GAIA.py
+└── README.md
+```
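+
+Once the files are on disk, `metadata.jsonl` can be inspected with the standard library alone. The sketch below counts questions per difficulty level; it assumes each line is a JSON object with a `Level` field, as in the sample shown earlier, and the helper name `level_distribution` is hypothetical.
+
```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def level_distribution(metadata_path: str) -> Dict[int, int]:
    """Count questions per difficulty level in a GAIA-style metadata.jsonl file."""
    counts: Counter = Counter()
    with Path(metadata_path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # int() covers the case where the level is stored as a string
            counts[int(record["Level"])] += 1
    return dict(sorted(counts.items()))
```
+
+For example, `level_distribution("./data/gaia/2023/validation/metadata.jsonl")` returns a dictionary mapping each level to its question count.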
+
+**Method 2: Manual Download**
+
+If you want to manually download the dataset:
+
+```python
+from huggingface_hub import snapshot_download
+import os
+
+# Set Token
+os.environ["HF_TOKEN"] = "hf_your_token_here"
+
+# Download dataset
+snapshot_download(
+    repo_id="gaia-benchmark/GAIA",
+    repo_type="dataset",
+    local_dir="./data/gaia",
+    token=os.getenv("HF_TOKEN")
+)
+```
+
+**View Dataset Statistics**:
+
+```python
+# View dataset statistics
+stats = dataset.get_statistics()
+print(f"Total samples: {stats['total_samples']}")
+print(f"Level distribution: {stats['level_distribution']}")
+# Output:
+# Total samples: 165
+# Level distribution: {1: 53, 2: 62, 3: 50}
+```
+
+
+### 12.3.3 Implementing GAIA Evaluation in HelloAgents
+
+As with BFCL, we provide two evaluation methods; **Method 1** is recommended.
+
+**Method 1: One-Click Evaluation Using GAIAEvaluationTool**
+
+This is the simplest method, automatically completing dataset download, evaluation execution, result export, and report generation:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import GAIAEvaluationTool
+
+# GAIA official system prompt (from paper)
+GAIA_SYSTEM_PROMPT = """You are a general AI assistant. I will ask you a question. Report your thoughts, and finish your answer with the following template: FINAL ANSWER: [YOUR FINAL ANSWER].
+
+YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
+
+If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.
+
+If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
+
+If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string."""
+
+# 1. Create agent (using GAIA official system prompt)
+llm = HelloAgentsLLM()
+agent = SimpleAgent(
+    name="TestAgent",
+    llm=llm,
+    system_prompt=GAIA_SYSTEM_PROMPT  # Key: Use GAIA official prompt
+)
+
+# 2. Create GAIA evaluation tool
+gaia_tool = GAIAEvaluationTool()
+
+# 3. One-click run evaluation
+results = gaia_tool.run(
+    agent=agent,
+    level=1,  # Level 1: Simple tasks
+    max_samples=5,  # Evaluate 5 samples
+    export_results=True,  # Export GAIA format results
+    generate_report=True  # Generate evaluation report
+)
+
+# 4. View results
+print(f"Exact match rate: {results['exact_match_rate']:.2%}")
+print(f"Partial match rate: {results['partial_match_rate']:.2%}")
+print(f"Correct: {results['exact_matches']}/{results['total_samples']}")
+```
+
+**Run Results:**
+
+```
+============================================================
+GAIA One-Click Evaluation
+============================================================
+
+Configuration:
+   Agent: TestAgent
+   Difficulty level: 1
+   Sample count: 5
+
+============================================================
+Step 1: Run HelloAgents Evaluation
+============================================================
+   Downloading from HuggingFace: gaia-benchmark/GAIA
+   📥 Downloading GAIA dataset...
+   ✓ Dataset download complete
+   ✓ Loaded 165 samples
+✅ GAIA dataset loaded
+   Data source: gaia-benchmark/GAIA
+   Split: validation
+   Level: 1
+   Sample count: 53
+
+🌟 Starting GAIA evaluation...
+   Sample count: 5
+   Progress: 5/5
+✅ GAIA evaluation complete
+   Exact match rate: 80.00%
+   Partial match rate: 80.00%
+
+============================================================
+Step 2: Export GAIA Format Results
+============================================================
+✅ GAIA format results exported
+   Output file: evaluation_results\gaia_official\gaia_level1_result_20251011_012648.jsonl
+   Sample count: 5
+   Includes reasoning trace: True
+📄 Submission guide generated: evaluation_results\gaia_official\SUBMISSION_GUIDE_20251011_012648.md
+
+============================================================
+Step 3: Generate Evaluation Report
+============================================================
+📄 Report generated: evaluation_reports\gaia_report_20251011_012648.md
+
+============================================================
+🎯 Final Results
+============================================================
+   Exact match rate: 80.00%
+   Partial match rate: 80.00%
+   Correct: 4/5
+```
+
+After the evaluation completes, three types of files are generated automatically:
+
+- **GAIA format result file** (`evaluation_results/gaia_official/gaia_level1_result_*.jsonl`): uses JSONL format (one JSON object per line) and can be submitted directly to the GAIA leaderboard.
+- **Submission guide** (`evaluation_results/gaia_official/SUBMISSION_GUIDE_*.md`): contains detailed submission steps, a description of the result file format, and notes.
+- **Evaluation report** (`evaluation_reports/gaia_report_*.md`): contains the result summary, detailed metrics, sample details, and visualization charts.
+
+**Note**: Unsatisfactory results (for example, low accuracy) are normal at this stage. Although Level 1 contains the simplest tasks, agents still need tool-calling capabilities (such as a search engine or calculator) to answer them correctly. The SimpleAgent used here mainly demonstrates the evaluation process, and its tool-calling capabilities leave room for improvement.
+
+**Method 2: Using Dataset + Evaluator (Flexible Customization)**
+
+If you need more fine-grained control, you can directly use low-level components:
+
+```python
+from hello_agents.evaluation import GAIADataset, GAIAEvaluator
+
+# 1. Load dataset
+dataset = GAIADataset(level=1)
+items = dataset.load()
+print(f"Loaded {len(items)} samples")
+
+# 2. Create evaluator
+evaluator = GAIAEvaluator(dataset=dataset, level=1)
+
+# 3. Run evaluation (`agent` is the SimpleAgent created in Method 1)
+results = evaluator.evaluate(agent, max_samples=5)
+
+# 4. Export GAIA format results
+evaluator.export_to_gaia_format(
+    results,
+    "gaia_results.jsonl",
+    include_reasoning=True
+)
+```
+
+The generated evaluation report (`gaia_report_*.md`) looks like the following example:
+
+```markdown
+# GAIA Evaluation Report
+
+**Generated**: 2025-10-11 01:26:48
+
+## 📊 Evaluation Overview
+
+- **Agent**: TestAgent
+- **Difficulty Level**: 1
+- **Total Samples**: 2
+- **Exact Matches**: 1
+- **Partial Matches**: 1
+- **Exact Match Rate**: 50.00%
+- **Partial Match Rate**: 50.00%
+
+## 📈 Detailed Metrics
+
+### Level-wise Accuracy
+
+- **Level 1**: 50.00% exact / 50.00% partial (1/2)
+
+## 📝 Sample Details (First 10)
+
+| Task ID | Level | Predicted Answer | Correct Answer | Exact Match | Partial Match |
+|---------|-------|------------------|----------------|-------------|---------------|
+| e1fc63a2-da7a-432f-be78-7c4a95598703 | 1 | 24000 | 17 | ❌ | ❌ |
+| 8e867cd7-cff9-4e6c-867a-ff5ddc2550be | 1 | 3 | 3 | ✅ | ✅ |
+
+## 📊 Accuracy Visualization
+
+Exact match: █████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░ 50.00%
+Partial match: █████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░ 50.00%
+
+
+## 💡 Recommendations
+
+- ⚠️ Average performance, needs improvement.
+- 💡 Suggest checking tool usage and multi-step reasoning capabilities.
+```
+
+**Generated GAIA Format Results (`gaia_level1_result_*.jsonl`):**
+
+```json
+{"task_id": "e1fc63a2-da7a-432f-be78-7c4a95598703", "model_answer": "24000", "reasoning_trace": "24000"}
+{"task_id": "8e867cd7-cff9-4e6c-867a-ff5ddc2550be", "model_answer": "3", "reasoning_trace": "3"}
+```
+
+### 12.3.4 Submitting Results to GAIA Official Leaderboard
+
+After you run the evaluation with GAIAEvaluationTool, the files required for submission and detailed submission instructions are generated in the `evaluation_results/gaia_official/` directory:
+
+1. **GAIA Format Result File**: `gaia_level1_result_*.jsonl`
+   ```json
+   {"task_id": "xxx", "model_answer": "answer", "reasoning_trace": "reasoning process"}
+   {"task_id": "yyy", "model_answer": "answer", "reasoning_trace": "reasoning process"}
+   ```
+
+2. **Submission Guide File**: `SUBMISSION_GUIDE_*.md`
+
+The automatically generated `SUBMISSION_GUIDE_*.md` file contains the complete submission guide. In short, open a browser and visit:
+```
+https://huggingface.co/spaces/gaia-benchmark/leaderboard
+```
+
+Fill in the submission form as shown in Figure 12.4:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-4.png" alt="" width="85%"/>
+  <p>Figure 12.4 GAIA Leaderboard Submission Form</p>
+</div>
+
+Before submission, you can manually check the generated JSONL file:
+
+```python
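+import glob
+import json
+
+# open() does not expand wildcards, so resolve the pattern first
+result_files = sorted(glob.glob("evaluation_results/gaia_official/gaia_level1_result_*.jsonl"))
+if not result_files:
+    raise FileNotFoundError("No GAIA result file found; run the evaluation first")
+
+# Read the most recent result file (timestamped names sort chronologically)
+with open(result_files[-1], "r", encoding="utf-8") as f:
+    for line in f:
+        result = json.loads(line)
+        print(f"Task ID: {result['task_id']}")
+        print(f"Answer: {result['model_answer']}")
+        print(f"Reasoning: {result['reasoning_trace']}")
+        print("-" * 50)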
+```
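+
+Beyond reading the file by eye, a small structural check can catch malformed lines before upload. The sketch below is an assumption-laden illustration: it requires only the `task_id` and `model_answer` keys shown in the format above, treats `reasoning_trace` as optional, and uses the hypothetical helper name `find_submission_errors`. The leaderboard may apply further rules not covered here.
+
```python
import json
from typing import List

# Keys taken from the result format shown above; treating only these two
# as mandatory is an assumption of this sketch
REQUIRED_KEYS = ("task_id", "model_answer")


def find_submission_errors(lines: List[str]) -> List[str]:
    """Return human-readable problems found in GAIA-format JSONL lines."""
    errors: List[str] = []
    seen_ids = set()
    for n, line in enumerate(lines, 1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(entry, dict):
            errors.append(f"line {n}: expected a JSON object")
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(entry.get(key), str):
                errors.append(f"line {n}: missing or non-string '{key}'")
        task_id = entry.get("task_id")
        if isinstance(task_id, str):
            if task_id in seen_ids:
                errors.append(f"line {n}: duplicate task_id '{task_id}'")
            seen_ids.add(task_id)
    return errors
```
+
+An empty return value means every line parsed and carried the expected keys.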
+
+### 12.3.5 Core Component Implementation Details
+
+The implementation of the GAIA evaluation system is similar to that of BFCL, with some special design choices for general capability evaluation.
+
+**(1) GAIADataset: Multimodal Data Loader**
+
+What sets the GAIA dataset apart is that it contains multimodal data (text, files, images, etc.):
+
+````python
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+
+class GAIADataset:
+    """GAIA dataset loader
+
+    Supports loading GAIA dataset from HuggingFace (gated dataset)
+    """
+
+    def __init__(
+        self,
+        level: Optional[int] = None,
+        split: str = "validation",
+        local_data_dir: Optional[str] = None
+    ):
+        self.level = level
+        self.split = split
+        self.local_data_dir = local_data_dir or "./data/gaia"
+        self.data = []
+
+    def load(self) -> List[Dict[str, Any]]:
+        """Load dataset"""
+        # Download from HuggingFace
+        items = self._load_from_huggingface()
+
+        # Filter by level
+        if self.level:
+            items = [item for item in items if item.get("level") == self.level]
+
+        self.data = items
+        return items
+
+    def _load_from_huggingface(self) -> List[Dict[str, Any]]:
+        """Download GAIA dataset from HuggingFace"""
+        from huggingface_hub import snapshot_download
+        import json
+
+        # Download dataset
+        repo_id = "gaia-benchmark/GAIA"
+        local_dir = snapshot_download(
+            repo_id=repo_id,
+            repo_type="dataset",
+            local_dir=self.local_data_dir,
+            local_dir_use_symlinks=False
+        )
+
+        # Load JSONL file
+        data_file = Path(local_dir) / "2023" / self.split / "metadata.jsonl"
+        items = []
+        with open(data_file, 'r', encoding='utf-8') as f:
+            for line in f:
+                item = json.loads(line)
+                items.append(self._standardize_item(item))
+
+        return items
+````
+
+**(2) GAIAEvaluator: Implementing GAIA Official Evaluation Algorithm**
+
+GAIA evaluation uses **Quasi Exact Match** algorithm, requiring special answer normalization and matching logic:
+
+````python
+class GAIAEvaluator:
+    """GAIA evaluator
+
+    Implements GAIA official Quasi Exact Match evaluation algorithm
+    """
+
+    def evaluate(self, agent: Any, max_samples: Optional[int] = None) -> Dict[str, Any]:
+        """Execute evaluation"""
+        dataset_items = self.dataset.load()
+
+        if max_samples:
+            dataset_items = dataset_items[:max_samples]
+
+        results = []
+        for i, item in enumerate(dataset_items, 1):
+            # 1. Construct prompt
+            prompt = self._build_prompt(item["question"], item)
+
+            # 2. Call agent
+            response = agent.run(prompt)
+
+            # 3. Extract answer (GAIA format: FINAL ANSWER: [answer])
+            predicted_answer = self._extract_answer(response)
+
+            # 4. Normalize answer (GAIA official rules)
+            normalized_pred = self._normalize_answer(predicted_answer)
+            normalized_truth = self._normalize_answer(item["final_answer"])
+
+            # 5. Quasi exact match
+            exact_match = (normalized_pred == normalized_truth)
+
+            results.append({
+                "task_id": item["task_id"],
+                "predicted": predicted_answer,
+                "expected": item["final_answer"],
+                "exact_match": exact_match,
+                "level": item.get("level", 0)
+            })
+
+        return self._format_results(results)
+````
+
+GAIA uses specific normalization rules to handle different types of answers:
+
+```python
+def _normalize_answer(self, answer: str) -> str:
+    """Normalize answer string (GAIA official normalization rules)
+
+    Rules:
+    1. Numbers: Remove comma separators and unit symbols
+    2. Strings: Remove articles, convert to lowercase, remove extra spaces
+    3. Lists: Comma-separated, sorted alphabetically
+    """
+    if not answer:
+        return ""
+
+    answer = answer.strip()
+
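+    # A single number with thousands separators (e.g. "$1,234.56") is not a list;
+    # without this check it would be split into "1" and "234.56"
+    if re.fullmatch(r'[^\d\s]*\d{1,3}(?:,\d{3})+(?:\.\d+)?[^\d\s]*', answer):
+        return self._normalize_single_answer(answer)
+
+    # Otherwise treat commas as list separators
+    if ',' in answer:
+        parts = [self._normalize_single_answer(p.strip()) for p in answer.split(',')]
+        parts.sort()  # Sort so that element order does not affect matching
+        return ','.join(parts)
+    else:
+        return self._normalize_single_answer(answer)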
+
+def _normalize_single_answer(self, answer: str) -> str:
+    """Normalize single answer (answer without commas)"""
+    answer = answer.strip().lower()
+
+    # Remove common articles
+    articles = ['the', 'a', 'an']
+    words = answer.split()
+    if words and words[0] in articles:
+        words = words[1:]
+        answer = ' '.join(words)
+
+    # Remove currency symbols and percent signs
+    answer = answer.replace('$', '').replace('%', '').replace('€', '').replace('£', '')
+
+    # Remove comma separators in numbers
+    answer = re.sub(r'(\d),(\d)', r'\1\2', answer)
+
+    # Remove extra spaces
+    answer = ' '.join(answer.split())
+
+    # Remove trailing punctuation
+    answer = answer.rstrip('.,;:!?')
+
+    return answer
+```
+
+GAIA requires model output to end with `FINAL ANSWER: [answer]`, so the evaluator extracts the answer as follows:
+
+```python
+def _extract_answer(self, response: str) -> str:
+    """Extract answer from response (GAIA format)
+
+    GAIA requires answer format: FINAL ANSWER: [answer]
+    """
+    # First try to extract GAIA official format answer
+    final_answer_pattern = r'FINAL ANSWER:\s*(.+?)(?:\n|$)'
+    match = re.search(final_answer_pattern, response, re.IGNORECASE | re.MULTILINE)
+    if match:
+        answer = match.group(1).strip()
+        # Remove possible brackets
+        answer = answer.strip('[]')
+        return answer
+
+    # Fallback: Look for other answer markers
+    answer_patterns = [
+        r'答案[::]\s*(.+)',
+        r'最终答案[::]\s*(.+)',
+        r'Final answer[::]\s*(.+)',
+        r'Answer[::]\s*(.+)',
+    ]
+
+    for pattern in answer_patterns:
+        match = re.search(pattern, response, re.IGNORECASE)
+        if match:
+            return match.group(1).strip()
+
+    # If no marker found, return last non-empty line
+    lines = response.strip().split('\n')
+    for line in reversed(lines):
+        line = line.strip()
+        if line and not line.startswith('#'):
+            return line
+
+    return response.strip()
+```
+
+After the evaluation completes, the results can be exported to the JSONL format required by the official GAIA leaderboard:
+
+```python
+def export_to_gaia_format(
+    self,
+    results: Dict[str, Any],
+    output_path: Union[str, Path],
+    include_reasoning: bool = True
+) -> None:
+    """Export to GAIA official format (JSONL)
+
+    GAIA required format:
+    {"task_id": "xxx", "model_answer": "answer", "reasoning_trace": "reasoning process"}
+    """
+    output_path = Path(output_path)
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+
+    with open(output_path, 'w', encoding='utf-8') as f:
+        for result in results.get("detailed_results", []):
+            entry = {
+                "task_id": result["task_id"],
+                "model_answer": result["predicted"]
+            }
+
+            if include_reasoning:
+                entry["reasoning_trace"] = result.get("response", result["predicted"])
+
+            f.write(json.dumps(entry, ensure_ascii=False) + '\n')
+```
+
+**(3) GAIAEvaluationTool: One-Click Evaluation Tool**
+
+GAIAEvaluationTool encapsulates the complete evaluation process behind a single call:
+
+````python
+class GAIAEvaluationTool(Tool):
+    """GAIA evaluation tool
+
+    Provides one-click evaluation functionality:
+    1. Run HelloAgents evaluation
+    2. Export GAIA format results
+    3. Generate evaluation report
+    4. Generate submission guide
+    """
+
+    def run(
+        self,
+        agent: Any,
+        level: Optional[int] = None,
+        max_samples: Optional[int] = None,
+        local_data_dir: Optional[str] = None,
+        export_results: bool = True,
+        generate_report: bool = True
+    ) -> Dict[str, Any]:
+        """Execute GAIA one-click evaluation"""
+        # Step 1: Run HelloAgents evaluation
+        results = self._run_evaluation(agent, level, max_samples, local_data_dir)
+
+        # Step 2: Export GAIA format results
+        if export_results:
+            self._export_results(results)
+
+        # Step 3: Generate evaluation report
+        if generate_report:
+            self.generate_report(results)
+
+        return results
+````
+
+GAIAEvaluationTool also generates the evaluation report automatically:
+
+```python
+def generate_report(
+    self,
+    results: Dict[str, Any],
+    output_file: Optional[Union[str, Path]] = None
+) -> str:
+    """Generate evaluation report"""
+    report = f"""# GAIA Evaluation Report
+
+**Generated**: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
+
+## 📊 Evaluation Overview
+
+- **Agent**: {results.get("agent_name", "Unknown")}
+- **Difficulty Level**: {results.get("level_filter") or 'All'}
+- **Total Samples**: {results.get("total_samples", 0)}
+- **Exact Matches**: {results.get("exact_matches", 0)}
+- **Exact Match Rate**: {results.get("exact_match_rate", 0):.2%}
+
+## 📈 Detailed Metrics
+
+### Level-wise Accuracy
+
+{self._format_level_metrics(results.get("level_metrics", {}))}
+
+## 📝 Sample Details (First 10)
+
+{self._format_sample_details(results.get("detailed_results", [])[:10])}
+
+## 📊 Accuracy Visualization
+
+{self._format_visualization(results.get("exact_match_rate", 0))}
+
+## 💡 Recommendations
+
+{self._format_suggestions(results.get("exact_match_rate", 0))}
+"""
+
+    # Save report
+    if output_file is None:
+        output_dir = Path("./evaluation_reports")
+        output_dir.mkdir(parents=True, exist_ok=True)
+        output_file = output_dir / f"gaia_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.md"
+
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(report)
+
+    return report
+```
+
+## 12.4 Data Generation Quality Evaluation
+
+In AI system development, high-quality training data is the foundation of system performance. This section introduces how to use the HelloAgents framework to evaluate the quality of generated data, using AIME (American Invitational Mathematics Examination)<sup>[9]</sup> style mathematics problem generation as an example.
+
+AIME is a medium-difficulty mathematics competition hosted by the Mathematical Association of America (MAA), positioned between AMC 10/12 and the USA Mathematical Olympiad (USAMO). AIME problems have distinctive characteristics: each answer is an integer between 0 and 999; problems span algebra, geometry, number theory, combinatorics, and probability; and they require multi-step reasoning without advanced theory. For generation we target moderate difficulty (roughly that of AIME problems 6-9). These characteristics make AIME problems an ideal benchmark for evaluating the quality of generated mathematics problems: the unified answer format facilitates automated evaluation, and the moderate difficulty suits large-scale generation. As a reference we use the `TianHongZXY/aime-1983-2025` dataset on HuggingFace, which contains over 900 real AIME problems from 1983 to 2025 and provides rich reference samples for generation and evaluation.
+
+### 12.4.1 Evaluation Methods Overview
+
+For data generation quality evaluation, we adopt three complementary methods: LLM Judge, Win Rate, and Manual Verification. We chose them for two reasons. First, methodologically, they are commonly used evaluation schemes in the agent field and mainstream practice in many academic papers, so they are widely recognized and well grounded in practice. Second, they fit our scenario naturally: LLM Judge and Win Rate evaluate the quality of generated problems (across dimensions such as correctness, clarity, and difficulty matching), while Manual Verification evaluates the quality of generated answers (human experts verify answer accuracy). This division of labor is simple and easy to follow.
+
+Below we introduce the specific implementation of these three evaluation methods in detail. The implementation flow of the entire case is shown in Figure 12.5:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-5.png" alt="" width="85%"/>
+  <p>Figure 12.5 Data Generation Quality Evaluation Flow Diagram</p>
+</div>
+
+**(1) LLM Judge Evaluation**
+
+**Design Motivation**: In data generation quality evaluation, we need to quickly and consistently evaluate the quality of a large number of generated problems. Traditional manual evaluation, although accurate, is costly and inefficient, making it difficult to meet the demands of large-scale data generation. LLM Judge, by using large language models as judges, can automatically evaluate the quality of generated data from multiple dimensions, not only greatly improving evaluation efficiency but also maintaining consistency in evaluation standards. More importantly, LLM Judge can provide detailed scoring reasons and improvement suggestions, helping us understand the strengths and weaknesses of generated data and providing direction for subsequent optimization.
+
+In our implementation, LLM Judge evaluates AIME problem quality from four key dimensions:
+
+<div align="center">
+  <p>Table 12.5 LLM Judge Evaluation Dimensions for AIME Problems</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-4.png" alt="" width="85%"/>
+</div>
+
+After obtaining scores from four dimensions, we need to aggregate these scores into overall evaluation metrics. We define three key metrics to measure the quality level of generated problems:
+
+**Evaluation Metrics**:
+
+**1. Average Score**: Calculate the average score of all problems across four dimensions, reflecting the overall quality level of generated problems.
+$$
+\text{Average Score} = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{d=1}^{4} S_{i,d}}{4}
+$$
+
+**2. Pass Rate**: Count the proportion of problems with average score of 3.5 or above, reflecting basic quality assurance of generated problems.
+
+$$
+\text{Pass Rate} = \frac{|\{i : \text{Score}_i \geq 3.5\}|}{N}
+$$
+
+**3. Excellent Rate**: Count the proportion of problems with average score of 4.5 or above, reflecting the high-quality proportion of generated problems.
+
+$$
+\text{Excellent Rate} = \frac{|\{i : \text{Score}_i \geq 4.5\}|}{N}
+$$
+
+Where:
+- $N$ is the total number of problems evaluated
+- $S_{i,d}$ is the score of the $i$-th problem on the $d$-th dimension (1-5 points)
+- $\text{Score}_i$ is the average score of the $i$-th problem (average of four dimension scores)
+
+These three metrics reflect generation quality from different angles: average score gives overall level, pass rate ensures basic quality, excellent rate measures high-quality output capability.
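
The three formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the HelloAgents implementation: the function name `summarize_scores`, the dictionary keys, and the sample scores are all assumptions made for the example.

```python
from statistics import mean

# Dimension keys are assumed to match the four LLM Judge dimensions
DIMENSIONS = ("correctness", "clarity", "difficulty_match", "completeness")


def summarize_scores(judgements, pass_threshold=3.5, excellent_threshold=4.5):
    """Aggregate per-problem dimension scores into the three metrics."""
    if not judgements:
        raise ValueError("judgements must not be empty")
    # Score_i: mean of the four dimension scores of problem i
    per_problem = [mean(j[d] for d in DIMENSIONS) for j in judgements]
    n = len(per_problem)
    return {
        "average_score": mean(per_problem),
        "pass_rate": sum(s >= pass_threshold for s in per_problem) / n,
        "excellent_rate": sum(s >= excellent_threshold for s in per_problem) / n,
    }


# Hypothetical scores for four generated problems
example = [
    {"correctness": 5, "clarity": 4, "difficulty_match": 4, "completeness": 5},  # Score = 4.5
    {"correctness": 4, "clarity": 4, "difficulty_match": 3, "completeness": 4},  # Score = 3.75
    {"correctness": 3, "clarity": 3, "difficulty_match": 3, "completeness": 4},  # Score = 3.25
    {"correctness": 2, "clarity": 3, "difficulty_match": 3, "completeness": 2},  # Score = 2.5
]
metrics = summarize_scores(example)
print(metrics)
```

Note that both thresholds are inclusive, matching the "3.5 or above" and "4.5 or above" wording of the definitions.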
+
+**(2) Win Rate Evaluation**
+
+**Design Motivation**: Although LLM Judge can provide multi-dimensional absolute scoring, we also need a relative evaluation metric to measure the quality gap between generated problems and real problems. Win Rate evaluation, through pairwise comparison, lets LLM directly judge which is better between generated problems and real problems. This relative comparison is more in line with human judgment habits than absolute scoring, and can more easily discover the relative advantages and disadvantages of generated problems. Ideally, if the quality of generated problems is close to real problems, Win Rate should be around 50% (i.e., generated problems and real problems each have 50% win rate). This metric is simple and intuitive, allowing quick judgment of the overall quality level of the generation system.
+
+In our implementation, Win Rate evaluation is conducted through the flow shown in Figure 12.6:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-6.png" alt="" width="85%"/>
+  <p>Figure 12.6 Win Rate Evaluation Flow Diagram</p>
+</div>
+
+In pairwise comparison evaluation, each comparison produces three possible results: generated problem wins (Win), real problem wins (Loss), or tie (Tie). We evaluate the quality of generated problems by counting the proportions of these three results:
+
+**Evaluation Metrics**:
+
+**1. Win Rate**: Proportion of generated problems judged as better, reflecting advantages of generated problems relative to real problems.
+
+$$
+\text{Win Rate} = \frac{\text{Wins}}{\text{Total Comparisons}}
+$$
+
+**2. Loss Rate**: Proportion of real problems judged as better, reflecting disadvantages of generated problems relative to real problems.
+
+$$
+\text{Loss Rate} = \frac{\text{Losses}}{\text{Total Comparisons}}
+$$
+
+**3. Tie Rate**: Proportion judged as equivalent quality, reflecting similarity between generated problems and real problems.
+
+$$
+\text{Tie Rate} = \frac{\text{Ties}}{\text{Total Comparisons}}
+$$
+
+Where Total Comparisons is the total number of comparisons, Wins, Losses, and Ties are the numbers of generated problem wins, losses, and ties respectively. These three metrics satisfy: Win Rate + Loss Rate + Tie Rate = 100%.
+
+**Ideal Result**: Win Rate ≈ 50% (indicating generation quality is close to real problems). If Win Rate is significantly lower than 50%, it indicates generated problem quality is inferior to real problems and generation strategy needs optimization; if Win Rate is significantly higher than 50%, it may indicate generated problems surpass real problems in some aspects, or there is bias in evaluation standards.
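
As a small worked example, the three rates and the distance from the 50% ideal can be computed as follows. This is only a sketch: the helper name `win_rate_stats`, the outcome labels, and the counts are hypothetical.

```python
from collections import Counter


def win_rate_stats(outcomes):
    """Compute win/loss/tie rates from a list of 'win' | 'loss' | 'tie' labels."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    unknown = set(outcomes) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    counts = Counter(outcomes)
    total = len(outcomes)
    win_rate = counts["win"] / total
    return {
        "win_rate": win_rate,
        "loss_rate": counts["loss"] / total,
        "tie_rate": counts["tie"] / total,
        # Distance from the ideal 50% win rate
        "gap_to_ideal": abs(win_rate - 0.5),
    }


# Hypothetical result of 20 pairwise comparisons
outcomes = ["win"] * 9 + ["loss"] * 8 + ["tie"] * 3
stats = win_rate_stats(outcomes)
print(stats)
```

Because ties are counted separately, a win rate below 50% does not by itself mean the generated problems lost more often than they won; the loss rate should be read alongside it.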
+
+**(3) Manual Verification**
+
+**Design Motivation**: Although LLM Judge and Win Rate can automatically evaluate problem quality, for mathematical problems that require strict logical reasoning, manual verification is still indispensable. Especially when evaluating answer generation quality, human experts are needed to verify answer accuracy, solution step completeness, and mathematical reasoning rigor. Additionally, manual verification can discover issues that automated evaluation might miss, such as subjective factors like problem innovation and interest. To improve manual verification efficiency and experience, we developed a Gradio-based Web interface, allowing verifiers to conveniently browse problems, score, annotate status, and add comments, greatly lowering the barrier to manual verification.
+
+In our implementation, manual verification is conducted through the following steps:
+
+1. Read the problem, answer, and solution
+2. Score each of four dimensions (1-5 points): correctness, clarity, difficulty matching, completeness
+3. Annotate the status:
+   - ✅ approved
+   - ❌ rejected
+   - 🔄 needs_revision
+4. Add comments
+
+### 12.4.2 System Architecture
+
+The data generation and evaluation system adopts a modular design:
+
+```
+data_generation/
+├── aime_generator.py              # AIME problem generator
+├── human_verification_ui.py       # Manual verification interface
+├── run_complete_evaluation.py     # Complete evaluation flow
+│
+├── generated_data/                # Generated data
+│   ├── aime_generated_XXXXXX.json
+│   └── generation_report_XXXXXX.md
+│
+└── evaluation_results/            # Evaluation results
+    └── XXXXXX/
+        ├── llm_judge/
+        ├── win_rate/
+        └── comprehensive_report.md
+```
+
+The system contains four core components:
+
+- **AIMEGenerator** (problem generator): uses the HelloAgents framework to generate AIME-style problems, supports batch generation and progress saving, and automatically handles API rate limits.
+- **LLMJudgeTool** (LLM Judge evaluation tool): provides four-dimensional quality evaluation and automatically generates JSON results and Markdown reports.
+- **WinRateTool** (Win Rate evaluation tool): calculates win rate, loss rate, and tie rate through pairwise comparison.
+- **HumanVerificationUI** (manual verification interface): a Gradio-based Web interface that supports scoring and status annotation.
+
+### 12.4.3 AIME Problem Generator Implementation
+
+```python
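+from typing import Optional
+
+from datasets import load_dataset
+
+# HelloAgentsLLM and SimpleAgent come from the HelloAgents framework (imports omitted in this excerpt)
+
+
+class AIMEGenerator:
+    """AIME Problem Generator"""
+
+    def __init__(
+        self,
+        llm: Optional[HelloAgentsLLM] = None,
+        delay_seconds: float = 1.0,
+        use_reference_examples: bool = True,
+        reference_dataset: str = "TianHongZXY/aime-1983-2025"
+    ):
+        # Fall back to a default LLM client when none is provided
+        self.llm = llm or HelloAgentsLLM()
+        self.agent = SimpleAgent(
+            name="AIME Generator",
+            llm=self.llm,
+            system_prompt="You are a professional mathematics competition problem designer."
+        )
+        # Minimum interval between API calls (seconds)
+        self.delay_seconds = delay_seconds
+        self.use_reference_examples = use_reference_examples
+
+        # Load reference examples from 900+ AIME problems (1983-2025)
+        self.reference_examples = []  # stays empty when reference examples are disabled
+        if use_reference_examples:
+            dataset = load_dataset(reference_dataset, split="test")
+            self.reference_examples = list(dataset)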
+
+Our goal is to generate a dataset in a similar style, so we randomly select reference examples from the 900+ real AIME problems (1983-2025).
+
+Generation prompt design (English):
+
+```python
+GENERATION_PROMPT = """You are a professional mathematics competition problem designer, skilled in creating AIME (American Invitational Mathematics Examination) style problems.
+
+【Reference Example】(For style reference only, please generate a completely different problem)
+Problem: {example_problem}
+Answer: {example_answer}
+
+AIME Problem Characteristics:
+1. Answer: An integer between 0 and 999
+2. Topics: Algebra, Geometry, Number Theory, Combinatorics, Probability, etc.
+3. Style: Requires multi-step reasoning, but no advanced theory
+4. Difficulty: Medium to hard (similar to AIME problems 6-9)
+
+Please generate a **completely different** AIME-style mathematics problem, including:
+1. Problem statement (clear and complete, different from the reference)
+2. Answer (an integer between 0 and 999, different from the reference)
+3. Detailed solution (including all reasoning steps)
+4. Topic classification (Algebra/Geometry/Number Theory/Combinatorics/Probability)
+
+Please output in the following JSON format:
+{
+    "problem": "Problem statement in English",
+    "answer": 123,
+    "solution": "Detailed solution steps in English",
+    "topic": "Algebra"
+}
+"""
+```
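
One practical detail when filling a template like this: it mixes named placeholders (`{example_problem}`) with literal JSON braces. The source does not show how the placeholders are substituted; if `str.format` were used directly, the literal braces would be parsed as format fields and raise an error. The sketch below shows one way to avoid that by replacing only the named placeholders. The shortened template and the helper `fill_prompt` are hypothetical, introduced only for illustration.

```python
# Shortened stand-in for a prompt that mixes placeholders and literal JSON braces
TEMPLATE = """Reference problem: {example_problem}
Reference answer: {example_answer}

Please output in the following JSON format:
{
    "problem": "Problem statement in English",
    "answer": 123
}
"""


def fill_prompt(template: str, **values) -> str:
    """Substitute only the named placeholders, leaving all other braces untouched."""
    prompt = template
    for name, value in values.items():
        prompt = prompt.replace("{" + name + "}", str(value))
    return prompt


# str.format treats the literal JSON braces as a format field and fails
try:
    TEMPLATE.format(example_problem="Find x.", example_answer=42)
    format_failed = False
except (KeyError, ValueError, IndexError):
    format_failed = True

prompt = fill_prompt(TEMPLATE, example_problem="Find x.", example_answer=42)
print(format_failed)
print(prompt)
```

An alternative with the same effect is to double the literal braces in the template (`{{` and `}}`) and keep using `str.format`.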
+
+We generate problems in English for four reasons: consistency with real AIME problems (AIME is an English-language competition, so English problems are the natural choice); evaluation fairness (the LLM Judge compares English against English); easier internationalization (English problems can be used more widely); and no translation issues (there is no need to worry about the accuracy of Chinese-English translation).
+
+Batch generation implementation:
+
+```python
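+def generate_and_save(self, num_problems: int = 30, output_dir: str = "data_generation/generated_data"):
+    """Generate and save problems with intelligent delay"""
+    # Requires: import json, os, time; from datetime import datetime; from tqdm import tqdm
+    os.makedirs(output_dir, exist_ok=True)
+
+    # Clean old checkpoints
+    for file in os.listdir(output_dir):
+        if file.startswith("checkpoint_") and file.endswith(".json"):
+            os.remove(os.path.join(output_dir, file))
+
+    problems = []
+
+    # Generate with tqdm progress bar
+    with tqdm(total=num_problems, desc="Generating AIME problems", unit="problem") as pbar:
+        last_call_time = 0
+
+        for _ in range(num_problems):
+            # Ensure minimum delay between API calls
+            if last_call_time > 0:
+                elapsed = time.time() - last_call_time
+                if elapsed < self.delay_seconds:
+                    wait_time = self.delay_seconds - elapsed
+                    time.sleep(wait_time)
+
+            # Generate problem (randomly select reference example)
+            start_time = time.time()
+            problem = self.generate_single()
+            last_call_time = time.time()
+            generation_time = last_call_time - start_time
+            problems.append(problem)
+
+            # Update progress bar
+            pbar.set_postfix({
+                "topic": problem.get('topic', 'N/A'),
+                "answer": problem.get('answer', 'N/A'),
+                "time": f"{generation_time:.1f}s"
+            })
+            pbar.update(1)
+
+    # Save all problems to a timestamped file (aime_generated_XXXXXX.json);
+    # this excerpt assumes the file holds a plain JSON list of problems
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    generated_data_path = os.path.join(output_dir, f"aime_generated_{timestamp}.json")
+    with open(generated_data_path, "w", encoding="utf-8") as f:
+        json.dump(problems, f, ensure_ascii=False, indent=2)
+
+    return generated_data_path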
+```
+
+LaTeX mathematical formula support:
+
+Generated AIME problems contain LaTeX mathematical formulas (such as `$\frac{a}{b}$`, `$\sqrt{x}$`), requiring special JSON parsing handling:
+
+```python
+def _parse_response(self, response: str) -> Dict[str, Any]:
+    """Parse LLM response (supports LaTeX mathematical formulas)"""
+    import re
+
+    # Extract JSON part
+    if "```json" in response:
+        json_str = response.split("```json")[1].split("```")[0].strip()
+    else:
+        json_str = response.strip()
+
+    try:
+        problem_data = json.loads(json_str)
+    except json.JSONDecodeError:
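+        # Fix LaTeX escape issue: convert e.g. \sqrt to \\sqrt
+        # Regular expression: find single backslashes that do not start a valid JSON escape.
+        # Note: commands such as \frac or \times begin with a valid escape (\f, \t),
+        # so this pattern leaves them untouched.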
+        fixed_json_str = re.sub(r'(?<!\\)\\(?!["\\/bfnrtu])', r'\\\\', json_str)
+        problem_data = json.loads(fixed_json_str)
+
+    return problem_data
+```
+
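+Backslashes in LaTeX commands (such as `\sqrt`, `\alpha`) are illegal escape sequences in JSON and cause parsing to fail:
+```
+Invalid \escape: line 4 column 185 (char 375)
+```
+
+The regular expression replaces each single backslash that does not begin a valid JSON escape with a double backslash, making it legal in JSON. One caveat: commands whose first letter coincides with a JSON escape character (`\frac` starts with `\f`, `\times` with `\t`, `\nu` with `\n`) do not trigger an error and are not repaired, so they decode as control characters. A complementary safeguard is to instruct the model in the prompt to double every backslash in its JSON output.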
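
To see what the repair step does and does not cover, the pattern from `_parse_response` can be run on two small inputs. The helper names and sample strings below are illustrative only. `\sqrt` is rejected by the JSON parser and then repaired, whereas `\frac` parses without error because `\f` is the JSON escape for a form feed, so the backslash is silently lost.

```python
import json
import re

# Same pattern as in _parse_response above
UNESCAPED_BACKSLASH = r'(?<!\\)\\(?!["\\/bfnrtu])'


def repair_backslashes(json_str: str) -> str:
    """Double every backslash that does not start a valid JSON escape."""
    return re.sub(UNESCAPED_BACKSLASH, r'\\\\', json_str)


def fails_to_parse(json_str: str) -> bool:
    """Return True if json.loads rejects the string."""
    try:
        json.loads(json_str)
        return False
    except json.JSONDecodeError:
        return True


# Raw strings stand in for model output containing single backslashes
raw_sqrt = r'{"problem": "Compute $\sqrt{x}$"}'
raw_frac = r'{"problem": "Compute $\frac{a}{b}$"}'

sqrt_rejected = fails_to_parse(raw_sqrt)             # "\s" is not a JSON escape
repaired = json.loads(repair_backslashes(raw_sqrt))  # "\\sqrt" decodes to \sqrt
frac_rejected = fails_to_parse(raw_frac)             # "\f" is a valid escape (form feed)
decoded_frac = json.loads(raw_frac)["problem"]       # the backslash is lost here

print(sqrt_rejected, repaired["problem"])
print(frac_rejected, repr(decoded_frac))
```

Since the second case raises no exception, it cannot be caught by the `try`/`except` fallback; checking parsed fields for control characters is one possible way to detect it.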
+
+### 12.4.4 LLM Judge Evaluation Tool
+
+The LLM Judge tool uses an LLM as the judge to evaluate generated problems along multiple dimensions.
+
+```python
+class LLMJudgeTool(Tool):
+    """LLM Judge evaluation tool"""
+
+    def run(self, params: Dict[str, Any]) -> str:
+        """Run LLM Judge evaluation"""
+        # 1. Load generated data
+        gen_dataset = AIDataset(dataset_type="generated", data_path=params["generated_data_path"])
+        gen_problems = gen_dataset.load()
+
+        # 2. Load reference data (AIME 2025)
+        ref_dataset = AIDataset(dataset_type="real", year=2025)
+        ref_problems = ref_dataset.load()
+
+        # 3. Create evaluator
+        evaluator = LLMJudgeEvaluator(llm=self.llm, judge_model=params.get("judge_model", "gpt-4o"))
+
+        # 4. Run evaluation
+        results = evaluator.evaluate_batch(gen_problems, max_samples=params.get("max_samples"))
+
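+        # 5. Save results under the output directory passed in params
+        #    (requires `import os` and `from datetime import datetime`; the result file name is illustrative)
+        output_dir = params["output_dir"]
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        result_file = os.path.join(output_dir, f"llm_judge_results_{timestamp}.json")
+        evaluator.export_results(results, result_file)
+
+        # 6. Generate report
+        report_file = os.path.join(output_dir, f"llm_judge_report_{timestamp}.md")
+        self._generate_report(results, report_file)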
+
+        return json.dumps({"status": "success", "metrics": results["metrics"]})
+```
+
+**Evaluation Prompt**:
+
+```python
+EVALUATION_PROMPT = """Please evaluate the quality of the following AIME mathematics problem.
+
+Problem:
+{problem}
+
+Answer: {answer}
+
+Solution:
+{solution}
+
+Please score from the following 4 dimensions (1-5 points):
+
+1. **Correctness**: Is the mathematical logic correct, is the answer accurate
+2. **Clarity**: Is the problem statement clear, is the solution easy to understand
+3. **Difficulty Match**: Does the difficulty match AIME standards (medium to hard)
+4. **Completeness**: Are the solution steps complete, does it include necessary reasoning
+
+Please output in the following JSON format:
+{
+    "correctness": 5,
+    "clarity": 4,
+    "difficulty_match": 4,
+    "completeness": 5,
+    "comments": "Evaluation reason"
+}
+"""
+```
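
Because the judge's reply is free text that is only expected to follow this JSON format, it is worth validating it before aggregating scores. The sketch below is one possible validator, not the framework's own parsing code; the function name `parse_judgement` and the sample replies are assumptions.

```python
import json

DIMENSIONS = ("correctness", "clarity", "difficulty_match", "completeness")


def parse_judgement(response_text: str) -> dict:
    """Parse a judge reply and check that every dimension has a score in [1, 5]."""
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("judge reply must be a JSON object")
    result = {}
    for dim in DIMENSIONS:
        value = data.get(dim)
        # bool is a subclass of int in Python, so exclude it explicitly
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 1 <= value <= 5:
            raise ValueError(f"{dim} must be a number between 1 and 5, got {value!r}")
        result[dim] = value
    result["comments"] = str(data.get("comments", ""))
    return result


# Hypothetical judge replies
good_reply = '{"correctness": 5, "clarity": 4, "difficulty_match": 4, "completeness": 5, "comments": "Solid"}'
bad_reply = '{"correctness": 7, "clarity": 4, "difficulty_match": 4, "completeness": 5}'

parsed = parse_judgement(good_reply)
try:
    parse_judgement(bad_reply)
    bad_rejected = False
except ValueError:
    bad_rejected = True

print(parsed, bad_rejected)
```

Rejecting out-of-range or missing scores early keeps a single malformed reply from distorting the average score and pass rate.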
+
+**Evaluation Report Example**:
+
+```markdown
+# LLM Judge Evaluation Report
+
+## Overall Score
+
+- **Average Total Score**: 4.2/5.0
+- **Pass Rate**: 85.0% (≥3.5 points)
+- **Excellent Rate**: 40.0% (≥4.5 points)
+
+## Dimension Scores
+
+| Dimension | Average Score | Rating |
+|------|--------|------|
+| Correctness | 4.3/5.0 | Good ⭐⭐⭐⭐ |
+| Clarity | 4.1/5.0 | Good ⭐⭐⭐⭐ |
+| Difficulty Match | 4.0/5.0 | Good ⭐⭐⭐⭐ |
+| Completeness | 4.4/5.0 | Good ⭐⭐⭐⭐ |
+```
+
+### 12.4.5 Win Rate Evaluation Tool
+
+The Win Rate tool evaluates the quality of generated data relative to real problems through pairwise comparison.
+
+```python
+class WinRateTool(Tool):
+    """Win Rate evaluation tool"""
+
+    def run(self, params: Dict[str, Any]) -> str:
+        """Run Win Rate evaluation"""
+        # 1. Load generated data
+        gen_dataset = AIDataset(dataset_type="generated", data_path=params["generated_data_path"])
+        gen_problems = gen_dataset.load()
+
+        # 2. Load reference data (AIME 2025)
+        ref_dataset = AIDataset(dataset_type="real", year=2025)
+        ref_problems = ref_dataset.load()
+
+        # 3. Create evaluator
+        evaluator = WinRateEvaluator(llm=self.llm, judge_model=params.get("judge_model", "gpt-4o"))
+
+        # 4. Run evaluation
+        results = evaluator.evaluate_win_rate(gen_problems, ref_problems, num_comparisons=params.get("num_comparisons"))
+
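+        # 5. Save results and report under the output directory passed in params
+        #    (requires `import os` and `from datetime import datetime`; the result file name is illustrative)
+        output_dir = params["output_dir"]
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        result_file = os.path.join(output_dir, f"win_rate_results_{timestamp}.json")
+        report_file = os.path.join(output_dir, f"win_rate_report_{timestamp}.md")
+        evaluator.export_results(results, result_file)
+        self._generate_report(results, report_file)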
+
+        return json.dumps({"status": "success", "metrics": results["metrics"]})
+```
+
+AIDataset is responsible for loading generated data and AIME real problem data, supporting two data types:
+
+```python
+import json
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+
+class AIDataset:
+    """AI dataset loader
+
+    Supports two data types:
+    1. generated: Generated data (JSON format)
+    2. real: AIME real problems (loaded from HuggingFace)
+    """
+
+    def __init__(
+        self,
+        dataset_type: str = "generated",
+        data_path: Optional[str] = None,
+        year: Optional[int] = None
+    ):
+        self.dataset_type = dataset_type
+        self.data_path = data_path
+        self.year = year if year is not None else 2025  # Only used for the "real" type
+
+    def load(self) -> List[Dict[str, Any]]:
+        """Load dataset"""
+        if self.dataset_type == "generated":
+            return self._load_generated_data()
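+        elif self.dataset_type == "real":
+            return self._load_real_data()
+        # Fail loudly instead of silently returning None for an unsupported type
+        raise ValueError(f"Unknown dataset_type: {self.dataset_type!r}")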
+
+    def _load_real_data(self) -> List[Dict[str, Any]]:
+        """Load AIME 2025 real problems from HuggingFace"""
+        from huggingface_hub import snapshot_download
+
+        # Use AIME 2025 dataset
+        repo_id = "math-ai/aime25"
+
+        # Download dataset
+        local_dir = snapshot_download(
+            repo_id=repo_id,
+            repo_type="dataset"
+        )
+
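+        # Read JSONL file (fail with a clear message if the snapshot contains none)
+        jsonl_files = sorted(Path(local_dir).glob("*.jsonl"))
+        if not jsonl_files:
+            raise FileNotFoundError(f"No .jsonl file found in {local_dir}")
+        data_file = jsonl_files[0]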
+        data = []
+        with open(data_file, 'r', encoding='utf-8') as f:
+            for line in f:
+                if line.strip():
+                    data.append(json.loads(line))
+
+        # Unify data format (AIME 2025 uses lowercase field names)
+        problems = []
+        for idx, item in enumerate(data):
+            problem = {
+                "problem_id": item.get("id", f"aime_2025_{idx}"),
+                "problem": item.get("problem", ""),
+                "answer": item.get("answer", ""),
+                "solution": item.get("solution", ""),  # AIME 2025 has no solution field
+            }
+            problems.append(problem)
+
+        return problems
+```
+
+We use only the AIME 2025 dataset for four reasons: timeliness (2025 is the latest AIME competition data); simpler maintenance (a single dataset keeps the code concise); a unified format (JSONL with lowercase field names); and sufficient representativeness (30 problems are enough to evaluate generation quality).
+
+**Comparison Prompt**:
+
+```python
+COMPARISON_PROMPT = """Please compare the quality of the following two AIME mathematics problems and judge which is better.
+
+【Problem A - Generated Problem】
+Problem: {problem_a}
+Answer: {answer_a}
+Solution: {solution_a}
+
+【Problem B - AIME Real Problem】
+Problem: {problem_b}
+Answer: {answer_b}
+Solution: {solution_b}
+
+Please compare from the following aspects:
+1. Rigor of mathematical logic
+2. Clarity of problem statement
+3. Reasonableness of difficulty
+4. Completeness of solution
+
+Please output in the following JSON format:
+{
+    "winner": "A" or "B" or "Tie",
+    "reason": "Judgment reason"
+}
+"""
+```
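
The prompt above tells the judge which problem is generated and always places it in slot A. LLM judges can be sensitive to both the label and the position, which is one way the "bias in evaluation standards" mentioned earlier can arise. A possible mitigation, which is not part of the source implementation, is to hide the origin and randomize the slot order, then map the verdict back afterwards. The helper names below are hypothetical.

```python
import random


def build_blind_pair(generated, real, rng):
    """Randomly assign the two problems to slots A and B.

    Returns the slot contents and a mapping from slot name to its source.
    """
    if rng.random() < 0.5:
        slots = {"A": generated, "B": real}
        sources = {"A": "generated", "B": "real"}
    else:
        slots = {"A": real, "B": generated}
        sources = {"A": "real", "B": "generated"}
    return slots, sources


def resolve_outcome(winner, sources):
    """Translate the judge's 'A' / 'B' / 'Tie' verdict into win/loss/tie for the generated problem."""
    if winner == "Tie":
        return "tie"
    if winner not in sources:
        raise ValueError(f"unexpected winner: {winner!r}")
    return "win" if sources[winner] == "generated" else "loss"


rng = random.Random(0)  # fixed seed only to make the example reproducible
slots, sources = build_blind_pair({"problem": "generated text"}, {"problem": "real text"}, rng)

# Suppose the judge picks whichever slot holds the generated problem
generated_slot = next(slot for slot, src in sources.items() if src == "generated")
print(resolve_outcome(generated_slot, sources))
print(resolve_outcome("Tie", sources))
```

With this approach the prompt would use neutral headings such as "Problem A" and "Problem B", and the win/loss bookkeeping happens outside the prompt.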
+
+**Evaluation Report Example**:
+
+```markdown
+# Win Rate Evaluation Report
+
+## Win Rate Statistics
+
+| Metric | Value | Percentage |
+|------|------|--------|
+| Generated Data Wins | 9 times | 45.0% |
+| AIME Real Problems Win | 8 times | 40.0% |
+| Tie | 3 times | 15.0% |
+
+**Win Rate**: 45.0%
+
+✅ **Good**: Generated data quality is close to reference data (gap <10%).
+```
+
+### 12.4.6 Manual Verification Interface
+
+We use Gradio to build a Web interface for manually verifying generated problems.
+
+```python
+import gradio as gr
+
+
+class HumanVerificationUI:
+    """Manual verification interface"""
+
+    def launch(self, share: bool = False):
+        """Launch Gradio interface"""
+        with gr.Blocks(title="AIME Problem Manual Verification") as demo:
+            gr.Markdown("# 🎯 AIME Problem Manual Verification System")
+
+            with gr.Row():
+                with gr.Column(scale=2):
+                    # Problem display area
+                    problem_text = gr.Textbox(label="Problem Description", lines=5, interactive=False)
+                    answer_text = gr.Textbox(label="Answer", interactive=False)
+                    solution_text = gr.Textbox(label="Solution Process", lines=10, interactive=False)
+
+                with gr.Column(scale=1):
+                    # Scoring area
+                    correctness_slider = gr.Slider(1, 5, value=3, step=1, label="Correctness")
+                    clarity_slider = gr.Slider(1, 5, value=3, step=1, label="Clarity")
+                    difficulty_slider = gr.Slider(1, 5, value=3, step=1, label="Difficulty Match")
+                    completeness_slider = gr.Slider(1, 5, value=3, step=1, label="Completeness")
+
+                    # Status selection
+                    status_radio = gr.Radio(
+                        choices=["approved", "rejected", "needs_revision"],
+                        value="approved",
+                        label="Status"
+                    )
+
+                    # Verification button
+                    verify_btn = gr.Button("✅ Submit Verification", variant="primary")
+
+        # Launch after the layout is fully defined (outside the Blocks context)
+        demo.launch(share=share, server_name="127.0.0.1", server_port=7860)
+```
+
+**Usage Method**:
+
+```bash
+# Launch manual verification interface
+python data_generation/human_verification_ui.py data_generation/generated_data/aime_generated_XXXXXX.json
+
+# Then open the following address in a browser:
+# http://127.0.0.1:7860
+```
+
+The resulting interface is shown in Figure 12.7. For judging problem correctness, manual review remains the most reliable method:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-7.png" alt="" width="85%"/>
+  <p>Figure 12.7 AIME Problem Manual Verification Page</p>
+</div>
+
+**Verification Process**:
+
+1. Open verification interface in browser
+2. Read problem, answer, solution
+3. Score from 4 dimensions (1-5 points)
+4. Select verification status (approved/rejected/needs_revision)
+5. Add comments (optional)
+6. Click "Submit Verification"
+7. View next problem
+
+**Verification Result Saving**:
+
+Verification results are automatically saved as `<data_path>_verifications.json`:
+
+```json
+{
+  "gen_aime_1": {
+    "problem_id": "gen_aime_1",
+    "scores": {
+      "correctness": 5,
+      "clarity": 4,
+      "difficulty_match": 4,
+      "completeness": 5
+    },
+    "total_score": 4.5,
+    "status": "approved",
+    "comments": "Problem quality is very good, logic is rigorous",
+    "verified_at": "2025-01-10T12:00:00"
+  }
+}
+```
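
Once verification results are stored in this shape, summary figures such as the manual approval rate can be derived from them. The following sketch assumes the structure shown above; the function name `summarize_verifications` and the second record are hypothetical.

```python
import json


def summarize_verifications(verifications: dict) -> dict:
    """Compute the approval rate and mean total score from verification records."""
    total = len(verifications)
    if total == 0:
        raise ValueError("no verification records")
    approved = sum(1 for v in verifications.values() if v["status"] == "approved")
    # Recompute each total score as the mean of its dimension scores
    totals = [sum(v["scores"].values()) / len(v["scores"]) for v in verifications.values()]
    return {
        "approval_rate": approved / total,
        "mean_total_score": sum(totals) / total,
    }


# Two sample records; the second one is made up for the example
sample = json.loads("""
{
  "gen_aime_1": {
    "scores": {"correctness": 5, "clarity": 4, "difficulty_match": 4, "completeness": 5},
    "status": "approved"
  },
  "gen_aime_2": {
    "scores": {"correctness": 3, "clarity": 4, "difficulty_match": 3, "completeness": 2},
    "status": "needs_revision"
  }
}
""")
summary = summarize_verifications(sample)
print(summary)
```

Recomputing the total from the dimension scores, rather than trusting a stored `total_score`, keeps the summary consistent even if a record was edited by hand.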
+
+### 12.4.7 Complete Evaluation Flow
+
+Finally, we integrate all evaluation methods into one complete flow.
+
+```python
+def run_complete_evaluation(
+    num_problems: int = 30,
+    delay_seconds: float = 3.0
+):
+    """
+    Run complete evaluation flow
+
+    Args:
+        num_problems: Number of problems to generate
+        delay_seconds: Delay between each generation (seconds), avoid API rate limit
+    """
+    # Step 1: Generate AIME problems
+    generator = AIMEGenerator(delay_seconds=delay_seconds)
+    generated_data_path = generator.generate_and_save(
+        num_problems=num_problems,
+        output_dir="data_generation/generated_data"
+    )
+
+    # Step 2: Evaluation
+    # Create evaluation result directory
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+    evaluation_dir = f"data_generation/evaluation_results/{timestamp}"
+    os.makedirs(evaluation_dir, exist_ok=True)
+    os.makedirs(os.path.join(evaluation_dir, "llm_judge"), exist_ok=True)
+    os.makedirs(os.path.join(evaluation_dir, "win_rate"), exist_ok=True)
+
+    # Create LLM
+    llm = HelloAgentsLLM()
+
+    # Step 2.1: LLM Judge evaluation
+    llm_judge_result = None
+    try:
+        llm_judge_tool = LLMJudgeTool(llm=llm)
+        llm_judge_result_json = llm_judge_tool.run({
+            "generated_data_path": generated_data_path,
+            "reference_year": 2025,
+            "max_samples": num_problems,
+            "output_dir": os.path.join(evaluation_dir, "llm_judge"),
+            "judge_model": "gpt-4o"
+        })
+        llm_judge_result = json.loads(llm_judge_result_json)
+    except Exception as e:
+        print(f"❌ LLM Judge evaluation failed: {e}")
+
+    # Step 2.2: Win Rate evaluation
+    win_rate_result = None
+    try:
+        win_rate_tool = WinRateTool(llm=llm)
+        win_rate_result_json = win_rate_tool.run({
+            "generated_data_path": generated_data_path,
+            "reference_year": 2025,
+            "num_comparisons": min(num_problems, 20),
+            "output_dir": os.path.join(evaluation_dir, "win_rate"),
+            "judge_model": "gpt-4o"
+        })
+        win_rate_result = json.loads(win_rate_result_json)
+    except Exception as e:
+        print(f"❌ Win Rate evaluation failed: {e}")
+
+    # Step 3: Generate comprehensive report
+    comprehensive_report_path = None
+    if llm_judge_result or win_rate_result:
+        comprehensive_report_path = os.path.join(evaluation_dir, "comprehensive_report.md")
+        report = generate_comprehensive_report(
+            generated_data_path,
+            llm_judge_result,
+            win_rate_result
+        )
+        with open(comprehensive_report_path, 'w', encoding='utf-8') as f:
+            f.write(report)
+
+    return {
+        "generated_data_path": generated_data_path,
+        "llm_judge_result": llm_judge_result,
+        "win_rate_result": win_rate_result,
+        "comprehensive_report_path": comprehensive_report_path
+    }
+```
+
+**Run Method**:
+
+```bash
+# Basic usage (default 3 second delay)
+python data_generation/run_complete_evaluation.py 30
+
+# Custom delay (recommended 3-5 seconds, avoid API rate limit)
+python data_generation/run_complete_evaluation.py 30 3.0
+
+# Parameter explanation:
+# - 30: Number of problems to generate
+# - 3.0: Delay between each generation (seconds)
+
+# Explanation:
+# - Generation phase: Randomly select reference examples from 900+ AIME real problems (1983-2025)
+# - Evaluation phase: Quality comparison with AIME 2025 real problems
+# - Dataset source: math-ai/aime25 (JSONL format)
+```
+
+**Output Example**:
+
+```
+================================================================================
+🚀 AIME Data Generation and Evaluation Complete Flow
+================================================================================
+
+Configuration:
+  - Number of problems to generate: 30
+  - API delay: 3.0 seconds/problem
+  - Generation reference data: TianHongZXY/aime-1983-2025 (900+ problems)
+  - Evaluation reference: AIME 2025 real problems
+
+================================================================================
+📝 Step 1: Generate AIME Problems
+================================================================================
+📚 Load AIME real problem dataset: TianHongZXY/aime-1983-2025
+   ✓ Loaded 963 reference problems
+
+🎯 Start generating AIME problems
+   Target quantity: 30
+   Generation model: gpt-4o
+   Delay setting: 3.0 seconds/problem
+
+Generating AIME problems:  100%|██████████| 30/30 [01:30<00:00, 3.00s/problem, topic=Algebra, answer=123, time=3.0s]
+
+✅ Step 1 complete! Generated data saved at: data_generation/generated_data/aime_generated_20250110_120000.json
+
+🎯 Step 2.1: LLM Judge Evaluation (vs AIME 2025)
+
+✅ LLM Judge evaluation complete!
+   Average total score: 4.2/5.0
+   Pass rate: 85.0%
+
+🏆 Step 2.2: Win Rate Evaluation (vs AIME 2025)
+
+✅ Win Rate evaluation complete!
+   Win Rate: 45.0%
+
+================================================================================
+📊 Step 3: Generate Comprehensive Report
+================================================================================
+
+✅ Comprehensive report saved: data_generation/evaluation_results/20250110_120000/comprehensive_report.md
+
+================================================================================
+🎉 Complete Evaluation Flow Finished!
+================================================================================
+
+📁 Output Files:
+   - Generated data: data_generation/generated_data/aime_generated_20250110_120000.json
+   - Evaluation result directory: data_generation/evaluation_results/20250110_120000
+   - LLM Judge report: data_generation/evaluation_results/20250110_120000/llm_judge/llm_judge_report_20250110_120000.md
+   - Win Rate report: data_generation/evaluation_results/20250110_120000/win_rate/win_rate_report_20250110_120000.md
+   - Comprehensive report: data_generation/evaluation_results/20250110_120000/comprehensive_report.md
+
+💡 Next Steps:
+   1. View comprehensive report: data_generation/evaluation_results/20250110_120000/comprehensive_report.md
+   2. Run manual verification: python data_generation/human_verification_ui.py data_generation/generated_data/aime_generated_20250110_120000.json
+```
+
+### 12.4.8 Comprehensive Evaluation Report
+
+The system automatically generates comprehensive evaluation reports, summarizing all evaluation results. Below is an example report:
+
+```markdown
+# AIME Data Generation and Evaluation Comprehensive Report
+
+## 1. Basic Information
+
+- **Generation Time**: 2025-01-10 12:00:00
+- **Number of Generated Problems**: 30
+- **Reference AIME Year**: 2025
+
+## 2. Data Generation Statistics
+
+### Topic Distribution
+
+| Topic | Quantity | Proportion |
+|------|------|------|
+| Algebra | 10 | 33.3% |
+| Geometry | 8 | 26.7% |
+| Number Theory | 7 | 23.3% |
+| Combinatorics | 3 | 10.0% |
+| Probability | 2 | 6.7% |
+
+## 3. LLM Judge Evaluation Results
+
+### Overall Score
+
+- **Average Total Score**: 4.2/5.0
+- **Pass Rate**: 85.0% (≥3.5 points)
+- **Excellent Rate**: 40.0% (≥4.5 points)
+
+### Dimension Scores
+
+| Dimension | Average Score | Rating |
+|------|--------|------|
+| Correctness | 4.3/5.0 | Good ⭐⭐⭐⭐ |
+| Clarity | 4.1/5.0 | Good ⭐⭐⭐⭐ |
+| Difficulty Match | 4.0/5.0 | Good ⭐⭐⭐⭐ |
+| Completeness | 4.4/5.0 | Good ⭐⭐⭐⭐ |
+
+## 4. Win Rate Evaluation Results
+
+### Win Rate Statistics
+
+| Metric | Value | Percentage |
+|------|------|--------|
+| Generated Data Wins | 9 times | 45.0% |
+| AIME Real Problems Win | 8 times | 40.0% |
+| Tie | 3 times | 15.0% |
+
+**Win Rate**: 45.0%
+
+✅ **Good**: Generated data quality is close to reference data (gap <10%).
+
+## 5. Comprehensive Conclusion
+
+Based on the results of LLM Judge and Win Rate evaluation methods:
+
+1. **LLM Judge Evaluation**: Average quality of generated data is **4.2/5.0**
+2. **Win Rate Evaluation**: Win rate of generated data relative to AIME 2025 real problems is **45.0%**
+
+✅ **Conclusion**: Generated data quality is **excellent**, reaching or exceeding AIME real problem level. Can be used for practical applications.
+
+## 6. Improvement Suggestions
+
+- ✅ Continue maintaining current generation strategy
+- ✅ Can consider increasing generation quantity
+- ✅ Recommend manual verification to ensure quality
+
+## 7. Next Steps
+
+1. **Manual Verification**: Run `python data_generation/human_verification_ui.py <data_path>` for manual verification
+2. **View Detailed Results**:
+   - LLM Judge detailed report
+   - Win Rate detailed report
+3. **Data Usage**: If quality is satisfactory, generated data can be used for training or testing
+```
+
+Based on practical experience, we summarize the following recommendations:
+
+- **Data generation**: use an appropriate delay between API calls (2-3 seconds; the complete flow defaults to 3.0) to avoid rate limits, enable checkpoint saving to avoid losing progress on interruption, test with a small batch (10 problems) before large-scale generation, and check generation quality regularly so prompts can be adjusted in time.
+- **Evaluation strategy**: combine the methods, using LLM Judge for absolute quality evaluation, Win Rate for relative quality comparison, and manual verification for final quality control.
+- **Quality standards**: aim for an LLM Judge average score above 4.0/5.0, a Win Rate above 45% (close to 50%), a pass rate above 80%, and a manual verification pass rate above 90%.
+- **Iterative optimization**: adjust generation prompts based on evaluation results, analyze common issues in low-scoring problems, learn from the strengths of high-scoring problems, and keep improving the generation strategy.
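
The recommended quality standards can be turned into a simple automated gate. The sketch below is only an illustration: the metric names, the `check_quality` helper, and the sample report are assumptions, and the thresholds are treated as inclusive minimums.

```python
# Recommended minimums from the quality standards above
THRESHOLDS = {
    "average_score": 4.0,         # LLM Judge average (out of 5.0)
    "pass_rate": 0.80,            # LLM Judge pass rate
    "win_rate": 0.45,             # Win Rate against real problems
    "manual_approval_rate": 0.90  # manual verification pass rate
}


def check_quality(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the names of metrics that fall below their recommended minimum.

    A missing metric is treated as 0.0 and therefore reported as failing.
    """
    return [
        name for name, minimum in thresholds.items()
        if metrics.get(name, 0.0) < minimum
    ]


# Hypothetical results of one evaluation run
report = {
    "average_score": 4.2,
    "pass_rate": 0.85,
    "win_rate": 0.45,
    "manual_approval_rate": 0.88,
}
failed = check_quality(report)
print(failed)
```

An empty list means every recommended standard is met; otherwise the returned names indicate where the generation prompts or strategy need attention.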
+
+In this section, we learned how to use the HelloAgents framework to evaluate data generation quality with three methods: LLM Judge evaluation, Win Rate evaluation, and manual verification. Together they form a complete evaluation system that helps ensure the quality of generated data and provides reliable data support for training and testing AI systems.
+
+HelloAgents also ships the LLM Judge and Win Rate evaluations as integrated tools, together with complete example code. If you are interested in the implementation details of these two methods, refer to that example code.
+
+## 12.5 Chapter Summary
+
+In this chapter, we built a complete performance evaluation system for the HelloAgents framework. Let's review the core content learned:
+
+**(1) Evaluation System Overview**
+
+We established a three-tier evaluation system that covers different capability dimensions of agents. First is tool calling capability evaluation (BFCL), which measures the accuracy of agent function calls across four categories (simple, multiple, parallel, irrelevance) and uses AST matching for precise evaluation. Second is general capability evaluation (GAIA), which measures comprehensive problem-solving ability on 466 real-world problems at three difficulty levels, focusing on multi-step reasoning, tool usage, file processing, and related capabilities. Third is data generation quality evaluation (AIME), which assesses LLM-generated data with the LLM Judge and Win Rate methods, supports manual verification and comprehensive report generation, and checks that generated data reaches the quality of the reference data.
+
+**(2) Core Technical Points**
+
+The implementation rests on six technical points. First is modular design: the evaluation system uses a three-tier architecture consisting of a data layer (Dataset, responsible for loading and managing data), an evaluation layer (Evaluator, responsible for running the evaluation flow), and a metrics layer (Metrics, responsible for calculating the evaluation metrics). Second is tool encapsulation: every evaluation function is wrapped as a Tool, so it can be called directly by agents, integrated into workflows, or used through a unified interface. Third is AST matching: function calls are compared as abstract syntax trees, which is more robust than plain string matching because it ignores parameter order and formatting differences and recognizes equivalent expressions. Fourth is multimodal support: GAIA evaluation handles text questions, file attachments, image inputs, and other multimodal data. Fifth is LLM Judge evaluation: an LLM acts as judge of generated data quality, providing multi-dimensional scoring (correctness, clarity, difficulty matching, completeness), an automated evaluation flow, and detailed evaluation reports, with support for custom evaluation dimensions and standards. Sixth is Win Rate comparison: generation quality is evaluated through pairwise comparison of generated data against reference data, where an LLM judges which item is better and the win rate is computed from the results; a win rate close to 50% indicates equivalent quality.
+
+**(3) Extension Directions**
+
+Based on this chapter's evaluation system, you can extend in four directions. First, add new evaluation benchmarks: follow the BFCL and GAIA implementation pattern, implement the three components Dataset, Evaluator, and Metrics, and wrap them as a Tool. Second, define custom evaluation metrics: add new calculation methods to the Metrics class and design metrics for your specific application scenario. Third, integrate evaluation into the CI/CD flow: run evaluation automatically on code commits, set performance thresholds to prevent regressions, and generate and archive evaluation reports. Fourth, extend data generation evaluation: support more data types (code, dialogue, documents, etc.), add more evaluation dimensions (innovation, diversity, etc.), integrate more reference datasets, and support multi-model comparison.
+
+**Congratulations on completing Chapter 12!** 🎉
+
+Evaluation is an important part of agent development. It allows us to:
+
+- Objectively measure agent capabilities
+- Discover and fix issues
+- Continuously improve systems
+
+In the next chapter, we will explore how to apply the HelloAgents framework to actual projects.
+
+**Keep going!** 💪
+
+## Exercises
+
+> **Hint**: Some exercises have no standard answers; they aim to develop learners' comprehensive understanding of, and practical ability in, agent performance evaluation.
+
+1. This chapter introduced multiple agent evaluation benchmarks. Please analyze:
+
+   - In Section 12.1.2, BFCL, GAIA, AgentBench and other evaluation benchmarks were introduced. Please compare BFCL and GAIA: What core capabilities of agents do they evaluate respectively? Why does BFCL use AST matching algorithm while GAIA uses Quasi Exact Match? What are the advantages and disadvantages of these two evaluation methods?
+   - Suppose you want to build an "intelligent customer service system" that needs to evaluate the following capabilities: (1) accuracy of understanding user intent; (2) correctness of calling backend APIs; (3) friendliness and professionalism of responses; (4) robustness in handling exceptional situations. Please select or design appropriate evaluation metrics and methods for each capability.
+   - In Section 12.1.1, it was mentioned that agent evaluation faces three major challenges: "output uncertainty", "evaluation standard diversity", and "high evaluation cost". Please propose specific solutions for each challenge and analyze the feasibility and limitations of the solutions.
+
+2. BFCL (Berkeley Function Calling Leaderboard) is an important benchmark for evaluating tool calling capabilities. Based on Section 12.2 content, please think deeply:
+
+   > **Hint**: This is a hands-on exercise; working through it in practice is recommended.
+
+   - In the AST matching algorithm in Section 12.2.3, we judge whether function calls are correct by comparing abstract syntax trees. Please analyze: Why is AST matching more suitable than simple string matching? In what situations might AST matching produce misjudgments (false positives or false negatives)? How to improve the AST matching algorithm to increase accuracy?
+   - The BFCL dataset contains four categories: simple, multiple, parallel, and irrelevance. Please design 2-3 new test samples for each category that test boundary cases or error-prone scenarios in that category.
+   - Please extend the BFCL evaluator based on the code in Section 12.2.4, adding the following functions: (1) support evaluating execution order of tool calls (for multiple tool calls with dependencies); (2) evaluate tool calling efficiency (such as whether minimum number of calls was used); (3) generate detailed error analysis report (such as which types of errors are most common).
+
+3. GAIA (General AI Assistants) evaluates agent comprehensive capabilities. Based on Section 12.3 content, please complete the following extension practice:
+
+   > **Hint**: This is a hands-on exercise; working through it in practice is recommended.
+
+   - In Section 12.3.2, three difficulty levels of GAIA (Level 1/2/3) were introduced. Please analyze: What are the differences between these three levels in task complexity, required capabilities, evaluation standards, etc.? If designing Level 4 (ultra-high difficulty), what types of tasks should it include?
+   - GAIA uses "Quasi Exact Match" algorithm to evaluate answer correctness. Please analyze: How does this method handle answer diversity (such as "42", "forty-two", "42.0" should all be considered correct)? In what situations might quasi exact match not be sufficient? Please design a more intelligent answer matching algorithm that can handle semantically equivalent answers.
+   - Please implement a "custom GAIA evaluation set" based on the code in Section 12.3.4: select a specific domain (such as medical, legal, or financial), design 10 real-world questions, and implement the complete evaluation flow. The questions should cover different difficulty levels and come with standard answers and scoring criteria.
+
+4. LLM Judge is an emerging method of using large language models for evaluation. Based on Section 12.4 content, please analyze in depth:
+
+   - In Section 12.4.2, we used GPT-4 as judge to evaluate agent response quality. Please analyze: What advantages does LLM Judge have compared to traditional rule matching or metric calculation? What potential biases or limitations does it have (such as preference for certain response styles, sensitivity to length)?
+   - LLM Judge scoring criteria design is crucial. Please design detailed scoring criteria (including scoring dimensions, weights, examples) for the following three different evaluation scenarios: (1) code generation quality evaluation; (2) creative writing quality evaluation; (3) technical documentation quality evaluation.
+   - In Section 12.4.3, it was mentioned that multiple LLM Judges can be used for "jury-style" evaluation. Please design a "multi-judge evaluation system": using 3-5 different LLMs (such as GPT-4, Claude, Qwen) as judges, how to aggregate their scores? How to handle disagreements between judges? How to detect and filter abnormal scores?
+
+5. Practical application of agent evaluation needs to consider multiple aspects. Please think:
+
+   - In actual projects, evaluation often needs to balance between "evaluation cost" and "evaluation quality". Please design a "tiered evaluation strategy": (1) quick evaluation (low cost, for daily development iteration); (2) standard evaluation (medium cost, for pre-release); (3) comprehensive evaluation (high cost, for major updates or public release). What evaluation items should each tier include? How to design evaluation flow?
+   - Agent performance may change over time (such as changes in dependent external APIs, changes in user needs). Please design a "continuous evaluation system": able to periodically automatically run evaluation, monitor agent performance change trends, and alert in time when performance declines. What components should this system include? How to design alert rules?
+   - Evaluation results need to be presented clearly to different audiences (such as developers, product managers, users). Please design an "evaluation report generation system": able to automatically generate reports with different levels of detail based on audience type. What technical details should developer reports include? What business metrics should product manager reports highlight? How should user reports be simplified and visualized?
+
+## References
+
+[1] Patil, S. G., Zhang, T., Wang, X., & Gonzalez, J. E. (2023). Gorilla: Large Language Model Connected with Massive APIs. arXiv preprint arXiv:2305.15334.
+
+[2] Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., ... & Sun, M. (2023). ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. arXiv preprint arXiv:2307.16789.
+
+[3] Li, M., Zhao, Y., Yu, B., Song, F., Li, H., Yu, H., ... & Li, Y. (2023). API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs. arXiv preprint arXiv:2304.08244.
+
+[4] Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., ... & Scialom, T. (2023). GAIA: a benchmark for General AI Assistants. arXiv preprint arXiv:2311.12983.
+
+[5] Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., ... & Zhang, D. (2023). AgentBench: Evaluating LLMs as Agents. arXiv preprint arXiv:2308.03688.
+
+[6] Zhou, S., Xu, F. F., Zhu, H., Zhou, X., Lo, R., Sridhar, A., ... & Neubig, G. (2023). WebArena: A Realistic Web Environment for Building Autonomous Agents. arXiv preprint arXiv:2307.13854.
+
+[7] Chan, C. M., Chen, W., Su, Y., Yu, J., Xue, W., Zhang, S., ... & Liu, Z. (2023). ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate. arXiv preprint arXiv:2308.07201.
+
+[8] Zhou, X., Zhu, H., Mathur, L., Zhang, R., Yu, H., Qi, Z., ... & Neubig, G. (2023). SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents. arXiv preprint arXiv:2310.11667.
+
+[9] Mathematical Association of America. (2024). American Invitational Mathematics Examination (AIME). Retrieved from https://www.maa.org/math-competitions/invitational-competitions/aime
+

File diff suppressed because it is too large
+ 179 - 175
docs/chapter12/第十二章 智能体性能评估.md


+ 1583 - 0
docs/chapter13/Chapter13-Intelligent-Travel-Assistant.md

@@ -0,0 +1,1583 @@
+<div align="right">
+  English | <a href="./第十三章%20智能旅行助手.md">中文</a>
+</div>
+
+# Chapter 13 Intelligent Travel Assistant
+
+In previous chapters, we built the HelloAgents framework from scratch, implementing core functionalities including various agent paradigms, tool systems, memory mechanisms, protocol communication, and performance evaluation. Starting from this chapter, we will enter a completely new phase: **integrating all learned knowledge to build complete practical applications.**
+
+Do you remember the first agent we built in Chapter 1? It was a simple intelligent travel assistant that demonstrated the basic principles of the `Thought-Action-Observation` loop. The intelligent travel assistant in this chapter will be a complete project, including the following core functions:
+
+**(1) Intelligent Itinerary Planning**: Users input destination, dates, preferences and other information, and the system automatically generates a complete itinerary plan including attractions, dining, and hotels.
+
+**(2) Map Visualization**: Mark attraction locations on the map and draw tour routes, making the itinerary clear at a glance.
+
+**(3) Budget Calculation**: Automatically calculate ticket, hotel, dining, and transportation costs, displaying budget details.
+
+**(4) Itinerary Editing**: Support adding, deleting, and adjusting attractions, updating the map in real-time.
+
+**(5) Export Function**: Support exporting as PDF or image, convenient for saving and sharing.
+
+## 13.1 Project Overview and Architecture Design
+
+### 13.1.1 Why We Need an Intelligent Travel Assistant
+
+Planning a trip is both exciting and frustrating. You need to search for attraction information online, compare different guides, check weather forecasts, book hotels, calculate budgets, and plan routes. This process may take several hours or even days. And even after spending so much time, you're not sure whether the planned itinerary is reasonable, whether you've missed any important attractions, or whether the budget is accurate.
+
+Traditional travel planning has several pain points. The first is **scattered information**: attraction information is on travel websites, weather information is on weather websites, and hotel information is on booking websites, so you need to switch between multiple sites and integrate the information by hand. The second is **lack of personalization**: most guides are generic and do not consider your personal preferences, budget constraints, travel time, and other factors. The last is **difficulty in adjustment**: when you want to modify the itinerary, you may need to replan the entire trip, because the order of attractions, the time arrangements, and the budget are all interconnected.
+
+AI technology provides new possibilities for solving these problems. Imagine that you only need to tell the system "I want to visit Beijing for 3 days, like history and culture, medium budget", and the system can automatically generate a complete itinerary plan for you, including which attractions to visit each day, where to eat, which hotel to stay at, and how much budget is needed. Moreover, this plan is adjustable - you can delete attractions you don't like, adjust the tour order, and the system will automatically update the map and budget.
+
+This is the intelligent travel assistant we want to build. It's not just a technical demonstration, but a truly useful application. Through this project, you will learn how to apply AI technology to practical problems, how to design multi-agent systems, and how to build complete Web applications.
+
+### 13.1.2 Technical Architecture Overview
+
+The system adopts the classic **front-end and back-end separation architecture**, divided into four layers, as shown in Figure 13.1:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-1.png" alt="" width="85%"/>
+  <p>Figure 13.1 Intelligent Travel Assistant Technical Architecture</p>
+</div>
+
+**(1) Front-end Layer (Vue3+TypeScript)**: Responsible for user interaction and data display, including form input, result display, and map visualization.
+
+**(2) Back-end Layer (FastAPI)**: Responsible for API routing, data validation, and business logic.
+
+**(3) Agent Layer (HelloAgents)**: Responsible for task decomposition, tool invocation, and result integration. Includes 4 specialized Agents.
+
+**(4) External Service Layer**: Provides data and capabilities, including Amap API, Unsplash API, and LLM API.
+
+The data flows as follows: the user fills out the form on the front-end → the back-end validates the data → the back-end calls the agent system → the agent system calls the attraction search, weather query, hotel recommendation, and itinerary planning Agents in sequence → each Agent calls external APIs through the MCP protocol → the results are integrated and returned to the front-end → the front-end renders and displays them.
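
The sequential hand-off between the four Agents can be sketched as a simple pipeline. This is only an illustration under stated assumptions: the step functions are stubs that stand in for real Agents, and every name here (`search_attractions`, `run_pipeline`, the shape of the shared `context` dict) is hypothetical rather than taken from the project source.

```python
from typing import Callable

# Each step stands in for one specialized Agent. It receives the shared
# context and returns an extended copy of it.
Step = Callable[[dict], dict]


def search_attractions(context: dict) -> dict:
    return {**context, "attractions": [f"Top sights in {context['city']}"]}


def query_weather(context: dict) -> dict:
    return {**context, "weather": "placeholder forecast"}


def recommend_hotels(context: dict) -> dict:
    return {**context, "hotels": ["placeholder hotel"]}


def plan_itinerary(context: dict) -> dict:
    # The planner runs last because it consumes the three earlier results.
    plan = {
        "city": context["city"],
        "attractions": context["attractions"],
        "weather": context["weather"],
        "hotels": context["hotels"],
    }
    return {**context, "plan": plan}


PIPELINE: list[Step] = [
    search_attractions,
    query_weather,
    recommend_hotels,
    plan_itinerary,
]


def run_pipeline(request: dict) -> dict:
    """Pass an already validated request through every step in order."""
    context = dict(request)
    for step in PIPELINE:
        context = step(context)
    return context["plan"]


if __name__ == "__main__":
    print(run_pipeline({"city": "Beijing", "days": 3}))
```

Keeping the steps in an ordered list makes the dependency explicit: the planner can only run once the attraction, weather, and hotel results are in the context.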
+
+The project structure is shown below for reference, to make the source code easier to navigate:
+```
+helloagents-trip-planner/
+├── backend/                    # Backend code
+│   ├── app/
+│   │   ├── agents/            # Agent implementation
+│   │   ├── api/               # API routes
+│   │   ├── models/            # Data models
+│   │   ├── services/          # Service layer
+│   │   └── config.py          # Configuration file
+│   └── requirements.txt       # Python dependencies
+│
+└── frontend/                   # Frontend code
+    ├── src/
+    │   ├── views/             # Page components
+    │   ├── services/          # API services
+    │   ├── types/             # Type definitions
+    │   └── router/            # Route configuration
+    └── package.json           # npm dependencies
+```
+
+Detailed architecture design and data flow will be introduced in subsequent sections.
+
+### 13.1.3 Quick Experience: Run the Project in 5 Minutes
+
+Before diving into implementation details, let's first run the project to see the final effect. This way you will have an intuitive understanding of the entire system.
+
+**Environment Requirements:**
+
+- Python 3.10 or higher
+- Node.js 16.0 or higher
+- npm 8.0 or higher
+
+**Obtain API Keys:**
+
+You need to prepare the following API keys:
+
+- LLM API (OpenAI, DeepSeek, etc.)
+- Amap Web Service Key: Visit https://console.amap.com/ to register and create an application
+- Unsplash Access Key: Visit https://unsplash.com/developers to register and create an application
+
+Put all API keys in the `.env` file.
+
+Start the backend:
+
+```bash
+# 1. Enter backend directory
+cd helloagents-trip-planner/backend
+
+# 2. Install dependencies
+pip install -r requirements.txt
+
+# 3. Configure environment variables
+cp .env.example .env
+# Edit .env file, fill in your API keys
+
+# 4. Start backend service
+uvicorn app.api.main:app --reload
+# or
+python run.py
+```
+
+After successful startup, visit http://localhost:8000/docs to see the API documentation.
+
+Open a new terminal window:
+
+```bash
+# 1. Enter frontend directory
+cd helloagents-trip-planner/frontend
+
+# 2. Install dependencies
+npm install
+
+# 3. Start frontend service
+npm run dev
+```
+
+After successful startup, visit http://localhost:5173 to use the application.
+
+Experience core functions:
+
+First, fill in the destination city, travel dates, preferences, budget, transportation and accommodation types in the homepage form. After clicking the "Start Planning" button, the system will display a loading progress bar and quickly generate a result page, as shown in Figure 13.2.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-2.png" alt="" width="85%"/>
+  <p>Figure 13.2 Travel Assistant Planning Progress Page</p>
+</div>
+
+After successful loading, the page will clearly display itinerary overview, budget details, attraction map, daily itinerary details and weather information, as shown in Figures 13.3 and 13.4.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-3.png" alt="" width="85%"/>
+  <p>Figure 13.3 Travel Assistant Planning Completion Page</p>
+</div>
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-4.png" alt="" width="85%"/>
+  <p>Figure 13.4 Travel Assistant Planning Completion Page</p>
+</div>
+
+If users need personalized adjustments, they can click the "Edit Itinerary" button to freely adjust the order of attractions or delete certain attractions, as shown in Figure 13.5. After planning is complete, through the "Export Itinerary" dropdown menu, the final plan can be easily saved as an image or PDF file for convenient reference at any time.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-5.png" alt="" width="85%"/>
+  <p>Figure 13.5 Travel Assistant Planning Completion Page</p>
+</div>
+
+## 13.2 Data Model Design
+
+### 13.2.1 Data Flow in Web Applications
+
+When building an intelligent travel assistant, we need to solve a core problem: **How to represent and transfer travel plan data?**
+
+To answer that, we first need to understand how data flows through a complete Web application. What happens when a user clicks the "Start Planning" button in the browser?
+
+The form data filled in by the user on the front-end (destination, dates, budget, etc.) needs to be sent to the back-end server through HTTP requests. After the back-end receives the data, it will call the agent system for processing. The agents will then call external services such as Amap API and Unsplash API to obtain data. The data formats returned by these external APIs are different - some use `lng`, some use `lon`, and some use `longitude`. Finally, the back-end needs to return the processed data to the front-end, which then renders it into the page the user sees.
+
+In this process, data undergoes multiple transformations: Front-end form → HTTP request → Back-end Python object → External API response → Back-end Python object → HTTP response → Front-end TypeScript object → Page display. Without a unified data format, each transformation step could go wrong. This is why we need **data models**.
+
+### 13.2.2 From Dictionaries to Pydantic Models
+
+Let's start with the simple prototype from Chapter 1. In that prototype, we used Python dictionaries to represent attraction data:
+
+```python
+# Chapter 1 approach: using dictionaries
+attraction = {
+    "name": "Forbidden City",
+    "location": {"lng": 116.397128, "lat": 39.916527},
+    "price": 60
+}
+
+# Access data
+lng = attraction["location"]["lng"]
+```
+
+This approach is convenient in the prototype stage, but will encounter many problems in actual projects. First is the problem of **inconsistent field names**. The location data returned by Amap API is a string like `"116.397128,39.916527"`, which needs to be manually split into longitude and latitude. Unsplash API might use `longitude` and `latitude`. If we use dictionaries everywhere in the code, we need to handle these differences in every place.
+
+Second is the problem of **type safety**. Suppose we accidentally set `price` to the string `"60"`. Python does not raise an error at that point, but the mistake causes problems later when we calculate the total budget. Worse, this kind of error is only discovered at runtime, and the error message may point far away from the place where the bad value was introduced.
+
+Finally is the problem of **maintainability**. When we need to add a new field to attractions (such as `rating`), we need to modify multiple places in the code. If we miss somewhere, it will lead to data inconsistency.
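
The type-safety problem is easy to reproduce with plain dictionaries. In the sketch below the second attraction and its price are illustrative values, and `total_ticket_cost` is a hypothetical helper; the point is only that the bad value is accepted silently and fails later, inside the budget calculation.

```python
# Attraction data as plain dictionaries, as in the Chapter 1 prototype.
attractions = [
    {"name": "Forbidden City", "price": 60},
    {"name": "Example Park", "price": "35"},  # stored as a string by mistake
]


def total_ticket_cost(items: list[dict]) -> int:
    """Add up ticket prices; assumes every price is a number."""
    return sum(item["price"] for item in items)


# Creating the list raised no error. The failure only appears here, at
# runtime, when the string reaches the addition inside sum().
try:
    total_ticket_cost(attractions)
except TypeError as exc:
    print("budget calculation failed:", exc)
```

Nothing in the error message names the attraction that carried the bad value, which is what makes this kind of bug slow to track down.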
+
+Pydantic provides a solution. It is a Python data validation library that allows us to define data structures using classes and automatically handle validation, conversion, and serialization. Let's look at a simple example:
+
+```python
+from pydantic import BaseModel, Field
+
+class Location(BaseModel):
+    longitude: float = Field(..., description="Longitude")
+    latitude: float = Field(..., description="Latitude")
+
+class Attraction(BaseModel):
+    name: str
+    location: Location
+    ticket_price: int = 0
+
+# Create object
+attraction = Attraction(
+    name="Forbidden City",
+    location=Location(longitude=116.397128, latitude=39.916527),
+    ticket_price=60
+)
+
+# Type-safe access
+lng = attraction.location.longitude  # IDE will provide code completion
+```
+
+This approach has several benefits. First, if we pass in a value that cannot be interpreted as the declared type (such as setting `ticket_price` to `"sixty"`), Pydantic immediately raises a `ValidationError` that says which field failed and why. Note that in its default (lax) mode Pydantic converts compatible values, so the numeric string `"60"` is accepted and stored as the integer `60`. Second, the IDE can provide code completion and type checking based on the type definitions, greatly reducing spelling errors. Finally, when we need to modify the data structure, the change is made in one place, the class definition, instead of being scattered across dictionary literals throughout the code.
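
It is worth checking how strict this validation is in practice. The sketch below assumes Pydantic's default (lax) mode and uses a trimmed-down `Attraction` model: a numeric string is converted to an integer, while a string that cannot be read as an integer is rejected when the object is created.

```python
from pydantic import BaseModel, ValidationError


class Attraction(BaseModel):
    name: str
    ticket_price: int = 0


# A numeric string is converted to int in the default (lax) mode.
coerced = Attraction(name="Forbidden City", ticket_price="60")
print(type(coerced.ticket_price).__name__, coerced.ticket_price)

# A value that cannot be read as an integer is rejected at construction time.
try:
    Attraction(name="Forbidden City", ticket_price="sixty")
except ValidationError as exc:
    first_error = exc.errors()[0]
    print("rejected field:", first_error["loc"])
```

Either way, the value stored on the model is an `int`, so a later budget calculation never sees a string.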
+
+### 13.2.3 Core Concepts of Pydantic
+
+Before diving into designing our data models, let's first understand several core concepts of Pydantic. The foundation of Pydantic is the `BaseModel` class, and all data models need to inherit from this class. Each field can specify a type, and Pydantic will automatically perform type checking and conversion.
+
+Fields are defined with the `Field` function, which can specify default values, descriptions, validation rules, and more. Passing `...` as the default marks the field as required: if it is not provided when creating an object, Pydantic raises a validation error. For a field that may be absent, either give it a default value or annotate it as `Optional[...]` with a default of `None`.
+
+```python
+from pydantic import BaseModel, Field
+from typing import Optional, List
+
+class Attraction(BaseModel):
+    name: str = Field(..., description="Attraction name")  # Required
+    rating: float = Field(default=0.0, ge=0, le=5)  # Default value, range validation
+    visit_duration: int = Field(default=60, gt=0)  # Greater than 0
+    description: Optional[str] = None  # Optional field
+```
+
+Pydantic also supports nested models and lists. We can use another model as a field type in one model, allowing us to build complex data structures. For example, an attraction contains location information, and an itinerary contains multiple attractions.
+
+```python
+class DayPlan(BaseModel):
+    date: str
+    attractions: List[Attraction]  # Attraction list
+    hotel: Optional[Hotel] = None  # Optional hotel information
+```
+
+One of the most powerful features is **custom validators**. Sometimes the data format returned by external APIs doesn't meet our requirements, and we can use the `field_validator` decorator to customize validation and conversion logic. For example, the temperature returned by Amap is a string like `"16°C"`, and we need to convert it to a number:
+
+```python
+from pydantic import field_validator
+
+class WeatherInfo(BaseModel):
+    temperature: int
+
+    @field_validator('temperature', mode='before')
+    @classmethod
+    def parse_temperature(cls, v):
+        """Parse temperature string: "16°C" -> 16"""
+        if isinstance(v, str):
+            v = v.replace('°C', '').replace('℃', '').strip()
+            return int(v)
+        return v
+```
+
+Because of `mode='before'`, this validator runs on the raw input before Pydantic's own type validation and converts the string to an integer. This way we don't need to handle the temperature format by hand everywhere in the code.
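
To see a "before" validator in action, here is a small self-contained variant, assuming Pydantic v2 (where `field_validator` is available). Unlike the version above, this sketch only strips the unit suffix and leaves the string-to-integer conversion to Pydantic itself, which is a design choice of the sketch and not necessarily how the project code is written.

```python
from pydantic import BaseModel, ValidationError, field_validator


class WeatherInfo(BaseModel):
    temperature: int

    @field_validator("temperature", mode="before")
    @classmethod
    def strip_unit(cls, v):
        """Remove a Celsius suffix from string input, e.g. "16°C" -> "16"."""
        if isinstance(v, str):
            return v.replace("°C", "").replace("℃", "").strip()
        return v


# String input with either Celsius symbol, and plain integer input.
print(WeatherInfo(temperature="16°C").temperature)
print(WeatherInfo(temperature="16℃").temperature)
print(WeatherInfo(temperature=-3).temperature)

# Input that is not a temperature at all still fails validation.
try:
    WeatherInfo(temperature="warm")
except ValidationError as exc:
    print("rejected field:", exc.errors()[0]["loc"])
```

Leaving the final conversion to Pydantic means malformed input surfaces as a normal validation error instead of being handled inside the validator.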
+
+### 13.2.4 Bottom-Up Model Design
+
+Now let's start designing the data models for the intelligent travel assistant. A good design principle is **bottom-up**: first define the most basic models, then gradually combine them into complex structures. The advantage of this approach is that each model is simple, easy to understand and maintain.
+
+The most basic model is **location information**. Whether it's attractions, hotels, or restaurants, all need location information. We define a `Location` class to represent longitude and latitude coordinates:
+
+```python
+class Location(BaseModel):
+    """Location information (longitude and latitude coordinates)"""
+    longitude: float = Field(..., description="Longitude", ge=-180, le=180)
+    latitude: float = Field(..., description="Latitude", ge=-90, le=90)
+```
+
+Here we use range validation (`ge` means greater than or equal to, `le` means less than or equal to) to ensure longitude and latitude values are within reasonable ranges.
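
The range checks also catch a common mistake: swapped coordinates. The sketch below repeats a minimal `Location` model and adds `location_from_lnglat`, a hypothetical helper that builds a `Location` from a `"longitude,latitude"` string of the kind quoted for Amap earlier in this chapter.

```python
from pydantic import BaseModel, Field, ValidationError


class Location(BaseModel):
    longitude: float = Field(..., ge=-180, le=180)
    latitude: float = Field(..., ge=-90, le=90)


def location_from_lnglat(text: str) -> Location:
    """Build a Location from a "longitude,latitude" string (hypothetical helper)."""
    lng_text, lat_text = text.split(",")
    return Location(longitude=float(lng_text), latitude=float(lat_text))


forbidden_city = location_from_lnglat("116.397128,39.916527")
print(forbidden_city.longitude, forbidden_city.latitude)

# Swapping the two values puts 116.397128 into latitude, which is outside
# the allowed range of -90 to 90, so the model rejects it.
try:
    Location(longitude=39.916527, latitude=116.397128)
except ValidationError as exc:
    print("rejected field:", exc.errors()[0]["loc"])
```

Doing the string splitting in one helper keeps the format difference at the boundary, so the rest of the code only ever sees validated `Location` objects.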
+
+Next is **attraction information**. An attraction contains name, address, location, visit duration, description, rating, image, and ticket price information. Note that we use `Location` as a field type, which is a nested model:
+
+```python
+class Attraction(BaseModel):
+    """Attraction information"""
+    name: str = Field(..., description="Attraction name")
+    address: str = Field(..., description="Address")
+    location: Location = Field(..., description="Longitude and latitude coordinates")
+    visit_duration: int = Field(..., description="Recommended visit duration (minutes)", gt=0)
+    description: str = Field(..., description="Attraction description")
+    category: Optional[str] = Field(default="Attraction", description="Attraction category")
+    rating: Optional[float] = Field(default=None, ge=0, le=5, description="Rating")
+    image_url: Optional[str] = Field(default=None, description="Image URL")
+    ticket_price: int = Field(default=0, ge=0, description="Ticket price (yuan)")
+```
+
+Similarly, we define **meal information** and **hotel information**. These models have similar structures, all containing basic information such as name, address, location, and cost:
+
+```python
+class Meal(BaseModel):
+    """Meal information"""
+    type: str = Field(..., description="Meal type: breakfast/lunch/dinner/snack")
+    name: str = Field(..., description="Meal name")
+    address: Optional[str] = Field(default=None, description="Address")
+    location: Optional[Location] = Field(default=None, description="Longitude and latitude coordinates")
+    description: Optional[str] = Field(default=None, description="Description")
+    estimated_cost: int = Field(default=0, description="Estimated cost (yuan)")
+
+class Hotel(BaseModel):
+    """Hotel information"""
+    name: str = Field(..., description="Hotel name")
+    address: str = Field(default="", description="Hotel address")
+    location: Optional[Location] = Field(default=None, description="Hotel location")
+    price_range: str = Field(default="", description="Price range")
+    rating: str = Field(default="", description="Rating")
+    distance: str = Field(default="", description="Distance to attractions")
+    type: str = Field(default="", description="Hotel type")
+    estimated_cost: int = Field(default=0, description="Estimated cost (yuan/night)")
+```
+
+**Budget information** is a special model that doesn't contain location information, but contains a summary of various expenses:
+
+```python
+class Budget(BaseModel):
+    """Budget information"""
+    total_attractions: int = Field(default=0, description="Total attraction ticket cost")
+    total_hotels: int = Field(default=0, description="Total hotel cost")
+    total_meals: int = Field(default=0, description="Total meal cost")
+    total_transportation: int = Field(default=0, description="Total transportation cost")
+    total: int = Field(default=0, description="Total cost")
+```
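
One way to fill this model is to sum each category and derive `total` from the parts, so the grand total cannot drift from the category totals. The sketch below is an assumption about how the numbers might be assembled: `build_budget` is a hypothetical helper and the cost figures are illustrative.

```python
from pydantic import BaseModel, Field


class Budget(BaseModel):
    total_attractions: int = Field(default=0)
    total_hotels: int = Field(default=0)
    total_meals: int = Field(default=0)
    total_transportation: int = Field(default=0)
    total: int = Field(default=0)


def build_budget(
    ticket_prices: list[int],
    hotel_costs: list[int],
    meal_costs: list[int],
    transportation_costs: list[int],
) -> Budget:
    """Sum each expense category and derive the grand total from the parts."""
    attractions = sum(ticket_prices)
    hotels = sum(hotel_costs)
    meals = sum(meal_costs)
    transportation = sum(transportation_costs)
    return Budget(
        total_attractions=attractions,
        total_hotels=hotels,
        total_meals=meals,
        total_transportation=transportation,
        total=attractions + hotels + meals + transportation,
    )


# Illustrative figures in yuan: two tickets, two hotel nights, three meals,
# and one transportation expense.
budget = build_budget([60, 40], [400, 400], [50, 80, 120], [30])
print(budget.model_dump() if hasattr(budget, "model_dump") else budget.dict())
```

Computing `total` in a single place also gives the itinerary editor one function to call again whenever an attraction is added or removed.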
+
+Now we can combine these basic models to build a **daily itinerary**. A daily itinerary contains date, description, transportation method, accommodation arrangement, hotel, attraction list, and meal list:
+
+```python
+class DayPlan(BaseModel):
+    """Daily itinerary"""
+    date: str = Field(..., description="Date")
+    day_index: int = Field(..., description="Day number (starting from 0)")
+    description: str = Field(..., description="Daily itinerary description")
+    transportation: str = Field(..., description="Transportation method")
+    accommodation: str = Field(..., description="Accommodation arrangement")
+    hotel: Optional[Hotel] = Field(default=None, description="Hotel information")
+    attractions: List[Attraction] = Field(default_factory=list, description="Attraction list")
+    meals: List[Meal] = Field(default_factory=list, description="Meal arrangements")
+```
+
+Note that we use `List[Attraction]` to represent the attraction list. `default_factory=list` tells Pydantic to call `list()` whenever the field is omitted, so each `DayPlan` gets its own fresh empty list as the default value.
+
+**Weather information** requires special handling because the temperature format returned by Amap is non-standard. We use a custom validator to handle this:
+
+```python
+class WeatherInfo(BaseModel):
+    """Weather information"""
+    date: str = Field(..., description="Date")
+    day_weather: str = Field(..., description="Daytime weather")
+    night_weather: str = Field(..., description="Nighttime weather")
+    day_temp: int = Field(..., description="Daytime temperature (Celsius)")
+    night_temp: int = Field(..., description="Nighttime temperature (Celsius)")
+    wind_direction: str = Field(..., description="Wind direction")
+    wind_power: str = Field(..., description="Wind power")
+
+    @field_validator('day_temp', 'night_temp', mode='before')
+    @classmethod
+    def parse_temperature(cls, v):
+        """Parse temperature string: "16°C" -> 16"""
+        if isinstance(v, str):
+            v = v.replace('°C', '').replace('℃', '').replace('°', '').strip()
+            try:
+                return int(v)
+            except ValueError:
+                return 0  # Error tolerance
+        return v
+```
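
To see what the validator does without Pydantic, the same parsing logic can be written as a plain function and run on a few inputs. This is a minimal sketch: the function mirrors the validator body above, and the sample values are invented for illustration.

```python
def parse_temperature(value):
    """Strip unit symbols from a temperature string and convert it to int."""
    if isinstance(value, str):
        cleaned = value.replace("°C", "").replace("℃", "").replace("°", "").strip()
        try:
            return int(cleaned)
        except ValueError:
            return 0  # same error tolerance as the validator
    return value  # non-string input (e.g. an int) passes through unchanged


samples = ["16°C", "-3℃", " 25° ", "n/a", 18]
parsed = [parse_temperature(sample) for sample in samples]
print(parsed)  # [16, -3, 25, 0, 18]
```

One consequence of this design is that an unparseable value becomes `0`, which cannot be told apart from a real reading of 0 degrees. If that distinction matters, the field could be made `Optional[int]` and the fallback changed to `None`.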
+
+Finally, we define the **complete travel plan**. This is the top-level model that contains all information:
+
+```python
+class TripPlan(BaseModel):
+    """Travel plan"""
+    city: str = Field(..., description="Destination city")
+    start_date: str = Field(..., description="Start date")
+    end_date: str = Field(..., description="End date")
+    days: List[DayPlan] = Field(default_factory=list, description="Daily itinerary")
+    weather_info: List[WeatherInfo] = Field(default_factory=list, description="Weather information")
+    overall_suggestions: str = Field(..., description="Overall suggestions")
+    budget: Optional[Budget] = Field(default=None, description="Budget information")
+```
+
+This completes the design of the entire data model. Starting from the most basic `Location`, moving up through `Attraction`, `Meal`, and `Hotel`, then `DayPlan`, and finally `TripPlan`, the models form a clear hierarchical structure.
+
+### 13.2.5 Application of Data Models in Web Applications
+
+Now let's see how these data models are used in actual Web applications. In FastAPI, Pydantic models can be directly used as type definitions for requests and responses. FastAPI will automatically perform data validation, serialization, and documentation generation.
+
+```python
+from fastapi import FastAPI
+from app.models.schemas import TripPlanRequest, TripPlan
+
+app = FastAPI()
+
+@app.post("/api/trip/plan", response_model=TripPlan)
+async def create_trip_plan(request: TripPlanRequest) -> TripPlan:
+    """
+    Create travel plan
+
+    FastAPI automatically:
+    1. Validates request data (TripPlanRequest)
+    2. Validates response data (TripPlan)
+    3. Generates OpenAPI documentation
+    """
+    trip_plan = await generate_trip_plan(request)
+    return trip_plan
+```
+
+When a user sends a POST request to `/api/trip/plan`, FastAPI automatically converts the JSON data into a `TripPlanRequest` object. If the data is invalid (for example, a required field is missing or a type does not match), FastAPI automatically returns a 422 (Unprocessable Entity) error whose body tells the user where the error is.
+
+On the front-end, we also need to define corresponding TypeScript types. Although TypeScript and Python are different languages, the data structures are the same:
+
+```typescript
+interface Location {
+  longitude: number;
+  latitude: number;
+}
+
+interface Attraction {
+  name: string;
+  address: string;
+  location: Location;
+  visit_duration: number;
+  ticket_price: number;
+}
+
+interface TripPlan {
+  city: string;
+  start_date: string;
+  end_date: string;
+  days: DayPlan[];
+}
+```
+
+This way, the front-end and back-end use a unified data format. When the back-end returns a `TripPlan` object, the front-end can use it directly without any conversion. TypeScript's type checking can also help us avoid many errors.
+
+## 13.3 Multi-Agent Collaboration Design
+
+### 13.3.1 Why We Need Multi-Agent
+
+In Chapter 7, we learned how to build agents using SimpleAgent. The design philosophy of SimpleAgent is simple and direct: each time the `run()` method is called, the Agent analyzes the user's question, decides whether to call tools, and then returns the result. This design is very effective when handling simple tasks, but when facing tasks like travel planning, some problems arise.
+
+If we use a single Agent to complete travel planning, what does this Agent need to do? First, it needs to search for attraction information, which requires calling Amap's POI search tool. Then, it needs to query weather information, which requires calling the weather query tool. Next, it needs to search for hotel information, which again requires calling the POI search tool. Finally, it needs to integrate all this information to generate a complete travel plan.
+
+This sounds simple, but in practice we immediately run into the first problem: **tool calling limitations**. SimpleAgent can only execute one tool per `run()` call, so we need to call `run()` multiple times, with each call handling one task. This raises a new question: how do we pass information between calls, for example the attraction information obtained in the first call to the second call? We would have to manage these intermediate results manually, and the code becomes very complex.
+
+Of course, we can use ReactAgent to solve this problem. ReactAgent can execute multiple tools in one call, automatically performing multiple rounds of thinking and action. But this introduces a new problem: **time cost**. Each round of thinking requires an LLM call, so calling three tools takes at least three rounds and therefore at least three LLM calls. Moreover, these calls are serial: each one can start only after the previous one completes, so the total time adds up quickly.
+
+The second problem is **prompt complexity**. If we want one Agent to complete all tasks, we need to describe the execution logic of each task in detail in the prompt. For example:
+
+```python
+COMPLEX_PROMPT = """You are a travel planning assistant. You need to:
+1. Use maps_text_search to search for attractions, keywords determined by user preferences
+2. Use maps_weather to query weather, get weather forecast for the next few days
+3. Use maps_text_search to search for hotels, type determined by user needs
+4. Integrate all information to generate travel plan, including daily attractions, dining, accommodation arrangements
+Note: Must execute in order, each tool can only be called once, output must be in JSON format...
+"""
+```
+
+This kind of prompt has several problems. First, it is **difficult to maintain**. If we want to modify the attraction search logic (such as adding rating filtering), we need to edit the one shared prompt, which can easily affect the other tasks. Second, it is **error-prone**. The LLM needs to understand the requirements of multiple tasks simultaneously, and can easily confuse the formats and parameters of different tasks. Finally, it is **difficult to debug**. When the generated plan doesn't meet expectations, it's hard to know which part went wrong: was the attraction search inaccurate, did the weather query fail, or is there a problem with the integration logic?
+
+Facing these problems, a natural idea is: can we decompose complex tasks into multiple simple tasks and let different Agents each do their own job? This is the core idea of multi-Agent collaboration.
+
+Imagine a travel agency in the real world. When you go to a travel agency to consult about a travel plan, you won't be served by just one person. Usually there will be a dedicated attraction consultant responsible for recommending attractions; a hotel consultant responsible for booking hotels; and an itinerary planner responsible for integrating all information into a complete itinerary. Each person focuses on their area of expertise, and finally the itinerary planner summarizes all the information. This division of labor and collaboration is much more efficient than having one person do everything.
+
+### 13.3.2 Agent Role Design
+
+Based on the task decomposition principle, we designed four specialized Agents, as shown in Figure 13.6:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-6.png" alt="" width="85%"/>
+  <p>Figure 13.6 Multi-Agent Collaboration Flow</p>
+</div>
+
+- **AttractionSearchAgent (Attraction Search Expert)** focuses on searching for attraction information. It only needs to understand user preferences (such as "history and culture", "natural scenery"), then call Amap's POI search tool and return a list of related attractions. Its prompt is very simple, only needing to explain how to choose keywords based on preferences and how to call tools.
+
+- **WeatherQueryAgent (Weather Query Expert)** focuses on querying weather information. It only needs to know the city name, then call the weather query tool and return the weather forecast for the next few days. Its task is so narrowly defined that there is little room for error.
+
+- **HotelAgent (Hotel Recommendation Expert)** focuses on searching for hotel information. It needs to understand user accommodation needs (such as "budget", "luxury"), then call the POI search tool and return a list of hotels that meet the requirements.
+
+- **PlannerAgent (Itinerary Planning Expert)** is responsible for integrating all information. It receives the output from the first three Agents, plus the user's original requirements (dates, budget, etc.), and then generates a complete travel plan. It doesn't need to call any external tools, only needs to focus on information integration and itinerary arrangement.
+
+Now let's design the role and prompt for each Agent in detail. When designing prompts, we need to consider several key questions: What input does this Agent need? What output should it produce? What tools does it need to call? What problems might it encounter?
+
+**AttractionSearchAgent**'s task is to search for attractions based on user preferences. Its input is the city name and user preferences (such as "history and culture", "natural scenery"). It needs to call the `amap_maps_text_search` tool with parameters being keywords and city. Its output is a list of attractions, including name, address, rating, and other information.
+
+```python
+ATTRACTION_AGENT_PROMPT = """You are an attraction search expert.
+
+**Tool Call Format:**
+`[TOOL_CALL:amap_maps_text_search:keywords=attraction,city=city_name]`
+
+**Examples:**
+- `[TOOL_CALL:amap_maps_text_search:keywords=attraction,city=Beijing]`
+- `[TOOL_CALL:amap_maps_text_search:keywords=museum,city=Shanghai]`
+
+**Important:**
+- Must use tools to search, don't fabricate information
+- Search for attractions in {city} based on user preferences ({preferences})
+"""
+```
+
+This prompt is concise but contains all necessary information. It clearly explains the tool call format, provides specific examples, and emphasizes two important principles: must use tools (can't fabricate) and search based on user preferences.
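
The prompt contains the placeholders `{city}` and `{preferences}`. The source does not show how they are filled in, so the following is only a sketch under that assumption. `render_prompt` is a hypothetical helper and the prompt is shortened. It replaces named placeholders one at a time instead of calling `str.format`, because `str.format` would fail on a prompt that also contains literal JSON braces, as PlannerAgent's prompt does.

```python
# Shortened copy of the attraction prompt, for illustration only
ATTRACTION_AGENT_PROMPT = """You are an attraction search expert.

**Important:**
- Must use tools to search, don't fabricate information
- Search for attractions in {city} based on user preferences ({preferences})
"""


def render_prompt(template: str, **values: str) -> str:
    """Replace only the named {placeholders}; other braces stay untouched."""
    rendered = template
    for key, value in values.items():
        rendered = rendered.replace("{" + key + "}", value)
    return rendered


prompt = render_prompt(
    ATTRACTION_AGENT_PROMPT,
    city="Beijing",
    preferences="history and culture",
)
print(prompt)

# Literal JSON braces survive, which str.format would not allow
json_prompt = render_prompt('Return {"city": "name"} for {city}', city="Beijing")
print(json_prompt)
```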
+
+**WeatherQueryAgent**'s task is simpler, only needing to query weather. Its input is the city name, and output is weather information.
+
+```python
+WEATHER_AGENT_PROMPT = """You are a weather query expert.
+
+**Tool Call Format:**
+`[TOOL_CALL:amap_maps_weather:city=city_name]`
+
+Please query weather information for {city}.
+"""
+```
+
+**HotelAgent**'s task is to search for hotels. Its input is the city name and accommodation type, and output is a hotel list.
+
+```python
+HOTEL_AGENT_PROMPT = """You are a hotel recommendation expert.
+
+**Tool Call Format:**
+`[TOOL_CALL:amap_maps_text_search:keywords=hotel,city=city_name]`
+
+Please search for {accommodation} hotels in {city}.
+"""
+```
+
+**PlannerAgent** is the most complex because it needs to integrate all information. Its input is user requirements and the output from the first three Agents, and output is a complete travel plan (JSON format).
+
+```python
+PLANNER_AGENT_PROMPT = """You are an itinerary planning expert.
+
+**Output Format:**
+Strictly return in the following JSON format:
+{
+  "city": "city name",
+  "start_date": "YYYY-MM-DD",
+  "end_date": "YYYY-MM-DD",
+  "days": [...],
+  "weather_info": [...],
+  "overall_suggestions": "overall suggestions",
+  "budget": {...}
+}
+
+**Planning Requirements:**
+1. weather_info must include weather for each day
+2. Temperature as pure numbers (without °C)
+3. Arrange 2-3 attractions per day
+4. Consider attraction distance and visit time
+5. Include breakfast, lunch, and dinner
+6. Provide practical suggestions
+7. Include budget information
+"""
+```
+
+### 13.3.3 Agent Collaboration Flow
+
+Now let's see how these four Agents collaborate to complete the travel planning task. The entire flow can be divided into five steps:
+
+```python
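+class TripPlannerAgent:
+    def __init__(self):
+        # Shared LLM client; tool registration is covered in Section 13.4.3
+        self.llm = HelloAgentsLLM()
+        # Prompt constants are the ones defined in Section 13.3.2
+        self.attraction_agent = SimpleAgent(name="Attraction Search", llm=self.llm, system_prompt=ATTRACTION_AGENT_PROMPT)
+        self.weather_agent = SimpleAgent(name="Weather Query", llm=self.llm, system_prompt=WEATHER_AGENT_PROMPT)
+        self.hotel_agent = SimpleAgent(name="Hotel Recommendation", llm=self.llm, system_prompt=HOTEL_AGENT_PROMPT)
+        self.planner_agent = SimpleAgent(name="Itinerary Planning", llm=self.llm, system_prompt=PLANNER_AGENT_PROMPT)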
+
+    def plan_trip(self, request: TripPlanRequest) -> TripPlan:
+        # Step 1: Attraction search
+        attraction_response = self.attraction_agent.run(
+            f"Please search for {request.preferences} attractions in {request.city}"
+        )
+
+        # Step 2: Weather query
+        weather_response = self.weather_agent.run(
+            f"Please query weather for {request.city}"
+        )
+
+        # Step 3: Hotel recommendation
+        hotel_response = self.hotel_agent.run(
+            f"Please search for {request.accommodation} hotels in {request.city}"
+        )
+
+        # Step 4: Integrate and generate plan
+        planner_query = self._build_planner_query(
+            request, attraction_response, weather_response, hotel_response
+        )
+        planner_response = self.planner_agent.run(planner_query)
+
+        # Step 5: Parse JSON
+        trip_plan = self._parse_trip_plan(planner_response)
+        return trip_plan
+```
+
+This flow executes the five steps sequentially. The first three steps are independent of each other, and their outputs are combined into the query for step 4, whose reply is then parsed in step 5. Note that we use the `TripPlanRequest` and `TripPlan` Pydantic models defined in Section 13.2.
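
Step 5 relies on `_parse_trip_plan`, which is not shown here. LLM replies often wrap JSON in a Markdown code fence or add commentary around it, so a parser usually has to locate the JSON object first. The following is a minimal sketch of that step, not the project's actual implementation; `extract_json_object` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def extract_json_object(text: str) -> dict:
    """Return the first JSON object found in an LLM reply."""
    # Prefer the content of a fenced block if the model wrapped its answer in one
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Keep only the outermost braces to drop any surrounding commentary
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(candidate[start:end + 1])


reply = "Here is your plan:\n" + FENCE + 'json\n{"city": "Beijing", "days": []}\n' + FENCE
plan = extract_json_object(reply)
print(plan["city"])
```

The resulting dict could then be passed to `TripPlan(**plan)` so that Pydantic validates it against the models from Section 13.2.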
+
+### 13.3.4 Query Construction
+
+PlannerAgent needs to integrate all information, so the query we send it must include every necessary piece of information, organized clearly enough for the LLM to understand it accurately.
+
+```python
+def _build_planner_query(
+    self,
+    request: TripPlanRequest,
+    attraction_response: str,
+    weather_response: str,
+    hotel_response: str
+) -> str:
+    """Build query for planning Agent"""
+    return f"""
+Please generate a {request.days}-day travel plan for {request.city} based on the following information:
+
+**User Requirements:**
+- Destination: {request.city}
+- Dates: {request.start_date} to {request.end_date}
+- Days: {request.days} days
+- Preferences: {request.preferences}
+- Budget: {request.budget}
+- Transportation: {request.transportation}
+- Accommodation: {request.accommodation}
+
+**Attraction Information:**
+{attraction_response}
+
+**Weather Information:**
+{weather_response}
+
+**Hotel Information:**
+{hotel_response}
+
+Please generate a detailed travel plan, including daily attraction arrangements, dining recommendations, accommodation information, and budget details.
+"""
+```
+
+Through this multi-Agent collaboration design, we decompose a complex travel planning task into four simple subtasks. Each Agent focuses on its own area of expertise, and the design also lays a good foundation for future feature expansion (such as adding a restaurant recommendation Agent or a transportation planning Agent).
+
+## 13.4 MCP Tool Integration Details
+
+### 13.4.1 Why Not Call APIs Directly
+
+In Section 13.3, we designed four Agents to collaborate on the travel planning task. Among them, AttractionSearchAgent, WeatherQueryAgent, and HotelAgent all need to call Amap's API to obtain data. A natural question is: why not call Amap's HTTP API directly in the Agent?
+
+Let's first see what calling the API directly would look like. Amap provides a POI search API, and we need to construct HTTP requests, pass parameters, and parse responses:
+
+```python
+import requests
+
+def search_poi(keywords: str, city: str, api_key: str):
+    """Directly call Amap POI search API"""
+    url = "https://restapi.amap.com/v3/place/text"
+    params = {
+        "keywords": keywords,
+        "city": city,
+        "key": api_key,
+        "output": "json"
+    }
+    response = requests.get(url, params=params)
+    data = response.json()
+    return data
+```
+
+This approach looks simple, but it runs into several problems in actual use. The first is that **the Agent cannot call it autonomously**. In our HelloAgents framework, Agents call tools by emitting tool call markers (such as `[TOOL_CALL:tool_name:arg1=value1]`) that the framework recognizes. If we call the API directly in code, the Agent loses its autonomous decision-making ability and the step becomes a simple function call.
+
+Second is **complex parameter passing**. Amap's API has many parameters. For example, POI search has more than a dozen parameters such as `keywords`, `city`, `types`, `offset`, `page`, etc. If we want the Agent to use these parameters flexibly, we need to explain the meaning and format of each parameter in detail in the prompt, which will make the prompt very complex.
+
+Third is **difficult response parsing**. The data returned by Amap API is in JSON format with a relatively complex structure. We need to write code to parse this data and extract the fields we need. If the API's response format changes, we need to modify the parsing code.
+
+Finally is **chaotic tool management**. Amap provides more than a dozen different APIs (POI search, weather query, route planning, etc.). If we write a function for each API and then manually register it to the Agent's tool list, the code will become very lengthy. And when we want to add a new API, we need to modify multiple places.
+
+### 13.4.2 Amap MCP Integration
+
+MCP (Model Context Protocol) is a standardized protocol proposed by Anthropic for connecting LLMs and external tools. This section will introduce how to integrate the Amap MCP server in the project. Our project uses `amap-mcp-server`, which is an MCP server implemented in Node.js:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-7.png" alt="" width="85%"/>
+  <p>Figure 13.7 amap-mcp-server Tools</p>
+</div>
+
+The Amap MCP server provides various tools, mainly divided into the following categories, as shown in Table 13.1:
+
+<div align="center">
+  <p>Table 13.1 Amap MCP Tool Categories</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-table-1.png" alt="" width="85%"/>
+</div>
+
+Through the MCP protocol, we can easily integrate in HelloAgents:
+
+```python
+from hello_agents.tools import MCPTool
+from app.config import get_settings
+
+settings = get_settings()
+
+# Create MCP tool
+mcp_tool = MCPTool(
+    name="amap_mcp",
+    command="npx",
+    args=["-y", "@sugarforever/amap-mcp-server"],
+    env={"AMAP_API_KEY": settings.amap_api_key},
+    auto_expand=True
+)
+```
+
+What does this code do? First, `command` and `args` specify how to start the MCP server. `npx -y @sugarforever/amap-mcp-server` downloads the `amap-mcp-server` package from the npm registry and runs it. The `env` parameter passes environment variables to the server process; here we pass the Amap API key.
+
+When we create the `MCPTool` object, it starts the MCP server process in the background and communicates with it through standard input/output (stdin/stdout). This is MCP's stdio transport: for a server launched locally as a child process, it avoids opening a network port and ties the server's lifetime to the client, which makes it easy to manage. (MCP also defines an HTTP-based transport for remote servers.)
+
+The most critical parameter is `auto_expand=True`. When set to True, `MCPTool` will automatically query what tools the MCP server provides, and then create an independent Tool object for each tool. This is why we only created one `MCPTool`, but the Agent got 16 tools. Let's see this process:
+
+```python
+# Create one MCPTool
+mcp_tool = MCPTool(..., auto_expand=True)
+agent.add_tool(mcp_tool)
+
+# Agent actually gets 16 tools!
+print(list(agent.tools.keys()))
+# ['amap_maps_text_search', 'amap_maps_weather', ...]
+```
+
+As shown in Figure 13.8, suppose the user wants to search for attractions in Beijing. AttractionSearchAgent receives the query "Please search for historical and cultural attractions in Beijing". The Agent analyzes this query and decides to call the `amap_maps_text_search` tool with parameters `keywords=attraction, city=Beijing`.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-8.png" alt="" width="85%"/>
+  <p>Figure 13.8 MCP Tool Call Flow</p>
+</div>
+
+The Agent generates a tool call marker: `[TOOL_CALL:amap_maps_text_search:keywords=attraction,city=Beijing]`. The HelloAgents framework parses this marker, extracts the tool name and parameters, and then calls the corresponding Tool object.
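
The parsing step can be illustrated with a small regular expression. This is a sketch of the idea only, not the HelloAgents implementation. `parse_tool_calls` is a hypothetical helper, and it assumes argument values contain no commas or closing brackets.

```python
import re

# [TOOL_CALL:<tool name>:<key=value,key=value,...>]
TOOL_CALL_PATTERN = re.compile(r"\[TOOL_CALL:(?P<name>[^:\]]+):(?P<args>[^\]]*)\]")


def parse_tool_calls(text: str) -> list[tuple[str, dict[str, str]]]:
    """Find every tool call marker in the text and split out its arguments."""
    calls = []
    for match in TOOL_CALL_PATTERN.finditer(text):
        args = {}
        for pair in match.group("args").split(","):
            if "=" in pair:
                key, value = pair.split("=", 1)
                args[key.strip()] = value.strip()
        calls.append((match.group("name").strip(), args))
    return calls


reply = "I will search now. [TOOL_CALL:amap_maps_text_search:keywords=attraction,city=Beijing]"
calls = parse_tool_calls(reply)
print(calls)
```

A marker format this simple is easy for an LLM to produce, but it cannot express nested or comma-containing values. That is one reason structured formats such as JSON are often used for tool arguments.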
+
+The Tool object is automatically created by `MCPTool`, and it will send the call request to the MCP server. Specifically, it will construct a JSON-RPC format message and send it to the server process through stdin:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "method": "tools/call",
+  "params": {
+    "name": "amap_maps_text_search",
+    "arguments": {
+      "keywords": "attraction",
+      "city": "Beijing"
+    }
+  }
+}
+```
+
+The MCP server receives this message, parses the parameters, and then calls Amap's HTTP API. It will construct an HTTP request, add the API key, send the request, and receive the response.
+
+Amap API returns JSON format data containing attraction list, address, coordinates, and other information. The MCP server parses this data, extracts key fields, and then constructs a response message, returning it to `MCPTool` through stdout:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "result": {
+    "content": [
+      {
+        "type": "text",
+        "text": "Found the following attractions:\n1. Forbidden City Museum - Address: No. 4 Jingshan Front Street, Dongcheng District\n2. Temple of Heaven Park - Address: Tiantan Road, Dongcheng District\n..."
+      }
+    ]
+  }
+}
+```
+
+`MCPTool` receives the response, extracts the text content, and returns it to the Agent. The Agent uses this result as the output of the tool call and continues to generate the final reply.
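
The two ends of this exchange can be sketched with the standard library alone. This is an illustration of the message shapes, not `MCPTool`'s actual code; `build_tool_call` and `extract_text` are hypothetical helpers, and the sample response is hand-written. JSON-RPC 2.0 pairs a response with its request through an `id` field, so the sketch generates one for each request.

```python
import itertools
import json

_request_ids = itertools.count(1)  # each request gets a fresh id


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize a tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


def extract_text(response_line: str) -> str:
    """Join the text parts of a tools/call response, or raise on an error reply."""
    response = json.loads(response_line)
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    parts = response["result"]["content"]
    return "\n".join(part["text"] for part in parts if part.get("type") == "text")


request_line = build_tool_call(
    "amap_maps_text_search", {"keywords": "attraction", "city": "Beijing"}
)

# Hand-written sample response, shaped like the one shown above
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Found the following attractions: ..."}]},
})
text = extract_text(sample_response)
print(text)
```

Over stdio, each message would be written to the server's stdin as one line and the reply read back from its stdout.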
+
+This process looks complex, but for the Agent, it only needs to know that there is a tool called `amap_maps_text_search` that can search for attractions. All the underlying details are encapsulated by the MCP protocol and `MCPTool`.
+
+### 13.4.3 Sharing MCP Instances
+
+In our multi-Agent system, three Agents all need to use Amap tools. So should each Agent create its own `MCPTool` instance, or share the same instance?
+
+If each Agent creates its own `MCPTool` instance, three server processes will run simultaneously. Each process calls the Amap API independently, which makes it harder to stay within the API's rate limit. Multiple processes also occupy more memory and CPU resources.
+
+A better approach is to let all Agents share the same `MCPTool` instance. This way, only one MCP server process needs to be started, and all API calls go through this process. This not only saves resources but also allows better control of API call frequency.
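
The point about call frequency can be illustrated with a toy example. When every caller goes through one shared object, a single throttle covers all of them. `SharedToolClient` below is a hypothetical stand-in, not part of HelloAgents or the MCP server, and the interval is arbitrary.

```python
import threading
import time


class SharedToolClient:
    """Hypothetical stand-in for a shared tool: one instance, one rate limit."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval  # minimum seconds between two calls
        self._lock = threading.Lock()
        self._last_call = None
        self.call_count = 0

    def call(self, tool_name: str) -> str:
        with self._lock:
            if self._last_call is not None:
                wait = self.min_interval - (time.monotonic() - self._last_call)
                if wait > 0:
                    time.sleep(wait)  # space out calls from all agents
            self._last_call = time.monotonic()
            self.call_count += 1
        return f"called {tool_name}"


shared = SharedToolClient(min_interval=0.01)
# All three "agents" hold a reference to the same object
agents = {"attraction": shared, "weather": shared, "hotel": shared}

for tool in ("amap_maps_text_search", "amap_maps_weather", "amap_maps_text_search"):
    shared.call(tool)

print(shared.call_count)  # 3
```

With three separate instances, each would track only its own last call, so the combined request rate could be up to three times higher.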
+
+In the code, we create an `MCPTool` instance in the constructor of `TripPlannerAgent`, and then add it to each sub-Agent's tool list:
+
+```python
+class TripPlannerAgent:
+    def __init__(self):
+        settings = get_settings()
+        self.llm = HelloAgentsLLM()
+
+        # Create shared MCP tool instance (create only once)
+        self.mcp_tool = MCPTool(
+            name="amap_mcp",
+            command="npx",
+            args=["-y", "@sugarforever/amap-mcp-server"],
+            env={"AMAP_API_KEY": settings.amap_api_key},
+            auto_expand=True
+        )
+
+        # Create multiple Agents, sharing the same MCP tool
+        self.attraction_agent = SimpleAgent(
+            name="AttractionSearchAgent",
+            llm=self.llm,
+            system_prompt=ATTRACTION_AGENT_PROMPT
+        )
+        self.attraction_agent.add_tool(self.mcp_tool)  # Share
+
+        self.weather_agent = SimpleAgent(
+            name="WeatherQueryAgent",
+            llm=self.llm,
+            system_prompt=WEATHER_AGENT_PROMPT
+        )
+        self.weather_agent.add_tool(self.mcp_tool)  # Share
+
+        self.hotel_agent = SimpleAgent(
+            name="HotelAgent",
+            llm=self.llm,
+            system_prompt=HOTEL_AGENT_PROMPT
+        )
+        self.hotel_agent.add_tool(self.mcp_tool)  # Share
+```
+
+This way, all three Agents can use Amap's 16 tools, but only one MCP server process is running underneath. When we call the `plan_trip` method of `TripPlannerAgent`, the three Agents will call tools in sequence, and all requests are sent to the Amap API through the same MCP server.
+
+### 13.4.4 Unsplash Image API Integration
+
+In addition to Amap, we also need to obtain images for attractions to make the travel plan more vivid and intuitive. We use the Unsplash API to search for attraction images. Unsplash is an overseas service and one of the few image APIs that can be used for free, but its search results for a specific attraction may not be accurate enough. In actual projects, you can consider using Bing, Baidu, or Amap's POI image API, but these services usually require payment.
+
+The integration of Unsplash API is relatively simple. We create an `UnsplashService` class to encapsulate API calls:
+
+```python
+# backend/app/services/unsplash_service.py
+import requests
+from typing import Optional, List, Dict
+import logging
+
+logger = logging.getLogger(__name__)
+
+class UnsplashService:
+    """Unsplash image service"""
+
+    def __init__(self, access_key: str):
+        self.access_key = access_key
+        self.base_url = "https://api.unsplash.com"
+
+    def search_photos(self, query: str, per_page: int = 10) -> List[Dict]:
+        """Search for images"""
+        try:
+            url = f"{self.base_url}/search/photos"
+            params = {
+                "query": query,
+                "per_page": per_page,
+                "client_id": self.access_key
+            }
+
+            response = requests.get(url, params=params, timeout=10)
+            response.raise_for_status()
+
+            data = response.json()
+            results = data.get("results", [])
+
+            # Extract image URLs
+            photos = []
+            for result in results:
+                photos.append({
+                    "url": result["urls"]["regular"],
+                    "description": result.get("description") or "",  # the field may be present but null
+                    "photographer": result["user"]["name"]
+                })
+
+            return photos
+
+        except Exception as e:
+            logger.error(f"Image search failed: {e}")
+            return []
+
+    def get_photo_url(self, query: str) -> Optional[str]:
+        """Get single image URL"""
+        photos = self.search_photos(query, per_page=1)
+        return photos[0].get("url") if photos else None
+```
+
+This service class provides two methods: `search_photos` searches for multiple images, and `get_photo_url` gets the URL of a single image. We use this service in the API route to get images for each attraction:
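
The extraction step can be tried offline without an API key. The sketch below repeats the loop from `search_photos` as a standalone function and runs it on a hand-written payload. The payload only imitates the fields used above and is not real API output. It also shows why `or ""` is used for `description`: the key may be present with a `null` value, in which case `dict.get` with a default would still return `None`.

```python
from typing import Dict, List


def extract_photos(data: dict) -> List[Dict]:
    """Pick the fields we need from a search response."""
    photos = []
    for result in data.get("results", []):
        photos.append({
            "url": result["urls"]["regular"],
            "description": result.get("description") or "",
            "photographer": result["user"]["name"],
        })
    return photos


# Hand-written sample payload, for illustration only
sample = {
    "results": [
        {
            "urls": {"regular": "https://example.com/photo.jpg"},
            "description": None,
            "user": {"name": "Example Photographer"},
        }
    ]
}

photos = extract_photos(sample)
first_url = photos[0]["url"] if photos else None
print(first_url)
```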
+
+```python
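+# backend/app/api/routes/trip.py
+from fastapi import APIRouter
+
+from app.config import get_settings
+from app.models.schemas import TripPlanRequest, TripPlan
+from app.services.unsplash_service import UnsplashService
+
+settings = get_settings()
+router = APIRouter()
+unsplash_service = UnsplashService(settings.unsplash_access_key)
+# trip_planner_agent is assumed to be a TripPlannerAgent instance created elsewhere in the app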
+
+@router.post("/plan", response_model=TripPlan)
+async def create_trip_plan(request: TripPlanRequest) -> TripPlan:
+    # Generate travel plan
+    trip_plan = trip_planner_agent.plan_trip(request)
+
+    # Get images for each attraction
+    for day in trip_plan.days:
+        for attraction in day.attractions:
+            if not attraction.image_url:
+                image_url = unsplash_service.get_photo_url(
+                    f"{attraction.name} {trip_plan.city}"
+                )
+                attraction.image_url = image_url
+
+    return trip_plan
+```
+
+Note that we didn't encapsulate Unsplash as a Tool or MCP tool; we call it directly in the API route. Image search doesn't require the Agent's intelligent decision-making: it is just a simple data enhancement step. If you want the Agent to autonomously decide whether images are needed or choose different image sources, you can consider encapsulating it as a Tool.
+
+## 13.5 Front-End Development Details
+
+### 13.5.1 Front-End and Back-End Separation Web Architecture
+
+Before starting front-end development, we need to understand the architecture pattern of modern Web applications. In early Web development, front-end and back-end were mixed together. For example, technologies like PHP and JSP had HTML templates and business logic code written in the same file. This approach is convenient in small projects, but encounters many problems in large projects: front-end and back-end developers need frequent coordination, code is difficult to reuse, and testing is difficult.
+
+Modern Web applications generally adopt a **front-end and back-end separation** architecture. The back-end is only responsible for providing API interfaces and returning data in JSON format. The front-end is an independent application that calls back-end APIs through HTTP requests, obtains data, and then renders pages. This architecture has several obvious advantages: front-end and back-end can be developed, deployed, and tested independently; the front-end can be a Web application, mobile application, or desktop application, all using the same set of back-end APIs; the front-end can use modern frameworks and toolchains to provide a better user experience.
+
+In our intelligent travel assistant project, the back-end is implemented with Python and FastAPI, providing a core API interface `POST /api/trip/plan` that receives travel requirements and returns travel plans. The front-end is implemented with Vue 3 and TypeScript, and is a single-page application (SPA). Users fill in forms in the browser, click the "Start Planning" button, the front-end sends an HTTP request to the back-end, waits for a response, and then renders the result page. Throughout this process, the page doesn't refresh, and the user experience is very smooth.
+
+The choice of front-end technology stack needs to consider several factors: development efficiency, performance, ecosystem, and learning curve. As shown in Table 13.2, the project chose the following technology stack:
+
+<div align="center">
+  <p>Table 13.2 Front-End Technology Stack</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-table-2.png" alt="" width="85%"/>
+</div>
+
+The project's directory structure is as follows:
+
+```
+frontend/
+├── src/
+│   ├── views/              # Page components
+│   │   ├── Home.vue        # Home page (form)
+│   │   └── Result.vue      # Result page
+│   ├── services/           # API services
+│   │   └── api.ts
+│   ├── types/              # Type definitions
+│   │   └── index.ts
+│   ├── router/             # Router configuration
+│   │   └── index.ts
+│   ├── App.vue
+│   └── main.ts
+├── package.json
+├── vite.config.ts
+└── tsconfig.json
+```
+
+The `views` directory stores page components, the `services` directory stores API call logic, the `types` directory stores TypeScript type definitions, and the `router` directory stores router configuration.
+
+### 13.5.2 Type Definitions
+
+In Section 13.2, we used Pydantic to define data models on the back-end, such as `Location`, `Attraction`, `DayPlan`, `TripPlan`, etc. On the front-end, we need to define corresponding TypeScript types.
+
+Let's see how to define these types. First is the most basic `Location` type, representing longitude and latitude coordinates:
+
+```typescript
+// frontend/src/types/index.ts
+export interface Location {
+  longitude: number
+  latitude: number
+}
+```
+
+This type definition corresponds exactly to the back-end Pydantic model. Note that TypeScript uses the `interface` keyword to define the type, each field name is separated from its type by a colon, and an interface only describes a shape, so it cannot carry default values.
+
+Next is the `Attraction` type, representing attraction information:
+
+```typescript
+export interface Attraction {
+  name: string
+  address: string
+  location: Location
+  visit_duration: number
+  description: string
+  category?: string
+  rating?: number
+  image_url?: string
+  ticket_price?: number
+}
+```
+
+Note that we use the `Location` type as a field type here, which is a nested type. The question mark `?` indicates an optional field, corresponding to `Optional` in the back-end Pydantic model.
+
+Similarly, we define types like `Meal`, `Hotel`, `Budget`, `WeatherInfo`, etc. Finally, the top-level `TripPlan` type:
+
+```typescript
+export interface TripPlan {
+  city: string
+  start_date: string
+  end_date: string
+  days: DayPlan[]
+  weather_info: WeatherInfo[]
+  overall_suggestions: string
+  budget?: Budget
+}
+```
+
+There's also the request type `TripPlanRequest`, corresponding to the back-end request model:
+
+```typescript
+export interface TripPlanRequest {
+  city: string
+  start_date: string
+  end_date: string
+  days: number
+  preferences: string
+  budget: string
+  transportation: string
+  accommodation: string
+}
+```
+
+What are these type definitions for? First, when we call the API, TypeScript checks at compile time whether the data we pass conforms to the `TripPlanRequest` type. If we accidentally write `days` as a string, the compiler reports an error immediately. Second, the API response is typed as `TripPlan`, so TypeScript checks every place where we read from it. These checks are static: the types are erased at runtime, so if the back-end's data structure changes, the front-end types must be updated to match before the compiler can flag the affected code. Finally, the IDE can provide code completion based on type definitions. When we type `tripPlan.`, the IDE automatically lists all available fields.
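+
+TypeScript interfaces exist only at compile time, so a response with an unexpected shape is not rejected automatically when the application runs. The following is a minimal sketch of a runtime guard for the required top-level fields of `TripPlan`. The names `TripPlanShape`, `isRecord`, and `isTripPlanShape` are hypothetical and are not part of the project:
+
+```typescript
+// Hypothetical runtime guard; these names are illustrative, not project code.
+// Only the required top-level fields of TripPlan are checked.
+interface TripPlanShape {
+  city: string
+  start_date: string
+  end_date: string
+  days: unknown[]
+  weather_info: unknown[]
+  overall_suggestions: string
+}
+
+// Narrows an unknown value to a plain key/value object.
+const isRecord = (value: unknown): value is Record<string, unknown> =>
+  typeof value === 'object' && value !== null
+
+// Returns true only when every required field has the expected basic type.
+const isTripPlanShape = (value: unknown): value is TripPlanShape =>
+  isRecord(value) &&
+  typeof value.city === 'string' &&
+  typeof value.start_date === 'string' &&
+  typeof value.end_date === 'string' &&
+  Array.isArray(value.days) &&
+  Array.isArray(value.weather_info) &&
+  typeof value.overall_suggestions === 'string'
+```
+
+A guard like this could be applied to the parsed response before the data is handed to the page, so that a mismatch surfaces as an explicit error rather than as an `undefined` field deep inside a template.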
+
+### 13.5.3 API Service Encapsulation
+
+With type definitions, we can encapsulate API calls. We create an `api.ts` file and use Axios to send HTTP requests:
+
+```typescript
+import axios from 'axios'
+import type { TripPlanRequest, TripPlan } from '../types'
+
+const api = axios.create({
+  baseURL: 'http://localhost:8000/api',
+  timeout: 120000, // 2-minute timeout
+  headers: {
+    'Content-Type': 'application/json'
+  }
+})
+```
+
+Here we create an Axios instance and configure the base URL, timeout, and request headers. Why is the timeout set to 2 minutes? Generating a travel plan involves several Agents, each of which calls the LLM and external APIs, so the whole process may take 10-30 seconds. A 2-minute timeout leaves generous headroom for slow responses. If the timeout is too short, the request is aborted before the plan is ready.
+
+Next we add interceptors. Interceptors can execute some common logic before sending requests and after receiving responses, such as logging, error handling, authentication, etc.:
+
+```typescript
+// Request interceptor
+api.interceptors.request.use(
+  config => {
+    console.log('Sending request:', config)
+    return config
+  },
+  error => Promise.reject(error)
+)
+
+// Response interceptor
+api.interceptors.response.use(
+  response => {
+    console.log('Received response:', response)
+    return response
+  },
+  error => {
+    console.error('Request failed:', error)
+    return Promise.reject(error)
+  }
+)
+```
+
+Finally, we define the API function, which is the only entry point for the front-end to call the back-end:
+
+```typescript
+// Generate travel plan
+export const generateTripPlan = async (request: TripPlanRequest): Promise<TripPlan> => {
+  const response = await api.post<TripPlan>('/trip/plan', request)
+  return response.data
+}
+```
+
+Note the type signature of this function: the parameter is of type `TripPlanRequest`, and the return value is of type `Promise<TripPlan>`. This means TypeScript will check whether the parameters passed by the caller meet the requirements, and will also check whether the use of the return value is correct.
+
+### 13.5.4 Home Form Design
+
+The Home page is the user's entry point, containing a form for users to fill in travel requirements. We use Vue 3's Composition API to organize the code:
+
+```vue
+<script setup lang="ts">
+import { ref } from 'vue'
+import { useRouter } from 'vue-router'
+import { message } from 'ant-design-vue'
+import { generateTripPlan } from '@/services/api'
+import type { TripPlanRequest } from '@/types'
+
+const router = useRouter()
+const loading = ref(false)
+const loadingProgress = ref(0)
+const loadingStatus = ref('')
+
+const formData = ref<TripPlanRequest>({
+  city: '',
+  start_date: '',
+  end_date: '',
+  days: 3,
+  preferences: 'History and Culture',
+  budget: 'Medium',
+  transportation: 'Public Transportation',
+  accommodation: 'Budget Hotel'
+})
+</script>
+```
+
+Here we use `ref` to create reactive variables. `formData` holds the form data and has type `TripPlanRequest`. `loading` indicates whether a request is in progress, `loadingProgress` holds the progress percentage, and `loadingStatus` holds the status text shown under the progress bar.
+
+The form submission logic is as follows:
+
+```typescript
+const handleSubmit = async () => {
+  loading.value = true
+  loadingProgress.value = 0
+
+  // Simulate progress updates
+  const progressInterval = setInterval(() => {
+    if (loadingProgress.value < 90) {
+      loadingProgress.value += 10
+      if (loadingProgress.value <= 30) loadingStatus.value = '🔍 Searching for attractions...'
+      else if (loadingProgress.value <= 50) loadingStatus.value = '🌤️ Querying weather...'
+      else if (loadingProgress.value <= 70) loadingStatus.value = '🏨 Recommending hotels...'
+      else loadingStatus.value = '📋 Generating itinerary...'
+    }
+  }, 500)
+
+  try {
+    const response = await generateTripPlan(formData.value)
+    clearInterval(progressInterval)
+    loadingProgress.value = 100
+    router.push({ name: 'result', state: { tripPlan: response } })
+  } catch (error) {
+    clearInterval(progressInterval)
+    message.error('Failed to generate plan, please try again')
+  } finally {
+    loading.value = false
+  }
+}
+```
+
+This code does several things. First, it sets `loading` to true to display the loading state. Then it starts a timer that updates the progress bar and status text every 500 milliseconds. The progress is simulated because the front-end has no way of knowing how far the back-end has actually got: the bar climbs to 90% in about 4.5 seconds and stays there until the response arrives. Even so, it tells users the system is working rather than stuck.
+
+Next, it calls `generateTripPlan` to send the API request. This is an asynchronous operation, so we use `await` to wait for the response. If the request succeeds, the code clears the timer, sets progress to 100%, and navigates to the result page, passing the travel plan data along. If the request fails, it clears the timer and displays an error message. Either way, the `finally` block sets `loading` back to false to hide the loading state.
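+
+The progress simulation inside `handleSubmit` can also be written as small pure functions, which makes the thresholds easy to test in isolation. This is only a sketch: `statusForProgress`, `nextProgress`, and `ticksToCap` are hypothetical names, and the thresholds are copied from the code above:
+
+```typescript
+// Hypothetical helpers extracted from handleSubmit; not part of the project.
+
+// Maps a progress percentage to the status text shown under the bar.
+const statusForProgress = (progress: number): string => {
+  if (progress <= 30) return '🔍 Searching for attractions...'
+  if (progress <= 50) return '🌤️ Querying weather...'
+  if (progress <= 70) return '🏨 Recommending hotels...'
+  return '📋 Generating itinerary...'
+}
+
+// One simulated tick: add 10 until the bar reaches the 90% cap, then hold.
+const nextProgress = (progress: number): number =>
+  progress < 90 ? progress + 10 : progress
+
+// Counts how many ticks it takes to go from 0 to the cap.
+const ticksToCap = (): number => {
+  let progress = 0
+  let ticks = 0
+  while (nextProgress(progress) !== progress) {
+    progress = nextProgress(progress)
+    ticks += 1
+  }
+  return ticks
+}
+```
+
+With a 500-millisecond interval, `ticksToCap()` ticks correspond to the time the bar needs to reach 90%; after that it waits for the real response.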
+
+The template part uses Ant Design Vue components:
+
+```vue
+<template>
+  <div class="home-container">
+    <div class="page-header">
+      <h1 class="page-title">✈️ Intelligent Travel Assistant</h1>
+      <p class="page-subtitle">AI-Powered Personalized Travel Planning</p>
+    </div>
+
+    <a-card class="form-card">
+      <a-form :model="formData" @finish="handleSubmit">
+        <a-form-item label="Destination City" name="city" :rules="[{ required: true }]">
+          <a-input v-model:value="formData.city" placeholder="e.g., Beijing" />
+        </a-form-item>
+
+        <!-- More form items... -->
+
+        <a-form-item>
+          <a-button type="primary" html-type="submit" size="large" :loading="loading">
+            Start Planning
+          </a-button>
+        </a-form-item>
+
+        <!-- Loading progress bar -->
+        <a-form-item v-if="loading">
+          <a-progress :percent="loadingProgress" status="active" />
+          <p>{{ loadingStatus }}</p>
+        </a-form-item>
+      </a-form>
+    </a-card>
+  </div>
+</template>
+```
+
+Note the `v-model:value` directive, which implements two-way data binding. When users type in the input box, `formData.city` automatically updates. When the value of `formData.city` changes, the input box content also automatically updates.
+
+### 13.5.5 Result Page Display
+
+The Result page is the core of the entire application, displaying the generated travel plan. This page includes several parts: itinerary overview, budget details, map visualization, daily itinerary details, and weather information.
+
+First is map visualization. We use the Amap JS API to mark attraction locations on the map:
+
+```typescript
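+import AMapLoader from '@amap/amap-jsapi-loader'
+
+// Map instance, assigned in initMap (typed loosely because the SDK is loaded at runtime)
+let map: any = null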
+
+const initMap = async () => {
+  const AMap = await AMapLoader.load({
+    key: 'your_amap_web_key',
+    version: '2.0'
+  })
+
+  map = new AMap.Map('amap-container', {
+    zoom: 12,
+    center: [116.397128, 39.916527]
+  })
+
+  // Add attraction markers
+  tripPlan.value.days.forEach((day) => {
+    day.attractions.forEach((attraction, index) => {
+      const marker = new AMap.Marker({
+        position: [attraction.location.longitude, attraction.location.latitude],
+        title: attraction.name,
+        label: { content: `${index + 1}`, direction: 'top' }
+      })
+      map.add(marker)
+    })
+  })
+}
+```
+
+This code first loads the Amap SDK, then creates a map instance, and finally iterates through all attractions to create a marker for each. The marker's position is the attraction's longitude and latitude coordinates, which are obtained from the back-end's `Attraction` object.
+
+The export function uses the `html2canvas` and `jsPDF` libraries. `html2canvas` can convert DOM elements to Canvas, and then we can export the Canvas as an image or PDF:
+
+```typescript
+import html2canvas from 'html2canvas'
+import jsPDF from 'jspdf'
+
+// Export as image
+const exportAsImage = async () => {
+  const element = document.getElementById('trip-plan-content')
+  if (!element) return // Nothing to export if the container is missing
+  const canvas = await html2canvas(element, { scale: 2 })
+  const link = document.createElement('a')
+  link.download = `${tripPlan.value.city} Travel Plan.png`
+  link.href = canvas.toDataURL()
+  link.click()
+}
+
+// Export as PDF
+const exportAsPDF = async () => {
+  const element = document.getElementById('trip-plan-content')
+  if (!element) return // Nothing to export if the container is missing
+  const canvas = await html2canvas(element, { scale: 2 })
+  const imgData = canvas.toDataURL('image/png')
+  const pdf = new jsPDF('p', 'mm', 'a4')
+  const imgWidth = 210
+  const imgHeight = (canvas.height * imgWidth) / canvas.width
+  pdf.addImage(imgData, 'PNG', 0, 0, imgWidth, imgHeight)
+  pdf.save(`${tripPlan.value.city} Travel Plan.pdf`)
+}
+```
+
+Through these front-end technologies, we implemented a complete Web application. Users can fill in forms in the browser, submit requests, wait for AI to generate travel plans, then view detailed itinerary arrangements, see attraction locations on the map, and export as images or PDFs. The entire process is smooth and natural - this is the charm of modern Web applications.
+
+## 13.6 Feature Implementation Details
+
+This section introduces the core feature implementations of the intelligent travel assistant, including budget calculation, loading progress bar, itinerary editing, export functionality, and side navigation.
+
+### 13.6.1 Budget Calculation Feature
+
+When planning a trip, budget is a very important consideration. Users need to know approximately how much this trip will cost and where the money will be spent. Our intelligent travel assistant provides automatic budget calculation functionality, dividing expenses into four major categories: attraction tickets, hotel accommodation, dining, and transportation.
+
+Where is the budget calculation logic implemented? We chose to implement it in the back-end's PlannerAgent. Why not calculate on the front-end? Because budget estimation needs to be based on attraction ticket prices, hotel price ranges, dining standards, and other information, all of which are already obtained by PlannerAgent when generating the itinerary. If calculated on the front-end, we would need to duplicate this logic, and it might not be accurate.
+
+In PlannerAgent's prompt, we explicitly require the LLM to generate budget information:
+
+```python
+PLANNER_AGENT_PROMPT = """
+You are an itinerary planning expert.
+
+**Output Format:**
+Strictly return in the following JSON format:
+{
+  ...
+  "budget": {
+    "total_attractions": 180,
+    "total_hotels": 1200,
+    "total_meals": 480,
+    "total_transportation": 200,
+    "total": 2060
+  }
+}
+
+**Planning Requirements:**
+...
+7. Include budget information, estimate based on attraction tickets, hotel prices, dining standards, and transportation methods
+"""
+```
+
+The LLM will estimate the cost of each item based on the attractions, hotels, and dining arrangements in the itinerary. For example, if the itinerary includes the Forbidden City (ticket 60 yuan), Temple of Heaven (ticket 15 yuan), and Summer Palace (ticket 30 yuan), then the total attraction ticket cost is 105 yuan. If it's a 3-day 2-night trip with budget hotels (300 yuan per night), then the total hotel cost is 600 yuan.
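+
+Because the LLM produces both the category figures and the total, the two can disagree. The following is a minimal sketch of a consistency check, assuming the `Budget` fields shown in the prompt above. `sumCategories` and `withConsistentTotal` are hypothetical helper names, not part of the project:
+
+```typescript
+// Mirrors the "budget" object in the prompt's JSON format.
+interface Budget {
+  total_attractions: number
+  total_hotels: number
+  total_meals: number
+  total_transportation: number
+  total: number
+}
+
+// Sum of the four category fields.
+const sumCategories = (b: Budget): number =>
+  b.total_attractions + b.total_hotels + b.total_meals + b.total_transportation
+
+// Hypothetical helper: returns a copy whose total equals the category sum.
+const withConsistentTotal = (b: Budget): Budget => ({
+  ...b,
+  total: sumCategories(b)
+})
+
+// The example values from the prompt: 180 + 1200 + 480 + 200
+const example: Budget = {
+  total_attractions: 180,
+  total_hotels: 1200,
+  total_meals: 480,
+  total_transportation: 200,
+  total: 2060
+}
+```
+
+For the example in the prompt, `sumCategories(example)` equals the stated `total` of 2060. Recomputing the total on the client is one option; another is to only log a warning when the two values differ.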
+
+On the front-end, we use Ant Design Vue's Statistic component to display budget information. This component is specifically designed for displaying statistical data and supports number animations, prefixes/suffixes, custom styles, etc.:
+
+```vue
+<a-card v-if="tripPlan.budget" title="💰 Budget Details">
+  <a-row :gutter="16">
+    <a-col :span="6">
+      <a-statistic title="Attraction Tickets" :value="tripPlan.budget.total_attractions" suffix="yuan" />
+    </a-col>
+    <a-col :span="6">
+      <a-statistic title="Hotel Accommodation" :value="tripPlan.budget.total_hotels" suffix="yuan" />
+    </a-col>
+    <a-col :span="6">
+      <a-statistic title="Dining Expenses" :value="tripPlan.budget.total_meals" suffix="yuan" />
+    </a-col>
+    <a-col :span="6">
+      <a-statistic title="Transportation" :value="tripPlan.budget.total_transportation" suffix="yuan" />
+    </a-col>
+  </a-row>
+  <a-divider />
+  <a-row>
+    <a-col :span="24" style="text-align: center;">
+      <a-statistic
+        title="Estimated Total Cost"
+        :value="tripPlan.budget.total"
+        suffix="yuan"
+        :value-style="{ color: '#cf1322', fontSize: '32px', fontWeight: 'bold' }"
+      />
+    </a-col>
+  </a-row>
+</a-card>
+```
+
+This code uses the grid layout (`a-row` and `a-col`) to display the four expense items side by side. Each item uses an `a-statistic` component to show its title and value. A divider (`a-divider`) then separates these from the total cost, which is displayed below in a large red font for emphasis.
+
+Note the conditional rendering `v-if="tripPlan.budget"`. Because budget information is optional (defined as `Optional[Budget]` in the Pydantic model), if the LLM doesn't generate budget information, this card won't be displayed. This reflects the front-end's error tolerance for data.
+
+### 13.6.2 Loading Progress Bar
+
+Generating a travel plan is a time-consuming operation. The back-end needs to sequentially call AttractionSearchAgent, WeatherQueryAgent, HotelAgent, and PlannerAgent, and each Agent needs to call the LLM and external APIs. The entire process may take 10-30 seconds. If the user clicks the "Start Planning" button and the page has no feedback, the user will think the system is stuck and may refresh the page or click repeatedly.
+
+To improve user experience, we added a loading progress bar and status prompts. Currently, it's just simulated progress, but it lets users know the system is working.
+
+```typescript
+const loading = ref(false)
+const loadingProgress = ref(0)
+const loadingStatus = ref('')
+
+const handleSubmit = async () => {
+  loading.value = true
+  loadingProgress.value = 0
+
+  // Simulate progress updates
+  const progressInterval = setInterval(() => {
+    if (loadingProgress.value < 90) {
+      loadingProgress.value += 10
+      if (loadingProgress.value <= 30) loadingStatus.value = '🔍 Searching for attractions...'
+      else if (loadingProgress.value <= 50) loadingStatus.value = '🌤️ Querying weather...'
+      else if (loadingProgress.value <= 70) loadingStatus.value = '🏨 Recommending hotels...'
+      else loadingStatus.value = '📋 Generating itinerary...'
+    }
+  }, 500)
+
+  try {
+    const response = await generateTripPlan(formData.value)
+    clearInterval(progressInterval)
+    loadingProgress.value = 100
+    loadingStatus.value = '✅ Complete!'
+    router.push({ name: 'result', state: { tripPlan: response } })
+  } catch (error) {
+    clearInterval(progressInterval)
+    message.error('Failed to generate plan')
+  } finally {
+    loading.value = false
+  }
+}
+```
+
+### 13.6.3 Itinerary Editing Feature
+
+Although AI-generated travel plans are intelligent, they may not fully meet users' personal needs. For example, users may not like a certain attraction and want to delete it, or want to adjust the order of attractions. We provide an itinerary editing feature that allows users to customize their itinerary.
+
+The core of the editing feature is **state management**. We need to maintain two states: the current itinerary plan and the original itinerary plan. When users enter edit mode, we save a copy of the original plan. If users cancel editing, we restore the original plan. If users save changes, we update the current plan:
+
+```typescript
+const editMode = ref(false)
+const originalPlan = ref<TripPlan | null>(null)
+
+// Enter edit mode
+const toggleEditMode = () => {
+  editMode.value = true
+  originalPlan.value = JSON.parse(JSON.stringify(tripPlan.value))
+}
+```
+
+Note that we use `JSON.parse(JSON.stringify(...))` to deep copy the object. Why not assign directly? Because objects in JavaScript are reference types - if we assign directly, `originalPlan` and `tripPlan` will point to the same object, and modifying one will affect the other. Deep copying creates a completely independent copy.
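+
+The difference between sharing a reference and taking a deep copy can be seen in a small standalone sketch. The `Stop` and `Plan` types below are simplified stand-ins for the project's types, used only for illustration:
+
+```typescript
+// Simplified stand-in types, for illustration only.
+interface Stop { name: string }
+interface Plan { city: string; stops: Stop[] }
+
+const current: Plan = {
+  city: 'Beijing',
+  stops: [{ name: 'Forbidden City' }, { name: 'Temple of Heaven' }]
+}
+
+// Direct assignment: both names point at the same object.
+const alias = current
+
+// JSON round trip: an independent copy of the data.
+const snapshot: Plan = JSON.parse(JSON.stringify(current))
+
+// Edit the current plan by removing the first stop.
+current.stops.splice(0, 1)
+
+// alias sees the edit, snapshot does not.
+const aliasCount = alias.stops.length
+const snapshotCount = snapshot.stops.length
+```
+
+The JSON round trip suits this case because the plan arrived as JSON in the first place. It would drop functions and `undefined` values, which such data does not contain.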
+
+The logic for moving attractions is to swap the positions of two elements in the array:
+
+```typescript
+// Move attraction
+const moveAttraction = (dayIndex: number, attractionIndex: number, direction: 'up' | 'down') => {
+  const attractions = tripPlan.value.days[dayIndex].attractions
+  const newIndex = direction === 'up' ? attractionIndex - 1 : attractionIndex + 1
+
+  if (newIndex >= 0 && newIndex < attractions.length) {
+    [attractions[attractionIndex], attractions[newIndex]] =
+    [attractions[newIndex], attractions[attractionIndex]]
+  }
+}
+```
+
+This uses ES6's destructuring assignment syntax to swap two elements. `[a, b] = [b, a]` is an elegant way to swap without needing a temporary variable.
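+
+The same swap can be expressed as a non-mutating function, which is convenient for unit tests. This is a sketch of an alternative, not the project's implementation, and `moved` is a hypothetical name:
+
+```typescript
+// Hypothetical non-mutating variant of moveAttraction.
+// Returns a new array; the input array is left unchanged.
+function moved<T>(items: readonly T[], index: number, direction: 'up' | 'down'): T[] {
+  const target = direction === 'up' ? index - 1 : index + 1
+  const copy = [...items]
+  const inRange = (i: number): boolean => i >= 0 && i < copy.length
+
+  // Swap only when both positions exist; otherwise return the copy as-is.
+  if (inRange(index) && inRange(target)) {
+    [copy[index], copy[target]] = [copy[target], copy[index]]
+  }
+  return copy
+}
+
+const order = ['Forbidden City', 'Temple of Heaven', 'Summer Palace']
+const afterDown = moved(order, 0, 'down') // first item moves one place down
+const afterUp = moved(order, 0, 'up')     // already first, so the order is kept
+```
+
+The bounds check mirrors the `newIndex >= 0 && newIndex < attractions.length` condition in `moveAttraction`, so moving the first item up or the last item down is a no-op.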
+
+Deleting attractions uses the array's `splice` method:
+
+```typescript
+// Delete attraction
+const deleteAttraction = (dayIndex: number, attractionIndex: number) => {
+  tripPlan.value.days[dayIndex].attractions.splice(attractionIndex, 1)
+}
+```
+
+When saving changes, we need to reinitialize the map because attractions may have been reordered or deleted, so the markers and their numbering are out of date:
+
+```typescript
+// Save changes
+const saveChanges = () => {
+  editMode.value = false
+  message.success('Changes saved')
+  initMap()  // Reinitialize map
+}
+
+// Cancel editing
+const cancelEdit = () => {
+  if (originalPlan.value) {
+    tripPlan.value = originalPlan.value
+  }
+  editMode.value = false
+}
+```
+
+In the template, we display different UI based on the value of `editMode`. In edit mode, up, down, and delete buttons are displayed next to each attraction:
+
+```vue
+<div v-if="editMode" class="edit-buttons">
+  <a-button size="small" @click="moveAttraction(dayIndex, index, 'up')">Up</a-button>
+  <a-button size="small" @click="moveAttraction(dayIndex, index, 'down')">Down</a-button>
+  <a-button size="small" danger @click="deleteAttraction(dayIndex, index)">Delete</a-button>
+</div>
+```
+
+### 13.6.4 Export Functionality
+
+After users generate a satisfactory travel plan, they may want to save it or share it with friends. We provide two export methods: export as image and export as PDF.
+
+The core of the export functionality is the `html2canvas` library. This library can convert DOM elements to Canvas, and then we can export the Canvas as an image. But there's a technical challenge here: the map is rendered using Canvas, and `html2canvas` has compatibility issues when handling nested Canvas.
+
+We tried multiple solutions, including converting the map Canvas to an image before exporting, but due to Amap's Canvas rendering mechanism and cross-origin restrictions, this solution didn't completely solve the problem. In actual projects, you may need to consider the following alternative solutions:
+
+1. **Use Amap's static map API**: Call the `maps_staticmap` tool to generate static map images to replace dynamic maps
+2. **Export separately**: Export the map and itinerary content separately, then merge them on the back-end
+3. **Use screenshot service**: Use headless browsers like Puppeteer to take screenshots on the server side
+4. **Simplify export content**: Hide the map when exporting, only export text content
+
+In the current implementation, we adopted a simplified approach, temporarily hiding the map part when exporting and only exporting the text content and attraction information of the itinerary. Although this isn't the ideal solution, it ensures the export functionality is usable.
+
+The logic for exporting as an image is simple:
+
+```typescript
+import html2canvas from 'html2canvas'
+
+const exportAsImage = async () => {
+  const element = document.getElementById('trip-plan-content')
+  if (!element) return
+
+  const canvas = await html2canvas(element, {
+    backgroundColor: '#ffffff',
+    scale: 2,
+    useCORS: true
+  })
+
+  const link = document.createElement('a')
+  link.download = `${tripPlan.value.city} Travel Plan.png`
+  link.href = canvas.toDataURL('image/png')
+  link.click()
+  message.success('Export successful!')
+}
+```
+
+`scale: 2` renders at 2x resolution, making the exported image sharper. `useCORS: true` tells `html2canvas` to load cross-origin images with CORS, which matters for the attraction images (from Unsplash) because they are served from a different origin.
+
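+Exporting as PDF requires additional steps: render the DOM to a Canvas, convert the Canvas to an image, and add the image to the PDF. The `captureMapImage` and `restoreMap` helpers called below handle the map before and after rendering; their implementations are not listed in this section: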
+
+```typescript
+import jsPDF from 'jspdf'
+
+const exportAsPDF = async () => {
+  // First capture map image
+  await captureMapImage()
+
+  const element = document.getElementById('trip-plan-content')
+  if (!element) return
+
+  const canvas = await html2canvas(element, {
+    backgroundColor: '#ffffff',
+    scale: 2,
+    useCORS: true,
+    allowTaint: true
+  })
+
+  // Restore map
+  restoreMap()
+
+  const pdf = new jsPDF('p', 'mm', 'a4')
+  const imgData = canvas.toDataURL('image/png')
+  const imgWidth = 210  // A4 width
+  const imgHeight = (canvas.height * imgWidth) / canvas.width
+
+  pdf.addImage(imgData, 'PNG', 0, 0, imgWidth, imgHeight)
+  pdf.save(`${tripPlan.value.city} Travel Plan.pdf`)
+  message.success('Export successful!')
+}
+```
+
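+Here we need to calculate the image height to maintain the aspect ratio. The width of A4 paper is 210mm, and we calculate the corresponding height based on the Canvas aspect ratio. Note that `addImage` places the whole image on a single page, so for a long itinerary any part beyond the 297mm page height is cut off.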
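+
+For long itineraries the scaled image can be taller than one A4 page. The following is a minimal sketch of the arithmetic for splitting it across pages, assuming A4 portrait (210mm by 297mm). `scaledHeightMm` and `pageOffsetsMm` are hypothetical helper names, not part of the project:
+
+```typescript
+const A4_WIDTH_MM = 210
+const A4_HEIGHT_MM = 297
+
+// Height of the image in mm once it is scaled to the A4 width.
+const scaledHeightMm = (canvasWidth: number, canvasHeight: number): number =>
+  (canvasHeight * A4_WIDTH_MM) / canvasWidth
+
+// Hypothetical helper: the vertical offset (in mm) at which the full image
+// would be drawn on each page so that every page shows the next slice.
+const pageOffsetsMm = (canvasWidth: number, canvasHeight: number): number[] => {
+  const heightMm = scaledHeightMm(canvasWidth, canvasHeight)
+  const pages = Math.max(1, Math.ceil(heightMm / A4_HEIGHT_MM))
+  return Array.from({ length: pages }, (_, page) => 0 - page * A4_HEIGHT_MM)
+}
+
+// A 1000 x 3000 px canvas scales to 210mm x 630mm, which needs three pages.
+const offsets = pageOffsetsMm(1000, 3000)
+```
+
+One way to use these offsets with jsPDF is to draw the same image once per page, calling `pdf.addPage()` before every page after the first and passing the offset as the `y` argument of `pdf.addImage`. This slices the image at fixed heights, so a line of text may be split across two pages.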
+
+### 13.6.5 Side Navigation and Anchor Jumping
+
+The Result page has a lot of content, including itinerary overview, budget details, map, daily itinerary, weather information, etc. Without help, users who want to reach a particular section have to scroll a long way. We provide side navigation with anchor jumping so that users can jump straight to the section they want.
+
+Side navigation uses Ant Design Vue's Menu component:
+
+```vue
+<a-menu
+  :selectedKeys="[activeSection]"
+  mode="inline"
+  @click="scrollToSection"
+>
+  <a-menu-item key="overview">📋 Itinerary Overview</a-menu-item>
+  <a-menu-item key="budget">💰 Budget Details</a-menu-item>
+  <a-menu-item key="map">🗺️ Map</a-menu-item>
+  <a-menu-item key="days">📅 Daily Itinerary</a-menu-item>
+  <a-menu-item key="weather">🌤️ Weather</a-menu-item>
+</a-menu>
+```
+
+Clicking a menu item calls the `scrollToSection` function, which records the active section and scrolls to it:
+
+```typescript
+const activeSection = ref('overview')
+
+// Scroll to specified section
+const scrollToSection = ({ key }: { key: string }) => {
+  activeSection.value = key
+  const element = document.getElementById(key)
+  if (element) {
+    element.scrollIntoView({ behavior: 'smooth', block: 'start' })
+  }
+}
+```
+
+`scrollIntoView` is a native browser API that can scroll an element into the visible area. `behavior: 'smooth'` means smooth scrolling rather than instant jumping. `block: 'start'` means the top of the element aligns with the top of the visible area.
+
+Each section of the page needs an `id` that matches the `key` of its menu item:
+
+```vue
+<div id="overview">
+  <!-- Itinerary overview content -->
+</div>
+
+<div id="budget">
+  <!-- Budget details content -->
+</div>
+
+<div id="map">
+  <!-- Map content -->
+</div>
+```
+
+This way, when users click a menu item in the side navigation, the page will smoothly scroll to the corresponding section.
+
+Through the implementation of these features, our intelligent travel assistant not only generates travel plans but also provides rich interactive features: budget calculation lets users understand costs, loading progress bar makes waiting less anxious, itinerary editing makes plans more personalized, export functionality allows plans to be shared and saved, and side navigation makes long pages easy to browse. The combination of these features forms a complete, user-friendly, and practical Web application.
+
+## 13.7 Conclusion
+
+Congratulations on completing Chapter 13!
+
+Through this chapter, you not only learned how to build a complete intelligent travel assistant application, but more importantly, you mastered:
+
+1. **System Design Thinking**: How to decompose complex problems into multiple simple tasks
+2. **Engineering Practice Ability**: How to transform theoretical knowledge into runnable code
+3. **Full-Stack Development Ability**: How to integrate front-end and back-end technology stacks
+4. **AI Application Development**: How to use LLMs to build practical applications
+
+This project is a starting point, not an endpoint. Based on this project, you can:
+
+- Add more features
+- Optimize user experience
+- Extend to other domains (such as intelligent shopping assistant, intelligent learning assistant, etc.)
+- Deploy to production environment to serve real users
+
+The best way to learn is through practice. Don't just read the code - modify, extend, and optimize it yourself. Each practice will deepen your understanding of multi-Agent systems.
+
+Wishing you success on your journey in AI application development!
+

+ 153 - 149
docs/chapter13/第十三章 智能旅行助手.md

@@ -1,6 +1,10 @@
+<div align="right">
+  <a href="./Chapter13-Intelligent-Travel-Assistant.md">English</a> | 中文
+</div>
+
 # 第十三章 智能旅行助手
 
-在前面的章节中,我们从零开始构建了HelloAgents框架,实现了多种智能体范式、工具系统、记忆机制、协议通信和性能评估等核心功能。从本章开始,我们将进入一个全新的阶段:<strong>将所学知识融会贯通,构建完整的实用应用。</strong>
+在前面的章节中,我们从零开始构建了 HelloAgents 框架,实现了多种智能体范式、工具系统、记忆机制、协议通信和性能评估等核心功能。从本章开始,我们将进入一个全新的阶段:<strong>将所学知识融会贯通,构建完整的实用应用。</strong>
 
 还记得在第一章中,我们构建的第一个智能体吗?那是一个简单的智能旅行助手,展示了`Thought-Action-Observation`循环的基本原理。本章的智能旅行助手将是一个完整的项目,包含以下核心功能:
 
@@ -12,7 +16,7 @@
 
 <strong>(4)行程编辑</strong>:支持添加、删除、调整景点,实时更新地图。
 
-<strong>(5)导出功能</strong>:支持导出为PDF或图片,方便保存和分享。
+<strong>(5)导出功能</strong>:支持导出为 PDF 或图片,方便保存和分享。
 
 
 
@@ -24,13 +28,13 @@
 
 传统的旅行规划方式有几个痛点。首先是<strong>信息分散</strong>。景点信息在旅游网站上,天气信息在天气网站上,酒店信息在预订网站上,你需要在多个网站之间切换,手动整合这些信息。其次是<strong>缺少个性化</strong>。大部分攻略都是通用的,不考虑你的个人偏好、预算限制、出行时间等因素。最后是<strong>难以调整</strong>。当你想修改行程时,可能需要重新规划整个行程,因为景点的顺序、时间安排、预算都是相互关联的。
 
-AI技术为解决这些问题提供了新的可能。想象一下,你只需要告诉系统"我想去北京玩3天,喜欢历史文化,预算中等",系统就能自动为你生成一个完整的行程计划,包括每天去哪些景点、在哪里吃饭、住哪个酒店、需要多少预算。而且这个计划是可以调整的,你可以删除不喜欢的景点,调整游览顺序,系统会自动更新地图和预算。
+AI 技术为解决这些问题提供了新的可能。想象一下,你只需要告诉系统"我想去北京玩 3 天,喜欢历史文化,预算中等",系统就能自动为你生成一个完整的行程计划,包括每天去哪些景点、在哪里吃饭、住哪个酒店、需要多少预算。而且这个计划是可以调整的,你可以删除不喜欢的景点,调整游览顺序,系统会自动更新地图和预算。
 
-这就是我们要构建的智能旅行助手。它不仅仅是一个技术演示,而是一个真正有用的应用。通过这个项目,你会学到如何将AI技术应用到实际问题中,如何设计多智能体系统,如何构建完整的Web应用。
+这就是我们要构建的智能旅行助手。它不仅仅是一个技术演示,而是一个真正有用的应用。通过这个项目,你会学到如何将 AI 技术应用到实际问题中,如何设计多智能体系统,如何构建完整的 Web 应用。
 
 ### 13.1.2 技术架构概览
 
-系统采用经典的<strong>前后端分离架构</strong>,分为四个层次,如图13.1所示:
+系统采用经典的<strong>前后端分离架构</strong>,分为四个层次,如图 13.1 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-1.png" alt="" width="85%"/>
@@ -39,13 +43,13 @@ AI技术为解决这些问题提供了新的可能。想象一下,你只需要
 
 <strong>(1)前端层 (Vue3+TypeScript)</strong>:负责用户交互和数据展示,包括表单输入、结果展示、地图可视化。
 
-<strong>(2)后端层 (FastAPI)</strong>:负责API路由、数据验证、业务逻辑。
+<strong>(2)后端层 (FastAPI)</strong>:负责 API 路由、数据验证、业务逻辑。
 
-<strong>(3)智能体层 (HelloAgents)</strong>:负责任务分解、工具调用、结果整合。包含4个专门的Agent。
+<strong>(3)智能体层 (HelloAgents)</strong>:负责任务分解、工具调用、结果整合。包含 4 个专门的 Agent。
 
-<strong>(4)外部服务层</strong>:提供数据和能力,包括高德地图API、Unsplash API、LLM API。
+<strong>(4)外部服务层</strong>:提供数据和能力,包括高德地图 API、Unsplash API、LLM API。
 
-数据流转过程如下:用户在前端填写表单 → 后端验证数据 → 调用智能体系统 → 智能体依次调用景点搜索、天气查询、酒店推荐、行程规划Agent → 每个Agent通过MCP协议调用外部API → 整合结果返回前端 → 前端渲染展示。
+数据流转过程如下:用户在前端填写表单 → 后端验证数据 → 调用智能体系统 → 智能体依次调用景点搜索、天气查询、酒店推荐、行程规划 Agent → 每个 Agent 通过 MCP 协议调用外部 API → 整合结果返回前端 → 前端渲染展示。
 
 项目的结构参考如下,提供便于定位源码:
 ```
@@ -70,25 +74,25 @@ helloagents-trip-planner/
 
 详细的架构设计和数据流转将在后续章节中介绍。
 
-### 13.1.3 快速体验:5分钟运行项目
+### 13.1.3 快速体验:5 分钟运行项目
 
 在深入学习实现细节之前,让我们先把项目跑起来,看看最终的效果。这样你会对整个系统有一个直观的认识。
 
 <strong>环境要求:</strong>
 
-- Python 3.10或更高版本
-- Node.js 16.0或更高版本
-- npm 8.0或更高版本
+- Python 3.10 或更高版本
+- Node.js 16.0 或更高版本
+- npm 8.0 或更高版本
 
-<strong>获取API密钥:</strong>
+<strong>获取 API 密钥:</strong>
 
-你需要准备以下API密钥:
+你需要准备以下 API 密钥:
 
-- LLM的API(OpenAI、DeepSeek等)
-- 高德地图Web服务Key:访问 https://console.amap.com/ 注册并创建应用
+- LLM  API(OpenAI、DeepSeek 等)
+- 高德地图 Web 服务 Key:访问 https://console.amap.com/ 注册并创建应用
 - Unsplash Access Key:访问 https://unsplash.com/developers 注册并创建应用
 
-将所有API密钥放入`.env`文件。
+将所有 API 密钥放入`.env`文件。
 
 启动后端:
 
@@ -109,7 +113,7 @@ uvicorn app.api.main:app --reload
 python run.py
 ```
 
-成功启动后,访问 http://localhost:8000/docs 可以看到API文档。
+成功启动后,访问 http://localhost:8000/docs 可以看到 API 文档。
 
 打开新的终端窗口:
 
@@ -128,14 +132,14 @@ npm run dev
 
 体验核心功能:
 
-首先需在首页表单中填写目的地城市、旅行日期、偏好、预算、交通及住宿类型等信息。点击“开始规划”按钮后,系统会显示加载进度条,并很快生成结果页面,如图13.2所示。
+首先需在首页表单中填写目的地城市、旅行日期、偏好、预算、交通及住宿类型等信息。点击“开始规划”按钮后,系统会显示加载进度条,并很快生成结果页面,如图 13.2 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-2.png" alt="" width="85%"/>
   <p>图 13.2 旅行助手规划进行页面</p>
 </div>
 
-随后加载成功,该页面会清晰展示行程概览、预算明细、景点地图、每日行程详情和天气信息,如图13.3,13.4所示。
+随后加载成功,该页面会清晰展示行程概览、预算明细、景点地图、每日行程详情和天气信息,如图 13.3,13.4 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-3.png" alt="" width="85%"/>
@@ -147,7 +151,7 @@ npm run dev
   <p>图 13.4 旅行助手规划完成页面</p>
 </div>
 
-如果用户需要个性化调整,可以点击“编辑行程”按钮,自由调整景点顺序或删除某个景点,如图13.5所示。规划完成后,通过“导出行程”下拉菜单,即可将最终方案轻松保存为图片或PDF文件,方便随时查阅。
+如果用户需要个性化调整,可以点击“编辑行程”按钮,自由调整景点顺序或删除某个景点,如图 13.5 所示。规划完成后,通过“导出行程”下拉菜单,即可将最终方案轻松保存为图片或 PDF 文件,方便随时查阅。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-5.png" alt="" width="85%"/>
@@ -156,19 +160,19 @@ npm run dev
 
 ## 13.2 数据模型设计
 
-### 13.2.1 Web应用中的数据流转
+### 13.2.1 Web 应用中的数据流转
 
 在构建智能旅行助手时,我们需要解决一个核心问题:<strong>如何表示和传递旅行计划数据?</strong> 
 
-我们需要理解一个完整的Web应用中数据是如何流转的。想象一下,当用户在浏览器中点击"开始规划"按钮时,会发生什么?
+我们需要理解一个完整的 Web 应用中数据是如何流转的。想象一下,当用户在浏览器中点击"开始规划"按钮时,会发生什么?
 
-用户在前端填写的表单数据(目的地、日期、预算等)需要通过HTTP请求发送到后端服务器。后端接收到数据后,会调用智能体系统进行处理。智能体又会调用高德地图API、Unsplash API等外部服务获取数据。这些外部API返回的数据格式各不相同,有的用`lng`,有的用`lon`,有的用`longitude`。最后,后端需要将处理好的数据返回给前端,前端再渲染成用户看到的页面。
+用户在前端填写的表单数据(目的地、日期、预算等)需要通过 HTTP 请求发送到后端服务器。后端接收到数据后,会调用智能体系统进行处理。智能体又会调用高德地图 API、Unsplash API 等外部服务获取数据。这些外部 API 返回的数据格式各不相同,有的用`lng`,有的用`lon`,有的用`longitude`。最后,后端需要将处理好的数据返回给前端,前端再渲染成用户看到的页面。
 
-在这个过程中,数据经历了多次转换:前端表单 → HTTP请求 → 后端Python对象 → 外部API响应 → 后端Python对象 → HTTP响应 → 前端TypeScript对象 → 页面展示。如果没有统一的数据格式,每一步转换都可能出错。这就是为什么我们需要<strong>数据模型</strong>。
+在这个过程中,数据经历了多次转换:前端表单 → HTTP 请求 → 后端 Python 对象 → 外部 API 响应 → 后端 Python 对象 → HTTP 响应 → 前端 TypeScript 对象 → 页面展示。如果没有统一的数据格式,每一步转换都可能出错。这就是为什么我们需要<strong>数据模型</strong>。
 
-### 13.2.2 从字典到Pydantic模型
+### 13.2.2 从字典到 Pydantic 模型
 
-让我们从第一章的简单原型开始。在那个原型中,我们使用Python字典来表示景点数据:
+让我们从第一章的简单原型开始。在那个原型中,我们使用 Python 字典来表示景点数据:
 
 ```python
 # 第一章的做法:使用字典
@@ -182,13 +186,13 @@ attraction = {
 lng = attraction["location"]["lng"]
 ```
 
-这种方式在原型阶段很方便,但在实际项目中会遇到很多问题。首先是<strong>字段名不统一</strong>的问题。高德地图API返回的位置数据是`"116.397128,39.916527"`这样的字符串,需要手动分割成经纬度。而Unsplash API可能使用`longitude`和`latitude`。如果我们在代码中到处都用字典,就需要在每个地方都处理这些差异。
+这种方式在原型阶段很方便,但在实际项目中会遇到很多问题。首先是<strong>字段名不统一</strong>的问题。高德地图 API 返回的位置数据是`"116.397128,39.916527"`这样的字符串,需要手动分割成经纬度。而 Unsplash API 可能使用`longitude`和`latitude`。如果我们在代码中到处都用字典,就需要在每个地方都处理这些差异。
 
-其次是<strong>类型安全</strong>的问题。假设我们不小心把`price`写成了字符串`"60"`,在Python中这不会立即报错,但在计算总预算时就会出问题。更糟糕的是,这种错误只能在运行时才能发现,而且错误信息可能很难定位。
+其次是<strong>类型安全</strong>的问题。假设我们不小心把`price`写成了字符串`"60"`,在 Python 中这不会立即报错,但在计算总预算时就会出问题。更糟糕的是,这种错误只能在运行时才能发现,而且错误信息可能很难定位。
 
 最后是<strong>维护性</strong>的问题。当我们需要给景点添加新字段(比如`rating`评分)时,需要在代码的多个地方修改。如果遗漏了某个地方,就会导致数据不一致。
 
-Pydantic提供了一个解决方案。它是Python的数据验证库,可以让我们用类来定义数据结构,并自动处理验证、转换和序列化。让我们看一个简单的例子:
+Pydantic 提供了一个解决方案。它是 Python 的数据验证库,可以让我们用类来定义数据结构,并自动处理验证、转换和序列化。让我们看一个简单的例子:
 
 ```python
 from pydantic import BaseModel,Field
@@ -213,13 +217,13 @@ attraction = Attraction(
 lng = attraction.location.longitude  # IDE会提供代码补全
 ```
 
-这样做有几个好处。首先,如果我们传入了错误的类型(比如把`ticket_price`设为字符串),Pydantic会立即抛出异常,告诉我们哪里出错了。其次,IDE可以根据类型定义提供代码补全和类型检查,大大减少了拼写错误。最后,当我们需要修改数据结构时,只需要修改类定义,所有使用这个类的地方都会自动更新。
+这样做有几个好处。首先,如果我们传入了错误的类型(比如把`ticket_price`设为字符串),Pydantic 会立即抛出异常,告诉我们哪里出错了。其次,IDE 可以根据类型定义提供代码补全和类型检查,大大减少了拼写错误。最后,当我们需要修改数据结构时,只需要修改类定义,所有使用这个类的地方都会自动更新。
 
-### 13.2.3 Pydantic的核心概念
+### 13.2.3 Pydantic 的核心概念
 
-在深入设计我们的数据模型之前,让我们先了解Pydantic的几个核心概念。Pydantic的基础是`BaseModel`类,所有的数据模型都需要继承这个类。每个字段都可以指定类型,Pydantic会自动进行类型检查和转换。
+在深入设计我们的数据模型之前,让我们先了解 Pydantic 的几个核心概念。Pydantic 的基础是`BaseModel`类,所有的数据模型都需要继承这个类。每个字段都可以指定类型,Pydantic 会自动进行类型检查和转换。
 
-字段定义使用`Field`函数,它可以指定默认值、描述、验证规则等。`...`表示这个字段是必填的,如果创建对象时没有提供这个字段,Pydantic会抛出异常。我们也可以使用`Optional`来表示可选字段,或者直接提供默认值。
+字段定义使用`Field`函数,它可以指定默认值、描述、验证规则等。`...`表示这个字段是必填的,如果创建对象时没有提供这个字段,Pydantic 会抛出异常。我们也可以使用`Optional`来表示可选字段,或者直接提供默认值。
 
 ```python
 from pydantic import BaseModel,Field
@@ -232,7 +236,7 @@ class Attraction(BaseModel):
     description: Optional[str] = None  # 可选字段
 ```
 
-Pydantic还支持嵌套模型和列表。我们可以在一个模型中使用另一个模型作为字段类型,这样就可以构建复杂的数据结构。比如,一个景点包含位置信息,一个行程包含多个景点。
+Pydantic 还支持嵌套模型和列表。我们可以在一个模型中使用另一个模型作为字段类型,这样就可以构建复杂的数据结构。比如,一个景点包含位置信息,一个行程包含多个景点。
 
 ```python
 class DayPlan(BaseModel):
@@ -241,7 +245,7 @@ class DayPlan(BaseModel):
     hotel: Optional[Hotel] = None  # 可选的酒店信息
 ```
 
-最强大的功能之一是<strong>自定义验证器</strong>。有时候外部API返回的数据格式不符合我们的要求,我们可以使用`field_validator`装饰器来自定义验证和转换逻辑。比如,高德地图返回的温度是`"16°C"`这样的字符串,我们需要把它转换成数字:
+最强大的功能之一是<strong>自定义验证器</strong>。有时候外部 API 返回的数据格式不符合我们的要求,我们可以使用`field_validator`装饰器来自定义验证和转换逻辑。比如,高德地图返回的温度是`"16°C"`这样的字符串,我们需要把它转换成数字:
 
 ```python
 from pydantic import field_validator
@@ -385,9 +389,9 @@ class TripPlan(BaseModel):
 
 这样,我们就完成了整个数据模型的设计。从最基础的`Location`,到`Attraction`、`Meal`、`Hotel`,再到`DayPlan`,最后到`TripPlan`,形成了一个清晰的层次结构。
 
-### 13.2.5 数据模型在Web应用中的应用
+### 13.2.5 数据模型在 Web 应用中的应用
 
-现在让我们看看这些数据模型如何在实际的Web应用中使用。在FastAPI中,Pydantic模型可以直接用作请求和响应的类型定义。FastAPI会自动进行数据验证、序列化和文档生成。
+现在让我们看看这些数据模型如何在实际的 Web 应用中使用。在 FastAPI 中,Pydantic 模型可以直接用作请求和响应的类型定义。FastAPI 会自动进行数据验证、序列化和文档生成。
 
 ```python
 from fastapi import FastAPI
@@ -409,9 +413,9 @@ async def create_trip_plan(request: TripPlanRequest) -> TripPlan:
     return trip_plan
 ```
 
-当用户发送POST请求到`/api/trip/plan`时,FastAPI会自动将JSON数据转换成`TripPlanRequest`对象。如果数据格式不正确(比如缺少必填字段,或者类型不匹配),FastAPI会自动返回422错误,并告诉用户哪里出错了。
+当用户发送 POST 请求到`/api/trip/plan`时,FastAPI 会自动将 JSON 数据转换成`TripPlanRequest`对象。如果数据格式不正确(比如缺少必填字段,或者类型不匹配),FastAPI 会自动返回 422 错误,并告诉用户哪里出错了。
 
-在前端,我们也需要定义对应的TypeScript类型。虽然TypeScript和Python是不同的语言,但数据结构是一样的:
+在前端,我们也需要定义对应的 TypeScript 类型。虽然 TypeScript 和 Python 是不同的语言,但数据结构是一样的:
 
 ```typescript
 interface Location {
@@ -435,21 +439,21 @@ interface TripPlan {
 }
 ```
 
-这样,前后端就使用了统一的数据格式。当后端返回`TripPlan`对象时,前端可以直接使用,不需要任何转换。TypeScript的类型检查也能帮助我们避免很多错误。
+这样,前后端就使用了统一的数据格式。当后端返回`TripPlan`对象时,前端可以直接使用,不需要任何转换。TypeScript 的类型检查也能帮助我们避免很多错误。
 
 ## 13.3 多智能体协作设计
 
 ### 13.3.1 为何需要多智能体
 
-在第七章中,我们学习了如何使用SimpleAgent来构建智能体。SimpleAgent的设计理念是简单直接:每次调用`run()`方法时,Agent会分析用户的问题,决定是否需要调用工具,然后返回结果。这种设计在处理简单任务时非常有效,但当面对旅行规划这样的任务时,就会遇到一些问题。
+在第七章中,我们学习了如何使用 SimpleAgent 来构建智能体。SimpleAgent 的设计理念是简单直接:每次调用`run()`方法时,Agent 会分析用户的问题,决定是否需要调用工具,然后返回结果。这种设计在处理简单任务时非常有效,但当面对旅行规划这样的任务时,就会遇到一些问题。
 
-如果用单个Agent来完成旅行规划。这个Agent需要做什么呢?首先,它要搜索景点信息,这需要调用高德地图的POI搜索工具。然后,它要查询天气信息,这需要调用天气查询工具。接着,它要搜索酒店信息,这又需要调用POI搜索工具。最后,它要把所有这些信息整合起来,生成一个完整的旅行计划。
+如果用单个 Agent 来完成旅行规划。这个 Agent 需要做什么呢?首先,它要搜索景点信息,这需要调用高德地图的 POI 搜索工具。然后,它要查询天气信息,这需要调用天气查询工具。接着,它要搜索酒店信息,这又需要调用 POI 搜索工具。最后,它要把所有这些信息整合起来,生成一个完整的旅行计划。
 
-这听起来很简单,但实际操作时会遇到第一个问题:<strong>工具调用的限制</strong>。SimpleAgent每次`run()`调用只能执行一个工具。这意味着我们需要多次调用`run()`方法,每次调用处理一个任务。但这样做会带来一个新问题:如何在多次调用之间传递信息?第一次调用得到的景点信息,如何传递给第二次调用?我们需要手动管理这些中间结果,代码会变得很复杂。
+这听起来很简单,但实际操作时会遇到第一个问题:<strong>工具调用的限制</strong>。SimpleAgent 每次`run()`调用只能执行一个工具。这意味着我们需要多次调用`run()`方法,每次调用处理一个任务。但这样做会带来一个新问题:如何在多次调用之间传递信息?第一次调用得到的景点信息,如何传递给第二次调用?我们需要手动管理这些中间结果,代码会变得很复杂。
 
-当然,我们可以使用ReactAgent来解决这个问题。ReactAgent可以在一次调用中执行多个工具,它会自动进行多轮思考和行动。但这又带来了新的问题:<strong>时间成本</strong>。ReactAgent的每一轮思考都需要调用LLM,如果需要调用三个工具,就需要至少三轮思考,这意味着至少三次LLM调用。而且这些调用是串行的,必须等前一个完成才能开始下一个,总时间会很长。
+当然,我们可以使用 ReactAgent 来解决这个问题。ReactAgent 可以在一次调用中执行多个工具,它会自动进行多轮思考和行动。但这又带来了新的问题:<strong>时间成本</strong>。ReactAgent 的每一轮思考都需要调用 LLM,如果需要调用三个工具,就需要至少三轮思考,这意味着至少三次 LLM 调用。而且这些调用是串行的,必须等前一个完成才能开始下一个,总时间会很长。
 
-第二个问题是<strong>提示词的复杂度</strong>。如果我们要让一个Agent完成所有任务,就需要在提示词中详细描述每个任务的执行逻辑。比如:
+第二个问题是<strong>提示词的复杂度</strong>。如果我们要让一个 Agent 完成所有任务,就需要在提示词中详细描述每个任务的执行逻辑。比如:
 
 ```python
 COMPLEX_PROMPT = """你是旅行规划助手。你需要:
@@ -461,30 +465,30 @@ COMPLEX_PROMPT = """你是旅行规划助手。你需要:
 """
 ```
 
-这样的提示词有几个问题。首先是<strong>难以维护</strong>。如果我们想修改景点搜索的逻辑(比如增加评分筛选),就需要修改整个提示词,很容易影响到其他部分。其次是<strong>容易出错</strong>。LLM需要同时理解多个任务的要求,很容易搞混不同任务的格式和参数。最后是<strong>难以调试</strong>。当生成的计划不符合预期时,我们很难知道是哪个环节出了问题,是景点搜索不准确,还是天气查询失败,还是整合逻辑有问题?
+这样的提示词有几个问题。首先是<strong>难以维护</strong>。如果我们想修改景点搜索的逻辑(比如增加评分筛选),就需要修改整个提示词,很容易影响到其他部分。其次是<strong>容易出错</strong>。LLM 需要同时理解多个任务的要求,很容易搞混不同任务的格式和参数。最后是<strong>难以调试</strong>。当生成的计划不符合预期时,我们很难知道是哪个环节出了问题,是景点搜索不准确,还是天气查询失败,还是整合逻辑有问题?
 
-面对这些问题,一个自然的想法是:能不能把复杂的任务分解成多个简单的任务,让不同的Agent各司其职?这就是多Agent协作的核心思想。
+面对这些问题,一个自然的想法是:能不能把复杂的任务分解成多个简单的任务,让不同的 Agent 各司其职?这就是多 Agent 协作的核心思想。
 
 想象一下现实世界中的旅行社。当你去旅行社咨询旅行计划时,不会只有一个人为你服务。通常会有专门的景点顾问,负责推荐景点;有酒店顾问,负责预订酒店;还有行程规划师,负责把所有信息整合成完整的行程。每个人都专注于自己擅长的领域,最后由行程规划师把所有信息汇总。这种分工协作的方式,比让一个人做所有事情要高效得多。
 
-### 13.3.2 Agent角色设计
+### 13.3.2 Agent 角色设计
 
-基于任务分解原则,我们设计了四个专门的Agent,如图13.6所示:
+基于任务分解原则,我们设计了四个专门的 Agent,如图 13.6 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-6.png" alt="" width="85%"/>
   <p>图 13.6 多智能体协作流程</p>
 </div>
 
-- <strong>AttractionSearchAgent(景点搜索专家)</strong>专注于搜索景点信息。它只需要理解用户的偏好(比如"历史文化"、"自然风光"),然后调用高德地图的POI搜索工具,返回相关的景点列表。它的提示词很简单,只需要说明如何根据偏好选择关键词,如何调用工具。
+- <strong>AttractionSearchAgent(景点搜索专家)</strong>专注于搜索景点信息。它只需要理解用户的偏好(比如"历史文化"、"自然风光"),然后调用高德地图的 POI 搜索工具,返回相关的景点列表。它的提示词很简单,只需要说明如何根据偏好选择关键词,如何调用工具。
 
 - <strong>WeatherQueryAgent(天气查询专家)</strong>专注于查询天气信息。它只需要知道城市名称,然后调用天气查询工具,返回未来几天的天气预报。它的任务非常明确,几乎不会出错。
 
-- <strong>HotelAgent(酒店推荐专家)</strong>专注于搜索酒店信息。它需要理解用户的住宿需求(比如"经济型"、"豪华型"),然后调用POI搜索工具,返回符合要求的酒店列表。
+- <strong>HotelAgent(酒店推荐专家)</strong>专注于搜索酒店信息。它需要理解用户的住宿需求(比如"经济型"、"豪华型"),然后调用 POI 搜索工具,返回符合要求的酒店列表。
 
-- <strong>PlannerAgent(行程规划专家)</strong>负责整合所有信息。它接收前三个Agent的输出,加上用户的原始需求(日期、预算等),然后生成完整的旅行计划。它不需要调用任何外部工具,只需要专注于信息的整合和行程的安排。
+- <strong>PlannerAgent(行程规划专家)</strong>负责整合所有信息。它接收前三个 Agent 的输出,加上用户的原始需求(日期、预算等),然后生成完整的旅行计划。它不需要调用任何外部工具,只需要专注于信息的整合和行程的安排。
 
-现在让我们详细设计每个Agent的角色和提示词。设计提示词时,我们需要考虑几个关键问题:这个Agent需要什么输入?它应该产生什么输出?它需要调用什么工具?它可能遇到什么问题?
+现在让我们详细设计每个 Agent 的角色和提示词。设计提示词时,我们需要考虑几个关键问题:这个 Agent 需要什么输入?它应该产生什么输出?它需要调用什么工具?它可能遇到什么问题?
 
 <strong>AttractionSearchAgent</strong>的任务是根据用户偏好搜索景点。它的输入是城市名称和用户偏好(比如"历史文化"、"自然风光")。它需要调用`amap_maps_text_search`工具,参数是关键词和城市。它的输出是景点列表,包含名称、地址、评分等信息。
 
@@ -530,7 +534,7 @@ HOTEL_AGENT_PROMPT = """你是酒店推荐专家。
 """
 ```
 
-<strong>PlannerAgent</strong>是最复杂的,因为它需要整合所有信息。它的输入是用户需求和前三个Agent的输出,输出是完整的旅行计划(JSON格式)。
+<strong>PlannerAgent</strong>是最复杂的,因为它需要整合所有信息。它的输入是用户需求和前三个 Agent 的输出,输出是完整的旅行计划(JSON 格式)。
 
 ```python
 PLANNER_AGENT_PROMPT = """你是行程规划专家。
@@ -558,9 +562,9 @@ PLANNER_AGENT_PROMPT = """你是行程规划专家。
 """
 ```
 
-### 13.3.3 Agent协作流程
+### 13.3.3 Agent 协作流程
 
-现在让我们看看这四个Agent如何协作完成旅行规划任务。整个流程可以分为五个步骤:
+现在让我们看看这四个 Agent 如何协作完成旅行规划任务。整个流程可以分为五个步骤:
 
 ```python
 class TripPlannerAgent:
@@ -597,11 +601,11 @@ class TripPlannerAgent:
         return trip_plan
 ```
 
-这个流程顺序执行四个步骤,每个步骤的输出作为下一个步骤的输入。注意我们使用了`TripPlanRequest`和`TripPlan`这两个Pydantic模型,这是在13.2节中定义的。
+这个流程顺序执行四个步骤,每个步骤的输出作为下一个步骤的输入。注意我们使用了`TripPlanRequest`和`TripPlan`这两个 Pydantic 模型,这是在 13.2 节中定义的。
 
 ### 13.3.4 查询构建
 
-PlannerAgent需要整合所有信息,这个查询需要包含所有必要的信息,而且要组织得清晰有序,让LLM能够准确理解。
+PlannerAgent 需要整合所有信息,这个查询需要包含所有必要的信息,而且要组织得清晰有序,让 LLM 能够准确理解。
 
 ```python
 def _build_planner_query(
@@ -637,15 +641,15 @@ def _build_planner_query(
 """
 ```
 
-通过这种多Agent协作的设计,我们把一个复杂的旅行规划任务分解成了四个简单的子任务。每个Agent都专注于自己擅长的领域,也为未来的功能扩展(比如添加餐厅推荐Agent、交通规划Agent)打下了良好的基础。
+通过这种多 Agent 协作的设计,我们把一个复杂的旅行规划任务分解成了四个简单的子任务。每个 Agent 都专注于自己擅长的领域,也为未来的功能扩展(比如添加餐厅推荐 Agent、交通规划 Agent)打下了良好的基础。
 
-## 13.4 MCP工具集成详解
+## 13.4 MCP 工具集成详解
 
-### 13.4.1 为什么不直接调用API
+### 13.4.1 为什么不直接调用 API
 
-在13.3节中,我们设计了四个Agent来协作完成旅行规划任务。其中AttractionSearchAgent、WeatherQueryAgent和HotelAgent都需要调用高德地图的API来获取数据。一个自然的问题是:为什么不直接在Agent中调用高德地图的HTTP API?
+在 13.3 节中,我们设计了四个 Agent 来协作完成旅行规划任务。其中 AttractionSearchAgent、WeatherQueryAgent 和 HotelAgent 都需要调用高德地图的 API 来获取数据。一个自然的问题是:为什么不直接在 Agent 中调用高德地图的 HTTP API?
 
-让我们先看看直接调用API会是什么样子。高德地图提供了POI搜索API,我们需要构造HTTP请求,传递参数,解析响应:
+让我们先看看直接调用 API 会是什么样子。高德地图提供了 POI 搜索 API,我们需要构造 HTTP 请求,传递参数,解析响应:
 
 ```python
 import requests
@@ -664,31 +668,31 @@ def search_poi(keywords: str,city: str,api_key: str):
     return data
 ```
 
-这种方式看起来很简单,但在实际使用中会遇到几个问题。首先是<strong>Agent无法自主调用</strong>。在我们的HelloAgents框架中,Agent通过识别提示词中的工具调用标记(比如`[TOOL_CALL:tool_name:arg1=value1]`)来调用工具。如果我们直接在代码中调用API,Agent就失去了自主决策的能力,变成了一个简单的函数调用。
+这种方式看起来很简单,但在实际使用中会遇到几个问题。首先是<strong>Agent 无法自主调用</strong>。在我们的 HelloAgents 框架中,Agent 通过识别提示词中的工具调用标记(比如`[TOOL_CALL:tool_name:arg1=value1]`)来调用工具。如果我们直接在代码中调用 API,Agent 就失去了自主决策的能力,变成了一个简单的函数调用。
 
-其次是<strong>参数传递复杂</strong>。高德地图的API有很多参数,比如POI搜索有`keywords`、`city`、`types`、`offset`、`page`等十几个参数。如果我们要让Agent能够灵活使用这些参数,就需要在提示词中详细说明每个参数的含义和格式,这会让提示词变得非常复杂。
+其次是<strong>参数传递复杂</strong>。高德地图的 API 有很多参数,比如 POI 搜索有`keywords`、`city`、`types`、`offset`、`page`等十几个参数。如果我们要让 Agent 能够灵活使用这些参数,就需要在提示词中详细说明每个参数的含义和格式,这会让提示词变得非常复杂。
 
-第三是<strong>响应解析困难</strong>。高德地图API返回的是JSON格式的数据,结构比较复杂。我们需要编写代码来解析这些数据,提取我们需要的字段。如果API的响应格式发生变化,我们就需要修改解析代码。
+第三是<strong>响应解析困难</strong>。高德地图 API 返回的是 JSON 格式的数据,结构比较复杂。我们需要编写代码来解析这些数据,提取我们需要的字段。如果 API 的响应格式发生变化,我们就需要修改解析代码。
 
-最后是<strong>工具管理混乱</strong>。高德地图提供了十几个不同的API(POI搜索、天气查询、路线规划等),如果我们为每个API都编写一个函数,然后手动注册到Agent的工具列表中,代码会变得很冗长。而且当我们想添加新的API时,需要修改多个地方。
+最后是<strong>工具管理混乱</strong>。高德地图提供了十几个不同的 API(POI 搜索、天气查询、路线规划等),如果我们为每个 API 都编写一个函数,然后手动注册到 Agent 的工具列表中,代码会变得很冗长。而且当我们想添加新的 API 时,需要修改多个地方。
 
-### 13.4.2 高德地图MCP集成
+### 13.4.2 高德地图 MCP 集成
 
-MCP(Model Context Protocol)是Anthropic提出的标准化协议,用于连接LLM和外部工具。本节将介绍如何在项目中集成高德地图MCP服务器。我们的项目用的是`amap-mcp-server`,这是一个用Node.js实现的MCP服务器:
+MCP(Model Context Protocol)是 Anthropic 提出的标准化协议,用于连接 LLM 和外部工具。本节将介绍如何在项目中集成高德地图 MCP 服务器。我们的项目用的是`amap-mcp-server`,这是一个用 Node.js 实现的 MCP 服务器:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-7.png" alt="" width="85%"/>
-  <p>图 13.7 amap-mcp-server工具</p>
+  <p>图 13.7 amap-mcp-server 工具</p>
 </div>
 
-高德地图MCP服务器提供了多种工具,主要分为以下类别,如表13.1所示:
+高德地图 MCP 服务器提供了多种工具,主要分为以下类别,如表 13.1 所示:
 
 <div align="center">
-  <p>表 13.1 高德地图MCP工具分类</p>
+  <p>表 13.1 高德地图 MCP 工具分类</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-table-1.png" alt="" width="85%"/>
 </div>
 
-通过MCP协议,我们可以很方便地在HelloAgents中集成:
+通过 MCP 协议,我们可以很方便地在 HelloAgents 中集成:
 
 ```python
 from hello_agents.tools import MCPTool
@@ -706,11 +710,11 @@ mcp_tool = MCPTool(
 )
 ```
 
-这段代码做了什么呢?首先,`command`和`args`指定了如何启动MCP服务器。`npx -y @sugarforever/amap-mcp-server`会从npm仓库下载并运行`amap-mcp-server`这个包。`env`参数传递了环境变量,这里我们传递了高德地图的API密钥。
+这段代码做了什么呢?首先,`command`和`args`指定了如何启动 MCP 服务器。`npx -y @sugarforever/amap-mcp-server`会从 npm 仓库下载并运行`amap-mcp-server`这个包。`env`参数传递了环境变量,这里我们传递了高德地图的 API 密钥。
 
-当我们创建`MCPTool`对象时,它会在后台启动MCP服务器进程,并通过标准输入输出(stdin/stdout)与服务器通信。这是MCP协议的一个特点:使用进程间通信而不是HTTP,这样更高效,也更容易管理。
+当我们创建`MCPTool`对象时,它会在后台启动 MCP 服务器进程,并通过标准输入输出(stdin/stdout)与服务器通信。这是 MCP 协议的一个特点:使用进程间通信而不是 HTTP,这样更高效,也更容易管理。
 
-最关键的是`auto_expand=True`这个参数。当设置为True时,`MCPTool`会自动查询MCP服务器提供了哪些工具,然后为每个工具创建一个独立的Tool对象。这就是为什么我们只创建了一个`MCPTool`,但Agent却获得了16个工具。让我们看看这个过程:
+最关键的是`auto_expand=True`这个参数。当设置为 True 时,`MCPTool`会自动查询 MCP 服务器提供了哪些工具,然后为每个工具创建一个独立的 Tool 对象。这就是为什么我们只创建了一个`MCPTool`,但 Agent 却获得了 16 个工具。让我们看看这个过程:
 
 ```python
 # 创建一个MCPTool
@@ -722,16 +726,16 @@ print(list(agent.tools.keys()))
 # ['amap_maps_text_search', 'amap_maps_weather', ...]
 ```
 
-如图13.8所示,假设用户想搜索北京的景点,AttractionSearchAgent接收到查询"请搜索北京的历史文化景点"。Agent分析这个查询,决定调用`amap_maps_text_search`工具,参数是`keywords=景点,city=北京`。
+如图 13.8 所示,假设用户想搜索北京的景点,AttractionSearchAgent 接收到查询"请搜索北京的历史文化景点"。Agent 分析这个查询,决定调用`amap_maps_text_search`工具,参数是`keywords=景点,city=北京`。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/13-figures/13-8.png" alt="" width="85%"/>
-  <p>图 13.8 MCP工具调用流程</p>
+  <p>图 13.8 MCP 工具调用流程</p>
 </div>
 
-Agent生成工具调用标记:`[TOOL_CALL:amap_maps_text_search:keywords=景点,city=北京]`。HelloAgents框架解析这个标记,提取工具名称和参数,然后调用对应的Tool对象。
+Agent 生成工具调用标记:`[TOOL_CALL:amap_maps_text_search:keywords=景点,city=北京]`。HelloAgents 框架解析这个标记,提取工具名称和参数,然后调用对应的 Tool 对象。
 
-Tool对象是`MCPTool`自动创建的,它会把调用请求发送给MCP服务器。具体来说,它会构造一个JSON-RPC格式的消息,通过stdin发送给服务器进程:
+Tool 对象是`MCPTool`自动创建的,它会把调用请求发送给 MCP 服务器。具体来说,它会构造一个 JSON-RPC 格式的消息,通过 stdin 发送给服务器进程:
 
 ```json
 {
@@ -747,9 +751,9 @@ Tool对象是`MCPTool`自动创建的,它会把调用请求发送给MCP服务
 }
 ```
 
-MCP服务器接收到这个消息,解析参数,然后调用高德地图的HTTP API。它会构造HTTP请求,添加API密钥,发送请求,接收响应。
+MCP 服务器接收到这个消息,解析参数,然后调用高德地图的 HTTP API。它会构造 HTTP 请求,添加 API 密钥,发送请求,接收响应。
 
-高德地图API返回JSON格式的数据,包含景点列表、地址、坐标等信息。MCP服务器解析这些数据,提取关键字段,然后构造响应消息,通过stdout返回给`MCPTool`:
+高德地图 API 返回 JSON 格式的数据,包含景点列表、地址、坐标等信息。MCP 服务器解析这些数据,提取关键字段,然后构造响应消息,通过 stdout 返回给`MCPTool`:
 
 ```json
 {
@@ -765,19 +769,19 @@ MCP服务器接收到这个消息,解析参数,然后调用高德地图的HT
 }
 ```
 
-`MCPTool`接收到响应,提取文本内容,返回给Agent。Agent把这个结果作为工具调用的输出,继续生成最终的回复。
+`MCPTool`接收到响应,提取文本内容,返回给 Agent。Agent 把这个结果作为工具调用的输出,继续生成最终的回复。
 
-这个流程看起来很复杂,但对于Agent来说,它只需要知道有一个叫`amap_maps_text_search`的工具,可以搜索景点。所有的底层细节都被MCP协议和`MCPTool`封装起来了。
+这个流程看起来很复杂,但对于 Agent 来说,它只需要知道有一个叫`amap_maps_text_search`的工具,可以搜索景点。所有的底层细节都被 MCP 协议和`MCPTool`封装起来了。
 
-### 13.4.3 共享MCP实例
+### 13.4.3 共享 MCP 实例
 
-在我们的多Agent系统中,有三个Agent都需要使用高德地图的工具。那么每个Agent应该创建自己的`MCPTool`实例,还是共享同一个实例?
+在我们的多 Agent 系统中,有三个 Agent 都需要使用高德地图的工具。那么每个 Agent 应该创建自己的`MCPTool`实例,还是共享同一个实例?
 
-如果每个Agent都创建一个`MCPTool`实例,这意味着会有三个服务器进程同时运行。每个进程都会独立地调用高德地图API,这可能会超过API的速率限制。而且多个进程会占用更多的内存和CPU资源。
+如果每个 Agent 都创建一个`MCPTool`实例,这意味着会有三个服务器进程同时运行。每个进程都会独立地调用高德地图 API,这可能会超过 API 的速率限制。而且多个进程会占用更多的内存和 CPU 资源。
 
-更好的做法是让所有Agent共享同一个`MCPTool`实例。这样只需要启动一个MCP服务器进程,所有的API调用都通过这个进程进行。这不仅节省资源,还可以更好地控制API调用频率。
+更好的做法是让所有 Agent 共享同一个`MCPTool`实例。这样只需要启动一个 MCP 服务器进程,所有的 API 调用都通过这个进程进行。这不仅节省资源,还可以更好地控制 API 调用频率。
 
-在代码中,我们在`TripPlannerAgent`的构造函数中创建一个`MCPTool`实例,然后把它添加到每个子Agent的工具列表中:
+在代码中,我们在`TripPlannerAgent`的构造函数中创建一个`MCPTool`实例,然后把它添加到每个子 Agent 的工具列表中:
 
 ```python
 class TripPlannerAgent:
@@ -817,13 +821,13 @@ class TripPlannerAgent:
         self.hotel_agent.add_tool(self.mcp_tool)  # 共享
 ```
 
-这样,三个Agent都可以使用高德地图的16个工具,但底层只有一个MCP服务器进程在运行。当我们调用`TripPlannerAgent`的`plan_trip`方法时,三个Agent会依次调用工具,所有的请求都通过同一个MCP服务器发送到高德地图API。
+这样,三个 Agent 都可以使用高德地图的 16 个工具,但底层只有一个 MCP 服务器进程在运行。当我们调用`TripPlannerAgent`的`plan_trip`方法时,三个 Agent 会依次调用工具,所有的请求都通过同一个 MCP 服务器发送到高德地图 API。
 
-### 13.4.4 Unsplash图片API集成
+### 13.4.4 Unsplash 图片 API 集成
 
-除了高德地图,我们还需要为景点获取图片,让旅行计划更加生动直观。我们使用Unsplash API来搜索景点图片。需要注意的是,Unsplash是国外的服务,而且是为数不多可以免费使用的图片API,所以搜索结果可能不够准确。在实际项目中,可以考虑使用必应、百度或高德的POI图片API,但这些服务通常需要付费。
+除了高德地图,我们还需要为景点获取图片,让旅行计划更加生动直观。我们使用 Unsplash API 来搜索景点图片。需要注意的是,Unsplash 是国外的服务,而且是为数不多可以免费使用的图片 API,所以搜索结果可能不够准确。在实际项目中,可以考虑使用必应、百度或高德的 POI 图片 API,但这些服务通常需要付费。
 
-Unsplash API的集成比较简单,我们创建一个`UnsplashService`类来封装API调用:
+Unsplash API 的集成比较简单,我们创建一个`UnsplashService`类来封装 API 调用:
 
 ```python
 # backend/app/services/unsplash_service.py
@@ -877,7 +881,7 @@ class UnsplashService:
         return photos[0].get("url") if photos else None
 ```
 
-这个服务类提供了两个方法:`search_photos`搜索多张图片,`get_photo_url`获取单张图片的URL。我们在API路由中使用这个服务,为每个景点获取图片:
+这个服务类提供了两个方法:`search_photos`搜索多张图片,`get_photo_url`获取单张图片的 URL。我们在 API 路由中使用这个服务,为每个景点获取图片:
 ```python
 # backend/app/api/routes/trip.py
 from app.services.unsplash_service import UnsplashService
@@ -901,19 +905,19 @@ async def create_trip_plan(request: TripPlanRequest) -> TripPlan:
     return trip_plan
 ```
 
-注意我们没有把Unsplash封装成Tool或MCP工具,而是直接在API路由中调用。这是因为图片搜索不需要Agent的智能决策,只是一个简单的数据增强步骤。如果你想让Agent能够自主决定是否需要图片,或者选择不同的图片来源,可以考虑把它封装成Tool。
+注意我们没有把 Unsplash 封装成 Tool 或 MCP 工具,而是直接在 API 路由中调用。这是因为图片搜索不需要 Agent 的智能决策,只是一个简单的数据增强步骤。如果你想让 Agent 能够自主决定是否需要图片,或者选择不同的图片来源,可以考虑把它封装成 Tool。
 
 ## 13.5 前端开发详解
 
-### 13.5.1 前后端分离的Web架构
+### 13.5.1 前后端分离的 Web 架构
 
-在开始前端开发之前,我们需要理解现代Web应用的架构模式。在早期的Web开发中,前端和后端是混在一起的,比如PHP、JSP这样的技术,HTML模板和业务逻辑代码写在同一个文件里。这种方式在小项目中很方便,但在大型项目中会遇到很多问题:前端和后端开发者需要频繁协调,代码难以复用,测试困难。
+在开始前端开发之前,我们需要理解现代 Web 应用的架构模式。在早期的 Web 开发中,前端和后端是混在一起的,比如 PHP、JSP 这样的技术,HTML 模板和业务逻辑代码写在同一个文件里。这种方式在小项目中很方便,但在大型项目中会遇到很多问题:前端和后端开发者需要频繁协调,代码难以复用,测试困难。
 
-现代Web应用普遍采用<strong>前后端分离</strong>的架构。后端只负责提供API接口,返回JSON格式的数据。前端是一个独立的应用,通过HTTP请求调用后端API,获取数据后渲染页面。这种架构有几个明显的优势:前端和后端可以独立开发、独立部署、独立测试;前端可以是Web应用、移动应用或桌面应用,都使用同一套后端API;前端可以使用现代的框架和工具链,提供更好的用户体验。
+现代 Web 应用普遍采用<strong>前后端分离</strong>的架构。后端只负责提供 API 接口,返回 JSON 格式的数据。前端是一个独立的应用,通过 HTTP 请求调用后端 API,获取数据后渲染页面。这种架构有几个明显的优势:前端和后端可以独立开发、独立部署、独立测试;前端可以是 Web 应用、移动应用或桌面应用,都使用同一套后端 API;前端可以使用现代的框架和工具链,提供更好的用户体验。
 
-在我们的智能旅行助手项目中,后端是用Python和FastAPI实现的,提供了一个核心API接口`POST /api/trip/plan`,接收旅行需求,返回旅行计划。前端是用Vue 3和TypeScript实现的,是一个单页应用(SPA),用户在浏览器中填写表单,点击"开始规划"按钮,前端发送HTTP请求到后端,等待响应,然后渲染结果页面。整个过程中,页面不会刷新,用户体验很流畅。
+在我们的智能旅行助手项目中,后端是用 Python 和 FastAPI 实现的,提供了一个核心 API 接口`POST /api/trip/plan`,接收旅行需求,返回旅行计划。前端是用 Vue 3 和 TypeScript 实现的,是一个单页应用(SPA),用户在浏览器中填写表单,点击"开始规划"按钮,前端发送 HTTP 请求到后端,等待响应,然后渲染结果页面。整个过程中,页面不会刷新,用户体验很流畅。
 
-前端技术栈的选择需要考虑几个因素:开发效率、性能、生态系统、学习曲线。如表13.2所示,该项目选择了以下技术栈:
+前端技术栈的选择需要考虑几个因素:开发效率、性能、生态系统、学习曲线。如表 13.2 所示,该项目选择了以下技术栈:
 
 <div align="center">
   <p>表 13.2 前端技术栈</p>
@@ -940,11 +944,11 @@ frontend/
 └── tsconfig.json
 ```
 
-其中`views`目录存放页面组件,`services`目录存放API调用逻辑,`types`目录存放TypeScript类型定义,`router`目录存放路由配置。
+其中`views`目录存放页面组件,`services`目录存放 API 调用逻辑,`types`目录存放 TypeScript 类型定义,`router`目录存放路由配置。
 
 ### 13.5.2 类型定义
 
-在13.2节中,我们在后端使用Pydantic定义了数据模型,比如`Location`、`Attraction`、`DayPlan`、`TripPlan`等。在前端,我们需要定义对应的TypeScript类型。
+在 13.2 节中,我们在后端使用 Pydantic 定义了数据模型,比如`Location`、`Attraction`、`DayPlan`、`TripPlan`等。在前端,我们需要定义对应的 TypeScript 类型。
 
 让我们看看如何定义这些类型。首先是最基础的`Location`类型,表示经纬度坐标:
 
@@ -956,7 +960,7 @@ export interface Location {
 }
 ```
 
-这个类型定义和后端的Pydantic模型完全对应。注意TypeScript使用`interface`关键字定义类型,字段类型用冒号分隔,不需要默认值。
+这个类型定义和后端的 Pydantic 模型完全对应。注意 TypeScript 使用`interface`关键字定义类型,字段类型用冒号分隔,不需要默认值。
 
 接下来是`Attraction`类型,表示景点信息:
 
@@ -974,7 +978,7 @@ export interface Attraction {
 }
 ```
 
-注意这里使用了`Location`类型作为字段类型,这就是嵌套类型。问号`?`表示可选字段,对应后端Pydantic模型中的`Optional`。
+注意这里使用了`Location`类型作为字段类型,这就是嵌套类型。问号`?`表示可选字段,对应后端 Pydantic 模型中的`Optional`。
 
 类似地,我们定义`Meal`、`Hotel`、`Budget`、`WeatherInfo`等类型。最后是顶层的`TripPlan`类型:
 
@@ -1005,11 +1009,11 @@ export interface TripPlanRequest {
 }
 ```
 
-这些类型定义有什么用呢?首先,当我们调用API时,TypeScript会检查我们传递的数据是否符合`TripPlanRequest`类型。如果我们不小心把`days`写成了字符串,TypeScript会立即报错。其次,当我们接收API响应时,TypeScript会检查响应数据是否符合`TripPlan`类型。如果后端返回的数据结构发生变化,前端会立即发现。最后,IDE可以根据类型定义提供代码补全,我们输入`tripPlan.`时,IDE会自动列出所有可用的字段。
+这些类型定义有什么用呢?首先,当我们调用 API 时,TypeScript 会检查我们传递的数据是否符合`TripPlanRequest`类型。如果我们不小心把`days`写成了字符串,TypeScript 会立即报错。其次,当我们接收 API 响应时,TypeScript 会检查响应数据是否符合`TripPlan`类型。如果后端返回的数据结构发生变化,前端会立即发现。最后,IDE 可以根据类型定义提供代码补全,我们输入`tripPlan.`时,IDE 会自动列出所有可用的字段。
 
-### 13.5.3 API服务封装
+### 13.5.3 API 服务封装
 
-有了类型定义,我们就可以封装API调用了。我们创建一个`api.ts`文件,使用Axios来发送HTTP请求:
+有了类型定义,我们就可以封装 API 调用了。我们创建一个`api.ts`文件,使用 Axios 来发送 HTTP 请求:
 
 ```typescript
 import axios from 'axios'
@@ -1024,7 +1028,7 @@ const api = axios.create({
 })
 ```
 
-这里我们创建了一个Axios实例,配置了基础URL、超时时间和请求头。为什么超时时间设置为2分钟?因为生成旅行计划需要调用多个Agent,每个Agent都要调用LLM和外部API,整个过程可能需要10-30秒。如果超时时间太短,请求会被中断。
+这里我们创建了一个 Axios 实例,配置了基础 URL、超时时间和请求头。为什么超时时间设置为 2 分钟?因为生成旅行计划需要调用多个 Agent,每个 Agent 都要调用 LLM 和外部 API,整个过程可能需要 10-30 秒。如果超时时间太短,请求会被中断。
 
 接下来我们添加拦截器。拦截器可以在请求发送前和响应接收后执行一些通用逻辑,比如日志记录、错误处理、认证等:
 
@@ -1051,7 +1055,7 @@ api.interceptors.response.use(
 )
 ```
 
-最后我们定义API函数,这是前端调用后端的唯一入口:
+最后我们定义 API 函数,这是前端调用后端的唯一入口:
 
 ```typescript
 // 生成旅行计划
@@ -1061,11 +1065,11 @@ export const generateTripPlan = async (request: TripPlanRequest): Promise<TripPl
 }
 ```
 
-注意这个函数的类型签名:参数是`TripPlanRequest`类型,返回值是`Promise<TripPlan>`类型。这意味着TypeScript会检查调用者传递的参数是否符合要求,也会检查返回值的使用是否正确。
+注意这个函数的类型签名:参数是`TripPlanRequest`类型,返回值是`Promise<TripPlan>`类型。这意味着 TypeScript 会检查调用者传递的参数是否符合要求,也会检查返回值的使用是否正确。
 
-### 13.5.4 Home表单设计
+### 13.5.4 Home 表单设计
 
-Home页面是用户的入口,包含一个表单,让用户填写旅行需求。我们使用Vue 3的Composition API来组织代码:
+Home 页面是用户的入口,包含一个表单,让用户填写旅行需求。我们使用 Vue 3 的 Composition API 来组织代码:
 
 ```vue
 <script setup lang="ts">
@@ -1127,11 +1131,11 @@ const handleSubmit = async () => {
 }
 ```
 
-这段代码做了几件事。首先,设置`loading`为true,显示加载状态。然后,启动一个定时器,每500毫秒更新一次进度条和状态文本。这是一个模拟的进度,因为我们无法准确知道后端的处理进度。但这样可以让用户知道系统正在工作,而不是卡住了。
+这段代码做了几件事。首先,设置`loading`为 true,显示加载状态。然后,启动一个定时器,每 500 毫秒更新一次进度条和状态文本。这是一个模拟的进度,因为我们无法准确知道后端的处理进度。但这样可以让用户知道系统正在工作,而不是卡住了。
 
-接着,调用`generateTripPlan`函数发送API请求。这是一个异步操作,我们使用`await`等待响应。如果请求成功,清除定时器,设置进度为100%,然后跳转到结果页面,并把旅行计划数据传递过去。如果请求失败,显示错误消息。最后,无论成功还是失败,都设置`loading`为false,隐藏加载状态。
+接着,调用`generateTripPlan`函数发送 API 请求。这是一个异步操作,我们使用`await`等待响应。如果请求成功,清除定时器,设置进度为 100%,然后跳转到结果页面,并把旅行计划数据传递过去。如果请求失败,显示错误消息。最后,无论成功还是失败,都设置`loading`为 false,隐藏加载状态。
 
-模板部分使用Ant Design Vue的组件:
+模板部分使用 Ant Design Vue 的组件:
 
 ```vue
 <template>
@@ -1168,11 +1172,11 @@ const handleSubmit = async () => {
 
 注意`v-model:value`指令,它实现了双向数据绑定。当用户在输入框中输入内容时,`formData.city`会自动更新。当`formData.city`的值改变时,输入框的内容也会自动更新。
 
-### 13.5.5 Result页面展示
+### 13.5.5 Result 页面展示
 
-Result页面是整个应用的核心,展示生成的旅行计划。这个页面包含几个部分:行程概览、预算明细、地图可视化、每日行程详情、天气信息。
+Result 页面是整个应用的核心,展示生成的旅行计划。这个页面包含几个部分:行程概览、预算明细、地图可视化、每日行程详情、天气信息。
 
-首先是地图可视化。我们使用高德地图JS API在地图上标注景点位置:
+首先是地图可视化。我们使用高德地图 JS API 在地图上标注景点位置:
 
 ```typescript
 import AMapLoader from '@amap/amap-jsapi-loader'
@@ -1202,9 +1206,9 @@ const initMap = async () => {
 }
 ```
 
-这段代码首先加载高德地图SDK,然后创建地图实例,最后遍历所有景点,为每个景点创建一个标记(Marker)。标记的位置是景点的经纬度坐标,这些坐标是从后端的`Attraction`对象中获取的。
+这段代码首先加载高德地图 SDK,然后创建地图实例,最后遍历所有景点,为每个景点创建一个标记(Marker)。标记的位置是景点的经纬度坐标,这些坐标是从后端的`Attraction`对象中获取的。
 
-导出功能使用`html2canvas`和`jsPDF`库。`html2canvas`可以把DOM元素转换成Canvas,然后我们可以把Canvas导出为图片或PDF:
+导出功能使用`html2canvas`和`jsPDF`库。`html2canvas`可以把 DOM 元素转换成 Canvas,然后我们可以把 Canvas 导出为图片或 PDF:
 
 ```typescript
 import html2canvas from 'html2canvas'
@@ -1233,7 +1237,7 @@ const exportAsPDF = async () => {
 }
 ```
 
-通过这些前端技术,我们实现了一个完整的Web应用。用户可以在浏览器中填写表单,提交请求,等待AI生成旅行计划,然后查看详细的行程安排,在地图上看到景点位置,还可以导出为图片或PDF。整个过程流畅自然,这就是现代Web应用的魅力。
+通过这些前端技术,我们实现了一个完整的 Web 应用。用户可以在浏览器中填写表单,提交请求,等待 AI 生成旅行计划,然后查看详细的行程安排,在地图上看到景点位置,还可以导出为图片或 PDF。整个过程流畅自然,这就是现代 Web 应用的魅力。
 
 ## 13.6 功能实现详解
 
@@ -1243,9 +1247,9 @@ const exportAsPDF = async () => {
 
 在规划旅行时,预算是一个非常重要的考虑因素。用户需要知道这次旅行大概要花多少钱,钱都花在哪里。我们的智能旅行助手提供了自动预算计算功能,将费用分为四大类:景点门票、酒店住宿、餐饮和交通。
 
-预算计算的逻辑在哪里实现呢?我们选择在后端的PlannerAgent中实现。为什么不在前端计算?因为预算的估算需要基于景点的门票价格、酒店的价格范围、餐饮的标准等信息,这些信息都是PlannerAgent在生成行程时已经获取的。如果在前端计算,就需要重复这些逻辑,而且可能不准确。
+预算计算的逻辑在哪里实现呢?我们选择在后端的 PlannerAgent 中实现。为什么不在前端计算?因为预算的估算需要基于景点的门票价格、酒店的价格范围、餐饮的标准等信息,这些信息都是 PlannerAgent 在生成行程时已经获取的。如果在前端计算,就需要重复这些逻辑,而且可能不准确。
 
-在PlannerAgent的提示词中,我们明确要求LLM生成预算信息:
+在 PlannerAgent 的提示词中,我们明确要求 LLM 生成预算信息:
 
 ```python
 PLANNER_AGENT_PROMPT = """
@@ -1270,9 +1274,9 @@ PLANNER_AGENT_PROMPT = """
 """
 ```
 
-LLM会根据行程中的景点、酒店、餐饮安排,估算每一项的费用。比如,如果行程中包含故宫(门票60元)、天坛(门票15元)、颐和园(门票30元),那么景点门票总费用就是105元。如果是3天2晚的行程,酒店是经济型(每晚300元),那么酒店总费用就是600元。
+LLM 会根据行程中的景点、酒店、餐饮安排,估算每一项的费用。比如,如果行程中包含故宫(门票 60 元)、天坛(门票 15 元)、颐和园(门票 30 元),那么景点门票总费用就是 105 元。如果是 3 天 2 晚的行程,酒店是经济型(每晚 300 元),那么酒店总费用就是 600 元。
 
-在前端,我们使用Ant Design Vue的Statistic组件来展示预算信息。这个组件专门用于展示统计数据,支持数字动画、前缀后缀、自定义样式等:
+在前端,我们使用 Ant Design Vue 的 Statistic 组件来展示预算信息。这个组件专门用于展示统计数据,支持数字动画、前缀后缀、自定义样式等:
 
 ```vue
 <a-card v-if="tripPlan.budget" title="💰 预算明细">
@@ -1306,11 +1310,11 @@ LLM会根据行程中的景点、酒店、餐饮安排,估算每一项的费
 
 这段代码使用了栅格布局(`a-row`和`a-col`),将四项费用并排显示。每项费用使用一个`a-statistic`组件,显示标题和数值。最后用一个分隔线(`a-divider`)隔开,下面显示总费用,使用红色大字体突出显示。
 
-注意`v-if="tripPlan.budget"`这个条件渲染。因为预算信息是可选的(在Pydantic模型中定义为`Optional[Budget]`),如果LLM没有生成预算信息,这个卡片就不会显示。这体现了前端对数据的容错处理。
+注意`v-if="tripPlan.budget"`这个条件渲染。因为预算信息是可选的(在 Pydantic 模型中定义为`Optional[Budget]`),如果 LLM 没有生成预算信息,这个卡片就不会显示。这体现了前端对数据的容错处理。
 
 ### 13.6.2 加载进度条
 
-生成旅行计划是一个耗时的操作。后端需要依次调用AttractionSearchAgent、WeatherQueryAgent、HotelAgent和PlannerAgent,每个Agent都要调用LLM和外部API。整个过程可能需要10-30秒。如果用户点击"开始规划"按钮后,页面没有任何反馈,用户会以为系统卡住了,可能会刷新页面或重复点击。
+生成旅行计划是一个耗时的操作。后端需要依次调用 AttractionSearchAgent、WeatherQueryAgent、HotelAgent 和 PlannerAgent,每个 Agent 都要调用 LLM 和外部 API。整个过程可能需要 10-30 秒。如果用户点击"开始规划"按钮后,页面没有任何反馈,用户会以为系统卡住了,可能会刷新页面或重复点击。
 
 为了提升用户体验,我们添加了加载进度条和状态提示。现在只是模拟进度,可以让用户知道系统正在工作。
 
@@ -1351,7 +1355,7 @@ const handleSubmit = async () => {
 
 ### 13.6.3 行程编辑功能
 
-AI生成的旅行计划虽然很智能,但可能不完全符合用户的个人需求。比如,用户可能不喜欢某个景点,想删除它;或者想调整景点的游览顺序。我们提供了行程编辑功能,让用户可以自定义行程。
+AI 生成的旅行计划虽然很智能,但可能不完全符合用户的个人需求。比如,用户可能不喜欢某个景点,想删除它;或者想调整景点的游览顺序。我们提供了行程编辑功能,让用户可以自定义行程。
 
 编辑功能的核心是<strong>状态管理</strong>。我们需要维护两个状态:当前的行程计划和原始的行程计划。当用户进入编辑模式时,我们保存原始计划的副本。如果用户取消编辑,就恢复原始计划。如果用户保存修改,就更新当前计划:
 
@@ -1366,7 +1370,7 @@ const toggleEditMode = () => {
 }
 ```
 
-注意这里使用了`JSON.parse(JSON.stringify(...))`来深拷贝对象。为什么不直接赋值?因为JavaScript中对象是引用类型,如果直接赋值,`originalPlan`和`tripPlan`会指向同一个对象,修改一个会影响另一个。深拷贝可以创建一个完全独立的副本。
+注意这里使用了`JSON.parse(JSON.stringify(...))`来深拷贝对象。为什么不直接赋值?因为 JavaScript 中对象是引用类型,如果直接赋值,`originalPlan`和`tripPlan`会指向同一个对象,修改一个会影响另一个。深拷贝可以创建一个完全独立的副本。
 
 移动景点的逻辑是交换数组中两个元素的位置:
 
@@ -1383,7 +1387,7 @@ const moveAttraction = (dayIndex: number,attractionIndex: number,direction: 'up'
 }
 ```
 
-这里使用了ES6的解构赋值语法来交换两个元素。`[a,b] = [b,a]`是一个很优雅的交换方式,不需要临时变量。
+这里使用了 ES6 的解构赋值语法来交换两个元素。`[a,b] = [b,a]`是一个很优雅的交换方式,不需要临时变量。
 
 删除景点使用数组的`splice`方法:
 
@@ -1413,7 +1417,7 @@ const cancelEdit = () => {
 }
 ```
 
-在模板中,我们根据`editMode`的值显示不同的UI。编辑模式下,每个景点旁边会显示上移、下移、删除按钮:
+在模板中,我们根据`editMode`的值显示不同的 UI。编辑模式下,每个景点旁边会显示上移、下移、删除按钮:
 
 ```vue
 <div v-if="editMode" class="edit-buttons">
@@ -1425,15 +1429,15 @@ const cancelEdit = () => {
 
 ### 13.6.4 导出功能
 
-用户生成了满意的旅行计划后,可能想保存下来或分享给朋友。我们提供了两种导出方式:导出为图片和导出为PDF。
+用户生成了满意的旅行计划后,可能想保存下来或分享给朋友。我们提供了两种导出方式:导出为图片和导出为 PDF。
 
-导出功能的核心是`html2canvas`库。这个库可以把DOM元素转换成Canvas,然后我们可以把Canvas导出为图片。但这里有一个技术难点:地图是用Canvas渲染的,而`html2canvas`在处理嵌套Canvas时存在兼容性问题。
+导出功能的核心是`html2canvas`库。这个库可以把 DOM 元素转换成 Canvas,然后我们可以把 Canvas 导出为图片。但这里有一个技术难点:地图是用 Canvas 渲染的,而`html2canvas`在处理嵌套 Canvas 时存在兼容性问题。
 
-我们尝试了多种解决方案,包括将地图Canvas转换成图片后再导出,但由于高德地图的Canvas渲染机制和跨域限制,这个方案并没有完全解决问题。在实际项目中,可能需要考虑以下替代方案:
+我们尝试了多种解决方案,包括将地图 Canvas 转换成图片后再导出,但由于高德地图的 Canvas 渲染机制和跨域限制,这个方案并没有完全解决问题。在实际项目中,可能需要考虑以下替代方案:
 
-1. <strong>使用高德地图的静态地图API</strong>:调用`maps_staticmap`工具生成静态地图图片,替代动态地图
+1. <strong>使用高德地图的静态地图 API</strong>:调用`maps_staticmap`工具生成静态地图图片,替代动态地图
 2. <strong>分开导出</strong>:地图和行程内容分开导出,最后在后端合并
-3. <strong>使用截图服务</strong>:使用Puppeteer等无头浏览器在服务端截图
+3. <strong>使用截图服务</strong>:使用 Puppeteer 等无头浏览器在服务端截图
 4. <strong>简化导出内容</strong>:导出时隐藏地图,只导出文字内容
 
 目前的实现中,我们采用了简化方案,在导出时暂时隐藏地图部分,只导出行程的文字内容和景点信息。虽然这不是最理想的方案,但可以保证导出功能的可用性。
@@ -1461,9 +1465,9 @@ const exportAsImage = async () => {
 }
 ```
 
-`scale: 2`表示使用2倍分辨率,这样导出的图片更清晰。`useCORS: true`允许跨域加载图片,这对于景点图片(来自Unsplash)很重要。
+`scale: 2`表示使用 2 倍分辨率,这样导出的图片更清晰。`useCORS: true`允许跨域加载图片,这对于景点图片(来自 Unsplash)很重要。
 
-导出为PDF需要额外的步骤:先转换成Canvas,再转换成图片,最后添加到PDF中:
+导出为 PDF 需要额外的步骤:先转换成 Canvas,再转换成图片,最后添加到 PDF 中:
 
 ```typescript
 import jsPDF from 'jspdf'
@@ -1496,13 +1500,13 @@ const exportAsPDF = async () => {
 }
 ```
 
-这里需要计算图片的高度,保持宽高比。A4纸的宽度是210mm,我们根据Canvas的宽高比计算出对应的高度。
+这里需要计算图片的高度,保持宽高比。A4 纸的宽度是 210mm,我们根据 Canvas 的宽高比计算出对应的高度。
 
 ### 13.6.5 侧边导航与锚点跳转
 
-Result页面的内容很多,包括行程概览、预算明细、地图、每日行程、天气信息等。如果用户想快速跳转到某个部分,需要滚动很长的距离。我们提供了侧边导航和锚点跳转功能,让用户可以快速定位。
+Result 页面的内容很多,包括行程概览、预算明细、地图、每日行程、天气信息等。如果用户想快速跳转到某个部分,需要滚动很长的距离。我们提供了侧边导航和锚点跳转功能,让用户可以快速定位。
 
-侧边导航使用Ant Design Vue的Menu组件:
+侧边导航使用 Ant Design Vue 的 Menu 组件:
 
 ```vue
 <a-menu
@@ -1533,9 +1537,9 @@ const scrollToSection = ({ key }: { key: string }) => {
 }
 ```
 
-`scrollIntoView`是浏览器原生的API,可以让元素滚动到可视区域。`behavior: 'smooth'`表示平滑滚动,而不是瞬间跳转。`block: 'start'`表示元素的顶部对齐到可视区域的顶部。
+`scrollIntoView`是浏览器原生的 API,可以让元素滚动到可视区域。`behavior: 'smooth'`表示平滑滚动,而不是瞬间跳转。`block: 'start'`表示元素的顶部对齐到可视区域的顶部。
 
-在页面的各个部分,我们需要添加对应的id:
+在页面的各个部分,我们需要添加对应的 id:
 
 ```vue
 <div id="overview">
@@ -1553,7 +1557,7 @@ const scrollToSection = ({ key }: { key: string }) => {
 
 这样,当用户点击侧边导航的某个菜单项时,页面会平滑滚动到对应的部分。
 
-通过这些功能的实现,我们的智能旅行助手不仅能够生成旅行计划,还提供了丰富的交互功能:预算计算让用户了解费用,加载进度条让等待不再焦虑,行程编辑让计划更符合个人需求,导出功能让计划可以分享和保存,侧边导航让长页面易于浏览。这些功能的组合,构成了一个完整、易用、实用的Web应用。
+通过这些功能的实现,我们的智能旅行助手不仅能够生成旅行计划,还提供了丰富的交互功能:预算计算让用户了解费用,加载进度条让等待不再焦虑,行程编辑让计划更符合个人需求,导出功能让计划可以分享和保存,侧边导航让长页面易于浏览。这些功能的组合,构成了一个完整、易用、实用的 Web 应用。
 
 ## 13.7 结语
 
@@ -1564,7 +1568,7 @@ const scrollToSection = ({ key }: { key: string }) => {
 1. <strong>系统设计思维</strong>: 如何将复杂问题分解为多个简单任务
 2. <strong>工程实践能力</strong>: 如何将理论知识转化为可运行的代码
 3. <strong>全栈开发能力</strong>: 如何整合前后端技术栈
-4. <strong>AI应用开发</strong>: 如何利用LLM构建实用的应用
+4. <strong>AI 应用开发</strong>: 如何利用 LLM 构建实用的应用
 
 这个项目是一个起点,而不是终点。你可以基于这个项目:
 
@@ -1573,7 +1577,7 @@ const scrollToSection = ({ key }: { key: string }) => {
 - 扩展到其他领域(如智能购物助手、智能学习助手等)
 - 部署到生产环境,服务真实用户
 
-最好的学习方式是实践。不要只是阅读代码,而是要动手修改、扩展、优化。每一次实践都会让你对多Agent系统有更深的理解。
+最好的学习方式是实践。不要只是阅读代码,而是要动手修改、扩展、优化。每一次实践都会让你对多 Agent 系统有更深的理解。
 
-祝你在AI应用开发的道路上越走越远!
+祝你在 AI 应用开发的道路上越走越远!
 

+ 2160 - 0
docs/chapter14/Chapter14-Automated-Deep-Research-Agent.md

@@ -0,0 +1,2160 @@
+<div align="right">
+  English | <a href="./第十四章%20自动化深度研究智能体.md">中文</a>
+</div>
+
+# Chapter 14: Automated Deep Research Agent
+
+In Chapter 13's travel assistant project, we saw how to apply HelloAgents to a multi-agent product. In this chapter, we go a step further and focus on **knowledge-intensive applications**: **building an agent assistant that can automatically carry out deep research tasks.**
+
+Compared to travel planning, deep research is harder because information keeps branching out, facts change quickly, and users expect every claim to be backed by a cited source. To deliver trustworthy research reports, we need to equip agents with three core capabilities:
+
+**(1) Problem Analysis**: Decompose users' open topics into retrievable query statements.
+
+**(2) Multi-Round Information Collection**: Continuously mine materials by combining different search APIs and deduplicate and integrate them.
+
+**(3) Reflection and Summarization**: Identify knowledge gaps based on stage results, decide whether to continue retrieval, and generate structured summaries.
+
+## 14.1 Project Overview and Architecture Design
+
+### 14.1.1 Why We Need a Deep Research Assistant
+
+In an age of information explosion, we constantly need to get up to speed on new technologies, concepts, and events. Traditional research methods have several pain points. The first is **information overload**: search engines return thousands of results, and you have to open links one by one and read a great deal of content to find what is useful. The second is **lack of structure**: even when you find relevant information, it is often fragmented and not organized systematically. The last is **repetitive labor**: every new topic means repeating the same "search → read → summarize → organize" cycle.
+
+This is the problem that the deep research assistant needs to solve. It's not just a search tool, but a research assistant that can autonomously plan, execute, and summarize.
+
+**Core Value of Deep Research Assistant:**
+
+1. **Save Time**: Compress 1-2 hours of research work into 5-10 minutes
+2. **Improve Quality**: Systematic research process to avoid missing important information
+3. **Traceable**: Record all search results and sources for easy verification and citation
+4. **Extensible**: Easily add new search engines, data sources, and analysis tools
+
+### 14.1.2 Technical Architecture Overview
+
+This system still adopts the classic **front-end and back-end separation architecture**, as shown in Figure 14.1.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-1.png" alt="" width="85%"/>
+  <p>Figure 14.1 Deep Research Assistant Technical Architecture</p>
+</div>
+
+The system is designed with a four-layer architecture:
+
+**Front-End Layer (Vue3+TypeScript)**: Full-screen modal dialog UI, Markdown result visualization
+
+**Back-End Layer (FastAPI)**: API routing (`/research/stream`)
+
+**Agent Layer (HelloAgents)**: Three specialized Agents (TODO Planner, Task Summarizer, Report Writer) + Two core tools (SearchTool, NoteTool)
+
+**External Service Layer**: Search engines + LLM providers
+
+Let's see how a complete research request flows through the system, as shown in Figure 14.2:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-2.png" alt="" width="85%"/>
+  <p>Figure 14.2 Deep Research Assistant Data Flow Process</p>
+</div>
+
+1. **User Input**: User enters research topic on the front-end
+2. **Front-End Sends**: Front-end connects to `/research/stream` via SSE
+3. **Back-End Receives**: FastAPI receives request, creates research state
+4. **Planning Phase**: Calls research planning Agent, decomposes the topic into 3-5 subtasks
+5. **Execution Phase**: Executes each subtask one by one
+   - Use SearchTool to search
+   - Call task summarization Agent to summarize
+   - Use NoteTool to record results
+6. **Report Phase**: Call report generation Agent, integrate all summaries
+7. **Stream Return**: Push progress and results to front-end via SSE
+8. **Front-End Display**: Front-end updates task status, progress bar, logs, and report in real-time
+
+The project directory structure is as follows:
+
+```
+helloagents-deepresearch/
+├── backend/                    # Back-end code
+│   ├── src/
+│   │   ├── agent.py           # Core coordinator
+│   │   ├── main.py            # FastAPI entry
+│   │   ├── models.py          # Data models
+│   │   ├── prompts.py         # Prompt templates
+│   │   ├── config.py          # Configuration management
+│   │   └── services/          # Service layer
+│   │       ├── planner.py     # Planning service
+│   │       ├── summarizer.py  # Summarization service
+│   │       ├── reporter.py    # Report service
+│   │       └── search.py      # Search service
+│   ├── .env                   # Environment variables
+│   ├── pyproject.toml         # Dependency management
+│   └── workspace/             # Research notes
+│
+└── frontend/                   # Front-end code
+    ├── src/
+    │   ├── App.vue            # Main component
+    │   ├── components/        # UI components
+    │   │   └── ResearchModal.vue
+    │   └── composables/       # Composable functions
+    │       └── useResearch.ts
+    ├── package.json           # npm dependencies
+    └── vite.config.ts         # Build configuration
+```
+
+### 14.1.3 Quick Experience: Run the Project in 5 Minutes
+
+Before diving into implementation details, let's first run the project to see the final result. This way you'll have an intuitive understanding of the entire system.
+
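+The project expects Python 3.10 or higher, Node.js 16 or higher, and npm 8 or higher. You can check your installed versions with the following commands: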
+
+```bash
+python --version  # Should show Python 3.10.x or higher
+node --version    # Should show v16.x.x or higher
+npm --version     # Should show 8.x.x or higher
+```
+
+**(1) Start the Back-End**
+
+```bash
+# 1. Enter back-end directory
+cd helloagents-deepresearch/backend
+
+# 2. Install dependencies
+# Method 1: Using uv (recommended, faster Python package manager)
+uv sync
+
+# Method 2: Using pip
+pip install -e .
+
+# 3. Configure environment variables
+cp .env.example .env
+
+# 4. Edit .env file, fill in your API keys
+# Open .env file with your favorite editor
+# At minimum, configure:
+# - LLM_PROVIDER (e.g., openai, deepseek, qwen)
+# - LLM_API_KEY (your LLM API key)
+# - SEARCH_API (e.g., duckduckgo, tavily)
+
+# 5. Start back-end
+python src/main.py
+```
+
+If everything is normal, you'll see output similar to:
+
+```
+INFO:     Started server process [12345]
+INFO:     Waiting for application startup.
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
+```
+
+**(2) Start the Front-End**
+
+Open a new terminal window:
+
+```bash
+# 1. Enter front-end directory
+cd helloagents-deepresearch/frontend
+
+# 2. Install dependencies
+npm install
+
+# 3. Start front-end
+npm run dev
+```
+
+If everything is normal, you'll see output similar to:
+
+```
+  VITE v5.0.0  ready in 500 ms
+
+  ➜  Local:   http://localhost:5174/
+  ➜  Network: use --host to expose
+  ➜  press h + enter to show help
+```
+
+**(3) Start Research**
+
+Open your browser and visit `http://localhost:5174`. You'll see a centered input card, as shown in Figure 14.3. Enter a research topic, for example `What kind of organization is Datawhale?`, select a search engine (if multiple are configured), and click the "Start Research" button.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-3.png" alt="" width="85%"/>
+  <p>Figure 14.3 Deep Research Assistant Search Page</p>
+</div>
+
+As shown in Figure 14.4, the system will automatically expand to full screen, with research information displayed on the left and research progress and results displayed in real-time on the right. The entire research process takes about 1-3 minutes, depending on the complexity of the topic and the response speed of the search engine.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-4.png" alt="" width="85%"/>
+  <p>Figure 14.4 Deep Research Assistant Expanded Research</p>
+</div>
+
+After research is complete, you'll see:
+
+- **Task List**: Shows all subtasks and their status
+- **Progress Log**: Shows all operations during the research process
+- **Final Report**: Structured Markdown report containing summaries of all subtasks and source citations
+
+Now you've successfully run the deep research assistant and have an intuitive understanding of the system.
+
+## 14.2 TODO-Driven Research Paradigm
+
+### 14.2.1 What is TODO-Driven Research
+
+Traditional search engines can only answer single questions, while deep research needs to answer a series of related questions. The TODO-driven research paradigm decomposes complex research topics into multiple subtasks (TODOs), executes them one by one, and integrates the results.
+
+The core idea of this paradigm is: **Transform the complex task of "research" into a "planning → execution → integration" process**.
+
+Let's understand this transformation through an example. Suppose you want to research "What kind of organization is Datawhale?". The traditional search method is:
+
+```
+User input: What kind of organization is Datawhale?
+Search engine: Returns 10-20 links
+User: Click on links one by one, read content, take notes
+Result: Fragmented information, lacking systematization
+```
+
+The problem with this approach is that each link only covers one aspect of the topic, lacks systematic structure, and requires manual organization and summarization.
+
+**TODO-Driven Approach: Systematic Research**
+
+```
+User input: What kind of organization is Datawhale?
+
+System planning:
+  ├─ TODO 1: Basic information about Datawhale (organizational positioning)
+  ├─ TODO 2: Main projects of Datawhale (core content)
+  ├─ TODO 3: Community culture of Datawhale (values)
+  └─ TODO 4: Influence of Datawhale (social contribution)
+
+System execution:
+  For each TODO:
+    1. Search for relevant materials
+    2. Summarize key information
+    3. Record source citations
+
+System integration:
+  Generate structured report:
+    ├─ Part 1: Organizational positioning (from TODO 1)
+    ├─ Part 2: Core content (from TODO 2)
+    ├─ Part 3: Values (from TODO 3)
+    ├─ Part 4: Social contribution (from TODO 4)
+    └─ References: All source citations
+```
+
+This approach has several advantages: it decomposes a complex topic into clear sub-questions, it records the search results and summary of each subtask so that findings are traceable, and its systematic process reduces the risk of missing important information. It is also easy to add new subtasks or adjust the execution order.
+
+A complete TODO-driven research system contains three core elements:
+
+**(1) Intelligent Planner (TODO Planner)**: Responsible for decomposing research topics into subtasks. A good planner needs to understand the key aspects and research objectives of the topic, decompose the topic into 3-5 subtasks (too few won't cover everything, too many will be redundant), and design appropriate search queries for each subtask.
+
+**(2) Task Executor**: Responsible for executing each subtask. The executor needs to use search engines to obtain relevant materials, extract key information and remove redundant content, while saving all source citations for easy verification.
+
+**(3) Report Writer**: Responsible for integrating the results of all subtasks. The report writer needs to organize content in logical order, merge duplicate information, and add source citations for each viewpoint.
+
+In our case, the TODO-driven research process is shown in Figure 14.5:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-5.png" alt="" width="85%"/>
+  <p>Figure 14.5 TODO-Driven Research Process</p>
+</div>
+
+The entire process is linear, and each stage has clear inputs and outputs. This design makes the system easy to understand and debug.
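The linear flow can be sketched as three plain functions chained together. This is only an illustration, not the project's actual coordinator: `Todo`, `plan`, `execute`, `report`, and `run_research` are hypothetical stand-ins that return canned data instead of calling an LLM or a search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Todo:
    """One subtask produced by the planning stage (hypothetical model)."""
    title: str
    query: str


def plan(topic: str) -> List[Todo]:
    # Stand-in for the planner Agent: returns fixed subtasks for any topic.
    aspects = ["basic information", "main projects", "community culture"]
    return [Todo(title=f"{a.capitalize()} of {topic}", query=f"{topic} {a}") for a in aspects]


def execute(todo: Todo) -> str:
    # Stand-in for search + summarization: echoes the query it would run.
    return f"(summary of results for: {todo.query})"


def report(topic: str, summaries: List[str]) -> str:
    # Stand-in for the report writer: one numbered section per summary.
    sections = [f"## {i}. {s}" for i, s in enumerate(summaries, start=1)]
    return "\n\n".join([f"# {topic}", *sections])


def run_research(topic: str) -> str:
    todos = plan(topic)                       # Stage 1: planning
    summaries = [execute(t) for t in todos]   # Stage 2: execution
    return report(topic, summaries)           # Stage 3: reporting
```

Because each stage only consumes the previous stage's output, any one of them can be replaced or tested in isolation.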
+
+### 14.2.2 Three-Stage Research Process
+
+The TODO-driven research process is divided into three stages: Planning, Execution, and Reporting. Each stage has a dedicated Agent responsible for it.
+
+**(1) Stage 1: Planning**
+
+The goal of the planning stage is to decompose the research topic into 3-5 subtasks. The system receives the research topic and current date as input, and outputs a JSON-format list of subtasks. Each subtask contains three fields: title (task title), intent (research intent), and query (search query).
+
+The research planning Agent adopts different decomposition strategies based on topic characteristics, usually starting with basic concepts, then understanding technical status, practical applications, and development trends, and conducting comparative analysis when necessary. For example, for "What kind of organization is Datawhale?", the planning Agent might generate the following subtasks:
+
+```json
+[
+  {
+    "title": "Basic information about Datawhale",
+    "intent": "Understand Datawhale's organizational positioning, founding time, development history",
+    "query": "Datawhale organization introduction history 2024"
+  },
+  {
+    "title": "Main projects of Datawhale",
+    "intent": "Understand Datawhale's core open source projects and tutorials",
+    "query": "Datawhale projects tutorials open source 2024"
+  },
+  ...
+]
+```
+
+A good plan is comprehensive and logically ordered, uses precise queries, and contains an appropriate number of subtasks.
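Some of these properties can be checked mechanically before execution starts. The following is a minimal sketch, assuming the planner returns a JSON array with the three fields described above; `validate_plan` and `REQUIRED_FIELDS` are hypothetical names, not part of HelloAgents.

```python
import json
from typing import Dict, List

REQUIRED_FIELDS = ("title", "intent", "query")


def validate_plan(raw_json: str, min_tasks: int = 3, max_tasks: int = 5) -> List[Dict[str, str]]:
    """Parse the planner's JSON output and check count and required fields."""
    tasks = json.loads(raw_json)
    if not isinstance(tasks, list):
        raise ValueError("plan must be a JSON array of subtasks")
    if not min_tasks <= len(tasks) <= max_tasks:
        raise ValueError(f"expected {min_tasks}-{max_tasks} subtasks, got {len(tasks)}")
    for idx, task in enumerate(tasks, start=1):
        if not isinstance(task, dict):
            raise ValueError(f"subtask {idx} is not a JSON object")
        for field in REQUIRED_FIELDS:
            value = task.get(field)
            # Each field must be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"subtask {idx} has no usable '{field}' field")
    return tasks
```

A check like this cannot judge whether the plan is comprehensive or well ordered, but it catches malformed output early, before any search calls are made.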
+
+**(2) Stage 2: Execution**
+
+In the execution stage, the system works through the subtasks one by one, searching for and summarizing relevant materials. It takes the subtask list and the search engine configuration as input, and outputs a Markdown summary and a list of source citations for each subtask.
+
+For each subtask, the executor will:
+
+1. **Search for materials**: Use the configured search engine to execute the search
+
+   ```python
+   search_results = search_tool.run({
+       "input": task.query,
+       "backend": "tavily",
+       "mode": "structured",
+       "max_results": 5
+   })
+   ```
+
+2. **Get search results**: Extract title, URL, snippet
+
+   ```json
+   {
+     "results": [
+       {
+         "title": "What is a Multimodal Model?",
+         "url": "https://example.com/multimodal-model",
+         "snippet": "A multimodal model is an AI model that can process multiple types of data..."
+       },
+       ...
+     ]
+   }
+   ```
+
+3. **Call summarization Agent**: Summarize search results
+
+   ```python
+   summary = summarizer_agent.run(
+       task=task,
+       search_results=search_results
+   )
+   ```
+
+4. **Record summary and sources**: Save to NoteTool
+
+   ```python
+   note_tool.run({
+       "action": "create",
+       "title": task.title,
+       "content": f"## {task.title}\n\n{summary}\n\n## Sources\n{sources}",
+       "tags": ["research", "summary"]
+   })
+   ```
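When several searches are combined, the same page often shows up more than once. A minimal deduplication sketch follows, assuming each result is a dict with a `url` key as in the JSON sample above; `normalize_url` and `dedupe_results` are hypothetical helpers and not the project's own implementation.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Scheme, query string, and trailing slash are ignored on purpose.
    return host + parts.path.rstrip("/")


def dedupe_results(results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

Dropping the query string is a simplification: it merges tracking variants of one page, but it would also merge pages that are distinguished only by a query parameter.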
+
+The task summarization Agent will extract core viewpoints from each search result, merge similar information, retain important numbers, dates, names and other key data, and add source citations for each viewpoint. For example, for the search results of "Basic information about Datawhale", the summarization Agent might generate:
+
+```markdown
+## Basic Information about Datawhale
+
+Datawhale is an open source organization focused on data science and AI, founded in 2018[1]. The organization's core mission is "for the learner, grow together with learners", committed to building a pure learning community[2].
+
+**Core Positioning:**
+
+1. **Open Source Education Platform**: Provides high-quality AI and data science learning resources[1]
+2. **Learner Community**: Gathers tens of thousands of AI learners and practitioners[3]
+3. **Knowledge Sharing**: Advocates open source spirit, all content is completely free and open[2]
+
+**Development History:**
+
+- **2018**: Datawhale was founded, released first open source tutorial[1]
+- **2020**: Became one of the leading AI learning communities in China[3]
+- **2024**: Released 50+ open source projects, impacting 100,000+ learners[4]
+
+## Sources
+
+[1] https://github.com/datawhalechina
+[2] https://datawhale.club/about
+[3] https://www.zhihu.com/org/datawhale
+[4] https://datawhale.cn
+```
+
+During execution, the system will push progress information to the front-end in real-time:
+
+```json
+{
+  "type": "status",
+  "message": "Searching: Basic information about Datawhale"
+}
+```
+
+```json
+{
+  "type": "status",
+  "message": "Summarizing search results..."
+}
+```
+
+```json
+{
+  "type": "task",
+  "task": {
+    "id": 1,
+    "title": "Basic information about Datawhale",
+    "status": "completed"
+  }
+}
+```
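On the wire, each of these events travels as one Server-Sent Events frame: a `data:` line followed by a blank line. The sketch below shows that framing using only the standard library; `format_sse` and `stream_events` are hypothetical helper names, and the project's real endpoint may be organized differently.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def format_sse(event: Dict[str, Any]) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(event, ensure_ascii=False)
    return f"data: {payload}\n\n"


def stream_events(events: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield SSE frames lazily, one per event."""
    for event in events:
        yield format_sse(event)
```

In a FastAPI application, a generator like this could be wrapped in `StreamingResponse(..., media_type="text/event-stream")` so that the browser receives each frame as soon as it is produced.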
+
+**(3) Stage 3: Reporting**
+
+The goal of the reporting stage is to integrate the summaries of all subtasks and generate the final report. The system receives the summaries of all subtasks and the research topic as input, and outputs the final report in Markdown format. The report contains five parts: title, overview, detailed analysis of each subtask, summary, and references. For example, for "What kind of organization is Datawhale?", the final report might be:
+
+```markdown
+# What Kind of Organization is Datawhale?
+
+## Overview
+
+This report systematically researched the open source organization Datawhale, covering four aspects: basic information, main projects, community culture, and influence.
+
+## 1. Basic Information about Datawhale
+
+Datawhale is an open source organization focused on data science and AI, founded in 2018...
+
+(Insert summary of subtask 1 here)
+
+## 2. Main Projects of Datawhale
+
+Datawhale has released multiple high-quality open source tutorials, including Hello-Agents, Joyful-Pandas, etc...
+
+(Insert summary of subtask 2 here)
+...
+## Summary
+
+Through this research, we learned about Datawhale's organizational positioning, core projects, community culture, and social contributions. Datawhale is a pure learning community that has made important contributions to AI education.
+
+## References
+
+[1] https://github.com/datawhalechina
+[2] https://datawhale.club/about
+...
+```
+
+The report generation Agent will organize content in the logical order of subtasks, add a brief overview at the beginning, merge duplicate information, unify Markdown format, and organize all source citations into the references section.
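Each subtask summary numbers its citations from `[1]`, so the markers collide once the summaries are combined. In this project the consolidation is left to the report Agent; as an alternative, it could be done deterministically in code. The sketch below assumes each section is available as a `(text, sources)` pair in which marker `[n]` refers to `sources[n-1]`; `merge_citations` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple


def merge_citations(sections: List[Tuple[str, List[str]]]) -> Tuple[List[str], List[str]]:
    """Renumber per-section [n] markers against one shared reference list."""
    global_index: Dict[str, int] = {}   # URL -> global reference number
    references: List[str] = []
    rewritten: List[str] = []

    for text, sources in sections:
        def renumber(match: re.Match) -> str:
            local = int(match.group(1))
            if not 1 <= local <= len(sources):
                return match.group(0)   # leave unknown markers untouched
            url = sources[local - 1]
            if url not in global_index:
                references.append(url)
                global_index[url] = len(references)
            return f"[{global_index[url]}]"

        rewritten.append(re.sub(r"\[(\d+)\]", renumber, text))

    return rewritten, references
```

This produces a single flat reference list with duplicates removed, which differs from the per-subtask grouping that the report prompt in Section 14.3.1 asks for; which layout is preferable depends on how the report will be read.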
+
+## 14.3 Agent System Design
+
+### 14.3.1 Agent Responsibility Division
+
+In the deep research assistant, we designed three specialized Agents, each responsible for a specific task. This makes each Agent simple, easy to understand and maintain.
+
+In Chapter 7, we learned how to use `SimpleAgent` to build agents. The design philosophy of `SimpleAgent` is simple and direct: each time the `run()` method is called, the Agent analyzes the user's question, decides whether to call tools, and then returns the result. This design is very effective when handling simple tasks, but when facing complex tasks like deep research, we need to continue using a multi-agent collaboration approach.
+
+As shown in Table 14.1, the three Agents are respectively responsible for planning, summarization, and report generation.
+
+<div align="center">
+  <p>Table 14.1 Responsibility Division of Three Agents</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-table-1.png" alt="" width="85%"/>
+</div>
+
+Let's introduce the design of each Agent in detail.
+
+**Agent 1: Research Planning Expert (TODO Planner)**
+
+**Responsibility**: Decompose research topics into 3-5 subtasks
+
+**Design Philosophy**: The core task of the research planning expert is to understand the user's research topic, analyze the key aspects of the topic, and then generate a series of subtasks. This process is similar to the "brainstorming" stage of human researchers before starting research.
+
+**Prompt Design**:
+
+```python
+todo_planner_instructions = """
+You are a research planning expert. Your task is to decompose the user's research topic into 3-5 subtasks.
+
+Current date: {current_date}
+
+Research topic: {research_topic}
+
+Please analyze this research topic and decompose it into 3-5 subtasks. Each subtask should:
+1. Cover an important aspect of the topic
+2. Have a clear research objective
+3. Be able to find relevant materials through search engines
+
+Please return the subtask list in JSON format, each subtask containing:
+- title: Task title (concise and clear)
+- intent: Task intent (why research this)
+- query: Search query (query string for search engines, can use English for better search results)
+
+Example output:
+[
+  {{
+    "title": "What is a multimodal model",
+    "intent": "Understand the basic concepts of multimodal models to lay the foundation for subsequent research",
+    "query": "multimodal model definition concept 2024"
+  }},
+  ...
+]
+
+Please ensure:
+1. Number of subtasks is between 3-5
+2. Subtasks have logical relationships (e.g., from basics to applications, from current status to trends)
+3. Search queries can accurately find relevant materials
+4. Only return JSON, do not include other text
+"""
+```
+
+**Key Design Points**: The prompt includes the current date to get the latest information, explicitly requires JSON format output for easy parsing, helps the Agent understand expected output through examples, and emphasizes constraints such as number of subtasks and logical relationships.
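One detail of the template is easy to miss: because it is filled in with `str.format`, the literal braces of the JSON example have to be doubled (`{{` and `}}`). The short sketch below illustrates the difference; `render` and both template strings are made up for this illustration.

```python
def render(template: str, **values: str) -> str:
    """Fill a prompt template using str.format."""
    return template.format(**values)


# Doubled braces survive formatting as single literal braces.
escaped = 'Topic: {research_topic}\nExample: [{{"title": "..."}}]'

# Single braces around JSON are parsed as a replacement field and fail.
unescaped = 'Topic: {research_topic}\nExample: [{"title": "..."}]'

prompt = render(escaped, research_topic="Datawhale")
```

Rendering `unescaped` the same way raises an exception, because `"title"` is interpreted as the name of a field to substitute.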
+
+**Implementation Code**:
+
+`ToolAwareSimpleAgent`, used here, is an extension of `SimpleAgent`. It is covered in Section 14.3.2, so there is no need to dig into it now.
+
+```python
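+import json
+import re
+from typing import Callable, List, Optional
+
+class PlanningService:
+    def __init__(self, llm: HelloAgentsLLM, tool_call_listener: Optional[Callable] = None):
+        self._agent = ToolAwareSimpleAgent(
+            name="TODO Planner",
+            system_prompt="You are a research planning expert",
+            llm=llm,
+            # Listener supplied by DeepResearchAgent (see Section 14.3.2)
+            tool_call_listener=tool_call_listener
+        )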
+
+    def plan_todo_list(self, state: SummaryState) -> List[TodoItem]:
+        prompt = todo_planner_instructions.format(
+            current_date=get_current_date(),
+            research_topic=state.research_topic,
+        )
+
+        response = self._agent.run(prompt)
+        tasks_payload = self._extract_tasks(response)
+
+        todo_items = []
+        for idx, item in enumerate(tasks_payload, start=1):
+            task = TodoItem(
+                id=idx,
+                title=item["title"],
+                intent=item["intent"],
+                query=item["query"],
+            )
+            todo_items.append(task)
+
+        return todo_items
+
+    def _extract_tasks(self, response: str) -> List[dict]:
+        """Extract JSON from Agent response"""
+        # Use regex to extract JSON part
+        json_match = re.search(r'\[.*\]', response, re.DOTALL)
+        if json_match:
+            json_str = json_match.group(0)
+            return json.loads(json_str)
+        else:
+            raise ValueError("Unable to extract JSON from response")
+```
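The greedy pattern `\[.*\]` spans from the first `[` to the last `]` in the response, so it breaks when the model adds bracketed text around the JSON. A more tolerant variant is sketched below using `json.JSONDecoder.raw_decode` from the standard library; `extract_json_array` is a hypothetical replacement, not the project's code.

```python
import json
from typing import Any, List


def extract_json_array(response: str) -> List[Any]:
    """Return the first JSON array that can be decoded from the response text."""
    decoder = json.JSONDecoder()
    start = response.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            return value
        start = response.find("[", start + 1)
    raise ValueError("no JSON array found in response")
```

The function tries each `[` in turn and stops at the first position that decodes cleanly, so surrounding prose or trailing notes do not affect the result.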
+
+**Agent 2: Task Summarization Expert (Task Summarizer)**
+
+**Responsibility**: Summarize search results, extract key information
+
+**Design Philosophy**: The core task of the task summarization expert is to read search results, extract key information, and present it in a structured way. This process is similar to human researchers taking notes after reading literature.
+
+**Prompt Design**:
+
+```python
+task_summarizer_instructions = """
+You are a task summarization expert. Your task is to summarize search results and extract key information.
+
+Task title: {task_title}
+Task intent: {task_intent}
+Search query: {task_query}
+
+Search results:
+{search_results}
+
+Please carefully read the above search results, extract key information, and return a summary in Markdown format.
+
+The summary should include:
+1. **Core Viewpoints**: Core viewpoints and conclusions from search results
+2. **Key Data**: Important numbers, dates, names, etc.
+3. **Source Citations**: Add source citations for each viewpoint (using [1], [2], etc.)
+
+Please ensure:
+1. Summary is concise and clear, avoiding redundancy
+2. Retain important details and data
+3. Add source citations for each viewpoint
+4. Use Markdown format (headings, lists, bold, etc.)
+
+Example output:
+## Core Viewpoints
+
+Multimodal models are AI models that can process multiple types of data[1]. Unlike traditional unimodal models, multimodal models can simultaneously understand text, images, audio, etc.[2].
+
+**Key Features:**
+- Cross-modal understanding[1]
+- Unified representation[3]
+- End-to-end training[2]
+
+## Sources
+
+[1] https://example.com/source1
+[2] https://example.com/source2
+[3] https://example.com/source3
+"""
+```
+
+**Key Design Points**: The prompt includes task title, intent, query and other context to help the Agent understand the task, explicitly requires output to include core viewpoints, key data, and source citations, emphasizes adding source citations for each viewpoint, and helps the Agent understand the expected output format through examples.
+
+**Implementation Code**:
+
+```python
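+from typing import Callable, List, Optional
+
+class SummarizationService:
+    def __init__(self, llm: HelloAgentsLLM, tool_call_listener: Optional[Callable] = None):
+        self._agent = ToolAwareSimpleAgent(
+            name="Task Summarizer",
+            system_prompt="You are a task summarization expert",
+            llm=llm,
+            # Listener supplied by DeepResearchAgent (see Section 14.3.2)
+            tool_call_listener=tool_call_listener
+        )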
+
+    def summarize_task(
+        self,
+        task: TodoItem,
+        search_results: List[dict]
+    ) -> str:
+        # Format search results
+        formatted_sources = self._format_sources(search_results)
+
+        prompt = task_summarizer_instructions.format(
+            task_title=task.title,
+            task_intent=task.intent,
+            task_query=task.query,
+            search_results=formatted_sources,
+        )
+
+        summary = self._agent.run(prompt)
+        return summary
+
+    def _format_sources(self, search_results: List[dict]) -> str:
+        """Format search results"""
+        formatted = []
+        for idx, result in enumerate(search_results, start=1):
+            formatted.append(
+                f"[{idx}] {result['title']}\n"
+                f"URL: {result['url']}\n"
+                f"Snippet: {result['snippet']}\n"
+            )
+        return "\n".join(formatted)
+```
+
+**Agent 3: Report Writing Expert (Report Writer)**
+
+**Responsibility**: Integrate summaries of all subtasks and generate final report
+
+**Design Philosophy**: The core task of the report writing expert is to integrate the summaries of all subtasks into a structured report. This process is similar to human researchers writing research reports after completing all investigations.
+
+**Prompt Design**:
+
+```python
+report_writer_instructions = """
+You are a report writing expert. Your task is to integrate the summaries of all subtasks and generate a structured research report.
+
+Research topic: {research_topic}
+
+Subtask summaries:
+{task_summaries}
+
+Please integrate all the above subtask summaries and generate a structured research report.
+
+The report should include:
+1. **Title**: Research topic
+2. **Overview**: Briefly introduce the research topic and report structure (2-3 paragraphs)
+3. **Detailed Analysis of Each Subtask**: Organize in logical order (using level-2 headings)
+4. **Summary**: Summarize the main findings of the research (1-2 paragraphs)
+5. **References**: All source citations (grouped by subtask)
+
+Please ensure:
+1. Report structure is clear and logically coherent
+2. Eliminate duplicate information
+3. Retain all source citations
+4. Use Markdown format
+
+Example output:
+# Latest Advances in Multimodal Large Models
+
+## Overview
+
+This report systematically researched the latest advances in multimodal large models...
+
+## 1. What is a Multimodal Model
+
+(Insert summary of subtask 1 here)
+
+## 2. What are the Latest Multimodal Models
+
+(Insert summary of subtask 2 here)
+
+...
+
+## Summary
+
+Through this research, we learned about...
+
+## References
+
+### Task 1: What is a Multimodal Model
+[1] https://example.com/source1
+...
+"""
+```
+
+**Key Design Points**: The prompt explicitly requires the report to include title, overview, detailed analysis, summary, references and other structures, emphasizes organizing content in logical order, requires merging duplicate information to eliminate redundancy, and retains all source citations.
+
+**Implementation Code**:
+
+```python
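+from typing import Callable, List, Optional, Tuple
+
+class ReportingService:
+    def __init__(self, llm: HelloAgentsLLM, tool_call_listener: Optional[Callable] = None):
+        self._agent = ToolAwareSimpleAgent(
+            name="Report Writer",
+            system_prompt="You are a report writing expert",
+            llm=llm,
+            # Listener supplied by DeepResearchAgent (see Section 14.3.2)
+            tool_call_listener=tool_call_listener
+        )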
+
+    def generate_report(
+        self,
+        research_topic: str,
+        task_summaries: List[Tuple[TodoItem, str]]
+    ) -> str:
+        # Format subtask summaries
+        formatted_summaries = self._format_summaries(task_summaries)
+
+        prompt = report_writer_instructions.format(
+            research_topic=research_topic,
+            task_summaries=formatted_summaries,
+        )
+
+        report = self._agent.run(prompt)
+        return report
+
+    def _format_summaries(
+        self,
+        task_summaries: List[Tuple[TodoItem, str]]
+    ) -> str:
+        """Format subtask summaries"""
+        formatted = []
+        for idx, (task, summary) in enumerate(task_summaries, start=1):
+            formatted.append(
+                f"## Task {idx}: {task.title}\n"
+                f"Intent: {task.intent}\n\n"
+                f"{summary}\n"
+            )
+        return "\n".join(formatted)
+```
+
+### 14.3.2 ToolAwareSimpleAgent Design
+
+In Chapter 7, we implemented `SimpleAgent`, which is the basic Agent of the HelloAgents framework. But in the deep research assistant, we need an Agent that can **record tool calls**. This is where `ToolAwareSimpleAgent` comes in.
+
+In the deep research assistant, we need to record the tool call status of each Agent for:
+
+1. **Debugging**: View which tools the Agent called and what parameters were passed
+2. **Logging**: Record all operations during the research process
+3. **Analysis**: Analyze the Agent's behavior patterns
+4. **Progress Display**: Show in real-time what the Agent is doing
+
+`SimpleAgent` itself does not support tool call listening, so we need to extend it.
+
+`ToolAwareSimpleAgent` adds a `tool_call_listener` parameter on top of `SimpleAgent`. This is a callback function that is invoked every time the Agent executes a tool.
+
+**Usage Example:**
+
+```python
+from hello_agents import ToolAwareSimpleAgent
+
+def tool_listener(call_info):
+    print(f"Agent: {call_info['agent_name']}")
+    print(f"Tool: {call_info['tool_name']}")
+    print(f"Parameters: {call_info['parsed_parameters']}")
+    print(f"Result: {call_info['result']}")
+
+agent = ToolAwareSimpleAgent(
+    name="Research Assistant",
+    system_prompt="You are a research assistant",
+    llm=llm,
+    tool_call_listener=tool_listener
+)
+```
+
+`ToolAwareSimpleAgent` inherits from `SimpleAgent` and overrides the `_execute_tool_call` method:
+
+```python
+class ToolAwareSimpleAgent(SimpleAgent):
+    def __init__(
+        self,
+        name: str,
+        system_prompt: str,
+        llm: HelloAgentsLLM,
+        tool_registry: Optional[ToolRegistry] = None,
+        tool_call_listener: Optional[Callable] = None,
+    ):
+        super().__init__(
+            name=name,
+            system_prompt=system_prompt,
+            llm=llm,
+            tool_registry=tool_registry,
+        )
+        self._tool_call_listener = tool_call_listener
+
+    def _execute_tool_call(self, tool_name: str, parameters: str) -> str:
+        """Execute tool call and notify listener"""
+        # Parse parameters
+        parsed_parameters = self._parse_parameters(parameters)
+
+        # Call tool
+        result = super()._execute_tool_call(tool_name, parameters)
+
+        # Notify listener
+        if self._tool_call_listener:
+            self._tool_call_listener({
+                "agent_name": self.name,
+                "tool_name": tool_name,
+                "parsed_parameters": parsed_parameters,
+                "result": result,
+            })
+
+        return result
+```
+
+In the deep research assistant, we use `ToolAwareSimpleAgent` to record all Agent tool calls:
+
+```python
+class DeepResearchAgent:
+    def __init__(self, config: Configuration):
+        self.config = config
+        self.llm = HelloAgentsLLM(...)
+
+        # Create tool call listener
+        def tool_listener(call_info):
+            self._emit_event({
+                "type": "tool_call",
+                "agent": call_info["agent_name"],
+                "tool": call_info["tool_name"],
+                "parameters": call_info["parsed_parameters"],
+            })
+
+        # Create three Agents, all using the same listener
+        self.planner = PlanningService(self.llm, tool_listener)
+        self.summarizer = SummarizationService(self.llm, tool_listener)
+        self.reporter = ReportingService(self.llm, tool_listener)
+```
+
+This way, all Agent tool calls are recorded and pushed to the front-end via SSE, displayed to the user in real-time.
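
The hand-off from listener to SSE stream can be sketched as follows. This is a minimal illustration, not the project's actual implementation: `event_queue`, `queue_listener`, and `format_sse` are hypothetical names, and the only fixed detail is the standard SSE framing, in which each message is a `data:` line followed by a blank line.

```python
import json
import queue

# Hypothetical in-memory buffer between the Agent thread and the HTTP response.
event_queue: "queue.Queue[dict]" = queue.Queue()


def queue_listener(call_info: dict) -> None:
    """Convert a tool-call record into a front-end event and enqueue it."""
    event_queue.put({
        "type": "tool_call",
        "agent": call_info["agent_name"],
        "tool": call_info["tool_name"],
        "parameters": call_info["parsed_parameters"],
    })


def format_sse(event: dict) -> str:
    """Serialize one event as an SSE frame: a data line plus a blank line."""
    return f"data: {json.dumps(event, ensure_ascii=False)}\n\n"


# Simulate one tool call and render the frame a streaming endpoint would send.
queue_listener({
    "agent_name": "Task Summarizer",
    "tool_name": "search_tool",
    "parsed_parameters": {"input": "multimodal model definition"},
    "result": "(omitted)",
})
frame = format_sse(event_queue.get_nowait())
print(frame, end="")
```

A streaming endpoint would keep reading from the queue and write each frame to the open response; how that endpoint is wired up depends on the web framework in use.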
+
+### 14.3.3 Agent Collaboration Mode
+
+The three Agents have a **sequential collaboration** relationship, as shown in Figure 14.6.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-6.png" alt="" width="85%"/>
+  <p>Figure 14.6 Agent Collaboration Process</p>
+</div>
+
+The characteristics of the sequential collaboration mode are:
+
+1. **Linear Process**: Agents execute in a fixed order
+2. **Clear Input and Output**: Each Agent's input comes from the previous Agent's output
+3. **No Concurrency**: Only one Agent is working at the same time
+
+`DeepResearchAgent` is the core coordinator of the entire system, responsible for scheduling the three Agents:
+
+```python
+class DeepResearchAgent:
+    def run(self, research_topic: str) -> str:
+        # 1. Planning stage
+        self._emit_event({"type": "status", "message": "Planning research tasks..."})
+        todo_list = self.planner.plan_todo_list(research_topic)
+        self._emit_event({"type": "tasks", "tasks": todo_list})
+
+        # 2. Execution stage
+        task_summaries = []
+        for task in todo_list:
+            self._emit_event({
+                "type": "status",
+                "message": f"Researching: {task.title}"
+            })
+
+            # Search
+            search_results = self.search_service.search(task.query)
+
+            # Summarize
+            summary = self.summarizer.summarize_task(task, search_results)
+            task_summaries.append((task, summary))
+
+            self._emit_event({
+                "type": "task_completed",
+                "task_id": task.id
+            })
+
+        # 3. Reporting stage
+        self._emit_event({"type": "status", "message": "Generating report..."})
+        report = self.reporter.generate_report(research_topic, task_summaries)
+        self._emit_event({"type": "report", "content": report})
+
+        return report
+```
+
+## 14.4 Tool System Integration
+
+### 14.4.1 SearchTool Extension
+
+In Chapter 7, we implemented a basic version of `SearchTool` that integrates the Tavily and SerpApi search engines and demonstrates the design idea of multi-source search. For this chapter's deep research assistant, we extend `SearchTool` further by adding DuckDuckGo, Perplexity, SearXNG, and other search engines, and by implementing an Advanced mode that combines multiple search engines. Search is the core function of the deep research assistant, and these extensions let the system adapt to different usage scenarios and needs.
+
+As shown in Table 14.2, the search engines added this time have different characteristics and applicable scenarios.
+
+<div align="center">
+  <p>Table 14.2 Multi-Search Engine Comparison</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-table-2.png" alt="" width="85%"/>
+</div>
+
+We will not walk through each extension here; refer to the source code and the extension cases in Chapter 7 for the implementation. `SearchTool` provides a unified search interface, so the calling method is the same no matter which search engine is used.
+
+In the deep research assistant, we select the search engine through the configuration file:
+
+```python
+# config.py
+class SearchAPI(str, Enum):
+    TAVILY = "tavily"
+    DUCKDUCKGO = "duckduckgo"
+    PERPLEXITY = "perplexity"
+    SEARXNG = "searxng"
+    ADVANCED = "advanced"
+
+class Configuration(BaseModel):
+    search_api: SearchAPI = SearchAPI.DUCKDUCKGO
+    # ...
+```
+
+```bash
+# .env
+SEARCH_API=tavily
+```
+
+This way, users can select the search engine by modifying the `.env` file without modifying the code.
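
How the `SEARCH_API` value might be turned into a `SearchAPI` member can be sketched as below. This is only an illustration under two assumptions: the `.env` file has already been loaded into the process environment (for example by a dotenv loader), and `search_api_from_env` is a hypothetical helper, not part of the project.

```python
import os
from enum import Enum


class SearchAPI(str, Enum):
    TAVILY = "tavily"
    DUCKDUCKGO = "duckduckgo"
    PERPLEXITY = "perplexity"
    SEARXNG = "searxng"
    ADVANCED = "advanced"


def search_api_from_env(default: SearchAPI = SearchAPI.DUCKDUCKGO) -> SearchAPI:
    """Read SEARCH_API from the environment, falling back to the default."""
    raw = os.environ.get("SEARCH_API", "").strip().lower()
    if not raw:
        return default
    try:
        return SearchAPI(raw)  # look up the enum member by its value
    except ValueError:
        # Unknown engine name: fall back instead of crashing at startup.
        return default


os.environ["SEARCH_API"] = "tavily"
selected = search_api_from_env()
print(selected.value)
```

Falling back to the default on an unknown value is one design choice; raising an error at startup would surface typos in `.env` earlier.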
+
+The result returned by `SearchTool` is a dictionary containing:
+
+- `results`: List of search results, each result contains title, URL, snippet
+- `backend`: Search engine used
+- `answer`: AI-generated answer (Perplexity only)
+- `notices`: Notification information (such as API limits, errors, etc.)
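
The shape of this dictionary can be written down as type hints. The sketch below is an assumption-laden illustration: the field names come from the list above, but the exact value types (for example, `notices` as a list of strings) and the `describe` helper are hypothetical.

```python
from typing import List, TypedDict


class SearchResult(TypedDict):
    title: str
    url: str
    snippet: str


class SearchResponse(TypedDict, total=False):
    results: List[SearchResult]
    backend: str
    answer: str          # assumed to be present only for Perplexity
    notices: List[str]   # assumed type; the real tool may use another shape


def describe(response: SearchResponse) -> str:
    """Render a response as plain text, tolerating missing optional fields."""
    lines = [f"backend: {response.get('backend', 'unknown')}"]
    if response.get("answer"):
        lines.append(f"answer: {response['answer']}")
    for idx, item in enumerate(response.get("results", []), start=1):
        lines.append(f"[{idx}] {item['title']} ({item['url']})")
    for notice in response.get("notices", []):
        lines.append(f"notice: {notice}")
    return "\n".join(lines)


sample: SearchResponse = {
    "backend": "duckduckgo",
    "results": [
        {"title": "Example", "url": "https://example.com", "snippet": "..."},
    ],
    "notices": [],
}
description = describe(sample)
print(description)
```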
+
+Two special cases need handling.
+
+Search results may contain duplicate URLs, so we deduplicate them:
+
+```python
+def deduplicate_sources(sources: List[dict]) -> List[dict]:
+    """Remove duplicate URLs"""
+    seen_urls = set()
+    unique_sources = []
+
+    for source in sources:
+        if source["url"] not in seen_urls:
+            seen_urls.add(source["url"])
+            unique_sources.append(source)
+
+    return unique_sources
+```
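
Exact string comparison treats `https://example.com/a/` and `https://example.com/a#top` as different sources. If that matters, URLs can be normalized before comparison. The following is a sketch of one possible refinement, not something the project is known to do; `normalize_url` and `deduplicate_normalized` are hypothetical helpers built on the standard `urllib.parse` module.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, trim a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def deduplicate_normalized(sources: List[dict]) -> List[dict]:
    """Keep the first source for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for source in sources:
        key = normalize_url(source["url"])
        if key not in seen:
            seen.add(key)
            unique.append(source)
    return unique


sources = [
    {"url": "https://Example.com/a/", "title": "first"},
    {"url": "https://example.com/a#top", "title": "duplicate"},
    {"url": "https://example.com/b", "title": "second"},
]
unique_titles = [s["title"] for s in deduplicate_normalized(sources)]
print(unique_titles)
```

The query string is kept as-is here, because two URLs that differ only in query parameters often point to different content.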
+
+Search results may also contain a large amount of text, so we limit the number of tokens for each source:
+
+```python
+def limit_source_tokens(source: dict, max_tokens: int = 2000) -> dict:
+    """Limit the number of tokens for a source"""
+    snippet = source["snippet"]
+
+    # Simple token estimation: 1 token is approximately 4 characters
+    max_chars = max_tokens * 4
+
+    if len(snippet) > max_chars:
+        snippet = snippet[:max_chars] + "..."
+
+    return {
+        **source,
+        "snippet": snippet
+    }
+```
+
+### 14.4.2 NoteTool Usage
+
+In the deep research assistant, we use `NoteTool` to persist research progress. `NoteTool` is a built-in tool integrated in Chapter 9, used to create, read, update, and delete notes.
+
+During the research process, we record the search results, summaries, and final research report for each subtask. Persisting this information to disk serves two purposes: an interrupted research run can resume from its last progress, and every operation can be reviewed afterwards to analyze the quality and efficiency of the research.
+
+`NoteTool` stores notes in the specified workspace directory, with each note being a Markdown file. The note filename is the task ID, and the content includes task title, task intent, search query, search results, and summary.
+
+The resulting files are organized in the following tree structure:
+
+```
+workspace/
+├── notes/
+│   ├── 1.md  # Notes for task 1
+│   ├── 2.md  # Notes for task 2
+│   ├── 3.md  # Notes for task 3
+│   └── ...
+└── reports/
+    └── final_report.md  # Final report
+```
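
Given this layout, resuming an interrupted run amounts to checking which task IDs already have a note file. The sketch below assumes the `<task id>.md` naming shown in the tree; `completed_task_ids` and `pending_task_ids` are hypothetical helpers, not part of `NoteTool`.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Set


def completed_task_ids(notes_dir: Path) -> Set[int]:
    """Collect the IDs of tasks that already have a note file on disk."""
    ids: Set[int] = set()
    if not notes_dir.is_dir():
        return ids
    for path in notes_dir.glob("*.md"):
        if path.stem.isdigit():  # ignore files that are not named by task ID
            ids.add(int(path.stem))
    return ids


def pending_task_ids(all_ids: Iterable[int], notes_dir: Path) -> List[int]:
    """Return the task IDs that still need to be researched, in order."""
    done = completed_task_ids(notes_dir)
    return [task_id for task_id in all_ids if task_id not in done]


# Demo in a temporary workspace: tasks 1 and 2 are done, task 3 is pending.
with tempfile.TemporaryDirectory() as workspace:
    notes_dir = Path(workspace) / "notes"
    notes_dir.mkdir()
    for task_id in (1, 2):
        (notes_dir / f"{task_id}.md").write_text("# note\n", encoding="utf-8")
    pending = pending_task_ids([1, 2, 3], notes_dir)

print(pending)  # prints [3]
```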
+
+In the deep research assistant, we use `NoteTool` to record the research progress of each subtask:
+
+```python
+class NotesService:
+    def __init__(self, workspace: str):
+        self.note_tool = NoteTool(workspace=workspace)
+
+    def save_task_summary(
+        self,
+        task: TodoItem,
+        search_results: List[dict],
+        summary: str
+    ):
+        """Save task summary"""
+        # Format note content
+        content = self._format_note_content(
+            task=task,
+            search_results=search_results,
+            summary=summary
+        )
+
+        # Create note
+        self.note_tool.run({
+            "action": "create",
+            "title": f"Task {task.id}: {task.title}",
+            "content": content,
+            "tags": ["research", "summary"]
+        })
+
+    def _format_note_content(
+        self,
+        task: TodoItem,
+        search_results: List[dict],
+        summary: str
+    ) -> str:
+        """Format note content"""
+        content = f"# Task {task.id}: {task.title}\n\n"
+        content += f"## Task Information\n\n"
+        content += f"- **Intent**: {task.intent}\n"
+        content += f"- **Query**: {task.query}\n\n"
+
+        content += f"## Search Results\n\n"
+        for idx, result in enumerate(search_results, start=1):
+            content += f"[{idx}] {result['title']}\n"
+            content += f"URL: {result['url']}\n"
+            content += f"Snippet: {result['snippet']}\n\n"
+
+        content += f"## Summary\n\n{summary}\n"
+
+        return content
+```
+
+### 14.4.3 ToolRegistry Tool Management
+
+`ToolRegistry` is the tool registry of the HelloAgents framework, introduced in Chapter 7, and manages the registration and invocation of all tools. In the deep research assistant, we use `ToolRegistry` to manage `SearchTool` and `NoteTool`.
+
+Before creating an Agent, we need to register tools first:
+
+```python
+from hello_agents import ToolAwareSimpleAgent
+from hello_agents.tools import ToolRegistry
+from hello_agents.tools import SearchTool
+from hello_agents.tools import NoteTool
+
+# Create tools
+search_tool = SearchTool(backend="hybrid")
+note_tool = NoteTool(workspace="./workspace/notes")
+
+# Create registry
+registry = ToolRegistry()
+
+# Register tools
+registry.register_tool(search_tool)
+registry.register_tool(note_tool)
+
+# Create Agent
+agent = ToolAwareSimpleAgent(
+    name="Research Assistant",
+    system_prompt="You are a research assistant",
+    llm=llm,
+    tool_registry=registry
+)
+```
+
+When an Agent needs to call a tool, it generates a tool call instruction, as shown in Figure 14.7.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-7.png" alt="" width="85%"/>
+  <p>Figure 14.7 Tool Call Process</p>
+</div>
+
+**Tool Call Process**:
+
+1. **Agent generates instruction**: Agent generates tool call instruction, such as `[TOOL_CALL:search_tool:{"input": "Datawhale organization", "backend": "tavily"}]`
+2. **Parse instruction**: `ToolRegistry` parses the instruction, extracts tool name and parameters
+3. **Find tool**: `ToolRegistry` finds the corresponding tool based on the tool name
+4. **Call tool**: Call the tool's `run` method, passing in parameters
+5. **Return result**: Tool returns execution result
+6. **Format result**: Format the result as a string and return it to the Agent
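
Steps 2 to 6 can be sketched in a few lines. This is a simplified illustration of the idea, not the framework's actual parser: the regular expression, `parse_tool_call`, `dispatch`, and the plain-dictionary registry are all assumptions made for the example.

```python
import json
import re
from typing import Callable, Dict, Optional, Tuple

# Matches "[TOOL_CALL:<tool name>:<JSON object>]" (assumed instruction format).
TOOL_CALL_PATTERN = re.compile(r"\[TOOL_CALL:(\w+):(\{.*\})\]", re.DOTALL)


def parse_tool_call(text: str) -> Optional[Tuple[str, dict]]:
    """Extract the tool name and JSON parameters from an Agent response."""
    match = TOOL_CALL_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), json.loads(match.group(2))


def dispatch(text: str, tools: Dict[str, Callable[[dict], object]]) -> str:
    """Find the named tool, run it, and return the result as a string."""
    parsed = parse_tool_call(text)
    if parsed is None:
        return "No tool call found"
    name, params = parsed
    tool = tools.get(name)
    if tool is None:
        return f"Unknown tool: {name}"
    return str(tool(params))


def fake_search(params: dict) -> str:
    """Stand-in for a real search tool, used only for this demo."""
    return f"searched '{params['input']}' via {params['backend']}"


instruction = (
    'I need more data. [TOOL_CALL:search_tool:'
    '{"input": "Datawhale organization", "backend": "tavily"}]'
)
output = dispatch(instruction, {"search_tool": fake_search})
print(output)
```

The greedy pattern assumes at most one tool call per response; handling several calls in one response would need a stricter parser.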
+
+## 14.5 Service Layer Implementation
+
+This section will introduce the implementation of core services in detail, including PlanningService, SummarizationService, ReportingService, and SearchService. These services are the bridge connecting Agents and tools, responsible for specific business logic.
+
+### 14.5.1 Task Planning Service
+
+`PlanningService` is responsible for calling the research planning Agent to decompose the research topic into subtasks. This is the first and most critical step of the entire research process.
+
+**(1) Implementation Approach**
+
+Its core responsibilities are:
+
+1. **Build planning Prompt**: Build Prompt based on research topic and current date
+2. **Call planning Agent**: Call TODO Planner Agent to generate subtask list
+3. **Parse JSON response**: Extract JSON-format subtask list from Agent's response
+4. **Validate subtask format**: Ensure each subtask contains required fields (title, intent, query)
+
+```python
+import re
+import json
+from typing import List, Callable, Optional
+from datetime import datetime
+
+from hello_agents import HelloAgentsLLM
+from hello_agents import ToolAwareSimpleAgent
+from models import TodoItem, SummaryState
+from prompts import todo_planner_instructions
+
+class PlanningService:
+    """Task planning service"""
+
+    def __init__(
+        self,
+        llm: HelloAgentsLLM,
+        tool_call_listener: Optional[Callable] = None
+    ):
+        self._llm = llm
+        self._tool_call_listener = tool_call_listener
+
+        # Create planning Agent
+        self._agent = ToolAwareSimpleAgent(
+            name="TODO Planner",
+            system_prompt="You are a research planning expert, skilled at decomposing complex research topics into clear subtasks.",
+            llm=llm,
+            tool_call_listener=tool_call_listener
+        )
+
+    def plan_todo_list(self, state: SummaryState) -> List[TodoItem]:
+        """Plan TODO list
+
+        Args:
+            state: Research state, containing research topic
+
+        Returns:
+            Subtask list
+        """
+        # Build Prompt
+        prompt = todo_planner_instructions.format(
+            current_date=self._get_current_date(),
+            research_topic=state.research_topic,
+        )
+
+        # Call Agent
+        response = self._agent.run(prompt)
+
+        # Parse JSON
+        tasks_payload = self._extract_tasks(response)
+
+        # Validate and create TodoItem
+        todo_items = []
+        for idx, item in enumerate(tasks_payload, start=1):
+            # Validate required fields
+            if not all(key in item for key in ["title", "intent", "query"]):
+                raise ValueError(f"Task {idx} is missing required fields")
+
+            task = TodoItem(
+                id=idx,
+                title=item["title"],
+                intent=item["intent"],
+                query=item["query"],
+            )
+            todo_items.append(task)
+
+        return todo_items
+
+    def _get_current_date(self) -> str:
+        """Get current date"""
+        return datetime.now().strftime("%Y-%m-%d")
+
+    def _extract_tasks(self, response: str) -> List[dict]:
+        """Extract JSON from Agent response
+
+        The Agent's response may contain extra text, such as:
+        "Okay, I will plan the following tasks for you:\n[{...}, {...}]\nThese tasks cover..."
+
+        We need to extract the JSON part.
+        """
+        # Method 1: Use a greedy regex to grab everything from the first "[" to the last "]"
+        json_match = re.search(r'\[.*\]', response, re.DOTALL)
+        if json_match:
+            json_str = json_match.group(0)
+            try:
+                return json.loads(json_str)
+            except json.JSONDecodeError as e:
+                raise ValueError(f"JSON parsing failed: {e}") from e
+
+        # Method 2: If no JSON array is found, try to parse the entire response directly
+        try:
+            return json.loads(response)
+        except json.JSONDecodeError as e:
+            raise ValueError("Unable to extract JSON from response") from e
+```
+
+**(2) JSON Parsing and Validation**
+
+The JSON returned by the Agent may contain extra text or format errors, so we need robust parsing logic:
+
+**Common Issues**:
+
+1. **Contains extra text**: Agent may add explanatory text before and after JSON
+2. **Format errors**: JSON may be missing quotes, commas, etc.
+3. **Missing fields**: Some subtasks may be missing required fields
+
+**Solutions**:
+
+1. **Use regex**: Extract JSON part
+2. **Multiple parsing strategies**: First try to extract JSON array, then try to parse directly
+3. **Field validation**: Ensure each subtask contains required fields
+
+**Example**:
+
+```python
+# Agent response example 1: Contains extra text
+response1 = """
+Okay, I will plan the following tasks for you:
+
+[
+  {
+    "title": "What is a multimodal model",
+    "intent": "Understand basic concepts",
+    "query": "multimodal model definition"
+  },
+  {
+    "title": "Latest multimodal models",
+    "intent": "Understand technical status",
+    "query": "latest multimodal models 2024"
+  }
+]
+
+These tasks cover the basic concepts and latest developments of multimodal models.
+"""
+
+# Extract JSON (`service` is assumed to be a PlanningService instance)
+tasks1 = service._extract_tasks(response1)
+# Result: [{"title": "What is a multimodal model", ...}, ...]
+
+# Agent response example 2: Pure JSON
+response2 = """
+[
+  {"title": "Basic information about Datawhale", "intent": "Understand organizational positioning", "query": "Datawhale organization introduction"},
+  {"title": "Main projects of Datawhale", "intent": "Understand core content", "query": "Datawhale projects tutorials 2024"}
+]
+"""
+
+# Extract JSON
+tasks2 = service._extract_tasks(response2)
+# Result: [{"title": "Basic information about Datawhale", ...}, ...]
+```
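
A greedy pattern such as `\[.*\]` captures everything from the first `[` to the last `]`, so it breaks when the surrounding text itself contains brackets. One alternative, sketched here as an assumption rather than as the project's method, is to try `json.JSONDecoder.raw_decode` at each `[` until a list parses; `extract_first_json_array` is a hypothetical helper.

```python
import json
from typing import List


def extract_first_json_array(text: str) -> List[dict]:
    """Return the first JSON array embedded anywhere in the text."""
    decoder = json.JSONDecoder()
    start = text.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # this "[" did not start valid JSON; try the next one
        else:
            if isinstance(value, list):
                return value
        start = text.find("[", start + 1)
    raise ValueError("Unable to extract a JSON array from the response")


response = (
    "Plan [draft]:\n"
    '[{"title": "What is a multimodal model", '
    '"intent": "Understand basic concepts", '
    '"query": "multimodal model definition"}]\n'
    "Done [end]."
)
tasks = extract_first_json_array(response)
print(tasks[0]["title"])  # prints What is a multimodal model
```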
+
+**(3) Planning Quality Assessment**
+
+A good plan should meet the following criteria:
+
+1. **Comprehensive coverage**: Cover all important aspects of the topic
+2. **Clear logic**: Clear logical relationships between subtasks
+3. **Precise queries**: Search queries can accurately find relevant materials
+4. **Appropriate quantity**: 3-5 subtasks
+
+We can add an evaluation method:
+
+```python
+def evaluate_plan(self, todo_items: List[TodoItem]) -> dict:
+    """Evaluate planning quality
+
+    Returns:
+        Evaluation results, including score and suggestions
+    """
+    score = 100
+    suggestions = []
+
+    # Check quantity
+    if len(todo_items) < 3:
+        score -= 20
+        suggestions.append("Too few subtasks, may miss important information")
+    elif len(todo_items) > 5:
+        score -= 10
+        suggestions.append("Too many subtasks, may have redundancy")
+
+    # Check query quality
+    for task in todo_items:
+        if len(task.query.split()) < 2:
+            score -= 10
+            suggestions.append(f"Query for task '{task.title}' is too simple")
+
+    # Check logical relationships
+    # (More complex logic checks can be added here)
+
+    return {
+        "score": score,
+        "suggestions": suggestions
+    }
+```
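
As one example of a further check, redundant subtasks can be detected by comparing the word sets of their queries. The sketch below uses Jaccard similarity with an arbitrary threshold of 0.8; both the helper `find_overlapping_queries` and the threshold are assumptions for illustration.

```python
from typing import List, Tuple


def find_overlapping_queries(
    queries: List[str], threshold: float = 0.8
) -> List[Tuple[int, int]]:
    """Return index pairs of queries whose word sets overlap heavily."""
    token_sets = [set(query.lower().split()) for query in queries]
    pairs = []
    for i in range(len(token_sets)):
        for j in range(i + 1, len(token_sets)):
            union = token_sets[i] | token_sets[j]
            if not union:
                continue  # both queries are empty; nothing to compare
            jaccard = len(token_sets[i] & token_sets[j]) / len(union)
            if jaccard >= threshold:
                pairs.append((i, j))
    return pairs


queries = [
    "multimodal model definition",
    "definition multimodal model",
    "latest multimodal models 2024",
]
overlaps = find_overlapping_queries(queries)
print(overlaps)  # prints [(0, 1)]
```

Inside `evaluate_plan`, each reported pair could lower the score and add a suggestion to merge the two subtasks.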
+
+### 14.5.2 Summarization Service
+
+`SummarizationService` is responsible for calling the task summarization Agent to summarize search results. This is the core step of the research process and largely determines the quality of the research.
+
+Its responsibilities are:
+
+1. **Format search results**: Format search results into readable text
+2. **Build summarization Prompt**: Build Prompt based on task information and search results
+3. **Call summarization Agent**: Call Task Summarizer Agent to generate summary
+4. **Collect source citations**: Collect the source URLs of the search results so they can be cited later
+
+Core code:
+
+```python
+from typing import List, Callable, Optional, Tuple
+
+from hello_agents import HelloAgentsLLM
+from hello_agents import ToolAwareSimpleAgent
+from models import TodoItem
+from prompts import task_summarizer_instructions
+
+class SummarizationService:
+    """Summarization service"""
+
+    def __init__(
+        self,
+        llm: HelloAgentsLLM,
+        tool_call_listener: Optional[Callable] = None
+    ):
+        self._llm = llm
+        self._tool_call_listener = tool_call_listener
+
+        # Create summarization Agent
+        self._agent = ToolAwareSimpleAgent(
+            name="Task Summarizer",
+            system_prompt="You are a task summarization expert, skilled at extracting key information from search results.",
+            llm=llm,
+            tool_call_listener=tool_call_listener
+        )
+
+    def summarize_task(
+        self,
+        task: TodoItem,
+        search_results: List[dict]
+    ) -> Tuple[str, List[str]]:
+        """Summarize task
+
+        Args:
+            task: Task information
+            search_results: Search results list
+
+        Returns:
+            (Summary text, source URL list)
+        """
+        # Format search results
+        formatted_sources = self._format_sources(search_results)
+
+        # Build Prompt
+        prompt = task_summarizer_instructions.format(
+            task_title=task.title,
+            task_intent=task.intent,
+            task_query=task.query,
+            search_results=formatted_sources,
+        )
+
+        # Call Agent
+        summary = self._agent.run(prompt)
+
+        # Extract source URLs
+        source_urls = [result["url"] for result in search_results]
+
+        return summary, source_urls
+
+    def _format_sources(self, search_results: List[dict]) -> str:
+        """Format search results
+
+        Format search results into readable text, including:
+        - Serial number
+        - Title
+        - URL
+        - Snippet
+        """
+        formatted = []
+        for idx, result in enumerate(search_results, start=1):
+            formatted.append(
+                f"[{idx}] {result['title']}\n"
+                f"URL: {result['url']}\n"
+                f"Snippet: {result['snippet']}\n"
+            )
+        return "\n".join(formatted)
+```
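
The code above returns the URL of every search result, whether or not the summary used it. If the summarizer is prompted to cite sources with the same `[n]` markers that `_format_sources` produces, the list can be narrowed to the sources actually cited. This is a sketch under that assumption; `cited_urls` is a hypothetical helper.

```python
import re
from typing import List


def cited_urls(summary: str, search_results: List[dict]) -> List[str]:
    """Map [n] markers in the summary back to URLs, in order of first use."""
    urls: List[str] = []
    for marker in re.findall(r"\[(\d+)\]", summary):
        index = int(marker) - 1  # markers are 1-based
        if 0 <= index < len(search_results):
            url = search_results[index]["url"]
            if url not in urls:
                urls.append(url)
    return urls


results = [
    {"title": "A", "url": "https://example.com/a", "snippet": "..."},
    {"title": "B", "url": "https://example.com/b", "snippet": "..."},
]
summary = "Claim one [2]. Claim two [1], repeated [2]. Unknown source [9]."
used = cited_urls(summary, results)
print(used)
```

Markers that point outside the result list, such as `[9]` here, are silently skipped; logging them would help catch hallucinated citations.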
+
+### Report Structure Design
+
+The final report should include the following parts:
+
+    ## References
+
+    ### Task 1: What is a Multimodal Model
+    - https://example.com/multimodal-model-definition
+    ...
+
+    ### Task 2: What are the Latest Multimodal Models
+    - https://example.com/gpt4v
+    ...
+    ...
+### 14.5.3 Report Generation Service
+
+`ReportingService` is responsible for calling the report generation Agent to integrate the summaries of all subtasks. This is the last step of the research process, generating the final research report.
+
+Its responsibilities are:
+
+1. **Format subtask summaries**: Format all subtask summaries into a unified format
+2. **Build report Prompt**: Build Prompt based on research topic and subtask summaries
+3. **Call report Agent**: Call Report Writer Agent to generate final report
+4. **Organize citations**: Organize all source citations into the references section
+
+**Core Code Implementation**:
+
+```python
+from typing import List, Callable, Optional, Tuple
+
+from hello_agents import HelloAgentsLLM
+from hello_agents import ToolAwareSimpleAgent
+from models import TodoItem
+from prompts import report_writer_instructions
+
+class ReportingService:
+    """Report generation service"""
+
+    def __init__(
+        self,
+        llm: HelloAgentsLLM,
+        tool_call_listener: Optional[Callable] = None
+    ):
+        self._llm = llm
+        self._tool_call_listener = tool_call_listener
+
+        # Create report Agent
+        self._agent = ToolAwareSimpleAgent(
+            name="Report Writer",
+            system_prompt="You are a report writing expert, skilled at integrating information and generating structured reports.",
+            llm=llm,
+            tool_call_listener=tool_call_listener
+        )
+
+    def generate_report(
+        self,
+        research_topic: str,
+        task_summaries: List[Tuple[TodoItem, str, List[str]]]
+    ) -> str:
+        """Generate final report
+
+        Args:
+            research_topic: Research topic
+            task_summaries: Subtask summary list, each element is (task, summary, source URL list)
+
+        Returns:
+            Final report (Markdown format)
+        """
+        # Format subtask summaries
+        formatted_summaries = self._format_summaries(task_summaries)
+
+        # Build Prompt
+        prompt = report_writer_instructions.format(
+            research_topic=research_topic,
+            task_summaries=formatted_summaries,
+        )
+
+        # Call Agent
+        report = self._agent.run(prompt)
+
+        return report
+
+    def _format_summaries(
+        self,
+        task_summaries: List[Tuple[TodoItem, str, List[str]]]
+    ) -> str:
+        """Format subtask summaries
+
+        Format all subtask summaries into a unified format, including:
+        - Task serial number
+        - Task title
+        - Task intent
+        - Summary content
+        - Source URLs
+        """
+        formatted = []
+        for idx, (task, summary, source_urls) in enumerate(task_summaries, start=1):
+            formatted.append(
+                f"## Task {idx}: {task.title}\n\n"
+                f"**Intent**: {task.intent}\n\n"
+                f"{summary}\n\n"
+                f"**Sources**:\n"
+            )
+            for url in source_urls:
+                formatted.append(f"- {url}\n")
+            formatted.append("\n")
+
+        return "".join(formatted)
+```
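
The fourth responsibility, organizing citations into a references section, is left to the Agent in the code above. It could also be done deterministically after the report is generated. The sketch below is one possible approach; `build_references` is a hypothetical helper, and it takes plain `(title, urls)` pairs instead of `TodoItem` objects to stay self-contained.

```python
from typing import List, Tuple


def build_references(task_sources: List[Tuple[str, List[str]]]) -> str:
    """Build a Markdown references section grouped by task."""
    lines = ["## References", ""]
    for idx, (title, urls) in enumerate(task_sources, start=1):
        lines.append(f"### Task {idx}: {title}")
        seen = set()
        for url in urls:
            if url not in seen:  # list each URL once per task
                seen.add(url)
                lines.append(f"- {url}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


references = build_references([
    ("What is a Multimodal Model", [
        "https://example.com/multimodal-model-definition",
        "https://example.com/multimodal-model-definition",
    ]),
    ("What are the Latest Multimodal Models", ["https://example.com/gpt4v"]),
])
print(references, end="")
```

Appending this section in code keeps the URLs exact, since the model never has to reproduce them.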
+
+### 14.5.4 Search Scheduling Service
+
+`SearchService` is responsible for scheduling search engines, executing searches, and returning results. It is the bridge connecting the Agents and `SearchTool`. Instead of the usual approach of having `SimpleAgent` call tools directly, an intermediate layer runs `SearchTool` and passes the results to the Agent, which lets the Agent focus on processing the information it receives.
+
+Its responsibilities are:
+
+1. **Schedule search engine**: Select search engine based on configuration
+2. **Execute search**: Call SearchTool to execute search
+3. **Process results**: Deduplicate, limit tokens, format
+4. **Error handling**: Handle search failure situations
+
+Core code:
+
+```python
+from typing import List, Optional
+import logging
+
+from hello_agents.tools import SearchTool
+from config import Configuration
+
+logger = logging.getLogger(__name__)
+
+class SearchService:
+    """Search scheduling service"""
+
+    def __init__(self, config: Configuration):
+        self.config = config
+
+        # Create SearchTool
+        self.search_tool = SearchTool(backend="hybrid")
+
+    def search(
+        self,
+        query: str,
+        max_results: int = 5
+    ) -> List[dict]:
+        """Execute search
+
+        Args:
+            query: Search query
+            max_results: Maximum number of results
+
+        Returns:
+            Search results list
+        """
+        try:
+            # Call SearchTool
+            raw_response = self.search_tool.run({
+                "input": query,
+                "backend": self.config.search_api.value,
+                "mode": "structured",
+                "max_results": max_results
+            })
+
+            # Extract results
+            results = raw_response.get("results", [])
+
+            # Process results
+            results = self._deduplicate_sources(results)
+            results = self._limit_source_tokens(results)
+
+            logger.info(f"Search successful: {query}, returned {len(results)} results")
+
+            return results
+
+        except Exception as e:
+            logger.error(f"Search failed: {query}, error: {e}")
+            return []
+
+    def _deduplicate_sources(self, sources: List[dict]) -> List[dict]:
+        """Remove duplicate URLs"""
+        seen_urls = set()
+        unique_sources = []
+
+        for source in sources:
+            url = source.get("url", "")
+            if url and url not in seen_urls:
+                seen_urls.add(url)
+                unique_sources.append(source)
+
+        return unique_sources
+
+    def _limit_source_tokens(
+        self,
+        sources: List[dict],
+        max_tokens_per_source: int = 2000
+    ) -> List[dict]:
+        """Limit the number of tokens per source"""
+        limited_sources = []
+
+        for source in sources:
+            snippet = source.get("snippet", "")
+
+            # Simple token estimation: 1 token is approximately 4 characters
+            max_chars = max_tokens_per_source * 4
+
+            if len(snippet) > max_chars:
+                snippet = snippet[:max_chars] + "..."
+
+            limited_sources.append({
+                **source,
+                "snippet": snippet
+            })
+
+        return limited_sources
+```
+
+The search engine is selected based on configuration, as shown in Figure 14.8:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-8.png" alt="" width="85%"/>
+  <p>Figure 14.8 Search Engine Scheduling Process</p>
+</div>
+
+**Scheduling Logic**:
+
+1. **Read configuration**: Read `SEARCH_API` configuration from `.env` file
+2. **Select engine**: Select search engine based on configuration (tavily, duckduckgo, perplexity, etc.)
+3. **Execute search**: Call SearchTool to execute search
+4. **Process results**: Deduplicate, limit tokens, format
+5. **Return results**: Return processed search results
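
The `search` method shown earlier returns an empty list when the configured engine fails. One possible extension, sketched here as an assumption and not as part of the project, is to try a list of engines in order. `search_with_fallback` is a hypothetical helper, and `search_fn` stands in for whatever function actually calls `SearchTool`.

```python
from typing import Callable, List, Sequence


def search_with_fallback(
    query: str,
    backends: Sequence[str],
    search_fn: Callable[[str, str], List[dict]],
) -> List[dict]:
    """Try each backend in order and return the first non-empty result."""
    for backend in backends:
        try:
            results = search_fn(query, backend)
        except Exception:
            continue  # this engine failed; move on to the next one
        if results:
            return results
    return []


def fake_search(query: str, backend: str) -> List[dict]:
    """Stand-in for a real search call, used only for this demo."""
    if backend == "tavily":
        raise RuntimeError("API limit reached")
    if backend == "perplexity":
        return []
    return [{"title": query, "url": "https://example.com", "snippet": backend}]


found = search_with_fallback(
    "multimodal model definition",
    ["tavily", "perplexity", "duckduckgo"],
    fake_search,
)
print(found[0]["snippet"])  # prints duckduckgo
```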
+
+To improve efficiency and reduce costs, we can add search result caching:
+
+```python
+import hashlib
+import json
+from pathlib import Path
+
+class SearchService:
+    def __init__(self, config: Configuration):
+        self.config = config
+        self.search_tool = SearchTool(backend="hybrid")
+
+        # Cache directory
+        self.cache_dir = Path("./cache/search")
+        self.cache_dir.mkdir(parents=True, exist_ok=True)
+
+    def search(
+        self,
+        query: str,
+        max_results: int = 5,
+        use_cache: bool = True
+    ) -> List[dict]:
+        """Execute search (with cache)"""
+        # Generate cache key
+        cache_key = self._generate_cache_key(query, max_results)
+        cache_file = self.cache_dir / f"{cache_key}.json"
+
+        # Try to read from cache
+        if use_cache and cache_file.exists():
+            logger.info(f"Reading search results from cache: {query}")
+            with open(cache_file, "r", encoding="utf-8") as f:
+                return json.load(f)
+
+        # Execute search (`_execute_search` is assumed to wrap the uncached SearchTool call shown earlier)
+        results = self._execute_search(query, max_results)
+
+        # Save to cache
+        if use_cache and results:
+            with open(cache_file, "w", encoding="utf-8") as f:
+                json.dump(results, f, ensure_ascii=False, indent=2)
+
+        return results
+
+    def _generate_cache_key(self, query: str, max_results: int) -> str:
+        """Generate cache key"""
+        # Generate MD5 hash from the query, max results, and search backend
+        content = f"{query}_{max_results}_{self.config.search_api.value}"
+        return hashlib.md5(content.encode()).hexdigest()
+```
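
Cached search results never expire in the version above, which can return stale data for time-sensitive topics. A simple remedy is to ignore cache files older than a chosen age. The sketch below is one way to do that using the file's modification time; `read_fresh_cache` and the one-hour limit are assumptions for illustration.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def read_fresh_cache(
    cache_file: Path, max_age_seconds: float
) -> Optional[List[dict]]:
    """Return cached results, or None if the file is missing or too old."""
    if not cache_file.exists():
        return None
    age = time.time() - cache_file.stat().st_mtime
    if age > max_age_seconds:
        return None  # treat stale entries as a cache miss
    with open(cache_file, "r", encoding="utf-8") as f:
        return json.load(f)


# Demo: a file written just now is fresh for one hour but not for -1 seconds.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "example.json"
    cache_file.write_text(
        json.dumps([{"title": "t", "url": "u", "snippet": "s"}]),
        encoding="utf-8",
    )
    fresh = read_fresh_cache(cache_file, max_age_seconds=3600)
    expired = read_fresh_cache(cache_file, max_age_seconds=-1)

print(fresh is not None, expired is None)
```

In `SearchService.search`, this helper could replace the plain `cache_file.exists()` check, with the maximum age taken from configuration.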
+
+Through four core services (PlanningService, SummarizationService, ReportingService, SearchService), we built a complete research process. These services each perform their duties and collaborate through clear interfaces, achieving an automated process from research topic to final report.
+
+## 14.6 Front-End Interaction Design
+
+In the previous sections, we implemented the complete back-end system. This section will introduce the front-end interaction design in detail, including full-screen modal dialog UI, real-time progress display, and research result visualization.
+
+### 14.6.1 Full-Screen Modal Dialog UI Design
+
+The deep research assistant adopts a full-screen modal dialog UI design, which has the following advantages:
+
+1. **Immersive experience**: Full-screen display, avoiding distractions, focusing on research
+2. **Clear hierarchy**: Main page and research page are separated, with clear hierarchy
+3. **Easy to close**: Click the close button or press ESC key to return to the main page
+4. **Responsive design**: Adapts to different screen sizes
+
+As shown in Figure 14.9, the full-screen modal dialog contains the following parts:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-9.png" alt="" width="85%"/>
+  <p>Figure 14.9 Full-Screen Modal Dialog UI</p>
+</div>
+
+**UI Components**:
+
+1. **Top bar**: Contains research topic and close button
+2. **Progress area**: Shows current research progress (planning, execution, reporting)
+3. **Content area**: Shows research results (Markdown format)
+4. **Bottom bar**: Shows status information (such as "Researching...", "Completed")
+
+The corresponding Vue implementation is as follows (ResearchModal.vue):
+
+```vue
+<template>
+  <div v-if="isOpen" class="modal-overlay" @click.self="close">
+    <div class="modal-container">
+      <!-- Top bar -->
+      <div class="modal-header">
+        <h2>{{ researchTopic }}</h2>
+        <button @click="close" class="close-button">
+          <svg><!-- Close icon --></svg>
+        </button>
+      </div>
+
+      <!-- Progress area -->
+      <div class="progress-section">
+        <div class="progress-bar">
+          <div
+            class="progress-fill"
+            :style="{ width: progressPercentage + '%' }"
+          ></div>
+        </div>
+        <div class="progress-text">{{ progressText }}</div>
+      </div>
+
+      <!-- Content area -->
+      <div class="content-section">
+        <div v-if="isLoading" class="loading-spinner">
+          <div class="spinner"></div>
+          <p>Researching, please wait...</p>
+        </div>
+
+        <div v-else class="markdown-content" v-html="renderedMarkdown"></div>
+      </div>
+
+      <!-- Bottom bar -->
+      <div class="modal-footer">
+        <span class="status-text">{{ statusText }}</span>
+      </div>
+    </div>
+  </div>
+</template>
+
+<script setup lang="ts">
+import { ref, computed, watch } from 'vue'
+import { marked } from 'marked'
+
+interface Props {
+  isOpen: boolean
+  researchTopic: string
+}
+
+const props = defineProps<Props>()
+const emit = defineEmits<{
+  close: []
+}>()
+
+// State
+const isLoading = ref(true)
+const progressPercentage = ref(0)
+const progressText = ref('Preparing...')
+const statusText = ref('Researching...')
+const markdownContent = ref('')
+
+// Render Markdown
+const renderedMarkdown = computed(() => {
+  return marked(markdownContent.value)
+})
+
+// Close modal
+const close = () => {
+  emit('close')
+}
+
+// Listen for ESC key
+const handleKeydown = (e: KeyboardEvent) => {
+  if (e.key === 'Escape') {
+    close()
+  }
+}
+
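+// Attach the ESC listener while the modal is open; onCleanup removes it
+// when isOpen changes or the component is unmounted
+watch(() => props.isOpen, (isOpen, _prev, onCleanup) => {
+  if (!isOpen) return
+  document.addEventListener('keydown', handleKeydown)
+  onCleanup(() => document.removeEventListener('keydown', handleKeydown))
+}, { immediate: true })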
+</script>
+
+<style scoped>
+.modal-overlay {
+  position: fixed;
+  top: 0;
+  left: 0;
+  width: 100vw;
+  height: 100vh;
+  background-color: rgba(0, 0, 0, 0.5);
+  display: flex;
+  justify-content: center;
+  align-items: center;
+  z-index: 1000;
+}
+...
+</style>
+```
+
+To adapt to different screen sizes, we add media queries:
+
+```css
+/* Tablet devices */
+@media (max-width: 768px) {
+  .modal-container {
+    width: 95vw;
+    height: 95vh;
+  }
+
+  .modal-header,
+  .progress-section,
+  .content-section,
+  .modal-footer {
+    padding: 15px 20px;
+  }
+}
+
+/* Mobile devices */
+@media (max-width: 480px) {
+  .modal-container {
+    width: 100vw;
+    height: 100vh;
+    border-radius: 0;
+  }
+
+  .modal-header h2 {
+    font-size: 18px;
+  }
+}
+```
+
+### 14.6.2 Real-Time Progress Display
+
+The deep research assistant uses Server-Sent Events (SSE) to display progress in real time. SSE is a server push technology in which the server sends data to the client over a single long-lived HTTP response; it is also covered in the agent communication protocols chapter (Chapter 10).
+
+As shown in Figure 14.10, the SSE process includes the following steps:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-10.png" alt="" width="85%"/>
+  <p>Figure 14.10 SSE Process</p>
+</div>
+
+**Process Description**:
+
+1. **Client initiates request**: Send a GET request to `/api/research` with the research topic as a query parameter (the browser `EventSource` API only issues GET requests)
+2. **Server establishes SSE connection**: Return `text/event-stream` response
+3. **Server pushes progress**: Periodically push research progress (planning, execution, reporting)
+4. **Client receives progress**: Listen for SSE events, update UI
+5. **Research complete**: Server pushes final report, closes connection
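The progress values pushed in step 3 can be derived from a task's position in the list. The sketch below assumes the same split that the endpoint in this section uses: planning ends at 10%, and the execution stage spreads the next 70% evenly across subtasks. The function name `execution_progress` is hypothetical.

```python
def execution_progress(idx: int, total: int) -> float:
    """Map the idx-th of `total` tasks (1-based) into the 10-80% band."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 10 + (idx / total) * 70


# With three subtasks, the bar ends the execution stage at 80%.
for i in range(1, 4):
    print(i, round(execution_progress(i, 3), 1))
```

Reserving the last 20% for the reporting stage keeps the bar from reaching 100% before the final report has actually arrived.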
+
+To use SSE end to end, both the back end and the front end need the following setup.
+
+**Back-End FastAPI SSE Endpoint**:
+
+```python
+from fastapi import FastAPI
+from fastapi.responses import StreamingResponse
+from typing import AsyncGenerator
+import asyncio
+import json
+
+app = FastAPI()
+
+async def research_stream(topic: str) -> AsyncGenerator[str, None]:
+    """Research streaming generator
+
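+    Each event is emitted in SSE format (one data line followed by a blank line), for example:
+    data: {"type": "progress", "stage": "planning", "percentage": 10, "text": "..."}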
+
+    """
+    try:
+        # 1. Planning stage
+        yield f"data: {json.dumps({'type': 'progress', 'stage': 'planning', 'percentage': 10, 'text': 'Planning research tasks...'})}\n\n"
+
+        # Call PlanningService
+        todo_items = await planning_service.plan_todo_list(topic)
+
+        yield f"data: {json.dumps({'type': 'plan', 'data': [item.dict() for item in todo_items]})}\n\n"
+
+        # 2. Execution stage
+        task_summaries = []
+        for idx, task in enumerate(todo_items, start=1):
+            # Update progress
+            percentage = 10 + (idx / len(todo_items)) * 70
+            yield f"data: {json.dumps({'type': 'progress', 'stage': 'executing', 'percentage': percentage, 'text': f'Researching task {idx}/{len(todo_items)}: {task.title}'})}\n\n"
+
+            # Search
+            search_results = await search_service.search(task.query)
+
+            # Summarize
+            summary, source_urls = await summarization_service.summarize_task(task, search_results)
+
+            task_summaries.append((task, summary, source_urls))
+
+            # Push task summary
+            yield f"data: {json.dumps({'type': 'task_summary', 'task_id': task.id, 'summary': summary})}\n\n"
+
+        # 3. Reporting stage
+        yield f"data: {json.dumps({'type': 'progress', 'stage': 'reporting', 'percentage': 90, 'text': 'Generating final report...'})}\n\n"
+
+        # Generate report
+        report = await reporting_service.generate_report(topic, task_summaries)
+
+        # Push final report
+        yield f"data: {json.dumps({'type': 'report', 'data': report})}\n\n"
+
+        # Complete
+        yield f"data: {json.dumps({'type': 'progress', 'stage': 'completed', 'percentage': 100, 'text': 'Research complete!'})}\n\n"
+
+    except Exception as e:
+        # Error handling
+        yield f"data: {json.dumps({'type': 'error', 'message': str(e)})}\n\n"
+
+@app.get("/api/research")
+async def research(topic: str):
+    """Research endpoint (SSE); GET because the browser EventSource API only issues GET requests"""
+    return StreamingResponse(
+        research_stream(topic),
+        media_type="text/event-stream",
+        headers={
+            "Cache-Control": "no-cache",
+            "Connection": "keep-alive",
+        }
+    )
+```
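The wire format used by the generator above is simple enough to sketch in isolation. The helper names `format_sse` and `parse_sse` are hypothetical and not part of the project; they only illustrate that each event is a `data:` line carrying JSON, terminated by a blank line.

```python
import json
from typing import Any, Dict, List


def format_sse(payload: Dict[str, Any]) -> str:
    """Serialize one event as an SSE frame: a data line plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse(stream: str) -> List[Dict[str, Any]]:
    """Split a buffered stream into frames and decode each JSON payload."""
    events = []
    for frame in stream.split("\n\n"):
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events


stream = format_sse({"type": "progress", "stage": "planning", "percentage": 10})
stream += format_sse({"type": "report", "data": "# Title\n\nBody"})

events = parse_sse(stream)
print([event["type"] for event in events])
```

Because `json.dumps` escapes newlines inside strings, a multi-line Markdown report still fits on a single `data:` line and cannot be mistaken for a frame boundary.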
+
+**Front-End Using EventSource to Receive SSE**:
+
+```typescript
+// composables/useResearch.ts
+import { ref } from 'vue'
+
+export function useResearch() {
+  const isLoading = ref(false)
+  const progressPercentage = ref(0)
+  const progressText = ref('')
+  const markdownContent = ref('')
+  const error = ref<string | null>(null)
+
+  const startResearch = (topic: string) => {
+    isLoading.value = true
+    error.value = null
+
+    // Create EventSource
+    const eventSource = new EventSource(`/api/research?topic=${encodeURIComponent(topic)}`)
+
+    // Listen for messages
+    eventSource.onmessage = (event) => {
+      const data = JSON.parse(event.data)
+
+      switch (data.type) {
+        case 'progress':
+          progressPercentage.value = data.percentage
+          progressText.value = data.text
+          // The server signals completion as a progress event with stage 'completed'
+          if (data.stage === 'completed') {
+            eventSource.close()
+            isLoading.value = false
+          }
+          break
+
+        case 'plan':
+          // Display planning results
+          console.log('Planning results:', data.data)
+          break
+
+        case 'task_summary':
+          // Append task summary to Markdown
+          markdownContent.value += `\n\n## Task ${data.task_id}\n\n${data.summary}`
+          break
+
+        case 'report':
+          // Display final report
+          markdownContent.value = data.data
+          break
+
+        case 'error':
+          error.value = data.message
+          eventSource.close()
+          isLoading.value = false
+          break
+      }
+    }
+
+    // Error handling
+    eventSource.onerror = (err) => {
+      console.error('SSE error:', err)
+      error.value = 'Connection failed, please retry'
+      eventSource.close()
+      isLoading.value = false
+    }
+  }
+
+  return {
+    isLoading,
+    progressPercentage,
+    progressText,
+    markdownContent,
+    error,
+    startResearch,
+  }
+}
+```
+
+**Using in Component**:
+
+```vue
+<script setup lang="ts">
+import { useResearch } from '@/composables/useResearch'
+
+const {
+  isLoading,
+  progressPercentage,
+  progressText,
+  markdownContent,
+  error,
+  startResearch
+} = useResearch()
+
+const handleStartResearch = (topic: string) => {
+  startResearch(topic)
+}
+</script>
+```
+
+### 14.6.3 Research Result Visualization
+
+Research results are displayed in Markdown format, including headings, paragraphs, lists, block quotes, and other elements. We use the `marked` library to convert Markdown to HTML and add custom styles.
+
+**Rendering Markdown**:
+
+```typescript
+import { marked } from 'marked'
+
+// Configure marked
+marked.setOptions({
+  breaks: true,  // Support line breaks
+  gfm: true,     // Support GitHub Flavored Markdown
+})
+
+// Render
+const renderedHtml = marked(markdownContent.value)
+```
+
+Research reports cite a large number of sources, so we collect them in a dedicated References section grouped by task:
+
+```markdown
+## References
+
+### Task 1: Basic Information about Datawhale
+- [Datawhale GitHub](https://github.com/datawhalechina)
+- [Datawhale Official Website](https://datawhale.club)
+
+### Task 2: Main Projects of Datawhale
+- [Hello-Agents Tutorial](https://github.com/datawhalechina/Hello-Agents)
+...
+```
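One way to produce a References section of this shape on the back end is sketched below. It assumes each task carries a title and a list of `(label, url)` pairs; the function name `build_references` and this data layout are illustrative assumptions, not the project's actual implementation.

```python
from typing import List, Tuple

Source = Tuple[str, str]  # (link label, URL)


def build_references(tasks: List[Tuple[str, List[Source]]]) -> str:
    """Render a Markdown References section grouped by task.

    URLs repeated within the same task are listed only once.
    """
    lines = ["## References", ""]
    for number, (title, sources) in enumerate(tasks, start=1):
        lines.append(f"### Task {number}: {title}")
        seen = set()
        for label, url in sources:
            if url in seen:
                continue
            seen.add(url)
            lines.append(f"- [{label}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


tasks = [
    ("Basic Information about Datawhale", [
        ("Datawhale GitHub", "https://github.com/datawhalechina"),
        ("Datawhale GitHub", "https://github.com/datawhalechina"),  # duplicate
    ]),
    ("Main Projects of Datawhale", [
        ("Hello-Agents Tutorial", "https://github.com/datawhalechina/Hello-Agents"),
    ]),
]
references = build_references(tasks)
print(references)
```

Grouping by task keeps each citation next to the subtask it supports, which makes individual claims in the report easier to trace back to a source.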
+
+With the full-screen modal dialog, SSE-based real-time progress display, and Markdown result rendering, we built a user-friendly front-end interface. Users can follow the research progress as it happens and read the results in a cleanly formatted report.
+
+## 14.7 Chapter Summary
+
+In this chapter, we built a complete automated deep research agent system from scratch. Let's review the core points:
+
+**(1) TODO-Driven Research Paradigm**
+
+We proposed a new research paradigm: TODO-driven research. This paradigm decomposes a complex research topic into executable subtasks and completes the research in three stages:
+
+- **Planning stage**: Decompose the research topic into 3-5 subtasks, each with a title, an intent, and a search query
+- **Execution stage**: Run search and summarization for each subtask, producing structured knowledge
+- **Reporting stage**: Integrate the summaries of all subtasks into the final research report
+
+The advantages of this paradigm are:
+
+1. **Strong controllability**: Each subtask has clear objectives and scope
+2. **Reliable quality**: Dedicated Agents ensure quality at each stage
+3. **Easy to debug**: Can debug each subtask individually
+4. **Good scalability**: Can easily add new subtasks or modify existing subtasks
+
+**(2) Three-Agent Collaboration System**
+
+We designed three specialized Agents, each with a distinct role:
+
+- **TODO Planner (Research Planning Expert)**: Responsible for decomposing research topics into subtasks
+- **Task Summarizer (Task Summarization Expert)**: Responsible for summarizing search results for each subtask
+- **Report Writer (Report Writing Expert)**: Responsible for integrating summaries of all subtasks and generating final report
+
+The advantages of this design are:
+
+1. **Clear responsibilities**: Each Agent focuses on a specific task
+2. **Prompt optimization**: Can customize specialized Prompts for each Agent
+3. **Easy to maintain**: Modifying one Agent does not affect other Agents
+4. **Quality assurance**: Each Agent is an "expert" in their field
+
+**(3) ToolAwareSimpleAgent Design**
+
+We extended the `SimpleAgent` of the HelloAgents framework and implemented `ToolAwareSimpleAgent`. This Agent has tool call listening capability and can:
+
+- **Listen to tool calls**: Listen to each tool call through callback functions
+- **Real-time feedback**: Push tool call information to the front-end in real-time
+- **Debugging support**: Record all tool calls for easy debugging
+
+This Agent has been integrated into the HelloAgents framework and can be reused in other projects.
+
+**(4) Tool System Integration**
+
+We fully utilized the tool system of the HelloAgents framework:
+
+- **SearchTool**: Extended beyond Tavily and SerpApi to support more search engines (DuckDuckGo, Perplexity, SearXNG, etc.)
+- **NoteTool**: Persist research progress, support recovery and auditing
+- **ToolRegistry**: Unified management of all tools, support custom extensions
+
+Through configuration-based design, users can easily switch search engines without modifying code.
+
+**(5) Core Service Implementation**
+
+We implemented four core services connecting Agents and tools:
+
+- **PlanningService**: Call planning Agent, parse JSON, validate format
+- **SummarizationService**: Call summarization Agent, process search results, extract sources
+- **ReportingService**: Call report Agent, integrate summaries, generate report
+- **SearchService**: Schedule search engines, process results, degrade gracefully on errors, cache results
+
+Each service has a single responsibility and collaborates with the others through clear interfaces, automating the path from research topic to final report.
+
+**(6) Front-End Interaction Design**
+
+We designed a user-friendly front-end interface:
+
+- **Full-screen modal dialog**: Immersive experience, clear hierarchy
+- **SSE real-time progress**: Real-time display of research progress, good user experience
+- **Markdown visualization**: Beautiful format, clear structure
+
+Through the Vue 3 + TypeScript + SSE technology stack, we implemented a modern web application.
+
+This knowledge is not only applicable to deep research assistants, but can also be applied to other AI applications. We hope readers can explore more possibilities based on this chapter and build more powerful AI systems.
+
+In the next chapter, we will build Cyber Town, a multi-agent system combined with a game engine, and explore complex interaction and collaboration patterns between Agents. Stay tuned!
+

+ 159 - 155
docs/chapter14/第十四章 自动化深度研究智能体.md

@@ -1,12 +1,16 @@
+<div align="right">
+  <a href="./Chapter14-Automated-Deep-Research-Agent.md">English</a> | 中文
+</div>
+
 # 第十四章 自动化深度研究智能体
 
-在第十三章的旅行助手项目中,我们体验了如何将HelloAgents应用于一个多智能体产品。本章我们继续向前,聚焦「知识密集型应用」:<strong>构建一个能够自动化执行深度研究任务的智能体助手。</strong>
+在第十三章的旅行助手项目中,我们体验了如何将 HelloAgents 应用于一个多智能体产品。本章我们继续向前,聚焦「知识密集型应用」:<strong>构建一个能够自动化执行深度研究任务的智能体助手。</strong>
 
 相比旅行规划,深度研究的难点在于信息的不断发散、事实的快速更新以及用户对引用来源的高要求。为了交付可信的研究报告,我们需要让智能体具备三个核心能力:
 
 <strong>(1)问题剖析</strong>:将用户的开放主题拆解为可检索的查询语句。
 
-<strong>(2)多轮信息采集</strong>:结合不同搜索API持续挖掘资料,并去重整合。
+<strong>(2)多轮信息采集</strong>:结合不同搜索 API 持续挖掘资料,并去重整合。
 
 <strong>(3)反思与总结</strong>:依据阶段结果识别知识空白,决定是否继续检索,并生成结构化总结。
 
@@ -20,14 +24,14 @@
 
 <strong>深度研究助手的核心价值:</strong>
 
-1. <strong>节省时间</strong>:将1-2小时的研究工作压缩到5-10分钟
+1. <strong>节省时间</strong>:将 1-2 小时的研究工作压缩到 5-10 分钟
 2. <strong>提高质量</strong>:系统化的研究流程,避免遗漏重要信息
 3. <strong>可追溯</strong>:记录所有搜索结果和来源,方便验证和引用
 4. <strong>可扩展</strong>:可以轻松添加新的搜索引擎、数据源和分析工具
 
 ### 14.1.2 技术架构概览
 
-此次系统仍然采用经典的<strong>前后端分离架构</strong>,如图14.1所示。
+此次系统仍然采用经典的<strong>前后端分离架构</strong>,如图 14.1 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-1.png" alt="" width="85%"/>
@@ -36,15 +40,15 @@
 
 系统分为四层架构设计:
 
-<strong>前端层 (Vue3+TypeScript)</strong>:全屏模态对话框UI、Markdown结果可视化
+<strong>前端层 (Vue3+TypeScript)</strong>:全屏模态对话框 UI、Markdown 结果可视化
 
-<strong>后端层 (FastAPI)</strong>:API路由(`/research/stream`)
+<strong>后端层 (FastAPI)</strong>:API 路由(`/research/stream`)
 
-<strong>智能体层 (HelloAgents)</strong>:三个专门Agent(TODO Planner、Task Summarizer、Report Writer)+ 两个核心工具(SearchTool、NoteTool)
+<strong>智能体层 (HelloAgents)</strong>:三个专门 Agent(TODO Planner、Task Summarizer、Report Writer)+ 两个核心工具(SearchTool、NoteTool)
 
-<strong>外部服务层</strong>:搜索引擎+ LLM提供商
+<strong>外部服务层</strong>:搜索引擎+ LLM 提供商
 
-让我们看看一个完整的研究请求是如何在系统中流转的,如图14.2所示:
+让我们看看一个完整的研究请求是如何在系统中流转的,如图 14.2 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-2.png" alt="" width="85%"/>
@@ -52,15 +56,15 @@
 </div>
 
 1. <strong>用户输入</strong>:用户在前端输入研究主题
-2. <strong>前端发送</strong>:前端通过SSE连接到`/research/stream`
-3. <strong>后端接收</strong>:FastAPI接收请求,创建研究状态
-4. <strong>规划阶段</strong>:调用研究规划Agent,分解为3个子任务
+2. <strong>前端发送</strong>:前端通过 SSE 连接到`/research/stream`
+3. <strong>后端接收</strong>:FastAPI 接收请求,创建研究状态
+4. <strong>规划阶段</strong>:调用研究规划 Agent,分解为 3 个子任务
 5. <strong>执行阶段</strong>:逐个执行每个子任务
-   - 使用SearchTool搜索
-   - 调用任务总结Agent总结
-   - 使用NoteTool记录结果
-6. <strong>报告阶段</strong>:调用报告生成Agent,整合所有总结
-7. <strong>流式返回</strong>:通过SSE推送进度和结果到前端
+   - 使用 SearchTool 搜索
+   - 调用任务总结 Agent 总结
+   - 使用 NoteTool 记录结果
+6. <strong>报告阶段</strong>:调用报告生成 Agent,整合所有总结
+7. <strong>流式返回</strong>:通过 SSE 推送进度和结果到前端
 8. <strong>前端展示</strong>:前端实时更新任务状态、进度条、日志、报告
 
 项目的目录结构如下:
@@ -94,7 +98,7 @@ helloagents-deepresearch/
     └── vite.config.ts         # 构建配置
 ```
 
-### 14.1.3 快速体验:5分钟运行项目
+### 14.1.3 快速体验:5 分钟运行项目
 
 在深入学习实现细节之前,让我们先把项目跑起来,看看最终的效果。这样你会对整个系统有一个直观的认识。
 
@@ -169,14 +173,14 @@ npm run dev
 
 (3)开始研究
 
-打开浏览器访问 `http://localhost:5174`,你会看到一个居中的输入卡片,如图14.3所示。输入研究主题,例如`Datawhale是一个什么样的组织?`,选择搜索引擎(如果配置了多个),点击"开始研究"按钮。
+打开浏览器访问 `http://localhost:5174`,你会看到一个居中的输入卡片,如图 14.3 所示。输入研究主题,例如`Datawhale是一个什么样的组织?`,选择搜索引擎(如果配置了多个),点击"开始研究"按钮。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-3.png" alt="" width="85%"/>
   <p>图 14.3 深度研究助手搜索页面</p>
 </div>
 
-如图14.4所示,系统会自动展开为全屏,左侧显示研究信息,右侧实时显示研究进度和结果。整个研究过程大约需要1-3分钟,取决于主题的复杂度和搜索引擎的响应速度。
+如图 14.4 所示,系统会自动展开为全屏,左侧显示研究信息,右侧实时显示研究进度和结果。整个研究过程大约需要 1-3 分钟,取决于主题的复杂度和搜索引擎的响应速度。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-4.png" alt="" width="85%"/>
@@ -187,19 +191,19 @@ npm run dev
 
 - <strong>任务列表</strong>:显示所有子任务及其状态
 - <strong>进度日志</strong>:显示研究过程中的所有操作
-- <strong>最终报告</strong>:结构化的Markdown报告,包含所有子任务的总结和来源引用
+- <strong>最终报告</strong>:结构化的 Markdown 报告,包含所有子任务的总结和来源引用
 
 现在你已经成功运行了深度研究助手,对系统有了直观的认识。
 
-## 14.2 TODO驱动的研究范式
+## 14.2 TODO 驱动的研究范式
 
-### 14.2.1 什么是TODO驱动的研究
+### 14.2.1 什么是 TODO 驱动的研究
 
-传统的搜索引擎只能回答单个问题,而深度研究需要回答一系列相关的问题。TODO驱动的研究范式将复杂的研究主题分解为多个子任务(TODO),逐个执行并整合结果。
+传统的搜索引擎只能回答单个问题,而深度研究需要回答一系列相关的问题。TODO 驱动的研究范式将复杂的研究主题分解为多个子任务(TODO),逐个执行并整合结果。
 
 这种范式的核心思想是:<strong>将"研究"这个复杂任务转化为"规划→执行→整合"的流程</strong>。
 
-让我们通过一个例子来理解这个转变。假设你想研究"Datawhale是一个什么样的组织?",传统的搜索方式是:
+让我们通过一个例子来理解这个转变。假设你想研究"Datawhale 是一个什么样的组织?",传统的搜索方式是:
 
 ```
 用户输入:Datawhale是一个什么样的组织?
@@ -210,7 +214,7 @@ npm run dev
 
 这种方式的问题在于每个链接只涵盖主题的一个方面、缺少系统性结构,需要手动整理和总结。
 
-<strong>TODO驱动方式:系统化研究</strong>
+<strong>TODO 驱动方式:系统化研究</strong>
 
 ```
 用户输入:Datawhale是一个什么样的组织?
@@ -238,19 +242,19 @@ npm run dev
 
 这种方式的优势在于将复杂主题分解为清晰的子问题,每个子任务的搜索结果和总结都被记录下来,方便追溯。同时,系统化的研究流程避免了遗漏重要信息,可以轻松添加新的子任务或调整执行顺序。
 
-一个完整的TODO驱动研究系统包含三个核心要素:
+一个完整的 TODO 驱动研究系统包含三个核心要素:
 
-<strong>(1)智能规划器(TODO Planner)</strong>:负责将研究主题分解为子任务。一个好的规划器需要理解主题的关键方面和研究目标,将主题分解为3-5个子任务(太少覆盖不全,太多会冗余),并为每个子任务设计合适的搜索查询。
+<strong>(1)智能规划器(TODO Planner)</strong>:负责将研究主题分解为子任务。一个好的规划器需要理解主题的关键方面和研究目标,将主题分解为 3-5 个子任务(太少覆盖不全,太多会冗余),并为每个子任务设计合适的搜索查询。
 
 <strong>(2)任务执行器(Task Executor)</strong>:负责执行每个子任务。执行器需要使用搜索引擎获取相关资料,提取关键信息并去除冗余内容,同时保存所有来源引用以方便验证。
 
 <strong>(3)报告生成器(Report Writer)</strong>:负责整合所有子任务的结果。生成器需要按照逻辑顺序组织内容,合并重复的信息,并为每个观点添加来源引用。
 
-在我们的案例里,TODO驱动的研究流程如图14.5所示:
+在我们的案例里,TODO 驱动的研究流程如图 14.5 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-5.png" alt="" width="85%"/>
-  <p>图 14.5 TODO驱动的研究流程</p>
+  <p>图 14.5 TODO 驱动的研究流程</p>
 </div>
 
 
@@ -258,13 +262,13 @@ npm run dev
 
 ### 14.2.2 三阶段研究流程
 
-TODO驱动的研究流程分为三个阶段:规划(Planning)、执行(Execution)、报告(Reporting)。每个阶段都有专门的Agent负责。
+TODO 驱动的研究流程分为三个阶段:规划(Planning)、执行(Execution)、报告(Reporting)。每个阶段都有专门的 Agent 负责。
 
-<strong>(1)阶段1:规划</strong>
+<strong>(1)阶段 1:规划</strong>
 
-规划阶段的目标是将研究主题分解为3-5个子任务。系统接收研究主题和当前日期作为输入,输出JSON格式的子任务列表。每个子任务包含三个字段:title(任务标题)、intent(研究意图)和query(搜索查询)。
+规划阶段的目标是将研究主题分解为 3-5 个子任务。系统接收研究主题和当前日期作为输入,输出 JSON 格式的子任务列表。每个子任务包含三个字段:title(任务标题)、intent(研究意图)和 query(搜索查询)。
 
-研究规划Agent会根据主题特点采用不同的分解策略,通常从基础概念入手,然后了解技术现状、实际应用和发展趋势,必要时还会进行对比分析。例如,对于"Datawhale是一个什么样的组织?",规划Agent可能生成以下子任务:
+研究规划 Agent 会根据主题特点采用不同的分解策略,通常从基础概念入手,然后了解技术现状、实际应用和发展趋势,必要时还会进行对比分析。例如,对于"Datawhale 是一个什么样的组织?",规划 Agent 可能生成以下子任务:
 
 ```json
 [
@@ -284,9 +288,9 @@ TODO驱动的研究流程分为三个阶段:规划(Planning)、执行(Exec
 
 一个好的规划应该覆盖全面、逻辑清晰、查询精准、条目数量适中。
 
-<strong>(2)阶段2:执行</strong>
+<strong>(2)阶段 2:执行</strong>
 
-执行阶段逐个执行每个子任务,搜索并总结相关资料。系统接收子任务列表和搜索引擎配置作为输入,输出每个子任务的总结(Markdown格式)和来源引用列表。执行流程如下:
+执行阶段逐个执行每个子任务,搜索并总结相关资料。系统接收子任务列表和搜索引擎配置作为输入,输出每个子任务的总结(Markdown 格式)和来源引用列表。执行流程如下:
 
 对于每个子任务,执行器会:
 
@@ -316,7 +320,7 @@ TODO驱动的研究流程分为三个阶段:规划(Planning)、执行(Exec
    }
    ```
 
-3. <strong>调用总结Agent</strong>:总结搜索结果
+3. <strong>调用总结 Agent</strong>:总结搜索结果
 
    ```python
    summary = summarizer_agent.run(
@@ -325,7 +329,7 @@ TODO驱动的研究流程分为三个阶段:规划(Planning)、执行(Exec
    )
    ```
 
-4. <strong>记录总结和来源</strong>:保存到NoteTool
+4. <strong>记录总结和来源</strong>:保存到 NoteTool
 
    ```python
    note_tool.run({
@@ -336,7 +340,7 @@ TODO驱动的研究流程分为三个阶段:规划(Planning)、执行(Exec
    })
    ```
 
-任务总结Agent会从每个搜索结果中提取核心观点,合并相似信息,保留重要的数字、日期、名称等关键数据,并为每个观点添加来源引用。例如,对于"Datawhale的基本信息"的搜索结果,总结Agent可能生成:
+任务总结 Agent 会从每个搜索结果中提取核心观点,合并相似信息,保留重要的数字、日期、名称等关键数据,并为每个观点添加来源引用。例如,对于"Datawhale 的基本信息"的搜索结果,总结 Agent 可能生成:
 
 ```markdown
 ## Datawhale的基本信息
@@ -390,9 +394,9 @@ Datawhale是一个专注于数据科学与AI领域的开源组织,成立于201
 }
 ```
 
-<strong>(3)阶段3:报告</strong>
+<strong>(3)阶段 3:报告</strong>
 
-报告阶段的目标是整合所有子任务的总结,生成最终报告。系统接收所有子任务的总结和研究主题作为输入,输出Markdown格式的最终报告。报告包含标题、概述、各个子任务的详细分析、总结和参考文献五个部分。例如,对于"Datawhale是一个什么样的组织?",最终报告可能是:
+报告阶段的目标是整合所有子任务的总结,生成最终报告。系统接收所有子任务的总结和研究主题作为输入,输出 Markdown 格式的最终报告。报告包含标题、概述、各个子任务的详细分析、总结和参考文献五个部分。例如,对于"Datawhale 是一个什么样的组织?",最终报告可能是:
 
 ```markdown
 # Datawhale是一个什么样的组织?
@@ -424,32 +428,32 @@ Datawhale发布了多个高质量的开源教程,包括Hello-Agents、Joyful-P
 ...
 ```
 
-报告生成Agent会按照子任务的逻辑顺序组织内容,在开头添加简要概述,合并重复的信息,统一Markdown格式,并将所有来源引用整理到参考文献部分。
+报告生成 Agent 会按照子任务的逻辑顺序组织内容,在开头添加简要概述,合并重复的信息,统一 Markdown 格式,并将所有来源引用整理到参考文献部分。
 
 ## 14.3 智能体系统设计
 
-### 14.3.1 Agent职责划分
+### 14.3.1 Agent 职责划分
 
-在深度研究助手中,我们设计了三个专门的Agent,每个Agent负责一个特定的任务。这使得每个Agent都很简单,易于理解和维护。
+在深度研究助手中,我们设计了三个专门的 Agent,每个 Agent 负责一个特定的任务。这使得每个 Agent 都很简单,易于理解和维护。
 
-在第七章中,我们学习了如何使用`SimpleAgent`来构建智能体。`SimpleAgent`的设计理念是简单直接:每次调用`run()`方法时,Agent会分析用户的问题,决定是否需要调用工具,然后返回结果。这种设计在处理简单任务时非常有效,但当面对深度研究这样的复杂任务时,就需要我们继续采用多智能体协作的方案进行。
+在第七章中,我们学习了如何使用`SimpleAgent`来构建智能体。`SimpleAgent`的设计理念是简单直接:每次调用`run()`方法时,Agent 会分析用户的问题,决定是否需要调用工具,然后返回结果。这种设计在处理简单任务时非常有效,但当面对深度研究这样的复杂任务时,就需要我们继续采用多智能体协作的方案进行。
 
-如表14.1所示,三个Agent分别负责规划、总结和报告生成。
+如表 14.1 所示,三个 Agent 分别负责规划、总结和报告生成。
 
 <div align="center">
-  <p>表 14.1 三个Agent的职责划分</p>
+  <p>表 14.1 三个 Agent 的职责划分</p>
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-table-1.png" alt="" width="85%"/>
 </div>
 
-让我们详细介绍每个Agent的设计。
+让我们详细介绍每个 Agent 的设计。
 
 <strong>Agent 1:研究规划专家(TODO Planner)</strong>
 
-<strong>职责</strong>:将研究主题分解为3-5个子任务
+<strong>职责</strong>:将研究主题分解为 3-5 个子任务
 
 <strong>设计理念</strong>:研究规划专家的核心任务是理解用户的研究主题,分析主题的关键方面,然后生成一系列子任务。这个过程类似于人类研究者在开始研究前的"头脑风暴"阶段。
 
-<strong>Prompt设计</strong>:
+<strong>Prompt 设计</strong>:
 
 ```python
 todo_planner_instructions = """
@@ -487,11 +491,11 @@ todo_planner_instructions = """
 """
 ```
 
-<strong>关键设计点</strong>:提示词包含当前日期以获取最新信息,明确要求JSON格式输出便于解析,通过示例帮助Agent理解期望输出,并强调子任务数量、逻辑关系等约束。
+<strong>关键设计点</strong>:提示词包含当前日期以获取最新信息,明确要求 JSON 格式输出便于解析,通过示例帮助 Agent 理解期望输出,并强调子任务数量、逻辑关系等约束。
 
 <strong>实现代码</strong>:
 
-这里的ToolAwareSimpleAgent是根据SimpleAgent拓展实现,可以在14.3.2了解,这里不用深究。
+这里的 ToolAwareSimpleAgent 是根据 SimpleAgent 拓展实现,可以在 14.3.2 了解,这里不用深究。
 
 ```python
 class PlanningService:
@@ -541,7 +545,7 @@ class PlanningService:
 
 <strong>设计理念</strong>:任务总结专家的核心任务是阅读搜索结果,提取关键信息,并以结构化的方式呈现。这个过程类似于人类研究者在阅读文献后做笔记的过程。
 
-<strong>Prompt设计</strong>:
+<strong>Prompt 设计</strong>:
 
 ```python
 task_summarizer_instructions = """
@@ -585,7 +589,7 @@ task_summarizer_instructions = """
 """
 ```
 
-<strong>关键设计点</strong>:提示词包含任务标题、意图、查询等上下文帮助Agent理解任务,明确要求输出包含核心观点、关键数据、来源引用,强调为每个观点添加来源引用,并通过示例帮助Agent理解期望的输出格式。
+<strong>关键设计点</strong>:提示词包含任务标题、意图、查询等上下文帮助 Agent 理解任务,明确要求输出包含核心观点、关键数据、来源引用,强调为每个观点添加来源引用,并通过示例帮助 Agent 理解期望的输出格式。
 
 <strong>实现代码</strong>:
 
@@ -635,7 +639,7 @@ class SummarizationService:
 
 <strong>设计理念</strong>:报告撰写专家的核心任务是将所有子任务的总结整合成一份结构化的报告。这个过程类似于人类研究者在完成所有调研后撰写研究报告的过程。
 
-<strong>Prompt设计</strong>:
+<strong>Prompt 设计</strong>:
 
 ```python
 report_writer_instructions = """
@@ -735,16 +739,16 @@ class ReportingService:
         return "\n".join(formatted)
 ```
 
-### 14.3.2 ToolAwareSimpleAgent的设计
+### 14.3.2 ToolAwareSimpleAgent 的设计
 
-在第七章中,我们实现了`SimpleAgent`,它是HelloAgents框架的基础Agent。但在深度研究助手中,我们需要一个能够<strong>记录工具调用</strong>的Agent。这就是`ToolAwareSimpleAgent`的由来。
+在第七章中,我们实现了`SimpleAgent`,它是 HelloAgents 框架的基础 Agent。但在深度研究助手中,我们需要一个能够<strong>记录工具调用</strong>的 Agent。这就是`ToolAwareSimpleAgent`的由来。
 
-在深度研究助手中,我们需要记录每个Agent的工具调用情况,用于:
+在深度研究助手中,我们需要记录每个 Agent 的工具调用情况,用于:
 
-1. <strong>调试</strong>:查看Agent调用了哪些工具,传入了什么参数
+1. <strong>调试</strong>:查看 Agent 调用了哪些工具,传入了什么参数
 2. <strong>日志</strong>:记录研究过程中的所有操作
-3. <strong>分析</strong>:分析Agent的行为模式
-4. <strong>进度展示</strong>:实时显示Agent正在做什么
+3. <strong>分析</strong>:分析 Agent 的行为模式
+4. <strong>进度展示</strong>:实时显示 Agent 正在做什么
 
 `SimpleAgent`本身不支持工具调用监听,因此我们需要扩展它。
 
@@ -809,7 +813,7 @@ class ToolAwareSimpleAgent(SimpleAgent):
         return result
 ```
 
-在深度研究助手中,我们使用`ToolAwareSimpleAgent`来记录所有Agent的工具调用:
+在深度研究助手中,我们使用`ToolAwareSimpleAgent`来记录所有 Agent 的工具调用:
 
 ```python
 class DeepResearchAgent:
@@ -832,24 +836,24 @@ class DeepResearchAgent:
         self.reporter = ReportingService(self.llm, tool_listener)
 ```
 
-这样,所有Agent的工具调用都会被记录,并通过SSE推送到前端,实时显示给用户。
+这样,所有 Agent 的工具调用都会被记录,并通过 SSE 推送到前端,实时显示给用户。
 
-### 14.3.3 Agent协作模式
+### 14.3.3 Agent 协作模式
 
-三个Agent之间是<strong>顺序协作</strong>的关系,如图14.6所示。
+三个 Agent 之间是<strong>顺序协作</strong>的关系,如图 14.6 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-6.png" alt="" width="85%"/>
-  <p>图 14.6 Agent协作流程</p>
+  <p>图 14.6 Agent 协作流程</p>
 </div>
 
 顺序协作模式的特点是:
 
-1. <strong>线性流程</strong>:Agent按照固定的顺序执行
-2. <strong>明确的输入输出</strong>:每个Agent的输入来自上一个Agent的输出
-3. <strong>无并发</strong>:同一时间只有一个Agent在工作
+1. <strong>线性流程</strong>:Agent 按照固定的顺序执行
+2. <strong>明确的输入输出</strong>:每个 Agent 的输入来自上一个 Agent 的输出
+3. <strong>无并发</strong>:同一时间只有一个 Agent 在工作
 
-`DeepResearchAgent`是整个系统的核心协调器,负责调度三个Agent:
+`DeepResearchAgent`是整个系统的核心协调器,负责调度三个 Agent:
 
 ```python
 class DeepResearchAgent:
@@ -891,11 +895,11 @@ class DeepResearchAgent:
 
 ## 14.4 工具系统集成
 
-### 14.4.1 SearchTool扩展
+### 14.4.1 SearchTool 扩展
 
-在第七章中,我们实现了`SearchTool`的基础版本,集成了Tavily和SerpApi两个搜索引擎,展示了多源搜索的设计思想。在本章的深度研究助手中,我们进一步扩展了`SearchTool`的能力,新增了DuckDuckGo、Perplexity、SearXNG等搜索引擎,并实现了Advanced模式(组合多个搜索引擎)。搜索是深度研究助手最核心的功能,这些扩展使得系统能够适应不同的使用场景和需求。
+在第七章中,我们实现了`SearchTool`的基础版本,集成了 Tavily 和 SerpApi 两个搜索引擎,展示了多源搜索的设计思想。在本章的深度研究助手中,我们进一步扩展了`SearchTool`的能力,新增了 DuckDuckGo、Perplexity、SearXNG 等搜索引擎,并实现了 Advanced 模式(组合多个搜索引擎)。搜索是深度研究助手最核心的功能,这些扩展使得系统能够适应不同的使用场景和需求。
 
-如表14.2所示,这次增加的搜索引擎有不同的特点和适用场景。
+如表 14.2 所示,这次增加的搜索引擎有不同的特点和适用场景。
 
 <div align="center">
   <p>表 14.2 多搜索引擎对比</p>
@@ -931,12 +935,12 @@ SEARCH_API=tavily
 
 - `results`:搜索结果列表,每个结果包含标题、URL、摘要
 - `backend`:使用的搜索引擎
-- `answer`:AI生成的答案(仅Perplexity)
-- `notices`:通知信息(如API限制、错误等)
+- `answer`:AI 生成的答案(仅 Perplexity)
+- `notices`:通知信息(如 API 限制、错误等)
 
 以下是一些特殊情况的处理。
 
-搜索结果可能包含重复的URL,我们需要去重:
+搜索结果可能包含重复的 URL,我们需要去重:
 
 ```python
 def deduplicate_sources(sources: List[dict]) -> List[dict]:
@@ -952,7 +956,7 @@ def deduplicate_sources(sources: List[dict]) -> List[dict]:
     return unique_sources
 ```
 
-搜索结果可能包含大量文本,我们需要限制每个来源的Token数量:
+搜索结果可能包含大量文本,我们需要限制每个来源的 Token 数量:
 
 ```python
 def limit_source_tokens(source: dict, max_tokens: int = 2000) -> dict:
@@ -971,13 +975,13 @@ def limit_source_tokens(source: dict, max_tokens: int = 2000) -> dict:
     }
 ```
 
-### 14.4.2 NoteTool使用
+### 14.4.2 NoteTool 使用
 
 在深度研究助手中,我们使用`NoteTool`来持久化研究进度。`NoteTool`是第九章集成的内置工具,用于创建、读取、更新和删除笔记。
 
 在研究过程中,我们需要记录每个子任务的搜索结果、总结以及最终的研究报告。这些信息需要持久化到磁盘,以便在研究过程中断时能够从上次的进度继续,同时也方便查看研究过程中的所有操作,分析研究的质量和效率。
 
-`NoteTool`将笔记存储在指定的工作空间目录中,每个笔记是一个Markdown文件。笔记的文件名是任务ID,内容包含任务标题、任务意图、搜索查询、搜索结果和总结。
+`NoteTool`将笔记存储在指定的工作空间目录中,每个笔记是一个 Markdown 文件。笔记的文件名是任务 ID,内容包含任务标题、任务意图、搜索查询、搜索结果和总结。
 
 最后生成的文件风格会是下面的树状图风格:
 
@@ -1044,11 +1048,11 @@ class NotesService:
         return content
 ```
 
-### 14.4.3 ToolRegistry工具管理
+### 14.4.3 ToolRegistry 工具管理
 
-`ToolRegistry`是HelloAgents框架的工具注册表,同样也是在我们的第七章所支持,用于管理所有工具的注册和调用。在深度研究助手中,我们使用`ToolRegistry`来管理`SearchTool`和`NoteTool`。
+`ToolRegistry`是 HelloAgents 框架的工具注册表,同样也是在我们的第七章所支持,用于管理所有工具的注册和调用。在深度研究助手中,我们使用`ToolRegistry`来管理`SearchTool`和`NoteTool`。
 
-在创建Agent之前,我们需要先注册工具:
+在创建 Agent 之前,我们需要先注册工具:
 
 ```python
 from hello_agents import ToolAwareSimpleAgent
@@ -1076,7 +1080,7 @@ agent = ToolAwareSimpleAgent(
 )
 ```
 
-当Agent需要调用工具时,它会生成工具调用指令,如图14.7所示。
+当 Agent 需要调用工具时,它会生成工具调用指令,如图 14.7 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-7.png" alt="" width="85%"/>
@@ -1085,28 +1089,28 @@ agent = ToolAwareSimpleAgent(
 
 <strong>工具调用流程</strong>:
 
-1. <strong>Agent生成指令</strong>:Agent生成工具调用指令,如`[TOOL_CALL:search_tool:{"input": "Datawhale组织", "backend": "tavily"}]`
+1. <strong>Agent 生成指令</strong>:Agent 生成工具调用指令,如`[TOOL_CALL:search_tool:{"input": "Datawhale组织", "backend": "tavily"}]`
 2. <strong>解析指令</strong>:`ToolRegistry`解析指令,提取工具名称和参数
 3. <strong>查找工具</strong>:`ToolRegistry`根据工具名称查找对应的工具
 4. <strong>调用工具</strong>:调用工具的`run`方法,传入参数
 5. <strong>返回结果</strong>:工具返回执行结果
-6. <strong>格式化结果</strong>:将结果格式化为字符串,返回给Agent
+6. <strong>格式化结果</strong>:将结果格式化为字符串,返回给 Agent
 
 ## 14.5 服务层实现
 
-本节将详细介绍核心服务的实现,包括PlanningService、SummarizationService、ReportingService和SearchService。这些服务是连接Agent和工具的桥梁,负责具体的业务逻辑。
+本节将详细介绍核心服务的实现,包括 PlanningService、SummarizationService、ReportingService 和 SearchService。这些服务是连接 Agent 和工具的桥梁,负责具体的业务逻辑。
 
 ### 14.5.1 任务规划服务
 
-`PlanningService`负责调用研究规划Agent,将研究主题分解为子任务。这是整个研究流程的第一步,也是最关键的一步。
+`PlanningService`负责调用研究规划 Agent,将研究主题分解为子任务。这是整个研究流程的第一步,也是最关键的一步。
 
 <strong>(1)方案实现</strong>
 
 它的核心职责是:
 
-1. <strong>构建规划Prompt</strong>:根据研究主题和当前日期构建Prompt
-2. <strong>调用规划Agent</strong>:调用TODO Planner Agent生成子任务列表
-3. <strong>解析JSON响应</strong>:从Agent的响应中提取JSON格式的子任务列表
+1. <strong>构建规划 Prompt</strong>:根据研究主题和当前日期构建 Prompt
+2. <strong>调用规划 Agent</strong>:调用 TODO Planner Agent 生成子任务列表
+3. <strong>解析 JSON 响应</strong>:从 Agent 的响应中提取 JSON 格式的子任务列表
 4. <strong>验证子任务格式</strong>:确保每个子任务包含必需的字段(title、intent、query)
 
 ```python
@@ -1205,20 +1209,20 @@ class PlanningService:
             raise ValueError("无法从响应中提取JSON")
 ```
 
-<strong>(2)JSON解析与验证</strong>
+<strong>(2)JSON 解析与验证</strong>
 
-Agent返回的JSON可能包含额外的文本或格式错误,我们需要robust的解析逻辑:
+Agent 返回的 JSON 可能包含额外的文本或格式错误,我们需要 robust 的解析逻辑:
 
 <strong>常见问题</strong>:
 
-1. <strong>包含额外文本</strong>:Agent可能在JSON前后添加说明文字
-2. <strong>格式错误</strong>:JSON可能缺少引号、逗号等
+1. <strong>包含额外文本</strong>:Agent 可能在 JSON 前后添加说明文字
+2. <strong>格式错误</strong>:JSON 可能缺少引号、逗号等
 3. <strong>字段缺失</strong>:某些子任务可能缺少必需字段
 
 <strong>解决方案</strong>:
 
-1. <strong>使用正则表达式</strong>:提取JSON部分
-2. <strong>多种解析策略</strong>:先尝试提取JSON数组,再尝试直接解析
+1. <strong>使用正则表达式</strong>:提取 JSON 部分
+2. <strong>多种解析策略</strong>:先尝试提取 JSON 数组,再尝试直接解析
 3. <strong>字段验证</strong>:确保每个子任务包含必需字段
 
 <strong>示例</strong>:
@@ -1268,7 +1272,7 @@ tasks2 = service._extract_tasks(response2)
 1. <strong>覆盖全面</strong>:涵盖主题的所有重要方面
 2. <strong>逻辑清晰</strong>:子任务之间有明确的逻辑关系
 3. <strong>查询精准</strong>:搜索查询能够准确找到相关资料
-4. <strong>数量适中</strong>:3-5个子任务
+4. <strong>数量适中</strong>:3-5 个子任务
 
 我们可以添加一个评估方法:
 
@@ -1307,13 +1311,13 @@ def evaluate_plan(self, todo_items: List[TodoItem]) -> dict:
 
 ### 14.5.2 总结服务
 
-`SummarizationService`负责调用任务总结Agent,总结搜索结果。这是研究流程的核心环节,决定了研究的质量。
+`SummarizationService`负责调用任务总结 Agent,总结搜索结果。这是研究流程的核心环节,决定了研究的质量。
 
 它的职责是:
 
 1. <strong>格式化搜索结果</strong>:将搜索结果格式化为易读的文本
-2. <strong>构建总结Prompt</strong>:根据任务信息和搜索结果构建Prompt
-3. <strong>调用总结Agent</strong>:调用Task Summarizer Agent生成总结
+2. <strong>构建总结 Prompt</strong>:根据任务信息和搜索结果构建 Prompt
+3. <strong>调用总结 Agent</strong>:调用 Task Summarizer Agent 生成总结
 4. <strong>提取来源引用</strong>:从总结中提取来源引用
 
 核心代码:
@@ -1403,13 +1407,13 @@ class SummarizationService:
 
 ### 14.5.3 报告生成服务
 
-`ReportingService`负责调用报告生成Agent,整合所有子任务的总结。这是研究流程的最后一步,生成最终的研究报告。
+`ReportingService`负责调用报告生成 Agent,整合所有子任务的总结。这是研究流程的最后一步,生成最终的研究报告。
 
 它的职责是:
 
 1. <strong>格式化子任务总结</strong>:将所有子任务的总结格式化为统一的格式
-2. <strong>构建报告Prompt</strong>:根据研究主题和子任务总结构建Prompt
-3. <strong>调用报告Agent</strong>:调用Report Writer Agent生成最终报告
+2. <strong>构建报告 Prompt</strong>:根据研究主题和子任务总结构建 Prompt
+3. <strong>调用报告 Agent</strong>:调用 Report Writer Agent 生成最终报告
 4. <strong>整理引用</strong>:将所有来源引用整理到参考文献部分
 
 <strong>核心代码实现</strong>:
@@ -1499,13 +1503,13 @@ class ReportingService:
 
 ### 14.5.4 搜索调度服务
 
-`SearchService`负责调度搜索引擎,执行搜索并返回结果。这是连接Agent和SearchTool的桥梁。在这里我们没有采用往常一样的使得simpleAgent直接调用工具的形式,而是将SearchTool的执行结果通过中间层来返回给Agent,这样会使得Agent更加专注处理得到的信息。
+`SearchService`负责调度搜索引擎,执行搜索并返回结果。这是连接 Agent  SearchTool 的桥梁。在这里我们没有采用往常一样的使得 simpleAgent 直接调用工具的形式,而是将 SearchTool 的执行结果通过中间层来返回给 Agent,这样会使得 Agent 更加专注处理得到的信息。
 
 它的职责是:
 
 1. <strong>调度搜索引擎</strong>:根据配置选择搜索引擎
-2. <strong>执行搜索</strong>:调用SearchTool执行搜索
-3. <strong>处理结果</strong>:去重、限制Token、格式化
+2. <strong>执行搜索</strong>:调用 SearchTool 执行搜索
+3. <strong>处理结果</strong>:去重、限制 Token、格式化
 4. <strong>错误处理</strong>:处理搜索失败的情况
 
 核心代码:
@@ -1604,7 +1608,7 @@ class SearchService:
         return limited_sources
 ```
 
-根据配置选择搜索引擎,如图14.8所示:
+根据配置选择搜索引擎,如图 14.8 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-8.png" alt="" width="85%"/>
@@ -1614,9 +1618,9 @@ class SearchService:
 **调度逻辑<strong>:
 
 1. </strong>读取配置<strong>:从`.env`文件读取`SEARCH_API`配置
-2. </strong>选择引擎<strong>:根据配置选择搜索引擎(tavily、duckduckgo、perplexity等)
-3. </strong>执行搜索<strong>:调用SearchTool执行搜索
-4. </strong>处理结果<strong>:去重、限制Token、格式化
+2. </strong>选择引擎<strong>:根据配置选择搜索引擎(tavily、duckduckgo、perplexity 等)
+3. </strong>执行搜索<strong>:调用 SearchTool 执行搜索
+4. </strong>处理结果<strong>:去重、限制 Token、格式化
 5. </strong>返回结果<strong>:返回处理后的搜索结果
 
 为了提高效率和降低成本,我们可以添加搜索结果缓存:
@@ -1673,32 +1677,32 @@ class SearchService:
 
 ## 14.6 前端交互设计
 
-在前面的章节中,我们实现了完整的后端系统。本节将详细介绍前端交互设计,包括全屏模态对话框UI、实时进度展示和研究结果可视化。
+在前面的章节中,我们实现了完整的后端系统。本节将详细介绍前端交互设计,包括全屏模态对话框 UI、实时进度展示和研究结果可视化。
 
-### 14.6.1 全屏模态对话框UI设计
+### 14.6.1 全屏模态对话框 UI 设计
 
-深度研究助手采用全屏模态对话框的UI设计,这种设计有以下优势:
+深度研究助手采用全屏模态对话框的 UI 设计,这种设计有以下优势:
 
 1. </strong>沉浸式体验<strong>:全屏显示,避免干扰,专注于研究
 2. </strong>清晰的层次<strong>:主页面和研究页面分离,层次清晰
-3. </strong>易于关闭<strong>:点击关闭按钮或按ESC键即可返回主页面
+3. </strong>易于关闭<strong>:点击关闭按钮或按 ESC 键即可返回主页面
 4. </strong>响应式设计<strong>:适配不同屏幕尺寸
 
-如图14.9所示,全屏模态对话框包含以下部分:
+如图 14.9 所示,全屏模态对话框包含以下部分:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-9.png" alt="" width="85%"/>
-  <p>图 14.9 全屏模态对话框UI</p>
+  <p>图 14.9 全屏模态对话框 UI</p>
 </div>
 
-</strong>UI组件<strong>:
+</strong>UI 组件<strong>:
 
 1. </strong>顶部栏<strong>:包含研究主题和关闭按钮
 2. </strong>进度区域<strong>:显示当前研究进度(规划、执行、报告)
-3. </strong>内容区域<strong>:显示研究结果(Markdown格式)
+3. </strong>内容区域<strong>:显示研究结果(Markdown 格式)
 4. </strong>底部栏**:显示状态信息(如"研究中..."、"已完成")
 
-对应的Vue实现如下所示(ResearchModal.vue):
+对应的 Vue 实现如下所示(ResearchModal.vue):
 
 ```vue
 <template>
@@ -1840,26 +1844,26 @@ watch(() => props.isOpen, (isOpen) => {
 
 ### 14.6.2 实时进度展示
 
-深度研究助手使用SSE实现实时进度展示。SSE是一种服务器推送技术,允许服务器主动向客户端发送数据,在协议章节也有所讲解。
+深度研究助手使用 SSE 实现实时进度展示。SSE 是一种服务器推送技术,允许服务器主动向客户端发送数据,在协议章节也有所讲解。
 
-如图14.10所示,SSE流程包括以下步骤:
+如图 14.10 所示,SSE 流程包括以下步骤:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/14-figures/14-10.png" alt="" width="85%"/>
-  <p>图 14.10 SSE流程</p>
+  <p>图 14.10 SSE 流程</p>
 </div>
 
 <strong>流程说明</strong>:
 
-1. <strong>客户端发起请求</strong>:发送POST请求到`/api/research`,包含研究主题
-2. <strong>服务器建立SSE连接</strong>:返回`text/event-stream`响应
+1. <strong>客户端发起请求</strong>:发送 POST 请求到`/api/research`,包含研究主题
+2. <strong>服务器建立 SSE 连接</strong>:返回`text/event-stream`响应
 3. <strong>服务器推送进度</strong>:定期推送研究进度(规划、执行、报告)
-4. <strong>客户端接收进度</strong>:监听SSE事件,更新UI
+4. <strong>客户端接收进度</strong>:监听 SSE 事件,更新 UI
 5. <strong>研究完成</strong>:服务器推送最终报告,关闭连接
 
-如果想把SSE用于前后端的项目中还需要做如下配置。
+如果想把 SSE 用于前后端的项目中还需要做如下配置。
 
-<strong>后端FastAPI SSE端点</strong>:
+<strong>后端 FastAPI SSE 端点</strong>:
 
 ```python
 from fastapi import FastAPI
@@ -1933,7 +1937,7 @@ async def research(request: ResearchRequest):
     )
 ```
 
-<strong>前端使用EventSource接收SSE</strong>:
+<strong>前端使用 EventSource 接收 SSE</strong>:
 
 ```typescript
 // composables/useResearch.ts
@@ -2034,9 +2038,9 @@ const handleStartResearch = (topic: string) => {
 
 ### 14.6.3 研究结果可视化
 
-研究结果以Markdown格式展示,包含标题、段落、列表、引用等元素。我们使用`marked`库将Markdown转换为HTML,并添加自定义样式。
+研究结果以 Markdown 格式展示,包含标题、段落、列表、引用等元素。我们使用`marked`库将 Markdown 转换为 HTML,并添加自定义样式。
 
-<strong>渲染Markdown</strong>:
+<strong>渲染 Markdown</strong>:
 
 ```typescript
 import { marked } from 'marked'
@@ -2065,30 +2069,30 @@ const renderedHtml = marked(markdownContent.value)
 ......
 ```
 
-通过全屏模态对话框UI、SSE实时进度展示和Markdown结果可视化,我们构建了一个用户友好的前端界面。用户可以清晰地看到研究进度,并以美观的格式查看研究结果。
+通过全屏模态对话框 UI、SSE 实时进度展示和 Markdown 结果可视化,我们构建了一个用户友好的前端界面。用户可以清晰地看到研究进度,并以美观的格式查看研究结果。
 
 ## 14.7 本章小结
 
 在本章中,我们从零开始构建了一个完整的自动化深度研究智能体系统。让我们回顾一下核心要点:
 
-<strong>(1)TODO驱动的研究范式</strong>
+<strong>(1)TODO 驱动的研究范式</strong>
 
-我们提出了一种新的研究范式——TODO驱动的研究。这种范式将复杂的研究主题分解为可执行的子任务,通过三个阶段完成研究:
+我们提出了一种新的研究范式——TODO 驱动的研究。这种范式将复杂的研究主题分解为可执行的子任务,通过三个阶段完成研究:
 
-- <strong>规划阶段</strong>:将研究主题分解为3-5个子任务,每个子任务包含标题、意图和搜索查询
+- <strong>规划阶段</strong>:将研究主题分解为 3-5 个子任务,每个子任务包含标题、意图和搜索查询
 - <strong>执行阶段</strong>:对每个子任务执行搜索和总结,生成结构化的知识
 - <strong>报告阶段</strong>:整合所有子任务的总结,生成最终的研究报告
 
 这种范式的优势在于:
 
 1. <strong>可控性强</strong>:每个子任务都有明确的目标和范围
-2. <strong>质量可靠</strong>:通过专门的Agent保证每个环节的质量
+2. <strong>质量可靠</strong>:通过专门的 Agent 保证每个环节的质量
 3. <strong>易于调试</strong>:可以单独调试每个子任务
 4. <strong>可扩展性好</strong>:可以轻松添加新的子任务或修改现有子任务
 
-<strong>(2)三Agent协作系统</strong>
+<strong>(2)三 Agent 协作系统</strong>
 
-我们设计了三个专门的Agent,各司其职:
+我们设计了三个专门的 Agent,各司其职:
 
 - <strong>TODO Planner(研究规划专家)</strong>:负责将研究主题分解为子任务
 - <strong>Task Summarizer(任务总结专家)</strong>:负责总结每个子任务的搜索结果
@@ -2096,26 +2100,26 @@ const renderedHtml = marked(markdownContent.value)
 
 这种设计的优势在于:
 
-1. <strong>职责清晰</strong>:每个Agent专注于一个特定的任务
-2. <strong>Prompt优化</strong>:可以为每个Agent定制专门的Prompt
-3. <strong>易于维护</strong>:修改一个Agent不会影响其他Agent
-4. <strong>质量保证</strong>:每个Agent都是该领域的"专家"
+1. <strong>职责清晰</strong>:每个 Agent 专注于一个特定的任务
+2. <strong>Prompt 优化</strong>:可以为每个 Agent 定制专门的 Prompt
+3. <strong>易于维护</strong>:修改一个 Agent 不会影响其他 Agent
+4. <strong>质量保证</strong>:每个 Agent 都是该领域的"专家"
 
-<strong>(3)ToolAwareSimpleAgent的设计</strong>
+<strong>(3)ToolAwareSimpleAgent 的设计</strong>
 
-我们扩展了HelloAgents框架的`SimpleAgent`,实现了`ToolAwareSimpleAgent`。这个Agent具有工具调用监听能力,可以:
+我们扩展了 HelloAgents 框架的`SimpleAgent`,实现了`ToolAwareSimpleAgent`。这个 Agent 具有工具调用监听能力,可以:
 
 - <strong>监听工具调用</strong>:通过回调函数监听每次工具调用
 - <strong>实时反馈</strong>:将工具调用信息实时推送给前端
 - <strong>调试支持</strong>:记录所有工具调用,便于调试
 
-这个Agent已经集成到HelloAgents框架中,可以在其他项目中复用。
+这个 Agent 已经集成到 HelloAgents 框架中,可以在其他项目中复用。
 
 <strong>(4)工具系统集成</strong>
 
-我们充分利用了HelloAgents框架的工具系统:
+我们充分利用了 HelloAgents 框架的工具系统:
 
-- <strong>SearchTool</strong>:扩展支持更多种搜索引擎(Tavily、DuckDuckGo、Perplexity等)
+- <strong>SearchTool</strong>:扩展支持更多种搜索引擎(Tavily、DuckDuckGo、Perplexity 等)
 - <strong>NoteTool</strong>:持久化研究进度,支持恢复和审计
 - <strong>ToolRegistry</strong>:统一管理所有工具,支持自定义扩展
 
@@ -2123,11 +2127,11 @@ const renderedHtml = marked(markdownContent.value)
 
 <strong>(5)核心服务实现</strong>
 
-我们实现了四个核心服务,连接Agent和工具:
+我们实现了四个核心服务,连接 Agent 和工具:
 
-- <strong>PlanningService</strong>:调用规划Agent,解析JSON,验证格式
-- <strong>SummarizationService</strong>:调用总结Agent,处理搜索结果,提取来源
-- <strong>ReportingService</strong>:调用报告Agent,整合总结,生成报告
+- <strong>PlanningService</strong>:调用规划 Agent,解析 JSON,验证格式
+- <strong>SummarizationService</strong>:调用总结 Agent,处理搜索结果,提取来源
+- <strong>ReportingService</strong>:调用报告 Agent,整合总结,生成报告
 - <strong>SearchService</strong>:调度搜索引擎,处理结果,错误降级,结果缓存
 
 这些服务各司其职,通过清晰的接口协作,实现了从研究主题到最终报告的自动化流程。
@@ -2137,16 +2141,16 @@ const renderedHtml = marked(markdownContent.value)
 我们设计了用户友好的前端界面:
 
 - <strong>全屏模态对话框</strong>:沉浸式体验,清晰的层次
-- <strong>SSE实时进度</strong>:实时展示研究进度,用户体验良好
-- <strong>Markdown可视化</strong>:美观的格式,清晰的结构
+- <strong>SSE 实时进度</strong>:实时展示研究进度,用户体验良好
+- <strong>Markdown 可视化</strong>:美观的格式,清晰的结构
 
-通过Vue 3 + TypeScript + SSE的技术栈,我们实现了一个现代化的Web应用。
+通过 Vue 3 + TypeScript + SSE 的技术栈,我们实现了一个现代化的 Web 应用。
 
 
 
-这些知识不仅适用于深度研究助手,也可以应用到其他AI应用中。希望读者能够在本章的基础上,探索更多的可能性,构建出更强大的AI系统。
+这些知识不仅适用于深度研究助手,也可以应用到其他 AI 应用中。希望读者能够在本章的基础上,探索更多的可能性,构建出更强大的 AI 系统。
 
-在下一章中,我们将构建一个与游戏引擎结合的多Agent系统——赛博小镇,探索Agent之间的复杂交互和协作模式。敬请期待!
+在下一章中,我们将构建一个与游戏引擎结合的多 Agent 系统——赛博小镇,探索 Agent 之间的复杂交互和协作模式。敬请期待!
 
 
 

+ 1885 - 0
docs/chapter15/Chapter15-Building-Cyber-Town.md

@@ -0,0 +1,1885 @@
+<div align="right">
+  English | <a href="./第十五章%20构建赛博小镇.md">中文</a>
+</div>
+
+# Chapter 15: Building Cyber Town
+
+In this chapter, we will explore a brand new direction: **combining agent technology with game engines to build an AI town full of vitality**.
+
+Do you remember the lifelike NPCs in "The Sims" or "Animal Crossing"? They have their own personalities, memories, and social relationships. Cyber Town is a similar project, but unlike traditional games, its NPCs have real "intelligence": they can understand what the player says, remember past interactions, and react differently depending on their affection level. It includes the following core features:
+
+**(1) Intelligent NPC Dialogue System**: Players can have natural language conversations with NPCs, and NPCs will respond based on their role settings and memories.
+
+**(2) Memory System**: NPCs have short-term and long-term memory, able to remember interaction history with players.
+
+**(3) Affection System**: NPC attitudes towards players change with interactions, from stranger to familiar, from friendly to intimate.
+
+**(4) Gamified Interaction**: Players can move freely in a 2D pixel-style office scene and interact with different NPCs.
+
+**(5) Real-Time Logging System**: All conversations and interactions are recorded for easy debugging and analysis.
+
+## 15.1 Project Overview and Architecture Design
+
+### 15.1.1 Why Build an AI Town
+
+NPCs in traditional games can usually only deliver fixed lines or offer limited interaction through preset dialogue trees. Even in the most complex RPGs, NPC dialogue is written in advance by scriptwriters. This approach is controllable but lacks real "intelligence" and "vitality".
+
+Imagine instead that NPCs could understand anything you say rather than a fixed set of options. You talk to them in natural language. They remember what you said last time, your relationship, and even your preferences. Each NPC has its own profession, personality, and speaking style, and its attitude towards you changes as you interact, from stranger to friend, even to close friend.
+
+This is the new possibility that AI brings to games. By combining large language models with game engines, we can create NPCs that are truly "alive". This is not just a technical demonstration but an exploration of what future games could look like. In educational games, NPCs can play historical figures and scientists and teach students interactively. In virtual offices, NPCs can play colleagues and mentors who offer help and advice. NPCs can also act as companions that talk with users about their feelings, which has applications in mental health. And, of course, the most direct application is adding AI NPCs to traditional games to improve the player experience.
+
+### 15.1.2 Technical Architecture Overview
+
+Cyber Town adopts a **game engine + back-end service** separation architecture, divided into four layers, as shown in Figure 15.1.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-1.png" alt="" width="85%"/>
+  <p>Figure 15.1 Cyber Town Technical Architecture</p>
+</div>
+
+The front-end layer uses the Godot 4.5 game engine and is responsible for game rendering, player control, NPC display, and the dialogue UI. Godot is an open-source 2D/3D game engine that is well suited to rapid development of pixel-style games. The back-end layer uses the FastAPI framework and is responsible for API routing, NPC state management, dialogue processing, and logging. FastAPI is a modern Python web framework that performs well and is easy to develop with. The agent layer uses our own HelloAgents framework and is responsible for NPC intelligence, memory management, and affection calculation; each NPC is a SimpleAgent instance with independent memory and state. The external service layer provides LLM capabilities, vector storage, and data persistence through an LLM API, the Qdrant vector database, and the SQLite relational database.
+
+The data flow process is shown in Figure 15.2:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-2.png" alt="" width="85%"/>
+  <p>Figure 15.2 Data Flow Process</p>
+</div>
+
+The player presses the E key in Godot to interact with an NPC, and Godot sends a dialogue request to the FastAPI back-end over an HTTP API. The back-end calls the HelloAgents SimpleAgent to process the dialogue: the Agent retrieves relevant history from the memory system and then calls the LLM to generate a reply. The back-end updates the NPC's state and affection, writes logs to the console and to a file, and finally returns the reply to the Godot front-end. Godot displays the NPC's reply and updates the UI, completing one full interaction loop.
+
+The project structure is as follows, making it easy for you to locate the source code:
+
+```
+Helloagents-AI-Town/
+├── helloagents-ai-town/           # Godot game project
+│   ├── project.godot              # Godot project configuration
+│   ├── scenes/                    # Game scenes
+│   │   ├── main.tscn              # Main scene (office)
+│   │   ├── player.tscn            # Player character
+│   │   ├── npc.tscn               # NPC character
+│   │   └── dialogue_ui.tscn       # Dialogue UI
+│   ├── scripts/                   # GDScript scripts
+│   │   ├── main.gd                # Main scene logic
+│   │   ├── player.gd              # Player control
+│   │   ├── npc.gd                 # NPC behavior
+│   │   ├── dialogue_ui.gd         # Dialogue UI logic
+│   │   ├── api_client.gd          # API client
+│   │   └── config.gd              # Configuration management
+│   └── assets/                    # Game assets
+│       ├── characters/            # Character sprites
+│       ├── interiors/             # Interior scenes
+│       ├── ui/                    # UI materials
+│       └── audio/                 # Sound effects and music
+│
+└── backend/                       # Python back-end
+    ├── main.py                    # FastAPI main program
+    ├── agents.py                  # NPC Agent system
+    ├── relationship_manager.py    # Affection management
+    ├── state_manager.py           # State management
+    ├── logger.py                  # Logging system
+    ├── config.py                  # Configuration management
+    ├── models.py                  # Data models
+    ├── requirements.txt           # Python dependencies
+    └── .env.example               # Environment variable example
+```
+
+Detailed architecture design and data flow will be introduced in subsequent sections.
+
+### 15.1.3 Quick Experience: Run the Project in 5 Minutes
+
+Before diving into implementation details, let's first run the project to see the final result. This way you'll have an intuitive understanding of the entire system.
+
+**Environment Requirements:**
+
+- Godot 4.2 or higher
+- Python 3.10 or higher
+- LLM API key (OpenAI, DeepSeek, Zhipu, etc.)
+
+**Get the Project:**
+
+The project is located at `code/chapter15/Helloagents-AI-Town`; you can also clone the complete hello-agents repository from GitHub.
+
+**Start the Back-End:**
+
+```bash
+# 1. Enter backend directory
+cd Helloagents-AI-Town/backend
+
+# 2. Install dependencies
+pip install -r requirements.txt
+
+# 3. Configure environment variables
+cp .env.example .env
+# Edit .env file, fill in your API key
+
+# 4. Start back-end service
+python main.py
+```
+
+After successful startup, you will see the following output:
+
+```
+============================================================
+🎮 Cyber Town back-end service starting...
+============================================================
+✅ All services started!
+📡 API address: http://0.0.0.0:8000
+📚 API documentation: http://0.0.0.0:8000/docs
+============================================================
+```
+
+**Start Godot:**
+
+Installing Godot is simple: the official website provides a direct `.exe` file for Windows and a `.dmg` file for Mac ([Windows](https://godotengine.org/download/windows/) / [Mac](https://godotengine.org/download/macos/)).
+
+Open the Godot engine, click the "Import" button, browse to `Helloagents-AI-Town/helloagents-ai-town/project.godot`, and click "Import and Edit". After Godot imports the resources, press `F5` or click the "Run" button to start the game.
+
+**Experience Core Features:**
+
+After the game starts, you will see a pixel-style Datawhale office scene, as shown in Figure 15.3.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-3.png" alt="" width="85%"/>
+  <p>Figure 15.3 Cyber Town Game Scene</p>
+</div>
+
+Use WASD keys to move the player character. When you walk near an NPC, the screen will display a "Press E to interact" prompt. After pressing the E key, a dialogue box will pop up, and you can enter anything you want to say, as shown in Figure 15.4.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-4.png" alt="" width="85%"/>
+  <p>Figure 15.4 Dialogue Interface with NPC</p>
+</div>
+
+NPCs will respond based on their role settings (Python engineer, product manager, UI designer) and your interaction history. As the conversation progresses, the NPC's affection towards you will gradually increase, from "stranger" to "familiar", then to "friendly", "intimate", and even "close friend".
+
+**The affection system is implemented in the back-end**. Each conversation adjusts the affection value based on the player's message content and sentiment analysis. Although the affection value is not directly displayed in the front-end game interface, all affection changes are recorded in detail in the back-end logs. You can view the affection changes for each conversation in the `backend/logs/dialogue_YYYY-MM-DD.log` file. The log file records detailed information for each conversation, including: current affection value, retrieved relevant memories, NPC's reply, affection change amount (+2.0, +3.0, etc.), reason for change (friendly greeting, normal communication, etc.), and sentiment analysis results (positive, neutral, etc.). This design allows developers to clearly track the relationship development between NPCs and players, and also provides a data foundation for adding affection UI to the front-end later.
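
The progression of levels described above could be sketched as a simple threshold lookup. This is only a minimal sketch, not the project's `relationship_manager.py`: the 0-100 score range, the threshold values, and the names `AFFECTION_LEVELS`, `affection_level`, and `apply_change` are all assumptions made for illustration.

```python
# Hypothetical thresholds on an assumed 0-100 affection scale.
AFFECTION_LEVELS = [
    (0, "stranger"),
    (20, "familiar"),
    (40, "friendly"),
    (60, "intimate"),
    (80, "close friend"),
]


def affection_level(score: float) -> str:
    """Return the name of the highest level whose threshold the score reaches."""
    level = AFFECTION_LEVELS[0][1]
    for threshold, name in AFFECTION_LEVELS:
        if score >= threshold:
            level = name
    return level


def apply_change(score: float, delta: float) -> float:
    """Apply an affection change and clamp the result to the assumed 0-100 range."""
    return max(0.0, min(100.0, score + delta))


if __name__ == "__main__":
    score = apply_change(18.0, 3.0)  # e.g. a friendly greeting worth +3.0
    print(score, affection_level(score))
```

Keeping the thresholds in one table means a later affection UI in the front-end could reuse the same level names without duplicating the logic.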
+
+All conversations are recorded in the back-end log files. You can view them in real-time with the following command:
+
+```bash
+# In the backend directory
+python view_logs.py
+```
+
+This simple experience demonstrates the core features of AI Town. Next, we will dive into how to implement these features.
+
+## 15.2 NPC Agent System
+
+### 15.2.1 SimpleAgent Based on HelloAgents
+
+In Cyber Town, each NPC is an independent agent. We use SimpleAgent from the HelloAgents framework to implement NPC intelligence. SimpleAgent is a lightweight agent implementation that encapsulates core functions such as LLM calls, message management, and tool calls.
+
+Recall the SimpleAgent we covered in Chapter 7. Its core is a simple dialogue loop: receive the user's message, call the LLM to generate a reply, and return the result. In Cyber Town, we create a SimpleAgent instance for each NPC and give each one its own system prompt, so that every NPC has a different personality and role.
+
+Let's see how to create an NPC Agent. First, we need to define the NPC's basic information, including ID, name, profession, and personality. Then, we build system prompts based on this information, letting the LLM play the role of this NPC. Finally, we create a SimpleAgent instance and configure the memory system.
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.memory import MemoryManager, WorkingMemory, EpisodicMemory
+
+def create_npc_agent(npc_id: str, name: str, role: str, personality: str):
+    """Create NPC Agent"""
+    # Build system prompt
+    system_prompt = f"""You are {name}, a {role}.
+Your personality traits: {personality}
+
+You work in the Datawhale office, working with colleagues to promote the development of the open source community.
+Please have natural conversations with players based on your role and personality.
+Remember your previous conversations to maintain dialogue coherence.
+"""
+
+    # Create LLM instance
+    llm = HelloAgentsLLM()
+
+    # Create memory manager
+    memory_manager = MemoryManager(
+        working_memory=WorkingMemory(capacity=10, ttl_minutes=120),
+        episodic_memory=EpisodicMemory(
+            db_path=f"memory_data/{npc_id}_episodic.db",
+            collection_name=f"{npc_id}_memories"
+        )
+    )
+
+    # Create Agent
+    agent = SimpleAgent(
+        name=name,
+        llm=llm,
+        system_prompt=system_prompt,
+        memory_manager=memory_manager
+    )
+
+    return agent
+```
+
+This code demonstrates how to create an NPC Agent. The system prompt defines the NPC's identity and personality, and the memory manager lets the NPC remember its conversation history with players. WorkingMemory is short-term memory with a capacity of 10 messages and a retention time of 120 minutes. EpisodicMemory is long-term memory; it is stored in a SQLite database and the Qdrant vector database and can retrieve relevant historical conversations.
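
To make the `capacity` and `ttl_minutes` parameters more concrete, here is a toy short-term memory that enforces both limits. It is a sketch under assumptions, not the HelloAgents `WorkingMemory` implementation: the class name `ToyWorkingMemory` and its methods are hypothetical, and the real class may behave differently.

```python
import time
from collections import deque
from typing import Deque, List, Optional, Tuple


class ToyWorkingMemory:
    """Illustrative short-term memory: bounded size plus time-based expiry."""

    def __init__(self, capacity: int = 10, ttl_minutes: float = 120):
        self.ttl_seconds = ttl_minutes * 60
        # deque(maxlen=...) drops the oldest item once capacity is reached
        self._items: Deque[Tuple[float, str]] = deque(maxlen=capacity)

    def add(self, message: str, now: Optional[float] = None) -> None:
        """Store a message with its timestamp (injectable for testing)."""
        timestamp = time.time() if now is None else now
        self._items.append((timestamp, message))

    def get_recent(self, n: int, now: Optional[float] = None) -> List[str]:
        """Return up to n of the newest messages that have not expired."""
        if n <= 0:
            return []
        current = time.time() if now is None else now
        alive = [msg for ts, msg in self._items if current - ts <= self.ttl_seconds]
        return alive[-n:]
```

The capacity limit bounds prompt size, while the time limit keeps an NPC from treating a conversation from hours ago as if it were still in progress.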
+
+The workflow of NPC Agent is shown in Figure 15.5:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-5.png" alt="" width="85%"/>
+  <p>Figure 15.5 NPC Agent Workflow</p>
+</div>
+
+### 15.2.2 NPC Role Settings and Prompt Design
+
+A good NPC needs distinct personality and role settings. In Cyber Town, we designed three NPCs representing different professions and personalities.
+
+**Zhang San - Python Engineer**
+
+Zhang San is a senior Python engineer responsible for the core development of the HelloAgents framework. He has a rigorous personality, speaks directly, and likes to use technical terms. He has high requirements for code quality and often shares programming tips and best practices.
+
+```python
+npc_zhang = {
+    "npc_id": "zhang_san",
+    "name": "Zhang San",
+    "role": "Python Engineer",
+    "personality": "Rigorous, professional, likes to share technical knowledge. Speaks directly, focuses on code quality."
+}
+```
+
+**Li Si - Product Manager**
+
+Li Si is an experienced product manager responsible for product planning and user experience design of the HelloAgents framework. He has an outgoing personality, is good at communication, and can always think from the user's perspective. He likes to discuss product design and user needs, and often asks "why".
+
+```python
+npc_li = {
+    "npc_id": "li_si",
+    "name": "Li Si",
+    "role": "Product Manager",
+    "personality": "Outgoing, good at communication, focuses on user experience. Likes to think from the user's perspective."
+}
+```
+
+**Wang Wu - UI Designer**
+
+Wang Wu is a creative UI designer responsible for interface design and visual presentation of the HelloAgents framework. He has a gentle personality, unique aesthetics, and keen perception of color and layout. He likes to discuss design concepts and aesthetics, and often shares design inspiration.
+
+```python
+npc_wang = {
+    "npc_id": "wang_wu",
+    "name": "Wang Wu",
+    "role": "UI Designer",
+    "personality": "Gentle, creative, unique aesthetics. Focuses on visual presentation and user experience."
+}
+```
+
+These three NPCs have distinct characteristics. Players can choose to interact with different NPCs based on their interests. Zhang San can teach you programming skills, Li Si can discuss product design with you, and Wang Wu can share design inspiration.
+
+### 15.2.3 Memory System Integration
+
+The memory system is the key to NPC intelligence. An NPC that remembers past conversations feels more realistic and interesting to players. We use HelloAgents' `WorkingMemory` and `EpisodicMemory` to build short-term and long-term memory.
+
+Short-term memory stores recent conversation content; it has limited capacity and is cleaned up automatically over time. Its role is to keep the dialogue coherent by letting NPCs understand context. For example, when a player asks "What color is it?", the NPC has to look in short-term memory to work out what "it" refers to.
+
+Long-term memory stores all conversation history, using vector databases for semantic retrieval. When a player mentions a topic, the NPC can retrieve relevant historical conversations from long-term memory, recalling previously discussed content. For example, when a player says "Do you remember the project we discussed last time?", the NPC can find relevant conversation records from long-term memory.
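
The retrieval step can be illustrated with a toy similarity search. Note the swap: the real system uses embeddings stored in a vector database, whereas this sketch substitutes plain word-count vectors and cosine similarity so that it runs with the standard library alone. The function name `search_memories` and the sample data in the usage note are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_memories(query: str, memories: List[str], top_k: int = 3) -> List[Tuple[float, str]]:
    """Return the top_k memories most similar to the query, best match first."""
    q = _vectorize(query)
    scored = [(_cosine(q, _vectorize(m)), m) for m in memories]
    scored = [item for item in scored if item[0] > 0]  # drop memories with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

For a query such as "Do you remember the project we discussed?", a stored memory that mentions the discussed project ranks above unrelated ones. Real embeddings additionally match paraphrases that share no words, which this word-overlap stand-in cannot do.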
+
+The architecture of the memory system is shown in Figure 15.6:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-6.png" alt="" width="85%"/>
+  <p>Figure 15.6 Memory System Architecture</p>
+</div>
+
+In actual use, the Agent first obtains recent conversations from short-term memory, then retrieves relevant historical conversations from long-term memory, and finally sends both to the LLM to generate more accurate and personalized replies.
+
+```python
+# Agent's dialogue processing flow
+def process_dialogue(agent, player_message):
+    # 1. Get recent conversations from short-term memory
+    recent_messages = agent.memory_manager.working_memory.get_recent_messages(5)
+
+    # 2. Retrieve relevant history from long-term memory
+    relevant_memories = agent.memory_manager.episodic_memory.search(
+        query=player_message,
+        top_k=3
+    )
+
+    # 3. Build context
+    context = {
+        "recent": recent_messages,
+        "relevant": relevant_memories
+    }
+
+    # 4. Call Agent to generate reply
+    reply = agent.run(player_message, context=context)
+
+    # 5. Save to memory system
+    agent.memory_manager.add_interaction(player_message, reply)
+
+    return reply
+```
+
+This process ensures that NPCs can remember interaction history with players and reflect it in conversations.
+
+### 15.2.4 Batch Dialogue Generation: Light Load Mode
+
+In practice, a problem quickly emerged: when multiple players talk to different NPCs at the same time, the back-end has to process multiple LLM requests concurrently. Every request is a separate API call, which not only increases cost but can also cause failed or delayed requests when concurrency limits are hit.
+
+To solve this problem, we designed a **batch dialogue generation system**. The core idea is: merge multiple NPC dialogue requests into one LLM call, letting the LLM generate all NPC replies at once. This is like a restaurant's "pre-made dishes" - prepared in batches in advance, used directly when needed, greatly reducing costs and latency.
+
+The workflow of batch generation is shown in Figure 15.7:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-7.png" alt="" width="85%"/>
+  <p>Figure 15.7 Batch Generation vs Traditional Mode</p>
+</div>
+
+The batch generator builds a special prompt that asks the LLM to generate dialogue for all NPCs at once and return it in JSON format. One API call then yields replies for all three NPCs, so the number of calls drops to one third of the original, which lowers cost and significantly reduces latency.
+
+```python
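+import json
+from typing import Dict, Optional
+
+from hello_agents import HelloAgentsLLM
+
+# NPC_ROLES (the configuration of all NPCs) is assumed to be defined elsewhere in the back-end
+
+class NPCBatchGenerator: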
+    """Generator for batch generating NPC dialogues"""
+
+    def __init__(self):
+        self.llm = HelloAgentsLLM()
+        self.npc_configs = NPC_ROLES  # All NPC configurations
+
+    def generate_batch_dialogues(self, context: Optional[str] = None) -> Dict[str, str]:
+        """Batch generate dialogues for all NPCs
+
+        Args:
+            context: Scene context (such as "morning work time", "lunch time", etc.)
+
+        Returns:
+            Dict[str, str]: Mapping from NPC names to dialogue content
+        """
+        # Build batch generation prompt
+        prompt = self._build_batch_prompt(context)
+
+        # One LLM call generates all dialogues
+        response = self.llm.invoke([
+            {"role": "system", "content": "You are a game NPC dialogue generator, skilled at creating natural and realistic office dialogues."},
+            {"role": "user", "content": prompt}
+        ])
+
+        # Parse JSON response
+        dialogues = json.loads(response)
+        # Return format: {"Zhang San": "...", "Li Si": "...", "Wang Wu": "..."}
+
+        return dialogues
+
+    def _build_batch_prompt(self, context: Optional[str] = None) -> str:
+        """Build batch generation prompt"""
+        # Automatically infer scene based on time
+        if context is None:
+            context = self._get_current_context()
+
+        # Build NPC descriptions
+        npc_descriptions = []
+        for name, cfg in self.npc_configs.items():
+            desc = f"- {name}({cfg['title']}): {cfg['activity']} at {cfg['location']}, personality {cfg['personality']}"
+            npc_descriptions.append(desc)
+
+        npc_desc_text = "\n".join(npc_descriptions)
+
+        prompt = f"""Please generate current dialogues or behavior descriptions for 3 NPCs in the Datawhale office.
+
+【Scene】{context}
+
+【NPC Information】
+{npc_desc_text}
+
+【Generation Requirements】
+1. Generate 1 sentence for each NPC (20-40 characters)
+2. Content should match role settings, current activities, and scene atmosphere
+3. Can be self-talk, work status description, or simple thoughts
+4. Should be natural and realistic, like real office colleagues
+5. **Must strictly return in JSON format**
+
+【Output Format】(strictly follow)
+{{"Zhang San": "...", "Li Si": "...", "Wang Wu": "..."}}
+
+【Example Output】
+{{"Zhang San": "This bug is really annoying, been debugging for two hours...", "Li Si": "Hmm, the priority of this feature needs to be re-evaluated.", "Wang Wu": "The latte art on this coffee is really nice, inspiration is coming!"}}
+
+Please generate (only return JSON, no other content):
+"""
+        return prompt
+```
+
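+The key to this design is the construction of the prompt. We explicitly require the LLM to return JSON and provide an example output. In practice the LLM usually follows this format, so parsing the JSON yields all NPC dialogues at once. Because a model can still wrap the object in extra text or return malformed JSON, production code should handle parse failures rather than assume the format is always respected.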
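
The parsing step can be made more tolerant of stray text around the JSON object. The following is a minimal sketch, not the project's implementation: `parse_batch_dialogues` is a hypothetical helper, and it assumes the reply contains exactly one JSON object keyed by NPC name.

```python
import json
import re
from typing import Dict, Iterable


def parse_batch_dialogues(raw: str, expected_names: Iterable[str]) -> Dict[str, str]:
    """Extract the {name: dialogue} mapping from a raw LLM reply.

    Hypothetical helper for illustration; not part of the original project.
    """
    # Take the span from the first "{" to the last "}", which skips
    # code fences or polite filler that the model may add around the JSON.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in LLM reply")

    # json.JSONDecodeError is a subclass of ValueError, so callers
    # can catch ValueError for both failure modes.
    data = json.loads(match.group(0))

    # Keep only the NPCs we asked for and coerce their values to strings.
    return {name: str(data[name]) for name in expected_names if name in data}
```

A caller could catch `ValueError` and fall back to the previously cached dialogues, so that one bad reply does not blank out every NPC.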
+
+Batch generation has an additional benefit: all NPC dialogues are generated in the same context, so they have a certain degree of correlation. For example, if Zhang San is debugging a bug, Li Si might mention helping to take a look; if Wang Wu is designing an interface, Zhang San might say he'll check the design draft later. This makes the atmosphere of the entire office more realistic and coherent.
+
+Of course, batch generation also has some limitations. It is more suitable for generating NPC "background dialogues" or "self-talk" rather than direct interactions with players. For player-initiated conversations, we still use individual Agents to process them to ensure personalized and accurate replies. Batch generation is mainly used in the following scenarios:
+
+1. **NPC background dialogues**: What NPCs are doing and saying when players enter the scene
+2. **Timed updates**: Update NPC status and dialogues at regular intervals
+3. **Scene atmosphere**: Generate different dialogues based on time (morning, noon, evening)
+4. **Cost reduction**: Use batch generation to reduce API call frequency in high-concurrency scenarios
+
+**Hybrid Mode: Batch Generation + Instant Response**
+
+In the actual implementation, we adopted a hybrid mode that combines batch generation with instant response, balancing efficiency against interaction quality.
+
+Specifically, the system periodically runs batch generation in the background, generating "background dialogues" for all NPCs in the current scene. These dialogues are cached, and when players approach NPCs but haven't initiated interaction yet, NPCs will display these background dialogues, such as "Debugging code...", "Reading product documentation...", etc. This makes NPCs appear "alive" rather than static models.
+
+However, when a player presses the E key to initiate interaction, the system immediately switches to instant response mode. At this point, the back-end calls the NPC's dedicated Agent, generating personalized replies based on the player's specific message, historical memory, and affection level. This process is real-time, ensuring that NPC replies are highly relevant to player input.
+
+```python
+# Hybrid mode implementation in main.py
+@app.post("/dialogue")
+async def dialogue(request: DialogueRequest):
+    """Handle player-NPC dialogue (instant response mode)"""
+    npc_id = request.npc_id
+    player_message = request.player_message
+    player_name = request.player_name
+
+    # Get NPC Agent (each NPC has an independent Agent)
+    agent = npc_agents.get(npc_id)
+    if not agent:
+        raise HTTPException(status_code=404, detail="NPC not found")
+
+    # Instantly generate personalized reply
+    # Here we don't use batch generation, but call Agent's run method
+    reply = agent.run(player_message)
+
+    # Update affection
+    affinity_change = relationship_manager.update_affinity(
+        npc_id, player_name, player_message, reply
+    )
+
+    return {
+        "npc_reply": reply,
+        "affinity_score": affinity_change["score"],
+        "affinity_level": affinity_change["level"]
+    }
+
+# Background task: periodically batch generate background dialogues
+async def background_dialogue_update():
+    """Background task: update NPC background dialogues every 5 minutes"""
+    while True:
+        try:
+            # Use batch generator to generate background dialogues for all NPCs
+            batch_generator = get_batch_generator()
+            dialogues = batch_generator.generate_batch_dialogues()
+
+            # Update to state manager
+            for npc_name, dialogue in dialogues.items():
+                state_manager.update_npc_background_dialogue(npc_name, dialogue)
+
+            print(f"✅ Background dialogue update complete: {len(dialogues)} NPCs")
+        except Exception as e:
+            print(f"❌ Background dialogue update failed: {e}")
+
+        # Wait 5 minutes
+        await asyncio.sleep(300)
+```
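
One detail worth noting in the background task: the batch prompt keys its output by display name ("Zhang San"), while the state manager shown later in this chapter keys NPCs by id ("zhang_san"). The source does not show how `update_npc_background_dialogue` resolves this, so the following is only a sketch of one possible approach. `BackgroundDialogueStore` and its name-to-id mapping are hypothetical.

```python
from typing import Dict, Optional


class BackgroundDialogueStore:
    """Hypothetical in-memory cache of the latest background line per NPC."""

    def __init__(self, name_to_id: Dict[str, str]):
        # Maps display names used in the batch prompt to internal NPC ids.
        self._name_to_id = name_to_id
        self._lines: Dict[str, str] = {}

    def update(self, npc_name: str, dialogue: str) -> bool:
        """Store a line under the NPC's id; return False for unknown names."""
        npc_id = self._name_to_id.get(npc_name)
        if npc_id is None:
            return False
        self._lines[npc_id] = dialogue
        return True

    def get(self, npc_id: str, default: Optional[str] = None) -> Optional[str]:
        """Return the cached line for an NPC id, or the default if none exists."""
        return self._lines.get(npc_id, default)
```

Returning `False` for unknown names makes it easy to log cases where the LLM invents or misspells an NPC name instead of silently dropping the line.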
+
+The advantages of this hybrid mode are very obvious:
+
+1. **Cost reduction**: Background dialogues use batch generation, so a single call produces lines for all NPCs.
+2. **Quality assurance**: Player interactions use instant response, so each reply is personalized.
+3. **Enhanced experience**: NPCs always have background dialogues and appear lively, while player interactions receive accurate replies.
+4. **Flexible adjustment**: The batch generation frequency can be adjusted dynamically based on server load.
+
+By combining batch generation with instant response, we implemented an NPC system that is both efficient and intelligent. Players should not notice any difference, while the number of LLM calls made by the back-end drops considerably. This design approach can also be applied to other scenarios that require a large number of AI calls.
+
+## 15.3 Affection System Design
+
+### 15.3.1 Affection Level Classification
+
+In Cyber Town, NPC attitudes towards players change with interactions. We designed a five-level affection system, from stranger to close friend, in which each level has its own score range and corresponding NPC behavior.
+
+The core idea of the affection system is to quantify the relationship between NPCs and players so that NPC replies feel more realistic and layered. When players first enter the game, all NPCs treat them as strangers, and replies are polite but distant. As conversations progress, if players are friendly, NPC affection gradually increases, and replies become more cordial and detailed.
+
+We divide affection into five levels, each corresponding to a score range, as shown in Figure 15.8:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-8.png" alt="" width="85%"/>
+  <p>Figure 15.8 Affection Level Classification</p>
+</div>
+
+- **Stranger (0-20 points)**: NPC just met the player, attitude is polite but maintains distance. Replies are brief, won't actively share personal information.
+
+- **Familiar (21-40 points)**: NPC starts to remember the player, willing to have simple exchanges. Replies become more natural, occasionally sharing some work-related information.
+
+- **Friendly (41-60 points)**: NPC treats the player as a friend, willing to share more information. Replies are more detailed, will actively ask about the player's situation.
+
+- **Intimate (61-80 points)**: NPC trusts the player very much, willing to share private topics. Replies are full of enthusiasm, will provide help and advice to the player.
+
+- **Close Friend (81-100 points)**: NPC treats the player as the best friend, talks about everything. Replies are very cordial, will share inner thoughts and feelings.
+
+This design allows players to clearly feel the change in their relationship with NPCs, and also provides a foundation for subsequent gameplay. For example, only after reaching a certain affection level will NPCs share certain special information or provide special tasks.
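
Gating content by affection level can be sketched as a simple ordering check. This is an illustrative example only: the level names come from the list above, while `has_reached`, `available_tasks`, and the task names are hypothetical.

```python
from typing import Dict, List

# The five levels from this section, ordered from lowest to highest.
LEVEL_ORDER = ["Stranger", "Familiar", "Friendly", "Intimate", "Close Friend"]


def has_reached(current_level: str, required_level: str) -> bool:
    """Return True if current_level is at or above required_level."""
    return LEVEL_ORDER.index(current_level) >= LEVEL_ORDER.index(required_level)


def available_tasks(current_level: str, tasks: Dict[str, str]) -> List[str]:
    """List task names whose required level the player has reached.

    `tasks` maps a (hypothetical) task name to its required affection level.
    """
    return [name for name, required in tasks.items()
            if has_reached(current_level, required)]
```

Comparing positions in an ordered list keeps the gating rule independent of the exact score thresholds, so the ranges can be retuned without touching task definitions.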
+
+### 15.3.2 Affection Calculation Logic
+
+Affection calculation needs to consider multiple factors. We can't simply add a fixed score for each conversation, which would make the system appear mechanical and unrealistic. A good affection system should be able to identify the player's attitude and dynamically adjust scores based on conversation content.
+
+In Cyber Town, we use an LLM to analyze conversation content and judge whether the player's attitude is friendly, neutral, or unfriendly, then adjust the affection score accordingly. The process is automatic: players don't need to deliberately choose options, which makes interactions more natural.
+
+The affection calculation process is shown in Figure 15.9:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-9.png" alt="" width="85%"/>
+  <p>Figure 15.9 Affection Calculation Process</p>
+</div>
+
+```python
+class RelationshipManager:
+    """Affection manager"""
+
+    def __init__(self):
+        self.affinity_data = {}  # Store affection data
+        self.llm = HelloAgentsLLM()  # For analyzing conversations
+
+    def analyze_sentiment(self, player_message: str, npc_reply: str) -> int:
+        """Analyze conversation sentiment, return affection change value"""
+        prompt = f"""Analyze the player's attitude in the following conversation:
+Player: {player_message}
+NPC: {npc_reply}
+
+Please judge if the player's attitude is:
+1. Friendly (+5 points): Polite, enthusiastic, expressing thanks or agreement
+2. Neutral (+2 points): Normal inquiry or statement
+3. Unfriendly (-3 points): Rude, indifferent, critical or negative
+
+Only return the number, no other content."""
+
+        response = self.llm.think([{"role": "user", "content": prompt}])
+        try:
+            score_change = int(response.strip())
+            return max(-3, min(5, score_change))  # Limit between -3 and 5
+        except (ValueError, AttributeError):  # Reply was not a number (or not a string)
+            return 2  # Default neutral
+
+    def update_affinity(self, npc_id: str, player_name: str,
+                       player_message: str, npc_reply: str) -> dict:
+        """Update affection"""
+        key = f"{npc_id}_{player_name}"
+
+        # Get current affection
+        if key not in self.affinity_data:
+            self.affinity_data[key] = {
+                "score": 0,
+                "level": "Stranger",
+                "interaction_count": 0
+            }
+
+        # Analyze conversation sentiment
+        score_change = self.analyze_sentiment(player_message, npc_reply)
+
+        # Update score
+        current_score = self.affinity_data[key]["score"]
+        new_score = max(0, min(100, current_score + score_change))
+
+        # Update level
+        level = self.get_affinity_level(new_score)
+
+        # Update data
+        self.affinity_data[key].update({
+            "score": new_score,
+            "level": level,
+            "interaction_count": self.affinity_data[key]["interaction_count"] + 1
+        })
+
+        return self.affinity_data[key]
+
+    def get_affinity_level(self, score: int) -> str:
+        """Get affection level based on score"""
+        if score <= 20:
+            return "Stranger"
+        elif score <= 40:
+            return "Familiar"
+        elif score <= 60:
+            return "Friendly"
+        elif score <= 80:
+            return "Intimate"
+        else:
+            return "Close Friend"
+```
+
+This implementation uses an LLM to analyze conversation content, automatically judging the player's attitude and adjusting affection. Note that even a neutral message adds 2 points, so affection rises through ordinary conversation: players don't need to deliberately please NPCs, they just need to communicate normally.
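
Models do not always return a bare number, even when asked to; replies such as "+5 points" are common. The sketch below shows one way to make the score parsing more forgiving. `parse_score_change` is a hypothetical helper, and the clamping range mirrors the -3 to +5 range used above.

```python
import re


def parse_score_change(response: str, default: int = 2) -> int:
    """Pull the first signed integer out of an LLM reply and clamp it.

    Hypothetical helper; falls back to the neutral default when the
    reply contains no number at all.
    """
    match = re.search(r"[+-]?\d+", response)
    if match is None:
        return default
    # Clamp to the same range as the original code: -3 .. +5.
    return max(-3, min(5, int(match.group(0))))
```

This takes the first number it finds, so a reply that echoes the option list (for example "1. Friendly (+5 points)") would be read as 1. Keeping the instruction to return only the number remains important.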
+
+### 15.3.3 Affection Affects Dialogue
+
+Affection is not just a number; it should actually affect NPC behavior. In Cyber Town, we modify each NPC's system prompt so that its reply style follows the current affection level.
+
+When affection is low, NPCs maintain a polite but distant attitude. When affection increases, NPCs become more enthusiastic and talkative. This change is achieved by dynamically adjusting system prompts.
+
+```python
+def create_npc_agent_with_affinity(npc_id: str, name: str, role: str,
+                                   personality: str, affinity_level: str):
+    """Create NPC Agent with affection"""
+
+    # Adjust prompts based on affection level
+    affinity_prompts = {
+        "Stranger": "You just met this player, be polite but not overly enthusiastic. Keep replies brief and professional.",
+        "Familiar": "You already know this player, can have normal exchanges. Replies should be natural and friendly.",
+        "Friendly": "You treat this player as a friend, willing to share more information. Replies should be detailed and enthusiastic.",
+        "Intimate": "You trust this player very much, can share private topics. Replies should be full of care.",
+        "Close Friend": "You treat this player as your best friend, talk about everything. Replies should be cordial and sincere."
+    }
+
+    system_prompt = f"""You are {name}, a {role}.
+Your personality traits: {personality}
+
+Current relationship with player: {affinity_level}
+{affinity_prompts.get(affinity_level, affinity_prompts["Stranger"])}
+
+You work in the Datawhale office, working with colleagues to promote the development of the open source community.
+Please reply naturally based on your role, personality, and relationship with the player.
+"""
+
+    # Create Agent
+    llm = HelloAgentsLLM()
+    agent = SimpleAgent(
+        name=name,
+        llm=llm,
+        system_prompt=system_prompt
+    )
+
+    return agent
+```
+
+This design makes NPC behavior change dynamically with affection. Players can clearly feel that as interactions increase, NPC attitudes towards them are gradually changing, greatly enhancing the game's immersion and fun.
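
Because the system prompt depends on the affection level, the agent has to be rebuilt (or its prompt replaced) whenever the level changes. The source does not show how `agent_manager.get_agent` handles this, so the following is only a sketch under stated assumptions: `LevelAwareAgentCache` is hypothetical, the factory stands in for a function like `create_npc_agent_with_affinity`, and there is a single player per NPC.

```python
from typing import Any, Callable, Dict, Tuple


class LevelAwareAgentCache:
    """Hypothetical cache that rebuilds an NPC agent only when its level changes."""

    def __init__(self, factory: Callable[[str, str], Any]):
        # factory(npc_id, affinity_level) -> agent; injected so this sketch
        # does not depend on any particular agent library.
        self._factory = factory
        self._cache: Dict[str, Tuple[str, Any]] = {}

    def get_agent(self, npc_id: str, affinity_level: str) -> Any:
        cached = self._cache.get(npc_id)
        if cached is not None and cached[0] == affinity_level:
            return cached[1]  # Same level as last time: reuse the agent.
        agent = self._factory(npc_id, affinity_level)
        self._cache[npc_id] = (affinity_level, agent)
        return agent
```

Rebuilding an agent may discard whatever conversation history it held, so a real implementation would need to carry memory across rebuilds. With multiple players, the cache key would also need to include the player name.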
+
+## 15.4 Back-End Service Implementation
+
+### 15.4.1 FastAPI Application Structure
+
+The back-end of Cyber Town is built using the FastAPI framework, responsible for handling requests from the Godot front-end, calling HelloAgents' NPC Agents, managing NPC state and affection, and recording logs. A clear application structure makes code easier to maintain and extend.
+
+Our FastAPI application adopts a modular design, separating different functions into different files, as shown in Figure 15.10:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-10.png" alt="" width="85%"/>
+  <p>Figure 15.10 Back-End Application Structure</p>
+</div>
+
+Let's start with `main.py`, the entry file for the FastAPI application:
+
+```python
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from pydantic import BaseModel, Field
+from typing import Optional
+import uvicorn
+
+from agents import NPCAgentManager
+from relationship_manager import RelationshipManager
+from state_manager import StateManager
+from logger import DialogueLogger
+from config import settings
+
+# Create FastAPI application
+app = FastAPI(
+    title="Cyber Town Back-End Service",
+    description="AI NPC dialogue system based on HelloAgents",
+    version="1.0.0"
+)
+
+# Configure CORS, allow Godot front-end access
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],  # In production, list specific origins (browsers reject a wildcard origin on credentialed requests)
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# Initialize the managers
+agent_manager = NPCAgentManager()
+relationship_manager = RelationshipManager()
+state_manager = StateManager()
+dialogue_logger = DialogueLogger()
+
+@app.on_event("startup")
+async def startup_event():
+    """Initialization on application startup"""
+    print("=" * 60)
+    print("🎮 Cyber Town back-end service starting...")
+    print("=" * 60)
+
+    # Initialize NPC Agents
+    agent_manager.initialize_npcs()
+    print("✅ NPC Agents initialized")
+
+    # Initialize state manager
+    state_manager.initialize_npcs()
+    print("✅ State manager initialized")
+
+@app.get("/")
+async def root():
+    """Health check"""
+    return {
+        "status": "running",
+        "message": "Cyber Town back-end service is running",
+        "version": "1.0.0",
+        "npcs": state_manager.get_npc_count()
+    }
+
+if __name__ == "__main__":
+    uvicorn.run(
+        app,
+        host=settings.HOST,
+        port=settings.PORT,
+        log_level="info"
+    )
+```
+
+This main program file defines the basic structure of the FastAPI application, configures CORS middleware to allow cross-origin requests, creates the managers, and initializes the NPC agents and states on startup. Next we will implement the specific API routes.
+
+### 15.4.2 API Route Design
+
+The back-end of Cyber Town needs to provide several core API endpoints to handle requests from the Godot front-end. We add these routes to `main.py`.
+
+**Get NPC Status**
+
+This API returns the current status of all NPCs, including location, whether busy, etc.:
+
+```python
+from models import NPCStatusResponse
+
+@app.get("/npcs/status", response_model=NPCStatusResponse)
+async def get_npc_status():
+    """Get status of all NPCs"""
+    npcs = state_manager.get_all_npc_states()
+    return {"npcs": npcs}
+
+@app.get("/npcs/{npc_id}/status")
+async def get_single_npc_status(npc_id: str):
+    """Get status of a single NPC"""
+    npc = state_manager.get_npc_state(npc_id)
+    if not npc:
+        raise HTTPException(status_code=404, detail=f"NPC {npc_id} does not exist")
+    return npc
+```
+
+**Dialogue Interface**
+
+This is the core API, handling player-NPC conversations:
+
+```python
+from models import DialogueRequest, DialogueResponse
+
+@app.post("/dialogue", response_model=DialogueResponse)
+async def dialogue(request: DialogueRequest):
+    """Handle player-NPC dialogue"""
+    # 1. Verify NPC exists
+    if not agent_manager.has_npc(request.npc_id):
+        raise HTTPException(status_code=404, detail=f"NPC {request.npc_id} does not exist")
+
+    # 2. Check if NPC is busy
+    if state_manager.is_npc_busy(request.npc_id):
+        raise HTTPException(status_code=409, detail=f"NPC {request.npc_id} is talking with another player")
+
+    # 3. Mark NPC as busy
+    state_manager.set_npc_busy(request.npc_id, True)
+
+    try:
+        # 4. Get current affection
+        affinity_info = relationship_manager.get_affinity(
+            request.npc_id,
+            request.player_name
+        )
+
+        # 5. Call Agent to generate reply
+        agent = agent_manager.get_agent(request.npc_id, affinity_info["level"])
+        reply = agent.run(request.player_message)
+
+        # 6. Update affection
+        new_affinity = relationship_manager.update_affinity(
+            request.npc_id,
+            request.player_name,
+            request.player_message,
+            reply
+        )
+
+        # 7. Record log
+        dialogue_logger.log_dialogue(
+            npc_id=request.npc_id,
+            player_name=request.player_name,
+            player_message=request.player_message,
+            npc_reply=reply,
+            affinity_info=new_affinity
+        )
+
+        # 8. Return reply
+        return DialogueResponse(
+            npc_reply=reply,
+            affinity_level=new_affinity["level"],
+            affinity_score=new_affinity["score"]
+        )
+
+    except Exception as e:
+        dialogue_logger.log_error(f"Dialogue processing failed: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"Dialogue processing failed: {str(e)}")
+
+    finally:
+        # 9. Release NPC status
+        state_manager.set_npc_busy(request.npc_id, False)
+```
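
To try this endpoint without the Godot front-end, a request can be built with the Python standard library. This is a minimal sketch: the field names come from the `DialogueRequest` usage above, while `build_dialogue_request` and the base URL are assumptions for illustration.

```python
import json
import urllib.request


def build_dialogue_request(base_url: str, npc_id: str,
                           player_name: str, player_message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /dialogue endpoint."""
    payload = {
        "npc_id": npc_id,
        "player_name": player_name,
        "player_message": player_message,
    }
    return urllib.request.Request(
        url=f"{base_url}/dialogue",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the server is running, the request can be sent with `urllib.request.urlopen(req)`. A 409 response then means the NPC is currently busy, and a 404 means the NPC id is unknown.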
+
+**Affection Query**
+
+This API allows querying player-NPC affection:
+
+```python
+from models import AffinityInfo
+
+@app.get("/affinity/{npc_id}/{player_name}", response_model=AffinityInfo)
+async def get_affinity(npc_id: str, player_name: str):
+    """Get player-NPC affection"""
+    if not agent_manager.has_npc(npc_id):
+        raise HTTPException(status_code=404, detail=f"NPC {npc_id} does not exist")
+
+    affinity = relationship_manager.get_affinity(npc_id, player_name)
+    return affinity
+```
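
The routes above call `relationship_manager.get_affinity`, which does not appear in the `RelationshipManager` listing in Section 15.3.2. The following is a sketch of what such a lookup might look like, assuming the same `{npc_id}_{player_name}` key format and record layout as `update_affinity`. It is written as a standalone function so it can be run on its own.

```python
from typing import Dict

# Record returned for a player the NPC has never talked to (assumed default,
# matching the initial record created in update_affinity).
DEFAULT_AFFINITY = {"score": 0, "level": "Stranger", "interaction_count": 0}


def get_affinity(affinity_data: Dict[str, dict], npc_id: str, player_name: str) -> dict:
    """Return a copy of the affinity record, or the default for a new pair."""
    key = f"{npc_id}_{player_name}"
    # Copy so that callers cannot mutate the stored record by accident.
    return dict(affinity_data.get(key, DEFAULT_AFFINITY))
```

Returning a default instead of raising keeps the dialogue route simple, since the first conversation with an NPC happens before any record exists.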
+
+The API route call flow is shown in Figure 15.11:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-11.png" alt="" width="85%"/>
+  <p>Figure 15.11 API Call Flow</p>
+</div>
+
+### 15.4.3 State Management and Logging System
+
+**State Manager**
+
+The state manager is responsible for tracking the current state of each NPC, including location, whether busy, current action, etc. This is important for preventing concurrency issues, such as avoiding an NPC talking with multiple players simultaneously.
+
+```python
+# state_manager.py
+from typing import Dict, List, Optional
+from datetime import datetime
+
+class StateManager:
+    """NPC state manager"""
+
+    def __init__(self):
+        self.npc_states: Dict[str, dict] = {}
+
+    def initialize_npcs(self):
+        """Initialize NPC states"""
+        npcs = [
+            {
+                "npc_id": "zhang_san",
+                "name": "Zhang San",
+                "role": "Python Engineer",
+                "position": {"x": 300, "y": 200}
+            },
+            {
+                "npc_id": "li_si",
+                "name": "Li Si",
+                "role": "Product Manager",
+                "position": {"x": 500, "y": 200}
+            },
+            {
+                "npc_id": "wang_wu",
+                "name": "Wang Wu",
+                "role": "UI Designer",
+                "position": {"x": 700, "y": 200}
+            }
+        ]
+
+        for npc in npcs:
+            self.npc_states[npc["npc_id"]] = {
+                **npc,
+                "is_busy": False,
+                "current_action": "idle",
+                "last_interaction": None
+            }
+
+    def get_npc_state(self, npc_id: str) -> Optional[dict]:
+        """Get NPC state"""
+        return self.npc_states.get(npc_id)
+
+    def get_all_npc_states(self) -> List[dict]:
+        """Get all NPC states"""
+        return list(self.npc_states.values())
+
+    def is_npc_busy(self, npc_id: str) -> bool:
+        """Check if NPC is busy"""
+        npc = self.npc_states.get(npc_id)
+        return npc["is_busy"] if npc else False
+
+    def set_npc_busy(self, npc_id: str, busy: bool):
+        """Set NPC busy status"""
+        if npc_id in self.npc_states:
+            self.npc_states[npc_id]["is_busy"] = busy
+            if busy:
+                self.npc_states[npc_id]["last_interaction"] = datetime.now().isoformat()
+
+    def get_npc_count(self) -> int:
+        """Get NPC count"""
+        return len(self.npc_states)
+```
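
In the dialogue route, checking `is_npc_busy` and calling `set_npc_busy` are two separate steps. That is safe when both run on a single event loop with no `await` in between. If handlers ever run on multiple threads, two requests could both pass the check. The following sketch shows an atomic check-and-set using a lock; `BusyFlags` is a hypothetical class, not part of the original state manager.

```python
import threading
from typing import Dict


class BusyFlags:
    """Hypothetical thread-safe busy flags with an atomic try-acquire."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._busy: Dict[str, bool] = {}

    def try_acquire(self, npc_id: str) -> bool:
        """Mark the NPC busy and return True, or return False if already busy."""
        with self._lock:
            if self._busy.get(npc_id, False):
                return False
            self._busy[npc_id] = True
            return True

    def release(self, npc_id: str) -> None:
        """Mark the NPC as free again."""
        with self._lock:
            self._busy[npc_id] = False
```

With this shape, the route would call `try_acquire` once, return a 409 when it yields `False`, and call `release` in the `finally` block.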
+
+**Logging System**
+
+The logging system implements dual output: console and file. This makes it convenient to view in real-time and save historical records.
+
+```python
+# logger.py
+import logging
+from datetime import datetime
+from pathlib import Path
+
+class DialogueLogger:
+    """Dialogue logger"""
+
+    def __init__(self, log_dir: str = "logs"):
+        self.log_dir = Path(log_dir)
+        self.log_dir.mkdir(exist_ok=True)
+
+        # Create log file name (by date)
+        today = datetime.now().strftime("%Y-%m-%d")
+        log_file = self.log_dir / f"dialogue_{today}.log"
+
+        # Configure logging
+        self.logger = logging.getLogger("DialogueLogger")
+        self.logger.setLevel(logging.INFO)
+
+        # Console handler
+        console_handler = logging.StreamHandler()
+        console_handler.setLevel(logging.INFO)
+        console_formatter = logging.Formatter(
+            '%(asctime)s - %(levelname)s - %(message)s',
+            datefmt='%H:%M:%S'
+        )
+        console_handler.setFormatter(console_formatter)
+
+        # File handler
+        file_handler = logging.FileHandler(log_file, encoding='utf-8')
+        file_handler.setLevel(logging.INFO)
+        file_formatter = logging.Formatter(
+            '%(asctime)s - %(levelname)s - %(message)s',
+            datefmt='%Y-%m-%d %H:%M:%S'
+        )
+        file_handler.setFormatter(file_formatter)
+
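+        # Add handlers only once: getLogger returns a shared logger, so a second
+        # DialogueLogger instance would otherwise duplicate every log line
+        if not self.logger.handlers:
+            self.logger.addHandler(console_handler)
+            self.logger.addHandler(file_handler)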
+
+    def log_dialogue(self, npc_id: str, player_name: str,
+                    player_message: str, npc_reply: str,
+                    affinity_info: dict):
+        """Log dialogue"""
+        log_message = f"""
+{'='*60}
+NPC: {npc_id}
+Player: {player_name}
+Player message: {player_message}
+NPC reply: {npc_reply}
+Affection: {affinity_info['level']} ({affinity_info['score']}/100)
+Interaction count: {affinity_info['interaction_count']}
+{'='*60}
+"""
+        self.logger.info(log_message)
+
+    def log_error(self, error_message: str):
+        """Log error"""
+        self.logger.error(error_message)
+```
+
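+This logging system displays dialogue content in real time on the console while saving it to a file. The file name contains the date on which the logger was created, so logs end up in one file per day as long as the service is restarted daily; a process that runs across midnight keeps writing to the file it opened at startup.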
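
For a long-running service, the standard library's `TimedRotatingFileHandler` can split log files at midnight without a restart. This is a minimal sketch of that alternative, not the project's logger: `build_rotating_logger`, the logger name, and the 30-file retention are assumptions.

```python
import logging
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def build_rotating_logger(log_dir: str = "logs",
                          name: str = "DialogueLoggerRotating") -> logging.Logger:
    """Create a logger whose file is rotated at midnight (hypothetical helper)."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    if not logger.handlers:  # Avoid stacking duplicate handlers on repeated calls.
        handler = TimedRotatingFileHandler(
            str(Path(log_dir) / "dialogue.log"),
            when="midnight",   # Roll over once per day.
            backupCount=30,    # Keep about a month of old files (assumed value).
            encoding="utf-8",
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

The active file is always `dialogue.log`; at rollover the handler renames the old file with a date suffix, which keeps per-day files available for later analysis.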
+
+### 15.4.4 Understanding Godot's Scene System
+
+Before starting to build game scenes, we need to first understand Godot's core concepts - Scene and Node. This is the biggest difference between Godot and other game engines, and also one of its most powerful features.
+
+**What is a Node?**
+
+Nodes are the most basic building blocks in Godot. You can think of nodes as Lego bricks, each with a specific function. For example, Sprite2D nodes display images, AudioStreamPlayer nodes play audio, and CharacterBody2D nodes handle character physics movement. Godot provides a large number of node types, each focused on doing one thing well.
+
+Nodes can form parent-child relationships, forming a tree structure. Parent nodes can affect child nodes, for example, moving a parent node will simultaneously move all child nodes, hiding a parent node will simultaneously hide all child nodes. This hierarchical relationship allows us to easily organize and manage complex game objects.
+
+**What is a Scene?**
+
+A scene is a collection of nodes saved in a .tscn file. You can think of a scene as a "prefab". For example, we can create a "player" scene containing all related nodes, such as character sprites, collision bodies, and sound effects. We can then use this scene multiple times in the game, and each use creates an independent instance.
+
+The power of scenes lies in their reusability and modularity. We can instantiate one scene within another scene, forming nested structures. For example, the main scene can contain player scenes, multiple NPC scenes, and UI scenes. Modifying the NPC scene will automatically affect all NPC instances, greatly simplifying development and maintenance.
+
+**A Simple Example**
+
+Let's use a simple example to understand scenes and nodes. Suppose we want to create a "player" scene:
+
+```
+Player (CharacterBody2D)  ← Root node, responsible for physics movement
+├─ AnimatedSprite2D       ← Child node, displays character animation
+├─ CollisionShape2D       ← Child node, defines collision shape
+└─ Camera2D               ← Child node, camera follows player
+```
+
+This scene contains 4 nodes forming a tree structure. CharacterBody2D is the root node, the other three are its child nodes. We can add scripts to each node to control its behavior, or add a script to the root node to coordinate all child nodes.
+
+When we instantiate this Player scene in the main scene, Godot creates a copy of this entire node tree. We can create multiple player instances, each instance is independent with its own position, state, and behavior.
+
+**Advantages of Scene Instantiation**
+
+In Cyber Town, we have three NPCs: Zhang San, Li Si, and Wang Wu. Without using the scene system, we would need to create nodes, set properties, and write scripts for each NPC separately, leading to a lot of repetitive work. Using the scene system, we only need to create a generic NPC scene, then instantiate it three times, setting different names and role information through script parameters.
+
+The benefit of this design is: if we want to add a new feature to all NPCs (such as displaying dialogue bubbles above their heads), we only need to modify the NPC scene, and all instances will automatically get this feature.
+
+## 15.5 Godot Game Scene Construction
+
+**Why Choose Godot as the Game Engine?**
+
+Among many game engines, we chose Godot 4.5 as the front-end engine, mainly based on the following considerations:
+
+(1) **Godot has natural advantages in 2D game development**. Cyber Town is a top-down 2D pixel-style game. Godot's 2D engine is very mature, providing node types specifically designed for 2D games such as TileMap, AnimatedSprite2D, and CharacterBody2D. For a small 2D project like this one, development is often faster than in general-purpose engines such as Unity. Godot's Scene System allows us to encapsulate elements like players, NPCs, and UI into independent scenes, then instantiate them in the main scene. This component-based design suits our needs well.
+
+(2) **Godot is completely open source and free**. Godot uses the MIT license, with no royalty fees or revenue sharing, which is very friendly for teaching projects and open source projects. You can freely modify the engine source code and commercialize games without worrying about licensing issues. In contrast, although Unity is powerful, it announced a runtime fee policy in September 2023 (withdrawn in September 2024), which caused widespread controversy in the developer community.
+
+(3) **Godot has an extremely low learning cost**. Godot uses GDScript as its main scripting language, a dynamically typed language similar to Python with concise and easy-to-understand syntax and a very gentle learning curve. For readers already familiar with Python, learning GDScript has almost no barrier - variable declarations, function definitions, control flow, and other syntax are highly similar to Python. You can even start writing game scripts within a few hours. Godot's node tree structure is also very intuitive, you can visually see the scene's hierarchical relationships in the editor, which is very friendly for beginners.
+
+(4) **Godot integrates very simply with Python back-ends**. Godot has a built-in HTTPRequest node that can easily communicate with FastAPI back-ends via HTTP. We only need to create an API client script encapsulating all API calls to invoke back-end AI capabilities in the game. This front-end and back-end separation architecture allows us to independently develop and test game logic and AI logic, greatly improving development efficiency.
+
+Of course, Godot also has some limitations. For example, Godot's 3D capabilities still lag behind Unreal Engine and Unity. If you want to develop large-scale 3D games, you may need to consider other engines. But for 2D games, indie games, and teaching projects, Godot is an excellent choice.
+
+### 15.5.1 Scene Design and Resource Organization
+
+After understanding Godot's scene system, let's look at Cyber Town's scene design. The entire game consists of four core scenes: Main (main scene), Player (player), NPC (non-player character), and DialogueUI (dialogue interface). Each scene is an independent module that can be edited and tested separately, then combined to form a complete game.
+
+The scene organization is modular. We first create the three basic scenes (Player, NPC, and DialogueUI), then instantiate and combine them in the Main scene. Notably, the three NPCs (Zhang San, Li Si, Wang Wu) are all instances of the same NPC scene, with different role information set through script parameters.
+
+Let's first look at the structure of the four core scenes, as shown in Figure 15.12:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-12.png" alt="" width="85%"/>
+  <p>Figure 15.12 Four Core Scenes of Cyber Town</p>
+</div>
+
+This diagram shows four independent scenes and their internal structures. **Scene 1 (Main)** is the main scene, containing background image (Sprite2D), player instance, NPCs organization node (with three NPC instances below), dialogue interface instance, walls organization node, and background music. Note that Player, NPC_Zhang, NPC_Li, NPC_Wang, and DialogueUI here are scene instances, not ordinary nodes. **Scene 2 (Player)** defines the player character structure, including animation, collision, camera, and two sound effect nodes. **Scene 3 (NPC)** is a generic template - Zhang San, Li Si, and Wang Wu are all instances of this scene, containing collision, animation, interaction area, and two labels. **Scene 4 (DialogueUI)** is a CanvasLayer node containing Panel and various UI elements.
+
+The scene instantiation process can be understood this way: We created the NPC.tscn scene file in the Godot editor, defining the NPC's node structure. Then in the Main scene, we "instantiated" this NPC scene three times, creating three independent copies named NPC_Zhang, NPC_Li, and NPC_Wang respectively. Each copy has its own position and state, but they share the same node structure. If we modify NPC.tscn, such as adding a new sound effect node to the NPC, all three instances will automatically get this sound effect.
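+
+To make the instancing step concrete, here is a minimal sketch that does in code what the editor does when NPC.tscn is instanced three times. The `res://NPC.tscn` path, the script name, and the spawn positions are assumptions for illustration; the project itself places the instances in the editor.
+
+```gdscript
+# spawn_npcs.gd - hypothetical script, attached to a Node2D
+extends Node2D
+
+# Assumed location of the NPC scene file.
+const NPC_SCENE: PackedScene = preload("res://NPC.tscn")
+
+# Illustrative spawn data; the positions are made up.
+var npc_spawns: Array = [
+    {"node": "NPC_Zhang", "npc_name": "Zhang San", "npc_title": "Python Engineer", "pos": Vector2(200, 300)},
+    {"node": "NPC_Li", "npc_name": "Li Si", "npc_title": "Product Manager", "pos": Vector2(500, 300)},
+    {"node": "NPC_Wang", "npc_name": "Wang Wu", "npc_title": "UI Designer", "pos": Vector2(800, 300)},
+]
+
+func _ready() -> void:
+    for spawn in npc_spawns:
+        # Each call creates an independent copy of the NPC node tree.
+        var npc = NPC_SCENE.instantiate()
+        npc.name = spawn["node"]
+        # Set exported properties before add_child(): npc.gd reads npc_name
+        # in _ready(), which runs when the node enters the scene tree.
+        npc.npc_name = spawn["npc_name"]
+        npc.npc_title = spawn["npc_title"]
+        npc.position = spawn["pos"]
+        add_child(npc)
+```
+
+Because all three nodes come from one PackedScene, a change saved to NPC.tscn shows up in every instance the next time the scene is loaded.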
+
+The steps to create these scenes in Godot are as follows:
+
+1. **Create Player scene**: Create new scene, select CharacterBody2D as root node, add AnimatedSprite2D, CollisionShape2D, Camera2D, InteractSound, and RunningSound child nodes, save as Player.tscn.
+
+2. **Create NPC scene**: Create new scene, select CharacterBody2D as root node, add CollisionShape2D, AnimatedSprite2D, InteractionArea (Area2D with CollisionShape2D below), NameLabel, and DialogueLabel child nodes, save as NPC.tscn.
+
+3. **Create DialogueUI scene**: Create new scene, select CanvasLayer as root node, add Panel child node, under Panel add NPCName, NPCTitle, DialogueText (RichTextLabel), PlayerInput (LineEdit), SendButton, and CloseButton, save as DialogueUI.tscn.
+
+4. **Create Main scene**: Create new scene and select Node2D as root node. Add Background (Sprite2D) as the background image, with the whale decoration under it. Then instantiate the Player scene, create an NPCs node and instantiate the NPC scene three times under it, instantiate the DialogueUI scene, create a Walls node to organize the wall collisions, and finally add an AudioStreamPlayer for the background music.
+
+This organization has three advantages: each scene is independent and can be tested on its own; all NPCs are instances of one scene, so a single change to that scene applies to every NPC; and scenes communicate through signals, which keeps coupling low and makes the project easy to maintain and extend.
+
+### 15.5.2 Player Control Implementation
+
+The player character is one of the most important elements in the game. We need to implement WASD movement control, animation switching, collision detection, interaction with NPCs, and sound effects system.
+
+The player scene structure includes: a CharacterBody2D as the root node, responsible for physics movement and collision; an AnimatedSprite2D displaying character animation; a CollisionShape2D defining collision shape; a Camera2D following the player; two AudioStreamPlayers playing interaction sound effects and walking sound effects respectively.
+
+The player control script `player.gd` implements movement, interaction, and sound effect logic:
+
+```gdscript
+extends CharacterBody2D
+
+# Movement speed
+@export var speed: float = 200.0
+
+# Currently interactable NPC
+var nearby_npc: Node = null
+
+# Interaction state (disable movement during interaction)
+var is_interacting: bool = false
+
+# Node references
+@onready var animated_sprite: AnimatedSprite2D = $AnimatedSprite2D
+@onready var camera: Camera2D = $Camera2D
+
+# Sound effect references
+@onready var interact_sound: AudioStreamPlayer = null
+@onready var running_sound: AudioStreamPlayer = null
+
+# Walking sound effect state
+var is_playing_running_sound: bool = false
+
+func _ready():
+    # Add to player group (important! NPCs need this group to identify player)
+    add_to_group("player")
+
+    # Get sound effect nodes (optional, won't error if doesn't exist)
+    interact_sound = get_node_or_null("InteractSound")
+    running_sound = get_node_or_null("RunningSound")
+
+    # Enable camera
+    camera.enabled = true
+
+    # Play default animation
+    if animated_sprite.sprite_frames != null and animated_sprite.sprite_frames.has_animation("idle"):
+        animated_sprite.play("idle")
+
+func _physics_process(_delta: float):
+    # If interacting, disable movement
+    if is_interacting:
+        velocity = Vector2.ZERO
+        move_and_slide()
+        # Play idle animation
+        if animated_sprite.sprite_frames != null and animated_sprite.sprite_frames.has_animation("idle"):
+            animated_sprite.play("idle")
+        # Stop walking sound effect
+        stop_running_sound()
+        return
+
+    # Get input direction
+    var input_direction = Input.get_vector("ui_left", "ui_right", "ui_up", "ui_down")
+
+    # Set velocity
+    velocity = input_direction * speed
+
+    # Move
+    move_and_slide()
+
+    # Update animation and direction
+    update_animation(input_direction)
+
+    # Update walking sound effect
+    update_running_sound(input_direction)
+
+func update_animation(direction: Vector2):
+    """Update character animation (supports 4 directions)"""
+    if animated_sprite.sprite_frames == null:
+        return
+
+    # Play animation based on movement direction
+    if direction.length() > 0:
+        # Moving - determine main direction
+        if abs(direction.x) > abs(direction.y):
+            # Left-right movement
+            if direction.x > 0:
+                # Right
+                if animated_sprite.sprite_frames.has_animation("walk_right"):
+                    animated_sprite.play("walk_right")
+                    animated_sprite.flip_h = false
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+                    animated_sprite.flip_h = false
+            else:
+                # Left
+                if animated_sprite.sprite_frames.has_animation("walk_left"):
+                    animated_sprite.play("walk_left")
+                    animated_sprite.flip_h = false
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+                    animated_sprite.flip_h = true
+        else:
+            # Up-down movement
+            if direction.y > 0:
+                # Down
+                if animated_sprite.sprite_frames.has_animation("walk_down"):
+                    animated_sprite.play("walk_down")
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+            else:
+                # Up
+                if animated_sprite.sprite_frames.has_animation("walk_up"):
+                    animated_sprite.play("walk_up")
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+    else:
+        # Idle
+        if animated_sprite.sprite_frames.has_animation("idle"):
+            animated_sprite.play("idle")
+
+func _input(event: InputEvent):
+    # Press E key to interact with NPC
+    if event is InputEventKey:
+        if event.pressed and not event.echo:
+            if event.keycode == KEY_E or event.keycode == KEY_ENTER:
+                if nearby_npc != null:
+                    interact_with_npc()
+
+func interact_with_npc():
+    """Interact with nearby NPC"""
+    if nearby_npc != null:
+        # Play interaction sound effect
+        if interact_sound:
+            interact_sound.play()
+
+        # Send signal to dialogue system
+        get_tree().call_group("dialogue_system", "start_dialogue", nearby_npc.npc_name)
+
+func set_nearby_npc(npc: Node):
+    """Set nearby NPC"""
+    nearby_npc = npc
+
+func set_interacting(interacting: bool):
+    """Set interaction state"""
+    is_interacting = interacting
+    if interacting:
+        # Stop walking sound effect
+        stop_running_sound()
+
+func update_running_sound(direction: Vector2):
+    """Update walking sound effect"""
+    if running_sound == null:
+        return
+
+    # If moving
+    if direction.length() > 0:
+        # If sound effect not playing yet, start playing
+        if not is_playing_running_sound:
+            running_sound.play()
+            is_playing_running_sound = true
+    else:
+        # If stopped moving, stop sound effect
+        stop_running_sound()
+
+func stop_running_sound():
+    """Stop walking sound effect"""
+    if running_sound and is_playing_running_sound:
+        running_sound.stop()
+        is_playing_running_sound = false
+```
+
+This script implements complete player control. Movement is read from the built-in `ui_left`, `ui_right`, `ui_up`, and `ui_down` input actions, which Godot binds to the arrow keys by default; for WASD control, bind those keys to the same actions in the project's Input Map. The character plays the matching 4-direction animation (walk_up/down/left/right) and falls back to a generic `walk` animation when a directional one is missing. When the player approaches an NPC, the NPC calls `set_nearby_npc()` to register itself as the interactable object, and the player can press E (or Enter) to trigger the interaction. The interaction plays a sound effect and uses `call_group()` to tell the dialogue system to start the conversation. During dialogue, `set_interacting(true)` disables player movement, and movement is restored when the dialogue ends. The walking sound starts when the player moves and stops when the player stops.
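+
+The script polls the `ui_left`, `ui_right`, `ui_up`, and `ui_down` actions, which a fresh Godot project binds to the arrow keys. One way to add WASD without touching `player.gd` is to register extra key events on those actions at startup. The sketch below is one possible approach, not the project's code, and the script name is hypothetical; the same bindings can be added by hand in Project Settings under Input Map.
+
+```gdscript
+# input_setup.gd - hypothetical script, e.g. registered as an AutoLoad
+extends Node
+
+func _ready() -> void:
+    # Map each built-in movement action to an additional physical key.
+    var bindings: Dictionary = {
+        "ui_up": KEY_W,
+        "ui_left": KEY_A,
+        "ui_down": KEY_S,
+        "ui_right": KEY_D,
+    }
+    for action in bindings:
+        var key_event := InputEventKey.new()
+        # physical_keycode follows key position, so it works across keyboard layouts.
+        key_event.physical_keycode = bindings[action]
+        InputMap.action_add_event(action, key_event)
+```
+
+A dedicated set of actions (for example a hypothetical `move_left`) would avoid sharing bindings with UI navigation, at the cost of changing the `Input.get_vector()` call in `player.gd`.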
+
+### 15.5.3 NPC Behavior and Interaction
+
+NPCs need to implement three core functions: randomly patrol and wander in the scene, respond to player interactions, and display dialogue bubbles. We use Area2D to detect whether the player is near the NPC. When the player enters the interaction range, the player is notified, and pressing the E key starts the conversation.
+
+The NPC scene structure includes: a CharacterBody2D as the root node; a CollisionShape2D defining the NPC's collision shape; an AnimatedSprite2D displaying the NPC animation; an InteractionArea (Area2D) detecting when the player enters the interaction range, with a child CollisionShape2D defining that range; a NameLabel displaying the NPC's name; and a DialogueLabel displaying the dialogue bubble.
+
+The NPC script `npc.gd` implements patrol, interaction, and dialogue bubble logic:
+
+```gdscript
+extends CharacterBody2D
+
+# NPC information
+@export var npc_name: String = "Zhang San"
+@export var npc_title: String = "Python Engineer"
+
+# NPC appearance configuration
+@export var sprite_frames: SpriteFrames = null  # Custom sprite frame resource
+
+# NPC movement configuration
+@export var move_speed: float = 50.0  # Movement speed
+@export var wander_enabled: bool = true  # Whether to enable patrol
+@export var wander_range: float = 200.0  # Patrol range
+@export var wander_interval_min: float = 3.0  # Minimum patrol interval (seconds)
+@export var wander_interval_max: float = 8.0  # Maximum patrol interval (seconds)
+
+# Current dialogue content (obtained from back-end)
+var current_dialogue: String = ""
+
+# Node references
+@onready var animated_sprite: AnimatedSprite2D = $AnimatedSprite2D
+@onready var interaction_area: Area2D = $InteractionArea
+@onready var name_label: Label = $NameLabel
+@onready var dialogue_label: Label = $DialogueLabel
+
+# Player reference
+var player: Node = null
+
+# Patrol-related variables
+var wander_target: Vector2 = Vector2.ZERO  # Patrol target position
+var wander_timer: float = 0.0  # Patrol timer
+var is_wandering: bool = false  # Whether currently patrolling
+var is_interacting: bool = false  # Whether currently interacting with player
+var spawn_position: Vector2 = Vector2.ZERO  # Spawn position
+
+func _ready():
+    # Add to npcs group
+    add_to_group("npcs")
+
+    # Set NPC name
+    name_label.text = npc_name
+
+    # Connect interaction area signals
+    interaction_area.body_entered.connect(_on_body_entered)
+    interaction_area.body_exited.connect(_on_body_exited)
+
+    # Initialize dialogue label
+    dialogue_label.text = ""
+    dialogue_label.visible = false
+
+    # Set custom sprite frames (if any)
+    if sprite_frames != null:
+        animated_sprite.sprite_frames = sprite_frames
+
+    # Play default animation
+    if animated_sprite.sprite_frames != null and animated_sprite.sprite_frames.has_animation("idle"):
+        animated_sprite.play("idle")
+
+    # Record spawn position
+    spawn_position = global_position
+
+    # Initialize patrol timer
+    if wander_enabled:
+        wander_timer = randf_range(wander_interval_min, wander_interval_max)
+        choose_new_wander_target()
+
+func _on_body_entered(body: Node2D):
+    """Player enters interaction range"""
+    if body.is_in_group("player"):
+        player = body
+
+        if player.has_method("set_nearby_npc"):
+            player.set_nearby_npc(self)
+
+func _on_body_exited(body: Node2D):
+    """Player leaves interaction range"""
+    if body.is_in_group("player"):
+        if player != null and player.has_method("set_nearby_npc"):
+            player.set_nearby_npc(null)
+        player = null
+
+func update_dialogue(dialogue: String):
+    """Update NPC dialogue content"""
+    current_dialogue = dialogue
+    dialogue_label.text = dialogue
+    dialogue_label.visible = true
+
+    # Hide dialogue after 10 seconds
+    await get_tree().create_timer(10.0).timeout
+    dialogue_label.visible = false
+
+func _physics_process(delta: float):
+    """Physics update - handle movement"""
+    # If interacting with player, stop movement
+    if is_interacting:
+        velocity = Vector2.ZERO
+        move_and_slide()
+        # Play idle animation
+        if animated_sprite.sprite_frames != null and animated_sprite.sprite_frames.has_animation("idle"):
+            animated_sprite.play("idle")
+        return
+
+    # If patrol not enabled, don't move
+    if not wander_enabled:
+        return
+
+    # Update patrol timer
+    wander_timer -= delta
+
+    # If timer ends, choose new target and start moving
+    if wander_timer <= 0:
+        choose_new_wander_target()
+        wander_timer = randf_range(wander_interval_min, wander_interval_max)
+
+    # If patrolling, move to target
+    if is_wandering:
+        # Check if reached target
+        if global_position.distance_to(wander_target) < 10:
+            # Reached target, stop movement
+            is_wandering = false
+            velocity = Vector2.ZERO
+            move_and_slide()
+            # Play idle animation
+            if animated_sprite.sprite_frames != null and animated_sprite.sprite_frames.has_animation("idle"):
+                animated_sprite.play("idle")
+        else:
+            # Continue moving to target
+            var direction = (wander_target - global_position).normalized()
+            velocity = direction * move_speed
+            move_and_slide()
+            # Update animation
+            update_animation(direction)
+    else:
+        # Stop movement
+        velocity = Vector2.ZERO
+        move_and_slide()
+        # Play idle animation
+        if animated_sprite.sprite_frames != null and animated_sprite.sprite_frames.has_animation("idle"):
+            animated_sprite.play("idle")
+
+func choose_new_wander_target():
+    """Choose new patrol target"""
+    # Randomly choose a point near spawn position
+    var offset = Vector2(
+        randf_range(-wander_range, wander_range),
+        randf_range(-wander_range, wander_range)
+    )
+    wander_target = spawn_position + offset
+    is_wandering = true
+
+func update_animation(direction: Vector2):
+    """Update animation"""
+    if animated_sprite.sprite_frames == null:
+        return
+
+    if direction.length() > 0:
+        # Movement animation
+        if abs(direction.x) > abs(direction.y):
+            # Left-right movement
+            if direction.x > 0:
+                if animated_sprite.sprite_frames.has_animation("walk_right"):
+                    animated_sprite.play("walk_right")
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+                    animated_sprite.flip_h = false
+            else:
+                if animated_sprite.sprite_frames.has_animation("walk_left"):
+                    animated_sprite.play("walk_left")
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+                    animated_sprite.flip_h = true
+        else:
+            # Up-down movement
+            if direction.y > 0:
+                if animated_sprite.sprite_frames.has_animation("walk_down"):
+                    animated_sprite.play("walk_down")
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+            else:
+                if animated_sprite.sprite_frames.has_animation("walk_up"):
+                    animated_sprite.play("walk_up")
+                elif animated_sprite.sprite_frames.has_animation("walk"):
+                    animated_sprite.play("walk")
+    else:
+        # Idle animation
+        if animated_sprite.sprite_frames.has_animation("idle"):
+            animated_sprite.play("idle")
+
+func set_interacting(interacting: bool):
+    """Set interaction state"""
+    is_interacting = interacting
+```
+
+This script implements complete NPC behavior. Every `wander_interval_min` to `wander_interval_max` seconds, an NPC picks a new target within `wander_range` of its spawn position on each axis and walks toward it. While moving, it plays the 4-direction animations (walk_up/down/left/right); on reaching the target, it stops and plays the idle animation. When the player enters the InteractionArea, the NPC calls the player's `set_nearby_npc(self)` method to register itself as the interactable object. Once the dialogue starts, the NPC's `set_interacting(true)` method should be called so that it stops moving, and `set_interacting(false)` should be called when the dialogue ends so that it resumes patrolling. Note that the dialogue UI script in Section 15.6.2 only toggles the player's interaction state, so these NPC calls have to be made by whichever component starts and ends the dialogue. The main scene periodically calls `update_dialogue()` to refresh the NPC's dialogue bubble with the autonomous dialogue between NPCs.
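+
+The `set_interacting()` hook on the NPC only takes effect if something calls it when a dialogue starts and ends. A minimal sketch of such a caller follows, assuming it lives in the dialogue system and finds the NPC through the `npcs` group that `npc.gd` joins in `_ready()`. The helper name `set_npc_interacting` is hypothetical.
+
+```gdscript
+extends Node
+
+func set_npc_interacting(npc_name: String, interacting: bool) -> void:
+    """Pause or resume the NPC whose exported npc_name matches."""
+    for npc in get_tree().get_nodes_in_group("npcs"):
+        # get() returns null if the node has no npc_name property.
+        if npc.get("npc_name") == npc_name and npc.has_method("set_interacting"):
+            npc.call("set_interacting", interacting)
+            return
+```
+
+Calling `set_npc_interacting(npc_name, true)` when a dialogue opens, and `set_npc_interacting(npc_name, false)` when it closes, would stop and resume the matching NPC.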
+
+## 15.6 Front-End and Back-End Communication Implementation
+
+### 15.6.1 API Client Encapsulation
+
+The Godot front-end needs to communicate with the FastAPI back-end via HTTP. We create an API client script `api_client.gd`, encapsulating all API calls, and set it as an AutoLoad (auto-load) singleton so other scripts can conveniently use it.
+
+The API client uses Godot's HTTPRequest node to send HTTP requests. HTTPRequest is an asynchronous node that doesn't block the game after sending requests, but notifies request completion through signals. This ensures game fluidity - even with high network latency, there's no stuttering. We use the signal mechanism to notify other scripts of API responses rather than using await, allowing multiple scripts to simultaneously listen for the same API response.
+
+```gdscript
+# api_client.gd
+extends Node
+
+# Signal definitions
+signal chat_response_received(npc_name: String, message: String)
+signal chat_error(error_message: String)
+signal npc_status_received(dialogues: Dictionary)
+signal npc_list_received(npcs: Array)
+
+# HTTP request nodes
+var http_chat: HTTPRequest
+var http_status: HTTPRequest
+var http_npcs: HTTPRequest
+
+func _ready():
+    # Create HTTP request nodes
+    http_chat = HTTPRequest.new()
+    http_status = HTTPRequest.new()
+    http_npcs = HTTPRequest.new()
+
+    add_child(http_chat)
+    add_child(http_status)
+    add_child(http_npcs)
+
+    # Connect signals
+    http_chat.request_completed.connect(_on_chat_request_completed)
+    http_status.request_completed.connect(_on_status_request_completed)
+    http_npcs.request_completed.connect(_on_npcs_request_completed)
+
+# ==================== Chat API ====================
+func send_chat(npc_name: String, message: String) -> void:
+    """Send chat request"""
+    var data = {
+        "npc_name": npc_name,
+        "message": message
+    }
+
+    var json_string = JSON.stringify(data)
+    var headers = ["Content-Type: application/json"]
+
+    var error = http_chat.request(
+        Config.API_CHAT,
+        headers,
+        HTTPClient.METHOD_POST,
+        json_string
+    )
+
+    if error != OK:
+        print("[ERROR] Failed to send chat request: ", error)
+        chat_error.emit("Network request failed")
+
+func _on_chat_request_completed(_result: int, response_code: int, _headers: PackedStringArray, body: PackedByteArray) -> void:
+    """Handle chat response"""
+    if response_code != 200:
+        print("[ERROR] Chat request failed: HTTP ", response_code)
+        chat_error.emit("Server error: " + str(response_code))
+        return
+
+    var json = JSON.new()
+    var parse_result = json.parse(body.get_string_from_utf8())
+
+    if parse_result != OK:
+        print("[ERROR] Failed to parse response")
+        chat_error.emit("Response parsing failed")
+        return
+
+    var response = json.data
+
+    if response.has("success") and response["success"]:
+        var npc_name = response["npc_name"]
+        var msg = response["message"]
+        print("[INFO] Received NPC reply: ", npc_name, " -> ", msg)
+        chat_response_received.emit(npc_name, msg)
+    else:
+        chat_error.emit("Chat failed")
+
+# ==================== NPC Status API ====================
+func get_npc_status() -> void:
+    """Get NPC status"""
+    # Check if request is being processed
+    if http_status.get_http_client_status() != HTTPClient.STATUS_DISCONNECTED:
+        print("[WARN] NPC status request is being processed, skipping this request")
+        return
+
+    var error = http_status.request(Config.API_NPC_STATUS)
+
+    if error != OK:
+        print("[ERROR] Failed to get NPC status: ", error)
+
+func _on_status_request_completed(_result: int, response_code: int, _headers: PackedStringArray, body: PackedByteArray) -> void:
+    """Handle NPC status response"""
+    if response_code != 200:
+        print("[ERROR] NPC status request failed: HTTP ", response_code)
+        return
+
+    var json = JSON.new()
+    var parse_result = json.parse(body.get_string_from_utf8())
+
+    if parse_result != OK:
+        print("[ERROR] Failed to parse NPC status")
+        return
+
+    var response = json.data
+
+    if response.has("dialogues"):
+        var dialogues = response["dialogues"]
+        print("[INFO] Received NPC status update: ", dialogues.size(), " NPCs")
+        npc_status_received.emit(dialogues)
+
+# ==================== NPC List API ====================
+func get_npc_list() -> void:
+    """Get NPC list"""
+    var error = http_npcs.request(Config.API_NPCS)
+
+    if error != OK:
+        print("[ERROR] Failed to get NPC list: ", error)
+
+func _on_npcs_request_completed(_result: int, response_code: int, _headers: PackedStringArray, body: PackedByteArray) -> void:
+    """Handle NPC list response"""
+    if response_code != 200:
+        print("[ERROR] NPC list request failed: HTTP ", response_code)
+        return
+
+    var json = JSON.new()
+    var parse_result = json.parse(body.get_string_from_utf8())
+
+    if parse_result != OK:
+        print("[ERROR] Failed to parse NPC list")
+        return
+
+    var response = json.data
+
+    if response.has("npcs"):
+        var npcs = response["npcs"]
+        print("[INFO] Received NPC list: ", npcs.size(), " NPCs")
+        npc_list_received.emit(npcs)
+```
+
+This API client encapsulates three core functions: send chat request (`send_chat`), get NPC status (`get_npc_status`), and get NPC list (`get_npc_list`). All HTTP requests are asynchronous, notifying response results through signals. We created independent HTTPRequest nodes for each API, allowing multiple requests to be sent simultaneously without interfering with each other. API URLs are obtained from the Config singleton for convenient unified management. The dialogue system listens to the `chat_response_received` signal to receive NPC replies, and the main scene listens to the `npc_status_received` signal to update NPC dialogue bubbles.
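+
+The client reads its URLs from a `Config` singleton that is not listed in this section. For orientation, here is a minimal sketch of what such an AutoLoad script could look like. The constant names come from the scripts in this chapter, and the 30-second interval matches the default mentioned in Section 15.6.3; the host, port, and endpoint paths are placeholders and must be replaced with the routes the back-end actually exposes.
+
+```gdscript
+# config.gd - sketch of an AutoLoad registered under the name "Config"
+extends Node
+
+# Placeholder address; assumes the back-end runs locally.
+const API_BASE_URL: String = "http://127.0.0.1:8000"
+
+# Placeholder paths; check them against the back-end routes.
+const API_CHAT: String = API_BASE_URL + "/chat"
+const API_NPC_STATUS: String = API_BASE_URL + "/npcs/status"
+const API_NPCS: String = API_BASE_URL + "/npcs"
+
+# Seconds between NPC status polls (used by main.gd).
+const NPC_STATUS_UPDATE_INTERVAL: float = 30.0
+```
+
+Keeping the base URL in one constant means that switching between a local and a deployed back-end is a one-line change.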
+
+### 15.6.2 Dialogue UI Implementation
+
+The dialogue UI is the interface for player-NPC interaction. We need to design a simple and beautiful dialogue box containing NPC name, title, dialogue content display, input box, and buttons.
+
+The dialogue UI structure is shown in Figure 15.13:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-13.png" alt="" width="85%"/>
+  <p>Figure 15.13 Dialogue UI Structure</p>
+</div>
+
+The dialogue UI design is simple. DialogueUI is a CanvasLayer node, so it is always drawn on top of the game screen and is never obscured by other game objects. Panel is the dialogue box background, anchored at the bottom of the screen. Six UI elements sit directly under Panel: NPCName displays the NPC's name, NPCTitle displays the title, DialogueText is a RichTextLabel that displays the dialogue content (with rich text formatting), PlayerInput is a LineEdit for player input, and SendButton and CloseButton send the message and close the dialogue box respectively.
+
+The dialogue UI script `dialogue_ui.gd` implements the dialogue interface logic:
+
+```gdscript
+# dialogue_ui.gd
+extends CanvasLayer
+
+# UI node references
+@onready var panel = $Panel
+@onready var npc_name_label = $Panel/NPCName
+@onready var npc_title_label = $Panel/NPCTitle
+@onready var dialogue_text = $Panel/DialogueText
+@onready var input_field = $Panel/PlayerInput
+@onready var send_button = $Panel/SendButton
+@onready var close_button = $Panel/CloseButton
+
+# API client
+var api_client: Node = null
+
+# Current NPC in dialogue
+var current_npc_name: String = ""
+
+func _ready():
+    # Hide dialogue box on initialization
+    visible = false
+
+    # Connect button signals
+    send_button.pressed.connect(_on_send_button_pressed)
+    close_button.pressed.connect(_on_close_button_pressed)
+    input_field.text_submitted.connect(_on_text_submitted)
+
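+    # Join the group that player.gd targets with call_group("dialogue_system", ...)
+    add_to_group("dialogue_system")
+
+    # Get API client and listen for NPC replies
+    api_client = get_node_or_null("/root/APIClient")
+    if api_client:
+        api_client.chat_response_received.connect(on_chat_response_received)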
+
+func start_dialogue(npc_name: String):
+    """Start dialogue with NPC"""
+    current_npc_name = npc_name
+
+    # Set NPC information
+    npc_name_label.text = npc_name
+    npc_title_label.text = get_npc_title(npc_name)
+
+    # Clear dialogue content
+    dialogue_text.clear()
+    dialogue_text.append_text("[color=gray]Conversation with " + npc_name + " started...[/color]\n")
+
+    # Clear input field
+    input_field.text = ""
+
+    # Show dialogue box
+    show_dialogue()
+
+    # Focus input field
+    input_field.grab_focus()
+
+func show_dialogue():
+    """Show dialogue box"""
+    visible = true
+
+    # Notify player to enter interaction state (disable movement)
+    var player = get_tree().get_first_node_in_group("player")
+    if player and player.has_method("set_interacting"):
+        player.set_interacting(true)
+
+func hide_dialogue():
+    """Hide dialogue box"""
+    visible = false
+    current_npc_name = ""
+
+    # Notify player to exit interaction state (enable movement)
+    var player = get_tree().get_first_node_in_group("player")
+    if player and player.has_method("set_interacting"):
+        player.set_interacting(false)
+
+func _on_send_button_pressed():
+    """Send button clicked"""
+    send_message()
+
+func _on_close_button_pressed():
+    """Close button clicked"""
+    hide_dialogue()
+
+func _on_text_submitted(_text: String):
+    """Input field enter pressed"""
+    send_message()
+
+func send_message():
+    """Send message"""
+    var message = input_field.text.strip_edges()
+
+    if message.is_empty():
+        return
+
+    if current_npc_name.is_empty():
+        return
+
+    # Display player message
+    dialogue_text.append_text("\n[color=cyan]Player:[/color] " + message + "\n")
+
+    # Clear input field
+    input_field.text = ""
+
+    # Disable input
+    input_field.editable = false
+    send_button.disabled = true
+
+    # Send API request
+    if api_client:
+        api_client.send_chat(current_npc_name, message)
+
+func on_chat_response_received(npc_name: String, response: String):
+    """Received NPC reply"""
+    if npc_name == current_npc_name:
+        # Display NPC reply
+        dialogue_text.append_text("[color=yellow]" + npc_name + ":[/color] " + response + "\n")
+
+        # Enable input
+        input_field.editable = true
+        send_button.disabled = false
+        input_field.grab_focus()
+
+func get_npc_title(npc_name: String) -> String:
+    """Get NPC title"""
+    var titles = {
+        "Zhang San": "Python Engineer",
+        "Li Si": "Product Manager",
+        "Wang Wu": "UI Designer"
+    }
+    return titles.get(npc_name, "")
+```
+
+This dialogue UI implements complete dialogue functionality. Players can type and send messages, and the UI displays the conversation with RichTextLabel's `append_text` method, which supports rich text formatting (colors, bold, etc.). API calls are asynchronous, so the input box and send button are disabled while a reply is pending to prevent duplicate sends. Showing the dialogue box puts the player into the interaction state and disables movement; closing it restores movement.
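+
+One gap is worth noting: `send_message()` disables the input while waiting, and the input is only re-enabled when a reply arrives. If the request fails, `api_client.gd` emits `chat_error` instead, and the input would stay locked. A minimal sketch of an error handler follows. It assumes the same node paths as `dialogue_ui.gd`; the handler name and the red error line are illustrative choices.
+
+```gdscript
+extends CanvasLayer
+
+@onready var dialogue_text: RichTextLabel = $Panel/DialogueText
+@onready var input_field: LineEdit = $Panel/PlayerInput
+@onready var send_button: Button = $Panel/SendButton
+
+func _ready() -> void:
+    var api_client = get_node_or_null("/root/APIClient")
+    if api_client:
+        api_client.chat_error.connect(_on_chat_error)
+
+func _on_chat_error(error_message: String) -> void:
+    """Show the error and unlock the input so the player can retry."""
+    dialogue_text.append_text("[color=red]" + error_message + "[/color]\n")
+    input_field.editable = true
+    send_button.disabled = false
+    input_field.grab_focus()
+```
+
+In the real script, the connection and the handler would be merged into the existing `_ready()` and placed next to `on_chat_response_received()`.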
+
+### 15.6.3 Main Scene Integration
+
+Finally, we need to integrate all functions in the main scene: player control, NPC interaction, dialogue UI, and NPC status updates. The main scene script `main.gd` coordinates these components and periodically obtains NPC status from the back-end to update NPC dialogue bubbles.
+
+```gdscript
+# main.gd
+extends Node2D
+
+# NPC node references
+@onready var npc_zhang: Node2D = $NPCs/NPC_Zhang
+@onready var npc_li: Node2D = $NPCs/NPC_Li
+@onready var npc_wang: Node2D = $NPCs/NPC_Wang
+
+# API client
+var api_client: Node = null
+
+# NPC status update timer
+var status_update_timer: float = 0.0
+
+func _ready():
+    print("[INFO] Main scene initialization")
+
+    # Get API client
+    api_client = get_node_or_null("/root/APIClient")
+    if api_client:
+        api_client.npc_status_received.connect(_on_npc_status_received)
+
+        # Immediately get NPC status once
+        api_client.get_npc_status()
+    else:
+        print("[ERROR] API client not found")
+
+func _process(delta: float):
+    # Periodically update NPC status
+    status_update_timer += delta
+    if status_update_timer >= Config.NPC_STATUS_UPDATE_INTERVAL:
+        status_update_timer = 0.0
+        if api_client:
+            api_client.get_npc_status()
+
+func _on_npc_status_received(dialogues: Dictionary):
+    """Received NPC status update"""
+    print("[INFO] Update NPC status: ", dialogues)
+
+    # Update each NPC's dialogue
+    for npc_name in dialogues:
+        var dialogue = dialogues[npc_name]
+        update_npc_dialogue(npc_name, dialogue)
+
+func update_npc_dialogue(npc_name: String, dialogue: String):
+    """Update specified NPC's dialogue"""
+    var npc_node = get_npc_node(npc_name)
+    if npc_node and npc_node.has_method("update_dialogue"):
+        npc_node.update_dialogue(dialogue)
+
+func get_npc_node(npc_name: String) -> Node2D:
+    """Get NPC node by name"""
+    match npc_name:
+        "Zhang San":
+            return npc_zhang
+        "Li Si":
+            return npc_li
+        "Wang Wu":
+            return npc_wang
+        _:
+            return null
+```
+
+The core function of the main scene script is to periodically obtain NPC status from the back-end. In `_ready()`, we get a reference to the APIClient singleton and connect the `npc_status_received` signal. Then we immediately call `get_npc_status()` to get NPC status once. In `_process()`, we use a timer to call `get_npc_status()` every `Config.NPC_STATUS_UPDATE_INTERVAL` seconds (default 30 seconds). When NPC status updates are received, the `_on_npc_status_received()` callback function traverses all NPCs and calls their `update_dialogue()` method to update dialogue bubbles. This way, even if the player doesn't interact with NPCs, they can still see autonomous dialogue between NPCs.
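+
+The manual accumulator in `_process()` can also be expressed with Godot's built-in Timer node. The following is an alternative sketch, not the project's code; it hard-codes the 30-second interval instead of reading it from `Config` so that it stands alone.
+
+```gdscript
+extends Node2D
+
+func _ready() -> void:
+    # A repeating timer replaces the status_update_timer bookkeeping.
+    var timer := Timer.new()
+    timer.wait_time = 30.0
+    timer.one_shot = false
+    add_child(timer)
+    timer.timeout.connect(_on_status_timer_timeout)
+    timer.start()
+
+func _on_status_timer_timeout() -> void:
+    # Same call that main.gd makes from _process().
+    var api_client = get_node_or_null("/root/APIClient")
+    if api_client:
+        api_client.get_npc_status()
+```
+
+Either approach works; the Timer version keeps `_process()` free for per-frame logic and lets the polling be paused with `timer.stop()`.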
+
+The complete front-end and back-end communication process is shown in Figure 15.14:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-14.png" alt="" width="85%"/>
+  <p>Figure 15.14 Complete Front-End and Back-End Communication Process</p>
+</div>
+
+At this point, all front-end and back-end communication functions have been implemented. Players can move freely in the game, interact with NPCs, and have natural language conversations. Meanwhile, the main scene periodically obtains NPC status from the back-end, updates NPC dialogue bubbles, and displays autonomous dialogue between NPCs. The entire system uses a signal mechanism for communication, with loose coupling between components, making it easy to maintain and extend.
+
+## 15.7 Summary and Outlook
+
+### 15.7.1 Chapter Review
+
+In this chapter, we built a complete AI town project called Cyber Town. It combines the HelloAgents framework with the Godot game engine to create a lively virtual world. Let's review the core content we covered.
+
+**Technical Architecture Design**
+
+We adopted a decoupled game engine + back-end service architecture that places front-end rendering, back-end logic, and AI capabilities in separate layers. Godot handles game graphics and player interaction, FastAPI handles API services and state management, and HelloAgents handles NPC intelligence and memory. This layering lets each part be developed and tested independently and provides a solid foundation for future expansion.
+
+**NPC Agent System**
+
+We used HelloAgents' SimpleAgent to create independent agents for each NPC. Each NPC has its own role setting, personality traits, and memory system. Through carefully designed system prompts, we made Zhang San a rigorous Python engineer, Li Si a product manager good at communication, and Wang Wu a creative UI designer. These NPCs can not only understand player dialogue but also respond according to their role characteristics.
+
+**Memory and Affection System**
+
+We implemented a two-layer memory system: short-term memory maintains dialogue coherence, and long-term memory stores all interaction history. Through semantic retrieval in vector databases, NPCs can recall previously discussed topics. The affection system allows NPCs' attitudes toward players to change with interaction, from stranger to close friend, with different behavioral expressions at each level. These designs make NPCs appear more realistic and interesting.
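
As a rough illustration of the level mapping, here is a minimal Python sketch. The score ranges follow the five tiers given in section 15.3.1 (0-20, 21-40, 41-60, 61-80, 81-100); the English level names, the `affinity_level` function, and the clamping behavior are assumptions made for this example and may differ from the project's implementation.

```python
# Upper bound of each tier paired with an assumed English label.
LEVELS = [
    (20, "stranger"),
    (40, "familiar"),
    (60, "friendly"),
    (80, "close"),
    (100, "best friend"),
]


def affinity_level(score: float) -> str:
    """Map an affection score to a level name.

    Scores are clamped to [0, 100]. A fractional score between tiers
    (for example 20.5) falls into the higher tier; that is an assumption.
    """
    score = max(0.0, min(100.0, score))
    for upper, name in LEVELS:
        if score <= upper:
            return name
    return LEVELS[-1][1]
```

Keeping the thresholds in one table makes it straightforward to retune the tiers without touching the lookup logic.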
+
+**Game Scene Construction**
+
+We used Godot to create a pixel-style office scene, implementing player control, NPC wandering, interaction detection, and dialogue UI. Through the modular design of the scene system, we can easily add new NPCs, new scenes, and new functions. GDScript's concise syntax makes game logic implementation intuitive and efficient.
+
+**Front-End and Back-End Communication**
+
+We used an HTTP REST API for communication between the Godot front-end and the FastAPI back-end. Requests are asynchronous and their results arrive through signals, so the game loop keeps running while a request is in flight: movement and rendering stay smooth even when network latency is high, although the NPC's reply itself still takes time to arrive. The API client encapsulation lets other scripts call back-end services conveniently, and the dialogue UI lets players communicate naturally with NPCs.
+
+The project's technology stack is shown in Figure 15.15:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-15.png" alt="" width="85%"/>
+  <p>Figure 15.15 Cyber Town Technology Stack</p>
+</div>
+
+### 15.7.2 Extension Directions
+
+Cyber Town is just a starting point - there are many directions for extension. These extensions can not only enhance game fun but also explore more possibilities for AI technology in games.
+
+**(1) Multiplayer Online Support**
+
+Currently, Cyber Town is a single-player game, but we can extend it to a multiplayer online game. Multiple players can simultaneously enter the same office and interact with NPCs and other players. This requires introducing WebSocket for real-time communication and databases to persist player data and NPC states. NPCs can remember interactions with different players and maintain independent affection levels for each player.
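
One way to picture independent affection levels per player is a store keyed by the (NPC, player) pair. The sketch below is a hypothetical in-memory Python version: `AffinityStore`, its method names, the starting score of 0, and the 0-100 clamp are all assumptions for illustration, and a real multiplayer deployment would persist this in a database as noted above.

```python
from collections import defaultdict


class AffinityStore:
    """In-memory affection scores, one per (npc_id, player_id) pair."""

    def __init__(self, initial: float = 0.0):
        # Every unseen pair starts at the same assumed initial score.
        self._scores = defaultdict(lambda: initial)

    def adjust(self, npc_id: str, player_id: str, delta: float) -> float:
        """Apply a change and clamp the result to the 0-100 range."""
        key = (npc_id, player_id)
        self._scores[key] = max(0.0, min(100.0, self._scores[key] + delta))
        return self._scores[key]

    def get(self, npc_id: str, player_id: str) -> float:
        return self._scores[(npc_id, player_id)]


store = AffinityStore()
store.adjust("zhang", "alice", 2.0)   # only Alice's score with Zhang San changes
```

Because the key includes the player, one player's friendly chat never shifts how the same NPC treats someone else.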
+
+**(2) Quest System**
+
+We can design a quest system for NPCs. When a player's affection with an NPC reaches a certain level, the NPC will provide special quests. For example, Zhang San might ask the player to help debug code, Li Si might ask the player to collect user feedback, and Wang Wu might ask the player to evaluate design proposals. Completing quests can earn rewards and further increase affection.
+
+**(3) NPC-to-NPC Interaction**
+
+Currently, NPCs hold real conversations only with players; the background lines they display are batch-generated rather than true exchanges between agents. We can go further and let NPCs talk to each other. Zhang San can discuss product requirements with Li Si, Li Si can discuss interface design with Wang Wu, and Wang Wu can discuss technical implementation with Zhang San. These interactions can run automatically in the background, and players can observe them, making the whole world feel more lively.
+
+**(4) Emotion System**
+
+In addition to affection, we can add a more complex emotion system for NPCs. NPCs can have different emotional states such as happy, sad, angry, and excited, which affect NPC reply style and behavior. For example, when an NPC is in a good mood, they'll be more willing to share information; when in a bad mood, they might be rather cold.
+
+**(5) Dynamic Event System**
+
+We can design dynamic events to make the game world richer. For example, regularly hold team meetings where all NPCs and players gather to discuss project progress; or hold birthday parties celebrating an NPC's birthday; or emergency tasks requiring everyone's collaboration. These events can increase game variety and fun.
+
+**(6) Larger World**
+
+Currently, Cyber Town has only one office scene, but we can expand to a larger world. We can add different scenes like cafes, libraries, and parks, each with different NPCs and interaction methods. Players can move between different scenes and explore a broader virtual world.
+
+**(7) Personalized Learning**
+
+NPCs can learn each player's preferences and habits. For example, if a player frequently discusses Python with Zhang San, he will remember that the player is interested in programming and proactively share related content later. If a player tends to play at night, NPCs can remember this habit and be more active at night.
+
+### 15.7.3 Reflection and Outlook
+
+Cyber Town demonstrates the enormous potential of AI technology in games. NPCs in traditional games are limited by preset dialogue trees and scripts, while AI NPCs can understand and generate natural language, having real conversations with players. This not only enhances game immersion but also brings new possibilities to game design.
+
+However, AI NPCs also face some challenges. First is the cost issue - each conversation requires calling the LLM API, which incurs certain fees. For large multiplayer online games, this cost could be very high. Second is the latency issue - LLM inference takes time, and if network latency is high, players might need to wait several seconds to see NPC replies. Finally, there's the content control issue - LLM-generated content may not be fully controllable, requiring well-designed prompts and content filtering mechanisms.
+
+Despite these challenges, the future of AI NPCs remains full of promise. As LLM technology develops, inference speed will become faster and costs will become lower. Localized small LLMs are also developing rapidly - in the future, they may be able to run directly on players' devices, requiring no network requests at all. The combination of AI technology and games will bring players unprecedented experiences.
+
+In the graduation project chapter of Part 5, we will learn how to build general-purpose agents from single-agent and multi-agent designs. That chapter is your time to create, so stay tuned!

+ 170 - 166
docs/chapter15/第十五章 构建赛博小镇.md

@@ -1,41 +1,45 @@
+<div align="right">
+  <a href="./Chapter15-Building-Cyber-Town.md">English</a> | 中文
+</div>
+
 # 第十五章 构建赛博小镇
 
-这一章,我们将探索一个全新的方向:<strong>将智能体技术与游戏引擎结合,构建一个充满生命力的AI小镇</strong>。
+这一章,我们将探索一个全新的方向:<strong>将智能体技术与游戏引擎结合,构建一个充满生命力的 AI 小镇</strong>。
 
-还记得《模拟人生》或《动物森友会》中那些栩栩如生的NPC吗?他们有自己的性格、记忆和社交关系。本章的赛博小镇将是一个类似的项目,但与传统游戏不同的是,我们的NPC拥有真正的"智能"——他们能够理解玩家的对话,记住过去的互动,并根据好感度做出不同的反应。本章的赛博小镇包含以下核心功能:
+还记得《模拟人生》或《动物森友会》中那些栩栩如生的 NPC 吗?他们有自己的性格、记忆和社交关系。本章的赛博小镇将是一个类似的项目,但与传统游戏不同的是,我们的 NPC 拥有真正的"智能"——他们能够理解玩家的对话,记住过去的互动,并根据好感度做出不同的反应。本章的赛博小镇包含以下核心功能:
 
-<strong>(1)智能NPC对话系统</strong>:玩家可以与NPC进行自然语言对话,NPC会根据自己的角色设定和记忆做出回应。
+<strong>(1)智能 NPC 对话系统</strong>:玩家可以与 NPC 进行自然语言对话,NPC 会根据自己的角色设定和记忆做出回应。
 
-<strong>(2)记忆系统</strong>:NPC拥有短期记忆和长期记忆,能够记住与玩家的互动历史。
+<strong>(2)记忆系统</strong>:NPC 拥有短期记忆和长期记忆,能够记住与玩家的互动历史。
 
-<strong>(3)好感度系统</strong>:NPC对玩家的态度会随着互动而变化,从陌生到熟悉,从友好到亲密。
+<strong>(3)好感度系统</strong>:NPC 对玩家的态度会随着互动而变化,从陌生到熟悉,从友好到亲密。
 
-<strong>(4)游戏化交互</strong>:玩家可以在2D像素风格的办公室场景中自由移动,与不同的NPC互动。
+<strong>(4)游戏化交互</strong>:玩家可以在 2D 像素风格的办公室场景中自由移动,与不同的 NPC 互动。
 
 <strong>(5)实时日志系统</strong>:所有对话和互动都会被记录,方便调试和分析。
 
 ## 15.1 项目概述与架构设计
 
-### 15.1.1 为什么要构建AI小镇
+### 15.1.1 为什么要构建 AI 小镇
 
-传统游戏中的NPC通常只能说固定的台词,或者通过预设的对话树进行有限的互动。即使是最复杂的RPG游戏,NPC的对话也是由编剧事先写好的。这种方式虽然可控,但缺乏真正的"智能"和"生命力"。
+传统游戏中的 NPC 通常只能说固定的台词,或者通过预设的对话树进行有限的互动。即使是最复杂的 RPG 游戏,NPC 的对话也是由编剧事先写好的。这种方式虽然可控,但缺乏真正的"智能"和"生命力"。
 
-想象一下,如果游戏中的NPC能够理解你说的任何话,不再局限于预设的选项,你可以用自然语言与NPC交流。NPC会记得你上次说了什么,你们的关系如何,甚至你的喜好。每个NPC都有自己的职业、性格和说话风格。NPC对你的态度会随着互动而变化,从陌生人到朋友,甚至挚友。
+想象一下,如果游戏中的 NPC 能够理解你说的任何话,不再局限于预设的选项,你可以用自然语言与 NPC 交流。NPC 会记得你上次说了什么,你们的关系如何,甚至你的喜好。每个 NPC 都有自己的职业、性格和说话风格。NPC 对你的态度会随着互动而变化,从陌生人到朋友,甚至挚友。
 
-这就是AI技术为游戏带来的新可能。通过将大语言模型与游戏引擎结合,我们可以创造出真正"活着"的NPC。这不仅仅是一个技术演示,更是对未来游戏形态的探索。在教育游戏中,NPC可以扮演历史人物、科学家,与学生进行互动式教学。在虚拟办公室中,NPC可以扮演同事、导师,提供帮助和建议。NPC还可以作为陪伴者,与用户进行情感交流,应用于心理健康领域。当然,最直接的应用就是为传统游戏增加AI NPC,提升玩家体验。
+这就是 AI 技术为游戏带来的新可能。通过将大语言模型与游戏引擎结合,我们可以创造出真正"活着"的 NPC。这不仅仅是一个技术演示,更是对未来游戏形态的探索。在教育游戏中,NPC 可以扮演历史人物、科学家,与学生进行互动式教学。在虚拟办公室中,NPC 可以扮演同事、导师,提供帮助和建议。NPC 还可以作为陪伴者,与用户进行情感交流,应用于心理健康领域。当然,最直接的应用就是为传统游戏增加 AI NPC,提升玩家体验。
 
 ### 15.1.2 技术架构概览
 
-赛博小镇采用<strong>游戏引擎+后端服务</strong>的分离架构,分为四个层次,如图15.1所示。
+赛博小镇采用<strong>游戏引擎+后端服务</strong>的分离架构,分为四个层次,如图 15.1 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-1.png" alt="" width="85%"/>
   <p>图 15.1 赛博小镇技术架构</p>
 </div>
 
-前端层使用Godot 4.5游戏引擎,负责游戏渲染、玩家控制、NPC显示和对话UI。Godot是一个开源的2D/3D游戏引擎,非常适合快速开发像素风格的游戏。后端层使用FastAPI框架,负责API路由、NPC状态管理、对话处理和日志记录。FastAPI是一个现代化的Python Web框架,性能优秀且易于开发。智能体层使用我们自己构建的HelloAgents框架,负责NPC智能、记忆管理和好感度计算。每个NPC都是一个SimpleAgent实例,拥有独立的记忆和状态。外部服务层提供LLM能力、向量存储和数据持久化,包括LLM API、Qdrant向量数据库和SQLite关系数据库。
+前端层使用 Godot 4.5 游戏引擎,负责游戏渲染、玩家控制、NPC 显示和对话 UI。Godot 是一个开源的 2D/3D 游戏引擎,非常适合快速开发像素风格的游戏。后端层使用 FastAPI 框架,负责 API 路由、NPC 状态管理、对话处理和日志记录。FastAPI 是一个现代化的 Python Web 框架,性能优秀且易于开发。智能体层使用我们自己构建的 HelloAgents 框架,负责 NPC 智能、记忆管理和好感度计算。每个 NPC 都是一个 SimpleAgent 实例,拥有独立的记忆和状态。外部服务层提供 LLM 能力、向量存储和数据持久化,包括 LLM API、Qdrant 向量数据库和 SQLite 关系数据库。
 
-数据流转过程如图15.2所示:
+数据流转过程如图 15.2 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-2.png" alt="" width="85%"/>
@@ -43,7 +47,7 @@
 </div>
 
 
-玩家在Godot中按E键与NPC互动,Godot通过HTTP API发送对话请求到FastAPI后端。后端调用HelloAgents的SimpleAgent处理对话,Agent从记忆系统中检索相关历史,然后调用LLM生成回复。后端更新NPC状态和好感度,记录日志到控制台和文件,最后返回回复给Godot前端。Godot显示NPC回复并更新UI,完成一次完整的交互循环。
+玩家在 Godot 中按 E 键与 NPC 互动,Godot 通过 HTTP API 发送对话请求到 FastAPI 后端。后端调用 HelloAgents 的 SimpleAgent 处理对话,Agent 从记忆系统中检索相关历史,然后调用 LLM 生成回复。后端更新 NPC 状态和好感度,记录日志到控制台和文件,最后返回回复给 Godot 前端。Godot 显示 NPC 回复并更新 UI,完成一次完整的交互循环。
 
 项目的结构如下,方便你定位源码:
 
@@ -83,19 +87,19 @@ Helloagents-AI-Town/
 
 详细的架构设计和数据流转将在后续章节中介绍。
 
-### 15.1.3 快速体验:5分钟运行项目
+### 15.1.3 快速体验:5 分钟运行项目
 
 在深入学习实现细节之前,让我们先把项目跑起来,看看最终的效果。这样你会对整个系统有一个直观的认识。
 
 <strong>环境要求:</strong>
 
-- Godot 4.2或更高版本
-- Python 3.10或更高版本
-- LLM API密钥(OpenAI、DeepSeek、智谱等)
+- Godot 4.2 或更高版本
+- Python 3.10 或更高版本
+- LLM API 密钥(OpenAI、DeepSeek、智谱等)
 
 <strong>获取项目:</strong>
 
-你可以到`code/chapter15/Helloagents-AI-Town`中查看,或者从GitHub克隆完整的hello-agents仓库。
+你可以到`code/chapter15/Helloagents-AI-Town`中查看,或者从 GitHub 克隆完整的 hello-agents 仓库。
 
 <strong>启动后端:</strong>
 
@@ -126,31 +130,31 @@ python main.py
 ============================================================
 ```
 
-<strong>启动Godot:</strong>
+<strong>启动 Godot:</strong>
 
-Godot的安装非常简单,Windows提供了直接打开的`.exe`文件,Mac也提供了`.dmg`文件。可直接在官网下载([Windows](https://godotengine.org/download/windows/) / [Mac](https://godotengine.org/download/macos/))
+Godot 的安装非常简单,Windows 提供了直接打开的`.exe`文件,Mac 也提供了`.dmg`文件。可直接在官网下载([Windows](https://godotengine.org/download/windows/) / [Mac](https://godotengine.org/download/macos/))
 
-打开Godot引擎,点击"导入"按钮,浏览到`Helloagents-AI-Town/helloagents-ai-town/project.godot`,点击"导入并编辑"。等待Godot导入资源后,按`F5`或点击"运行"按钮启动游戏。
+打开 Godot 引擎,点击"导入"按钮,浏览到`Helloagents-AI-Town/helloagents-ai-town/project.godot`,点击"导入并编辑"。等待 Godot 导入资源后,按`F5`或点击"运行"按钮启动游戏。
 
 <strong>体验核心功能:</strong>
 
-游戏启动后,你会看到一个像素风格的Datawhale办公室场景,如图15.3所示。
+游戏启动后,你会看到一个像素风格的 Datawhale 办公室场景,如图 15.3 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-3.png" alt="" width="85%"/>
   <p>图 15.3 赛博小镇游戏场景</p>
 </div>
 
-使用WASD键移动玩家角色,走到NPC附近时,屏幕上会显示"按E键交互"的提示。按下E键后,会弹出对话框,你可以输入任何想说的话,如图15.4所示。
+使用 WASD 键移动玩家角色,走到 NPC 附近时,屏幕上会显示"按 E 键交互"的提示。按下 E 键后,会弹出对话框,你可以输入任何想说的话,如图 15.4 所示。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-4.png" alt="" width="85%"/>
-  <p>图 15.4 与NPC对话界面</p>
+  <p>图 15.4 与 NPC 对话界面</p>
 </div>
 
-NPC会根据自己的角色设定(Python工程师、产品经理、UI设计师)和你们的互动历史做出回应。随着对话的进行,NPC对你的好感度会逐渐提升,从"陌生"到"熟悉",再到"友好"、"亲密"甚至"挚友"。
+NPC 会根据自己的角色设定(Python 工程师、产品经理、UI 设计师)和你们的互动历史做出回应。随着对话的进行,NPC 对你的好感度会逐渐提升,从"陌生"到"熟悉",再到"友好"、"亲密"甚至"挚友"。
 
-<strong>好感度系统在后端实现</strong>,每次对话都会根据玩家的消息内容和情感分析来调整好感度值。虽然前端游戏界面中没有直接显示好感度数值,但所有的好感度变化都会被详细记录在后端日志中。你可以在`backend/logs/dialogue_YYYY-MM-DD.log`文件中查看每次对话的好感度变化。日志文件会记录每次对话的详细信息,包括:当前好感度值、检索到的相关记忆、NPC的回复、好感度变化量(+2.0、+3.0等)、变化原因(友好问候、正常交流等)以及情感分析结果(positive、neutral等)。这种设计让开发者可以清晰地追踪NPC与玩家的关系发展,也为后续在前端添加好感度UI提供了数据基础。
+<strong>好感度系统在后端实现</strong>,每次对话都会根据玩家的消息内容和情感分析来调整好感度值。虽然前端游戏界面中没有直接显示好感度数值,但所有的好感度变化都会被详细记录在后端日志中。你可以在`backend/logs/dialogue_YYYY-MM-DD.log`文件中查看每次对话的好感度变化。日志文件会记录每次对话的详细信息,包括:当前好感度值、检索到的相关记忆、NPC 的回复、好感度变化量(+2.0、+3.0 等)、变化原因(友好问候、正常交流等)以及情感分析结果(positive、neutral 等)。这种设计让开发者可以清晰地追踪 NPC 与玩家的关系发展,也为后续在前端添加好感度 UI 提供了数据基础。
 
 所有的对话都会被记录在后端的日志文件中,你可以通过以下命令实时查看:
 
@@ -159,20 +163,20 @@ NPC会根据自己的角色设定(Python工程师、产品经理、UI设计师)
 python view_logs.py
 ```
 
-这个简单的体验展示了AI小镇的核心功能。接下来,我们将深入学习如何实现这些功能。
+这个简单的体验展示了 AI 小镇的核心功能。接下来,我们将深入学习如何实现这些功能。
 
 
 
 
-## 15.2 NPC智能体系统
+## 15.2 NPC 智能体系统
 
-### 15.2.1 基于HelloAgents的SimpleAgent
+### 15.2.1 基于 HelloAgents 的 SimpleAgent
 
-在赛博小镇中,每个NPC都是一个独立的智能体。我们使用HelloAgents框架中的SimpleAgent来实现NPC的智能。SimpleAgent是一个轻量级的智能体实现,它封装了LLM调用、消息管理和工具调用等核心功能。
+在赛博小镇中,每个 NPC 都是一个独立的智能体。我们使用 HelloAgents 框架中的 SimpleAgent 来实现 NPC 的智能。SimpleAgent 是一个轻量级的智能体实现,它封装了 LLM 调用、消息管理和工具调用等核心功能。
 
-回顾一下第七章中我们学习的SimpleAgent,它的核心是一个简单的对话循环:接收用户消息,调用LLM生成回复,返回结果。在赛博小镇中,我们需要为每个NPC创建一个SimpleAgent实例,并为其配置独特的系统提示词,让每个NPC拥有不同的性格和角色设定。
+回顾一下第七章中我们学习的 SimpleAgent,它的核心是一个简单的对话循环:接收用户消息,调用 LLM 生成回复,返回结果。在赛博小镇中,我们需要为每个 NPC 创建一个 SimpleAgent 实例,并为其配置独特的系统提示词,让每个 NPC 拥有不同的性格和角色设定。
 
-让我们看看如何创建一个NPC Agent。首先,我们需要定义NPC的基本信息,包括ID、名称、职业和性格。然后,我们根据这些信息构建系统提示词,让LLM扮演这个NPC的角色。最后,我们创建SimpleAgent实例,并配置记忆系统。
+让我们看看如何创建一个 NPC Agent。首先,我们需要定义 NPC 的基本信息,包括 ID、名称、职业和性格。然后,我们根据这些信息构建系统提示词,让 LLM 扮演这个 NPC 的角色。最后,我们创建 SimpleAgent 实例,并配置记忆系统。
 
 ```python
 from hello_agents import SimpleAgent, HelloAgentsLLM
@@ -212,23 +216,23 @@ def create_npc_agent(npc_id: str, name: str, role: str, personality: str):
     return agent
 ```
 
-这段代码展示了如何创建一个NPC Agent。系统提示词定义了NPC的身份和性格,记忆管理器让NPC能够记住与玩家的对话历史。WorkingMemory是短期记忆,容量为10条消息,保留时间为120分钟。EpisodicMemory是长期记忆,使用SQLite数据库和Qdrant向量数据库存储,可以检索相关的历史对话。
+这段代码展示了如何创建一个 NPC Agent。系统提示词定义了 NPC 的身份和性格,记忆管理器让 NPC 能够记住与玩家的对话历史。WorkingMemory 是短期记忆,容量为 10 条消息,保留时间为 120 分钟。EpisodicMemory 是长期记忆,使用 SQLite 数据库和 Qdrant 向量数据库存储,可以检索相关的历史对话。
 
-NPC Agent的工作流程如图15.5所示:
+NPC Agent 的工作流程如图 15.5 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-5.png" alt="" width="85%"/>
-  <p>图 15.5 NPC Agent工作流程</p>
+  <p>图 15.5 NPC Agent 工作流程</p>
 </div>
 
 
-### 15.2.2 NPC角色设定与Prompt设计
+### 15.2.2 NPC 角色设定与 Prompt 设计
 
-一个好的NPC需要有鲜明的性格和角色设定。在赛博小镇中,我们设计了三个NPC,分别代表不同的职业和性格。
+一个好的 NPC 需要有鲜明的性格和角色设定。在赛博小镇中,我们设计了三个 NPC,分别代表不同的职业和性格。
 
-<strong>张三 - Python工程师</strong>
+<strong>张三 - Python 工程师</strong>
 
-张三是一位资深的Python工程师,负责HelloAgents框架的核心开发。他性格严谨,说话直接,喜欢用技术术语。他对代码质量有很高的要求,经常会分享一些编程技巧和最佳实践。
+张三是一位资深的 Python 工程师,负责 HelloAgents 框架的核心开发。他性格严谨,说话直接,喜欢用技术术语。他对代码质量有很高的要求,经常会分享一些编程技巧和最佳实践。
 
 ```python
 npc_zhang = {
@@ -241,7 +245,7 @@ npc_zhang = {
 
 <strong>李四 - 产品经理</strong>
 
-李四是一位经验丰富的产品经理,负责HelloAgents框架的产品规划和用户体验设计。他性格外向,善于沟通,总是能从用户的角度思考问题。他喜欢讨论产品设计和用户需求,经常会问"为什么"。
+李四是一位经验丰富的产品经理,负责 HelloAgents 框架的产品规划和用户体验设计。他性格外向,善于沟通,总是能从用户的角度思考问题。他喜欢讨论产品设计和用户需求,经常会问"为什么"。
 
 ```python
 npc_li = {
@@ -252,9 +256,9 @@ npc_li = {
 }
 ```
 
-<strong>王五 - UI设计师</strong>
+<strong>王五 - UI 设计师</strong>
 
-王五是一位富有创意的UI设计师,负责HelloAgents框架的界面设计和视觉呈现。他性格温和,审美独特,对色彩和布局有敏锐的感知。他喜欢讨论设计理念和美学,经常会分享一些设计灵感。
+王五是一位富有创意的 UI 设计师,负责 HelloAgents 框架的界面设计和视觉呈现。他性格温和,审美独特,对色彩和布局有敏锐的感知。他喜欢讨论设计理念和美学,经常会分享一些设计灵感。
 
 ```python
 npc_wang = {
@@ -265,17 +269,17 @@ npc_wang = {
 }
 ```
 
-这三个NPC的设定各有特色,玩家可以根据自己的兴趣选择与不同的NPC互动。张三可以教你编程技巧,李四可以和你讨论产品设计,王五可以分享设计灵感。
+这三个 NPC 的设定各有特色,玩家可以根据自己的兴趣选择与不同的 NPC 互动。张三可以教你编程技巧,李四可以和你讨论产品设计,王五可以分享设计灵感。
 
 ### 15.2.3 记忆系统集成
 
-记忆系统是NPC智能的关键。一个能够记住过去对话的NPC,会让玩家感觉更加真实和有趣。我们采用helloagents的`WorkingMemory`和`EpisodicMemory`构造短期记忆和长期记忆。
+记忆系统是 NPC 智能的关键。一个能够记住过去对话的 NPC,会让玩家感觉更加真实和有趣。我们采用 helloagents 的`WorkingMemory`和`EpisodicMemory`构造短期记忆和长期记忆。
 
-短期记忆存储最近的对话内容,容量有限,会随着时间自动清理。它的作用是保持对话的连贯性,让NPC能够理解上下文。比如,当玩家说"它是什么颜色的?"时,NPC需要从短期记忆中找到"它"指的是什么。
+短期记忆存储最近的对话内容,容量有限,会随着时间自动清理。它的作用是保持对话的连贯性,让 NPC 能够理解上下文。比如,当玩家说"它是什么颜色的?"时,NPC 需要从短期记忆中找到"它"指的是什么。
 
-长期记忆存储所有的对话历史,使用向量数据库进行语义检索。当玩家提到某个话题时,NPC可以从长期记忆中检索相关的历史对话,回忆起之前讨论过的内容。比如,当玩家说"还记得我们上次讨论的那个项目吗?",NPC可以从长期记忆中找到相关的对话记录。
+长期记忆存储所有的对话历史,使用向量数据库进行语义检索。当玩家提到某个话题时,NPC 可以从长期记忆中检索相关的历史对话,回忆起之前讨论过的内容。比如,当玩家说"还记得我们上次讨论的那个项目吗?",NPC 可以从长期记忆中找到相关的对话记录。
 
-记忆系统的架构如图15.6所示:
+记忆系统的架构如图 15.6 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-6.png" alt="" width="85%"/>
@@ -283,7 +287,7 @@ npc_wang = {
 </div>
 
 
-在实际使用中,Agent会先从短期记忆中获取最近的对话,然后从长期记忆中检索相关的历史对话,将这些信息一起发送给LLM,生成更加准确和个性化的回复。
+在实际使用中,Agent 会先从短期记忆中获取最近的对话,然后从长期记忆中检索相关的历史对话,将这些信息一起发送给 LLM,生成更加准确和个性化的回复。
 
 ```python
 # Agent处理对话的流程
@@ -312,23 +316,23 @@ def process_dialogue(agent, player_message):
     return reply
 ```
 
-这个流程确保了NPC能够记住与玩家的互动历史,并在对话中体现出来。
+这个流程确保了 NPC 能够记住与玩家的互动历史,并在对话中体现出来。
 
 ### 15.2.4 批量对话生成:轻负载模式
 
-在实际运行中,很快就会发现了一个问题:当多个玩家同时与不同的NPC对话时,后端需要并发处理多个LLM请求。每个请求都需要调用API,这不仅增加了成本,还可能因为并发限制导致请求失败或延迟。
+在实际运行中,很快就会发现了一个问题:当多个玩家同时与不同的 NPC 对话时,后端需要并发处理多个 LLM 请求。每个请求都需要调用 API,这不仅增加了成本,还可能因为并发限制导致请求失败或延迟。
 
-为了解决这个问题,我们设计了一个<strong>批量对话生成系统</strong>。核心思想是:将多个NPC的对话请求合并成一次LLM调用,让LLM一次性生成所有NPC的回复。这就像餐厅的"预制菜"一样,提前批量准备好,需要时直接使用,大大降低了成本和延迟。
+为了解决这个问题,我们设计了一个<strong>批量对话生成系统</strong>。核心思想是:将多个 NPC 的对话请求合并成一次 LLM 调用,让 LLM 一次性生成所有 NPC 的回复。这就像餐厅的"预制菜"一样,提前批量准备好,需要时直接使用,大大降低了成本和延迟。
 
-批量生成的工作流程如图15.7所示:
+批量生成的工作流程如图 15.7 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-7.png" alt="" width="85%"/>
-  <p>图 15.7 批量生成vs传统模式</p>
+  <p>图 15.7 批量生成 vs 传统模式</p>
 </div>
 
 
-批量生成器的实现非常巧妙。我们构建一个特殊的提示词,要求LLM一次性生成所有NPC的对话,并以JSON格式返回。这样,一次API调用就能获得所有NPC的回复,成本降低到原来的1/3,延迟也大幅减少。
+批量生成器的实现非常巧妙。我们构建一个特殊的提示词,要求 LLM 一次性生成所有 NPC 的对话,并以 JSON 格式返回。这样,一次 API 调用就能获得所有 NPC 的回复,成本降低到原来的 1/3,延迟也大幅减少。
 
 ```python
 class NPCBatchGenerator:
@@ -401,24 +405,24 @@ class NPCBatchGenerator:
         return prompt
 ```
 
-这个设计的关键在于提示词的构建。我们明确要求LLM返回JSON格式,并提供了示例输出。LLM会严格按照这个格式生成回复,我们只需要解析JSON就能获得所有NPC的对话。
+这个设计的关键在于提示词的构建。我们明确要求 LLM 返回 JSON 格式,并提供了示例输出。LLM 会严格按照这个格式生成回复,我们只需要解析 JSON 就能获得所有 NPC 的对话。
 
-批量生成还有一个额外的好处:所有NPC的对话是在同一个上下文中生成的,因此它们之间会有一定的关联性。比如,如果张三在调试bug,李四可能会提到要帮忙看看;如果王五在设计界面,张三可能会说等会儿去看看设计稿。这让整个办公室的氛围更加真实和连贯。
+批量生成还有一个额外的好处:所有 NPC 的对话是在同一个上下文中生成的,因此它们之间会有一定的关联性。比如,如果张三在调试 bug,李四可能会提到要帮忙看看;如果王五在设计界面,张三可能会说等会儿去看看设计稿。这让整个办公室的氛围更加真实和连贯。
 
-当然,批量生成也有一些限制。它更适合生成NPC的"背景对话"或"自言自语",而不是与玩家的直接互动。对于玩家发起的对话,我们仍然使用单独的Agent来处理,以保证回复的个性化和准确性。批量生成主要用于以下场景:
+当然,批量生成也有一些限制。它更适合生成 NPC 的"背景对话"或"自言自语",而不是与玩家的直接互动。对于玩家发起的对话,我们仍然使用单独的 Agent 来处理,以保证回复的个性化和准确性。批量生成主要用于以下场景:
 
-1. <strong>NPC背景对话</strong>:玩家进入场景时,NPC正在做什么、说什么
-2. <strong>定时更新</strong>:每隔一段时间更新NPC的状态和对话
+1. <strong>NPC 背景对话</strong>:玩家进入场景时,NPC 正在做什么、说什么
+2. <strong>定时更新</strong>:每隔一段时间更新 NPC 的状态和对话
 3. <strong>场景氛围</strong>:根据时间(早上、中午、晚上)生成不同的对话
-4. <strong>降低成本</strong>:在高并发场景下,使用批量生成降低API调用次数
+4. <strong>降低成本</strong>:在高并发场景下,使用批量生成降低 API 调用次数
 
 <strong>混合模式:批量生成+即时响应</strong>
 
 在实际实现中,我们采用了一种混合模式,将批量生成和即时响应结合起来。这个设计非常巧妙,既保证了效率,又保证了交互的质量。
 
-具体来说,系统会在后台定期运行批量生成,为所有NPC生成当前场景下的"背景对话"。这些对话会被缓存起来,当玩家靠近NPC但还没有发起交互时,NPC会显示这些背景对话,比如"正在调试代码..."、"在看产品文档..."等。这让NPC看起来是"活着的",而不是静止的模型。
+具体来说,系统会在后台定期运行批量生成,为所有 NPC 生成当前场景下的"背景对话"。这些对话会被缓存起来,当玩家靠近 NPC 但还没有发起交互时,NPC 会显示这些背景对话,比如"正在调试代码..."、"在看产品文档..."等。这让 NPC 看起来是"活着的",而不是静止的模型。
 
-但是,当玩家按下E键发起交互时,系统会立即切换到即时响应模式。此时,后端会调用该NPC的专属Agent,根据玩家的具体消息、历史记忆和好感度,生成个性化的回复。这个过程是实时的,确保NPC的回复与玩家的输入高度相关。
+但是,当玩家按下 E 键发起交互时,系统会立即切换到即时响应模式。此时,后端会调用该 NPC 的专属 Agent,根据玩家的具体消息、历史记忆和好感度,生成个性化的回复。这个过程是实时的,确保 NPC 的回复与玩家的输入高度相关。
 
 ```python
 # 在main.py中的混合模式实现
@@ -472,48 +476,48 @@ async def background_dialogue_update():
 
 这种混合模式的优势非常明显:
 
-1. <strong>降低成本</strong>:背景对话使用批量生成,一次调用生成所有NPC的对话,成本低
+1. <strong>降低成本</strong>:背景对话使用批量生成,一次调用生成所有 NPC 的对话,成本低
 2. <strong>保证质量</strong>:玩家交互使用即时响应,每个回复都是个性化的,质量高
-3. <strong>提升体验</strong>:NPC始终有"背景对话",看起来很生动;玩家交互时回复准确,体验好
+3. <strong>提升体验</strong>:NPC 始终有"背景对话",看起来很生动;玩家交互时回复准确,体验好
 4. <strong>灵活调整</strong>:可以根据服务器负载动态调整批量生成的频率
 
-通过批量生成和即时响应的结合,我们实现了一个既高效又智能的NPC系统。在正常情况下,玩家感受不到任何差异,但后端的成本和性能得到了显著优化。这个设计思路也可以应用到其他需要大量AI调用的场景中。
+通过批量生成和即时响应的结合,我们实现了一个既高效又智能的 NPC 系统。在正常情况下,玩家感受不到任何差异,但后端的成本和性能得到了显著优化。这个设计思路也可以应用到其他需要大量 AI 调用的场景中。
 
 
 ## 15.3 好感度系统设计
 
 ### 15.3.1 好感度等级划分
 
-在赛博小镇中,NPC对玩家的态度会随着互动而变化。我们设计了一个五级好感度系统,从陌生到挚友,每个等级都有不同的分数范围和对应的行为表现。
+在赛博小镇中,NPC 对玩家的态度会随着互动而变化。我们设计了一个五级好感度系统,从陌生到挚友,每个等级都有不同的分数范围和对应的行为表现。
 
-好感度系统的核心思想是:通过量化NPC与玩家的关系,让NPC的回复更加真实和有层次感。当玩家刚进入游戏时,所有NPC对玩家都是陌生的态度,回复比较礼貌但疏远。随着对话的进行,如果玩家表现友好,NPC的好感度会逐渐提升,回复也会变得更加亲切和详细。
+好感度系统的核心思想是:通过量化 NPC 与玩家的关系,让 NPC 的回复更加真实和有层次感。当玩家刚进入游戏时,所有 NPC 对玩家都是陌生的态度,回复比较礼貌但疏远。随着对话的进行,如果玩家表现友好,NPC 的好感度会逐渐提升,回复也会变得更加亲切和详细。
 
-我们将好感度分为五个等级,每个等级对应一个分数范围,如图15.8所示:
+我们将好感度分为五个等级,每个等级对应一个分数范围,如图 15.8 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-8.png" alt="" width="85%"/>
   <p>图 15.8 好感度等级划分</p>
 </div>
 
-- <strong>陌生(0-20分)</strong>:NPC刚认识玩家,态度礼貌但保持距离。回复简短,不会主动分享个人信息。
+- <strong>陌生(0-20 分)</strong>:NPC 刚认识玩家,态度礼貌但保持距离。回复简短,不会主动分享个人信息。
 
-- <strong>熟悉(21-40分)</strong>:NPC开始记住玩家,愿意进行简单的交流。回复变得更加自然,偶尔会分享一些工作相关的信息。
+- <strong>熟悉(21-40 分)</strong>:NPC 开始记住玩家,愿意进行简单的交流。回复变得更加自然,偶尔会分享一些工作相关的信息。
 
-- <strong>友好(41-60分)</strong>:NPC把玩家当作朋友,愿意分享更多信息。回复更加详细,会主动询问玩家的情况。
+- <strong>友好(41-60 分)</strong>:NPC 把玩家当作朋友,愿意分享更多信息。回复更加详细,会主动询问玩家的情况。
 
-- <strong>亲密(61-80分)</strong>:NPC非常信任玩家,愿意分享私人话题。回复充满热情,会给玩家提供帮助和建议。
+- <strong>亲密(61-80 分)</strong>:NPC 非常信任玩家,愿意分享私人话题。回复充满热情,会给玩家提供帮助和建议。
 
-- <strong>挚友(81-100分)</strong>:NPC把玩家当作最好的朋友,无话不谈。回复非常亲切,会分享内心的想法和感受。
+- <strong>挚友(81-100 分)</strong>:NPC 把玩家当作最好的朋友,无话不谈。回复非常亲切,会分享内心的想法和感受。
 
-这个设计让玩家能够清晰地感受到与NPC关系的变化,也为后续的游戏玩法提供了基础。比如,只有达到一定好感度,NPC才会分享某些特殊信息或提供特殊任务。
+这个设计让玩家能够清晰地感受到与 NPC 关系的变化,也为后续的游戏玩法提供了基础。比如,只有达到一定好感度,NPC 才会分享某些特殊信息或提供特殊任务。
 
 ### 15.3.2 好感度计算逻辑
 
 好感度的计算需要考虑多个因素。我们不能简单地让每次对话都增加固定的分数,这样会让系统显得机械和不真实。一个好的好感度系统应该能够识别玩家的态度,并根据对话内容动态调整分数。
 
-在赛博小镇中,我们使用LLM来分析对话内容,判断玩家的态度是友好、中立还是不友好。然后根据判断结果调整好感度分数。这个过程是自动的,不需要玩家刻意选择选项,让互动更加自然。
+在赛博小镇中,我们使用 LLM 来分析对话内容,判断玩家的态度是友好、中立还是不友好。然后根据判断结果调整好感度分数。这个过程是自动的,不需要玩家刻意选择选项,让互动更加自然。
 
-好感度计算流程如图15.9所示:
+好感度计算流程如图 15.9 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-9.png" alt="" width="85%"/>
@@ -595,13 +599,13 @@ NPC: {npc_reply}
             return "挚友"
 ```
 
-这个实现使用LLM来分析对话内容,自动判断玩家的态度并调整好感度。这样的设计让好感度系统更加智能和自然,玩家不需要刻意讨好NPC,只需要正常交流即可。
+这个实现使用 LLM 来分析对话内容,自动判断玩家的态度并调整好感度。这样的设计让好感度系统更加智能和自然,玩家不需要刻意讨好 NPC,只需要正常交流即可。
 
 ### 15.3.3 好感度影响对话
 
-好感度不仅仅是一个数字,它应该真正影响NPC的行为。在赛博小镇中,我们通过修改NPC的系统提示词,让NPC根据当前的好感度等级调整回复风格。
+好感度不仅仅是一个数字,它应该真正影响 NPC 的行为。在赛博小镇中,我们通过修改 NPC 的系统提示词,让 NPC 根据当前的好感度等级调整回复风格。
 
-当好感度较低时,NPC会保持礼貌但疏远的态度。当好感度提升后,NPC会变得更加热情和健谈。这种变化是通过动态调整系统提示词实现的。
+当好感度较低时,NPC 会保持礼貌但疏远的态度。当好感度提升后,NPC 会变得更加热情和健谈。这种变化是通过动态调整系统提示词实现的。
 
 ```python
 def create_npc_agent_with_affinity(npc_id: str, name: str, role: str,
@@ -638,16 +642,16 @@ def create_npc_agent_with_affinity(npc_id: str, name: str, role: str,
     return agent
 ```
 
-这个设计让NPC的行为随着好感度动态变化。玩家可以明显感受到,随着互动的增加,NPC对自己的态度在逐渐改变,这大大增强了游戏的沉浸感和趣味性。
+这个设计让 NPC 的行为随着好感度动态变化。玩家可以明显感受到,随着互动的增加,NPC 对自己的态度在逐渐改变,这大大增强了游戏的沉浸感和趣味性。
 
 
 ## 15.4 后端服务实现
 
-### 15.4.1 FastAPI应用结构
+### 15.4.1 FastAPI 应用结构
 
-赛博小镇的后端使用FastAPI框架构建,负责处理Godot前端的请求,调用HelloAgents的NPC Agent,管理NPC状态和好感度,以及记录日志。一个清晰的应用结构能够让代码更易于维护和扩展。
+赛博小镇的后端使用 FastAPI 框架构建,负责处理 Godot 前端的请求,调用 HelloAgents 的 NPC Agent,管理 NPC 状态和好感度,以及记录日志。一个清晰的应用结构能够让代码更易于维护和扩展。
 
-我们的FastAPI应用采用模块化设计,将不同的功能分离到不同的文件中,如图15.10所示:
+我们的 FastAPI 应用采用模块化设计,将不同的功能分离到不同的文件中,如图 15.10 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-10.png" alt="" width="85%"/>
@@ -655,7 +659,7 @@ def create_npc_agent_with_affinity(npc_id: str, name: str, role: str,
 </div>
 
 
-让我们从`main.py`开始,这是FastAPI应用的入口文件:
+让我们从`main.py`开始,这是 FastAPI 应用的入口文件:
 
 ```python
 from fastapi import FastAPI, HTTPException
@@ -726,15 +730,15 @@ if __name__ == "__main__":
     )
 ```
 
-这个主程序文件定义了FastAPI应用的基本结构,配置了CORS中间件以允许跨域请求,并在启动时初始化各个管理器。接下来我们将实现具体的API路由。
+这个主程序文件定义了 FastAPI 应用的基本结构,配置了 CORS 中间件以允许跨域请求,并在启动时初始化各个管理器。接下来我们将实现具体的 API 路由。
 
-### 15.4.2 API路由设计
+### 15.4.2 API 路由设计
 
-赛博小镇的后端需要提供几个核心API端点,用于处理Godot前端的请求。我们将这些路由添加到`main.py`中。
+赛博小镇的后端需要提供几个核心 API 端点,用于处理 Godot 前端的请求。我们将这些路由添加到`main.py`中。
 
-<strong>获取NPC状态</strong>
+<strong>获取 NPC 状态</strong>
 
-这个API返回所有NPC的当前状态,包括位置、是否忙碌等信息:
+这个 API 返回所有 NPC 的当前状态,包括位置、是否忙碌等信息:
 
 ```python
 from models import NPCStatusResponse
@@ -756,7 +760,7 @@ async def get_single_npc_status(npc_id: str):
 
 <strong>对话接口</strong>
 
-这是最核心的API,处理玩家与NPC的对话:
+这是最核心的 API,处理玩家与 NPC 的对话:
 
 ```python
 from models import DialogueRequest, DialogueResponse
@@ -821,7 +825,7 @@ async def dialogue(request: DialogueRequest):
 
 <strong>好感度查询</strong>
 
-这个API允许查询玩家与NPC的好感度:
+这个 API 允许查询玩家与 NPC 的好感度:
 
 ```python
 from models import AffinityInfo
@@ -836,11 +840,11 @@ async def get_affinity(npc_id: str, player_name: str):
     return affinity
 ```
 
-API路由的调用流程如图15.11所示:
+API 路由的调用流程如图 15.11 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-11.png" alt="" width="85%"/>
-  <p>图 15.11 API调用流程</p>
+  <p>图 15.11 API 调用流程</p>
 </div>
 
 
@@ -848,7 +852,7 @@ API路由的调用流程如图15.11所示:
 
 <strong>状态管理器</strong>
 
-状态管理器负责跟踪每个NPC的当前状态,包括位置、是否忙碌、当前动作等。这对于防止并发问题很重要,比如避免一个NPC同时与多个玩家对话。
+状态管理器负责跟踪每个 NPC 的当前状态,包括位置、是否忙碌、当前动作等。这对于防止并发问题很重要,比如避免一个 NPC 同时与多个玩家对话。
 
 ```python
 # state_manager.py
@@ -987,21 +991,21 @@ NPC回复: {npc_reply}
 
 这个日志系统会在控制台实时显示对话内容,同时保存到文件中。每天的日志会保存在单独的文件中,方便后续分析。
 
-### 15.4.4 理解Godot的场景系统
+### 15.4.4 理解 Godot 的场景系统
 
-在开始构建游戏场景之前,我们需要先理解Godot的核心概念——场景(Scene)和节点(Node)。这是Godot与其他游戏引擎最大的不同之处,也是它最强大的特性之一。
+在开始构建游戏场景之前,我们需要先理解 Godot 的核心概念——场景(Scene)和节点(Node)。这是 Godot 与其他游戏引擎最大的不同之处,也是它最强大的特性之一。
 
 <strong>什么是节点?</strong>
 
-节点是Godot中最基本的构建块。你可以把节点想象成乐高积木,每个节点都有特定的功能。比如,Sprite2D节点用于显示图片,AudioStreamPlayer节点用于播放音频,CharacterBody2D节点用于处理角色的物理移动。Godot提供了上百种不同类型的节点,每种节点都专注于做好一件事。
+节点是 Godot 中最基本的构建块。你可以把节点想象成乐高积木,每个节点都有特定的功能。比如,Sprite2D 节点用于显示图片,AudioStreamPlayer 节点用于播放音频,CharacterBody2D 节点用于处理角色的物理移动。Godot 提供了上百种不同类型的节点,每种节点都专注于做好一件事。
 
 节点之间可以形成父子关系,构成一个树状结构。父节点可以影响子节点,比如移动父节点会同时移动所有子节点,隐藏父节点会同时隐藏所有子节点。这种层级关系让我们可以轻松地组织和管理复杂的游戏对象。
 
 <strong>什么是场景?</strong>
 
-场景是一组节点的集合,保存在一个.tscn文件中。你可以把场景理解为一个"预制件"。比如,我们可以创建一个"玩家"场景,包含角色的精灵、碰撞体、音效等所有相关节点。然后在游戏中多次使用这个场景,每次使用都会创建一个独立的实例。
+场景是一组节点的集合,保存在一个.tscn 文件中。你可以把场景理解为一个"预制件"。比如,我们可以创建一个"玩家"场景,包含角色的精灵、碰撞体、音效等所有相关节点。然后在游戏中多次使用这个场景,每次使用都会创建一个独立的实例。
 
-场景的强大之处在于它的可复用性和模块化。我们可以在一个场景中实例化另一个场景,形成嵌套结构。比如,主场景可以包含玩家场景、多个NPC场景和UI场景。修改NPC场景会自动影响所有NPC实例,这大大简化了开发和维护。
+场景的强大之处在于它的可复用性和模块化。我们可以在一个场景中实例化另一个场景,形成嵌套结构。比如,主场景可以包含玩家场景、多个 NPC 场景和 UI 场景。修改 NPC 场景会自动影响所有 NPC 实例,这大大简化了开发和维护。
 
 <strong>一个简单的例子</strong>
 
@@ -1014,39 +1018,39 @@ Player (CharacterBody2D)  ← 根节点,负责物理移动
 └─ Camera2D               ← 子节点,摄像机跟随玩家
 ```
 
-这个场景包含4个节点,形成树状结构。CharacterBody2D是根节点,其他三个是它的子节点。我们可以给每个节点添加脚本来控制它的行为,也可以给根节点添加脚本来协调所有子节点。
+这个场景包含 4 个节点,形成树状结构。CharacterBody2D 是根节点,其他三个是它的子节点。我们可以给每个节点添加脚本来控制它的行为,也可以给根节点添加脚本来协调所有子节点。
 
-当我们在主场景中实例化这个Player场景时,Godot会创建这整个节点树的一个副本。我们可以创建多个玩家实例,每个实例都是独立的,有自己的位置、状态和行为。
+当我们在主场景中实例化这个 Player 场景时,Godot 会创建这整个节点树的一个副本。我们可以创建多个玩家实例,每个实例都是独立的,有自己的位置、状态和行为。
 
 <strong>场景实例化的优势</strong>
 
-在赛博小镇中,我们有三个NPC:张三、李四和王五。如果不使用场景系统,我们需要为每个NPC分别创建节点、设置属性、编写脚本,这会导致大量重复工作。而使用场景系统,我们只需要创建一个通用的NPC场景,然后实例化三次,通过脚本参数设置不同的名称和角色信息即可。
+在赛博小镇中,我们有三个 NPC:张三、李四和王五。如果不使用场景系统,我们需要为每个 NPC 分别创建节点、设置属性、编写脚本,这会导致大量重复工作。而使用场景系统,我们只需要创建一个通用的 NPC 场景,然后实例化三次,通过脚本参数设置不同的名称和角色信息即可。
 
-这种设计的好处是:如果我们想给所有NPC添加一个新功能(比如头顶显示对话气泡),只需要修改NPC场景,所有实例都会自动获得这个功能。
+这种设计的好处是:如果我们想给所有 NPC 添加一个新功能(比如头顶显示对话气泡),只需要修改 NPC 场景,所有实例都会自动获得这个功能。
 
-## 15.5 Godot游戏场景构建
+## 15.5 Godot 游戏场景构建
 
-<strong>为什么选择Godot作为游戏引擎?</strong>
+<strong>为什么选择 Godot 作为游戏引擎?</strong>
 
-在众多游戏引擎中,我们选择Godot 4.5作为前端引擎,主要基于以下几个考虑:
+在众多游戏引擎中,我们选择 Godot 4.5 作为前端引擎,主要基于以下几个考虑:
 
-(1)Godot在2D游戏开发上有着天然的优势</strong>。赛博小镇是一个俯视角的2D像素风格游戏,Godot的2D引擎非常成熟,提供了TileMap、AnimatedSprite2D、CharacterBody2D等专门为2D游戏设计的节点类型,开发效率远高于Unity等引擎。Godot的场景系统(Scene System)让我们可以将玩家、NPC、UI等元素封装成独立的场景,然后在主场景中实例化,这种组件化的设计非常适合我们的需求。
+(1)Godot 在 2D 游戏开发上有着天然的优势</strong>。赛博小镇是一个俯视角的 2D 像素风格游戏,Godot 的 2D 引擎非常成熟,提供了 TileMap、AnimatedSprite2D、CharacterBody2D 等专门为 2D 游戏设计的节点类型,开发效率远高于 Unity 等引擎。Godot 的场景系统(Scene System)让我们可以将玩家、NPC、UI 等元素封装成独立的场景,然后在主场景中实例化,这种组件化的设计非常适合我们的需求。
 
-(2)<strong>Godot是完全开源且免费的</strong>。Godot使用MIT许可证,没有任何版权费用或收入分成,这对于教学项目和开源项目非常友好。你可以自由地修改引擎源码,也可以将游戏商业化而不用担心授权问题。相比之下,Unity虽然功能强大,但在2024年引入了运行时费用政策,引发了开发者社区的广泛争议。
+(2)<strong>Godot 是完全开源且免费的</strong>。Godot 使用 MIT 许可证,没有任何版权费用或收入分成,这对于教学项目和开源项目非常友好。你可以自由地修改引擎源码,也可以将游戏商业化而不用担心授权问题。相比之下,Unity 虽然功能强大,但在 2024 年引入了运行时费用政策,引发了开发者社区的广泛争议。
 
-(3)<strong>Godot的学习成本极低</strong>。Godot使用GDScript作为主要脚本语言,这是一种类似Python的动态类型语言,语法简洁易懂,学习曲线非常平缓。对于已经熟悉Python的读者来说,学习GDScript几乎没有门槛——变量声明、函数定义、控制流程等语法都与Python高度相似,你甚至可以在几小时内就上手编写游戏脚本。Godot的节点树结构也非常直观,你可以在编辑器中直观地看到场景的层级关系,这对于初学者非常友好。
+(3)<strong>Godot 的学习成本极低</strong>。Godot 使用 GDScript 作为主要脚本语言,这是一种类似 Python 的动态类型语言,语法简洁易懂,学习曲线非常平缓。对于已经熟悉 Python 的读者来说,学习 GDScript 几乎没有门槛——变量声明、函数定义、控制流程等语法都与 Python 高度相似,你甚至可以在几小时内就上手编写游戏脚本。Godot 的节点树结构也非常直观,你可以在编辑器中直观地看到场景的层级关系,这对于初学者非常友好。
 
-(4)<strong>Godot与Python后端的集成非常简单</strong>。Godot内置了HTTPRequest节点,可以轻松地与FastAPI后端进行HTTP通信。我们只需要创建一个API客户端脚本,封装所有的API调用,就可以在游戏中调用后端的AI能力。这种前后端分离的架构让我们可以独立开发和测试游戏逻辑和AI逻辑,大大提高了开发效率。
+(4)<strong>Godot 与 Python 后端的集成非常简单</strong>。Godot 内置了 HTTPRequest 节点,可以轻松地与 FastAPI 后端进行 HTTP 通信。我们只需要创建一个 API 客户端脚本,封装所有的 API 调用,就可以在游戏中调用后端的 AI 能力。这种前后端分离的架构让我们可以独立开发和测试游戏逻辑和 AI 逻辑,大大提高了开发效率。
 
-当然,Godot也有一些局限性。比如,Godot的3D能力相比Unreal Engine和Unity还有差距,如果你要开发大型3D游戏,可能需要考虑其他引擎。但对于2D游戏、独立游戏和教学项目,Godot是一个非常优秀的选择。
+当然,Godot 也有一些局限性。比如,Godot 的 3D 能力相比 Unreal Engine 和 Unity 还有差距,如果你要开发大型 3D 游戏,可能需要考虑其他引擎。但对于 2D 游戏、独立游戏和教学项目,Godot 是一个非常优秀的选择。
 
 ### 15.5.1 场景设计与资源组织
 
-理解了Godot的场景系统后,我们来看看赛博小镇的场景设计。整个游戏由四个核心场景组成:Main(主场景)、Player(玩家)、NPC(非玩家角色)和DialogueUI(对话界面)。每个场景都是一个独立的模块,可以单独编辑和测试,然后组合在一起形成完整的游戏。
+理解了 Godot 的场景系统后,我们来看看赛博小镇的场景设计。整个游戏由四个核心场景组成:Main(主场景)、Player(玩家)、NPC(非玩家角色)和 DialogueUI(对话界面)。每个场景都是一个独立的模块,可以单独编辑和测试,然后组合在一起形成完整的游戏。
 
-赛博小镇的场景组织采用了模块化设计。我们首先创建三个基础场景:Player(玩家)、NPC(非玩家角色)和DialogueUI(对话界面)。然后在Main(主场景)中将这些场景实例化并组合起来。特别值得注意的是,三个NPC(张三、李四、王五)都是同一个NPC场景的实例,只是通过脚本参数设置了不同的角色信息。
+赛博小镇的场景组织采用了模块化设计。我们首先创建三个基础场景:Player(玩家)、NPC(非玩家角色)和 DialogueUI(对话界面)。然后在 Main(主场景)中将这些场景实例化并组合起来。特别值得注意的是,三个 NPC(张三、李四、王五)都是同一个 NPC 场景的实例,只是通过脚本参数设置了不同的角色信息。
 
-让我们先看看四个核心场景的结构,如图15.12所示:
+让我们先看看四个核心场景的结构,如图 15.12 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-12.png" alt="" width="85%"/>
@@ -1054,27 +1058,27 @@ Player (CharacterBody2D)  ← 根节点,负责物理移动
 </div>
 
 
-这个图展示了四个独立的场景及其内部结构。<strong>场景1(Main)</strong>是主场景,它包含了背景图片(Sprite2D)、玩家实例、NPCs组织节点(下面有三个NPC实例)、对话界面实例、墙体组织节点和背景音乐。注意,这里的Player、NPC_Zhang、NPC_Li、NPC_Wang和DialogueUI都是场景实例,不是普通节点。<strong>场景2(Player)</strong>定义了玩家角色的结构,包含动画、碰撞、摄像机和两个音效节点。<strong>场景3(NPC)</strong>是一个通用模板,张三、李四、王五都是这个场景的实例,包含碰撞、动画、交互区域和两个标签。<strong>场景4(DialogueUI)</strong>是一个CanvasLayer节点,包含Panel和各种UI元素。
+这个图展示了四个独立的场景及其内部结构。<strong>场景 1(Main)</strong>是主场景,它包含了背景图片(Sprite2D)、玩家实例、NPCs 组织节点(下面有三个 NPC 实例)、对话界面实例、墙体组织节点和背景音乐。注意,这里的 Player、NPC_Zhang、NPC_Li、NPC_Wang 和 DialogueUI 都是场景实例,不是普通节点。<strong>场景 2(Player)</strong>定义了玩家角色的结构,包含动画、碰撞、摄像机和两个音效节点。<strong>场景 3(NPC)</strong>是一个通用模板,张三、李四、王五都是这个场景的实例,包含碰撞、动画、交互区域和两个标签。<strong>场景 4(DialogueUI)</strong>是一个 CanvasLayer 节点,包含 Panel 和各种 UI 元素。
 
-场景实例化的过程可以这样理解:我们在Godot编辑器中创建了NPC.tscn这个场景文件,定义了NPC的节点结构。然后在Main场景中,我们三次"实例化"这个NPC场景,创建了三个独立的副本,分别命名为NPC_Zhang、NPC_Li和NPC_Wang。每个副本都有自己的位置和状态,但它们共享相同的节点结构。如果我们修改NPC.tscn,比如给NPC添加一个新的音效节点,那么所有三个实例都会自动获得这个音效。
+场景实例化的过程可以这样理解:我们在 Godot 编辑器中创建了 NPC.tscn 这个场景文件,定义了 NPC 的节点结构。然后在 Main 场景中,我们三次"实例化"这个 NPC 场景,创建了三个独立的副本,分别命名为 NPC_Zhang、NPC_Li 和 NPC_Wang。每个副本都有自己的位置和状态,但它们共享相同的节点结构。如果我们修改 NPC.tscn,比如给 NPC 添加一个新的音效节点,那么所有三个实例都会自动获得这个音效。
 
-在Godot中创建这些场景的步骤如下:
+在 Godot 中创建这些场景的步骤如下:
 
-1. <strong>创建Player场景</strong>:新建场景,选择CharacterBody2D作为根节点,添加AnimatedSprite2D、CollisionShape2D、Camera2D、InteractSound和RunningSound子节点,保存为Player.tscn。
+1. <strong>创建 Player 场景</strong>:新建场景,选择 CharacterBody2D 作为根节点,添加 AnimatedSprite2D、CollisionShape2D、Camera2D、InteractSound 和 RunningSound 子节点,保存为 Player.tscn。
 
-2. <strong>创建NPC场景</strong>:新建场景,选择CharacterBody2D作为根节点,添加CollisionShape2D、AnimatedSprite2D、InteractionArea(Area2D,下面有CollisionShape2D)、NameLabel和DialogueLabel子节点,保存为NPC.tscn。
+2. <strong>创建 NPC 场景</strong>:新建场景,选择 CharacterBody2D 作为根节点,添加 CollisionShape2D、AnimatedSprite2D、InteractionArea(Area2D,下面有 CollisionShape2D)、NameLabel 和 DialogueLabel 子节点,保存为 NPC.tscn。
 
-3. <strong>创建DialogueUI场景</strong>:新建场景,选择CanvasLayer作为根节点,添加Panel子节点,在Panel下添加NPCName、NPCTitle、DialogueText(RichTextLabel)、PlayerInput(LineEdit)、SendButton和CloseButton,保存为DialogueUI.tscn。
+3. <strong>创建 DialogueUI 场景</strong>:新建场景,选择 CanvasLayer 作为根节点,添加 Panel 子节点,在 Panel 下添加 NPCName、NPCTitle、DialogueText(RichTextLabel)、PlayerInput(LineEdit)、SendButton 和 CloseButton,保存为 DialogueUI.tscn。
 
-4. <strong>创建Main场景</strong>:新建场景,选择Node2D作为根节点,添加Background(Sprite2D)作为背景图,在Background下添加小鲸鱼装饰,然后实例化Player场景,创建NPCs节点并在其下三次实例化NPC场景,实例化DialogueUI场景,创建Walls节点用于组织墙体碰撞,最后添加AudioStreamPlayer播放背景音乐。
+4. <strong>创建 Main 场景</strong>:新建场景,选择 Node2D 作为根节点,添加 Background(Sprite2D)作为背景图,在 Background 下添加小鲸鱼装饰,然后实例化 Player 场景,创建 NPCs 节点并在其下三次实例化 NPC 场景,实例化 DialogueUI 场景,创建 Walls 节点用于组织墙体碰撞,最后添加 AudioStreamPlayer 播放背景音乐。
 
-这种场景组织方式的优势在于:每个场景都是独立的,可以单独测试;NPC使用同一个场景的实例,修改一次就能影响所有NPC;场景之间通过信号通信,耦合度低,易于维护和扩展。
+这种场景组织方式的优势在于:每个场景都是独立的,可以单独测试;NPC 使用同一个场景的实例,修改一次就能影响所有 NPC;场景之间通过信号通信,耦合度低,易于维护和扩展。
 
 ### 15.5.2 玩家控制实现
 
-玩家角色是游戏中最重要的元素之一。我们需要实现WASD移动控制、动画切换、碰撞检测、与NPC的交互,以及音效系统。
+玩家角色是游戏中最重要的元素之一。我们需要实现 WASD 移动控制、动画切换、碰撞检测、与 NPC 的交互,以及音效系统。
 
-玩家场景的结构包括:一个CharacterBody2D作为根节点,负责物理移动和碰撞;一个AnimatedSprite2D显示角色动画;一个CollisionShape2D定义碰撞形状;一个Camera2D跟随玩家;两个AudioStreamPlayer分别播放交互音效和走路音效。
+玩家场景的结构包括:一个 CharacterBody2D 作为根节点,负责物理移动和碰撞;一个 AnimatedSprite2D 显示角色动画;一个 CollisionShape2D 定义碰撞形状;一个 Camera2D 跟随玩家;两个 AudioStreamPlayer 分别播放交互音效和走路音效。
 
 玩家控制脚本`player.gd`实现了移动、交互和音效逻辑:
 
@@ -1239,15 +1243,15 @@ func stop_running_sound():
         is_playing_running_sound = false
 ```
 
-这个脚本实现了完整的玩家控制。玩家使用WASD键(或方向键)移动,角色会根据移动方向播放相应的4方向动画(walk_up/down/left/right)。当玩家靠近NPC时,NPC会调用`set_nearby_npc()`设置自己为可交互对象,玩家按E键即可触发交互。交互时会播放音效,并通过`call_group()`通知对话系统开始对话。对话期间,`set_interacting(true)`会禁用玩家移动,对话结束后恢复移动。走路音效会在玩家移动时自动播放,停止时自动停止。
+这个脚本实现了完整的玩家控制。玩家使用 WASD 键(或方向键)移动,角色会根据移动方向播放相应的 4 方向动画(walk_up/down/left/right)。当玩家靠近 NPC 时,NPC 会调用`set_nearby_npc()`设置自己为可交互对象,玩家按 E 键即可触发交互。交互时会播放音效,并通过`call_group()`通知对话系统开始对话。对话期间,`set_interacting(true)`会禁用玩家移动,对话结束后恢复移动。走路音效会在玩家移动时自动播放,停止时自动停止。
 
-### 15.5.3 NPC行为与交互
+### 15.5.3 NPC 行为与交互
 
-NPC需要实现三个核心功能:在场景中随机巡逻游走、响应玩家的交互、显示对话气泡。我们使用Area2D来检测玩家是否靠近NPC,当玩家进入交互范围时通知玩家,玩家按E键即可开始对话。
+NPC 需要实现三个核心功能:在场景中随机巡逻游走、响应玩家的交互、显示对话气泡。我们使用 Area2D 来检测玩家是否靠近 NPC,当玩家进入交互范围时通知玩家,玩家按 E 键即可开始对话。
 
-NPC场景的结构包括:CharacterBody2D作为根节点;CollisionShape2D定义NPC的碰撞形状;AnimatedSprite2D显示NPC动画;InteractionArea(Area2D)检测玩家进入交互范围,下面有CollisionShape2D定义交互范围;NameLabel显示NPC名字;DialogueLabel显示对话气泡。
+NPC 场景的结构包括:CharacterBody2D 作为根节点;CollisionShape2D 定义 NPC 的碰撞形状;AnimatedSprite2D 显示 NPC 动画;InteractionArea(Area2D)检测玩家进入交互范围,下面有 CollisionShape2D 定义交互范围;NameLabel 显示 NPC 名字;DialogueLabel 显示对话气泡。
 
-NPC脚本`npc.gd`实现了巡逻、交互和对话气泡逻辑:
+NPC 脚本`npc.gd`实现了巡逻、交互和对话气泡逻辑:
 
 ```python
 extends CharacterBody2D
@@ -1443,16 +1447,16 @@ func set_interacting(interacting: bool):
     is_interacting = interacting
 ```
 
-这个脚本实现了NPC的完整行为。NPC会在出生位置附近的`wander_range`范围内随机巡逻,每隔`wander_interval_min`到`wander_interval_max`秒选择一个新的目标点并移动过去。移动时会播放4方向动画(walk_up/down/left/right),到达目标后停止并播放idle动画。当玩家进入InteractionArea时,NPC会调用玩家的`set_nearby_npc(self)`方法,将自己设置为可交互对象。玩家按E键后,对话系统会调用NPC的`set_interacting(true)`方法,NPC停止移动。对话结束后调用`set_interacting(false)`,NPC恢复巡逻。主场景会定时调用`update_dialogue()`方法更新NPC的对话气泡,显示NPC之间的自主对话内容。
+这个脚本实现了 NPC 的完整行为。NPC 会在出生位置附近的`wander_range`范围内随机巡逻,每隔`wander_interval_min`到`wander_interval_max`秒选择一个新的目标点并移动过去。移动时会播放 4 方向动画(walk_up/down/left/right),到达目标后停止并播放 idle 动画。当玩家进入 InteractionArea 时,NPC 会调用玩家的`set_nearby_npc(self)`方法,将自己设置为可交互对象。玩家按 E 键后,对话系统会调用 NPC 的`set_interacting(true)`方法,NPC 停止移动。对话结束后调用`set_interacting(false)`,NPC 恢复巡逻。主场景会定时调用`update_dialogue()`方法更新 NPC 的对话气泡,显示 NPC 之间的自主对话内容。
 
 
 ## 15.6 前后端通信实现
 
-### 15.6.1 API客户端封装
+### 15.6.1 API 客户端封装
 
-Godot前端需要与FastAPI后端进行HTTP通信。我们创建一个API客户端脚本`api_client.gd`,封装所有的API调用,并将其设置为AutoLoad(自动加载)单例,让其他脚本可以方便地使用。
+Godot 前端需要与 FastAPI 后端进行 HTTP 通信。我们创建一个 API 客户端脚本`api_client.gd`,封装所有的 API 调用,并将其设置为 AutoLoad(自动加载)单例,让其他脚本可以方便地使用。
 
-API客户端使用Godot的HTTPRequest节点来发送HTTP请求。HTTPRequest是一个异步节点,发送请求后不会阻塞游戏,而是通过信号通知请求完成。这样可以保证游戏的流畅性,即使网络延迟较高也不会卡顿。我们使用信号机制来通知其他脚本API响应,而不是使用await,这样可以让多个脚本同时监听同一个API响应。
+API 客户端使用 Godot 的 HTTPRequest 节点来发送 HTTP 请求。HTTPRequest 是一个异步节点,发送请求后不会阻塞游戏,而是通过信号通知请求完成。这样可以保证游戏的流畅性,即使网络延迟较高也不会卡顿。我们使用信号机制来通知其他脚本 API 响应,而不是使用 await,这样可以让多个脚本同时监听同一个 API 响应。
 
 ```python
 # api_client.gd
@@ -1593,23 +1597,23 @@ func _on_npcs_request_completed(_result: int, response_code: int, _headers: Pack
         npc_list_received.emit(npcs)
 ```
 
-这个API客户端封装了三个核心功能:发送对话请求(`send_chat`)、获取NPC状态(`get_npc_status`)和获取NPC列表(`get_npc_list`)。所有的HTTP请求都是异步的,通过信号通知响应结果。我们为每个API创建了独立的HTTPRequest节点,这样可以同时发送多个请求而不会互相干扰。API的URL从Config单例中获取,方便统一管理。对话系统监听`chat_response_received`信号来接收NPC回复,主场景监听`npc_status_received`信号来更新NPC对话气泡。
+这个 API 客户端封装了三个核心功能:发送对话请求(`send_chat`)、获取 NPC 状态(`get_npc_status`)和获取 NPC 列表(`get_npc_list`)。所有的 HTTP 请求都是异步的,通过信号通知响应结果。我们为每个 API 创建了独立的 HTTPRequest 节点,这样可以同时发送多个请求而不会互相干扰。API 的 URL 从 Config 单例中获取,方便统一管理。对话系统监听`chat_response_received`信号来接收 NPC 回复,主场景监听`npc_status_received`信号来更新 NPC 对话气泡。
 
-### 15.6.2 对话UI实现
+### 15.6.2 对话 UI 实现
 
-对话UI是玩家与NPC交互的界面。我们需要设计一个简洁美观的对话框,包含NPC名称、职位、对话内容显示、输入框和按钮。
+对话 UI 是玩家与 NPC 交互的界面。我们需要设计一个简洁美观的对话框,包含 NPC 名称、职位、对话内容显示、输入框和按钮。
 
-对话UI的结构如图15.13所示:
+对话 UI 的结构如图 15.13 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-13.png" alt="" width="85%"/>
-  <p>图 15.13 对话UI结构</p>
+  <p>图 15.13 对话 UI 结构</p>
 </div>
 
 
-对话UI的设计非常简洁。DialogueUI是一个CanvasLayer节点,这意味着它会始终显示在游戏画面的最上层,不会被其他游戏对象遮挡。Panel是对话框的背景,锚定在屏幕底部。Panel下直接放置了6个UI元素:NPCName显示NPC的名字,NPCTitle显示职位,DialogueText使用RichTextLabel显示对话内容(支持富文本格式),PlayerInput是一个LineEdit用于玩家输入,SendButton和CloseButton分别用于发送消息和关闭对话框。
+对话 UI 的设计非常简洁。DialogueUI 是一个 CanvasLayer 节点,这意味着它会始终显示在游戏画面的最上层,不会被其他游戏对象遮挡。Panel 是对话框的背景,锚定在屏幕底部。Panel 下直接放置了 6 个 UI 元素:NPCName 显示 NPC 的名字,NPCTitle 显示职位,DialogueText 使用 RichTextLabel 显示对话内容(支持富文本格式),PlayerInput 是一个 LineEdit 用于玩家输入,SendButton 和 CloseButton 分别用于发送消息和关闭对话框。
 
-对话UI脚本`dialogue_ui.gd`实现了对话界面的逻辑:
+对话 UI 脚本`dialogue_ui.gd`实现了对话界面的逻辑:
 
 ```python
 # dialogue_ui.gd
@@ -1739,11 +1743,11 @@ func get_npc_title(npc_name: String) -> String:
     return titles.get(npc_name, "")
 ```
 
-这个对话UI实现了完整的对话功能。玩家可以输入消息并发送,UI使用RichTextLabel的append_text方法显示对话内容,支持富文本格式(颜色、粗体等)。所有的API调用都是异步的,在等待响应时会禁用输入框,防止重复发送。对话框显示时会通知玩家进入交互状态,禁用移动,关闭时恢复移动。
+这个对话 UI 实现了完整的对话功能。玩家可以输入消息并发送,UI 使用 RichTextLabel 的 append_text 方法显示对话内容,支持富文本格式(颜色、粗体等)。所有的 API 调用都是异步的,在等待响应时会禁用输入框,防止重复发送。对话框显示时会通知玩家进入交互状态,禁用移动,关闭时恢复移动。
 
 ### 15.6.3 主场景整合
 
-最后,我们需要在主场景中整合所有的功能:玩家控制、NPC交互、对话UI和NPC状态更新。主场景脚本`main.gd`负责协调这些组件,并定时从后端获取NPC状态,更新NPC的对话气泡。
+最后,我们需要在主场景中整合所有的功能:玩家控制、NPC 交互、对话 UI 和 NPC 状态更新。主场景脚本`main.gd`负责协调这些组件,并定时从后端获取 NPC 状态,更新 NPC 的对话气泡。
 
 ```python
 # main.gd
@@ -1809,9 +1813,9 @@ func get_npc_node(npc_name: String) -> Node2D:
             return null
 ```
 
-主场景脚本的核心功能是定时从后端获取NPC状态。在`_ready()`中,我们获取APIClient单例的引用,并连接`npc_status_received`信号。然后立即调用`get_npc_status()`获取一次NPC状态。在`_process()`中,我们使用计时器每隔`Config.NPC_STATUS_UPDATE_INTERVAL`秒(默认30秒)调用一次`get_npc_status()`。当收到NPC状态更新时,`_on_npc_status_received()`回调函数会遍历所有NPC,调用它们的`update_dialogue()`方法更新对话气泡。这样,即使玩家不与NPC交互,也能看到NPC之间的自主对话。
+主场景脚本的核心功能是定时从后端获取 NPC 状态。在`_ready()`中,我们获取 APIClient 单例的引用,并连接`npc_status_received`信号。然后立即调用`get_npc_status()`获取一次 NPC 状态。在`_process()`中,我们使用计时器每隔`Config.NPC_STATUS_UPDATE_INTERVAL`秒(默认 30 秒)调用一次`get_npc_status()`。当收到 NPC 状态更新时,`_on_npc_status_received()`回调函数会遍历所有 NPC,调用它们的`update_dialogue()`方法更新对话气泡。这样,即使玩家不与 NPC 交互,也能看到 NPC 之间的自主对话。
 
-整个前后端通信流程如图15.14所示:
+整个前后端通信流程如图 15.14 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-14.png" alt="" width="85%"/>
@@ -1819,36 +1823,36 @@ func get_npc_node(npc_name: String) -> Node2D:
 </div>
 
 
-至此,前后端通信的所有功能都已实现。玩家可以在游戏中自由移动,与NPC互动,进行自然语言对话。同时,主场景会定时从后端获取NPC状态,更新NPC的对话气泡,展示NPC之间的自主对话。整个系统使用信号机制进行通信,各个组件之间松耦合,易于维护和扩展。
+至此,前后端通信的所有功能都已实现。玩家可以在游戏中自由移动,与 NPC 互动,进行自然语言对话。同时,主场景会定时从后端获取 NPC 状态,更新 NPC 的对话气泡,展示 NPC 之间的自主对话。整个系统使用信号机制进行通信,各个组件之间松耦合,易于维护和扩展。
 
 
 ## 15.7 总结与展望
 
 ### 15.7.1 本章回顾
 
-在本章中,我们完成了一个完整的AI小镇项目——赛博小镇。这个项目将HelloAgents框架与Godot游戏引擎结合,创造出了一个充满生命力的虚拟世界。让我们回顾一下我们学到的核心内容。
+在本章中,我们完成了一个完整的 AI 小镇项目——赛博小镇。这个项目将 HelloAgents 框架与 Godot 游戏引擎结合,创造出了一个充满生命力的虚拟世界。让我们回顾一下我们学到的核心内容。
 
 <strong>技术架构设计</strong>
 
-我们采用了游戏引擎+后端服务的分离架构,将前端渲染、后端逻辑和AI智能分离到不同的层次。Godot负责游戏画面和玩家交互,FastAPI负责API服务和状态管理,HelloAgents负责NPC智能和记忆系统。这种分层设计让每个部分都可以独立开发和测试,也为后续的扩展提供了良好的基础。
+我们采用了游戏引擎+后端服务的分离架构,将前端渲染、后端逻辑和 AI 智能分离到不同的层次。Godot 负责游戏画面和玩家交互,FastAPI 负责 API 服务和状态管理,HelloAgents 负责 NPC 智能和记忆系统。这种分层设计让每个部分都可以独立开发和测试,也为后续的扩展提供了良好的基础。
 
-<strong>NPC智能体系统</strong>
+<strong>NPC 智能体系统</strong>
 
-我们使用HelloAgents的SimpleAgent为每个NPC创建了独立的智能体。每个NPC都有自己的角色设定、性格特点和记忆系统。通过精心设计的系统提示词,我们让张三成为了一位严谨的Python工程师,李四成为了一位善于沟通的产品经理,王五成为了一位富有创意的UI设计师。这些NPC不仅能够理解玩家的对话,还能根据自己的角色特点做出相应的回复。
+我们使用 HelloAgents 的 SimpleAgent 为每个 NPC 创建了独立的智能体。每个 NPC 都有自己的角色设定、性格特点和记忆系统。通过精心设计的系统提示词,我们让张三成为了一位严谨的 Python 工程师,李四成为了一位善于沟通的产品经理,王五成为了一位富有创意的 UI 设计师。这些 NPC 不仅能够理解玩家的对话,还能根据自己的角色特点做出相应的回复。
 
 <strong>记忆与好感度系统</strong>
 
-我们实现了两层记忆系统:短期记忆保持对话的连贯性,长期记忆存储所有的互动历史。通过向量数据库的语义检索,NPC可以回忆起之前讨论过的话题。好感度系统让NPC对玩家的态度随着互动而变化,从陌生到挚友,每个等级都有不同的行为表现。这些设计让NPC显得更加真实和有趣。
+我们实现了两层记忆系统:短期记忆保持对话的连贯性,长期记忆存储所有的互动历史。通过向量数据库的语义检索,NPC 可以回忆起之前讨论过的话题。好感度系统让 NPC 对玩家的态度随着互动而变化,从陌生到挚友,每个等级都有不同的行为表现。这些设计让 NPC 显得更加真实和有趣。
 
 <strong>游戏场景构建</strong>
 
-我们使用Godot创建了一个像素风格的办公室场景,实现了玩家控制、NPC游走、交互检测和对话UI。通过场景系统的模块化设计,我们可以轻松地添加新的NPC、新的场景和新的功能。GDScript的简洁语法让游戏逻辑的实现变得直观和高效。
+我们使用 Godot 创建了一个像素风格的办公室场景,实现了玩家控制、NPC 游走、交互检测和对话 UI。通过场景系统的模块化设计,我们可以轻松地添加新的 NPC、新的场景和新的功能。GDScript 的简洁语法让游戏逻辑的实现变得直观和高效。
 
 <strong>前后端通信</strong>
 
-我们使用HTTP REST API实现了Godot前端与FastAPI后端的通信。通过异步请求和信号系统,我们保证了游戏的流畅性,即使网络延迟较高也不会影响玩家体验。API客户端的封装让其他脚本可以方便地调用后端服务,对话UI的实现让玩家可以自然地与NPC交流。
+我们使用 HTTP REST API 实现了 Godot 前端与 FastAPI 后端的通信。通过异步请求和信号系统,我们保证了游戏的流畅性,即使网络延迟较高也不会影响玩家体验。API 客户端的封装让其他脚本可以方便地调用后端服务,对话 UI 的实现让玩家可以自然地与 NPC 交流。
 
-整个项目的技术栈如图15.15所示:
+整个项目的技术栈如图 15.15 所示:
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/15-figures/15-15.png" alt="" width="85%"/>
@@ -1858,42 +1862,42 @@ func get_npc_node(npc_name: String) -> Node2D:
 
 ### 15.7.2 扩展方向
 
-赛博小镇只是一个起点,还有很多可以扩展的方向。这些扩展不仅能够增强游戏的趣味性,也能探索AI技术在游戏中的更多可能性。
+赛博小镇只是一个起点,还有很多可以扩展的方向。这些扩展不仅能够增强游戏的趣味性,也能探索 AI 技术在游戏中的更多可能性。
 
 <strong>(1)多人在线支持</strong>
 
-目前的赛博小镇是单人游戏,但我们可以将其扩展为多人在线游戏。多个玩家可以同时进入同一个办公室,与NPC和其他玩家互动。这需要引入WebSocket进行实时通信,以及数据库来持久化玩家数据和NPC状态。NPC可以记住与不同玩家的互动,对每个玩家保持独立的好感度。
+目前的赛博小镇是单人游戏,但我们可以将其扩展为多人在线游戏。多个玩家可以同时进入同一个办公室,与 NPC 和其他玩家互动。这需要引入 WebSocket 进行实时通信,以及数据库来持久化玩家数据和 NPC 状态。NPC 可以记住与不同玩家的互动,对每个玩家保持独立的好感度。
 
 <strong>(2)任务系统</strong>
 
-我们可以为NPC设计任务系统。当玩家与NPC的好感度达到一定程度时,NPC会提供特殊任务。比如张三可能会请玩家帮忙调试一段代码,李四可能会请玩家收集用户反馈,王五可能会请玩家评价设计方案。完成任务可以获得奖励,也能进一步提升好感度。
+我们可以为 NPC 设计任务系统。当玩家与 NPC 的好感度达到一定程度时,NPC 会提供特殊任务。比如张三可能会请玩家帮忙调试一段代码,李四可能会请玩家收集用户反馈,王五可能会请玩家评价设计方案。完成任务可以获得奖励,也能进一步提升好感度。
 
-<strong>(3)NPC之间的互动</strong>
+<strong>(3)NPC 之间的互动</strong>
 
-目前NPC只与玩家互动,但我们可以让NPC之间也能互动。张三可以和李四讨论产品需求,李四可以和王五讨论界面设计,王五可以和张三讨论技术实现。这些互动可以在后台自动进行,玩家可以观察到NPC之间的对话,让整个世界显得更加生动。
+目前 NPC 只与玩家互动,但我们可以让 NPC 之间也能互动。张三可以和李四讨论产品需求,李四可以和王五讨论界面设计,王五可以和张三讨论技术实现。这些互动可以在后台自动进行,玩家可以观察到 NPC 之间的对话,让整个世界显得更加生动。
 
 <strong>(4)情感系统</strong>
 
-除了好感度,我们还可以为NPC添加更复杂的情感系统。NPC可以有开心、难过、生气、兴奋等不同的情绪状态,这些情绪会影响NPC的回复风格和行为。比如当NPC心情好的时候,会更愿意分享信息;当NPC心情不好的时候,可能会比较冷淡。
+除了好感度,我们还可以为 NPC 添加更复杂的情感系统。NPC 可以有开心、难过、生气、兴奋等不同的情绪状态,这些情绪会影响 NPC 的回复风格和行为。比如当 NPC 心情好的时候,会更愿意分享信息;当 NPC 心情不好的时候,可能会比较冷淡。
 
 <strong>(5)动态事件系统</strong>
 
-我们可以设计一些动态事件,让游戏世界更加丰富。比如定期举办团队会议,所有NPC和玩家聚在一起讨论项目进展;或者举办生日派对,庆祝某个NPC的生日;或者突发紧急任务,需要大家协作完成。这些事件可以增加游戏的变化性和趣味性。
+我们可以设计一些动态事件,让游戏世界更加丰富。比如定期举办团队会议,所有 NPC 和玩家聚在一起讨论项目进展;或者举办生日派对,庆祝某个 NPC 的生日;或者突发紧急任务,需要大家协作完成。这些事件可以增加游戏的变化性和趣味性。
 
 <strong>(6)更大的世界</strong>
 
-目前的赛博小镇只有一个办公室场景,但我们可以扩展到更大的世界。可以添加咖啡厅、图书馆、公园等不同的场景,每个场景有不同的NPC和互动方式。玩家可以在不同场景之间移动,探索更广阔的虚拟世界。
+目前的赛博小镇只有一个办公室场景,但我们可以扩展到更大的世界。可以添加咖啡厅、图书馆、公园等不同的场景,每个场景有不同的 NPC 和互动方式。玩家可以在不同场景之间移动,探索更广阔的虚拟世界。
 
 <strong>(7)个性化学习</strong>
 
-NPC可以学习每个玩家的偏好和习惯。比如如果玩家经常和张三讨论Python,NPC会记住玩家对编程感兴趣,以后会主动分享相关的内容。如果玩家喜欢在晚上玩游戏,NPC会记住这个时间习惯,在晚上更加活跃。
+NPC 可以学习每个玩家的偏好和习惯。比如如果玩家经常和张三讨论 Python,NPC 会记住玩家对编程感兴趣,以后会主动分享相关的内容。如果玩家喜欢在晚上玩游戏,NPC 会记住这个时间习惯,在晚上更加活跃。
 
 ### 15.7.3 思考与展望
 
-赛博小镇展示了AI技术在游戏中的巨大潜力。传统游戏中的NPC受限于预设的对话树和脚本,而AI NPC可以理解和生成自然语言,与玩家进行真正的对话。这不仅提升了游戏的沉浸感,也为游戏设计带来了新的可能性。
+赛博小镇展示了 AI 技术在游戏中的巨大潜力。传统游戏中的 NPC 受限于预设的对话树和脚本,而 AI NPC 可以理解和生成自然语言,与玩家进行真正的对话。这不仅提升了游戏的沉浸感,也为游戏设计带来了新的可能性。
 
-但AI NPC也面临一些挑战。首先是成本问题,每次对话都需要调用LLM API,这会产生一定的费用。对于大型多人在线游戏,这个成本可能会很高。其次是延迟问题,LLM的推理需要时间,如果网络延迟较高,玩家可能需要等待几秒才能看到NPC的回复。最后是内容控制问题,LLM生成的内容可能不完全可控,需要设计好的提示词和内容过滤机制。
+但 AI NPC 也面临一些挑战。首先是成本问题,每次对话都需要调用 LLM API,这会产生一定的费用。对于大型多人在线游戏,这个成本可能会很高。其次是延迟问题,LLM 的推理需要时间,如果网络延迟较高,玩家可能需要等待几秒才能看到 NPC 的回复。最后是内容控制问题,LLM 生成的内容可能不完全可控,需要设计好的提示词和内容过滤机制。
 
-尽管有这些挑战,AI NPC的未来仍然充满希望。随着LLM技术的发展,推理速度会越来越快,成本会越来越低。本地化的小型LLM也在快速发展,未来可能可以在玩家的设备上直接运行,完全不需要网络请求。AI技术与游戏的结合,将为玩家带来前所未有的体验。
+尽管有这些挑战,AI NPC 的未来仍然充满希望。随着 LLM 技术的发展,推理速度会越来越快,成本会越来越低。本地化的小型 LLM 也在快速发展,未来可能可以在玩家的设备上直接运行,完全不需要网络请求。AI 技术与游戏的结合,将为玩家带来前所未有的体验。
 
 在第五部分的毕业设计章节,我们将会学习如何用单智能体和多智能体构造通用智能体,这将是你的创作时间,敬请期待!

+ 1011 - 0
docs/chapter16/Chapter16-Graduation-Project.md

@@ -0,0 +1,1011 @@
+<div align="right">
+  English | <a href="./第十六章%20毕业设计.md">中文</a>
+</div>
+
+# Chapter 16: Graduation Project - Building Your Own Multi-Agent Application
+
+Congratulations on reaching the final chapter of the Hello-Agents tutorial! In the previous 15 chapters, we built the HelloAgents framework from scratch and learned about core agent concepts, multiple paradigms, tool systems, memory mechanisms, communication protocols, reinforcement learning training, and performance evaluation. In Chapters 13-15, we also demonstrated how to integrate all learned knowledge through three complete practical projects (Intelligent Travel Assistant, Automated Deep Research Agent, and Cyber Town).
+
+Now, it's time for you to become a true agent system builder! This chapter will guide you in **building your own multi-agent application** and sharing your achievements with the community through open-source collaboration.
+
+## 16.1 The Significance of the Graduation Project
+
+### 16.1.1 Why Do a Graduation Project
+
+The best way to learn a technology is not to read tutorials but to **practice hands-on**. Through the previous chapters, you have mastered the theoretical knowledge and technical tools for building agent systems. The real challenge, however, lies in three questions: **How do you apply this knowledge to real problems? How do you design a complete system? How do you handle edge cases and exceptions?**
+
+The core value of the graduation project is to develop your ability to apply what you have learned as a whole: you selectively integrate the knowledge from previous chapters (agent paradigms, tool systems, memory mechanisms, communication protocols, etc.) into one complete project.
+
+By the end of this chapter, we hope you will be able to independently design and implement a complete agent application, use the features of the HelloAgents framework fluently, handle basic Git and GitHub operations, write clear project documentation, take part in collaborative open-source development, and come away with a technical work you can showcase.
+
+### 16.1.2 Form of the Graduation Project
+
+Your graduation project will be submitted to the Hello-Agents co-creation project repository (`Co-creation-projects` directory) in the form of an **open-source project**. Specific requirements are as follows:
+
+1. **Project Naming**: Use the format `{your-GitHub-username}-{project-name}`, for example `jjyaoao-CodeReviewAgent`
+
+2. **Project Content**:
+   - A runnable Jupyter Notebook (`.ipynb` file) or Python script
+   - Complete dependency list (`requirements.txt`)
+   - Clear README documentation (`README.md`)
+   - Optional: demo videos, screenshots, datasets, etc.
+
+3. **Submission Method**: Submit via GitHub Pull Request (PR)
+
+4. **Review Process**: Community members will review your code and suggest improvements; once approved, your project is merged into the main repository
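The naming rule in item 1 can be checked with a small helper before you create the folder. This is a minimal sketch, not part of the Hello-Agents tooling: `project_dir_name` is a hypothetical function, the username pattern reflects GitHub's published username rules (alphanumerics and single hyphens, no leading or trailing hyphen, at most 39 characters), and the project-name pattern is an assumption chosen for illustration.

```python
import re

# GitHub usernames: alphanumerics or single hyphens, no leading/trailing hyphen, max 39 chars.
USERNAME_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}$")
# Assumed rule for the project part: letters, digits and underscores only (no spaces).
PROJECT_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_]*$")


def project_dir_name(username: str, project_name: str) -> str:
    """Build the folder name `{username}-{project_name}` after validating both parts."""
    if not USERNAME_RE.match(username):
        raise ValueError(f"not a valid GitHub username: {username!r}")
    if not PROJECT_RE.match(project_name):
        raise ValueError(f"project name should have no spaces or symbols: {project_name!r}")
    return f"{username}-{project_name}"


if __name__ == "__main__":
    print(project_dir_name("jjyaoao", "CodeReviewAgent"))
```

Calling it with the example from the text yields the folder name used throughout this chapter; a project name containing spaces raises `ValueError` instead of silently producing a path that is awkward to use in a shell.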
+
+## 16.2 Project Topic Selection Guide
+
+### 16.2.1 Topic Selection Principles
+
+A good graduation project should be practical: it should solve a real problem rather than pursue technology for its own sake. It should also be achievable within limited time and resources while clearly demonstrating your technical capabilities.
+
+### 16.2.2 Recommended Topic Directions
+
+Here are some recommended project directions; you can choose one or propose your own idea:
+
+**(1) Productivity Tools**
+
+- **Intelligent Code Review Assistant**: Automatically analyze code quality, discover potential bugs, provide optimization suggestions
+- **Intelligent Documentation Generator**: Automatically generate API documentation and user manuals based on code
+- **Intelligent Meeting Assistant**: Record meeting content, generate meeting minutes, extract action items
+- **Intelligent Email Assistant**: Automatically classify emails, generate reply drafts, remind of important matters
+
+**(2) Learning Assistance**
+
+- **Intelligent Learning Partner**: Recommend learning resources based on learning progress, generate practice questions, answer questions
+- **Intelligent Paper Assistant**: Help find literature, summarize papers, generate citations
+- **Intelligent Programming Tutor**: Provide programming exercises, code review, learning path planning
+- **Intelligent Language Learning Assistant**: Provide conversation practice, grammar correction, vocabulary expansion
+
+**(3) Creative Entertainment**
+
+- **Intelligent Story Generator**: Generate novels, scripts, poetry based on user input
+- **Intelligent Game NPC**: Create game characters with personality who can naturally converse with players
+- **Intelligent Music Recommendation**: Recommend music based on mood and scene, generate playlists
+- **Intelligent Recipe Assistant**: Recommend recipes based on ingredients and taste, generate shopping lists
+
+**(4) Data Analysis**
+
+- **Intelligent Data Analyst**: Automatically analyze data, generate visualization charts, write analysis reports
+- **Intelligent Stock Analysis**: Analyze stock data and news sentiment, provide investment advice
+- **Intelligent Public Opinion Monitoring**: Monitor social media and news websites, analyze public opinion trends
+- **Intelligent Competitive Analysis**: Collect competitor information, comparative analysis, generate reports
+
+**(5) Life Services**
+
+- **Intelligent Health Assistant**: Record health data, provide health advice, create exercise plans
+- **Intelligent Financial Assistant**: Record income and expenses, analyze spending habits, provide financial advice
+- **Intelligent Shopping Assistant**: Compare prices, recommend products, generate shopping lists
+- **Intelligent Home Control**: Control smart home devices through natural language
+
+### 16.2.3 Topic Selection Example
+
+Let's illustrate how to select a topic and design a project through a specific example.
+
+**Project Name**: Intelligent Code Review Assistant (CodeReviewAgent)
+
+**Problem Analysis**: Code review is an important part of software development, but manual review is time-consuming and prone to missing issues. Existing static analysis tools mainly catch syntax errors and rule-based patterns and have little grasp of what the code is meant to do, so there is room for an intelligent assistant that can understand code semantics and provide in-depth analysis.
+
+**Core Functions**: This project will implement code quality analysis (check code style, naming conventions, comment completeness), potential bug detection (discover logic errors, boundary condition issues, resource leaks), performance optimization suggestions (identify performance bottlenecks, propose optimization solutions), security vulnerability scanning (detect SQL injection, XSS and other security issues), and best practice recommendations (propose improvements based on language features and design patterns).
+
+**Expected Outcomes**: The final deliverable will be a runnable Jupyter Notebook demonstrating the complete review process, supporting mainstream languages like Python and JavaScript, capable of generating structured Markdown format review reports, and providing specific code examples and improvement suggestions.
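To make the "structured Markdown review report" concrete, here is a minimal sketch of how review findings could be rendered. Everything in it is hypothetical: the `Finding` fields, the category names and the report layout are illustrative choices, not a format defined by HelloAgents or by this chapter. In a real project the findings would come from the agent's analysis rather than being written by hand.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One review comment (hypothetical schema for illustration)."""
    category: str    # e.g. "Security", "Performance"
    severity: str    # e.g. "high", "medium", "low"
    line: int        # line number in the reviewed file
    message: str     # what is wrong
    suggestion: str  # how to fix it


def render_report(filename: str, findings: list[Finding]) -> str:
    """Render findings as a Markdown report grouped by category."""
    lines = [f"# Code Review Report: {filename}", ""]
    if not findings:
        lines.append("No issues found.")
    # Group by category, keeping the order in which categories first appear.
    grouped: dict[str, list[Finding]] = {}
    for finding in findings:
        grouped.setdefault(finding.category, []).append(finding)
    for category, items in grouped.items():
        lines += [f"## {category}", ""]
        for item in sorted(items, key=lambda f: f.line):
            lines.append(f"- **[{item.severity}] line {item.line}**: {item.message}")
            lines.append(f"  - Suggestion: {item.suggestion}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    demo = [
        Finding("Security", "high", 42, "SQL built by string concatenation",
                "Use parameterized queries"),
        Finding("Code Quality", "low", 7, "Function lacks a docstring",
                "Add a short docstring"),
    ]
    print(render_report("sample_code.py", demo))
```

Keeping the report a plain data-to-text step, separate from the agent call, makes the output format easy to test without an LLM.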
+
+## 16.3 Development Environment Preparation
+
+### 16.3.1 Installing Necessary Tools
+
+Before starting development, please ensure your development environment has the following tools installed:
+
+**(1) Python Environment**
+
+```bash
+# Install HelloAgents
+pip install "hello-agents[all]"
+```
+
+**(2) Git and GitHub**
+
+```bash
+# Check Git version
+git --version
+
+# Configure Git user information
+git config --global user.name "Your Name"
+git config --global user.email "your.email@example.com"
+
+# Configure GitHub SSH key (recommended)
+# 1. Generate SSH key
+ssh-keygen -t ed25519 -C "your.email@example.com"
+
+# 2. Add public key to GitHub
+# Copy the content of ~/.ssh/id_ed25519.pub
+# Add in GitHub Settings > SSH and GPG keys
+
+# 3. Test connection
+ssh -T git@github.com
+```
+
+**(3) Jupyter Notebook**
+
+```bash
+# Install Jupyter
+pip install jupyter notebook
+
+# Or use JupyterLab (recommended)
+pip install jupyterlab
+
+# Start Jupyter
+jupyter lab
+```
+
+### 16.3.2 Fork the Project Repository
+
+**Step 1: Fork the Repository**
+
+1. Visit the Hello-Agents repository: https://github.com/datawhalechina/Hello-Agents
+2. Click the "Fork" button in the upper right corner, as shown in the red box in Figure 16.1
+3. Select your GitHub account and create the Fork
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-1.png" alt="" width="85%"/>
+  <p>Figure 16.1 Fork Repository Steps</p>
+</div>
+
+**Step 2: Clone to Local**
+
+```bash
+# As shown in Figure 16.2, clone your forked repository
+git clone git@github.com:your-username/Hello-Agents.git
+
+# Enter project directory
+cd Hello-Agents
+
+# Add upstream repository (for syncing updates)
+git remote add upstream https://github.com/datawhalechina/Hello-Agents.git
+
+# View remote repositories
+git remote -v
+```
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-2.png" alt="" width="85%"/>
+  <p>Figure 16.2 Clone Repository to Local</p>
+</div>
+
+**Step 3: Create Development Branch**
+
+```bash
+# Create and switch to new branch
+git checkout -b feature/your-project-name
+
+# For example:
+git checkout -b feature/code-review-agent
+```
+
+### 16.3.3 Project Directory Structure
+
+Create your project folder in the `Co-creation-projects` directory:
+
+```bash
+# Enter co-creation projects directory
+cd Co-creation-projects
+
+# Create project folder (format: GitHub-username-project-name)
+mkdir your-username-project-name
+
+# For example:
+mkdir jjyaoao-CodeReviewAgent
+
+# Enter project directory
+cd jjyaoao-CodeReviewAgent
+```
+
+Recommended project structure:
+
+```
+jjyaoao-CodeReviewAgent/
+├── README.md              # Project documentation
+├── requirements.txt       # Python dependency list
+├── main.ipynb            # Main Jupyter Notebook
+├── data/                 # Data files (optional)
+│   ├── sample_code.py
+│   └── test_cases.json
+├── outputs/              # Output results (optional)
+│   ├── review_report.md
+│   └── screenshots/
+├── src/                  # Source code (optional, if code is extensive)
+│   ├── agents/
+│   ├── tools/
+│   └── utils/
+└── .env.example          # Environment variable template
+```
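If you prefer not to create these files by hand, the layout above can be scaffolded with a short script. This is a minimal sketch under stated assumptions: `scaffold` is a hypothetical helper, the optional directories mirror the tree shown above, and `main.ipynb` is written as an empty nbformat 4 notebook so that Jupyter can open it.

```python
import json
from pathlib import Path

# Plain files that are created empty; fill them in afterwards.
EMPTY_FILES = ["README.md", "requirements.txt", ".env.example"]
# Optional directories from the recommended structure.
OPTIONAL_DIRS = ["data", "outputs/screenshots", "src/agents", "src/tools", "src/utils"]
# Minimal valid notebook document (no cells).
EMPTY_NOTEBOOK = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


def scaffold(root, with_optional: bool = True) -> list[Path]:
    """Create the recommended layout under `root`; return the paths that were created."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in EMPTY_FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    notebook = root / "main.ipynb"
    if not notebook.exists():
        notebook.write_text(json.dumps(EMPTY_NOTEBOOK, indent=1), encoding="utf-8")
        created.append(notebook)
    if with_optional:
        for name in OPTIONAL_DIRS:
            path = root / name
            if not path.exists():
                path.mkdir(parents=True)
                created.append(path)
    return created


if __name__ == "__main__":
    for path in scaffold("jjyaoao-CodeReviewAgent"):
        print("created", path)
```

The script never overwrites existing files, so it is safe to re-run inside a project you have already started.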
+
+## 16.4 Project Development Guide
+
+### 16.4.1 Writing README Documentation
+
+README is the face of your project. A good README should contain the following:
+
+```markdown
+# Project Name
+
+> One-sentence description of your project
+
+## 📝 Project Introduction
+
+Detailed introduction to your project:
+- What problem does it solve?
+- What are its special features?
+- What scenarios is it suitable for?
+
+## ✨ Core Features
+
+- [ ] Feature 1: Description
+- [ ] Feature 2: Description
+- [ ] Feature 3: Description
+
+## 🛠️ Technology Stack
+
+- HelloAgents framework
+- Agent paradigms used (e.g., ReAct, Plan-and-Solve, etc.)
+- Tools and APIs used
+- Other dependency libraries
+
+## 🚀 Quick Start
+
+### Environment Requirements
+
+- Python 3.10+
+- Other requirements
+
+### Install Dependencies
+
+\`\`\`bash
+pip install -r requirements.txt
+\`\`\`
+
+### Configure API Keys
+
+\`\`\`bash
+# Create .env file
+cp .env.example .env
+
+# Edit .env file and fill in your API keys
+\`\`\`
+
+### Run Project
+
+\`\`\`bash
+# Start Jupyter Notebook
+jupyter lab
+
+# Open main.ipynb and run
+\`\`\`
+
+## 📖 Usage Examples
+
+Show how to use your project, preferably with code examples and results.
+
+## 🎯 Project Highlights
+
+- Highlight 1: Explanation
+- Highlight 2: Explanation
+- Highlight 3: Explanation
+
+## 📊 Performance Evaluation
+
+If you have evaluation results, display them here:
+- Accuracy: XX%
+- Response time: XX seconds
+- Other metrics
+
+## 🔮 Future Plans
+
+- [ ] Feature 1 to be implemented
+- [ ] Feature 2 to be implemented
+- [ ] Parts to be optimized
+
+## 🤝 Contribution Guidelines
+
+Issues and Pull Requests are welcome!
+
+## 📄 License
+
+MIT License
+
+## 👤 Author
+
+- GitHub: [@your-username](https://github.com/your-username)
+- Email: your.email@example.com (optional)
+
+## 🙏 Acknowledgments
+
+Thanks to the Datawhale community and Hello-Agents project!
+```
+
+### 16.4.2 Writing requirements.txt
+
+List all Python dependencies required for the project:
+
+```txt
+# Core dependencies
+hello-agents[all]>=0.2.7
+
+# Visualization (if needed)
+matplotlib>=3.7.0
+plotly>=5.14.0
+
+# Web framework (if needed)
+fastapi>=0.109.0
+uvicorn>=0.27.0
+```
+
+### 16.4.3 Developing Jupyter Notebook
+
+**(1) Notebook Structure Recommendations**
+
+A good Jupyter Notebook should contain the following parts:
+
+```python
+# ========================================
+# Part 1: Project Introduction
+# ========================================
+
+"""
+# Project Name
+
+## Project Introduction
+Brief introduction to project goals and features
+
+## Author Information
+- Name: XXX
+- GitHub: @XXX
+- Date: 2025-XX-XX
+"""
+
+# ========================================
+# Part 2: Environment Configuration
+# ========================================
+
+# Install dependencies (notebook shell escape; the quotes keep shells such as zsh from expanding the brackets)
+!pip install -q "hello-agents[all]"
+
+# Import necessary libraries
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.tools import BaseTool
+import os
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+# ========================================
+# Part 3: Tool Definition
+# ========================================
+
+class CustomTool(BaseTool):
+    """Custom tool class"""
+
+    name = "tool_name"
+    description = "Tool description"
+
+    def run(self, query: str) -> str:
+        """Tool execution logic"""
+        # Implement your tool logic
+        return "Result"
+
+# ========================================
+# Part 4: Agent Construction
+# ========================================
+
+# Create LLM
+llm = HelloAgentsLLM()
+
+# Create agent
+agent = SimpleAgent(
+    name="Agent Name",
+    llm=llm,
+    system_prompt="System prompt"
+)
+
+# Add tools
+agent.add_tool(CustomTool())
+
+# ========================================
+# Part 5: Feature Demonstration
+# ========================================
+
+# Example 1: Basic functionality
+print("=== Example 1: Basic Functionality ===")
+result = agent.run("User input")
+print(result)
+
+# Example 2: Complex scenario
+print("\n=== Example 2: Complex Scenario ===")
+result = agent.run("Complex user input")
+print(result)
+
+# ========================================
+# Part 6: Performance Evaluation (Optional)
+# ========================================
+
+# Evaluation code
+# ...
+
+# ========================================
+# Part 7: Summary and Outlook
+# ========================================
+
+"""
+## Project Summary
+
+### Implemented Features
+- Feature 1
+- Feature 2
+
+### Challenges Encountered
+- Challenge 1 and solution
+- Challenge 2 and solution
+
+### Future Improvement Directions
+- Improvement 1
+- Improvement 2
+"""
+```
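Because the notebook relies on API keys loaded from `.env`, it helps to fail early with a clear message when a key is missing. The following is a minimal sketch using only the standard library; the variable names `LLM_API_KEY`, `LLM_MODEL_ID` and `LLM_BASE_URL` are assumptions for illustration, so replace them with whatever your own `.env.example` lists.

```python
import os

# Assumed variable names; adjust to match your .env.example.
REQUIRED_VARS = ("LLM_API_KEY", "LLM_MODEL_ID", "LLM_BASE_URL")


def missing_env_vars(required=REQUIRED_VARS, environ=None) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Environment looks complete.")
```

In a notebook, a call to `missing_env_vars()` right after `load_dotenv()` and before the LLM is created turns a confusing authentication error into a readable one.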
+
+### 16.4.4 Testing Your Project
+
+Before submission, use this checklist to determine if your project meets submission requirements:
+
+```markdown
+- [ ] Code runs normally without errors
+- [ ] README documentation is complete with clear instructions
+- [ ] requirements.txt contains all dependencies
+- [ ] Clear usage examples provided
+- [ ] Code has appropriate comments
+- [ ] Output results meet expectations
+- [ ] Common exception cases handled
+- [ ] Project structure is clear with standardized file naming
+- [ ] Large files properly handled (see next section)
+```
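Some items on this checklist can be verified mechanically. The sketch below covers only the file-level requirements from Section 16.1.2 (a README, a dependency list, and a runnable notebook or script); `check_submission` is a hypothetical helper, and items such as code quality or exception handling still need a human review.

```python
from pathlib import Path


def check_submission(project_dir) -> list[str]:
    """Return a list of problems found; an empty list means the basic files are in place."""
    root = Path(project_dir)
    problems = []
    for name in ("README.md", "requirements.txt"):
        path = root / name
        if not path.is_file():
            problems.append(f"missing {name}")
        elif not path.read_text(encoding="utf-8").strip():
            problems.append(f"{name} is empty")
    # The project needs at least one runnable entry point.
    has_notebook = any(root.rglob("*.ipynb"))
    has_script = any(root.rglob("*.py"))
    if not (has_notebook or has_script):
        problems.append("no runnable .ipynb or .py file found")
    return problems


if __name__ == "__main__":
    issues = check_submission(".")
    print("\n".join(issues) if issues else "Basic submission files look fine.")
```

Run it from inside your project folder before opening the Pull Request.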
+
+### 16.4.5 Large File Handling Guide
+
+**⚠️ Important: Avoid Oversized Main Repository**
+
+To keep the Hello-Agents main repository lightweight, please follow these large file handling guidelines:
+
+**(1) File Size Limits**
+
+- **Total project size**: Not exceeding 5MB
+- **Prohibited from direct submission**: Video files, large datasets, model files
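The two limits above are easy to check with a script before committing. This is a minimal sketch: `audit_size` and `within_limit` are hypothetical helpers, 5MB is interpreted as 5 × 1024 × 1024 bytes, and the list of blocked extensions is an assumption covering common video and model formats rather than an official list.

```python
from pathlib import Path

MAX_TOTAL_BYTES = 5 * 1024 * 1024  # 5MB limit from the guideline (binary megabytes assumed)
# Assumed examples of video and model file types; extend as needed.
BLOCKED_SUFFIXES = {".mp4", ".mov", ".avi", ".mkv", ".pt", ".pth", ".ckpt", ".safetensors", ".onnx"}


def audit_size(project_dir) -> tuple[int, list[str]]:
    """Return (total size in bytes, relative paths of files with blocked extensions)."""
    root = Path(project_dir)
    total = 0
    blocked = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            total += path.stat().st_size
            if path.suffix.lower() in BLOCKED_SUFFIXES:
                blocked.append(path.relative_to(root).as_posix())
    return total, blocked


def within_limit(project_dir) -> bool:
    """True if the project is under the size limit and contains no blocked file types."""
    total, blocked = audit_size(project_dir)
    return total <= MAX_TOTAL_BYTES and not blocked


if __name__ == "__main__":
    size, flagged = audit_size(".")
    print(f"total: {size / 1024 / 1024:.2f} MB")
    for name in flagged:
        print("move to external storage:", name)
```

Files reported by the script are candidates for the external-link or separate-repository solutions described in the rest of this section.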
+
+**(2) Large File Handling Solutions**
+
+If your project contains large files (datasets, videos, models, etc.), please use the following solutions:
+
+**Solution 1: Use External Links (Recommended)**
+
+Upload large files to external platforms and provide download links in README:
+
+```markdown
+## Datasets
+
+The datasets used in this project are large. Please download from the following links:
+
+- Dataset 1: [Baidu Netdisk](link) Extraction code: xxxx
+- Dataset 2: [Google Drive](link)
+- Demo video: [Bilibili](link) / [YouTube](link)
+```
+
+Recommended external platforms:
+- **Datasets**: Baidu Netdisk, Google Drive, Kaggle, HuggingFace Datasets
+- **Videos**: Bilibili, YouTube, Tencent Video
+- **Models**: HuggingFace Models, ModelScope
+- **Images**: GitHub Issues, image hosting services
+
+**Solution 2: Create Independent Repository**
+
+If the project has many resources, consider creating an independent data repository:
+
+```markdown
+## Project Resources
+
+Due to the large amount of data and demo resources, a separate resource repository has been created:
+
+- Resource repository: https://github.com/your-username/project-name-resources
+- Contains: Datasets, demo videos, model files, test data, etc.
+
+### Usage
+
+\`\`\`bash
+# Clone resource repository
+git clone https://github.com/your-username/project-name-resources.git
+
+# Copy data to project directory
+cp -r project-name-resources/data ./data
+\`\`\`
+```
+
+**Solution 3: Use Sample Data**
+
+Only provide small-scale sample data in the main repository:
+
+```markdown
+<!-- In README.md -->
+## Data Description
+
+- `data/sample.csv`: Sample data (100 records)
+- Complete dataset (100,000 records): download from [here](link)
+```
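
One simple way to produce such a sample file is to copy the header and the first rows of the full dataset. The sketch below assumes a CSV file with a header row; the function name `write_sample` and the file paths are hypothetical.

```python
import csv
from itertools import islice


def write_sample(full_csv, sample_csv, n_rows=100):
    """Copy the header plus the first n_rows data rows into sample_csv.

    Returns the number of data rows written.
    """
    with open(full_csv, newline="", encoding="utf-8") as src, \
         open(sample_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to sample
        writer.writerow(header)
        written = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            written += 1
        return written

# Hypothetical usage:
#   write_sample("full_dataset.csv", "data/sample.csv", n_rows=100)
```

Taking the first rows is not a random sample. If row order matters in your dataset, choose the rows in a way that suits your data.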
+
+**(3) Best Practice Example**
+
+```
+your-username-project-name/
+├── README.md              # Contains external resource links
+├── requirements.txt
+├── main.ipynb
+├── .gitignore            # Ignore large files
+├── data/
+│   └── sample.csv        # Sample data only (<1MB)
+└── outputs/
+    └── demo_result.png   # Demo results only (<1MB)
+```
+
+README explanation:
+
+```markdown
+## Data and Resources
+
+### Sample Data
+The project includes small-scale sample data for quick testing (located in `data/sample.csv`).
+
+### Complete Dataset
+Download the complete dataset (500MB) from the following link:
+- Baidu Netdisk: [Link] Extraction code: xxxx
+- Extract to the `data/` directory after download
+
+### Demo Video
+- Bilibili: [Project Demo Video](link)
+- YouTube: [Demo Video](link)
+```
+
+## 16.5 Submitting Pull Request
+
+### 16.5.1 Submitting Code to GitHub
+
+**Step 1: Check Modifications**
+
+```bash
+# View modified files
+git status
+```
+
+**Step 2: Add Files**
+
+```bash
+# Add all modified files
+git add .
+
+# Or add specific files
+git add Co-creation-projects/your-username-project-name/
+```
+
+**Step 3: Commit Changes**
+
+Commit messages should follow this format:
+
+```bash
+# Format: type: brief description
+git commit -m "feat: Add XXX graduation project"
+```
+
+**Commit Type Specifications:**
+
+- `feat`: New feature or project (use this type for graduation projects)
+- `fix`: Bug fix
+- `docs`: Documentation update
+- `style`: Code format adjustment (doesn't affect functionality)
+- `refactor`: Code refactoring
+- `test`: Test-related
+- `chore`: Other modifications (e.g., dependency updates)
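
The `type: brief description` format is easy to check before committing. The sketch below is illustrative only: the function name `check_commit_message` is hypothetical, and the accepted types are taken from the list above.

```python
import re

# Commit types listed in the specification above.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
COMMIT_RE = re.compile(r"^(?P<type>[a-z]+): (?P<summary>\S.*)$")


def check_commit_message(message):
    """Return None if the first line is 'type: brief description', else an error string."""
    first_line = message.splitlines()[0] if message else ""
    match = COMMIT_RE.match(first_line)
    if match is None:
        return "expected 'type: brief description'"
    if match.group("type") not in COMMIT_TYPES:
        return f"unknown type '{match.group('type')}'"
    return None
```

For example, `check_commit_message("feat: Add XXX graduation project")` returns `None`, while a message without the `type: ` prefix returns an error string.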
+
+**Step 4: Push to GitHub**
+
+```bash
+# Push to your forked repository
+git push origin feature/your-project-name
+```
+
+### 16.5.2 Creating Pull Request
+
+**Step 1: Visit GitHub**
+
+1. Visit your forked repository: `https://github.com/your-username/Hello-Agents`
+2. Click the "Pull requests" tab, as shown in Figure 16.3
+3. Click the "New pull request" button
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-3.png" alt="" width="85%"/>
+  <p>Figure 16.3 Creating Pull Request</p>
+</div>
+
+**Step 2: Select Branches**
+
+- Base repository: `datawhalechina/Hello-Agents`
+- Base branch: `main`
+- Head repository: `your-username/Hello-Agents`
+- Compare branch: `feature/your-project-name`
+
+**Step 3: Fill in PR Information**
+
+**⚠️ Important: Unified PR Title Format**
+
+For easy management and retrieval, all graduation project PR titles must follow this format:
+
+```
+[Graduation Project] Project Name - Brief Description
+```
+
+Examples:
+- `[Graduation Project] CodeReviewAgent - Intelligent Code Review Assistant`
+- `[Graduation Project] StudyBuddy - AI Learning Partner`
+- `[Graduation Project] DataAnalyst - Intelligent Data Analyst`
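
The title format can be checked the same way. This is a sketch with a hypothetical helper, `parse_pr_title`. It accepts the English prefix shown above and the `[毕业设计]` prefix used in the Chinese version of this chapter, and it assumes the project name itself does not contain ` - `.

```python
import re

PR_TITLE_RE = re.compile(
    r"^\[(?:Graduation Project|毕业设计)\] (?P<name>.+?) - (?P<summary>.+)$"
)


def parse_pr_title(title):
    """Return (project_name, summary) if the title follows the format, else None."""
    match = PR_TITLE_RE.match(title.strip())
    if match is None:
        return None
    return match.group("name"), match.group("summary")
```

A title such as `[Graduation Project] StudyBuddy - AI Learning Partner` parses into the project name `StudyBuddy` and the summary `AI Learning Partner`. A title without the bracketed prefix returns `None`.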
+
+**PR Description Template:**
+
+```markdown
+## Project Information
+
+- **Project Name**: XXX
+- **Author**: @your-username
+- **Project Type**: Productivity Tool/Learning Assistance/Creative Entertainment/Data Analysis/Life Service
+
+## Project Introduction
+
+Brief description of your project (2-3 sentences)
+
+## Core Features
+
+- [ ] Feature 1
+- [ ] Feature 2
+- [ ] Feature 3
+
+## Technical Highlights
+
+- Used XXX paradigm
+- Implemented XXX functionality
+- Optimized XXX performance
+
+## Demo Effects
+
+(Optional) Add screenshots or GIFs to showcase project effects
+
+## Self-Check List
+
+- [ ] Code runs normally
+- [ ] README documentation complete
+- [ ] requirements.txt complete
+- [ ] Clear usage examples provided
+- [ ] Code has appropriate comments
+
+## Other Notes
+
+(Optional) Other content that needs explanation
+```
+
+**Step 4: Submit PR**
+
+As shown in Figure 16.4, click the "Create pull request" button to submit.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-4.png" alt="" width="85%"/>
+  <p>Figure 16.4 Submit Pull Request</p>
+</div>
+
+### 16.5.3 Responding to Review Comments
+
+After submitting the PR, community members will review your code and provide suggestions. Please respond promptly:
+
+1. **View Comments**: Check reviewer comments on the PR page
+2. **Modify Code**: Modify code based on suggestions
+3. **Submit Updates**:
+   ```bash
+   git add .
+   git commit -m "fix: Modify XXX based on review comments"
+   git push origin feature/your-project-name
+   ```
+4. **Reply to Comments**: Reply to reviewers on GitHub, explaining your modifications
+
+## 16.6 Example Project Showcase
+
+To help you better understand the graduation project requirements, here is a complete example project. Your project does not need to be large: small, creative ideas are welcome too, and any work you build yourself is worth sharing.
+
+**Project Information**
+
+- **Project Name**: CodeReviewAgent
+- **Author**: @jjyaoao
+- **Project Path**: `Co-creation-projects/jjyaoao-CodeReviewAgent/`
+
+**Project Structure**
+
+```
+jjyaoao-CodeReviewAgent/
+├── README.md              # Project documentation
+├── requirements.txt       # Dependency list
+├── main.ipynb            # Main program (includes quick demo and full features)
+├── .env.example          # Environment variable example
+├── .gitignore            # Git ignore rules
+├── data/
+│   └── sample_code.py    # Sample code
+└── outputs/
+    └── review_report.md  # Sample report
+```
+
+**Core Code Snippet (main.ipynb)**
+
+```python
+# ========================================
+# Intelligent Code Review Assistant
+# ========================================
+
+from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
+from hello_agents.tools import Tool, ToolParameter
+from typing import Dict, Any, List
+import ast
+import os
+
+# ========================================
+# 0. Configure LLM Parameters
+# ========================================
+
+os.environ["LLM_MODEL_ID"] = "Qwen/Qwen2.5-72B-Instruct"
+# Placeholder only: do not commit a real key; keep it in a local .env file instead
+os.environ["LLM_API_KEY"] = "your_api_key_here"
+os.environ["LLM_BASE_URL"] = "https://api-inference.modelscope.cn/v1/"
+os.environ["LLM_TIMEOUT"] = "60"
+
+# ========================================
+# 1. Define Code Analysis Tools
+# ========================================
+
+class CodeAnalysisTool(Tool):
+    """Code static analysis tool"""
+
+    def __init__(self):
+        super().__init__(
+            name="code_analysis",
+            description="Analyze Python code structure, complexity, and potential issues"
+        )
+
+    def run(self, parameters: Dict[str, Any]) -> str:
+        """Analyze code and return results"""
+        code = parameters.get("code", "")
+        if not code:
+            return "Error: Code cannot be empty"
+
+        try:
+            tree = ast.parse(code)
+            functions = [node for node in ast.walk(tree)
+                        if isinstance(node, ast.FunctionDef)]
+            classes = [node for node in ast.walk(tree)
+                      if isinstance(node, ast.ClassDef)]
+
+            result = {
+                "Number of functions": len(functions),
+                "Number of classes": len(classes),
+                "Lines of code": len(code.split('\n')),
+                "Function list": [f.name for f in functions],
+                "Class list": [c.name for c in classes]
+            }
+            return str(result)
+        except SyntaxError as e:
+            return f"Syntax error: {str(e)}"
+
+    def get_parameters(self) -> List[ToolParameter]:
+        return [
+            ToolParameter(
+                name="code",
+                type="string",
+                description="Python code to analyze",
+                required=True
+            )
+        ]
+
+class StyleCheckTool(Tool):
+    """Code style checking tool"""
+
+    def __init__(self):
+        super().__init__(
+            name="style_check",
+            description="Check if code complies with PEP 8 standards"
+        )
+
+    def run(self, parameters: Dict[str, Any]) -> str:
+        """Check code style"""
+        code = parameters.get("code", "")
+        if not code:
+            return "Error: Code cannot be empty"
+
+        issues = []
+        lines = code.split('\n')
+        for i, line in enumerate(lines, 1):
+            if len(line) > 79:
+                issues.append(f"Line {i}: Exceeds 79 characters")
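+            # Flag leading-space indentation that is not a multiple of 4
+            # (a simplification: PEP 8 continuation lines may align differently)
+            indent = len(line) - len(line.lstrip(' '))
+            if line.strip() and indent % 4 != 0:
+                issues.append(f"Line {i}: Non-standard indentation")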
+
+        if not issues:
+            return "No line-length or indentation issues found"
+        return "Found the following issues:\n" + "\n".join(issues)
+
+    def get_parameters(self) -> List[ToolParameter]:
+        return [
+            ToolParameter(
+                name="code",
+                type="string",
+                description="Python code to check",
+                required=True
+            )
+        ]
+
+# ========================================
+# 2. Create Tool Registry and Agent
+# ========================================
+
+# Create tool registry
+tool_registry = ToolRegistry()
+tool_registry.register_tool(CodeAnalysisTool())
+tool_registry.register_tool(StyleCheckTool())
+
+# Initialize LLM
+llm = HelloAgentsLLM()
+
+# Define system prompt
+system_prompt = """You are an experienced code review expert. Your tasks are:
+
+1. Use code_analysis tool to analyze code structure
+2. Use style_check tool to check code style
+3. Based on analysis results, provide detailed review report
+
+The review report should include:
+- Code structure analysis
+- Style issues
+- Potential bugs
+- Performance optimization suggestions
+- Best practice recommendations
+
+Please output the report in Markdown format."""
+
+# Create agent
+agent = SimpleAgent(
+    name="Code Review Assistant",
+    llm=llm,
+    system_prompt=system_prompt,
+    tool_registry=tool_registry
+)
+
+# ========================================
+# 3. Run Example
+# ========================================
+
+# Read sample code
+with open("data/sample_code.py", "r", encoding="utf-8") as f:
+    sample_code = f.read()
+
+print("=== Code to Review ===")
+print(sample_code)
+print("\n" + "="*50 + "\n")
+
+# Execute code review
+print("=== Starting Code Review ===")
+review_result = agent.run(f"Please review the following Python code:\n\n```python\n{sample_code}\n```")
+
+print(review_result)
+
+# Save review report
+with open("outputs/review_report.md", "w", encoding="utf-8") as f:
+    f.write(review_result)
+
+print("\nReview report saved to outputs/review_report.md")
+```
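
The analysis inside `CodeAnalysisTool` relies only on the standard `ast` module, so it can be tried without the HelloAgents wrapper. The standalone sketch below uses a hypothetical helper name, `summarize_code`. It also counts `async def` functions, which parse as `ast.AsyncFunctionDef` and are therefore not matched by a check against `ast.FunctionDef` alone.

```python
import ast


def summarize_code(code):
    """Count functions and classes in Python source using the standard ast module."""
    tree = ast.parse(code)  # raises SyntaxError on invalid source
    nodes = list(ast.walk(tree))
    functions = [
        n.name for n in nodes
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    classes = [n.name for n in nodes if isinstance(n, ast.ClassDef)]
    return {
        "functions": sorted(functions),  # sorted so the result does not depend on walk order
        "classes": sorted(classes),
        # splitlines() does not count a trailing newline as an extra line
        "lines": len(code.splitlines()),
    }
```

Testing the analysis logic as a plain function like this, before wrapping it in a `Tool` subclass, keeps the tool's behavior easy to verify independently of the LLM.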
+
+**README.md Example**
+
+```markdown
+# CodeReviewAgent - Intelligent Code Review Assistant
+
+> Intelligent code review tool based on HelloAgents framework
+
+## 📝 Project Introduction
+
+CodeReviewAgent is an intelligent code review assistant that can automatically analyze Python code quality, discover potential issues, and provide optimization suggestions.
+
+### Core Features
+
+- ✅ Code structure analysis: Count functions, classes, lines of code, etc.
+- ✅ Style checking: Check compliance with PEP 8 standards
+- ✅ Intelligent suggestions: Provide in-depth analysis and optimization suggestions based on LLM
+- ✅ Report generation: Generate review reports in Markdown format
+
+## 🛠️ Technology Stack
+
+- HelloAgents framework (SimpleAgent + ToolRegistry)
+- Python AST module (code parsing)
+- ModelScope API (Qwen2.5-72B model)
+
+## 🚀 Quick Start
+
+### Install Dependencies
+
+\`\`\`bash
+pip install -r requirements.txt
+\`\`\`
+
+### Configure LLM Parameters
+
+**Method 1: Use .env file**
+
+\`\`\`bash
+cp .env.example .env
+# Edit .env file and fill in your API key
+\`\`\`
+
+**Method 2: Set directly in Notebook**
+
+The notebook already sets the ModelScope endpoint and model. Replace the placeholder `LLM_API_KEY` in the configuration section (Part 0) of main.ipynb before running.
+
+### Run Project
+
+\`\`\`bash
+jupyter lab
+# Open main.ipynb and run all cells
+\`\`\`
+
+## 📖 Usage Example
+
+1. Place code to review in `data/sample_code.py`
+2. Run `main.ipynb`
+3. View generated review report `outputs/review_report.md`
+
+## 🎯 Project Highlights
+
+- **Automation**: No need for manual line-by-line checking, automatically discovers issues
+- **Intelligence**: Uses LLM to understand code semantics and provide in-depth suggestions
+- **Extensibility**: Easy to add new checking rules and tools
+
+## 👤 Author
+
+- GitHub: [@jjyaoao](https://github.com/jjyaoao)
+- Project link: [CodeReviewAgent](https://github.com/datawhalechina/Hello-Agents/tree/main/Co-creation-projects/jjyaoao-CodeReviewAgent)
+
+## 🙏 Acknowledgments
+
+Thanks to the Datawhale community and Hello-Agents project!
+```
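
Whichever configuration method you choose, it helps to fail early when the API key is missing or still set to the placeholder. The sketch below reads the same environment variable names used in the example notebook. The function `load_llm_config` is hypothetical and not part of HelloAgents, and the defaults are simply the values shown in the example.

```python
import os

PLACEHOLDER_KEY = "your_api_key_here"  # the placeholder used in the example notebook


def load_llm_config(env=None):
    """Collect LLM settings from environment variables; fail early if the key is unset."""
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY", "")
    if not api_key or api_key == PLACEHOLDER_KEY:
        raise RuntimeError("LLM_API_KEY is missing or still set to the placeholder")
    return {
        "model": env.get("LLM_MODEL_ID", "Qwen/Qwen2.5-72B-Instruct"),
        "api_key": api_key,
        "base_url": env.get("LLM_BASE_URL", "https://api-inference.modelscope.cn/v1/"),
        "timeout": int(env.get("LLM_TIMEOUT", "60")),
    }
```

Calling `load_llm_config()` at the top of the notebook turns a confusing authentication error later in the run into an immediate, readable message.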
+
+## 16.7 Summary and Outlook
+
+By completing the graduation project, you should now be familiar with the full process of agent system design: deriving a system architecture from requirements, using the functions and components of the HelloAgents framework, developing custom tools to extend agent capabilities, carrying a project from requirement analysis through to code implementation, collaborating on open source with Git and GitHub, and writing clear technical documentation.
+
+In this project, we built the HelloAgents framework from scratch and used it to implement several practical applications. Completing the graduation project is just the beginning. You can deepen your theoretical knowledge by studying more agent paradigms and algorithms, prompt engineering and context engineering, and multi-agent collaboration mechanisms. You can broaden your technology stack by learning web development to build complete applications, databases to persist data, and deployment to put applications online. You can keep improving your project by adding features, optimizing performance and user experience, and strengthening tests and documentation. Most importantly, take an active part in the community: help other learners, contribute to Hello-Agents framework development, and share your experience and insights.
+
+From the simple agent in Chapter 1 to now being able to independently build complete multi-agent applications, you have traveled through an exciting learning journey. But this is not the end - it's a new beginning.
+
+AI technology is changing rapidly, and the agent field is wide open. We hope you stay curious and keep learning new techniques, use AI to solve real problems and create value, share your experience and results with the community, and keep refining your work.
+
+Finally, thank you for reading this tutorial to the end. We hope you gained something along the way and that you will apply what you have learned to real projects and build impressive agent applications. Let's explore and create together!
+
+**Remember: The best way to learn is through hands-on practice!**
+
+Now, start building your own agent application! We look forward to seeing your excellent work in the Co-creation-projects directory!
+
+If you find the Hello-Agents project helpful, please give us a ⭐Star!
+
+---
+<div align="center">
+  <strong>🎓 Congratulations on completing the Hello-Agents tutorial! 🎉</strong>
+</div>
+

+ 139 - 135
docs/chapter16/第十六章 毕业设计.md

@@ -1,6 +1,10 @@
+<div align="right">
+  <a href="./Chapter16-Graduation-Project.md">English</a> | 中文
+</div>
+
 # 第十六章 毕业设计:构建属于你的多智能体应用
 
-恭喜你来到Hello-Agents教程的最后一章!在前面的15章中,我们从零开始构建了HelloAgents框架,学习了智能体的核心概念、多种范式、工具系统、记忆机制、通信协议、强化学习训练和性能评估等知识。在第13-15章中,我们还通过三个完整的实战项目(智能旅行助手、自动化深度研究智能体、赛博小镇)展示了如何将所学知识融会贯通。
+恭喜你来到 Hello-Agents 教程的最后一章!在前面的 15 章中,我们从零开始构建了 HelloAgents 框架,学习了智能体的核心概念、多种范式、工具系统、记忆机制、通信协议、强化学习训练和性能评估等知识。在第 13-15 章中,我们还通过三个完整的实战项目(智能旅行助手、自动化深度研究智能体、赛博小镇)展示了如何将所学知识融会贯通。
 
 现在,是时候让你成为真正的智能体系统构建者了!本章将指导你<strong>构建属于你自己的多智能体应用</strong>,并通过开源协作的方式与社区分享你的成果。
 
@@ -12,23 +16,23 @@
 
 毕业设计的核心价值在于培养你的综合应用能力,将前面学到的所有知识(智能体范式、工具系统、记忆机制、通信协议等)选择性的整合到一个完整的项目中。
 
-通过本章的学习和实践,希望你能够独立设计并实现一个完整的智能体应用,熟练使用HelloAgents框架的各种功能,掌握Git和GitHub的基本操作,学会编写清晰的项目文档,参与开源社区的协作开发,最终获得一个可以展示的技术作品。
+通过本章的学习和实践,希望你能够独立设计并实现一个完整的智能体应用,熟练使用 HelloAgents 框架的各种功能,掌握 Git 和 GitHub 的基本操作,学会编写清晰的项目文档,参与开源社区的协作开发,最终获得一个可以展示的技术作品。
 
 ### 16.1.2 毕业设计的形式
 
-你的毕业设计将以<strong>开源项目</strong>的形式提交到Hello-Agents的共创项目仓库(`Co-creation-projects`目录)。具体要求如下:
+你的毕业设计将以<strong>开源项目</strong>的形式提交到 Hello-Agents 的共创项目仓库(`Co-creation-projects`目录)。具体要求如下:
 
 1. <strong>项目命名</strong>:使用`{你的GitHub用户名}-{项目名称}`的格式,例如`jjyaoao-CodeReviewAgent`
 
 2. <strong>项目内容</strong>:
-   - 一个可运行的Jupyter Notebook(`.ipynb`文件)或Python脚本
+   - 一个可运行的 Jupyter Notebook(`.ipynb`文件)或 Python 脚本
    - 完整的依赖列表(`requirements.txt`)
-   - 清晰的README文档(`README.md`)
+   - 清晰的 README 文档(`README.md`)
    - 可选:演示视频、截图、数据集等
 
-3. <strong>提交方式</strong>:通过GitHub的Pull Request(PR)提交
+3. <strong>提交方式</strong>:通过 GitHub 的 Pull Request(PR)提交
 
-4. <strong>评审流程</strong>:社区成员会review你的代码,提出改进建议,通过后合并到主仓库
+4. <strong>评审流程</strong>:社区成员会 review 你的代码,提出改进建议,通过后合并到主仓库
 
 ## 16.2 项目选题指南
 
@@ -42,8 +46,8 @@
 
 <strong>(1)生产力工具类</strong>
 
-- <strong>智能代码审查助手</strong>:自动分析代码质量、发现潜在bug、提出优化建议
-- <strong>智能文档生成器</strong>:根据代码自动生成API文档、用户手册
+- <strong>智能代码审查助手</strong>:自动分析代码质量、发现潜在 bug、提出优化建议
+- <strong>智能文档生成器</strong>:根据代码自动生成 API 文档、用户手册
 - <strong>智能会议助手</strong>:记录会议内容、生成会议纪要、提取行动项
 - <strong>智能邮件助手</strong>:自动分类邮件、生成回复草稿、提醒重要事项
 
@@ -51,13 +55,13 @@
 
 - <strong>智能学习伙伴</strong>:根据学习进度推荐学习资源、生成练习题、答疑解惑
 - <strong>智能论文助手</strong>:帮助查找文献、总结论文、生成引用
-- <strong>智能编程导师</strong>:提供编程练习、代码review、学习路径规划
+- <strong>智能编程导师</strong>:提供编程练习、代码 review、学习路径规划
 - <strong>智能语言学习助手</strong>:提供对话练习、语法纠错、词汇扩展
 
 <strong>(3)创意娱乐类</strong>
 
 - <strong>智能故事生成器</strong>:根据用户输入生成小说、剧本、诗歌
-- <strong>智能游戏NPC</strong>:创建有个性的游戏角色,能够与玩家自然对话
+- <strong>智能游戏 NPC</strong>:创建有个性的游戏角色,能够与玩家自然对话
 - <strong>智能音乐推荐</strong>:根据心情、场景推荐音乐,生成播放列表
 - <strong>智能菜谱助手</strong>:根据食材、口味推荐菜谱,生成购物清单
 
@@ -83,9 +87,9 @@
 
 <strong>问题分析</strong>:代码审查是软件开发中的重要环节,但人工审查耗时且容易遗漏问题。现有的静态分析工具只能发现语法错误,无法理解代码逻辑,因此需要一个能够理解代码语义、提供深度分析的智能助手。
 
-<strong>核心功能</strong>:该项目将实现代码质量分析(检查代码风格、命名规范、注释完整性)、潜在bug检测(发现逻辑错误、边界条件问题、资源泄漏)、性能优化建议(识别性能瓶颈、提出优化方案)、安全漏洞扫描(检测SQL注入、XSS等安全问题)以及最佳实践推荐(根据语言特性和设计模式提出改进建议)。
+<strong>核心功能</strong>:该项目将实现代码质量分析(检查代码风格、命名规范、注释完整性)、潜在 bug 检测(发现逻辑错误、边界条件问题、资源泄漏)、性能优化建议(识别性能瓶颈、提出优化方案)、安全漏洞扫描(检测 SQL 注入、XSS 等安全问题)以及最佳实践推荐(根据语言特性和设计模式提出改进建议)。
 
-<strong>预期成果</strong>:最终将交付一个可运行的Jupyter Notebook展示完整的审查流程,支持Python、JavaScript等主流语言,能够生成结构化的Markdown格式审查报告,并提供具体的代码示例和改进建议。
+<strong>预期成果</strong>:最终将交付一个可运行的 Jupyter Notebook 展示完整的审查流程,支持 Python、JavaScript 等主流语言,能够生成结构化的 Markdown 格式审查报告,并提供具体的代码示例和改进建议。
 
 ## 16.3 开发环境准备
 
@@ -93,14 +97,14 @@
 
 在开始开发之前,请确保你的开发环境已经安装了以下工具:
 
-<strong>(1)Python环境</strong>
+<strong>(1)Python 环境</strong>
 
 ```bash
 # 安装HelloAgents
 pip install "hello-agents[all]"
 ```
 
-<strong>(2)Git和GitHub</strong>
+<strong>(2)Git 和 GitHub</strong>
 
 ```bash
 # 检查Git版本
@@ -135,20 +139,20 @@ pip install jupyterlab
 jupyter lab
 ```
 
-### 16.3.2 Fork项目仓库
+### 16.3.2 Fork 项目仓库
 
-<strong>步骤1:Fork仓库</strong>
+<strong>步骤 1:Fork 仓库</strong>
 
-1. 访问Hello-Agents仓库:https://github.com/datawhalechina/Hello-Agents
-2. 点击右上角的"Fork"按钮,如图16.1红色方框位置
-3. 选择你的GitHub账号,创建Fork
+1. 访问 Hello-Agents 仓库:https://github.com/datawhalechina/Hello-Agents
+2. 点击右上角的"Fork"按钮,如图 16.1 红色方框位置
+3. 选择你的 GitHub 账号,创建 Fork
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-1.png" alt="" width="85%"/>
-  <p>图 16.1 Fork仓库步骤</p>
+  <p>图 16.1 Fork 仓库步骤</p>
 </div>
 
-<strong>步骤2:克隆到本地</strong>
+<strong>步骤 2:克隆到本地</strong>
 
 ```bash
 # 如图16.2所示,克隆你Fork的仓库
@@ -169,13 +173,13 @@ git remote -v
   <p>图 16.2 克隆仓库到本地</p>
 </div>
 
-<strong>步骤3:创建开发分支</strong>
+<strong>步骤 3:创建开发分支</strong>
 
 ```bash
 # 创建并切换到新分支
 git checkout -b feature/你的项目名称
 
-# 例如
+# 例如:
 git checkout -b feature/code-review-agent
 ```
 
@@ -188,10 +192,10 @@ git checkout -b feature/code-review-agent
 # 进入共创项目目录
 cd Co-creation-projects
 
-# 创建项目文件夹(格式GitHub用户名-项目名称)
+# 创建项目文件夹(格式:GitHub用户名-项目名称)
 mkdir 你的用户名-项目名称
 
-# 例如
+# 例如:
 mkdir jjyaoao-CodeReviewAgent
 
 # 进入项目目录
@@ -220,9 +224,9 @@ jjyaoao-CodeReviewAgent/
 
 ## 16.4 项目开发指南
 
-### 16.4.1 编写README文档
+### 16.4.1 编写 README 文档
 
-README是项目的门面,一个好的README应该包含以下内容:
+README 是项目的门面,一个好的 README 应该包含以下内容:
 
 ```markdown
 # 项目名称
@@ -231,16 +235,16 @@ README是项目的门面,一个好的README应该包含以下内容:
 
 ## 📝 项目简介
 
-详细介绍你的项目
+详细介绍你的项目:
 - 解决什么问题?
 - 有什么特色功能?
 - 适用于什么场景?
 
 ## ✨ 核心功能
 
-- [ ] 功能1描述
-- [ ] 功能2描述
-- [ ] 功能3描述
+- [ ] 功能1:描述
+- [ ] 功能2:描述
+- [ ] 功能3:描述
 
 ## 🛠️ 技术栈
 
@@ -286,15 +290,15 @@ jupyter lab
 
 ## 🎯 项目亮点
 
-- 亮点1说明
-- 亮点2说明
-- 亮点3说明
+- 亮点1:说明
+- 亮点2:说明
+- 亮点3:说明
 
 ## 📊 性能评估
 
-如果有评估结果,展示在这里
-- 准确率XX%
-- 响应时间XX秒
+如果有评估结果,展示在这里:
+- 准确率:XX%
+- 响应时间:XX秒
 - 其他指标
 
 ## 🔮 未来计划
@@ -321,9 +325,9 @@ MIT License
 感谢Datawhale社区和Hello-Agents项目!
 ```
 
-### 16.4.2 编写requirements.txt
+### 16.4.2 编写 requirements.txt
 
-列出项目所需的所有Python依赖:
+列出项目所需的所有 Python 依赖:
 
 ```txt
 # 核心依赖
@@ -338,15 +342,15 @@ fastapi>=0.109.0
 uvicorn>=0.27.0
 ```
 
-### 16.4.3 开发Jupyter Notebook
+### 16.4.3 开发 Jupyter Notebook
 
-<strong>(1)Notebook结构建议</strong>
+<strong>(1)Notebook 结构建议</strong>
 
-一个好的Jupyter Notebook应该包含以下部分:
+一个好的 Jupyter Notebook 应该包含以下部分:
 
 ```python
 # ========================================
-# 第1部分项目介绍
+# 第1部分:项目介绍
 # ========================================
 
 """
@@ -356,13 +360,13 @@ uvicorn>=0.27.0
 简要介绍项目的目标和功能
 
 ## 作者信息
-- 姓名XXX
-- GitHub@XXX
-- 日期2025-XX-XX
+- 姓名:XXX
+- GitHub:@XXX
+- 日期:2025-XX-XX
 """
 
 # ========================================
-# 第2部分环境配置
+# 第2部分:环境配置
 # ========================================
 
 # 安装依赖
@@ -378,7 +382,7 @@ from dotenv import load_dotenv
 load_dotenv()
 
 # ========================================
-# 第3部分工具定义
+# 第3部分:工具定义
 # ========================================
 
 class CustomTool(BaseTool):
@@ -393,7 +397,7 @@ class CustomTool(BaseTool):
         return "结果"
 
 # ========================================
-# 第4部分智能体构建
+# 第4部分:智能体构建
 # ========================================
 
 # 创建LLM
@@ -410,28 +414,28 @@ agent = SimpleAgent(
 agent.add_tool(CustomTool())
 
 # ========================================
-# 第5部分功能演示
+# 第5部分:功能演示
 # ========================================
 
-# 示例1基础功能
-print("=== 示例1基础功能 ===")
+# 示例1:基础功能
+print("=== 示例1:基础功能 ===")
 result = agent.run("用户输入")
 print(result)
 
-# 示例2复杂场景
-print("\n=== 示例2复杂场景 ===")
+# 示例2:复杂场景
+print("\n=== 示例2:复杂场景 ===")
 result = agent.run("复杂的用户输入")
 print(result)
 
 # ========================================
-# 第6部分性能评估(可选)
+# 第6部分:性能评估(可选)
 # ========================================
 
 # 评估代码
 # ...
 
 # ========================================
-# 第7部分总结与展望
+# 第7部分:总结与展望
 # ========================================
 
 """
@@ -471,25 +475,25 @@ print(result)
 
 <strong>⚠️ 重要:避免主仓库过大</strong>
 
-为了保持Hello-Agents主仓库的轻量化,请遵循以下大文件处理规范:
+为了保持 Hello-Agents 主仓库的轻量化,请遵循以下大文件处理规范:
 
 <strong>(1)文件大小限制</strong>
 
-- **项目总大小**: 不超过5MB
-- **禁止直接提交**: 视频文件、大型数据集、模型文件
+- **项目总大小**: 不超过 5MB
+- **禁止直接提交** 视频文件、大型数据集、模型文件
 
 <strong>(2)大文件处理方案</strong>
 
 如果你的项目包含大文件(数据集、视频、模型等),请使用以下方案:
 
-**方案1:使用外部链接(推荐)**
+**方案 1:使用外部链接(推荐)**
 
-将大文件上传到外部平台,在README中提供下载链接:
+将大文件上传到外部平台,在 README 中提供下载链接:
 
 ```markdown
 ## 数据集
 
-本项目使用的数据集较大,请从以下链接下载
+本项目使用的数据集较大,请从以下链接下载:
 
 - 数据集1: [百度网盘](链接) 提取码: xxxx
 - 数据集2: [Google Drive](链接)
@@ -497,19 +501,19 @@ print(result)
 ```
 
 推荐的外部平台:
-- **数据集**: 百度网盘、Google Drive、Kaggle、HuggingFace Datasets
-- **视频**: B站、YouTube、腾讯视频
-- **模型**: HuggingFace Models、ModelScope
-- **图片**: GitHub Issues、图床服务
+- **数据集** 百度网盘、Google Drive、Kaggle、HuggingFace Datasets
+- **视频**: B 站、YouTube、腾讯视频
+- **模型** HuggingFace Models、ModelScope
+- **图片** GitHub Issues、图床服务
 
-**方案2:创建独立仓库**
+**方案 2:创建独立仓库**
 
 如果项目资源较多,建议创建独立的数据仓库:
 
 ```markdown
 ## 项目资源
 
-由于项目包含大量数据和演示资源,已单独创建资源仓库
+由于项目包含大量数据和演示资源,已单独创建资源仓库:
 
 - 资源仓库: https://github.com/你的用户名/项目名称-resources
 - 包含内容: 数据集、演示视频、模型文件、测试数据等
@@ -525,7 +529,7 @@ cp -r 项目名称-resources/data ./data
 \`\`\`
 ```
 
-**方案3:使用示例数据**
+**方案 3:使用示例数据**
 
 在主仓库中只提供小规模的示例数据:
 
@@ -551,7 +555,7 @@ cp -r 项目名称-resources/data ./data
     └── demo_result.png   # 仅演示结果(<1MB)
 ```
 
-README中的说明:
+README 中的说明:
 
 ```markdown
 ## 数据和资源
@@ -560,7 +564,7 @@ README中的说明:
 项目包含小规模示例数据用于快速测试(位于`data/sample.csv`)
 
 ### 完整数据集
-完整数据集(500MB)请从以下链接下载
+完整数据集(500MB)请从以下链接下载:
 - 百度网盘: [链接] 提取码: xxxx
 - 下载后解压到`data/`目录
 
@@ -569,18 +573,18 @@ README中的说明:
 - YouTube: [Demo Video](链接)
 ```
 
-## 16.5 提交Pull Request
+## 16.5 提交 Pull Request
 
-### 16.5.1 提交代码到GitHub
+### 16.5.1 提交代码到 GitHub
 
-<strong>步骤1:检查修改</strong>
+<strong>步骤 1:检查修改</strong>
 
 ```bash
 # 查看修改的文件
 git status
 ```
 
-<strong>步骤2:添加文件</strong>
+<strong>步骤 2:添加文件</strong>
 
 ```bash
 # 添加所有修改的文件
@@ -590,58 +594,58 @@ git add .
 git add Co-creation-projects/你的用户名-项目名称/
 ```
 
-<strong>步骤3:提交修改</strong>
+<strong>步骤 3:提交修改</strong>
 
 提交信息应遵循以下格式:
 
 ```bash
-# 格式类型: 简短描述
+# 格式:类型: 简短描述
 git commit -m "feat: 添加XXX毕业设计项目"
 ```
 
 <strong>提交类型规范:</strong>
 
-- `feat`: 新增功能或项目(毕业设计项目使用此类型)
-- `fix`: 修复bug
-- `docs`: 文档更新
-- `style`: 代码格式调整(不影响功能)
-- `refactor`: 代码重构
-- `test`: 测试相关
-- `chore`: 其他修改(如依赖更新)
+- `feat` 新增功能或项目(毕业设计项目使用此类型)
+- `fix`: 修复 bug
+- `docs` 文档更新
+- `style` 代码格式调整(不影响功能)
+- `refactor` 代码重构
+- `test` 测试相关
+- `chore` 其他修改(如依赖更新)
 
-<strong>步骤4:推送到GitHub</strong>
+<strong>步骤 4:推送到 GitHub</strong>
 
 ```bash
 # 推送到你的Fork仓库
 git push origin feature/你的项目名称
 ```
 
-### 16.5.2 创建Pull Request
+### 16.5.2 创建 Pull Request
 
-<strong>步骤1:访问GitHub</strong>
+<strong>步骤 1:访问 GitHub</strong>
 
-1. 访问你Fork的仓库:`https://github.com/你的用户名/Hello-Agents`
-2. 点击"Pull requests"标签,如图16.3所示
+1. 访问你 Fork 的仓库:`https://github.com/你的用户名/Hello-Agents`
+2. 点击"Pull requests"标签,如图 16.3 所示
 3. 点击"New pull request"按钮
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-3.png" alt="" width="85%"/>
-  <p>图 16.3 创建Pull Request</p>
+  <p>图 16.3 创建 Pull Request</p>
 </div>
 
 
-<strong>步骤2:选择分支</strong>
+<strong>步骤 2:选择分支</strong>
 
-- Base repository: `datawhalechina/Hello-Agents`
-- Base branch: `main`
-- Head repository: `你的用户名/Hello-Agents`
-- Compare branch: `feature/你的项目名称`
+- Base repository `datawhalechina/Hello-Agents`
+- Base branch `main`
+- Head repository `你的用户名/Hello-Agents`
+- Compare branch `feature/你的项目名称`
 
-<strong>步骤3:填写PR信息</strong>
+<strong>步骤 3:填写 PR 信息</strong>
 
-<strong>⚠️ 重要:PR标题统一格式</strong>
+<strong>⚠️ 重要:PR 标题统一格式</strong>
 
-为了便于管理和检索,所有毕业设计项目的PR标题必须遵循以下格式:
+为了便于管理和检索,所有毕业设计项目的 PR 标题必须遵循以下格式:
 
 ```
 [毕业设计] 项目名称 - 简短描述
@@ -652,14 +656,14 @@ git push origin feature/你的项目名称
 - `[毕业设计] StudyBuddy - AI学习伙伴`
 - `[毕业设计] DataAnalyst - 智能数据分析师`
 
-<strong>PR描述模板:</strong>
+<strong>PR 描述模板:</strong>
 
 ```markdown
 ## 项目信息
 
-- **项目名称**XXX
-- **作者**@你的用户名
-- **项目类型**生产力工具/学习辅助/创意娱乐/数据分析/生活服务
+- **项目名称**:XXX
+- **作者**:@你的用户名
+- **项目类型**:生产力工具/学习辅助/创意娱乐/数据分析/生活服务
 
 ## 项目简介
 
@@ -694,22 +698,22 @@ git push origin feature/你的项目名称
 (可选)其他需要说明的内容
 ```
 
-<strong>步骤4:提交PR</strong>
+<strong>步骤 4:提交 PR</strong>
 
-如图16.4所示,点击"Create pull request"按钮提交。
+如图 16.4 所示,点击"Create pull request"按钮提交。
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/16-figures/16-4.png" alt="" width="85%"/>
-  <p>图 16.4 提交Pull Request</p>
+  <p>图 16.4 提交 Pull Request</p>
 </div>
 
 
 
-### 16.5.3 响应Review意见
+### 16.5.3 响应 Review 意见
 
-提交PR后,社区成员会review你的代码并提出建议。请及时响应:
+提交 PR 后,社区成员会 review 你的代码并提出建议。请及时响应:
 
-1. <strong>查看评论</strong>:在PR页面查看reviewer的评论
+1. <strong>查看评论</strong>:在 PR 页面查看 reviewer 的评论
 2. <strong>修改代码</strong>:根据建议修改代码
 3. <strong>提交更新</strong>:
    ```bash
@@ -717,7 +721,7 @@ git push origin feature/你的项目名称
    git commit -m "fix: 根据review意见修改XXX"
    git push origin feature/你的项目名称
    ```
-4. <strong>回复评论</strong>:在GitHub上回复reviewer,说明你的修改
+4. <strong>回复评论</strong>:在 GitHub 上回复 reviewer,说明你的修改
 
 ## 16.6 示例项目展示
 
@@ -783,7 +787,7 @@ class CodeAnalysisTool(Tool):
         """分析代码并返回结果"""
         code = parameters.get("code", "")
         if not code:
-            return "错误代码不能为空"
+            return "错误:代码不能为空"
 
         try:
             tree = ast.parse(code)
@@ -801,7 +805,7 @@ class CodeAnalysisTool(Tool):
             }
             return str(result)
         except SyntaxError as e:
-            return f"语法错误{str(e)}"
+            return f"语法错误:{str(e)}"
 
     def get_parameters(self) -> List[ToolParameter]:
         return [
@@ -826,20 +830,20 @@ class StyleCheckTool(Tool):
         """检查代码风格"""
         code = parameters.get("code", "")
         if not code:
-            return "错误代码不能为空"
+            return "错误:代码不能为空"
 
         issues = []
         lines = code.split('\n')
         for i, line in enumerate(lines, 1):
             if len(line) > 79:
-                issues.append(f"第{i}行超过79个字符")
+                issues.append(f"第{i}行:超过79个字符")
             if line.startswith(' ') and not line.startswith('    '):
                 if len(line) - len(line.lstrip()) not in [0, 4, 8, 12]:
-                    issues.append(f"第{i}行缩进不规范")
+                    issues.append(f"第{i}行:缩进不规范")
 
         if not issues:
             return "代码风格良好,符合PEP 8规范"
-        return "发现以下问题\n" + "\n".join(issues)
+        return "发现以下问题:\n" + "\n".join(issues)
 
     def get_parameters(self) -> List[ToolParameter]:
         return [
@@ -864,13 +868,13 @@ tool_registry.register_tool(StyleCheckTool())
 llm = HelloAgentsLLM()
 
 # 定义系统提示词
-system_prompt = """你是一位经验丰富的代码审查专家。你的任务是
+system_prompt = """你是一位经验丰富的代码审查专家。你的任务是:
 
 1. 使用code_analysis工具分析代码结构
 2. 使用style_check工具检查代码风格
 3. 基于分析结果,提供详细的审查报告
 
-审查报告应包括
+审查报告应包括:
 - 代码结构分析
 - 风格问题
 - 潜在bug
@@ -901,7 +905,7 @@ print("\n" + "="*50 + "\n")
 
 # 执行代码审查
 print("=== 开始代码审查 ===")
-review_result = agent.run(f"请审查以下Python代码\n\n```python\n{sample_code}\n```")
+review_result = agent.run(f"请审查以下Python代码:\n\n```python\n{sample_code}\n```")
 
 print(review_result)
 
@@ -912,7 +916,7 @@ with open("outputs/review_report.md", "w", encoding="utf-8") as f:
 print("\n审查报告已保存到 outputs/review_report.md")
 ```
 
-<strong>README.md示例</strong>
+<strong>README.md 示例</strong>
 
 ```markdown
 # CodeReviewAgent - 智能代码审查助手
@@ -925,10 +929,10 @@ CodeReviewAgent是一个智能代码审查助手,能够自动分析Python代
 
 ### 核心功能
 
-- ✅ 代码结构分析统计函数、类、代码行数等
-- ✅ 风格检查检查是否符合PEP 8规范
-- ✅ 智能建议基于LLM提供深度分析和优化建议
-- ✅ 报告生成生成Markdown格式的审查报告
+- ✅ 代码结构分析:统计函数、类、代码行数等
+- ✅ 风格检查:检查是否符合PEP 8规范
+- ✅ 智能建议:基于LLM提供深度分析和优化建议
+- ✅ 报告生成:生成Markdown格式的审查报告
 
 ## 🛠️ 技术栈
 
@@ -972,14 +976,14 @@ jupyter lab
 
 ## 🎯 项目亮点
 
-- **自动化**无需人工逐行检查,自动发现问题
-- **智能化**利用LLM理解代码语义,提供深度建议
-- **可扩展**易于添加新的检查规则和工具
+- **自动化**:无需人工逐行检查,自动发现问题
+- **智能化**:利用LLM理解代码语义,提供深度建议
+- **可扩展**:易于添加新的检查规则和工具
 
 ## 👤 作者
 
 - GitHub: [@jjyaoao](https://github.com/jjyaoao)
-- 项目链接[CodeReviewAgent](https://github.com/datawhalechina/Hello-Agents/tree/main/Co-creation-projects/jjyaoao-CodeReviewAgent)
+- 项目链接:[CodeReviewAgent](https://github.com/datawhalechina/Hello-Agents/tree/main/Co-creation-projects/jjyaoao-CodeReviewAgent)
 
 ## 🙏 致谢
 
@@ -990,22 +994,22 @@ jupyter lab
 
 ## 16.7 总结与展望
 
-通过完成毕业设计,你应该已经掌握了智能体系统设计的完整流程。从需求出发设计系统架构,熟练使用HelloAgents框架的各种功能和组件,开发自定义工具扩展智能体能力,完成从需求分析到代码实现的完整项目开发,学会使用Git和GitHub进行开源协作,以及编写清晰的技术文档。
+通过完成毕业设计,你应该已经掌握了智能体系统设计的完整流程。从需求出发设计系统架构,熟练使用 HelloAgents 框架的各种功能和组件,开发自定义工具扩展智能体能力,完成从需求分析到代码实现的完整项目开发,学会使用 Git 和 GitHub 进行开源协作,以及编写清晰的技术文档。
 
-在本项目中,我们从零开始构建了HelloAgents框架,并用它实现了多个实用的应用。完成毕业设计只是开始,你可以继续深入学习更多智能体范式和算法、提示工程和上下文工程、多智能体协作机制等理论知识;也可以扩展技术栈,学习Web开发构建完整的应用、学习数据库实现数据持久化、学习部署将应用上线;还可以持续优化你的项目,添加更多功能、优化性能和用户体验、完善测试和文档;更重要的是,积极参与社区贡献,帮助其他学习者、参与Hello-Agents框架开发、分享你的经验和心得。
+在本项目中,我们从零开始构建了 HelloAgents 框架,并用它实现了多个实用的应用。完成毕业设计只是开始,你可以继续深入学习更多智能体范式和算法、提示工程和上下文工程、多智能体协作机制等理论知识;也可以扩展技术栈,学习 Web 开发构建完整的应用、学习数据库实现数据持久化、学习部署将应用上线;还可以持续优化你的项目,添加更多功能、优化性能和用户体验、完善测试和文档;更重要的是,积极参与社区贡献,帮助其他学习者、参与 Hello-Agents 框架开发、分享你的经验和心得。
 
 从第一章的简单智能体,到现在能够独立构建完整的多智能体应用,你已经走过了一段精彩的学习旅程。但这不是终点,而是新的起点。
 
-AI技术日新月异,智能体领域更是充满无限可能。希望你能够保持好奇心持续学习新技术,勇于用AI技术解决实际问题创造价值,乐于将你的经验和成果分享给社区,不断打磨你的作品追求卓越。
+AI 技术日新月异,智能体领域更是充满无限可能。希望你能够保持好奇心持续学习新技术,勇于用 AI 技术解决实际问题创造价值,乐于将你的经验和成果分享给社区,不断打磨你的作品追求卓越。
 
-最后,感谢你完整阅读了本项目。希望你在学习的过程中有所收获,也希望你能够将所学应用到实际项目中,创造出令人惊叹的智能体应用。AI的未来充满无限可能,让我们一起探索和创造!
+最后,感谢你完整阅读了本项目。希望你在学习的过程中有所收获,也希望你能够将所学应用到实际项目中,创造出令人惊叹的智能体应用。AI 的未来充满无限可能,让我们一起探索和创造!
 
 <strong>记住:最好的学习方式就是动手实践!</strong>
 
-现在,开始构建属于你的智能体应用吧!我们期待在Co-creation-projects目录中看到你的精彩作品!
+现在,开始构建属于你的智能体应用吧!我们期待在 Co-creation-projects 目录中看到你的精彩作品!
 
-如果你觉得Hello-Agents项目对你有帮助,请给我们一个⭐Star!
+如果你觉得 Hello-Agents 项目对你有帮助,请给我们一个⭐Star!
 
 ---
 <div align="center">
-  <strong>🎓 恭喜你完成了Hello-Agents教程的学习!🎉</strong>
+  <strong>🎓 恭喜你完成了 Hello-Agents 教程的学习!🎉</strong>

+ 567 - 0
docs/chapter2/Chapter2-History-of-Agents.md

@@ -0,0 +1,567 @@
+<div align="right">
+  English | <a href="./第二章 智能体发展史.md">中文</a>
+</div>
+
+# Chapter 2: History of Agents
+
+To understand why modern agents take their current form and where their core design ideas come from, this chapter traces their history in three stages. We start in the classical era of artificial intelligence and explore how the earliest "intelligence" was defined within rule systems of logic and symbols. We then follow the major shift from single, centralized models of intelligence to distributed, collaborative ones. Finally, we examine how the "learning" paradigm transformed the way agents acquire capabilities and gave rise to the modern agents we see today.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-00.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.1 The evolutionary ladder of AI agents</p>
+</div>
+
+As shown in Figure 2.1, **each new paradigm emerged to solve the core "pain points" or fundamental limitations of the previous generation.** While each new solution brought a leap in capability, it also introduced new limitations that were difficult to overcome at the time, which in turn laid the groundwork for the next paradigm. Understanding this "problem-driven" iterative process helps us grasp the deeper reasons and historical logic behind the technology choices of modern agents.
+
+## 2.1 Early Agents Based on Symbols and Logic
+
+Early explorations in the field of artificial intelligence were deeply influenced by mathematical logic and fundamental principles of computer science. In that era, researchers generally held a belief: human intelligence, especially logical reasoning ability, could be captured and reproduced by formalized symbolic systems. This core idea gave birth to the first important paradigm of artificial intelligence—Symbolicism, also known as "Logic AI" or "Traditional AI."
+
+In the view of symbolicism, the core of intelligent behavior is operating on symbols based on a set of explicit rules. Therefore, an agent can be viewed as a physical symbol system: it represents the external world through internal symbols and plans actions through logical reasoning. The "wisdom" of agents in this era came entirely from knowledge bases and reasoning rules pre-coded by designers, rather than acquired through autonomous learning.
+
+### 2.1.1 Physical Symbol System Hypothesis
+
+The theoretical foundation of the symbolicism era was the **Physical Symbol System Hypothesis (PSSH)**<sup>[1]</sup>, jointly proposed by **Allen Newell** and **Herbert A. Simon** in 1976. These two Turing Award winners provided theoretical guidance and criteria for implementing general artificial intelligence on computers through this hypothesis.
+
+The hypothesis contains two core assertions:
+
+1. **Sufficiency Assertion**: Any physical symbol system has sufficient means to produce general intelligent behavior.
+2. **Necessity Assertion**: Any system capable of exhibiting general intelligent behavior must essentially be a physical symbol system.
+
+A physical symbol system here refers to a system that can exist in the physical world, composed of a set of distinguishable symbols and a series of processes that operate on these symbols, with constituent elements as shown in Figure 2.2. These symbols can be combined into more complex structures (such as expressions), while processes can create, modify, copy, and destroy these symbol structures.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-0.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.2 Constituent elements of a physical symbol system</p>
+</div>
+
+In short, PSSH boldly declared: **The essence of intelligence is the computation and processing of symbols.**
+
+This hypothesis had far-reaching influence. It turned the vague and complex philosophical problem of the human mind into a concrete problem that could be engineered and implemented on computers. It gave early artificial intelligence researchers strong confidence that, given the right way to represent knowledge and effective reasoning algorithms, machine intelligence comparable to that of humans could be created. Almost all research in the symbolicism era, from expert systems to automated planning, was conducted under the guidance of this hypothesis.
+
+### 2.1.2 Expert Systems
+
+Under the direct influence of the physical symbol system hypothesis, **Expert Systems** became the most important and successful application achievement of the symbolicism era. The core goal of expert systems was to simulate the ability of human experts to solve problems in specific domains. By encoding expert knowledge and experience into computer programs, they could provide conclusions or recommendations comparable to or even surpassing human experts when facing similar problems.
+
+A typical expert system usually consists of several core components including a knowledge base, inference engine, and user interface, with a general architecture as shown in Figure 2.3.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-1.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.3 General architecture of expert systems</p>
+</div>
+
+This architecture clearly embodies the design philosophy of separating knowledge from reasoning, an important characteristic of symbolicism AI.
+
+**Knowledge Base and Inference Engine**
+
+The "intelligence" of expert systems mainly comes from its two core components: the knowledge base and the inference engine.
+
+- **Knowledge Base**: This is the knowledge storage center of the expert system, used to store domain expert knowledge and experience. **Knowledge Representation** is key to building a knowledge base. In expert systems, the most commonly used knowledge representation method is **Production Rules**, i.e., a series of conditional statements in "IF-THEN" form. For example: IF patient has fever symptoms AND cough THEN may have respiratory infection. These rules associate specific situations (IF part, conditions) with corresponding conclusions or actions (THEN part, conclusions). A complex expert system may contain hundreds or thousands of such rules, collectively forming a vast knowledge network.
+- **Inference Engine**: The inference engine is the core computational engine of the expert system. It is a general program whose task is to find and apply relevant rules in the knowledge base based on facts provided by users, thereby deriving new conclusions. The inference engine mainly works in two ways:
+  - **Forward Chaining**: Starting from known facts, continuously matching the IF parts of rules, triggering THEN part conclusions, and adding new conclusions to the fact base until finally deriving the goal or no new rules can be matched. This is a "data-driven" reasoning approach.
+  - **Backward Chaining**: Starting from a hypothetical goal (such as "does the patient have pneumonia"), finding rules that can derive that goal, then taking the IF part of that rule as a new sub-goal, recursing in this way until all sub-goals can be proven by known facts. This is a "goal-driven" reasoning approach.
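The two reasoning strategies above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any historical expert system was implemented: the rule contents, the `RULES` list, and the function names `forward_chain` and `backward_chain` are assumptions made for this example, and the backward chainer assumes the rule set contains no cycles.

```python
# Each rule is (set of IF conditions, THEN conclusion).
# The medical content is illustrative only, not real diagnostic knowledge.
RULES = [
    ({"fever", "cough"}, "respiratory infection"),
    ({"respiratory infection", "chest pain"}, "possible pneumonia"),
]


def forward_chain(facts, rules):
    """Data-driven: keep firing rules until no new conclusion can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its IF conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules):
    """Goal-driven: a goal holds if it is a known fact, or if some rule
    concludes it and every condition of that rule can itself be proven.
    Assumes the rules are acyclic; cyclic rules would recurse forever."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(cond, facts, rules) for cond in conditions)
        for conditions, conclusion in rules
        if conclusion == goal
    )


if __name__ == "__main__":
    facts = {"fever", "cough", "chest pain"}
    print(sorted(forward_chain(facts, RULES)))
    print(backward_chain("possible pneumonia", facts, RULES))
```

Forward chaining derives everything it can from the facts, whether or not it is needed, while backward chaining only explores the rules that lead to the goal being asked about.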
+
+**Application Case and Analysis: MYCIN System**
+
+MYCIN is one of the most famous and influential expert systems in history, developed by Stanford University in the 1970s<sup>[2]</sup>. It was designed to assist doctors in diagnosing bacterial blood infections and recommending appropriate antibiotic treatment plans.
+
+- **Working Principle**: MYCIN collected patient symptoms, medical history, and test results through question-and-answer interactions with doctors. Its knowledge base contained about 600 "IF-THEN" rules provided by medical experts. The inference engine mainly used backward chaining: starting from the top-level goal of "determining the pathogen," it worked backward to identify what evidence and conditions were needed, then asked doctors questions to obtain this information. Its simplified workflow is shown in Figure 2.4.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-2.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.4 Schematic diagram of MYCIN backward chaining reasoning process</p>
+</div>
+
+- **Uncertainty Handling**: Medical diagnosis is full of uncertainty. An important innovation of MYCIN was introducing the concept of **Certainty Factor (CF)**, using a numerical value between -1 and 1 to represent the credibility of a conclusion. This enabled the system to handle uncertain, ambiguous medical knowledge and provide diagnostic results with credibility assessments, which was closer to the real world than simple Boolean logic.
+- **Achievements and Significance**: In an evaluation, MYCIN's performance in blood infection diagnosis exceeded that of non-specialist doctors and reached the level of human experts. Its success was widely taken as strong support for the physical symbol system hypothesis: through careful knowledge engineering and symbolic reasoning, machines could exhibit expert-level "intelligence" in a highly complex professional domain. MYCIN was not only a milestone in the history of expert systems but also paved the way for later commercial applications of artificial intelligence in various vertical domains.
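To make the certainty factor idea more concrete, the sketch below shows how two pieces of evidence for the same conclusion might be merged. The text above does not describe how MYCIN combined factors, so the formula here is an assumption: it is the combination rule commonly cited for MYCIN-style certainty factors. The function name `combine_cf` is made up for this example.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1, 1] for the same conclusion.

    Assumption: this is the commonly cited MYCIN-style combination rule,
    not something taken from this chapter.
    """
    if cf1 >= 0 and cf2 >= 0:
        # Two supporting pieces of evidence reinforce each other,
        # but the result never exceeds 1.
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Two opposing pieces of evidence reinforce the disbelief.
        return cf1 + cf2 * (1 + cf1)
    # Conflicting evidence partially cancels out.
    # Undefined when one factor is exactly 1 and the other exactly -1.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


if __name__ == "__main__":
    print(combine_cf(0.6, 0.4))    # two rules both support the conclusion
    print(combine_cf(0.6, -0.4))   # one rule supports, one opposes
```

Unlike Boolean logic, where a second confirming rule adds nothing, each additional supporting rule here raises the credibility of the conclusion while keeping it within the [-1, 1] range.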
+
+### 2.1.3 SHRDLU
+
+If expert systems demonstrated the "depth" of symbolic AI in professional domains, then the SHRDLU project<sup>[3]</sup>, developed by **Terry Winograd** from 1968 to 1970, achieved a breakthrough in "breadth." As shown in Figure 2.5, SHRDLU aimed to build a comprehensive intelligent agent that could interact fluently with humans through natural language in the micro-environment of the "blocks world." The "blocks world" is a simulated three-dimensional virtual space containing blocks of different shapes, colors, and sizes, as well as a virtual robotic arm that can grasp and move them. Users issue commands or ask questions in natural language, and SHRDLU either executes actions in the virtual world or responds in text.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-3.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.5 SHRDLU's "blocks world" interaction interface</p>
+</div>
+
+SHRDLU attracted widespread attention at the time mainly because it was the first to integrate multiple independent artificial intelligence modules (such as language parsing, planning, memory) into a unified system and make them work collaboratively:
+
+- **Natural Language Understanding**: SHRDLU could parse structurally complex and ambiguous English sentences. It could not only understand direct commands (such as `Pick up a big red block.`) but also handle more complex instructions, such as:
+  - Reference resolution: `Find a block which is taller than the one you are holding and put it into the box.` In this instruction, the system needs to understand that `the one you are holding` refers to the object currently grasped by the robotic arm.
+  - Contextual memory: Users could say `Grasp the pyramid.`, then ask `What does the box contain?`, and the system could answer by connecting the context.
+- **Planning and Action**: After understanding instructions, SHRDLU could autonomously plan a series of necessary actions to complete tasks. For example, if the instruction was "put the blue block on the red block," and there was already another green block on the red block, the system would plan the action sequence of "first move the green block away, then put the blue block on."
+- **Memory and Q&A**: SHRDLU had memory about its environment and its own behavior. Users could ask questions about this, such as:
+  - Inquiring about world state: `Is there a large block behind a pyramid?`
+  - Inquiring about behavior history: `Did you touch any pyramid before you put the green one on the little cube?`
+  - Inquiring about behavior motivation: `Why did you pick up the red block?` SHRDLU could answer: `BECAUSE YOU ASKED ME TO.`
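The planning behavior described above (moving the green block away before placing the blue one) can be illustrated with a toy planner. This is a minimal sketch under strong assumptions, not SHRDLU's actual planner: the world is a hypothetical dictionary mapping each block to whatever it rests on, and the helper names `block_on_top_of`, `clear`, and `plan_put_on` are invented for this example.

```python
def block_on_top_of(world, block):
    """Return the block resting directly on `block`, or None if it is clear."""
    for upper, lower in world.items():
        if lower == block:
            return upper
    return None


def clear(world, block, plan):
    """Move everything stacked on `block` to the table, topmost block first."""
    upper = block_on_top_of(world, block)
    if upper is not None:
        clear(world, upper, plan)      # first free whatever sits on `upper`
        world[upper] = "table"
        plan.append(f"move {upper} to table")


def plan_put_on(world, block, target):
    """Return the action sequence needed to put `block` on `target`."""
    world = dict(world)                # plan on a copy, leave the input untouched
    plan = []
    clear(world, block, plan)          # the block to move must be free
    clear(world, target, plan)         # the destination must be free
    world[block] = target
    plan.append(f"put {block} on {target}")
    return plan


if __name__ == "__main__":
    # green sits on red; blue and red sit on the table
    world = {"red": "table", "green": "red", "blue": "table"}
    print(plan_put_on(world, "blue", "red"))
    # → ['move green to table', 'put blue on red']
```

The sub-goal "clear the red block" is never mentioned by the user; the planner derives it from the precondition that a block can only be placed on a clear surface.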
+
+SHRDLU's historical status and influence are mainly reflected in three aspects:
+
+- **Paradigm of Comprehensive Intelligence**: Before SHRDLU, AI research mostly focused on single functions. It was the first to integrate multiple AI modules such as language understanding, reasoning planning, and action memory into a unified system. Its "perceive-think-act" closed-loop design laid the foundation for modern agent research.
+- **Popularization of Micro-World Research Methods**: Its success proved the feasibility of exploring and verifying basic principles of complex agents in a simplified environment with clear rules, a method that profoundly influenced subsequent robotics and AI planning research.
+- **Optimism and Reflection**: SHRDLU's success sparked early optimism about AGI, but its capabilities were strictly limited to the blocks world. This limitation triggered a long-running debate in the AI field about the difference between "symbol processing" and "true understanding," revealing deep challenges on the path to general intelligence.
+
+### 2.1.4 Fundamental Challenges Facing Symbolicism
+
+Despite significant achievements in early projects, starting from the 1980s, symbolic AI encountered fundamental difficulties inherent in its methodology when moving from "micro-worlds" to the open, complex real world. These difficulties can mainly be summarized into two major categories:
+
+**(1) Common-sense Knowledge and Knowledge Acquisition Bottleneck**
+
+The "intelligence" of symbolic agents depends entirely on the quality and completeness of their knowledge bases. However, how to build a knowledge base that can support real-world interaction has proven to be an extremely arduous task, mainly reflected in two aspects:
+
+- **Knowledge Acquisition Bottleneck**: The knowledge of expert systems needs to be constructed by human experts and knowledge engineers through tedious processes of interviews, refinement, and encoding. This process is costly, time-consuming, and difficult to scale. More importantly, much of human expert knowledge is implicit and intuitive, difficult to be clearly expressed as "IF-THEN" rules. Attempting to manually symbolize all knowledge of the entire world is considered an almost impossible task.
+- **Common-sense Problem**: Human behavior relies on a vast background of common sense (for example, "water is wet," "ropes can pull but not push"), but symbolic systems know nothing about this unless explicitly encoded. Establishing a complete knowledge base for broad, vague common sense remains a major challenge to this day. The Cyc project<sup>[4]</sup>, after decades of effort, still has very limited results and applications.
+
+**(2) Frame Problem and System Brittleness**
+
+In addition to knowledge-level challenges, symbolicism also encountered logical dilemmas when dealing with a dynamically changing world.
+
+- **Frame Problem**: In a dynamic world, how to efficiently determine what things have not changed after an agent executes an action is a logical puzzle<sup>[5]</sup>. Explicitly declaring all invariant states for each action is computationally infeasible, yet humans can effortlessly ignore irrelevant changes.
+- **Brittleness**: Symbolic systems rely entirely on preset rules, which makes their behavior very "brittle." When they encounter even a minor change or a new situation outside the rules, they may fail completely rather than adapt flexibly as humans do. SHRDLU succeeded precisely because it operated in a closed world with complete rules, while the real world is full of exceptions.
+
+## 2.2 Building Rule-Based Chatbots
+
+After exploring the theoretical challenges of symbolicism, in this section we will intuitively experience how rule-based systems work through a specific programming practice. We will attempt to reproduce ELIZA, an extremely influential early chatbot in the history of artificial intelligence.
+
+### 2.2.1 ELIZA's Design Philosophy
+
+ELIZA was a computer program released in 1966 by MIT computer scientist **Joseph Weizenbaum**<sup>[6]</sup>, one of the famous early attempts in the field of natural language processing. ELIZA was not a single program but a framework that could execute different "scripts." Among them, the most widely known and successful script was "DOCTOR," which imitated a Rogerian non-directive psychotherapist.
+
+ELIZA's working method was extremely clever: it never directly answered questions or provided information. Instead, it identified keywords in the user's input and applied a set of preset transformation rules to turn the user's statements into open-ended questions. For example, when a user said "I am sad about my boyfriend," ELIZA might match the pattern "I am sad about..." and apply a rule to generate the response: "Why are you sad about your boyfriend?"
+
+Weizenbaum's design philosophy was not to create an agent that could truly "understand" human emotions; on the contrary, he wanted to prove that through some simple sentence transformation techniques, machines could create an illusion of "intelligence" and "empathy" without understanding the conversation content at all. However, to his surprise, many people who interacted with ELIZA (including his secretary) developed emotional dependence on it, deeply believing it could understand them.
+
+The practical goal of this section is to reproduce ELIZA's core mechanism to deeply understand the advantages and fundamental limitations of this rule-driven approach.
+
+### 2.2.2 Pattern Matching and Text Substitution
+
+ELIZA's algorithm flow is based on **Pattern Matching and Text Substitution**, which can be clearly decomposed into the following four steps:
+
+1. **Keyword Identification and Ranking:** The rule base sets a priority for each keyword (such as `mother`, `dreamed`, `depressed`). When input contains multiple keywords, the program selects the rule corresponding to the keyword with the highest priority for processing.
+2. **Decomposition Rules:** After finding a keyword, the program uses decomposition rules with wildcards (`*`) to capture the rest of the sentence.
+   1. **Rule Example**: `* my *`
+   2. **User Input**: `"My mother is afraid of me"`
+   3. **Capture Result**: `["", "mother is afraid of me"]`
+3. **Pronoun Conversion:** Before reassembly, the program performs simple pronoun conversion on the captured text (such as `I` → `you`, `my` → `your`) to maintain conversation coherence.
+4. **Reassembly Rules:** The program selects one of the reassembly rules associated with the decomposition rule to generate a response (usually at random, to increase diversity), optionally inserting the converted text from the previous step.
+   1. **Rule Example**: `"Tell me more about your family."`
+   2. **Generated Output**: `"Tell me more about your family."`
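The decomposition step is the least obvious of the four, so here is a small sketch of how a wildcard rule such as `* my *` could be applied. It is one possible approach, assuming the rule is translated into a regular expression; the function name `decompose` is invented for this example and does not come from the original ELIZA program.

```python
import re


def decompose(pattern, sentence):
    """Apply an ELIZA-style decomposition rule such as '* my *'.

    Each '*' becomes a capture group and every other token must appear as a
    whole word. Returns the captured fragments, or None if the rule does not
    apply to the sentence.
    """
    parts = [
        r"(.*?)" if token == "*" else r"\b" + re.escape(token) + r"\b"
        for token in pattern.split()
    ]
    regex = r"\s*".join(parts)
    match = re.fullmatch(regex, sentence.strip(), re.IGNORECASE)
    if match is None:
        return None
    return [fragment.strip() for fragment in match.groups()]


if __name__ == "__main__":
    print(decompose("* my *", "My mother is afraid of me"))
    # → ['', 'mother is afraid of me']
    print(decompose("* my *", "Hello there"))
    # → None
```

The first captured fragment is empty because nothing precedes "My" in the sentence, which matches the capture result shown in the list above.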
+
+The entire workflow can be represented by a simple pseudocode idea:
+
+```Python
+FUNCTION generate_response(user_input):
+    // 1. Split user input into words
+    words = SPLIT(user_input)
+
+    // 2. Find the highest priority keyword rule
+    best_rule = FIND_BEST_RULE(words)
+    IF best_rule is NULL:
+        RETURN a_generic_response() // For example: "Please go on."
+
+    // 3. Use rule to decompose user input
+    decomposed_parts = DECOMPOSE(user_input, best_rule.decomposition_pattern)
+    IF decomposition_failed:
+        RETURN a_generic_response()
+
+    // 4. Perform pronoun conversion on decomposed parts
+    transformed_parts = TRANSFORM_PRONOUNS(decomposed_parts)
+
+    // 5. Use reassembly rules to generate response
+    response = REASSEMBLE(transformed_parts, best_rule.reassembly_patterns)
+
+    RETURN response
+```
+
+Through this mechanism, ELIZA successfully simplified the complex natural language understanding problem into an operable, rule-based pattern matching game.
+
+### 2.2.3 Implementation of Core Logic
+
+Now, we will transform the technical principles described in the previous section into a simple, runnable Python function. The following code implements a mini version of ELIZA, which contains a small set of rules but is sufficient to demonstrate its core working mechanism.
+
+```Python
+import re
+import random
+
+# Define rule base: pattern (regular expression) -> response template list
+rules = {
+    r'I need (.*)': [
+        "Why do you need {0}?",
+        "Would it really help you to get {0}?",
+        "Are you sure you need {0}?"
+    ],
+    r'Why don\'t you (.*)\?': [
+        "Do you really think I don't {0}?",
+        "Perhaps eventually I will {0}.",
+        "Do you really want me to {0}?"
+    ],
+    r'Why can\'t I (.*)\?': [
+        "Do you think you should be able to {0}?",
+        "If you could {0}, what would you do?",
+        "I don't know -- why can't you {0}?"
+    ],
+    r'I am (.*)': [
+        "Did you come to me because you are {0}?",
+        "How long have you been {0}?",
+        "How do you feel about being {0}?"
+    ],
+    r'.* mother .*': [
+        "Tell me more about your mother.",
+        "What was your relationship with your mother like?",
+        "How do you feel about your mother?"
+    ],
+    r'.* father .*': [
+        "Tell me more about your father.",
+        "How did your father make you feel?",
+        "What has your father taught you?"
+    ],
+    r'.*': [
+        "Please tell me more.",
+        "Let's change focus a bit... Tell me about your family.",
+        "Can you elaborate on that?"
+    ]
+}
+
+# Define pronoun conversion rules
+pronoun_swap = {
+    "i": "you", "you": "i", "me": "you", "my": "your",
+    "am": "are", "are": "am", "was": "were", "i'd": "you would",
+    "i've": "you have", "i'll": "you will", "yours": "mine",
+    "mine": "yours"
+}
+
+def swap_pronouns(phrase):
+    """
+    Perform first/second person conversion on pronouns in input phrase
+    """
+    words = phrase.lower().split()
+    swapped_words = [pronoun_swap.get(word, word) for word in words]
+    return " ".join(swapped_words)
+
+def respond(user_input):
+    """
+    Generate response based on rule base
+    """
+    for pattern, responses in rules.items():
+        match = re.search(pattern, user_input, re.IGNORECASE)
+        if match:
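+            # Capture matched part; strip trailing punctuation so it is not
+            # duplicated by the punctuation already in the response template
+            captured_group = match.group(1).rstrip('.!?') if match.groups() else ''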
+            # Perform pronoun conversion
+            swapped_group = swap_pronouns(captured_group)
+            # Randomly select one from templates and format
+            response = random.choice(responses).format(swapped_group)
+            return response
+    # If no specific rule is matched, use the last wildcard rule
+    return random.choice(rules[r'.*'])
+
+# Main chat loop
+if __name__ == '__main__':
+    print("Therapist: Hello! How can I help you today?")
+    while True:
+        user_input = input("You: ")
+        if user_input.lower() in ["quit", "exit", "bye"]:
+            print("Therapist: Goodbye. It was nice talking to you.")
+            break
+        response = respond(user_input)
+        print(f"Therapist: {response}")
+
+>>>
+Therapist: Hello! How can I help you today?
+You: I am feeling sad today.
+Therapist: How long have you been feeling sad today?
+You: I need some help with my project.
+Therapist: Are you sure you need some help with your project?
+You: My mother is not happy with my work.
+Therapist: Tell me more about your mother.
+You: quit
+Therapist: Goodbye. It was nice talking to you.
+```
+
+Through the above programming practice, we can intuitively summarize the fundamental limitations of rule-driven systems, which are direct confirmations of the theoretical challenges of symbolicism discussed in Section `2.1.4`:
+
+- **Lack of Semantic Understanding**: The system does not understand word meanings. For example, when faced with the input "I am **not** happy," it will still mechanically match the `I am (.*)` rule and generate a semantically incorrect response because it cannot understand the role of the negation word "not."
+- **No Contextual Memory**: The system is **stateless**, with each response based only on the current single sentence input, unable to conduct coherent multi-turn conversations.
+- **Rule Scalability Problem**: Attempting to add more rules leads to explosive growth in the rule base size, and conflict management and priority handling between rules become extremely complex, ultimately making the system difficult to maintain.
+
+However, despite these obvious defects, ELIZA produced the famous "**ELIZA effect**" at the time, with many users believing it could understand them. This illusion of intelligence mainly stemmed from its clever conversation strategies (such as playing a passive questioner, using open-ended templates) and humans' innate emotional projection psychology.
+
+ELIZA's practice clearly revealed the core contradiction of the symbolicism approach: the system's seemingly intelligent performance depends entirely on rules pre-coded by designers. However, facing the infinite possibilities of real-world language, this exhaustive method is destined to be unscalable. The system has no true understanding, only executing symbol operations, which is the root of its brittleness.
+
+## 2.3 Marvin Minsky's Society of Mind
+
+The exploration of symbolicism and the ELIZA exercise point to the same problem: a single, centralized reasoning engine built from preset rules seems unlikely to lead to true intelligence. No matter how large the rule base, the system remains rigid and brittle when facing the ambiguity, complexity, and endless variation of the real world. This dilemma prompted some leading thinkers to rethink the most fundamental design philosophy of artificial intelligence. Among them, **Marvin Minsky** did not keep adding rules to a single reasoning core. Instead, in his book **"The Society of Mind"**<sup>[7]</sup>, he posed a question and gave a revolutionary answer: "What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle."
+
+### 2.3.1 Reflection on Single Holistic Intelligence Models
+
+From the 1970s to the 1980s, the limitations of symbolicism became increasingly apparent. Expert systems succeeded in narrow vertical domains, yet they lacked even a child's common sense; SHRDLU performed excellently in a closed blocks world, yet it could not understand anything outside that world; ELIZA could imitate conversation, yet it knew nothing about the content of the conversation. These systems all followed a top-down design approach: an omniscient central processor that processes information and makes decisions according to a unified set of logical rules.
+
+Facing these shared limitations, Minsky began to raise a series of fundamental questions:
+
+- **What is "understanding"?** When we say we understand a story, is this a single ability? Or is it actually the result of dozens of different mental processes working together, such as visualization ability, logical reasoning ability, emotional resonance ability, and social relationship common sense?
+- **What is "common sense"?** Is common sense a huge knowledge base containing millions of logical rules (as attempted by the Cyc project)? Or is it a distributed network woven from countless specific experiences and simple rule fragments?
+- **How should agents be built?** Should we continue pursuing a perfect, unified logical system, or should we acknowledge that intelligence itself is an "imperfect" hodgepodge composed of many functionally different, even conflicting simple parts?
+
+These questions directly addressed the core drawbacks of single holistic intelligence models. Such models attempt to solve all problems with a unified representation and reasoning mechanism, but this is far from how we observe natural intelligence (especially human intelligence) operating. Minsky believed that forcibly cramming diverse mental activities into a rigid logical framework was the root cause of early artificial intelligence research stagnation.
+
+Based on this reflection, Minsky proposed a subversive conception: he no longer viewed the mind as a pyramid-like hierarchical structure but saw it as a flattened "society" full of interaction and collaboration.
+
+### 2.3.2 Intelligence as Collaboration
+
+In Minsky's theoretical framework, the definition of an agent differs from the modern agents we discussed in Chapter 1. Here, an agent refers to an extremely simple, specialized mental process that is itself "mindless." For example, a `LINE-FINDER` agent responsible for identifying lines, or a `GRASP` agent responsible for grasping.
+
+These simple agents are organized to form more powerful **Agencies**. An agency is a group of agents working together to complete a more complex task. For example, a `BUILD` agency responsible for building blocks might be composed of multiple lower-level agents or agencies such as `SEE`, `FIND`, `GET`, and `PUT`. They influence each other through decentralized activation and inhibition signals, forming dynamic control flow.
+
+**Emergence** is key to understanding the society of mind theory. Complex, purposeful intelligent behavior is not pre-planned by some high-level agent but spontaneously arises from local interactions among numerous simple bottom-level agents.
+
+Let's use the classic "building a block tower" task as an example to illustrate this process, as shown in Figure 2.6. When a high-level goal (such as "I want to build a tower") appears, it activates a high-level agency called `BUILD-TOWER`.
+
+1. The `BUILD-TOWER` agency doesn't know how to execute specific physical actions; its only role is to activate its subordinate agencies, such as `BUILDER`.
+2. The `BUILDER` agency is also very simple; it might only contain loop logic: as long as the tower isn't finished, activate the `ADD-BLOCK` agency.
+3. The `ADD-BLOCK` agency is responsible for coordinating more specific subtasks; it sequentially activates three sub-agencies: `FIND-BLOCK`, `GET-BLOCK`, and `PUT-ON-TOP`.
+4. Each sub-agency is composed of even lower-level agents. For example, the `GET-BLOCK` agency activates the `SEE-SHAPE` agent in the visual system and the `REACH` and `GRASP` agents in the motor system.
+
+In this process, no single agent or agency holds a global plan for the entire task. `GRASP` is only responsible for grasping and doesn't know what a tower is; `BUILDER` is only responsible for looping and doesn't know how to control the arm. Yet when this society of "mindless" agents interacts through simple activation and inhibition rules, a seemingly highly intelligent behavior, building a block tower, emerges naturally.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-4.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.6 Schematic diagram of the emergence mechanism of block tower building behavior in the "society of mind"</p>
+</div>
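
The activation chain above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the function names mirror the agencies in the text, but the `trace` log, the `blocks` list, and the target height are hypothetical, and plain function calls stand in for Minsky's activation signals (inhibition is not modeled).

```python
from typing import List

trace: List[str] = []          # hypothetical log of which agent was activated
tower: List[str] = []          # blocks stacked so far
blocks: List[str] = ["A", "B", "C"]
TARGET_HEIGHT = 3              # assumed goal: a three-block tower


# Lowest-level agents: each does one trivial thing and knows nothing else.
def see_shape() -> None:
    trace.append("SEE-SHAPE")


def reach() -> None:
    trace.append("REACH")


def grasp() -> None:
    trace.append("GRASP")


# Sub-agencies: each only activates the agents below it.
def find_block() -> str:
    trace.append("FIND-BLOCK")
    return blocks.pop(0)


def get_block() -> None:
    trace.append("GET-BLOCK")
    see_shape()
    reach()
    grasp()


def put_on_top(block: str) -> None:
    trace.append("PUT-ON-TOP")
    tower.append(block)


def add_block() -> None:
    trace.append("ADD-BLOCK")
    block = find_block()
    get_block()
    put_on_top(block)


def builder() -> None:
    trace.append("BUILDER")
    # BUILDER only loops; it has no idea how to move an arm.
    while len(tower) < TARGET_HEIGHT:
        add_block()


def build_tower() -> None:
    trace.append("BUILD-TOWER")
    builder()


build_tower()
print(tower)
print(trace[:6])
```

No function here contains a plan for the whole task, yet the tower still gets built, which is the point the example in the text is making.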
+
+### 2.3.3 Theoretical Inspiration for Multi-Agent Systems
+
+The most far-reaching influence of the society of mind theory is that it provided an important conceptual foundation for **Distributed Artificial Intelligence (DAI)** and later **Multi-Agent Systems (MAS)**. It prompted researchers to think:
+
+**If intelligence within a mind emerges through collaboration of numerous simple agents, then can more powerful "collective intelligence" also emerge through collaboration among multiple independent, physically separated computational entities (computers, robots)?**
+
+This question shifted the research focus from "how to build an omnipotent single agent" to "how to design a group of agents that collaborate efficiently." Specifically, the society of mind inspired MAS research in the following ways:
+
+- **Decentralized Control**: The core of the theory is that there is no central controller. This idea was inherited by the MAS field, where designing coordination mechanisms and task allocation strategies without a central node became a core research topic.
+- **Emergent Computation**: Solutions to complex problems can arise spontaneously from simple local interaction rules. This idea resonates with emergence-based algorithms such as ant colony optimization and particle swarm optimization, which are used to solve complex optimization and search problems.
+- **Agent Sociality**: Minsky's theory emphasized interactions between agents (activation, inhibition). The MAS field expanded on this by systematically studying agent communication languages (such as ACL, the Agent Communication Language), interaction protocols (such as the Contract Net Protocol), negotiation strategies, trust models, and even organizational structures, thereby constructing true computational societies.
+
+Minsky's "society of mind" theory thus gave AI researchers an important analytical framework for understanding the internal structure of "collective intelligence." It offered later researchers a new perspective for exploring complex systems composed of independent, autonomous, socially capable computational agents, and helped set the stage for multi-agent systems research.
+
+## 2.4 Evolution of Learning Paradigms and Modern Agents
+
+The "society of mind" theory discussed earlier pointed the way toward collective intelligence and decentralized collaboration at the philosophical level, but the implementation path remained unclear. Meanwhile, the fundamental challenges that symbolism faced in handling real-world complexity indicated that truly robust intelligence could not be built solely on pre-coded rules.
+
+These two threads jointly pointed to a question: If intelligence cannot be completely designed, can it be learned?
+
+This question opened the "learning" era of artificial intelligence. Its core goal was no longer to manually encode knowledge but to build systems that could automatically acquire knowledge and capabilities from experience and data. This section will trace the evolution of this paradigm: from the learning foundation laid by connectionism, to interactive learning achieved by reinforcement learning, to modern agents driven by large language models today.
+
+### 2.4.1 From Symbols to Connections
+
+As a direct response to the limitations of symbolism, **Connectionism** re-emerged in the 1980s. Unlike symbolism's top-down design philosophy, which relies on explicit logical rules, connectionism is a bottom-up approach inspired by the neural network structure of biological brains<sup>[8]</sup>. Its core ideas can be summarized as follows:
+
+1. **Distributed Representation of Knowledge**: Knowledge is not stored in some knowledge base in the form of explicit symbols or rules but is stored in a distributed manner in the form of connection weights between numerous simple processing units (i.e., artificial neurons). The connection pattern of the entire network itself constitutes knowledge.
+2. **Simple Processing Units**: Each neuron only performs very simple computations, such as receiving weighted inputs from other neurons, processing them through an activation function, and then outputting results to the next neuron.
+3. **Adjusting Weights Through Learning**: The system's intelligence does not come from complex programs pre-written by designers but from the "learning" process. By being exposed to numerous samples, the system automatically and iteratively adjusts connection weights between neurons according to some learning algorithm (such as backpropagation), gradually making the entire network's output approach the desired target.
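
As a minimal sketch of these three ideas, the snippet below trains a single artificial neuron to reproduce the logical OR function. The task, learning rate, and epoch count are assumptions chosen for illustration; for one neuron, backpropagation reduces to the simple gradient step shown here, so this is a simplification rather than a full multi-layer network.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Activation function squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, inputs: Sequence[float]) -> float:
    """One simple processing unit: weighted sum followed by an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train(samples, epochs: int = 2000, lr: float = 0.5) -> Tuple[List[float], float]:
    """Iteratively adjust the connection weights from examples."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = predict(weights, bias, inputs)
            # Gradient of the cross-entropy loss with respect to the weighted sum.
            error = output - target
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


# Assumed toy dataset: the truth table of logical OR.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_SAMPLES)
for inputs, target in OR_SAMPLES:
    print(inputs, target, round(predict(weights, bias, inputs), 3))
```

Nothing in the code states the rule "output 1 if either input is 1"; that knowledge ends up stored only in the learned values of `weights` and `bias`.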
+
+Under this paradigm, agents are no longer passive reasoning machines that execute rules but adaptive systems that improve themselves through experience. As Figure 2.7 shows, this is a fundamental shift in how agents are built: symbolism attempted to explicitly encode human knowledge into machines, while connectionism attempted to create machines that could learn knowledge as humans do.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-5.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.7 Comparison of the symbolism and connectionism paradigms</p>
+</div>
+
+The rise of connectionism, especially the success of deep learning in the 21st century, endowed agents with powerful perception and pattern recognition capabilities, enabling them to learn directly from raw data (such as images, sounds, and text), which was unimaginable in the symbolism era. However, enabling agents to make good sequential decisions while interacting dynamically with an environment required another learning paradigm.
+
+### 2.4.2 Agents Based on Reinforcement Learning
+
+Connectionism mainly addressed perception problems (for example, "What's in this picture?"), but the more central task of an agent is decision-making (for example, "What should I do in this situation?"). **Reinforcement Learning (RL)** is the learning paradigm that focuses on sequential decision problems. Rather than learning from a labeled static dataset, the agent learns by trial and error, interacting directly with the environment to maximize its long-term benefit.
+
+Taking AlphaGo as an example, its self-play learning process is a classic embodiment of reinforcement learning<sup>[9]</sup>. In this process, AlphaGo (the agent) observes the current board layout (environment state) and decides where to place the next stone (action). After a game ends, it receives a clear signal based on the result: a win is a positive reward and a loss is a negative reward. Through millions of such self-play games, AlphaGo continuously adjusts its internal policy, gradually learning which actions in which board positions are most likely to lead to victory. The self-play stage itself requires no human game records, although the original AlphaGo first initialized its policy network by supervised learning on human expert games.
+
+This learning mechanism of optimizing one's own behavior through interaction with the environment and based on feedback signals is the core framework of reinforcement learning. Below we will detail its basic constituent elements and working mode.
+
+The reinforcement learning framework can be described by several core elements:
+
+- **Agent**: The learner and decision-maker. In the AlphaGo example, this is the decision-making program.
+- **Environment**: Everything external to the agent, that is, whatever the agent interacts with. For AlphaGo, this is the rules of Go and the opponent.
+- **State (S)**: A specific description of the environment at a certain moment, the basis for the agent's decision-making. For example, the current positions of all stones on the board.
+- **Action (A)**: Operations the agent can take based on the current state. For example, placing a stone at a legal position on the board.
+- **Reward (R)**: A scalar signal fed back to the agent by the environment after the agent executes an action, used to evaluate the quality of that action in a specific state. For example, at the end of a game, victory receives a +1 reward, defeat receives a -1 reward.
+
+Based on the above core elements, reinforcement learning agents continuously iterate in a "perceive-act-learn" closed loop, with their working mode shown in Figure 2.8.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-6.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.8 Core interaction loop of reinforcement learning</p>
+</div>
+
+The specific steps of this loop are as follows:
+
+1. At time step t, the agent observes the current state $S_{t}$ of the environment.
+2. Based on state $S_{t}$, the agent selects an action $A_{t}$ according to its internal **Policy (π)** and executes it. A policy is essentially a mapping from states to actions, defining the agent's behavior.
+3. After receiving action $A_{t}$, the environment transitions to a new state $S_{t+1}$.
+4. Simultaneously, the environment feeds back an immediate reward $R_{t+1}$ to the agent.
+5. The agent uses this feedback (new state $S_{t+1}$ and reward $R_{t+1}$) to update and optimize its internal policy to make better decisions in the future. This update process is learning.
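
The five steps above can be sketched with a tiny tabular example. The section does not name a specific learning algorithm, so the use of Q-learning here is an assumption, as are the five-cell corridor environment, the hyperparameters, and every function name; it is a sketch of the loop, not how AlphaGo is trained.

```python
import random

N_STATES = 5        # assumed toy environment: corridor cells 0..4, cell 4 is the goal
ACTIONS = (0, 1)    # 0 = step left, 1 = step right


def step(state, action):
    """Environment: return (next_state, reward, done) for the chosen action."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Policy: mostly greedy on Q-values, sometimes exploring at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]     # Q[state][action]
    for _ in range(episodes):
        state = 0                                              # step 1: observe S_t
        for _ in range(100):                                   # cap on episode length
            action = choose_action(q, state, epsilon, rng)     # step 2: pick A_t
            next_state, reward, done = step(state, action)     # steps 3-4: S_{t+1}, R_{t+1}
            # Step 5: move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

After training, the greedy policy chooses "right" in every non-goal cell, even though the only reward the agent ever sees is the one given on reaching the goal.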
+
+The agent's learning goal is not to maximize the immediate reward at a certain time step but to maximize the **Cumulative Reward** from the current moment to the future, also called **Return**. This means the agent needs to have "foresight"; sometimes to obtain greater future rewards, it needs to sacrifice current immediate rewards (for example, the "sacrifice" strategy in Go). Through continuous exploration, feedback collection, and policy optimization in the above loop, the agent can ultimately learn to make autonomous decisions and long-term planning in complex dynamic environments.
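
The return can be written compactly as follows. Note that the discount factor $\gamma \in [0, 1]$ is a standard convention rather than something this section introduces, so treat it as an assumption; with $\gamma = 1$ the expression reduces to the plain cumulative reward described above.

$$
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
$$

For instance, if an episode yields rewards $0, 0, 1$ and then ends, with $\gamma = 0.9$ the return is $G_t = 0 + 0.9 \cdot 0 + 0.9^{2} \cdot 1 = 0.81$. The first two actions earn nothing immediately, but they still carry value because they lead to the later reward.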
+
+### 2.4.3 Pre-training Based on Large-Scale Data
+
+Reinforcement learning endowed agents with the ability to learn decision-making policies from interaction, but agents start with no prior knowledge and must build their understanding of a task from scratch, which typically requires massive amounts of task-specific interaction data. Neither the common sense that symbolism tried to encode by hand nor the background knowledge humans rely on when making decisions is available to such an agent. How can an agent acquire a broad understanding of the world before it starts learning a specific task? The answer eventually emerged from the field of **Natural Language Processing (NLP)**, and its core is **Pre-training** on large-scale data.
+
+**From Specific Tasks to General Models**
+
+Before the pre-training paradigm emerged, natural language processing models were typically trained from scratch for a single task (such as sentiment analysis or machine translation) on specially annotated small to medium-scale datasets. This led to several problems: models had a narrow scope of knowledge, knowledge learned on one task was hard to transfer to another, and every new task required substantial human effort for data annotation. The Pre-training and Fine-tuning paradigm changed this situation. Its core idea has two steps:
+
+1. **Pre-training Phase**: First, a very large neural network is trained on a general corpus of internet-scale text through **Self-supervised Learning**. The goal of this phase is not to complete any specific task but to learn the inherent patterns, grammatical structures, factual knowledge, and contextual logic of language itself. The most common objective is "predicting the next word."
+2. **Fine-tuning Phase**: After pre-training, the model has already absorbed rich linguistic and world knowledge from its corpus. For a specific downstream task, only a small amount of annotated data is then needed to fine-tune the model so that it adapts to that task.
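
To make "predicting the next word" concrete, here is a deliberately tiny sketch. It swaps the neural network for a bigram count table, which is an assumption made purely for brevity, and the three-sentence corpus is invented. What it does share with real pre-training is the self-supervised setup: the training labels are simply the next words of the raw text, so no human annotation is needed.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the raw text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        # Each adjacent pair (current, following) is a free training example.
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature corpus standing in for internet-scale text.
CORPUS = [
    "the agent observes the environment",
    "the agent selects an action",
    "a reward follows the action",
]

model = train_bigram(CORPUS)
print(predict_next(model, "the"))
```

A large language model replaces the count table with billions of learned weights and conditions on a long context instead of a single word, but the training signal has the same shape.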
+
+Figure 2.9 illustrates the complete pre-training and fine-tuning process: general text data yields a foundation model through self-supervised learning, and fine-tuning on task-specific data then adapts it to various downstream tasks.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-7.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.9 Schematic diagram of the "pre-training-fine-tuning" paradigm</p>
+</div>
+
+**Birth of Large Language Models and Emergent Abilities**
+
+Through pre-training on trillions of tokens of text, the weights of a large language model come to encode a highly compressed, implicit model of world knowledge. This addresses the "knowledge acquisition bottleneck," the most troublesome problem of the symbolism era, in a completely new way. More surprisingly, when a model's scale (parameter count, data volume, and compute) crosses a certain threshold, it begins to exhibit unexpected **Emergent Abilities** that it was never directly trained for, such as:
+
+- **In-context Learning**: Without adjusting model weights, just by providing **a few examples (Few-shot)** or even **zero examples (Zero-shot)** in the input, the model can understand and complete new tasks.
+- **Chain-of-Thought** Reasoning: By guiding the model to output step-by-step reasoning processes before answering complex questions, its accuracy on logic, arithmetic, and common-sense reasoning tasks can be significantly improved.
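
In-context learning happens entirely in the input text, so it can be illustrated without calling any model. The helper below is a hypothetical sketch: the function name, the `Input:`/`Output:` layout, and the sentiment examples are all assumptions, and the resulting string would be sent to whichever LLM you use.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str,
                          examples: List[Tuple[str, str]],
                          query: str) -> str:
    """Assemble a prompt; an empty `examples` list gives a zero-shot prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # The final item is left unanswered for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


few_shot = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I love this movie", "positive"), ("The food was awful", "negative")],
    "The service was wonderful",
)
zero_shot = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [],
    "The service was wonderful",
)
print(few_shot)
```

The model's weights are identical in both cases; only the prompt changes, which is what separates in-context learning from fine-tuning.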
+
+The emergence of these abilities marks that LLMs are no longer just language models; they have evolved into components playing dual roles as both massive knowledge bases and general reasoning engines.
+
+At this point in the history of agent development, the key technical puzzle pieces are all in place: symbolism provided a framework for logical reasoning, connectionism and reinforcement learning provided learning and decision-making capabilities, and large language models provided unprecedented world knowledge and general reasoning capabilities obtained through pre-training. In the next section, we will see how these technologies are integrated in the design of modern agents.
+
+### 2.4.4 Agents Based on Large Language Models
+
+With the rapid development of large language model technology, LLM-centric agents have become a new paradigm in the field of artificial intelligence. They can not only understand and generate human language but, more importantly, can autonomously perceive, plan, decide, and execute tasks through interaction with the environment.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-8.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.10 Core component architecture of LLM-driven agents</p>
+</div>
+
+As described in Chapter 1, the interaction between agents and the environment can be abstracted as a core loop. LLM-driven agents complete tasks through a continuously iterative closed-loop process where multiple modules work together. This process follows the architecture shown in Figure 2.10, with specific steps as follows:
+
+1. **Perception**: The process begins with the **Perception Module**. It receives raw input from the **Environment** through sensors, forming **Observations**. This observation information (such as user instructions, data returned by APIs, or changes in environment state) is the starting point for agent decision-making and will be passed to the thinking stage after processing.
+2. **Thought**: This is the cognitive core of the agent, corresponding to the collaborative work of the **Planning Module** and **Large Language Model (LLM)** in the diagram.
+   - **Planning and Decomposition**: First, the planning module receives observation information and formulates high-level strategies. Through mechanisms such as **Reflection** and **Self-criticism**, it decomposes macro goals into more specific, executable steps.
+   - **Reasoning and Decision-making**: Subsequently, the **LLM** as the hub receives instructions from the planning module and interacts with the **Memory** module to integrate historical information. The LLM performs deep reasoning and ultimately decides on the specific operation to execute next, typically manifested as a **Tool Call**.
+3. **Action**: After decision-making is complete, the action stage begins, managed by the **Execution Module**. Tool call instructions generated by the LLM are sent to the execution module. This module parses instructions, selects and calls appropriate tools from the **Tool Use** toolbox (such as code executors, search engines, APIs, etc.) to interact with the environment or execute tasks. This actual interaction with the environment is the agent's **Action**.
+4. **Observation** and Loop: Actions change the environment's state and produce results.
+   - After tool execution, a **Tool Result** is returned to the LLM, constituting direct feedback on the action's effect. Simultaneously, the agent's action changes the environment, producing a completely new **environment state**.
+   - This "tool result" and "new environment state" together constitute a new round of **Observation**. This new observation is captured again by the perception module, while the LLM **updates memory (Memory Update)** based on action results, thus initiating the next round of the "perceive-think-act" loop.
+
+This modular collaborative mechanism and continuous iterative loop constitute the core workflow of LLM-driven agents solving complex problems.
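
The loop described above can be sketched as follows. Everything here is an illustrative assumption rather than the architecture of any particular framework: `fake_llm` is a rule-based stand-in for a real model call, `calculator` is a toy tool, and the dictionary format of a decision is invented for this example.

```python
from typing import Callable, Dict, List


def calculator(expression: str) -> str:
    """Toy tool: add integers separated by '+', e.g. '2+3+4'."""
    return str(sum(int(part) for part in expression.split("+")))


# Hypothetical tool registry used by the execution step.
TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def fake_llm(observation: str, memory: List[str]) -> dict:
    """Stand-in for the LLM: decide between a tool call and a final answer."""
    if observation.startswith("tool_result:"):
        return {"type": "final", "content": observation.split(":", 1)[1]}
    return {"type": "tool_call", "tool": "calculator", "input": observation}


def run_agent(task: str, max_steps: int = 5) -> str:
    memory: List[str] = []
    observation = task                                        # Perception: initial input
    for _ in range(max_steps):
        decision = fake_llm(observation, memory)              # Thought: reason and decide
        if decision["type"] == "final":
            return decision["content"]
        result = TOOLS[decision["tool"]](decision["input"])   # Action: execute the tool call
        memory.append(f"{decision['tool']}({decision['input']}) -> {result}")  # memory update
        observation = f"tool_result:{result}"                 # Observation for the next round
    return "stopped: step limit reached"


print(run_agent("2+3+4"))
```

The `max_steps` cap is a design choice worth keeping in real systems as well, since a model that never emits a final answer would otherwise loop forever.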
+
+### 2.4.5 Overview of Key Milestones in Agent Development
+
+The development of artificial intelligence agents has not been a straight, single-lane road but a process in which several schools of thought have interwoven, competed, and fused over more than half a century. Understanding this process helps explain where today's agent architectures come from.
+
+Among these, three major trends dominated research paradigms in different periods:
+
+1. **Symbolism**: Represented by pioneers such as **Herbert A. Simon** and **Marvin Minsky**, this school holds that the core of intelligence lies in symbol manipulation and logical reasoning. It gave birth to SHRDLU, which could understand natural language instructions, to knowledge-driven expert systems, and to the "Deep Blue" computer that achieved great success in chess.
+2. **Connectionism**: This school draws its inspiration from the brain's neural networks. Although its early development was limited, the backpropagation algorithm, promoted by researchers such as **Geoffrey Hinton**, laid the foundation for the revival of neural networks. With the arrival of the deep learning era, this approach became mainstream through models such as convolutional neural networks and Transformers.
+3. **Behaviorism**: This school emphasizes that agents learn optimal policies through trial-and-error interaction with the environment; its modern incarnation is reinforcement learning. From the early TD-Gammon to AlphaGo, which combined reinforcement learning with deep learning and defeated top human players, this school gave agents the ability to learn complex decision-making behavior from experience.
+
+Entering the 2020s, these ideological schools have deeply integrated in unprecedented ways. Large language models represented by the GPT series are themselves products of connectionism but have become the core "brain" for executing symbolic reasoning, tool invocation, and planning decisions, forming a modern agent architecture combining neural and symbolic approaches. To systematically review this development context, Figure 2.11 below organizes key theories, projects, and events in the development history of artificial intelligence agents from the 1950s to the present, providing readers with a clear global overview as a consolidation of this chapter's knowledge.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-9.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.11 Timeline of agent development evolution (non-exhaustive)</p>
+</div>
+
+Thanks to breakthroughs in large language models, the agent technology stack has become more active and diverse than ever. Figure 2.12 shows a typical overview of the current AI agent technology stack, covering every layer from underlying models to upper-layer applications.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/2-figures/1757246501849-10.png" alt="Figure description" width="90%"/>
+  <p>Figure 2.12 Overview of AI Agent technology stack</p>
+</div>
+
+This technology stack diagram was released by Letta in November 2024<sup>[10]</sup>. It organizes AI agent-related tools, platforms, and services into layers and categories, providing a valuable reference for understanding the current market landscape and for technology selection.
+
+## 2.5 Chapter Summary
+
+This chapter reviewed the history of agent development, tracing how its core ideas were born and evolved across several key paradigm shifts in artificial intelligence:
+
+- **Exploration and Limitations of Symbolism**: Starting from the classical era of artificial intelligence, this chapter explained how early agents, represented by expert systems, attempted to simulate intelligence through "knowledge + reasoning." By building a rule-based chatbot ourselves, we experienced firsthand the capability boundaries of this paradigm and the fundamental challenges it faced.
+- **Emergence of Distributed Intelligence Thinking**: We explored Marvin Minsky's "society of mind" theory. This revolutionary idea showed that complex holistic intelligence can emerge from the interactions of simple local units, providing important philosophical inspiration for later multi-agent systems research.
+- **Evolution of Learning Paradigms**: We witnessed fundamental changes in how agents acquire capabilities: connectionism gave agents the ability to perceive the world, reinforcement learning enabled them to learn optimal decision-making through interaction with the environment, and large language models (LLMs) pre-trained on large-scale data gave them unprecedented world knowledge and general reasoning capabilities.
+- **Birth of Modern Agents**: Finally, we analyzed LLM-driven agents. By examining their core components (models, memory, planning, tools, etc.) and working principles, we saw how the technical ideas of earlier eras are integrated in the modern agent architecture.
+
+After this chapter, we not only understand where the modern agents introduced in Chapter 1 came from but also have a high-level framework for thinking about the evolution of agent technology. Agent development is not a simple series of technical iterations but a revolution in thinking about how to define "intelligence," acquire "knowledge," and make "decisions."
+
+Since large language models are the core of modern agents, a deep understanding of their underlying principles is crucial. The next chapter focuses on large language models themselves and explores their basic concepts, laying a solid foundation for the more advanced multi-agent applications that follow.
+
+## Exercises
+
+> **Note**: Some of the following exercises have no standard answers. They aim to help learners build a systematic understanding of agent development history and cultivate the technical insight that comes from "learning from history."
+
+1. The Physical Symbol System Hypothesis<sup>[1]</sup> is the theoretical cornerstone of the symbolism era. Please analyze:
+
+   - What do the "sufficiency assertion" and "necessity assertion" of this hypothesis mean?
+   - Drawing on this chapter's content, which problems that symbolic agents encountered in practice challenged the "sufficiency" of this hypothesis?
+   - Do large language model-driven agents conform to the Physical Symbol System Hypothesis?
+
+2. The expert system MYCIN<sup>[2]</sup> achieved significant success in the medical diagnosis field but was ultimately not widely applied in clinical practice. Please think:
+
+   > **Hint**: You can analyze this from multiple perspectives, including technology, ethics, law, and user acceptance.
+
+   - Besides the "knowledge acquisition bottleneck" and "brittleness" mentioned in this chapter, what other factors might have hindered the application of expert systems in high-risk fields like medicine?
+   - If you were to design a medical diagnosis agent now, how would you design the system to overcome MYCIN's limitations?
+   - In which vertical domains are rule-based expert systems still a better choice than deep learning today? Please give examples.
+
+3. In Section 2.2, we implemented a simplified version of the ELIZA chatbot. Please expand on this basis:
+
+   > **Hint**: This is a hands-on exercise; writing actual code is recommended.
+
+   - Add 3-5 new rules to ELIZA to enable it to handle more diverse conversation scenarios (such as discussing work, study, hobbies, etc.)
+   - Implement a simple "contextual memory" function: allow ELIZA to remember key information mentioned by users in conversations (such as name, age, occupation) and reference it in subsequent conversations
+   - Compare your expanded ELIZA with [ChatGPT](https://chatgpt.com/), listing at least 3 dimensions of essential differences
+   - Why does the rule-based approach run into "combinatorial explosion" and become hard to scale and maintain when handling open-domain conversations? Can you explain this mathematically?
+
+4. Marvin Minsky proposed a revolutionary viewpoint in the "society of mind" theory<sup>[7]</sup>: intelligence stems from collaboration of numerous simple agents, not a single perfect system.
+
+   - In the Figure 2.6 "building a block tower" example, what would happen to the entire system if the `GRASP` agent suddenly failed? What are the advantages and disadvantages of this decentralized architecture?
+   - Compare the "society of mind" theory with some current multi-agent systems (such as [CAMEL-Workforce](https://docs.camel-ai.org/key_modules/workforce), [MetaGPT](https://github.com/FoundationAgents/MetaGPT), [CrewAI](https://github.com/crewAIInc/crewAI)). What connections and differences exist between them?
+   - Marvin Minsky believed agents could be "mindless" simple processes, yet current large language models and agents often possess powerful reasoning capabilities. Does this mean the "society of mind" theory is no longer applicable in the large language model era?
+
+5. Reinforcement learning and supervised learning are two different learning paradigms. Please analyze:
+
+   - Use AlphaGo's example to explain how reinforcement learning's "trial-and-error learning" mechanism works
+   - Why is reinforcement learning particularly suitable for sequential decision problems? What is the essential difference in data requirements between it and supervised learning?
+   - Now we need to train an agent to play Super Mario. If using supervised learning and reinforcement learning respectively, what data is needed for each? Which method is more suitable for this task?
+   - In the training process of large language models, what key role does reinforcement learning play?
+
+6. The pre-training-fine-tuning paradigm is an important breakthrough in the modern artificial intelligence field. Please think deeply:
+
+   - Why does pre-training solve the "knowledge acquisition bottleneck" problem of the symbolism era? What is the essential difference in how knowledge is represented?
+   - Most of a pre-trained model's knowledge comes from internet data. What problems might this bring, and how can they be mitigated?
+   - Do you think the "pre-training-fine-tuning" paradigm might be replaced by a new paradigm, or will it persist in the long term?
+
+7. Suppose you want to design an "intelligent code review assistant" that can automatically review code submissions (Pull Requests), summarize code implementation logic, check code quality, discover potential bugs, and propose improvement suggestions.
+
+   - If you designed this system in the symbolism era (1980s), how would you implement it? What difficulties would you encounter?
+   - In the deep learning era before large language models (around 2015), how would you implement it?
+   - In the current era of large language models and agents, how would you design this agent's architecture? What modules should it include (refer to Figure 2.10)?
+   - Comparing the solutions from these three eras, explain how the evolution of agent technology turned this task from "almost impossible" into "feasible."
+
+## References
+
+[1] NEWELL A, SIMON H A. Computer science as empirical inquiry: symbols and search[J]. Communications of the ACM, 1976, 19(3): 113-126.
+
+[2] BUCHANAN B G, SHORTLIFFE E H, ed. Rule-based expert systems: the MYCIN experiments of the Stanford Heuristic Programming Project[M]. Reading, Mass.: Addison-Wesley, 1984.
+
+[3] WINOGRAD T. Understanding natural language[M]. New York: Academic Press, 1972.
+
+[4] LENAT D B, GUHA R V. Cyc: a midterm report[J]. AI magazine, 1990, 11(3): 32.
+
+[5] MCCARTHY J, HAYES P J. Some philosophical problems from the standpoint of artificial intelligence[C]//MELTZER B, MICHIE D, ed. Machine intelligence 4. Edinburgh: Edinburgh University Press, 1969: 463-502.
+
+[6] WEIZENBAUM J. ELIZA: a computer program for the study of natural language communication between man and machine[J]. Communications of the ACM, 1966, 9(1): 36-45.
+
+[7] MINSKY M. The society of mind[M]. New York: Simon & Schuster, 1986.
+
+[8] RUMELHART D E, MCCLELLAND J L, PDP RESEARCH GROUP. Parallel distributed processing: explorations in the microstructure of cognition[M]. Cambridge, MA: MIT Press, 1986.
+
+[9] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
+
+[10] LETTA. The AI agents stack[EB/OL]. (2024-11) [2025-09-07]. https://www.letta.com/blog/ai-agents-stack.
+

+ 12 - 8
docs/chapter2/第二章 智能体发展史.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter2-History-of-Agents.md">English</a> | 中文
+</div>
+
 # 第二章 智能体发展史
 
 为了深刻理解现代智能体为何呈现出如今的形态,以及其核心设计思想的由来,本章将回溯历史:从人工智能领域的古典时代出发,探寻最早的“智能”如何在逻辑与符号的规则体系中被定义;继而见证从单一、集中的智能模型到分布式、协作式智能思想的重大转折;最终理解“学习”范式如何彻底改变了智能体获取能力的方式,并催生出我们今天所见的现代智能体。
@@ -136,12 +140,12 @@ ELIZA的算法流程基于<strong>模式匹配(Pattern Matching)与文本替
 
 1. <strong>关键词识别与排序:</strong>规则库为每个关键词(如 `mother`, `dreamed`, `depressed`)设定一个优先级。当输入包含多个关键词时,程序会选择优先级最高的关键词所对应的规则进行处理。
 2. <strong>分解规则:</strong>找到关键词后,程序使用带通配符(`*`)的分解规则来捕获句子的其余部分。
-   1. <strong>规则示例</strong>: `* my *`
-   2. <strong>用户输入</strong>: `"My mother is afraid of me"`
-   3. <strong>捕获结果</strong>: `["", "mother is afraid of me"]`
+   1. <strong>规则示例</strong> `* my *`
+   2. <strong>用户输入</strong> `"My mother is afraid of me"`
+   3. <strong>捕获结果</strong> `["", "mother is afraid of me"]`
 3. <strong>重组规则:</strong>程序从与分解规则关联的一组重组规则中,选择一条来生成回应(通常随机选择以增加多样性),并可选择性地使用上一步捕获的内容。
-   1. <strong>规则示例</strong>: `"Tell me more about your family."`
-   2. <strong>生成输出</strong>: `"Tell me more about your family."`
+   1. <strong>规则示例</strong> `"Tell me more about your family."`
+   2. <strong>生成输出</strong> `"Tell me more about your family."`
 4. <strong>代词转换:</strong>在重组前,程序会进行简单的代词转换(如 `I` → `you`, `my` → `your`),以维持对话的连贯性。
 
 整个工作流程可以用一个简单的伪代码思路来表示:
@@ -154,7 +158,7 @@ FUNCTION generate_response(user_input):
     // 2. 寻找优先级最高的关键词规则
     best_rule = FIND_BEST_RULE(words)
     IF best_rule is NULL:
-        RETURN a_generic_response() // 例如"Please go on."
+        RETURN a_generic_response() // 例如:"Please go on."
 
     // 3. 使用规则分解用户输入
     decomposed_parts = DECOMPOSE(user_input, best_rule.decomposition_pattern)
@@ -180,7 +184,7 @@ FUNCTION generate_response(user_input):
 import re
 import random
 
-# 定义规则库模式(正则表达式) -> 响应模板列表
+# 定义规则库:模式(正则表达式) -> 响应模板列表
 rules = {
     r'I need (.*)': [
         "Why do you need {0}?",
@@ -278,7 +282,7 @@ Therapist: Goodbye. It was nice talking to you.
 通过上述的编程实践,我们可以直观地总结出规则驱动系统的根本局限性,这些局限正是对 `2.1.4` 节中符号主义理论挑战的直接印证:
 
 - <strong>缺乏语义理解</strong>:系统不理解词义。例如,面对“I am <strong>not</strong> happy”的输入,它仍会机械地匹配 `I am (.*)` 规则并生成语义不通的回应,因为它无法理解否定词“not”的作用。
-- <strong>无上下文记忆</strong>:系统是<strong>无状态的(Stateless)</strong>,每次回应仅基于当前单句输入,无法进行连-贯的多轮对话。
+- <strong>无上下文记忆</strong>:系统是<strong>无状态的(Stateless)</strong>,每次回应仅基于当前单句输入,无法进行连贯的多轮对话。
 - <strong>规则的扩展性问题</strong>:尝试增加更多规则会导致规则库的规模爆炸式增长,规则间的冲突与优先级管理将变得极其复杂,最终导致系统难以维护。
 
 然而,尽管存在这些显而易见的缺陷,ELIZA在当时却产生了著名的“<strong>ELIZA效应</strong>”,许多用户相信它能理解自己。这种智能的幻觉主要源于其巧妙的对话策略(如扮演被动的提问者、使用开放式模板)以及人类天生的情感投射心理。

+ 1014 - 0
docs/chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md

@@ -0,0 +1,1014 @@
+<div align="right">
+  English | <a href="./第三章 大语言模型基础.md">中文</a>
+</div>
+
+# Chapter 3: Fundamentals of Large Language Models
+
+The first two chapters introduced the definition and development history of agents. This chapter focuses on large language models themselves to answer a key question: how do modern agents work? We start from the basic definition of a language model and work through the underlying principles, laying a foundation for understanding how LLMs acquire their knowledge reserves and reasoning capabilities.
+
+## 3.1 Language Models and Transformer Architecture
+
+### 3.1.1 From N-gram to RNN
+
+**Language Model (LM)** is the core of natural language processing, and its fundamental task is to calculate the probability of a word sequence (i.e., a sentence) appearing. A good language model can tell us what kind of sentences are fluent and natural. In multi-agent systems, language models are the foundation for agents to understand human instructions and generate responses. This section will review the evolution from classical statistical methods to modern deep learning models, laying a solid foundation for understanding the subsequent Transformer architecture.
+
+**(1) Statistical Language Models and the N-gram Idea**
+
+Before the rise of deep learning, statistical methods were the mainstream of language models. The core idea is that the probability of a sentence appearing equals the product of the conditional probabilities of each word in the sentence. For a sentence S composed of words $w_1,w_2,\cdots,w_m$, its probability P(S) can be expressed as:
+
+$$P(S)=P(w_1,w_2,\ldots,w_m)=P(w_1)\cdot P(w_2\mid w_1)\cdot P(w_3\mid w_1,w_2)\cdots P(w_m\mid w_1,\ldots,w_{m-1})$$
+
+This formula is called the chain rule of probability. However, evaluating it directly is almost impossible, because conditional probabilities like $P(w_m\mid w_1,\ldots,w_{m-1})$ are too difficult to estimate from a corpus: the word sequence $w_1,\ldots,w_{m-1}$ may never have appeared in the training data.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/3-figures/1757249275674-0.png" alt="Figure description" width="90%"/>
+  <p>Figure 3.1 Schematic diagram of Markov assumption</p>
+</div>
+
+To solve this problem, researchers introduced the **Markov Assumption**. Its core idea is: we don't need to trace back a word's entire history; we can approximately assume that a word's probability of appearing is only related to the limited $n−1$ words before it, as shown in Figure 3.1. Language models built on this assumption are called **N-gram models**. Here, "N" represents the context window size we consider. Let's look at some of the most common examples to understand this concept:
+
+- **Bigram (when N=2)**: This is the simplest case, where we assume a word's appearance is only related to the one word before it. Therefore, the complex conditional probability $P(w_i\mid w_1,\ldots,w_{i-1})$ in the chain rule can be approximated by a form that is much easier to calculate:
+
+$$P(w_i\mid w_1,\ldots,w_{i-1})\approx P(w_i\mid w_{i-1})$$
+
+- **Trigram (when N=3)**: Similarly, we assume a word's appearance is only related to the two words before it:
+
+$$P(w_i\mid w_1,\ldots,w_{i-1})\approx P(w_i\mid w_{i-2},w_{i-1})$$
+
+These probabilities can be calculated through **Maximum Likelihood Estimation (MLE)** on a large corpus. The term sounds complex, but the idea is very intuitive: what is most likely to appear is what we see most often in the data. For example, for a Bigram model, we want to calculate the probability $P(w_i\mid w_{i-1})$ that the next word is $w_i$ after word $w_{i-1}$ appears. According to maximum likelihood estimation, this probability can be estimated through simple counting:
+
+$$P(w_i\mid w_{i-1})=\frac{\text{Count}(w_{i-1},w_i)}{\text{Count}(w_{i-1})}$$
+
+Here, the `Count()` function represents "counting":
+
+- $\text{Count}(w_{i-1},w_i)$: represents the total number of times the word pair $(w_{i-1},w_i)$ appears consecutively in the corpus.
+- $\text{Count}(w_{i-1})$: represents the total number of times the single word $w_{i-1}$ appears in the corpus.
+
+The formula's meaning is: we divide "the number of times the word pair $(w_{i-1},w_i)$ appears" by "the total number of times the word $w_{i-1}$ appears" and use the result as an estimate of $P(w_i\mid w_{i-1})$.
+
+To make this process more concrete, let's manually perform a calculation. Suppose we have a mini corpus containing only the following two sentences: `datawhale agent learns`, `datawhale agent works`. Our goal is: using a Bigram (N=2) model, estimate the probability of the sentence `datawhale agent learns` appearing. According to the Bigram assumption, we examine consecutive pairs of words (i.e., word pairs) each time.
+
+**Step 1: Calculate the probability of the first word, $P(\text{datawhale})$.** This is the number of times `datawhale` appears divided by the total number of words. `datawhale` appears 2 times, and the total number of words is 6.
+
+$$P(\text{datawhale}) = \frac{\text{Number of "datawhale" in total corpus}}{\text{Total number of words in corpus}} = \frac{2}{6} \approx 0.333$$
+
+**Step 2: Calculate the conditional probability $P(\text{agent}\mid\text{datawhale})$.** This is the number of times the word pair `datawhale agent` appears divided by the total number of times `datawhale` appears. `datawhale agent` appears 2 times, `datawhale` appears 2 times.
+
+$$P(\text{agent}|\text{datawhale}) =  \frac{\text{Count}(\text{datawhale agent})}{\text{Count}(\text{datawhale})} =  \frac{2}{2} = 1$$
+
+**Step 3: Calculate the conditional probability $P(\text{learns}\mid\text{agent})$.** This is the number of times the word pair `agent learns` appears divided by the total number of times `agent` appears. `agent learns` appears 1 time, `agent` appears 2 times.
+
+$$P(\text{learns}|\text{agent}) =  \frac{\text{Count(agent learns)}}{\text{Count(agent)}} =  \frac{1}{2} = 0.5$$
+
+**Finally: Multiply the probabilities.** The approximate probability of the entire sentence is:
+
+$$P(\text{datawhale agent learns}) \approx  P(\text{datawhale}) \cdot  P(\text{agent}|\text{datawhale}) \cdot  P(\text{learns}|\text{agent}) \approx  0.333 \cdot 1 \cdot 0.5 \approx 0.167$$
+
+```Python
+import collections
+
+# Example corpus, consistent with the corpus in the case explanation above
+corpus = "datawhale agent learns datawhale agent works"
+tokens = corpus.split()
+total_tokens = len(tokens)
+
+# --- Step 1: Calculate P(datawhale) ---
+count_datawhale = tokens.count('datawhale')
+p_datawhale = count_datawhale / total_tokens
+print(f"Step 1: P(datawhale) = {count_datawhale}/{total_tokens} = {p_datawhale:.3f}")
+
+# --- Step 2: Calculate P(agent|datawhale) ---
+# First calculate bigrams for subsequent steps
+bigrams = zip(tokens, tokens[1:])
+bigram_counts = collections.Counter(bigrams)
+count_datawhale_agent = bigram_counts[('datawhale', 'agent')]
+# count_datawhale was already calculated in step 1
+p_agent_given_datawhale = count_datawhale_agent / count_datawhale
+print(f"Step 2: P(agent|datawhale) = {count_datawhale_agent}/{count_datawhale} = {p_agent_given_datawhale:.3f}")
+
+# --- Step 3: Calculate P(learns|agent) ---
+count_agent_learns = bigram_counts[('agent', 'learns')]
+count_agent = tokens.count('agent')
+p_learns_given_agent = count_agent_learns / count_agent
+print(f"Step 3: P(learns|agent) = {count_agent_learns}/{count_agent} = {p_learns_given_agent:.3f}")
+
+# --- Finally: Multiply the probabilities ---
+p_sentence = p_datawhale * p_agent_given_datawhale * p_learns_given_agent
+print(f"Finally: P('datawhale agent learns') ≈ {p_datawhale:.3f} * {p_agent_given_datawhale:.3f} * {p_learns_given_agent:.3f} = {p_sentence:.3f}")
+
+>>>
+Step 1: P(datawhale) = 2/6 = 0.333
+Step 2: P(agent|datawhale) = 2/2 = 1.000
+Step 3: P(learns|agent) = 1/2 = 0.500
+Finally: P('datawhale agent learns') ≈ 0.333 * 1.000 * 0.500 = 0.167
+```
+
+N-gram models, although simple and effective, have two fatal flaws:
+
+1. **Data Sparsity**: If a word sequence has never appeared in the corpus, its probability estimate is 0, which is obviously unreasonable. Although this can be alleviated through smoothing techniques, it cannot be eradicated.
+2. **Poor Generalization Ability**: The model cannot understand semantic similarity between words. For example, even if the model has seen `agent learns` many times in the corpus, it cannot generalize this knowledge to semantically similar words. When we calculate the probability of `robot learns`, if the word `robot` has never appeared, or if the combination `robot learns` has never appeared, the probability calculated by the model will also be zero. The model cannot understand the semantic similarity between `agent` and `robot`.
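
The smoothing idea mentioned in item 1 can be made concrete with a minimal sketch. It applies add-one (Laplace) smoothing, one of the simplest smoothing techniques and not one the text above names specifically, to the mini corpus from the earlier example. The helper names `bigram_prob_mle` and `bigram_prob_add_one` are illustrative assumptions.

```python
from collections import Counter

# Same mini corpus as in the worked example above
corpus = "datawhale agent learns datawhale agent works"
tokens = corpus.split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
vocab_size = len(unigram_counts)  # number of distinct words (V)


def bigram_prob_mle(prev_word, word):
    """Unsmoothed estimate: Count(prev, word) / Count(prev)."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


def bigram_prob_add_one(prev_word, word):
    """Add-one estimate: (Count(prev, word) + 1) / (Count(prev) + V)."""
    numerator = bigram_counts[(prev_word, word)] + 1
    denominator = unigram_counts[prev_word] + vocab_size
    return numerator / denominator


p_mle_seen = bigram_prob_mle("agent", "learns")
p_mle_unseen = bigram_prob_mle("agent", "datawhale")
p_smooth_seen = bigram_prob_add_one("agent", "learns")
p_smooth_unseen = bigram_prob_add_one("agent", "datawhale")

print(f"MLE:     seen={p_mle_seen:.3f}, unseen={p_mle_unseen:.3f}")
print(f"Add-one: seen={p_smooth_seen:.3f}, unseen={p_smooth_unseen:.3f}")
```

With this corpus, the unseen pair `agent datawhale` moves from probability 0 to 1/6, while the seen pair `agent learns` drops from 1/2 to 1/3: smoothing shifts some probability mass from observed pairs to unobserved ones. It still treats every unseen word the same way, so it does not address the generalization problem in item 2.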
+
+**(2) Neural Network Language Models and Word Embeddings**
+
+The fundamental flaw of N-gram models is that they treat words as isolated, discrete symbols. To overcome this problem, researchers turned to neural networks and proposed an idea: represent words with continuous vectors. In 2003, the **Feedforward Neural Network Language Model** proposed by Bengio et al. was a milestone in this field<sup>[1]</sup>.
+
+Its core idea can be divided into two steps:
+
+1. **Build a semantic space**: Create a high-dimensional continuous vector space, then map each word in the vocabulary to a point in that space. This point (i.e., vector) is called a **Word Embedding** or word vector. In this space, semantically similar words have vectors that are close together in position. For example, the vectors of `agent` and `robot` will be very close, while the vectors of `agent` and `apple` will be far apart.
+2. **Learn the mapping from context to the next word**: Utilize the powerful fitting ability of neural networks to learn a function. The input of this function is the word vectors of the previous $n−1$ words, and the output is the probability distribution of each word in the vocabulary appearing after the current context.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/3-figures/1757249275674-1.png" alt="Figure description" width="90%"/>
+  <p>Figure 3.2 Schematic diagram of neural network language model architecture</p>
+</div>
+
+As shown in Figure 3.2, in this architecture, word embeddings are automatically learned during model training. To complete the task of "predicting the next word," the model continuously adjusts the vector position of each word, ultimately making these vectors contain rich semantic information. Once we convert words into vectors, we can use mathematical tools to measure the relationships between them. The most commonly used method is **Cosine Similarity**, which measures their similarity by calculating the cosine of the angle between two vectors.
+
+$$\text{similarity}(\vec{a}, \vec{b}) = \cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| |\vec{b}|}$$
+
+The meaning of this formula is:
+
+- If two vectors have exactly the same direction, the angle is 0°, the cosine value is 1, indicating complete correlation.
+- If two vectors are orthogonal, the angle is 90°, the cosine value is 0, indicating no relationship.
+- If two vectors have completely opposite directions, the angle is 180°, the cosine value is -1, indicating complete negative correlation.
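
The three cases above can be checked numerically. The sketch below uses only the standard library; the vectors are arbitrary values chosen for illustration, not learned embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction (one vector is a positive multiple of the other)
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Orthogonal (dot product is zero)
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
# Opposite direction (one vector is a negative multiple of the other)
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(f"same direction: {same_direction:.3f}")
print(f"orthogonal:     {orthogonal:.3f}")
print(f"opposite:       {opposite:.3f}")
```

Note that the result depends only on direction, not on length: scaling either vector by a positive constant leaves the similarity unchanged.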
+
+Through this method, word vectors can not only capture simple relationships like "synonyms" but also capture more complex analogical relationships.
+
+A famous example demonstrates the semantic relationships captured by word vectors: the result of the vector operation `vector('King') - vector('Man') + vector('Woman')` is surprisingly close to the position of `vector('Queen')` in the vector space. This is like performing a semantic translation: we start from the point "king," subtract the vector of "male," add the vector of "female," and arrive near the position of "queen." This suggests that word embeddings can capture abstract concepts like "gender" and "royalty."
+
+```Python
+import numpy as np
+
+# Assume we have learned simplified 2D word vectors
+embeddings = {
+    "king": np.array([0.9, 0.8]),
+    "queen": np.array([0.9, 0.2]),
+    "man": np.array([0.7, 0.9]),
+    "woman": np.array([0.7, 0.3])
+}
+
+def cosine_similarity(vec1, vec2):
+    dot_product = np.dot(vec1, vec2)
+    norm_product = np.linalg.norm(vec1) * np.linalg.norm(vec2)
+    return dot_product / norm_product
+
+# king - man + woman
+result_vec = embeddings["king"] - embeddings["man"] + embeddings["woman"]
+
+# Calculate similarity between result vector and "queen"
+sim = cosine_similarity(result_vec, embeddings["queen"])
+
+print(f"Result vector of king - man + woman: {result_vec}")
+print(f"Similarity of this result with 'queen': {sim:.4f}")
+
+>>>
+Result vector of king - man + woman: [0.9 0.2]
+Similarity of this result with 'queen': 1.0000
+```
+
+Neural network language models successfully addressed the poor generalization of N-gram models through word embeddings. However, they still share a limitation with N-gram models: the context window is fixed. They can only consider a fixed number of preceding words, which motivated recurrent neural networks that can handle sequences of arbitrary length.
+
+**(3) Recurrent Neural Networks (RNN) and Long Short-Term Memory Networks (LSTM)**
+
+Although the neural network language model in the previous section introduced word embeddings to solve the generalization problem, like N-gram models, its context window is of fixed size. To predict the next word, it can only see the previous n−1 words, and earlier historical information is discarded. This obviously does not conform to how we humans understand language. To break the limitation of fixed windows, **Recurrent Neural Networks (RNN)** emerged, with a very intuitive core idea: add "memory" capability to the network<sup>[2]</sup>.
+
+As shown in Figure 3.3, RNN's design introduces a **hidden state** vector, which we can understand as the network's short-term memory. At each step of processing the sequence, the network reads the current input word and combines it with its memory from the previous moment (i.e., the hidden state from the previous time step), then generates a new memory (i.e., the hidden state of the current time step) to pass to the next moment. This cyclical process allows information to propagate continuously from earlier positions in the sequence to later ones.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/3-figures/1757249275674-2.png" alt="Figure description" width="90%"/>
+  <p>Figure 3.3 Schematic diagram of RNN structure</p>
+</div>
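
The recurrence described above can be sketched in a few lines. This is a minimal illustration that assumes the common tanh formulation $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$; the names `rnn_step`, `run_rnn`, and all weight values are hypothetical toy choices, not trained parameters.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t


def run_rnn(sequence, W_xh, W_hh, b_h):
    """Process a sequence step by step, carrying the hidden state along."""
    h = [0.0] * len(b_h)  # initial "memory" is empty
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Hypothetical toy parameters: 2-dimensional inputs and hidden state
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.6]]
b_h = [0.0, 0.1]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = run_rnn(sequence, W_xh, W_hh, b_h)
reversed_states = run_rnn(sequence[::-1], W_xh, W_hh, b_h)

print("final state (original order):", states[-1])
print("final state (reversed order):", reversed_states[-1])
```

The same weights are reused at every time step, and the final hidden state differs when the same inputs arrive in a different order, which is how the network encodes sequence history.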
+
+However, standard RNNs have a serious problem in practice: the **Long-term Dependency Problem**. During training, the model needs to adjust weights deep in the network based on errors at the output end through the backpropagation algorithm. For RNNs, the length of the sequence is the depth of the network. When the sequence is very long, gradients undergo multiple multiplications during backward propagation, which causes gradient values to rapidly approach zero (**gradient vanishing**) or become extremely large (**gradient explosion**). Gradient vanishing prevents the model from effectively learning the impact of early sequence information on later outputs, making it difficult to capture long-distance dependencies.
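
A rough numerical illustration of this effect: the sketch below assumes, as a deliberate simplification, that backpropagating through each time step multiplies the gradient by the same scalar factor. In a real RNN the factor is a matrix that varies by step, but the compounding behavior is similar.

```python
def repeated_factor(factor, num_steps):
    """Multiply an initial gradient of 1.0 by `factor` once per time step."""
    gradient = 1.0
    for _ in range(num_steps):
        gradient *= factor
    return gradient


# A factor below 1 shrinks the gradient; a factor above 1 blows it up.
shrinking = repeated_factor(0.5, 50)
growing = repeated_factor(1.5, 50)

print(f"factor 0.5 over 50 steps: {shrinking:.3e}")
print(f"factor 1.5 over 50 steps: {growing:.3e}")
```

After only 50 steps the first gradient is far too small to drive learning and the second is too large to be numerically useful, even though both per-step factors look moderate.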
+
+To solve the long-term dependency problem, **Long Short-Term Memory (LSTM)** was designed<sup>[3]</sup>. LSTM is a special type of RNN, and its core innovation lies in introducing **Cell State** and a sophisticated **Gating Mechanism**. The cell state can be seen as an information pathway independent of the hidden state, allowing information to pass more smoothly between time steps. The gating mechanism consists of several small neural networks that can learn how to selectively let information through, thereby controlling the addition and removal of information in the cell state. These gates include:
+
+- **Forget Gate**: Decides which information to discard from the cell state of the previous moment.
+- **Input Gate**: Decides which new information from the current input to store in the cell state.
+- **Output Gate**: Decides which information to output to the hidden state based on the current cell state.
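
To see how the gates interact, here is a minimal sketch of one LSTM step, assuming the standard formulation with scalar input, hidden state, and cell state for readability. The function names and all parameter values are hypothetical and hand-picked to make the gates saturate; a trained LSTM learns these values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. Each entry of params is (w_x, w_h, bias)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("forget"))   # how much old cell state to keep
    i_t = sigmoid(pre_activation("input"))    # how much new information to write
    o_t = sigmoid(pre_activation("output"))   # how much cell state to expose
    c_candidate = math.tanh(pre_activation("candidate"))

    c_t = f_t * c_prev + i_t * c_candidate    # update the cell state
    h_t = o_t * math.tanh(c_t)                # derive the hidden state
    return h_t, c_t


def run_lstm(num_steps, params, c_initial=1.0):
    h, c = 0.0, c_initial
    for _ in range(num_steps):
        h, c = lstm_step(0.5, h, c, params)
    return h, c


# Forget gate near 1 and input gate near 0: the cell state is preserved.
keep_params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}
# Same parameters, but with the forget gate near 0: the cell state is erased.
drop_params = dict(keep_params, forget=(0.0, 0.0, -10.0))

_, c_kept = run_lstm(100, keep_params)
_, c_dropped = run_lstm(100, drop_params)
print(f"cell state after 100 steps (forget gate open):   {c_kept:.4f}")
print(f"cell state after 100 steps (forget gate closed): {c_dropped:.6f}")
```

Because the cell state is updated additively and scaled by a gate that can stay close to 1, information can survive many time steps, which is what makes long-distance dependencies easier to learn than in a plain RNN.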
+
+### 3.1.2 Transformer Architecture Analysis
+
+In the previous section, we saw that RNNs and LSTMs process sequential data by introducing recurrent structures, which to some extent solved the problem of capturing long-distance dependencies. However, this recurrent computation method also brought new bottlenecks: it must process data sequentially. The computation at time step t must wait for time step t−1 to complete before it can begin. This means RNNs cannot perform large-scale parallel computation and are inefficient when processing long sequences, which greatly limits the improvement of model scale and training speed. Transformer was proposed by the Google team in 2017<sup>[4]</sup>. It completely abandoned the recurrent structure and instead relied entirely on a mechanism called **Attention** to capture dependencies within sequences, thereby achieving truly parallel computation.
+
+**(1) Overall Encoder-Decoder Structure**
+
+The original Transformer model was designed for the end-to-end task of machine translation. As shown in Figure 3.4, it follows a classic **Encoder-Decoder** architecture at the macro level.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/3-figures/1757249275674-3.png" alt="Figure description" width="50%"/>
+  <p>Figure 3.4 Overall Transformer architecture diagram</p>
+</div>
+
+We can understand this structure as a team with clear division of labor:
+
+1. **Encoder**: The task is to "**understand**" the entire input sentence. It reads all input tokens (this concept will be introduced in Section 3.2.2) and ultimately generates a vector representation rich in contextual information for each token.
+2. **Decoder**: The task is to "**generate**" the target sentence. It references the preceding text it has already generated and "consults" the encoder's understanding results to generate the next word.
+
+To truly understand how Transformer works, the best method is to implement it yourself. In this section, we will adopt a "top-down" approach: first, we build the complete code framework of Transformer, defining all necessary classes and methods. Then, like completing a puzzle, we will implement the specific functions of these classes one by one.
+
+```Python
+import torch
+import torch.nn as nn
+import math
+
+# --- Placeholder modules, to be implemented in subsequent subsections ---
+
+class PositionalEncoding(nn.Module):
+    """
+    Positional encoding module
+    """
+    def forward(self, x):
+        pass
+
+class MultiHeadAttention(nn.Module):
+    """
+    Multi-head attention mechanism module
+    """
+    def forward(self, query, key, value, mask):
+        pass
+
+class PositionWiseFeedForward(nn.Module):
+    """
+    Position-wise feed-forward network module
+    """
+    def forward(self, x):
+        pass
+
+# --- Encoder core layer ---
+
+class EncoderLayer(nn.Module):
+    def __init__(self, d_model, num_heads, d_ff, dropout):
+        super(EncoderLayer, self).__init__()
+        self.self_attn = MultiHeadAttention() # To be implemented
+        self.feed_forward = PositionWiseFeedForward() # To be implemented
+        self.norm1 = nn.LayerNorm(d_model)
+        self.norm2 = nn.LayerNorm(d_model)
+        self.dropout = nn.Dropout(dropout)
+
+    def forward(self, x, mask):
+        # Residual connection and layer normalization are explained in subsection (4) below
+        # 1. Multi-head self-attention
+        attn_output = self.self_attn(x, x, x, mask)
+        x = self.norm1(x + self.dropout(attn_output))
+
+        # 2. Feed-forward network
+        ff_output = self.feed_forward(x)
+        x = self.norm2(x + self.dropout(ff_output))
+
+        return x
+
+# --- Decoder core layer ---
+
+class DecoderLayer(nn.Module):
+    def __init__(self, d_model, num_heads, d_ff, dropout):
+        super(DecoderLayer, self).__init__()
+        self.self_attn = MultiHeadAttention() # To be implemented
+        self.cross_attn = MultiHeadAttention() # To be implemented
+        self.feed_forward = PositionWiseFeedForward() # To be implemented
+        self.norm1 = nn.LayerNorm(d_model)
+        self.norm2 = nn.LayerNorm(d_model)
+        self.norm3 = nn.LayerNorm(d_model)
+        self.dropout = nn.Dropout(dropout)
+
+    def forward(self, x, encoder_output, src_mask, tgt_mask):
+        # 1. Masked multi-head self-attention (on itself)
+        attn_output = self.self_attn(x, x, x, tgt_mask)
+        x = self.norm1(x + self.dropout(attn_output))
+
+        # 2. Cross-attention (on encoder output)
+        cross_attn_output = self.cross_attn(x, encoder_output, encoder_output, src_mask)
+        x = self.norm2(x + self.dropout(cross_attn_output))
+
+        # 3. Feed-forward network
+        ff_output = self.feed_forward(x)
+        x = self.norm3(x + self.dropout(ff_output))
+
+        return x
+```
+
+**(2) From Self-Attention to Multi-Head Attention**
+
+Now, let's fill in the most critical module in the skeleton: the attention mechanism.
+
+Imagine we are reading this sentence: "The agent learns because **it** is intelligent." When we read the bolded "**it**," to understand its reference, our brain unconsciously places more attention on the word "agent" earlier in the sentence. The **Self-Attention** mechanism is a mathematical modeling of this phenomenon. It allows the model to consider all other words in the sentence when processing each word and assign different "attention weights" to these words. The higher the weight of a word, the stronger its association with the current word, and the greater the proportion its information should occupy in the current word's representation.
+
+To implement the above process, the self-attention mechanism introduces three learnable roles for each input token vector:
+
+- **Query (Q)**: Represents the current token, which is actively "querying" other tokens to obtain information.
+- **Key (K)**: Represents the "label" or "index" of tokens in the sentence that can be queried.
+- **Value (V)**: Represents the "content" or "information" carried by the token itself.
+
+These three vectors are all obtained by multiplying the original word embedding vector by three different, learnable weight matrices ($W^Q,W^K,W^V$). The entire computation process can be divided into the following steps, which we can imagine as an efficient open-book exam:
+
+- Prepare "exam questions" and "materials": For each word in the sentence, generate its $Q,K,V$ vectors through weight matrices.
+- Calculate relevance scores: To calculate the new representation of word $A$, use word $A$'s $Q$ vector to perform dot product operations with the $K$ vectors of all words in the sentence (including $A$ itself). This score reflects the importance of other words for understanding word $A$.
+- Stabilization and normalization: Divide all obtained scores by a scaling factor $\sqrt{d_{k}}$ ($d_{k}$ is the dimension of the $K$ vector). Without this scaling, the dot products grow with $d_{k}$ and push Softmax into saturated regions where gradients become very small. Then use the Softmax function to convert the scores into weights that sum to 1, which is the normalization step.
+- Weighted sum: Multiply the weights obtained in the previous step by each word's corresponding $V$ vector, then add all results together. The final vector is the new representation of word $A$ after integrating global contextual information.
+
+This process can be summarized by a concise formula:
+
+$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$$
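
The four steps and the formula above can be traced on a tiny example. The sketch below is a plain-Python, single-head version written for readability rather than speed. It assumes the $Q$, $K$, $V$ rows are already given (the learned projections $W^Q, W^K, W^V$ are skipped), and the numbers are arbitrary illustrative values.

```python
import math


def softmax(scores):
    """Convert a list of scores into weights that sum to 1."""
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (outputs, weights) for softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Relevance score of this query against every key, then scaling
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        output = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Three tokens with 2-dimensional Q/K/V vectors (illustrative values)
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

outputs, weights = scaled_dot_product_attention(Q, K, V)
for token_index, (w, out) in enumerate(zip(weights, outputs)):
    print(f"token {token_index}: weights={[round(x, 3) for x in w]}, "
          f"output={[round(x, 3) for x in out]}")
```

Each output row is a weighted average of the rows of `V`, so it always lies between the smallest and largest values in each column. For the first token, the keys that match its query best (the first and third) receive equal and larger weights than the second.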
+
+If only one attention calculation is performed (i.e., single-head), the model may only learn to focus on one type of association. For example, when processing "it," it might only learn to focus on the subject. But relationships in language are complex, and we want the model to simultaneously focus on multiple relationships (such as referential relationships, tense relationships, subordinate relationships, etc.). The multi-head attention mechanism was introduced for this purpose. Its idea is simple: instead of computing attention once, split the computation into several groups, compute each separately, then merge the results.
+
+It splits the original Q, K, V vectors into h parts along the dimension (h is the number of "heads"), and each part independently performs a single-head attention calculation. This is like having h different "experts" examine the sentence from different perspectives, with each expert capturing a different feature relationship. Finally, the "opinions" (i.e., output vectors) of these h experts are concatenated, then integrated through a linear transformation to obtain the final output.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/3-figures/1757249275674-4.png" alt="Figure description" width="50%"/>
+  <p>Figure 3.5 Multi-head attention mechanism</p>
+</div>
+
+As shown in Figure 3.5, this design allows the model to jointly attend to information from different positions and different representation subspaces, greatly enhancing the model's expressive power. Below is a simple implementation of multi-head attention for reference.
+
+```Python
+class MultiHeadAttention(nn.Module):
+    """
+    Multi-head attention mechanism module
+    """
+    def __init__(self, d_model, num_heads):
+        super(MultiHeadAttention, self).__init__()
+        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
+
+        self.d_model = d_model
+        self.num_heads = num_heads
+        self.d_k = d_model // num_heads
+
+        # Define linear transformation layers for Q, K, V and output
+        self.W_q = nn.Linear(d_model, d_model)
+        self.W_k = nn.Linear(d_model, d_model)
+        self.W_v = nn.Linear(d_model, d_model)
+        self.W_o = nn.Linear(d_model, d_model)
+
+    def scaled_dot_product_attention(self, Q, K, V, mask=None):
+        # 1. Calculate scaled attention scores (QK^T / sqrt(d_k))
+        attn_scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
+
+        # 2. Apply mask (if provided)
+        if mask is not None:
+            # Set positions where mask is 0 to a large negative number, so their weights approach 0 after softmax
+            attn_scores = attn_scores.masked_fill(mask == 0, -1e9)
+
+        # 3. Calculate attention weights (Softmax)
+        attn_probs = torch.softmax(attn_scores, dim=-1)
+
+        # 4. Weighted sum (weights * V)
+        output = torch.matmul(attn_probs, V)
+        return output
+
+    def split_heads(self, x):
+        # Transform input x shape from (batch_size, seq_length, d_model)
+        # to (batch_size, num_heads, seq_length, d_k)
+        batch_size, seq_length, d_model = x.size()
+        return x.view(batch_size, seq_length, self.num_heads, self.d_k).transpose(1, 2)
+
+    def combine_heads(self, x):
+        # Transform input x shape from (batch_size, num_heads, seq_length, d_k)
+        # back to (batch_size, seq_length, d_model)
+        batch_size, num_heads, seq_length, d_k = x.size()
+        return x.transpose(1, 2).contiguous().view(batch_size, seq_length, self.d_model)
+
+    def forward(self, Q, K, V, mask=None):
+        # 1. Perform linear transformations on Q, K, V
+        Q = self.split_heads(self.W_q(Q))
+        K = self.split_heads(self.W_k(K))
+        V = self.split_heads(self.W_v(V))
+
+        # 2. Calculate scaled dot-product attention
+        attn_output = self.scaled_dot_product_attention(Q, K, V, mask)
+
+        # 3. Combine multi-head outputs and perform final linear transformation
+        output = self.W_o(self.combine_heads(attn_output))
+        return output
+```
+
+**(3) Feed-Forward Neural Network**
+
+In each Encoder and Decoder layer, the multi-head attention sublayer is followed by a **Position-wise Feed-Forward Network (FFN)**. If the role of the attention layer is to "dynamically aggregate" relevant information from the entire sequence, then the role of the feed-forward network is to extract higher-order features from this aggregated information.
+
+The key to this name is "position-wise": the feed-forward network acts independently on each token vector in the sequence. Conceptually, for a sequence of length `seq_len`, the FFN is applied `seq_len` times, once per token (in practice this is computed as a single batched matrix operation). Importantly, all positions share the same set of network weights. This design keeps the processing of each position independent while, thanks to weight sharing, keeping the parameter count small. The network's structure is very simple, consisting of two linear transformations with a ReLU activation in between:
+
+$$\mathrm{FFN}(x)=\max\left(0, xW_{1}+b_{1}\right) W_{2}+b_{2}$$
+
+Here $x$ is the output of the attention sublayer, and $W_1,b_1,W_2,b_2$ are learnable parameters. Typically, the output dimension `d_ff` of the first linear layer is much larger than the input dimension `d_model` (for example, `d_ff = 4 * d_model`); after ReLU activation, the second linear layer maps the result back to `d_model` dimensions. This "expand then shrink" pattern (the reverse of a classic bottleneck, which narrows in the middle) is believed to help the model learn richer feature representations.
+
+In our PyTorch skeleton, we can implement this module with the following code:
+
+```Python
+class PositionWiseFeedForward(nn.Module):
+    """
+    Position-wise feed-forward network module
+    """
+    def __init__(self, d_model, d_ff, dropout=0.1):
+        super(PositionWiseFeedForward, self).__init__()
+        self.linear1 = nn.Linear(d_model, d_ff)
+        self.dropout = nn.Dropout(dropout)
+        self.linear2 = nn.Linear(d_ff, d_model)
+        self.relu = nn.ReLU()
+
+    def forward(self, x):
+        # x shape: (batch_size, seq_len, d_model)
+        x = self.linear1(x)
+        x = self.relu(x)
+        x = self.dropout(x)
+        x = self.linear2(x)
+        # Final output shape: (batch_size, seq_len, d_model)
+        return x
+```
+
+**(4) Residual Connections and Layer Normalization**
+
+In each encoder and decoder layer of Transformer, all submodules (such as multi-head attention and feed-forward networks) are wrapped by an `Add & Norm` operation. This combination ensures that Transformer can train stably.
+
+This operation consists of two parts:
+
+- **Residual Connection (Add)**: This operation directly adds the submodule's input `x` to the submodule's output `Sublayer(x)`. This structure mitigates the **Vanishing Gradients** problem in deep neural networks. During backpropagation, gradients can bypass the submodule through the identity path and flow directly back to earlier layers, so the model can still be trained effectively even when the network has many layers. Its formula can be expressed as: $\text{Output} = x + \text{Sublayer}(x)$.
+- **Layer Normalization (Norm)**: This operation normalizes the features of each token vector so that they have mean 0 and variance 1, and then applies a learnable scale and shift. It is commonly credited with reducing **Internal Covariate Shift** during training, keeping the input distribution of each layer stable, which accelerates convergence and improves training stability.
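+
+To make the two parts concrete, here is a minimal pure-Python sketch of `Add & Norm` applied to a single token vector. It assumes the post-norm ordering `LayerNorm(x + Sublayer(x))` and leaves out the learnable scale and shift; `toy_sublayer` is a hypothetical stand-in for attention or the FFN, not part of our skeleton.
+
+```Python
+import math
+
+def layer_norm(x, eps=1e-5):
+    """Normalize one token vector to mean 0 and variance 1 (no learnable scale/shift)."""
+    mean = sum(x) / len(x)
+    var = sum((v - mean) ** 2 for v in x) / len(x)
+    return [(v - mean) / math.sqrt(var + eps) for v in x]
+
+def add_and_norm(x, sublayer):
+    """Residual connection followed by layer normalization: LayerNorm(x + Sublayer(x))."""
+    summed = [a + b for a, b in zip(x, sublayer(x))]
+    return layer_norm(summed)
+
+def toy_sublayer(x):
+    """Hypothetical sublayer used only for illustration: doubles every feature."""
+    return [2.0 * v for v in x]
+
+token = [1.0, 2.0, 3.0, 4.0]
+out = add_and_norm(token, toy_sublayer)
+print(out)
+```
+
+Whatever the sublayer does, the output of this wrapper has (approximately) zero mean and unit variance across the feature dimension, which is what keeps the inputs to the next layer on a stable scale.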
+
+**3.1.2.5 Positional Encoding**
+
+We already understand that the core of Transformer is the self-attention mechanism, which captures dependencies by calculating relationships between any two tokens in a sequence. However, this computation method has an inherent problem: it does not contain any information about token order or position. For self-attention, the two sequences "agent learns" and "learns agent" are completely equivalent because it only cares about relationships between tokens and ignores their arrangement. To solve this problem, Transformer introduced **Positional Encoding**.
+
+The core idea of positional encoding is to add to each token embedding vector in the input sequence an additional "position vector" that carries its absolute and relative position information. In the original Transformer, this position vector is not learned but calculated directly through a fixed mathematical formula. This way, even if two tokens (for example, two tokens both called `agent`) have the same embedding, the vectors they ultimately feed into the Transformer model differ because different positional encodings are added at their different positions in the sentence. The positional encoding proposed in the original paper is generated with sine and cosine functions, using the following formulas:
+
+$$PE_{(pos,2i)}=\sin\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right),$$
+
+$$PE_{(pos,2i+1)}=\cos\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$
+
+Where:
+
+- $pos$ is the position of the token in the sequence (for example, $0$, $1$, $2$, ...)
+- $i$ is the dimension index in the position vector (from $0$ to $d_{\text{model}}/2 - 1$)
+- $d_{\text{model}}$ is the dimension of the word embedding vector (consistent with what we defined in the model)
+
+Now, let's implement the `PositionalEncoding` module and complete the last part of our Transformer skeleton code.
+
+```Python
+class PositionalEncoding(nn.Module):
+    """
+    Add positional encoding to word embedding vectors of input sequence.
+    """
+    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
+        super().__init__()
+        self.dropout = nn.Dropout(p=dropout)
+
+        # Create a sufficiently long positional encoding matrix
+        position = torch.arange(max_len).unsqueeze(1)
+        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
+
+        # pe (positional encoding) size is (1, max_len, d_model), so it broadcasts over the batch dimension
+        pe = torch.zeros(1, max_len, d_model)
+
+        # Even dimensions use sin, odd dimensions use cos
+        pe[0, :, 0::2] = torch.sin(position * div_term)
+        pe[0, :, 1::2] = torch.cos(position * div_term)
+
+        # Register pe as buffer, so it won't be treated as model parameter but will move with the model (e.g., to(device))
+        self.register_buffer('pe', pe)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        # x shape: (batch_size, seq_len, d_model), so x.size(1) is the current input sequence length
+        # Add positional encoding to input vector
+        x = x + self.pe[:, :x.size(1)]
+        return self.dropout(x)
+```
+
+This subsection is mainly intended to help you understand the macro structure of Transformer and the operational details of each internal module. Since its purpose is to supplement the large-model knowledge needed for agent learning, we won't extend the implementation further. At this point, we have laid a solid architectural foundation for understanding modern large language models. In the next section, we will explore the Decoder-Only architecture and see how it evolved from Transformer's ideas.
+
+### 3.1.3 Decoder-Only Architecture
+
+In the previous section, we built the skeleton of a complete Transformer model by hand. This architecture performs excellently in many end-to-end scenarios. But when the task shifts to building a general model that can converse with people, create, and serve as an agent's brain, perhaps we don't need such a complex structure.
+
+Transformer's design philosophy is "understand first, then generate." The encoder is responsible for deeply understanding the entire input sentence, forming a contextual memory containing global information, then the decoder generates translation based on this memory. But when OpenAI developed **GPT (Generative Pre-trained Transformer)**, they proposed a simpler idea<sup>[5]</sup>: **Isn't the core task of language to predict the next most likely word?**
+
+Whether answering questions, writing stories, or generating code, essentially it's adding the most reasonable content word by word after an existing text sequence. Based on this idea, GPT made a bold simplification: **It completely abandoned the encoder and only kept the decoder part.** This is the origin of the **Decoder-Only** architecture.
+
+The working mode of the Decoder-Only architecture is called **Autoregressive**. This professional-sounding term actually describes a very simple process:
+
+1. Give the model a starting text (for example, "Datawhale Agent is").
+2. The model predicts the next most likely word (for example, "a").
+3. The model adds the word "a" it just generated to the end of the input text, forming a new input ("Datawhale Agent is a").
+4. Based on this new input, the model predicts the next word again (for example, "powerful").
+5. Continuously repeat this process until a complete sentence is generated or a stop condition is reached.
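+
+The loop described above can be sketched in a few lines of Python. This is only an illustration: `TOY_NEXT_WORD` is a hypothetical lookup table standing in for a real language model, and `<eos>` is an assumed stop token.
+
+```Python
+# Hypothetical stand-in for a language model: maps the last word to the "most likely" next word.
+TOY_NEXT_WORD = {
+    "is": "a",
+    "a": "powerful",
+    "powerful": "assistant",
+    "assistant": "<eos>",
+}
+
+def predict_next(tokens):
+    """Predict the next token from the sequence so far (here: from the last token only)."""
+    return TOY_NEXT_WORD.get(tokens[-1], "<eos>")
+
+def generate(prompt_tokens, max_new_tokens=10):
+    """Autoregressive loop: predict, append the prediction to the input, repeat."""
+    tokens = list(prompt_tokens)
+    for _ in range(max_new_tokens):
+        next_token = predict_next(tokens)
+        if next_token == "<eos>":  # stop condition reached
+            break
+        tokens.append(next_token)  # the output becomes part of the next input
+    return tokens
+
+result = generate(["Datawhale", "Agent", "is"])
+print(" ".join(result))
+```
+
+A real model conditions on the whole sequence rather than just the last word, but the control flow (predict, append, repeat until a stop condition) is the same.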
+
+The model is like playing a "word chain" game, constantly "reviewing" the content it has already written, then thinking about what the next word should be.
+
+You might ask, how does the decoder ensure that when predicting the `t`-th word, it doesn't "peek" at the answer of the `t+1`-th word?
+
+The answer is **Masked Self-Attention**. In the Decoder-Only architecture, this mechanism becomes crucial. Its working principle is very clever:
+
+After the self-attention mechanism calculates the attention score matrix (i.e., each token's attention score with respect to every token in the sequence), but before performing Softmax normalization, the model applies a "mask." This mask replaces the scores corresponding to all tokens located after the current position (i.e., not yet observed) with negative infinity (in practice, a very large negative number). When the masked matrix goes through the Softmax function, the probabilities at these positions become 0. This way, when the model calculates the output at any position, it is mathematically prevented from attending to information after it. This mechanism ensures that when predicting the next word, the model relies only on information at or before the current position, which keeps training consistent with left-to-right generation and keeps the output logically coherent.
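+
+As a small illustration of this masking step, the sketch below applies a causal mask to a toy 3×3 score matrix and then runs Softmax row by row. The scores are made-up numbers and the function names are ours; a real implementation would do the same thing with tensor operations.
+
+```Python
+import math
+
+NEG_INF = float("-inf")
+
+def softmax(row):
+    """Numerically stable softmax over one row of scores."""
+    m = max(row)
+    exps = [math.exp(v - m) for v in row]  # exp(-inf) evaluates to 0.0
+    total = sum(exps)
+    return [e / total for e in exps]
+
+def causal_attention_weights(scores):
+    """Mask out positions j > i (the 'future'), then normalize each row."""
+    n = len(scores)
+    masked = [
+        [scores[i][j] if j <= i else NEG_INF for j in range(n)]
+        for i in range(n)
+    ]
+    return [softmax(row) for row in masked]
+
+# Toy attention scores: row i holds the scores of token i against tokens 0..2
+scores = [
+    [0.5, 1.0, 2.0],
+    [1.5, 0.2, 0.3],
+    [0.1, 0.4, 0.9],
+]
+weights = causal_attention_weights(scores)
+for row in weights:
+    print([round(w, 3) for w in row])
+```
+
+The first row puts all of its weight on position 0, and every entry above the diagonal is exactly 0, so no position can draw information from tokens that come after it.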
+
+**Advantages of Decoder-Only Architecture**
+
+This seemingly simple architecture has brought tremendous success, with advantages including:
+
+- **Unified Training Objective**: The model's only task is to "predict the next word," a simple goal very suitable for pre-training on massive unlabeled text data.
+- **Simple Structure, Easy to Scale**: Fewer components mean easier scaling. Many of today's largest models, such as the GPT series and Llama, with tens to hundreds of billions of parameters or more, are based on this concise architecture.
+- **Naturally Suited for Generation Tasks**: Its autoregressive working mode perfectly matches all generative tasks (dialogue, writing, code generation, etc.), which is also the core reason it can become the foundation for building general agents.
+
+In summary, the Decoder-Only architecture, which evolved from Transformer's decoder, opened the era of large language models we are in today through the simple paradigm of "predicting the next word."
+
+## 3.2 Interacting with Large Language Models
+
+### 3.2.1 Prompt Engineering
+
+If we compare large language models to an extremely capable "brain," then **Prompt** is the language we use to communicate with this "brain." Prompt engineering is the study of how to design precise prompts to guide the model to produce the responses we expect. For building agents, a carefully designed prompt can make collaboration and division of labor between agents efficient.
+
+**(1) Model Sampling Parameters**
+
+When using large models, you often see configurable parameters like `Temperature`. In essence, they adjust how the model samples from its output probability distribution so that it matches the needs of a specific scenario. Configuring appropriate parameters can improve Agent performance in specific scenarios.
+
+The traditional probability distribution is calculated by the Softmax formula: $p_i = \frac{e^{z_i}}{\sum_{j=1}^k e^{z_j}}$. The essence of sampling parameters is to "readjust" or "truncate" the distribution based on different strategies, thereby changing the next token output by the large model.
+
+`Temperature`: Temperature is a key parameter controlling the "randomness" and "determinism" of model output. Its principle is to introduce a temperature coefficient $T\gt0$, rewriting Softmax as $p_i^{(T)} = \frac{e^{z_i / T}}{\sum_{j=1}^k e^{z_j / T}}$.
+
+When T decreases, the distribution becomes "steeper," high-probability item weights are further amplified, generating more "conservative" text with higher repetition rates. When T increases, the distribution becomes "flatter," low-probability item weights increase, generating more "diverse" but possibly incoherent content.
+
+- Low temperature (0 $\leqslant$ Temperature $\lt$ 0.3): Output is more "precise, deterministic." Applicable scenarios include factual tasks (such as Q&A, data calculation, and code generation) and rigorous scenarios (such as legal text interpretation, technical documentation writing, and academic concept explanation).
+
+- Medium temperature (0.3 $\leqslant$ Temperature $\lt$ 0.7): Output is "balanced, natural." Applicable scenarios include daily conversation (such as customer service interaction and chatbots) and regular creation (such as email writing, product copy, and simple story creation).
+
+- High temperature (0.7 $\leqslant$ Temperature $\lt$ 2): Output is "innovative, divergent." Applicable scenarios include creative tasks (such as poetry creation, science fiction story conception, advertising slogan brainstorming, and artistic inspiration) and divergent thinking.
+
+`Top-k`: Its principle is to sort all tokens by probability from high to low, take the top k tokens to form a "candidate set," then "normalize" the probabilities of the filtered k tokens: $\hat{p}_i = \frac{p_i}{\sum_{j \in \text{candidate set}} p_j}$.
+
+- Difference and connection with temperature sampling: Temperature sampling adjusts the probability distribution of all tokens (smooth or steep) through temperature T, without changing the number of candidate tokens (still considering all N). Top-k sampling limits the number of candidate tokens (only keeping the top k high-probability tokens) through the k value, then samples from them. When k=1, output is completely deterministic, degenerating to "greedy sampling."
+
+`Top-p`: Its principle is to sort all tokens by probability from high to low, starting from the first token after sorting, gradually accumulating probabilities until the cumulative sum first reaches or exceeds threshold p: $\sum_{i \in S} p_{(i)} \geq p$. At this point, all tokens included in the accumulation process form the "nucleus set," and finally the nucleus set is normalized.
+
+- Difference and connection with Top-k: Compared to Top-k with fixed truncation size, Top-p can dynamically adapt to the "long tail" characteristics of different distributions, with better adaptability to extreme cases of uneven probability distribution.
+
+In text generation, when Top-p, Top-k, and the temperature coefficient are set simultaneously, they act as successive filters, typically applied in the order temperature adjustment → Top-k → Top-p. Temperature adjusts the overall steepness of the distribution, Top-k then retains the k candidates with the highest probability, and Top-p finally selects, from the Top-k results, the smallest set whose cumulative probability is ≥ p as the final candidate set. In practice, choosing one of Top-k or Top-p is usually sufficient; if both are set, only tokens that pass both filters remain candidates.
+
+Note that if temperature is set to 0 (the formula above requires $T\gt0$, so implementations usually treat 0 as greedy selection), Top-k and Top-p become irrelevant because the most likely Token will always be the next predicted Token. Likewise, if Top-k is set to 1, temperature and Top-p become irrelevant because only one Token passes the Top-k criterion and it will be the next predicted Token.
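+
+The following pure-Python sketch puts the three parameters together in the order described above. It is an illustration under our own assumptions rather than the code of any particular inference library: the function names are hypothetical, the logits are made-up numbers, and Top-p is applied to the already renormalized Top-k result.
+
+```Python
+import math
+import random
+
+def softmax_with_temperature(logits, temperature):
+    """Softmax over logits divided by T (requires T > 0)."""
+    scaled = [z / temperature for z in logits]
+    m = max(scaled)
+    exps = [math.exp(z - m) for z in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+def renormalize(probs, keep):
+    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
+    keep = set(keep)
+    total = sum(probs[i] for i in keep)
+    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]
+
+def top_k_filter(probs, k):
+    """Keep only the k most probable tokens."""
+    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
+    return renormalize(probs, order[:k])
+
+def top_p_filter(probs, p):
+    """Keep the smallest prefix of sorted tokens whose cumulative probability reaches p."""
+    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
+    keep, cumulative = [], 0.0
+    for i in order:
+        keep.append(i)
+        cumulative += probs[i]
+        if cumulative >= p:
+            break
+    return renormalize(probs, keep)
+
+def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
+    """Temperature -> Top-k -> Top-p, then sample one token index."""
+    if temperature == 0:  # treated as greedy selection
+        return max(range(len(logits)), key=lambda i: logits[i])
+    probs = softmax_with_temperature(logits, temperature)
+    if top_k is not None:
+        probs = top_k_filter(probs, top_k)
+    if top_p is not None:
+        probs = top_p_filter(probs, top_p)
+    return random.choices(range(len(probs)), weights=probs, k=1)[0]
+
+logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
+print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9))
+```
+
+With `temperature=0` or `top_k=1` the function always returns index 0 for these logits, which matches the two special cases noted above.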
+
+**(2) Zero-shot, One-shot, and Few-shot Prompting**
+
+According to the number of examples (Exemplars) we provide to the model, prompts can be divided into three types. To better understand them, let's use a sentiment classification task as an example, with the goal of having the model judge the emotional tone of a text (such as positive, negative, or neutral).
+
+**Zero-shot Prompting** This means we don't give the model any examples and directly ask it to complete the task based on instructions. This benefits from the model's powerful generalization ability acquired after pre-training on massive data.
+
+Case: We give the model the task directly, without any demonstration; the label after `Sentiment:` is the answer the model is expected to produce.
+
+```Plain
+Text: Datawhale's AI Agent course is excellent!
+Sentiment: Positive
+```
+
+**One-shot Prompting** We provide the model with one complete example, showing it the task format and expected output style.
+
+Case: We first give the model a complete "question-answer" pair as a demonstration, then pose our new question.
+
+```Plain
+Text: This restaurant's service is too slow.
+Sentiment: Negative
+
+Text: Datawhale's AI Agent course is excellent!
+Sentiment:
+```
+
+The model will imitate the given example format and complete "Positive" for the second text.
+
+**Few-shot Prompting** We provide multiple examples, which allows the model to more accurately understand the task's details, boundaries, and nuances, thereby achieving better performance.
+
+Case: We provide multiple examples covering different situations, allowing the model to have a more comprehensive understanding of the task.
+
+```Plain
+Text: This restaurant's service is too slow.
+Sentiment: Negative
+
+Text: This movie's plot is very bland.
+Sentiment: Neutral
+
+Text: Datawhale's AI Agent course is excellent!
+Sentiment:
+```
+
+The model will synthesize all examples and more accurately classify the sentiment of the last sentence as "Positive."
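+
+In agent code, such prompts are usually assembled programmatically rather than typed by hand. The helper below is a minimal sketch of that idea; `build_few_shot_prompt` is a hypothetical name of our own, and the `Text:` / `Sentiment:` layout simply mirrors the cases above.
+
+```Python
+def build_few_shot_prompt(examples, query):
+    """Assemble a prompt from (text, label) examples plus the new text to classify.
+
+    Zero examples gives a zero-shot prompt, one gives one-shot, several give few-shot.
+    """
+    blocks = [f"Text: {text}\nSentiment: {label}" for text, label in examples]
+    blocks.append(f"Text: {query}\nSentiment:")  # leave the label for the model to fill in
+    return "\n\n".join(blocks)
+
+examples = [
+    ("This restaurant's service is too slow.", "Negative"),
+    ("This movie's plot is very bland.", "Neutral"),
+]
+prompt = build_few_shot_prompt(examples, "Datawhale's AI Agent course is excellent!")
+print(prompt)
+```
+
+Keeping the examples in a list makes it easy to add, remove, or swap demonstrations when the model misclassifies a certain kind of input.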
+
+**(3) Impact of Instruction Tuning**
+
+Early GPT models (such as GPT-3) were mainly "text completion" models; they were good at continuing text based on preceding text but not necessarily good at understanding and executing human instructions.
+
+**Instruction Tuning** is a fine-tuning technique that uses a large amount of "instruction-answer" format data to further train pre-trained models. After instruction tuning, models can better understand and follow user instructions. Most of the chat models we use in daily work and study today (such as `ChatGPT`, `DeepSeek`, `Qwen`) are the instruction-tuned members of their model families.
+
+- **Prompts for "text completion" models (you need to use few-shot prompts to "teach" the model what to do):**
+
+```Plain
+This is a program that translates English to Chinese.
+English: Hello
+Chinese: 你好
+English: How are you?
+Chinese:
+```
+
+- **Prompts for "instruction-tuned" models (you can directly give instructions):**
+
+```Plain
+Please translate the following English to Chinese:
+How are you?
+```
+
+The emergence of instruction tuning has greatly simplified how we interact with models, making direct, clear natural language instructions possible.
+
+**(4) Basic Prompting Techniques**
+
+**Role-playing** By assigning the model a specific role, we can guide its response style, tone, and knowledge scope, making its output more suitable for specific scenario needs.
+
+```Plain
+# Case
+You are now a senior Python programming expert. Please explain what GIL (Global Interpreter Lock) is in Python in a way that even a beginner can understand.
+```
+
+**In-context Example** This is consistent with the idea of few-shot prompting. By providing clear input-output examples in the prompt, we "teach" the model how to handle our requests, which is especially effective when dealing with complex formats or specific style tasks.
+
+```Plain
+# Case
+I need you to extract product names and user sentiment from product reviews. Please output strictly in the JSON format below.
+
+Review: The screen display of this "Stardust" laptop is amazing, but I don't really like its keyboard feel.
+Output: {"product_name": "Stardust Laptop", "sentiment": "Mixed"}
+
+Review: The "SoundMove" headphones I just bought have great sound quality, and the battery life exceeded my expectations!
+Output:
+```
+
+**(5) Chain-of-Thought**
+
+For complex problems requiring logical reasoning, calculation, or multi-step thinking, directly asking the model to give an answer often leads to errors. **Chain-of-Thought (CoT)** is a powerful prompting technique that improves the model's reasoning ability on complex tasks by guiding it to "think step by step."
+
+The key to implementing CoT is to add a simple guiding phrase in the prompt, such as "please think step by step" or "Let's think step by step."
+
+```Plain
+# Chain-of-Thought Prompt
+A basketball team won 60% of their 80 games in one season. In the next season, they played 15 games and won 12. What is the total winning percentage for both seasons?
+Please think step by step and solve.
+
+>>>
+(The model might output)
+Okay, let's calculate step by step.
+Step 1: Calculate the number of games won in the first season.
+80 games * 60% = 48 games.
+Step 2: Calculate the total number of games and total wins for both seasons.
+Total games = 80 + 15 = 95 games.
+Total wins = 48 + 12 = 60 games.
+Step 3: Calculate the total winning percentage.
+Total winning percentage = (Total wins / Total games) * 100% = (60 / 95) * 100% ≈ 63.16%.
+So, the total winning percentage for both seasons is approximately 63.16%.
+```
+
+By explicitly showing its reasoning process, the model not only arrives at the correct answer more easily but also produces a response that is more credible and easier for us to check and correct.
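+
+Because the intermediate steps are written out, each one can be checked independently. As a quick sanity check of the example above, the same three steps can be reproduced in Python (the variable names are ours):
+
+```Python
+# Step 1: wins in the first season (60% of 80 games)
+first_season_wins = 80 * 60 // 100
+
+# Step 2: totals across both seasons
+total_games = 80 + 15
+total_wins = first_season_wins + 12
+
+# Step 3: overall winning percentage, rounded to two decimals
+win_pct = round(total_wins / total_games * 100, 2)
+
+print(first_season_wins, total_games, total_wins, win_pct)
+```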
+
+### 3.2.2 Text Tokenization
+
+We know that computers essentially can only understand numbers. Therefore, before feeding natural language text to large language models, it must first be converted into a numerical format that the model can process. This process of converting text sequences into numerical sequences is called **Tokenization**. The role of a **Tokenizer** is to define a set of rules to split raw text into minimal units, which we call **Tokens**.
+
+**3.2.2.1 Why Tokenization is Needed**
+
+Early natural language processing tasks might adopt simple tokenization strategies:
+
+- **Word-based**: Directly split sentences into words using spaces or punctuation. This method is intuitive but faces the problem of "vocabulary explosion." A language's vocabulary is huge; if each word is treated as an independent token, the vocabulary becomes difficult to manage. Worse, the model will be unable to handle any words not appearing in the vocabulary, such as "DatawhaleAgent."
+- **Character-based**: Split text into individual characters. This method has a very small vocabulary (e.g., English letters, numbers, and punctuation) and no OOV (Out-Of-Vocabulary) problem. But its disadvantage is that most individual characters don't have independent semantics, and the model needs to spend more effort learning how to combine characters into meaningful words, leading to low learning efficiency.
+
+To balance vocabulary size and semantic expression, modern large language models generally adopt **Subword Tokenization** algorithms. The core idea is: keep common words (such as "agent") as complete tokens while splitting uncommon words (such as "Tokenization") into multiple meaningful subword fragments (such as "Token" and "ization"). This both controls vocabulary size and allows the model to understand and generate new words by combining subwords.
+
+**3.2.2.2 Byte-Pair Encoding Algorithm Analysis**
+
+Byte-Pair Encoding (BPE) is one of the most mainstream subword tokenization algorithms<sup>[6]</sup>, adopted by the GPT series models. Its core idea is very concise and can be understood as a "greedy" merging process:
+
+1. **Initialization**: Initialize the vocabulary to all basic characters appearing in the corpus.
+2. **Iterative Merging**: In the corpus, count the frequency of all adjacent token pairs, find the pair with the highest frequency, merge them into a new token, and add it to the vocabulary.
+3. **Repeat**: Repeat step 2 until the vocabulary size reaches a preset threshold.
+
+**Case Demonstration:** Suppose our mini corpus is `{"hug": 1, "pug": 1, "pun": 1, "bun": 1}`, and we want to build a vocabulary of size 10. The BPE training process can be represented by Table 3.1:
+
+<div align="center">
+  <p>Table 3.1 Example of BPE Algorithm Merging Process</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/3-figures/1757249275674-5.png" alt="Figure description" width="90%"/>
+</div>
+
+After training ends, when the vocabulary size reaches 10, we get new tokenization rules. Now, for an unseen word "bug," the tokenizer will first check if "bug" is in the vocabulary and find it's not; then check "bu" and find it's not; finally check "b" and "ug," find both are in, and thus split it into `['b', 'ug']`.
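+
+The lookup described in this paragraph can be sketched as a greedy longest-match search. Note the assumptions: the `vocab` set below is a hypothetical subset (the six base characters plus the merged tokens `ug` and `un`) rather than the exact contents of Table 3.1, and production BPE tokenizers usually replay the learned merges in order instead of searching for the longest match.
+
+```Python
+def tokenize(word, vocab):
+    """Split a word by repeatedly taking the longest prefix found in the vocabulary."""
+    tokens = []
+    start = 0
+    while start < len(word):
+        end = len(word)
+        # Shrink the candidate from the right until it is a known token
+        while end > start and word[start:end] not in vocab:
+            end -= 1
+        if end == start:
+            raise ValueError(f"Cannot tokenize {word!r}: unknown character {word[start]!r}")
+        tokens.append(word[start:end])
+        start = end
+    return tokens
+
+# Hypothetical vocabulary: base characters plus two merged tokens
+vocab = {"h", "u", "g", "p", "n", "b", "ug", "un"}
+
+print(tokenize("bug", vocab))
+print(tokenize("pun", vocab))
+```
+
+For "bug" the search rejects "bug" and "bu", accepts "b", and then accepts "ug" for the remainder, which reproduces the split described above.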
+
+Below we use a short Python script to simulate the above process:
+
+```Python
+import re, collections
+
+def get_stats(vocab):
+    """Count token pair frequencies"""
+    pairs = collections.defaultdict(int)
+    for word, freq in vocab.items():
+        symbols = word.split()
+        for i in range(len(symbols)-1):
+            pairs[symbols[i],symbols[i+1]] += freq
+    return pairs
+
+def merge_vocab(pair, v_in):
+    """Merge token pairs"""
+    v_out = {}
+    bigram = re.escape(' '.join(pair))
+    p = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
+    for word in v_in:
+        w_out = p.sub(''.join(pair), word)
+        v_out[w_out] = v_in[word]
+    return v_out
+
+# Prepare corpus, add </w> at the end of each word to indicate ending, and split characters
+vocab = {'h u g </w>': 1, 'p u g </w>': 1, 'p u n </w>': 1, 'b u n </w>': 1}
+num_merges = 4 # Set number of merges
+
+for i in range(num_merges):
+    pairs = get_stats(vocab)
+    if not pairs:
+        break
+    best = max(pairs, key=pairs.get)
+    vocab = merge_vocab(best, vocab)
+    print(f"Merge {i+1}: {best} -> {''.join(best)}")
+    print(f"New vocabulary (partial): {list(vocab.keys())}")
+    print("-" * 20)
+
+>>>
+Merge 1: ('u', 'g') -> ug
+New vocabulary (partial): ['h ug </w>', 'p ug </w>', 'p u n </w>', 'b u n </w>']
+--------------------
+Merge 2: ('ug', '</w>') -> ug</w>
+New vocabulary (partial): ['h ug</w>', 'p ug</w>', 'p u n </w>', 'b u n </w>']
+--------------------
+Merge 3: ('u', 'n') -> un
+New vocabulary (partial): ['h ug</w>', 'p ug</w>', 'p un </w>', 'b un </w>']
+--------------------
+Merge 4: ('un', '</w>') -> un</w>
+New vocabulary (partial): ['h ug</w>', 'p ug</w>', 'p un</w>', 'b un</w>']
+--------------------
+```
+
+This code clearly demonstrates how the BPE algorithm gradually builds and expands the vocabulary by iteratively merging the highest-frequency adjacent token pairs.
+
+Many later tokenization schemes build on or refine BPE. Among them, WordPiece and SentencePiece, both from Google, are two of the most influential.
+
+- **WordPiece**: The algorithm adopted by Google's BERT model<sup>[7]</sup>. It is very similar to BPE, but the criterion for merging tokens is not "highest frequency" but "maximizing the improvement of the corpus's language model probability." Simply put, it prioritizes merging token pairs that can maximize the "fluency" improvement of the entire corpus.
+- **SentencePiece**: An open-source tokenization tool by Google<sup>[8]</sup>, adopted by models such as Llama and Llama 2. Its biggest feature is treating spaces as ordinary characters (usually represented by the meta symbol `▁`, U+2581, which looks like an underscore). This makes the tokenization and decoding process completely reversible and independent of specific languages (for example, it doesn't need to know that Chinese doesn't use spaces for word segmentation).
+
+**3.2.2.3 Significance of Tokenizers for Developers**
+
+Understanding the details of tokenization algorithms is not the goal, but as an agent developer, understanding the actual impact of tokenizers is important, as it directly relates to agent performance, cost, and stability:
+
+- **Context Window Limitation**: The model's context window (such as 8K, 128K) is calculated in **Token count**, not character count or word count. The same text may have vastly different Token counts in different languages (such as Chinese and English) or with different tokenizers. Precisely managing input length and avoiding exceeding context limits is the foundation for building long-term memory agents.
+- **API Cost**: Most model APIs charge based on Token count. Understanding how your text will be tokenized is a key step in estimating and controlling agent operating costs.
+- **Model Performance Anomalies**: Sometimes strange model behavior stems from tokenization. For example, the model might be good at calculating `2 + 2` but might make mistakes with `2+2` (without spaces) because the latter might be treated by the tokenizer as an independent, uncommon token. Similarly, a word with different capitalization of the first letter might be split into completely different Token sequences, affecting the model's understanding. Considering these "traps" when designing prompts and parsing model outputs helps improve agent robustness.
+
+### 3.2.3 Calling Open-Source Large Language Models
+
+In Chapter 1 of this book, we interacted with large language models through APIs to drive our agents. This is a fast and convenient method, but not the only one. For many scenarios requiring sensitive data processing, offline operation, or fine cost control, deploying large language models directly locally becomes crucial.
+
+**Hugging Face Transformers** is a powerful open-source library that provides standardized interfaces to load and use tens of thousands of pre-trained models. We will use it to complete this practice.
+
+**Environment Configuration and Model Selection**: To ensure most readers can run the example smoothly on personal computers, we deliberately chose a small but capable model: `Qwen/Qwen1.5-0.5B-Chat`. This is a dialogue model with about 500 million parameters open-sourced by Alibaba Cloud's Qwen team. It is small, performs well for its size, and is very suitable for introductory learning and local deployment.
+
+First, please ensure you have installed the necessary libraries:
+
+```Plain
+pip install transformers torch
+```
+
+In the `transformers` library, we typically use the `AutoModelForCausalLM` and `AutoTokenizer` classes to automatically load weights and tokenizers matching the model. The following code will automatically download required model files and tokenizer configurations from Hugging Face Hub, which may take some time depending on your network speed.
+
+```Python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Specify model ID
+model_id = "Qwen/Qwen1.5-0.5B-Chat"
+
+# Set device, prioritize GPU
+device = "cuda" if torch.cuda.is_available() else "cpu"
+print(f"Using device: {device}")
+
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+# Load model and move it to specified device
+model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
+
+print("Model and tokenizer loaded!")
+```
+
+Let's create a dialogue prompt. The Qwen1.5-Chat model follows a specific dialogue template. Then, we can use the `tokenizer` loaded in the previous step to convert the text prompt into numerical IDs (i.e., Token IDs) that the model can understand.
+
+```Python
+# Prepare dialogue input
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Hello, please introduce yourself."}
+]
+
+# Use tokenizer's template to format input
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+
+# Encode input text
+model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+print("Encoded input text:")
+print(model_inputs)
+
+>>>
+{'input_ids': tensor([[151644, 8948, 198, 2610, 525, 264,  10950, 17847, 13,151645, 198, 151644, 872, 198, 108386, 37945, 100157, 107828,1773, 151645, 198, 151644, 77091, 198]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
+       device='cuda:0')}
+```
+
+Now we can call the model's `generate()` method to generate an answer. The model will output a series of Token IDs representing its answer.
+
+Finally, we need to use the tokenizer's `decode()` method to translate these numerical IDs back into human-readable text.
+
+```Python
+# Use model to generate answer
+# max_new_tokens controls the maximum number of new Tokens the model can generate
+generated_ids = model.generate(
+    **model_inputs,  # passes both input_ids and attention_mask
+    max_new_tokens=512
+)
+
+# Truncate the input part from generated Token IDs
+# This way we only decode the newly generated part by the model
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+# Decode generated Token IDs
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+print("\nModel's answer:")
+print(response)
+
+>>>
+My name is Tongyi Qianwen, a pre-trained language model developed by Alibaba Cloud. I can answer questions, create text, express opinions, and write code. My main functions are to provide help in multiple fields, including but not limited to: language understanding, text generation, machine translation, question-answering systems, etc. Is there anything I can help you with?
+```
+
+After running all the code, you will see the model-generated introduction about the Qwen model on your local computer. Congratulations, you have successfully deployed and run an open-source large language model locally!
+
+### 3.2.4 Model Selection
+
+In the previous section, we successfully ran a small open-source language model locally. This naturally raises a crucial question for agent developers: with so many models now available, how should we choose the most suitable one for a specific task?
+
+Choosing a language model is not simply a matter of pursuing "the biggest and the strongest"; it is a decision that balances performance, cost, speed, and deployment method. This section first summarizes several key considerations for model selection, then reviews the current mainstream closed-source and open-source models.
+
+Large language model technology is developing rapidly, with new models and versions emerging constantly. This section aims to give an overview of the mainstream models and selection considerations at the time of writing. Readers should note that the specific model versions and performance data mentioned here may change over time, and that the models listed are a selection rather than a complete survey. We focus on core technical characteristics, development trends, and general selection principles for agent development.
+
+**3.2.4.1 Key Considerations for Model Selection**
+
+When choosing a large language model for your agent, you can evaluate candidates along the following dimensions:
+
+- **Performance and Capability**: This is the core consideration. Different models excel at different tasks; some are good at logical reasoning and code generation, while others are better at creative writing or multilingual translation. You can refer to some public benchmark leaderboards (such as LMSys Chatbot Arena Leaderboard) to evaluate models' comprehensive capabilities.
+- **Cost**: For closed-source models, cost mainly takes the form of API call fees, usually charged by Token count. For open-source models, cost takes the form of the hardware (GPU, memory) and operations effort required for local deployment. The choice should be based on the application's expected usage and budget.
+- **Speed (Latency)**: For agents requiring real-time interaction (such as customer service, game NPCs), model response speed is crucial. Some lightweight or optimized models (such as GPT-3.5 Turbo, Claude 3.5 Sonnet) perform better in latency.
+- **Context Window**: The upper limit of Token count the model can process at once. For agents needing to understand long documents, analyze code repositories, or maintain long-term conversation memory, choosing a model with a larger context window (such as 128K Tokens or higher) is necessary.
+- **Deployment Method**: Using an API is the simplest and most convenient option, but data must be sent to a third party and usage is subject to the service provider's terms. Local deployment ensures data privacy and gives the highest degree of autonomy, but places higher demands on technical skills and hardware.
+- **Ecosystem and Toolchain**: A model's popularity also determines the maturity of its surrounding ecosystem. Mainstream models usually have richer community support, tutorials, pre-trained models, fine-tuning tools, and compatible development frameworks (such as LangChain, LlamaIndex, Hugging Face Transformers), which can greatly accelerate development and reduce difficulty. Choosing a model with an active community and complete toolchain makes it easier to find solutions and resources when encountering problems.
+- **Fine-tunability and Customization**: For agents needing to process domain-specific data or perform specific tasks, model fine-tuning capability is crucial. Some models provide convenient fine-tuning interfaces and tools, allowing developers to customize training using their own datasets, significantly improving model performance and accuracy in specific scenarios. Open-source models usually provide greater flexibility in this regard.
+- **Safety and Ethics**: With the widespread application of large language models, their potential safety risks and ethical issues are becoming increasingly prominent. When choosing a model, consider how it performs with respect to bias, toxicity, and hallucination, as well as the investment its service provider or open-source community has made in model safety and responsible AI. For applications that face the public or involve sensitive information, model safety and ethical compliance cannot be ignored.
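+
+The cost dimension in the list above can be made concrete with a back-of-the-envelope estimate. The following is a minimal sketch that is not tied to any real provider: the function name `monthly_api_cost`, the traffic figures, and the per-million-Token prices are all hypothetical values chosen for illustration.
+
+```Python
+def monthly_api_cost(requests_per_day, input_tokens, output_tokens,
+                     input_price_per_million, output_price_per_million,
+                     days=30):
+    """Estimate the monthly API fee, in the same currency as the prices."""
+    # Fee for one request: input and output Tokens are priced separately.
+    cost_per_request = (
+        input_tokens * input_price_per_million
+        + output_tokens * output_price_per_million
+    ) / 1_000_000
+    return cost_per_request * requests_per_day * days
+
+
+# Hypothetical workload: 10,000 requests per day, each with 1,000 input
+# Tokens and 300 output Tokens, priced at 1.0 and 4.0 per million Tokens.
+cost = monthly_api_cost(10_000, 1_000, 300, 1.0, 4.0)
+print(f"Estimated monthly cost: {cost:.2f}")
+```
+
+With these made-up numbers the estimate comes to about 660 per month. Substituting a provider's real price list and your own traffic forecast gives a first comparison against the hardware and operations cost of local deployment.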
+
+**3.2.4.2 Overview of Closed-Source Models**
+
+Closed-source models usually represent the cutting edge of current AI technology and provide stable, easy-to-use API services, making them the first choice for building high-performance agents.
+
+1. **OpenAI GPT Series**: From GPT-3, which opened the large-model era, to ChatGPT, which used RLHF (Reinforcement Learning from Human Feedback) to align with human intent, to GPT-4, which opened the multimodal era, OpenAI has continued to lead industry development. The latest GPT-5 further elevates multimodal capabilities and general intelligence, seamlessly processing text, audio, and image inputs and generating corresponding outputs, with significantly improved response speed and naturalness, especially in real-time voice dialogue.
+2. **Google Gemini Series**: Google DeepMind's Gemini models are representative of native multimodality. Their core feature is unified processing of text, code, audio, video, and images, and their ultra-long context windows give them an advantage in processing massive amounts of information. In the original lineup, Gemini Ultra is the most powerful model, suited to highly complex tasks; Gemini Pro covers a wide range of tasks with high performance and efficiency; Gemini Nano is optimized for on-device deployment. The latest Gemini 2.5 series, such as Gemini 2.5 Pro and Gemini 2.5 Flash, further improves reasoning capabilities and context windows. Gemini 2.5 Flash in particular offers faster inference and better cost-effectiveness, making it suitable for scenarios that require quick responses.
+3. **Anthropic Claude Series**: Anthropic is a company focused on AI safety and responsible AI. Its Claude models have prioritized safety from the design stage and are known for reliability in handling long documents, reducing harmful outputs, and following instructions, which has made them popular in enterprise applications. The Claude 3 series includes Claude 3 Opus (most intelligent, strongest performance), Claude 3 Sonnet (a balance of performance and speed), and Claude 3 Haiku (the fastest and most compact model, suitable for near real-time interaction). The latest Claude 4 series, such as Claude Opus 4, has made significant progress in general intelligence, complex reasoning, and code generation, further improving its handling of long contexts and multimodal tasks.
+4. **Mainstream Chinese Models**: Many competitive closed-source models have emerged in China, represented by Baidu ERNIE Bot, Tencent Hunyuan, Huawei Pangu-α, iFlytek SparkDesk, and Moonshot AI's Kimi. These models have natural advantages in Chinese-language processing and are deeply integrated into local industries.
+
+**3.2.4.3 Overview of Open-Source Models**
+
+Open-source models provide developers with the highest degree of flexibility, transparency, and autonomy, catalyzing a prosperous community ecosystem. They allow developers to deploy locally, perform customized fine-tuning, and have complete model control.
+
+- **Meta Llama Series**: Meta's Llama series is an important milestone in open-source large language models. With excellent overall performance, open licensing, and strong community support, it has become the foundation for many derivative projects and research efforts. The Llama 4 series, announced in April 2025, is Meta's first to adopt a Mixture of Experts (MoE) architecture, which improves computational efficiency by activating only a subset of the model's parameters for each token. The series includes three distinctly positioned models: Llama 4 Scout supports a 10 million token context window and is designed for long document analysis and mobile deployment; Llama 4 Maverick focuses on multimodal capabilities and excels in coding, complex reasoning, and multilingual support; Llama 4 Behemoth, which Meta previewed while it was still in training, is reported by Meta to outperform competitors on multiple STEM benchmarks and is the most powerful model in the series.
+- **Mistral AI Series**: Mistral AI, based in France, is known for its "small size, high performance" model design. Its latest model, Mistral Medium 3.1, was released in August 2025, with significantly improved accuracy and response speed in tasks such as code generation, STEM reasoning, and cross-domain Q&A, and reported benchmark performance superior to comparable models such as Claude 3.7 Sonnet and Llama 4 Maverick. It has native multimodal capabilities, can process mixed image and text inputs, and has a built-in "tone adaptation layer" that helps enterprises produce brand-aligned outputs.
+- **Chinese Open-Source Models**: Chinese companies and research institutions are also actively embracing open source, for example Alibaba's **Qwen (Tongyi Qianwen)** series and the **ChatGLM** series developed by Zhipu AI in collaboration with Tsinghua University. They provide strong Chinese-language capabilities and have built active communities.
+
+For agent developers, closed-source models provide "out-of-the-box" convenience, while open-source models grant us "customization freedom." Understanding the characteristics and representative models of these two camps is the first step in making wise technical selections for our agent projects.
+
+## 3.3 Scaling Laws and Limitations of Large Language Models
+
+Large Language Models (LLMs) have made remarkable progress in recent years, with continuously expanding capability boundaries and increasingly rich application scenarios. However, behind these achievements lies a deep understanding of the relationship between model scale, data volume, and computational resources, namely **Scaling Laws**. Meanwhile, as an emerging technology, LLMs also face many challenges and limitations. This section will deeply explore these core concepts, aiming to help readers comprehensively understand LLMs' capability boundaries, thereby leveraging strengths and avoiding weaknesses when building agents.
+
+### 3.3.1 Scaling Laws
+
+**Scaling Laws** are one of the most important discoveries in the large language model field in recent years. They reveal predictable power-law relationships between model performance and three factors: model parameter count, training data volume, and computational resources. This discovery provides theoretical guidance for the continued development of large language models, showing that increased resource investment systematically improves model performance.
+
+Research found that model performance (usually measured by Loss) follows a smooth power law in each of the three factors: parameter count, data volume, and computation<sup>[9]</sup>. Plotted on log-log axes, each relationship appears as an approximately straight line. Simply put, as long as we keep increasing these three elements in proportion, model performance improves predictably and smoothly without obvious bottlenecks. This provides clear guidance for large model design and training: within resource constraints, make model scale and training data volume as large as possible.
+
+Early research focused more on increasing model parameter count, but the "Chinchilla Law" proposed by DeepMind in 2022 made an important correction<sup>[10]</sup>. It points out that, under a given computational budget, **there is an optimal ratio between model parameter count and training data volume**. Specifically, optimal models should be smaller than previously believed but trained on much more data. For example, the 70-billion-parameter Chinchilla model outperforms GPT-3 (175 billion parameters) because it was trained on more than 4 times as much data. This finding corrected the one-sided view that "bigger is better," emphasized the importance of data efficiency, and guided the design of many subsequent efficient large models (such as the Llama series).
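+
+To make the idea of an optimal ratio concrete, here is a rough sketch of how a compute budget might be split between parameters and data. It relies on two widely quoted approximations that are assumptions here rather than statements from this chapter: training compute is roughly $C \approx 6ND$ FLOPs for $N$ parameters and $D$ training Tokens, and the compute-optimal data volume is roughly 20 Tokens per parameter. The helper name `compute_optimal_split` is hypothetical, and the 1.4 trillion Token figure is Chinchilla's reported training set size.
+
+```Python
+import math
+
+FLOPS_PER_PARAM_TOKEN = 6  # assumption: C ≈ 6 * N * D
+TOKENS_PER_PARAM = 20      # assumption: rule of thumb derived from Chinchilla
+
+
+def compute_optimal_split(compute_budget_flops):
+    """Return (parameters, training Tokens) for a given FLOPs budget."""
+    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N^2.
+    n_params = math.sqrt(
+        compute_budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
+    )
+    n_tokens = TOKENS_PER_PARAM * n_params
+    return n_params, n_tokens
+
+
+# Budget equivalent to training a 70B-parameter model on 1.4T Tokens.
+budget = FLOPS_PER_PARAM_TOKEN * 70e9 * 1.4e12
+n, d = compute_optimal_split(budget)
+print(f"parameters: {n:.3g}, training Tokens: {d:.3g}")
+```
+
+Under these assumptions the budget maps back to roughly 70 billion parameters and 1.4 trillion Tokens. A 175-billion-parameter model trained with the same rule of thumb would call for about 3.5 trillion Tokens, which illustrates why earlier large models are often described as under-trained.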
+
+The most surprising product of scaling laws is "capability emergence": when model scale reaches a certain threshold, the model suddenly exhibits new capabilities that are absent or weak in smaller models. For example, **Chain-of-Thought**, **Instruction Following**, multi-step reasoning, code generation, and other capabilities only appeared clearly once model parameter counts reached tens or even hundreds of billions. This phenomenon suggests that large language models are not simply memorizing and reciting; they may have formed deeper abstraction and reasoning capabilities during learning. For agent developers, capability emergence means that choosing a sufficiently large model is a prerequisite for complex autonomous decision-making and planning.
+
+### 3.3.2 Model Hallucination
+
+**Model Hallucination** usually refers to content generated by large language models that contradicts objective facts, user input, or contextual information, or that describes non-existent facts, entities, or events. In essence, the model over-confidently "fabricates" information during generation rather than accurately retrieving it or reasoning about it. Hallucinations can be divided into several types according to how they manifest<sup>[11]</sup>, such as:
+
+- **Factual Hallucinations**: Models generate information inconsistent with real-world facts.
+- **Faithfulness Hallucinations**: In tasks like text summarization and translation, generated content fails to faithfully reflect source text meaning.
+- **Intrinsic Hallucinations**: Model-generated content directly contradicts input information.
+
+Hallucinations result from several factors working together. First, training data may contain erroneous or contradictory information. Second, the model's autoregressive generation mechanism only predicts the most likely next token and has no built-in fact-checking module. Finally, when facing tasks that require complex reasoning, models may make errors in the logical chain and thus "fabricate" wrong conclusions. For example, a travel planning Agent might recommend a non-existent scenic spot or book a ticket with an incorrect flight number.
+
+Additionally, large language models face challenges such as outdated knowledge and biases in training data. A model's capabilities come from its training data, so its knowledge only extends to the point at which that data was collected. The model cannot perceive or correctly answer questions about events, newly emerged concepts, or facts that appeared after this cutoff. Meanwhile, training data often contains biases and stereotypes from human society, and models trained on this data inevitably absorb and reflect them<sup>[12]</sup>.
+
+To improve large language model reliability, researchers and developers are actively exploring multiple methods to detect and mitigate hallucinations:
+
+1. **Data Level**: Reduce hallucinations from the source through high-quality data cleaning, introducing factual knowledge, and Reinforcement Learning from Human Feedback (RLHF)<sup>[13]</sup>.
+2. **Model Level**: Explore new model architectures or enable models to express uncertainty about generated content.
+3. **Inference and Generation Level**:
+   1. **Retrieval-Augmented Generation (RAG)**<sup>[14]</sup>: This is currently one of the effective methods to mitigate hallucinations. RAG systems retrieve relevant information from external knowledge bases (such as document databases, web pages) before generation, then use retrieved information as context to guide models to generate fact-based answers.
+   2. **Multi-step Reasoning and Verification**: Guide models to perform multi-step reasoning and conduct self-checking or external verification at each step.
+   3. **Introducing External Tools**: Allow models to call external tools (such as search engines, calculators, code interpreters) to obtain real-time information or perform precise calculations.
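+
+The retrieve-then-generate flow of RAG described in the list above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: retrieval uses simple word overlap instead of a vector database, the helper names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical, the knowledge base entries are made-up sample data, and the final call to a language model is left out.
+
+```Python
+import re
+
+
+def tokenize(text):
+    """Lowercase the text and split it into a set of alphanumeric words."""
+    return set(re.findall(r"[a-z0-9]+", text.lower()))
+
+
+def retrieve(query, documents, top_k=1):
+    """Rank documents by how many words they share with the query."""
+    query_words = tokenize(query)
+    ranked = sorted(
+        documents,
+        key=lambda doc: len(query_words & tokenize(doc)),
+        reverse=True,
+    )
+    return ranked[:top_k]
+
+
+def build_prompt(query, documents, top_k=1):
+    """Place the retrieved passages in front of the question as context."""
+    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
+    return (
+        "Answer using only the context below. "
+        "If the context is insufficient, say you do not know.\n"
+        f"Context:\n{context}\n"
+        f"Question: {query}\nAnswer:"
+    )
+
+
+# Made-up sample knowledge base for a travel planning Agent.
+knowledge_base = [
+    "Flight CA1501 departs Beijing at 08:00 and arrives in Shanghai at 10:15.",
+    "The Forbidden City is closed to visitors on Mondays.",
+]
+prompt = build_prompt("Is the Forbidden City open on Mondays?", knowledge_base)
+print(prompt)
+```
+
+The resulting prompt would then be passed to the model, for example through the `apply_chat_template()` and `generate()` flow from Section 3.2.3. Because the answer is grounded in retrieved text, the model has less room to invent an opening schedule or a flight number.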
+
+Although hallucination problems are difficult to completely eliminate in the short term, through the above strategies, their occurrence frequency and impact can be significantly reduced, improving large language model reliability and practicality in actual applications.
+
+## 3.4 Chapter Summary
+
+This chapter introduced foundational knowledge needed for building agents, focusing on large language models (LLMs) as their core component. Content started from early language model development, detailed the Transformer architecture, and introduced methods for interacting with LLMs. Finally, this chapter organized current mainstream model ecosystems, development patterns, and their inherent limitations.
+
+**Core Knowledge Review:**
+
+- **Model Evolution and Core Architecture**: This chapter traced from statistical language models (N-gram) to neural network models (RNN, LSTM), to the Transformer architecture that laid the foundation for modern LLMs. Through "top-down" code implementation, this chapter dissected Transformer's core components and explained the self-attention mechanism's key role in parallel computation and capturing long-distance dependencies.
+- **Interaction Methods with Models**: This chapter introduced two core aspects of interacting with LLMs: Prompt Engineering and Tokenization. The former guides model behavior, the latter is the foundation for understanding model input processing. Through practice of deploying and running open-source models locally, theoretical knowledge was applied to actual operations.
+- **Model Ecosystem and Selection**: This chapter systematically organized key factors to weigh when choosing models for agents and overviewed characteristics and positioning of closed-source models represented by OpenAI GPT and Google Gemini and open-source models represented by Llama and Mistral.
+- **Laws and Limitations**: This chapter explored scaling laws driving LLM capability improvement and explained underlying principles. Meanwhile, this chapter also analyzed models' inherent limitations such as factual hallucinations and outdated knowledge, which is crucial for building reliable, robust agents.
+
+**From LLM Foundations to Building Agents:**
+
+The LLM foundations in this chapter are mainly intended to help you understand how large models came about and developed, and they also bear directly on agent design. For example, how to design effective prompts to guide an Agent's planning and decision-making, how to choose an appropriate model based on task requirements, and how to add verification mechanisms to Agent workflows to guard against model hallucinations: the answers to all of these build on this chapter's foundation. We are now ready to move from theory to practice. In the next chapter, we will begin building classic agent paradigms, applying the knowledge from this chapter to actual agent design.
+
+## Exercises
+
+1. In natural language processing, language models have evolved from statistical to neural network models.
+
+   - Please use the mini corpus provided in this chapter (`datawhale agent learns`, `datawhale agent works`) to calculate the probability of the sentence `agent works` under the Bigram model.
+   - The core assumption of N-gram models is the Markov assumption. Please explain what this assumption means and what fundamental limitations N-gram models have.
+   - How do neural network language models (RNN/LSTM) and the Transformer each overcome the limitations of N-gram models? What are their respective advantages?
+
+2. The Transformer architecture<sup>[4]</sup> is the foundation of modern large language models. Consider the following questions:
+
+   > **Hint**: You can refer to the code implementation in Section 3.1.2 of this chapter to aid understanding
+
+   - What is the core idea of the Self-Attention mechanism?
+   - Why can Transformer process sequences in parallel while RNN must process serially? What role does Positional Encoding play?
+   - What is the difference between Decoder-Only architecture and complete Encoder-Decoder architecture? Why do current mainstream large language models all adopt Decoder-Only architecture?
+
+3. Text subword tokenization algorithms are a key technology for large language models, responsible for converting text into token sequences the model can process. Why can't we directly use "characters" or "words" as model input units? What problem does the BPE (Byte Pair Encoding) algorithm solve?
+
+4. Section 3.2.3 of this chapter introduced how to deploy open-source large language models locally. Please complete the following practice and analysis:
+
+   > **Hint**: This is a hands-on practice question; actual operation is recommended
+
+   - Following this chapter's guidance, deploy a lightweight open-source model locally (recommend [Qwen3-0.6B](https://modelscope.cn/models/Qwen/Qwen3-0.6B)), try adjusting sampling parameters and observe their impact on output
+   - Choose a specific task (such as text classification, information extraction, code generation, etc.), design and compare different prompt strategies (such as Zero-shot, Few-shot, Chain-of-Thought) and their effect differences on output results
+   - Compare closed-source models and open-source models from dimensions of performance, cost, controllability, privacy, etc.
+   - If you want to build an enterprise-level customer service agent, which type of model would you choose? What factors need to be considered?
+
+5. Model Hallucination<sup>[11]</sup> is one of the key limitations of current large language models. This chapter introduced methods to mitigate hallucinations (such as retrieval-augmented generation, multi-step reasoning, and external tool invocation).
+
+   - Please choose one and explain its working principle and applicable scenarios
+   - Survey recent studies and papers: are there other methods to mitigate model hallucinations, and what improvements and advantages do they offer?
+
+6. Suppose you want to design a paper-reading assistant agent that helps researchers quickly read and understand academic papers, including summarizing a paper's core content, answering questions about papers, extracting key information, and comparing the viewpoints of different papers. Please answer:
+
+   - Which model would you choose as the base model when designing the agent? What factors need to be considered when choosing?
+   - How would you design prompts to guide the model to better understand academic papers? Academic papers are usually very long and may exceed the model's context window limit. How would you solve this problem?
+   - Academic research is rigorous, meaning we need to ensure information generated by the agent is accurate, objective, and faithful to the original text. What designs do you think should be added to the system to better achieve this requirement?
+
+## References
+
+[1] Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. *Journal of Machine Learning Research*, 3, 1137-1155.
+
+[2] Elman, J. L. (1990). Finding structure in time. *Cognitive Science*, 14(2), 179-211.
+
+[3] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. *Neural Computation*, 9(8), 1735-1780.
+
+[4] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In *Advances in neural information processing systems* (pp. 5998-6008).
+
+[5] Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI.
+
+[6] Gage, P. (1994). A new algorithm for data compression. *C Users Journal*, *12*(2), 23-38.
+
+[7] Schuster, M., & Nakajima, K. (2012, March). Japanese and korean voice search. In *2012 IEEE international conference on acoustics, speech and signal processing (ICASSP)* (pp. 5149-5152). IEEE.
+
+[8] Kudo, T., & Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. *arXiv preprint arXiv:1808.06226*.
+
+[9] Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., ... & Amodei, D. (2020). Scaling Laws for Neural Language Models. arXiv preprint arXiv:2001.08361.
+
+[10] Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., ... & Sifre, L. (2022). Training Compute-Optimal Large Language Models. arXiv preprint arXiv:2203.15556.
+
+[11] Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., ... & Fung, P. (2023). Survey of hallucination in natural language generation. *ACM Computing Surveys*, 55(12), 1-38.
+
+[12] Bender, E. M., Gebru, T., McMillan-Major, A., & Mitchell, M. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In *Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency* (pp. 610-623).
+
+[13] Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. *arXiv preprint arXiv:1706.03741*.
+
+[14] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In *Advances in neural information processing systems* (pp. 9459-9474).
+

+ 45 - 41
docs/chapter3/第三章 大语言模型基础.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter3-Fundamentals-of-Large-Language-Models.md">English</a> | 中文
+</div>
+
 # 第三章 大语言模型基础
 
 前两章分别介绍了智能体的定义和发展历史,本章将完全聚焦于大语言模型本身解答一个关键问题:现代智能体是如何工作的?我们将从语言模型的基本定义出发,通过对这些原理的学习,为理解LLM如何获得强大的知识储备与推理能力打下坚实的基础。
@@ -68,12 +72,12 @@ corpus = "datawhale agent learns datawhale agent works"
 tokens = corpus.split()
 total_tokens = len(tokens)
 
-# --- 第一步计算 P(datawhale) ---
+# --- 第一步:计算 P(datawhale) ---
 count_datawhale = tokens.count('datawhale')
 p_datawhale = count_datawhale / total_tokens
 print(f"第一步: P(datawhale) = {count_datawhale}/{total_tokens} = {p_datawhale:.3f}")
 
-# --- 第二步计算 P(agent|datawhale) ---
+# --- 第二步:计算 P(agent|datawhale) ---
 # 先计算 bigrams 用于后续步骤
 bigrams = zip(tokens, tokens[1:])
 bigram_counts = collections.Counter(bigrams)
@@ -82,13 +86,13 @@ count_datawhale_agent = bigram_counts[('datawhale', 'agent')]
 p_agent_given_datawhale = count_datawhale_agent / count_datawhale
 print(f"第二步: P(agent|datawhale) = {count_datawhale_agent}/{count_datawhale} = {p_agent_given_datawhale:.3f}")
 
-# --- 第三步计算 P(learns|agent) ---
+# --- 第三步:计算 P(learns|agent) ---
 count_agent_learns = bigram_counts[('agent', 'learns')]
 count_agent = tokens.count('agent')
 p_learns_given_agent = count_agent_learns / count_agent
 print(f"第三步: P(learns|agent) = {count_agent_learns}/{count_agent} = {p_learns_given_agent:.3f}")
 
-# --- 最后将概率连乘 ---
+# --- 最后:将概率连乘 ---
 p_sentence = p_datawhale * p_agent_given_datawhale * p_learns_given_agent
 print(f"最后: P('datawhale agent learns') ≈ {p_datawhale:.3f} * {p_agent_given_datawhale:.3f} * {p_learns_given_agent:.3f} = {p_sentence:.3f}")
 
@@ -179,9 +183,9 @@ king - man + woman 的结果向量: [0.9 0.2]
 
 为了解决长期依赖问题,<strong>长短时记忆网络 (Long Short-Term Memory, LSTM)</strong> 被设计出来<sup>[3]</sup>。LSTM 是一种特殊的 RNN,其核心创新在于引入了<strong>细胞状态 (Cell State)</strong> 和一套精密的<strong>门控机制 (Gating Mechanism)</strong> 。细胞状态可以看作是一条独立于隐藏状态的信息通路,允许信息在时间步之间更顺畅地传递。门控机制则是由几个小型神经网络构成,它们可以学习如何有选择地让信息通过,从而控制细胞状态中信息的增加与移除。这些门包括:
 
-- <strong>遗忘门 (Forget Gate)</strong> : 决定从上一时刻的细胞状态中丢弃哪些信息。
-- <strong>输入门 (Input Gate)</strong> : 决定将当前输入中的哪些新信息存入细胞状态。
-- <strong>输出门 (Output Gate)</strong> : 决定根据当前的细胞状态,输出哪些信息到隐藏状态。
+- <strong>遗忘门 (Forget Gate)</strong>决定从上一时刻的细胞状态中丢弃哪些信息。
+- <strong>输入门 (Input Gate)</strong>决定将当前输入中的哪些新信息存入细胞状态。
+- <strong>输出门 (Output Gate)</strong>决定根据当前的细胞状态,输出哪些信息到隐藏状态。
 
 ### 3.1.2 Transformer 架构解析
 
@@ -292,9 +296,9 @@ class DecoderLayer(nn.Module):
 
 为了实现上述过程,自注意力机制为每个输入的词元向量引入了三个可学习的角色:
 
-- <strong>查询 (Query, Q)</strong> : 代表当前词元,它正在主动地“查询”其他词元以获取信息。
-- <strong>键 (Key, K)</strong> : 代表句子中可被查询的词元“标签”或“索引”。
-- <strong>值 (Value, V)</strong> : 代表词元本身所携带的“内容”或“信息”。
+- <strong>查询 (Query, Q)</strong>代表当前词元,它正在主动地“查询”其他词元以获取信息。
+- <strong>键 (Key, K)</strong>代表句子中可被查询的词元“标签”或“索引”。
+- <strong>值 (Value, V)</strong>代表词元本身所携带的“内容”或“信息”。
 
 这三个向量都是由原始的词嵌入向量乘以三个不同的、可学习的权重矩阵 ($W^Q,W^K,W^V$) 得到的。整个计算过程可以分为以下几步,我们可以把它想象成一次高效的开卷考试:
 
@@ -420,8 +424,8 @@ class PositionWiseFeedForward(nn.Module):
 
 这个操作由两个部分组成:
 
-- <strong>残差连接 (Add)</strong> : 该操作将子模块的输入 `x` 直接加到该子模块的输出 `Sublayer(x)` 上。这一结构解决了深度神经网络中的<strong>梯度消失 (Vanishing Gradients)</strong> 问题。在反向传播时,梯度可以绕过子模块直接向前传播,从而保证了即使网络层数很深,模型也能得到有效的训练。其公式可以表示为:$\text{Output} = x + \text{Sublayer}(x)$。
-- <strong>层归一化 (Norm)</strong> : 该操作对单个样本的所有特征进行归一化,使其均值为0,方差为1。这解决了模型训练过程中的<strong>内部协变量偏移 (Internal Covariate Shift)</strong> 问题,使每一层的输入分布保持稳定,从而加速模型收敛并提高训练的稳定性。
+- <strong>残差连接 (Add)</strong>该操作将子模块的输入 `x` 直接加到该子模块的输出 `Sublayer(x)` 上。这一结构解决了深度神经网络中的<strong>梯度消失 (Vanishing Gradients)</strong> 问题。在反向传播时,梯度可以绕过子模块直接向前传播,从而保证了即使网络层数很深,模型也能得到有效的训练。其公式可以表示为:$\text{Output} = x + \text{Sublayer}(x)$。
+- <strong>层归一化 (Norm)</strong>该操作对单个样本的所有特征进行归一化,使其均值为0,方差为1。这解决了模型训练过程中的<strong>内部协变量偏移 (Internal Covariate Shift)</strong> 问题,使每一层的输入分布保持稳定,从而加速模型收敛并提高训练的稳定性。
 
 <strong>3.1.2.5 位置编码</strong>
 
@@ -530,11 +534,11 @@ Decoder-Only 架构的工作模式被称为<strong>自回归 (Autoregressive)</s
 
 - 高温度(0.7 $\leqslant$ Temperature $\lt$ 2):输出 “创新、发散”。适用场景: 创意性任务:如诗歌创作、科幻故事构思、广告 slogan brainstorm、艺术灵感启发; 发散性思考。
 
-`Top-k `:其原理是将所有 token 按概率从高到低排序,取排名前 k 个的 token 组成 “候选集”,随后对筛选出的 k 个 token 的概率进行 “归一化”: $ \hat{p}_i = \frac{p_i}{\sum_{j \in \text{候选集}} p_j}$
+`Top-k `:其原理是将所有 token 按概率从高到低排序,取排名前 k 个的 token 组成 “候选集”,随后对筛选出的 k 个 token 的概率进行 “归一化” $ \hat{p}_i = \frac{p_i}{\sum_{j \in \text{候选集}} p_j}$
 
 - 与温度采样的区别与联系:温度采样通过温度 T 调整所有 token 的概率分布(平滑或陡峭),不改变候选 token 的数量(仍考虑全部 N 个)。Top-k 采样通过 k 值限制候选 token 的数量(只保留前 k 个高概率 token),再从其中采样。当k=1时输出完全确定,退化为 “贪心采样”。
 
-`Top-p `:其原理是将所有 token 按概率从高到低排序,从排序后的第一个 token 开始,逐步累加概率,直到累积和首次达到或超过阈值 p: $\sum_{i \in S} p_{(i)} \geq p$,此时累加过程中包含的所有 token 组成 “核集合”,最后对核集合进行归一化。
+`Top-p `:其原理是将所有 token 按概率从高到低排序,从排序后的第一个 token 开始,逐步累加概率,直到累积和首次达到或超过阈值 p $\sum_{i \in S} p_{(i)} \geq p$,此时累加过程中包含的所有 token 组成 “核集合”,最后对核集合进行归一化。
 
 - 与Top-k的区别与联系:相对于固定截断大小的 Top-k,Top-p 能动态适应不同分布的“长尾”特性,对概率分布不均匀的极端情况的适应性更好。
 
@@ -553,8 +557,8 @@ Decoder-Only 架构的工作模式被称为<strong>自回归 (Autoregressive)</s
 案例: 我们直接向模型下达指令,要求它完成情感分类任务。
 
 ```Python
-文本Datawhale的AI Agent课程非常棒!
-情感正面
+文本:Datawhale的AI Agent课程非常棒!
+情感:正面
 ```
 
 <strong>单样本提示 (One-shot Prompting)</strong> 我们给模型提供一个完整的示例,向它展示任务的格式和期望的输出风格。
@@ -564,11 +568,11 @@ Decoder-Only 架构的工作模式被称为<strong>自回归 (Autoregressive)</s
 案例: 我们先给模型一个完整的“问题-答案”对作为示范,然后提出我们的新问题。
 
 ```Python
-文本这家餐厅的服务太慢了。
-情感负面
+文本:这家餐厅的服务太慢了。
+情感:负面
 
-文本Datawhale的AI Agent课程非常棒!
-情感
+文本:Datawhale的AI Agent课程非常棒!
+情感:
 ```
 
 模型会模仿给出的示例格式,为第二段文本补全“正面”。
@@ -578,14 +582,14 @@ Decoder-Only 架构的工作模式被称为<strong>自回归 (Autoregressive)</s
 案例: 我们提供涵盖了不同情况的多个示例,让模型对任务有更全面的理解。
 
 ```Python
-文本这家餐厅的服务太慢了。
-情感负面
+文本:这家餐厅的服务太慢了。
+情感:负面
 
-文本这部电影的情节很平淡。
-情感中性
+文本:这部电影的情节很平淡。
+情感:中性
 
-文本Datawhale的AI Agent课程非常棒!
-情感
+文本:Datawhale的AI Agent课程非常棒!
+情感:
 ```
 
 模型会综合所有示例,更准确地将最后一句的情感分类为“正面”。
@@ -600,16 +604,16 @@ Decoder-Only 架构的工作模式被称为<strong>自回归 (Autoregressive)</s
 
 ```Plain
 这是一段将英文翻译成中文的程序。
-英文Hello
-中文你好
-英文How are you?
-中文
+英文:Hello
+中文:你好
+英文:How are you?
+中文:
 ```
 
 - <strong>对“指令调优”模型的提示(你可以直接下达指令):</strong>
 
 ```Plain
-请将下面的英文翻译成中文
+请将下面的英文翻译成中文:
 How are you?
 ```
 
@@ -630,11 +634,11 @@ How are you?
 # 案例
 我需要你从产品评论中提取产品名称和用户情感。请严格按照下面的JSON格式输出。
 
-评论这款“星尘”笔记本电脑的屏幕显示效果惊人,但我不太喜欢它的键盘手感。
-输出{"product_name": "星尘笔记本电脑", "sentiment": "混合"}
+评论:这款“星尘”笔记本电脑的屏幕显示效果惊人,但我不太喜欢它的键盘手感。
+输出:{"product_name": "星尘笔记本电脑", "sentiment": "混合"}
 
-评论我刚买的“声动”耳机音质很棒,续航也超出了我的预期!
-输出
+评论:我刚买的“声动”耳机音质很棒,续航也超出了我的预期!
+输出:
 ```
 
 <strong>(5)思维链</strong>
@@ -651,12 +655,12 @@ How are you?
 >>>
 (模型可能会输出)
 好的,我们来一步步计算。
-第一步计算第一个赛季赢得的比赛数。
+第一步:计算第一个赛季赢得的比赛数。
 80场 * 60% = 48场。
-第二步计算两个赛季的总比赛数和总胜利数。
+第二步:计算两个赛季的总比赛数和总胜利数。
 总比赛数 = 80 + 15 = 95场。
 总胜利数 = 48 + 12 = 60场。
-第三步计算总胜率。
+第三步:计算总胜率。
 总胜率 = (总胜利数 / 总比赛数) * 100% = (60 / 95) * 100% ≈ 63.16%。
 所以,两个赛季的总胜率约为63.16%。
 ```
@@ -750,8 +754,8 @@ for i in range(num_merges):
 
 后续的许多算法都是在BPE的基础上进行优化的。其中,Google 开发的 WordPiece 和 SentencePiece 是影响力最大的两种。
 
-- <strong>WordPiece</strong>: Google BERT 模型采用的算法<sup>[7]</sup>。它与 BPE 非常相似,但合并词元的标准不是“最高频率”,而是“能最大化提升语料库的语言模型概率”。简单来说,它会优先合并那些能让整个语料库的“通顺度”提升最大的词元对。
-- <strong>SentencePiece</strong>: Google 开源的一款分词工具<sup>[8]</sup>,Llama 系列模型采用了此算法。它最大的特点是,将空格也视作一个普通字符(通常用下划线 `_` 表示)。这使得分词和解码过程完全可逆,且不依赖于特定的语言(例如,它不需要知道中文不使用空格分词)。
+- <strong>WordPiece</strong>Google BERT 模型采用的算法<sup>[7]</sup>。它与 BPE 非常相似,但合并词元的标准不是“最高频率”,而是“能最大化提升语料库的语言模型概率”。简单来说,它会优先合并那些能让整个语料库的“通顺度”提升最大的词元对。
+- <strong>SentencePiece</strong>Google 开源的一款分词工具<sup>[8]</sup>,Llama 系列模型采用了此算法。它最大的特点是,将空格也视作一个普通字符(通常用下划线 `_` 表示)。这使得分词和解码过程完全可逆,且不依赖于特定的语言(例如,它不需要知道中文不使用空格分词)。
 
 <strong>3.2.2.3 分词器对开发者的意义</strong>
 
@@ -850,7 +854,7 @@ print(response)
 
 >>>
 我叫通义千问,是由阿里云研发的预训练语言模型,可以回答问题、创作文字,还能表达观点、撰写代码。我主要的功能是在多个领域提
-供帮助,包括但不限于语言理解、文本生成、机器翻译、问答系统等。有什么我可以帮到你的吗?
+供帮助,包括但不限于:语言理解、文本生成、机器翻译、问答系统等。有什么我可以帮到你的吗?
 ```
 
 当你运行完所有代码后,你将会在本地电脑上看到模型生成的关于Qwen模型的介绍。恭喜你,你已经成功地在本地部署并运行了一个开源大语言模型!

+ 1309 - 0
docs/chapter4/Chapter4-Building-Classic-Agent-Paradigms.md

@@ -0,0 +1,1309 @@
+<div align="right">
+  English | <a href="./第四章%20智能体经典范式构建.md">中文</a>
+</div>
+
+# Chapter 4: Building Classic Agent Paradigms
+
+In the previous chapter, we took an in-depth look at large language models as the "brain" of modern agents. We learned about their internal Transformer architecture, methods for interacting with them, and their capability boundaries. Now it is time to put this theoretical knowledge into practice and build agents with our own hands.
+
+The core capability of a modern agent lies in its ability to connect the reasoning power of large language models with the external world. It can autonomously understand user intent, decompose complex tasks, and achieve goals by calling a series of "tools" such as code interpreters, search engines, and APIs to obtain information and execute operations. However, agents are not omnipotent; they also face challenges from the "hallucination" problem inherent in large models, potential reasoning loops in complex tasks, and incorrect tool usage, which constitute the capability boundaries of agents.
+
+To better organize the "thinking" and "acting" processes of agents, several classic architectural paradigms have emerged in the industry. In this chapter, we focus on the three most representative ones and implement them step by step from scratch:
+
+- **ReAct (Reasoning and Acting):** A paradigm that tightly combines "thinking" and "acting," allowing agents to think while doing and dynamically adjust.
+- **Plan-and-Solve:** A "think before you act" paradigm where agents first generate a complete action plan and then strictly execute it.
+- **Reflection:** A paradigm that endows agents with "reflection" capabilities, optimizing results through self-criticism and correction.
+
+At this point you might ask: with excellent frameworks such as LangChain and LlamaIndex already available, why "reinvent the wheel"? First, although mature frameworks offer significant advantages in engineering efficiency, working directly with highly abstracted tools does not show us how the underlying mechanisms are designed or what benefits they bring. Second, building from scratch exposes the engineering challenges that frameworks normally handle for us, such as parsing model output formats, retrying failed tool calls, and preventing agents from falling into infinite loops. Handling these issues firsthand is the most direct way to develop system design skills. Finally, and most importantly, mastering the design principles lets you move from being a framework "user" to being a "creator" of intelligent applications. When standard components cannot meet your needs, you will be able to customize them deeply or even build a completely new agent from scratch.
+
+## 4.1 Environment Preparation and Basic Tool Definition
+
+Before we start building, we need to set up the development environment and define some basic components. This will help us avoid repetitive work and focus more on core logic when implementing different paradigms later.
+
+### 4.1.1 Installing Dependencies
+
+The practical part of this book will mainly use the Python language, and Python 3.10 or higher is recommended. First, please ensure you have installed the `openai` library for interacting with large language models, and the `python-dotenv` library for securely managing our API keys.
+
+Run the following command in your terminal:
+
+```bash
+pip install openai python-dotenv
+```
+
+### 4.1.2 Configuring API Keys
+
+To make our code more universal, we will uniformly configure model service-related information (model ID, API key, service address) in environment variables.
+
+1. In your project root directory, create a file named `.env`.
+2. In this file, add the following content. You can point it to OpenAI's official service or any local/third-party service compatible with the OpenAI interface according to your needs.
+3. If you are unsure how to obtain these values, refer to Section [1.2 API Setup](https://datawhalechina.github.io/handy-multi-agent/#/chapter1/1.2.api-setup) in another Datawhale tutorial.
+
+```bash
+# .env file
+LLM_API_KEY="YOUR-API-KEY"
+LLM_MODEL_ID="YOUR-MODEL"
+LLM_BASE_URL="YOUR-URL"
+```
+
+Our code will automatically load these configurations from this file.
+
+### 4.1.3 Encapsulating Basic LLM Call Functions
+
+To make the code structure clearer and more reusable, let's define a dedicated LLM client class. This class will encapsulate all details of interacting with model services, allowing our main logic to focus more on agent construction.
+
+```python
+import os
+from openai import OpenAI
+from dotenv import load_dotenv
+from typing import List, Dict
+
+# Load environment variables from .env file
+load_dotenv()
+
+class HelloAgentsLLM:
+    """
+    A customized LLM client for the book "Hello Agents".
+    It is used to call any service compatible with the OpenAI interface and uses streaming responses by default.
+    """
+    def __init__(self, model: str = None, apiKey: str = None, baseUrl: str = None, timeout: int = None):
+        """
+        Initialize the client. Prioritize passed parameters; if not provided, load from environment variables.
+        """
+        self.model = model or os.getenv("LLM_MODEL_ID")
+        apiKey = apiKey or os.getenv("LLM_API_KEY")
+        baseUrl = baseUrl or os.getenv("LLM_BASE_URL")
+        timeout = timeout or int(os.getenv("LLM_TIMEOUT", 60))
+        
+        if not all([self.model, apiKey, baseUrl]):
+            raise ValueError("Model ID, API key, and service address must be provided or defined in the .env file.")
+
+        self.client = OpenAI(api_key=apiKey, base_url=baseUrl, timeout=timeout)
+
+    def think(self, messages: List[Dict[str, str]], temperature: float = 0) -> str:
+        """
+        Call the large language model to think and return its response.
+        """
+        print(f"🧠 Calling {self.model} model...")
+        try:
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                temperature=temperature,
+                stream=True,
+            )
+            
+            # Handle streaming response
+            print("✅ Large language model response successful:")
+            collected_content = []
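+            for chunk in response:
+                # Some providers send chunks with an empty `choices` list; skip them
+                if not chunk.choices:
+                    continue
+                content = chunk.choices[0].delta.content or ""
+                print(content, end="", flush=True)
+                collected_content.append(content)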
+            print()  # Newline after streaming output ends
+            return "".join(collected_content)
+
+        except Exception as e:
+            print(f"❌ Error occurred when calling LLM API: {e}")
+            return None
+
+# --- Client Usage Example ---
+if __name__ == '__main__':
+    try:
+        llmClient = HelloAgentsLLM()
+        
+        exampleMessages = [
+            {"role": "system", "content": "You are a helpful assistant that writes Python code."},
+            {"role": "user", "content": "Write a quicksort algorithm"}
+        ]
+        
+        print("--- Calling LLM ---")
+        responseText = llmClient.think(exampleMessages)
+        if responseText:
+            print("\n\n--- Complete Model Response ---")
+            print(responseText)
+
+    except ValueError as e:
+        print(e)
+
+
+>>>
+--- Calling LLM ---
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+Quicksort is a very efficient sorting algorithm...
+```
+
+
+
+## 4.2 ReAct
+
+With the LLM client ready, we will build the first and most classic agent paradigm: **ReAct (Reason + Act)**. ReAct was proposed by Shunyu Yao et al. in 2022<sup>[1]</sup>. Its core idea is to mimic how humans solve problems by explicitly combining **Reasoning** and **Acting** to form a "think-act-observe" loop.
+
+### 4.2.1 ReAct Workflow
+
+Before ReAct emerged, mainstream methods could be divided into two categories: one is the "pure thinking" type, such as **Chain-of-Thought**, which can guide models to perform complex logical reasoning but cannot interact with the external world and is prone to factual hallucinations; the other is the "pure action" type, where models directly output actions to execute but lack planning and error correction capabilities.
+
+The ingenuity of ReAct lies in recognizing that **thinking and acting are complementary**. Thinking guides action, and the results of action in turn correct thinking. To this end, the ReAct paradigm uses a specially designed prompt to guide the model so that each step of its output follows a fixed trajectory:
+
+- **Thought (Thinking):** This is the agent's "inner monologue." It analyzes the current situation, decomposes tasks, formulates the next plan, or reflects on the results of the previous step.
+- **Action (Acting):** This is the specific action the agent decides to take, usually a call to an external tool, such as `Search[Huawei's latest phone]`.
+- **Observation (Observing):** This is the result returned from the external tool after executing the `Action`, such as a summary of search results or an API return value.
+
+The agent will continuously repeat this **Thought -> Action -> Observation** loop, appending new observation results to the history to form a continuously growing context until it determines in `Thought` that it has found the final answer and then outputs the result. This process forms a powerful synergy: **reasoning makes actions more purposeful, while actions provide factual basis for reasoning.**
+
+We can formally express this process, as shown in Figure 4.1. Specifically, at each time step $t$, the agent's policy (i.e., the large language model $\pi$) generates the current thought $th_t$ and action $a_t$ based on the initial question $q$ and the historical trajectory of all previous "action-observation" steps $((a_1,o_1),\dots,(a_{t-1},o_{t-1}))$:
+
+$$\left(th_t,a_t\right)=\pi\left(q,(a_1,o_1),\ldots,(a_{t-1},o_{t-1})\right)$$
+
+Subsequently, the tool $T$ in the environment executes action $a_t$ and returns a new observation result $o_t$:
+
+$$o_t = T(a_t)$$
+
+This loop continues, appending new $(a_t,o_t)$ pairs to the history until the model determines in thought $th_t$ that the task is complete.
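
The formal loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's implementation: `react_loop`, `scripted_policy`, and `fake_tool` are hypothetical names, the policy is a hard-coded stand-in for the LLM $\pi$, and the tool returns a canned string instead of calling a real service.

```python
from typing import Callable, List, Optional, Tuple

# One (action, observation) pair per completed step.
History = List[Tuple[str, str]]
Policy = Callable[[str, History], Tuple[str, str]]


def react_loop(question: str, policy: Policy, tool: Callable[[str], str],
               max_steps: int = 5) -> Optional[str]:
    """Repeat thought -> action -> observation until the policy finishes."""
    history: History = []
    for _ in range(max_steps):
        # (th_t, a_t) = pi(q, (a_1, o_1), ..., (a_{t-1}, o_{t-1}))
        thought, action = policy(question, history)
        if action.startswith("Finish[") and action.endswith("]"):
            return action[len("Finish["):-1]
        # o_t = T(a_t)
        observation = tool(action)
        history.append((action, observation))
    return None  # step budget exhausted without a final answer


def scripted_policy(question: str, history: History) -> Tuple[str, str]:
    """Hypothetical stand-in for the LLM: search once, then finish."""
    if not history:
        return "I need to look this up.", "Search[capital of France]"
    last_observation = history[-1][1]
    return "I now know the answer.", f"Finish[{last_observation}]"


def fake_tool(action: str) -> str:
    """Canned tool result; a real tool would call an external service."""
    return "Paris"


answer = react_loop("What is the capital of France?", scripted_policy, fake_tool)
print(answer)
```

The `max_steps` bound plays the same role as the step limit used later in this chapter: a policy that never emits `Finish[...]` makes the loop return `None` instead of running forever.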
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/4-figures/4-1.png" alt="Think-Act-Observe synergistic loop in ReAct paradigm" width="90%"/>
+  <p>Figure 4.1 Think-Act-Observe Synergistic Loop in ReAct Paradigm</p>
+</div>
+
+This mechanism is particularly suitable for the following scenarios:
+
+- **Tasks requiring external knowledge**: Such as querying real-time information (weather, news, stock prices), searching for knowledge in professional domains, etc.
+- **Tasks requiring precise calculations**: Delegating mathematical problems to calculator tools to avoid LLM calculation errors.
+- **Tasks requiring API interaction**: Such as operating databases, calling a service's API to complete specific functions.
+
+Therefore, we will build a ReAct agent with the capability to **use external tools** to answer questions that large language models cannot directly answer with their own knowledge base alone. For example: "What is Huawei's latest phone? What are its main selling points?" This question requires the agent to understand that it needs to search online, call tools to search for results, and summarize the answer.
+
+### 4.2.2 Tool Definition and Implementation
+
+If large language models are the brain of an agent, then **Tools** are its "hands and feet" for interacting with the external world. To enable the ReAct paradigm to truly solve the problems we set, the agent needs the capability to call external tools.
+
+For the goal set in this section—answering questions about "Huawei's latest phone"—we need to provide the agent with a web search tool. Here we choose **SerpApi**, which provides structured Google search results through an API and can directly return "answer summary boxes" or precise knowledge graph information.
+
+First, you need to install the library:
+
+```bash
+pip install google-search-results
+```
+
+At the same time, you need to go to the [SerpApi official website](https://serpapi.com/) to register a free account, obtain your API key, and add it to the `.env` file in our project root directory:
+
+```bash
+# .env file
+# ... (Keep previous LLM configuration)
+SERPAPI_API_KEY="YOUR_SERPAPI_API_KEY"
+```
+
+Next, we will define and manage this tool through code. We will proceed step by step: first implement the core functionality of the tool, then build a general tool manager.
+
+(1) Implementing the Core Logic of the Search Tool
+
+A well-defined tool should contain the following three core elements:
+
+1. **Name**: A concise, unique identifier for the agent to call in `Action`, such as `Search`.
+2. **Description**: A clear natural language description explaining the purpose of this tool. **This is the most critical part of the entire mechanism** because the large language model will rely on this description to determine when to use which tool.
+3. **Execution Logic**: The function or method that actually performs the task.
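
One way to picture these three elements is as a small record type. The sketch below is an assumption for illustration only: the `Tool` dataclass, its `render` method, and the `echo` function are hypothetical and are not the structure used by the `ToolExecutor` in this chapter, which stores the same information in a dictionary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical container for the three core elements of a tool."""
    name: str                     # identifier the agent writes in `Action`
    description: str              # what the LLM reads to decide when to use it
    func: Callable[[str], str]    # the code that actually does the work

    def render(self) -> str:
        """Format the tool as one line suitable for a prompt's tool list."""
        return f"- {self.name}: {self.description}"


def echo(text: str) -> str:
    """Toy execution logic: return the input in upper case."""
    return text.upper()


echo_tool = Tool(name="Echo",
                 description="Returns the input text in upper case.",
                 func=echo)
rendered = echo_tool.render()
result = echo_tool.func("hello")
print(rendered)
print(result)
```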
+
+Our first tool is the `search` function, which receives a query string and then returns search results.
+
+```python
+import os
+from serpapi import SerpApiClient
+
+def search(query: str) -> str:
+    """
+    A practical web search engine tool based on SerpApi.
+    It intelligently parses search results, prioritizing direct answers or knowledge graph information.
+    """
+    print(f"🔍 Executing [SerpApi] web search: {query}")
+    try:
+        api_key = os.getenv("SERPAPI_API_KEY")
+        if not api_key:
+            return "Error: SERPAPI_API_KEY not configured in .env file."
+
+        params = {
+            "engine": "google",
+            "q": query,
+            "api_key": api_key,
+            "gl": "cn",  # Country code
+            "hl": "zh-cn", # Language code
+        }
+        
+        client = SerpApiClient(params)
+        results = client.get_dict()
+        
+        # Intelligent parsing: prioritize finding the most direct answer
+        if "answer_box_list" in results:
+            return "\n".join(results["answer_box_list"])
+        if "answer_box" in results and "answer" in results["answer_box"]:
+            return results["answer_box"]["answer"]
+        if "knowledge_graph" in results and "description" in results["knowledge_graph"]:
+            return results["knowledge_graph"]["description"]
+        if "organic_results" in results and results["organic_results"]:
+            # If no direct answer, return summaries of the first three organic results
+            snippets = [
+                f"[{i+1}] {res.get('title', '')}\n{res.get('snippet', '')}"
+                for i, res in enumerate(results["organic_results"][:3])
+            ]
+            return "\n\n".join(snippets)
+        
+        return f"Sorry, no information found about '{query}'."
+
+    except Exception as e:
+        return f"Error occurred during search: {e}"
+```
+
+The code above first checks whether `answer_box` (Google's answer summary box) or `knowledge_graph` (knowledge graph) information exists. If it does, the function returns that answer directly, since it is the most precise one available. If not, it falls back to returning summaries of the first three regular search results. This "intelligent parsing" gives the LLM higher-quality input.
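
The fallback order can be tried without an API key by applying the same checks to hand-written dictionaries. In this sketch, `pick_best_answer` is a hypothetical helper and the sample dictionaries are made up for illustration; only the key names (`answer_box`, `knowledge_graph`, `organic_results`) are taken from the code above.

```python
from typing import Any, Dict


def pick_best_answer(results: Dict[str, Any], query: str) -> str:
    """Return the most direct answer available, in priority order."""
    answer_box = results.get("answer_box", {})
    if "answer" in answer_box:
        return answer_box["answer"]
    knowledge_graph = results.get("knowledge_graph", {})
    if "description" in knowledge_graph:
        return knowledge_graph["description"]
    organic = results.get("organic_results", [])
    if organic:
        # No direct answer: summarize the first three organic results
        return "\n\n".join(
            f"[{i + 1}] {res.get('title', '')}\n{res.get('snippet', '')}"
            for i, res in enumerate(organic[:3])
        )
    return f"Sorry, no information found about '{query}'."


# Made-up sample payloads, one per branch.
direct = pick_best_answer({"answer_box": {"answer": "42"}}, "q")
graph = pick_best_answer({"knowledge_graph": {"description": "A company."}}, "q")
fallback = pick_best_answer(
    {"organic_results": [{"title": "A", "snippet": "a"}, {"title": "B"}]}, "q"
)
empty = pick_best_answer({}, "x")
print(fallback)
```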
+
+(2) Building a General Tool Executor
+
+When an agent needs to use multiple tools (for example, in addition to search, it may also need calculation, database queries, etc.), we need a unified manager to register and dispatch these tools. For this, we create a `ToolExecutor` class.
+
+```python
+from typing import Dict, Any
+
+class ToolExecutor:
+    """
+    A tool executor responsible for managing and executing tools.
+    """
+    def __init__(self):
+        self.tools: Dict[str, Dict[str, Any]] = {}
+
+    def registerTool(self, name: str, description: str, func: callable):
+        """
+        Register a new tool in the toolbox.
+        """
+        if name in self.tools:
+            print(f"Warning: Tool '{name}' already exists and will be overwritten.")
+        self.tools[name] = {"description": description, "func": func}
+        print(f"Tool '{name}' registered.")
+
+    def getTool(self, name: str) -> callable:
+        """
+        Get a tool's execution function by name.
+        """
+        return self.tools.get(name, {}).get("func")
+
+    def getAvailableTools(self) -> str:
+        """
+        Get a formatted description string of all available tools.
+        """
+        return "\n".join([
+            f"- {name}: {info['description']}" 
+            for name, info in self.tools.items()
+        ])
+
+```
+
+(3) Testing
+
+Now, we will register the `search` tool in the `ToolExecutor` and simulate a call to verify that the entire process works properly.
+
+```python
+# --- Tool Initialization and Usage Example ---
+if __name__ == '__main__':
+    # 1. Initialize tool executor
+    toolExecutor = ToolExecutor()
+
+    # 2. Register our practical search tool
+    search_description = "A web search engine. Use this tool when you need to answer questions about current events, facts, and information not found in your knowledge base."
+    toolExecutor.registerTool("Search", search_description, search)
+
+    # 3. Print available tools
+    print("\n--- Available Tools ---")
+    print(toolExecutor.getAvailableTools())
+
+    # 4. Agent's Action call, this time we ask a real-time question
+    print("\n--- Execute Action: Search['What is NVIDIA's latest GPU model'] ---")
+    tool_name = "Search"
+    tool_input = "What is NVIDIA's latest GPU model"
+
+    tool_function = toolExecutor.getTool(tool_name)
+    if tool_function:
+        observation = tool_function(tool_input)
+        print("--- Observation ---")
+        print(observation)
+    else:
+        print(f"Error: Tool named '{tool_name}' not found.")
+
+>>>
+Tool 'Search' registered.
+
+--- Available Tools ---
+- Search: A web search engine. Use this tool when you need to answer questions about current events, facts, and information not found in your knowledge base.
+
+--- Execute Action: Search['What is NVIDIA's latest GPU model'] ---
+🔍 Executing [SerpApi] web search: What is NVIDIA's latest GPU model
+--- Observation ---
+[1] GeForce RTX 50 Series Graphics Cards
+GeForce RTX™ 50 Series GPUs are powered by NVIDIA Blackwell architecture, bringing new gameplay for gamers and creators. RTX 50 Series has powerful AI computing power, bringing upgraded experience and more realistic graphics.
+
+[2] Compare GeForce Series Latest Generation and Previous Generation Graphics Cards
+Compare the latest RTX 30 series graphics cards with previous RTX 20 series, GTX 10 and 900 series graphics cards. View specifications, features, technical support, etc.
+
+[3] GeForce Graphics Cards | NVIDIA
+DRIVE AGX. Powerful in-vehicle computing power for AI-driven intelligent vehicle systems · Clara AGX. AI computing for innovative medical devices and imaging. Gaming and Creation. GeForce. Explore graphics cards, gaming solutions, AI ...
+```
+
+So far, we have equipped the agent with a `Search` tool that connects to the real-world internet, providing a solid foundation for the subsequent ReAct loop.
+
+
+
+### 4.2.3 Coding Implementation of ReAct Agent
+
+Now, we will assemble all independent components—the LLM client and tool executor—to build a complete ReAct agent. We will encapsulate its core logic through a `ReActAgent` class. For ease of understanding, we will break down the implementation process of this class into the following key parts for explanation.
+
+(1) System Prompt Design
+
+The prompt is the cornerstone of the entire ReAct mechanism, providing operational instructions for the large language model. We need to carefully design a template that will dynamically insert available tools, user questions, and the interaction history of intermediate steps.
+
+```python
+# ReAct Prompt Template
+REACT_PROMPT_TEMPLATE = """
+Please note that you are an intelligent assistant capable of calling external tools.
+
+Available tools are as follows:
+{tools}
+
+Please respond strictly in the following format:
+
+Thought: Your thinking process, used to analyze problems, decompose tasks, and plan the next action.
+Action: The action you decide to take, must be in one of the following formats:
+- `{{tool_name}}[{{tool_input}}]`: Call an available tool.
+- `Finish[final answer]`: Use this once you have collected enough information to answer the user's final question; put the final answer inside the brackets.
+
+Now, please start solving the following problem:
+Question: {question}
+History: {history}
+"""
+```
+
+This template defines the specification for interaction between the agent and the LLM:
+
+- **Role Definition**: "You are an intelligent assistant capable of calling external tools" sets the LLM's role.
+- **Tool List (`{tools}`)**: Informs the LLM what "hands and feet" it has available.
+- **Format Convention (`Thought`/`Action`)**: This is the most important part, forcing the LLM's output to be structured so we can precisely parse its intent through code.
+- **Dynamic Context (`{question}`/`{history}`)**: Injects the user's original question and continuously accumulated interaction history, allowing the LLM to make decisions based on complete context.
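
Because the template is filled with `str.format`, any literal braces that should reach the LLM unchanged have to be doubled; otherwise `format` treats them as placeholders and raises a `KeyError`. The shortened template below is a hypothetical stand-in used only to show this behavior.

```python
# Hypothetical, shortened template: {tools}, {question}, and {history} are
# real placeholders, while the doubled braces are literal text for the LLM.
MINI_TEMPLATE = """Available tools:
{tools}

Action format: {{tool_name}}[{{tool_input}}]

Question: {question}
History: {history}
"""

prompt = MINI_TEMPLATE.format(
    tools="- Search: A web search engine.",
    question="What is Huawei's latest phone?",
    history="",
)
print(prompt)
```

After formatting, the doubled braces collapse to single ones, so the model sees the literal pattern `{tool_name}[{tool_input}]` while the three real placeholders are replaced by their values.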
+
+(2) Core Loop Implementation
+
+The core of `ReActAgent` is a loop that continuously "formats prompt -> calls LLM -> executes action -> integrates results" until the task is complete or the maximum step limit is reached.
+
+```python
+class ReActAgent:
+    def __init__(self, llm_client: HelloAgentsLLM, tool_executor: ToolExecutor, max_steps: int = 5):
+        self.llm_client = llm_client
+        self.tool_executor = tool_executor
+        self.max_steps = max_steps
+        self.history = []
+
+    def run(self, question: str):
+        """
+        Run the ReAct agent to answer a question.
+        """
+        self.history = [] # Reset history for each run
+        current_step = 0
+
+        while current_step < self.max_steps:
+            current_step += 1
+            print(f"--- Step {current_step} ---")
+
+            # 1. Format prompt
+            tools_desc = self.tool_executor.getAvailableTools()
+            history_str = "\n".join(self.history)
+            prompt = REACT_PROMPT_TEMPLATE.format(
+                tools=tools_desc,
+                question=question,
+                history=history_str
+            )
+
+            # 2. Call LLM to think
+            messages = [{"role": "user", "content": prompt}]
+            response_text = self.llm_client.think(messages=messages)
+
+            if not response_text:
+                print("Error: LLM failed to return a valid response.")
+                break
+
+            # ... (Subsequent parsing, execution, integration steps)
+
+```
+
+The `run` method is the entry point of the agent. Its `while` loop constitutes the main body of the ReAct paradigm, and the `max_steps` parameter is an important safety valve to prevent the agent from falling into an infinite loop and exhausting resources.
+
+(3) Output Parser Implementation
+
+The LLM returns plain text, and we need to precisely extract `Thought` and `Action` from it. This is accomplished through several auxiliary parsing functions, which typically use regular expressions.
+
+```python
+# (These methods are part of the ReActAgent class; they require `import re` at the top of the module)
+    def _parse_output(self, text: str):
+        """Parse LLM output to extract Thought and Action."""
+        thought_match = re.search(r"Thought: (.*)", text)
+        action_match = re.search(r"Action: (.*)", text)
+        thought = thought_match.group(1).strip() if thought_match else None
+        action = action_match.group(1).strip() if action_match else None
+        return thought, action
+
+    def _parse_action(self, action_text: str):
+        """Parse Action string to extract tool name and input."""
+        match = re.match(r"(\w+)\[(.*)\]", action_text)
+        if match:
+            return match.group(1), match.group(2)
+        return None, None
+```
+
+- `_parse_output`: Responsible for separating the two main parts `Thought` and `Action` from the LLM's complete response.
+- `_parse_action`: Responsible for further parsing the `Action` string, for example, extracting the tool name `Search` and tool input `Huawei's latest phone` from `Search[Huawei's latest phone]`.
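
The two regular expressions can be checked in isolation on a hand-written response. The `sample_output` string below is made up for illustration; the patterns are the ones used in the parsing methods above.

```python
import re

# Made-up LLM response in the Thought/Action format.
sample_output = (
    "Thought: I should look this up online.\n"
    "Action: Search[Huawei's latest phone]"
)

# Step 1: split the response into Thought and Action
thought_match = re.search(r"Thought: (.*)", sample_output)
action_match = re.search(r"Action: (.*)", sample_output)
thought = thought_match.group(1).strip() if thought_match else None
action = action_match.group(1).strip() if action_match else None

# Step 2: split the Action into tool name and tool input
tool_match = re.match(r"(\w+)\[(.*)\]", action) if action else None
if tool_match:
    tool_name, tool_input = tool_match.group(1), tool_match.group(2)
else:
    tool_name, tool_input = None, None

# An Action that does not follow the Name[input] pattern yields no match
malformed = re.match(r"(\w+)\[(.*)\]", "search the web")

print(thought)
print(tool_name, tool_input)
```

Since `.` does not match a newline by default, each pattern captures only the rest of its own line, so a `Thought` or `Action` that spans several lines would be truncated by these expressions.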
+
+(4) Tool Invocation and Execution
+
+```python
+# (This logic is inside the while loop of the run method)
+            # 3. Parse LLM output
+            thought, action = self._parse_output(response_text)
+
+            if thought:
+                print(f"🤔 Thought: {thought}")
+
+            if not action:
+                print("Warning: Failed to parse valid Action, process terminated.")
+                break
+
+            # 4. Execute Action
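+            if action.startswith("Finish"):
+                # If it's a Finish instruction, extract the final answer and end
+                finish_match = re.match(r"Finish\[(.*)\]", action)
+                # Fall back to the raw action text if the brackets are missing
+                final_answer = finish_match.group(1) if finish_match else action
+                print(f"🎉 Final Answer: {final_answer}")
+                return final_answer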
+
+            tool_name, tool_input = self._parse_action(action)
+            if not tool_name or not tool_input:
+                # ... Handle invalid Action format ...
+                continue
+
+            print(f"🎬 Action: {tool_name}[{tool_input}]")
+
+            tool_function = self.tool_executor.getTool(tool_name)
+            if not tool_function:
+                observation = f"Error: Tool named '{tool_name}' not found."
+            else:
+                observation = tool_function(tool_input) # Call real tool
+
+```
+
+This code is the execution center of `Action`. It first checks whether it's a `Finish` instruction; if so, the process ends. Otherwise, it obtains the corresponding tool function through `tool_executor` and executes it to get the `observation`.
+
+(5) Integration of Observation Results
+
+The last step, and the key to forming a closed loop, is to add the `Action` itself and the `Observation` after tool execution back to the history, providing new context for the next loop.
+
+```python
+# (This logic follows tool invocation, at the end of the while loop)
+            print(f"👀 Observation: {observation}")
+
+            # Add this round's Action and Observation to history
+            self.history.append(f"Action: {action}")
+            self.history.append(f"Observation: {observation}")
+
+        # Loop ends
+        print("Maximum steps reached, process terminated.")
+        return None
+```
+
+By appending `Observation` to `self.history`, the agent can "see" the results of the previous action when generating the prompt in the next round, and conduct new thinking and planning accordingly.
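
The effect on the prompt can be seen with a small standalone sketch. `record_step` is a hypothetical helper that mirrors the two `append` calls above, and the sample action and observation are shortened from the run log in this section.

```python
from typing import List


def record_step(history: List[str], action: str, observation: str) -> None:
    """Append one round's Action and Observation, as the agent loop does."""
    history.append(f"Action: {action}")
    history.append(f"Observation: {observation}")


history: List[str] = []
record_step(
    history,
    "Search[Huawei latest phone]",
    "[1] Huawei Phones - Huawei Official Website",
)

# This joined string is what would fill the {history} slot of the next prompt
history_str = "\n".join(history)
print(history_str)
```

Every round adds two entries and the full history is re-sent on each call, so the prompt grows with the number of steps; this is one more reason to keep a step limit.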
+
+(6) Running Instance and Analysis
+
+Combining all the parts above gives us the complete `ReActAgent` class. A complete runnable example can be found in the `code` folder of this book's accompanying code repository.
+
+Below is a real running record:
+
+```
+Tool 'Search' registered.
+
+--- Step 1 ---
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+Thought: To answer this question, I need to search for Huawei's latest released phone model and its main features. This information may be outside my existing knowledge base, so I need to use a search engine to obtain the latest data.
+Action: Search[Huawei latest phone model and main selling points]
+🤔 Thought: To answer this question, I need to search for Huawei's latest released phone model and its main features. This information may be outside my existing knowledge base, so I need to use a search engine to obtain the latest data.
+🎬 Action: Search[Huawei latest phone model and main selling points]
+🔍 Executing [SerpApi] web search: Huawei latest phone model and main selling points
+👀 Observation: [1] Huawei Phones - Huawei Official Website
+Smartphones ; Mate Series. Extraordinary Flagship · HUAWEI Mate XTs. Extraordinary Master ; Pura Series. Pioneer Imaging · HUAWEI Pura 80 Pro+ ; Pocket Series. Aesthetic New Chapter. HUAWEI Pocket 2 ; nova Series. Professional Portrait.
+
+[2] Which Huawei phone has the best value in 2025? Huawei phone recommendations and market analysis ...
+Now the biggest selling point of Huawei phones is only the HarmonyOS system, and the controversial brand faith. Here are several Huawei series phones worth buying, choose according to different budgets:. Huawei's currently most popular, also equipped with ...
+
+[3] Which new Huawei phone has the best value in 2025? 10 new Huawei phone recommendations
+Choosing Huawei mainly recommends high-end phones, Mate 70 and Pura 70 series are the latest flagship models. HUAWEI Mate 70. Advantages are, photography configuration is still top-level, full focal length coverage, suitable for professional photography, excellent workmanship, outdoor drop resistance ...
+
+--- Step 2 ---
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+Thought: Based on the search results, Huawei's latest flagship models include Mate 70 and Pura 80 Pro+. To determine the latest model and its main selling points, I will focus on this information. From the provided links, both Mate 70 series and Pura 80 Pro+ are recently released products, but which one is "latest" needs further confirmation. Meanwhile, I can extract their main selling points from this information.
+Action: Finish[According to the latest information, Huawei's latest phones may be HUAWEI Pura 80 Pro+ or HUAWEI Mate 70. Among them, HUAWEI Mate 70's main selling points include top-level photography configuration, full focal length coverage, suitable for professional photography, excellent workmanship, and good outdoor drop resistance. While HUAWEI Pura 80 Pro+ emphasizes pioneer imaging technology.]
+🤔 Thought: Based on the search results, Huawei's latest flagship models include Mate 70 and Pura 80 Pro+. To determine the latest model and its main selling points, I will focus on this information. From the provided links, both Mate 70 series and Pura 80 Pro+ are recently released products, but which one is "latest" needs further confirmation. Meanwhile, I can extract their main selling points from this information.
+🎉 Final Answer: According to the latest information, Huawei's latest phones may be HUAWEI Pura 80 Pro+ or HUAWEI Mate 70. Among them, HUAWEI Mate 70's main selling points include top-level photography configuration, full focal length coverage, suitable for professional photography, excellent workmanship, and good outdoor drop resistance. While HUAWEI Pura 80 Pro+ emphasizes pioneer imaging technology.
+```
+
+From the above output, we can see that the agent clearly demonstrates its chain of thought: it first realizes its knowledge is insufficient and needs to use the search tool; then, it reasons and summarizes based on search results, arriving at the final answer within two steps.
+
+It's worth noting that since the model's knowledge and internet information are constantly updated, your results may differ from those shown here. As of September 8, 2025, when this section was written, the HUAWEI Mate 70 and HUAWEI Pura 80 Pro+ mentioned in the search results were indeed Huawei's latest flagship series phones. This illustrates how the ReAct paradigm can handle time-sensitive questions by grounding its answer in fresh search results.
+
+### 4.2.4 Characteristics, Limitations, and Debugging Techniques of ReAct
+
+Having implemented a ReAct agent by hand, we have not only learned its workflow but also gained a deeper understanding of its internal mechanisms. Every technical paradigm has strengths and weaknesses; this section summarizes both for ReAct.
+
+(1) Main Characteristics of ReAct
+
+1. **High Interpretability**: One of ReAct's greatest advantages is transparency. Through the `Thought` chain, we can clearly see the agent's "mental journey" at each step—why it chose this tool and what it plans to do next. This is crucial for understanding, trusting, and debugging agent behavior.
+2. **Dynamic Planning and Error Correction Capability**: Unlike paradigms that generate a complete plan at once, ReAct proceeds one step at a time, looking before each step. It dynamically adjusts the subsequent `Thought` and `Action` based on the `Observation` obtained from the external world at each step. If the previous search results are unsatisfactory, it can correct the search terms in the next step and try again.
+3. **Tool Synergy Capability**: The ReAct paradigm naturally combines the reasoning capability of large language models with the execution capability of external tools. LLMs are responsible for strategizing (planning and reasoning), tools are responsible for solving specific problems (searching, calculating), and the two work synergistically, breaking through the inherent limitations of single LLMs in knowledge timeliness, computational accuracy, etc.
+
+(2) Inherent Limitations of ReAct
+
+1. **Strong Dependence on LLM's Own Capabilities**: The success of the ReAct process highly depends on the comprehensive capabilities of the underlying LLM. If the LLM's logical reasoning ability, instruction-following ability, or formatted output ability is insufficient, it's easy to produce wrong planning in the `Thought` stage or generate instructions that don't conform to the format in the `Action` stage, causing the entire process to be interrupted.
+2. **Execution Efficiency Issues**: Due to its step-by-step nature, completing a task usually requires multiple LLM calls. Each call is accompanied by network latency and computational cost. For complex tasks requiring many steps, this serial "think-act" loop may lead to high total time and cost.
+3. **Prompt Fragility**: The stable operation of the entire mechanism is built on a carefully designed prompt template. Any minor change in the template, even differences in wording, may affect LLM behavior. Additionally, not all models can consistently follow preset formats, increasing uncertainty in practical applications.
+4. **May Fall into Local Optima**: The step-by-step decision-making mode means the agent lacks a global, long-term plan. Guided only by the immediate `Observation`, it may choose a path that looks correct in the short term but is not optimal in the long run, or in some cases get stuck repeating the same steps in a loop.
+
+(3) Debugging Techniques
+
+When the ReAct agent you built behaves unexpectedly, you can debug it from the following angles:
+
+- **Check Complete Prompt**: Before each LLM call, print out the final formatted complete prompt containing all history. This is the most direct way to trace the source of LLM decisions.
+- **Analyze Raw Output**: When output parsing fails (for example, regular expressions didn't match `Action`), be sure to print out the raw, unprocessed text returned by the LLM. This can help you determine whether the LLM didn't follow the format or your parsing logic is wrong.
+- **Verify Tool Input and Output**: Check whether the `tool_input` generated by the agent is in the format expected by the tool function, and also ensure the `observation` returned by the tool is in a format the agent can understand and process.
+- **Adjust Examples in Prompt (Few-shot Prompting)**: If the model frequently makes errors, you can add one or two complete successful "Thought-Action-Observation" cases in the prompt to guide the model to better follow your instructions through examples.
+- **Try Different Models or Parameters**: Switching to a more capable model or adjusting the `temperature` parameter (usually set to 0 to make the output as deterministic as possible) can sometimes directly solve the problem.
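The "Analyze Raw Output" tip can be sketched as follows. The helper name `parse_action`, the `Search` tool name, and the `ToolName[input]` pattern are assumptions made for illustration, modeled on the `Action: Finish[...]` format in the log above; adapt the regular expression to whatever format your own prompt requests.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches "Action: ToolName[tool input]".
ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\[(.*)\]", re.DOTALL)


def parse_action(raw_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input), or None if the text does not match."""
    match = ACTION_PATTERN.search(raw_output)
    if match is None:
        # Print the unprocessed text so you can tell whether the model
        # ignored the format or the pattern itself is too strict.
        print(f"[debug] Could not parse Action. Raw output:\n{raw_output!r}")
        return None
    return match.group(1), match.group(2).strip()


good = "Thought: I need to search.\nAction: Search[Huawei latest phone]"
bad = "I think the answer is probably the Mate 70."
print(parse_action(good))
print(parse_action(bad))
```

Printing with `!r` shows hidden characters such as stray newlines or full-width brackets, which are a common reason a pattern silently fails to match.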
+
+## 4.3 Plan-and-Solve
+
+Having covered ReAct, a reactive paradigm that makes decisions step by step, we now turn to a method with a very different style that is equally powerful: **Plan-and-Solve**. As the name suggests, this paradigm explicitly divides task processing into two stages: **Plan first, then Solve**.
+
+If ReAct is like an experienced detective who reasons step by step from clues at the scene (Observation) and adjusts the direction of the investigation at any time, then Plan-and-Solve is more like an architect who must first draw a complete blueprint (Plan) before construction starts, and then build strictly according to that blueprint (Solve). In fact, the Agent modes of many LLM-based tools in use today incorporate this design pattern.
+
+### 4.3.1 Working Principle of Plan-and-Solve
+
+Plan-and-Solve Prompting was proposed by Lei Wang et al. in 2023<sup>[2]</sup>. Its core motivation is to address the tendency of chain-of-thought reasoning to "go off track" when handling multi-step, complex problems.
+
+Unlike ReAct, which integrates thinking and acting at each step, Plan-and-Solve decouples the entire process into two core stages, as shown in Figure 4.2:
+
+1. **Planning Phase**: First, the agent receives the user's complete question. Its first task is not to directly solve the problem or call tools, but to **decompose the problem and formulate a clear, step-by-step action plan**. This plan itself is the product of a large language model call.
+2. **Solving Phase**: After obtaining the complete plan, the agent enters the execution phase. It will **strictly execute according to the steps in the plan, one by one**. Each step's execution may be an independent LLM call or processing of the previous step's results, until all steps in the plan are completed and the final answer is obtained.
+
+This "plan before acting" strategy enables the agent to maintain higher goal consistency when handling complex tasks requiring long-term planning, avoiding getting lost in intermediate steps.
+
+We can formally express this two-stage process. First, the planning model $\pi_{\text{plan}}$ generates a plan $P = (p_1, p_2, \dots, p_n)$ containing $n$ steps based on the original question $q$:
+
+$$
+P = \pi_{\text{plan}}(q)
+$$
+
+Subsequently, in the execution phase, the execution model $\pi_{\text{solve}}$ will complete the steps in the plan one by one. For the $i$-th step, the generation of its solution $s_i$ will depend on the original question $q$, the complete plan $P$, and the execution results of all previous steps $(s_1, \dots, s_{i-1})$:
+
+$$
+s_i = \pi_{\text{solve}}(q, P, (s_1, \dots, s_{i-1}))
+$$
+
+The final answer is the execution result of the last step $s_n$.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/4-figures/4-2.png" alt="Two-stage workflow of Plan-and-Solve paradigm" width="90%"/>
+  <p>Figure 4.2 Two-Stage Workflow of Plan-and-Solve Paradigm</p>
+</div>
+
+Plan-and-Solve is especially suitable for complex tasks with strong structure that can be clearly decomposed, such as:
+
+- **Multi-step math word problems**: Need to first list calculation steps, then solve one by one.
+- **Report writing integrating multiple information sources**: Need to first plan the report structure (introduction, data source A, data source B, summary), then fill in content one by one.
+- **Code generation tasks**: Need to first conceive the structure of functions, classes, and modules, then implement one by one.
+
+### 4.3.2 Planning Phase
+
+To highlight the advantages of the Plan-and-Solve paradigm in structured reasoning tasks, we will not use tools but complete a reasoning task through prompt design.
+
+The characteristic of this type of task is that the answer cannot be obtained through a single query or calculation; the problem must first be decomposed into a series of logically coherent sub-steps, then solved in order. This precisely leverages Plan-and-Solve's core capability of "plan first, execute later."
+
+**Our target problem is:** "A fruit store sold 15 apples on Monday. The number of apples sold on Tuesday was twice that of Monday. The number sold on Wednesday was 5 fewer than Tuesday. How many apples were sold in total over these three days?"
+
+This problem is not particularly difficult for large language models, but it has a clear logical chain that makes it a good reference case. For real-world logic puzzles where a model cannot reliably reason its way to an accurate answer in a single pass, you can follow this design pattern to build your own agent. The agent needs to:
+
+1. **Planning Phase**: First, decompose the problem into sequential calculation steps (calculate Tuesday sales, calculate Wednesday sales, calculate total sales).
+2. **Execution Phase**: Then, strictly follow the plan, execute calculations step by step, and use each step's result as input for the next step, finally obtaining the total.
+
+The goal of the planning phase is to have the large language model receive the original problem and output a clear, step-by-step action plan. This plan must be structured so our code can easily parse and execute it one by one. Therefore, the prompt we design needs to clearly tell the model its role and task and provide an example of the output format.
+
+````python
+PLANNER_PROMPT_TEMPLATE = """
+You are a top AI planning expert. Your task is to decompose complex problems posed by users into an action plan consisting of multiple simple steps.
+Please ensure that each step in the plan is an independent, executable subtask and is strictly arranged in logical order.
+Your output must be a Python list, where each element is a string describing a subtask.
+
+Question: {question}
+
+Please strictly output your plan in the following format, with ```python and ``` as prefix and suffix being necessary:
+```python
+["Step 1", "Step 2", "Step 3", ...]
+```
+"""
+````
+
+This prompt improves output quality and stability in three ways:
+- **Role Setting**: "Top AI planning expert" steers the model toward careful, expert-style planning.
+- **Task Description**: Clearly defines the goal of "decomposing problems."
+- **Format Constraint**: Requires the output to be a string in Python list format, which greatly simplifies subsequent parsing and is more reliable than parsing free-form natural language.
+
+Next, we encapsulate this prompt logic into a `Planner` class, which is also our planner.
+
+```python
+import ast
+
+# Assume the HelloAgentsLLM class in llm_client.py is already defined
+# from llm_client import HelloAgentsLLM
+
+class Planner:
+    def __init__(self, llm_client):
+        self.llm_client = llm_client
+
+    def plan(self, question: str) -> list[str]:
+        """
+        Generate an action plan based on user question.
+        """
+        prompt = PLANNER_PROMPT_TEMPLATE.format(question=question)
+
+        # To generate a plan, we build a simple message list
+        messages = [{"role": "user", "content": prompt}]
+
+        print("--- Generating Plan ---")
+        # Use streaming output to get the complete plan
+        response_text = self.llm_client.think(messages=messages) or ""
+
+        print(f"✅ Plan Generated:\n{response_text}")
+
+        # Parse the list string output by LLM
+        try:
+            # Find content between ```python and ```
+            plan_str = response_text.split("```python")[1].split("```")[0].strip()
+            # Use ast.literal_eval to safely parse the string into a Python list (it evaluates literals only and never runs code)
+            plan = ast.literal_eval(plan_str)
+            return plan if isinstance(plan, list) else []
+        except (ValueError, SyntaxError, IndexError) as e:
+            print(f"❌ Error parsing plan: {e}")
+            print(f"Raw response: {response_text}")
+            return []
+        except Exception as e:
+            print(f"❌ Unknown error occurred while parsing plan: {e}")
+            return []
+```
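The parsing step inside `plan` can be exercised on its own, without calling a model. The sketch below is a standalone restatement under stated assumptions: `extract_plan` is a hypothetical helper name, and the sample response is hand-written rather than real model output.

```python
import ast

# Built programmatically so this example contains no literal fence marker.
FENCE = "`" * 3


def extract_plan(response_text: str) -> list:
    """Pull the Python list out of a fenced block; return [] on any failure."""
    try:
        body = response_text.split(FENCE + "python")[1].split(FENCE)[0].strip()
        plan = ast.literal_eval(body)  # parses literals only, never runs code
    except (ValueError, SyntaxError, IndexError):
        return []
    return plan if isinstance(plan, list) else []


# Hand-written sample response (an assumption, not real model output).
sample = (
    "Here is the plan:\n"
    + FENCE + "python\n"
    + '["Compute Tuesday", "Compute Wednesday", "Add up"]\n'
    + FENCE
)
print(extract_plan(sample))
print(extract_plan("no fenced block here"))
```

Returning an empty list on every failure path lets the caller treat "no usable plan" as a single case, which is how `PlanAndSolveAgent.run` checks the result.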
+
+### 4.3.3 Executor and State Management
+
+After the planner (`Planner`) generates a clear action blueprint, we need an executor (`Executor`) to complete the tasks in the plan one by one. The executor is not only responsible for calling the large language model to solve each sub-problem but also plays a crucial role: **state management**. It must record the execution results of each step and provide them as context for subsequent steps, ensuring information flows smoothly throughout the entire task chain.
+
+The executor's prompt is different from the planner's. Its goal is not to decompose problems but to **focus on solving the current step based on existing context**. Therefore, the prompt needs to include the following key information:
+
+- **Original Question**: Ensure the model always understands the ultimate goal.
+- **Complete Plan**: Let the model understand the current step's position in the entire task.
+- **Historical Steps and Results**: Provide work completed so far as direct input for the current step.
+- **Current Step**: Clearly instruct the model which specific task it needs to solve now.
+
+```python
+EXECUTOR_PROMPT_TEMPLATE = """
+You are a top AI execution expert. Your task is to strictly follow the given plan and solve the problem step by step.
+You will receive the original question, the complete plan, and the steps and results completed so far.
+Please focus on solving the "current step" and only output the final answer for that step, without any additional explanations or dialogue.
+
+# Original Question:
+{question}
+
+# Complete Plan:
+{plan}
+
+# Historical Steps and Results:
+{history}
+
+# Current Step:
+{current_step}
+
+Please only output the answer for the "current step":
+"""
+```
+
+We encapsulate the execution logic into the `Executor` class. This class will loop through the plan, call the LLM, and maintain a history (state).
+
+```python
+class Executor:
+    def __init__(self, llm_client):
+        self.llm_client = llm_client
+
+    def execute(self, question: str, plan: list[str]) -> str:
+        """
+        Execute step by step according to the plan and solve the problem.
+        """
+        history = "" # String to store historical steps and results
+        response_text = "" # Stays empty if the plan contains no steps
+
+        print("\n--- Executing Plan ---")
+
+        for i, step in enumerate(plan):
+            print(f"\n-> Executing step {i+1}/{len(plan)}: {step}")
+
+            prompt = EXECUTOR_PROMPT_TEMPLATE.format(
+                question=question,
+                plan=plan,
+                history=history if history else "None", # If it's the first step, history is empty
+                current_step=step
+            )
+
+            messages = [{"role": "user", "content": prompt}]
+
+            response_text = self.llm_client.think(messages=messages) or ""
+
+            # Update history for the next step
+            history += f"Step {i+1}: {step}\nResult: {response_text}\n\n"
+
+            print(f"✅ Step {i+1} completed, result: {response_text}")
+
+        # After the loop ends, the last step's response is the final answer
+        final_answer = response_text
+        return final_answer
+```
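To see how the `history` string carries state from one step to the next, the loop can be run offline with a stand-in client. Everything below is an illustrative assumption: `FakeLLM` is a hypothetical stub that returns canned answers instead of calling a model, and `run_steps` is a trimmed-down restatement of the loop in `Executor.execute` with a simplified prompt, not a replacement for it.

```python
from typing import Dict, List


class FakeLLM:
    """Hypothetical stub: returns canned answers in order and records prompts."""

    def __init__(self, answers: List[str]):
        self._answers = list(answers)
        self.prompts: List[str] = []

    def think(self, messages: List[Dict[str, str]]) -> str:
        self.prompts.append(messages[0]["content"])
        return self._answers.pop(0)


def run_steps(llm, question: str, plan: List[str]) -> str:
    history = ""
    answer = ""  # stays empty if the plan has no steps
    for i, step in enumerate(plan, start=1):
        prompt = (
            f"Question: {question}\n"
            f"History:\n{history or 'None'}\n"
            f"Current step: {step}"
        )
        answer = llm.think(messages=[{"role": "user", "content": prompt}])
        # Append this step's result so later prompts can build on it.
        history += f"Step {i}: {step}\nResult: {answer}\n\n"
    return answer


llm = FakeLLM(["30", "25", "70"])
plan = ["Tuesday = Monday x 2", "Wednesday = Tuesday - 5", "Total of three days"]
final = run_steps(llm, "How many apples in total?", plan)
print(final)
print(llm.prompts[2])
```

The third recorded prompt contains the results of the first two steps, which is exactly the information the model needs to compute the total.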
+
+We have now built the `Planner` responsible for "planning" and the `Executor` responsible for "execution." The last step is to integrate these two components into a unified agent, `PlanAndSolveAgent`. Its responsibility is very clear: receive an LLM client, initialize the internal planner and executor, and provide a simple `run` method that starts the entire process.
+
+```python
+class PlanAndSolveAgent:
+    def __init__(self, llm_client):
+        """
+        Initialize the agent and create planner and executor instances.
+        """
+        self.llm_client = llm_client
+        self.planner = Planner(self.llm_client)
+        self.executor = Executor(self.llm_client)
+
+    def run(self, question: str):
+        """
+        Run the agent's complete process: plan first, then execute.
+        """
+        print(f"\n--- Starting to Process Question ---\nQuestion: {question}")
+
+        # 1. Call planner to generate plan
+        plan = self.planner.plan(question)
+
+        # Check if plan was successfully generated
+        if not plan:
+            print("\n--- Task Terminated --- \nUnable to generate valid action plan.")
+            return
+
+        # 2. Call executor to execute plan
+        final_answer = self.executor.execute(question, plan)
+
+        print(f"\n--- Task Completed ---\nFinal Answer: {final_answer}")
+```
+
+The design of this `PlanAndSolveAgent` class embodies the principle of "composition over inheritance." It doesn't contain complex logic itself but acts as an orchestrator, clearly calling its internal components to complete tasks.
+
+### 4.3.4 Running Instance and Analysis
+
+The complete code can also be found in the `code` folder of this book's accompanying code repository; here we only demonstrate the final results.
+
+````bash
+--- Starting to Process Question ---
+Question: A fruit store sold 15 apples on Monday. The number of apples sold on Tuesday was twice that of Monday. The number sold on Wednesday was 5 fewer than Tuesday. How many apples were sold in total over these three days?
+--- Generating Plan ---
+🧠 Calling xxxx model...
+✅ Large language model response successful:
+```python
+["Calculate Monday's apple sales: 15", "Calculate Tuesday's apple sales: Monday's quantity × 2 = 15 × 2 = 30", "Calculate Wednesday's apple sales: Tuesday's quantity - 5 = 30 - 5 = 25", "Calculate total sales for three days: Monday + Tuesday + Wednesday = 15 + 30 + 25 = 70"]
+```
+✅ Plan Generated:
+```python
+["Calculate Monday's apple sales: 15", "Calculate Tuesday's apple sales: Monday's quantity × 2 = 15 × 2 = 30", "Calculate Wednesday's apple sales: Tuesday's quantity - 5 = 30 - 5 = 25", "Calculate total sales for three days: Monday + Tuesday + Wednesday = 15 + 30 + 25 = 70"]
+```
+
+--- Executing Plan ---
+
+-> Executing step 1/4: Calculate Monday's apple sales: 15
+🧠 Calling xxxx model...
+✅ Large language model response successful:
+15
+✅ Step 1 completed, result: 15
+
+-> Executing step 2/4: Calculate Tuesday's apple sales: Monday's quantity × 2 = 15 × 2 = 30
+🧠 Calling xxxx model...
+✅ Large language model response successful:
+30
+✅ Step 2 completed, result: 30
+
+-> Executing step 3/4: Calculate Wednesday's apple sales: Tuesday's quantity - 5 = 30 - 5 = 25
+🧠 Calling xxxx model...
+✅ Large language model response successful:
+25
+✅ Step 3 completed, result: 25
+
+-> Executing step 4/4: Calculate total sales for three days: Monday + Tuesday + Wednesday = 15 + 30 + 25 = 70
+🧠 Calling xxxx model...
+✅ Large language model response successful:
+70
+✅ Step 4 completed, result: 70
+
+--- Task Completed ---
+Final Answer: 70
+````
+
+From the above output log, we can clearly see the workflow of the Plan-and-Solve paradigm:
+
+1. **Planning Phase**: The agent first calls `Planner` and successfully decomposes the complex word problem into a Python list containing four logical steps. This structured plan lays the foundation for subsequent execution.
+2. **Execution Phase**: `Executor` strictly executes step by step according to the generated plan. In each step, it uses historical results as context, ensuring correct information transfer (for example, step 2 correctly uses step 1's result "15", and step 3 also correctly uses step 2's result "30").
+3. **Result**: The entire process is logically clear with explicit steps, and the agent accurately arrives at the correct answer "70".
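The arithmetic in the log can be checked directly; the variable names below are only illustrative.

```python
monday = 15
tuesday = monday * 2       # twice Monday's sales
wednesday = tuesday - 5    # 5 fewer than Tuesday
total = monday + tuesday + wednesday
print(monday, tuesday, wednesday, total)
```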
+
+## 4.4 Reflection
+
+In the ReAct and Plan-and-Solve paradigms we have already implemented, the agent's workflow ends as soon as it completes a task. However, the initial output it generates, whether an action trajectory or a final result, may contain errors or have room for improvement. The core idea of the Reflection mechanism is to introduce a **post-hoc self-correction loop** for the agent, enabling it to review its work, discover deficiencies, and iteratively optimize, just like humans do.
+
+### 4.4.1 Core Idea of Reflection Mechanism
+
+The inspiration for the Reflection mechanism comes from the human learning process: we proofread after completing a first draft and verify after solving a math problem. This idea is embodied in multiple studies, such as the Reflexion framework proposed by Noah Shinn et al. in 2023<sup>[3]</sup>. Its core workflow can be summarized as a concise three-step loop: **Execute -> Reflect -> Refine**.
+
+1. **Execution**: First, the agent attempts to complete the task using familiar methods (such as ReAct or Plan-and-Solve), generating a preliminary solution or action trajectory. This can be seen as a "first draft."
+2. **Reflection**: Next, the agent enters the reflection phase. It calls an independent large language model instance, or one with special prompts, to play the role of a "reviewer." This "reviewer" examines the "first draft" generated in the first step and evaluates it from multiple dimensions, such as:
+   - **Factual Errors**: Is there content that contradicts common sense or known facts?
+   - **Logical Flaws**: Are there inconsistencies or contradictions in the reasoning process?
+   - **Efficiency Issues**: Is there a more direct, more concise path to complete the task?
+   - **Missing Information**: Are some key constraints or aspects of the problem overlooked?
+
+   Based on the evaluation, it generates structured **Feedback**, pointing out specific problems and improvement suggestions.
+3. **Refinement**: Finally, the agent uses the "first draft" and "feedback" as new context, calls the large language model again, and asks it to revise the first draft based on the feedback content, generating a more complete "revised draft."
+
+As shown in Figure 4.3, this loop can be repeated multiple times until the reflection phase no longer finds new problems or reaches a preset iteration limit. We can formally express this iterative optimization process. Assuming $O_i$ is the output produced by the $i$-th iteration ($O_0$ is the initial output), the reflection model $\pi_{\text{reflect}}$ generates feedback $F_i$ for $O_i$:
+$$
+F_i = \pi_{\text{reflect}}(\text{Task}, O_i)
+$$
+Subsequently, the refinement model $\pi_{\text{refine}}$ combines the original task, the previous version's output, and feedback to generate a new version's output $O_{i+1}$:
+$$
+O_{i+1} = \pi_{\text{refine}}(\text{Task}, O_i, F_i)
+$$
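The two update rules translate into a short loop. The sketch below is a minimal illustration under stated assumptions: `reflect` and `refine` are placeholder callables standing in for $\pi_{\text{reflect}}$ and $\pi_{\text{refine}}$, and the stopping test (feedback equal to a sentinel string) is an arbitrary choice made for this example.

```python
from typing import Callable, List, Tuple


def reflect_and_refine(
    task: str,
    initial_output: str,
    reflect: Callable[[str, str], str],
    refine: Callable[[str, str, str], str],
    max_iterations: int = 3,
    stop_signal: str = "OK",
) -> Tuple[str, List[str]]:
    """Run the reflect/refine loop and return the final output and all feedback."""
    output = initial_output                       # O_0
    feedback_log: List[str] = []
    for _ in range(max_iterations):
        feedback = reflect(task, output)          # F_i
        feedback_log.append(feedback)
        if feedback == stop_signal:
            break                                 # reviewer found nothing to fix
        output = refine(task, output, feedback)   # O_{i+1}
    return output, feedback_log


# Toy stand-ins for the two models (assumptions for demonstration only).
def toy_reflect(task: str, output: str) -> str:
    return "OK" if output.endswith(".") else "Add a final period"


def toy_refine(task: str, output: str, feedback: str) -> str:
    return output + "."


final, log = reflect_and_refine("Write a sentence", "Hello world", toy_reflect, toy_refine)
print(final)
print(log)
```

Keeping the iteration cap separate from the stop signal means the loop still terminates when the reviewer never declares the output good enough.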
+
+
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/4-figures/4-3.png" alt="Execute-Reflect-Refine iterative loop in Reflection mechanism" width="70%"/>
+<p>Figure 4.3 Execute-Reflect-Refine Iterative Loop in Reflection Mechanism</p>
+</div>
+
+
+
+Compared to the previous two paradigms, the value of Reflection lies in:
+
+- It provides the agent with an internal error correction loop, making it no longer completely dependent on external tool feedback (ReAct's Observation), thus able to correct higher-level logical and strategic errors.
+- It transforms one-time task execution into a continuous optimization process, which can improve the final success rate and answer quality for complex tasks.
+- It builds a temporary **"short-term memory"** for the agent. The entire "execute-reflect-refine" trajectory forms a valuable experience record; the agent not only knows the final answer but also remembers how it iterated from a flawed first draft to the final version. Furthermore, this memory system can also be **multimodal**, allowing the agent to reflect on and revise outputs beyond text (such as code, images, etc.), laying the foundation for building more powerful multimodal agents.
+
+### 4.4.2 Case Setting and Memory Module Design
+
+To put the Reflection mechanism into practice, we will introduce a memory management mechanism, because reflection depends on storing and retrieving earlier attempts and feedback. Simply handing the "reviewer" the entire raw context, even when the context window is long enough to hold it, tends to bring in a lot of redundant information. The practical task for this section is **code generation and iterative optimization**.
+
+The goal task for this step is: "Write a Python function to find all prime numbers between 1 and n."
+
+This task is an excellent scenario for testing the Reflection mechanism:
+
+1. **Clear Optimization Path Exists**: The code initially generated by the large language model is likely a simple but inefficient trial-division implementation.
+2. **Clear Reflection Points**: Through reflection, problems like "excessively high time complexity" or "redundant calculations" can be discovered.
+3. **Clear Optimization Direction**: Based on feedback, it can be optimized to a more efficient algorithm, such as the Sieve of Eratosthenes.
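For the prime-number task, a naive first draft and an algorithmically better revision might look like the pair below: trial division and the Sieve of Eratosthenes, the same pair named in the reflection prompt later in this section. Both functions are hand-written illustrations, not model output; what a model actually produces on its first and later attempts will vary.

```python
from typing import List


def primes_trial_division(n: int) -> List[int]:
    """Naive draft: test every candidate against every smaller divisor."""
    primes = []
    for candidate in range(2, n + 1):
        if all(candidate % d != 0 for d in range(2, candidate)):
            primes.append(candidate)
    return primes


def primes_sieve(n: int) -> List[int]:
    """Refined version: Sieve of Eratosthenes, O(n log log n) time."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_trial_division(30))
print(primes_sieve(30))
```

Both functions return the same list; they differ only in how much work they do, which is the kind of gap a performance-focused reviewer prompt is meant to surface.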
+
+The core of Reflection lies in iteration, and the prerequisite for iteration is the ability to remember previous attempts and received feedback. Therefore, a "short-term memory" module is essential for implementing this paradigm. This memory module will be responsible for storing the complete trajectory of each "execute-reflect" loop.
+
+```python
+from typing import List, Dict, Any, Optional
+
+class Memory:
+    """
+    A simple short-term memory module for storing the agent's action and reflection trajectory.
+    """
+
+    def __init__(self):
+        """
+        Initialize an empty list to store all records.
+        """
+        self.records: List[Dict[str, Any]] = []
+
+    def add_record(self, record_type: str, content: str):
+        """
+        Add a new record to memory.
+
+        Parameters:
+        - record_type (str): Type of record ('execution' or 'reflection').
+        - content (str): Specific content of the record (e.g., generated code or reflection feedback).
+        """
+        record = {"type": record_type, "content": content}
+        self.records.append(record)
+        print(f"📝 Memory updated, added a '{record_type}' record.")
+
+    def get_trajectory(self) -> str:
+        """
+        Format all memory records into a coherent string text for building prompts.
+        """
+        trajectory_parts = []
+        for record in self.records:
+            if record['type'] == 'execution':
+                trajectory_parts.append(f"--- Previous Attempt (Code) ---\n{record['content']}")
+            elif record['type'] == 'reflection':
+                trajectory_parts.append(f"--- Reviewer Feedback ---\n{record['content']}")
+
+        return "\n\n".join(trajectory_parts)
+
+    def get_last_execution(self) -> Optional[str]:
+        """
+        Get the most recent execution result (e.g., the latest generated code).
+        Returns None if it doesn't exist.
+        """
+        for record in reversed(self.records):
+            if record['type'] == 'execution':
+                return record['content']
+        return None
+```
+
+The design of this `Memory` class is relatively concise, with the main structure as follows:
+
+- Uses a list `records` to store each action and reflection in order.
+- The `add_record` method is responsible for adding new entries to memory.
+- The `get_trajectory` method is the core; it "serializes" the memory trajectory into a text segment that can be directly inserted into subsequent prompts, providing complete context for the model's reflection and optimization.
+- `get_last_execution` makes it convenient to obtain the latest "first draft" for reflection.
+
+
+
+### 4.4.3 Coding Implementation of Reflection Agent
+
+With the `Memory` module as a foundation, we can now proceed to build the core logic of `ReflectionAgent`. The entire agent's workflow will revolve around the "execute-reflect-refine" loop we discussed earlier and guide the large language model to play different roles through carefully designed prompts.
+
+(1) Prompt Design
+
+Unlike previous paradigms, the Reflection mechanism requires multiple prompts for different roles to work together.
+
+1. **Initial Execution Prompt**: This is the prompt for the agent's first attempt to solve the problem, with relatively straightforward content, only requiring the model to complete the specified task.
+
+```python
+INITIAL_PROMPT_TEMPLATE = """
+You are a senior Python programmer. Please write a Python function according to the following requirements.
+Your code must include a complete function signature, docstring, and follow PEP 8 coding standards.
+
+Requirement: {task}
+
+Please output the code directly without any additional explanations.
+"""
+```
+
+2. **Reflection Prompt**: This prompt is the soul of the Reflection mechanism. It instructs the model to play the role of a "code reviewer," critically analyze the code generated in the previous round, and provide specific, actionable feedback.
+
+````python
+REFLECT_PROMPT_TEMPLATE = """
+You are an extremely strict code review expert and senior algorithm engineer with ultimate requirements for code performance.
+Your task is to review the following Python code and focus on finding its main bottlenecks in **algorithm efficiency**.
+
+# Original Task:
+{task}
+
+# Code to Review:
+```python
+{code}
+```
+
+Please analyze the time complexity of this code and consider whether there is an **algorithmically superior** solution to significantly improve performance.
+If one exists, please clearly point out the deficiencies of the current algorithm and propose specific, feasible algorithm improvement suggestions (e.g., using sieve method instead of trial division).
+Only if the code has reached optimality at the algorithm level can you answer "no improvement needed."
+
+Please output your feedback directly without any additional explanations.
+"""
+````
+
+3. **Refinement Prompt**: After receiving feedback, this prompt will guide the model to revise and optimize the original code based on the feedback content.
+
+````python
+REFINE_PROMPT_TEMPLATE = """
+You are a senior Python programmer. You are optimizing your code based on feedback from a code review expert.
+
+# Original Task:
+{task}
+
+# Your Previous Code Attempt:
+```python
+{last_code_attempt}
+```
+
+# Reviewer's Feedback:
+{feedback}
+
+Please generate an optimized new version of the code based on the reviewer's feedback.
+Your code must include a complete function signature, docstring, and follow PEP 8 coding standards.
+Please output the optimized code directly without any additional explanations.
+"""
+````
+
+(2) Agent Encapsulation and Implementation
+
+Now, we will integrate this set of prompt logic and the `Memory` module into the `ReflectionAgent` class.
+
+```python
+# Assume llm_client.py and memory.py are already defined
+# from llm_client import HelloAgentsLLM
+# from memory import Memory
+
+class ReflectionAgent:
+    def __init__(self, llm_client, max_iterations=3):
+        self.llm_client = llm_client
+        self.memory = Memory()
+        self.max_iterations = max_iterations
+
+    def run(self, task: str):
+        print(f"\n--- Starting to Process Task ---\nTask: {task}")
+
+        # --- 1. Initial Execution ---
+        print("\n--- Performing Initial Attempt ---")
+        initial_prompt = INITIAL_PROMPT_TEMPLATE.format(task=task)
+        initial_code = self._get_llm_response(initial_prompt)
+        self.memory.add_record("execution", initial_code)
+
+        # --- 2. Iterative Loop: Reflection and Refinement ---
+        for i in range(self.max_iterations):
+            print(f"\n--- Iteration {i+1}/{self.max_iterations} ---")
+
+            # a. Reflection
+            print("\n-> Performing Reflection...")
+            last_code = self.memory.get_last_execution()
+            reflect_prompt = REFLECT_PROMPT_TEMPLATE.format(task=task, code=last_code)
+            feedback = self._get_llm_response(reflect_prompt)
+            self.memory.add_record("reflection", feedback)
+
+            # b. Check if stopping is needed
+            if "no improvement needed" in feedback.lower():
+                print("\n✅ Reflection considers code needs no improvement, task completed.")
+                break
+
+            # c. Refinement
+            print("\n-> Performing Refinement...")
+            refine_prompt = REFINE_PROMPT_TEMPLATE.format(
+                task=task,
+                last_code_attempt=last_code,
+                feedback=feedback
+            )
+            refined_code = self._get_llm_response(refine_prompt)
+            self.memory.add_record("execution", refined_code)
+
+        final_code = self.memory.get_last_execution()
+        print(f"\n--- Task Completed ---\nFinal Generated Code:\n```python\n{final_code}\n```")
+        return final_code
+
+    def _get_llm_response(self, prompt: str) -> str:
+        """Call the LLM and return its complete response text (empty string if nothing is returned)."""
+        messages = [{"role": "user", "content": prompt}]
+        response_text = self.llm_client.think(messages=messages) or ""
+        return response_text
+
+```
+
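To exercise this control flow without an API key, the loop can be driven by a scripted stand-in for the model. The sketch below is an illustration only: `StubMemory`, `ScriptedLLM`, and `run_reflection_loop` are hypothetical names, the prompts are reduced to plain strings, and the chapter's real `Memory` and `HelloAgentsLLM` classes may differ in detail.

```python
class StubMemory:
    """Hypothetical stand-in for the chapter's Memory class."""

    def __init__(self):
        self.records = []

    def add_record(self, record_type: str, content: str) -> None:
        self.records.append({"type": record_type, "content": content})

    def get_last_execution(self):
        # Walk backwards to find the most recent 'execution' record.
        for record in reversed(self.records):
            if record["type"] == "execution":
                return record["content"]
        return None


class ScriptedLLM:
    """Hypothetical client that replays canned replies instead of calling a model."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = 0

    def think(self, messages):
        self.calls += 1
        return self._replies.pop(0)


def run_reflection_loop(llm, memory, task, max_iterations=3):
    """Execute -> reflect -> refine, mirroring the structure of ReflectionAgent.run."""
    memory.add_record("execution", llm.think([{"role": "user", "content": task}]))
    for _ in range(max_iterations):
        last_code = memory.get_last_execution()
        feedback = llm.think([{"role": "user", "content": f"Review:\n{last_code}"}])
        memory.add_record("reflection", feedback)
        # Same substring-based stop condition as in the agent above.
        if "no improvement needed" in feedback.lower():
            break
        refined = llm.think(
            [{"role": "user", "content": f"Refine:\n{last_code}\n{feedback}"}]
        )
        memory.add_record("execution", refined)
    return memory.get_last_execution()


llm = ScriptedLLM([
    "v1: trial division",
    "Use the Sieve of Eratosthenes.",
    "v2: sieve",
    "No improvement needed.",
])
memory = StubMemory()
final_code = run_reflection_loop(llm, memory, "Find all primes up to n.")
print(final_code, llm.calls)
```

With this script the loop stops in its second round, after four model calls, and returns the refined version rather than the initial attempt.
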
+### 4.4.4 Running Instance and Analysis
+
+The complete code can also be found in the `code` folder of this book's accompanying code repository. Below is a sample run with `max_iterations=2`.
+
+````text
+--- Starting to Process Task ---
+Task: Write a Python function to find all prime numbers between 1 and n.
+
+--- Performing Initial Attempt ---
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+```python
+def find_primes(n):
+    ...
+    return primes
+```
+📝 Memory updated, added an 'execution' record.
+
+--- Iteration 1/2 ---
+
+-> Performing Reflection...
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+The current code has a time complexity of O(n * sqrt(n)). While this implementation is acceptable for smaller n values, performance will significantly degrade when n is very large. The main bottleneck is that each number needs trial division checking, leading to high time overhead.
+
+It is recommended to use the Sieve of Eratosthenes algorithm, which has a time complexity of O(n log(log n)) and can significantly improve the efficiency of finding prime numbers.
+
+Improved code as follows:
+```python
+def find_primes(n):
+    ...
+    return primes
+```
+📝 Memory updated, added a 'reflection' record.
+
+-> Performing Refinement...
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+```python
+def find_primes(n):
+    ...
+    return primes
+```
+📝 Memory updated, added an 'execution' record.
+
+--- Iteration 2/2 ---
+
+-> Performing Reflection...
+🧠 Calling xxxxxx model...
+✅ Large language model response successful:
+The current code uses the Sieve of Eratosthenes with time complexity O(n log log n) and space complexity O(n). This algorithm is already very efficient for finding all prime numbers between 1 and n, and usually requires no further optimization. However, in some specific scenarios, the following improvements can be considered:
+
+1. <strong>Segmented Sieve</strong>: Suitable for cases where n is very large but memory is limited. Divide the interval into multiple small segments, process each segment separately with the sieve method, reducing memory usage.
+2. <strong>Odd Number Sieve</strong>: Except for 2, all prime numbers are odd. When initializing the `is_prime` array, only mark odd numbers, which can reduce space complexity by half while reducing some unnecessary calculations.
+
+However, these improvements are not necessary for most application scenarios because the standard Sieve of Eratosthenes is already efficient enough. Therefore, in general cases, <strong>no improvement needed</strong>.
+📝 Memory updated, added a 'reflection' record.
+
+✅ Reflection considers code needs no improvement, task completed.
+
+--- Task Completed ---
+Final Generated Code:
+```python
+def find_primes(n):
+    """
+    Finds all prime numbers between 1 and n using the Sieve of Eratosthenes algorithm.
+
+    :param n: The upper limit of the range to find prime numbers.
+    :return: A list of all prime numbers between 1 and n.
+    """
+    if n < 2:
+        return []
+
+    is_prime = [True] * (n + 1)
+    is_prime[0] = is_prime[1] = False
+
+    p = 2
+    while p * p <= n:
+        if is_prime[p]:
+            for i in range(p * p, n + 1, p):
+                is_prime[i] = False
+        p += 1
+
+    primes = [num for num in range(2, n + 1) if is_prime[num]]
+    return primes
+```
+````
+
+This running instance demonstrates how the Reflection mechanism drives the agent to perform deep optimization:
+
+1. **Effective "Criticism" is the Prerequisite for Optimization**: Because the reflection prompt was "extremely strict" and focused on algorithm efficiency, the agent did not settle for the functionally correct initial code in the first round. It pinpointed the `O(n * sqrt(n))` time-complexity bottleneck and proposed an algorithm-level improvement: the Sieve of Eratosthenes.
+2. **Iterative Improvement**: With clear feedback in hand, the agent implemented the more efficient sieve in the refinement phase, reducing the time complexity to `O(n log log n)` and completing its first meaningful self-iteration.
+3. **Convergence and Termination**: In the second round of reflection, facing an already efficient sieve, the agent affirmed the current algorithm, mentioned more advanced directions such as the segmented sieve, and still concluded "no improvement needed" for general cases. That phrase triggered our termination condition, so the optimization process converged.
+
+This case illustrates that the value of a well-designed Reflection mechanism lies not only in fixing errors but, more importantly, in **driving step-by-step improvements in solution quality and efficiency**, which makes it one of the key techniques for building complex, high-quality agents.
+
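The complexity gap discussed above can be checked empirically. The following sketch is an assumption-laden illustration: the log elides the initial attempt, so `find_primes_trial` is a plausible reconstruction of a trial-division version rather than the model's actual output, while `find_primes_sieve` follows the final code shown in the log. It requires Python 3.8+ for `math.isqrt`.

```python
import math
import timeit


def find_primes_trial(n: int) -> list:
    """Trial division: test each candidate against divisors up to its square root."""
    primes = []
    for num in range(2, n + 1):
        if all(num % d for d in range(2, math.isqrt(num) + 1)):
            primes.append(num)
    return primes


def find_primes_sieve(n: int) -> list:
    """Sieve of Eratosthenes: cross out multiples of each prime up to sqrt(n)."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


# Both versions must agree before timing them is meaningful.
assert find_primes_trial(1000) == find_primes_sieve(1000)

n = 20_000
t_trial = timeit.timeit(lambda: find_primes_trial(n), number=3)
t_sieve = timeit.timeit(lambda: find_primes_sieve(n), number=3)
print(f"trial division: {t_trial:.4f}s, sieve: {t_sieve:.4f}s")
```

Absolute timings depend on the machine, so no expected figures are given here; the point is that both functions return identical results while their running times diverge as `n` grows.
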
+### 4.4.5 Cost-Benefit Analysis of Reflection Mechanism
+
+Although the Reflection mechanism is very effective at improving solution quality, that capability comes at a cost. In practical applications, we need to weigh its benefits against the corresponding costs.
+
+(1) Main Costs
+
+1. **Increased Model Call Overhead**: This is the most direct cost. Each iteration typically requires two additional large language model calls (one for reflection, one for refinement), so API call costs and computational resource consumption multiply with every extra round, growing roughly in proportion to the number of iterations.
+
+2. **Significantly Increased Task Latency**: Reflection is a serial process; each round of refinement must wait for the previous round's reflection to complete. This significantly extends the total task time, making it unsuitable for scenarios with high real-time requirements.
+
+3. **Increased Prompt Engineering Complexity**: As our case demonstrates, the success of Reflection largely depends on high-quality, targeted prompts. Designing and debugging effective prompts for different stages like "execution," "reflection," and "refinement" requires more development effort.
+
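The call overhead of the loop from Section 4.4.3 can be made concrete with a small counting helper. This is a sketch, assuming one model call per execution, reflection, and refinement step as in that implementation; `llm_calls` is a hypothetical helper name, and token counts (which also grow as prompts embed earlier code and feedback) are not modeled.

```python
from typing import Optional


def llm_calls(max_iterations: int, stop_round: Optional[int] = None) -> int:
    """Count model calls made by an execute -> reflect -> refine loop.

    stop_round is the 1-based round in which reflection reports
    "no improvement needed"; None means the loop never stops early.
    """
    if stop_round is None or stop_round > max_iterations:
        # One initial execution, then reflection + refinement in every round.
        return 1 + 2 * max_iterations
    # Initial execution, (stop_round - 1) full rounds, and one final reflection.
    return 2 * stop_round


print(llm_calls(max_iterations=2, stop_round=2))  # the sample run in Section 4.4.4
print(llm_calls(max_iterations=3))                # worst case for the default setting
```

For the sample run this gives four calls, against a single call for a one-shot answer, and the worst case for three rounds is seven calls.
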
+(2) Core Benefits
+
+1. **Leap in Solution Quality**: The greatest benefit is that it can iteratively optimize a "qualified" initial solution into an "excellent" final solution. This improvement from functionally correct to performance-efficient, from rough logic to rigorous logic, is crucial in many critical tasks.
+
+2. **Enhanced Robustness and Reliability**: Through internal self-correction loops, the agent can discover and fix potential logical flaws, factual errors, or improper boundary case handling in the initial solution, greatly improving the reliability of the final result.
+
+In summary, the Reflection mechanism is a typical "trade cost for quality" strategy. It is well suited to scenarios that **demand very high quality, accuracy, and reliability in the final result and can tolerate longer completion times**. For example:
+
+- Generating critical business code or technical reports.
+- Conducting complex logical reasoning in scientific research.
+- Decision support systems requiring deep analysis and planning.
+
+Conversely, if the application scenario requires quick responses, or a "roughly correct" answer is already sufficient, using lighter ReAct or Plan-and-Solve paradigms may be a more cost-effective choice.
+
+## 4.5 Chapter Summary
+
+In this chapter, building on the large language model knowledge from Chapter 3, we implemented three classic agent paradigms from scratch by "reinventing the wheel" ourselves: ReAct, Plan-and-Solve, and Reflection. We explored their core working principles and, through concrete practical cases, gained a deeper understanding of their respective advantages, limitations, and applicable scenarios.
+
+**Core Knowledge Review:**
+
+1. ReAct: We built a ReAct agent that can interact with the external world. Through the dynamic loop of "thought-action-observation," it successfully used search engines to answer real-time questions that its own knowledge base couldn't cover. Its core advantages lie in **environmental adaptability** and **dynamic error correction capability**, making it the first choice for handling exploratory tasks requiring external tool input.
+2. Plan-and-Solve: We implemented a Plan-and-Solve agent that plans first then executes, and used it to solve math word problems requiring multi-step reasoning. It decomposes complex tasks into clear steps, then executes them one by one. Its core advantages lie in **structure** and **stability**, particularly suitable for handling tasks with determined logical paths and intensive internal reasoning.
+3. Reflection (Self-Reflection and Iteration): We built a Reflection agent with self-optimization capabilities. By introducing the "execute-reflect-refine" iterative loop, it successfully optimized an initially inefficient code solution into an algorithmically superior high-performance version. Its core value lies in **significantly improving solution quality**, suitable for scenarios with extremely high requirements for result accuracy and reliability.
+
+The three paradigms explored in this chapter represent three different strategies for agents to solve problems, as shown in Table 4.1. In practical applications, which one to choose depends on the core requirements of the task:
+
+<div align="center">
+<p>Table 4.1 Selection Strategy for Different Agent Loops</p>
+<img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/4-figures/4-4.png" alt="" width="70%"/>
+</div>
+
+At this point, we have mastered the core techniques for building individual agents. To bridge this knowledge to practical applications, the next chapter explores how to build agents with low-code platforms and lightweight code solutions.
+
+## Exercises
+
+> **Note**: Some exercises do not have standard answers; the focus is on cultivating learners' comprehensive understanding and practical ability in agent paradigm design.
+
+1. This chapter introduced three classic agent paradigms: `ReAct`, `Plan-and-Solve`, and `Reflection`. Please analyze:
+
+   - What are the essential differences in how these three paradigms organize "thinking" and "action"?
+   - If you were to design a "smart home control assistant" (needs to control lights, air conditioning, curtains, and other devices, and automatically adjust based on user habits), which paradigm would you choose as the basic architecture? Why?
+   - Can these three paradigms be combined? If so, please try to design a hybrid paradigm agent architecture and explain its applicable scenarios.
+
+2. In the `ReAct` implementation in Section 4.2, we used regular expressions to parse the large language model's output (such as `Thought` and `Action`). Please consider:
+
+   - What potential fragilities exist in the current parsing method? Under what circumstances might it fail?
+   - Besides regular expressions, what are some more robust output parsing solutions?
+   - Try modifying the code in this chapter to use a more reliable output format, and compare the pros and cons of the two approaches.
+
+3. Tool invocation is one of the core capabilities of modern agents. Based on the `ToolExecutor` design in Section 4.2.2, please complete the following extension practice:
+
+   > **Note**: This is a hands-on practice question; it is recommended to actually write code.
+
+   - Add a "calculator" tool to the `ReAct` agent so it can handle complex mathematical calculation problems (such as "Calculate the result of `(123 + 456) × 789 / 12 = ?`").
+   - Design and implement a "tool selection failure" handling mechanism: when the agent repeatedly calls the wrong tool or provides wrong parameters, how should the system guide it to correct?
+   - Consider: If the number of callable tools increases to 50 or even 100, will the current tool description method still work effectively? From an engineering perspective, how can we optimize the organization and retrieval mechanism of tools when the number of callable tools significantly increases with business needs?
+
+4. The `Plan-and-Solve` paradigm decomposes tasks into two stages: "planning" and "execution." Please analyze in depth:
+
+   - In the implementation in Section 4.3, the plan generated in the planning phase is "static" (generated once, not modifiable). If during execution it is found that a certain step cannot be completed or the result does not meet expectations, how should a "dynamic replanning" mechanism be designed?
+   - Compare `Plan-and-Solve` with `ReAct`: When handling a task like "booking a business trip from Beijing to Shanghai (including flights, hotels, car rental)," which paradigm is more suitable? Why?
+   - Try designing a "hierarchical planning" system: first generate a high-level abstract plan, then generate detailed sub-plans for each high-level step. What advantages does this design have?
+
+5. The `Reflection` mechanism improves output quality through the "execute-reflect-refine" loop. Please consider:
+
+   - In the code generation case in Section 4.4, the same model is used for different stages. If two different models are used (for example, using a more powerful model for reflection and a faster model for execution), what impact would it have?
+   - The termination condition for the `Reflection` mechanism is "feedback contains **no improvement needed**" or "maximum iteration count reached." Is this design reasonable? Can a more intelligent termination condition be designed?
+   - Suppose you want to build an "academic paper writing assistant" that can generate drafts and continuously optimize paper content. Please design a multi-dimensional Reflection mechanism that reflects and improves from multiple perspectives such as paragraph logic, method innovation, language expression, and citation standards.
+
+6. Prompt engineering is a key technology affecting the final effect of agents. This chapter demonstrated multiple carefully designed prompt templates. Please analyze:
+
+   - Compare the `ReAct` prompt in Section 4.2.3 and the `Plan-and-Solve` prompt in Section 4.3.2; they obviously have significant differences in structural design. How do these differences serve the core logic of their respective paradigms?
+   - In the `Reflection` prompt in Section 4.4.3, we used a role setting like "you are an extremely strict code review expert." Try modifying this role setting (such as changing it to "you are an open-source project maintainer who values code readability"), observe the changes in output results, and summarize the impact of role settings on agent behavior.
+   - Adding `few-shot` examples to prompts can often significantly improve the model's ability to follow specific formats. Please try adding `few-shot` examples to one of the agents in this chapter and compare the effects.
+
+7. An e-commerce startup wants to use a "customer service agent" to replace human customer service in order to cut costs and improve efficiency. The agent needs the following capabilities:
+
+   a. Understand the user's refund request reason
+
+   b. Query the user's order information and logistics status
+
+   c. Intelligently judge whether the refund should be approved based on company policy
+
+   d. Generate a proper reply email and send it to the user's email
+
+   e. If the decision is contentious (the agent's confidence is below a threshold), be able to self-reflect and provide a more cautious recommendation
+
+   As the product manager of this product:
+   - Which paradigm (or combination of paradigms) from this chapter would you choose as the core architecture of the system?
+   - What tools does this system need? Please list at least 3 tools and their functional descriptions.
+   - How would you design the prompts so that the agent's decisions align with company interests while maintaining a friendly attitude toward users?
+   - What risks and challenges might this product face after launch? How can these risks be reduced through technical means?
+
+## References
+
+[1] Yao S, Zhao J, Yu D, et al. ReAct: Synergizing reasoning and acting in language models[C]//International Conference on Learning Representations (ICLR). 2023.
+
+[2] Wang L, Xu W, Lan Y, et al. Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models[J]. arXiv preprint arXiv:2305.04091, 2023.
+
+[3] Shinn N, Cassano F, Gopinath A, et al. Reflexion: Language agents with verbal reinforcement learning[J]. Advances in Neural Information Processing Systems, 2023, 36: 8634-8652.
+

+ 83 - 79
docs/chapter4/第四章 智能体经典范式构建.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter4-Building-Classic-Agent-Paradigms.md">English</a> | 中文
+</div>
+
 # 第四章 智能体经典范式构建
 
 在上一章中,我们深入探讨了作为现代智能体“大脑”的大语言模型。我们了解了其内部的Transformer架构、与之交互的方法,以及它的能力边界。现在,是时候将这些理论知识转化为实践,亲手构建智能体了。
@@ -6,9 +10,9 @@
 
 为了更好地组织智能体的“思考”与“行动”过程,业界涌现出了多种经典的架构范式。在本章中,我们将聚焦于其中最具代表性的三种,并一步步从零实现它们:
 
-- **ReAct (Reasoning and Acting):** 一种将“思考”和“行动”紧密结合的范式,让智能体边想边做,动态调整。
-- **Plan-and-Solve:** 一种“三思而后行”的范式,智能体首先生成一个完整的行动计划,然后严格执行。
-- **Reflection:** 一种赋予智能体“反思”能力的范式,通过自我批判和修正来优化结果。
+- **ReAct (Reasoning and Acting)** 一种将“思考”和“行动”紧密结合的范式,让智能体边想边做,动态调整。
+- **Plan-and-Solve** 一种“三思而后行”的范式,智能体首先生成一个完整的行动计划,然后严格执行。
+- **Reflection** 一种赋予智能体“反思”能力的范式,通过自我批判和修正来优化结果。
 
 了解了这些之后,你可能会问,市面上已有LangChain、LlamaIndex等众多优秀框架,为何还要“重复造轮子”?答案在于,尽管成熟的框架在工程效率上优势显著,但直接使用高度抽象的工具,并不利于我们了解背后的设计机制是怎么运行的,或者是有何好处。其次,这个过程会暴露出项目的工程挑战。框架为我们处理了许多问题,例如模型输出格式的解析、工具调用失败的重试、防止智能体陷入死循环等。亲手处理这些问题,是培养系统设计能力的最直接方式。最后,也是最重要的一点,掌握了设计原理,你才能真正地从一个框架的“使用者”转变为一个智能体应用的“创造者”。当标准组件无法满足你的复杂需求时,你将拥有深度定制乃至从零构建一个全新智能体的能力。
 
@@ -141,9 +145,9 @@ if __name__ == '__main__':
 
 ReAct的巧妙之处在于,它认识到<strong>思考与行动是相辅相成的</strong>。思考指导行动,而行动的结果又反过来修正思考。为此,ReAct范式通过一种特殊的提示工程来引导模型,使其每一步的输出都遵循一个固定的轨迹:
 
-- <strong>Thought (思考):</strong> 这是智能体的“内心独白”。它会分析当前情况、分解任务、制定下一步计划,或者反思上一步的结果。
-- <strong>Action (行动):</strong> 这是智能体决定采取的具体动作,通常是调用一个外部工具,例如 `Search['华为最新款手机']`。
-- <strong>Observation (观察):</strong> 这是执行`Action`后从外部工具返回的结果,例如搜索结果的摘要或API的返回值。
+- <strong>Thought (思考)</strong> 这是智能体的“内心独白”。它会分析当前情况、分解任务、制定下一步计划,或者反思上一步的结果。
+- <strong>Action (行动)</strong> 这是智能体决定采取的具体动作,通常是调用一个外部工具,例如 `Search['华为最新款手机']`。
+- <strong>Observation (观察)</strong> 这是执行`Action`后从外部工具返回的结果,例如搜索结果的摘要或API的返回值。
 
 智能体将不断重复这个 <strong>Thought -> Action -> Observation</strong> 的循环,将新的观察结果追加到历史记录中,形成一个不断增长的上下文,直到它在`Thought`中认为已经找到了最终答案,然后输出结果。这个过程形成了一个强大的协同效应:<strong>推理使得行动更具目的性,而行动则为推理提供了事实依据。</strong>
 
@@ -196,9 +200,9 @@ SERPAPI_API_KEY="YOUR_SERPAPI_API_KEY"
 
 一个良好定义的工具应包含以下三个核心要素:
 
-1. <strong>名称 (Name)</strong>: 一个简洁、唯一的标识符,供智能体在 `Action` 中调用,例如 `Search`。
-2. <strong>描述 (Description)</strong>: 一段清晰的自然语言描述,说明这个工具的用途。<strong>这是整个机制中最关键的部分</strong>,因为大语言模型会依赖这段描述来判断何时使用哪个工具。
-3. <strong>执行逻辑 (Execution Logic)</strong>: 真正执行任务的函数或方法。
+1. <strong>名称 (Name)</strong> 一个简洁、唯一的标识符,供智能体在 `Action` 中调用,例如 `Search`。
+2. <strong>描述 (Description)</strong> 一段清晰的自然语言描述,说明这个工具的用途。<strong>这是整个机制中最关键的部分</strong>,因为大语言模型会依赖这段描述来判断何时使用哪个工具。
+3. <strong>执行逻辑 (Execution Logic)</strong> 真正执行任务的函数或方法。
 
 我们的第一个工具是 `search` 函数,它的作用是接收一个查询字符串,然后返回搜索结果。
 
@@ -214,7 +218,7 @@ def search(query: str) -> str:
     try:
         api_key = os.getenv("SERPAPI_API_KEY")
         if not api_key:
-            return "错误SERPAPI_API_KEY 未在 .env 文件中配置。"
+            return "错误:SERPAPI_API_KEY 未在 .env 文件中配置。"
 
         params = {
             "engine": "google",
@@ -227,7 +231,7 @@ def search(query: str) -> str:
         client = SerpApiClient(params)
         results = client.get_dict()
         
-        # 智能解析优先寻找最直接的答案
+        # 智能解析:优先寻找最直接的答案
         if "answer_box_list" in results:
             return "\n".join(results["answer_box_list"])
         if "answer_box" in results and "answer" in results["answer_box"]:
@@ -269,7 +273,7 @@ class ToolExecutor:
         向工具箱中注册一个新工具。
         """
         if name in self.tools:
-            print(f"警告工具 '{name}' 已存在,将被覆盖。")
+            print(f"警告:工具 '{name}' 已存在,将被覆盖。")
         self.tools[name] = {"description": description, "func": func}
         print(f"工具 '{name}' 已注册。")
 
@@ -319,7 +323,7 @@ if __name__ == '__main__':
         print("--- 观察 (Observation) ---")
         print(observation)
     else:
-        print(f"错误未找到名为 '{tool_name}' 的工具。")
+        print(f"错误:未找到名为 '{tool_name}' 的工具。")
         
 >>>
 工具 'Search' 已注册。
@@ -357,18 +361,18 @@ DRIVE AGX. 强大的车载计算能力,适用于AI 驱动的智能汽车系统
 REACT_PROMPT_TEMPLATE = """
 请注意,你是一个有能力调用外部工具的智能助手。
 
-可用工具如下
+可用工具如下:
 {tools}
 
-请严格按照以下格式进行回应
+请严格按照以下格式进行回应:
 
 Thought: 你的思考过程,用于分析问题、拆解任务和规划下一步行动。
-Action: 你决定采取的行动,必须是以下格式之一
-- `{tool_name}[{tool_input}]`调用一个可用工具。
-- `Finish[最终答案]`当你认为已经获得最终答案时。
+Action: 你决定采取的行动,必须是以下格式之一:
+- `{tool_name}[{tool_input}]`:调用一个可用工具。
+- `Finish[最终答案]`:当你认为已经获得最终答案时。
 - 当你收集到足够的信息,能够回答用户的最终问题时,你必须在Action:字段后使用 finish(answer="...") 来输出最终答案。
 
-现在,请开始解决以下问题
+现在,请开始解决以下问题:
 Question: {question}
 History: {history}
 """
@@ -376,10 +380,10 @@ History: {history}
 
 这个模板定义了智能体与LLM之间交互的规范:
 
-- <strong>角色定义</strong>: “你是一个有能力调用外部工具的智能助手”,设定了LLM的角色。
-- <strong>工具清单 (`{tools}`)</strong>: 告知LLM它有哪些可用的“手脚”。
-- <strong>格式规约 (`Thought`/`Action`)</strong>: 这是最重要的部分,它强制LLM的输出具有结构性,使我们能通过代码精确解析其意图。
-- <strong>动态上下文 (`{question}`/`{history}`)</strong>: 将用户的原始问题和不断累积的交互历史注入,让LLM基于完整的上下文进行决策。
+- <strong>角色定义</strong> “你是一个有能力调用外部工具的智能助手”,设定了LLM的角色。
+- <strong>工具清单 (`{tools}`)</strong> 告知LLM它有哪些可用的“手脚”。
+- <strong>格式规约 (`Thought`/`Action`)</strong> 这是最重要的部分,它强制LLM的输出具有结构性,使我们能通过代码精确解析其意图。
+- <strong>动态上下文 (`{question}`/`{history}`)</strong> 将用户的原始问题和不断累积的交互历史注入,让LLM基于完整的上下文进行决策。
 
 (2)核心循环的实现
 
@@ -418,7 +422,7 @@ class ReActAgent:
             response_text = self.llm_client.think(messages=messages)
             
             if not response_text:
-                print("错误LLM未能返回有效响应。")
+                print("错误:LLM未能返回有效响应。")
                 break
 
             # ... (后续的解析、执行、整合步骤)
@@ -449,8 +453,8 @@ LLM 返回的是纯文本,我们需要从中精确地提取出`Thought`和`Act
         return None, None
 ```
 
-- `_parse_output`: 负责从LLM的完整响应中分离出`Thought`和`Action`两个主要部分。
-- `_parse_action`: 负责进一步解析`Action`字符串,例如从 `Search[华为最新手机]` 中提取出工具名 `Search` 和工具输入 `华为最新手机`。
+- `_parse_output` 负责从LLM的完整响应中分离出`Thought`和`Action`两个主要部分。
+- `_parse_action` 负责进一步解析`Action`字符串,例如从 `Search[华为最新手机]` 中提取出工具名 `Search` 和工具输入 `华为最新手机`。
 
 (4) 工具调用与执行
 
@@ -463,7 +467,7 @@ LLM 返回的是纯文本,我们需要从中精确地提取出`Thought`和`Act
                 print(f"思考: {thought}")
 
             if not action:
-                print("警告未能解析出有效的Action,流程终止。")
+                print("警告:未能解析出有效的Action,流程终止。")
                 break
 
             # 4. 执行Action
@@ -482,7 +486,7 @@ LLM 返回的是纯文本,我们需要从中精确地提取出`Thought`和`Act
             
             tool_function = self.tool_executor.getTool(tool_name)
             if not tool_function:
-                observation = f"错误未找到名为 '{tool_name}' 的工具。"
+                observation = f"错误:未找到名为 '{tool_name}' 的工具。"
             else:
                 observation = tool_function(tool_input) # 调用真实工具
 
@@ -530,7 +534,7 @@ Action: Search[华为最新手机型号及主要卖点]
 智能手机 ; Mate 系列. 非凡旗舰 · HUAWEI Mate XTs. 非凡大师 ; Pura 系列. 先锋影像 · HUAWEI Pura 80 Pro+ ; Pocket 系列. 美学新篇. HUAWEI Pocket 2 ; nova 系列. 专业人像.
 
 [2] 2025年华为手机哪一款性价比高?华为手机推荐与市场分析 ...
-现在华为手机最大的卖点只剩下鸿蒙HarmonyOS系统,以及饱受争议的品牌信仰。 这里推荐目前值得入手的几款华为系列手机,根据不同预算自行选择. 华为目前最受欢迎,也是搭载 ...
+现在华为手机最大的卖点只剩下鸿蒙HarmonyOS系统,以及饱受争议的品牌信仰。 这里推荐目前值得入手的几款华为系列手机,根据不同预算自行选择:. 华为目前最受欢迎,也是搭载 ...
 
 [3] 2025年华为新款手机哪个性价比高?10款华为新款手机推荐
 选华为主要还是要推荐高端手机,Mate 70和Pura 70系列是最新发布的旗舰机型。 HUAWEI Mate 70. 优点是,拍照配置依旧顶级,全焦段覆盖,适合专业摄影,做工出色,户外抗摔 ...
@@ -589,8 +593,8 @@ Plan-and-Solve Prompting 由 Lei Wang 在2023年提出<sup>[2]</sup>。其核心
 
 与 ReAct 将思考和行动融合在每一步不同,Plan-and-Solve 将整个流程解耦为两个核心阶段,如图4.2所示:
 
-1. <strong>规划阶段 (Planning Phase)</strong>: 首先,智能体会接收用户的完整问题。它的第一个任务不是直接去解决问题或调用工具,而是<strong>将问题分解,并制定出一个清晰、分步骤的行动计划</strong>。这个计划本身就是一次大语言模型的调用产物。
-2. <strong>执行阶段 (Solving Phase)</strong>: 在获得完整的计划后,智能体进入执行阶段。它会<strong>严格按照计划中的步骤,逐一执行</strong>。每一步的执行都可能是一次独立的 LLM 调用,或者是对上一步结果的加工处理,直到计划中的所有步骤都完成,最终得出答案。
+1. <strong>规划阶段 (Planning Phase)</strong> 首先,智能体会接收用户的完整问题。它的第一个任务不是直接去解决问题或调用工具,而是<strong>将问题分解,并制定出一个清晰、分步骤的行动计划</strong>。这个计划本身就是一次大语言模型的调用产物。
+2. <strong>执行阶段 (Solving Phase)</strong> 在获得完整的计划后,智能体进入执行阶段。它会<strong>严格按照计划中的步骤,逐一执行</strong>。每一步的执行都可能是一次独立的 LLM 调用,或者是对上一步结果的加工处理,直到计划中的所有步骤都完成,最终得出答案。
 
 这种“先谋后动”的策略,使得智能体在处理需要长远规划的复杂任务时,能够保持更高的目标一致性,避免在中间步骤中迷失方向。
 
@@ -650,9 +654,9 @@ PLANNER_PROMPT_TEMPLATE = """
 ````
 
 这个提示词通过以下几点确保了输出的质量和稳定性:
-- <strong>角色设定</strong>: “顶级的AI规划专家”,激发模型的专业能力。
-- <strong>任务描述</strong>: 清晰地定义了“分解问题”的目标。
-- <strong>格式约束</strong>: 强制要求输出为一个 Python 列表格式的字符串,这极大地简化了后续代码的解析工作,使其比解析自然语言更稳定、更可靠。
+- <strong>角色设定</strong> “顶级的AI规划专家”,激发模型的专业能力。
+- <strong>任务描述</strong> 清晰地定义了“分解问题”的目标。
+- <strong>格式约束</strong> 强制要求输出为一个 Python 列表格式的字符串,这极大地简化了后续代码的解析工作,使其比解析自然语言更稳定、更可靠。
 
 接下来,我们将这个提示词逻辑封装成一个 `Planner` 类,这个类也是我们的规划器。
 
@@ -701,10 +705,10 @@ class Planner:
 
 执行器的提示词与规划器不同。它的目标不是分解问题,而是<strong>在已有上下文的基础上,专注解决当前这一个步骤</strong>。因此,提示词需要包含以下关键信息:
 
-- <strong>原始问题</strong>: 确保模型始终了解最终目标。
-- <strong>完整计划</strong>: 让模型了解当前步骤在整个任务中的位置。
-- <strong>历史步骤与结果</strong>: 提供至今为止已经完成的工作,作为当前步骤的直接输入。
-- <strong>当前步骤</strong>: 明确指示模型现在需要解决哪一个具体任务。
+- <strong>原始问题</strong> 确保模型始终了解最终目标。
+- <strong>完整计划</strong> 让模型了解当前步骤在整个任务中的位置。
+- <strong>历史步骤与结果</strong> 提供至今为止已经完成的工作,作为当前步骤的直接输入。
+- <strong>当前步骤</strong> 明确指示模型现在需要解决哪一个具体任务。
 
 ```python
 EXECUTOR_PROMPT_TEMPLATE = """
@@ -781,7 +785,7 @@ class PlanAndSolveAgent:
 
     def run(self, question: str):
         """
-        运行智能体的完整流程先规划,后执行。
+        运行智能体的完整流程:先规划,后执行。
         """
         print(f"\n--- 开始处理问题 ---\n问题: {question}")
         
@@ -812,11 +816,11 @@ class PlanAndSolveAgent:
 🧠 正在调用 xxxx 模型...
 ✅ 大语言模型响应成功:
 ```python
-["计算周一卖出的苹果数量: 15个", "计算周二卖出的苹果数量: 周一数量 × 2 = 15 × 2 = 30个", "计算周三卖出的苹果数量: 周二数量 - 5 = 30 - 5 = 25个", "计算三天总销量: 周一 + 周二 + 周三 = 15 + 30 + 25 = 70个"]
+["计算周一卖出的苹果数量: 15个", "计算周二卖出的苹果数量: 周一数量 × 2 = 15 × 2 = 30个", "计算周三卖出的苹果数量: 周二数量 - 5 = 30 - 5 = 25个", "计算三天总销量: 周一 + 周二 + 周三 = 15 + 30 + 25 = 70个"]
 ```
 ✅ 计划已生成:
 ```python
-["计算周一卖出的苹果数量: 15个", "计算周二卖出的苹果数量: 周一数量 × 2 = 15 × 2 = 30个", "计算周三卖出的苹果数量: 周二数量 - 5 = 30 - 5 = 25个", "计算三天总销量: 周一 + 周二 + 周三 = 15 + 30 + 25 = 70个"]
+["计算周一卖出的苹果数量: 15个", "计算周二卖出的苹果数量: 周一数量 × 2 = 15 × 2 = 30个", "计算周三卖出的苹果数量: 周二数量 - 5 = 30 - 5 = 25个", "计算三天总销量: 周一 + 周二 + 周三 = 15 + 30 + 25 = 70个"]
 ```
 
 --- 正在执行计划 ---
@@ -851,8 +855,8 @@ class PlanAndSolveAgent:
 
 从上面的输出日志中,我们可以清晰地看到 Plan-and-Solve 范式的工作流程:
 
-1.  <strong>规划阶段</strong>: 智能体首先调用 `Planner`,成功地将复杂的应用题分解成了一个包含四个逻辑步骤的 Python 列表。这个结构化的计划为后续的执行奠定了基础。
-2.  <strong>执行阶段</strong>: `Executor` 严格按照生成的计划,一步一步地向下执行。在每一步中,它都将历史结果作为上下文,确保了信息的正确传递(例如,步骤2正确地使用了步骤1的结果“15个”,步骤3也正确使用了步骤2的结果“30个”)。
+1.  <strong>规划阶段</strong> 智能体首先调用 `Planner`,成功地将复杂的应用题分解成了一个包含四个逻辑步骤的 Python 列表。这个结构化的计划为后续的执行奠定了基础。
+2.  <strong>执行阶段</strong> `Executor` 严格按照生成的计划,一步一步地向下执行。在每一步中,它都将历史结果作为上下文,确保了信息的正确传递(例如,步骤2正确地使用了步骤1的结果“15个”,步骤3也正确使用了步骤2的结果“30个”)。
 3.  <strong>结果</strong>:整个过程逻辑清晰,步骤明确,最终智能体准确地得出了正确答案“70个”。
 
 ## 4.4 Reflection
@@ -1025,7 +1029,7 @@ REFINE_PROMPT_TEMPLATE = """
 # 你上一轮尝试的代码:
 ```
 {last_code_attempt}
-评审员的反馈:
+评审员的反馈
 {feedback}
 
 请根据评审员的反馈,生成一个优化后的新版本代码。
@@ -1058,7 +1062,7 @@ class ReflectionAgent:
         initial_code = self._get_llm_response(initial_prompt)
         self.memory.add_record("execution", initial_code)
 
-        # --- 2. 迭代循环反思与优化 ---
+        # --- 2. 迭代循环:反思与优化 ---
         for i in range(self.max_iterations):
             print(f"\n--- 第 {i+1}/{self.max_iterations} 轮迭代 ---")
 
@@ -1102,11 +1106,11 @@ class ReflectionAgent:
 
 ````python
 --- 开始处理任务 ---
-任务: 编写一个Python函数,找出1到n之间所有的素数 (prime numbers)。
+任务 编写一个Python函数,找出1到n之间所有的素数 (prime numbers)。
 
 --- 正在进行初始尝试 ---
 🧠 正在调用 xxxxxx 模型...
-✅ 大语言模型响应成功:
+✅ 大语言模型响应成功
 ```python
 def find_primes(n):
     ...
@@ -1118,7 +1122,7 @@ def find_primes(n):
 
 -> 正在进行反思...
 🧠 正在调用 xxxxxx 模型...
-✅ 大语言模型响应成功:
+✅ 大语言模型响应成功
 当前代码的时间复杂度为O(n * sqrt(n))。虽然对于较小的n值,这种实现是可以接受的,但当n非常大时,性能会显著下降。主要瓶颈在于每个数都需要进行试除法检查,这导致了较高的时间开销。
 
 建议使用埃拉托斯特尼筛法(Sieve of Eratosthenes),该算法的时间复杂度为O(n log(log n)),能够显著提高查找素数的效率。
@@ -1133,7 +1137,7 @@ def find_primes(n):
 
 -> 正在进行优化...
 🧠 正在调用 xxxxxx 模型...
-✅ 大语言模型响应成功:
+✅ 大语言模型响应成功
 ```python
 def find_primes(n):
     ...
@@ -1145,7 +1149,7 @@ def find_primes(n):
 
 -> 正在进行反思...
 🧠 正在调用 xxxxxx 模型...
-✅ 大语言模型响应成功:
+✅ 大语言模型响应成功
 当前代码使用了Eratosthenes筛法,时间复杂度为O(n log log n),空间复杂度为O(n)。此算法在寻找1到n之间的所有素数时已经非常高效,通常情况下无需进一步优化。但在某些特定场景下,可以考虑以下改进:
 
 1. <strong>分段筛法(Segmented Sieve)</strong>:适用于n非常大但内存有限的情况。将区间分成多个小段,每段分别用筛法处理,减少内存使用。
@@ -1157,7 +1161,7 @@ def find_primes(n):
 ✅ 反思认为代码已无需改进,任务完成。
 
 --- 任务完成 ---
-最终生成的代码:
+最终生成的代码
 ```python
 def find_primes(n):
     """
@@ -1184,9 +1188,9 @@ def find_primes(n):
 ```
 ````
 
-这个运行实例展示了 Reflection 机制是如何驱动智能体进行深度优化的
+这个运行实例展示了 Reflection 机制是如何驱动智能体进行深度优化的:
 
-1. <strong>有效的“批判”是优化的前提</strong>在第一轮反思中,由于我们使用了“极其严格”且“专注于算法效率”的提示词,智能体没有满足于功能正确的初版代码,而是精准地指出了其 `O(n * sqrt(n))` 的时间复杂度瓶颈,并提出了算法层面的改进建议——埃拉托斯特尼筛法。
+1. <strong>有效的“批判”是优化的前提</strong>:在第一轮反思中,由于我们使用了“极其严格”且“专注于算法效率”的提示词,智能体没有满足于功能正确的初版代码,而是精准地指出了其 `O(n * sqrt(n))` 的时间复杂度瓶颈,并提出了算法层面的改进建议——埃拉托斯特尼筛法。
 2. <strong>迭代式改进</strong>: 智能体在接收到明确的反馈后,于优化阶段成功地实现了更高效的筛法,将算法复杂度降至 `O(n log log n)`,完成了第一次有意义的自我迭代。
 3. <strong>收敛与终止</strong>: 在第二轮反思中,智能体面对已经高效的筛法,展现出了更深层次的知识。它不仅肯定了当前算法的效率,甚至还提及了分段筛法等更高级的优化方向,但最终做出了“在一般情况下无需改进”的正确判断。这个判断触发了我们的终止条件,使优化过程得以收敛。
 
@@ -1198,19 +1202,19 @@ def find_primes(n):
 
 (1)主要成本
 
-1. <strong>模型调用开销增加</strong>这是最直接的成本。每进行一轮迭代,至少需要额外调用两次大语言模型(一次用于反思,一次用于优化)。如果迭代多轮,API 调用成本和计算资源消耗将成倍增加。
+1. <strong>模型调用开销增加</strong>:这是最直接的成本。每进行一轮迭代,至少需要额外调用两次大语言模型(一次用于反思,一次用于优化)。如果迭代多轮,API 调用成本和计算资源消耗将成倍增加。
 
-2. <strong>任务延迟显著提高</strong>Reflection 是一个串行过程,每一轮的优化都必须等待上一轮的反思完成。这使得任务的总耗时显著延长,不适合对实时性要求高的场景。
+2. <strong>任务延迟显著提高</strong>:Reflection 是一个串行过程,每一轮的优化都必须等待上一轮的反思完成。这使得任务的总耗时显著延长,不适合对实时性要求高的场景。
 
-3. <strong>提示工程复杂度上升</strong>如我们的案例所示,Reflection 的成功在很大程度上依赖于高质量、有针对性的提示词。为“执行”、“反思”、“优化”等不同阶段设计和调试有效的提示词,需要投入更多的开发精力。
+3. <strong>提示工程复杂度上升</strong>:如我们的案例所示,Reflection 的成功在很大程度上依赖于高质量、有针对性的提示词。为“执行”、“反思”、“优化”等不同阶段设计和调试有效的提示词,需要投入更多的开发精力。
 
 (2)核心收益
 
-1. <strong>解决方案质量的跃迁</strong>最大的收益在于,它能将一个“合格”的初始方案,迭代优化成一个“优秀”的最终方案。这种从功能正确到性能高效、从逻辑粗糙到逻辑严谨的提升,在很多关键任务中是至关重要的。
+1. <strong>解决方案质量的跃迁</strong>:最大的收益在于,它能将一个“合格”的初始方案,迭代优化成一个“优秀”的最终方案。这种从功能正确到性能高效、从逻辑粗糙到逻辑严谨的提升,在很多关键任务中是至关重要的。
 
-2. <strong>鲁棒性与可靠性增强</strong>通过内部的自我纠错循环,智能体能够发现并修复初始方案中可能存在的逻辑漏洞、事实性错误或边界情况处理不当等问题,从而大大提高了最终结果的可靠性。
+2. <strong>鲁棒性与可靠性增强</strong>:通过内部的自我纠错循环,智能体能够发现并修复初始方案中可能存在的逻辑漏洞、事实性错误或边界情况处理不当等问题,从而大大提高了最终结果的可靠性。
 
-综上所述,Reflection 机制是一种典型的“以成本换质量”的策略。它非常适合那些<strong>对最终结果的质量、准确性和可靠性有极高要求,且对任务完成的实时性要求相对宽松</strong>的场景。例如
+综上所述,Reflection 机制是一种典型的“以成本换质量”的策略。它非常适合那些<strong>对最终结果的质量、准确性和可靠性有极高要求,且对任务完成的实时性要求相对宽松</strong>的场景。例如:
 
 - 生成关键的业务代码或技术报告。
 - 在科学研究中进行复杂的逻辑推演。
@@ -1220,15 +1224,15 @@ def find_primes(n):
 
 ## 4.5 本章小结
 
-在本章中,以第三章掌握的大语言模型知识为基础,我们通过“亲手造轮子”的方式,从零开始编码实现了三种业界经典的智能体构建范式ReAct、Plan-and-Solve 与 Reflection。我们不仅探索了它们的核心工作原理,还通过具体的实战案例,深入了解了各自的优势、局限与适用场景。
+在本章中,以第三章掌握的大语言模型知识为基础,我们通过“亲手造轮子”的方式,从零开始编码实现了三种业界经典的智能体构建范式:ReAct、Plan-and-Solve 与 Reflection。我们不仅探索了它们的核心工作原理,还通过具体的实战案例,深入了解了各自的优势、局限与适用场景。
 
-<strong>核心知识点回顾</strong>
+<strong>核心知识点回顾:</strong>
 
-1. ReAct我们构建了一个能与外部世界交互的 ReAct 智能体。通过“思考-行动-观察”的动态循环,它成功地利用搜索引擎回答了自身知识库无法覆盖的实时性问题。其核心优势在于<strong>环境适应性</strong>和<strong>动态纠错能力</strong>,使其成为处理探索性、需要外部工具输入的任务的首选。
-2. Plan-and-Solve我们实现了一个先规划后执行的 Plan-and-Solve 智能体,并利用它解决了需要多步推理的数学应用题。它将复杂的任务分解为清晰的步骤,然后逐一执行。其核心优势在于<strong>结构性</strong>和<strong>稳定性</strong>,特别适合处理逻辑路径确定、内部推理密集的任务。
-3. Reflection (自我反思与迭代)我们构建了一个具备自我优化能力的 Reflection 智能体。通过引入“执行-反思-优化”的迭代循环,它成功地将一个效率较低的初始代码方案,优化为了一个算法上更优的高性能版本。其核心价值在于能<strong>显著提升解决方案的质量</strong>,适用于对结果的准确性和可靠性有极高要求的场景。
+1. ReAct:我们构建了一个能与外部世界交互的 ReAct 智能体。通过“思考-行动-观察”的动态循环,它成功地利用搜索引擎回答了自身知识库无法覆盖的实时性问题。其核心优势在于<strong>环境适应性</strong>和<strong>动态纠错能力</strong>,使其成为处理探索性、需要外部工具输入的任务的首选。
+2. Plan-and-Solve:我们实现了一个先规划后执行的 Plan-and-Solve 智能体,并利用它解决了需要多步推理的数学应用题。它将复杂的任务分解为清晰的步骤,然后逐一执行。其核心优势在于<strong>结构性</strong>和<strong>稳定性</strong>,特别适合处理逻辑路径确定、内部推理密集的任务。
+3. Reflection (自我反思与迭代):我们构建了一个具备自我优化能力的 Reflection 智能体。通过引入“执行-反思-优化”的迭代循环,它成功地将一个效率较低的初始代码方案,优化为了一个算法上更优的高性能版本。其核心价值在于能<strong>显著提升解决方案的质量</strong>,适用于对结果的准确性和可靠性有极高要求的场景。
 
-本章探讨的三种范式,代表了智能体解决问题的三种不同策略,如表4.1所示。在实际应用中,选择哪一种,取决于任务的核心需求
+本章探讨的三种范式,代表了智能体解决问题的三种不同策略,如表4.1所示。在实际应用中,选择哪一种,取决于任务的核心需求:
 
 <div align="center">
 <p>表 4.1 不同 Agent Loop 的选择策略</p>
@@ -1239,47 +1243,47 @@ def find_primes(n):
 
 ## 习题
 
-> <strong>提示</strong>部分习题没有标准答案,重点在于培养学习者对智能体范式设计的综合理解和实践能力。
+> <strong>提示</strong>:部分习题没有标准答案,重点在于培养学习者对智能体范式设计的综合理解和实践能力。
 
-1. 本章介绍了三种经典的智能体范式:`ReAct`、`Plan-and-Solve` 和 `Reflection`。请分析:
+1. 本章介绍了三种经典的智能体范式:`ReAct`、`Plan-and-Solve` 和 `Reflection`。请分析:
 
    - 这三种范式在"思考"与"行动"的组织方式上有什么本质区别?
    - 如果要设计一个"智能家居控制助手"(需要控制灯光、空调、窗帘等多个设备,并根据用户习惯自动调节),你会选择哪种范式作为基础架构?为什么?
    - 是否可以将这三种范式进行组合使用?若可以,请尝试设计一个混合范式的智能体架构,并说明其适用场景。
 
-2. 在4.2节的 `ReAct` 实现中,我们使用了正则表达式来解析大语言模型的输出(如 `Thought` 和 `Action`)。请思考
+2. 在4.2节的 `ReAct` 实现中,我们使用了正则表达式来解析大语言模型的输出(如 `Thought` 和 `Action`)。请思考:
 
    - 当前的解析方法存在哪些潜在的脆弱性?在什么情况下可能会失败?
    - 除了正则表达式,还有哪些更鲁棒的输出解析方案?
    - 尝试修改本章的代码,使用一种更可靠的输出格式,并对比两种方案的优缺点
 
-3. 工具调用是现代智能体的核心能力之一。基于4.2.2节的 `ToolExecutor` 设计,请完成以下扩展实践
+3. 工具调用是现代智能体的核心能力之一。基于4.2.2节的 `ToolExecutor` 设计,请完成以下扩展实践:
 
-   > <strong>提示</strong>这是一道动手实践题,建议实际编写代码
+   > <strong>提示</strong>:这是一道动手实践题,建议实际编写代码
 
    - 为 `ReAct` 智能体添加一个"计算器"工具,使其能够处理复杂的数学计算问题(如"计算 `(123 + 456) × 789/ 12 = ?` 的结果")
-   - 设计并实现一个"工具选择失败"的处理机制当智能体多次调用错误的工具或提供错误的参数时,系统应该如何引导它纠正?
-   - 思考如果可调用工具的数量增加到$50$个甚至$100$个,当前的工具描述方式是否还能有效工作?在可调用工具数量随业务需求显著增加时,从工程角度如何优化工具的组织和检索机制?
+   - 设计并实现一个"工具选择失败"的处理机制:当智能体多次调用错误的工具或提供错误的参数时,系统应该如何引导它纠正?
+   - 思考:如果可调用工具的数量增加到$50$个甚至$100$个,当前的工具描述方式是否还能有效工作?在可调用工具数量随业务需求显著增加时,从工程角度如何优化工具的组织和检索机制?
 
-4. `Plan-and-Solve` 范式将任务分解为"规划"和"执行"两个阶段。请深入分析
+4. `Plan-and-Solve` 范式将任务分解为"规划"和"执行"两个阶段。请深入分析:
 
    - 在4.3节的实现中,规划阶段生成的计划是"静态"的(一次性生成,不可修改)。如果在执行过程中发现某个步骤无法完成或结果不符合预期,应该如何设计一个"动态重规划"机制?
-   - 对比 `Plan-and-Solve` 与 `ReAct`在处理"预订一次从北京到上海的商务旅行(包括机票、酒店、租车)"这样的任务时,哪种范式更合适?为什么?
-   - 尝试设计一个"分层规划"系统先生成高层次的抽象计划,然后针对每个高层步骤再生成详细的子计划。这种设计有什么优势?
+   - 对比 `Plan-and-Solve` 与 `ReAct`:在处理"预订一次从北京到上海的商务旅行(包括机票、酒店、租车)"这样的任务时,哪种范式更合适?为什么?
+   - 尝试设计一个"分层规划"系统:先生成高层次的抽象计划,然后针对每个高层步骤再生成详细的子计划。这种设计有什么优势?
 
-5. `Reflection` 机制通过"执行-反思-优化"循环来提升输出质量。请思考
+5. `Reflection` 机制通过"执行-反思-优化"循环来提升输出质量。请思考:
 
    - 在4.4节的代码生成案例中,不同阶段使用的是同一个模型。如果使用两个不同的模型(例如,用一个更强大的模型来做反思,用一个更快的模型来做执行),会带来什么影响?
    - `Reflection` 机制的终止条件是"反馈中包含<strong>无需改进</strong>"或"达到最大迭代次数"。这种设计是否合理?能否设计一个更智能的终止条件?
    - 假设你要搭建一个"学术论文写作助手",它能够生成初稿并不断优化论文内容。请设计一个多维度的Reflection机制,从段落逻辑性、方法创新性、语言表达、引用规范等多个角度进行反思和改进。
 
-6. 提示词工程是影响智能体最终效果的关键技术。本章展示了多个精心设计的提示词模板。请分析
+6. 提示词工程是影响智能体最终效果的关键技术。本章展示了多个精心设计的提示词模板。请分析:
 
    - 对比4.2.3节的 `ReAct` 提示词和4.3.2节的 `Plan-and-Solve` 提示词,它们显然存在结构设计上的明显不同,这些差异是如何服务于各自范式的核心逻辑的?
    - 在4.4.3节的 `Reflection` 提示词中,我们使用了"你是一位极其严格的代码评审专家"这样的角色设定。尝试修改这个角色设定(如改为"你是一位注重代码可读性的开源项目维护者"),观察输出结果的变化,并总结角色设定对智能体行为的影响。
    - 在提示词中加入 `few-shot` 示例往往能显著提升模型对特定格式的遵循能力。请为本章的某个智能体尝试添加 `few-shot` 示例,并对比其效果。
 
-7. 某电商初创公司现在希望使用"客服智能体"来代替真人客服实现降本增效,它需要具备以下功能
+7. 某电商初创公司现在希望使用"客服智能体"来代替真人客服实现降本增效,它需要具备以下功能:
 
    a. 理解用户的退款申请理由
 
@@ -1291,7 +1295,7 @@ def find_primes(n):
 
    e. 如果判断决策存在一定争议(自我置信度低于阈值),能够进行自我反思并给出更审慎的建议
 
-   此时作为该产品的负责人
+   此时作为该产品的负责人:
    - 你会选择本章的哪种范式(或哪些范式的组合)作为系统的核心架构?
    - 这个系统需要哪些工具?请列出至少3个工具及其功能描述。
    - 如何设计提示词来确保智能体的决策既符合公司利益,又能保持对用户的友好态度?

+ 1065 - 0
docs/chapter5/Chapter5-Building-Agents-with-Low-Code-Platforms.md

@@ -0,0 +1,1065 @@
+<div align="right">
+  English | <a href="./第五章%20基于低代码平台的智能体搭建.md">中文</a>
+</div>
+
+# Chapter 5: Building Agents with Low-Code Platforms
+
+In the previous chapter, by writing Python code, we implemented various classic agent workflows from scratch, including ReAct, Plan-and-Solve, and Reflection. This process laid a solid technical foundation for us and gave us a deep understanding of the internal mechanisms of agents. However, for a rapidly developing field, pure code development is not always the most efficient choice, especially in scenarios where ideas need to be quickly validated or non-professional developers want to participate in building.
+
+## 5.1 The Rise of Platform-Based Construction
+
+As technology matures, we see more and more capabilities being "platformized." Just as website development has evolved from hand-writing HTML/CSS/JS to using website building platforms like WordPress and Wix, agent construction has also ushered in a wave of platformization. This chapter will focus on how to use graphical, modular low-code platforms to quickly and intuitively build, debug, and deploy agent applications, shifting our focus from "implementation details" to "business logic."
+
+### 5.1.1 Why Low-Code Platforms Are Needed
+
+"Reinventing the wheel" is valuable for building a deep understanding, but in practical work that prioritizes engineering efficiency and innovation, we often need to stand on the shoulders of giants. Although we encapsulated reusable classes like `ReActAgent` and `PlanAndSolveAgent` in Chapter 4, the maintenance cost and development cycle of a pure-code approach rise sharply once business logic becomes complex. Low-code platforms emerged precisely to address these pain points.
+
+Their core value is mainly reflected in the following aspects:
+
+1. **Lowering Technical Barriers**: Low-code platforms encapsulate complex technical details (such as API calls, state management, concurrency control) into easy-to-understand "nodes" or "modules." Users don't need to be proficient in programming; they only need to drag and connect these nodes to build powerful workflows. This enables non-technical personnel such as product managers, designers, and business experts to participate in the design and creation of agents, greatly expanding the boundaries of innovation.
+2. **Improving Development Efficiency**: For professional developers, platforms can also bring huge efficiency improvements. In the early stages of a project, when an idea needs to be quickly validated or a prototype needs to be built, using a low-code platform can complete work that would originally take days of coding in hours or even minutes. Developers can invest more energy in business logic organization and prompt engineering optimization rather than low-level engineering implementation.
+3. **Providing Better Visualization and Observability**: Compared to printing logs in the terminal, graphical platforms naturally provide end-to-end visualization of an agent's running trajectory. You can clearly see how data flows between nodes, which step takes the longest, and which tool call fails. This kind of intuitive debugging experience is hard to match with pure code development.
+4. **Standardization and Best Practice Accumulation**: Good low-code platforms usually have many industry best practices built in. For example, they provide preset ReAct templates, optimized knowledge base retrieval engines, and standardized tool integration specifications. This not only helps developers avoid common pitfalls but also makes team collaboration smoother, because everyone builds on the same set of standards and components.
+
+In short, low-code platforms are not meant to replace code but provide a higher level of abstraction. They allow us to free ourselves from tedious low-level implementation and focus more on the logic of agent "thinking" and "action" itself, thereby turning ideas into reality faster and better.
+
+### 5.1.2 Choosing a Low-Code Platform
+
+The market of low-code platforms for agents and LLM applications is currently flourishing, and each platform has its own positioning and strengths. Which platform to choose depends on your core needs, technical background, and the ultimate goal of the project. In the rest of this chapter, we will introduce and practice with three representative platforms: Coze, Dify, and n8n. Before that, here is a brief introduction to each.
+
+**Coze**
+
+- **Core Positioning**: Launched by ByteDance, Coze<sup>[1]</sup> focuses on a zero-code/low-code agent building experience, allowing users without programming backgrounds to create agents easily.
+- **Feature Analysis**: Coze has an extremely friendly visual interface. Users can create agents by dragging and dropping plugins, configuring knowledge bases, and setting workflows, just like building LEGO blocks. It has a very rich plugin library built in and supports one-click publishing to mainstream platforms such as Douyin, Feishu, and WeChat Official Accounts, greatly simplifying the distribution process.
+- **Target Audience**: Entry-level users of AI applications, product managers, operations personnel, and individual creators who want to quickly turn ideas into interactive products.
+
+**Dify**
+
+- **Core Positioning**: Dify is an open-source, full-featured LLM application development and operation platform<sup>[2]</sup>, aiming to provide developers with a one-stop solution from prototype construction to production deployment.
+- **Feature Analysis**: It integrates the concepts of backend services and model operations, supporting multiple capabilities such as Agent workflows, RAG Pipeline, data annotation, and fine-tuning. For enterprise-level applications pursuing professionalism, stability, and scalability, Dify provides a solid foundation.
+- **Target Audience**: Developers with some technical background, teams that need to build scalable enterprise-level AI applications.
+
+**n8n**
+
+- **Core Positioning**: n8n is essentially an open-source workflow automation tool<sup>[3]</sup>, not a pure LLM platform. In recent years, it has actively integrated AI capabilities.
+
+- **Feature Analysis**: n8n's strength lies in "connection." It has hundreds of preset nodes that can connect various SaaS services, databases, and APIs into complex automated business processes. You can embed LLM nodes in such a process, making them part of the entire automation chain. Although it is not as specialized in LLM functionality as the previous two platforms, its general automation capability is unique. Its learning curve, however, is relatively steep.
+
+- **Target Audience**: Developers and enterprises that need to deeply integrate AI capabilities into existing business processes and achieve highly customized automation.
+
+In the following subsections, we will get hands-on experience with these platforms one by one, and more intuitively feel their respective charms through actual operations.
+
+## 5.2 Platform One: Coze
+Coze is an approachable AI agent creation tool and, at the time of writing, one of the most widely used agent platforms on the market. With its intuitive visual interface and rich functional modules, the platform lets users create many types of agent applications, such as conversational chatbots, story-writing assistants, and even agents that turn stories into music videos. One of its highlights is its ecosystem integration: finished agents can be published to mainstream platforms such as WeChat, Feishu, and Doubao with one click. For enterprise users, Coze also provides API interfaces for integrating agent capabilities into existing business systems, enabling "building block-style" AI application construction.
+### 5.2.1 Functional Modules of Coze
+(1) Platform Interface Overview
+
+Overall layout: Coze recently updated its UI again, as shown in Figure 5.1. The leftmost sidebar holds the development workspace of the Coze homepage, which covers project development, the resource library, effect evaluation, and space configuration. The area below it holds supporting resources for development: official templates that can be copied with one click, the plugin store (Coze's biggest advantage, with a rich and diverse selection), a large agent community, API management for API testing, detailed tutorial documentation, and general management for enterprises. The right side is divided into four sections. At the top is Coze's latest update announcement, which introduces new tools and features. Below that is the beginner tutorial, which links to documentation that lets you start building agents in minutes. The remaining two sections are your follows and agent recommendations, where you can follow your favorite AI developers and bookmark their agents for your own use.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-01.png" alt="Image description" width="90%"/>
+  <p>Figure 5.1 Overall Schematic of Coze Agent Platform</p>
+</div>
+
+(2) Core Function Introduction
+
+First, click the plus sign on the left sidebar to open the creation entry point. Coze currently offers two kinds of AI projects: agents and applications. Agents come in three modes: single-agent autonomous planning, single-agent dialogue flow, and multi-agent. Applications come in two types: you can design user interfaces for desktop and web, or build interfaces for mini-programs and H5, as shown in Figure 5.2.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-02.png" alt="Image description" width="90%"/>
+  <p>Figure 5.2 Coze Agent Creation Entry</p>
+</div>
+The project space is your agent repository, where all the agents or applications you have developed or copied are stored. It is also the place you will visit most often when developing agents in Coze, as shown in Figure 5.3.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-03.png" alt="Image description" width="90%"/>
+  <p>Figure 5.3 Coze Agent Project Space</p>
+</div>
+The resource library is your core arsenal for developing Coze agents. It stores your workflows, knowledge bases, cards, prompt libraries, and other building blocks. What an agent can do depends first on the model's capabilities, but above all on the "equipment and skills" you give it. The model sets the agent's lower bound, while the resource library raises its upper bound, letting you develop according to your own ideas and creativity, as shown in Figure 5.4.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-04.png" alt="Image description" width="90%"/>
+  <p>Figure 5.4 Coze Agent Resource Library</p>
+</div>
+Space configuration includes a unified management channel for agents, plugins, workflows, and publishing channels, as well as model management where you can see the various large models you call, as shown in Figure 5.5.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-05.png" alt="Image description" width="90%"/>
+  <p>Figure 5.5 Coze Agent Publishing Channels</p>
+</div>
+To summarize Coze's agent development, I would compare its modules to the components of a game. Combining them to create an agent feels much like playing one: every time you finish an agent, it is like defeating a boss and walking away with both "experience" and "equipment."
+
+- Workflow: Level clearance route map
+- Dialogue flow: NPC dialogue clearance
+- Plugins: Character skill cards
+- Knowledge base: Game encyclopedia
+- Cards: Quick item bar
+- Prompts: Character movement keys
+- Database: "Cloud save"
+- Publishing management: Level reviewer
+- Model management: Game character library or character creation system
+- Effect evaluation: Level scoring system
+
+
+
+
+### 5.2.2 Building a "Daily AI Brief" Assistant
+
+**Case Description:** This practical case aims to deeply analyze Coze platform's plugin integration capabilities and guide readers to build a powerful "Daily AI Brief" agent from scratch. This agent can automatically capture the latest AI field headlines, academic papers, and open-source project updates from multiple information sources (including 36Kr, Huxiu, IT Home, InfoQ, GitHub, arXiv) and integrate them into a vivid and concise brief in a structured and professional manner.
+
+Through this case, you will systematically master the following core skills:
+
+  * **Multi-source Information Aggregation:** Use Coze's plugin ecosystem to achieve seamless integration of cross-platform, cross-type data flows.
+  * **Agent Behavior Definition:** Through role setting and prompt engineering, precisely control the agent's task execution and content generation to ensure output meets preset professional standards.
+  * **Automated Workflow Construction:** Learn how to link multiple steps such as data acquisition, content processing, and formatted output into an efficient, automated workflow.
+
+
+
+**Step 1: Add and Configure Information Source Plugins**
+
+The primary task of building a "Daily AI Brief" agent is to connect it to rich and authoritative information sources. On the Coze platform, this is achieved by adding and configuring corresponding plugins.
+
+1.  **Plugin Integration:** In Coze's plugin library, search for and add the required plugins. For example, subscribe to RSS feeds from media platforms through the **RSS** plugin (as shown in Figure 5.6), track open-source projects through the **GitHub** plugin (as shown in Figure 5.7), and obtain the latest academic research results through the **arXiv** plugin (as shown in Figure 5.8).
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-06.png" alt="Image description" width="90%"/>
+  <p>Figure 5.6 RSS Source Plugin for Media Platforms</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-07.png" alt="Image description" width="90%"/>
+  <p>Figure 5.7 GitHub Plugin</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-08.png" alt="Image description" width="90%"/>
+  <p>Figure 5.8 arXiv Plugin</p>
+</div>
+
+2.  **Personalized Configuration:** Configure each plugin in detail so that it retrieves exactly the data you need. For example, in the RSS plugin, enter the RSS feed links for websites such as 36Kr and Huxiu; in the GitHub plugin, set the search keyword, the number of results per page, and sorting by most recent update; in the arXiv plugin, define keywords of interest such as "LLM" or "AI", along with the number of results and the sort order.
+
+```
+RSS Link Configuration
+
+- **36Kr:** https://www.36kr.com/feed
+- **Huxiu:** https://rss.huxiu.com/
+- **IT Home:** http://www.ithome.com/rss/
+- **InfoQ:** https://feed.infoq.com/ai-ml-data-eng/
+
+GitHub Plugin Configuration
+
+- q: AI
+- per_page: 10
+- sort: updated
+
+arXiv Plugin Configuration
+
+- count: 5
+- search_query: AI
+- sort_by: 2
+```
+
+3.  **Orchestration and Connection:** In the agent's visual orchestration interface, use these configured information source plugins (such as `rss_24Hbj`, `searchRepository`, `arxiv`, etc.) as data input nodes and connect them to subsequent logical processing modules (such as the **Large Model** module) to build a complete data processing path, as shown in Figure 5.9.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-09.png" alt="Image description" width="90%"/>
+  <p>Figure 5.9 Daily AI Brief Orchestration Flowchart</p>
+</div>
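+
+The GitHub plugin fields configured above (`q`, `per_page`, `sort`) have the same names as the query parameters of GitHub's public repository search endpoint. Assuming the plugin forwards them largely unchanged (an assumption, since the plugin's internals are not described here), the equivalent request URL can be sketched in Python. The helper name `build_github_search_url` is hypothetical, and the sketch only builds the URL without sending a request.
+
+```python
+from urllib.parse import urlencode
+
+# Public GitHub REST endpoint for repository search.
+GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"
+
+
+def build_github_search_url(q: str, per_page: int, sort: str) -> str:
+    """Build a repository search URL (hypothetical helper, for illustration only)."""
+    # Parameter names mirror the plugin configuration shown in this section.
+    params = {"q": q, "per_page": per_page, "sort": sort}
+    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"
+
+
+# Same values as the GitHub plugin configuration above.
+url = build_github_search_url(q="AI", per_page=10, sort="updated")
+print(url)
+```
+
+Opening the printed URL in a browser is a quick way to check what kind of repositories a given keyword and sort order return before wiring the plugin into the workflow.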
+
+
+**Step 2: Set Agent Role and Prompts**
+
+Role setting and prompt writing are the core steps in defining agent behavior and output quality. This step aims to transform abstract instructions into specific tasks that the agent can understand and execute.
+
+(1) Role Setting
+
+We set the agent as a **senior and authoritative technology media editor**. This role gives the agent a clear professional positioning, enabling it to imitate the thinking mode of professional editors in subsequent content creation, performing efficient information screening, integration, and summarization.
+
+(2) Prompt Writing and Structuring
+
+Prompts are the instruction manual for the agent to execute tasks. We divide them into **System Prompt and User Prompt** to ensure instructions are clear, complete, and controllable.
+
+**System Prompt**
+
+The system prompt is used to define the agent's long-term behavioral guidelines and output format specifications.
+
+```
+# Role
+You are a senior and authoritative technology media editor, skilled at efficiently and precisely integrating and creating highly professional technology briefs, with deep analytical and integration capabilities especially in AI field technical developments, cutting-edge academic research results, and popular open-source projects.
+
+## Workflow
+### Daily Report Output Format
+1. The daily report should prominently display "AI Daily Report", "by@jasonhuang", and the current date at the beginning, for example: "AI Daily Report | September 24, 2025 | by@jasonhuang".
+2. <!!!important!!!> Add a unique Emoji symbol at the beginning of each title based on the different content of each AI technology news, each AI academic paper, and each AI open-source project.
+3. All output content must be highly relevant to AI, LLM, AIGC, large models, and other technical topics, firmly excluding any irrelevant information, advertisements, and marketing content.
+4. Must provide the original link for each item (including AI technology news, AI academic papers, AI open-source projects).
+5. Provide a brief and precise summary description for each news item or project output.
+```
+
+**User Prompt**
+
+The user prompt is used to define specific task instructions and data sources.
+
+```
+- **Information Extraction and Integration:** From input sources `{{articles}}`, `{{articles1}}`, `{{articles2}}`, and `{{articles3}}`, filter and extract article titles and corresponding links related to AI, large models, AIGC, LLM, and other topics, and organize them into the **"AI Technology News"** module.
+- **Academic Paper Summary:** From input source `{{arxiv}}`, based on fields `arxiv_title` and `arxiv_link`, summarize and organize the latest paper content to form the **"AI Academic Papers"** module.
+- **Open-Source Project Filtering:** From input source `{{GitHub}}`, filter out the **5 most prominent and influential AI open-source projects**. Extract the titles and corresponding links of these projects and organize them into the **"AI Open-Source Projects"** module.
+
+# Attention
+- Strictly follow the daily report output format defined in the system prompt.
+- The total output content should be: **10 AI technology news items, 5 AI academic papers, 5 AI open-source projects**.
+```
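+
+The `{{articles}}`, `{{arxiv}}`, and `{{GitHub}}` placeholders in the user prompt are filled with the outputs of the upstream plugin nodes before the prompt reaches the model. The following Python sketch shows the general idea of this kind of substitution. It is a simplified illustration under the assumption of plain string replacement, not Coze's actual implementation, and `render_prompt` is a hypothetical name.
+
+```python
+import re
+
+
+def render_prompt(template: str, variables: dict) -> str:
+    """Replace {{name}} placeholders with values (hypothetical helper, not Coze's API)."""
+
+    def substitute(match: re.Match) -> str:
+        name = match.group(1)
+        # Leave unknown placeholders untouched so missing inputs stay visible.
+        return str(variables.get(name, match.group(0)))
+
+    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)
+
+
+template = "Summarize the papers in {{arxiv}} and the repositories in {{GitHub}}."
+rendered = render_prompt(template, {"arxiv": "[paper list]", "GitHub": "[repo list]"})
+print(rendered)
+```
+
+Leaving unknown placeholders in place, rather than replacing them with an empty string, makes it easier to spot a node whose output was never connected to the prompt.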
+
+
+
+**Step 3: Testing, Debugging, and Multi-Channel Publishing**
+
+After completing the core logic construction of the agent, rigorous testing and debugging must be performed to ensure its output meets expectations.
+
+**Run Preview:** Run the agent in Coze platform's preview interface and observe the brief content it generates.
+
+```
+# AI Daily Report by@jasonhuang 2025-09-24
+
+## 🚀 AI Technology News
+
+🤖 **Zhiyuan Robot GO-1 Universal Embodied Foundation Model Fully Open-Sourced**
+Link: https://36kr.com/p/3479085489708163?f=rss
+Summary: Zhiyuan Robot announced the full open-sourcing of its GO-1 universal embodied foundation model, providing powerful AI foundational capabilities for the robotics field.
+
+🔬 **Microsoft Overcomes Data Center Chip Cooling Bottleneck: Microfluidics + AI Precision Cooling**
+Link: https://www.ithome.com/0/885/391.htm
+Summary: Microsoft achieves precise temperature control of data center chips through the combination of microfluidic technology and AI algorithms, improving energy efficiency.
+......
+
+## 📚 AI Academic Papers
+
+🧪 **Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation**
+Link: http://arxiv.org/pdf/2509.19296v1
+Summary: Proposes an innovative framework for 3D scene generation through video diffusion model self-distillation, without requiring multi-view training data.
+
+📊 **The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review**
+Link: http://arxiv.org/pdf/2408.13430v3
+Summary: Studies the effectiveness of author self-assessment in machine learning conference review processes and proposes methods to improve review mechanisms.
+......
+
+## 💻 AI Open-Source Projects
+
+🤖 **llmling-agent - Multi-Agent Workflow Framework**
+Link: https://github.com/phil65/llmling-agent
+Summary: Multi-agent interaction framework supporting YAML configuration and programming methods, integrating MCP and ACP protocol support.
+
+🚌 **College_EV_AI_Transportation - Campus AI Electric Transportation System**
+Link: https://github.com/LuisMc2005v/College_EV_AI_Transportation
+Summary: AI-driven campus electric transportation optimization system, achieving real-time tracking and efficient carpooling services.
+......
+```
+
+Carefully check the content accuracy, format completeness, and language style of the brief. If parts are found that do not meet expectations, return to the prompt or plugin configuration stage for detailed adjustments. For example, if the content is not concise enough, modify the summarization requirements in the prompt; if data acquisition is inaccurate, check plugin configuration parameters.
+
+**Multi-Channel Publishing:** Coze can publish agents to multiple mainstream application platforms (such as WeChat, Doubao, and Feishu) with one click, greatly expanding the application scenarios of agents, as shown in Figure 5.10.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-10.png" alt="Image description" width="90%"/>
+  <p>Figure 5.10 Diverse Publishing Channels of Coze Platform</p>
+</div>
+
+After the agent is published, we can see it in the Coze store, and it can also be integrated into AI applications to provide services to users, as shown in Figures 5.11 and 5.12. You can try it through the [Daily AI News Agent Experience Link](https://www.coze.cn/store/agent/7506052197071962153?bot_id=true&bid=6hkt3je8o2g16).
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-11.png" alt="Image description" width="90%"/>
+  <p>Figure 5.11 AI Agent - Daily AI News</p>
+</div>
+
+Furthermore, we can click this [experience link](https://www.coze.cn/store/project/7458678213078777893?from=store_search_suggestion&bid=6gu3cmr7k5g1i) to view Daily AI News in the AI application.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-12.png" alt="Image description" width="90%"/>
+  <p>Figure 5.12 Daily AI News in AI Application</p>
+</div>
+**Publishing Configuration:** If you want to publish your own agent, you also need to configure an appropriate name, avatar, and welcome message for the agent before publishing to provide a more friendly user experience, as shown in Figures 5.13 and 5.14.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-13.png" alt="Image description" width="90%"/>
+  <p>Figure 5.13 Configure Basic Information for Agent</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/coze-14.png" alt="Image description" width="90%"/>
+  <p>Figure 5.14 Configure Opening Remarks and Preset Questions for Agent</p>
+</div>
+
+
+### 5.2.3 Analysis of Coze's Advantages and Limitations
+
+**Advantages:**
+
+  * **Powerful Plugin Ecosystem:** The core advantage of the Coze platform lies in its rich plugin library, which enables agents to easily access external services and data sources, achieving high extensibility of functions.
+  * **Intuitive Visual Orchestration:** The platform provides a low-threshold visual workflow orchestration interface. Users can build complex workflows through "drag and drop" without deep programming knowledge, greatly reducing development difficulty.
+  * **Flexible Prompt Control:** Through precise role setting and prompt writing, users can perform fine-grained control over agent behavior and content generation, achieving highly customized output. It also supports prompt management and templates, greatly facilitating developers in agent development.
+  * **Convenient Multi-Platform Deployment:** Supports publishing the same agent to different application platforms, achieving seamless cross-platform integration and application. Moreover, Coze is continuously integrating new platforms into its ecosystem, with more and more mobile phone manufacturers and hardware manufacturers gradually supporting the publishing of Coze agents.
+
+**Limitations:**
+
+  * **Does Not Support MCP:** This is arguably the most serious limitation. Although Coze's plugin market is rich and attractive, the lack of MCP support may constrain its development. If MCP support were added, it would be another killer feature.
+  * **High Complexity of Some Plugin Configurations:** For plugins that require API Keys or other advanced parameters, users may need some technical background to complete correct configuration. Complex workflow orchestration is also not something that can be mastered with zero foundation; it requires some JavaScript or Python basics.
+  * **Cannot Export Orchestration JSON Files:** Coze previously had no export function. The paid version can now export, but the result is a zip file rather than a JSON file like those produced by Dify or n8n. In other words, a workflow exported from Coze can only be imported back into Coze.
+
+
+
+## 5.3 Platform Two: Dify
+### 5.3.1 Introduction to Dify and Its Ecosystem
+
+Dify is an open-source large language model (LLM) application development platform that integrates the concepts of Backend as a Service (BaaS) and LLMOps, providing full-process support from prototype design to production deployment, as shown in Figure 5.15. It adopts a layered modular architecture, divided into data layer, development layer, orchestration layer, and foundation layer, with each layer decoupled for easy expansion.
+
+Dify is model-neutral and broadly compatible: users can integrate both open-source and commercial models through simple configuration and call their inference capabilities through a unified interface. It has built-in support for hundreds of open-source and proprietary LLMs, covering models such as GPT, DeepSeek, and Llama, as well as any model compatible with the OpenAI API.
+
+Dify also supports both local deployment (one-click startup with the official Docker Compose setup) and cloud deployment. Users can self-host Dify in a local or private environment to keep data private, or use the official SaaS cloud service (detailed in the business model section below). This flexibility makes it suitable both for enterprise intranets with strict security requirements and for developers who want to avoid operational overhead.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-01.png" alt="Image description" width="90%"/>
+  <p>Figure 5.15 Dify Official Website</p>
+</div>
+
+Marketplace Plugin Ecosystem: Dify Marketplace provides one-stop plugin management and one-click deployment functionality, enabling developers to discover, extend, or submit plugins, bringing more possibilities to the community, as shown in Figure 5.16.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-02.png" alt="Image description" width="90%"/>
+  <p>Figure 5.16 Dify Marketplace Plugin Ecosystem</p>
+</div>
+
+The Marketplace includes the following plugin categories:
+
+- Models
+- Tools
+- Agent Strategies
+- Extensions
+- Bundles
+
+Currently, Dify Marketplace lists more than 8,600 plugins covering a wide range of functions and application scenarios. Officially recommended plugins include:
+- Google Search: langgenius/google
+- Azure OpenAI: langgenius/azure_openai
+- Notion: langgenius/notion
+- DuckDuckGo: langgenius/duckduckgo
+
+
+Dify provides strong support for plugin developers, including remote debugging that works with popular IDEs and requires minimal environment setup. Developers can connect to Dify's SaaS service while forwarding all plugin operations to their local environment for testing. This developer-friendly approach aims to empower plugin creators and accelerate innovation in the Dify ecosystem. It also helps explain why Dify has become one of the most successful agent platforms: any platform can integrate models, and prompts and orchestration can be copied, but the availability and richness of tool plugins directly determine whether an agent can achieve better results or unexpectedly powerful functions.
+
+### 5.3.2 Building a Super Agent Personal Assistant
+
+In the previous Coze case, we built a daily AI brief agent. Although its function is clear, its single brief generation capability is somewhat limited. This section will use Dify to build a fully functional super agent personal assistant, covering multiple scenarios such as daily Q&A, copywriting optimization, multimodal generation, and data analysis. Before starting, let's briefly understand Dify's main interface and functional modules.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-14.png" alt="Image description" width="90%"/>
+  <p>Figure 5.17 Dify Agent Building Homepage</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-18.png" alt="Image description" width="90%"/>
+  <p>Figure 5.18 Dify Official Template Library</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-15.png" alt="Image description" width="90%"/>
+  <p>Figure 5.19 Dify Knowledge Base</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-16.png" alt="Image description" width="90%"/>
+  <p>Figure 5.20 Dify Plugin Market</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-17.png" alt="Image description" width="90%"/>
+  <p>Figure 5.21 Dify Large Model Configuration</p>
+</div>
+
+**(1) Creating Plugins and Configuring MCP**
+
+Before building the agent, necessary plugin installation and MCP configuration must be completed first. As shown in Figure 5.22, these are the core plugins required for this case.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-19.png" alt="Image description" width="90%"/>
+  <p>Figure 5.22 Dify Plugin Installation Configuration</p>
+</div>
+
+The plugins marked with red boxes in the figure need to be searched for and installed from the Dify plugin market. Users can click to view details to understand the specific functions of each plugin.
+
+Next, configure MCP (Model Context Protocol). We won't cover the detailed principles of MCP here and will focus on how to use cloud-hosted MCP services. This case uses the MCP market of China's ModelScope community for demonstration, as shown in Figure 5.23.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-20.png" alt="Image description" width="90%"/>
+  <p>Figure 5.23 ModelScope Community MCP Market</p>
+</div>
+
+Open the ModelScope community MCP market and select the hosted type. Taking Amap MCP as an example, after entering its homepage, select SSE mode on the right side and click connection configuration to generate a dedicated MCP configuration JSON, as shown in Figure 5.24. MCP supports multiple communication modes, but using SSE mode communication in Dify is smoother and more stable, so SSE mode is recommended.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-21.png" alt="Image description" width="90%"/>
+  <p>Figure 5.24 Amap MCP Configuration Example</p>
+</div>
+
+**(2) Agent Design and Effect Display**
+
+This case will create a comprehensive personal assistant covering the following functional modules:
+
+- Daily life Q&A
+- Copywriting polishing and optimization
+- Multimodal content generation (images, videos)
+- Data query and visualization analysis
+- MCP tool integration (Amap, dietary recommendations, news information)
+
+The overall agent orchestration architecture is shown in Figure 5.25.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-12.png" alt="Image description" width="90%"/>
+  <p>Figure 5.25 Agent Orchestration</p>
+</div>
+
+For the multi-agent architecture, we use a question classifier for intelligent routing. In the classifier, define the core functions and task scope for each agent to ensure user requests can be accurately distributed to the corresponding processing modules.
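
The routing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, keyword lists, and handler functions below are hypothetical, and a simple keyword match stands in for the LLM call that Dify's question classifier node actually makes.

```python
from typing import Callable, Dict, Tuple

# Hypothetical categories and keywords. In Dify the question classifier
# delegates this decision to an LLM using the class descriptions you write.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "copywriting": ("polish", "rewrite", "copy"),
    "multimodal": ("image", "picture", "video"),
    "data": ("sql", "query", "chart", "table"),
    "mcp_tools": ("route", "map", "recipe", "news"),
}


def classify(question: str) -> str:
    """Return the first category whose keywords appear in the question."""
    text = question.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "daily"  # fallback: the general Q&A module


def make_handler(name: str) -> Callable[[str], str]:
    """Build a placeholder handler that just labels the request."""
    return lambda question: f"[{name}] {question}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    name: make_handler(name) for name in [*CATEGORY_KEYWORDS, "daily"]
}


def route(question: str) -> str:
    """Classify the question, then dispatch it to the matching module."""
    return HANDLERS[classify(question)](question)


print(route("Plan a route to the airport"))
```

The fallback branch mirrors the role of the daily assistant described next: any request that no specialized class claims still receives an answer.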
+
+**Daily Assistant Module**
+
+This is a basic dialogue module configured with a large language model and time tools, serving as a fallback general Q&A service.
+
+Prompt configuration:
+```
+# Role: Daily Question Consultation Expert
+
+## Profile
+- language: Chinese
+- description: Specializes in answering general questions in users' daily lives, providing practical, accurate, and easy-to-understand advice and answers
+- background: Possesses rich life experience and extensive knowledge reserves, skilled at simplifying complex problems
+- personality: Kind and friendly, patient and meticulous, pragmatic and reliable
+- expertise: Daily life, health and wellness, family management, interpersonal relationships, practical tips
+
+
+## Skills
+
+1. Problem Analysis Ability
+   - Quick Understanding: Rapidly grasp the core points of user questions
+   - Classification Recognition: Accurately judge the life domain to which the question belongs
+   - Demand Mining: Deeply understand users' potential needs
+   - Priority Sorting: Reasonably assess the importance and urgency of problems
+
+2. Answer Providing Ability
+   - Knowledge Integration: Comprehensively apply multi-domain knowledge to provide answers
+   - Solution Formulation: Provide specific and feasible solutions
+   - Step Decomposition: Break down complex problems into simple steps
+   - Alternative Solutions: Prepare multiple backup solutions for users to choose from
+
+3. Communication and Expression Ability
+   - Popular Language: Use simple and easy-to-understand everyday language
+   - Clear Logic: Organize answer content in a well-organized manner
+   - Illustrative Examples: Help understanding through specific cases
+   - Highlight Key Points: Emphasize key information and precautions
+
+## Rules
+
+1. Answer Principles:
+   - Practicality First: Ensure the advice provided is actionable
+   - Accuracy Guarantee: Give answers based on reliable information and common sense
+   - Neutral and Objective: Avoid personal bias and subjective assumptions
+   - Moderate Advice: Provide appropriate depth of answers based on problem complexity
+
+2. Code of Conduct:
+   - Timely Response: Quickly respond to users' questions
+   - Patient and Meticulous: Maintain patience with repetitive or simple questions
+   - Active Guidance: Encourage users to provide more background information
+   - Continuous Improvement: Optimize answer quality based on feedback
+
+
+## Workflows
+
+- Goal: Provide users with practical and reliable daily problem solutions
+- Step 1: Carefully read and understand the daily questions raised by users
+- Step 2: Analyze the problem type and users' potential needs
+- Step 3: Provide specific and feasible suggestions based on common sense and experience
+- Step 4: Organize answer content in easy-to-understand language
+- Step 5: Check the practicality and safety of the answer
+
+
+## Initialization
+As a daily question consultation expert, you must abide by the above Rules and execute tasks according to Workflows.
+```
+
+The effect demonstration is shown in Figure 5.26:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-03.png" alt="Image description" width="90%"/>
+  <p>Figure 5.26 Daily Assistant</p>
+</div>
+
+**Copywriting Optimization Module**
+
+According to OpenAI's data report, over 60% of users use ChatGPT for text optimization tasks, including polishing, revising, expanding, and condensing text. Copywriting optimization is therefore a high-frequency scenario, and we make it the second core functional module.
+
+Prompt configuration:
+```
+# I. Role Setting (Role)
+You are a professional copywriting optimization expert with rich experience in marketing copywriting and optimization, skilled at improving the attractiveness, conversion rate, and readability of copy. Your perspective is from the angle of the target audience and marketing goals, with professional boundaries limited to the copywriting optimization field, not involving technical implementation or product development.
+
+# II. Background
+The user has provided a piece of original copy that needs your optimization to improve its overall effectiveness. Background information includes: the copy may be used for marketing, brand promotion, or information communication scenarios, but the specific use is not detailed. The known condition is that the user hopes the copy is more attractive, clear, or persuasive, but has not provided the original copy content, so you need to work based on general optimization principles.
+
+# III. Task Objectives (Task)
+- Analyze and optimize the structure, language, and style of the copy to make it more in line with the preferences of the target audience.
+- Improve the attractiveness, readability, and conversion potential of the copy, ensuring clear information delivery.
+- Make adjustments according to common optimization principles (such as conciseness, emotional resonance, call to action, etc.), without content rewriting unless necessary.
+- While maintaining core information, appropriately expand and enrich copy content to provide a more comprehensive optimized version.
+
+# IV. Limitation Prompts (Limit)
+- Avoid changing the core information or intent of the original copy unless explicitly requested by the user.
+- Do not add fictional or irrelevant content, ensuring optimization is based on logic and best practices.
+- Avoid using overly technical or professional terminology unless the target audience is professionals.
+- Do not involve optimization of images, layouts, or other non-text elements.
+
+# V. Output Format Requirements (Example)
+The output should be optimized copy text with clear structure, fluent language, and substantial content. For example:
+- If the original copy is "Our product is very good, come and buy it"
+The optimized version can be: "In this era full of choices, what truly touches people's hearts is never exaggerated propaganda, but good products that can withstand the test of time and users. Our product is exactly that. It not only pays attention to details and quality in design but also continuously polishes and innovates in function, just to bring a better user experience to every user. Whether it's the texture of the appearance or the stability of performance, we always adhere to high standards and strict requirements, striving to make every customer who chooses us feel the surprise of value for money.
+We deeply understand that purchasing a product is not just a simple consumption but a choice of lifestyle. Therefore, from material selection, craftsmanship to after-sales service, we have poured full sincerity and professionalism into every link, carefully guarding your every experience. Whether you pursue practicality, value quality, or want unique personalization, our products can provide you with ideal solutions.
+Now, let us prove everything with action. A truly good product does not need too much embellishment; it itself is the best spokesperson. Act now, choose us, let quality change life, and have a different experience from now on!"
+- The output should directly present optimized content without additional explanations or annotations unless requested by the user. Please ensure that the optimized copy content is richer and more complete, and the optimized copy text must exceed 500 words.
+```
+
+The effect demonstration is shown in Figure 5.27:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-04.png" alt="Image description" width="90%"/>
+  <p>Figure 5.27 Copywriting Assistant</p>
+</div>
+
+**Multimodal Generation Module**
+
+Image and video generation is another high-frequency application scenario. With the evolution of image models such as Doubao image generation and Google Imagen, and breakthroughs in video generation technologies such as Kling, Google Veo 3, and OpenAI Sora 2, the quality of multimodal content generation has reached a practical level.
+
+This case uses the Doubao plugin to implement image and video generation. Configuration steps are as follows:
+
+1. Add Doubao image/video generation plugin in the workflow
+2. Configure parameters (such as image ratio 1:1, model selection doubao seedream)
+3. Output the generated file
+
+Image generation configuration and effects are shown in Figures 5.28 and 5.29.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-13.png" alt="Image description" width="90%"/>
+  <p>Figure 5.28 Image Generation Settings</p>
+</div>
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-05.png" alt="Image description" width="90%"/>
+  <p>Figure 5.29 Image Generation Assistant</p>
+</div>
+
+The video generation effect is shown in Figure 5.30.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-06.png" alt="Image description" width="90%"/>
+  <p>Figure 5.30 Video Assistant</p>
+</div>
+
+**Data Query and Analysis Module**
+
+Data processing is one of the important capabilities of agents. This module demonstrates how to connect to a database in Dify to implement data query and visualization analysis.
+
+First, install the data query tool plugin; this case uses the `rookie-text2data` plugin. The key to data query is to provide the large model with clear table structure and field information so it can generate accurate SQL query statements. Common practices include:
+
+- Directly providing the DDL statement of the data table
+- Providing a description of the correspondence between table names and field names
+
+Configure database connection information (IP address, database name, port, account, password, etc.), as shown in Figure 5.31. Query results need to be organized through a large model node and converted into easy-to-understand natural language output.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-22.png" alt="Image description" width="90%"/>
+  <p>Figure 5.31 Database Configuration</p>
+</div>
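
To make the "provide the DDL" practice concrete, here is a minimal sketch that reads the table definitions from a database and places them in the prompt. It uses Python's built-in `sqlite3` purely for illustration; the `scores` table, the prompt wording, and the hard-coded `generated_sql` (a stand-in for what the model would return) are all assumptions, and this is not how the `rookie-text2data` plugin is implemented.

```python
import sqlite3


def fetch_ddl(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so the model can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(f"{row[0]};" for row in rows)


def build_prompt(ddl: str, question: str) -> str:
    """Combine schema and question into a text-to-SQL prompt."""
    return (
        "You write SQL for the schema below. Return one SELECT statement only.\n\n"
        f"Schema:\n{ddl}\n\n"
        f"Question: {question}\nSQL:"
    )


# Hypothetical demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, subject TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("Ann", "math", 90), ("Bob", "math", 70)],
)

prompt = build_prompt(fetch_ddl(conn), "What is the average math score?")

# Stand-in for the model's answer; a real workflow would send `prompt` to an LLM.
generated_sql = "SELECT AVG(score) FROM scores WHERE subject = 'math'"
average = conn.execute(generated_sql).fetchone()[0]
print(prompt)
print(average)
```

Because the schema text comes straight from the database, the prompt stays in sync with the tables even when columns change.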
+
+Prompt settings:
+
+```
+# I. Role Setting (Role)
+You are a professional data query specialist, skilled at data organization, with clear logical thinking and concise expression ability.
+
+# II. Background
+The user has provided raw data queried from the database. This data may have issues such as inconsistent formats, missing fields, and duplicate records, and needs professional organization before effective display.
+
+# III. Task Objectives (Task)
+1. Summarize and organize raw data
+2. Classify and sort data according to correct logic
+3. Data display highlights key information and data insights
+4. Provide easy-to-understand data display
+
+# IV. Limitation Prompts (Limit)
+1. Must not arbitrarily delete important data
+2. Avoid using overly complex or professional statistical terminology
+3. Must not tamper with the true values of raw data
+4. Avoid displaying too much redundant information, keep it concise and clear
+5. Must not leak sensitive data or personal privacy information
+
+# V. Output Format Requirements (Example)
+ Data Overview: Briefly explain the data content
+```
+
+The effect display is shown in Figure 5.32:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-07.png" alt="Image description" width="90%"/>
+  <p>Figure 5.32 Data Query Assistant</p>
+</div>
+
+Prompt settings for the data analysis assistant:
+
+```
+# I. Role Setting (Role)
+You are a professional data analyst with data organization, cleaning, and visualization capabilities, able to extract key information from raw data and transform it into intuitive visual displays.
+
+# II. Background
+The user has queried a batch of raw data from the database. This data may contain multiple fields, missing values, or inconsistent formats, and needs to be organized before generating visualization charts.
+
+# III. Task Objectives (Task)
+# Workflow
+1. Data Analysis
+Analyze, organize, and summarize data according to reasonable rules
+2. Analysis & Visualization
+Generate at least 1 chart (choose one or more from bar / line / pie chart)
+Can call tools: "generate_pie_chart" | "generate_column_chart" | "generate_line_chart"
+
+# IV. Limitation Prompts (Limit)
+1. Avoid using overly complex chart types, ensure visualization results are easy to understand
+2. Do not ignore data quality issues, must perform necessary data cleaning
+3. Avoid using too many colors or elements in visualization, keep it concise and clear
+4. Do not omit labeling and explanation of key data
+5. Must perform summary and chart generation, regardless of data volume
+
+# V. Output Format Requirements (Example)
+Please output in the following format:
+1. Data overview summary (do not output field names, do not list points, just a short paragraph)
+2. Display generated charts
+```
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-08.png" alt="Image description" width="90%"/>
+  <p>Figure 5.33 Data Analysis Assistant</p>
+</div>
+
+The only difference between the data analysis assistant and the data query assistant is the addition of data visualization tools: the BI chart generation plugins "generate_pie_chart", "generate_column_chart", and "generate_line_chart". If you installed these plugins as instructed earlier, you can add them directly and describe them in the prompt, as in the prompt above.
+
+**MCP Tool Integration**
+
+The last module integrates the MCP tools. The MCP configuration was completed earlier; now we connect it to the agent. The configuration steps are as follows:
+
+1. Select an agent strategy that supports MCP calls
+2. Select ReAct mode
+3. Configure the MCP service (remember to remove the `mcp-server` prefix and select SSE mode)
+4. Fill in the corresponding prompts
+
+The configuration interface is shown in Figure 5.34.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-23.png" alt="Image description" width="90%"/>
+  <p>Figure 5.34 Agent MCP Configuration</p>
+</div>
+
+The effects of Amap assistant, dietary assistant, and news assistant are shown in Figures 5.35, 5.36, and 5.37 respectively.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-09.png" alt="Image description" width="90%"/>
+  <p>Figure 5.35 Amap Assistant</p>
+</div>
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-10.png" alt="Image description" width="90%"/>
+  <p>Figure 5.36 Dietary Assistant</p>
+</div>
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-11.png" alt="Image description" width="90%"/>
+  <p>Figure 5.37 News Assistant</p>
+</div>
+
+At this point, we have completed a fully functional super agent personal assistant. It covers many aspects of daily life: when you need new clothes, Doubao can generate designs; before going out, the Amap assistant can plan routes; when you don't know what to eat, the dietary assistant can make recommendations; and when you want to review your study progress, the data assistant can run the analysis. This agent can handle a variety of work and life tasks, and we look forward to seeing everyone build more creative personal assistants.
+
+### 5.3.3 Analysis of Dify's Advantages and Limitations
+
+As a leading AI application development platform, Dify demonstrates significant advantages in multiple aspects:
+
+1. Core Advantages
+
+- Full-Stack Development Experience: Dify integrates RAG pipelines, AI workflows, model management, and other functions into one platform, providing a one-stop development experience
+- Balance Between Low-Code and High Extensibility: Dify strikes a good balance between the convenience of low-code development and the flexibility of professional development
+- Enterprise-Level Security and Compliance: Dify provides AES-256 encryption, RBAC permission control, and audit logs, meeting strict security and compliance requirements
+- Rich Tool Integration Capability: Dify supports 9000+ tools and API extensions, providing extensive functional extensibility
+- Active Open-Source Community: Dify has an active open-source community that provides rich learning resources and support
+
+2. Main Limitations
+
+- Steep Learning Curve: Users with no technical background still face a noticeable learning curve
+- Performance Bottlenecks: Dify may face performance challenges in high-concurrency scenarios and require tuning. Its core server-side components are implemented in Python, which generally performs worse than languages such as C++, Golang, and Rust
+- Insufficient Multimodal Support: Dify currently focuses mainly on text processing, with limited support for images, videos, HTML, etc.
+- High Enterprise Edition Cost: Dify's enterprise edition pricing is relatively high and may exceed the budget of small teams
+- API Compatibility Issues: Dify's API format is not compatible with OpenAI's API format, which may limit integration with certain third-party systems
+
+
+## 5.4 Platform Three: n8n
+
+As introduced earlier, n8n is at its core a general workflow automation platform, not a pure LLM application builder. Understanding this is key to mastering n8n. When we use n8n to build intelligent applications, we are actually designing a larger automation process in which the large language model is just one (or several) powerful "processing node(s)".
+
+### 5.4.1 n8n's Nodes and Workflows
+
+n8n is built on two basic concepts: **Node** and **Workflow**.
+
+- **Node**: A node is the smallest unit that performs specific operations in a workflow. You can think of it as a "building block" with specific functions. n8n provides hundreds of preset nodes covering various common operations from sending emails, reading and writing databases, calling APIs to processing files. Each node has inputs and outputs and provides a graphical configuration interface. Nodes can be roughly divided into two categories:
+  - **Trigger Node**: The starting point of a workflow, responsible for initiating the process. Examples include "when a new Gmail email is received," "triggered once every hour," or "when a Webhook request is received." A workflow needs at least one trigger node to start automatically, and each execution is started by exactly one trigger.
+  - **Regular Node**: Responsible for processing specific data and logic. For example, "read Google Sheets spreadsheet," "call OpenAI model," or "insert a record in the database."
+- **Workflow**: A workflow is an automation flowchart composed of multiple connected nodes. It defines the complete path of how data starts from the trigger node, passes step by step between different nodes, is processed, and finally completes the preset task. Data is passed between nodes in structured JSON format, which allows us to precisely control the input and output of each link.
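
The node and workflow model can be pictured as a chain of functions that each take and return a list of JSON items. The sketch below is a conceptual illustration only, not n8n's engine: the node functions and sample emails are hypothetical, while the `{"json": {...}}` wrapper follows the item structure n8n passes between nodes.

```python
from typing import Callable, Dict, List

Item = Dict[str, dict]                      # one n8n-style item: {"json": {...}}
Node = Callable[[List[Item]], List[Item]]   # a regular node maps items to items


def webhook_trigger() -> List[Item]:
    """Trigger node: produces the initial items that start the workflow."""
    return [
        {"json": {"sender": "alice@example.com", "subject": "Invoice overdue"}},
        {"json": {"sender": "bob@example.com", "subject": "Lunch?"}},
    ]


def tag_priority(items: List[Item]) -> List[Item]:
    """Regular node: add an `urgent` flag to every item."""
    return [
        {"json": {**item["json"], "urgent": "overdue" in item["json"]["subject"].lower()}}
        for item in items
    ]


def keep_urgent(items: List[Item]) -> List[Item]:
    """Regular node: filter out items that are not urgent."""
    return [item for item in items if item["json"]["urgent"]]


def run_workflow(trigger: Callable[[], List[Item]], nodes: List[Node]) -> List[Item]:
    """Pass the trigger's output through each connected node in order."""
    items = trigger()
    for node in nodes:
        items = node(items)
    return items


result = run_workflow(webhook_trigger, [tag_priority, keep_urgent])
print(result)
```

Because every node sees the same item shape, nodes can be reordered or swapped without changing their neighbors, which is what makes the visual editor composable.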
+
+
+The real power of n8n lies in its strong "connection" capability. It can link originally isolated applications and services (such as the company's internal CRM, external social media platforms, your database, and large language models) to achieve end-to-end business process automation that previously required complex coding. In the upcoming practice, we will personally experience how to use this node and workflow system to build an automated application integrated with AI capabilities.
+
+### 5.4.2 Building an Intelligent Email Assistant
+
+The previous section introduced the basic concepts of n8n. Environment configuration and basic usage are documented in the project's `Additional-Chapter` folder, so we won't repeat them here. This case demonstrates the core difference between modern AI Agents and traditional automation workflows. Traditional processes are linear, whereas the Agent we are about to build receives user emails, "thinks" through a core **AI Agent node**, autonomously understands user intent, chooses among multiple available "tools," and finally generates and sends a relevant reply automatically.
+
+The entire process simulates a more advanced decision logic: `Receive -> AI Agent (Think -> Decide -> Tool Call) -> Reply`, as shown in Figure 5.38.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-01.png" alt="Image description" width="90%"/>
+  <p>Figure 5.38 Integrated Intelligent Email Agent Architecture Diagram</p>
+</div>
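
The `Receive -> AI Agent (Think -> Decide -> Tool Call) -> Reply` logic can be sketched as a small loop. This is a simplified illustration under stated assumptions: `decide` is a rule-based stand-in for the LLM, and `lookup_schedule` is a hypothetical tool; n8n's `AI Agent` node handles all of this internally.

```python
from typing import Callable, Dict, List, Tuple


def lookup_schedule(_: str) -> str:
    """Hypothetical tool: return a fact from the private knowledge base."""
    return "Working hours: Monday to Friday, 9 AM to 5 PM (AEST)."


TOOLS: Dict[str, Callable[[str], str]] = {"lookup_schedule": lookup_schedule}


def decide(email: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: pick the next action as (action, argument)."""
    if not observations and "when" in email.lower():
        return ("lookup_schedule", email)       # think: more context is needed
    context = " ".join(observations) or "No extra context was needed."
    return ("reply", f"Thanks for your email. {context}")


def run_agent(email: str, max_steps: int = 3) -> str:
    """Loop: decide, call a tool if requested, and stop once a reply is ready."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = decide(email, observations)
        if action == "reply":
            return argument
        observations.append(TOOLS[action](argument))  # tool call result
    return "Thanks for your email. I will follow up shortly."


print(run_agent("When are you available?"))
```

Unlike a linear workflow, the path through this loop depends on each decision: some emails trigger a tool call first, while others are answered directly.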
+
+Unlike the traditional method of splitting tools into multiple sub-workflows, n8n's `AI Agent` node allows us to integrate components such as large language models (LLM), memory, and tools in a unified interface, greatly simplifying the construction process.
+
+The entire construction process is divided into two core steps:
+
+1. **Prepare Agent's "Memory"**: Create an independent process to load a private knowledge base for the Agent.
+2. **Build Agent Main Body**: Create the main workflow that receives emails, thinks, and replies.
+
+### 5.4.3 Building Agent's Private Knowledge Base
+
+To enable the Agent to answer questions about specific domains (such as your personal information or project documentation), we first need to prepare an "external brain" for it: a vector knowledge base.
+
+In n8n, we can use the `Simple Vector Store` node to quickly build a knowledge base in memory. This preparation workflow only needs to be run again when the knowledge changes.
+
+**(1) Define Knowledge Source**
+
+First, we use the `Code` node to store our raw knowledge text. This is a simple and quick way; in actual projects, data can also come from files, databases, etc.
+
+- **Node**: `Code`
+- **Content**: Write your knowledge in JSON format.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-02.png" alt="Screenshot of knowledge base JSON text filled in Code node" width="90%"/>
+  <p>Figure 5.39 Defining Knowledge Source in Code Node</p>
+</div>
+
+```javascript
+// Each object becomes one n8n item; `content` holds the text to embed later.
+return [
+  {
+    "doc_id": "work-schedule-001",
+    "content": "My working hours are Monday to Friday, 9 AM to 5 PM. The timezone is Australian Eastern Standard Time (AEST)."
+  },
+  {
+    "doc_id": "off-hours-policy-001",
+    "content": "During non-working hours (including weekends and public holidays), I cannot reply to emails immediately."
+  },
+  {
+    "doc_id": "auto-reply-instruction-001",
+    "content": "If an email is received during non-working hours, the AI assistant should inform the sender that the email has been received and I will process and reply as soon as possible between 9 AM and 5 PM on the next working day."
+  }
+];
+```
+
+**(2) Text Vectorization (Embeddings)**
+
+Computers cannot compare text by meaning directly, so the text must first be converted into numeric vectors. We use the `Embeddings` node to complete this "translation" work.
+
+- **Node**: `Embeddings Google Gemini`, with the model set to `gemini-embedding-exp-03-07`. Here we use the Google API for demonstration; if you don't know how to obtain a Google API key, refer to Section 5.5.3.
+- **Configuration**: Connect it after the `Code` node, and it will automatically convert the text passed from upstream into vector data.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-03.png" alt="" width="90%"/>
+  <p>Figure 5.40 Vectorizing Data in Code</p>
+</div>
+
+**(3) Store in Vector Storage**
+
+Finally, we store the vectorized knowledge in an in-memory database, as shown in Figure 5.41.
+
+- **Node**: `Simple Vector Store`
+- **Configuration**:
+  - **Operation Mode**: `Insert Documents` (write mode).
+  - **Memory Key**: Give this knowledge base a unique name, for example `my-dailytime`. This Key is equivalent to the "table name" of the database, and the Agent will use it to find information later.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-04.png" alt="" width="90%"/>
+  <p>Figure 5.41 Storing Data from Code into Vector Storage</p>
+</div>
+
+After completing the configuration, **manually execute this process once**. After success, your private knowledge is loaded into n8n's memory, as shown in Figure 5.42.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-05.png" alt="" width="90%"/>
+  <p>Figure 5.42 Complete Knowledge Base Loading Workflow</p>
+</div>
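
To make the role of the Memory Key and the embeddings more concrete, here is a minimal sketch of an in-memory vector store. It is not n8n's implementation: the `InMemoryVectorStore` class, the tiny vocabulary, and the word-count `embed` function are all assumptions chosen so the example runs without any API. A real setup would call an embedding model instead.

```javascript
// Toy embedding (assumption): counts occurrences of a few vocabulary words.
const VOCAB = ["working", "hours", "weekends", "reply", "email"];

function embed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity between two vectors of equal length.
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class InMemoryVectorStore {
  constructor() {
    this.tables = new Map(); // memory key -> stored rows
  }

  // "Insert Documents": embed each document and store it under the key.
  insert(memoryKey, docs) {
    const rows = this.tables.get(memoryKey) || [];
    for (const doc of docs) rows.push({ ...doc, vector: embed(doc.content) });
    this.tables.set(memoryKey, rows);
  }

  // "Retrieve Documents": rank stored rows by similarity to the question.
  query(memoryKey, question, topK = 1) {
    const rows = this.tables.get(memoryKey) || [];
    const q = embed(question);
    return rows
      .map((row) => ({ doc_id: row.doc_id, score: cosine(q, row.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const store = new InMemoryVectorStore();
store.insert("my-dailytime", [
  { doc_id: "work-schedule-001", content: "My working hours are Monday to Friday, 9 AM to 5 PM." },
  { doc_id: "off-hours-policy-001", content: "On weekends I cannot reply to email immediately." },
]);

const hits = store.query("my-dailytime", "What are your working hours?");
const misses = store.query("some-other-key", "What are your working hours?");
console.log(hits, misses);
```

Querying with a different key returns nothing, which is why the read side must later use the same key, and the same embedding function, as the write side.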
+
+### 5.4.4 Creating Agent Main Workflow
+
+With the knowledge base ready, we now start building the Agent's main workflow. It is responsible for receiving emails, reasoning and making decisions, calling the tools we just created at the right time, and finally sending the email reply.
+
+**(1) Configure Gmail Trigger**
+
+Create a new workflow named `Agent: Customer Support`. Use the `Gmail` node as a trigger, set its **Event** to `Message Received`, and configure your email account. This way, whenever a new email enters the inbox, the workflow will be automatically triggered, as shown in Figure 5.43.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-06.png" alt="" width="90%"/>
+  <p>Figure 5.43 Creating Gmail Node</p>
+</div>
+
+For the credential setup, refer to the [n8n official documentation](https://docs.n8n.io/integrations/builtin/credentials/google/oauth-single-service/?utm_source=n8n_app&utm_medium=credential_settings&utm_campaign=create_new_credentials_modal#enable-apis). The Gmail API is enabled [here](https://console.cloud.google.com/apis/library/gmail.googleapis.com?project=apt-entropy-471905-b9). You need to create credentials, select the Web application type, and copy the resulting client ID and client secret into n8n. You also need to add the OAuth Redirect URL given by n8n to the authorized redirect URIs, and add your own email address under Add users in [Audience](https://console.cloud.google.com/auth/audience?project=apt-entropy-471905-b9). The final configured page is shown in Figure 5.44.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-07.png" alt="" width="90%"/>
+  <p>Figure 5.44 Gmail Account Successfully Loaded</p>
+</div>
+
+Now we can click `Fetch Test Event` to get emails, as shown in Figure 5.45!
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-08.png" alt="" width="90%"/>
+  <p>Figure 5.45 Getting Real-time Emails</p>
+</div>
+
+**(2) Configure AI Agent Node**
+
+This is the brain of the entire workflow. Drag an `AI Agent` node from the node menu and configure it as follows:
+
+- **Chat Model**: Connect your chosen large language model, such as `Google Gemini Chat Model`. This is the Agent's "thinking core."
+- **Memory**: Connect a `Simple Memory` node. This allows the Agent to remember previous conversation history when processing multiple back-and-forth emails under the same email thread.
+- **Tools**: We can connect multiple tools here. In our case, we connect two tools:
+  1. `SerpAPI`: This is the API we used in the Chapter 4 case, giving the Agent the ability to search for public information online.
+  2. `Simple Vector Store`: Gives the Agent the ability to query the private knowledge base we created in the first part.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-09.png" alt="" width="90%"/>
+  <p>Figure 5.46 AI Agent Node Settings</p>
+</div>
+
+This is the first step of the Agent's "thinking." Add a `Gemini` node (or another LLM node) and set the mode to `Chat`. Our goal is to have it analyze the email content and judge the user's intent. Prompt design is crucial; a clear instruction helps the LLM complete the task more accurately. We pass the email body and subject (`{{ $json.snippet }}{{ $json.Subject }}`) as variables into the Prompt. If you don't have an API key, you can go to [Google AI Studio](https://aistudio.google.com/prompts/new_chat) and click Get API key to create one.
+
+For the AI Agent node, we mainly need to fill in the `User Message` and `System Message` sections, as shown in Figure 5.47.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-10.png" alt="" width="90%"/>
+  <p>Figure 5.47 AI Agent Node Details</p>
+</div>
+
+Here is the Prompt used in our case:
+
+```text
+# Prompt (User Message)
+# Context Information
+- Current Time: {{ new Date().toLocaleString('en-AU', { timeZone: 'Australia/Sydney', hour12: false }) }} (Sydney, Australia time)
+- Sender: {{ $json.From }}
+- Subject: {{ $json.Subject }}
+- Email Body: {{ $json.snippet }}
+
+# System Message
+# Role and Goal
+You are a 24/7 on-call, professional and efficient AI email assistant. Your task is: to do your best to answer all questions in emails using public information at the first opportunity, and add contextual status reminders at the beginning of replies based on my work schedule.
+
+# Context Information
+- Current Time: {{ new Date().toLocaleString('en-AU', { timeZone: 'Australia/Sydney', hour12: false }) }} (Sydney, Australia time)
+- Email information is in the input data.
+
+# Available Tools
+- Simple Vector Store2: Used to query my exact working hours (e.g., Monday to Friday, 9 AM to 5 PM).
+- SerpAPI: **[Primary Information Source]** Prioritize using this tool to search the internet to answer specific questions in emails.
+
+# Execution Steps
+1.  **Analyze the Question**: First, carefully read the email content and extract the sender's core question.
+
+2.  **Parallel Information Gathering**: Execute the following two operations simultaneously to collect information:
+    a. Use the `SerpAPI` tool to search online for answers to the sender's questions.
+    b. Use the `Simple Vector Store2` tool to get my set exact working hours.
+
+3.  **Draft Core Reply**: Based on the information collected by `SerpAPI`, clearly and directly answer the sender's question. This part will serve as the main body of the email reply.
+
+4.  **Add Status Prefix and Integrate**:
+    a. Compare "Current Time" with the working hours I obtained from the tool.
+    b. **If currently "Non-working Hours"**: Create a status reminder prefix. This prefix **must include** the specific working hours obtained from `Simple Vector Store2`.
+        * **Prefix Example**: "Hello, thank you for your email. You have contacted me during my non-working hours (my working hours are: [insert queried working hours here]). I will personally review this email on the next working day. In the meantime, here is a preliminary reply found for you based on public information:**<br><br>---<br><br>**"
+    c. **If currently "Working Hours"**: Just use a simple greeting.
+        * **Prefix Example**: "Hello, regarding your question, the reply is as follows:**<br><br>---<br><br>**"
+    d. Concatenate the generated prefix and the core reply you drafted (result of step 3) to form the final email body.
+
+5.  **Formatted Output**: You must output the finally generated email content in a strict JSON format. The format is as follows, do not add any additional explanations or text:
+    {
+      "shouldReply": true,
+      "subject": "Re: [Original Email Subject]",
+      "body": "[Here is the concatenated, complete email reply body, **all line breaks must use HTML <br> tags**]"
+    }
+
+# Rules and Restrictions
+- **Always Try to Answer First**: At any time, your primary task is to use `SerpAPI` to provide valuable replies to users.
+- **Must Declare Status**: If replying during non-working hours, you must clearly state this at the beginning of the email and attach my exact working hours.
+- **Information Sources Must Be Accurate**: Working hours must strictly follow the results of `Simple Vector Store2`; question answers mainly come from `SerpAPI`, do not fabricate information.
+- **Output Format**: **In the final output JSON, all line breaks in the `body` field must use `<br>` tags, not `\n`.**
+```
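
Step 4 of the prompt asks the model to compare the current time with the working hours. As a point of comparison, the same check can be written deterministically. The sketch below assumes a fixed Monday to Friday, 9 AM to 5 PM schedule in the `Australia/Sydney` time zone and ignores public holidays; the function names are hypothetical and this is not part of the original workflow.

```javascript
// Returns the weekday name and hour of a Date as seen in an IANA time zone.
function localParts(date, timeZone) {
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "numeric",
    hourCycle: "h23", // 0-23, so midnight is reported as 0
  });
  const parts = Object.fromEntries(
    fmt.formatToParts(date).map((p) => [p.type, p.value])
  );
  return { weekday: parts.weekday, hour: Number(parts.hour) };
}

// Assumed schedule: Monday to Friday, 09:00 (inclusive) to 17:00 (exclusive).
function isWorkingHours(date, timeZone = "Australia/Sydney") {
  const { weekday, hour } = localParts(date, timeZone);
  const isWeekday = !["Sat", "Sun"].includes(weekday);
  return isWeekday && hour >= 9 && hour < 17;
}

console.log(isWorkingHours(new Date()));
```

A check like this could run in a `Code` node ahead of the Agent and pass a plain true/false flag into the prompt, which removes one source of model error at the cost of hard-coding the schedule.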
+
+**(3) Configure Agent's Tools**
+
+For the `Simple Vector Store` tool, we need to perform key configurations to ensure it can correctly "read" the knowledge we stored earlier:
+
+- **Operation Mode**: `Retrieve Documents (As Tool for AI Agent)` (read mode as a tool).
+- **Memory Key**: Must be the **exact same** Key used when loading the knowledge base in Section 5.4.3, i.e., `my-dailytime`.
+- **Embeddings**: Must use the **exact same** `Embeddings Google Gemini` model as in the first part.
+
+Only when the `Memory Key` and `Embeddings` model are completely consistent can the Agent use the correct "key" and "language" to access the knowledge base, as shown in Figure 5.48.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-11.png" alt="" width="90%"/>
+  <p>Figure 5.48 Simple Vector Store Tool Configuration</p>
+</div>
+
+The Description parameter tells the AI Agent what the tool does and when to call it. Here is the description used in our case:
+
+```text
+This is the Simple Vector Store2 tool, used to query my personal information, especially my working hours and email reply policy. When you need to determine whether it is currently working hours, or need to inform the other party when I will reply to emails, you must use this tool.
+```
+
+For Memory, the only thing to note is the session key: we use the ID of each email thread as a unique identifier, so that every thread keeps its own conversation history. The Key is set to `{{ $('Gmail').item.json.threadId }}`.
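
The effect of keying memory by thread can be sketched as follows. This is an illustration of the idea only, not n8n's `Simple Memory` implementation; the `ThreadMemory` class and its window size are assumptions.

```javascript
// Hypothetical per-thread memory with a sliding window of recent messages.
class ThreadMemory {
  constructor(windowSize = 5) {
    this.windowSize = windowSize;
    this.sessions = new Map(); // threadId -> array of { role, text }
  }

  append(threadId, role, text) {
    const history = this.sessions.get(threadId) || [];
    history.push({ role, text });
    // Keep only the most recent `windowSize` messages for this thread.
    this.sessions.set(threadId, history.slice(-this.windowSize));
  }

  history(threadId) {
    return this.sessions.get(threadId) || [];
  }
}

const memory = new ThreadMemory(2);
memory.append("thread-A", "user", "When are you available?");
memory.append("thread-A", "assistant", "Monday to Friday, 9 AM to 5 PM.");
memory.append("thread-A", "user", "Can we meet on Tuesday?");
memory.append("thread-B", "user", "Unrelated question.");

console.log(memory.history("thread-A"));
console.log(memory.history("thread-B"));
```

Because each thread has its own entry, a follow-up in one conversation never sees messages from another, and the window keeps the context passed to the model bounded.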
+
+
+
+**(4) Send Final Reply**
+
+The last step is execution. Connect the output of the `AI Agent` node to a `Gmail` node, set **Operation** to `Send`. Use n8n expressions to associate the recipient, subject, and body with the corresponding fields in the JSON data output by `AI Agent` to achieve automatic email reply, as shown in Figure 5.49.
+
+- **To**: `{{ $('Gmail').item.json.From }}` (or sender field in other triggers)
+- **Subject**: `Re:  {{ $('Gmail').item.json.Subject }}`
+- **Message**: `{{ $json.output }}`
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-12.png" alt="" width="90%"/>
+  <p>Figure 5.49 Final Reply Tool Diagram</p>
+</div>
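
The prompt asks the model for strict JSON with `shouldReply`, `subject`, and `body` fields, but a model may still wrap its answer in a Markdown code fence or return plain text. A defensive parsing step, for example in a `Code` node placed before the send step, could look like the sketch below. The function names and the returned object shape are assumptions for illustration.

```javascript
const FENCE = "`".repeat(3); // Markdown code fence marker

// Removes a surrounding Markdown code fence if the model added one.
function stripFence(text) {
  let t = text.trim();
  if (t.startsWith(FENCE)) {
    t = t.slice(t.indexOf("\n") + 1); // drop the opening fence line
    if (t.trimEnd().endsWith(FENCE)) t = t.trimEnd().slice(0, -FENCE.length);
  }
  return t.trim();
}

// Parses the agent output and checks the fields the prompt asked for.
function parseAgentReply(raw) {
  let data;
  try {
    data = JSON.parse(stripFence(raw));
  } catch (err) {
    return { ok: false, reason: "not valid JSON" };
  }
  if (data === null || typeof data !== "object") {
    return { ok: false, reason: "not a JSON object" };
  }
  if (typeof data.subject !== "string" || typeof data.body !== "string") {
    return { ok: false, reason: "missing subject or body" };
  }
  return {
    ok: true,
    shouldReply: data.shouldReply === true,
    subject: data.subject,
    body: data.body,
  };
}

const sample = '{"shouldReply": true, "subject": "Re: Hi", "body": "Hello<br>World"}';
console.log(parseAgentReply(sample));
```

When parsing fails, the workflow can skip the send step or route the email for manual handling instead of sending a malformed reply.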
+
+Once the email is sent successfully, you can see the actual reply arrive in your personal mailbox, as shown in Figure 5.50.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-13.png" alt="" width="90%"/>
+  <p>Figure 5.50 Personal Mailbox Return Email Format</p>
+</div>
+
+At this point, an integrated intelligent email assistant based on the `AI Agent` node is complete. You can send a test email to verify that it works. This architecture is highly extensible: in the future, you can add more tools (such as calendars, databases, or a CRM) directly to the `AI Agent` node, and you only need to explain in the Prompt how the Agent should use them.
+
+### 5.4.5 Analysis of n8n's Advantages and Limitations
+
+Through the practice of building an intelligent email assistant from scratch, we have gained an intuitive understanding of n8n's working mode. As a powerful low-code automation platform, n8n performs excellently in empowering Agent application development, but it is not omnipotent. As shown in Table 5.1, we will objectively analyze its advantages and potential limitations.
+
+<div align="center">
+  <p>Table 5.1 Summary of n8n Platform's Advantages and Limitations</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/n8n-14.png" alt="" width="90%"/>
+</div>
+
+First, n8n's most significant advantage lies in its **development efficiency**. It abstracts complex logic into intuitive visual workflows. Whether it's email reception, AI decision-making, tool invocation, or final reply, the entire data flow and processing chain are clear at a glance on the canvas. This low-code characteristic greatly lowers the technical threshold, allowing developers to quickly build and verify the core logic of Agents, greatly shortening the distance from idea to prototype.
+
+Second, the platform is **powerful and highly integrated**. n8n has a rich built-in node library that can easily connect hundreds of common services like Gmail and Google Gemini. More importantly, its advanced `AI Agent` node highly integrates model, memory, and tool management, allowing us to implement complex autonomous decision-making with one node, which is much more elegant and powerful than traditional multi-node manual routing. At the same time, for scenarios that built-in functions cannot cover, the `Code` node also provides the flexibility to write custom code, ensuring the upper limit of functionality.
+
+Finally, at the **deployment and operation** level, n8n supports **private deployment**, and it is currently one of the simpler options for self-hosting a complete Agent solution. This is crucial for enterprises that value data security and privacy. We can deploy the entire service on our own servers to ensure that sensitive information such as internal emails and customer data does not leave our own environment, providing a solid foundation for the compliance of Agent applications.
+
+Of course, every tool has its trade-offs. While enjoying the convenience brought by n8n, we must also recognize its limitations.
+
+The flip side of **development efficiency** is **relatively cumbersome debugging and error handling**. When workflows become complex and a data format error occurs, developers may need to check the input and output of each node one by one to locate the problem, which is sometimes less direct than setting breakpoints in code.
+
+In terms of functionality, the biggest limitation is reflected in its **non-persistence of built-in storage**. The `Simple Memory` and `Simple Vector Store` we used in the case are both memory-based, which means that once the n8n service restarts, all conversation history and knowledge bases will be lost. This is fatal for production environment applications. Therefore, in actual deployment, they must be replaced with external persistent databases such as Redis and Pinecone, which also increases additional configuration and maintenance costs.
+
+In addition, in terms of **deployment and operation** and team collaboration, n8n's **version control and multi-person collaboration are not as mature as traditional code**. Although workflows can be exported as JSON files for management, comparing their changes is far less clear than `git diff` code, and multiple people editing the same workflow at the same time can easily cause conflicts.
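
One way to make exported workflows easier to compare is to normalize the JSON before committing it, so that key order and indentation do not show up as changes. The sketch below is a generic approach, not an n8n feature; the field names in the sample data are illustrative.

```javascript
// Recursively sorts object keys so that equivalent JSON serializes identically.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys); // keep array order
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value)
        .sort()
        .map((key) => [key, sortKeys(value[key])])
    );
  }
  return value;
}

// Produces a stable, pretty-printed form suitable for version control.
function normalizeWorkflow(jsonText) {
  return JSON.stringify(sortKeys(JSON.parse(jsonText)), null, 2) + "\n";
}

// Two exports of the same (illustrative) workflow with different key order.
const exportA = '{"nodes":[{"type":"gmail","name":"Gmail"}],"name":"Agent"}';
const exportB = '{"name":"Agent","nodes":[{"name":"Gmail","type":"gmail"}]}';

console.log(normalizeWorkflow(exportA) === normalizeWorkflow(exportB));
```

Array order is preserved on purpose, since the order of list entries can be meaningful. This reduces noise in `git diff` but does not solve concurrent editing of the same workflow.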
+
+Finally, regarding **performance**, n8n can fully meet the vast majority of enterprise automation and medium-to-low frequency Agent tasks. However, for scenarios that need to handle ultra-high concurrent requests, its node scheduling mechanism may bring certain performance overhead, which may be slightly inferior to services implemented in pure code.
+
+## 5.5 Chapter Summary
+
+This chapter systematically introduces the concepts, methods, and practices of building agent applications based on low-code platforms, marking our important transition from "hand-written code" to "platform-based development."
+
+In the first section, we elaborated on the background and value of the rise of low-code platforms. Compared with the purely code-implemented agents in Chapter 4, low-code platforms significantly lower the technical threshold, improve development efficiency, and provide a better visual debugging experience through graphical and modular approaches. This "higher level of abstraction" allows developers to focus their energy on business logic and prompt engineering rather than underlying implementation details.
+
+Subsequently, we deeply practiced three distinctive representative platforms:
+
+**Coze** stands out with its zero-code friendly experience and rich plugin ecosystem. Through the "Daily AI Brief" case, we experienced how to quickly integrate multi-source information through drag-and-drop configuration and publish to multiple mainstream platforms with one click. Coze is particularly suitable for non-technical background users and scenarios that need to quickly verify ideas, but its limitations of not supporting MCP and inability to export standardized configuration files are also worth noting.
+
+**Dify**, as an open-source enterprise-level platform, demonstrates full-stack development capabilities. The "Super Agent Personal Assistant" case covers multiple modules such as daily Q&A, copywriting optimization, multimodal generation, data analysis, and MCP tool integration, fully demonstrating Dify's powerful orchestration capabilities in complex business scenarios. Its rich plugin market (8000+), flexible deployment methods, and enterprise-level security features make it an ideal choice for professional developers and enterprise teams. However, the relatively steep learning curve and performance challenges in high-concurrency scenarios also need to be weighed.
+
+**n8n** opens up another path with its unique "connection" capability. Through the "Intelligent Email Assistant" case, we saw how to seamlessly embed AI capabilities into complex business automation processes. n8n's AI Agent node highly integrates models, memory, and tools, and combined with its hundreds of preset nodes, can achieve highly customized automation solutions. Its support for private deployment is particularly important for enterprises that value data security. However, the non-persistence of built-in storage and the immaturity of version control require additional engineering processing in production environments.
+
+Through the comparative practice of the three platforms, we can draw the following selection suggestions:
+- **Rapid prototype validation, non-technical users**: Prioritize Coze
+- **Enterprise-level applications, complex business logic**: Prioritize Dify
+- **Deep business integration, automation processes**: Prioritize n8n
+
+It is worth emphasizing that low-code platforms are not meant to replace code development but provide a complementary choice. In actual projects, we can flexibly switch according to the needs of different stages: use low-code platforms to quickly verify ideas, use code to achieve fine-grained control; use platforms to handle standardized processes, use code to handle special logic. This "hybrid development" mindset is the best practice for agent engineering.
+
+In the next chapter, we will further explore more underlying agent frameworks to help readers build more reliable and interesting applications.
+
+
+## Exercises
+
+1. This chapter introduces three distinctive low-code platforms: `Coze`, `Dify`, and `n8n`. Please analyze:
+
+   - What are the differences in core positioning and design philosophy among these three platforms? What pain points in agent development do they respectively solve?
+   - Low-code platforms and pure code development each have their advantages and disadvantages. In addition, there is also a "hybrid development" mode where some functions are implemented using platforms and some using code. Think about which scenarios each of the three development modes is suitable for, and give examples.
+
+2. In the `Coze` case in Section 5.2, we built a "Daily AI Brief" agent. Please extend your thinking based on this case:
+
+   > **Tip**: This is a hands-on practice question, actual operation is recommended
+
+   - The current brief generation is passively triggered (users actively ask). How to transform this agent so that it can automatically generate briefs and push them to designated Feishu groups or WeChat official accounts at 8 AM every day?
+   - The quality of the brief highly depends on prompt design. Please try to optimize the prompt in Section 5.2.2 to make the generated brief more professional, with a clearer structure, or add new functions such as "hot spot analysis" and "trend prediction."
+   - The fact that `Coze` does not currently support the `MCP` protocol is considered an important limitation (at the time these exercises were written, `feature-mcp` was listed in the [`Coze Studio Q4 2025 Product Roadmap`](https://github.com/coze-dev/coze-studio/issues/2218) but had not yet been implemented). Please briefly describe what the `MCP` protocol is and why it is important. If `Coze` supports `MCP` in the future, what new possibilities will it bring?
+
+3. In the `Dify` case in Section 5.3, we built a fully functional "Super Agent Personal Assistant." Please analyze in depth:
+
+   - The case uses a "question classifier" for intelligent routing, distributing different types of requests to different sub-agents. What are the advantages of this multi-agent architecture? If you don't use a classifier but let a single agent handle all tasks, what problems will you encounter?
+   - The data query module needs to provide the large model with clear table structure information. If the database has 50 tables, each with 20 fields, directly putting all `DDL` statements into the prompt will cause the context to be too long. Please design a smarter solution to solve this problem.
+   - `Dify` supports both local deployment and cloud deployment modes. Please compare the differences between these two modes in terms of data security, cost, performance, and maintenance difficulty, and explain the applicable scenarios for each.
+
+4. In the `n8n` case in Section 5.4, we built an "Intelligent Email Assistant." Please think about the following questions:
+
+   > **Tip**: This is a hands-on practice question, actual operation is recommended
+
+   - The `Simple Vector Store` and `Simple Memory` used in the case are both memory-based, and data will be lost after service restart. Please consult the `n8n` documentation, try to replace them with persistent storage solutions (such as `Pinecone`, `Redis`, etc.), and explain the configuration process.
+   - The current email assistant can only handle text emails. If the email sent by the user contains attachments (such as `PDF` documents, images), how would you extend this workflow to enable the agent to understand attachment content and make corresponding replies?
+   - The core advantage of `n8n` lies in its "connection" capability. Please design a more complex automation scenario: when a customer places an order on an e-commerce platform, automatically trigger a series of operations (send confirmation email, update inventory database, notify logistics system, record customer information in `CRM`). Please draw the node connection diagram of the workflow and explain key configurations.
+
+5. Prompt engineering is equally crucial in low-code platforms. This chapter shows multiple platform prompt design cases. Please analyze:
+
+   - Compare the prompt designs in Section 5.2.2 (`Coze`), Section 5.3.2 (`Dify`), and Section 5.4.4 (`n8n`). What are the differences in structure, style, and focus? Are these differences related to platform characteristics?
+   - In `Dify`'s "Copywriting Optimization Module," the prompt requires output "exceeding 500 words." Is this hard requirement on output length reasonable? In what situations should output length be limited, and in what situations should the model be allowed to freely express?
+
+6. Tools and plugins are the core capability extension methods of low-code platforms. Please think:
+
+   - `Coze` has a rich plugin store, `Dify` has a plugin market of 8000+, and `n8n` has hundreds of preset nodes. If none of these three platforms have a specific tool you need (such as "connecting to the company's internal system `API`"), how would you solve it?
+   - In Section 5.3.2, we used the `MCP` protocol to integrate services such as Amap and dietary recommendations. Please research and explain: What are the differences between the `MCP` protocol and traditional `RESTful API` and `Tool Calling`? Why is `MCP` called the "new standard" for agent tool invocation?
+   - Suppose you want to develop a custom plugin for `Dify` to enable it to call your company's internal knowledge base system. Please consult `Dify`'s plugin development documentation and outline the development process and key technical points.
+
+7. Platform selection is one of the key decisions for the success of agent products. Suppose you are the technical leader of a startup company, and the company plans to develop the following three AI applications. Please select the most suitable platform for each application (`Coze`, `Dify`, `n8n`, or pure code development) and explain in detail:
+
+   **Application A**: An "AI Writing Assistant" mini-program for consumer (C-end) users. It needs to be launched quickly to verify market demand, the budget is limited, and the team has only 1 front-end engineer and 1 product manager.
+
+   **Application B**: An "Intelligent Contract Review System" for enterprise customers, needs to handle sensitive legal documents, requires that data cannot leave the customer's private environment, and needs deep integration with the customer's existing OA system and document management system.
+
+   **Application C**: An internal "R&D Efficiency Improvement Tool," needs to automate multiple R&D process links such as code review, test report generation, bug tracking, and project progress synchronization. The team has strong technical capabilities.
+
+   For each application, please analyze from the following dimensions (including but not limited to):
+
+   > **Tip**: Whether platform capabilities meet requirements, how quickly it can be launched, development costs, operating costs, difficulty of subsequent iterations, space for future function expansion
+
+   - Technical feasibility
+   - Development efficiency
+   - Cost control
+   - Maintainability
+   - Scalability
+   - Data security and compliance
+
+## References
+
+[1] Coze - Next-generation AI application development platform. https://www.coze.cn/
+
+[2] Dify - Open-source LLM application development platform. https://dify.ai/
+
+[3] n8n - Workflow automation tool. https://n8n.io/
+

+ 5 - 1
docs/chapter5/第五章 基于低代码平台的智能体搭建.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter5-Building-Agents-with-Low-Code-Platforms.md">English</a> | 中文
+</div>
+
 # 第五章 基于低代码平台的智能体搭建
 
 在前一章中,通过编写 Python 代码,从零开始实现了 ReAct、Plan-and-Solve 和 Reflection 多种经典的智能体工作流。这个过程为我们打下了坚实的技术基础,让我们深刻理解了智能体内部的运作机理。然而,对于一个快速发展的领域而言,纯代码的开发模式并非总是最高效的选择,尤其是在需要快速验证想法、或者非专业开发者希望参与构建的场景中。
@@ -395,7 +399,7 @@ Dify 为插件开发者提供了强大的开发支持,包括远程调试功能
 整个智能体的编排架构如图5.25所示。
 
 <div align="center">
-  <img src="https://github.com/HeteroCat/hello-agents/blob/main/docs/images/5-figures/dify-12-new.png" alt="图片描述" width="90%"/>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/5-figures/dify-12.png" alt="图片描述" width="60%"/>
   <p>图 5.25 智能体编排</p>
 </div>
 

+ 1343 - 0
docs/chapter6/Chapter6-Framework-Development-Practice.md

@@ -0,0 +1,1343 @@
+<div align="right">
+  English | <a href="./第六章%20框架开发实践.md">中文</a>
+</div>
+
+# Chapter 6 Framework Development Practice
+
+In Chapter 4, we implemented the core workflows of several agents such as ReAct, Plan-and-Solve, and Reflection by writing native code. This process gave us an understanding of the internal execution logic of agents. Subsequently, in Chapter 5, we switched to the "user" perspective and experienced the convenience and efficiency brought by low-code platforms.
+
+The goal of this chapter is to explore how to use some of the industry's mainstream **agent frameworks** to build reliable agent applications efficiently and in a standardized way. We will first give an overview of the mainstream agent frameworks currently on the market, and then experience the framework-driven development model through a complete practical case for each of several representative frameworks.
+
+## 6.1 From Manual Implementation to Framework Development
+
+Moving from writing one-time scripts to using a mature framework is an important mental leap in software engineering. The code we wrote in Chapter 4 was primarily for teaching and understanding purposes. It can complete specific tasks well, but if we want to use it to build multiple, different types of agents with complex logic, we will soon run into bottlenecks.
+
+The essence of a framework is to provide a set of validated "specifications." It abstracts and encapsulates all the repetitive work common to all agents (such as main loops, state management, tool invocation, logging, etc.), allowing us to focus on their unique business logic when building new agents, rather than general underlying implementations.
+
+### 6.1.1 Why Agent Frameworks Are Needed
+
+Before we start the practical work, we first need to clarify why we should use frameworks. Compared to directly writing independent agent scripts, the value of using frameworks is mainly reflected in the following aspects:
+
+1. **Improve Code Reuse and Development Efficiency**: This is the most direct value. A good framework will provide a general `Agent` base class or executor that encapsulates the core loop of agent operation (Agent Loop). Whether it's ReAct or Plan-and-Solve, they can be quickly built based on standard components provided by the framework, thus avoiding repetitive work.
+2. **Achieve Decoupling and Extensibility of Core Components**: A robust agent system should consist of multiple loosely coupled modules. The framework's design will force us to separate different concerns:
+   - **Model Layer**: Responsible for interacting with large language models, can easily replace different models (OpenAI, Anthropic, local models).
+   - **Tool Layer**: Provides standardized tool definition, registration, and execution interfaces; adding new tools will not affect other code.
+   - **Memory Layer**: Handles short-term and long-term memory, can switch different memory strategies according to needs (such as sliding window, summary memory). This modular design makes the entire system highly extensible, making it simple to replace or upgrade any component.
+3. **Standardize Complex State Management**: The `Memory` class we implemented in `ReflectionAgent` is just a simple start. In real, long-running agent applications, state management is a huge challenge that needs to handle context window limitations, historical information persistence, multi-turn conversation state tracking, and other issues. A framework can provide a powerful and general state management mechanism, so developers don't have to deal with these complex issues every time.
+4. **Simplify Observability and Debugging Process**: When agent behavior becomes complex, understanding its decision-making process becomes crucial. A well-designed framework can have built-in powerful observability capabilities. For example, by introducing an event callback mechanism (Callbacks), we can automatically trigger logging or data reporting at key nodes in the agent lifecycle (such as `on_llm_start`, `on_tool_end`, `on_agent_finish`), making it easy to track and debug the complete running trajectory of the agent. This is far more efficient and systematic than manually adding `print` statements in code.
+
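The callback idea in point 4 can be sketched in a few lines of plain Python. This is a minimal illustration, not the API of any particular framework: `CallbackManager`, `ToyAgent`, and the stubbed "tool" are hypothetical names, and only the event names (`on_llm_start`, `on_tool_end`, `on_agent_finish`) come from the text above.

```python
from typing import Any, Callable, Dict, List


class CallbackManager:
    """Registers handlers per lifecycle event and fires them in registration order."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[..., None]]] = {}

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, **payload: Any) -> None:
        for handler in self._handlers.get(event, []):
            handler(**payload)


class ToyAgent:
    """Hypothetical agent; the 'LLM' and the 'tool' are stubbed with plain Python."""

    def __init__(self, callbacks: CallbackManager) -> None:
        self.callbacks = callbacks

    def run(self, question: str) -> str:
        self.callbacks.emit("on_llm_start", prompt=question)
        tool_output = str(len(question))  # stand-in for a real tool call
        self.callbacks.emit("on_tool_end", tool="char_count", output=tool_output)
        answer = f"The question has {tool_output} characters."
        self.callbacks.emit("on_agent_finish", answer=answer)
        return answer


# Collect a trace without touching the agent's own code.
trace: List[str] = []
manager = CallbackManager()
manager.on("on_llm_start", lambda prompt: trace.append(f"llm_start: {prompt}"))
manager.on("on_tool_end", lambda tool, output: trace.append(f"tool_end: {tool} -> {output}"))
manager.on("on_agent_finish", lambda answer: trace.append(f"finish: {answer}"))

result = ToyAgent(manager).run("What is an agent?")
for line in trace:
    print(line)
```

The agent only emits events; what happens with them (logging, metrics, tracing) is decided by whoever registers the handlers, which is why this pattern scales better than scattered `print` statements.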
+Therefore, moving from manual implementation to framework development is not only a change in code organization, but also the necessary path to building complex, reliable, and maintainable agent applications.
+
+### 6.1.2 Selection and Comparison of Mainstream Frameworks
+
+The ecosystem of agent frameworks is developing at an unprecedented speed. If LangChain and LlamaIndex defined the paradigm of the first generation of general LLM application frameworks, then the new generation of frameworks is more focused on solving deep challenges in specific domains, especially **Multi-Agent Collaboration** and **Complex Workflow Control**.
+
+In the subsequent practical work of this chapter, we will focus on four frameworks that are highly representative in these cutting-edge fields: AutoGen, AgentScope, CAMEL, and LangGraph. Their design philosophies are different, representing different technical paths for implementing complex agent systems, as shown in Table 6.1.
+
+<div align="center">
+  <p>Table 6.1 Comparison of Four Agent Frameworks</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/6-figures/01.png" alt="" width="90%"/>
+</div>
+
+
+- **AutoGen**: The core idea of AutoGen is to achieve collaboration through conversation<sup>[1]</sup>. It abstracts multi-agent systems as a group chat composed of multiple "conversable" agents. Developers can define different roles (such as `Coder`, `ProductManager`, `Tester`) and set interaction rules between them (for example, after `Coder` finishes writing code, `Tester` automatically takes over). The task-solving process is the process where these agents continuously converse, collaborate, and iterate in the group chat through automated message passing until the final goal is achieved.
+- **AgentScope**: AgentScope is a fully functional development platform designed specifically for multi-agent applications<sup>[2]</sup>. Its core features are **ease of use** and **engineering**. It provides a very friendly programming interface that allows developers to easily define agents, build communication networks, and manage the entire application lifecycle. Its built-in **message passing mechanism** and support for distributed deployment make it very suitable for building and operating complex, large-scale multi-agent systems.
+- **CAMEL**: CAMEL provides a novel collaboration method called **Role-Playing**<sup>[3]</sup>. Its core concept is that we only need to set the respective roles and common task goals for two agents (for example, `AI Researcher` and `Python Programmer`), and they can autonomously conduct multiple rounds of dialogue under the guidance of "**Inception Prompting**," inspiring and cooperating with each other to complete tasks together. It greatly reduces the complexity of designing multi-agent dialogue processes.
+- **LangGraph**: As an extension of the LangChain ecosystem, LangGraph takes a different approach by modeling the agent's execution process as a **Graph**<sup>[4]</sup>. In traditional chain structures, information can only flow in one direction. LangGraph defines each operation (such as calling an LLM or executing a tool) as a **Node** in the graph and uses **Edges** to define the jump logic between nodes. This design naturally supports **Cycles**, making it simple and intuitive to implement workflows such as Reflection that involve iteration, correction, and self-reflection.
+
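To make the node/edge/cycle idea concrete, here is a small sketch in plain Python. It deliberately does not use LangGraph's API; `NODES`, `EDGES`, `run_graph`, and the stubbed `draft`/`reflect` functions are hypothetical and only illustrate how a conditional edge can route execution back to an earlier node.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def draft(state: State) -> State:
    # Stand-in for an LLM call that writes or revises a draft.
    state["revision"] += 1
    state["draft"] = f"draft v{state['revision']}"
    return state


def reflect(state: State) -> State:
    # Stand-in critique: accept the draft once it has been revised three times.
    state["approved"] = state["revision"] >= 3
    return state


def route_after_reflect(state: State) -> str:
    # Conditional edge: loop back to "draft" until the critique approves.
    return "END" if state["approved"] else "draft"


NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "reflect": reflect}
EDGES: Dict[str, Callable[[State], str]] = {
    "draft": lambda state: "reflect",  # unconditional edge
    "reflect": route_after_reflect,    # conditional edge that creates the cycle
}


def run_graph(entry: str, state: State, max_steps: int = 20) -> Tuple[State, List[str]]:
    """Walk the graph from `entry` until "END" or until max_steps nodes have run."""
    current, visited = entry, []
    for _ in range(max_steps):
        if current == "END":
            break
        visited.append(current)
        state = NODES[current](state)
        current = EDGES[current](state)
    return state, visited


final_state, visited = run_graph("draft", {"revision": 0, "approved": False})
print(visited)
print(final_state["draft"])
```

A linear chain cannot express the `reflect -> draft` edge; a graph with conditional routing can, and the `max_steps` guard keeps a cycle from running forever.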
+In the following sections, we will work through one complete practical case for each of these four frameworks to get hands-on experience with framework-driven development. **Please note** that the full source files of every demonstrated project are placed in the `code` folder; the main text explains only the underlying principles.
+
+## 6.2 Framework One: AutoGen
+
+As mentioned earlier, AutoGen's design philosophy is rooted in "driving collaboration through conversation." It maps complex task-solving processes to a series of automated conversations between agents with different roles. Based on this core concept, the AutoGen framework continues to evolve. We will use version `0.7.4` as an example because it is the latest version at the time of writing and belongs to the re-architected line of AutoGen (introduced with `0.4`), which moved from a class-inheritance design to a more flexible compositional architecture. To understand and apply this framework, we first need to explain its core constituent elements and underlying conversation mechanisms.
+
+### 6.2.1 Core Mechanisms of AutoGen
+
+The redesign that version `0.7.4` builds on is an important milestone in AutoGen's development. It is not a simple addition of features but a rethinking of the framework's underlying design, aimed at improving modularity, concurrency performance, and developer experience.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/6-figures/02.png" alt="" width="90%"/>
+  <p>Figure 6.1 AutoGen Architecture Diagram</p>
+</div>
+
+(1) Evolution of Framework Structure
+
+As shown in Figure 6.1, the most significant change in the new architecture is the introduction of clear layering and asynchronous-first design philosophy.
+
+- **Layered Design:** The framework is split into two core modules:
+  - `autogen-core`: As the underlying foundation of the framework, it encapsulates core functions such as interaction with language models and message passing. Its existence ensures the stability and future extensibility of the framework.
+  - `autogen-agentchat`: Built on top of `core`, it provides high-level interfaces for developing conversational agent applications, simplifying the development process of multi-agent applications. This layering strategy makes each component's responsibilities clear and reduces system coupling.
+- **Asynchronous First:** The new architecture fully transitions to asynchronous programming (`async/await`). In multi-agent collaboration scenarios, network requests (such as calling LLM APIs) are the main time-consuming operations. Asynchronous mode allows the system to handle other tasks while waiting for one agent's response, thus avoiding thread blocking and significantly improving concurrent processing capabilities and system resource utilization efficiency.
+
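The benefit of the asynchronous-first design can be seen with a small `asyncio` sketch. It does not call AutoGen; `fake_llm_call` is a hypothetical stand-in that simulates network latency with `asyncio.sleep`, and the agent names are illustrative.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_llm_call(agent_name: str, delay: float) -> str:
    # Simulates waiting on a remote LLM API without blocking the event loop.
    await asyncio.sleep(delay)
    return f"{agent_name}: done"


async def main() -> Tuple[List[str], float]:
    start = time.perf_counter()
    # The three "requests" wait concurrently; gather returns results in call order.
    replies = await asyncio.gather(
        fake_llm_call("Planner", 0.2),
        fake_llm_call("Coder", 0.2),
        fake_llm_call("Reviewer", 0.2),
    )
    elapsed = time.perf_counter() - start
    return list(replies), elapsed


replies, elapsed = asyncio.run(main())
print(replies)
print(f"elapsed: {elapsed:.2f}s (running the calls one after another would take about 0.60s)")
```

While one coroutine is waiting for its response, the event loop is free to advance the others, which is the effect the text describes for I/O-bound agent workloads.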
+(2) Core Agent Components
+
+Agents are the basic units for executing tasks. In version `0.7.4`, agent design is more focused and modular.
+
+- **AssistantAgent (Assistant Agent):** This is the main task solver, whose core is encapsulating a large language model (LLM). Its responsibility is to generate logical and knowledgeable replies based on conversation history, such as proposing plans, writing articles, or writing code. Through different system messages (System Message), we can assign it different "expert" roles.
+- **UserProxyAgent (User Proxy Agent):** This is a functionally unique component in AutoGen. It is the "spokesperson" for human users: it initiates tasks, conveys intentions, and feeds human feedback back into the conversation. In earlier AutoGen versions the same class could also be configured to execute code; in the redesigned AgentChat API, code execution is delegated to a separate agent such as `CodeExecutorAgent`. Either way, the design clearly distinguishes "thinking" (completed by `AssistantAgent`) from "action."
+
+(3) From GroupChatManager to Team
+
+When tasks require multiple agents to collaborate, a mechanism is needed to coordinate the conversation process. In earlier versions, `GroupChatManager` assumed this responsibility. In the new architecture, a more flexible `Team` or group chat concept is introduced, such as `RoundRobinGroupChat`.
+
+- **Round Robin Group Chat (RoundRobinGroupChat):** This is a clear, sequential conversation coordination mechanism. It will have participating agents speak in turn according to a predefined order. This mode is very suitable for tasks with fixed processes, such as a typical software development process: the product manager first proposes requirements, then the engineer writes code, and finally the code reviewer checks.
+- **Workflow:**
+  1. First, create a `RoundRobinGroupChat` instance and add all agents participating in collaboration (such as product managers, engineers, etc.) to it.
+  2. When a task starts, the group chat will activate the corresponding agents in turn according to the preset order.
+  3. The selected agent responds based on the current conversation context.
+  4. The group chat adds the new reply to the conversation history and activates the next agent.
+  5. This process continues until the maximum number of conversation rounds is reached or preset termination conditions are met.
+
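The five workflow steps above can be sketched without any framework. The following is an illustrative re-implementation of the idea, not AutoGen's code: `run_round_robin` and the scripted lambda "agents" are hypothetical, and each agent is reduced to a function from conversation history to a reply.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]                            # (speaker, text)
Agent = Tuple[str, Callable[[List[Message]], str]]   # (name, respond function)


def run_round_robin(
    agents: List[Agent],
    task: str,
    stop_word: str = "TERMINATE",
    max_turns: int = 20,
) -> List[Message]:
    """Let agents speak in list order until stop_word appears or max_turns is hit."""
    history: List[Message] = [("user", task)]
    for turn in range(max_turns):
        name, respond = agents[turn % len(agents)]  # steps 2-3: pick the next speaker
        reply = respond(history)                    # step 3: reply based on context
        history.append((name, reply))               # step 4: extend the shared history
        if stop_word in reply:                      # step 5: termination condition
            break
    return history


# Scripted stand-ins for LLM-backed agents.
agents: List[Agent] = [
    ("ProductManager", lambda h: "Plan ready. Please engineer start implementation"),
    ("Engineer", lambda h: "Code written. Please code reviewer check"),
    ("CodeReviewer", lambda h: "Code review completed, please user proxy test"),
    ("UserProxy", lambda h: "Test passed. TERMINATE"),
]

history = run_round_robin(agents, "Build a Bitcoin price app")
for speaker, text in history:
    print(f"[{speaker}] {text}")
```

The `max_turns` bound matters in practice: if no agent ever emits the stop word, the loop still ends after a fixed number of replies.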
+In this way, AutoGen simplifies complex collaborative relationships into an automated "round table meeting" with a clear process that is easy to manage. Developers only need to define the role and speaking order of each team member, and the rest of the collaboration process can be autonomously driven by the group chat mechanism.
+
+In the next section, we will build a simulated software development team to see how this works in practice: we will define agents with different roles in the new architecture and organize them in a group chat coordinated by `RoundRobinGroupChat` so that they collaboratively complete a real programming task.
+
+### 6.2.2 Software Development Team
+
+After understanding AutoGen's core components and conversation mechanisms, this section will specifically demonstrate how to apply these new features through a complete practical case. We will build a simulated software development team composed of multiple agents with different professional skills, who will collaborate to complete a real software development task.
+
+(1) Business Objective
+
+Our goal is to develop a web application with a clear function: **display the current price of Bitcoin in real-time**. Although this task is small, it completely covers typical stages of software development: from requirement analysis, technology selection, coding implementation to code review and final testing. This makes it an ideal scenario for testing AutoGen's automated collaboration process.
+
+(2) Agent Team Roles
+
+To simulate a real software development process, we designed four agents with distinct responsibilities:
+
+- **ProductManager (Product Manager):** Responsible for transforming users' vague requirements into clear, executable development plans.
+- **Engineer:** Based on the development plan, responsible for writing specific application code.
+- **CodeReviewer (Code Reviewer):** Responsible for reviewing code submitted by engineers to ensure its quality, readability, and robustness.
+- **UserProxy (User Proxy):** Represents the end user, initiates the initial task, and is responsible for executing and verifying the final delivered code.
+
+This role division is a key step in multi-agent system design, breaking down a complex task into multiple subtasks handled by domain "experts."
+
+### 6.2.3 Core Code Implementation
+
+Below, we will analyze the core code of this automated team step by step.
+
+(1) Model Client Configuration
+
+All LLM-based agents need a model client to interact with language models. AutoGen `0.7.4` provides a standardized `OpenAIChatCompletionClient` that can conveniently interface with any model service compatible with the OpenAI API specification (including OpenAI official service, Azure OpenAI, and local model services such as Ollama, etc.).
+
+We create and configure the model client through an independent function and manage API Key and service address through environment variables. This is a good engineering practice that enhances code flexibility and security.
+
+```python
+import os
+
+from autogen_ext.models.openai import OpenAIChatCompletionClient
+
+def create_openai_model_client():
+    """Create and configure OpenAI model client"""
+    return OpenAIChatCompletionClient(
+        model=os.getenv("LLM_MODEL_ID", "gpt-4o"),
+        api_key=os.getenv("LLM_API_KEY"),
+        base_url=os.getenv("LLM_BASE_URL", "https://api.openai.com/v1")
+    )
+```
+
+(2) Definition of Agent Roles
+
+The core of defining agents lies in writing high-quality system messages (System Message). System messages are like setting "behavioral guidelines" and "professional knowledge bases" for agents, precisely specifying the agent's role, responsibilities, workflow, and even the way it interacts with other agents. A well-designed system message is key to ensuring that multi-agent systems can collaborate efficiently and accurately. In our software development team, we created an independent function for each role to encapsulate its definition.
+
+**Product Manager (ProductManager)**
+
+The product manager is responsible for initiating the entire process. Its system message not only defines its responsibilities but also standardizes the structure of its output and includes clear instructions to guide the conversation to the next stage (engineer).
+
+```python
+from autogen_agentchat.agents import AssistantAgent
+
+def create_product_manager(model_client):
+    """Create product manager agent"""
+    system_message = """You are an experienced product manager specializing in requirement analysis and project planning for software products.
+
+Your core responsibilities include:
+1. **Requirement Analysis**: Deeply understand user needs, identify core functions and boundary conditions
+2. **Technical Planning**: Develop clear technical implementation paths based on requirements
+3. **Risk Assessment**: Identify potential technical risks and user experience issues
+4. **Coordination and Communication**: Communicate effectively with engineers and other team members
+
+When receiving a development task, please analyze it according to the following structure:
+1. Requirement understanding and analysis
+2. Functional module division
+3. Technology selection recommendations
+4. Implementation priority sorting
+5. Acceptance criteria definition
+
+Please respond concisely and clearly, and say "Please engineer start implementation" after completing the analysis."""
+
+    return AssistantAgent(
+        name="ProductManager",
+        model_client=model_client,
+        system_message=system_message,
+    )
+```
+
+**Engineer**
+
+The engineer's system message focuses on technical implementation. It lists the engineer's technical expertise and specifies the specific action steps after receiving a task, also including instructions to guide the process to the code reviewer.
+
+```python
+def create_engineer(model_client):
+    """Create software engineer agent"""
+    system_message = """You are a senior software engineer skilled in Python development and web application construction.
+
+Your technical expertise includes:
+1. **Python Programming**: Proficient in Python syntax and best practices
+2. **Web Development**: Expert in frameworks such as Streamlit, Flask, Django
+3. **API Integration**: Rich experience in third-party API integration
+4. **Error Handling**: Focus on code robustness and exception handling
+
+When receiving a development task, please:
+1. Carefully analyze technical requirements
+2. Choose appropriate technical solutions
+3. Write complete code implementation
+4. Add necessary comments and explanations
+5. Consider boundary cases and exception handling
+
+Please provide complete runnable code and say "Please code reviewer check" after completion."""
+
+    return AssistantAgent(
+        name="Engineer",
+        model_client=model_client,
+        system_message=system_message,
+    )
+```
+
+**Code Reviewer (CodeReviewer)**
+
+The code reviewer's definition focuses on code quality, security, and standardization. Its system message details the review focus and process, ensuring a quality checkpoint before code delivery.
+
+```python
+def create_code_reviewer(model_client):
+    """Create code reviewer agent"""
+    system_message = """You are an experienced code review expert focusing on code quality and best practices.
+
+Your review focus includes:
+1. **Code Quality**: Check code readability, maintainability, and performance
+2. **Security**: Identify potential security vulnerabilities and risk points
+3. **Best Practices**: Ensure code follows industry standards and best practices
+4. **Error Handling**: Verify the completeness and rationality of exception handling
+
+Review process:
+1. Carefully read and understand code logic
+2. Check code standards and best practices
+3. Identify potential issues and improvement points
+4. Provide specific modification suggestions
+5. Evaluate overall code quality
+
+Please provide specific review comments and say "Code review completed, please user proxy test" after completion."""
+
+    return AssistantAgent(
+        name="CodeReviewer",
+        model_client=model_client,
+        system_message=system_message,
+    )
+```
+
+**User Proxy (UserProxy)**
+
+`UserProxyAgent` is a special agent that does not rely on an LLM for replies but acts as the user's proxy in the system. Its `description` field describes its responsibilities. Most importantly, it is responsible for issuing the `TERMINATE` instruction once the task is complete, so that the entire collaboration process ends cleanly.
+
+```python
+from autogen_agentchat.agents import UserProxyAgent
+
+def create_user_proxy():
+    """Create user proxy agent"""
+    return UserProxyAgent(
+        name="UserProxy",
+        description="""User proxy, responsible for the following duties:
+1. Propose development requirements on behalf of users
+2. Execute final code implementation
+3. Verify whether functions meet expectations
+4. Provide user feedback and suggestions
+
+Please reply TERMINATE after completing the test.""",
+    )
+```
+
+Through these four independent definition functions, we not only built a fully functional "virtual team" but also demonstrated that "prompt engineering" through system messages is a core part of designing efficient multi-agent applications.
+
+(3) Define Team Collaboration Process
+
+In this case, the software development process is relatively fixed (requirements -> coding -> review -> testing), so `RoundRobinGroupChat` (round-robin group chat) is the ideal choice. We add the four agents to the participant list in business logic order.
+
+```python
+from autogen_agentchat.teams import RoundRobinGroupChat
+from autogen_agentchat.conditions import TextMentionTermination
+
+# Define team chat and collaboration rules
+team_chat = RoundRobinGroupChat(
+    participants=[
+        product_manager,
+        engineer,
+        code_reviewer,
+        user_proxy
+    ],
+    termination_condition=TextMentionTermination("TERMINATE"),
+    max_turns=20,
+)
+```
+
+- **Participant Order:** The order of the `participants` list determines the order in which agents speak.
+- **Termination Condition:** `termination_condition` is key to controlling when the collaboration process ends. Here we set that when any message contains the keyword "TERMINATE," the conversation ends. In our design, this instruction is issued by `UserProxy` after completing the final test.
+- **Maximum Turns:** `max_turns` is a safety valve used to prevent conversations from falling into infinite loops and avoid unnecessary resource consumption.
+
+(4) Startup and Execution
+
+Since AutoGen `0.7.4` adopts an asynchronous architecture, the startup and execution of the entire collaboration process are completed in an asynchronous function and finally executed through `asyncio.run()`.
+
+```python
+import asyncio
+
+from autogen_agentchat.ui import Console
+
+async def run_software_development_team():
+    # ... Initialize client and agents ...
+
+    # Define task description
+    task = """We need to develop a Bitcoin price display application with the following specific requirements:
+            Core functions:
+            - Display Bitcoin current price in real-time (USD)
+            - Display 24-hour price change trend (percentage and amount of increase/decrease)
+            - Provide price refresh function
+
+            Technical requirements:
+            - Use Streamlit framework to create web application
+            - Simple and beautiful interface, user-friendly
+            - Add appropriate error handling and loading status
+
+            Please team collaborate to complete this task, from requirement analysis to final implementation."""
+
+    # Asynchronously execute team collaboration and stream output conversation process
+    result = await Console(team_chat.run_stream(task=task))
+    return result
+
+# Main program entry
+if __name__ == "__main__":
+    result = asyncio.run(run_software_development_team())
+```
+
+When the program runs, `task` is passed into `team_chat` as the initial message, the product manager receives the message as the first participant, and then the entire automated collaboration process begins.
+
+(5) Expected Collaboration Effect
+
+When we run this software development team, we can observe a complete collaboration process:
+
+```text
+🔧 Initializing model client...
+👥 Creating agent team...
+🚀 Starting AutoGen software development team collaboration...
+============================================================
+---------- TextMessage (user) ----------
+We need to develop a Bitcoin price display application with the following specific requirements:
+...
+Please team collaborate to complete this task, from requirement analysis to final implementation.
+---------- TextMessage (ProductManager) ----------
+### 1. Requirement Understanding and Analysis
+...
+Please engineer start implementation.
+---------- TextMessage (Engineer) ----------
+### Technical Solution Implementation
+...
+Please code reviewer check.
+---------- TextMessage (CodeReviewer) ----------
+### Code Review
+...
+Code review completed, please user proxy test.
+---------- TextMessage (UserProxy) ----------
+Requirements completed
+---------- TextMessage (ProductManager) ----------
+Great, thank you for your feedback! If you have any questions during use, or have other functional requirements and improvement suggestions, please feel free to let us know. We will continue to provide support and improvements. Looking forward to you having a pleasant experience with our application!
+---------- TextMessage (Engineer) ----------
+Glad to hear the project was completed successfully. If you or users have any questions or need help, please feel free to contact us. Thank you for your support of our work, let's work together to ensure the application runs stably and continuously optimize user experience!
+---------- TextMessage (CodeReviewer) ----------
+Thank you very much for everyone's efforts and collaboration, which enabled the project to be completed successfully. In the future, if there are more technical support needs or areas that need improvement, we are willing to contribute to the continuous optimization of the project. Looking forward to users enjoying a smooth experience, and also welcome more feedback and suggestions. Thank you again for the team's cooperation!
+---------- TextMessage (UserProxy) ----------
+Enter your response: TERMINATE
+============================================================
+✅ Team collaboration completed!
+
+📋 Collaboration result summary:
+- Number of participating agents: 4
+- Task completion status: Success
+```
+
+The entire collaboration process demonstrates the advantages of the AutoGen framework: **natural conversation-driven collaboration**, **role specialization division**, **process automation management**, and **complete development closed loop**.
+
+### 6.2.4 Analysis of AutoGen's Advantages and Limitations
+
+Any technical framework has its specific applicable scenarios and design trade-offs. In this section, we will objectively analyze AutoGen's core advantages and the limitations and challenges it may face in practical applications.
+
+(1) Advantages
+
+- As shown in the case, we do not need to design complex state machines or control flow logic for the agent team, but naturally map a complete software development process to conversations between product managers, engineers, and reviewers. This approach is closer to the collaboration mode of human teams and significantly lowers the threshold for modeling complex tasks. Developers can focus more energy on defining "who (role)" and "what to do (responsibility)" rather than "how to do it (process control)."
+- The framework allows assigning highly specialized roles to each agent through system messages (System Message). In the case, `ProductManager` focuses on requirements, while `CodeReviewer` focuses on quality. A well-designed agent can be reused in different projects, easy to maintain and extend.
+- For process-oriented tasks, mechanisms like `RoundRobinGroupChat` provide clear, predictable collaboration processes. At the same time, the design of `UserProxyAgent` provides a natural interface for "Human-in-the-loop." It can serve as both the initiator of tasks and the supervisor and final acceptor of the process. This design ensures that automated systems are always under human supervision.
+
+(2) Limitations
+
+- Although `RoundRobinGroupChat` provides a sequential process, conversations based on LLM are inherently uncertain. Agents may produce replies that deviate from expectations, causing conversations to go in unexpected directions or even fall into loops.
+- When the work results of the agent team do not meet expectations, the debugging process can be very tricky. Unlike traditional programs, we don't get a clear error stack but a long conversation history. This is called the "conversational debugging" dilemma.
+
+(3) Configuration Supplement for Non-OpenAI Models
+
+If you want to use non-OpenAI models (such as DeepSeek or Tongyi Qianwen) in version `0.7.4`, you need to pass a model information dictionary through the `model_info` parameter of `OpenAIChatCompletionClient`. Taking DeepSeek as an example:
+
+```python
+import os
+
+from autogen_ext.models.openai import OpenAIChatCompletionClient
+
+model_client = OpenAIChatCompletionClient(
+    model="deepseek-chat",
+    api_key=os.getenv("DEEPSEEK_API_KEY"),
+    base_url="https://api.deepseek.com/v1",
+    model_info={
+        "function_calling": True,
+        "max_tokens": 4096,
+        "context_length": 32768,
+        "vision": False,
+        "json_output": True,
+        "family": "deepseek",
+        "structured_output": True,
+    }
+)
+```
+
+This `model_info` dictionary helps AutoGen understand the model's capability boundaries, thereby better adapting to different model services.
+
+
+
+## 6.3 Framework Two: AgentScope
+
+If AutoGen's design philosophy is "driving collaboration through conversation," then AgentScope represents another technical path: **engineering-first multi-agent platform**. AgentScope, developed by Alibaba DAMO Academy, is specifically designed for building large-scale, highly reliable multi-agent applications. It not only provides an intuitive and easy-to-use programming interface but, more importantly, has built-in enterprise-level features such as distributed deployment, fault recovery, and observability, making it particularly suitable for building production environment applications that need to run stably for a long time.
+
+### 6.3.1 Design of AgentScope
+
+Compared with AutoGen, the core difference of AgentScope lies in its **message-driven architectural design** and **industrial-grade engineering practices**. If AutoGen is more like a flexible "conversation studio," then AgentScope is a complete "agent operating system," providing developers with full lifecycle support from development, testing to deployment. Unlike the inheritance-based design adopted by many frameworks, AgentScope chooses **compositional architecture** and **message-driven mode**. This design not only enhances the modularity of the system but also lays the foundation for its excellent concurrency performance and distributed capabilities.
+
+(1) Layered Architecture System
+
+As shown in Figure 6.2, AgentScope adopts a clear layered modular design, forming a complete agent development ecosystem from bottom-level basic components to top-level application orchestration.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/6-figures/03.png" alt="" width="90%"/>
+  <p>Figure 6.2 AgentScope Architecture Diagram</p>
+</div>
+
+In this architecture, the bottom layer is the **Foundational Components** layer, which provides core building blocks for the entire framework. The `Message` component defines a unified message format, supporting everything from simple text interaction to complex multimodal content; the `Memory` component provides short-term and long-term memory management; the `Model API` layer abstracts calls to different large language models; and the `Tool` component encapsulates the agent's ability to interact with the external world.
+
+Above the basic components, the **Agent-level Infrastructure** layer provides higher-level abstractions. This layer not only includes various pre-built agents (such as browser-using agents, deep research agents) but also implements the classic ReAct paradigm, supporting advanced features such as agent hooks, parallel tool calling, and state management. Particularly noteworthy is that this layer natively supports **asynchronous execution and real-time control**, which is an important advantage of AgentScope compared to other frameworks.
+
+The **Multi-Agent Cooperation** layer is where AgentScope's core innovation lies. `MsgHub` serves as the message center, responsible for message routing and state management between agents; while the `Pipeline` system provides flexible workflow orchestration capabilities, supporting various execution modes such as sequential and concurrent. This design allows developers to easily build complex multi-agent collaboration scenarios.
+
+The top **Deployment & Development** layer reflects AgentScope's emphasis on engineering. `AgentScope Runtime` provides a production-grade runtime environment, while `AgentScope Studio` provides developers with a complete visual development toolchain.
+
+(2) Message-Driven
+
+AgentScope's core innovation lies in its **message-driven architecture**. In this architecture, all agent interactions are abstracted as the sending and receiving of **messages**, rather than traditional function calls.
+
+```python
+from agentscope.message import Msg
+
+# Standard structure of message
+message = Msg(
+    name="Alice",           # Sender name
+    content="Hello, Bob!",  # Message content
+    role="user",           # Role type
+    metadata={             # Metadata information
+        "timestamp": "2024-01-15T10:30:00Z",
+        "message_type": "text",
+        "priority": "normal"
+    }
+)
+```
+
+Using messages as the basic unit of interaction brings several key advantages:
+
+- **Asynchronous Decoupling**: The sender and receiver of messages are decoupled in time, without needing to wait for each other, naturally supporting high-concurrency scenarios.
+- **Location Transparency**: Agents do not need to care whether another agent is in a local process or on a remote server; the message system automatically handles routing.
+- **Observability**: Every message can be logged, tracked, and analyzed, greatly simplifying debugging and monitoring of complex systems.
+- **Reliability**: Messages can be persistently stored and retried. Even if the system fails, it can ensure the eventual consistency of interactions, improving the system's fault tolerance.
+
+(3) Agent Lifecycle Management
+
+In AgentScope, each agent has a clear lifecycle (initialization, running, pausing, destruction, etc.) and is implemented based on a unified base class `AgentBase`. Developers usually only need to focus on its core `reply` method.
+
+```python
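+# Note: this example uses the synchronous style of earlier AgentScope releases;
+# newer releases may use a different module path and async `reply`/`observe`.
+from agentscope.agents import AgentBase
+from agentscope.message import Msg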
+
+class CustomAgent(AgentBase):
+    def __init__(self, name: str, **kwargs):
+        super().__init__(name=name, **kwargs)
+        # Agent initialization logic
+
+    def reply(self, x: Msg) -> Msg:
+        # Agent's core response logic
+        response = self.model(x.content)
+        return Msg(name=self.name, content=response, role="assistant")
+
+    def observe(self, x: Msg) -> None:
+        # Agent's observation logic (optional)
+        self.memory.add(x)
+```
+
+This design pattern separates the agent's internal logic from external communication. Developers only need to define how the agent "thinks and responds" in the `reply` method.
+
+(4) Message Passing Mechanism
+
+AgentScope has a built-in **Message Center (MsgHub)**, which is the hub of the entire message-driven architecture. MsgHub is not only responsible for message routing and distribution but also integrates advanced functions such as persistence and distributed communication. It has the following characteristics:
+
+- **Flexible Message Routing**: Supports multiple communication modes such as point-to-point, broadcast, and multicast, and can build flexible and complex interaction networks.
+- **Message Persistence**: Can automatically save all messages to databases (such as SQLite, MongoDB), ensuring that the state of long-running tasks can be recovered.
+- **Native Distributed Support**: This is a signature feature of AgentScope. Agents can be deployed on different processes or servers, and `MsgHub` will automatically handle cross-node communication through RPC (Remote Procedure Call), completely transparent to developers.
+
+These engineering capabilities in the underlying architecture give AgentScope an edge over traditional conversation-driven frameworks in complex scenarios that demand high concurrency and high reliability. The trade-off is that developers must understand and adapt to the asynchronous, message-driven programming paradigm.
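+
+The routing modes described in this subsection can be pictured with a toy hub. This is a sketch under stated assumptions, not AgentScope's `MsgHub` implementation: `ToyAgent`, `ToyHub`, `broadcast`, `multicast`, and `send_to` are hypothetical names, and persistence and cross-node transport are deliberately left out.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class ToyAgent:
+    """Hypothetical participant that just records what it observes."""
+    name: str
+    inbox: list = field(default_factory=list)
+
+    def observe(self, sender: str, content: str) -> None:
+        self.inbox.append((sender, content))
+
+
+class ToyHub:
+    """Hypothetical hub: routes messages among registered participants."""
+
+    def __init__(self, participants):
+        self._agents = {agent.name: agent for agent in participants}
+
+    def broadcast(self, sender: str, content: str) -> None:
+        # Broadcast: everyone except the sender observes the message.
+        for name, agent in self._agents.items():
+            if name != sender:
+                agent.observe(sender, content)
+
+    def multicast(self, sender, recipients, content) -> None:
+        # Multicast: only the named subset observes the message.
+        for name in recipients:
+            self._agents[name].observe(sender, content)
+
+    def send_to(self, sender: str, recipient: str, content: str) -> None:
+        # Point-to-point is multicast with a single recipient.
+        self.multicast(sender, [recipient], content)
+
+
+if __name__ == "__main__":
+    a, b, c = ToyAgent("A"), ToyAgent("B"), ToyAgent("C")
+    hub = ToyHub([a, b, c])
+    hub.broadcast("A", "hello all")
+    hub.send_to("B", "C", "psst")
+    print(a.inbox, b.inbox, c.inbox)
+```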
+
+In the next section, we will explore these capabilities through a concrete case, the Three Kingdoms Werewolf game, paying particular attention to how the framework handles concurrent interactions.
+
+### 6.3.2 Three Kingdoms Werewolf Game
+
+To understand AgentScope's message-driven architecture and multi-agent collaboration capabilities in practice, we will build a "Three Kingdoms Werewolf" game that blends in elements of Chinese classical culture. The case shows how AgentScope handles complex multi-agent interactions and, more importantly, how to exploit a message-driven architecture in a scenario that demands **real-time collaboration**, **role-playing**, and **strategic gameplay**.
+
+Unlike traditional Werewolf, our version casts classic characters such as Liu Bei, Guan Yu, and Zhuge Liang as players. Each agent must carry out the basic Werewolf tasks (werewolves killing at night, the seer verifying identities, villagers reasoning about who is lying) while also embodying the personality and behavior of its Three Kingdoms character. This lets us observe how AgentScope copes with **multi-level role modeling**.
+
+(1) Architecture Design and Core Components
+
+The system design of this case follows the principle of layered decoupling, dividing the game logic into three independent levels, each of which maps to one or more core components of AgentScope:
+
+- **Game Control Layer**: A `ThreeKingdomsWerewolfGame` class serves as the main controller of the game, responsible for maintaining global state (such as player survival list, current game stage), advancing the game process (calling night phase, day phase), and judging victory or defeat.
+- **Agent Interaction Layer**: Completely driven by `MsgHub`. All communication between agents, whether it's secret negotiations between werewolves or public debates during the day, is routed and distributed through the message center.
+- **Role Modeling Layer**: Each player is an instance based on `DialogAgent`. Through carefully designed system prompts, we inject each agent with the dual identity of "game role" and "Three Kingdoms personality."
+
+(2) Message-Driven Game Flow
+
+The core design decision in this case is to manage the game flow with **message-driven** interaction rather than a **state machine**. In traditional implementations, game phase transitions are usually controlled by a centralized state machine. In the AgentScope paradigm, the game flow is instead modeled as a series of well-defined message interaction patterns.
+
+For example, the implementation of the werewolf phase is not a simple function call but dynamically creates a temporary, private communication channel that only includes werewolf players through `MsgHub`:
+
+```python
+async def werewolf_phase(self, round_num: int):
+    """Werewolf phase - demonstrating message-driven collaboration mode"""
+    if not self.werewolves:
+        return None
+
+    # Establish werewolf-exclusive communication channel through message center
+    async with MsgHub(
+        self.werewolves,
+        enable_auto_broadcast=True,
+        announcement=await self.moderator.announce(
+            f"Werewolves, please discuss tonight's kill target. Surviving players: {format_player_list(self.alive_players)}"
+        ),
+    ) as werewolves_hub:
+        # Discussion phase: werewolves exchange strategies through messages
+        for _ in range(MAX_DISCUSSION_ROUND):
+            for wolf in self.werewolves:
+                await wolf(structured_model=DiscussionModelCN)
+
+        # Voting phase: collect and count werewolves' kill decisions
+        werewolves_hub.set_auto_broadcast(False)
+        kill_votes = await fanout_pipeline(
+            self.werewolves,
+            msg=await self.moderator.announce("Please choose kill target"),
+            structured_model=WerewolfKillModelCN,
+            enable_gather=False,
+        )
+```
+
+The advantage of this design is that game logic is clearly expressed as "in a specific context, what mode of message exchange to conduct," rather than a series of rigid state transitions. Day discussion (full broadcast), seer verification (point-to-point request), and other phases all follow the same design paradigm.
+
+(3) Constraining Game Rules with Structured Output
+
+A key challenge in Werewolf games is how to ensure that agent behavior conforms to game rules. AgentScope's **structured output mechanism** provides a solution to this problem. We define strict data models for different game behaviors:
+
+```python
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+
+class DiscussionModelCN(BaseModel):
+    """Output format for discussion phase"""
+    reach_agreement: bool = Field(
+        description="Whether consensus has been reached",
+        default=False
+    )
+    confidence_level: int = Field(
+        description="Confidence level in current reasoning (1-10)",
+        ge=1, le=10,
+        default=5
+    )
+    key_evidence: Optional[str] = Field(
+        description="Key evidence supporting your viewpoint",
+        default=None
+    )
+
+class WitchActionModelCN(BaseModel):
+    """Output format for witch action"""
+    use_antidote: bool = Field(description="Whether to use antidote")
+    use_poison: bool = Field(description="Whether to use poison")
+    target_name: Optional[str] = Field(description="Poison target player name")
+```
+
+In this way, we not only ensure **format consistency** of agent output but also move part of the game rules into the data layer. Field types and ranges (for example, `confidence_level` being limited to 1-10) are checked automatically whenever a response is parsed. The excerpt above declares only field-level constraints; cross-field rules, such as the witch not using both antidote and poison on the same target at the same time or the seer verifying only one player per night, additionally require validation logic in the data models or in the game controller.
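+
+One way to encode a cross-field rule such as the witch's potion restriction is a Pydantic model validator. The sketch below assumes Pydantic v2 (`model_validator`); `WitchActionChecked` and the two rules it enforces are illustrative assumptions rather than the case's actual code.
+
+```python
+from typing import Optional
+
+from pydantic import BaseModel, Field, ValidationError, model_validator
+
+
+class WitchActionChecked(BaseModel):
+    """Hypothetical variant of the witch model with an explicit rule check."""
+    use_antidote: bool = Field(description="Whether to use antidote")
+    use_poison: bool = Field(description="Whether to use poison")
+    target_name: Optional[str] = Field(
+        default=None, description="Poison target player name"
+    )
+
+    @model_validator(mode="after")
+    def check_potion_rules(self):
+        # Assumed rule: antidote and poison are mutually exclusive.
+        if self.use_antidote and self.use_poison:
+            raise ValueError("antidote and poison cannot both be used")
+        # Assumed rule: poison needs a named target.
+        if self.use_poison and not self.target_name:
+            raise ValueError("poison requires target_name")
+        return self
+
+
+if __name__ == "__main__":
+    print(WitchActionChecked(use_antidote=True, use_poison=False))
+    try:
+        WitchActionChecked(use_antidote=True, use_poison=True, target_name="Cao Cao")
+    except ValidationError as err:
+        print("rejected:", err.errors()[0]["msg"])
+```
+
+A rejected response surfaces as a `ValidationError`, so the game controller can ask the agent to answer again instead of acting on an illegal move.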
+
+(4) Dual Challenge of Role Modeling
+
+In this case, the most interesting technical challenge is how to make agents play two levels of roles well at the same time: **game functional role** (werewolf, seer, etc.) and **cultural personality role** (Liu Bei, Cao Cao, etc.). We solve this problem through prompt engineering:
+
+```python
+def get_role_prompt(role: str, character: str) -> str:
+    """Get role prompt - integrating game rules and character personality"""
+    base_prompt = f"""You are {character}, playing {role} in this Three Kingdoms Werewolf game.
+
+Important rules:
+1. You can only participate in the game through dialogue and reasoning
+2. Do not attempt to call any external tools or functions
+3. Strictly reply in the required JSON format
+
+Role characteristics:
+"""
+
+    if role == "Werewolf":
+        return base_prompt + f"""
+- You are in the werewolf camp, with the goal of eliminating all good people
+- At night, you can negotiate with other werewolves on kill targets
+- During the day, you must hide your identity and mislead good people
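+- Speak and act with {character}'s personality
+"""
+    # Other roles (seer, witch, villager, ...) would add their own rules here;
+    # fall back to the shared prompt so the function never returns None.
+    return base_prompt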
+```
+
+This design allows us to observe an interesting phenomenon: different Three Kingdoms characters, when playing the same game role, will exhibit completely different strategies and speech styles. For example, "Cao Cao" playing a werewolf may appear more cunning and good at disguise, while "Zhang Fei" playing a werewolf may appear more direct and impulsive.
+
+(5) Concurrent Processing and Fault Tolerance Mechanism
+
+AgentScope's asynchronous architecture plays an important role in this multi-agent game. The game often has scenarios that require **simultaneously collecting decisions from multiple agents**, such as the voting phase:
+
+```python
+# Collect one voting decision from every surviving player
+vote_msgs = await fanout_pipeline(
+    self.alive_players,
+    await self.moderator.announce("Please vote to choose the player to eliminate"),
+    structured_model=get_vote_model_cn(self.alive_players),
+    enable_gather=False,
+)
+```
+
+`fanout_pipeline` sends the same message to every agent and collects their individual responses, which mirrors the "simultaneous voting" scenario in real Werewolf games. Its `enable_gather` flag controls scheduling: when it is `True`, the agents are invoked concurrently via `asyncio.gather()`, whereas with `enable_gather=False`, as in the call above, they are invoked one after another with the same input. At the same time, we add fault tolerance handling at key points:
+
+```python
+try:
+    response = await wolf(
+        "Please analyze the current situation and express your viewpoint.",
+        structured_model=DiscussionModelCN
+    )
+except Exception as e:
+    print(f"⚠️ {wolf.name} error during discussion: {e}")
+    # Create default response to ensure game continues
+    default_response = DiscussionModelCN(
+        reach_agreement=False,
+        confidence_level=5,
+        key_evidence="Unable to analyze temporarily"
+    )
+```
+
+This design ensures that even if an agent encounters an exception, the entire game process can continue.
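+
+The fan-out and fallback ideas in this subsection can be combined in a small, framework-independent sketch. It uses only the standard library; `ask_with_fallback`, `fan_out`, `make_voter`, and `tally` are hypothetical helpers, and the "agents" are plain coroutines standing in for LLM-backed players.
+
+```python
+import asyncio
+from collections import Counter
+
+
+async def ask_with_fallback(agent, prompt: str, fallback: str) -> str:
+    """Call one agent; substitute a default answer if it raises."""
+    try:
+        return await agent(prompt)
+    except Exception as exc:  # broad on purpose: keep the game moving
+        print(f"agent failed ({exc}); using fallback")
+        return fallback
+
+
+async def fan_out(agents, prompt: str, fallback: str, concurrent: bool = True):
+    """Send the same prompt to every agent and collect one answer each."""
+    if concurrent:
+        calls = (ask_with_fallback(a, prompt, fallback) for a in agents)
+        return list(await asyncio.gather(*calls))
+    # Sequential variant: same inputs, one agent at a time.
+    return [await ask_with_fallback(a, prompt, fallback) for a in agents]
+
+
+def make_voter(choice: str, fail: bool = False):
+    """Hypothetical stand-in for an LLM-backed player."""
+    async def voter(prompt: str) -> str:
+        await asyncio.sleep(0)  # yield control, as a real model call would
+        if fail:
+            raise RuntimeError("model timeout")
+        return choice
+    return voter
+
+
+def tally(votes):
+    """Return the most-voted name, ignoring abstentions."""
+    counts = Counter(v for v in votes if v != "abstain")
+    return counts.most_common(1)[0][0] if counts else None
+
+
+if __name__ == "__main__":
+    players = [make_voter("Sun Quan"), make_voter("Cao Cao"),
+               make_voter("Sun Quan"), make_voter("", fail=True)]
+    votes = asyncio.run(fan_out(players, "Please vote", fallback="abstain"))
+    print(votes, "->", tally(votes))
+```
+
+Wrapping each call individually, rather than the whole fan-out, means one failing player costs only its own vote.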
+
+(6) Case Output and Summary
+
+To show AgentScope's operating mechanism more concretely, the following is an excerpt from a real run of the game's first round. It covers the night phase, in which the two werewolf agents playing "Sun Quan" and "Zhou Yu" negotiate in secret and carry out a kill, followed by the day discussion and vote.
+
+```
+🎮 Welcome to Three Kingdoms Werewolf!
+
+=== Game Initialization ===
+Game Moderator: 📢 【Sun Quan】You are playing a werewolf in this Three Kingdoms Werewolf game, your character is Sun Quan. You can kill a player at night
+Game Moderator: 📢 【Zhou Yu】You are playing a werewolf in this Three Kingdoms Werewolf game, your character is Zhou Yu. You can kill a player at night
+...
+
+Game Moderator: 📢 Three Kingdoms Werewolf game begins! Participants: Sun Quan, Zhou Yu, Cao Cao, Zhang Fei, Sima Yi, Zhao Yun
+✅ Game setup complete, 6 players in total
+
+=== Round 1 ===
+🌙 Night 1 falls, everyone close your eyes...
+
+【Werewolf Phase】
+Game Moderator: 📢 🐺 Werewolves please open your eyes, choose tonight's kill target...
+Game Moderator: 📢 Werewolves, please discuss tonight's kill target. Surviving players: Sun Quan, Zhou Yu, Cao Cao, Zhang Fei, Sima Yi, Zhao Yun
+
+Sun Quan: Tonight we should eliminate Zhou Yu, this person is extremely intelligent and poses a great threat to us.
+Zhou Yu: Sun Quan, you make a good point. But although Zhou Yu is intelligent, he may not be the biggest threat tonight. Cao Cao has great power, if we don't eliminate him early, it may be unfavorable to us.
+Sun Quan: Cao Cao is indeed a threat, but if Zhou Yu lives, he can see through our schemes. Better to solve the immediate hidden danger first.
+Zhou Yu: Sun Quan, your concern is not without reason. But if Cao Cao becomes our enemy, he can unite other forces to pose a greater threat to us.
+Sun Quan: What you say also makes sense, Cao Cao's alliance is indeed troublesome. Then let's deal with Cao Cao first.
+Zhou Yu: Very good, Sun Quan. Cao Cao is our primary target tonight.
+
+Game Moderator: 📢 Please choose kill target
+Sun Quan: I agree, Cao Cao must be eliminated.
+Zhou Yu: I agree, Cao Cao is the target we need to solve tonight.
+
+【Seer Phase】
+Game Moderator: 📢 🔮 Seer please open your eyes, choose the player to verify...
+Cao Cao: I want to verify Sun Quan.
+Game Moderator: 📢 Verification result: Sun Quan is a werewolf
+
+【Witch Phase】
+Game Moderator: 📢 🧙‍♀️ Witch please open your eyes...
+Game Moderator: 📢 Tonight Cao Cao was killed by werewolves
+Zhang Fei: I used the antidote to save Cao Cao last night, now the antidote has been used up.
+Game Moderator: 📢 You used the antidote to save Cao Cao
+
+Game Moderator: 📢 Last night was peaceful, no one died.
+
+【Day Discussion Phase】
+Game Moderator: 📢 ☀️ Day 1 dawns, everyone please open your eyes...
+Game Moderator: 📢 Now begin free discussion. Surviving players: Sun Quan, Zhou Yu, Cao Cao, Zhang Fei, Sima Yi, Zhao Yun
+
+Sun Quan: Everyone, Cao Cao has great power and is a potential threat to us all. Tonight I suggest we concentrate our forces against him.
+Zhou Yu: Sun Quan is absolutely right, Cao Cao is not only powerful himself but may also ally with other players, posing a greater threat to us.
+Cao Cao: I verified Sun Quan last night, originally thought he was a good person, but the game moderator's result was werewolf. This means there's a werewolf lying.
+Zhang Fei: I did save Cao Cao last night, indicating he was attacked by werewolves. But Cao Cao's verification result of Sun Quan is suspicious.
+Sima Yi: There seems to be a contradiction between Cao Cao's verification result and Zhang Fei's saving action, we need more information to determine who is the werewolf.
+Zhao Yun: The situation is indeed complex, we need to carefully analyze everyone's statements.
+
+【Voting Phase】
+Game Moderator: 📢 Please vote to choose the player to eliminate
+
+Sun Quan: Cao Cao's threat is still great, I insist we should vote for him.
+Zhou Yu: Based on last night's situation, I think we should vote for Cao Cao first, his threat is the greatest.
+Cao Cao: I choose to vote for Sun Quan. According to the game moderator's feedback, Sun Quan is indeed a werewolf.
+Zhang Fei: I insist on the fact that I saved Cao Cao last night, but the fact that Sun Quan was found to be a werewolf confuses me.
+Sima Yi: We need more information to determine who is the werewolf.
+
+[Game continues...]
+```
+
+### 6.3.3 Analysis of AgentScope's Advantages and Limitations
+
+This "Three Kingdoms Werewolf" case highlights the core strengths of the AgentScope framework. With its message-driven architecture at the center, the framework maps a complex game flow onto a series of asynchronous message-passing events, avoiding the rigidity and complexity of a traditional state machine. Combined with structured output, game rules become code-level constraints, which improves the system's stability and predictability. The same design supports concurrency natively and, with fault-tolerance handling in place, keeps the overall process running even when a single agent raises an exception.
+
+However, these engineering advantages come at a cost in complexity. The message-driven architecture is powerful, but it asks more of developers, who need to understand asynchronous programming, distributed communication, and related concepts. For simple multi-agent conversation scenarios, it can be overkill and carries a risk of "over-engineering." In addition, as a relatively new framework, its ecosystem and community resources are still maturing. AgentScope is therefore better suited to large-scale, highly reliable, production-grade multi-agent systems, while a more lightweight framework may be the better choice for rapid prototyping or simple applications.
+
+
+
+## 6.4 Framework Three: CAMEL
+
+Unlike comprehensive frameworks such as AutoGen and AgentScope, CAMEL was originally designed to explore how two agents can autonomously collaborate on complex tasks through "role-playing" with minimal human intervention.
+
+### 6.4.1 Autonomous Collaboration in CAMEL
+
+The cornerstone of CAMEL's autonomous collaboration is two core concepts: **Role-Playing** and **Inception Prompting**.
+
+(1) Role-Playing
+
+In CAMEL's original design, a task is usually completed by two agents collaborating. These two agents are assigned complementary, clearly defined "roles." One plays the **"AI User"**, responsible for proposing requirements, issuing instructions, and conceiving task steps; the other plays the **"AI Assistant"**, responsible for executing specific operations and providing solutions based on instructions.
+
+For example, in a task to "develop a stock trading strategy analysis tool":
+
+- The **AI User** role might be a "senior stock trader." It understands the market and strategies but doesn't understand programming.
+- The **AI Assistant** role is an "excellent Python programmer." It is proficient in programming but knows nothing about stock trading.
+
+Through this setup, the task-solving process is naturally transformed into a conversation between two "cross-domain experts." The trader proposes professional requirements, the programmer transforms them into code implementation, and the two collaborate to complete complex tasks that neither could accomplish independently.
+
+(2) Inception Prompting
+
+Simply setting roles is not enough. How can we ensure that two AIs can always "stay in their roles" and efficiently move toward a common goal without continuous human supervision? This is where CAMEL's core technology, inception prompting, comes into play. "Inception prompting" is a carefully designed, structured initial instruction (System Prompt) injected into both agents before the conversation begins. This instruction is like an "action program" implanted in the agents, and it usually includes the following key parts:
+
+- **Clarify own role**: For example, "You are a senior stock trader..."
+- **Inform collaborator's role**: For example, "You are working with an excellent Python programmer..."
+- **Define common goal**: For example, "Your common goal is to develop a stock trading strategy analysis tool."
+- **Set behavioral constraints and communication protocols**: This is the most critical part. For example, the instruction requires the AI user to "propose only one clear, specific step at a time" and the AI assistant to "not ask for more details before completing the previous step," and it specifies a marker (such as `<CAMEL_TASK_DONE>`) that signals task completion.
+
+These constraints ensure that the conversation does not deviate from the topic or fall into ineffective loops but advances in a highly structured, task-driven manner, as shown in Figure 6.3.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/6-figures/04.png" alt="" width="90%"/>
+  <p>Figure 6.3 CAMEL Creating Stock Trading Robot</p>
+</div>
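+
+The four parts of an inception prompt described above can be assembled mechanically. The following is a minimal sketch: `build_inception_prompts` is a hypothetical helper, and the wording it produces is illustrative only, not CAMEL's actual prompt text.
+
+```python
+def build_inception_prompts(task: str, user_role: str, assistant_role: str,
+                            done_marker: str = "<CAMEL_TASK_DONE>") -> dict:
+    """Assemble one system prompt per agent from the four parts listed above.
+
+    The wording is illustrative only; it is not CAMEL's actual prompt text.
+    """
+    shared_goal = f"Your common goal is: {task}"
+    user_prompt = "\n".join([
+        f"You are a {user_role}.",                        # own role
+        f"You are working with a {assistant_role}.",      # collaborator's role
+        shared_goal,                                      # common goal
+        "Give exactly one clear, specific instruction per turn.",
+        f"When the task is finished, reply only with {done_marker}.",
+    ])
+    assistant_prompt = "\n".join([
+        f"You are a {assistant_role}.",
+        f"You are working with a {user_role}.",
+        shared_goal,
+        "Complete the current instruction before asking for anything else.",
+        "Never swap roles or issue instructions yourself.",
+    ])
+    return {"user": user_prompt, "assistant": assistant_prompt}
+
+
+if __name__ == "__main__":
+    prompts = build_inception_prompts(
+        task="develop a stock trading strategy analysis tool",
+        user_role="senior stock trader",
+        assistant_role="Python programmer",
+    )
+    print(prompts["user"])
+    print("---")
+    print(prompts["assistant"])
+```
+
+Both prompts name the same goal but give each side different constraints, which is what keeps the two agents from drifting out of their roles.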
+
+In the next section, we will experience this process through a specific example.
+
+### 6.4.2 AI Popular Science E-book
+
+To understand CAMEL framework's role-playing capabilities, we will build a practical collaborative case: having an AI psychologist work with an AI author to co-create a short e-book on "The Psychology of Procrastination." This case embodies CAMEL's core advantage of allowing two agents to leverage their respective professional domains to collaboratively complete complex creative tasks that a single agent would struggle with.
+
+(1) Task Setup
+
+**Scenario Setup**: Create a popular science e-book on the psychology of procrastination for general readers, requiring both scientific rigor and good readability.
+
+**Agent Roles**:
+
+- **Psychologist**: Possesses deep theoretical foundation in psychology, familiar with cognitive behavioral science, neuroscience, and other related fields, able to provide professional academic insights and empirical research support
+- **Writer**: Has excellent writing skills and narrative ability, good at transforming complex academic concepts into vivid and easy-to-understand text, focusing on reader experience and content readability
+
+(2) Define Collaboration Task
+
+First, we need to clarify the common goal of the two AI experts. We define this task through a detailed string `task_prompt`.
+
+```python
+from colorama import Fore
+from camel.societies import RolePlaying
+from camel.utils import print_text_animated
+
+# Define collaboration task
+task_prompt = """
+Create a short e-book on "The Psychology of Procrastination" for general readers interested in psychology.
+Requirements:
+1. Content should be scientifically rigorous, based on empirical research
+2. Language should be easy to understand, avoiding excessive professional terminology
+3. Include practical improvement suggestions and case analysis
+4. Length controlled at 8000-10000 words
+5. Clear structure, including introduction, core chapters, and summary
+"""
+
+print(Fore.YELLOW + f"Collaboration task:\n{task_prompt}\n")
+```
+
+`task_prompt` is the "task specification" for the entire collaboration. It is not only the goal we want to achieve but will also be used behind the scenes by CAMEL to generate "inception prompts," ensuring that the conversation between the two agents always revolves around this core goal.
+
+(3) Initialize Role-Playing "Society"
+
+Next, we create a `RolePlaying` session instance. This is CAMEL's core operation, which quickly builds a two-agent collaboration "society" based on the roles and tasks we provide.
+
+```python
+# Initialize role-playing session
+# AI writer as "user", responsible for proposing writing structure and requirements
+# AI psychologist as "assistant", responsible for providing professional knowledge and content
+role_play_session = RolePlaying(
+    assistant_role_name="Psychologist",
+    user_role_name="Writer",
+    task_prompt=task_prompt,
+    with_task_specify=False, # In this example, we directly use the given task_prompt
+)
+
+print(Fore.CYAN + f"Specific task description:\n{role_play_session.task_prompt}\n")
+```
+
+`RolePlaying` is a high-level API provided by CAMEL that encapsulates complex prompt engineering. We only need to pass in the names of the two roles and the task. In CAMEL's design, the `user` role is the "driver" and "demander" of the conversation, while the `assistant` role is the "executor" and "solution provider." Therefore, we assign the "writer" responsible for planning structure to `user_role_name` and the "psychologist" responsible for providing professional knowledge to `assistant_role_name`.
+
+(4) Start and Run Automated Conversation
+
+Finally, we write a loop to drive the entire conversation process, allowing the two AI experts to begin their automated collaboration.
+
+```python
+# Start collaboration conversation
+chat_turn_limit, n = 30, 0
+# Call init_chat() to get the initial conversation message generated by AI
+input_msg = role_play_session.init_chat()
+
+while n < chat_turn_limit:
+    n += 1
+    # step() method drives a complete round of conversation, AI user and AI assistant each speak once
+    assistant_response, user_response = role_play_session.step(input_msg)
+
+    # Stop if either agent has terminated or returned no message
+    if assistant_response.terminated or user_response.terminated:
+        break
+    if assistant_response.msg is None or user_response.msg is None:
+        break
+
+    print_text_animated(Fore.BLUE + f"Writer (AI User):\n\n{user_response.msg.content}\n")
+    print_text_animated(Fore.GREEN + f"Psychologist (AI Assistant):\n\n{assistant_response.msg.content}\n")
+
+    # Check task completion flag
+    if "<CAMEL_TASK_DONE>" in user_response.msg.content or "<CAMEL_TASK_DONE>" in assistant_response.msg.content:
+        print(Fore.MAGENTA + "✅ E-book creation completed!")
+        break
+
+    # Use assistant's reply as input for next round of conversation
+    input_msg = assistant_response.msg
+
+print(Fore.YELLOW + f"Total of {n} rounds of collaborative conversation")
+```
+
+This `while` loop is the core of automated collaboration. The conversation is automatically initiated by the `init_chat()` method based on the task and roles, without the need to manually write an opening. Each step of the loop drives a complete round of interaction by calling `step()` (writer proposes requirements, psychologist provides content), and uses the psychologist's output from the previous round as input for the next round, forming a chain of creation. The entire process will continue until the preset conversation turn limit is reached, or automatically terminates after either agent outputs the task completion flag `<CAMEL_TASK_DONE>`.
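+
+The control flow of this loop can be exercised without any model calls. The sketch below replaces the CAMEL session with a hypothetical `StubSession` whose `step()` returns plain strings instead of response objects; `StubSession` and `run_collaboration` are illustrative names, not part of CAMEL.
+
+```python
+DONE = "<CAMEL_TASK_DONE>"
+
+
+class StubSession:
+    """Hypothetical stand-in for a two-agent session; no LLM is involved."""
+
+    def __init__(self, outline):
+        self._todo = list(outline)
+
+    def init_chat(self) -> str:
+        return "Let's begin."
+
+    def step(self, input_msg: str):
+        # The "user" issues the next instruction, or signals completion
+        # once the outline is exhausted.
+        if not self._todo:
+            return None, DONE
+        instruction = f"Write the section: {self._todo.pop(0)}"
+        # The "assistant" answers that instruction.
+        solution = f"Solution for [{instruction}]. Next request."
+        return solution, instruction
+
+
+def run_collaboration(session, turn_limit: int = 30):
+    transcript = []
+    input_msg = session.init_chat()
+    for turn in range(1, turn_limit + 1):
+        assistant_msg, user_msg = session.step(input_msg)
+        if DONE in user_msg:              # completion marker ends the loop
+            return transcript, turn
+        transcript.append((user_msg, assistant_msg))
+        input_msg = assistant_msg         # feed the reply into the next round
+    return transcript, turn_limit         # turn budget exhausted
+
+
+if __name__ == "__main__":
+    transcript, turns = run_collaboration(StubSession(["Introduction", "Causes"]))
+    for user_msg, assistant_msg in transcript:
+        print("Writer:", user_msg)
+        print("Psychologist:", assistant_msg)
+    print("turns used:", turns)
+```
+
+The two exit paths, the completion marker and the turn limit, are the same ones the real loop relies on.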
+
+(5) Collaboration Process Demonstration
+
+When executing the above code, we don't just get a long string of monotonous Q&A but can observe a highly structured collaboration process, like a human expert team, automatically proceeding. The entire creation process naturally divides into several stages:
+
+**Stage 1 (approximately rounds 1-5): Framework Building and Goal Alignment** In the early stages of the conversation, the "writer" agent first plays the leading role, proposing initial ideas for the overall structure and chapter arrangement of the e-book. Subsequently, the "psychologist" reviews and supplements this framework from a professional perspective, ensuring that core academic modules (such as theoretical foundations, key concepts, etc.) are not omitted, thereby reaching consensus on the final output at the beginning of collaboration.
+
+**Stage 2 (approximately rounds 6-20): Core Content Generation and Knowledge Translation** This is the most efficient content creation stage. The collaboration mode becomes a stable "request-response" loop:
+
+- **Psychologist**: Responsible for providing "hardcore" professional knowledge, such as scientific explanations of core concepts like "temporal discounting theory" and "executive function deficits," and citing relevant experimental research to support viewpoints.
+- **Writer**: Plays the role of "translator," transforming these rigorous but potentially obscure academic concepts into vivid, figurative metaphors and life-related cases. For example, it might compare the concept of "present bias in the brain" to "a willful child who only cares about immediate candy and not long-term health."
+
+**Stage 3 (approximately rounds 21-25): Iterative Optimization and Quality Assurance** When the main content of the book is completed, the focus of the conversation shifts to polishing and improving the existing text. At this time, the roles of the two agents undergo subtle changes:
+
+- **Writer**: More focused on examining the overall fluency, logical coherence, and language style of the article, proposing revision suggestions from the perspective of "reader experience."
+- **Psychologist**: Again plays the role of "fact checker," ensuring that the scientific accuracy of core knowledge is not lost during translation and polishing, and supplementing certain viewpoints with more powerful empirical research support.
+
+**Stage 4 (Conclusion): Summary and Elevation** In the last few rounds of conversation, both parties collaborate to complete the summary of practical suggestions and the review of the entire book, ensuring that the e-book has a clear, powerful ending that leaves a deep impression on readers and provides practical value.
+
+```
+Collaboration task:
+Create a short e-book on "The Psychology of Procrastination" for general readers interested in psychology.
+Requirements:
+1. Content should be scientifically rigorous, based on empirical research
+2. Language should be easy to understand, avoiding excessive professional terminology
+3. Include practical improvement suggestions and case analysis
+4. Length controlled at 8000-10000 words
+5. Clear structure, including introduction, core chapters, and summary
+
+Specific task description:
+Write an 8000–10000 word short e-book "The Psychology of Procrastination" for general readers: empirically based, easy to understand. Structure: introduction, causes (cognitive/emotional/reward), motivation and decision-making, habit formation and intervention, practical strategies and exercises, three case analyses, summary and resources. Each chapter contains research citations and actionable steps.
+
+Writer:
+Instruction: Please write a 400–600 word Chinese draft for the "Introduction" chapter of the e-book...
+Input: None
+
+Psychologist:
+Solution:
+Draft: Procrastination refers to the behavior and internal tendency of repeatedly postponing or avoiding a task despite knowing it should be completed. It can be an occasional time management problem...
+
+Next request.
+
+Writer:
+Instruction: Please revise the following introduction draft into a 450–550 word Chinese text...
+Input: Draft: Procrastination refers to the behavior and internal tendency of repeatedly postponing or avoiding a task...
+.....
+```
+
+### 6.4.3 Analysis of CAMEL's Advantages and Limitations
+
+Through the previous e-book creation case, we deeply experienced CAMEL framework's unique role-playing paradigm. Now let's objectively analyze the advantages and limitations of this design philosophy to make wise technical choices in actual projects.
+
+(1) Advantages
+
+CAMEL's greatest advantage lies in its "light architecture, heavy prompting" design philosophy. Compared to AutoGen's complex conversation management and AgentScope's distributed architecture, CAMEL can achieve high-quality agent collaboration through carefully designed initial prompts. This naturally emergent collaborative behavior is often more flexible and efficient than hard-coded workflows.
+
+It's worth noting that the CAMEL framework is undergoing rapid development and evolution. From its [GitHub repository](https://github.com/camel-ai/camel), we can see that CAMEL is far more than a simple two-agent collaboration framework and currently has:
+
+- **Multimodal Capabilities**: Supports agent collaboration in multiple modalities such as text, image, and audio
+- **Tool Integration**: Built-in rich tool library, including search, calculation, code execution, etc.
+- **Model Adaptation**: Supports multiple LLM backends such as OpenAI, Anthropic, Google, and open-source models
+- **Ecosystem Linkage**: Achieved interoperability with mainstream frameworks such as LangChain, CrewAI, and AutoGen
+
+(2) Main Limitations
+
+1. High Dependence on Prompt Engineering
+
+CAMEL's success largely depends on the quality of initial prompts. This brings several challenges:
+
+- **Prompt Design Threshold**: Requires deep understanding of the target domain and LLM behavioral characteristics
+- **Debugging Complexity**: When collaboration is ineffective, it's difficult to pinpoint whether the problem lies in role definition, task description, or interaction rules
+- **Consistency Challenge**: Different LLMs may have different understandings of the same prompt
+
+2. Collaboration Scale Limitations
+
+Although CAMEL performs excellently in two-agent collaboration, it faces challenges when handling large-scale multi-agent scenarios:
+
+- **Conversation Management**: Lacks complex conversation routing mechanisms like AutoGen
+- **State Synchronization**: Doesn't have distributed state management capabilities like AgentScope
+- **Conflict Resolution**: Lacks effective arbitration mechanisms when multiple agents disagree
+
+3. Task Applicability Boundaries
+
+CAMEL is particularly suitable for tasks requiring deep collaboration and creative thinking, but may not be the optimal choice in certain scenarios:
+
+- **Strict Process Control**: For tasks requiring precise step control, LangGraph's graph structure is more suitable
+- **Large-scale Concurrency**: AgentScope's message-driven architecture has more advantages in high-concurrency scenarios
+- **Complex Decision Trees**: AutoGen's group chat mode is more flexible in multi-party decision scenarios
+
+Overall, CAMEL represents a unique and elegant multi-agent collaboration paradigm. Through its "human-centered" role-playing design, it transforms complex system engineering problems into intuitive interpersonal collaboration patterns. As its ecosystem continues to improve and functions continue to expand, CAMEL is becoming one of the important choices for building intelligent collaboration systems.
+
+## 6.5 Framework Four: LangGraph
+
+### 6.5.1 LangGraph Structure Overview
+
+LangGraph, as an important extension of the LangChain ecosystem, represents a different direction in agent framework design. Unlike the "conversation"-based frameworks introduced earlier (such as AutoGen and CAMEL), LangGraph models the agent's execution flow as a **State Machine** and represents it as a **Directed Graph**. In this paradigm, the graph's **Nodes** represent specific computational steps (such as calling an LLM or executing a tool), while **Edges** define the transition logic from one node to another. The key feature of this design is its native support for loops, which makes it straightforward to build complex agent workflows capable of iteration, reflection, and self-correction.
+
+To understand LangGraph, we need to first grasp its three basic components.
+
+**First is the global state (State)**. The entire graph's execution process revolves around a shared state object. This state is usually defined as a Python `TypedDict`, which can contain any information you need to track, such as conversation history, intermediate results, or an iteration count. All nodes can read and update this central state.
+
+```python
+from typing import TypedDict, List
+
+# Define global state data structure
+class AgentState(TypedDict):
+    messages: List[str]      # Conversation history
+    current_task: str        # Current task
+    final_answer: str        # Final answer
+    # ... any other state to track
+```
+
+**Second are the nodes (Nodes)**. Each node is a Python function that receives the current state as input and returns the updated state (or just the fields it changed) as output. Nodes are the units that perform specific work.
+
+```python
+# Define a "planner" node function
+def planner_node(state: AgentState) -> AgentState:
+    """Formulate a plan based on current task and update state."""
+    current_task = state["current_task"]
+    # ... call LLM to generate plan ...
+    plan = f"Plan generated for task '{current_task}'..."
+
+    # Append new message to state
+    state["messages"].append(plan)
+    return state
+
+# Define an "executor" node function
+def executor_node(state: AgentState) -> AgentState:
+    """Execute latest plan and update state."""
+    latest_plan = state["messages"][-1]
+    # ... execute plan and get result ...
+    result = f"Result of executing plan '{latest_plan}'..."
+
+    state["messages"].append(result)
+    return state
+```
+
+**Finally, there are the edges (Edges)**. Edges connect nodes and define the direction of the workflow. The simplest edge is a regular edge, which specifies that the output of one node always flows to another fixed node. LangGraph's most powerful feature is the **Conditional Edge**, which uses a function to inspect the current state and dynamically decide which node to run next. This is the key to implementing loops and complex logical branches.
+
+```python
+def should_continue(state: AgentState) -> str:
+    """Condition function: decide next route based on state."""
+    # A routing function should only read the state. Writes made here are not
+    # recorded as state updates, so set fields such as final_answer in a node.
+    # Assume that with fewer than 3 messages we need to continue planning
+    if len(state["messages"]) < 3:
+        # Returned string must match a key in the conditional edge's route mapping
+        return "continue_to_planner"
+    return "end_workflow"
+```
+
+After defining state, nodes, and edges, we can assemble them into an executable workflow like building blocks.
+
+```python
+from langgraph.graph import StateGraph, END
+
+# Initialize a state graph and bind our defined state structure
+workflow = StateGraph(AgentState)
+
+# Add node functions to the graph
+workflow.add_node("planner", planner_node)
+workflow.add_node("executor", executor_node)
+
+# Set graph entry point
+workflow.set_entry_point("planner")
+
+# Add regular edge, connecting planner and executor
+workflow.add_edge("planner", "executor")
+
+# Add conditional edge, implementing dynamic routing
+workflow.add_conditional_edges(
+    # Starting node
+    "executor",
+    # Judgment function
+    should_continue,
+    # Route mapping: map judgment function's return value to target node
+    {
+        "continue_to_planner": "planner", # If returns "continue_to_planner", jump back to planner node
+        "end_workflow": END               # If returns "end_workflow", end process
+    }
+)
+
+# Compile graph, generate executable application
+app = workflow.compile()
+
+# Run graph
+inputs = {"current_task": "Analyze recent AI industry news", "messages": []}
+for event in app.stream(inputs):
+    print(event)
+```
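
To make the control flow of the assembled graph concrete, the following is a plain-Python sketch of the same planner/executor loop. It does not use LangGraph and is not how LangGraph is implemented; the `run` driver, the `NODES`, `EDGES`, and `CONDITIONAL` tables, the `END` sentinel, and the `max_steps` guard are all hypothetical names introduced only for illustration.

```python
# Illustrative stand-in for a compiled graph; not LangGraph's implementation.
END = "__end__"  # sentinel used only in this sketch


def planner_node(state: dict) -> dict:
    """Return a copy of the state with a new plan appended."""
    plan = f"Plan for '{state['current_task']}'"
    return {**state, "messages": state["messages"] + [plan]}


def executor_node(state: dict) -> dict:
    """Return a copy of the state with the execution result appended."""
    result = f"Result of '{state['messages'][-1]}'"
    return {**state, "messages": state["messages"] + [result]}


def should_continue(state: dict) -> str:
    """Read-only routing decision, mirroring the condition function above."""
    return "continue_to_planner" if len(state["messages"]) < 3 else "end_workflow"


NODES = {"planner": planner_node, "executor": executor_node}
EDGES = {"planner": "executor"}  # regular edges
CONDITIONAL = {  # conditional edges: (router, route mapping)
    "executor": (
        should_continue,
        {"continue_to_planner": "planner", "end_workflow": END},
    )
}


def run(state: dict, entry: str = "planner", max_steps: int = 20) -> dict:
    """Walk the graph node by node until END is reached."""
    current = entry
    for _ in range(max_steps):
        state = NODES[current](state)
        if current in CONDITIONAL:
            router, mapping = CONDITIONAL[current]
            current = mapping[router(state)]
        else:
            current = EDGES[current]
        if current == END:
            return state
    raise RuntimeError("step limit reached without hitting END")


final_state = run({"current_task": "Analyze recent AI industry news", "messages": []})
for message in final_state["messages"]:
    print(message)
```

With the threshold of 3 messages, the loop visits planner and executor twice before the router returns `"end_workflow"`. The step limit is there so that a routing mistake fails loudly instead of looping forever.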
+
+### 6.5.2 Three-Step Q&A Assistant
+
+After understanding LangGraph's core concepts, we will consolidate what we've learned through a practical case. We will build a simplified Q&A dialogue assistant that follows a clear, fixed three-step process to answer user questions:
+
+1. **Understand**: First, analyze the user's query intent.
+2. **Search**: Then, simulate searching for information related to the intent.
+3. **Answer**: Finally, generate the final answer based on the intent and searched information.
+
+This case will clearly demonstrate how to define state, create nodes, and linearly connect them into a complete workflow. We will break down the code into four core steps: define state, create nodes, build graph, and run application.
+
+(1) Define Global State
+
+First, we need to define a global state that runs through the entire workflow. **This is a shared data structure that is passed between each node of the graph, serving as the persistent context of the workflow.** Each node can read data from this structure and update it.
+
+```python
+from typing import TypedDict, Annotated
+from langgraph.graph.message import add_messages
+
+class SearchState(TypedDict):
+    messages: Annotated[list, add_messages]
+    user_query: str      # User requirement summary after LLM understanding
+    search_query: str    # Optimized search query for Tavily API
+    search_results: str  # Results returned by Tavily search
+    final_answer: str    # Final generated answer
+    step: str            # Mark current step
+```
+
+We created the `SearchState` `TypedDict`, defining a clear data schema for the state object. A key design is the inclusion of both `user_query` and `search_query` fields. This allows the agent to first rewrite the user's natural language question into refined keywords better suited to search engines, which tends to improve the quality of search results.
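
The `Annotated[list, add_messages]` annotation attaches a reducer to the `messages` field, so updates to that field are combined with the existing value instead of overwriting it. The sketch below illustrates the idea in plain Python, using `operator.add` as a stand-in reducer. `DemoState` and `apply_update` are hypothetical names for this illustration only; the real `add_messages` does more than concatenate lists, for example when handling message objects.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    messages: Annotated[list, operator.add]  # reducer: concatenate lists
    step: str                                # no reducer: overwrite


def apply_update(state: dict, update: dict, schema=DemoState) -> dict:
    """Merge a node's partial update into the state, honoring reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated types expose their extra arguments via __metadata__
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["hello"], "step": "start"}
new_state = apply_update(state, {"messages": ["hi there"], "step": "understood"})
print(new_state)
```

This is why the nodes in this case can return only the new message in `messages`: the reducer takes care of appending it to the history.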
+
+(2) Define Workflow Nodes
+
+After defining the state structure, the next step is to create the various nodes that make up our workflow. In LangGraph, each node is a Python function that performs a specific task. These functions receive the current state object as input and return a dictionary containing updated fields.
+
+Before defining nodes, we first complete the project initialization setup, including loading environment variables and instantiating the large language model.
+
+```python
+import os
+from dotenv import load_dotenv
+from langchain_openai import ChatOpenAI
+from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
+from tavily import TavilyClient
+
+# Load environment variables from .env file
+load_dotenv()
+
+# Initialize model
+# We will use this llm instance to drive the intelligence of all nodes
+llm = ChatOpenAI(
+    model=os.getenv("LLM_MODEL_ID", "gpt-4o-mini"),
+    api_key=os.getenv("LLM_API_KEY"),
+    base_url=os.getenv("LLM_BASE_URL", "https://api.openai.com/v1"),
+    temperature=0.7
+)
+# Initialize Tavily client
+tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
+```
+
+Now, let's create the three core nodes one by one.
+
+(1) Understand and Query Node
+
+This node is the first step of the workflow. Its responsibility is to understand user intent and generate an optimized search query for it.
+
+```python
+def understand_query_node(state: SearchState) -> dict:
+    """Step 1: Understand user query and generate search keywords"""
+    user_message = state["messages"][-1].content
+
+    understand_prompt = f"""Analyze the user's query: "{user_message}"
+Please complete two tasks:
+1. Concisely summarize what the user wants to know
+2. Generate keywords most suitable for search engines (Chinese or English, must be precise)
+
+Format:
+Understanding: [User requirement summary]
+Search terms: [Best search keywords]"""
+
+    response = llm.invoke([SystemMessage(content=understand_prompt)])
+    response_text = response.content
+
+    # Parse LLM's output, extract search keywords
+    search_query = user_message # Default to using original query
+    if "Search terms:" in response_text or "搜索词:" in response_text:
+        if "Search terms:" in response_text:
+            search_query = response_text.split("Search terms:")[1].strip()
+        else:
+            search_query = response_text.split("搜索词:")[1].strip()
+
+    return {
+        "user_query": response_text,
+        "search_query": search_query,
+        "step": "understood",
+        "messages": [AIMessage(content=f"I will search for you: {search_query}")]
+    }
+```
+
+This node uses a structured prompt that asks the LLM to complete two tasks at once: intent understanding and keyword generation. It then writes the parsed search keywords to the state's `search_query` field, ready for the search step that follows.
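
The string splitting in the node works when the LLM follows the requested format exactly, but it keeps everything after the label, including any extra lines the model adds. As a hedged alternative, the sketch below uses a regular expression that takes only the rest of the label's line and also accepts a full-width colon. `extract_search_query` is a hypothetical helper, not part of the original script.

```python
import re

# Matches either label, an ASCII or full-width colon, then the rest of the line
_LABEL = re.compile(r"(?:Search terms|搜索词)\s*[:：]\s*(.+)")


def extract_search_query(response_text: str, fallback: str) -> str:
    """Return the keywords after the label, or the fallback if none are found."""
    match = _LABEL.search(response_text)
    if match is None:
        return fallback
    query = match.group(1).strip().strip('"').strip()
    return query or fallback


reply = "Understanding: wants the weather\nSearch terms: Beijing weather tomorrow\nNote: done"
print(extract_search_query(reply, "original question"))
```

Falling back to the original user message keeps the workflow moving even when the model ignores the format.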
+
+(2) Search Node
+
+This node is responsible for executing the agent's "tool usage" capability. It will call the Tavily API for real internet search and has basic error handling functionality.
+
+```python
+def tavily_search_node(state: SearchState) -> dict:
+    """Step 2: Use Tavily API for real search"""
+    search_query = state["search_query"]
+    try:
+        print(f"🔍 Searching: {search_query}")
+        response = tavily_client.search(
+            query=search_query, search_depth="basic", max_results=5, include_answer=True
+        )
+        # ... (process and format search results) ...
+        search_results = ... # Formatted result string
+
+        return {
+            "search_results": search_results,
+            "step": "searched",
+            "messages": [AIMessage(content="✅ Search completed! Organizing answer...")]
+        }
+    except Exception as e:
+        # ... (handle error) ...
+        return {
+            "search_results": f"Search failed: {e}",
+            "step": "search_failed",
+            "messages": [AIMessage(content="❌ Search encountered a problem...")]
+        }
+```
+
+This node initiates a real API call through `tavily_client.search`. It is wrapped in a `try...except` block to catch possible exceptions. If the search fails, it updates the `step` state to `"search_failed"`, which will be used by the next node to trigger a fallback plan.
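
The result-formatting step is elided in the node above. One possible shape for it is sketched below, assuming the search response is a dictionary with an optional `answer` string and a `results` list whose items carry `title`, `url`, and `content` keys. That layout is an assumption to verify against the Tavily documentation, and `format_search_results` is a hypothetical helper rather than the original implementation.

```python
def format_search_results(response: dict, max_items: int = 5) -> str:
    """Flatten a search response into a text block for the answer prompt."""
    lines = []
    answer = response.get("answer")
    if answer:
        lines.append(f"Summary: {answer}")
    for index, item in enumerate(response.get("results", [])[:max_items], start=1):
        title = item.get("title", "(no title)")
        url = item.get("url", "")
        content = item.get("content", "")
        lines.append(f"{index}. {title}\n   {content}\n   Source: {url}")
    return "\n".join(lines) if lines else "No results found."


sample = {
    "answer": "Cloudy, 17-25°C",
    "results": [
        {
            "title": "Beijing forecast",
            "url": "https://example.com/a",
            "content": "Cloudy tomorrow.",
        }
    ],
}
print(format_search_results(sample))
```

Using `.get` with defaults means a missing field degrades the output rather than raising inside the node.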
+
+(3) Answer Node
+
+The final answer node can choose different answering strategies based on whether the previous search was successful, possessing a certain degree of flexibility.
+
+```python
+def generate_answer_node(state: SearchState) -> dict:
+    """Step 3: Generate final answer based on search results"""
+    if state["step"] == "search_failed":
+        # If search failed, execute fallback strategy, answer based on LLM's own knowledge
+        fallback_prompt = f"Search API is temporarily unavailable, please answer the user's question based on your knowledge:\nUser question: {state['user_query']}"
+        response = llm.invoke([SystemMessage(content=fallback_prompt)])
+    else:
+        # Search successful, generate answer based on search results
+        answer_prompt = f"""Provide a complete and accurate answer to the user based on the following search results:
+User question: {state['user_query']}
+Search results:\n{state['search_results']}
+Please synthesize the search results and provide an accurate, useful answer..."""
+        response = llm.invoke([SystemMessage(content=answer_prompt)])
+
+    return {
+        "final_answer": response.content,
+        "step": "completed",
+        "messages": [AIMessage(content=response.content)]
+    }
+```
+
+This node branches on the value of `state["step"]`. If the search failed, it answers from the LLM's own knowledge, using a prompt that tells the model the search API is unavailable. If the search succeeded, it uses a prompt containing the real-time search results to generate a timely, evidence-based answer.
+
+(4) Build Graph
+
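+We now connect the three nodes into a linear graph and compile it. Because the graph is compiled with a checkpointer, each call to `app.invoke` or `app.stream` must pass a `thread_id` in its config, for example `config={"configurable": {"thread_id": "demo"}}`.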
+
+```python
+from langgraph.graph import StateGraph, START, END
+from langgraph.checkpoint.memory import InMemorySaver
+
+def create_search_assistant():
+    workflow = StateGraph(SearchState)
+
+    # Add nodes
+    workflow.add_node("understand", understand_query_node)
+    workflow.add_node("search", tavily_search_node)
+    workflow.add_node("answer", generate_answer_node)
+
+    # Set linear process
+    workflow.add_edge(START, "understand")
+    workflow.add_edge("understand", "search")
+    workflow.add_edge("search", "answer")
+    workflow.add_edge("answer", END)
+
+    # Compile graph
+    memory = InMemorySaver()
+    app = workflow.compile(checkpointer=memory)
+    return app
+```
+
+(5) Running Case Demonstration
+
+After running this script, you can ask some questions that require real-time information, such as the case in our first chapter: `I'm going to Beijing tomorrow, what's the weather like? Are there suitable attractions?`
+
+You will see the terminal clearly display the agent's "thinking" process:
+
+```
+🔍 Intelligent Search Assistant Started!
+I will use Tavily API to search for the latest and most accurate information for you
+Supports various questions: news, technology, knowledge Q&A, etc.
+(Enter 'quit' to exit)
+
+🤔 What would you like to know: I'm going to Beijing tomorrow, what's the weather like? Are there suitable attractions?
+
+============================================================
+🧠 Understanding phase: I understand your needs: Understanding: The user wants to know about tomorrow's weather in Beijing and suitable attraction recommendations.
+Search terms: Beijing tomorrow weather attraction recommendations Beijing weather tomorrow attractions
+🔍 Searching: Beijing tomorrow weather attraction recommendations Beijing weather tomorrow attractions
+🔍 Search phase: ✅ Search completed! Found relevant information, organizing answer for you...
+
+💡 Final Answer:
+Tomorrow (September 17, 2025) Beijing's weather forecast shows it is expected to be cloudy, with temperatures ranging from 17°C (62°F) to 25°C (77°F). This mild weather is very suitable for outdoor activities.
+
+### Suitable Attraction Recommendations:
+1. **Great Wall**: As one of China's most famous historical sites, the Great Wall is a must-visit. You can choose popular sections like Badaling or Mutianyu for your tour.
+
+2. **Forbidden City**: The Forbidden City was the imperial palace of the Ming and Qing dynasties, with rich history and culture, suitable for tourists interested in Chinese history.
+
+3. **Tiananmen Square**: This is one of China's symbols, with many important buildings and monuments on the square, suitable for taking photos.
+
+4. **Summer Palace**: A very beautiful royal garden, suitable for strolling and enjoying natural scenery, especially the lakes and ancient buildings.
+
+5. **798 Art District**: If you're interested in modern art, the 798 Art District is a place that integrates art, culture, and creativity, suitable for exploration and photography.
+
+### Tips:
+- Since tomorrow's weather is good, it's recommended to plan your travel route in advance and prepare some water and snacks to maintain sufficient energy during the tour.
+- Since weather changes may affect the tour experience, it's recommended to check real-time weather updates.
+
+Hope this information helps you arrange a pleasant Beijing trip! If you need more information about attractions or travel advice, feel free to ask anytime.
+
+============================================================
+
+🤔 What would you like to know:
+```
+
+The assistant is continuously interactive, so you can keep asking questions.
+
+### 6.5.3 Analysis of LangGraph's Advantages and Limitations
+
+Any technical framework has its specific applicable scenarios and design trade-offs. In this section, we will objectively analyze LangGraph's core advantages and the limitations it may face in practical applications.
+
+(1) Advantages
+
+- As shown in our intelligent search assistant case, LangGraph explicitly defines a complete real-time Q&A process as a "flowchart" composed of states, nodes, and edges. The greatest advantage of this design is **high controllability and predictability**. Developers can precisely plan every step of the agent's behavior, which is crucial for building production-level applications that require high reliability and auditability. Its most powerful feature is **native support for cycles**. Through conditional edges, we can build "reflection-correction" loops. Our case handles a failed search inside the answer node, but it could be extended with a conditional edge that routes a failed search back to the search node for a retry, or on to a fallback path. This is key to building agents capable of self-optimization and fault tolerance.
+
+- In addition, since each node is an independent Python function, this brings **high modularity**. At the same time, inserting a node waiting for human review in the process becomes very straightforward, providing a solid foundation for implementing reliable "Human-in-the-loop" collaboration.
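
As a hedged illustration of the loop capability described above, the sketch below shows a routing function that could drive a bounded retry loop after the search node. The `attempts` field, the `MAX_SEARCH_ATTEMPTS` limit, and the route names are hypothetical; they are not part of the `SearchState` defined in this chapter.

```python
MAX_SEARCH_ATTEMPTS = 3  # hypothetical retry limit


def route_after_search(state: dict) -> str:
    """Pick the next route after the search node has run.

    The returned strings are meant to be keys of the mapping passed to
    add_conditional_edges, e.g.
    {"answer": "answer", "retry_search": "search",
     "answer_from_knowledge": "answer"}.
    """
    if state.get("step") != "search_failed":
        return "answer"
    if state.get("attempts", 0) < MAX_SEARCH_ATTEMPTS:
        return "retry_search"
    return "answer_from_knowledge"


print(route_after_search({"step": "searched"}))
print(route_after_search({"step": "search_failed", "attempts": 1}))
print(route_after_search({"step": "search_failed", "attempts": 3}))
```

For this to terminate, the search node itself would need to increment `attempts` in the update it returns, since a routing function only reads the state.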
+
+(2) Limitations
+
+- Compared to conversation-based frameworks, LangGraph requires developers to write more **boilerplate code**. Defining states, nodes, edges, and a series of operations makes the development process more cumbersome for simple tasks. Developers need to think more about "how to control the process (how)" rather than just "what to do (what)". Since the workflow is predefined, LangGraph's behavior is controllable but also lacks the dynamic, **"emergent" interaction** of conversational agents. Its strength lies in executing a determined, reliable process, rather than simulating open-ended, unpredictable social collaboration.
+
+- The debugging process also presents challenges. Although the process is clearer than conversation history, problems may occur at multiple points: logical errors within a node, mutations in state data passed between nodes, or mistakes in edge transition condition judgments. This requires developers to have a global understanding of the entire graph's operating mechanism.
+
+## 6.6 Chapter Summary
+
+In this chapter, we worked hands-on through a case study for each of several leading agent frameworks.
+
+We saw that each framework has its own approach to implementing agent construction:
+
+- **AutoGen** abstracts complex collaboration as a multi-role, automatically conducted "group chat," with its core being "driving collaboration through conversation."
+- **AgentScope** focuses on the robustness and scalability of industrial-grade applications, providing a solid engineering foundation for building high-concurrency, distributed multi-agent systems.
+- **CAMEL** demonstrates how to stimulate deep, autonomous collaboration between two expert agents with minimal code through its lightweight "role-playing" and "inception prompting" paradigm.
+- **LangGraph** returns to a more fundamental "state machine" model, giving developers precise control over workflows through explicit graph structures, especially its loop capability, paving the way for building reflective and correctable agents.
+
+Through in-depth analysis of these frameworks, we can distill a design trade-off: **the choice between "emergent collaboration" and "explicit control"**. AutoGen and CAMEL rely more on defining agents' "roles" and "goals," allowing complex collaborative behaviors to "emerge" from simple conversation rules. This approach is closer to human interaction patterns but is sometimes difficult to predict and debug. LangGraph requires developers to explicitly define every step and transition condition, sacrificing some "emergent" surprises in exchange for high reliability, controllability, and observability. At the same time, AgentScope reveals a second equally important dimension: **engineering**. Regardless of which collaboration paradigm we choose, to push it from experimental prototype to production application, we must face engineering challenges such as concurrency, fault tolerance, and distributed deployment. AgentScope was born to solve these problems, representing the critical leap from "can run" to "can serve stably."
+
+In summary, there is more than one way to build agents. A deep understanding of the design philosophies explored in this chapter makes us better "tool users" and also helps us weigh the trade-offs behind each framework's design.
+
+In the next chapter, we will enter the core content of this tutorial, building our own agent framework from scratch, integrating all theory and practice.
+
+
+## Exercises
+
+1. This chapter introduced four distinctive agent frameworks: `AutoGen`, `AgentScope`, `CAMEL`, and `LangGraph`. Please analyze:
+
+   - In Table 6.1 of Section 6.1.2, multiple dimensions of these four frameworks were compared. Please select the two frameworks you are most familiar with and further compare them in depth from three dimensions: "collaboration mode," "control method," and "applicable scenarios."
+   - This chapter mentioned the trade-off between "emergent collaboration" and "explicit control." How do you understand the meaning of these two design philosophies?
+
+2. In the `AutoGen` case in Section 6.2, we built a "software development team." Please extend your thinking based on this case:
+
+   > **Hint**: This is a hands-on practice question, actual operation is recommended
+
+   - The current team uses `RoundRobinGroupChat` (round-robin group chat) mode, where agents speak in a fixed order. If requirements change and the engineer's code needs to be returned to the product manager for re-review, how should the collaboration process be modified? Please design a mechanism that supports "dynamic rollback."
+   - In the case, we defined the role and responsibilities of each agent through `System Message`. Please try to add a new role "Quality Assurance" to this team and design its system message so that it can perform automated testing after code review.
+   - `AutoGen`'s conversational collaboration has potential instability, which may cause conversations to deviate from the topic or fall into loops. Please think: How to design a "conversation quality monitoring" mechanism to intervene in time when anomalies are detected?
+
+3. In the `AgentScope` case in Section 6.3, we implemented a "Three Kingdoms Werewolf" game. Please analyze in depth:
+
+   - The case used `MsgHub` (message center) to manage communication between agents. Please explain what advantages message-driven architecture has compared to traditional function calls? In what scenarios is this architecture particularly valuable?
+   - The game used structured output (such as `DiscussionModelCN`, `WitchActionModelCN`) to constrain agent behavior. Please design a new game role "Hunter" and define its corresponding structured output model, including field definitions and validation rules.
+   - `AgentScope` supports distributed deployment, which means different agents can run on different servers. Please think: In a real-time game scenario like "Three Kingdoms Werewolf," what technical challenges will distributed deployment bring? How to ensure message ordering and consistency?
+
+4. In the `CAMEL` case in Section 6.4, we had a psychologist and writer collaborate to create an e-book.
+
+   - In the case, collaboration is forcibly terminated when the `<CAMEL_TASK_DONE>` flag is detected. But what if the two agents disagree (one thinks it can be terminated, one thinks it shouldn't) and cannot reach consensus? Please design a "conflict resolution" compatibility mechanism.
+   - `CAMEL` was originally designed for two-agent collaboration but has now been extended to support multi-agent. Please consult `CAMEL`'s latest documentation to understand its multi-agent collaboration module [`workforce`](https://docs.camel-ai.org/key_modules/workforce), and explain how it differs from `AutoGen`'s group chat mode in combination with the architecture diagram.
+
+5. In the `LangGraph` case in Section 6.5, we built a "three-step Q&A assistant." Please analyze:
+
+   - `LangGraph` models the agent process as a state machine and directed graph. Please draw the graph structure of the "understand-search-answer" process in the case, marking nodes, edges, and state transition conditions.
+   - The current assistant is a linear process. Please extend this case by adding a "reflection" node: if the generated answer quality is low (e.g., too brief or lacking details), the system should re-search or regenerate the answer. Please design the conditional edge logic for this loop mechanism.
+   - `LangGraph`'s advantage lies in native support for loops. Please design a more complex application scenario that fully utilizes this feature: for example, "code generation-testing-fixing" loop, "paper writing-review-revision" loop, etc. Draw the complete graph structure and explain the function of key nodes.
+
+6. Framework selection is one of the key decisions in agent product development. Suppose you are a technical architect at an `AI` company, and the company plans to develop the following three agent product applications. Please select the most suitable framework for each application (`AutoGen`, `AgentScope`, `CAMEL`, `LangGraph`, or develop from scratch without a framework) and explain in detail:
+
+   **Application A**: Intelligent customer service system, needs to handle a large number of concurrent user requests (1000+ per second), requires response time less than 2 seconds, system needs to run stably 7×24 hours, and support horizontal scaling.
+
+   **Application B**: Scientific research paper writing assistance platform, needs a "researcher agent" and a "writer agent" to collaborate deeply, jointly completing literature review, experimental design, data analysis, and paper writing. Requires agents to conduct multiple rounds of in-depth discussion and autonomously advance tasks.
+
+   **Application C**: Financial risk control approval system, needs to process loan applications according to strict procedures: document review → risk assessment → quota calculation → compliance check → manual review → final decision. Each link has clear judgment criteria and branch logic, requiring traceable and auditable processes.
+
+
+## References
+
+[1] Wu Q, Bansal G, Zhang J, et al. Autogen: Enabling next-gen LLM applications via multi-agent conversations[C]//First Conference on Language Modeling. 2024.
+
+[2] Gao D, Li Z, Pan X, et al. Agentscope: A flexible yet robust multi-agent platform[J]. arXiv preprint arXiv:2402.14034, 2024.
+
+[3] Li G, Hammoud H, Itani H, et al. Camel: Communicative agents for "mind" exploration of large language model society[J]. Advances in Neural Information Processing Systems, 2023, 36: 51991-52008.
+
+[4] LangChain. LangGraph [EB/OL]. (2024). https://github.com/langchain-ai/langgraph.
+
+[5] Microsoft. AutoGen - UserProxyAgent [EB/OL]. (2024). https://microsoft.github.io/autogen/stable/reference/python/autogen_agentchat.agents.html#autogen_agentchat.agents.UserProxyAgent.
+

+ 4 - 0
docs/chapter6/第六章 框架开发实践.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter6-Framework-Development-Practice.md">English</a> | 中文
+</div>
+
 # 第六章 框架开发实践
 
 在第四章中,我们通过编写原生代码,实现了 ReAct、Plan-and-Solve 和 Reflection 这几种智能体的核心工作流。这个过程让我们对智能体的内在执行逻辑有了理解。随后,在第五章,我们切换到“使用者”的视角,体验了低代码平台带来的便捷与高效。

+ 2083 - 0
docs/chapter7/Chapter7-Building-Your-Agent-Framework.md

@@ -0,0 +1,2083 @@
+<div align="right">
+  English | <a href="./第七章%20构建你的Agent框架.md">中文</a>
+</div>
+
+# Chapter 7 Building Your Agent Framework
+
+In the previous chapters, we explained the fundamentals of agents and experienced the development convenience brought by mainstream frameworks. Starting from this chapter, we will enter a more challenging and valuable stage: **building an agent framework from scratch—HelloAgents**.
+
+To ensure the continuity and reproducibility of the learning process, HelloAgents will advance development through version iterations. Each chapter will add new functional modules based on the previous chapter and integrate and implement agent-related knowledge points. Ultimately, we will use this self-built framework to efficiently implement the advanced application cases in the subsequent chapters of this book.
+
+## 7.1 Overall Framework Architecture Design
+
+### 7.1.1 Why Build Your Own Agent Framework
+
+In today's rapidly developing agent technology landscape, there are already many mature Agent frameworks on the market. So why do we still need to build a new framework from scratch?
+
+(1) Rapid Iteration and Limitations of Market Frameworks
+
+The agent field is a rapidly developing area where new concepts emerge constantly. Each framework has its own positioning and understanding of agent design, but the core knowledge points of agents are consistent.
+
+- **Complexity of Over-abstraction**: Many frameworks introduce numerous abstraction layers and configuration options in pursuit of generality. Taking LangChain as an example, although its chain invocation mechanism is flexible, it has a steep learning curve for beginners, often requiring understanding of many concepts to complete simple tasks.
+- **Instability from Rapid Iteration**: Frameworks competing for market share frequently change their APIs. Developers often find that code stops running after a version upgrade, which keeps maintenance costs high.
+- **Black-box Implementation Logic**: Many frameworks encapsulate core logic so tightly that developers struggle to understand how Agents work internally and cannot customize them deeply. When problems arise, developers can only rely on documentation and community support. If the community is not active, an issue may go unanswered for a long time, slowing subsequent development.
+- **Complexity of Dependencies**: Mature frameworks often pull in a large number of dependency packages and have a large installation footprint, which can cause dependency conflicts when they are combined with other project code.
+
+(2) Capability Leap from User to Builder
+
+Building your own Agent framework is actually a process of transforming from a "user" to a "builder." The value brought by this transformation is long-term.
+
+- **Deep Understanding of Agent Working Principles**: By implementing each component hands-on, developers can truly understand the Agent's thinking process, tool invocation mechanisms, and the pros and cons and differences of various design patterns.
+- **Gaining Complete Control**: A self-built framework means complete control over every line of code, allowing precise tuning according to specific needs without being constrained by third-party framework design philosophies.
+- **Cultivating System Design Capabilities**: The framework construction process involves core software engineering skills such as modular design, interface abstraction, and error handling, which are of significant value to developers' long-term growth.
+
+(3) Necessity of Customization Needs and Deep Mastery
+
+In practical applications, the needs for agents vary greatly across different scenarios, often requiring secondary development based on general frameworks.
+
+- **Optimization Needs for Specific Domains**: Vertical domains such as finance, healthcare, and education often require targeted prompt templates, special tool integration, and customized security strategies.
+- **Precise Control of Performance and Resources**: In production environments, there are strict requirements for response time, memory usage, and concurrent processing capabilities. The "one-size-fits-all" solutions of general frameworks often cannot meet refined needs.
+- **Transparency Requirements for Learning and Teaching**: In our teaching scenario, learners expect to clearly see every step of the agent construction process and understand the working mechanisms of different paradigms, which requires the framework to have high observability and interpretability.
+
+### 7.1.2 Design Philosophy of HelloAgents Framework
+
+Building a new Agent framework is not about the number of features but whether the design philosophy can truly solve the pain points of existing frameworks. The design of the HelloAgents framework revolves around a core question: How can learners both get started quickly and deeply understand the working principles of Agents?
+
+When you first encounter any mature framework, you may be attracted by its rich features, but you will soon discover a problem: to complete a simple task, you often need to understand more than a dozen different concepts such as Chain, Agent, Tool, Memory, Retriever, etc. Each concept has its own abstraction layer, making the learning curve extremely steep. Although this complexity brings powerful functionality, it also becomes an obstacle for beginners. The HelloAgents framework attempts to find a balance between functional completeness and learning friendliness, forming four core design philosophies.
+
+(1) Balance Between Lightweight and Teaching-Friendly
+
+An excellent learning framework should have complete readability. HelloAgents separates core code by chapters, based on a simple principle: any developer with a certain programming foundation should be able to fully understand the framework's working principles within a reasonable time. In dependency management, the framework adopts a minimalist strategy. Except for OpenAI's official SDK and a few necessary basic libraries, no heavy dependencies are introduced. When encountering problems, we can directly locate the framework's own code without searching for answers in complex dependency relationships.
+
+(2) Pragmatic Choice Based on Standard APIs
+
+OpenAI's API has become a de facto industry standard, and almost all mainstream LLM providers offer, or are working toward, compatibility with it. HelloAgents builds on this standard rather than inventing another abstract interface, for two main reasons. The first is compatibility: once you have mastered HelloAgents, the underlying API invocation logic stays the same when you migrate to other frameworks or integrate it into existing projects. The second is a lower learning cost: you don't need to learn a new conceptual model, because all operations are based on standard interfaces you already know.
+
+(3) Careful Design of Progressive Learning Path
+
+HelloAgents provides a clear learning path. The code for each chapter is saved as a historical version that can be installed via pip, and the installed code never becomes a black box, because every core function is one you will have written yourself. This design lets you move forward according to your own needs and pace. Each upgrade is incremental, without conceptual jumps or gaps in understanding. This chapter builds on the previous six chapters and, in turn, lays the framework foundation for the advanced topics that follow.
+
+(4) Unified "Tool" Abstraction: Everything is a Tool
+
+To carry the lightweight, teaching-friendly philosophy through, HelloAgents makes a key architectural simplification: apart from the core Agent class, everything is a tool. Memory, RAG (Retrieval-Augmented Generation), RL (Reinforcement Learning), MCP (Model Context Protocol), and other modules that many frameworks treat as separate concepts to be learned independently are all abstracted as tools in HelloAgents. The intent is to eliminate unnecessary abstraction layers and return learners to the most intuitive core logic of "agents calling tools," so that quick start and deep understanding go together.
+
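The "everything is a tool" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `Tool` interface, `NoteMemoryTool`, and `EchoTool` below are hypothetical names invented for this sketch, not the framework's actual classes (the real tool base class lives in `tools/base.py` and is built later in the book).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Tool(ABC):
    """Hypothetical minimal tool interface: a name, a description, and run()."""

    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description

    @abstractmethod
    def run(self, parameters: Dict[str, Any]) -> str:
        """Execute the tool and return a text result the agent can read."""


class NoteMemoryTool(Tool):
    """Memory expressed as a tool: 'add' stores a note, 'list' returns all notes."""

    def __init__(self) -> None:
        super().__init__("memory", "Store and recall short notes")
        self._notes: List[str] = []

    def run(self, parameters: Dict[str, Any]) -> str:
        action = parameters.get("action")
        if action == "add":
            self._notes.append(str(parameters["content"]))
            return f"stored note #{len(self._notes)}"
        if action == "list":
            return "\n".join(self._notes) or "(no notes)"
        return f"unknown action: {action!r}"


class EchoTool(Tool):
    """A trivial second tool, to show that the calling side does not change."""

    def __init__(self) -> None:
        super().__init__("echo", "Return the input text unchanged")

    def run(self, parameters: Dict[str, Any]) -> str:
        return str(parameters.get("text", ""))


# The agent side needs a single lookup table, whatever a tool does internally.
tools = {tool.name: tool for tool in (NoteMemoryTool(), EchoTool())}
tools["memory"].run({"action": "add", "content": "user prefers short answers"})
recalled = tools["memory"].run({"action": "list"})
```
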
+### 7.1.3 Learning Objectives of This Chapter
+
+Let's first look at the core learning content of Chapter 7:
+
+```
+hello-agents/
+├── hello_agents/
+│   │
+│   ├── core/                     # Core framework layer
+│   │   ├── agent.py              # Agent base class
+│   │   ├── llm.py                # HelloAgentsLLM unified interface
+│   │   ├── message.py            # Message system
+│   │   ├── config.py             # Configuration management
+│   │   └── exceptions.py         # Exception system
+│   │
+│   ├── agents/                   # Agent implementation layer
+│   │   ├── simple_agent.py       # SimpleAgent implementation
+│   │   ├── react_agent.py        # ReActAgent implementation
+│   │   ├── reflection_agent.py   # ReflectionAgent implementation
+│   │   └── plan_solve_agent.py   # PlanAndSolveAgent implementation
+│   │
+│   ├── tools/                    # Tool system layer
+│   │   ├── base.py               # Tool base class
+│   │   ├── registry.py           # Tool registration mechanism
+│   │   ├── chain.py              # Tool chain management system
+│   │   ├── async_executor.py     # Asynchronous tool executor
+│   │   └── builtin/              # Built-in tool set
+│   │       ├── calculator.py     # Calculator tool
+│   │       └── search.py         # Search tool
+└──
+```
+
+Before starting to write specific code, we need to first establish a clear architectural blueprint. The architectural design of HelloAgents follows the core principles of "layered decoupling, single responsibility, unified interface," which maintains code organization and facilitates content expansion by chapters.
+
+**Quick Start: Installing HelloAgents Framework**
+
+To allow readers to quickly experience the complete functionality of this chapter, we provide a directly installable Python package. You can install the version corresponding to this chapter with the following command:
+
+```bash
+# Python version needs to be >= 3.10
+pip install "hello-agents==0.1.1"
+```
+
+Learning this chapter can be done in two ways:
+
+1. **Experiential Learning**: Directly install the framework using `pip`, run example code, and quickly experience various functions
+2. **In-depth Learning**: Follow the content of this chapter, implement each component from scratch, and gain a thorough understanding of the framework's design ideas and implementation details
+
+We recommend adopting the "experience first, then implement" learning path. In this chapter, we provide complete test files. You can rewrite core functions and run tests to verify whether your implementation is correct. This learning method ensures both practicality and learning effectiveness. If you want to deeply understand the framework's implementation details or wish to participate in the framework's development, you can visit this [GitHub repository](https://github.com/jjyaoao/helloagents).
+
+Before starting, let's spend 30 seconds building a simple agent with HelloAgents!
+
+```python
+# Configure the LLM API in the .env file in the same-level folder. You can refer to the .env.example in the code folder, or reuse the .env file from previous chapter cases.
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+# Create LLM instance - framework automatically detects provider
+llm = HelloAgentsLLM()
+
+# Or manually specify provider (optional)
+# llm = HelloAgentsLLM(provider="modelscope")
+
+# Create SimpleAgent
+agent = SimpleAgent(
+    name="AI Assistant",
+    llm=llm,
+    system_prompt="You are a helpful AI assistant"
+)
+
+# Basic conversation
+response = agent.run("Hello! Please introduce yourself")
+print(response)
+
+# Add tool functionality (optional)
+from hello_agents.tools import CalculatorTool
+calculator = CalculatorTool()
+# add_tool requires the MySimpleAgent implemented in Section 7.4.1; later chapters support this call directly
+# agent.add_tool(calculator)
+
+# Once the tool is registered, the agent can call it; until then the model answers on its own
+response = agent.run("Please help me calculate 2 + 3 * 4")
+print(response)
+
+# View conversation history
+print(f"Number of historical messages: {len(agent.get_history())}")
+```
+
+
+
+## 7.2 HelloAgentsLLM Extension
+
+The content of this section will be an iterative upgrade based on the `HelloAgentsLLM` created in Section 4.1.3. We will transform this basic client into a more adaptive model invocation hub. This upgrade mainly revolves around the following three goals:
+
+1. **Multi-provider Support**: Achieve seamless switching between various mainstream LLM service providers such as OpenAI, ModelScope, Zhipu AI, etc., avoiding framework binding to specific vendors.
+2. **Local Model Integration**: Introduce vLLM and Ollama, two high-performance local deployment solutions, as production-grade supplements to the Hugging Face Transformers solution in Section 3.2.3, meeting the needs of data privacy and cost control.
+3. **Automatic Detection Mechanism**: Establish an automatic recognition mechanism that enables the framework to intelligently infer the type of LLM service used based on environment information, simplifying the user's configuration process.
+
+### 7.2.1 Supporting Multiple Providers
+
+The `HelloAgentsLLM` class we defined earlier can already connect to any service compatible with the OpenAI interface through its two core parameters, `api_key` and `base_url`. In theory this guarantees generality, but in practice service providers differ in environment variable names, default API addresses, and recommended models. If users had to look up these details and modify code every time they switched providers, development efficiency would suffer. To solve this, we introduce a `provider` parameter. The idea is to let `HelloAgentsLLM` handle each provider's configuration details internally, giving users a unified and concise invocation experience. We elaborate on the implementation in Section 7.2.3 "Automatic Detection Mechanism." Here, we first focus on how to use this mechanism to extend the framework.
+
+Below, we will demonstrate how to add support for the ModelScope platform by inheriting `HelloAgentsLLM`. We hope readers will not only learn how to "use" the framework but also master how to "extend" it. Directly modifying the source code of installed libraries is not a recommended practice because it makes subsequent library upgrades difficult.
+
+(1) Create Custom LLM Class and Inherit
+
+Suppose we have a `my_llm.py` file in our project directory. We first import the `HelloAgentsLLM` base class from the `hello_agents` library, then create a new class named `MyLLM` that inherits from it.
+
+```python
+# my_llm.py
+import os
+from typing import Optional
+from openai import OpenAI
+from hello_agents import HelloAgentsLLM
+
+class MyLLM(HelloAgentsLLM):
+    """
+    A custom LLM client that adds support for ModelScope through inheritance.
+    """
+    pass # Leave empty for now
+```
+
+(2) Override `__init__` Method to Support New Provider
+
+Next, we override the `__init__` method in the `MyLLM` class. Our goal is: when the user passes `provider="modelscope"`, execute our custom logic; otherwise, call the original logic of the parent class `HelloAgentsLLM`, enabling it to continue supporting other built-in providers like OpenAI.
+
+```python
+class MyLLM(HelloAgentsLLM):
+    def __init__(
+        self,
+        model: Optional[str] = None,
+        api_key: Optional[str] = None,
+        base_url: Optional[str] = None,
+        provider: Optional[str] = "auto",
+        **kwargs
+    ):
+        # Check if provider is 'modelscope' that we want to handle
+        if provider == "modelscope":
+            print("Using custom ModelScope Provider")
+            self.provider = "modelscope"
+
+            # Parse ModelScope credentials
+            self.api_key = api_key or os.getenv("MODELSCOPE_API_KEY")
+            self.base_url = base_url or "https://api-inference.modelscope.cn/v1/"
+
+            # Validate credentials exist
+            if not self.api_key:
+                raise ValueError("ModelScope API key not found. Please set MODELSCOPE_API_KEY environment variable.")
+
+            # Set default model and other parameters
+            self.model = model or os.getenv("LLM_MODEL_ID") or "Qwen/Qwen2.5-VL-72B-Instruct"
+            self.temperature = kwargs.get('temperature', 0.7)
+            self.max_tokens = kwargs.get('max_tokens')
+            self.timeout = kwargs.get('timeout', 60)
+
+            # Create OpenAI client instance with obtained parameters
+            self._client = OpenAI(api_key=self.api_key, base_url=self.base_url, timeout=self.timeout)
+
+        else:
+            # If not modelscope, use parent class's original logic to handle
+            super().__init__(model=model, api_key=api_key, base_url=base_url, provider=provider, **kwargs)
+
+```
+
+This code demonstrates the idea of "overriding": we intercept the case of `provider="modelscope"` and handle it specially. For all other cases, we hand it back to the parent class through `super().__init__(...)`, preserving all the original framework functionality.
+
+(3) Using the Custom `MyLLM` Class
+
+Now, we can use our own `MyLLM` class in the project's business logic just like using the native `HelloAgentsLLM`.
+
+First, configure the ModelScope API key in the `.env` file:
+
+```bash
+# .env file
+MODELSCOPE_API_KEY="your-modelscope-api-key"
+```
+
+Then, import and use `MyLLM` in the main program:
+
+```python
+# my_main.py
+from dotenv import load_dotenv
+from my_llm import MyLLM # Note: Import our own class here
+
+# Load environment variables
+load_dotenv()
+
+# Instantiate our overridden client and specify provider
+llm = MyLLM(provider="modelscope")
+
+# Prepare messages
+messages = [{"role": "user", "content": "Hello, please introduce yourself."}]
+
+# Make the call; think() and the other methods are inherited from the parent class, so there is no need to override them
+response_stream = llm.think(messages)
+
+# Print response
+print("ModelScope Response:")
+for chunk in response_stream:
+    # chunk is already a text fragment, can be used directly
+    print(chunk, end="", flush=True)
+```
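Because `think` yields plain text fragments, the same loop can also keep the full reply for later use, for example to store it in the conversation history. The sketch below assumes only that behavior; `fake_think` and `collect` are hypothetical names, and `fake_think` stands in for `llm.think` so the snippet runs without a network connection.

```python
from typing import Dict, Iterable, Iterator, List


def fake_think(messages: List[Dict[str, str]]) -> Iterator[str]:
    """Stand-in for llm.think(): yields text fragments, no network involved."""
    yield from ["Hello", ", ", "I am ", "a demo ", "model."]


def collect(stream: Iterable[str], echo: bool = True) -> str:
    """Print fragments as they arrive and return the assembled reply."""
    parts: List[str] = []
    for chunk in stream:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


full_text = collect(fake_think([{"role": "user", "content": "Hello"}]))
```

With a real client, `collect(llm.think(messages))` would be the equivalent call.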
+
+Through the above steps, we have added new functionality to the `hello-agents` library without modifying its source code. This approach keeps the code clean and maintainable, and ensures that our customizations will not be lost when the `hello-agents` library is upgraded.
+
+### 7.2.2 Local Model Invocation
+
+In Section 3.2.3, we learned how to use the Hugging Face Transformers library to run open-source models locally. This method is very suitable for introductory learning and functional verification, but its underlying implementation has limited performance when handling high-concurrency requests and is usually not the first choice for production environments.
+
+To serve models locally with production-grade performance, the community has produced excellent tools such as vLLM and Ollama. vLLM significantly improves throughput through techniques such as PagedAttention and continuous batching, while Ollama focuses on making local models easy to download and run. Both expose the model as an API service compatible with the OpenAI standard, which means we can integrate them into `HelloAgentsLLM` with little effort.
+
+**vLLM**
+
+vLLM is a high-performance Python library designed for LLM inference and serving. Through techniques such as PagedAttention, it can achieve throughput several times higher than standard Transformers implementations. Below are the complete steps to deploy a vLLM service locally:
+
+First, install vLLM according to your hardware environment (especially the CUDA version). We recommend following its [official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html) to avoid version mismatch issues.
+
+```bash
+pip install vllm
+```
+
+After installation, use the following command to start an OpenAI-compatible API service. vLLM will automatically download the specified model weights from the Hugging Face Hub if they don't exist locally. We again use the Qwen1.5-0.5B-Chat model as an example:
+
+```bash
+# Start the vLLM service and load the Qwen1.5-0.5B-Chat model
+python -m vllm.entrypoints.openai.api_server \
+    --model Qwen/Qwen1.5-0.5B-Chat \
+    --host 0.0.0.0 \
+    --port 8000
+```
+
+After the service starts, it will provide an OpenAI-compatible API at the `http://localhost:8000/v1` address.
+
+**Ollama**
+
+Ollama further simplifies local model management and deployment by encapsulating model download, configuration, and service startup into a single command, making it very suitable for quick start. Visit the Ollama [official website](https://ollama.com) to download and install the client for your operating system.
+
+After installation, open the terminal and execute the following command to download and run a model (using Llama 3 as an example). Ollama will automatically handle model download, service encapsulation, and hardware acceleration configuration.
+
+```bash
+# The first run downloads the model automatically; later runs start it directly
+ollama run llama3
+```
+
+When you see the model's interactive prompt in the terminal, it indicates that the service has successfully started in the background. Ollama will expose an OpenAI-compatible API interface at the `http://localhost:11434/v1` address by default.
+
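To see what "OpenAI-compatible" means on the wire, the sketch below builds the HTTP request that such a client sends, using only the standard library. It assumes the default addresses quoted above; `build_chat_request` and `send` are hypothetical helper names, and the request is only constructed here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, model: str, prompt: str, api_key: str = "local"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Same request shape for both servers; only the base URL and model name differ.
vllm_request = build_chat_request("http://localhost:8000/v1", "Qwen/Qwen1.5-0.5B-Chat", "Hello!")
ollama_request = build_chat_request("http://localhost:11434/v1", "llama3", "Hello!")
# With the corresponding server running, uncomment to get a reply:
# print(send(vllm_request))
```
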
+**Integrating with `HelloAgentsLLM`**
+
+Since both vLLM and Ollama expose OpenAI-compatible APIs, integrating them into `HelloAgentsLLM` is straightforward: we only need to treat each as a new `provider` when instantiating the client.
+
+For example, connecting to a locally running **vLLM** service:
+
+```python
+llm_client = HelloAgentsLLM(
+    provider="vllm",
+    model="Qwen/Qwen1.5-0.5B-Chat", # Must match the model specified when starting the service
+    base_url="http://localhost:8000/v1",
+    api_key="vllm" # Local services usually don't need a real API Key, can fill in any non-empty string
+)
+```
+
+Alternatively, set environment variables and let the client auto-detect the provider, so that no code changes are needed. First, in the `.env` file:
+
+```bash
+LLM_BASE_URL="http://localhost:8000/v1"
+LLM_API_KEY="vllm"
+```
+
+Then instantiate the client directly in Python:
+
+```python
+llm_client = HelloAgentsLLM()  # Auto-detected as "vllm" from the localhost:8000 URL
+```
+
+Similarly, connecting to a local **Ollama** service is just as simple:
+
+```python
+llm_client = HelloAgentsLLM(
+    provider="ollama",
+    model="llama3", # Must match the model specified in `ollama run`
+    base_url="http://localhost:11434/v1",
+    api_key="ollama" # Local services also don't need a real Key
+)
+```
+
+Through this unified design, our agent core code requires no modifications to freely switch between cloud APIs and local models. This provides great flexibility for subsequent application development, deployment, cost control, and data privacy protection.
+
+### 7.2.3 Automatic Detection Mechanism
+
+To minimize the user's configuration burden and follow the principle of "convention over configuration," `HelloAgentsLLM` provides two internal helper methods that work together: `_auto_detect_provider` infers the service provider from environment information, and `_resolve_credentials` completes the specific parameter configuration based on the result.
+
+`_auto_detect_provider` applies the following checks in priority order:
+
+1. **Highest Priority: Check Environment Variables for Specific Service Providers.** This is the most direct and reliable signal. The framework checks in turn whether `MODELSCOPE_API_KEY`, `OPENAI_API_KEY`, `ZHIPU_API_KEY`, and similar variables exist, and returns the corresponding provider as soon as one is found.
+
+2. **Second Priority: Determine Based on `base_url`.** If no provider-specific key is set but a `base_url` is available (passed explicitly or read from the generic `LLM_BASE_URL`), the framework inspects the URL.
+
+   - **Domain Matching**: Identify cloud service providers by checking whether the URL contains characteristic strings such as `"api-inference.modelscope.cn"` or `"api.openai.com"`.
+
+   - **Port Matching**: For URLs pointing at `localhost` or `127.0.0.1`, identify the local deployment solution by its default port, such as `:11434` (Ollama) or `:8000` (vLLM).
+
+3. **Auxiliary Judgment: Analyze API Key Format.** If neither of the above is conclusive, the framework examines the format of the generic `LLM_API_KEY`, since some providers' keys have fixed prefixes or distinctive encodings. Because several providers may share similar key formats, this signal is ambiguous and is used only as a last resort.
+
+Some key code is as follows:
+
+```python
+def _auto_detect_provider(self, api_key: Optional[str], base_url: Optional[str]) -> str:
+    """
+    Automatically detect LLM provider
+    """
+    # 1. Check environment variables for specific providers (highest priority)
+    if os.getenv("MODELSCOPE_API_KEY"): return "modelscope"
+    if os.getenv("OPENAI_API_KEY"): return "openai"
+    if os.getenv("ZHIPU_API_KEY"): return "zhipu"
+    # ... Other service provider environment variable checks
+
+    # Get generic environment variables
+    actual_api_key = api_key or os.getenv("LLM_API_KEY")
+    actual_base_url = base_url or os.getenv("LLM_BASE_URL")
+
+    # 2. Determine based on base_url
+    if actual_base_url:
+        base_url_lower = actual_base_url.lower()
+        if "api-inference.modelscope.cn" in base_url_lower: return "modelscope"
+        if "open.bigmodel.cn" in base_url_lower: return "zhipu"
+        if "localhost" in base_url_lower or "127.0.0.1" in base_url_lower:
+            if ":11434" in base_url_lower: return "ollama"
+            if ":8000" in base_url_lower: return "vllm"
+            return "local" # Other local ports
+
+    # 3. Auxiliary judgment based on API key format
+    if actual_api_key:
+        if actual_api_key.startswith("ms-"): return "modelscope"
+        # ... Other key format judgments
+
+    # 4. Default return 'auto', use generic configuration
+    return "auto"
+```
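The priority order has one practical consequence that is easy to miss: a provider-specific key in the environment wins over the URL. The sketch below mirrors the excerpt as a stand-alone function so the behavior can be checked without installing anything. `detect_provider` and its `env` parameter are hypothetical names, and unlike the method above it reads from a plain mapping and omits the explicit `api_key` argument.

```python
from typing import Mapping, Optional


def detect_provider(env: Mapping[str, str], base_url: Optional[str] = None) -> str:
    """Stand-alone mirror of the detection order shown above, reading from `env`."""
    # 1. Provider-specific keys win, in this fixed order.
    for variable, provider in (
        ("MODELSCOPE_API_KEY", "modelscope"),
        ("OPENAI_API_KEY", "openai"),
        ("ZHIPU_API_KEY", "zhipu"),
    ):
        if env.get(variable):
            return provider

    # 2. Fall back to the URL: domains for cloud services, ports for local ones.
    url = (base_url or env.get("LLM_BASE_URL") or "").lower()
    if "api-inference.modelscope.cn" in url:
        return "modelscope"
    if "open.bigmodel.cn" in url:
        return "zhipu"
    if "localhost" in url or "127.0.0.1" in url:
        if ":11434" in url:
            return "ollama"
        if ":8000" in url:
            return "vllm"
        return "local"

    # 3. Key-format heuristic, then the generic fallback.
    if env.get("LLM_API_KEY", "").startswith("ms-"):
        return "modelscope"
    return "auto"


local_only = detect_provider({"LLM_BASE_URL": "http://localhost:11434/v1"})
with_stray_key = detect_provider({
    "LLM_BASE_URL": "http://localhost:11434/v1",
    "OPENAI_API_KEY": "sk-leftover",
})
print(local_only, with_stray_key)  # prints: ollama openai
```

If detection picks an unexpected provider, passing `provider=...` explicitly sidesteps the inference.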
+
+Once the `provider` is determined (whether user-specified or auto-detected), the `_resolve_credentials` method takes over to handle the differentiated configuration of service providers. It will actively search for corresponding environment variables based on the value of `provider` and set default `base_url` for it. Some key implementations are as follows:
+
+```python
+def _resolve_credentials(self, api_key: Optional[str], base_url: Optional[str]) -> tuple[str, str]:
+    """Resolve API key and base_url based on provider"""
+    if self.provider == "openai":
+        resolved_api_key = api_key or os.getenv("OPENAI_API_KEY") or os.getenv("LLM_API_KEY")
+        resolved_base_url = base_url or os.getenv("LLM_BASE_URL") or "https://api.openai.com/v1"
+        return resolved_api_key, resolved_base_url
+
+    elif self.provider == "modelscope":
+        resolved_api_key = api_key or os.getenv("MODELSCOPE_API_KEY") or os.getenv("LLM_API_KEY")
+        resolved_base_url = base_url or os.getenv("LLM_BASE_URL") or "https://api-inference.modelscope.cn/v1/"
+        return resolved_api_key, resolved_base_url
+
+    # ... Logic for other service providers
+```
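Both branches above follow the same fallback chain: explicit argument, then the provider-specific variable, then the generic variable, then a default URL. One way to express that pattern without repeating the branch per provider is a lookup table. This is an alternative sketch, not the framework's implementation: `PROVIDER_DEFAULTS`, `resolve_credentials`, and the `env` parameter are hypothetical names.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical lookup table: provider -> (provider-specific key variable, default base_url).
PROVIDER_DEFAULTS = {
    "openai": ("OPENAI_API_KEY", "https://api.openai.com/v1"),
    "modelscope": ("MODELSCOPE_API_KEY", "https://api-inference.modelscope.cn/v1/"),
}


def resolve_credentials(
    provider: str,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Resolve (api_key, base_url) using the same fallback order as the excerpt."""
    key_variable, default_url = PROVIDER_DEFAULTS.get(provider, (None, None))
    resolved_key = (
        api_key
        or (env.get(key_variable) if key_variable else None)
        or env.get("LLM_API_KEY")
    )
    resolved_url = base_url or env.get("LLM_BASE_URL") or default_url
    return resolved_key, resolved_url


key, url = resolve_credentials("modelscope", env={"MODELSCOPE_API_KEY": "ms-demo"})
```

Adding a provider then means adding one table entry; providers that need special handling can still get their own branch.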
+
+Let's see the convenience of automatic detection through a simple example. A user who wants to use the local Ollama service only needs to configure the `.env` file as follows:
+
+```bash
+LLM_BASE_URL="http://localhost:11434/v1"
+LLM_MODEL_ID="llama3"
+```
+
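+They don't need to configure `LLM_API_KEY` or specify `provider` in the code. This does rely on no provider-specific key such as `OPENAI_API_KEY` being set in the environment, because those variables are checked before the URL. Then, in Python code, they simply instantiate `HelloAgentsLLM`: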
+
+```python
+from dotenv import load_dotenv
+from hello_agents import HelloAgentsLLM
+
+load_dotenv()
+
+# No need to pass provider, framework will auto-detect
+llm = HelloAgentsLLM()
+# Framework internal logs will show provider detected as 'ollama'
+
+# Subsequent invocation methods remain completely unchanged
+messages = [{"role": "user", "content": "Hello!"}]
+for chunk in llm.think(messages):
+    print(chunk, end="")
+
+```
+
+In this process, the `_auto_detect_provider` method successfully infers the `provider` as `"ollama"` by parsing `"localhost"` and `:11434` in `LLM_BASE_URL`. Subsequently, the `_resolve_credentials` method sets the correct default parameters for Ollama.
+
+Compared to the basic implementation in Section 4.1.3, the current HelloAgentsLLM has the following significant advantages:
+
+<div align="center">
+  <p>Table 7.1 Feature Comparison of Different HelloAgentsLLM Versions</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/7-figures/table-01.png" alt="" width="90%"/>
+</div>
+
+As shown in Table 7.1 above, this evolution embodies an important principle of framework design: **start simple, gradually improve**. We enhanced functional completeness while maintaining interface simplicity.
+
+
+
+## 7.3 Framework Interface Implementation
+
+In the previous section, we built `HelloAgentsLLM`, a core component that solves the key problem of communicating with large language models. However, it still needs a series of supporting interfaces and components to handle data flow, manage configuration, handle exceptions, and provide a clear, unified structure for upper-layer application construction. This section will cover the following three core files:
+
+- **`message.py`**: Defines the unified message format within the framework, ensuring standardization of information transfer between agents and models.
+- **`config.py`**: Provides a centralized configuration management solution, making framework behavior easy to adjust and extend.
+- **`agent.py`**: Defines the abstract base class (`Agent`) for all agents, providing a unified interface and specification for implementing different types of agents in the future.
+
+### 7.3.1 Message Class
+
+In the interaction between agents and large language models, conversation history is crucial context. To manage this information in a standardized way, we designed a simple `Message` class. It will be extended in the subsequent context engineering chapter.
+
+```python
+"""Message system"""
+from typing import Optional, Dict, Any, Literal
+from datetime import datetime
+from pydantic import BaseModel
+
+# Define message role type, restricting its values
+MessageRole = Literal["user", "assistant", "system", "tool"]
+
+class Message(BaseModel):
+    """Message class"""
+
+    content: str
+    role: MessageRole
+    timestamp: Optional[datetime] = None
+    metadata: Optional[Dict[str, Any]] = None
+
+    def __init__(self, content: str, role: MessageRole, **kwargs):
+        super().__init__(
+            content=content,
+            role=role,
+            timestamp=kwargs.get('timestamp', datetime.now()),
+            metadata=kwargs.get('metadata', {})
+        )
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert to dictionary format (OpenAI API format)"""
+        return {
+            "role": self.role,
+            "content": self.content
+        }
+
+    def __str__(self) -> str:
+        return f"[{self.role}] {self.content}"
+```
+
+Several design points are worth noting. First, `typing.Literal` restricts the `role` field to four values, `"user"`, `"assistant"`, `"system"`, and `"tool"`, which correspond directly to the roles in the OpenAI API and let Pydantic reject anything else when a message is constructed. Second, beyond the two core fields `content` and `role`, we add `timestamp` and `metadata`, reserving space for logging and future feature expansion. Finally, `to_dict()` converts the internal `Message` object into the dictionary format the OpenAI API expects, keeping only `role` and `content`; this embodies the design principle of "rich internally, compatible externally."
+
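The "rich internally, compatible externally" principle is easiest to see when a whole history is turned into an API payload. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic `Message` above (same fields, same `to_dict`) so that it runs without extra dependencies; `to_api_messages` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Message:
    """Stdlib stand-in for the Pydantic Message above (same fields, same to_dict)."""

    content: str
    role: str
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return {"role": self.role, "content": self.content}


def to_api_messages(history: List[Message], system_prompt: str = "") -> List[Dict[str, Any]]:
    """Flatten a history into the list-of-dicts shape that chat APIs expect."""
    messages = [{"role": "system", "content": system_prompt}] if system_prompt else []
    messages.extend(message.to_dict() for message in history)
    return messages


history = [
    Message("What is 2 + 3?", "user", metadata={"source": "cli"}),
    Message("5", "assistant"),
]
payload = to_api_messages(history, system_prompt="You are a helpful AI assistant")
```

The timestamp and metadata stay available on the objects in `history`, while `payload` carries only what the API accepts.
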
+### 7.3.2 Config Class
+
+The `Config` class centralizes configuration parameters that would otherwise be hard-coded throughout the codebase, and supports reading them from environment variables.
+
+```python
+"""Configuration management"""
+import os
+from typing import Optional, Dict, Any
+from pydantic import BaseModel
+
+class Config(BaseModel):
+    """HelloAgents configuration class"""
+
+    # LLM configuration
+    default_model: str = "gpt-3.5-turbo"
+    default_provider: str = "openai"
+    temperature: float = 0.7
+    max_tokens: Optional[int] = None
+
+    # System configuration
+    debug: bool = False
+    log_level: str = "INFO"
+
+    # Other configuration
+    max_history_length: int = 100
+
+    @classmethod
+    def from_env(cls) -> "Config":
+        """Create configuration from environment variables"""
+        return cls(
+            debug=os.getenv("DEBUG", "false").lower() == "true",
+            log_level=os.getenv("LOG_LEVEL", "INFO"),
+            temperature=float(os.getenv("TEMPERATURE", "0.7")),
+            max_tokens=int(os.getenv("MAX_TOKENS")) if os.getenv("MAX_TOKENS") else None,
+        )
+
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert to dictionary"""
+        return self.dict()  # Pydantic v1 API; on Pydantic v2, use self.model_dump() instead
+```
+
+First, configuration items are grouped logically into `LLM configuration`, `System configuration`, and so on, making the structure clear at a glance. Second, each item has a reasonable default value, so the framework works with zero configuration. Most important is the `from_env()` class method, which lets users override defaults (here `DEBUG`, `LOG_LEVEL`, `TEMPERATURE`, and `MAX_TOKENS`) through environment variables without modifying code, which is especially useful when deploying to different environments.
+
+### 7.3.3 Agent Abstract Base Class
+
+The `Agent` class is the top-level abstraction of the entire framework. It defines the common behaviors and attributes that every agent should have without prescribing how they are implemented. We implement it with Python's `abc` (Abstract Base Classes) module, which forces all concrete agent implementations (such as `SimpleAgent` and `ReActAgent` in the following sections) to follow the same "interface."
+
+```python
+"""Agent base class"""
+from abc import ABC, abstractmethod
+from typing import Optional, Any
+from .message import Message
+from .llm import HelloAgentsLLM
+from .config import Config
+
+class Agent(ABC):
+    """Agent base class"""
+
+    def __init__(
+        self,
+        name: str,
+        llm: HelloAgentsLLM,
+        system_prompt: Optional[str] = None,
+        config: Optional[Config] = None
+    ):
+        self.name = name
+        self.llm = llm
+        self.system_prompt = system_prompt
+        self.config = config or Config()
+        self._history: list[Message] = []
+
+    @abstractmethod
+    def run(self, input_text: str, **kwargs) -> str:
+        """Run Agent"""
+        pass
+
+    def add_message(self, message: Message):
+        """Add message to history"""
+        self._history.append(message)
+
+    def clear_history(self):
+        """Clear history"""
+        self._history.clear()
+
+    def get_history(self) -> list[Message]:
+        """Get history"""
+        return self._history.copy()
+
+    def __str__(self) -> str:
+        return f"Agent(name={self.name}, provider={self.llm.provider})"
+```
+
+The design of this class embodies the abstraction principle of object-oriented programming. First, by inheriting from `ABC` it becomes an abstract class that cannot be instantiated directly. Its constructor `__init__` makes the core dependencies of an Agent explicit: name, LLM instance, system prompt, and configuration. Most importantly, the `run` method is decorated with `@abstractmethod`, which forces every subclass to implement it and so guarantees a unified execution entry point for all agents. The base class also provides common history-management methods that work together with the `Message` class; `get_history()` returns a copy, so callers cannot mutate the internal list by accident.
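+
+The effect of `ABC` and `@abstractmethod` can be seen in a small self-contained sketch. `SketchAgent` and `EchoAgent` are hypothetical stand-ins that keep only the name, the history list, and `run`; they are not the framework's classes.
+
+```python
+from abc import ABC, abstractmethod
+from typing import List
+
+class SketchAgent(ABC):
+    """Stand-in for the Agent base class, reduced to name, history and run()."""
+
+    def __init__(self, name: str):
+        self.name = name
+        self._history: List[str] = []
+
+    @abstractmethod
+    def run(self, input_text: str, **kwargs) -> str:
+        """Subclasses must provide the execution entry point."""
+
+class EchoAgent(SketchAgent):
+    """Hypothetical concrete agent that simply echoes its input."""
+
+    def run(self, input_text: str, **kwargs) -> str:
+        reply = f"{self.name} heard: {input_text}"
+        self._history.extend([input_text, reply])
+        return reply
+
+try:
+    SketchAgent("abstract")  # abstract classes cannot be instantiated
+    instantiation_error = None
+except TypeError as exc:
+    instantiation_error = exc
+
+echo_reply = EchoAgent("echo").run("hello")
+print(instantiation_error)
+print(echo_reply)
+```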
+
+At this point, we have completed the design and implementation of the core basic components of the `HelloAgents` framework.
+
+## 7.4 Framework Implementation of Agent Paradigms
+
+This section refactors the three classic Agent paradigms built in Chapter 4 (ReAct, Plan-and-Solve, Reflection) into the framework and adds SimpleAgent as a basic conversation paradigm. We will turn these independent Agent implementations into framework components that share a unified architecture. The refactoring revolves around three core goals:
+
+1. **Systematic Improvement of Prompt Engineering**: Deeply optimize the prompts from Chapter 4, moving from task-specific designs to generalized ones while strengthening format constraints and role definitions.
+2. **Standardization and Unification of Interfaces and Formats**: Establish a unified Agent base class and standardized running interface, with all Agents following the same initialization parameters, method signatures, and history management mechanisms.
+3. **Highly Configurable Customization Capabilities**: Support user-defined prompt templates, configuration parameters, and execution strategies.
+
+### 7.4.1 SimpleAgent
+
+SimpleAgent is the most basic Agent implementation and shows how to build a complete conversational agent on top of the framework. We will rewrite it by subclassing the framework's `SimpleAgent`. First, create a `my_simple_agent.py` file in your project directory:
+
+```python
+# my_simple_agent.py
+from typing import Optional, Iterator
+from hello_agents import SimpleAgent, HelloAgentsLLM, Config, Message
+
+class MySimpleAgent(SimpleAgent):
+    """
+    Rewritten simple conversation Agent
+    Demonstrates how to build custom Agent based on framework base class
+    """
+
+    def __init__(
+        self,
+        name: str,
+        llm: HelloAgentsLLM,
+        system_prompt: Optional[str] = None,
+        config: Optional[Config] = None,
+        tool_registry: Optional['ToolRegistry'] = None,
+        enable_tool_calling: bool = True
+    ):
+        super().__init__(name, llm, system_prompt, config)
+        self.tool_registry = tool_registry
+        self.enable_tool_calling = enable_tool_calling and tool_registry is not None
+        print(f"✅ {name} initialization complete, tool calling: {'enabled' if self.enable_tool_calling else 'disabled'}")
+```
+
+Next, we need to override the abstract method `run` of the Agent base class. SimpleAgent supports optional tool calling functionality, which also facilitates expansion in subsequent chapters:
+
+```python
+# Continue adding in my_simple_agent.py
+import re
+
+class MySimpleAgent(SimpleAgent):
+    # ... previous __init__ method
+
+    def run(self, input_text: str, max_tool_iterations: int = 3, **kwargs) -> str:
+        """
+        Rewritten run method - implements simple conversation logic, supports optional tool calling
+        """
+        print(f"🤖 {self.name} is processing: {input_text}")
+
+        # Build message list
+        messages = []
+
+        # Add system message (may include tool information)
+        enhanced_system_prompt = self._get_enhanced_system_prompt()
+        messages.append({"role": "system", "content": enhanced_system_prompt})
+
+        # Add history messages
+        for msg in self._history:
+            messages.append({"role": msg.role, "content": msg.content})
+
+        # Add current user message
+        messages.append({"role": "user", "content": input_text})
+
+        # If tool calling is not enabled, use simple conversation logic
+        if not self.enable_tool_calling:
+            response = self.llm.invoke(messages, **kwargs)
+            self.add_message(Message(input_text, "user"))
+            self.add_message(Message(response, "assistant"))
+            print(f"✅ {self.name} response complete")
+            return response
+
+        # Logic supporting multiple rounds of tool calling
+        return self._run_with_tools(messages, input_text, max_tool_iterations, **kwargs)
+
+    def _get_enhanced_system_prompt(self) -> str:
+        """Build enhanced system prompt, including tool information"""
+        base_prompt = self.system_prompt or "You are a helpful AI assistant."
+
+        if not self.enable_tool_calling or not self.tool_registry:
+            return base_prompt
+
+        # Get tool description
+        tools_description = self.tool_registry.get_tools_description()
+        if not tools_description or tools_description == "No tools available":
+            return base_prompt
+
+        tools_section = "\n\n## Available Tools\n"
+        tools_section += "You can use the following tools to help answer questions:\n"
+        tools_section += tools_description + "\n"
+
+        tools_section += "\n## Tool Calling Format\n"
+        tools_section += "When you need to use a tool, please use the following format:\n"
+        tools_section += "`[TOOL_CALL:{tool_name}:{parameters}]`\n"
+        tools_section += "For example: `[TOOL_CALL:search:Python programming]` or `[TOOL_CALL:memory:recall=user information]`\n\n"
+        tools_section += "Tool calling results will be automatically inserted into the conversation, and then you can continue answering based on the results.\n"
+
+        return base_prompt + tools_section
+```
+
+Now we implement the core logic of tool calling:
+
+```python
+# Continue adding in my_simple_agent.py
+class MySimpleAgent(SimpleAgent):
+    # ... previous methods
+
+    def _run_with_tools(self, messages: list, input_text: str, max_tool_iterations: int, **kwargs) -> str:
+        """Running logic supporting tool calling"""
+        current_iteration = 0
+        final_response = ""
+
+        while current_iteration < max_tool_iterations:
+            # Call LLM
+            response = self.llm.invoke(messages, **kwargs)
+
+            # Check if there are tool calls
+            tool_calls = self._parse_tool_calls(response)
+
+            if tool_calls:
+                print(f"🔧 Detected {len(tool_calls)} tool calls")
+                # Execute all tool calls and collect results
+                tool_results = []
+                clean_response = response
+
+                for call in tool_calls:
+                    result = self._execute_tool_call(call['tool_name'], call['parameters'])
+                    tool_results.append(result)
+                    # Remove tool call markers from response
+                    clean_response = clean_response.replace(call['original'], "")
+
+                # Build message containing tool results
+                messages.append({"role": "assistant", "content": clean_response})
+
+                # Add tool results
+                tool_results_text = "\n\n".join(tool_results)
+                messages.append({"role": "user", "content": f"Tool execution results:\n{tool_results_text}\n\nPlease provide a complete answer based on these results."})
+
+                current_iteration += 1
+                continue
+
+            # No tool calls, this is the final answer
+            final_response = response
+            break
+
+        # If maximum iterations exceeded, get last response
+        if current_iteration >= max_tool_iterations and not final_response:
+            final_response = self.llm.invoke(messages, **kwargs)
+
+        # Save to history
+        self.add_message(Message(input_text, "user"))
+        self.add_message(Message(final_response, "assistant"))
+        print(f"✅ {self.name} response complete")
+
+        return final_response
+
+    def _parse_tool_calls(self, text: str) -> list:
+        """Parse tool calls in text"""
+        pattern = r'\[TOOL_CALL:([^:]+):([^\]]+)\]'
+        matches = re.findall(pattern, text)
+
+        tool_calls = []
+        for tool_name, parameters in matches:
+            tool_calls.append({
+                'tool_name': tool_name.strip(),
+                'parameters': parameters.strip(),
+                'original': f'[TOOL_CALL:{tool_name}:{parameters}]'
+            })
+
+        return tool_calls
+
+    def _execute_tool_call(self, tool_name: str, parameters: str) -> str:
+        """Execute tool call"""
+        if not self.tool_registry:
+            return f"❌ Error: Tool registry not configured"
+
+        try:
+            # Intelligent parameter parsing
+            if tool_name == 'calculator':
+                # Calculator tool directly passes expression
+                result = self.tool_registry.execute_tool(tool_name, parameters)
+            else:
+                # Other tools use intelligent parameter parsing
+                param_dict = self._parse_tool_parameters(tool_name, parameters)
+                tool = self.tool_registry.get_tool(tool_name)
+                if not tool:
+                    return f"❌ Error: Tool '{tool_name}' not found"
+                result = tool.run(param_dict)
+
+            return f"🔧 Tool {tool_name} execution result:\n{result}"
+
+        except Exception as e:
+            return f"❌ Tool call failed: {str(e)}"
+
+    def _parse_tool_parameters(self, tool_name: str, parameters: str) -> dict:
+        """Intelligently parse tool parameters"""
+        param_dict = {}
+
+        if '=' in parameters:
+            # Format: key=value or action=search,query=Python
+            if ',' in parameters:
+                # Multiple parameters: action=search,query=Python,limit=3
+                pairs = parameters.split(',')
+                for pair in pairs:
+                    if '=' in pair:
+                        key, value = pair.split('=', 1)
+                        param_dict[key.strip()] = value.strip()
+            else:
+                # Single parameter: key=value
+                key, value = parameters.split('=', 1)
+                param_dict[key.strip()] = value.strip()
+        else:
+            # Directly pass parameters, intelligently infer based on tool type
+            if tool_name == 'search':
+                param_dict = {'query': parameters}
+            elif tool_name == 'memory':
+                param_dict = {'action': 'search', 'query': parameters}
+            else:
+                param_dict = {'input': parameters}
+
+        return param_dict
+```
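+
+To see what the `[TOOL_CALL:...]` pattern and the `key=value` parsing actually extract, the following standalone sketch reuses the same regular expression outside the class. The helper names `find_tool_calls` and `parse_key_value_parameters` are introduced here for illustration only.
+
+```python
+import re
+from typing import Dict, List, Tuple
+
+TOOL_CALL_PATTERN = r'\[TOOL_CALL:([^:]+):([^\]]+)\]'
+
+def find_tool_calls(text: str) -> List[Tuple[str, str]]:
+    """Return (tool_name, raw_parameters) pairs found in an LLM response."""
+    return [(name.strip(), params.strip())
+            for name, params in re.findall(TOOL_CALL_PATTERN, text)]
+
+def parse_key_value_parameters(parameters: str) -> Dict[str, str]:
+    """Standalone copy of the key=value branch of _parse_tool_parameters."""
+    param_dict: Dict[str, str] = {}
+    for pair in parameters.split(','):
+        if '=' in pair:
+            key, value = pair.split('=', 1)
+            param_dict[key.strip()] = value.strip()
+    return param_dict
+
+response = ("Let me check. [TOOL_CALL:calculator:15 * 8 + 32] "
+            "[TOOL_CALL:memory:action=search,query=Python]")
+calls = find_tool_calls(response)
+memory_params = parse_key_value_parameters(calls[1][1])
+# A comma inside a value is treated as a separator, so the tail is dropped.
+lossy_params = parse_key_value_parameters("query=red, green")
+print(calls, memory_params, lossy_params)
+```
+
+The last call illustrates a limitation of this lightweight format: values that contain commas (or a closing `]`) are truncated, so tools that need such inputs would require a stricter encoding such as JSON.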
+
+We can also add streaming response functionality and convenience methods to the custom Agent:
+
+```python
+# Continue adding in my_simple_agent.py
+class MySimpleAgent(SimpleAgent):
+    # ... previous methods
+
+    def stream_run(self, input_text: str, **kwargs) -> Iterator[str]:
+        """
+        Custom streaming run method
+        """
+        print(f"🌊 {self.name} starting streaming processing: {input_text}")
+
+        messages = []
+
+        if self.system_prompt:
+            messages.append({"role": "system", "content": self.system_prompt})
+
+        for msg in self._history:
+            messages.append({"role": msg.role, "content": msg.content})
+
+        messages.append({"role": "user", "content": input_text})
+
+        # Stream call LLM
+        full_response = ""
+        print("📝 Real-time response: ", end="")
+        for chunk in self.llm.stream_invoke(messages, **kwargs):
+            full_response += chunk
+            print(chunk, end="", flush=True)
+            yield chunk
+
+        print()  # New line
+
+        # Save complete conversation to history
+        self.add_message(Message(input_text, "user"))
+        self.add_message(Message(full_response, "assistant"))
+        print(f"✅ {self.name} streaming response complete")
+
+    def add_tool(self, tool) -> None:
+        """Add tool to Agent (convenience method)"""
+        if not self.tool_registry:
+            from hello_agents import ToolRegistry
+            self.tool_registry = ToolRegistry()
+            self.enable_tool_calling = True
+
+        self.tool_registry.register_tool(tool)
+        print(f"🔧 Tool '{tool.name}' added")
+
+    def has_tools(self) -> bool:
+        """Check if tools are available"""
+        return self.enable_tool_calling and self.tool_registry is not None
+
+    def remove_tool(self, tool_name: str) -> bool:
+        """Remove tool (convenience method)"""
+        if self.tool_registry:
+            self.tool_registry.unregister(tool_name)
+            return True
+        return False
+
+    def list_tools(self) -> list:
+        """List all available tools"""
+        if self.tool_registry:
+            return self.tool_registry.list_tools()
+        return []
+```
+
+Create a test file `test_simple_agent.py`:
+
+```python
+# test_simple_agent.py
+from dotenv import load_dotenv
+from hello_agents import HelloAgentsLLM, ToolRegistry
+from hello_agents.tools import CalculatorTool
+from my_simple_agent import MySimpleAgent
+
+# Load environment variables
+load_dotenv()
+
+# Create LLM instance
+llm = HelloAgentsLLM()
+
+# Test 1: Basic conversation Agent (no tools)
+print("=== Test 1: Basic Conversation ===")
+basic_agent = MySimpleAgent(
+    name="Basic Assistant",
+    llm=llm,
+    system_prompt="You are a friendly AI assistant, please answer questions in a concise and clear manner."
+)
+
+response1 = basic_agent.run("Hello, please introduce yourself")
+print(f"Basic conversation response: {response1}\n")
+
+# Test 2: Agent with tools
+print("=== Test 2: Tool-Enhanced Conversation ===")
+tool_registry = ToolRegistry()
+calculator = CalculatorTool()
+tool_registry.register_tool(calculator)
+
+enhanced_agent = MySimpleAgent(
+    name="Enhanced Assistant",
+    llm=llm,
+    system_prompt="You are an intelligent assistant that can use tools to help users.",
+    tool_registry=tool_registry,
+    enable_tool_calling=True
+)
+
+response2 = enhanced_agent.run("Please help me calculate 15 * 8 + 32")
+print(f"Tool-enhanced response: {response2}\n")
+
+# Test 3: Streaming response
+print("=== Test 3: Streaming Response ===")
+print("Streaming response: ", end="")
+for chunk in basic_agent.stream_run("Please explain what artificial intelligence is"):
+    pass  # Content already printed in real-time in stream_run
+
+# Test 4: Dynamic tool addition
+print("\n=== Test 4: Dynamic Tool Management ===")
+print(f"Before adding tool: {basic_agent.has_tools()}")
+basic_agent.add_tool(calculator)
+print(f"After adding tool: {basic_agent.has_tools()}")
+print(f"Available tools: {basic_agent.list_tools()}")
+
+# View conversation history
+print(f"\nConversation history: {len(basic_agent.get_history())} messages")
+```
+
+In this section, by building on the framework's `SimpleAgent` (and, through it, the `Agent` base class), we built a basic conversational agent, `MySimpleAgent`, that follows the framework's conventions. Besides plain conversation, it offers optional tool calling, streaming responses, and convenience methods for managing tools.
+
+### 7.4.2 ReActAgent
+
+The framework-based ReActAgent keeps the core logic unchanged while improving code organization and maintainability, mainly through prompt optimization and integration with the framework's tool system.
+
+(1) Improvement of Prompt Template
+
+The new template keeps the original format requirements, stresses that only one step may be executed per response to avoid confusion, and clarifies when each of the two Action types should be used.
+
+```python
+MY_REACT_PROMPT = """You are an AI assistant with reasoning and action capabilities. You can analyze problems through thinking, then call appropriate tools to obtain information, and finally provide accurate answers.
+
+## Available Tools
+{tools}
+
+## Workflow
+Please respond strictly in the following format, executing only one step at a time:
+
+Thought: Analyze the current problem and think about what information is needed or what action to take.
+Action: Choose an action, the format must be one of the following:
+- `{{tool_name}}[{{tool_input}}]` - Call specified tool
+- `Finish[final answer]` - When you have enough information to give a final answer
+
+## Important Reminders
+1. Each response must include both Thought and Action parts
+2. Tool call format must strictly follow: tool_name[parameters]
+3. Only use Finish when you are confident you have enough information to answer the question
+4. If the information returned by the tool is insufficient, continue using other tools or different parameters of the same tool
+
+## Current Task
+**Question:** {question}
+
+## Execution History
+{history}
+
+Now begin your reasoning and action:
+"""
+```
+
+(2) Complete Implementation of Rewritten ReActAgent
+
+Create a `my_react_agent.py` file to rewrite ReActAgent:
+
+```python
+# my_react_agent.py
+import re
+from typing import Optional, List, Tuple
+from hello_agents import ReActAgent, HelloAgentsLLM, Config, Message, ToolRegistry
+
+class MyReActAgent(ReActAgent):
+    """
+    Rewritten ReAct Agent - Agent combining reasoning and action
+    """
+
+    def __init__(
+        self,
+        name: str,
+        llm: HelloAgentsLLM,
+        tool_registry: ToolRegistry,
+        system_prompt: Optional[str] = None,
+        config: Optional[Config] = None,
+        max_steps: int = 5,
+        custom_prompt: Optional[str] = None
+    ):
+        super().__init__(name, llm, system_prompt, config)
+        self.tool_registry = tool_registry
+        self.max_steps = max_steps
+        self.current_history: List[str] = []
+        self.prompt_template = custom_prompt if custom_prompt else MY_REACT_PROMPT
+        print(f"✅ {name} initialization complete, max steps: {max_steps}")
+```
+
+The initialization parameters are:
+
+- `name`: Name of the Agent.
+- `llm`: Instance of `HelloAgentsLLM`, responsible for communicating with the large language model.
+- `tool_registry`: Instance of `ToolRegistry`, used to manage and execute tools available to the Agent.
+- `system_prompt`: System prompt, used to set the Agent's role and behavioral guidelines.
+- `config`: Configuration object, used to pass framework-level settings.
+- `max_steps`: Maximum execution steps of the ReAct loop, preventing infinite loops.
+- `custom_prompt`: Custom prompt template, used to replace the default ReAct prompt.
+
+The framework-based ReActAgent decomposes the execution process into clear steps:
+
+```python
+# Method of MyReActAgent in my_react_agent.py (shown without class indentation)
+def run(self, input_text: str, **kwargs) -> str:
+    """Run ReAct Agent"""
+    self.current_history = []
+    current_step = 0
+
+    print(f"\n🤖 {self.name} starting to process question: {input_text}")
+
+    while current_step < self.max_steps:
+        current_step += 1
+        print(f"\n--- Step {current_step} ---")
+
+        # 1. Build prompt
+        tools_desc = self.tool_registry.get_tools_description()
+        history_str = "\n".join(self.current_history)
+        prompt = self.prompt_template.format(
+            tools=tools_desc,
+            question=input_text,
+            history=history_str
+        )
+
+        # 2. Call LLM
+        messages = [{"role": "user", "content": prompt}]
+        response_text = self.llm.invoke(messages, **kwargs)
+
+        # 3. Parse output
+        thought, action = self._parse_output(response_text)
+
+        # 4. Check completion condition
+        if action and action.startswith("Finish"):
+            final_answer = self._parse_action_input(action)
+            self._save_to_history(input_text, final_answer)
+            return final_answer
+
+        # 5. Execute tool call
+        if action:
+            tool_name, tool_input = self._parse_action(action)
+            observation = self.tool_registry.execute_tool(tool_name, tool_input)
+            self.current_history.append(f"Action: {action}")
+            self.current_history.append(f"Observation: {observation}")
+
+    # Reached maximum steps
+    final_answer = "Sorry, I cannot complete this task within the limited number of steps."
+    self._save_to_history(input_text, final_answer)
+    return final_answer
+```
+
+With this refactoring, the ReAct paradigm is integrated into the framework. The core improvements are the use of the unified `ToolRegistry` interface and a configurable, stricter prompt template that makes the think-act loop more stable. Because ReAct test cases require tool calls, the test code is provided at the end of the document.
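+
+The `run` method above relies on helpers such as `_parse_output` and `_parse_action`, whose bodies are not shown in this section. As a rough sketch of what such parsing could look like (the function names and regular expressions below are illustrative assumptions, not the framework's actual implementation):
+
+```python
+import re
+from typing import Optional, Tuple
+
+def parse_output(text: str) -> Tuple[Optional[str], Optional[str]]:
+    """Split a ReAct-style reply into its Thought and Action parts."""
+    thought_match = re.search(r"Thought:\s*(.*?)(?=\nAction:|\Z)", text, re.DOTALL)
+    action_match = re.search(r"Action:\s*(.*)", text, re.DOTALL)
+    thought = thought_match.group(1).strip() if thought_match else None
+    action = action_match.group(1).strip() if action_match else None
+    return thought, action
+
+def parse_action(action: str) -> Tuple[Optional[str], Optional[str]]:
+    """Split `tool_name[tool_input]` (or `Finish[answer]`) into two strings."""
+    match = re.match(r"(\w+)\[(.*)\]\s*$", action, re.DOTALL)
+    if not match:
+        return None, None
+    return match.group(1), match.group(2)
+
+sample = "Thought: I should compute this.\nAction: calculator[15 * 8 + 32]"
+thought, action = parse_output(sample)
+tool_name, tool_input = parse_action(action)
+finish_name, finish_input = parse_action("Finish[The answer is 152]")
+print(thought, tool_name, tool_input, finish_name, finish_input)
+```
+
+Returning `None` for unparseable output lets the caller decide whether to retry, record an error observation, or stop, instead of raising inside the loop.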
+
+### 7.4.3 ReflectionAgent
+
+Because the core logic of this type of Agent was already implemented in Chapter 4, only the corresponding prompts are given here. Unlike the Chapter 4 prompts, which were written specifically for code generation, the framework version uses a generalized design that suits scenarios such as text generation, analysis, and creative writing, and it supports deep customization through the `custom_prompts` parameter.
+
+```python
+DEFAULT_PROMPTS = {
+    "initial": """
+Please complete the task according to the following requirements:
+
+Task: {task}
+
+Please provide a complete and accurate answer.
+""",
+    "reflect": """
+Please carefully review the following answer and identify possible problems or areas for improvement:
+
+# Original Task:
+{task}
+
+# Current Answer:
+{content}
+
+Please analyze the quality of this answer, point out deficiencies, and provide specific improvement suggestions.
+If the answer is already good, please respond "No improvement needed".
+""",
+    "refine": """
+Please improve your answer based on the feedback:
+
+# Original Task:
+{task}
+
+# Previous Answer:
+{last_attempt}
+
+# Feedback:
+{feedback}
+
+Please provide an improved answer.
+"""
+}
+```
+
+You can try building your own `MyReflectionAgent` based on the code from Chapter 4 and the ReAct implementation above. The following test code can be used to verify your implementation.
+
+```python
+# test_reflection_agent.py
+from dotenv import load_dotenv
+from hello_agents import HelloAgentsLLM
+from my_reflection_agent import MyReflectionAgent
+
+load_dotenv()
+llm = HelloAgentsLLM()
+
+# Use default general prompts
+general_agent = MyReflectionAgent(name="My Reflection Assistant", llm=llm)
+
+# Use custom code generation prompts (similar to Chapter 4)
+code_prompts = {
+    "initial": "You are a Python expert, please write a function: {task}",
+    "reflect": "Please review the algorithm efficiency of the code:\nTask: {task}\nCode: {content}",
+    "refine": "Please optimize the code based on feedback:\nTask: {task}\nFeedback: {feedback}"
+}
+code_agent = MyReflectionAgent(
+    name="My Code Generation Assistant",
+    llm=llm,
+    custom_prompts=code_prompts
+)
+
+# Test usage
+result = general_agent.run("Write a short article about the development history of artificial intelligence")
+print(f"Final result: {result}")
+```
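+
+For readers who want a starting point, the loop behind a reflection agent can be sketched as a plain function. Everything here is a hypothetical simplification: `reflection_loop`, the shortened prompts, and the scripted stand-in for the LLM are not part of the framework; only the three prompt keys and their placeholders follow `DEFAULT_PROMPTS` above.
+
+```python
+from typing import Callable, Dict
+
+PROMPTS: Dict[str, str] = {
+    "initial": "Task: {task}\nPlease provide a complete and accurate answer.",
+    "reflect": ("Task: {task}\nAnswer: {content}\n"
+                'If the answer is already good, respond "No improvement needed".'),
+    "refine": ("Task: {task}\nPrevious answer: {last_attempt}\n"
+               "Feedback: {feedback}\nPlease provide an improved answer."),
+}
+
+def reflection_loop(task: str, llm: Callable[[str], str],
+                    prompts: Dict[str, str] = PROMPTS,
+                    max_iterations: int = 3) -> str:
+    """Draft an answer, then alternate reflect/refine until no changes are requested."""
+    answer = llm(prompts["initial"].format(task=task))
+    for _ in range(max_iterations):
+        feedback = llm(prompts["reflect"].format(task=task, content=answer))
+        if "no improvement needed" in feedback.lower():
+            break  # the reviewer is satisfied
+        answer = llm(prompts["refine"].format(
+            task=task, last_attempt=answer, feedback=feedback))
+    return answer
+
+# Scripted stand-in for an LLM so the sketch runs offline.
+replies = iter(["draft v1", "Too short; add detail.", "draft v2", "No improvement needed"])
+
+def scripted_llm(prompt: str) -> str:
+    return next(replies)
+
+final_answer = reflection_loop("Summarize ReAct", scripted_llm)
+print(final_answer)
+```
+
+The `max_iterations` cap matters in practice: without it, a reviewer prompt that always finds something to criticize would keep the loop running indefinitely.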
+
+### 7.4.4 PlanAndSolveAgent
+
+Unlike the free-text plan output in Chapter 4, the framework version mandates that the Planner output the plan in Python list format and provides a complete exception handling mechanism to ensure stable execution of subsequent steps. Framework-based Plan-and-Solve prompts:
+
+````python
+# Default planner prompt template
+DEFAULT_PLANNER_PROMPT = """
+You are a top AI planning expert. Your task is to decompose complex problems raised by users into an action plan consisting of multiple simple steps.
+Please ensure that each step in the plan is an independent, executable subtask and is strictly arranged in logical order.
+Your output must be a Python list, where each element is a string describing a subtask.
+
+Question: {question}
+
+Please output your plan strictly in the following format:
+```python
+["Step 1", "Step 2", "Step 3", ...]
+```
+"""
+
+# Default executor prompt template
+DEFAULT_EXECUTOR_PROMPT = """
+You are a top AI execution expert. Your task is to solve problems step by step strictly according to the given plan.
+You will receive the original question, the complete plan, and the steps and results completed so far.
+Please focus on solving the "current step" and only output the final answer for that step, without any additional explanations or dialogue.
+
+# Original Question:
+{question}
+
+# Complete Plan:
+{plan}
+
+# Historical Steps and Results:
+{history}
+
+# Current Step:
+{current_step}
+
+Please only output the answer for the "current step":
+"""
+````
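+
+Because the planner is asked to return a Python list, the agent has to turn the model's text back into a real list and cope with malformed replies. One cautious way to do this is `ast.literal_eval`, which evaluates literals only and does not execute code. The helper name `parse_plan` and its fall-back-to-empty-list behaviour are assumptions for illustration.
+
+```python
+import ast
+import re
+from typing import List
+
+FENCE = "`" * 3  # built programmatically so this example nests safely in Markdown
+
+def parse_plan(response_text: str) -> List[str]:
+    """Extract a list of step strings from a planner reply; return [] on failure."""
+    match = re.search(FENCE + r"(?:python)?\s*(.*?)" + FENCE, response_text, re.DOTALL)
+    candidate = match.group(1) if match else response_text
+    try:
+        plan = ast.literal_eval(candidate.strip())
+    except (ValueError, SyntaxError):
+        return []  # the reply was not a valid Python literal
+    if not isinstance(plan, list):
+        return []  # a literal, but not a list of steps
+    return [str(step) for step in plan]
+
+reply = f'Here is the plan:\n{FENCE}python\n["Compute Monday", "Compute Tuesday"]\n{FENCE}'
+good_plan = parse_plan(reply)
+bad_plan = parse_plan("I could not produce a plan.")
+print(good_plan, bad_plan)
+```
+
+An empty list gives the caller an unambiguous signal that planning failed, so it can retry the planner or report the problem rather than executing a half-parsed plan.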
+
+As before, a comprehensive test file, `test_plan_solve_agent.py`, is provided; the `MyPlanAndSolveAgent` implementation itself is left for you to design.
+
+```python
+# test_plan_solve_agent.py
+from dotenv import load_dotenv
+from hello_agents.core.llm import HelloAgentsLLM
+from my_plan_solve_agent import MyPlanAndSolveAgent
+
+# Load environment variables
+load_dotenv()
+
+# Create LLM instance
+llm = HelloAgentsLLM()
+
+# Create custom PlanAndSolveAgent
+agent = MyPlanAndSolveAgent(
+    name="My Planning Execution Assistant",
+    llm=llm
+)
+
+# Test complex problem
+question = "A fruit store sold 15 apples on Monday. The number of apples sold on Tuesday was twice that of Monday. The number sold on Wednesday was 5 less than Tuesday. How many apples were sold in total over these three days?"
+
+result = agent.run(question)
+print(f"\nFinal result: {result}")
+
+# View conversation history
+print(f"Conversation history: {len(agent.get_history())} messages")
+```
+
+Finally, you can add a new set of prompts and try implementing the `custom_prompts` parameter to load them.
+
+```python
+# Create custom prompts specifically for math problems
+math_prompts = {
+    "planner": """
+You are a math problem planning expert. Please decompose the math problem into calculation steps:
+
+Question: {question}
+
+Output format:
+python
+["Calculation step 1", "Calculation step 2", "Sum total"]
+
+""",
+    "executor": """
+You are a math calculation expert. Please calculate the current step:
+
+Question: {question}
+Plan: {plan}
+History: {history}
+Current step: {current_step}
+
+Please only output the numerical result:
+"""
+}
+
+# Create math-specific Agent using custom prompts
+math_agent = MyPlanAndSolveAgent(
+    name="Math Calculation Assistant",
+    llm=llm,
+    custom_prompts=math_prompts
+)
+
+# Test math problem
+math_result = math_agent.run(question)
+print(f"Math-specific Agent result: {math_result}")
+```
+
+As shown in Table 7.2, through this framework refactoring, we not only maintained the core functionality of various Agent paradigms from Chapter 4 but also significantly improved code organization, maintainability, and extensibility. All Agents now share a unified infrastructure while maintaining their respective characteristics and advantages.
+
+<div align="center">
+  <p>Table 7.2 Comparison of Agent Implementations Across Chapters</p>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/7-figures/table-02.png" alt="" width="90%"/>
+</div>
+
+## 7.5 Tool System
+
+Building on the Agent infrastructure from the previous sections, this section explores the design and implementation of the tool system. We start with the underlying infrastructure and then move on to custom tool development. The learning objectives revolve around three core aspects:
+
+1. **Unified Tool Abstraction and Management**: Establish a standardized Tool base class and ToolRegistry registration mechanism to provide unified infrastructure for tool development, registration, discovery, and execution.
+
+2. **Practice-Driven Tool Development**: Using mathematical calculation tools as a case study, demonstrate how to design and implement custom tools, allowing readers to master the complete process of tool development.
+
+3. **Advanced Integration and Optimization Strategies**: Through the design of multi-source search tools, demonstrate how to integrate multiple external services, implement intelligent backend selection, result merging, and fault tolerance, reflecting the design thinking of the tool system in complex scenarios.
+
+### 7.5.1 Tool Base Class and Registration Mechanism Design
+
+When building an extensible tool system, we need to first establish a set of standardized infrastructure. This infrastructure includes the Tool base class, ToolRegistry registry, and tool management mechanisms.
+
+(1) Abstract Design of Tool Base Class
+
+The Tool base class is the core abstraction of the entire tool system, defining the interface specifications that all tools must follow:
+
+````python
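+from abc import ABC, abstractmethod
+from typing import Any, Dict, List
+
+# ToolParameter is defined in part (2) below
+class Tool(ABC):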
+    """Tool base class"""
+
+    def __init__(self, name: str, description: str):
+        self.name = name
+        self.description = description
+
+    @abstractmethod
+    def run(self, parameters: Dict[str, Any]) -> str:
+        """Execute tool"""
+        pass
+
+    @abstractmethod
+    def get_parameters(self) -> List[ToolParameter]:
+        """Get tool parameter definitions"""
+        pass
+````
+
+This design reflects a core idea of object-oriented design. Through the unified `run` interface, every tool is executed the same way: it accepts a dictionary of parameters and returns a string result. Tools are also self-describing: `get_parameters` tells callers exactly which parameters a tool needs, and this introspection provides the basis for automated documentation generation and parameter validation. Metadata such as `name` and `description` makes tools easy to discover and understand.
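+
+As an illustration of the interface, here is a minimal concrete tool. To keep the sketch free of third-party dependencies it uses stand-ins (`SketchTool`, and a `dataclass` named `SketchToolParameter` in place of the pydantic-based `ToolParameter`); `WordCountTool` is a hypothetical example, not a tool shipped with the framework.
+
+```python
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+from typing import Any, Dict, List
+
+@dataclass
+class SketchToolParameter:
+    """Stand-in for ToolParameter with the same five fields."""
+    name: str
+    type: str
+    description: str
+    required: bool = True
+    default: Any = None
+
+class SketchTool(ABC):
+    """Stand-in mirroring the Tool interface shown above."""
+
+    def __init__(self, name: str, description: str):
+        self.name = name
+        self.description = description
+
+    @abstractmethod
+    def run(self, parameters: Dict[str, Any]) -> str: ...
+
+    @abstractmethod
+    def get_parameters(self) -> List[SketchToolParameter]: ...
+
+class WordCountTool(SketchTool):
+    """Hypothetical tool that counts whitespace-separated words."""
+
+    def __init__(self):
+        super().__init__("word_count", "Count the words in a piece of text")
+
+    def run(self, parameters: Dict[str, Any]) -> str:
+        text = str(parameters.get("text", ""))
+        return str(len(text.split()))  # results are always returned as strings
+
+    def get_parameters(self) -> List[SketchToolParameter]:
+        return [SketchToolParameter("text", "string", "Text to count words in")]
+
+tool = WordCountTool()
+word_count = tool.run({"text": "unified tool abstraction"})
+parameter_names = [p.name for p in tool.get_parameters()]
+print(word_count, parameter_names)
+```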
+
+(2) ToolParameter Parameter Definition System
+
+To support complex parameter validation and documentation generation, we designed the ToolParameter class:
+
+````python
+from typing import Any
+from pydantic import BaseModel
+
+class ToolParameter(BaseModel):
+    """Tool parameter definition"""
+    name: str
+    type: str
+    description: str
+    required: bool = True
+    default: Any = None
+````
+This design allows tools to precisely describe their parameter requirements, supporting type checking, default value setting, and automatic documentation generation.
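+
+As a rough illustration of how the two pieces fit together, the sketch below defines a hypothetical `EchoTool` and a `validate` helper that checks incoming parameters against `get_parameters()`. To stay dependency-free it uses a `dataclass` stand-in for `ToolParameter` instead of Pydantic's `BaseModel`. `EchoTool` and `validate` are assumptions made for illustration, not part of HelloAgents.
+
+```python
+from abc import ABC, abstractmethod
+from dataclasses import dataclass
+from typing import Any, Dict, List
+
+
+@dataclass
+class ToolParameter:
+    """Stand-in for the Pydantic model above (assumption: same fields)."""
+    name: str
+    type: str
+    description: str
+    required: bool = True
+    default: Any = None
+
+
+class Tool(ABC):
+    """Same interface as the Tool base class shown earlier."""
+
+    def __init__(self, name: str, description: str):
+        self.name = name
+        self.description = description
+
+    @abstractmethod
+    def run(self, parameters: Dict[str, Any]) -> str: ...
+
+    @abstractmethod
+    def get_parameters(self) -> List[ToolParameter]: ...
+
+
+class EchoTool(Tool):
+    """Hypothetical tool that repeats a piece of text."""
+
+    def __init__(self):
+        super().__init__("echo", "Repeat the given text")
+
+    def get_parameters(self) -> List[ToolParameter]:
+        return [
+            ToolParameter("text", "string", "Text to repeat"),
+            ToolParameter("times", "integer", "Repeat count", required=False, default=1),
+        ]
+
+    def run(self, parameters: Dict[str, Any]) -> str:
+        return " ".join([parameters["text"]] * parameters["times"])
+
+
+def validate(tool: Tool, parameters: Dict[str, Any]) -> Dict[str, Any]:
+    """Fill in defaults and reject calls that miss a required parameter."""
+    checked = dict(parameters)
+    for p in tool.get_parameters():
+        if p.name not in checked:
+            if p.required:
+                raise ValueError(f"Missing required parameter: {p.name}")
+            checked[p.name] = p.default
+    return checked
+
+
+tool = EchoTool()
+print(tool.run(validate(tool, {"text": "hi"})))  # "times" falls back to its default
+```
+
+Because the parameter list is data rather than documentation, the same `get_parameters()` output could also drive generated help text or a JSON schema.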
+
+(3) Implementation of ToolRegistry
+
+ToolRegistry is the management hub of the tool system, providing core functions such as tool registration, discovery, and execution. In this section, we mainly use the following functions:
+
+````python
+from typing import Any, Callable
+
+class ToolRegistry:
+    """HelloAgents tool registry"""
+
+    def __init__(self):
+        self._tools: dict[str, Tool] = {}
+        self._functions: dict[str, dict[str, Any]] = {}
+
+    def register_tool(self, tool: Tool):
+        """Register Tool object"""
+        if tool.name in self._tools:
+            print(f"⚠️ Warning: Tool '{tool.name}' already exists and will be overwritten.")
+        self._tools[tool.name] = tool
+        print(f"✅ Tool '{tool.name}' registered.")
+
+    def register_function(self, name: str, description: str, func: Callable[[str], str]):
+        """
+        Directly register a function as a tool (convenient method)
+
+        Args:
+            name: Tool name
+            description: Tool description
+            func: Tool function, accepts string parameter, returns string result
+        """
+        if name in self._functions:
+            print(f"⚠️ Warning: Tool '{name}' already exists and will be overwritten.")
+
+        self._functions[name] = {
+            "description": description,
+            "func": func
+        }
+        print(f"✅ Tool '{name}' registered.")
+````
+ToolRegistry supports two registration methods:
+
+1. **Tool Object Registration**: Suitable for complex tools, supports complete parameter definition and validation
+2. **Direct Function Registration**: Suitable for simple tools, quickly integrates existing functions
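+
+The test code later in this section calls `registry.execute_tool(name, input)`, which is not part of the excerpt above. The following is a minimal sketch of how such a dispatch could work across both registration styles. `MiniRegistry` is a simplified stand-in, and passing the raw text to Tool objects under an `"input"` key is an assumption, not the confirmed HelloAgents behavior.
+
+```python
+from typing import Any, Callable, Dict
+
+
+class MiniRegistry:
+    """Simplified stand-in for ToolRegistry (illustrative only)."""
+
+    def __init__(self) -> None:
+        self._tools: Dict[str, Any] = {}  # objects exposing .name and .run(dict)
+        self._functions: Dict[str, Callable[[str], str]] = {}
+
+    def register_tool(self, tool: Any) -> None:
+        self._tools[tool.name] = tool
+
+    def register_function(self, name: str, func: Callable[[str], str]) -> None:
+        self._functions[name] = func
+
+    def execute_tool(self, name: str, input_text: str) -> str:
+        """Look the tool up by name and run it, reporting errors as text."""
+        try:
+            if name in self._tools:
+                # Assumption: Tool objects receive the raw text under an "input" key
+                return self._tools[name].run({"input": input_text})
+            if name in self._functions:
+                return self._functions[name](input_text)
+        except Exception as e:
+            return f"Error while executing tool '{name}': {e}"
+        return f"Error: tool '{name}' not found"
+
+
+registry = MiniRegistry()
+registry.register_function("shout", lambda text: text.upper())
+print(registry.execute_tool("shout", "hello"))
+print(registry.execute_tool("missing", "hello"))
+```
+
+Returning errors as strings instead of raising keeps the Agent loop alive: the model sees the failure message as an observation and can try something else.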
+
+(4) Tool Discovery and Management Mechanism
+
+The registry provides rich tool management functions:
+
+````python
+def get_tools_description(self) -> str:
+    """Get formatted description string of all available tools"""
+    descriptions = []
+
+    # Tool object descriptions
+    for tool in self._tools.values():
+        descriptions.append(f"- {tool.name}: {tool.description}")
+
+    # Function tool descriptions
+    for name, info in self._functions.items():
+        descriptions.append(f"- {name}: {info['description']}")
+
+    return "\n".join(descriptions) if descriptions else "No tools available"
+````
+The description string generated by this method can be directly used to build the Agent's prompt, letting the Agent know what tools are available.
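+
+To make this concrete, the sketch below embeds a description string of that shape in a system prompt. The prompt wording and the `tool_name[input]` call format are hypothetical choices for illustration; the framework's own prompt templates may differ.
+
+```python
+def build_system_prompt(tools_description: str) -> str:
+    """Embed the registry's tool list in a system prompt (hypothetical template)."""
+    return (
+        "You are an assistant that can call tools.\n"
+        "Available tools:\n"
+        f"{tools_description}\n"
+        "To call a tool, reply with: tool_name[input]"
+    )
+
+
+# Same "- name: description" format that get_tools_description() produces
+tools_description = "\n".join([
+    "- my_calculator: Simple mathematical calculation tool",
+    "- search: An intelligent web search engine",
+])
+print(build_system_prompt(tools_description))
+```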
+
+### 7.5.2 Custom Tool Development
+
+With the infrastructure in place, let's see how to develop a complete custom tool. A mathematical calculation tool is a good example because it is simple and intuitive. The most direct way is to use ToolRegistry's function registration feature.
+
+Let's create a custom mathematical calculation tool. First, create `my_calculator_tool.py` in your project directory:
+
+```python
+# my_calculator_tool.py
+import ast
+import operator
+import math
+from hello_agents import ToolRegistry
+
+def my_calculate(expression: str) -> str:
+    """Simple mathematical calculation function"""
+    if not expression.strip():
+        return "Calculation expression cannot be empty"
+
+    # Supported basic operations
+    operators = {
+        ast.Add: operator.add,      # +
+        ast.Sub: operator.sub,      # -
+        ast.Mult: operator.mul,     # *
+        ast.Div: operator.truediv,  # /
+    }
+
+    # Supported basic functions
+    functions = {
+        'sqrt': math.sqrt,
+        'pi': math.pi,
+    }
+
+    try:
+        node = ast.parse(expression, mode='eval')
+        result = _eval_node(node.body, operators, functions)
+        return str(result)
+    except Exception:
+        return "Calculation failed, please check expression format"
+
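+def _eval_node(node, operators, functions):
+    """Simplified expression evaluation; raises ValueError for unsupported syntax"""
+    if isinstance(node, ast.Constant):
+        # Only accept numeric literals (reject strings, booleans, None)
+        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
+            return node.value
+    elif isinstance(node, ast.BinOp):
+        op = operators.get(type(node.op))
+        if op is not None:
+            left = _eval_node(node.left, operators, functions)
+            right = _eval_node(node.right, operators, functions)
+            return op(left, right)
+    elif isinstance(node, ast.Call):
+        # Only plain calls to whitelisted functions, e.g. sqrt(16)
+        if isinstance(node.func, ast.Name) and callable(functions.get(node.func.id)):
+            args = [_eval_node(arg, operators, functions) for arg in node.args]
+            return functions[node.func.id](*args)
+    elif isinstance(node, ast.Name):
+        # Named constants such as pi
+        if node.id in functions and not callable(functions[node.id]):
+            return functions[node.id]
+    # Anything else would otherwise fall through and silently return None
+    raise ValueError(f"Unsupported expression: {ast.dump(node)}")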
+
+def create_calculator_registry():
+    """Create tool registry containing calculator"""
+    registry = ToolRegistry()
+
+    # Register calculator function
+    registry.register_function(
+        name="my_calculator",
+        description="Simple mathematical calculation tool, supports basic operations (+,-,*,/) and sqrt function",
+        func=my_calculate
+    )
+
+    return registry
+```
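+
+One possible extension, sketched here under the assumption that you want exponentiation, unary minus, and a few more functions, is shown below. The names `extended_calculate`, `BINARY_OPS`, `UNARY_OPS`, `FUNCTIONS`, and `CONSTANTS` are hypothetical. The sketch keeps the whitelist approach, so anything not explicitly listed is rejected.
+
+```python
+import ast
+import math
+import operator
+
+BINARY_OPS = {
+    ast.Add: operator.add,
+    ast.Sub: operator.sub,
+    ast.Mult: operator.mul,
+    ast.Div: operator.truediv,
+    ast.Pow: operator.pow,  # caution: huge exponents can be very slow to evaluate
+}
+UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
+FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos}
+CONSTANTS = {"pi": math.pi, "e": math.e}
+
+
+def _is_number(value) -> bool:
+    return isinstance(value, (int, float)) and not isinstance(value, bool)
+
+
+def evaluate(node: ast.AST):
+    """Recursively evaluate a whitelisted subset of Python expressions."""
+    if isinstance(node, ast.Constant) and _is_number(node.value):
+        return node.value
+    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
+        return BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
+    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
+        return UNARY_OPS[type(node.op)](evaluate(node.operand))
+    if (
+        isinstance(node, ast.Call)
+        and isinstance(node.func, ast.Name)
+        and node.func.id in FUNCTIONS
+        and not node.keywords
+    ):
+        return FUNCTIONS[node.func.id](*[evaluate(arg) for arg in node.args])
+    if isinstance(node, ast.Name) and node.id in CONSTANTS:
+        return CONSTANTS[node.id]
+    raise ValueError(f"Unsupported expression element: {type(node).__name__}")
+
+
+def extended_calculate(expression: str) -> str:
+    """String-in, string-out wrapper suitable for register_function."""
+    try:
+        return str(evaluate(ast.parse(expression, mode="eval").body))
+    except Exception as e:
+        return f"Calculation failed: {e}"
+
+
+print(extended_calculate("sqrt(16) + 2 * 3"))
+print(extended_calculate("-2 ** 2"))
+print(extended_calculate("__import__('os')"))
+```
+
+Because the function keeps the `str -> str` signature, it could be registered in place of `my_calculate` without changing the rest of the file.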
+
+The tool supports the four basic arithmetic operations plus the `sqrt` function and the `pi` constant, which is enough for simple calculations. You can extend this file yourself to build a more complete calculator. We provide a test file, `test_my_calculator.py`, to help you verify the functionality:
+
+```python
+# test_my_calculator.py
+from dotenv import load_dotenv
+from my_calculator_tool import create_calculator_registry
+
+# Load environment variables
+load_dotenv()
+
+def test_calculator_tool():
+    """Test custom calculator tool"""
+
+    # Create registry containing calculator
+    registry = create_calculator_registry()
+
+    print("🧪 Testing Custom Calculator Tool\n")
+
+    # Simple test cases
+    test_cases = [
+        "2 + 3",           # Basic addition
+        "10 - 4",          # Basic subtraction
+        "5 * 6",           # Basic multiplication
+        "15 / 3",          # Basic division
+        "sqrt(16)",        # Square root
+    ]
+
+    for i, expression in enumerate(test_cases, 1):
+        print(f"Test {i}: {expression}")
+        result = registry.execute_tool("my_calculator", expression)
+        print(f"Result: {result}\n")
+
+def test_with_simple_agent():
+    """Test integration with SimpleAgent"""
+    from hello_agents import HelloAgentsLLM
+
+    # Create LLM client
+    llm = HelloAgentsLLM()
+
+    # Create registry containing calculator
+    registry = create_calculator_registry()
+
+    print("🤖 Integration Test with SimpleAgent:")
+
+    # Simulate scenario where SimpleAgent uses tool
+    user_question = "Please help me calculate sqrt(16) + 2 * 3"
+
+    print(f"User question: {user_question}")
+
+    # Use tool to calculate
+    calc_result = registry.execute_tool("my_calculator", "sqrt(16) + 2 * 3")
+    print(f"Calculation result: {calc_result}")
+
+    # Build final answer
+    final_messages = [
+        {"role": "user", "content": f"The calculation result is {calc_result}, please answer the user's question in natural language: {user_question}"}
+    ]
+
+    print("\n🎯 SimpleAgent's answer:")
+    response = llm.think(final_messages)
+    for chunk in response:
+        print(chunk, end="", flush=True)
+    print("\n")
+
+if __name__ == "__main__":
+    test_calculator_tool()
+    test_with_simple_agent()
+```
+
+Through this simplified calculator example, we learned how to quickly develop a custom tool: write a simple calculation function, register it through ToolRegistry, and then integrate it with SimpleAgent. Figure 7.1 illustrates the running logic of the code.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/7-figures/01.png" alt="" width="90%"/>
+  <p>Figure 7.1 SimpleAgent Workflow Based on HelloAgents</p>
+</div>
+
+### 7.5.3 Multi-Source Search Tool
+
+In practical applications, we often need to integrate multiple external services to provide more powerful functionality. Search tools are a typical example: combining multiple search engines yields more complete and reliable information. In Chapter 1 we used Tavily's search API, and in Chapter 4 we used SerpApi's search API, so here we use these two APIs to implement multi-source search. If you haven't installed the corresponding Python dependencies, run the following command:
+
+```bash
+pip install "hello-agents[search]==0.1.1"
+```
+
+(1) Unified Interface Design for Search Tools
+
+The SearchTool built into the HelloAgents framework demonstrates how to design an advanced multi-source search tool:
+
+````python
+class SearchTool(Tool):
+    """
+    Intelligent hybrid search tool
+
+    Supports multiple search engine backends, intelligently selects the best search source:
+    1. Hybrid mode (hybrid) - Intelligently selects TAVILY or SERPAPI
+    2. Tavily API (tavily) - Professional AI search
+    3. SerpApi (serpapi) - Traditional Google search
+    """
+
+    def __init__(self, backend: str = "hybrid", tavily_key: Optional[str] = None, serpapi_key: Optional[str] = None):
+        super().__init__(
+            name="search",
+            description="An intelligent web search engine. Supports hybrid search mode, automatically selects the best search source."
+        )
+        self.backend = backend
+        self.tavily_key = tavily_key or os.getenv("TAVILY_API_KEY")
+        self.serpapi_key = serpapi_key or os.getenv("SERPAPI_API_KEY")
+        self.available_backends = []
+        self._setup_backends()
+````
+The core idea of this design is to automatically select the best search backend based on available API keys and dependency libraries.
+
+(2) Integration Strategy for TAVILY and SERPAPI Search Sources
+
+The framework implements intelligent backend selection logic:
+
+````python
+def _search_hybrid(self, query: str) -> str:
+    """Hybrid search - intelligently select the best search source"""
+    # Prioritize Tavily (AI-optimized search)
+    if "tavily" in self.available_backends:
+        try:
+            return self._search_tavily(query)
+        except Exception as e:
+            print(f"⚠️ Tavily search failed: {e}")
+            # If Tavily fails, try SerpApi
+            if "serpapi" in self.available_backends:
+                print("🔄 Switching to SerpApi search")
+                return self._search_serpapi(query)
+
+    # If Tavily is unavailable, use SerpApi
+    elif "serpapi" in self.available_backends:
+        try:
+            return self._search_serpapi(query)
+        except Exception as e:
+            print(f"⚠️ SerpApi search failed: {e}")
+
+    # If both are unavailable, prompt user to configure API
+    return "❌ No available search sources, please configure TAVILY_API_KEY or SERPAPI_API_KEY environment variables"
+````
+This design embodies a core concept of high-availability systems, graceful degradation: the system falls back from the preferred search source to an available alternative. When no search source can return a result, it prompts the user to configure the correct API keys.
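+
+The fallback idea can also be written generically. The sketch below is a simplified illustration, not the framework's implementation: `search_with_fallback`, `flaky_backend`, and `stable_backend` are hypothetical names. It tries each backend in priority order and also separates "nothing configured" from "everything failed" in the final message.
+
+```python
+from typing import Callable, List, Tuple
+
+Backend = Tuple[str, Callable[[str], str]]
+
+
+def search_with_fallback(query: str, backends: List[Backend]) -> str:
+    """Try each backend in priority order and return the first successful result."""
+    if not backends:
+        return "No search backends configured"
+    errors = []
+    for name, search in backends:
+        try:
+            return search(query)
+        except Exception as e:
+            # Remember the failure and move on to the next backend
+            errors.append(f"{name}: {e}")
+    return "All search backends failed -> " + "; ".join(errors)
+
+
+def flaky_backend(query: str) -> str:
+    """Simulates a backend that is currently down."""
+    raise TimeoutError("request timed out")
+
+
+def stable_backend(query: str) -> str:
+    """Simulates a backend that answers normally."""
+    return f"results for '{query}'"
+
+
+backends = [("primary", flaky_backend), ("secondary", stable_backend)]
+print(search_with_fallback("agents", backends))
+```
+
+Collecting the individual error messages makes it easier to tell a missing API key from a network outage when every backend fails.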
+
+(3) Unified Formatting of Search Results
+
+Different search engines return results in different formats. The framework handles this through a unified formatting method:
+
+````python
+def _search_tavily(self, query: str) -> str:
+    """Search using Tavily"""
+    response = self.tavily_client.search(
+        query=query,
+        search_depth="basic",
+        include_answer=True,
+        max_results=3
+    )
+
+    result = f"🎯 Tavily AI search results: {response.get('answer', 'No direct answer found')}\n\n"
+
+    for i, item in enumerate(response.get('results', [])[:3], 1):
+        result += f"[{i}] {item.get('title', '')}\n"
+        result += f"    {item.get('content', '')[:200]}...\n"
+        result += f"    Source: {item.get('url', '')}\n\n"
+
+    return result
+````
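+
+The same idea extends to several backends. In the code in this section, Tavily items carry their text under `content`, while SerpApi items use `snippet`. A minimal sketch of a shared formatter, assuming a hypothetical `format_results` helper and made-up sample items, could look like this:
+
+```python
+from typing import Any, Dict, List
+
+
+def format_results(source: str, items: List[Dict[str, Any]], text_key: str, limit: int = 3) -> str:
+    """Render items from any backend in one layout; text_key names the body field."""
+    lines = [f"Results from {source}:"]
+    for i, item in enumerate(items[:limit], 1):
+        lines.append(f"[{i}] {item.get('title', '')}")
+        lines.append(f"    {str(item.get(text_key, ''))[:200]}")
+    return "\n".join(lines)
+
+
+# Made-up sample items that only mimic the field names used in this section
+tavily_items = [{"title": "A", "content": "text from Tavily", "url": "https://example.com/a"}]
+serpapi_items = [{"title": "B", "snippet": "text from SerpApi"}]
+
+print(format_results("Tavily", tavily_items, text_key="content"))
+print(format_results("SerpApi", serpapi_items, text_key="snippet"))
+```
+
+With one formatter, the Agent always sees the same layout regardless of which backend answered.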
+
+Based on the framework's design philosophy, we can create our own advanced search tool. This time we use a class-based approach to demonstrate different implementation methods. Create `my_advanced_search.py`:
+
+```python
+# my_advanced_search.py
+import os
+from typing import Optional, List, Dict, Any
+from hello_agents import ToolRegistry
+
+class MyAdvancedSearchTool:
+    """
+    Custom advanced search tool class
+    Demonstrates design patterns for multi-source integration and intelligent selection
+    """
+
+    def __init__(self):
+        self.name = "my_advanced_search"
+        self.description = "Intelligent search tool, supports multiple search sources, automatically selects best results"
+        self.search_sources = []
+        self._setup_search_sources()
+
+    def _setup_search_sources(self):
+        """Set up available search sources"""
+        # Check Tavily availability
+        if os.getenv("TAVILY_API_KEY"):
+            try:
+                from tavily import TavilyClient
+                self.tavily_client = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))
+                self.search_sources.append("tavily")
+                print("✅ Tavily search source enabled")
+            except ImportError:
+                print("⚠️ Tavily library not installed")
+
+        # Check SerpApi availability
+        if os.getenv("SERPAPI_API_KEY"):
+            try:
+                import serpapi
+                self.search_sources.append("serpapi")
+                print("✅ SerpApi search source enabled")
+            except ImportError:
+                print("⚠️ SerpApi library not installed")
+
+        if self.search_sources:
+            print(f"🔧 Available search sources: {', '.join(self.search_sources)}")
+        else:
+            print("⚠️ No available search sources, please configure API keys")
+
+    def search(self, query: str) -> str:
+        """Execute intelligent search"""
+        if not query.strip():
+            return "❌ Error: Search query cannot be empty"
+
+        # Check if there are available search sources
+        if not self.search_sources:
+            return """❌ No available search sources, please configure one of the following API keys:
+
+1. Tavily API: Set environment variable TAVILY_API_KEY
+   Get it at: https://tavily.com/
+
+2. SerpAPI: Set environment variable SERPAPI_API_KEY
+   Get it at: https://serpapi.com/
+
+Restart the program after configuration."""
+
+        print(f"🔍 Starting intelligent search: {query}")
+
+        # Try multiple search sources, return best result
+        for source in self.search_sources:
+            try:
+                if source == "tavily":
+                    result = self._search_with_tavily(query)
+                    if result and "not found" not in result.lower():
+                        return f"📊 Tavily AI search results:\n\n{result}"
+
+                elif source == "serpapi":
+                    result = self._search_with_serpapi(query)
+                    if result and "not found" not in result.lower():
+                        return f"🌐 SerpApi Google search results:\n\n{result}"
+
+            except Exception as e:
+                print(f"⚠️ {source} search failed: {e}")
+                continue
+
+        return "❌ All search sources failed, please check network connection and API key configuration"
+
+    def _search_with_tavily(self, query: str) -> str:
+        """Search using Tavily"""
+        response = self.tavily_client.search(query=query, max_results=3)
+
+        if response.get('answer'):
+            result = f"💡 AI direct answer: {response['answer']}\n\n"
+        else:
+            result = ""
+
+        result += "🔗 Related results:\n"
+        for i, item in enumerate(response.get('results', [])[:3], 1):
+            result += f"[{i}] {item.get('title', '')}\n"
+            result += f"    {item.get('content', '')[:150]}...\n\n"
+
+        return result
+
+    def _search_with_serpapi(self, query: str) -> str:
+        """Search using SerpApi"""
+        import serpapi
+
+        search = serpapi.GoogleSearch({
+            "q": query,
+            "api_key": os.getenv("SERPAPI_API_KEY"),
+            "num": 3
+        })
+
+        results = search.get_dict()
+
+        result = "🔗 Google search results:\n"
+        if "organic_results" in results:
+            for i, res in enumerate(results["organic_results"][:3], 1):
+                result += f"[{i}] {res.get('title', '')}\n"
+                result += f"    {res.get('snippet', '')}\n\n"
+
+        return result
+
+def create_advanced_search_registry():
+    """Create registry containing advanced search tool"""
+    registry = ToolRegistry()
+
+    # Create search tool instance
+    search_tool = MyAdvancedSearchTool()
+
+    # Register search tool's method as function
+    registry.register_function(
+        name="advanced_search",
+        description="Advanced search tool, integrates Tavily and SerpAPI multiple search sources, provides more comprehensive search results",
+        func=search_tool.search
+    )
+
+    return registry
+```
+
+Next, we can test the tool we wrote ourselves. Create `test_advanced_search.py`:
+
+```python
+# test_advanced_search.py
+from dotenv import load_dotenv
+from my_advanced_search import create_advanced_search_registry, MyAdvancedSearchTool
+
+# Load environment variables
+load_dotenv()
+
+def test_advanced_search():
+    """Test advanced search tool"""
+
+    # Create registry containing advanced search tool
+    registry = create_advanced_search_registry()
+
+    print("🔍 Testing Advanced Search Tool\n")
+
+    # Test queries
+    test_queries = [
+        "History of Python programming language",
+        "Latest developments in artificial intelligence",
+        "2024 technology trends"
+    ]
+
+    for i, query in enumerate(test_queries, 1):
+        print(f"Test {i}: {query}")
+        result = registry.execute_tool("advanced_search", query)
+        print(f"Result: {result}\n")
+        print("-" * 60 + "\n")
+
+def test_api_configuration():
+    """Test API configuration check"""
+    print("🔧 Testing API Configuration Check:")
+
+    # Directly create search tool instance
+    search_tool = MyAdvancedSearchTool()
+
+    # If API is not configured, configuration prompt will be displayed
+    result = search_tool.search("machine learning algorithms")
+    print(f"Search result: {result}")
+
+def test_with_agent():
+    """Test integration with Agent"""
+    print("\n🤖 Integration Test with Agent:")
+    print("Advanced search tool is ready and can be integrated with Agent")
+
+    # Display tool description
+    registry = create_advanced_search_registry()
+    tools_desc = registry.get_tools_description()
+    print(f"Tool description:\n{tools_desc}")
+
+if __name__ == "__main__":
+    test_advanced_search()
+    test_api_configuration()
+    test_with_agent()
+```
+
+Through this advanced search tool design practice, we learned how to use classes to build complex tool systems. Compared to the function approach, the class approach is more suitable for tools that need to maintain state (such as API clients, configuration information).
+
+### 7.5.4 Advanced Features of Tool System
+
+After mastering basic tool development and multi-source integration, let's explore advanced features of the tool system. These features enable the tool system to run stably in complex production environments and provide more powerful capabilities for Agents.
+
+(1) Tool Chain Invocation Mechanism
+
+In practical applications, Agents often need to combine multiple tools to complete complex tasks. We can design a tool chain manager to support this scenario, borrowing the graph concept mentioned in Chapter 6:
+
+```python
+# tool_chain_manager.py
+from typing import List, Dict, Any, Optional
+from hello_agents import ToolRegistry
+
+class ToolChain:
+    """Tool chain - supports sequential execution of multiple tools"""
+
+    def __init__(self, name: str, description: str):
+        self.name = name
+        self.description = description
+        self.steps: List[Dict[str, Any]] = []
+
+    def add_step(self, tool_name: str, input_template: str, output_key: Optional[str] = None):
+        """
+        Add tool execution step
+
+        Args:
+            tool_name: Tool name
+            input_template: Input template, supports variable substitution
+            output_key: Key name for output result, used for reference in subsequent steps
+        """
+        self.steps.append({
+            "tool_name": tool_name,
+            "input_template": input_template,
+            "output_key": output_key or f"step_{len(self.steps)}_result"
+        })
+
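+    def execute(self, registry: ToolRegistry, initial_input: str, context: Optional[Dict[str, Any]] = None) -> str:
+        """Execute tool chain"""
+        if not self.steps:
+            return f"❌ Tool chain '{self.name}' has no steps"
+        # Copy the context so the caller's dictionary is not modified
+        context = dict(context or {})
+        context["input"] = initial_input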
+
+        print(f"🔗 Starting tool chain execution: {self.name}")
+
+        for i, step in enumerate(self.steps, 1):
+            tool_name = step["tool_name"]
+            input_template = step["input_template"]
+            output_key = step["output_key"]
+
+            # Replace variables in template
+            try:
+                tool_input = input_template.format(**context)
+            except KeyError as e:
+                return f"❌ Tool chain execution failed: Template variable {e} not found"
+
+            print(f"  Step {i}: Using {tool_name} to process '{tool_input[:50]}...'")
+
+            # Execute tool
+            result = registry.execute_tool(tool_name, tool_input)
+            context[output_key] = result
+
+            print(f"  ✅ Step {i} completed, result length: {len(result)} characters")
+
+        # Return result of last step
+        final_result = context[self.steps[-1]["output_key"]]
+        print(f"🎉 Tool chain '{self.name}' execution completed")
+        return final_result
+
+class ToolChainManager:
+    """Tool chain manager"""
+
+    def __init__(self, registry: ToolRegistry):
+        self.registry = registry
+        self.chains: Dict[str, ToolChain] = {}
+
+    def register_chain(self, chain: ToolChain):
+        """Register tool chain"""
+        self.chains[chain.name] = chain
+        print(f"✅ Tool chain '{chain.name}' registered")
+
+    def execute_chain(self, chain_name: str, input_data: str, context: Dict[str, Any] = None) -> str:
+        """Execute specified tool chain"""
+        if chain_name not in self.chains:
+            return f"❌ Tool chain '{chain_name}' does not exist"
+
+        chain = self.chains[chain_name]
+        return chain.execute(self.registry, input_data, context)
+
+    def list_chains(self) -> List[str]:
+        """List all tool chains"""
+        return list(self.chains.keys())
+
+# Usage example
+def create_research_chain() -> ToolChain:
+    """Create a research tool chain: search -> calculate"""
+    chain = ToolChain(
+        name="research_and_calculate",
+        description="Search for information and perform related calculations"
+    )
+
+    # Step 1: Search for information
+    chain.add_step(
+        tool_name="search",
+        input_template="{input}",
+        output_key="search_result"
+    )
+
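+    # Step 2: Perform calculations based on search results (if needed).
+    # Note: my_calculator from Section 7.5.2 only parses arithmetic expressions,
+    # so in practice an extraction step should first turn the search text into an expression.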
+    chain.add_step(
+        tool_name="my_calculator",
+        input_template="Calculate relevant values based on the following information: {search_result}",
+        output_key="calculation_result"
+    )
+
+    return chain
+```
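+
+The key mechanism in `ToolChain.execute` is that each step's output is stored in a shared context and can be referenced by later templates. The stripped-down sketch below isolates just that data flow; `fake_tool` is a hypothetical stand-in for `registry.execute_tool`.
+
+```python
+from typing import Dict, List, Tuple
+
+
+def fake_tool(text: str) -> str:
+    """Hypothetical tool that just wraps its input in angle brackets."""
+    return f"<{text}>"
+
+
+# Each step is (input_template, output_key), mirroring ToolChain.add_step
+steps: List[Tuple[str, str]] = [
+    ("{input}", "search_result"),
+    ("Summarize: {search_result}", "summary"),
+]
+
+context: Dict[str, str] = {"input": "population of France"}
+for template, output_key in steps:
+    # Only the template is parsed by str.format; braces inside earlier
+    # results are inserted as plain values and are not re-interpreted
+    tool_input = template.format(**context)
+    context[output_key] = fake_tool(tool_input)
+
+print(context["search_result"])
+print(context["summary"])
+```
+
+One caveat of using `str.format` for templates: literal braces in a template must be doubled (`{{` and `}}`), otherwise they are treated as variable references.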
+
+(2) Asynchronous Tool Execution Support
+
+For time-consuming tool operations, we can provide asynchronous execution support:
+
+```python
+# async_tool_executor.py
+import asyncio
+import concurrent.futures
+from typing import Dict, Any, List, Callable
+from hello_agents import ToolRegistry
+
+class AsyncToolExecutor:
+    """Asynchronous tool executor"""
+
+    def __init__(self, registry: ToolRegistry, max_workers: int = 4):
+        self.registry = registry
+        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
+
+    async def execute_tool_async(self, tool_name: str, input_data: str) -> str:
+        """Asynchronously execute a single tool"""
+        loop = asyncio.get_running_loop()
+
+        def _execute():
+            return self.registry.execute_tool(tool_name, input_data)
+
+        result = await loop.run_in_executor(self.executor, _execute)
+        return result
+
+    async def execute_tools_parallel(self, tasks: List[Dict[str, str]]) -> List[str]:
+        """Execute multiple tools in parallel"""
+        print(f"🚀 Starting parallel execution of {len(tasks)} tool tasks")
+
+        # Create async tasks
+        async_tasks = []
+        for task in tasks:
+            tool_name = task["tool_name"]
+            input_data = task["input_data"]
+            async_task = self.execute_tool_async(tool_name, input_data)
+            async_tasks.append(async_task)
+
+        # Wait for all tasks to complete
+        results = await asyncio.gather(*async_tasks)
+
+        print("✅ All tool tasks completed")
+        return results
+
+    def __del__(self):
+        """Clean up resources"""
+        if hasattr(self, 'executor'):
+            self.executor.shutdown(wait=True)
+
+# Usage example
+async def test_parallel_execution():
+    """Test parallel tool execution"""
+    from hello_agents import ToolRegistry
+
+    registry = ToolRegistry()
+    # Assume search and calculator tools are already registered
+
+    executor = AsyncToolExecutor(registry)
+
+    # Define parallel tasks
+    tasks = [
+        {"tool_name": "search", "input_data": "Python programming"},
+        {"tool_name": "search", "input_data": "machine learning"},
+        {"tool_name": "my_calculator", "input_data": "2 + 2"},
+        {"tool_name": "my_calculator", "input_data": "sqrt(16)"},
+    ]
+
+    # Execute in parallel
+    results = await executor.execute_tools_parallel(tasks)
+
+    for i, result in enumerate(results):
+        print(f"Task {i+1} result: {result[:100]}...")
+```
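+
+To see why a thread pool helps for I/O-bound tools, the self-contained sketch below replaces real tool calls with a hypothetical `slow_tool` that simply sleeps for 0.2 seconds. Run one after another, four calls would take about 0.8 seconds; dispatched through `run_in_executor`, they overlap. The exact timing depends on your machine.
+
+```python
+import asyncio
+import time
+from concurrent.futures import ThreadPoolExecutor
+from typing import List
+
+
+def slow_tool(text: str) -> str:
+    """Hypothetical blocking tool; the sleep stands in for a network request."""
+    time.sleep(0.2)
+    return f"done: {text}"
+
+
+async def run_parallel(inputs: List[str]) -> List[str]:
+    loop = asyncio.get_running_loop()
+    with ThreadPoolExecutor(max_workers=4) as pool:
+        # Each blocking call runs in its own worker thread
+        futures = [loop.run_in_executor(pool, slow_tool, text) for text in inputs]
+        # gather returns results in the same order as the inputs
+        return await asyncio.gather(*futures)
+
+
+start = time.perf_counter()
+results = asyncio.run(run_parallel(["a", "b", "c", "d"]))
+elapsed = time.perf_counter() - start
+print(results)
+print(f"Elapsed: {elapsed:.2f}s")
+```
+
+The gain comes from waiting in parallel. For CPU-bound tools in pure Python, threads contend for the interpreter lock, so a process pool is usually the better fit.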
+
+Based on the above design and implementation experience, we can summarize the core concepts of tool system development. At the design level, each tool should follow the single responsibility principle and focus on one specific function while keeping a uniform interface; comprehensive exception handling and security-first input validation are basic requirements. For performance, use asynchronous execution to improve concurrency, and manage external connections and system resources carefully.
+
+
+
+## 7.6 Chapter Summary
+
+Before the formal summary, some good news: complete test cases for all methods and functions implemented in this chapter are provided in the GitHub repository. You can visit [this link](https://github.com/jjyaoao/HelloAgents/blob/main/examples/chapter07_basic_setup.py) to view and run the test code. This file contains demonstrations of the four Agent paradigms, integration tests of the tool system, usage examples of advanced features, and an interactive Agent experience. If you want to verify that your implementation is correct, or to understand how the framework is used in practice, these test cases are a valuable reference.
+
+Looking back at this chapter, we completed a challenging task: step by step, we built a basic agent framework—HelloAgents. This process consistently followed the core principles of "layered decoupling, single responsibility, and unified interfaces."
+
+In the specific implementation of the framework, we re-implemented four classic Agent paradigms: SimpleAgent's basic conversation mode, ReActAgent's combination of reasoning and action, ReflectionAgent's self-reflection and iterative optimization, and PlanAndSolveAgent's decomposition planning and step-by-step execution. We also built the tool system, the core of Agent capability extension, as a complete engineering exercise.
+
+More importantly, Chapter 7 is not an endpoint; it provides the technical foundation for the subsequent chapters. The initial design took later extensions into account, reserving the interfaces and extension points needed for advanced features. The unified LLM interface, standardized message system, and tool registration mechanism together form that foundation. With it in place, we can approach more advanced topics with confidence: Chapter 8's memory and RAG systems will expand the Agent's capability boundaries; Chapter 9's context engineering will delve into the message processing mechanism we have established; and Chapter 10's agent protocols will require extending the framework with new tools.
+
+Next, we will explore together how to add RAG systems and Memory mechanisms to the framework. Stay tuned for Chapter 8!
+
+
+## Exercises
+
+1. This chapter built the `HelloAgents` framework and explained "why we need to build our own Agent framework." Please analyze:
+
+   - Section 7.1.1 mentioned four main limitations of current mainstream frameworks. Combined with your actual experience using a framework in [Chapter 6 exercises](../chapter6/第六章%20框架开发实践.md#习题) or actual projects, explain how these problems affect development efficiency.
+   - `HelloAgents` proposes the design philosophy of "everything is a tool," abstracting modules like `Memory`, `RAG`, and `MCP` as tools. What are the advantages of this design? Are there any limitations? Please provide examples.
+   - Comparing the agent code implemented from scratch in Chapter 4 with the framework implementation in this chapter, what specific improvements does the framework bring? If you were to design a framework, what design principles would you prioritize?
+
+2. In Section 7.2, we extended `HelloAgentsLLM` to support multiple model providers and local model invocation.
+
+   > <strong>Hint</strong>: This is a practical exercise; hands-on work is recommended.
+
+   - Referring to the example in Section 7.2.1, try adding support for a new model provider to `HelloAgentsLLM` (such as `Gemini`, `Anthropic`, or `Kimi`). Implement it through inheritance and enable automatic detection of that provider's environment variables.
+   - Section 7.2.3 introduced three priorities of the automatic detection mechanism. Please analyze: If both `OPENAI_API_KEY` and `LLM_BASE_URL="http://localhost:11434/v1"` are set, which provider will the framework ultimately choose? Is this priority design reasonable?
+   - Besides `VLLM` and `Ollama` introduced in this chapter, there are other local model deployment solutions like `SGLang`. Please first search for and understand the basic information and characteristics of `SGLang`, then compare `VLLM`, `SGLang`, and `Ollama` in terms of ease of use, resource consumption, inference speed, and inference accuracy.
+
+3. In Section 7.3, we implemented the `Message` class, `Config` class, and `Agent` base class. Please analyze:
+
+   - The `Message` class uses `Pydantic`'s `BaseModel` for data validation. What are the advantages of this design in practical applications?
+   - The `Agent` base class defines two methods: `run` and `_execute`, where `run` is the public interface and `_execute` is an abstract method. What is this design pattern called? What are its benefits?
+   - In the `Config` class, we used the singleton pattern. Please explain what the singleton pattern is, why configuration management needs to use the singleton pattern, and what problems would arise if the singleton pattern is not used.
+
+4. In Section 7.4, we implemented four `Agent` paradigms in a framework manner.
+
+   > <strong>Hint</strong>: This is a practical exercise; hands-on work is recommended.
+
+   - Comparing the `ReActAgent` implemented from scratch in Chapter 4 with the framework-based `ReActAgent` in this chapter, list 3 specific improvements and explain how these improvements enhance code maintainability and extensibility.
+   - `ReflectionAgent` implements an "execute-reflect-optimize" loop. Please extend this implementation by adding a "quality scoring" mechanism: After each reflection, have the `LLM` score the current version's output, and only continue optimization if the score is below a threshold; otherwise, terminate early.
+   - Please design and implement a new `Agent` paradigm called `Tree-of-Thought Agent`, which should inherit from the `Agent` base class and be able to generate multiple possible thinking paths at each step, then select the optimal path to continue.
+
+5. In Section 7.5, we built the tool system. Please consider the following questions:
+
+   - The `BaseTool` class defines an `execute` abstract method that all tools must implement. Please explain why all tools should be forced to implement a unified interface. If a tool needs to return multiple values (such as a search tool returning title, summary, and link), how should it be designed?
+   - Section 7.5.3 implemented tool chains (`ToolChain`). Please design a practical application scenario that requires chaining at least 3 tools and draw the execution flow diagram of the tool chain.
+   - The asynchronous tool executor (`AsyncToolExecutor`) uses a thread pool to execute tools in parallel. Please analyze: Under what circumstances can parallel tool execution bring performance improvements?
+
+6. Framework extensibility is one of the important considerations in design. You now need to extend the `HelloAgents` framework to implement some interesting new features and characteristics.
+
+   - First, add a "streaming output" feature to `HelloAgents` so that the `Agent` can return intermediate results in real-time when generating responses (similar to the typing effect in the `ChatGPT` user interface). Please design the implementation plan for this feature and explain which classes and methods need to be modified.
+   - Then add a "multi-turn conversation management" feature to the framework that automatically manages conversation history and supports conversation branching and backtracking. How would you design this? What new classes are needed? How would it integrate with the existing `Message` system?
+   - Finally, please design a "plugin system" for `HelloAgents` that allows third-party developers to extend framework functionality through plugins (such as adding new `Agent` types, new tool types, etc.) without modifying the framework's core code. Draw the architecture diagram of the plugin system and explain the key interfaces.
+

+ 62 - 58
docs/chapter7/第七章 构建你的Agent框架.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter7-Building-Your-Agent-Framework.md">English</a> | 中文
+</div>
+
 # 第七章 构建你的智能体框架
 
 在前面的章节中,我们讲解了智能体的基础知识,并体验了主流框架带来的开发便利。从本章开始,我们将进入一个更具挑战也更有价值的阶段:**从零开始,逐步构建一个智能体框架——HelloAgents**。
@@ -242,7 +246,7 @@ MODELSCOPE_API_KEY="your-modelscope-api-key"
 ```python
 # my_main.py
 from dotenv import load_dotenv
-from my_llm import MyLLM # 注意这里导入我们自己的类
+from my_llm import MyLLM # 注意:这里导入我们自己的类
 
 # 加载环境变量
 load_dotenv()
@@ -458,9 +462,9 @@ for chunk in llm.think(messages):
 
 在上节中,我们构建了 `HelloAgentsLLM` 这一核心组件,解决了与大语言模型通信的关键问题。不过它还需要一系列配套的接口和组件来处理数据流、管理配置、应对异常,并为上层应用的构建提供一个清晰、统一的结构。本节将讲述以下三个核心文件:
 
-- **`message.py`**: 定义了框架内统一的消息格式,确保了智能体与模型之间信息传递的标准化。
-- **`config.py`**: 提供了一个中心化的配置管理方案,使框架的行为易于调整和扩展。
-- **`agent.py`**: 定义了所有智能体的抽象基类(`Agent`),为后续实现不同类型的智能体提供了统一的接口和规范。
+- **`message.py`** 定义了框架内统一的消息格式,确保了智能体与模型之间信息传递的标准化。
+- **`config.py`** 提供了一个中心化的配置管理方案,使框架的行为易于调整和扩展。
+- **`agent.py`** 定义了所有智能体的抽象基类(`Agent`),为后续实现不同类型的智能体提供了统一的接口和规范。
 
 ### 7.3.1 Message 类
 
@@ -691,13 +695,13 @@ class MySimpleAgent(SimpleAgent):
             return base_prompt
 
         tools_section = "\n\n## 可用工具\n"
-        tools_section += "你可以使用以下工具来帮助回答问题\n"
+        tools_section += "你可以使用以下工具来帮助回答问题:\n"
         tools_section += tools_description + "\n"
 
         tools_section += "\n## 工具调用格式\n"
-        tools_section += "当需要使用工具时,请使用以下格式\n"
+        tools_section += "当需要使用工具时,请使用以下格式:\n"
         tools_section += "`[TOOL_CALL:{tool_name}:{parameters}]`\n"
-        tools_section += "例如`[TOOL_CALL:search:Python编程]` 或 `[TOOL_CALL:memory:recall=用户信息]`\n\n"
+        tools_section += "例如:`[TOOL_CALL:search:Python编程]` 或 `[TOOL_CALL:memory:recall=用户信息]`\n\n"
         tools_section += "工具调用结果会自动插入到对话中,然后你可以基于结果继续回答。\n"
 
         return base_prompt + tools_section
@@ -739,7 +743,7 @@ class MySimpleAgent(SimpleAgent):
 
                 # 添加工具结果
                 tool_results_text = "\n\n".join(tool_results)
-                messages.append({"role": "user", "content": f"工具执行结果\n{tool_results_text}\n\n请基于这些结果给出完整的回答。"})
+                messages.append({"role": "user", "content": f"工具执行结果:\n{tool_results_text}\n\n请基于这些结果给出完整的回答。"})
 
                 current_iteration += 1
                 continue
@@ -777,7 +781,7 @@ class MySimpleAgent(SimpleAgent):
     def _execute_tool_call(self, tool_name: str, parameters: str) -> str:
         """执行工具调用"""
         if not self.tool_registry:
-            return f"❌ 错误未配置工具注册表"
+            return f"❌ 错误:未配置工具注册表"
 
         try:
             # 智能参数解析
@@ -789,13 +793,13 @@ class MySimpleAgent(SimpleAgent):
                 param_dict = self._parse_tool_parameters(tool_name, parameters)
                 tool = self.tool_registry.get_tool(tool_name)
                 if not tool:
-                    return f"❌ 错误未找到工具 '{tool_name}'"
+                    return f"❌ 错误:未找到工具 '{tool_name}'"
                 result = tool.run(param_dict)
 
-            return f"🔧 工具 {tool_name} 执行结果\n{result}"
+            return f"🔧 工具 {tool_name} 执行结果:\n{result}"
 
         except Exception as e:
-            return f"❌ 工具调用失败{str(e)}"
+            return f"❌ 工具调用失败:{str(e)}"
 
     def _parse_tool_parameters(self, tool_name: str, parameters: str) -> dict:
         """智能解析工具参数"""
@@ -804,14 +808,14 @@ class MySimpleAgent(SimpleAgent):
         if '=' in parameters:
             # 格式: key=value 或 action=search,query=Python
             if ',' in parameters:
-                # 多个参数action=search,query=Python,limit=3
+                # 多个参数:action=search,query=Python,limit=3
                 pairs = parameters.split(',')
                 for pair in pairs:
                     if '=' in pair:
                         key, value = pair.split('=', 1)
                         param_dict[key.strip()] = value.strip()
             else:
-                # 单个参数key=value
+                # 单个参数:key=value
                 key, value = parameters.split('=', 1)
                 param_dict[key.strip()] = value.strip()
         else:
@@ -907,8 +911,8 @@ load_dotenv()
 # 创建LLM实例
 llm = HelloAgentsLLM()
 
-# 测试1基础对话Agent(无工具)
-print("=== 测试1基础对话 ===")
+# 测试1:基础对话Agent(无工具)
+print("=== 测试1:基础对话 ===")
 basic_agent = MySimpleAgent(
     name="基础助手",
     llm=llm,
@@ -918,8 +922,8 @@ basic_agent = MySimpleAgent(
 response1 = basic_agent.run("你好,请介绍一下自己")
 print(f"基础对话响应: {response1}\n")
 
-# 测试2带工具的Agent
-print("=== 测试2工具增强对话 ===")
+# 测试2:带工具的Agent
+print("=== 测试2:工具增强对话 ===")
 tool_registry = ToolRegistry()
 calculator = CalculatorTool()
 tool_registry.register_tool(calculator)
@@ -935,14 +939,14 @@ enhanced_agent = MySimpleAgent(
 response2 = enhanced_agent.run("请帮我计算 15 * 8 + 32")
 print(f"工具增强响应: {response2}\n")
 
-# 测试3流式响应
-print("=== 测试3流式响应 ===")
+# 测试3:流式响应
+print("=== 测试3:流式响应 ===")
 print("流式响应: ", end="")
 for chunk in basic_agent.stream_run("请解释什么是人工智能"):
     pass  # 内容已在stream_run中实时打印
 
-# 测试4动态添加工具
-print("\n=== 测试4动态工具管理 ===")
+# 测试4:动态添加工具
+print("\n=== 测试4:动态工具管理 ===")
 print(f"添加工具前: {basic_agent.has_tools()}")
 basic_agent.add_tool(calculator)
 print(f"添加工具后: {basic_agent.has_tools()}")
@@ -969,16 +973,16 @@ MY_REACT_PROMPT = """你是一个具备推理和行动能力的AI助手。你可
 {tools}
 
 ## 工作流程
-请严格按照以下格式进行回应,每次只能执行一个步骤
+请严格按照以下格式进行回应,每次只能执行一个步骤:
 
 Thought: 分析当前问题,思考需要什么信息或采取什么行动。
-Action: 选择一个行动,格式必须是以下之一
+Action: 选择一个行动,格式必须是以下之一:
 - `{{tool_name}}[{{tool_input}}]` - 调用指定工具
 - `Finish[最终答案]` - 当你有足够信息给出最终答案时
 
 ## 重要提醒
 1. 每次回应必须包含Thought和Action两部分
-2. 工具调用的格式必须严格遵循工具名[参数]
+2. 工具调用的格式必须严格遵循:工具名[参数]
 3. 只有当你确信有足够信息回答问题时,才使用Finish
 4. 如果工具返回的信息不够,继续使用其他工具或相同工具的不同参数
 
@@ -988,7 +992,7 @@ Action: 选择一个行动,格式必须是以下之一:
 ## 执行历史
 {history}
 
-现在开始你的推理和行动
+现在开始你的推理和行动:
 """
 ```
 
@@ -1027,13 +1031,13 @@ class MyReActAgent(ReActAgent):
 
 其初始化参数的含义如下:
 
-- `name`: Agent的名称。
-- `llm`: `HelloAgentsLLM`的实例,负责与大语言模型通信。
-- `tool_registry`: `ToolRegistry`的实例,用于管理和执行Agent可用的工具。
-- `system_prompt`: 系统提示词,用于设定Agent的角色和行为准则。
-- `config`: 配置对象,用于传递框架级的设置。
-- `max_steps`: ReAct循环的最大执行步数,防止无限循环。
-- `custom_prompt`: 自定义的提示词模板,用于替换默认的ReAct提示词。
+- `name` Agent的名称。
+- `llm` `HelloAgentsLLM`的实例,负责与大语言模型通信。
+- `tool_registry` `ToolRegistry`的实例,用于管理和执行Agent可用的工具。
+- `system_prompt` 系统提示词,用于设定Agent的角色和行为准则。
+- `config` 配置对象,用于传递框架级的设置。
+- `max_steps` ReAct循环的最大执行步数,防止无限循环。
+- `custom_prompt` 自定义的提示词模板,用于替换默认的ReAct提示词。
 
 框架化的ReActAgent将执行流程分解为清晰的步骤:
 
@@ -1093,14 +1097,14 @@ def run(self, input_text: str, **kwargs) -> str:
 ```python
 DEFAULT_PROMPTS = {
     "initial": """
-请根据以下要求完成任务
+请根据以下要求完成任务:
 
 任务: {task}
 
 请提供一个完整、准确的回答。
 """,
     "reflect": """
-请仔细审查以下回答,并找出可能的问题或改进空间
+请仔细审查以下回答,并找出可能的问题或改进空间:
 
 # 原始任务:
 {task}
@@ -1112,7 +1116,7 @@ DEFAULT_PROMPTS = {
 如果回答已经很好,请回答"无需改进"。
 """,
     "refine": """
-请根据反馈意见改进你的回答
+请根据反馈意见改进你的回答:
 
 # 原始任务:
 {task}
@@ -1144,9 +1148,9 @@ general_agent = MyReflectionAgent(name="我的反思助手", llm=llm)
 
 # 使用自定义代码生成提示词(类似第四章)
 code_prompts = {
-    "initial": "你是Python专家,请编写函数{task}",
-    "reflect": "请审查代码的算法效率:\n任务:{task}\n代码:{content}",
-    "refine": "请根据反馈优化代码:\n任务:{task}\n反馈:{feedback}"
+    "initial": "你是Python专家,请编写函数:{task}",
+    "reflect": "请审查代码的算法效率:\n任务:{task}\n代码:{content}",
+    "refine": "请根据反馈优化代码:\n任务:{task}\n反馈:{feedback}"
 }
 code_agent = MyReflectionAgent(
     name="我的代码生成助手",
@@ -1236,7 +1240,7 @@ print(f"对话历史: {len(agent.get_history())} 条消息")
 # 创建专门用于数学问题的自定义提示词
 math_prompts = {
     "planner": """
-你是数学问题规划专家。请将数学问题分解为计算步骤
+你是数学问题规划专家。请将数学问题分解为计算步骤:
 
 问题: {question}
 
@@ -1246,7 +1250,7 @@ python
 
 """,
     "executor": """
-你是数学计算专家。请计算当前步骤
+你是数学计算专家。请计算当前步骤:
 
 问题: {question}
 计划: {plan}
@@ -1410,7 +1414,7 @@ class ToolRegistry:
     def register_tool(self, tool: Tool):
         """注册Tool对象"""
         if tool.name in self._tools:
-            print(f"⚠️ 警告工具 '{tool.name}' 已存在,将被覆盖。")
+            print(f"⚠️ 警告:工具 '{tool.name}' 已存在,将被覆盖。")
         self._tools[tool.name] = tool
         print(f"✅ 工具 '{tool.name}' 已注册。")
         
@@ -1424,7 +1428,7 @@ class ToolRegistry:
             func: 工具函数,接受字符串参数,返回字符串结果
         """
         if name in self._functions:
-            print(f"⚠️ 警告工具 '{name}' 已存在,将被覆盖。")
+            print(f"⚠️ 警告:工具 '{name}' 已存在,将被覆盖。")
 
         self._functions[name] = {
             "description": description,
@@ -1635,7 +1639,7 @@ def test_with_simple_agent():
 
     # 构建最终回答
     final_messages = [
-        {"role": "user", "content": f"计算结果是 {calc_result},请用自然语言回答用户的问题{user_question}"}
+        {"role": "user", "content": f"计算结果是 {calc_result},请用自然语言回答用户的问题:{user_question}"}
     ]
 
     print("\n🎯 SimpleAgent的回答:")
@@ -1673,7 +1677,7 @@ class SearchTool(Tool):
     """
     智能混合搜索工具
 
-    支持多种搜索引擎后端,智能选择最佳搜索源
+    支持多种搜索引擎后端,智能选择最佳搜索源:
     1. 混合模式 (hybrid) - 智能选择TAVILY或SERPAPI
     2. Tavily API (tavily) - 专业AI搜索
     3. SerpApi (serpapi) - 传统Google搜索
@@ -1736,7 +1740,7 @@ def _search_tavily(self, query: str) -> str:
         max_results=3
     )
 
-    result = f"🎯 Tavily AI搜索结果{response.get('answer', '未找到直接答案')}\n\n"
+    result = f"🎯 Tavily AI搜索结果:{response.get('answer', '未找到直接答案')}\n\n"
 
     for i, item in enumerate(response.get('results', [])[:3], 1):
         result += f"[{i}] {item.get('title', '')}\n"
@@ -1795,11 +1799,11 @@ class MyAdvancedSearchTool:
     def search(self, query: str) -> str:
         """执行智能搜索"""
         if not query.strip():
-            return "❌ 错误搜索查询不能为空"
+            return "❌ 错误:搜索查询不能为空"
 
         # 检查是否有可用的搜索源
         if not self.search_sources:
-            return """❌ 没有可用的搜索源,请配置以下API密钥之一
+            return """❌ 没有可用的搜索源,请配置以下API密钥之一:
 
 1. Tavily API: 设置环境变量 TAVILY_API_KEY
    获取地址: https://tavily.com/
@@ -1817,12 +1821,12 @@ class MyAdvancedSearchTool:
                 if source == "tavily":
                     result = self._search_with_tavily(query)
                     if result and "未找到" not in result:
-                        return f"📊 Tavily AI搜索结果\n\n{result}"
+                        return f"📊 Tavily AI搜索结果:\n\n{result}"
 
                 elif source == "serpapi":
                     result = self._search_with_serpapi(query)
                     if result and "未找到" not in result:
-                        return f"🌐 SerpApi Google搜索结果\n\n{result}"
+                        return f"🌐 SerpApi Google搜索结果:\n\n{result}"
 
             except Exception as e:
                 print(f"⚠️ {source} 搜索失败: {e}")
@@ -1835,11 +1839,11 @@ class MyAdvancedSearchTool:
         response = self.tavily_client.search(query=query, max_results=3)
 
         if response.get('answer'):
-            result = f"💡 AI直接答案{response['answer']}\n\n"
+            result = f"💡 AI直接答案:{response['answer']}\n\n"
         else:
             result = ""
 
-        result += "🔗 相关结果\n"
+        result += "🔗 相关结果:\n"
         for i, item in enumerate(response.get('results', [])[:3], 1):
             result += f"[{i}] {item.get('title', '')}\n"
             result += f"    {item.get('content', '')[:150]}...\n\n"
@@ -1858,7 +1862,7 @@ class MyAdvancedSearchTool:
 
         results = search.get_dict()
 
-        result = "🔗 Google搜索结果\n"
+        result = "🔗 Google搜索结果:\n"
         if "organic_results" in results:
             for i, res in enumerate(results["organic_results"][:3], 1):
                 result += f"[{i}] {res.get('title', '')}\n"
@@ -1995,7 +1999,7 @@ class ToolChain:
             try:
                 tool_input = input_template.format(**context)
             except KeyError as e:
-                return f"❌ 工具链执行失败模板变量 {e} 未找到"
+                return f"❌ 工具链执行失败:模板变量 {e} 未找到"
 
             print(f"  步骤 {i}: 使用 {tool_name} 处理 '{tool_input[:50]}...'")
 
@@ -2036,23 +2040,23 @@ class ToolChainManager:
 
 # 使用示例
 def create_research_chain() -> ToolChain:
-    """创建一个研究工具链搜索 -> 计算 -> 总结"""
+    """创建一个研究工具链:搜索 -> 计算 -> 总结"""
     chain = ToolChain(
         name="research_and_calculate",
         description="搜索信息并进行相关计算"
     )
 
-    # 步骤1搜索信息
+    # 步骤1:搜索信息
     chain.add_step(
         tool_name="search",
         input_template="{input}",
         output_key="search_result"
     )
 
-    # 步骤2基于搜索结果进行计算(如果需要)
+    # 步骤2:基于搜索结果进行计算(如果需要)
     chain.add_step(
         tool_name="my_calculator",
-        input_template="根据以下信息计算相关数值{search_result}",
+        input_template="根据以下信息计算相关数值:{search_result}",
         output_key="calculation_result"
     )
 

+ 2083 - 0
docs/chapter8/Chapter8-Memory-and-Retrieval.md

@@ -0,0 +1,2083 @@
+<div align="right">
+  English | <a href="./第八章%20记忆与检索.md">中文</a>
+</div>
+
+# Chapter 8 Memory and Retrieval
+
+In previous chapters, we built the basic architecture of the HelloAgents framework, implementing various agent paradigms and tool systems. However, our framework still lacks a critical capability: **memory**. If an agent cannot remember previous interactions or learn from historical experiences, its performance will be greatly limited in continuous conversations or complex tasks.
+
+This chapter adds two core capabilities to HelloAgents on top of the framework built in Chapter 7: a **Memory System** and **Retrieval-Augmented Generation (RAG)**. We take a "framework extension + background knowledge" approach: as we build, we explain the theoretical foundations of Memory and RAG, and we end up with an agent system that has complete memory and knowledge retrieval capabilities.
+
+
+## 8.1 From Cognitive Science to Agent Memory
+
+### 8.1.1 Inspiration from Human Memory Systems
+
+Before building an agent's memory system, let's first understand from a cognitive science perspective how humans process and store information. Human memory is a multi-level cognitive system that not only stores information but also classifies and organizes information based on importance, time, and context. Cognitive psychology provides a classic theoretical framework for understanding the structure and processes of memory<sup>[1]</sup>, as shown in Figure 8.1.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-1.png" alt="Human Memory System Structure" width="85%"/>
+  <p>Figure 8.1 Hierarchical Structure of Human Memory System</p>
+</div>
+
+According to cognitive psychology research, human memory can be divided into the following levels:
+
+1. **Sensory Memory**: Very short duration (0.5-3 seconds), huge capacity, responsible for temporarily storing all information received by the senses
+2. **Working Memory**: Short duration (15-30 seconds), limited capacity (7±2 items), responsible for information processing in current tasks
+3. **Long-term Memory**: Long duration (can last a lifetime), almost unlimited capacity, further divided into:
+   - **Procedural Memory**: Skills and habits (such as riding a bicycle)
+   - **Declarative Memory**: Knowledge that can be expressed in language, further divided into:
+     - **Semantic Memory**: General knowledge and concepts (such as "Paris is the capital of France")
+     - **Episodic Memory**: Personal experiences and events (such as "yesterday's meeting content")
+
+### 8.1.2 Why Agents Need Memory and RAG
+
+Drawing on the design of human memory systems, we can understand why agents need similar memory capabilities. An important characteristic of human intelligence is the ability to remember past experiences, learn from them, and apply them to new situations. A truly intelligent agent needs the same ability. LLM-based agents typically face two fundamental limitations: **forgetting of conversation state** and **limitations of built-in knowledge**.
+
+(1) Limitation 1: Conversation Forgetting Due to Statelessness
+
+Current large language models, although powerful, are designed to be **stateless**. This means that each user request (or API call) is an independent, unrelated computation. The model itself does not automatically "remember" the content of the previous conversation. This brings several problems:
+
+1. **Context Loss**: In long conversations, important early information may be lost due to context window limitations
+2. **Lack of Personalization**: The agent cannot remember user preferences, habits, or specific needs
+3. **Limited Learning Ability**: Cannot learn and improve from past successes or failures
+4. **Consistency Issues**: May provide contradictory answers in multi-turn conversations
+
+Let's understand this problem through a specific example:
+
+```python
+# Using the Agent from Chapter 7
+from hello_agents import SimpleAgent, HelloAgentsLLM
+
+agent = SimpleAgent(name="Learning Assistant", llm=HelloAgentsLLM())
+
+# First conversation
+response1 = agent.run("My name is Zhang San, I'm learning Python and have mastered basic syntax")
+print(response1)  # "Great! Python basic syntax is an important foundation for programming..."
+ 
+# Second conversation (new session)
+response2 = agent.run("Do you remember my learning progress?")
+print(response2)  # "Sorry, I don't know your learning progress..."
+```
+
+To solve this problem, our framework needs to introduce a memory system.
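
The simplest form of "memory" is to resend the accumulated conversation with every request. The following is a minimal sketch of that idea, not the HelloAgents implementation: `stateless_llm` is a stand-in for a real chat API, and `HistoryBuffer` is a hypothetical helper introduced only for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def stateless_llm(messages: List[Message]) -> str:
    """Stand-in for a chat API call: it only sees what is passed in `messages`."""
    user_text = " ".join(m["content"] for m in messages if m["role"] == "user")
    if "Zhang San" in user_text:
        return "You are Zhang San."
    return "Sorry, I don't know who you are."


class HistoryBuffer:
    """Hypothetical helper that resends the whole conversation on every turn."""

    def __init__(self) -> None:
        self.messages: List[Message] = []

    def ask(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        reply = stateless_llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Without history: each call is an independent computation
stateless_llm([{"role": "user", "content": "My name is Zhang San"}])
no_memory = stateless_llm([{"role": "user", "content": "Who am I?"}])

# With history: the earlier turn is part of the second request
chat = HistoryBuffer()
chat.ask("My name is Zhang San")
with_memory = chat.ask("Who am I?")

print(no_memory)
print(with_memory)
```

Resending history only works while the conversation fits in the context window and the session is alive, which is why a dedicated memory system is still needed.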
+
+(2) Limitation 2: Limitations of Model's Built-in Knowledge
+
+Besides forgetting conversation history, another core limitation of LLMs is that their knowledge is **static and limited**. This knowledge comes entirely from their training data, bringing a series of problems:
+
+1. **Knowledge Timeliness**: Large models have a training data cutoff date and cannot access the latest information
+2. **Domain-Specific Knowledge**: General models may lack sufficient depth in specific domains
+3. **Factual Accuracy**: Without external sources to check against, models are prone to hallucination
+4. **Explainability**: Answers come without information sources, which lowers their credibility
+
+To overcome this limitation, RAG technology emerged. Its core idea is to retrieve the most relevant information from an external knowledge base (such as documents, databases, APIs) before the model generates an answer, and provide this information as context to the model.
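
The retrieve-then-generate idea can be sketched in a few lines. This is an illustrative toy under stated assumptions: it scores documents with a bag-of-words cosine similarity instead of a real embedding model, and `tokenize`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of HelloAgents.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, documents: List[str], top_k: int = 1) -> str:
    """Place the retrieved passages in front of the question as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "Qdrant is a vector database used for semantic retrieval.",
    "Neo4j is a graph database used to store knowledge graphs.",
    "SQLite is an embedded relational database.",
]
question = "Which database stores knowledge graphs?"
prompt = build_prompt(question, knowledge_base)
print(prompt)
```

The prompt returned by `build_prompt` would then be sent to the LLM; a real system would swap the word-count vectors for dense embeddings and a vector database.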
+
+### 8.1.3 Memory and RAG System Architecture Design
+
+Based on the framework foundation established in Chapter 7 and inspiration from cognitive science, we designed a layered memory and RAG system architecture, as shown in Figure 8.2. This architecture not only draws on the hierarchical structure of human memory systems but also fully considers the scalability of engineering implementation. In implementation, we design memory and RAG as two independent tools: `memory_tool` is responsible for storing and maintaining interaction information during conversations, while `rag_tool` is responsible for retrieving relevant information from user-provided knowledge bases as context and can automatically store important retrieval results in the memory system.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-2.png" alt="HelloAgents Memory and RAG System Architecture" width="95%"/>
+  <p>Figure 8.2 Overall Architecture of HelloAgents Memory and RAG System</p>
+</div>
+
+The memory system adopts a four-layer architecture design:
+
+```
+HelloAgents Memory System
+├── Infrastructure Layer
+│   ├── MemoryManager - Memory manager (unified scheduling and coordination)
+│   ├── MemoryItem - Memory data structure (standardized memory items)
+│   ├── MemoryConfig - Configuration management (system parameter settings)
+│   └── BaseMemory - Memory base class (common interface definition)
+├── Memory Types Layer
+│   ├── WorkingMemory - Working memory (temporary information, TTL management)
+│   ├── EpisodicMemory - Episodic memory (specific events, time series)
+│   ├── SemanticMemory - Semantic memory (abstract knowledge, graph relationships)
+│   └── PerceptualMemory - Perceptual memory (multimodal data)
+├── Storage Backend Layer
+│   ├── QdrantVectorStore - Vector storage (high-performance semantic retrieval)
+│   ├── Neo4jGraphStore - Graph storage (knowledge graph management)
+│   └── SQLiteDocumentStore - Document storage (structured persistence)
+└── Embedding Service Layer
+    ├── DashScopeEmbedding - Tongyi Qianwen embedding (cloud API)
+    ├── LocalTransformerEmbedding - Local embedding (offline deployment)
+    └── TFIDFEmbedding - TF-IDF embedding (lightweight fallback)
+```
+
+The RAG system focuses on acquiring and utilizing external knowledge:
+
+```
+HelloAgents RAG System
+├── Document Processing Layer
+│   ├── DocumentProcessor - Document processor (multi-format parsing)
+│   ├── Document - Document object (metadata management)
+│   └── Pipeline - RAG pipeline (end-to-end processing)
+├── Embedding Layer
+│   └── Unified Embedding Interface - Reuses memory system's embedding service
+├── Vector Storage Layer
+│   └── QdrantVectorStore - Vector database (namespace isolation)
+└── Intelligent Q&A Layer
+    ├── Multi-strategy Retrieval - Vector retrieval + MQE + HyDE
+    ├── Context Construction - Intelligent fragment merging and truncation
+    └── LLM-Enhanced Generation - Accurate Q&A based on context
+```
+
+### 8.1.4 Learning Objectives and Quick Experience
+
+Let's first look at the core learning content of Chapter 8:
+
+```
+hello-agents/
+├── hello_agents/
+│   ├── memory/                   # Memory system module
+│   │   ├── base.py               # Basic data structures (MemoryItem, MemoryConfig, BaseMemory)
+│   │   ├── manager.py            # Memory manager (unified coordination and scheduling)
+│   │   ├── embedding.py          # Unified embedding service (DashScope/Local/TFIDF)
+│   │   ├── types/                # Memory type implementations
+│   │   │   ├── working.py        # Working memory (TTL management, pure in-memory)
+│   │   │   ├── episodic.py       # Episodic memory (event sequence, SQLite+Qdrant)
+│   │   │   ├── semantic.py       # Semantic memory (knowledge graph, Qdrant+Neo4j)
+│   │   │   └── perceptual.py     # Perceptual memory (multimodal, SQLite+Qdrant)
+│   │   ├── storage/              # Storage backend implementations
+│   │   │   ├── qdrant_store.py   # Qdrant vector storage (high-performance vector retrieval)
+│   │   │   ├── neo4j_store.py    # Neo4j graph storage (knowledge graph management)
+│   │   │   └── document_store.py # SQLite document storage (structured persistence)
+│   │   └── rag/                  # RAG system
+│   │       ├── pipeline.py       # RAG pipeline (end-to-end processing)
+│   │       └── document.py       # Document processor (multi-format parsing)
+│   └── tools/builtin/            # Extended built-in tools
+│       ├── memory_tool.py        # Memory tool (Agent memory capability)
+│       └── rag_tool.py           # RAG tool (intelligent Q&A capability)
+└──
+```
+
+**Quick Start: Installing the HelloAgents Framework**
+
+To allow readers to quickly experience the complete functionality of this chapter, we provide a directly installable Python package. You can install the version corresponding to this chapter with the following commands:
+
+```bash
+pip install "hello-agents[all]==0.2.0"
+python -m spacy download zh_core_web_sm
+python -m spacy download en_core_web_sm
+```
+
+In addition, you need to configure the graph database, vector database, LLM, and embedding service credentials in `.env`. This tutorial uses Qdrant as the vector database and Neo4j as the graph database, and prefers the Bailian platform for embeddings. If no API is available, you can switch to a locally deployed model instead.
+
+```bash
+# ================================
+# Qdrant Vector Database Configuration - Get API key: https://cloud.qdrant.io/
+# ================================
+# Use Qdrant cloud service (recommended)
+QDRANT_URL=https://your-cluster.qdrant.tech:6333
+QDRANT_API_KEY=your_qdrant_api_key_here
+
+# Or use local Qdrant (requires Docker)
+# QDRANT_URL=http://localhost:6333
+# QDRANT_API_KEY=
+
+# Qdrant collection configuration
+QDRANT_COLLECTION=hello_agents_vectors
+QDRANT_VECTOR_SIZE=384
+QDRANT_DISTANCE=cosine
+QDRANT_TIMEOUT=30
+
+# ================================
+# Neo4j Graph Database Configuration - Get API key: https://neo4j.com/cloud/aura/
+# ================================
+# Use Neo4j Aura cloud service (recommended)
+NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=your_neo4j_password_here
+
+# Or use local Neo4j (requires Docker)
+# NEO4J_URI=bolt://localhost:7687
+# NEO4J_USERNAME=neo4j
+# NEO4J_PASSWORD=hello-agents-password
+
+# Neo4j connection configuration
+NEO4J_DATABASE=neo4j
+NEO4J_MAX_CONNECTION_LIFETIME=3600
+NEO4J_MAX_CONNECTION_POOL_SIZE=50
+NEO4J_CONNECTION_TIMEOUT=60
+
+# ==========================
+# Embedding Configuration Example - Get from Alibaba Cloud Console: https://dashscope.aliyun.com/
+# ==========================
+# - If empty, dashscope defaults to text-embedding-v3; local defaults to sentence-transformers/all-MiniLM-L6-v2
+EMBED_MODEL_TYPE=dashscope
+EMBED_MODEL_NAME=
+EMBED_API_KEY=
+EMBED_BASE_URL=
+```
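
Before running the examples, it can help to check that the settings above are actually present. The sketch below is only a convenience check, not part of the framework: `missing_settings` is a hypothetical helper, and the choice of which keys count as required is an assumption made for illustration.

```python
import os
from typing import List, Mapping, Sequence

# Assumption: these are the keys we treat as mandatory for this check
REQUIRED_KEYS = ("QDRANT_URL", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env: Mapping[str, str], required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are unset or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]


# Example with an explicit mapping; pass os.environ to check the real environment
example_env = {
    "QDRANT_URL": "http://localhost:6333",
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "",
}
print(missing_settings(example_env))
```

To check the live process environment after loading `.env`, call `missing_settings(os.environ)`.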
+
+Learning in this chapter can be done in two ways:
+
+1. **Experiential Learning**: Install the framework directly with `pip`, run the example code, and quickly try out the various functions
+2. **In-depth Learning**: Follow the chapter content, implement each component from scratch, and gain a thorough understanding of the framework's design philosophy and implementation details
+
+We recommend an "experience first, then implement" learning path. This chapter provides complete test files, so you can rewrite the core functions and run the tests to verify that your implementation is correct.
+
+Following the design principles established in Chapter 7, we encapsulate memory and RAG capabilities as standard tools rather than creating new Agent classes. Before starting, let's spend 30 seconds building an agent with memory and RAG capabilities using HelloAgents!
+
+```python
+# Configure the LLM API in .env in the same folder
+from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
+from hello_agents.tools import MemoryTool, RAGTool
+
+# Create LLM instance
+llm = HelloAgentsLLM()
+
+# Create Agent
+agent = SimpleAgent(
+    name="Intelligent Assistant",
+    llm=llm,
+    system_prompt="You are an AI assistant with memory and knowledge retrieval capabilities"
+)
+
+# Create tool registry
+tool_registry = ToolRegistry()
+
+# Add memory tool
+memory_tool = MemoryTool(user_id="user123")
+tool_registry.register_tool(memory_tool)
+
+# Add RAG tool
+rag_tool = RAGTool(knowledge_base_path="./knowledge_base")
+tool_registry.register_tool(rag_tool)
+
+# Configure tools for Agent
+agent.tool_registry = tool_registry
+
+# Start conversation
+response = agent.run("Hello! Please remember my name is Zhang San, I am a Python developer")
+print(response)
+```
+
+If everything is configured correctly, you should see output similar to the following:
+
+```bash
+[OK] SQLite database tables and indexes created
+[OK] SQLite document storage initialized: ./memory_data\memory.db
+INFO:hello_agents.memory.storage.qdrant_store:✅ Successfully connected to Qdrant cloud service: https://0c517275-2ad0-4442-8309-11c36dc7e811.us-east-1-1.aws.cloud.qdrant.io:6333
+INFO:hello_agents.memory.storage.qdrant_store:✅ Using existing Qdrant collection: hello_agents_vectors
+INFO:hello_agents.memory.types.semantic:✅ Embedding model ready, dimension: 1024
+INFO:hello_agents.memory.types.semantic:✅ Qdrant vector database initialization complete
+INFO:hello_agents.memory.storage.neo4j_store:✅ Successfully connected to Neo4j cloud service: neo4j+s://851b3a28.databases.neo4j.io
+INFO:hello_agents.memory.types.semantic:✅ Neo4j graph database initialization complete
+INFO:hello_agents.memory.storage.neo4j_store:✅ Neo4j index creation complete
+INFO:hello_agents.memory.types.semantic:✅ Neo4j graph database initialization complete
+INFO:hello_agents.memory.types.semantic:🏥 Database health status: Qdrant=✅, Neo4j=✅
+INFO:hello_agents.memory.types.semantic:✅ Loaded Chinese spaCy model: zh_core_web_sm
+INFO:hello_agents.memory.types.semantic:✅ Loaded English spaCy model: en_core_web_sm
+INFO:hello_agents.memory.types.semantic:📚 Available language models: Chinese, English
+INFO:hello_agents.memory.types.semantic:Enhanced semantic memory initialization complete (using Qdrant+Neo4j professional databases)
+INFO:hello_agents.memory.manager:MemoryManager initialization complete, enabled memory types: ['working', 'episodic', 'semantic']
+✅ Tool 'memory' registered.
+INFO:hello_agents.memory.storage.qdrant_store:✅ Successfully connected to Qdrant cloud service: https://0c517275-2ad0-4442-8309-11c36dc7e811.us-east-1-1.aws.cloud.qdrant.io:6333
+INFO:hello_agents.memory.storage.qdrant_store:✅ Using existing Qdrant collection: rag_knowledge_base
+✅ RAG tool initialization successful: namespace=default, collection=rag_knowledge_base
+✅ Tool 'rag' registered.
+Hello, Zhang San! Nice to meet you. As a Python developer, you must be passionate about programming. If you have any technical questions or need to discuss Python-related topics, feel free to reach out to me anytime. I'll do my best to help you. Is there anything I can help you with right now?
+```
+
+## 8.2 Memory System: Giving Agents Memory
+
+### 8.2.1 Memory System Workflow
+
+Before writing any code, we first define the workflow of the memory system. The workflow borrows the memory model from cognitive science and maps each cognitive stage to concrete technical components and operations. Understanding this mapping makes the implementation that follows easier to read.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-3.png" alt="Memory Formation Process" width="90%"/>
+  <p>Figure 8.3 Cognitive Process of Memory Formation</p>
+</div>
+
+As shown in Figure 8.3, according to cognitive science research, the formation of human memory goes through the following stages:
+
+1. **Encoding**: Converting perceived information into a storable form
+2. **Storage**: Saving encoded information in the memory system
+3. **Retrieval**: Extracting relevant information from memory as needed
+4. **Consolidation**: Converting short-term memory into long-term memory
+5. **Forgetting**: Deleting unimportant or outdated information
+
+Drawing on this model, we designed a complete memory system for HelloAgents. Its core idea is to mimic how the human brain handles different kinds of information: memory is divided into several specialized modules, coordinated by a management mechanism. Figure 8.4 shows the workflow of this system in detail, including the key steps of adding, retrieving, consolidating, and forgetting memories.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-4.png" alt="Memory System Workflow" width="95%"/>
+  <p>Figure 8.4 Complete Workflow of HelloAgents Memory System</p>
+</div>
+
+Our memory system consists of four different types of memory modules, each optimized for specific application scenarios and lifecycles:
+
+First is **Working Memory**, which plays the role of the agent's "short-term memory," mainly used to store context information of the current conversation. To ensure high-speed access and response, its capacity is intentionally limited (for example, 50 items by default), and its lifecycle is bound to a single session, automatically clearing after the session ends.
+
+Second is **Episodic Memory**, which is responsible for long-term storage of specific interaction events and the agent's learning experiences. Unlike working memory, episodic memory carries rich contextual information and can be searched by time order or by topic, which lets the agent "review" and learn from past experiences.
+
+In contrast to event-specific episodic memory, **Semantic Memory** stores more abstract knowledge, concepts, and rules. User preferences learned through conversations, instructions that must be followed long-term, and domain knowledge all belong here. These memories are highly persistent and important, and they are what allows the agent to build a "knowledge system" and perform associative reasoning.
+
+Finally, to interact with increasingly rich multimedia, we introduced **Perceptual Memory**. This module specifically handles multimodal information such as images and audio and supports cross-modal retrieval. Its lifecycle is dynamically managed based on the importance of information and available storage space.
+
+### 8.2.2 Quick Experience: Get Started with Memory Features in 30 Seconds
+
+Before diving into implementation details, let's quickly experience the basic functions of the memory system:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
+from hello_agents.tools import MemoryTool
+
+# Create Agent with memory capability
+llm = HelloAgentsLLM()
+agent = SimpleAgent(name="Memory Assistant", llm=llm)
+
+# Create memory tool
+memory_tool = MemoryTool(user_id="user123")
+tool_registry = ToolRegistry()
+tool_registry.register_tool(memory_tool)
+agent.tool_registry = tool_registry
+
+# Experience memory features
+print("=== Adding Multiple Memories ===")
+
+# Add first memory
+result1 = memory_tool.execute("add", content="User Zhang San is a Python developer focusing on machine learning and data analysis", memory_type="semantic", importance=0.8)
+print(f"Memory 1: {result1}")
+
+# Add second memory
+result2 = memory_tool.execute("add", content="Li Si is a frontend engineer skilled in React and Vue.js development", memory_type="semantic", importance=0.7)
+print(f"Memory 2: {result2}")
+
+# Add third memory
+result3 = memory_tool.execute("add", content="Wang Wu is a product manager responsible for user experience design and requirements analysis", memory_type="semantic", importance=0.6)
+print(f"Memory 3: {result3}")
+
+print("\n=== Searching Specific Memories ===")
+# Search for frontend-related memories
+print("🔍 Searching 'frontend engineer':")
+result = memory_tool.execute("search", query="frontend engineer", limit=3)
+print(result)
+
+print("\n=== Memory Summary ===")
+result = memory_tool.execute("summary")
+print(result)
+```
+
+### 8.2.3 MemoryTool Detailed Explanation
+
+Now let's adopt a top-down approach, starting from the specific operations supported by MemoryTool and gradually delving into the underlying implementation. MemoryTool, as the unified interface of the memory system, follows the architectural pattern of "unified entry, distributed processing":
+
+````python
+def execute(self, action: str, **kwargs) -> str:
+    """Execute memory operation
+
+    Supported operations:
+    - add: Add memory (supports 4 types: working/episodic/semantic/perceptual)
+    - search: Search memory
+    - summary: Get memory summary
+    - stats: Get statistics
+    - update: Update memory
+    - remove: Delete memory
+    - forget: Forget memory (multiple strategies)
+    - consolidate: Consolidate memory (short-term → long-term)
+    - clear_all: Clear all memories
+    """
+
+    if action == "add":
+        return self._add_memory(**kwargs)
+    elif action == "search":
+        return self._search_memory(**kwargs)
+    elif action == "summary":
+        return self._get_summary(**kwargs)
+    # ... other operations
+````
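
The `if`/`elif` chain above can also be written as a dispatch table. The following is a minimal sketch of that pattern only; `ToyMemoryTool` and its simplified handlers are hypothetical stand-ins and do not reflect the actual MemoryTool internals.

```python
class ToyMemoryTool:
    """Hypothetical stand-in for MemoryTool, showing only the dispatch pattern."""

    def __init__(self):
        self._items = []
        # Map each action name to the method that handles it
        self._handlers = {
            "add": self._add_memory,
            "search": self._search_memory,
            "summary": self._get_summary,
        }

    def execute(self, action: str, **kwargs) -> str:
        handler = self._handlers.get(action)
        if handler is None:
            return f"Unsupported action: {action}"
        # Each handler declares its own parameters; kwargs are passed through
        return handler(**kwargs)

    def _add_memory(self, content: str = "", **metadata) -> str:
        self._items.append(content)
        return f"added #{len(self._items)}"

    def _search_memory(self, query: str, limit: int = 5) -> str:
        hits = [c for c in self._items if query.lower() in c.lower()][:limit]
        return f"found {len(hits)}"

    def _get_summary(self) -> str:
        return f"{len(self._items)} memories stored"


tool = ToyMemoryTool()
first = tool.execute("add", content="Li Si is a frontend engineer")
found = tool.execute("search", query="frontend")
unknown = tool.execute("export")
```

In this sketch, adding a new operation means adding one dictionary entry, and an unknown action returns an explicit message instead of silently returning `None`.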
+
+This unified `execute` interface design simplifies the Agent's calling method. The specific operation is specified through the `action` parameter, and `**kwargs` allows each operation to have different parameter requirements. Here we will list several important operations:
+
+(1) Operation 1: add
+
+The `add` operation is the foundation of the memory system. It simulates the process of the human brain encoding perceived information into memory. In implementation, we not only need to store memory content but also add rich contextual information to each memory. This information will play an important role in subsequent retrieval and management.
+
+````python
+def _add_memory(
+    self,
+    content: str = "",
+    memory_type: str = "working",
+    importance: float = 0.5,
+    file_path: str = None,
+    modality: str = None,
+    **metadata
+) -> str:
+    """Add memory"""
+    try:
+        # Ensure session ID exists
+        if self.current_session_id is None:
+            self.current_session_id = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+        # Perceptual memory file support
+        if memory_type == "perceptual" and file_path:
+            inferred = modality or self._infer_modality(file_path)
+            metadata.setdefault("modality", inferred)
+            metadata.setdefault("raw_data", file_path)
+
+        # Add session information to metadata
+        metadata.update({
+            "session_id": self.current_session_id,
+            "timestamp": datetime.now().isoformat()
+        })
+
+        memory_id = self.memory_manager.add_memory(
+            content=content,
+            memory_type=memory_type,
+            importance=importance,
+            metadata=metadata,
+            auto_classify=False
+        )
+
+        return f"✅ Memory added (ID: {memory_id[:8]}...)"
+
+    except Exception as e:
+        return f"❌ Failed to add memory: {str(e)}"
+````
+
+This method does three things: it manages session IDs automatically, so every memory has a clear session attribution; it handles multimodal data, inferring the modality from the file path and saving the related metadata; and it supplements context, attaching a timestamp and session ID to each memory. Together these let the Agent distinguish conversations from different time periods and give later retrieval and management richer context. The `importance` parameter (default 0.5, range 0.0-1.0) marks how important a memory is, mirroring the way the human brain weighs different pieces of information.
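
The excerpt calls `self._infer_modality(file_path)` without showing it, and it does not validate `importance`. The sketch below shows one plausible way to do both, assuming modality is guessed from the file extension. The extension table, the `"text"` default, and the `clamp_importance` helper are all assumptions for illustration, not the HelloAgents implementation.

```python
from pathlib import Path

# Hypothetical extension table; the real _infer_modality may use different rules
_EXTENSION_TO_MODALITY = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
    ".mp3": "audio", ".wav": "audio", ".flac": "audio",
    ".mp4": "video", ".mov": "video",
}


def infer_modality(file_path: str) -> str:
    """Guess a modality from the file extension, defaulting to text."""
    suffix = Path(file_path).suffix.lower()
    return _EXTENSION_TO_MODALITY.get(suffix, "text")


def clamp_importance(importance: float) -> float:
    """Keep importance inside the documented 0.0-1.0 range."""
    return max(0.0, min(1.0, importance))
```

With these helpers, a path such as `./uploads/code_screenshot.png` would be classified as `image`, and an out-of-range importance would be pulled back to the nearest bound before storage.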
+
+For each memory type, we provide different usage examples:
+
+```python
+# 1. Working Memory - Temporary information, limited capacity
+memory_tool.execute("add",
+    content="User just asked a question about Python functions",
+    memory_type="working",
+    importance=0.6
+)
+
+# 2. Episodic Memory - Specific events and experiences
+memory_tool.execute("add",
+    content="On March 15, 2024, user Zhang San completed their first Python project",
+    memory_type="episodic",
+    importance=0.8,
+    event_type="milestone",
+    location="Online learning platform"
+)
+
+# 3. Semantic Memory - Abstract knowledge and concepts
+memory_tool.execute("add",
+    content="Python is an interpreted, object-oriented programming language",
+    memory_type="semantic",
+    importance=0.9,
+    knowledge_type="factual"
+)
+
+# 4. Perceptual Memory - Multimodal information
+memory_tool.execute("add",
+    content="User uploaded a Python code screenshot containing function definitions",
+    memory_type="perceptual",
+    importance=0.7,
+    modality="image",
+    file_path="./uploads/code_screenshot.png"
+)
+```
+
+(2) Operation 2: search
+
+The `search` operation is the core function of the memory system. It must quickly find the content most relevant to a query among a large number of memories, which involves semantic understanding, relevance calculation, and result ranking.
+
+````python
+def _search_memory(
+    self,
+    query: str,
+    limit: int = 5,
+    memory_types: List[str] = None,
+    memory_type: str = None,
+    min_importance: float = 0.1
+) -> str:
+    """Search memory"""
+    try:
+        # Parameter standardization
+        if memory_type and not memory_types:
+            memory_types = [memory_type]
+
+        results = self.memory_manager.retrieve_memories(
+            query=query,
+            limit=limit,
+            memory_types=memory_types,
+            min_importance=min_importance
+        )
+
+        if not results:
+            return f"🔍 No memories found related to '{query}'"
+
+        # Format results
+        formatted_results = []
+        formatted_results.append(f"🔍 Found {len(results)} related memories:")
+
+        for i, memory in enumerate(results, 1):
+            memory_type_label = {
+                "working": "Working Memory",
+                "episodic": "Episodic Memory",
+                "semantic": "Semantic Memory",
+                "perceptual": "Perceptual Memory"
+            }.get(memory.memory_type, memory.memory_type)
+
+            content_preview = memory.content[:80] + "..." if len(memory.content) > 80 else memory.content
+            formatted_results.append(
+                f"{i}. [{memory_type_label}] {content_preview} (Importance: {memory.importance:.2f})"
+            )
+
+        return "\n".join(formatted_results)
+
+    except Exception as e:
+        return f"❌ Failed to search memory: {str(e)}"
+````
+
+The search operation accepts both a singular and a plural parameter form (`memory_type` and `memory_types`), so callers can use whichever reads more naturally; a single `memory_type` is wrapped into a one-element list internally. The `min_importance` parameter (default 0.1) filters out low-importance memories. The following example shows how to use the search function:
+
+```python
+# Basic search
+result = memory_tool.execute("search", query="Python programming", limit=5)
+
+# Search by specifying memory type
+result = memory_tool.execute("search",
+    query="learning progress",
+    memory_type="episodic",
+    limit=3
+)
+
+# Multi-type search
+result = memory_tool.execute("search",
+    query="function definition",
+    memory_types=["semantic", "episodic"],
+    min_importance=0.5
+)
+```
+
+(3) Operation 3: forget
+
+The forgetting mechanism is the feature most directly inspired by cognitive science. It simulates the human brain's selective forgetting and supports three strategies: importance-based (deleting unimportant memories), time-based (deleting outdated memories), and capacity-based (deleting the least important memories when storage approaches its limit).
+
+````python
+def _forget(self, strategy: str = "importance_based", threshold: float = 0.1, max_age_days: int = 30) -> str:
+    """Forget memories (supports multiple strategies)"""
+    try:
+        count = self.memory_manager.forget_memories(
+            strategy=strategy,
+            threshold=threshold,
+            max_age_days=max_age_days
+        )
+        return f"🧹 Forgot {count} memories (strategy: {strategy})"
+    except Exception as e:
+        return f"❌ Failed to forget memories: {str(e)}"
+````
+
+**Usage of three forgetting strategies:**
+
+```python
+# 1. Importance-based forgetting - Delete memories below importance threshold
+memory_tool.execute("forget",
+    strategy="importance_based",
+    threshold=0.2
+)
+
+# 2. Time-based forgetting - Delete memories older than specified days
+memory_tool.execute("forget",
+    strategy="time_based",
+    max_age_days=30
+)
+
+# 3. Capacity-based forgetting - Delete least important when memory count exceeds limit
+memory_tool.execute("forget",
+    strategy="capacity_based",
+    threshold=0.3
+)
+```
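
The three strategies can be sketched over a plain list of records. This is an illustrative sketch only: `MemoryManager.forget_memories` is not shown in this section, so the record layout, the `capacity` parameter, and the exact comparison rules below are assumptions.

```python
from datetime import datetime, timedelta


def select_to_forget(memories, strategy, now, threshold=0.1, max_age_days=30, capacity=3):
    """Return the ids a strategy would delete. `capacity` is a hypothetical parameter."""
    if strategy == "importance_based":
        # Drop everything below the importance threshold
        return [m["id"] for m in memories if m["importance"] < threshold]
    if strategy == "time_based":
        # Drop everything older than the cutoff date
        cutoff = now - timedelta(days=max_age_days)
        return [m["id"] for m in memories if m["timestamp"] < cutoff]
    if strategy == "capacity_based":
        overflow = len(memories) - capacity
        if overflow <= 0:
            return []
        # Evict the least important items until the store fits again
        ranked = sorted(memories, key=lambda m: m["importance"])
        return [m["id"] for m in ranked[:overflow]]
    raise ValueError(f"Unknown strategy: {strategy}")


now = datetime(2024, 3, 15)
memories = [
    {"id": "a", "importance": 0.05, "timestamp": now - timedelta(days=1)},
    {"id": "b", "importance": 0.60, "timestamp": now - timedelta(days=45)},
    {"id": "c", "importance": 0.90, "timestamp": now - timedelta(days=2)},
    {"id": "d", "importance": 0.30, "timestamp": now - timedelta(days=3)},
]
```

Note that the strategies are independent: memory `b` is important enough to survive importance-based forgetting but is still removed by the time-based strategy because it is 45 days old.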
+
+(4) Operation 4: consolidate
+
+````python
+def _consolidate(self, from_type: str = "working", to_type: str = "episodic", importance_threshold: float = 0.7) -> str:
+    """Consolidate memories (promote important short-term memories to long-term memories)"""
+    try:
+        count = self.memory_manager.consolidate_memories(
+            from_type=from_type,
+            to_type=to_type,
+            importance_threshold=importance_threshold,
+        )
+        return f"🔄 Consolidated {count} memories to long-term memory ({from_type} → {to_type}, threshold={importance_threshold})"
+    except Exception as e:
+        return f"❌ Failed to consolidate memories: {str(e)}"
+````
+
+The `consolidate` operation draws on the concept of memory consolidation in neuroscience, simulating how the human brain converts short-term memory into long-term memory. By default, working memories whose importance exceeds 0.7 are converted into episodic memories, so that only important information is preserved long-term. The process is automatic: users do not select individual memories by hand. The system finds the memories that meet the threshold and converts their type.
+
+**Usage examples of memory consolidation:**
+
+```python
+# Convert important working memories to episodic memories
+memory_tool.execute("consolidate",
+    from_type="working",
+    to_type="episodic",
+    importance_threshold=0.7
+)
+
+# Convert important episodic memories to semantic memories
+memory_tool.execute("consolidate",
+    from_type="episodic",
+    to_type="semantic",
+    importance_threshold=0.8
+)
+```
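
A minimal sketch of the promotion step, using plain lists in place of the real memory modules. `MemoryManager.consolidate_memories` is not shown in this section, so whether the source copy is removed or kept after promotion is an assumption here (this sketch moves the item).

```python
def consolidate(stores, from_type="working", to_type="episodic", importance_threshold=0.7):
    """Move items whose importance exceeds the threshold; return how many moved."""
    source = stores[from_type]
    promoted = [m for m in source if m["importance"] > importance_threshold]
    stores[to_type].extend(promoted)
    # Assumption: promoted items leave the source store
    stores[from_type] = [m for m in source if m["importance"] <= importance_threshold]
    return len(promoted)


stores = {
    "working": [
        {"content": "User asked about decorators", "importance": 0.9},
        {"content": "User said hello", "importance": 0.3},
    ],
    "episodic": [],
}
moved = consolidate(stores)
```

Only the decorator question crosses the 0.7 threshold, so it is the single item promoted to the episodic store while the greeting stays in working memory.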
+
+Together, these core operations give MemoryTool a complete memory lifecycle: memories are created, retrieved, and summarized, then consolidated or forgotten as their importance and age change. This closed loop is what gives the Agent memory behavior modeled on human memory.
+
+### 8.2.4 MemoryManager Detailed Explanation
+
+After understanding the interface design of MemoryTool, let's delve into the underlying implementation to see how MemoryTool collaborates with MemoryManager. This layered design embodies the separation of concerns principle in software engineering. MemoryTool focuses on user interface and parameter processing, while MemoryManager is responsible for core memory management logic.
+
+MemoryTool creates a MemoryManager instance during initialization and enables different types of memory modules based on configuration. This design allows users to choose which memory types to enable based on specific needs, ensuring functional completeness while avoiding unnecessary resource consumption.
+
+````python
+class MemoryTool(Tool):
+    """Memory tool - Provides memory functionality for Agent"""
+
+    def __init__(
+        self,
+        user_id: str = "default_user",
+        memory_config: MemoryConfig = None,
+        memory_types: List[str] = None
+    ):
+        super().__init__(
+            name="memory",
+            description="Memory tool - Can store and retrieve conversation history, knowledge, and experience"
+        )
+
+        # Initialize memory manager
+        self.memory_config = memory_config or MemoryConfig()
+        self.memory_types = memory_types or ["working", "episodic", "semantic"]
+
+        self.memory_manager = MemoryManager(
+            config=self.memory_config,
+            user_id=user_id,
+            enable_working="working" in self.memory_types,
+            enable_episodic="episodic" in self.memory_types,
+            enable_semantic="semantic" in self.memory_types,
+            enable_perceptual="perceptual" in self.memory_types
+        )
+````
+
+MemoryManager, as the core coordinator of the memory system, is responsible for managing different types of memory modules and providing a unified operation interface.
+
+````python
+class MemoryManager:
+    """Memory manager - Unified memory operation interface"""
+
+    def __init__(
+        self,
+        config: Optional[MemoryConfig] = None,
+        user_id: str = "default_user",
+        enable_working: bool = True,
+        enable_episodic: bool = True,
+        enable_semantic: bool = True,
+        enable_perceptual: bool = False
+    ):
+        self.config = config or MemoryConfig()
+        self.user_id = user_id
+
+        # Initialize storage and retrieval components
+        self.store = MemoryStore(self.config)
+        self.retriever = MemoryRetriever(self.store, self.config)
+
+        # Initialize various types of memory
+        self.memory_types = {}
+
+        if enable_working:
+            self.memory_types['working'] = WorkingMemory(self.config, self.store)
+
+        if enable_episodic:
+            self.memory_types['episodic'] = EpisodicMemory(self.config, self.store)
+
+        if enable_semantic:
+            self.memory_types['semantic'] = SemanticMemory(self.config, self.store)
+
+        if enable_perceptual:
+            self.memory_types['perceptual'] = PerceptualMemory(self.config, self.store)
+````
+
+### 8.2.5 Four Types of Memory
+
+Now let's delve into the specific implementation of the four memory types. Each memory type has its unique characteristics and application scenarios:
+
+(1) Working Memory
+
+Working memory is the most active part of the memory system. It is responsible for storing temporary information in the current conversation session. The design focus of working memory is on fast access and automatic cleanup, which ensures the system's response speed and resource efficiency.
+
+Working memory uses pure in-memory storage combined with a TTL (Time To Live) mechanism for automatic cleanup. The advantage of this design is extremely fast access, but it also means the content of working memory is lost when the system restarts. That trade-off fits the role of working memory, which is to hold temporary, volatile information.
+
+````python
+class WorkingMemory:
+    """Working memory implementation
+    Features:
+    - Limited capacity (default 50 items) + TTL automatic cleanup
+    - Pure in-memory storage, extremely fast access
+    - Hybrid retrieval: TF-IDF vectorization + keyword matching
+    """
+
+    def __init__(self, config: MemoryConfig, storage_backend=None):
+        self.max_capacity = config.working_memory_capacity or 50
+        self.max_age_minutes = config.working_memory_ttl or 60
+        self.memories = []
+
+    def add(self, memory_item: MemoryItem) -> str:
+        """Add working memory"""
+        self._expire_old_memories()  # Expiration cleanup
+
+        if len(self.memories) >= self.max_capacity:
+            self._remove_lowest_priority_memory()  # Capacity management
+
+        self.memories.append(memory_item)
+        return memory_item.id
+
+    def retrieve(self, query: str, limit: int = 5, **kwargs) -> List[MemoryItem]:
+        """Hybrid retrieval: TF-IDF vectorization + keyword matching"""
+        self._expire_old_memories()
+
+        # Try TF-IDF vector retrieval
+        vector_scores = self._try_tfidf_search(query)
+
+        # Calculate comprehensive score
+        scored_memories = []
+        for memory in self.memories:
+            vector_score = vector_scores.get(memory.id, 0.0)
+            keyword_score = self._calculate_keyword_score(query, memory.content)
+
+            # Hybrid scoring
+            base_relevance = vector_score * 0.7 + keyword_score * 0.3 if vector_score > 0 else keyword_score
+            time_decay = self._calculate_time_decay(memory.timestamp)
+            importance_weight = 0.8 + (memory.importance * 0.4)
+
+            final_score = base_relevance * time_decay * importance_weight
+            if final_score > 0:
+                scored_memories.append((final_score, memory))
+
+        scored_memories.sort(key=lambda x: x[0], reverse=True)
+        return [memory for _, memory in scored_memories[:limit]]
+````
+
+Working memory uses a hybrid retrieval strategy. It first attempts TF-IDF vector retrieval; when that yields a positive score for a memory, the base relevance blends the two signals (`vector score × 0.7 + keyword score × 0.3`), and otherwise it falls back to the keyword score alone. This keeps retrieval working even when TF-IDF is unavailable. The final score combines relevance, time decay, and importance weight: `(similarity × time decay) × (0.8 + importance × 0.4)`.
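
To make the formula concrete, here is a small worked sketch. The blending weights and the importance weight come from the excerpt above; `_calculate_time_decay` is not shown, so the exponential half-life decay used here is purely an assumption.

```python
def time_decay(age_minutes: float, half_life_minutes: float = 60.0) -> float:
    """Hypothetical exponential decay: the weight halves every half-life."""
    return 0.5 ** (age_minutes / half_life_minutes)


def working_score(vector_score, keyword_score, age_minutes, importance):
    # Blend TF-IDF and keyword scores only when TF-IDF produced a match
    if vector_score > 0:
        base = vector_score * 0.7 + keyword_score * 0.3
    else:
        base = keyword_score
    importance_weight = 0.8 + importance * 0.4
    return base * time_decay(age_minutes) * importance_weight


fresh = working_score(0.5, 1.0, age_minutes=0, importance=0.5)
stale = working_score(0.5, 1.0, age_minutes=60, importance=0.5)
keyword_only = working_score(0.0, 0.4, age_minutes=0, importance=1.0)
```

Under these assumptions the fresh memory scores about 0.65, the same memory one hour later scores about half of that, and a keyword-only match with maximum importance scores about 0.48.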
+
+(2) Episodic Memory
+
+Episodic memory is responsible for storing specific events and experiences. Its design focus is on maintaining the integrity of events and temporal sequence relationships. Episodic memory adopts a hybrid storage solution of SQLite + Qdrant. SQLite is responsible for storing structured data and complex queries, while Qdrant is responsible for efficient vector retrieval.
+
+````python
+class EpisodicMemory:
+    """Episodic memory implementation
+    Features:
+    - SQLite+Qdrant hybrid storage architecture
+    - Supports time series and session-level retrieval
+    - Structured filtering + semantic vector retrieval
+    """
+
+    def __init__(self, config: MemoryConfig, storage_backend=None):
+        self.doc_store = SQLiteDocumentStore(config.database_path)
+        self.vector_store = QdrantVectorStore(config.qdrant_url, config.qdrant_api_key)
+        self.embedder = create_embedding_model_with_fallback()
+        self.sessions = {}  # Session index
+
+    def add(self, memory_item: MemoryItem) -> str:
+        """Add episodic memory"""
+        # Create episode object
+        episode = Episode(
+            episode_id=memory_item.id,
+            session_id=memory_item.metadata.get("session_id", "default"),
+            timestamp=memory_item.timestamp,
+            content=memory_item.content,
+            context=memory_item.metadata
+        )
+
+        # Update session index
+        session_id = episode.session_id
+        if session_id not in self.sessions:
+            self.sessions[session_id] = []
+        self.sessions[session_id].append(episode.episode_id)
+
+        # Persistent storage (SQLite + Qdrant)
+        self._persist_episode(episode)
+        return memory_item.id
+
+    def retrieve(self, query: str, limit: int = 5, **kwargs) -> List[MemoryItem]:
+        """Hybrid retrieval: structured filtering + semantic vector retrieval"""
+        # 1. Structured pre-filtering (time range, importance, etc.)
+        candidate_ids = self._structured_filter(**kwargs)
+
+        # 2. Vector semantic retrieval
+        hits = self._vector_search(query, limit * 5, kwargs.get("user_id"))
+
+        # 3. Comprehensive scoring and sorting
+        results = []
+        for hit in hits:
+            if self._should_include(hit, candidate_ids, kwargs):
+                score = self._calculate_episode_score(hit)
+                memory_item = self._create_memory_item(hit)
+                results.append((score, memory_item))
+
+        results.sort(key=lambda x: x[0], reverse=True)
+        return [item for _, item in results[:limit]]
+
+    def _calculate_episode_score(self, hit) -> float:
+        """Episodic memory scoring algorithm"""
+        vec_score = float(hit.get("score", 0.0))
+        recency_score = self._calculate_recency(hit["metadata"]["timestamp"])
+        importance = hit["metadata"].get("importance", 0.5)
+
+        # Scoring formula: (vector similarity × 0.8 + temporal recency × 0.2) × importance weight
+        base_relevance = vec_score * 0.8 + recency_score * 0.2
+        importance_weight = 0.8 + (importance * 0.4)
+
+        return base_relevance * importance_weight
+````
+
+Episodic memory retrieval uses a multi-factor score. It weighs semantic similarity together with temporal recency, then adjusts the result by an importance weight. The scoring formula is: `(vector similarity × 0.8 + temporal recency × 0.2) × (0.8 + importance × 0.4)`, so results are ranked by both semantic and temporal relevance.
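
The effect of the 0.8/0.2 split can be checked with a short calculation. This sketch reproduces only the arithmetic of `_calculate_episode_score`; it assumes the recency score lies in the range 0.0-1.0, which the excerpt does not state.

```python
def episode_score(vec_score: float, recency: float, importance: float) -> float:
    """Same arithmetic as the excerpt; recency is assumed to lie in [0, 1]."""
    base_relevance = vec_score * 0.8 + recency * 0.2
    importance_weight = 0.8 + importance * 0.4
    return base_relevance * importance_weight


# An old but highly relevant episode versus a very recent but vaguer one
old_but_relevant = episode_score(vec_score=0.9, recency=0.1, importance=0.5)
recent_but_vague = episode_score(vec_score=0.5, recency=1.0, importance=0.5)
```

With equal importance, the old but relevant episode still ranks first (about 0.74 against about 0.60), which shows that recency acts as a secondary signal rather than overriding semantic similarity.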
+
+(3) Semantic Memory
+
+Semantic memory is the most complex part of the memory system. It is responsible for storing abstract concepts, rules, and knowledge. The design focus of semantic memory is on structured representation of knowledge and intelligent reasoning capabilities. Semantic memory adopts a hybrid architecture of Neo4j graph database and Qdrant vector database. This design allows the system to perform both fast semantic retrieval and complex relational reasoning using knowledge graphs.
+
+````python
+class SemanticMemory(BaseMemory):
+    """Semantic memory implementation
+
+    Features:
+    - Uses HuggingFace Chinese pre-trained models for text embedding
+    - Vector retrieval for fast similarity matching
+    - Knowledge graph storage for entities and relationships
+    - Hybrid retrieval strategy: vector + graph + semantic reasoning
+    """
+
+    def __init__(self, config: MemoryConfig, storage_backend=None):
+        super().__init__(config, storage_backend)
+
+        # Embedding model (unified provision)
+        self.embedding_model = get_text_embedder()
+
+        # Professional database storage
+        self.vector_store = QdrantConnectionManager.get_instance(**qdrant_config)
+        self.graph_store = Neo4jGraphStore(**neo4j_config)
+
+        # Entity and relation cache
+        self.entities: Dict[str, Entity] = {}
+        self.relations: List[Relation] = []
+
+        # NLP processor (supports Chinese and English)
+        self.nlp = self._init_nlp()
+````
+
+The addition process of semantic memory embodies the complete workflow of knowledge graph construction. The system not only stores memory content but also automatically extracts entities and relationships to build structured knowledge representations:
+
+```python
+def add(self, memory_item: MemoryItem) -> str:
+    """Add semantic memory"""
+    # 1. Generate text embedding
+    embedding = self.embedding_model.encode(memory_item.content)
+
+    # 2. Extract entities and relations
+    entities = self._extract_entities(memory_item.content)
+    relations = self._extract_relations(memory_item.content, entities)
+
+    # 3. Store to Neo4j graph database
+    for entity in entities:
+        self._add_entity_to_graph(entity, memory_item)
+
+    for relation in relations:
+        self._add_relation_to_graph(relation, memory_item)
+
+    # 4. Store to Qdrant vector database
+    metadata = {
+        "memory_id": memory_item.id,
+        "entities": [e.entity_id for e in entities],
+        "entity_count": len(entities),
+        "relation_count": len(relations)
+    }
+
+    self.vector_store.add_vectors(
+        vectors=[embedding.tolist()],
+        metadata=[metadata],
+        ids=[memory_item.id]
+    )
+
+    return memory_item.id
+```
+
+The retrieval of semantic memory implements a hybrid search strategy, combining the semantic understanding capability of vector retrieval and the relational reasoning capability of graph retrieval:
+
+```python
+def retrieve(self, query: str, limit: int = 5, **kwargs) -> List[MemoryItem]:
+    """Retrieve semantic memory"""
+    user_id = kwargs.get("user_id")
+
+    # 1. Vector retrieval
+    vector_results = self._vector_search(query, limit * 2, user_id)
+
+    # 2. Graph retrieval
+    graph_results = self._graph_search(query, limit * 2, user_id)
+
+    # 3. Hybrid ranking
+    combined_results = self._combine_and_rank_results(
+        vector_results, graph_results, query, limit
+    )
+
+    return combined_results[:limit]
+```
+
+The hybrid ranking algorithm adopts a multi-factor scoring mechanism:
+
+```python
+def _combine_and_rank_results(self, vector_results, graph_results, query, limit):
+    """Hybrid ranking of results"""
+    combined = {}
+
+    # Merge vector and graph retrieval results
+    for result in vector_results:
+        combined[result["memory_id"]] = {
+            **result,
+            "vector_score": result.get("score", 0.0),
+            "graph_score": 0.0
+        }
+
+    for result in graph_results:
+        memory_id = result["memory_id"]
+        if memory_id in combined:
+            combined[memory_id]["graph_score"] = result.get("similarity", 0.0)
+        else:
+            combined[memory_id] = {
+                **result,
+                "vector_score": 0.0,
+                "graph_score": result.get("similarity", 0.0)
+            }
+
+    # Calculate hybrid score
+    for memory_id, result in combined.items():
+        vector_score = result["vector_score"]
+        graph_score = result["graph_score"]
+        importance = result.get("importance", 0.5)
+
+        # Base relevance score
+        base_relevance = vector_score * 0.7 + graph_score * 0.3
+
+        # Importance weight [0.8, 1.2]
+        importance_weight = 0.8 + (importance * 0.4)
+
+        # Final score: similarity * importance weight
+        combined_score = base_relevance * importance_weight
+        result["combined_score"] = combined_score
+
+    # Sort and return
+    sorted_results = sorted(
+        combined.values(),
+        key=lambda x: x["combined_score"],
+        reverse=True
+    )
+
+    return sorted_results[:limit]
+```
+
+The scoring formula for semantic memory is: `(vector similarity × 0.7 + graph similarity × 0.3) × (0.8 + importance × 0.4)`. The core idea of this design is:
+
+- **Vector retrieval weight (0.7)**: Semantic similarity is the main factor, ensuring retrieval results are semantically related to the query
+- **Graph retrieval weight (0.3)**: Relational reasoning as a supplement, discovering implicit associations between concepts
+- **Importance weight range [0.8, 1.2]**: Avoids excessive influence of importance on similarity ranking, maintaining retrieval accuracy
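
The bounded importance weight can be illustrated numerically. The sketch below reuses the arithmetic from `_combine_and_rank_results`; the input scores are made-up values chosen for illustration.

```python
def semantic_score(vector_score: float, graph_score: float, importance: float = 0.5) -> float:
    """Same arithmetic as the hybrid ranking excerpt."""
    base_relevance = vector_score * 0.7 + graph_score * 0.3
    importance_weight = 0.8 + importance * 0.4  # always within [0.8, 1.2]
    return base_relevance * importance_weight


# A strong match with the lowest importance versus a weak match with the highest
strong_match = semantic_score(0.9, 0.9, importance=0.0)
weak_match = semantic_score(0.5, 0.5, importance=1.0)

# A memory found only through the graph contributes at most 30% of base relevance
graph_only = semantic_score(0.0, 1.0)
```

Because the weight only ranges from 0.8 to 1.2, the strong match (about 0.72) still outranks the weak match (about 0.60) even at opposite importance extremes. Importance reorders results of similar relevance but cannot rescue a poor match.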
+
+(4) Perceptual Memory
+
+Perceptual memory supports storage and retrieval of data in multiple modalities such as text, images, and audio. It adopts a modality-separated storage strategy, creating independent vector collections for data of different modalities. This design avoids dimension mismatch problems while ensuring retrieval accuracy:
+
+````python
+class PerceptualMemory(BaseMemory):
+    """Perceptual memory implementation
+
+    Features:
+    - Supports multimodal data (text, images, audio, etc.)
+    - Cross-modal similarity search
+    - Semantic understanding of perceptual data
+    - Supports content generation and retrieval
+    """
+
+    def __init__(self, config: MemoryConfig, storage_backend=None):
+        super().__init__(config, storage_backend)
+
+        # Multimodal encoders
+        self.text_embedder = get_text_embedder()
+        self._clip_model = self._init_clip_model()  # Image encoding
+        self._clap_model = self._init_clap_model()  # Audio encoding
+
+        # Modality-separated vector storage
+        self.vector_stores = {
+            "text": QdrantConnectionManager.get_instance(
+                collection_name="perceptual_text",
+                vector_size=self.vector_dim
+            ),
+            "image": QdrantConnectionManager.get_instance(
+                collection_name="perceptual_image",
+                vector_size=self._image_dim
+            ),
+            "audio": QdrantConnectionManager.get_instance(
+                collection_name="perceptual_audio",
+                vector_size=self._audio_dim
+            )
+        }
+````
+
+Perceptual memory retrieval supports both same-modality and cross-modality modes. Same-modality retrieval uses specialized encoders for precise matching, while cross-modality retrieval requires more complex semantic alignment mechanisms:
+
+```python
+def retrieve(self, query: str, limit: int = 5, **kwargs) -> List[MemoryItem]:
+    """Retrieve perceptual memory (can filter modality; same-modality vector retrieval + time/importance fusion)"""
+    user_id = kwargs.get("user_id")
+    target_modality = kwargs.get("target_modality")
+    query_modality = kwargs.get("query_modality", target_modality or "text")
+
+    # Same-modality vector retrieval
+    try:
+        query_vector = self._encode_data(query, query_modality)
+        store = self._get_vector_store_for_modality(target_modality or query_modality)
+
+        where = {"memory_type": "perceptual"}
+        if user_id:
+            where["user_id"] = user_id
+        if target_modality:
+            where["modality"] = target_modality
+
+        hits = store.search_similar(
+            query_vector=query_vector,
+            limit=max(limit * 5, 20),
+            where=where
+        )
+    except Exception:
+        hits = []
+
+    # Fusion ranking (vector similarity + temporal recency + importance weight)
+    results = []
+    for hit in hits:
+        vector_score = float(hit.get("score", 0.0))
+        recency_score = self._calculate_recency_score(hit["metadata"]["timestamp"])
+        importance = hit["metadata"].get("importance", 0.5)
+
+        # Scoring algorithm
+        base_relevance = vector_score * 0.8 + recency_score * 0.2
+        importance_weight = 0.8 + (importance * 0.4)
+        combined_score = base_relevance * importance_weight
+
+        results.append((combined_score, self._create_memory_item(hit)))
+
+    results.sort(key=lambda x: x[0], reverse=True)
+    return [item for _, item in results[:limit]]
+```
+
+The scoring formula for perceptual memory is: `(vector similarity × 0.8 + temporal recency × 0.2) × (0.8 + importance × 0.4)`. The scoring mechanism of perceptual memory also supports cross-modal retrieval, achieving semantic alignment of different modality data such as text, images, and audio through a unified vector space. When performing cross-modal retrieval, the system automatically adjusts scoring weights to ensure diversity and accuracy of retrieval results. Additionally, the temporal recency calculation in perceptual memory adopts an exponential decay model:
+
+```python
+def _calculate_recency_score(self, timestamp: str) -> float:
+    """Calculate temporal recency score"""
+    try:
+        memory_time = datetime.fromisoformat(timestamp)
+        current_time = datetime.now()
+        age_hours = (current_time - memory_time).total_seconds() / 3600
+
+        # Exponential decay over age in days: about 0.90 after 24 hours, then gradually lower
+        decay_factor = 0.1  # Decay coefficient
+        recency_score = math.exp(-decay_factor * age_hours / 24)
+
+        return max(0.1, recency_score)  # Maintain minimum base score of 0.1
+    except Exception:
+        return 0.5  # Default medium score
+```
+
+This time decay model simulates the forgetting curve in human memory, ensuring that the perceptual memory system can prioritize retrieval of temporally more relevant memory content.
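+
To get a feel for how quickly the decay acts, the following standalone sketch evaluates the same formula at a few ages and plugs it into the perceptual scoring formula. The function names `recency_score` and `perceptual_score` are hypothetical stand-ins for the method shown above, and ages are passed in directly rather than parsed from timestamps.

```python
import math


def recency_score(age_hours: float, decay_factor: float = 0.1, floor: float = 0.1) -> float:
    """Exponential decay over age measured in days, clamped to a minimum score."""
    return max(floor, math.exp(-decay_factor * age_hours / 24))


def perceptual_score(vector_score: float, age_hours: float, importance: float = 0.5) -> float:
    """(vector similarity x 0.8 + recency x 0.2) x (0.8 + importance x 0.4)."""
    base_relevance = vector_score * 0.8 + recency_score(age_hours) * 0.2
    return base_relevance * (0.8 + importance * 0.4)


for days in (0, 1, 7, 30):
    print(f"age={days:>2}d recency={recency_score(days * 24):.3f}")

# The 0.1 floor takes over once exp(-0.1 * days) drops below 0.1
print(f"floor reached after {math.log(10) / 0.1:.1f} days")
```

With a decay coefficient of 0.1, a memory keeps roughly 90% of its recency score after one day and about half after a week, and it sits at the 0.1 floor from about day 23 onward. Because recency only contributes 20% of the base relevance, an old but highly similar memory can still outrank a fresh but loosely related one.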
+
+## 8.3 RAG System: Knowledge Retrieval Enhancement
+
+### 8.3.1 RAG Fundamentals
+
+Before diving into the RAG system implementation of HelloAgents, let's first review the basic concepts, development history, and core principles of RAG technology. Since this book does not take RAG as its main subject, we only review the relevant concepts briefly here, as background for the technical choices and innovations in the system design.
+
+(1) What is RAG?
+
+Retrieval-Augmented Generation (RAG) is a technology that combines information retrieval and text generation. Its core idea is: before generating an answer, first retrieve relevant information from an external knowledge base, then provide the retrieved information as context to the large language model, thereby generating more accurate and reliable answers.
+
+The name Retrieval-Augmented Generation breaks down into three parts. **Retrieval** means querying relevant content from the knowledge base; **Augmented** means integrating the retrieval results into the prompt to assist model generation; **Generation** means producing answers that combine accuracy and transparency.
+
+(2) Basic Workflow
+
+A complete RAG application workflow is mainly divided into two core stages. In the **data preparation stage**, the system builds external knowledge into a retrievable database through **data extraction**, **text segmentation**, and **vectorization**. Subsequently, in the **application stage**, the system responds to user **queries**, **retrieves** relevant information from the database, **injects it into the prompt**, and finally drives the large language model to **generate answers**.
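+
The two stages can be sketched end to end with the standard library alone. This is a toy under loud assumptions: a bag-of-words counter stands in for a real embedding model, one line equals one chunk, and all function names are hypothetical. The final generation step is omitted because it needs an actual LLM.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def split_into_chunks(text: str) -> List[str]:
    """Text segmentation: one chunk per non-empty line (a deliberately crude rule)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def embed(text: str) -> Counter:
    """Vectorization stand-in: bag-of-words counts, not a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(document: str) -> List[Tuple[str, Counter]]:
    """Data preparation stage: segment, vectorize, store."""
    return [(chunk, embed(chunk)) for chunk in split_into_chunks(document)]


def retrieve(index: List[Tuple[str, Counter]], query: str, k: int = 1) -> List[str]:
    """Application stage, step 1: rank stored chunks against the query."""
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, contexts: List[str]) -> str:
    """Application stage, step 2: inject the retrieved text into the prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


document = """Python was first released by Guido van Rossum in 1991.
Machine learning lets computers learn patterns from data.
RAG combines information retrieval with text generation."""

index = build_index(document)
top = retrieve(index, "Who released Python?", k=1)
prompt = build_prompt("Who released Python?", top)
print(prompt)  # this string would be sent to the LLM for the generation step
```

Swapping `embed` for a dense embedding model and the list for a vector database gives the shape of the pipeline described in the rest of this section.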
+
+(3) Development History
+
+First stage: Naive RAG (2020-2021). This is the embryonic stage of RAG technology, with a direct and simple process, commonly referred to as the "Retrieve-Read" mode. **Retrieval method**: Mainly relies on traditional keyword matching algorithms such as `TF-IDF` or `BM25`. These methods calculate term frequency and document frequency to evaluate relevance, with good literal matching effects, but difficulty understanding semantic similarity. **Generation mode**: Directly concatenates retrieved document content into the prompt context without processing, then sends it to the generation model.
+
+Second stage: Advanced RAG (2022-2023). With the maturity of vector databases and text embedding technology, RAG entered a rapid development stage. Researchers and developers introduced a large number of optimization techniques in various stages of "retrieval" and "generation". **Retrieval method**: Shifted to semantic retrieval based on **dense embedding**. By converting text into high-dimensional vectors, the model can understand and match semantic similarity, not just keywords. **Generation mode**: Introduced many optimization techniques, such as query rewriting, document chunking, reranking, etc.
+
+Third stage: Modular RAG (2023-present). Building on advanced RAG, modern RAG systems further develop toward modularization, automation, and intelligence. Various parts of the system are designed as pluggable, composable independent modules to adapt to more diverse and complex application scenarios. **Retrieval methods**: Such as hybrid retrieval, multi-query expansion, hypothetical document embedding, etc. **Generation modes**: Chain-of-thought reasoning, self-reflection and correction, etc.
+
+### 8.3.2 RAG System Working Principle
+
+Before diving into implementation details, we can use a flowchart to outline the complete workflow of HelloAgents' RAG system:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-5.png" alt="RAG System Core Principle" width="85%"/>
+  <p>Figure 8.5 Core Working Principle of RAG System</p>
+</div>
+
+Figure 8.5 shows the two main workflows of the RAG system:
+1. **Data Processing Workflow**: Processing and storing knowledge documents. Here we use the tool `MarkItDown`; the design idea is to convert all incoming external knowledge sources into Markdown format before processing.
+2. **Query and Generation Workflow**: Retrieving relevant information based on queries and generating answers.
+
+### 8.3.3 Quick Experience: Get Started with RAG Features in 30 Seconds
+
+Let's quickly experience the basic functions of the RAG system:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
+from hello_agents.tools import RAGTool
+
+# Create Agent with RAG capability
+llm = HelloAgentsLLM()
+agent = SimpleAgent(name="Knowledge Assistant", llm=llm)
+
+# Create RAG tool
+rag_tool = RAGTool(
+    knowledge_base_path="./knowledge_base",
+    collection_name="test_collection",
+    rag_namespace="test"
+)
+
+tool_registry = ToolRegistry()
+tool_registry.register_tool(rag_tool)
+agent.tool_registry = tool_registry
+
+# Experience RAG features
+# Add the first knowledge entry
+result1 = rag_tool.execute("add_text",
+    text="Python is a high-level programming language first released by Guido van Rossum in 1991. Python's design philosophy emphasizes code readability and concise syntax.",
+    document_id="python_intro")
+print(f"Knowledge 1: {result1}")
+
+# Add the second knowledge entry
+result2 = rag_tool.execute("add_text",
+    text="Machine learning is a branch of artificial intelligence that uses algorithms to enable computers to learn patterns from data. It mainly includes three types: supervised learning, unsupervised learning, and reinforcement learning.",
+    document_id="ml_basics")
+print(f"Knowledge 2: {result2}")
+
+# Add the third knowledge entry
+result3 = rag_tool.execute("add_text",
+    text="RAG (Retrieval-Augmented Generation) is an AI technology that combines information retrieval and text generation. It enhances the generation capability of large language models by retrieving relevant knowledge.",
+    document_id="rag_concept")
+print(f"Knowledge 3: {result3}")
+
+
+print("\n=== Search Knowledge ===")
+result = rag_tool.execute("search",
+    query="History of Python programming language",
+    limit=3,
+    min_score=0.1
+)
+print(result)
+
+print("\n=== Knowledge Base Statistics ===")
+result = rag_tool.execute("stats")
+print(result)
+```
+
+Next, we will delve into the specific implementation of the HelloAgents RAG system.
+
+### 8.3.4 RAG System Architecture Design
+
+In this section, we take a different approach from the memory system explanation: `Memory_tool` is a systematic implementation, whereas RAG in our design is defined as a tool that can be organized as a pipeline. The core architecture of our RAG system can be summarized as a "five-layer seven-step" design pattern:
+
+```
+User Layer: RAGTool unified interface
+  ↓
+Application Layer: Intelligent Q&A, search, management
+  ↓
+Processing Layer: Document parsing, chunking, vectorization
+  ↓
+Storage Layer: Vector database, document storage
+  ↓
+Foundation Layer: Embedding model, LLM, database
+```
+
+The advantage of this layered design is that each layer can be independently optimized and replaced while maintaining the stability of the overall system. For example, you can easily switch the embedding model from sentence-transformers to Bailian API without affecting the upper-level business logic. Similarly, the processing workflow code is completely reusable, and you can also select the parts you need and put them into your own project. RAGTool serves as the unified entry point of the RAG system, providing a concise API interface.
+
+````python
+class RAGTool(Tool):
+    """RAG tool
+
+    Provides complete RAG capabilities:
+    - Add multi-format documents (PDF, Office, images, audio, etc.)
+    - Intelligent retrieval and recall
+    - LLM-enhanced Q&A
+    - Knowledge base management
+    """
+
+    def __init__(
+        self,
+        knowledge_base_path: str = "./knowledge_base",
+        qdrant_url: str = None,
+        qdrant_api_key: str = None,
+        collection_name: str = "rag_knowledge_base",
+        rag_namespace: str = "default"
+    ):
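+        # Keep the configuration on the instance; the pipeline setup below reads it
+        self.knowledge_base_path = knowledge_base_path
+        self.qdrant_url = qdrant_url
+        self.qdrant_api_key = qdrant_api_key
+        self.collection_name = collection_name
+        self.rag_namespace = rag_namespace
+
+        # Initialize RAG pipeline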
+        self._pipelines: Dict[str, Dict[str, Any]] = {}
+        self.llm = HelloAgentsLLM()
+
+        # Create default pipeline
+        default_pipeline = create_rag_pipeline(
+            qdrant_url=self.qdrant_url,
+            qdrant_api_key=self.qdrant_api_key,
+            collection_name=self.collection_name,
+            rag_namespace=self.rag_namespace
+        )
+        self._pipelines[self.rag_namespace] = default_pipeline
+````
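+
The claim that a layer can be replaced without touching the layers above it rests on those layers depending on a narrow contract rather than on a concrete backend. The sketch below illustrates the idea with `typing.Protocol`; the class names (`Embedder`, `LengthEmbedder`, `VowelEmbedder`, `ProcessingLayer`) are hypothetical and the two toy embedders are placeholders, not the HelloAgents embedding interface.

```python
from typing import List, Protocol


class Embedder(Protocol):
    """Foundation-layer contract: anything with encode() can back the pipeline."""

    def encode(self, texts: List[str]) -> List[List[float]]: ...


class LengthEmbedder:
    """Toy backend A: 2-dimensional vectors built from text length."""

    def encode(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), 1.0] for t in texts]


class VowelEmbedder:
    """Toy backend B: 2-dimensional vectors built from vowel counts."""

    def encode(self, texts: List[str]) -> List[List[float]]:
        return [[float(sum(ch in "aeiou" for ch in t.lower())), 1.0] for t in texts]


class ProcessingLayer:
    """Upper layer: depends only on the Embedder contract, not on a concrete backend."""

    def __init__(self, embedder: Embedder):
        self.embedder = embedder

    def index(self, chunks: List[str]) -> List[dict]:
        vectors = self.embedder.encode(chunks)
        return [{"content": c, "vector": v} for c, v in zip(chunks, vectors)]


chunks = ["hello agents", "rag"]
for backend in (LengthEmbedder(), VowelEmbedder()):
    records = ProcessingLayer(backend).index(chunks)
    print(type(backend).__name__, [r["vector"] for r in records])
```

One practical caveat when swapping real embedding models: vectors produced by different models are not comparable, so existing documents need to be re-indexed after the switch.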
+
+The entire processing workflow is as follows:
+```
+Any format document → MarkItDown conversion → Markdown text → Intelligent chunking → Vectorization → Storage and retrieval
+```
+
+(1) Multimodal Document Loading
+
+One of the core advantages of the RAG system is its powerful multimodal document processing capability. The system uses MarkItDown as a unified document conversion engine, supporting almost all common document formats. MarkItDown is an open-source universal document conversion tool from Microsoft. It is a core component of the HelloAgents RAG system, responsible for uniformly converting documents of any format into structured Markdown text. Whether the input is PDF, Word, Excel, images, or audio, it will ultimately be converted to standard Markdown format, then enter the unified chunking, vectorization, and storage workflow.
+
+```python
+def _convert_to_markdown(path: str) -> str:
+    """
+    Universal document reader using MarkItDown with enhanced PDF processing.
+    Core function: Convert documents of any format to Markdown text
+
+    Supported formats:
+    - Documents: PDF, Word, Excel, PowerPoint
+    - Images: JPG, PNG, GIF (via OCR)
+    - Audio: MP3, WAV, M4A (via transcription)
+    - Text: TXT, CSV, JSON, XML, HTML
+    - Code: Python, JavaScript, Java, etc.
+    """
+    if not os.path.exists(path):
+        return ""
+
+    # Use enhanced processing for PDF files
+    ext = (os.path.splitext(path)[1] or '').lower()
+    if ext == '.pdf':
+        return _enhanced_pdf_processing(path)
+
+    # Use MarkItDown unified conversion for other formats
+    md_instance = _get_markitdown_instance()
+    if md_instance is None:
+        return _fallback_text_reader(path)
+
+    try:
+        result = md_instance.convert(path)
+        markdown_text = getattr(result, "text_content", None)
+        if isinstance(markdown_text, str) and markdown_text.strip():
+            print(f"[RAG] MarkItDown conversion successful: {path} -> {len(markdown_text)} chars Markdown")
+            return markdown_text
+        return ""
+    except Exception as e:
+        print(f"[WARNING] MarkItDown conversion failed {path}: {e}")
+        return _fallback_text_reader(path)
+```
+
+(2) Intelligent Chunking Strategy
+
+After MarkItDown conversion, all documents are unified into standard Markdown format. This provides a structured foundation for subsequent intelligent chunking. HelloAgents implements an intelligent chunking strategy specifically for Markdown format, fully utilizing the structured characteristics of Markdown for precise segmentation.
+
+Markdown structure-aware chunking workflow:
+
+```
+Standard Markdown text → Heading hierarchy parsing → Paragraph semantic segmentation → Token calculation chunking → Overlap strategy optimization → Vectorization preparation
+       ↓                ↓              ↓            ↓           ↓            ↓
+   Unified format      #/##/###      Semantic boundary  Size control  Information continuity  Embedding vector
+   Clear structure     Hierarchy recognition  Integrity guarantee  Retrieval optimization  Context preservation  Similarity matching
+```
+
+Since all documents have been converted to Markdown format, the system can use Markdown's heading structure (#, ##, ###, etc.) for precise semantic segmentation:
+
+```python
+from typing import Dict, List
+
+
+def _split_paragraphs_with_headings(text: str) -> List[Dict]:
+    """Split paragraphs based on heading hierarchy, maintaining semantic integrity"""
+    lines = text.splitlines()
+    heading_stack: List[str] = []
+    paragraphs: List[Dict] = []
+    buf: List[str] = []
+    char_pos = 0
+
+    def flush_buf(end_pos: int):
+        if not buf:
+            return
+        content = "\n".join(buf).strip()
+        if not content:
+            return
+        paragraphs.append({
+            "content": content,
+            "heading_path": " > ".join(heading_stack) if heading_stack else None,
+            "start": max(0, end_pos - len(content)),
+            "end": end_pos,
+        })
+
+    for ln in lines:
+        raw = ln
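+        if raw.strip().startswith("#"):
+            # Process heading line: close the pending paragraph and reset the buffer,
+            # otherwise its lines would be repeated in the next paragraph
+            flush_buf(char_pos)
+            buf = []
+            stripped = raw.strip()
+            level = len(stripped) - len(stripped.lstrip('#'))
+            title = stripped.lstrip('#').strip()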
+
+            if level <= 0:
+                level = 1
+            if level <= len(heading_stack):
+                heading_stack = heading_stack[:level-1]
+            heading_stack.append(title)
+
+            char_pos += len(raw) + 1
+            continue
+
+        # Accumulate paragraph content
+        if raw.strip() == "":
+            flush_buf(char_pos)
+            buf = []
+        else:
+            buf.append(raw)
+        char_pos += len(raw) + 1
+
+    flush_buf(char_pos)
+
+    if not paragraphs:
+        paragraphs = [{"content": text, "heading_path": None, "start": 0, "end": len(text)}]
+
+    return paragraphs
+```
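+
The heading stack is the part of this function that is easiest to get wrong, so here is a reduced sketch that keeps only that logic and drops the character offsets. The function name `heading_paths` and the sample document are hypothetical; like the function above, it treats any line starting with `#` as a heading, so `#` comments inside fenced code blocks would be misread.

```python
from typing import List, Tuple


def heading_paths(markdown: str) -> List[Tuple[str, str]]:
    """Return (heading_path, paragraph) pairs using a stack of open headings."""
    stack: List[str] = []
    buf: List[str] = []
    out: List[Tuple[str, str]] = []

    def flush() -> None:
        # Emit the buffered paragraph under the current heading path, then reset
        if buf:
            out.append((" > ".join(stack), "\n".join(buf)))
            buf.clear()

    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()  # close the paragraph that belongs to the previous heading
            level = len(stripped) - len(stripped.lstrip("#"))
            del stack[level - 1:]  # pop headings at the same or a deeper level
            stack.append(stripped.lstrip("#").strip())
        elif stripped:
            buf.append(stripped)
        else:
            flush()  # a blank line ends the paragraph
    flush()
    return out


doc = """# Guide
Intro text.

## Install
Run the installer.

### Linux
Use the package manager.

## Usage
Call the API."""

for path, para in heading_paths(doc):
    print(f"[{path}] {para}")
```

The last paragraph is attached to `Guide > Usage`, not `Guide > Install > Linux > Usage`: when a level-2 heading arrives, everything at level 2 and deeper is popped before the new title is pushed. Storing this path with each chunk lets a retrieved fragment be shown together with the section it came from.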
+
+Based on Markdown paragraph segmentation, the system further performs intelligent chunking based on token count. Since the input is already structured Markdown text, the system can more precisely control chunk boundaries, ensuring that each chunk is both suitable for vectorization processing and maintains the integrity of the Markdown structure:
+
+```python
+def _chunk_paragraphs(paragraphs: List[Dict], chunk_tokens: int, overlap_tokens: int) -> List[Dict]:
+    """Intelligent chunking based on token count"""
+    chunks: List[Dict] = []
+    cur: List[Dict] = []
+    cur_tokens = 0
+    i = 0
+
+    while i < len(paragraphs):
+        p = paragraphs[i]
+        p_tokens = _approx_token_len(p["content"]) or 1
+
+        if cur_tokens + p_tokens <= chunk_tokens or not cur:
+            cur.append(p)
+            cur_tokens += p_tokens
+            i += 1
+        else:
+            # Generate current chunk
+            content = "\n\n".join(x["content"] for x in cur)
+            start = cur[0]["start"]
+            end = cur[-1]["end"]
+            heading_path = next((x["heading_path"] for x in reversed(cur) if x.get("heading_path")), None)
+
+            chunks.append({
+                "content": content,
+                "start": start,
+                "end": end,
+                "heading_path": heading_path,
+            })
+
+            # Build overlap section
+            if overlap_tokens > 0 and cur:
+                kept: List[Dict] = []
+                kept_tokens = 0
+                for x in reversed(cur):
+                    t = _approx_token_len(x["content"]) or 1
+                    if kept_tokens + t > overlap_tokens:
+                        break
+                    kept.append(x)
+                    kept_tokens += t
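+                # If the overlap would keep every paragraph of the chunk, drop it;
+                # otherwise the same chunk is emitted again and the loop never advances
+                if len(kept) == len(cur):
+                    kept, kept_tokens = [], 0
+                cur = list(reversed(kept))
+                cur_tokens = kept_tokens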
+            else:
+                cur = []
+                cur_tokens = 0
+
+    # Process last chunk
+    if cur:
+        content = "\n\n".join(x["content"] for x in cur)
+        start = cur[0]["start"]
+        end = cur[-1]["end"]
+        heading_path = next((x["heading_path"] for x in reversed(cur) if x.get("heading_path")), None)
+
+        chunks.append({
+            "content": content,
+            "start": start,
+            "end": end,
+            "heading_path": heading_path,
+        })
+
+    return chunks
+```
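+
A small trace makes the overlap behavior concrete. The sketch below is a simplified variant, not the function above: it works on plain strings, counts whitespace-separated words as tokens, and drops the overlap whenever it would leave no room for the next paragraph. The name `chunk_with_overlap` and the sample paragraphs are hypothetical.

```python
from typing import List


def n_tokens(paragraph: str) -> int:
    """Crude token count: whitespace-separated words, at least 1."""
    return max(1, len(paragraph.split()))


def chunk_with_overlap(paragraphs: List[str], chunk_tokens: int, overlap_tokens: int) -> List[List[str]]:
    """Greedy packing under a token budget; trailing paragraphs are repeated as overlap."""
    chunks: List[List[str]] = []
    cur: List[str] = []
    cur_tokens = 0
    for p in paragraphs:
        t = n_tokens(p)
        if cur and cur_tokens + t > chunk_tokens:
            chunks.append(cur)
            # Carry over trailing paragraphs that fit within the overlap budget
            kept: List[str] = []
            kept_tokens = 0
            for x in reversed(cur):
                if kept_tokens + n_tokens(x) > overlap_tokens:
                    break
                kept.insert(0, x)
                kept_tokens += n_tokens(x)
            # Drop the overlap if it would not leave room for the next paragraph
            if kept_tokens + t > chunk_tokens:
                kept, kept_tokens = [], 0
            cur, cur_tokens = kept, kept_tokens
        cur.append(p)
        cur_tokens += t
    if cur:
        chunks.append(cur)
    return chunks


paragraphs = ["a b c", "d e", "f g h", "i j"]  # 3, 2, 3, 2 tokens
for chunk in chunk_with_overlap(paragraphs, chunk_tokens=6, overlap_tokens=2):
    print(chunk)
```

The second chunk starts with `"d e"`, repeated from the end of the first chunk, because it fits in the 2-token overlap budget. The third chunk has no overlap because the preceding paragraph `"f g h"` is larger than that budget. Overlap is therefore paragraph-granular: a paragraph is either repeated whole or not at all.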
+
+At the same time, to be compatible with different languages, the system implements a token estimation algorithm for Chinese-English mixed text, which is crucial for accurately controlling chunk size:
+
+```python
+def _approx_token_len(text: str) -> int:
+    """Approximate token length estimation, supports Chinese-English mixed text"""
+    # CJK characters counted as 1 token each
+    cjk = sum(1 for ch in text if _is_cjk(ch))
+    # Whitespace-separated tokens; a run of CJK text also counts once here, so CJK-heavy text is slightly overestimated
+    non_cjk_tokens = len([t for t in text.split() if t])
+    return cjk + non_cjk_tokens
+
+def _is_cjk(ch: str) -> bool:
+    """Determine if character is CJK"""
+    code = ord(ch)
+    return (
+        0x4E00 <= code <= 0x9FFF or  # CJK Unified Ideographs
+        0x3400 <= code <= 0x4DBF or  # CJK Extension A
+        0x20000 <= code <= 0x2A6DF or # CJK Extension B
+        0x2A700 <= code <= 0x2B73F or # CJK Extension C
+        0x2B740 <= code <= 0x2B81F or # CJK Extension D
+        0x2B820 <= code <= 0x2CEAF or # CJK Extension E
+        0xF900 <= code <= 0xFAFF      # CJK Compatibility Ideographs
+    )
+```
+
+(3) Unified Embedding and Vector Storage
+
+The embedding model is the core of the RAG system. It is responsible for converting text into high-dimensional vectors, enabling computers to understand and compare the semantic similarity of text. The retrieval capability of the RAG system largely depends on the quality of the embedding model and the efficiency of vector storage. HelloAgents implements a unified embedding interface. For demonstration purposes, we use the Bailian API here. If it is not yet configured, you can switch to the local `all-MiniLM-L6-v2` model. If neither option is available, the TF-IDF algorithm is configured as a fallback. In actual use, you can replace it with your preferred model or API, or extend the framework yourself.
+
+```python
+def index_chunks(
+    store = None,
+    chunks: List[Dict] = None,
+    cache_db: Optional[str] = None,
+    batch_size: int = 64,
+    rag_namespace: str = "default"
+) -> None:
+    """
+    Index markdown chunks with unified embedding and Qdrant storage.
+    Uses Bailian API with fallback to sentence-transformers.
+    """
+    if not chunks:
+        print("[RAG] No chunks to index")
+        return
+
+    # Use unified embedding model
+    embedder = get_text_embedder()
+    dimension = get_dimension(384)
+
+    # Create default Qdrant storage
+    if store is None:
+        store = _create_default_vector_store(dimension)
+        print(f"[RAG] Created default Qdrant store with dimension {dimension}")
+
+    # Preprocess Markdown text for better embedding quality
+    processed_texts = []
+    for c in chunks:
+        raw_content = c["content"]
+        processed_content = _preprocess_markdown_for_embedding(raw_content)
+        processed_texts.append(processed_content)
+
+    print(f"[RAG] Embedding start: total_texts={len(processed_texts)} batch_size={batch_size}")
+
+    # Batch encoding
+    vecs: List[List[float]] = []
+    for i in range(0, len(processed_texts), batch_size):
+        part = processed_texts[i:i+batch_size]
+        try:
+            # Use unified embedder (handles caching internally)
+            part_vecs = embedder.encode(part)
+
+            # Standardize to List[List[float]] format
+            if not isinstance(part_vecs, list):
+                if hasattr(part_vecs, "tolist"):
+                    part_vecs = [part_vecs.tolist()]
+                else:
+                    part_vecs = [list(part_vecs)]
+
+            # Process vector format and dimension
+            for v in part_vecs:
+                try:
+                    if hasattr(v, "tolist"):
+                        v = v.tolist()
+                    v_norm = [float(x) for x in v]
+
+                    # Dimension check and adjustment
+                    if len(v_norm) != dimension:
+                        print(f"[WARNING] Vector dimension anomaly: expected {dimension}, actual {len(v_norm)}")
+                        if len(v_norm) < dimension:
+                            v_norm.extend([0.0] * (dimension - len(v_norm)))
+                        else:
+                            v_norm = v_norm[:dimension]
+
+                    vecs.append(v_norm)
+                except Exception as e:
+                    print(f"[WARNING] Vector conversion failed: {e}, using zero vector")
+                    vecs.append([0.0] * dimension)
+
+        except Exception as e:
+            print(f"[WARNING] Batch {i} encoding failed: {e}")
+            # Implement retry mechanism
+            # ... retry logic ...
+
+        print(f"[RAG] Embedding progress: {min(i+batch_size, len(processed_texts))}/{len(processed_texts)}")
+```
+
+### 8.3.5 Advanced Retrieval Strategies
+
+Retrieval capability is the core strength of a RAG system. In practical applications, user queries are often worded differently from the content of the documents, so relevant documents may not be retrieved. To solve this problem, HelloAgents implements three complementary advanced retrieval strategies: Multi-Query Expansion (MQE), Hypothetical Document Embeddings (HyDE), and a unified extended retrieval framework.
+
+(1) Multi-Query Expansion (MQE)
+
+Multi-Query Expansion (MQE) is a technique that improves retrieval recall by generating semantically equivalent diverse queries. The core insight of this method is: the same question can have multiple different expressions, and different expressions may match different relevant documents. For example, "how to learn Python" can be expanded to "Python beginner tutorial", "Python learning methods", "Python programming guide", and other queries. By executing these expanded queries in parallel and merging the results, the system can cover a wider range of relevant documents, avoiding missing important information due to wording differences.
+
+The advantage of MQE is that it can cover multiple possible meanings of a user query, which is especially effective for ambiguous queries or queries that use professional terminology. The system uses an LLM to generate the expanded queries, ensuring that the expansions are both diverse and semantically relevant:
+
+```python
+def _prompt_mqe(query: str, n: int) -> List[str]:
+    """Use LLM to generate diverse query expansions"""
+    try:
+        from ...core.llm import HelloAgentsLLM
+        llm = HelloAgentsLLM()
+        prompt = [
+            {"role": "system", "content": "You are a retrieval query expansion assistant. Generate semantically equivalent or complementary diverse queries. Use Chinese, keep it short, avoid punctuation."},
+            {"role": "user", "content": f"Original query: {query}\nPlease provide {n} differently phrased queries, one per line."}
+        ]
+        text = llm.invoke(prompt)
+        lines = [ln.strip("- \t") for ln in (text or "").splitlines()]
+        outs = [ln for ln in lines if ln]
+        return outs[:n] or [query]
+    except Exception:
+        return [query]
+```
+
+(2) Hypothetical Document Embeddings (HyDE)
+
+Hypothetical Document Embeddings (HyDE) is an innovative retrieval technique. Its core idea is "use answers to find answers". Traditional retrieval methods use questions to match documents, but there is often a difference in the distribution of questions and answers in semantic space—questions are usually interrogative sentences, while document content is declarative sentences. HyDE has the LLM first generate a hypothetical answer paragraph, then uses this answer paragraph to retrieve real documents, thereby narrowing the semantic gap between queries and documents.
+
+The advantage of this method is that hypothetical answers are closer to real answers in semantic space, thus enabling more accurate matching to relevant documents. Even if the content of the hypothetical answer is not completely correct, the key terms, concepts, and expression styles it contains can effectively guide the retrieval system to find the correct documents. Especially for professional domain queries, HyDE can generate hypothetical documents containing domain terminology, significantly improving retrieval accuracy:
+
+```python
+def _prompt_hyde(query: str) -> Optional[str]:
+    """Generate hypothetical document to improve retrieval"""
+    try:
+        from ...core.llm import HelloAgentsLLM
+        llm = HelloAgentsLLM()
+        prompt = [
+            {"role": "system", "content": "Based on the user's question, first write a possible answer paragraph for use as a query document in vector retrieval (no analysis process)."},
+            {"role": "user", "content": f"Question: {query}\nPlease directly write a medium-length, objective paragraph containing key terminology."}
+        ]
+        return llm.invoke(prompt)
+    except Exception:
+        return None
+```
+
+(3) Extended Retrieval Framework
+
+HelloAgents integrates the two strategies of MQE and HyDE into a unified extended retrieval framework. The system allows users to choose which strategies to enable based on specific scenarios through the `enable_mqe` and `enable_hyde` parameters: for scenarios requiring high recall, both strategies can be enabled simultaneously; for performance-sensitive scenarios, only basic retrieval can be used.
+
+The core mechanism of extended retrieval is a three-step "expand-retrieve-merge" workflow. First, the system generates multiple expanded queries based on the original query (including diverse queries generated by MQE and hypothetical documents generated by HyDE); then, it runs vector retrieval for each expanded query to build a candidate document pool; finally, it merges all results through deduplication and score sorting, returning the most relevant top-k documents. The design enlarges the candidate pool through the `candidate_pool_multiplier` parameter (default 4) so that enough candidates are available for screening, and it avoids returning duplicate content by keeping only the highest-scoring hit for each memory ID.
+
+```python
+def search_vectors_expanded(
+    store = None,
+    query: str = "",
+    top_k: int = 8,
+    rag_namespace: Optional[str] = None,
+    only_rag_data: bool = True,
+    score_threshold: Optional[float] = None,
+    enable_mqe: bool = False,
+    mqe_expansions: int = 2,
+    enable_hyde: bool = False,
+    candidate_pool_multiplier: int = 4,
+) -> List[Dict]:
+    """
+    Search with query expansion using unified embedding and Qdrant.
+    """
+    if not query:
+        return []
+
+    # Create default storage
+    if store is None:
+        store = _create_default_vector_store()
+
+    # Query expansion
+    expansions: List[str] = [query]
+
+    if enable_mqe and mqe_expansions > 0:
+        expansions.extend(_prompt_mqe(query, mqe_expansions))
+    if enable_hyde:
+        hyde_text = _prompt_hyde(query)
+        if hyde_text:
+            expansions.append(hyde_text)
+
+    # Deduplication and trimming
+    uniq: List[str] = []
+    for e in expansions:
+        if e and e not in uniq:
+            uniq.append(e)
+    expansions = uniq[: max(1, len(uniq))]
+
+    # Allocate candidate pool
+    pool = max(top_k * candidate_pool_multiplier, 20)
+    per = max(1, pool // max(1, len(expansions)))
+
+    # Build RAG data filter
+    where = {"memory_type": "rag_chunk"}
+    if only_rag_data:
+        where["is_rag_data"] = True
+        where["data_source"] = "rag_pipeline"
+    if rag_namespace:
+        where["rag_namespace"] = rag_namespace
+
+    # Collect results from all expanded queries
+    agg: Dict[str, Dict] = {}
+    for q in expansions:
+        qv = embed_query(q)
+        hits = store.search_similar(
+            query_vector=qv,
+            limit=per,
+            score_threshold=score_threshold,
+            where=where
+        )
+        for h in hits:
+            mid = h.get("metadata", {}).get("memory_id", h.get("id"))
+            s = float(h.get("score", 0.0))
+            if mid not in agg or s > float(agg[mid].get("score", 0.0)):
+                agg[mid] = h
+
+    # Sort by score and return
+    merged = list(agg.values())
+    merged.sort(key=lambda x: float(x.get("score", 0.0)), reverse=True)
+    return merged[:top_k]
+```
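
To make the pool allocation and the merge step concrete, the following is a minimal, self-contained sketch that isolates those two pieces from the function above. `allocate_per_query` and `merge_hits` are hypothetical helper names introduced only for illustration, and the hit dictionaries are hand-written stand-ins for what a vector store might return.

```python
from typing import Dict, List


def allocate_per_query(top_k: int, multiplier: int, n_expansions: int) -> int:
    """Mirror the pool arithmetic above: at least 20 candidates in total, split evenly."""
    pool = max(top_k * multiplier, 20)
    return max(1, pool // max(1, n_expansions))


def merge_hits(hit_lists: List[List[Dict]], top_k: int) -> List[Dict]:
    """Keep the best-scoring hit per document id, then return the top_k by score."""
    best: Dict[str, Dict] = {}
    for hits in hit_lists:
        for hit in hits:
            doc_id = hit["id"]
            if doc_id not in best or hit["score"] > best[doc_id]["score"]:
                best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]


# Hand-written stand-in results for two expanded queries (illustrative only).
hits_original = [{"id": "a", "score": 0.72}, {"id": "b", "score": 0.65}]
hits_mqe = [{"id": "a", "score": 0.81}, {"id": "c", "score": 0.40}]

per_query = allocate_per_query(top_k=8, multiplier=4, n_expansions=4)
merged = merge_hits([hits_original, hits_mqe], top_k=2)
print(per_query)                   # 8
print([h["id"] for h in merged])   # ['a', 'b']
```

Document `a` appears in both result lists, but only its higher-scoring hit survives the merge, which is the behavior the `agg` dictionary implements in the original function.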
+
+In practice, combining these strategies works best. MQE addresses variation in wording, HyDE addresses the semantic gap between queries and documents, and the unified framework ensures result quality and diversity. For general queries, enable MQE; for specialized-domain queries, enable both MQE and HyDE; for performance-sensitive scenarios, use basic retrieval alone or MQE only.
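
The recommendations above can be written down as a small lookup table. This is only a sketch: the scenario names and the `expansion_flags` helper are assumptions made for illustration, while the flag names `enable_mqe` and `enable_hyde` match the parameters of `search_vectors_expanded`.

```python
from typing import Dict

# Hypothetical scenario names; only the flag names come from search_vectors_expanded.
EXPANSION_PRESETS: Dict[str, Dict[str, bool]] = {
    "general": {"enable_mqe": True, "enable_hyde": False},
    "professional": {"enable_mqe": True, "enable_hyde": True},
    "performance_sensitive": {"enable_mqe": False, "enable_hyde": False},
}


def expansion_flags(scenario: str) -> Dict[str, bool]:
    """Return expansion flags for a scenario, falling back to basic retrieval."""
    preset = EXPANSION_PRESETS.get(scenario, EXPANSION_PRESETS["performance_sensitive"])
    return dict(preset)  # copy so callers cannot mutate the preset table


flags = expansion_flags("professional")
print(flags)
```

The returned dictionary could then be unpacked into the call as keyword arguments, for example `search_vectors_expanded(query=..., **flags)`.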
+
+Of course, there are many other interesting methods; this section is only a brief introduction to retrieval extensions. In real usage scenarios, you will need to experiment to find the approach that fits your problem.
+
+## 8.4 Building an Intelligent Document Q&A Assistant
+
+In the previous sections, we detailed the design and implementation of HelloAgents' memory system and RAG system. Now, let's demonstrate through a complete practical case how to organically combine these two systems to build an intelligent document Q&A assistant.
+
+### 8.4.1 Case Background and Objectives
+
+In actual work, we often need to process a large number of technical documents, research papers, product manuals, and other PDF files. Traditional document reading methods are inefficient, making it difficult to quickly locate key information, let alone establish associations between knowledge.
+
+This case uses the public beta PDF `Happy-LLM-0727.pdf` from Happy-LLM, another hands-on large model tutorial by Datawhale, as an example. We build a **Gradio-based Web application** that demonstrates how to use RAGTool and MemoryTool to build a complete interactive learning assistant. The PDF can be obtained from this [link](https://github.com/datawhalechina/happy-llm/releases/download/v1.0.1/Happy-LLM-0727.pdf).
+
+We hope to implement the following functions:
+
+1. **Intelligent Document Processing**: Use MarkItDown to achieve unified conversion from PDF to Markdown, intelligent chunking strategy based on Markdown structure, efficient vectorization and index construction
+
+2. **Advanced Retrieval Q&A**: Multi-Query Expansion (MQE) to improve recall, Hypothetical Document Embeddings (HyDE) to improve retrieval accuracy, context-aware intelligent Q&A
+
+3. **Multi-level Memory Management**: Working memory manages current learning tasks and context, episodic memory records learning events and query history, semantic memory stores conceptual knowledge and understanding, perceptual memory processes document features and multimodal information
+
+4. **Personalized Learning Support**: Personalized recommendations based on learning history, memory consolidation and selective forgetting, learning report generation and progress tracking
+
+To more clearly demonstrate the workflow of the entire system, Figure 8.6 shows the relationships and data flow between the five steps. The five steps form a complete closed loop: Step 1 records information from processed PDF documents to the memory system, Step 2's retrieval results are also recorded to the memory system, Step 3 demonstrates the complete functions of the memory system (add, retrieve, consolidate, forget), Step 4 integrates RAG and Memory to provide intelligent routing, and Step 5 collects all statistical information to generate learning reports.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-6.png" alt="" width="85%"/>
+  <p>Figure 8.6 Five-step Execution Workflow of Intelligent Q&A Assistant</p>
+</div>
+
+Next, we will demonstrate how to implement this Web application. The entire application is divided into three core parts:
+
+1. **Core Assistant Class (PDFLearningAssistant)**: Encapsulates the calling logic of RAGTool and MemoryTool
+2. **Gradio Web Interface**: Provides a friendly user interface; refer to the example code for this part
+3. **Other Core Functions**: Note recording, learning review, statistics viewing, and report generation
+
+### 8.4.2 Implementation of Core Assistant Class
+
+First, we implement the core assistant class `PDFLearningAssistant`, which encapsulates the calling logic of RAGTool and MemoryTool.
+
+(1) Class Initialization
+
+```python
+class PDFLearningAssistant:
+    """Intelligent document Q&A assistant"""
+
+    def __init__(self, user_id: str = "default_user"):
+        """Initialize learning assistant
+
+        Args:
+            user_id: User ID, used to isolate data for different users
+        """
+        self.user_id = user_id
+        self.session_id = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+        # Initialize tools
+        self.memory_tool = MemoryTool(user_id=user_id)
+        self.rag_tool = RAGTool(rag_namespace=f"pdf_{user_id}")
+
+        # Learning statistics
+        self.stats = {
+            "session_start": datetime.now(),
+            "documents_loaded": 0,
+            "questions_asked": 0,
+            "concepts_learned": 0
+        }
+
+        # Currently loaded document
+        self.current_document = None
+```
+
+In this initialization process, we made several key design decisions:
+
+**MemoryTool Initialization**: Implements user-level memory isolation through the `user_id` parameter. Learning memories of different users are completely independent, and each user has their own working memory, episodic memory, semantic memory, and perceptual memory space.
+
+**RAGTool Initialization**: Implements knowledge base namespace isolation through the `rag_namespace` parameter. Using `f"pdf_{user_id}"` as the namespace, each user has their own independent PDF knowledge base.
+
+**Session Management**: `session_id` is used to track the complete process of a single learning session, facilitating subsequent learning journey review and analysis.
+
+**Statistical Information**: The `stats` dictionary records key learning metrics for generating learning reports.
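
The naming scheme behind these decisions can be checked in isolation. The sketch below reproduces only the two format strings from `__init__`; the `make_identifiers` helper is a hypothetical name introduced for illustration, and it takes the timestamp as an argument so that the output is deterministic.

```python
from datetime import datetime
from typing import Dict


def make_identifiers(user_id: str, now: datetime) -> Dict[str, str]:
    """Build the per-user RAG namespace and the per-session id used above."""
    return {
        "rag_namespace": f"pdf_{user_id}",
        "session_id": f"session_{now.strftime('%Y%m%d_%H%M%S')}",
    }


started = datetime(2025, 1, 2, 9, 30, 0)
ids_alice = make_identifiers("alice", started)
ids_bob = make_identifiers("bob", started)

print(ids_alice["rag_namespace"])  # pdf_alice
print(ids_alice["session_id"])     # session_20250102_093000
print(ids_alice["rag_namespace"] != ids_bob["rag_namespace"])  # True
```

Note that the session id has one-second resolution, so two sessions started within the same second share the same id; what separates different users is `user_id` and the namespace, not the session id.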
+
+(2) Loading PDF Documents
+
+```python
+def load_document(self, pdf_path: str) -> Dict[str, Any]:
+    """Load PDF document into knowledge base
+
+    Args:
+        pdf_path: PDF file path
+
+    Returns:
+        Dict: Result containing success and message
+    """
+    if not os.path.exists(pdf_path):
+        return {"success": False, "message": f"File does not exist: {pdf_path}"}
+
+    start_time = time.time()
+
+    # [RAGTool] Process PDF: MarkItDown conversion → Intelligent chunking → Vectorization
+    result = self.rag_tool.execute(
+        "add_document",
+        file_path=pdf_path,
+        chunk_size=1000,
+        chunk_overlap=200
+    )
+
+    process_time = time.time() - start_time
+
+    if result.get("success", False):
+        self.current_document = os.path.basename(pdf_path)
+        self.stats["documents_loaded"] += 1
+
+        # [MemoryTool] Record to learning memory
+        self.memory_tool.execute(
+            "add",
+            content=f"Loaded document 《{self.current_document}》",
+            memory_type="episodic",
+            importance=0.9,
+            event_type="document_loaded",
+            session_id=self.session_id
+        )
+
+        return {
+            "success": True,
+            "message": f"Loading successful! (Time: {process_time:.1f}s)",
+            "document": self.current_document
+        }
+    else:
+        return {
+            "success": False,
+            "message": f"Loading failed: {result.get('error', 'Unknown error')}"
+        }
+```
+
+We can complete PDF processing with a single call:
+
+```python
+result = self.rag_tool.execute(
+    "add_document",
+    file_path=pdf_path,
+    chunk_size=1000,
+    chunk_overlap=200
+)
+```
+
+This call triggers the complete processing workflow of RAGTool (MarkItDown conversion, enhanced processing, intelligent chunking, vectorization storage). These internal details have been introduced in detail in Section 8.3. We only need to focus on:
+
+- **Operation Type**: `"add_document"` - Add document to knowledge base
+- **File Path**: `file_path` - Path to the PDF file
+- **Chunking Parameters**: `chunk_size=1000, chunk_overlap=200` - Control text chunking
+- **Return Result**: Dictionary containing processing status and statistical information
+
+After the document is successfully loaded, we use MemoryTool to record it to episodic memory:
+
+```python
+self.memory_tool.execute(
+    "add",
+    content=f"Loaded document 《{self.current_document}》",
+    memory_type="episodic",
+    importance=0.9,
+    event_type="document_loaded",
+    session_id=self.session_id
+)
+```
+
+**Why use episodic memory?** Loading a document is a specific, timestamped event, which is exactly what episodic memory is meant to record. The `session_id` parameter associates this event with the current learning session, making it easy to review the learning journey later.
+
+This memory record lays the foundation for subsequent personalized services:
+
+- User asks "What documents have I loaded before?" → Retrieve from episodic memory
+- System can track user's learning journey and document usage
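
As a rough illustration of why tagged, timestamped events make such questions easy to answer, the sketch below uses a plain Python list as a stand-in for episodic memory. It is not how MemoryTool stores data internally; `EpisodicEvent` and `events_of_type` are hypothetical names, and only the `event_type` and `session_id` fields mirror the call above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class EpisodicEvent:
    """One timestamped event, tagged the same way as the memory_tool.execute call."""
    content: str
    event_type: str
    session_id: str
    timestamp: datetime = field(default_factory=datetime.now)


def events_of_type(events: List[EpisodicEvent], event_type: str) -> List[str]:
    """Return the contents of all events of one type, oldest first."""
    matching = [e for e in events if e.event_type == event_type]
    return [e.content for e in sorted(matching, key=lambda e: e.timestamp)]


log = [
    EpisodicEvent("Learning about 'What is a Transformer?'", "qa_interaction", "s1",
                  datetime(2025, 1, 2, 9, 5)),
    EpisodicEvent("Loaded document Happy-LLM-0727.pdf", "document_loaded", "s1",
                  datetime(2025, 1, 2, 9, 0)),
]

print(events_of_type(log, "document_loaded"))
```

Filtering by `event_type` answers "What documents have I loaded before?", while sorting by timestamp reconstructs the order of the learning journey.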
+
+### 8.4.3 Intelligent Q&A Function
+
+After the document is loaded, users can ask questions about the document. We implement an `ask` method to handle user questions:
+
+```python
+def ask(self, question: str, use_advanced_search: bool = True) -> str:
+    """Ask questions about the document
+
+    Args:
+        question: User question
+        use_advanced_search: Whether to use advanced retrieval (MQE + HyDE)
+
+    Returns:
+        str: Answer
+    """
+    if not self.current_document:
+        return "⚠️ Please load a document first!"
+
+    # [MemoryTool] Record question to working memory
+    self.memory_tool.execute(
+        "add",
+        content=f"Question: {question}",
+        memory_type="working",
+        importance=0.6,
+        session_id=self.session_id
+    )
+
+    # [RAGTool] Use advanced retrieval to get answer
+    answer = self.rag_tool.execute(
+        "ask",
+        question=question,
+        limit=5,
+        enable_advanced_search=use_advanced_search,
+        enable_mqe=use_advanced_search,
+        enable_hyde=use_advanced_search
+    )
+
+    # [MemoryTool] Record to episodic memory
+    self.memory_tool.execute(
+        "add",
+        content=f"Learning about '{question}'",
+        memory_type="episodic",
+        importance=0.7,
+        event_type="qa_interaction",
+        session_id=self.session_id
+    )
+
+    self.stats["questions_asked"] += 1
+
+    return answer
+```
+
+When we call `self.rag_tool.execute("ask", ...)`, RAGTool internally executes the following advanced retrieval workflow:
+
+1. **Multi-Query Expansion (MQE)**:
+
+   ```python
+   # Generate diverse queries
+   expanded_queries = self._generate_multi_queries(question)
+   # For example, for "What is a large language model?", it might generate:
+   # - "What is the definition of a large language model?"
+   # - "Please explain large language models"
+   # - "What does LLM mean?"
+   ```
+
+   MQE uses an LLM to generate semantically equivalent but differently worded queries, capturing the user's intent from multiple angles, which is reported to improve recall by 30%-50%.
+
+2. **Hypothetical Document Embeddings (HyDE)**:
+
+   - Generate hypothetical answer documents, bridging the semantic gap between queries and documents
+   - Use vectors of hypothetical answers for retrieval
+
+The internal implementation of these advanced retrieval techniques has been introduced in detail in Section 8.3.5.
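
To complement the MQE snippet, here is a minimal sketch of the HyDE idea: embed a hypothetical answer instead of the raw question, then retrieve with that vector. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `stub_llm` returns a fixed string instead of calling an LLM, and `hyde_search` is not a RAGTool method.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(question: str, docs: List[str], generate_answer: Callable[[str], str]) -> str:
    """Retrieve with the vector of a hypothetical answer rather than the question."""
    hypothetical = generate_answer(question)
    query_vec = embed(hypothetical)
    return max(docs, key=lambda d: cosine(query_vec, embed(d)))


def stub_llm(question: str) -> str:
    """Hypothetical LLM call; returns a fixed answer so the example is deterministic."""
    return "An LLM is a large language model, a neural network trained on text."


docs = [
    "A large language model is a neural network trained on massive text to predict the next token.",
    "Gradio builds web interfaces for machine learning demos.",
]
question = "What is an LLM?"

best = hyde_search(question, docs, stub_llm)
sim_question = cosine(embed(question), embed(docs[0]))
sim_hypothetical = cosine(embed(stub_llm(question)), embed(docs[0]))
print(best == docs[0])                  # True
print(sim_hypothetical > sim_question)  # True
```

The short question shares almost no vocabulary with the relevant passage, whereas the hypothetical answer is written in the same style as the document, which is the semantic gap HyDE is meant to bridge.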
+
+### 8.4.4 Other Core Functions
+
+In addition to loading documents and intelligent Q&A, we also need to implement functions such as note recording, learning review, statistics viewing, and report generation:
+
+```python
+def add_note(self, content: str, concept: Optional[str] = None):
+    """Add learning note"""
+    self.memory_tool.execute(
+        "add",
+        content=content,
+        memory_type="semantic",
+        importance=0.8,
+        concept=concept or "general",
+        session_id=self.session_id
+    )
+    self.stats["concepts_learned"] += 1
+
+def recall(self, query: str, limit: int = 5) -> str:
+    """Review learning journey"""
+    result = self.memory_tool.execute(
+        "search",
+        query=query,
+        limit=limit
+    )
+    return result
+
+def get_stats(self) -> Dict[str, Any]:
+    """Get learning statistics"""
+    duration = (datetime.now() - self.stats["session_start"]).total_seconds()
+    return {
+        "Session Duration": f"{duration:.0f}s",
+        "Documents Loaded": self.stats["documents_loaded"],
+        "Questions Asked": self.stats["questions_asked"],
+        "Learning Notes": self.stats["concepts_learned"],
+        "Current Document": self.current_document or "Not loaded"
+    }
+
+def generate_report(self, save_to_file: bool = True) -> Dict[str, Any]:
+    """Generate learning report"""
+    memory_summary = self.memory_tool.execute("summary", limit=10)
+    rag_stats = self.rag_tool.execute("stats")
+
+    duration = (datetime.now() - self.stats["session_start"]).total_seconds()
+    report = {
+        "session_info": {
+            "session_id": self.session_id,
+            "user_id": self.user_id,
+            "start_time": self.stats["session_start"].isoformat(),
+            "duration_seconds": duration
+        },
+        "learning_metrics": {
+            "documents_loaded": self.stats["documents_loaded"],
+            "questions_asked": self.stats["questions_asked"],
+            "concepts_learned": self.stats["concepts_learned"]
+        },
+        "memory_summary": memory_summary,
+        "rag_status": rag_stats
+    }
+
+    if save_to_file:
+        report_file = f"learning_report_{self.session_id}.json"
+        with open(report_file, 'w', encoding='utf-8') as f:
+            json.dump(report, f, ensure_ascii=False, indent=2, default=str)
+        report["report_file"] = report_file
+
+    return report
+```
+
+These methods respectively implement:
+
+- **add_note**: Save learning notes to semantic memory
+- **recall**: Retrieve learning journey from memory system
+- **get_stats**: Get statistical information of current session
+- **generate_report**: Generate detailed learning report and save as JSON file
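
One detail in `generate_report` is worth a closer look: `json.dump` is called with `default=str`. The standard `json` module cannot serialize objects such as `datetime` on its own, so `default=str` acts as a fallback that converts any unsupported value to its string form. The small sketch below shows the difference; the `report` dictionary here is a simplified stand-in, not the full report structure.

```python
import json
from datetime import datetime

# Simplified stand-in for a report that still contains a raw datetime object.
report = {"session_start": datetime(2025, 1, 2, 9, 30, 0), "questions_asked": 3}

try:
    json.dumps(report)
    serialized_without_default = True
except TypeError:
    # datetime is not JSON serializable by default
    serialized_without_default = False

text = json.dumps(report, default=str)
print(serialized_without_default)  # False
print(text)  # {"session_start": "2025-01-02 09:30:00", "questions_asked": 3}
```

In the report above, `start_time` is already converted with `isoformat()`, so the fallback presumably guards against non-serializable values inside `memory_summary` or `rag_status`.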
+
+### 8.4.5 Running Effect Demonstration
+
+The following demonstrates the running application. As shown in Figure 8.7, after opening the main page you first initialize the assistant, which loads the database, models, and API configuration. Then upload the PDF document and click to load it.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-7.png" alt="" width="85%"/>
+  <p>Figure 8.7 Q&A Assistant Main Page</p>
+</div>
+
+The first function is intelligent Q&A, which retrieves from the uploaded documents and returns the reference sources and similarity scores of the related passages. This demonstrates the capabilities of the RAG tool, as shown in Figure 8.8.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-8.png" alt="" width="85%"/>
+  <p>Figure 8.8 Intelligent Q&A Function of the Q&A Assistant</p>
+</div>
+
+The second function is learning notes. As shown in Figure 8.9, you can select a related concept and write the note content. This part uses MemoryTool and stores your personal notes in the database, which supports the statistics and the overall learning report generated later.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-9.png" alt="" width="85%"/>
+  <p>Figure 8.9 Learning Notes Function of the Q&A Assistant</p>
+</div>
+
+The last function covers learning statistics and report generation. As shown in Figure 8.10, we can see the number of documents loaded, questions asked, and notes taken while using the assistant. The Q&A results and notes are then organized into a JSON document and returned.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-10.png" alt="" width="85%"/>
+  <p>Figure 8.10 Learning Statistics and Report Generation of the Q&A Assistant</p>
+</div>
+
+Through this Q&A assistant case, we demonstrated how to use RAGTool and MemoryTool to build a complete **Web-based intelligent document Q&A system**. The complete code can be found in `code/chapter8/11_Q&A_Assistant.py`. After starting, visit `http://localhost:7860` to use this intelligent learning assistant.
+
+Readers are advised to run this case personally, experience the capabilities of RAG and Memory, and expand and customize on this basis to build intelligent applications that meet their own needs!
+
+## 8.5 Chapter Summary and Outlook
+
+In this chapter, we successfully added two core capabilities to the HelloAgents framework: the memory system and the RAG system.
+
+For readers who wish to deeply learn and apply the content of this chapter, we provide the following suggestions:
+
+1. From zero to one, design a basic memory module by hand and gradually iterate to add more complex features.
+
+2. Try and evaluate different embedding models and retrieval strategies in projects to find the optimal solution for specific tasks.
+
+3. Apply the learned memory and RAG systems to a real personal project, testing and improving capabilities in practice.
+
+**Advanced Exploration**
+
+1. Track and study cutting-edge memory and RAG repositories, learning excellent implementations.
+2. Explore the possibility of applying RAG architecture to multimodal (text + image) or cross-modal scenarios.
+3. Participate in the HelloAgents open-source project, contributing your ideas and code.
+
+Through the study of this chapter, you have not only mastered the implementation technology of Memory and RAG systems, but more importantly, understood how to transform cognitive science theory into practical engineering solutions. This interdisciplinary way of thinking will lay a solid foundation for your further development in the AI field.
+
+Finally, let's summarize the complete knowledge system of this chapter through a mind map, as shown in Figure 8.11:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/8-figures/8-11.png" alt="" width="85%"/>
+  <p>Figure 8.11 Hello-agents Chapter 8 Knowledge Summary</p>
+</div>
+
+This chapter demonstrated the capabilities of the HelloAgents framework's memory system and RAG technology. We successfully built a truly "intelligent" learning assistant. This architecture can be easily extended to other application scenarios, such as customer service, technical support, personal assistants, and other fields.
+
+In the next chapter, we will continue to explore how to further improve the dialogue quality and user experience of agents through context engineering. Stay tuned!
+
+## Exercises
+
+> **Note**: Some exercises do not have standard answers. The focus is on cultivating learners' comprehensive understanding and practical ability of memory systems and RAG technology.
+
+1. This chapter introduced four memory types: working memory, episodic memory, semantic memory, and perceptual memory. Please analyze:
+
+   - In Section 8.2.5, each memory type has a unique scoring formula. Please compare the scoring mechanisms of episodic memory and semantic memory, and explain why episodic memory emphasizes "temporal recency" more (weight 0.2), while semantic memory emphasizes "graph retrieval" more (weight 0.3)?
+   - If you were to design a "personal health management assistant" (needs to record user's diet, exercise, sleep data, and provide health advice), how would you combine these four memory types? Please design specific application scenarios for each memory type.
+   - Working memory uses a TTL (Time To Live) mechanism to automatically clean expired data. Please think: under what circumstances should important working memories be "consolidated" into long-term memory? How to design an automatic consolidation trigger condition?
+
+2. In the RAG system in Section 8.3, we use MarkItDown to uniformly convert various format documents to Markdown. Please think deeply:
+
+   > **Note**: This is a hands-on practice question, actual operation is recommended
+
+   - The current intelligent chunking strategy is based on Markdown heading hierarchy (#, ##, ###) for segmentation. If processing documents without clear heading structure (such as novels, legal provisions), how should the chunking strategy be optimized? Please try to implement a chunking algorithm based on "semantic boundaries".
+   - Section 8.3.5 introduced two advanced retrieval strategies: MQE (Multi-Query Expansion) and HyDE (Hypothetical Document Embeddings). Please select a practical scenario (such as technical document Q&A, medical knowledge retrieval), compare the effect differences of basic retrieval, MQE, and HyDE, and analyze their respective applicable scenarios.
+   - The retrieval quality of the RAG system largely depends on the choice of embedding model. Please compare the three embedding solutions mentioned in this chapter (Bailian API, local Transformer, TF-IDF) from the dimensions of accuracy, speed, cost, offline deployment, etc., and provide selection recommendations.
+
+3. The "forgetting" mechanism of the memory system is an important design that simulates human cognition. Based on the MemoryTool in Section 8.2.3, please complete the following extended practice:
+
+   > **Note**: This is a hands-on practice question, actual operation is recommended
+
+   - Currently, three forgetting strategies are provided: importance-based, time-based, and capacity-based. Please design and implement an "intelligent forgetting" strategy that comprehensively considers importance, access frequency, time decay, and other factors, using weighted scoring to decide which memories should be forgotten.
+   - In long-running agent systems, the memory database may accumulate a large amount of data. Please design a "memory archiving" mechanism: transfer long-unused but potentially valuable memories to cold storage, and restore them when needed. How should this mechanism be integrated with the existing four memory types?
+   - Think: If the agent needs to "forget" certain sensitive information (such as user privacy data), is it sufficient to just delete it from the database? In the case of using vector databases and graph databases, how to ensure data is completely cleared?
+
+4. In the "Intelligent Learning Assistant" case in Section 8.4, we combined MemoryTool and RAGTool. Please analyze in depth:
+
+   - The `ask()` method in the case uses both RAG retrieval and memory retrieval. Please analyze: under what circumstances should RAG be prioritized? Under what circumstances should Memory be prioritized? How to design an "intelligent routing" mechanism to automatically select the most appropriate retrieval method?
+   - The current learning report (`generate_report()`) only contains statistical information. Please extend this function and design a more intelligent learning report generator: able to analyze user's learning trajectory, identify knowledge blind spots, and recommend next learning content. Which memory types and retrieval strategies are needed for this?
+   - Suppose you want to deploy this learning assistant as a multi-user Web service, where each user has independent memory and knowledge base. Please design a data isolation solution: how to implement user-level data isolation in Qdrant and Neo4j? How to optimize retrieval performance in multi-user scenarios?
+
+5. Semantic memory uses Neo4j graph database to store knowledge graphs. Please think:
+
+   - In the semantic memory implementation in Section 8.2.5, the system automatically extracts entities and relationships to build knowledge graphs. Please analyze: how accurate is this automatic extraction? Under what circumstances might incorrect entities or relationships be extracted? How to design a "knowledge graph quality assessment" mechanism?
+   - An important advantage of knowledge graphs is supporting complex relational reasoning. Please design a query scenario that fully utilizes Neo4j's graph query capabilities (such as multi-hop relationships, path finding) to accomplish tasks that pure vector retrieval cannot complete.
+   - Compare the "vector retrieval + graph retrieval" hybrid strategy of semantic memory with pure vector retrieval: in what types of queries can graph retrieval bring significant performance improvements? Please illustrate with specific examples.
+
+## References
+
+[1] Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In *Psychology of Learning and Motivation* (Vol. 2, pp. 89-195). Academic Press.
+

+ 4 - 0
docs/chapter8/第八章 记忆与检索.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter8-Memory-and-Retrieval.md">English</a> | 中文
+</div>
+
 # 第八章 记忆与检索
 
 在前面的章节中,我们构建了HelloAgents框架的基础架构,实现了多种智能体范式和工具系统。不过,我们的框架还缺少一个关键能力:<strong>记忆</strong>。如果智能体无法记住之前的交互内容,也无法从历史经验中学习,那么在连续对话或复杂任务中,其表现将受到极大限制。

+ 2816 - 0
docs/chapter9/Chapter9-Context-Engineering.md

@@ -0,0 +1,2816 @@
+<div align="right">
+  English | <a href="./第九章%20上下文工程.md">中文</a>
+</div>
+
+# Chapter 9 Context Engineering
+
+In previous chapters, we have introduced memory systems and RAG for agents. However, to enable agents to stably "think" and "act" in real complex scenarios, memory and retrieval alone are not enough—we need an engineering methodology to continuously and systematically construct appropriate "context" for the model. This is the theme of this chapter: Context Engineering. It focuses on "how to assemble and optimize input context in a reusable, measurable, and evolvable way before each model call", thereby improving correctness, robustness, and efficiency<sup>[1][2]</sup>.
+
+To enable readers to quickly experience the complete functionality of this chapter, we provide a directly installable Python package. You can install the version corresponding to this chapter with the following command:
+
+```bash
+pip install "hello-agents[all]==0.2.7"
+```
+
+This chapter mainly introduces the core concepts and practices of context engineering, and adds a context builder and two supporting tools to the HelloAgents framework:
+
+- **ContextBuilder** (`hello_agents/context/builder.py`): Context builder that implements the GSSC (Gather-Select-Structure-Compress) pipeline, providing a unified context management interface
+- **NoteTool** (`hello_agents/tools/builtin/note_tool.py`): Structured note tool that supports persistent memory management for agents
+- **TerminalTool** (`hello_agents/tools/builtin/terminal_tool.py`): Terminal tool that supports file system operations and just-in-time context retrieval for agents
+
+These components together constitute a complete context engineering solution, which is key to implementing long-term task management and agentic search, and will be introduced in detail in subsequent sections.
+
+In addition to installing the framework, you also need to configure the LLM API in `.env`. The examples in this chapter mainly use large language models for context management and intelligent decision-making.
+
+After configuration is complete, you can start the learning journey of this chapter!
+
+## 9.1 What is Context Engineering
+
+After years of Prompt Engineering becoming the focus of applied AI, a new term has come to the forefront: **Context Engineering**. Today, building systems with language models is no longer just about finding the right phrasing and wording in prompts, but about answering a more macro question: **What kind of context configuration is most likely to make the model produce the behavior we expect?**
+
+The so-called "context" refers to the set of tokens included when sampling a large language model (LLM). The engineering problem at hand is to **optimize the utility of these tokens** under the inherent constraints of the LLM, in order to stably obtain expected results. To effectively harness LLMs, it is often necessary to "think in context"—that is: at any call, examine the overall state visible to the LLM and predict the behavior this state might induce.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/9-figures/9-1.webp" alt="" width="85%"/>
+  <p>Figure 9.1 Prompt engineering vs Context engineering</p>
+</div>
+
+This section will explore the emerging context engineering and provide a refined mental model for building **controllable and effective** agents.
+
+**Context Engineering vs. Prompt Engineering**
+
+As shown in Figure 9.1, from the perspective of leading model vendors, context engineering is the natural evolution of prompt engineering. Prompt engineering focuses on how to write and organize LLM instructions to obtain better results (such as writing system prompts and structuring strategies). Context engineering, by contrast, is about **how to plan and maintain the "optimal information set (tokens)" during the inference stage**, which includes not only the prompt itself but also all other information that enters the context window.
+
+In the early stages of LLM engineering, prompts were often the main work, because most use cases (except daily chat) required carefully tuned prompts for single-turn classification or text generation. As the name suggests, the core of prompt engineering is "how to write effective prompts", especially system prompts. However, as we begin to engineer stronger agents that work over longer time spans and across multiple inference rounds, we need strategies that can manage the **entire context state**, including system instructions, tools, MCP (Model Context Protocol), external data, message history, etc.
+
+An agent running in a loop will continuously generate data that may be relevant to the next round of inference. This information must be **periodically refined**. Therefore, the "art and technique" of context engineering lies in **identifying which content should enter the limited context window** from the continuously expanding "candidate information universe".
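
The selection problem described above can be sketched as a greedy choice under a token budget. This is only an illustration under stated assumptions, not the ContextBuilder implementation: `estimate_tokens` approximates token counts by whitespace splitting, the relevance scores are hand-assigned, and `select_context` is a hypothetical helper name.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Crude token estimate by whitespace split (an assumption; real tokenizers differ)."""
    return len(text.split())


def select_context(candidates: List[Tuple[float, str]], budget: int) -> List[str]:
    """Greedily keep the highest-relevance snippets that still fit in the budget."""
    selected: List[str] = []
    used = 0
    for _score, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected


# (relevance score, snippet) pairs; the scores are hand-assigned for illustration.
candidates = [
    (0.9, "User prefers concise answers"),
    (0.2, "Weather small talk from an earlier turn"),
    (0.7, "Tool result: file report.md was saved"),
]

print(select_context(candidates, budget=12))
```

With a budget of 12 estimated tokens, the two most relevant snippets fit and the low-relevance small talk is dropped. A real system would also need to score relevance and, where needed, compress content instead of simply discarding it.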
+
+## 9.2 Why Context Engineering is Important
+
+Although models are getting faster and can handle ever larger inputs, we observe that LLMs, like humans, lose focus or become confused beyond a certain point. Needle-in-a-haystack benchmarks reveal a phenomenon called **context rot**: as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases.
+
+Some models degrade more gracefully than others, but this characteristic appears in almost all models. Therefore, **context must be viewed as a limited resource with diminishing marginal returns**. Just as humans have limited working memory capacity, LLMs also have an "attention budget". Each new token consumes part of this budget, so we need to be more careful about which tokens should be provided to the LLM.
+
+This scarcity is not accidental, but stems from the architectural constraints of LLMs. Transformers allow each token to establish associations with **all** tokens in the context, theoretically forming \(n^2\) pairwise attention relationships. As the context length grows, the model's ability to model these pairwise relationships is "stretched thin", naturally creating tension between "context scale" and "attention concentration". In addition, the model's attention patterns come from the training data distribution—short sequences are usually more common than long sequences, so the model has less experience with "full-context dependencies" and fewer specialized parameters.
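
A quick calculation shows how fast the number of pairwise relationships grows with context length. The sketch below only counts pairs as n × n; it says nothing about a specific model's memory or latency, and the two context lengths are arbitrary examples.

```python
def pairwise_attention_pairs(n_tokens: int) -> int:
    """Number of (query, key) token pairs in full self-attention: n * n."""
    return n_tokens * n_tokens


short_context, long_context = 1_000, 8_000
growth = pairwise_attention_pairs(long_context) / pairwise_attention_pairs(short_context)

print(pairwise_attention_pairs(short_context))  # 1000000
print(growth)                                   # 64.0
```

An 8x longer context produces 64x as many pairwise relationships, which is one way to see why attention gets "stretched thin" as the context grows.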
+
+Techniques such as position encoding interpolation can allow models to "adapt" to sequences longer than during training at inference time, but at the cost of some precision in understanding token positions. Overall, these factors together form a **performance gradient** rather than a "cliff-like" collapse: models are still powerful in long contexts, but compared to short contexts, their precision in information retrieval and long-range reasoning will decline.
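
To make the quadratic growth concrete, here is a back-of-the-envelope sketch that counts pairwise attention relationships for a few context lengths. The lengths are illustrative values chosen for this example, not measurements from any particular model, and `attention_pairs` is a hypothetical helper name.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of token-to-token attention relationships for n tokens (n squared)."""
    return n_tokens * n_tokens


# Illustrative context lengths only
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):,} pairwise relationships")
```

A context that is 10 times longer has 100 times as many pairwise relationships to model, which is one way to picture why the attention budget gets stretched.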
+
+Based on the above reality, **conscious context engineering** becomes a necessity for building robust agents.
+
+### 9.2.1 The "Anatomy" of Effective Context
+
+Under the constraint of a limited attention budget, the goal of good context engineering is to **maximize the probability of the desired outcome using the smallest possible set of high-signal tokens**. In practice, we recommend engineering around the following components:
+
+- **System Prompt**: Use clear, direct language and pitch the information at a "just right" level of abstraction. There are common pitfalls at both extremes:
+  - Over-hardcoding: Writing complex if-else logic into prompts, which is brittle and costly to maintain over time.
+  - Too vague: Providing only high-level goals and generic guidance, without **specific signals** about the expected output, or wrongly assuming "shared context".
+  It is recommended to organize prompts into sections (such as `<background_information>`, `<instructions>`, tool guidance, output description, etc.), delimited with XML tags or Markdown headings. Regardless of format, the goal is the **"minimum necessary information set" that fully outlines the expected behavior** ("minimum" does not mean "shortest"). Start by running the best available model on a minimal prompt, then add clear instructions and examples based on the failure modes you observe.
+
+- **Tools**: Tools define the contract between the agent and the information/action space, and must promote efficiency: they must return **token-friendly** information while encouraging efficient agent behavior. Tools should:
+  - Have single responsibilities with low overlap, clear interface semantics;
+  - Be robust to errors;
+  - Have clear and unambiguous parameter descriptions, fully leveraging the model's strengths in expression and reasoning.
+  A common failure mode is "bloated tool sets": fuzzy functional boundaries, making the decision of "which tool to use" itself ambiguous. **If human engineers can't tell which tool to use, don't expect agents to do better**. Carefully identifying a "Minimum Viable Tool Set (MVTS)" can often significantly improve stability and maintainability in long-term interactions.
+
+- **Few-shot Examples**: Providing examples is always recommended, but avoid stuffing every edge case into the prompt. Instead, curate a set of **diverse and typical** examples that directly portray the expected behavior. For LLMs, **a good example is worth a thousand words**.
+
+The overall guiding principle is: **sufficient but compact information**. Figure 9.2 illustrates how to calibrate the system prompt; the next subsection turns to dynamic retrieval at runtime.
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/9-figures/9-2.webp" alt="" width="85%"/>
+  <p>Figure 9.2 Calibrating the system prompt</p>
+</div>
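
As a minimal sketch of the sectioned layout recommended above, the helper below wraps each non-empty section in an XML-style tag. The function name `build_system_prompt` and the tags `tool_guidance` and `output_description` are hypothetical names chosen for this illustration; only `background_information` and `instructions` come from the text.

```python
from typing import Dict


def build_system_prompt(sections: Dict[str, str]) -> str:
    """Wrap each non-empty section in an XML-style tag, in insertion order."""
    parts = []
    for tag, body in sections.items():
        body = body.strip()
        if body:  # skip empty sections so the prompt stays minimal
            parts.append(f"<{tag}>\n{body}\n</{tag}>")
    return "\n\n".join(parts)


prompt = build_system_prompt({
    "background_information": "You support engineers maintaining a Python data pipeline.",
    "instructions": "Answer concisely. Ask one clarifying question if the request is ambiguous.",
    "tool_guidance": "",  # nothing to say yet, so this section is omitted
    "output_description": "Reply in Markdown with at most one code block.",
})
print(prompt)
```

Starting from a small prompt like this and adding instructions only when a failure mode shows up keeps the "minimum necessary information set" easy to audit.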
+
+### 9.2.2 Context Retrieval and Agentic Search
+
+A concise definition: **Agent = LLM autonomously calling tools in a loop**. As the capabilities of underlying models increase, the autonomy level of agents can be improved: they can more independently explore complex problem spaces and recover from errors.
+
+Engineering practice is gradually transitioning from "one-time retrieval before inference (embedding retrieval)" to "**Just-in-time (JIT) context**". The latter no longer preloads all relevant data, but maintains **lightweight references** (file paths, storage queries, URLs, etc.), dynamically loading required data through tools at runtime. This allows the model to write targeted queries, cache necessary results, and analyze large volumes of data with commands like <code>head</code>/<code>tail</code>—without stuffing entire data blocks into context at once. Its cognitive pattern is closer to humans: we don't memorize all information, but use external indexes like file systems, inboxes, bookmarks to extract on demand.
+
+In addition to storage efficiency, **metadata of references** itself can help refine behavior: directory hierarchy, naming conventions, timestamps, etc., all implicitly convey "purpose and timeliness". For example, <code>tests/test_utils.py</code> and <code>src/core/test_utils.py</code> have different semantic implications.
+
+Allowing agents to autonomously navigate and retrieve also enables **progressive disclosure**: each interaction step generates new context, which in turn guides the next decision—file size hints at complexity, naming hints at purpose, timestamps hint at relevance. Agents can build understanding layer by layer, keeping only the "currently necessary subset" in working memory, and using "note-taking" for supplementary persistence, thereby maintaining focus rather than being "dragged down by comprehensiveness".
+
+The trade-off is: runtime exploration is often slower than pre-computed retrieval, and requires "opinionated" engineering design to ensure the model has the right tools and heuristics. Without guidance, agents may misuse tools, chase dead ends, or miss key information, causing context waste.
+
+In many scenarios, a **hybrid strategy** is more effective: preload a small amount of "high-value" context to ensure speed, then allow agents to continue autonomous exploration on demand. The choice of boundaries depends on task dynamics and timeliness requirements. In engineering, you can preload files like "project convention descriptions (such as README/guides)", while providing primitives like <code>glob</code>, <code>grep</code>, allowing agents to retrieve specific files just-in-time, thereby bypassing the sunk costs of outdated indexes and complex syntax trees.
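
The just-in-time primitives mentioned above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the HelloAgents tool API: the function names `glob_files`, `grep`, and `head` and their parameters are assumptions. Each one returns a small, bounded result (paths, matching lines, or the first few lines) so that the agent pulls only what it needs into context.

```python
from pathlib import Path
from typing import List, Tuple


def glob_files(root: str, pattern: str) -> List[str]:
    """Return matching file paths only: lightweight references, not contents."""
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())


def grep(path: str, needle: str, max_hits: int = 20) -> List[Tuple[int, str]]:
    """Return up to max_hits (line_number, line) pairs whose line contains needle."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if needle in line:
                hits.append((lineno, line.rstrip("\n")))
                if len(hits) >= max_hits:
                    break
    return hits


def head(path: str, n: int = 20) -> str:
    """Return the first n lines of a file without reading the rest."""
    lines = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            lines.append(line)
    return "".join(lines)
```

A typical flow would be `glob_files(".", "*.py")` to discover candidates, `grep` to locate the relevant definitions, and `head` to peek at a file before deciding whether to load more of it.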
+
+### 9.2.3 Context Engineering for Long-Horizon Tasks
+
+Long-horizon tasks require agents to maintain coherence, context consistency, and goal orientation across action sequences that exceed the context window, for example large codebase migrations or systematic research spanning hours. Simply enlarging the context window does not cure "context pollution" and relevance degradation, so engineering methods that confront these constraints directly are needed: **Compaction**, **Structured note-taking**, and **Sub-agent architectures**.
+
+- **Compaction**
+  - Definition: When a conversation approaches the context limit, perform high-fidelity summarization and restart a new context window with the summary to maintain long-range coherence.
+  - Practice: Have the model compress and retain architectural decisions, unresolved defects, implementation details, discarding repetitive tool outputs and noise; the new window carries the compressed summary + a few recent highly relevant artifacts (such as "recently accessed files").
+  - Tuning suggestions: First optimize **recall** (ensure no key information is missed), then optimize **precision** (remove redundant content); a safe "light-touch" compression is to clean up "tool calls and results in deep history".
+
+- **Structured note-taking**
+  - Definition: Also called "agent memory". Agents write key information to **persistent storage outside the context** at fixed frequencies, pulling it back on demand in subsequent stages.
+  - Value: Maintain persistent state and dependencies with extremely low context overhead. For example, maintaining TODO lists, project NOTES.md, indexes of key conclusions/dependencies/blockers, maintaining progress and consistency across dozens of tool calls and multiple context resets.
+  - Note: Equally effective in non-coding scenarios (such as long-term strategic tasks, goal management and statistical counting in games/simulations). Combined with <code>MemoryTool</code> from Chapter 8, file-based/vector-based external memory can be easily implemented and retrieved at runtime.
+
+- **Sub-agent architectures**
+  - Idea: The main agent is responsible for high-level planning and synthesis, while multiple specialized sub-agents each dig deep, call tools, and explore in "clean context windows", finally only returning **condensed summaries** (typically 1,000–2,000 tokens).
+  - Benefits: Achieve separation of concerns. Complex search contexts remain internal to sub-agents, while the main agent focuses on integration and reasoning; suitable for complex research/analysis tasks requiring parallel exploration.
+  - Experience: Public multi-agent research systems show that this pattern has significant advantages over single-agent baselines in complex research tasks.
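
The "light-touch" compaction described above, clearing tool results in deep history, can be sketched as follows. The `role`/`content` dictionary layout, the `tool` role name, and the function `clear_old_tool_results` are assumptions made for this illustration rather than part of any specific framework.

```python
from typing import Dict, List


def clear_old_tool_results(
    messages: List[Dict[str, str]],
    keep_recent: int = 4,
    placeholder: str = "[tool result cleared]",
) -> List[Dict[str, str]]:
    """Replace tool outputs outside the most recent keep_recent messages with a stub."""
    cutoff = max(0, len(messages) - keep_recent)
    compacted = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg["role"] == "tool":
            # Deep history: keep the message slot, drop the bulky payload
            compacted.append({**msg, "content": placeholder})
        else:
            compacted.append(dict(msg))
    return compacted


history = [
    {"role": "user", "content": "Migrate the config loader to the new API."},
    {"role": "tool", "content": "<3,000 lines of grep output>"},
    {"role": "assistant", "content": "Decision: keep the YAML format, replace the parser."},
    {"role": "tool", "content": "tests: 41 passed, 2 failed"},
    {"role": "assistant", "content": "Open defect: test_env_override still fails."},
]
compacted = clear_old_tool_results(history, keep_recent=2)
```

Decisions and open defects survive untouched, while the old grep dump is reduced to a stub. The input list is not modified, so the full history can still be archived elsewhere.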
+
+Method trade-offs can follow these rules of thumb:
+
+- **Compaction**: Suitable for tasks requiring long conversation continuity, emphasizing context "relay".
+- **Structured note-taking**: Suitable for iterative development and research with milestones/phased results.
+- **Sub-agent architectures**: Suitable for complex research and analysis that can benefit from parallel exploration.
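
Structured note-taking can be as simple as an append-only file that lives outside the context window. The sketch below is a hypothetical `NoteStore` class that uses a JSON Lines file instead of the `NOTES.md` mentioned earlier, purely so that notes can be filtered by kind. It is not the `MemoryTool` API from Chapter 8.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class NoteStore:
    """Append-only notes kept in a JSON Lines file outside the context window."""

    def __init__(self, path: str):
        self.path = Path(path)

    def write(self, kind: str, text: str) -> None:
        """Persist one note; kind is a free-form label such as 'todo' or 'blocker'."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"kind": kind, "text": text}, ensure_ascii=False) + "\n")

    def read(self, kind: str) -> List[Dict[str, str]]:
        """Load only the notes of one kind, so little is pulled back into context."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            notes = [json.loads(line) for line in f if line.strip()]
        return [n for n in notes if n["kind"] == kind]


# Usage: the notes file survives context resets because it lives on disk
notes = NoteStore(str(Path(tempfile.mkdtemp()) / "notes.jsonl"))
notes.write("todo", "Add dtype downcasting to the CSV reader")
notes.write("decision", "Keep Pandas for now")
print(notes.read("todo"))
```

After a context reset, the agent can call `read("todo")` or `read("decision")` to recover its state at the cost of a handful of tokens.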
+
+Even as model capabilities continue to improve, "maintaining coherence and focus in long interactions" remains a core challenge in building robust agents. Careful and systematic context engineering will maintain its key value in the long term.
+
+## 9.3 Practice in Hello-Agents: ContextBuilder
+
+This section details context engineering practice in the HelloAgents framework. Moving from design motivation through core data structures and implementation details to complete cases, we show step by step how to build a production-grade context management system. The design philosophy of ContextBuilder is "simple and efficient": it removes unnecessary complexity and selects information uniformly by a combined "relevance + recency" score, in keeping with the framework's emphasis on modular, maintainable agents.
+
+### 9.3.1 Design Motivation and Goals
+
+Before building ContextBuilder, we first need to clarify its design goals and core value. An excellent context management system should solve the following key problems:
+
+1. **Unified Entry**: Abstract "Gather-Select-Structure-Compress" as a reusable pipeline, reducing boilerplate code in Agent implementations. This unified interface spares developers from rewriting context management logic in each Agent.
+
+2. **Stable Form**: Output a context template with a fixed skeleton, facilitating debugging, A/B testing, and evaluation. We adopted a sectioned template structure:
+   - `[Role & Policies]`: Clarify the Agent's role positioning and behavioral guidelines
+   - `[Task]`: The specific task currently to be completed
+   - `[State]`: The Agent's current state and context information
+   - `[Evidence]`: Evidence information retrieved from external knowledge bases
+   - `[Context]`: Historical dialogue and related memories
+   - `[Output]`: Expected output format and requirements
+
+3. **Budget Guardian**: Retain high-value information as much as possible within the token budget, providing fallback compression strategies for over-limit contexts. This ensures that even in scenarios with huge amounts of information, the system can run stably.
+
+4. **Minimal Rules**: Do not introduce classification dimensions such as source/priority, to avoid growing complexity. In practice, a simple scoring mechanism based on relevance and recency is effective enough in most scenarios.
+
+### 9.3.2 Core Data Structures
+
+The implementation of ContextBuilder relies on two core data structures that define the system's configuration and information units.
+
+(1) ContextPacket: Candidate Information Package
+
+```python
+from dataclasses import dataclass
+from typing import Optional, Dict, Any
+from datetime import datetime
+
+@dataclass
+class ContextPacket:
+    """Candidate information package
+
+    Attributes:
+        content: Information content
+        timestamp: Timestamp
+        token_count: Token count
+        relevance_score: Relevance score (0.0-1.0)
+        metadata: Optional metadata
+    """
+    content: str
+    timestamp: datetime
+    token_count: int
+    relevance_score: float = 0.5
+    metadata: Optional[Dict[str, Any]] = None
+
+    def __post_init__(self):
+        """Post-initialization processing"""
+        if self.metadata is None:
+            self.metadata = {}
+        # Ensure relevance score is within valid range
+        self.relevance_score = max(0.0, min(1.0, self.relevance_score))
+```
+
+`ContextPacket` is the basic unit of information in the system. Each piece of candidate information is encapsulated as a `ContextPacket` containing core attributes such as content, timestamp, token count, and relevance score. This unified data structure simplifies subsequent selection and sorting logic.
+
+(2) ContextConfig: Configuration Management
+
+```python
+@dataclass
+class ContextConfig:
+    """Context building configuration
+
+    Attributes:
+        max_tokens: Maximum token count
+        reserve_ratio: Ratio reserved for system instructions (0.0-1.0)
+        min_relevance: Minimum relevance threshold
+        enable_compression: Whether to enable compression
+        recency_weight: Recency weight (0.0-1.0)
+        relevance_weight: Relevance weight (0.0-1.0)
+    """
+    max_tokens: int = 3000
+    reserve_ratio: float = 0.2
+    min_relevance: float = 0.1
+    enable_compression: bool = True
+    recency_weight: float = 0.3
+    relevance_weight: float = 0.7
+
+    def __post_init__(self):
+        """Validate configuration parameters"""
+        assert 0.0 <= self.reserve_ratio <= 1.0, "reserve_ratio must be in [0, 1] range"
+        assert 0.0 <= self.min_relevance <= 1.0, "min_relevance must be in [0, 1] range"
+        assert abs(self.recency_weight + self.relevance_weight - 1.0) < 1e-6, \
+            "recency_weight + relevance_weight must equal 1.0"
+```
+
+`ContextConfig` encapsulates all configurable parameters, making system behavior flexibly adjustable. Particularly noteworthy is the `reserve_ratio` parameter, which ensures that key information such as system instructions always has sufficient space and will not be squeezed out by other information.
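
The listing above does not show how `reserve_ratio` is applied inside `build()`. One plausible reading, sketched here as an assumption, is that the reserved share is subtracted from `max_tokens` before candidates are selected. The helper name `available_budget` is hypothetical.

```python
def available_budget(max_tokens: int, reserve_ratio: float) -> int:
    """Tokens left for scored candidates after holding back the reserved share."""
    if not 0.0 <= reserve_ratio <= 1.0:
        raise ValueError("reserve_ratio must be in [0, 1]")
    return int(max_tokens * (1 - reserve_ratio))


# With the default ContextConfig values (max_tokens=3000, reserve_ratio=0.2)
print(available_budget(3000, 0.2))  # prints 2400
```

Under this reading, the defaults leave 2,400 tokens for memories, retrieved evidence, and conversation history, with 600 tokens held back.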
+
+### 9.3.3 GSSC Pipeline Detailed Explanation
+
+The core of ContextBuilder is the GSSC (Gather-Select-Structure-Compress) pipeline, which decomposes the context building process into four clear stages. Let's dive into the implementation details of each stage.
+
+(1) Gather: Multi-source Information Collection
+
+The first stage is to collect candidate information from multiple sources. The key to this stage is fault tolerance and flexibility.
+
+```python
+def _gather(
+    self,
+    user_query: str,
+    conversation_history: Optional[List[Message]] = None,
+    system_instructions: Optional[str] = None,
+    custom_packets: Optional[List[ContextPacket]] = None
+) -> List[ContextPacket]:
+    """Collect all candidate information
+
+    Args:
+        user_query: User query
+        conversation_history: Conversation history
+        system_instructions: System instructions
+        custom_packets: Custom information packages
+
+    Returns:
+        List[ContextPacket]: Candidate information list
+    """
+    packets = []
+
+    # 1. Add system instructions (highest priority, not scored)
+    if system_instructions:
+        packets.append(ContextPacket(
+            content=system_instructions,
+            timestamp=datetime.now(),
+            token_count=self._count_tokens(system_instructions),
+            relevance_score=1.0,  # System instructions always retained
+            metadata={"type": "system_instruction", "priority": "high"}
+        ))
+
+    # 2. Retrieve relevant memories from memory system
+    if self.memory_tool:
+        try:
+            memory_results = self.memory_tool.execute(
+                "search",
+                query=user_query,
+                limit=10,
+                min_importance=0.3
+            )
+            # Parse memory results and convert to ContextPacket
+            memory_packets = self._parse_memory_results(memory_results, user_query)
+            packets.extend(memory_packets)
+        except Exception as e:
+            print(f"[WARNING] Memory retrieval failed: {e}")
+
+    # 3. Retrieve relevant knowledge from RAG system
+    if self.rag_tool:
+        try:
+            rag_results = self.rag_tool.execute(
+                "search",
+                query=user_query,
+                limit=5,
+                min_score=0.3
+            )
+            # Parse RAG results and convert to ContextPacket
+            rag_packets = self._parse_rag_results(rag_results, user_query)
+            packets.extend(rag_packets)
+        except Exception as e:
+            print(f"[WARNING] RAG retrieval failed: {e}")
+
+    # 4. Add conversation history (only keep recent N entries)
+    if conversation_history:
+        recent_history = conversation_history[-5:]  # Default keep recent 5 entries
+        for msg in recent_history:
+            packets.append(ContextPacket(
+                content=f"{msg.role}: {msg.content}",
+                timestamp=msg.timestamp if hasattr(msg, 'timestamp') else datetime.now(),
+                token_count=self._count_tokens(msg.content),
+                relevance_score=0.6,  # Base relevance of historical messages
+                metadata={"type": "conversation_history", "role": msg.role}
+            ))
+
+    # 5. Add custom information packages
+    if custom_packets:
+        packets.extend(custom_packets)
+
+    print(f"[ContextBuilder] Collected {len(packets)} candidate information packages")
+    return packets
+```
+
+This implementation demonstrates several important design considerations:
+
+- **Fault Tolerance Mechanism**: Each external data source call is wrapped in try-except, ensuring that failure of a single source does not affect the overall process
+- **Priority Handling**: System instructions are marked as high priority, ensuring they are always retained
+- **History Limitation**: Conversation history keeps only the most recent entries (five by default), so older messages do not crowd the context window
+
+(2) Select: Intelligent Information Selection
+
+The second stage is to score and select candidate information based on relevance and recency. This is the core of the entire pipeline and directly determines the quality of the final context.
+
+```python
+def _select(
+    self,
+    packets: List[ContextPacket],
+    user_query: str,
+    available_tokens: int
+) -> List[ContextPacket]:
+    """Select the most relevant information packages
+
+    Args:
+        packets: Candidate information package list
+        user_query: User query (for calculating relevance)
+        available_tokens: Available token count
+
+    Returns:
+        List[ContextPacket]: Selected information package list
+    """
+    # 1. Separate system instructions and other information
+    system_packets = [p for p in packets if p.metadata.get("type") == "system_instruction"]
+    other_packets = [p for p in packets if p.metadata.get("type") != "system_instruction"]
+
+    # 2. Calculate tokens occupied by system instructions
+    system_tokens = sum(p.token_count for p in system_packets)
+    remaining_tokens = available_tokens - system_tokens
+
+    if remaining_tokens <= 0:
+        print("[WARNING] System instructions have occupied all token budget")
+        return system_packets
+
+    # 3. Calculate comprehensive scores for other information
+    scored_packets = []
+    for packet in other_packets:
+        # Calculate relevance score (if not yet calculated)
+        if packet.relevance_score == 0.5:  # Default value, needs recalculation
+            relevance = self._calculate_relevance(packet.content, user_query)
+            packet.relevance_score = relevance
+
+        # Calculate recency score
+        recency = self._calculate_recency(packet.timestamp)
+
+        # Combined score = relevance weight × relevance + recency weight × recency
+        combined_score = (
+            self.config.relevance_weight * packet.relevance_score +
+            self.config.recency_weight * recency
+        )
+
+        # Filter information below minimum relevance threshold
+        if packet.relevance_score >= self.config.min_relevance:
+            scored_packets.append((combined_score, packet))
+
+    # 4. Sort by score in descending order
+    scored_packets.sort(key=lambda x: x[0], reverse=True)
+
+    # 5. Greedy selection: fill from high to low score until token limit is reached
+    selected = system_packets.copy()
+    current_tokens = system_tokens
+
+    for score, packet in scored_packets:
+        if current_tokens + packet.token_count <= available_tokens:
+            selected.append(packet)
+            current_tokens += packet.token_count
+        else:
+            # Token budget is full, stop selection
+            break
+
+    print(f"[ContextBuilder] Selected {len(selected)} information packages, total {current_tokens} tokens")
+    return selected
+
+def _calculate_relevance(self, content: str, query: str) -> float:
+    """Calculate relevance between content and query
+
+    Uses simple keyword overlap algorithm. In production, can be replaced with vector similarity calculation.
+
+    Args:
+        content: Content text
+        query: Query text
+
+    Returns:
+        float: Relevance score (0.0-1.0)
+    """
+    # Tokenization (simple implementation, can use more complex tokenizers)
+    content_words = set(content.lower().split())
+    query_words = set(query.lower().split())
+
+    if not query_words:
+        return 0.0
+
+    # Jaccard similarity
+    intersection = content_words & query_words
+    union = content_words | query_words
+
+    return len(intersection) / len(union) if union else 0.0
+
+def _calculate_recency(self, timestamp: datetime) -> float:
+    """Calculate temporal recency score
+
+    Uses an exponential decay model: the score stays above 0.9 for the first 24 hours, then decays gradually (about 0.37 after 10 days) down to a floor of 0.1.
+
+    Args:
+        timestamp: Information timestamp
+
+    Returns:
+        float: Recency score (0.0-1.0)
+    """
+    import math
+
+    age_hours = (datetime.now() - timestamp).total_seconds() / 3600
+
+    # Exponential decay: maintain high score within 24 hours, then gradually decay
+    decay_factor = 0.1  # Decay coefficient
+    recency_score = math.exp(-decay_factor * age_hours / 24)
+
+    return max(0.1, min(1.0, recency_score))  # Limit to [0.1, 1.0] range
+```
+
+The core algorithm of the selection stage embodies several important engineering considerations:
+
+- **Scoring Mechanism**: Uses weighted combination of relevance and recency, with configurable weights
+- **Greedy Algorithm**: Fills from high to low score and stops at the first package that does not fit the remaining budget, so the highest-scoring information is selected first
+- **Filtering Mechanism**: Filters low-quality information through the `min_relevance` parameter
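
To see how the scoring pieces combine, the sketch below reimplements the three formulas from the listing as standalone functions and evaluates them on one example. The function names are hypothetical, and `now` is passed in explicitly (the original calls `datetime.now()`) so that the example is reproducible.

```python
import math
from datetime import datetime, timedelta


def jaccard(content: str, query: str) -> float:
    """Word-level Jaccard similarity, mirroring _calculate_relevance."""
    a, b = set(content.lower().split()), set(query.lower().split())
    if not b:
        return 0.0
    return len(a & b) / len(a | b)


def recency(timestamp: datetime, now: datetime, decay_factor: float = 0.1) -> float:
    """Exponential decay per day of age, clamped to [0.1, 1.0]."""
    age_hours = (now - timestamp).total_seconds() / 3600
    return max(0.1, min(1.0, math.exp(-decay_factor * age_hours / 24)))


def combined_score(relevance: float, recency_score: float,
                   relevance_weight: float = 0.7, recency_weight: float = 0.3) -> float:
    """Weighted sum using the default ContextConfig weights."""
    return relevance_weight * relevance + recency_weight * recency_score


now = datetime(2025, 1, 2, 12, 0, 0)  # fixed clock for a reproducible example
rel = jaccard("pandas memory usage", "optimize pandas memory")  # 2 shared / 4 distinct words
rec = recency(now - timedelta(hours=24), now)                    # exp(-0.1)
print(round(rel, 3), round(rec, 3), round(combined_score(rel, rec), 3))
```

For a packet that shares two of four distinct words with the query and is one day old, relevance is 0.5, recency is about 0.905, and the combined score is about 0.621. With these weights, a one-day-old packet loses very little to age, so relevance dominates the ranking.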
+
+(3) Structure: Structured Output
+
+The third stage is to organize selected information into a structured context template.
+
+```python
+def _structure(self, selected_packets: List[ContextPacket], user_query: str) -> str:
+    """Organize selected information packages into structured context template
+
+    Args:
+        selected_packets: Selected information package list
+        user_query: User query
+
+    Returns:
+        str: Structured context string
+    """
+    # Group by type
+    system_instructions = []
+    evidence = []
+    context = []
+
+    for packet in selected_packets:
+        packet_type = packet.metadata.get("type", "general")
+
+        if packet_type == "system_instruction":
+            system_instructions.append(packet.content)
+        elif packet_type in ["rag_result", "knowledge"]:
+            evidence.append(packet.content)
+        else:
+            context.append(packet.content)
+
+    # Build structured template
+    sections = []
+
+    # [Role & Policies]
+    if system_instructions:
+        sections.append("[Role & Policies]\n" + "\n".join(system_instructions))
+
+    # [Task]
+    sections.append(f"[Task]\n{user_query}")
+
+    # [Evidence]
+    if evidence:
+        sections.append("[Evidence]\n" + "\n---\n".join(evidence))
+
+    # [Context]
+    if context:
+        sections.append("[Context]\n" + "\n".join(context))
+
+    # [Output]
+    sections.append("[Output]\nPlease provide accurate, evidence-based answers based on the above information.")
+
+    return "\n\n".join(sections)
+```
+
+The structuring stage organizes scattered information packages into clear sections. This design has several advantages:
+
+- **Readability**: Clear sections make it easier for both humans and models to understand the context structure
+- **Debuggability**: Problems are easier to localize, since you can quickly identify which section contains problematic information
+- **Extensibility**: Adding new information sources only requires creating new sections
+
+(4) Compress: Fallback Compression
+
+The fourth stage is to compress over-limit contexts.
+
+```python
+def _compress(self, context: str, max_tokens: int) -> str:
+    """Compress over-limit context
+
+    Args:
+        context: Original context
+        max_tokens: Maximum token limit
+
+    Returns:
+        str: Compressed context
+    """
+    current_tokens = self._count_tokens(context)
+
+    if current_tokens <= max_tokens:
+        return context  # No compression needed
+
+    print(f"[ContextBuilder] Context over limit ({current_tokens} > {max_tokens}), executing compression")
+
+    # Section compression: maintain structural integrity
+    sections = context.split("\n\n")
+    compressed_sections = []
+    current_total = 0
+
+    for section in sections:
+        section_tokens = self._count_tokens(section)
+
+        if current_total + section_tokens <= max_tokens:
+            # Fully retain
+            compressed_sections.append(section)
+            current_total += section_tokens
+        else:
+            # Partially retain
+            remaining_tokens = max_tokens - current_total
+            if remaining_tokens > 50:  # Retain at least 50 tokens
+                # Simple truncation (can use LLM summarization in production)
+                truncated = self._truncate_text(section, remaining_tokens)
+                compressed_sections.append(truncated + "\n[... Content compressed ...]")
+            break
+
+    compressed_context = "\n\n".join(compressed_sections)
+    final_tokens = self._count_tokens(compressed_context)
+    print(f"[ContextBuilder] Compression complete: {current_tokens} -> {final_tokens} tokens")
+
+    return compressed_context
+
+def _truncate_text(self, text: str, max_tokens: int) -> str:
+    """Truncate text to specified token count
+
+    Args:
+        text: Original text
+        max_tokens: Maximum token count
+
+    Returns:
+        str: Truncated text
+    """
+    # Simple implementation: estimate by character ratio
+    # Should use precise tokenizer in production
+    total_tokens = self._count_tokens(text)
+    char_per_token = len(text) / total_tokens if total_tokens > 0 else 4
+    max_chars = int(max_tokens * char_per_token)
+
+    return text[:max_chars]
+
+def _count_tokens(self, text: str) -> int:
+    """Estimate token count of text
+
+    Args:
+        text: Text content
+
+    Returns:
+        int: Token count
+    """
+    # Simple estimation: Chinese 1 char ≈ 1 token, English 1 word ≈ 1.3 tokens
+    # Should use actual tokenizer in production
+    chinese_chars = sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff')
+    english_words = len([w for w in text.split() if w])
+
+    return int(chinese_chars + english_words * 1.3)
+```
+
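+The compression stage works at section granularity so that the template's structure stays intact. Sections are kept whole, in order, until the budget runs out; the first section that does not fit is truncated (if more than 50 tokens remain) and any later sections are dropped. Because `[Role & Policies]` and `[Task]` come first in the template, the most critical sections are the last to be cut.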
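
If the goal is for every section to keep some of its content under a tight budget, one option is to split the budget across sections instead of filling them in order. The sketch below is an illustrative alternative, not the HelloAgents implementation. `compress_by_section` is a hypothetical function, the whitespace-based `count_tokens` is a crude stand-in for a real tokenizer, and each section's share is assumed to be large enough for its header line plus the truncation marker.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def compress_by_section(sections: List[str], max_tokens: int) -> List[str]:
    """Give every section an equal share of the budget; truncate those that exceed it."""
    if not sections:
        return []
    share = max_tokens // len(sections)
    compressed = []
    for section in sections:
        if count_tokens(section) <= share:
            compressed.append(section)  # fits its share: keep verbatim
            continue
        # Keep the header line (e.g. "[Evidence]") and truncate only the body
        header, _, body = section.partition("\n")
        budget = max(0, share - count_tokens(header) - 1)  # 1 token for the marker
        kept = body.split()[:budget]
        compressed.append(header + "\n" + " ".join(kept + ["[...]"]))
    return compressed
```

This simple version does not redistribute budget that short sections leave unused, and it flattens line breaks inside a truncated body. Both are reasonable things to refine before using the idea in practice.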
+
+### 9.3.4 Complete Usage Example
+
+Now let's demonstrate how to use ContextBuilder in actual projects through a complete example.
+
+(1) Basic Usage
+
+```python
+from hello_agents.context import ContextBuilder, ContextConfig
+from hello_agents.tools import MemoryTool, RAGTool
+from hello_agents.core.message import Message
+from datetime import datetime
+
+# 1. Initialize tools
+memory_tool = MemoryTool(user_id="user123")
+rag_tool = RAGTool(knowledge_base_path="./knowledge_base")
+
+# 2. Create ContextBuilder
+config = ContextConfig(
+    max_tokens=3000,
+    reserve_ratio=0.2,
+    min_relevance=0.2,
+    enable_compression=True
+)
+
+builder = ContextBuilder(
+    memory_tool=memory_tool,
+    rag_tool=rag_tool,
+    config=config
+)
+
+# 3. Prepare conversation history
+conversation_history = [
+    Message(content="I'm developing a data analysis tool", role="user", timestamp=datetime.now()),
+    Message(content="Great! Data analysis tools usually need to handle large amounts of data. What tech stack do you plan to use?", role="assistant", timestamp=datetime.now()),
+    Message(content="I plan to use Python and Pandas, and have completed the CSV reading module", role="user", timestamp=datetime.now()),
+    Message(content="Good choice! Pandas is very powerful for data processing. Next you may need to consider data cleaning and transformation.", role="assistant", timestamp=datetime.now()),
+]
+
+# 4. Add some memories
+memory_tool.execute(
+    "add",
+    content="User is developing a data analysis tool using Python and Pandas",
+    memory_type="semantic",
+    importance=0.8
+)
+
+memory_tool.execute(
+    "add",
+    content="Completed development of CSV reading module",
+    memory_type="episodic",
+    importance=0.7
+)
+
+# 5. Build context
+context = builder.build(
+    user_query="How to optimize Pandas memory usage?",
+    conversation_history=conversation_history,
+    system_instructions="You are a senior Python data engineering consultant. Your answers need to: 1) Provide specific actionable advice 2) Explain technical principles 3) Provide code examples"
+)
+
+print("=" * 80)
+print("Built context:")
+print("=" * 80)
+print(context)
+print("=" * 80)
+```
+
+(2) Running Effect Demonstration
+
+After running the above code, you will see structured context output similar to the following (the `[Evidence]` section depends on the contents of your knowledge base):
+
+```
+================================================================================
+Built context:
+================================================================================
+[Role & Policies]
+You are a senior Python data engineering consultant. Your answers need to: 1) Provide specific actionable advice 2) Explain technical principles 3) Provide code examples
+
+[Task]
+How to optimize Pandas memory usage?
+
+[Evidence]
+Core strategies for Pandas memory optimization include:
+1. Use appropriate data types (such as category instead of object)
+2. Read large files in chunks
+3. Use chunksize parameter
+---
+Data type optimization can significantly reduce memory usage. For example, downgrading int64 to int32 can save 50% memory.
+
+[Context]
+user: I'm developing a data analysis tool
+assistant: Great! Data analysis tools usually need to handle large amounts of data. What tech stack do you plan to use?
+user: I plan to use Python and Pandas, and have completed the CSV reading module
+assistant: Good choice! Pandas is very powerful for data processing. Next you may need to consider data cleaning and transformation.
+Memory: User is developing a data analysis tool using Python and Pandas
+Memory: Completed development of CSV reading module
+
+[Output]
+Please provide accurate, evidence-based answers based on the above information.
+================================================================================
+```
+
+This structured context contains all necessary information:
+
+- **[Role & Policies]**: Clarifies the AI's role and answer requirements
+- **[Task]**: Clearly expresses the user's question
+- **[Evidence]**: Relevant knowledge retrieved from the RAG system
+- **[Context]**: Conversation history and related memories, providing sufficient background information
+- **[Output]**: Guides the LLM on how to organize the answer
+
+(3) Integration with Agent
+
+Finally, let's demonstrate how to integrate ContextBuilder into an Agent:
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
+from hello_agents.context import ContextBuilder, ContextConfig
+from hello_agents.tools import MemoryTool, RAGTool
+
+class ContextAwareAgent(SimpleAgent):
+    """Agent with context awareness capability"""
+
+    def __init__(self, name: str, llm: HelloAgentsLLM, **kwargs):
+        super().__init__(name=name, llm=llm, system_prompt=kwargs.get("system_prompt", ""))
+
+        # Initialize the memory and RAG tools that feed the context builder
+        self.memory_tool = MemoryTool(user_id=kwargs.get("user_id", "default"))
+        self.rag_tool = RAGTool(knowledge_base_path=kwargs.get("knowledge_base_path", "./kb"))
+
+        self.context_builder = ContextBuilder(
+            memory_tool=self.memory_tool,
+            rag_tool=self.rag_tool,
+            config=ContextConfig(max_tokens=4000)
+        )
+
+        self.conversation_history = []
+
+    def run(self, user_input: str) -> str:
+        """Run Agent, automatically build optimized context"""
+
+        # 1. Use ContextBuilder to build optimized context
+        optimized_context = self.context_builder.build(
+            user_query=user_input,
+            conversation_history=self.conversation_history,
+            system_instructions=self.system_prompt
+        )
+
+        # 2. Call LLM with optimized context
+        messages = [
+            {"role": "system", "content": optimized_context},
+            {"role": "user", "content": user_input}
+        ]
+        response = self.llm.invoke(messages)
+
+        # 3. Update conversation history
+        from hello_agents.core.message import Message
+        from datetime import datetime
+
+        self.conversation_history.append(
+            Message(content=user_input, role="user", timestamp=datetime.now())
+        )
+        self.conversation_history.append(
+            Message(content=response, role="assistant", timestamp=datetime.now())
+        )
+
+        # 4. Record important interactions to memory system
+        self.memory_tool.execute(
+            "add",
+            content=f"Q: {user_input}\nA: {response[:200]}...",  # Summary
+            memory_type="episodic",
+            importance=0.6
+        )
+
+        return response
+
+# Usage example
+agent = ContextAwareAgent(
+    name="Data Analysis Consultant",
+    llm=HelloAgentsLLM(),
+    system_prompt="You are a senior Python data engineering consultant.",
+    user_id="user123",
+    knowledge_base_path="./data_science_kb"
+)
+
+response = agent.run("How to optimize Pandas memory usage?")
+print(response)
+```
+
+With this integration, ContextBuilder acts as the Agent's "context management brain": it collects, filters, and organizes information automatically, so each LLM call receives a focused context that fits the token budget.
+
+### 9.3.5 Best Practices and Optimization Recommendations
+
+When applying ContextBuilder in practice, keep the following recommendations in mind:
+
+1. **Adjust the token budget dynamically**: Set `max_tokens` according to task complexity, with a smaller budget for simple tasks and a larger one for complex tasks.
+
+2. **Improve relevance scoring**: In production environments, replace simple keyword overlap with vector similarity to improve retrieval quality.
+
+3. **Cache stable content**: Cache system instructions and knowledge base content that rarely change, so they are not recomputed on every build.
+
+4. **Monitor and log**: Record statistics for each context build (number of packets selected, token utilization, and so on) to guide later tuning.
+
+5. **Run A/B tests**: Tune key parameters (such as the relevance and recency weights) by comparing configurations in A/B tests.
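+
+As a rough illustration of the first recommendation, the sketch below derives a token budget from simple signals in the query. The helper name `choose_max_tokens`, the thresholds, and the keyword list are illustrative assumptions and not part of HelloAgents; the result would be passed as `ContextConfig(max_tokens=...)`.
+
+```python
+def choose_max_tokens(query: str, history_turns: int,
+                      base: int = 2000, ceiling: int = 8000) -> int:
+    """Heuristic token budget (hypothetical helper, not a HelloAgents API)."""
+    budget = base
+    # Longer questions usually need more supporting evidence
+    if len(query.split()) > 30:
+        budget += 2000
+    # Words hinting at multi-step work get extra room (assumed keyword list)
+    if any(w in query.lower() for w in ("refactor", "design", "compare", "debug")):
+        budget += 2000
+    # Reserve some space for each turn of conversation history
+    budget += 200 * history_turns
+    # Never exceed the hard ceiling
+    return min(budget, ceiling)
+
+
+print(choose_max_tokens("What is a DataFrame?", history_turns=0))
+print(choose_max_tokens("Compare chunked reading and dtype downcasting", history_turns=4))
+```
+
+In a real system the signals would more likely come from a task classifier or from the monitoring statistics mentioned in point 4 than from a fixed keyword list.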
+
+## 9.4 NoteTool: Structured Notes
+
+NoteTool is a structured external memory component for long-horizon tasks. It stores each note as a Markdown file: YAML front matter in the header records key metadata, and the body records status, conclusions, blockers, and action items. The format is human-readable, works well with version control, and is easy to re-inject into context, which makes NoteTool an important building block for long-horizon agents.
+
+### 9.4.1 Design Philosophy and Application Scenarios
+
+Before diving into implementation details, let's first understand the design philosophy and typical application scenarios of NoteTool.
+
+(1) Why do we need NoteTool?
+
+In Chapter 8, we introduced MemoryTool, which provides powerful memory management capabilities. However, MemoryTool mainly focuses on **conversational memory**—short-term working memory, episodic memory, and semantic memory. For **project-based tasks** that require long-term tracking and structured management, we need a lighter, more human-friendly recording method.
+
+NoteTool fills this gap by providing:
+
+- **Structured recording**: Uses Markdown + YAML format, suitable for both machine parsing and human reading and editing
+- **Version friendly**: Plain text format, naturally supports version control systems like Git
+- **Low overhead**: No complex database operations required, suitable for lightweight state tracking
+- **Flexible categorization**: Flexibly organize notes through `type` and `tags`, supporting multi-dimensional retrieval
+
+(2) Typical Application Scenarios
+
+NoteTool is particularly suitable for the following scenarios:
+
+**Scenario 1: Long-term Project Tracking**
+
+Imagine an agent is assisting with a large codebase refactoring task, which may take days or even weeks. NoteTool can record:
+
+- `task_state`: Current stage task status and progress
+- `conclusion`: Key conclusions after each stage ends
+- `blocker`: Problems and blocking points encountered
+- `action`: Next action plan
+
+```python
+from hello_agents.tools import NoteTool
+
+notes = NoteTool(workspace="./project_notes")
+
+# Record task status
+notes.run({
+    "action": "create",
+    "title": "Refactoring Project - Phase 1",
+    "content": "Completed refactoring of data model layer, test coverage reached 85%. Next will refactor business logic layer.",
+    "note_type": "task_state",
+    "tags": ["refactoring", "phase1"]
+})
+
+# Record blocker
+notes.run({
+    "action": "create",
+    "title": "Dependency Conflict Issue",
+    "content": "Found some third-party library versions incompatible, need to resolve. Impact scope: 3 modules in business logic layer.",
+    "note_type": "blocker",
+    "tags": ["dependency", "urgent"]
+})
+```
+
+**Scenario 2: Research Task Management**
+
+An intelligent research assistant conducting literature review can use NoteTool to record:
+
+- Core viewpoints of each paper (`conclusion`)
+- Topics to be investigated in depth (`action`)
+- Important references (`reference`)
+
+**Scenario 3: Cooperation with ContextBuilder**
+
+Before each round of dialogue, the Agent can retrieve relevant notes through `search` or `list` operations and inject them into the context:
+
+```python
+# In Agent's run method
+from datetime import datetime
+
+def run(self, user_input: str) -> str:
+    # 1. Retrieve relevant notes
+    relevant_notes = self.note_tool.run({
+        "action": "search",
+        "query": user_input,
+        "limit": 3
+    })
+
+    # 2. Convert note content to ContextPacket
+    note_packets = []
+    for note in relevant_notes:
+        note_packets.append(ContextPacket(
+            content=note['content'],
+            # updated_at is stored as an ISO 8601 string; convert it to a datetime
+            timestamp=datetime.fromisoformat(note['updated_at']),
+            token_count=self._count_tokens(note['content']),
+            relevance_score=0.7,
+            metadata={"type": "note", "note_type": note['type']}
+        ))
+
+    # 3. Pass notes when building context
+    context = self.context_builder.build(
+        user_query=user_input,
+        custom_packets=note_packets,
+        # ... (remaining arguments omitted)
+    )
+```
+
+### 9.4.2 Storage Format Detailed Explanation
+
+NoteTool adopts a hybrid format of Markdown + YAML, which balances structure and readability.
+
+(1) Note File Format
+
+Each note is an independent `.md` file with the following format:
+
+```markdown
+---
+id: note_20250119_153000_0
+title: Project Progress - Phase 1
+type: task_state
+tags: [refactoring, phase1, backend]
+created_at: 2025-01-19T15:30:00
+updated_at: 2025-01-19T15:30:00
+---
+
+# Project Progress - Phase 1
+
+## Completion Status
+
+Completed refactoring of data model layer, main changes include:
+
+1. Unified entity class naming conventions
+2. Introduced type hints to improve code maintainability
+3. Optimized database query performance
+
+## Test Coverage
+
+- Unit test coverage: 85%
+- Integration test coverage: 70%
+
+## Next Steps
+
+1. Refactor business logic layer
+2. Resolve dependency conflict issues
+3. Increase integration test coverage to 85%
+```
+
+Advantages of this format:
+
+- **YAML metadata**: Machine-parsable, supports precise field extraction and retrieval
+- **Markdown body**: Human-readable, supports rich formatting (headings, lists, code blocks, etc.)
+- **Filename as ID**: Each note's filename is its unique identifier, which keeps file management simple
+
+(2) Index File
+
+NoteTool maintains a `notes_index.json` file for quick retrieval and management of notes:
+
+```json
+{
+  "note_20250119_153000_0": {
+    "id": "note_20250119_153000_0",
+    "title": "Project Progress - Phase 1",
+    "type": "task_state",
+    "tags": ["refactoring", "phase1", "backend"],
+    "created_at": "2025-01-19T15:30:00",
+    "updated_at": "2025-01-19T15:30:00",
+    "file_path": "./notes/note_20250119_153000_0.md"
+  }
+}
+```
+
+The index file serves three purposes:
+
+- **Quick retrieval**: Metadata queries such as `list` and `summary` run against the index without opening each note file
+- **Metadata management**: Metadata for all notes is managed in one place
+- **Integrity check**: Missing or corrupted files can be detected by comparing the index with the files on disk
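+
+The integrity check can be sketched as a comparison between the index and the files on disk. This is a minimal sketch that assumes the index layout shown above; `check_index_integrity` is a hypothetical helper, not a NoteTool method.
+
+```python
+import os
+from typing import Dict, List
+
+
+def check_index_integrity(index: Dict[str, dict], workspace: str) -> Dict[str, List[str]]:
+    """Compare index entries with the .md files in the workspace (hypothetical helper)."""
+    # Index entries whose file no longer exists on disk
+    missing = sorted(
+        note_id for note_id, meta in index.items()
+        if not os.path.isfile(meta.get("file_path", ""))
+    )
+    # Note files on disk that the index does not know about
+    indexed_files = {os.path.basename(meta.get("file_path", "")) for meta in index.values()}
+    orphaned = sorted(
+        name for name in os.listdir(workspace)
+        if name.endswith(".md") and name not in indexed_files
+    )
+    return {"missing": missing, "orphaned": orphaned}
+```
+
+Entries reported as `missing` could be dropped from the index, while `orphaned` files could be re-indexed by parsing their YAML front matter.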
+
+### 9.4.3 Core Operations Detailed Explanation
+
+NoteTool provides seven core operations covering the complete lifecycle management of notes.
+
+(1) create: Create Note
+
+```python
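+# Module-level imports used by the NoteTool methods in this section
+import os
+from typing import Any, Dict, List, Optional, Tuple
+
+def _create_note(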
+    self,
+    title: str,
+    content: str,
+    note_type: str = "general",
+    tags: Optional[List[str]] = None
+) -> str:
+    """Create note
+
+    Args:
+        title: Note title
+        content: Note content (Markdown format)
+        note_type: Note type (task_state/conclusion/blocker/action/reference/general)
+        tags: Tag list
+
+    Returns:
+        str: Note ID
+    """
+    from datetime import datetime
+
+    # 1. Generate unique ID (capture the time once so all fields agree)
+    now = datetime.now()
+    timestamp = now.strftime('%Y%m%d_%H%M%S')
+    note_id = f"note_{timestamp}_{len(self.index)}"
+
+    # 2. Build metadata
+    metadata = {
+        "id": note_id,
+        "title": title,
+        "type": note_type,
+        "tags": tags or [],
+        "created_at": now.isoformat(),
+        "updated_at": now.isoformat()
+    }
+
+    # 3. Build complete Markdown file content
+    md_content = self._build_markdown(metadata, content)
+
+    # 4. Save to file
+    file_path = os.path.join(self.workspace, f"{note_id}.md")
+    with open(file_path, 'w', encoding='utf-8') as f:
+        f.write(md_content)
+
+    # 5. Update index
+    metadata["file_path"] = file_path
+    self.index[note_id] = metadata
+    self._save_index()
+
+    return note_id
+
+def _build_markdown(self, metadata: Dict, content: str) -> str:
+    """Build Markdown file content (YAML + body)"""
+    import yaml
+
+    # YAML front matter
+    yaml_header = yaml.dump(metadata, allow_unicode=True, sort_keys=False)
+
+    # Combined format
+    return f"---\n{yaml_header}---\n\n{content}"
+```
+
+Usage example:
+
+```python
+from hello_agents.tools import NoteTool
+
+notes = NoteTool(workspace="./project_notes")
+
+note_id = notes.run({
+    "action": "create",
+    "title": "Refactoring Project - Phase 1",
+    "content": """## Completion Status
+Completed refactoring of data model layer, test coverage reached 85%.
+
+## Next Steps
+Refactor business logic layer""",
+    "note_type": "task_state",
+    "tags": ["refactoring", "phase1"]
+})
+
+print(f"✅ Note created successfully, ID: {note_id}")
+```
+
+(2) read: Read Note
+
+```python
+def _read_note(self, note_id: str) -> Dict:
+    """Read note content
+
+    Args:
+        note_id: Note ID
+
+    Returns:
+        Dict: Dictionary containing metadata and content
+    """
+    if note_id not in self.index:
+        raise ValueError(f"Note does not exist: {note_id}")
+
+    file_path = self.index[note_id]["file_path"]
+
+    # Read file
+    with open(file_path, 'r', encoding='utf-8') as f:
+        raw_content = f.read()
+
+    # Parse YAML metadata and Markdown body
+    metadata, content = self._parse_markdown(raw_content)
+
+    return {
+        "metadata": metadata,
+        "content": content
+    }
+
+def _parse_markdown(self, raw_content: str) -> Tuple[Dict, str]:
+    """Parse Markdown file (separate YAML and body)"""
+    import yaml
+
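+    # Find YAML delimiters. Front matter must start on the first line; otherwise
+    # a '---' horizontal rule in the body could be mistaken for a delimiter
+    parts = raw_content.split('---\n', 2)
+
+    if raw_content.startswith('---\n') and len(parts) >= 3:
+        # Has YAML front matter
+        yaml_str = parts[1]
+        content = parts[2].strip()
+        # safe_load returns None for an empty header
+        metadata = yaml.safe_load(yaml_str) or {}
+    else:
+        # No metadata, all as body
+        metadata = {}
+        content = raw_content.strip()
+
+    return metadata, content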
+```
+
+(3) update: Update Note
+
+```python
+def _update_note(
+    self,
+    note_id: str,
+    title: Optional[str] = None,
+    content: Optional[str] = None,
+    note_type: Optional[str] = None,
+    tags: Optional[List[str]] = None
+) -> str:
+    """Update note
+
+    Args:
+        note_id: Note ID
+        title: New title (optional)
+        content: New content (optional)
+        note_type: New type (optional)
+        tags: New tags (optional)
+
+    Returns:
+        str: Operation result message
+    """
+    if note_id not in self.index:
+        raise ValueError(f"Note does not exist: {note_id}")
+
+    # 1. Read existing note
+    note = self._read_note(note_id)
+    metadata = note["metadata"]
+    old_content = note["content"]
+
+    # 2. Update fields
+    if title:
+        metadata["title"] = title
+    if note_type:
+        metadata["type"] = note_type
+    if tags is not None:
+        metadata["tags"] = tags
+    if content is not None:
+        old_content = content
+
+    # Update timestamp
+    from datetime import datetime
+    metadata["updated_at"] = datetime.now().isoformat()
+
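+    # 3. Rebuild and save
+    # file_path is stored only in the index, not in the YAML front matter,
+    # so it must be read from the index rather than from the parsed metadata
+    md_content = self._build_markdown(metadata, old_content)
+    file_path = self.index[note_id]["file_path"]
+
+    with open(file_path, 'w', encoding='utf-8') as f:
+        f.write(md_content)
+
+    # 4. Update index, preserving file_path
+    self.index[note_id] = {**metadata, "file_path": file_path}
+    self._save_index()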
+
+    return f"✅ Note updated: {metadata['title']}"
+```
+
+(4) search: Search Notes
+
+```python
+def _search_notes(
+    self,
+    query: str,
+    limit: int = 10,
+    note_type: Optional[str] = None,
+    tags: Optional[List[str]] = None
+) -> List[Dict]:
+    """Search notes
+
+    Args:
+        query: Search keyword
+        limit: Return quantity limit
+        note_type: Filter by type (optional)
+        tags: Filter by tags (optional)
+
+    Returns:
+        List[Dict]: List of matching notes
+    """
+    results = []
+    query_lower = query.lower()
+
+    for note_id, metadata in self.index.items():
+        # Type filter
+        if note_type and metadata.get("type") != note_type:
+            continue
+
+        # Tag filter
+        if tags:
+            note_tags = set(metadata.get("tags", []))
+            if not note_tags.intersection(tags):
+                continue
+
+        # Read note content
+        try:
+            note = self._read_note(note_id)
+            content = note["content"]
+            title = metadata.get("title", "")
+
+            # Search in title and content
+            if query_lower in title.lower() or query_lower in content.lower():
+                results.append({
+                    "note_id": note_id,
+                    "title": title,
+                    "type": metadata.get("type"),
+                    "tags": metadata.get("tags", []),
+                    "content": content,
+                    "updated_at": metadata.get("updated_at")
+                })
+        except Exception as e:
+            print(f"[WARNING] Failed to read note {note_id}: {e}")
+            continue
+
+    # Sort by update time (a missing timestamp sorts as oldest)
+    results.sort(key=lambda x: x.get("updated_at") or "", reverse=True)
+
+    return results[:limit]
+```
+
+(5) list: List Notes
+
+```python
+def _list_notes(
+    self,
+    note_type: Optional[str] = None,
+    tags: Optional[List[str]] = None,
+    limit: int = 20
+) -> List[Dict]:
+    """List notes (in reverse chronological order by update time)
+
+    Args:
+        note_type: Filter by type (optional)
+        tags: Filter by tags (optional)
+        limit: Return quantity limit
+
+    Returns:
+        List[Dict]: List of note metadata
+    """
+    results = []
+
+    for note_id, metadata in self.index.items():
+        # Type filter
+        if note_type and metadata.get("type") != note_type:
+            continue
+
+        # Tag filter
+        if tags:
+            note_tags = set(metadata.get("tags", []))
+            if not note_tags.intersection(tags):
+                continue
+
+        results.append(metadata)
+
+    # Sort by update time
+    results.sort(key=lambda x: x.get("updated_at", ""), reverse=True)
+
+    return results[:limit]
+```
+
+(6) summary: Note Summary
+
+```python
+def _summary(self) -> Dict[str, Any]:
+    """Generate note summary statistics
+
+    Returns:
+        Dict: Statistical information
+    """
+    total_count = len(self.index)
+
+    # Count by type
+    type_counts = {}
+    for metadata in self.index.values():
+        note_type = metadata.get("type", "general")
+        type_counts[note_type] = type_counts.get(note_type, 0) + 1
+
+    # Recently updated notes
+    recent_notes = sorted(
+        self.index.values(),
+        key=lambda x: x.get("updated_at", ""),
+        reverse=True
+    )[:5]
+
+    return {
+        "total_notes": total_count,
+        "type_distribution": type_counts,
+        "recent_notes": [
+            {
+                "id": note["id"],
+                "title": note.get("title", ""),
+                "type": note.get("type"),
+                "updated_at": note.get("updated_at")
+            }
+            for note in recent_notes
+        ]
+    }
+```
+
+(7) delete: Delete Note
+
+```python
+def _delete_note(self, note_id: str) -> str:
+    """Delete note
+
+    Args:
+        note_id: Note ID
+
+    Returns:
+        str: Operation result message
+    """
+    if note_id not in self.index:
+        raise ValueError(f"Note does not exist: {note_id}")
+
+    # 1. Delete file
+    file_path = self.index[note_id]["file_path"]
+    if os.path.exists(file_path):
+        os.remove(file_path)
+
+    # 2. Remove from index
+    title = self.index[note_id].get("title", note_id)
+    del self.index[note_id]
+    self._save_index()
+
+    return f"✅ Note deleted: {title}"
+```
+
+### 9.4.4 Deep Integration with ContextBuilder
+
+The true power of NoteTool lies in its combined use with ContextBuilder. Let's demonstrate this integration through a complete case study.
+
+(1) Scenario Setup
+
+Suppose we are building a long-term project assistant that needs to:
+
+1. Record phased progress of the project
+2. Track pending issues
+3. Automatically review relevant notes during each conversation
+4. Provide coherent recommendations based on historical notes
+
+(2) Implementation Example
+
+```python
+from hello_agents import SimpleAgent, HelloAgentsLLM
+from hello_agents.context import ContextBuilder, ContextConfig, ContextPacket
+from hello_agents.tools import MemoryTool, RAGTool, NoteTool
+from datetime import datetime
+from typing import Dict, List
+
+class ProjectAssistant(SimpleAgent):
+    """Long-term project assistant, integrating NoteTool and ContextBuilder"""
+
+    def __init__(self, name: str, project_name: str, **kwargs):
+        super().__init__(name=name, llm=HelloAgentsLLM(), **kwargs)
+
+        self.project_name = project_name
+
+        # Initialize tools
+        self.memory_tool = MemoryTool(user_id=project_name)
+        self.rag_tool = RAGTool(knowledge_base_path=f"./{project_name}_kb")
+        self.note_tool = NoteTool(workspace=f"./{project_name}_notes")
+
+        # Initialize context builder
+        self.context_builder = ContextBuilder(
+            memory_tool=self.memory_tool,
+            rag_tool=self.rag_tool,
+            config=ContextConfig(max_tokens=4000)
+        )
+
+        self.conversation_history = []
+
+    def run(self, user_input: str, note_as_action: bool = False) -> str:
+        """Run assistant, automatically integrate notes"""
+
+        # 1. Retrieve relevant notes from NoteTool
+        relevant_notes = self._retrieve_relevant_notes(user_input)
+
+        # 2. Convert notes to ContextPacket
+        note_packets = self._notes_to_packets(relevant_notes)
+
+        # 3. Build optimized context
+        context = self.context_builder.build(
+            user_query=user_input,
+            conversation_history=self.conversation_history,
+            system_instructions=self._build_system_instructions(),
+            custom_packets=note_packets
+        )
+
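+        # 4. Call LLM (context as the system message, as in ContextAwareAgent above)
+        messages = [
+            {"role": "system", "content": context},
+            {"role": "user", "content": user_input}
+        ]
+        response = self.llm.invoke(messages)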
+
+        # 5. If needed, record interaction as note
+        if note_as_action:
+            self._save_as_note(user_input, response)
+
+        # 6. Update conversation history
+        self._update_history(user_input, response)
+
+        return response
+
+    def _retrieve_relevant_notes(self, query: str, limit: int = 3) -> List[Dict]:
+        """Retrieve relevant notes"""
+        try:
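+            # Prioritize blocker notes. "search" is used instead of "list" because
+            # list results are index metadata (keyed by "id", without "content"),
+            # while the steps below need "note_id" and "content". An empty query
+            # matches every note in _search_notes; this assumes run() forwards
+            # the arguments to _search_notes unchanged
+            blockers = self.note_tool.run({
+                "action": "search",
+                "query": "",
+                "note_type": "blocker",
+                "limit": 2
+            })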
+
+            # General search
+            search_results = self.note_tool.run({
+                "action": "search",
+                "query": query,
+                "limit": limit
+            })
+
+            # Merge and deduplicate
+            all_notes = {note['note_id']: note for note in blockers + search_results}
+            return list(all_notes.values())[:limit]
+
+        except Exception as e:
+            print(f"[WARNING] Note retrieval failed: {e}")
+            return []
+
+    def _notes_to_packets(self, notes: List[Dict]) -> List[ContextPacket]:
+        """Convert notes to context packets"""
+        packets = []
+
+        for note in notes:
+            content = f"[Note: {note['title']}]\n{note['content']}"
+
+            packets.append(ContextPacket(
+                content=content,
+                timestamp=datetime.fromisoformat(note['updated_at']),
+                token_count=len(content) // 4,  # Simple estimation
+                relevance_score=0.75,  # Notes have high relevance
+                metadata={
+                    "type": "note",
+                    "note_type": note['type'],
+                    "note_id": note['note_id']
+                }
+            ))
+
+        return packets
+
+    def _save_as_note(self, user_input: str, response: str):
+        """Save interaction as note"""
+        try:
+            # Determine what type of note to save (simple keyword heuristic)
+            text = user_input.lower()
+            if any(word in text for word in ("problem", "issue", "blocker")):
+                note_type = "blocker"
+            elif "plan" in text or "next" in text:
+                note_type = "action"
+            else:
+                note_type = "conclusion"
+
+            self.note_tool.run({
+                "action": "create",
+                "title": f"{user_input[:30]}...",
+                "content": f"## Question\n{user_input}\n\n## Analysis\n{response}",
+                "note_type": note_type,
+                "tags": [self.project_name, "auto_generated"]
+            })
+
+        except Exception as e:
+            print(f"[WARNING] Failed to save note: {e}")
+
+    def _build_system_instructions(self) -> str:
+        """Build system instructions"""
+        return f"""You are a long-term assistant for the {self.project_name} project.
+
+Your responsibilities:
+1. Provide coherent recommendations based on historical notes
+2. Track project progress and pending issues
+3. Reference relevant historical notes when answering
+4. Provide specific, actionable next-step recommendations
+
+Notes:
+- Prioritize issues marked as blockers
+- Indicate source of basis in recommendations (notes, memory, or knowledge base)
+- Maintain awareness of overall project progress"""
+
+    def _update_history(self, user_input: str, response: str):
+        """Update conversation history"""
+        from hello_agents.core.message import Message
+
+        self.conversation_history.append(
+            Message(content=user_input, role="user", timestamp=datetime.now())
+        )
+        self.conversation_history.append(
+            Message(content=response, role="assistant", timestamp=datetime.now())
+        )
+
+        # Limit history length
+        if len(self.conversation_history) > 10:
+            self.conversation_history = self.conversation_history[-10:]
+
+# Usage example
+assistant = ProjectAssistant(
+    name="Project Assistant",
+    project_name="data_pipeline_refactoring"
+)
+
+# First interaction: Record project status
+response = assistant.run(
+    "We have completed refactoring of the data model layer, test coverage reached 85%. Next plan is to refactor the business logic layer.",
+    note_as_action=True
+)
+
+# Second interaction: Raise issue (also saved as a note so it appears in the summary)
+response = assistant.run(
+    "When refactoring the business logic layer, I encountered dependency version conflict issues. How should I resolve this?",
+    note_as_action=True
+)
+
+# View note summary
+summary = assistant.note_tool.run({"action": "summary"})
+print(summary)
+```
+
+(3) Running Effect Demonstration
+
+```
+[ContextBuilder] Collected 8 candidate information packages
+[ContextBuilder] Selected 7 information packages, total 3500 tokens
+
+✅ Assistant answer:
+
+I noticed this issue was mentioned in your previously recorded notes. According to the note [Refactoring Project - Phase 1], your current test coverage has reached 85%, which is a good foundation.
+
+Regarding the dependency version conflict issue, I recommend:
+
+1. **Use virtual environment isolation**: Create an independent virtual environment for the business logic layer to avoid dependency conflicts with other modules
+2. **Lock versions**: Explicitly specify exact versions of all dependencies in requirements.txt
+3. **Use pipdeptree**: Analyze the dependency tree to find the root cause of conflicts
+
+I will mark this issue as a blocker and recommend prioritizing its resolution.
+
+[Source: Note note_20250119_153000_0, Project knowledge base]
+
+---
+
+📋 Note summary:
+{
+  "total_notes": 2,
+  "type_distribution": {
+    "action": 1,
+    "blocker": 1
+  },
+  "recent_notes": [
+    {
+      "id": "note_20250119_154500_1",
+      "title": "When refactoring the business ...",
+      "type": "blocker",
+      "updated_at": "2025-01-19T15:45:00"
+    },
+    {
+      "id": "note_20250119_153000_0",
+      "title": "We have completed refactoring ...",
+      "type": "action",
+      "updated_at": "2025-01-19T15:30:00"
+    }
+  ]
+}
+```
+
+### 9.4.5 Best Practices
+
+The following practices help when building long-horizon agents with NoteTool:
+
+1. **Classify notes consistently**:
+   - `task_state`: Record phased progress and status
+   - `conclusion`: Record important conclusions and findings
+   - `blocker`: Record blocking issues, highest priority
+   - `action`: Record next action plans
+   - `reference`: Record important reference materials
+
+2. **Clean up and archive regularly**:
+   - Once a blocker is resolved, update its type to `conclusion`
+   - Delete or update outdated actions promptly
+   - Use tags for version management, such as `["v1.0", "completed"]`
+
+3. **Cooperation with ContextBuilder**:
+   - Retrieve relevant notes before each round of dialogue
+   - Set different relevance scores based on note type (blocker > action > conclusion)
+   - Limit number of notes to avoid context overload
+
+4. **Human-machine collaboration**:
+   - Notes are in human-readable Markdown format, supporting manual editing
+   - Use Git for version control to track note evolution
+   - At key stages, manually review notes generated by Agent
+
+5. **Automated workflow**:
+   - Regularly generate note summary reports
+   - Automatically generate project progress documents based on notes
+   - Synchronize note content to other systems (such as Notion, Confluence)
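+
+The type-based scoring from point 3 can be sketched as a small lookup table. Only the ordering blocker > action > conclusion comes from the text; the numeric scores, the placement of `task_state` and `reference`, and the helper names `relevance_for` and `pick_notes` are illustrative assumptions.
+
+```python
+# Hypothetical scores; only the ordering blocker > action > conclusion is from the text
+NOTE_TYPE_RELEVANCE = {
+    "blocker": 0.9,
+    "action": 0.8,
+    "task_state": 0.75,
+    "conclusion": 0.7,
+    "reference": 0.6,
+}
+DEFAULT_RELEVANCE = 0.5
+
+
+def relevance_for(note_type: str) -> float:
+    """Look up the relevance score for a note type."""
+    return NOTE_TYPE_RELEVANCE.get(note_type, DEFAULT_RELEVANCE)
+
+
+def pick_notes(notes: list, max_notes: int = 3) -> list:
+    """Keep the highest-priority notes; ties go to the most recently updated."""
+    ranked = sorted(
+        notes,
+        key=lambda n: (relevance_for(n.get("type", "general")), n.get("updated_at", "")),
+        reverse=True,
+    )
+    # Cap the number of notes to avoid overloading the context
+    return ranked[:max_notes]
+```
+
+The score returned by `relevance_for` could then replace the fixed `relevance_score` used when converting notes to `ContextPacket` objects.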
+
+## 9.5 TerminalTool: Instant File System Access
+
+In previous chapters, we introduced MemoryTool and RAGTool, which provide conversational memory and knowledge retrieval respectively. In many practical scenarios, however, agents also need to **access and explore the file system on demand**: viewing log files, analyzing codebase structure, retrieving configuration files, and so on. This is where TerminalTool comes in.
+
+TerminalTool provides agents with **secure command-line execution capability**. It supports common file system and text processing commands and protects the system through several layers of security checks. This design implements the "Just-in-time (JIT) context" concept mentioned in Section 9.2.2: agents do not need to preload all files, but explore and retrieve them on demand.
+
+### 9.5.1 Design Philosophy and Security Mechanisms
+
+(1) Why do we need TerminalTool?
+
+When building long-horizon agents, we often encounter the following scenarios:
+
+**Scenario 1: Codebase Exploration**
+
+A development assistant needs to help users understand the structure of a large codebase:
+
+```python
+# Traditional approach: Pre-index all files (high cost, may be outdated)
+rag_tool.add_document("./project/**/*.py")  # Time-consuming, occupies large storage
+
+# TerminalTool approach: Instant exploration
+terminal.run({"command": "find . -name '*.py' -type f"})  # Fast, real-time
+terminal.run({"command": "grep -r 'class UserService' ."})  # Precise location
+terminal.run({"command": "head -n 50 src/services/user.py"})  # View on demand
+```
+
+**Scenario 2: Log File Analysis**
+
+An operations assistant needs to analyze application logs:
+
+```python
+# Assumes terminal was initialized with workspace="/var/log" (see the sandbox rule below)
+# Check log file size
+terminal.run({"command": "ls -lh /var/log/app.log"})
+
+# View latest error logs
+terminal.run({"command": "tail -n 100 /var/log/app.log | grep ERROR"})
+
+# Count error type distribution
+terminal.run({"command": "grep ERROR /var/log/app.log | cut -d':' -f3 | sort | uniq -c"})
+```
+
+**Scenario 3: Data File Preview**
+
+A data analysis assistant needs to quickly understand the structure of data files:
+
+```python
+# View first few lines of CSV file
+terminal.run({"command": "head -n 5 data/sales.csv"})
+
+# Count lines
+terminal.run({"command": "wc -l data/*.csv"})
+
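+# View column names, one per line (awk is used because tr is not on the whitelist)
+terminal.run({"command": "head -n 1 data/sales.csv | awk -F',' '{for (i = 1; i <= NF; i++) print $i}'"})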
+```
+
+What these scenarios have in common is that they **need real-time, lightweight file system access rather than pre-indexing and vectorization**. TerminalTool is designed precisely for this "exploratory" workflow.
+
+(2) Security Mechanism Detailed Explanation
+
+Allowing agents to execute commands is a powerful but dangerous capability. TerminalTool reduces the risk with several layers of protection:
+
+**First Layer: Command Whitelist**
+
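+The whitelist admits only commands meant for reading and inspecting files; anything else is rejected. A few whitelisted commands can still write under certain options (for example `sed -i` or `find -delete`), so the whitelist is best treated as a first filter rather than a complete guarantee: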
+
+```python
+ALLOWED_COMMANDS = {
+    # File listing and information
+    'ls', 'dir', 'tree',
+    # File content viewing
+    'cat', 'head', 'tail', 'less', 'more',
+    # File search
+    'find', 'grep', 'egrep', 'fgrep',
+    # Text processing
+    'wc', 'sort', 'uniq', 'cut', 'awk', 'sed',
+    # Directory operations
+    'pwd', 'cd',
+    # File information
+    'file', 'stat', 'du', 'df',
+    # Others
+    'echo', 'which', 'whereis',
+}
+```
+
+If the agent attempts to execute commands outside the whitelist, it will be immediately rejected:
+
+```python
+terminal.run({"command": "rm -rf /"})
+# ❌ Command not allowed: rm
+# Allowed commands: cat, cd, cut, dir, du, ...
+```
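
The whitelist check itself is not shown in this section. A minimal sketch of one way to do it follows, assuming the check has to cover every stage of a pipeline such as `tail ... | grep ...`. The names `check_command` and `SHELL_OPERATORS` are hypothetical, the whitelist is abbreviated, and the policy of rejecting every shell operator except `|` is an assumption, not TerminalTool's documented behavior.

```python
import shlex
from typing import Tuple

# Hypothetical, abbreviated copy of the whitelist above (illustration only).
ALLOWED_COMMANDS = {"ls", "cat", "head", "tail", "grep", "find", "wc", "sort", "uniq", "cut"}
PIPE = "|"
SHELL_OPERATORS = set("();<>|&")  # characters shlex can split out as punctuation


def check_command(command: str) -> Tuple[bool, str]:
    """Validate the program name of every pipeline stage against the whitelist."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError as exc:  # for example, an unclosed quote
        return False, f"Cannot parse command: {exc}"

    expect_program = True  # the next word token must be a program name
    for token in tokens:
        if token and set(token) <= SHELL_OPERATORS:
            # Assumed policy: pipes are fine, every other operator is rejected.
            if token != PIPE:
                return False, f"Shell operator not allowed: {token}"
            expect_program = True
        elif expect_program:
            if token not in ALLOWED_COMMANDS:
                return False, f"Command not allowed: {token}"
            expect_program = False
    if expect_program:
        return False, "Empty command or dangling pipe"
    return True, "ok"


print(check_command("tail -n 100 app.log | grep ERROR"))
print(check_command("cat a.txt; rm a.txt"))
```

Checking only the first word of the whole command line would let `cat a.txt | rm a.txt` through, which is why the sketch validates each stage. Because `shlex` strips quotes, a quoted `'|'` argument may look like a real pipe here, so the sketch errs on the side of rejecting such commands.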
+
+**Second Layer: Working Directory Restriction (Sandbox)**
+
+TerminalTool can access only the specified working directory and its subdirectories, not the rest of the system:
+
+```python
+# Specify working directory during initialization
+terminal = TerminalTool(workspace="./project")
+
+# Allowed: Access files within working directory
+terminal.run({"command": "cat ./src/main.py"})  # ✅
+
+# Prohibited: Access files outside working directory
+terminal.run({"command": "cat /etc/passwd"})  # ❌ Not allowed to access paths outside working directory
+
+# Prohibited: Escape through ..
+terminal.run({"command": "cd ../../../etc"})  # ❌ Not allowed to access paths outside working directory
+```
+
+This sandbox mechanism ensures that even if the agent's behavior is abnormal, it cannot affect other parts of the system.
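
The `_handle_cd` listing in Section 9.5.2 enforces this boundary for `cd` by resolving the target and calling `relative_to()` on the workspace. A minimal sketch of the same idea as a standalone helper is shown here; the function name `is_inside_workspace` is hypothetical, and how TerminalTool validates path arguments of other commands is not shown in this section.

```python
from pathlib import Path


def is_inside_workspace(workspace: Path, current_dir: Path, target: str) -> bool:
    """Return True if `target`, interpreted from `current_dir`, stays inside `workspace`."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks, so escape
    # attempts are judged on the path they really point to. Joining with an
    # absolute target (such as "/etc/passwd") discards current_dir entirely.
    candidate = (current_dir / target).resolve()
    try:
        candidate.relative_to(root)  # raises ValueError when outside root
    except ValueError:
        return False
    return True


workspace = Path("./project")
print(is_inside_workspace(workspace, workspace, "src/main.py"))
print(is_inside_workspace(workspace, workspace, "../../../etc"))
print(is_inside_workspace(workspace, workspace, "/etc/passwd"))
```

Comparing resolved paths rather than raw strings is the important design choice: a string prefix check would accept `./project/../../etc`, while the resolved path makes the escape visible.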
+
+**Third Layer: Timeout Control**
+
+Each command has an execution time limit to prevent infinite loops or resource exhaustion:
+
+```python
+terminal = TerminalTool(
+    workspace="./project",
+    timeout=30  # 30 second timeout
+)
+
+# If command execution exceeds 30 seconds
+terminal.run({"command": "find . -name '*.log'"})
+# ❌ Command execution timeout (exceeded 30 seconds)
+```
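
The timeout relies on the standard `subprocess.run(..., timeout=...)` behavior, which stops the child process and raises `subprocess.TimeoutExpired`. The sketch below isolates that mechanism; `run_with_timeout` is a hypothetical helper, and it uses the Python interpreter as a stand-in for a slow command so that it runs on any platform.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(argv: List[str], timeout: float) -> str:
    """Run a command and return its stdout, or a message if it exceeds the limit."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already stopped the child before re-raising.
        return f"Command execution timeout (exceeded {timeout} seconds)"
    return result.stdout


# A stand-in for a command that would run far longer than the limit.
slow_command = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow_command, timeout=0.5))
```

One caveat worth keeping in mind: when a command is launched through a shell, as in the `_execute_command` listing in Section 9.5.2, the timeout applies to the shell process, and whether every program the shell started also exits may depend on the platform.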
+
+**Fourth Layer: Output Size Limit**
+
+Command output is capped so that a single command cannot exhaust memory:
+
+```python
+terminal = TerminalTool(
+    workspace="./project",
+    max_output_size=10 * 1024 * 1024  # 10MB
+)
+
+# If output exceeds 10MB
+terminal.run({"command": "cat huge_file.log"})
+# ... (first 10MB of content) ...
+# ⚠️ Output truncated (exceeded 10485760 bytes)
+```
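
The limit above is stated in bytes, while the `_execute_command` listing in Section 9.5.2 compares `len(output)`, which counts characters. The two agree for ASCII output but diverge for multi-byte text. A minimal sketch of a byte-accurate cap follows; `truncate_output` is a hypothetical helper, not part of TerminalTool.

```python
def truncate_output(text: str, max_bytes: int) -> str:
    """Cap `text` at `max_bytes` of UTF-8 without leaving a broken character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # Slicing bytes can cut a multi-byte character in half; errors="ignore"
    # drops the incomplete trailing bytes instead of raising.
    clipped = data[:max_bytes].decode("utf-8", errors="ignore")
    return clipped + f"\n\n[Output truncated (exceeded {max_bytes} bytes)]"


print(truncate_output("a" * 20, max_bytes=5))
```

Whether the difference matters depends on the workload: for logs and source code that are mostly ASCII, the character-based check is a close approximation.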
+
+Together, these four layers let TerminalTool offer real command-line power while keeping the risk to the host system low.
+
+### 9.5.2 Core Functionality Detailed Explanation
+
+The implementation of TerminalTool focuses on two core functions: command execution and directory navigation.
+
+(1) Command Execution
+
+The core `_execute_command` method is responsible for actually executing commands:
+
+```python
+def _execute_command(self, command: str) -> str:
+    """Execute command"""
+    try:
+        # Execute command in current directory
+        result = subprocess.run(
+            command,
+            shell=True,
+            cwd=str(self.current_dir),  # Execute in current working directory
+            capture_output=True,
+            text=True,
+            timeout=self.timeout,
+            env=os.environ.copy()
+        )
+
+        # Merge standard output and standard error
+        output = result.stdout
+        if result.stderr:
+            output += f"\n[stderr]\n{result.stderr}"
+
+        # Check output size (len() counts characters, which can differ from the byte count for non-ASCII text)
+        if len(output) > self.max_output_size:
+            output = output[:self.max_output_size]
+            output += f"\n\n⚠️ Output truncated (exceeded {self.max_output_size} bytes)"
+
+        # Add return code information
+        if result.returncode != 0:
+            output = f"⚠️ Command return code: {result.returncode}\n\n{output}"
+
+        return output if output else "✅ Command executed successfully (no output)"
+
+    except subprocess.TimeoutExpired:
+        return f"❌ Command execution timeout (exceeded {self.timeout} seconds)"
+    except Exception as e:
+        return f"❌ Command execution failed: {e}"
+```
+
+Key points of this implementation:
+
+- **Current directory awareness**: The `cwd` parameter runs each command in the agent's current directory.
+- **Error handling**: Standard error is captured and appended to the output, so the agent sees complete diagnostic information.
+- **Return code check**: A non-zero return code is flagged with a warning prefix.
+- **Fault-tolerant design**: Timeouts and exceptions are caught and returned as messages, so a failed command does not crash the agent.
+
+(2) Directory Navigation
+
+Special handling of the `cd` command supports agent navigation in the file system:
+
+```python
+def _handle_cd(self, parts: List[str]) -> str:
+    """Handle cd command"""
+    if not self.allow_cd:
+        return "❌ cd command is disabled"
+
+    if len(parts) < 2:
+        # cd without parameters, return current directory
+        return f"Current directory: {self.current_dir}"
+
+    target_dir = parts[1]
+
+    # Handle relative path
+    if target_dir == "..":
+        new_dir = self.current_dir.parent
+    elif target_dir == ".":
+        new_dir = self.current_dir
+    elif target_dir == "~":
+        new_dir = self.workspace
+    else:
+        new_dir = (self.current_dir / target_dir).resolve()
+
+    # Check if within working directory
+    try:
+        new_dir.relative_to(self.workspace)
+    except ValueError:
+        return f"❌ Not allowed to access paths outside working directory: {new_dir}"
+
+    # Check if directory exists
+    if not new_dir.exists():
+        return f"❌ Directory does not exist: {new_dir}"
+
+    if not new_dir.is_dir():
+        return f"❌ Not a directory: {new_dir}"
+
+    # Update current directory
+    self.current_dir = new_dir
+    return f"✅ Switched to directory: {self.current_dir}"
+```
+
+This design supports agents in multi-step file system exploration:
+
+```python
+# Step 1: View project structure
+terminal.run({"command": "ls -la"})
+
+# Step 2: Enter source code directory
+terminal.run({"command": "cd src"})
+
+# Step 3: Find specific files
+terminal.run({"command": "find . -name '*service*.py'"})
+
+# Step 4: View file content
+terminal.run({"command": "cat user_service.py"})
+```
+
+### 9.5.3 Typical Usage Patterns
+
+TerminalTool supports various common file system operation patterns.
+
+(1) Exploratory Navigation
+
+Agents can explore codebases step by step like human developers:
+
+```python
+from hello_agents.tools import TerminalTool
+
+terminal = TerminalTool(workspace="./my_project")
+
+# Step 1: View project root directory
+print(terminal.run({"command": "ls -la"}))
+"""
+total 24
+drwxr-xr-x  6 user  staff   192 Jan 19 16:00 .
+drwxr-xr-x  5 user  staff   160 Jan 19 15:30 ..
+-rw-r--r--  1 user  staff  1234 Jan 19 15:30 README.md
+drwxr-xr-x  4 user  staff   128 Jan 19 15:30 src
+drwxr-xr-x  3 user  staff    96 Jan 19 15:30 tests
+-rw-r--r--  1 user  staff   456 Jan 19 15:30 requirements.txt
+"""
+
+# Step 2: View source code directory structure
+terminal.run({"command": "cd src"})
+print(terminal.run({"command": "tree"}))
+
+# Step 3: Search for specific patterns
+print(terminal.run({"command": "grep -r 'def process' ."}))
+```
+
+(2) Data File Analysis
+
+TerminalTool makes it quick to inspect the structure and content of data files:
+
+```python
+terminal = TerminalTool(workspace="./data")
+
+# View first few lines of CSV file
+print(terminal.run({"command": "head -n 5 sales_2024.csv"}))
+"""
+date,product,quantity,revenue
+2024-01-01,Widget A,150,4500.00
+2024-01-01,Widget B,200,8000.00
+2024-01-02,Widget A,180,5400.00
+2024-01-02,Widget C,120,3600.00
+"""
+
+# Count total lines
+print(terminal.run({"command": "wc -l *.csv"}))
+"""
+   8567 sales_2023.csv
+  10234 sales_2024.csv
+  18801 total
+"""
+
+# Extract and count product categories
+print(terminal.run({"command": "tail -n +2 sales_2024.csv | cut -d',' -f2 | sort | uniq -c"}))
+"""
+  3456 Widget A
+  4123 Widget B
+  2654 Widget C
+"""
+```
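
For readers less familiar with shell pipelines, the last command can be read stage by stage: `tail -n +2` skips the header, `cut -d',' -f2` keeps the second column, and `sort | uniq -c` counts each distinct value. A rough Python equivalent, run on the four sample rows shown above, is sketched here. The helper `count_column_values` is hypothetical and only illustrates what the pipeline computes.

```python
import csv
import io
from collections import Counter

# The header and first four rows from the `head -n 5` output above.
SAMPLE_CSV = """date,product,quantity,revenue
2024-01-01,Widget A,150,4500.00
2024-01-01,Widget B,200,8000.00
2024-01-02,Widget A,180,5400.00
2024-01-02,Widget C,120,3600.00
"""


def count_column_values(csv_text: str, column: int) -> Counter:
    """Count distinct values in one column (0-based index), skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, like `tail -n +2`
    return Counter(row[column] for row in reader if len(row) > column)


counts = count_column_values(SAMPLE_CSV, column=1)  # `cut -f2` is 1-based
for product, n in sorted(counts.items()):
    print(f"{n:7d} {product}")
```

The shell version has the advantage that the agent never loads the file into its own context; only the small summary comes back.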
+
+(3) Log File Analysis
+
+Application logs can be analyzed in real time to locate issues quickly:
+
+```python
+terminal = TerminalTool(workspace="/var/log")
+
+# View latest error logs
+print(terminal.run({"command": "tail -n 50 app.log | grep ERROR"}))
+
+# Count error type distribution
+print(terminal.run({"command": "grep ERROR app.log | awk '{print $4}' | sort | uniq -c | sort -rn"}))
+"""
+  245 DatabaseConnectionError
+  123 TimeoutException
+   67 ValidationError
+   34 AuthenticationError
+"""
+
+# Find logs for specific time period
+print(terminal.run({"command": "grep '2024-01-19 15:' app.log | tail -n 20"}))
+```
+
+(4) Codebase Analysis
+
+TerminalTool can also assist with code review and comprehension:
+
+```python
+terminal = TerminalTool(workspace="./codebase")
+
+# Count lines of code
+print(terminal.run({"command": "find . -name '*.py' -exec wc -l {} + | tail -n 1"}))
+
+# Find all TODO comments
+print(terminal.run({"command": "grep -rn 'TODO' --include='*.py'"}))
+
+# Find definition of specific function
+print(terminal.run({"command": "grep -rn 'def process_data' --include='*.py'"}))
+
+# View function implementation
+print(terminal.run({"command": "sed -n '/def process_data/,/^def /p' src/processor.py | head -n -1"}))
+```
+
+### 9.5.4 Collaboration with Other Tools
+
+The true power of TerminalTool lies in its collaborative use with MemoryTool, NoteTool, and ContextBuilder.
+
+(1) Collaboration with MemoryTool
+
+Information discovered by TerminalTool can be stored in the memory system:
+
+```python
+# Use TerminalTool to discover project structure
+structure = terminal.run({"command": "tree -L 2 src"})
+
+# Store in semantic memory
+memory_tool.execute(
+    "add",
+    content=f"Project structure:\n{structure}",
+    memory_type="semantic",
+    importance=0.8,
+    metadata={"type": "project_structure"}
+)
+```
+
+(2) Collaboration with NoteTool
+
+Important discoveries can be recorded as structured notes:
+
+```python
+# Discover a performance bottleneck
+log_analysis = terminal.run({"command": "grep 'slow query' app.log | tail -n 10"})
+
+# Record as blocker note
+note_tool.run({
+    "action": "create",
+    "title": "Database Slow Query Issue",
+    "content": f"## Problem Description\nFound multiple slow queries affecting system performance\n\n## Log Analysis\n```\n{log_analysis}\n```\n\n## Next Steps\n1. Analyze slow query SQL\n2. Add indexes\n3. Optimize query logic",
+    "note_type": "blocker",
+    "tags": ["performance", "database"]
+})
+```
+
+(3) Collaboration with ContextBuilder
+
+TerminalTool output can be part of the context:
+
+```python
+# Explore codebase
+code_structure = terminal.run({"command": "ls -R src"})
+recent_changes = terminal.run({"command": "git log --oneline -10"})
+
+# Convert to ContextPacket
+from hello_agents.context import ContextPacket
+from datetime import datetime
+
+packets = [
+    ContextPacket(
+        content=f"Codebase structure:\n{code_structure}",
+        timestamp=datetime.now(),
+        token_count=len(code_structure) // 4,
+        relevance_score=0.7,
+        metadata={"type": "code_structure", "source": "terminal"}
+    ),
+    ContextPacket(
+        content=f"Recent commits:\n{recent_changes}",
+        timestamp=datetime.now(),
+        token_count=len(recent_changes) // 4,
+        relevance_score=0.8,
+        metadata={"type": "git_history", "source": "terminal"}
+    )
+]
+
+# Include this information when building context
+context = context_builder.build(
+    user_query="How to refactor the user service module?",
+    custom_packets=packets
+)
+```
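
The packets above repeat the same construction and estimate `token_count` as `len(text) // 4`, a rough rule of thumb of about four characters per token that suits English text better than other languages. The sketch below factors that pattern into a helper. `Packet` is a hypothetical stand-in for `ContextPacket` so the example runs on its own, and `estimate_tokens` and `terminal_packet` are likewise illustrative names.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Packet:
    """Hypothetical stand-in mirroring the ContextPacket fields used above."""
    content: str
    timestamp: datetime
    token_count: int
    relevance_score: float
    metadata: dict = field(default_factory=dict)


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token, rounded up."""
    return (len(text) + 3) // 4


def terminal_packet(label: str, output: str, relevance: float, kind: str) -> Packet:
    """Wrap terminal output in a packet with a consistent label and metadata."""
    content = f"{label}:\n{output}"
    return Packet(
        content=content,
        timestamp=datetime.now(),
        token_count=estimate_tokens(content),  # counts the label too
        relevance_score=relevance,
        metadata={"type": kind, "source": "terminal"},
    )


packet = terminal_packet("Recent commits", "a1b2c3 Fix login", 0.8, "git_history")
print(packet.token_count, packet.metadata)
```

If the token budget is tight, replacing the heuristic with the tokenizer of the model in use would give a more faithful count.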
+
+## 9.6 Long-Horizon Agent in Practice: Codebase Maintenance Assistant
+
+Now, let's integrate ContextBuilder, NoteTool, and TerminalTool to build a complete long-horizon agent—**Codebase Maintenance Assistant**. This assistant can:
+
+1. Explore and understand codebase structure
+2. Record discovered issues and improvement points
+3. Track long-term refactoring tasks
+4. Maintain coherence under context window limitations
+
+### 9.6.1 Scenario Setup and Requirements Analysis
+
+**Business Scenario**
+
+Suppose we are maintaining a medium-sized Python web application. The codebase contains about 50 Python files, is built with the Flask framework, and covers data models, business logic, API interfaces, and other modules. It also carries technical debt that needs to be cleaned up gradually. In this scenario, we need an intelligent assistant that can:
+
+- Explore the codebase to understand the project structure, dependencies, and code style
+- Identify issues in the code, such as duplication, excessive complexity, and missing tests
+- Track task progress by recording to-do items, completed work, and blockers
+- Provide coherent refactoring recommendations based on historical context
+
+**Challenges and Solutions**
+
+This scenario faces several typical long-horizon task challenges. First is the problem of information exceeding the context window—the entire codebase may contain tens of thousands of lines of code, which cannot be placed in the context window all at once. We solve this by using TerminalTool for instant, on-demand code exploration, viewing specific files only when needed. Second is the cross-session state management challenge—refactoring tasks may last for days and need to maintain progress across multiple sessions. We address this by using NoteTool to record phased progress, to-do items, and key decisions. Finally, there's the issue of context quality and relevance—each conversation needs to review relevant historical information but cannot be overwhelmed by irrelevant information. We use ContextBuilder to intelligently filter and organize context, ensuring high signal density.
+
+### 9.6.2 System Architecture Design
+
+Our codebase maintenance assistant adopts a three-layer architecture, as shown in Figure 9.3:
+
+<div align="center">
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/9-figures/9-3.png" alt="" width="85%"/>
+  <p>Figure 9.3 Three-layer architecture of codebase maintenance assistant</p>
+</div>
+
+### 9.6.3 Core Implementation
+
+Now let's implement the core class of this system:
+
+```python
+from typing import Dict, Any, List, Optional
+from datetime import datetime
+import json
+
+from hello_agents import HelloAgentsLLM
+from hello_agents.context import ContextBuilder, ContextConfig, ContextPacket
+from hello_agents.tools import MemoryTool, NoteTool, TerminalTool
+from hello_agents.core.message import Message
+
+
+class CodebaseMaintainer:
+    """Codebase Maintenance Assistant - Long-horizon agent example
+
+    Integrates ContextBuilder + NoteTool + TerminalTool + MemoryTool
+    Implements cross-session codebase maintenance task management
+    """
+
+    def __init__(
+        self,
+        project_name: str,
+        codebase_path: str,
+        llm: Optional[HelloAgentsLLM] = None
+    ):
+        self.project_name = project_name
+        self.codebase_path = codebase_path
+        self.session_id = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+        # Initialize LLM
+        self.llm = llm or HelloAgentsLLM()
+
+        # Initialize tools
+        self.memory_tool = MemoryTool(user_id=project_name)
+        self.note_tool = NoteTool(workspace=f"./{project_name}_notes")
+        self.terminal_tool = TerminalTool(workspace=codebase_path, timeout=60)
+
+        # Initialize context builder
+        self.context_builder = ContextBuilder(
+            memory_tool=self.memory_tool,
+            rag_tool=None,  # This case does not use RAG
+            config=ContextConfig(
+                max_tokens=4000,
+                reserve_ratio=0.15,
+                min_relevance=0.2,
+                enable_compression=True
+            )
+        )
+
+        # Conversation history
+        self.conversation_history: List[Message] = []
+
+        # Statistics
+        self.stats = {
+            "session_start": datetime.now(),
+            "commands_executed": 0,
+            "notes_created": 0,
+            "issues_found": 0
+        }
+
+        print(f"✅ Codebase maintenance assistant initialized: {project_name}")
+        print(f"📁 Working directory: {codebase_path}")
+        print(f"🆔 Session ID: {self.session_id}")
+
+    def run(self, user_input: str, mode: str = "auto") -> str:
+        """Run assistant
+
+        Args:
+            user_input: User input
+            mode: Running mode
+                - "auto": Automatically decide whether to use tools
+                - "explore": Focus on code exploration
+                - "analyze": Focus on problem analysis
+                - "plan": Focus on task planning
+
+        Returns:
+            str: Assistant's answer
+        """
+        print(f"\n{'='*80}")
+        print(f"👤 User: {user_input}")
+        print(f"{'='*80}\n")
+
+        # Step 1: Execute preprocessing based on mode
+        pre_context = self._preprocess_by_mode(user_input, mode)
+
+        # Step 2: Retrieve relevant notes
+        relevant_notes = self._retrieve_relevant_notes(user_input)
+        note_packets = self._notes_to_packets(relevant_notes)
+
+        # Step 3: Build optimized context
+        context = self.context_builder.build(
+            user_query=user_input,
+            conversation_history=self.conversation_history,
+            system_instructions=self._build_system_instructions(mode),
+            custom_packets=note_packets + pre_context
+        )
+
+        # Step 4: Call LLM
+        print("🤖 Thinking...")
+        response = self.llm.invoke(context)
+
+        # Step 5: Post-processing
+        self._postprocess_response(user_input, response)
+
+        # Step 6: Update conversation history
+        self._update_history(user_input, response)
+
+        print(f"\n🤖 Assistant: {response}\n")
+        print(f"{'='*80}\n")
+
+        return response
+
+    def _preprocess_by_mode(
+        self,
+        user_input: str,
+        mode: str
+    ) -> List[ContextPacket]:
+        """Execute preprocessing based on mode, collect relevant information"""
+        packets = []
+
+        if mode in ("explore", "auto"):
+            # Explore and auto modes: automatically view project structure
+            print("🔍 Exploring codebase structure...")
+
+            structure = self.terminal_tool.run({"command": "find . -type f -name '*.py' | head -n 20"})
+            self.stats["commands_executed"] += 1
+
+            packets.append(ContextPacket(
+                content=f"[Codebase Structure]\n{structure}",
+                timestamp=datetime.now(),
+                token_count=len(structure) // 4,
+                relevance_score=0.6,
+                metadata={"type": "code_structure", "source": "terminal"}
+            ))
+
+        if mode == "analyze":
+            # Analyze mode: Check code complexity and issues
+            print("📊 Analyzing code quality...")
+
+            # Count lines of code
+            loc = self.terminal_tool.run({"command": "find . -name '*.py' -exec wc -l {} + | tail -n 1"})
+
+            # Find TODO and FIXME
+            todos = self.terminal_tool.run({"command": "grep -rn 'TODO\\|FIXME' --include='*.py' | head -n 10"})
+
+            self.stats["commands_executed"] += 2
+
+            packets.append(ContextPacket(
+                content=f"[Code Statistics]\n{loc}\n\n[To-Do Items]\n{todos}",
+                timestamp=datetime.now(),
+                token_count=(len(loc) + len(todos)) // 4,
+                relevance_score=0.7,
+                metadata={"type": "code_analysis", "source": "terminal"}
+            ))
+
+        if mode == "plan":
+            # Planning mode: Load recent notes
+            print("📋 Loading task planning...")
+
+            task_notes = self.note_tool.run({
+                "action": "list",
+                "note_type": "task_state",
+                "limit": 3
+            })
+
+            if task_notes:
+                content = "\n".join([f"- {note['title']}" for note in task_notes])
+                packets.append(ContextPacket(
+                    content=f"[Current Tasks]\n{content}",
+                    timestamp=datetime.now(),
+                    token_count=len(content) // 4,
+                    relevance_score=0.8,
+                    metadata={"type": "task_plan", "source": "notes"}
+                ))
+
+        return packets
+
+    def _retrieve_relevant_notes(self, query: str, limit: int = 3) -> List[Dict]:
+        """Retrieve relevant notes"""
+        try:
+            # Prioritize retrieving blockers
+            blockers = self.note_tool.run({
+                "action": "list",
+                "note_type": "blocker",
+                "limit": 2
+            })
+
+            # Search relevant notes
+            search_results = self.note_tool.run({
+                "action": "search",
+                "query": query,
+                "limit": limit
+            })
+
+            # Merge and deduplicate by note ID (blockers come first in the result)
+            combined = (blockers or []) + (search_results or [])
+            all_notes = {note.get('note_id') or note.get('id'): note for note in combined}
+            return list(all_notes.values())[:limit]
+
+        except Exception as e:
+            print(f"[WARNING] Note retrieval failed: {e}")
+            return []
+
+    def _notes_to_packets(self, notes: List[Dict]) -> List[ContextPacket]:
+        """Convert notes to context packets"""
+        packets = []
+
+        for note in notes:
+            # Set different relevance scores based on note type
+            relevance_map = {
+                "blocker": 0.9,
+                "action": 0.8,
+                "task_state": 0.75,
+                "conclusion": 0.7
+            }
+
+            note_type = note.get('type', 'general')
+            relevance = relevance_map.get(note_type, 0.6)
+
+            content = f"[Note: {note.get('title', 'Untitled')}]\nType: {note_type}\n\n{note.get('content', '')}"
+
+            packets.append(ContextPacket(
+                content=content,
+                timestamp=datetime.fromisoformat(note.get('updated_at', datetime.now().isoformat())),
+                token_count=len(content) // 4,
+                relevance_score=relevance,
+                metadata={
+                    "type": "note",
+                    "note_type": note_type,
+                    "note_id": note.get('note_id') or note.get('id')
+                }
+            ))
+
+        return packets
+
+    def _build_system_instructions(self, mode: str) -> str:
+        """Build system instructions"""
+        base_instructions = f"""You are the codebase maintenance assistant for the {self.project_name} project.
+
+Your core capabilities:
+1. Use TerminalTool to explore codebase (ls, cat, grep, find, etc.)
+2. Use NoteTool to record discoveries and tasks
+3. Provide coherent recommendations based on historical notes
+
+Current session ID: {self.session_id}
+"""
+
+        mode_specific = {
+            "explore": """
+Current mode: Explore codebase
+
+You should:
+- Actively use terminal commands to understand code structure
+- Identify key modules and files
+- Record project architecture in notes
+""",
+            "analyze": """
+Current mode: Analyze code quality
+
+You should:
+- Find code issues (duplication, complexity, TODOs, etc.)
+- Evaluate code quality
+- Record discovered issues as blocker or action notes
+""",
+            "plan": """
+Current mode: Task planning
+
+You should:
+- Review historical notes and tasks
+- Formulate next action plan
+- Update task status notes
+""",
+            "auto": """
+Current mode: Auto decision
+
+You should:
+- Flexibly choose strategies based on user needs
+- Use tools when needed
+- Maintain professionalism and practicality in responses
+"""
+        }
+
+        return base_instructions + mode_specific.get(mode, mode_specific["auto"])
+
+    def _postprocess_response(self, user_input: str, response: str):
+        """Post-processing: Analyze response, automatically record important information"""
+
+        # If issues found, automatically create blocker note
+        if any(keyword in response.lower() for keyword in ["issue", "bug", "error", "blocker", "problem"]):
+            try:
+                self.note_tool.run({
+                    "action": "create",
+                    "title": f"Issue found: {user_input[:30]}...",
+                    "content": f"## User Input\n{user_input}\n\n## Issue Analysis\n{response[:500]}...",
+                    "note_type": "blocker",
+                    "tags": [self.project_name, "auto_detected", self.session_id]
+                })
+                self.stats["notes_created"] += 1
+                self.stats["issues_found"] += 1
+                print("📝 Automatically created issue note")
+            except Exception as e:
+                print(f"[WARNING] Failed to create note: {e}")
+
+        # If task planning, automatically create action note
+        elif any(keyword in user_input.lower() for keyword in ["plan", "next", "task", "todo"]):
+            try:
+                self.note_tool.run({
+                    "action": "create",
+                    "title": f"Task planning: {user_input[:30]}...",
+                    "content": f"## Discussion\n{user_input}\n\n## Action Plan\n{response[:500]}...",
+                    "note_type": "action",
+                    "tags": [self.project_name, "planning", self.session_id]
+                })
+                self.stats["notes_created"] += 1
+                print("📝 Automatically created action plan note")
+            except Exception as e:
+                print(f"[WARNING] Failed to create note: {e}")
+
+    def _update_history(self, user_input: str, response: str):
+        """Update conversation history"""
+        self.conversation_history.append(
+            Message(content=user_input, role="user", timestamp=datetime.now())
+        )
+        self.conversation_history.append(
+            Message(content=response, role="assistant", timestamp=datetime.now())
+        )
+
+        # Limit history length (keep recent 10 rounds of conversation)
+        if len(self.conversation_history) > 20:
+            self.conversation_history = self.conversation_history[-20:]
+
+    # === Convenience methods ===
+
+    def explore(self, target: str = ".") -> str:
+        """Explore codebase"""
+        return self.run(f"Please explore the code structure of {target}", mode="explore")
+
+    def analyze(self, focus: str = "") -> str:
+        """Analyze code quality"""
+        query = "Please analyze code quality" + (f", focusing on {focus}" if focus else "")
+        return self.run(query, mode="analyze")
+
+    def plan_next_steps(self) -> str:
+        """Plan next steps"""
+        return self.run("Based on current progress, plan next steps", mode="plan")
+
+    def execute_command(self, command: str) -> str:
+        """Execute terminal command"""
+        result = self.terminal_tool.run({"command": command})
+        self.stats["commands_executed"] += 1
+        return result
+
+    def create_note(
+        self,
+        title: str,
+        content: str,
+        note_type: str = "general",
+        tags: Optional[List[str]] = None
+    ) -> str:
+        """Create note"""
+        result = self.note_tool.run({
+            "action": "create",
+            "title": title,
+            "content": content,
+            "note_type": note_type,
+            "tags": tags or [self.project_name]
+        })
+        self.stats["notes_created"] += 1
+        return result
+
+    def get_stats(self) -> Dict[str, Any]:
+        """Get statistics"""
+        duration = (datetime.now() - self.stats["session_start"]).total_seconds()
+
+        # Get note summary
+        try:
+            note_summary = self.note_tool.run({"action": "summary"})
+        except Exception:  # fall back to an empty summary if the note tool fails
+            note_summary = {}
+
+        return {
+            "session_info": {
+                "session_id": self.session_id,
+                "project": self.project_name,
+                "duration_seconds": duration
+            },
+            "activity": {
+                "commands_executed": self.stats["commands_executed"],
+                "notes_created": self.stats["notes_created"],
+                "issues_found": self.stats["issues_found"]
+            },
+            "notes": note_summary
+        }
+
+    def generate_report(self, save_to_file: bool = True) -> Dict[str, Any]:
+        """Generate session report"""
+        report = self.get_stats()
+
+        if save_to_file:
+            report_file = f"maintainer_report_{self.session_id}.json"
+            with open(report_file, 'w', encoding='utf-8') as f:
+                json.dump(report, f, ensure_ascii=False, indent=2, default=str)
+            report["report_file"] = report_file
+            print(f"📄 Report saved: {report_file}")
+
+        return report
+```
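
The automatic note creation in `_postprocess_response` depends on the order of its two keyword checks: the issue keywords are tested against the response first, and the planning keywords are tested against the user input only if that first check fails. The sketch below isolates that decision so it can be tried without an LLM. The function name `auto_note_type` is hypothetical; the keyword lists are copied from the class above.

```python
from typing import Optional

ISSUE_KEYWORDS = ("issue", "bug", "error", "blocker", "problem")
PLAN_KEYWORDS = ("plan", "next", "task", "todo")


def auto_note_type(user_input: str, response: str) -> Optional[str]:
    """Mirror the branch order of _postprocess_response and return the note type."""
    if any(keyword in response.lower() for keyword in ISSUE_KEYWORDS):
        return "blocker"  # issue keywords in the response win
    if any(keyword in user_input.lower() for keyword in PLAN_KEYWORDS):
        return "action"   # only reached when no issue keyword matched
    return None           # no note is created


print(auto_note_type("Plan next steps", "Here is the plan."))
print(auto_note_type("Plan next steps", "First fix the error handling."))
print(auto_note_type("Say hello", "No debugging needed."))
```

Two consequences follow from this logic: at most one note is created per call, and plain substring matching means a word such as "debugging" counts as "bug". A stricter variant could match whole words or ask the LLM to emit an explicit note type.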
+
+### 9.6.4 Complete Usage Example
+
+Now let's demonstrate the workflow of this long-horizon agent through a complete usage scenario:
+
+```python
+# ========== Initialize Assistant ==========
+
+from hello_agents import HelloAgentsLLM
+
+maintainer = CodebaseMaintainer(
+    project_name="my_flask_app",
+    codebase_path="./my_flask_app",
+    llm=HelloAgentsLLM()
+)
+
+
+# ========== Day 1: Explore Codebase ==========
+# 1. Initial exploration
+response = maintainer.explore()
+"""
+🔍 Exploring codebase structure...
+🤖 Thinking...
+
+🤖 Assistant: I have scanned the codebase, this is a Flask web application, mainly containing the following modules:
+
+📁 Project structure:
+- app/
+  ├── models/      # Data models (User, Product, Order, etc.)
+  ├── routes/      # API routes
+  ├── services/    # Business logic
+  └── utils/       # Utility functions
+- tests/           # Test files
+- migrations/      # Database migrations
+- config.py        # Configuration file
+
+Total approximately 3,500 lines of Python code.
+
+Recommended next steps:
+1. Review core model definitions
+2. Check routes and API design
+3. Evaluate test coverage
+"""
+
+# 2. Deep dive into a specific module
+response = maintainer.run("Please analyze the data model design in the app/models/ directory")
+"""
+🤖 Assistant: I reviewed the app/models/ directory and found the following data models:
+
+1. User (user.py)
+   - Fields: id, username, email, password_hash, created_at
+   - Relationships: One-to-many Order
+   - Issue: ⚠️ Missing index, email field not set with unique constraint
+
+2. Product (product.py)
+   - Fields: id, name, description, price, stock
+   - Issue: ✅ Well designed
+
+3. Order (order.py)
+   - Fields: id, user_id, product_id, quantity, status
+   - Issue: ⚠️ Missing creation time field, not conducive to data analysis
+
+Recommendations:
+1. Add unique constraint to User.email
+2. Add created_at and updated_at fields to Order
+3. Consider adding soft delete support
+
+I have recorded these issues as blocker notes.
+"""
+# [Automatically created blocker note]
+
+
+# ========== Day 2: Analyze Code Quality ==========
+
+# 1. Overall quality analysis
+response = maintainer.analyze()
+"""
+📊 Analyzing code quality...
+🤖 Thinking...
+
+🤖 Assistant: Code quality analysis report:
+
+📊 Code statistics:
+- Total lines: 3,542 lines
+- Average file length: approximately 150 lines
+- Largest file: services/order_service.py (456 lines)
+
+⚠️ Issues found:
+
+1. Code duplication (HIGH):
+   - user_service.py and product_service.py have similar CRUD logic
+   - Recommendation: Extract base class BaseService
+
+2. Excessive complexity (MEDIUM):
+   - order_service.py::process_order method has 8 levels of nesting
+   - Recommendation: Split into multiple small methods
+
+3. Lack of tests (HIGH):
+   - Test coverage only 45%
+   - services/ directory has no tests at all
+
+4. TODO/FIXME (12 items):
+   - Most are "optimize later" type comments
+   - Recommendation: Clean up or convert to formal tasks
+
+I have recorded these issues as notes and marked priorities.
+"""
+# [Automatically created multiple blocker notes]
+
+# 2. View specific issue
+response = maintainer.run("Please review the process_order method in order_service.py and provide refactoring recommendations")
+"""
+🤖 Assistant: I reviewed the code of the process_order method:
+
+【Original Code】
+def process_order(self, order_id):
+    order = Order.query.get(order_id)
+    if order:
+        if order.status == 'pending':
+            product = Product.query.get(order.product_id)
+            if product:
+                if product.stock >= order.quantity:
+                    # ... 8 levels of nesting continue ...
+
+【Problem Analysis】
+1. Deep nesting leads to poor readability
+2. Lack of error handling
+3. Business logic mixed with data access
+
+【Refactoring Recommendations】
+def process_order(self, order_id):
+    # 1. Early return pattern, reduce nesting
+    order = self._get_order(order_id)
+    if not order or order.status != 'pending':
+        return False
+
+    # 2. Extract methods, single responsibility
+    if not self._check_stock(order):
+        return False
+
+    # 3. Use transactions to ensure consistency
+    with db.transaction():
+        self._update_stock(order)
+        self._update_order_status(order, 'processing')
+
+    return True
+
+【Next Steps】
+1. I have created a refactoring plan note
+2. Recommend writing test cases first
+3. Refactor gradually, keeping functionality unchanged
+"""
+# [Automatically created action note]
+
+
+# ========== Day 3: Plan Refactoring Tasks ==========
+
+# 1. Review progress
+response = maintainer.plan_next_steps()
+"""
+📋 Loading task planning...
+🤖 Thinking...
+
+🤖 Assistant: Based on the exploration and analysis from the past two days, I have organized refactoring task priorities:
+
+🔴 High priority (complete this week):
+1. [blocker] Add unique constraint to User.email
+   - Risk: Low, just database migration
+   - Effort: 1 hour
+
+2. [blocker] Extract BaseService to reduce code duplication
+   - Risk: Medium, affects multiple service classes
+   - Effort: Half a day
+
+3. [blocker] Refactor process_order method
+   - Risk: High, core business logic
+   - Effort: 1 day (including writing tests)
+
+🟡 Medium priority (next week):
+4. Add unit tests for services/
+5. Clean up TODO comments
+6. Add time fields to Order
+
+🟢 Low priority (TBD):
+7. Performance optimization
+8. Documentation updates
+
+Recommendations:
+- This week, first resolve data model issues (tasks 1, 6)
+- Next week, conduct code refactoring (tasks 2, 3)
+- Continuously increase test coverage (task 4)
+
+I have updated task status notes, you can check progress anytime.
+"""
+
+# 2. Manually create detailed refactoring plan
+maintainer.create_note(
+    title="Weekly Refactoring Plan - Week 1",
+    content="""## Objectives
+Complete optimization of data model layer
+
+## Task Checklist
+- [ ] Add unique constraint to User.email
+- [ ] Add created_at, updated_at fields to Order
+- [ ] Write database migration scripts
+- [ ] Update related test cases
+
+## Schedule
+- Monday: Design migration scripts
+- Tuesday-Wednesday: Execute migration and test
+- Thursday: Update test cases
+- Friday: Code Review
+
+## Risks
+- Database migration may affect production environment, needs to be executed during off-peak hours
+- Existing data may have duplicate emails, need to clean up first
+""",
+    note_type="task_state",
+    tags=["refactoring", "week1", "high_priority"]
+)
+
+print("✅ Created detailed refactoring plan")
+
+
+# ========== One Week Later: Check Progress ==========
+
+# View note summary
+summary = maintainer.note_tool.run({"action": "summary"})
+print("📊 Note summary:")
+print(json.dumps(summary, indent=2, ensure_ascii=False))
+"""
+{
+  "total_notes": 8,
+  "type_distribution": {
+    "blocker": 3,
+    "action": 2,
+    "task_state": 2,
+    "conclusion": 1
+  },
+  "recent_notes": [
+    {
+      "id": "note_20250119_160000_7",
+      "title": "Weekly Refactoring Plan - Week 1",
+      "type": "task_state",
+      "updated_at": "2025-01-19T16:00:00"
+    },
+    ...
+  ]
+}
+"""
+
+# Generate complete report
+report = maintainer.generate_report()
+print("\n📄 Session report:")
+print(json.dumps(report, indent=2, ensure_ascii=False))
+"""
+{
+  "session_info": {
+    "session_id": "session_20250119_150000",
+    "project": "my_flask_app",
+    "duration_seconds": 172800  # 2 days
+  },
+  "activity": {
+    "commands_executed": 24,
+    "notes_created": 8,
+    "issues_found": 3
+  },
+  "notes": { ... }
+}
+"""
+```
+
+### 9.6.5 Analysis of Results
+
+This case study highlights several key characteristics of long-horizon agents. The first is cross-session coherence: NoteTool lets the agent keep a task coherent across multiple days and sessions. Issues found on day one are automatically taken into account in the day-two analysis, the day-three plan synthesizes the findings of the previous two days, and the complete history is still available a week later. The second is intelligent context management: ContextBuilder assembles high-quality context for each conversation by automatically gathering relevant notes (especially blocker notes), adjusting its preprocessing strategy to the conversation mode, and selecting the most relevant information within the token budget.
+
+The third characteristic is instant file system access: TerminalTool supports flexible code exploration without pre-indexing the entire codebase, can display the content of a specific file immediately, and supports complex text processing (grep, awk, etc.). The fourth is automated knowledge management: the system creates blocker notes when issues are found, creates action notes when plans are discussed, and stores key information in the memory system, all without manual intervention. The last is human-machine collaboration: the agent can carry out exploration and analysis on its own, while humans can intervene and steer it through the note system, for example by manually creating detailed planning notes.
+
+This basic framework can be extended in several directions: integrating RAGTool to build a vector index of the codebase for semantic retrieval, splitting the agent into specialized explorers, analyzers, and planners for multi-agent collaboration, integrating testing tools to verify refactoring results automatically, running git commands through TerminalTool to track code changes, or building a visual interface with Gradio or Streamlit.
+
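The "selecting the most relevant information within the token budget" step discussed in this section can be sketched roughly as follows. This is a minimal illustration rather than the framework's actual code: `Packet`, `recency`, and `select` are hypothetical names, and the 0.7/0.3 weights, the exponential decay, and the 24-hour time constant are assumptions. Only the overall shape (a weighted relevance-plus-recency score, a `min_relevance` filter, a recency value clamped to [0.1, 1.0], and greedy filling from the highest score down) follows the chapter's description of the Select stage.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Packet:
    """Hypothetical stand-in for one candidate piece of context."""
    content: str
    timestamp: datetime
    token_count: int
    relevance: float  # assumed to be precomputed, in [0, 1]


def recency(timestamp: datetime, now: datetime, tau_hours: float = 24.0) -> float:
    """Exponential decay with age (an assumption), clamped to [0.1, 1.0]."""
    age_hours = max(0.0, (now - timestamp).total_seconds() / 3600)
    return max(0.1, min(1.0, math.exp(-age_hours / tau_hours)))


def select(packets, budget, now, w_rel=0.7, w_rec=0.3, min_relevance=0.1):
    """Greedy selection: best combined score first, skipping packets that do not fit."""
    # Drop low-relevance candidates before ranking.
    eligible = [p for p in packets if p.relevance >= min_relevance]
    ranked = sorted(
        eligible,
        key=lambda p: w_rel * p.relevance + w_rec * recency(p.timestamp, now),
        reverse=True,
    )
    chosen, used = [], 0
    for p in ranked:
        if used + p.token_count <= budget:
            chosen.append(p)
            used += p.token_count
    return chosen
```

A greedy pass like this is not guaranteed to be optimal for the budget, but it is simple to reason about and to debug, which matches the "minimal rules" goal stated for ContextBuilder.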
+## 9.7 Chapter Summary
+
+In this chapter, we explored in depth the theoretical foundations and engineering practice of context engineering:
+
+### Theoretical Level
+
+1. **Essence of Context Engineering**: The evolution from "prompt engineering" to "context engineering"; at its core, it is about managing a limited attention budget
+2. **Context Rot**: Understanding the performance degradation caused by long contexts and recognizing that context is a scarce resource
+3. **Three Major Strategies**: Compaction, structured note-taking, and sub-agent architectures
+
+### Engineering Practice
+
+1. **ContextBuilder**: Implements the GSSC pipeline and provides a unified context management interface
+2. **NoteTool**: Uses a hybrid Markdown+YAML format to support structured long-term memory
+3. **TerminalTool**: A secure command-line tool that supports instant file system access
+4. **Long-Horizon Agent**: Integrates the three tools to build a cross-session codebase maintenance assistant
+
+### Core Takeaways
+
+- **Layered Design**: Instant access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool)
+- **Intelligent Filtering**: Scoring mechanism based on relevance and recency
+- **Security First**: Multi-layer security mechanisms ensure system stability
+- **Human-Machine Collaboration**: Balance between automation and controllability
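
The "Security First" takeaway rests on the layered checks described for TerminalTool (a command whitelist and a working-directory sandbox, followed by timeout and output-size limits). The first two layers might be sketched as below. This is an illustrative sketch under stated assumptions, not the tool's real implementation: `check_command`, `ALLOWED`, and `FORBIDDEN_CHARS` are hypothetical names, the whitelist is a shortened example, and the function only validates a command string without executing it.

```python
import shlex
from pathlib import Path

# Hypothetical read-only whitelist; the real tool's list may differ.
ALLOWED = {"ls", "cat", "head", "tail", "grep", "find", "wc"}
# Conservative assumption: reject anything that could chain or redirect commands.
FORBIDDEN_CHARS = set(";&`$<>")


def check_command(command: str, workspace: Path) -> tuple:
    """Return (ok, reason) for a command string; nothing is executed here."""
    if FORBIDDEN_CHARS & set(command):
        return False, "shell control characters are not allowed"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"cannot parse command: {exc}"

    root = workspace.resolve()
    segment = []
    # A trailing "|" sentinel flushes the last pipeline segment.
    for token in tokens + ["|"]:
        if token != "|":
            segment.append(token)
            continue
        if not segment:
            return False, "empty command"
        if segment[0] not in ALLOWED:
            return False, f"command not allowed: {segment[0]}"
        for arg in segment[1:]:
            if arg.startswith("-"):
                continue  # treated as an option, not a path
            # Every other argument is treated as a potential path.
            target = (root / arg).resolve()
            if target != root and root not in target.parents:
                return False, f"path escapes workspace: {arg}"
        segment = []
    return True, "ok"
```

The check deliberately errs on the side of rejecting: every non-option argument is treated as a possible path, so a search pattern that begins with `/` or contains `$` is refused even if it is harmless. Timeouts and output truncation would be applied separately at execution time.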
+
+After working through this chapter, you have not only learned the core techniques of context engineering but, more importantly, understood how to build agent systems that stay coherent and effective over long time spans. These skills form an important foundation for building production-grade agent applications.
+
+In the next chapter, we will explore agent communication protocols and learn how to enable agents to interact more broadly with the external world.
+
+## Exercises
+
+> **Note**: Some exercises do not have standard answers. The focus is on cultivating learners' comprehensive understanding and practical ability in context engineering and long-horizon task management.
+
+1. This chapter introduced the difference between context engineering and prompt engineering. Please analyze:
+
+   - Section 9.1 mentioned that "context must be viewed as a limited resource with diminishing marginal returns". Please explain what the "context rot" phenomenon is. Why do we still need to manage context carefully even when models support 100K or even 200K context windows?
+   - Suppose you want to build a "code review assistant" that needs to analyze a codebase containing 50 files. Please compare two strategies: (1) Load all file content into context at once; (2) Use JIT (Just-in-time) context, retrieving files on demand through tools. Analyze the advantages, disadvantages, and applicable scenarios of each.
+   - Section 9.2.1 mentioned two extreme pitfalls of system prompts: "over-hardcoding" and "too vague". Please give a practical example of each and explain how to find the right balance.
+
+2. The GSSC (Gather-Select-Structure-Compress) pipeline is the core technology of this chapter. Please think deeply:
+
+   > **Note**: This is a hands-on exercise; working through it in practice is recommended.
+
+   - In the ContextBuilder implementation in Section 9.3, the four stages each have different responsibilities. Please analyze: If a certain stage fails (such as the Select stage selecting irrelevant information, or the Compress stage over-compressing leading to information loss), what impact will it have on the final agent performance?
+   - Based on the code in Section 9.3.4, add a "context quality assessment" function to ContextBuilder: After each context build, automatically evaluate the information density, relevance, and completeness of the context, and provide optimization suggestions.
+   - The Compress stage in the GSSC pipeline uses an LLM for intelligent summarization. Please think: Under what circumstances might simple truncation or a sliding-window strategy be more appropriate than LLM summarization? Design a hybrid compression strategy that combines the advantages of multiple compression methods.
+
+3. NoteTool and TerminalTool are key tools supporting long-horizon tasks. Based on Sections 9.4 and 9.5, please complete the following extension practices:
+
+   > **Note**: This is a hands-on exercise; working through it in practice is recommended.
+
+   - NoteTool uses a hierarchical note system (project notes, task notes, temporary notes). Please design an "automatic note organization" mechanism: When temporary notes accumulate to a certain number, the agent can automatically analyze these notes, promote important information to task notes or project notes, and clean up redundant content.
+   - TerminalTool provides file system operation capabilities, but Section 9.5.2 emphasizes security design. Please analyze: Are the current security mechanisms (path validation, command whitelist, permission check) sufficient? If the agent needs to access sensitive files or execute dangerous operations, how should a "human-machine collaborative approval" process be designed?
+   - Combining NoteTool and TerminalTool, design an "intelligent code refactoring assistant" that can analyze the codebase structure, record refactoring plans, execute refactoring operations step by step, and track progress and encountered problems in notes. Please draw a complete workflow diagram.
+
+4. In the "long-horizon task management" case in Section 9.6, we saw the value of context engineering in practical applications. Please analyze in depth:
+
+   - The case uses a "layered context management" strategy: instant access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool). Please analyze: How should these three layers coordinate? What information should be placed in which layer? How to avoid information redundancy and inconsistency?
+   - Suppose task execution is interrupted (for example by a system crash or a network disconnection) and the agent needs to recover its state from notes and continue. Please design a "resume from breakpoint" mechanism: How can sufficient state information be recorded in notes? How can the recovered state be verified as correct?
+   - Long-horizon tasks often involve parallel or serial execution of multiple subtasks. Please design a "task dependency management" system that can express dependency relationships between tasks (such as "Task B must be executed after Task A is completed") and automatically schedule the execution order. How should this system integrate with NoteTool?
+
+5. This chapter repeatedly mentioned the concept of "progressive disclosure". Please think:
+
+   - In Section 9.2.2, progressive disclosure is described as "each interaction step produces new context, which in turn guides the next decision". Please design a specific application scenario (such as academic paper writing, complex problem debugging), demonstrating how progressive disclosure helps agents complete tasks more efficiently.
+   - A potential risk of progressive disclosure is "inefficient exploration": The agent may waste time on unimportant details or miss key information. Please design an "exploration guidance" mechanism: Through heuristic rules or metacognitive strategies, help the agent make smarter decisions about "what to explore next".
+   - Compare "progressive disclosure" with traditional "load all context at once": In what types of tasks does the former have obvious advantages? In what types of tasks might the latter be more appropriate? Please provide at least 3 examples of different types of tasks.
+
+## References
+
+[1] Anthropic. Effective Context Engineering for AI Agents. `https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents`
+
+[2] David Kim. Context-Engineering (GitHub). `https://github.com/davidkimai/Context-Engineering`
+

+ 147 - 143
docs/chapter9/第九章 上下文工程.md

@@ -1,3 +1,7 @@
+<div align="right">
+  <a href="./Chapter9-Context-Engineering.md">English</a> | 中文
+</div>
+
 # 第九章 上下文工程
 
 在前面的章节中,我们已经为智能体引入了记忆系统与RAG。然而,要让智能体在真实复杂场景中稳定地“思考”与“行动”,仅有记忆与检索还不够——我们需要一套工程化方法,持续、系统地为模型构造恰当的“上下文”。这就是本章的主题:上下文工程(Context Engineering)。它关注的是“在每一次模型调用前,如何以可复用、可度量、可演进的方式,拼装并优化输入上下文”,从而提升正确性、鲁棒性与效率<sup>[1][2]</sup>。
@@ -122,33 +126,33 @@ pip install "hello-agents[all]==0.2.7"
 
 即便模型能力持续提升,“在长交互中维持连贯性与聚焦”仍是构建强健智能体的核心挑战。谨慎而系统的上下文工程将长期保持其关键价值。
 
-## 9.3 在 Hello-Agents 中的实践:ContextBuilder
+## 9.3 在 Hello-Agents 中的实践ContextBuilder
 
 本节将详细介绍 HelloAgents 框架中的上下文工程实践。我们将从设计动机、核心数据结构、实现细节到完整案例,逐步展示如何构建一个生产级的上下文管理系统。ContextBuilder 的设计理念是"简单高效",去除不必要的复杂性,统一以"相关性+新近性"的分数进行选择,符合 Agent 模块化与可维护性的工程取向。
 
 ### 9.3.1 设计动机与目标
 
-在构建 ContextBuilder 之前,我们首先需要明确其设计目标和核心价值。一个优秀的上下文管理系统应该解决以下几个关键问题:
+在构建 ContextBuilder 之前,我们首先需要明确其设计目标和核心价值。一个优秀的上下文管理系统应该解决以下几个关键问题
 
-1. <strong>统一入口</strong>:将"获取(Gather)- 选择(Select)- 结构化(Structure)- 压缩(Compress)"抽象为可复用流水线,减少在 Agent 实现中的重复模板代码。这种统一的接口设计让开发者无需在每个 Agent 中重复编写上下文管理逻辑。
+1. <strong>统一入口</strong>将"获取(Gather)- 选择(Select)- 结构化(Structure)- 压缩(Compress)"抽象为可复用流水线,减少在 Agent 实现中的重复模板代码。这种统一的接口设计让开发者无需在每个 Agent 中重复编写上下文管理逻辑。
 
-2. <strong>稳定形态</strong>:输出固定骨架的上下文模板,便于调试、A/B 测试与评估。我们采用了分区组织的模板结构:
-   - `[Role & Policies]`:明确 Agent 的角色定位和行为准则
-   - `[Task]`:当前需要完成的具体任务
-   - `[State]`:Agent 的当前状态和上下文信息
-   - `[Evidence]`:从外部知识库检索的证据信息
-   - `[Context]`:历史对话和相关记忆
-   - `[Output]`:期望的输出格式和要求
+2. <strong>稳定形态</strong>输出固定骨架的上下文模板,便于调试、A/B 测试与评估。我们采用了分区组织的模板结构
+   - `[Role & Policies]`明确 Agent 的角色定位和行为准则
+   - `[Task]`当前需要完成的具体任务
+   - `[State]`Agent 的当前状态和上下文信息
+   - `[Evidence]`从外部知识库检索的证据信息
+   - `[Context]`历史对话和相关记忆
+   - `[Output]`期望的输出格式和要求
 
-3. <strong>预算守护</strong>:在 token 预算内尽量保留高价值信息,对超限上下文提供兜底压缩策略。这确保了即使在信息量巨大的场景下,系统也能稳定运行。
+3. <strong>预算守护</strong>在 token 预算内尽量保留高价值信息,对超限上下文提供兜底压缩策略。这确保了即使在信息量巨大的场景下,系统也能稳定运行。
 
-4. <strong>最小规则</strong>:不引入来源/优先级等分类维度,避免复杂度增长。实践表明,基于相关性和新近性的简单评分机制,在大多数场景下已经足够有效。
+4. <strong>最小规则</strong>不引入来源/优先级等分类维度,避免复杂度增长。实践表明,基于相关性和新近性的简单评分机制,在大多数场景下已经足够有效。
 
 ### 9.3.2 核心数据结构
 
 ContextBuilder 的实现依赖两个核心数据结构,它们定义了系统的配置和信息单元。
 
-(1)ContextPacket:候选信息包
+(1)ContextPacket候选信息包
 
 ```python
 from dataclasses import dataclass
@@ -182,7 +186,7 @@ class ContextPacket:
 
 `ContextPacket` 是系统中信息的基本单元。每个候选信息都会被封装为一个 ContextPacket,包含内容、时间戳、token 数量和相关性分数等核心属性。这种统一的数据结构简化了后续的选择和排序逻辑。
 
-(2)ContextConfig:配置管理
+(2)ContextConfig配置管理
 
 ```python
 @dataclass
@@ -218,7 +222,7 @@ class ContextConfig:
 
 ContextBuilder 的核心是 GSSC(Gather-Select-Structure-Compress)流水线,它将上下文构建过程分解为四个清晰的阶段。让我们深入了解每个阶段的实现细节。
 
-(1)Gather:多源信息汇集
+(1)Gather多源信息汇集
 
 第一阶段是从多个来源汇集候选信息。这个阶段的关键在于容错性和灵活性。
 
@@ -303,13 +307,13 @@ def _gather(
     return packets
 ```
 
-这个实现展示了几个重要的设计考虑:
+这个实现展示了几个重要的设计考虑
 
-- <strong>容错机制</strong>:每个外部数据源的调用都被 try-except 包裹,确保单个源的失败不会影响整体流程
-- <strong>优先级处理</strong>:系统指令被标记为高优先级,确保始终被保留
-- <strong>历史限制</strong>:对话历史只保留最近的几条,避免上下文窗口被历史信息占据
+- <strong>容错机制</strong>每个外部数据源的调用都被 try-except 包裹,确保单个源的失败不会影响整体流程
+- <strong>优先级处理</strong>系统指令被标记为高优先级,确保始终被保留
+- <strong>历史限制</strong>对话历史只保留最近的几条,避免上下文窗口被历史信息占据
 
-(2)Select:智能信息选择
+(2)Select智能信息选择
 
 第二阶段是根据相关性和新近性对候选信息进行评分和选择。这是整个流水线的核心,直接决定了最终上下文的质量。
 
@@ -428,13 +432,13 @@ def _calculate_recency(self, timestamp: datetime) -> float:
     return max(0.1, min(1.0, recency_score))  # 限制在 [0.1, 1.0] 范围内
 ```
 
-选择阶段的核心算法体现了几个重要的工程考量:
+选择阶段的核心算法体现了几个重要的工程考量
 
-- <strong>评分机制</strong>:采用相关性和新近性的加权组合,权重可配置
-- <strong>贪心算法</strong>:按分数从高到低填充,确保在有限预算内选择最有价值的信息
-- <strong>过滤机制</strong>:通过 `min_relevance` 参数过滤低质量信息
+- <strong>评分机制</strong>采用相关性和新近性的加权组合,权重可配置
+- <strong>贪心算法</strong>按分数从高到低填充,确保在有限预算内选择最有价值的信息
+- <strong>过滤机制</strong>通过 `min_relevance` 参数过滤低质量信息
 
-(3)Structure:结构化输出
+(3)Structure结构化输出
 
 第三阶段是将选中的信息组织成结构化的上下文模板。
 
@@ -488,13 +492,13 @@ def _structure(self, selected_packets: List[ContextPacket], user_query: str) ->
     return "\n\n".join(sections)
 ```
 
-结构化阶段将散乱的信息包组织成清晰的分区,这种设计有几个优势:
+结构化阶段将散乱的信息包组织成清晰的分区,这种设计有几个优势
 
-- <strong>可读性</strong>:清晰的分区让人类和模型都更容易理解上下文结构
-- <strong>可调试性</strong>:问题定位更容易,可以快速识别哪个区域的信息有问题
-- <strong>可扩展性</strong>:添加新的信息源只需要创建新的分区
+- <strong>可读性</strong>清晰的分区让人类和模型都更容易理解上下文结构
+- <strong>可调试性</strong>问题定位更容易,可以快速识别哪个区域的信息有问题
+- <strong>可扩展性</strong>添加新的信息源只需要创建新的分区
 
-(4)Compress:兜底压缩
+(4)Compress兜底压缩
 
 第四阶段是对超限上下文进行压缩处理。
 
@@ -648,7 +652,7 @@ print("=" * 80)
 
 (2)运行效果展示
 
-运行上述代码后,您将看到如下结构化的上下文输出:
+运行上述代码后,您将看到如下结构化的上下文输出
 
 ```
 ================================================================================
@@ -681,17 +685,17 @@ assistant: 不错的选择!Pandas在数据处理方面非常强大。接下来
 ================================================================================
 ```
 
-这个结构化的上下文包含了所有必要的信息:
+这个结构化的上下文包含了所有必要的信息
 
-- <strong>[Role & Policies]</strong>:明确了 AI 的角色和回答要求
-- <strong>[Task]</strong>:清晰地表达了用户的问题
-- <strong>[Evidence]</strong>:从 RAG 系统检索的相关知识
-- <strong>[Context]</strong>:对话历史和相关记忆,提供了充分的背景信息
-- <strong>[Output]</strong>:指导 LLM 如何组织回答
+- <strong>[Role & Policies]</strong>明确了 AI 的角色和回答要求
+- <strong>[Task]</strong>清晰地表达了用户的问题
+- <strong>[Evidence]</strong>从 RAG 系统检索的相关知识
+- <strong>[Context]</strong>对话历史和相关记忆,提供了充分的背景信息
+- <strong>[Output]</strong>指导 LLM 如何组织回答
 
 (3)与 Agent 集成
 
-最后,让我们展示如何将 ContextBuilder 集成到 Agent 中:
+最后,让我们展示如何将 ContextBuilder 集成到 Agent 中
 
 ```python
 from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry
@@ -771,21 +775,21 @@ print(response)
 
 ### 9.3.5 最佳实践与优化建议
 
-在实际应用 ContextBuilder 时,以下几点最佳实践值得注意:
+在实际应用 ContextBuilder 时,以下几点最佳实践值得注意
 
-1. <strong>动态调整 token 预算</strong>:根据任务复杂度动态调整 `max_tokens`,简单任务使用较小预算,复杂任务增加预算。
+1. <strong>动态调整 token 预算</strong>根据任务复杂度动态调整 `max_tokens`,简单任务使用较小预算,复杂任务增加预算。
 
-2. <strong>相关性计算优化</strong>:在生产环境中,将简单的关键词重叠替换为向量相似度计算,提升检索质量。
+2. <strong>相关性计算优化</strong>在生产环境中,将简单的关键词重叠替换为向量相似度计算,提升检索质量。
 
-3. <strong>缓存机制</strong>:对于不变的系统指令和知识库内容,可以实现缓存机制,避免重复计算。
+3. <strong>缓存机制</strong>对于不变的系统指令和知识库内容,可以实现缓存机制,避免重复计算。
 
-4. <strong>监控与日志</strong>:记录每次上下文构建的统计信息(选中信息数量、token 使用率等),便于后续优化。
+4. <strong>监控与日志</strong>记录每次上下文构建的统计信息(选中信息数量、token 使用率等),便于后续优化。
 
-5. <strong>A/B 测试</strong>:对于关键参数(如相关性权重、新近性权重),通过 A/B 测试找到最优配置。
+5. <strong>A/B 测试</strong>对于关键参数(如相关性权重、新近性权重),通过 A/B 测试找到最优配置。
 
 
 
-## 9.4 NoteTool:结构化笔记
+## 9.4 NoteTool结构化笔记
 
 NoteTool 是为"长时程任务"提供的结构化外部记忆组件。它以 Markdown 文件作为载体,头部使用 YAML 前置元数据记录关键信息,正文用于记录状态、结论、阻塞与行动项等内容。这种设计结合了人类可读性、版本控制友好性和易于回注上下文的特性,是构建长时程智能体的重要工具。
 
@@ -797,25 +801,25 @@ NoteTool 是为"长时程任务"提供的结构化外部记忆组件。它以 Ma
 
 在第八章中,我们介绍了 MemoryTool,它提供了强大的记忆管理能力。然而,MemoryTool 主要关注<strong>对话式记忆</strong>——短期工作记忆、情景记忆和语义记忆。对于需要长期追踪、结构化管理的<strong>项目式任务</strong>,我们需要一种更轻量、更人类友好的记录方式。
 
-NoteTool 填补了这个gap,它提供了:
+NoteTool 填补了这个gap,它提供了
 
-- <strong>结构化记录</strong>:使用 Markdown + YAML 格式,既适合机器解析,也方便人类阅读和编辑
-- <strong>版本友好</strong>:纯文本格式,天然支持 Git 等版本控制系统
-- <strong>低开销</strong>:无需复杂的数据库操作,适合轻量级的状态追踪
-- <strong>灵活分类</strong>:通过 `type` 和 `tags` 灵活组织笔记,支持多维度检索
+- <strong>结构化记录</strong>使用 Markdown + YAML 格式,既适合机器解析,也方便人类阅读和编辑
+- <strong>版本友好</strong>纯文本格式,天然支持 Git 等版本控制系统
+- <strong>低开销</strong>无需复杂的数据库操作,适合轻量级的状态追踪
+- <strong>灵活分类</strong>通过 `type` 和 `tags` 灵活组织笔记,支持多维度检索
 
 (2)典型应用场景
 
-NoteTool 特别适合以下场景:
+NoteTool 特别适合以下场景
 
-<strong>场景1:长期项目追踪</strong>
+<strong>场景1长期项目追踪</strong>
 
-想象一个智能体正在协助完成一个大型代码库的重构任务,这可能需要几天甚至几周。NoteTool 可以记录:
+想象一个智能体正在协助完成一个大型代码库的重构任务,这可能需要几天甚至几周。NoteTool 可以记录
 
-- `task_state`:当前阶段的任务状态和进度
-- `conclusion`:每个阶段结束后的关键结论
-- `blocker`:遇到的问题和阻塞点
-- `action`:下一步的行动计划
+- `task_state`当前阶段的任务状态和进度
+- `conclusion`每个阶段结束后的关键结论
+- `blocker`遇到的问题和阻塞点
+- `action`下一步的行动计划
 
 ```python
 # 记录任务状态
@@ -837,17 +841,17 @@ notes.run({
 })
 ```
 
-<strong>场景2:研究任务管理</strong>
+<strong>场景2研究任务管理</strong>
 
-一个智能研究助手在进行文献综述时,可以使用 NoteTool 记录:
+一个智能研究助手在进行文献综述时,可以使用 NoteTool 记录
 
 - 每篇论文的核心观点(`conclusion`)
 - 待深入调研的主题(`action`)
 - 重要的参考文献(`reference`)
 
-<strong>场景3:与 ContextBuilder 配合</strong>
+<strong>场景3与 ContextBuilder 配合</strong>
 
-在每轮对话前,Agent 可以通过 `search` 或 `list` 操作检索相关笔记,并将其注入到上下文中:
+在每轮对话前,Agent 可以通过 `search` 或 `list` 操作检索相关笔记,并将其注入到上下文中
 
 ```python
 # 在 Agent 的 run 方法中
@@ -884,7 +888,7 @@ NoteTool 采用了 Markdown + YAML 的混合格式,这种设计兼顾了结构
 
 (1)笔记文件格式
 
-每个笔记都是一个独立的 `.md` 文件,格式如下:
+每个笔记都是一个独立的 `.md` 文件,格式如下
 
 ```markdown
 ---
@@ -918,15 +922,15 @@ updated_at: 2025-01-19T15:30:00
 3. 提升集成测试覆盖率至85%
 ```
 
-这种格式的优势:
+这种格式的优势
 
-- <strong>YAML 元数据</strong>:机器可解析,支持精确的字段提取和检索
-- <strong>Markdown 正文</strong>:人类可读,支持丰富的格式化(标题、列表、代码块等)
-- <strong>文件名即 ID</strong>:简化管理,每个笔记的文件名就是其唯一标识
+- <strong>YAML 元数据</strong>机器可解析,支持精确的字段提取和检索
+- <strong>Markdown 正文</strong>人类可读,支持丰富的格式化(标题、列表、代码块等)
+- <strong>文件名即 ID</strong>简化管理,每个笔记的文件名就是其唯一标识
 
 (2)索引文件
 
-NoteTool 维护一个 `notes_index.json` 文件,用于快速检索和管理笔记:
+NoteTool 维护一个 `notes_index.json` 文件,用于快速检索和管理笔记
 
 ```json
 {
@@ -942,17 +946,17 @@ NoteTool 维护一个 `notes_index.json` 文件,用于快速检索和管理笔
 }
 ```
 
-这个索引文件的作用:
+这个索引文件的作用
 
-- <strong>快速检索</strong>:无需打开每个文件,直接从索引中查找
-- <strong>元数据管理</strong>:集中管理所有笔记的元数据
-- <strong>完整性校验</strong>:可以检测文件缺失或损坏
+- <strong>快速检索</strong>无需打开每个文件,直接从索引中查找
+- <strong>元数据管理</strong>集中管理所有笔记的元数据
+- <strong>完整性校验</strong>可以检测文件缺失或损坏
 
 ### 9.4.3 核心操作详解
 
 NoteTool 提供了七个核心操作,覆盖了笔记的完整生命周期管理。
 
-(1)create:创建笔记
+(1)create创建笔记
 
 ```python
 def _create_note(
@@ -1015,7 +1019,7 @@ def _build_markdown(self, metadata: Dict, content: str) -> str:
     return f"---\n{yaml_header}---\n\n{content}"
 ```
 
-使用示例:
+使用示例
 
 ```python
 from hello_agents.tools import NoteTool
@@ -1037,7 +1041,7 @@ note_id = notes.run({
 print(f"✅ 笔记创建成功,ID: {note_id}")
 ```
 
-(2)read:读取笔记
+(2)read读取笔记
 
 ```python
 def _read_note(self, note_id: str) -> Dict:
@@ -1086,7 +1090,7 @@ def _parse_markdown(self, raw_content: str) -> Tuple[Dict, str]:
     return metadata, content
 ```
 
-(3)update:更新笔记
+(3)update更新笔记
 
 ```python
 def _update_note(
@@ -1145,7 +1149,7 @@ def _update_note(
     return f"✅ 笔记已更新: {metadata['title']}"
 ```
 
-(4)search:搜索笔记
+(4)search搜索笔记
 
 ```python
 def _search_notes(
@@ -1206,7 +1210,7 @@ def _search_notes(
     return results[:limit]
 ```
 
-(5)list:列出笔记
+(5)list列出笔记
 
 ```python
 def _list_notes(
@@ -1246,7 +1250,7 @@ def _list_notes(
     return results[:limit]
 ```
 
-(6)summary:笔记摘要
+(6)summary笔记摘要
 
 ```python
 def _summary(self) -> Dict[str, Any]:
@@ -1285,7 +1289,7 @@ def _summary(self) -> Dict[str, Any]:
     }
 ```
 
-(7)delete:删除笔记
+(7)delete删除笔记
 
 ```python
 def _delete_note(self, note_id: str) -> str:
@@ -1319,7 +1323,7 @@ NoteTool 的真正威力在于与 ContextBuilder 的配合使用。让我们通
 
 (1)场景设定
 
-假设我们正在构建一个长期项目助手,它需要:
+假设我们正在构建一个长期项目助手,它需要
 
 1. 记录项目的阶段性进展
 2. 追踪待解决的问题
@@ -1553,36 +1557,36 @@ print(summary)
 
 ### 9.4.5 最佳实践
 
-在实际使用 NoteTool 时,以下最佳实践能帮助您构建更强大的长时程智能体:
+在实际使用 NoteTool 时,以下最佳实践能帮助您构建更强大的长时程智能体
 
-1. <strong>合理的笔记分类</strong>:
-   - `task_state`:记录阶段性进展和状态
-   - `conclusion`:记录重要的结论和发现
-   - `blocker`:记录阻塞问题,优先级最高
-   - `action`:记录下一步行动计划
-   - `reference`:记录重要的参考资料
+1. <strong>合理的笔记分类</strong>
+   - `task_state`记录阶段性进展和状态
+   - `conclusion`记录重要的结论和发现
+   - `blocker`记录阻塞问题,优先级最高
+   - `action`记录下一步行动计划
+   - `reference`记录重要的参考资料
 
-2. <strong>定期清理和归档</strong>:
+2. <strong>定期清理和归档</strong>
    - 对于已解决的 blocker,更新为 conclusion
    - 对于过时的 action,及时删除或更新
    - 使用 tags 进行版本管理,如 `["v1.0", "completed"]`
 
-3. <strong>与 ContextBuilder 的配合</strong>:
+3. <strong>与 ContextBuilder 的配合</strong>
    - 在每轮对话前检索相关笔记
    - 根据笔记类型设置不同的相关性分数(blocker > action > conclusion)
    - 限制笔记数量,避免上下文过载
 
-4. <strong>人机协作</strong>:
+4. <strong>人机协作</strong>
    - 笔记是人类可读的 Markdown 格式,支持手动编辑
    - 使用 Git 进行版本控制,追踪笔记的演化
    - 在关键阶段,人工审核 Agent 生成的笔记
 
-5. <strong>自动化工作流</strong>:
+5. <strong>自动化工作流</strong>
    - 定期生成笔记摘要报告
    - 基于笔记自动生成项目进度文档
    - 将笔记内容同步到其他系统(如 Notion、Confluence)
 
-## 9.5 TerminalTool:即时文件系统访问
+## 9.5 TerminalTool即时文件系统访问
 
 在前面的章节中,我们介绍了 MemoryTool 和 RAGTool,它们分别提供了对话记忆和知识检索能力。然而,在许多实际场景中,智能体需要<strong>即时访问和探索文件系统</strong>——查看日志文件、分析代码库结构、检索配置文件等。这就是 TerminalTool 的用武之地。
 
@@ -1592,11 +1596,11 @@ TerminalTool 为智能体提供了<strong>安全的命令行执行能力</strong
 
 (1)为什么需要 TerminalTool?
 
-在构建长程智能体时,我们经常遇到以下场景:
+在构建长程智能体时,我们经常遇到以下场景
 
-<strong>场景1:代码库探索</strong>
+<strong>场景1代码库探索</strong>
 
-一个开发助手需要帮助用户理解一个大型代码库的结构:
+一个开发助手需要帮助用户理解一个大型代码库的结构
 
 ```python
 # 传统方式:预先索引所有文件(成本高、可能过时)
@@ -1608,9 +1612,9 @@ terminal.run({"command": "grep -r 'class UserService' ."})  # 精确定位
 terminal.run({"command": "head -n 50 src/services/user.py"})  # 按需查看
 ```
 
-<strong>场景2:日志文件分析</strong>
+<strong>场景2日志文件分析</strong>
 
-一个运维助手需要分析应用日志:
+一个运维助手需要分析应用日志
 
 ```python
 # 检查日志文件大小
@@ -1623,9 +1627,9 @@ terminal.run({"command": "tail -n 100 /var/log/app.log | grep ERROR"})
 terminal.run({"command": "grep ERROR /var/log/app.log | cut -d':' -f3 | sort | uniq -c"})
 ```
 
-<strong>场景3:数据文件预览</strong>
+<strong>场景3数据文件预览</strong>
 
-一个数据分析助手需要快速了解数据文件的结构:
+一个数据分析助手需要快速了解数据文件的结构
 
 ```python
 # 查看 CSV 文件的前几行
@@ -1638,15 +1642,15 @@ terminal.run({"command": "wc -l data/*.csv"})
 terminal.run({"command": "head -n 1 data/sales.csv | tr ',' '\n'"})
 ```
 
-这些场景的共同特点是:<strong>需要实时、轻量级的文件系统访问,而不是预先索引和向量化</strong>。TerminalTool 正是为这种"探索式"工作流设计的。
+这些场景的共同特点是<strong>需要实时、轻量级的文件系统访问,而不是预先索引和向量化</strong>。TerminalTool 正是为这种"探索式"工作流设计的。
 
 (2)安全机制详解
 
-允许智能体执行命令是一个强大但危险的能力。TerminalTool 通过多层安全机制确保系统安全:
+允许智能体执行命令是一个强大但危险的能力。TerminalTool 通过多层安全机制确保系统安全
 
-<strong>第一层:命令白名单</strong>
+<strong>第一层命令白名单</strong>
 
-只允许安全的只读命令,完全禁止任何可能修改系统的操作:
+只允许安全的只读命令,完全禁止任何可能修改系统的操作
 
 ```python
 ALLOWED_COMMANDS = {
@@ -1667,7 +1671,7 @@ ALLOWED_COMMANDS = {
 }
 ```
 
-如果智能体尝试执行白名单外的命令,会立即被拒绝:
+如果智能体尝试执行白名单外的命令,会立即被拒绝
 
 ```python
 terminal.run({"command": "rm -rf /"})
@@ -1675,9 +1679,9 @@ terminal.run({"command": "rm -rf /"})
 # 允许的命令: cat, cd, cut, dir, du, ...
 ```
 
-<strong>第二层:工作目录限制(沙箱)</strong>
+<strong>第二层工作目录限制(沙箱)</strong>
 
-TerminalTool 只能访问指定的工作目录及其子目录,无法访问系统其他部分:
+TerminalTool 只能访问指定的工作目录及其子目录,无法访问系统其他部分
 
 ```python
 # 初始化时指定工作目录
@@ -1695,9 +1699,9 @@ terminal.run({"command": "cd ../../../etc"})  # ❌ 不允许访问工作目录
 
 这种沙箱机制确保了即使智能体的行为出现异常,也无法影响系统其他部分。
 
-<strong>第三层:超时控制</strong>
+<strong>第三层超时控制</strong>
 
-每个命令都有执行时间限制,防止无限循环或资源耗尽:
+每个命令都有执行时间限制,防止无限循环或资源耗尽
 
 ```python
 terminal = TerminalTool(
@@ -1710,9 +1714,9 @@ terminal.run({"command": "find / -name '*.log'"})
 # ❌ 命令执行超时(超过 30 秒)
 ```
 
-<strong>第四层:输出大小限制</strong>
+<strong>第四层输出大小限制</strong>
 
-限制命令输出的大小,防止内存溢出:
+限制命令输出的大小,防止内存溢出
 
 ```python
 terminal = TerminalTool(
@@ -1730,11 +1734,11 @@ terminal.run({"command": "cat huge_file.log"})
 
 ### 9.5.2 核心功能详解
 
-TerminalTool 的实现聚焦于两个核心功能:命令执行和目录导航。
+TerminalTool 的实现聚焦于两个核心功能命令执行和目录导航。
 
 (1)命令执行
 
-核心的 `_execute_command` 方法负责实际执行命令:
+核心的 `_execute_command` 方法负责实际执行命令
 
 ```python
 def _execute_command(self, command: str) -> str:
@@ -1773,16 +1777,16 @@ def _execute_command(self, command: str) -> str:
         return f"❌ 命令执行失败: {e}"
 ```
 
-这个实现的关键点:
+这个实现的关键点
 
-- <strong>当前目录感知</strong>:使用 `cwd` 参数在正确的目录下执行命令
-- <strong>错误处理</strong>:捕获并合并标准错误,提供完整的诊断信息
-- <strong>返回码检查</strong>:非零返回码会被标记为警告
-- <strong>容错设计</strong>:超时和异常都会被妥善处理,不会导致智能体崩溃
+- <strong>当前目录感知</strong>使用 `cwd` 参数在正确的目录下执行命令
+- <strong>错误处理</strong>捕获并合并标准错误,提供完整的诊断信息
+- <strong>返回码检查</strong>非零返回码会被标记为警告
+- <strong>容错设计</strong>超时和异常都会被妥善处理,不会导致智能体崩溃
 
 (2)目录导航
 
-`cd` 命令的特殊处理支持智能体在文件系统中导航:
+`cd` 命令的特殊处理支持智能体在文件系统中导航
 
 ```python
 def _handle_cd(self, parts: List[str]) -> str:
@@ -1824,7 +1828,7 @@ def _handle_cd(self, parts: List[str]) -> str:
     return f"✅ 切换到目录: {self.current_dir}"
 ```
 
-这种设计支持智能体进行多步骤的文件系统探索:
+这种设计支持智能体进行多步骤的文件系统探索
 
 ```python
 # 第一步:查看项目结构
@@ -1846,7 +1850,7 @@ TerminalTool 支持多种常见的文件系统操作模式。
 
 (1)探索式导航
 
-智能体可以像人类开发者一样逐步探索代码库:
+智能体可以像人类开发者一样逐步探索代码库
 
 ```python
 from hello_agents.tools import TerminalTool
@@ -1875,7 +1879,7 @@ print(terminal.run({"command": "grep -r 'def process' ."}))
 
 (2)数据文件分析
 
-快速了解数据文件的结构和内容:
+快速了解数据文件的结构和内容
 
 ```python
 terminal = TerminalTool(workspace="./data")
@@ -1909,7 +1913,7 @@ print(terminal.run({"command": "tail -n +2 sales_2024.csv | cut -d',' -f2 | sort
 
 (3)日志文件分析
 
-实时分析应用日志,快速定位问题:
+实时分析应用日志,快速定位问题
 
 ```python
 terminal = TerminalTool(workspace="/var/log")
@@ -1932,7 +1936,7 @@ print(terminal.run({"command": "grep '2024-01-19 15:' app.log | tail -n 20"}))
 
 (4)代码库分析
 
-辅助代码审查和理解:
+辅助代码审查和理解
 
 ```python
 terminal = TerminalTool(workspace="./codebase")
@@ -1956,7 +1960,7 @@ TerminalTool 的真正威力在于与 MemoryTool、NoteTool 和 ContextBuilder 
 
 (1)与 MemoryTool 协同
 
-TerminalTool 发现的信息可以存储到记忆系统中:
+TerminalTool 发现的信息可以存储到记忆系统中
 
 ```python
 # 使用 TerminalTool 发现项目结构
@@ -1974,7 +1978,7 @@ memory_tool.execute(
 
 (2)与 NoteTool 协同
 
-重要的发现可以记录为结构化笔记:
+重要的发现可以记录为结构化笔记
 
 ```python
 # 发现一个性能瓶颈
@@ -1992,7 +1996,7 @@ note_tool.run({
 
 (3)与 ContextBuilder 协同
 
-TerminalTool 的输出可以作为上下文的一部分:
+TerminalTool 的输出可以作为上下文的一部分
 
 ```python
 # 探索代码库
@@ -2027,9 +2031,9 @@ context = context_builder.build(
 )
 ```
 
-## 9.6 长程智能体实战:代码库维护助手
+## 9.6 长程智能体实战代码库维护助手
 
-现在,让我们将 ContextBuilder、NoteTool 和 TerminalTool 整合起来,构建一个完整的长程智能体——<strong>代码库维护助手</strong>。这个助手能够:
+现在,让我们将 ContextBuilder、NoteTool 和 TerminalTool 整合起来,构建一个完整的长程智能体——<strong>代码库维护助手</strong>。这个助手能够
 
 1. 探索和理解代码库结构
 2. 记录发现的问题和改进点
@@ -2048,7 +2052,7 @@ context = context_builder.build(
 
 ### 9.6.2 系统架构设计
 
-我们的代码库维护助手采用三层架构,如图9.3所示:
+我们的代码库维护助手采用三层架构,如图9.3所示
 
 <div align="center">
   <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/9-figures/9-3.png" alt="" width="85%"/>
@@ -2058,7 +2062,7 @@ context = context_builder.build(
 
 ### 9.6.3 核心实现
 
-现在让我们实现这个系统的核心类:
+现在让我们实现这个系统的核心类
 
 ```python
 from typing import Dict, Any, List, Optional
@@ -2468,7 +2472,7 @@ class CodebaseMaintainer:
 
 ### 9.6.4 完整使用示例
 
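-Now let's walk through a complete usage scenario to show this long-horizon agent's workflow:
+Now let's walk through a complete usage scenario to show this long-horizon agent's workflow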
 
 ```python
 # ========== Initialize the assistant ==========
@@ -2747,27 +2751,27 @@ print(json.dumps(report, indent=2, ensure_ascii=False))
 
 ## 9.7 Chapter Summary
 
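-In this chapter, we explored the theoretical foundations and engineering practice of context engineering:
+In this chapter, we explored the theoretical foundations and engineering practice of context engineering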
 
 ### Theory
 
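-1. <strong>The essence of context engineering</strong>: the evolution from "prompt engineering" to "context engineering"; at its core, it is about managing a limited attention budget
-2. <strong>Context rot</strong>: understanding the performance degradation that long contexts bring, and recognizing that context is a scarce resource
-3. <strong>Three strategies</strong>: compaction, structured note-taking, and sub-agent architectures
+1. <strong>The essence of context engineering</strong> the evolution from "prompt engineering" to "context engineering"; at its core, it is about managing a limited attention budget
+2. <strong>Context rot</strong> understanding the performance degradation that long contexts bring, and recognizing that context is a scarce resource
+3. <strong>Three strategies</strong> compaction, structured note-taking, and sub-agent architectures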
 
 ### Engineering Practice
 
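-1. <strong>ContextBuilder</strong>: implements the GSSC pipeline and provides a unified context management interface
-2. <strong>NoteTool</strong>: a hybrid Markdown + YAML format that supports structured long-term memory
-3. <strong>TerminalTool</strong>: a safe command-line tool that supports on-demand file system access
-4. <strong>Long-horizon agent</strong>: integrates the three tools to build a cross-session codebase maintenance assistant
+1. <strong>ContextBuilder</strong> implements the GSSC pipeline and provides a unified context management interface
+2. <strong>NoteTool</strong> a hybrid Markdown + YAML format that supports structured long-term memory
+3. <strong>TerminalTool</strong> a safe command-line tool that supports on-demand file system access
+4. <strong>Long-horizon agent</strong> integrates the three tools to build a cross-session codebase maintenance assistant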
 
 ### Key Takeaways
 
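-- <strong>Layered design</strong>: immediate access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool)
-- <strong>Intelligent filtering</strong>: a scoring mechanism based on relevance and recency
-- <strong>Safety first</strong>: multiple layers of safety mechanisms keep the system stable
-- <strong>Human-AI collaboration</strong>: a balance between automation and controllability
+- <strong>Layered design</strong> immediate access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool)
+- <strong>Intelligent filtering</strong> a scoring mechanism based on relevance and recency
+- <strong>Safety first</strong> multiple layers of safety mechanisms keep the system stable
+- <strong>Human-AI collaboration</strong> a balance between automation and controllability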
 
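 By working through this chapter, you have not only learned the core techniques of context engineering but, more importantly, understood how to build agent systems that stay coherent and effective over long time spans. These skills will be an important foundation for building production-grade agent applications.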
 

+ 0 - 0
docs/images/5-figures/dify-12-new.png → docs/images/5-figures/dify-12.png


+ 106 - 1
docs/index.html

@@ -9,9 +9,35 @@
     <meta name="viewport"
         content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
     <link rel="stylesheet" href="//cdn.jsdelivr.net/npm/docsify@latest/lib/themes/vue.css">
+    <style>
+        /* Language switch button styles */
+        .lang-switch {
+            position: fixed;
+            top: 20px;
+            right: 80px;
+            z-index: 999;
+        }
+        .lang-switch button {
+            background: #42b983;
+            color: white;
+            border: none;
+            padding: 8px 16px;
+            border-radius: 4px;
+            cursor: pointer;
+            font-size: 14px;
+            transition: background 0.3s;
+        }
+        .lang-switch button:hover {
+            background: #33a06f;
+        }
+    </style>
 </head>
 
 <body>
+    <!-- Language switch button -->
+    <div class="lang-switch">
+        <button id="langBtn" onclick="switchLanguage()">English</button>
+    </div>
     <div id="app"></div>
     <script src="//cdn.jsdelivr.net/npm/mermaid@8.0.0-rc.8/dist/mermaid.min.js"></script>
     <script>
@@ -23,7 +49,14 @@
             subMaxLevel: 3,
             relativePath: false,  // relative path support (false disables it)
             alias: {
-                '/.*/_sidebar.md': '/_sidebar.md'
+                // English path mappings
+                '/en/README.md': '/README_EN.md',
+                '/en/Preface.md': '/Preface.md',
+                '/en/_sidebar.md': '/_sidebar_en.md',
+                '/en/chapter(\\d+)/Chapter(.*)': '/chapter$1/Chapter$2',
+
+                // Default Chinese paths stay unchanged
+                '/_sidebar.md': '/_sidebar.md'
             },
             pagination: {
                 previousText: '上一章节',
@@ -34,6 +67,12 @@
                 fontsize: '0.9em',
                 color: 'rgb(90,90,90)',
                 language: 'chinese'
+            },
+            // Multilingual configuration
+            fallbackLanguages: ['en'],
+            nameLink: {
+                '/en/': '#/en/',
+                '/': '#/'
             }
         }
     </script>
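The chapter alias in the config above depends on capture-group substitution (`$1`, `$2`). A minimal sketch of how such an alias table could resolve a request path, assuming each key is compiled as an anchored regular expression (`^key$`) and the first matching key wins. `resolveAlias` is a hypothetical helper written for illustration, not docsify's own API, and the sample table copies only three of the entries above:

```javascript
// Hypothetical helper: resolve a path against an alias table whose keys are
// regular-expression sources. Assumption: each key is anchored as ^key$.
function resolveAlias(path, alias) {
  for (const [pattern, target] of Object.entries(alias)) {
    const re = new RegExp(`^${pattern}$`);
    if (re.test(path)) {
      // $1, $2 in the target are filled in from the capture groups
      return path.replace(re, target);
    }
  }
  return path; // no alias matched, so the path is used unchanged
}

// Sample table mirroring entries from the config in this diff
const alias = {
  '/en/README.md': '/README_EN.md',
  '/en/_sidebar.md': '/_sidebar_en.md',
  '/en/chapter(\\d+)/Chapter(.*)': '/chapter$1/Chapter$2',
};

console.log(resolveAlias('/en/chapter1/Chapter1-Introduction-to-Agents.md', alias));
console.log(resolveAlias('/en/_sidebar.md', alias));
```

Under these assumptions, an English chapter URL resolves to the file that sits next to its Chinese counterpart in the same `chapterN/` directory, which is why the English files need no separate directory tree.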
@@ -52,6 +91,72 @@
     <script src="//cdn.jsdelivr.net/npm/docsify-katex@latest/dist/docsify-katex.js"></script>
     <!-- Word count -->
     <script src="//unpkg.com/docsify-count/dist/countable.js"></script>
+
+    <!-- Language switch script -->
+    <script>
+        function switchLanguage() {
+            const currentHash = window.location.hash;
+            const langBtn = document.getElementById('langBtn');
+
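+            // Detect the current language
+            if (currentHash.includes('/en/')) {
+                // Switch from English to Chinese (strip the /en/ prefix)
+                let newHash = currentHash.replace('/en/', '/');
+                // Convert English file names to Chinese file names
+                newHash = newHash.replace('/README_EN.md', '/README.md');
+                newHash = newHash.replace('/Preface.md', '/前言.md');
+                // Convert Chapter*.md to 第*章 *.md (requires a mapping based on the actual file names)
+                // Simplified here: jump straight to the home page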
+                if (newHash.includes('Chapter')) {
+                    newHash = '#/';
+                }
+                window.location.hash = newHash || '#/';
+                langBtn.textContent = 'English';
+                // Update the pagination button text
+                window.$docsify.pagination.previousText = '上一章节';
+                window.$docsify.pagination.nextText = '下一章节';
+            } else {
+                // Switch from Chinese to English (add the /en/ prefix)
+                let newHash;
+                if (currentHash === '' || currentHash === '#/' || currentHash === '#') {
+                    // Home page
+                    newHash = '#/en/README.md';
+                } else {
+                    // Other pages: add the /en/ prefix
+                    const path = currentHash.replace('#/', '');
+                    // Convert Chinese file names to English file names
+                    let enPath = path.replace('README.md', 'README_EN.md');
+                    enPath = enPath.replace('前言.md', 'Preface.md');
+                    // Convert 第*章 *.md to Chapter*.md (requires a mapping based on the actual file names)
+                    // Simplified here: jump straight to the English home page
+                    if (enPath.includes('第') && enPath.includes('章')) {
+                        enPath = 'README_EN.md';
+                    }
+                    newHash = '#/en/' + enPath;
+                }
+                window.location.hash = newHash;
+                langBtn.textContent = '中文';
+                // Update the pagination button text
+                window.$docsify.pagination.previousText = 'Previous';
+                window.$docsify.pagination.nextText = 'Next';
+            }
+
+            // Reload the page to apply the new sidebar
+            window.location.reload();
+        }
+
+        // Set the button text on page load
+        window.addEventListener('load', function() {
+            const currentHash = window.location.hash;
+            const langBtn = document.getElementById('langBtn');
+
+            if (currentHash.includes('/en/')) {
+                langBtn.textContent = '中文';
+            } else {
+                langBtn.textContent = 'English';
+            }
+        });
+    </script>
 </body>
 
 </html>
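The switch handler in this file falls back to the home page whenever the current page is a chapter, because chapter file names differ between the two languages. One possible way to keep the reader on the same chapter is an explicit lookup table. The sketch below uses pure functions so it can be exercised without a browser. `CHAPTER_MAP`, `normalize`, `toEnglishHash`, and `toChineseHash` are hypothetical names, and the table holds only two sample entries taken from the file list of this commit:

```javascript
// Hypothetical lookup table: Chinese chapter path -> English chapter path.
// Only two sample entries are listed; a full table would cover every chapter.
const CHAPTER_MAP = {
  'chapter1/第一章 初识智能体': 'chapter1/Chapter1-Introduction-to-Agents',
  'chapter2/第二章 智能体发展史': 'chapter2/Chapter2-History-of-Agents',
};

// Strip the leading "#/" and a trailing ".md", and decode percent-escapes,
// since location.hash may return the path URL-encoded.
function normalize(hash) {
  const path = decodeURIComponent(hash.replace(/^#\/?/, ''));
  return path.replace(/\.md$/, '');
}

// Chinese page hash -> English page hash
function toEnglishHash(hash) {
  const path = normalize(hash);
  if (path === '' || path === 'README') return '#/en/README.md';
  if (path === '前言') return '#/en/Preface.md';
  const en = CHAPTER_MAP[path];
  // Unmapped pages fall back to the English home page, like the original handler
  return en ? `#/en/${en}.md` : '#/en/README.md';
}

// English page hash -> Chinese page hash
function toChineseHash(hash) {
  const path = normalize(hash).replace(/^en\//, '');
  if (path === '' || path === 'README' || path === 'README_EN') return '#/';
  if (path === 'Preface') return '#/前言.md';
  const zh = Object.keys(CHAPTER_MAP).find((key) => CHAPTER_MAP[key] === path);
  // Unmapped pages fall back to the Chinese home page
  return zh ? `#/${zh}.md` : '#/';
}

console.log(toEnglishHash(encodeURI('#/chapter1/第一章 初识智能体')));
console.log(toChineseHash('#/en/chapter2/Chapter2-History-of-Agents.md'));
```

The sketch does not handle section anchors such as a `?id=` suffix on the hash; those would need to be split off before the lookup and re-attached afterwards.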

+ 8 - 4
docs/前言.md

@@ -1,9 +1,13 @@
+<div align="right">
+  <a href="./Preface.md">English</a> | 中文
+</div>
+
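 # Preface
-Since late 2022, large language models (LLMs), with ChatGPT as the leading example, have swept through the field like a technological tsunami and fundamentally changed how we interact with artificial intelligence. Their powerful natural language understanding and generation gave us a glimpse of the dawn of artificial general intelligence (AGI). Yet once the initial amazement settled, developers began exploring the next frontier: how can AI become more than a tool that answers whatever it is asked, and instead become an "actor" that can plan autonomously, call tools, and solve complex problems?
+Since late 2022, large language models (LLMs), with ChatGPT as the leading example, have swept through the field like a technological tsunami and fundamentally changed how we interact with artificial intelligence. Their powerful natural language understanding and generation gave us a glimpse of the dawn of artificial general intelligence (AGI). Yet once the initial amazement settled, developers began exploring the next frontier: how can AI become more than a tool that answers whatever it is asked, and instead become an "actor" that can plan autonomously, call tools, and solve complex problems?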
 
 The answer is the Agent.
 
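-If 2024 was the first year of the "battle of a hundred models", then 2025 has undoubtedly ushered in the "year of the Agent". The focus of the technology is shifting from training ever larger and stronger foundation models to building smarter, more efficient agent applications. A single agent can already handle tasks in a specific domain, while multi-agent systems (MAS), in which multiple agents divide the work, collaborate, and even debate to accomplish a larger goal together, are seen as the key to unlocking the full potential of LLMs and solving complex real-world problems.
+If 2024 was the first year of the "battle of a hundred models", then 2025 has undoubtedly ushered in the "year of the Agent". The focus of the technology is shifting from training ever larger and stronger foundation models to building smarter, more efficient agent applications. A single agent can already handle tasks in a specific domain, while multi-agent systems (MAS), in which multiple agents divide the work, collaborate, and even debate to accomplish a larger goal together, are seen as the key to unlocking the full potential of LLMs and solving complex real-world problems.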
 
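 Yet there is an obvious gap in the current ecosystem: on one side, a dizzying stream of new agent frameworks and applications; on the other, a severe shortage of systematic knowledge. Most tutorials focus on the API calls of one particular framework, so learners often know what to do without knowing why, and still feel out of their depth when facing complex requirements. What is missing is a hands-on guide that looks past the surface of any framework and explains, from first principles, how agents are designed, built, and made to collaborate.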
 
@@ -19,7 +23,7 @@
 
 - Basic Python programming skills.
 
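-- A basic conceptual understanding of large language models (for example, knowing how to obtain an LLM's API).
+- A basic conceptual understanding of large language models (for example, knowing how to obtain an LLM API).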
 
 - Rest assured, you do not need a deep background in algorithms or model training; the project focuses on application and building.
 
@@ -39,4 +43,4 @@
 
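 Finally, as an open-source project, we warmly welcome your participation and contributions. When you run into problems, you can ask questions in our community; when you have new ideas or discoveries, you are welcome to join in building the project at any time.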
 
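-Thank you for choosing to read Hello-agents. Enjoy your learning, and may your exploration know no bounds!
+Thank you for choosing to read Hello-agents. Enjoy your learning, and may your exploration know no bounds!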

Some files were not shown because too many files changed in this diff