2 ngày trước cách đây · b8d8908ad0
--- a/code/chapter15/Helloagents-AI-Town/MEMORY_SYSTEM_GUIDE.md
+++ b/code/chapter15/Helloagents-AI-Town/MEMORY_SYSTEM_GUIDE.md
@@ -233,9 +233,9 @@ memory_config = MemoryConfig(
 
				     storage_path=f"./memory_data/{npc_name}",  # 存储路径
			
 
				     working_memory_capacity=10,                # 工作记忆容量
			
 
				     working_memory_tokens=2000,                # 工作记忆token限制
			
 
				-    episodic_memory_capacity=100,              # 情景记忆容量
			
 
				-    enable_forgetting=True,                    # 启用遗忘机制
			
 
				-    forgetting_threshold=0.3                   # 遗忘阈值
			
 
				+    max_capacity=100,                          # 记忆总容量
			
 
				+    importance_threshold=0.3,                  # 重要性阈值
			
 
				+    decay_factor=0.95                          # 时间衰减系数
			
 
				 )
			
 
				 ```
			
 
				 
			
@@ -245,8 +245,9 @@ memory_config = MemoryConfig(
 
				 |------|--------|----------|------|
			
 
				 | working_memory_capacity | 10 | 5-20 | 工作记忆容量,越大越占内存 |
			
 
				 | working_memory_tokens | 2000 | 1000-4000 | Token限制,影响上下文长度 |
			
 
				-| episodic_memory_capacity | 100 | 50-500 | 长期记忆容量,越大越占磁盘 |
			
 
				-| forgetting_threshold | 0.3 | 0.1-0.5 | 遗忘阈值,越低越容易遗忘 |
			
 
				+| max_capacity | 100 | 50-500 | 记忆总容量,越大越占磁盘 |
			
 
				+| importance_threshold | 0.3 | 0.1-0.5 | 重要性阈值,越高越偏向保留重要记忆 |
			
 
				+| decay_factor | 0.95 | 0.8-0.99 | 时间衰减系数,越低越强调近期记忆 |
			
 
				 
			
 
				 ---
			
 
				 
			
@@ -319,7 +320,7 @@ rm -rf backend/memory_data/张三
 
				 **解决方法:**
			
 
				 - 检查日志中是否有"记忆系统已初始化"
			
 
				 - 检查memory_data目录是否存在
			
 
				-- 调高forgetting_threshold参数
			
 
				+- 降低importance_threshold参数
			
 
				 
			
 
				 ### Q2: 记忆检索不准确?
			
 
				 
			
@@ -335,8 +336,8 @@ rm -rf backend/memory_data/张三
 
				 ### Q3: 记忆占用空间太大?
			
 
				 
			
 
				 **解决方法:**
			
 
				-- 降低episodic_memory_capacity
			
 
				-- 提高forgetting_threshold
			
 
				+- 降低max_capacity
			
 
				+- 提高importance_threshold
			
 
				 - 定期清理旧记忆
			
 
				 
			
 
				 ---
			
--- a/code/chapter15/Helloagents-AI-Town/backend/agents.py
+++ b/code/chapter15/Helloagents-AI-Town/backend/agents.py
@@ -148,9 +148,9 @@ class NPCAgentManager:
 
				             storage_path=memory_dir,
			
 
				             working_memory_capacity=10,  # 最近10条对话
			
 
				             working_memory_tokens=2000,  # 最多2000个token
			
 
				-            episodic_memory_capacity=100,  # 最多100条长期记忆
			
 
				-            enable_forgetting=True,  # 启用遗忘机制
			
 
				-            forgetting_threshold=0.3  # 重要性低于0.3的记忆会被遗忘
			
 
				+            max_capacity=100,  # 最多100条长期记忆
			
 
				+            importance_threshold=0.3,  # 检索和整合时关注重要性较高的记忆
			
 
				+            decay_factor=0.95  # 时间衰减系数
			
 
				         )
			
 
				 
			
 
				         # 创建记忆管理器
			
--- a/docs/chapter1/Chapter1-Introduction-to-Agents.md
+++ b/docs/chapter1/Chapter1-Introduction-to-Agents.md
@@ -557,7 +557,7 @@ Unlike workflows, agents based on large language models are **autonomous, goal-o
 
				 
			
 
				 In this process, there are no hard-coded rules like `if weather=sunny then recommend Summer Palace`. If the weather is "rainy," the agent will autonomously reason and recommend indoor venues such as the National Museum or Capital Museum. **This ability to dynamically reason and make decisions based on real-time information is the core value of agents.**
			
 
				 
			
 
				-## 1.4 Chapter Summary
			
 
				+## 1.5 Chapter Summary
			
 
				 
			
 
				 In this chapter, we embarked on an introductory journey to explore agents. Our journey began with the most fundamental questions:
			
 
				 
			
--- a/docs/chapter1/第一章初识智能体.md
+++ b/docs/chapter1/第一章初识智能体.md
@@ -564,7 +564,7 @@ Action: Finish[今天北京的天气是晴朗的，气温26摄氏度，非常适
 
				 
			
 
				 
			
 
				 
			
 
				-## 1.4 本章小结
			
 
				+## 1.5 本章小结
			
 
				 
			
 
				 在本章中，我们共同踏上了探索智能体的初识之旅。我们的旅程从最基本的问题开始：
			
 
				 
			
--- a/docs/chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md
+++ b/docs/chapter3/Chapter3-Fundamentals-of-Large-Language-Models.md
@@ -489,11 +489,13 @@ The working mode of the Decoder-Only architecture is called **Autoregressive**.
 
				 
			
 
				 The model is like playing a "word chain" game, constantly "reviewing" the content it has already written, then thinking about what the next word should be.
			
 
				 
			
 
				-You might ask, how does the decoder ensure that when predicting the `t`-th word, it doesn't "peek" at the answer of the `t+1`-th word?
			
 
				+You might ask: during training, the model is often given the complete text sequence at once, so how does it ensure that when learning to predict the next token, it does not "peek" at later answers?
			
 
				 
			
 
				 The answer is **Masked Self-Attention**. In the Decoder-Only architecture, this mechanism becomes crucial. Its working principle is very clever:
			
 
				 
			
 
				-After the self-attention mechanism calculates the attention score matrix (i.e., each word's attention score to all other words), but before performing Softmax normalization, the model applies a "mask." This mask replaces the scores corresponding to all tokens located after the current position (i.e., not yet observed) with a very large negative number. When this matrix with negative infinity scores goes through the Softmax function, the probabilities at these positions become 0. This way, when the model calculates the output at any position, it is mathematically prevented from attending to information after it. This mechanism ensures that when predicting the next word, the model can and only can rely on all information it has already seen, located before the current position, thereby ensuring fairness of prediction and coherence of logic.
			
 
				+During training, although a whole text sequence can be fed into the model in parallel, after the self-attention mechanism calculates the attention score matrix (i.e., each word's attention score to all other words) and before Softmax normalization, the model applies a causal mask. This mask replaces the scores corresponding to all tokens after the current position with a very large negative number. When this matrix goes through Softmax, the probabilities at those positions become 0. In this way, when the model calculates the representation at any position, it is mathematically prevented from attending to information after that position.
			
 
				+
			
 
				+During generation, the situation is even more direct: future tokens have not been generated yet, so the model can only use the already generated content as context and predict the next token step by step. Masked self-attention keeps the training objective consistent with autoregressive generation, ensuring that the model always relies only on information before the current position.
			
 
				 
			
 
				 **Advantages of Decoder-Only Architecture**
			
 
				 
			
--- a/docs/chapter3/第三章大语言模型基础.md
+++ b/docs/chapter3/第三章大语言模型基础.md
@@ -491,11 +491,13 @@ Decoder-Only 架构的工作模式被称为<strong>自回归 (Autoregressive)</s
 
				 
			
 
				 模型就像一个在玩“文字接龙”的游戏，它不断地“回顾”自己已经写下的内容，然后思考下一个字该写什么。
			
 
				 
			
 
				-你可能会问，解码器是如何保证在预测第 `t` 个词时，不去“偷看”第 `t+1` 个词的答案呢？
			
 
				+你可能会问，在训练时模型通常会一次性拿到完整文本，那么它是如何保证学习预测第 `t+1` 个词时，不去“偷看”更后面的答案呢？
			
 
				 
			
 
				 答案就是<strong>掩码自注意力 (Masked Self-Attention)</strong> 。在 Decoder-Only 架构中，这个机制变得至关重要。它的工作原理非常巧妙：
			
 
				 
			
 
				-在自注意力机制计算出注意力分数矩阵（即每个词对其他所有词的关注度得分）之后，但在进行 Softmax 归一化之前，模型会应用一个“掩码”。这个掩码会将所有位于当前位置之后（即目前尚未观测到）的词元对应的分数，替换为一个非常大的负数。当这个带有负无穷分数的矩阵经过 Softmax 函数时，这些位置的概率就会变为 0。这样一来，模型在计算任何一个位置的输出时，都从数学上被阻止了去关注它后面的信息。这种机制保证了模型在预测下一个词时，能且仅能依赖它已经见过的、位于当前位置之前的所有信息，从而确保了预测的公平性和逻辑的连贯性。
			
 
				+在训练阶段，虽然一段文本可以被并行送入模型，但在自注意力机制计算出注意力分数矩阵（即每个词对其他所有词的关注度得分）之后、进行 Softmax 归一化之前，模型会应用一个“因果掩码”。这个掩码会将所有位于当前位置之后的词元对应的分数替换为一个非常大的负数。当这个带有负无穷分数的矩阵经过 Softmax 函数时，这些位置的概率就会变为 0。这样一来，模型在计算任意位置的表示时，都从数学上被阻止了去关注它后面的信息。
			
 
				+
			
 
				+在生成阶段则更直接：未来的词元还没有生成出来，模型只能把已经生成的内容作为上下文输入，并一步步预测下一个词。掩码自注意力保证了训练时的学习方式与生成时的自回归使用方式保持一致，从而让模型始终只依赖当前位置之前的信息。
			
 
				 
			
 
				 <strong>Decoder-Only 架构的优势</strong>
			
 
				 
			
--- a/docs/chapter8/Chapter8-Memory-and-Retrieval.md
+++ b/docs/chapter8/Chapter8-Memory-and-Retrieval.md
@@ -51,11 +51,14 @@ agent = SimpleAgent(name="Learning Assistant", llm=HelloAgentsLLM())
 
				 response1 = agent.run("My name is Zhang San, I'm learning Python and have mastered basic syntax")
			
 
				 print(response1)  # "Great! Python basic syntax is an important foundation for programming..."
			
 
				  
			
 
				-# Second conversation (new session)
			
 
				+# Second conversation (new session, such as after restarting the program and creating a new Agent)
			
 
				+agent = SimpleAgent(name="Learning Assistant", llm=HelloAgentsLLM())
			
 
				 response2 = agent.run("Do you remember my learning progress?")
			
 
				 print(response2)  # "Sorry, I don't know your learning progress..."
			
 
				 ```
			
 
				 
			
 
				+Note that the `SimpleAgent` from Chapter 7 temporarily stores the current dialogue in `_history` within the same instance, so consecutive turns in the same process and instance can carry recent context. However, this history is only a temporary message list. It is not persisted across sessions and does not support long-term retrieval, forgetting, or consolidation.
			
 
				+
			
 
				 To solve this problem, our framework needs to introduce a memory system.
			
 
				 
			
 
				 (2) Limitation 2: Limitations of Model's Built-in Knowledge
			
--- a/docs/chapter8/第八章记忆与检索.md
+++ b/docs/chapter8/第八章记忆与检索.md
@@ -51,11 +51,14 @@ agent = SimpleAgent(name="学习助手", llm=HelloAgentsLLM())
 
				 response1 = agent.run("我叫张三，正在学习Python，目前掌握了基础语法")
			
 
				 print(response1)  # "很好！Python基础语法是编程的重要基础..."
			
 
				  
			
 
				-# 第二次对话（新的会话）
			
 
				+# 第二次对话（新的会话，例如重启程序后重新创建Agent）
			
 
				+agent = SimpleAgent(name="学习助手", llm=HelloAgentsLLM())
			
 
				 response2 = agent.run("你还记得我的学习进度吗？")
			
 
				 print(response2)  # "抱歉，我不知道您的学习进度..."
			
 
				 ```
			
 
				 
			
 
				+需要注意的是，第七章中的 `SimpleAgent` 会在同一个实例的 `_history` 中暂存当前对话，因此同一进程、同一实例内的连续对话可以携带最近上下文。但这种历史只是临时消息列表，不会跨会话持久化，也不能进行长期检索、遗忘和整合。
			
 
				+
			
 
				 要解决这个问题，我们的框架需要引入记忆系统。
			
 
				 
			
 
				 （2）局限二：模型内置知识的局限性