English | 中文

# Chapter 9 Context Engineering In previous chapters, we have introduced memory systems and RAG for agents. However, to enable agents to stably "think" and "act" in real complex scenarios, memory and retrieval alone are not enough—we need an engineering methodology to continuously and systematically construct appropriate "context" for the model. This is the theme of this chapter: Context Engineering. It focuses on "how to assemble and optimize input context in a reusable, measurable, and evolvable way before each model call", thereby improving correctness, robustness, and efficiency^[1][2]. To enable readers to quickly experience the complete functionality of this chapter, we provide a directly installable Python package. You can install the version corresponding to this chapter with the following command: ```bash pip install "hello-agents[all]==0.2.7" ``` This chapter mainly introduces the core concepts and practices of context engineering, and adds a context builder and two supporting tools to the HelloAgents framework: - **ContextBuilder** (`hello_agents/context/builder.py`): Context builder that implements the GSSC (Gather-Select-Structure-Compress) pipeline, providing a unified context management interface - **NoteTool** (`hello_agents/tools/builtin/note_tool.py`): Structured note tool that supports persistent memory management for agents - **TerminalTool** (`hello_agents/tools/builtin/terminal_tool.py`): Terminal tool that supports file system operations and just-in-time context retrieval for agents These components together constitute a complete context engineering solution, which is key to implementing long-term task management and agentic search, and will be introduced in detail in subsequent sections. In addition to installing the framework, you also need to configure the LLM API in `.env`. The examples in this chapter mainly use large language models for context management and intelligent decision-making. After configuration is complete, you can start the learning journey of this chapter! ## 9.1 What is Context Engineering After years of Prompt Engineering becoming the focus of applied AI, a new term has come to the forefront: **Context Engineering**. Today, building systems with language models is no longer just about finding the right phrasing and wording in prompts, but about answering a more macro question: **What kind of context configuration is most likely to make the model produce the behavior we expect?** The so-called "context" refers to the set of tokens included when sampling a large language model (LLM). The engineering problem at hand is to **optimize the utility of these tokens** under the inherent constraints of the LLM, in order to stably obtain expected results. To effectively harness LLMs, it is often necessary to "think in context"—that is: at any call, examine the overall state visible to the LLM and predict the behavior this state might induce.

Figure 9.1 Prompt engineering vs Context engineering

This section will explore the emerging context engineering and provide a refined mental model for building **controllable and effective** agents. **Context Engineering vs. Prompt Engineering** As shown in Figure 9.1, from the perspective of leading model vendors, context engineering is the natural evolution of prompt engineering. Prompt engineering focuses on how to write and organize LLM instructions to obtain better results (such as system prompt writing and structured strategies); while context engineering is **how to plan and maintain the "optimal information set (tokens)" during the inference stage**, which includes not only the prompt itself, but also all other information that will enter the context window. In the early stages of LLM engineering, prompts were often the main work, because most use cases (except daily chat) required fine-tuned prompt optimization for single-turn classification or text generation. As the name suggests, the core of prompt engineering is "how to write effective prompts", especially system prompts. However, as we begin to engineer stronger agents that work over longer time spans and across multiple inference rounds, we need strategies that can manage the **entire context state**—including system instructions, tools, MCP (Model Context Protocol), external data, message history, etc. An agent running in a loop will continuously generate data that may be relevant to the next round of inference. This information must be **periodically refined**. Therefore, the "art and technique" of context engineering lies in **identifying which content should enter the limited context window** from the continuously expanding "candidate information universe". ## 9.2 Why Context Engineering is Important Although models are getting faster and can handle larger data scales, we observe that: like humans, LLMs will "wander" or "get confused" at a certain point. Needle-in-a-haystack benchmarks reveal a phenomenon: **context rot**—as the number of tokens in the context window increases, the model's ability to accurately recall information from the context actually decreases. Different models may have smoother degradation curves, but this characteristic appears in almost all models. Therefore, **context must be viewed as a limited resource with diminishing marginal returns**. Just as humans have limited working memory capacity, LLMs also have an "attention budget". Each new token consumes part of this budget, so we need to be more careful about which tokens should be provided to the LLM. This scarcity is not accidental, but stems from the architectural constraints of LLMs. Transformers allow each token to establish associations with **all** tokens in the context, theoretically forming $n^2$ pairwise attention relationships. As the context length grows, the model's ability to model these pairwise relationships is "stretched thin", naturally creating tension between "context scale" and "attention concentration". In addition, the model's attention patterns come from the training data distribution—short sequences are usually more common than long sequences, so the model has less experience with "full-context dependencies" and fewer specialized parameters. Techniques such as position encoding interpolation can allow models to "adapt" to sequences longer than during training at inference time, but at the cost of some precision in understanding token positions. Overall, these factors together form a **performance gradient** rather than a "cliff-like" collapse: models are still powerful in long contexts, but compared to short contexts, their precision in information retrieval and long-range reasoning will decline. Based on the above reality, **conscious context engineering** becomes a necessity for building robust agents. ### 9.2.1 The "Anatomy" of Effective Context Under the constraint of "limited attention budget", the goal of excellent context engineering is: **maximize the probability of obtaining expected results with as few but high signal density tokens as possible**. In practice, we recommend engineering around the following components: - **System Prompt**: Clear and straightforward language, with information hierarchy at "just right" height. Common pitfalls at two extremes: - Over-hardcoding: Writing complex, fragile if-else logic in prompts, with high long-term maintenance costs and fragility. - Too vague: Only providing macro goals and generalized guidance, lacking **specific signals** for expected output or assuming incorrect "shared context". It is recommended to organize prompts into sections (such as , , tool guidance, output description, etc.), separated by XML/Markdown. Regardless of format, the pursuit is the **"minimum necessary information set" that can fully outline expected behavior** ("minimum" does not equal "shortest"). First run with the best model on the minimum prompt, then add clear instructions and examples based on failure modes. - **Tools**: Tools define the contract between the agent and the information/action space, and must promote efficiency: they must return **token-friendly** information while encouraging efficient agent behavior. Tools should: - Have single responsibilities with low overlap, clear interface semantics; - Be robust to errors; - Have clear and unambiguous parameter descriptions, fully leveraging the model's strengths in expression and reasoning. A common failure mode is "bloated tool sets": fuzzy functional boundaries, making the decision of "which tool to use" itself ambiguous. **If human engineers can't tell which tool to use, don't expect agents to do better**. Carefully identifying a "Minimum Viable Tool Set (MVTS)" can often significantly improve stability and maintainability in long-term interactions. - **Few-shot Examples**: Always recommend providing examples, but don't recommend stuffing "all boundary conditions" into prompts. Please carefully select a set of **diverse and typical** examples that directly portray "expected behavior". For LLMs, **good examples are worth a thousand words**. The overall guiding principle is: **sufficient but compact information**. As shown in Figure 9.2, this is dynamic retrieval entering runtime.

Figure 9.2 Calibrating the system prompt

### 9.2.2 Context Retrieval and Agentic Search A concise definition: **Agent = LLM autonomously calling tools in a loop**. As the capabilities of underlying models increase, the autonomy level of agents can be improved: they can more independently explore complex problem spaces and recover from errors. Engineering practice is gradually transitioning from "one-time retrieval before inference (embedding retrieval)" to "**Just-in-time (JIT) context**". The latter no longer preloads all relevant data, but maintains **lightweight references** (file paths, storage queries, URLs, etc.), dynamically loading required data through tools at runtime. This allows the model to write targeted queries, cache necessary results, and analyze large volumes of data with commands like head/tail—without stuffing entire data blocks into context at once. Its cognitive pattern is closer to humans: we don't memorize all information, but use external indexes like file systems, inboxes, bookmarks to extract on demand. In addition to storage efficiency, **metadata of references** itself can help refine behavior: directory hierarchy, naming conventions, timestamps, etc., all implicitly convey "purpose and timeliness". For example, tests/test_utils.py and src/core/test_utils.py have different semantic implications. Allowing agents to autonomously navigate and retrieve also enables **progressive disclosure**: each interaction step generates new context, which in turn guides the next decision—file size hints at complexity, naming hints at purpose, timestamps hint at relevance. Agents can build understanding layer by layer, keeping only the "currently necessary subset" in working memory, and using "note-taking" for supplementary persistence, thereby maintaining focus rather than being "dragged down by comprehensiveness". The trade-off is: runtime exploration is often slower than pre-computed retrieval, and requires "opinionated" engineering design to ensure the model has the right tools and heuristics. Without guidance, agents may misuse tools, chase dead ends, or miss key information, causing context waste. In many scenarios, a **hybrid strategy** is more effective: preload a small amount of "high-value" context to ensure speed, then allow agents to continue autonomous exploration on demand. The choice of boundaries depends on task dynamics and timeliness requirements. In engineering, you can preload files like "project convention descriptions (such as README/guides)", while providing primitives like glob, grep, allowing agents to retrieve specific files just-in-time, thereby bypassing the sunk costs of outdated indexes and complex syntax trees. ### 9.2.3 Context Engineering for Long-Horizon Tasks Long-horizon tasks require agents to maintain coherence, context consistency, and goal orientation in action sequences that exceed the context window. For example, large codebase migrations, systematic research spanning hours. Expecting to infinitely increase the context window cannot cure the problems of "context pollution" and relevance degradation, so engineering methods directly facing these constraints are needed: **Compaction**, **Structured note-taking**, and **Sub-agent architectures**. - **Compaction** - Definition: When a conversation approaches the context limit, perform high-fidelity summarization and restart a new context window with the summary to maintain long-range coherence. - Practice: Have the model compress and retain architectural decisions, unresolved defects, implementation details, discarding repetitive tool outputs and noise; the new window carries the compressed summary + a few recent highly relevant artifacts (such as "recently accessed files"). - Tuning suggestions: First optimize **recall** (ensure no key information is missed), then optimize **precision** (remove redundant content); a safe "light-touch" compression is to clean up "tool calls and results in deep history". - **Structured note-taking** - Definition: Also called "agent memory". Agents write key information to **persistent storage outside the context** at fixed frequencies, pulling it back on demand in subsequent stages. - Value: Maintain persistent state and dependencies with extremely low context overhead. For example, maintaining TODO lists, project NOTES.md, indexes of key conclusions/dependencies/blockers, maintaining progress and consistency across dozens of tool calls and multiple context resets. - Note: Equally effective in non-coding scenarios (such as long-term strategic tasks, goal management and statistical counting in games/simulations). Combined with MemoryTool from Chapter 8, file-based/vector-based external memory can be easily implemented and retrieved at runtime. - **Sub-agent architectures** - Idea: The main agent is responsible for high-level planning and synthesis, while multiple specialized sub-agents each dig deep, call tools, and explore in "clean context windows", finally only returning **condensed summaries** (typically 1,000–2,000 tokens). - Benefits: Achieve separation of concerns. Complex search contexts remain internal to sub-agents, while the main agent focuses on integration and reasoning; suitable for complex research/analysis tasks requiring parallel exploration. - Experience: Public multi-agent research systems show that this pattern has significant advantages over single-agent baselines in complex research tasks. Method trade-offs can follow these rules of thumb: - **Compaction**: Suitable for tasks requiring long conversation continuity, emphasizing context "relay". - **Structured note-taking**: Suitable for iterative development and research with milestones/phased results. - **Sub-agent architectures**: Suitable for complex research and analysis that can benefit from parallel exploration. Even as model capabilities continue to improve, "maintaining coherence and focus in long interactions" remains a core challenge in building robust agents. Careful and systematic context engineering will maintain its key value in the long term. ## 9.3 Practice in Hello-Agents: ContextBuilder This section will detail the context engineering practice in the HelloAgents framework. We will gradually demonstrate how to build a production-grade context management system from design motivation, core data structures, implementation details to complete cases. The design philosophy of ContextBuilder is "simple and efficient", removing unnecessary complexity, uniformly selecting based on "relevance + recency" scores, conforming to the engineering orientation of Agent modularity and maintainability. ### 9.3.1 Design Motivation and Goals Before building ContextBuilder, we first need to clarify its design goals and core value. An excellent context management system should solve the following key problems: 1. **Unified Entry**: Abstract "Gather-Select-Structure-Compress" as a reusable pipeline, reducing repetitive template code in Agent implementations. This unified interface design allows developers to avoid repeatedly writing context management logic in each Agent. 2. **Stable Form**: Output a context template with a fixed skeleton, facilitating debugging, A/B testing, and evaluation. We adopted a sectioned template structure: - `[Role & Policies]`: Clarify the Agent's role positioning and behavioral guidelines - `[Task]`: The specific task currently to be completed - `[State]`: The Agent's current state and context information - `[Evidence]`: Evidence information retrieved from external knowledge bases - `[Context]`: Historical dialogue and related memories - `[Output]`: Expected output format and requirements 3. **Budget Guardian**: Retain high-value information as much as possible within the token budget, providing fallback compression strategies for over-limit contexts. This ensures that even in scenarios with huge amounts of information, the system can run stably. 4. **Minimum Rules**: Do not introduce classification dimensions such as source/priority to avoid complexity growth. Practice shows that a simple scoring mechanism based on relevance and recency is effective enough in most scenarios. ### 9.3.2 Core Data Structures The implementation of ContextBuilder relies on two core data structures that define the system's configuration and information units. (1) ContextPacket: Candidate Information Package ```python from dataclasses import dataclass from typing import Optional, Dict, Any from datetime import datetime @dataclass class ContextPacket: """Candidate information package Attributes: content: Information content timestamp: Timestamp token_count: Token count relevance_score: Relevance score (0.0-1.0) metadata: Optional metadata """ content: str timestamp: datetime token_count: int relevance_score: float = 0.5 metadata: Optional[Dict[str, Any]] = None def __post_init__(self): """Post-initialization processing""" if self.metadata is None: self.metadata = {} # Ensure relevance score is within valid range self.relevance_score = max(0.0, min(1.0, self.relevance_score)) ``` `ContextPacket` is the basic unit of information in the system. Each candidate information is encapsulated as a ContextPacket, containing core attributes such as content, timestamp, token count, and relevance score. This unified data structure simplifies subsequent selection and sorting logic. (2) ContextConfig: Configuration Management ```python @dataclass class ContextConfig: """Context building configuration Attributes: max_tokens: Maximum token count reserve_ratio: Ratio reserved for system instructions (0.0-1.0) min_relevance: Minimum relevance threshold enable_compression: Whether to enable compression recency_weight: Recency weight (0.0-1.0) relevance_weight: Relevance weight (0.0-1.0) """ max_tokens: int = 3000 reserve_ratio: float = 0.2 min_relevance: float = 0.1 enable_compression: bool = True recency_weight: float = 0.3 relevance_weight: float = 0.7 def __post_init__(self): """Validate configuration parameters""" assert 0.0 <= self.reserve_ratio <= 1.0, "reserve_ratio must be in [0, 1] range" assert 0.0 <= self.min_relevance <= 1.0, "min_relevance must be in [0, 1] range" assert abs(self.recency_weight + self.relevance_weight - 1.0) < 1e-6, \ "recency_weight + relevance_weight must equal 1.0" ``` `ContextConfig` encapsulates all configurable parameters, making system behavior flexibly adjustable. Particularly noteworthy is the `reserve_ratio` parameter, which ensures that key information such as system instructions always has sufficient space and will not be squeezed out by other information. ### 9.3.3 GSSC Pipeline Detailed Explanation The core of ContextBuilder is the GSSC (Gather-Select-Structure-Compress) pipeline, which decomposes the context building process into four clear stages. Let's dive into the implementation details of each stage. (1) Gather: Multi-source Information Collection The first stage is to collect candidate information from multiple sources. The key to this stage is fault tolerance and flexibility. ```python def _gather( self, user_query: str, conversation_history: Optional[List[Message]] = None, system_instructions: Optional[str] = None, custom_packets: Optional[List[ContextPacket]] = None ) -> List[ContextPacket]: """Collect all candidate information Args: user_query: User query conversation_history: Conversation history system_instructions: System instructions custom_packets: Custom information packages Returns: List[ContextPacket]: Candidate information list """ packets = [] # 1. Add system instructions (highest priority, not scored) if system_instructions: packets.append(ContextPacket( content=system_instructions, timestamp=datetime.now(), token_count=self._count_tokens(system_instructions), relevance_score=1.0, # System instructions always retained metadata={"type": "system_instruction", "priority": "high"} )) # 2. Retrieve relevant memories from memory system if self.memory_tool: try: memory_results = self.memory_tool.execute( "search", query=user_query, limit=10, min_importance=0.3 ) # Parse memory results and convert to ContextPacket memory_packets = self._parse_memory_results(memory_results, user_query) packets.extend(memory_packets) except Exception as e: print(f"[WARNING] Memory retrieval failed: {e}") # 3. Retrieve relevant knowledge from RAG system if self.rag_tool: try: rag_results = self.rag_tool.execute( "search", query=user_query, limit=5, min_score=0.3 ) # Parse RAG results and convert to ContextPacket rag_packets = self._parse_rag_results(rag_results, user_query) packets.extend(rag_packets) except Exception as e: print(f"[WARNING] RAG retrieval failed: {e}") # 4. Add conversation history (only keep recent N entries) if conversation_history: recent_history = conversation_history[-5:] # Default keep recent 5 entries for msg in recent_history: packets.append(ContextPacket( content=f"{msg.role}: {msg.content}", timestamp=msg.timestamp if hasattr(msg, 'timestamp') else datetime.now(), token_count=self._count_tokens(msg.content), relevance_score=0.6, # Base relevance of historical messages metadata={"type": "conversation_history", "role": msg.role} )) # 5. Add custom information packages if custom_packets: packets.extend(custom_packets) print(f"[ContextBuilder] Collected {len(packets)} candidate information packages") return packets ``` This implementation demonstrates several important design considerations: - **Fault Tolerance Mechanism**: Each external data source call is wrapped in try-except, ensuring that failure of a single source does not affect the overall process - **Priority Handling**: System instructions are marked as high priority, ensuring they are always retained - **History Limitation**: Conversation history only keeps the most recent entries, avoiding the context window being occupied by historical information (2) Select: Intelligent Information Selection The second stage is to score and select candidate information based on relevance and recency. This is the core of the entire pipeline and directly determines the quality of the final context. ```python def _select( self, packets: List[ContextPacket], user_query: str, available_tokens: int ) -> List[ContextPacket]: """Select the most relevant information packages Args: packets: Candidate information package list user_query: User query (for calculating relevance) available_tokens: Available token count Returns: List[ContextPacket]: Selected information package list """ # 1. Separate system instructions and other information system_packets = [p for p in packets if p.metadata.get("type") == "system_instruction"] other_packets = [p for p in packets if p.metadata.get("type") != "system_instruction"] # 2. Calculate tokens occupied by system instructions system_tokens = sum(p.token_count for p in system_packets) remaining_tokens = available_tokens - system_tokens if remaining_tokens <= 0: print("[WARNING] System instructions have occupied all token budget") return system_packets # 3. Calculate comprehensive scores for other information scored_packets = [] for packet in other_packets: # Calculate relevance score (if not yet calculated) if packet.relevance_score == 0.5: # Default value, needs recalculation relevance = self._calculate_relevance(packet.content, user_query) packet.relevance_score = relevance # Calculate recency score recency = self._calculate_recency(packet.timestamp) # Combined score = relevance weight × relevance + recency weight × recency combined_score = ( self.config.relevance_weight * packet.relevance_score + self.config.recency_weight * recency ) # Filter information below minimum relevance threshold if packet.relevance_score >= self.config.min_relevance: scored_packets.append((combined_score, packet)) # 4. Sort by score in descending order scored_packets.sort(key=lambda x: x[0], reverse=True) # 5. Greedy selection: fill from high to low score until token limit is reached selected = system_packets.copy() current_tokens = system_tokens for score, packet in scored_packets: if current_tokens + packet.token_count <= available_tokens: selected.append(packet) current_tokens += packet.token_count else: # Token budget is full, stop selection break print(f"[ContextBuilder] Selected {len(selected)} information packages, total {current_tokens} tokens") return selected def _calculate_relevance(self, content: str, query: str) -> float: """Calculate relevance between content and query Uses simple keyword overlap algorithm. In production, can be replaced with vector similarity calculation. Args: content: Content text query: Query text Returns: float: Relevance score (0.0-1.0) """ # Tokenization (simple implementation, can use more complex tokenizers) content_words = set(content.lower().split()) query_words = set(query.lower().split()) if not query_words: return 0.0 # Jaccard similarity intersection = content_words & query_words union = content_words | query_words return len(intersection) / len(union) if union else 0.0 def _calculate_recency(self, timestamp: datetime) -> float: """Calculate temporal recency score Uses exponential decay model, maintains high score within 24 hours, then gradually decays. Args: timestamp: Information timestamp Returns: float: Recency score (0.0-1.0) """ import math age_hours = (datetime.now() - timestamp).total_seconds() / 3600 # Exponential decay: maintain high score within 24 hours, then gradually decay decay_factor = 0.1 # Decay coefficient recency_score = math.exp(-decay_factor * age_hours / 24) return max(0.1, min(1.0, recency_score)) # Limit to [0.1, 1.0] range ``` The core algorithm of the selection stage embodies several important engineering considerations: - **Scoring Mechanism**: Uses weighted combination of relevance and recency, with configurable weights - **Greedy Algorithm**: Fills from high to low score, ensuring selection of the most valuable information within limited budget - **Filtering Mechanism**: Filters low-quality information through the `min_relevance` parameter (3) Structure: Structured Output The third stage is to organize selected information into a structured context template. ```python def _structure(self, selected_packets: List[ContextPacket], user_query: str) -> str: """Organize selected information packages into structured context template Args: selected_packets: Selected information package list user_query: User query Returns: str: Structured context string """ # Group by type system_instructions = [] evidence = [] context = [] for packet in selected_packets: packet_type = packet.metadata.get("type", "general") if packet_type == "system_instruction": system_instructions.append(packet.content) elif packet_type in ["rag_result", "knowledge"]: evidence.append(packet.content) else: context.append(packet.content) # Build structured template sections = [] # [Role & Policies] if system_instructions: sections.append("[Role & Policies]\n" + "\n".join(system_instructions)) # [Task] sections.append(f"[Task]\n{user_query}") # [Evidence] if evidence: sections.append("[Evidence]\n" + "\n---\n".join(evidence)) # [Context] if context: sections.append("[Context]\n" + "\n".join(context)) # [Output] sections.append("[Output]\nPlease provide accurate, evidence-based answers based on the above information.") return "\n\n".join(sections) ``` The structuring stage organizes scattered information packages into clear sections. This design has several advantages: - **Readability**: Clear sections make it easier for both humans and models to understand the context structure - **Debuggability**: Problem localization is easier, can quickly identify which area has problematic information - **Extensibility**: Adding new information sources only requires creating new sections (4) Compress: Fallback Compression The fourth stage is to compress over-limit contexts. ```python def _compress(self, context: str, max_tokens: int) -> str: """Compress over-limit context Args: context: Original context max_tokens: Maximum token limit Returns: str: Compressed context """ current_tokens = self._count_tokens(context) if current_tokens <= max_tokens: return context # No compression needed print(f"[ContextBuilder] Context over limit ({current_tokens} > {max_tokens}), executing compression") # Section compression: maintain structural integrity sections = context.split("\n\n") compressed_sections = [] current_total = 0 for section in sections: section_tokens = self._count_tokens(section) if current_total + section_tokens <= max_tokens: # Fully retain compressed_sections.append(section) current_total += section_tokens else: # Partially retain remaining_tokens = max_tokens - current_total if remaining_tokens > 50: # Retain at least 50 tokens # Simple truncation (can use LLM summarization in production) truncated = self._truncate_text(section, remaining_tokens) compressed_sections.append(truncated + "\n[... Content compressed ...]") break compressed_context = "\n\n".join(compressed_sections) final_tokens = self._count_tokens(compressed_context) print(f"[ContextBuilder] Compression complete: {current_tokens} -> {final_tokens} tokens") return compressed_context def _truncate_text(self, text: str, max_tokens: int) -> str: """Truncate text to specified token count Args: text: Original text max_tokens: Maximum token count Returns: str: Truncated text """ # Simple implementation: estimate by character ratio # Should use precise tokenizer in production char_per_token = len(text) / self._count_tokens(text) if self._count_tokens(text) > 0 else 4 max_chars = int(max_tokens * char_per_token) return text[:max_chars] def _count_tokens(self, text: str) -> int: """Estimate token count of text Args: text: Text content Returns: int: Token count """ # Simple estimation: Chinese 1 char ≈ 1 token, English 1 word ≈ 1.3 tokens # Should use actual tokenizer in production chinese_chars = sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff') english_words = len([w for w in text.split() if w]) return int(chinese_chars + english_words * 1.3) ``` The design of the compression stage embodies the principle of "maintaining structural integrity". Even when the token budget is tight, it tries to retain key information from each section. ### 9.3.4 Complete Usage Example Now let's demonstrate how to use ContextBuilder in actual projects through a complete example. (1) Basic Usage ```python from hello_agents.context import ContextBuilder, ContextConfig from hello_agents.tools import MemoryTool, RAGTool from hello_agents.core.message import Message from datetime import datetime # 1. Initialize tools memory_tool = MemoryTool(user_id="user123") rag_tool = RAGTool(knowledge_base_path="./knowledge_base") # 2. Create ContextBuilder config = ContextConfig( max_tokens=3000, reserve_ratio=0.2, min_relevance=0.2, enable_compression=True ) builder = ContextBuilder( memory_tool=memory_tool, rag_tool=rag_tool, config=config ) # 3. Prepare conversation history conversation_history = [ Message(content="I'm developing a data analysis tool", role="user", timestamp=datetime.now()), Message(content="Great! Data analysis tools usually need to handle large amounts of data. What tech stack do you plan to use?", role="assistant", timestamp=datetime.now()), Message(content="I plan to use Python and Pandas, and have completed the CSV reading module", role="user", timestamp=datetime.now()), Message(content="Good choice! Pandas is very powerful for data processing. Next you may need to consider data cleaning and transformation.", role="assistant", timestamp=datetime.now()), ] # 4. Add some memories memory_tool.execute( "add", content="User is developing a data analysis tool using Python and Pandas", memory_type="semantic", importance=0.8 ) memory_tool.execute( "add", content="Completed development of CSV reading module", memory_type="episodic", importance=0.7 ) # 5. Build context context = builder.build( user_query="How to optimize Pandas memory usage?", conversation_history=conversation_history, system_instructions="You are a senior Python data engineering consultant. Your answers need to: 1) Provide specific actionable advice 2) Explain technical principles 3) Provide code examples" ) print("=" * 80) print("Built context:") print("=" * 80) print(context) print("=" * 80) ``` (2) Running Effect Demonstration After running the above code, you will see the following structured context output: ``` ================================================================================ Built context: ================================================================================ [Role & Policies] You are a senior Python data engineering consultant. Your answers need to: 1) Provide specific actionable advice 2) Explain technical principles 3) Provide code examples [Task] How to optimize Pandas memory usage? [Evidence] Core strategies for Pandas memory optimization include: 1. Use appropriate data types (such as category instead of object) 2. Read large files in chunks 3. Use chunksize parameter --- Data type optimization can significantly reduce memory usage. For example, downgrading int64 to int32 can save 50% memory. [Context] user: I'm developing a data analysis tool assistant: Great! Data analysis tools usually need to handle large amounts of data. What tech stack do you plan to use? user: I plan to use Python and Pandas, and have completed the CSV reading module assistant: Good choice! Pandas is very powerful for data processing. Next you may need to consider data cleaning and transformation. Memory: User is developing a data analysis tool using Python and Pandas Memory: Completed development of CSV reading module [Output] Please provide accurate, evidence-based answers based on the above information. ================================================================================ ``` This structured context contains all necessary information: - **[Role & Policies]**: Clarifies the AI's role and answer requirements - **[Task]**: Clearly expresses the user's question - **[Evidence]**: Relevant knowledge retrieved from the RAG system - **[Context]**: Conversation history and related memories, providing sufficient background information - **[Output]**: Guides the LLM on how to organize the answer (3) Integration with Agent Finally, let's demonstrate how to integrate ContextBuilder into an Agent: ```python from hello_agents import SimpleAgent, HelloAgentsLLM, ToolRegistry from hello_agents.context import ContextBuilder, ContextConfig from hello_agents.tools import MemoryTool, RAGTool class ContextAwareAgent(SimpleAgent): """Agent with context awareness capability""" def __init__(self, name: str, llm: HelloAgentsLLM, **kwargs): super().__init__(name=name, llm=llm, system_prompt=kwargs.get("system_prompt", "")) # Initialize context builder self.memory_tool = MemoryTool(user_id=kwargs.get("user_id", "default")) self.rag_tool = RAGTool(knowledge_base_path=kwargs.get("knowledge_base_path", "./kb")) self.context_builder = ContextBuilder( memory_tool=self.memory_tool, rag_tool=self.rag_tool, config=ContextConfig(max_tokens=4000) ) self.conversation_history = [] def run(self, user_input: str) -> str: """Run Agent, automatically build optimized context""" # 1. Use ContextBuilder to build optimized context optimized_context = self.context_builder.build( user_query=user_input, conversation_history=self.conversation_history, system_instructions=self.system_prompt ) # 2. Call LLM with optimized context messages = [ {"role": "system", "content": optimized_context}, {"role": "user", "content": user_input} ] response = self.llm.invoke(messages) # 3. Update conversation history from hello_agents.core.message import Message from datetime import datetime self.conversation_history.append( Message(content=user_input, role="user", timestamp=datetime.now()) ) self.conversation_history.append( Message(content=response, role="assistant", timestamp=datetime.now()) ) # 4. Record important interactions to memory system self.memory_tool.execute( "add", content=f"Q: {user_input}\nA: {response[:200]}...", # Summary memory_type="episodic", importance=0.6 ) return response # Usage example agent = ContextAwareAgent( name="Data Analysis Consultant", llm=HelloAgentsLLM(), system_prompt="You are a senior Python data engineering consultant.", user_id="user123", knowledge_base_path="./data_science_kb" ) response = agent.run("How to optimize Pandas memory usage?") print(response) ``` Through this approach, ContextBuilder becomes the "context management brain" of the Agent, automatically handling information collection, filtering, and organization, allowing the Agent to always reason and generate under optimal context. ### 9.3.5 Best Practices and Optimization Recommendations When actually applying ContextBuilder, the following best practices are worth noting: 1. **Dynamically adjust token budget**: Dynamically adjust `max_tokens` based on task complexity, use smaller budgets for simple tasks, increase budgets for complex tasks. 2. **Relevance calculation optimization**: In production environments, replace simple keyword overlap with vector similarity calculation to improve retrieval quality. 3. **Caching mechanism**: For unchanging system instructions and knowledge base content, implement caching mechanisms to avoid repeated calculations. 4. **Monitoring and logging**: Record statistical information for each context build (number of selected information, token usage rate, etc.) for subsequent optimization. 5. **A/B testing**: For key parameters (such as relevance weight, recency weight), find optimal configuration through A/B testing. ## 9.4 NoteTool: Structured Notes NoteTool is a structured external memory component provided for "long-horizon tasks". It uses Markdown files as carriers, with YAML front matter in the header to record key information, and the body to record status, conclusions, blockers, and action items. This design combines human readability, version control friendliness, and ease of re-injecting into context, making it an important tool for building long-horizon agents. ### 9.4.1 Design Philosophy and Application Scenarios Before diving into implementation details, let's first understand the design philosophy and typical application scenarios of NoteTool. (1) Why do we need NoteTool? In Chapter 8, we introduced MemoryTool, which provides powerful memory management capabilities. However, MemoryTool mainly focuses on **conversational memory**—short-term working memory, episodic memory, and semantic memory. For **project-based tasks** that require long-term tracking and structured management, we need a lighter, more human-friendly recording method. NoteTool fills this gap by providing: - **Structured recording**: Uses Markdown + YAML format, suitable for both machine parsing and human reading and editing - **Version friendly**: Plain text format, naturally supports version control systems like Git - **Low overhead**: No complex database operations required, suitable for lightweight state tracking - **Flexible categorization**: Flexibly organize notes through `type` and `tags`, supporting multi-dimensional retrieval (2) Typical Application Scenarios NoteTool is particularly suitable for the following scenarios: **Scenario 1: Long-term Project Tracking** Imagine an agent is assisting with a large codebase refactoring task, which may take days or even weeks. NoteTool can record: - `task_state`: Current stage task status and progress - `conclusion`: Key conclusions after each stage ends - `blocker`: Problems and blocking points encountered - `action`: Next action plan ```python # Record task status notes.run({ "action": "create", "title": "Refactoring Project - Phase 1", "content": "Completed refactoring of data model layer, test coverage reached 85%. Next will refactor business logic layer.", "note_type": "task_state", "tags": ["refactoring", "phase1"] }) # Record blocker notes.run({ "action": "create", "title": "Dependency Conflict Issue", "content": "Found some third-party library versions incompatible, need to resolve. Impact scope: 3 modules in business logic layer.", "note_type": "blocker", "tags": ["dependency", "urgent"] }) ``` **Scenario 2: Research Task Management** An intelligent research assistant conducting literature review can use NoteTool to record: - Core viewpoints of each paper (`conclusion`) - Topics to be investigated in depth (`action`) - Important references (`reference`) **Scenario 3: Cooperation with ContextBuilder** Before each round of dialogue, the Agent can retrieve relevant notes through `search` or `list` operations and inject them into the context: ```python # In Agent's run method def run(self, user_input: str) -> str: # 1. Retrieve relevant notes relevant_notes = self.note_tool.run({ "action": "search", "query": user_input, "limit": 3 }) # 2. Convert note content to ContextPacket note_packets = [] for note in relevant_notes: note_packets.append(ContextPacket( content=note['content'], timestamp=note['updated_at'], token_count=self._count_tokens(note['content']), relevance_score=0.7, metadata={"type": "note", "note_type": note['type']} )) # 3. Pass notes when building context context = self.context_builder.build( user_query=user_input, custom_packets=note_packets, ... ) ``` ### 9.4.2 Storage Format Detailed Explanation NoteTool adopts a hybrid format of Markdown + YAML, which balances structure and readability. (1) Note File Format Each note is an independent `.md` file with the following format: ```markdown --- id: note_20250119_153000_0 title: Project Progress - Phase 1 type: task_state tags: [refactoring, phase1, backend] created_at: 2025-01-19T15:30:00 updated_at: 2025-01-19T15:30:00 --- # Project Progress - Phase 1 ## Completion Status Completed refactoring of data model layer, main changes include: 1. Unified entity class naming conventions 2. Introduced type hints to improve code maintainability 3. Optimized database query performance ## Test Coverage - Unit test coverage: 85% - Integration test coverage: 70% ## Next Steps 1. Refactor business logic layer 2. Resolve dependency conflict issues 3. Increase integration test coverage to 85% ``` Advantages of this format: - **YAML metadata**: Machine-parsable, supports precise field extraction and retrieval - **Markdown body**: Human-readable, supports rich formatting (headings, lists, code blocks, etc.) - **Filename as ID**: Simplifies management, each note's filename is its unique identifier (2) Index File NoteTool maintains a `notes_index.json` file for quick retrieval and management of notes: ```json { "note_20250119_153000_0": { "id": "note_20250119_153000_0", "title": "Project Progress - Phase 1", "type": "task_state", "tags": ["refactoring", "phase1", "backend"], "created_at": "2025-01-19T15:30:00", "updated_at": "2025-01-19T15:30:00", "file_path": "./notes/note_20250119_153000_0.md" } } ``` The role of this index file: - **Quick retrieval**: No need to open each file, search directly from the index - **Metadata management**: Centrally manage metadata for all notes - **Integrity check**: Can detect missing or corrupted files ### 9.4.3 Core Operations Detailed Explanation NoteTool provides seven core operations covering the complete lifecycle management of notes. (1) create: Create Note ```python def _create_note( self, title: str, content: str, note_type: str = "general", tags: Optional[List[str]] = None ) -> str: """Create note Args: title: Note title content: Note content (Markdown format) note_type: Note type (task_state/conclusion/blocker/action/reference/general) tags: Tag list Returns: str: Note ID """ from datetime import datetime # 1. Generate unique ID timestamp = datetime.now().strftime('%Y%m%d_%H%M%S') note_id = f"note_{timestamp}_{len(self.index)}" # 2. Build metadata metadata = { "id": note_id, "title": title, "type": note_type, "tags": tags or [], "created_at": datetime.now().isoformat(), "updated_at": datetime.now().isoformat() } # 3. Build complete Markdown file content md_content = self._build_markdown(metadata, content) # 4. Save to file file_path = os.path.join(self.workspace, f"{note_id}.md") with open(file_path, 'w', encoding='utf-8') as f: f.write(md_content) # 5. Update index metadata["file_path"] = file_path self.index[note_id] = metadata self._save_index() return note_id def _build_markdown(self, metadata: Dict, content: str) -> str: """Build Markdown file content (YAML + body)""" import yaml # YAML front matter yaml_header = yaml.dump(metadata, allow_unicode=True, sort_keys=False) # Combined format return f"---\n{yaml_header}---\n\n{content}" ``` Usage example: ```python from hello_agents.tools import NoteTool notes = NoteTool(workspace="./project_notes") note_id = notes.run({ "action": "create", "title": "Refactoring Project - Phase 1", "content": """## Completion Status Completed refactoring of data model layer, test coverage reached 85%. ## Next Steps Refactor business logic layer""", "note_type": "task_state", "tags": ["refactoring", "phase1"] }) print(f"✅ Note created successfully, ID: {note_id}") ``` (2) read: Read Note ```python def _read_note(self, note_id: str) -> Dict: """Read note content Args: note_id: Note ID Returns: Dict: Dictionary containing metadata and content """ if note_id not in self.index: raise ValueError(f"Note does not exist: {note_id}") file_path = self.index[note_id]["file_path"] # Read file with open(file_path, 'r', encoding='utf-8') as f: raw_content = f.read() # Parse YAML metadata and Markdown body metadata, content = self._parse_markdown(raw_content) return { "metadata": metadata, "content": content } def _parse_markdown(self, raw_content: str) -> Tuple[Dict, str]: """Parse Markdown file (separate YAML and body)""" import yaml # Find YAML delimiters parts = raw_content.split('---\n', 2) if len(parts) >= 3: # Has YAML front matter yaml_str = parts[1] content = parts[2].strip() metadata = yaml.safe_load(yaml_str) else: # No metadata, all as body metadata = {} content = raw_content.strip() return metadata, content ``` (3) update: Update Note ```python def _update_note( self, note_id: str, title: Optional[str] = None, content: Optional[str] = None, note_type: Optional[str] = None, tags: Optional[List[str]] = None ) -> str: """Update note Args: note_id: Note ID title: New title (optional) content: New content (optional) note_type: New type (optional) tags: New tags (optional) Returns: str: Operation result message """ if note_id not in self.index: raise ValueError(f"Note does not exist: {note_id}") # 1. Read existing note note = self._read_note(note_id) metadata = note["metadata"] old_content = note["content"] # 2. Update fields if title: metadata["title"] = title if note_type: metadata["type"] = note_type if tags is not None: metadata["tags"] = tags if content is not None: old_content = content # Update timestamp from datetime import datetime metadata["updated_at"] = datetime.now().isoformat() # 3. Rebuild and save md_content = self._build_markdown(metadata, old_content) file_path = metadata["file_path"] with open(file_path, 'w', encoding='utf-8') as f: f.write(md_content) # 4. Update index self.index[note_id] = metadata self._save_index() return f"✅ Note updated: {metadata['title']}" ``` (4) search: Search Notes ```python def _search_notes( self, query: str, limit: int = 10, note_type: Optional[str] = None, tags: Optional[List[str]] = None ) -> List[Dict]: """Search notes Args: query: Search keyword limit: Return quantity limit note_type: Filter by type (optional) tags: Filter by tags (optional) Returns: List[Dict]: List of matching notes """ results = [] query_lower = query.lower() for note_id, metadata in self.index.items(): # Type filter if note_type and metadata.get("type") != note_type: continue # Tag filter if tags: note_tags = set(metadata.get("tags", [])) if not note_tags.intersection(tags): continue # Read note content try: note = self._read_note(note_id) content = note["content"] title = metadata.get("title", "") # Search in title and content if query_lower in title.lower() or query_lower in content.lower(): results.append({ "note_id": note_id, "title": title, "type": metadata.get("type"), "tags": metadata.get("tags", []), "content": content, "updated_at": metadata.get("updated_at") }) except Exception as e: print(f"[WARNING] Failed to read note {note_id}: {e}") continue # Sort by update time results.sort(key=lambda x: x["updated_at"], reverse=True) return results[:limit] ``` (5) list: List Notes ```python def _list_notes( self, note_type: Optional[str] = None, tags: Optional[List[str]] = None, limit: int = 20 ) -> List[Dict]: """List notes (in reverse chronological order by update time) Args: note_type: Filter by type (optional) tags: Filter by tags (optional) limit: Return quantity limit Returns: List[Dict]: List of note metadata """ results = [] for note_id, metadata in self.index.items(): # Type filter if note_type and metadata.get("type") != note_type: continue # Tag filter if tags: note_tags = set(metadata.get("tags", [])) if not note_tags.intersection(tags): continue results.append(metadata) # Sort by update time results.sort(key=lambda x: x.get("updated_at", ""), reverse=True) return results[:limit] ``` (6) summary: Note Summary ```python def _summary(self) -> Dict[str, Any]: """Generate note summary statistics Returns: Dict: Statistical information """ total_count = len(self.index) # Count by type type_counts = {} for metadata in self.index.values(): note_type = metadata.get("type", "general") type_counts[note_type] = type_counts.get(note_type, 0) + 1 # Recently updated notes recent_notes = sorted( self.index.values(), key=lambda x: x.get("updated_at", ""), reverse=True )[:5] return { "total_notes": total_count, "type_distribution": type_counts, "recent_notes": [ { "id": note["id"], "title": note.get("title", ""), "type": note.get("type"), "updated_at": note.get("updated_at") } for note in recent_notes ] } ``` (7) delete: Delete Note ```python def _delete_note(self, note_id: str) -> str: """Delete note Args: note_id: Note ID Returns: str: Operation result message """ if note_id not in self.index: raise ValueError(f"Note does not exist: {note_id}") # 1. Delete file file_path = self.index[note_id]["file_path"] if os.path.exists(file_path): os.remove(file_path) # 2. Remove from index title = self.index[note_id].get("title", note_id) del self.index[note_id] self._save_index() return f"✅ Note deleted: {title}" ``` ### 9.4.4 Deep Integration with ContextBuilder The true power of NoteTool lies in its combined use with ContextBuilder. Let's demonstrate this integration through a complete case study. (1) Scenario Setup Suppose we are building a long-term project assistant that needs to: 1. Record phased progress of the project 2. Track pending issues 3. Automatically review relevant notes during each conversation 4. Provide coherent recommendations based on historical notes (2) Implementation Example ```python from hello_agents import SimpleAgent, HelloAgentsLLM from hello_agents.context import ContextBuilder, ContextConfig, ContextPacket from hello_agents.tools import MemoryTool, RAGTool, NoteTool from datetime import datetime class ProjectAssistant(SimpleAgent): """Long-term project assistant, integrating NoteTool and ContextBuilder""" def __init__(self, name: str, project_name: str, **kwargs): super().__init__(name=name, llm=HelloAgentsLLM(), **kwargs) self.project_name = project_name # Initialize tools self.memory_tool = MemoryTool(user_id=project_name) self.rag_tool = RAGTool(knowledge_base_path=f"./{project_name}_kb") self.note_tool = NoteTool(workspace=f"./{project_name}_notes") # Initialize context builder self.context_builder = ContextBuilder( memory_tool=self.memory_tool, rag_tool=self.rag_tool, config=ContextConfig(max_tokens=4000) ) self.conversation_history = [] def run(self, user_input: str, note_as_action: bool = False) -> str: """Run assistant, automatically integrate notes""" # 1. Retrieve relevant notes from NoteTool relevant_notes = self._retrieve_relevant_notes(user_input) # 2. Convert notes to ContextPacket note_packets = self._notes_to_packets(relevant_notes) # 3. Build optimized context context = self.context_builder.build( user_query=user_input, conversation_history=self.conversation_history, system_instructions=self._build_system_instructions(), custom_packets=note_packets ) # 4. Call LLM response = self.llm.invoke(context) # 5. If needed, record interaction as note if note_as_action: self._save_as_note(user_input, response) # 6. Update conversation history self._update_history(user_input, response) return response def _retrieve_relevant_notes(self, query: str, limit: int = 3) -> List[Dict]: """Retrieve relevant notes""" try: # Prioritize retrieving blocker and action type notes blockers = self.note_tool.run({ "action": "list", "note_type": "blocker", "limit": 2 }) # General search search_results = self.note_tool.run({ "action": "search", "query": query, "limit": limit }) # Merge and deduplicate all_notes = {note['note_id']: note for note in blockers + search_results} return list(all_notes.values())[:limit] except Exception as e: print(f"[WARNING] Note retrieval failed: {e}") return [] def _notes_to_packets(self, notes: List[Dict]) -> List[ContextPacket]: """Convert notes to context packets""" packets = [] for note in notes: content = f"[Note: {note['title']}]\n{note['content']}" packets.append(ContextPacket( content=content, timestamp=datetime.fromisoformat(note['updated_at']), token_count=len(content) // 4, # Simple estimation relevance_score=0.75, # Notes have high relevance metadata={ "type": "note", "note_type": note['type'], "note_id": note['note_id'] } )) return packets def _save_as_note(self, user_input: str, response: str): """Save interaction as note""" try: # Determine what type of note to save if "problem" in user_input.lower() or "blocker" in user_input.lower(): note_type = "blocker" elif "plan" in user_input.lower() or "next" in user_input.lower(): note_type = "action" else: note_type = "conclusion" self.note_tool.run({ "action": "create", "title": f"{user_input[:30]}...", "content": f"## Question\n{user_input}\n\n## Analysis\n{response}", "note_type": note_type, "tags": [self.project_name, "auto_generated"] }) except Exception as e: print(f"[WARNING] Failed to save note: {e}") def _build_system_instructions(self) -> str: """Build system instructions""" return f"""You are a long-term assistant for the {self.project_name} project. Your responsibilities: 1. Provide coherent recommendations based on historical notes 2. Track project progress and pending issues 3. Reference relevant historical notes when answering 4. Provide specific, actionable next-step recommendations Notes: - Prioritize issues marked as blockers - Indicate source of basis in recommendations (notes, memory, or knowledge base) - Maintain awareness of overall project progress""" def _update_history(self, user_input: str, response: str): """Update conversation history""" from hello_agents.core.message import Message self.conversation_history.append( Message(content=user_input, role="user", timestamp=datetime.now()) ) self.conversation_history.append( Message(content=response, role="assistant", timestamp=datetime.now()) ) # Limit history length if len(self.conversation_history) > 10: self.conversation_history = self.conversation_history[-10:] # Usage example assistant = ProjectAssistant( name="Project Assistant", project_name="data_pipeline_refactoring" ) # First interaction: Record project status response = assistant.run( "We have completed refactoring of the data model layer, test coverage reached 85%. Next plan is to refactor the business logic layer.", note_as_action=True ) # Second interaction: Raise issue response = assistant.run( "When refactoring the business logic layer, I encountered dependency version conflict issues. How should I resolve this?" ) # View note summary summary = assistant.note_tool.run({"action": "summary"}) print(summary) ``` (3) Running Effect Demonstration ```bash [ContextBuilder] Collected 8 candidate information packages [ContextBuilder] Selected 7 information packages, total 3500 tokens ✅ Assistant answer: I noticed this issue was mentioned in your previously recorded notes. According to the note [Refactoring Project - Phase 1], your current test coverage has reached 85%, which is a good foundation. Regarding the dependency version conflict issue, I recommend: 1. **Use virtual environment isolation**: Create an independent virtual environment for the business logic layer to avoid dependency conflicts with other modules 2. **Lock versions**: Explicitly specify exact versions of all dependencies in requirements.txt 3. **Use pipdeptree**: Analyze the dependency tree to find the root cause of conflicts I will mark this issue as a blocker and recommend prioritizing its resolution. [Source: Note note_20250119_153000_0, Project knowledge base] --- 📋 Note summary: { "total_notes": 2, "type_distribution": { "action": 1, "blocker": 1 }, "recent_notes": [ { "id": "note_20250119_154500_1", "title": "When refactoring the business logic layer, I encountered dependency version conflict issues...", "type": "blocker", "updated_at": "2025-01-19T15:45:00" }, { "id": "note_20250119_153000_0", "title": "We have completed refactoring of the data model layer...", "type": "action", "updated_at": "2025-01-19T15:30:00" } ] } ``` ### 9.4.5 Best Practices When actually using NoteTool, the following best practices can help you build more powerful long-horizon agents: 1. **Reasonable note classification**: - `task_state`: Record phased progress and status - `conclusion`: Record important conclusions and findings - `blocker`: Record blocking issues, highest priority - `action`: Record next action plans - `reference`: Record important reference materials 2. **Regular cleanup and archiving**: - For resolved blockers, update to conclusion - For outdated actions, delete or update promptly - Use tags for version management, such as `["v1.0", "completed"]` 3. **Cooperation with ContextBuilder**: - Retrieve relevant notes before each round of dialogue - Set different relevance scores based on note type (blocker > action > conclusion) - Limit number of notes to avoid context overload 4. **Human-machine collaboration**: - Notes are in human-readable Markdown format, supporting manual editing - Use Git for version control to track note evolution - At key stages, manually review notes generated by Agent 5. **Automated workflow**: - Regularly generate note summary reports - Automatically generate project progress documents based on notes - Synchronize note content to other systems (such as Notion, Confluence) ## 9.5 TerminalTool: Instant File System Access In previous chapters, we introduced MemoryTool and RAGTool, which provide conversational memory and knowledge retrieval capabilities respectively. However, in many practical scenarios, agents need **instant access and exploration of the file system**—viewing log files, analyzing codebase structure, retrieving configuration files, etc. This is where TerminalTool comes in. TerminalTool provides agents with **secure command-line execution capability**, supporting common file system and text processing commands, while ensuring system security through multi-layer security mechanisms. This design implements the "Just-in-time (JIT) context" concept mentioned in Section 9.2.2—agents don't need to preload all files, but explore and retrieve on demand. ### 9.5.1 Design Philosophy and Security Mechanisms (1) Why do we need TerminalTool? When building long-horizon agents, we often encounter the following scenarios: **Scenario 1: Codebase Exploration** A development assistant needs to help users understand the structure of a large codebase: ```python # Traditional approach: Pre-index all files (high cost, may be outdated) rag_tool.add_document("./project/**/*.py") # Time-consuming, occupies large storage # TerminalTool approach: Instant exploration terminal.run({"command": "find . -name '*.py' -type f"}) # Fast, real-time terminal.run({"command": "grep -r 'class UserService' ."}) # Precise location terminal.run({"command": "head -n 50 src/services/user.py"}) # View on demand ``` **Scenario 2: Log File Analysis** An operations assistant needs to analyze application logs: ```python # Check log file size terminal.run({"command": "ls -lh /var/log/app.log"}) # View latest error logs terminal.run({"command": "tail -n 100 /var/log/app.log | grep ERROR"}) # Count error type distribution terminal.run({"command": "grep ERROR /var/log/app.log | cut -d':' -f3 | sort | uniq -c"}) ``` **Scenario 3: Data File Preview** A data analysis assistant needs to quickly understand the structure of data files: ```python # View first few lines of CSV file terminal.run({"command": "head -n 5 data/sales.csv"}) # Count lines terminal.run({"command": "wc -l data/*.csv"}) # View column names terminal.run({"command": "head -n 1 data/sales.csv | tr ',' '\n'"}) ``` The common characteristic of these scenarios is: **need real-time, lightweight file system access, rather than pre-indexing and vectorization**. TerminalTool is designed precisely for this "exploratory" workflow. (2) Security Mechanism Detailed Explanation Allowing agents to execute commands is a powerful but dangerous capability. TerminalTool ensures system security through multi-layer security mechanisms: **First Layer: Command Whitelist** Only allow safe read-only commands, completely prohibit any operations that may modify the system: ```python ALLOWED_COMMANDS = { # File listing and information 'ls', 'dir', 'tree', # File content viewing 'cat', 'head', 'tail', 'less', 'more', # File search 'find', 'grep', 'egrep', 'fgrep', # Text processing 'wc', 'sort', 'uniq', 'cut', 'awk', 'sed', # Directory operations 'pwd', 'cd', # File information 'file', 'stat', 'du', 'df', # Others 'echo', 'which', 'whereis', } ``` If the agent attempts to execute commands outside the whitelist, it will be immediately rejected: ```python terminal.run({"command": "rm -rf /"}) # ❌ Command not allowed: rm # Allowed commands: cat, cd, cut, dir, du, ... ``` **Second Layer: Working Directory Restriction (Sandbox)** TerminalTool can only access the specified working directory and its subdirectories, cannot access other parts of the system: ```python # Specify working directory during initialization terminal = TerminalTool(workspace="./project") # Allowed: Access files within working directory terminal.run({"command": "cat ./src/main.py"}) # ✅ # Prohibited: Access files outside working directory terminal.run({"command": "cat /etc/passwd"}) # ❌ Not allowed to access paths outside working directory # Prohibited: Escape through .. terminal.run({"command": "cd ../../../etc"}) # ❌ Not allowed to access paths outside working directory ``` This sandbox mechanism ensures that even if the agent's behavior is abnormal, it cannot affect other parts of the system. **Third Layer: Timeout Control** Each command has an execution time limit to prevent infinite loops or resource exhaustion: ```python terminal = TerminalTool( workspace="./project", timeout=30 # 30 second timeout ) # If command execution exceeds 30 seconds terminal.run({"command": "find / -name '*.log'"}) # ❌ Command execution timeout (exceeded 30 seconds) ``` **Fourth Layer: Output Size Limit** Limit the size of command output to prevent memory overflow: ```python terminal = TerminalTool( workspace="./project", max_output_size=10 * 1024 * 1024 # 10MB ) # If output exceeds 10MB terminal.run({"command": "cat huge_file.log"}) # ... (first 10MB of content) ... # ⚠️ Output truncated (exceeded 10485760 bytes) ``` Through these four layers of security mechanisms, TerminalTool provides powerful capabilities while maximizing system security. ### 9.5.2 Core Functionality Detailed Explanation The implementation of TerminalTool focuses on two core functions: command execution and directory navigation. (1) Command Execution The core `_execute_command` method is responsible for actually executing commands: ```python def _execute_command(self, command: str) -> str: """Execute command""" try: # Execute command in current directory result = subprocess.run( command, shell=True, cwd=str(self.current_dir), # Execute in current working directory capture_output=True, text=True, timeout=self.timeout, env=os.environ.copy() ) # Merge standard output and standard error output = result.stdout if result.stderr: output += f"\n[stderr]\n{result.stderr}" # Check output size if len(output) > self.max_output_size: output = output[:self.max_output_size] output += f"\n\n⚠️ Output truncated (exceeded {self.max_output_size} bytes)" # Add return code information if result.returncode != 0: output = f"⚠️ Command return code: {result.returncode}\n\n{output}" return output if output else "✅ Command executed successfully (no output)" except subprocess.TimeoutExpired: return f"❌ Command execution timeout (exceeded {self.timeout} seconds)" except Exception as e: return f"❌ Command execution failed: {e}" ``` Key points of this implementation: - **Current directory awareness**: Use `cwd` parameter to execute commands in the correct directory - **Error handling**: Capture and merge standard error, provide complete diagnostic information - **Return code check**: Non-zero return codes are marked as warnings - **Fault-tolerant design**: Timeouts and exceptions are handled properly, won't cause agent to crash (2) Directory Navigation Special handling of the `cd` command supports agent navigation in the file system: ```python def _handle_cd(self, parts: List[str]) -> str: """Handle cd command""" if not self.allow_cd: return "❌ cd command is disabled" if len(parts) < 2: # cd without parameters, return current directory return f"Current directory: {self.current_dir}" target_dir = parts[1] # Handle relative path if target_dir == "..": new_dir = self.current_dir.parent elif target_dir == ".": new_dir = self.current_dir elif target_dir == "~": new_dir = self.workspace else: new_dir = (self.current_dir / target_dir).resolve() # Check if within working directory try: new_dir.relative_to(self.workspace) except ValueError: return f"❌ Not allowed to access paths outside working directory: {new_dir}" # Check if directory exists if not new_dir.exists(): return f"❌ Directory does not exist: {new_dir}" if not new_dir.is_dir(): return f"❌ Not a directory: {new_dir}" # Update current directory self.current_dir = new_dir return f"✅ Switched to directory: {self.current_dir}" ``` This design supports agents in multi-step file system exploration: ```python # Step 1: View project structure terminal.run({"command": "ls -la"}) # Step 2: Enter source code directory terminal.run({"command": "cd src"}) # Step 3: Find specific files terminal.run({"command": "find . -name '*service*.py'"}) # Step 4: View file content terminal.run({"command": "cat user_service.py"}) ``` ### 9.5.3 Typical Usage Patterns TerminalTool supports various common file system operation patterns. (1) Exploratory Navigation Agents can explore codebases step by step like human developers: ```python from hello_agents.tools import TerminalTool terminal = TerminalTool(workspace="./my_project") # Step 1: View project root directory print(terminal.run({"command": "ls -la"})) """ total 24 drwxr-xr-x 6 user staff 192 Jan 19 16:00 . drwxr-xr-x 5 user staff 160 Jan 19 15:30 .. -rw-r--r-- 1 user staff 1234 Jan 19 15:30 README.md drwxr-xr-x 4 user staff 128 Jan 19 15:30 src drwxr-xr-x 3 user staff 96 Jan 19 15:30 tests -rw-r--r-- 1 user staff 456 Jan 19 15:30 requirements.txt """ # Step 2: View source code directory structure terminal.run({"command": "cd src"}) print(terminal.run({"command": "tree"})) # Step 3: Search for specific patterns print(terminal.run({"command": "grep -r 'def process' ."})) ``` (2) Data File Analysis Quickly understand the structure and content of data files: ```python terminal = TerminalTool(workspace="./data") # View first few lines of CSV file print(terminal.run({"command": "head -n 5 sales_2024.csv"})) """ date,product,quantity,revenue 2024-01-01,Widget A,150,4500.00 2024-01-01,Widget B,200,8000.00 2024-01-02,Widget A,180,5400.00 2024-01-02,Widget C,120,3600.00 """ # Count total lines print(terminal.run({"command": "wc -l *.csv"})) """ 10234 sales_2024.csv 8567 sales_2023.csv 18801 total """ # Extract and count product categories print(terminal.run({"command": "tail -n +2 sales_2024.csv | cut -d',' -f2 | sort | uniq -c"})) """ 3456 Widget A 4123 Widget B 2655 Widget C """ ``` (3) Log File Analysis Real-time analysis of application logs, quickly locate issues: ```python terminal = TerminalTool(workspace="/var/log") # View latest error logs print(terminal.run({"command": "tail -n 50 app.log | grep ERROR"})) # Count error type distribution print(terminal.run({"command": "grep ERROR app.log | awk '{print $4}' | sort | uniq -c | sort -rn"})) """ 245 DatabaseConnectionError 123 TimeoutException 67 ValidationError 34 AuthenticationError """ # Find logs for specific time period print(terminal.run({"command": "grep '2024-01-19 15:' app.log | tail -n 20"})) ``` (4) Codebase Analysis Assist code review and understanding: ```python terminal = TerminalTool(workspace="./codebase") # Count lines of code print(terminal.run({"command": "find . -name '*.py' -exec wc -l {} + | tail -n 1"})) # Find all TODO comments print(terminal.run({"command": "grep -rn 'TODO' --include='*.py'"})) # Find definition of specific function print(terminal.run({"command": "grep -rn 'def process_data' --include='*.py'"})) # View function implementation print(terminal.run({"command": "sed -n '/def process_data/,/^def /p' src/processor.py | head -n -1"})) ``` ### 9.5.4 Collaboration with Other Tools The true power of TerminalTool lies in its collaborative use with MemoryTool, NoteTool, and ContextBuilder. (1) Collaboration with MemoryTool Information discovered by TerminalTool can be stored in the memory system: ```python # Use TerminalTool to discover project structure structure = terminal.run({"command": "tree -L 2 src"}) # Store in semantic memory memory_tool.execute( "add", content=f"Project structure:\n{structure}", memory_type="semantic", importance=0.8, metadata={"type": "project_structure"} ) ``` (2) Collaboration with NoteTool Important discoveries can be recorded as structured notes: ```python # Discover a performance bottleneck log_analysis = terminal.run({"command": "grep 'slow query' app.log | tail -n 10"}) # Record as blocker note note_tool.run({ "action": "create", "title": "Database Slow Query Issue", "content": f"## Problem Description\nFound multiple slow queries affecting system performance\n\n## Log Analysis\n```\n{log_analysis}\n```\n\n## Next Steps\n1. Analyze slow query SQL\n2. Add indexes\n3. Optimize query logic", "note_type": "blocker", "tags": ["performance", "database"] }) ``` (3) Collaboration with ContextBuilder TerminalTool output can be part of the context: ```python # Explore codebase code_structure = terminal.run({"command": "ls -R src"}) recent_changes = terminal.run({"command": "git log --oneline -10"}) # Convert to ContextPacket from hello_agents.context import ContextPacket from datetime import datetime packets = [ ContextPacket( content=f"Codebase structure:\n{code_structure}", timestamp=datetime.now(), token_count=len(code_structure) // 4, relevance_score=0.7, metadata={"type": "code_structure", "source": "terminal"} ), ContextPacket( content=f"Recent commits:\n{recent_changes}", timestamp=datetime.now(), token_count=len(recent_changes) // 4, relevance_score=0.8, metadata={"type": "git_history", "source": "terminal"} ) ] # Include this information when building context context = context_builder.build( user_query="How to refactor the user service module?", custom_packets=packets ) ``` ## 9.6 Long-Horizon Agent in Practice: Codebase Maintenance Assistant Now, let's integrate ContextBuilder, NoteTool, and TerminalTool to build a complete long-horizon agent—**Codebase Maintenance Assistant**. This assistant can: 1. Explore and understand codebase structure 2. Record discovered issues and improvement points 3. Track long-term refactoring tasks 4. Maintain coherence under context window limitations ### 9.6.1 Scenario Setup and Requirements Analysis **Business Scenario** Suppose we are maintaining a medium-sized Python web application. This codebase contains about 50 Python files, built with the Flask framework, covering data models, business logic, API interfaces, and other modules, while also having some technical debt that needs to be gradually cleaned up. In this scenario, we need an intelligent assistant to help us explore the codebase, understand project structure, dependencies, and code style; identify issues in the code, such as code duplication, excessive complexity, lack of tests, etc.; track task progress, record to-do items, completed work, and encountered blockers; and provide coherent refactoring recommendations based on historical context. **Challenges and Solutions** This scenario faces several typical long-horizon task challenges. First is the problem of information exceeding the context window—the entire codebase may contain tens of thousands of lines of code, which cannot be placed in the context window all at once. We solve this by using TerminalTool for instant, on-demand code exploration, viewing specific files only when needed. Second is the cross-session state management challenge—refactoring tasks may last for days and need to maintain progress across multiple sessions. We address this by using NoteTool to record phased progress, to-do items, and key decisions. Finally, there's the issue of context quality and relevance—each conversation needs to review relevant historical information but cannot be overwhelmed by irrelevant information. We use ContextBuilder to intelligently filter and organize context, ensuring high signal density. ### 9.6.2 System Architecture Design Our codebase maintenance assistant adopts a three-layer architecture, as shown in Figure 9.3:

Figure 9.3 Three-layer architecture of codebase maintenance assistant

### 9.6.3 Core Implementation Now let's implement the core class of this system: ```python from typing import Dict, Any, List, Optional from datetime import datetime import json from hello_agents import SimpleAgent, HelloAgentsLLM from hello_agents.context import ContextBuilder, ContextConfig, ContextPacket from hello_agents.tools import MemoryTool, NoteTool, TerminalTool from hello_agents.core.message import Message class CodebaseMaintainer: """Codebase Maintenance Assistant - Long-horizon agent example Integrates ContextBuilder + NoteTool + TerminalTool + MemoryTool Implements cross-session codebase maintenance task management """ def __init__( self, project_name: str, codebase_path: str, llm: Optional[HelloAgentsLLM] = None ): self.project_name = project_name self.codebase_path = codebase_path self.session_id = f"session_{datetime.now().strftime('%Y%m%d_%H%M%S')}" # Initialize LLM self.llm = llm or HelloAgentsLLM() # Initialize tools self.memory_tool = MemoryTool(user_id=project_name) self.note_tool = NoteTool(workspace=f"./{project_name}_notes") self.terminal_tool = TerminalTool(workspace=codebase_path, timeout=60) # Initialize context builder self.context_builder = ContextBuilder( memory_tool=self.memory_tool, rag_tool=None, # This case does not use RAG config=ContextConfig( max_tokens=4000, reserve_ratio=0.15, min_relevance=0.2, enable_compression=True ) ) # Conversation history self.conversation_history: List[Message] = [] # Statistics self.stats = { "session_start": datetime.now(), "commands_executed": 0, "notes_created": 0, "issues_found": 0 } print(f"✅ Codebase maintenance assistant initialized: {project_name}") print(f"📁 Working directory: {codebase_path}") print(f"🆔 Session ID: {self.session_id}") def run(self, user_input: str, mode: str = "auto") -> str: """Run assistant Args: user_input: User input mode: Running mode - "auto": Automatically decide whether to use tools - "explore": Focus on code exploration - "analyze": Focus on problem analysis - "plan": Focus on task planning Returns: str: Assistant's answer """ print(f"\n{'='*80}") print(f"👤 User: {user_input}") print(f"{'='*80}\n") # Step 1: Execute preprocessing based on mode pre_context = self._preprocess_by_mode(user_input, mode) # Step 2: Retrieve relevant notes relevant_notes = self._retrieve_relevant_notes(user_input) note_packets = self._notes_to_packets(relevant_notes) # Step 3: Build optimized context context = self.context_builder.build( user_query=user_input, conversation_history=self.conversation_history, system_instructions=self._build_system_instructions(mode), custom_packets=note_packets + pre_context ) # Step 4: Call LLM print("🤖 Thinking...") response = self.llm.invoke(context) # Step 5: Post-processing self._postprocess_response(user_input, response) # Step 6: Update conversation history self._update_history(user_input, response) print(f"\n🤖 Assistant: {response}\n") print(f"{'='*80}\n") return response def _preprocess_by_mode( self, user_input: str, mode: str ) -> List[ContextPacket]: """Execute preprocessing based on mode, collect relevant information""" packets = [] if mode == "explore" or mode == "auto": # Explore mode: Automatically view project structure print("🔍 Exploring codebase structure...") structure = self.terminal_tool.run({"command": "find . -type f -name '*.py' | head -n 20"}) self.stats["commands_executed"] += 1 packets.append(ContextPacket( content=f"[Codebase Structure]\n{structure}", timestamp=datetime.now(), token_count=len(structure) // 4, relevance_score=0.6, metadata={"type": "code_structure", "source": "terminal"} )) if mode == "analyze": # Analyze mode: Check code complexity and issues print("📊 Analyzing code quality...") # Count lines of code loc = self.terminal_tool.run({"command": "find . -name '*.py' -exec wc -l {} + | tail -n 1"}) # Find TODO and FIXME todos = self.terminal_tool.run({"command": "grep -rn 'TODO\\|FIXME' --include='*.py' | head -n 10"}) self.stats["commands_executed"] += 2 packets.append(ContextPacket( content=f"[Code Statistics]\n{loc}\n\n[To-Do Items]\n{todos}", timestamp=datetime.now(), token_count=(len(loc) + len(todos)) // 4, relevance_score=0.7, metadata={"type": "code_analysis", "source": "terminal"} )) if mode == "plan": # Planning mode: Load recent notes print("📋 Loading task planning...") task_notes = self.note_tool.run({ "action": "list", "note_type": "task_state", "limit": 3 }) if task_notes: content = "\n".join([f"- {note['title']}" for note in task_notes]) packets.append(ContextPacket( content=f"[Current Tasks]\n{content}", timestamp=datetime.now(), token_count=len(content) // 4, relevance_score=0.8, metadata={"type": "task_plan", "source": "notes"} )) return packets def _retrieve_relevant_notes(self, query: str, limit: int = 3) -> List[Dict]: """Retrieve relevant notes""" try: # Prioritize retrieving blockers blockers = self.note_tool.run({ "action": "list", "note_type": "blocker", "limit": 2 }) # Search relevant notes search_results = self.note_tool.run({ "action": "search", "query": query, "limit": limit }) # Merge and deduplicate all_notes = {note.get('note_id') or note.get('id'): note for note in (blockers or []) + (search_results or [])} return list(all_notes.values())[:limit] except Exception as e: print(f"[WARNING] Note retrieval failed: {e}") return [] def _notes_to_packets(self, notes: List[Dict]) -> List[ContextPacket]: """Convert notes to context packets""" packets = [] for note in notes: # Set different relevance scores based on note type relevance_map = { "blocker": 0.9, "action": 0.8, "task_state": 0.75, "conclusion": 0.7 } note_type = note.get('type', 'general') relevance = relevance_map.get(note_type, 0.6) content = f"[Note: {note.get('title', 'Untitled')}]\nType: {note_type}\n\n{note.get('content', '')}" packets.append(ContextPacket( content=content, timestamp=datetime.fromisoformat(note.get('updated_at', datetime.now().isoformat())), token_count=len(content) // 4, relevance_score=relevance, metadata={ "type": "note", "note_type": note_type, "note_id": note.get('note_id') or note.get('id') } )) return packets def _build_system_instructions(self, mode: str) -> str: """Build system instructions""" base_instructions = f"""You are the codebase maintenance assistant for the {self.project_name} project. Your core capabilities: 1. Use TerminalTool to explore codebase (ls, cat, grep, find, etc.) 2. Use NoteTool to record discoveries and tasks 3. Provide coherent recommendations based on historical notes Current session ID: {self.session_id} """ mode_specific = { "explore": """ Current mode: Explore codebase You should: - Actively use terminal commands to understand code structure - Identify key modules and files - Record project architecture in notes """, "analyze": """ Current mode: Analyze code quality You should: - Find code issues (duplication, complexity, TODOs, etc.) - Evaluate code quality - Record discovered issues as blocker or action notes """, "plan": """ Current mode: Task planning You should: - Review historical notes and tasks - Formulate next action plan - Update task status notes """, "auto": """ Current mode: Auto decision You should: - Flexibly choose strategies based on user needs - Use tools when needed - Maintain professionalism and practicality in responses """ } return base_instructions + mode_specific.get(mode, mode_specific["auto"]) def _postprocess_response(self, user_input: str, response: str): """Post-processing: Analyze response, automatically record important information""" # If issues found, automatically create blocker note if any(keyword in response.lower() for keyword in ["issue", "bug", "error", "blocker", "problem"]): try: self.note_tool.run({ "action": "create", "title": f"Issue found: {user_input[:30]}...", "content": f"## User Input\n{user_input}\n\n## Issue Analysis\n{response[:500]}...", "note_type": "blocker", "tags": [self.project_name, "auto_detected", self.session_id] }) self.stats["notes_created"] += 1 self.stats["issues_found"] += 1 print("📝 Automatically created issue note") except Exception as e: print(f"[WARNING] Failed to create note: {e}") # If task planning, automatically create action note elif any(keyword in user_input.lower() for keyword in ["plan", "next", "task", "todo"]): try: self.note_tool.run({ "action": "create", "title": f"Task planning: {user_input[:30]}...", "content": f"## Discussion\n{user_input}\n\n## Action Plan\n{response[:500]}...", "note_type": "action", "tags": [self.project_name, "planning", self.session_id] }) self.stats["notes_created"] += 1 print("📝 Automatically created action plan note") except Exception as e: print(f"[WARNING] Failed to create note: {e}") def _update_history(self, user_input: str, response: str): """Update conversation history""" self.conversation_history.append( Message(content=user_input, role="user", timestamp=datetime.now()) ) self.conversation_history.append( Message(content=response, role="assistant", timestamp=datetime.now()) ) # Limit history length (keep recent 10 rounds of conversation) if len(self.conversation_history) > 20: self.conversation_history = self.conversation_history[-20:] # === Convenience methods === def explore(self, target: str = ".") -> str: """Explore codebase""" return self.run(f"Please explore the code structure of {target}", mode="explore") def analyze(self, focus: str = "") -> str: """Analyze code quality""" query = f"Please analyze code quality" + (f", focusing on {focus}" if focus else "") return self.run(query, mode="analyze") def plan_next_steps(self) -> str: """Plan next steps""" return self.run("Based on current progress, plan next steps", mode="plan") def execute_command(self, command: str) -> str: """Execute terminal command""" result = self.terminal_tool.run({"command": command}) self.stats["commands_executed"] += 1 return result def create_note( self, title: str, content: str, note_type: str = "general", tags: List[str] = None ) -> str: """Create note""" result = self.note_tool.run({ "action": "create", "title": title, "content": content, "note_type": note_type, "tags": tags or [self.project_name] }) self.stats["notes_created"] += 1 return result def get_stats(self) -> Dict[str, Any]: """Get statistics""" duration = (datetime.now() - self.stats["session_start"]).total_seconds() # Get note summary try: note_summary = self.note_tool.run({"action": "summary"}) except: note_summary = {} return { "session_info": { "session_id": self.session_id, "project": self.project_name, "duration_seconds": duration }, "activity": { "commands_executed": self.stats["commands_executed"], "notes_created": self.stats["notes_created"], "issues_found": self.stats["issues_found"] }, "notes": note_summary } def generate_report(self, save_to_file: bool = True) -> Dict[str, Any]: """Generate session report""" report = self.get_stats() if save_to_file: report_file = f"maintainer_report_{self.session_id}.json" with open(report_file, 'w', encoding='utf-8') as f: json.dump(report, f, ensure_ascii=False, indent=2, default=str) report["report_file"] = report_file print(f"📄 Report saved: {report_file}") return report ``` ### 9.6.4 Complete Usage Example Now let's demonstrate the workflow of this long-horizon agent through a complete usage scenario: ```python # ========== Initialize Assistant ========== from hello_agents import HelloAgentsLLM maintainer = CodebaseMaintainer( project_name="my_flask_app", codebase_path="./my_flask_app", llm=HelloAgentsLLM() ) # ========== Day 1: Explore Codebase ========== # 1. Initial exploration response = maintainer.explore() """ 🔍 Exploring codebase structure... 🤖 Thinking... 🤖 Assistant: I have scanned the codebase, this is a Flask web application, mainly containing the following modules: 📁 Project structure: - app/ ├── models/ # Data models (User, Product, Order, etc.) ├── routes/ # API routes ├── services/ # Business logic └── utils/ # Utility functions - tests/ # Test files - migrations/ # Database migrations - config.py # Configuration file Total approximately 3,500 lines of Python code. Recommended next steps: 1. Review core model definitions 2. Check routes and API design 3. Evaluate test coverage """ # 2. Deep dive into a specific module response = maintainer.run("Please analyze the data model design in the app/models/ directory") """ 🤖 Assistant: I reviewed the app/models/ directory and found the following data models: 1. User (user.py) - Fields: id, username, email, password_hash, created_at - Relationships: One-to-many Order - Issue: ⚠️ Missing index, email field not set with unique constraint 2. Product (product.py) - Fields: id, name, description, price, stock - Issue: ✅ Well designed 3. Order (order.py) - Fields: id, user_id, product_id, quantity, status - Issue: ⚠️ Missing creation time field, not conducive to data analysis Recommendations: 1. Add unique constraint to User.email 2. Add created_at and updated_at fields to Order 3. Consider adding soft delete support I have recorded these issues as blocker notes. """ # [Automatically created blocker note] # ========== Day 2: Analyze Code Quality ========== # 1. Overall quality analysis response = maintainer.analyze() """ 📊 Analyzing code quality... 🤖 Thinking... 🤖 Assistant: Code quality analysis report: 📊 Code statistics: - Total lines: 3,542 lines - Average file length: approximately 150 lines - Largest file: services/order_service.py (456 lines) ⚠️ Issues found: 1. Code duplication (HIGH): - user_service.py and product_service.py have similar CRUD logic - Recommendation: Extract base class BaseService 2. Excessive complexity (MEDIUM): - order_service.py::process_order method has 8 levels of nesting - Recommendation: Split into multiple small methods 3. Lack of tests (HIGH): - Test coverage only 45% - services/ directory has no tests at all 4. TODO/FIXME (12 items): - Most are "optimize later" type comments - Recommendation: Clean up or convert to formal tasks I have recorded these issues as notes and marked priorities. """ # [Automatically created multiple blocker notes] # 2. View specific issue response = maintainer.run("Please review the process_order method in order_service.py and provide refactoring recommendations") """ 🤖 Assistant: I reviewed the code of the process_order method: 【Original Code】 def process_order(self, order_id): order = Order.query.get(order_id) if order: if order.status == 'pending': product = Product.query.get(order.product_id) if product: if product.stock >= order.quantity: # ... 8 levels of nesting continue ... 【Problem Analysis】 1. Deep nesting leads to poor readability 2. Lack of error handling 3. Business logic mixed with data access 【Refactoring Recommendations】 def process_order(self, order_id): # 1. Early return pattern, reduce nesting order = self._get_order(order_id) if not order or order.status != 'pending': return False # 2. Extract methods, single responsibility if not self._check_stock(order): return False # 3. Use transactions to ensure consistency with db.transaction(): self._update_stock(order) self._update_order_status(order, 'processing') return True 【Next Steps】 1. I have created a refactoring plan note 2. Recommend writing test cases first 3. Refactor gradually, keeping functionality unchanged """ # [Automatically created action note] # ========== Day 3: Plan Refactoring Tasks ========== # 1. Review progress response = maintainer.plan_next_steps() """ 📋 Loading task planning... 🤖 Thinking... 🤖 Assistant: Based on the exploration and analysis from the past two days, I have organized refactoring task priorities: 🔴 High priority (complete this week): 1. [blocker] Add unique constraint to User.email - Risk: Low, just database migration - Effort: 1 hour 2. [blocker] Extract BaseService to reduce code duplication - Risk: Medium, affects multiple service classes - Effort: Half a day 3. [blocker] Refactor process_order method - Risk: High, core business logic - Effort: 1 day (including writing tests) 🟡 Medium priority (next week): 4. Add unit tests for services/ 5. Clean up TODO comments 6. Add time fields to Order 🟢 Low priority (TBD): 7. Performance optimization 8. Documentation updates Recommendations: - This week, first resolve data model issues (tasks 1, 6) - Next week, conduct code refactoring (tasks 2, 3) - Continuously increase test coverage (task 4) I have updated task status notes, you can check progress anytime. """ # 2. Manually create detailed refactoring plan maintainer.create_note( title="Weekly Refactoring Plan - Week 1", content="""## Objectives Complete optimization of data model layer ## Task Checklist - [ ] Add unique constraint to User.email - [ ] Add created_at, updated_at fields to Order - [ ] Write database migration scripts - [ ] Update related test cases ## Schedule - Monday: Design migration scripts - Tuesday-Wednesday: Execute migration and test - Thursday: Update test cases - Friday: Code Review ## Risks - Database migration may affect production environment, needs to be executed during off-peak hours - Existing data may have duplicate emails, need to clean up first """, note_type="task_state", tags=["refactoring", "week1", "high_priority"] ) print("✅ Created detailed refactoring plan") # ========== One Week Later: Check Progress ========== # View note summary summary = maintainer.note_tool.run({"action": "summary"}) print("📊 Note summary:") print(json.dumps(summary, indent=2, ensure_ascii=False)) """ { "total_notes": 8, "type_distribution": { "blocker": 3, "action": 2, "task_state": 2, "conclusion": 1 }, "recent_notes": [ { "id": "note_20250119_160000_7", "title": "Weekly Refactoring Plan - Week 1", "type": "task_state", "updated_at": "2025-01-19T16:00:00" }, ... ] } """ # Generate complete report report = maintainer.generate_report() print("\n📄 Session report:") print(json.dumps(report, indent=2, ensure_ascii=False)) """ { "session_info": { "session_id": "session_20250119_150000", "project": "my_flask_app", "duration_seconds": 172800 # 2 days }, "activity": { "commands_executed": 24, "notes_created": 8, "issues_found": 3 }, "notes": { ... } } """ ``` ### 9.6.5 Running Effect Analysis Through this complete case study, we can see several key characteristics of long-horizon agents. First is cross-session coherence—the agent maintains task coherence across multiple days and sessions through NoteTool. Issues explored on day one are automatically considered during day two analysis, day three planning can synthesize all discoveries from the previous two days, and the complete history is preserved when checking a week later. Second is intelligent context management—ContextBuilder ensures high-quality context for each conversation, automatically gathering relevant notes (especially blocker types), dynamically adjusting preprocessing strategies based on conversation mode, and selecting the most relevant information within the token budget. The third characteristic is instant file system access—TerminalTool supports flexible code exploration without needing to pre-index the entire codebase, can view specific file content instantly, and supports complex text processing (grep, awk, etc.). Fourth is automated knowledge management—the system automatically manages discovered knowledge, automatically creating blocker notes when issues are found, automatically creating action notes when discussing plans, and automatically storing key information in the memory system. Finally is human-machine collaboration—this system supports flexible human-machine collaboration modes, where agents can automatically complete exploration and analysis, humans can intervene and guide through the note system, and supports manually creating detailed planning notes. This basic framework can be further extended, such as integrating RAGTool to build vector indexes for codebases combined with semantic retrieval, splitting into specialized explorers, analyzers, and planners to implement multi-agent collaboration, integrating testing tools to automatically verify refactoring results, executing git commands through TerminalTool to track code changes, or building visual interfaces using Gradio/Streamlit. ## 9.7 Chapter Summary In this chapter, we deeply explored the theoretical foundations and engineering practices of context engineering: ### Theoretical Level 1. **Essence of Context Engineering**: Evolution from "prompt engineering" to "context engineering", the core is managing limited attention budget 2. **Context Rot**: Understanding performance degradation brought by long contexts, recognizing context as a scarce resource 3. **Three Major Strategies**: Compaction, structured note-taking, sub-agent architectures ### Engineering Practice 1. **ContextBuilder**: Implements GSSC pipeline, provides unified context management interface 2. **NoteTool**: Hybrid format of Markdown+YAML, supports structured long-term memory 3. **TerminalTool**: Secure command-line tool, supports instant file system access 4. **Long-Horizon Agent**: Integrates three major tools, builds cross-session codebase maintenance assistant ### Core Takeaways - **Layered Design**: Instant access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool) - **Intelligent Filtering**: Scoring mechanism based on relevance and recency - **Security First**: Multi-layer security mechanisms ensure system stability - **Human-Machine Collaboration**: Balance between automation and controllability Through this chapter's learning, you have not only mastered the core technologies of context engineering, but more importantly, understood how to build agent systems that can maintain coherence and effectiveness over long time spans. These skills will become an important foundation for you to build production-level agent applications. In the next chapter, we will explore agent communication protocols and learn how to enable agents to interact more broadly with the external world. ## Exercises > **Note**: Some exercises do not have standard answers. The focus is on cultivating learners' comprehensive understanding and practical ability in context engineering and long-horizon task management. 1. This chapter introduced the difference between context engineering and prompt engineering. Please analyze: - Section 9.1 mentioned "context must be viewed as a limited resource with diminishing marginal returns". Please explain what the "context rot" phenomenon is? Why do we still need to carefully manage context even when models support 100K or even 200K context windows? - Suppose you want to build a "code review assistant" that needs to analyze a codebase containing 50 files. Please compare two strategies: (1) Load all file content into context at once; (2) Use JIT (Just-in-time) context, retrieving files on demand through tools. Analyze the advantages, disadvantages, and applicable scenarios of each. - Section 9.2.1 mentioned two extreme pitfalls of system prompts: "over-hardcoding" and "too vague". Please give a practical example of each and explain how to find the right balance. 2. The GSSC (Gather-Select-Structure-Compress) pipeline is the core technology of this chapter. Please think deeply: > **Note**: This is a hands-on practice question, actual operation is recommended - In the ContextBuilder implementation in Section 9.3, the four stages each have different responsibilities. Please analyze: If a certain stage fails (such as the Select stage selecting irrelevant information, or the Compress stage over-compressing leading to information loss), what impact will it have on the final agent performance? - Based on the code in Section 9.3.4, add a "context quality assessment" function to ContextBuilder: After each context build, automatically evaluate the information density, relevance, and completeness of the context, and provide optimization suggestions. - The "compression" stage in the GSSC pipeline uses LLM for intelligent summarization. Please think: Under what circumstances might simple truncation or sliding window strategies be more appropriate than LLM summarization? Design a hybrid compression strategy that combines the advantages of multiple compression methods. 3. NoteTool and TerminalTool are key tools supporting long-horizon tasks. Based on Sections 9.4 and 9.5, please complete the following extension practices: > **Note**: This is a hands-on practice question, actual operation is recommended - NoteTool uses a hierarchical note system (project notes, task notes, temporary notes). Please design an "automatic note organization" mechanism: When temporary notes accumulate to a certain number, the agent can automatically analyze these notes, promote important information to task notes or project notes, and clean up redundant content. - TerminalTool provides file system operation capabilities, but Section 9.5.2 emphasizes security design. Please analyze: Are the current security mechanisms (path validation, command whitelist, permission check) sufficient? If the agent needs to access sensitive files or execute dangerous operations, how should a "human-machine collaborative approval" process be designed? - Combining NoteTool and TerminalTool, design an "intelligent code refactoring assistant": Can analyze codebase structure, record refactoring plans, execute refactoring operations step by step, and track progress and encountered problems in notes. Please draw a complete workflow diagram. 4. In the "long-horizon task management" case in Section 9.6, we saw the value of context engineering in practical applications. Please analyze in depth: - The case uses a "layered context management" strategy: instant access (TerminalTool) + session memory (MemoryTool) + persistent notes (NoteTool). Please analyze: How should these three layers coordinate? What information should be placed in which layer? How to avoid information redundancy and inconsistency? - Suppose an interruption occurs during task execution (such as system crash, network disconnection), the agent needs to recover state from notes and continue execution. Please design a "resume from breakpoint" mechanism: How to record sufficient state information in notes? How to verify that the recovered state is correct? - Long-horizon tasks often involve parallel or serial execution of multiple subtasks. Please design a "task dependency management" system: Can express dependency relationships between tasks (such as "Task B must be executed after Task A is completed"), and automatically schedule task execution order. How should this system integrate with NoteTool? 5. This chapter repeatedly mentioned the concept of "progressive disclosure". Please think: - In Section 9.2.2, progressive disclosure is described as "each interaction step produces new context, which in turn guides the next decision". Please design a specific application scenario (such as academic paper writing, complex problem debugging), demonstrating how progressive disclosure helps agents complete tasks more efficiently. - A potential risk of progressive disclosure is "inefficient exploration": The agent may waste time on unimportant details or miss key information. Please design an "exploration guidance" mechanism: Through heuristic rules or metacognitive strategies, help the agent make smarter decisions about "what to explore next". - Compare "progressive disclosure" with traditional "load all context at once": In what types of tasks does the former have obvious advantages? In what types of tasks might the latter be more appropriate? Please provide at least 3 examples of different types of tasks. ## References [1] Anthropic. Effective Context Engineering for AI Agents. `https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents` [2] David Kim. Context-Engineering (GitHub). `https://github.com/davidkimai/Context-Engineering`