
feat: add hybrid search support and update dependency configuration

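- Add the google-search-results dependency to support the SerpApi search backend
- Restructure requirements.txt to clearly separate core, search, and audio dependencies
- Add Tavily and SerpApi API key fields to the configuration
- Initialize the search tool lazily, with API-key configuration for multiple backends
- Add an FFmpeg path check to prevent audio generation failures
- Update the README with a fuller quick-start guide and configuration notes
- Add a search verification script for testing search-service connectivity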
JJSun committed 5 months ago
Commit b2bc97cedf

+ 57 - 18
Co-creation-projects/JJason-DeepCastAgent/backend/README.md

@@ -8,10 +8,10 @@ DeepCast aims to solve the "dullness" of information gathering by turning serious in-depth research report
 
 ## ✨ Core Features
 
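-- [x] **In-depth web research**: automatically breaks down the question, searches over multiple rounds, and produces a structured in-depth report.
-- [ ] **Automated script generation**: adapts the research report into a dialogue script between a Host and a Guest (expert).
-- [ ] **High-quality speech synthesis**: generates realistic two-speaker dialogue audio with ECNU-TTS.
-- [ ] **One-click podcast generation**: automatically assembles the final MP3 file, ready to play.
+- [x] **In-depth web research**: automatically breaks down the question, searches over multiple rounds (Hybrid Search), and produces a structured in-depth report.
+- [x] **Automated script generation**: adapts the research report into a dialogue script between the Host (Xiayu) and the Guest (Liwa).
+- [x] **High-quality speech synthesis**: generates realistic two-speaker dialogue audio with ECNU-TTS.
+- [x] **One-click podcast generation**: automatically assembles the final MP3 file, ready to play.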
 
 ## 🛠️ Tech Stack
 
@@ -20,41 +20,80 @@ DeepCast aims to solve the "dullness" of information gathering by turning serious in-depth research report
 - **Models**:
     - Reasoning/scripts: `ecnu-max`, `ecnu-reasoner`
     - Speech: `ecnu-tts`
-- **Tools**: Tavily (search), Pydub (audio processing)
+- **Search services**:
+    - Hybrid Search: Tavily + SerpApi (Google)
+    - Fallback: DuckDuckGo
+- **Audio processing**: Pydub, FFmpeg
 
 ## 🚀 Quick Start
 
-### Requirements
+### 1. Environment Setup
 
 - Python 3.10+
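-- `uv` package manager (recommended) or `pip`
+- `uv` package manager (recommended)
+- **FFmpeg**: must be installed and on the system PATH, or its path must be specified in the configuration.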
 
-### Install Dependencies
+### 2. Install Dependencies
 
 ```bash
 cd backend
 uv sync
+# or, with pip
+# pip install -r requirements.txt
 ```
 
-### Configure API Keys
+### 3. Configure Environment Variables
 
-Copy `.env.example` to `.env` and fill in the required API keys:
+Copy `env.example` to `.env` and fill in the required settings:
 
 ```bash
-cp .env.example .env
+cp env.example .env
 ```
 
-Required settings:
-- `LLM_PROVIDER`: e.g. `openai` (compatible API)
-- `LLM_API_KEY`: your model API key
-- `SEARCH_API_KEY`: search-service key (e.g. Tavily)
-
-### Run the Project
+**Key settings**:
+
+- **LLM**:
+    ```env
+    LLM_PROVIDER=custom
+    LLM_MODEL_ID=ecnu-max
+    LLM_API_KEY=your_key
+    LLM_BASE_URL=https://chat.ecnu.edu.cn/open/api/v1
+    ```
+
+- **TTS**:
+    ```env
+    TTS_API_KEY=your_key
+    TTS_BASE_URL=https://chat.ecnu.edu.cn/open/api/v1/audio/speech
+    TTS_MODEL=ecnu-tts
+    ```
+
+- **Search (recommended configuration)**:
+    ```env
+    SEARCH_API=hybrid
+    TAVILY_API_KEY=your_tavily_key
+    SERPAPI_API_KEY=your_serpapi_key
+    ```
+
+- **Audio tools**:
+    ```env
+    # If ffmpeg is not on the system PATH, specify its absolute path
+    FFMPEG_PATH=C:\ffmpeg\bin\ffmpeg.exe
+    ```
+
+### 4. Run the Project
 
 ```bash
-python src/main.py
+uv run src/main.py
 ```
 
+## 🧪 Verification Scripts
+
+The project includes several test scripts for verifying that each component is configured correctly:
+
+- `tests/verify_ffmpeg.py`: checks whether FFmpeg is available.
+- `tests/verify_search.py`: tests connectivity of hybrid search (Tavily/SerpApi).
+- `tests/verify_ecnu_tts.py`: tests the TTS speech-generation service.
+
 ## 🤝 Contributing
 
 Issues and Pull Requests are welcome!
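
The settings above can be sanity-checked before launching. The sketch below is a minimal check. It uses the variable names from the README (`LLM_API_KEY`, `TTS_API_KEY`, `SEARCH_API`, `TAVILY_API_KEY`, `SERPAPI_API_KEY`). It also assumes that hybrid search can run with at least one of the two search keys. The helper `missing_settings` is hypothetical and is not part of the project.

```python
import os
from typing import Mapping


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in the given environment.

    Hypothetical preflight check; variable names follow the README above.
    """
    problems: list[str] = []

    # The LLM and TTS services each need their own API key.
    for name in ("LLM_API_KEY", "TTS_API_KEY"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # Assumption: hybrid search can run with either Tavily or SerpApi configured.
    if env.get("SEARCH_API") == "hybrid":
        if not (env.get("TAVILY_API_KEY") or env.get("SERPAPI_API_KEY")):
            problems.append(
                "SEARCH_API=hybrid but neither TAVILY_API_KEY nor SERPAPI_API_KEY is set"
            )

    return problems


if __name__ == "__main__":
    for problem in missing_settings(os.environ):
        print(f"missing: {problem}")
```

Passing the environment in as a mapping keeps the check easy to test without touching the real `os.environ`.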

+ 1 - 0
Co-creation-projects/JJason-DeepCastAgent/backend/pyproject.toml

@@ -20,6 +20,7 @@ dependencies = [
     "loguru>=0.7.3",
     "huggingface-hub>=1.3.3",
     "pydub>=0.25.1",
+    "google-search-results>=2.4.2",
 ]
 
 [project.optional-dependencies]
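
The new `google-search-results` dependency is SerpApi's Python client. It is imported as `serpapi`, and you send a query with `GoogleSearch(params).get_dict()`. The sketch below calls the client directly and reduces the response to title/URL pairs. The helper names are hypothetical. The response shape (an `organic_results` list whose items carry `title` and `link`) follows SerpApi's Google results format. The hello-agents `SearchTool` may normalize results differently.

```python
from typing import Any


def normalize_organic(payload: dict[str, Any], limit: int = 5) -> list[dict[str, str]]:
    """Reduce a SerpApi Google response to simple title/url records."""
    records = []
    for item in payload.get("organic_results", [])[:limit]:
        records.append({"title": item.get("title", ""), "url": item.get("link", "")})
    return records


def serpapi_search(query: str, api_key: str, limit: int = 5) -> list[dict[str, str]]:
    """Run one Google query through SerpApi (needs network access and a valid key)."""
    # Imported lazily so the normalizer works even without the package installed.
    from serpapi import GoogleSearch

    params = {"q": query, "api_key": api_key, "num": limit}
    payload = GoogleSearch(params).get_dict()
    return normalize_organic(payload, limit)
```

For example, `serpapi_search("DeepSeek technology overview", os.environ["SERPAPI_API_KEY"])` returns at most five `{"title", "url"}` records.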

+ 16 - 9
Co-creation-projects/JJason-DeepCastAgent/backend/requirements.txt

@@ -1,14 +1,21 @@
 # Core dependencies
-hello-agents[all]>=0.2.7
+hello-agents>=0.2.8
+fastapi>=0.115.0
+uvicorn[standard]>=0.32.0
 
-# LLM-related
-openai>=1.0.0
+# Search backends
+tavily-python>=0.5.0
+ddgs>=9.6.1
+google-search-results>=2.4.2
 
-# Data processing
-pandas>=2.0.0
-numpy>=1.24.0
+# LLM and AI services
+openai>=1.12.0
+huggingface-hub>=1.3.3
 
-# Other tools
-python-dotenv>=1.0.0
-requests>=2.31.0
+# Audio processing
+pydub>=0.25.1
 
+# Utilities and base libraries
+python-dotenv==1.0.1
+requests>=2.31.0
+loguru>=0.7.3
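
The search backends are now listed as separate requirements, so it can help to check which client libraries are importable in the current environment. The sketch below is minimal and assumes the import names `tavily` (tavily-python), `serpapi` (google-search-results), and `ddgs`. The mapping and function name are illustrative. The check covers installation only, not API keys or connectivity.

```python
import importlib.util

# Assumed import names for the search packages listed in requirements.txt.
SEARCH_MODULES = {
    "tavily": "tavily",    # tavily-python
    "serpapi": "serpapi",  # google-search-results
    "duckduckgo": "ddgs",  # ddgs
}


def installed_search_backends() -> dict[str, bool]:
    """Map each backend name to whether its client library can be imported."""
    return {
        backend: importlib.util.find_spec(module) is not None
        for backend, module in SEARCH_MODULES.items()
    }


if __name__ == "__main__":
    for backend, ok in installed_search_backends().items():
        print(f"{backend:<10} {'installed' if ok else 'missing'}")
```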

+ 12 - 0
Co-creation-projects/JJason-DeepCastAgent/backend/src/config.py

@@ -111,6 +111,16 @@ class Configuration(BaseModel):
         title="FFmpeg Path",
         description="Path to ffmpeg executable",
     )
+    tavily_api_key: Optional[str] = Field(
+        default=None,
+        title="Tavily API Key",
+        description="API key for Tavily search",
+    )
+    serpapi_api_key: Optional[str] = Field(
+        default=None,
+        title="SerpApi Key",
+        description="API key for SerpApi",
+    )
 
     @classmethod
     def from_env(cls, overrides: Optional[dict[str, Any]] = None) -> "Configuration":
@@ -145,6 +155,8 @@ class Configuration(BaseModel):
             "tts_model": os.getenv("TTS_MODEL"),
             "audio_output_dir": os.getenv("AUDIO_OUTPUT_DIR"),
             "ffmpeg_path": os.getenv("FFMPEG_PATH"),
+            "tavily_api_key": os.getenv("TAVILY_API_KEY"),
+            "serpapi_api_key": os.getenv("SERPAPI_API_KEY"),
         }
 
         # Handle NO_PROXY
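
The two new fields follow the same pattern as the existing ones: `from_env` reads each environment variable into a dict and builds the model from it. The sketch below reproduces that pattern with a standard-library dataclass in place of the project's pydantic `Configuration`. It assumes unset variables (`None`) should fall back to the field defaults, and that explicit `overrides` win over the environment. The class name `SearchKeys` is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class SearchKeys:
    """Hypothetical stand-in for the search-related part of Configuration."""

    tavily_api_key: Optional[str] = None
    serpapi_api_key: Optional[str] = None

    @classmethod
    def from_env(
        cls,
        overrides: Optional[dict[str, Any]] = None,
        env: Mapping[str, str] = os.environ,
    ) -> "SearchKeys":
        raw = {
            "tavily_api_key": env.get("TAVILY_API_KEY"),
            "serpapi_api_key": env.get("SERPAPI_API_KEY"),
        }
        # Drop unset variables so the dataclass defaults apply.
        values = {k: v for k, v in raw.items() if v is not None}
        # Explicit overrides take precedence over the environment.
        values.update(overrides or {})
        return cls(**values)
```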

+ 5 - 0
Co-creation-projects/JJason-DeepCastAgent/backend/src/services/audio_generator.py

@@ -9,6 +9,7 @@ from pathlib import Path
 from typing import List, Optional
 
 from config import Configuration
+from pydub import AudioSegment
 
 logger = logging.getLogger(__name__)
 
@@ -41,6 +42,10 @@ class AudioGenerationService:
         Returns:
             List of paths to generated audio files
         """
+        # Check whether the FFmpeg path is configured
+        if not self._config.ffmpeg_path:
+            logger.error("FFmpeg path not configured. Audio generation will fail.")
+            return []
         if not self._config.tts_api_key:
             logger.warning("TTS API key not configured. Skipping audio generation.")
             return []
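
The new guard returns early whenever `FFMPEG_PATH` is unset, even though the README also allows FFmpeg to be found on the system PATH. The sketch below accepts either source. It assumes a configured path should win when it points to an existing file. `resolve_ffmpeg` is a hypothetical helper. The resolved path could then be handed to pydub through its `AudioSegment.converter` attribute.

```python
import os
import shutil
from typing import Optional


def resolve_ffmpeg(configured: Optional[str]) -> Optional[str]:
    """Return a usable ffmpeg executable path, or None if none is found.

    Prefers an explicitly configured path; otherwise falls back to a PATH lookup.
    """
    if configured and os.path.isfile(configured):
        return configured
    # shutil.which returns None when no executable named "ffmpeg" is on PATH.
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    ffmpeg = resolve_ffmpeg(os.getenv("FFMPEG_PATH"))
    print(ffmpeg or "ffmpeg not found; audio generation would be skipped")
```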

+ 15 - 2
Co-creation-projects/JJason-DeepCastAgent/backend/src/services/search.py

@@ -17,7 +17,19 @@ from utils import (
 logger = logging.getLogger(__name__)
 
 MAX_TOKENS_PER_SOURCE = 2000
-_GLOBAL_SEARCH_TOOL = SearchTool(backend="hybrid")
+_GLOBAL_SEARCH_TOOL = None
+
+
+def get_global_search_tool(config: Configuration) -> SearchTool:
+    """Lazy initialization of the global search tool with API keys."""
+    global _GLOBAL_SEARCH_TOOL
+    if _GLOBAL_SEARCH_TOOL is None:
+        _GLOBAL_SEARCH_TOOL = SearchTool(
+            backend="hybrid",
+            tavily_key=config.tavily_api_key,
+            serpapi_key=config.serpapi_api_key,
+        )
+    return _GLOBAL_SEARCH_TOOL
 
 
 def dispatch_search(
@@ -28,9 +40,10 @@ def dispatch_search(
     """Execute configured search backend and normalise response payload."""
 
     search_api = get_config_value(config.search_api)
+    search_tool = get_global_search_tool(config)
 
     try:
-        raw_response = _GLOBAL_SEARCH_TOOL.run(
+        raw_response = search_tool.run(
             {
                 "input": query,
                 "backend": search_api,
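
`get_global_search_tool` builds the tool on first use. Whichever keys arrive first are kept for the life of the process, and two threads can race on the `None` check. The sketch below guards construction with a lock and caches one instance per key pair. `make_tool` stands in for the real `SearchTool(...)` call and is an assumption. So is the idea that rebuilding on a key change is desirable.

```python
import threading
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")
KeyPair = Tuple[Optional[str], Optional[str]]


class KeyedLazyCache(Generic[T]):
    """Build at most one object per (tavily_key, serpapi_key) pair, thread-safely."""

    def __init__(self, make_tool: Callable[[Optional[str], Optional[str]], T]) -> None:
        self._make_tool = make_tool
        self._lock = threading.Lock()
        self._instances: Dict[KeyPair, T] = {}

    def get(self, tavily_key: Optional[str], serpapi_key: Optional[str]) -> T:
        key: KeyPair = (tavily_key, serpapi_key)
        # Holding the lock across the check and the build prevents duplicate construction.
        with self._lock:
            if key not in self._instances:
                self._instances[key] = self._make_tool(tavily_key, serpapi_key)
            return self._instances[key]
```

One possible wiring is `KeyedLazyCache(lambda t, s: SearchTool(backend="hybrid", tavily_key=t, serpapi_key=s))`, with `.get(config.tavily_api_key, config.serpapi_api_key)` taking the place of the global lookup. Holding the lock during construction serializes tool creation. That trade-off is usually acceptable for a tool built once per key pair.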

+ 63 - 0
Co-creation-projects/JJason-DeepCastAgent/backend/tests/verify_search.py

@@ -0,0 +1,63 @@
+import os
+import sys
+from dotenv import load_dotenv
+
+# Add src to path
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../src')))
+
+# Load env
+load_dotenv(os.path.join(os.path.dirname(__file__), '../.env'))
+
+from config import Configuration
+from services.search import get_global_search_tool
+
+def test_search_configuration():
+    print("Testing search configuration...")
+    
+    # Load config from env
+    config = Configuration.from_env()
+    
+    # Print loaded keys (masked)
+    tavily_key = config.tavily_api_key
+    serpapi_key = config.serpapi_api_key
+    
+    print(f"Tavily Key: {'*' * 8 + tavily_key[-4:] if tavily_key else 'None'}")
+    print(f"SerpApi Key: {'*' * 8 + serpapi_key[-4:] if serpapi_key else 'None'}")
+    print(f"Search API: {config.search_api}")
+    
+    # Initialize search tool
+    search_tool = get_global_search_tool(config)
+    print(f"Search Tool Backend: {search_tool.backend}")
+    print(f"Available Backends: {search_tool.available_backends}")
+    
+    if not search_tool.available_backends:
+        print("❌ No search backends available. Please check API keys.")
+        return
+
+    # Test search
+    query = "DeepSeek technology overview"
+    print(f"\nRunning search for: '{query}'...")
+    
+    try:
+        response = search_tool.run({
+            "input": query,
+            "backend": "hybrid",
+            "max_results": 2
+        })
+        
+        if isinstance(response, dict):
+            backend = response.get("backend", "unknown")
+            results = response.get("results", [])
+            print(f"✅ Search successful using backend: {backend}")
+            print(f"Found {len(results)} results:")
+            for i, res in enumerate(results, 1):
+                print(f"  {i}. {res.get('title')} ({res.get('url')})")
+        else:
+            print(f"❌ Unexpected response format: {type(response)}")
+            print(response)
+            
+    except Exception as e:
+        print(f"❌ Search failed: {e}")
+
+if __name__ == "__main__":
+    test_search_configuration()
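
The script masks keys inline with `'*' * 8 + key[-4:]`, which prints a key of four characters or fewer in full. The sketch below never reveals more than the last four characters and hides short keys entirely. `mask_key` is a hypothetical name.

```python
from typing import Optional


def mask_key(key: Optional[str], visible: int = 4) -> str:
    """Mask an API key for logging, showing at most the last `visible` characters."""
    if not key:
        return "None"
    # Short keys are hidden completely rather than echoed back.
    if len(key) <= visible:
        return "*" * 8
    return "*" * 8 + key[-visible:]
```

In the script, it would replace the inline expressions, e.g. `print(f"Tavily Key: {mask_key(tavily_key)}")`.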

+ 11 - 0
Co-creation-projects/JJason-DeepCastAgent/backend/uv.lock

@@ -359,6 +359,7 @@ source = { editable = "." }
 dependencies = [
     { name = "ddgs" },
     { name = "fastapi" },
+    { name = "google-search-results" },
     { name = "hello-agents" },
     { name = "huggingface-hub" },
     { name = "loguru" },
@@ -385,6 +386,7 @@ dev = [
 requires-dist = [
     { name = "ddgs", specifier = ">=9.6.1" },
     { name = "fastapi", specifier = ">=0.115.0" },
+    { name = "google-search-results", specifier = ">=2.4.2" },
     { name = "hello-agents", specifier = ">=0.2.8" },
     { name = "huggingface-hub", specifier = ">=1.3.3" },
     { name = "loguru", specifier = ">=0.7.3" },
@@ -455,6 +457,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/01/c9/97cc5aae1648dcb851958a3ddf73ccd7dbe5650d95203ecb4d7720b4cdbf/fsspec-2026.1.0-py3-none-any.whl", hash = "sha256:cb76aa913c2285a3b49bdd5fc55b1d7c708d7208126b60f2eb8194fe1b4cbdcc", size = 201838, upload-time = "2026-01-09T15:21:34.041Z" },
 ]
 
+[[package]]
+name = "google-search-results"
+version = "2.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "requests" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/77/30/b3a6f6a2e00f8153549c2fa345c58ae1ce8e5f3153c2fe0484d444c3abcb/google_search_results-2.4.2.tar.gz", hash = "sha256:603a30ecae2af8e600b22635757a6df275dad4b934f975e67878ccd640b78245", size = 18818, upload-time = "2023-03-10T11:13:09.953Z" }
+
 [[package]]
 name = "h11"
 version = "0.16.0"