fix table12.5

jjyaoao 7 months ago
parent
commit
919494d0b1

+ 1 - 1
docs/chapter12/Chapter12-Agent-Performance-Evaluation.md

@@ -1865,7 +1865,7 @@ In our implementation, LLM Judge evaluates AIME problem quality from four key di
 
 <div align="center">
   <p>Table 12.5 LLM Judge Evaluation Dimensions for AIME Problems</p>
-  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-4.png" alt="" width="85%"/>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-5.png" alt="" width="85%"/>
 </div>
 
 After obtaining scores from four dimensions, we need to aggregate these scores into overall evaluation metrics. We define three key metrics to measure the quality level of generated problems:

+ 1 - 1
docs/chapter12/第十二章 智能体性能评估.md

@@ -1853,7 +1853,7 @@ AIME 是美国数学协会(MAA)主办的中等难度数学竞赛,介于 AM
 
 <div align="center">
   <p>表 12.5 LLM Judge 评估 AIME 题目的维度</p>
-  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-4.png" alt="" width="85%"/>
+  <img src="https://raw.githubusercontent.com/datawhalechina/Hello-Agents/main/docs/images/12-figures/12-table-5.png" alt="" width="85%"/>
 </div>
 
 有了四个维度的评分后,我们需要将这些评分汇总成整体的评估指标。我们定义了三个关键指标来衡量生成题目的质量水平: