|
@@ -0,0 +1,855 @@
|
|
|
|
|
+{
|
|
|
|
|
+ "cells": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "# SRE On-Call Agent\n",
|
|
|
|
|
+ "> AI-powered incident triage, root cause investigation, and post-mortem generation\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Project Introduction\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "When a production alert fires at 3 a.m., the on-call SRE must:\n",
|
|
|
|
|
+ "1. **Triage** — assess severity and plan the investigation\n",
|
|
|
|
|
+ "2. **Investigate** — search logs, query metrics, consult runbooks\n",
|
|
|
|
|
+ "3. **Write a post-mortem** — root cause, timeline, action items\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "This project automates that workflow with a **three-agent AI pipeline**:\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "```\n",
|
|
|
|
|
+ "Alert JSON\n",
|
|
|
|
|
+ " │\n",
|
|
|
|
|
+ " ▼\n",
|
|
|
|
|
+ "┌─────────────────────────────┐\n",
|
|
|
|
|
+ "│ Stage 1: TriageAgent │ Plan-and-Solve\n",
|
|
|
|
|
+ "│ \"What should I investigate?\"│ → ordered investigation plan\n",
|
|
|
|
|
+ "└─────────────┬───────────────┘\n",
|
|
|
|
|
+ " │\n",
|
|
|
|
|
+ " ▼\n",
|
|
|
|
|
+ "┌─────────────────────────────┐\n",
|
|
|
|
|
+ "│ Stage 2: InvestigationAgent│ ReAct loop\n",
|
|
|
|
|
+ "│ log_search / metric_query │ → root cause hypothesis\n",
|
|
|
|
|
+ "│ / runbook_lookup │\n",
|
|
|
|
|
+ "└─────────────┬───────────────┘\n",
|
|
|
|
|
+ " │\n",
|
|
|
|
|
+ " ▼\n",
|
|
|
|
|
+ "┌─────────────────────────────┐\n",
|
|
|
|
|
+ "│ Stage 3: PostmortemAgent │ Reflection\n",
|
|
|
|
|
+ "│ draft → critique → revise │ → final RCA report\n",
|
|
|
|
|
+ "└─────────────────────────────┘\n",
|
|
|
|
|
+ "```\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Author Information\n",
|
|
|
|
|
+ "- **GitHub**: [@zjzhou](https://github.com/zjzhou)\n",
|
|
|
|
|
+ "- **Date**: 2026-04-21\n",
|
|
|
|
|
+ "- **HelloAgents Chapter**: 16 (Graduation Project)"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "---\n",
|
|
|
|
|
+ "## Part 2: Environment Setup"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 1,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "zsh:1: command not found: pip\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "# Install dependencies into the running kernel's environment\n",
|
|
|
|
|
+ "%pip install -q openai fastapi uvicorn pyyaml pydantic python-dotenv"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 2,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "✅ Environment loaded\n",
|
|
|
|
|
+ " LLM_MODEL_ID: llama-3.3-70b-versatile\n",
|
|
|
|
|
+ " LLM_BASE_URL: https://api.groq.com/openai/v1\n",
|
|
|
|
|
+ " LLM_API_KEY: SET\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "import os\n",
|
|
|
|
|
+ "import sys\n",
|
|
|
|
|
+ "import json\n",
|
|
|
|
|
+ "import time\n",
|
|
|
|
|
+ "import re\n",
|
|
|
|
|
+ "import ast\n",
|
|
|
|
|
+ "import glob\n",
|
|
|
|
|
+ "import yaml\n",
|
|
|
|
|
+ "from pathlib import Path\n",
|
|
|
|
|
+ "from typing import Dict, Any, List\n",
|
|
|
|
|
+ "from dotenv import load_dotenv\n",
|
|
|
|
|
+ "from openai import OpenAI\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "# Add the project root to sys.path so src/ imports work (assumes the notebook runs from the project root)\n",
|
|
|
|
|
+ "project_root = Path().resolve()\n",
|
|
|
|
|
+ "if str(project_root) not in sys.path:\n",
|
|
|
|
|
+ " sys.path.insert(0, str(project_root))\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "load_dotenv()\n",
|
|
|
|
|
+ "print(\"✅ Environment loaded\")\n",
|
|
|
|
|
+ "print(f\" LLM_MODEL_ID: {os.getenv('LLM_MODEL_ID', 'NOT SET')}\")\n",
|
|
|
|
|
+ "print(f\" LLM_BASE_URL: {os.getenv('LLM_BASE_URL', 'NOT SET')}\")\n",
|
|
|
|
|
+ "print(f\" LLM_API_KEY: {'SET' if os.getenv('LLM_API_KEY') else 'NOT SET'}\")"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "---\n",
|
|
|
|
|
+ "## Part 3: Tool Definitions\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "The InvestigationAgent inspects the incident through three tools:\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "| Tool | Input | What it does |\n",
|
|
|
|
|
+ "|---|---|---|\n",
|
|
|
|
|
+ "| `LogSearchTool` | keyword/regex | Searches incident log entries |\n",
|
|
|
|
|
+ "| `MetricQueryTool` | metric name keyword | Returns time-series metric data |\n",
|
|
|
|
|
+ "| `RunbookLookupTool` | error pattern | Fetches runbook remediation steps |"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 3,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "🔧 LogSearchTool — searching for 'pool':\n",
|
|
|
|
|
+ "Found 4 log entries matching 'pool':\n",
|
|
|
|
|
+ "[2024-01-15T14:00:03Z] [WARN ] DB connection pool: 8/10 connections in use\n",
|
|
|
|
|
+ "[2024-01-15T14:01:15Z] [WARN ] DB connection pool: 10/10 connections in use, queuing requests\n",
|
|
|
|
|
+ "[2024-01-15T14:01:45Z] [ERROR ] DB connection pool exhausted — waiting for available connection (timeout=30s)\n",
|
|
|
|
|
+ "[2024-01-15T14:02:01Z] [ERROR ] DB connection pool exhausted — waiting for available connection (timeout=30s)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "📊 MetricQueryTool — querying 'db_pool':\n",
|
|
|
|
|
+ "Metrics for checkout-service matching 'db_pool':\n",
|
|
|
|
|
+ " db_pool_active_connections: [14:00: 3 | 14:01: 8 | 14:02: 10 | 14:03: 10]\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "📖 RunbookLookupTool — pattern 'pool exhausted':\n",
|
|
|
|
|
+ "Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: increase pool_size from 10 to 20 via config hot-reload\n",
|
|
|
|
|
+ " 4. Permanent fix: add the missing index. Example: CREATE INDEX idx_orders_user_id ON orders(user_id)\n",
|
|
|
|
|
+ " 5. Validate index creation: run EXPLAIN again to confirm index is used\n",
|
|
|
|
|
+ " 6. Restart pods once index is verified: kubectl rollout restart deployment/checkout-service\n",
|
|
|
|
|
+ " 7. Monitor db_pool_active_connections and http_request_duration_p99 to confirm recovery\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "from src.tools.log_search_tool import LogSearchTool\n",
|
|
|
|
|
+ "from src.tools.metric_query_tool import MetricQueryTool\n",
|
|
|
|
|
+ "from src.tools.runbook_tool import RunbookLookupTool\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "# --- Quick tool smoke test ---\n",
|
|
|
|
|
+ "incident_path = project_root / \"data\" / \"incidents\" / \"db_pool_exhaustion.json\"\n",
|
|
|
|
|
+ "with open(incident_path, encoding=\"utf-8\") as f:\n",
|
|
|
|
|
+ " sample_incident = json.load(f)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "log_tool = LogSearchTool(sample_incident)\n",
|
|
|
|
|
+ "metric_tool = MetricQueryTool(sample_incident)\n",
|
|
|
|
|
+ "runbook_tool = RunbookLookupTool(\n",
|
|
|
|
|
+ " service=\"checkout-service\",\n",
|
|
|
|
|
+ " runbooks_dir=str(project_root / \"data\" / \"runbooks\")\n",
|
|
|
|
|
+ ")\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "print(\"🔧 LogSearchTool — searching for 'pool':\")\n",
|
|
|
|
|
+ "print(log_tool.run(\"pool\"))\n",
|
|
|
|
|
+ "print()\n",
|
|
|
|
|
+ "print(\"📊 MetricQueryTool — querying 'db_pool':\")\n",
|
|
|
|
|
+ "print(metric_tool.run(\"db_pool\"))\n",
|
|
|
|
|
+ "print()\n",
|
|
|
|
|
+ "print(\"📖 RunbookLookupTool — pattern 'pool exhausted':\")\n",
|
|
|
|
|
+ "print(runbook_tool.run(\"pool exhausted\"))"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "---\n",
|
|
|
|
|
+ "## Part 4: Agent Construction\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Each of the three agents implements a different paradigm from the Hello-Agents curriculum:\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "- **TriageAgent** — Plan-and-Solve (Chapter 4): LLM creates a structured investigation plan\n",
|
|
|
|
|
+ "- **InvestigationAgent** — ReAct (Chapter 4): iterative Reason-Act-Observe loop with tools\n",
|
|
|
|
|
+ "- **PostmortemAgent** — Reflection (Chapter 4): draft → critique → revise"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 4,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "LLM OK\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "LLM connection test: LLM OK\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Available incident fixtures: ['db_pool_exhaustion', 'memory_leak_oom', 'external_api_ratelimit']\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "from src.core.llm_client import HelloAgentsLLM\n",
|
|
|
|
|
+ "from src.agents.triage_agent import TriageAgent\n",
|
|
|
|
|
+ "from src.agents.investigation_agent import InvestigationAgent\n",
|
|
|
|
|
+ "from src.agents.postmortem_agent import PostmortemAgent\n",
|
|
|
|
|
+ "from src.agents.pipeline import run_pipeline, list_incidents, load_incident\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "# Verify LLM connection\n",
|
|
|
|
|
+ "llm = HelloAgentsLLM(verbose=True)\n",
|
|
|
|
|
+ "test_response = llm.think([{\"role\": \"user\", \"content\": \"Reply with exactly: LLM OK\"}])\n",
|
|
|
|
|
+ "print(f\"\\nLLM connection test: {test_response.strip()}\")\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "print(f\"\\nAvailable incident fixtures: {list_incidents()}\")"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "---\n",
|
|
|
|
|
+ "## Part 5: Demo — Full Pipeline\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### Demo 1: DB Connection Pool Exhaustion (INC-001)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "**Alert:** `checkout-service` P99 latency 8.3s > threshold 1.0s \n",
|
|
|
|
|
+ "**Root cause (ground truth):** Missing index on `orders.user_id` → full table scan → DB pool exhaustion"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 5,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "📣 Incident Alert:\n",
|
|
|
|
|
+ "{\n",
|
|
|
|
|
+ " \"incident_id\": \"INC-001\",\n",
|
|
|
|
|
+ " \"service\": \"checkout-service\",\n",
|
|
|
|
|
+ " \"severity\": \"P1\",\n",
|
|
|
|
|
+ " \"alert\": {\n",
|
|
|
|
|
+ " \"metric\": \"http_request_duration_p99\",\n",
|
|
|
|
|
+ " \"value\": 8.3,\n",
|
|
|
|
|
+ " \"threshold\": 1.0,\n",
|
|
|
|
|
+ " \"unit\": \"seconds\",\n",
|
|
|
|
|
+ " \"timestamp\": \"2024-01-15T14:03:22Z\",\n",
|
|
|
|
|
+ " \"description\": \"P99 request latency critically exceeds SLO threshold\"\n",
|
|
|
|
|
+ " }\n",
|
|
|
|
|
+ "}\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "# Show the incident alert\n",
|
|
|
|
|
+ "incident_1 = load_incident(\"db_pool_exhaustion\")\n",
|
|
|
|
|
+ "print(\"📣 Incident Alert:\")\n",
|
|
|
|
|
+ "print(json.dumps({\n",
|
|
|
|
|
+ " \"incident_id\": incident_1[\"incident_id\"],\n",
|
|
|
|
|
+ " \"service\": incident_1[\"service\"],\n",
|
|
|
|
|
+ " \"severity\": incident_1[\"severity\"],\n",
|
|
|
|
|
+ " \"alert\": incident_1[\"alert\"]\n",
|
|
|
|
|
+ "}, indent=2))"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": null,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🚨 STAGE 1: TRIAGE — Generating investigation plan\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "```python\n",
|
|
|
|
|
+ "[\n",
|
|
|
|
|
+ " {\"tool\": \"metric_query\", \"query\": \"http_request_duration_p99\", \"reason\": \"Verify the current value of the metric that triggered the alert\"},\n",
|
|
|
|
|
+ " {\"tool\": \"log_search\", \"query\": \"checkout-service ERROR\", \"reason\": \"Find all error-level log entries from the checkout-service to identify the failure pattern\"},\n",
|
|
|
|
|
+ " {\"tool\": \"metric_query\", \"query\": \"http_request_count\", \"reason\": \"Check if there's an unusual increase in request volume that could be contributing to the latency\"},\n",
|
|
|
|
|
+ " {\"tool\": \"log_search\", \"query\": \"checkout-service WARNING|ERROR\", \"reason\": \"Find all warning and error-level log entries from the checkout-service to identify potential issues\"},\n",
|
|
|
|
|
+ " {\"tool\": \"runbook_lookup\", \"query\": \"high latency checkout-service\", \"reason\": \"Get standard remediation steps for high latency issues in the checkout-service\"}\n",
|
|
|
|
|
+ "]\n",
|
|
|
|
|
+ "```\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✅ Investigation plan (5 steps):\n",
|
|
|
|
|
+ " 1. [metric_query] http_request_duration_p99 — Verify the current value of the metric that triggered the alert\n",
|
|
|
|
|
+ " 2. [log_search] checkout-service ERROR — Find all error-level log entries from the checkout-service to identify the failure pattern\n",
|
|
|
|
|
+ " 3. [metric_query] http_request_count — Check if there's an unusual increase in request volume that could be contributing to the latency\n",
|
|
|
|
|
+ " 4. [log_search] checkout-service WARNING|ERROR — Find all warning and error-level log entries from the checkout-service to identify potential issues\n",
|
|
|
|
|
+ " 5. [runbook_lookup] high latency checkout-service — Get standard remediation steps for high latency issues in the checkout-service\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🔍 STAGE 2: INVESTIGATION — ReAct tool loop\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 1 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: The incident is related to high latency in the checkout-service, with a P99 request latency of 8.3 seconds, which is critically exceeding the SLO threshold. To start investigating, I need to verify the current value of the metric that triggered the alert to confirm if the issue is still ongoing. This will help me understand if the problem is persistent or if it was a temporary spike.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: metric_query[http_request_duration_p99]\n",
|
|
|
|
|
+ "💭 Thought: The incident is related to high latency in the checkout-service, with a P99 request latency of 8.3 seconds, which is critically exceeding the SLO threshold. To start investigating, I need to verify the current value of the metric that triggered the alert to confirm if the issue is still ongoing. This will help me understand if the problem is persistent or if it was a temporary spike.\n",
|
|
|
|
|
+ "🔧 Action: metric_query[http_request_duration_p99]\n",
|
|
|
|
|
+ "👀 Observation: Metrics for checkout-service matching 'http_request_duration_p99':\n",
|
|
|
|
|
+ " http_request_duration_p99: [14:00: 0.3 | 14:01: 1.2 | 14:02: 4.7 | 14:03: 8.3]\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 2 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: The current value of http_request_duration_p99 has been verified, and it indeed exceeded the threshold, triggering the alert. The next step is to investigate if there are any error-level log entries from the checkout-service that could indicate the cause of the high latency. This will help identify any failure patterns or potential issues.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: log_search[checkout-service ERROR]\n",
|
|
|
|
|
+ "💭 Thought: The current value of http_request_duration_p99 has been verified, and it indeed exceeded the threshold, triggering the alert. The next step is to investigate if there are any error-level log entries from the checkout-service that could indicate the cause of the high latency. This will help identify any failure patterns or potential issues.\n",
|
|
|
|
|
+ "🔧 Action: log_search[checkout-service ERROR]\n",
|
|
|
|
|
+ "👀 Observation: No log entries found matching 'checkout-service ERROR' in checkout-service logs.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 3 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have verified that the http_request_duration_p99 metric is currently high, and we didn't find any error-level log entries in the checkout-service logs. The next step is to check if there's an unusual increase in request volume that could be contributing to the latency. This will help us understand if the issue is related to the service's capacity to handle requests or if it's a problem with the service itself.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: metric_query[http_request_count]\n",
|
|
|
|
|
+ "💭 Thought: We have verified that the http_request_duration_p99 metric is currently high, and we didn't find any error-level log entries in the checkout-service logs. The next step is to check if there's an unusual increase in request volume that could be contributing to the latency. This will help us understand if the issue is related to the service's capacity to handle requests or if it's a problem with the service itself.\n",
|
|
|
|
|
+ "🔧 Action: metric_query[http_request_count]\n",
|
|
|
|
|
+ "👀 Observation: No metrics found matching 'http_request_count' for checkout-service.\n",
|
|
|
|
|
+ "Available metrics: http_request_duration_p99, db_pool_active_connections, db_query_duration_p99_ms, request_queue_depth\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 4 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already verified the current value of the metric that triggered the alert, which is the http_request_duration_p99, and it's indeed higher than the threshold. We also searched for error-level log entries from the checkout-service but found none. Additionally, we checked for an unusual increase in request volume by querying the http_request_count metric, but it seems that metric is not available. However, we did find other available metrics such as db_pool_active_connections, db_query_duration_p99_ms, and request_queue_depth. Our next step should be to find all warning and error-level log entries from the checkout-service to identify potential issues.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: log_search[checkout-service WARNING|ERROR]\n",
|
|
|
|
|
+ "💭 Thought: We have already verified the current value of the metric that triggered the alert, which is the http_request_duration_p99, and it's indeed higher than the threshold. We also searched for error-level log entries from the checkout-service but found none. Additionally, we checked for an unusual increase in request volume by querying the http_request_count metric, but it seems that metric is not available. However, we did find other available metrics such as db_pool_active_connections, db_query_duration_p99_ms, and request_queue_depth. Our next step should be to find all warning and error-level log entries from the checkout-service to identify potential issues.\n",
|
|
|
|
|
+ "🔧 Action: log_search[checkout-service WARNING|ERROR]\n",
|
|
|
|
|
+ "👀 Observation: Found 4 log entries matching 'checkout-service WARNING|ERROR':\n",
|
|
|
|
|
+ "[2024-01-15T14:01:45Z] [ERROR ] DB connection pool exhausted — waiting for available connection (timeout=30s)\n",
|
|
|
|
|
+ "[2024-01-15T14:02:00Z] [ERROR ] Query timeout: SELECT * FROM orders WHERE user_id=12345 (full table scan, duration=30s)\n",
|
|
|
|
|
+ "[20...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 5 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries suggest a DB connection pool exhaustion issue, which could be contributing to the high latency. To further investigate, we should look up standard remediation steps for high latency issues in the checkout-service, specifically related to DB connection pool exhaustion.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "💭 Thought: We have investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries suggest a DB connection pool exhaustion issue, which could be contributing to the high latency. To further investigate, we should look up standard remediation steps for high latency issues in the checkout-service, specifically related to DB connection pool exhaustion.\n",
|
|
|
|
|
+ "🔧 Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "👀 Observation: Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: i...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 6 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries revealed a pattern of DB connection pool exhaustion and query timeouts, which could be contributing to the high latency. The runbook lookup provided remediation steps for high latency issues in the checkout-service, including identifying slow queries and increasing the pool size. However, we still need to verify if there's an unusual increase in request volume that could be contributing to the latency, but since the metric_query for http_request_count didn't return any results, we should try to find an alternative metric that can give us insight into the request volume.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: metric_query[request_queue_depth]\n",
|
|
|
|
|
+ "💭 Thought: We have already investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries revealed a pattern of DB connection pool exhaustion and query timeouts, which could be contributing to the high latency. The runbook lookup provided remediation steps for high latency issues in the checkout-service, including identifying slow queries and increasing the pool size. However, we still need to verify if there's an unusual increase in request volume that could be contributing to the latency, but since the metric_query for http_request_count didn't return any results, we should try to find an alternative metric that can give us insight into the request volume.\n",
|
|
|
|
|
+ "🔧 Action: metric_query[request_queue_depth]\n",
|
|
|
|
|
+ "👀 Observation: Metrics for checkout-service matching 'request_queue_depth':\n",
|
|
|
|
|
+ " request_queue_depth: [14:00: 12 | 14:01: 85 | 14:02: 220 | 14:03: 450]\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 7 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries revealed a pattern of DB connection pool exhaustion and query timeouts, which could be contributing to the high latency. The request queue depth metric also showed a significant increase, which could be related to the DB connection pool exhaustion. We should now look up standard remediation steps for high latency issues in the checkout-service, specifically for the DB connection pool exhausted error pattern.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "💭 Thought: We have already investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries revealed a pattern of DB connection pool exhaustion and query timeouts, which could be contributing to the high latency. The request queue depth metric also showed a significant increase, which could be related to the DB connection pool exhaustion. We should now look up standard remediation steps for high latency issues in the checkout-service, specifically for the DB connection pool exhausted error pattern.\n",
|
|
|
|
|
+ "🔧 Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "👀 Observation: Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: i...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 8 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries revealed a pattern of DB connection pool exhaustion and query timeouts. The request_queue_depth metric is also increasing, which could be contributing to the latency. We should now look up standard remediation steps for high latency issues in the checkout-service, specifically focusing on the DB connection pool exhaustion pattern.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "💭 Thought: We have already investigated the current value of the metric that triggered the alert, searched for error-level log entries, checked for unusual increases in request volume, and searched for warning and error-level log entries. The error-level log entries revealed a pattern of DB connection pool exhaustion and query timeouts. The request_queue_depth metric is also increasing, which could be contributing to the latency. We should now look up standard remediation steps for high latency issues in the checkout-service, specifically focusing on the DB connection pool exhaustion pattern.\n",
|
|
|
|
|
+ "🔧 Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "👀 Observation: Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: i...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 9 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already completed several steps in the investigation plan, including verifying the current value of http_request_duration_p99, searching for error-level log entries, checking for unusual increases in request volume, and searching for warning and error-level log entries. The log search revealed error messages related to DB connection pool exhaustion and query timeouts, which suggests that the database may be the bottleneck. The request_queue_depth metric also showed a significant increase, indicating a potential overload. To further investigate, we should follow the next step in the plan, which is to get standard remediation steps for high latency issues in the checkout-service.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: runbook_lookup[high latency checkout-service]\n",
|
|
|
|
|
+ "💭 Thought: We have already completed several steps in the investigation plan, including verifying the current value of http_request_duration_p99, searching for error-level log entries, checking for unusual increases in request volume, and searching for warning and error-level log entries. The log search revealed error messages related to DB connection pool exhaustion and query timeouts, which suggests that the database may be the bottleneck. The request_queue_depth metric also showed a significant increase, indicating a potential overload. To further investigate, we should follow the next step in the plan, which is to get standard remediation steps for high latency issues in the checkout-service.\n",
|
|
|
|
|
+ "🔧 Action: runbook_lookup[high latency checkout-service]\n",
|
|
|
|
|
+ "👀 Observation: Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: i...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 10 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already investigated the current value of the metric that triggered the alert, which is http_request_duration_p99, and found that it has increased significantly. We also searched for error-level log entries from the checkout-service and found none initially, but when we searched for warning and error-level log entries, we found several entries indicating a DB connection pool exhaustion issue. Additionally, we checked the request count metric but found that it was not available, however, we did find the request_queue_depth metric which showed a significant increase. Given this information, our next step should be to look up standard remediation steps for high latency issues in the checkout-service, specifically for the DB connection pool exhausted error pattern.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "💭 Thought: We have already investigated the current value of the metric that triggered the alert, which is http_request_duration_p99, and found that it has increased significantly. We also searched for error-level log entries from the checkout-service and found none initially, but when we searched for warning and error-level log entries, we found several entries indicating a DB connection pool exhaustion issue. Additionally, we checked the request count metric but found that it was not available, however, we did find the request_queue_depth metric which showed a significant increase. Given this information, our next step should be to look up standard remediation steps for high latency issues in the checkout-service, specifically for the DB connection pool exhausted error pattern.\n",
|
|
|
|
|
+ "🔧 Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "👀 Observation: Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: i...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 11 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: We have already investigated the current value of the metric that triggered the alert, which is http_request_duration_p99, and found that it has increased significantly. We also searched for error-level log entries from the checkout-service but found none initially. However, when we searched for both warning and error-level log entries, we found several entries indicating a DB connection pool exhaustion issue. Additionally, we checked the request count metric but found that the actual metric available is request_queue_depth, which is also showing an unusual increase. Given this information, our next step should be to follow the investigation plan and check for unusual increases in request volume that could be contributing to the latency by querying the request_queue_depth metric, which we have already done. Now, we should look up standard remediation steps for high latency issues in the checkout-service, specifically focusing on the DB connection pool exhausted pattern.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "💭 Thought: We have already investigated the current value of the metric that triggered the alert, which is http_request_duration_p99, and found that it has increased significantly. We also searched for error-level log entries from the checkout-service but found none initially. However, when we searched for both warning and error-level log entries, we found several entries indicating a DB connection pool exhaustion issue. Additionally, we checked the request count metric but found that the actual metric available is request_queue_depth, which is also showing an unusual increase. Given this information, our next step should be to follow the investigation plan and check for unusual increases in request volume that could be contributing to the latency by querying the request_queue_depth metric, which we have already done. Now, we should look up standard remediation steps for high latency issues in the checkout-service, specifically focusing on the DB connection pool exhausted pattern.\n",
|
|
|
|
|
+ "🔧 Action: runbook_lookup[high latency checkout-service DB connection pool exhausted]\n",
|
|
|
|
|
+ "👀 Observation: Runbook: checkout-service (v1.2)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Pattern: 'DB pool exhausted' | Severity: P1\n",
|
|
|
|
|
+ "Remediation steps:\n",
|
|
|
|
|
+ " 1. Identify slow queries: run SHOW PROCESSLIST on the primary DB; look for queries >5s\n",
|
|
|
|
|
+ " 2. Run EXPLAIN on the slowest query to identify full table scans or missing indexes\n",
|
|
|
|
|
+ " 3. Temporary mitigation: i...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 12 ---\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "Thought: The incident context indicates a P1 severity issue with the checkout-service, where the P99 request latency has exceeded the SLO threshold. The current value of http_request_duration_p99 has been verified to be 8.3 seconds, which is above the threshold. The next step is to find all error-level log entries from the checkout-service to identify the failure pattern.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Action: log_search[checkout-service ERROR]\n",
|
|
|
|
|
+ "💭 Thought: The incident context indicates a P1 severity issue with the checkout-service, where the P99 request latency has exceeded the SLO threshold. The current value of http_request_duration_p99 has been verified to be 8.3 seconds, which is above the threshold. The next step is to find all error-level log entries from the checkout-service to identify the failure pattern.\n",
|
|
|
|
|
+ "🔧 Action: log_search[checkout-service ERROR]\n",
|
|
|
|
|
+ "👀 Observation: No log entries found matching 'checkout-service ERROR' in checkout-service logs.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "📝 STAGE 3: POST-MORTEM — Reflection (draft → critique → revise)\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n",
|
|
|
|
|
+ "## Executive Summary\n",
|
|
|
|
|
+ "On January 15, 2024, at 14:03:22Z, the checkout-service experienced a critical incident with P99 request latency exceeding the SLO threshold of 1.0 seconds, reaching 8.3 seconds. This incident affected 2847 users and was resolved by identifying and mitigating a DB connection pool exhaustion issue. The root cause was traced back to a missing index on the orders table, leading to slow queries and eventual pool exhaustion.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Incident Timeline\n",
|
|
|
|
|
+ "* 14:00:00Z - http_request_duration_p99 metric at 0.3 seconds, within normal range\n",
|
|
|
|
|
+ "* 14:01:00Z - http_request_duration_p99 metric increases to 1.2 seconds\n",
|
|
|
|
|
+ "* 14:01:45Z - [ERROR] DB connection pool exhausted — waiting for available connection (timeout=30s)\n",
|
|
|
|
|
+ "* 14:02:00Z - [ERROR] Query timeout: SELECT * FROM orders WHERE user_id=12345 (full table scan, duration=30s)\n",
|
|
|
|
|
+ "* 14:02:01Z - [ERROR] DB connection pool exhausted — waiting for available connection (timeout=30s)\n",
|
|
|
|
|
+ "* 14:02:30Z - [ERROR] SQLSTATE[HY000]: General error: too many connections\n",
|
|
|
|
|
+ "* 14:03:22Z - Alert triggered for P99 request latency exceeding SLO threshold\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Root Cause Analysis\n",
|
|
|
|
|
+ "1. Symptom: P99 request latency exceeded the SLO threshold of 1.0 seconds.\n",
|
|
|
|
|
+ "Why? The http_request_duration_p99 metric was increasing over time, indicating a performance issue.\n",
|
|
|
|
|
+ "2. Why was the http_request_duration_p99 metric increasing? The request_queue_depth metric was also increasing, reaching 450, indicating a backlog of requests.\n",
|
|
|
|
|
+ "3. Why was the request_queue_depth metric increasing? The DB connection pool was exhausted, causing requests to wait for available connections.\n",
|
|
|
|
|
+ "4. Why was the DB connection pool exhausted? Slow queries, such as the one at 14:02:00Z, were causing the pool to be depleted.\n",
|
|
|
|
|
+ "5. Why were slow queries occurring? A missing index on the orders table was leading to full table scans, resulting in slow query performance.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Impact Assessment\n",
|
|
|
|
|
+ "* Severity: P1\n",
|
|
|
|
|
+ "* Affected users: 2847\n",
|
|
|
|
|
+ "* Duration estimate: Approximately 3 minutes (from 14:00:00Z to 14:03:22Z)\n",
|
|
|
|
|
+ "* Business impact: The incident resulted in a significant increase in request latency, potentially leading to a poor user experience and lost sales.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Immediate Remediation Steps\n",
|
|
|
|
|
+ "1. Increase the DB connection pool size from 10 to 20 via config hot-reload to provide temporary relief.\n",
|
|
|
|
|
+ "2. Identify slow queries using SHOW PROCESSLIST on the primary DB and look for queries >5s.\n",
|
|
|
|
|
+ "3. Run EXPLAIN on the slowest query to identify full table scans or missing indexes.\n",
|
|
|
|
|
+ "4. Add a missing index on the orders table, such as CREATE INDEX idx_orders_user_id ON orders(user_id).\n",
|
|
|
|
|
+ "5. Validate index creation by running EXPLAIN again to confirm the index is used.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Action Items\n",
|
|
|
|
|
+ "| Action | Owner | Due Date | Priority |\n",
|
|
|
|
|
+ "| --- | --- | --- | --- |\n",
|
|
|
|
|
+ "| Add missing index on orders table | Database Team | 2024-01-16 | High |\n",
|
|
|
|
|
+ "| Implement pod autoscaling for checkout-service | Infrastructure Team | 2024-01-17 | Medium |\n",
|
|
|
|
|
+ "| Enable slow query logging on DB | Database Team | 2024-01-18 | Low |\n",
|
|
|
|
|
+ "| Review and optimize DB queries | Database Team | 2024-01-19 | Medium |\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "## Lessons Learned\n",
|
|
|
|
|
+ "* The importance of monitoring DB connection pool usage and request queue depth to detect potential performance issues.\n",
|
|
|
|
|
+ "* The need for regular review and optimization of DB queries to prevent slow query performance.\n",
|
|
|
|
|
+ "* The value of having a runbook with clear remediation steps for common issues, such as DB connection pool exhaustion.\n",
|
|
|
|
|
+ "* The importance of implementing pod autoscaling to handle increased load and prevent request queue buildup.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✍️ Draft post-mortem written.\n",
|
|
|
|
|
+ "🧠 Calling llama-3.3-70b-versatile...\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "# Run the full pipeline\n",
|
|
|
|
|
+ "start = time.time()\n",
|
|
|
|
|
+ "result_1 = run_pipeline(\"db_pool_exhaustion\", verbose=True)\n",
|
|
|
|
|
+ "elapsed_1 = round(time.time() - start, 1)\n",
|
|
|
|
|
+ "print(f\"\\n⏱️ Pipeline completed in {elapsed_1}s\")"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 38,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "data": {
|
|
|
|
|
+ "text/markdown": [],
|
|
|
|
|
+ "text/plain": [
|
|
|
|
|
+ "<IPython.core.display.Markdown object>"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "output_type": "display_data"
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "# Display the generated post-mortem report\n",
|
|
|
|
|
+ "from IPython.display import Markdown, display\n",
|
|
|
|
|
+ "display(Markdown(result_1[\"report\"]))"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "### Demo 2: External API Rate Limit Cascade (INC-003)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "**Alert:** `payment-service` success rate 23.4% < threshold 95% \n",
|
|
|
|
|
+ "**Root cause (ground truth):** Promotional traffic spike + no exponential backoff → retry storm → Stripe 429 cascade"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 39,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🚨 STAGE 1: TRIAGE — Generating investigation plan\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🧠 Calling gpt-4o-mini...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261177927272665835401)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Could not parse structured plan; using fallback plan.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🔍 STAGE 2: INVESTIGATION — ReAct tool loop\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 1 ---\n",
|
|
|
|
|
+ "🧠 Calling gpt-4o-mini...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261185271520637880320)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Empty LLM response, stopping.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "📝 STAGE 3: POST-MORTEM — Reflection (draft → critique → revise)\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🧠 Calling gpt-4o-mini...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261192673164517328253)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✍️ Draft post-mortem written.\n",
|
|
|
|
|
+ "🧠 Calling gpt-4o-mini...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261199902531542463148)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "🔍 Critique (revision 1):\n",
|
|
|
|
|
+ "...\n",
|
|
|
|
|
+ " Quality score: 7/10\n",
|
|
|
|
|
+ " Revising post-mortem (score 7 < 8)...\n",
|
|
|
|
|
+ "🧠 Calling gpt-4o-mini...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 202604270626127809566451572748)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ " Revision 1 complete.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✅ Final post-mortem ready.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "⏱️ Pipeline completed in 0.4s\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "data": {
|
|
|
|
|
+ "text/markdown": [],
|
|
|
|
|
+ "text/plain": [
|
|
|
|
|
+ "<IPython.core.display.Markdown object>"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "output_type": "display_data"
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "start = time.time()\n",
|
|
|
|
|
+ "result_3 = run_pipeline(\"external_api_ratelimit\", verbose=True)\n",
|
|
|
|
|
+ "elapsed_3 = round(time.time() - start, 1)\n",
|
|
|
|
|
+ "print(f\"\\n⏱️ Pipeline completed in {elapsed_3}s\")\n",
|
|
|
|
|
+ "display(Markdown(result_3[\"report\"]))"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "---\n",
|
|
|
|
|
+ "## Part 6: Performance Evaluation\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Run all three incident fixtures and measure pipeline speed and root-cause accuracy (a keyword match against the ground-truth root cause)."
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "code",
|
|
|
|
|
+ "execution_count": 40,
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "outputs": [
|
|
|
|
|
+ {
|
|
|
|
|
+ "name": "stdout",
|
|
|
|
|
+ "output_type": "stream",
|
|
|
|
|
+ "text": [
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🚨 STAGE 1: TRIAGE — Generating investigation plan\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261223482086656223927)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Could not parse structured plan; using fallback plan.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🔍 STAGE 2: INVESTIGATION — ReAct tool loop\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 1 ---\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261231052236455543539)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Empty LLM response, stopping.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "📝 STAGE 3: POST-MORTEM — Reflection (draft → critique → revise)\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261240102403213861415)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✍️ Draft post-mortem written.\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261247825417010034927)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "🔍 Critique (revision 1):\n",
|
|
|
|
|
+ "...\n",
|
|
|
|
|
+ " Quality score: 7/10\n",
|
|
|
|
|
+ " Revising post-mortem (score 7 < 8)...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261256116126829506230)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ " Revision 1 complete.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✅ Final post-mortem ready.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🚨 STAGE 1: TRIAGE — Generating investigation plan\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261273255223804179156)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Could not parse structured plan; using fallback plan.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🔍 STAGE 2: INVESTIGATION — ReAct tool loop\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 1 ---\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261281273660804723994)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Empty LLM response, stopping.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "📝 STAGE 3: POST-MORTEM — Reflection (draft → critique → revise)\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261289003520739111729)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✍️ Draft post-mortem written.\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261297094600607657398)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "🔍 Critique (revision 1):\n",
|
|
|
|
|
+ "...\n",
|
|
|
|
|
+ " Quality score: 7/10\n",
|
|
|
|
|
+ " Revising post-mortem (score 7 < 8)...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 202604270626134677146634593157)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ " Revision 1 complete.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✅ Final post-mortem ready.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🚨 STAGE 1: TRIAGE — Generating investigation plan\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261320092757163126141)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Could not parse structured plan; using fallback plan.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "🔍 STAGE 2: INVESTIGATION — ReAct tool loop\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "--- Step 1 ---\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261328430175304565034)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "⚠️ Empty LLM response, stopping.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "📝 STAGE 3: POST-MORTEM — Reflection (draft → critique → revise)\n",
|
|
|
|
|
+ "============================================================\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261336639858400171735)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✍️ Draft post-mortem written.\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261344133585616970293)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "🔍 Critique (revision 1):\n",
|
|
|
|
|
+ "...\n",
|
|
|
|
|
+ " Quality score: 7/10\n",
|
|
|
|
|
+ " Revising post-mortem (score 7 < 8)...\n",
|
|
|
|
|
+ "❌ LLM API error: Error code: 403 - {'error': {'message': 'Your account balance is insufficient. Please recharge your account to continue using the API. (tid: 2026042706261351413276818741033)', 'type': 'Aihubmix_api_error', 'param': '', 'code': 'insufficient_user_quota'}}\n",
|
|
|
|
|
+ " Revision 1 complete.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "✅ Final post-mortem ready.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "=== Evaluation Results ===\n",
|
|
|
|
|
+ " [❌] db_pool_exhaustion | 0.5s | ...\n",
|
|
|
|
|
+ " [❌] memory_leak_oom | 0.5s | ...\n",
|
|
|
|
|
+ " [❌] external_api_ratelimit | 0.5s | ...\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "Accuracy: 0% Avg pipeline time: 0.5s\n"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "source": [
|
|
|
|
|
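+ "# Expected root-cause keyword per incident; a run counts as correct if this substring appears in the reported root cause\n",
+ "GROUND_TRUTH = {\n",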
|
|
|
|
|
+ " \"db_pool_exhaustion\": \"missing index\",\n",
|
|
|
|
|
+ " \"memory_leak_oom\": \"no TTL\",\n",
|
|
|
|
|
+ " \"external_api_ratelimit\": \"retry storm\",\n",
|
|
|
|
|
+ "}\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "eval_results = []\n",
|
|
|
|
|
+ "for incident_id in list_incidents():\n",
|
|
|
|
|
+ " t0 = time.time()\n",
|
|
|
|
|
+ " res = run_pipeline(incident_id, verbose=False)\n",
|
|
|
|
|
+ " elapsed = round(time.time() - t0, 1)\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ " ground_truth_keyword = GROUND_TRUTH.get(incident_id, \"\")\n",
|
|
|
|
|
+ " root_cause_text = res[\"findings\"].get(\"root_cause\", \"\").lower()\n",
|
|
|
|
|
+ " correct = ground_truth_keyword.lower() in root_cause_text\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ " eval_results.append({\n",
|
|
|
|
|
+ " \"incident_id\": incident_id,\n",
|
|
|
|
|
+ " \"service\": res[\"service\"],\n",
|
|
|
|
|
+ " \"severity\": res[\"severity\"],\n",
|
|
|
|
|
+ " \"elapsed_s\": elapsed,\n",
|
|
|
|
|
+ " \"root_cause_correct\": \"✅\" if correct else \"❌\",\n",
|
|
|
|
|
+ " \"root_cause_found\": res[\"findings\"].get(\"root_cause\", \"\")[:80] + \"...\",\n",
|
|
|
|
|
+ " })\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "print(\"\\n=== Evaluation Results ===\")\n",
|
|
|
|
|
+ "for r in eval_results:\n",
|
|
|
|
|
+ " print(f\" [{r['root_cause_correct']}] {r['incident_id']:30s} \"\n",
|
|
|
|
|
+ " f\"| {r['elapsed_s']:5.1f}s | {r['root_cause_found']}\")\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "accuracy = sum(1 for r in eval_results if r[\"root_cause_correct\"] == \"✅\") / len(eval_results)\n",
|
|
|
|
|
+ "avg_time = sum(r[\"elapsed_s\"] for r in eval_results) / len(eval_results)\n",
|
|
|
|
|
+ "print(f\"\\nAccuracy: {accuracy:.0%} Avg pipeline time: {avg_time:.1f}s\")"
|
|
|
|
|
+ ]
|
|
|
|
|
+ },
|
|
|
|
|
+ {
|
|
|
|
|
+ "cell_type": "markdown",
|
|
|
|
|
+ "metadata": {},
|
|
|
|
|
+ "source": [
|
|
|
|
|
+ "---\n",
|
|
|
|
|
+ "## Part 7: Summary and Outlook\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### What We Built\n",
|
|
|
|
|
+ "- **TriageAgent (Plan-and-Solve)**: converts a raw alert into an ordered investigation plan, which is intended to reduce hallucinated tool calls in the ReAct stage\n",
|
|
|
|
|
+ "- **InvestigationAgent (ReAct)**: iterates through log search → metric queries → runbook lookup to arrive at a root cause with evidence\n",
|
|
|
|
|
+ "- **PostmortemAgent (Reflection)**: produces a structured RCA report and self-critiques it against quality criteria before finalizing\n",
|
|
|
|
|
+ "- **FastAPI backend**: exposes the pipeline as a REST API, ready for frontend integration\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### Agent Paradigms Demonstrated\n",
|
|
|
|
|
+ "| Agent | Paradigm | Chapter Reference |\n",
|
|
|
|
|
+ "|---|---|---|\n",
|
|
|
|
|
+ "| TriageAgent | Plan-and-Solve | Chapter 4 |\n",
|
|
|
|
|
+ "| InvestigationAgent | ReAct | Chapter 4 |\n",
|
|
|
|
|
+ "| PostmortemAgent | Reflection | Chapter 4 |\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### Challenges and Lessons\n",
|
|
|
|
|
+ "1. **Structured output parsing**: LLMs don't always return valid JSON/Python lists. Robust parsing with fallback logic is essential.\n",
|
|
|
|
|
+ "2. **Tool design**: Tool descriptions must be precise — ambiguous descriptions cause the ReAct agent to call the wrong tool.\n",
|
|
|
|
|
+ "3. **Context window management**: The full investigation history is passed on every ReAct turn, so the prompt grows quickly. In production, use a sliding window or summarization.\n",
|
|
|
|
|
+ "4. **Reflection threshold**: The Reflection pattern only adds value if the critique is rigorous. Vague critique prompts produce vague improvements.\n",
|
|
|
|
|
+ "\n",
|
|
|
|
|
+ "### Future Plans\n",
|
|
|
|
|
+ "- [ ] **SSE streaming**: stream agent reasoning steps to a frontend in real time\n",
|
|
|
|
|
+ "- [ ] **Vue/React frontend**: incident selector UI + live reasoning trace + markdown report viewer\n",
|
|
|
|
|
+ "- [ ] **Real log ingestion**: connect to actual log aggregators (Loki, CloudWatch, Datadog)\n",
|
|
|
|
|
+ "- [ ] **Memory across incidents**: vector store for past RCA reports to accelerate future investigations\n",
|
|
|
|
|
+ "- [ ] **Automated runbook execution**: give the agent permission to run safe remediation commands (kubectl, SQL index creation)"
|
|
|
|
|
+ ]
|
|
|
|
|
+ }
|
|
|
|
|
+ ],
|
|
|
|
|
+ "metadata": {
|
|
|
|
|
+ "kernelspec": {
|
|
|
|
|
+ "display_name": "Python 3",
|
|
|
|
|
+ "language": "python",
|
|
|
|
|
+ "name": "python3"
|
|
|
|
|
+ },
|
|
|
|
|
+ "language_info": {
|
|
|
|
|
+ "codemirror_mode": {
|
|
|
|
|
+ "name": "ipython",
|
|
|
|
|
+ "version": 3
|
|
|
|
|
+ },
|
|
|
|
|
+ "file_extension": ".py",
|
|
|
|
|
+ "mimetype": "text/x-python",
|
|
|
|
|
+ "name": "python",
|
|
|
|
|
+ "nbconvert_exporter": "python",
|
|
|
|
|
+ "pygments_lexer": "ipython3",
|
|
|
|
|
+ "version": "3.9.6"
|
|
|
|
|
+ }
|
|
|
|
|
+ },
|
|
|
|
|
+ "nbformat": 4,
|
|
|
|
|
+ "nbformat_minor": 4
|
|
|
|
|
+}
|