Epic 9 — In-App Observability

Covers per-run token usage, per-stage latency, the MCP cache-hit ratio (the Valkey amtp:mcp:tree:* and amtp:mcp:blob:* keyspaces), and deep-links from the run timeline into Grafana/Tempo panels. This is a read-only surface; no mutations occur.

Personas: BU (consumer), OP (consumer + Grafana access)

Shared modules: CorrelationChip, LastSyncedBadge, EnvProvenance

Story 9.1 — Per-Run Token Usage

As a BU,
I want to see the total LLM token consumption for a run, broken down by stage,
So that I can understand the cost profile of each pipeline execution.
Scenario: Token usage panel renders on run detail
Given the user is viewing /runs/{run_id} for a completed or in-progress run
When GET /runs/{run_id}/metrics resolves
Then a Token Usage panel is rendered (collapsed by default, behind a "Show token usage" toggle) showing: total tokens (prompt + completion) and a breakdown table with one row per stage: stage name, prompt tokens, completion tokens, total tokens
Scenario: Token data unavailable for running stage
Given a stage's status is still running
Then the row for that stage shows a placeholder in place of token counts and a note reads "Stage in progress — token usage will appear when complete"
| Endpoint / DB | Purpose |
| --- | --- |
| GET /runs/{run_id}/metrics | Token usage, latency, and cache metrics |
| DB stages.status | Guard for in-progress rows |
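The panel's roll-up can be sketched as below. The response shape (a `stages` array with nullable token counters while a stage is running) is an assumption for illustration, not a confirmed contract of GET /runs/{run_id}/metrics:

```typescript
// Assumed per-stage token fields from GET /runs/{run_id}/metrics.
// Counters are null while the stage is still running (Scenario 2).
interface StageTokenUsage {
  stage: string;
  status: "running" | "succeeded" | "failed";
  prompt_tokens: number | null;
  completion_tokens: number | null;
}

interface RunMetrics {
  stages: StageTokenUsage[];
}

// Total tokens (prompt + completion) across all stages,
// treating in-progress rows as contributing nothing yet.
function totalTokens(metrics: RunMetrics): number {
  return metrics.stages.reduce(
    (sum, s) => sum + (s.prompt_tokens ?? 0) + (s.completion_tokens ?? 0),
    0,
  );
}
```

A running stage therefore lowers nothing and hides nothing: its row renders a placeholder while the total reflects only completed stages.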

Story 9.2 — Per-Stage Latency

As a BU,
I want to see the elapsed time for each stage in a run,
So that I can identify slow stages and estimate future run durations.
Scenario: Latency breakdown renders
Given the run has at least one completed stage
When GET /runs/{run_id}/metrics resolves
Then a Latency panel displays one row per stage: stage name, elapsed time (finished_at - started_at), attempt count; the longest-elapsed stage is visually highlighted
Scenario: Stage still running — live elapsed counter
Given a stage has status = 'running'
Then the row shows a live elapsed counter incrementing every second; the final elapsed time replaces the live counter when the stage transitions to a terminal state
| Endpoint / DB | Purpose |
| --- | --- |
| GET /runs/{run_id}/metrics | Includes per-stage latency |
| DB stages.started_at, stages.finished_at, stages.attempt | Latency source |
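The two latency cases above can be sketched as one function: terminal stages use finished_at - started_at, a running stage uses the current time (which the UI re-evaluates once per second for the live counter). ISO-8601 timestamp strings are assumed; the `now` parameter is only there to make the sketch testable:

```typescript
// Elapsed milliseconds for a latency row. finished_at is null while the
// stage is still running, so the counter keeps advancing with `now`.
function elapsedMs(
  startedAt: string,
  finishedAt: string | null,
  now: () => number = Date.now,
): number {
  const end = finishedAt !== null ? Date.parse(finishedAt) : now();
  return end - Date.parse(startedAt);
}

// Index of the longest-elapsed stage, i.e. the row to highlight.
function slowestStage(
  rows: { started_at: string; finished_at: string | null }[],
  now: () => number = Date.now,
): number {
  let worst = -1;
  let worstMs = -Infinity;
  rows.forEach((r, i) => {
    const ms = elapsedMs(r.started_at, r.finished_at, now);
    if (ms > worstMs) {
      worstMs = ms;
      worst = i;
    }
  });
  return worst;
}
```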

Story 9.3 — MCP Cache-Hit Ratio

As a BU,
I want to see the Valkey MCP cache-hit ratio for each run,
So that I can understand how effectively the platform avoids redundant GitHub API calls.
Scenario: Cache-hit ratio panel renders
Given the user is viewing a run that used the GitHub MCP tool
When GET /runs/{run_id}/metrics resolves with an mcp_cache block
Then an MCP Cache panel shows: amtp:mcp:tree:* hits/misses/ratio, amtp:mcp:blob:* hits/misses/ratio, and the overall cache-hit percentage
Scenario: Cache metrics unavailable — no MCP calls in this run
Given the run's stages did not invoke the GitHub MCP tool
Then the MCP Cache panel shows "No MCP calls in this run" rather than zero-filled metrics
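The panel math is simple but the zero-call guard matters. A minimal sketch, assuming the mcp_cache block carries hit/miss counters per keyspace (the exact field names are illustrative):

```typescript
// Assumed shape of the mcp_cache block: one counter pair per keyspace,
// `tree` for amtp:mcp:tree:* and `blob` for amtp:mcp:blob:*.
interface KeyspaceStats {
  hits: number;
  misses: number;
}

interface McpCache {
  tree: KeyspaceStats;
  blob: KeyspaceStats;
}

// Hit ratio in [0, 1], or null when there were no lookups at all —
// null is what lets the UI render "No MCP calls in this run"
// instead of a misleading 0%.
function hitRatio(s: KeyspaceStats): number | null {
  const total = s.hits + s.misses;
  return total === 0 ? null : s.hits / total;
}

// Overall ratio across both keyspaces, with the same null guard.
function overallRatio(c: McpCache): number | null {
  return hitRatio({
    hits: c.tree.hits + c.blob.hits,
    misses: c.tree.misses + c.blob.misses,
  });
}
```

Returning null (rather than 0) from the empty case is the design choice that keeps Scenario 2 honest: zero lookups and zero hits are different facts.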

Story 9.4 — Grafana and Tempo Deep-Links

As an OP,
I want to open the relevant Grafana panel or Tempo trace for a run directly from the run detail page,
So that I can pivot from the AMTP UI to the observability stack without manually constructing URLs.
Scenario: Grafana run panel link renders for OP
Given the user has the OP role and GET /config provides the configured Grafana base URL
When the run detail page renders
Then an "Open in Grafana" link is shown in the observability section; the link opens the pre-scoped Grafana panel for run_id={run_id} in a new tab
Scenario: Tempo trace link for a specific stage
Given the user has the OP role and GET /runs/{run_id}/metrics provides a trace_id for a stage
Then a "View trace" link is shown beside the stage row, opening the Tempo trace URL scoped to that trace_id in a new tab
Scenario: Grafana links absent for BU and AU roles
Given the user has the BU or Auditor role
Then the Grafana and Tempo deep-links are absent from the DOM
| Endpoint / DB | Purpose |
| --- | --- |
| GET /config | Grafana base URL, Tempo base URL |
| GET /runs/{run_id}/metrics | Includes trace_id per stage |
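Link construction and role gating can be sketched as below. The Grafana dashboard path (`/d/amtp-run/run-detail`), the `var-run_id` template variable, and the Tempo `/trace/{trace_id}` route are hypothetical placeholders — only the base URLs come from GET /config, and each deployment would supply its own paths:

```typescript
// Hypothetical Grafana deep-link: base URL from GET /config, dashboard
// path and var-run_id variable name are illustrative assumptions.
function grafanaRunLink(grafanaBaseUrl: string, runId: string): string {
  const url = new URL("/d/amtp-run/run-detail", grafanaBaseUrl);
  url.searchParams.set("var-run_id", runId);
  return url.toString();
}

// Hypothetical Tempo trace link scoped to one stage's trace_id.
function tempoTraceLink(tempoBaseUrl: string, traceId: string): string {
  return new URL(`/trace/${encodeURIComponent(traceId)}`, tempoBaseUrl).toString();
}

// Scenario 3: only OP sees the deep-links; for BU and AU (Auditor)
// the links are not rendered at all, so they never reach the DOM.
function showObservabilityLinks(role: "BU" | "OP" | "AU"): boolean {
  return role === "OP";
}
```

Building the URLs with the URL API (rather than string concatenation) keeps run and trace IDs safely encoded, and gating at render time — not via CSS — satisfies the "absent from the DOM" requirement.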