Epic 9 — In-App Observability
Covers per-run token usage, per-stage latency, MCP cache-hit ratio (Valkey amtp:mcp:tree:* and amtp:mcp:blob:* key spaces), and deep-links from the run timeline into Grafana/Tempo panels. This is a read-only surface; no mutations occur.
Personas: BU (consumer), OP (consumer + Grafana access)
Shared modules:
- CorrelationChip
- LastSyncedBadge
- EnvProvenance
Story 9.1 — Per-Run Token Usage
- As a BU
- I want to see the total LLM token consumption for a run, broken down by stage
- So that I can understand the cost profile of each pipeline execution
Scenario: Token usage panel renders on run detail
Given the user is viewing /runs/{run_id} for a completed or in-progress run
When GET /runs/{run_id}/metrics resolves
Then a Token Usage panel is rendered, collapsed by default behind a Show token usage toggle, showing the total tokens (prompt + completion) and a breakdown table with one row per stage: stage name, prompt tokens, completion tokens, total tokens
Scenario: Token data unavailable for running stage
Given a stage's status is still running
Then the row for that stage shows "—" in place of the token counts, and a note reads "Stage in progress — token usage will appear when complete"
| Endpoint / DB | Purpose |
|---|---|
| GET /runs/{run_id}/metrics | Token usage, latency, and cache metrics |
| DB stages.status | Guard for in-progress rows |
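The two scenarios above reduce to a small mapping from the metrics payload to table rows. A minimal TypeScript sketch follows; the payload shape (StageMetrics, with prompt_tokens and completion_tokens fields, and the status values other than running) is an assumption, since this story only states that GET /runs/{run_id}/metrics carries token usage and that stages.status guards in-progress rows.

```typescript
// Assumed stage lifecycle values; the spec itself only names 'running'.
type StageStatus = "pending" | "running" | "succeeded" | "failed";

// Hypothetical per-stage entry in the GET /runs/{run_id}/metrics payload.
interface StageMetrics {
  stage: string;
  status: StageStatus; // mirrors stages.status
  prompt_tokens: number | null;
  completion_tokens: number | null;
}

interface TokenRow {
  stage: string;
  prompt: string;
  completion: string;
  total: string;
  note: string | null; // set only for in-progress stages
}

// Placeholder glyph and note copy taken from the scenario text (em dash escaped).
const PLACEHOLDER = "\u2014";
const IN_PROGRESS_NOTE =
  "Stage in progress \u2014 token usage will appear when complete";

function tokenUsage(stages: StageMetrics[]): { totalTokens: number; rows: TokenRow[] } {
  let totalTokens = 0;
  const rows = stages.map((s): TokenRow => {
    // Guard on stages.status: a running stage has no final counts yet.
    if (s.status === "running" || s.prompt_tokens === null || s.completion_tokens === null) {
      return {
        stage: s.stage,
        prompt: PLACEHOLDER,
        completion: PLACEHOLDER,
        total: PLACEHOLDER,
        note: s.status === "running" ? IN_PROGRESS_NOTE : null,
      };
    }
    const total = s.prompt_tokens + s.completion_tokens; // prompt + completion
    totalTokens += total; // headline total counts completed stages only
    return {
      stage: s.stage,
      prompt: String(s.prompt_tokens),
      completion: String(s.completion_tokens),
      total: String(total),
      note: null,
    };
  });
  return { totalTokens, rows };
}
```

Summing only stages that have final counts keeps the headline total consistent with the rows that show numbers; whether the total should instead be hidden while any stage is running is not specified by the story.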
Story 9.2 — Per-Stage Latency
- As a BU
- I want to see the elapsed time for each stage in a run
- So that I can identify slow stages and estimate future run durations
Scenario: Latency breakdown renders
Given the run has at least one completed stage
When GET /runs/{run_id}/metrics resolves
Then a Latency panel displays one row per stage: stage name, elapsed time (finished_at - started_at), and attempt count; the stage with the longest elapsed time is visually highlighted
Scenario: Stage still running — live elapsed counter
Given a stage has status = 'running'
Then the row shows a live elapsed counter that increments every second; when the stage transitions to a terminal state, the final elapsed time replaces the live counter
| Endpoint / DB | Purpose |
|---|---|
| GET /runs/{run_id}/metrics | Includes per-stage latency |
| DB stages.started_at, stages.finished_at, stages.attempt | Latency source |
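The latency rules can be sketched the same way. This assumes the metrics payload exposes stages.started_at and stages.finished_at as ISO-8601 strings and stages.attempt as an integer under the field names used below (hypothetical). It also assumes that only non-running stages compete for the highlight, which the scenarios do not state explicitly.

```typescript
// Hypothetical per-stage timing fields in the metrics payload.
interface StageTiming {
  stage: string;
  status: string;             // mirrors stages.status
  started_at: string | null;  // ISO-8601, from stages.started_at
  finished_at: string | null; // ISO-8601, from stages.finished_at
  attempt: number;            // from stages.attempt
}

interface LatencyRow {
  stage: string;
  elapsedMs: number | null; // null if the stage has not started
  attempt: number;
  live: boolean;            // true while status = 'running'
  highlighted: boolean;     // longest-elapsed non-running stage
}

function elapsedMs(t: StageTiming, nowMs: number): number | null {
  if (t.started_at === null) return null;
  const start = Date.parse(t.started_at);
  // Running: live value measured against the caller-supplied clock.
  if (t.status === "running") return nowMs - start;
  if (t.finished_at === null) return null;
  return Date.parse(t.finished_at) - start; // finished_at - started_at
}

function latencyRows(timings: StageTiming[], nowMs: number): LatencyRow[] {
  const rows = timings.map((t): LatencyRow => ({
    stage: t.stage,
    elapsedMs: elapsedMs(t, nowMs),
    attempt: t.attempt,
    live: t.status === "running",
    highlighted: false,
  }));
  // Assumed: only non-running rows compete, since a live value is still growing.
  let longest: LatencyRow | null = null;
  for (const r of rows) {
    if (r.live || r.elapsedMs === null) continue;
    if (longest === null || r.elapsedMs > (longest.elapsedMs ?? -1)) longest = r;
  }
  if (longest !== null) longest.highlighted = true;
  return rows;
}
```

Because a running row's elapsed time is computed from the nowMs argument, the component can re-run latencyRows(timings, Date.now()) from a one-second setInterval while any row has live set, and clear the interval once none does; the terminal finished_at - started_at value then replaces the counter.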
Story 9.3 — MCP Cache-Hit Ratio
- As a BU
- I want to see the Valkey MCP cache-hit ratio for each run
- So that I can understand how effectively the platform avoids redundant GitHub API calls
Scenario: Cache-hit ratio panel renders
Given the user is viewing a run that used the GitHub MCP tool
When GET /runs/{run_id}/metrics resolves with an mcp_cache block
Then an MCP Cache panel shows hits, misses, and hit ratio for amtp:mcp:tree:* and for amtp:mcp:blob:*, plus the overall cache-hit percentage
Scenario: Cache metrics unavailable — no MCP calls in this run
Given the run's stages did not invoke the GitHub MCP tool
Then the MCP Cache panel shows "No MCP calls in this run" rather than zero-filled metrics
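A sketch of the cache panel logic, assuming the mcp_cache block carries raw hit and miss counts per key space; the tree and blob field names are hypothetical and stand for amtp:mcp:tree:* and amtp:mcp:blob:* respectively. It also assumes that a present block with zero lookups should be treated the same as an absent one.

```typescript
// Hypothetical raw counters for one Valkey key space.
interface KeySpaceStats { hits: number; misses: number; }

// Hypothetical mcp_cache block: tree = amtp:mcp:tree:*, blob = amtp:mcp:blob:*.
interface McpCacheBlock { tree: KeySpaceStats; blob: KeySpaceStats; }

interface RatioRow { hits: number; misses: number; ratio: number | null; }

type CachePanel =
  | { kind: "empty"; message: string }
  | { kind: "stats"; tree: RatioRow; blob: RatioRow; overallPct: number };

const NO_MCP_CALLS = "No MCP calls in this run";

function ratioRow(s: KeySpaceStats): RatioRow {
  const lookups = s.hits + s.misses;
  // A key space with no lookups has no meaningful ratio (not 0%).
  return { hits: s.hits, misses: s.misses, ratio: lookups === 0 ? null : s.hits / lookups };
}

function cachePanel(mcpCache: McpCacheBlock | null | undefined): CachePanel {
  // Absent block: the run made no GitHub MCP calls.
  if (mcpCache == null) return { kind: "empty", message: NO_MCP_CALLS };
  const hits = mcpCache.tree.hits + mcpCache.blob.hits;
  const lookups = hits + mcpCache.tree.misses + mcpCache.blob.misses;
  // Assumed: a present but all-zero block also gets the empty state.
  if (lookups === 0) return { kind: "empty", message: NO_MCP_CALLS };
  return {
    kind: "stats",
    tree: ratioRow(mcpCache.tree),
    blob: ratioRow(mcpCache.blob),
    // Pooled across both key spaces, not the mean of the two ratios.
    overallPct: (100 * hits) / lookups,
  };
}
```

The overall percentage here pools hits and misses across both key spaces rather than averaging the two ratios, so a key space with few lookups does not skew the headline figure. With tree at 9 hits out of 10 and blob at 1 out of 4, the pooled value is about 71.4%, while a plain average of the two ratios would give 57.5%.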
Story 9.4 — Grafana and Tempo Deep-Links
- As an OP
- I want to open the relevant Grafana panel or Tempo trace for a run directly from the run detail page
- So that I can pivot from the AMTP UI to the observability stack without manually constructing URLs
Scenario: Grafana run panel link renders for OP
Given the user has the OP role and GET /config provides the configured Grafana base URL
When the run detail page renders
Then an Open in Grafana link is shown in the observability section; it opens the Grafana panel pre-scoped to run_id={run_id} in a new tab
Scenario: Tempo trace link for a specific stage
Given the user has the OP role and GET /runs/{run_id}/metrics provides a trace_id for a stage
Then a View trace link is shown beside that stage's row; it opens the Tempo trace URL scoped to that trace_id in a new tab
Scenario: Grafana and Tempo links absent for BU and AU roles
Given the user has the BU or AU (Auditor) role
Then the Grafana and Tempo deep-links are absent from the DOM
| Endpoint / DB | Purpose |
|---|---|
| GET /config | Grafana base URL, Tempo base URL |
| GET /runs/{run_id}/metrics | Includes trace_id per stage |
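A sketch of the OP-only link construction. The GET /config field names (grafanaBaseUrl, tempoBaseUrl), the dashboard path /d/amtp-run, and the /trace/{trace_id} path are all hypothetical. The var-run_id query parameter follows Grafana's convention of reading dashboard template variables from var-&lt;name&gt; URL parameters, which presumes the target dashboard defines a run_id variable.

```typescript
type Role = "BU" | "OP" | "AU";

// Hypothetical GET /config field names.
interface UiConfig { grafanaBaseUrl: string; tempoBaseUrl: string; }

// Per-stage trace_id as carried by GET /runs/{run_id}/metrics (shape assumed).
interface StageTrace { stage: string; trace_id: string | null; }

interface DeepLinks { grafana: string; traces: Record<string, string>; }

// Hypothetical dashboard path; the spec does not name the dashboard.
const RUN_DASHBOARD_PATH = "/d/amtp-run";

function joinUrl(base: string, path: string): URL {
  // Drop trailing slashes so "https://host/" + "/d/..." does not double up.
  return new URL(base.replace(/\/+$/, "") + path);
}

function grafanaRunLink(cfg: UiConfig, runId: string): string {
  const url = joinUrl(cfg.grafanaBaseUrl, RUN_DASHBOARD_PATH);
  // Grafana populates dashboard template variables from var-<name> params.
  url.searchParams.set("var-run_id", runId);
  return url.toString();
}

function tempoTraceLink(cfg: UiConfig, traceId: string): string {
  // Hypothetical path layout under the configured Tempo base URL.
  return joinUrl(cfg.tempoBaseUrl, "/trace/" + encodeURIComponent(traceId)).toString();
}

function deepLinksFor(
  role: Role,
  cfg: UiConfig | null,
  runId: string,
  stages: StageTrace[],
): DeepLinks | null {
  // Non-OP roles (BU, AU) get no links at all, so nothing reaches the DOM.
  if (role !== "OP" || cfg === null) return null;
  const traces: Record<string, string> = {};
  for (const s of stages) {
    // Only stages that report a trace_id get a View trace link.
    if (s.trace_id !== null) traces[s.stage] = tempoTraceLink(cfg, s.trace_id);
  }
  return { grafana: grafanaRunLink(cfg, runId), traces };
}
```

Returning null for BU and AU lets the caller skip rendering entirely, which satisfies the "absent from the DOM" requirement rather than hiding the links with CSS. When rendered, the anchors can use target="_blank" with rel="noopener noreferrer" so they open in a new tab without exposing window.opener to the destination page.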