Epic 4 — Live Run Timeline
Covers real-time stage-status updates via Server-Sent Events (SSE), the ReconnectController circuit-breaker (3 disconnects / 60 s → Manual Reconnect Only), reconciliation snapshots to prevent ghost stages, TemporalLifecycleDetector for distinguishing network idle from Activity Heartbeat Timeout, stalled workflow detection, run cancellation, and stage-level error states (MalformedLlmOutput, SchemaValidationError).
Personas: BU, AP, OP
Shared modules:
- ReconnectController
- ReconciliationFetcher
- TemporalLifecycleDetector
- CorrelationChip
- LastSyncedBadge
Story 4.1 — Real-Time Stage Updates via SSE
- As a BU
- I want to see each pipeline stage's status update in real time without refreshing
- So that I can monitor run progress and react quickly to failures or approval gates
Scenario: Stage status card updates live
Given the user is viewing /runs/{run_id} and SSE /runs/{run_id}/events is open
When the SSE stream emits a stage_update event for a stage
Then the stage card updates its status badge, attempt counter, and elapsed timer without a full page reload; the aria-live region announces the change
Scenario: Run transitions to terminal state
When the SSE stream emits a run_completed or run_failed event
Then the run status banner updates to the terminal state; the timeline is frozen; the SSE connection is closed by the client; the LastSyncedBadge switches from Live to Synced at: <time>
| Endpoint / DB | Purpose |
|---|---|
| SSE /runs/{run_id}/events | Stage and run status events stream |
| GET /runs/{run_id} | Initial page load snapshot |
| DB stages.status, stages.attempt | Stage state source of truth |
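
The two scenarios above can be read as a small client-side state reducer. The TypeScript sketch below is one possible way to model that, not the actual implementation. Only the event names stage_update, run_completed, and run_failed come from the story; the payload fields (`stage_id`, `status`, `attempt`), the `TimelineState` shape, and the `applyEvent` name are assumptions made for illustration.

```typescript
// Hypothetical shapes: field names are assumptions, not the real API contract.
type StageView = { status: string; attempt: number };

type TimelineEvent =
  | { type: "stage_update"; stage_id: string; status: string; attempt: number }
  | { type: "run_completed" }
  | { type: "run_failed" };

interface TimelineState {
  runStatus: string;                 // assumed values, e.g. "running", "completed", "failed"
  stages: Record<string, StageView>; // keyed by stage id
  frozen: boolean;                   // true once a terminal event has arrived
  streamOpen: boolean;               // the client closes the SSE stream when this is false
  badge: string;                     // LastSyncedBadge text
}

function applyEvent(state: TimelineState, ev: TimelineEvent, now: Date): TimelineState {
  // A frozen timeline ignores any late events.
  if (state.frozen) return state;

  if (ev.type === "stage_update") {
    // Update only the affected stage card; no full reload of the state.
    return {
      ...state,
      stages: { ...state.stages, [ev.stage_id]: { status: ev.status, attempt: ev.attempt } },
    };
  }

  // run_completed or run_failed: freeze, close the stream, switch the badge.
  return {
    ...state,
    runStatus: ev.type === "run_completed" ? "completed" : "failed",
    frozen: true,
    streamOpen: false,
    badge: `Synced at: ${now.toISOString()}`,
  };
}
```

In a browser, a reducer like this would typically be fed from `EventSource` listeners, assuming the server sends named events, with `close()` called on the source once `streamOpen` turns false.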
Story 4.2 — Circuit-Breaker & Manual Reconnect
- As a BU
- I want the timeline to notify me and stop auto-reconnecting after repeated SSE failures
- So that I am not trapped in a silent reconnect loop consuming resources on an unstable connection
Scenario: Circuit-breaker trips after 3 disconnects in 60 s
Given the SSE connection has dropped and reconnected 3 times within 60 s
When the fourth disconnect occurs
Then ReconnectController enters Manual Reconnect Only state; auto-reconnect is suppressed; a banner, shown with the CorrelationChip, reads: Live connection lost — click to reconnect manually
And the LastSyncedBadge shows Synced at: <time> · Snapshot
And ReconciliationFetcher is suppressed in Manual Reconnect Only state
Scenario: Manual reconnect restores live stream
When the user clicks Reconnect
Then the circuit-breaker counter resets; ReconnectController re-opens the SSE connection; the banner is dismissed; ReconciliationFetcher fires once to re-sync state
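
The circuit-breaker behaviour in these scenarios could be sketched as a sliding-window counter. The TypeScript below is a minimal illustration under stated assumptions, not the ReconnectController itself: the class name `ReconnectBreaker`, its method names, and the use of a sliding window are hypothetical. It follows the scenario's reading that three disconnects within 60 s are tolerated and the fourth trips the breaker.

```typescript
// Sketch only: class shape and method names are assumptions.
type BreakerState = "auto" | "manual_only";

class ReconnectBreaker {
  state: BreakerState = "auto";
  private disconnects: number[] = []; // timestamps (ms) of recent disconnects
  private readonly maxDisconnects: number;
  private readonly windowMs: number;

  constructor(maxDisconnects = 3, windowMs = 60_000) {
    this.maxDisconnects = maxDisconnects; // disconnects tolerated per window
    this.windowMs = windowMs;             // 60 s sliding window
  }

  /** Record a disconnect; returns true if auto-reconnect is still allowed. */
  onDisconnect(nowMs: number): boolean {
    if (this.state === "manual_only") return false;
    // Keep only disconnects inside the window, then add this one.
    this.disconnects = this.disconnects.filter((t) => nowMs - t < this.windowMs);
    this.disconnects.push(nowMs);
    // Trip on the disconnect that exceeds the tolerated count (the fourth by default).
    if (this.disconnects.length > this.maxDisconnects) {
      this.state = "manual_only";
      return false;
    }
    return true;
  }

  /** User clicked Reconnect: reset the counter and allow the stream to reopen. */
  manualReconnect(): void {
    this.disconnects = [];
    this.state = "auto";
  }

  /** ReconciliationFetcher is suppressed while in Manual Reconnect Only. */
  get reconciliationAllowed(): boolean {
    return this.state === "auto";
  }
}
```

Passing timestamps in, rather than reading the clock inside the class, keeps the window logic deterministic and easy to test.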
Story 4.3 — Reconciliation Snapshot on Reconnect
- As a BU
- I want the timeline to resync with the server state on every reconnect
- So that stages completed during the disconnection gap are not ghosted as permanently running
Scenario: Reconciliation fetch removes ghost stages
Given the SSE connection dropped while stage S was running; during the gap stage S transitioned to passed
When the connection is restored while the circuit-breaker is not in Manual Reconnect Only state, and ReconciliationFetcher calls GET /runs/{run_id}
Then stage S's card shows passed; no ghost running spinner remains; the LastSyncedBadge updates to Live
Scenario: Reconciliation suppressed in Manual Reconnect Only
Given the circuit-breaker is in Manual Reconnect Only state
Then ReconciliationFetcher does not issue GET /runs/{run_id}; the timeline is frozen at last-known state with the snapshot badge
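
To illustrate why a snapshot removes ghost stages, the sketch below treats the GET /runs/{run_id} response as authoritative and replaces local stage state wholesale. It is a hedged TypeScript illustration, not the ReconciliationFetcher: the `StageMap` shape, the mode names, and both function names are assumptions, and the real response format is not specified in this epic.

```typescript
// Hypothetical shapes; the GET /runs/{run_id} response format is assumed.
type StageMap = Record<string, { status: string; attempt: number }>;
type ConnectionMode = "auto" | "manual_reconnect_only";

/** Whether ReconciliationFetcher may call GET /runs/{run_id} in this mode. */
function shouldFetchSnapshot(mode: ConnectionMode): boolean {
  return mode !== "manual_reconnect_only";
}

/**
 * Replace local stage state with the server snapshot. Because the snapshot
 * wins, a stage that finished during the gap cannot stay "running" locally.
 * `corrected` lists the stage ids whose status differed from the local view.
 */
function applySnapshot(
  local: StageMap,
  snapshot: StageMap,
): { stages: StageMap; corrected: string[] } {
  const corrected = Object.keys(snapshot).filter(
    (id) => local[id]?.status !== snapshot[id]?.status,
  );
  return { stages: { ...snapshot }, corrected };
}
```

For the scenario above, a local `S: running` and a snapshot `S: passed` yield a state where S is passed and `corrected` contains only `"S"`.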
Story 4.4 — Network Idle vs Activity Heartbeat Timeout
- As a BU
- I want the timeline to distinguish a temporary network pause from a genuine Temporal Activity Heartbeat Timeout
- So that I am not shown a misleading stall warning during a brief internet outage
Scenario: Network Idle detected — no stall warning
Given the SSE stream goes silent but TemporalLifecycleDetector classifies the silence as Network Idle
Then no stall warning is displayed; the circuit-breaker increments normally; the stage cards remain in their last-known state
Scenario: Activity Heartbeat Timeout classified — stall banner shown
Given TemporalLifecycleDetector classifies the silence as Activity Heartbeat Timeout
Then the offending stage card shows a Workflow stalled amber banner with elapsed stall duration and the CorrelationChip; the OP role additionally sees the Force terminate action (cross-link to Epic 14)
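
The mapping from classification to what the stage card shows can be sketched as a pure function. This TypeScript is illustrative only: the story does not say how TemporalLifecycleDetector reaches its classification, so the sketch takes the classification as an input and covers only the resulting view. The type names, `stallViewFor`, and the whole-second duration are assumptions.

```typescript
// Sketch: names and shapes are hypothetical; classification is taken as given.
type SilenceClass = "network_idle" | "activity_heartbeat_timeout";
type Role = "BU" | "AP" | "OP";

interface StallView {
  banner: string | null;       // amber banner text, or null for no warning
  stallSeconds: number | null; // elapsed stall duration shown in the banner
  showCorrelationChip: boolean;
  showForceTerminate: boolean; // OP only; the action itself belongs to Epic 14
}

function stallViewFor(cls: SilenceClass, role: Role, stallMs: number): StallView {
  if (cls === "network_idle") {
    // No stall warning; stage cards stay in their last-known state.
    return { banner: null, stallSeconds: null, showCorrelationChip: false, showForceTerminate: false };
  }
  return {
    banner: "Workflow stalled",
    stallSeconds: Math.floor(stallMs / 1000),
    showCorrelationChip: true,
    showForceTerminate: role === "OP",
  };
}
```

Keeping the role check in one place makes it harder for the Force terminate action to leak to BU or AP users.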
Story 4.5 — Activity Heartbeat Timeout Stage Card
- As an OP
- I want to see a dedicated Activity Heartbeat Timeout stage state with actionable controls
- So that I can decide whether to wait for Temporal's retry or force-terminate the stalled workflow
Scenario: Stage card shows timeout state
Given a stage has exceeded its heartbeat_timeout and Temporal is attempting a retry
When the SSE stream emits an activity_heartbeat_timeout event for the stage
Then the stage card badge changes to Heartbeat Timeout; the attempt counter increments; for the OP role the Force terminate stalled workflow button appears (cross-link to Epic 14 Story 14.3)
Story 4.6 — Run Cancellation
- As a BU
- I want to cancel an in-progress run
- So that I can stop a run that was triggered with incorrect parameters without waiting for it to fail
Scenario: Run cancelled successfully
Given the run's status is running or pending
When the user clicks Cancel run and confirms the modal
Then POST /runs/{run_id}/cancel is issued; the run status updates to cancelled; the SSE stream is closed; the LastSyncedBadge switches to Synced at: <time>
Scenario: Cancel sign-out race — SSE torn down before cancel request
Given the user initiates sign-out while a cancellation modal is open
Then the SSE connection is torn down first (per Story 1.5); the cancel request is not issued after sign-out begins (cross-link to Epic 1)
| Endpoint / DB | Purpose |
|---|---|
| POST /runs/{run_id}/cancel | Request workflow cancellation |
| DB runs.status | Guard: must be running or pending |
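
The status guard and the sign-out race can be combined into one decision made just before the request is sent. The TypeScript below is a sketch under assumptions, not the actual client code: `decideCancel`, `CancelContext`, and the `completed` and `failed` status values are hypothetical, while `running`, `pending`, `cancelled`, and the POST path come from the story.

```typescript
// Sketch: "completed" and "failed" are assumed status values.
type RunStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

interface CancelContext {
  runStatus: RunStatus;
  modalConfirmed: boolean; // user confirmed the Cancel run modal
  signOutStarted: boolean; // sign-out has begun (see Epic 1)
}

type CancelDecision =
  | { send: true; method: "POST"; path: string }
  | { send: false; reason: string };

function decideCancel(runId: string, ctx: CancelContext): CancelDecision {
  // Sign-out wins the race: never issue a cancel once it has begun.
  if (ctx.signOutStarted) return { send: false, reason: "sign-out in progress" };
  if (!ctx.modalConfirmed) return { send: false, reason: "not confirmed" };
  // Mirror the runs.status guard on the client.
  if (ctx.runStatus !== "running" && ctx.runStatus !== "pending") {
    return { send: false, reason: `run is ${ctx.runStatus}` };
  }
  return { send: true, method: "POST", path: `/runs/${encodeURIComponent(runId)}/cancel` };
}
```

The client-side check only avoids pointless requests. The table above still places the authoritative guard on runs.status.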
Story 4.7 — Stage Error States
- As a BU
- I want to see a specific error card when a stage fails with a known error class
- So that I understand the root cause without reading raw logs
Scenario: MalformedLlmOutput error card
Given a stage fails with error_class: "MalformedLlmOutput"
Then the stage card shows a MalformedLlmOutput badge, the attempt counter, and the CorrelationChip, with copy: The LLM response could not be parsed as valid structured output. The stage will be retried.
Scenario: SchemaValidationError error card
Given a stage fails with error_class: "SchemaValidationError"
Then the stage card shows a SchemaValidationError badge; the artifact link for this stage is disabled with tooltip Artifact failed schema validation — download only
| Endpoint / DB | Purpose |
|---|---|
| SSE /runs/{run_id}/events | Emits stage_error events with error_class |
| DB stages.error_class | Stored on stage failure |
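
One way to keep the error cards consistent is a single lookup from error_class to card content. The TypeScript below is a sketch, not the actual component: the `ErrorCard` shape and `errorCardFor` are hypothetical, optional fields are populated only where the scenarios specify them, and the fallback for unknown classes is an assumption, since the story covers only known classes.

```typescript
// Sketch: the ErrorCard shape is an assumption; strings come from the scenarios.
interface ErrorCard {
  badge: string;
  copy?: string;
  showAttemptCounter?: boolean;
  showCorrelationChip?: boolean;
  artifactLink?: { enabled: boolean; tooltip: string };
}

function errorCardFor(errorClass: string): ErrorCard {
  switch (errorClass) {
    case "MalformedLlmOutput":
      return {
        badge: "MalformedLlmOutput",
        copy:
          "The LLM response could not be parsed as valid structured output. " +
          "The stage will be retried.",
        showAttemptCounter: true,
        showCorrelationChip: true,
      };
    case "SchemaValidationError":
      return {
        badge: "SchemaValidationError",
        // \u2014 is the dash character used in the specified tooltip copy.
        artifactLink: {
          enabled: false,
          tooltip: "Artifact failed schema validation \u2014 download only",
        },
      };
    default:
      // Assumption: unknown classes fall back to a badge with the raw class name.
      return { badge: errorClass };
  }
}
```

A `switch` is used instead of an object lookup so that strings such as `"toString"` cannot accidentally match inherited object properties.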