Epic 4 — Live Run Timeline

Covers real-time stage-status updates via Server-Sent Events (SSE), the ReconnectController circuit-breaker (3 disconnects / 60 s → Manual Reconnect Only), reconciliation snapshots to prevent ghost stages, TemporalLifecycleDetector for distinguishing network idle from Activity Heartbeat Timeout, stalled workflow detection, run cancellation, and stage-level error states (MalformedLlmOutput, SchemaValidationError).

Personas: BU, AP, OP

Shared modules: ReconnectController, ReconciliationFetcher, TemporalLifecycleDetector, CorrelationChip, LastSyncedBadge

Story 4.1 — Real-Time Stage Updates via SSE

As a BU
I want to see each pipeline stage's status update in real time without refreshing
So that I can monitor run progress and react quickly to failures or approval gates
Scenario: Stage status card updates live
Given the user is viewing /runs/{run_id} and SSE /runs/{run_id}/events is open
When the SSE stream emits a stage_update event for a stage
Then the stage card updates its status badge, attempt counter, and elapsed timer without a full page reload; the aria-live region announces the change
Scenario: Run transitions to terminal state
When the SSE stream emits a run_completed or run_failed event
Then the run status banner updates to the terminal state; the timeline is frozen; the SSE connection is closed by the client; the LastSyncedBadge switches from Live to Synced at: <time>
Endpoint / DB                       Purpose
SSE /runs/{run_id}/events           Stage and run status events stream
GET /runs/{run_id}                  Initial page load snapshot
DB stages.status, stages.attempt    Stage state source of truth
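
The two scenarios above can be sketched as a pure reducer over incoming SSE events. This is a minimal illustration only: the event payload fields (stageId, status, attempt), the TimelineState shape, and the function name applyRunEvent are assumptions, since the spec names the event types but not their payloads.

```typescript
// Hypothetical shapes; only the event names and the status/attempt
// fields come from the spec, everything else is assumed.
interface StageView {
  stageId: string;
  status: string;
  attempt: number;
}

interface StageUpdateEvent {
  type: "stage_update";
  stageId: string;
  status: string;
  attempt: number;
}

interface TerminalEvent {
  type: "run_completed" | "run_failed";
}

type RunEvent = StageUpdateEvent | TerminalEvent;

interface TimelineState {
  stages: StageView[];
  runStatus: "running" | "completed" | "failed";
  frozen: boolean; // true once a terminal event has been applied
}

// Returns a new state for each event; never mutates the input.
function applyRunEvent(state: TimelineState, event: RunEvent): TimelineState {
  if (state.frozen) return state; // the timeline is frozen after a terminal event
  switch (event.type) {
    case "stage_update": {
      const { stageId, status, attempt } = event;
      return {
        ...state,
        stages: state.stages.map((s) =>
          s.stageId === stageId ? { ...s, status, attempt } : s
        ),
      };
    }
    case "run_completed":
      return { ...state, runStatus: "completed", frozen: true };
    case "run_failed":
      return { ...state, runStatus: "failed", frozen: true };
  }
}
```

Keeping the update logic pure would let the SSE listener stay thin: it parses each message, calls the reducer, and closes the connection when the returned state is frozen.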

Story 4.2 — Circuit-Breaker & Manual Reconnect

As a BU
I want the timeline to notify me and stop auto-reconnecting after repeated SSE failures
So that I am not trapped in a silent reconnect loop consuming resources on an unstable connection
Scenario: Circuit-breaker trips after 3 disconnects in 60 s
Given the SSE connection has dropped and reconnected 3 times within 60 s
When the fourth disconnect occurs
Then ReconnectController enters Manual Reconnect Only state; auto-reconnect is suppressed; a banner reads "Live connection lost — click to reconnect manually" with the CorrelationChip
And the LastSyncedBadge shows Synced at: <time> · Snapshot
And ReconciliationFetcher is suppressed in Manual Reconnect Only state
Scenario: Manual reconnect restores live stream
When the user clicks Reconnect
Then the circuit-breaker counter resets; ReconnectController re-opens the SSE connection; the banner is dismissed; ReconciliationFetcher fires once to re-sync state
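
A sliding-window counter is one way the circuit-breaker behaviour above could be realised. The sketch below is not the actual ReconnectController: the class name ReconnectBreaker, its method names, and the state labels are hypothetical. It assumes the reading given in the first scenario, where three disconnects inside the 60 s window are tolerated and the next one trips the breaker.

```typescript
type BreakerState = "auto" | "manual_only";

// Hypothetical sketch of the circuit-breaker; names are illustrative.
class ReconnectBreaker {
  private disconnects: number[] = []; // timestamps (ms) still inside the window
  private state: BreakerState = "auto";
  private readonly maxDisconnects: number;
  private readonly windowMs: number;

  constructor(maxDisconnects: number = 3, windowMs: number = 60_000) {
    this.maxDisconnects = maxDisconnects;
    this.windowMs = windowMs;
  }

  // Record a disconnect observed at nowMs and return the resulting state.
  onDisconnect(nowMs: number): BreakerState {
    if (this.state === "manual_only") return this.state;
    // Drop disconnects that have aged out of the window.
    this.disconnects = this.disconnects.filter((t) => nowMs - t < this.windowMs);
    if (this.disconnects.length >= this.maxDisconnects) {
      // Assumption: the disconnect following maxDisconnects recent ones trips the breaker.
      this.state = "manual_only";
    } else {
      this.disconnects.push(nowMs);
    }
    return this.state;
  }

  // Called when the user clicks Reconnect: resets the counter and resumes auto mode.
  manualReconnect(): BreakerState {
    this.disconnects = [];
    this.state = "auto";
    return this.state;
  }

  shouldAutoReconnect(): boolean {
    return this.state === "auto";
  }
}
```

Passing the timestamp in, rather than reading the clock inside the class, keeps the window logic deterministic and easy to unit test.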

Story 4.3 — Reconciliation Snapshot on Reconnect

As a BU
I want the timeline to resync with the server state on every reconnect
So that stages completed during the disconnection gap are not ghosted as permanently running
Scenario: Reconciliation fetch removes ghost stages
Given the SSE connection dropped while stage S was running; during the gap stage S transitioned to passed
When the connection is restored (not in Manual Reconnect Only state) and ReconciliationFetcher calls GET /runs/{run_id}
Then stage S's card shows passed; no ghost running spinner remains; the LastSyncedBadge updates to Live
Scenario: Reconciliation suppressed in Manual Reconnect Only
Given the circuit-breaker is in Manual Reconnect Only state
Then ReconciliationFetcher does not issue GET /runs/{run_id}; the timeline is frozen at last-known state with the snapshot badge
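
The reconciliation rule above can be illustrated with two small pure functions: a gate that suppresses the fetch in Manual Reconnect Only state, and a merge that treats the GET /runs/{run_id} snapshot as authoritative. The Stage shape and the names shouldReconcile and reconcileStages are assumptions made for this sketch, as is the idea of reporting which ghost stages were corrected.

```typescript
// Hypothetical stage shape; the spec only names status and attempt.
interface Stage {
  stageId: string;
  status: string;
  attempt: number;
}

type BreakerState = "auto" | "manual_only";

// The fetch is only allowed outside Manual Reconnect Only state.
function shouldReconcile(state: BreakerState): boolean {
  return state === "auto";
}

// The server snapshot wins. Also reports stages that were shown as
// running locally but are no longer running on the server (ghosts).
function reconcileStages(
  local: Stage[],
  snapshot: Stage[]
): { stages: Stage[]; corrected: string[] } {
  const localById = new Map<string, Stage>();
  for (const s of local) localById.set(s.stageId, s);

  const corrected: string[] = [];
  for (const s of snapshot) {
    const prev = localById.get(s.stageId);
    if (prev !== undefined && prev.status === "running" && s.status !== "running") {
      corrected.push(s.stageId);
    }
  }
  return { stages: snapshot.map((s) => ({ ...s })), corrected };
}
```

A caller would check shouldReconcile first, fetch the snapshot only when it returns true, and then replace the rendered stages with the result of reconcileStages.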

Story 4.4 — Network Idle vs Activity Heartbeat Timeout

As a BU
I want the timeline to distinguish a temporary network pause from a genuine Temporal Activity Heartbeat Timeout
So that I am not shown a misleading stall warning during a brief internet outage
Scenario: Network Idle detected — no stall warning
Given the SSE stream goes silent but TemporalLifecycleDetector classifies the silence as Network Idle
Then no stall warning is displayed; the circuit-breaker increments normally; the stage cards remain in their last-known state
Scenario: Activity Heartbeat Timeout classified — stall banner shown
Given TemporalLifecycleDetector classifies the silence as Activity Heartbeat Timeout
Then the offending stage card shows a Workflow stalled amber banner with elapsed stall duration and the CorrelationChip; OP role additionally sees the Force terminate action (cross-link to Epic 14)

Story 4.5 — Activity Heartbeat Timeout Stage Card

As a OP
I want to see a dedicated Activity Heartbeat Timeout stage state with actionable controls
So that I can decide whether to wait for Temporal's retry or force-terminate the stalled workflow
Scenario: Stage card shows timeout state
Given a stage has exceeded its heartbeat_timeout and Temporal is attempting a retry
When the SSE stream emits an activity_heartbeat_timeout event for the stage
Then the stage card badge changes to Heartbeat Timeout; the attempt counter increments; for OP role the Force terminate stalled workflow button appears (cross-link to Epic 14 Story 14.3)
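
The card transition in this scenario could look roughly like the function below. It is a sketch under stated assumptions: the StageCard shape, the stageId payload field, and the function name are hypothetical, and the attempt counter is bumped on the client here even though the real event may carry the new attempt number itself.

```typescript
type Role = "BU" | "AP" | "OP";

// Hypothetical view model for a stage card.
interface StageCard {
  stageId: string;
  badge: string;
  attempt: number;
  showForceTerminate: boolean;
}

// Assumed payload; the spec only names the event type.
interface HeartbeatTimeoutEvent {
  type: "activity_heartbeat_timeout";
  stageId: string;
}

function applyHeartbeatTimeout(
  card: StageCard,
  event: HeartbeatTimeoutEvent,
  role: Role
): StageCard {
  if (card.stageId !== event.stageId) return card; // event targets another stage
  return {
    ...card,
    badge: "Heartbeat Timeout",
    attempt: card.attempt + 1, // assumption: incremented client-side
    showForceTerminate: role === "OP", // only OP sees the force-terminate control
  };
}
```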

Story 4.6 — Run Cancellation

As a BU
I want to cancel an in-progress run
So that I can stop a run that was triggered with incorrect parameters without waiting for it to fail
Scenario: Run cancelled successfully
Given the run's status is running or pending
When the user clicks Cancel run and confirms the modal
Then POST /runs/{run_id}/cancel is issued; the run status updates to cancelled; the SSE stream is closed; the LastSyncedBadge switches to Synced at: <time>
Scenario: Cancel sign-out race — SSE torn down before cancel request
Given the user initiates sign-out while a cancellation modal is open
Then the SSE connection is torn down first (per Story 1.5); the cancel request is not issued after sign-out begins (cross-link to Epic 1)
Endpoint / DB                 Purpose
POST /runs/{run_id}/cancel    Request workflow cancellation
DB runs.status                Guard: must be running or pending
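
Both cancellation rules above, the status guard and the sign-out race, reduce to a single client-side check before POST /runs/{run_id}/cancel is sent. The sketch below assumes a CancelContext object and a signOutStarted flag, neither of which is named in the spec; the completed and failed statuses are inferred from the run_completed and run_failed events.

```typescript
// running, pending and cancelled come from the spec; completed and
// failed are inferred from the terminal SSE events.
type RunStatus = "pending" | "running" | "cancelled" | "completed" | "failed";

// Hypothetical context gathered at the moment the modal is confirmed.
interface CancelContext {
  runStatus: RunStatus;
  signOutStarted: boolean;
}

// True only when the cancel request may be issued.
function canIssueCancel(ctx: CancelContext): boolean {
  if (ctx.signOutStarted) return false; // never cancel once sign-out has begun
  return ctx.runStatus === "running" || ctx.runStatus === "pending";
}
```

The server would still enforce the runs.status guard on its side; this check only avoids sending requests that are known to be invalid.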

Story 4.7 — Stage Error States

As a BU
I want to see a specific error card when a stage fails with a known error class
So that I understand the root cause without reading raw logs
Scenario: MalformedLlmOutput error card
Given a stage fails with error_class: "MalformedLlmOutput"
Then the stage card shows a MalformedLlmOutput badge with copy: "The LLM response could not be parsed as valid structured output. The stage will be retried."; attempt counter is visible; CorrelationChip is shown
Scenario: SchemaValidationError error card
Given a stage fails with error_class: "SchemaValidationError"
Then the stage card shows a SchemaValidationError badge; the artifact link for this stage is disabled with tooltip "Artifact failed schema validation — download only"
Endpoint / DB                Purpose
SSE /runs/{run_id}/events    Emits stage_error events with error_class
DB stages.error_class        Stored on stage failure
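
The mapping from error_class to card presentation described in the two scenarios could be kept in one lookup function, sketched here. The ErrorCard shape and the function name errorCardFor are hypothetical, the tooltip text is left to the scenario above rather than repeated, and returning null for unrecognised classes is an assumption because the spec does not describe a fallback.

```typescript
// Hypothetical view model for a stage error card.
interface ErrorCard {
  badge: string;
  copy: string | null;
  showCorrelationChip: boolean;
  artifactLinkDisabled: boolean;
}

// Maps a stage_error event's error_class to its card, or null when
// the class is not one of the two known classes.
function errorCardFor(errorClass: string): ErrorCard | null {
  switch (errorClass) {
    case "MalformedLlmOutput":
      return {
        badge: "MalformedLlmOutput",
        copy:
          "The LLM response could not be parsed as valid structured output. The stage will be retried.",
        showCorrelationChip: true,
        artifactLinkDisabled: false,
      };
    case "SchemaValidationError":
      return {
        badge: "SchemaValidationError",
        copy: null, // no body copy is specified for this class
        showCorrelationChip: false, // not mentioned in the scenario; assumed off
        artifactLinkDisabled: true,
      };
    default:
      return null;
  }
}
```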