Temporal Orchestration
AMTP uses Temporal as its workflow orchestration engine. Temporal provides a deterministic
event-history model, durable execution across worker restarts, and a
structured activity retry framework. The AMTP workflow
(TestGenerationWorkflow) is responsible for sequencing the
three LLM agent activities, enforcing context resets between them, and
managing terminal failure states.
Workflow Determinism Requirements
Temporal replays workflow history to rebuild state after a worker restart. Any non-determinism in the workflow code breaks replay. All non-deterministic operations are therefore delegated to activities.
- No Date.now() or new Date() in workflow code; use workflow.currentTime().
- No Math.random() in workflow code; use workflow.random().
- No direct I/O (network, filesystem, Postgres, Valkey) in workflow code; every I/O call is wrapped in an activity.
- All LLM invocations, schema validation, JSON parsing, and GitHub API calls are activities.
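Why these calls are banned can be seen in a small model of replay. The TypeScript sketch below does not use the Temporal SDK: WorkflowContext, recordingContext, replayContext, and planDeadline are hypothetical names, and storing raw values in an array is a deliberate simplification of Temporal's event history. It only illustrates that logic which takes time and randomness from a recorded source reaches the same decisions when replayed.

```typescript
// Hypothetical stand-in for the deterministic helpers a workflow receives.
interface WorkflowContext {
  currentTime(): number; // milliseconds since epoch
  random(): number; // value in [0, 1)
}

// First execution: produce real values and record each one.
function recordingContext(recorded: number[]): WorkflowContext {
  const record = (value: number): number => {
    recorded.push(value);
    return value;
  };
  return {
    currentTime: () => record(Date.now()),
    random: () => record(Math.random()),
  };
}

// Replay: hand back the recorded values in order; never produce new ones.
function replayContext(recorded: readonly number[]): WorkflowContext {
  let cursor = 0;
  const next = (): number => {
    if (cursor >= recorded.length) {
      throw new Error("recorded history exhausted");
    }
    const value = recorded[cursor] as number;
    cursor += 1;
    return value;
  };
  return { currentTime: next, random: next };
}

// Workflow-style logic: every non-deterministic value comes from ctx.
function planDeadline(ctx: WorkflowContext): { deadline: number; shard: number } {
  const deadline = ctx.currentTime() + 60_000;
  const shard = Math.floor(ctx.random() * 4);
  return { deadline, shard };
}

const recordedValues: number[] = [];
const firstRun = planDeadline(recordingContext(recordedValues));
const replayedRun = planDeadline(replayContext(recordedValues));
```

Had planDeadline called Date.now() or Math.random() directly, replayedRun would generally differ from firstRun, which is the kind of divergence that breaks replay.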
Run & Stage Lifecycle
State transitions are persisted to Postgres. The allowed values are
enforced by CHECK constraints in the schema. See
migrations/sql/V2__runs.sql and
migrations/sql/V3__stages.sql.
Run States (runs.status)
| State | Set by | Description |
|---|---|---|
| pending | Webhook receiver | Row created; workflow not yet dispatched to Temporal. |
| running | Workflow start signal | Temporal workflow has started; at least one stage is active. |
| passed | Workflow completion | PR created successfully; all stages completed. |
| failed | Workflow terminal error | Non-retryable failure (e.g. StaleBaseBranch, SchemaValidationError). |
| cancelled | External signal / user action | Workflow cancelled before completion. |
Stage States (stages.status)
[Diagram: stage state transitions for the standard path and the branch protection path]
| State | Description |
|---|---|
| pending | Stage queued; activity not yet scheduled. |
| running | Temporal activity executing. |
| awaiting_approval | Activity blocked on an approvals row (e.g. BranchProtectionViolation). |
| passed | Activity completed successfully. |
| failed | Activity failed terminally. |
| skipped | Stage bypassed by workflow logic (future feature). |
| cancelled | Cancelled before execution. |
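The status vocabularies listed in this section can be mirrored in application code so that an invalid value is rejected before it reaches the CHECK constraints. The sketch below is illustrative only: the constant and function names are hypothetical, and the migration files remain the source of truth for the allowed values.

```typescript
// Values copied from the run and stage state tables in this section.
const RUN_STATUSES = ["pending", "running", "passed", "failed", "cancelled"] as const;
const STAGE_STATUSES = [
  "pending",
  "running",
  "awaiting_approval",
  "passed",
  "failed",
  "skipped",
  "cancelled",
] as const;

type RunStatus = (typeof RUN_STATUSES)[number];
type StageStatus = (typeof STAGE_STATUSES)[number];

// Type guards: narrow an arbitrary string to one of the allowed values.
function isRunStatus(value: string): value is RunStatus {
  return (RUN_STATUSES as readonly string[]).indexOf(value) !== -1;
}

function isStageStatus(value: string): value is StageStatus {
  return (STAGE_STATUSES as readonly string[]).indexOf(value) !== -1;
}
```

Note that awaiting_approval and skipped are stage-only values, so isRunStatus rejects them.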
LLM Output Sanitization
LLM providers frequently wrap JSON responses in Markdown code fences.
The deterministic
SanitizeLlmOutput step runs between the raw LLM response
and JSON parsing.
Order of operations per agent activity:
- LLMCall → raw string
- SanitizeLlmOutput → stripped string (error branch: MalformedLlmOutput)
- JSON.parse → object
- JSONSchemaValidate → validated object (error branch: SchemaValidationError)
- PersistArtifact → Postgres artifacts
The sanitizer is a pure function: it strips one leading fence (```json
or ```), strips one trailing ``` fence, and trims whitespace. It performs
no content mutation. The sanitizer semver is recorded in
artifacts.content.meta.sanitizer for replay
reproducibility. For the full specification see
Agents § SanitizeLlmOutput.
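The stripping rules can be sketched in a few lines of TypeScript. This is a minimal illustration of the behavior described above, not the AMTP implementation: the function name sanitizeLlmOutput and the FENCE constant are assumptions, and the conditions under which MalformedLlmOutput is raised are left to the full specification.

```typescript
// Three backticks, built programmatically to keep this example fence-safe.
const FENCE = "`".repeat(3);

// Pure function: strips at most one leading fence (with or without a
// "json" tag), at most one trailing fence, and surrounding whitespace.
function sanitizeLlmOutput(raw: string): string {
  let text = raw.trim();
  if (text.startsWith(FENCE + "json")) {
    text = text.slice(FENCE.length + "json".length);
  } else if (text.startsWith(FENCE)) {
    text = text.slice(FENCE.length);
  }
  if (text.endsWith(FENCE)) {
    text = text.slice(0, text.length - FENCE.length);
  }
  return text.trim();
}

// A fenced reply, as an LLM provider might return it.
const fencedReply = FENCE + 'json\n{"tests": ["a", "b"]}\n' + FENCE;
const strippedReply = sanitizeLlmOutput(fencedReply);

// The next pipeline step parses the stripped string.
const parsedReply: unknown = JSON.parse(strippedReply);
```

Input that carries no fence passes through with only whitespace trimmed, so the function is safe to apply unconditionally.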
Activity Idempotency Contract
Every Temporal activity that produces a side effect is designed to be safely re-executed. The idempotency key for each activity is derived deterministically:
idem_key = sha256( run_id || ":" || activity_name || ":" || canonical_json(input) )
Here || denotes string concatenation, and canonical_json serializes the
input with keys sorted lexicographically and no extra whitespace. The key
is to be stored in a future activity_idempotency table, which is not
present in the V1–V6 migrations and is documented as required future schema.
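The key derivation can be sketched as follows. This is one possible reading of the formula, assuming Node's built-in crypto module, a hex-encoded digest, and recursive key sorting for nested objects; the function names canonicalJson and idemKey are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes with object keys sorted at every level and no extra whitespace.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  }
  const obj = value as { [key: string]: Json };
  const members = Object.keys(obj)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + canonicalJson(obj[key] as Json));
  return "{" + members.join(",") + "}";
}

// idem_key = sha256(run_id || ":" || activity_name || ":" || canonical_json(input))
function idemKey(runId: string, activityName: string, input: Json): string {
  const material = runId + ":" + activityName + ":" + canonicalJson(input);
  return createHash("sha256").update(material).digest("hex");
}

// Key order in the input does not affect the resulting key.
const keyA = idemKey("run-1", "CrawlRepo", { ref: "main", depth_level: 2 });
const keyB = idemKey("run-1", "CrawlRepo", { depth_level: 2, ref: "main" });
```

Because the serialization is canonical, keyA and keyB are identical, which is what lets a retried activity find the result of its earlier attempt.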
Per-Activity Rules
| Activity | Idempotency mechanism | Replay behavior |
|---|---|---|
| CrawlRepo | (run_id, ref, depth_level) → check for existing artifacts.kind='repo_crawler_output' before LLM call. | If artifact exists, return its artifact_id; skip LLM call entirely. |
| GenerateTestCases | Upstream artifact_id + depth_level form the idem key. | Same upstream SHA → reuse existing test_case_generator_output artifact. |
| GenerateTestCode | Upstream artifact_id + target_framework. | Same inputs → reuse existing test_engineer_output artifact. |
| CreatePullRequest | Git Trees API: tree SHA is deterministic from file contents. POST /git/refs with idempotency key header. Valkey pr_lock for serialization (see below). | 409 (ref exists) → return existing PR URL; non-retryable short-circuit. |
| IncrRepoConcurrency | Paired with DecrRepoConcurrency via run_id; Valkey safety EXPIRE 3600. | INCR is safe on retry; double-increment prevented by run_state gate in the app layer. |
| RateLimitCheck | ZADD member = run_id; duplicate member with same score is idempotent in sorted sets. | Replay does not inflate the sliding-window count. |
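The CrawlRepo rule (look for an existing artifact first, call the LLM only if none is found) can be sketched with an in-memory stand-in for the artifacts table. Everything here is hypothetical scaffolding: the ArtifactRow fields, the InMemoryArtifactStore class, and the counter that stands in for the LLM call are assumptions made for illustration, not the AMTP schema.

```typescript
// Hypothetical shape of the lookup columns; the real schema may differ.
interface ArtifactRow {
  artifactId: string;
  kind: string;
  runId: string;
  ref: string;
  depthLevel: number;
}

class InMemoryArtifactStore {
  private readonly rows: ArtifactRow[] = [];

  find(kind: string, runId: string, ref: string, depthLevel: number): ArtifactRow | undefined {
    return this.rows.filter(
      (row) =>
        row.kind === kind && row.runId === runId && row.ref === ref && row.depthLevel === depthLevel,
    )[0];
  }

  insert(row: ArtifactRow): void {
    this.rows.push(row);
  }
}

// Counts how often the (simulated) LLM would have been invoked.
let llmCallCount = 0;

function crawlRepo(store: InMemoryArtifactStore, runId: string, ref: string, depthLevel: number): string {
  const existing = store.find("repo_crawler_output", runId, ref, depthLevel);
  if (existing !== undefined) {
    return existing.artifactId; // replay: reuse the artifact, skip the LLM call
  }
  llmCallCount += 1; // stands in for the real LLM invocation
  const artifactId = "artifact-" + llmCallCount;
  store.insert({ artifactId, kind: "repo_crawler_output", runId, ref, depthLevel });
  return artifactId;
}

const artifactStore = new InMemoryArtifactStore();
const firstArtifactId = crawlRepo(artifactStore, "run-1", "abc123", 2);
const replayArtifactId = crawlRepo(artifactStore, "run-1", "abc123", 2);
```

The second call returns the same artifact_id without incrementing llmCallCount, which is the replay behavior the table describes.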
Side-Effect Isolation Rule
All activities that touch external systems (GitHub API, LLM provider, Valkey, Postgres) must be at-most-once observable. Implementation requirements:
- All Postgres writes use INSERT ... ON CONFLICT DO NOTHING or compare-and-swap via unique constraints.
- All Valkey writes for rate-limiting use the semantics documented in infra/valkey/NAMESPACES.md.
- GitHub writes use the Git Trees API (append-only; existing tree SHAs are reused).
- Pure reads are unrestricted and require no idempotency handling.
Activity Retry Policy
| Parameter | Value |
|---|---|
| Initial interval | 2 seconds |
| Backoff coefficient | 2.0 |
| Maximum interval | 30 seconds |
| Maximum attempts | 20 |
| Non-retryable error types | StaleBaseBranch, BranchProtectionViolation, SchemaValidationError, GitHubForbidden, RateLimitExceeded |
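To make the policy concrete, the sketch below computes the delay before each retry under the usual exponential rule, delay = min(initial × coefficient^(retry − 1), maximum). It assumes no jitter and ignores activity execution time; the constant and function names are hypothetical.

```typescript
// Parameters taken from the retry policy table above.
const INITIAL_INTERVAL_SECONDS = 2;
const BACKOFF_COEFFICIENT = 2.0;
const MAXIMUM_INTERVAL_SECONDS = 30;
const MAXIMUM_ATTEMPTS = 20;

// Delay before the given retry (1-based), capped at the maximum interval.
function retryDelaySeconds(retry: number): number {
  const uncapped = INITIAL_INTERVAL_SECONDS * Math.pow(BACKOFF_COEFFICIENT, retry - 1);
  return Math.min(uncapped, MAXIMUM_INTERVAL_SECONDS);
}

// 20 attempts means at most 19 retries after the first attempt.
const retryDelays: number[] = [];
for (let retry = 1; retry < MAXIMUM_ATTEMPTS; retry += 1) {
  retryDelays.push(retryDelaySeconds(retry));
}

const totalBackoffSeconds = retryDelays.reduce((sum, delay) => sum + delay, 0);
```

Under these parameters the first four delays are 2, 4, 8, and 16 seconds and every later delay is capped at 30 seconds, so a retryable failure that never clears accumulates 480 seconds of backoff across the 19 possible retries.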
Per-Repository PR Serialization
Concurrent runs targeting the same repository must not race to open
duplicate pull requests. AMTP enforces serialization using a Valkey
distributed lock. All activities, including
CreatePullRequest, run on the single static task queue
amtp-activities. There are no dynamic task queues.
Valkey pr_lock Protocol
Before performing any GitHub API write,
CreatePullRequest acquires the per-repo lock:
SET amtp:rl:repo:{repo}:pr_lock {run_id} NX EX 120
- NX: only sets if the key does not exist (atomic acquisition).
- EX 120: 120-second TTL as a safety net against a crashed activity that never releases the lock.
- If SET NX returns nil (key held by another run), the activity raises RepoPrLockContended (retryable) and defers to Temporal’s standard activity backoff.
The lock is always released in a finally path using a
Lua compare-and-delete script to ensure only the holder can release
it:
-- Lua CAS release: only delete if the value matches our run_id
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("DEL", KEYS[1])
else
return 0
end
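The acquire, work, release sequence can be sketched against a minimal lock interface. The LockStore interface, the in-memory implementation (which ignores the TTL), and the withPrLock helper are all hypothetical and synchronous for brevity; a real activity would issue the SET command and the Lua script above through an asynchronous Valkey client.

```typescript
// Hypothetical minimal view of the two operations the protocol needs.
interface LockStore {
  setNxEx(key: string, value: string, ttlSeconds: number): boolean; // SET key value NX EX ttl
  compareAndDelete(key: string, expected: string): boolean; // Lua CAS release
}

// In-memory stand-in for Valkey; the TTL is accepted but not enforced here.
class InMemoryLockStore implements LockStore {
  private readonly entries = new Map<string, string>();

  setNxEx(key: string, value: string, _ttlSeconds: number): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, value);
    return true;
  }

  compareAndDelete(key: string, expected: string): boolean {
    if (this.entries.get(key) !== expected) return false;
    this.entries.delete(key);
    return true;
  }
}

class RepoPrLockContended extends Error {
  constructor(repo: string) {
    super("pr_lock is held by another run for " + repo);
    this.name = "RepoPrLockContended";
  }
}

// Acquires the per-repo lock, runs the body, and always releases in finally.
function withPrLock<T>(store: LockStore, repo: string, runId: string, body: () => T): T {
  const key = "amtp:rl:repo:" + repo + ":pr_lock";
  if (!store.setNxEx(key, runId, 120)) {
    throw new RepoPrLockContended(repo);
  }
  try {
    return body();
  } finally {
    store.compareAndDelete(key, runId); // only the holder's run_id matches
  }
}

const lockStore = new InMemoryLockStore();
let sawContention = false;
const outerResult = withPrLock(lockStore, "org/repo", "run-a", () => {
  try {
    withPrLock(lockStore, "org/repo", "run-b", () => "unreachable");
  } catch (error) {
    sawContention = (error as Error).name === "RepoPrLockContended";
  }
  return "pr-created";
});
```

While run-a holds the lock, run-b's attempt fails with RepoPrLockContended; once run-a's finally block has run, the key is free for the next acquisition.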
CreatePullRequest Failure Classes
| Error | HTTP / source | Retryable | Outcome |
|---|---|---|---|
| RepoPrLockContended | Valkey SET NX returns nil | Yes (backoff) | Temporal retries with 2s initial interval, coefficient 2, max 30s interval. |
| GitTreesApi409 | GitHub 409 (ref already exists) | No | Idempotent replay: return existing PR artifact_id. |
| StaleBaseBranch | GitHub 422 (base tree out of date) | No | Terminal workflow failure (see section below). |
| BranchProtectionViolation | GitHub 403 | No | Writes approvals row; stage → awaiting_approval. |
| GitHubForbidden | GitHub 403 (not branch protection) | No | Terminal; token scope or repository permission issue. |
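The table can be read as a small classification function. The sketch below is an assumption-laden illustration: the GitHubFailure shape and its two boolean flags are hypothetical, since this document does not specify how a 422 or 403 response body is inspected.

```typescript
type PrErrorClass =
  | "RepoPrLockContended"
  | "GitTreesApi409"
  | "StaleBaseBranch"
  | "BranchProtectionViolation"
  | "GitHubForbidden";

// Hypothetical summary of a failed GitHub response.
interface GitHubFailure {
  httpStatus: number;
  baseTreeOutOfDate: boolean; // assumed to be derived from the 422 body
  branchProtection: boolean; // assumed to be derived from the 403 body
}

// Retryability per error class, as listed in the table.
const RETRYABLE: Readonly<Record<PrErrorClass, boolean>> = {
  RepoPrLockContended: true,
  GitTreesApi409: false,
  StaleBaseBranch: false,
  BranchProtectionViolation: false,
  GitHubForbidden: false,
};

// Maps a GitHub failure onto one of the documented classes, if any applies.
function classifyGitHubFailure(failure: GitHubFailure): PrErrorClass | undefined {
  if (failure.httpStatus === 409) return "GitTreesApi409";
  if (failure.httpStatus === 422 && failure.baseTreeOutOfDate) return "StaleBaseBranch";
  if (failure.httpStatus === 403) {
    return failure.branchProtection ? "BranchProtectionViolation" : "GitHubForbidden";
  }
  return undefined; // not one of the documented classes
}
```

RepoPrLockContended never comes out of this function because it originates from Valkey rather than from a GitHub response; it appears in RETRYABLE only so that the lookup covers every row.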
Design rationale. AMTP implements no tree-rebase logic. Serialization is enforced by the Valkey lock, not by dynamic task-queue routing. Runs on different repositories proceed fully in parallel; only concurrent runs on the same repository are serialized.
StaleBaseBranch: Terminal Workflow State
A GitHub 422 response with a “base tree is out of date”
body indicates that
base_branch has advanced since the Repo Crawler captured
the commit SHA. Because AMTP does not perform tree rebasing, this is a
terminal non-retryable failure.
State Transitions on StaleBaseBranch
- Temporal activity fails with StaleBaseBranch (non-retryable).
- Parent workflow catches the failure and writes:
  - runs.status = 'failed', runs.finished_at = now()
  - stages.status = 'failed' for the active CreatePullRequest stage
  - A new artifact row: artifacts.kind = 'failure_report' with: { "error": "StaleBaseBranch", "base_branch": "<branch>", "observed_head": "<sha-at-crawl-time>", "expected_parent": "<sha-used-in-git-tree>" }
- The Temporal workflow execution terminates in a FAILED state.
Recovery Protocol
Recovery requires provisioning an entirely new pipeline execution via an external trigger:
- An external webhook fires, typically a subsequent push event on base_branch or an explicit user-initiated POST /runs API call.
- The receiver provisions a new run_id via gen_random_uuid() (per migrations/sql/V2__runs.sql).
- The Repo Crawler re-crawls the repository against the new base_branch HEAD, producing a fresh repo_crawler_output artifact tied to the new run_id.
- The full pipeline proceeds as a fully independent workflow instance.
Rationale. A re-crawl against the advanced HEAD is the only way to guarantee the Test Case Generator and Test Engineer reason about the same source-of-truth that the PR targets. Any in-place rebase would require a second LLM round-trip, invalidating the stateless-agent context-reset guarantee and the idempotency of the crawler output artifact.
The StaleBaseBranch failure is observable in Postgres
(runs.status = 'failed' with a
failure_report artifact) and should be surfaced by the
operational alerting layer.
GitHub Branch-Protection Contract
Token Scope (Documented Assumption)
The GitHub App installation token used by
CreatePullRequest is provisioned with the following
permissions:
| Permission | Scope |
|---|---|
| contents | write; required to create trees, commits, and refs via the Git Trees API. |
| pull_requests | write; required to open PRs. |
| bypass_branch_protections | Not granted. |
| administration | Not granted. |
| workflows | Not granted. |
| secrets | Not granted. |
Branch-Protection Rules AMTP Honors
| Rule | AMTP behavior |
|---|---|
| Required reviews | AMTP opens the PR. Merge is out of scope. Review gate is delegated to humans and the approvals table (migrations/sql/V5__approvals.sql). |
| Required status checks | AMTP never bypasses them. AMTP contributes no CI checks of its own (it does not execute tests). |
| Linear history / squash-only | Irrelevant to AMTP; it does not merge PRs. |
| Block force-pushes | AMTP only creates a new head ref via the Trees API. No refs are force-updated. |
| Restrict who can push | The GitHub App identity must be allow-listed if this rule is enabled. Otherwise the activity fails with BranchProtectionViolation. |
| Signed commits required | Out of current scope. GitHub App commits via Trees API are attributed to the app; GPG signing requires separate key provisioning. Tracked as a future requirement. |
BranchProtectionViolation Handling
A GitHub 403 response classified as a branch-protection
error is non-retryable. The workflow:
- Writes an approvals row with decision = 'pending', approver = 'branch-protection', and comment = <github error body>.
- Transitions the active stage to awaiting_approval.
- Pauses the Temporal workflow pending an external signal.
- Resumes only when an external signal updates the approvals row to approved (and the operator has resolved the underlying protection issue).
awaiting_approval is a valid
stages.status CHECK constraint value. See
migrations/sql/V3__stages.sql.