AMTP coordinates three LLM agents through a deterministic Temporal orchestrator.
Each agent operates in a completely isolated LLM session enforced by
hard context resets. Payloads crossing agent boundaries
are validated against
JSON Schema Draft
2020-12 contracts before persistence and before injection into the next
agent’s context window.
One session per agent invocation. A new LLM client
is instantiated for each activity: no prior messages, no memory, no
summarization.
Temperature ≤ 0.2, top-p = 1, seeded where the
provider supports it. This minimizes stochastic variation between
retries.
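As a hedged illustration, the per-invocation request could be assembled as follows. Parameter names follow OpenAI-style chat APIs; the function and its wiring are assumptions, not AMTP's actual client code:

```javascript
// Illustrative sketch only: builds fresh request parameters for one
// activity. Parameter names (temperature, top_p, seed) follow
// OpenAI-style chat APIs; the real AMTP client wiring is not shown here.
function buildAgentRequest(systemPrompt, envelopeJson, seed) {
  return {
    messages: [
      { role: "system", content: systemPrompt }, // static agent prompt
      { role: "user", content: envelopeJson },   // handoff envelope only
    ],
    temperature: 0.2, // ≤ 0.2 per the determinism constraints
    top_p: 1,
    seed, // honored only where the provider supports seeding
  };
}
```

Building a fresh object per call mirrors the one-session-per-invocation rule: nothing from a prior invocation can leak into the request.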
Schema-first outputs. The system prompt for every
agent instructs the model to respond with a single JSON object
conforming to the agent’s output schema. A response that cannot be
parsed as JSON is classified as MalformedLlmOutput and
triggers the Temporal activity retry policy; a response that parses
but violates the schema is classified as
SchemaValidationError and fails without retry.
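The parse-then-validate gate can be sketched as below. A real deployment would run a full JSON Schema Draft 2020-12 validator; the requiredKeys check is a simplified stand-in, and classifyAgentResponse is an illustrative name:

```javascript
// Simplified sketch of the parse-then-validate gate. Real validation
// runs a full JSON Schema Draft 2020-12 validator; the requiredKeys
// check below is a stand-in. Error names mirror the classifications
// described above.
function classifyAgentResponse(raw, requiredKeys) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { error: "MalformedLlmOutput" }; // retryable
  }
  for (const key of requiredKeys) {
    if (!(key in parsed)) {
      return { error: "SchemaValidationError" }; // non-retryable
    }
  }
  return { value: parsed };
}
```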
No in-memory carryover. The worker process hosting
an agent is torn down at agent-boundary completion. No heap, no
conversation state, no tool-call history crosses to the next agent.
The Repo Crawler interrogates the target GitHub repository via the
GitHub MCP server
(amtp-github-mcp), using the repo.tree tool
to enumerate the filtered file tree and the
repo.read_file tool to sample key source files. It builds
a structured representation of the file tree, entry points, and
detected technology stack. It does not generate test cases;
its sole output is a machine-readable map of what exists in the
repository.
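A hypothetical example of that machine-readable map follows; every field name here is illustrative, since the authoritative shape is the agent’s output JSON Schema:

```javascript
// Hypothetical Repo Crawler output. Field names are assumptions for
// illustration; the authoritative shape is the agent's output schema.
const repoCrawlerOutput = {
  file_tree: ["src/index.ts", "src/routes/users.ts", "package.json"],
  entry_points: ["src/index.ts"],
  tech_stack: {
    language: "TypeScript",
    framework: "Express",
    package_manager: "npm",
  },
};
```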
Receives the validated Repo Crawler output and the target
depth_level. Produces a structured list of test cases
with identifiers, preconditions, steps, expected outcomes, and
priority classifications. No code is generated at this stage.
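One entry in that list might look like the following sketch; field names are illustrative, not the agent’s actual schema:

```javascript
// Hypothetical single test case from the Test Case Generator. Field
// names are assumptions; only the categories (identifier, preconditions,
// steps, expected outcome, priority) come from the text above.
const testCase = {
  id: "TC-001",
  preconditions: ["database seeded with one user"],
  steps: ["POST /login with valid credentials"],
  expected_outcome: "200 response with a session token",
  priority: "high",
};
```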
Receives the validated test case list and a
target_framework selector. Produces framework-specific
code files and the metadata required to open a GitHub pull request.
The following protocol governs every agent boundary in the pipeline.
It applies identically to the Repo Crawler → Test Case Generator
handoff and the Test Case Generator → Test Engineer handoff.
Persistence boundary.
The upstream agent’s raw LLM response is first sanitized
(markdown fences and whitespace stripped — see
Orchestration § LLM Output Sanitization),
then parsed as JSON, then validated against the agent’s
output JSON Schema. On success, the validated object is written as
JSONB to the content column of the
artifacts table
(migrations/sql/V4__artifacts.sql) with
kind = '{agent}_output'; the resulting
artifact_id is handed back to the Temporal workflow.
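The persistence step might be sketched with a node-postgres-style client as below; the function name and query API are assumptions, while the artifacts table, JSONB content column, kind convention, and returned artifact_id come from the text:

```javascript
// Hypothetical persistence activity. The db.query API is node-postgres
// style and the function name is illustrative; the table, columns, and
// kind convention follow the description above.
async function persistArtifact(db, agent, validatedOutput) {
  const { rows } = await db.query(
    "INSERT INTO artifacts (kind, content) VALUES ($1, $2::jsonb) RETURNING artifact_id",
    [`${agent}_output`, JSON.stringify(validatedOutput)],
  );
  return rows[0].artifact_id; // handed back to the Temporal workflow
}
```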
Worker teardown.
The worker process that hosted the upstream agent terminates. No
heap, no LLM client instance, no conversation messages survive.
Hydration payload assembly.
For the downstream agent, the orchestrator loads
artifacts.content by artifact_id from
Postgres, re-validates it against the downstream agent’s
input JSON Schema, and wraps it in a
deterministic handoff envelope.
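The envelope itself is not reproduced in this section; a hypothetical shape, with all field names assumed for illustration, might be:

```javascript
// Hypothetical handoff envelope. Every field name here is an assumption;
// only the re-validated payload and the artifact_id provenance are
// stated in the text.
const envelope = {
  handoff: {
    from_agent: "repo_crawler",
    to_agent: "test_case_generator",
    artifact_id: "00000000-0000-0000-0000-000000000000",
    schema: "repo_crawler_output",
  },
  payload: {}, // artifacts.content, re-validated against the input schema
};
```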
Prompt injection.
The envelope is serialized as
canonical JSON (keys sorted, no extra whitespace)
and injected as the first user message of the downstream
agent’s new LLM session, immediately after the agent’s
static system prompt. No narration, no summarization.
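The canonical-JSON rule stated above (keys sorted recursively, no extra whitespace) can be sketched as a pure function; the name canonicalJson is illustrative:

```javascript
// Canonical JSON serialization: object keys sorted recursively, no
// extra whitespace. Pure function; the name is illustrative.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    return (
      "{" +
      Object.keys(value)
        .sort()
        .map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k]))
        .join(",") +
      "}"
    );
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}
```

Sorting keys at every level makes the serialized envelope byte-identical across retries, which is what lets identical inputs produce identical downstream prompts.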
Determinism constraints.
Temperature ≤ 0.2, top-p = 1, provider seed set where
supported. Output that fails sanitization or JSON parsing fails the
activity with MalformedLlmOutput and triggers standard
Temporal retry; output that parses but violates the schema fails
with SchemaValidationError and is not retried.
Cache coherence.
Hydration reads are sourced exclusively from Postgres
artifacts, the authoritative source of truth. Valkey
caches only MCP API responses and rate-limit state
(infra/valkey/NAMESPACES.md); it never holds agent
handoff payloads. A cache eviction or Valkey restart cannot
corrupt a handoff in flight.
LLM providers frequently wrap JSON responses in Markdown code fences
(e.g.,
```json ... ```). A deterministic sanitizer runs before
JSON parsing. It is a pure function with no side
effects.
Strip one leading ```json or ``` fence
if present.
Strip one trailing ``` fence if present.
Trim leading and trailing whitespace.
No content mutation, key re-casing, or JSON
re-serialization.
If stripping does not yield a JSON.parse-able string,
the activity fails with MalformedLlmOutput (retryable).
Successful parse followed by schema violation fails with
SchemaValidationError (non-retryable; requires human
review).
The sanitizer version is recorded in the persisted artifact:
artifacts.content.meta.sanitizer = "v1.0.0". This
ensures future replays use the same logic version.
// Reference implementation — deterministic, no side-effects
function sanitizeLlmOutput(raw) {
  let s = raw.trim();
  // Strip one leading ```json or bare ``` fence, if present.
  if (s.startsWith("```json")) s = s.slice("```json".length);
  else if (s.startsWith("```")) s = s.slice(3);
  // Strip one trailing ``` fence, if present.
  if (s.endsWith("```")) s = s.slice(0, -3);
  // Final trim only: no content mutation, key re-casing, or re-serialization.
  return s.trim();
}