AMTP coordinates three LLM agents through a deterministic Temporal orchestrator.
Each agent operates in a completely isolated LLM session enforced by
hard context resets. Payloads crossing agent boundaries
are validated against
JSON Schema Draft
2020-12 contracts before persistence and before injection into the next
agent’s context window.
One session per agent invocation. A new LLM client
is instantiated for each activity: no prior messages, no memory, no
summarization.
Temperature ≤ 0.2, top-p = 1, seeded where the
provider supports it. This minimizes stochastic variation between
retries.
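As a hedged illustration, the per-invocation request could be assembled as follows. Parameter names follow OpenAI-style chat APIs; the function and its wiring are assumptions, not AMTP's actual client code:

```javascript
// Illustrative sketch only: builds fresh request parameters for one
// activity. Parameter names (temperature, top_p, seed) follow
// OpenAI-style chat APIs; the real AMTP client wiring is not shown here.
function buildAgentRequest(systemPrompt, envelopeJson, seed) {
  return {
    messages: [
      { role: "system", content: systemPrompt }, // static agent prompt
      { role: "user", content: envelopeJson },   // handoff envelope only
    ],
    temperature: 0.2, // ≤ 0.2 per the determinism constraints
    top_p: 1,
    seed, // honored only where the provider supports seeding
  };
}
```

Building a fresh object per call mirrors the one-session-per-invocation rule: nothing from a prior invocation can leak into the request.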
Schema-first outputs. The system prompt for every
agent instructs the model to respond with a single JSON object
conforming to the agent’s output schema. A response that cannot be
parsed as JSON is classified as MalformedLlmOutput and
triggers the Temporal activity retry policy; a response that parses
but violates the schema is classified as
SchemaValidationError and fails without retry.
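The parse-then-validate gate can be sketched as below. A real deployment would run a full JSON Schema Draft 2020-12 validator; the requiredKeys check is a simplified stand-in, and classifyAgentResponse is an illustrative name:

```javascript
// Simplified sketch of the parse-then-validate gate. Real validation
// runs a full JSON Schema Draft 2020-12 validator; the requiredKeys
// check below is a stand-in. Error names mirror the classifications
// described above.
function classifyAgentResponse(raw, requiredKeys) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { error: "MalformedLlmOutput" }; // retryable
  }
  for (const key of requiredKeys) {
    if (!(key in parsed)) {
      return { error: "SchemaValidationError" }; // non-retryable
    }
  }
  return { value: parsed };
}
```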
No in-memory carryover. The worker process hosting
an agent is torn down at agent-boundary completion. No heap, no
conversation state, no tool-call history crosses to the next agent.
The Repo Crawler interrogates the target GitHub repository via the
GitHub MCP server
(amtp-github-mcp), using the repo.tree tool
to enumerate the filtered file tree and the
repo.read_file tool to sample key source files. It builds
a structured representation of the file tree, entry points, and
detected technology stack. It does not generate test cases;
its sole output is a machine-readable map of what exists in the
repository.
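A hypothetical example of that machine-readable map follows; every field name here is illustrative, since the authoritative shape is the agent’s output JSON Schema:

```javascript
// Hypothetical Repo Crawler output. Field names are assumptions for
// illustration; the authoritative shape is the agent's output schema.
const repoCrawlerOutput = {
  file_tree: ["src/index.ts", "src/routes/users.ts", "package.json"],
  entry_points: ["src/index.ts"],
  tech_stack: {
    language: "TypeScript",
    framework: "Express",
    package_manager: "npm",
  },
};
```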
Receives the validated Repo Crawler output and the target
depth_level. Produces a structured list of test cases
with identifiers, preconditions, steps, expected outcomes, and
priority classifications. No code is generated at this stage.
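One entry in that list might look like the following sketch; field names are illustrative, not the agent’s actual schema:

```javascript
// Hypothetical single test case from the Test Case Generator. Field
// names are assumptions; only the categories (identifier, preconditions,
// steps, expected outcome, priority) come from the text above.
const testCase = {
  id: "TC-001",
  preconditions: ["database seeded with one user"],
  steps: ["POST /login with valid credentials"],
  expected_outcome: "200 response with a session token",
  priority: "high",
};
```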
Receives the validated test case list and a
target_framework selector. Produces framework-specific
code files and the metadata required to open a GitHub pull request.
The following protocol governs every agent boundary in the pipeline.
It applies identically to the Repo Crawler → Test Case Generator
handoff and the Test Case Generator → Test Engineer handoff.
Persistence boundary.
The upstream agent’s raw LLM response is first sanitized
(markdown fences and whitespace stripped — see
Orchestration § LLM Output Sanitization),
then parsed as JSON, then validated against the agent’s
output JSON Schema. On success, the validated object is written as
JSONB to the content column of the
artifacts table
(migrations/sql/V4__artifacts.sql) with
kind = '{agent}_output'; the resulting
artifact_id is handed back to the Temporal workflow.
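The persistence step might be sketched with a node-postgres-style client as below; the function name and query API are assumptions, while the artifacts table, JSONB content column, kind convention, and returned artifact_id come from the text:

```javascript
// Hypothetical persistence activity. The db.query API is node-postgres
// style and the function name is illustrative; the table, columns, and
// kind convention follow the description above.
async function persistArtifact(db, agent, validatedOutput) {
  const { rows } = await db.query(
    "INSERT INTO artifacts (kind, content) VALUES ($1, $2::jsonb) RETURNING artifact_id",
    [`${agent}_output`, JSON.stringify(validatedOutput)],
  );
  return rows[0].artifact_id; // handed back to the Temporal workflow
}
```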
Worker teardown.
The worker process that hosted the upstream agent terminates. No
heap, no LLM client instance, no conversation messages survive.
Hydration payload assembly.
For the downstream agent, the orchestrator loads
artifacts.content by artifact_id from
Postgres, re-validates it against the downstream agent’s
input JSON Schema, and wraps it in a
deterministic handoff envelope.
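The envelope itself is not reproduced in this section; a hypothetical shape, with all field names assumed for illustration, might be:

```javascript
// Hypothetical handoff envelope. Every field name here is an assumption;
// only the re-validated payload and the artifact_id provenance are
// stated in the text.
const envelope = {
  handoff: {
    from_agent: "repo_crawler",
    to_agent: "test_case_generator",
    artifact_id: "00000000-0000-0000-0000-000000000000",
    schema: "repo_crawler_output",
  },
  payload: {}, // artifacts.content, re-validated against the input schema
};
```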
Prompt injection.
The envelope is serialized as
canonical JSON (keys sorted, no extra whitespace)
and injected as the first user message of the downstream
agent’s new LLM session, immediately after the agent’s
static system prompt. No narration, no summarization.
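The canonical-JSON rule stated above (keys sorted recursively, no extra whitespace) can be sketched as a pure function; the name canonicalJson is illustrative:

```javascript
// Canonical JSON serialization: object keys sorted recursively, no
// extra whitespace. Pure function; the name is illustrative.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    return (
      "{" +
      Object.keys(value)
        .sort()
        .map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k]))
        .join(",") +
      "}"
    );
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}
```

Sorting keys at every level makes the serialized envelope byte-identical across retries, which is what lets identical inputs produce identical downstream prompts.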
Determinism constraints.
Temperature ≤ 0.2, top-p = 1, provider seed set where
supported. Output that fails sanitization or JSON parsing fails the
activity with MalformedLlmOutput and triggers standard
Temporal retry; output that parses but violates the schema fails
with SchemaValidationError and is not retried.
Cache coherence.
Hydration reads are sourced exclusively from Postgres
artifacts, the authoritative source of truth. Valkey
caches only MCP API responses and rate-limit state
(infra/valkey/NAMESPACES.md); it never holds agent
handoff payloads. A cache eviction or Valkey restart cannot
corrupt a handoff in flight.
LLM providers frequently wrap JSON responses in Markdown code fences
(e.g.,
```json ... ```). A deterministic sanitizer runs before
JSON parsing. It is a pure function with no side
effects.
Strip one leading ```json or ``` fence
if present.
Strip one trailing ``` fence if present.
Trim leading and trailing whitespace.
No content mutation, key re-casing, or JSON
re-serialization.
If stripping does not yield a JSON.parse-able string,
the activity fails with MalformedLlmOutput (retryable).
Successful parse followed by schema violation fails with
SchemaValidationError (non-retryable; requires human
review).
The sanitizer version is recorded in the persisted artifact:
artifacts.content.meta.sanitizer = "v1.0.0". This
ensures future replays use the same logic version.
// Reference implementation — deterministic, no side-effects
function sanitizeLlmOutput(raw) {
  let s = raw.trim();
  // Strip one leading ```json or bare ``` fence, if present.
  if (s.startsWith("```json")) s = s.slice("```json".length);
  else if (s.startsWith("```")) s = s.slice(3);
  // Strip one trailing ``` fence, if present.
  if (s.endsWith("```")) s = s.slice(0, -3);
  // Final trim only: no content mutation, key re-casing, or re-serialization.
  return s.trim();
}