Architecture Overview
AMTP is a test-generation pipeline, not a test-execution engine. A user connects a private GitHub repository; the system produces a pull request containing Playwright or Maestro test code. No tests are run inside AMTP.
System Architecture #
The platform spans four planes, from user-facing surfaces down to code generation:

- Personas: Product Manager, QA Lead, Business Analyst, and non-technical sponsors, each with tailored approval views.
- Frontend: Next.js / TypeScript with Tailwind; Craft.io dashboards and approvals.
- API layer: OAuth, projects CRUD, run lifecycle, and SSE/WebSocket streaming; holds zero LLM state.
- Orchestration: a top-level Temporal workflow state machine with approval gates and a 7-day pause; durable and recoverable.
- Generation strategy: a skeleton pass, then targeted, depth-aware expansion, one feature at a time, with parallel execution and prompt caching enabled.
- Codegen: coverage lifting, selector resolution, and Playwright / Maestro code generation.
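The stage-to-artifact relationship described later in this document can be modeled in a few types. This is an illustrative sketch, not AMTP's actual schema; only the artifact kinds and run statuses named elsewhere in this document are used, and the helper `nextStage` is hypothetical.

```typescript
// Run statuses and artifact kinds as documented for the pipeline.
type RunStatus = "pending" | "passed" | "failed";

type ArtifactKind =
  | "repo_crawler_output"
  | "test_case_generator_output"
  | "test_engineer_output";

interface Run {
  run_id: string; // provisioned via gen_random_uuid()
  repo_full_name: string;
  ref: string;
  depth_level: number;
  status: RunStatus;
}

// Each stage consumes the previous stage's persisted artifact and
// produces exactly one new artifact kind, in this fixed order.
const STAGE_ORDER: ArtifactKind[] = [
  "repo_crawler_output",
  "test_case_generator_output",
  "test_engineer_output",
];

// Hypothetical helper: given a completed artifact kind, report which
// stage runs next (null once the Test Engineer output exists).
function nextStage(done: ArtifactKind): ArtifactKind | null {
  const i = STAGE_ORDER.indexOf(done);
  return i >= 0 && i < STAGE_ORDER.length - 1 ? STAGE_ORDER[i + 1] : null;
}
```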
End-to-End Flow #
The following describes a single pipeline run from trigger to pull request.
Three Valkey keys anchor the flow: `amtp:rl:*` (sliding-window limits per user and per repo; requests over the limit are rejected), `amtp:mcp:tree:{repo}:{ref}` (cache for the GitHub MCP `repo.tree` response), and `amtp:rl:repo:{repo}:pr_lock` (per-repo PR serialization). Each stage persists its artifact (`repo_crawler_output`, `test_case_generator_output`, `test_engineer_output`) before the next begins.
1. Trigger. An external webhook (GitHub `push` event or an explicit `POST /runs` API call) delivers `repo_full_name`, `ref`, and `depth_level`. A new `run_id` is provisioned via `gen_random_uuid()` and persisted to the `runs` table with `status = 'pending'`.
2. Rate-limit check. Before dispatching to Temporal, the application layer checks the sliding-window user rate limit and the per-repo concurrency counter in Valkey (`amtp:rl:*` keys). If either limit is exceeded, the run is rejected immediately and no Temporal workflow is started.
3. Temporal workflow start. The orchestrator starts a `TestGenerationWorkflow` carrying the `run_id`. All downstream state is loaded from Postgres; beyond identifiers, the workflow carries no domain payload in its event history.
4. Stage 1 — Repo Crawler. The Temporal activity `CrawlRepo` invokes the Repo Crawler agent. The agent calls the GitHub MCP server to fetch the repository file tree, entry points, and detected stack. Responses are cached in Valkey under `amtp:mcp:tree:{repo}:{ref}`. The validated JSON output is persisted as `artifacts.kind = 'repo_crawler_output'`, and the agent's LLM session is then discarded.
5. Stage 2 — Test Case Generator. A new LLM session is opened. The orchestrator loads the `repo_crawler_output` artifact, validates it against the Test Case Generator's input JSON Schema, and injects it as the first user message. The agent produces a structured list of test cases, persisted as `artifacts.kind = 'test_case_generator_output'`. The session is discarded.
6. Stage 3 — Test Engineer. A new LLM session is opened with the validated test-case list. The agent generates framework-specific Playwright or Maestro code files, persisted as `artifacts.kind = 'test_engineer_output'`. The session is discarded.
7. Pull request creation. The Temporal activity `CreatePullRequest` acquires the per-repo Valkey lock (`amtp:rl:repo:{repo}:pr_lock`), creates a new Git tree via the GitHub Trees API (non-destructive), opens a pull request targeting `base_branch`, and releases the lock. `runs.status` transitions to `'passed'`.
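The rate-limit check in step 2 can be sketched without a Valkey server. This is a minimal in-memory stand-in for the sliding-window algorithm; production keeps the timestamps in Valkey under `amtp:rl:*` keys, and the class name and parameters here are illustrative.

```typescript
// In-memory sketch of a sliding-window rate limiter. Each key (per user
// or per repo) maps to the timestamps of its recent runs; hits older
// than the window are dropped before the count is compared to the limit.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(
    private readonly limit: number,    // max runs per window
    private readonly windowMs: number, // window length in milliseconds
  ) {}

  // Returns true if the run may proceed (recording the hit); false means
  // the run is rejected before any Temporal workflow is started.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

In practice the check would run twice per trigger, once against the user key and once against the repo key, rejecting the run if either fails.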
Stateless LLM Contract #
The LLM is invoked statelessly. Each agent boundary enforces a hard context reset:
- The worker process hosting the upstream agent is torn down before the downstream agent starts.
- No messages, tool-call history, or in-memory state cross the boundary.
- The downstream agent receives only its static system prompt and the upstream agent’s validated JSON output injected as the first user message.
- Conversation memory, summarization, and cross-agent shared state are explicitly prohibited.
This design ensures reproducibility: given identical inputs (the same JSON payload at a low temperature, ≤ 0.2), the agents produce equivalent outputs. It also isolates failure domains: a hallucination in Stage 2 cannot corrupt Stage 1's persisted artifact.
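The boundary rules above can be sketched as a single constructor for a downstream agent's message list. Function and field names are illustrative, not AMTP's actual API; the point is that nothing except the static system prompt and the validated upstream JSON can reach the new session.

```typescript
// Sketch of the hard context reset at an agent boundary: the downstream
// agent sees only its static system prompt plus the upstream agent's
// validated JSON artifact as the first user message. No prior messages,
// tool-call history, or summaries are carried over.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildAgentInvocation(
  systemPrompt: string,
  upstreamArtifact: unknown,
  validate: (artifact: unknown) => boolean, // e.g. a compiled JSON Schema check
): ChatMessage[] {
  if (!validate(upstreamArtifact)) {
    throw new Error("upstream artifact failed schema validation; refusing to start agent");
  }
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: JSON.stringify(upstreamArtifact) },
  ];
}
```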
System Boundary #
In Scope #
- Crawling private GitHub repositories via the GitHub MCP server (authenticated GitHub App token).
- Generating structured test cases from repository analysis.
- Generating Playwright (`.spec.ts`) or Maestro (`.yaml`) test files.
- Opening a pull request on the target repository via the Git Trees API.
- Persisting run state, stage state, artifacts, and approval records to Postgres.
- Per-user sliding-window rate limiting and per-repo concurrency control via Valkey.
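The per-repo concurrency control can be sketched as a token-checked lock. This is an in-memory stand-in: production uses a Valkey key (`amtp:rl:repo:{repo}:pr_lock`); the class and method names are illustrative.

```typescript
// In-memory sketch of the per-repo PR serialization lock. A repo can
// hold at most one in-flight PR creation; only the holder's token may
// release the lock, mirroring the usual compare-then-delete rule used
// with Redis/Valkey-style locks.
class RepoPrLock {
  private holders = new Map<string, string>();

  acquire(repo: string, token: string): boolean {
    if (this.holders.has(repo)) return false; // another PR is in flight
    this.holders.set(repo, token);
    return true;
  }

  release(repo: string, token: string): boolean {
    if (this.holders.get(repo) !== token) return false; // not the holder
    this.holders.delete(repo);
    return true;
  }
}
```

A Valkey-backed version would additionally attach a TTL so a crashed worker cannot hold a repo's lock indefinitely.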
Explicit Non-Goals #
- Test execution. AMTP never runs the generated test suite.
- CI integration. AMTP does not trigger or monitor the repository’s own CI pipeline.
- Merging pull requests. All merge decisions are delegated to humans and branch-protection rules.
- Force-pushing. All Git writes use the non-destructive Trees API; no refs are force-updated.
- Automated re-runs on a stale base branch. A `StaleBaseBranch` failure terminates the run as `'failed'`; recovery requires a new external webhook triggering a new `run_id`.
- Bypassing branch protection. The provisioned GitHub App token holds no `bypass_branch_protections` permission.
Supporting Infrastructure Summary #
| Component | Technology | Role | Status |
|---|---|---|---|
| PostgreSQL 15 | `postgres:15-alpine` | Primary persistence: runs, stages, artifacts, approvals | Implemented |
| PgBouncer 1.22 | `edoburu/pgbouncer:1.22.1` | Connection pooling in transaction mode | Implemented |
| Flyway 10 | `flyway/flyway:10-alpine` | Schema migration management (V1–V6) | Implemented |
| Valkey 8.0 | `valkey/valkey:8.0` | MCP cache, rate-limit accounting, PR serialization lock | Implemented |
| Healthcheck | Node.js 20 / Express 4 / ioredis 5 | Valkey write-availability probe; exposes `GET /health` | Implemented |
| Garage S3 | Garage (S3-compatible API), port 3900 | Test bundle storage, 7-day retention | Planned |
| Temporal | Temporal server + workers | Deterministic workflow orchestration | Planned |
| LLM Agents (3) | Repo Crawler, Test Case Generator, Test Engineer | Stateless LLM invocations with JSON Schema contracts | Planned |
| Observability | OTel Collector · Tempo 2.4 · Prometheus 2.52 · Grafana 10.4 | LLM trace ingestion, token-accounting metrics, alerting | Implemented |
The Observability plane is implemented and operational. LLM agents emit structured OpenTelemetry spans and token-count metrics through an OTel Collector, which routes traces to Grafana Tempo and metrics to Prometheus. A pre-built LLM Token Accounting dashboard in Grafana provides real-time token usage, P95 latency, and call-rate panels. See Observability for the full pipeline reference.
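The dashboard's roll-up of token usage and P95 latency can be sketched as a plain aggregation. In production these figures arrive as OpenTelemetry metrics through the Collector; the record shape and function names below are illustrative.

```typescript
// Sketch of the token-accounting roll-up behind the Grafana panels:
// per-call token counts and latencies aggregated into totals and a P95.
interface LlmCall {
  agent: string; // e.g. "repo_crawler" (illustrative label)
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

// Nearest-rank P95: the value at or above 95% of observed latencies.
function p95(latencies: number[]): number {
  const sorted = [...latencies].sort((a, b) => a - b);
  const idx = Math.ceil(sorted.length * 0.95) - 1;
  return sorted[Math.max(0, idx)];
}

function summarize(calls: LlmCall[]) {
  const totalTokens = calls.reduce(
    (sum, c) => sum + c.inputTokens + c.outputTokens,
    0,
  );
  return {
    totalTokens,
    p95LatencyMs: p95(calls.map((c) => c.latencyMs)),
    callCount: calls.length,
  };
}
```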