Architecture Overview

AMTP is a test-generation pipeline, not a test-execution engine. A user connects a private GitHub repository; the system produces a pull request containing Playwright or Maestro test code. No tests are run inside AMTP.

System Architecture #

AMTP System Architecture — The Three-Agent Pipeline

Business User
  • Product Manager · QA Lead · Business Analyst · non-technical sponsor
  • Tailored approvals
Web Frontend
  • Next.js / TypeScript · Tailwind
  • Craft.io · Dashboards / Approvals
API Gateway
  • OAuth · Projects CRUD
  • Run lifecycle · SSE/WebSocket
  • Holds zero LLM state
Temporal Orchestrator
  • Top-level workflow state machine
  • Approval gates · 7-day pause
  • Durable · Recoverable

Three Specialized LLM Agents — Stateless · Isolated Prompt Libraries · Concurrency-Limited
  • Agent 1 – Repo Crawler: maps application features using a three-pass strategy (skeleton pass, targeted expansion, depth-aware detail). Output → Feature Taxonomy JSON.
  • Agent 2 – Test Case Generator: translates features into test scenarios; processes one feature at a time, runs in parallel, with prompt caching enabled. Output → Test Plan JSON.
  • Agent 3 – Test Engineer: produces automation code; responsible for coverage lifting, selector resolution, and Playwright / Maestro codegen. Output → GitHub Pull Request.

MCP Abstraction Layer — all external vendor access brokered through MCP tools
  • GitHub MCP: repo.tree · blob.create · tree.create · pr.create
  • Code Intelligence (AST) MCP: tree-sitter · summarize_file · extract_routes
  • Test Coverage MCP: coverage.discover · match_to_taxonomy · diff
  • Framework Detection MCP: framework.detect · Valkey-cached rules engine

Storage & Observability Layer
  • Postgres (structured state: runs, stages, artifacts, approvals)
  • S3-compatible object store (artifacts)
  • Valkey (MCP cache)
  • OTel Collector: OTLP gRPC/HTTP · memory_limiter · batch · attributes/llm_meta
  • Grafana Tempo 2.4: trace storage · TraceQL · span metrics generator
  • Prometheus 2.52: LLM token metrics · 15-day retention · remote-write receiver
  • Grafana 10.4: LLM Token Accounting dashboard · Tempo data source · alerting
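The crawler's three-pass strategy can be modeled as successive filters over the repository file listing. The sketch below is purely illustrative — the manifest set, directory heuristics, and the interpretation of depth_level as a path-depth cutoff are assumptions, not the shipped agent logic:

```python
# Illustrative model of a three-pass repository crawl (not the shipped agent).
# Pass 1: skeleton      — top-level entries and build manifests.
# Pass 2: expansion     — directories matching feature heuristics.
# Pass 3: detail        — remaining files up to depth_level path segments deep.

MANIFESTS = {"package.json", "pyproject.toml", "go.mod", "pom.xml"}
FEATURE_DIRS = ("src", "app", "pages", "routes", "features")

def three_pass_crawl(paths: list[str], depth_level: int) -> dict[str, list[str]]:
    skeleton = [p for p in paths
                if "/" not in p or p.rsplit("/", 1)[-1] in MANIFESTS]
    expanded = [p for p in paths
                if p.split("/")[0] in FEATURE_DIRS and p not in skeleton]
    detailed = [p for p in paths
                if p.count("/") < depth_level
                and p not in skeleton and p not in expanded]
    return {"skeleton": skeleton, "expansion": expanded, "detail": detailed}

tree = ["package.json", "README.md", "src/routes/login.ts",
        "src/routes/cart.ts", "docs/adr/001.md"]
result = three_pass_crawl(tree, depth_level=3)
```

Each pass narrows what the LLM must read: the skeleton orients it, the expansion targets likely feature code, and the depth cutoff bounds token spend.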

End-to-End Flow #

The following describes a single pipeline run from trigger to pull request.

Pipeline flow (diagram summary):

External webhook / POST /runs (repo_full_name, ref, depth_level → new run_id)
  → Rate-limit check (Valkey amtp:rl:*, sliding window per user and per repo; reject if exceeded)
  → Temporal workflow: TestGenerationWorkflow
      Stage 1: CrawlRepo — Repo Crawler LLM → repo_crawler_output (GitHub MCP repo.tree, cached in Valkey under amtp:mcp:tree:{repo}:{ref})
      Stage 2: GenerateTestCases — Test Case Generator LLM → test_case_generator_output
      Stage 3: GenerateTestCode — Test Engineer LLM → test_engineer_output
  → CreatePullRequest (GitHub Git Trees API, non-destructive; per-repo serialization via Valkey amtp:rl:repo:{repo}:pr_lock)
  → ✓ PR opened on GitHub
  1. Trigger. An external webhook (GitHub push event or explicit POST /runs API call) delivers repo_full_name, ref, and depth_level. A new run_id is provisioned via gen_random_uuid() and persisted to the runs table with status = 'pending'.
  2. Rate-limit check. Before dispatching to Temporal, the application layer checks the sliding-window user rate limit and the per-repo concurrency counter in Valkey. If either limit is exceeded, the run is rejected immediately with no Temporal workflow start.
  3. Temporal workflow start. The orchestrator starts a TestGenerationWorkflow carrying the run_id. All downstream state is loaded from Postgres; the workflow carries no domain payload in its event history beyond identifiers.
  4. Stage 1 — Repo Crawler. The Temporal activity CrawlRepo invokes the Repo Crawler agent. The agent calls the GitHub MCP server to fetch the repository file tree, entry points, and detected stack. Responses are cached in Valkey under amtp:mcp:tree:{repo}:{ref}. The validated JSON output is persisted as artifacts.kind = 'repo_crawler_output'. The agent’s LLM session is then discarded.
  5. Stage 2 — Test Case Generator. A new LLM session is opened. The orchestrator loads the repo_crawler_output artifact, validates it against the Test Case Generator’s input JSON Schema, and injects it as the first user message. The agent produces a structured list of test cases. Output is persisted as artifacts.kind = 'test_case_generator_output'. The session is discarded.
  6. Stage 3 — Test Engineer. A new LLM session is opened with the validated test-case list. The agent generates framework-specific Playwright or Maestro code files. Output is persisted as artifacts.kind = 'test_engineer_output'. The session is discarded.
  7. Pull request creation. The Temporal activity CreatePullRequest acquires the per-repo Valkey lock, creates a new Git Tree via the GitHub Trees API (non-destructive), opens a pull request targeting base_branch, and releases the lock. The runs.status transitions to 'passed'.

Stateless LLM Contract #

The LLM is invoked statelessly. Each agent boundary enforces a hard context reset:

  • No conversation history crosses a stage boundary; an agent's session is discarded as soon as its output artifact is persisted.
  • Each stage receives only the previous stage's validated JSON artifact, loaded from Postgres and checked against the receiving agent's input JSON Schema.
  • Prompt libraries are isolated per agent; no shared system prompt accumulates state across stages.

This design supports reproducibility: given identical inputs, a low temperature (≤ 0.2) and the same JSON payload produce closely equivalent outputs. It also isolates failure domains — a hallucination in Stage 2 cannot corrupt Stage 1’s persisted artifact.

System Boundary #

In Scope #

  • Crawling private GitHub repositories via the GitHub MCP server (authenticated GitHub App token).
  • Generating structured test cases from repository analysis.
  • Generating Playwright (.spec.ts) or Maestro (.yaml) test files.
  • Opening a pull request on the target repository via the Git Trees API.
  • Persisting run state, stage state, artifacts, and approval records to Postgres.
  • Per-user sliding-window rate limiting and per-repo concurrency control via Valkey.
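The Trees-API write path behind the pull-request bullet can be sketched as a request sequence. Endpoint paths follow the public GitHub REST API (git/blobs → git/trees → git/commits → git/refs → pulls); the SHAs, branch name, and file contents below are placeholders, and no network call is made:

```python
# Sketch of the non-destructive PR write path: new blobs and a new tree are
# layered on top of base_tree (no ref is force-updated), then a commit, a
# branch ref, and a pull request are created. This only builds the REST
# request sequence; the <...> SHAs would come from each prior response.
def build_pr_requests(repo: str, base_branch: str, base_commit_sha: str,
                      base_tree_sha: str, files: dict[str, str],
                      branch: str, title: str) -> list[tuple[str, str, dict]]:
    reqs, tree_entries = [], []
    for path, content in sorted(files.items()):
        reqs.append(("POST", f"/repos/{repo}/git/blobs",
                     {"content": content, "encoding": "utf-8"}))
        tree_entries.append({"path": path, "mode": "100644",
                             "type": "blob", "sha": f"<blob:{path}>"})
    reqs.append(("POST", f"/repos/{repo}/git/trees",
                 {"base_tree": base_tree_sha, "tree": tree_entries}))
    reqs.append(("POST", f"/repos/{repo}/git/commits",
                 {"message": title, "tree": "<new_tree_sha>",
                  "parents": [base_commit_sha]}))
    reqs.append(("POST", f"/repos/{repo}/git/refs",
                 {"ref": f"refs/heads/{branch}", "sha": "<new_commit_sha>"}))
    reqs.append(("POST", f"/repos/{repo}/pulls",
                 {"title": title, "head": branch, "base": base_branch}))
    return reqs

reqs = build_pr_requests("org/app", "main", "abc123", "def456",
                         {"tests/login.spec.ts": "// generated"},
                         "amtp/tests", "Add generated Playwright tests")
```

Passing base_tree means the new tree extends the existing one rather than replacing it, which is what makes the write non-destructive.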

Explicit Non-Goals #

  • Test execution. AMTP never runs the generated test suite.
  • CI integration. AMTP does not trigger or monitor the repository’s own CI pipeline.
  • Merging pull requests. All merge decisions are delegated to humans and branch-protection rules.
  • Force-pushing. All Git writes use the non-destructive Trees API; no refs are force-updated.
  • Automated re-runs on stale base branch. A StaleBaseBranch failure terminates the run as 'failed'; recovery requires a new external webhook triggering a new run_id.
  • Bypassing branch protection. The provisioned GitHub App token holds no bypass_branch_protections permission.

Supporting Infrastructure Summary #

| Component | Technology | Role | Status |
|---|---|---|---|
| PostgreSQL 15 | postgres:15-alpine | Primary persistence: runs, stages, artifacts, approvals | Implemented |
| PgBouncer 1.22 | edoburu/pgbouncer:1.22.1 | Connection pooling in transaction mode | Implemented |
| Flyway 10 | flyway/flyway:10-alpine | Schema migration management (V1–V6) | Implemented |
| Valkey 8.0 | valkey/valkey:8.0 | MCP cache, rate-limit accounting, PR serialization lock | Implemented |
| Healthcheck | Node.js 20 / Express 4 / ioredis 5 | Valkey write-availability probe; exposes GET /health | Implemented |
| Garage S3 | Garage (S3-compatible API), port 3900 | Test bundle storage, 7-day retention | Planned |
| Temporal | Temporal server + workers | Deterministic workflow orchestration | Planned |
| LLM Agents (3) | Repo Crawler, Test Case Generator, Test Engineer | Stateless LLM invocations with JSON Schema contracts | Planned |
| Observability | OTel Collector · Tempo 2.4 · Prometheus 2.52 · Grafana 10.4 | LLM trace ingestion, token-accounting metrics, alerting | Implemented |

The Observability plane is implemented and operational. LLM agents emit structured OpenTelemetry spans and token-count metrics through an OTel Collector, which routes traces to Grafana Tempo and metrics to Prometheus. A pre-built LLM Token Accounting dashboard in Grafana provides real-time token usage, P95 latency, and call-rate panels. See Observability for the full pipeline reference.
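The kind of aggregation the Token Accounting dashboard performs can be illustrated in miniature. The per-call record fields below are assumptions for the sketch; in production these arrive as OTel span attributes and Prometheus metrics, not Python dicts:

```python
# Illustrative aggregation of per-call LLM telemetry into the headline
# figures the Token Accounting dashboard surfaces: total tokens, call
# count, and P95 latency. Record fields are assumed, not the real schema.
import math

def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

def summarize(calls: list[dict]) -> dict:
    return {
        "total_input_tokens": sum(c["input_tokens"] for c in calls),
        "total_output_tokens": sum(c["output_tokens"] for c in calls),
        "p95_latency_ms": p95([c["latency_ms"] for c in calls]),
        "calls": len(calls),
    }

calls = [{"input_tokens": 1200, "output_tokens": 300, "latency_ms": ms}
         for ms in (800, 950, 1100, 4000)]
stats = summarize(calls)
```

Note how a single slow call dominates P95 while leaving token totals untouched — which is exactly why the dashboard tracks latency and token spend as separate panels.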
