Architecture Overview
AMTP is a test-generation pipeline, not a test-execution engine. A user connects a private GitHub repository; the system produces a pull request containing Playwright or Maestro test code. No tests are run inside AMTP.
System Architecture #
The platform spans four planes, from user-facing surfaces down to code generation:

- Personas: Product Manager, QA Lead, Business Analyst, and non-technical sponsors, each with tailored approval views.
- Frontend: Next.js / TypeScript with Tailwind; Craft.io dashboards and approvals.
- API layer: OAuth, projects CRUD, run lifecycle, and SSE/WebSocket streaming; holds zero LLM state.
- Orchestration: a top-level Temporal workflow state machine with approval gates and a 7-day pause; durable and recoverable.
- Generation strategy: a skeleton pass, then targeted, depth-aware expansion, one feature at a time, with parallel execution and prompt caching enabled.
- Codegen: coverage lifting, selector resolution, and Playwright / Maestro code generation.
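The stage-to-artifact relationship described later in this document can be modeled in a few types. This is an illustrative sketch, not AMTP's actual schema; only the artifact kinds and run statuses named elsewhere in this document are used, and the helper `nextStage` is hypothetical.

```typescript
// Run statuses and artifact kinds as documented for the pipeline.
type RunStatus = "pending" | "passed" | "failed";

type ArtifactKind =
  | "repo_crawler_output"
  | "test_case_generator_output"
  | "test_engineer_output";

interface Run {
  run_id: string; // provisioned via gen_random_uuid()
  repo_full_name: string;
  ref: string;
  depth_level: number;
  status: RunStatus;
}

// Each stage consumes the previous stage's persisted artifact and
// produces exactly one new artifact kind, in this fixed order.
const STAGE_ORDER: ArtifactKind[] = [
  "repo_crawler_output",
  "test_case_generator_output",
  "test_engineer_output",
];

// Hypothetical helper: given a completed artifact kind, report which
// stage runs next (null once the Test Engineer output exists).
function nextStage(done: ArtifactKind): ArtifactKind | null {
  const i = STAGE_ORDER.indexOf(done);
  return i >= 0 && i < STAGE_ORDER.length - 1 ? STAGE_ORDER[i + 1] : null;
}
```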
End-to-End Flow #
The following describes a single pipeline run from trigger to pull request.
Three Valkey keys anchor the flow: `amtp:rl:*` (sliding-window limits per user and per repo; requests over the limit are rejected), `amtp:mcp:tree:{repo}:{ref}` (cache for the GitHub MCP `repo.tree` response), and `amtp:rl:repo:{repo}:pr_lock` (per-repo PR serialization). Each stage persists its artifact (`repo_crawler_output`, `test_case_generator_output`, `test_engineer_output`) before the next begins.
1. Trigger. An external webhook (GitHub `push` event or an explicit `POST /runs` API call) delivers `repo_full_name`, `ref`, and `depth_level`. A new `run_id` is provisioned via `gen_random_uuid()` and persisted to the `runs` table with `status = 'pending'`.
2. Rate-limit check. Before dispatching to Temporal, the application layer checks the sliding-window user rate limit and the per-repo concurrency counter in Valkey (`amtp:rl:*` keys). If either limit is exceeded, the run is rejected immediately and no Temporal workflow is started.
3. Temporal workflow start. The orchestrator starts a `TestGenerationWorkflow` carrying the `run_id`. All downstream state is loaded from Postgres; beyond identifiers, the workflow carries no domain payload in its event history.
4. Stage 1 — Repo Crawler. The Temporal activity `CrawlRepo` invokes the Repo Crawler agent. The agent calls the GitHub MCP server to fetch the repository file tree, entry points, and detected stack. Responses are cached in Valkey under `amtp:mcp:tree:{repo}:{ref}`. The validated JSON output is persisted as `artifacts.kind = 'repo_crawler_output'`, and the agent's LLM session is then discarded.
5. Stage 2 — Test Case Generator. A new LLM session is opened. The orchestrator loads the `repo_crawler_output` artifact, validates it against the Test Case Generator's input JSON Schema, and injects it as the first user message. The agent produces a structured list of test cases, persisted as `artifacts.kind = 'test_case_generator_output'`. The session is discarded.
6. Stage 3 — Test Engineer. A new LLM session is opened with the validated test-case list. The agent generates framework-specific Playwright or Maestro code files, persisted as `artifacts.kind = 'test_engineer_output'`. The session is discarded.
7. Pull request creation. The Temporal activity `CreatePullRequest` acquires the per-repo Valkey lock (`amtp:rl:repo:{repo}:pr_lock`), creates a new Git tree via the GitHub Trees API (non-destructive), opens a pull request targeting `base_branch`, and releases the lock. `runs.status` transitions to `'passed'`.
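The rate-limit check in step 2 can be sketched without a Valkey server. This is a minimal in-memory stand-in for the sliding-window algorithm; production keeps the timestamps in Valkey under `amtp:rl:*` keys, and the class name and parameters here are illustrative.

```typescript
// In-memory sketch of a sliding-window rate limiter. Each key (per user
// or per repo) maps to the timestamps of its recent runs; hits older
// than the window are dropped before the count is compared to the limit.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(
    private readonly limit: number,    // max runs per window
    private readonly windowMs: number, // window length in milliseconds
  ) {}

  // Returns true if the run may proceed (recording the hit); false means
  // the run is rejected before any Temporal workflow is started.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

In practice the check would run twice per trigger, once against the user key and once against the repo key, rejecting the run if either fails.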
Stateless LLM Contract #
The LLM is invoked statelessly. Each agent boundary enforces a hard context reset:
- The worker process hosting the upstream agent is torn down before the downstream agent starts.
- No messages, tool-call history, or in-memory state cross the boundary.
- The downstream agent receives only its static system prompt and the upstream agent’s validated JSON output injected as the first user message.
- Conversation memory, summarization, and cross-agent shared state are explicitly prohibited.
This design ensures reproducibility: given identical inputs (the same JSON payload at a low temperature, ≤ 0.2), the agents produce equivalent outputs. It also isolates failure domains: a hallucination in Stage 2 cannot corrupt Stage 1's persisted artifact.
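The boundary rules above can be sketched as a single constructor for a downstream agent's message list. Function and field names are illustrative, not AMTP's actual API; the point is that nothing except the static system prompt and the validated upstream JSON can reach the new session.

```typescript
// Sketch of the hard context reset at an agent boundary: the downstream
// agent sees only its static system prompt plus the upstream agent's
// validated JSON artifact as the first user message. No prior messages,
// tool-call history, or summaries are carried over.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

function buildAgentInvocation(
  systemPrompt: string,
  upstreamArtifact: unknown,
  validate: (artifact: unknown) => boolean, // e.g. a compiled JSON Schema check
): ChatMessage[] {
  if (!validate(upstreamArtifact)) {
    throw new Error("upstream artifact failed schema validation; refusing to start agent");
  }
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: JSON.stringify(upstreamArtifact) },
  ];
}
```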
System Boundary #
In Scope #
- Crawling private GitHub repositories via the GitHub MCP server (authenticated GitHub App token).
- Generating structured test cases from repository analysis.
- Generating Playwright (`.spec.ts`) or Maestro (`.yaml`) test files.
- Opening a pull request on the target repository via the Git Trees API.
- Persisting run state, stage state, artifacts, and approval records to Postgres.
- Per-user sliding-window rate limiting and per-repo concurrency control via Valkey.
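The per-repo concurrency control can be sketched as a token-checked lock. This is an in-memory stand-in: production uses a Valkey key (`amtp:rl:repo:{repo}:pr_lock`); the class and method names are illustrative.

```typescript
// In-memory sketch of the per-repo PR serialization lock. A repo can
// hold at most one in-flight PR creation; only the holder's token may
// release the lock, mirroring the usual compare-then-delete rule used
// with Redis/Valkey-style locks.
class RepoPrLock {
  private holders = new Map<string, string>();

  acquire(repo: string, token: string): boolean {
    if (this.holders.has(repo)) return false; // another PR is in flight
    this.holders.set(repo, token);
    return true;
  }

  release(repo: string, token: string): boolean {
    if (this.holders.get(repo) !== token) return false; // not the holder
    this.holders.delete(repo);
    return true;
  }
}
```

A Valkey-backed version would additionally attach a TTL so a crashed worker cannot hold a repo's lock indefinitely.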
Explicit Non-Goals #
- Test execution. AMTP never runs the generated test suite.
- CI integration. AMTP does not trigger or monitor the repository’s own CI pipeline.
- Merging pull requests. All merge decisions are delegated to humans and branch-protection rules.
- Force-pushing. All Git writes use the non-destructive Trees API; no refs are force-updated.
- Automated re-runs on a stale base branch. A `StaleBaseBranch` failure terminates the run as `'failed'`; recovery requires a new external webhook triggering a new `run_id`.
- Bypassing branch protection. The provisioned GitHub App token holds no `bypass_branch_protections` permission.
Supporting Infrastructure Summary #
| Component | Technology | Role | Status |
|---|---|---|---|
| PostgreSQL 15 | `postgres:15-alpine` | Primary persistence: runs, stages, artifacts, approvals | Implemented |
| PgBouncer 1.22 | `edoburu/pgbouncer:1.22.1` | Connection pooling in transaction mode | Implemented |
| Flyway 10 | `flyway/flyway:10-alpine` | Schema migration management (V1–V6) | Implemented |
| Valkey 8.0 | `valkey/valkey:8.0` | MCP cache, rate-limit accounting, PR serialization lock | Implemented |
| Healthcheck | Node.js 20 / Express 4 / ioredis 5 | Valkey write-availability probe; exposes `GET /health` | Implemented |
| Garage S3 | Garage (S3-compatible API), port 3900 | Test bundle storage, 7-day retention | Planned |
| Temporal | Temporal server + workers | Deterministic workflow orchestration | Planned |
| LLM Agents (3) | Repo Crawler, Test Case Generator, Test Engineer | Stateless LLM invocations with JSON Schema contracts | Planned |
| Observability | OTel Collector · Tempo 2.4 · Prometheus 2.52 · Grafana 10.4 | LLM trace ingestion, token-accounting metrics, alerting | Implemented |
The Observability plane is implemented and operational. LLM agents emit structured OpenTelemetry spans and token-count metrics through an OTel Collector, which routes traces to Grafana Tempo and metrics to Prometheus. A pre-built LLM Token Accounting dashboard in Grafana provides real-time token usage, P95 latency, and call-rate panels. See Observability for the full pipeline reference.
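The dashboard's roll-up of token usage and P95 latency can be sketched as a plain aggregation. In production these figures arrive as OpenTelemetry metrics through the Collector; the record shape and function names below are illustrative.

```typescript
// Sketch of the token-accounting roll-up behind the Grafana panels:
// per-call token counts and latencies aggregated into totals and a P95.
interface LlmCall {
  agent: string; // e.g. "repo_crawler" (illustrative label)
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

// Nearest-rank P95: the value at or above 95% of observed latencies.
function p95(latencies: number[]): number {
  const sorted = [...latencies].sort((a, b) => a - b);
  const idx = Math.ceil(sorted.length * 0.95) - 1;
  return sorted[Math.max(0, idx)];
}

function summarize(calls: LlmCall[]) {
  const totalTokens = calls.reduce(
    (sum, c) => sum + c.inputTokens + c.outputTokens,
    0,
  );
  return {
    totalTokens,
    p95LatencyMs: p95(calls.map((c) => c.latencyMs)),
    callCount: calls.length,
  };
}
```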