Concepts & Primer
If you are new to this codebase, read this page first. It explains every specialized technology and concept used by AMTP in plain language, in the order you will encounter them. Each section follows the same pattern: what it is, why AMTP uses it, and where to read the implementation details.
How to Read These Docs #
The documentation is split across ten pages, each focused on one layer of the system. If you are reading for the first time, the recommended order is: Concepts (this page) → Architecture → Agents → Orchestration → Infrastructure → GitHub MCP → CI/CD → Observability → Deployment → Data Model.
Two implementation states appear throughout the docs, marked with coloured callout banners:
- Implemented — the code exists in the repository, the service runs in Docker Compose, and CI/CD is wired up. The GitHub MCP server, the entire infrastructure stack, and the observability stack are in this category.
- Planned / Future state — the architecture is fully specified and documented as the intended design, but the code does not yet exist in the repository. The three LLM agents and the Temporal orchestrator are in this category.
When a page says “future state” it means: this is what we are building toward; the documentation is a specification, not a description of running code.
The Mental Model #
AMTP is a test-generation pipeline. A user connects a private GitHub repository; the system produces a pull request containing Playwright or Maestro test code. Nothing is executed inside AMTP. The pipeline ends when the PR is open on GitHub — reviewing, merging, and running the tests is left to the humans and the repository’s own CI.
Internally, the pipeline works through three isolated LLM agents that hand off structured JSON to one another: the Repo Crawler analyses the codebase, the Test Case Generator turns that analysis into test scenarios, and the Test Engineer writes the actual test code. A deterministic orchestrator (Temporal) sequences the agents, handles retries, and keeps durable state so a partial failure does not lose work.
Test Generation, Not Execution #
AMTP generates tests in two frameworks:
- Playwright — a browser automation library made by Microsoft. Tests are TypeScript files (.spec.ts) that launch a real browser, navigate pages, click elements, and assert outcomes. Playwright tests are suitable for web applications. AMTP produces .spec.ts files; the target repository’s own CI pipeline runs them.
- Maestro — a mobile UI testing framework that describes interactions in .yaml flow files. It is used for iOS and Android applications. AMTP produces .yaml flow files; the target repository’s own Maestro CLI runs them.
The separation matters: AMTP never installs a browser, never runs npx playwright test, never touches the test results. It is a code-generation service, not a test runner. This keeps the system boundary clean and means AMTP does not need access to any staging environment.
LLM Agents & Stateless Agents #
An LLM agent is a program that decides what to do by asking a Large Language Model (like GPT-4 or Claude) and acts on the answer. In AMTP, each agent is given a system prompt that describes its role, then receives a structured JSON payload as its input, and is expected to respond with a different structured JSON payload as its output. The LLM is the decision-maker; the surrounding TypeScript code validates, retries, and persists the results.
The word stateless describes how each agent is invoked: it starts with a completely blank conversation history. There is no “memory” of the previous agent’s messages, no tool-call history, and no shared in-memory objects. This is enforced by a hard context reset: before the next agent starts, the worker process hosting the previous agent is torn down entirely. The next agent receives only its static system prompt and the upstream agent’s validated JSON output injected as the first user message — nothing else crosses the boundary.
This design has two important benefits:
- Reproducibility. Given the same input JSON and the same temperature (≤ 0.2), two runs produce equivalent outputs. There is no accumulated context drift from earlier messages influencing later ones.
- Isolated failure domains. A hallucination or bad output from the Test Case Generator cannot corrupt the Repo Crawler’s already-persisted artifact. Failures are caught at the schema-validation step before the bad data spreads downstream.
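To make the stateless contract concrete, here is a minimal sketch of what one invocation might look like. This is illustrative only: the client library, model name, and function shape are assumptions, not the actual agent implementation.

```ts
// Hypothetical sketch of one stateless agent invocation.
// The only inputs are the agent's static system prompt and the validated
// JSON artifact produced by the upstream agent -- no prior messages survive.
import OpenAI from "openai";

const client = new OpenAI();

async function invokeAgent(systemPrompt: string, upstreamArtifact: unknown): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o",   // illustrative model name
    temperature: 0.2,  // low temperature for reproducibility
    messages: [
      { role: "system", content: systemPrompt },
      // The upstream agent's validated JSON output is the first and only user message.
      { role: "user", content: JSON.stringify(upstreamArtifact) },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```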
Read more: Agent Boundaries & Contracts and Stateless LLM Contract.
Model Context Protocol (MCP) #
Model Context Protocol (MCP) is an open standard that lets an LLM agent call external tools in a structured way. Instead of writing one-off HTTP clients inside each agent, you wrap the external service (e.g. “fetch the GitHub repository file tree”) as an MCP tool with a defined name, input schema, and output schema. The agent then calls the tool by name; it does not need to know how GitHub’s API works.
Under the hood, MCP uses JSON-RPC 2.0 — a lightweight remote-procedure-call protocol that sends plain JSON objects. An MCP server exposes two key methods:
- tools/list — returns the catalogue of available tools and their input schemas so the LLM knows what it can call.
- tools/call — executes one tool with the provided arguments and returns the result.
A minimal tools/call request looks like this:
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "repo.tree",
"arguments": {
"repo": "acme-corp/frontend",
"ref": "main",
"recursive": true
}
}
}
AMTP’s GitHub MCP server uses the Streamable HTTP transport. This means each MCP call is an ordinary HTTP POST request to a running HTTP server. The server is stateless at the MCP layer: a new server instance is created for each incoming HTTP request and discarded when the response closes. This is different from the older stdio transport (used in local desktop tools), where the MCP server runs as a subprocess and communication happens over standard input/output pipes.
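From the client side, a Streamable HTTP call is therefore just an ordinary HTTP POST carrying the JSON-RPC body shown above. A minimal sketch follows; the endpoint path /mcp is an assumption, and the server address reuses the Compose service name mentioned later in these docs.

```ts
// Sketch: one MCP tools/call over Streamable HTTP.
// The URL and endpoint path are illustrative, not confirmed server configuration.
async function callRepoTree(repo: string, ref: string) {
  const response = await fetch("http://github-mcp:8090/mcp", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name: "repo.tree", arguments: { repo, ref, recursive: true } },
    }),
  });
  return response.json(); // JSON-RPC result envelope containing the tool output
}
```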
Why does AMTP use an MCP abstraction layer at all? Because it decouples the agents from vendor-specific APIs. Each agent calls a named tool; the underlying implementation (GitHub REST API, Octokit library, Valkey cache, auth tokens) is entirely invisible to the LLM. If the GitHub API changes or we swap the cache backend, we change one MCP server, not every agent.
Read more: GitHub MCP Server.
Temporal in 5 Minutes #
Temporal is a durable workflow engine. You write your business logic as a normal function (called a workflow), and Temporal guarantees it will run to completion even if the server crashes, the network drops, or an external API returns an error.
The two core concepts are:
- Workflow — the top-level function that describes the steps of a process in sequential code. In AMTP this is TestGenerationWorkflow. Workflows must be deterministic: given the same sequence of events, a re-executed workflow must produce the same decisions. This means no Date.now(), no Math.random(), and no direct I/O inside a workflow function. All side-effects go into activities (a minimal sketch of this split follows the list).
- Activity — a function that does real work: calling an LLM, writing to Postgres, calling the GitHub API. Activities are allowed to fail and be retried. Temporal records each activity completion in an event history, so after a crash it can replay the workflow and skip activities that already succeeded.
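Here is a hedged sketch of how that split might look with the Temporal TypeScript SDK. The activity names, timeouts, and retry numbers are illustrative, not the real TestGenerationWorkflow implementation.

```ts
// Hypothetical sketch of a deterministic Temporal workflow.
// All real work (LLM calls, DB writes, GitHub calls) lives in activities.
import { proxyActivities } from "@temporalio/workflow";
import type * as activities from "./activities"; // illustrative activities module

const { crawlRepo, generateTestCases, writeTests } = proxyActivities<typeof activities>({
  startToCloseTimeout: "10 minutes",
  retry: { maximumAttempts: 5, backoffCoefficient: 2 }, // exponential back-off
});

export async function TestGenerationWorkflow(input: { repo: string; ref: string }) {
  // Sequential, deterministic orchestration: no Date.now(), no Math.random(), no I/O.
  const analysis = await crawlRepo(input);
  const scenarios = await generateTestCases(analysis);
  return await writeTests(scenarios);
}
```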
Why not just use a cron job or a Postgres-backed queue? Three reasons:
- Durability. A cron job that crashes mid-run leaves partial state with no record of where it stopped. Temporal’s event history lets it resume from the exact last successful activity.
- Long-running waits. AMTP has an approval gate that can pause a workflow for up to 7 days waiting for a human to approve. Temporal handles this natively. A cron-based approach would require polling and timeout management in application code.
- Retry semantics. Temporal applies configurable retry policies with exponential back-off automatically. An activity that fails is retried transparently; the workflow code does not need try/catch for transient failures.
Read more: Temporal Orchestration.
GitHub App vs Personal Access Token #
When a service needs to access the GitHub API it must authenticate. There are two common approaches, and AMTP uses a GitHub App:
- Personal Access Token (PAT) — a long-lived token tied to a specific GitHub user account. If that user leaves the organisation, the token stops working. PATs are simple to set up but create a dependency on an individual’s account.
- GitHub App — an independent entity registered on GitHub that is installed into one or more organisations or repositories. Tokens are short-lived (an installation token expires after at most one hour), scoped to exactly the permissions the app was granted, and are not tied to any human user account. This is the production-grade choice for automated services.
The authentication flow for a GitHub App works in three steps:
- App private key → JWT. The GitHub App is provisioned with an RSA private key (the PEM file). At runtime, the server uses this key to sign a short-lived JSON Web Token (JWT) that identifies the app to GitHub. The JWT is valid for a maximum of 10 minutes.
- JWT → installation access token. Using the JWT, the server calls the GitHub API to obtain an installation access token — a temporary credential (valid for up to one hour) scoped to a specific installation of the app. This is the token that is actually used to call the repository API.
- Installation access token → API call. All GitHub REST API calls (fetching trees, reading files, creating blobs) use this token in the Authorization header. The token is managed automatically by Octokit, the official GitHub API client library for JavaScript. Octokit refreshes the installation token before it expires without any application-level code.
In AMTP, the PEM private key is never stored as an environment variable. It is provisioned as a Docker secret and mounted read-only at /run/secrets/github_app_key inside the container. This ensures the key does not appear in docker inspect output or container logs.
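A hedged sketch of how the three steps collapse into a few lines with Octokit, assuming the key is read from the Docker secret path. The environment variable names and the example repository are illustrative.

```ts
// Sketch: authenticate as a GitHub App installation using Octokit.
// createAppAuth handles the private-key -> JWT -> installation-token exchange
// and refreshes the installation token before it expires.
import { readFileSync } from "node:fs";
import { Octokit } from "@octokit/rest";
import { createAppAuth } from "@octokit/auth-app";

const octokit = new Octokit({
  authStrategy: createAppAuth,
  auth: {
    appId: Number(process.env.GITHUB_APP_ID),                   // illustrative env var
    installationId: Number(process.env.GITHUB_INSTALLATION_ID), // illustrative env var
    privateKey: readFileSync("/run/secrets/github_app_key", "utf8"), // Docker secret mount
  },
});

// Every call below carries a short-lived installation token in the Authorization header.
const { data } = await octokit.rest.repos.get({ owner: "acme-corp", repo: "frontend" });
```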
Read more: GitHub MCP → Authentication.
GitHub Trees API vs Contents API #
GitHub exposes two APIs for reading and writing repository content. AMTP deliberately uses the Git Trees API for all writes and avoids the simpler Contents API for that purpose.
The Contents API (PUT /repos/:owner/:repo/contents/:path) writes a single file to an existing branch by creating a commit directly on that branch. It is convenient but destructive: if two concurrent processes write to the same branch they can overwrite each other’s changes, and it requires working directly on an existing ref.
The Git Trees API lets you assemble a commit from scratch using immutable content-addressed objects. Every Git object is identified by the SHA-1 hash of its content, which means the same content always produces the same SHA and objects are never mutated — only new ones are created. The four-step sequence AMTP uses is:
- Create Blob → returns blob SHA. For each generated test file, POST the file content to /repos/:owner/:repo/git/blobs. GitHub stores the content and returns a 40-character hex SHA (e.g. a1b2c3d4…). Nothing in the repository has changed yet — the blob is an orphaned object.
- Create Tree → returns tree SHA. POST to /repos/:owner/:repo/git/trees with a list of entries, each pairing a file path with the blob SHA from step 1. GitHub assembles a tree object and returns its SHA. This tree describes the directory structure of the new PR branch but still does not affect any existing branch.
- Create Commit → returns commit SHA. POST to /repos/:owner/:repo/git/commits with the tree SHA from step 2 and the current commit SHA of the base branch as the parent. GitHub creates a new commit object and returns its SHA. This commit exists in the object database but is not yet reachable from any branch reference.
- Update Reference (no force-push). POST to /repos/:owner/:repo/git/refs to create a new branch reference pointing at the commit SHA from step 3. Because this creates a new reference rather than overwriting an existing one, it is completely non-destructive — main is never touched. AMTP’s GitHub App token holds no bypass_branch_protections permission, making a force-push impossible by design.
After these four steps, a standard pull request is opened targeting base_branch. The entire write operation is atomic from GitHub’s perspective: either all objects are created and the branch reference is set, or nothing visible in the repository changes.
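Expressed with Octokit, the whole sequence might look like the following sketch. The owner, repo, branch name, and commit message are illustrative, and the `files` map stands in for the generated test code.

```ts
// Sketch: assemble a commit from blobs and trees, then open a PR.
// Assumes `octokit` is an installation-authenticated client (see the auth sketch above).
import { Octokit } from "@octokit/rest";

async function openTestPr(octokit: Octokit, files: Record<string, string>) {
  const owner = "acme-corp", repo = "frontend", base = "main";

  // Resolve the current tip of the base branch and its tree.
  const { data: baseRef } = await octokit.rest.git.getRef({ owner, repo, ref: `heads/${base}` });
  const { data: baseCommit } = await octokit.rest.git.getCommit({ owner, repo, commit_sha: baseRef.object.sha });

  // 1. Create one blob per generated file.
  const tree = await Promise.all(
    Object.entries(files).map(async ([path, content]) => {
      const { data: blob } = await octokit.rest.git.createBlob({ owner, repo, content, encoding: "utf-8" });
      return { path, mode: "100644" as const, type: "blob" as const, sha: blob.sha };
    }),
  );

  // 2. Create a tree layered on top of the base branch's tree.
  const { data: newTree } = await octokit.rest.git.createTree({ owner, repo, base_tree: baseCommit.tree.sha, tree });

  // 3. Create a commit whose parent is the current tip of the base branch.
  const { data: commit } = await octokit.rest.git.createCommit({
    owner, repo, message: "test: add generated tests", tree: newTree.sha, parents: [baseRef.object.sha],
  });

  // 4. Create a NEW branch reference -- never force-updating an existing one.
  const head = "amtp/generated-tests"; // illustrative branch name
  await octokit.rest.git.createRef({ owner, repo, ref: `refs/heads/${head}`, sha: commit.sha });

  return octokit.rest.pulls.create({ owner, repo, title: "Generated tests", head, base });
}
```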
Read more: Architecture → End-to-End Flow.
Postgres, PgBouncer & Flyway #
AMTP uses PostgreSQL 15 as its primary persistent store. Every run, stage, artifact, project registration, and approval record lives in Postgres. It is the single source of truth for pipeline state.
Application services do not connect directly to Postgres. Instead they connect through PgBouncer, a lightweight connection pooler. The reason: Postgres allocates a dedicated OS process per connection. Opening a new connection takes ~1–5 ms and holds memory for as long as the connection lives. A pool of short-lived connections from many container replicas would exhaust Postgres’s connection limit quickly. PgBouncer maintains a small pool of long-lived backend connections and multiplexes many short application connections through them. AMTP uses transaction pooling mode, which means a backend connection is only held for the duration of a transaction, not the entire application session — the most efficient mode.
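For example, an application service points its connection pool at PgBouncer rather than at Postgres itself. The sketch below uses node-postgres; the host, port, credentials, and table names are illustrative.

```ts
// Sketch: connect through PgBouncer (port 6432) instead of Postgres (5432) directly.
// The app-side pool stays small; PgBouncer multiplexes these connections over a
// handful of long-lived backend connections in transaction pooling mode.
import { Pool } from "pg";

const pool = new Pool({
  host: "pgbouncer",              // Compose service name, illustrative
  port: 6432,
  database: "amtp",
  user: "amtp_app",
  password: process.env.PGPASSWORD,
  max: 10,
});

export async function recordRun(projectId: string): Promise<void> {
  // Each query borrows a backend connection only for the duration of the
  // (implicit) transaction, then hands it back to PgBouncer's pool.
  await pool.query("INSERT INTO runs (project_id, status) VALUES ($1, $2)", [projectId, "pending"]);
}
```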
Schema changes are managed by Flyway, a database migration tool. Flyway applies versioned SQL scripts in numeric order: V1__projects.sql, V2__runs.sql, and so on. Each script is applied exactly once; Flyway records which scripts have run in a flyway_schema_history table. This ensures every environment (dev, staging, production) reaches exactly the same schema state deterministically. One migration in AMTP (V6__indexes.sql) is annotated with executeInTransaction=false because CREATE INDEX CONCURRENTLY cannot run inside a transaction in Postgres.
Read more: Infrastructure → PostgreSQL, Data Model.
Valkey — Cache, Rate-Limit & Lock #
Valkey is an open-source in-memory key-value store and a community-maintained fork of Redis. It stores data entirely in RAM, which makes reads and writes orders of magnitude faster than a relational database for simple lookups. AMTP uses it for three distinct purposes: caching GitHub API responses, enforcing rate limits, and serialising pull-request creation.
LRU eviction means that when Valkey’s memory cap (512 MB in AMTP) is reached, it automatically evicts the least-recently-used keys. This is safe for a cache because the original data (GitHub API responses) can always be re-fetched if a key is missing. Zero persistence is configured, so Valkey data does not survive a container restart — this is intentional because the cache is a performance optimisation, not a source of truth.
TTL (Time-To-Live) is a per-key expiry. Every key stored in Valkey has a TTL after which Valkey deletes it automatically. For example, a cached repository tree is stored for 600 seconds; a cached file blob for 3600 seconds. Setting a TTL prevents stale data from accumulating indefinitely.
Sliding-window rate limiting is the technique AMTP uses to cap how many pipeline runs a user can trigger per unit time. Here is a concrete example: suppose the limit is 5 runs per 60 seconds. Each time a user triggers a run, a timestamped entry is added to a sorted set in Valkey keyed by user ID. Before allowing the run, the system counts how many entries fall within the last 60-second window. If the count has already reached 5, the run is rejected. Entries older than 60 seconds are discarded. The “window slides” with real time rather than resetting at a fixed clock boundary (which would allow bursting right around the reset point).
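A hedged sketch of that check using ioredis and a sorted set. The key name, host, and limits illustrate the approach rather than the exact production values.

```ts
// Sketch: sliding-window rate limit backed by a Valkey sorted set.
// Each run attempt is stored with its timestamp as the score; the window
// "slides" because old entries are trimmed by score, not by a fixed clock reset.
import Redis from "ioredis";

const valkey = new Redis({ host: "valkey", port: 6379 }); // illustrative host

export async function allowRun(userId: string, limit = 5, windowMs = 60_000): Promise<boolean> {
  const key = `amtp:rl:user:${userId}:runs`; // illustrative key name
  const now = Date.now();

  await valkey.zremrangebyscore(key, 0, now - windowMs); // drop entries outside the window
  const inWindow = await valkey.zcard(key);
  if (inWindow >= limit) return false;

  await valkey.zadd(key, now, `${now}:${Math.random()}`); // record this attempt
  await valkey.pexpire(key, windowMs);                    // let idle keys expire on their own
  return true;
}
```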
The per-repo PR lock (amtp:rl:repo:{repo}:pr_lock) is a Valkey key that acts as a distributed mutex. Before creating a pull request, the Temporal activity acquires this lock (held for up to 120 seconds). This prevents two concurrent runs on the same repository from opening duplicate PRs simultaneously.
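The lock itself can be expressed as a single atomic SET with NX (only if the key does not exist) and PX (automatic expiry). A sketch, with the key shape copied from above and token handling simplified:

```ts
// Sketch: acquire the per-repo PR lock as a distributed mutex in Valkey.
// SET ... NX PX is atomic: only one caller can create the key, and the key
// expires after 120 s so a crashed worker cannot hold the lock forever.
import Redis from "ioredis";
import { randomUUID } from "node:crypto";

const valkey = new Redis({ host: "valkey", port: 6379 }); // illustrative host

export async function acquirePrLock(repo: string): Promise<string | null> {
  const token = randomUUID();
  const ok = await valkey.set(`amtp:rl:repo:${repo}:pr_lock`, token, "PX", 120_000, "NX");
  return ok === "OK" ? token : null; // null means another run currently holds the lock
}
```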
Read more: Infrastructure → Valkey.
The OpenTelemetry Stack #
OpenTelemetry (OTel) is a vendor-neutral standard for collecting telemetry data from applications: traces, metrics, and logs. AMTP uses it to track every LLM call, measure token usage, and expose latency data without being tied to any specific monitoring vendor.
The key concepts:
- Span — a single unit of work with a start time, end time, and a bag of key-value attributes. For example, one tools/call to the GitHub MCP server produces a span named mcp.tools/call with attributes like the tool name, the repo, and the response status (a sketch follows this list).
- Trace — a tree of spans representing the full path of one request or pipeline run, from entry point to completion. A single AMTP run produces a trace that contains spans for each LLM call, each MCP tool invocation, and each database write.
- Metric — an aggregated numeric measurement recorded over time: for example, a histogram of LLM input/output token counts per agent per model. Metrics are cheaper to store than traces and are suitable for alerting.
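Here is a hedged sketch of creating such a span with the OpenTelemetry Node API. The wrapper function and attribute keys are illustrative, not the server's actual instrumentation.

```ts
// Sketch: wrap one MCP tool invocation in a span using the OpenTelemetry API.
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("github-mcp"); // illustrative instrumentation name

export async function tracedToolCall<T>(toolName: string, repo: string, fn: () => Promise<T>): Promise<T> {
  return tracer.startActiveSpan("mcp.tools/call", async (span) => {
    span.setAttribute("mcp.tool.name", toolName); // attribute keys are illustrative
    span.setAttribute("github.repo", repo);
    try {
      return await fn();
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end(); // closes the span and hands it to the exporter
    }
  });
}
```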
The signal flow through the AMTP observability stack:
- OpenTelemetry SDK — embedded in each service (the GitHub MCP server embeds the Node SDK). It creates spans and metrics, then exports them via OTLP (OpenTelemetry Protocol), a gRPC or HTTP wire format, to the collector.
- OTel Collector — a standalone process that receives OTLP data, applies processors (e.g. adding metadata, batching), then routes traces to Tempo and metrics to Prometheus. The Collector is the central fan-out point.
- Grafana Tempo — a distributed trace storage backend. It stores full traces and makes them queryable using TraceQL, a query language specifically for traces (e.g. “show me all traces where an LLM call took longer than 5 seconds”).
- Prometheus — a time-series metrics database. It scrapes metric endpoints on a regular interval and makes them queryable using PromQL (e.g. “sum of input tokens per model over the last hour”).
- Grafana — a visualisation dashboard that connects to both Tempo and Prometheus as data sources. AMTP ships a pre-built LLM Token Accounting dashboard showing real-time token usage, P95 latency, and call rates.
Read more: Observability.
CI/CD — Self-hosted GitHub Actions #
GitHub Actions is GitHub’s built-in CI/CD platform. A workflow is a YAML file in .github/workflows/ that defines a series of jobs triggered by events (a push, a pull request, a manual dispatch). Each job runs on a runner — a machine that executes the job steps.
By default, GitHub provides cloud-hosted runners (Ubuntu, Windows, macOS). AMTP uses a self-hosted runner instead. A self-hosted runner is a machine you manage yourself, registered with your GitHub repository. AMTP’s runner is a RHEL server on the same private network as the Docker Compose stack. This means CI/CD jobs can run Docker Compose commands, access the Postgres and Valkey containers, and apply Flyway migrations without any network tunnelling.
The runner is identified by its runs-on label. In workflow YAML you write:
runs-on: [self-hosted, Linux, X64, amtp-dev]
This tells GitHub Actions: only send this job to a runner that has all four of those labels. The amtp-dev label is the custom label registered for AMTP’s runner.
AMTP uses a root-orchestrator + child workflow pattern. A single root workflow (ci-cd.yml) is triggered by a push and fans out to five reusable child workflows using the workflow_call event. This avoids duplicating runner configuration, secrets, and steps across every child workflow.
Secrets (API keys, tokens, passwords) are stored in the GitHub repository’s settings under “Secrets and variables”. They are injected into workflow steps as environment variables at runtime and are never visible in logs. Variables are non-sensitive configuration values stored in the same place but without encryption, suitable for things like port numbers or feature flags.
Read more: CI/CD & Self-hosted Runner.
Docker Compose, Networks & Secrets #
Docker Compose is a tool for defining and running multi-container applications. You declare each service (Postgres, Valkey, the GitHub MCP server, etc.) in a docker-compose.yml file with its image, environment variables, port bindings, and dependencies. Running docker compose up starts all services in the right order.
AMTP defines two Compose projects: the main application stack (docker-compose.yml) and the observability stack (docker-compose.observability.yml). A Compose project is a logical grouping of services that share a network namespace. Services within the same project can reach each other by their service name (e.g. http://github-mcp:8090) because Docker creates a virtual DNS record for each service name on the shared network.
The shared network is called amtp_net and is a Docker bridge network. The observability stack joins amtp_net as an external network, which is why the OTel Collector can receive spans from the GitHub MCP server even though they are in different Compose projects.
Docker secrets are a mechanism for passing sensitive data to containers without embedding it in environment variables. You declare a secret in the Compose file pointing to a file on the host; Docker mounts that file read-only at /run/secrets/<secret-name> inside the container. The advantage over an environment variable is that Docker secrets are never visible in docker inspect or in /proc/<pid>/environ, reducing the risk of accidental exposure in logs or container metadata. AMTP mounts the GitHub App private key as a Docker secret.
Read more: Infrastructure, Deployment Runbook.
Schema Validation — Zod & JSON Schema Draft 2020-12 #
When an LLM agent produces output, or when an MCP tool receives input, AMTP validates the data against a strict schema before using it. This catches hallucinations and malformed responses at the boundary, before they propagate to downstream stages.
Two validation tools are used, each suited to a different context:
- Zod — a TypeScript-first schema library. You define a schema as TypeScript code (e.g. z.object({ repo: z.string(), recursive: z.boolean() })) and call schema.parse(data) at runtime. Zod throws a typed error with detailed messages if validation fails. It is used in the TypeScript services (the GitHub MCP server) where compile-time and runtime types should agree. The benefit: your TypeScript types and your runtime validation are the same source of truth — no drift (a fuller sketch follows this list).
- JSON Schema Draft 2020-12 — a language-agnostic standard for describing the structure of a JSON document. It is used to document the contracts between LLM agents: the output schema of Agent 1 is the input schema of Agent 2. Being language-agnostic means the contracts can be read, validated, and generated by any language or tool, not just TypeScript.
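A slightly fuller Zod sketch showing the single-source-of-truth point: the runtime validator and the static TypeScript type come from the same definition. The schema shape reuses the repo.tree example from above in simplified form.

```ts
// Sketch: one schema serves as both runtime validation and static type.
import { z } from "zod";

const RepoTreeInput = z.object({
  repo: z.string(),
  ref: z.string().default("main"),
  recursive: z.boolean(),
});

// The static type is derived from the schema -- no separate interface to drift.
type RepoTreeInput = z.infer<typeof RepoTreeInput>;

export function parseRepoTreeInput(data: unknown): RepoTreeInput {
  const result = RepoTreeInput.safeParse(data);
  if (!result.success) {
    // result.error.issues lists every field that failed validation and why.
    throw new Error(`Invalid repo.tree input: ${result.error.message}`);
  }
  return result.data;
}
```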
Read more: Agent Boundaries & Contracts, GitHub MCP → repo.tree.
p-limit, Retry & Octokit Resiliency #
When the GitHub MCP server processes a repository tree it may need to fetch hundreds of blobs concurrently (one per file). Sending all requests at the same time would likely trigger GitHub’s secondary rate limits — restrictions on the number of concurrent requests in a short window (distinct from the primary API rate limit of requests per hour).
p-limit is a small Node.js library that enforces a maximum concurrency. p-limit(10) creates a limiter that allows at most 10 promises to run simultaneously. Every Octokit API call in the GitHub MCP server passes through this limiter, ensuring GitHub never sees more than 10 in-flight requests from AMTP at any moment.
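As a sketch, wrapping blob fetches in the limiter looks roughly like this; the helper function and file list are illustrative.

```ts
// Sketch: cap concurrent GitHub blob fetches at 10 with p-limit.
import pLimit from "p-limit";
import { Octokit } from "@octokit/rest";

const limit = pLimit(10); // at most 10 in-flight requests at any moment

async function fetchBlobs(octokit: Octokit, owner: string, repo: string, shas: string[]) {
  // Every call is queued through the limiter; the 11th waits until a slot frees up.
  return Promise.all(
    shas.map((file_sha) => limit(() => octokit.rest.git.getBlob({ owner, repo, file_sha }))),
  );
}
```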
Octokit (the official GitHub JavaScript SDK) adds two more layers of resiliency automatically when configured:
- Retry plugin — automatically retries failed requests with exponential back-off up to a configured maximum number of attempts. Transient network errors and server-side 5xx responses are retried transparently.
- Throttle plugin — detects GitHub’s rate-limit response headers (X-RateLimit-Remaining, Retry-After) and pauses outgoing requests until the limit resets, instead of failing immediately.
The combination of p-limit(10), Octokit retry, and Octokit throttle means the GitHub MCP server is resilient to temporary GitHub unavailability and self-regulating with respect to rate limits — all without application-level retry loops.
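A hedged sketch of the client configuration that turns these layers on; the auth value and handler bodies are simplified placeholders.

```ts
// Sketch: an Octokit client with the retry and throttling plugins enabled.
import { Octokit } from "@octokit/rest";
import { retry } from "@octokit/plugin-retry";
import { throttling } from "@octokit/plugin-throttling";

const ResilientOctokit = Octokit.plugin(retry, throttling);

const octokit = new ResilientOctokit({
  auth: process.env.GITHUB_TOKEN, // illustrative; AMTP uses App installation auth instead
  throttle: {
    // Called when GitHub signals the primary rate limit; wait and retry up to twice.
    onRateLimit: (_retryAfter, _options, _client, retryCount) => retryCount < 2,
    // Called on secondary (abuse) rate limits; pause until the limit clears.
    onSecondaryRateLimit: () => true,
  },
});
```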
Read more: GitHub MCP → Rate Limiting & Resiliency.
Glossary #
Alphabetical quick-reference. Each term links to its full explanation above.
| Term | Definition |
|---|---|
| Blob SHA | The 40-character SHA-1 hash GitHub returns when you POST file content to the Git blobs endpoint. |
| Bridge network | A Docker virtual network on which containers resolve each other by service name via built-in DNS. |
| Compose project | A logical grouping of Docker services defined in one docker-compose.yml file that share a network namespace. |
| Context window | The maximum number of tokens an LLM can process in one call, including both the prompt and the response. |
| Docker secret | Sensitive data mounted read-only inside a container at /run/secrets/, invisible to docker inspect and container logs. |
| Deterministic (workflow) | A Temporal workflow that, given the same event history, always makes the same decisions with no random or time-dependent code. |
| Flyway | A database migration tool that applies versioned SQL scripts in order and tracks which scripts have already run. |
| GitHub App | An identity registered on GitHub that authenticates with short-lived installation tokens, independent of any user account. |
| Grafana | A visualisation platform that connects to Tempo and Prometheus to render dashboards and fire alerts. |
| Hard context reset | The act of tearing down the LLM worker process after each agent boundary so no messages or state carry over to the next agent. |
| Idempotency | The property that running an operation twice with the same inputs produces the same result as running it once. |
| Installation token | A short-lived GitHub API credential scoped to a specific App installation, obtained by exchanging a JWT. |
| ioredis | A Node.js client library for Redis-compatible stores including Valkey, used by the GitHub MCP server to read and write the cache. |
| JSON Schema Draft 2020-12 | A language-agnostic specification for describing the structure and constraints of a JSON document. |
| JWT (JSON Web Token) | A signed token the GitHub App generates from its private key to prove its identity before exchanging it for an installation token. |
| LLM agent | A program that asks a Large Language Model to decide what to do and acts on the structured response. |
| LRU eviction | Valkey's memory-management policy that automatically removes the least-recently-used keys when the memory cap is reached. |
| Maestro | A mobile UI testing framework that drives iOS and Android apps using YAML flow files. |
| MCP (Model Context Protocol) | An open JSON-RPC standard for exposing external capabilities as named tools that LLM agents can call. |
| Octokit | The official GitHub JavaScript SDK that manages App authentication, request retries, and rate-limit throttling automatically. |
| OTel Collector | A standalone process that receives OTLP telemetry from services, applies processors, and routes traces to Tempo and metrics to Prometheus. |
| OTLP | OpenTelemetry Protocol, the gRPC or HTTP wire format used to transmit spans and metrics from the SDK to the Collector. |
| PAT (Personal Access Token) | A long-lived GitHub token tied to a specific user account, used for simple automation but not recommended for production services. |
| PgBouncer | A connection pooler that multiplexes many short-lived application connections through a small pool of long-lived Postgres backend connections. |
| Playwright | A browser automation library by Microsoft used to write end-to-end tests as TypeScript spec files. |
| p-limit | A Node.js library that enforces a maximum number of concurrently running async operations. |
| PromQL | The query language for Prometheus, used to filter and aggregate time-series metrics. |
| Prometheus | A time-series metrics database that scrapes metric endpoints on a schedule and stores numeric measurements for alerting and dashboards. |
| Self-hosted runner | A machine you register with GitHub Actions that executes CI/CD jobs instead of GitHub's cloud-hosted machines. |
| Span | A single unit of traced work with a start time, end time, and key-value attributes describing what happened. |
| Stateless agent | An LLM agent that starts each invocation with a blank conversation history and no memory of previous calls. |
| Streamable HTTP transport | An MCP transport mode where each tool call is an ordinary HTTP POST to a running server, with the server instance created and discarded per request. |
| Temporal | A durable workflow engine that guarantees a workflow function runs to completion even after crashes or transient failures. |
| Tempo | Grafana's distributed trace storage backend, queryable with TraceQL. |
| Trace | A tree of spans representing the full execution path of one request or pipeline run from start to finish. |
| TraceQL | The query language for Grafana Tempo, used to filter and inspect distributed traces. |
| Trees API | The GitHub API endpoint that assembles commits from content-addressed blob and tree objects without mutating any existing branch. |
| TTL (Time-To-Live) | A per-key expiry setting in Valkey after which the key is automatically deleted. |
| Valkey | An open-source in-memory key-value store (Redis fork) used in AMTP for caching, rate limiting, and PR serialisation locks. |
| Workflow (Temporal) | A deterministic function in Temporal that sequences activities and orchestrates the overall pipeline run. |
| amtp_net | The Docker bridge network shared by AMTP's application services and the observability stack. |
| Zod | A TypeScript-first schema validation library whose runtime checks and static types are derived from the same schema definition. |