Infrastructure

AMTP’s local infrastructure is defined in docker-compose.yml and docker-compose.override.yml. All services share the amtp_net bridge network. The override file restricts select port bindings to loopback (127.0.0.1) for local development. The observability stack runs as a separate Compose project and also attaches to amtp_net as an external network — see Observability.

Service Map #

| Service | Image | Host port | Container port | Status |
|---|---|---|---|---|
| postgres | postgres:15-alpine | 5432 (override only) | 5432 | Implemented |
| pgbouncer | edoburu/pgbouncer:1.22.1 | 6432 | 5432 | Implemented |
| flyway | flyway/flyway:10-alpine | – | – | Implemented (opt-in migrate profile) |
| valkey | valkey/valkey:8.0 | 6379 (override only) | 6379 | Implemented |
| healthcheck | Built from apps/healthcheck/Dockerfile | 8083 (override) / 8080 | 8080 | Implemented |
| github-mcp | Built from apps/github-mcp/Dockerfile | 8090 | 8090 | Implemented |
| docs | nginx:1.27-alpine | ${DOCS_HOST_PORT:-80} | 80 | Implemented |
| Garage S3 | Garage (S3-compatible) | 3900 | 3900 | Planned |

PostgreSQL 15 #

Primary persistence layer. Stores all run state, stage state, artifacts, approvals, and project registrations. migrations/sql/V1__projects.sql through migrations/sql/V6__indexes.sql define the complete schema.

Configuration #

postgres:
  image: postgres:15-alpine
  restart: unless-stopped
  environment:
    POSTGRES_DB:            ${POSTGRES_DB:-amtp}
    POSTGRES_USER:          ${POSTGRES_USER:-amtp}
    POSTGRES_PASSWORD:      ${POSTGRES_PASSWORD:-amtp}
    POSTGRES_INITDB_ARGS:   "--auth-host=scram-sha-256 --auth-local=scram-sha-256"
  command:
    - "postgres"
    - "-c" - "password_encryption=scram-sha-256"
    - "-c" - "max_connections=200"
    - "-c" - "shared_buffers=256MB"
    - "-c" - "work_mem=16MB"
    - "-c" - "log_min_duration_statement=500"
  volumes:
    - amtp_pgdata:/var/lib/postgresql/data
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-amtp} -d ${POSTGRES_DB:-amtp}"]
    interval: 5s
    timeout:  3s
    retries:  20
docker-compose.yml § postgres (L8–L39)

Tuning notes #

PgBouncer 1.22 #

Connection pooler in transaction mode. Applications connect to pgbouncer:5432 (internal) or localhost:6432 (override). PgBouncer multiplexes up to 500 client connections onto a default pool of 25 server connections.

Key parameters #

| Parameter | Value | Notes |
|---|---|---|
| POOL_MODE | transaction | Connection released to pool after each transaction, not each session. |
| MAX_CLIENT_CONN | 500 | Maximum simultaneous client connections. |
| DEFAULT_POOL_SIZE | 25 | Server connections per database/user pair. |
| RESERVE_POOL_SIZE | 5 | Extra connections for spikes; available after RESERVE_POOL_TIMEOUT = 3 s. |
| AUTH_TYPE | scram-sha-256 | Matches Postgres authentication method. |
| MAX_PREPARED_STATEMENTS | 100 | Limits prepared-statement caching in transaction mode. |
| SERVER_IDLE_TIMEOUT | 240 s | Idle server connections closed after 4 minutes. |

Note on prepared statements. Transaction-mode pooling is incompatible with session-persistent prepared statements. Applications must use named-prepare-per-transaction patterns or rely on the MAX_PREPARED_STATEMENTS caching mechanism.

docker-compose.yml § pgbouncer (L41–L67)
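The named-prepare-per-transaction pattern above can be sketched with a node-postgres-style client. `withTransaction` is an illustrative helper, not code from this repository; it assumes a `client.query()` with the node-pg signature:

```javascript
// Sketch of a transaction-scoped query helper suited to PgBouncer's
// transaction mode: every statement between BEGIN and COMMIT rides the
// same server connection, so per-transaction prepares stay valid.
async function withTransaction(client, fn) {
  await client.query("BEGIN");
  try {
    const result = await fn(client); // all statements share one server connection
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err; // connection returns to the pool after ROLLBACK
  }
}
```

Anything prepared outside such a block may land on a different server connection on the next query and fail.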

Flyway 10 (Migration Runner) #

Schema migrations are managed by Flyway. The service is defined under the opt-in migrate Compose profile; it does not start automatically with docker compose up. See the Deployment Runbook for explicit invocation instructions.

Configuration #

flyway.url=jdbc:postgresql://postgres:5432/amtp
flyway.user=amtp
flyway.schemas=public
flyway.locations=filesystem:/flyway/sql
flyway.baselineOnMigrate=true
flyway.validateOnMigrate=true
flyway.cleanDisabled=true
migrations/flyway.conf

Valkey 8.0 (Cache & Rate-Limit Layer) #

Single-node Valkey instance serving three purposes: MCP API response caching, per-repo concurrency control, and sliding-window user rate limiting. It is the core token-accounting infrastructure.

Eviction & Persistence Policy (Required Flags) #

valkey:
  image: valkey/valkey:8.0
  command:
    - "valkey-server"
    - "--save"         - ""           # disable RDB snapshots
    - "--appendonly"   - "no"         # disable AOF persistence
    - "--requirepass"  - "${VALKEY_PASSWORD}"
    - "--maxmemory"    - "512mb"      # hard memory cap
    - "--maxmemory-policy" - "allkeys-lru"  # evict LRU keys when cap is reached
docker-compose.yml § valkey (L83–L109)
| Flag | Value | Rationale |
|---|---|---|
| --maxmemory | 512mb | Hard upper bound for the in-memory store. Prevents unbounded memory growth. |
| --maxmemory-policy | allkeys-lru | Evicts least-recently-used keys across all namespaces when the cap is reached. Chosen over volatile-lru because rate-limit sorted-set keys (amtp:rl:user:*:runs) may not carry a TTL at the moment of eviction pressure. |
| --save | "" (empty; disables RDB) | Valkey is intentionally ephemeral. No RDB snapshot file is written. |
| --appendonly | no | AOF persistence disabled. Data is not durable across container restarts by design. |

OOM failure mode if allkeys-lru is omitted

Without --maxmemory-policy allkeys-lru, once maxmemory is reached Valkey rejects all write commands with:

OOM command not allowed when used memory > 'maxmemory'

The healthcheck’s PING still returns PONG in this state, so a PING-only probe will not detect the degradation. Application-layer SET, ZADD, and INCR calls will fail silently from the probe’s perspective. This is precisely why the healthcheck requires a synthetic write (see Healthcheck section below).
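A probe can make this failure mode explicit by inspecting the error from the synthetic write. `isOomRejection` and `probeWrite` are illustrative names, not repository code; only the error prefix comes from the text above:

```javascript
// Detect the write-rejection error above so memory pressure can be
// reported distinctly from an unreachable node.
function isOomRejection(err) {
  return /^OOM command not allowed/.test(err.message);
}

// Wrap the synthetic write and label the failure mode on the way out.
async function probeWrite(client) {
  try {
    await client.set("_amtp_health_ping", "1", "EX", 1);
    return { ok: true };
  } catch (err) {
    return { ok: false, mode: isOomRejection(err) ? "oom" : "unreachable" };
  }
}
```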

Key Namespaces & TTL Contract #

All keys are scoped under the amtp: root namespace. Reference: infra/valkey/NAMESPACES.md.

| Purpose | Key pattern | TTL |
|---|---|---|
| GitHub MCP repo.tree result cache | amtp:mcp:tree:{owner}:{repo}:{ref} | 600 s |
| GitHub MCP .gitignore blob cache | amtp:mcp:blob:{sha} | 3600 s |
| GitHub MCP absent .gitignore sentinel | amtp:mcp:null_gitignore:{owner}:{repo}:{ref}:{path} | 600 s |
| User rate limit (sliding window) | amtp:rl:user:{userId}:runs | 3600 s sliding |
| Repo concurrency counter | amtp:rl:repo:{repo}:concurrency | 3600 s safety net |
| PR serialization lock | amtp:rl:repo:{repo}:pr_lock | 120 s (per acquisition) |
| Prompt cache | amtp:prompt:{model}:{sha256(prompt)} | 3600 s |
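Centralizing these patterns in one place keeps callers honest about the contract. The helper object below is a hypothetical sketch; only the key patterns themselves come from NAMESPACES.md:

```javascript
// Illustrative key builders mirroring the namespace table above.
// The `keys` object and function names are assumptions, not repo code.
const keys = {
  mcpTree:  (owner, repo, ref) => `amtp:mcp:tree:${owner}:${repo}:${ref}`,
  mcpBlob:  (sha)              => `amtp:mcp:blob:${sha}`,
  userRuns: (userId)           => `amtp:rl:user:${userId}:runs`,
  repoConcurrency: (repo)      => `amtp:rl:repo:${repo}:concurrency`,
  prLock:   (repo)             => `amtp:rl:repo:${repo}:pr_lock`,
};
```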

Sliding-Window Rate Limit #

A single stable sorted-set key per user (amtp:rl:user:{userId}:runs) backs the per-hour run limit. The calling worker computes now and cutoff client-side and injects them as literal epoch-millisecond integers.

const now    = Date.now();
const cutoff = now - 3_600_000;
const key    = `amtp:rl:user:${userId}:runs`;

const results = await client
  .multi()
  .zremrangebyscore(key, 0, cutoff)   // evict entries older than 1 h
  .zadd(key, now, runId)              // score = epoch ms; member = run id
  .zcard(key)                         // count of runs in current window
  .expire(key, 3600)                  // safety TTL
  .exec();

const count = results[2][1]; // [err, value] pairs; index 2 = ZCARD
Reference: infra/valkey/NAMESPACES.md § Sliding-Window Rate-Limit Semantics

All four commands execute atomically via MULTI/EXEC. Literal integers, not expressions, are on the wire.
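The `exec()` reply can then be turned into an allow/deny decision. `evaluateWindow` and `HOURLY_RUN_LIMIT` are assumed names; the index-2 convention follows the command order in the MULTI block above, and ioredis returns one `[err, value]` pair per queued command:

```javascript
// Illustrative post-processing of the ioredis exec() reply above.
const HOURLY_RUN_LIMIT = 10; // assumed limit; the real value is configuration

function evaluateWindow(execResults, limit = HOURLY_RUN_LIMIT) {
  const [zcardErr, count] = execResults[2]; // index 2 = ZCARD reply
  if (zcardErr) throw zcardErr;             // surface command-level failures
  return { count, allowed: count <= limit };
}
```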

Healthcheck Service (Node.js / Express / ioredis) #

Lightweight Express service that probes Valkey availability and exposes the result as a structured JSON response. Used by load-balancers and monitoring pipelines to gate on cache availability. Source: apps/healthcheck/src/server.js and apps/healthcheck/src/valkey.js.

Endpoint #

GET /health

| Condition | Status | Body |
|---|---|---|
| All three probe steps succeed within 500 ms | 200 OK | `{ "status": "ok", "valkey": "up", "latency_ms": <n> }` |
| Timeout or any probe step fails | 503 Service Unavailable | `{ "status": "degraded", "valkey": "down", "error": "...", "probe": "<step>", "latency_ms": <n> }` |
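The two response shapes can be produced by a single pure helper, which keeps the contract testable without a running Valkey. `healthBody` is an illustrative name, not repository code:

```javascript
// Build the response bodies from the table above; on the degraded path,
// `probe` names the step that failed ("ping", "set", or "get").
function healthBody({ ok, latencyMs, error, probe }) {
  return ok
    ? { status: "ok", valkey: "up", latency_ms: latencyMs }
    : { status: "degraded", valkey: "down", error, probe, latency_ms: latencyMs };
}
```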

Three-Step Write Probe (Required Contract) #

The required probe sequence, executed within a single 500 ms timeout window:

  1. PING — verifies Valkey liveness.
  2. SET _amtp_health_ping 1 EX 1 — unconditional overwrite. Verifies write availability. No NX flag: concurrent probes within the 1-second TTL window must not cause false negatives.
  3. GET _amtp_health_ping — verifies round-trip read path. Fails with "probe read mismatch" if the returned value is not "1".

Why NX was rejected

With NX (Set if Not eXists), a second probe fired within the 1-second TTL window observes the key as existing, receives nil from SET NX, and a naive implementation treats this as a write failure, returning 503 degraded even though Valkey is fully healthy. Unconditional SET always returns OK when writes are accepted, regardless of concurrent probes.
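The pitfall can be demonstrated without a live server. The in-memory store below is a toy stand-in for a Valkey client, written only to show the return-value difference:

```javascript
// Toy key store illustrating SET vs SET NX semantics (not a real client).
function makeStore() {
  const data = new Map();
  return {
    set(key, value, { nx = false } = {}) {
      if (nx && data.has(key)) return null; // SET NX on an existing key → nil
      data.set(key, value);
      return "OK";
    },
  };
}

const store  = makeStore();
const first  = store.set("_amtp_health_ping", "1", { nx: true }); // "OK"
const second = store.set("_amtp_health_ping", "1", { nx: true }); // null → false 503
const plain  = store.set("_amtp_health_ping", "1");               // "OK" every time
```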

Reference implementation (not yet applied to repository)

const express = require("express");
const Redis = require("ioredis");

// Connection details follow the Environment Variables table in this section.
const client = new Redis({
  host: process.env.VALKEY_HOST ?? "valkey",
  port: Number(process.env.VALKEY_PORT ?? 6379),
  password: process.env.VALKEY_PASSWORD,
});
const app = express();

const PROBE_KEY = "_amtp_health_ping";

app.get("/health", async (req, res) => {
  const start = performance.now();
  try {
    const timeout = new Promise((_, reject) =>
      setTimeout(() => reject(new Error("Timeout")), 500)
    );

    await Promise.race([
      (async () => {
        await client.ping();                              // step 1: liveness
        await client.set(PROBE_KEY, "1", "EX", 1);      // step 2: write (no NX)
        const echo = await client.get(PROBE_KEY);        // step 3: read
        if (echo !== "1") throw new Error("probe read mismatch");
      })(),
      timeout,
    ]);

    const latency_ms = Number.parseFloat((performance.now() - start).toFixed(2));
    return res.status(200).json({ status: "ok", valkey: "up", latency_ms });
  } catch (err) {
    const latency_ms = Number.parseFloat((performance.now() - start).toFixed(2));
    return res.status(503).json({
      status: "degraded", valkey: "down",
      error: err.message, latency_ms,
    });
  }
});

app.listen(Number(process.env.PORT ?? 8080)); // PORT defaults to 8080 (see Environment Variables)
Replaces current apps/healthcheck/src/server.js implementation.

Probe key hygiene

The probe key _amtp_health_ping intentionally lives outside the amtp: namespace (per infra/valkey/NAMESPACES.md § Key Naming Conventions) to prevent any collision with application keys. It carries a 1-second TTL and requires no explicit delete step.

Environment Variables #

| Variable | Default | Description |
|---|---|---|
| VALKEY_HOST | valkey | Hostname of the Valkey instance. |
| VALKEY_PORT | 6379 | Port of the Valkey instance. |
| VALKEY_PASSWORD | – | Auth password (requirepass value). Required. |
| PORT | 8080 | Port this service listens on. Host override maps to 8083. |

Reference: apps/healthcheck/README.md, .env.example.
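Loading these variables with their documented defaults might look like the sketch below; `loadConfig` is an illustrative name, not code from apps/healthcheck:

```javascript
// Read the environment variables from the table above, applying defaults.
function loadConfig(env = process.env) {
  const cfg = {
    valkeyHost: env.VALKEY_HOST ?? "valkey",
    valkeyPort: Number(env.VALKEY_PORT ?? 6379),
    valkeyPassword: env.VALKEY_PASSWORD, // required; no default
    port: Number(env.PORT ?? 8080),
  };
  if (!cfg.valkeyPassword) throw new Error("VALKEY_PASSWORD is required");
  return cfg;
}
```

Failing fast on the missing password surfaces misconfiguration at startup rather than as a later auth error from Valkey.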

Dockerfile #

FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev

FROM node:20-alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY package.json ./
COPY src/ ./src/
USER node
EXPOSE 8080
CMD ["node", "src/server.js"]
apps/healthcheck/Dockerfile — two-stage build; runs as unprivileged node user.

GitHub MCP Server (amtp-github-mcp) #

A stateless Streamable-HTTP MCP server that exposes two tools — repo.tree and repo.read_file — over a GitHub App installation. It connects to Valkey for caching (see namespaces above) and to the OTel Collector for traces and metrics.

Environment variables #

| Variable | Default | Description |
|---|---|---|
| GITHUB_APP_ID | – | GitHub App numeric ID. Required. |
| GITHUB_APP_INSTALLATION_ID | – | Installation ID for the target org/user. Required. |
| MCP_HTTP_PORT | 8090 | Port the MCP HTTP server listens on. |
| VALKEY_HOST / VALKEY_PORT / VALKEY_PASSWORD | valkey / 6379 / – | Valkey connection. Password required. |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://otel-collector:4318 | OTLP HTTP endpoint for traces and metrics. |
| OTEL_SERVICE_NAME | amtp-github-mcp | Service name attributed in Tempo / Grafana. |

The GitHub App private key is mounted as a Docker secret at /run/secrets/github_app_key (source file: ./secrets/github_app_key.pem). See Deployment → Prerequisites.

Docs Service (nginx) #

Static documentation server based on nginx:1.27-alpine. Serves the compiled docs/AMTP_Docs_Website/ directory over HTTP. Host port is parameterized via DOCS_HOST_PORT (default 80). Configuration: infra/nginx/default.conf.

nginx configuration highlights #

Garage S3 (Artifact Storage) #

Garage is an S3-compatible object store planned as AMTP's test-bundle storage. It is a lightweight, self-hosted alternative to AWS S3.

| Parameter | Value |
|---|---|
| API port | 3900 |
| Object retention | 7 days (lifecycle policy on the test-bundle bucket) |
| API compatibility | S3-compatible (AWS SDK v2/v3 compatible) |
| Usage | Storage of generated test bundle archives referenced by artifacts.storage_uri (see migrations/sql/V4__artifacts.sql) |

Network & Volumes #

amtp_net
Bridge network. All services communicate by service name (Docker internal DNS). No service is exposed to the host by default except via the override file.
Network pinning. The network name is amtp_net — fixed by COMPOSE_PROJECT_NAME=amtp in .env. Changing the project name changes the network name and breaks all cross-stack service discovery. The observability stack in docker-compose.observability.yml declares amtp_net: external: true and attaches to this exact name. The main stack must be brought up at least once before the observability stack deploys. See Observability → Network Dependency.
amtp_pgdata
Named volume for Postgres data directory (/var/lib/postgresql/data). Persists across container restarts. Must be explicitly removed with docker compose down -v to reset.