Infrastructure
AMTP’s local infrastructure is defined in
docker-compose.yml and
docker-compose.override.yml. All services share the
amtp_net bridge network. The override file restricts select
port bindings to loopback (127.0.0.1) for local
development. The observability stack runs as a separate Compose project
and also attaches to amtp_net as an external network
— see Observability.
Service Map #
| Service | Image | Host port | Container port | Status |
|---|---|---|---|---|
| postgres | postgres:15-alpine | 5432 (override only) | 5432 | Implemented |
| pgbouncer | edoburu/pgbouncer:1.22.1 | 6432 | 5432 | Implemented |
| flyway | flyway/flyway:10-alpine | — | — | Implemented (opt-in migrate profile) |
| valkey | valkey/valkey:8.0 | 6379 (override only) | 6379 | Implemented |
| healthcheck | Built from apps/healthcheck/Dockerfile | 8083 (override) / 8080 | 8080 | Implemented |
| github-mcp | Built from apps/github-mcp/Dockerfile | 8090 | 8090 | Implemented |
| docs | nginx:1.27-alpine | ${DOCS_HOST_PORT:-80} | 80 | Implemented |
| Garage S3 | Garage (S3-compatible) | 3900 | 3900 | Planned |
PostgreSQL 15 #
Primary persistence layer. Stores all run state, stage state, artifacts, approvals, and project registrations. migrations/sql/V1__projects.sql through migrations/sql/V6__indexes.sql define the complete schema.
Configuration #
```yaml
postgres:
  image: postgres:15-alpine
  restart: unless-stopped
  environment:
    POSTGRES_DB: ${POSTGRES_DB:-amtp}
    POSTGRES_USER: ${POSTGRES_USER:-amtp}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-amtp}
    POSTGRES_INITDB_ARGS: "--auth-host=scram-sha-256 --auth-local=scram-sha-256"
  command:
    - "postgres"
    - "-c"
    - "password_encryption=scram-sha-256"
    - "-c"
    - "max_connections=200"
    - "-c"
    - "shared_buffers=256MB"
    - "-c"
    - "work_mem=16MB"
    - "-c"
    - "log_min_duration_statement=500"
  volumes:
    - amtp_pgdata:/var/lib/postgresql/data
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-amtp} -d ${POSTGRES_DB:-amtp}"]
    interval: 5s
    timeout: 3s
    retries: 20
```
Tuning notes #
- `scram-sha-256` is enforced for both host and local auth.
- `max_connections = 200` — applications connect via PgBouncer (not directly) to stay well below this ceiling.
- `shared_buffers = 256MB`, `work_mem = 16MB` — dev-tuned values; increase for production.
- Queries slower than 500 ms are logged (`log_min_duration_statement = 500`).

Reference: infra/postgres/postgresql.conf.
PgBouncer 1.22 #
Connection pooler in transaction mode. Applications
connect to pgbouncer:5432 (internal) or
localhost:6432 (override). PgBouncer multiplexes up to
500 client connections onto a default pool of 25 server connections.
Key parameters #
| Parameter | Value | Notes |
|---|---|---|
| POOL_MODE | transaction | Connection released to pool after each transaction, not each session. |
| MAX_CLIENT_CONN | 500 | Maximum simultaneous client connections. |
| DEFAULT_POOL_SIZE | 25 | Server connections per database/user pair. |
| RESERVE_POOL_SIZE | 5 | Extra connections for spikes; available after RESERVE_POOL_TIMEOUT = 3s. |
| AUTH_TYPE | scram-sha-256 | Matches Postgres authentication method. |
| MAX_PREPARED_STATEMENTS | 100 | Limits prepared-statement caching in transaction mode. |
| SERVER_IDLE_TIMEOUT | 240s | Idle server connections closed after 4 minutes. |
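Assuming applications read a standard connection URL, pointing at the pooler rather than Postgres directly would look like this (default credentials from the Compose file; illustrative only):

```
postgres://amtp:amtp@localhost:6432/amtp   # from the host (override port)
postgres://amtp:amtp@pgbouncer:5432/amtp   # from inside amtp_net
```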
Note on prepared statements. Transaction-mode pooling
is incompatible with session-persistent prepared statements.
Applications must use named-prepare-per-transaction patterns or rely
on the MAX_PREPARED_STATEMENTS caching mechanism.
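As a sketch of the distinction with node-postgres (a common driver choice; the repository's actual driver is not specified here): omitting the `name` field keeps the prepare unnamed and scoped to a single statement, while a named statement depends on PgBouncer's MAX_PREPARED_STATEMENTS tracking to survive connection multiplexing.

```javascript
// Illustrative query shapes for node-postgres ("pg"); table and id are hypothetical.

// Works without pooler support: no `name`, so the driver issues an unnamed
// prepare that lives only for this single statement.
const safeQuery = {
  text: "SELECT * FROM runs WHERE id = $1",
  values: ["run-123"],
};

// Relies on PgBouncer's MAX_PREPARED_STATEMENTS caching: a named prepare
// would otherwise be stranded on whichever server connection first ran it.
const namedQuery = {
  name: "get-run",
  text: "SELECT * FROM runs WHERE id = $1",
  values: ["run-123"],
};
```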
Flyway 10 (Migration Runner) #
Schema migrations are managed by
Flyway. The service is defined under the opt-in
migrate Compose profile; it does
not start automatically with
docker compose up. See the
Deployment Runbook
for explicit invocation instructions.
Configuration #
```properties
flyway.url=jdbc:postgresql://postgres:5432/amtp
flyway.user=amtp
flyway.schemas=public
flyway.locations=filesystem:/flyway/sql
flyway.baselineOnMigrate=true
flyway.validateOnMigrate=true
flyway.cleanDisabled=true
```
- `cleanDisabled=true` — `flyway clean` is permanently disabled to prevent accidental schema destruction in any environment.
- `validateOnMigrate=true` — checksums of previously applied migrations are verified on every run.
- `baselineOnMigrate=true` — allows Flyway to baseline an existing schema on first run.
- `V6__indexes.sql.conf` sets `executeInTransaction=false` because `CREATE INDEX CONCURRENTLY` statements cannot run inside a transaction block in Postgres.
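The per-script override mentioned above would look like this (exact file contents are an assumption based on Flyway's script-config convention):

```properties
# migrations/sql/V6__indexes.sql.conf
executeInTransaction=false
```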
Valkey 8.0 (Cache & Rate-Limit Layer) #
Single-node Valkey instance serving three purposes: MCP API response caching, per-repo concurrency control, and sliding-window user rate limiting. It is the core token-accounting infrastructure.
Eviction & Persistence Policy (Required Flags) #
```yaml
valkey:
  image: valkey/valkey:8.0
  command:
    - "valkey-server"
    - "--save"
    - ""                 # disable RDB snapshots
    - "--appendonly"
    - "no"               # disable AOF persistence
    - "--requirepass"
    - "${VALKEY_PASSWORD}"
    - "--maxmemory"
    - "512mb"            # hard memory cap
    - "--maxmemory-policy"
    - "allkeys-lru"      # evict LRU keys when cap is reached
```
| Flag | Value | Rationale |
|---|---|---|
| --maxmemory | 512mb | Hard upper bound for the in-memory store. Prevents unbounded memory growth. |
| --maxmemory-policy | allkeys-lru | Evicts least-recently-used keys across all namespaces when the cap is reached. Chosen over volatile-lru because rate-limit sorted-set keys (amtp:rl:user:*:runs) may not carry a TTL at the moment of eviction pressure. |
| --save | "" (empty — disables RDB) | Valkey is intentionally ephemeral. No RDB snapshot file is written. |
| --appendonly | no | AOF persistence disabled. Data is not durable across container restarts by design. |
OOM failure mode if allkeys-lru is omitted
Without --maxmemory-policy allkeys-lru, once
maxmemory is reached Valkey rejects all write commands
with:
OOM command not allowed when used memory > 'maxmemory'
The healthcheck’s PING still returns
PONG in this state, so a PING-only probe
will not detect the degradation. Application-layer
SET, ZADD, and INCR calls
will fail silently from the probe’s perspective. This is
precisely why the healthcheck requires a synthetic write (see
Healthcheck section below).
Key Namespaces & TTL Contract #
All keys are scoped under the amtp: root namespace.
Reference: infra/valkey/NAMESPACES.md.
| Purpose | Key pattern | TTL |
|---|---|---|
| GitHub MCP repo.tree result cache | amtp:mcp:tree:{owner}:{repo}:{ref} | 600 s |
| GitHub MCP .gitignore blob cache | amtp:mcp:blob:{sha} | 3600 s |
| GitHub MCP absent .gitignore sentinel | amtp:mcp:null_gitignore:{owner}:{repo}:{ref}:{path} | 600 s |
| User rate limit (sliding window) | amtp:rl:user:{userId}:runs | 3600 s sliding |
| Repo concurrency counter | amtp:rl:repo:{repo}:concurrency | 3600 s safety net |
| PR serialization lock | amtp:rl:repo:{repo}:pr_lock | 120 s (per acquisition) |
| Prompt cache | amtp:prompt:{model}:{sha256(prompt)} | 3600 s |
Sliding-Window Rate Limit #
A single stable sorted-set key per user
(amtp:rl:user:{userId}:runs) backs the per-hour run
limit. The calling worker computes now and
cutoff
client-side and injects them as literal epoch-millisecond integers.
```javascript
const now = Date.now();
const cutoff = now - 3_600_000;
const key = `amtp:rl:user:${userId}:runs`;

const results = await client
  .multi()
  .zremrangebyscore(key, 0, cutoff) // evict entries older than 1 h
  .zadd(key, now, runId)            // score = epoch ms; member = run id
  .zcard(key)                       // count of runs in current window
  .expire(key, 3600)                // safety TTL
  .exec();

const count = results[2][1]; // [err, value] pairs; index 2 = ZCARD
```
All four commands execute atomically via
MULTI/EXEC. Literal integers, not
expressions, are on the wire.
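The same window arithmetic can be unit-tested without a live Valkey using a pure function (a sketch; the production path is the MULTI/EXEC block above):

```javascript
// Counts run timestamps still inside the sliding window, mirroring
// ZREMRANGEBYSCORE(key, 0, cutoff) + ZCARD: scores <= cutoff drop out.
function runsInWindow(timestampsMs, nowMs, windowMs = 3_600_000) {
  const cutoff = nowMs - windowMs;
  return timestampsMs.filter((t) => t > cutoff).length;
}

const now = 10_000_000;
runsInWindow([now - 4_000_000, now - 3_500_000, now - 100], now); // → 2 (first entry aged out)
```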
Healthcheck Service (Node.js / Express / ioredis) #
Lightweight Express service that probes Valkey availability and exposes the result as a structured JSON response. Used by load-balancers and monitoring pipelines to gate on cache availability. Source: apps/healthcheck/src/server.js and apps/healthcheck/src/valkey.js.
Endpoint #
GET /health
| Condition | Status | Body |
|---|---|---|
| All three probe steps succeed within 500 ms | 200 OK | { "status": "ok", "valkey": "up", "latency_ms": <n> } |
| Timeout or any probe step fails | 503 Service Unavailable | { "status": "degraded", "valkey": "down", "error": "...", "probe": "<step>", "latency_ms": <n> } |
Three-Step Write Probe (Required Contract) #
The required probe sequence, executed within a single 500 ms timeout window:
1. `PING` — verifies Valkey liveness.
2. `SET _amtp_health_ping 1 EX 1` — unconditional overwrite. Verifies write availability. No `NX` flag: concurrent probes within the 1-second TTL window must not cause false negatives.
3. `GET _amtp_health_ping` — verifies the round-trip read path. Fails with `"probe read mismatch"` if the returned value is not `"1"`.
Why NX was rejected
With NX (Set if Not eXists), a second probe fired
within the 1-second TTL window observes the key as existing,
receives nil from SET NX, and a naive
implementation treats this as a write failure, returning
503 degraded even though Valkey is fully healthy.
Unconditional SET always returns OK when
writes are accepted, regardless of concurrent probes.
Reference implementation (not yet applied to repository)
```javascript
const express = require("express");
const Redis = require("ioredis");

// Illustrative setup; the repository wires the client in src/valkey.js.
const app = express();
const client = new Redis({
  host: process.env.VALKEY_HOST || "valkey",
  port: Number(process.env.VALKEY_PORT) || 6379,
  password: process.env.VALKEY_PASSWORD,
});

const PROBE_KEY = "_amtp_health_ping";

app.get("/health", async (req, res) => {
  const start = performance.now();
  try {
    const timeout = new Promise((_, reject) =>
      setTimeout(() => reject(new Error("Timeout")), 500)
    );
    await Promise.race([
      (async () => {
        await client.ping();                       // step 1: liveness
        await client.set(PROBE_KEY, "1", "EX", 1); // step 2: write (no NX)
        const echo = await client.get(PROBE_KEY);  // step 3: read
        if (echo !== "1") throw new Error("probe read mismatch");
      })(),
      timeout,
    ]);
    const latency_ms = Number.parseFloat((performance.now() - start).toFixed(2));
    return res.status(200).json({ status: "ok", valkey: "up", latency_ms });
  } catch (err) {
    const latency_ms = Number.parseFloat((performance.now() - start).toFixed(2));
    return res.status(503).json({
      status: "degraded", valkey: "down",
      error: err.message, latency_ms,
    });
  }
});
```
Probe key hygiene
The probe key _amtp_health_ping intentionally lives
outside the amtp: namespace (per
infra/valkey/NAMESPACES.md § Key Naming Conventions) to
prevent any collision with application keys. It carries a 1-second
TTL and requires no explicit delete step.
Environment Variables #
| Variable | Default | Description |
|---|---|---|
| VALKEY_HOST | valkey | Hostname of the Valkey instance. |
| VALKEY_PORT | 6379 | Port of the Valkey instance. |
| VALKEY_PASSWORD | — | Auth password (requirepass value). Required. |
| PORT | 8080 | Port this service listens on. Host override maps to 8083. |
Reference: apps/healthcheck/README.md, .env.example.
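A matching .env fragment might look like this (values illustrative; see .env.example for the authoritative defaults):

```
VALKEY_HOST=valkey
VALKEY_PORT=6379
VALKEY_PASSWORD=change-me
PORT=8080
```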
Dockerfile #
```dockerfile
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev

FROM node:20-alpine
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY package.json ./
COPY src/ ./src/
USER node
EXPOSE 8080
CMD ["node", "src/server.js"]
```

Two-stage build: production dependencies are installed in a separate stage, and the runtime image runs as the unprivileged node user.
GitHub MCP Server (amtp-github-mcp)
#
A stateless
Streamable-HTTP MCP server that exposes
two tools — repo.tree and
repo.read_file — over a GitHub App installation. It
connects to Valkey for caching (see namespaces above) and to the OTel
Collector for traces and metrics.
Environment variables #
| Variable | Default | Description |
|---|---|---|
| GITHUB_APP_ID | — | GitHub App numeric ID. Required. |
| GITHUB_APP_INSTALLATION_ID | — | Installation ID for the target org/user. Required. |
| MCP_HTTP_PORT | 8090 | Port the MCP HTTP server listens on. |
| VALKEY_HOST / VALKEY_PORT / VALKEY_PASSWORD | valkey / 6379 / — | Valkey connection. Password required. |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://otel-collector:4318 | OTLP HTTP endpoint for traces and metrics. |
| OTEL_SERVICE_NAME | amtp-github-mcp | Service name attributed in Tempo / Grafana. |
The GitHub App private key is mounted as a Docker secret at
/run/secrets/github_app_key (source file:
./secrets/github_app_key.pem). See
Deployment → Prerequisites.
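The secret wiring described above would look roughly like this in docker-compose.yml (a sketch; the actual file may differ):

```yaml
services:
  github-mcp:
    secrets:
      - github_app_key

secrets:
  github_app_key:
    file: ./secrets/github_app_key.pem   # mounted at /run/secrets/github_app_key
```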
Docs Service (nginx) #
Static documentation server based on
nginx:1.27-alpine. Serves the compiled
docs/AMTP_Docs_Website/ directory over HTTP. Host port is
parameterized via DOCS_HOST_PORT (default
80). Configuration:
infra/nginx/default.conf.
nginx configuration highlights #
- gzip enabled for `text/html`, `text/css`, `application/javascript`, `application/json`, and `image/svg+xml`.
- Security headers: `X-Frame-Options: SAMEORIGIN`, `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`.
- `/healthz` returns `200 OK` with body `ok`. Used by the docker-compose.yml healthcheck and CI smoke tests.
- DAC requirement: all files under docs/AMTP_Docs_Website/ and infra/nginx/ must be readable by the nginx worker (`chmod -R a+rX` applied in CI before serving).
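The /healthz endpoint above can be served by a small location block like this (a sketch; the actual infra/nginx/default.conf may differ):

```nginx
location = /healthz {
    default_type text/plain;
    return 200 "ok";
}
```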
Garage S3 (Artifact Storage) #
Garage is an S3-compatible object store provisioned for AMTP test bundle storage. It is a lightweight, self-hosted alternative to AWS S3.
| Parameter | Value |
|---|---|
| API port | 3900 |
| Object retention | 7 days (lifecycle policy on the test-bundle bucket) |
| API compatibility | S3-compatible (AWS SDK v2/v3 compatible) |
| Usage | Storage of generated test bundle archives referenced by artifacts.storage_uri (see migrations/sql/V4__artifacts.sql) |
Network & Volumes #
- `amtp_net` — bridge network. All services communicate by service name (Docker internal DNS). No service is exposed to the host by default except via the override file.
- Network pinning: the network name is `amtp_net`, fixed by `COMPOSE_PROJECT_NAME=amtp` in .env. Changing the project name changes the network name and breaks all cross-stack service discovery. The observability stack in docker-compose.observability.yml declares `amtp_net` as `external: true` and attaches to this exact name. The main stack must be brought up at least once before the observability stack deploys. See Observability → Network Dependency.
- `amtp_pgdata` — named volume for the Postgres data directory (`/var/lib/postgresql/data`). Persists across container restarts. Must be explicitly removed with `docker compose down -v` to reset.