CI/CD & Self-hosted Runner
AMTP uses a dedicated RHEL
self-hosted GitHub Actions
runner (amtp-dev-runner, label amtp-dev) to
execute all CI/CD jobs. A root orchestrator
(.github/workflows/ci-cd.yml) fans out to five reusable
child workflows via uses: / workflow_call.
Self-hosted Runner
| Attribute | Value |
|---|---|
| Operating system | RHEL / Rocky / Alma Linux (dnf-based) |
| Runner name | amtp-dev-runner |
| Runner label | amtp-dev (custom); self-hosted, Linux, X64 (auto-added) |
| runs-on value | [self-hosted, Linux, X64, amtp-dev] |
| Runner type | Persistent, non-ephemeral |
| Docker availability | Required; installed by infra/bootstrap.sh. All CI jobs use Docker Compose. |
| Provisioning | One-time: sudo ./infra/bootstrap.sh <runner_user>. See Dev-host bootstrap runbook. |
Because the runner is persistent, any files written
to the checkout directory persist across job runs unless explicitly
deleted. The .env cleanup step documented in
Secret-File Hygiene addresses this.
Workflow Topology
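The root .github/workflows/ci-cd.yml fans out to five reusable child workflows via workflow_call.
The database, healthcheck, docs, and observability workflows start in parallel; github-mcp waits for healthcheck (needs: healthcheck).
Every call passes secrets: inherit.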
Trigger paths
on:
  push:
    branches: [main, dev]
    paths:
      - "migrations/**"
      - "apps/healthcheck/**"
      - "apps/github-mcp/**"
      - "apps/otel-verify/**"
      - "docs/**"
      - "infra/nginx/**"
      - "infra/observability/**"
      - "docker-compose.yml"
      - "docker-compose.observability.yml"
      - "docker-compose.obs-ci.yml"
      - ".github/workflows/ci-cd.yml"
      - ".github/workflows/db-ci-cd.yml"
      - ".github/workflows/healthcheck-ci-cd.yml"
      - ".github/workflows/github-mcp-ci-cd.yml"
      - ".github/workflows/docs-ci-cd.yml"
      - ".github/workflows/observability-ci-cd.yml"
  pull_request:
    branches: [main, dev]
    paths:
      - "migrations/**"
      - "apps/healthcheck/**"
      - "apps/github-mcp/**"
      - "apps/otel-verify/**"
      - "docs/**"
      - "infra/nginx/**"
      - "infra/observability/**"
  workflow_dispatch:
Root workflow jobs
jobs:
  database:
    name: Database CI/CD
    uses: ./.github/workflows/db-ci-cd.yml
    secrets: inherit
  healthcheck:
    name: Healthcheck CI/CD
    uses: ./.github/workflows/healthcheck-ci-cd.yml
    secrets: inherit
  github-mcp:
    name: GitHub MCP CI/CD
    uses: ./.github/workflows/github-mcp-ci-cd.yml
    secrets: inherit
    needs: healthcheck
  docs:
    name: Docs CI/CD
    uses: ./.github/workflows/docs-ci-cd.yml
    secrets: inherit
  observability:
    name: Observability CI/CD
    uses: ./.github/workflows/observability-ci-cd.yml
    secrets: inherit
secrets: inherit is mandatory on every child-workflow
call (see
Required GitHub Secrets & Vars).
Database Workflow (db-ci-cd.yml)
Two jobs: validate (always runs) and
migrate (runs only on dev via
workflow_call or workflow_dispatch).
Job: validate
Spins up an ephemeral Postgres container namespaced with the run ID
(COMPOSE_PROJECT_NAME=amtp-ci-${{ github.run_id }}),
runs flyway migrate against it, prints migration
status, then tears down the stack including all volumes.
- name: Start ephemeral Postgres
  run: docker compose -f docker-compose.yml up -d postgres
- name: Run Flyway migrate (ephemeral)
  run: docker compose -f docker-compose.yml --profile migrate run --rm flyway migrate
- name: Show migration status
  run: docker compose -f docker-compose.yml --profile migrate run --rm flyway info
- name: Tear down ephemeral stack
  if: always()
  run: docker compose -f docker-compose.yml down -v --remove-orphans
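The run-ID namespacing used by the validate job can be sketched in shell. This is a minimal illustration, assuming the GITHUB_RUN_ID environment variable that GitHub Actions sets by default; the ci_project_name helper and the "local" fallback are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch: derive a per-run Compose project name so concurrent validate
# runs on the same persistent runner get separate containers and volumes.

# ci_project_name is a hypothetical helper, not part of the repository.
ci_project_name() {
  local run_id="$1"
  printf 'amtp-ci-%s\n' "$run_id"
}

# GITHUB_RUN_ID is provided by GitHub Actions; "local" is an assumed
# fallback for running the sketch outside CI.
COMPOSE_PROJECT_NAME="$(ci_project_name "${GITHUB_RUN_ID:-local}")"
export COMPOSE_PROJECT_NAME
echo "compose project: ${COMPOSE_PROJECT_NAME}"
```

Compose prefixes the containers, networks, and volumes it creates with the project name, so the final down -v only removes resources that belong to this run.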
Job: migrate
Applies validated migrations to the persistent
dev environment. Gated to the dev branch.
Uses concurrency group amtp-db-dev with
cancel-in-progress: false to ensure migrations are
never cancelled mid-flight.
Healthcheck Workflow (healthcheck-ci-cd.yml)
Two jobs: validate (builds the Docker image) and
deploy (brings up Valkey + healthcheck on
dev and probes /health).
Job: validate
- name: Build healthcheck image
  run: docker compose -f docker-compose.yml build healthcheck
- name: Tear down
  if: always()
  run: docker compose -f docker-compose.yml down --volumes --remove-orphans
Job: deploy
Builds and starts Valkey and the healthcheck service, blocks until
both services report healthy, then smoke-tests
http://localhost:8083/health.
- name: Build and start Valkey + Healthcheck (block until healthy)
  run: docker compose -f docker-compose.yml up -d --build --wait valkey healthcheck
- name: Smoke-test health endpoint from host
  run: curl -sf http://localhost:8083/health
Concurrency group: amtp-healthcheck-dev,
cancel-in-progress: false.
GitHub MCP Workflow (github-mcp-ci-cd.yml)
Two jobs: validate (TypeScript typecheck + unit tests
inside a Docker build, then production image build) and
deploy (writes GitHub App credentials, brings up the
service, and smoke-tests /healthz). The
deploy job runs only on the dev branch and
requires needs: healthcheck in the root orchestrator so
Valkey is healthy before the MCP container starts.
Job: validate
- name: Create ephemeral .env and dummy key
  run: |
    cat > .env << 'EOF'
    GITHUB_APP_ID=0
    GITHUB_APP_INSTALLATION_ID=0
    EOF
    mkdir -p secrets
    # "--" stops bash's printf from parsing the leading dashes as an option
    printf -- '-----BEGIN RSA PRIVATE KEY-----\nMIIE...\n-----END RSA PRIVATE KEY-----\n' \
      > secrets/github_app_key.pem
- name: Typecheck + test (inside Docker)
  run: docker build --target test -t amtp-github-mcp-test:${{ github.run_id }} apps/github-mcp
- name: Build production Docker image
  run: docker compose -f docker-compose.yml build github-mcp
- name: Tear down
  if: always()
  run: |
    docker compose -f docker-compose.yml down --volumes --remove-orphans
    docker rmi amtp-github-mcp-test:${{ github.run_id }} || true
The --target test stage runs
npm run typecheck and then npm test inside
the Docker build layer, so this job does not need Node.js on the runner.
Job: deploy
- name: Write .env from secrets
  run: |
    printf '...GITHUB_APP_ID=%s\nGITHUB_APP_INSTALLATION_ID=%s\n' \
      "${{ secrets.GH_APP_ID }}" \
      "${{ secrets.GH_APP_INSTALLATION_ID }}" >> .env
- name: Write GitHub App private key
  run: |
    mkdir -p secrets
    printf '%s' "${{ secrets.GH_APP_PRIVATE_KEY_B64 }}" | base64 -d \
      > secrets/github_app_key.pem
    chmod 644 secrets/github_app_key.pem
- name: Deploy GitHub MCP (wait for healthy)
  run: docker compose -f docker-compose.yml up -d --build --wait github-mcp
- name: Smoke-test health endpoint
  run: curl -sf http://localhost:8090/healthz
Concurrency group: amtp-github-mcp-dev,
cancel-in-progress: false.
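The deploy job decodes GH_APP_PRIVATE_KEY_B64 with base64 -d, so the secret has to hold the base64 form of the PEM file. The sketch below shows one way to produce that value and check the round trip, using a throwaway placeholder file rather than a real key. The encode_pem and decode_pem helpers are hypothetical names, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: encode a PEM file to single-line base64 and verify that decoding
# it the way the workflow does reproduces the original bytes.

# encode_pem and decode_pem are hypothetical helper names.
encode_pem() {
  # Strip newlines so the value is one line regardless of base64's wrapping.
  base64 < "$1" | tr -d '\n'
}

decode_pem() {
  # Mirrors the workflow step: printf '%s' "$value" | base64 -d > file
  printf '%s' "$1" | base64 -d > "$2"
}

workdir="$(mktemp -d)"
# Placeholder content only; never commit or echo a real private key.
printf -- '-----BEGIN RSA PRIVATE KEY-----\nplaceholder\n-----END RSA PRIVATE KEY-----\n' \
  > "$workdir/dummy.pem"

encoded="$(encode_pem "$workdir/dummy.pem")"
decode_pem "$encoded" "$workdir/decoded.pem"

if cmp -s "$workdir/dummy.pem" "$workdir/decoded.pem"; then
  echo "round trip OK"
fi
rm -rf "$workdir"
```

Storing the key as a single base64 line avoids the newline handling problems that multi-line PEM values tend to cause when passed through environment variables.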
Docs Workflow (docs-ci-cd.yml)
Builds the static docs site (node build.js + Pagefind
index), validates it by serving it from an ephemeral nginx container,
and deploys it to the persistent docs nginx service.
Job: validate
Runs in an isolated compose project
(COMPOSE_PROJECT_NAME=amtp-docs-validate-${{ github.run_id }})
with DOCS_HOST_PORT=8088 so the ephemeral validation
container cannot collide with the persistent
docs container that owns port 80.
- name: Guard against non-conf files in infra/nginx/
  run: test -z "$(find infra/nginx -maxdepth 1 -type f ! -name '*.conf')"
- name: Install docs build dependencies
  run: npm install
  working-directory: docs/AMTP_Docs_Website
- name: Hydrate diagrams and inject build timestamp
  run: node build.js
  working-directory: docs/AMTP_Docs_Website
- name: Build Pagefind search index
  run: npx pagefind --site . --output-path pagefind
  working-directory: docs/AMTP_Docs_Website
- name: Remove node_modules before serving
  run: rm -rf docs/AMTP_Docs_Website/node_modules
- name: Normalize file permissions for nginx DAC
  run: chmod -R a+rX docs infra/nginx
- name: Start docs (block until healthy)
  run: docker compose -f docker-compose.yml up -d --wait docs
- name: Smoke-test healthz
  run: curl -sf "http://127.0.0.1:${DOCS_HOST_PORT}/healthz"
- name: Smoke-test index page
  run: curl -sf "http://127.0.0.1:${DOCS_HOST_PORT}/" -o /dev/null
- name: Tear down ephemeral stack
  if: always()
  run: docker compose -f docker-compose.yml down --volumes --remove-orphans
The nginx conf guard (find infra/nginx -maxdepth 1 -type f ! -name
'*.conf') catches any non-.conf file accidentally added to
infra/nginx/. That directory is bind-mounted over
/etc/nginx/conf.d/; nginx will crash on any file that
isn’t valid conf syntax.
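The guard can be tried in isolation against a scratch directory. The sketch below wraps the same find expression in a hypothetical only_conf_files function and runs it on a temporary directory instead of infra/nginx.

```shell
#!/usr/bin/env bash
# Sketch: the nginx conf guard, exercised on a throwaway directory.

# only_conf_files is a hypothetical wrapper around the workflow's check.
# It returns 0 when the top level of DIR holds nothing but *.conf files.
only_conf_files() {
  local dir="$1"
  test -z "$(find "$dir" -maxdepth 1 -type f ! -name '*.conf')"
}

demo="$(mktemp -d)"
touch "$demo/default.conf"
if only_conf_files "$demo"; then
  echo "only .conf files: guard passes"
fi

# A stray non-conf file makes find print a path, so test -z fails.
touch "$demo/README.md"
if ! only_conf_files "$demo"; then
  echo "stray file present: guard fails"
fi
rm -rf "$demo"
```

Because of -maxdepth 1 and -type f, subdirectories and anything nested inside them are not inspected by this check.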
The DAC normalization (chmod -R a+rX docs infra/nginx) ensures the
unprivileged nginx worker can read files that a
restrictive runner umask may have left at mode
600 or 700, which would otherwise produce
403s.
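The effect of a+rX is easiest to see on files created with restrictive modes. The following sketch reproduces that situation in a temporary directory; normalize_for_nginx is a hypothetical wrapper around the same chmod invocation.

```shell
#!/usr/bin/env bash
# Sketch: what chmod -R a+rX does to modes left behind by a tight umask.

# normalize_for_nginx is a hypothetical wrapper around the workflow's chmod.
normalize_for_nginx() {
  # a+r: everyone may read. X: add execute (traverse) only on directories
  # and on files that already carry an execute bit.
  chmod -R a+rX "$@"
}

demo="$(mktemp -d)"
mkdir "$demo/docs"
echo '<h1>docs</h1>' > "$demo/docs/index.html"
# Simulate what umask 077 would have produced.
chmod 700 "$demo/docs"
chmod 600 "$demo/docs/index.html"

normalize_for_nginx "$demo/docs"
ls -ld "$demo/docs" "$demo/docs/index.html"
rm -rf "$demo"
```

After normalization the directory is mode 755 and the file is mode 644: readable and traversable by the nginx worker, without marking regular files executable.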
Job: deploy
Same build steps as validate, then deploys to the
persistent docs service on port 80.
Concurrency group: amtp-docs-dev,
cancel-in-progress: false.
- name: Deploy docs (block until healthy)
  run: docker compose -f docker-compose.yml up -d --wait docs
- name: Smoke-test healthz from host
  run: curl -sf http://localhost/healthz
- name: Smoke-test index page from host
  run: curl -sf http://localhost/ -o /dev/null
Observability Workflow (observability-ci-cd.yml)
Deploys the four-service observability stack (OTel Collector, Tempo, Prometheus, Grafana) and verifies the end-to-end trace pipeline with a synthetic LLM span. See Observability for the full stack reference.
Job: validate
Uses docker-compose.obs-ci.yml which defines
amtp_net as a local network (not
external), so each validate run creates its own isolated network and
never collides with concurrent runs or the production stack. No host
port bindings — all inter-service traffic uses container
names.
Job: deploy
Compose project isolation
Legacy-orphan self-healing
Before deploying, the workflow removes any observability containers
left over from a previous run that used the incorrect project name
(amtp instead of amtp-obs). Without this
step, those containers would hold the host ports the new deploy
needs, causing
Bind for 0.0.0.0:<port> failed: port is already
allocated.
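One way such a cleanup could look is sketched below. It is not the workflow's actual step: the service names in OBS_SERVICES are assumptions (the real names live in docker-compose.observability.yml), and remove_legacy_obs_containers is a hypothetical function. It relies only on the com.docker.compose.project and com.docker.compose.service labels that Compose attaches to the containers it creates.

```shell
#!/usr/bin/env bash
# Sketch: remove observability containers that were created under the
# legacy "amtp" Compose project, leaving other "amtp" services alone.

LEGACY_PROJECT="amtp"
# Assumed service names; adjust to match the compose file.
OBS_SERVICES="otel-collector tempo prometheus grafana"

remove_legacy_obs_containers() {
  local id svc
  # List every container (running or stopped) in the legacy project
  # together with the Compose service it belongs to.
  docker ps -a \
    --filter "label=com.docker.compose.project=${LEGACY_PROJECT}" \
    --format '{{.ID}} {{.Label "com.docker.compose.service"}}' |
  while read -r id svc; do
    case " ${OBS_SERVICES} " in
      *" ${svc} "*)
        echo "removing legacy ${svc} container ${id}"
        docker rm -f "$id" </dev/null
        ;;
    esac
  done
}

# Usage (not executed here):
#   remove_legacy_obs_containers
```

Matching on the service label keeps the cleanup from touching non-observability containers that legitimately run under the amtp project.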
Rolling update
- name: Rolling update - observability stack (block until all healthy)
  run: >
    docker compose -f docker-compose.observability.yml
    up -d --wait --wait-timeout 180 --remove-orphans
--wait blocks until every service is running or, where a compose-defined
healthcheck exists, healthy. The 180 s timeout covers
cold Grafana + Tempo startup on a loaded runner.
Post-deploy trace verification
After the stack is healthy, CI emits a synthetic LLM trace via
apps/otel-verify/verify_trace.py and confirms it
appeared in Tempo.
TRACE_ID=$(
  .venv/bin/python verify_trace.py \
    --endpoint "http://localhost:${OBS_OTEL_HTTP_PORT}" \
    2>/dev/null \
  | grep -oE '[0-9a-f]{32}' \
  | head -n 1
)
head -n 1 collapses the duplicate match on the hint
line.
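The extraction can be checked offline against sample text. In the sketch below the sample output shape is an assumption (the real verify_trace.py output may differ), and extract_trace_id is a hypothetical helper wrapping the same grep and head pair. It also adds an emptiness check, which the snippet above does not show.

```shell
#!/usr/bin/env bash
# Sketch: pull the first 32-hex-digit trace ID out of verifier output.

# extract_trace_id is a hypothetical helper, reading from stdin.
extract_trace_id() {
  grep -oE '[0-9a-f]{32}' | head -n 1
}

# Assumed output shape: the ID appears once on a result line and again on
# a hint line.
sample_output='trace_id=4bf92f3577b34da6a3ce929d0e0e4736
hint: curl http://localhost:3200/api/traces/4bf92f3577b34da6a3ce929d0e0e4736'

TRACE_ID="$(printf '%s\n' "$sample_output" | extract_trace_id)"

# Fail fast if nothing matched, rather than polling Tempo for an empty ID.
if [ -z "$TRACE_ID" ]; then
  echo "no trace ID found in verifier output" >&2
  exit 1
fi
echo "trace id: ${TRACE_ID}"
```

Without head -n 1, TRACE_ID would contain two lines for this input and the Tempo URL built from it would be malformed.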
for i in $(seq 1 20); do
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
    "http://localhost:${OBS_TEMPO_HTTP_PORT}/api/traces/${TRACE_ID}" || true)
  if [ "$STATUS" = "200" ]; then
    echo "Trace ${TRACE_ID} confirmed in Tempo (attempt $i)"
    exit 0
  fi
  echo "Waiting for Tempo to ingest trace... ($i/20, status=$STATUS)"
  sleep 3
done
exit 1
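The polling loop follows a general bounded-retry pattern. As a sketch, it could be factored into a reusable function; wait_for_http_200 and its parameters are hypothetical names, not something the workflow defines.

```shell
#!/usr/bin/env bash
# Sketch: bounded polling of an HTTP endpoint until it answers 200.

# wait_for_http_200 URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once URL answers 200, or 1 after ATTEMPTS unsuccessful tries.
wait_for_http_200() {
  local url="$1" attempts="${2:-20}" delay="${3:-3}" i status
  for i in $(seq 1 "$attempts"); do
    # "|| true" keeps a failed connection from aborting a "set -e" script;
    # the status is then simply not 200 and the loop retries.
    status="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$status" = "200" ]; then
      echo "ready after attempt ${i}"
      return 0
    fi
    echo "waiting... (${i}/${attempts}, status=${status})"
    sleep "$delay"
  done
  return 1
}

# Usage (not executed here):
#   wait_for_http_200 "http://localhost:${OBS_TEMPO_HTTP_PORT}/api/traces/${TRACE_ID}" 20 3
```

With the defaults of 20 attempts and a 3 s delay, the wait is bounded at roughly 60 seconds plus request time, matching the loop above.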
Secret-File Hygiene on the Persistent Runner
Every job that writes a .env file must include the
following step as its final step with
if: always():
- name: Purge ephemeral secret files
  if: always()
  shell: bash
  run: |
    rm -f .env
    find . -maxdepth 2 -type f -name '.env.*' -delete || true
    test ! -e .env
The final test ! -e .env fails the step if .env is still present after the purge.
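The purge commands can be exercised safely in a scratch directory. The sketch below wraps them in a hypothetical purge_env_files function that takes the directory to clean, so it never touches a real checkout.

```shell
#!/usr/bin/env bash
# Sketch: run the purge against a throwaway directory tree.

# purge_env_files is a hypothetical wrapper that runs the purge inside DIR.
purge_env_files() {
  (
    cd "$1" || exit 1
    rm -f .env
    find . -maxdepth 2 -type f -name '.env.*' -delete || true
    # Assert the primary secret file is really gone.
    test ! -e .env
  )
}

demo="$(mktemp -d)"
mkdir -p "$demo/apps"
touch "$demo/.env" "$demo/.env.local" "$demo/apps/.env.bak"
purge_env_files "$demo"
# Prints nothing when no .env or .env.* file remains within two levels.
find "$demo" -maxdepth 2 -name '.env*'
rm -rf "$demo"
```

Note the -maxdepth 2 limit: .env.* files nested more than one directory below the checkout root are not removed by this step.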
Additional hardening recommendations
- Add chmod 600 .env immediately after printf to restrict file permissions to the runner user only.
- The .env file is already listed in .gitignore, preventing accidental commit.
- Consider enabling ephemeral runner mode (fresh workspace per job) when runner infrastructure permits, eliminating the accumulation risk entirely.
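The first recommendation can be sketched as follows. This is an illustration under assumptions, not the workflow's step: write_env_file is a hypothetical helper, the values are dummies, and only the two keys visible in the deploy step are written. Setting a restrictive umask before the write goes slightly beyond the recommendation, so that a newly created file is never briefly readable by other users.

```shell
#!/usr/bin/env bash
# Sketch: append secrets to an env file that only the owner can read.

# write_env_file is a hypothetical helper; callers pass the values in.
write_env_file() {
  local target="$1" app_id="$2" installation_id="$3"
  (
    # With umask 077 a newly created file starts out at mode 600.
    umask 077
    printf 'GITHUB_APP_ID=%s\nGITHUB_APP_INSTALLATION_ID=%s\n' \
      "$app_id" "$installation_id" >> "$target"
  )
  # Also covers the case where the file already existed with a looser mode.
  chmod 600 "$target"
}

demo="$(mktemp -d)"
write_env_file "$demo/.env" 0 0
ls -l "$demo/.env"
rm -rf "$demo"
```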
Required GitHub Secrets & Vars
Core secrets (all workflows)
| Secret name | Used in | Description |
|---|---|---|
| POSTGRES_PASSWORD | db-ci-cd.yml, healthcheck-ci-cd.yml, observability-ci-cd.yml | PostgreSQL superuser password. |
| VALKEY_PASSWORD | healthcheck-ci-cd.yml (deploy), observability-ci-cd.yml | Valkey requirepass value. |
| GH_APP_ID | github-mcp-ci-cd.yml (deploy) | GitHub App numeric ID. The GITHUB_ prefix is reserved by Actions; secret must be named GH_APP_ID. |
| GH_APP_INSTALLATION_ID | github-mcp-ci-cd.yml (deploy) | GitHub App installation ID for the target org/user. |
| GH_APP_PRIVATE_KEY_B64 | github-mcp-ci-cd.yml (deploy) | Base64-encoded PEM private key for the GitHub App. Decoded on-runner to secrets/github_app_key.pem and mounted as a Docker secret. |
Grafana secrets
| Secret name | Description |
|---|---|
| GF_SECURITY_ADMIN_PASSWORD | Grafana admin password. Must change from default changeme. |
| GF_SECURITY_SECRET_KEY | 32-char secret key for Grafana session signing. |
| GF_AUTH_GENERIC_OAUTH_CLIENT_ID | OAuth 2.0 / OIDC client ID (required if SSO enabled). |
| GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET | OAuth 2.0 / OIDC client secret. |
| GF_SMTP_USER | SMTP authentication username (required if SMTP enabled). |
| GF_SMTP_PASSWORD | SMTP authentication password. |
| SLACK_WEBHOOK_URL | Incoming Webhook URL for the #llm-alerts Slack channel. Also set as the literal value in CHANGEME_SLACK_WEBHOOK_URL inside alerting.yaml before container start. |
GitHub Actions vars.* (non-secret)
| Variable | Default | Description |
|---|---|---|
| OBS_OTEL_GRPC_PORT | 4317 | OTel Collector OTLP gRPC host port |
| OBS_OTEL_HTTP_PORT | 4318 | OTel Collector OTLP HTTP host port |
| OBS_OTEL_METRICS_PORT | 8888 | OTel Collector self-metrics host port |
| OBS_OTEL_HEALTH_PORT | 13133 | OTel health_check extension host port |
| OBS_OTEL_ZPAGES_PORT | 55679 | OTel zpages debug UI host port |
| OBS_TEMPO_HTTP_PORT | 3200 | Tempo HTTP API host port |
| OBS_TEMPO_GRPC_PORT | 9095 | Tempo gRPC API host port |
| OBS_PROMETHEUS_PORT | 9091 | Prometheus HTTP API host port (maps to container :9090) |
| OBS_GRAFANA_PORT | 3000 | Grafana UI and API host port |
| GF_AUTH_GENERIC_OAUTH_ENABLED | false | Enable generic OAuth SSO |
| GF_SMTP_ENABLED | false | Enable SMTP for email alerts |
| ALERT_EMAIL_ADDRESSES | (none) | Comma-separated list of alert recipients (also set in alerting.yaml) |
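For local runs outside Actions, the defaults in the table can be applied with ordinary shell parameter expansion. The sketch below is an assumption about how a script might consume these variables, not a description of how the workflow resolves them; obs_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: fall back to the documented default ports when a variable is unset.

# Defaults copied from the variables table; an exported value wins.
: "${OBS_OTEL_HTTP_PORT:=4318}"
: "${OBS_TEMPO_HTTP_PORT:=3200}"
: "${OBS_GRAFANA_PORT:=3000}"

# obs_url PORT [PATH] is a hypothetical helper that builds a localhost URL.
obs_url() {
  printf 'http://localhost:%s%s\n' "$1" "${2:-/}"
}

obs_url "$OBS_OTEL_HTTP_PORT"
obs_url "$OBS_TEMPO_HTTP_PORT" /api/traces/
obs_url "$OBS_GRAFANA_PORT"
```

The `: "${VAR:=default}"` form assigns the default only when the variable is unset or empty, so values injected from the dev environment are left untouched.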
All secrets are scoped to the dev GitHub environment
(Settings → Environments → dev). Non-secret
variables that may conflict with local port allocations are set under
Settings → Environments → dev → Variables.