Observability
AMTP ships a full observability stack deployed as a separate Docker Compose project (docker-compose.observability.yml). LLM agents emit structured OpenTelemetry traces and token-count metrics to an OTel Collector, which forwards traces to Grafana Tempo and metrics to Prometheus. Grafana visualizes both signals in a pre-built LLM Token Accounting dashboard and drives an alerting pipeline that notifies Slack and email.
Pipeline Overview #
The end-to-end signal flow from agent execution to Grafana visualization:
- OTel Collector (otel/opentelemetry-collector-contrib:0.100.0)
  - Receivers: OTLP gRPC :4317, OTLP HTTP :4318
  - Processor chain: memory_limiter → attributes/llm_meta → batch
  - Exporters: otlp/tempo (traces), prometheus (metrics)
- Grafana Tempo
  - HTTP :3200, gRPC :9095
  - TraceQL query engine
- Prometheus (amtp-prometheus)
  - :9091 (host)
  - 15 d retention
  - PromQL query engine
- Grafana datasources
  - Grafana Tempo — TraceQL
  - Prometheus — PromQL
| Service | Image | Role | Host port(s) |
|---|---|---|---|
| otel-collector | otel/opentelemetry-collector-contrib:0.100.0 | Receive, process, and route all telemetry | ${OBS_OTEL_GRPC_PORT:-4317}, ${OBS_OTEL_HTTP_PORT:-4318}, ${OBS_OTEL_METRICS_PORT:-8888}, ${OBS_OTEL_HEALTH_PORT:-13133}, ${OBS_OTEL_ZPAGES_PORT:-55679} |
| tempo | grafana/tempo:2.4.1 | Distributed trace storage & query | ${OBS_TEMPO_HTTP_PORT:-3200}, ${OBS_TEMPO_GRPC_PORT:-9095} |
| amtp-prometheus | prom/prometheus:v2.52.0 | Metrics storage, scraping & alerting | ${OBS_PROMETHEUS_PORT:-9091} → 9090 |
| grafana | grafana/grafana:10.4.2 | Visualization, dashboards, alerting UI | ${OBS_GRAFANA_PORT:-3000} |
OpenTelemetry Collector #
The Collector is the single ingestion point for all AMTP telemetry. Configuration: infra/observability/otel-collector-config.yaml.
Receivers #
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins: ["http://*", "https://*"]
```
Processors #
| Processor | Purpose |
|---|---|
| memory_limiter | Hard cap at 512 MiB; spike limit 128 MiB; checked every 1 s. |
| attributes/llm_meta | Upserts telemetry.sdk.name=opentelemetry; inserts pipeline.source=zt-amtp on every span for downstream filtering. |
| batch | Aggregates spans into batches (timeout 1 s, size 1024, max 2048) before forwarding to reduce network overhead. |
Pipeline execution order: memory_limiter → attributes/llm_meta → batch.
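The table above maps onto a processors block roughly like the sketch below; the authoritative values live in infra/observability/otel-collector-config.yaml and may differ in detail:

```yaml
processors:
  memory_limiter:
    check_interval: 1s        # checked every 1 s
    limit_mib: 512            # hard cap
    spike_limit_mib: 128      # spike allowance
  attributes/llm_meta:
    actions:
      - key: telemetry.sdk.name
        value: opentelemetry
        action: upsert
      - key: pipeline.source
        value: zt-amtp
        action: insert
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
```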
Exporters #
| Exporter | Target | Notes |
|---|---|---|
| otlp/tempo | tempo:4317 (gRPC, insecure) | Retry on failure: initial 1 s, max 30 s, elapsed 120 s. Queue: 4 consumers, 1000 entries. |
| prometheus | Scrape endpoint 0.0.0.0:8889 | Namespace llm → metric names become llm_prompt_tokens_total, llm_completion_tokens_total, llm_total_tokens_total. 5-minute metric expiration. Resource attributes converted to labels. |
| debug | Collector stdout | Verbosity: basic. Development aid only. |
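In otel-collector-config.yaml these exporters likely take a shape such as the following sketch (field names follow the stock OTLP and Prometheus exporters; treat the exact layout as an approximation):

```yaml
exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
    retry_on_failure:
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 120s
    sending_queue:
      num_consumers: 4
      queue_size: 1000
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: llm                      # yields llm_*_total metric names
    metric_expiration: 5m
    resource_to_telemetry_conversion:
      enabled: true                     # resource attributes become labels
  debug:
    verbosity: basic
```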
Extensions #
| Extension | Endpoint | Use |
|---|---|---|
| health_check | 0.0.0.0:13133 | External liveness probe (not usable inside the distroless container itself). |
| pprof | 0.0.0.0:1777 | Go pprof profiling for collector performance analysis. |
| zpages | 0.0.0.0:55679 | In-process debug page showing pipeline stats, latency, errors. |
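Tying receivers, processors, exporters, and extensions together, the collector's service block plausibly looks like the sketch below (inferred from the sections above; which exporters attach to the metrics pipeline, and whether debug is wired in, should be confirmed against the real config):

```yaml
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, attributes/llm_meta, batch]
      exporters: [otlp/tempo, debug]    # debug exporter assumed optional
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]
```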
Grafana Tempo 2.4.1 #
Distributed trace backend. Receives spans from the OTel Collector over OTLP gRPC on its internal port 4317 and exposes its HTTP API on port 3200, which Grafana queries with TraceQL.
Configuration: infra/observability/tempo.yaml.
Metrics Generator #
Tempo’s built-in metrics_generator derives span
metrics and service graphs from ingested traces, then remote-writes
them to Prometheus:
```yaml
metrics_generator:
  storage:
    remote_write:
      - url: http://amtp-prometheus:9090/api/v1/write
        send_exemplars: true
  processor:
    service_graphs:
      dimensions: [service.name, span.kind]
    span_metrics:
      dimensions:
        - service.name
        - span.name
        - span.kind
        - status.code
        - llm.model
        - llm.provider
        - gen_ai.system
      enable_target_info: true
```
The span-metric dimensions include llm.model,
llm.provider, and gen_ai.system so Grafana
can break down P95 latency and throughput by model and provider
without requiring a separate labeling step.
Storage & Retention #
- Backend: local filesystem (/var/tempo/blocks) backed by Docker volume tempo_data.
- Retention: block_retention: 72h — traces older than 72 hours are compacted and dropped.
- WAL: /var/tempo/wal for durability across ingester restarts.
- Bloom filter: false-positive rate 0.01; block encoding zstd.
- Max traces per user: 100,000. Ingestion rate limit: 15 MB/s, burst 20 MB/s.
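A rough sketch of how these limits appear in tempo.yaml (key names follow Tempo's standard configuration schema; the byte values for the ingestion limits are assumptions, so check the real file):

```yaml
storage:
  trace:
    backend: local
    local:
      path: /var/tempo/blocks
    wal:
      path: /var/tempo/wal
    block:
      bloom_filter_false_positive: 0.01
compactor:
  compaction:
    block_retention: 72h                 # traces older than 72 h are dropped
overrides:
  max_traces_per_user: 100000
  ingestion_rate_limit_bytes: 15000000   # ~15 MB/s (assumed value)
  ingestion_burst_size_bytes: 20000000   # ~20 MB/s (assumed value)
```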
Prometheus 2.52 (amtp-prometheus) #
Metrics store. The service is named amtp-prometheus (not
the default prometheus) to avoid collisions with other
Prometheus instances on the amtp_net network. Host port
${OBS_PROMETHEUS_PORT:-9091} maps to container port
9090. Configuration:
infra/observability/prometheus.yml.
Scrape Targets #
```yaml
scrape_configs:
  # LLM token metrics emitted by agents via OTEL collector
  - job_name: otel-collector
    static_configs:
      - targets: ['otel-collector:8889']
  # Tempo span metrics (service graphs, span durations) via metrics_generator
  - job_name: tempo
    static_configs:
      - targets: ['tempo:3200']
    metrics_path: /metrics
```
Runtime Flags #
- --storage.tsdb.retention.time=15d — metrics retained for 15 days.
- --web.enable-remote-write-receiver — accepts remote-write from Tempo’s metrics generator.
- --web.enable-lifecycle — exposes /-/reload and /-/quit endpoints.
- External label cluster=zt-amtp added to all scraped metrics.
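Taken together, the amtp-prometheus service definition in docker-compose.observability.yml plausibly resembles the fragment below (the config-file path is an assumption). Note that the cluster=zt-amtp external label is set in prometheus.yml under global.external_labels rather than via a CLI flag:

```yaml
amtp-prometheus:
  image: prom/prometheus:v2.52.0
  command:
    - --config.file=/etc/prometheus/prometheus.yml   # mount path assumed
    - --storage.tsdb.retention.time=15d
    - --web.enable-remote-write-receiver
    - --web.enable-lifecycle
  ports:
    - "${OBS_PROMETHEUS_PORT:-9091}:9090"
```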
Grafana 10.4.2 #
Visualization layer. Depends on both Tempo and Prometheus being healthy before starting. Provisioned via Git-backed YAML files mounted from infra/observability/grafana/provisioning/.
Provisioned Datasources #
| UID | Type | URL | Notes |
|---|---|---|---|
tempo |
Tempo | http://tempo:3200 |
TraceQL; linked to Prometheus for trace→metrics correlation. |
prometheus |
Prometheus | http://amtp-prometheus:9090 |
Default datasource. PromQL for LLM token panels and span metrics. |
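The datasource file under infra/observability/grafana/provisioning/ likely follows Grafana's standard provisioning schema, roughly as sketched here (the file location within provisioning/ and the trace-to-metrics linkage keys are assumptions):

```yaml
apiVersion: 1
datasources:
  - uid: prometheus
    name: Prometheus
    type: prometheus
    access: proxy
    url: http://amtp-prometheus:9090
    isDefault: true
  - uid: tempo
    name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToMetrics:
        datasourceUid: prometheus   # trace → metrics correlation
      serviceMap:
        datasourceUid: prometheus
```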
LLM Token Accounting Dashboard #
UID: llm-token-accounting. Auto-provisioned from
infra/observability/grafana/provisioning/dashboards/llm-token-accounting.json. Refresh: 30
s. Default time range: last 3 hours.
| Panel | Type | Query (simplified) |
|---|---|---|
| Total Tokens | Stat | sum(llm_prompt_tokens_total) + sum(llm_completion_tokens_total) |
| Prompt Tokens | Stat | sum(llm_prompt_tokens_total) |
| Completion Tokens | Stat | sum(llm_completion_tokens_total) |
| P95 LLM Latency | Stat (ms) | histogram_quantile(0.95, sum by(le)(rate(traces_spanmetrics_duration_seconds_bucket{span_name="llm.completion"}[5m]))) * 1000 |
| LLM Calls / sec | Stat (req/s) | sum(rate(traces_spanmetrics_calls_total{span_name="llm.completion"}[5m])) |
| Token Rate (per minute) | Time series | Prompt + Completion + Total rates over 1 m window |
| LLM Trace Search | Traces (Tempo) | Native Tempo search filtered by service_name / span_name template variables |
Generic OAuth / OIDC SSO #
Grafana supports any OAuth 2.0 / OIDC provider via the
GF_AUTH_GENERIC_OAUTH_* environment variable conventions
(mapped to the [auth.generic_oauth] ini section). SSO is
disabled by default
(GF_AUTH_GENERIC_OAUTH_ENABLED=false). Role attribute
path defaults to: grant Admin if the user is in group
grafana-admins, otherwise Viewer.
SMTP for email alerts is likewise disabled by default
(GF_SMTP_ENABLED=false). All configurable values are
documented in .env.example § Observability stack.
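A hedged example of the relevant environment variables on the grafana service (endpoint URLs and scopes are illustrative placeholders; the authoritative list is in .env.example):

```yaml
grafana:
  environment:
    GF_AUTH_GENERIC_OAUTH_ENABLED: "false"          # SSO off by default
    GF_AUTH_GENERIC_OAUTH_CLIENT_ID: CHANGEME
    GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET: CHANGEME
    GF_AUTH_GENERIC_OAUTH_AUTH_URL: https://idp.example.com/authorize
    GF_AUTH_GENERIC_OAUTH_TOKEN_URL: https://idp.example.com/token
    GF_AUTH_GENERIC_OAUTH_API_URL: https://idp.example.com/userinfo
    GF_AUTH_GENERIC_OAUTH_SCOPES: openid profile email
    # Admin if the user belongs to grafana-admins, Viewer otherwise
    GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH: >-
      contains(groups[*], 'grafana-admins') && 'Admin' || 'Viewer'
    GF_SMTP_ENABLED: "false"                        # email alerts off by default
```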
Alerting #
Alerting is provisioned from infra/observability/grafana/provisioning/alerting/alerting.yaml. The provisioning file defines contact points, notification policies, mute timings, and alert rules in a single YAML document.
Contact Points #
| Name | Type | Channel |
|---|---|---|
| slack-llm-alerts | Slack | #llm-alerts via CHANGEME_SLACK_WEBHOOK_URL |
| email-llm-alerts | Email | CHANGEME_ALERT_EMAIL_ADDRESSES (single combined email) |
Notification Policies #
- Default receiver: slack-llm-alerts. Group by: alertname, service_name, cluster.
- Group wait 30 s → interval 5 m → repeat 4 h.
- Sub-route for severity=critical: additionally routes to email-llm-alerts with group wait 0 s, interval 1 m, repeat 1 h, continue: true so Slack still fires.
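In alerting.yaml this policy tree is likely expressed with Grafana's file-provisioning schema along these lines (matcher syntax and orgId are assumptions):

```yaml
policies:
  - orgId: 1
    receiver: slack-llm-alerts
    group_by: [alertname, service_name, cluster]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      - receiver: email-llm-alerts
        object_matchers:
          - [severity, "=", critical]
        group_wait: 0s
        group_interval: 1m
        repeat_interval: 1h
        continue: true        # Slack branch still fires
```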
Mute Timings #
A maintenance-window mute timing suppresses all
notifications on
Saturday and Sunday 00:00–23:59
(server timezone).
Alert Rules #
| Rule UID | Title | Severity | For | Summary |
|---|---|---|---|---|
| otel-collector-down | OTEL Collector Unreachable | critical | 2 m | No scrape data from otel-collector:8888 for >2 minutes. Traces may be lost. |
Pipeline Verification (apps/otel-verify/) #
apps/otel-verify/verify_trace.py is a self-contained Python script that emits one structured LLM trace plus LLM token counters through the real OTel Collector to verify the full observability pipeline end-to-end. It is used by .github/workflows/observability-ci-cd.yml as the post-deploy acceptance gate.
What it emits #
- Trace: a root span llm.completion with two child spans (llm.inference, llm.response.parse). Span attributes match the LLM instrumentation contract below (128 prompt / 64 completion / 192 total tokens; model gpt-4o; provider openai).
- Metrics: counters prompt_tokens, completion_tokens, total_tokens with attributes llm.model, llm.provider, gen_ai.system.
Trace ID extraction #
verify_trace.py prints the phrase “Trace ID”
on multiple lines — once with the 32-hex value and once in the
user-hint line “search by Trace ID above”. A
naïve awk '{print $NF}' produces a multi-line string
that breaks the subsequent curl URL. The canonical
extraction in CI uses a strict regex:
```bash
TRACE_ID=$(
  python apps/otel-verify/verify_trace.py \
    | grep -oE '[0-9a-f]{32}' \
    | head -n 1
)
```
grep -oE '[0-9a-f]{32}' extracts only that token;
head -n 1 collapses duplicate matches.
Tempo polling #
After emitting the trace, CI polls
GET /api/traces/{TRACE_ID} on Tempo in a retry loop (20
attempts × 3 s = 60 s budget):
```bash
for i in $(seq 1 20); do
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
    "http://localhost:${OBS_TEMPO_HTTP_PORT}/api/traces/${TRACE_ID}" || true)
  if [ "$STATUS" = "200" ]; then
    echo "Trace ${TRACE_ID} confirmed in Tempo (attempt $i)"
    exit 0
  fi
  echo "Waiting for Tempo to ingest trace... ($i/20, status=$STATUS)"
  sleep 3
done
exit 1
```
Dependencies #
```
opentelemetry-api==1.24.0
opentelemetry-sdk==1.24.0
opentelemetry-exporter-otlp-proto-grpc==1.24.0
opentelemetry-exporter-otlp-proto-http==1.24.0
```
These packages are installed into a .venv during each CI run.
LLM Instrumentation Contract #
All LLM agent spans must carry the following attributes to be correctly bucketed by the LLM Token Accounting dashboard and the Tempo span-metric dimensions.
Required span attributes #
| Attribute | Type | Example |
|---|---|---|
| llm.provider | string | openai |
| llm.model | string | gpt-4o |
| llm.request.type | string | chat |
| llm.tokens.prompt | int | 128 |
| llm.tokens.completion | int | 64 |
| llm.tokens.total | int | 192 |
| gen_ai.system | string | openai |
| gen_ai.request.model | string | gpt-4o |
Required metric counters #
Emitted as OTLP counter metrics (no llm_ prefix in the
instrument name — the Collector’s Prometheus exporter adds
the llm namespace and the _total suffix):
| Instrument name | Prometheus name | Unit |
|---|---|---|
| prompt_tokens | llm_prompt_tokens_total | token |
| completion_tokens | llm_completion_tokens_total | token |
| total_tokens | llm_total_tokens_total | token |
Required metric attributes: llm.model,
llm.provider, gen_ai.system.
GitHub MCP instrumentation #
The amtp-github-mcp service
emits its own OpenTelemetry spans and metrics via the same OTLP HTTP
endpoint (http://otel-collector:4318).
OTEL_SERVICE_NAME=amtp-github-mcp is set in the container
environment so traces and metrics are attributed separately from the
LLM agents. Key span names: repo.tree,
repo.read_file, github.api,
valkey.get, valkey.set. Key metric
histograms: mcp.tool.duration,
mcp.github_api.duration. These use MCP-specific
attributes (mcp.tool, mcp.cache_hit,
mcp.exclusion_reason) and are not subject to the
LLM instrumentation contract above.
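A minimal sketch of the container environment that produces this separation, assuming the standard OTel SDK environment variables are used for endpoint wiring (only OTEL_SERVICE_NAME is confirmed by the text above):

```yaml
amtp-github-mcp:
  environment:
    OTEL_SERVICE_NAME: amtp-github-mcp                       # confirmed
    OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318  # assumed wiring
    OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf               # assumed
```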
Host Ports & OBS_* Overrides #
Every observability host port is parameterized via an
OBS_* environment variable so that port conflicts on a
shared dev host can be resolved without code changes — set the
relevant variable in GitHub Actions
Settings → Environments → dev → Variables.
| Variable | Default | Service | Purpose |
|---|---|---|---|
| OBS_OTEL_GRPC_PORT | 4317 | otel-collector | OTLP gRPC ingestion endpoint |
| OBS_OTEL_HTTP_PORT | 4318 | otel-collector | OTLP HTTP ingestion endpoint |
| OBS_OTEL_METRICS_PORT | 8888 | otel-collector | Collector self-metrics (Prometheus scrape) |
| OBS_OTEL_HEALTH_PORT | 13133 | otel-collector | health_check extension endpoint |
| OBS_OTEL_ZPAGES_PORT | 55679 | otel-collector | zpages debug UI |
| OBS_TEMPO_HTTP_PORT | 3200 | tempo | Tempo HTTP API (Grafana datasource + trace lookup) |
| OBS_TEMPO_GRPC_PORT | 9095 | tempo | Tempo gRPC API |
| OBS_PROMETHEUS_PORT | 9091 | amtp-prometheus | Prometheus HTTP API (maps to container :9090) |
| OBS_GRAFANA_PORT | 3000 | grafana | Grafana UI and API |
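Each variable maps a host port onto a fixed container port in docker-compose.observability.yml; an illustrative fragment:

```yaml
services:
  otel-collector:
    ports:
      - "${OBS_OTEL_GRPC_PORT:-4317}:4317"
      - "${OBS_OTEL_HTTP_PORT:-4318}:4318"
      - "${OBS_OTEL_METRICS_PORT:-8888}:8888"
      - "${OBS_OTEL_HEALTH_PORT:-13133}:13133"
      - "${OBS_OTEL_ZPAGES_PORT:-55679}:55679"
  tempo:
    ports:
      - "${OBS_TEMPO_HTTP_PORT:-3200}:3200"
      - "${OBS_TEMPO_GRPC_PORT:-9095}:9095"
  amtp-prometheus:
    ports:
      - "${OBS_PROMETHEUS_PORT:-9091}:9090"
  grafana:
    ports:
      - "${OBS_GRAFANA_PORT:-3000}:3000"
```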
Network Dependency (amtp_net) #
The observability stack joins the shared amtp_net Docker network created by the base AMTP stack; on a pristine host that network does not yet exist, so the observability compose project cannot attach to it. Two remediation paths (apply one):
- Workflow-level sequencing (recommended). Add needs: healthcheck to the observability job in .github/workflows/ci-cd.yml. This makes the base-stack bring-up a precondition, visible in the workflow graph.
- Bootstrap-level creation. Add the following idempotent command to infra/bootstrap.sh, after the Docker Engine installation block, so the network pre-exists any CI run:
  docker network create amtp_net --label amtp.owner=bootstrap 2>/dev/null || true
Until one path is applied, on a pristine host manually run either
docker network create amtp_net or
docker compose -f docker-compose.yml up -d --wait healthcheck
before triggering the observability workflow.
See also: Observability CI/CD workflow and Dev-host bootstrap runbook.