Metrics Guide

Every request that passes through a ChaosLLM or ChaosWeb server is recorded in a SQLite database. You can query these metrics to understand error rates, latency distributions, and how your pipeline responds to injected faults.

What Gets Recorded

Each request produces a row in the requests table with fields including:

| Field | Description |
|---|---|
| request_id | Unique ID for this request |
| timestamp_utc | ISO-8601 timestamp |
| endpoint / path | The requested URL path |
| outcome | success, error_injected, error_malformed, error_redirect, etc. |
| status_code | HTTP status code returned (NULL for connection-level errors) |
| error_type | Specific error injected (e.g., rate_limit, timeout, malformed_truncated) |
| injection_type | Category of injection applied |
| latency_ms | Total request duration in milliseconds |
| injected_delay_ms | Artificial delay added (latency simulation + slow response) |

ChaosLLM additionally records:

| Field | Description |
|---|---|
| model | Requested model name (as sent by client, not fabricated) |
| deployment | Azure deployment name (if using Azure endpoint) |
| message_count | Number of messages in the chat request |
| prompt_tokens_approx | Approximate prompt token count |
| response_tokens | Response token count |
| response_mode | Content generation mode used (random, template, echo, preset) |

ChaosWeb additionally records:

| Field | Description |
|---|---|
| content_type_served | Content-Type header returned |
| encoding_served | Actual encoding used (for encoding mismatch errors) |
| redirect_target | SSRF redirect destination URL |
| redirect_hops | Number of hops in redirect chain |
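
When the database is file-backed, the requests table can presumably also be queried directly with any SQLite client. The sketch below builds a stand-in table so that it runs on its own: the column names follow the field tables above, while the column types, the DDL, and the sample rows are assumptions rather than the real schema.

```python
import sqlite3

# Stand-in for the metrics database. Column names come from the field
# tables above; the types and this DDL are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE requests (
        request_id    TEXT PRIMARY KEY,
        timestamp_utc TEXT,
        endpoint      TEXT,
        outcome       TEXT,
        status_code   INTEGER,
        error_type    TEXT,
        latency_ms    REAL
    )
    """
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("req-001", "2025-03-15T14:30:01+00:00", "/v1/chat/completions",
         "success", 200, None, 112.5),
        ("req-002", "2025-03-15T14:30:01+00:00", "/v1/chat/completions",
         "error_injected", 429, "rate_limit", 2.1),
        # Connection-level error: no HTTP status was returned, so NULL.
        ("req-003", "2025-03-15T14:30:02+00:00", "/v1/chat/completions",
         "error_injected", None, "timeout", 30000.0),
    ],
)
conn.commit()


def outcome_breakdown(conn):
    """Return {outcome: (request_count, average_latency_ms)}."""
    rows = conn.execute(
        "SELECT outcome, COUNT(*), AVG(latency_ms) "
        "FROM requests GROUP BY outcome ORDER BY outcome"
    ).fetchall()
    return {outcome: (count, avg) for outcome, count, avg in rows}


breakdown = outcome_breakdown(conn)

# Rows without a status code are the connection-level errors.
connection_level = conn.execute(
    "SELECT COUNT(*) FROM requests WHERE status_code IS NULL"
).fetchone()[0]
```

Against a real file-backed run, only the `sqlite3.connect(...)` target would change; the table creation and inserts exist here just to make the example self-contained.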

Querying via Admin API

All admin endpoints require authentication with Authorization: Bearer <token>.
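
For scripted access, the same header can be attached from Python's standard library. This is a minimal sketch: the helper names `admin_request` and `call_admin` are hypothetical, and the base URL and token are placeholders.

```python
import json
import urllib.request


def admin_request(base_url, path, token, method="GET"):
    """Build an authenticated request for an admin endpoint (not yet sent)."""
    url = base_url.rstrip("/") + path
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    return req


def call_admin(base_url, path, token, method="GET", timeout=10.0):
    """Send the request and decode the JSON body. Needs a running server."""
    req = admin_request(base_url, path, token, method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network.
req = admin_request("http://localhost:8000", "/admin/stats", "example-token")
```

With a server running, `call_admin("http://localhost:8000", "/admin/stats", token)` would return the parsed JSON body as a dictionary.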

GET /admin/stats -- Summary Statistics

Returns aggregated statistics for the current run:

curl http://localhost:8000/admin/stats \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Example response:

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "started_utc": "2025-03-15T14:30:00.123456+00:00",
  "total_requests": 1500,
  "requests_by_outcome": {
    "success": 1275,
    "error_injected": 195,
    "error_malformed": 30
  },
  "error_rate": 15.0,
  "requests_by_status_code": {
    "200": 1305,
    "429": 120,
    "529": 45,
    "503": 15,
    "500": 15
  },
  "latency_stats": {
    "avg_ms": 125.4,
    "p50_ms": 108.2,
    "p95_ms": 215.6,
    "p99_ms": 485.3,
    "max_ms": 15234.1
  }
}

The error_rate value is a percentage: here 225 of the 1,500 requests did not succeed, giving 15.0. The latency_stats object summarizes the latency distribution:

| Field | Description |
|---|---|
| avg_ms | Mean latency across all requests |
| p50_ms | Median latency (50th percentile) |
| p95_ms | 95th percentile latency |
| p99_ms | 99th percentile latency |
| max_ms | Maximum observed latency |
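
To make the fields above concrete, here is a sketch that derives the same five values from a list of latencies using the nearest-rank percentile method. The method is an assumption chosen for illustration; the server may interpolate or approximate differently, so small differences from /admin/stats are possible.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Build a dict shaped like latency_stats from raw latencies."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "max_ms": max(latencies_ms),
    }


# 100 synthetic latencies: 1.0, 2.0, ..., 100.0 ms.
sample = [float(ms) for ms in range(1, 101)]
summary = latency_summary(sample)
```

A large gap between p99_ms and max_ms, as in the example response, usually means a handful of requests hit a long injected delay or timeout.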

GET /admin/export -- Raw Data Export

Returns all raw request records and time-series data for external analysis:

curl http://localhost:8000/admin/export \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Example response:

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "started_utc": "2025-03-15T14:30:00.123456+00:00",
  "requests": [
    {
      "request_id": "req-001",
      "timestamp_utc": "2025-03-15T14:30:01.000000+00:00",
      "endpoint": "/v1/chat/completions",
      "outcome": "success",
      "status_code": 200,
      "latency_ms": 112.5,
      "model": "gpt-4",
      "message_count": 3,
      "response_mode": "random"
    },
    {
      "request_id": "req-002",
      "timestamp_utc": "2025-03-15T14:30:01.500000+00:00",
      "endpoint": "/v1/chat/completions",
      "outcome": "error_injected",
      "status_code": 429,
      "error_type": "rate_limit",
      "latency_ms": 2.1,
      "model": "gpt-4",
      "message_count": 1
    }
  ],
  "timeseries": [
    {
      "bucket_utc": "2025-03-15T14:30:01+00:00",
      "requests_total": 42,
      "requests_success": 36,
      "requests_rate_limited": 4,
      "requests_error": 2,
      "avg_latency_ms": 118.7,
      "p99_latency_ms": 312.4
    }
  ],
  "config": {
    "server": {"host": "127.0.0.1", "port": 8000, "workers": 4},
    "metrics": {"database": "file:chaosllm-metrics?mode=memory&cache=shared", "timeseries_bucket_sec": 1},
    "error_injection": {"rate_limit_pct": 5.0, "...": "..."},
    "response": {"mode": "random", "...": "..."},
    "latency": {"base_ms": 100, "jitter_ms": 50}
  }
}

The export includes the full server configuration used for this run, making it self-documenting for later analysis.
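
As an illustration of external analysis, the sketch below summarizes an export payload in plain Python. The embedded JSON is a trimmed copy of the example response; `summarize_export` is a hypothetical helper, and treating every outcome other than success as an error is an assumption.

```python
import json
from collections import Counter

# Trimmed copy of the example export above.
export_json = """
{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "requests": [
    {"request_id": "req-001", "outcome": "success", "status_code": 200,
     "latency_ms": 112.5, "model": "gpt-4"},
    {"request_id": "req-002", "outcome": "error_injected", "status_code": 429,
     "error_type": "rate_limit", "latency_ms": 2.1, "model": "gpt-4"}
  ]
}
"""


def summarize_export(export):
    """Count outcomes and error types and compute an error percentage."""
    requests = export["requests"]
    outcomes = Counter(r["outcome"] for r in requests)
    # error_type is absent on successful requests, so use .get().
    error_types = Counter(r["error_type"] for r in requests if r.get("error_type"))
    errors = sum(n for outcome, n in outcomes.items() if outcome != "success")
    error_rate = 100.0 * errors / len(requests) if requests else 0.0
    return {
        "run_id": export["run_id"],
        "outcomes": dict(outcomes),
        "error_types": dict(error_types),
        "error_rate": error_rate,
    }


summary = summarize_export(json.loads(export_json))
```

Keeping run_id in the summary makes it easy to match results back to the run and configuration that produced them.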

POST /admin/reset -- Reset Metrics

Clears all request and timeseries data and starts a new run:

curl -X POST http://localhost:8000/admin/reset \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Response:

{
  "status": "reset",
  "new_run_id": "new-uuid-here"
}

Tip

Reset between test scenarios so metrics from one test do not contaminate the next. Each reset generates a new run_id.

Time-Series Aggregation

Metrics are aggregated into time-series buckets using SQLite UPSERT. The bucket size is configurable via timeseries_bucket_sec (default: 1 second).

Each bucket tracks:

  • requests_total -- Total requests in this time window
  • Per-outcome counters (e.g., requests_success, requests_rate_limited)
  • avg_latency_ms -- Average latency for the bucket
  • p99_latency_ms -- Approximate 99th percentile latency

Time-series data is included in the /admin/export response and is useful for observing how error rates and latency change over time, especially around burst windows.
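
The bucket update can be sketched with SQLite's INSERT ... ON CONFLICT DO UPDATE (available from SQLite 3.24). The table layout, the helper names, and the running-average formula below are assumptions for illustration, not the server's actual schema; p99_latency_ms is left out because it cannot be maintained with a simple incremental formula.

```python
import sqlite3
from datetime import datetime, timezone

BUCKET_SEC = 1  # mirrors the timeseries_bucket_sec default


def bucket_start(ts, bucket_sec=BUCKET_SEC):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    epoch = int(ts.timestamp())
    floored = epoch - (epoch % bucket_sec)
    return datetime.fromtimestamp(floored, tz=timezone.utc).isoformat()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE timeseries (
        bucket_utc       TEXT PRIMARY KEY,
        requests_total   INTEGER NOT NULL,
        requests_success INTEGER NOT NULL,
        avg_latency_ms   REAL NOT NULL
    )
    """
)


def record(conn, ts, success, latency_ms):
    """Insert a new bucket row, or fold the request into the existing one."""
    conn.execute(
        """
        INSERT INTO timeseries
            (bucket_utc, requests_total, requests_success, avg_latency_ms)
        VALUES (?, 1, ?, ?)
        ON CONFLICT(bucket_utc) DO UPDATE SET
            requests_total   = requests_total + 1,
            requests_success = requests_success + excluded.requests_success,
            -- Right-hand sides see the pre-update row, so requests_total
            -- here is still the old count.
            avg_latency_ms   = (avg_latency_ms * requests_total
                                + excluded.avg_latency_ms)
                               / (requests_total + 1)
        """,
        (bucket_start(ts), int(success), latency_ms),
    )


utc = timezone.utc
record(conn, datetime(2025, 3, 15, 14, 30, 1, 0, tzinfo=utc), True, 100.0)
record(conn, datetime(2025, 3, 15, 14, 30, 1, 500000, tzinfo=utc), False, 200.0)
record(conn, datetime(2025, 3, 15, 14, 30, 2, 0, tzinfo=utc), True, 50.0)
conn.commit()

buckets = conn.execute(
    "SELECT bucket_utc, requests_total, requests_success, avg_latency_ms "
    "FROM timeseries ORDER BY bucket_utc"
).fetchall()
```

The first two requests land in the same one-second bucket and are merged by the UPSERT; the third opens a new bucket.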

Storage Options

In-Memory (Default)

By default, metrics are stored in a shared in-memory SQLite database:

metrics:
  database: "file:chaosllm-metrics?mode=memory&cache=shared"

This is fast and requires no cleanup, but data is lost when the server stops. The cache=shared URI parameter lets multiple connections in the same process (for example, one per worker thread) share a single in-memory database.
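
The effect of the shared-cache URI can be seen with two plain sqlite3 connections. This sketch uses a made-up database name (demo-metrics) and a two-column stand-in table.

```python
import sqlite3

# Hypothetical name; any name works as long as both connections use the same URI.
URI = "file:demo-metrics?mode=memory&cache=shared"

# uri=True makes sqlite3 parse the query parameters instead of treating
# the whole string as a file name.
writer = sqlite3.connect(URI, uri=True)
reader = sqlite3.connect(URI, uri=True)

writer.execute("CREATE TABLE requests (request_id TEXT, outcome TEXT)")
writer.execute("INSERT INTO requests VALUES ('req-001', 'success')")
writer.commit()  # readers only see committed rows

shared_rows = reader.execute(
    "SELECT request_id, outcome FROM requests"
).fetchall()

# By contrast, a plain ":memory:" connection gets its own private database.
private = sqlite3.connect(":memory:")
private_tables = private.execute("SELECT name FROM sqlite_master").fetchall()
```

The shared database exists only while at least one connection to it is open, which is why the data disappears when the server stops.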

File-Backed

For persistent storage, specify a file path:

uv run chaosllm serve --preset=realistic --database=/tmp/metrics.db

Or in YAML:

metrics:
  database: /tmp/metrics.db

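File-backed databases use WAL (Write-Ahead Logging) mode with synchronous=NORMAL. This gives good write performance and keeps the database consistent after a crash, although with this setting SQLite may lose the most recent commits if the machine loses power. The directory is created automatically if it does not exist.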

Note

In-memory databases use journal_mode=MEMORY and synchronous=OFF for maximum speed, since durability is not a concern.
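
The two configurations can be sketched with plain sqlite3 calls. The function `open_metrics_db` and its in-memory detection rule are hypothetical; only the PRAGMA values and the automatic directory creation come from the text above.

```python
import os
import sqlite3
import tempfile


def open_metrics_db(database):
    """Open a metrics database and apply the PRAGMAs described above."""
    in_memory = database == ":memory:" or "mode=memory" in database
    if in_memory:
        conn = sqlite3.connect(database, uri=database.startswith("file:"))
        conn.execute("PRAGMA journal_mode=MEMORY")
        conn.execute("PRAGMA synchronous=OFF")
    else:
        # Create the parent directory if it does not exist yet.
        parent = os.path.dirname(os.path.abspath(database))
        os.makedirs(parent, exist_ok=True)
        conn = sqlite3.connect(database)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def pragma(conn, name):
    """Read back the current value of a PRAGMA."""
    return conn.execute(f"PRAGMA {name}").fetchone()[0]


tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "nested", "metrics.db")

file_conn = open_metrics_db(db_path)
mem_conn = open_metrics_db(":memory:")

file_settings = (pragma(file_conn, "journal_mode"), pragma(file_conn, "synchronous"))
mem_settings = (pragma(mem_conn, "journal_mode"), pragma(mem_conn, "synchronous"))
```

SQLite reports synchronous as an integer: 0 is OFF and 1 is NORMAL.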

Thread Safety

The MetricsStore uses thread-local SQLite connections. Each worker thread gets its own connection, avoiding contention. Connections are tracked and cleaned up when threads exit.
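
A minimal sketch of the thread-local pattern follows. The class name `ThreadLocalConnections` is hypothetical and this is not the MetricsStore implementation; in particular, it closes connections in one explicit `close_all()` call rather than when each thread exits.

```python
import sqlite3
import threading


class ThreadLocalConnections:
    """Hand each thread its own SQLite connection and track them for cleanup."""

    def __init__(self, database, uri=False):
        self._database = database
        self._uri = uri
        self._local = threading.local()
        self._lock = threading.Lock()
        self._all = []  # every connection handed out, for close_all()

    def get(self):
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # check_same_thread=False lets close_all() close connections
            # that were created on other threads.
            conn = sqlite3.connect(
                self._database, uri=self._uri, check_same_thread=False
            )
            self._local.conn = conn
            with self._lock:
                self._all.append(conn)
        return conn

    def close_all(self):
        with self._lock:
            for conn in self._all:
                conn.close()
            self._all.clear()


pool = ThreadLocalConnections("file:demo-threads?mode=memory&cache=shared", uri=True)
seen = {}


def worker(name):
    first = pool.get()
    second = pool.get()  # same thread, so the same connection comes back
    seen[name] = (id(first), first is second)


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

connection_count = len(pool._all)
pool.close_all()
```

Each worker reuses its own connection across calls, and the two workers never share one.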

Metrics recording is best-effort: if a SQLite write fails, the error is logged but the chaos response is still returned to the client. A metrics side-effect should never replace an intended chaos response with an unintended real 500.
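
The best-effort rule can be illustrated with a small wrapper. The function names, the two-column insert, and the hard-coded 429 response are hypothetical; the point is only that a failed write is logged and never changes what the client receives.

```python
import logging
import sqlite3

logger = logging.getLogger("metrics")


def record_best_effort(conn, row):
    """Try to record a request; log and swallow any SQLite failure."""
    try:
        conn.execute(
            "INSERT INTO requests (request_id, outcome) VALUES (?, ?)", row
        )
        conn.commit()
        return True
    except sqlite3.Error:
        logger.exception("metrics write failed; chaos response is unaffected")
        return False


def handle_request(conn, request_id):
    """Decide the chaos response first, then record it without risking it."""
    planned_response = (429, {"error": "rate_limit"})
    record_best_effort(conn, (request_id, "error_injected"))
    return planned_response


# This database has no requests table, so every write fails.
broken = sqlite3.connect(":memory:")
response = handle_request(broken, "req-001")
write_ok = record_best_effort(broken, ("req-002", "success"))
```

Even though both writes fail, the caller still receives the planned 429 rather than an unintended 500.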

Python API

When using the server programmatically, you have direct access to metrics:

from errorworks.llm.server import ChaosLLMServer
from errorworks.llm.config import load_config

config = load_config(preset="realistic")
server = ChaosLLMServer(config)

# After running some requests...
stats = server.get_stats()
print(f"Total: {stats['total_requests']}, Error rate: {stats['error_rate']:.1f}%")

# Export everything
data = server.export_metrics()

# Reset for next test
new_run_id = server.reset()