Summary
Measuring
Promise 1 — Cheaper
—
Cost reduction versus naive full-session token spend.
Data collection started. Results pending sufficient traffic.
Measuring
Promise 2 — Better
—
Quality lift versus single-vendor baseline.
Requires quality rating data from production sessions.
Measuring
Promise 3 — Barely Slower
Latency benchmark data not yet available.
Methodology
Pipeline latency is measured via an in-process canonical workload benchmark
(mocked vendor calls, 500–2000 samples). It reflects routing, context injection, policy checks,
and telemetry overhead — not vendor API latency, which varies by model and network.
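A minimal sketch of what such an in-process benchmark could look like, assuming a pipeline whose vendor call is replaced by a mock. The names `mock_vendor_call`, `pipeline`, and `benchmark` are hypothetical and are not taken from the project's source; the stage bodies are placeholders that only stand in for routing, context injection, policy checks, and telemetry.

```python
import statistics
import time


def mock_vendor_call(prompt: str) -> str:
    """Hypothetical stand-in for a vendor API call: no network, returns at once."""
    return "ok"


def pipeline(prompt: str) -> str:
    """Hypothetical pipeline; each step is a placeholder for a real stage."""
    routed = prompt.strip()                 # routing (placeholder)
    with_context = "ctx:" + routed          # context injection (placeholder)
    if len(with_context) > 10_000:          # policy check (placeholder)
        raise ValueError("prompt too long")
    response = mock_vendor_call(with_context)
    _telemetry = {"chars": len(with_context)}  # telemetry (placeholder)
    return response


def benchmark(samples: int = 500) -> dict:
    """Time `samples` pipeline runs and report median and 95th percentile in ms."""
    durations_ms = []
    for i in range(samples):
        start = time.perf_counter()
        pipeline(f"sample prompt {i}")
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(durations_ms, n=100)
    return {
        "samples": samples,
        "p50_ms": statistics.median(durations_ms),
        "p95_ms": cuts[94],
    }


if __name__ == "__main__":
    print(benchmark(500))
```

Because the vendor call is mocked, the timings capture only the overhead added by the pipeline itself, which is the quantity described above.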
Cost reduction and quality lift require production traffic with sufficient session depth. These metrics will populate as real usage data accumulates.
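As an illustration of how a cost-reduction figure could be derived once traffic exists, the sketch below assumes that "naive full-session token spend" means re-sending every prior turn on each request. That interpretation, and the function names `naive_full_session_tokens` and `cost_reduction`, are assumptions made for this example and are not confirmed by the source.

```python
from typing import Iterable


def naive_full_session_tokens(turn_tokens: Iterable[int]) -> int:
    """Tokens spent if each turn re-sends all earlier turns (assumed baseline)."""
    total = 0
    running = 0
    for tokens in turn_tokens:
        running += tokens   # the context grows by this turn's tokens
        total += running    # the whole context is paid for again on this turn
    return total


def cost_reduction(naive_tokens: int, actual_tokens: int) -> float:
    """Fraction of the naive spend that was avoided (0.5 means half the cost)."""
    if naive_tokens <= 0:
        raise ValueError("naive_tokens must be positive")
    return 1.0 - actual_tokens / naive_tokens


if __name__ == "__main__":
    naive = naive_full_session_tokens([100, 100, 100])
    print(naive, cost_reduction(naive, 300))
```

The need for sufficient session depth follows from this baseline: under the assumed definition, a one-turn session has nothing to re-send, so the naive spend equals the turn's own tokens and leaves little room for a measurable reduction.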
Benchmarks run nightly via GitHub Actions. Source:
bench-out/three-promise.json.