(language, vertical) on three axes:
- Accuracy — STT word error rate, LLM eval pass rate, TTS naturalness.
- Latency — p50 / p95 round-trip time from a representative location.
- Cost — per-second / per-token / per-character price.
Each optimizeFor preset combines the three axes; see routing for the weights.
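As a rough illustration of how a preset could fold the three axes into one number, the sketch below uses a weighted sum. Everything in it is an assumption made for the example: the weight values, the preset names other than accuracy, the AxisScores type, and the idea that each axis is already normalized to [0, 1] with higher meaning better. The real weights are the ones listed in the routing docs.

```python
from dataclasses import dataclass

# Hypothetical weights, for illustration only. The real values are
# documented under routing and may differ in both number and shape.
PRESET_WEIGHTS = {
    "accuracy": {"accuracy": 0.7, "latency": 0.2, "cost": 0.1},
    "latency": {"accuracy": 0.2, "latency": 0.7, "cost": 0.1},
    "cost": {"accuracy": 0.2, "latency": 0.1, "cost": 0.7},
}


@dataclass(frozen=True)
class AxisScores:
    """Per-provider scores, assumed normalized to [0, 1], higher is better."""
    accuracy: float
    latency: float
    cost: float


def combined_score(scores: AxisScores, optimize_for: str) -> float:
    """Weighted sum of the three axes under the given preset."""
    w = PRESET_WEIGHTS[optimize_for]
    return (
        w["accuracy"] * scores.accuracy
        + w["latency"] * scores.latency
        + w["cost"] * scores.cost
    )


def rank(providers: dict, optimize_for: str) -> list:
    """Provider names sorted from best to worst combined score."""
    return sorted(
        providers,
        key=lambda name: combined_score(providers[name], optimize_for),
        reverse=True,
    )
```

With these placeholder weights, a provider that is very accurate but slow can lead under the accuracy preset and lose the top spot under the latency preset, which is the behavior the presets exist to express.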
Refresh cadence
Benchmarks rerun on a schedule and on every benchmark suite update. The active snapshot is identified by scoresRunId, returned with every routing decision. Two calls with identical intent within the same snapshot will route the same way; across snapshots, a re-ranking can move a different provider into the top spot.
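One way to use scoresRunId on the client side is to record it for each routing decision and note when it changes, so that a provider switch can be attributed to a new snapshot. This is a minimal sketch and not part of any Speko SDK: the SnapshotTracker class is hypothetical, and it assumes the routing decision is available as a dict with a scoresRunId key.

```python
from typing import Optional


class SnapshotTracker:
    """Remembers the last scoresRunId seen and reports snapshot changes."""

    def __init__(self) -> None:
        self.last_run_id: Optional[str] = None

    def observe(self, decision: dict) -> bool:
        """Record a routing decision.

        Returns True when the decision came from a different snapshot
        than the previous one, and False on the first observation or
        when the snapshot is unchanged.
        """
        run_id = decision["scoresRunId"]
        changed = self.last_run_id is not None and run_id != self.last_run_id
        self.last_run_id = run_id
        return changed
```

Logging the result of observe alongside the selected provider makes it easier to tell a re-ranking apart from other causes of a routing change.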
Health gating
Providers with a recent error rate above 50% are dropped from candidate selection regardless of score. They re-enter once their error rate recovers.
Why benchmarks beat a single eval
Production traffic is heterogeneous: Spanish healthcare dictation has different leaders than English casual chat. A static “best STT” decision under-serves anything outside the benchmarked happy path. Speko’s routing layer means you get the leader per call, not per integration choice.
Inspecting scores
GET /v1/routing/preview?language=en-US&vertical=healthcare&optimize_for=accuracy returns the same SelectedCandidate shape the router would use, including runnersUp[].score. Use it to debug “why did Speko pick X for this intent?”.
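The sketch below builds the preview URL and pulls the runner-up scores out of a response body. It is illustrative only: the base URL is a placeholder, authentication is left out because this page does not describe it, and the only response field taken from the text is runnersUp[].score. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def build_preview_url(base_url: str, language: str, vertical: str,
                      optimize_for: str) -> str:
    """Build the GET /v1/routing/preview URL for a given intent."""
    query = urlencode({
        "language": language,
        "vertical": vertical,
        "optimize_for": optimize_for,
    })
    return f"{base_url.rstrip('/')}/v1/routing/preview?{query}"


def runner_up_scores(candidate: dict) -> list:
    """Return the score of each entry in runnersUp, in response order."""
    return [entry["score"] for entry in candidate.get("runnersUp", [])]


# Placeholder host; substitute the real API base URL.
url = build_preview_url(
    "https://api.example.com", "en-US", "healthcare", "accuracy"
)
```

Comparing the runner-up scores for the same intent under different optimize_for values is a quick way to see how close the alternatives were.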
Custom benchmarks
Not in v1. The vertical taxonomy is fixed (general, healthcare, finance, legal). Reach out if your traffic doesn’t fit.