synthesize
POST /v1/synthesize: text-to-speech with automatic provider routing.
Synthesize text into audio. The router picks the best TTS provider for your language, region, and optimizeFor settings, and fails over to another provider automatically if that one fails.
const result = await speko.synthesize('Hello world', {
  language: 'en',
});
Signature
speko.synthesize(
  text: string,
  options: SynthesizeOptions,
  abortSignal?: AbortSignal,
): Promise<SynthesizeResult>

speko.synthesizeStream(
  text: string,
  options: SynthesizeOptions,
  abortSignal?: AbortSignal,
): Promise<SynthesizeStreamResult>
Parameters
text: string
The text to synthesize. The server-side cap is 50,000 characters per call (raised from 10,000 to support long-form audiobook and podcast content). The upstream provider may still apply its own limit. For scripts longer than 50,000 characters, split the text into chunks and call synthesize once per chunk.
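One way to do that chunking is sketched below. chunkText is a hypothetical helper, not part of the Speko SDK: it splits a script into pieces of at most maxChars characters, preferring to break after a sentence end, then at a space, and only hard-cutting as a last resort. It assumes the server counts characters the same way String.prototype.length does (UTF-16 code units), which this page does not confirm.

```typescript
// Hypothetical helper (not part of the Speko SDK): split text into chunks of
// at most maxChars characters, preferring sentence ends, then spaces.
// Lengths are String.prototype.length (UTF-16 code units); this is assumed
// to match how the server counts characters. A hard cut can split a
// surrogate pair, so keep maxChars generous.
function chunkText(text: string, maxChars = 50_000): string[] {
  if (maxChars < 1) throw new RangeError('maxChars must be at least 1');
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    // Last sentence end (". ", "! ", "? ") that fits inside this chunk.
    const sentenceEnd = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
    );
    let cut: number;
    if (sentenceEnd > 0) {
      cut = sentenceEnd + 1; // keep the punctuation mark in this chunk
    } else {
      const space = head.lastIndexOf(' ');
      cut = space > 0 ? space : maxChars; // hard cut when there is no space
    }
    const piece = rest.slice(0, cut).trim();
    if (piece.length > 0) chunks.push(piece);
    rest = rest.slice(cut);
  }
  const last = rest.trim();
  if (last.length > 0) chunks.push(last);
  return chunks;
}
```

You would then call speko.synthesize once per element of chunkText(script). Pinning the provider and voice (via constraints and voice) keeps every chunk in the same voice and the same contentType, which matters if you stitch the audio back together.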
options: SynthesizeOptions
Extends RoutingIntent:
| Field | Type | Description |
|---|---|---|
| language | string (BCP-47) | e.g. "en", "es-MX". |
| region | string? | Region in which to rank streaming providers. Defaults to global (applied server-side). |
| optimizeFor | OptimizeFor? | One of balanced, accuracy, latency, or cost. |
| voice | string? | Voice id override. The router interprets it per provider (e.g. a Cartesia voice UUID). Browse the catalog with speko.voices.list(). |
| model | string? | Upstream model name (e.g. eleven_multilingual_v2, sonic-2, gpt-4o-mini-tts, qwen3-tts-flash). Overrides the selector's choice on the primary candidate only. Failover candidates use the selector's model, so a model intended for provider A is never sent to provider B. |
| speed | number? | Speech speed multiplier. The accepted range varies by provider; 1.0 is always neutral. |
| constraints | PipelineConstraints? | Allow-list constraints, e.g. { allowedProviders: { tts: ['cartesia'] } } to restrict routing to the listed providers. |
abortSignal?: AbortSignal
Cancel an in-flight request.
Returns
SynthesizeResult
| Field | Type | Description |
|---|---|---|
| audio | Uint8Array | Raw audio bytes. The format depends on the chosen provider, so always check contentType. |
| contentType | string | MIME type, e.g. audio/mpeg from ElevenLabs or audio/pcm;rate=24000 from Cartesia. |
| provider | string | Upstream provider that ran the request. |
| model | string | Provider-specific model identifier (e.g. the voice model name). |
| failoverCount | number | Number of providers tried before this one succeeded. |
| scoresRunId | string \| null | Id of the scoring run that selected this provider. |
Wire format
The SDK sends POST /v1/synthesize with a JSON body:
{
  "text": "Hello world",
  "intent": { "language": "en", "region": "global", "optimizeFor": "latency" },
  "voice": "…",
  "speed": 1.0,
  "constraints": { "allowedProviders": { "tts": ["cartesia"] } }
}
The response is chunked binary audio. provider, model, failoverCount, and
scoresRunId are parsed from the response headers (X-Speko-Provider,
X-Speko-Model, X-Speko-Failover-Count, X-Speko-Scores-Run-Id).
speko.synthesize() collects the chunks into a single Uint8Array; use
speko.synthesizeStream() to handle chunks as they arrive.
Example: write to disk
import { writeFile } from 'node:fs/promises';

const result = await speko.synthesize('Welcome to the clinic.', {
  language: 'en',
  voice: 'sonic-english',
});

// Pick a file extension from the MIME type the provider returned.
const ext = result.contentType.includes('mpeg')
  ? 'mp3'
  : result.contentType.includes('pcm')
    ? 'pcm'
    : 'bin';
await writeFile(`greeting.${ext}`, result.audio);
Example: pin a provider for deterministic output
await speko.synthesize('…', {
  language: 'en',
  constraints: { allowedProviders: { tts: ['cartesia'] } },
});
Example: pin a specific model
Useful for benchmarking (e.g. eleven_v3 vs eleven_multilingual_v2) or for
long-form runs where you want to lock in a particular model's stability profile:
await speko.synthesize('…', {
  language: 'en',
  constraints: { allowedProviders: { tts: ['elevenlabs'] } },
  model: 'eleven_multilingual_v2',
});
Format gotchas
The audio format (contentType) depends on which provider Speko picks, so it can change between calls, for example after a failover. If your downstream consumer only handles PCM (e.g. @spekoai/adapter-livekit v1), either pin a PCM provider via constraints or branch on contentType before you decode.
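A sketch of branching on contentType before decoding: pass MP3 through unchanged and wrap raw PCM in a WAV header so ordinary players can open it. The helper names are hypothetical. The sample rate is read from the rate= parameter shown for Cartesia (audio/pcm;rate=24000), but the sample encoding is an assumption: the code treats PCM as 16-bit signed little-endian mono, which this page does not state, so check the provider's documentation before relying on it.

```typescript
// Hypothetical helper: split "audio/pcm;rate=24000" into base type and params.
function parseContentType(contentType: string): { mime: string; params: Record<string, string> } {
  const [mime, ...rest] = contentType.split(';').map((s) => s.trim());
  const params: Record<string, string> = {};
  for (const p of rest) {
    const eq = p.indexOf('=');
    if (eq > 0) params[p.slice(0, eq).trim().toLowerCase()] = p.slice(eq + 1).trim();
  }
  return { mime: (mime ?? '').toLowerCase(), params };
}

// Prepend a 44-byte RIFF/WAVE header to raw PCM.
// ASSUMPTION: 16-bit signed little-endian samples, mono by default.
function pcmToWav(pcm: Uint8Array, sampleRate: number, channels = 1, bitsPerSample = 16): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8);
  const header = new DataView(new ArrayBuffer(44));
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < 4; i++) header.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, 'RIFF');
  header.setUint32(4, 36 + pcm.byteLength, true); // RIFF chunk size
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  header.setUint32(16, 16, true); // fmt chunk size
  header.setUint16(20, 1, true); // format 1 = integer PCM
  header.setUint16(22, channels, true);
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, sampleRate * blockAlign, true); // byte rate
  header.setUint16(32, blockAlign, true);
  header.setUint16(34, bitsPerSample, true);
  writeTag(36, 'data');
  header.setUint32(40, pcm.byteLength, true); // data chunk size
  const wav = new Uint8Array(44 + pcm.byteLength);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}

// Branch on contentType: MP3 as-is, PCM wrapped as WAV, anything else rejected.
function toPlayableFile(audio: Uint8Array, contentType: string): { ext: string; bytes: Uint8Array } {
  const { mime, params } = parseContentType(contentType);
  if (mime === 'audio/mpeg') return { ext: 'mp3', bytes: audio };
  if (mime === 'audio/pcm') {
    const rate = Number.parseInt(params.rate ?? '', 10);
    if (!Number.isFinite(rate)) throw new Error(`PCM without a rate parameter: ${contentType}`);
    return { ext: 'wav', bytes: pcmToWav(audio, rate) };
  }
  throw new Error(`Unhandled audio format: ${contentType}`);
}
```

With this in place, the write-to-disk example could save result.audio through toPlayableFile(result.audio, result.contentType) and use the returned ext instead of writing a bare .pcm file. Rejecting unknown formats loudly is deliberate: silently treating an unexpected format as PCM produces noise rather than an error.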