synthesize

Synthesize text into audio. The router picks the best TTS provider for your (language, region, optimizeFor) and fails over automatically.

const result = await speko.synthesize('Hello world', {
  language: 'en',
});

Signature

speko.synthesize(
  text: string,
  options: SynthesizeOptions,
  abortSignal?: AbortSignal,
): Promise<SynthesizeResult>

speko.synthesizeStream(
  text: string,
  options: SynthesizeOptions,
  abortSignal?: AbortSignal,
): Promise<SynthesizeStreamResult>

`text: string`

The text to synthesize. The server-side cap is 50,000 characters per call (raised from 10,000 to handle long-form audiobook and podcast content). The upstream provider may still apply its own limit. If you need more than 50,000 characters, chunk the script and call synthesize once per chunk.
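One way to do that chunking is sketched below. `chunkText` and `MAX_CHARS` are hypothetical names, not part of the SDK. The sketch assumes the cap can be approximated with JavaScript's `string.length` (UTF-16 code units), which may not match how the server counts characters. It prefers to cut at sentence boundaries and falls back to a hard cut.

```typescript
// Hypothetical helper, not part of the SDK.
// MAX_CHARS mirrors the 50,000-character server-side cap described above.
const MAX_CHARS = 50_000;

function chunkText(text: string, maxChars: number = MAX_CHARS): string[] {
  if (maxChars <= 0) throw new RangeError('maxChars must be positive');
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxChars) {
    // Look for the last sentence or line boundary inside the allowed span.
    const head = rest.slice(0, maxChars);
    const lastStop = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? '),
      head.lastIndexOf('\n'),
    );
    // Cut just after the boundary character, or hard-cut if none was found.
    // Note: a hard cut can split a surrogate pair in unusual inputs.
    const cut = lastStop > 0 ? lastStop + 1 : maxChars;
    const piece = rest.slice(0, cut).trim();
    if (piece.length > 0) chunks.push(piece);
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be passed to speko.synthesize in turn. Whether the resulting audio buffers can simply be concatenated depends on the format, so pinning one provider for the whole run is probably wise.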

`options: SynthesizeOptions`

Extends RoutingIntent:

Field	Type	Description
`language`	`string` (BCP-47)	e.g. `"en"`, `"es-MX"`.
`region`	`string?`	Region to rank streaming providers in. Defaults to `global` server-side.
`optimizeFor`	`OptimizeFor?`	`balanced`, `accuracy`, `latency`, `cost`.
`voice`	`string?`	Voice id override. The router interprets it per provider (e.g. a Cartesia voice UUID). Browse the catalog with `speko.voices.list()`.
`model`	`string?`	Upstream model name (e.g. `eleven_multilingual_v2`, `sonic-2`, `gpt-4o-mini-tts`, `qwen3-tts-flash`). Overrides the selector's choice on the primary candidate only — failover candidates use the selector's model so a model intended for provider A isn't sent to provider B.
`speed`	`number?`	Speech speed multiplier. Providers vary in what range they accept — `1.0` is always neutral.
`constraints`	`PipelineConstraints?`	Allow-list constraints, e.g. `{ allowedProviders: { tts: ['cartesia'] } }`.

`abortSignal?: AbortSignal`

Cancel an in-flight request.
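A common use of the signal is a client-side timeout. The sketch below is a minimal illustration: `synthesizeWithTimeout` and the `SynthesizeClient` interface are hypothetical names, and the interface only models the synthesize signature shown above. The error thrown on abort is not documented here, so the helper simply lets whatever rejection occurs propagate.

```typescript
// Minimal structural type for the one method used here (an assumption for
// illustration; the real client exposes more).
interface SynthesizeClient<O, R> {
  synthesize(text: string, options: O, abortSignal?: AbortSignal): Promise<R>;
}

// Hypothetical helper, not part of the SDK: aborts the request after timeoutMs.
async function synthesizeWithTimeout<O, R>(
  client: SynthesizeClient<O, R>,
  text: string,
  options: O,
  timeoutMs: number,
): Promise<R> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await client.synthesize(text, options, controller.signal);
  } finally {
    // Always clear the timer so it cannot fire after the call has settled.
    clearTimeout(timer);
  }
}
```

With a real client this would be called as `synthesizeWithTimeout(speko, 'Hello world', { language: 'en' }, 5000)`.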

Returns

`SynthesizeResult`

Field	Type	Description
`audio`	`Uint8Array`	Raw audio bytes. Format depends on the chosen provider — always check `contentType`.
`contentType`	`string`	MIME type. ElevenLabs returns `audio/mpeg`. Cartesia returns `audio/pcm;rate=24000`.
`provider`	`string`	Upstream provider that ran the request.
`model`	`string`	Provider-specific model identifier (e.g. voice model name).
`failoverCount`	`number`	Number of providers tried before this one succeeded (`0` when the primary candidate handled the request).
`scoresRunId`	`string \| null`	Scoring run id that selected this provider.

Wire format

The SDK sends POST /v1/synthesize with a JSON body:

{
  "text": "Hello world",
  "intent": { "language": "en", "region": "global", "optimizeFor": "latency" },
  "voice": "…",
  "speed": 1.0,
  "constraints": { "allowedProviders": { "tts": ["cartesia"] } }
}

The response is chunked binary audio. provider, model, failoverCount, and scoresRunId are parsed from response headers (X-Speko-Provider, X-Speko-Model, X-Speko-Failover-Count, X-Speko-Scores-Run-Id). speko.synthesize() consumes the chunks into one Uint8Array; use speko.synthesizeStream() to handle chunks as they arrive.
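If you call the endpoint without the SDK, the metadata has to be read from those headers yourself. The following is a sketch of that parsing step only. `readSpekoMeta`, `HeaderReader`, and `SynthesizeMeta` are hypothetical names, and the fallback values for missing headers (empty string, `0`, `null`) are assumptions rather than documented SDK behavior.

```typescript
// Only the get() method is needed, so the Fetch API Headers class fits this type.
interface HeaderReader {
  get(name: string): string | null;
}

interface SynthesizeMeta {
  provider: string;
  model: string;
  failoverCount: number;
  scoresRunId: string | null;
}

// Hypothetical helper: maps the X-Speko-* response headers to result fields.
function readSpekoMeta(headers: HeaderReader): SynthesizeMeta {
  const count = Number.parseInt(headers.get('X-Speko-Failover-Count') ?? '0', 10);
  return {
    provider: headers.get('X-Speko-Provider') ?? '',
    model: headers.get('X-Speko-Model') ?? '',
    // Assumption: treat a missing or malformed count as 0.
    failoverCount: Number.isNaN(count) ? 0 : count,
    // Assumption: treat a missing or empty header as null.
    scoresRunId: headers.get('X-Speko-Scores-Run-Id') || null,
  };
}
```

With `fetch`, this would be called as `readSpekoMeta(response.headers)` before reading the body.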

Example: write to disk

import { writeFile } from 'node:fs/promises';

const result = await speko.synthesize('Welcome to the clinic.', {
  language: 'en',
  voice: 'sonic-english',
});

const ext = result.contentType.includes('mpeg')
  ? 'mp3'
  : result.contentType.includes('pcm')
    ? 'pcm'
    : 'bin';

await writeFile(`greeting.${ext}`, result.audio);

Example: pin a provider for deterministic output

await speko.synthesize('…', {
  language: 'en',
  constraints: { allowedProviders: { tts: ['cartesia'] } },
});

Example: pin a specific model

Useful for benchmarking (e.g. eleven_v3 vs eleven_multilingual_v2) or for long-form runs where you want to lock in a particular model's stability profile:

await speko.synthesize('…', {
  language: 'en',
  constraints: { allowedProviders: { tts: ['elevenlabs'] } },
  model: 'eleven_multilingual_v2',
});

Format gotchas

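The audio format depends on the provider Speko picks (for example, `audio/mpeg` from ElevenLabs or `audio/pcm;rate=24000` from Cartesia), so the bytes in audio are not interchangeable across providers. If your downstream consumer only handles PCM (e.g. @spekoai/adapter-livekit v1), either pin a PCM provider via constraints or branch on contentType before you decode.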
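Branching on contentType could look roughly like the sketch below. `describeAudio` and `AudioFormat` are hypothetical names. The sketch only recognizes the two MIME types mentioned on this page and assumes the PCM sample rate arrives as a `rate=` parameter, as in `audio/pcm;rate=24000`.

```typescript
type AudioFormat =
  | { kind: 'mp3' }
  | { kind: 'pcm'; sampleRate: number | null }
  | { kind: 'unknown'; contentType: string };

// Hypothetical helper: classifies a contentType string so callers can branch on it.
function describeAudio(contentType: string): AudioFormat {
  // Split "audio/pcm;rate=24000" into the MIME type and its parameters.
  const [mime, ...params] = contentType
    .split(';')
    .map((part) => part.trim().toLowerCase());
  if (mime === 'audio/mpeg') return { kind: 'mp3' };
  if (mime === 'audio/pcm') {
    const rateParam = params.find((p) => p.startsWith('rate='));
    const rate = rateParam ? Number.parseInt(rateParam.slice('rate='.length), 10) : NaN;
    return { kind: 'pcm', sampleRate: Number.isNaN(rate) ? null : rate };
  }
  // Anything else is passed through so the caller can decide what to do.
  return { kind: 'unknown', contentType };
}
```

A PCM-only consumer can then reject or transcode anything whose `kind` is not `'pcm'` instead of feeding it compressed bytes.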