SpekoTTS

SpekoTTS is a tts.TTS implementation. Each sentence is synthesised via the Speko proxy, decoded into PCM, chunked into AudioFrames at 50 Hz (20 ms frames), and pushed to the LiveKit session.

import { SpekoTTS } from '@spekoai/adapter-livekit';
import { tts as ttsNs, tokenize } from '@livekit/agents';

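const spekoTTS = new SpekoTTS({
  speko, // existing @spekoai/sdk client instance
  intent: { language: 'en' }, // validated at construction time
  voice: 'sonic-english',
  sampleRate: 24_000, // must match what the upstream provider emits
});

// SpekoTTS is non-streaming, so wrap it in a StreamAdapter for sentence-by-sentence synthesis.
const wrapped = new ttsNs.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());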

Constructor

new SpekoTTS(options: SpekoTTSOptions)

`SpekoTTSOptions`

Field	Type	Required	Description
`speko`	`Speko`	✅	`@spekoai/sdk` client.
`intent`	`Intent`	✅	Validated at construction time.
`voice`	`string?`		Voice id forwarded to the proxy.
`speed`	`number?`		Speech-speed multiplier forwarded to the proxy.
`sampleRate`	`number?`		Output sample rate advertised to LiveKit. Default `24000` (Cartesia Sonic). Must match what the upstream provider emits.
`constraints`	`PipelineConstraints?`		Allow-list constraints.

Properties

label = 'speko.TTS'
provider = 'speko'
model = 'speko-router'
numChannels = 1, streaming = false

Streaming requirement

SpekoTTS.stream() throws. The class is non-streaming (streaming = false): /v1/synthesize streams audio bytes for one sentence per request, and LiveKit's tts.StreamAdapter handles sentence tokenization for this class. Wrap the instance:

import { tts, tokenize } from '@livekit/agents';

const adapter = new tts.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());

Alternatively, use createSpekoComponents, which does this wrapping for you.
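To illustrate why the wrapper is needed, here is a minimal sketch of the sentence-by-sentence pattern. It is not LiveKit's implementation: `splitSentences` is a naive, hypothetical stand-in for `tokenize.basic.SentenceTokenizer`, and `synthesizeOne` is a hypothetical callback standing in for a per-sentence call such as `spekoTTS.synthesize(sentence)`.

```typescript
// Hypothetical, naive sentence splitter (the real tokenizer is more careful
// about abbreviations, numbers, and so on).
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hypothetical driver: one synthesis request per sentence, in order.
// `synthesizeOne` is an assumption standing in for a non-streaming TTS call.
async function synthesizeBySentence<T>(
  text: string,
  synthesizeOne: (sentence: string) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const sentence of splitSentences(text)) {
    results.push(await synthesizeOne(sentence));
  }
  return results;
}
```

Issuing one request per sentence means playback of the first sentence can start while later sentences are still being synthesised.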

`.synthesize(text, connOptions?, abortSignal?)`

Returns a SpekoTTSChunkedStream (exported for type use). Internally:

1. Calls speko.synthesize(text, { ...intent, voice, speed, constraints }).
2. Decodes the response via decodeSynthesisResult.
3. Rejects if the decoded sample rate doesn't match the configured one, so playback is never pitch-shifted.
4. Chunks the PCM into AudioFrames of round(sampleRate / 50) samples each (20 ms per frame) via AudioByteStream.
5. Pushes frames onto the output queue, marking the last one final: true.

Empty provider output throws SpekoTTS: provider returned empty audio.
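The chunking arithmetic can be sketched as follows. This is a simplified stand-in for the `AudioByteStream` step, not the adapter's actual code: the `Frame` type and `chunkPcm` helper are hypothetical, the PCM is assumed to be 16-bit mono, and the last frame is assumed to be allowed to be shorter than the others.

```typescript
// Hypothetical frame shape; the real adapter emits LiveKit AudioFrames.
interface Frame {
  samples: Int16Array; // 16-bit mono PCM (assumed sample format)
  final: boolean; // true only on the last frame
}

// 50 frames per second means each frame covers 20 ms of audio.
function samplesPerFrame(sampleRate: number): number {
  return Math.round(sampleRate / 50);
}

// Hypothetical stand-in for the chunking step described above.
function chunkPcm(pcm: Int16Array, sampleRate: number): Frame[] {
  if (pcm.length === 0) {
    throw new Error('SpekoTTS: provider returned empty audio');
  }
  const size = samplesPerFrame(sampleRate);
  const frames: Frame[] = [];
  for (let offset = 0; offset < pcm.length; offset += size) {
    const end = Math.min(offset + size, pcm.length);
    frames.push({ samples: pcm.subarray(offset, end), final: end === pcm.length });
  }
  return frames;
}
```

At the default 24000 Hz each frame holds 480 samples, so one second of audio becomes 50 frames.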

Audio format support (v1)

decodeSynthesisResult(result) branches on result.contentType:

Content type	Behavior
`audio/pcm;rate=NNNN`	Raw PCM, rate parsed from the MIME. Channels pinned to `1` (Cartesia's contract).
`audio/wav` / `audio/x-wav`	Header stripped via `parseWav`. Stereo WAV throws.
`audio/mpeg`	Throws — v1 doesn't include an MP3 decoder. Pin Cartesia or another PCM-capable provider via `constraints`.
anything else	Throws with provider info for debugging.
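The branching in the table can be approximated with a small content-type classifier. This is a sketch under assumptions, not the exported function: `classifyContentType` and `AudioKind` are hypothetical names, and the error messages are illustrative rather than the adapter's own.

```typescript
// Hypothetical result type for the two supported branches.
type AudioKind =
  | { kind: 'pcm'; sampleRate: number; channels: 1 }
  | { kind: 'wav' };

// Hypothetical classifier mirroring the table above.
function classifyContentType(contentType: string): AudioKind {
  const [mime, ...params] = contentType
    .split(';')
    .map((part) => part.trim().toLowerCase());

  if (mime === 'audio/pcm') {
    // The rate comes from the MIME parameter, e.g. "audio/pcm;rate=24000".
    const rateParam = params.find((p) => p.startsWith('rate='));
    const sampleRate = Number(rateParam?.slice('rate='.length));
    if (!Number.isInteger(sampleRate) || sampleRate <= 0) {
      throw new Error(`audio/pcm without a valid rate parameter: ${contentType}`);
    }
    return { kind: 'pcm', sampleRate, channels: 1 }; // channels pinned to mono
  }
  if (mime === 'audio/wav' || mime === 'audio/x-wav') {
    return { kind: 'wav' }; // header still has to be parsed
  }
  if (mime === 'audio/mpeg') {
    throw new Error('MP3 is not supported in v1');
  }
  throw new Error(`unsupported content type: ${contentType}`);
}
```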

Work around MP3 by pinning your TTS pool:

new SpekoTTS({
  speko,
  intent,
  constraints: { allowedProviders: { tts: ['cartesia'] } },
});

Sample-rate mismatch

If the sampleRate option and the decoded rate disagree, synthesis rejects:

SpekoTTS: provider returned audio at 16000 Hz but the TTS was configured for 24000 Hz. Either set `sampleRate: 16000` on SpekoTTS or pin the Speko router to a provider that matches the expected rate.
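A sketch of the check, and of why it matters: audio played at the wrong rate is sped up or slowed down by the ratio of the two rates. Both helpers are hypothetical illustrations, not the adapter's code; only the error message text is taken from the example above.

```typescript
// Hypothetical guard reproducing the documented error message.
function assertSampleRate(decodedRate: number, configuredRate: number): void {
  if (decodedRate !== configuredRate) {
    throw new Error(
      `SpekoTTS: provider returned audio at ${decodedRate} Hz but the TTS was ` +
        `configured for ${configuredRate} Hz. Either set \`sampleRate: ${decodedRate}\` ` +
        `on SpekoTTS or pin the Speko router to a provider that matches the expected rate.`,
    );
  }
}

// Hypothetical helper: how much faster (and higher-pitched) audio would sound
// if samples recorded at `decodedRate` were played back at `configuredRate`.
function playbackSpeedFactor(decodedRate: number, configuredRate: number): number {
  return configuredRate / decodedRate;
}
```

For the case in the message, 16000 Hz audio played as 24000 Hz would run at 1.5 times normal speed, which is why the mismatch is rejected rather than played.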

`decodeSynthesisResult`

Exported for unit testing. Given a SynthesizeResult, returns { pcm, sampleRate, channels }. Throws for unsupported content types (see table above).

import { decodeSynthesisResult } from '@spekoai/adapter-livekit';
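To make the { pcm, sampleRate, channels } shape concrete, here is a sketch of how the WAV branch could produce it from a standard RIFF/WAVE byte layout. `parseWavSketch` is a hypothetical illustration and not the package's `parseWav`; it assumes 16-bit little-endian PCM samples and, like the documented behaviour, rejects anything that is not mono.

```typescript
interface DecodedAudio {
  pcm: Int16Array;
  sampleRate: number;
  channels: number;
}

// Hypothetical minimal RIFF/WAVE reader (assumes 16-bit PCM samples).
function parseWavSketch(bytes: Uint8Array): DecodedAudio {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number): string =>
    String.fromCharCode(
      view.getUint8(o), view.getUint8(o + 1), view.getUint8(o + 2), view.getUint8(o + 3),
    );
  if (bytes.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }

  let sampleRate = 0;
  let channels = 0;
  let bitsPerSample = 0;
  let offset = 12; // first chunk starts after "RIFF" + size + "WAVE"
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === 'fmt ') {
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      bitsPerSample = view.getUint16(body + 14, true);
    } else if (id === 'data') {
      if (channels !== 1) throw new Error('only mono WAV is supported');
      if (bitsPerSample !== 16) throw new Error('only 16-bit PCM is supported');
      const end = Math.min(body + size, bytes.byteLength);
      const count = (end - body) >> 1; // two bytes per sample
      const pcm = new Int16Array(count);
      for (let i = 0; i < count; i++) pcm[i] = view.getInt16(body + i * 2, true);
      return { pcm, sampleRate, channels };
    }
    offset = body + size + (size % 2); // chunks are padded to even length
  }
  throw new Error('WAV has no data chunk');
}
```

Walking the chunk list instead of assuming a fixed 44-byte header keeps the sketch tolerant of extra chunks (such as `LIST`) that some encoders place before the audio data.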
