SpekoTTS
LiveKit Agents TTS adapter backed by POST /v1/synthesize.
SpekoTTS is a tts.TTS implementation. Each sentence is synthesised via the Speko proxy, decoded into PCM, chunked into AudioFrames at 50 Hz (20 ms frames), and pushed to the LiveKit session.
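```ts
import { SpekoTTS } from '@spekoai/adapter-livekit';
import { tts as ttsNs, tokenize } from '@livekit/agents';

// `speko` is an @spekoai/sdk client created elsewhere (construction not shown).
const spekoTTS = new SpekoTTS({
  speko,
  intent: { language: 'en' },
  voice: 'sonic-english',
  sampleRate: 24_000,
});

// SpekoTTS does not accept streaming text input, so let LiveKit split text into sentences.
const wrapped = new ttsNs.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());
```
Constructor
```ts
new SpekoTTS(options: SpekoTTSOptions)
```
SpekoTTSOptions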
| Field | Type | Required | Description |
|---|---|---|---|
| speko | Speko | ✅ | @spekoai/sdk client. |
| intent | Intent | ✅ | Validated at construction time. |
| voice | string? | | Voice id forwarded to the proxy. |
| speed | number? | | Speech-speed multiplier forwarded to the proxy. |
| sampleRate | number? | | Output sample rate advertised to LiveKit. Defaults to 24000 (Cartesia Sonic). Must match the rate the upstream provider emits. |
| constraints | PipelineConstraints? | | Allow-list constraints sent with each synthesis request, e.g. `{ allowedProviders: { tts: ['cartesia'] } }`. |
Properties
- `label = 'speko.TTS'`
- `provider = 'speko'`
- `model = 'speko-router'`
- `numChannels = 1`
- `streaming = false`
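Streaming requirement
`SpekoTTS.stream()` throws: the class does not accept streaming text input (`streaming = false`). Each /v1/synthesize request carries a single sentence and streams back that sentence's audio bytes, so sentence tokenization is left to LiveKit's `tts.StreamAdapter`, which splits the incoming text and calls `synthesize()` once per sentence. Wrap it:
```ts
import { tts, tokenize } from '@livekit/agents';

const adapter = new tts.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());
```
Alternatively, use `createSpekoComponents`, which does this wrapping for you.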
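To make the division of labour concrete, here is a simplified, self-contained sketch of sentence-level adaptation: buffer incoming text, cut it at sentence boundaries, and issue one synthesize call per complete sentence. `SynthesizeFn`, `takeCompleteSentences`, and `synthesizeIncrementally` are hypothetical names, and the punctuation-plus-whitespace boundary rule is a naive assumption; LiveKit's `SentenceTokenizer` is more careful (abbreviations, minimum lengths, and so on).

```typescript
// Hypothetical per-sentence synthesizer signature (a stand-in for SpekoTTS.synthesize()).
type SynthesizeFn = (sentence: string) => Promise<Uint8Array>;

// Naive boundary rule (an assumption): one or more of . ! ? followed by whitespace.
// Returns the complete sentences plus the unfinished remainder.
function takeCompleteSentences(buffer: string): [string[], string] {
  const boundary = /[.!?]+\s+/g;
  const sentences: string[] = [];
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(buffer)) !== null) {
    const end = match.index + match[0].length;
    const sentence = buffer.slice(start, end).trim();
    if (sentence.length > 0) sentences.push(sentence);
    start = end;
  }
  return [sentences, buffer.slice(start)];
}

// Buffers streamed text chunks and issues one synthesize() call per complete
// sentence, flushing whatever remains when the input ends.
async function synthesizeIncrementally(
  chunks: readonly string[],
  synthesize: SynthesizeFn,
): Promise<Uint8Array[]> {
  const audio: Uint8Array[] = [];
  let buffer = '';
  for (const chunk of chunks) {
    buffer += chunk;
    const [sentences, rest] = takeCompleteSentences(buffer);
    buffer = rest;
    for (const sentence of sentences) {
      audio.push(await synthesize(sentence));
    }
  }
  const tail = buffer.trim();
  if (tail.length > 0) audio.push(await synthesize(tail));
  return audio;
}
```

Fed the chunks `'Hello wor'`, `'ld. How are'`, `' you? Fine'`, this sketch makes three calls: `'Hello world.'`, `'How are you?'`, and the flushed `'Fine'`.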
.synthesize(text, connOptions?, abortSignal?)
Returns a SpekoTTSChunkedStream (exported for type use). Internally:
- Calls `speko.synthesize(text, { ...intent, voice, speed, constraints })`.
- Decodes the response via `decodeSynthesisResult`.
- Rejects if the decoded sample rate doesn't match the configured one, so the audio is never played back at the wrong speed and pitch.
- Chunks the PCM into `AudioFrame`s of `round(sampleRate / 50)` samples each (20 ms; 480 samples at 24000 Hz) via `AudioByteStream`.
- Pushes the frames onto the output queue, marking the last one `final: true`.
If the provider returns empty audio, synthesis throws `SpekoTTS: provider returned empty audio`.
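The post-decode steps above (rate check, empty-audio check, 20 ms framing) can be sketched without LiveKit. `PcmFrame` and `frameDecodedPcm` are hypothetical names standing in for `AudioFrame` and the adapter's internal logic (which uses `AudioByteStream`); the sketch assumes 16-bit mono samples and shortens the documented mismatch message.

```typescript
// Hypothetical stand-in for LiveKit's AudioFrame (16-bit mono samples).
interface PcmFrame {
  samples: Int16Array;
  sampleRate: number;
  final: boolean;
}

const FRAMES_PER_SECOND = 50; // 20 ms per frame

// Sketch of the post-decode steps: rate check, empty check, 20 ms framing.
function frameDecodedPcm(pcm: Int16Array, decodedRate: number, configuredRate: number): PcmFrame[] {
  if (decodedRate !== configuredRate) {
    // Playing decodedRate audio at configuredRate would scale both speed and
    // pitch by configuredRate / decodedRate.
    throw new Error(
      `SpekoTTS: provider returned audio at ${decodedRate} Hz but the TTS was configured for ${configuredRate} Hz.`,
    );
  }
  if (pcm.length === 0) {
    throw new Error('SpekoTTS: provider returned empty audio');
  }
  const samplesPerFrame = Math.round(configuredRate / FRAMES_PER_SECOND);
  const frames: PcmFrame[] = [];
  for (let offset = 0; offset < pcm.length; offset += samplesPerFrame) {
    frames.push({
      // subarray clamps at the end, so the last frame may be shorter than 20 ms.
      samples: pcm.subarray(offset, offset + samplesPerFrame),
      sampleRate: configuredRate,
      final: false,
    });
  }
  frames[frames.length - 1].final = true; // only the last frame is marked final
  return frames;
}
```

At 24000 Hz each frame holds 480 samples (960 bytes of 16-bit PCM), so 1000 samples become frames of 480, 480, and 40 samples. The rate check matters because 16000 Hz audio played at 24000 Hz would run 1.5 times too fast and sound about 7 semitones sharp. In this sketch a trailing partial frame is emitted as-is rather than padded.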
Audio format support (v1)
decodeSynthesisResult(result) branches on result.contentType:
| Content type | Behavior |
|---|---|
| audio/pcm;rate=NNNN | Raw PCM; the rate is parsed from the MIME type. Channels pinned to 1 (Cartesia's contract). |
| audio/wav, audio/x-wav | Header stripped via parseWav. Stereo WAV throws. |
| audio/mpeg | Throws: v1 doesn't include an MP3 decoder. Pin Cartesia or another PCM-capable provider via constraints. |
| anything else | Throws with provider info for debugging. |
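A self-contained sketch of the dispatch in this table follows. `decodeByContentType`, `parseWavSketch`, and `toInt16` are hypothetical names, not the adapter's exports, and the sketch assumes signed 16-bit little-endian samples for both raw PCM and WAV, which the table does not state. The WAV walker reads the `fmt ` chunk for channel count and rate, then returns the `data` chunk, rejecting anything other than mono.

```typescript
// Hypothetical decoded-audio shape, mirroring { pcm, sampleRate, channels }.
interface DecodedAudio {
  pcm: Int16Array;
  sampleRate: number;
  channels: number;
}

// Assumption: PCM payloads are signed 16-bit little-endian.
function toInt16(bytes: Uint8Array): Int16Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) out[i] = view.getInt16(i * 2, true);
  return out;
}

// Minimal RIFF/WAVE walker: reads the fmt chunk, returns the data chunk, rejects non-mono.
function parseWavSketch(bytes: Uint8Array): DecodedAudio {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (o: number) => String.fromCharCode(bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]);
  if (bytes.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let channels = 0;
  let sampleRate = 0;
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === 'fmt ') {
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      if (view.getUint16(body + 14, true) !== 16) throw new Error('only 16-bit WAV is handled here');
    } else if (id === 'data') {
      if (channels !== 1) throw new Error(`WAV has ${channels} channel(s); only mono is supported`);
      return { pcm: toInt16(bytes.subarray(body, body + size)), sampleRate, channels };
    }
    offset = body + size + (size % 2); // RIFF chunks are padded to even sizes
  }
  throw new Error('WAV has no data chunk');
}

// Dispatch on the MIME type, following the table above.
function decodeByContentType(contentType: string, bytes: Uint8Array): DecodedAudio {
  const [mime, ...params] = contentType.split(';').map((p) => p.trim().toLowerCase());
  if (mime === 'audio/pcm') {
    const rate = params.find((p) => p.startsWith('rate='));
    const sampleRate = rate ? Number(rate.slice('rate='.length)) : NaN;
    if (!Number.isInteger(sampleRate) || sampleRate <= 0) {
      throw new Error(`missing or invalid rate in "${contentType}"`);
    }
    return { pcm: toInt16(bytes), sampleRate, channels: 1 }; // mono, as in the table
  }
  if (mime === 'audio/wav' || mime === 'audio/x-wav') return parseWavSketch(bytes);
  if (mime === 'audio/mpeg') throw new Error('audio/mpeg: no MP3 decoder; pin a PCM-capable provider');
  throw new Error(`unsupported content type "${contentType}"`);
}
```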
To work around MP3 output, pin your TTS provider pool:
```ts
// `speko` and `intent` as in the constructor example above.
new SpekoTTS({
  speko,
  intent,
  constraints: { allowedProviders: { tts: ['cartesia'] } },
});
```
Sample-rate mismatch
If the `sampleRate` option and the decoded rate disagree, synthesis rejects with an error such as:
```text
SpekoTTS: provider returned audio at 16000 Hz but the TTS was configured for 24000 Hz. Either set `sampleRate: 16000` on SpekoTTS or pin the Speko router to a provider that matches the expected rate.
```
decodeSynthesisResult
Exported for unit testing. Given a SynthesizeResult, returns { pcm, sampleRate, channels }. Throws for unsupported content types (see table above).
```ts
import { decodeSynthesisResult } from '@spekoai/adapter-livekit';
```