SpekoTTS is a tts.TTS implementation. Each sentence is synthesised via the Speko proxy, decoded into PCM, chunked into AudioFrames at 50 Hz (20 ms frames), and pushed to the LiveKit session.
import { SpekoTTS } from '@spekoai/adapter-livekit';
import { tts as ttsNs, tokenize } from '@livekit/agents';

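const spekoTTS = new SpekoTTS({
  speko, // an initialised @spekoai/sdk client
  intent: { language: 'en', vertical: 'general' },
  voice: 'sonic-english',
  sampleRate: 24_000, // must match what the upstream provider emits
});

// SpekoTTS is not streaming, so wrap it to synthesise sentence by sentence.
const wrapped = new ttsNs.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());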

Constructor

new SpekoTTS(options: SpekoTTSOptions)

SpekoTTSOptions

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| speko | Speko | yes | @spekoai/sdk client. |
| intent | Intent | yes | Validated at construction time. |
| voice | string? | no | Voice id forwarded to the proxy. |
| speed | number? | no | Speech-speed multiplier forwarded to the proxy. |
| sampleRate | number? | no | Output sample rate advertised to LiveKit. Default 24000 (Cartesia Sonic). Must match what the upstream provider emits. |
| constraints | PipelineConstraints? | no | Allow-list constraints. |

Properties

  • label = 'speko.TTS'
  • provider = 'speko'
  • model = 'speko-router'
  • numChannels = 1, streaming = false

Streaming requirement

SpekoTTS.stream() throws because the proxy is buffered (streaming = false). Wrap the instance in a StreamAdapter:
import { tts, tokenize } from '@livekit/agents';

const adapter = new tts.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());
Or use createSpekoComponents which does this for you.

.synthesize(text, connOptions?, abortSignal?)

Returns a SpekoTTSChunkedStream (exported for type use). Internally:
  1. Calls speko.synthesize(text, { ...intent, voice, speed, constraints }).
  2. Decodes the response via decodeSynthesisResult.
  3. Rejects if the decoded sample rate doesn’t match the configured one, so audio is never played back at the wrong speed and pitch.
  4. Chunks the PCM into AudioFrames of round(sampleRate / 50) samples each via AudioByteStream.
  5. Pushes frames onto the output queue, marking the last one final: true.
Empty provider output throws SpekoTTS: provider returned empty audio.
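
The chunking in step 4 can be sketched in isolation. This is a minimal illustration, not the adapter's code: chunkPcm and PcmFrame are hypothetical names, plain Int16Array slices stand in for LiveKit AudioFrames, and the real AudioByteStream may handle a trailing partial frame differently.

```typescript
// Hypothetical helper, not part of @spekoai/adapter-livekit.
// Splits mono 16-bit PCM into 20 ms frames (50 frames per second).
interface PcmFrame {
  samples: Int16Array;
  final: boolean; // true only on the last frame
}

function chunkPcm(pcm: Int16Array, sampleRate: number): PcmFrame[] {
  if (pcm.length === 0) {
    throw new Error('SpekoTTS: provider returned empty audio');
  }
  // 24000 Hz / 50 = 480 samples per 20 ms frame.
  const samplesPerFrame = Math.round(sampleRate / 50);
  const out: PcmFrame[] = [];
  for (let offset = 0; offset < pcm.length; offset += samplesPerFrame) {
    const end = Math.min(offset + samplesPerFrame, pcm.length);
    out.push({
      samples: pcm.subarray(offset, end), // a view, no copy
      final: end === pcm.length,
    });
  }
  return out;
}

// 1000 samples at 24 kHz: two full frames and one short final frame.
const demoFrames = chunkPcm(new Int16Array(1000), 24_000);
console.log(demoFrames.map((f) => f.samples.length)); // frame sizes: 480, 480, 40
```

In this sketch the final frame is simply shorter than the others; whether the adapter pads or flushes the remainder is not stated here.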

Audio format support (v1)

decodeSynthesisResult(result) branches on result.contentType:
| Content type | Behavior |
| --- | --- |
| audio/pcm;rate=NNNN | Raw PCM, rate parsed from the MIME. Channels pinned to 1 (Cartesia’s contract). |
| audio/wav / audio/x-wav | Header stripped via parseWav. Stereo WAV throws. |
| audio/mpeg | Throws. v1 doesn’t include an MP3 decoder. Pin Cartesia or another PCM-capable provider via constraints. |
| anything else | Throws with provider info for debugging. |
Work around MP3 by pinning your TTS pool:
new SpekoTTS({
  speko,
  intent,
  // Restrict TTS routing to Cartesia, which returns PCM rather than MP3.
  constraints: { allowedProviders: { tts: ['cartesia'] } },
});
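
The content-type branching in the table above could look roughly like the sketch below. It is an illustration under stated assumptions: classifyContentType and AudioKind are hypothetical names, the error messages are invented for the example, and the real decodeSynthesisResult also decodes the audio bytes, which is omitted here.

```typescript
// Hypothetical sketch; the real decodeSynthesisResult may differ.
type AudioKind =
  | { kind: 'pcm'; sampleRate: number }
  | { kind: 'wav' };

function classifyContentType(contentType: string): AudioKind {
  // Split "audio/pcm;rate=24000" into the MIME type and its parameters.
  const [mime = '', ...params] = contentType
    .split(';')
    .map((part) => part.trim().toLowerCase());

  if (mime === 'audio/pcm') {
    const rateParam = params.find((p) => p.startsWith('rate='));
    const sampleRate = Number(rateParam?.slice('rate='.length));
    if (!Number.isInteger(sampleRate) || sampleRate <= 0) {
      throw new Error(`audio/pcm is missing a valid rate parameter: ${contentType}`);
    }
    return { kind: 'pcm', sampleRate };
  }
  if (mime === 'audio/wav' || mime === 'audio/x-wav') {
    return { kind: 'wav' }; // the rate comes from the WAV header instead
  }
  if (mime === 'audio/mpeg') {
    throw new Error('MP3 is not supported; pin a PCM-capable provider');
  }
  throw new Error(`unsupported content type: ${contentType}`);
}

console.log(classifyContentType('audio/pcm;rate=24000'));
```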

Sample-rate mismatch

If the sampleRate option and the decoded rate disagree, synthesis rejects with:
SpekoTTS: provider returned audio at 16000 Hz but the TTS was configured for 24000 Hz. Either set `sampleRate: 16000` on SpekoTTS or pin the Speko router to a provider that matches the expected rate.
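
The check matters because 16000 Hz audio played as 24000 Hz runs 1.5 times too fast and at a correspondingly higher pitch. A minimal sketch of such a guard follows; assertSampleRate is a hypothetical name and this is not necessarily how the adapter implements it. The message text is the one quoted above.

```typescript
// Hypothetical guard, shown only to illustrate the mismatch check.
function assertSampleRate(decodedRate: number, configuredRate: number): void {
  if (decodedRate !== configuredRate) {
    throw new Error(
      `SpekoTTS: provider returned audio at ${decodedRate} Hz but the TTS was configured for ${configuredRate} Hz. ` +
        `Either set \`sampleRate: ${decodedRate}\` on SpekoTTS or pin the Speko router to a provider that matches the expected rate.`,
    );
  }
}

assertSampleRate(24_000, 24_000); // matching rates pass silently

try {
  assertSampleRate(16_000, 24_000);
} catch (err) {
  console.error((err as Error).message);
}
```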

decodeSynthesisResult

Exported for unit testing. Given a SynthesizeResult, returns { pcm, sampleRate, channels }. Throws for unsupported content types (see table above).
import { decodeSynthesisResult } from '@spekoai/adapter-livekit';