Speko Docs

SpekoSTT

LiveKit Agents STT adapter backed by POST /v1/transcribe.

SpekoSTT is an stt.STT implementation. It encodes each utterance's audio frames into a WAV payload and uploads it to the Speko proxy. The router picks the best STT provider for your (language, region, optimizeFor) intent and handles failover.

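import { SpekoSTT } from '@spekoai/adapter-livekit';
import { stt as sttNs } from '@livekit/agents';

// `speko` is an existing @spekoai/sdk client and `vad` is a VAD instance;
// both are assumed to be created elsewhere in your agent setup.
const spekoSTT = new SpekoSTT({
  speko,
  intent: { language: 'en-US' },
});

// SpekoSTT is non-streaming, so wrap it with a VAD-driven StreamAdapter.
const wrapped = new sttNs.StreamAdapter(spekoSTT, vad);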

Constructor

new SpekoSTT(options: SpekoSTTOptions)

SpekoSTTOptions

Field        Type                  Required  Description
speko        Speko                 Yes       @spekoai/sdk client.
intent       Intent                Yes       Validated at construction time.
constraints  PipelineConstraints?  No        Allow-list constraints passed on every call.

The constructor calls validateIntent(intent), so an invalid routing hint throws at construction time rather than deep inside the first transcription.
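The fail-fast pattern described here can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: IntentSketch, validateIntentSketch, and FailFastSTT are stand-ins, and the single rule checked (a non-empty language) is an assumption, not the actual validateIntent logic.

```typescript
// Hypothetical stand-ins; the real Intent type and validateIntent rules
// are defined by the Speko packages and may differ.
interface IntentSketch {
  language: string;
  region?: string;
  optimizeFor?: string;
}

// Assumed rule for illustration only: language must be a non-empty string.
function validateIntentSketch(intent: IntentSketch): void {
  if (typeof intent.language !== 'string' || intent.language.trim() === '') {
    throw new Error('intent.language must be a non-empty string');
  }
}

class FailFastSTT {
  readonly intent: IntentSketch;

  constructor(options: { intent: IntentSketch }) {
    // Validate before storing anything, so a bad hint surfaces at
    // construction instead of during the first transcription call.
    validateIntentSketch(options.intent);
    this.intent = options.intent;
  }
}
```

With this shape, new FailFastSTT({ intent: { language: '' } }) throws immediately, while a valid intent constructs normally.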

Properties

  • label = 'speko.STT'
  • provider = 'speko'
  • model = 'speko-router'
  • streaming = false, interimResults = false

Streaming requirement

SpekoSTT.stream() throws because this adapter uploads one VAD-segmented WAV per utterance rather than streaming audio. The /v1/transcribe response itself streams transcript events, but speko.transcribe() aggregates them into the final result for this class. To use SpekoSTT in a streaming pipeline, wrap the instance in a StreamAdapter:

import { stt } from '@livekit/agents';

const adapter = new stt.StreamAdapter(spekoSTT, vad);

Or use createSpekoComponents, which does this for you.

Per-utterance flow

  1. StreamAdapter + VAD segment the user's audio into utterances.
  2. SpekoSTT._recognize(frame, abortSignal) is invoked for each utterance.
  3. Frames are combined (combineAudioFrames) and encoded into PCM16 mono WAV via framesToWav.
  4. The WAV is uploaded via speko.transcribe() with the intent header and any constraints.
  5. The result is emitted as a single FINAL_TRANSCRIPT event with confidence defaulting to 1 when the upstream provider doesn't report one.
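Step 3 can be pictured with a small sketch of PCM16 mono WAV encoding. This is not the adapter's framesToWav or combineAudioFrames; concatSamples and encodeWavPcm16Mono are hypothetical helpers that assume each frame is already an Int16Array of mono samples at a single sample rate.

```typescript
// Hypothetical helper: concatenates per-frame sample buffers into one buffer.
function concatSamples(frames: Int16Array[]): Int16Array {
  const total = frames.reduce((sum, frame) => sum + frame.length, 0);
  const out = new Int16Array(total);
  let offset = 0;
  for (const frame of frames) {
    out.set(frame, offset);
    offset += frame.length;
  }
  return out;
}

// Hypothetical helper: wraps mono PCM16 samples in a canonical
// 44-byte RIFF/WAVE header (little-endian throughout).
function encodeWavPcm16Mono(samples: Int16Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  samples.forEach((sample, i) => {
    view.setInt16(44 + i * bytesPerSample, sample, true);
  });
  return new Uint8Array(view.buffer);
}
```

The output length is always 44 bytes plus two bytes per sample, which makes it easy to sanity-check an utterance payload before upload.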

Aborts propagate: when the session tears down, the AbortSignal passed by StreamAdapter cancels the in-flight HTTP request.
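Steps 4 and 5, together with abort propagation, can be sketched without the real SDK. The names below (UpstreamResult, FinalTranscript, Upload, recognizeSketch, fakeUpload) are hypothetical; fakeUpload stands in for the HTTP call and is only assumed to behave like a request that honors an AbortSignal.

```typescript
// Hypothetical shapes, not the adapter's or SDK's actual types.
interface UpstreamResult {
  text: string;
  confidence?: number;
}

interface FinalTranscript {
  type: 'FINAL_TRANSCRIPT';
  text: string;
  confidence: number;
}

type Upload = (wav: Uint8Array, signal: AbortSignal) => Promise<UpstreamResult>;

// Uploads one utterance and maps the result to a single final event,
// defaulting confidence to 1 when the upstream result omits it.
async function recognizeSketch(
  wav: Uint8Array,
  upload: Upload,
  signal: AbortSignal,
): Promise<FinalTranscript> {
  const result = await upload(wav, signal);
  return {
    type: 'FINAL_TRANSCRIPT',
    text: result.text,
    confidence: result.confidence ?? 1,
  };
}

// Stand-in for the network call: resolves after delayMs unless the
// signal aborts first, in which case it rejects and clears its timer.
function fakeUpload(result: UpstreamResult, delayMs: number): Upload {
  return (_wav, signal) =>
    new Promise<UpstreamResult>((resolve, reject) => {
      if (signal.aborted) {
        reject(new Error('aborted'));
        return;
      }
      const timer = setTimeout(() => resolve(result), delayMs);
      signal.addEventListener(
        'abort',
        () => {
          clearTimeout(timer);
          reject(new Error('aborted'));
        },
        { once: true },
      );
    });
}
```

Calling abort() on the controller that owns the signal rejects the pending recognizeSketch promise, which mirrors how a session teardown cancels the in-flight upload.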

Mono-only

Multi-channel audio throws at the WAV-encode step:

SpekoSTT: expected mono audio (1 channel), got 2. …

Configure your LiveKit AgentSession to pass mono audio, or pre-mix upstream.
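If you choose to pre-mix upstream, one simple approach is to average the two channels. The sketch below assumes interleaved stereo PCM16 input (L, R, L, R, ...); downmixStereoToMono is a hypothetical helper, not part of the adapter or of LiveKit.

```typescript
// Hypothetical pre-mix helper: averages interleaved stereo PCM16 samples
// into a mono buffer of half the length.
function downmixStereoToMono(interleaved: Int16Array): Int16Array {
  if (interleaved.length % 2 !== 0) {
    throw new Error('expected an even number of interleaved stereo samples');
  }
  const mono = new Int16Array(interleaved.length / 2);
  for (let i = 0; i < mono.length; i++) {
    const left = interleaved[2 * i];
    const right = interleaved[2 * i + 1];
    // The average of two 16-bit samples always fits in 16 bits.
    mono[i] = Math.round((left + right) / 2);
  }
  return mono;
}
```

Averaging avoids clipping, at the cost of attenuating content that is present in only one channel.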
