Speko Docs

@spekoai/adapter-livekit

LiveKit Agents adapter — route STT, LLM, and TTS through Speko.

@spekoai/adapter-livekit bridges a LiveKit Agents worker to the Speko proxy. Drop it into a standard agent entry file and Speko's router selects an STT, LLM, and TTS provider for each call based on the routing intent you pass. Failover happens server-side, and your worker ships only a Speko API key, never provider API keys.

Install

npm install @spekoai/sdk @spekoai/adapter-livekit \
            @livekit/agents @livekit/agents-plugin-silero @livekit/rtc-node

@livekit/agents and @livekit/rtc-node are peer dependencies — pin the versions you actually run against in your own package.json.

Quickstart

import {
  type JobContext,
  type JobProcess,
  ServerOptions,
  cli,
  defineAgent,
  voice,
} from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';
import { Speko } from '@spekoai/sdk';
import { createSpekoComponents } from '@spekoai/adapter-livekit';
import { fileURLToPath } from 'node:url';

// The worker authenticates with a Speko API key only; no provider keys are needed.
const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

export default defineAgent({
  // Load the Silero VAD model during prewarm so the job entry can reuse it.
  prewarm: async (proc: JobProcess) => {
    proc.userData.vad = await silero.VAD.load();
  },
  entry: async (ctx: JobContext) => {
    const vad = ctx.proc.userData.vad as silero.VAD;

    // Build Speko-backed components; STT and TTS come back already wrapped for streaming.
    const { stt, llm, tts } = createSpekoComponents({
      speko,
      vad,
      intent: { language: 'en-US', optimizeFor: 'balanced' },
    });

    const session = new voice.AgentSession({ vad, stt, llm, tts });

    await session.start({
      agent: new voice.Agent({
        instructions: 'You are a helpful voice assistant. Be concise.',
      }),
      room: ctx.room,
    });

    await ctx.connect();
    // Have the agent speak first once the room is connected.
    session.generateReply({ instructions: 'Greet the user and offer your assistance.' });
  },
});

cli.runApp(
  new ServerOptions({
    agent: fileURLToPath(import.meta.url),
    agentName: 'speko-demo',
  }),
);

Architecture

The adapter exports three @livekit/agents-compatible classes — SpekoSTT, SpekoLLM, SpekoTTS — and a convenience factory createSpekoComponents() that wraps STT and TTS with StreamAdapter helpers so Speko's streaming REST proxy can participate in a streaming voice.AgentSession:

  • SpekoSTT declares { streaming: false }, so it must be wrapped with new stt.StreamAdapter(spekoSTT, vad) to segment utterances with VAD before calling /v1/transcribe.
  • SpekoTTS synthesizes one sentence at a time, so it is wrapped with new tts.StreamAdapter(spekoTTS, sentenceTokenizer), which splits the incoming text into sentences and makes one streaming /v1/synthesize call per sentence.
  • SpekoLLM is used directly — it's a llm.LLM backed by streaming /v1/complete responses.

createSpekoComponents handles the wrapping for you and returns { stt, llm, tts } ready to pass to voice.AgentSession.
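
To make "sentence-bounded" concrete, here is a minimal, self-contained sketch of how streamed LLM text can be buffered into whole sentences, each of which would then become one synthesis request. SentenceBuffer and sentencesFromChunks are hypothetical names used for illustration only; this is a deliberately simplified stand-in and not the tokenizer that LiveKit or the adapter actually uses.

```typescript
// Illustrative only: a simplified stand-in for a sentence tokenizer.
// Not LiveKit's or the adapter's implementation.
class SentenceBuffer {
  private pending = '';

  // Feed one streamed text chunk; returns any sentences it completes.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // Assumption: a sentence ends at ., !, or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call at end of stream to emit whatever text is left.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = '';
    return rest ? [rest] : [];
  }
}

// Each returned sentence would map to one synthesis request.
function sentencesFromChunks(chunks: string[]): string[] {
  const buffer = new SentenceBuffer();
  const out: string[] = [];
  for (const chunk of chunks) out.push(...buffer.push(chunk));
  out.push(...buffer.flush());
  return out;
}
```

Because a sentence is only released once its terminating punctuation and the following whitespace have arrived, audio for a sentence cannot start until the LLM has produced that whole sentence. That is the latency trade-off behind the sentence-bounded design.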

v1 limitations

  • STT request upload is utterance-bounded. /v1/transcribe streams transcript events back, but this adapter still uploads one VAD-segmented WAV per utterance instead of full-duplex microphone audio.
  • TTS remains sentence-bounded in LiveKit. /v1/synthesize streams audio bytes; the adapter still calls it once per tokenized sentence.
  • Tool calls are supported. Inline tools return to the LiveKit runtime; registered webhook, builtin, and integration tools run server-side through /v1/complete.
  • TTS output format. Accepts audio/pcm;rate=NNNN (Cartesia) and audio/wav. Throws on audio/mpeg (ElevenLabs MP3) — pick a routing intent that prefers Cartesia, or pin a PCM-capable provider via constraints.allowedProviders.tts.
  • STT input format. Mono PCM16, encoded into a WAV wrapper per utterance. Multi-channel frames throw. The adapter uploads audio at whatever sample rate the AudioFrame carries; Speko handles sample-rate conversion downstream.
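
The two format constraints above can be sketched in plain TypeScript. The helpers below are hypothetical (pcmRateFromContentType and encodeMonoPcm16Wav are not names from the adapter, and the error messages are invented); they only illustrate the kind of checks described: accepting audio/pcm;rate=NNNN and audio/wav while rejecting audio/mpeg, and wrapping mono PCM16 samples in a standard 44-byte WAV header.

```typescript
// Hypothetical helpers for illustration; not the adapter's actual code.

// Returns the sample rate for "audio/pcm;rate=NNNN", or null for "audio/wav"
// (the content type then carries no rate). Throws for anything else.
function pcmRateFromContentType(contentType: string): number | null {
  const [mime, ...params] = contentType.split(';').map((p) => p.trim().toLowerCase());
  if (mime === 'audio/wav') return null;
  if (mime === 'audio/pcm') {
    const rateParam = params.find((p) => p.startsWith('rate='));
    const rate = rateParam ? Number(rateParam.slice('rate='.length)) : NaN;
    if (!Number.isInteger(rate) || rate <= 0) {
      throw new Error(`audio/pcm needs a positive integer rate: ${contentType}`);
    }
    return rate;
  }
  throw new Error(`Unsupported TTS content type: ${contentType}`);
}

// Wraps mono PCM16 samples in a 44-byte RIFF/WAVE header.
function encodeMonoPcm16Wav(samples: Int16Array, sampleRate: number, channels = 1): Uint8Array {
  if (channels !== 1) throw new Error('Only mono audio is supported');
  const dataBytes = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true); // sample rate passed through unchanged
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2 bytes
  view.setUint16(32, 2, true); // block align = channels * 2 bytes
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataBytes, true);
  samples.forEach((s, i) => view.setInt16(44 + i * 2, s, true)); // little-endian samples
  return new Uint8Array(buffer);
}
```

Note that the sample rate is written into the header as-is with no resampling, which matches the description above: conversion is left to Speko.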

Reference
