# Speko (/)

One API for every voice provider. Speko benchmarks STT, LLM, and TTS in real time and routes each call to the best provider for your language and latency/cost target.

Speko is a voice gateway. You ship one integration; we route each request to the highest-scoring provider for your `(language, region, optimizeFor)` intent. Failover is server-side. Providers rotate without a code change.

## Two ways to integrate [#two-ways-to-integrate]

* Call `/v1/transcribe`, `/v1/synthesize`, `/v1/complete` directly. Best for batch jobs and server pipelines.
* Real-time voice in the browser. Mic in, agent voice out, transcripts on the data channel.
* Inbound and outbound PSTN calls with lifecycle webhooks, reports, recordings, and transfers.

## Start here [#start-here]

* Sign up, mint an API key, make your first transcribe call.
* Intent, scoring, failover. The model behind every Speko call.
* Use your own provider credentials. Speko routes; you pay providers directly.
* Every `/v1/*` endpoint, request/response shape, headers.

## SDKs [#sdks]

* TypeScript HTTP client.
* Browser SDK for real-time voice.
* Async + sync Python client.
* Drop-in STT/LLM/TTS for your existing voice agent framework.

# Audio helpers (/adapter-livekit/audio)

WAV encode / decode and MIME parsing utilities.

The adapter exports the three audio helpers it uses internally. They're stable exports — safe to reuse if you're building custom pipelines or writing tests.

```ts
import {
  framesToWav,
  parseWav,
  pcmSampleRateFromContentType,
} from '@spekoai/adapter-livekit';
```

## `framesToWav` [#framestowav]

```ts
function framesToWav(buffer: AudioBuffer): Uint8Array;
```

Encode one or more LiveKit `AudioFrame`s (or an array) into a PCM16 mono WAV byte stream. Used by `SpekoSTT` to wrap each utterance before uploading to `/v1/transcribe`.

* Combines frames via `combineAudioFrames` from `@livekit/rtc-node`.
* Writes a standard 44-byte RIFF/WAVE header: `fmt ` chunk (PCM, 16-bit, mono, `sampleRate` from frames) + `data` chunk.
* Sample rate is pulled from the input frames — whatever LiveKit gives you is what's encoded.

**Mono-only.** A multi-channel `AudioBuffer` throws:

```
SpekoSTT: expected mono audio (1 channel), got 2. Configure your LiveKit AgentSession to pass mono audio or pre-mix upstream of the STT.
```

## `parseWav` [#parsewav]

```ts
function parseWav(bytes: Uint8Array): {
  pcm: Uint8Array;
  sampleRate: number;
  channels: number;
};
```

Minimal PCM16 WAV parser. Used by `SpekoTTS` to unwrap WAV-encoded proxy responses into raw samples for `AudioByteStream`.

Accepted subset:

* Valid `RIFF` / `WAVE` header.
* `fmt ` chunk present and of `format = 1` (PCM).
* 16-bit samples.
* `data` chunk reachable by walking subsequent chunks (tolerates e.g. `LIST` chunks between `fmt ` and `data`).

Anything outside this subset throws a descriptive error. `channels` is returned as-is — the caller is responsible for deciding whether stereo is acceptable. `SpekoTTS` currently throws on stereo.

## `pcmSampleRateFromContentType` [#pcmsampleratefromcontenttype]

```ts
function pcmSampleRateFromContentType(
  contentType: string,
  fallback: number,
): number;
```

Parse the `rate` parameter out of a Cartesia-style content type:

```ts
pcmSampleRateFromContentType('audio/pcm;rate=24000', 16_000); // 24000
pcmSampleRateFromContentType('audio/pcm', 16_000); // 16000
pcmSampleRateFromContentType('audio/pcm;rate=abc', 16_000); // 16000
```

Falls back when the rate is missing, zero, or unparseable. Case-insensitive on `rate=`.

## Intended usage [#intended-usage]

You shouldn't need these helpers when consuming the adapter through [`createSpekoComponents`](/adapter-livekit/create-speko-components) — they're used internally by `SpekoSTT` and `SpekoTTS`. They're exported for:

* **Unit tests** — build canned WAV fixtures with `framesToWav`, round-trip them through `parseWav`.
* **Custom STT / TTS pipelines** that need to reuse the same WAV framing Speko uses.
* **Debugging** — decode what an upstream provider returned without instantiating a full TTS.

# createSpekoComponents (/adapter-livekit/create-speko-components)

Build a { stt, llm, tts } bundle ready for voice.AgentSession.

`createSpekoComponents` is the one-call wiring helper for `voice.AgentSession`. It constructs `SpekoSTT`, `SpekoLLM`, `SpekoTTS` from a single options object and wraps STT and TTS with LiveKit's `StreamAdapter` so Speko's streaming REST proxy can drive a streaming session.

```ts
import { createSpekoComponents } from '@spekoai/adapter-livekit';

const { stt, llm, tts } = createSpekoComponents({
  speko,
  vad,
  intent: { language: 'en-US', optimizeFor: 'balanced' },
});

const session = new voice.AgentSession({ vad, stt, llm, tts });
```

## Signature [#signature]

```ts
function createSpekoComponents(
  options: CreateSpekoComponentsOptions,
): SpekoComponents;
```

## `CreateSpekoComponentsOptions` [#createspekocomponentsoptions]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `speko` | `Speko` | ✅ | Initialised `@spekoai/sdk` client. |
| `intent` | [`Intent`](/adapter-livekit/intent) | ✅ | Routing hint shared by STT, LLM, and TTS. |
| `vad` | `VAD` | ✅ | VAD instance used by the `stt.StreamAdapter`. Typically `await silero.VAD.load()`. |
| `voice` | `string?` | | Voice id passed to `SpekoTTS` (maps to the Speko proxy's `voice` param). |
| `constraints` | `PipelineConstraints?` | | Allow-list constraints applied to all three modalities. |
| `sentenceTokenizer` | `tokenize.SentenceTokenizer?` | | Tokenizer for chunking LLM output before TTS. Defaults to `tokenize.basic.SentenceTokenizer`. |
| `llm` | `{ temperature?, maxTokens? }?` | | Tuning forwarded to `/v1/complete`. |
| `ttsOptions` | `{ sampleRate?, speed? }?` | | Output sample rate and speech speed forwarded to `SpekoTTS`. |
| `agentId` | `string?` | | Enables the [registered-tools loader](/guides/tool-calling). When set, the adapter calls `speko.agents.tools.listChatTools(agentId)` once per session — using the `speko` client you pass for auth and base URL — and merges the result with LiveKit's runtime `ToolContext`. Registered tools win on name collision. Omit to keep runtime-only behavior. |
| `apiBaseUrl` | `string?` | | **Deprecated and ignored** — the loader reads the base URL from the `speko` client. Safe to omit. |
| `apiKey` | `string?` | | **Deprecated and ignored** — the loader reads the API key from the `speko` client. Safe to omit. |
| `onRegisteredToolsError` | `(err: Error) => void?` | | Called once if the registered-tools fetch fails. Voice session keeps running with runtime-only tools — this is a soft degradation, not a crash. |

## Registered tools [#registered-tools]

When `agentId` is set, `createSpekoComponents` constructs a `RegisteredToolsLoader` for the underlying `SpekoLLM`. The loader lazily calls `speko.agents.tools.listChatTools(agentId)` on the first `chat()` of each session — reusing the `Speko` client you pass for auth and base URL — and caches the result for the LLM's lifetime. Voice sessions live for seconds-to-minutes and `chat()` is called many times — re-fetching every turn would be wasteful. (`apiBaseUrl`/`apiKey` are deprecated and ignored; the `speko` client carries both.)

On collision with a runtime tool of the same name, the registered tool wins (it's the customer's authoritative declaration). Fetch failures are non-fatal — the loader returns `undefined` and the agent continues with runtime tools only, calling `onRegisteredToolsError` once.
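
The caching, soft-failure, and collision rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the adapter's source: `ToolSketch`, `LazyToolsLoaderSketch`, and `mergeTools` are hypothetical names, and the injected `fetchTools` function stands in for `speko.agents.tools.listChatTools(agentId)`.

```ts
// Hypothetical tool shape for illustration; the real ChatTool type comes from the SDK.
interface ToolSketch {
  name: string;
  description: string;
}

class LazyToolsLoaderSketch {
  private cached: Promise<ToolSketch[] | undefined> | undefined;
  private readonly fetchTools: () => Promise<ToolSketch[]>;
  private readonly onError: ((err: Error) => void) | undefined;

  constructor(fetchTools: () => Promise<ToolSketch[]>, onError?: (err: Error) => void) {
    this.fetchTools = fetchTools;
    this.onError = onError;
  }

  // The first call starts the fetch; later calls reuse the same promise,
  // so the fetch (and the error callback) runs at most once per loader.
  load(): Promise<ToolSketch[] | undefined> {
    if (this.cached === undefined) {
      this.cached = Promise.resolve()
        .then(() => this.fetchTools())
        .catch((err: unknown) => {
          if (this.onError) {
            this.onError(err instanceof Error ? err : new Error(String(err)));
          }
          return undefined; // soft degradation: caller falls back to runtime tools
        });
    }
    return this.cached;
  }
}

// Merge by name; registered tools overwrite runtime tools on collision.
function mergeTools(runtime: ToolSketch[], registered: ToolSketch[] | undefined): ToolSketch[] {
  const byName = new Map<string, ToolSketch>();
  for (const tool of runtime) byName.set(tool.name, tool);
  for (const tool of registered ?? []) byName.set(tool.name, tool);
  return Array.from(byName.values());
}
```

Caching the promise rather than the resolved value means concurrent first calls share one request instead of racing to start two.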
`listChatTools` returns every source kind — `inline`, `webhook`, `builtin`, and `integration` — already in the `ChatTool[]` shape `/v1/complete` accepts. See the [tool calling guide](/guides/tool-calling) for the full picture.

## Returns — `SpekoComponents` [#returns--spekocomponents]

```ts
interface SpekoComponents {
  stt: stt.StreamAdapter; // wraps SpekoSTT + vad
  llm: SpekoLLM; // used directly
  tts: tts.StreamAdapter; // wraps SpekoTTS + sentenceTokenizer
}
```

Drop the returned object straight into a `voice.AgentSession`.

## Custom sentence tokenizer [#custom-sentence-tokenizer]

```ts
import { tokenize } from '@livekit/agents';

const { stt, llm, tts } = createSpekoComponents({
  speko,
  vad,
  intent,
  sentenceTokenizer: new tokenize.basic.SentenceTokenizer({ minSentenceLength: 20 }),
});
```

Use a longer minimum sentence length if you want fewer, longer TTS calls at the cost of latency before the first audio chunk.

## Constraints shared across modalities [#constraints-shared-across-modalities]

```ts
createSpekoComponents({
  speko,
  vad,
  intent: { language: 'en' },
  constraints: {
    allowedProviders: {
      stt: ['deepgram'],
      llm: ['anthropic'],
      tts: ['cartesia'],
    },
  },
});
```

Every underlying call (`/v1/transcribe`, `/v1/complete`, `/v1/synthesize`) receives the same constraints object.

## Opting out — use classes directly [#opting-out--use-classes-directly]

If you need finer control, construct the classes yourself. `createSpekoComponents` is a convenience wrapper; nothing stops you from building the pipeline manually.
```ts
import { SpekoSTT, SpekoLLM, SpekoTTS } from '@spekoai/adapter-livekit';
import { stt, tts, tokenize } from '@livekit/agents';

const spekoSTT = new SpekoSTT({ speko, intent });
const wrappedSTT = new stt.StreamAdapter(spekoSTT, vad);

const spekoLLM = new SpekoLLM({ speko, intent, temperature: 0.7 });

const spekoTTS = new SpekoTTS({ speko, intent, voice: 'sonic-english' });
const wrappedTTS = new tts.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());
```

# Intent (/adapter-livekit/intent)

Routing hint type and construction-time validator.

`Intent` is the routing hint every adapter class takes. It's a re-export of `RoutingIntent` from `@spekoai/sdk`, so anything you already have typed as a `RoutingIntent` passes through without conversion.

```ts
import type { Intent, OptimizeFor } from '@spekoai/adapter-livekit';
```

## Type [#type]

```ts
type Intent = {
  language: string; // BCP-47
  region?: string; // e.g. "global", "us-east4", "europe-west3"
  optimizeFor?: 'balanced' | 'accuracy' | 'latency' | 'cost';
};
```

## `validateIntent(intent)` [#validateintentintent]

Throws a descriptive `Error` when the intent is malformed. Called by every adapter class constructor, so a bad intent fails at construction time rather than deep inside the first STT / LLM / TTS call.

```ts
import { validateIntent } from '@spekoai/adapter-livekit';

validateIntent({ language: 'en-US' }); // ok

validateIntent({ language: '' });
// throws: SpekoAdapter: intent.language is required (BCP-47 tag)

validateIntent({ language: 'en', optimizeFor: 'speed' as any });
// throws: SpekoAdapter: unknown optimizeFor "speed". Expected one of: balanced, accuracy, latency, cost.
```

Validation rules:

* `language` must be a non-empty string.
* `region`, if set, is forwarded to Speko for region-aware latency ranking.
* `optimizeFor`, if set, must be one of `balanced`, `accuracy`, `latency`, `cost`.
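
The rules above are small enough to sketch directly. The following is an illustrative re-implementation, not the exported `validateIntent`: `IntentSketch`, `OptimizeForSketch`, and `validateIntentSketch` are hypothetical names, and the error strings are copied from the examples earlier on this page.

```ts
type OptimizeForSketch = 'balanced' | 'accuracy' | 'latency' | 'cost';

interface IntentSketch {
  language: string;
  region?: string;
  optimizeFor?: OptimizeForSketch;
}

const OPTIMIZE_FOR_VALUES: readonly string[] = ['balanced', 'accuracy', 'latency', 'cost'];

function validateIntentSketch(intent: IntentSketch): void {
  // Rule 1: language must be a non-empty string (no BCP-47 syntax check).
  if (typeof intent.language !== 'string' || intent.language.length === 0) {
    throw new Error('SpekoAdapter: intent.language is required (BCP-47 tag)');
  }
  // Rule 2: region is forwarded as-is, so there is nothing to check here.
  // Rule 3: optimizeFor, when present, must be one of the known values.
  //         The runtime check matters because callers can bypass the type with a cast.
  if (intent.optimizeFor !== undefined && OPTIMIZE_FOR_VALUES.indexOf(intent.optimizeFor) === -1) {
    throw new Error(
      `SpekoAdapter: unknown optimizeFor "${intent.optimizeFor}". ` +
        `Expected one of: ${OPTIMIZE_FOR_VALUES.join(', ')}.`,
    );
  }
}
```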

No BCP-47 syntactic validation beyond "is a non-empty string" — the router accepts short codes (`en`) and region-tagged codes (`es-MX`) and normalises downstream.

## Sharing one intent [#sharing-one-intent]

The adapter pattern is "one intent per agent session, shared across modalities":

```ts
const intent: Intent = { language: 'en-US', region: 'global', optimizeFor: 'latency' };

const { stt, llm, tts } = createSpekoComponents({ speko, vad, intent });
```

If you need per-modality divergence (e.g. latency-optimised STT with cost-optimised TTS), construct the classes directly:

```ts
const sttAdapter = new SpekoSTT({ speko, intent: { ...intent, optimizeFor: 'latency' } });
const ttsAdapter = new SpekoTTS({ speko, intent: { ...intent, optimizeFor: 'cost' } });
```

# @spekoai/adapter-livekit (/adapter-livekit/overview)

LiveKit Agents adapter — route STT, LLM, and TTS through Speko.

`@spekoai/adapter-livekit` bridges a [LiveKit Agents](https://docs.livekit.io/agents/) worker to the Speko proxy. Drop it into a standard agent entry file and the router picks the best STT, LLM, and TTS provider per call. Failover is server-side; you don't ship provider API keys.

## Install [#install]

```sh
npm install @spekoai/sdk @spekoai/adapter-livekit \
  @livekit/agents @livekit/agents-plugin-silero @livekit/rtc-node
```

`@livekit/agents` and `@livekit/rtc-node` are peer dependencies — pin the versions you actually run against in your own `package.json`.

## Quickstart [#quickstart]

```ts
import {
  type JobContext,
  type JobProcess,
  ServerOptions,
  cli,
  defineAgent,
  voice,
} from '@livekit/agents';
import * as silero from '@livekit/agents-plugin-silero';
import { Speko } from '@spekoai/sdk';
import { createSpekoComponents } from '@spekoai/adapter-livekit';
import { fileURLToPath } from 'node:url';

const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

export default defineAgent({
  prewarm: async (proc: JobProcess) => {
    proc.userData.vad = await silero.VAD.load();
  },
  entry: async (ctx: JobContext) => {
    const vad = ctx.proc.userData.vad as silero.VAD;

    const { stt, llm, tts } = createSpekoComponents({
      speko,
      vad,
      intent: { language: 'en-US', optimizeFor: 'balanced' },
    });

    const session = new voice.AgentSession({ vad, stt, llm, tts });

    await session.start({
      agent: new voice.Agent({
        instructions: 'You are a helpful voice assistant. Be concise.',
      }),
      room: ctx.room,
    });

    await ctx.connect();
    session.generateReply({ instructions: 'Greet the user and offer your assistance.' });
  },
});

cli.runApp(
  new ServerOptions({
    agent: fileURLToPath(import.meta.url),
    agentName: 'speko-demo',
  }),
);
```

## Architecture [#architecture]

The adapter exports three `@livekit/agents`-compatible classes — `SpekoSTT`, `SpekoLLM`, `SpekoTTS` — and a convenience factory `createSpekoComponents()` that wraps STT and TTS with `StreamAdapter` helpers so Speko's streaming REST proxy can participate in a streaming `voice.AgentSession`:

* **`SpekoSTT`** declares `{ streaming: false }`, so it must be wrapped with `new stt.StreamAdapter(spekoSTT, vad)` to segment utterances with VAD before calling `/v1/transcribe`.
* **`SpekoTTS`** is sentence-bounded in LiveKit, so it is wrapped with `new tts.StreamAdapter(spekoTTS, sentenceTokenizer)` before each streaming `/v1/synthesize` call.
* **`SpekoLLM`** is used directly — it's a `llm.LLM` backed by streaming `/v1/complete` responses.

`createSpekoComponents` handles the wrapping for you and returns `{ stt, llm, tts }` ready to pass to `voice.AgentSession`.

## v1 limitations [#v1-limitations]

* **STT request upload is utterance-bounded.** `/v1/transcribe` streams transcript events back, but this adapter still uploads one VAD-segmented WAV per utterance instead of full-duplex microphone audio.
* **TTS remains sentence-bounded in LiveKit.** `/v1/synthesize` streams audio bytes; the adapter still calls it once per tokenized sentence.
* **Tool calls are supported.** Inline tools return to the LiveKit runtime; registered webhook, builtin, and integration tools run server-side through `/v1/complete`.
* **TTS output format.** Accepts `audio/pcm;rate=NNNN` (Cartesia) and `audio/wav`. Throws on `audio/mpeg` (ElevenLabs MP3) — pick a routing intent that prefers Cartesia, or pin a PCM-capable provider via `constraints.allowedProviders.tts`.
* **STT input format.** Mono PCM16, encoded into a WAV wrapper per utterance. Multi-channel frames throw. Speko handles sample-rate conversion downstream — whatever the `AudioFrame` carries is what's uploaded.

## Reference [#reference]

* [`createSpekoComponents`](/adapter-livekit/create-speko-components) — convenience factory.
* [`SpekoSTT`](/adapter-livekit/speko-stt) — STT class.
* [`SpekoLLM`](/adapter-livekit/speko-llm) — LLM class.
* [`SpekoTTS`](/adapter-livekit/speko-tts) — TTS class.
* [`Intent`](/adapter-livekit/intent) — routing hint type and validator.
* [Audio helpers](/adapter-livekit/audio) — WAV encode/decode utilities.

# SpekoLLM (/adapter-livekit/speko-llm)

LiveKit Agents LLM adapter backed by POST /v1/complete.

`SpekoLLM` is a `llm.LLM` implementation. It flattens a LiveKit `ChatContext` into Speko's `messages` format and calls the proxy. The router picks the best LLM provider per intent and fails over automatically.

```ts
import { SpekoLLM } from '@spekoai/adapter-livekit';

const spekoLLM = new SpekoLLM({
  speko,
  intent: { language: 'en' },
  temperature: 0.7,
  maxTokens: 400,
});
```

Unlike STT and TTS, `SpekoLLM` doesn't need a `StreamAdapter`. It calls the streaming `/v1/complete` endpoint through the SDK and emits a LiveKit `LLMStream` chunk when the routed completion is ready.
## Constructor [#constructor]

```ts
new SpekoLLM(options: SpekoLLMOptions)
```

### `SpekoLLMOptions` [#spekollmoptions]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `speko` | `Speko` | ✅ | `@spekoai/sdk` client. |
| `intent` | [`Intent`](/adapter-livekit/intent) | ✅ | Validated at construction time. |
| `temperature` | `number?` | | Forwarded to `/v1/complete`. |
| `maxTokens` | `number?` | | Forwarded to `/v1/complete`. |
| `constraints` | `PipelineConstraints?` | | Allow-list constraints. |
| `agentId` | `string?` | | When set, enables the registered-tools loader. The adapter calls `speko.agents.tools.listChatTools(agentId)` once per session — using the `speko` client for auth and base URL — and merges the result with LiveKit's runtime `ToolContext`. Registered tools win on collision. Omit to keep runtime-only behavior. See [tool calling](/guides/tool-calling). |
| `apiBaseUrl` | `string?` | | **Deprecated and ignored** — the loader reads the base URL from the `speko` client. Safe to omit. |
| `apiKey` | `string?` | | **Deprecated and ignored** — the loader reads the API key from the `speko` client. Safe to omit. |
| `onRegisteredToolsError` | `(err: Error) => void?` | | Called once if the registered-tools fetch fails. Soft degradation — the call continues with runtime-only tools rather than crashing. |

## Properties [#properties]

* `label() → 'speko.LLM'`
* `provider = 'speko'`
* `model = 'speko-router'`

## `.chat(params)` [#chatparams]

Standard LiveKit LLM entry point.
Returns an `LLMStream` that emits a `ChatChunk` carrying the assistant response or tool calls, then closes.

Signature (from `@livekit/agents`):

```ts
chat(params: {
  chatCtx: llm.ChatContext;
  toolCtx?: llm.ToolContext;
  connOptions?: APIConnectOptions;
  parallelToolCalls?: boolean;
  toolChoice?: llm.ToolChoice;
  extraKwargs?: Record<string, unknown>;
}): llm.LLMStream;
```

The emitted chunk includes usage:

```ts
{
  id: '',
  delta: { role: 'assistant', content: result.text },
  usage: {
    promptTokens: result.usage.promptTokens,
    completionTokens: result.usage.completionTokens,
    promptCachedTokens: 0,
    totalTokens: result.usage.promptTokens + result.usage.completionTokens,
  },
}
```

## Context conversion — `chatContextToSpeko` [#context-conversion--chatcontexttospeko]

Exported for when you want to reuse the flattening logic (e.g. unit tests, custom pipelines).

```ts
import { chatContextToSpeko } from '@spekoai/adapter-livekit';

const messages = chatContextToSpeko(chatCtx);
```

Rules:

* Only `llm.ChatMessage` items are considered. Function-call and handoff items are skipped.
* Roles are normalised: `developer` → `system`; `system` / `user` / `assistant` pass through; anything else is dropped.
* Empty `textContent` messages are skipped.
* Ordering is preserved.

If the result is empty, `.chat()` rejects with `SpekoAdapterError('INVALID_CONTEXT')`.

## Tool calls [#tool-calls]

Runtime tools from LiveKit's `toolCtx` are forwarded as inline tools. Registered webhook, builtin, and integration tools can also be loaded by `agentId` — the same set `speko.agents.tools.listChatTools(agentId)` returns — and executed server-side by Speko before the final response is returned to LiveKit.

## Errors [#errors]

* `SpekoAdapterError` (exported): thrown for adapter-internal problems. `code` is one of:
  * `'INVALID_CONTEXT'` — `ChatContext` produced no convertible messages.
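
To connect the `INVALID_CONTEXT` code to the conversion rules listed under Context conversion, here is a rough sketch of the flattening and the empty-result check. It is an illustration only: `ChatItemSketch` is a simplified, hypothetical stand-in for LiveKit's chat items, the `{ role, content }` message shape is an assumption, and `flattenContextSketch`, `requireMessagesSketch`, and `AdapterErrorSketch` are not the adapter's real exports.

```ts
type RoleSketch = 'system' | 'user' | 'assistant';

interface MessageOutSketch {
  role: RoleSketch;
  content: string;
}

// Hypothetical, simplified stand-in for a LiveKit ChatContext item.
interface ChatItemSketch {
  type: string; // 'message', 'function_call', ...
  role?: string;
  textContent?: string;
}

class AdapterErrorSketch extends Error {
  code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

function normaliseRole(role: string | undefined): RoleSketch | undefined {
  if (role === 'developer') return 'system';
  if (role === 'system' || role === 'user' || role === 'assistant') return role;
  return undefined; // any other role is dropped
}

function flattenContextSketch(items: ChatItemSketch[]): MessageOutSketch[] {
  const messages: MessageOutSketch[] = [];
  for (const item of items) {
    if (item.type !== 'message') continue; // skip function calls and handoffs
    const role = normaliseRole(item.role);
    if (role === undefined) continue;
    if (!item.textContent) continue; // skip empty text
    messages.push({ role, content: item.textContent }); // input order is preserved
  }
  return messages;
}

// Mirrors the documented behaviour of rejecting when nothing is convertible.
function requireMessagesSketch(messages: MessageOutSketch[]): MessageOutSketch[] {
  if (messages.length === 0) {
    throw new AdapterErrorSketch('INVALID_CONTEXT', 'ChatContext produced no convertible messages');
  }
  return messages;
}
```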

API-layer errors from the underlying `speko.complete()` surface unchanged — `SpekoApiError`, `SpekoAuthError`, `SpekoRateLimitError` from `@spekoai/sdk`.

# SpekoSTT (/adapter-livekit/speko-stt)

LiveKit Agents STT adapter backed by POST /v1/transcribe.

`SpekoSTT` is a `stt.STT` implementation. It encodes each utterance's audio frames into a WAV payload and uploads it to the Speko proxy. The router picks the best STT provider for your `(language, region, optimizeFor)` and handles failover.

```ts
import { SpekoSTT } from '@spekoai/adapter-livekit';
import { stt as sttNs } from '@livekit/agents';

const spekoSTT = new SpekoSTT({
  speko,
  intent: { language: 'en-US' },
});

const wrapped = new sttNs.StreamAdapter(spekoSTT, vad);
```

## Constructor [#constructor]

```ts
new SpekoSTT(options: SpekoSTTOptions)
```

### `SpekoSTTOptions` [#spekosttoptions]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `speko` | `Speko` | ✅ | `@spekoai/sdk` client. |
| `intent` | [`Intent`](/adapter-livekit/intent) | ✅ | Validated at construction time. |
| `constraints` | `PipelineConstraints?` | | Allow-list constraints passed on every call. |

The constructor calls `validateIntent(intent)` — a broken routing hint throws here rather than deep inside the first transcription.

## Properties [#properties]

* `label = 'speko.STT'`
* `provider = 'speko'`
* `model = 'speko-router'`
* `streaming = false`, `interimResults = false`

## Streaming requirement [#streaming-requirement]

`SpekoSTT.stream()` throws because this adapter uploads one VAD-segmented WAV per utterance. The `/v1/transcribe` response itself streams transcript events, and `speko.transcribe()` aggregates the final result for this class.
Wrap the instance:

```ts
import { stt } from '@livekit/agents';

const adapter = new stt.StreamAdapter(spekoSTT, vad);
```

Or use [`createSpekoComponents`](/adapter-livekit/create-speko-components) which does this for you.

## Per-utterance flow [#per-utterance-flow]

1. `StreamAdapter` + VAD segment the user's audio into utterances.
2. `SpekoSTT._recognize(frame, abortSignal)` is invoked for each utterance.
3. Frames are combined (`combineAudioFrames`) and encoded into PCM16 mono WAV via [`framesToWav`](/adapter-livekit/audio#framestowav).
4. The WAV is uploaded via `speko.transcribe()` with the intent header and any `constraints`.
5. The result is emitted as a single `FINAL_TRANSCRIPT` event with confidence defaulting to `1` when the upstream provider doesn't report one.

Aborts propagate: when the session tears down, the `AbortSignal` passed by `StreamAdapter` cancels the in-flight HTTP request.

## Mono-only [#mono-only]

Multi-channel audio throws at the WAV-encode step:

```
SpekoSTT: expected mono audio (1 channel), got 2. …
```

Configure your LiveKit `AgentSession` to pass mono audio, or pre-mix upstream.

# SpekoTTS (/adapter-livekit/speko-tts)

LiveKit Agents TTS adapter backed by POST /v1/synthesize.

`SpekoTTS` is a `tts.TTS` implementation. Each sentence is synthesised via the Speko proxy, decoded into PCM, chunked into `AudioFrame`s at 50 Hz (20 ms frames), and pushed to the LiveKit session.
```ts
import { SpekoTTS } from '@spekoai/adapter-livekit';
import { tts as ttsNs, tokenize } from '@livekit/agents';

const spekoTTS = new SpekoTTS({
  speko,
  intent: { language: 'en' },
  voice: 'sonic-english',
  sampleRate: 24_000,
});

const wrapped = new ttsNs.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());
```

## Constructor [#constructor]

```ts
new SpekoTTS(options: SpekoTTSOptions)
```

### `SpekoTTSOptions` [#spekottsoptions]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `speko` | `Speko` | ✅ | `@spekoai/sdk` client. |
| `intent` | [`Intent`](/adapter-livekit/intent) | ✅ | Validated at construction time. |
| `voice` | `string?` | | Voice id forwarded to the proxy. |
| `speed` | `number?` | | Speech-speed multiplier forwarded to the proxy. |
| `sampleRate` | `number?` | | Output sample rate advertised to LiveKit. Default `24000` (Cartesia Sonic). Must match what the upstream provider emits. |
| `constraints` | `PipelineConstraints?` | | Allow-list constraints. |

## Properties [#properties]

* `label = 'speko.TTS'`
* `provider = 'speko'`
* `model = 'speko-router'`
* `numChannels = 1`, `streaming = false`

## Streaming requirement [#streaming-requirement]

`SpekoTTS.stream()` throws because LiveKit's TTS `StreamAdapter` handles sentence tokenization for this class. `/v1/synthesize` streams audio bytes for each sentence request. Wrap:

```ts
import { tts, tokenize } from '@livekit/agents';

const adapter = new tts.StreamAdapter(spekoTTS, new tokenize.basic.SentenceTokenizer());
```

Or use [`createSpekoComponents`](/adapter-livekit/create-speko-components) which does this for you.

## `.synthesize(text, connOptions?, abortSignal?)` [#synthesizetext-connoptions-abortsignal]

Returns a `SpekoTTSChunkedStream` (exported for type use). Internally:

1. Calls `speko.synthesize(text, { ...intent, voice, speed, constraints })`.
2. Decodes the response via [`decodeSynthesisResult`](#decodesynthesisresult).
3. Rejects if the decoded sample rate doesn't match the configured one — ensures playback isn't pitched.
4. Chunks the PCM into `AudioFrame`s of `round(sampleRate / 50)` samples each via `AudioByteStream`.
5. Pushes frames onto the output queue, marking the last one `final: true`.

Empty provider output throws `SpekoTTS: provider returned empty audio`.

## Audio format support (v1) [#audio-format-support-v1]

`decodeSynthesisResult(result)` branches on `result.contentType`:

| Content type | Behavior |
| --- | --- |
| `audio/pcm;rate=NNNN` | Raw PCM, rate parsed from the MIME. Channels pinned to `1` (Cartesia's contract). |
| `audio/wav` / `audio/x-wav` | Header stripped via [`parseWav`](/adapter-livekit/audio#parsewav). Stereo WAV throws. |
| `audio/mpeg` | Throws — v1 doesn't include an MP3 decoder. Pin Cartesia or another PCM-capable provider via `constraints`. |
| anything else | Throws with provider info for debugging. |

Work around MP3 by pinning your TTS pool:

```ts
new SpekoTTS({
  speko,
  intent,
  constraints: { allowedProviders: { tts: ['cartesia'] } },
});
```

## Sample-rate mismatch [#sample-rate-mismatch]

If `sampleRate` option and the decoded rate disagree, synthesis rejects:

```
SpekoTTS: provider returned audio at 16000 Hz but the TTS was configured for 24000 Hz. Either set `sampleRate: 16000` on SpekoTTS or pin the Speko router to a provider that matches the expected rate.
```

## `decodeSynthesisResult` [#decodesynthesisresult]

Exported for unit testing. Given a `SynthesizeResult`, returns `{ pcm, sampleRate, channels }`. Throws for unsupported content types (see table above).
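
As a rough illustration of the content-type branching in the table above, the dispatch could look something like the sketch below. It is not the exported function: `classifyContentTypeSketch` and `rateFromContentTypeSketch` are hypothetical names, the sketch only classifies the MIME type and does not touch audio bytes, and falling back to a caller-supplied rate when `rate=` is absent is an assumption borrowed from the documented `pcmSampleRateFromContentType` behaviour.

```ts
type DecodedKindSketch =
  | { kind: 'pcm'; sampleRate: number; channels: 1 }
  | { kind: 'wav' };

// Parse `rate=NNNN` (case-insensitive); fall back when missing, zero, or unparseable.
function rateFromContentTypeSketch(contentType: string, fallback: number): number {
  const match = /;\s*rate=(\d+)/i.exec(contentType);
  if (!match) return fallback;
  const rate = Number.parseInt(match[1] ?? '', 10);
  return Number.isFinite(rate) && rate > 0 ? rate : fallback;
}

function classifyContentTypeSketch(contentType: string, fallbackRate: number): DecodedKindSketch {
  // Compare on the bare MIME type, ignoring parameters and case.
  const mime = (contentType.split(';')[0] ?? '').trim().toLowerCase();
  switch (mime) {
    case 'audio/pcm':
      return {
        kind: 'pcm',
        sampleRate: rateFromContentTypeSketch(contentType, fallbackRate),
        channels: 1, // raw PCM is treated as mono
      };
    case 'audio/wav':
    case 'audio/x-wav':
      return { kind: 'wav' }; // a WAV parser would read rate and channels from the header
    case 'audio/mpeg':
      throw new Error('sketch: MP3 is not decodable here; pin a PCM-capable provider');
    default:
      throw new Error(`sketch: unsupported content type "${contentType}"`);
  }
}
```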
```ts
import { decodeSynthesisResult } from '@spekoai/adapter-livekit';
```

# Callbacks & events (/client/callbacks)

Every hook VoiceConversation exposes, and when they fire.

All callbacks are optional. Pass them inside the `ConversationOptions` object. They're invoked synchronously on the media transport event loop — keep them fast or defer work with `queueMicrotask`.

## `ConversationStatus` [#conversationstatus]

```ts
type ConversationStatus = 'connecting' | 'connected' | 'disconnecting' | 'disconnected';
```

Transitions:

* **`connecting`** — the initial state, set the moment the `WebRTCConnection` is constructed.
* **`connected`** — after `room.connect()`, `createLocalAudioTrack()`, and `publishTrack()` all succeed.
* **`disconnecting`** — `endSession()` has been called but the room hasn't acknowledged yet.
* **`disconnected`** — the transport has fired `Disconnected`, OR an error during `connect()` (connection, mic) short-circuited to this state.

`onStatusChange` fires only on actual transitions; duplicate transitions are deduped.

## `ConversationMode` [#conversationmode]

```ts
type ConversationMode = 'listening' | 'speaking';
```

Mirrors transport active-speaker events: `speaking` when any remote participant is in the active-speakers set, `listening` otherwise. Useful for UI states like "agent talking now — show the voice animation".

Deduped on transition — `onModeChange` won't fire twice for the same mode.
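
The derive-then-dedupe behaviour described for `ConversationMode` can be sketched as a tiny state holder. This is an illustration under assumptions, not the SDK's internals: `ModeTrackerSketch` is a hypothetical name, the initial mode of `listening` is assumed, and remote active speakers are represented as a plain array of identity strings.

```ts
type ModeSketch = 'listening' | 'speaking';

class ModeTrackerSketch {
  private mode: ModeSketch = 'listening'; // assumed starting mode
  private readonly onModeChange: (mode: ModeSketch) => void;

  constructor(onModeChange: (mode: ModeSketch) => void) {
    this.onModeChange = onModeChange;
  }

  // Call on every active-speaker event with the remote participants currently speaking.
  update(remoteActiveSpeakers: readonly string[]): void {
    const next: ModeSketch = remoteActiveSpeakers.length > 0 ? 'speaking' : 'listening';
    if (next === this.mode) return; // dedupe: no callback without a real transition
    this.mode = next;
    this.onModeChange(next);
  }
}
```

Usage: feed it each speaker update and drive the UI from the callback, for example `new ModeTrackerSketch((mode) => setAnimating(mode === 'speaking'))`, where `setAnimating` is whatever your UI layer provides.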
## `ConversationMessage` [#conversationmessage]

```ts
interface ConversationMessage {
  source: 'agent' | 'user';
  text: string;
  isFinal: boolean;
  segmentId?: string;
}
```

`onMessage` fires from two sources — live transcriptions (the common case when talking to a Speko agent) and custom data-channel packets:

| Inbound event | Becomes |
| --- | --- |
| Transcription segment | `{ source, text, isFinal, segmentId }` — `source` is `user` for the local participant, `agent` otherwise |
| `transcript` packet | `{ source: packet.source, text, isFinal: packet.isFinal ?? true }` |
| `agent_message` packet | `{ source: 'agent', text, isFinal: packet.isFinal ?? true }` |
| `user_message_echo` packet | `{ source: 'user', text, isFinal: true }` |

Transcription updates are **cumulative per segment**: the same `segmentId` is re-delivered with growing `text` (the agent's transcript streams word-by-word; the user's utterance is re-published in full on every recognizer update, and the final text can arrive more than once). Render by **upserting on `(source, segmentId)`** — replace that message's text in place, and only append when you see a new `segmentId`. Appending every message duplicates text, and keying only by `source` corrupts the transcript whenever user and agent updates interleave (which is normal). Messages from custom data packets carry no `segmentId`; append those.

See [Data channel protocol](/client/data-channel) for the raw wire format.
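
The upsert rule above can be sketched as a pure function that an `onMessage` handler calls. This is a minimal illustration, not SDK code: `MessageSketch` mirrors the `ConversationMessage` shape shown earlier, and `upsertMessage` is a hypothetical helper name.

```ts
interface MessageSketch {
  source: 'agent' | 'user';
  text: string;
  isFinal: boolean;
  segmentId?: string;
}

// Returns a new transcript array; the input array is not mutated.
function upsertMessage(transcript: MessageSketch[], incoming: MessageSketch): MessageSketch[] {
  // Data-packet messages carry no segmentId: always append.
  if (incoming.segmentId === undefined) return transcript.concat(incoming);

  // Key on (source, segmentId) so interleaved user and agent updates never collide.
  const index = transcript.findIndex(
    (m) => m.source === incoming.source && m.segmentId === incoming.segmentId,
  );
  if (index === -1) return transcript.concat(incoming); // first time this segment is seen

  const next = transcript.slice();
  next[index] = incoming; // cumulative update: replace in place
  return next;
}
```

Because repeated finals for the same segment simply overwrite the same slot, the function is idempotent for duplicate deliveries.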
## `DisconnectionDetails` [#disconnectiondetails]

```ts
interface DisconnectionDetails {
  reason: DisconnectionReason;
  message?: string;
}

type DisconnectionReason = 'user' | 'agent' | 'error' | 'timeout' | 'unknown';
```

The SDK maps transport disconnect reasons into a smaller, intent-oriented set:

| Transport disconnect reason | Mapped `reason` |
| --- | --- |
| Client initiated | `user` |
| Participant removed / room deleted / room closed | `agent` |
| Join failure | `error` |
| everything else (including `undefined`) | `unknown` |

`message` is the raw transport enum name when available (useful for debugging / logging).

## `onConnect` [#onconnect]

```ts
onConnect?: (details: { conversationId: string }) => void;
```

Fires exactly once, after the mic is publishing and status is `connected`. `conversationId` is the transport conversation id (same value as `conversation.getId()`).

## `onError` [#onerror]

```ts
onError?: (error: Error) => void;
```

Non-fatal errors:

* Media device errors from the transport.
* Output device selection failures (`setSinkId` rejections).

Malformed or unrecognised inbound data packets are silently ignored — rooms carry data from other publishers (server control topics, future participants), so a packet that isn't part of the SDK protocol is not an error.

Fatal errors during `create()` are **thrown**, not routed to `onError`. See [Errors](/client/errors).

# Data channel protocol (/client/data-channel)

Wire format for packets exchanged between browser and agent over the media data channel.

Every non-audio signal — transcripts, overrides, user-typed messages — travels as JSON-encoded bytes on the reliable media data channel. `@spekoai/client` handles encoding and decoding internally; this page documents the wire format so server / agent implementations can interoperate.

## Encoding [#encoding]

* UTF-8 JSON, one message per `publishData` call.
* Reliable ordering (`reliable: true`).
* No framing beyond JSON: each `DataReceived` event is one complete packet.

## Outbound (browser → agent) [#outbound-browser--agent]

### `overrides` [#overrides]

Sent once, immediately after the mic publishes, if the browser passed an `overrides` option.

```json
{
  "type": "overrides",
  "overrides": {
    "agent": {
      "prompt": "You are a helpful receptionist.",
      "firstMessage": "Hi, how can I help?",
      "language": "en-US"
    },
    "tts": {
      "voiceId": "sonic-english",
      "speed": 1.0
    }
  }
}
```

Any subfield is optional. The agent worker is responsible for applying what it receives.

### `user_message` [#user_message]

Sent by `conversation.sendUserMessage(text)`. Use when the user types rather than speaks.

```json
{
  "type": "user_message",
  "text": "I'd like to reschedule."
}
```

### `contextual_update` [#contextual_update]

Sent by `conversation.sendContextualUpdate(text)`. Out-of-band context that shouldn't be treated as a turn.

```json
{
  "type": "contextual_update",
  "text": "user switched to the checkout page"
}
```

## Inbound (agent → browser) [#inbound-agent--browser]

### `transcript` [#transcript]

STT output for either speaker.

```json
{
  "type": "transcript",
  "source": "user",
  "text": "Hello there.",
  "isFinal": true
}
```

`isFinal` defaults to `true` when omitted.

### `agent_message` [#agent_message]

An assistant message emitted by the agent, typically streamed token-by-token as `isFinal: false` and closed with `isFinal: true`.

```json
{
  "type": "agent_message",
  "text": "Happy to help!",
  "isFinal": true
}
```

### `user_message_echo` [#user_message_echo]

Echo of a typed `user_message` so the UI can render it in the same transcript stream. `isFinal` is always implicitly `true`.

```json
{
  "type": "user_message_echo",
  "text": "I'd like to reschedule."
}
```

## Forwarding to `onMessage` [#forwarding-to-onmessage]

The SDK converts each inbound packet into a [`ConversationMessage`](/client/callbacks#conversationmessage):

```ts
// pseudocode
switch (packet.type) {
  case 'transcript':
    return { source: packet.source, text: packet.text, isFinal: packet.isFinal ?? true };
  case 'agent_message':
    return { source: 'agent', text: packet.text, isFinal: packet.isFinal ?? true };
  case 'user_message_echo':
    return { source: 'user', text: packet.text, isFinal: true };
}
```

Unknown packet types are ignored (no message fired, no error). Malformed JSON is ignored the same way: rooms carry data published for other consumers (server control topics, future participants), so a packet that isn't part of this protocol is not an error.

## Extending the protocol [#extending-the-protocol]

If you need a new packet type, add it on both sides:

1. Agent worker publishes a new `type` value.
2. Extend `InboundPacket` in `@spekoai/client` and handle it in `packetToMessage` (or ship a wrapper that subscribes to `room.on('dataReceived')` directly).

Outbound packet types are similarly open: `WebRTCConnection.publish(packet)` accepts any `OutboundPacket`, which you can widen in a fork.

# Errors (/client/errors)

SpekoClientError and its error codes.

The client SDK throws a single error class, `SpekoClientError`, tagged with a stable string code.

```ts
import { SpekoClientError } from '@spekoai/client';
import type { SpekoClientErrorCode } from '@spekoai/client';
```

## Shape [#shape]

```ts
class SpekoClientError extends Error {
  code: SpekoClientErrorCode;
  cause?: unknown; // original error when wrapping transport failures
}
```

## Codes [#codes]

| Code | Where it's thrown |
| --- | --- |
| `CONNECTION_FAILED` | `VoiceConversation.create()`: transport connection rejected. `cause` is the original transport error. |
| `MICROPHONE_FAILED` | `create()`: mic acquisition or `publishTrack()` failed. The room is disconnected before this is thrown. |
| `NOT_CONNECTED` | `sendUserMessage` / `sendContextualUpdate` / internal `publish()` called while status isn't `connected`. |
| `INVALID_MESSAGE` | No longer raised. Malformed inbound data packets are silently ignored (rooms carry non-protocol data from other publishers). Kept in the type for compatibility. |
| `DISCONNECTED` | Reserved for future use. |

## Fatal vs non-fatal [#fatal-vs-non-fatal]

* **Fatal errors** (connection and microphone failures during `create()`) are **thrown** from the `create()` promise so callers can branch at construction time.
* **Non-fatal errors** (media device errors from the transport) are **routed to `onError`**. The session continues.
* **Non-protocol data packets** (malformed JSON, unknown shapes from other publishers in the room) are **silently ignored**; they never reach `onError`.

## Example [#example]

```ts
import { SpekoClientError, VoiceConversation } from '@spekoai/client';

try {
  const conv = await VoiceConversation.create({ ... });
} catch (err) {
  if (err instanceof SpekoClientError) {
    switch (err.code) {
      case 'CONNECTION_FAILED':
        // Token expired, network issue, or transport outage: ask the user to retry.
        break;
      case 'MICROPHONE_FAILED':
        // Permission denied or device in use: surface a permissions prompt.
        break;
    }
  }
  // Re-throw so callers further up can still see the failure.
  throw err;
}
```

# @spekoai/client (/client/overview)

Browser SDK for real-time voice conversations.

`@spekoai/client` is the browser-side companion to `@spekoai/sdk`. It connects a browser tab to a Speko voice session: it captures the user's microphone, plays the agent's audio, and exchanges structured events such as transcripts and status changes.

Your server must mint a short-lived session token and return only the browser-safe session credentials. Never expose a Speko API key to browser code. For `VoiceConversation`, audio flows through Speko's browser media transport after the token is minted.
For `RealtimeVoiceConversation`, audio flows between the browser and Speko's S2S WebSocket proxy.

## Install [#install]

```bash
npm install @spekoai/client
# or
pnpm add @spekoai/client
```

The package does not expose low-level media transport types on its public surface, so most apps only import from `@spekoai/client` directly.

## Quick start [#quick-start]

### 1. Server mints a session [#1-server-mints-a-session]

```ts
// server side, using @spekoai/sdk or raw fetch
const session = await fetch('/v1/sessions', { ... });
// returns { transportToken, transportUrl, roomName, identity, expiresAt }
```

See [Build a voice agent](/guides/voice-agent) for the worker side and [Real-time browser conversation](/guides/realtime-conversation) for the end-to-end browser flow.

### 2. Browser joins the room [#2-browser-joins-the-room]

```ts
import { VoiceConversation } from '@spekoai/client';

const conversation = await VoiceConversation.create({
  transportToken, // from server
  transportUrl, // from server
  onConnect: ({ conversationId }) => console.log('connected', conversationId),
  onDisconnect: ({ reason }) => console.log('disconnected', reason),
  onMessage: ({ source, text, isFinal }) => console.log(source, text, isFinal),
  onStatusChange: (status) => console.log('status', status),
  onModeChange: (mode) => console.log('mode', mode),
  onError: (err) => console.error(err),
});

await conversation.setMicMuted(true);
conversation.setVolume(0.8);
conversation.sendUserMessage('hello');
conversation.sendContextualUpdate('user switched to the checkout page');

await conversation.endSession();
```

## What the SDK owns [#what-the-sdk-owns]

* Connecting with supplied short-lived session credentials.
* Acquiring the microphone with sensible constraints (echo cancellation, noise suppression, auto gain; all togglable via `audioConstraints`).
* Playing remote audio.
* Parsing inbound data-channel packets (transcripts, agent messages) and invoking your callbacks.
* Sending outbound packets: overrides, user messages, contextual updates.
* Mic mute, speaker volume, output device selection.
* Tearing everything down on disconnect, including releasing the OS microphone capture.

## What it doesn't do [#what-it-doesnt-do]

* **Mint sessions from API keys.** Keep `SPEKO_API_KEY` on your server. Browser code should only receive short-lived session tokens.
* **Retries.** A failed `create()` throws a [`SpekoClientError`](/client/errors). Retry logic belongs in your app's UX.
* **Tool calls, guardrail hooks, MCP, VAD score streaming.** Deferred; see the package's `ROADMAP.md`.

## Reference [#reference]

* [VoiceConversation](/client/voice-conversation): the primary API surface.
* [RealtimeVoiceConversation](/client/realtime-voice-conversation): browser capture/playback for S2S WebSocket sessions.
* [Callbacks & events](/client/callbacks): every hook the SDK exposes.
* [Data channel protocol](/client/data-channel): wire format for inbound / outbound packets.
* [Errors](/client/errors): `SpekoClientError` and its codes.

# RealtimeVoiceConversation (/client/realtime-voice-conversation)

Browser capture and playback for direct speech-to-speech WebSocket sessions.

`RealtimeVoiceConversation` is the browser-side helper for Speko speech-to-speech (S2S) sessions. It connects directly to the S2S WebSocket returned by `POST /v1/sessions`, captures the microphone as PCM16, plays streamed PCM16 responses, and forwards transcript and status callbacks.

Use it when you want the lowest-latency S2S path and do not need the browser media transport used by [`VoiceConversation`](/client/voice-conversation).

```ts
import { RealtimeVoiceConversation } from '@spekoai/client';
```

## Mint the session on your server [#mint-the-session-on-your-server]

Create S2S sessions on your backend so `SPEKO_API_KEY` never reaches the browser. Return only the short-lived WebSocket credentials.
```ts server.ts
app.post('/api/realtime-session', async (_req, res) => {
  const response = await fetch('https://api.speko.dev/v1/sessions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.SPEKO_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      mode: 's2s',
      s2s: {
        provider: 'openai',
        model: 'gpt-realtime',
        voice: 'alloy',
        systemPrompt: 'You are a concise voice assistant.',
      },
      ttlSeconds: 900,
    }),
  });

  if (!response.ok) {
    res.status(response.status).json({ error: 'Could not start realtime session' });
    return;
  }

  // Forward only the short-lived WebSocket credentials to the browser.
  const session = await response.json();
  res.json({
    sessionId: session.sessionId,
    wsUrl: session.wsUrl,
    wsToken: session.wsToken,
    expiresAt: session.expiresAt,
    inputSampleRate: session.inputSampleRate,
    outputSampleRate: session.outputSampleRate,
  });
});
```

## Connect from the browser [#connect-from-the-browser]

```tsx RealtimePanel.tsx
import { useEffect, useRef, useState } from 'react';
import { RealtimeVoiceConversation } from '@spekoai/client';

export function RealtimePanel() {
  // Type arguments restored so the ref and state assignments type-check.
  const convRef = useRef<RealtimeVoiceConversation | null>(null);
  const [status, setStatus] = useState<string>('idle');
  const [transcript, setTranscript] = useState<string[]>([]);

  async function start() {
    setStatus('connecting');
    const session = await fetch('/api/realtime-session', {
      method: 'POST',
    }).then((r) => r.json());

    const conv = await RealtimeVoiceConversation.create({
      ...session,
      onConnect: ({ conversationId }) => {
        console.log('connected', conversationId);
      },
      onStatusChange: setStatus,
      onMessage: ({ source, text, isFinal }) => {
        if (isFinal) setTranscript((items) => [...items, `${source}: ${text}`]);
      },
      onError: (err) => console.error(err),
      onDisconnect: () => setStatus('idle'),
    });
    convRef.current = conv;
  }

  async function stop() {
    await convRef.current?.endSession();
    convRef.current = null;
  }

  // End the session if the component unmounts while connected.
  useEffect(() => () => {
    void convRef.current?.endSession();
  }, []);

  return (
    <div>
      {/* Assumed markup: the buttons are reconstructed wiring for start() and stop(). */}
      <button onClick={start}>Start</button>
      <button onClick={stop}>Stop</button>
      <p>Status: {status}</p>
      <ul>
        {transcript.map((item, i) => (
          <li key={i}>{item}</li>
        ))}
      </ul>
    </div>
  );
}
```

## `RealtimeVoiceConversation.create(options)` [#realtimevoiceconversationcreateoptions]

```ts
static create(options: RealtimeConversationOptions): Promise<RealtimeVoiceConversation>
```

`create()` opens the WebSocket, waits for a `ready` frame, starts microphone capture, then resolves.

### `RealtimeConversationOptions` [#realtimeconversationoptions]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | `string` | yes | Server-assigned session id. Also returned by `getId()`. |
| `wsUrl` | `string` | yes | Short-lived S2S WebSocket URL returned by `POST /v1/sessions`. |
| `wsToken` | `string` | yes | Short-lived WebSocket token. Sent as the first WebSocket subprotocol. |
| `expiresAt` | `string?` | | ISO-8601 expiry for the WebSocket token. |
| `inputSampleRate` | `16000 \| 24000?` | | Requested capture rate. Defaults to `24000`; the server can negotiate it. |
| `outputSampleRate` | `16000 \| 24000?` | | Requested playback rate. Defaults to `24000`; the server can negotiate it. |
| `inputDeviceId` | `string?` | | Specific microphone `deviceId`. |
| `audioConstraints` | `AudioConstraints?` | | `echoCancellation`, `noiseSuppression`, `autoGainControl`. |
| `onConnect` | `(d: { conversationId }) => void` | | Fired after the socket is ready and microphone capture has started. |
| `onDisconnect` | `(d: DisconnectionDetails) => void` | | Fired when the client or socket closes. |
| `onMessage` | `(m: ConversationMessage) => void` | | Transcript frames mapped to `{ source, text, isFinal }`. |
| `onStatusChange` | `(s: ConversationStatus) => void` | | `connecting`, `connected`, `disconnecting`, or `disconnected`. |
| `onModeChange` | `(m: ConversationMode) => void` | | `speaking` while response audio is queued, otherwise `listening`. |
| `onError` | `(err: Error) => void` | | WebSocket transport errors and provider error frames. |

## Instance methods [#instance-methods]

### `getId(): string` [#getid-string]

Returns the `sessionId` passed to `create()`.

### `isOpen(): boolean` [#isopen-boolean]

`true` while the SDK status is `connected` and the WebSocket is open.

### `setMicMuted(muted: boolean): Promise<void>` [#setmicmutedmuted-boolean-promisevoid]

Mute or unmute local microphone capture. Muting disables the media track and stops PCM frames from being sent.

### `setVolume(volume: number): void` [#setvolumevolume-number-void]

Set response playback volume from `0` to `1`. Values outside that range are clamped.

### `endSession(): Promise<void>` [#endsession-promisevoid]

Close the WebSocket, stop microphone tracks, clear queued playback, close the `AudioContext`, and transition to `disconnected`.

## Transport notes [#transport-notes]

* The SDK passes `wsToken` as the first WebSocket subprotocol because browsers cannot set custom headers on `new WebSocket()`.
* Outbound microphone audio is sent as 20 ms PCM16 binary frames at the negotiated input sample rate.
* Inbound binary frames are PCM16 response audio at the negotiated output sample rate.
* JSON frames with `t: 'transcript'` are forwarded to `onMessage`. JSON frames with `t: 'error'` are forwarded to `onError`.
* `AudioWorklet` capture is used when available; the SDK falls back to `ScriptProcessorNode` for older browsers.

# VoiceConversation (/client/voice-conversation)

Primary API: construct, control, and tear down a voice session.

`VoiceConversation` is the public class exported from `@spekoai/client`. Always construct it via the static `create()` factory. The factory awaits connection, so by the time it resolves the session is live.

```ts
import { VoiceConversation } from '@spekoai/client';
```

There is also a legacy namespace export, `Conversation`, with a single method `Conversation.startSession(options)`. It is an alias for `VoiceConversation.create(options)`, kept so consumers migrating from other SDKs can use familiar naming.
## `VoiceConversation.create(options)` [#voiceconversationcreateoptions]

```ts
static create(options: CreateOptions): Promise<VoiceConversation>
```

Where `CreateOptions` is the short-lived token shape:

```ts
type CreateOptions = ConversationOptions;
```

Your backend calls `POST /v1/sessions`, optionally with an [`agentId`](/concepts/agents), and forwards only `transportToken` and `transportUrl` to the browser. `VoiceConversation.create()` connects to the media transport, publishes the microphone track, sends any [`overrides`](#conversationoverrides) over the data channel, fires `onConnect`, and resolves. It throws a [`SpekoClientError`](/client/errors) on connection, network, or microphone failure.

Do not send `SPEKO_API_KEY` to browser code. `VoiceConversation` no longer accepts `agentId`, `apiKey`, or `apiBaseUrl`; session minting belongs on your server.

### `ConversationOptions` [#conversationoptions]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `transportToken` | `string` | ✅ | Browser-safe media transport token, returned by your server. |
| `transportUrl` | `string` | ✅ | Media transport URL, returned by your server. Pass it straight through; the SDK does not default this, so consumers can't ship against the wrong environment. |
| `overrides` | `ConversationOverrides?` | | Per-session agent / TTS overrides. Sent over the data channel right after connect. |
| `inputDeviceId` | `string?` | | Specific microphone `deviceId`. |
| `outputDeviceId` | `string?` | | Specific speaker `deviceId`. Applied via `setSinkId`; silently ignored on browsers without support. |
| `audioConstraints` | `AudioConstraints?` | | `echoCancellation`, `noiseSuppression`, `autoGainControl`. All default `true`. |
| `onConnect` | `(d: { conversationId }) => void` | | Fired after the mic publishes and status becomes `connected`. |
| `onDisconnect` | `(d: DisconnectionDetails) => void` | | Fired on server or client disconnect. |
| `onMessage` | `(m: ConversationMessage) => void` | | Inbound transcripts, agent messages, user-message echoes. |
| `onStatusChange` | `(s: ConversationStatus) => void` | | `connecting → connected → disconnecting → disconnected`. |
| `onModeChange` | `(m: ConversationMode) => void` | | `listening` vs `speaking`, derived from transport active-speaker events. |
| `onError` | `(err: Error) => void` | | Non-fatal errors (media device errors, sink-id failures). Malformed data packets are silently ignored and never reach this callback. |

`conversationToken` and `livekitUrl` are still accepted as legacy aliases for existing callers.

### `ConversationOverrides` [#conversationoverrides]

```ts
interface ConversationOverrides {
  agent?: {
    prompt?: string;
    firstMessage?: string;
    language?: string;
  };
  tts?: {
    voiceId?: string;
    speed?: number;
  };
}
```

Overrides are JSON-serialized and published over the data channel immediately after the mic is live. The agent worker can read them and reconfigure the session before its first reply.

### `AudioConstraints` [#audioconstraints]

```ts
interface AudioConstraints {
  echoCancellation?: boolean; // default true
  noiseSuppression?: boolean; // default true
  autoGainControl?: boolean; // default true
}
```

The SDK always routes through `createLocalAudioTrack({ ... })` rather than `setMicrophoneEnabled(true)` so that constraints are applied even when no `inputDeviceId` is passed; `setMicrophoneEnabled` silently ignores them in that case.

## Instance methods [#instance-methods]

### `getId(): string` [#getid-string]

Returns the transport conversation id. Populated after `create()` resolves.

### `isOpen(): boolean` [#isopen-boolean]

`true` while the underlying status is `connected`.
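Because the [Errors](/client/errors) page lists `NOT_CONNECTED` for `sendUserMessage` calls made while status isn't `connected`, one way to use `isOpen()` is as a guard before sending. The sketch below is illustrative only: `SendableConversation` and `sendIfOpen` are hypothetical names, and the interface covers just the two documented methods it needs.

```ts
// Structural type covering only the two documented methods used here.
interface SendableConversation {
  isOpen(): boolean;
  sendUserMessage(text: string): void;
}

// Hypothetical helper: sends the text only while connected and reports whether it was sent.
function sendIfOpen(conversation: SendableConversation, text: string): boolean {
  if (!conversation.isOpen()) return false;
  conversation.sendUserMessage(text);
  return true;
}
```

The status can still change between the check and the send, so a `try` / `catch` around the call remains a reasonable second line of defence.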
### `setMicMuted(muted: boolean): Promise<void>` [#setmicmutedmuted-boolean-promisevoid]

Mute / unmute the local microphone track. Uses the track-level mute API when a track is attached; falls back to `LocalParticipant.setMicrophoneEnabled()` otherwise.

### `setVolume(volume: number): void` [#setvolumevolume-number-void]

Set playback volume for every remote audio element (0–1, clamped). Applied immediately to existing elements and to future ones.

### `sendUserMessage(text: string): void` [#sendusermessagetext-string-void]

Publish a `user_message` packet over the reliable data channel. Use when the user types rather than speaks; the agent receives it inline with its transcript stream.

### `sendContextualUpdate(text: string): void` [#sendcontextualupdatetext-string-void]

Publish a `contextual_update` packet. Use for out-of-band context (e.g. "user switched to the checkout page"). Separate from `user_message` so agents can treat it as system-level context rather than a turn.

### `endSession(): Promise<void>` [#endsession-promisevoid]

Initiate clean disconnection. Sets status to `disconnecting` and asks the transport to disconnect; the disconnect event then completes the teardown (stops the mic track, removes audio elements, fires `onDisconnect`). Idempotent: calling it twice is a no-op.

## Teardown invariants [#teardown-invariants]

When disconnection completes (whether triggered by `endSession()`, agent leaving, token expiry, or error), the SDK:

1. Sets status to `disconnected` and fires `onStatusChange`.
2. Stops the local microphone track so the browser's mic indicator goes away.
3. Detaches and removes every `