transcribe

POST /v1/transcribe — speech-to-text with automatic provider routing.

Transcribes an audio payload. The router picks the best STT provider for your routing intent (language, region, optimizeFor) and fails over automatically when a provider fails.

const { text, provider, confidence } = await speko.transcribe(audio, {
  language: 'es-MX',
});

Signature

speko.transcribe(
  audio: Uint8Array,
  options: TranscribeOptions,
  abortSignal?: AbortSignal,
): Promise<TranscribeResult>

speko.transcribeStream(
  audio: Uint8Array,
  options: TranscribeOptions,
  abortSignal?: AbortSignal,
): AsyncIterable<TranscribeStreamEvent>

Parameters

audio: Uint8Array

Raw audio bytes. The default MIME type is audio/wav; set options.contentType if you send a different format. Providers handle resampling and format conversion downstream, so you don't have to match a specific sample rate.

options: TranscribeOptions

Extends RoutingIntent:

Field        Type                                           Description
language     string (BCP-47)                                e.g. "en", "es-MX", "ja-JP".
region       string?                                        Region to rank streaming providers in. Defaults to global server-side.
optimizeFor  'balanced' | 'accuracy' | 'latency' | 'cost'?  Bias the weighted score. Defaults to the server-side default (currently balanced).
contentType  string?                                        MIME type for the body. Defaults to audio/wav.
constraints  PipelineConstraints?                           Allow-list constraints (see Types).
keywords     readonly string[]?                             Domain words and proper nouns to bias STT output toward.

abortSignal?: AbortSignal

Cancels an in-flight request. The signal is composed with the client-level timeout, so the request is aborted when either one fires.
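As a rough illustration of what composing signals can look like, the sketch below merges several abort signals into one that aborts as soon as any source aborts. The helper name composeSignals is hypothetical and this is not the SDK's actual implementation; it relies only on the standard AbortController API.

```typescript
// Hypothetical helper, not part of the Speko SDK.
// Returns a signal that aborts as soon as any of the given signals aborts.
function composeSignals(...signals: Array<AbortSignal | undefined>): AbortSignal {
  const controller = new AbortController();
  for (const signal of signals) {
    if (!signal) continue; // optional signals may be omitted
    if (signal.aborted) {
      // A source is already aborted: the merged signal starts out aborted.
      controller.abort();
      break;
    }
    signal.addEventListener('abort', () => controller.abort(), { once: true });
  }
  return controller.signal;
}

// Caller-side usage (speko and audio are assumed to exist in your app):
//   const userCancel = new AbortController();
//   const pending = speko.transcribe(audio, { language: 'en' }, userCancel.signal);
//   userCancel.abort(); // e.g. from a "Stop" button; the pending call is cancelled
```

From the caller's side, the only thing needed is to pass a signal as the third argument; any merging with the client timeout happens inside the SDK.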

Returns

TranscribeResult

Field          Type           Description
text           string         Transcribed text.
provider       string         Upstream provider that ran the request (e.g. deepgram, openai).
model          string         Provider-specific model identifier.
confidence     number | null  Model-reported confidence when available, else null.
failoverCount  number         How many providers were tried before this one succeeded.
scoresRunId    string | null  ID of the scoring run that selected this provider; useful for joining to benchmark data.
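Because confidence can be null, callers usually need an explicit branch for the unknown case. The sketch below mirrors the fields listed above in a local interface and classifies a result. The function name, the 0.8 threshold, and the assumption that confidence lies in the range 0 to 1 are illustrative only and are not stated by these docs.

```typescript
// Local mirror of the TranscribeResult fields documented above.
interface TranscribeResult {
  text: string;
  provider: string;
  model: string;
  confidence: number | null;
  failoverCount: number;
  scoresRunId: string | null;
}

type ConfidenceLabel = 'ok' | 'low-confidence' | 'unknown-confidence';

// Hypothetical policy helper: the threshold and the 0..1 scale are assumptions.
function classifyConfidence(
  result: Pick<TranscribeResult, 'confidence'>,
  minConfidence = 0.8,
): ConfidenceLabel {
  if (result.confidence === null) return 'unknown-confidence'; // provider reported nothing
  return result.confidence >= minConfidence ? 'ok' : 'low-confidence';
}
```

Treating null as its own label, rather than as zero, avoids penalizing providers that do not report a confidence value.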

Wire format

The SDK sends the audio as the raw HTTP body with:

  • Content-Type: from options.contentType (default audio/wav).
  • X-Speko-Intent: JSON-serialized { language, region?, optimizeFor? }.
  • X-Speko-Constraints: JSON-serialized options.constraints (only if set).
  • X-Speko-Stt-Options: JSON-serialized { keywords } (only if keywords are set).
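The header rules above can be sketched as a small pure function. This is an illustration of the documented wire format, not the SDK's source: the names SketchOptions and buildTranscribeHeaders are hypothetical, constraints is typed loosely in place of PipelineConstraints, and treating an empty keywords array as "not set" is an assumption.

```typescript
// Hypothetical stand-in for TranscribeOptions; constraints is loosely typed here.
interface SketchOptions {
  language: string;
  region?: string;
  optimizeFor?: 'balanced' | 'accuracy' | 'latency' | 'cost';
  contentType?: string;
  constraints?: Record<string, unknown>;
  keywords?: readonly string[];
}

function buildTranscribeHeaders(options: SketchOptions): Record<string, string> {
  // The intent always carries language; region and optimizeFor only when provided.
  const intent: Record<string, string> = { language: options.language };
  if (options.region !== undefined) intent['region'] = options.region;
  if (options.optimizeFor !== undefined) intent['optimizeFor'] = options.optimizeFor;

  const headers: Record<string, string> = {
    'Content-Type': options.contentType ?? 'audio/wav',
    'X-Speko-Intent': JSON.stringify(intent),
  };
  if (options.constraints !== undefined) {
    headers['X-Speko-Constraints'] = JSON.stringify(options.constraints);
  }
  // Assumption: an empty keywords array counts as "not set".
  if (options.keywords !== undefined && options.keywords.length > 0) {
    headers['X-Speko-Stt-Options'] = JSON.stringify({ keywords: options.keywords });
  }
  return headers;
}
```

A function like this is mainly useful when calling POST /v1/transcribe without the SDK, for example from a different HTTP client.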

The wire response is text/event-stream with meta, transcript, done, and error events. speko.transcribe() consumes that stream and returns the final TranscribeResult; use speko.transcribeStream() to receive partial transcripts directly.
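For readers consuming the endpoint without the SDK, the sketch below shows how a text/event-stream body splits into named events under the standard server-sent events framing. It assumes the whole body is already available as a string (a real client would buffer chunks as they arrive), and the names SseEvent and parseSse are hypothetical. The JSON shape of each event's data is not specified on this page, so the sketch leaves data as an unparsed string.

```typescript
interface SseEvent {
  event: string; // e.g. "meta", "transcript", "done", "error"
  data: string;  // raw payload; its structure is not documented here
}

// Parse a complete text/event-stream body into a list of events.
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line.
  for (const block of body.split(/\r?\n\r?\n/)) {
    let event = 'message'; // SSE default when no "event:" field is present
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith('event:')) {
        event = line.slice('event:'.length).trim();
      } else if (line.startsWith('data:')) {
        // A single leading space after the colon is not part of the value.
        data.push(line.slice('data:'.length).replace(/^ /, ''));
      }
      // Lines starting with ":" are comments and are ignored.
    }
    // Blocks without data are not dispatched as events.
    if (data.length > 0) events.push({ event, data: data.join('\n') });
  }
  return events;
}
```

With the SDK, none of this is necessary: speko.transcribeStream() already yields the events as an AsyncIterable.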

Example: non-default MIME

import { readFile } from 'node:fs/promises';

const audio = await readFile('./call.ogg');
const result = await speko.transcribe(audio, {
  language: 'en',
  contentType: 'audio/ogg',
  optimizeFor: 'accuracy',
});
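When file types vary, the contentType value can be derived from the file extension. The helper below is a minimal sketch: the name guessContentType and the extension table are illustrative, and the table lists common audio MIME types rather than a confirmed list of formats the service accepts.

```typescript
// Common audio MIME types by extension. Which of these the service accepts
// is an assumption; check before relying on any format other than audio/wav.
const AUDIO_MIME_BY_EXTENSION: Record<string, string> = {
  wav: 'audio/wav',
  ogg: 'audio/ogg',
  mp3: 'audio/mpeg',
  flac: 'audio/flac',
  webm: 'audio/webm',
  m4a: 'audio/mp4',
};

// Hypothetical helper: returns undefined for unknown or missing extensions,
// in which case the caller can omit contentType and get the audio/wav default.
function guessContentType(path: string): string | undefined {
  const dot = path.lastIndexOf('.');
  if (dot === -1) return undefined;
  const extension = path.slice(dot + 1).toLowerCase();
  return AUDIO_MIME_BY_EXTENSION[extension];
}
```

The result can be passed straight through, for example contentType: guessContentType('./call.ogg'). Extension-based detection is only a guess, so files with misleading names still need an explicit value.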

Example: restrict provider pool

const result = await speko.transcribe(audio, {
  language: 'en',
  constraints: {
    allowedProviders: { stt: ['deepgram', 'assemblyai'] },
  },
});

The router still ranks candidates by benchmark score, but it picks only from the allow-list.
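Conceptually, an allow-list filters the candidate pool before the highest-scoring provider is chosen. The sketch below illustrates that idea only: routing happens server-side, and the Candidate shape, the single numeric score, and the function name pickProvider are all assumptions made for this example.

```typescript
// Hypothetical candidate shape; the real router's scoring inputs are not shown here.
interface Candidate {
  provider: string;
  score: number; // higher is better
}

// Filter by the allow-list (if any), then pick the best remaining candidate.
function pickProvider(
  candidates: readonly Candidate[],
  allowed?: readonly string[],
): Candidate | undefined {
  const pool = allowed
    ? candidates.filter((c) => allowed.includes(c.provider))
    : [...candidates]; // copy so sorting never mutates the caller's array
  pool.sort((a, b) => b.score - a.score);
  return pool[0]; // undefined when no candidate passes the allow-list
}
```

One consequence of this model is that a restrictive allow-list can exclude the top-ranked provider, so the chosen provider is the best among those permitted rather than the best overall.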

Example: bias proper nouns

const result = await speko.transcribe(audio, {
  language: 'en',
  keywords: ['Speko', 'Ava Martinez', 'Cartesia'],
});
