One-shot APIs

POST /v1/transcribe, /v1/synthesize, and /v1/complete: single-turn calls without sessions.

Batch transcription, server-side TTS, and non-voice LLM completions don't need a real-time session. Call the one-shot endpoints directly: each is a single round trip with built-in routing and failover.

Auth

Every one-shot call needs an API key, sent as a bearer token in the Authorization header. Mint one on the API keys page.

Authorization: Bearer sk_live_...

Transcribe

curl -X POST https://api.speko.dev/v1/transcribe \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: audio/wav" \
  -H "x-speko-intent: {\"language\":\"en-US\"}" \
  --data-binary @call.wav

Response:

{
  "text": "...",
  "provider": "deepgram",
  "model": "nova-2",
  "confidence": 0.94,
  "failoverCount": 0,
  "scoresRunId": "..."
}

Notes:

  • The audio body is binary. Send the raw PCM/MP3/WAV/etc. bytes as the request body; do not base64-encode them.
  • Intent goes in the x-speko-intent header as JSON, not in the body. Constraints go in the x-speko-constraints header.
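
The two notes above can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names build_transcribe_request and transcribe are hypothetical, and the URL, headers, and response shape are taken from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.speko.dev"  # base URL from the curl example


def build_transcribe_request(audio: bytes, api_key: str, language: str,
                             content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/transcribe request. Hypothetical helper."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
        # Intent is JSON-encoded into a header rather than placed in the body.
        "x-speko-intent": json.dumps({"language": language}),
    }
    # The body is the raw audio bytes, with no base64 step.
    return urllib.request.Request(
        f"{API_BASE}/v1/transcribe", data=audio, headers=headers, method="POST"
    )


def transcribe(audio: bytes, api_key: str, language: str = "en-US") -> dict:
    """Send the request and decode the JSON response into a dict."""
    req = build_transcribe_request(audio, api_key, language)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Splitting the request construction from the network call keeps the header logic easy to inspect or unit-test without touching the API.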

Synthesize

curl -X POST https://api.speko.dev/v1/synthesize \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world.",
    "intent": { "language": "en-US" },
    "voice": null,
    "speed": 1
  }' \
  --output speech.bin

The response body is the audio itself; the curl example writes it to speech.bin. The Content-Type response header indicates the format (e.g. audio/pcm;rate=24000 for Cartesia, audio/mpeg for ElevenLabs). Routing headers (X-Speko-Provider, etc.) tell you which provider handled the request.
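
Because the format varies by provider, a caller usually has to branch on Content-Type before saving the audio. The sketch below shows one way to do that in Python. The function names are hypothetical, and the PCM layout (16-bit, mono) is an assumption that this page does not confirm, so check it against the provider you are routed to.

```python
import wave
from email.message import Message


def parse_audio_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into (media type, parameters)."""
    msg = Message()
    msg["Content-Type"] = value
    media_type = msg.get_content_type()   # e.g. "audio/pcm"
    params = dict(msg.get_params()[1:])   # first entry is the media type itself
    return media_type, params


def save_audio(body: bytes, content_type: str, stem: str) -> str:
    """Write the response body to a playable file and return its path."""
    media_type, params = parse_audio_content_type(content_type)
    if media_type == "audio/pcm":
        # Raw PCM has no container, so wrap it in a WAV header.
        # ASSUMPTION: 16-bit mono samples; only the rate comes from the header.
        path = f"{stem}.wav"
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(int(params["rate"]))
            wav.writeframes(body)
        return path
    # Already-containerized formats are written as-is.
    ext = {"audio/mpeg": "mp3", "audio/wav": "wav"}.get(media_type, "bin")
    path = f"{stem}.{ext}"
    with open(path, "wb") as f:
        f.write(body)
    return path
```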

Complete

curl -X POST https://api.speko.dev/v1/complete \
  -H "Authorization: Bearer $SPEKO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are concise." },
      { "role": "user", "content": "Hi!" }
    ],
    "intent": { "language": "en" }
  }'

Response:

{
  "text": "Hello!",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "usage": { "promptTokens": 14, "completionTokens": 4 },
  "failoverCount": 0,
  "scoresRunId": "..."
}
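
As a rough illustration of the request and response shapes above, the following Python sketch builds the same JSON payload and summarizes a response. Both function names are hypothetical, and reading failoverCount as the number of times routing fell back to another provider is an assumption based on the field name.

```python
import json
import urllib.request


def build_complete_request(messages: list[dict], api_key: str,
                           language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/complete request. Hypothetical helper."""
    payload = {"messages": messages, "intent": {"language": language}}
    return urllib.request.Request(
        "https://api.speko.dev/v1/complete",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def summarize_completion(response: dict) -> str:
    """Condense the routing and usage fields of a response into one log line."""
    usage = response["usage"]
    total = usage["promptTokens"] + usage["completionTokens"]
    # ASSUMPTION: failoverCount counts fallbacks to another provider.
    return (f'{response["provider"]}/{response["model"]}: '
            f'{total} tokens, {response["failoverCount"]} failover(s)')
```

Applied to the sample response above, summarize_completion reports 18 total tokens (14 prompt plus 4 completion).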

With an SDK

Both SDKs wrap all three endpoints with matching shapes. See @spekoai/sdk and spekoai (Python).

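TypeScript:

import { readFile } from 'node:fs/promises';
import { Speko } from '@spekoai/sdk';

const speko = new Speko({ apiKey: process.env.SPEKO_API_KEY! });

// Raw audio bytes; call.wav is the same file used in the curl example.
const buf = await readFile('call.wav');

const t = await speko.transcribe(buf, { language: 'en-US' });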
const a = await speko.synthesize('Hello', { language: 'en' });
const c = await speko.complete({
  messages: [{ role: 'user', content: 'Hi!' }],
  intent: { language: 'en' },
});
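
Python:

import os

from spekoai import Speko

speko = Speko(api_key=os.environ["SPEKO_API_KEY"])

# Raw audio bytes; call.wav is the same file used in the curl example.
with open("call.wav", "rb") as f:
    buf = f.read()

t = speko.transcribe(buf, language="en-US")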
a = speko.synthesize("Hello", language="en")
c = speko.complete(
    messages=[{"role": "user", "content": "Hi!"}],
    intent={"language": "en"},
)

When not to use one-shot

If you need real-time voice with sub-second latency, barge-in, or partial transcripts, use sessions. One-shot endpoints are meant for batch and server-internal flows.
