Open a speech-to-speech (S2S) session. speko.realtime.connect() mints a short-lived WebSocket token via POST /v1/sessions and returns a handle connected directly to Speko’s S2S proxy, which bridges to the underlying provider. LiveKit is skipped entirely so time-to-first-audio stays under ~300 ms.
Realtime targets the browser WebSocket global. In Node 22+ you can polyfill it with ws by running globalThis.WebSocket = (await import('ws')).WebSocket; before connecting, but the typical deployment is a browser paired with @spekoai/client for mic capture.
import { Speko } from '@spekoai/sdk';

const speko = new Speko({ apiKey: import.meta.env.VITE_SPEKO_API_KEY });

const session = await speko.realtime.connect({
  provider: 'openai',
  model: 'gpt-realtime',
});

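session.on((frame) => {
  // play() is your own playback routine for PCM audio.
  if (frame.type === 'audio') play(frame.pcm);
  else if (frame.type === 'transcript') console.log(frame.role, frame.text);
});

// pcm16Chunk is a Uint8Array of PCM16 audio captured by your app.
session.sendAudio(pcm16Chunk);
// ... end of user turn
session.commit();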

speko.realtime.connect(params)

Signature

speko.realtime.connect(
  params: RealtimeConnectParams,
): Promise<RealtimeSessionHandle>

RealtimeConnectParams

Field             Type                         Description
provider          'openai' | 'google' | 'xai'  S2S provider.
model             string                       Provider-specific model id (e.g. gpt-realtime, gemini-2.5-flash-native-audio).
voice             string?                      Voice id override, interpreted per provider.
systemPrompt      string?                      Initial system instruction.
temperature       number?
inputSampleRate   16000 | 24000?               PCM rate you’ll be sending.
outputSampleRate  16000 | 24000?               PCM rate you want back.
tools             RealtimeToolSpec[]?          Tool definitions the assistant may call.
metadata          Record<string, unknown>?     Free-form metadata attached to the session record.
ttlSeconds        number?                      Max session duration. Server-capped at 1800 (30 min).
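
Because the server caps ttlSeconds at 1800, a client that wants its UI to show the real session length may prefer to clamp the value before connecting. The sketch below is one way to do that. clampTtl and ConnectParamsSubset are hypothetical names for illustration and are not part of @spekoai/sdk.

```typescript
// Local copy of the documented fields this sketch touches.
// ConnectParamsSubset is a hypothetical name, not an SDK export.
interface ConnectParamsSubset {
  inputSampleRate?: 16000 | 24000;
  outputSampleRate?: 16000 | 24000;
  ttlSeconds?: number;
}

// Server-side cap stated in the RealtimeConnectParams table.
const MAX_TTL_SECONDS = 1800;

// Hypothetical helper: returns a copy whose ttlSeconds never exceeds the cap.
function clampTtl(params: ConnectParamsSubset): ConnectParamsSubset {
  if (params.ttlSeconds === undefined) return params;
  if (!Number.isFinite(params.ttlSeconds) || params.ttlSeconds <= 0) {
    throw new RangeError('ttlSeconds must be a positive, finite number');
  }
  return { ...params, ttlSeconds: Math.min(params.ttlSeconds, MAX_TTL_SECONDS) };
}
```

The result can be spread into the object passed to speko.realtime.connect() alongside provider and model.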

RealtimeSessionHandle

Property   Type    Description
sessionId  string  Server-assigned session id.
expiresAt  string  ISO-8601 expiry of the WS token.
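
Since expiresAt is an ISO-8601 string, turning it into a remaining-time value is a small calculation. The helper below is a minimal sketch. msUntilExpiry is a hypothetical name, and this page does not state what the server does to an open socket once the token expires, so treat the result as informational.

```typescript
// Hypothetical helper, not part of @spekoai/sdk.
// Returns the milliseconds left until `expiresAt`, never below zero.
function msUntilExpiry(expiresAt: string, nowMs: number = Date.now()): number {
  const expiryMs = Date.parse(expiresAt);
  if (Number.isNaN(expiryMs)) {
    throw new TypeError(`expiresAt is not a parseable date: ${expiresAt}`);
  }
  return Math.max(0, expiryMs - nowMs);
}
```

A caller could pass session.expiresAt to drive a countdown in the UI.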

Methods

Method                                Description
sendAudio(pcm: Uint8Array): void      Ship a PCM16 audio chunk up to the model.
commit(): void                        Signal end-of-user-turn; server flushes buffered audio upstream.
interrupt(): void                     Cancel the assistant’s in-flight response.
sendToolResult(callId, output): void  Return the result of a previously-issued tool_call.
on(handler): () => void               Subscribe to frames. Returns an unsubscribe callback.
close(code?, reason?): void           Close the socket. Idempotent.

RealtimeFrame variants

type RealtimeFrame =
  | { type: 'audio'; pcm: Uint8Array; sampleRate: number }
  | { type: 'transcript'; role: 'user' | 'assistant'; text: string; final: boolean }
  | { type: 'tool_call'; callId: string; name: string; arguments: string }
  | { type: 'usage'; inputAudioTokens: number; outputAudioTokens: number }
  | { type: 'error'; code: string; message: string }
  | { type: 'close'; code: number; reason: string };
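
Because RealtimeFrame is a discriminated union on type, a switch with a never check lets the compiler flag any variant a handler forgets. The sketch below repeats the type locally so that it stands alone. describeFrame is a hypothetical helper, and reading final: false as "partial" is an assumption about how interim transcripts are marked.

```typescript
// Local copy of the union documented above, so the sketch compiles on its own.
type RealtimeFrame =
  | { type: 'audio'; pcm: Uint8Array; sampleRate: number }
  | { type: 'transcript'; role: 'user' | 'assistant'; text: string; final: boolean }
  | { type: 'tool_call'; callId: string; name: string; arguments: string }
  | { type: 'usage'; inputAudioTokens: number; outputAudioTokens: number }
  | { type: 'error'; code: string; message: string }
  | { type: 'close'; code: number; reason: string };

// Hypothetical helper: one log line per frame, with exhaustive handling.
function describeFrame(frame: RealtimeFrame): string {
  switch (frame.type) {
    case 'audio':
      return `audio: ${frame.pcm.byteLength} bytes @ ${frame.sampleRate} Hz`;
    case 'transcript':
      // Assumption: final === false marks an interim (partial) transcript.
      return `${frame.final ? 'final' : 'partial'} ${frame.role}: ${frame.text}`;
    case 'tool_call':
      return `tool_call ${frame.name} (${frame.callId})`;
    case 'usage':
      return `usage: in=${frame.inputAudioTokens} out=${frame.outputAudioTokens}`;
    case 'error':
      return `error ${frame.code}: ${frame.message}`;
    case 'close':
      return `close ${frame.code}: ${frame.reason}`;
    default: {
      // Compile-time guard: fails to type-check if a new variant is added.
      const unreachable: never = frame;
      return unreachable;
    }
  }
}
```

Passing it to the subscription, for example session.on((frame) => console.log(describeFrame(frame))), gives a quick trace of everything the session emits.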

Example — tool calls

const session = await speko.realtime.connect({
  provider: 'openai',
  model: 'gpt-realtime',
  tools: [
    {
      name: 'get_weather',
      description: 'Current weather for a city.',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  ],
});

session.on(async (frame) => {
  if (frame.type === 'tool_call' && frame.name === 'get_weather') {
    const { city } = JSON.parse(frame.arguments);
    const result = await fetchWeather(city);
    session.sendToolResult(frame.callId, result);
  }
});
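
The example above calls JSON.parse directly on frame.arguments, which throws if the model emits malformed JSON. A more defensive handler might validate the string first. The sketch below shows one way to do so. parseToolArguments, readString, and ToolArgs are hypothetical names, not SDK exports.

```typescript
// Result of parsing a tool_call's `arguments` string (hypothetical type).
type ToolArgs =
  | { kind: 'ok'; value: Record<string, unknown> }
  | { kind: 'error'; message: string };

// Parse the JSON string and make sure it is a plain object, without throwing.
function parseToolArguments(raw: string): ToolArgs {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { kind: 'error', message: `invalid JSON: ${detail}` };
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return { kind: 'error', message: 'arguments must be a JSON object' };
  }
  return { kind: 'ok', value: parsed as Record<string, unknown> };
}

// Read a required, non-empty string argument; undefined if missing or wrong type.
function readString(args: Record<string, unknown>, key: string): string | undefined {
  const value = args[key];
  return typeof value === 'string' && value.length > 0 ? value : undefined;
}
```

Inside the frame handler, a kind: 'error' result or a missing city could be reported back through session.sendToolResult(frame.callId, ...) so the assistant is not left waiting. The shape of the output argument is not specified on this page, so the error payload you send is your own choice.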

Transport notes

  • Auth via subprotocol. The WS token is passed as the first WebSocket subprotocol. Browsers can’t set headers on new WebSocket(), so subprotocol is the only auth carrier that doesn’t leak through URL params.
  • Binary type. The SDK forces binaryType = 'arraybuffer'; inbound audio arrives as a Uint8Array view over an ArrayBuffer.
  • Missing PCM. Until you call sendAudio the upstream provider sees no user input. Hook up a Web Audio AudioWorklet capture on the client side, or use @spekoai/client which handles capture for you over LiveKit.
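
If you wire up your own capture and playback, the Web Audio API works in Float32Array samples while sendAudio and audio frames carry PCM16 bytes. The converters below are a minimal sketch of that bridge. Both function names are hypothetical, and the code assumes little-endian, single-channel PCM16, which this page does not state explicitly.

```typescript
// Convert Web Audio float samples (range -1..1) to PCM16 bytes.
// Assumption: little-endian, mono.
function float32ToPcm16(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically: -1 maps to -32768 and +1 maps to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true);
  }
  return out;
}

// Convert PCM16 bytes from an audio frame back to float samples for playback.
function pcm16ToFloat32(pcm: Uint8Array): Float32Array {
  // Respect byteOffset: the view may not start at the beginning of its buffer.
  const view = new DataView(pcm.buffer, pcm.byteOffset, pcm.byteLength);
  const out = new Float32Array(pcm.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}
```

A capture path would then call session.sendAudio(float32ToPcm16(channelData)) for each block of samples. Resampling to the inputSampleRate you requested is a separate step that this sketch does not cover.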