Realtime sessions are only available on
AsyncSpeko. A synchronous WebSocket loop would block the event loop on every audio chunk, defeating the purpose of a low-latency S2S pipeline.Signature
RealtimeConnectParams
Provider-specific model id (e.g.
gpt-realtime, gemini-2.5-flash-native-audio, grok-voice-beta).Voice id override — interpreted per provider.
Tool definitions the assistant may call. Receive
tool_call frames and respond with send_tool_result.Free-form metadata attached to the session record.
Max session duration in seconds. Server-capped at 1800 (30 min).
AsyncRealtimeSession
The returned session is both an async context manager and an async iterator.
Properties
| Property | Type | Description |
|---|---|---|
session_id | str | Server-assigned session identifier. |
expires_at | str | ISO-8601 expiry for the WS token. |
Methods
Ship a PCM16 chunk up to the model (binary WS frame).
Signal end-of-user-turn; the server flushes buffered audio upstream.
Cancel the assistant’s current response mid-generation.
Return the result of a previously-issued tool call.
Close the socket. Safe to call multiple times; the context manager calls it for you on exit.
Frame types
Iterating the session yields dicts tagged bytype:
Frame type | Payload fields |
|---|---|
audio | pcm: bytes, sample_rate: 24000 |
transcript | role: 'user' | 'assistant', text: str, final: bool |
tool_call | call_id: str, name: str, arguments: str (JSON) |
usage | input_audio_tokens: int, output_audio_tokens: int |
error | code: str, message: str |
close | reason: str |