Speko Docs

complete

POST /v1/complete — LLM completion with automatic provider routing.

Run an LLM completion. The router selects an LLM provider for your routing intent and, if that provider fails, automatically fails over to another one.

const { text, provider } = await speko.complete({
  messages: [{ role: 'user', content: 'Hi!' }],
  intent: { language: 'en' },
});

Signature

speko.complete(
  params: CompleteParams,
  abortSignal?: AbortSignal,
): Promise<CompleteResult>

speko.completeStream(
  params: CompleteParams,
  abortSignal?: AbortSignal,
): AsyncIterable<CompleteStreamEvent>

Parameters

params: CompleteParams

| Field | Type | Description |
| --- | --- | --- |
| messages | ChatMessage[] | Conversation history. Roles: system, user, assistant, tool. |
| intent | RoutingIntent | Routing intent: language, plus optional region and optimizeFor. |
| systemPrompt | string? | Shortcut for a leading system message. Providers that distinguish the system channel use it natively; for the others, it is folded into the message list. |
| temperature | number? | Sampling temperature, forwarded to the provider. Defaults to the provider's default. |
| maxTokens | number? | Maximum completion tokens. Defaults to the provider's default. |
| reasoningEffort | ('none' \| 'minimal' \| 'low' \| 'medium' \| 'high' \| 'xhigh')? | Effort override for OpenAI reasoning models. When omitted, the default is tuned from intent.optimizeFor. |
| constraints | PipelineConstraints? | Allow-list constraints, such as allowedProviders. |
| tools | ChatTool[]? | JSON Schema tool definitions exposed to the model. |
| toolChoice | ChatToolChoice? | auto, none, required, or a specific function name. |
| parallelToolCalls | boolean? | Provider hint for whether the model may emit multiple tool calls in one turn. |
| maxToolHops | number? | Maximum number of server-side hops for webhook or builtin tools. Defaults to 8. |
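The systemPrompt row says that, for providers without a dedicated system channel, the prompt is folded into the message list. A minimal sketch of what that folding could look like, assuming it simply prepends a system message (the exact rule Speko applies is not documented here). foldSystemPrompt is a hypothetical helper, and ChatMessage is a trimmed local copy of the interface documented below.

```typescript
// Trimmed local copy of ChatMessage (tool fields omitted for brevity).
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Hypothetical helper: turns a systemPrompt into a leading system message
// for providers that have no separate system channel. Assumes plain
// prepending; Speko's actual folding rule may differ.
function foldSystemPrompt(messages: ChatMessage[], systemPrompt?: string): ChatMessage[] {
  if (!systemPrompt) return messages;
  return [{ role: 'system', content: systemPrompt }, ...messages];
}

const folded = foldSystemPrompt(
  [{ role: 'user', content: 'Hi!' }],
  'You are a concise voice assistant.',
);
console.log(folded.map((m) => m.role)); // [ 'system', 'user' ]
```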

ChatMessage

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCalls?: ChatToolCall[];
  toolCallId?: string;
  isError?: boolean;
}

interface ChatToolCall {
  id: string;
  name: string;
  args: string;
}

interface ChatTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;               // JSON Schema for the tool's arguments
  executionMode?: 'inline' | 'webhook' | 'builtin';  // omitted: inline
}
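As a sketch of how these types fit together, the block below defines a ChatTool with a JSON Schema parameters object and decodes the arguments of a matching ChatToolCall. The tool name check_availability and the helper parseArgs are hypothetical, and the block assumes args holds the arguments as a JSON-encoded string, which the interface alone does not guarantee.

```typescript
interface ChatTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  executionMode?: 'inline' | 'webhook' | 'builtin';
}

interface ChatToolCall {
  id: string;
  name: string;
  args: string;
}

// Hypothetical tool: a calendar lookup described with JSON Schema.
const checkAvailability: ChatTool = {
  name: 'check_availability',
  description: 'List open appointment slots for a given day.',
  parameters: {
    type: 'object',
    properties: {
      day: { type: 'string', description: 'Weekday, e.g. "Tuesday"' },
    },
    required: ['day'],
  },
  // executionMode omitted: inline, so calls come back in toolCalls.
};

// Assumption: args is a JSON-encoded string of the tool arguments.
function parseArgs<T>(call: ChatToolCall): T {
  return JSON.parse(call.args) as T;
}

const call: ChatToolCall = { id: 'call_1', name: 'check_availability', args: '{"day":"Tuesday"}' };
const { day } = parseArgs<{ day: string }>(call);
console.log(day); // Tuesday
```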

abortSignal?: AbortSignal

Cancel an in-flight request.
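A common use of abortSignal is a request timeout. The sketch below wraps any signal-accepting call with a standard AbortController that aborts after a deadline. fakeComplete and withTimeout are hypothetical names; fakeComplete stands in for speko.complete so the block runs without the SDK.

```typescript
// Hypothetical stand-in for speko.complete: resolves after delayMs unless
// the signal aborts first.
function fakeComplete(delayMs: number, abortSignal?: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (abortSignal?.aborted) {
      reject(new Error('aborted'));
      return;
    }
    const timer = setTimeout(() => resolve('done'), delayMs);
    abortSignal?.addEventListener(
      'abort',
      () => {
        clearTimeout(timer);
        reject(new Error('aborted'));
      },
      { once: true },
    );
  });
}

// Runs task with a signal that aborts after ms milliseconds, and clears the
// timer once the task settles so it does not keep the process alive.
async function withTimeout<T>(
  ms: number,
  task: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```

With the SDK, the call would look like withTimeout(5000, (signal) => speko.complete(params, signal)). On runtimes that support it, AbortSignal.timeout(ms) is a shorter alternative to the hand-rolled controller.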

Returns

CompleteResult

| Field | Type | Description |
| --- | --- | --- |
| text | string | Assistant reply. |
| provider | string | Upstream LLM provider (e.g. openai, anthropic, groq). |
| model | string | Provider-specific model ID. |
| usage.promptTokens | number | Prompt token count. |
| usage.completionTokens | number | Completion token count. |
| failoverCount | number | Number of providers tried before this one succeeded. |
| scoresRunId | string \| null | ID of the scoring run that selected this provider. |
| toolCalls | ChatToolCall[]? | Tool calls emitted by the assistant for inline tools, which your app must execute. |
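To make the failoverCount semantics concrete, here is a conceptual, client-side illustration: candidates are tried in ranked order, and the counter records how many failed before one succeeded. This is not Speko's router, which runs server-side; the Attempt type, the tryInOrder helper, and the Outcome type are hypothetical.

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
}

interface AttemptResult {
  text: string;
  model: string;
  usage: Usage;
}

// Hypothetical provider attempt: resolves on success, rejects on failure.
type Attempt = () => Promise<AttemptResult>;

interface Outcome extends AttemptResult {
  provider: string;
  failoverCount: number;
}

// Tries providers in ranked order; failoverCount is the number of
// providers that failed before one succeeded.
async function tryInOrder(ranked: Array<[string, Attempt]>): Promise<Outcome> {
  let failoverCount = 0;
  for (const [provider, attempt] of ranked) {
    try {
      const result = await attempt();
      return { ...result, provider, failoverCount };
    } catch {
      failoverCount += 1;
    }
  }
  throw new Error(`all ${failoverCount} providers failed`);
}
```

In this model, a request that succeeds on the first candidate reports failoverCount 0, and one that succeeds on the second reports 1.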

Streaming

The wire response is a text/event-stream (server-sent events) carrying meta, delta, tool_call, server_tool_call, done, and error events. speko.complete() consumes that stream and returns only the final CompleteResult; use speko.completeStream() to render deltas or tool-call progress as they arrive.
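A sketch of consuming completeStream() to render text incrementally. The event names come from the list above, but the payload fields used here (provider, text, name, message) are assumptions, since CompleteStreamEvent is not spelled out on this page. renderStream and demoStream are hypothetical; demoStream stands in for the SDK stream.

```typescript
// Assumed event shape: names from the docs, payload fields hypothetical.
type StreamEvent =
  | { type: 'meta'; provider: string }
  | { type: 'delta'; text: string }
  | { type: 'tool_call'; name: string }
  | { type: 'server_tool_call'; name: string }
  | { type: 'done' }
  | { type: 'error'; message: string };

// Writes each delta as it arrives and returns the assembled reply text.
async function renderStream(
  events: AsyncIterable<StreamEvent>,
  write: (chunk: string) => void,
): Promise<string> {
  let text = '';
  for await (const ev of events) {
    switch (ev.type) {
      case 'delta':
        text += ev.text;
        write(ev.text);
        break;
      case 'tool_call':
      case 'server_tool_call':
        write(`\n[calling ${ev.name}]\n`);
        break;
      case 'error':
        throw new Error(ev.message);
      case 'done':
        return text;
    }
  }
  return text;
}

// Hypothetical stand-in stream so the sketch runs without the SDK.
async function* demoStream(): AsyncGenerator<StreamEvent> {
  yield { type: 'meta', provider: 'openai' };
  yield { type: 'delta', text: 'Hel' };
  yield { type: 'delta', text: 'lo!' };
  yield { type: 'done' };
}
```

Against the real SDK, speko.completeStream(params) would take the place of demoStream() once the actual event fields are confirmed.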

Tool execution

Tools can run inline in your worker, through Speko-managed webhooks, or as builtins. If executionMode is omitted, the tool runs inline: the model's tool calls are returned in toolCalls, and your app executes them and adds role: 'tool' messages before calling complete() again. Webhook and builtin tools are executed by Speko server-side (up to maxToolHops hops) and may emit server_tool_call streaming events before the final response.
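The inline flow can be sketched as a loop: call the model, run any returned tool calls locally, append the results, and repeat until a reply arrives with no tool calls. This is a minimal sketch under stated assumptions: args is a JSON string, each result goes back as a role: 'tool' message whose toolCallId matches the call, and a failed tool is reported with isError. Complete, runInlineTools, and maxRounds are hypothetical names; maxRounds is an arbitrary client-side cap, separate from the server-side maxToolHops parameter.

```typescript
interface ChatToolCall {
  id: string;
  name: string;
  args: string;
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCalls?: ChatToolCall[];
  toolCallId?: string;
  isError?: boolean;
}

// Only the CompleteResult fields this loop reads.
interface TurnResult {
  text: string;
  toolCalls?: ChatToolCall[];
}

// Hypothetical wrapper around speko.complete for a fixed intent and tool list.
type Complete = (messages: ChatMessage[]) => Promise<TurnResult>;
type ToolImpl = (args: unknown) => Promise<unknown>;

// Runs inline tools until the model replies without tool calls. Appends the
// assistant turn and one role: 'tool' message per call to messages
// (mutated in place) before each follow-up request.
async function runInlineTools(
  complete: Complete,
  messages: ChatMessage[],
  tools: Record<string, ToolImpl>,
  maxRounds = 5,
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    const result = await complete(messages);
    const calls = result.toolCalls ?? [];
    if (calls.length === 0) return result.text;

    messages.push({ role: 'assistant', content: result.text, toolCalls: calls });
    for (const call of calls) {
      try {
        const impl = tools[call.name];
        if (!impl) throw new Error(`unknown tool: ${call.name}`);
        const output = await impl(JSON.parse(call.args));
        messages.push({ role: 'tool', toolCallId: call.id, content: JSON.stringify(output) });
      } catch (err) {
        // Report the failure to the model instead of aborting the loop.
        messages.push({ role: 'tool', toolCallId: call.id, content: String(err), isError: true });
      }
    }
  }
  throw new Error(`no final reply after ${maxRounds} rounds`);
}
```

With the SDK, complete could be (messages) => speko.complete({ messages, intent: { language: 'en' }, tools }), where tools is your ChatTool[] list.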

Example: multi-turn

const messages: ChatMessage[] = [
  { role: 'system', content: 'You are a concise voice assistant.' },
  { role: 'user',   content: 'Book me an appointment for Tuesday.' },
];

const first = await speko.complete({
  messages,
  intent: { language: 'en' },
  temperature: 0.3,
  maxTokens: 200,
});

messages.push({ role: 'assistant', content: first.text });
messages.push({ role: 'user',      content: '3pm, with Dr. Chen.' });

const second = await speko.complete({
  messages,
  intent: { language: 'en' },
});

Example: pin a provider

await speko.complete({
  messages: [...],
  intent: { language: 'en' },
  constraints: { allowedProviders: { llm: ['anthropic'] } },
});
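Conceptually, an allow-list narrows the set of candidate providers while leaving the router's ranking order intact. The sketch below models only the llm list of PipelineConstraints; applyAllowList and the candidate ordering are hypothetical, and it treats an absent list as no constraint. In this model, pinning a single provider leaves nothing else to fail over to.

```typescript
// Trimmed local model of PipelineConstraints: only the llm allow-list.
interface PipelineConstraints {
  allowedProviders?: { llm?: string[] };
}

// Keeps only allowed providers, preserving the original ranking order.
// An absent allow-list leaves the candidates unchanged.
function applyAllowList(ranked: string[], constraints?: PipelineConstraints): string[] {
  const allowed = constraints?.allowedProviders?.llm;
  if (!allowed) return ranked;
  return ranked.filter((provider) => allowed.includes(provider));
}

const pinned = applyAllowList(['groq', 'openai', 'anthropic'], {
  allowedProviders: { llm: ['anthropic'] },
});
console.log(pinned); // [ 'anthropic' ]
```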
