complete
POST /v1/complete — LLM completion with automatic provider routing.
Run an LLM completion. The router picks the best LLM provider for your intent and fails over automatically.
const { text, provider } = await speko.complete({
  messages: [{ role: 'user', content: 'Hi!' }],
  intent: { language: 'en' },
});

Signature
speko.complete(
  params: CompleteParams,
  abortSignal?: AbortSignal,
): Promise<CompleteResult>

speko.completeStream(
  params: CompleteParams,
  abortSignal?: AbortSignal,
): AsyncIterable<CompleteStreamEvent>

Parameters
params: CompleteParams
| Field | Type | Description |
|---|---|---|
| messages | ChatMessage[] | Conversation history. Roles: system, user, assistant, tool. |
| intent | RoutingIntent | language, optional region and optimizeFor. |
| systemPrompt | string? | Shortcut for a leading system message. Providers that distinguish the system channel use it natively; others fold it into the message list. |
| temperature | number? | Forwarded to the provider. Defaults to the provider's default. |
| maxTokens | number? | Max completion tokens. Defaults to the provider's default. |
| reasoningEffort | 'none' \| 'minimal' \| 'low' \| 'medium' \| 'high' \| 'xhigh'? | OpenAI reasoning-model effort override. Defaults are tuned from intent.optimizeFor. |
| constraints | PipelineConstraints? | Allow-list constraints. |
| tools | ChatTool[]? | JSON Schema tool definitions exposed to the model. |
| toolChoice | ChatToolChoice? | auto, none, required, or a specific function name. |
| parallelToolCalls | boolean? | Provider hint for whether multiple tool calls may be emitted in one turn. |
| maxToolHops | number? | Server-side hop cap for webhook or builtin tools. Defaults to 8. |
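As an illustration of the tools, toolChoice, and parallelToolCalls fields, here is a minimal sketch of building a tool definition. The `defineTool` helper and the `book_appointment` tool are hypothetical and not part of the SDK; the `ChatTool` interface is copied from this page, and the JSON Schema layout shown is just one common convention.

```typescript
// Mirrors the ChatTool interface documented on this page.
interface ChatTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  executionMode?: 'inline' | 'webhook' | 'builtin';
}

// Hypothetical helper (not an SDK function): wraps a flat set of property
// schemas in a JSON Schema object and marks every property as required.
function defineTool(
  name: string,
  description: string,
  properties: Record<string, { type: string; description?: string }>,
): ChatTool {
  return {
    name,
    description,
    parameters: {
      type: 'object',
      properties,
      required: Object.keys(properties),
      additionalProperties: false,
    },
  };
}

// Illustrative tool; executionMode is omitted, so it stays inline.
const bookAppointment = defineTool(
  'book_appointment',
  'Book an appointment with a named doctor.',
  {
    day: { type: 'string', description: 'Day of the week' },
    doctor: { type: 'string' },
  },
);

// Tool-related request fields from the table above. These would be spread
// into the params passed to speko.complete() alongside messages and intent.
const toolParams = {
  tools: [bookAppointment],
  toolChoice: 'auto',
  parallelToolCalls: false,
};
```

Marking every property as required keeps the sketch short; a real tool would likely list only the mandatory fields.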
ChatMessage
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCalls?: ChatToolCall[];
  toolCallId?: string;
  isError?: boolean;
}

interface ChatToolCall {
  id: string;
  name: string;
  args: string;
}

interface ChatTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  executionMode?: 'inline' | 'webhook' | 'builtin';
}

abortSignal?: AbortSignal
Cancel an in-flight request.
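One common use of the abort signal is a client-side timeout. The sketch below is an assumption-laden illustration: `makeFakeComplete` is a hypothetical stand-in for `speko.complete` so the block runs without a network call, and how the real SDK rejects on abort is not documented here, so the wrapper only checks whether its own signal fired.

```typescript
type Params = { messages: { role: string; content: string }[] };
type Result = { text: string };
type CompleteFn = (params: Params, abortSignal?: AbortSignal) => Promise<Result>;

// Hypothetical stand-in for speko.complete: resolves after delayMs unless
// the signal aborts first. The rejection shape is an assumption.
function makeFakeComplete(delayMs: number): CompleteFn {
  return (_params, abortSignal) =>
    new Promise<Result>((resolve, reject) => {
      if (abortSignal?.aborted) {
        reject(new Error('aborted'));
        return;
      }
      const timer = setTimeout(() => resolve({ text: 'ok' }), delayMs);
      abortSignal?.addEventListener(
        'abort',
        () => {
          clearTimeout(timer);
          reject(new Error('aborted'));
        },
        { once: true },
      );
    });
}

// Runs a completion and aborts it if it takes longer than timeoutMs.
// Returns null when the timeout cancelled the request.
async function completeWithTimeout(
  complete: CompleteFn,
  params: Params,
  timeoutMs: number,
): Promise<Result | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await complete(params, controller.signal);
  } catch (err) {
    if (controller.signal.aborted) return null; // cancelled by our own timeout
    throw err; // any other failure is passed through unchanged
  } finally {
    clearTimeout(timer);
  }
}
```

With a real client, the first argument would be something like `(p, s) => speko.complete({ ...p, intent: { language: 'en' } }, s)`.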
Returns
CompleteResult
| Field | Type | Description |
|---|---|---|
| text | string | Assistant reply. |
| provider | string | Upstream LLM provider (e.g. openai, anthropic, groq). |
| model | string | Provider-specific model id. |
| usage.promptTokens | number | Prompt token count. |
| usage.completionTokens | number | Completion token count. |
| failoverCount | number | Providers tried before this one succeeded. |
| scoresRunId | string \| null | Scoring run id that selected this provider. |
| toolCalls | ChatToolCall[]? | Tool calls emitted by the assistant when inline tool execution is needed. |
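The routing fields are handy for logging. Below is a small sketch that turns a result into a one-line summary; `describeRouting` and `CompleteResultLike` are hypothetical names, and the nested `usage` object is inferred from the `usage.promptTokens` notation in the table rather than confirmed.

```typescript
// Subset of the CompleteResult fields listed in the table above.
interface CompleteResultLike {
  text: string;
  provider: string;
  model: string;
  usage: { promptTokens: number; completionTokens: number };
  failoverCount: number;
  scoresRunId: string | null;
}

// Hypothetical helper: builds a one-line routing summary for logs.
function describeRouting(result: CompleteResultLike): string {
  const totalTokens = result.usage.promptTokens + result.usage.completionTokens;
  // Only mention failover when at least one provider was tried and failed.
  const failover =
    result.failoverCount > 0 ? `, ${result.failoverCount} failover(s)` : '';
  const run = result.scoresRunId ?? 'none';
  return `${result.provider}/${result.model}: ${totalTokens} tokens${failover}, scoring run ${run}`;
}
```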
Streaming
The wire response is text/event-stream with meta, delta, tool_call,
server_tool_call, done, and error events. speko.complete() consumes that
stream and returns the final CompleteResult; use speko.completeStream() to
render deltas or tool-call progress as it arrives.
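A rough sketch of consuming the stream follows. Only the event names (meta, delta, tool_call, server_tool_call, done, error) come from this page; the payload fields on `StreamEvent` are assumptions, and `fakeStream` is a hypothetical stand-in for `speko.completeStream(...)` so the block runs on its own.

```typescript
// ASSUMPTION: payload fields are hypothetical; only the event names are documented.
type StreamEvent =
  | { type: 'meta'; provider: string }
  | { type: 'delta'; text: string }
  | { type: 'tool_call'; name: string }
  | { type: 'server_tool_call'; name: string }
  | { type: 'done' }
  | { type: 'error'; message: string };

// Accumulates delta text, reporting each chunk to onDelta as it arrives.
async function collectText(
  events: AsyncIterable<StreamEvent>,
  onDelta: (chunk: string) => void,
): Promise<string> {
  let text = '';
  for await (const event of events) {
    switch (event.type) {
      case 'delta':
        text += event.text;
        onDelta(event.text);
        break;
      case 'error':
        throw new Error(event.message);
      case 'done':
        return text;
      default:
        break; // meta and tool-call events are ignored in this sketch
    }
  }
  return text; // stream ended without an explicit done event
}

// Stand-in for speko.completeStream(...), yielding a fixed event sequence.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: 'meta', provider: 'openai' };
  yield { type: 'delta', text: 'Hel' };
  yield { type: 'delta', text: 'lo' };
  yield { type: 'done' };
}
```

In an application, `onDelta` would typically append to a UI element or forward the chunk to a text-to-speech pipeline.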
Tool execution
Tools can run inline in your worker, through Speko-managed webhooks, or as builtins. Omitting executionMode preserves the inline behavior: the model's tool calls return in toolCalls, and your app adds role: 'tool' messages before calling complete() again. Webhook and builtin tools are executed by Speko server-side and may emit server_tool_call streaming events before the final response.
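The inline flow described above can be sketched as a loop. This is an illustration under stated assumptions, not the SDK's own implementation: `runInlineTools`, `CompleteFn`, and `ToolHandler` are hypothetical names, `args` is assumed to be a JSON-encoded string, and echoing the assistant's tool calls back into the history is an assumed convention.

```typescript
interface ChatToolCall { id: string; name: string; args: string; }
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  toolCalls?: ChatToolCall[];
  toolCallId?: string;
  isError?: boolean;
}
interface TurnResult { text: string; toolCalls?: ChatToolCall[]; }

// Hypothetical wrapper type: in real code this would call speko.complete()
// with the messages plus your intent and tools.
type CompleteFn = (messages: ChatMessage[]) => Promise<TurnResult>;
type ToolHandler = (args: unknown) => Promise<string> | string;

// Calls the model, runs any requested tools locally, appends role: 'tool'
// messages, and repeats until the model answers without tool calls.
async function runInlineTools(
  complete: CompleteFn,
  messages: ChatMessage[],
  handlers: Record<string, ToolHandler>,
  maxTurns = 8, // client-side guard, separate from the server-side maxToolHops
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await complete(messages);
    const calls = result.toolCalls ?? [];
    if (calls.length === 0) return result.text;

    // ASSUMPTION: the assistant turn is echoed back with its tool calls.
    messages.push({ role: 'assistant', content: result.text, toolCalls: calls });
    for (const call of calls) {
      try {
        const handler = handlers[call.name];
        if (!handler) throw new Error(`unknown tool: ${call.name}`);
        // ASSUMPTION: args is a JSON-encoded string.
        const output = await handler(JSON.parse(call.args));
        messages.push({ role: 'tool', content: output, toolCallId: call.id });
      } catch (err) {
        // Report the failure to the model instead of aborting the loop.
        messages.push({ role: 'tool', content: String(err), toolCallId: call.id, isError: true });
      }
    }
  }
  throw new Error('tool loop did not finish within maxTurns');
}
```

Reporting handler failures as `isError` tool messages lets the model recover or apologize rather than leaving the conversation stuck.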
Example: multi-turn
const messages: ChatMessage[] = [
  { role: 'system', content: 'You are a concise voice assistant.' },
  { role: 'user', content: 'Book me an appointment for Tuesday.' },
];

const first = await speko.complete({
  messages,
  intent: { language: 'en' },
  temperature: 0.3,
  maxTokens: 200,
});

messages.push({ role: 'assistant', content: first.text });
messages.push({ role: 'user', content: '3pm, with Dr. Chen.' });

const second = await speko.complete({
  messages,
  intent: { language: 'en' },
});

Example: pin a provider
await speko.complete({
  messages: [...],
  intent: { language: 'en' },
  constraints: { allowedProviders: { llm: ['anthropic'] } },
});