framesToWav
Encodes `AudioFrame`s (a single frame or an array) into a PCM16 mono WAV byte stream. Used by `SpekoSTT` to wrap each utterance before uploading to `/v1/transcribe`.
- Combines frames via `combineAudioFrames` from `@livekit/rtc-node`.
- Writes a standard 44-byte RIFF/WAVE header: `fmt` chunk (PCM, 16-bit, mono, `sampleRate` from frames) + `data` chunk.
- Sample rate is pulled from the input frames: whatever LiveKit gives you is what’s encoded.
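The 44-byte header layout described above can be sketched as follows. This is an illustrative stand-in, not the adapter's own code: `encodePcm16MonoWav` is a hypothetical name, and it takes an already-combined `Int16Array` plus a sample rate instead of LiveKit frames so that it runs without any dependencies.

```typescript
// Hypothetical helper (not the adapter's framesToWav): writes the same
// 44-byte RIFF/WAVE header followed by little-endian PCM16 mono samples.
function encodePcm16MonoWav(samples: Int16Array, sampleRate: number): Uint8Array {
  const headerSize = 44;
  const dataSize = samples.length * 2; // 2 bytes per 16-bit sample
  const out = new Uint8Array(headerSize + dataSize);
  const view = new DataView(out.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) out[offset + i] = text.charCodeAt(i);
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // total size minus the 8-byte RIFF preamble
  writeAscii(8, "WAVE");

  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt body size for plain PCM
  view.setUint16(20, 1, true); // format = 1 (PCM)
  view.setUint16(22, 1, true); // channels = 1 (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true); // block align = channels * 2
  view.setUint16(34, 16, true); // bits per sample

  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  samples.forEach((sample, i) => view.setInt16(headerSize + i * 2, sample, true));
  return out;
}
```

Because the sample rate is written straight into the header, a mismatch between the declared rate and the real one shows up as pitch-shifted audio rather than as an error.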
AudioBuffer throws:
parseWav
Used by `SpekoTTS` to unwrap WAV-encoded proxy responses into raw samples for `AudioByteStream`.
Accepted subset:
- Valid `RIFF`/`WAVE` header.
- `fmt` chunk present and of `format = 1` (PCM).
- 16-bit samples.
- `data` chunk reachable by walking subsequent chunks (tolerates e.g. `LIST` chunks between `fmt` and `data`).
`channels` is returned as-is; the caller is responsible for deciding whether stereo is acceptable. `SpekoTTS` currently throws on stereo.
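The accepted subset above could be checked roughly like this. It is a sketch under assumptions, not the adapter's `parseWav`: the name `parseWavSketch`, the `ParsedWav` return shape, the error messages, and the choice to clamp an oversized `data` length to the buffer are all illustrative.

```typescript
// Hypothetical return shape; the real parseWav result may differ.
interface ParsedWav {
  sampleRate: number;
  channels: number;
  samples: Int16Array;
}

function parseWavSketch(bytes: Uint8Array): ParsedWav {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }

  let sampleRate = 0;
  let channels = 0;
  let sawFmt = false;
  let offset = 12;

  // Walk chunk by chunk so unknown chunks (e.g. LIST) are skipped.
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;

    if (id === "fmt ") {
      if (body + 16 > bytes.byteLength) throw new Error("truncated fmt chunk");
      if (view.getUint16(body, true) !== 1) throw new Error("only PCM (format = 1) is supported");
      channels = view.getUint16(body + 2, true); // returned as-is; caller decides
      sampleRate = view.getUint32(body + 4, true);
      if (view.getUint16(body + 14, true) !== 16) throw new Error("only 16-bit samples are supported");
      sawFmt = true;
    } else if (id === "data") {
      if (!sawFmt) throw new Error("data chunk before fmt chunk");
      // Assumption: clamp to the buffer in case the declared size overshoots.
      const end = Math.min(body + size, bytes.byteLength);
      const count = Math.floor((end - body) / 2);
      const samples = new Int16Array(count);
      for (let i = 0; i < count; i++) samples[i] = view.getInt16(body + i * 2, true);
      return { sampleRate, channels, samples };
    }
    offset = body + size + (size % 2); // RIFF chunks are padded to even length
  }
  throw new Error("no data chunk found");
}
```

Reading samples through a `DataView` keeps the sketch independent of host endianness and of the input's byte alignment, at the cost of a copy.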
pcmSampleRateFromContentType
Parses the `rate` parameter out of a Cartesia-style content type:
rate=.
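A minimal sketch of this kind of parameter extraction is shown below. It is not the adapter's implementation: `sampleRateFromContentTypeSketch` is a hypothetical name, the `audio/pcm;rate=24000` string in the usage note is only an assumed example of such a content type, and returning `undefined` when no usable `rate=` is present is likewise an assumption.

```typescript
// Hypothetical helper: extracts a positive integer `rate` parameter from a
// content-type string. Returns undefined when none is found (an assumption,
// not the documented behaviour of pcmSampleRateFromContentType).
function sampleRateFromContentTypeSketch(contentType: string): number | undefined {
  // Skip the media type itself; only the ";"-separated parameters matter.
  for (const part of contentType.split(";").slice(1)) {
    const [key, value] = part.split("=");
    if (key !== undefined && key.trim().toLowerCase() === "rate" && value !== undefined) {
      const rate = Number.parseInt(value.trim(), 10);
      if (Number.isFinite(rate) && rate > 0) return rate;
    }
  }
  return undefined;
}
```

Under these assumptions, `sampleRateFromContentTypeSketch("audio/pcm;rate=24000")` yields `24000`, and a content type with no parameters yields `undefined`.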
Intended usage
You shouldn’t need these helpers when consuming the adapter through `createSpekoComponents`; they’re used internally by `SpekoSTT` and `SpekoTTS`. They’re exported for:
- Unit tests: build canned WAV fixtures with `framesToWav`, round-trip them through `parseWav`.
- Custom STT / TTS pipelines that need to reuse the same WAV framing Speko uses.
- Debugging: decode what an upstream provider returned without instantiating a full TTS.