Streaming Transcription
Open a WebSocket
Streaming transcription is performed over a WebSocket. Provide the transcription parameters and establish a WebSocket connection to the endpoint.
Stream audio and receive transcriptions
Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). In parallel, receive transcription from the WebSocket.
Headers
Your Fireworks API key, e.g. Authorization=API_KEY.
Query Parameters
String name of the ASR model to use. Can be one of whisper-v3 or whisper-v3-turbo. Please use the following serverless endpoints for evaluation:
- wss://audio-streaming.us-virginia-1.direct.fireworks.ai (for whisper-v3 compatible model)
- wss://audio-streaming-turbo.us-virginia-1.direct.fireworks.ai (for whisper-v3-turbo compatible model)
The format in which to return the response. Currently only verbose_json is recommended for streaming.
The target language for transcription. The set of supported target languages can be found here.
The input prompt that the model will use when generating the transcription. It can be used to specify custom words or the style of the transcription. For example, the prompt "Um, here's, uh, what was recorded." makes the model include filler words in the transcription.
Sampling temperature to use when decoding text tokens during transcription.
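As an illustration of how the header and query parameters above might be assembled, here is a minimal sketch using only the Python standard library. The query keys (model, response_format, language, prompt, temperature) are assumed names, since this section describes the parameters without naming them, and the request path is left as a caller-supplied argument because it is not given here. Only the endpoint hosts, model names, verbose_json, and the Authorization header come from the text above.

```python
from urllib.parse import urlencode

# Serverless evaluation endpoints listed above, keyed by model name.
ENDPOINTS = {
    "whisper-v3": "wss://audio-streaming.us-virginia-1.direct.fireworks.ai",
    "whisper-v3-turbo": "wss://audio-streaming-turbo.us-virginia-1.direct.fireworks.ai",
}


def auth_headers(api_key):
    """Header carrying the Fireworks API key (Authorization=API_KEY)."""
    return {"Authorization": api_key}


def build_streaming_url(model, path, language=None, prompt=None, temperature=None):
    """Return the WebSocket URL for `model` with its query string attached.

    `path` is the request path on the endpoint; it is not stated in this
    section, so the caller must supply it. The query keys below are
    assumed names, not confirmed ones.
    """
    params = {"model": model, "response_format": "verbose_json"}
    if language is not None:
        params["language"] = language
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = temperature
    # urlencode percent-encodes values, so prompts with spaces are safe.
    return f"{ENDPOINTS[model]}{path}?{urlencode(params)}"
```

The resulting URL and headers can then be handed to whichever WebSocket client you use.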
Streaming Audio
Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). Typically, you will:
- Resample your audio to 16 kHz if it is not already.
- Convert it to mono.
- Send 50ms chunks (16,000 Hz * 0.05s = 800 samples) of audio in 16-bit PCM (signed, little-endian) format.
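The mono conversion and chunking steps above can be sketched as follows, using only the standard library. This is an illustrative sketch, not a reference implementation: the helper names are hypothetical, resampling is not covered (the input is assumed to be 16 kHz already), and stereo input is assumed to be interleaved left/right 16-bit samples.

```python
import array
import sys

SAMPLE_RATE = 16_000   # Hz, as required by the endpoint
BYTES_PER_SAMPLE = 2   # 16-bit PCM


def stereo_to_mono(samples):
    """Average interleaved left/right 16-bit samples into a mono array."""
    mono = array.array("h")
    for i in range(0, len(samples) - 1, 2):
        mono.append((samples[i] + samples[i + 1]) // 2)
    return mono


def to_little_endian_bytes(samples):
    """Serialize signed 16-bit samples as little-endian bytes."""
    out = array.array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()  # array uses native byte order, so swap on big-endian hosts
    return out.tobytes()


def iter_chunks(pcm_bytes, chunk_ms=50):
    """Yield consecutive chunks of `chunk_ms` milliseconds of audio.

    At 16 kHz, 50 ms is 800 samples, i.e. 1600 bytes. The final chunk
    may be shorter if the audio length is not a multiple of the chunk size.
    """
    samples_per_chunk = SAMPLE_RATE * chunk_ms // 1000
    step = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm_bytes), step):
        yield pcm_bytes[start:start + step]
```

Each yielded chunk would be sent as one binary WebSocket frame.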
Handling Responses
The client maintains a state dictionary, starting with an empty dictionary {}. When the server sends the first transcription message, it contains a list of segments. Each segment has an id and text.
When the server sends the next updates to the transcription, the client updates the state dictionary based on the segment id.
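The state handling described here might look like the sketch below. It assumes each parsed server message is a dictionary with a "segments" list whose items carry "id" and "text"; the "segments" key and the sample messages are assumptions for illustration, and the function names are hypothetical. The transcript is assembled in the order segment ids were first seen, which assumes the server introduces segments in order.

```python
def apply_update(state, message):
    """Merge one server message into the client state, keyed by segment id.

    A segment with a known id overwrites the stored text; a segment with
    a new id is appended.
    """
    for segment in message.get("segments", []):
        state[segment["id"]] = segment["text"]
    return state


def current_transcript(state):
    """Join segment texts in the order their ids were first seen."""
    return " ".join(state.values())


# Example: the second message revises segment "1" and adds segment "2".
state = {}
apply_update(state, {"segments": [
    {"id": "0", "text": "This is the first"},
    {"id": "1", "text": "sentence"},
]})
apply_update(state, {"segments": [
    {"id": "1", "text": "sentence here"},
    {"id": "2", "text": "and more"},
]})
```

Overwriting by id lets the server revise a segment it has already sent without the client having to track which parts of the transcript are final.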
Example Usage
Dedicated endpoint
For fixed throughput and predictable SLAs, you may request a dedicated endpoint for streaming transcription at inquiries@fireworks.ai or on Discord.