Streaming Transcription
Open a WebSocket
Streaming transcription is performed over a WebSocket. Provide the transcription parameters and establish a WebSocket connection to the endpoint.
Stream audio and receive transcriptions
Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). In parallel, receive transcription from the WebSocket.
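The "in parallel" part can be sketched with asyncio: one task sends audio frames while another reads messages from the same connection. This is only an illustration of the pattern. `FakeConnection` is a hypothetical stand-in for a real WebSocket client object, and its one-reply-per-frame behavior is an assumption made so the example runs without a network; a real server decides when to send transcription messages.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a WebSocket connection (not a real client)."""

    def __init__(self):
        self._inbox = asyncio.Queue()
        self.sent = []

    async def send(self, frame: bytes) -> None:
        self.sent.append(frame)
        # Assumption for the demo only: reply once per frame.
        await self._inbox.put(f"received {len(self.sent)} frame(s)")

    async def recv(self) -> str:
        return await self._inbox.get()


async def send_audio(conn, chunks):
    """Send each binary audio chunk, yielding control between sends."""
    for chunk in chunks:
        await conn.send(chunk)
        await asyncio.sleep(0)


async def receive_messages(conn, expected, out):
    """Collect a fixed number of messages; a real client would read until close."""
    for _ in range(expected):
        out.append(await conn.recv())


async def stream(conn, chunks):
    """Run the sender and the receiver concurrently on one connection."""
    messages = []
    await asyncio.gather(
        send_audio(conn, chunks),
        receive_messages(conn, len(chunks), messages),
    )
    return messages


async def demo():
    conn = FakeConnection()
    chunks = [b"\x00\x00" * 800 for _ in range(3)]  # three 50 ms chunks of silence
    return await stream(conn, chunks)


def run_demo():
    return asyncio.run(demo())
```

With a real WebSocket library, the same two coroutines would wrap that library's send and receive calls instead of the stub.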
Explore Python sources
Stream audio to get transcription continuously in real-time.
Explore Node.js sources
Stream audio to get transcription continuously in real-time.
URL
Please use the following serverless endpoint:
Headers
Your Fireworks API key, e.g. Authorization=API_KEY.
Query Parameters
The format in which to return the response. Currently only verbose_json is recommended for streaming.
The target language for transcription. The set of supported target languages can be found here.
The input prompt that the model will use when generating the transcription. Can be used to specify custom words or the style of the transcription. For example, the prompt "Um, here's, uh, what was recorded." will make the model include filler words in the transcription.
Sampling temperature to use when decoding text tokens during transcription.
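As a rough sketch, the query parameters can be appended to the endpoint URL with the standard library. The parameter names used here (`response_format`, `language`, `prompt`, `temperature`) are assumptions inferred from the descriptions above, and `BASE_URL` is a placeholder, not the real endpoint; check both against the API reference before use.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the serverless endpoint given in the URL section.
BASE_URL = "wss://example.invalid/streaming-transcription"


def build_stream_url(base_url: str, **params) -> str:
    """Append the query parameters that are set (not None) to the base URL."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}" if query else base_url


# Parameter names are assumed, not confirmed by this page.
url = build_stream_url(
    BASE_URL,
    response_format="verbose_json",
    language="en",
    prompt=None,  # unset parameters are left out of the URL
    temperature=0.0,
)

# The API key is sent in the Authorization header.
headers = {"Authorization": "API_KEY"}
```

`urlencode` also percent-encodes a prompt that contains spaces or punctuation, which matters if a prompt is supplied.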
Streaming Audio
Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). Typically, you will:
- Resample your audio to 16 kHz if it is not already.
- Convert it to mono.
- Send 50ms chunks (16,000 Hz * 0.05s = 800 samples) of audio in 16-bit PCM (signed, little-endian) format.
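The chunking arithmetic above can be sketched in plain Python. The example assumes the audio is already mono at 16 kHz and is given as floats in the range [-1.0, 1.0]; the helper names are illustrative and not part of any Fireworks SDK.

```python
import struct

SAMPLE_RATE = 16_000                                 # Hz
CHUNK_MS = 50                                        # chunk duration
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_MS // 1000   # 800 samples
BYTES_PER_SAMPLE = 2                                 # 16-bit PCM


def floats_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_chunks(pcm: bytes, samples_per_chunk: int = SAMPLES_PER_CHUNK):
    """Yield consecutive binary frames; the last one may be shorter."""
    step = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


# One second of silence becomes 20 frames of 1600 bytes each.
one_second = floats_to_pcm16le([0.0] * SAMPLE_RATE)
frames = list(iter_chunks(one_second))
```

Each yielded frame would be sent as one binary WebSocket message.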
Handling Responses
The client maintains a state dictionary, starting with an empty dictionary {}. When the server sends the first transcription message, it contains a list of segments. Each segment has an id and text.
When the server sends the next updates to the transcription, the client updates the state dictionary based on the segment id.
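A minimal sketch of this state handling follows. It assumes each server message is a JSON object whose segment list sits under a `segments` key; that key name and the sample texts are assumptions, while `id` and `text` come from the description above.

```python
def apply_update(state: dict, message: dict) -> dict:
    """Insert new segments and overwrite existing ones, keyed by segment id."""
    for segment in message.get("segments", []):  # "segments" key is assumed
        state[segment["id"]] = segment["text"]
    return state


def render(state: dict) -> str:
    """Join segment texts in the order their ids first appeared."""
    return " ".join(state.values())


state = {}

# First message: two segments arrive.
apply_update(state, {"segments": [
    {"id": "0", "text": "This is a sentence"},
    {"id": "1", "text": "This is another"},
]})

# Next update: segment "1" is revised and segment "2" is new.
apply_update(state, {"segments": [
    {"id": "1", "text": "This is another sentence."},
    {"id": "2", "text": "And a third"},
]})

transcript = render(state)
```

Because Python dictionaries preserve insertion order, a revised segment keeps its original position in the rendered transcript.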
Example Usage
Check out a brief Python example below or example sources:
Dedicated endpoint
For fixed throughput and predictable SLAs, you may request a dedicated endpoint for streaming transcription at inquiries@fireworks.ai or on Discord.