1. Open a WebSocket
Streaming transcription is performed over a WebSocket. Provide the transcription parameters and establish a WebSocket connection to the endpoint.
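As a rough illustration of providing the transcription parameters, the sketch below builds a WebSocket URL with a query string. The base URL is a placeholder, not the real endpoint, and the parameter names `language` and `response_format` are assumptions; only `timestamp_granularities` is named on this page.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the serverless endpoint given in the URL section.
BASE_URL = "wss://example.invalid/streaming-transcription"


def build_streaming_url(base_url, **params):
    """Append every parameter that is not None to the URL as a query string."""
    query = {name: value for name, value in params.items() if value is not None}
    return f"{base_url}?{urlencode(query)}" if query else base_url


url = build_streaming_url(
    BASE_URL,
    language="en",                           # assumed parameter name
    response_format="verbose_json",          # assumed parameter name
    timestamp_granularities="word,segment",  # comma-separated form, for manual URLs
    temperature=None,                        # omitted because it is None
)
print(url)
```

Skipping `None` values keeps the URL limited to the parameters you actually set, so server-side defaults apply to the rest.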
2. Stream audio and receive transcriptions
Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). In parallel, receive transcription from the WebSocket.
Explore Python sources
Stream audio to get transcription continuously in real-time.
Explore Node.js sources
Stream audio to get transcription continuously in real-time.
URL
Please use the following serverless endpoint:
Headers
Your Fireworks API key, e.g. Authorization=API_KEY. Alternatively, it can be provided as a query parameter.
Query Parameters
The format in which to return the response. Currently only verbose_json is recommended for streaming.
The target language for transcription. See the Supported Languages section below for a complete list of available languages.
The input prompt that the model will use when generating the transcription. Can be used to specify custom words or the style of the transcription. E.g. "Um, here's, uh, what was recorded." will make the model include filler words in the transcription.
Sampling temperature to use when decoding text tokens during transcription.
The timestamp granularities to populate for this streaming transcription. Defaults to null. Set to word,segment to enable timestamp granularities. Use a list for timestamp_granularities in all client libraries. A comma-separated string like word,segment only works when manually included in the URL (e.g. in curl).
Client messages
The client uses this field to send audio chunks to the server. Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono).
Server messages
The task that was performed: either transcribe or translate.
The language of the transcribed/translated text.
The transcribed/translated text.
Extracted words and their corresponding timestamps.
Segments of the transcribed/translated text and their corresponding details.
Streaming Audio
Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). Typically, you will:
- Resample your audio to 16 kHz if it is not already.
- Convert it to mono.
- Send 50ms chunks (16,000 Hz * 0.05s = 800 samples) of audio in 16-bit PCM (signed, little-endian) format.
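The chunk arithmetic above can be sketched with the standard library alone. This is a minimal illustration that assumes the audio is already sampled at 16 kHz; the helper names are hypothetical, and resampling is left out because it needs a dedicated audio library.

```python
import struct

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHUNK_MS = 50


def chunk_size_bytes(chunk_ms=CHUNK_MS):
    """Bytes in one mono chunk: 16,000 Hz * 0.05 s = 800 samples = 1,600 bytes."""
    samples = SAMPLE_RATE_HZ * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def stereo_to_mono(pcm):
    """Average interleaved left/right 16-bit little-endian samples into one channel."""
    count = len(pcm) // BYTES_PER_SAMPLE
    samples = struct.unpack(f"<{count}h", pcm[: count * BYTES_PER_SAMPLE])
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count - 1, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


def iter_chunks(pcm, chunk_ms=CHUNK_MS):
    """Yield consecutive binary frames; the last one may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start : start + size]


# One second of mono silence splits into 20 chunks of 50 ms each.
one_second = struct.pack("<16000h", *([0] * 16_000))
chunks = list(iter_chunks(one_second))
print(len(chunks), len(chunks[0]))
```

Each yielded `bytes` object would be sent as one binary WebSocket frame.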
Handling Responses
The client maintains a state dictionary, starting with an empty dictionary {}. When the server sends the first transcription message, it contains a list of segments. Each segment has an id and text.
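One way to maintain that state is sketched below. It assumes the list arrives under a `segments` key and that a later message carrying an existing `id` replaces that segment's text; both are assumptions for illustration, and the function names are hypothetical.

```python
def apply_segments(state, message):
    """Merge the segments of one server message into the state, keyed by id."""
    for segment in message.get("segments", []):
        # Assumption: a repeated id carries the updated text for that segment.
        state[segment["id"]] = segment["text"]
    return state


def render(state):
    """Join the segment texts in id order to form the current transcript."""
    return " ".join(state[segment_id] for segment_id in sorted(state))


state = {}
apply_segments(state, {"segments": [{"id": 0, "text": "Hello"}]})
apply_segments(
    state,
    {"segments": [{"id": 0, "text": "Hello world."}, {"id": 1, "text": "How are"}]},
)
print(render(state))
```

Keying by `id` means the client never has to diff text: each message simply overwrites the segments it mentions.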
Handling Connection Interruptions & Timeouts
Real-time streaming transcription over WebSockets can run for a long time. The longer a WebSocket session runs, the more likely it is to experience interruptions, from network glitches to service hiccups. Build your client to recover gracefully so the stream keeps going without user impact. The following sections outline recommended practices for handling connection interruptions and timeouts.
When a connection drops
Although Fireworks is designed to keep streams running smoothly, occasional interruptions can still occur. If the WebSocket is disrupted (e.g., by bandwidth limitations or network failures), your application must open a new WebSocket connection, start a fresh streaming session, and begin sending audio as soon as the server confirms the connection is open.
Avoid losing audio during reconnects
While you are reconnecting, audio may still be produced, and any audio that is not sent to our API during this period is lost. To minimize the risk of dropping audio during a reconnect, store the audio data in a buffer until the connection to our API is re-established, then send the buffered data for transcription.
Keep timestamps continuous across sessions
When timestamps are enabled, the result includes the start and end time of each segment in seconds. Each new WebSocket session resets the timestamps to start from 00:00:00. To keep a continuous timeline, we recommend maintaining a running “stream start offset” in your app and adding that offset to timestamps from each new session so they align with the overall audio timeline.
Example Usage
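The reconnect practices described above (buffering audio while disconnected and keeping a running stream start offset) can be sketched as follows. The class and method names are hypothetical, the WebSocket itself is abstracted behind a `send` callable, and the sketch assumes each session's timestamps count from the first audio byte sent in that session.

```python
from collections import deque


class StreamSession:
    """Tracks unsent audio and the offset of the current session on the overall timeline."""

    def __init__(self, sample_rate_hz=16_000, bytes_per_sample=2):
        self.bytes_per_second = sample_rate_hz * bytes_per_sample
        self.pending = deque()       # chunks captured but not yet sent
        self.session_offset_s = 0.0  # running "stream start offset" in seconds
        self.sent_bytes = 0          # audio bytes sent in the current session

    def buffer(self, chunk):
        """Queue a chunk; call this for all audio, connected or not."""
        self.pending.append(chunk)

    def drain(self, send):
        """Send queued chunks in order; a chunk is dropped only after send succeeds."""
        while self.pending:
            send(self.pending[0])
            self.sent_bytes += len(self.pending.popleft())

    def start_new_session(self):
        """Call after reconnecting: advance the offset by the audio already sent."""
        self.session_offset_s += self.sent_bytes / self.bytes_per_second
        self.sent_bytes = 0

    def to_stream_time(self, session_seconds):
        """Map a timestamp from the current session onto the overall timeline."""
        return self.session_offset_s + session_seconds


session = StreamSession()
sent = []
session.buffer(b"\x00" * 32_000)  # one second of 16 kHz, 16-bit mono audio
session.drain(sent.append)        # stand-in for the WebSocket send call
session.start_new_session()       # the connection dropped and was reopened
print(session.to_stream_time(0.5))
```

Because `drain` removes a chunk only after `send` returns, a failure in the middle of sending leaves the remaining audio queued for the next connection.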
Check out a brief Python example below or example sources:
Dedicated endpoint
For fixed throughput and predictable SLAs, you may request a dedicated endpoint for streaming transcription at inquiries@fireworks.ai or on Discord.
Supported Languages
The following languages are supported for transcription:
Language Code | Language Name |
---|---|
en | English |
zh | Chinese |
de | German |
es | Spanish |
ru | Russian |
ko | Korean |
fr | French |
ja | Japanese |
pt | Portuguese |
tr | Turkish |
pl | Polish |
ca | Catalan |
nl | Dutch |
ar | Arabic |
sv | Swedish |
it | Italian |
id | Indonesian |
hi | Hindi |
fi | Finnish |
vi | Vietnamese |
he | Hebrew |
uk | Ukrainian |
el | Greek |
ms | Malay |
cs | Czech |
ro | Romanian |
da | Danish |
hu | Hungarian |
ta | Tamil |
no | Norwegian |
th | Thai |
ur | Urdu |
hr | Croatian |
bg | Bulgarian |
lt | Lithuanian |
la | Latin |
mi | Maori |
ml | Malayalam |
cy | Welsh |
sk | Slovak |
te | Telugu |
fa | Persian |
lv | Latvian |
bn | Bengali |
sr | Serbian |
az | Azerbaijani |
sl | Slovenian |
kn | Kannada |
et | Estonian |
mk | Macedonian |
br | Breton |
eu | Basque |
is | Icelandic |
hy | Armenian |
ne | Nepali |
mn | Mongolian |
bs | Bosnian |
kk | Kazakh |
sq | Albanian |
sw | Swahili |
gl | Galician |
mr | Marathi |
pa | Punjabi |
si | Sinhala |
km | Khmer |
sn | Shona |
yo | Yoruba |
so | Somali |
af | Afrikaans |
oc | Occitan |
ka | Georgian |
be | Belarusian |
tg | Tajik |
sd | Sindhi |
gu | Gujarati |
am | Amharic |
yi | Yiddish |
lo | Lao |
uz | Uzbek |
fo | Faroese |
ht | Haitian Creole |
ps | Pashto |
tk | Turkmen |
nn | Nynorsk |
mt | Maltese |
sa | Sanskrit |
lb | Luxembourgish |
my | Myanmar |
bo | Tibetan |
tl | Tagalog |
mg | Malagasy |
as | Assamese |
tt | Tatar |
haw | Hawaiian |
ln | Lingala |
ha | Hausa |
ba | Bashkir |
jw | Javanese |
su | Sundanese |
yue | Cantonese |