Open a WebSocket
Streaming transcription is performed over a WebSocket. Provide the transcription parameters and establish a WebSocket connection to the endpoint.
Stream audio and receive transcriptions
Stream short audio chunks (50-400 ms) as binary frames of 16-bit little-endian PCM at a 16 kHz sample rate, single channel (mono). In parallel, receive transcriptions from the WebSocket.
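As a rough illustration of the chunk sizes implied by this format, the sketch below computes how many bytes one chunk occupies and slices a raw PCM buffer accordingly. The helper names `chunk_size_bytes` and `iter_chunks` are hypothetical and not part of any Fireworks SDK.

```python
SAMPLE_RATE_HZ = 16_000  # required sample rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1             # mono


def chunk_size_bytes(chunk_ms: int) -> int:
    """Return the number of bytes in one chunk of the given duration."""
    if not 50 <= chunk_ms <= 400:
        raise ValueError("chunk_ms should be between 50 and 400")
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def iter_chunks(pcm: bytes, chunk_ms: int = 50):
    """Yield consecutive slices of raw PCM bytes, one per binary frame."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(pcm), size):
        # The final slice may be shorter than `size`.
        yield pcm[start:start + size]
```

At these settings one second of audio is 32,000 bytes, so a 50 ms chunk is 1,600 bytes and a 400 ms chunk is 12,800 bytes.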
Stream audio to get transcription continuously in real-time.
Please use the following serverless endpoint:
Your Fireworks API key, e.g. Authorization=API_KEY. Alternatively, it can be provided as a query parameter.
The format in which to return the response. Currently only verbose_json is recommended for streaming.
The target language for transcription. See the Supported Languages section below for a complete list of available languages.
The input prompt that the model will use when generating the transcription. It can be used to specify custom words or the style of the transcription. For example, the prompt "Um, here's, uh, what was recorded." makes the model include filler words in the transcription.
Sampling temperature to use when decoding text tokens during transcription.
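The parameters above could be assembled into a connection URL roughly as follows. This is a sketch under stated assumptions: the query parameter names (`response_format`, `language`, `prompt`, `temperature`) are guesses based on the descriptions in this section, and `BASE_URL` is a placeholder to be replaced with the serverless endpoint.

```python
from urllib.parse import urlencode

# Placeholder only: substitute the actual serverless endpoint.
BASE_URL = "wss://example.invalid/streaming-transcription"


def build_url(base_url, language="en", prompt=None, temperature=None):
    """Build the WebSocket URL with transcription parameters (assumed names)."""
    params = {"response_format": "verbose_json", "language": language}
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = str(temperature)
    return f"{base_url}?{urlencode(params)}"


def build_headers(api_key):
    """Return the Authorization header carrying the API key."""
    return {"Authorization": api_key}
```

The resulting URL and headers can then be passed to whichever WebSocket client library you use.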
This field is for the client to send audio chunks to the server. Stream short audio chunks (50-400 ms) as binary frames of 16-bit little-endian PCM at a 16 kHz sample rate, single channel (mono).
This field is for the client event that initiates context cleanup.
A unique identifier for the event.
A constant string that identifies the type of event as “stt.state.clear”.
The ID of the context or session to be cleared.
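A clear event with these three fields might be serialized as in the sketch below. The JSON key names `event_id`, `object`, and `reset_id` are assumptions inferred from the field descriptions; only the constant "stt.state.clear" is taken from this section.

```python
import json
import uuid


def make_clear_event(reset_id=None):
    """Return a JSON text message asking the server to clear a context."""
    event = {
        "event_id": str(uuid.uuid4()),             # unique identifier for the event
        "object": "stt.state.clear",               # constant event type
        "reset_id": reset_id or str(uuid.uuid4()),  # context or session to clear
    }
    return json.dumps(event)
```

Unlike the audio chunks, which travel as binary frames, this message would be sent as a text frame.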
The task that was performed: either transcribe or translate.
The language of the transcribed/translated text.
The transcribed/translated text.
Extracted words and their corresponding timestamps.
Segments of the transcribed/translated text and their corresponding details.
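A transcription message with the fields listed above could be read as in the following sketch. The key names `task`, `language`, `text`, `words`, and `segments` are assumptions inferred from the field descriptions, and `summarize_response` is a hypothetical helper.

```python
import json


def summarize_response(raw: str) -> dict:
    """Extract the main fields from one transcription message."""
    msg = json.loads(raw)
    return {
        "task": msg.get("task"),
        "language": msg.get("language"),
        "text": msg.get("text", ""),
        # Missing or null lists are treated as empty.
        "word_count": len(msg.get("words") or []),
        "segment_count": len(msg.get("segments") or []),
    }
```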
This field is for the server to confirm that it successfully cleared the context.
A unique identifier for the event.
A constant string indicating that the event type is “stt.state.cleared”.
The ID of the context or session that has been successfully cleared.
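On the client side, the acknowledgement can be matched against the ID that was sent in the clear request. The sketch below assumes the JSON keys are named `object` and `reset_id`, which is a guess based on the field descriptions; `is_clear_ack` is a hypothetical helper.

```python
import json


def is_clear_ack(raw: str, reset_id: str) -> bool:
    """Return True if `raw` is a cleared-event for the given reset ID."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(msg, dict)
        and msg.get("object") == "stt.state.cleared"
        and msg.get("reset_id") == reset_id
    )
```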
Stream short audio chunks (50-400 ms) as binary frames of 16-bit little-endian PCM at a 16 kHz sample rate, single channel (mono). Typically, you will:
The client maintains a state dictionary, starting with an empty dictionary {}. When the server sends the first transcription message, it contains a list of segments. Each segment has an id and text:
When the server sends subsequent updates to the transcription, the client updates the state dictionary based on the segment id:
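The state-dictionary bookkeeping described above can be sketched as follows. The message shape (a `segments` list whose items carry `id` and `text`) follows the description in this section; the helper names `update_state` and `render` are hypothetical.

```python
def update_state(state: dict, message: dict) -> dict:
    """Insert new segments and overwrite revised ones, keyed by segment id."""
    for segment in message.get("segments", []):
        state[segment["id"]] = segment["text"]
    return state


def render(state: dict) -> str:
    """Join segment texts in the order their ids were first seen."""
    # Python dicts keep insertion order, and overwriting a key keeps its
    # position, so revised segments stay in place.
    return " ".join(state.values())


state = {}
update_state(state, {"segments": [{"id": "0", "text": "Hello"},
                                  {"id": "1", "text": "wor"}]})
update_state(state, {"segments": [{"id": "1", "text": "world"},
                                  {"id": "2", "text": "again"}]})
print(render(state))  # prints "Hello world again"
```

Keying on the segment id lets a later message revise an earlier partial segment without duplicating it in the rendered transcript.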
Check out a brief Python example below or example sources:
For fixed throughput and predictable SLAs, you may request a dedicated endpoint for streaming transcription by contacting inquiries@fireworks.ai or via Discord.
The following languages are supported for transcription:
Language Code | Language Name |
---|---|
en | English |
zh | Chinese |
de | German |
es | Spanish |
ru | Russian |
ko | Korean |
fr | French |
ja | Japanese |
pt | Portuguese |
tr | Turkish |
pl | Polish |
ca | Catalan |
nl | Dutch |
ar | Arabic |
sv | Swedish |
it | Italian |
id | Indonesian |
hi | Hindi |
fi | Finnish |
vi | Vietnamese |
he | Hebrew |
uk | Ukrainian |
el | Greek |
ms | Malay |
cs | Czech |
ro | Romanian |
da | Danish |
hu | Hungarian |
ta | Tamil |
no | Norwegian |
th | Thai |
ur | Urdu |
hr | Croatian |
bg | Bulgarian |
lt | Lithuanian |
la | Latin |
mi | Maori |
ml | Malayalam |
cy | Welsh |
sk | Slovak |
te | Telugu |
fa | Persian |
lv | Latvian |
bn | Bengali |
sr | Serbian |
az | Azerbaijani |
sl | Slovenian |
kn | Kannada |
et | Estonian |
mk | Macedonian |
br | Breton |
eu | Basque |
is | Icelandic |
hy | Armenian |
ne | Nepali |
mn | Mongolian |
bs | Bosnian |
kk | Kazakh |
sq | Albanian |
sw | Swahili |
gl | Galician |
mr | Marathi |
pa | Punjabi |
si | Sinhala |
km | Khmer |
sn | Shona |
yo | Yoruba |
so | Somali |
af | Afrikaans |
oc | Occitan |
ka | Georgian |
be | Belarusian |
tg | Tajik |
sd | Sindhi |
gu | Gujarati |
am | Amharic |
yi | Yiddish |
lo | Lao |
uz | Uzbek |
fo | Faroese |
ht | Haitian Creole |
ps | Pashto |
tk | Turkmen |
nn | Nynorsk |
mt | Maltese |
sa | Sanskrit |
lb | Luxembourgish |
my | Myanmar |
bo | Tibetan |
tl | Tagalog |
mg | Malagasy |
as | Assamese |
tt | Tatar |
haw | Hawaiian |
ln | Lingala |
ha | Hausa |
ba | Bashkir |
jw | Javanese |
su | Sundanese |
yue | Cantonese |