Transcribe audio

POST

audio

transcriptions


# Download audio file
curl -L -o "audio.flac" "https://tinyurl.com/4997djsh"

# Make request
curl -X POST "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions" \
-H "Authorization: <FIREWORKS_API_KEY>" \
-F "file=@audio.flac"

Try notebook

Send a sample audio to get a transcription.

Request

(multi-part form)

file

file | string

required

The input audio file to transcribe or an URL to the public audio file.

Max audio file size is 1 GB, there is no limit for audio duration. Common file formats such as mp3, flac, and wav are supported. Note that the audio will be resampled to 16kHz, downmixed to mono, and reformatted to 16-bit signed little-endian format before transcription. Pre-converting the file before sending it to the API can improve runtime performance.

model

string

default:"whisper-v3"

String name of the ASR model to use. Can be one of whisper-v3 or whisper-v3-turbo. Please use the following serverless endpoints:

https://audio-prod.us-virginia-1.direct.fireworks.ai (for whisper-v3);
https://audio-turbo.us-virginia-1.direct.fireworks.ai (for whisper-v3-turbo);

vad_model

string

default:"silero"

String name of the voice activity detection (VAD) model to use. Can be one of silero, or whisperx-pyannet.

alignment_model

string

default:"tdnn_ffn"

String name of the alignment model to use. Currently supported:

mms_fa optimal accuracy for multilingual speech.
tdnn_ffn optimal accuracy for English-only speech.
gentle best accuracy for English-only speech (requires a dedicated endpoint, contact us at inquiries@fireworks.ai).

language

string | null

The target language for transcription. See the Supported Languages section below for a complete list of available languages.

prompt

string | null

The input prompt that the model will use when generating the transcription. Can be used to specify custom words or specify the style of the transcription. E.g. Um, here's, uh, what was recorded. will make the model to include the filler words into the transcription.

temperature

float | list[float]

default:"0"

Sampling temperature to use when decoding text tokens during transcription. Alternatively, fallback decoding can be enabled by passing a list of temperatures like 0.0,0.2,0.4,0.6,0.8,1.0. This can help to improve performance.

response_format

string

default:"json"

The format in which to return the response. Can be one of json, text, srt, verbose_json, or vtt.

timestamp_granularities

string | list[string]

The timestamp granularities to populate for this transcription. response_format must be set verbose_json to use timestamp granularities. Either or both of these options are supported. Can be one of word, segment, or word,segment. If not present, defaults to segment.

diarize

string

Whether to get speaker diarization for the transcription. Can be one of true, or false. If not present, defaults to false.

Enabling diarization also requires other fields to hold specific values:

response_format must be set verbose_json.
timestamp_granularities must include word to use diarization.

min_speakers

int

The minimum number of speakers to detect for diarization. diarize must be set true to use min_speakers. If not present, defaults to 1.

max_speakers

int

The maximum number of speakers to detect for diarization. diarize must be set true to use max_speakers. If not present, defaults to inf.

preprocessing

string

Audio preprocessing mode. Currently supported:

none to skip audio preprocessing.
dynamic for arbitrary audio content with variable loudness.
soft_dynamic for speech intense recording such as podcasts and voice-overs.
bass_dynamic for boosting lower frequencies;

Response

text

string

required


# Download audio file
curl -L -o "audio.flac" "https://tinyurl.com/4997djsh"

# Make request
curl -X POST "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions" \
-H "Authorization: <FIREWORKS_API_KEY>" \
-F "file=@audio.flac"

Supported Languages

The following languages are supported for transcription:

Language Code & Name

Language Code	Language Name
en	English
zh	Chinese
de	German
es	Spanish
ru	Russian
ko	Korean
fr	French
ja	Japanese
pt	Portuguese
tr	Turkish
pl	Polish
ca	Catalan
nl	Dutch
ar	Arabic
sv	Swedish
it	Italian
id	Indonesian
hi	Hindi
fi	Finnish
vi	Vietnamese
he	Hebrew
uk	Ukrainian
el	Greek
ms	Malay
cs	Czech
ro	Romanian
da	Danish
hu	Hungarian
ta	Tamil
no	Norwegian
th	Thai
ur	Urdu
hr	Croatian
bg	Bulgarian
lt	Lithuanian
la	Latin
mi	Maori
ml	Malayalam
cy	Welsh
sk	Slovak
te	Telugu
fa	Persian
lv	Latvian
bn	Bengali
sr	Serbian
az	Azerbaijani
sl	Slovenian
kn	Kannada
et	Estonian
mk	Macedonian
br	Breton
eu	Basque
is	Icelandic
hy	Armenian
ne	Nepali
mn	Mongolian
bs	Bosnian
kk	Kazakh
sq	Albanian
sw	Swahili
gl	Galician
mr	Marathi
pa	Punjabi
si	Sinhala
km	Khmer
sn	Shona
yo	Yoruba
so	Somali
af	Afrikaans
oc	Occitan
ka	Georgian
be	Belarusian
tg	Tajik
sd	Sindhi
gu	Gujarati
am	Amharic
yi	Yiddish
lo	Lao
uz	Uzbek
fo	Faroese
ht	Haitian Creole
ps	Pashto
tk	Turkmen
nn	Nynorsk
mt	Maltese
sa	Sanskrit
lb	Luxembourgish
my	Myanmar
bo	Tibetan
tl	Tagalog
mg	Malagasy
as	Assamese
tt	Tatar
haw	Hawaiian
ln	Lingala
ha	Hausa
ba	Bashkir
jw	Javanese
su	Sundanese
yue	Cantonese

Streaming Transcription

Translate audio


# Download audio file
curl -L -o "audio.flac" "https://tinyurl.com/4997djsh"

# Make request
curl -X POST "https://audio-prod.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions" \
-H "Authorization: <FIREWORKS_API_KEY>" \
-F "file=@audio.flac"

LLM API

Response API

Embeddings API

Image API

Audio API

Audio batch API

Accounts

Deployments

Models

LoRAs

Supervised fine-tuning jobs

Reinforcement fine-tuning jobs

Datasets

Users

API Keys

Try notebook

Request

(multi-part form)

Response

Supported Languages

LLM API

Response API

Embeddings API

Image API

Audio API

Audio batch API

Accounts

Deployments

Models

LoRAs

Supervised fine-tuning jobs

Reinforcement fine-tuning jobs

Datasets

Users

API Keys

Try notebook

​Request

(multi-part form)

​Response

​Supported Languages

Request

Response

Supported Languages