Audio API
Align transcription
POST /audio/alignments
The default api.fireworks.ai endpoint is for evaluation use only. To unlock the best performance, get a dedicated endpoint by contacting us at inquiries@fireworks.ai.
Request
(multi-part form)
file
file | string
required
The input audio file to align with text. Common file formats such as mp3, flac, and wav are supported. Note that the audio will be resampled to 16kHz, downmixed to mono, and reformatted to 16-bit signed little-endian format before transcription. Pre-converting the file before sending it to the API can improve runtime performance.
text
string
required
The text to align with the audio.
vad_model
string
default: "silero"
String name of the voice activity detection (VAD) model to use. Can be one of silero or whisperx-pyannet.
alignment_model
string
default: "tdnn_ffn"
String name of the alignment model to use. Can be one of tdnn_ffn, mms_fa, or gentle.
response_format
string
default: "json"
The format in which to return the response. Can be one of srt, verbose_json, or vtt.
preprocessing
string
Audio preprocessing mode. Currently supported:
none to skip audio preprocessing.
dynamic for arbitrary audio content with variable loudness.
soft_dynamic for speech-intensive recordings such as podcasts and voice-overs.
bass_dynamic for boosting lower frequencies.
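As a rough illustration of how the request parameters above fit together, the following Python sketch assembles the multi-part form and posts it to /audio/alignments. Several details are assumptions rather than documented facts: the helper names build_alignment_form and align are hypothetical, base_url must be supplied by the caller because the full URL is not stated here, Bearer-token authorization is assumed, and the third-party requests library is used for the upload.

```python
from typing import Dict, Optional

# Allowed values, copied from the parameter descriptions above.
VAD_MODELS = {"silero", "whisperx-pyannet"}
ALIGNMENT_MODELS = {"tdnn_ffn", "mms_fa", "gentle"}
RESPONSE_FORMATS = {"json", "srt", "verbose_json", "vtt"}
PREPROCESSING_MODES = {"none", "dynamic", "soft_dynamic", "bass_dynamic"}


def build_alignment_form(
    text: str,
    vad_model: str = "silero",
    alignment_model: str = "tdnn_ffn",
    response_format: str = "json",
    preprocessing: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the non-file form fields and return them as a dict.

    Hypothetical helper: it only mirrors the documented names and defaults.
    """
    if not text:
        raise ValueError("text is required")
    if vad_model not in VAD_MODELS:
        raise ValueError(f"unsupported vad_model: {vad_model!r}")
    if alignment_model not in ALIGNMENT_MODELS:
        raise ValueError(f"unsupported alignment_model: {alignment_model!r}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")

    form = {
        "text": text,
        "vad_model": vad_model,
        "alignment_model": alignment_model,
        "response_format": response_format,
    }
    # preprocessing has no documented default, so it is sent only when set.
    if preprocessing is not None:
        if preprocessing not in PREPROCESSING_MODES:
            raise ValueError(f"unsupported preprocessing: {preprocessing!r}")
        form["preprocessing"] = preprocessing
    return form


def align(audio_path: str, text: str, api_key: str, base_url: str, **options):
    """Upload the audio file and form fields as a multi-part POST.

    Assumptions: base_url is the API root of your endpoint, and the API
    accepts an "Authorization: Bearer <key>" header.
    """
    import requests  # third-party; imported here so the helper above stays stdlib-only

    with open(audio_path, "rb") as audio:
        response = requests.post(
            f"{base_url.rstrip('/')}/audio/alignments",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": audio},
            data=build_alignment_form(text, **options),
            timeout=120,
        )
    response.raise_for_status()
    return response
```

Validating the enumerated values client-side is a design choice for catching typos before an upload; the server remains the authority on which values it accepts.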
Response
text
string
required
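A minimal sketch of reading the response, assuming response_format is "json" so the body is a JSON object carrying the required text field. The function name extract_text and the sample body are made up for illustration, and any fields beyond text are not covered here.

```python
import json


def extract_text(body: str) -> str:
    """Return the required `text` field from a JSON response body."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    text = payload.get("text")
    if not isinstance(text, str):
        raise ValueError("response is missing the required string field 'text'")
    return text


# Made-up sample body, not real API output.
sample_body = '{"text": "hello world"}'
print(extract_text(sample_body))  # prints: hello world
```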