POST /audio/translations

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

multipart/form-data
file
file
required

The input audio file to transcribe. Common file formats such as mp3, flac, and wav are supported. Before transcription, the audio is resampled to 16 kHz, downmixed to mono, and converted to 16-bit signed little-endian format. Pre-converting the file before sending it to the API can improve runtime performance.
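The pre-conversion described above can be done locally before upload. The following is a minimal sketch, assuming ffmpeg is installed and on the PATH; the helper names and file paths are illustrative and not part of the API. One helper builds the ffmpeg command, and another uses Python's standard wave module to check whether a WAV file already matches the target format.

```python
import subprocess
import wave

# Target format taken from the parameter description above.
TARGET_RATE_HZ = 16_000   # 16 kHz sample rate
TARGET_CHANNELS = 1       # mono
TARGET_SAMPLE_BYTES = 2   # 16-bit samples


def ffmpeg_convert_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit little-endian PCM."""
    return [
        "ffmpeg", "-y",               # -y: overwrite dst if it already exists
        "-i", src,
        "-ar", str(TARGET_RATE_HZ),   # output sample rate
        "-ac", str(TARGET_CHANNELS),  # output channel count
        "-c:a", "pcm_s16le",          # signed 16-bit little-endian PCM
        dst,
    ]


def is_preconverted(wav_path: str) -> bool:
    """Return True if a PCM WAV file already has the target rate, channels, and width."""
    with wave.open(wav_path, "rb") as wav:
        return (
            wav.getframerate() == TARGET_RATE_HZ
            and wav.getnchannels() == TARGET_CHANNELS
            and wav.getsampwidth() == TARGET_SAMPLE_BYTES
        )


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_convert_command(src, dst), check=True)
```

For example, `convert("talk.mp3", "talk_16k.wav")` would produce a file for which `is_preconverted("talk_16k.wav")` returns True (both file names are hypothetical).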

model
string
default: whisper-v3

String name of the ASR model to use. Currently whisper-v3 is supported.

vad_model
string
default: silero

String name of the voice activity detection (VAD) model to use. Can be one of silero or whisperx-pyannet.

alignment_model
string
default: tdnn_ffn

String name of the alignment model to use. Can be one of tdnn_ffn, mms_fa, or gentle.

prompt
string | null

The input prompt with which to prime transcription. This can be used, for example, to continue a prior transcription given new audio data.

temperature
number
default: 0

Sampling temperature to use when decoding text tokens during transcription.

response_format
string
default: json

The format in which to return the response. Can be one of json, text, srt, verbose_json, or vtt.

preprocessing
string
default: none

Audio preprocessing mode. Currently supported:

  • none to skip audio preprocessing.
  • dynamic for arbitrary audio content with variable loudness.
  • soft_dynamic for speech-intensive recordings such as podcasts and voice-overs.
  • bass_dynamic for boosting lower frequencies.
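The body parameters above can be assembled into a multipart/form-data request as follows. This is a sketch only: the defaults and allowed values are copied from this page, but the function names and the `base_url` argument are assumptions (this page does not state the API host), and the third-party requests library is assumed to be installed for the actual upload.

```python
from typing import Optional

# Allowed values and defaults as documented on this page.
ALLOWED = {
    "vad_model": {"silero", "whisperx-pyannet"},
    "alignment_model": {"tdnn_ffn", "mms_fa", "gentle"},
    "response_format": {"json", "text", "srt", "verbose_json", "vtt"},
    "preprocessing": {"none", "dynamic", "soft_dynamic", "bass_dynamic"},
}
DEFAULTS = {
    "model": "whisper-v3",
    "vad_model": "silero",
    "alignment_model": "tdnn_ffn",
    "temperature": "0",
    "response_format": "json",
    "preprocessing": "none",
}


def build_form_fields(prompt: Optional[str] = None, **overrides) -> dict[str, str]:
    """Merge overrides into the documented defaults and validate enumerated fields."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    fields = {**DEFAULTS, **{name: str(value) for name, value in overrides.items()}}
    for name, allowed in ALLOWED.items():
        if fields[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if prompt is not None:  # prompt is optional (string | null), so omit it when unset
        fields["prompt"] = prompt
    return fields


def translate(base_url: str, token: str, audio_path: str, **options):
    """POST the audio file; base_url is a placeholder for the real API host."""
    import requests  # third-party dependency, imported lazily

    with open(audio_path, "rb") as audio:
        return requests.post(
            f"{base_url}/audio/translations",
            headers={"Authorization": f"Bearer {token}"},
            files={"file": audio},             # sent as the required "file" part
            data=build_form_fields(**options),  # remaining form fields
            timeout=300,
        )
```

Validating enumerated fields on the client is a design choice that surfaces typos such as `response_format="str"` before any audio is uploaded; the server remains the authority on which values are accepted.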

Response

200 - application/json
text
string
required
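
With the default json response format, a 200 response body is a JSON object whose required `text` field is a string. A minimal sketch of reading it defensively follows; the function name and the sample payload are illustrative assumptions, not output captured from the API.

```python
import json


def extract_text(body: str) -> str:
    """Parse a JSON response body and return its required 'text' field."""
    payload = json.loads(body)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        raise ValueError("response is missing the required string field 'text'")
    return text


# Illustrative payload only; real responses may carry additional fields.
sample_body = '{"text": "Hello, world."}'
translated = extract_text(sample_body)
```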