Querying transcription models

Fireworks AI provides three ASR (Automatic Speech Recognition) features: Streaming Transcription, Pre-recorded Transcription, and Pre-recorded Translation. This guide shows you how to get started with each feature.

Streaming Transcription

Convert audio to text in real time over a WebSocket connection. Well suited to voice agents and live applications.

Quick Start

Use our optimized streaming model, fireworks-asr-large, for the best real-time performance.

For a working example and more detailed information, see the full streaming API documentation and the source code.
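Streaming clients typically send audio over the socket in small, fixed-duration pieces rather than as one large payload. The sketch below shows only that chunking step and makes no network calls. The helper name `pcm_chunks`, the 50 ms chunk length, and the 16 kHz mono 16-bit PCM format are assumptions made for illustration; check the streaming API documentation for the format the service actually expects.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


def pcm_chunks(pcm: bytes, chunk_ms: int = 50) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each covering chunk_ms of audio.

    The final slice may be shorter if the input is not a whole number of chunks.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


# One second of silence, split into 50 ms chunks.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(pcm_chunks(one_second))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk would then be sent as one binary WebSocket message by whichever client library you use.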

Pre-recorded Transcription

Convert audio files to text. Supports files up to 1GB in formats like MP3, FLAC, and WAV. Transcribe multiple hours of audio in minutes.
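Before uploading, it can help to check a local file against the limits stated above. The following is a minimal sketch, not part of any Fireworks SDK: the helper name `upload_warnings` is hypothetical, "1GB" is read as 10**9 bytes (an assumption), and the extension check covers only the three formats named in this guide, so other formats may also be accepted.

```python
from pathlib import Path
from typing import List

MAX_UPLOAD_BYTES = 1_000_000_000           # assumption: "1GB" read as 10**9 bytes
NAMED_FORMATS = {".mp3", ".flac", ".wav"}  # formats named in this guide; not exhaustive


def upload_warnings(path: str) -> List[str]:
    """Return human-readable warnings for a local audio file; empty if none apply."""
    file = Path(path)
    warnings = []
    size = file.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        warnings.append(
            f"{file.name} is {size} bytes, over the assumed {MAX_UPLOAD_BYTES}-byte limit"
        )
    if file.suffix.lower() not in NAMED_FORMATS:
        warnings.append(
            f"'{file.suffix or 'no extension'}' is not one of the formats named in this guide"
        )
    return warnings
```

An empty list means the file passed both checks; a non-empty list is advisory only, since the service remains the authority on what it accepts.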

Quick Start

For a working example of pre-recorded transcription, see the Python notebook.

Available Models:

whisper-v3: Highest accuracy
- model=whisper-v3
- base_url=https://audio-prod.us-virginia-1.direct.fireworks.ai
whisper-v3-turbo: Faster processing
- model=whisper-v3-turbo
- base_url=https://audio-turbo.us-virginia-1.direct.fireworks.ai

For more detailed information, see the full transcription API documentation
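Because each model is served from its own base URL, the two settings need to be changed together. One way to keep them in sync is a small lookup, sketched below. The helper name `client_kwargs` is hypothetical and not part of the SDK; the model names and URLs are the ones listed above, and the keyword names match the `AudioInference` constructor call used in the translation quick start in this guide.

```python
import os
from typing import Dict, Optional

# Model name -> base URL, as listed under Available Models.
AUDIO_ENDPOINTS = {
    "whisper-v3": "https://audio-prod.us-virginia-1.direct.fireworks.ai",
    "whisper-v3-turbo": "https://audio-turbo.us-virginia-1.direct.fireworks.ai",
}


def client_kwargs(model: str, api_key: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Build matching model/base_url/api_key settings for a pre-recorded audio client."""
    if model not in AUDIO_ENDPOINTS:
        raise ValueError(f"Unknown model {model!r}; expected one of {sorted(AUDIO_ENDPOINTS)}")
    return {
        "model": model,
        "base_url": AUDIO_ENDPOINTS[model],
        # Fall back to the environment variable when no key is passed explicitly.
        "api_key": api_key or os.getenv("FIREWORKS_API_KEY"),
    }


print(client_kwargs("whisper-v3-turbo", api_key="YOUR_API_KEY")["base_url"])
```

The resulting dictionary can be unpacked into the client constructor, for example `AudioInference(**client_kwargs("whisper-v3"))`.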

Pre-recorded Translation

Translate audio from any of our supported languages to English. Supports files up to 1GB in formats like MP3, FLAC, and WAV.

Quick Start

!pip install fireworks-ai requests python-dotenv

import os
import time

import requests
from dotenv import load_dotenv
from fireworks.client.audio import AudioInference

# Load FIREWORKS_API_KEY from a local .env file, if one exists
load_dotenv()

# Download a sample audio file
audio = requests.get("https://tinyurl.com/3cy7x44v").content

# Prepare client
client = AudioInference(
    model="whisper-v3",
    base_url="https://audio-prod.us-virginia-1.direct.fireworks.ai",
    #
    # Or for the turbo version
    # model="whisper-v3-turbo",
    # base_url="https://audio-turbo.us-virginia-1.direct.fireworks.ai",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Make request. Top-level await works in a notebook; in a plain script,
# wrap these lines in an async function and run it with asyncio.run().
start = time.time()
r = await client.translate_async(audio=audio)
print(f"Took: {(time.time() - start):.3f}s. Text: '{r.text}'")

For more detailed information, see the full translation API documentation

Supported Languages

We support 95+ languages including English, Spanish, French, German, Chinese, Japanese, Russian, Portuguese, and many more. See the complete language list.

Common Use Cases

Call Center / Customer Service: Transcribe or translate customer calls
Note Taking: Transcribe audio for automated note taking
Voice Agents: Use streaming transcription to build voice assistants. For a seamless voice-agent experience, check out our Voice Agent Platform

Next Steps

Explore advanced features like speaker diarization and custom prompts
Check out our Voice Agent Platform
Contact us at inquiries@fireworks.ai for dedicated endpoints and enterprise features
