Fireworks AI provides three ASR (Automatic Speech Recognition) features: Streaming Transcription, Pre-recorded Transcription, and Pre-recorded Translation. This guide shows you how to get started with each feature.

Streaming Transcription

Convert audio to text in real time over a WebSocket connection. Well suited to voice agents and other live applications.

Quick Start

Use our optimized streaming model fireworks-asr-large for the best real-time performance.

For a working example of streaming transcription, see the following resources:

  1. Python notebook
  2. Python cookbook

For more detailed information, see the full streaming API documentation and the source code.
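This page does not describe the streaming wire format. Streaming clients commonly send raw audio in small, fixed-duration chunks, so the sketch below shows only that chunking step. It assumes 16 kHz, 16-bit, mono PCM input (an assumption; confirm the expected format in the streaming API documentation), and `pcm_chunks` is a hypothetical helper, not part of any Fireworks SDK. The WebSocket endpoint and message protocol are deliberately left out.

```python
from typing import Iterator

# Assumed audio format (not stated on this page): 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1


def pcm_chunks(pcm: bytes, chunk_ms: int = 50) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each covering chunk_ms of audio.

    The final chunk may be shorter if the audio does not divide evenly.
    """
    bytes_per_chunk = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for the assumed sample rate")
    for offset in range(0, len(pcm), bytes_per_chunk):
        yield pcm[offset:offset + bytes_per_chunk]


# With the assumed format, one second of audio is 32,000 bytes,
# which splits into 20 chunks of 1,600 bytes at chunk_ms=50.
```

In a live client, each chunk would be sent over the WebSocket connection as soon as it is captured; when replaying a file, pacing sends roughly every `chunk_ms` milliseconds approximates a live microphone.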

Pre-recorded Transcription

Convert audio files to text. Supports files up to 1GB in formats like MP3, FLAC, and WAV. Transcribe multiple hours of audio in minutes.
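Before uploading, it can help to check a local file against the limits stated above. The sketch below is a hypothetical pre-flight check (`preflight`, `MAX_BYTES`, and `KNOWN_FORMATS` are illustrative names). It assumes "1GB" means 10^9 bytes (a conservative reading) and only knows the three formats named here, so a warning about another format does not necessarily mean the service rejects it.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: "1GB" is read conservatively as 10**9 bytes; a limit of 1 GiB
# (2**30 bytes) would be slightly larger.
MAX_BYTES = 10**9
# Formats named on this page; the service may accept others.
KNOWN_FORMATS = {".mp3", ".flac", ".wav"}


def preflight(path: str | Path) -> list[str]:
    """Return warnings for a local audio file; an empty list means no issues found."""
    p = Path(path)
    warnings: list[str] = []
    if p.suffix.lower() not in KNOWN_FORMATS:
        suffix = p.suffix or "(no extension)"
        warnings.append(f"{p.name}: {suffix} is not one of {sorted(KNOWN_FORMATS)}")
    size = p.stat().st_size
    if size > MAX_BYTES:
        warnings.append(f"{p.name}: {size} bytes exceeds the {MAX_BYTES}-byte limit")
    return warnings


# Hypothetical usage:
# for w in preflight("meeting.wav"):
#     print("warning:", w)
```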

Quick Start

For a working example of pre-recorded transcription, see the Python notebook.

Available Models:

  • whisper-v3: Highest accuracy
    • model=whisper-v3
    • base_url=https://audio-prod.us-virginia-1.direct.fireworks.ai
  • whisper-v3-turbo: Faster processing
    • model=whisper-v3-turbo
    • base_url=https://audio-turbo.us-virginia-1.direct.fireworks.ai
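Each model listed above is served from its own base URL, so switching models means changing both values together. A small lookup table keeps them in sync. This is a hypothetical helper (`ASR_ENDPOINTS` and `client_kwargs` are illustrative names), and the commented usage assumes the `AudioInference` client from the translation Quick Start also offers a `transcribe_async` method mirroring `translate_async`; check the transcription API documentation to confirm.

```python
from typing import Dict, Optional

# Hypothetical helper: model -> base_url pairs copied from the list above.
ASR_ENDPOINTS = {
    "whisper-v3": "https://audio-prod.us-virginia-1.direct.fireworks.ai",  # highest accuracy
    "whisper-v3-turbo": "https://audio-turbo.us-virginia-1.direct.fireworks.ai",  # faster processing
}


def client_kwargs(model: str, api_key: Optional[str]) -> Dict[str, Optional[str]]:
    """Return model, base_url and api_key, with base_url looked up from model."""
    if model not in ASR_ENDPOINTS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(ASR_ENDPOINTS)}")
    return {"model": model, "base_url": ASR_ENDPOINTS[model], "api_key": api_key}


# Hypothetical usage (assumes AudioInference provides transcribe_async):
# client = AudioInference(**client_kwargs("whisper-v3-turbo", os.getenv("FIREWORKS_API_KEY")))
# r = await client.transcribe_async(audio=audio)
```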

For more detailed information, see the full transcription API documentation.

Pre-recorded Translation

Translate audio from any of our supported languages to English. Supports files up to 1GB in formats like MP3, FLAC, and WAV.

Quick Start

!pip install fireworks-ai requests python-dotenv

from fireworks.client.audio import AudioInference
import requests
import time
from dotenv import load_dotenv
import os

# Load FIREWORKS_API_KEY from a local .env file, if one exists
load_dotenv()

# Download a sample audio file
resp = requests.get("https://tinyurl.com/3cy7x44v")
resp.raise_for_status()
audio = resp.content

# Prepare client
client = AudioInference(
    model="whisper-v3",
    base_url="https://audio-prod.us-virginia-1.direct.fireworks.ai",
    #
    # Or for the turbo version
    # model="whisper-v3-turbo",
    # base_url="https://audio-turbo.us-virginia-1.direct.fireworks.ai",
    api_key=os.getenv("FIREWORKS_API_KEY"),
)

# Make request. Top-level `await` works in a Jupyter notebook; in a plain
# Python script, put these lines in an async function and run it with asyncio.run().
start = time.time()
r = await client.translate_async(audio=audio)
print(f"Took: {(time.time() - start):.3f}s. Text: '{r.text}'")

For more detailed information, see the full translation API documentation.
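The Quick Start translates a single clip. For several clips, such as a batch of recorded customer calls, one option is to issue requests concurrently with `asyncio` while capping how many are in flight. This is a minimal sketch: `run_bounded` and the default limit of 4 are illustrative choices, not Fireworks recommendations, and any service-side rate limits are not handled here.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    fn: Callable[[bytes], Awaitable[str]],
    clips: Sequence[bytes],
    max_concurrency: int = 4,
) -> List[str]:
    """Apply the async function fn to every clip, with at most
    max_concurrency calls in flight. Results keep the order of clips."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(clip: bytes) -> str:
        async with semaphore:
            return await fn(clip)

    return list(await asyncio.gather(*(one(c) for c in clips)))


# Hypothetical usage with the client from the Quick Start:
# async def translate_text(clip: bytes) -> str:
#     r = await client.translate_async(audio=clip)
#     return r.text
#
# texts = await run_bounded(translate_text, [audio_a, audio_b, audio_c])
```

Passing the request as a function keeps the helper independent of the client, so the same code works for transcription as well as translation.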

Supported Languages

We support 95+ languages including English, Spanish, French, German, Chinese, Japanese, Russian, Portuguese, and many more. See the complete language list.

Common Use Cases

  • Call Center / Customer Service: Transcribe or translate customer calls
  • Note Taking: Transcribe audio for automated note taking
  • Voice Agents: Use streaming transcription to build voice assistants. For a seamless voice-agent experience, check out our Voice Agent Platform

Next Steps

  1. Explore advanced features like speaker diarization and custom prompts
  2. Check out our Voice Agent Platform
  3. Contact us at inquiries@fireworks.ai for dedicated endpoints and enterprise features