1. Open a WebSocket

Streaming transcription is performed over a WebSocket. Provide the transcription parameters and establish a WebSocket connection to the endpoint.

2. Stream audio and receive transcriptions

Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). In parallel, receive transcription from the WebSocket.

URL

Please use the following serverless endpoint:

wss://audio-streaming.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions/streaming

Headers

Authorization
string
required

Your Fireworks API key, e.g. Authorization=API_KEY.

Query Parameters

response_format
string
default:
"verbose_json"

The format in which to return the response. Currently only verbose_json is recommended for streaming.

language
string | null

The target language for transcription. The set of supported target languages can be found here.

prompt
string | null

The input prompt that the model will use when generating the transcription. Can be used to specify custom words or the style of the transcription. For example, the prompt "Um, here's, uh, what was recorded." encourages the model to include filler words in the transcription.

temperature
float
default:
"0"

Sampling temperature to use when decoding text tokens during transcription.
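
As a rough illustration of how the endpoint, header, and query parameters above fit together, the sketch below builds the connection URL with the standard library. The helper name `build_streaming_url` is hypothetical and not part of any Fireworks SDK; unset optional parameters are omitted here so that the server defaults apply.

```python
from urllib.parse import urlencode

# Serverless endpoint from the URL section above.
BASE_URL = "wss://audio-streaming.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions/streaming"

def build_streaming_url(language=None, prompt=None, temperature=None,
                        response_format="verbose_json"):
    """Hypothetical helper: return the endpoint URL with the given query parameters."""
    params = {"response_format": response_format}
    if language is not None:
        params["language"] = language
    if prompt is not None:
        params["prompt"] = prompt
    if temperature is not None:
        params["temperature"] = temperature
    # urlencode percent-encodes values, so prompts with spaces or punctuation are safe.
    return f"{BASE_URL}?{urlencode(params)}"

url = build_streaming_url(language="en", temperature=0)
# The API key goes in the Authorization header, not in the URL.
headers = {"Authorization": "<FIREWORKS_API_KEY>"}
```

The resulting `url` and `headers` can be passed to whichever WebSocket client you use.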

Streaming Audio

Stream short audio chunks (50-400ms) in binary frames of PCM 16-bit little-endian at 16kHz sample rate and single channel (mono). Typically, you will:

  1. Resample your audio to 16 kHz if it is not already.
  2. Convert it to mono.
  3. Send 50ms chunks (16,000 Hz * 0.05s = 800 samples) of audio in 16-bit PCM (signed, little-endian) format.
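
The chunking arithmetic in the steps above can be sketched with the standard library alone. This example assumes the audio is already mono at 16 kHz and held as floats in [-1.0, 1.0]; the helper names `float_to_pcm16le` and `split_chunks` are illustrative, and the sine tone stands in for real audio.

```python
import array
import math
import sys

SAMPLE_RATE = 16_000                                  # Hz, as required by the endpoint
CHUNK_MS = 50                                         # chunk duration in milliseconds
SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_MS // 1000    # 800 samples
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * 2               # 16-bit samples, 2 bytes each

def float_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit little-endian PCM bytes."""
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        ints.byteswap()  # array uses native byte order; force little-endian
    return ints.tobytes()

def split_chunks(pcm_bytes, chunk_bytes=BYTES_PER_CHUNK):
    """Split a PCM byte string into fixed-size chunks; the last one may be shorter."""
    return [pcm_bytes[i:i + chunk_bytes] for i in range(0, len(pcm_bytes), chunk_bytes)]

# One second of a 440 Hz tone as stand-in audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
chunks = split_chunks(float_to_pcm16le(tone))
```

Each element of `chunks` is one binary frame to send over the WebSocket.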

Handling Responses

The client maintains a state dictionary, starting with an empty dictionary {}. When the server sends the first transcription message, it contains a list of segments. Each segment has an id and text:

# Server initial message:
{
    "segments": [
        {"id": "0", "text": "This is the first sentence"},
        {"id": "1", "text": "This is the second sentence"}
    ]
}

# Client initial state:
{
    "0": "This is the first sentence",
    "1": "This is the second sentence",
}

When the server sends subsequent updates, the client merges them into the state dictionary by segment id: text for an existing id is overwritten, and new ids are added.

# Server continuous message:
{
    "segments": [
        {"id": "1", "text": "This is the second sentence modified"},
        {"id": "2", "text": "This is the third sentence"}
    ]
}

# Client updated state:
{
    "0": "This is the first sentence",
    "1": "This is the second sentence modified",   # overwritten
    "2": "This is the third sentence",             # new
}
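
The merge rule described above can be written as a small function. This is a minimal sketch; `apply_update` is an illustrative name, and the messages are the two examples from this section.

```python
def apply_update(state, message):
    """Merge a server message into the client state, keyed by segment id."""
    for segment in message.get("segments", []):
        # Existing ids are overwritten; unseen ids are added.
        state[segment["id"]] = segment["text"]
    return state

state = {}
apply_update(state, {
    "segments": [
        {"id": "0", "text": "This is the first sentence"},
        {"id": "1", "text": "This is the second sentence"},
    ]
})
apply_update(state, {
    "segments": [
        {"id": "1", "text": "This is the second sentence modified"},
        {"id": "2", "text": "This is the third sentence"},
    ]
})
```

After both calls, `state` matches the updated client state shown above.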

Example Usage

Check out a brief Python example below or example sources:

!pip3 install requests torch torchaudio websocket-client

import io
import time
import json
import torch
import requests
import torchaudio
import threading
import websocket
import urllib.parse

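# Prepare the audio. "audio.wav" is a placeholder path (an assumption, not part
# of the original example); substitute your own recording.
chunk_size_ms = 50
waveform, sample_rate = torchaudio.load("audio.wav")
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
# Scale to 16-bit PCM; tobytes() uses native byte order (little-endian on most machines).
pcm = (waveform.squeeze(0).clamp(-1, 1) * 32767).to(torch.int16).numpy().tobytes()
chunk_bytes = 16000 * chunk_size_ms // 1000 * 2  # 2 bytes per 16-bit sample
audio_chunk_bytes = [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]

lock = threading.Lock()  # guards state, which is updated from the WebSocket thread
state = {}               # segment id -> latest text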

def on_open(ws):
    def send_audio_chunks():
        for chunk in audio_chunk_bytes:
            ws.send(chunk, opcode=websocket.ABNF.OPCODE_BINARY)
            time.sleep(chunk_size_ms / 1000)

        final_checkpoint = json.dumps({"checkpoint_id": "final"})
        ws.send(final_checkpoint, opcode=websocket.ABNF.OPCODE_TEXT)

    threading.Thread(target=send_audio_chunks).start()

def on_message(ws, message):
    message = json.loads(message)
    if message.get("checkpoint_id") == "final":
        ws.close()
        return

    update = {s["id"]: s["text"] for s in message["segments"]}
    with lock:
        state.update(update)
        print("\n".join(f" - {k}: {v}" for k, v in state.items()))

def on_error(ws, error):
    print(f"WebSocket error: {error}")

# Build the connection URL with query parameters (note the secure wss:// scheme)
url = "wss://audio-streaming.us-virginia-1.direct.fireworks.ai/v1/audio/transcriptions/streaming"
params = urllib.parse.urlencode({
    "language": "en",
})
ws = websocket.WebSocketApp(
    f"{url}?{params}",
    header={"Authorization": "<FIREWORKS_API_KEY>"},
    on_open=on_open,
    on_message=on_message,
    on_error=on_error,
)
ws.run_forever()
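
The example above ends the session by sending a `{"checkpoint_id": "final"}` text frame and closing once the server echoes it back. One way to exercise that logic without a network connection is to keep it in plain functions, as in the sketch below. The names `handle_message` and `transcript` are hypothetical, and sorting by `int(id)` assumes segment ids are integer-like strings, as in the examples on this page.

```python
import json

def handle_message(raw, state):
    """Process one text frame. Return True once the final checkpoint is echoed back."""
    message = json.loads(raw)
    if message.get("checkpoint_id") == "final":
        return True  # caller should close the WebSocket
    for segment in message.get("segments", []):
        state[segment["id"]] = segment["text"]
    return False

def transcript(state):
    """Join segment texts in id order (assumes ids are integer-like strings)."""
    return " ".join(state[key] for key in sorted(state, key=int))

# Simulated frames, in the order a server might send them.
demo_state = {}
frames = [
    '{"segments": [{"id": "0", "text": "Hello"}]}',
    '{"segments": [{"id": "0", "text": "Hello there."}, {"id": "1", "text": "How are you?"}]}',
    '{"checkpoint_id": "final"}',
]
results = [handle_message(frame, demo_state) for frame in frames]
final_text = transcript(demo_state)
```

Inside `on_message`, the same function could be called under the lock, with `ws.close()` invoked when it returns True.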

Dedicated endpoint

For fixed throughput and predictable SLAs, you may request a dedicated endpoint for streaming transcription at inquiries@fireworks.ai or on Discord.