Audio (Speech-to-Text)
AI Foundation Services provides Whisper-based audio models for transcription and translation, compatible with the OpenAI Audio API.
What you’ll learn:
- How to transcribe audio to text in the original language
- How to translate audio from any language to English
- Available audio models and parameters
List Audio Models
Audio models have `model_type: "STT"` in their metadata. Use the models endpoint and filter:
```python
from openai import OpenAI

client = OpenAI()

models = client.models.list()
for model in models.data:
    if model.meta_data.get("model_type") == "STT":
        print(model.id)
```

```shell
curl "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
# Filter the results for models with model_type "STT"
```

Available audio models: `whisper-large-v3`, `whisper-large-v3-turbo`

Audio Transcription
The transcription API converts audio into text in the same language as the input. It auto-detects the language from the first 30 seconds if `language` is not specified.
```python
from openai import OpenAI

client = OpenAI()

with open("/path/to/audio_file.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        # language="en",  # Optional: specify the language
    )

print(f"Transcription: {transcription.text}")
```

```shell
curl -X POST "$OPENAI_BASE_URL/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "model=whisper-large-v3" \
  -F "language=en" \
  -F "temperature=0.0" \
  -F "file=@/path/to/audio_file.mp3"
```

Example output:
```
Transcription: The stale smell of old beer lingers. It takes heat to bring out the odor. A cold dip restores health and zest. A salt pickle tastes fine with ham.
```

Audio Translation
The translation API translates audio from any language into English.
```python
from openai import OpenAI

client = OpenAI()

with open("/path/to/audio_file.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        temperature=1.0,
    )

print(f"Translation: {translation.text}")
```

```shell
curl -X POST "$OPENAI_BASE_URL/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "model=whisper-large-v3" \
  -F "temperature=1.0" \
  -F "file=@/path/to/audio_file.mp3"
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Audio model ID (e.g., `whisper-large-v3`) |
| `file` | file | The audio file to process |
| `language` | string | Optional. ISO language code. Auto-detected if omitted. |
| `temperature` | float | `0.0` for deterministic output; higher values for more varied output |
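The parameters above can be assembled and validated before making a request; a minimal sketch (the `transcription_params` helper name and the `[0.0, 1.0]` temperature bound are assumptions for illustration, not part of the API):

```python
def transcription_params(model, language=None, temperature=0.0):
    """Assemble keyword arguments for an audio transcription request.

    Checks values against the parameter table: temperature is kept in
    [0.0, 1.0], and language is included only when explicitly given,
    so the service can auto-detect it otherwise.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    params = {"model": model, "temperature": temperature}
    if language is not None:
        params["language"] = language  # omit to let the model auto-detect
    return params
```

These keyword arguments can then be passed alongside the file, e.g. `client.audio.transcriptions.create(file=audio_file, **transcription_params("whisper-large-v3", language="en"))`.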
Key Features
- Auto Language Detection — Identifies the input language from the first 30 seconds
- Customizable Output — Adjust behavior with the `language` and `temperature` parameters
- Efficient Processing — Low latency for both transcription and translation
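Since language detection looks only at the first 30 seconds, a quick language probe need not upload a whole recording. A minimal sketch of trimming raw audio to that window, assuming 16 kHz, 16-bit mono PCM (`trim_for_detection` is an illustrative helper, not part of the API; a real upload would still need a proper container such as WAV or MP3):

```python
def trim_for_detection(pcm: bytes, sample_rate: int = 16000,
                       bytes_per_sample: int = 2, seconds: int = 30) -> bytes:
    """Return at most the first `seconds` of raw mono PCM audio.

    30 s at 16 kHz with 2 bytes per sample is 960,000 bytes; shorter
    inputs are returned unchanged.
    """
    return pcm[: sample_rate * bytes_per_sample * seconds]
```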
Next Steps
- Chat Completions — Process transcribed text with LLMs
- Asynchronous Requests — Queue long audio files for async processing
- API Endpoints — Full endpoint reference