Audio (Speech-to-Text)
AI Foundation Services provides Whisper-based audio models for transcription and translation, compatible with the OpenAI Audio API.
What you’ll learn:
- How to transcribe audio to text in the original language
- How to translate audio from any language to English
- Available audio models and parameters
List Audio Models
Audio models have `model_type: "STT"` in their metadata. Use the models endpoint and filter:
```python
from openai import OpenAI

client = OpenAI()

models = client.models.list()
for model in models.data:
    if model.meta_data.get("model_type") == "STT":
        print(model.id)
```

```shell
curl "$OPENAI_BASE_URL/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
# Filter the results for models with model_type "STT"
```

Available audio models: `whisper-large-v3`, `whisper-large-v3-turbo`

Audio Transcription
The transcription API converts audio into text in the same language as the input. It auto-detects the language from the first 30 seconds if `language` is not specified.
```python
from openai import OpenAI

client = OpenAI()

with open("/path/to/audio_file.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        # language="en",  # Optional: specify the language
    )

print(f"Transcription: {transcription.text}")
```

```shell
curl -X POST "$OPENAI_BASE_URL/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "model=whisper-large-v3" \
  -F "language=en" \
  -F "temperature=0.0" \
  -F "file=@/path/to/audio_file.mp3"
```

Example output:
```
Transcription: The stale smell of old beer lingers. It takes heat to bring out the odor. A cold dip restores health and zest. A salt pickle tastes fine with ham.
```

Audio Translation
The translation API translates audio from any language into English.
```python
from openai import OpenAI

client = OpenAI()

with open("/path/to/audio_file.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        temperature=1.0,
    )

print(f"Translation: {translation.text}")
```

```shell
curl -X POST "$OPENAI_BASE_URL/audio/translations" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F "model=whisper-large-v3" \
  -F "temperature=1.0" \
  -F "file=@/path/to/audio_file.mp3"
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Audio model ID (e.g., `whisper-large-v3`) |
| `file` | file | The audio file to process |
| `language` | string | Optional. ISO language code. Auto-detected if omitted. |
| `temperature` | float | `0.0` for deterministic output; higher values for more varied output |
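The parameters above can be assembled and validated before making a request; a minimal sketch (the `transcription_params` helper name and the `[0.0, 1.0]` temperature bound are assumptions for illustration, not part of the API):

```python
def transcription_params(model, language=None, temperature=0.0):
    """Assemble keyword arguments for an audio transcription request.

    Checks values against the parameter table: temperature is kept in
    [0.0, 1.0], and language is included only when explicitly given,
    so the service can auto-detect it otherwise.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    params = {"model": model, "temperature": temperature}
    if language is not None:
        params["language"] = language  # omit to let the model auto-detect
    return params
```

These keyword arguments can then be passed alongside the file, e.g. `client.audio.transcriptions.create(file=audio_file, **transcription_params("whisper-large-v3", language="en"))`.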
Key Features
- Auto Language Detection — Identifies the input language from the first 30 seconds
- Customizable Output — Adjust behavior with the `language` and `temperature` parameters
- Efficient Processing — Low latency for both transcription and translation
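Since language detection looks only at the first 30 seconds, a quick language probe need not upload a whole recording. A minimal sketch of trimming raw audio to that window, assuming 16 kHz, 16-bit mono PCM (`trim_for_detection` is an illustrative helper, not part of the API; a real upload would still need a proper container such as WAV or MP3):

```python
def trim_for_detection(pcm: bytes, sample_rate: int = 16000,
                       bytes_per_sample: int = 2, seconds: int = 30) -> bytes:
    """Return at most the first `seconds` of raw mono PCM audio.

    30 s at 16 kHz with 2 bytes per sample is 960,000 bytes; shorter
    inputs are returned unchanged.
    """
    return pcm[: sample_rate * bytes_per_sample * seconds]
```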
Next Steps
- Chat Completions — Process transcribed text with LLMs
- Asynchronous Requests — Queue long audio files for async processing
- API Endpoints — Full endpoint reference