API Endpoints
AI Foundation Services provides an OpenAI-compatible REST API. All endpoints use the base URL:
https://llm-server.llmhub.t-systems.net/v2

For the complete OpenAPI specification, see the interactive API docs (Redoc).
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /models | List all available models |
| GET | /models/{model_id} | Get model details and metadata |
| POST | /chat/completions | Create a chat completion |
| POST | /completions | Create a text completion |
| POST | /embeddings | Create embeddings |
| POST | /audio/transcriptions | Transcribe audio to text |
| POST | /audio/translations | Translate audio to English |
| GET | /audio/models | List available audio models |
| POST | /images/generations | Generate images from text |
| POST | /responses | Create a response (Responses API) |
| POST | /fine_tuning/jobs | Create a fine-tuning job |
| GET | /fine_tuning/jobs | List fine-tuning jobs |
| POST | /fine_tuning/jobs/{id}/cancel | Cancel a fine-tuning job |
| GET | /fine_tuning/jobs/{id}/events | List fine-tuning events |
| POST | /files | Upload a file |
| GET | /files | List uploaded files |
| DELETE | /files/{id} | Delete a file |
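As the page notes, the API is OpenAI-compatible, so the OpenAI SDKs work when pointed at the base URL above. As a minimal sketch with only the Python standard library, a chat completion request can be constructed like this (the key is a placeholder; with a real key, `urllib.request.urlopen(request)` sends it and returns the JSON response):

```python
import json
import urllib.request

BASE_URL = "https://llm-server.llmhub.t-systems.net/v2"
API_KEY = "YOUR_API_KEY"  # placeholder -- replace with a real key

# Request body for POST /chat/completions, matching the curl example below.
payload = {
    "model": "Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}

request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# With a valid key: response = json.load(urllib.request.urlopen(request))
```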
Queue Endpoints (Asynchronous)
All major endpoints have /queue variants for asynchronous processing. See the Asynchronous Requests guide for details.
| Method | Endpoint | Description |
|---|---|---|
| POST | /queue/chat/completions | Async chat completion |
| POST | /queue/completions | Async text completion |
| POST | /queue/embeddings | Async embeddings |
| POST | /queue/audio/transcriptions | Async audio transcription |
| POST | /queue/audio/translations | Async audio translation |
| POST | /queue/images/generations | Async image generation |
| POST | /queue/images/edits | Async image editing |
| GET | /queue/models | List models (queue) |
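Because the /queue paths mirror the standard endpoints one-to-one, the async URL for any endpoint can be derived mechanically. A small sketch (the helper name is illustrative, not part of the API):

```python
BASE = "https://llm-server.llmhub.t-systems.net"

def queue_url(endpoint: str) -> str:
    """Map a standard endpoint path (e.g. "/embeddings") to its
    asynchronous /queue variant, per the mirroring described above."""
    return f"{BASE}/queue{endpoint}"

queue_url("/chat/completions")
# "https://llm-server.llmhub.t-systems.net/queue/chat/completions"
```

Submission and result retrieval for queued jobs are covered in the Asynchronous Requests guide.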
API Versioning
The API uses versioned paths:
| Path Prefix | Purpose | Examples |
|---|---|---|
| /v2 | Default — LLM inference, embeddings, audio, images | /v2/chat/completions, /v2/embeddings |
| /v1 | Visual RAG, vector stores, file management | /v1/vector_stores, /v1/files |
| /queue | Asynchronous processing (mirrors /v2 endpoints) | /queue/chat/completions |
- Use /v2 for all standard LLM API calls (this is the default when using OpenAI SDKs with our base URL)
- Use /v1 for Visual RAG and file management operations
- Use /queue for long-running or batch workloads
Authentication
All requests require an API key in the Authorization header:
Authorization: Bearer YOUR_API_KEY

See Authentication for setup details.
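In code, this amounts to attaching one header to every request. A minimal sketch in Python, assuming the key is stored in the OPENAI_API_KEY environment variable (as in the curl example below):

```python
import os

# Read the key from the environment; the fallback is a placeholder only.
API_KEY = os.environ.get("OPENAI_API_KEY", "YOUR_API_KEY")

# Headers to attach to every request against the API.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```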
Request Format
All POST requests use JSON:
```bash
curl -X POST "https://llm-server.llmhub.t-systems.net/v2/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'
```

Response Format
Responses follow the OpenAI response format. For chat completions:
{ "id": "chatcmpl-abc123", "object": "chat.completion", "created": 1710000000, "model": "Llama-3.3-70B-Instruct", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello! How can I help you today?" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19 }}