Asynchronous Requests (Queue API)
Submit long-running API requests to a queue for asynchronous processing. The Queue API mirrors the standard endpoints but processes requests in the background, making it ideal for batch workloads and large inference jobs.
What you’ll learn:
- How to submit requests to the Queue API
- When to use async vs. synchronous endpoints
- How to poll for and retrieve results
When to Use the Queue API
Use the /queue endpoints when:
- Long-running inference — Large models or complex prompts that may exceed synchronous timeout limits
- Batch processing — Submitting many requests without waiting for each to complete
- Background jobs — Fire-and-forget workloads where you process results later
- Rate limit management — Spreading load across time instead of bursting
For interactive or real-time use cases (chatbots, streaming), use the standard /v2 endpoints instead.
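Rate-limit smoothing can also be handled client-side before requests ever reach the queue. Below is a minimal pacing sketch in plain Python; it is not part of the Queue API, and `paced` / `per_minute` are illustrative names:

```python
import time

def paced(items, per_minute):
    """Yield items no faster than `per_minute` per minute,
    sleeping between submissions instead of bursting."""
    interval = 60.0 / per_minute
    for i, item in enumerate(items):
        if i:  # no delay before the first item
            time.sleep(interval)
        yield item

# Example: hand payloads to your submission code at most 600/min (one every 0.1 s)
for payload in paced(["job-a", "job-b", "job-c"], per_minute=600):
    print("submitting", payload)
```

Each yielded item would then be submitted to a /queue endpoint as shown in the examples below.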
Endpoint Mapping
Every standard endpoint has a /queue equivalent:
| Standard Endpoint | Queue Endpoint | Description |
|---|---|---|
| POST /v2/chat/completions | POST /queue/chat/completions | Chat completion |
| POST /v2/completions | POST /queue/completions | Text completion |
| POST /v2/embeddings | POST /queue/embeddings | Embeddings |
| POST /v2/audio/transcriptions | POST /queue/audio/transcriptions | Audio transcription |
| POST /v2/audio/translations | POST /queue/audio/translations | Audio translation |
| POST /v2/images/generations | POST /queue/images/generations | Image generation |
| POST /v2/images/edits | POST /queue/images/edits | Image editing |
| GET /v2/models | GET /queue/models | List models |
The request body for each queue endpoint is identical to its standard counterpart — just change the base path from /v2 to /queue.
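Because only the base path differs, a standard endpoint URL can be rewritten mechanically. A small sketch (`to_queue_url` is an illustrative helper, not part of the API):

```python
BASE_URL = "https://llm-server.llmhub.t-systems.net"

def to_queue_url(standard_path):
    """Rewrite a standard /v2 path to its /queue equivalent."""
    if not standard_path.startswith("/v2/"):
        raise ValueError(f"not a /v2 endpoint: {standard_path}")
    return BASE_URL + standard_path.replace("/v2/", "/queue/", 1)

print(to_queue_url("/v2/chat/completions"))
# https://llm-server.llmhub.t-systems.net/queue/chat/completions
```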
Basic Usage
Submit a chat completion to the queue:
cURL:

```bash
curl -X POST "https://llm-server.llmhub.t-systems.net/queue/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a detailed analysis of renewable energy trends in Europe."}
    ],
    "max_tokens": 2000
  }'
```

Python:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

response = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a detailed analysis of renewable energy trends in Europe."},
    ],
    max_tokens=2000,
)

print(response.choices[0].message.content)
```

JavaScript:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm-server.llmhub.t-systems.net/queue",
});

const response = await client.chat.completions.create({
  model: "Llama-3.3-70B-Instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Write a detailed analysis of renewable energy trends in Europe." },
  ],
  max_tokens: 2000,
});

console.log(response.choices[0].message.content);
```

Batch Processing Example
Process multiple prompts efficiently using the queue:
Python:

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

prompts = [
    "Summarize the benefits of solar energy.",
    "Explain how wind turbines generate electricity.",
    "Describe the future of hydrogen fuel cells.",
]

async def process_prompt(prompt):
    response = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

async def main():
    tasks = [process_prompt(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt[:50]}...")
        print(f"Result: {result[:100]}...\n")

asyncio.run(main())
```

JavaScript:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://llm-server.llmhub.t-systems.net/queue",
});

const prompts = [
  "Summarize the benefits of solar energy.",
  "Explain how wind turbines generate electricity.",
  "Describe the future of hydrogen fuel cells.",
];

const results = await Promise.all(
  prompts.map((prompt) =>
    client.chat.completions.create({
      model: "Llama-3.3-70B-Instruct",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 500,
    })
  )
);

results.forEach((result, i) => {
  console.log(`Prompt: ${prompts[i]}`);
  console.log(`Result: ${result.choices[0].message.content}\n`);
});
```

Queue Endpoints for Other APIs
Embeddings
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="The benefits of renewable energy in Europe",
)

print(f"Embedding dimension: {len(response.data[0].embedding)}")
```

Audio Transcription
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

with open("meeting_recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )

print(transcript.text)
```

Image Generation
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

result = client.images.generate(
    model="gpt-image-1",
    prompt="A futuristic data center powered by renewable energy",
)

print(result.data[0].b64_json[:50] + "...")
```

Next Steps
- Chat Completions — Standard synchronous chat API
- Streaming — Real-time token-by-token responses
- Rate Limits — Understand TPM/RPM limits and headers
- API Endpoints — Full endpoint reference