Asynchronous Requests (Queue API)

Submit long-running API requests to a queue for asynchronous processing. The Queue API mirrors the standard endpoints but processes requests in the background, making it ideal for batch workloads and large inference jobs.

What you’ll learn:

  • How to submit requests to the Queue API
  • When to use async vs. synchronous endpoints
  • How to poll for and retrieve results

Use the /queue endpoints when:

  • Long-running inference — Large models or complex prompts that may exceed synchronous timeout limits
  • Batch processing — Submitting many requests without waiting for each to complete
  • Background jobs — Fire-and-forget workloads where you process results later
  • Rate limit management — Spreading load across time instead of bursting

For interactive or real-time use cases (chatbots, streaming), use the standard /v2 endpoints instead.

Every standard endpoint has a /queue equivalent:

| Standard Endpoint | Queue Endpoint | Description |
| --- | --- | --- |
| POST /v2/chat/completions | POST /queue/chat/completions | Chat completion |
| POST /v2/completions | POST /queue/completions | Text completion |
| POST /v2/embeddings | POST /queue/embeddings | Embeddings |
| POST /v2/audio/transcriptions | POST /queue/audio/transcriptions | Audio transcription |
| POST /v2/audio/translations | POST /queue/audio/translations | Audio translation |
| POST /v2/images/generations | POST /queue/images/generations | Image generation |
| POST /v2/images/edits | POST /queue/images/edits | Image editing |
| GET /v2/models | GET /queue/models | List models |

The request body for each queue endpoint is identical to its standard counterpart — just change the base path from /v2 to /queue.
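To illustrate, the switch is a one-segment path rewrite. `to_queue_url` below is a hypothetical helper for this guide, not part of any SDK:

```python
def to_queue_url(standard_url: str) -> str:
    """Map a standard /v2 endpoint URL to its /queue equivalent."""
    # Replace only the first occurrence so nothing later in the URL is touched.
    return standard_url.replace("/v2/", "/queue/", 1)

print(to_queue_url(
    "https://llm-server.llmhub.t-systems.net/v2/chat/completions"
))
# → https://llm-server.llmhub.t-systems.net/queue/chat/completions
```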

Submit a chat completion to the queue:

```sh
curl -X POST "https://llm-server.llmhub.t-systems.net/queue/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a detailed analysis of renewable energy trends in Europe."}
    ],
    "max_tokens": 2000
  }'
```

Process multiple prompts efficiently using the queue:

```python
import asyncio

from openai import AsyncOpenAI

# Reads the API key from the OPENAI_API_KEY environment variable.
client = AsyncOpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

prompts = [
    "Summarize the benefits of solar energy.",
    "Explain how wind turbines generate electricity.",
    "Describe the future of hydrogen fuel cells.",
]

async def process_prompt(prompt):
    response = await client.chat.completions.create(
        model="Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

async def main():
    # Submit all prompts concurrently and wait for every result.
    tasks = [process_prompt(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for prompt, result in zip(prompts, results):
        print(f"Prompt: {prompt[:50]}...")
        print(f"Result: {result[:100]}...\n")

asyncio.run(main())
```
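For large batches you may still want to cap the number of in-flight requests (the rate-limit point above). A minimal sketch of bounded concurrency with `asyncio.Semaphore`; `fake_queue_call` is a stand-in for the queue request, not a real API:

```python
import asyncio

async def run_bounded(args, worker, limit=5):
    """Run worker(arg) for each arg, with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(arg):
        async with sem:  # blocks while `limit` workers are already active
            return await worker(arg)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(guarded(a) for a in args))

# Stand-in worker; in practice this would be the queue request shown above.
async def fake_queue_call(prompt):
    await asyncio.sleep(0.01)
    return f"result for: {prompt}"

results = asyncio.run(run_bounded(["a", "b", "c"], fake_queue_call, limit=2))
print(results)  # ['result for: a', 'result for: b', 'result for: c']
```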

Submit an embeddings request through the queue in the same way:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

response = client.embeddings.create(
    model="text-embedding-bge-m3",
    input="The benefits of renewable energy in Europe",
)
print(f"Embedding dimension: {len(response.data[0].embedding)}")
```
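Embedding vectors returned this way are typically compared with cosine similarity. A minimal pure-Python sketch (not part of the SDK; real code would use the vectors from `response.data`):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```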

Queue an audio transcription:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

with open("meeting_recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
    )
print(transcript.text)
```

Queue an image generation request:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-server.llmhub.t-systems.net/queue",
)

result = client.images.generate(
    model="gpt-image-1",
    prompt="A futuristic data center powered by renewable energy",
)
# The image is returned base64-encoded in b64_json.
print(result.data[0].b64_json[:50] + "...")
```
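The `b64_json` field holds base64-encoded image bytes, so writing the image to disk needs only the standard library. A sketch with dummy data; `save_b64_image` is a hypothetical helper for this guide:

```python
import base64

def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 payload, write it to `path`, return the byte count."""
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)

# In practice: save_b64_image(result.data[0].b64_json, "image.png")
demo = base64.b64encode(b"fake-image-bytes").decode()
print(save_b64_image(demo, "demo.bin"))  # 16
```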