Chat Completions
The Chat Completions API is the primary way to interact with LLMs on AI Foundation Services. It’s fully compatible with the OpenAI Chat API.
What you’ll learn:
- How to send chat completion requests with system and user messages
- How to use streaming for real-time responses
- How to use the Completion and Responses APIs
- Key parameters for controlling output
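All examples below assume the endpoint and API key are available as the `OPENAI_BASE_URL` and `OPENAI_API_KEY` environment variables, as in the curl example. A minimal sketch of passing them to the Python client explicitly; recent versions of the OpenAI SDK also pick both variables up automatically when the client is constructed with no arguments:

```python
import os
from openai import OpenAI

# Explicit configuration; OpenAI() with no arguments reads the same
# OPENAI_API_KEY / OPENAI_BASE_URL environment variables itself.
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)
```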
Basic Usage

curl:

```shell
curl -X POST "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is T-Cloud?"}
    ],
    "temperature": 0.1,
    "max_tokens": 256
  }'
```

Python:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is T-Cloud?"},
    ],
    temperature=0.1,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

JavaScript:

```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "Llama-3.3-70B-Instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is T-Cloud?" },
  ],
  temperature: 0.1,
  max_tokens: 256,
});
console.log(response.choices[0].message.content);
```

Streaming
Enable streaming to receive tokens as they’re generated. Set stream=True in your request:

```python
stream = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

For full streaming documentation, including error handling, function calling with streams, and stream options, see the Streaming guide.
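If you need the complete text after the stream ends (for logging or post-processing), accumulate the deltas as they arrive. A minimal sketch; the helper name below is ours, not part of the SDK:

```python
def stream_and_collect(client, model, messages):
    """Stream a chat completion, printing tokens as they arrive
    and returning the concatenated full text."""
    stream = client.chat.completions.create(
        model=model, messages=messages, stream=True
    )
    parts = []
    for chunk in stream:
        # Some chunks (e.g. the final one) may carry no content delta.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
            print(chunk.choices[0].delta.content, end="", flush=True)
    return "".join(parts)
```

Call it as `full_text = stream_and_collect(client, "Llama-3.3-70B-Instruct", messages)` to get both live output and the assembled response.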
Completion API

The Completion API sends raw text directly to the LLM without the chat message format.

```python
from openai import OpenAI

client = OpenAI()

completion = client.completions.create(
    model="Llama-3.3-70B-Instruct",
    prompt="What is the Python programming language?",
    stream=False,
    temperature=0.2,
    max_tokens=128,
)
print(completion.choices[0].text)
```

Responses API
Using an up-to-date version of the OpenAI Python package, you can use the Responses API:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="Write a one-sentence bedtime story about a unicorn.",
)
print(response.output_text)
```

You can also use both input and instructions fields:
```python
response = client.responses.create(
    model="gpt-4.1",
    instructions="You are a professional copywriter. Focus on benefits rather than features.",
    input="Create a product description for NoiseGuard Pro Headphones.",
    temperature=0.7,
    max_output_tokens=200,
)
print(response.output[0].content[0].text)
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model ID (e.g., Llama-3.3-70B-Instruct) |
| messages | array | List of message objects with role and content |
| temperature | float | Sampling temperature (0-2). Lower = more deterministic |
| max_tokens | integer | Maximum tokens to generate |
| stream | boolean | Enable streaming responses |
| top_p | float | Nucleus sampling parameter |
For the full API specification, see the API Reference.
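One practical note on max_tokens: when the limit is reached, the output is cut off mid-generation and the choice's finish_reason field is set to "length" instead of the usual "stop". A small helper to detect this (the function name is ours, not part of the SDK):

```python
def is_truncated(response):
    """True if generation stopped because it hit the max_tokens limit
    (finish_reason == "length") rather than ending naturally ("stop")."""
    return response.choices[0].finish_reason == "length"

# Usage sketch:
# response = client.chat.completions.create(..., max_tokens=32)
# if is_truncated(response):
#     print("Cut off; retry with a larger max_tokens.")
```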
Next Steps
- Streaming — Stream responses token-by-token
- Function Calling — Connect models to external tools
- Multimodal — Analyze images alongside text
- Asynchronous Requests — Queue-based processing for batch workloads