Chat Completions

The Chat Completions API is the primary way to interact with LLMs on AI Foundation Services. It’s fully compatible with the OpenAI Chat API.

What you’ll learn:

  • How to send chat completion requests with system and user messages
  • How to use streaming for real-time responses
  • How to use the Completion and Responses APIs
  • Key parameters for controlling output
A basic request using curl:
curl -X POST "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is T-Cloud?"}
    ],
    "temperature": 0.1,
    "max_tokens": 256
  }'
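The same request can be made with the official OpenAI Python SDK (a sketch, assuming the openai package is installed and OPENAI_API_KEY and OPENAI_BASE_URL are set in your environment):

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment.
client = OpenAI()

completion = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is T-Cloud?"},
    ],
    temperature=0.1,
    max_tokens=256,
)
print(completion.choices[0].message.content)
```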

Enable streaming to receive tokens as they’re generated. Set stream: true in your request:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment

stream = client.chat.completions.create(
    model="Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a short poem about AI."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

For full streaming documentation including error handling, function calling with streams, and stream options, see the Streaming guide.

The Completion API sends raw text directly to the LLM without the chat message format.

from openai import OpenAI

client = OpenAI()
completion = client.completions.create(
    model="Llama-3.3-70B-Instruct",
    prompt="What is the Python programming language?",
    stream=False,
    temperature=0.2,
    max_tokens=128,
)
print(completion.choices[0].text)

With an up-to-date version of the OpenAI Python package, you can also use the Responses API:

from openai import OpenAI

client = OpenAI()
response = client.responses.create(
    model="gpt-4.1",
    input="Write a one-sentence bedtime story about a unicorn.",
)
print(response.output_text)

You can also use both input and instructions fields:

response = client.responses.create(
    model="gpt-4.1",
    instructions="You are a professional copywriter. Focus on benefits rather than features.",
    input="Create a product description for NoiseGuard Pro Headphones.",
    temperature=0.7,
    max_output_tokens=200,
)
print(response.output[0].content[0].text)
| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model ID (e.g., `Llama-3.3-70B-Instruct`) |
| `messages` | array | List of message objects with `role` and `content` |
| `temperature` | float | Sampling temperature (0-2). Lower = more deterministic |
| `max_tokens` | integer | Maximum tokens to generate |
| `stream` | boolean | Enable streaming responses |
| `top_p` | float | Nucleus sampling parameter |
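Because out-of-range sampling values are rejected by the server, it can help to validate a payload before sending it. The helper below is purely illustrative (it is not part of the API or SDK) and sketches how the parameters above fit together in a request body:

```python
def build_request(model, messages, temperature=1.0, top_p=1.0,
                  max_tokens=None, stream=False):
    """Assemble and sanity-check a chat-completions payload.

    Illustrative helper only; the real API performs its own validation.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be in [0, 2]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # omit to use the server default
    return payload


payload = build_request(
    "Llama-3.3-70B-Instruct",
    [{"role": "user", "content": "What is T-Cloud?"}],
    temperature=0.1,
    max_tokens=256,
)
print(payload["temperature"])  # → 0.1
```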

For the full API specification, see the API Reference.