# LlamaIndex Integration
Use AI Foundation Services with LlamaIndex to build RAG applications, index documents, and create chat engines.
```shell
pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-openai
```

## Initialize LLM

```python
import os

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    deployment_name="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("OPENAI_BASE_URL"),
    api_version="2023-07-01-preview",
)

# Test: stream a completion token by token
response_iter = llm.stream_complete("Tell me a joke.")
for response in response_iter:
    print(response.delta, end="", flush=True)
```

## Initialize Embeddings
```python
import os

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model_name="jina-embeddings-v2-base-de",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)

# Test: embed a query and check the vector dimension
query_embedding = embed_model.get_query_embedding("Hello world")
print(f"Embedding dimension: {len(query_embedding)}")
```

## Simple RAG Example
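Retrieval works by comparing embedding vectors, typically with cosine similarity. As a minimal illustration of the metric itself (plain Python, not part of the LlamaIndex API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The vector store computes this (or an equivalent distance) between the query embedding and every indexed chunk to rank results.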
### 1. Prepare Documents

```shell
mkdir example_data
# Place your PDF documents in the example_data directory
cp /path/to/your-documents.pdf example_data/
```

### 2. Index Documents
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(
    input_dir="./example_data",
    filename_as_id=True,
).load_data()

index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
    embed_model=embed_model,
)
```

### 3. Create Chat Engine
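The `SentenceSplitter` breaks each document into chunks of roughly 512 tokens with a 20-token overlap, so sentences near a chunk boundary keep some surrounding context. The sliding-window idea behind it can be sketched in plain Python (an illustration, not LlamaIndex's actual implementation):

```python
def sliding_chunks(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split a token list into windows that overlap by `overlap` tokens."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = [f"t{i}" for i in range(10)]
for chunk in sliding_chunks(tokens, chunk_size=4, overlap=1):
    print(chunk)
# Each window repeats the last token of the previous one:
# ['t0', 't1', 't2', 't3'], ['t3', 't4', 't5', 't6'], ['t6', 't7', 't8', 't9'], ['t9']
```

Larger `chunk_overlap` values reduce the risk of splitting an answer across two chunks, at the cost of some duplicated content in the index.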
```python
from llama_index.core.postprocessor import LongContextReorder
from llama_index.core.memory import ChatMemoryBuffer

CONTEXT_PROMPT = """\
You are a helpful AI assistant. Answer based on the context provided.
If the context doesn't help, say: I can't find that in the given context.

Context:
{context_str}

Answer in the same language as the question."""

chat_engine = index.as_chat_engine(
    llm=llm,
    streaming=True,
    chat_mode="context",
    context_template=CONTEXT_PROMPT,
    node_postprocessors=[LongContextReorder()],
    memory=ChatMemoryBuffer.from_defaults(token_limit=6000),
    similarity_top_k=10,
)
```

### 4. Ask Questions
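The `ChatMemoryBuffer` keeps the conversation history within the given token budget, discarding the oldest messages first when the limit is exceeded. A simplified sketch of that eviction policy (plain Python, with word counts standing in for real tokens; not LlamaIndex's implementation):

```python
from collections import deque

class BoundedMemory:
    """Keep recent messages so the total token count stays under a limit."""

    def __init__(self, token_limit: int):
        self.token_limit = token_limit
        self.messages: deque[str] = deque()

    def put(self, message: str) -> None:
        self.messages.append(message)
        # Evict oldest messages until we are back under budget
        while self._tokens() > self.token_limit and len(self.messages) > 1:
            self.messages.popleft()

    def _tokens(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words
        return sum(len(m.split()) for m in self.messages)

memory = BoundedMemory(token_limit=6)
for msg in ["hello there", "how are you", "fine thanks and you"]:
    memory.put(msg)
print(list(memory.messages))  # oldest messages dropped to fit the budget
```

With `token_limit=6000` as configured above, long conversations silently lose their earliest turns, so repeat any facts the model must not forget.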
```python
response = chat_engine.stream_chat("How much revenue did Alphabet generate?")
for token in response.response_gen:
    print(token, end="")
```

Example output:

```text
According to the context, Alphabet generated $69,787 million in revenue
in the quarter ended March 31, 2023.
```

## Next Steps
Section titled “Next Steps”- Embeddings Guide — Learn more about embedding models
- LangChain Integration — Alternative RAG framework