# LlamaIndex Integration
Use AI Foundation Services with LlamaIndex to build RAG applications, index documents, and create chat engines.
```shell
pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-openai
```

## Initialize LLM

```python
import os

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    deployment_name="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    azure_endpoint=os.getenv("OPENAI_BASE_URL"),
    api_version="2023-07-01-preview",
)

# Test: stream a completion token by token
response_iter = llm.stream_complete("Tell me a joke.")
for response in response_iter:
    print(response.delta, end="", flush=True)
```

## Initialize Embeddings
```python
import os

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model_name="jina-embeddings-v2-base-de",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)

# Test: embed a query and check the vector dimension
query_embedding = embed_model.get_query_embedding("Hello world")
print(f"Embedding dimension: {len(query_embedding)}")
```

## Simple RAG Example
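Retrieval works by comparing embedding vectors, typically with cosine similarity. As a minimal illustration of the metric itself (plain Python, not part of the LlamaIndex API):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

The vector store computes this (or an equivalent distance) between the query embedding and every indexed chunk to rank results.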
### 1. Prepare Documents

```shell
mkdir example_data
# Place your PDF documents in the example_data directory
cp /path/to/your-documents.pdf example_data/
```

### 2. Index Documents
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(
    input_dir="./example_data",
    filename_as_id=True,
).load_data()

index = VectorStoreIndex.from_documents(
    documents=documents,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)],
    embed_model=embed_model,
)
```

### 3. Create Chat Engine
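The `SentenceSplitter` breaks each document into chunks of roughly 512 tokens with a 20-token overlap, so sentences near a chunk boundary keep some surrounding context. The sliding-window idea behind it can be sketched in plain Python (an illustration, not LlamaIndex's actual implementation):

```python
def sliding_chunks(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split a token list into windows that overlap by `overlap` tokens."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = [f"t{i}" for i in range(10)]
for chunk in sliding_chunks(tokens, chunk_size=4, overlap=1):
    print(chunk)
# Each window repeats the last token of the previous one:
# ['t0', 't1', 't2', 't3'], ['t3', 't4', 't5', 't6'], ['t6', 't7', 't8', 't9'], ['t9']
```

Larger `chunk_overlap` values reduce the risk of splitting an answer across two chunks, at the cost of some duplicated content in the index.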
```python
from llama_index.core.postprocessor import LongContextReorder
from llama_index.core.memory import ChatMemoryBuffer

CONTEXT_PROMPT = """\
You are a helpful AI assistant. Answer based on the context provided.
If the context doesn't help, say: I can't find that in the given context.

Context:
{context_str}

Answer in the same language as the question."""

chat_engine = index.as_chat_engine(
    llm=llm,
    streaming=True,
    chat_mode="context",
    context_template=CONTEXT_PROMPT,
    node_postprocessors=[LongContextReorder()],
    memory=ChatMemoryBuffer.from_defaults(token_limit=6000),
    similarity_top_k=10,
)
```

### 4. Ask Questions
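The `ChatMemoryBuffer` keeps the conversation history within the given token budget, discarding the oldest messages first when the limit is exceeded. A simplified sketch of that eviction policy (plain Python, with word counts standing in for real tokens; not LlamaIndex's implementation):

```python
from collections import deque

class BoundedMemory:
    """Keep recent messages so the total token count stays under a limit."""

    def __init__(self, token_limit: int):
        self.token_limit = token_limit
        self.messages: deque[str] = deque()

    def put(self, message: str) -> None:
        self.messages.append(message)
        # Evict oldest messages until we are back under budget
        while self._tokens() > self.token_limit and len(self.messages) > 1:
            self.messages.popleft()

    def _tokens(self) -> int:
        # Crude stand-in for a real tokenizer: count whitespace-separated words
        return sum(len(m.split()) for m in self.messages)

memory = BoundedMemory(token_limit=6)
for msg in ["hello there", "how are you", "fine thanks and you"]:
    memory.put(msg)
print(list(memory.messages))  # oldest messages dropped to fit the budget
```

With `token_limit=6000` as configured above, long conversations silently lose their earliest turns, so repeat any facts the model must not forget.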
```python
response = chat_engine.stream_chat("How much revenue did Alphabet generate?")
for token in response.response_gen:
    print(token, end="")
```

Example output:

```text
According to the context, Alphabet generated $69,787 million in revenue
in the quarter ended March 31, 2023.
```

## Next Steps
Section titled “Next Steps”- Embeddings Guide — Learn more about embedding models
- LangChain Integration — Alternative RAG framework