# LlamaCloud: Managed RAG Services

LlamaCloud provides managed services for document processing and RAG, including LlamaParse for document parsing and managed indexes.

## Learning Objectives

By the end of this notebook, you will:
1. Understand LlamaCloud services and pricing
2. Use LlamaParse for document processing
3. Work with managed indexes
4. Integrate LlamaCloud with local pipelines

---

## LlamaCloud Overview

LlamaCloud offers several services:

| Service | Description | Use Case |
|---------|-------------|----------|
| **LlamaParse** | Advanced document parsing | Complex PDFs, tables, images |
| **LlamaExtract** | Structured data extraction | Form processing |
| **Managed Index** | Cloud-hosted vector index | Production RAG |

### Getting Started

1. Sign up at [cloud.llamaindex.ai](https://cloud.llamaindex.ai)
2. Get your API key from the dashboard
3. Set `LLAMA_CLOUD_API_KEY` in your `.env` file

In [None]:
# Setup
import nest_asyncio
nest_asyncio.apply()

import os
from dotenv import load_dotenv
load_dotenv()

from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Check LlamaCloud API key
llama_cloud_key = os.getenv("LLAMA_CLOUD_API_KEY")
if llama_cloud_key:
    print("✓ LlamaCloud API key configured")
else:
    print("✗ LlamaCloud API key not found")
    print("  Some features in this notebook require a LlamaCloud account.")
    print("  Sign up at: https://cloud.llamaindex.ai")
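
The setup cell configures OpenAI models but only checks the LlamaCloud key. A minimal sketch of a combined check follows. It assumes the OpenAI client reads `OPENAI_API_KEY`; the helper name `missing_env_vars` is hypothetical and not part of any library.

```python
import os
from typing import Mapping, Sequence


def missing_env_vars(names: Sequence[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or blank in `env`."""
    return [name for name in names if not env.get(name, "").strip()]


# Assumption: these are the two keys this notebook needs.
REQUIRED_KEYS = ["OPENAI_API_KEY", "LLAMA_CLOUD_API_KEY"]

for name in missing_env_vars(REQUIRED_KEYS):
    print(f"✗ {name} is not set")
```

Passing a plain dictionary as `env` makes the helper easy to test without touching the real environment.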

## 1. LlamaParse: Advanced Document Parsing

LlamaParse excels at parsing complex documents that standard tools struggle with:
- Multi-column layouts
- Tables with merged cells
- Images and figures
- Mathematical formulas

In [None]:
# LlamaParse requires: pip install llama-parse
try:
    from llama_parse import LlamaParse
    LLAMAPARSE_AVAILABLE = True
    print("✓ LlamaParse is available")
except ImportError:
    LLAMAPARSE_AVAILABLE = False
    print("✗ LlamaParse not installed. Run: pip install llama-parse")

In [None]:
if LLAMAPARSE_AVAILABLE and llama_cloud_key:
    # Initialize LlamaParse
    parser = LlamaParse(
        api_key=llama_cloud_key,
        result_type="markdown",  # or "text"
        verbose=True,
        language="en",
    )
    print("✓ LlamaParse initialized!")
else:
    print("LlamaParse demo requires the llama-parse package and an API key. Showing example code instead.")

In [None]:
# Example: Parse a PDF document
# (Replace with your actual PDF file path)

EXAMPLE_PARSE_CODE = '''
# Parse a single document
documents = parser.load_data("./sample.pdf")

# The parsed content is in markdown format
for doc in documents:
    print("Parsed content preview:")
    print(doc.text[:500])
    print(f"\\nMetadata: {doc.metadata}")
'''

print("Example LlamaParse usage:")
print(EXAMPLE_PARSE_CODE)

### LlamaParse Configuration Options

In [None]:
# Advanced LlamaParse configuration
ADVANCED_CONFIG = '''
parser = LlamaParse(
    api_key=llama_cloud_key,
    
    # Output format
    result_type="markdown",  # "text" or "markdown"
    
    # Language settings
    language="en",
    
    # Parsing options
    skip_diagonal_text=False,
    invalidate_cache=False,
    do_not_cache=False,
    
    # Custom instructions for parsing
    # (parameter names can change between llama-parse releases; check your installed version)
    parsing_instruction="Extract all tables as markdown. Preserve mathematical formulas.",
    
    # Premium features (requires higher tier)
    use_vendor_multimodal_model=False,
    vendor_multimodal_model_name=None,
)
'''

print("Advanced LlamaParse configuration:")
print(ADVANCED_CONFIG)

## 2. Using LlamaParse with RAG

In [None]:
# Complete pipeline with LlamaParse
LLAMAPARSE_RAG_CODE = '''
import os

from llama_parse import LlamaParse
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser

# Initialize parser
parser = LlamaParse(
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
    result_type="markdown",
)

# Use as file extractor in SimpleDirectoryReader
file_extractor = {
    ".pdf": parser,
}

# Load documents
documents = SimpleDirectoryReader(
    input_dir="./documents",
    file_extractor=file_extractor,
).load_data()

# Use markdown-aware node parser for better chunking
node_parser = MarkdownElementNodeParser(
    llm=Settings.llm,
    num_workers=4,
)

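# Get nodes, then separate plain text nodes from table objects
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

# Build index over both text nodes and table objects
index = VectorStoreIndex(nodes=base_nodes + objects)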

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings?")
print(response)
'''

print("Complete LlamaParse + RAG pipeline:")
print(LLAMAPARSE_RAG_CODE)
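
The value of markdown-aware chunking is that a table is kept as one unit instead of being cut at an arbitrary character boundary. The sketch below illustrates that idea using only the standard library. It is not the algorithm `MarkdownElementNodeParser` uses, and `split_markdown_elements` is a hypothetical helper.

```python
from typing import List, Tuple


def split_markdown_elements(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into ("table" | "text", content) elements.

    Consecutive lines starting with "|" are grouped into one table element,
    so a table is never split across two chunks.
    """
    elements: List[Tuple[str, str]] = []
    buffer: List[str] = []
    current_kind = "text"

    def flush() -> None:
        # Emit the buffered lines as one element, skipping empty runs.
        content = "\n".join(buffer).strip()
        if content:
            elements.append((current_kind, content))
        buffer.clear()

    for line in markdown.splitlines():
        kind = "table" if line.lstrip().startswith("|") else "text"
        if kind != current_kind:
            flush()
            current_kind = kind
        buffer.append(line)
    flush()
    return elements


sample = """# Results

Revenue grew in Q2.

| Quarter | Revenue |
|---------|---------|
| Q1      | 10      |
| Q2      | 14      |

See the appendix for details."""

for kind, content in split_markdown_elements(sample):
    print(kind, "->", content.splitlines()[0])
```

The sample yields a text element, one table element with all four rows, and a trailing text element.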

## 3. LlamaCloud Managed Index

LlamaCloud offers managed vector indexes that handle:
- Document ingestion
- Embedding generation
- Vector storage and retrieval
- Automatic updates

In [None]:
# Managed Index example code
MANAGED_INDEX_CODE = '''
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Create a new managed index
index = LlamaCloudIndex.from_documents(
    documents,
    name="my-production-index",
    project_name="my-project",
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
)

# Or connect to existing index
index = LlamaCloudIndex(
    name="my-production-index",
    project_name="my-project",
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
)

# Query the managed index
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)

# Add more documents to the existing index, one at a time
for doc in new_documents:
    index.insert(doc)
'''

print("LlamaCloud Managed Index usage:")
print(MANAGED_INDEX_CODE)
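
To compare a managed index with a local one, it helps to time both with the same harness. The sketch below times any callable that accepts a query string. `measure_latency` and `fake_query` are hypothetical names; with a real index you would pass `query_engine.query` instead of the stand-in.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency(
    run_query: Callable[[str], object],
    queries: Sequence[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Run every query `repeats` times and return timing stats in seconds."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            run_query(query)
            timings.append(time.perf_counter() - start)
    return {
        "runs": len(timings),
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Stand-in for a real query engine, used only so the sketch runs offline.
def fake_query(query: str) -> str:
    time.sleep(0.01)
    return f"answer to {query!r}"


stats = measure_latency(fake_query, ["What is the main topic?"], repeats=2)
print(stats)
```

Running the same query list against both engines keeps the comparison fair; network variance means several repeats are more informative than a single call.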

## 4. LlamaExtract: Structured Data Extraction

Extract structured data from documents using schemas:

In [None]:
# LlamaExtract example
LLAMA_EXTRACT_CODE = '''
# Requires: pip install llama-cloud-services
import os

from llama_cloud_services import LlamaExtract
from pydantic import BaseModel
from typing import List, Optional

# Define extraction schema
class Invoice(BaseModel):
    """Schema for invoice extraction."""
    invoice_number: str
    date: str
    vendor_name: str
    total_amount: float
    line_items: List[dict]

class ContactInfo(BaseModel):
    """Schema for contact information."""
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    company: Optional[str] = None

# Initialize extractor
extractor = LlamaExtract(
    api_key=os.getenv("LLAMA_CLOUD_API_KEY"),
)

# Create an extraction agent bound to the schema
# (the agent name is illustrative; check the LlamaExtract docs for your installed version)
agent = extractor.create_agent(
    name="invoice-extractor",
    data_schema=Invoice,
)

# Extract structured data
result = agent.extract("invoice.pdf")

# Access extracted data
invoice = Invoice(**result.data)
print(f"Invoice: {invoice.invoice_number}")
print(f"Total: ${invoice.total_amount}")
'''

print("LlamaExtract structured extraction:")
print(LLAMA_EXTRACT_CODE)
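
Extracted data is worth validating before it is used downstream. The sketch below converts a raw result dictionary into typed objects and checks that the line items add up to the stated total. Standard-library dataclasses stand in for pydantic here so the example runs without extra packages. The `LineItem` field names and the sample values are hypothetical, not output from LlamaExtract.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float


@dataclass
class InvoiceRecord:
    invoice_number: str
    total_amount: float
    line_items: List[LineItem]

    def line_item_total(self) -> float:
        return sum(item.quantity * item.unit_price for item in self.line_items)

    def is_consistent(self, tolerance: float = 0.01) -> bool:
        """True when the line items add up to the stated total."""
        return abs(self.line_item_total() - self.total_amount) <= tolerance


def invoice_from_dict(raw: Dict[str, Any]) -> InvoiceRecord:
    """Build typed objects from a raw result, coercing numeric fields."""
    items = [
        LineItem(
            description=str(item["description"]),
            quantity=int(item["quantity"]),
            unit_price=float(item["unit_price"]),
        )
        for item in raw.get("line_items", [])
    ]
    return InvoiceRecord(
        invoice_number=str(raw["invoice_number"]),
        total_amount=float(raw["total_amount"]),
        line_items=items,
    )


# Hand-written sample data for illustration only.
raw_result = {
    "invoice_number": "INV-001",
    "total_amount": "30.0",
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0},
        {"description": "Cable", "quantity": "1", "unit_price": "10"},
    ],
}

record = invoice_from_dict(raw_result)
print(record.invoice_number, record.is_consistent())
```

A consistency check like this catches a common failure mode where a parser reads a total correctly but misses or duplicates a table row.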

## 5. Comparing Local vs Cloud

When to use LlamaCloud vs local processing:

In [None]:
comparison = """
╔══════════════════════╦═══════════════════════════╦═══════════════════════════╗
║ Aspect               ║ Local Processing          ║ LlamaCloud                ║
╠══════════════════════╬═══════════════════════════╬═══════════════════════════╣
║ Setup                ║ More configuration        ║ Quick setup               ║
║ Cost                 ║ Compute costs only        ║ Pay per document/query    ║
║ PDF Quality          ║ Basic extraction          ║ Layout-aware parsing      ║
║ Scaling              ║ Manual infrastructure     ║ Automatic scaling         ║
║ Data Privacy         ║ Data stays local          ║ Data sent to cloud        ║
║ Maintenance          ║ Self-managed              ║ Managed service           ║
║ Complex Documents    ║ Limited capability        ║ Excellent support         ║
╚══════════════════════╩═══════════════════════════╩═══════════════════════════╝

RECOMMENDATION:
- Use LlamaCloud for: Complex PDFs, production systems, quick prototypes
- Use Local for: Simple documents, data privacy requirements, cost control
"""

print(comparison)
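
The "PDF Quality" row can be made concrete by parsing one document both ways and comparing the outputs. The sketch below computes a few rough, format-level signals with the standard library. `compare_parses` is a hypothetical helper, and these metrics are only a starting point; they do not replace reading the two outputs side by side.

```python
import difflib
from typing import Dict


def compare_parses(cloud_text: str, local_text: str) -> Dict[str, float]:
    """Rough comparison of two parses of the same document."""

    def table_rows(text: str) -> int:
        # Count lines that look like markdown table rows.
        return sum(1 for line in text.splitlines() if line.lstrip().startswith("|"))

    # Compare word sequences so whitespace differences do not dominate.
    matcher = difflib.SequenceMatcher(None, cloud_text.split(), local_text.split())
    return {
        "similarity": matcher.ratio(),
        "cloud_chars": len(cloud_text),
        "local_chars": len(local_text),
        "cloud_table_rows": table_rows(cloud_text),
        "local_table_rows": table_rows(local_text),
    }


# Hand-written stand-ins for the two parser outputs.
cloud_output = "| Quarter | Revenue |\n| Q1 | 10 |"
local_output = "Quarter Revenue Q1 10"

print(compare_parses(cloud_output, local_output))
```

A large gap in table-row counts usually means one parser flattened a table into running text.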

## 6. Local Fallback Pattern

A pattern for using LlamaCloud with local fallback:

In [None]:
from llama_index.core import Document
from pathlib import Path

class HybridDocumentLoader:
    """Load documents with LlamaParse, falling back to local parsing."""
    
    def __init__(self, use_llamacloud: bool = True):
        # Store a boolean, not the key itself, so printing this flag never leaks the API key
        self.use_llamacloud = bool(use_llamacloud and llama_cloud_key and LLAMAPARSE_AVAILABLE)
        self.llamaparse = None

        if self.use_llamacloud:
            try:
                from llama_parse import LlamaParse
                self.llamaparse = LlamaParse(
                    api_key=llama_cloud_key,
                    result_type="markdown",
                )
            except Exception as e:
                print(f"LlamaParse init failed: {e}")
                self.use_llamacloud = False
    
    def load_document(self, file_path: str) -> list:
        """Load a document, using LlamaCloud if available."""
        path = Path(file_path)
        
        # Try LlamaCloud for PDFs
        if path.suffix.lower() == '.pdf' and self.llamaparse:
            try:
                print(f"Using LlamaParse for: {path.name}")
                return self.llamaparse.load_data(str(path))
            except Exception as e:
                print(f"LlamaParse failed, falling back to local: {e}")
        
        # Fallback to local parsing
        print(f"Using local parsing for: {path.name}")
        return self._local_parse(path)
    
    def _local_parse(self, path: Path) -> list:
        """Local document parsing fallback."""
        if path.suffix.lower() == '.txt':
            return [Document(text=path.read_text())]
        elif path.suffix.lower() == '.pdf':
            # Use pypdf or similar
            try:
                from pypdf import PdfReader
                reader = PdfReader(str(path))
                # Guard against pages with no extractable text
                text = "\n".join((page.extract_text() or "") for page in reader.pages)
                return [Document(text=text)]
            except ImportError:
                print("pypdf not installed for local PDF parsing")
                return []
        else:
            return [Document(text=path.read_text())]

print("✓ HybridDocumentLoader defined!")

In [None]:
# Test the hybrid loader
loader = HybridDocumentLoader(use_llamacloud=True)

print(f"LlamaCloud enabled: {loader.use_llamacloud}")
print(f"LlamaParse available: {loader.llamaparse is not None}")
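
The same fallback idea can be written as an ordered chain of parsers that also reports which one produced the result. The sketch below uses plain callables so it runs offline. `parse_with_fallback`, `cloud_parser`, and `local_parser` are hypothetical stand-ins, not part of LlamaParse or LlamaIndex.

```python
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

Parser = Callable[[Path], List[str]]


def parse_with_fallback(
    path: Path,
    parsers: Sequence[Tuple[str, Parser]],
) -> Tuple[str, List[str]]:
    """Try each (name, parser) in order; return the first non-empty result."""
    errors = []
    for name, parser in parsers:
        try:
            result = parser(path)
        except Exception as exc:  # broad on purpose: any failure moves to the next parser
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError(f"All parsers failed for {path.name}: " + "; ".join(errors))


def cloud_parser(path: Path) -> List[str]:
    raise ConnectionError("service unreachable")  # simulated outage


def local_parser(path: Path) -> List[str]:
    return [f"local text of {path.name}"]


used, docs = parse_with_fallback(
    Path("report.pdf"),
    [("cloud", cloud_parser), ("local", local_parser)],
)
print(used, docs)
```

Unlike returning an empty list, raising when every parser fails makes a silent ingestion gap visible to the caller.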

## 7. Summary

This notebook covered three LlamaCloud services and a pattern for combining them with local processing.

### Key Takeaways

| Service | Purpose | Key Feature |
|---------|---------|-------------|
| **LlamaParse** | Document parsing | Layout-aware PDF and table handling |
| **LlamaExtract** | Structured extraction | Schema-based extraction |
| **Managed Index** | Cloud RAG | Zero infrastructure |

### When to Use LlamaCloud

1. **Complex documents**: Tables, images, multi-column layouts
2. **Production systems**: Need reliability and scale
3. **Quick prototypes**: Get started fast
4. **Structured extraction**: Need specific data fields

### Resources

- [LlamaCloud Documentation](https://docs.cloud.llamaindex.ai/)
- [LlamaParse Guide](https://docs.cloud.llamaindex.ai/llamaparse/getting_started)
- [Pricing](https://cloud.llamaindex.ai/pricing)

---

## Exercises

1. **PDF comparison**: Parse the same PDF with LlamaParse and local tools, compare quality

2. **Extraction schema**: Design a schema for your use case and test with LlamaExtract

3. **Managed index**: Create a managed index and compare query latency with local

4. **Cost analysis**: Calculate costs for your expected document volume
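
For the cost analysis exercise, a small calculator is enough to get started. The sketch below assumes per-page pricing with a free monthly allowance, which is an assumption about the pricing model. Every number in it is a placeholder; take real rates from the pricing page linked under Resources.

```python
from dataclasses import dataclass


@dataclass
class VolumeEstimate:
    documents_per_month: int
    avg_pages_per_document: float


def monthly_parse_cost(
    volume: VolumeEstimate,
    price_per_page: float,
    free_pages: int = 0,
) -> float:
    """Estimated monthly cost for page-priced parsing after a free allowance."""
    total_pages = volume.documents_per_month * volume.avg_pages_per_document
    billable_pages = max(0.0, total_pages - free_pages)
    return round(billable_pages * price_per_page, 2)


# PLACEHOLDER numbers only; they are not actual LlamaCloud prices.
estimate = monthly_parse_cost(
    VolumeEstimate(documents_per_month=2_000, avg_pages_per_document=12),
    price_per_page=0.003,
    free_pages=1_000,
)
print(f"Estimated monthly parsing cost: ${estimate:.2f}")
```

Comparing this figure with the compute and maintenance cost of a local pipeline gives a first-order answer to the local versus cloud question.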

In [None]:
# Exercise space
# Experiment with LlamaCloud services here!