# LLM + Graph Visualization (Split-GPU)

**Duration:** ~30 min | **Platform:** Kaggle dual Tesla T4

This notebook demonstrates the **split-GPU architecture**: LLM inference on GPU 0
and RAPIDS/Graphistry graph analytics on GPU 1.

### Architecture
```
GPU 0 (15 GB)          GPU 1 (15 GB)
┌────────────────┐     ┌─────────────────┐
│  llama-server  │     │  cuDF / cuGraph │
│  (LLM model)   │     │  Graphistry     │
└────────────────┘     └─────────────────┘
```

### What you'll learn
1. Configure split-GPU mode
2. Extract entities from text with an LLM
3. Build graph structures on GPU with RAPIDS
4. Visualize with Graphistry
5. Build traced end-to-end pipelines

In [None]:
!pip install -q git+https://github.com/llamatelemetry/llamatelemetry.git@v1.0.0
# RAPIDS and graphistry are pre-installed on Kaggle GPU notebooks

## Setup Split-GPU Mode

Start the LLM server on GPU 0 with `tensor_split="1.0,0.0"` to leave GPU 1
entirely free for RAPIDS.
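
To make the `"1.0,0.0"` setting concrete, here is a minimal sketch of how such a string maps to per-GPU shares. It assumes `ServerManager` forwards the value to llama.cpp's `--tensor-split` option, which reads the comma-separated numbers as relative proportions; `split_fractions` is a hypothetical helper written for this illustration, not part of `llamatelemetry`.

```python
def split_fractions(tensor_split: str) -> list[float]:
    """Turn a comma-separated proportion string into per-GPU fractions."""
    weights = [float(part) for part in tensor_split.split(",")]
    if any(w < 0 for w in weights):
        raise ValueError("proportions must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one proportion must be positive")
    # Normalize so the shares sum to 1.0
    return [w / total for w in weights]


for gpu, share in enumerate(split_fractions("1.0,0.0")):
    print(f"GPU {gpu}: {share:.0%} of offloaded layers")
```

Under the same assumption, a value such as `"3,1"` would place roughly three quarters of the offloaded layers on GPU 0 and one quarter on GPU 1, which defeats the purpose of the split-GPU layout used here.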

In [None]:
import llamatelemetry
from llamatelemetry.llama import ServerManager, LlamaCppClient
from llamatelemetry.kaggle import rapids_gpu, auto_register_graphistry
from huggingface_hub import hf_hub_download

llamatelemetry.init(service_name="graph-viz")

model_path = hf_hub_download(
    repo_id="bartowski/google_gemma-3-1b-it-GGUF",
    filename="google_gemma-3-1b-it-Q4_K_M.gguf",
    cache_dir="/root/.cache/huggingface",
)

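# LLM on GPU 0 only
mgr = ServerManager()
mgr.start_server(
    model_path=model_path,
    gpu_layers=99,  # exceeds the model's layer count, so every layer is offloaded
    tensor_split="1.0,0.0",  # all offloaded layers go to GPU 0
    ctx_size=2048,
)
mgr.wait_until_ready(timeout=60)
client = LlamaCppClient(base_url="http://127.0.0.1:8090")  # must match the server's address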

# Register Graphistry (uses Kaggle secrets)
auto_register_graphistry()
print("Split-GPU mode active: LLM on GPU 0, RAPIDS on GPU 1")
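
The print statement above only states the intent. One way to check that GPU 1 really is idle is to read per-GPU memory use from `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH (as it normally is on Kaggle GPU notebooks); `parse_gpu_memory` and `query_gpu_memory` are hypothetical helpers for this illustration.

```python
import subprocess


def parse_gpu_memory(csv_text: str) -> dict[int, tuple[int, int]]:
    """Map GPU index -> (used MiB, total MiB) from nvidia-smi CSV output."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        usage[index] = (used, total)
    return usage


def query_gpu_memory() -> dict[int, tuple[int, int]]:
    """Run nvidia-smi and return per-GPU memory use; {} if it cannot be read."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=10
        ).stdout
        return parse_gpu_memory(out)
    except (OSError, subprocess.SubprocessError, ValueError):
        # No driver, no binary, or output in an unexpected shape
        return {}


for index, (used, total) in sorted(query_gpu_memory().items()):
    print(f"GPU {index}: {used} / {total} MiB used")
```

With the server running, GPU 0 should show the model's footprint while GPU 1 stays near zero until the RAPIDS cells run.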

## Generate Data with LLM

Use the LLM to extract entities and relationships from text.

In [None]:
import json

@llamatelemetry.trace(name="extract-entities")
def extract_entities(text):
    prompt = f"""Extract entities and relationships from this text as JSON.
Return format: {{"entities": ["name1", "name2"], "relationships": [["entity1", "relation", "entity2"]]}}

Text: {text}

JSON:"""

    resp = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256, temperature=0.3,
    )
    content = resp.choices[0].message.content or ""
    empty = {"entities": [], "relationships": []}
    # Try the whole reply first, then the outermost {...} span, since the model
    # may wrap the JSON in prose or code fences.
    start, end = content.find("{"), content.rfind("}") + 1
    for candidate in (content, content[start:end] if 0 <= start < end else ""):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return empty

texts = [
    "Albert Einstein developed the theory of relativity at the University of Zurich. He later moved to Princeton.",
    "Marie Curie discovered radium and polonium. She worked at the University of Paris with Pierre Curie.",
    "Alan Turing worked at Bletchley Park during WWII. He later joined the University of Manchester.",
]

all_entities = []
all_relationships = []
for text in texts:
    result = extract_entities(text)
    all_entities.extend(result.get("entities", []))
    all_relationships.extend(result.get("relationships", []))
    print(f"Extracted {len(result.get('entities', []))} entities, {len(result.get('relationships', []))} relationships")

print(f"\nTotal: {len(all_entities)} entities, {len(all_relationships)} relationships")
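
Small models often return malformed or repeated triples, so it can help to clean the relationships before building a graph. The following is a minimal sketch of one such filter; `clean_triples` is a hypothetical helper, and its rules (exactly three non-empty strings, case-insensitive de-duplication) are assumptions that are stricter than the `len(rel) >= 3` check used later in this notebook.

```python
def clean_triples(raw_relationships):
    """Keep well-formed (subject, relation, object) triples, de-duplicated in order."""
    seen = set()
    triples = []
    for rel in raw_relationships:
        # Skip anything that is not a 3-item list/tuple of non-empty strings
        if not isinstance(rel, (list, tuple)) or len(rel) != 3:
            continue
        if not all(isinstance(part, str) and part.strip() for part in rel):
            continue
        triple = tuple(part.strip() for part in rel)
        # Compare case-insensitively so "Radium" and "radium" count as one edge
        key = tuple(part.lower() for part in triple)
        if key not in seen:
            seen.add(key)
            triples.append(triple)
    return triples


sample = [
    ["Marie Curie", "discovered", "radium"],
    ["marie curie", "discovered", "Radium "],  # duplicate after normalizing
    ["Alan Turing", "worked at"],               # too short
    ["Pierre Curie", None, "Paris"],            # non-string relation
]
print(clean_triples(sample))
```

Calling `clean_triples(all_relationships)` would give a list that is safe to unpack as `src, relationship, dst`.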

## Build Graph on GPU 1

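Use RAPIDS cuDF on GPU 1 to load the extracted relationships into an edge table, the input that cuGraph and Graphistry work from. If RAPIDS is not installed, the cell falls back to pandas.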

In [None]:
import pandas as pd

# Sample edges used when the LLM returns nothing usable
SAMPLE_EDGES = {
    "src": ["Einstein", "Einstein", "Curie", "Curie", "Turing"],
    "relationship": ["worked_at", "moved_to", "discovered", "worked_at", "worked_at"],
    "dst": ["Zurich", "Princeton", "Radium", "Paris", "Bletchley Park"],
}

# Keep only [subject, relation, object] triples; coerce values to strings
edges = [
    {"src": str(rel[0]), "relationship": str(rel[1]), "dst": str(rel[2])}
    for rel in all_relationships
    if isinstance(rel, (list, tuple)) and len(rel) >= 3
]
if not edges:
    print("No edges extracted; using sample data")
edge_pdf = pd.DataFrame(edges) if edges else pd.DataFrame(SAMPLE_EDGES)

with rapids_gpu(1):
    try:
        import cudf
        import cugraph  # confirms cuGraph is importable; not used in this cell

        # Copy the edge table into GPU memory
        edge_df = cudf.DataFrame.from_pandas(edge_pdf)
        print(f"Edge DataFrame ({len(edge_df)} edges):")
        print(edge_df.to_pandas())
    except ImportError:
        print("RAPIDS not available; using pandas fallback")
        edge_df = edge_pdf
        print(edge_df)
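
A natural first analytic on an edge table is node degree. As a minimal sketch, the plain-Python version below can serve as a CPU reference for checking the result of a GPU degree computation on a graph this small; `node_degrees` is a hypothetical helper, and the sample pairs mirror the fallback data above.

```python
from collections import Counter


def node_degrees(edge_pairs):
    """Count how many edges touch each node (in-degree plus out-degree)."""
    degrees = Counter()
    for src, dst in edge_pairs:
        degrees[src] += 1
        degrees[dst] += 1
    return degrees


sample_edges = [
    ("Einstein", "Zurich"),
    ("Einstein", "Princeton"),
    ("Curie", "Radium"),
    ("Curie", "Paris"),
    ("Turing", "Bletchley Park"),
]
degrees = node_degrees(sample_edges)
for name, degree in sorted(degrees.items()):
    print(f"{name}: {degree}")
```

For the notebook's own data, the pairs could come from the `src` and `dst` columns of `edge_df` after converting it to pandas when it is a cuDF frame.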

## Visualize with Graphistry

Graphistry renders interactive GPU-accelerated graph visualizations.

In [None]:
with rapids_gpu(1):
    try:
        import graphistry

        g = graphistry.edges(edge_df, "src", "dst")
        g = g.bind(edge_title="relationship")
        g.plot()
    except Exception as e:
        print(f"Graphistry visualization failed: {e}")
        print("Tip: Ensure GRAPHISTRY_API_KEY is set in Kaggle secrets")
        # Fallback: print the graph structure
        print("\nGraph structure:")
        if hasattr(edge_df, 'to_pandas'):
            print(edge_df.to_pandas().to_string(index=False))
        else:
            print(edge_df.to_string(index=False))
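
The plot above is driven by the edge table alone. Graphistry can also take a node table, which allows per-node attributes such as a label or color category. The sketch below derives one from edge pairs in plain Python; `node_records` and the `role` values are hypothetical names chosen for this illustration.

```python
def node_records(edge_pairs):
    """Return one record per node, tagged by how it appears in the edge list."""
    sources = {src for src, _ in edge_pairs}
    targets = {dst for _, dst in edge_pairs}
    records = []
    for name in sorted(sources | targets):
        if name in sources and name in targets:
            role = "both"
        elif name in sources:
            role = "subject"
        else:
            role = "object"
        records.append({"id": name, "role": role})
    return records


pairs = [
    ("Marie Curie", "radium"),
    ("Marie Curie", "University of Paris"),
    ("Pierre Curie", "Marie Curie"),
]
for record in node_records(pairs):
    print(record)
```

Wrapped in a DataFrame, these records could be attached with Graphistry's `.nodes(nodes_df, "id")` and used in `.bind(point_title="id")`, assuming the `id` values match the names in the edge table's `src` and `dst` columns.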

## Combined Pipeline

A full traced pipeline: extract → build. It returns the edge DataFrame; visualization stays a separate step.

In [None]:
@llamatelemetry.workflow(name="graph-pipeline")
def graph_pipeline(texts):
    """End-to-end: text → entities → edge DataFrame."""

    # Step 1: Extract entities from all texts
    entities, relationships = [], []
    for text in texts:
        with llamatelemetry.span("extract", text_length=len(text)):
            result = extract_entities(text)
            entities.extend(result.get("entities", []))
            relationships.extend(result.get("relationships", []))

    # Step 2: Build graph
    with llamatelemetry.span("build-graph", num_edges=len(relationships)):
        import pandas as pd
        edges = []
        for rel in relationships:
            if len(rel) >= 3:
                # Same column names as the edge table built earlier
                edges.append({"src": rel[0], "relationship": rel[1], "dst": rel[2]})
        df = pd.DataFrame(edges) if edges else pd.DataFrame(columns=["src", "relationship", "dst"])

    print(f"Pipeline complete: {len(entities)} entities, {len(edges)} edges")
    return df

result_df = graph_pipeline([
    "Isaac Newton formulated the laws of motion at Cambridge University.",
    "Niels Bohr developed the atomic model at the University of Copenhagen.",
])
print(result_df)

## Summary

The split-GPU architecture enables:
- **GPU 0**: Full 15 GB for the LLM model
- **GPU 1**: Full 15 GB for RAPIDS analytics + Graphistry rendering
- **No VRAM contention** between LLM and graph workloads

In [None]:
mgr.stop_server()
llamatelemetry.shutdown()
print("Done.")