# AI Village - Colab LLM Server

This notebook runs multiple LLMs on Colab's GPU and exposes them to the internet through a Cloudflare tunnel.
I ran it on an A100 GPU, and the steps below assume one. Another GPU can work if it has enough memory for the models you plan to use.

## Instructions
1. **Set GPU:** Runtime → Change runtime type → **A100 GPU**
2. Run Cells 1-5 in order
3. Copy the tunnel URL printed by Cell 5
4. Update `config.py` on your laptop with the new URL (quick-tunnel URLs change every time the tunnel restarts)
5. Run the last cell (Cell 7) to keep the session alive
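On the laptop side, a call through the tunnel is an ordinary HTTP POST to Ollama's `/api/generate` endpoint. The sketch below is a minimal standard-library client, assuming `config.py` holds `REMOTE_OLLAMA_URL` as printed by Cell 5; the URL shown is a placeholder, and `build_request` and `generate` are hypothetical helper names, not part of the AI Village code. Setting `"stream": False` makes Ollama return one JSON object whose `response` field holds the generated text.

```python
import json
import urllib.request

# Placeholder (assumption): paste the value Cell 5 prints; it changes on every tunnel restart
REMOTE_OLLAMA_URL = "https://example-words.trycloudflare.com/api/generate"


def build_request(url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming Ollama /api/generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one prompt through the tunnel and return the generated text (hypothetical helper)."""
    req = build_request(REMOTE_OLLAMA_URL, model, prompt)
    # Generous timeout: the first request to a model includes loading it onto the GPU
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage, once the tunnel is up: `print(generate("mistral:7b", "Say hello in five words."))`.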

## Models Used
| Agent | Model | Download size |
|-------|-------|------|
| Scout | mistral:7b | 4.1GB |
| Conservative Engineer | llama3.1:8b | 4.7GB |
| Quality Engineer | codellama:13b | 7.4GB |
| Innovative Engineer | mixtral:8x7b | 26GB |
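As a rough budget, the four downloads add up to 4.1 + 4.7 + 7.4 + 26 = 42.2 GB. Download size only approximates the memory a loaded model needs (the context cache adds more), so this is a back-of-the-envelope check; the 40 GB figure below is an assumed VRAM size, not a measured value.

```python
# Download sizes from the table above, in GB
MODEL_SIZES_GB = {
    "mistral:7b": 4.1,
    "llama3.1:8b": 4.7,
    "codellama:13b": 7.4,
    "mixtral:8x7b": 26.0,
}

GPU_MEMORY_GB = 40.0  # assumption: a 40 GB A100; change this for your GPU

total = round(sum(MODEL_SIZES_GB.values()), 1)
largest = max(MODEL_SIZES_GB, key=MODEL_SIZES_GB.get)

print(f"Total download: {total} GB")
print(f"Largest model: {largest} ({MODEL_SIZES_GB[largest]} GB)")
print("All four fit in VRAM at once:", total <= GPU_MEMORY_GB)
```

With these numbers the total exceeds 40 GB, so not all four models can be resident at the same time; Ollama loads models on demand, so expect some delay when an agent's model has to be loaded.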

In [None]:
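# Cell 1: Check GPU
# You should see an A100 or similar GPU
import subprocess

try:
    out = subprocess.run(["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv"],
                         capture_output=True, text=True, check=True).stdout
    print(out + "\n✓ GPU is available!")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("✗ No GPU found. Set Runtime → Change runtime type → A100 GPU")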

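Cell 1 confirms that a GPU is present but not that it is large enough. A sketch of a capacity check, assuming the `name, memory.total [MiB]` header that `--format=csv` prints; `parse_gpus` is a hypothetical helper, the sample text is illustrative rather than captured output, and the comparison treats GiB and the table's GB as interchangeable, which is close enough for a rough check.

```python
# Hypothetical helper: parse `nvidia-smi --query-gpu=name,memory.total --format=csv`
def parse_gpus(csv_text: str) -> list[tuple[str, float]]:
    """Return (name, memory in GiB) for each GPU row of the CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines()[1:]:  # skip the header row
        name, mem = (field.strip() for field in line.rsplit(",", 1))
        mib = float(mem.split()[0])  # "40960 MiB" -> 40960.0
        gpus.append((name, mib / 1024))  # MiB -> GiB
    return gpus


LARGEST_MODEL_GB = 26.0  # mixtral:8x7b download size from the table

# Illustrative input, not captured output; in Colab pass the nvidia-smi stdout instead
sample = "name, memory.total [MiB]\nNVIDIA A100-SXM4-40GB, 40960 MiB\n"

for name, mem_gb in parse_gpus(sample):
    verdict = "OK" if mem_gb >= LARGEST_MODEL_GB else "mixtral:8x7b may not fit in VRAM"
    print(f"{name}: {mem_gb:.0f} GiB -> {verdict}")
```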

In [None]:
# Cell 2: Install Ollama

!curl -fsSL https://ollama.com/install.sh | sh
!ollama --version && echo "✓ Ollama installed!"


In [None]:
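# Cell 3: Start Ollama server on all interfaces
# IMPORTANT: This binds to 0.0.0.0 so the tunnel can reach it

import os
import subprocess
import time
import urllib.request

# Kill any existing Ollama server (-x matches the process name exactly,
# so pkill does not also match the shell running this command)
!pkill -9 -x ollama 2>/dev/null || true
time.sleep(2)

# Start Ollama bound to all interfaces, logging to /tmp/ollama.log
env = os.environ.copy()
env["OLLAMA_HOST"] = "0.0.0.0:11434"

ollama_log = open("/tmp/ollama.log", "w")
ollama_process = subprocess.Popen(
    ["ollama", "serve"],
    env=env,
    stdout=ollama_log,
    stderr=subprocess.STDOUT,
)

# Poll the server for up to 30 s instead of sleeping a fixed 5 s
for _ in range(30):
    try:
        with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
            print(resp.read().decode(), "- Ollama server ready!")
        break
    except OSError:  # connection refused while the server is still starting
        time.sleep(1)
else:
    print("Ollama did not start. Check /tmp/ollama.log")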


In [None]:
# Cell 4: Pull all models (10-15 min on first run)
# Models are cached on disk for the rest of this Colab session; a new runtime downloads them again

print("Pulling models... This takes 10-15 minutes on first run.\n")

models = [
    ("mistral:7b", "Scout"),
    ("llama3.1:8b", "Conservative Engineer"),
    ("codellama:13b", "Quality Engineer"),
    ("mixtral:8x7b", "Innovative Engineer")
]

for model, agent in models:
    print(f"Pulling {model} (for {agent})...")
    !ollama pull {model}
    print(f"{model} ready\n")

print("All models downloaded!")

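Cell 4 prints "ready" after each pull whether or not the pull succeeded. One way to confirm is to ask the server what it has: Ollama's `GET /api/tags` returns JSON with a `models` list whose entries carry a `name` such as `mistral:7b`. A minimal sketch; `missing_models` and `fetch_tags` are hypothetical helpers.

```python
import json
import urllib.request

WANTED = ["mistral:7b", "llama3.1:8b", "codellama:13b", "mixtral:8x7b"]


def missing_models(tags: dict, wanted: list[str]) -> list[str]:
    """Return the wanted model names absent from an /api/tags response."""
    have = {m["name"] for m in tags.get("models", [])}
    return [name for name in wanted if name not in have]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.loads(resp.read())
```

Usage in Colab, after Cell 4: `print(missing_models(fetch_tags(), WANTED) or "All models present")`.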

In [None]:
# Cell 5: Create Cloudflare Tunnel
# This exposes your Ollama server to the internet (no account needed)

# Install cloudflared
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb > /dev/null 2>&1
!rm cloudflared-linux-amd64.deb

# Kill any existing tunnels (-x matches the process name exactly,
# so pkill does not also match the shell running this command)
!pkill -x cloudflared 2>/dev/null || true

import re
import subprocess
import time

# Start the tunnel, logging to a file: an unread stdout pipe can fill up
# during a long session and stall cloudflared
print("Starting Cloudflare tunnel...")
tunnel_log = open("/tmp/cloudflared.log", "w")
tunnel_process = subprocess.Popen(
    ["cloudflared", "tunnel", "--url", "http://localhost:11434"],
    stdout=tunnel_log,
    stderr=subprocess.STDOUT,
)

# Poll the log for up to 30 s for the public URL
# (skip api.trycloudflare.com, which can appear in error messages)
tunnel_url = None
for _ in range(30):
    time.sleep(1)
    with open("/tmp/cloudflared.log") as f:
        match = re.search(r"https://(?!api\.)[a-z0-9-]+\.trycloudflare\.com", f.read())
    if match:
        tunnel_url = match.group(0)
        break

if tunnel_url:
    print("TUNNEL READY!")
    print(f"\nURL: {tunnel_url}")
    print("\nCopy this to your config.py:\n")
    print(f'REMOTE_OLLAMA_URL = "{tunnel_url}/api/generate"')
else:
    print("Could not get tunnel URL. Check /tmp/cloudflared.log and re-run this cell.")

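Before copying the URL, you can check that the tunnel actually reaches Ollama: the server's root path answers with the text `Ollama is running`. This sketch assumes `tunnel_url` from Cell 5 is in scope; `tunnel_ok` is a hypothetical helper. A freshly created quick tunnel can take a few seconds to start serving, so a single failure is worth retrying.

```python
import urllib.request


def tunnel_ok(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET base_url/ answers 200 with Ollama's status text."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return resp.status == 200 and b"Ollama is running" in resp.read()
    except OSError:  # covers HTTPError, URLError and timeouts
        return False
```

Usage: `print("Tunnel reachable:", tunnel_ok(tunnel_url))`.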

In [None]:
# Cell 7: Keep session alive
# Run this and KEEP THE TAB OPEN while using the AI Village

import time
from datetime import datetime

print("="*60)
print("LLM Server is running!")
print("="*60)
print("\nKeep this tab open. If it is closed, Colab will eventually disconnect the runtime and the server with it.")
print("A dot is printed every minute while this cell runs.\n")

start_time = datetime.now()
try:
    while True:
        elapsed = datetime.now() - start_time
        mins = int(elapsed.total_seconds() // 60)
        print(".", end="", flush=True)
        if mins > 0 and mins % 10 == 0:
            print(f" [{mins}m]", end="", flush=True)
        time.sleep(60)
except KeyboardInterrupt:
    print("\n\nKeep-alive loop stopped. Ollama and the tunnel are still running in the background.")

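Interrupting Cell 7 only ends the keep-alive loop; `ollama serve` and `cloudflared` were started with `subprocess.Popen` and keep running. To shut them down without restarting the runtime, a helper like this works on any `Popen` handle kept from Cells 3 and 5. `stop_process` is a hypothetical name, and the 10-second grace period is an arbitrary choice.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Send SIGTERM, escalate to SIGKILL after `timeout` seconds, return the exit code."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode
```

Stopping the tunnel before Ollama avoids forwarding requests to a server that is already gone.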