# Running Ollama with Ngrok in Google Colab

This notebook sets up an Ollama server with Ngrok tunneling in Google Colab, allowing you to access your Ollama instance from anywhere.

## Setup Instructions

First, you'll need to configure your ngrok authentication token:

1. Go to https://ngrok.com and sign up for a free account
2. After signing in, go to https://dashboard.ngrok.com/get-started/your-authtoken
3. Copy your authtoken
4. In Google Colab:
   - Click on the key icon in the left sidebar to open "Secrets"
   - Click "Add new secret"
   - Set "Name" as: authtoken
   - Set "Value" as: your-ngrok-token-here
   - Click "Add"
   - Turn on "Notebook access" for the secret so this notebook can read it
5. Make sure to use a GPU runtime (Runtime -> Change runtime type -> GPU)

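To confirm that step 5 took effect before installing anything, a quick check along these lines may help. It is a minimal sketch: `gpu_visible` is a hypothetical helper name, and it assumes the `nvidia-smi` tool is on the `PATH` whenever a GPU runtime is attached.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        # No NVIDIA tooling on PATH, so this is most likely a CPU runtime
        return False
    # "nvidia-smi -L" prints one line per GPU, each starting with "GPU <index>:"
    result = subprocess.run([smi, "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_visible():
    print("GPU runtime detected")
else:
    print("No GPU detected; switch the runtime type before continuing")
```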
## 1. Install Required Packages


In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

# Install CUDA drivers
!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
!sudo apt-get update && sudo apt-get install -y cuda-drivers

# Install pyngrok
!pip install pyngrok


## 2. Import Dependencies and Configure Environment


In [None]:
from pyngrok import ngrok
from google.colab import userdata
import os
import subprocess
import threading
import time

# Load the ngrok authtoken from Colab secrets and register it with pyngrok
try:
    token = userdata.get('authtoken')
    if not token:
        raise ValueError("The 'authtoken' secret is empty")
    ngrok.set_auth_token(token)
except Exception:
    print("ERROR: Could not load the ngrok authtoken from Colab secrets!")
    print("Follow the setup instructions at the top of this notebook to add the 'authtoken' secret,")
    print("then restart the runtime and run this cell again.")
    raise  # Re-raise so the original traceback is preserved

# Point the dynamic linker at the NVIDIA libraries so Ollama can load them
os.environ['LD_LIBRARY_PATH'] = '/usr/lib64-nvidia'

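The cell above sets `LD_LIBRARY_PATH` to a single directory. If the runtime already has entries in that variable that should be kept, one possible variant prepends the NVIDIA directory instead of overwriting. This is a sketch only; `with_library_dir` is a hypothetical helper, not part of the notebook.

```python
import os
from typing import Optional


def with_library_dir(new_dir: str, current: Optional[str]) -> str:
    """Return a search path with new_dir first, keeping other entries in order."""
    # Drop empty segments and any earlier copy of new_dir to avoid duplicates
    entries = [p for p in (current or "").split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir, *entries])


# Prepend the NVIDIA library directory used in this notebook
os.environ["LD_LIBRARY_PATH"] = with_library_dir(
    "/usr/lib64-nvidia", os.environ.get("LD_LIBRARY_PATH")
)
```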

## 3. Helper Functions


In [None]:
def run_and_print_output(process):
    """Helper function to continuously read and print process output"""
    while True:
        output = process.stdout.readline()
        if output == '' and process.poll() is not None:
            break
        if output:
            print(output.strip())


## 4. Start Ollama Server


In [None]:
# Start ollama serve
print('>>> starting ollama serve')
ollama_serve = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)

# Start thread to print ollama serve output
serve_thread = threading.Thread(target=run_and_print_output, args=(ollama_serve,))
serve_thread.daemon = True
serve_thread.start()

# Give ollama serve a moment to start up
time.sleep(5)

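A fixed five-second sleep may be too short on a slow runtime and wastes time on a fast one. As an alternative, the sketch below polls the server's TCP port until it accepts a connection. `wait_for_port` is a hypothetical helper, and it assumes `ollama serve` listens on port 11434, the port the ngrok tunnel targets later in this notebook.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and retry
            time.sleep(interval)
    return False
```

Calling `wait_for_port("127.0.0.1", 11434)` in place of `time.sleep(5)` returns `True` as soon as the port is open, and `False` if the server never comes up within the timeout.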

## 5. Pull Ollama Model


In [None]:
# ****************************************************
# Change MODEL to the model you want to pull
# See the model library on the Ollama website
# ****************************************************
MODEL = 'qwen2.5-coder:32b'

# Pull the model
print(f'>>> starting ollama pull {MODEL}')
pull_process = subprocess.Popen(
    ['ollama', 'pull', MODEL],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)

# Stream the pull output until the download finishes
run_and_print_output(pull_process)

# Fail loudly if the pull did not succeed
if pull_process.returncode != 0:
    raise RuntimeError(f'ollama pull exited with code {pull_process.returncode}')

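Once the pull finishes, it can be useful to confirm that the server actually lists the model before exposing it. The sketch below queries Ollama's `/api/tags` endpoint, which returns the locally available models. `fetch_tags` and `model_is_listed` are hypothetical helper names, and the base URL assumes the default local port 11434.

```python
import json
import urllib.request


def fetch_tags(base_url: str = "http://127.0.0.1:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)


def model_is_listed(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears by name in an /api/tags response."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model in names
```

With the server running, `model_is_listed(fetch_tags(), 'qwen2.5-coder:32b')` should return `True` after a successful pull.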

## 6. Start Ngrok Tunnel


In [None]:
# Start ngrok
print('>>> starting ngrok http server')
ngrok_process = subprocess.Popen(
    ['ngrok', 'http', '--log', 'stderr', '11434', '--host-header=localhost:11434'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True
)

# Start thread to print ngrok output
ngrok_thread = threading.Thread(target=run_and_print_output, args=(ngrok_process,))
ngrok_thread.daemon = True
ngrok_thread.start()


## 7. Get Public URL and Keep Server Running


In [None]:
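# Get and display the public URL.
# The tunnel from step 6 runs in its own ngrok process, so query that agent's
# local API directly; ngrok.get_tunnels() would start a second agent with no tunnels.
import json
import urllib.request

time.sleep(5)  # Wait for ngrok to start
try:
    # 4040 is the ngrok agent's default local API port (assumed unchanged here)
    with urllib.request.urlopen("http://127.0.0.1:4040/api/tunnels") as resp:
        tunnels = json.load(resp)["tunnels"]
    if tunnels:
        print("\n=== Your Ollama server is available at ===")
        print(tunnels[0]["public_url"])
        print("==========================================")
    else:
        print("No active ngrok tunnels found")
except Exception as e:
    print("Error getting ngrok URL:", e)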

# Keep the main process running
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("Shutting down...")
    ollama_serve.terminate()
    ngrok_process.terminate()

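`terminate()` only requests a shutdown and does not wait for the process to exit. If a cleaner teardown is wanted, a helper like the following could be used for both processes. It is a sketch; `stop_process` is a hypothetical name and the ten-second grace period is an arbitrary choice.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Terminate proc, escalating to kill if it does not exit within timeout."""
    if proc.poll() is None:
        proc.terminate()  # Polite shutdown request (SIGTERM on Linux)
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # Force the process to stop
            proc.wait()
    return proc.returncode
```

In the `except KeyboardInterrupt` branch, this would be called as `stop_process(ngrok_process)` followed by `stop_process(ollama_serve)`.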

## Usage Instructions

1. Run all cells in order
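2. Wait for the model to download (the time depends on the model size; the default 32B model is a large download)
3. Once the download completes and ngrok starts, the notebook prints the public URL of your Ollama server
4. Use this URL as the base URL in any Ollama client, for example by setting the `OLLAMA_HOST` environment variable for the `ollama` CLI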

Note: The server will keep running until you stop the notebook execution or disconnect from the Colab runtime.

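As an illustration of connecting from another machine, the sketch below sends a single non-streaming prompt to Ollama's `/api/generate` endpoint using only the standard library. `build_generate_request` and `generate` are hypothetical helper names, and the URL in the usage note is a placeholder for whatever public URL the notebook printed.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(base_url, model, prompt)
    # Large models can take a while to answer, so allow a generous timeout
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.load(resp)["response"]
```

For example, `generate("https://your-tunnel.ngrok-free.app", "qwen2.5-coder:32b", "Write a haiku about tunnels")` would return the model's reply as a string, where the hostname is a stand-in for your own tunnel URL.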