<a href="https://colab.research.google.com/github/zhaw-iwi/LLM-Intro/blob/main/deepseek_playground.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example: Running DeepSeek locally or in Google Colab with Ollama

Prerequisites: NVIDIA GPU with CUDA, Unix/Linux environment


## Download and install Ollama

In [1]:
!curl https://ollama.ai/install.sh | sh

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13281    0 13281    0     0  68271      0 --:--:-- --:--:-- --:--:-- 68458
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


## Install CUDA drivers
- The first command sets the `debconf` frontend to noninteractive, because Colab gives us no interactive shell in which to answer installer prompts.
- The second command updates the package index and installs the latest `cuda-drivers` package, so that model inference (prediction) can run on the GPU.

In [2]:
!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections
!sudo apt-get update && sudo apt-get install -y cuda-drivers

Get:1 http://archive.ubuntu.com/ubuntu noble InRelease [256 kB]
Get:2 http://security.ubuntu.com/ubuntu noble-security InRelease [126 kB]      
Get:3 https://dl.yarnpkg.com/debian stable InRelease                           
Get:4 http://archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB]
Get:5 http://archive.ubuntu.com/ubuntu noble-backports InRelease [126 kB]
Get:6 https://packages.microsoft.com/repos/microsoft-ubuntu-noble-prod noble InRelease [3600 B]
Get:7 https://repo.anaconda.com/pkgs/misc/debrepo/conda stable InRelease [3961 B]
Get:8 http://security.ubuntu.com/ubuntu noble-security/universe amd64 Packages [1181 kB]
Get:9 http://archive.ubuntu.com/ubuntu noble/restricted amd64 Packages [117 kB]
Get:10 http://security.ubuntu.com/ubuntu noble-security/multiverse amd64 Packages [33.1 kB]
Get:11 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages [1687 kB]
Get:12 http://archive.ubuntu.com/ubuntu noble/multiverse amd64 Packages [331 kB]
Get:13 http://security.u
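
After the driver installation it can help to confirm that a GPU is actually visible before starting the server. The following is a minimal sketch, assuming the `nvidia-smi` tool ends up on the `PATH` after the install; `gpu_visible` is a hypothetical helper name, not part of Ollama or the original notebook.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is found and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The driver tools are not installed or not on PATH.
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # The tool could not be started or did not answer in time.
        return False
    return result.returncode == 0


print("GPU visible:", gpu_visible())
```

If this prints `False`, Ollama will most likely fall back to the CPU or fail to load the model, so it is worth resolving before continuing.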

## Ensure the CUDA drivers are used

In [3]:
import os

# Point LD_LIBRARY_PATH at the system NVIDIA libraries (this overwrites any existing value)
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})
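
The cell above replaces whatever `LD_LIBRARY_PATH` contained before. If other entries should be kept, the directory can be prepended instead. This is a minimal sketch under that assumption; `prepend_library_path` is a hypothetical helper, and `demo_env` stands in for `os.environ` so the example has no side effects.

```python
import os
from collections.abc import MutableMapping


def prepend_library_path(new_dir: str, env: MutableMapping) -> str:
    """Put new_dir at the front of LD_LIBRARY_PATH without dropping existing entries."""
    current = env.get("LD_LIBRARY_PATH", "")
    # Split on the platform path separator and ignore empty segments.
    parts = [p for p in current.split(os.pathsep) if p]
    if new_dir not in parts:
        parts.insert(0, new_dir)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env["LD_LIBRARY_PATH"]


# Demonstration on a plain dict; pass os.environ to change the real environment.
demo_env = {"LD_LIBRARY_PATH": "/usr/local/lib"}
print(prepend_library_path("/usr/lib64-nvidia", demo_env))
```

The variable has to be set before the Ollama server is started, because child processes inherit the environment at launch time.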

## Start Ollama
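- The server is started with `subprocess.Popen(..., start_new_session=True)`, which detaches it into its own session so that it keeps running in the background. This is similar in effect to `nohup`, a command-line tool that prevents a process from stopping when the terminal exits.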

In [6]:
import subprocess
import os

# Start the Ollama server in the background and record its log file and PID
log_path = '/tmp/ollama.log'
pid_path = '/tmp/ollama.pid'
logfile = open(log_path, 'ab')
proc = subprocess.Popen(['ollama', 'serve'], stdout=logfile, stderr=logfile, start_new_session=True)
print(f'Ollama started with PID {proc.pid}; logs: {log_path}')
with open(pid_path, 'w') as f:
    f.write(str(proc.pid))

Ollama started with PID 10006; logs: /tmp/ollama.log
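
`Popen` returns immediately, so the server may not be accepting connections yet when the next cell runs. The sketch below polls the address from the install output (`127.0.0.1:11434`) until a TCP connection succeeds. `wait_for_port` and its parameters are hypothetical names chosen for this example, and an open port only shows that the server is listening, not that a model is loaded.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 11434,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet; wait briefly and try again.
            time.sleep(interval)
    return False
```

For example, `wait_for_port(timeout=60)` could be called right after the start cell; a `False` result would suggest looking at `/tmp/ollama.log`.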


In [7]:
# Check (and optionally stop) the Ollama server
import os, signal
pid_path = '/tmp/ollama.pid'
if os.path.exists(pid_path):
    with open(pid_path) as f:
        pid = int(f.read().strip())
    print('PID:', pid)
    try:
        # Signal 0 sends nothing; it only checks that the process exists
        os.kill(pid, 0)
        print('Process is running')
    except OSError:
        print('Process not found / not running')
else:
    print('No PID file found at', pid_path)

# To stop the server, uncomment and run:
# os.kill(pid, signal.SIGTERM)

PID: 10006
Process is running


## Download the DeepSeek model

In [8]:
!ollama pull deepseek-r1:7b

pulling manifest
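
Because the progress output of `ollama pull` is hard to read in a notebook, it can be useful to confirm afterwards that the model is really available. This is a minimal sketch that calls `ollama list` and looks for the model name. It assumes that the command prints one header row followed by one row per model with the name in the first column; `parse_model_names` and `model_is_pulled` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_model_names(list_output: str) -> list[str]:
    """Extract model names from `ollama list` output (first column, header skipped)."""
    names = []
    for line in list_output.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


def model_is_pulled(model: str) -> bool:
    """Return True if `ollama list` reports the given model name."""
    exe = shutil.which("ollama")
    if exe is None:
        # The ollama CLI is not installed or not on PATH.
        return False
    try:
        result = subprocess.run([exe, "list"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    if result.returncode != 0:
        # Typically means the server is not reachable.
        return False
    return model in parse_model_names(result.stdout)


print("deepseek-r1:7b available:", model_is_pulled("deepseek-r1:7b"))
```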

## Install ollama in your Python environment

In [9]:
!pip install ollama

Collecting ollama
  Downloading ollama-0.6.1-py3-none-any.whl.metadata (4.3 kB)
Downloading ollama-0.6.1-py3-none-any.whl (14 kB)
Installing collected packages: ollama
Successfully installed ollama-0.6.1


## Example model request

In [10]:
import ollama
response = ollama.chat(model='deepseek-r1:7b', messages=[
  {
    'role': 'user',
    'content': "How many r's are in a strawberry?",
  },
])
print(response['message']['content'])

ResponseError: llama runner process has terminated: signal: terminated (status code: 500)
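
The `ResponseError` above only says that the runner process was terminated. The server log written by the start cell (`/tmp/ollama.log`) is the most likely place to find the reason. The following sketch prints the last lines of that file; `tail_log` is a hypothetical helper, and the default of 20 lines is an arbitrary choice.

```python
from collections import deque
from pathlib import Path


def tail_log(path: str = "/tmp/ollama.log", lines: int = 20) -> list[str]:
    """Return the last `lines` lines of the log file, or an empty list if it is missing."""
    log_file = Path(path)
    if not log_file.exists():
        return []
    with log_file.open("r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


for line in tail_log():
    print(line)
```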