<a href="https://colab.research.google.com/github/uppercaveman/ollama-learn/blob/main/ollama_llama3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Check the Python environment

In [1]:
!pip --version

pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)


In [2]:
!pip3 --version

pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)


# View GPU information

In [5]:
# Select and connect to a T4 runtime first
# nvidia-smi is a command-line tool from NVIDIA for monitoring and managing NVIDIA GPU devices
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found
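
The `command not found` output above suggests that either no GPU runtime was attached when the cell ran or the binary was not on `PATH`. A minimal sketch for checking this from Python before running GPU-dependent cells; `has_command` is a hypothetical helper name introduced here, not part of any library.

```python
import shutil

def has_command(name: str) -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None

if has_command("nvidia-smi"):
    print("nvidia-smi found: a GPU runtime appears to be attached")
else:
    print("nvidia-smi not found: try switching the Colab runtime type to a GPU (e.g. T4)")
```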


In [1]:
# More memory details
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install -q gputil

import psutil
import humanize
import os
import GPUtil as GPU

GPUs = GPU.getGPUs()
# XXX: Colab exposes at most one GPU, and even that one is not guaranteed
gpu = GPUs[0] if GPUs else None

def printm():
    """Print free system RAM, this process's size, and GPU memory usage."""
    process = psutil.Process(os.getpid())
    virtual_memory = humanize.naturalsize(psutil.virtual_memory().available)
    memory_info = humanize.naturalsize(process.memory_info().rss)
    print(f"Gen RAM Free: {virtual_memory} | Proc size: {memory_info}")

    if gpu is None:
        print("No GPU detected")
        return
    # gpu.memoryUtil is a fraction in [0, 1]; convert it to a percentage exactly once
    memory_util = gpu.memoryUtil * 100
    print(f"GPU RAM Free: {gpu.memoryFree:.0f}MB | Used: {gpu.memoryUsed:.0f}MB"
          f" | Util {memory_util:3.0f}% | Total {gpu.memoryTotal:.0f}MB")

printm()

  Preparing metadata (setup.py) ... done
  Building wheel for gputil (setup.py) ... done
Gen RAM Free: 12.6 GB | Proc size: 99.2 MB
GPU RAM Free: 15101MB | Used: 0MB | Util   0% | Total 15360MB


In [2]:
# Check the CUDA version
!nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0


# Install Ollama

In [4]:
# Download and install
!curl -fsSL https://ollama.com/install.sh | sh

>>> Downloading ollama...
############################################################################################# 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [5]:
# Check the version
!ollama -v



In [8]:
# Fails: the Colab container is not booted with systemd, so the service cannot be managed this way
!systemctl status ollama

System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down


In [9]:
# Start the server manually in the background
!ollama serve > log.log 2>&1 &
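
Because the server is started in the background, the next cell may run before it is ready to accept connections. A minimal sketch of a readiness check that polls the TCP port; `wait_for_port` is a hypothetical helper, and the default timeout and 0.5 s polling interval are arbitrary assumptions.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not listening yet; retry shortly
    return False

# Example: wait_for_port("127.0.0.1", 11434) returns True once `ollama serve` is listening.
```

This only confirms that the port accepts connections, not that a model has been loaded.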

In [10]:
# Inspect the startup log
!cat log.log

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIG6qRavXiiQoYymztmwmDM3BAKaOI4ZUCTNVXLzcb+iX

time=2024-04-26T00:22:40.776Z level=INFO source=images.go:817 msg="total blobs: 0"
time=2024-04-26T00:22:40.776Z level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-04-26T00:22:40.776Z level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.1.32)"
time=2024-04-26T00:22:40.776Z level=INFO source=payload.go:28 msg="extracting embedded files" dir=/tmp/ollama633719744/runners
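
The log above reports the listening address and server version in its `msg` field. A small sketch for extracting both from the log text rather than reading it by eye; the regular expression is an assumption based only on the line format shown here and may not match other Ollama versions.

```python
import re

# Matches the 'Listening on <addr> (version <x.y.z>)' message seen in the log above.
LISTEN_RE = re.compile(r'msg="Listening on (?P<addr>[^ ]+) \(version (?P<version>[^)]+)\)"')

def find_listen_info(log_text: str):
    """Return (address, version) from the first 'Listening on' line, or None if absent."""
    match = LISTEN_RE.search(log_text)
    if match is None:
        return None
    return match.group("addr"), match.group("version")

sample = ('time=2024-04-26T00:22:40.776Z level=INFO source=routes.go:1143 '
          'msg="Listening on 127.0.0.1:11434 (version 0.1.32)"')
print(find_listen_info(sample))  # ('127.0.0.1:11434', '0.1.32')
```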


# Download and start llama3

In [11]:
# Download the model and start it in the background
!ollama run llama3 > llama3.log 2>&1 &

In [16]:
# Check the log and wait for the download and startup to finish
!cat llama3.log

pulling manifest
pulling 00e1317cbf74...   0% ▕                ▏    0 B/4.7 GB
pulling 00e1317cbf
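
Once the download has finished, the model can be queried over the HTTP API that the server log reports at `127.0.0.1:11434`. A minimal sketch using only the standard library and Ollama's `/api/generate` endpoint with streaming disabled; the helper names `build_generate_request` and `generate` and the 300 s timeout are assumptions made for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

def build_generate_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str, model: str = "llama3", timeout: float = 300.0) -> str:
    """Send the prompt and return the text in the 'response' field of the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the server to be running and the model fully downloaded):
# print(generate("Why is the sky blue?"))
```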

# WebUI

In [15]:
!docker ps -a
# Google Colab does not support Docker and has no plans to
# https://github.com/googlecolab/colabtools/issues/299

/bin/bash: line 1: docker: command not found


In [21]:
!git clone https://github.com/ollama-ui/ollama-ui

Cloning into 'ollama-ui'...
remote: Enumerating objects: 221, done.
remote: Counting objects: 100% (112/112), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 221 (delta 93), reused 86 (delta 86), pack-reused 109
Receiving objects: 100% (221/221), 177.34 KiB | 11.82 MiB/s, done.
Resolving deltas: 100% (120/120), done.


In [22]:
!cd /content/ollama-ui && make
# Not reachable from the local machine

# Check if resources directory exists, if not create it
# Check SHA-256 hash
resources/bootstrap.bundle.min.js: OK
resources/bootstrap.min.css: OK
resources/marked.min.js: OK
resources/purify.min.js: OK
make[1]: Entering directory '/content/ollama-ui'
python3 -m http.server --bind 127.0.0.1
ollama serve
Error: listen tcp 127.0.0.1:11434: bind: address already in use
make[1]: *** [Makefile:13: ollama_server] Error 1
make[1]: *** Waiting for unfinished jobs....
Serving HTTP on 127.0.0.1 port 8000 (http://127.0.0.1:8000/) ...

Keyboard interrupt received, exiting.
make: *** [Makefile:5: default] Interrupt
^C
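
The `address already in use` error above occurs because the Makefile launches its own `ollama serve` while the instance started earlier still holds port 11434. A rough sketch for checking whether a port can still be bound before starting another server; `can_bind` is a hypothetical helper, and a failed bind is only an approximate signal (a socket lingering in TIME_WAIT can also block it).

```python
import socket

def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically EADDRINUSE: another process holds the port
    return True

if not can_bind("127.0.0.1", 11434):
    print("Port 11434 is taken; an `ollama serve` instance is probably already running")
```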


In [26]:
!curl ifconfig.me/ip

34.83.152.119

In [27]:
!git clone https://github.com/datvodinh/rag-chatbot.git

Cloning into 'rag-chatbot'...
remote: Enumerating objects: 444, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (42/42), done.
remote: Total 444 (delta 14), reused 39 (delta 11), pack-reused 389
Receiving objects: 100% (444/444), 6.70 MiB | 19.49 MiB/s, done.
Resolving deltas: 100% (222/222), done.


In [30]:
!cd /content/rag-chatbot && pip3 install .

Processing /content/rag-chatbot
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting chromadb<0.5.0,>=0.4.24 (from rag-chatbot==0.1.0)
  Downloading chromadb-0.4.24-py3-none-any.whl (525 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 525.5/525.5 kB 6.3 MB/s eta 0:00:00
Collecting einops<0.8.0,>=0.7.0 (from rag-chatbot==0.1.0)
  Downloading einops-0.7.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 kB 6.1 MB/s eta 0:00:00
Collecting gradio<5.0.0,>=4.21.0 (from rag-chatbot==0.1.0)
  Downloading gradio-4.28.0-py3-none-any.whl (12.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.2/12.2 MB 32.8 MB/s eta 0:00:00
Collecting llama-index<0.11.0,>=0.10.22 (from rag-chatbot==0.1.0)
  Downloa

In [None]:
!cd /content/rag-chatbot && python app.py --host 34.83.152.119

config.json: 100% 743/743 [00:00<00:00, 3.43MB/s]
model.safetensors: 100% 133M/133M [00:01<00:00, 82.1MB/s]
tokenizer_config.json: 100% 366/366 [00:00<00:00, 823kB/s]
vocab.txt: 100% 232k/232k [00:00<00:00, 12.5MB/s]
tokenizer.json: 100% 711k/711k [00:00<00:00, 5.40MB/s]
special_tokens_map.json: 100% 125/125 [00:00<00:00, 659kB/s]
>>> starting ollama serve
time=2024-04-26T01:31:56.525Z level=INFO source=images.go:817 msg="total blobs: 5"
time=2024-04-26T01:31:56.526Z level=INFO source=images.go:824 msg="total unused blobs removed: 0"
time=2024-04-26T01:31:56.527Z level=INFO source=routes.go:1143 msg="Listening on 127.0.0.1:11434 (version 0.1.32)"
time=2024-04-26T01:31:56.532Z level=INFO source=payload.go:28 msg="extracting embedded files" dir=/tmp/ollama316464241/runners
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
time=2024-04-26T01:32:02.771Z level=INFO source=payload.go:41 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v