# Introduction to Ollama

Get up and running with large language models.


https://ollama.com/
https://github.com/ollama/ollama


# UI interaction

## Download Ollama


https://ollama.com/download

Installers can be downloaded directly and are available for Windows, macOS, and Linux.


```bash
curl -fsSL https://ollama.com/install.sh | bash

ollama --version
# output: ollama version is 0.5.11
```
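A script that needs to check the installed version can parse the output line shown above. This is a minimal sketch, assuming the `ollama version is X.Y.Z` format from that output; `parse_ollama_version` is a hypothetical helper, not part of Ollama.

```python
import re


def parse_ollama_version(output: str) -> tuple:
    """Extract the dotted version number from `ollama --version` output."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    # Turn "0.5.11" into (0, 5, 11) so versions compare numerically
    return tuple(int(part) for part in match.group(1).split("."))


print(parse_ollama_version("ollama version is 0.5.11"))  # (0, 5, 11)
```

Comparing tuples avoids the string-comparison trap where `"0.5.11" < "0.5.9"`.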


Docker installation

```bash
docker pull ollama/ollama
docker run -p 11434:11434 ollama/ollama
```

Ollama is then available at http://localhost:11434
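To confirm from Python that something is answering on that port, a standard-library check is enough. This is only a sketch: `ollama_base_url` and `is_server_up` are hypothetical helper names, and the check treats any HTTP 200 reply as "up" without inspecting the body.

```python
import urllib.request


def ollama_base_url(host: str = "localhost", port: int = 11434) -> str:
    """Build the base URL; 11434 is the port published in the docker command above."""
    return f"http://{host}:{port}"


def is_server_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the base URL answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


print(is_server_up(ollama_base_url()))
```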


## Download and run models

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
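The guidance above can be turned into a small lookup. This is a rough sketch: the three tiers come from the sentence above, while rounding in-between sizes up to the next tier is an assumption of this example, and `suggested_ram_gb` is a hypothetical helper.

```python
from typing import Optional

# (model size in billions of parameters, RAM in GB), taken from the guidance above
RAM_GUIDANCE = [(7, 8), (13, 16), (33, 32)]


def suggested_ram_gb(params_billions: float) -> Optional[int]:
    """Return the RAM figure of the smallest listed tier that covers the model."""
    for size, ram in RAM_GUIDANCE:
        if params_billions <= size:
            return ram
    return None  # larger than any tier the guidance mentions


print(suggested_ram_gb(7))   # 8
print(suggested_ram_gb(13))  # 16
print(suggested_ram_gb(70))  # None
```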

Download a model

```bash
ollama pull llama3.2
```


Run a model

Note: if the model has not been downloaded yet, `ollama run` downloads it first and then starts it automatically.


```bash
ollama run llama3.2
```

Once the model is running, you can chat with it turn by turn in the terminal.  



```text
>>> Send a message (/? for help)
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> who are you?
I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

>>> can you speek chinese?
I can understand and generate text in Simplified Chinese, but my proficiency may not be as high as that of a native speaker or a professional translator.

If you'd like to communicate in Chinese, I can try to:

1. Understand and respond to simple questions or phrases
2. Generate text in Simplified Chinese on various topics
3. Translate English text into Simplified Chinese

However, please note that my ability to understand nuances, idioms, and complex conversations may be limited.

Which aspect of Chinese language would you like me to help with?
```


### Multiline input

```text
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
```


### Multimodal models


```bash
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
```

### Pass the prompt as an argument

```bash
ollama run llama3.2 "Summarize this file: $(cat README.md)"
```


# SDK interaction

https://github.com/ollama/ollama-python/tree/main/examples

## Install the Python SDK

```bash
pip install ollama
```


In [3]:

# Read the HOME environment variable
import os
homePath = os.environ['HOME']
print("homePath: ",homePath)

# Change this to your own base path if needed
basePath=homePath

print("basePath: ",basePath)

homePath:  /Users/tiankonguse-m3
basePath:  /Users/tiankonguse-m3


In [None]:
# Running this manually in a terminal is recommended
%pip install ollama


## Text completion (generate)

In [4]:
import ollama

prompts = ["Who are you", "What is your name"]

# Iterate over prompts and generate responses
for prompt in prompts:
    response = ollama.generate(
        model="llama3.2",  # model name
        prompt=prompt  # prompt text
    )

    response = response.response

    # Print the type of response
    # print("Response Type:", type(response))


    print("prompt:", prompt)
    # If response is a list, join its items with spaces
    if isinstance(response, list):
        print("Response list:", " ".join(response)) 
    elif isinstance(response, str):
        print("Response str:", response)
    else:
        print("Response other:", response)
    print("")

# prompt: Who are you
# Response str: I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

# prompt: What is your name
# Response str: I don't have a personal name, but I'm an AI designed to assist and communicate with users. You can think of me as a conversational AI or a chatbot. I'm here to help answer your questions, provide information, and engage in conversations to the best of my abilities. Is there something specific you'd like to talk about or ask?


prompt: Who are you
Response str: I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

prompt: What is your name
Response str: I don't have a personal name. I'm an AI designed to assist and provide information, and I'm often referred to as a "language model" or a "chatbot." My purpose is to help users like you with their questions and tasks, and I don't have a personal identity or emotions. Is there anything else I can help you with?



## Chat mode (chat)

In [5]:
from ollama import chat

response = chat(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Who are you?"}
    ]
)
# Print the text content of the reply

print(response.message.content)
# I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."


I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."


## Streaming responses

In [None]:
from ollama import chat

stream = chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Who are you?"}],
    stream=True
)

# Effect: the words appear one at a time
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
# I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."

I'm an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."
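When the full reply is needed after streaming finishes, the chunks can be collected while they are printed. The sketch below runs without a server: `fake_stream` is made-up stand-in data that only mimics the `chunk["message"]["content"]` shape used above, and `collect_stream` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[dict]:
    # Stand-in data; a real stream would come from chat(..., stream=True)
    for piece in ["I'm ", "an ", "AI ", "model."]:
        yield {"message": {"content": piece}}


def collect_stream(stream: Iterable[dict]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in stream:
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

Joining a list once at the end is cheaper than growing a string chunk by chunk on long replies.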

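## Other SDK APIs

- List all available models: `ollama.list()`
- Show detailed information about a model: `ollama.show('llama3.2')`
- Pull a model from the remote registry: `ollama.pull('llama3.2')`
- Generate text embeddings: `ollama.embed(model='llama3.2', input='The sky is blue because of rayleigh scattering')`
- List the models that are currently running: `ollama.ps()`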
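Embeddings are usually compared with cosine similarity. The sketch below works on plain lists of floats; how the vectors are pulled out of the `ollama.embed` response is not shown above, so that step is left out, and the sample vectors are made up.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real embeddings are much longer
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```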


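## Fill-in-the-middle completion

You provide the beginning and the end, and the model fills in the middle.


Use cases:

- Code completion, for example with the model `codellama:7b-code`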

In [7]:
from ollama import generate

prompt = '''def Sort(s: str) -> str:
    """ '''

suffix = """
    return result
"""

response = generate(
  model='qwen2.5-coder:0.5b',
  prompt=prompt,
  suffix=suffix,
  options={
    'num_predict': 128,
    'temperature': 0,
    'top_p': 0.9,
    'stop': ['<EOT>'],
  },
)

print(response['response'])

Sort the string s in ascending order """
    # Convert the string to a list of characters
    char_list = list(s)
    
    # Sort the list of characters
    char_list.sort()
    
    # Join the sorted list back into a string
    result = ''.join(char_list)
    
    return result

# Example usage:
input_string = "hello"
sorted_string = Sort(input_string)
print(sorted_string)  # Output: "ehllo"  # The characters are sorted in ascending order
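The completed source is the prompt, the generated middle, and the suffix concatenated in that order. The sketch below checks that the pieces fit together as valid Python; `middle` is hand-written stand-in text rather than real model output, and `assemble_fim` is a hypothetical helper.

```python
def assemble_fim(prefix: str, middle: str, suffix: str) -> str:
    """Join the three parts and make sure the result is valid Python."""
    source = prefix + middle + suffix
    compile(source, "<fim>", "exec")  # raises SyntaxError if the pieces do not fit
    return source


prefix = 'def Sort(s: str) -> str:\n    """ '
# Stand-in for the text a model would generate between prefix and suffix
middle = 'Sort the string s in ascending order """\n    result = "".join(sorted(s))'
suffix = "\n    return result\n"

source = assemble_fim(prefix, middle, suffix)
namespace = {}
exec(source, namespace)  # define Sort() from the assembled source
print(namespace["Sort"]("hello"))  # ehllo
```

Only run `exec` on generated code you trust; here it is used on a fixed, hand-written string.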



## call function

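Define custom tool functions. The model first processes the prompt and decides which tools to call, the tool results are passed back to the model, and the model then produces the final answer.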


In [8]:
from ollama import ChatResponse, chat


def add_two_numbers(a: int, b: int) -> int:
  """
  Add two numbers

  Args:
    a (int): The first number
    b (int): The second number

  Returns:
    int: The sum of the two numbers
  """

  # The cast is necessary as returned tool call arguments don't always conform exactly to schema
  # E.g. this would prevent "what is 30 + 12" to produce '3012' instead of 42
  return int(a) + int(b)


def subtract_two_numbers(a: int, b: int) -> int:
  """
  Subtract two numbers
  """

  # The cast is necessary as returned tool call arguments don't always conform exactly to schema
  return int(a) - int(b)


# Tools can still be manually defined and passed into chat
subtract_two_numbers_tool = {
  'type': 'function',
  'function': {
    'name': 'subtract_two_numbers',
    'description': 'Subtract two numbers',
    'parameters': {
      'type': 'object',
      'required': ['a', 'b'],
      'properties': {
        'a': {'type': 'integer', 'description': 'The first number'},
        'b': {'type': 'integer', 'description': 'The second number'},
      },
    },
  },
}

messages = [{'role': 'user', 'content': 'What is 1 + 4 - 2 * 6?'}]
print('Prompt:', messages[0]['content'])

available_functions = {
  'add_two_numbers': add_two_numbers,
  'subtract_two_numbers': subtract_two_numbers,
}

model = "llama3.2"

response: ChatResponse = chat(
  model,
  messages=messages,
  tools=[add_two_numbers, subtract_two_numbers_tool],
)

if response.message.tool_calls:
  # There may be multiple tool calls in the response
  for tool in response.message.tool_calls:
    # Ensure the function is available, and then call it
    if function_to_call := available_functions.get(tool.function.name):
      print('Calling function:', tool.function.name)
      print('Arguments:', tool.function.arguments)
      output = function_to_call(**tool.function.arguments)
      print('Function output:', output)
    else:
      print('Function', tool.function.name, 'not found')

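# Only needed to chat with the model using the tool call results
if response.message.tool_calls:
  # Add the function response to messages for the model to use
  # Note: `output` and `tool` still hold the values from the last iteration of the
  # loop above, so only the last tool call's result is appended here
  messages.append(response.message)
  messages.append({'role': 'tool', 'content': str(output), 'name': tool.function.name})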
  
  print('Tool call result added to messages: ', messages)
  # Get final response from model with function outputs
  final_response = chat(model, messages=messages)
  print('Final response:', final_response.message.content)

else:
  print('No tool calls returned from model')

Prompt: What is 1 + 4 - 2 * 6?
Calling function: add_two_numbers
Arguments: {'a': '7', 'b': '4'}
Function output: 11
Calling function: subtract_two_numbers
Arguments: {'a': '3', 'b': '-12'}
Function output: 15
Tool call result added to messages:  [{'role': 'user', 'content': 'What is 1 + 4 - 2 * 6?'}, Message(role='assistant', content='', images=None, tool_calls=[ToolCall(function=Function(name='add_two_numbers', arguments={'a': '7', 'b': '4'})), ToolCall(function=Function(name='subtract_two_numbers', arguments={'a': '3', 'b': '-12'}))]), {'role': 'tool', 'content': '15', 'name': 'subtract_two_numbers'}]
Final response: To calculate the expression 1 + 4 - 2 * 6, we need to follow the order of operations (PEMDAS):

1. Multiply 2 and 6: 2 * 6 = 12
2. Add 1 and 4: 1 + 4 = 5
3. Subtract 12 from 5: 5 - 12 = -7

So, the final result is -7.
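In the run above the model requested two tool calls, but the message list contains a single `tool` entry. One way to return every result is to build one `tool` message per call. This is a sketch under assumptions: tool calls are represented as plain dicts instead of the SDK's `ToolCall` objects, and `run_tool_calls` is a hypothetical helper.

```python
def add_two_numbers(a, b) -> int:
    return int(a) + int(b)  # cast because arguments may arrive as strings


def subtract_two_numbers(a, b) -> int:
    return int(a) - int(b)


available_functions = {
    "add_two_numbers": add_two_numbers,
    "subtract_two_numbers": subtract_two_numbers,
}


def run_tool_calls(tool_calls, functions):
    """Run every tool call and return one 'tool' message per call."""
    tool_messages = []
    for call in tool_calls:
        name = call["name"]
        func = functions.get(name)
        if func is None:
            content = f"error: unknown function {name}"
        else:
            content = str(func(**call["arguments"]))
        tool_messages.append({"role": "tool", "content": content, "name": name})
    return tool_messages


# Stand-in tool calls carrying the same arguments as the run shown above
calls = [
    {"name": "add_two_numbers", "arguments": {"a": "7", "b": "4"}},
    {"name": "subtract_two_numbers", "arguments": {"a": "3", "b": "-12"}},
]
for message in run_tool_calls(calls, available_functions):
    print(message)
```

The returned list can be appended to `messages` with `messages.extend(...)` before the follow-up `chat` call.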


## chat with image

Attach an image to a chat message.

In [10]:
import base64
from pathlib import Path
from ollama import chat

# Pass in the path to the image
# path = input('Pass in the path to the image')
path = basePath + '/project/github/ComfyUI/output/ComfyUI_00098_.png'
# path= basePath + '/project/github/faceswap/photo/wyz_wbq/video-frame-1.png'

model = 'llava' # fast
model = 'llama3.2-vision' # much slower; this assignment overrides the line above
# model = 'llama3.2-vision:11b'

response = chat(
  model=model,
  messages=[
    {
      'role': 'user',
      'content': 'What is in this image? More details',
      'images': [path],
    }
  ],
)

print(response.message.content)


# You can also pass in base64 encoded image data
img = base64.b64encode(Path(path).read_bytes()).decode()
# or the raw bytes
# img = Path(path).read_bytes()

response = chat(
  model='llava',
  messages=[
    {
      'role': 'user',
      'content': 'What is in this image? Can you describe it?',
      'images': [img],
    }
  ],
)

print(response.message.content)

This image depicts a woman standing on a wet street, wearing an elegant white dress. Her long brown hair flows freely as she twirls her skirt and gazes directly at the camera with a warm smile. The dress features thin straps, a fitted bodice, and a flowing skirt that adds to her graceful pose. She wears open-toed heels that complement her outfit, and her right arm is raised above her head while holding onto her skirt with her left hand.

The background of the image is blurred but appears to be a city street or park during the evening hours, with lights reflecting off the wet pavement and trees visible in the distance. The overall atmosphere suggests a romantic or celebratory setting, possibly a photo shoot or special occasion.
 In the image, there is a young woman standing on a rain-soaked street during what appears to be either dawn or dusk. She is holding her hair with one hand while waving at the camera with the other. The woman is dressed in a flowing white dress, which contrasts w

## generate with image



In [11]:
import base64
from pathlib import Path
from ollama import generate

# Pass in the path to the image
path = basePath + '/project/github/ComfyUI/output/ComfyUI_00098_.png'
img = base64.b64encode(Path(path).read_bytes()).decode()



for response in generate('llava', 'explain this image:', images=[img], stream=True):
  print(response['response'], end='', flush=True)

print()

 This is a photograph of a woman walking down the street in what appears to be a city setting during rainy weather. The woman is wearing a sleeveless dress, which suggests that it might be a warmer season or the climate is such that she's dressed for comfort rather than cold. She has her left arm outstretched and her right hand lightly touching her skirt as if gently spinning around, which adds a sense of movement and joy to the scene.

The rain is visible in the background, adding a serene and somewhat moody atmosphere to the image. The wet surface of the street reflects the glow of streetlights, creating a warm contrast to the cool tones of the overcast sky. The woman's pose and expression convey a feeling of carefree enjoyment, possibly on her way home or exploring the city.

The overall composition of the photograph captures a moment of everyday life with an artistic touch, emphasizing the interplay between human emotion, urban architecture, and weather conditions. 
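The base64 step used in the image examples can be wrapped in a helper that fails early when the file is missing. This is a sketch with a hypothetical `encode_image` function; the demo writes only the 8-byte PNG signature to a temporary file, not a complete image.

```python
import base64
import tempfile
from pathlib import Path


def encode_image(path) -> str:
    """Read a file and return its contents as a base64 string."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found at: {path}")
    return base64.b64encode(path.read_bytes()).decode("ascii")


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of any PNG file

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    sample.write_bytes(PNG_SIGNATURE)  # placeholder bytes, not a complete image
    encoded = encode_image(sample)

# Decoding gives back exactly the bytes that were written
print(base64.b64decode(encoded) == PNG_SIGNATURE)  # True
```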


## structured-outputs-image

In [12]:
from pathlib import Path
from typing import Literal

from pydantic import BaseModel

from ollama import chat


# Define the schema for image objects
class Object(BaseModel):
  name: str
  confidence: float
  attributes: str


class ImageDescription(BaseModel):
  summary: str
  objects: list[Object]
  scene: str
  colors: list[str]
  time_of_day: Literal['Morning', 'Afternoon', 'Evening', 'Night']
  setting: Literal['Indoor', 'Outdoor', 'Unknown']
  text_content: str | None = None


# Path to the image (hard-coded in this notebook)
path = basePath + '/project/github/ComfyUI/output/ComfyUI_00098_.png'
path = Path(path)

# Verify the file exists
if not path.exists():
  raise FileNotFoundError(f'Image not found at: {path}')

model = 'llava'

# Set up chat as usual
response = chat(
  model=model,
  format=ImageDescription.model_json_schema(),  # Pass in the schema for the response
  messages=[
    {
      'role': 'user',
      'content': 'Analyze this image and return a detailed JSON description including objects, scene, colors and any text detected. If you cannot determine certain details, leave those fields empty.',
      'images': [path],
    },
  ],
  options={'temperature': 0},  # Set temperature to 0 for more deterministic output
)


# Convert received content to the schema
image_analysis = ImageDescription.model_validate_json(response.message.content)

# Pretty-print as JSON with a 2-space indent
json_output = image_analysis.model_dump_json(indent=2)
print(json_output)

{
  "summary": "A woman is captured in mid-stride on a rainy night, with her arms outstretched as if embracing the moment. She's wearing a white dress and heels, which contrast with the dark, wet street around her.",
  "objects": [
    {
      "name": "Woman",
      "confidence": 0.95,
      "attributes": "She is in motion, smiling, and appears to be enjoying herself despite the rain."
    },
    {
      "name": "Street",
      "confidence": 0.85,
      "attributes": "The street is wet from recent rainfall, with puddles visible on the asphalt."
    },
    {
      "name": "Rain",
      "confidence": 0.75,
      "attributes": "It's raining, creating a reflective sheen on the street and sidewalk."
    },
    {
      "name": "Dress",
      "confidence": 0.85,
      "attributes": "The woman is wearing a white dress that stands out against the darker tones of the scene."
    },
    {
      "name": "Heels",
      "confidence": 0.75,
      "attributes": "She's wearing high heels, which add an 
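A reply that is cut off or missing fields should be rejected before it is used. The sketch below is a standard-library stand-in for the pydantic validation above, not a replacement for it: it only checks that the required keys exist and that the two `Literal` fields hold allowed values. `check_image_description` and the sample reply are made up for this example.

```python
import json

TIME_OF_DAY = {"Morning", "Afternoon", "Evening", "Night"}
SETTING = {"Indoor", "Outdoor", "Unknown"}
REQUIRED = ("summary", "objects", "scene", "colors", "time_of_day", "setting")


def check_image_description(raw: str) -> dict:
    """Parse raw JSON text and check the fields the schema above requires."""
    data = json.loads(raw)  # raises json.JSONDecodeError on truncated or malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["time_of_day"] not in TIME_OF_DAY:
        raise ValueError(f"unexpected time_of_day: {data['time_of_day']!r}")
    if data["setting"] not in SETTING:
        raise ValueError(f"unexpected setting: {data['setting']!r}")
    return data


# Hypothetical reply text, made up for this sketch
sample = json.dumps({
    "summary": "A street at night", "objects": [], "scene": "street",
    "colors": ["white"], "time_of_day": "Night", "setting": "Outdoor",
})
print(check_image_description(sample)["setting"])  # Outdoor

try:
    check_image_description(sample[:20])  # simulate a reply that was cut off
except json.JSONDecodeError as err:
    print("truncated reply:", err.msg)
```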

# Ollama commands


## ollama --help

```bash
ollama --help

```

```text
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  stop        Stop a running model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  ps          List running models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.
```


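* serve: start the Ollama service.
* create: create a model from a Modelfile.
* show: show detailed information about a model.
* run: run a model.
* stop: stop a running model.
* pull: pull a model from a model registry.
* push: push a model to a model registry.
* list: list all models.
* ps: list all running models.
* cp: copy a model.
* rm: remove a model.
* help: get help about any command.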



## Create a custom model

https://github.com/ollama/ollama/blob/main/docs/import.md
https://github.com/ollama/ollama/blob/main/docs/modelfile.md

Create a file named Modelfile, with a FROM instruction with the local filepath to the model you want to import.

```Modelfile
FROM ./vicuna-33b.Q4_0.gguf
```


```bash
# Create the model in Ollama
ollama create example -f Modelfile  

# Run the model
ollama run example
```

## Customize a prompt

https://github.com/ollama/ollama/blob/main/docs/modelfile.md

Models from the Ollama library can be customized with a prompt.

```Modelfile
FROM llama3.2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```


Create and run the model:

```bash
ollama create mario -f ./Modelfile
ollama run mario
# >>> hi
# Hello! It's your friend Mario.
```
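When several variants of a model are needed, the Modelfile text can be generated from Python. This is a minimal sketch using only the `FROM`, `PARAMETER temperature`, and `SYSTEM` instructions shown above; `render_modelfile` is a hypothetical helper.

```python
def render_modelfile(base: str, temperature: float, system: str) -> str:
    """Return Modelfile text with a base model, a temperature, and a system message."""
    if '"""' in system:
        # The system message is wrapped in triple quotes, so it cannot contain them
        raise ValueError('system message must not contain """')
    return (
        f"FROM {base}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system}\n"""\n'
    )


text = render_modelfile(
    "llama3.2",
    1,
    "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.",
)
print(text)
```

Writing `text` to a file named `Modelfile` gives the input for the `ollama create mario -f ./Modelfile` command above.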

# REST API

https://github.com/ollama/ollama/blob/main/docs/api.md

## Generate a response

https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt":"Why is the sky blue?"
}'

# {
#   "model": "llama3.2",
#   "created_at": "2023-08-04T08:52:19.385406455-07:00",
#   "response": "The",
#   "done": false
# }
```
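The object shown above with `"done": false` is one piece of a streamed reply: the endpoint sends a sequence of JSON objects, one per line, and the text is the concatenation of their `response` fields. The sketch below reassembles such a stream; the sample lines are made up to match the shape above, and `join_generate_stream` is a hypothetical helper.

```python
import json


def join_generate_stream(lines) -> str:
    """Concatenate the `response` fields of a line-delimited JSON stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between objects
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break  # the final object marks the end of the reply
    return "".join(parts)


# Made-up sample lines in the shape shown above
sample_lines = [
    '{"model": "llama3.2", "response": "The", "done": false}',
    '{"model": "llama3.2", "response": " sky", "done": false}',
    '{"model": "llama3.2", "response": "", "done": true}',
]
print(join_generate_stream(sample_lines))  # The sky
```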


## Chat with a model

https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'

# {
#   "model": "llama3.2",
#   "created_at": "2023-08-04T08:52:19.385406455-07:00",
#   "message": {
#     "role": "assistant",
#     "content": "The",
#     "images": null
#   },
#   "done": false
# }
```
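The same request can be built in Python with the standard library. The sketch below only constructs the request so it runs without a server; it sets `"stream": false` to ask for a single JSON reply instead of a stream, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_chat_request(model, messages, base_url="http://localhost:11434"):
    """Build a POST request for /api/chat that asks for a non-streamed reply."""
    body = json.dumps(
        {"model": model, "messages": messages, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "llama3.2", [{"role": "user", "content": "why is the sky blue?"}]
)
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["message"]["content"])
```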


## Create a Model


https://github.com/ollama/ollama/blob/main/docs/api.md#create-a-model

```bash
curl http://localhost:11434/api/create -d '{
  "model": "mario",
  "from": "llama3.2",
  "system": "You are Mario from Super Mario Bros."
}'
```


## List Local Models

https://github.com/ollama/ollama/blob/main/docs/api.md#list-local-models

```bash
curl http://localhost:11434/api/tags
```

## Show Model Information

https://github.com/ollama/ollama/blob/main/docs/api.md#show-model-information


```bash
curl http://localhost:11434/api/show -d '{
  "model": "llama3.2"
}'
```

## List Running Models

https://github.com/ollama/ollama/blob/main/docs/api.md#list-running-models


```bash
curl http://localhost:11434/api/ps
```

## Generate Embedding

https://github.com/ollama/ollama/blob/main/docs/api.md#generate-embedding

```bash
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm",
  "prompt": "Here is an article about llamas..."
}'
```


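# Export a model as GGUF


PyTorch format (`.bin` or `.pt`): the original model weight files, which usually have to be converted to GGUF or GGML before they can be used.  
Safetensors format: a safe and efficient weight storage format, commonly used for Hugging Face models.  
These formats are generally not used directly as Ollama's default model format; they need to be converted first.  


Ollama stores downloaded models in a local directory. By default, the model files are usually located at:

Linux/macOS: `~/.ollama/models/`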

```bash
ollama show deepseek-r1:1.5b --modelfile | head -n 10

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM deepseek-r1:1.5b
# FROM ~/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc

cd ~/.ollama/models/
cat ~/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc > ollama-export-deepseek-r1-1.5B.gguf
```
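Before using an exported file, it can help to confirm that the blob really is a GGUF file, since GGUF files begin with the four bytes `GGUF`. The sketch below finds the blob path in Modelfile text and checks that header; both helper names are hypothetical, the Modelfile text assumes the blob line is an uncommented `FROM`, and the demo checks a stand-in file rather than a real model.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def blob_path_from_modelfile(text: str) -> Optional[str]:
    """Return the first uncommented FROM target that points into a blobs directory."""
    for line in text.splitlines():
        match = re.match(r"FROM\s+(\S*/blobs/sha256-[0-9a-f]+)\s*$", line)
        if match:
            return match.group(1)
    return None


def looks_like_gguf(path) -> bool:
    """Check whether the file starts with the GGUF magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(4) == GGUF_MAGIC


modelfile_text = (
    '# Modelfile generated by "ollama show"\n'
    "# FROM deepseek-r1:1.5b\n"
    "FROM ~/.ollama/models/blobs/sha256-aabd4debf0c8f08881923f2c25fc0fdeed24435271c2b3e92c4af36704040dbc\n"
)
print(blob_path_from_modelfile(modelfile_text))

with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "fake.gguf"
    fake.write_bytes(GGUF_MAGIC + b"\x00" * 12)  # stand-in file, not a real model
    print(looks_like_gguf(fake))  # True
```

A path that starts with `~` needs `Path(...).expanduser()` before it is opened.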

## Convert PyTorch weights to GGML


```bash
python3 convert-pth-to-ggml.py <path_to_model> <output_path>
```

