# Kani-Vision Quickstart
This Colab notebook runs through the quickstart example found [here](https://github.com/zhudotexe/kani-vision/blob/main/examples/gpt-vision.py).

Feel free to make a copy of this notebook and modify the code cells to run other examples!

## Install kani-vision
First we'll install the library. kani-vision requires Python 3.10+.

In [None]:
!python --version
# for the latest development version:
!pip install -q 'kani-vision[openai] @ git+https://github.com/zhudotexe/kani-vision.git@main'
# for the stable version:
# !pip install -q 'kani-vision[openai]'
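To check the version requirement from Python instead of the shell, you can use a small check like the one below. The 3.10 minimum comes from the install note above. `meets_min_python` is a hypothetical helper name used only for illustration.

```python
import sys

# kani-vision needs Python 3.10 or newer (per the install note above)
MIN_PYTHON = (3, 10)


def meets_min_python(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the given interpreter version is at least `minimum`."""
    # Compare only (major, minor) so patch and release fields don't matter
    return tuple(version_info[:2]) >= minimum


if meets_min_python():
    print(f"Python {sys.version.split()[0]} is new enough for kani-vision.")
else:
    print(f"Python {sys.version.split()[0]} is too old; kani-vision needs 3.10+.")
```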

## Imports
Then, import all the necessary components. Note that instead of using kani's default `chat_in_terminal`, we're using a vision-specific `chat_in_terminal_vision`.

In [None]:
from kani import Kani
from kani.ext.vision import ImagePart, chat_in_terminal_vision
from kani.ext.vision.engines.openai import OpenAIVisionEngine

## OpenAI Key
To use the `OpenAIVisionEngine`, you need your OpenAI API key. You can find it here: https://platform.openai.com/account/api-keys

In [None]:
# Insert your OpenAI API key (https://platform.openai.com/account/api-keys)
api_key = "sk-..."  # @param {type:"string"}
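If you share a copy of this notebook, a key pasted into the cell can leak with it. One alternative is to read the key from the `OPENAI_API_KEY` environment variable, the name the official OpenAI client looks for, and fall back to a hidden prompt. This is a minimal sketch; `get_openai_api_key` is a hypothetical helper, not part of kani-vision.

```python
import os
from getpass import getpass


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the key from an environment variable, or prompt for it without echoing."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to an interactive prompt; getpass does not echo what you type
    return getpass(f"{env_var} is not set; paste your OpenAI API key: ").strip()


# Usage (instead of hardcoding the string in the cell above):
# api_key = get_openai_api_key()
```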

## Kani
Set up the kani engine and harness.

kani uses an Engine to interact with the language model. You can specify other model parameters in the engine, like `temperature=0.7`, or change the model here.

You can also uncomment the LLaVA code below to use the LLaVA v1.5 engine instead. If you do, you'll likely need to switch the Colab runtime to an A100 GPU.

The kani manages the chat state, prompting, and function calling. Here, we only give it the engine that calls the GPT-4 Vision model, but you can also pass other parameters to the kani, like `system_prompt="You are..."`.

In [None]:
# uncomment the next few lines to use LLaVA
# !pip install -q 'kani-vision[llava]' bitsandbytes accelerate
# !pip install --no-deps "llava @ git+https://github.com/haotian-liu/LLaVA.git@v1.1.1"
# import torch
# from kani.ext.vision.engines.llava import LlavaEngine
# from transformers import BitsAndBytesConfig
# engine = LlavaEngine(
#     "liuhaotian/llava-v1.5-7b",
#     model_load_kwargs={
#         "device_map": "auto",
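#         # 4-bit loading is set via quantization_config below; also passing load_in_4bit here can raise an error in newer transformers versions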
#         "low_cpu_mem_usage": True,
#         "quantization_config": BitsAndBytesConfig(
#             load_in_4bit=True,
#             bnb_4bit_compute_dtype=torch.float16,
#             bnb_4bit_use_double_quant=True,
#             bnb_4bit_quant_type='nf4'
#         )
#     }
# )

# comment out the next line if using LLaVA
engine = OpenAIVisionEngine(api_key, model="gpt-4-vision-preview")
ai = Kani(engine)
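The engine and kani options described above can be combined in one helper. This sketch assumes, as the text says, that `OpenAIVisionEngine` accepts model parameters such as `temperature` as keyword arguments and that `Kani` accepts a `system_prompt`. `build_kani` is a hypothetical name, and the 0.7 temperature is just the example value from the text.

```python
from typing import Optional


def build_kani(
    api_key: str,
    *,
    model: str = "gpt-4-vision-preview",
    temperature: float = 0.7,
    system_prompt: Optional[str] = None,
):
    """Build a Kani backed by an OpenAI vision engine with the given options."""
    # Imported inside the function so the helper can be defined before kani-vision is installed
    from kani import Kani
    from kani.ext.vision.engines.openai import OpenAIVisionEngine

    # Model parameters such as temperature are passed to the engine
    engine = OpenAIVisionEngine(api_key, model=model, temperature=temperature)
    # Chat-level settings such as the system prompt are passed to the kani
    return Kani(engine, system_prompt=system_prompt)


# Usage:
# ai = build_kani(api_key, system_prompt="You are a concise image describer.")
```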

## Programmatic Example
Now, you can query the kani with images by sending a prompt that's a list of two items: the string that is your query, and an `ImagePart` containing the image to send. You can provide these in any order, and even provide multiple strings or image parts in a single query!
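Because the prompt is an ordinary Python list, you can build multi-image queries with normal list operations. The helper below is a hypothetical sketch, not part of kani-vision. It alternates text and image parts and appends any leftover items at the end. It works with any objects, so in real use you would pass `ImagePart` instances as the images.

```python
from itertools import chain, zip_longest


def interleave_query(texts, images):
    """Alternate parts as text0, image0, text1, image1, ...

    Leftover items from the longer list are appended at the end.
    """
    # zip_longest pads the shorter list with None; those placeholders are dropped
    pairs = zip_longest(texts, images)
    return [part for part in chain.from_iterable(pairs) if part is not None]


# Usage with kani-vision (file names are placeholders):
# query = interleave_query(
#     ["First image:", "Second image:", "What changed between them?"],
#     [ImagePart.from_path("before.png"), ImagePart.from_path("after.png")],
# )
# msg = await ai.chat_round_str(query)
```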

To construct an `ImagePart`, use the `ImagePart.from_path`, `ImagePart.from_bytes`, or `ImagePart.from_image` classmethods.

Here, we'll download the kani-vision logo.

In [None]:
from IPython.display import Image

!wget -nc https://kani-vision.readthedocs.io/en/latest/_static/kani-vision-logo.png
Image("kani-vision-logo.png", width=300, height=300)
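`!wget` is a shell escape that only works inside a notebook. In a plain Python script, you could use a stdlib-only download like the sketch below. `download_if_missing` is a hypothetical helper name. Like `wget -nc`, it skips the download when the file already exists.

```python
import urllib.request
from pathlib import Path

LOGO_URL = "https://kani-vision.readthedocs.io/en/latest/_static/kani-vision-logo.png"


def download_if_missing(url: str, dest: str) -> Path:
    """Download `url` to `dest` unless the file already exists (like `wget -nc`)."""
    path = Path(dest)
    if not path.exists():
        # urlretrieve writes the response body directly to the destination file
        urllib.request.urlretrieve(url, path)
    return path


# Usage:
# download_if_missing(LOGO_URL, "kani-vision-logo.png")
```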

In [None]:
# Your editor may flag `await` outside an async function here; notebooks support top-level await, so you can ignore that warning
msg = await ai.chat_round_str(["Please describe this image:", ImagePart.from_path("kani-vision-logo.png")])
print(msg)
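Top-level `await` works here because IPython supports it in notebook cells. In a plain `.py` script, you have to start an event loop yourself, usually with `asyncio.run`. The sketch below shows that pattern. `EchoAI` is a hypothetical stand-in that lets the example run without an API key. In a real script, pass your `ai` object and a query containing an `ImagePart` instead.

```python
import asyncio


async def describe(ai, query):
    """Send one multimodal query and return the reply as a string."""
    return await ai.chat_round_str(query)


class EchoAI:
    """Hypothetical stand-in for a Kani, used only to run the pattern offline."""

    async def chat_round_str(self, query):
        # Report how many parts the query had instead of calling a model
        return f"received {len(query)} parts"


if __name__ == "__main__":
    # A script has no running event loop, so asyncio.run creates one and runs the coroutine
    print(asyncio.run(describe(EchoAI(), ["Please describe this image:", "<image part>"])))
```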

## Chat
Just like the base kani library, kani-vision comes with a utility to interact with a multimodal kani through your terminal!

To send an image to the kani, prefix a file path or URL with an exclamation point, like `Describe this image: !path/to/file.png` or `Describe this image: !https://example.com/image.png`.

You can end the chat by sending the message `!stop`.
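To make the `!` syntax concrete, here is one way a message like this could be split into text and image references. This is only an illustrative sketch, and kani-vision's own parsing may differ. `split_message` and the `("image", ref)` tuples are hypothetical, not library API.

```python
import re

# A "!" at the start of the message or after whitespace, followed by a non-space token
IMAGE_REF = re.compile(r"(?<!\S)!(\S+)")


def split_message(message: str, stopword: str = "!stop"):
    """Split a chat message into text and image references.

    Returns None for the stopword; otherwise a list of str (text) and
    ("image", ref) tuples in the order they appear.
    """
    # Check the stopword first: "!stop" would otherwise look like an image reference
    if message.strip() == stopword:
        return None
    parts = []
    pos = 0
    for match in IMAGE_REF.finditer(message):
        text = message[pos:match.start()].strip()
        if text:
            parts.append(text)
        parts.append(("image", match.group(1)))
        pos = match.end()
    tail = message[pos:].strip()
    if tail:
        parts.append(tail)
    return parts
```

Requiring whitespace before the `!` prevents ordinary punctuation, such as "Hello there!", from being treated as an image reference.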

In [None]:
chat_in_terminal_vision(ai, stopword="!stop")