# CLIP tutorial written by Diana

## Installation

### Install miniconda:
``` bash
cd /zfs/ai4good/student/\<your_username\>
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash /zfs/ai4good/student/\<your_username\>/Miniconda3-latest-Linux-x86_64.sh
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
# Create a symbolic link so you can start conda from your home directory
# (the home directory does not have enough space for the conda install itself)
cd $HOME
ln -s /zfs/ai4good/student/\<your_username\>/miniconda3 ~/miniconda3
```
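To double-check that the link points where you expect and that the target filesystem has room, a small Python check like the one below can help. This is only a sketch: `describe_path` is a helper name I made up, and it uses nothing beyond the standard library.

```python
import os
import shutil


def describe_path(path):
    """Return where `path` really points and how much disk space is free there."""
    expanded = os.path.expanduser(path)
    resolved = os.path.realpath(expanded)
    info = {
        "path": expanded,
        "is_symlink": os.path.islink(expanded),
        "resolved": resolved,
        "exists": os.path.exists(resolved),
        "free_gib": None,
    }
    if info["exists"]:
        # Free space on the filesystem that holds the resolved target.
        info["free_gib"] = shutil.disk_usage(resolved).free / 2**30
    return info


# ~/miniconda3 is the link created above; "~" is shown for comparison.
for candidate in ("~/miniconda3", "~"):
    print(describe_path(candidate))
```

If `is_symlink` is `True` and `resolved` starts with `/zfs/ai4good/student/`, the link is set up as intended.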
### Make a virtual environment (PyTorch needs Python older than 3.13)
``` bash
conda create -n torch_env python=3.10 -y # torch wants Python 3.12 or older
conda activate torch_env
```
`torch_env` is just an environment name I chose;
feel free to replace it with anything else.
To deactivate the active environment, use `conda deactivate`.
To list all environments (the active one is marked with `*`), use `conda info --envs`.
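To confirm from inside Python that the right environment and interpreter are in use, something like the following sketch can be run. `environment_summary` is a hypothetical helper, and the default limit of 3.12 simply mirrors the version note in this tutorial; it relies on conda setting the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def environment_summary(max_minor=12):
    """Summarize the running interpreter and the active conda environment."""
    major, minor = sys.version_info[:2]
    return {
        "python": f"{major}.{minor}",
        "executable": sys.executable,
        # conda sets CONDA_DEFAULT_ENV on activation; None means no env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # max_minor=12 is an assumption taken from the tutorial's version note.
        "version_ok": major == 3 and minor <= max_minor,
    }


print(environment_summary())
```

With `torch_env` active, `conda_env` should read `torch_env` and `executable` should point inside the `miniconda3/envs/torch_env` directory.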

### Install PyTorch and torchvision:
``` bash
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
```
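After the install, it is worth checking that PyTorch imports and can see a GPU. The snippet below is a minimal sketch; `cuda_report` is a name I picked, and it falls back to a CPU-only report if `torch` is not importable.

```python
def cuda_report():
    """Report the PyTorch version and whether a CUDA device is visible."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch": None, "cuda_available": False, "device": "cpu"}
    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else "cpu",
    }


print(cuda_report())
```

If `cuda_available` is `False` on a GPU machine, the code samples in this tutorial still run, just on the CPU.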

### ADDITIONAL IMPORTANT DEPENDENCIES:
Use `conda install <package_name>` (or `pip install <package_name>` for packages not available through conda):
``` bash
conda install pandas scikit-learn
pip install open_clip_torch
```
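A quick way to see whether everything installed so far can be found is to look each package up by its import name. This is a sketch, not an official check: `missing_packages` and the `REQUIRED` list are my own, and the list assumes the usual import names for these packages.

```python
from importlib.util import find_spec

# Import names differ from install names for some packages
# (scikit-learn -> sklearn, open_clip_torch -> open_clip).
REQUIRED = ["torch", "torchvision", "clip", "ftfy", "regex", "tqdm",
            "pandas", "sklearn", "open_clip"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("All packages found" if not missing else f"Missing: {missing}")
```

This only checks that the packages can be located; it does not import them, so it runs quickly even for large libraries.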

### Add Jupyter notebook to your setup
``` bash
conda install ipykernel
```
* make sure your Python and Jupyter extensions are installed



## Test code sample

``` python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Directory containing CLIP.png; adjust as needed, e.g. "./../../CLIP/"
address = "/home/c/dkorot/AI4GOOD/CLIP/"
image = preprocess(Image.open(address+"CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

# e.g. [[0.9927937  0.00421068 0.00299572]]; exact values vary with device and precision
print("Label probs:", probs)
```

Output:
```
Label probs: [[0.9927   0.004253 0.003016]]
```
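The probabilities come back in the same order as the prompts passed to `clip.tokenize`, so they can be paired up to read off the best match. A minimal sketch using plain Python lists (`rank_labels` is a hypothetical helper, and the numbers are copied from the output above):

```python
labels = ["a diagram", "a dog", "a cat"]
probs = [[0.9927, 0.004253, 0.003016]]  # values copied from the output above


def rank_labels(labels, row):
    """Pair each label with its probability, highest probability first."""
    return sorted(zip(labels, row), key=lambda pair: pair[1], reverse=True)


for label, p in rank_labels(labels, probs[0]):
    print(f"{label:>12s}: {100 * p:.2f}%")
```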


## Zero-shot example

``` python
import os
import clip
import torch
from torchvision.datasets import CIFAR100

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device)

# Download the dataset
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)

# Prepare the inputs
image, class_id = cifar100[3637]
image_input = preprocess(image).unsqueeze(0).to(device)
text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)

# Calculate features
with torch.no_grad():
    image_features = model.encode_image(image_input)
    text_features = model.encode_text(text_inputs)

# Pick the top 5 most similar labels for the image
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
values, indices = similarity[0].topk(5)

# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{cifar100.classes[index]:>16s}: {100 * value.item():.2f}%")
```

Output:
```

Top predictions:

           snake: 65.43%
          turtle: 12.29%
    sweet_pepper: 3.87%
          lizard: 1.88%
       crocodile: 1.74%
```
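To make the "normalize, scale, softmax, top-k" step in the zero-shot code less opaque, here is the same arithmetic in plain Python on toy vectors. This is only an illustration: the 3-d vectors and class names are invented (real ViT-B/32 features are 512-d), the helper names are mine, and the `scale=100.0` default just mirrors the `100.0 *` factor in the code above.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Cosine similarity between one image and each text, scaled then softmaxed."""
    image_unit = normalize(image_vec)
    sims = [sum(a * b for a, b in zip(image_unit, normalize(t))) for t in text_vecs]
    return softmax([scale * s for s in sims])


# Toy 3-d "embeddings", invented for illustration only.
classes = ["snake", "turtle", "lizard"]
image_vec = [0.9, 0.1, 0.2]
text_vecs = [[1.0, 0.0, 0.1], [0.2, 1.0, 0.0], [0.5, 0.5, 0.5]]

probs = zero_shot_probs(image_vec, text_vecs)
top = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)[:2]
for name, p in top:
    print(f"{name:>16s}: {100 * p:.2f}%")
```

Because the vectors are normalized first, the dot product is a cosine similarity in [-1, 1]; the scale factor spreads those values out so that the softmax produces a peaked distribution instead of near-uniform probabilities.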
