![Roboflow Notebooks banner](https://camo.githubusercontent.com/aec53c2b5fb6ed43d202a0ab622b58ba68a89d654fbe3abab0c0cc8bd1ff424e/68747470733a2f2f696b2e696d6167656b69742e696f2f726f626f666c6f772f6e6f7465626f6f6b732f74656d706c6174652f62616e6e657274657374322d322e706e673f696b2d73646b2d76657273696f6e3d6a6176617363726970742d312e342e33267570646174656441743d31363732393332373130313934)

# Image Classification with DINOv2

DINOv2, released by Meta Research in April 2023, implements a self-supervised method of training computer vision models.

DINOv2 was trained on roughly 142 million unlabeled images. The embeddings generated by DINOv2 can be used for classification, image retrieval, segmentation, and depth estimation. At launch, Meta Research did not release heads for segmentation and depth estimation.

In this guide, we are going to build an image classifier using embeddings from DINOv2. To do so, we will:

1. Load a folder of images
2. Compute embeddings for each image
3. Save all the embeddings to a JSON file
4. Train SVM and random forest classifiers to classify images

By the end of this notebook, we'll have a classifier trained on our dataset.

Without further ado, let's begin!

## Import Packages

First, let's import the packages we will need for this project.

In [1]:
import numpy as np
import torch
import torchvision.transforms as T
from PIL import Image
import os
#import cv2
import json
import glob
from tqdm.notebook import tqdm

In [2]:
import roboflow
#import supervision as sv


In [3]:
cwd = os.getcwd()
cwd

'/workspaces/gc_quant_trading_research'

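Next, we load the folders containing the trading images. Every subfolder that holds PNG files becomes a class, and the folder name (for example `bear`, `bull`, or `range`) is used as the label for the images inside it.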

In [4]:
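cwd = os.getcwd()

ROOT_DIR = cwd

# Map each image path to its label, taken from the name of the folder that contains it.
labels = {}

for folder in os.listdir(ROOT_DIR):
    folder_path = os.path.join(ROOT_DIR, folder)
    print(folder)
    # Skip plain files such as notebooks and README.md; only folders can hold images.
    if not os.path.isdir(folder_path):
        continue
    for file in os.listdir(folder_path):
        if file.endswith(".png"):
            labels[os.path.join(folder_path, file)] = folder

files = labels.keys()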

2D_embedding_viz.ipynb
all_embeddings.json
bear
range
# %% [markdown]
now
Dinov2_classification_gc.ipynb
2d_scatter_plot_4candles.html
2d_scatter_plot.html
bull
README.md
dockerfile
notebooks
.git
data
Now
requirements.txt


In [5]:
list(files)

['/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-17 at 11.42.18.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-18 at 13.20.28.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-17 at 11.27.56.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-20 at 11.00.31.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-17 at 12.06.59.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-18 at 16.48.19.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-20 at 10.59.34.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-20 at 11.05.53.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-18 at 16.28.54.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-18 at 16.29.50.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-18 at 16.40.39.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-09-1

In [6]:
# Collect the label of every file, in the same order as `files`

values = [labels[key] for key in files]

## Load the Model and Compute Embeddings

To train our classifier, we need:

1. The embeddings associated with each image in our dataset, and;
2. The labels associated with each image.

To calculate embeddings, we'll use DINOv2. Below, we load the ViT-B/14 DINOv2 weights (`dinov2_vitb14`), which produce 768-dimensional embeddings, and define functions that load each image and compute an embedding for every image in a specified list.

We store all of our vectors in a dictionary that is saved to disk so we can reference them again if needed. In production environments, a vector index or database (e.g. FAISS) may be a better fit for storing embeddings.
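
To make the idea of looking up stored vectors concrete, here is a minimal sketch of a brute-force cosine-similarity search in plain NumPy. It is not FAISS and not part of the original notebook; the helper name `top_k_similar` and the toy 4-dimensional vectors are assumptions made for illustration.

```python
import numpy as np

def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 3) -> list:
    """Return the row indices of the k vectors in `index` most similar to `query`."""
    # L2-normalize both sides so that a dot product equals cosine similarity.
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm
    # argsort is ascending, so reverse it and keep the first k entries.
    return np.argsort(scores)[::-1][:k].tolist()

# Toy 4-dimensional vectors standing in for 768-dimensional DINOv2 embeddings.
index = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
query = np.array([1.0, 0.05, 0.0, 0.0])

print(top_k_similar(query, index, k=2))
```

A full scan like this is usually fine for a few hundred vectors; a dedicated index starts to pay off as the collection grows.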

In [7]:
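# Despite the variable name, this loads the ViT-B/14 checkpoint (768-dimensional embeddings).
dinov2_vits14 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dinov2_vits14.to(device)
dinov2_vits14.eval()  # the model is only used for inference

# Both sides of the resize target are multiples of 14, the model's patch size.
transform_image = T.Compose([T.ToTensor(),
                             T.Resize((70, 210)),
                             #T.CenterCrop(224),
                             T.Normalize([0.5], [0.5])])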

Using cache found in /home/codespace/.cache/torch/hub/facebookresearch_dinov2_main
xFormers is not available (SwiGLU)
xFormers is not available (Attention)
xFormers is not available (Block)
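
The `/14` in the checkpoint name refers to 14×14-pixel patches, and the resize target used above, `(70, 210)`, has both sides divisible by 14. When adapting the transform to other image shapes, a small helper can pick a nearby compatible size. The sketch below is an illustration only; `snap_to_patch_multiple` is a hypothetical helper, not part of DINOv2 or torchvision.

```python
PATCH_SIZE = 14  # patch size of the DINOv2 "/14" checkpoints

def snap_to_patch_multiple(height: int, width: int, patch_size: int = PATCH_SIZE) -> tuple:
    """Round each side to the nearest multiple of patch_size, keeping at least one patch."""
    def snap(side: int) -> int:
        return max(patch_size, round(side / patch_size) * patch_size)
    return snap(height), snap(width)

print(snap_to_patch_multiple(70, 210))   # already aligned, so unchanged
print(snap_to_patch_multiple(100, 300))  # moved to the closest aligned size
```

The returned pair could be passed to `T.Resize` in place of the hard-coded `(70, 210)`.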


In [8]:
def load_image(img: str) -> torch.Tensor:
    """
    Load an image and return a tensor that can be used as an input to DINOv2.
    """
    # Convert to RGB so RGBA, grayscale, and palette PNGs all yield three channels.
    image = Image.open(img).convert("RGB")

    # Add a batch dimension: (3, H, W) -> (1, 3, H, W).
    transformed_img = transform_image(image).unsqueeze(0)

    return transformed_img

def compute_embeddings(files: list) -> dict:
    """
    Compute a DINOv2 embedding for every image in the specified list of files,
    save the results to all_embeddings.json, and return them as a dictionary.
    """
    all_embeddings = {}

    with torch.no_grad():
        for file in files:
            embeddings = dinov2_vits14(load_image(file).to(device))

            # Store each embedding as a nested list of shape (1, 768) so it is JSON-serializable.
            all_embeddings[file] = embeddings[0].cpu().numpy().reshape(1, -1).tolist()

    with open("all_embeddings.json", "w") as f:
        json.dump(all_embeddings, f)

    return all_embeddings

## Compute Embeddings

The code below computes the embeddings for all the images in our dataset. Every image has to pass through DINOv2, so the runtime grows with the size of the dataset and is noticeably longer on a CPU than on a GPU. The dataset used here contains 567 images.
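
Because the embeddings are written to `all_embeddings.json`, a later session can reload them instead of running every image through the model again. The sketch below shows one way to do that; `load_or_compute` is a hypothetical helper and `fake_compute` is a stand-in for the real embedding function, both introduced here for illustration.

```python
import json
import os
import tempfile

def load_or_compute(path, compute_fn):
    """Return the embeddings stored at `path`, or call compute_fn and save its result."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute_fn()
    with open(path, "w") as f:
        json.dump(result, f)
    return result

# Stand-in for the real embedding function; it records how often it is called.
calls = []
def fake_compute():
    calls.append(1)
    return {"a.png": [[0.1, 0.2]], "b.png": [[0.3, 0.4]]}

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "all_embeddings.json")
    first = load_or_compute(cache_path, fake_compute)   # computes and saves
    second = load_or_compute(cache_path, fake_compute)  # reads the saved file

print(first == second, len(calls))
```

In this notebook the call would look like `load_or_compute("all_embeddings.json", lambda: compute_embeddings(files))`. The cached file goes stale when images are added or removed, so it should be deleted whenever the dataset changes.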

In [9]:
embeddings = compute_embeddings(files)

In [10]:
# Get the embedding stored under the first dictionary key
key = list(embeddings.keys())[0]
embeddings[key]

[[-0.5546050667762756,
  2.2193191051483154,
  -1.4536488056182861,
  0.6365244388580322,
  -1.1241151094436646,
  1.063398838043213,
  -0.9994652271270752,
  0.5529789924621582,
  -0.41774314641952515,
  -1.356476068496704,
  -1.9866816997528076,
  1.357814908027649,
  -0.5309105515480042,
  -0.5723036527633667,
  0.13733233511447906,
  2.688227415084839,
  -1.3967474699020386,
  -0.44241195917129517,
  -0.7701922655105591,
  3.5042202472686768,
  0.19287927448749542,
  2.302384853363037,
  0.8562555313110352,
  -1.2798537015914917,
  -0.1325562745332718,
  -1.6369308233261108,
  0.6050942540168762,
  0.671090304851532,
  -1.2835232019424438,
  2.287829875946045,
  -0.7349759340286255,
  2.852987051010132,
  -0.08251584321260452,
  0.5362967252731323,
  -2.177088499069214,
  -1.5714917182922363,
  0.7644075155258179,
  -3.375225305557251,
  0.41770511865615845,
  -2.284717321395874,
  -2.1978302001953125,
  0.6095972657203674,
  0.5886192321777344,
  -3.0735816955566406,
  -0.14814707

In [11]:
embedding_list = list(embeddings.values())
embedding_arr = np.array(embedding_list).reshape(-1, 768)

In [12]:
np.array(embedding_list).reshape(-1, 768).shape

(567, 768)

In [13]:
# Check the shape of embedding_list before reshaping
embedding_array = np.array(embedding_list)
print(f"Original shape: {embedding_array.shape}")
print(f"Total number of elements: {embedding_array.size}")

# Attempt to reshape
reshaped_array = embedding_array.reshape(-1, 768)
print(f"Reshaped array shape: {reshaped_array.shape}")

Original shape: (567, 1, 768)
Total number of elements: 435456
Reshaped array shape: (567, 768)


## Train a Classification Model

The embeddings we have computed can be used as the input to a classification model. For this guide, we fit two classifiers: a support vector machine (scikit-learn's `SVC`, which uses a non-linear RBF kernel by default) and a random forest.

Below, we pair each embedding with its label and fit both models on the result.

In [14]:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier

# Fit an SVM model (SVC uses an RBF kernel by default)
def fit_svm(embeddings, labels):
    clf = svm.SVC(gamma='scale')
    # Iterate over the embeddings dict itself so every row of X lines up with its label.
    y = [labels[file] for file in embeddings]
    X = np.array(list(embeddings.values())).reshape(len(embeddings), -1)
    clf.fit(X, y)
    return clf

clf_svm = fit_svm(embeddings, labels)


# Fit a random forest model
def fit_rf(embeddings, labels):
    rf = RandomForestClassifier(n_estimators=1000)
    # Same pairing as above: one row of features per file, one label per row.
    y = [labels[file] for file in embeddings]
    X = np.array(list(embeddings.values())).reshape(len(embeddings), -1)
    rf.fit(X, y)
    return rf

clf_rf = fit_rf(embeddings, labels)
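
Both classifiers above are fit on every labeled image, so the notebook gives no estimate of how well they generalize. One common check is to hold out part of the data and measure accuracy on it. The sketch below does this with NumPy only; `holdout_accuracy` is a hypothetical helper, the split is random rather than stratified, and `NearestCentroid` is a tiny stand-in classifier defined here so the example runs without scikit-learn.

```python
import numpy as np

class NearestCentroid:
    """Stand-in classifier exposing the same fit/predict interface as scikit-learn estimators."""
    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class.
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from every sample to every class centroid.
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]

def holdout_accuracy(clf, X, y, test_fraction=0.25, seed=0):
    """Fit clf on a random subset of the data and return accuracy on the held-out rest."""
    X, y = np.asarray(X), np.asarray(y)
    order = np.random.default_rng(seed).permutation(len(y))
    n_test = max(1, int(len(y) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    clf.fit(X[train_idx], y[train_idx])
    return float((clf.predict(X[test_idx]) == y[test_idx]).mean())

# Synthetic, well-separated 2-D points standing in for real embeddings.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-5.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])
y = ["bear"] * 20 + ["bull"] * 20

print(holdout_accuracy(NearestCentroid(), X, y))
```

Since the helper only relies on `fit` and `predict`, an unfitted `svm.SVC(gamma='scale')` or `RandomForestClassifier` could be passed in the same way, together with the reshaped embedding array and its labels.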

## Classify an Image

We now have a classifier we can use to classify images!

The cell below collects every PNG file in the `Now` folder. To classify other images, change the glob pattern so that it points at them.

Then, run the cells to classify each image with both models.

In [19]:
# Collect every PNG file in the folder titled "Now"
input_files = glob.glob("Now/*.png")


In [20]:
import pandas as pd

# Collect one row of predictions per image
predictions_df = pd.DataFrame(columns=['image_file', 'svm_prediction', 'rf_prediction'])

for input_file in input_files:
    new_image = load_image(input_file)
    print(input_file)

    with torch.no_grad():
        embedding = dinov2_vits14(new_image.to(device))

    # Flatten the embedding into a single row of features for scikit-learn
    features = embedding[0].cpu().numpy().reshape(1, -1)

    # Generate predictions
    svm_prediction = clf_svm.predict(features)
    rf_prediction = clf_rf.predict(features)

    # Append the predictions as a new row using loc
    predictions_df.loc[len(predictions_df)] = [input_file, svm_prediction[0], rf_prediction[0]]

    print()
    print("SVM Predicted class: " + svm_prediction[0])
    print("RF Predicted class: " + rf_prediction[0])

# Display the DataFrame
predictions_df

Now/Screenshot 2024-09-18 at 16.59.45.png

SVM Predicted class: bull
RF Predicted class: bear
Now/Screenshot 2024-09-18 at 16.33.27.png

SVM Predicted class: bull
RF Predicted class: bull
Now/Screenshot 2024-09-18 at 16.42.03.png

SVM Predicted class: bull
RF Predicted class: bull
Now/Screenshot 2024-09-18 at 16.33.39.png

SVM Predicted class: bull
RF Predicted class: bull
Now/Screenshot 2024-09-18 at 16.59.55.png

SVM Predicted class: bull
RF Predicted class: bull
Now/Screenshot 2024-09-18 at 16.33.06.png

SVM Predicted class: bull
RF Predicted class: bull
Now/Screenshot 2024-09-18 at 16.42.27.png

SVM Predicted class: bull
RF Predicted class: bull
Now/Screenshot 2024-09-18 at 16.59.36.png

SVM Predicted class: bull
RF Predicted class: bull


| | image_file | svm_prediction | rf_prediction |
|---|---|---|---|
| 0 | Now/Screenshot 2024-09-18 at 16.59.45.png | bull | bear |
| 1 | Now/Screenshot 2024-09-18 at 16.33.27.png | bull | bull |
| 2 | Now/Screenshot 2024-09-18 at 16.42.03.png | bull | bull |
| 3 | Now/Screenshot 2024-09-18 at 16.33.39.png | bull | bull |
| 4 | Now/Screenshot 2024-09-18 at 16.59.55.png | bull | bull |
| 5 | Now/Screenshot 2024-09-18 at 16.33.06.png | bull | bull |
| 6 | Now/Screenshot 2024-09-18 at 16.42.27.png | bull | bull |
| 7 | Now/Screenshot 2024-09-18 at 16.59.36.png | bull | bull |
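
With two models predicting on the same images, a quick sanity check is to see how often they agree and which images they disagree on. The sketch below uses the eight rows from the output above, copied into a plain list; the variable names are chosen here for illustration.

```python
# (image_file, svm_prediction, rf_prediction) rows copied from the output above
predictions = [
    ("Now/Screenshot 2024-09-18 at 16.59.45.png", "bull", "bear"),
    ("Now/Screenshot 2024-09-18 at 16.33.27.png", "bull", "bull"),
    ("Now/Screenshot 2024-09-18 at 16.42.03.png", "bull", "bull"),
    ("Now/Screenshot 2024-09-18 at 16.33.39.png", "bull", "bull"),
    ("Now/Screenshot 2024-09-18 at 16.59.55.png", "bull", "bull"),
    ("Now/Screenshot 2024-09-18 at 16.33.06.png", "bull", "bull"),
    ("Now/Screenshot 2024-09-18 at 16.42.27.png", "bull", "bull"),
    ("Now/Screenshot 2024-09-18 at 16.59.36.png", "bull", "bull"),
]

# Keep only the rows where the two models give different labels.
disagreements = [row for row in predictions if row[1] != row[2]]
agreement_rate = 1 - len(disagreements) / len(predictions)

print(f"Agreement: {agreement_rate:.3f}")
for image_file, svm_label, rf_label in disagreements:
    print(f"{image_file}: SVM={svm_label}, RF={rf_label}")
```

Here the models agree on seven of the eight images. Agreement alone does not show that either model is correct, only where a closer look at the image is worthwhile.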
