![Roboflow Notebooks banner](https://camo.githubusercontent.com/aec53c2b5fb6ed43d202a0ab622b58ba68a89d654fbe3abab0c0cc8bd1ff424e/68747470733a2f2f696b2e696d6167656b69742e696f2f726f626f666c6f772f6e6f7465626f6f6b732f74656d706c6174652f62616e6e657274657374322d322e706e673f696b2d73646b2d76657273696f6e3d6a6176617363726970742d312e342e33267570646174656441743d31363732393332373130313934)

# Image Classification with DINOv2

DINOv2, released by Meta Research in April 2023, implements a self-supervised method of training computer vision models.

DINOv2 was trained on 142 million images without labels. The embeddings generated by DINOv2 can be used for classification, image retrieval, segmentation, and depth estimation. That said, Meta Research did not release heads for segmentation and depth estimation with the initial release.

In this guide, we are going to build an image classifier using embeddings from DINOv2. To do so, we will:

1. Load a folder of images
2. Compute embeddings for each image
3. Save all the embeddings to a JSON file
4. Train SVM and random forest classifiers to classify images

By the end of this notebook, we'll have a classifier trained on our dataset.

Without further ado, let's begin!

## Import Packages

First, let's import the packages we will need for this project.

In [1]:
import numpy as np
import torch
import torchvision.transforms as T
from PIL import Image
import os
#import cv2
import json
import glob
from tqdm.notebook import tqdm

In [2]:
cwd = os.getcwd()
cwd

'/workspaces/gc_quant_trading_research'

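Next, we load the folders containing the trading images. The name of each folder (for example, `bear` or `bull`) is used as the label for the images inside it.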

In [3]:
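ROOT_DIR = os.getcwd()

# Map each image path to its label, which is the name of the folder containing it.
labels = {}

for folder in os.listdir(ROOT_DIR):
    # Skip the held-out test set; it is only used for evaluation later.
    # Every other folder that contains .png files contributes a label.
    if folder == 'test':
        continue
    print(folder)
    try:
        for file in os.listdir(os.path.join(ROOT_DIR, folder)):
            if file.endswith(".png"):
                full_name = os.path.join(ROOT_DIR, folder, file)
                labels[full_name] = folder
    except NotADirectoryError:
        # Top-level entries that are plain files (e.g. README.md) cannot be listed.
        pass

files = labels.keys()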

2D_embedding_viz.ipynb
all_embeddings.json
bear
# %% [markdown]
now
Dinov2_classification_gc.ipynb
classification_app.py
2d_scatter_plot_4candles.html
2d_scatter_plot.html
bull
README.md
models
ml_experiment_config.toml
dockerfile
notebooks
.git
data
requirements.txt


In [None]:
list(files)

['/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.38.50.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.47.46.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.51.20.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.48.14.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.43.38.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-11 at 11.43.44.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.41.04.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.39.06.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.45.49.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.42.40.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-10 at 14.49.11.png',
 '/workspaces/gc_quant_trading_research/bear/Screenshot 2024-10-1

In [5]:
# Collect the label of every image, in the same order as `files`

values = [labels[key] for key in files]
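
Before computing embeddings, it can be worth checking how many images each label has, since a heavily imbalanced dataset would skew the accuracy figures reported later. The following is a minimal sketch, assuming a dictionary shaped like the `labels` dictionary built above. The file paths and the names `example_labels` and `count_labels` are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical stand-in for the `labels` dict built above (image path -> folder name).
example_labels = {
    "/data/bear/chart_001.png": "bear",
    "/data/bear/chart_002.png": "bear",
    "/data/bull/chart_001.png": "bull",
}

def count_labels(labels: dict) -> Counter:
    """Count how many files are assigned to each label."""
    return Counter(labels.values())

label_counts = count_labels(example_labels)
for label, count in sorted(label_counts.items()):
    print(f"{label}: {count} images")
```

In the notebook itself, the same check could be run as `count_labels(labels)`.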

## Load the Model and Compute Embeddings

To train our classifier, we need:

1. The embeddings associated with each image in our dataset, and;
2. The labels associated with each image.

To calculate embeddings, we'll use DINOv2. Below, we load the ViT-B/14 DINOv2 weights (`dinov2_vitb14`) and define functions that will load and compute embeddings for every image in a specified list.

We store all of our vectors in a dictionary that is saved to disk so we can reference them again if needed. Note that in production environments one may opt for another data structure, such as a vector index or database (e.g. Faiss), for storing embeddings.

In [6]:
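MODEL = "dinov2_vitb14"
EMBEDDING_SIZE = 768 # 768 for vitb14, 384 for vits14

# Note: the variable is named `dinov2_vits14`, but it holds whichever model MODEL selects.
dinov2_vits14 = torch.hub.load("facebookresearch/dinov2", MODEL)

device = torch.device('cuda' if torch.cuda.is_available() else "cpu")

dinov2_vits14.to(device)
dinov2_vits14.eval()  # the model is only used for inference

# Resize to a fixed 70 x 210 input, then scale pixel values to [-1, 1].
transform_image = T.Compose([T.ToTensor(),
                             T.Resize((70, 210)),
                             #T.CenterCrop(224),
                             T.Normalize([0.5], [0.5])])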

Using cache found in /home/codespace/.cache/torch/hub/facebookresearch_dinov2_main
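
The `T.Resize((70, 210))` call above is not arbitrary: the `14` in `dinov2_vitb14` is the ViT patch size, and the model's patch embedding expects an input height and width that are multiples of it (70 = 5 × 14, 210 = 15 × 14). The sketch below shows one way to pick a compatible size for images with other dimensions. `snap_to_patch_multiple` and `compatible_size` are hypothetical helpers, not part of DINOv2 or torchvision.

```python
PATCH_SIZE = 14  # patch size of the ViT-*/14 DINOv2 models

def snap_to_patch_multiple(size: int, patch: int = PATCH_SIZE) -> int:
    """Round `size` to the nearest multiple of `patch`, keeping at least one patch."""
    return max(patch, round(size / patch) * patch)

def compatible_size(height: int, width: int) -> tuple:
    """Return a (height, width) pair in which both values are multiples of the patch size."""
    return snap_to_patch_multiple(height), snap_to_patch_multiple(width)

print(compatible_size(70, 210))   # (70, 210): already compatible
print(compatible_size(100, 300))  # (98, 294)
```

The result could then be passed to `T.Resize` in place of the hard-coded `(70, 210)`.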


In [7]:
def load_image(img: str) -> torch.Tensor:
    """
    Load an image and return a tensor that can be used as an input to DINOv2.
    """
    img = Image.open(img)

    transformed_img = transform_image(img)[:3].unsqueeze(0)

    return transformed_img

def compute_embeddings(files: list) -> dict:
    """
    Compute a DINOv2 embedding for every image in the specified list of files,
    save the results to all_embeddings.json, and return them as a dictionary.
    """
    all_embeddings = {}

    with torch.no_grad():
        for file in files:
            embeddings = dinov2_vits14(load_image(file).to(device))

            # Store each embedding as a nested list of shape (1, EMBEDDING_SIZE) so it can be written as JSON
            all_embeddings[file] = embeddings[0].cpu().numpy().reshape(1, -1).tolist()

    with open("all_embeddings.json", "w") as f:
        f.write(json.dumps(all_embeddings))

    return all_embeddings
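
Because `compute_embeddings` writes the vectors to `all_embeddings.json`, they can be reloaded later without running the model again. The sketch below reloads such a file and finds the most similar images by brute-force cosine similarity in NumPy. It is a simple stand-in for a dedicated vector index, not Faiss itself, and the helper names and the toy two-dimensional vectors are assumptions made for illustration.

```python
import json
import os
import tempfile

import numpy as np

def load_embedding_matrix(path: str):
    """Load a JSON file in the format written by `compute_embeddings`."""
    with open(path) as f:
        stored = json.load(f)
    names = list(stored.keys())
    # Each value is a nested list of shape (1, D); flatten it to one row per image.
    matrix = np.array([np.ravel(vec) for vec in stored.values()], dtype=np.float32)
    return names, matrix

def most_similar(query: np.ndarray, matrix: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k rows with the highest cosine similarity to `query`."""
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = matrix_unit @ query_unit
    return np.argsort(-scores)[:k]

# Toy data standing in for real DINOv2 embeddings.
toy = {"a.png": [[1.0, 0.0]], "b.png": [[0.0, 1.0]], "c.png": [[0.9, 0.1]]}
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_embeddings.json")
    with open(toy_path, "w") as f:
        json.dump(toy, f)
    names, matrix = load_embedding_matrix(toy_path)

nearest = most_similar(matrix[0], matrix, k=2)
print([names[i] for i in nearest])  # ['a.png', 'c.png']
```

For a few hundred images this brute-force search is more than fast enough; a dedicated index only starts to pay off with much larger collections.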

## Compute Embeddings

The code below computes the embeddings for all the images in our dataset. Each image requires one forward pass through DINOv2, so the runtime grows with the number of images (214 in this run, as the array shapes printed further down show) and depends on whether a GPU is available.

In [8]:
embeddings = compute_embeddings(files)

In [9]:
# Inspect the embedding stored for the first file
key = list(embeddings.keys())[0]
embeddings[key]

[[0.25739359855651855,
  0.5484534502029419,
  -0.7849557399749756,
  -0.18262524902820587,
  -2.0091757774353027,
  0.296808660030365,
  0.6047267913818359,
  -2.1983249187469482,
  2.2300755977630615,
  -2.392838954925537,
  -1.7101702690124512,
  0.5479646325111389,
  0.7578775882720947,
  0.3057825267314911,
  -1.2586312294006348,
  2.2911462783813477,
  -0.7474812269210815,
  -1.7895296812057495,
  1.9603365659713745,
  3.4897568225860596,
  -1.0829657316207886,
  2.063508987426758,
  2.0953269004821777,
  -2.214665412902832,
  -1.171596884727478,
  -1.6639513969421387,
  -0.5190342664718628,
  0.02386048622429371,
  -2.6689791679382324,
  1.3841900825500488,
  -1.2854022979736328,
  1.039711594581604,
  -1.8084765672683716,
  -0.49311473965644836,
  -1.2385468482971191,
  -2.54644513130188,
  1.5180877447128296,
  -0.45519161224365234,
  0.41101229190826416,
  -1.9436042308807373,
  0.9734461903572083,
  -0.19821783900260925,
  1.0360254049301147,
  -0.9413670897483826,
  -0.9999

In [10]:
# Check the shape of embedding_list before reshaping
embedding_list = list(embeddings.values())
embedding_array = np.array(embedding_list)
print(f"Original shape: {embedding_array.shape}")
print(f"Total number of elements: {embedding_array.size}")

# Attempt to reshape
reshaped_array = embedding_array.reshape(-1, EMBEDDING_SIZE)
print(f"Reshaped array shape: {reshaped_array.shape}")

Original shape: (214, 1, 768)
Total number of elements: 164352
Reshaped array shape: (214, 768)
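
The extra middle dimension in `(214, 1, 768)` comes from `compute_embeddings`, which stores each vector as a nested `1 × 768` list. The small NumPy sketch below, using made-up toy sizes instead of 214 and 768, illustrates why `reshape(-1, EMBEDDING_SIZE)` yields one row per image.

```python
import numpy as np

n_images, embedding_size = 4, 6  # toy sizes, standing in for 214 and 768

# Each stored embedding is a nested list of shape (1, embedding_size).
stored = [np.zeros((1, embedding_size)).tolist() for _ in range(n_images)]

stacked = np.array(stored)
print(stacked.shape)  # (4, 1, 6)

# -1 lets NumPy infer the number of rows from the total element count.
flat = stacked.reshape(-1, embedding_size)
print(flat.shape)  # (4, 6)

# Dropping the singleton axis explicitly gives the same result.
assert np.array_equal(flat, stacked.squeeze(axis=1))
```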


## Train a Classification Model

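The embeddings we have computed can be used as an input in a classification model. For this guide, we train four variants and compare them: a support vector machine (SVM) and a random forest, each fitted once on the raw embeddings and once on embeddings reduced to 100 principal components with PCA. Note that `svm.SVC` uses a non-linear RBF kernel by default, so the SVM here is not a linear classifier.

Below, we make lists of both all of the embeddings we have computed and their associated labels. We then fit our models using those lists.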

In [11]:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

y = [labels[file] for file in files]
embedding_list = list(embeddings.values())
embedding_arr = np.array(embedding_list).reshape(-1, EMBEDDING_SIZE)
N_COMPONENTS = 100
N_ESTIMATORS = 100
MAX_DEPTH = 100
R_STATE = 5

#fit a svm model with pca
def fit_svm_pca(embedding_list, y):
    clf = svm.SVC(gamma='scale')
    # Convert embedding_list to a 2D array
    embedding_array = np.array(embedding_list).reshape(len(embedding_list), -1)
    
    # Apply PCA to reduce dimensions to x principal components
    pca = PCA(n_components=N_COMPONENTS, random_state=R_STATE)
    embedding_list = pca.fit_transform(embedding_array)
    
    clf.fit(embedding_list, y)
    return clf, pca

clf_svm_pca, pca = fit_svm_pca(embedding_list, y)


#fit a svm model on the raw embeddings
def fit_svm(embedding_list, y):
    clf = svm.SVC(gamma='scale')
    clf.fit(np.array(embedding_list).reshape(-1, EMBEDDING_SIZE), y)
    return clf

clf_svm = fit_svm(embedding_list, y)


#fit a random forest model
def fit_rf(embedding_list, y):
    rf = RandomForestClassifier(n_estimators=N_ESTIMATORS, max_depth=MAX_DEPTH, random_state=R_STATE) #70
    rf.fit(np.array(embedding_list).reshape(-1, EMBEDDING_SIZE), y)
    return rf

clf_rf = fit_rf(embedding_list, y)
#save this rf model


#fit a random forest model with pca
def fit_rf_pca(embedding_list, y):
    rf = RandomForestClassifier(n_estimators=N_ESTIMATORS, max_depth=MAX_DEPTH, random_state=R_STATE) #70

    # Convert embedding_list to a 2D array
    embedding_array = np.array(embedding_list).reshape(len(embedding_list), -1)
    
    # Apply PCA to reduce dimensions to x principal components
    pca = PCA(n_components=N_COMPONENTS, random_state=R_STATE)
    embedding_list = pca.fit_transform(embedding_array)

    rf.fit(embedding_list, y)
    return rf, pca

clf_rf_pca, rf_pca = fit_rf_pca(embedding_list, y)
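
Because the PCA-based classifiers only accept inputs projected with the same fitted PCA, one option is to bundle the projection and the classifier into a single scikit-learn `Pipeline`. The sketch below uses random data in place of the DINOv2 embeddings, with smaller, made-up sizes. It is an alternative arrangement, not the approach used in this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Synthetic stand-ins for the embedding matrix and labels (assumed shapes).
X_train = rng.normal(size=(40, 32))
y_train = np.array(["bear", "bull"] * 20)

# PCA and the SVM are fitted together; predict() applies the same projection.
svm_pca_pipeline = make_pipeline(
    PCA(n_components=10, random_state=5),
    SVC(gamma="scale"),
)
svm_pca_pipeline.fit(X_train, y_train)

X_new = rng.normal(size=(3, 32))
predictions = svm_pca_pipeline.predict(X_new)
print(predictions.shape)  # (3,)
```

With this arrangement there is a single object to fit, save, and call, so the projection and the classifier cannot drift apart.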




In [12]:
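import joblib

# Set to True to persist the random forest and the PCA it was trained with.
SAVE_MODEL = False

if SAVE_MODEL:
    os.makedirs('models/rf_pca', exist_ok=True)
    joblib.dump(clf_rf_pca, 'models/rf_pca/best_rf_model_oct1st.pkl')
    joblib.dump(rf_pca, 'models/rf_pca/best_pca_model.pkl')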

## Classify an Image

We now have classifiers we can use to classify images!

The cells below collect every `.png` file under `test/bear` and `test/bull`, classify each image with all four models, and compare the predictions against the labels implied by the folder names.

In [13]:
# Collect the held-out test images: first the bear folder, then the bull folder
TEST_FOLDER = "bear"
input_files = glob.glob(f'test/{TEST_FOLDER}/*.png')

for file in glob.glob('test/bull/*.png'):
    input_files.append(file)


In [14]:
input_files

['test/bear/Screenshot 2024-10-11 at 12.01.28.png',
 'test/bear/Screenshot 2024-10-11 at 12.01.39.png',
 'test/bear/Screenshot 2024-10-11 at 11.59.10.png',
 'test/bear/Screenshot 2024-10-11 at 11.59.49.png',
 'test/bear/Screenshot 2024-10-11 at 12.02.52.png',
 'test/bear/Screenshot 2024-10-11 at 12.00.53.png',
 'test/bear/Screenshot 2024-10-11 at 12.01.07.png',
 'test/bear/Screenshot 2024-10-11 at 11.59.19.png',
 'test/bear/Screenshot 2024-10-11 at 12.01.56.png',
 'test/bear/Screenshot 2024-10-11 at 12.03.09.png',
 'test/bear/Screenshot 2024-10-11 at 11.59.31.png',
 'test/bear/Screenshot 2024-10-11 at 12.00.44.png',
 'test/bear/Screenshot 2024-10-11 at 12.02.05.png',
 'test/bear/Screenshot 2024-10-11 at 12.01.47.png',
 'test/bear/Screenshot 2024-10-11 at 12.02.28.png',
 'test/bear/Screenshot 2024-10-11 at 12.02.43.png',
 'test/bear/Screenshot 2024-10-11 at 12.03.18.png',
 'test/bear/Screenshot 2024-10-11 at 12.01.21.png',
 'test/bear/Screenshot 2024-10-11 at 12.02.23.png',
 'test/bear/

In [15]:
import pandas as pd

# Initialize an empty DataFrame with one prediction column per model
predictions_df = pd.DataFrame(columns=['image_file', 'svm_prediction', 'svm_pca_prediction', 'rf_prediction', 
                                       'rf_pca_prediction'])



for input_file in input_files:
    new_image = load_image(input_file)

    with torch.no_grad():
        new_embedding = dinov2_vits14(new_image.to(device))

        # Convert embedding to numpy array and reshape
        new_embedding_array = np.array(new_embedding[0].cpu()).reshape(1, -1)

        # Project with the PCA that each PCA-based classifier was trained on
        new_embedding_pca = pca.transform(new_embedding_array)
        new_embedding_rf_pca = rf_pca.transform(new_embedding_array)

        # Generate predictions with each trained model
        svm_pca_prediction = clf_svm_pca.predict(new_embedding_pca)
        svm_prediction = clf_svm.predict(new_embedding_array)
        rf_prediction = clf_rf.predict(new_embedding_array)
        rf_pca_prediction = clf_rf_pca.predict(new_embedding_rf_pca)

        # Add the predictions to the DataFrame using loc
        predictions_df.loc[len(predictions_df)] = [input_file, svm_prediction[0],svm_pca_prediction[0],
                                                    rf_prediction[0], rf_pca_prediction[0]]


In [16]:
from sklearn.metrics import accuracy_score, confusion_matrix

# Extract true labels from the parent folder of each file path (test/<label>/<file>)
predictions_df['true_label'] = predictions_df['image_file'].apply(lambda x: os.path.basename(os.path.dirname(x)))


# Compute accuracy for SVM and RF predictions
svm_accuracy = accuracy_score(predictions_df['true_label'], predictions_df['svm_prediction'])
svm_pca_accuracy = accuracy_score(predictions_df['true_label'], predictions_df['svm_pca_prediction'])
rf_accuracy = accuracy_score(predictions_df['true_label'], predictions_df['rf_prediction'])
rf_pca_accuracy = accuracy_score(predictions_df['true_label'], predictions_df['rf_pca_prediction'])


print('ALL')
print('--------------------------------------------------')
print(f"SVM Accuracy: {round(svm_accuracy, 2)}")
print(f"RF Accuracy: {round(rf_accuracy,2)}")
print(f"SVM PCA Accuracy: {round(svm_pca_accuracy,2)}")
print(f"RF PCA Accuracy: {round(rf_pca_accuracy,2)}")
print('--------------------------------------------------')


predictions_df

ALL
--------------------------------------------------
SVM Accuracy: 0.47
RF Accuracy: 0.58
SVM PCA Accuracy: 0.5
RF PCA Accuracy: 0.56
--------------------------------------------------


| | image_file | svm_prediction | svm_pca_prediction | rf_prediction | rf_pca_prediction | true_label |
|---|---|---|---|---|---|---|
| 0 | test/bear/Screenshot 2024-10-11 at 12.01.28.png | bear | bear | bear | bull | bear |
| 1 | test/bear/Screenshot 2024-10-11 at 12.01.39.png | bull | bull | bear | bull | bear |
| 2 | test/bear/Screenshot 2024-10-11 at 11.59.10.png | bear | bear | bear | bull | bear |
| 3 | test/bear/Screenshot 2024-10-11 at 11.59.49.png | bear | bear | bear | bear | bear |
| 4 | test/bear/Screenshot 2024-10-11 at 12.02.52.png | bear | bear | bear | bear | bear |
| 5 | test/bear/Screenshot 2024-10-11 at 12.00.53.png | bull | bull | bull | bear | bear |
| 6 | test/bear/Screenshot 2024-10-11 at 12.01.07.png | bull | bull | bear | bear | bear |
| 7 | test/bear/Screenshot 2024-10-11 at 11.59.19.png | bear | bear | bear | bear | bear |
| 8 | test/bear/Screenshot 2024-10-11 at 12.01.56.png | bear | bear | bear | bear | bear |
| 9 | test/bear/Screenshot 2024-10-11 at 12.03.09.png | bull | bull | bull | bear | bear |
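
Assuming the test set is roughly balanced between the two classes, accuracies between 0.47 and 0.58 are close to what random guessing would give, so a per-class breakdown is more informative than a single number. The evaluation cell imports `confusion_matrix` but does not call it. As a dependency-free sketch of the same idea, the snippet below tallies (true label, predicted label) pairs for the ten `svm_prediction` rows visible above. Those ten rows are all `bear` images, so this is only an illustration, not a result for the full test set, and `tally_confusion` is a hypothetical helper.

```python
from collections import Counter

# svm_prediction values for the ten rows shown above (all true labels are "bear").
true_labels = ["bear"] * 10
svm_predictions = ["bear", "bull", "bear", "bear", "bear",
                   "bull", "bull", "bear", "bear", "bull"]

def tally_confusion(y_true, y_pred) -> Counter:
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))

confusion = tally_confusion(true_labels, svm_predictions)
for (true, pred), count in sorted(confusion.items()):
    print(f"true={true} predicted={pred}: {count}")
```

On the full DataFrame, the equivalent scikit-learn call would be `confusion_matrix(predictions_df['true_label'], predictions_df['svm_prediction'])`.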
