# Match Steam Images

- https://github.com/woctezuma/steam-DINOv2

## Installation

In [None]:
%cd /content
!git clone https://github.com/woctezuma/steam-DINOv2.git
%cd steam-DINOv2
%pip install --quiet -r requirements.txt

## Download the image dataset

In [None]:
%cd /content

!curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/input/images_partA.tar.gz
!curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/input/images_partB.tar.gz

!tar xzf images_partA.tar.gz
!tar xzf images_partB.tar.gz

!curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/input/apps.json
!curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/input/filtered_indices.json

## Pick a DINOv2 model

- https://github.com/facebookresearch/dinov2#pretrained-models

In [None]:
all_model_names = ['dinov2_vits14', 'dinov2_vitb14', 'dinov2_vitl14']
# Index 0 selects the smallest model (ViT-S/14).
model_name = all_model_names[0]

## Extract features

- https://github.com/woctezuma/feature-extractor

In [None]:
# Set to True to recompute the features; otherwise, pre-computed features are downloaded.
extract_features_from_scratch = False

if extract_features_from_scratch:
  %cd /content
  !git clone https://github.com/woctezuma/feature-extractor.git
  %cd feature-extractor
  %pip install --quiet -r requirements.txt

  !python extract_fts.py \
  --data_dir /content/images --batch_size 256 \
  --resize_size 224 --keep_ratio --crop_size 224 \
  --model_repo "facebookresearch/dinov2" --model_name {model_name} \
  --torch_features fts_{model_name}.pth \
  --numpy_features fts_{model_name}.npy

else:
  %mkdir -p /content/feature-extractor/features/
  %cd /content/feature-extractor/features/

  !curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/features/fts_{model_name}.npy
  !curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/features/img_list.json

## Match features

- https://github.com/woctezuma/feature-matcher
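
Matching amounts to a nearest-neighbor search among feature vectors. The following is a minimal NumPy sketch of that idea on toy data, assuming cosine similarity on L2-normalized features; the metric actually used by `feature-matcher` is not stated in this notebook, and `top_k_neighbors` is a hypothetical helper, not part of that repository.

```python
import numpy as np

def top_k_neighbors(features, k):
    """Return (indices, scores) of the k most similar rows for every row.

    Hypothetical helper for illustration only. It builds the full
    similarity matrix, so it is only suitable for small inputs.
    """
    # L2-normalize so that the dot product equals the cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = features / np.clip(norms, 1e-12, None)
    similarity = normalized @ normalized.T
    # Sort each row by decreasing similarity and keep the first k columns.
    indices = np.argsort(-similarity, axis=1)[:, :k]
    scores = np.take_along_axis(similarity, indices, axis=1)
    return indices, scores

rng = np.random.default_rng(0)
toy_features = rng.normal(size=(5, 8)).astype(np.float32)
# Make row 4 a scaled copy of row 0: scaling does not change cosine similarity.
toy_features[4] = 2.0 * toy_features[0]

neighbors, scores = top_k_neighbors(toy_features, k=3)
print(neighbors[0], scores[0])
```

In this sketch each row is compared with every row, including itself, so a query always appears among its own neighbors with a score close to 1.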

In [None]:
# Set to True to recompute the matches; otherwise, pre-computed matches are downloaded.
match_features_from_scratch = False

if match_features_from_scratch:
  %cd /content
  !git clone https://github.com/woctezuma/feature-matcher.git
  %cd feature-matcher
  %pip install --quiet -r requirements.txt

  !python match_fts.py \
  --input_dir /content/feature-extractor/features \
  --feature_filename fts_{model_name}.npy \
  --numpy_matches matches_{model_name}.npy \
  --numpy_similarity_scores scores_{model_name}.npy \
  --num_neighbors 10

else:
  %mkdir -p /content/feature-matcher/matches/
  %cd /content/feature-matcher/matches/

  !curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/matches/matches_{model_name}.npy
  !curl -OL https://github.com/woctezuma/steam-DINOv2/releases/download/matches/scores_{model_name}.npy

## Define functions

In [None]:
%cd /content/steam-DINOv2

In [None]:
from src.data_utils import load_data, APP_LIST_FNAME, FILTERED_INDEX_FNAME

DATA_FOLDER = "/content"

def load_apps():
  fname = f"{DATA_FOLDER}/{APP_LIST_FNAME}"
  return load_data(fname)

def load_indices():
  fname = f"{DATA_FOLDER}/{FILTERED_INDEX_FNAME}"
  return load_data(fname)

In [None]:
import numpy as np

def load_precomputed_embeddings(model_name):
  fname = f'/content/feature-extractor/features/fts_{model_name}.npy'
  return np.load(fname)

def load_precomputed_matches(model_name):
  fname = f"/content/feature-matcher/matches/matches_{model_name}.npy"
  return np.load(fname)

In [None]:
from src.match_utils import build_faiss_index

def load_faiss_index(model_name):
  embeddings = load_precomputed_embeddings(model_name)
  return build_faiss_index(embeddings)

## Process a query image

The query image is downloaded and processed on the fly.
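
To give a rough idea of what the preprocessing does to the image geometry, here is a sketch in plain Python. It assumes that `keep_ratio=True` scales the shorter side to `resize_size` before a centered square crop of `crop_size`; this is an assumption about `get_transform`, whose code is not shown here. The helper names and the 460x215 input size are hypothetical.

```python
def resized_size(width, height, resize_size):
    """Scale so that the shorter side equals resize_size, keeping the aspect ratio."""
    if width <= height:
        return resize_size, int(resize_size * height / width)
    return int(resize_size * width / height), resize_size

def center_crop_box(width, height, crop_size):
    """Return (left, top, right, bottom) of a centered square crop."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size

# Hypothetical wide input image of 460x215 pixels.
w, h = resized_size(460, 215, resize_size=224)
box = center_crop_box(w, h, crop_size=224)
print((w, h), box)  # (479, 224) (127, 0, 351, 224)
```

Under these assumptions, a wide image keeps its full height but loses its left and right edges to the crop.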

In [None]:
from src.transform_utils import get_transform

preprocess = get_transform(resize_size=224, keep_ratio=True, crop_size=224)

In [None]:
import torch

model = torch.hub.load('facebookresearch/dinov2', model_name).cuda()

In [None]:
base_apps = load_apps()
base_indices = load_indices()
index = load_faiss_index(model_name)

In [None]:
from src.pipeline_utils import find_similar_app_ids

num_neighbors = 10

app_id = 271590
similar_app_ids = find_similar_app_ids(app_id, preprocess, model, index, base_apps, base_indices, num_neighbors, verbose=True)

## Process a query appID

Pre-computed matches are used if they exist.
Otherwise, an image is downloaded from Steam and processed on the fly.
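
The fallback described above can be sketched as follows. This is an illustration on toy data, not the code of `get_matches`: `lookup_or_compute` is a hypothetical helper, and the sketch assumes that row `i` of the pre-computed matches holds indices into the list of filtered appIDs.

```python
def lookup_or_compute(app_id, known_app_ids, precomputed_rows, compute_on_the_fly):
    """Use the stored row of matches when app_id is known, else compute it."""
    try:
        row = known_app_ids.index(app_id)
    except ValueError:
        # Unknown appID: fall back to the on-the-fly pipeline.
        return compute_on_the_fly(app_id), "on the fly"
    # Known appID: convert the stored indices back to appIDs.
    return [known_app_ids[i] for i in precomputed_rows[row]], "pre-computed"

# Toy data (assumed layout): three known appIDs, two neighbors per row.
known = [10, 20, 30]
rows = [[0, 1], [1, 2], [2, 0]]

def fake_on_the_fly(app_id):
    # Stand-in for downloading and processing an image.
    return [app_id]

hit = lookup_or_compute(20, known, rows, fake_on_the_fly)
miss = lookup_or_compute(99, known, rows, fake_on_the_fly)
print(hit)   # ([20, 30], 'pre-computed')
print(miss)  # ([99], 'on the fly')
```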

In [None]:
filtered_app_ids = [base_apps[i] for i in base_indices]
print(f"#appIDs = {len(filtered_app_ids)}")

In [None]:
precomputed_matches = load_precomputed_matches(model_name)

In [None]:
from src.pipeline_utils import get_matches

app_id = 271590
similar_app_ids = get_matches(app_id, precomputed_matches, filtered_app_ids, preprocess, model, index, base_apps, base_indices, num_neighbors, verbose=True)

app_id = 2446820
similar_app_ids = get_matches(app_id, precomputed_matches, filtered_app_ids, preprocess, model, index, base_apps, base_indices, num_neighbors, verbose=True)

NB: for the same appID, matches computed on the fly can differ from pre-computed matches, because the features are extracted from images resized with different interpolation algorithms:
- for on-the-fly matching, images are resized with [`transforms.InterpolationMode.BICUBIC`][dinov2-bicubic-interpolation],
- for pre-computed matches, images were resized by [`img2dataset`][img2dataset-downscale-interpolation] with [`cv2.INTER_AREA`][opencv-interpolation-flags], as suggested [in the OpenCV documentation][opencv-resize] for downscaling.

[dinov2-bicubic-interpolation]: <https://github.com/facebookresearch/dinov2/blob/c3c2683a13cde94d4d99f523cf4170384b00c34c/dinov2/data/transforms.py#L81>
[opencv-interpolation-flags]: <https://docs.opencv.org/4.8.0/da/d54/group__imgproc__transform.html#ga5bb5a1fea74ea38e1a5445ca803ff121>
[img2dataset-downscale-interpolation]: <https://github.com/rom1504/img2dataset/blob/f0188aedb897f94eb0d39ccefba641174244b927/img2dataset/resizer.py#L88>
[opencv-resize]: <https://docs.opencv.org/4.8.0/da/d54/group__imgproc__transform.html#ga47a974309e9102f5f08231edc7e7529d>
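
To illustrate why the resizing algorithm matters, here is a NumPy-only sketch that downscales a toy image in two ways: block averaging, which is the idea behind area interpolation when shrinking by an integer factor, and plain subsampling. Subsampling is a stand-in chosen to keep the example free of dependencies; it is not the bicubic filter mentioned above, and both helpers are hypothetical.

```python
import numpy as np

def downscale_by_averaging(image, factor):
    """Average each factor x factor block (image sides must be multiples of factor)."""
    h, w = image.shape
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def downscale_by_subsampling(image, factor):
    """Keep one pixel out of every factor x factor block."""
    return image[::factor, ::factor]

# 8x8 checkerboard: pixel (i, j) is 1 when i + j is odd, else 0.
image = (np.indices((8, 8)).sum(axis=0) % 2).astype(np.float64)

averaged = downscale_by_averaging(image, 2)
subsampled = downscale_by_subsampling(image, 2)
max_abs_difference = np.abs(averaged - subsampled).max()
print(max_abs_difference)  # 0.5
```

The two methods return different pixels for the same input, so a model fed with each result would generally produce slightly different features, and hence possibly different neighbors.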

## Export Top 100 to Markdown

In [None]:
from src.steam_utils import get_top_100

# TODO