# Match Steam Banners with Facebook's DINO

Code inspired by:
-   https://github.com/woctezuma/match-steam-banners
-   https://github.com/woctezuma/steam-CLIP
-   https://github.com/woctezuma/steam-DINO

## Setting

### Check CUDA version

The installation section assumes CUDA 10.1. The cell below prints the installed version so that you can check.

In [None]:
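import subprocess

# Parse the release number out of `nvcc --version`, e.g. "release 10.1".
nvcc_output = subprocess.check_output(["nvcc", "--version"]).decode("UTF-8")
release_fields = [s for s in nvcc_output.split(", ") if s.startswith("release")]
CUDA_version = release_fields[0].split(" ")[-1]
print("CUDA version:", CUDA_version)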

### Clone my repository

In [None]:
%cd /content/

In [None]:
!rm -rf match-steam-banners/

!git clone https://github.com/woctezuma/match-steam-banners.git

### Install Python requirements

In [None]:
%cd /content/match-steam-banners/

!git pull

# Switch to the branch tailored for DINO
!git checkout facebook-dino

In [None]:
# !pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
!pip install --upgrade pip
!pip install ftfy

In [None]:
import numpy as np
import torch

print("Torch version:", torch.__version__)

In [None]:
# !pip install git+https://github.com/openai/CLIP.git
!git clone https://github.com/openai/CLIP.git
%mv CLIP/clip .

### Copy utility functions from Facebook's DINO repository

In [None]:
!git clone https://github.com/facebookresearch/dino.git
%mv dino/vision_transformer.py .
%mv dino/utils.py .

## Image data

### Mount Google Drive

In [None]:
!pip install Google-Colab-Transfer

In [None]:
import colab_transfer

colab_transfer.mount_google_drive()

### Import image data from Google Drive

#### First batch of downloaded data, at 224x224 resolution

In [None]:
# colab_transfer.copy_file('resized_vertical_steam_banners_224.tar',
#                          source='/content/drive/MyDrive/data/',
#                          destination='/content/match-steam-banners/data/')

# Alternatively, run:

!gdown --id 1--cxY3jvTVWq-lZt8NvfN2fHND7YhKN4
%mkdir -p data/
%mv resized_vertical_steam_banners_224.tar data/

In [None]:
%cd /content/match-steam-banners/
!tar -xf data/resized_vertical_steam_banners_224.tar

In [None]:
%cd /content/match-steam-banners/
%mv data/resized_vertical_steam_banners_224 data/resized_vertical_steam_banners

#### Second batch of downloaded data, at 256x256 resolution

In [None]:
!gdown --id 1-8d3g7ZKS-E3A60jUqnPyxYPJGTeJx7F
%mkdir -p data/
%mv resized_vertical_steam_banners_256_v2_delta_only.tar data/

In [None]:
%cd /content/match-steam-banners/
!tar -xf data/resized_vertical_steam_banners_256_v2_delta_only.tar

In [None]:
!apt-get update > /dev/null
!apt-get install imagemagick > /dev/null

In [None]:
%cd /content/match-steam-banners/
%mv content/data/resized_vertical_steam_banners data/resized_vertical_steam_banners_256

In [None]:
!mogrify \
 -resize '224x224!' \
 -path /content/match-steam-banners/data/resized_vertical_steam_banners \
 /content/match-steam-banners/data/resized_vertical_steam_banners_256/*.jpg

### Alternatively, import image data from GitHub Releases

In [None]:
%cd /content/
%mkdir -p match-steam-banners/data

In [None]:
%cd /content/match-steam-banners/data/
!wget https://github.com/woctezuma/steam-DINO/releases/download/input/resized_vertical_steam_banners_v2.tar.gz
!tar -xzf resized_vertical_steam_banners_v2.tar.gz

## 1. Features

First, compute and store one feature vector per banner: 384 dimensions for the Small model, 768 dimensions for the Base model.
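
As a rough illustration of what gets stored, the sketch below builds a synthetic feature matrix with one 384-dimensional row per banner, saves it as `.npy`, and reports its shape and size. The file name `demo_features.npy`, the `float32` dtype, and the helper `describe_features` are assumptions made for illustration; the actual layout written by `build_feature_index.py` may differ.

```python
import tempfile
from pathlib import Path

import numpy as np


def describe_features(path):
    """Summarize a 2-D feature matrix stored as .npy (hypothetical helper)."""
    features = np.load(path)
    num_banners, dim = features.shape
    return {
        "num_banners": num_banners,
        "dim": dim,
        "dtype": str(features.dtype),
        "megabytes": features.nbytes / 1e6,
    }


# Synthetic stand-in for real banner features: 1000 banners, 384 dimensions.
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 384)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_features.npy"
    np.save(demo_path, demo)
    summary = describe_features(demo_path)

print(summary)
```

The same helper could be pointed at `data/label_database.avg.npy` once it exists, provided that file really is a 2-D array.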

### Compute features

For Simple/Small/16, it takes about 10 seconds to process 1k images. Total time: about 6 minutes.

For Complex/Base/8, it takes about 70 seconds to process 1k images. Total time: about 42 minutes.
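
The totals above follow from the per-1k timings. The sketch below redoes the arithmetic; the figure of 36,000 images is an assumption inferred from those timings (6 minutes at 10 seconds per 1k images), not a count taken from the dataset.

```python
def estimate_minutes(num_images, seconds_per_thousand):
    """Total processing time in minutes, given a per-1k-images timing."""
    return num_images / 1000 * seconds_per_thousand / 60


NUM_IMAGES = 36_000  # assumption: inferred from the timings, not measured

small_minutes = estimate_minutes(NUM_IMAGES, seconds_per_thousand=10)
base_minutes = estimate_minutes(NUM_IMAGES, seconds_per_thousand=70)

print(f"Small/16: ~{small_minutes:.0f} min, Base/8: ~{base_minutes:.0f} min")
```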

In [None]:
%pip install torchvision --upgrade > /dev/null

In [None]:
%cd /content/match-steam-banners/
!python build_feature_index.py

In [None]:
!du -sh data/label_database.avg.npy

### Export feature data to Google Drive

In [None]:
import colab_transfer as ct

local_folder_name = 'match-steam-banners/data/'
local_folder_path = ct.get_path_to_home_of_local_machine() + local_folder_name

gdrive_folder_name ='steam-DINO/'
gdrive_folder_path = ct.get_path_to_home_of_google_drive() + gdrive_folder_name

In [None]:
# ct.copy_file('frozen_app_ids.txt',
#              source=local_folder_path,
#              destination=gdrive_folder_path)

# ct.copy_file('label_database.avg.npy',
#              source=local_folder_path,
#              destination=gdrive_folder_path)

### Import feature data from Google Drive

In [None]:
# `ct` and the folder paths are defined in the export section above.
ct.copy_file('frozen_app_ids.txt',
             source=gdrive_folder_path,
             destination=local_folder_path)

ct.copy_file('label_database.avg.npy',
             source=gdrive_folder_path,
             destination=local_folder_path)

In [None]:
# Alternatively:

from pathlib import Path

%mkdir -p data

if not Path('data/frozen_app_ids.txt').exists():
  print('Downloading')
  !gdown --id 1iNgl_3AJotauknzb-La9Dsw8h3I7QQYh
  %mv frozen_app_ids.txt data/

if not Path('data/label_database.avg.npy').exists():
  print('Downloading')
  !gdown --id 1-DxgMXIo0qTh1CJ-fiHEiCkOsiH8nyrC
  %mv label_database.avg.npy data/

## 2. Similar games

For each curated query appID, find the 10 most similar store banners.
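
To make the retrieval step concrete, here is a minimal sketch of a top-10 search over a feature matrix. It assumes cosine similarity as the ranking score and uses random features; the function `top_k_similar` is hypothetical and is not taken from `retrieve_similar_features.py`, which may rank matches differently.

```python
import numpy as np


def top_k_similar(query, database, k=10):
    """Return indices and cosine scores of the k rows closest to `query`."""
    # L2-normalize so that a dot product equals cosine similarity.
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Synthetic stand-in: 500 banners with 384-dimensional features.
rng = np.random.default_rng(0)
database = rng.standard_normal((500, 384)).astype(np.float32)

# A query that is a slightly perturbed copy of banner 42.
noise = 0.01 * rng.standard_normal(384).astype(np.float32)
query = database[42] + noise

indices, scores = top_k_similar(query, database, k=10)
print(indices[0], scores[0])
```

With real data, row indices would be mapped back to appIDs, for example through the order of lines in `frozen_app_ids.txt` if that file lists appIDs in row order (an assumption).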

### Ping SteamSpy and GameDataCrunch API

The SteamSpy API may block requests from Google Colab. In that case, responses are empty and JSON decoding fails:

> JSONDecodeError: Expecting value: line 1 column 1 (char 0)
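
The message above is what Python's `json` module raises when asked to decode an empty string. The sketch below reproduces the failure and shows one way to detect it; `parse_api_response` and the sample payload are hypothetical and do not reflect the real SteamSpy response format.

```python
import json


def parse_api_response(text):
    """Decode a JSON body, returning None when the body is empty or invalid."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # An empty body (e.g. a blocked request) ends up here.
        return None


blocked = parse_api_response("")
ok = parse_api_response('{"730": {"name": "Example Game"}}')  # made-up payload

print(blocked, ok)
```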

If you encounter this issue, work around it by running the following on **your own local machine** to query the SteamSpy API:

In [None]:
%pip install steamspypi gamedatacrunch

In [None]:
%cd /content/match-steam-banners/
!python steam_spy_utils.py

Then **manually** upload the 2 cached files from within Colab's interface:
```
data/
├ 20210116_gamedatacrunch.json   # GameDataCrunch database of games
└ 20210116_top_100_app_ids.txt   # top 100 most-played games in the past 2 weeks
```

The GameDataCrunch database is used to retrieve game names corresponding to appIDs.

The top 100 appIDs serve as a list of popular games on which to test the algorithm.

### Run the workflow

In [None]:
# If you switch between different feature-extraction models,
# then use a symbolic link as follows:

model_choice = 'ComplexB8'

%cd /content/match-steam-banners/data/
# -f avoids an error if the link does not exist yet
!rm -f label_database.avg.npy
!ln -s /content/out/label_database.{model_choice}.npy label_database.avg.npy

# Caveat: the .npy file must have been created with the same values as those set in dino_utils.py,
# because query features are computed from scratch with the values from dino_utils.py,
# and then compared to the pre-computed features stored in one of the files label_database.*.npy
#
# Therefore, manually adjust values in dino_utils.py to match values used for creating the .npy file!

In [None]:
!echo {model_choice}

In [None]:
%cd /content/match-steam-banners/
!python retrieve_similar_features.py > log_similar_{model_choice}.txt

## 3. Unique games

For every appID available on the store, find the single most similar store banner, then display the most unique games.
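
One way to read "most unique" is: the games whose best match is the weakest. The sketch below ranks items that way on a tiny hand-made example. It assumes cosine similarity and a dense similarity matrix, which only suits small inputs; `rank_by_uniqueness` is a hypothetical helper, not the logic of `find_unique_games.py`.

```python
import numpy as np


def rank_by_uniqueness(features):
    """Order items from most to least unique, based on their best match."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normed @ normed.T
    np.fill_diagonal(similarity, -np.inf)  # an item must not match itself
    nearest = similarity.argmax(axis=1)
    nearest_score = similarity.max(axis=1)
    order = np.argsort(nearest_score)  # weakest best match comes first
    return order, nearest, nearest_score


# Items 0 and 1 are near-duplicates; item 2 is unlike both.
features = np.array(
    [
        [1.00, 0.0, 0.0],
        [0.99, 0.1, 0.0],
        [0.00, 0.0, 1.0],
    ]
)

order, nearest, nearest_score = rank_by_uniqueness(features)
print(order, nearest, nearest_score)
```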

In [None]:
# If you switch between different feature-extraction models,
# then use a symbolic link as follows:

model_choice = 'ComplexB8'

%cd /content/match-steam-banners/data/
# -f avoids an error if the link does not exist yet
!rm -f label_database.avg.npy
!ln -s /content/out/label_database.{model_choice}.npy label_database.avg.npy

# NB: the .npy file DOES NOT HAVE to correspond to the values specified in dino_utils.py,
# because the following script ONLY RELIES ON PRE-COMPUTED features!
#
# Therefore, there is **no** need to manually adjust values in dino_utils.py!

In [None]:
!echo {model_choice}

In [None]:
# Delete any cached JSON file of unique games.
# Otherwise, find_unique_games.py loads it from disk instead of recomputing it from scratch.
!rm -f /content/match-steam-banners/data/unique_games.avg.json

In [None]:
%cd /content/match-steam-banners/
!python find_unique_games.py > log_unique_{model_choice}.txt

## 4. Export data and matches for a web app

Exact kNN search is performed with the `faiss` package, because it is noticeably faster than other packages.

References:
-   https://github.com/facebookresearch/faiss
-   https://github.com/facebookresearch/faiss/wiki/Getting-started
-   https://github.com/kyamagu/faiss-wheels
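
For readers unfamiliar with `faiss`, here is a minimal sketch of exact kNN search with a flat (brute-force) index. It assumes L2 distance and random features, and it falls back to a NumPy brute-force search when `faiss` is not installed; the helper `exact_knn` is hypothetical and is not taken from `export_data_for_web_app.py`, which may use a different index or metric.

```python
import numpy as np

try:
    import faiss
except ImportError:  # keep the sketch runnable without faiss
    faiss = None


def exact_knn(features, k):
    """Indices of the k nearest rows (L2 distance) for every row."""
    features = np.ascontiguousarray(features, dtype=np.float32)
    if faiss is not None:
        index = faiss.IndexFlatL2(features.shape[1])  # exact, no training
        index.add(features)
        _, indices = index.search(features, k)
        return indices
    # NumPy fallback: full matrix of squared distances.
    squared_norms = (features ** 2).sum(axis=1)
    distances = (
        squared_norms[:, None]
        - 2.0 * features @ features.T
        + squared_norms[None, :]
    )
    return np.argsort(distances, axis=1)[:, :k]


# Synthetic stand-in: 200 banners with 64-dimensional features.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 64)).astype(np.float32)

neighbors = exact_knn(features, k=5)
print(neighbors[:3])
```

Because every row is also part of the index, the first neighbor of each row is the row itself, so a caller would typically request `k + 1` neighbors and drop the first column.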

In [None]:
%pip install faiss-gpu

In [None]:
!wget -O IStoreService_page_1.json https://raw.githubusercontent.com/woctezuma/steam-store-snapshots/main/data/IStoreService.json
!wget https://raw.githubusercontent.com/woctezuma/steam-store-snapshots/main/data/IStoreService_page_2.json

In [None]:
import json

with open('IStoreService_page_1.json', 'r', encoding='utf8') as f:
    data_1 = json.load(f)
    l_1 = data_1["response"]["apps"]
    
with open('IStoreService_page_2.json', 'r', encoding='utf8') as f:
    data_2 = json.load(f)
    l_2 = data_2["response"]["apps"]

# Merge both pages into a single response with the same structure.
data = {'response': {'apps': l_1 + l_2}}

with open('IStoreService.json', 'w', encoding='utf8') as f:
    json.dump(data, f)

In [None]:
# If you switch between different feature-extraction models,
# then use a symbolic link as follows:

model_choice = 'ComplexB8'

%cd /content/match-steam-banners/data/
# -f avoids an error if the link does not exist yet
!rm -f label_database.avg.npy
!ln -s /content/out/label_database.{model_choice}.npy label_database.avg.npy

# NB: the .npy file DOES NOT HAVE to correspond to the values specified in dino_utils.py,
# because the following script ONLY RELIES ON PRE-COMPUTED features!
#
# Therefore, there is **no** need to manually adjust values in dino_utils.py!

In [None]:
!echo {model_choice}

In [None]:
%cd /content/match-steam-banners/
!python export_data_for_web_app.py

The exported files (both .npy and .json) are written to `data_export/`.

In [None]:
!du -sh data_export/matches_faiss.npy