# Match Steam Banners with OpenAI's CLIP

Code inspired by:
-   https://github.com/woctezuma/match-steam-banners
-   https://github.com/woctezuma/steam-CLIP

## Setting

### Check CUDA version

The installation section assumes CUDA 10.1.

In [1]:
import subprocess

CUDA_version = [s for s in subprocess.check_output(["nvcc", "--version"]).decode("UTF-8").split(", ") if s.startswith("release")][0].split(" ")[-1]
print("CUDA version:", CUDA_version)

CUDA version: 10.1


### Clone my repository

In [2]:
%cd /content/

/content


In [3]:
!rm -rf match-steam-banners/

!git clone https://github.com/woctezuma/match-steam-banners.git

Cloning into 'match-steam-banners'...
remote: Enumerating objects: 293, done.
remote: Counting objects: 100% (293/293), done.
remote: Compressing objects: 100% (174/174), done.
remote: Total 293 (delta 171), reused 225 (delta 113), pack-reused 0
Receiving objects: 100% (293/293), 49.70 KiB | 8.28 MiB/s, done.
Resolving deltas: 100% (171/171), done.


### Install Python requirements

In [4]:
%cd /content/match-steam-banners/

!git pull

# Switch to the branch tailored for CLIP
!git checkout openai-clip

/content/match-steam-banners
Already up to date.
Branch 'openai-clip' set up to track remote branch 'openai-clip' from 'origin'.
Switched to a new branch 'openai-clip'


In [None]:
!pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

In [None]:
import numpy as np
import torch

print("Torch version:", torch.__version__)

Torch version: 1.7.1+cu101


### Copy utility functions from OpenAI's CLIP repository

In [None]:
!git clone https://github.com/openai/CLIP.git

%mv CLIP/bpe_simple_vocab_16e6.txt.gz .
%mv CLIP/*.py .

Cloning into 'CLIP'...
remote: Enumerating objects: 24, done.
remote: Total 24 (delta 0), reused 0 (delta 0), pack-reused 24
Unpacking objects: 100% (24/24), done.


## Image data

### Mount Google Drive

In [5]:
!pip install Google-Colab-Transfer

Collecting Google-Colab-Transfer
  Downloading https://files.pythonhosted.org/packages/a0/90/76fc38bcad442018977ed0e4e663473ef56a4d15395b2aa09055e8c49185/Google_Colab_Transfer-0.1.6-py3-none-any.whl
Installing collected packages: Google-Colab-Transfer
Successfully installed Google-Colab-Transfer-0.1.6


In [6]:
import colab_transfer

colab_transfer.mount_google_drive()

Mounted at /content/drive/


### Import image data from Google Drive

In [None]:
colab_transfer.copy_file('resized_vertical_steam_banners_224.tar',
                         source='/content/drive/MyDrive/data/',
                         destination='/content/match-steam-banners/data/')

# Alternatively, run:
# !gdown --id 1--cxY3jvTVWq-lZt8NvfN2fHND7YhKN4
# %mkdir -p data/
# %mv resized_vertical_steam_banners_224.tar data/

Copying /content/drive/MyDrive/data/resized_vertical_steam_banners_224.tar to /content/match-steam-banners/data/resized_vertical_steam_banners_224.tar


In [None]:
%cd /content/match-steam-banners/
!tar -xf data/resized_vertical_steam_banners_224.tar

/content/match-steam-banners


In [None]:
%cd /content/match-steam-banners/
%mv data/resized_vertical_steam_banners_224 data/resized_vertical_steam_banners

/content/match-steam-banners


## 1. Features

First, compute and store a 512-dimensional feature vector for each banner.

### Compute features

Processing 1,000 images takes about 10 seconds, so the full set of 29,982 banners takes about 5 minutes.

In [None]:
!python build_feature_index.py

0/29982 in 0.14 s
1000/29982 in 12.44 s
2000/29982 in 21.83 s
3000/29982 in 31.28 s
4000/29982 in 40.71 s
5000/29982 in 50.17 s
6000/29982 in 59.53 s
7000/29982 in 68.92 s
8000/29982 in 78.23 s
9000/29982 in 87.56 s
10000/29982 in 96.97 s
11000/29982 in 106.27 s
12000/29982 in 115.59 s
13000/29982 in 124.90 s
14000/29982 in 134.53 s
15000/29982 in 144.29 s
16000/29982 in 154.49 s
17000/29982 in 165.01 s
18000/29982 in 175.40 s
19000/29982 in 185.73 s
20000/29982 in 195.91 s
21000/29982 in 206.65 s
22000/29982 in 218.06 s
23000/29982 in 228.62 s
24000/29982 in 239.07 s
25000/29982 in 249.39 s
26000/29982 in 259.68 s
27000/29982 in 269.79 s
28000/29982 in 279.86 s
29000/29982 in 289.84 s


In [None]:
!du -sh data/label_database.avg.npy

118M	data/label_database.avg.npy
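As a quick sanity check on the reported file size, the sketch below estimates the array payload for a matrix of 29,982 rows and 512 columns under a few element types. The element type of the saved file is not stated in this notebook; the candidates are assumptions, and the helper name `expected_payload_mib` is hypothetical. An 8-byte element type gives about 117 MiB, which is consistent with the 118M shown by `du` (which rounds up).

```python
import numpy as np


def expected_payload_mib(num_rows, num_cols, dtype):
    """Size in MiB of the raw array data (the small .npy header is ignored)."""
    itemsize = np.dtype(dtype).itemsize  # bytes per element
    return num_rows * num_cols * itemsize / 2**20


# Candidate element types are assumptions, not taken from the repository.
for dtype in ("float16", "float32", "float64"):
    print(dtype, round(expected_payload_mib(29982, 512, dtype), 1), "MiB")
```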


### Export feature data to Google Drive

In [7]:
import colab_transfer as ct

local_folder_name = 'match-steam-banners/data/'
local_folder_path = ct.get_path_to_home_of_local_machine() + local_folder_name

gdrive_folder_name ='steam-CLIP/'
gdrive_folder_path = ct.get_path_to_home_of_google_drive() + gdrive_folder_name

In [None]:
# colab_transfer.copy_file('frozen_app_ids.txt',
#                          source=local_folder_path,
#                          destination=gdrive_folder_path)

# colab_transfer.copy_file('label_database.avg.npy', 
#                          source=local_folder_path,
#                          destination=gdrive_folder_path)

Copying /content/match-steam-banners/data/frozen_app_ids.txt to /content/drive/My Drive/frozen_app_ids.txt
Copying /content/match-steam-banners/data/label_database.avg.npy to /content/drive/My Drive/label_database.avg.npy


### Import feature data from Google Drive

In [8]:
colab_transfer.copy_file('frozen_app_ids.txt',
                         source=gdrive_folder_path,
                         destination=local_folder_path)

colab_transfer.copy_file('label_database.avg.npy', 
                         source=gdrive_folder_path,
                         destination=local_folder_path)

Copying /content/drive/My Drive/steam-CLIP/frozen_app_ids.txt to /content/match-steam-banners/data/frozen_app_ids.txt
Copying /content/drive/My Drive/steam-CLIP/label_database.avg.npy to /content/match-steam-banners/data/label_database.avg.npy


## 2. Similar games

For each curated query appID, find the 10 most similar store banners.
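The retrieval step can be pictured as a cosine-similarity ranking over the feature matrix. The following is a minimal NumPy sketch under that assumption, not the code of `retrieve_similar_features.py`; the function name `top_k_similar` and the random toy features are hypothetical.

```python
import numpy as np


def top_k_similar(features, query_index, k=10):
    """Return the indices and cosine similarities of the k rows closest to the query row."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # L2-normalize each row
    similarities = unit @ unit[query_index]        # cosine similarity to the query
    order = np.argsort(-similarities)[:k]          # best matches first
    return order, similarities[order]


# Toy data standing in for the real (29982, 512) feature matrix.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(100, 512)).astype(np.float32)
indices, scores = top_k_similar(toy_features, query_index=3, k=10)
print(indices)
```

With this formulation the query is its own best match, which mirrors the output further down, where the query banner appears first in its list of results.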

### Ping SteamSpy and GameDataCrunch API

The SteamSpy API may block requests from Google Colab, in which case responses are empty:

> JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If you encounter this issue, you can work around it by running the following on **your own local machine** to ping the SteamSpy API:

In [None]:
!python steam_spy_utils.py

Then **manually** upload the 2 cached files from within Colab's interface:
```
data/
├ 20210116_gamedatacrunch.json   # GameDataCrunch database of games
└ 20210116_top_100_app_ids.txt   # top 100 most-played games in the past 2 weeks
```

The GameDataCrunch database is used to retrieve game names corresponding to appIDs.

The top 100 appIDs serve as a list of popular games on which to test the algorithm.
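The empty-response failure described above can also be handled defensively in code. The sketch below is one possible pattern, assuming a cached JSON file is available; the function name `parse_json_or_fall_back`, the cache file name and the toy content are hypothetical and do not reflect the actual format of the cached files.

```python
import json
import tempfile
from pathlib import Path


def parse_json_or_fall_back(response_text, cache_path):
    """Parse a response body; if it is empty or not valid JSON, load the cached file instead."""
    try:
        return json.loads(response_text)
    except json.JSONDecodeError:
        # An empty body raises "Expecting value: line 1 column 1 (char 0)".
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)


# Toy demonstration with a hypothetical cache file and made-up content.
with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = Path(tmp_dir) / "cached_response.json"
    cache_path.write_text(json.dumps({"10": "Counter-Strike"}), encoding="utf-8")

    from_cache = parse_json_or_fall_back("", cache_path)
    from_response = parse_json_or_fall_back('{"50": "Half-Life: Opposing Force"}', cache_path)

print(from_cache, from_response)
```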

### Run the workflow

In [None]:
!python retrieve_similar_features.py


Query:

[<img alt="Counter-Strike" src="https://steamcdn-a.akamaihd.net/steam/apps/10/library_600x900.jpg" width="150">](https://store.steampowered.com/app/10)


[<img alt="Counter-Strike" src="https://steamcdn-a.akamaihd.net/steam/apps/10/library_600x900.jpg" width="150">](https://store.steampowered.com/app/10)[<img alt="Half-Life: Opposing Force" src="https://steamcdn-a.akamaihd.net/steam/apps/50/library_600x900.jpg" width="150">](https://store.steampowered.com/app/50)[<img alt="Half-Life: Blue Shift" src="https://steamcdn-a.akamaihd.net/steam/apps/130/library_600x900.jpg" width="150">](https://store.steampowered.com/app/130)[<img alt="Half-Life: Source" src="https://steamcdn-a.akamaihd.net/steam/apps/280/library_600x900.jpg" width="150">](https://store.steampowered.com/app/280)[<img alt="Half-Life Deathmatch: Source" src="https://steamcdn-a.akamaihd.net/steam/apps/360/library_600x900.jpg" width="150">](https://store.steampowered.com/app/360)

[<img alt="Half-Life" src="https://stea

## 3. Unique games

For each appID available on the store, find its single most similar store banner, then display the most unique games.
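One way to read this step: compute, for every banner, the similarity to its nearest neighbor other than itself, then rank banners by how low that score is. The sketch below illustrates that idea with NumPy on toy data. It is an assumption about the approach, not the code of `find_unique_games.py`; the function names are hypothetical, and using the threshold as an upper bound on the best-match similarity is also an assumption.

```python
import numpy as np


def nearest_neighbor_similarity(features):
    """For each row, return the index and cosine similarity of its closest other row."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a row must not match itself
    nn_index = sim.argmax(axis=1)
    nn_sim = sim[np.arange(len(sim)), nn_index]
    return nn_index, nn_sim


def most_unique(features, threshold):
    """Rows whose best match scores below the threshold, least similar first."""
    nn_index, nn_sim = nearest_neighbor_similarity(features)
    candidates = np.flatnonzero(nn_sim < threshold)
    return candidates[np.argsort(nn_sim[candidates])], nn_index, nn_sim


# Toy data: rows 0 and 1 are near-duplicates, row 2 is unlike both.
toy = np.array([[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.0, 0.0, 1.0]])
unique_rows, nn_index, nn_sim = most_unique(toy, threshold=0.61)
print(unique_rows, nn_index, nn_sim)
```

The dense similarity matrix is fine for a toy example; for roughly 30,000 banners it would hold about 900 million entries, so a chunked computation or a dedicated nearest-neighbor library is a more practical choice at that scale.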

In [None]:
!python find_unique_games.py

Elapsed time: 34.64 s
Similarity threshold: 0.61

Query:

[<img alt="Pray in VR Medieval Christian Churches" src="https://steamcdn-a.akamaihd.net/steam/apps/1409340/library_600x900.jpg" width="150">](https://store.steampowered.com/app/1409340)


[<img alt="Priest vs. Poltergeist" src="https://steamcdn-a.akamaihd.net/steam/apps/1400560/library_600x900.jpg" width="150">](https://store.steampowered.com/app/1400560)


Query:

[<img alt="FAN'CIE VEER! (Fish Are Nasty, Cake Is Excellent Vektor Evading Emblazed Rapture)" src="https://steamcdn-a.akamaihd.net/steam/apps/892640/library_600x900.jpg" width="150">](https://store.steampowered.com/app/892640)


[<img alt="Vector's Adventures" src="https://steamcdn-a.akamaihd.net/steam/apps/773870/library_600x900.jpg" width="150">](https://store.steampowered.com/app/773870)


Query:

[<img alt="Welcome To... Chichester OVN 2 : Master Tormentor Grendel Jinx !?" src="https://steamcdn-a.akamaihd.net/steam/apps/1163480/library_600x900.jpg" width="150">](h

## 4. Export data and matches for a web app

Exact kNN search is performed with the `faiss` package, because it is noticeably faster than other packages.

References:
-   https://github.com/facebookresearch/faiss
-   https://github.com/facebookresearch/faiss/wiki/Getting-started
-   https://github.com/kyamagu/faiss-wheels
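For orientation, here is a minimal sketch of exact kNN search with a flat inner-product index on L2-normalized vectors, which amounts to ranking by cosine similarity. It is not the code of `export_data_for_web_app.py`: the choice of `IndexFlatIP`, the function name `exact_knn` and the toy data are assumptions. A NumPy fallback is included only so that the snippet also runs where `faiss` is not installed.

```python
import numpy as np

try:
    import faiss  # provided by the faiss-cpu or faiss-gpu wheels
except ImportError:
    faiss = None


def exact_knn(features, k):
    """Return (scores, neighbors): for each row, its k best matches by cosine similarity."""
    x = np.asarray(features, dtype=np.float32)
    x = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    x = np.ascontiguousarray(x, dtype=np.float32)  # faiss expects contiguous float32
    if faiss is not None:
        index = faiss.IndexFlatIP(x.shape[1])  # exact (brute-force) inner-product index
        index.add(x)
        return index.search(x, k)
    # Fallback: the same brute-force search written with NumPy.
    sims = x @ x.T
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, neighbors, axis=1)
    return scores, neighbors


# Toy data standing in for the real (29982, 512) feature matrix.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(50, 64))
scores, neighbors = exact_knn(toy_features, k=5)
print(neighbors[:3])
```

Because every row is also part of the index, the first neighbor of each row is the row itself; a caller interested only in other items would request `k + 1` neighbors and drop the first column.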

In [10]:
%pip install faiss-gpu

Collecting faiss-gpu
  Downloading https://files.pythonhosted.org/packages/7d/32/8b29e3f99224f24716257e78724a02674761e034e6920b4278cc21d19f77/faiss_gpu-1.6.5-cp36-cp36m-manylinux2014_x86_64.whl (67.6MB)
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.6.5


In [11]:
!wget https://raw.githubusercontent.com/woctezuma/steam-store-snapshots/main/data/IStoreService.json

--2021-01-19 19:00:22--  https://raw.githubusercontent.com/woctezuma/steam-store-snapshots/main/data/IStoreService.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9740490 (9.3M) [text/plain]
Saving to: ‘IStoreService.json’


2021-01-19 19:00:23 (52.6 MB/s) - ‘IStoreService.json’ saved [9740490/9740490]



In [12]:
!python export_data_for_web_app.py

#apps = 29982
#apps = 29982
(#apps, #features) = (29982, 512)
Elapsed time: 14.39 s


The exported files (both `.npy` and `.json`) can be found in `data_export/`.

In [14]:
!du -sh data_export/matches_faiss.npy

5.8M	data_export/matches_faiss.npy
