# Scrape Missing Values

Some columns in the dataset are missing values that are available on [pokemondb.net](https://pokemondb.net/pokedex).

We will also scrape each Pokémon's sprite and compute 13 numerical features from it.

This notebook uses our custom scraper to retrieve missing values from the website.
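The sprite features themselves are computed inside the custom scraper, whose code is not shown in this notebook. As a rough sketch only, assuming a sprite is an RGBA array with values in [0, 1] and that fully transparent pixels are background, per-channel statistics could be computed along these lines. The helper name `sprite_color_stats`, the transparency rule, and the definition of brightness as the mean of the three colour channels are all assumptions made for illustration.

```python
import numpy as np


def sprite_color_stats(rgba) -> dict:
    """Hypothetical helper: colour statistics over the non-transparent pixels."""
    rgba = np.asarray(rgba, dtype=float)
    opaque = rgba[..., 3] > 0        # assumed: alpha > 0 means "part of the sprite"
    rgb = rgba[opaque][:, :3]        # shape (n_pixels, 3)
    brightness = rgb.mean(axis=1)    # assumed: brightness = mean of R, G, B

    stats = {"sprite_size": int(opaque.sum())}
    for i, channel in enumerate(("red", "green", "blue")):
        stats[f"sprite_{channel}_mean"] = float(rgb[:, i].mean())
        stats[f"sprite_{channel}_sd"] = float(rgb[:, i].std())
    stats["sprite_brightness_mean"] = float(brightness.mean())
    stats["sprite_brightness_sd"] = float(brightness.std())
    return stats


# A 2x2 toy "sprite": one red and one green opaque pixel, two transparent ones.
toy = np.array([
    [[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
])
toy_stats = sprite_color_stats(toy)
print(toy_stats)
```

Restricting the statistics to opaque pixels keeps the background from dragging every mean toward zero; whether the real scraper does the same is not confirmed here.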

## Setup

In [1]:
import pandas as pd

from scrape import Variant

In [2]:
IN_PATH = "data/0-raw.csv"
pokedex = pd.read_csv(IN_PATH)
pokedex.head()

| Unnamed: 0.1 | Unnamed: 0 | pokedex_number | name | german_name | japanese_name | generation | status | species | type_number | type_1 | ... | against_ground | against_flying | against_psychic | against_bug | against_rock | against_ghost | against_dragon | against_dark | against_steel | against_fairy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | Bulbasaur | Bisasam | フシギダネ (Fushigidane) | 1 | Normal | Seed Pokémon | 2 | Grass | ... | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 |
| 1 | 1 | 2 | Ivysaur | Bisaknosp | フシギソウ (Fushigisou) | 1 | Normal | Seed Pokémon | 2 | Grass | ... | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 |
| 2 | 2 | 3 | Venusaur | Bisaflor | フシギバナ (Fushigibana) | 1 | Normal | Seed Pokémon | 2 | Grass | ... | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 |
| 3 | 3 | 3 | Mega Venusaur | Bisaflor | フシギバナ (Fushigibana) | 1 | Normal | Seed Pokémon | 2 | Grass | ... | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 |
| 4 | 4 | 4 | Charmander | Glumanda | ヒトカゲ (Hitokage) | 1 | Normal | Lizard Pokémon | 1 | Fire | ... | 2.0 | 1.0 | 1.0 | 0.5 | 2.0 | 1.0 | 1.0 | 1.0 | 0.5 | 0.5 |


## Scraping

In [3]:
NAME = "name"
COLUMNS = Variant.PROPERTIES

In [4]:
fetched_variants = {
    i: Variant.fetch(variant_name)
    for i, variant_name in zip(pokedex.index, pokedex[NAME])
}

'Keldeo Ordinary Forme' not found, falling back to 'Keldeo Ordinary Form'
'Keldeo Ordinary Forme' not found, falling back to 'Keldeo Ordinary Form'
'Keldeo Resolute Forme' not found, falling back to 'Keldeo Resolute Form'
'Keldeo Resolute Forme' not found, falling back to 'Keldeo Resolute Form'
'Hoopa Hoopa Confined' not found, falling back to 'Hoopa Confined'
'Hoopa Hoopa Confined' not found, falling back to 'Hoopa Confined'
'Hoopa Hoopa Unbound' not found, falling back to 'Hoopa Unbound'
'Hoopa Hoopa Unbound' not found, falling back to 'Hoopa Unbound'
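
The log shows two kinds of mismatch between dataset names and site names: the spelling "Forme" versus "Form", and a repeated leading word ("Hoopa Hoopa"). How the scraper actually derives its fallback names is not shown in this notebook. A hypothetical rule set that happens to reproduce the four fallbacks above might look like this; the function name `fallback_name` and both rules are assumptions.

```python
def fallback_name(name: str) -> str:
    """Hypothetical rules; the real scraper's fallback logic is not shown here."""
    words = name.split()
    # Drop an immediately repeated first word:
    # "Hoopa Hoopa Confined" becomes "Hoopa Confined".
    if len(words) >= 2 and words[0] == words[1]:
        words = words[1:]
    # Normalise the spelling "Forme" to "Form".
    words = ["Form" if word == "Forme" else word for word in words]
    return " ".join(words)


for original in ("Keldeo Ordinary Forme", "Hoopa Hoopa Unbound"):
    print(f"{original!r} -> {fallback_name(original)!r}")
```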


We assume (perhaps naively) that any non-NA value already in the dataset is correct, even when scraping yields a conflicting value. Scraped values only fill gaps; conflicts are printed but never overwrite existing data.
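
As a small illustration of this rule on made-up data (the two frames below are toy values, not taken from the dataset), pandas' `DataFrame.combine_first` keeps every non-NA value of the calling frame and fills only its gaps from the other frame. This is a vectorised sketch of the same policy, not the code this notebook runs.

```python
import pandas as pd

# Toy stand-ins: values already in the dataset vs. values found by scraping.
existing = pd.DataFrame({"catch_rate": [45.0, None], "egg_cycles": [None, 20.0]})
scraped = pd.DataFrame({"catch_rate": [50.0, 45.0], "egg_cycles": [20.0, 25.0]})

# Existing non-NA values win; scraped values only fill the gaps.
merged = existing.combine_first(scraped)

# Cells where an existing value disagrees with the scraped one.
conflicts = existing.notna() & (existing != scraped)

print(merged)
print(conflicts)
```

Here row 0 keeps `catch_rate = 45.0` even though the scraped value is 50.0, and that cell is flagged in `conflicts`.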

In [6]:
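filled = pokedex.copy()

# Add one column per scraped property, initialised to None.
for prop in Variant.PROPERTIES:
    filled[prop] = None

for i, variant in fetched_variants.items():
    # Sanity check: the fetched variant must carry the same name as its row.
    assert filled.at[i, NAME] == variant.name
    print(variant.name)

    for column, value in variant.as_dict().items():
        existing = filled.at[i, column]
        if pd.isna(existing):
            # Fill the gap with the scraped value.
            print(f"\t{column} = {value}")
            filled.at[i, column] = value
        elif existing != value:
            # Keep the existing value and only report the conflict.
            print(f"\tExpected {column} = {value} but found {existing}")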

Bulbasaur
	base_experience = 64
	base_friendship = 50
	catch_rate = 45
	egg_cycles = 20
	growth_rate = Medium Slow
	percentage_male = 87.5
	sprite_size = 251
	sprite_perimeter = 52.0
	sprite_perimeter_to_size_ratio = 0.20717131474103587
	sprite_red_mean = 0.30119521912350594
	sprite_green_mean = 0.46108897742363886
	sprite_blue_mean = 0.28111866260448404
	sprite_brightness_mean = 0.3478009530505429
	sprite_red_sd = 0.2037074456253958
	sprite_green_sd = 0.2965685177678722
	sprite_blue_sd = 0.19631766289563313
	sprite_brightness_sd = 0.22752657029147685
	sprite_overflow_vertical = 0.0
	sprite_overflow_horizontal = 0.0
Ivysaur
	base_experience = 142
	base_friendship = 50
	catch_rate = 45
	egg_cycles = 20
	growth_rate = Medium Slow
	percentage_male = 87.5
	sprite_size = 283
	sprite_perimeter = 56.0
	sprite_perimeter_to_size_ratio = 0.1978798586572438
	sprite_red_mean = 0.31538834615118133
	sprite_green_mean = 0.39955657174530584
	sprite_blue_mean = 0.3410933277904802
	sprite_brightness_mea

## Save Results

In [None]:
OUT_PATH = "data/1-scrape_missing_values.csv"
filled.to_csv(OUT_PATH, index=False)