# Scrape Missing Values

Some columns in the dataset have missing values that are available on [pokemondb.net](https://pokemondb.net/pokedex).

This notebook uses our custom scraper to retrieve those missing values from the website and fill them in.

## Setup

In [1]:
from functools import reduce

import pandas as pd

from scrape import Variant

In [2]:
IN_PATH = "pokedex_(Update_04.21).csv"
OUT_PATH = "filled.csv"

pokedex = pd.read_csv(IN_PATH)

## Scraping

In [3]:
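NAME = "name"
COLUMNS = Variant.PROPERTIES  # columns the scraper knows how to fill

# One boolean mask per column: True where the value is missing.
missing = {
    column: pd.isna(pokedex[column])
    for column in COLUMNS
}
# Combine the masks with OR: True for rows missing at least one value.
missing_any = reduce(lambda c1, c2: c1 | c2, missing.values())

# Only rows with at least one missing value need to be scraped.
variants_to_fetch = pokedex.loc[missing_any, [NAME] + COLUMNS]
variants_to_fetch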

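|  | name | base_experience | base_friendship | catch_rate | egg_cycles | growth_rate | percentage_male |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 69 | Galarian Meowth | NaN | NaN | NaN | 20.0 | Medium Fast | NaN |
| 100 | Galarian Ponyta | NaN | NaN | NaN | 20.0 | Medium Fast | NaN |
| 102 | Galarian Rapidash | NaN | NaN | NaN | 20.0 | Medium Fast | NaN |
| 107 | Galarian Slowbro | NaN | 70.0 | 75.0 | 20.0 | Medium Fast | 50.0 |
| 108 | Magnemite | 65.0 | 70.0 | 190.0 | 20.0 | Medium Fast | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1040 | Glastrier | NaN | NaN | 3.0 | 120.0 | Slow | NaN |
| 1041 | Spectrier | NaN | NaN | 3.0 | 120.0 | Slow | NaN |
| 1042 | Calyrex | NaN | NaN | 3.0 | 120.0 | Slow | NaN |
| 1043 | Calyrex Ice Rider | NaN | NaN | 3.0 | 120.0 | Slow | NaN |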
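
The row selection above can also be written without `reduce`. The following is a minimal sketch on toy data, not the notebook's own code; `toy` and `columns` are made-up stand-ins for `pokedex` and `Variant.PROPERTIES`.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "name": ["Magnemite", "Galarian Meowth", "Pikachu"],
    "catch_rate": [190.0, None, 190.0],
    "percentage_male": [None, None, 50.0],
})
columns = ["catch_rate", "percentage_male"]  # stand-in for Variant.PROPERTIES

# True for every row that is missing at least one of the listed columns.
missing_any = toy[columns].isna().any(axis=1)

# Number of missing values per column, useful as a quick overview.
missing_per_column = toy[columns].isna().sum()

to_fetch = toy.loc[missing_any, ["name"] + columns]
print(to_fetch)
```

`isna().any(axis=1)` produces the same kind of row mask as OR-ing the per-column masks together, and `isna().sum()` gives a per-column count in one call.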


In [7]:
fetched_variants = {
    i: Variant.fetch(variant_name)
    for i, variant_name in zip(variants_to_fetch.index, variants_to_fetch[NAME])
}

'Keldeo Ordinary Forme' not found, falling back to 'Keldeo Ordinary Form'
'Keldeo Resolute Forme' not found, falling back to 'Keldeo Resolute Form'
'Hoopa Hoopa Confined' not found, falling back to 'Hoopa Confined'
'Hoopa Hoopa Unbound' not found, falling back to 'Hoopa Unbound'
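
The fallback messages above suggest that the scraper retries with a normalized spelling when a name is not found. The scraper's actual logic is not shown in this notebook, so the following is only a sketch of how such a fallback could work; `candidate_names`, `resolve`, and both rewrite rules are hypothetical and inferred from the four log lines alone.

```python
def candidate_names(name):
    """Return spellings to try, original first (hypothetical helper)."""
    candidates = [name]
    # Assumed rule 1: the site spells a trailing "Forme" as "Form".
    if name.endswith(" Forme"):
        candidates.append(name[:-1])
    # Assumed rule 2: drop a duplicated leading word ("Hoopa Hoopa ...").
    words = name.split()
    if len(words) > 2 and words[0] == words[1]:
        candidates.append(" ".join(words[1:]))
    return candidates


def resolve(name, known_names):
    """Return the first candidate found in known_names, or None."""
    for candidate in candidate_names(name):
        if candidate in known_names:
            if candidate != name:
                print(f"{name!r} not found, falling back to {candidate!r}")
            return candidate
    return None


# Toy set of names standing in for what the website actually lists.
known_names = {"Keldeo Ordinary Form", "Hoopa Confined", "Magnemite"}
resolved = resolve("Hoopa Hoopa Confined", known_names)
```

Trying the original spelling first keeps exact matches untouched, and logging each fallback makes it easy to audit which names were rewritten.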


We assume (perhaps foolishly) that any non-NA value already in the dataset is correct, even when scraping returns a conflicting value. Conflicts are printed for inspection, but the existing value is kept.

In [5]:
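filled = pokedex.copy()

for i, variant in fetched_variants.items():
    # Sanity check: the scraped variant must match the row it was fetched for.
    assert filled.at[i, NAME] == variant.name
    print(variant.name)

    for column, value in variant.as_dict().items():
        if pd.isna(filled.at[i, column]):
            # Fill only missing cells. The scraped value may itself be None
            # (e.g. percentage_male for genderless species).
            print(f"\t{column} = {value}")
            filled.at[i, column] = value
        elif filled.at[i, column] != value:
            # Report the conflict, but keep the value already in the dataset.
            print(f"\tExpected {column} = {value} but found {filled.at[i, column]}")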

Galarian Meowth
	base_experience = 58
	base_friendship = 50
	catch_rate = 255
	percentage_male = 50.0
Galarian Ponyta
	base_experience = 82
	base_friendship = 50
	catch_rate = 190
	percentage_male = 50.0
Galarian Rapidash
	base_experience = 175
	base_friendship = 50
	catch_rate = 60
	percentage_male = 50.0
Galarian Slowbro
	base_experience = 172
	Expected base_friendship = 50 but found 70.0
Magnemite
	Expected base_friendship = 50 but found 70.0
	percentage_male = None
Magneton
	Expected base_friendship = 50 but found 70.0
	percentage_male = None
Galarian Farfetch'd
	base_experience = 132
	base_friendship = 50
	catch_rate = 45
	percentage_male = 50.0
Voltorb
	Expected base_friendship = 50 but found 70.0
	percentage_male = None
Electrode
	Expected base_friendship = 50 but found 70.0
	percentage_male = None
Galarian Weezing
	base_experience = 172
	base_friendship = 50
	catch_rate = 60
	percentage_male = 50.0
Staryu
	Expected base_friendship = 50 but found 70.0
	percentage_male = None
Sta
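
The same "fill only what is missing, report conflicts" policy can also be expressed without an explicit loop. This is a minimal sketch on toy data rather than the notebook's method; `existing` and `scraped` are made-up frames that reuse a few of the values printed above, and it assumes the scraped values have already been collected into a DataFrame sharing the dataset's index.

```python
import pandas as pd

# Toy stand-ins: `existing` plays the role of the dataset,
# `scraped` the role of the values returned by the scraper.
existing = pd.DataFrame(
    {
        "name": ["Galarian Slowbro", "Galarian Meowth"],
        "base_friendship": [70.0, None],
        "catch_rate": [75.0, None],
    },
    index=[107, 69],
)
scraped = pd.DataFrame(
    {"base_friendship": [50.0, 50.0], "catch_rate": [75.0, 255.0]},
    index=[107, 69],
)

# combine_first keeps every non-NA value from `existing` and takes the
# value from `scraped` only where `existing` is missing. Re-select rows
# and columns afterwards because the result uses the sorted union of labels.
filled = existing.combine_first(scraped).loc[existing.index, existing.columns]

# A conflict is a cell where both sides have a value and the values differ.
shared = list(scraped.columns)
conflicts = (
    existing[shared].notna()
    & scraped.notna()
    & existing[shared].ne(scraped)
)
print(filled)
print(conflicts)
```

Here the existing `base_friendship` of 70.0 for Galarian Slowbro is kept and flagged as a conflict, while the missing cells for Galarian Meowth are filled from `scraped`.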

## Save Results

In [6]:
filled.to_csv(OUT_PATH, index=False)
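
Because some scraped values are themselves `None` (as with `percentage_male` for Magnemite above), a few cells can remain empty after filling. The sketch below, on toy data with made-up names `toy_filled` and `checked_columns`, shows one way to count what is still missing before or after saving.

```python
import pandas as pd

# Toy stand-in for the filled dataset (values are illustrative only).
toy_filled = pd.DataFrame({
    "name": ["Magnemite", "Galarian Meowth"],
    "catch_rate": [190.0, 255.0],
    "percentage_male": [None, 50.0],
})
checked_columns = ["catch_rate", "percentage_male"]  # stand-in for COLUMNS

# Count the remaining missing values per column.
remaining = toy_filled[checked_columns].isna().sum()

# Keep only the columns that still have gaps.
still_missing = remaining[remaining > 0]
print(still_missing)
```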