Let's Play Pokemon!
For as long as I can remember, I have been fascinated with the global sensation that is the Pokemon franchise. In particular, for the last 10 years, I have practiced and analyzed competitive battling.
The goal of this ongoing project is to gain insights through exploratory data analysis (EDA) that help others make better battling decisions. Additionally, I build a classifier that predicts whether a given Pokemon is a Legendary Pokemon.
That is, using the KPIs the game developers decided on, how well can we classify between:

| Legendary | Non-Legendary |
|---|---|
| ![]() | ![]() |
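To make the classification question concrete, here is a minimal sketch of a one-feature baseline. It is not the classifier built in this project. It assumes a hypothetical "base stat total" feature (the sum of a pokemon's battle stats) and uses made-up toy numbers, and it simply picks the cutoff that best separates the two classes on the training data.

```python
def fit_threshold(totals, labels):
    """Pick the cutoff on base stat total that maximizes training accuracy.

    A pokemon is predicted Legendary when its total >= cutoff. This is a
    one-feature decision stump, used only as an illustrative baseline.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(totals)):
        # Count how many training examples this cutoff classifies correctly.
        correct = sum((t >= cutoff) == y for t, y in zip(totals, labels))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def predict(total, cutoff):
    """Return True (Legendary) when the total reaches the cutoff."""
    return total >= cutoff


# Toy values invented for this sketch; they are NOT real game data.
toy_totals = [300, 420, 500, 580, 600, 680]
toy_labels = [False, False, False, False, True, True]  # True = Legendary

cutoff = fit_threshold(toy_totals, toy_labels)
print(cutoff)                # 600
print(predict(650, cutoff))  # True
```

A baseline like this is mainly useful as a yardstick: any model trained on the full set of features should at least beat it.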
The full Generations 1 - 7 dataset can be found on Kaggle, but it is slightly off. It was web-scraped from Serebii, and I noticed that for any pokemon with a "Mega Evolution" (i.e. a more powerful form), the Mega form's data had been recorded over the original pokemon's data.
To clean this and ensure that pokemon with Mega Evolutions were properly accounted for, I needed the original pokemon's data. I decided to use Gilles Armand's dataset, which has the following limitations:
- Only contains Gen 1 - 6 data, but no pokemon from Gen 7 has a Mega form.
- Important numeric features missing:
  - height_m
  - weight_kg
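One way to locate the affected rows is to compare the two datasets stat by stat and flag every pokemon whose numbers disagree. The sketch below uses only the standard library. The column names (`name`, `hp`, `attack`, and so on) and the sample rows are assumptions made for illustration, not the real schema or real game data.

```python
import csv
import io

# Hypothetical column names; the real datasets may label stats differently.
STAT_COLUMNS = ["hp", "attack", "defense", "sp_attack", "sp_defense", "speed"]


def load_rows(csv_text):
    """Parse CSV text into a dict keyed by pokemon name."""
    return {row["name"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def find_mismatched_stats(scraped, reference):
    """Return names present in both datasets whose stats disagree.

    A mismatch suggests the scraped row may hold a Mega form's stats
    instead of the original pokemon's.
    """
    mismatched = []
    for name, ref_row in reference.items():
        scraped_row = scraped.get(name)
        if scraped_row is None:
            continue  # not present in the scraped data
        if any(int(scraped_row[c]) != int(ref_row[c]) for c in STAT_COLUMNS):
            mismatched.append(name)
    return sorted(mismatched)


# Placeholder rows invented for this sketch (not real game data).
HEADER = "name,hp,attack,defense,sp_attack,sp_defense,speed\n"
SCRAPED_CSV = HEADER + (
    "mon_a,50,60,70,80,90,100\n"
    "mon_b,80,130,110,150,100,90\n"   # stats differ from the reference
    "mon_c,40,40,40,40,40,40\n"       # only in the scraped (Gen 1 - 7) data
)
REFERENCE_CSV = HEADER + (
    "mon_a,50,60,70,80,90,100\n"
    "mon_b,80,100,90,120,100,70\n"
)

print(find_mismatched_stats(load_rows(SCRAPED_CSV), load_rows(REFERENCE_CSV)))
# ['mon_b']
```

Rows that appear only in the larger dataset (such as `mon_c` here) are skipped rather than flagged, which matches the observation that no Gen 7 pokemon has a Mega form.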
Run the following command to initialize a MySQL database from the above source data:

    cat database_creation/*.sql | mysql --local-infile=1 -uroot -p
After entering your root password, you should have the following database structure:

    pokemon
    |
    |- pokestats
    |   |
    |   |- ad_pokestats (trigger)
    |   |- ai_pokestats (trigger)
    |   |- au_pokestats (trigger)
    |
    |- serebii
        |
        |- ad_serebii (trigger)
        |- ai_serebii (trigger)
        |- au_serebii (trigger)
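A quick sanity check is to compare the trigger names actually present in the database against the ones listed in the tree. The sketch below only does the comparison. It assumes you have already fetched the names yourself, for example from `SHOW TRIGGERS FROM pokemon` in whichever MySQL client you use. Reading the `ad`/`ai`/`au` prefixes as "after delete / insert / update" is also an assumption based on the names alone.

```python
# Prefixes and tables as they appear in the tree above. Their meaning
# (after delete / after insert / after update) is assumed, not confirmed.
TRIGGER_PREFIXES = ("ad", "ai", "au")
TABLES = ("pokestats", "serebii")


def expected_triggers(tables=TABLES, prefixes=TRIGGER_PREFIXES):
    """Build the set of trigger names shown in the tree, e.g. 'ad_pokestats'."""
    return {f"{prefix}_{table}" for table in tables for prefix in prefixes}


def missing_triggers(found_names):
    """Return the expected trigger names absent from found_names, sorted."""
    return sorted(expected_triggers() - set(found_names))


# Example input: names as they might come back from the database.
found = ["ad_pokestats", "ai_pokestats", "au_pokestats",
         "ad_serebii", "ai_serebii"]
print(missing_triggers(found))  # ['au_serebii']
```

An empty list means every trigger from the tree was found.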
Use this Bulbapedia page to work through each case and add the relevant pokemon.

